The "ptopo.source" from PMIx_Load_topology may not be exactly "hwloc".
Newer versions of PMIx append the hwloc version, e.g. "hwloc:2.9.0".
Thus, we need to use strncmp instead of strcmp.
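A minimal sketch of the prefix check described above. The helper name
topo_source_is_hwloc is hypothetical and is not the actual MPICH code; it
only assumes the source string is "hwloc", optionally followed by a
version suffix.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns true when the topology source string
 * names hwloc, with or without a version suffix such as ":2.9.0". */
static bool topo_source_is_hwloc(const char *source)
{
    static const char prefix[] = "hwloc";

    if (source == NULL) {
        return false;
    }
    /* Compare only the length of the prefix so that "hwloc:2.9.0"
     * matches as well as plain "hwloc". */
    return strncmp(source, prefix, sizeof(prefix) - 1) == 0;
}
```

Note that a pure prefix comparison also accepts any other string that
begins with "hwloc"; a stricter check could additionally require the next
character to be '\0' or ':'.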
If PMIx_Load_topology fails, we have the option of falling back. But we
may prefer to trust that PMIx_Load_topology works and to see its error
if it fails.
We call MPIR_Typerep_reduce_is_supported to determine whether to do the
collective host buffer swap in reduce and allreduce. We may want to make
a better decision based on message size, so we add the count to the
parameters.
Add a cvar to disable yaksa reduction for large messages.
Use zeMemPutIpcHandle to release IPC handles obtained from
zeMemGetIpcHandle instead of calling close(), which leads to a crash at
the finalize step.
Also clean up caches in zeMemFree hook functions.
MPIDI_POSIX_Bcast_tree_type and MPIDI_POSIX_Reduce_tree_type are used
without proper initialization. Initialize the values in
MPIDI_POSIX_nb_release_gather_comm_init.
As the parent can be changed in release_gather_release for reduce, each
rank needs to wait until the parent has updated the flag; otherwise, a
rank could access memory that is not yet available for use.
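The wait on the parent's flag can be sketched with C11 atomics as
follows. This is an illustration only: the function names and the use of
<stdatomic.h> are assumptions, and the real code uses MPICH's own
shared-memory flags rather than this helper.

```c
#include <stdatomic.h>

/* Hypothetical: the parent publishes a new flag value once the shared
 * memory is ready. Release ordering makes the parent's earlier writes
 * visible to a rank that observes the new value. */
static void publish_flag(atomic_int *flag, int value)
{
    atomic_store_explicit(flag, value, memory_order_release);
}

/* Hypothetical: a rank spins until the flag holds the expected value.
 * Acquire ordering pairs with the release store above, so the memory
 * guarded by the flag is safe to read after this returns. */
static void wait_for_flag(atomic_int *flag, int expected)
{
    while (atomic_load_explicit(flag, memory_order_acquire) != expected) {
        /* busy-wait; a real implementation would also drive progress */
    }
}
```

A child rank would call wait_for_flag() on its current parent's flag
before touching the parent's buffer.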
Add MPIR_CVAR_REDUCE_INTRANODE_MSG_SIZE_THRESHOLD.
MPIR_CVAR_REDUCE_INTRANODE_TREE_KVAL and
MPIR_CVAR_REDUCE_INTRANODE_TREE_TYPE are used when the message size is
smaller than or equal to this threshold, while
MPIR_CVAR_REDUCE_INTRANODE_TREE_KVAL_LARGE and
MPIR_CVAR_REDUCE_INTRANODE_TREE_TYPE_LARGE are used when the message
size is larger than this threshold.
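The selection rule above amounts to a single comparison. The following
sketch uses a hypothetical helper, select_by_msg_size, with plain
integers standing in for the cvar values; it is not the actual MPICH
code.

```c
#include <stddef.h>

/* Hypothetical helper: returns the regular setting when the message
 * size is smaller than or equal to the threshold, and the "large"
 * setting otherwise. The same rule applies to both the tree type and
 * the k value. */
static int select_by_msg_size(size_t msg_size, size_t threshold,
                              int regular_value, int large_value)
{
    return (msg_size <= threshold) ? regular_value : large_value;
}
```

For example, with a threshold of 1024 bytes, a 1024-byte message still
uses the regular k value, while a 1025-byte message uses the large one.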
The ofi injected messages may require explicit progress to kick the
message out. But this is difficult to control due to the lack of request
handles. This is problematic when a user issues an inject and then
immediately dives into computation, or when they move on to a different
dedicated vci and neglect to progress the previous vci, which, by all
indications, has completed and doesn't require progress.
Add the cvar as a remedy.
We still believe this is a libfabric issue.
For nonblocking collectives, if the internal pt2pt operations are using
a non-0 vci, we need to create the nonblocking collective request in the
same vci pool, or it won't be progressed effectively. In fact, current
multi-vci nonblocking collectives rely on global progress to work.
Add MPID_Request_create_from_comm to create the request from the
per-comm vci when it is enabled.
MPI_Session_init accepts builtin error handlers as parameters, but the
error system may not be initialized yet. Make sure the builtins are
initialized so we can recognize, for example, MPI_ERRORS_RETURN.
MPIR_Process.memory_alloc_kinds may not be initialized yet;
initialize it separately for now.
Now that we check for session info before the actual init,
MPIR_Process.memory_alloc_kinds may not have been initialized yet when
we call MPI_Session_init. Leave the temp var memory_alloc_kinds NULL if
it is not provided in session info hints, and inherit it from
MPIR_Process.memory_alloc_kinds after it is initialized.
Pass in a const namespace and parent rank instead of the whole
pmix_proc_t. For one, pmix_proc_t may be too big to pass as a parameter;
for two, pmix_proc_t is too opaque for the semantics here.
The PMI KVS commands may exceed the static buffer size due to a long
value length. This fixes the following error:
mpiexec: src/pmi_wire.c:810: PMIU_cmd_output_v2:
Assertion `!PMIU_cmd_is_static(pmicmd)' failed.
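The general remedy for a command that outgrows a fixed buffer can be
sketched as below. Everything here is hypothetical: the function name,
the buffer size, and the command text are illustrative and do not
reflect the actual PMI wire format or the PMIU_cmd implementation.

```c
#include <stdio.h>
#include <stdlib.h>

#define STATIC_BUF_SIZE 64      /* illustrative size, not the real limit */

/* Hypothetical: format a key/value command. Use the caller's fixed
 * buffer when the result fits; otherwise allocate a buffer of the
 * required size. Returns the buffer holding the command, or NULL on
 * failure. The caller frees the result only if it is not static_buf. */
static char *format_kvs_put(char *static_buf, const char *key, const char *value)
{
    /* snprintf reports the length the full string needs, even when it
     * had to truncate the output. */
    int len = snprintf(static_buf, STATIC_BUF_SIZE,
                       "cmd=put key=%s value=%s\n", key, value);
    if (len < 0) {
        return NULL;
    }
    if ((size_t) len < STATIC_BUF_SIZE) {
        return static_buf;      /* fits in the fixed buffer */
    }

    /* Too long for the fixed buffer: fall back to the heap. */
    char *buf = malloc((size_t) len + 1);
    if (buf == NULL) {
        return NULL;
    }
    snprintf(buf, (size_t) len + 1, "cmd=put key=%s value=%s\n", key, value);
    return buf;
}
```

The point of the sketch is that the choice between static and dynamic
storage is made from the measured length, so a long value never trips an
assumption that the command fits in the static buffer.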
PMIx uses PMIx_Load_topology to load the hwloc topology, which avoids
multiple processes redundantly probing the hardware and creating
congestion.
Add MPIR_pmi_load_hwloc_topology as a wrapper function for
PMIx_Load_topology. It provides a fallback implementation when
PMIx_Load_topology is not available.
Add a lightweight version of MPIR_pmi_barrier() for PMIx which does not
collect data. Call MPIR_pmi_barrier_only() in MPII_Init_thread.
Also add MPIR_CVAR_CH4_INIT_SKIP_PMI_BARRIER, which by default skips
the barrier.