Define the macro PMI_FROM_3RD_PARTY if linked with a 3rd-party PMI
library such as Cray, Slurm, or OpenPMIx. These libraries often run
into issues when queried with nonexistent job attribute keys such as
PMI_process_mapping, PMI_hwloc_pmi, etc.
We recently added a few patches to work around OpenPMIx issues with
nonexistent job attributes.
Cray PMI used to work, but we encountered a hang with PALS v1.3.4.
This commit skips querying these keys as job attributes to bypass these
potential issues. They are often not supported by 3rd-party PMI anyway.
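The guard can be pictured roughly as follows. This is only a sketch: `get_job_attr` and `pmi_query_job_attr` are hypothetical names that stand in for the real query path, and the returned mapping string is made up for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Assumption: set at configure time when linking a 3rd-party PMI library. */
#define PMI_FROM_3RD_PARTY 1

/* Hypothetical stand-in for the real PMI job-attribute query. */
const char *pmi_query_job_attr(const char *key)
{
    if (strcmp(key, "PMI_process_mapping") == 0)
        return "(vector,(0,1,1))";      /* illustrative value only */
    return NULL;
}

/* Hypothetical wrapper: with a 3rd-party PMI, skip the query entirely and
 * report the key as not found so the caller takes its fallback path. */
const char *get_job_attr(const char *key)
{
#ifdef PMI_FROM_3RD_PARTY
    (void) key;
    return NULL;
#else
    return pmi_query_job_attr(key);
#endif
}
```

With the macro defined, the library is never asked for these keys, so a hang or error inside the 3rd-party query cannot occur.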
The PGI compilers, now nvc, emit many warnings by default, some of which
are not worth fixing. For example, the branch_past_initialization
warning forces us to split the declaration and initialization of local
variables in local scopes.
Suppress them with compiler options instead.
1. Separate the arch-related fast options into `sse2`, `avx` and
`avx512f`. The names represent the corresponding compiler "-m" flags.
`sse2` is enabled by default because it is widely supported. These
options are consolidated in MPL's configure. Note that these flags are
only added to CFLAGS when building MPL, not the parent library.
2. The MPL_Memcpy_stream function is uninlined to accommodate this
change. Neither AMD nor Intel architectures show a performance
difference when it is uninlined (data included in PR).
Propagating errors in collectives is a very fragile and complex way of
doing collective error checking and deadlock prevention. It also takes
away 2 bits from the valuable tag space. Disable it by default.
[This is a testing commit].
PMIx uses PMIx_Load_topology to load the hwloc topology so that multiple
processes do not redundantly probe the hardware and create congestion.
Add MPIR_pmi_load_hwloc_topology as a wrapper function for
PMIx_Load_topology. It provides a fallback implementation when
PMIx_Load_topology is not available.
Non-temporal stores are also available in the SSE2 instruction set,
which should be generally available on x86 architectures. The SSE2
version serves as a fallback if AVX is not available.
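A minimal sketch of an SSE2 streaming copy, for illustration only. The function name `memcpy_stream_sse2` is hypothetical and this is not the MPL implementation; it uses the standard SSE2 intrinsics from `<emmintrin.h>` and falls back to a plain memcpy when SSE2 is not enabled at compile time.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* _mm_loadu_si128, _mm_stream_si128, _mm_sfence */
#endif

/* Hypothetical name: copy n bytes using non-temporal stores where possible. */
void memcpy_stream_sse2(void *dst, const void *src, size_t n)
{
#if defined(__SSE2__)
    char *d = (char *) dst;
    const char *s = (const char *) src;

    /* _mm_stream_si128 needs a 16-byte aligned destination, so copy the
     * unaligned head with a regular memcpy first. */
    size_t head = (16 - ((uintptr_t) d & 15)) & 15;
    if (head > n)
        head = n;
    memcpy(d, s, head);
    d += head;
    s += head;
    n -= head;

    while (n >= 16) {
        __m128i v = _mm_loadu_si128((const __m128i *) s);
        _mm_stream_si128((__m128i *) d, v);     /* store bypassing the cache */
        d += 16;
        s += 16;
        n -= 16;
    }
    _mm_sfence();       /* order the streamed stores before later stores */

    memcpy(d, s, n);    /* remaining tail, fewer than 16 bytes */
#else
    memcpy(dst, src, n);        /* SSE2 not enabled: plain copy */
#endif
}
```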
Remove the configure option
--enable-error-messages={all,generic,class,none}. Support "all" by
default.
attr: pass thru user callback error
Return codes from attribute callback functions are supplied by the user
and are potentially meaningful to users. Return the code directly rather
than creating a new one.
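In outline, the change amounts to something like the sketch below. The callback type and the `call_attr_copy` and `failing_copy` names are hypothetical and much simpler than the real attribute machinery; the point is only that the user's return code comes back unchanged.

```c
#include <stddef.h>

/* Hypothetical callback type; a nonzero return signals an error. */
typedef int (*attr_copy_fn)(int keyval, void *attr_in, void **attr_out);

/* Hypothetical caller: pass the user's return code through unchanged
 * instead of replacing it with a newly created error code. */
int call_attr_copy(attr_copy_fn fn, int keyval, void *attr_in, void **attr_out)
{
    int rc = fn(keyval, attr_in, attr_out);
    if (rc != 0)
        return rc;      /* return the user's code directly */
    return 0;
}

/* Example user callback that fails with an application-specific code. */
int failing_copy(int keyval, void *attr_in, void **attr_out)
{
    (void) keyval;
    (void) attr_in;
    (void) attr_out;
    return 12345;
}
```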
The macOS linker in Xcode 15 only accepts -commons,use_dylibs in
-ld_classic mode. Historically we used this option for building MPICH
with the ifort compiler, which is no longer supported on macOS. Remove
the use of this flag at build time and also from the wrapper
scripts. See more discussion at
https://gitlab.com/petsc/petsc/-/merge_requests/7341#note_1810586723.
This reverts commit
4d93cefa8f. pmodels/mpich#6904 has a valid
use-case that we should support. An additional commit will modify the
compile scripts so these flags will not leak out of the build.
* Add option -mpi-abi to mpicc/mpicxx to use the standard MPI ABI.
* Implement mpicc/mpicc_abi and mpicxx/mpicxx_abi from a single source
script to ease maintenance, avoid code duplication, and prevent the
code from becoming out of sync.
We need to add the include directories of both the srcdir tree and the
builddir tree to AM_CPPFLAGS if a header file is generated. Rather than
inflate AM_CPPFLAGS, move mpif90model.h.in to src/include since it is
already in AM_CPPFLAGS.
* Pass --enable-mpi-abi to romio configure
* Build libromio_abi.so and libpromio_abi.so when enabled
* Include mpi_abi.h instead of mpi.h when BUILD_MPI_ABI is defined
* Internally, adio.h includes mpio.h without ABI and
romio_abi_internal.h with ABI, i.e. mpio.h is skipped when
BUILD_MPI_ABI is defined.
ISO_C_BINDING is an intrinsic module available since Fortran 2003. Check
for this feature so we can provide the generic interface using
Type(c_ptr), e.g. in MPI_Alloc_mem.
The HAVE_ROMIO macro is defined in mpichconf.h, which is not available
in mpicxx.h. Instead, use the autoconf macro HAVE_CXX_IO and substitute
it into "#if 1" or "#if 0" at configure time.
The PAC_HAVE_ROMIO macro is called inside the enable_romio branch, thus
it won't work with --disable-romio. The check is simple enough to put in
configure.ac directly.
PMIX_INFO_LOAD and PMIX_ERR_NOT_IMPLEMENTED are deprecated but still
used by some client implementations. Our internal client will not
support them, so add workarounds.
When MPICH_DEBUG_REQUEST is defined (config option to be added), we add
an info string to the request objects that can be used to attach
readable debug information to the request.
Allow users to force the --with-pm and --with-pmilib options. This lets
users build hydra even with an external PMI library, or build the
embedded libpmi concurrently with an external PMI library (e.g.
libpmix). This may not work due to conflicts, but it is at the user's
discretion.
Users may load 3rd-party PMI libraries with:
--with-pmi1=path (libpmi.so)
--with-pmi2=path (libpmi2.so)
--with-pmix=path (libpmix.so)
As of this commit, the following options are more or less independent:
--with-pmi=default|pmi1|pmi2|pmix
--with-pmilib=default|no|mpich|install
--with-pm=default|no|gforker|remshell|hydra
The default option can be internally overridden. with_pmilib=no will
not build the internal (embedded) libpmi. with_pm=no will not build the
internal pm.
A special option is --with-pmi=slurm. This is mainly because Slurm
installs its header as #include "slurm/pmi.h".
Make sure to set LIBS before checking PMI capability.
Also disable the built-in pm if the user is configuring with an external
PMI library. Hydra can be installed separately if needed.
Cray PMI does not define PMI2_keyval_t. Check for it in configure and
typedef INFO_TYPE PMI2_keyval_t if needed. This allows mpich to compile
using Cray PMI. PMI2_keyval_t is only used in PMI2_Job_Spawn. The name
publishing API always uses NULL for its info pointers. It is possible
that our INFO_TYPE is incompatible with Cray PMI's internal type, which
will break PMI2_Job_Spawn. If so, this needs to be fixed in the future,
possibly on the Cray side.
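The conditional typedef might look like the sketch below. The macro name HAVE_PMI2_KEYVAL_T and the layout of INFO_TYPE are assumptions made for illustration; the actual configure check and internal type may differ.

```c
#include <stddef.h>

/* Hypothetical stand-in for the internal key/value info type. */
typedef struct {
    const char *key;
    const char *val;
} INFO_TYPE;

/* Assumption: configure defines HAVE_PMI2_KEYVAL_T when the PMI header
 * already provides the type. The macro name is illustrative. */
#ifndef HAVE_PMI2_KEYVAL_T
typedef INFO_TYPE PMI2_keyval_t;
#endif

/* Code that mentions PMI2_keyval_t now compiles in either case. */
size_t keyval_size(void)
{
    return sizeof(PMI2_keyval_t);
}
```

This only makes the name available at compile time. It does not guarantee that the layout matches what the external library expects, which is the compatibility risk noted above.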
There should be a negligible difference between the `all` and `runtime`
error-checking settings, but the latter allows users to set
MPIR_CVAR_ERROR_CHECKING to disable error checking at runtime.
We allowed use of --with-pmi=[cray|slurm] as convenience options.
Deprecate these options to encourage downstream to use standardized PMI
interfaces.
Rename the cray option to oldcray since the newer Cray environment
should work with the standard `--with-pmi=path` or `--with-pmi2=path`
options directly.
The --with-pmilib option is used to configure special settings for slurm
and cray pmi. Deprecate this option as downstream, e.g. cray, is fixing
the non-conformant implementations.
Users can still manually set --with-pmi in order to work with old Slurm
and Cray libraries until the newer implementations from Slurm and Cray
become ubiquitous.
When hydra is run with -disable-auto-cleanup, it sends SIGUSR1 to signal
process failures. ch3:sock does not support this feature, but at least
ignore the signal to prevent the process from being killed by it.
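Ignoring the signal can be done with the standard signal() call, as in this sketch. The helper name `ignore_sigusr1` is hypothetical, and where the call is made in ch3:sock is not shown here.

```c
#include <signal.h>

/* Hypothetical helper: ignore SIGUSR1 so that a failure notification from
 * the process manager does not terminate the process, which is the default
 * action for this signal. Returns 0 on success, -1 on error. */
int ignore_sigusr1(void)
{
    return signal(SIGUSR1, SIG_IGN) == SIG_ERR ? -1 : 0;
}
```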
Add a port of cpi.c to CUDA. Use a GPU kernel to compute the partial
areas at each process, then sum them with a final MPI_Reduce from device
memory into CPU memory. This is intended to be used as a smoke test for
functioning GPU support.