MPI
The Message Passing Interface (MPI) implementation is an important part of the HPC Vega cluster. The vast majority of software that needs to communicate across nodes uses MPI. Although it is not the only means of communication between processes, it is the only one for which we provide a standard implementation that can be used both for compiling and for running software. Other, more specialized communication implementations are typically built together with the software that uses them.
More information about MPI can be found at the following location:
There are several implementations of the MPI standard. The one that we use on the HPC Vega is Open MPI.
Open MPI on HPC Vega is available as a module. Search for and display the desired module:
$ module keyword openmpi/4.1.2.1
# Check all options with `tab`
$ module load openmpi/<tab>
After loading the selected module (the default Open MPI version is 4.1.2.1), the corresponding compiler wrappers become available (a short compile sketch follows this list):
- Compile C/C++ code with openmpi (e.g. mpicc, mpic++, mpiCC and mpicxx)
- Compile Fortran code with openmpi/gnu (e.g. mpifort, mpif70, mpif77 and mpif90)
- Compile with the latest OneAPI compilers (latest version in use: 2022.0.2): oneapi/compiler/latest
- Compile with OneAPI MPI: oneapi/mpi/latest
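For example, a minimal compile sketch using the wrappers listed above (the source file names hello.c and hello.f90 are placeholders):
# C/C++ with the default Open MPI module
$ module load openmpi/4.1.2.1
$ mpicc -O2 -o hello_c hello.c
# Fortran with the GNU variant
$ module load openmpi/gnu
$ mpifort -O2 -o hello_f hello.f90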
Intel OneAPI MPI
After a cluster upgrade (OS, kernel, Mellanox OFED drivers, ...), we recommend using the latest version, or a version greater than 2021.6.0.
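To check which Intel MPI version a loaded module provides (a quick sanity check, assuming the oneapi/mpi/latest module name listed above; the reported version should be 2021.6.0 or newer):
$ module load oneapi/mpi/latest
$ mpirun --version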
Recommended version:
$ module load OpenMPI/4.1.5-GCC-12.3.0
Find older versions:
$ module spider openmpi
Once you have loaded the Open MPI module, all of its commands are available for running your MPI programs.
$ mpirun ./program
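A typical launch also sets the number of MPI ranks explicitly, for example (assuming ./program is an MPI executable built with the loaded module):
$ mpirun -np 4 ./program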
UCX
Unified Communication X (UCX) is an award-winning, optimized, production-proven communication framework for modern high-bandwidth, low-latency networks.
Open MPI on Vega is compiled with UCX.
Add the following lines to your sbatch script:
export UCX_TLS=self,sm,rc,ud
export OMPI_MCA_PML="ucx"
export OMPI_MCA_osc="ucx"
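A minimal sketch of where these settings fit in a job script (the module version and the executable name ./mpibench follow the examples further below and are placeholders):
#!/bin/bash
#SBATCH --job-name=ucx-example
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=2
#SBATCH --time=00:10:00
#SBATCH --output=%j.output
#SBATCH --partition=cpu
module load OpenMPI/4.1.5-GCC-12.3.0
# Select UCX transports and the UCX point-to-point/one-sided components
export UCX_TLS=self,sm,rc,ud
export OMPI_MCA_PML="ucx"
export OMPI_MCA_osc="ucx"
mpirun ./mpibench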
Single node jobs
Use any Open MPI version available on HPC Vega (e.g. CVMFS modules mounted at /cvmfs and available through the module system, or the system Open MPI). A minimal example is shown below.
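A minimal single-node sketch, assuming an executable ./program built with one of the modules above:
#!/bin/bash
#SBATCH --job-name=singlenode-test
#SBATCH --nodes=1
#SBATCH --ntasks=4
#SBATCH --time=00:10:00
#SBATCH --output=%j.output
#SBATCH --partition=cpu
module load OpenMPI/4.1.5-GCC-12.3.0
mpirun ./program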
Multi node jobs
HPC Vega uses InfiniBand, so the correct environment must be set up for successful multi-node job execution.
The system Open MPI module is already configured correctly by default: btl is disabled, so UCX selects the correct interface (ib0).
If you are using any other version that is not properly compiled and set up, the correct interface (ib0) must be selected explicitly. An example is available below.
Usage of srun:
Check the supported API interfaces with the following command. We recommend using pmix.
$ srun --mpi=list
Example of SBATCH script:
#!/bin/bash
#SBATCH --job-name=multinode-srun-test
#SBATCH --nodes=2
#SBATCH --ntasks=4
#SBATCH --ntasks-per-node=2
#SBATCH --time=00:10:00
#SBATCH --output %j.output
#SBATCH --partition=cpu
srun --mpi=pmix_v3 ./mpibench
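Assuming the script above is saved as multinode-srun-test.sh, it can be submitted and its output inspected as follows (the job ID in the output file name is a placeholder):
$ sbatch multinode-srun-test.sh
$ squeue -u $USER
$ cat <jobid>.output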
Example of SBATCH script:
#!/bin/bash
#SBATCH --job-name=multinode-srun-test
#SBATCH --nodes=2
#SBATCH --ntasks=4
#SBATCH --ntasks-per-node=2
#SBATCH --time=00:10:00
#SBATCH --output %j.output
#SBATCH --partition=cpu
module load oneapi/mpi/2021.6.0
srun --mpi=pmix_v3 ./mpiBench-Intel
Usage of mpirun:
Open MPI, version 3.x
$ mpirun -mca btl_tcp_if_include ib0 ./mpibench
Open MPI, version prior to 4.0.3
Enable ucx and disable btl.
Example of SBATCH script:
#!/bin/bash
#SBATCH --job-name=multinode-mpirun-test
#SBATCH --nodes=2
#SBATCH --ntasks=4
#SBATCH --ntasks-per-node=2
#SBATCH --time=00:15:00
#SBATCH --output %j.output
#SBATCH --partition=cpu
module load OpenMPI/4.0.3-GCC-9.3.0
mpirun -mca pml ucx -mca btl ^uct,tcp,openib,vader --bind-to core ./mpibench
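To verify that the ucx PML was actually selected at run time, Open MPI's PML verbosity can be raised; this is only a debugging aid and can be dropped for production runs:
$ mpirun -mca pml ucx -mca pml_base_verbose 10 -mca btl ^uct,tcp,openib,vader --bind-to core ./mpibench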
mpiBench can be found at:
- Source code: mpiBench
- Compiled code: /ceph/hpc/software/mpibench
More information about multithreaded, multicore, and other jobs with Open MPI, OpenMP, and hybrid combinations can be found at: LINK.