MPI
The Message Passing Interface (MPI) implementation is an important part of the HPC Vega cluster. The vast majority of software that needs to communicate across nodes uses MPI. Although it is not the only means of communication between processes, it is the only one for which we provide a standard implementation that can be used both for compiling and for running software. Other, more specialized communication implementations are typically built together with the software that uses them.
More information about MPI can be found at the following location:
There are several implementations of the MPI standard. The one that we use on the HPC Vega is Open MPI.
Open MPI on HPC Vega is available as a module. Search for and display the desired module:
$ module keyword openmpi/4.1.2.1
# Check all options with `tab`
$ module load openmpi/<tab>
After loading the selected module (the default Open MPI version is 4.1.2.1), the corresponding compiler wrappers become available (a short compile sketch follows this list):
- Compile C/C++ code with openmpi (e.g. mpicc, mpic++, mpiCC and mpicxx)
- Compile Fortran code with openmpi/gnu (e.g. mpifort, mpif70, mpif77 and mpif90)
- Compile with the latest OneAPI compilers (latest version in use: 2022.0.2): oneapi/compiler/latest
- Compile with OneAPI MPI: oneapi/mpi/latest
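For example, a minimal compile sketch using the wrappers listed above (the source file names hello.c and hello.f90 are placeholders):
# C/C++ with the default Open MPI module
$ module load openmpi/4.1.2.1
$ mpicc -O2 -o hello_c hello.c
# Fortran with the GNU variant
$ module load openmpi/gnu
$ mpifort -O2 -o hello_f hello.f90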
Intel OneAPI MPI
After a cluster upgrade (OS, kernel, Mellanox OFED drivers, ...), we recommend using the latest version, or a version greater than 2021.6.0.
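To check which Intel MPI version a loaded module provides (a quick sanity check, assuming the oneapi/mpi/latest module name listed above; the reported version should be 2021.6.0 or newer):
$ module load oneapi/mpi/latest
$ mpirun --version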
Recommended version:
$ module load OpenMPI/4.1.5-GCC-12.3.0
Find older versions:
$ module spider openmpi
Once you have loaded the Open MPI module, all of its commands are available for running your MPI programs.
$ mpirun ./program
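A typical launch also sets the number of MPI ranks explicitly, for example (assuming ./program is an MPI executable built with the loaded module):
$ mpirun -np 4 ./program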
UCX
Unified Communication X (UCX) is an award-winning, optimized, production-proven communication framework for modern high-bandwidth, low-latency networks.
Open MPI on Vega is compiled with UCX.
Add the following lines to your sbatch script:
export UCX_TLS=self,sm,rc,ud
export OMPI_MCA_PML="ucx"
export OMPI_MCA_osc="ucx"
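A minimal sketch of where these settings fit in a job script (the module version and the executable name ./mpibench follow the examples further below and are placeholders):
#!/bin/bash
#SBATCH --job-name=ucx-example
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=2
#SBATCH --time=00:10:00
#SBATCH --output=%j.output
#SBATCH --partition=cpu
module load OpenMPI/4.1.5-GCC-12.3.0
# Select UCX transports and the UCX point-to-point/one-sided components
export UCX_TLS=self,sm,rc,ud
export OMPI_MCA_PML="ucx"
export OMPI_MCA_osc="ucx"
mpirun ./mpibench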
Single node jobs
Use any Open MPI version available on HPC Vega (e.g. CVMFS modules mounted at /cvmfs and available through the module system, or the system Open MPI). A minimal example is shown below.
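A minimal single-node sketch, assuming an executable ./program built with one of the modules above:
#!/bin/bash
#SBATCH --job-name=singlenode-test
#SBATCH --nodes=1
#SBATCH --ntasks=4
#SBATCH --time=00:10:00
#SBATCH --output=%j.output
#SBATCH --partition=cpu
module load OpenMPI/4.1.5-GCC-12.3.0
mpirun ./program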
Multi node jobs
HPC Vega uses InfiniBand, so the correct environment must be set up for successful multi-node job execution.
The system Open MPI module is already configured correctly by default: btl is disabled, so UCX selects the correct interface (ib0).
If you are using any other version that is not properly compiled and set up, the correct interface (ib0) must be selected explicitly. An example is available below.
Usage of srun:
Check the supported API interfaces with the following command. We recommend using pmix.
$ srun --mpi=list
Example of SBATCH script:
#!/bin/bash
#SBATCH --job-name=multinode-srun-test
#SBATCH --nodes=2
#SBATCH --ntasks=4
#SBATCH --ntasks-per-node=2
#SBATCH --time=00:10:00
#SBATCH --output %j.output
#SBATCH --partition=cpu
srun --mpi=pmix_v3 ./mpibench
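Assuming the script above is saved as multinode-srun-test.sh, it can be submitted and its output inspected as follows (the job ID in the output file name is a placeholder):
$ sbatch multinode-srun-test.sh
$ squeue -u $USER
$ cat <jobid>.output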
Example of SBATCH script:
#!/bin/bash
#SBATCH --job-name=multinode-srun-test
#SBATCH --nodes=2
#SBATCH --ntasks=4
#SBATCH --ntasks-per-node=2
#SBATCH --time=00:10:00
#SBATCH --output %j.output
#SBATCH --partition=cpu
module load oneapi/mpi/2021.6.0
srun --mpi=pmix_v3 ./mpiBench-Intel
Usage of mpirun:
Open MPI, version 3.x
$ mpirun -mca btl_tcp_if_include ib0 ./mpibench
Open MPI, version prior to 4.0.3
Enable ucx and disable btl.
Example of SBATCH script:
#!/bin/bash
#SBATCH --job-name=multinode-mpirun-test
#SBATCH --nodes=2
#SBATCH --ntasks=4
#SBATCH --ntasks-per-node=2
#SBATCH --time=00:15:00
#SBATCH --output %j.output
#SBATCH --partition=cpu
module load OpenMPI/4.0.3-GCC-9.3.0
mpirun -mca pml ucx -mca btl ^uct,tcp,openib,vader --bind-to core ./mpibench
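To verify that the ucx PML was actually selected at run time, Open MPI's PML verbosity can be raised; this is only a debugging aid and can be dropped for production runs:
$ mpirun -mca pml ucx -mca pml_base_verbose 10 -mca btl ^uct,tcp,openib,vader --bind-to core ./mpibench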
mpiBench can be found at:
- Source code: mpiBench
- Compiled code: /ceph/hpc/software/mpibench
More information about multithreaded, multicore, and other jobs with Open MPI, OpenMP, and hybrid combinations can be found at: LINK.