The simplest way to start a job is with the srun command: a single command line creates a resource allocation and runs the tasks of a job step. srun launches parallel tasks on a Slurm system and is often compared to mpirun for MPI jobs. For work that is not parallel in this sense, sbatch is the better choice.
Many options can be passed to the srun command. In particular, they allow the user to control which resources are allocated and how tasks are distributed across those resources.
In the example below, the hostname command is run as four tasks (-n 4) on two nodes (-N 2). Adding the -l option would also prefix each output line with its task number. The default partition is used because none is specified. When -n is not given, the default is one task per node.
    [user@login0004 ~]$ srun -N 2 -n 4 hostname
    cn0321
    cn0321
    cn0320
    cn0320
In the following example, a hostname job requests two nodes with ten tasks per node and two CPUs per task (40 CPUs in total), 1 GB of memory per node, and a time limit of 30 seconds on the partition named cpu:
    srun --partition=cpu --nodes=2 --ntasks-per-node=10 --cpus-per-task=2 \
         --time=00:00:30 --mem=1G hostname
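The CPU total quoted above is the product of the node count, the tasks per node, and the CPUs per task. A minimal sketch of that arithmetic in shell; the variable names are illustrative, and the srun command is only printed, not run:

```shell
#!/bin/bash
# Illustrative variable names; the values mirror the description above.
nodes=2
tasks_per_node=10
cpus_per_task=2

# Total CPUs = nodes x tasks per node x CPUs per task.
total_tasks=$((nodes * tasks_per_node))
total_cpus=$((total_tasks * cpus_per_task))
echo "tasks: ${total_tasks}, CPUs: ${total_cpus}"

# Print (not run) an srun command line requesting exactly these resources.
echo "srun --nodes=${nodes} --ntasks-per-node=${tasks_per_node} --cpus-per-task=${cpus_per_task} hostname"
```

Note that --ntasks sets the total number of tasks for the whole job, whereas --ntasks-per-node sets the number per node, so the two options lead to different CPU totals.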
More information on starting jobs with the srun command is available at: link.
The sbatch command submits a user-written batch script to Slurm. The script can be given to sbatch as a file name on the command line; if no file name is specified, sbatch reads the script from standard input. Each script must start with an interpreter line such as #!/bin/sh. A batch script can also contain a wide variety of options, but each line that sets an option must begin with #SBATCH. The required resources and other job parameters (the partition, the time limit, the output file, and so on) are set with #SBATCH directives, followed by any number of job steps started with the srun command:
    #!/bin/bash
    #SBATCH --job-name=test
    #SBATCH --output=result.txt
    #SBATCH --ntasks=1
    #SBATCH --time=10:00
    #SBATCH --mem-per-cpu=100
    srun hostname
    srun sleep 60
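Because sbatch reads the script from standard input when no file name is given, a short job can also be submitted by redirecting a file into it. A minimal sketch, assuming the hypothetical job name stdin-test and file name job.sh; the submission is skipped on machines where sbatch is not installed:

```shell
#!/bin/bash
# Write a small batch script; the job name and file name are placeholders.
cat > job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=stdin-test
#SBATCH --ntasks=1
#SBATCH --time=10:00
srun hostname
EOF

# Count the #SBATCH directives at the top of the script.
directive_count=$(grep -c '^#SBATCH' job.sh)
echo "directives: ${directive_count}"

# Submit via standard input only where Slurm is available.
if command -v sbatch >/dev/null 2>&1; then
    sbatch < job.sh
else
    echo "sbatch not found; skipping submission"
fi
```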
The sbatch command stops processing #SBATCH directives once it reaches the first line in the script that is neither a comment nor whitespace. The command itself exits as soon as the script has been successfully transferred to the Slurm controller and assigned a job ID. A batch script is not necessarily allocated resources immediately; it may wait in the queue for some time before the required resources become available.
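One consequence of this rule is that an #SBATCH line placed after the first command is ignored. The awk filter below is a rough approximation of the rule, not sbatch's actual parser: it lists only the directives that appear before the first line that is neither blank nor a comment. The file name demo.sh and its contents are made up for illustration:

```shell
#!/bin/bash
# Sample script: the last directive follows a command, so it is ignored.
cat > demo.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=test
# an ordinary comment

#SBATCH --time=10:00
srun hostname
#SBATCH --mem=4G
EOF

# Print #SBATCH lines until the first non-blank, non-comment line.
effective=$(awk '
    /^#SBATCH/        { print; next }  # directive still in the header
    /^#/              { next }         # shebang or ordinary comment
    /^[[:space:]]*$/  { next }         # blank line
    { exit }                           # first command: stop scanning
' demo.sh)
echo "$effective"
```

Here only the --job-name and --time directives are printed; the --mem line comes after srun hostname and is skipped.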
By default, standard output and standard error are both directed to a file named "slurm-%j.out", where "%j" is replaced by the job ID. The file is created on the first node of the job allocation. Except for the batch script itself, Slurm does not move user files.
Example of submitting a job with the sbatch command:
    $ sbatch --partition=cpu --job-name=test --mem=4G \
             --time=5-0:0 --output=test.log myscript.sh
Which is the same as:
    $ sbatch -p cpu -J test --mem=4G -t 5-0:0 -o test.log \
             myscript.sh
And the same as submitting the following batch script:
    #!/bin/bash
    #SBATCH --partition=cpu
    #SBATCH --job-name=test
    #SBATCH --mem=4G
    #SBATCH --time=5-0:0
    #SBATCH --output=test.log
    sh myscript.sh
Good practice: use the --ntasks-per-node switch. Set the executable bit on your scripts:
    chmod a+x my_script.sh
The difference between srun and sbatch
- Both commands accept mostly the same switches (options).
- Only sbatch supports array jobs, that is, sets of jobs with the same input.
- Only srun offers the --exclusive option for job steps, which dedicates resources to each step and thus allows several job steps to run in parallel within one resource allocation (from Slurm v20.02, including additional gres resources, e.g. GPUs).
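The last two differences can be sketched together in one batch script: the --array option of sbatch creates an array job, and inside each array task two srun job steps run side by side. This is a sketch under assumptions: the file name array_job.sh, the input file names, and the program ./process are hypothetical, and on newer Slurm versions the --exact option may be preferred over --exclusive for job steps. The script is only written and inspected here, not submitted:

```shell
#!/bin/bash
# Write the job script; the quoted 'EOF' keeps the variables unexpanded.
cat > array_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=array-demo
#SBATCH --array=0-3
#SBATCH --ntasks=2
#SBATCH --time=10:00

# Each array task sees its own SLURM_ARRAY_TASK_ID (0 to 3 here).
echo "array task ${SLURM_ARRAY_TASK_ID}"

# Two job steps running in parallel inside the same allocation;
# ./process and the input file names are hypothetical.
srun --exclusive --ntasks=1 ./process "input_${SLURM_ARRAY_TASK_ID}_a.txt" &
srun --exclusive --ntasks=1 ./process "input_${SLURM_ARRAY_TASK_ID}_b.txt" &
wait   # keep the job alive until both steps have finished
EOF

# Count the parallel job steps defined in the script.
step_count=$(grep -c '^srun --exclusive' array_job.sh)
echo "parallel steps: ${step_count}"
echo "submit with: sbatch array_job.sh"
```

The trailing & puts each step in the background, and wait keeps the batch script from exiting before the steps complete.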