The most important thing to keep in mind when we want to run a parallel job in Slurm is to always launch the program with the ‘srun’ command.


We have two ways to run a parallel job:


  1. Using a bash script submitted with the sbatch command.
  2. Using the srun command directly.


Using a bash script and using the sbatch command

First, we have to prepare a bash script that indicates how many computers (nodes) and how many cores on each of them we are requesting to run our job:

#!/bin/bash
#SBATCH -p normal # partition (queue)
#SBATCH -N 2 # number of nodes
#SBATCH --ntasks-per-node=4 # number of tasks (cores) per node
#SBATCH --mem-per-cpu=1000 # memory (in MB) for each core
#SBATCH -t 0-00:15 # time limit (D-HH:MM)
#SBATCH -o slurm.%N.%j.out # STDOUT (%N: node name, %j: job ID)
#SBATCH -e slurm.%N.%j.err # STDERR
#SBATCH --mail-type=BEGIN,END,FAIL # notifications for job start, end & failure
#SBATCH [email protected] # send-to address


# Parallel program to run
srun mpi-example/mpi_mm
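Once the directives and the srun line are saved in a script, the job is submitted with sbatch. The sketch below writes a minimal version of such a script to a file (the name myjob.sh and the trimmed-down set of directives are just examples) and shows the submission commands as comments, since they only work on a machine with Slurm installed:

```shell
#!/bin/bash
# Write a minimal batch script to a file (illustrative name and directives).
cat > myjob.sh <<'EOF'
#!/bin/bash
#SBATCH -p normal
#SBATCH -N 2
#SBATCH --ntasks-per-node=4
#SBATCH -t 0-00:15
srun mpi-example/mpi_mm
EOF

# On the cluster we would then run:
#   sbatch myjob.sh        # prints the job ID
#   squeue -u $USER        # check the job's state in the queue
```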



We can use different flags, for instance:


#SBATCH --ntasks=16 : Total number of cores we are asking for. Slurm is free to split these tasks across the nodes (for example, 10 cores on one node and 6 cores on a second node).
#SBATCH --mem=16G : Specifies the real memory required per node. A memory size specification of zero is treated as a special case and grants the job access to all of the memory on each node.
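As a sketch, a job header using these two flags instead of -N and --ntasks-per-node could look like this (the partition, memory and time values are placeholders):

```shell
#SBATCH -p normal       # partition (queue)
#SBATCH --ntasks=16     # 16 cores in total, placed on nodes as Slurm sees fit
#SBATCH --mem=16G       # 16 GB of real memory per node
#SBATCH -t 0-00:15      # time limit (D-HH:MM)
```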


Using the srun command directly


We can avoid using a job script by indicating all the flags directly on the command line:


srun -N 4 --ntasks-per-node=8 --mem-per-cpu=2G -t 00:30:00 mpi-example/mpi_mm


This approach can be very useful when we want to write a script that sends several jobs to the cluster.
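For instance, a small wrapper script could loop over several problem sizes and launch one job per size. This is only a sketch under two assumptions that are not in the original: that mpi_mm accepts a size argument, and that the submit command is passed as a parameter so the loop can be dry-run with echo on a machine without Slurm:

```shell
#!/bin/bash
# Launch one cluster job per input size. The submit command (srun on the
# cluster, echo for a dry run) is the function's first parameter.
submit_all() {
    local submit=$1; shift
    local size
    for size in "$@"; do
        "$submit" -N 4 --ntasks-per-node=8 --mem-per-cpu=2G -t 00:30:00 \
            mpi-example/mpi_mm "$size"
    done
}

# Dry run: print the three command lines instead of submitting them.
submit_all echo 512 1024 2048

# On the cluster we would call, e.g.:
#   submit_all srun 512 1024 2048
```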