The most important thing to take into account when running a parallel job in Slurm is to always launch the parallel program with the 'srun' command.
We have two ways to run a parallel job:
- Using a bash script and the sbatch command.
- Using the srun command directly
Using a bash script and the sbatch command
First, we have to prepare a bash script that tells Slurm how many computers (nodes) and how many cores on each node we are requesting, and that ends with the srun line that launches the parallel program.
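A complete script along these lines might look like the following. The job name and resource values are illustrative choices, not requirements of this cluster; the program path mpi-example/mpi_mm is the one used in the srun example later in this guide:

```shell
#!/bin/bash
#SBATCH --job-name=mpi-test       # Name shown in the queue (illustrative)
#SBATCH --nodes=2                 # Number of computers (nodes)
#SBATCH --ntasks-per-node=8      # Cores per node
#SBATCH --time=00:30:00           # Wall-clock time limit
#SBATCH --mem=16G                 # Real memory required per node

# Parallel program to run
srun mpi-example/mpi_mm
```

The #SBATCH lines are comments to bash but are read by Slurm as job options, so the same file works both as a shell script and as a job description.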
We can use different flags, for instance:
#SBATCH --ntasks=16 : The total number of cores (tasks) we are asking for. Slurm may split these across nodes (for example, 10 cores on one node and 6 on a second node).
#SBATCH --mem=16G : Specifies the real memory required per node. A memory size specification of zero is treated as a special case and grants the job access to all of the memory on each node.
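Once the script is saved (as, say, job.sh, a filename we choose ourselves), it is submitted from the login node with sbatch, and the job can then be watched in the queue:

```shell
sbatch job.sh        # replies with "Submitted batch job <jobid>"
squeue -u $USER      # list our pending and running jobs
```

These commands only make sense on a machine where Slurm is installed, so they are shown here for reference rather than as something to run locally.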
Using the srun command directly
We can avoid using a job script and pass all the flags on the command line:
srun -N 4 --ntasks-per-node=8 --mem-per-cpu=2G -t 00:30:00 mpi-example/mpi_mm
This approach can be very useful when we want a wrapper script that sends several jobs to the cluster.
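As a sketch of such a wrapper, the loop below builds one submission command per core count we might want to try. The core counts and the use of sbatch's --wrap option are illustrative; the commands are only printed, not executed, so the batch can be reviewed before actually submitting it on the cluster:

```shell
#!/bin/bash
# Build one sbatch command line per core count to try (values illustrative).
# Printing instead of executing lets us inspect the batch before submitting;
# to really submit, run each printed line on the cluster's login node.
cmds=()
for n in 4 8 16; do
    cmds+=("sbatch --ntasks=$n --time=00:30:00 --wrap=\"srun mpi-example/mpi_mm\"")
done
printf '%s\n' "${cmds[@]}"
```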