Slurm is a resource manager and job scheduler. Users submit jobs, which are scheduled and allocated resources (CPU time, memory, etc.) by the resource manager.

Once the submission script has been prepared and submitted, Slurm is responsible for distributing the job among the cluster's machines (nodes). At the same time, it returns job information to the user.

Some useful commands / modes:

  • INTERACTIVE ( allows users to quickly allocate resources on a SLURM cluster to obtain an interactive shell to work in ).
  • SRUN ( srun is used to submit a job for execution in real time ).
  • SBATCH ( sbatch is used to submit a job script for later execution ).

The main difference between SRUN and SBATCH is that srun is interactive and blocking ( you get the result in your terminal and you cannot write other commands until it is finished ), while sbatch is batch processing and non-blocking ( results are written to a file and you can submit other commands right away ).

srun immediately executes the script on the remote host, while sbatch copies the script to internal storage and then uploads it to the compute node when the job starts.

You typically use sbatch to submit a job and srun inside the submission script to create job steps. sbatch allocates resources to the job, while srun launches parallel tasks across those resources.
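
The division of work between sbatch and srun can be sketched with a small submission script. This is only an illustration: the job name, the task count and the commands are assumptions, and the SLURM_JOB_ID check is added here so that the same script can also be tried outside Slurm.

```shell
#!/bin/bash
#SBATCH --job-name=steps-demo
#SBATCH --ntasks=2
#SBATCH --cpus-per-task=1

# Inside a Slurm allocation SLURM_JOB_ID is set, so commands are launched
# as job steps with srun. Outside Slurm the launcher is empty and the
# commands run directly, which allows a quick local dry run.
if [ -n "${SLURM_JOB_ID:-}" ]; then
    launcher="srun"
else
    launcher=""
fi

# Step 1: srun starts one copy per task (2 here) on the allocated resources.
$launcher uname -n
# Step 2: a second job step, started after the first one finishes.
$launcher date
```

Submitted with sbatch, the resources are reserved once and each srun line becomes a separate job step inside that allocation.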

 

 SBATCH MAIN OPTIONS

-J, --job-name=<jobname> : Specify a name for the job allocation.

 

--mail-type=<type> : Notify user by email when certain event types occur. Valid type values are NONE, BEGIN, END, FAIL, REQUEUE, ALL ...

--mail-user=<user> : User to receive email notification

 

--mem=<size[units]> : Specify the real memory required per node. Different units can be specified using the suffix [K|M|G|T].

--mem-per-cpu=<size[units]>: Minimum memory required per allocated CPU.

 

-N, --nodes=<minnodes[-maxnodes]>: Request that a minimum of minnodes nodes be allocated to this job

 

-n, --ntasks=<number>  : Number of tasks the job will launch, used for parallel runs. The default is one task per node, but note that the --cpus-per-task option will change this default.

-c, --cpus-per-task=<ncpus> : Advise the Slurm controller that ensuing job steps will require ncpus number of processors per task

 

-o, --output=<filename pattern> AND -e, --error=<filename pattern> : Instruct Slurm to connect the batch script's standard output/error directly to the file(s) named by the "filename pattern". By default, both standard output and standard error are directed to the same file. The default file name is "slurm-%j.out", where "%j" is replaced by the job ID.
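
As a rough worked example of how --mem-per-cpu combines with --cpus-per-task, the arithmetic below uses the values of the FastQC header shown further down this page. It assumes a single task (no --ntasks given); note that --mem and --mem-per-cpu are alternatives and are not meant to be used together.

```shell
#!/bin/bash
# Values taken from the FastQC header used later in this page.
cpus_per_task=2      # --cpus-per-task=2
mem_per_cpu_gb=4     # --mem-per-cpu=4G
ntasks=1             # assumed: a single task, since --ntasks is not given

# Memory requested for the job = tasks x CPUs per task x memory per CPU.
total_gb=$(( ntasks * cpus_per_task * mem_per_cpu_gb ))
echo "--mem-per-cpu=${mem_per_cpu_gb}G x ${cpus_per_task} CPUs = ${total_gb}G for the job"
```

Under these assumptions, asking for --mem=8G on one node would request the same amount.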

 

 

 

SUBMITTING A JOB

#!/bin/bash

 

#SBATCH --job-name=gatk

#SBATCH --cpus-per-task=1

#SBATCH --mem-per-cpu=4G

#SBATCH -o slurm.%j.out

#SBATCH -e slurm.%j.err

 

# mail alert at the end of execution (use BEGIN,END,FAIL to also be notified at start and on failure)

#SBATCH --mail-type=END

#SBATCH --mail-user=<user>

 

#running your commands

date

sleep 60

date
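
A minimal sketch of submitting this script and locating its log files follows. The file name job.sh and the helper log_names are hypothetical; the check for sbatch is only there so that the snippet does nothing harmful on a machine without Slurm.

```shell
#!/bin/bash
# Hypothetical file name for the submission script shown above.
script="job.sh"

# Expand the "slurm.%j.out" / "slurm.%j.err" patterns for a given job ID.
log_names() {
    printf 'slurm.%s.out slurm.%s.err\n' "$1" "$1"
}

if command -v sbatch >/dev/null 2>&1 && [ -f "$script" ]; then
    # --parsable prints only the job ID (plus ";cluster" on multi-cluster setups).
    jobid="$(sbatch --parsable "$script")"
    jobid="${jobid%%;*}"
    echo "Submitted job ${jobid}; logs: $(log_names "$jobid")"
else
    echo "sbatch or ${script} not found; nothing submitted"
fi
```

Without --parsable, sbatch answers with a line of the form "Submitted batch job <jobid>", and the same job ID appears in the log file names.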

 

 

GATHERING INFORMATION

Multithreading is the ability of a CPU or a single core in a multi-core processor to execute multiple processes or threads concurrently, supported by the operating system.

A thread scheduler might be implemented in software to take advantage of this feature.

Usually, the number of threads requested in the software should match the value of --cpus-per-task in sbatch.

 

#!/bin/bash

#SBATCH --cpus-per-task=4

 

Run_software --threads 4 [params…]
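
To avoid writing the thread count twice, the value can be read from the SLURM_CPUS_PER_TASK environment variable, which Slurm sets when --cpus-per-task is requested. The sketch below keeps the placeholder program name Run_software and only prints the command; the helper thread_count is a hypothetical name.

```shell
#!/bin/bash
#SBATCH --cpus-per-task=4

# Slurm exports SLURM_CPUS_PER_TASK when --cpus-per-task is requested.
# Default to 1 so the script also works outside a job.
thread_count() {
    echo "${SLURM_CPUS_PER_TASK:-1}"
}

# Run_software is the placeholder program name used above; echo shows the
# command that would be executed.
echo "Run_software --threads $(thread_count)"
```

Changing the #SBATCH line is then enough to keep the allocation and the thread count in step.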

 

 

Arrays

Submit a number of "near identical" jobs simultaneously in the form of a job array.

In this example, the same script.sh runs 30 times, once per input file (the files must be named xxx1..xxx30).

 

$ sbatch --array=1-30 script.sh
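
What script.sh could look like is sketched below. Each array task receives its own index in SLURM_ARRAY_TASK_ID and uses it to pick its input file. The job name, the output pattern and the helper input_for_task are assumptions made for this illustration; the echo stands in for the real processing command.

```shell
#!/bin/bash
#SBATCH --job-name=array-demo
# In file name patterns, %A is the array job ID and %a the array task index.
#SBATCH -o slurm.%A_%a.out

# Map an array index to its input file, following the xxx1..xxx30 naming.
input_for_task() {
    echo "xxx${1}"
}

# SLURM_ARRAY_TASK_ID is set by Slurm for each array task (1..30 here);
# default to 1 when the script is run by hand.
task_id="${SLURM_ARRAY_TASK_ID:-1}"
input="$(input_for_task "$task_id")"
echo "Array task ${task_id} would process ${input}"
```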

 

 

GNU Parallel

GNU Parallel is another way to run a large number of jobs in parallel, useful when the file names differ in more than a single number. Within an sbatch job that allocates all the resources needed, srun launches all the jobs in parallel.

You can find more information about how to run programs here.
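
A possible shape for such a job is sketched below, assuming GNU Parallel is installed on the cluster. The snippet writes a hypothetical batch script, parallel_job.sh, and checks its syntax. The task count, the *.fastq.gz pattern and md5sum (standing in for the real program) are illustrative choices, and the srun --exclusive flag may need adapting to the local Slurm version.

```shell
#!/bin/bash
# Write a hypothetical batch script, parallel_job.sh, then check its syntax.
cat > parallel_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=parallel-demo
#SBATCH --ntasks=4
#SBATCH --cpus-per-task=1

# GNU Parallel keeps up to $SLURM_NTASKS commands running at once; each
# command is an srun job step using one task of the allocation.
# md5sum stands in for the real per-file program.
parallel -j "$SLURM_NTASKS" srun --ntasks=1 --exclusive md5sum {} ::: *.fastq.gz
EOF

# bash -n parses the script without executing it.
bash -n parallel_job.sh && echo "parallel_job.sh is ready: sbatch parallel_job.sh"
```

Because the file list is passed after ":::", any naming scheme works, which is the main difference from a job array indexed by a single number.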

FastQC

FastQC is a tool that provides quality control checks for NGS data ( https://www.bioinformatics.babraham.ac.uk/projects/fastqc/ ).

It can be run through a Graphical User Interface (GUI) or from the command line.

It uses multithreading, analysing several files at the same time.

1. Navigate to the folder ...

$ cd sbatch_header

 

... OR: 2. Get fastqc and scripts files from a public site

$ git clone https://github.com/situpf/sbatch_header.git

$ cd sbatch_header

$ wget https://sit-web.upf.edu/results/test_fastq_files/test_R1.fastq.gz

$ wget https://sit-web.upf.edu/results/test_fastq_files/test_R2.fastq.gz

3. List files and have a quick look at them

$ ls -htl

 

4. Create a new directory for the results

$ mkdir fastqc

5. Look for the fastQC module and load it

($ interactive)

$ module av fastqc

$ module load FastQC/0.11.7-Java-1.8.0_74

6. Run fastQC in an interactive mode

$ time fastqc  -o fastqc test_R1.fastq.gz

$ time fastqc  -o fastqc test_R2.fastq.gz

7. Take a look at the output files

$ exit #leave interactive

$ module av firefox

$ module load Firefox/44.0.2

$ firefox --no-remote fastqc/*html

8. Run fastQC in an interactive mode (2 CPUs)

$ interactive -c 2 -r training

$ module load FastQC/0.11.7-Java-1.8.0_74

$ time fastqc  -o fastqc test_R1.fastq.gz test_R2.fastq.gz -t 2

9.  Run fastQC using sbatch (edit sbatch_header/header.sh)

#!/bin/bash

#SBATCH --job-name=fastqc

#SBATCH --cpus-per-task=2

#SBATCH --mem-per-cpu=4G

#SBATCH -o slurm.%j.out

#SBATCH -e slurm.%j.err

#SBATCH --mail-type=END

#SBATCH --mail-user=<user>

 

module load FastQC/0.11.7-Java-1.8.0_74

date

fastqc -o fastqc test_R1.fastq.gz test_R2.fastq.gz -t 2

date

10. Run sbatch

$ sbatch sbatch_header/header.sh

 

11. Monitor your job

$ squeue -u <user> 

$ sacct -j <jobid>

$ scontrol show job <jobid> -dd

$ seff <jobid>

12. Look at slurm output files

$ more slurm.<jobid>.out

$ more slurm.<jobid>.err
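
Steps 11 and 12 can also be wrapped in two small helper functions. This is a sketch only: wait_for_job and report_job are hypothetical names, the 30 second polling interval is arbitrary, and the sacct field list is one possible selection.

```shell
#!/bin/bash
# Block until a job has left the queue.
# Usage: wait_for_job <jobid> [poll_seconds]
wait_for_job() {
    local jobid="$1" poll="${2:-30}"
    # squeue -h suppresses the header, so empty output means the job
    # is no longer pending or running (or squeue is unavailable).
    while [ -n "$(squeue -h -j "$jobid" 2>/dev/null)" ]; do
        sleep "$poll"
    done
}

# Print the accounting summary and list the log files that exist.
# Usage: report_job <jobid>
report_job() {
    local jobid="$1" f
    if command -v sacct >/dev/null 2>&1; then
        sacct -j "$jobid" --format=JobID,JobName,State,Elapsed,MaxRSS
    fi
    # Log names follow the "-o slurm.%j.out" / "-e slurm.%j.err" patterns.
    for f in "slurm.${jobid}.out" "slurm.${jobid}.err"; do
        [ -f "$f" ] && echo "log available: $f"
    done
    return 0
}
```

After sourcing these functions, a call such as "wait_for_job <jobid> && report_job <jobid>" waits for the job and then shows its final state and log files.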