Index

0. Brief cluster description

1. Getting an Account

2. Connection

2.1 Connecting outside the PRBB

3. Directory structure & Quota

4. Software

     5.1 Running Programs

     5.2 Submitting to the queue

     5.3 Queue information

     5.4 Watching over your jobs

     5.5 Job output

6. Job arrays

               GNU Parallel: An alternative to job arrays

7. Interactive Jobs

8. Parallel Jobs

     8.1 Using a bash script and using the sbatch command

     8.2 Using the srun command directly

9. GPU Resource

10. Advice and Good clustering practices

11. MATLAB

12. Inspiration

 

0. Brief cluster description

 

MARVIN has:

- 12 computing nodes with 16 cores and 128GB RAM.

- 1 computing node with 20 cores and 256GB RAM.

- 4 computing nodes with 32 cores and 256GB RAM each.

- 1 computing node with 48 cores and 512GB RAM.  

In total, 388 cores. The disk space available to MARVIN and its nodes is approximately 645 TB, divided into three main filesets:

 

homes: 20 TB

scratch: 268 TB

projects: 357 TB

 

All of them are on GPFS, which provides high-performance access. The OS running on the computing nodes is CentOS 7.x x86_64.

 

1. Getting an account

To get an account, send an email containing your full name to sit@upf.edu. Send it from the academic email account where you want to receive cluster notifications.

2. Connection

The first thing to do is connect to the cluster. How you do it depends on the operating system you are using.

Windows:

Install PuTTY (https://www.chiark.greenend.org.uk/~sgtatham/putty/download.html). Type username@marvin.s.upf.edu in the “Host Name” field, open the connection, type your password and log in.

Linux / MacOS:

Open a terminal and type: ssh username@marvin.s.upf.edu

 

If you want to use programs which have a GUI (Graphical User Interface) you will need to do the following:

Windows:

Install Xming or any other X Window server (you will probably also need to install a fonts package) and start it. Then, in PuTTY, go to the options tree on the left and check: Connection – SSH – X11 – Enable X11 forwarding.

Linux / MacOS:

Open a terminal and type: ssh username@marvin.s.upf.edu -X (it is an upper-case X; lower-case x does the opposite)
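On Linux / MacOS the connection settings above can also be stored in an OpenSSH client configuration, so that a short alias is enough. The following is only a sketch: the alias "marvin" and the file name marvin_ssh_config.example are assumptions chosen for illustration, and "username" must be replaced with your own account.

```shell
#!/bin/bash
# Write an example OpenSSH client configuration to a separate file, so that
# your real ~/.ssh/config is not touched by this sketch.
cat > marvin_ssh_config.example <<'EOF'
Host marvin
    HostName marvin.s.upf.edu
    User username
    ForwardX11 yes
EOF

# "ForwardX11 yes" has the same effect as the -X flag on the command line.
# Try it with:  ssh -F marvin_ssh_config.example marvin
```

If the settings work for you, the same four lines can be copied into ~/.ssh/config, after which "ssh marvin" is enough.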

2.1 Connecting outside the PRBB

The cluster front-end “mr-login” doesn’t have direct access to the internet, so in order to connect to marvin from OUTSIDE the PRBB you’ll have to install the UPF’s VPN client. You can get it here: https://www.upf.edu/bibtic/serveis/xarxes/vpn/

3. Directory structure & quota

Inside the cluster there are three main filesets:

 

Fileset    Description                                                          Quota   Backup
homes      non-replicable data, like scripts and such                           50 GB   weekly
scratch    extension of your home and shared data, without backup nor quota     no      no
projects   shared data with other users                                         no      weekly

You can check your home quota with:

myquota

 

4. Software

By default, none of the installed software is directly available at the command prompt. A large number of software applications, libraries, tools and compilers are installed on Marvin, and some installations require environment settings that are incompatible with others. For this reason, Marvin uses ‘Environment Modules’ to activate (load) the software required by the user. The module commands let you load and unload software, and each module modifies the environment variables needed to run that software. Modules also let you control which version of a software package is used when you compile or run a program in your current login session or batch job.

Software Available

For a current list of all software available, enter the command “module avail” at the command prompt.

Software Install Request

To request the installation of software on the cluster, send an e-mail to sit@upf.edu with the subject “SOFTWARE INSTALL REQUEST”, including the software name, a download link and any concerns you have about it.

Summary of Basic Module Commands

For a current list of all available modules (software) installed use the command:

 module avail

Basic help on each module:

module help <modulename>

Display details for each module:

module display <modulename>

For additional information:

man module
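As an illustration, a typical module session can be sketched as follows. The function name module_session is hypothetical, and the commands are wrapped in a function so that nothing runs until you call it on the cluster. The module name in the usage comment is one that appears later in this guide; use "module avail" to see the names that actually exist.

```shell
#!/bin/bash
# Sketch of a typical module session (hypothetical helper name).
module_session() {
    local mod="$1"
    module avail "$mod"     # check that a module with this name exists
    module load "$mod"      # activate it in the current shell
    module list             # show what is loaded now
    module unload "$mod"    # deactivate it again
}

# Usage on the cluster (not executed here):
#   module_session STAR/2.5.2b-foss-2016b
```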

 

Intel Compilations

By default, the software you can see with modules is compiled with GCC toolchains. If you want your software compiled with the Intel compiler for better performance, please state this requirement in the email with your install request.

 

To show the software compiled with Intel, type the following:

module load modulepath/arch

module av

 

To go back to the previous modules, built with the toolchains that contain GCC, type:

module load modulepath/noarch

 

When you switch from noarch to arch modules, or vice versa, the modules that were already loaded will remain inactive.

5.1 Running Programs

Once you are connected to marvin, please, DO NOT RUN ANYTHING THERE. No, really: this is a cluster and you are supposed to run things here, but “here” is not “the cluster” yet. You are now connected to “mr-login”, which is the “front end” of the cluster. This computer is only a shared “entry point” to the whole cluster. It is shared by all the users and all their processes, so if you run things here, your work will go slower and you will slow other people down too (and we can stop anything we find running this way without warning). If you want to launch anything, you have to submit it to a queue, as explained below.

 

5.2 Submitting a job to the queue system

 

To use the queue system all you need to do is:

sbatch scriptname

 

Where scriptname is a script that contains whatever you need in order to run the programs you want to.

 

Example:

 

Let’s say we’re already connected to marvin:

 

 

-bash-4.2$ pwd

/homes/users/agonzalez

 

In our case, we just want to check the date, sleep and re-check the date:

 

-bash-4.2$ cat exemple.sh
#!/bin/bash

# set the partition where the job will run
#SBATCH --partition=normal

# set the number of nodes
#SBATCH --nodes=1

# set max wallclock time
#SBATCH --time=1:00:00

# mail alert at start, end and abortion of execution
#SBATCH --mail-type=ALL

# send mail to this address
#SBATCH --mail-user=alfons.gonzalez@upf.edu

# run the application
DIR=/homes/users/agonzalez/test-slurm
date > $DIR/testing.log
sleep 60
date >> $DIR/testing.log
-bash-4.2$

Now let’s send it:

 

-bash-4.2$ sbatch exemple.sh
Submitted batch job 4196
-bash-4.2$ squeue
            JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
             4196    normal exemple. agonzale  R       0:03      1 mr-00-01
-bash-4.2$

 

Notice that the squeue command gives us information about our job.

Since we told the queue system to send us a mail at the start, end or abortion of the execution, once we receive the mail we can check the results:

 

-bash-4.2$ ls test-slurm/
testing.log
-bash-4.2$ cat test-slurm/testing.log
Tue Feb  7 16:06:47 CET 2017
Tue Feb  7 16:07:47 CET 2017
-bash-4.2$

  

man sbatch, man squeue and man sinfo will give you more information about how to use those commands to submit your jobs and monitor them.
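When a script needs the job ID that sbatch prints (for example to pass it on to sacct or scancel), it can be taken from the "Submitted batch job" line. A minimal sketch, where job_id_from is a hypothetical helper name:

```shell
#!/bin/bash
# job_id_from LINE: print the last word of an sbatch confirmation line,
# e.g. "Submitted batch job 4196" gives 4196. Hypothetical helper name.
job_id_from() {
    local line="$1"
    echo "${line##* }"      # drop everything up to the last space
}

# Usage on the cluster (not executed here):
#   jobid=$(job_id_from "$(sbatch exemple.sh)")
#   sacct -j "$jobid"

# Local demonstration with the confirmation line shown above.
job_id_from "Submitted batch job 4196"
```

sbatch also has a --parsable option that prints the job ID without the surrounding text, which avoids the parsing step.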

5.3 Queue information

 

 

5.4 Watching over your jobs

 

Useful commands of the queue system:

 

squeue: Shows your currently waiting or running queued jobs.

 

sinfo: Shows information about Slurm nodes and partitions.

 

sacct -j jobid: Shows information about already finished jobs

 

scancel jobid: Removes a job from the queue. You can only remove your own jobs.

 

smem: Shows information about the cluster occupancy.

 

sit-web.upf.edu/sitservices: Web interface where you can see the status of the cluster.

 

Examples of useful commands

 

Command                                        Description
squeue -u <username>                           List all current jobs for a user
squeue -u <username> -t RUNNING                List all running jobs for a user
squeue -u <username> -t PENDING                List all pending jobs for a user
showq-slurm -o -U -q <partition>               List priority order of jobs for the current user (you) in a given partition
scontrol show jobid -dd <jobid>                List detailed information for a job (useful for troubleshooting)
sstat --format=AveCPU,AvePages,AveRSS,AveVMSize,JobID -j <jobid> --allsteps
                                               List status info for a currently running job
scancel -t PENDING -u <username>               Cancel all the pending jobs for a user
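The squeue listings above can be condensed into a per-state count. This is a sketch under the assumption that the job states arrive one per line on standard input; summarize_states is a hypothetical helper name. On the cluster the input would come from squeue with the header switched off (-h) and only the state column printed (-o "%T").

```shell
#!/bin/bash
# summarize_states: read one job state per line and print STATE=COUNT lines.
summarize_states() {
    sort | uniq -c | awk '{print $2 "=" $1}'
}

# Usage on the cluster (not executed here):
#   squeue -u "$USER" -h -o "%T" | summarize_states

# Local demonstration with made-up states.
printf 'RUNNING\nPENDING\nRUNNING\n' | summarize_states
```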

 

5.5 Job output

 

By default standard error and standard output of your job will go to the same file, called slurm-%j.out , where %j is the JOBID number of your job.

 

Standard error contains all the messages that your job has sent to the error channel, e.g. “command not found”, “no such file or directory”…

 

Standard output contains all the messages that your job has sent to the standard output channel, i.e. anything you would normally see written on the screen.

 

It is highly advisable that you control where standard error and standard output go. You can do it using sbatch flags -e for standard error and -o for standard output.
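As a sketch of the -o and -e flags, the snippet below writes a small job script that sends standard output and standard error to separate files named after the job ID (%j). The script name job_with_logs.sh and the log file names are assumptions for illustration only.

```shell
#!/bin/bash
# Write a small example job script that separates stdout and stderr.
# File names are illustrative assumptions.
cat > job_with_logs.sh <<'EOF'
#!/bin/bash
#SBATCH --partition=normal
#SBATCH --time=0:05:00
#SBATCH -o myjob_%j.out
#SBATCH -e myjob_%j.err

echo "this line goes to standard output"
ls /no/such/path        # the error message goes to standard error
EOF

# Submit it on the cluster with:  sbatch job_with_logs.sh
```

If the file names include a directory (for example logs/myjob_%j.out), create that directory before submitting, since the job output cannot be written to a directory that does not exist.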

 

6. Job arrays

 

SLURM allows you to submit a number of "near identical" jobs simultaneously in the form of a job array. To take advantage of this, you will need a set of jobs that differ only by an "index" of some kind.

For example, say that you would like to run tophat, a splice-aware transcript-to-genome mapping tool, on 30 separate transcript files named trans1.fq, trans2.fq, trans3.fq, etc.

 

First, construct a SLURM batch script, called tophat.sh, using special SLURM job array variables:

 

tophat.sh

#!/bin/bash
#SBATCH -J tophat # A single job name for the array
#SBATCH -n 1 # Number of cores
#SBATCH -N 1 # All cores on one machine
#SBATCH -p fast # Partition
#SBATCH --mem 4000 # Memory request (4 GB)
#SBATCH -t 0-2:00 # Maximum execution time (D-HH:MM)
#SBATCH -o tophat_%A_%a.out # Standard output
#SBATCH -e tophat_%A_%a.err # Standard error

module load TopHat/2.0.11
tophat /whatever/Mus_musculus/mm10/chromFa trans"${SLURM_ARRAY_TASK_ID}".fq

 

Then launch the batch process using the --array option to specify the indexes.

 

sbatch --array=1-30 tophat.sh

 

In the script, two types of substitution variables are available when running job arrays. The first, %A and %a, represent the job ID and the job array index, respectively. These can be used in the sbatch parameters to generate unique names. The second, SLURM_ARRAY_TASK_ID, is a bash environment variable that contains the current array index and can be used in the script itself. In this example, 30 jobs will be submitted each with a different input file and different standard error and standard out files.
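When the input files do not follow a simple numeric pattern, a common variation is to keep the file names in a list and let the array index select one line of it. The following is a sketch only: pick_input, inputs.txt and the three file names are hypothetical.

```shell
#!/bin/bash
# pick_input LIST INDEX: print line number INDEX of the file LIST.
pick_input() {
    sed -n "${2}p" "$1"
}

# Demo list with three hypothetical file names.
printf '%s\n' liver_A.fq brain_B.fq heart_C.fq > inputs.txt

# Inside an array job SLURM_ARRAY_TASK_ID is set by Slurm; default to 2 here
# so that the sketch also runs outside the queue.
task="${SLURM_ARRAY_TASK_ID:-2}"
pick_input inputs.txt "$task"
```

With a list of three files, the job script would be submitted with sbatch --array=1-3 so that each array task picks a different line.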

GNU Parallel: An alternative to job arrays

 

There is another way to submit a large number of jobs in parallel, useful when the file names differ in more than a single number (like 1-30 in the example above). With GNU parallel, the user should first check how many CPUs they are allowed to use (replace lab_sit with your lab name):

 

-bash-4.2$ sacctmgr show qos lab_sit format="Name,GrpTRES,MaxTRESPerUser"
     Name       GrpTRES     MaxTRESPU
---------- ------------- -------------
  lab_sit       cpu=450       cpu=450

 

In this example, we want to run STAR to map 84 samples (paired-end FastQ = 168 files) to a reference, and we created this script:

 

run_all_in_nodes.sh

#!/bin/bash

module load parallel
module load STAR/2.5.2b-foss-2016b

### input file = FW / define RV file
fw=$1
rv=`echo $fw | sed 's/R1_001/R2_001/'`

### print sample identifier and the node where the job is running
echo "Running sample $2 in `hostname`"

### STAR index / prefix to output files
ind=/path/to/STAR/index/
pref=output_path/$2

### STAR command
STAR --runThreadN 8 --genomeDir $ind --readFilesIn $fw $rv --readFilesCommand zcat --outFileNamePrefix $pref --outSAMtype BAM SortedByCoordinate

 

Each job (or task) uses 8 threads (or CPUs), so we should calculate how many tasks we can request in order to run the maximum number of jobs simultaneously (in this example, c = cpus-per-task = 8):

 

-bash-4.2$ c=8
-bash-4.2$ sinfo | gawk -v c=$c 'BEGIN{n=0}!/^NODELIST/{split($3,x,"/"); split(x[2]/c,y,".");n+=y[1]}END{print n-1}'
29

 

Pay attention to the maximum number of CPUs per user that we saw before! We are going to use 29 x 8 = 232 CPUs (fewer than the 450 CPUs available per user).
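The arithmetic behind the gawk one-liner can be sketched in plain bash: for each node, divide the number of idle CPUs by the CPUs per task (rounding down), add the results, and subtract one as the one-liner does. The helper name max_tasks and the idle CPU counts in the demonstration are made up for illustration.

```shell
#!/bin/bash
# max_tasks CPUS_PER_TASK IDLE_1 IDLE_2 ...: number of tasks that fit in the
# idle CPUs of each node, minus one (as in the gawk one-liner).
max_tasks() {
    local c="$1"; shift
    local total=0 idle
    for idle in "$@"; do
        total=$(( total + idle / c ))   # integer division rounds down
    done
    echo $(( total - 1 ))
}

# Four hypothetical nodes with 16, 16, 20 and 32 idle CPUs, 8 CPUs per task:
# 2 + 2 + 2 + 4 = 10 tasks fit, so the function prints 9.
max_tasks 8 16 16 20 32
```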

Now, we prepare the script:

 

parallel.sbatch

#!/bin/bash

### Here, we ask for all the available nodes and set the number of CPUs per task
#SBATCH --nodes=1-18
#SBATCH --cpus-per-task=8

### Define number of tasks:
### 1. Number of tasks from the calculation done before
#SBATCH --ntasks=29
### 2. (Not used) Define how many tasks per node in all requested nodes. It is a fixed number: it will only run 2 tasks per node when possible (if a node has only 8 CPUs, it will not use them; and if a node has 32 CPUs, it will only use 16)
# #SBATCH --ntasks-per-node=2

module load parallel

### create the output directory if it does not exist yet
mkdir -p star

### execute each job as a task in a node, with 8 cpus
srun="srun -N1 -n1 -c8"

### parallel how many jobs to start: -j 0 -> run as many jobs as possible
parallel="parallel --delay .2 -j 0"

### run the parallel command
ls path/to/fastq/*R1*.fastq.gz | $parallel "$srun ./run_all_in_nodes.sh {} {/.} "
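In the parallel command, {} is replaced by each input line (the forward FastQ file) and {/.} by its base name without the last extension, which run_all_in_nodes.sh receives as $1 and $2. The sketch below reproduces both transformations, and the R1 to R2 substitution done inside the script, with bash parameter expansion. The helper names and the file name are hypothetical.

```shell
#!/bin/bash
# sample_id PATH: strip the directory and the last extension, which is what
# the GNU parallel placeholder {/.} does with each input line.
sample_id() {
    local base="${1##*/}"   # remove the directory part
    echo "${base%.*}"       # remove the last extension
}

# mate_file PATH: replace the first R1_001 with R2_001, like the sed call in
# run_all_in_nodes.sh.
mate_file() {
    echo "${1/R1_001/R2_001}"
}

fw="path/to/fastq/s1_R1_001.fastq.gz"   # hypothetical file name
sample_id "$fw"
mate_file "$fw"
```

For this file name the sample identifier is s1_R1_001.fastq and the reverse file is path/to/fastq/s1_R2_001.fastq.gz.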

 

And finally, run multiple jobs that can be controlled by the same job ID:

 

-bash-4.2$ sbatch parallel.sbatch
Submitted batch job 110766

-bash-4.2$ j
 JOBID PARTITION NAME       USER          TIME_LEFT    TIME_LIMIT       START_TIME   ST NODES  CPUS  NODELIST(REASON
110766 normal    parallel.  mtormo        UNLIMITED     UNLIMITED 2017-08-31T09:45    R    12   232  mr-00-[03-11,14

-bash-4.2$ sacct -j 110766
      JobID    JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
110766       parallel.+     normal lab_genom+        232    RUNNING      0:0
110766.0     run_all_i+            lab_genom+          8    RUNNING      0:0
110766.1     run_all_i+            lab_genom+          8    RUNNING      0:0
110766.2     run_all_i+            lab_genom+          8    RUNNING      0:0
110766.3     run_all_i+            lab_genom+          8    RUNNING      0:0
110766.4     run_all_i+            lab_genom+          8    RUNNING      0:0
110766.5     run_all_i+            lab_genom+          8    RUNNING      0:0
110766.6     run_all_i+            lab_genom+          8    RUNNING      0:0
110766.7     run_all_i+            lab_genom+          8    RUNNING      0:0
110766.8     run_all_i+            lab_genom+          8    RUNNING      0:0
110766.9     run_all_i+            lab_genom+          8    RUNNING      0:0
110766.10    run_all_i+            lab_genom+          8    RUNNING      0:0
110766.11    run_all_i+            lab_genom+          8    RUNNING      0:0
110766.12    run_all_i+            lab_genom+          8    RUNNING      0:0
110766.13    run_all_i+            lab_genom+          8    RUNNING      0:0
110766.14    run_all_i+            lab_genom+          8    RUNNING      0:0
110766.15    run_all_i+            lab_genom+          8    RUNNING      0:0
110766.16    run_all_i+            lab_genom+          8    RUNNING      0:0
110766.17    run_all_i+            lab_genom+          8    RUNNING      0:0
110766.18    run_all_i+            lab_genom+          8    RUNNING      0:0
110766.19    run_all_i+            lab_genom+          8    RUNNING      0:0
110766.20    run_all_i+            lab_genom+          8    RUNNING      0:0
110766.21    run_all_i+            lab_genom+          8    RUNNING      0:0
110766.22    run_all_i+            lab_genom+          8    RUNNING      0:0
110766.23    run_all_i+            lab_genom+          8    RUNNING      0:0
110766.24    run_all_i+            lab_genom+          8    RUNNING      0:0
110766.25    run_all_i+            lab_genom+          8    RUNNING      0:0
110766.26    run_all_i+            lab_genom+          8    RUNNING      0:0
110766.27    run_all_i+            lab_genom+          8    RUNNING      0:0
110766.28    run_all_i+            lab_genom+          8    RUNNING      0:0


For more information about the GNU parallel options and advantages, please visit https://www.upf.edu/web/sct-sit/gnu-parallel-tutorial

7. Interactive jobs

 

Although Slurm has a command designed to run interactive jobs (salloc), we strongly recommend using a tool named ‘interactive’ that is available on Marvin. There are several ways to run interactive jobs (normally different combinations of the salloc and srun commands), but the ‘interactive’ tool helps the user to do it in the correct way:

 

[msanchez@mr-login ~]# interactive -h
Usage: interactive [-A] [-p] [-a] [-N] [-c] [-T|-n] [-c] [-m] [-e] [-r] [-w] [-J] [-x]

Optional arguments:
     -A: account (non-default account)
     -p: partition (default: normal)
     -a: architecture (default: , values hsw=Haswell skl=SkyLake wsw=Warsaw)
     -N: number of nodes
     -T: number of tasks per node
     -n: number of tasks (default: 1)
     -c: number of CPU cores (default: 1)
     -m: amount of memory (GB) per core (default: 1 [GB])
     -e: email address to which the begin session notification is to be sent
     -r: specify a reservation name
     -w: target node
     -J: job name
     -x: binary that you want to run interactively
example : interactive -A snow -a hsw -c 4 -J MyFirstInteractiveJob
example : interactive -A snow -a hsw -c 4 -J MyFirstInteractiveJob -x "MyBinary MyOptions"

Written by: Alan Orth <a.orth@cgiar.org>
Modified by: Jordi Blasco <jordi.blasco@hpcnow.com>
[msanchez@mr-login ~]#

 

Examples:

 

1.- Run an interactive job with a resource reservation of 4 nodes, with 8 cores on each node and 4 GB of RAM for each core.

 

[msanchez@mr-login ~]$ interactive -N 4 -T 8 -m 4 -e miguelangel.sanchez@upf.edu -J test-interactive-job

 

2.- Run an interactive job with 32 cores spread over 4 nodes (we do not mind how many cores we get in each node). We also ask for 8 GB for each core:


[msanchez@mr-login ~]$ interactive -N 4 -n 32 -m 8 -e miguelangel.sanchez@upf.edu -J test-interactive-job

8. Parallel jobs

The most important thing to keep in mind when running a parallel job in Slurm is to always use the ‘srun’ command.

 

We have two ways to run a parallel job:

8.1 Using a bash script and using the sbatch command

 

First, we have to prepare a bash script that states how many nodes, and how many cores on each node, our job will run with:

 

test-mpi.sh

#!/bin/bash
#
#SBATCH -p normal # partition (queue)
#SBATCH -N 2 # number of nodes
#SBATCH --ntasks-per-node=4 # number of tasks (cores) in each node
#SBATCH --mem-per-cpu 1000 # memory to use for each core (MB)
#SBATCH -t 0-00:15 # time (D-HH:MM)
#SBATCH -o slurm.%N.%j.out # STDOUT
#SBATCH -e slurm.%N.%j.err # STDERR
#SBATCH --mail-type=BEGIN,END,FAIL # notifications for job start, done & fail
#SBATCH --mail-user=miguelangel.sanchez@upf.edu # send-to address

 

# Parallel program to run
srun mpi-example/mpi_mm

 

 

We can use different flags, for instance:

 

#SBATCH --ntasks=16 : How many cores in total we are asking for. These cores may be split across nodes (for example, 10 cores in one node and 6 cores in a second node).
#SBATCH --mem=16G : Specifies the real memory required per node. A memory size of zero is treated as a special case and grants the job access to all of the memory on each node.

 

8.2 Using the srun command directly

 

We can avoid using a job script and indicate all the flags in a command line:

 

srun -N 4 --ntasks-per-node=8 --mem-per-cpu=2GB -t 00:30:00 mpi-example/mpi_mm

 

This can be very useful if we want to write a script that sends several jobs to the cluster.
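A sketch of such a script is shown below. It only prints the srun commands (a dry run) for three node counts, so it is safe to try anywhere; the function name launch_scaling_test and the choice of 1, 2 and 4 nodes are assumptions for illustration.

```shell
#!/bin/bash
# Print (dry run) one srun command per node count. Remove "echo" on the
# cluster to really launch them.
launch_scaling_test() {
    local nodes
    for nodes in 1 2 4; do
        echo srun -N "$nodes" --ntasks-per-node=8 --mem-per-cpu=2GB \
             -t 00:30:00 mpi-example/mpi_mm
    done
}

launch_scaling_test
```

Keep in mind that srun waits until its job has finished, so without "echo" the loop runs the jobs one after another; appending & to the srun line would start them without waiting.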

 

9. GPU Resource

As an extra resource, Marvin offers the possibility of using GPUs to run jobs. To do so, we first need a program compiled with CUDA (the NVIDIA GPU compilers).

 

At the moment, we only have one node with 4 GPUs (mr-00-12).

 

To ask for a GPU resource we have to add this flag to our job script:

 

#SBATCH --gres=gpu:2

 

Below is a complete job script that asks for 2 GPUs.

 

#!/bin/bash
#
#SBATCH -p normal # partition (queue)
#SBATCH -N 1 # number of nodes
#SBATCH -n 2 # number of cores
#SBATCH --gres=gpu:2
#SBATCH --mem 1000 # memory pool for all cores
#SBATCH -t 0-01:00 # time (D-HH:MM)
#SBATCH -o slurm.%N.%j.out # STDOUT
#SBATCH -e slurm.%N.%j.err # STDERR
#SBATCH --mail-type=END,FAIL # notifications for job done & fail
#SBATCH --mail-user=miguelangel.sanchez@upf.edu # send-to address

module load CUDA

cd /homes/users/msanchez/gpu_burn-0.7
./gpu_burn 120
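Inside a GPU job it can be useful to check how many GPUs were actually assigned. The sketch below counts the entries of a comma-separated device list. It assumes that Slurm exports the assigned devices in the CUDA_VISIBLE_DEVICES variable, which is the usual behaviour with --gres=gpu but depends on how the cluster is configured; count_gpus is a hypothetical helper name.

```shell
#!/bin/bash
# count_gpus LIST: count the entries of a comma-separated device list
# such as "0,1". An empty list counts as 0.
count_gpus() {
    local list="${1:-}"
    if [ -z "$list" ]; then
        echo 0
        return
    fi
    local IFS=','
    set -- $list            # split the list on commas
    echo "$#"
}

# Usage inside a job script:
#   echo "GPUs assigned: $(count_gpus "${CUDA_VISIBLE_DEVICES:-}")"

# Local demonstration with a made-up device list.
count_gpus "0,1"
```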

 

 

10. Advice and Good clustering practices

  • Keep your home directory as clean as possible. An ideal home directory would contain no loose files, only one directory for each of your projects. Once that is done, whenever you submit a job, use the -e/-o options to send the output files to the directory of the project they belong to.

 
  • Use sbatch for anything that is not editing, moving or zipping files. If the files are very big, use sbatch for the zipping too.

 
  • Test your jobs before launching them. A program could work on your computer or on another cluster and fail here. Maybe we need to install a Perl module or a library here, or maybe you have a typo in the input file path. Submitting small test executions is fast and really useful.

 
  • You should have a rough idea of the amount of resources your job needs: time, memory and hard disk space. If not, you increase the risk of a job failure due to walltime, too much swapping or lack of disk space, which would be a waste of time for you and everyone else using the cluster.

 
  • Use sbatch. Once you are in marvin you are in the login node. This is not the cluster; it is only the front end and queue manager computer. If you run heavy things here, the computer will go slower not only for you, but for everyone else using it.

 
  • Use the “--mem=XXX” option. The scheduler needs it in order to know how much memory your job will need.

 
  • Use a reasonable amount of memory. Any extra memory you request is memory that other users (or even you) will not be able to use, and it makes everything slower (jobs do not run because, even though there are free CPU slots, there is not enough free memory).

 
  • We have more than 300 CPUs. That means we can run 300 jobs simultaneously. Yeah, it’s not Marenostrum but… use it!!! If you have to run a process that will last 20 hours, but you manage to turn it into 20 processes of 1 hour each, it will run 20 times faster!!! Sometimes it is not possible to break a job into pieces that way, but many other times it is.

 
  • But keep in mind: N cores => no more than 30xN simultaneous sbatch jobs. As mentioned, it is OK to split a 200h job into 20 x 10h jobs, or even into 100 x 2h jobs, but it doesn’t make sense to try to split it into 20000 jobs.

 
  • It doesn't make sense to send jobs shorter than 30', because even when a queue is empty it takes some seconds for a job to start. When it has finished, it again takes some time to leave the queue, inform the master, write accounting logs, and so on. Therefore, if your job takes only a few minutes of computation, it spends most of its time getting in and out of the queue instead of computing.

 
  • As an extension of the previous point, if you want to run an installed program and that program has a parallel version, let us know. Parallel versions tend to be much faster and more efficient.

 
  • If you need to run several similar jobs, an easy way to do it is to write a script that submits jobs with different parameters. But try to avoid never-ending self-calling loops. They are pretty nasty to kill :S
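The advice about splitting one long job into a reasonable number of pieces can be sketched with a small helper that works out which part of the input each piece should process. chunk_range is a hypothetical name, and the figures (1000 items, 20 pieces) are only an example.

```shell
#!/bin/bash
# chunk_range TOTAL CHUNKS INDEX: print "START END" (1-based, inclusive) of
# the INDEX-th piece when TOTAL items are split into CHUNKS pieces.
chunk_range() {
    local total="$1" chunks="$2" index="$3"
    local size=$(( (total + chunks - 1) / chunks ))   # ceiling division
    local start=$(( (index - 1) * size + 1 ))
    local end=$(( index * size ))
    if [ "$end" -gt "$total" ]; then
        end="$total"                                  # last piece may be shorter
    fi
    echo "$start $end"
}

# Example: 1000 input items, 20 pieces, third piece (items 101 to 150).
chunk_range 1000 20 3
```

In a job array submitted with --array=1-20, the index would be "$SLURM_ARRAY_TASK_ID", so that each array task processes its own range.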

 

11. MATLAB

 

To use the Matlab licenses on the cluster, the Slurm queue system manages them as a resource:

  • To show the current status of licenses:

scontrol show lic

  • If you want to use 2 licenses as resource:

sbatch -L matlab:2 <job-script>.sh

      or inside the bash script:

#SBATCH -L matlab:2
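Putting the license flag into a complete job script could look like the sketch below. The module name "MATLAB", the script name matlab_job.sh and the one-line MATLAB command are assumptions for illustration; check "module avail" for the real module name on Marvin.

```shell
#!/bin/bash
# Write an example job script that reserves one MATLAB license.
# Module name and file name are assumptions, not confirmed for Marvin.
cat > matlab_job.sh <<'EOF'
#!/bin/bash
#SBATCH --partition=normal
#SBATCH --time=0:10:00
#SBATCH -L matlab:1

module load MATLAB
matlab -nodisplay -nosplash -r "disp(2+2); exit"
EOF

# Submit it on the cluster with:  sbatch matlab_job.sh
```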

 

12. Inspiration

Marvin is the ship's robot aboard the starship Heart of Gold. Originally built as one of many failed prototypes of Sirius Cybernetics Corporation's GPP (Genuine People Personalities) technology, Marvin is afflicted with severe depression and boredom, in part because he has a "brain the size of a planet" which he is seldom, if ever, given the chance to use. Indeed, the true horror of Marvin's existence is that no task he could be given would occupy even the tiniest fraction of his vast intellect. Marvin claims he is 50,000 times more intelligent than a human, (or 30 billion times more intelligent than a live mattress) though this is, if anything, an underestimation. When kidnapped by the bellicose Krikkit robots and tied to the interfaces of their intelligent war computer, Marvin simultaneously manages to plan the entire planet's military strategy, solve "all of the major mathematical, physical, chemical, biological, sociological, philosophical, etymological, meteorological and psychological problems of the Universe except his own, three times over", and compose a number of lullabies. 

https://en.wikipedia.org/wiki/Marvin_the_Paranoid_Android