High Performance Computing

Computing

Marvin is a High Performance Computing (HPC) cluster named after the famous robot character from the movie "The Hitchhiker's Guide to the Galaxy". The cluster is designed to provide high computing power and speed, enabling scientists, engineers, and researchers to perform complex simulations, data analysis, and modeling tasks with ease.
 
Marvin is equipped with state-of-the-art hardware and software components, including powerful CPUs, GPUs, high-speed interconnects (InfiniBand), large-capacity storage (IBM Spectrum Scale), and specialized software libraries and tools for scientific computing, managed with EasyBuild. The cluster is optimized for parallel computing, allowing users to break down complex tasks into smaller, more manageable chunks that can be processed simultaneously across multiple nodes.
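The idea of breaking a task into chunks that are processed simultaneously can be sketched in a few lines of Python. This is only an illustration of the split, process, and combine pattern: it uses a thread pool on a single machine as a stand-in for cluster nodes, and the function names (`split_into_chunks`, `sum_of_squares`, `parallel_sum_of_squares`) are hypothetical, not part of any Marvin software. On the cluster itself the chunks would typically be distributed across nodes by the job scheduler or a parallel library.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, n_chunks):
    """Split a list into n_chunks contiguous pieces of near-equal size."""
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks take one additional item each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def sum_of_squares(chunk):
    """Per-chunk work: it does not depend on any other chunk."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(items, n_workers=4):
    """Hand each chunk to a worker, then combine the partial results."""
    chunks = split_into_chunks(items, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(sum_of_squares, chunks))
    return sum(partials)


if __name__ == "__main__":
    data = list(range(1_000))
    print(parallel_sum_of_squares(data))
```

The key property is that each chunk can be processed without knowing anything about the others, so the work scales by adding workers; only the final combination step is sequential.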
 
Marvin is managed by a team of skilled administrators who ensure that the cluster runs smoothly and efficiently. They provide users with access to the cluster's resources and support them in using the cluster for their research projects. The cluster is available to users from different fields, with a focus on biology, medicine, and data science.
 
With its high-performance computing capabilities and advanced features, Marvin is a valuable tool for researchers and scientists who need to process large amounts of data and perform complex simulations and calculations. Its name pays homage to one of the most beloved characters from science fiction, and it embodies the spirit of innovation and discovery that drives scientific research.


The data center where Marvin is located, on the Ciutadella Campus

 

HOW TO USE THE MARVIN CLUSTER?

There are two ways for users to work on the Marvin cluster and access its resources. The most common option is to connect directly to the cluster from a terminal using command-line tools. Once connected, users can execute commands directly on the cluster nodes.
Another way is to run graphical user interfaces (GUIs) on the Marvin cluster. This option gives users a more interactive and visual environment.
 

Below we explain these two options and provide the corresponding tutorials for using the cluster.

 

Command-line tools

All users can interact with the Marvin cluster using command-line tools. They only need to open a terminal window on their local machine and start an SSH connection to the Marvin cluster.

Here you will find the tutorials for connecting to the cluster, depending on the user's operating system (Linux, Windows, etc.).
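Whatever the operating system, the connection comes down to a single `ssh` invocation. The sketch below only assembles and prints that command; it does not open a connection. The user name `jdoe` and the host `marvin.example.org` are placeholders invented for this example, so use the account and address given in the tutorials instead.

```python
import shlex


def build_ssh_command(user, host, port=22):
    """Return the argument list for an interactive SSH session."""
    command = ["ssh"]
    if port != 22:
        # -p selects a non-default port on the remote host.
        command += ["-p", str(port)]
    command.append(f"{user}@{host}")
    return command


if __name__ == "__main__":
    # Hypothetical values: replace them with your own user name and the
    # cluster address from the connection tutorials.
    argv = build_ssh_command("jdoe", "marvin.example.org")
    print(shlex.join(argv))
```

Typing the printed command in a terminal window starts the session; on success, the prompt that follows belongs to the cluster rather than to the local machine.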

 

Running Jupyter in Marvin

This option is for all users who are working with Python.
 

Click here to access the tutorial on running Jupyter in Marvin

 

 

Running RStudio in Marvin

This option is for all users who are working with R.
 

 

Click here to access the tutorial on running RStudio in Marvin

 

Remote Desktop (GUI) in Marvin 

Users can work on the Marvin cluster through a remote desktop. This option is for users who work with graphical software (Fiji, Cellpose) and large files (e.g. large multidimensional microscopy images). By following the steps in the tutorial, users will be able to submit a job to a Marvin node with the resources they need and connect to it graphically.

 

 

Click here to access the tutorial on Remote Desktop in Marvin.


 

Data Storage

The storage of the HPC cluster is managed by IBM Spectrum Scale version 5.1.0, which is a high-performance parallel file system designed to handle large volumes of data with high throughput and low latency. The storage is split into two filesystems: projects and robbyfs.

The robbyfs filesystem includes two partitions named scratch and homes. Scratch has a storage capacity of 297TB, while homes has a storage capacity of 8TB. The projects filesystem has a storage capacity of 517TB. The scratch partition is typically used for temporary data storage during computation, while the homes partition is used to store user home directories and related files.
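The capacities above can be tallied with a short script. This is a minimal sketch that assumes the listed partitions account for the whole of each filesystem and that the figures are nominal capacities; the dictionary layout and function names are made up for the example.

```python
# Capacities in TB, taken from the description above.
CAPACITY_TB = {
    "robbyfs": {"scratch": 297, "homes": 8},
    "projects": {"projects": 517},
}


def filesystem_totals(capacity):
    """Sum the partition capacities of each filesystem."""
    return {fs: sum(parts.values()) for fs, parts in capacity.items()}


def total_capacity(capacity):
    """Total capacity across all filesystems, in TB."""
    return sum(filesystem_totals(capacity).values())


if __name__ == "__main__":
    for fs, tb in filesystem_totals(CAPACITY_TB).items():
        print(f"{fs}: {tb} TB")
    print(f"total: {total_capacity(CAPACITY_TB)} TB")
```

Under these assumptions robbyfs adds up to 305 TB and the two filesystems together to 822 TB.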

The system uses SATA disks for data storage, which are slower but offer high capacity and are cost-effective for storing large volumes of data. For metadata, in contrast, the system uses SSD disks, which are faster and provide low-latency access.

The storage and compute nodes are interconnected by an InfiniBand EDR network, a high-speed interconnect designed for HPC applications. This network offers low-latency, high-bandwidth communication between the nodes, enabling fast data transfers and parallel processing.
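To give a feel for what this bandwidth means, the following sketch estimates a lower bound on the time needed to move data over one link. It assumes a standard 4x EDR link with a nominal rate of 100 Gbit/s and decimal terabytes; the link width actually used in Marvin is not stated here, and real transfers will be slower because of protocol overhead and disk throughput.

```python
# Assumption: one 4x InfiniBand EDR link, nominal 100 Gbit/s.
EDR_4X_GBIT_PER_S = 100


def min_transfer_seconds(size_tb, link_gbit_per_s=EDR_4X_GBIT_PER_S):
    """Best-case time to move size_tb terabytes (1 TB = 1e12 bytes)."""
    bits = size_tb * 1e12 * 8
    return bits / (link_gbit_per_s * 1e9)


if __name__ == "__main__":
    for size_tb in (1, 10):
        seconds = min_transfer_seconds(size_tb)
        print(f"{size_tb} TB: at least {seconds:.0f} s")
```

At the nominal rate, one terabyte needs at least 80 seconds, so this figure is best read as a floor for planning large transfers rather than an expected duration.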

Overall, the storage system of the HPC cluster is designed to handle large volumes of data with high throughput and low latency, making it suitable for high-performance computing applications.

 

Bioinformatics

The SCC, together with the Genomics Core Facility, also offers a comprehensive Bioinformatics service, utilizing advanced computational techniques and algorithms to provide data analysis and interpretation. We specialize in extracting valuable insights from complex biological data, covering genomics, proteomics, and other omics fields.
 
Our service encompasses a wide range of techniques, including data visualization, statistical analysis, and data integration. We provide tailored solutions to help researchers effectively interpret and present their findings. Additionally, we develop custom pipelines and workflows to streamline data processing, ensuring efficient and reproducible analyses. We also assist users in the development of their own pipelines and workflows, offering guidance and supervision whenever needed.
 
Our Bioinformatics service caters to various research areas, such as analyzing gene expression patterns, identifying genetic variations, studying microbial communities, and exploring evolutionary relationships. We work closely with researchers to understand their objectives and deliver scientifically rigorous results that drive impactful discoveries. By collaborating with us, researchers can unlock the full potential of their data and gain valuable insights into the intricacies of life sciences.