UPF Scientific Computing Core Facility

Research Storage

Store, process and manage research data close to Correfoc computing resources.

Research Storage is built on GPFS, the shared parallel filesystem used by Correfoc compute resources. It provides separate spaces for active computation, shared project data and personal user environments. Two planned services will add institutional backup for research data stored outside Correfoc and a GPFS archive filesystem for cold datasets moved to tape.

Storage at a glance

Filesystem: GPFS, Correfoc's shared parallel filesystem
Capacity: 3 PB raw storage capacity
scratch: Temporary computation
projects: Shared research data
homes: Personal user files
Tape backup: For projects and homes
Coming soon: Backup for external research data
Coming soon: GPFS archive filesystem for cold datasets

Choose the right storage area

Keep research data close to compute, but match each space to its purpose: scratch for temporary computation, projects for shared research data and homes for personal user files.

scratch

Temporary high-performance space for active computations, intermediate files and job outputs that can be regenerated.

Use for

  • Temporary job files
  • Intermediate results
  • High-I/O computations
  • Data that can be regenerated

Backup: Not backed up. Do not keep the only copy of important data in scratch.

projects

Shared project space for research groups, datasets, workflows and relevant results.

Use for

  • Shared research data
  • Project datasets
  • Important outputs
  • Group workflows
  • Files available to project members

Backup: Backed up to tape.

homes

Personal user space for configuration, small scripts and lightweight files.

Use for

  • Shell configuration
  • Personal scripts
  • Small working files
  • User settings

Backup: Backed up to tape.

Home directories should not be used as bulk project storage.

Coming soon: backup and GPFS archive services

Two planned additions will cover data that does not belong in the active GPFS storage areas.

Coming soon

Backup for research data outside the cluster

Additional institutional backup space for research data not stored on Correfoc, such as data on lab servers, workstations or other institutional systems.

Availability, conditions and technical procedures will be defined before the service is opened.

Coming soon

GPFS archive filesystem for cold scientific data

A dedicated archive filesystem within GPFS for datasets that must be preserved but are no longer active in daily computation.

Archive is for cold data, not active working storage. Because archived data is moved to tape, it may take longer to access.

Storage workflow

  • Active compute data: compute jobs use scratch (temporary computation) and projects (shared research data); projects are backed up to tape.
  • Personal environment: user settings and scripts live in homes, which are backed up to tape.
  • Coming soon: external research data will be covered by the additional backup service, and cold scientific datasets will go to the GPFS archive filesystem, moved to tape.

Compute close to your data

Correfoc CPU and GPU resources are connected to GPFS, the cluster's shared filesystem, allowing jobs, notebooks, IDEs and graphical applications to work close to research data. This reduces unnecessary transfers between laptops, lab servers and compute nodes.

  • HPC jobs can read and write data directly from GPFS spaces.
  • Interactive environments such as notebooks, IDEs and graphical applications can access the same data.
  • Large datasets should not be repeatedly copied to personal computers unless needed.

Storage good practices

  • Use scratch for temporary computation, then move important results to projects.
  • Use projects for shared group data, workflows and relevant outputs.
  • Keep homes for configuration, scripts and lightweight personal files.
  • Avoid millions of small files and unnecessary copies of large datasets.
  • Do not use personal computers as the main storage location for large research datasets.
  • Once the GPFS archive filesystem is available, use it only for inactive data that must be preserved on tape.
  • Contact SCC for unusual I/O, very large file counts or backup/archive planning.
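Two of the practices above, moving important results out of scratch and avoiding very large file counts, can be combined by bundling a results directory into a single compressed archive before storing it in projects. The following is a minimal sketch using only the Python standard library; the function names `pack_results` and `count_files` are hypothetical illustrations, not an SCC-provided tool, and it assumes a plain Python 3 environment on Correfoc.

```python
import tarfile
from pathlib import Path


def pack_results(src_dir: str, dest_dir: str, archive_name: str) -> Path:
    """Bundle everything under src_dir into one .tar.gz file in dest_dir.

    Hypothetical helper: one archive replaces many small files on the
    destination filesystem. It does NOT delete the source directory.
    """
    src = Path(src_dir)
    if not src.is_dir():
        raise NotADirectoryError(f"{src} is not a directory")
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    archive_path = dest / f"{archive_name}.tar.gz"
    # Write under a temporary name first so an interrupted run never
    # leaves a truncated archive under the final name.
    tmp_path = archive_path.with_name(archive_path.name + ".part")
    with tarfile.open(str(tmp_path), "w:gz") as tar:
        # Store paths relative to the results directory's own name.
        tar.add(str(src), arcname=src.name)
    tmp_path.replace(archive_path)
    return archive_path


def count_files(directory: str) -> int:
    """Count regular files below directory, e.g. to spot very large file counts."""
    return sum(1 for p in Path(directory).rglob("*") if p.is_file())
```

For example, `pack_results(src, dest, "run_042")` with `src` set to a results directory in your scratch space and `dest` set to a directory in your group's projects space; both paths are placeholders, since actual locations depend on your account and project. Check that the archive opens correctly before removing the scratch copy, and contact SCC if a single archive would be very large.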

Backup and preservation overview

Use this table as the quick status view for backup and preservation.

Storage or service | Main purpose | Status
scratch | Temporary computation | Not backed up (temporary)
projects | Shared project data | Backed up to tape
homes | Personal user files | Backed up to tape
Additional backup service | Backup for research data outside Correfoc | Coming soon
GPFS archive filesystem | Preservation of cold scientific datasets moved to tape | Coming soon

Not sure where your data should go?

If your project uses large datasets, produces many files, stores important data outside the cluster or has long-term preservation needs, contact the SCC team before designing the workflow.