Data storage and FileSystems

Data Storage architecture

The available storage areas can be

  • temporary (data are deleted after a given period);
  • permanent (data are never deleted, or are deleted only a few months after the "end" of the project);

they can also be

  • user specific (each username has a different data area);
  • project specific (accessible by all users within the same project).

Important: It is your responsibility to back up your important data.

/homes: permanent and user specific

/homes/<username> is the area where you are placed after login. It is where system and user applications store their dot-files and dot-directories (.nwchemrc, .ssh, ...) and where users keep system-specific initialization files (.cshrc, .profile, ...). There is a home area for each username on the machine.

This area is intended for programs and small personal data. It has a quota of 100 GB. Files are never deleted from this area while the username is active: file retention is tied to the life of the username, and data are preserved as long as the username remains active.
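To keep an eye on the 100 GB quota, a rough usage check can be scripted. The following is a minimal sketch, not a site-provided tool: the function name usage_report is hypothetical, it assumes 1 GB = 1024 MB, and the figure reported by du may differ from the way the filesystem accounts for quota.

```shell
#!/bin/bash
# Report how much of a quota a directory uses.
# Arguments: directory, quota in GB (defaults to 100, the /homes quota).
# Assumption: 1 GB = 1024 * 1024 KB; the site's quota accounting may differ.
usage_report() {
    local dir="$1" quota_gb="${2:-100}"
    local used_kb quota_kb
    # du -sk prints the total size of the directory tree in kilobytes.
    used_kb=$(du -sk "$dir" | awk '{print $1}')
    quota_kb=$((quota_gb * 1024 * 1024))
    echo "$dir: $((used_kb / 1024)) MB used of ${quota_gb} GB ($((100 * used_kb / quota_kb))%)"
}
```

For example, usage_report "$HOME" prints the usage of your home area against the default 100 GB. On a large directory tree, du can take a while to complete.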

/work: permanent and project specific

/work is a scratch area for collaborative work within a given project. File retention is tied to the life of the project: files in /work are kept for up to 6 months after the project ends and are then deleted. Please note that this area is not backed up.

This area is intended for hosting large working data files, since it offers the high bandwidth of a parallel file system. It performs very well when I/O accesses large blocks of data, but it is not well suited to frequent, small I/O operations. This is the main area for keeping scratch files produced by batch processing.

There is one /work area for each active project on the machine. The default quota is 100 GB per project, but the Help Desk can consider extensions if the request is justified. The PI of the project and all collaborators are allowed to read and write there. Collaborators are advised to create a personal directory in /work/<project> for storing their personal files.
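Creating the personal directory suggested above could look like the following minimal sketch. The function name make_personal_dir and the project path in the usage note are hypothetical, and the 750 permission mode is only an assumption (owner has full access, group members can read); choose whatever your project has agreed on.

```shell
#!/bin/bash
# Create a personal directory named after the current user inside a
# project area and print its path.
# Argument: the project root, e.g. the /work directory of your project.
make_personal_dir() {
    local project_root="$1"
    local user_dir
    user_dir="$project_root/${USER:-$(id -un)}"
    mkdir -p "$user_dir"
    # Assumption: owner read/write/execute, group read/execute, others none.
    chmod 750 "$user_dir"
    echo "$user_dir"
}
```

For a project area named /work/my_project (a placeholder), you would call make_personal_dir /work/my_project once and then keep your own files under the printed path.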

/tmp: temporary and user specific

Each compute node is equipped with local storage whose size differs depending on the node.

When a job starts, a temporary area is defined on the storage local to each compute node:

/scratch_local/slurm_job.$SLURM_JOB_ID

which can be used exclusively by the job's owner. During your jobs, you can access this area as /tmp. In your sbatch script, for example, you can copy the input data of your simulations to /tmp before the run begins and also write your results to /tmp. This can improve the I/O speed of your code.

However, the directory is removed when the job ends, so always remember to copy the data stored in this area to a permanent directory at the end of the run in your sbatch script. Please note that the area is located on local disks, so it can be accessed only by processes running on that specific node. For multinode jobs, if all processes need access to some data, please use the shared filesystems.
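The stage-in and stage-out pattern described above can be sketched as a single-node sbatch script. This is only an illustration under stated assumptions: the project path /work/my_project, the run01 directory layout, and the my_simulation executable with its options are all hypothetical placeholders, and the helper copy_dir is not a system command.

```shell
#!/bin/bash
#SBATCH --job-name=local_tmp_demo
#SBATCH --nodes=1
#SBATCH --time=00:30:00

# Copy the contents of one directory into another, creating the
# destination if needed. Used for both stage-in and stage-out.
copy_dir() {
    local src="$1" dst="$2"
    mkdir -p "$dst"
    cp -r "$src"/. "$dst"/
}

# Run the workflow only inside a Slurm job, where /tmp is the
# node-local area that is removed when the job ends.
if [ -n "${SLURM_JOB_ID:-}" ]; then
    # Hypothetical locations: replace with your own project and run directory.
    WORK_DIR="/work/my_project/$USER/run01"
    LOCAL_DIR="/tmp"

    # Stage in: copy input data to the node-local area before the run.
    copy_dir "$WORK_DIR/input" "$LOCAL_DIR/input"
    mkdir -p "$LOCAL_DIR/output"

    # Hypothetical application and options: it reads and writes on local disk.
    "$HOME/bin/my_simulation" --input "$LOCAL_DIR/input" --output "$LOCAL_DIR/output"

    # Stage out: copy results to the permanent area before the job ends.
    # This step runs even if the application exits with an error.
    copy_dir "$LOCAL_DIR/output" "$WORK_DIR/output"
fi
```

The script requests a single node because /tmp is visible only to processes on the node that owns it. A multinode job would need the shared filesystems for any data that all processes must read.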