AImageLab-HPC

File Systems and Data Management

Last updated: March 29, 2026


AImageLab-HPC provides two persistent storage areas - /homes and /work - plus node-local temporary storage on /tmp, each with different characteristics and intended uses. None of these areas is backed up: users are solely responsible for protecting their important data.

Important: There are no environment variables such as $HOME or $WORK on AImageLab-HPC. Always use absolute paths: /homes/<username> and /work/<project>.

Overview

Path                Filesystem   Quota                               Backup   Deleted on expiry
/homes/<username>   NFS          100 GB per user (default)           No       6 months after username expiry
/work/<project>     BeeGFS       Per project (set at provisioning)   No       Yes - immediately
/tmp (node-local)   tmpfs        Varies by node                      No       At job end

/homes - Personal Area

/homes/<username> is your personal home directory, hosted on NFS. It is intended for:

  • Personal configuration files (.bashrc, .ssh, etc.)
  • Small scripts and source code
  • Software installations that produce many small files (e.g. Python virtual environments)

Do not use /homes for large datasets or production I/O. NFS is not suited for the high-throughput parallel access patterns typical of HPC workloads. Large reads and writes from compute nodes should always go through /work.

The default quota is 100 GB. If you need more, contact the HPC Helpdesk.

/homes/<username> is retained for 6 months after your username expires, then permanently deleted.

/work - Project Area

/work/<project> is a shared project workspace hosted on BeeGFS, a high-performance parallel distributed filesystem. It is the primary area for all production data, model checkpoints, datasets, and job outputs.

Key properties:

  • Shared: all members of a project have read/write access to /work/<project>.
  • Per-project quota: set at provisioning time and visible with squota (see below).
  • No backup: data loss is permanent.
  • Deleted on expiry: when a project expires, its /work/<project> directory is deleted immediately with no grace period. Ensure you export any data you wish to keep before the project end date.

The PI is the owner of the root /work/<project> directory. Collaborators are advised to create personal subdirectories:

mkdir /work/<project>/<username>

By default, files you create are readable and writable only by you. To share files with project collaborators:

chmod 770 /work/<project>/<username>/my_shared_dir

Since /work/<project> is not accessible to users outside the project, opening permissions to the group level is safe.

/tmp - Job-local Temporary Storage

Most compute nodes are equipped with fast local storage (tmpfs). When a job starts, /tmp on the compute node is available as a private temporary area that is automatically cleared when the job ends.

Request tmpfs storage in your job script:

#SBATCH --gres=gpu:1,tmpfs:50G

A typical pattern for I/O-intensive workloads is to copy input data to /tmp at the start of the job, write outputs there, then copy results back to /work before the job ends:

# Stage input data on node-local storage
cp /work/<project>/dataset.tar /tmp/
tar -xf /tmp/dataset.tar -C /tmp/

# Run the workload against /tmp only
python train.py --data /tmp/dataset --output /tmp/checkpoints

# Copy final results back to /work before the job ends
cp -r /tmp/checkpoints /work/<project>/results/

Note: /tmp is local to each compute node. For multi-node jobs, each node has its own independent /tmp - it is not shared across nodes.

The available tmpfs capacity varies by node. Use sinfo -o "%n %G" -p all_usr_prod to inspect the gres configuration per node.

Checking Storage Quota - squota

The squota command shows your current storage usage and quota for all areas you have access to:

squota

Example output:

Filesystem               User/Project    Usage (chunks)    Quota (chunks)    % (chunks)    Usage (GB)    Quota (GB)    % (GB)
------------  -------------------------  ----------------  ----------------  ------------  ------------  ------------  --------
       /work                   ai4a2026                 0                                       0.00        100.00      0.00
       /work    baraldi_doxee_ix_studio          11045886                                    2239.66       3072.00     72.91
       /work          baraldi_doxee_pwd           2043548                                     246.20       1024.00     24.04

Column descriptions:

Column           Description
Filesystem       Storage area
User/Project     Username (for /homes) or project name (for /work)
Usage (chunks)   Number of BeeGFS storage chunks currently used
Quota (chunks)   Chunk quota, if set (usually not enforced - the GB quota applies)
% (chunks)       Percentage of the chunk quota consumed, if set
Usage (GB)       Actual storage used in gigabytes
Quota (GB)       Maximum allowed storage in gigabytes
% (GB)           Percentage of the GB quota consumed

Tip: Use squota rather than du -sh to check disk usage on /work. du traverses the filesystem and generates heavy metadata load on BeeGFS; squota reads quota counters directly and is instantaneous.

BeeGFS Best Practices

BeeGFS is a distributed parallel filesystem optimised for high-throughput sequential I/O on large files. Understanding its characteristics helps avoid performance pitfalls.

What BeeGFS is good at

  • Large sequential reads and writes (datasets, model checkpoints, video files)
  • High-bandwidth parallel access from many compute nodes simultaneously
  • Storing a moderate number of large files

What to avoid

Many small files. BeeGFS metadata performance degrades significantly when directories contain millions of small files (e.g. unzipped ImageNet, raw frames, pip-installed packages). Each file open/stat/close requires a metadata server round-trip. Symptoms include very slow ls, find, and job startup times.

Mitigations:
- Store datasets as archives (tar, zip) or in container formats (HDF5, Zarr, WebDataset, LMDB) and read them streaming.
- Install Python environments in /homes, not /work - NFS handles small files better for this workload.
- If you must have many small files in /work, organise them into subdirectories of ≤ 10,000 files each.
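The archive-based approach can be sketched with Python's standard tarfile module: read samples sequentially from a single tar file instead of opening millions of individual files. This is a minimal illustration using a throwaway archive built in a temporary directory; the file names and contents are invented for the demo.

```python
import tarfile
import tempfile
from pathlib import Path

def iter_tar_samples(archive_path):
    """Yield (name, bytes) for each regular file in a tar archive,
    streaming the archive sequentially: one file open on the
    filesystem instead of one open per sample."""
    with tarfile.open(archive_path, "r") as tar:
        for member in tar.getmembers():
            if member.isfile():
                fileobj = tar.extractfile(member)
                yield member.name, fileobj.read()

# Demo: build a small archive of fake samples, then stream it back.
workdir = Path(tempfile.mkdtemp())
archive = workdir / "dataset.tar"
with tarfile.open(archive, "w") as tar:
    for i in range(3):
        sample = workdir / f"sample_{i}.txt"
        sample.write_bytes(b"data %d" % i)
        tar.add(str(sample), arcname=sample.name)

names = [name for name, _ in iter_tar_samples(archive)]
print(names)
```

On the cluster, the archive would live in /work/<project> and be read directly, or staged to /tmp first for repeated epochs.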

Recursive metadata operations. Commands like ls -lR, find /work/<project>, and du -sh /work/<project> walk the entire directory tree and can generate enormous metadata load, slowing the filesystem for all users. Use squota for quota checks, and scope find with -maxdepth to limit traversal depth.

Frequent small random writes. Appending tiny amounts of data in a tight loop (e.g. writing one line to a log file per iteration) is inefficient on BeeGFS. Buffer output in memory and flush in larger chunks, or write logs to node-local /tmp and copy them to /work at job end.
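The buffering pattern can be sketched as follows. The class and its parameters (flush_every, the log path) are illustrative, not part of any cluster API: lines accumulate in memory and hit the filesystem only every flush_every entries.

```python
import os
import tempfile

class BufferedLogger:
    """Accumulate log lines in memory and flush them to the
    filesystem in large chunks instead of one write per iteration."""
    def __init__(self, path, flush_every=1000):
        self.path = path
        self.flush_every = flush_every
        self.lines = []

    def log(self, line):
        self.lines.append(line)
        if len(self.lines) >= self.flush_every:
            self.flush()

    def flush(self):
        if self.lines:
            with open(self.path, "a") as f:
                f.write("\n".join(self.lines) + "\n")
            self.lines = []

# Demo: 250 log calls produce only three actual writes.
path = os.path.join(tempfile.mkdtemp(), "train.log")
logger = BufferedLogger(path, flush_every=100)
for step in range(250):
    logger.log(f"step {step}")   # buffered, no I/O yet
logger.flush()                   # final flush at job end
with open(path) as f:
    n_lines = sum(1 for _ in f)
print(n_lines)
```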

Concurrent writes to the same file from multiple processes. Avoid having multiple parallel workers append to the same file simultaneously. Use a single writer process, or have each worker write to its own file and merge afterwards.
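The one-file-per-worker pattern might look like this sketch (worker count, file names, and record format are all invented for the demo): each worker owns its output file, and a single merge pass runs after all workers finish.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

def worker(outdir, rank):
    """Each worker writes only to its own file, so no file ever
    has more than one concurrent writer."""
    path = os.path.join(outdir, f"part_{rank:04d}.txt")
    with open(path, "w") as f:
        for i in range(3):
            f.write(f"worker {rank} record {i}\n")
    return path

outdir = tempfile.mkdtemp()
with ThreadPoolExecutor(max_workers=4) as pool:
    parts = list(pool.map(lambda r: worker(outdir, r), range(4)))

# Single merge step after all workers have finished
merged = os.path.join(outdir, "merged.txt")
with open(merged, "w") as out:
    for part in sorted(parts):
        with open(part) as f:
            out.write(f.read())

with open(merged) as f:
    total = sum(1 for _ in f)
print(total)
```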

Practical tips

  • Open files once per job, perform all I/O, then close - avoid repeatedly opening and closing the same file.
  • For distributed training checkpoints, write one file per process rather than having all ranks write to a single shared file.
  • Use /tmp for intermediate files that are read and written repeatedly during a job; copy only final outputs to /work.
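The last tip can be sketched in Python as well. Here temporary directories stand in for node-local /tmp and for /work/<project>/results, and the checkpoint files are placeholders: intermediates are written and rewritten locally, and only the final output directory is copied to shared storage.

```python
import os
import shutil
import tempfile

scratch = tempfile.mkdtemp()   # stands in for node-local /tmp
results = tempfile.mkdtemp()   # stands in for /work/<project>/results

# Intermediate files live on local scratch, never on /work
ckpt_dir = os.path.join(scratch, "checkpoints")
os.makedirs(ckpt_dir)
for epoch in range(3):
    with open(os.path.join(ckpt_dir, f"epoch_{epoch}.ckpt"), "w") as f:
        f.write(f"state after epoch {epoch}\n")

# Only the final outputs are copied to shared storage at job end
final = shutil.copytree(ckpt_dir, os.path.join(results, "checkpoints"))
copied = sorted(os.listdir(final))
print(copied)
```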

Data Transfer

Login node transfers (small files)

For small transfers that complete quickly, use scp, sftp, or rsync directly via the login nodes:

# Upload from local machine to cluster
scp /local/path/to/file <username>@ailb-login-02.ing.unimore.it:/work/<project>/

# Download from cluster to local machine
scp <username>@ailb-login-02.ing.unimore.it:/work/<project>/results.tar.gz /local/path/

# Sync a local directory to the cluster
rsync -avP /local/dataset/ <username>@ailb-login-02.ing.unimore.it:/work/<project>/dataset/

Login nodes enforce a CPU time limit. For large transfers, use the dedicated data mover instead.

Data mover (large transfers)

For large or long-running transfers, use the dedicated data mover node ailb-data.ing.unimore.it. It has no CPU time limit and is optimised for sustained transfer throughput.

The data mover supports scp, sftp, and rsync. You cannot open an interactive shell on it - it only accepts file transfer commands.

# Upload a large dataset
rsync -avP /local/large_dataset/ <username>@ailb-data.ing.unimore.it:/work/<project>/large_dataset/

# Download results
rsync -avP <username>@ailb-data.ing.unimore.it:/work/<project>/results/ /local/results/

# Using scp
scp -r <username>@ailb-data.ing.unimore.it:/work/<project>/outputs/ /local/outputs/

Interactive SFTP

sftp provides an interactive session for exploring and transferring files:

sftp <username>@ailb-login-02.ing.unimore.it

Useful commands inside an sftp session:

Command      Description
ls / lls     List remote / local directory
cd / lcd     Change remote / local directory
pwd / lpwd   Print remote / local working directory
get <file>   Download file from remote
put <file>   Upload file to remote
exit         Close session

Windows users

scp and rsync are available via Git Bash or Windows Subsystem for Linux (WSL). GUI alternatives include FileZilla and WinSCP.