AImageLab-HPC

SAI - Quick Start Guide

Last updated: April 21, 2026


This guide is for students enrolled in the Scalable AI (SAI 2026) course. It walks you through logging into the cluster, setting up your PyTorch environment, and running a Jupyter Notebook or a VSCode remote session on a compute node.

Step 1 - Log into the Cluster

Connect via SSH to one of the login nodes:

ssh <username>@ailb-login-02.ing.unimore.it

or

ssh <username>@ailb-login-03.ing.unimore.it

Replace <username> with your AImageLab-HPC username, which is different from your UNIMORE username (check your welcome email). You will be prompted for your UNIMORE password.

First login: If this is your first time connecting, your SSH client will ask you to confirm the host fingerprint. Type yes and press Enter.

If you need to connect from outside the UNIMORE network, copy your SSH public key to the cluster first:

ssh-copy-id <username>@ailb-login-02.ing.unimore.it
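To avoid retyping the full hostname on every connection, an entry in the ~/.ssh/config file on your local machine can help. The sketch below only prints such an entry. The alias ailb and the function name print_ssh_config are illustrative assumptions and are not part of the cluster setup.

```shell
# Print an ~/.ssh/config entry for the login node.
# "ailb" is a hypothetical alias; pick any name you like.
# $1 = your AImageLab-HPC username
print_ssh_config() {
    cat <<EOF
Host ailb
    HostName ailb-login-02.ing.unimore.it
    User $1
EOF
}
```

Appending the output to ~/.ssh/config (print_ssh_config followed by your username, redirected with >>) would then let you connect with ssh ailb.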

Step 2 - Set Up Your PyTorch Environment

AImageLab-HPC provides an optimised PyTorch module. Use it instead of installing PyTorch with pip, so that you get a build compiled for the cluster's CUDA stack.

Load the module and create a virtual environment that inherits it:

module purge
module load py-torch/2.8.0-gcc-11.4.0-cuda-12.6.3
mkdir -p /homes/<username>/sai2026
python -m venv --system-site-packages /homes/<username>/sai2026/venv

Activate the environment and install JupyterLab and any additional packages:

source /homes/<username>/sai2026/venv/bin/activate
pip install jupyterlab

Why --system-site-packages? This flag makes the virtual environment inherit the packages exposed by the loaded module (including torch, torchvision, and their dependencies) so you do not need to reinstall them. Packages you install with pip inside the venv take precedence over the module ones.
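If you are unsure whether an existing environment was created with this flag, you can inspect the pyvenv.cfg file that venv writes into the environment directory. The helper below is a minimal sketch; the function name venv_inherits_site_packages is hypothetical.

```shell
# Return 0 if the venv at "$1" was created with --system-site-packages.
# venv records this choice in <venv>/pyvenv.cfg.
venv_inherits_site_packages() {
    grep -Eq '^include-system-site-packages *= *true' "$1/pyvenv.cfg"
}
```

For example, calling it on /homes/<username>/sai2026/venv and chaining && echo "inherits module packages" prints the message only when the flag was used.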

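Verify that PyTorch imports correctly and can see a GPU (the second line of output is True only when a GPU is visible to the current session):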

python -c "import torch; print(torch.__version__); print(torch.cuda.is_available())"

Step 3 - Run a Jupyter Notebook on a Compute Node

Running Jupyter directly on a login node is not allowed. You must submit a SLURM job that starts JupyterLab on a compute node, then connect to it via an SSH tunnel.

3a - Write the Job Script

Create a file called jupyter_sai.sh in your home directory:

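#!/bin/bash
# Resources: 2 CPU cores, 16 GB of RAM, 4-hour walltime, under the course account
#SBATCH --job-name=jupyter_sai
#SBATCH --partition=all_serial
#SBATCH --cpus-per-task=2
#SBATCH --mem=16G
#SBATCH --time=04:00:00
# In the log path, %u expands to your username and %j to the job ID
#SBATCH --output=/homes/%u/sai2026/jupyter_%j.out
#SBATCH --account=sai2026

# Recreate the environment from Step 2
module purge
module load py-torch/2.8.0-gcc-11.4.0-cuda-12.6.3
source /homes/$USER/sai2026/venv/bin/activate

# Listen on all interfaces so the SSH tunnel through the login node can reach the server
jupyter lab --no-browser --ip=0.0.0.0 --port=8888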

3b - Submit the Job

sbatch jupyter_sai.sh

Wait a few seconds, then check that the job is running:

squeue --me

Note the node name shown in the NODELIST column (e.g. nico).
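Instead of re-running squeue by hand, you could poll until the job starts. The following is a minimal sketch that assumes the standard squeue options -h, -j and -o are available; the function name wait_for_node and the 5-second interval are arbitrary choices.

```shell
# Poll squeue until job "$1" is RUNNING, then print its node name.
# Returns non-zero if the job is no longer in the queue.
wait_for_node() {
    job_id="$1"
    while :; do
        # %T = job state, %N = allocated node list; -h drops the header
        line=$(squeue -h -j "$job_id" -o '%T %N' 2>/dev/null)
        [ -n "$line" ] || return 1
        case "$line" in
            RUNNING\ *) echo "${line#RUNNING }"; return 0 ;;
        esac
        sleep 5
    done
}
```

One way to use it is to capture the job ID at submission time with job_id=$(sbatch --parsable jupyter_sai.sh) and then run node=$(wait_for_node "$job_id").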

3c - Find the Connection Token

Once the job is running, the log file will contain the JupyterLab URL and token:

tail -f /homes/<username>/sai2026/jupyter_<job_id>.out

Look for a line like the following, then press Ctrl+C to stop following the log:

http://nico:8888/lab?token=abc123...

Note down the node name and the token.
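The node, port and token can also be pulled out of the log with standard text tools. The sketch below assumes the URL has the form shown above; the function name jupyter_endpoint is hypothetical. Reading the port from the log is slightly safer than assuming 8888, since Jupyter may fall back to another port if the requested one is busy on the node.

```shell
# Print "<node> <port> <token>" for the first JupyterLab URL in log file "$1"
# that does not point at 127.0.0.1 or localhost.
jupyter_endpoint() {
    grep -Eo 'https?://[^/ ]+:[0-9]+/lab\?token=[0-9a-zA-Z]+' "$1" \
        | grep -Ev '//(127\.0\.0\.1|localhost):' \
        | head -n 1 \
        | sed -E 's#https?://([^:/]+):([0-9]+)/lab\?token=(.*)#\1 \2 \3#'
}
```

Called on /homes/<username>/sai2026/jupyter_<job_id>.out, it prints the three values on one line, separated by spaces.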

3d - Open the SSH Tunnel

From your local machine (not the cluster), open a new terminal and run:

ssh -L 8888:<node>:8888 <username>@ailb-login-02.ing.unimore.it

Replace <node> with the node name from the previous step. Keep this terminal open for the duration of your session.

If port 8888 is already in use on your machine, use a different local port (e.g. 8889):

ssh -L 8889:<node>:8888 <username>@ailb-login-02.ing.unimore.it
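To find out whether a local port is free before opening the tunnel, one option on a machine with bash is to try connecting to it. This is a rough sketch: it only detects listeners on 127.0.0.1, it relies on bash's /dev/tcp feature, and both function names are hypothetical.

```shell
# Print the first local port >= "$1" that nothing is listening on.
# Uses bash's /dev/tcp pseudo-device, so run it with bash, not sh.
first_free_port() {
    port="$1"
    # A successful connection means something already listens on the port
    while (exec 3<>"/dev/tcp/127.0.0.1/$port") 2>/dev/null; do
        port=$((port + 1))
    done
    echo "$port"
}

# Print the ssh command that forwards a local port to the compute node.
# $1 = username, $2 = node, $3 = local port, $4 = remote port
tunnel_command() {
    echo "ssh -L $3:$2:$4 $1@ailb-login-02.ing.unimore.it"
}
```

For instance, tunnel_command with your username, the node name, "$(first_free_port 8888)" and 8888 prints a command you can copy into a terminal.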

3e - Open JupyterLab in Your Browser

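Navigate to the following address (if you chose a different local port for the tunnel, such as 8889, use it in place of 8888):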

http://localhost:8888

When prompted, paste the token from the log output. You can also paste the full URL from the log, replacing the node hostname with localhost:

http://localhost:8888/lab?token=abc123...

You are now connected to a JupyterLab session running on a compute node under the sai2026 account.

Step 4 - Remote Debugging with VSCode (Optional)

If you prefer to work interactively with VSCode rather than Jupyter, the cluster supports remote debugging via the SSH extension. This lets you edit code and launch GPU-accelerated debug sessions directly from your local VSCode installation.

Full instructions are available in the Remote debugging via Desktop IDE guide. Use the sai2026 SLURM account when submitting your interactive job.

Stopping the Session

When finished, shut down JupyterLab from the browser (File → Shut Down) or cancel the job from the cluster:

scancel <job_id>

Closing the browser tab alone does not release the compute node. The SLURM job keeps running until it is cancelled or the walltime expires.
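If you have lost track of the job ID, you can look up or cancel jobs by name instead. The sketch below assumes the job name jupyter_sai from the job script in Step 3a; the function names are hypothetical.

```shell
# List your jobs named jupyter_sai: job ID, state, elapsed time, node.
list_jupyter_jobs() {
    squeue --me -h --name=jupyter_sai -o '%i %T %M %N'
}

# Cancel every job of the current user whose name is jupyter_sai.
cancel_jupyter_jobs() {
    scancel --user="$(id -un)" --name=jupyter_sai
}
```

Running list_jupyter_jobs first lets you confirm what would be affected before calling cancel_jupyter_jobs.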