SLURM Partitions

Partitions are work queues, each with a set of rules/policies and a set of compute nodes on which jobs run.

A list of the partitions defined on the cluster, with their access rights and resource definitions, can be displayed with the command <code>sinfo</code>:

 > sinfo -o "%10D %20F %P"

This format produces a more readable output which shows, for each partition (<code>%P</code>), the total number of nodes (<code>%D</code>) and the number of nodes by state (<code>%F</code>) in the format "Allocated/Idle/Other/Total".
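The format string can be extended with other standard <code>sinfo</code> specifiers; as a sketch, the following variant also prints each partition's availability and time limit next to the node counts:
 > sinfo -o "%10P %.5a %.12l %10D %20F"
Here <code>%a</code> is the partition's availability (up/down) and <code>%l</code> its maximum time limit; <code>%D</code> and <code>%F</code> are the node counts as above.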

In the following table you can find the main features and limits imposed on the partitions.

Note: '''cpu''' refers to a logical CPU (1 HT, i.e. one hyper-thread).
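The limits actually configured for a partition can also be queried directly with <code>scontrol</code> (a standard SLURM command; substitute any partition name from the table below):
 > scontrol show partition prod
The output reports, among other fields, the partition's time limits (such as <code>MaxTime</code>) and the QOS values it accepts.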

=== Partition Table ===
{| class="wikitable"
|-
| '''SLURM partition'''
| '''Job QOS'''
| '''# cores / # GPUs per job'''
| '''max walltime'''
| '''max running jobs per user /<br>max n. of cores/nodes/GPUs per user'''
| '''priority'''
| '''notes'''
|-
| dev
| normal
| max = 8 CPUs
| 04:00:00
| 4 GPU<br>max mem = 40GB
| 40
|
|-
| students-dev
| normal
| max = 8 CPUs
| 02:00:00
| 4 GPU per account<br>max mem = 20GB
| 20
|
|-
| rowspan="4" | prod
| normal
|
| 24:00:00
| max 25 jobs per user<br>14 GPU
| 10
|
|-
| special
|
| > 24:00:00
| max 25 jobs per user<br>128 CPUs/600 GB/0 GPUs
| 10
| reserved for non-interruptible jobs. Request to aimagelab-srv-support@unimore.it
|-
| special-dbg
|
| 04:00:00
| max 25 jobs per user<br>32 CPUs/128 GB/0 GPUs
| 40
| reserved for debugging jobs with QOS <code>special</code>. Request to aimagelab-srv-support@unimore.it
|-
| lowprio
|
| 24:00:00
| max 25 jobs per user<br>14 GPU
| 5
| active projects/users with exhausted budget. Request to aimagelab-srv-support@unimore.it
|-
| students-prod
| normal
|
| 24:00:00
| 4 GPU per account
| 1
| runs on a subset of 12 GPUs
|}
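As a worked example (a minimal sketch, not an official template: the job name, resource numbers and <code>train.py</code> are placeholders to adapt), a batch script targeting the <code>prod</code> partition within the limits above could look like this:
 #!/bin/bash
 #SBATCH --job-name=example
 #SBATCH --partition=prod
 #SBATCH --qos=normal
 #SBATCH --gres=gpu:1        # GPUs for this job; stay within the per-user GPU limit
 #SBATCH --cpus-per-task=4   # logical CPUs (see the note on HT above)
 #SBATCH --mem=16G
 #SBATCH --time=24:00:00     # must not exceed the partition's max walltime
 srun python train.py        # placeholder payload
For a quick interactive session the <code>dev</code> partition serves the same purpose, e.g. <code>srun --partition=dev --qos=normal --gres=gpu:1 --pty bash</code> (again, adjust to the access rights of your account).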