AImageLab SRV

SLURM Features

AImageLab-SRV is a heterogeneous computing environment designed to accommodate a wide variety of computational workloads. Due to the diverse range of hardware available, it is essential for users to understand how to effectively utilize node features to maximize resource efficiency. This document outlines the available SLURM node features in the AImageLab-SRV cluster and provides guidance on how to properly request specific hardware configurations when submitting jobs.

Node Features

Node features describe the specific capabilities of each node, particularly in terms of GPU model and the amount of available VRAM. These features can be specified when submitting a job using the --constraint directive in SLURM. This allows users to target nodes with the appropriate hardware for their computational needs.
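
For example, to restrict a job to A40 nodes, the corresponding feature from the list below can be passed to --constraint directly in the batch script:

#SBATCH --constraint="gpu_A40_45G"

Multiple features can be combined with the | (OR) operator, as shown in the examples later in this document.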

Supported GPU Features

The current list of supported GPU node features is as follows:

  • gpu_1080_11G
  • gpu_2080_11G
  • gpu_A40_45G
  • gpu_K80_12G
  • gpu_L40S_45G
  • gpu_P100_16G
  • gpu_RTX5000_16G
  • gpu_RTX6000_24G
  • gpu_RTX_A5000_24G

Each feature is named according to the GPU model and the amount of VRAM it possesses. For example, the feature gpu_RTX6000_24G corresponds to nodes equipped with an NVIDIA RTX 6000 GPU that has 24 GB of VRAM.

Cluster Partitions

AImageLab-SRV provides three main partitions, each with different purposes and GPU availability:

Partition         Purpose                 Available GPU Types
all_serial        Debug                   gpu_P100_16G, gpu_K80_12G
all_usr_prod      Production              All GPU types except P100 and K80
boost_usr_prod    High-VRAM production    Only GPUs with VRAM > 24 GB
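
As a sketch of how a partition and a feature constraint combine in a batch script (the --gres line is an illustrative assumption using standard SLURM syntax, not a cluster-specific requirement), a high-VRAM job could target boost_usr_prod together with the 45 GB features:

#SBATCH --partition=boost_usr_prod
#SBATCH --gres=gpu:1                               # illustrative: request one GPU
#SBATCH --constraint="gpu_A40_45G|gpu_L40S_45G"    # only GPUs with more than 24 GB of VRAM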

Submitting Jobs with Node Features

When submitting jobs on the AImageLab-SRV cluster, users can specify desired node features using the --constraint directive in their SLURM job script. This allows users to ensure that their jobs are allocated to nodes with the necessary GPU resources.

💡 Example: Requesting GPUs with ≥ 24 GB of VRAM

If your job requires a GPU with at least 24 GB of VRAM, you can specify the following constraint in your SLURM job script:

#SBATCH --constraint="gpu_RTX6000_24G|gpu_RTX_A5000_24G|gpu_A40_45G|gpu_L40S_45G"

This expression requests nodes equipped with NVIDIA RTX 6000 (24 GB), RTX A5000 (24 GB), A40 (45 GB), or L40S (45 GB) GPUs, each of which provides at least 24 GB of VRAM.
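
Putting it together, a minimal batch script using this constraint might look like the sketch below; the job name, time limit, --gres line, and the final command are placeholders rather than cluster-specific requirements:

#!/bin/bash
#SBATCH --job-name=my_job                  # placeholder job name
#SBATCH --partition=all_usr_prod           # production partition (see table above)
#SBATCH --gres=gpu:1                       # illustrative: request one GPU
#SBATCH --time=01:00:00                    # illustrative time limit
#SBATCH --constraint="gpu_RTX6000_24G|gpu_RTX_A5000_24G|gpu_A40_45G|gpu_L40S_45G"

srun python train.py                       # placeholder command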

Guidelines for Specifying Constraints

To optimize the use of cluster resources and to avoid unnecessary job delays, users should adhere to the following guidelines when specifying node features:

Allow Higher VRAM GPUs 🟢: Always allow nodes with GPUs that have more VRAM than the minimum required by your job. For example, if your script requires 24 GB of VRAM, it is advisable to include nodes with 45 GB VRAM GPUs in your constraints.

Include Compatible Lower VRAM GPUs 🔄: When your job has a specific VRAM requirement, do not restrict it to the largest GPUs; include every GPU whose VRAM meets or exceeds that requirement. For instance, if your job needs 9 GB of VRAM, you must allow nodes with GPUs having 11 GB, 12 GB, 16 GB, 24 GB, or 45 GB of VRAM.
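
Following the 9 GB example above, a compliant constraint would include every feature with at least 11 GB of VRAM, which here means the full feature list (note that the K80 and P100 features only match nodes in the all_serial partition, per the table above):

#SBATCH --constraint="gpu_1080_11G|gpu_2080_11G|gpu_K80_12G|gpu_P100_16G|gpu_RTX5000_16G|gpu_RTX6000_24G|gpu_RTX_A5000_24G|gpu_A40_45G|gpu_L40S_45G"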

🚨 Penalties for Non-Compliance

⚠️ Attention: Users who do not comply with the guidelines for specifying node constraints will be logged. Continued non-compliance may result in the user being temporarily blocked from submitting jobs on the cluster. This measure is in place to ensure fair and efficient use of the AImageLab-SRV resources.

Last updated: November 29, 2025