AImageLab SRV

Maintenance changelog, 2024-07-26


Dear users,

This is to inform you that aimagelab-srv is back online and available to regular users. As this maintenance slot introduced a set of breaking changes, please read the maintenance changelog below carefully before you start using the system.

Login/frontend nodes

  • We are introducing a new enhanced login/frontend node, named ailb-login-03, with 8 NVIDIA P100 GPUs and 6.5 TB of local scratch disk space. This node also supports newer CUDA versions.
  • Regular users can no longer access ailb-login-01, which acts as the main controller of the cluster. This change improves cluster stability and resilience.
  • Older frontend DNS names (aimagelab-srv-login, aimagelab-srv-00) are now disabled.

A summary of the login/frontend nodes is reported below.

Hostname        Description
ailb-login-01   Not accessible to users.
ailb-login-02   Basic node. Supports CUDA up to 11.7.
ailb-login-03   Enhanced node. Supports newer CUDA versions, 6.5 TB local scratch.
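
For scripts that need to choose a login node, the summary above can be expressed as a small lookup. This is only a sketch: the pick_login_node helper is hypothetical, and since the changelog does not say which CUDA versions count as "newer", any version above 11.7 is simply routed to ailb-login-03. Sending older versions to the basic node is also an assumption, not a site policy.

```python
# Hypothetical helper: maps a required CUDA version to a login node,
# using only the facts listed in the summary above.

# ailb-login-02 supports CUDA up to 11.7 (from the summary).
LOGIN_02_MAX_CUDA = (11, 7)


def parse_version(version: str) -> tuple[int, ...]:
    """Turn '11.7' or '11.7.1' into (11, 7): only major.minor is compared."""
    return tuple(int(part) for part in version.split("."))[:2]


def pick_login_node(required_cuda: str) -> str:
    """Return a login node expected to support the requested CUDA version."""
    # Assumption: versions up to 11.7 go to the basic node,
    # everything newer goes to the enhanced node.
    if parse_version(required_cuda) <= LOGIN_02_MAX_CUDA:
        return "ailb-login-02"
    return "ailb-login-03"
```

ailb-login-01 never appears as a result, since it is not accessible to users.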

Data mover node

  • Our data mover node, ailb-data, is available again. Remember to use this node for heavy data transfers that do not fit within the 10-minute CPU limit of the login/frontend nodes.
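
As an illustration, a large transfer could be pointed at the data mover instead of a login node. The sketch below only builds an rsync command line and does not run it. It assumes that rsync over SSH is available on ailb-data, which the changelog does not state; the user name and both paths are placeholders.

```python
import shlex

# Hostname from the changelog; append your site's domain if it is required.
DATA_MOVER = "ailb-data"


def build_rsync_command(user: str, source: str, destination: str) -> list[str]:
    """Build (but do not run) an rsync command that copies `source` to the data mover."""
    return [
        "rsync",
        "-a",         # archive mode: recurse and preserve permissions and timestamps
        "--partial",  # keep partially transferred files so an interrupted copy can resume
        source,
        f"{user}@{DATA_MOVER}:{destination}",
    ]


if __name__ == "__main__":
    # 'jdoe' and both paths are placeholders, not real accounts or directories.
    command = build_rsync_command("jdoe", "dataset/", "/path/on/cluster/")
    print(shlex.join(command))
```

Printing the command first makes it easy to check the target host before starting a long transfer.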

Changes to /homes quota management

  • Quotas on /homes are now soft quotas with a 15-minute grace period. This should make it easier to install large Python packages.
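
The following sketch models the general idea of a soft quota with a grace period: usage may exceed the quota for a limited time before writes are refused. It is a simplified model, not the actual enforcement logic of the /homes filesystem. The write_allowed function and its parameters are hypothetical, and any additional hard limit that may exist on the system is not modelled.

```python
# 15-minute grace period, as stated in the changelog.
GRACE_PERIOD_SECONDS = 15 * 60


def write_allowed(usage_bytes: int, quota_bytes: int, seconds_over_quota: float) -> bool:
    """Simplified soft-quota check.

    Writes are allowed while usage is within the quota, or while the quota
    has been exceeded for less than the grace period.
    """
    if usage_bytes <= quota_bytes:
        return True
    # Over quota: only allowed until the grace period runs out (assumed behaviour).
    return seconds_over_quota < GRACE_PERIOD_SECONDS
```

Under this model, an installation that briefly overshoots the quota succeeds as long as usage drops back below the quota within 15 minutes.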

Software updates

  • Updated kernel version to 5.15.0-101-generic
  • Updated NVIDIA drivers to 535.183.01 (except ailb-login-02)
  • Updated BeeGFS client to version 7.4.4
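
To check whether a node runs the versions listed above, something along these lines could be used. It is a sketch with hypothetical function names. It assumes nvidia-smi is on the PATH of the node being checked, and the driver version is expected to differ on ailb-login-02.

```python
import platform
import subprocess

# Versions taken from the changelog.
EXPECTED_KERNEL = "5.15.0-101-generic"
EXPECTED_DRIVER = "535.183.01"


def kernel_is_updated(release: str = "") -> bool:
    """Compare a kernel release string (default: this machine's) with the expected one."""
    return (release or platform.release()) == EXPECTED_KERNEL


def parse_driver_versions(nvidia_smi_output: str) -> set[str]:
    """Collect the distinct driver versions from nvidia-smi's one-line-per-GPU output."""
    return {line.strip() for line in nvidia_smi_output.splitlines() if line.strip()}


def query_driver_versions() -> set[str]:
    """Ask nvidia-smi for the driver version reported by each GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,  # raise if nvidia-smi fails or no driver is loaded
    )
    return parse_driver_versions(result.stdout)


if __name__ == "__main__":
    print("kernel up to date:", kernel_is_updated())
    print("driver up to date:", query_driver_versions() == {EXPECTED_DRIVER})
```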

Published: July 28, 2024