
Abstract

Deep Learning has revolutionized AI, driving advances in computer vision, natural language processing, and multimodal learning. However, training and deploying large-scale AI models poses substantial computational challenges. As models grow in size and complexity, efficiency in both training and inference becomes crucial for scalability, accessibility, and sustainability. This tutorial, organized as part of the MINERVA EU project, will provide a comprehensive overview of how to leverage high-performance computing (HPC) resources, covering distributed training strategies, optimization techniques, and resource management for large-scale deep learning. We will discuss best practices for exploiting multi-GPU and multi-node architectures, memory-efficient training, and mixed-precision techniques to accelerate AI workloads. Special focus will be given to the latest advances in Transformer-based models, their computational impact, and strategies to make them more efficient. Participants will gain hands-on experience with state-of-the-art frameworks, exploring practical approaches for optimizing AI training and inference on HPC clusters. The tutorial will also address energy efficiency, discussing techniques for reducing AI's carbon footprint while maintaining model performance. Attendees will leave with a deeper understanding of how to design, train, and deploy AI models efficiently, making high-performance computing more accessible to the research community.

Description

Deep Learning (DL) has revolutionized Artificial Intelligence (AI), enabling state-of-the-art advancements in computer vision, natural language processing, and multimodal learning. However, the computational cost of training and deploying large-scale AI models is immense, requiring optimized hardware utilization, parallelization strategies, and efficient resource scheduling to achieve scalability. This tutorial will explore the computational aspects of large-scale AI, with a focus on optimizing deep learning training and inference on high-performance computing (HPC) infrastructures.
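
To give a concrete flavor of what resource scheduling looks like in practice, a typical SLURM batch script for a multi-node, multi-GPU training job might look like the sketch below. The directive names are standard Slurm options, but the partition, account, and script names are placeholders that vary per cluster:

```shell
#!/bin/bash
#SBATCH --job-name=ddp-train
#SBATCH --nodes=2                    # two compute nodes
#SBATCH --ntasks-per-node=4          # one task (process) per GPU
#SBATCH --gpus-per-node=4
#SBATCH --cpus-per-task=8            # CPU cores for data loading
#SBATCH --time=04:00:00
#SBATCH --partition=<gpu_partition>  # placeholder: cluster-specific
#SBATCH --account=<your_project>     # placeholder: your allocation

# Launch one process per GPU across all allocated nodes.
srun python train.py --epochs 10
```

Submitting with `sbatch` places the job in the queue; the scheduler then allocates nodes, GPUs, and wall time according to the requested resources and the site's queueing policies.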

The tutorial will provide a hands-on introduction to the fundamental principles and practical techniques of high-performance AI computing, including distributed training strategies, GPU and multi-node acceleration, memory-efficient model optimization, and inference acceleration. We will cover best practices for leveraging HPC environments, including supercomputing clusters and cloud-based AI infrastructures, to scale deep learning workloads efficiently. Participants will learn how to implement state-of-the-art techniques such as model parallelism, mixed-precision training, and advanced scheduling policies to maximize performance and reduce energy consumption.
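
As a small illustration of why mixed-precision training needs care, the NumPy sketch below (our own example, not tutorial material) shows the two float16 failure modes that motivate fp32 master weights and loss scaling in frameworks such as PyTorch AMP:

```python
import numpy as np

# (1) Why fp32 "master weights": a small update to a weight stored in
# float16 is lost, because the float16 spacing around 1.0 is ~9.8e-4.
w16 = np.float16(1.0) + np.float16(1e-4)   # rounds back to 1.0
w32 = np.float32(1.0) + np.float32(1e-4)   # 1.0001, update survives
print(w16, w32)

# (2) Why loss scaling: tiny gradients underflow to zero in float16
# (the smallest positive float16 subnormal is ~6e-8).
g = 1e-8
print(np.float16(g))               # underflows to 0.0

scale = 1024.0
scaled = np.float16(g * scale)     # representable after scaling up
recovered = np.float32(scaled) / np.float32(scale)
print(recovered)                   # approximately 1e-8 again
```

Mixed-precision frameworks automate exactly this pattern: compute in low precision for speed and memory, but keep master copies and scale the loss so small values are not rounded away.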

The tutorial is designed for AI researchers, data scientists, and engineers interested in training and deploying deep learning models efficiently at scale. It will include theoretical insights, real-world case studies, and interactive hands-on labs using PyTorch, DeepSpeed, and distributed computing frameworks.

Syllabus & Program

  • Overview of cluster architecture
  • Multi-Node and Multi-GPU architectures
  • How to get computational resources @CINECA
  • Network and data movement on an HPC cluster
  • Software environments on an HPC cluster
  • Different views of a compute node: hybrid MPI/OpenMP/GPU simulations
  • Managing resources: the SLURM workload manager
  • Tips & tricks on an HPC cluster
  • The attention operator, self-/cross-attention and its variants
  • The Transformer architecture
  • Transformer-based models for Language (BERT, BART, GPT-x)
  • Transformer-based models for Computer Vision (ViT, DeiT, DETR)
  • Usage of learnable queries, Perceiver, Perceiver IO
  • Connecting multiple modalities: encoder-decoder, encoder-only and decoder-only approaches
  • Optimized variants of the attention operator
  • Masked language modeling, next-sentence prediction
  • Noisy labels / Self-training
  • Contrastive learning
  • Multi-modal foundation models: CLIP, Flamingo
  • Scaling laws for Deep Learning architectures
  • The Mixture of Experts approach
  • Overview of parallelism
  • Distributed data parallelism
  • Reducing numerical precision
  • DataLoader optimization
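
To ground the architectural part of the syllabus, here is a minimal NumPy sketch of the scaled dot-product attention operator at the heart of Transformers, Attention(Q, K, V) = softmax(QKᵀ/√d_k)V (our own illustration, not tutorial material):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.

    Q: (n_q, d_k), K: (n_k, d_k), V: (n_k, d_v).
    Returns the (n_q, d_v) output and the attention weights.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # (n_q, n_k) logits
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8))   # in self-attention, Q, K, V come from
K = rng.standard_normal((6, 8))   # the same sequence; in cross-attention,
V = rng.standard_normal((6, 16))  # K and V come from another sequence
out, w = scaled_dot_product_attention(Q, K, V)
print(out.shape, w.shape)  # (4, 16) (4, 6)
```

The optimized variants covered in the tutorial (e.g. memory-efficient or fused attention kernels) compute the same function while avoiding materializing the full (n_q, n_k) weight matrix.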

Slides & Materials

  • Sergio Orlandini
    Introduction to HPC environments
    Coming soon
  • Lorenzo Baraldi
    Slides for: State-of-the-art DL architectures; Self-/semi-supervised training
    View PDF
  • Andrea Pilzer
    Training techniques; Distributed training
    View PDF

Organizers

  • Lorenzo Baraldi
    UNIMORE
  • Andrea Pilzer
    NVIDIA
  • Sergio Orlandini
    CINECA

About MINERVA

This tutorial is organized within the MINERVA EU project.

MINERVA is a European initiative supporting advances in AI research and its efficient deployment on high-performance computing infrastructures. Learn more on the official website.

MINERVA AI Community Survey

Help shape future HPC facilities for AI research by sharing your needs and priorities. The survey takes ~5 minutes.

Fill out the survey

Participation

This tutorial is part of ICIAP 2025. Details about date, venue, and registration can be found on the conference website.