Abstract
Deep Learning has revolutionized AI, driving advancements in computer vision, natural language processing, and multimodal learning. However, training and deploying large-scale AI models poses significant computational challenges: as models grow in size and complexity, efficiency in both training and inference becomes crucial for scalability, accessibility, and sustainability. This tutorial, organized as part of the MINERVA EU project, will provide a comprehensive overview of how to leverage high-performance computing (HPC) resources, covering distributed training strategies, optimization techniques, and resource management for large-scale deep learning. We will discuss best practices for exploiting multi-GPU and multi-node architectures, memory-efficient training, and mixed-precision techniques to accelerate AI workloads. Special focus will be given to the latest advancements in Transformer-based models, their computational impact, and strategies to make them more efficient. Participants will gain hands-on experience with state-of-the-art frameworks, exploring practical approaches for optimizing AI training and inference on HPC clusters. The tutorial will also address energy efficiency, discussing techniques for reducing AI’s carbon footprint while maintaining model performance. Attendees will leave with a deeper understanding of how to design, train, and deploy AI models efficiently, making high-performance computing more accessible to the research community.
Description
Deep Learning (DL) has revolutionized Artificial Intelligence (AI), enabling state-of-the-art advancements in computer vision, natural language processing, and multimodal learning. However, the computational cost of training and deploying large-scale AI models is immense, requiring optimized hardware utilization, parallelization strategies, and efficient resource scheduling to achieve scalability. This tutorial will explore the computational aspects of large-scale AI, with a focus on optimizing deep learning training and inference on high-performance computing (HPC) infrastructures.
The tutorial will provide a hands-on introduction to the fundamental principles and practical techniques of high-performance AI computing, including distributed training strategies, multi-GPU and multi-node acceleration, memory-efficient model optimization, and inference acceleration. We will cover best practices for leveraging HPC environments, including supercomputing clusters and cloud-based AI infrastructures, to scale deep learning workloads efficiently. Participants will learn how to implement state-of-the-art techniques such as model parallelism, mixed-precision training (sketched just below), and advanced scheduling policies to maximize performance and reduce energy consumption.
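As a small taste of what the hands-on labs cover, here is a minimal mixed-precision training sketch in PyTorch using automatic mixed precision (AMP); the model, data, and hyperparameters are placeholder assumptions rather than the actual lab material:

```python
import torch

# Placeholder model and data; the labs will use realistic workloads.
model = torch.nn.Linear(128, 10).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = torch.nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler()  # rescales the loss to avoid fp16 gradient underflow

for step in range(100):
    x = torch.randn(64, 128, device="cuda")
    y = torch.randint(0, 10, (64,), device="cuda")
    optimizer.zero_grad()
    # Run the forward pass in reduced precision where it is numerically safe.
    with torch.cuda.amp.autocast():
        loss = loss_fn(model(x), y)
    scaler.scale(loss).backward()  # backward pass on the scaled loss
    scaler.step(optimizer)         # unscales gradients, then takes the step
    scaler.update()
```

Running matrix multiplications in float16 (or bfloat16) roughly halves memory traffic and exploits GPU tensor cores, which is why mixed precision is one of the cheapest speedups available on modern hardware.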
The tutorial is designed for AI researchers, data scientists, and engineers interested in training and deploying deep learning models efficiently at scale. It will include theoretical insights, real-world case studies, and interactive hands-on labs using PyTorch, DeepSpeed, and distributed computing frameworks. By the end of the tutorial, participants will:
- Understand fundamental and advanced techniques in high-performance AI computing
- Gain practical experience in implementing large-scale AI models efficiently
- Learn to optimize AI workloads for HPC environments
- Acquire best practices for distributed training, model parallelism, and energy-efficient AI computing
- Learn how to request HPC resources from EuroHPC and CINECA
Syllabus & Program
- Overview of cluster architecture
- Multi-Node and Multi-GPU architectures
- How to get computational resources @CINECA
- Network and data movement on an HPC cluster
- Software environments on an HPC cluster
- Different views of a compute node: Hybrid MPI/OpenMP/GPU simulations
- Managing resources: the SLURM workload manager
- Tips & Tricks on an HPC cluster
- The attention operator, self-/cross-attention, and its variants (a PyTorch sketch of the operator follows this list)
- The Transformer architecture
- Transformer-based models for Language (BERT, BART, GPT-x)
- Transformer-based models for Computer Vision (ViT, DeiT, DETR)
- Usage of learnable queries, Perceiver, Perceiver IO
- Connecting multiple modalities: encoder-decoder, encoder-only and decoder-only approaches
- Optimized variants of the attention operator
- Masked language modeling and next-sentence prediction
- Noisy labels / Self-training
- Contrastive learning
- Multi-modal foundation models: CLIP, Flamingo
- Scaling laws for Deep Learning architectures
- The Mixture of Experts approach
- Overview of parallelism
- Distributed data parallelism (see the data-parallel sketch after this list)
- Reducing numerical precision
- DataLoader optimization (also covered in the data-parallel sketch after this list)
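As a companion to the attention-related sessions above, here is a minimal PyTorch sketch of the scaled dot-product attention operator; the tensor names and shapes are illustrative assumptions, not material from the tutorial:

```python
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    """Minimal scaled dot-product attention.

    q: (batch, n_queries, d_k); k: (batch, n_keys, d_k); v: (batch, n_keys, d_v).
    Self-attention derives q, k, v from the same sequence; cross-attention
    takes q from one source and k, v from another.
    """
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)   # (batch, n_queries, n_keys)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = torch.softmax(scores, dim=-1)             # attention distribution
    return weights @ v                                  # (batch, n_queries, d_v)

# Toy usage: batch of 2, sequence length 4, model dimension 8.
q = k = v = torch.randn(2, 4, 8)
print(scaled_dot_product_attention(q, k, v).shape)      # torch.Size([2, 4, 8])
```

The optimized variants covered in the program avoid materializing the full score matrix; in recent PyTorch releases a fused implementation is exposed as `torch.nn.functional.scaled_dot_product_attention`.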
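For the parallelism sessions, the following is a hedged single-node sketch of distributed data parallelism with PyTorch DDP, also showing the DataLoader settings (worker processes, pinned memory) relevant to DataLoader optimization; the dataset and model are placeholders, and launching via `torchrun` (which sets the rank environment variables) is an assumption:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

def main():
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE for each process.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Placeholder dataset and model; replace with the real workload.
    dataset = TensorDataset(torch.randn(1024, 32), torch.randn(1024, 1))
    sampler = DistributedSampler(dataset)   # shards the data across processes
    loader = DataLoader(
        dataset,
        batch_size=64,
        sampler=sampler,
        num_workers=4,            # load batches in parallel worker processes
        pin_memory=True,          # page-locked buffers speed up host-to-GPU copies
        persistent_workers=True,  # keep workers alive across epochs
    )

    model = DDP(torch.nn.Linear(32, 1).cuda(local_rank), device_ids=[local_rank])
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
    loss_fn = torch.nn.MSELoss()

    for epoch in range(2):
        sampler.set_epoch(epoch)  # reshuffle the shards each epoch
        for x, y in loader:
            x = x.cuda(local_rank, non_blocking=True)
            y = y.cuda(local_rank, non_blocking=True)
            optimizer.zero_grad()
            loss_fn(model(x), y).backward()  # DDP all-reduces gradients here
            optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

On a node with, say, four GPUs this would be launched as `torchrun --nproc_per_node=4 train_ddp.py`; on a SLURM cluster the same command typically goes inside an sbatch job script.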
Organizers
- Lorenzo Baraldi (UNIMORE)
- Andrea Pilzer (NVIDIA)
- Sergio Orlandini (CINECA)
About MINERVA
This tutorial is organized within the MINERVA EU project.
MINERVA is a European initiative supporting advances in AI research and its efficient deployment on high-performance computing infrastructures. Learn more on the official website.
MINERVA AI Community Survey
Help shape future HPC facilities for AI research by sharing your needs and priorities. The survey takes ~5 minutes.
Fill out the survey
Participation
This tutorial is part of ICIAP 2025. Details about date, venue, and registration can be found on the conference website.