
Abstract

Deep Learning has revolutionized AI, driving advances in computer vision, natural language processing, and multimodal learning. However, training and deploying large-scale AI models poses substantial computational challenges. As models grow in size and complexity, efficiency in both training and inference becomes crucial for scalability, accessibility, and sustainability. This tutorial, organized as part of the MINERVA EU project, will provide a comprehensive overview of how to leverage high-performance computing (HPC) resources, covering distributed training strategies, optimization techniques, and resource management for large-scale deep learning. We will discuss best practices for exploiting multi-GPU and multi-node architectures, memory-efficient training, and mixed-precision techniques to accelerate AI workloads. Special focus will be given to the latest advances in Transformer-based models, their computational impact, and strategies to make them more efficient. Participants will gain hands-on experience with state-of-the-art frameworks, exploring practical approaches for optimizing AI training and inference on HPC clusters. The tutorial will also address energy efficiency, discussing techniques for reducing AI's carbon footprint while maintaining model performance. Attendees will leave with a deeper understanding of how to design, train, and deploy AI models efficiently, making high-performance computing more accessible to the research community.

Description

Deep Learning (DL) has revolutionized Artificial Intelligence (AI), enabling state-of-the-art advancements in computer vision, natural language processing, and multimodal learning. However, the computational cost of training and deploying large-scale AI models is immense, requiring optimized hardware utilization, parallelization strategies, and efficient resource scheduling to achieve scalability. This tutorial will explore the computational aspects of large-scale AI, with a focus on optimizing deep learning training and inference on high-performance computing (HPC) infrastructures.
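
To give a concrete flavor of what resource scheduling looks like in practice, a typical SLURM batch script for a multi-node, multi-GPU training job might look like the sketch below. The directive names are standard Slurm options, but the partition, account, and script names are placeholders that vary per cluster:

```shell
#!/bin/bash
#SBATCH --job-name=ddp-train
#SBATCH --nodes=2                    # two compute nodes
#SBATCH --ntasks-per-node=4          # one task (process) per GPU
#SBATCH --gpus-per-node=4
#SBATCH --cpus-per-task=8            # CPU cores for data loading
#SBATCH --time=04:00:00
#SBATCH --partition=<gpu_partition>  # placeholder: cluster-specific
#SBATCH --account=<your_project>     # placeholder: your allocation

# Launch one process per GPU across all allocated nodes.
srun python train.py --epochs 10
```

Submitting with `sbatch` places the job in the queue; the scheduler then allocates nodes, GPUs, and wall time according to the requested resources and the site's queueing policies.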

The tutorial will provide a hands-on introduction to the fundamental principles and practical techniques of high-performance AI computing, including distributed training strategies, GPU and multi-node acceleration, memory-efficient model optimization, and inference acceleration. We will cover best practices for leveraging HPC environments, including supercomputing clusters and cloud-based AI infrastructures, to scale deep learning workloads efficiently. Participants will learn how to implement state-of-the-art techniques such as model parallelism, mixed-precision training, and advanced scheduling policies to maximize performance and reduce energy consumption.
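
As a small illustration of why mixed-precision training needs care, the NumPy sketch below (our own example, not tutorial material) shows the two float16 failure modes that motivate fp32 master weights and loss scaling in frameworks such as PyTorch AMP:

```python
import numpy as np

# (1) Why fp32 "master weights": a small update to a weight stored in
# float16 is lost, because the float16 spacing around 1.0 is ~9.8e-4.
w16 = np.float16(1.0) + np.float16(1e-4)   # rounds back to 1.0
w32 = np.float32(1.0) + np.float32(1e-4)   # 1.0001, update survives
print(w16, w32)

# (2) Why loss scaling: tiny gradients underflow to zero in float16
# (the smallest positive float16 subnormal is ~6e-8).
g = 1e-8
print(np.float16(g))               # underflows to 0.0

scale = 1024.0
scaled = np.float16(g * scale)     # representable after scaling up
recovered = np.float32(scaled) / np.float32(scale)
print(recovered)                   # approximately 1e-8 again
```

Mixed-precision frameworks automate exactly this pattern: compute in low precision for speed and memory, but keep master copies and scale the loss so small values are not rounded away.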

The tutorial is designed for AI researchers, data scientists, and engineers interested in training and deploying deep learning models efficiently at scale. It will include theoretical insights, real-world case studies, and interactive hands-on labs using PyTorch, DeepSpeed, and distributed computing frameworks.

Syllabus & Program

  • Overview of cluster architecture
  • Multi-Node and Multi-GPU architectures
  • How to get computational resources @CINECA
  • Network and data movement on an HPC cluster
  • Software environments on an HPC cluster
  • Different views of a compute node: hybrid MPI/OpenMP/GPU simulations
  • Managing resources: the SLURM workload manager
  • Tips & tricks on an HPC cluster
  • The attention operator, self-/cross-attention and its variants
  • The Transformer architecture
  • Transformer-based models for Language (BERT, BART, GPT-x)
  • Transformer-based models for Computer Vision (ViT, DeiT, DETR)
  • Usage of learnable queries, Perceiver, Perceiver IO
  • Connecting multiple modalities: encoder-decoder, encoder-only and decoder-only approaches
  • Optimized variants of the attention operator
  • Masked language modeling, next-sentence prediction
  • Noisy labels / Self-training
  • Contrastive learning
  • Multi-modal foundation models: CLIP, Flamingo
  • Scaling laws for Deep Learning architectures
  • The Mixture of Experts approach
  • Overview of parallelism
  • Distributed data parallelism
  • Reducing numerical precision
  • DataLoader optimization
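
To ground the architectural part of the syllabus, here is a minimal NumPy sketch of the scaled dot-product attention operator at the heart of Transformers, Attention(Q, K, V) = softmax(QKᵀ/√d_k)V (our own illustration, not tutorial material):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.

    Q: (n_q, d_k), K: (n_k, d_k), V: (n_k, d_v).
    Returns the (n_q, d_v) output and the attention weights.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # (n_q, n_k) logits
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8))   # in self-attention, Q, K, V come from
K = rng.standard_normal((6, 8))   # the same sequence; in cross-attention,
V = rng.standard_normal((6, 16))  # K and V come from another sequence
out, w = scaled_dot_product_attention(Q, K, V)
print(out.shape, w.shape)  # (4, 16) (4, 6)
```

The optimized variants covered in the tutorial (e.g. memory-efficient or fused attention kernels) compute the same function while avoiding materializing the full (n_q, n_k) weight matrix.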

Slides & Materials

  • Sergio Orlandini
    Introduction to HPC environments
    Coming soon
  • Lorenzo Baraldi
    Slides for: State-of-the-art DL architectures; Self-/semi-supervised training
    View PDF
  • Andrea Pilzer
    Training techniques; Distributed training
    View PDF

Organizers

  • Lorenzo Baraldi
    UNIMORE
  • Andrea Pilzer
    NVIDIA
  • Sergio Orlandini
    CINECA

About MINERVA

This tutorial is organized within the MINERVA EU project.

MINERVA is a European initiative supporting advances in AI research and its efficient deployment on high-performance computing infrastructures. Learn more on the official website.

MINERVA AI Community Survey

Help shape future HPC facilities for AI research by sharing your needs and priorities. The survey takes ~5 minutes.

Fill out the survey

Participation

This tutorial is part of ICIAP 2025. Details about date, venue, and registration can be found on the conference website.