ICPR2020 Paper Browser

Paper download is intended for registered attendees only, and is subject to the IEEE Copyright Policy. Any other use is strictly forbidden.

Knowledge Distillation Beyond Model Compression

Fahad Sarfraz, Elahe Arani, Bahram Zonooz

Auto-TLDR; Knowledge Distillation from Teacher to Student


Knowledge distillation (KD) is commonly regarded as an effective model compression technique in which a compact model (student) is trained under the supervision of a larger pretrained model or an ensemble of models (teacher). Various techniques have been proposed since the original formulation, which mimic different aspects of the teacher such as the representation space, decision boundary, or intra-data relationships. Some methods replace the one-way knowledge distillation from a static teacher with collaborative learning among a cohort of students. Despite the recent advances, where knowledge resides in a deep neural network, and how best to capture it from the teacher and transfer it to the student, remain open questions. In this work we present an extensive study of 9 different knowledge distillation methods that cover a broad spectrum of approaches to capturing and transferring knowledge. We demonstrate the versatility of the KD framework on different datasets and network architectures under varying capacity gaps between the teacher and student. The study provides intuition for the effects of mimicking different aspects of the teacher and derives insights from the performance of the different distillation approaches to guide the design of more effective KD methods. Furthermore, our study shows the effectiveness of the KD framework in learning efficiently under varying severity levels of label noise and class imbalance, consistently providing significant generalization gains over standard training. We emphasize that the efficacy of KD goes well beyond model compression and that it should be considered a general-purpose training paradigm that offers more robustness to common challenges in real-world datasets than the standard training procedure.
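
For readers unfamiliar with the "original formulation" mentioned in the abstract, the following is a minimal sketch of the classic soft-target distillation loss (Hinton et al.), not the code used in this paper. It combines cross-entropy on the ground-truth label with a temperature-softened KL divergence between the teacher and student outputs. The function names and the default values of `temperature` and `alpha` are illustrative assumptions, not settings taken from the study.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits, softened by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, label, temperature=4.0, alpha=0.9):
    """Soft-target distillation loss for one example.

    temperature and alpha are hypothetical defaults chosen for illustration.
    """
    # Hard-label term: cross-entropy of the student against the true class.
    p_student = softmax(student_logits)
    cross_entropy = -math.log(p_student[label])

    # Soft-label term: KL(teacher || student) on temperature-softened outputs.
    q_teacher = softmax(teacher_logits, temperature)
    q_student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(q_teacher, q_student) if t > 0)

    # The T^2 factor keeps the soft-term gradients on a comparable scale.
    return (1 - alpha) * cross_entropy + alpha * temperature ** 2 * kl


if __name__ == "__main__":
    teacher = [2.0, 1.0, 0.1]
    student = [1.5, 0.5, 0.3]
    print(kd_loss(student, teacher, label=0))
```

When the student reproduces the teacher's logits exactly, the KL term vanishes and only the weighted cross-entropy remains. The other method families the abstract mentions (representation space, decision boundary, intra-data relationships) replace or augment the soft-label term with different matching objectives.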

Similar papers

Efficient Online Subclass Knowledge Distillation for Image Classification

Maria Tzelepi, Nikolaos Passalis, Anastasios Tefas

Auto-TLDR; OSKD: Online Subclass Knowledge Distillation

Feature Fusion for Online Mutual Knowledge Distillation

Adaptive Noise Injection for Training Stochastic Student Networks from Deterministic Teachers

Knowledge Distillation with a Precise Teacher and Prediction with Abstention

Distilling Spikes: Knowledge Distillation in Spiking Neural Networks

Compact CNN Structure Learning by Knowledge Distillation

Local Clustering with Mean Teacher for Semi-Supervised Learning

Stochastic Label Refinery: Toward Better Target Label Distribution

Towards Robust Learning with Different Label Noise Distributions

On-Manifold Adversarial Data Augmentation Improves Uncertainty Calibration

Meta Soft Label Generation for Noisy Labels

Channel Planting for Deep Neural Networks Using Knowledge Distillation

Is the Meta-Learning Idea Able to Improve the Generalization of Deep Neural Networks on the Standard Supervised Learning?

A Delayed Elastic-Net Approach for Performing Adversarial Attacks

Automatic Student Network Search for Knowledge Distillation

P-DIFF: Learning Classifier with Noisy Labels Based on Probability Difference Distributions

Improving Model Accuracy for Imbalanced Image Classification Tasks by Adding a Final Batch Normalization Layer: An Empirical Study

FastSal: A Computationally Efficient Network for Visual Saliency Prediction

Adversarial Knowledge Distillation for a Compact Generator

Exploiting Non-Linear Redundancy for Neural Model Compression

Beyond Cross-Entropy: Learning Highly Separable Feature Distributions for Robust and Accurate Classification

Teacher-Student Training and Triplet Loss for Facial Expression Recognition under Occlusion

Contextual Classification Using Self-Supervised Auxiliary Models for Deep Neural Networks

MaxDropout: Deep Neural Network Regularization Based on Maximum Output Values

Iterative Label Improvement: Robust Training by Confidence Based Filtering and Dataset Partitioning

On the Information of Feature Maps and Pruning of Deep Neural Networks

Verifying the Causes of Adversarial Examples

Learning Embeddings for Image Clustering: An Empirical Study of Triplet Loss Approaches

Can Data Placement Be Effective for Neural Networks Classification Tasks? Introducing the Orthogonal Loss

Norm Loss: An Efficient yet Effective Regularization Method for Deep Neural Networks

Towards Low-Bit Quantization of Deep Neural Networks with Limited Data

Making Every Label Count: Handling Semantic Imprecision by Integrating Domain Knowledge

Optimal Transport As a Defense against Adversarial Attacks

Defense Mechanism against Adversarial Attacks Using Density-Based Representation of Images

Neural Compression and Filtering for Edge-assisted Real-time Object Detection in Challenged Networks

Multi-Order Feature Statistical Model for Fine-Grained Visual Categorization

Exploiting Distilled Learning for Deep Siamese Tracking

Revisiting ImprovedGAN with Metric Learning for Semi-Supervised Learning

Graph-Based Interpolation of Feature Vectors for Accurate Few-Shot Classification

Adaptive Distillation for Decentralized Learning from Heterogeneous Clients

Variational Inference with Latent Space Quantization for Adversarial Resilience

A Close Look at Deep Learning with Small Data

Adversarially Training for Audio Classifiers

MINT: Deep Network Compression Via Mutual Information-Based Neuron Trimming

Smart Inference for Multidigit Convolutional Neural Network Based Barcode Decoding

Dynamic Multi-Path Neural Network

Minority Class Oriented Active Learning for Imbalanced Datasets

Rethinking of Deep Models Parameters with Respect to Data Distribution