ICPR2020 Paper Browser

Knowledge Distillation with a Precise Teacher and Prediction with Abstention

Xu Yi, Jian Pu, Hui Zhao

Auto-TLDR; Knowledge Distillation using Deep gambler loss and selective classification framework

Knowledge distillation, which trains a model (the student) under the supervision of another, larger model (the teacher), has achieved remarkable results in supervised learning. However, existing knowledge distillation methods have two major problems: the teacher's supervision is sometimes misleading, and the student's predictions are not accurate enough. To address the first problem, instead of learning from a combination of the teacher's output and the ground truth, we apply knowledge adjustment to correct the teacher's supervision using the ground truth. For the second problem, we train the student model within the selective classification framework. In particular, we adopt the deep gambler loss, which predicts with reservation by explicitly introducing an $(m+1)$-th class. To evaluate the effectiveness of our method, we consider two settings of knowledge distillation: (1) distillation across different network structures (AlexNet, ResNet), and (2) distillation across networks with different depths (ResNet18, ResNet50). Experimental results on benchmark datasets (Fashion-MNIST, SVHN, CIFAR10, CIFAR100) show higher prediction accuracies and lower coverage errors.
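
The two ingredients named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions. The abstract does not say how knowledge adjustment works, so `adjust_teacher` uses a hypothetical swap rule: when the teacher's top class disagrees with the ground truth, the two probabilities are exchanged. `gambler_loss` follows the commonly cited form of the deep gambler loss, -log(p_y + p_{m+1} / o), where the last output is the abstention class and the payoff `o` (assumed to satisfy 1 < o <= m) is a hyperparameter. The function names, the `threshold` parameter, and the choice to keep the two pieces separate are all illustrative.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def adjust_teacher(teacher_probs, label):
    """Hypothetical knowledge adjustment (assumed swap rule, not from the paper).

    If the teacher's most probable class is not the ground-truth label,
    swap the two probabilities so the label becomes the top class.
    """
    top = max(range(len(teacher_probs)), key=teacher_probs.__getitem__)
    adjusted = list(teacher_probs)
    if top != label:
        adjusted[top], adjusted[label] = adjusted[label], adjusted[top]
    return adjusted


def gambler_loss(student_logits, label, payoff):
    """Deep gambler loss for a single sample.

    student_logits has m + 1 entries; the last entry is the abstention class.
    payoff is the reward o; a larger o makes abstaining less attractive.
    """
    probs = softmax(student_logits)
    return -math.log(probs[label] + probs[-1] / payoff)


def predict_or_abstain(student_logits, threshold):
    """Return the predicted class, or None if the abstention score is too high.

    threshold is an illustrative cut-off on the (m+1)-th probability.
    """
    probs = softmax(student_logits)
    if probs[-1] > threshold:
        return None
    class_probs = probs[:-1]
    return max(range(len(class_probs)), key=class_probs.__getitem__)
```

For example, `adjust_teacher([0.7, 0.2, 0.1], label=1)` moves the largest probability onto class 1, and `predict_or_abstain` returns `None` whenever the abstention probability exceeds the chosen threshold. Raising the threshold increases coverage at the cost of more errors on the accepted samples.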

Similar papers

Efficient Online Subclass Knowledge Distillation for Image Classification

Maria Tzelepi, Nikolaos Passalis, Anastasios Tefas

Auto-TLDR; OSKD: Online Subclass Knowledge Distillation

Distilling Spikes: Knowledge Distillation in Spiking Neural Networks

Knowledge Distillation Beyond Model Compression

Stochastic Label Refinery: Toward Better Target Label Distribution

Feature Fusion for Online Mutual Knowledge Distillation

Compact CNN Structure Learning by Knowledge Distillation

Channel Planting for Deep Neural Networks Using Knowledge Distillation

Automatic Student Network Search for Knowledge Distillation

Local Clustering with Mean Teacher for Semi-Supervised Learning

Towards Low-Bit Quantization of Deep Neural Networks with Limited Data

Adaptive Noise Injection for Training Stochastic Student Networks from Deterministic Teachers

Can Data Placement Be Effective for Neural Networks Classification Tasks? Introducing the Orthogonal Loss

Boundary Optimised Samples Training for Detecting Out-Of-Distribution Images

A Boundary-Aware Distillation Network for Compressed Video Semantic Segmentation

Class-Incremental Learning with Topological Schemas of Memory Spaces

FastSal: A Computationally Efficient Network for Visual Saliency Prediction

Iterative Label Improvement: Robust Training by Confidence Based Filtering and Dataset Partitioning

Is the Meta-Learning Idea Able to Improve the Generalization of Deep Neural Networks on the Standard Supervised Learning?

Adversarial Knowledge Distillation for a Compact Generator

Quasibinary Classifier for Images with Zero and Multiple Labels

Smart Inference for Multidigit Convolutional Neural Network Based Barcode Decoding

Contextual Classification Using Self-Supervised Auxiliary Models for Deep Neural Networks

Adaptive Distillation for Decentralized Learning from Heterogeneous Clients

Teacher-Student Competition for Unsupervised Domain Adaptation

Multi-Order Feature Statistical Model for Fine-Grained Visual Categorization

A Delayed Elastic-Net Approach for Performing Adversarial Attacks

Meta Soft Label Generation for Noisy Labels

Teacher-Student Training and Triplet Loss for Facial Expression Recognition under Occlusion

Image Representation Learning by Transformation Regression

Beyond Cross-Entropy: Learning Highly Separable Feature Distributions for Robust and Accurate Classification

Verifying the Causes of Adversarial Examples

P-DIFF: Learning Classifier with Noisy Labels Based on Probability Difference Distributions

Revisiting ImprovedGAN with Metric Learning for Semi-Supervised Learning

Rethinking Experience Replay: A Bag of Tricks for Continual Learning

Norm Loss: An Efficient yet Effective Regularization Method for Deep Neural Networks

IDA-GAN: A Novel Imbalanced Data Augmentation GAN

MetaMix: Improved Meta-Learning with Interpolation-based Consistency Regularization

Rethinking of Deep Models Parameters with Respect to Data Distribution

Pretraining Image Encoders without Reconstruction Via Feature Prediction Loss

Exploiting Knowledge Embedded Soft Labels for Image Recognition

NeuralFP: Out-Of-Distribution Detection Using Fingerprints of Neural Networks

Knowledge Distillation for Action Anticipation Via Label Smoothing

Probability Guided Maxout

Exploiting Distilled Learning for Deep Siamese Tracking

Fine-Tuning DARTS for Image Classification

Generalization Comparison of Deep Neural Networks Via Output Sensitivity

Improving Batch Normalization with Skewness Reduction for Deep Neural Networks

Adversarially Constrained Interpolation for Unsupervised Domain Adaptation