Towards Low-Bit Quantization of Deep Neural Networks with Limited Data

Yong Yuan, Chen Chen, Xiyuan Hu, Silong Peng

Responsive image

Auto-TLDR; Low-Precision Quantization of Deep Neural Networks with Limited Data

Slides Poster

Recent machine learning methods use increasingly large deep neural networks to achieve state-of-the-art results in various tasks. Network quantization can effectively reduce computation and memory costs without modifying network structures, facilitating the deployment of deep neural networks (DNNs) on cloud and edge devices. However, most of the existing methods usually need time-consuming training or fine-tuning and access to the original training dataset that may be unavailable due to privacy or security concerns. In this paper, we present a novel method to achieve low-precision quantization of deep neural networks with limited data. Firstly, to reduce the complexity of per-channel quantization and the degeneration of per-layer quantization, we introduce group-wise quantization which separates the output channels into groups that each group is quantized separately. Secondly, to better distill knowledge from the pre-trained FP32 model with limited data, we introduce a two-stage knowledge distillation method that divides the optimization process into independent optimization stage and joint optimization stage to address the limitation of layer-wise supervision and global supervision. Extensive experiments on ImageNet2012 (ResNet18/50, ShuffleNetV2, and MobileNetV2) demonstrate that the proposed approach can significantly improve the quantization model's accuracy when only a few training samples are available. We further show that the method also extends to other computer vision architectures and tasks such as object detection and semantic segmentation.

Similar papers

Channel Planting for Deep Neural Networks Using Knowledge Distillation

Kakeru Mitsuno, Yuichiro Nomura, Takio Kurita

Responsive image

Auto-TLDR; Incremental Training for Deep Neural Networks with Knowledge Distillation

Slides Poster Similar

In recent years, deeper and wider neural networks have shown excellent performance in computer vision tasks, while their enormous amount of parameters results in increased computational cost and overfitting. Several methods have been proposed to compress the size of the networks without reducing network performance. Network pruning can reduce redundant and unnecessary parameters from a network. Knowledge distillation can transfer the knowledge of deeper and wider networks to smaller networks. The performance of the smaller network obtained by these methods is bounded by the predefined network. Neural architecture search has been proposed, which can search automatically the architecture of the networks to break the structure limitation. Also, there is a dynamic configuration method to train networks incrementally as sub-networks. In this paper, we present a novel incremental training algorithm for deep neural networks called planting. Our planting can search the optimal network architecture with smaller number of parameters for improving the network performance by augmenting channels incrementally to layers of the initial networks while keeping the earlier trained parameters fixed. Also, we propose using the knowledge distillation method for training the channels planted. By transferring the knowledge of deeper and wider networks, we can grow the networks effectively and efficiently. We evaluate the effectiveness of the proposed method on different datasets such as CIFAR-10/100 and STL-10. For the STL-10 dataset, we show that we are able to achieve comparable performance with only 7% parameters compared to the larger network and reduce the overfitting caused by a small amount of the data.

Automatic Student Network Search for Knowledge Distillation

Zhexi Zhang, Wei Zhu, Junchi Yan, Peng Gao, Guotong Xie

Responsive image

Auto-TLDR; NAS-KD: Knowledge Distillation for BERT

Slides Poster Similar

Pre-trained language models (PLMs), such as BERT, have achieved outstanding performance on multiple natural language processing (NLP) tasks. However, such pre-trained models usually contain a huge number of parameters and are computationally expensive. The high resource demand hinders their application on resource-restricted devices like mobile phones. Knowledge distillation (KD) is an effective compression approach, aiming at encouraging a light-weight student network to imitate the teacher network, and accordingly latent knowledge is transferred from the teacher to student. However, the great majority of student networks in previous KD methods are manually designed, normally a subnetwork of the teacher network. Transformer is generally utilized as the student for compressing BERT but still contains masses of parameters. Motivated by this, we propose a novel approach named NAS-KD, which automatically generates an optimal student network using neural architecture search (NAS) to enhance the distillation for BERT. Experiment on 7 classification tasks in NLP domain demonstrates that NAS-KD can substantially reduce the size of BERT without much performance sacrifice.

Compact CNN Structure Learning by Knowledge Distillation

Waqar Ahmed, Andrea Zunino, Pietro Morerio, Vittorio Murino

Responsive image

Auto-TLDR; Knowledge Distillation for Compressing Deep Convolutional Neural Networks

Slides Poster Similar

The concept of compressing deep Convolutional Neural Networks (CNNs) is essential to use limited computation, power, and memory resources on embedded devices. However, existing methods achieve this objective at the cost of a drop in inference accuracy in computer vision tasks. To address such a drawback, we propose a framework that leverages knowledge distillation along with customizable block-wise optimization to learn a lightweight CNN structure while preserving better control over the compression-performance tradeoff. Considering specific resource constraints, e.g., floating-point operations per second (FLOPs) or model-parameters, our method results in a state of the art network compression while being capable of achieving better inference accuracy. In a comprehensive evaluation, we demonstrate that our method is effective, robust, and consistent with results over a variety of network architectures and datasets, at negligible training overhead. In particular, for the already compact network MobileNet_v2, our method offers up to 2x and 5.2x better model compression in terms of FLOPs and model-parameters, respectively, while getting 1.05% better model performance than the baseline network.

FastSal: A Computationally Efficient Network for Visual Saliency Prediction

Feiyan Hu, Kevin Mcguinness

Responsive image

Auto-TLDR; MobileNetV2: A Convolutional Neural Network for Saliency Prediction

Slides Poster Similar

This paper focuses on the problem of visual saliency prediction, predicting regions of an image that tend to attract human visual attention, under a constrained computational budget. We modify and test various recent efficient convolutional neural network architectures like EfficientNet and MobileNetV2 and compare them with existing state-of-the-art saliency models such as SalGAN and DeepGaze II both in terms of standard accuracy metrics like AUC and NSS, and in terms of the computational complexity and model size. We find that MobileNetV2 makes an excellent backbone for a visual saliency model and can be effective even without a complex decoder. We also show that knowledge transfer from a more computationally expensive model like DeepGaze II can be achieved via pseudo-labelling an unlabelled dataset, and that this approach gives result on-par with many state-of-the-art algorithms with a fraction of the computational cost and model size.

Fast Implementation of 4-Bit Convolutional Neural Networks for Mobile Devices

Anton Trusov, Elena Limonova, Dmitry Slugin, Dmitry Nikolaev, Vladimir V. Arlazarov

Responsive image

Auto-TLDR; Efficient Quantized Low-Precision Neural Networks for Mobile Devices

Slides Poster Similar

Quantized low-precision neural networks are very popular because they require less computational resources for inference and can provide high performance, which is vital for real-time and embedded recognition systems. However, their advantages are apparent for FPGA and ASIC devices, while general-purpose processor architectures are not always able to perform low-bit integer computations efficiently. The most frequently used low-precision neural network model for mobile central processors is an 8-bit quantized network. However, in a number of cases, it is possible to use fewer bits for weights and activations, and the only problem is the difficulty of efficient implementation. We introduce an efficient implementation of 4-bit matrix multiplication for quantized neural networks and perform time measurements on a mobile ARM processor. It shows 2.9 times speedup compared to standard floating-point multiplication and is 1.5 times faster than 8-bit quantized one. We also demonstrate a 4-bit quantized neural network for OCR recognition on the MIDV-500 dataset. 4-bit quantization gives 95.0% accuracy and 48% overall inference speedup, while an 8-bit quantized network gives 95.4% accuracy and 39% speedup. The results show that 4-bit quantization perfectly suits mobile devices, yielding good enough accuracy and low inference time.

Resource-efficient DNNs for Keyword Spotting using Neural Architecture Search and Quantization

David Peter, Wolfgang Roth, Franz Pernkopf

Responsive image

Auto-TLDR; Neural Architecture Search for Keyword Spotting in Limited Resource Environments

Slides Poster Similar

This paper introduces neural architecture search (NAS) for the automatic discovery of small models for keyword spotting (KWS) in limited resource environments. We employ a differentiable NAS approach to optimize the structure of convolutional neural networks (CNNs) to meet certain memory constraints for storing weights as well as constraints on the number of operations per inference. Using NAS only, we were able to obtain a highly efficient model with 95.6% accuracy on the Google speech commands dataset with 494.8 kB of memory usage and 19.6 million operations. Additionally, weight quantization is used to reduce the memory consumption even further. We show that weight quantization to low bit-widths (e.g. 1 bit) can be used without substantial loss in accuracy. By increasing the number of input features from 10 MFCC to 20 MFCC we were able to increase the accuracy to 96.6% at 340.1 kB of memory usage and 27.1 million operations.

VPU Specific CNNs through Neural Architecture Search

Ciarán Donegan, Hamza Yous, Saksham Sinha, Jonathan Byrne

Responsive image

Auto-TLDR; Efficient Convolutional Neural Networks for Edge Devices using Neural Architecture Search

Slides Poster Similar

The success of deep learning at computer vision tasks has led to an ever-increasing number of applications on edge devices. Often with the use of edge AI hardware accelerators like the Intel Movidius Vision Processing Unit (VPU). Performing computer vision tasks on edge devices is challenging. Many Convolutional Neural Networks (CNNs) are too complex to run on edge devices with limited computing power. This has created large interest in designing efficient CNNs and one promising way of doing this is through Neural Architecture Search (NAS). NAS aims to automate the design of neural networks. NAS can also optimize multiple different objectives together, like accuracy and efficiency, which is difficult for humans. In this paper, we use a differentiable NAS method to find efficient CNNs for VPU that achieves state-of-the-art classification accuracy on ImageNet. Our NAS designed model outperforms MobileNetV2, having almost 1\% higher top-1 accuracy while being 13\% faster on MyriadX VPU. To the best of our knowledge, this is the first time a VPU specific CNN has been designed using a NAS algorithm. Our results also reiterate the fact that efficient networks must be designed for each specific hardware. We show that efficient networks targeted at different devices do not perform as well on the VPU.

A Discriminant Information Approach to Deep Neural Network Pruning

Zejiang Hou, Sy Kung

Responsive image

Auto-TLDR; Channel Pruning Using Discriminant Information and Reinforcement Learning

Slides Poster Similar

Network pruning has become the de facto tool to accelerate and compress deep convolutional neural networks for mobile and edge applications. Previous works tend to perform channel selection in layer-wise manner based on predefined heuristics, without considering layer importance or systematically optimizing the pruned structure. In this work, we propose a novel channel pruning method that jointly harnesses two strategies: (1) a channel importance ranking heuristics based on the feature-maps discriminant power, (2) a searching method for optimal pruning budget allocation. For the former, we propose a Discriminant Information (DI) based channel selection algorithm. We use a small batch of training samples to compute the DI score for each channel and rank the channel importance so that channels really contributing to the feature-maps discriminant power are retained. For the latter, in order to search the optimal pruning budget allocation, we formulate a reward maximization problem to discover the layer importance and generating the pruning budget accordingly. Such reward maximization can be efficiently solved by the policy gradient algorithm in reinforcement learning, yielding our final pruned network which achieves the best accuracy-efficiency trade-off. Experiments on a variety of CNN architectures and benchmark datasets show that our proposed channel pruning methods compare favorably with previous state-of-the-art methods. On ImageNet, our pruned MobileNetV2 outperforms the previous layer-wise state-of-the-art pruning method CPLI \cite{guo2020channel} by 2\% Top-1 accuracy while reducing the FLOPs by 50\%.

Distilling Spikes: Knowledge Distillation in Spiking Neural Networks

Ravi Kumar Kushawaha, Saurabh Kumar, Biplab Banerjee, Rajbabu Velmurugan

Responsive image

Auto-TLDR; Knowledge Distillation in Spiking Neural Networks for Image Classification

Slides Poster Similar

Spiking Neural Networks (SNN) are energy-efficient computing architectures that exchange spikes for processing information, unlike classical Artificial Neural Networks (ANN). Due to this, SNNs are better suited for real-life deployments. However, similar to ANNs, SNNs also benefit from deeper architectures to obtain improved performance. Furthermore, like the deep ANNs, the memory, compute and power requirements of SNNs also increase with model size, and model compression becomes a necessity. Knowledge distillation is a model com- pression technique that enables transferring the learning of a large machine learning model to a smaller model with minimal loss in performance. In this paper, we propose techniques for knowledge distillation in spiking neural networks for the task of image classification. We present ways to distill spikes from a larger SNN, also called the teacher network, to a smaller one, also called the student network, while minimally impacting the classification accuracy. We demonstrate the effectiveness of the proposed method with detailed experiments on three standard datasets while proposing novel distillation methodologies and loss functions. We also present a multi-stage knowledge distillation technique for SNNs using an intermediate network to obtain higher performance from the student network. Our approach is expected to open up new avenues for deploying high performing large SNN models on resource-constrained hardware platforms.

Neural Compression and Filtering for Edge-assisted Real-time Object Detection in Challenged Networks

Yoshitomo Matsubara, Marco Levorato

Responsive image

Auto-TLDR; Deep Neural Networks for Remote Object Detection Using Edge Computing

Slides Poster Similar

The edge computing paradigm places compute-capable devices - edge servers - at the network edge to assist mobile devices in executing data analysis tasks. Intuitively, offloading compute-intense tasks to edge servers can reduce their execution time. However, poor conditions of the wireless channel connecting the mobile devices to the edge servers may degrade the overall capture-to-output delay achieved by edge offloading. Herein, we focus on edge computing supporting remote object detection by means of Deep Neural Networks (DNN), and develop a framework to reduce the amount of data transmitted over the wireless link. The core idea we propose builds on recent approaches splitting DNNs into sections - namely head and tail models - executed by the mobile device and edge server, respectively. The wireless link, then, is used to transport the output of the last layer of the head model to the edge server, instead of the DNN input. Most prior work focuses on classification tasks and leaves the DNN structure unaltered. Herein, we focus on DNNs for three different object detection tasks, which present a much more convoluted structure, and modify the architecture of the network to: (i) achieve in-network compression by introducing a bottleneck layer in the early layers on the head model, and (ii) prefilter pictures that do not contain objects of interest using a convolutional neural network. Results show that the proposed technique represents an effective intermediate option between local and edge computing in a parameter region where these extreme point solutions fail to provide satisfactory performance.

HFP: Hardware-Aware Filter Pruning for Deep Convolutional Neural Networks Acceleration

Fang Yu, Chuanqi Han, Pengcheng Wang, Ruoran Huang, Xi Huang, Li Cui

Responsive image

Auto-TLDR; Hardware-Aware Filter Pruning for Convolutional Neural Networks

Slides Poster Similar

Convolutional Neural Networks (CNNs) are powerful but computationally demanding and memory intensive, thus impeding their practical applications on resource-constrained hardware. Filter pruning is an efficient approach for deep CNN compression and acceleration, which aims to eliminate some filters with tolerable performance degradation. In the literature, the majority of approaches prune networks by defining the redundant filters or training the networks with a sparsity prior loss function. These approaches mainly use FLOPs as their speed metric. However, the inference latency of pruned networks cannot be directly controlled on the hardware platform, which is an important dimension of practicality. To address this issue, we propose a novel Hardware-aware Filter Pruning method (HFP) which can produce pruned networks that satisfy the actual latency budget on the hardwares of interest. In addition, we propose an iterative pruning framework called Opti-Cut to decrease the accuracy degradation of pruning process and accelerate the pruning procedure whilst meeting the hardware budget. More specifically, HFP first builds up a lookup table for fast estimating the latency of target network about filter configuration layer by layer. Then, HFP leverages information gain (IG) to globally evaluate the filters' contribution to network output distribution. HFP utilizes the Opti-Cut framework to globally prune filters with the minimum IG one by one until the latency budget is satisfied. We verify the effectiveness of the proposed method on CIFAR-10 and ImageNet. Compared with the state-of-the-art pruning methods, HFP demonstrates superior performances on VGGNet, ResNet and MobileNet V1/V2.

Slimming ResNet by Slimming Shortcut

Donggyu Joo, Doyeon Kim, Junmo Kim

Responsive image

Auto-TLDR; SSPruning: Slimming Shortcut Pruning on ResNet Based Networks

Slides Poster Similar

Conventional network pruning methods on convolutional neural networks (CNNs) reduce the number of input or output channels of convolution layers. With these approaches, the channels in the plain network can be pruned without any restrictions. However, in case of the ResNet based networks which have shortcuts (skip connections), the channel slimming of existing pruning methods is limited to the inside of each residual block. Since the number of Flops and parameters are also highly related to the number of channels in the shortcuts, more investigation on pruning channels in shortcuts is required. In this paper, we propose a novel pruning method, Slimming Shortcut Pruning (SSPruning), for pruning channels in shortcuts on ResNet based networks. First, we separate the long shortcut in individual regions that can be pruned independently without considering its long connections. Then, by applying our Importance Learning Gate (ILG) which learns the importance of channels globally regardless of channel type and location (i.e., in the shortcut or inside of the block), we can finally achieve an optimally pruned model. Through various experiments, we have confirmed that our method yields outstanding results when we prune the shortcuts and inside of the block together.

Knowledge Distillation with a Precise Teacher and Prediction with Abstention

Xu Yi, Jian Pu, Hui Zhao

Responsive image

Auto-TLDR; Knowledge Distillation using Deep gambler loss and selective classification framework

Slides Poster Similar

Knowledge distillation, which aims to train model under the supervision from another large model (teacher model) to the original model (student model), has achieved remarkable results in supervised learning. However, there are two major problems with existing knowledge distillation methods. One is the teacher's supervision is sometimes misleading, and the other is the student's prediction is not accurate enough. To address the first issue, instead of learning a combination of both teachers and ground truth, we apply knowledge adjustment to correct teachers' supervision using ground truth. For the second problem, we use the selective classification framework to train the student model. In particular, the deep gambler loss is adopted to predict with reservation by explicitly introducing the $(m+1)$-th class. We consider two settings of knowledge distillation: (1) distillation across different network structures ({\it AlexNet, ResNet}), and (2) distillation across networks with different depths ({\it ResNet18, ResNet50}) to evaluate the effectiveness of our method. The experimental results on benchmark datasets (i.e., {\it Fashion-MNIST, SVHN, CIFAR10, CIFAR100}) are reported with higher prediction accuracies and lower coverage errors.

Efficient Online Subclass Knowledge Distillation for Image Classification

Maria Tzelepi, Nikolaos Passalis, Anastasios Tefas

Responsive image

Auto-TLDR; OSKD: Online Subclass Knowledge Distillation

Slides Poster Similar

Deploying state-of-the-art deep learning models on embedded systems dictates certain storage and computation limitations. During the recent few years Knowledge Distillation (KD) has been recognized as a prominent approach to address this issue. That is, KD has been effectively proposed for training fast and compact deep learning models by transferring knowledge from more complex and powerful models. However, knowledge distillation, in its conventional form, involves multiple stages of training, rendering it a computationally and memory demanding procedure. In this paper, a novel single-stage self knowledge distillation method is proposed, namely Online Subclass Knowledge Distillation (OSKD), that aims at revealing the similarities inside classes, improving the performance of any deep neural model in an online manner. Hence, as opposed to existing online distillation methods, we are able to acquire further knowledge from the model itself, without building multiple identical models or using multiple models to teach each other, rendering the OSKD approach more efficient. The experimental evaluation on two datasets validates that the proposed method improves the classification performance.

Compression of YOLOv3 Via Block-Wise and Channel-Wise Pruning for Real-Time and Complicated Autonomous Driving Environment Sensing Applications

Jiaqi Li, Yanan Zhao, Li Gao, Feng Cui

Responsive image

Auto-TLDR; Pruning YOLOv3 with Batch Normalization for Autonomous Driving

Slides Poster Similar

Nowadays, in the area of autonomous driving, the computational power of the object detectors is limited by the embedded devices and the public datasets for autonomous driving are over-idealistic. In this paper, we propose a pipeline combining both block-wise pruning and channel-wise pruning to compress the object detection model iteratively. We enforce the introduced factor of the residual blocks and the scale parameters in Batch Normalization (BN) layers to sparsity to select the less important residual blocks and channels. Moreover, a modified loss function has been proposed to remedy the class-imbalance problem. After removing the unimportant structures iteratively, we get the pruned YOLOv3 trained on our datasets which have more abundant and elaborate classes. Evaluated by our validation sets on the server, the pruned YOLOv3 saves 79.7% floating point operations (FLOPs), 93.8% parameter size, 93.8% model volume and 45.4% inference times with only 4.16% mean of average precision (mAP) loss. Evaluated on the embedded device, the pruned model operates about 13 frames per second with 4.53% mAP loss. These results show that the real-time property and accuracy of the pruned YOLOv3 can meet the needs of the embedded devices in complicated autonomous driving environments.

Dynamic Multi-Path Neural Network

Yingcheng Su, Yichao Wu, Ken Chen, Ding Liang, Xiaolin Hu

Responsive image

Auto-TLDR; Dynamic Multi-path Neural Network

Slides Similar

Although deeper and larger neural networks have achieved better performance, due to overwhelming burden on computation, they cannot meet the demands of deployment on resource-limited devices. An effective strategy to address this problem is to make use of dynamic inference mechanism, which changes the inference path for different samples at runtime. Existing methods only reduce the depth by skipping an entire specific layer, which may lose important information in this layer. In this paper, we propose a novel method called Dynamic Multi-path Neural Network (DMNN), which provides more topology choices in terms of both width and depth on the fly. For better modelling the inference path selection, we further introduce previous state and object category information to guide the training process. Compared to previous dynamic inference techniques, the proposed method is more flexible and easier to incorporate into most modern network architectures. Experimental results on ImageNet and CIFAR-100 demonstrate the superiority of our method on both efficiency and classification accuracy.

On Resource-Efficient Bayesian Network Classifiers and Deep Neural Networks

Wolfgang Roth, Günther Schindler, Holger Fröning, Franz Pernkopf

Responsive image

Auto-TLDR; Quantization-Aware Bayesian Network Classifiers for Small-Scale Scenarios

Slides Poster Similar

We present two methods to reduce the complexity of Bayesian network (BN) classifiers. First, we introduce quantization-aware training using the straight-through gradient estimator to quantize the parameters of BNs to few bits. Second, we extend a recently proposed differentiable tree-augmented naive Bayes (TAN) structure learning approach to also consider the model size. Both methods are motivated by recent developments in the deep learning community, and they provide effective means to trade off between model size and prediction accuracy, which is demonstrated in extensive experiments. Furthermore, we contrast quantized BN classifiers with quantized deep neural networks (DNNs) for small-scale scenarios which have hardly been investigated in the literature. We show Pareto optimal models with respect to model size, number of operations, and test error and find that both model classes are viable options.

NAS-EOD: An End-To-End Neural Architecture Search Method for Efficient Object Detection

Huigang Zhang, Liuan Wang, Jun Sun, Li Sun, Hiromichi Kobashi, Nobutaka Imamura

Responsive image

Auto-TLDR; NAS-EOD: Neural Architecture Search for Object Detection on Edge Devices

Slides Similar

Model efficiency for object detection has become more and more important recently, especially when intelligent mobile devices are more and more convenient and developed today. Current small models for this task is either extended from the models for classification task, or pruned directly on the basis of large models. These pipelines are not task-specific or data-oriented so that their performance are not good enough for users. In this work, we propose a neural architecture search (NAS) method to build a detection model automatically that can perform well on edge devices. Specifically, the proposed method supports the search of not only multi-scale feature network, but also backbone network. This enables us to search out a global optimal model. To the best of our knowledge, it is a first attempt for searching an overall detection model via NAS. Additionally, we add latency information into the main objective during performance estimation, so that the search process can find a final model suitable for edge devices. Experiments on the PASCAL VOC benchmark indicate that the searched model (named NAS-EOD) can get good accuracy even without ImageNet pre-training. When using ImageNet pre-training, our model is superior to state-of-the-art small object detection models.

Attention Based Pruning for Shift Networks

Ghouthi Hacene, Carlos Lassance, Vincent Gripon, Matthieu Courbariaux, Yoshua Bengio

Responsive image

Auto-TLDR; Shift Attention Layers for Efficient Convolutional Layers

Slides Poster Similar

In many application domains such as computer vision, Convolutional Layers (CLs) are key to the accuracy of deep learning methods. However, it is often required to assemble a large number of CLs, each containing thousands of parameters, in order to reach state-of-the-art accuracy, thus resulting in complex and demanding systems that are poorly fitted to resource-limited devices. Recently, methods have been proposed to replace the generic convolution operator by the combination of a shift operation and a simpler 1x1 convolution. The resulting block, called Shift Layer (SL), is an efficient alternative to CLs in the sense it allows to reach similar accuracies on various tasks with faster computations and fewer parameters. In this contribution, we introduce Shift Attention Layers (SALs), which extend SLs by using an attention mechanism that learns which shifts are the best at the same time the network function is trained. We demonstrate SALs are able to outperform vanilla SLs (and CLs) on various object recognition benchmarks while significantly reducing the number of float operations and parameters for the inference.

Filter Pruning Using Hierarchical Group Sparse Regularization for Deep Convolutional Neural Networks

Kakeru Mitsuno, Takio Kurita

Responsive image

Auto-TLDR; Hierarchical Group Sparse Regularization for Sparse Convolutional Neural Networks

Slides Poster Similar

Since the convolutional neural networks are often trained with redundant parameters, it is possible to reduce redundant kernels or filters to obtain a compact network without dropping the classification accuracy. In this paper, we propose a filter pruning method using the hierarchical group sparse regularization. It is shown in our previous work that the hierarchical group sparse regularization is effective in obtaining sparse networks in which filters connected to unnecessary channels are automatically close to zero. After training the convolutional neural network with the hierarchical group sparse regularization, the unnecessary filters are selected based on the increase of the classification loss of the randomly selected training samples to obtain a compact network. It is shown that the proposed method can reduce more than 50% parameters of ResNet for CIFAR-10 with only 0.3% decrease in the accuracy of test samples. Also, 34% parameters of ResNet are reduced for TinyImageNet-200 with higher accuracy than the baseline network.

Feature Fusion for Online Mutual Knowledge Distillation

Jangho Kim, Minsung Hyun, Inseop Chung, Nojun Kwak

Responsive image

Auto-TLDR; Feature Fusion Learning Using Fusion of Sub-Networks

Slides Poster Similar

We propose a learning framework named Feature Fusion Learning (FFL) that efficiently trains a powerful classifier through a fusion module which combines the feature maps generated from parallel neural networks and generates meaningful feature maps. Specifically, we train a number of parallel neural networks as sub-networks, then we combine the feature maps from each sub-network using a fusion module to create a more meaningful feature map. The fused feature map is passed into the fused classifier for overall classification. Unlike existing feature fusion methods, in our framework, an ensemble of sub-network classifiers transfers its knowledge to the fused classifier and then the fused classifier delivers its knowledge back to each sub-network, mutually teaching one another in an online-knowledge distillation manner. This mutually teaching system not only improves the performance of the fused classifier but also obtains performance gain in each sub-network. Moreover, our model is more beneficial than other alternative methods because different types of network can be used for each sub-network. We have performed a variety of experiments on multiple datasets such as CIFAR-10, CIFAR-100 and ImageNet and proved that our method is more effective than other alternative methods in terms of performances of both sub-networks and the fused classifier, and the aspect of generating meaningful feature maps.

Exploiting Non-Linear Redundancy for Neural Model Compression

Muhammad Ahmed Shah, Raphael Olivier, Bhiksha Raj

Responsive image

Auto-TLDR; Compressing Deep Neural Networks with Linear Dependency

Slides Poster Similar

Deploying deep learning models with millions, even billions, of parameters is challenging given real world memory, power and compute constraints. In an effort to make these models more practical, in this paper, we propose a novel model compression approach that exploits linear dependence between the activations in a layer to eliminate entire structural units (neurons/convolutional filters). Our approach also adjusts the weights of the layer in a manner that is provably lossless while training if the removed neuron was perfectly predictable. We combine this approach with an annealing algorithm that may be applied during training, or even on a trained model, and demonstrate, using popular datasets, that our technique can reduce the parameters of VGG and AlexNet by more than 97\% on \cifar, 85\% on \caltech, and 19\% on ImageNet at less than 2\% loss in accuracy. Furthermore, we provide theoretical results showing that in overparametrized, locally linear (ReLU) neural networks where redundant features exist, and with correct hyperparameter selection, our method is indeed able to capture and suppress those dependencies.

Learning to Prune in Training via Dynamic Channel Propagation

Shibo Shen, Rongpeng Li, Zhifeng Zhao, Honggang Zhang, Yugeng Zhou

Responsive image

Auto-TLDR; Dynamic Channel Propagation for Neural Network Pruning

Slides Poster Similar

In this paper, we propose a novel network training mechanism called "dynamic channel propagation" to prune the model during the training period. In particular, we pick up a specific group of channels in each convolutional layer to participate in the forward propagation in training time according to the significance level of channel, which is defined as channel utility. The utility values with respect to all selected channels are updated simultaneously with the error back-propagation process and will constantly change. Furthermore, when the training ends, channels with high utility values are retained whereas those with low utility values are discarded. Hence, our proposed method trains and prunes neural networks simultaneously. We empirically evaluate our novel training method on various representative benchmark datasets and advanced convolutional neural network (CNN) architectures, including VGGNet and ResNet. The experiment results verify superior performance and robust effectiveness of our approach.

WeightAlign: Normalizing Activations by Weight Alignment

Xiangwei Shi, Yunqiang Li, Xin Liu, Jan Van Gemert

Responsive image

Auto-TLDR; WeightAlign: Normalization of Activations without Sample Statistics

Slides Poster Similar

Batch normalization (BN) allows training very deep networks by normalizing activations by mini-batch sample statistics which renders BN unstable for small batch sizes. Current small-batch solutions such as Instance Norm, Layer Norm, and Group Norm use channel statistics which can be computed even for a single sample. Such methods are less stable than BN as they critically depend on the statistics of a single input sample. To address this problem, we propose a normalization of activation without sample statistics. We present WeightAlign: a method that normalizes the weights by the mean and scaled standard derivation computed within a filter, which normalizes activations without computing any sample statistics. Our proposed method is independent of batch size and stable over a wide range of batch sizes. Because weight statistics are orthogonal to sample statistics, we can directly combine WeightAlign with any method for activation normalization. We experimentally demonstrate these benefits for classification on CIFAR-10, CIFAR-100, ImageNet, for semantic segmentation on PASCAL VOC 2012 and for domain adaptation on Office-31.

Efficient-Receptive Field Block with Group Spatial Attention Mechanism for Object Detection

Jiacheng Zhang, Zhicheng Zhao, Fei Su

Responsive image

Auto-TLDR; E-RFB: Efficient-Receptive Field Block for Deep Neural Network for Object Detection

Slides Poster Similar

Object detection has been paid rising attention in computer vision field. Convolutional Neural Networks (CNNs) extract high-level semantic features of images, which directly determine the performance of object detection. As a common solution, embedding integration modules into CNNs can enrich extracted features and thereby improve the performance. However, the instability and inconsistency of internal multiple branches exist in these modules. To address this problem, we propose a novel multibranch module called Efficient-Receptive Field Block (E-RFB), in which multiple levels of features are combined for network optimization. Specifically, by downsampling and increasing depth, the E-RFB provides sufficient RF. Second, in order to eliminate the inconsistency across different branches, a novel spatial attention mechanism, namely, Group Spatial Attention Module (GSAM) is proposed. The GSAM gradually narrows a feature map by channel grouping; thus it encodes the information between spatial and channel dimensions into the final attention heat map. Third, the proposed module can be easily joined in various CNNs to enhance feature representation as a plug-and-play component. With SSD-style detectors, our method halves the parameters of the original detection head and achieves high accuracy on the PASCAL VOC and MS COCO datasets. Moreover, the proposed method achieves superior performance compared with state-of-the-art methods based on similar framework.

ResNet-Like Architecture with Low Hardware Requirements

Elena Limonova, Daniil Alfonso, Dmitry Nikolaev, Vladimir V. Arlazarov

Responsive image

Auto-TLDR; BM-ResNet: Bipolar Morphological ResNet for Image Classification

Slides Poster Similar

One of the most computationally intensive parts in modern recognition systems is an inference of deep neural networks that are used for image classification, segmentation, enhancement, and recognition. The growing popularity of edge computing makes us look for ways to reduce its time for mobile and embedded devices. One way to decrease the neural network inference time is to modify a neuron model to make it more efficient for computations on a specific device. The example of such a model is a bipolar morphological neuron model. The bipolar morphological neuron is based on the idea of replacing multiplication with addition and maximum operations. This model has been demonstrated for simple image classification with LeNet-like architectures [1]. In the paper, we introduce a bipolar morphological ResNet (BM-ResNet) model obtained from a much more complex ResNet architecture by converting its layers to bipolar morphological ones. We apply BM-ResNet to image classification on MNIST and CIFAR-10 datasets with only a moderate accuracy decrease from 99.3% to 99.1% and from 85.3% to 85.1%. We also estimate the computational complexity of the resulting model. We show that for the majority of ResNet layers, the considered model requires 2.1-2.9 times fewer logic gates for implementation and 15-30% lower latency.

Adversarial Knowledge Distillation for a Compact Generator

Hideki Tsunashima, Shigeo Morishima, Junji Yamato, Qiu Chen, Hirokatsu Kataoka

Responsive image

Auto-TLDR; Adversarial Knowledge Distillation for Generative Adversarial Nets

Slides Poster Similar

In this paper, we propose memory-efficient Generative Adversarial Nets (GANs) in line with knowledge distillation. Most existing GANs have a shortcoming in terms of the number of model parameters and low processing speed. Here, to tackle the problem, we propose Adversarial Knowledge Distillation for Generative models (AKDG) for highly efficient GANs, in terms of unconditional generation. Using AKDG, model size and processing speed are substantively reduced. Through an adversarial training exercise with a distillation discriminator, a student generator successfully mimics a teacher generator in fewer model layers and fewer parameters and at a higher processing speed. Moreover, our AKDG is network architecture-agnostic. Comparison of AKDG-applied models to vanilla models suggests that it achieves closer scores to a teacher generator and more efficient performance than a baseline method with respect to Inception Score (IS) and Frechet Inception Distance (FID). In CIFAR-10 experiments, improving IS/FID 1.17pt/55.19pt and in LSUN bedroom experiments, improving FID 71.1pt in comparison to the conventional distillation method for GANs.

E-DNAS: Differentiable Neural Architecture Search for Embedded Systems

Javier García López, Antonio Agudo, Francesc Moreno-Noguer

Responsive image

Auto-TLDR; E-DNAS: Differentiable Architecture Search for Light-Weight Networks for Image Classification

Slides Poster Similar

Designing optimal and light weight networks to fit in resource-limited platforms like mobiles, DSPs or GPUs is a challenging problem with a wide range of interesting applications, {\em e.g.} in embedded systems for autonomous driving. While most approaches are based on manual hyperparameter tuning, there exist a new line of research, the so-called NAS (Neural Architecture Search) methods, that aim to optimize several metrics during the design process, including memory requirements of the network, number of FLOPs, number of MACs (Multiply-ACcumulate operations) or inference latency. However, while NAS methods have shown very promising results, they are still significantly time and cost consuming. In this work we introduce E-DNAS, a differentiable architecture search method, which improves the efficiency of NAS methods in designing light-weight networks for the task of image classification. Concretely, E-DNAS computes, in a differentiable manner, the optimal size of a number of meta-kernels that capture patterns of the input data at different resolutions. We also leverage on the additive property of convolution operations to merge several kernels with different compatible sizes into a single one, reducing thus the number of operations and the time required to estimate the optimal configuration. We evaluate our approach on several datasets to perform classification. We report results in terms of the SoC (System on Chips) metric, typically used in the Texas Instruments TDA2x families for autonomous driving applications. The results show that our approach allows designing low latency architectures significantly faster than state-of-the-art.

Knowledge Distillation Beyond Model Compression

Fahad Sarfraz, Elahe Arani, Bahram Zonooz

Responsive image

Auto-TLDR; Knowledge Distillation from Teacher to Student

Slides Poster Similar

Knowledge distillation (KD) is commonly deemed as an effective model compression technique in which a compact model (student) is trained under the supervision of a larger pretrained model or an ensemble of models (teacher). Various techniques have been proposed since the original formulation, which mimics different aspects of the teacher such as the representation space, decision boundary or intra-data relationship. Some methods replace the one way knowledge distillation from a static teacher with collaborative learning between a cohort of students. Despite the recent advances, a clear understanding of where knowledge resides in a deep neural network and optimal method for capturing knowledge from teacher and transferring it to student still remains an open question. In this study we provide an extensive study on 9 different knowledge distillation methods which covers a broad spectrum of approaches to capture and transfer knowledge. We demonstrate the versatility of the KD framework on different datasets and network architectures under varying capacity gaps between the teacher and student. The study provides intuition for the effects of mimicking different aspects of the teacher and derives insights from the performance of the different distillation approaches to guide the the design of more effective KD methods . Furthermore, our study shows the effectiveness of the KD framework in learning efficiently under varying severity levels of label noise and class imbalance, consistently providing significant generalization gains over standard training. We emphasize that the efficacy of KD goes much beyond a model compression technique and should be considered as a general purpose training paradigm which offers more robustness to common challenges in the real-world datasets compared to the standard training procedure.

Progressive Gradient Pruning for Classification, Detection and Domain Adaptation

Le Thanh Nguyen-Meidine, Eric Granger, Marco Pedersoli, Madhu Kiran, Louis-Antoine Blais-Morin

Responsive image

Auto-TLDR; Progressive Gradient Pruning for Iterative Filter Pruning of Convolutional Neural Networks

Slides Poster Similar

Although deep neural networks (NNs) have achieved state-of-the-art accuracy in many visual recognition tasks, the growing computational complexity and energy consumption of networks remains an issue, especially for applications on plat-forms with limited resources and requiring real-time processing.Filter pruning techniques have recently shown promising results for the compression and acceleration of convolutional NNs(CNNs). However, these techniques involve numerous steps and complex optimisations because some only prune after training CNNs, while others prune from scratch during training by integrating sparsity constraints or modifying the loss function.In this paper we propose a new Progressive Gradient Pruning(PGP) technique for iterative filter pruning during training. In contrast to previous progressive pruning techniques, it relies on a novel filter selection criterion that measures the change in filter weights, uses a new hard and soft pruning strategy and effectively adapts momentum tensors during the backward propagation pass. Experimental results obtained after training various CNNs on image data for classification, object detection and domain adaptation benchmarks indicate that the PGP technique can achieve a better trade-off between classification accuracy and network (time and memory) complexity than PSFP and other state-of-the-art filter pruning techniques.

Multi-Order Feature Statistical Model for Fine-Grained Visual Categorization

Qingtao Wang, Ke Zhang, Shaoli Huang, Lianbo Zhang, Jin Fan

Responsive image

Auto-TLDR; Multi-Order Feature Statistical Method for Fine-Grained Visual Categorization

Slides Poster Similar

Fine-grained visual categorization aims to learn a robust image representation modeling subtle differences from similar categories. Existing methods in this field tackle the problem by designing complex frameworks, which produce high-level features by performing first-order or second-order pooling. Despite the impressive performance achieved by these strategies, the single-order networks only carry linear or non-linear information of the last convolutional layer, neglecting the fact that feature from different orders are mutually complementary. In this paper, we propose a Multi-Order Feature Statistical Method (MOFS), which learns fine-grained features characterizing multiple orders. Specifically, the MOFS consists of two sub-modules: (i) a first-order module modeling both mid-level and high-level features. (ii) a covariance feature statistical module capturing high-order features. By deploying these two sub-modules on the top of existing backbone networks, MOFS simultaneously captures multi-level of discrimative patters including local, global and co-related patters. We evaluate the proposed method on three challenging benchmarks, namely CUB-200-2011, Stanford Cars, and FGVC-Aircraft. Compared with state-of-the-art methods, experiments results exhibit superior performance in recognizing fine-grained objects

Learning Sparse Deep Neural Networks Using Efficient Structured Projections on Convex Constraints for Green AI

Michel Barlaud, Frederic Guyard

Responsive image

Auto-TLDR; Constrained Deep Neural Network with Constrained Splitting Projection

Slides Poster Similar

In recent years, deep neural networks (DNN) have been applied to different domains and achieved dramatic performance improvements over state-of-the-art classical methods. These performances of DNNs were however often obtained with networks containing millions of parameters and which training required heavy computational power. In order to cope with this computational issue a huge literature deals with proximal regularization methods which are time consuming.\\ In this paper, we propose instead a constrained approach. We provide the general framework for our new splitting projection gradient method. Our splitting algorithm iterates a gradient step and a projection on convex sets. We study algorithms for different constraints: the classical $\ell_1$ unstructured constraint and structured constraints such as the nuclear norm, the $\ell_{2,1} $ constraint (Group LASSO). We propose a new $\ell_{1,1} $ structured constraint for which we provide a new projection algorithm We demonstrate the effectiveness of our method on three popular datasets (MNIST, Fashion MNIST and CIFAR). Experiments on these datasets show that our splitting projection method with our new $\ell_{1,1} $ structured constraint provides the best reduction of memory and computational power. Experiments show that fully connected linear DNN are more efficient for green AI.

Exploiting Distilled Learning for Deep Siamese Tracking

Chengxin Liu, Zhiguo Cao, Wei Li, Yang Xiao, Shuaiyuan Du, Angfan Zhu

Responsive image

Auto-TLDR; Distilled Learning Framework for Siamese Tracking

Slides Poster Similar

Existing deep siamese trackers are typically built on off-the-shelf CNN models for feature learning, with the demand for huge power consumption and memory storage. This limits current deep siamese trackers to be carried on resource-constrained devices like mobile phones, given factor that such a deployment normally requires cost-effective considerations. In this work, we address this issue by presenting a novel Distilled Learning Framework(DLF) for siamese tracking, which aims at learning tracking model with efficiency and high accuracy. Specifically, we propose two simple yet effective knowledge distillation strategies, denote as point-wise distillation and pair-wise distillation, which are designed for transferring knowledge from a more discriminative teacher tracker into a compact student tracker. In this way, cost-effective and high performance tracking could be achieved. Extensive experiments on several tracking benchmarks demonstrate the effectiveness of our proposed method.

Activation Density Driven Efficient Pruning in Training

Timothy Foldy-Porto, Yeshwanth Venkatesha, Priyadarshini Panda

Responsive image

Auto-TLDR; Real-Time Neural Network Pruning with Compressed Networks

Slides Poster Similar

Neural network pruning with suitable retraining can yield networks with considerably fewer parameters than the original with comparable degrees of accuracy. Typical pruning methods require large, fully trained networks as a starting point from which they perform a time-intensive iterative pruning and retraining procedure to regain the original accuracy. We propose a novel pruning method that prunes a network real-time during training, reducing the overall training time to achieve an efficient compressed network. We introduce an activation density based analysis to identify the optimal relative sizing or compression for each layer of the network. Our method is architecture agnostic, allowing it to be employed on a wide variety of systems. For VGG-19 and ResNet18 on CIFAR-10, CIFAR-100, and TinyImageNet, we obtain exceedingly sparse networks (up to $200 \times$ reduction in parameters and over $60 \times$ reduction in inference compute operations in the best case) with accuracy comparable to the baseline network. By reducing the network size periodically during training, we achieve total training times that are shorter than those of previously proposed pruning methods. Furthermore, training compressed networks at different epochs with our proposed method yields considerable reduction in training compute complexity ($1.6\times$ to $3.2\times$ lower) at near iso-accuracy as compared to a baseline network trained entirely from scratch.

Adaptive Noise Injection for Training Stochastic Student Networks from Deterministic Teachers

Yi Xiang Marcus Tan, Yuval Elovici, Alexander Binder

Responsive image

Auto-TLDR; Adaptive Stochastic Networks for Adversarial Attacks

Slides Similar

Adversarial attacks have been a prevalent problem causing misclassification in machine learning models, with stochasticity being a promising direction towards greater robustness. However, stochastic networks frequently underperform compared to deterministic deep networks. In this work, we present a conceptually clear adaptive noise injection mechanism in combination with teacher-initialisation, which adjusts its degree of randomness dynamically through the computation of mini-batch statistics. This mechanism is embedded within a simple framework to obtain stochastic networks from existing deterministic networks. Our experiments show that our method is able to outperform prior baselines under white-box settings, exemplified through CIFAR-10 and CIFAR-100. Following which, we perform in-depth analysis on varying different components of training with our approach on the effects of robustness and accuracy, through the study of the evolution of decision boundary and trend curves of clean accuracy/attack success over differing degrees of stochasticity. We also shed light on the effects of adversarial training on a pre-trained network, through the lens of decision boundaries.

Smart Inference for Multidigit Convolutional Neural Network Based Barcode Decoding

Duy-Thao Do, Tolcha Yalew, Tae Joon Jun, Daeyoung Kim

Responsive image

Auto-TLDR; Smart Inference for Barcode Decoding using Deep Convolutional Neural Network

Slides Poster Similar

Barcodes are ubiquitous and have been used in most of critical daily activities for decades. However, most of traditional decoders require well-founded barcode under a relatively standard condition. While wilder conditioned barcodes such as underexposed, occluded, blurry, wrinkled and rotated are commonly captured in reality, those traditional decoders show weakness of recognizing. Several works attempted to solve those challenging barcodes, but many limitations still exist. This work aims to solve the decoding problem using deep convolutional neural network with the possibility of running on portable devices. Firstly, we proposed a special modification of inference based on the feature of having checksum and test-time augmentation, named as Smart Inference (SI) in prediction phase of a trained model. SI considerably boosts accuracy and reduces the false prediction for trained models. Secondly, we have created a large practical evaluation dataset of real captured 1D barcode under various challenging conditions to test our methods vigorously, which is publicly available for other researchers. The experiments' results demonstrated the SI effectiveness with the highest accuracy of 95.85% which outperformed many existing decoders on the evaluation set. Finally, we successfully minimized the best model by knowledge distillation to a shallow model which is shown to have high accuracy (90.85%) with good inference speed of 34.2 ms per image on a real edge device.

On the Information of Feature Maps and Pruning of Deep Neural Networks

Mohammadreza Soltani, Suya Wu, Jie Ding, Robert Ravier, Vahid Tarokh

Responsive image

Auto-TLDR; Compressing Deep Neural Models Using Mutual Information

Slides Poster Similar

A technique for compressing deep neural models achieving competitive performance to state-of-the-art methods is proposed. The approach utilizes the mutual information between the feature maps and the output of the model in order to prune the redundant layers of the network. Extensive numerical experiments on both CIFAR-10, CIFAR-100, and Tiny ImageNet data sets demonstrate that the proposed method can be effective in compressing deep models, both in terms of the numbers of parameters and operations. For instance, by applying the proposed approach to DenseNet model with 0.77 million parameters and 293 million operations for classification of CIFAR-10 data set, a reduction of 62.66% and 41.00% in the number of parameters and the number of operations are respectively achieved, while increasing the test error only by less than 1%.

A Boundary-Aware Distillation Network for Compressed Video Semantic Segmentation

Hongchao Lu

Responsive image

Auto-TLDR; A Boundary-Aware Distillation Network for Video Semantic Segmentation

Slides Poster Similar

In recent years optical flow is often estimated to reuse features so as to accelerate video semantic segmentation. With addition of optical flow network, however, extra cost may incur and accuracy may thus be degraded because of repeated warping operation. In this paper, we propose a boundary-aware distillation network (BDNet) that replaces optical flow network with block motion vectors encoded in compressed video, resulting in negligible computational complexity. In order to make salient features, an auxiliary boundary-aware stream is added to the main stream to jointly estimate silhouette and segmentation of objects. To further correct warped features, a well-trained teacher network is employed to transfer knowledge to the main stream. Both boundary-aware stream and the teacher network are neglected during inference stage, so that video segmentation network enables to get faster without increasing any computational burden. By splitting the task into three components, our BDNet shows almost 10% time saving as well as 1.6% accuracy improvement over baseline on the Cityscapes dataset.

Exploiting Elasticity in Tensor Ranks for Compressing Neural Networks

Jie Ran, Rui Lin, Hayden Kwok-Hay So, Graziano Chesi, Ngai Wong

Responsive image

Auto-TLDR; Nuclear-Norm Rank Minimization Factorization for Deep Neural Networks

Slides Poster Similar

Elasticities in depth, width, kernel size and resolution have been explored in compressing deep neural networks (DNNs). Recognizing that the kernels in a convolutional neural network (CNN) are 4-way tensors, we further exploit a new elasticity dimension along the input-output channels. Specifically, a novel nuclear-norm rank minimization factorization (NRMF) approach is proposed to dynamically and globally search for the reduced tensor ranks during training. Correlation between tensor ranks across multiple layers is revealed, and a graceful tradeoff between model size and accuracy is obtained. Experiments then show the superiority of NRMF over the previous non-elastic variational Bayesian matrix factorization (VBMF) scheme.

Real-Time Monocular Depth Estimation with Extremely Light-Weight Neural Network

Mian Jhong Chiu, Wei-Chen Chiu, Hua-Tsung Chen, Jen-Hui Chuang

Responsive image

Auto-TLDR; Real-Time Light-Weight Depth Prediction for Obstacle Avoidance and Environment Sensing with Deep Learning-based CNN

Slides Poster Similar

Obstacle avoidance and environment sensing are crucial applications in autonomous driving and robotics. Among all types of sensors, RGB camera is widely used in these applications as it can offer rich visual contents with relatively low-cost, and using a single image to perform depth estimation has become one of the main focuses in resent research works. However, prior works usually rely on highly complicated computation and power-consuming GPU to achieve such task; therefore, we focus on developing a real-time light-weight system for depth prediction in this paper. Based on the well-known encoder-decoder architecture, we propose a supervised learning-based CNN with detachable decoders that produce depth predictions with different scales. We also formulate a novel log-depth loss function that computes the difference of predicted depth map and ground truth depth map in log space, so as to increase the prediction accuracy for nearby locations. To train our model efficiently, we generate depth map and semantic segmentation with complex teacher models. Via a series of ablation studies and experiments, it is validated that our model can efficiently performs real-time depth prediction with only 0.32M parameters, with the best trained model outperforms previous works on KITTI dataset for various evaluation matrices.

Adaptive Image Compression Using GAN Based Semantic-Perceptual Residual Compensation

Ruojing Wang, Zitang Sun, Sei-Ichiro Kamata, Weili Chen

Responsive image

Auto-TLDR; Adaptive Image Compression using GAN based Semantic-Perceptual Residual Compensation

Slides Poster Similar

Image compression is a basic task in image processing. In this paper, We present an adaptive image compression algorithm that relies on GAN based semantic-perceptual residual compensation, which is available to offer visually pleasing reconstruction at a low bitrate. Our method adopt an U-shaped encoding and decoding structure accompanied by a well-designed dense residual connection with strip pooling module to improve the original auto-encoder. Besides, we introduce the idea of adversarial learning by introducing a discriminator thus constructed a complete GAN. To improve the coding efficiency, we creatively designed an adaptive semantic-perception residual compensation block based on Grad-CAM algorithm. In the improvement of the quantizer, we embed the method of soft-quantization so as to solve the problem to some extent that back propagation process is irreversible. Simultaneously, we use the latest FLIF lossless compression algorithm and BPG vector compression algorithm to perform deeper compression on the image. More importantly experimental results including PSNR, MS-SSIM demonstrate that the proposed approach outperforms the current state-of-the-art image compression methods.

Compression Strategies and Space-Conscious Representations for Deep Neural Networks

Giosuè Marinò, Gregorio Ghidoli, Marco Frasca, Dario Malchiodi

Responsive image

Auto-TLDR; Compression of Large Convolutional Neural Networks by Weight Pruning and Quantization

Slides Poster Similar

Recent advances in deep learning have made available large, powerful convolutional neural networks (CNN) with state-of-the-art performance in several real-world applications. Unfortunately, these large-sized models have millions of parameters, thus they are not deployable on resource-limited platforms (e.g. where RAM is limited). Compression of CNNs thereby becomes a critical problem to achieve memory-efficient and possibly computationally faster model representations. In this paper, we investigate the impact of lossy compression of CNNs by weight pruning and quantization, and lossless weight matrix representations based on source coding. We tested several combinations of these techniques on four benchmark datasets for classification and regression problems, achieving compression rates up to 165 times, while preserving or improving the model performance.

MINT: Deep Network Compression Via Mutual Information-Based Neuron Trimming

Madan Ravi Ganesh, Jason Corso, Salimeh Yasaei Sekeh

Responsive image

Auto-TLDR; Mutual Information-based Neuron Trimming for Deep Compression via Pruning

Slides Poster Similar

Most approaches to deep neural network compression via pruning either evaluate a filter’s importance using its weights or optimize an alternative objective function with sparsity constraints. While these methods offer a useful way to approximate contributions from similar filters, they often either ignore the dependency between layers or solve a more difficult optimization objective than standard cross-entropy. Our method, Mutual Information-based Neuron Trimming (MINT), approaches deep compression via pruning by enforcing sparsity based on the strength of the relationship between filters of adjacent layers, across every pair of layers. The relationship is calculated using conditional geometric mutual information which evaluates the amount of similar information exchanged between the filters using a graph-based criterion. When pruning a network, we ensure that retained filters contribute the majority of the information towards succeeding layers which ensures high performance. Our novel approach outperforms existing state-of-the-art compression-via-pruning methods on the standard benchmarks for this task: MNIST, CIFAR-10, and ILSVRC2012, across a variety of network architectures. In addition, we discuss our observations of a common denominator between our pruning methodology’s response to adversarial attacks and calibration statistics when compared to the original network.

Fast and Accurate Real-Time Semantic Segmentation with Dilated Asymmetric Convolutions

Leonel Rosas-Arias, Gibran Benitez-Garcia, Jose Portillo-Portillo, Gabriel Sanchez-Perez, Keiji Yanai

Responsive image

Auto-TLDR; FASSD-Net: Dilated Asymmetric Pyramidal Fusion for Real-Time Semantic Segmentation

Slides Poster Similar

Recent works have shown promising results applied to real-time semantic segmentation tasks. To maintain fast inference speed, most of the existing networks make use of light decoders, or they simply do not use them at all. This strategy helps to maintain a fast inference speed; however, their accuracy performance is significantly lower in comparison to non-real-time semantic segmentation networks. In this paper, we introduce two key modules aimed to design a high-performance decoder for real-time semantic segmentation for reducing the accuracy gap between real-time and non-real-time segmentation networks. Our first module, Dilated Asymmetric Pyramidal Fusion (DAPF), is designed to substantially increase the receptive field on the top of the last stage of the encoder, obtaining richer contextual features. Our second module, Multi-resolution Dilated Asymmetric (MDA) module, fuses and refines detail and contextual information from multi-scale feature maps coming from early and deeper stages of the network. Both modules exploit contextual information without excessively increasing the computational complexity by using asymmetric convolutions. Our proposed network entitled “FASSD-Net” reaches 78.8% of mIoU accuracy on the Cityscapes validation dataset at 41.1 FPS on full resolution images (1024x2048). Besides, with a light version of our network, we reach 74.1% of mIoU at 133.1 FPS (full resolution) on a single NVIDIA GTX 1080Ti card with no additional acceleration techniques. The source code and pre-trained models are available at https://github.com/GibranBenitez/FASSD-Net.

Joint Face Alignment and 3D Face Reconstruction with Efficient Convolution Neural Networks

Keqiang Li, Huaiyu Wu, Xiuqin Shang, Zhen Shen, Gang Xiong, Xisong Dong, Bin Hu, Fei-Yue Wang

Responsive image

Auto-TLDR; Mobile-FRNet: Efficient 3D Morphable Model Alignment and 3D Face Reconstruction from a Single 2D Facial Image

Slides Poster Similar

3D face reconstruction from a single 2D facial image is a challenging and concerned problem. Recent methods based on CNN typically aim to learn parameters of 3D Morphable Model (3DMM) from 2D images to render face alignment and 3D face reconstruction. Most algorithms are designed for faces with small, medium yaw angles, which is extremely challenging to align faces in large poses. At the same time, they are not efficient usually. The main challenge is that it takes time to determine the parameters accurately. In order to address this challenge with the goal of improving performance, this paper proposes a novel and efficient end-to-end framework. We design an efficient and lightweight network model combined with Depthwise Separable Convolution and Muti-scale Representation, Lightweight Attention Mechanism, named Mobile-FRNet. Simultaneously, different loss functions are used to constrain and optimize 3DMM parameters and 3D vertices during training to improve the performance of the network. Meanwhile, extensive experiments on the challenging datasets show that our method significantly improves the accuracy of face alignment and 3D face reconstruction. The model parameters and complexity of our method are also improved greatly.

AOAM: Automatic Optimization of Adjacency Matrix for Graph Convolutional Network

Yuhang Zhang, Hongshuai Ren, Jiexia Ye, Xitong Gao, Yang Wang, Kejiang Ye, Cheng-Zhong Xu

Responsive image

Auto-TLDR; Adjacency Matrix for Graph Convolutional Network in Non-Euclidean Space

Slides Poster Similar

Graph Convolutional Network (GCN) is adopted to tackle the problem of the convolution operation in non-Euclidean space. Although previous works on GCN have made some progress, one of their limitations is that their input Adjacency Matrix (AM) is designed manually and requires domain knowledge, which is cumbersome, tedious and error-prone. In addition, entries of this fixed Adjacency Matrix are generally designed as binary values (i.e., ones and zeros) which can not reflect more complex relationship between nodes. However, many applications require a weighted and dynamic Adjacency Matrix instead of an unweighted and fixed Adjacency Matrix. To this end, there are few works focusing on designing a more flexible Adjacency Matrix. In this paper, we propose an end-to-end algorithm to improve the GCN performance by focusing on the Adjacency Matrix. We first provide a calculation method that called node information entropy to update the matrix. Then, we analyze the search strategy in a continuous space and introduce the Deep Deterministic Policy Gradient (DDPG) method to overcome the demerit of the discrete space search. Finally, we integrate the GCN and reinforcement learning into an end-to-end framework. Our method can automatically define the adjacency matrix without artificial knowledge. At the same time, the proposed approach can deal with any size of the matrix and provide a better value for the network. Four popular datasets are selected to evaluate the capability of our algorithm. The method in this paper achieves the state-of-the-art performance on Cora and Pubmed datasets, respectively, with the accuracy of 84.6% and 81.6%.

Stochastic Label Refinery: Toward Better Target Label Distribution

Xi Fang, Jiancheng Yang, Bingbing Ni

Responsive image

Auto-TLDR; Stochastic Label Refinery for Deep Supervised Learning

Slides Poster Similar

This paper proposes a simple yet effective strategy for improving deep supervised learning, named Stochastic Label Refinery (SLR), by refining training labels to more informative labels. When training a neural network, target distributions (or ground-truth) are typically "hard", which means the target label of each category consists of only 0 and 1. However, the fixed "hard" target distributions do not capture association between categories or that between objects. In this study, instead of using the hard target distributions, we iteratively generate "soft" target label distributions for training the neural networks, which leads to better performances. The soft target distributions are obtained via an Expectation-Maximization (EM) iteration, where the "true" target distributions and the learned models are regarded as hidden variables. In E step, the models are optimized to approximate the target distributions on stochastic splits of training data; In M step, the target distributions are updated with predicted pseudo-label on leave-out splits. Extensive experiments on classification and ordinal regression tasks, empirically prove that the refined target distribution consistently leads to considerable performance improvements even applied on competitive baselines. Notably, in DeepDR 2020 Diabetic Retinopathy Grading (DeepDRiD) challenge, our method improves the quadratic weighted kappa on official validation set from 0.8247 to 0.8348 and achieves a state-of-the-art score on online test set. The proposed SLR technique is easy to implement and practically applicable. The code will be open sourced soon.

Softer Pruning, Incremental Regularization

Linhang Cai, Zhulin An, Yongjun Xu

Responsive image

Auto-TLDR; Asymptotic SofteR Filter Pruning for Deep Neural Network Pruning

Slides Poster Similar

Network pruning is widely used to compress Deep Neural Networks (DNNs). The Soft Filter Pruning (SFP) method zeroizes the pruned filters during training while updating them in the next training epoch. Thus the trained information of the pruned filters is completely dropped. To utilize the trained pruned filters, we proposed a SofteR Filter Pruning (SRFP) method and its variant, Asymptotic SofteR Filter Pruning (ASRFP), simply decaying the pruned weights with a monotonic decreasing parameter. Our methods perform well across various netowrks, datasets and pruning rates, also transferable to weight pruning. On ILSVRC-2012, ASRFP prunes 40% of the parameters on ResNet-34 with 1.63% top-1 and 0.68% top-5 accuracy improvement. In theory, SRFP and ASRFP are an incremental regularization of the pruned filters. Besides, We note that SRFP and ASRFP pursue better results while slowing down the speed of convergence.