ICPR2020 Paper Browser

Paper download is intended for registered attendees only and is subject to the IEEE Copyright Policy. Any other use is strictly forbidden.

Fast Implementation of 4-Bit Convolutional Neural Networks for Mobile Devices

Anton Trusov, Elena Limonova, Dmitry Slugin, Dmitry Nikolaev, Vladimir V. Arlazarov

Auto-TLDR; Efficient Quantized Low-Precision Neural Networks for Mobile Devices

Abstract

Quantized low-precision neural networks are popular because they require fewer computational resources for inference while still delivering high performance, which is vital for real-time and embedded recognition systems. However, their advantages are most apparent on FPGA and ASIC devices, whereas general-purpose processor architectures cannot always perform low-bit integer computations efficiently. The most frequently used low-precision model on mobile central processors is the 8-bit quantized network. In a number of cases, however, fewer bits suffice for weights and activations, and the only obstacle is the difficulty of an efficient implementation. We introduce an efficient implementation of 4-bit matrix multiplication for quantized neural networks and measure its running time on a mobile ARM processor. It is 2.9 times faster than standard floating-point multiplication and 1.5 times faster than 8-bit quantized multiplication. We also demonstrate a 4-bit quantized neural network for OCR on the MIDV-500 dataset. 4-bit quantization gives 95.0% accuracy and a 48% overall inference speedup, while an 8-bit quantized network gives 95.4% accuracy and a 39% speedup. The results show that 4-bit quantization is well suited to mobile devices, yielding sufficiently good accuracy and low inference time.
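The arithmetic behind a 4-bit quantized matrix product can be illustrated with a minimal pure-Python sketch. It is not the authors' ARM kernel: the affine (scale and zero-point) quantization scheme, the two-codes-per-byte packing layout, and all function names (quantize_4bit, pack_nibbles, unpack_nibbles, matmul_4bit) are assumptions made for illustration only, and the code says nothing about the reported speedups.

```python
from typing import List, Tuple


def quantize_4bit(values: List[float]) -> Tuple[List[int], float, int]:
    """Affine quantization of floats to unsigned 4-bit codes in [0, 15].

    Assumed scheme (not taken from the paper): v ~ scale * (code - zero_point).
    """
    lo = min(min(values), 0.0)  # keep 0.0 exactly representable
    hi = max(max(values), 0.0)
    scale = (hi - lo) / 15 or 1.0  # fall back to 1.0 for an all-zero input
    zero_point = round(-lo / scale)
    codes = [max(0, min(15, round(v / scale) + zero_point)) for v in values]
    return codes, scale, zero_point


def pack_nibbles(codes: List[int]) -> bytes:
    """Store two 4-bit codes per byte: the first in the low nibble."""
    if len(codes) % 2:
        codes = codes + [0]  # pad odd-length input with a zero code
    return bytes(codes[i] | (codes[i + 1] << 4) for i in range(0, len(codes), 2))


def unpack_nibbles(packed: bytes, count: int) -> List[int]:
    """Inverse of pack_nibbles; `count` drops the padding code, if any."""
    out: List[int] = []
    for byte in packed:
        out.append(byte & 0x0F)
        out.append(byte >> 4)
    return out[:count]


def matmul_4bit(a_rows: List[bytes], a_zp: int,
                b_cols: List[bytes], b_zp: int, depth: int) -> List[List[int]]:
    """Integer product of packed rows of A with packed columns of B.

    Returns integer accumulators; multiply each by scale_a * scale_b
    to recover an approximation of the floating-point product.
    """
    result = []
    for row in a_rows:
        a = unpack_nibbles(row, depth)
        out_row = []
        for col in b_cols:
            b = unpack_nibbles(col, depth)
            out_row.append(sum((x - a_zp) * (y - b_zp) for x, y in zip(a, b)))
        result.append(out_row)
    return result
```

Packing two codes per byte halves the memory traffic relative to 8-bit storage, which is one plausible source of the gain over an 8-bit kernel; an optimized implementation would operate on the packed data with SIMD instructions instead of unpacking in a Python loop.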

Similar papers

ResNet-Like Architecture with Low Hardware Requirements

Elena Limonova, Daniil Alfonso, Dmitry Nikolaev, Vladimir V. Arlazarov

Auto-TLDR; BM-ResNet: Bipolar Morphological ResNet for Image Classification

Towards Low-Bit Quantization of Deep Neural Networks with Limited Data

Compression Strategies and Space-Conscious Representations for Deep Neural Networks

VPU Specific CNNs through Neural Architecture Search

Resource-efficient DNNs for Keyword Spotting using Neural Architecture Search and Quantization

Neural Compression and Filtering for Edge-assisted Real-time Object Detection in Challenged Networks

Fast Approximate Modelling of the Next Combination Result for Stopping the Text Recognition in a Video

On Resource-Efficient Bayesian Network Classifiers and Deep Neural Networks

E-DNAS: Differentiable Neural Architecture Search for Embedded Systems

Learning Sparse Deep Neural Networks Using Efficient Structured Projections on Convex Constraints for Green AI

Fast and Accurate Real-Time Semantic Segmentation with Dilated Asymmetric Convolutions

Slimming ResNet by Slimming Shortcut

Attention Based Pruning for Shift Networks

HFP: Hardware-Aware Filter Pruning for Deep Convolutional Neural Networks Acceleration

A Discriminant Information Approach to Deep Neural Network Pruning

Trainable Spectrally Initializable Matrix Transformations in Convolutional Neural Networks

Exploiting Elasticity in Tensor Ranks for Compressing Neural Networks

Multimodal Side-Tuning for Document Classification

Smart Inference for Multidigit Convolutional Neural Network Based Barcode Decoding

Dynamic Multi-Path Neural Network

ConvMath : A Convolutional Sequence Network for Mathematical Expression Recognition

Progressive Gradient Pruning for Classification, Detection and Domain Adaptation

Porting a Convolutional Neural Network for Stereo Matching in Hardware

Compact CNN Structure Learning by Knowledge Distillation

Directional Graph Networks with Hard Weight Assignments

Neuron-Based Network Pruning Based on Majority Voting

FastSal: A Computationally Efficient Network for Visual Saliency Prediction

Not All Domains Are Equally Complex: Adaptive Multi-Domain Learning

Compression of YOLOv3 Via Block-Wise and Channel-Wise Pruning for Real-Time and Complicated Autonomous Driving Environment Sensing Applications

A Gated and Bifurcated Stacked U-Net Module for Document Image Dewarping

Learning to Prune in Training via Dynamic Channel Propagation

The DeepHealth Toolkit: A Unified Framework to Boost Biomedical Applications

Watch Your Strokes: Improving Handwritten Text Recognition with Deformable Convolutions

Adaptive Image Compression Using GAN Based Semantic-Perceptual Residual Compensation

On-Device Text Image Super Resolution

Cross-People Mobile-Phone Based Airwriting Character Recognition

Approach for Document Detection by Contours and Contrasts

Real-Time Monocular Depth Estimation with Extremely Light-Weight Neural Network

Softer Pruning, Incremental Regularization

Temporal Binary Representation for Event-Based Action Recognition

Efficient Online Subclass Knowledge Distillation for Image Classification

Fast and Efficient Neural Network for Light Field Disparity Estimation

Fourier Domain Pruning of MobileNet-V2 with Application to Video Based Wildfire Detection

Vision-Based Layout Detection from Scientific Literature Using Recurrent Convolutional Neural Networks

Light3DPose: Real-Time Multi-Person 3D Pose Estimation from Multiple Views

PICK: Processing Key Information Extraction from Documents Using Improved Graph Learning-Convolutional Networks

Recursive Recognition of Offline Handwritten Mathematical Expressions

Hierarchical Deep Hashing for Fast Large Scale Image Retrieval