Dmitry Nikolaev

Papers from this author

Approach for Document Detection by Contours and Contrasts

Daniil Tropin, Sergey Ilyuhin, Dmitry Nikolaev, Vladimir V. Arlazarov

Responsive image

Auto-TLDR; A countor-based method for arbitrary document detection on a mobile device

Slides Poster Similar

This paper considers the task of arbitrary document detection performed on a mobile device. The classical contour-based approach often mishandles cases with occlusion, complex background, or blur. Region-based approach, which relies on the contrast between object and background, does not have limitations, however its known implementations are highly resource-consuming. We propose a modification of a countor-based method, in which the competing hypotheses of the contour location are ranked according to the contrast between the areas inside and outside the border. In the performed experiments such modification leads to the 40% decrease of alternatives ordering errors and 10% decrease of the overall number of detection errors. We updated state-of-the-art performance on the open MIDV-500 dataset and demonstrated competitive results with the state-of-the-art on the SmartDoc dataset.

Fast Implementation of 4-Bit Convolutional Neural Networks for Mobile Devices

Anton Trusov, Elena Limonova, Dmitry Slugin, Dmitry Nikolaev, Vladimir V. Arlazarov

Responsive image

Auto-TLDR; Efficient Quantized Low-Precision Neural Networks for Mobile Devices

Slides Poster Similar

Quantized low-precision neural networks are very popular because they require less computational resources for inference and can provide high performance, which is vital for real-time and embedded recognition systems. However, their advantages are apparent for FPGA and ASIC devices, while general-purpose processor architectures are not always able to perform low-bit integer computations efficiently. The most frequently used low-precision neural network model for mobile central processors is an 8-bit quantized network. However, in a number of cases, it is possible to use fewer bits for weights and activations, and the only problem is the difficulty of efficient implementation. We introduce an efficient implementation of 4-bit matrix multiplication for quantized neural networks and perform time measurements on a mobile ARM processor. It shows 2.9 times speedup compared to standard floating-point multiplication and is 1.5 times faster than 8-bit quantized one. We also demonstrate a 4-bit quantized neural network for OCR recognition on the MIDV-500 dataset. 4-bit quantization gives 95.0% accuracy and 48% overall inference speedup, while an 8-bit quantized network gives 95.4% accuracy and 39% speedup. The results show that 4-bit quantization perfectly suits mobile devices, yielding good enough accuracy and low inference time.

ResNet-Like Architecture with Low Hardware Requirements

Elena Limonova, Daniil Alfonso, Dmitry Nikolaev, Vladimir V. Arlazarov

Responsive image

Auto-TLDR; BM-ResNet: Bipolar Morphological ResNet for Image Classification

Slides Poster Similar

One of the most computationally intensive parts in modern recognition systems is an inference of deep neural networks that are used for image classification, segmentation, enhancement, and recognition. The growing popularity of edge computing makes us look for ways to reduce its time for mobile and embedded devices. One way to decrease the neural network inference time is to modify a neuron model to make it more efficient for computations on a specific device. The example of such a model is a bipolar morphological neuron model. The bipolar morphological neuron is based on the idea of replacing multiplication with addition and maximum operations. This model has been demonstrated for simple image classification with LeNet-like architectures [1]. In the paper, we introduce a bipolar morphological ResNet (BM-ResNet) model obtained from a much more complex ResNet architecture by converting its layers to bipolar morphological ones. We apply BM-ResNet to image classification on MNIST and CIFAR-10 datasets with only a moderate accuracy decrease from 99.3% to 99.1% and from 85.3% to 85.1%. We also estimate the computational complexity of the resulting model. We show that for the majority of ResNet layers, the considered model requires 2.1-2.9 times fewer logic gates for implementation and 15-30% lower latency.