ICPR2020 Paper Browser

Paper download is intended for registered attendees only and is subject to the IEEE Copyright Policy. Any other use is strictly forbidden.

On the Impact of Lossy Image and Video Compression on the Performance of Deep Convolutional Neural Network Architectures

Matt Poyser, Toby Breckon, Amir Atapour-Abarghouei

Auto-TLDR; The Impact of Lossy Image Compression on Deep Neural Networks for Image-based Detection and Classification

Recent advances in generalized image understanding have seen a surge in the use of deep convolutional neural networks (CNN) across a broad range of image-based detection, classification and prediction tasks. Whilst the reported performance of these approaches is impressive, this paper investigates the hitherto unaddressed question of how commonplace image and video compression techniques affect the performance of such deep learning architectures. Focusing on JPEG and H.264 (MPEG-4 AVC) as representative proxies for the contemporary lossy image/video compression techniques in common use within network-connected image/video devices and infrastructure, we examine the impact on performance across five discrete tasks: human pose estimation, semantic segmentation, object detection, action recognition, and monocular depth estimation. As such, this study includes a variety of network architectures and genres spanning end-to-end convolution, encoder-decoder, region-based CNN (R-CNN), dual-stream, and generative adversarial networks (GAN). Our results show a non-linear and non-uniform relationship between network performance and the level of lossy compression applied. Notably, performance decreases significantly once the JPEG quality (quantization) level falls below 15% or the H.264 Constant Rate Factor (CRF) rises above 40 (higher CRF values mean stronger compression). However, re-training said architectures on pre-compressed imagery recovers network performance by up to 78.4% in some cases. Furthermore, there is a correlation between architectures employing an encoder-decoder pipeline and those that demonstrate resilience to lossy image compression. The characteristics of this relationship between input compression and output performance can be used to inform design decisions within future image/video devices and infrastructure.
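To make the compression settings in the abstract concrete, the following is a minimal sketch of how test inputs at a given JPEG quality or H.264 CRF might be produced. It is not the authors' pipeline: it assumes the Pillow library for JPEG re-encoding, assumes that the abstract's quality percentage maps directly onto Pillow's `quality` parameter, and only builds (does not run) an ffmpeg command line using the libx264 encoder. The function names `jpeg_recompress` and `h264_command` are hypothetical.

```python
import io

from PIL import Image  # assumption: Pillow is installed


def jpeg_recompress(image, quality):
    """Encode a PIL image as JPEG at `quality`, then decode it again.

    Returns the decoded (lossy) image and the encoded size in bytes.
    Assumption: the paper's "quality level" maps onto Pillow's `quality`.
    """
    buf = io.BytesIO()
    image.convert("RGB").save(buf, format="JPEG", quality=quality)
    encoded = buf.getvalue()
    decoded = Image.open(io.BytesIO(encoded))
    decoded.load()  # force decoding now, so the buffer is no longer needed
    return decoded, len(encoded)


def h264_command(src, dst, crf):
    """Build an ffmpeg argument list that re-encodes `src` with libx264.

    A higher CRF means stronger compression and lower visual quality.
    The command is only constructed here, not executed.
    """
    return ["ffmpeg", "-y", "-i", src, "-c:v", "libx264", "-crf", str(crf), dst]


if __name__ == "__main__":
    # Synthetic noise frame as a stand-in for a real test image.
    frame = Image.effect_noise((128, 128), 40)
    for q in (95, 75, 50, 15, 5):
        _, nbytes = jpeg_recompress(frame, q)
        print(f"JPEG quality {q:3d}: {nbytes} bytes")
    print(" ".join(h264_command("input.mp4", "output_crf40.mp4", 40)))
```

In an evaluation of the kind the abstract describes, the decoded images returned by `jpeg_recompress` would be fed to the network under test in place of the uncompressed inputs, sweeping the quality level to locate where performance starts to fall.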

Similar papers

Object Detection in the DCT Domain: Is Luminance the Solution?

Benjamin Deguerre, Clement Chatelain, Gilles Gasso

Auto-TLDR; Jpeg Deep: Object Detection Using Compressed JPEG Images

Adaptive Image Compression Using GAN Based Semantic-Perceptual Residual Compensation

A Grid-Based Representation for Human Action Recognition

A NoGAN Approach for Image and Video Restoration and Compression Artifact Removal

Real-Time Monocular Depth Estimation with Extremely Light-Weight Neural Network

Enhancing Semantic Segmentation of Aerial Images with Inhibitory Neurons

StrongPose: Bottom-up and Strong Keypoint Heat Map Based Pose Estimation

A Fine-Grained Dataset and Its Efficient Semantic Segmentation for Unstructured Driving Scenarios

What and How? Jointly Forecasting Human Action and Pose

On the Use of Benford's Law to Detect GAN-Generated Images

Real Time Fencing Move Classification and Detection at Touch Time During a Fencing Match

Boosting High-Level Vision with Joint Compression Artifacts Reduction and Super-Resolution

MedZip: 3D Medical Images Lossless Compressor Using Recurrent Neural Network (LSTM)

3D Attention Mechanism for Fine-Grained Classification of Table Tennis Strokes Using a Twin Spatio-Temporal Convolutional Neural Networks

Towards Practical Compressed Video Action Recognition: A Temporal Enhanced Multi-Stream Network

HPERL: 3D Human Pose Estimation from RGB and LiDAR

Motion Complementary Network for Efficient Action Recognition

Attention-Oriented Action Recognition for Real-Time Human-Robot Interaction

Modeling Long-Term Interactions to Enhance Action Recognition

TinyVIRAT: Low-Resolution Video Action Recognition

Object Detection on Monocular Images with Two-Dimensional Canonical Correlation Analysis

Light3DPose: Real-Time Multi-Person 3D Pose Estimation from Multiple Views

The Role of Cycle Consistency for Generating Better Human Action Videos from a Single Frame

Detecting Manipulated Facial Videos: A Time Series Solution

Neural Compression and Filtering for Edge-assisted Real-time Object Detection in Challenged Networks

Estimation of Abundance and Distribution of Salt Marsh Plants from Images Using Deep Learning

Weight Estimation from an RGB-D Camera in Top-View Configuration

Automatic Semantic Segmentation of Structural Elements related to the Spinal Cord in the Lumbar Region by Using Convolutional Neural Networks

PSDNet: A Balanced Architecture of Accuracy and Parameters for Semantic Segmentation

Computational Data Analysis for First Quantization Estimation on JPEG Double Compressed Images

Delivering Meaningful Representation for Monocular Depth Estimation

2D Deep Video Capsule Network with Temporal Shift for Action Recognition

Fast and Accurate Real-Time Semantic Segmentation with Dilated Asymmetric Convolutions

Partially Supervised Multi-Task Network for Single-View Dietary Assessment

Future Urban Scenes Generation through Vehicles Synthesis

Late Fusion of Bayesian and Convolutional Models for Action Recognition

Single View Learning in Action Recognition

Better Prior Knowledge Improves Human-Pose-Based Extrinsic Camera Calibration

A Multi-Task Neural Network for Action Recognition with 3D Key-Points

Gabriella: An Online System for Real-Time Activity Detection in Untrimmed Security Videos

Motion U-Net: Multi-Cue Encoder-Decoder Network for Motion Segmentation

Attention-Driven Body Pose Encoding for Human Activity Recognition

Early Wildfire Smoke Detection in Videos

Multi-Scale Residual Pyramid Attention Network for Monocular Depth Estimation

Tilting at Windmills: Data Augmentation for Deep Pose Estimation Does Not Help with Occlusions

RWF-2000: An Open Large Scale Video Database for Violence Detection

Improving Robotic Grasping on Monocular Images Via Multi-Task Learning and Positional Loss

Ordinal Depth Classification Using Region-Based Self-Attention