ICPR2020 Paper Browser

Paper download is intended for registered attendees only, and is subject to the IEEE Copyright Policy. Any other use is strictly forbidden.

Video Object Detection Using Object's Motion Context and Spatio-Temporal Feature Aggregation

Jaekyum Kim, Junho Koh, Byeongwon Lee, Seungji Yang, Jun Won Choi

Auto-TLDR; Video Object Detection Using Spatio-Temporal Aggregated Features and Gated Attention Network

Deep learning has recently led to significant improvements in object detection accuracy. Numerous object detection schemes have been designed to process each frame independently. However, in many applications, object detection is performed on video data, which consists of a sequence of two-dimensional (2D) image frames. Thus, the object detection accuracy can be improved by exploiting the temporal context of the video sequence. In this paper, we propose a novel video object detection method that exploits both the motion context of the object and spatio-temporal aggregated features in the video sequence to enhance the object detection performance. First, the motion of the object is captured by the correlation between the spatial feature maps of two adjacent frames. Then, an embedding vector representing the motion context is obtained by feeding the N correlation maps to a long short-term memory (LSTM) network. In addition to generating the motion context vector, the spatial feature maps for N adjacent frames are aggregated to boost the quality of the feature map. A gated attention network is employed to selectively combine only highly correlated feature maps based on their relevance. While most video object detection methods are built on two-stage detectors, our proposed method is applicable to one-stage detectors, which tend to be preferred for practical applications owing to their reduced computational complexity. Our numerical evaluation conducted on the ImageNet VID dataset shows that our network offers a significant performance gain over baseline algorithms, and it outperforms the existing state-of-the-art one-stage video object detection methods.
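
The two feature-level steps named in the abstract (correlating adjacent feature maps, then combining maps by relevance) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names are hypothetical, the correlation is a same-location cosine similarity, and the learned gated attention network is replaced by a fixed softmax over similarity to the reference frame. The LSTM that turns the N correlation maps into a motion context vector is omitted. Each feature map is assumed to be a list of per-location channel vectors.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def correlation_map(feat_a, feat_b):
    """Per-location correlation between the feature maps of two adjacent frames.

    Assumption: only the same spatial location is compared; a real model may
    also correlate against a neighborhood of displaced locations.
    """
    return [cosine(u, v) for u, v in zip(feat_a, feat_b)]

def gated_aggregate(ref, neighbors):
    """Blend neighbor feature maps into the reference map, location by location.

    Assumption: the weights are a softmax over cosine similarity to the
    reference frame, standing in for a learned gated attention network.
    """
    frames = [ref] + neighbors
    out = []
    for loc in range(len(ref)):
        sims = [cosine(ref[loc], f[loc]) for f in frames]
        exps = [math.exp(s) for s in sims]
        total = sum(exps)
        weights = [e / total for e in exps]
        channels = len(ref[loc])
        out.append([sum(w * f[loc][c] for w, f in zip(weights, frames))
                    for c in range(channels)])
    return out

# Toy input: two frames, two spatial locations, two channels each.
frame0 = [[1.0, 0.0], [0.0, 1.0]]
frame1 = [[1.0, 0.0], [1.0, 0.0]]
corr = correlation_map(frame0, frame1)
agg = gated_aggregate(frame0, [frame1])
```

In this toy input the second location differs between the frames, so its correlation is low and the aggregated feature there stays closer to the reference frame than to the neighbor.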

Similar papers

ScarfNet: Multi-Scale Features with Deeply Fused and Redistributed Semantics for Enhanced Object Detection

Jin Hyeok Yoo, Dongsuk Kum, Jun Won Choi

Auto-TLDR; Semantic Fusion of Multi-scale Feature Maps for Object Detection

Temporal Feature Enhancement Network with External Memory for Object Detection in Surveillance Video

Correlation-Based ConvNet for Small Object Detection in Videos

SFPN: Semantic Feature Pyramid Network for Object Detection

Forground-Guided Vehicle Perception Framework

Efficient-Receptive Field Block with Group Spatial Attention Mechanism for Object Detection

Bidirectional Matrix Feature Pyramid Network for Object Detection

Detective: An Attentive Recurrent Model for Sparse Object Detection

Small Object Detection by Generative and Discriminative Learning

Video Semantic Segmentation Using Deep Multi-View Representation Learning

Detecting Objects with High Object Region Percentage

MFI: Multi-Range Feature Interchange for Video Action Recognition

A Modified Single-Shot Multibox Detector for Beyond Real-Time Object Detection

Learning a Dynamic High-Resolution Network for Multi-Scale Pedestrian Detection

Construction Worker Hardhat-Wearing Detection Based on an Improved BiFPN

SyNet: An Ensemble Network for Object Detection in UAV Images

ACCLVOS: Atrous Convolution with Spatial-Temporal ConvLSTM for Video Object Segmentation

Learning Object Deformation and Motion Adaption for Semi-Supervised Video Object Segmentation

FeatureNMS: Non-Maximum Suppression by Learning Feature Embeddings

Object Detection Using Dual Graph Network

Utilising Visual Attention Cues for Vehicle Detection and Tracking

Wavelet Attention Embedding Networks for Video Super-Resolution

Tiny Object Detection in Aerial Images

Cascade Saliency Attention Network for Object Detection in Remote Sensing Images

A Novel Region of Interest Extraction Layer for Instance Segmentation

Siamese Dynamic Mask Estimation Network for Fast Video Object Segmentation

EDD-Net: An Efficient Defect Detection Network

Early Wildfire Smoke Detection in Videos

Object Detection Model Based on Scene-Level Region Proposal Self-Attention

Feature Pyramid Hierarchies for Multi-Scale Temporal Action Detection

SAT-Net: Self-Attention and Temporal Fusion for Facial Action Unit Detection

ACRM: Attention Cascade R-CNN with Mix-NMS for Metallic Surface Defect Detection

Hierarchical Head Design for Object Detectors

Mutual-Supervised Feature Modulation Network for Occluded Pedestrian Detection

A Fast and Accurate Object Detector for Handwritten Digit String Recognition

MagnifierNet: Learning Efficient Small-Scale Pedestrian Detector towards Multiple Dense Regions

Flow-Guided Spatial Attention Tracking for Egocentric Activity Recognition

Deep Real-Time Hand Detection Using CFPN on Embedded Systems

You Ought to Look Around: Precise, Large Span Action Detection

PRF-Ped: Multi-Scale Pedestrian Detector with Prior-Based Receptive Field

TSMSAN: A Three-Stream Multi-Scale Attentive Network for Video Saliency Detection

Human Segmentation with Dynamic LiDAR Data

One-Stage Multi-Task Detector for 3D Cardiac MR Imaging

Relevance Detection in Cataract Surgery Videos by Spatio-Temporal Action Localization

CenterRepp: Predict Central Representative Point Set's Distribution for Detection

P2 Net: Augmented Parallel-Pyramid Net for Attention Guided Pose Estimation

Scene Text Detection with Selected Anchors

Revisiting Sequence-To-Sequence Video Object Segmentation with Multi-Task Loss and Skip-Memory