ICPR2020 Paper Browser

Paper download is intended for registered attendees only, and is subject to the IEEE Copyright Policy. Any other use is strictly forbidden.

Detective: An Attentive Recurrent Model for Sparse Object Detection

Amine Kechaou, Manuel Martinez, Monica Haurilet, Rainer Stiefelhagen

Auto-TLDR; Detective: An attentive object detector that identifies objects in images in a sequential manner

In this work, we present Detective, an attentive object detector that identifies objects in images in a sequential manner. Our network is based on an encoder-decoder architecture, where the encoder is a convolutional neural network and the decoder is a convolutional recurrent neural network coupled with an attention mechanism. At each iteration, the decoder uses this attention mechanism to focus on the relevant parts of the image, and then estimates the object's class and bounding box coordinates. Current object detection models generate dense predictions and rely on post-processing to remove duplicate predictions. Detective is a sparse object detector that generates a single bounding box per object instance. However, training a sparse object detector is challenging, as it requires the model to reason at the instance level and not just at the class and spatial levels. We propose a training mechanism based on the Hungarian Algorithm and a loss that balances the localization and classification tasks. This allows Detective to achieve promising results on the PASCAL VOC object detection dataset. Our experiments demonstrate that sparse object detection is possible and has great potential for future developments in applications where the order of the objects to be predicted is of interest.
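The abstract does not spell out the matching step, so the following is only a minimal sketch of how sequential predictions might be paired one-to-one with ground-truth objects before a loss is computed. Everything in it is an assumption made for illustration: the cost (1 - IoU for localization plus 1 - predicted probability of the true class for classification), the hypothetical weights `loc_weight` and `cls_weight`, and the function names. For brevity it finds the optimal assignment by exhaustive search, which returns the same minimum-cost matching the Hungarian Algorithm would but scales only to a handful of objects; a polynomial-time solver such as `scipy.optimize.linear_sum_assignment` would be the usual choice in practice.

```python
from itertools import permutations


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def pair_cost(pred, target, loc_weight=1.0, cls_weight=1.0):
    """Illustrative cost of assigning one prediction to one ground-truth object.

    pred is (box, class_probs); target is (box, class_index).
    The two weights are hypothetical knobs for balancing the tasks.
    """
    pred_box, class_probs = pred
    gt_box, gt_class = target
    loc_term = 1.0 - iou(pred_box, gt_box)   # localization mismatch
    cls_term = 1.0 - class_probs[gt_class]   # classification mismatch
    return loc_weight * loc_term + cls_weight * cls_term


def match(preds, targets, **weights):
    """Return (assignment, total_cost), where assignment[j] is the index of the
    prediction matched to target j. Requires len(preds) >= len(targets).

    Exhaustive search over assignments: same optimum as the Hungarian
    Algorithm, but exponential time, so only suitable for tiny examples.
    """
    cost = [[pair_cost(p, t, **weights) for p in preds] for t in targets]

    def total(perm):
        return sum(cost[j][i] for j, i in enumerate(perm))

    best = min(permutations(range(len(preds)), len(targets)), key=total)
    return list(best), total(best)


# Two predictions emitted in the "wrong" order relative to the annotations.
preds = [((50, 50, 90, 90), [0.1, 0.9]), ((0, 0, 40, 40), [0.8, 0.2])]
targets = [((0, 0, 40, 40), 0), ((50, 50, 90, 90), 1)]
assignment, total_cost = match(preds, targets)
```

Because the matching is recomputed from the costs, a model trained this way would not be penalized for emitting objects in a different order than the annotation file lists them, which is one plausible reading of why instance-level matching is needed here.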

Similar papers

SyNet: An Ensemble Network for Object Detection in UAV Images

Berat Mert Albaba, Sedat Ozer

Auto-TLDR; SyNet: Combining Multi-Stage and Single-Stage Object Detection for Aerial Images

Recent advances in camera-equipped drone applications and their widespread use have increased the demand for vision-based object detection algorithms for aerial images. Object detection is inherently challenging as a generic computer vision problem; however, since the use of object detection algorithms on UAVs (or drones) is a relatively new area, detecting objects in aerial images remains even more challenging. There are several reasons for this, including: (i) the lack of large drone datasets with large object variance, (ii) the large orientation and scale variance in drone images compared to ground images, and (iii) the difference in texture and shape features between ground and aerial images. Deep learning based object detection algorithms can be classified into two main categories: (a) single-stage detectors and (b) multi-stage detectors. Both single-stage and multi-stage solutions have advantages and disadvantages relative to each other. However, a technique that combines the strengths of both could yield a stronger solution than either individually. In this paper, we propose an ensemble network, SyNet, that combines a multi-stage method with a single-stage one, with the motivation of decreasing the high false negative rate of multi-stage detectors and increasing the quality of the single-stage detector proposals. As building blocks, CenterNet and Cascade R-CNN with pretrained feature extractors are utilized along with an ensembling strategy. We report state-of-the-art results obtained by our proposed solution on two different datasets, MS-COCO and VisDrone: 52.1% $mAP_{IoU = 0.75}$ on the MS-COCO $val2017$ dataset and 26.2% $mAP_{IoU = 0.75}$ on the VisDrone $test-set$. Our code is available at: https://github.com/mertalbaba/SyNet

MAGNet: Multi-Region Attention-Assisted Grounding of Natural Language Queries at Phrase Level

Amar Shrestha, Krittaphat Pugdeethosapol, Haowen Fang, Qinru Qiu

Auto-TLDR; MAGNet: A Multi-Region Attention-Aware Grounding Network for Free-form Textual Queries

FeatureNMS: Non-Maximum Suppression by Learning Feature Embeddings

ACRM: Attention Cascade R-CNN with Mix-NMS for Metallic Surface Defect Detection

ScarfNet: Multi-Scale Features with Deeply Fused and Redistributed Semantics for Enhanced Object Detection

Forground-Guided Vehicle Perception Framework

Detecting Objects with High Object Region Percentage

A Fast and Accurate Object Detector for Handwritten Digit String Recognition

Hierarchical Head Design for Object Detectors

A Modified Single-Shot Multibox Detector for Beyond Real-Time Object Detection

CASNet: Common Attribute Support Network for Image Instance and Panoptic Segmentation

SFPN: Semantic Feature Pyramid Network for Object Detection

Convolutional STN for Weakly Supervised Object Localization

A Novel Region of Interest Extraction Layer for Instance Segmentation

Iterative Bounding Box Annotation for Object Detection

Multi-View Object Detection Using Epipolar Constraints within Cluttered X-Ray Security Imagery

Utilising Visual Attention Cues for Vehicle Detection and Tracking

CenterRepp: Predict Central Representative Point Set's Distribution for Detection

Hybrid Cascade Point Search Network for High Precision Bar Chart Component Detection

Object Detection Model Based on Scene-Level Region Proposal Self-Attention

Video Object Detection Using Object's Motion Context and Spatio-Temporal Feature Aggregation

Tiny Object Detection in Aerial Images

Context for Object Detection Via Lightweight Global and Mid-Level Representations

Bidirectional Matrix Feature Pyramid Network for Object Detection

One-Stage Multi-Task Detector for 3D Cardiac MR Imaging

Temporal Feature Enhancement Network with External Memory for Object Detection in Surveillance Video

HPERL: 3D Human Pose Estimation from RGB and LiDAR

Object Detection Using Dual Graph Network

DualBox: Generating BBox Pair with Strong Correspondence Via Occlusion Pattern Clustering and Proposal Refinement

Scene Text Detection with Selected Anchors

PRF-Ped: Multi-Scale Pedestrian Detector with Prior-Based Receptive Field

VTT: Long-Term Visual Tracking with Transformers

Construction Worker Hardhat-Wearing Detection Based on an Improved BiFPN

EAGLE: Large-Scale Vehicle Detection Dataset in Real-World Scenarios Using Aerial Imagery

Correlation-Based ConvNet for Small Object Detection in Videos

Multiple-Step Sampling for Dense Object Detection and Counting

Object Detection in the DCT Domain: Is Luminance the Solution?

Revisiting Sequence-To-Sequence Video Object Segmentation with Multi-Task Loss and Skip-Memory

StrongPose: Bottom-up and Strong Keypoint Heat Map Based Pose Estimation

Cascade Saliency Attention Network for Object Detection in Remote Sensing Images

FourierNet: Compact Mask Representation for Instance Segmentation Using Differentiable Shape Decoders

Gabriella: An Online System for Real-Time Activity Detection in Untrimmed Security Videos

SynDHN: Multi-Object Fish Tracker Trained on Synthetic Underwater Videos

Learning a Dynamic High-Resolution Network for Multi-Scale Pedestrian Detection

CDeC-Net: Composite Deformable Cascade Network for Table Detection in Document Images

Efficient-Receptive Field Block with Group Spatial Attention Mechanism for Object Detection

Enriching Video Captions with Contextual Text

Small Object Detection by Generative and Discriminative Learning