ICPR2020 Paper Browser

Paper download is intended for registered attendees only, and is subject to the IEEE Copyright Policy. Any other use is strictly forbidden.

Adaptive Word Embedding Module for Semantic Reasoning in Large-Scale Detection

Yu Zhang, Xiaoyu Wu, Ruolin Zhu

Auto-TLDR; Adaptive Word Embedding Module for Object Detection

In recent years, convolutional neural networks have advanced rapidly in the field of object detection. However, because of data imbalance, high labor costs, and uneven labeling quality, the overall performance of previous detection networks drops sharply when the dataset is extended to large scale, with hundreds or thousands of categories. We present the Adaptive Word Embedding Module, which extracts an adaptive semantic knowledge graph to reach semantic consistency within one image. Our method gives detection networks the ability to infer global semantics without additional attribute or relationship annotations. Compared with Faster RCNN, the algorithm improves mAP on the MSCOCO dataset by 4.1%, reaching 32.8%. On the VG1000 dataset, the mAP increases by 0.9% to 6.7% compared with Faster RCNN. The Adaptive Word Embedding Module is lightweight and general-purpose, and can be plugged into diverse detection networks. Code will be made available.

Similar papers

Object Detection Using Dual Graph Network

Shengjia Chen, Zhixin Li, Feicheng Huang, Canlong Zhang, Huifang Ma

Auto-TLDR; A Graph Convolutional Network for Object Detection with Key Relation Information

Most object detection methods focus only on the local information near the region proposal and ignore the object's global semantic relations and local spatial relations, resulting in limited performance. To capture and explore these important relations, we propose a detection method based on a graph convolutional network (GCN). Two independent relation graph networks are used to obtain the global semantic information of the object in labels and the local spatial information in images. The semantic relation network implicitly acquires global knowledge: a directed graph is constructed on the dataset, each node is represented by the word embedding of its label, and the graph is then sent to the GCN to obtain a high-level semantic representation. The spatial relation network encodes relations through a positional relation module and a visual connection module, and enriches the object features with local key information from objects. The feature representation is further improved by aggregating the outputs of the two networks. Instead of directly propagating visual features in the network, the dual-graph network explores more advanced feature information, giving the detector the ability to obtain key relations in labels and region proposals. Experiments on the PASCAL VOC and MS COCO datasets demonstrate that key relation information significantly improves detection performance, with better ability to detect small objects and more reasonable bounding boxes. The results on the COCO dataset show that our method obtains around a 32.3% improvement in AP on small objects.
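
As a rough illustration of the semantic relation branch described in this abstract (label word embeddings as graph nodes, refined by a GCN), here is a toy sketch of a single graph-convolution step. It is not the authors' implementation: the row normalization with self-loops, the function names, the three-label graph, the 2-d "embeddings" and the identity weight matrix are all assumptions chosen to keep the example small.

```python
# Toy sketch of one graph-convolution step over a label graph (pure Python).
# Every name and number here is hypothetical and for illustration only.

def row_normalize(adj):
    """Add self-loops, then divide each row by its sum (directed graph)."""
    n = len(adj)
    with_loops = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
                  for i in range(n)]
    return [[v / sum(row) for v in row] for row in with_loops]

def matmul(a, b):
    """Plain dense matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def gcn_layer(adj, feats, weight):
    """Compute ReLU(A_norm @ H @ W) for one layer."""
    a_norm = row_normalize(adj)
    out = matmul(matmul(a_norm, feats), weight)
    return [[max(0.0, v) for v in row] for row in out]

# Directed label graph: adj[i][j] = 1 means label i aggregates from label j.
adj = [[0.0, 1.0, 0.0],
       [0.0, 0.0, 1.0],
       [0.0, 0.0, 0.0]]

# Stand-ins for label word embeddings (real ones would be e.g. 300-d vectors).
embeddings = [[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]]

# Identity weights, so the output shows the effect of neighbor averaging alone.
weight = [[1.0, 0.0],
          [0.0, 1.0]]

refined = gcn_layer(adj, embeddings, weight)
print(refined)
```

In this sketch each label's representation becomes the average of its own embedding and those of the labels it points to. A trained model would learn the weight matrix and typically stack several such layers.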

Context for Object Detection Via Lightweight Global and Mid-Level Representations

Mesut Erhan Unal, Adriana Kovashka

Auto-TLDR; Context-Based Object Detection with Semantic Similarity

Using Scene Graphs for Detecting Visual Relationships

Small Object Detection by Generative and Discriminative Learning

Detecting Objects with High Object Region Percentage

SFPN: Semantic Feature Pyramid Network for Object Detection

A Novel Region of Interest Extraction Layer for Instance Segmentation

Object Detection Model Based on Scene-Level Region Proposal Self-Attention

Forground-Guided Vehicle Perception Framework

MAGNet: Multi-Region Attention-Assisted Grounding of Natural Language Queries at Phrase Level

Construction Worker Hardhat-Wearing Detection Based on an Improved BiFPN

Mutual-Supervised Feature Modulation Network for Occluded Pedestrian Detection

Human-Centric Parsing Network for Human-Object Interaction Detection

Multi-Modal Contextual Graph Neural Network for Text Visual Question Answering

Zero-Shot Text Classification with Semantically Extended Graph Convolutional Network

More Correlations Better Performance: Fully Associative Networks for Multi-Label Image Classification

Efficient-Receptive Field Block with Group Spatial Attention Mechanism for Object Detection

Hybrid Cascade Point Search Network for High Precision Bar Chart Component Detection

Open Set Domain Recognition Via Attention-Based GCN and Semantic Matching Optimization

Learning a Dynamic High-Resolution Network for Multi-Scale Pedestrian Detection

Scene Text Detection with Selected Anchors

Bidirectional Matrix Feature Pyramid Network for Object Detection

Incrementally Zero-Shot Detection by an Extreme Value Analyzer

ScarfNet: Multi-Scale Features with Deeply Fused and Redistributed Semantics for Enhanced Object Detection

Transformer-Encoder Detector Module: Using Context to Improve Robustness to Adversarial Attacks on Object Detection

SyNet: An Ensemble Network for Object Detection in UAV Images

Vision-Based Layout Detection from Scientific Literature Using Recurrent Convolutional Neural Networks

MagnifierNet: Learning Efficient Small-Scale Pedestrian Detector towards Multiple Dense Regions

Cascade Saliency Attention Network for Object Detection in Remote Sensing Images

Boundary-Aware Graph Convolution for Semantic Segmentation

Dynamic Low-Light Image Enhancement for Object Detection Via End-To-End Training

Cross-View Relation Networks for Mammogram Mass Detection

CASNet: Common Attribute Support Network for Image Instance and Panoptic Segmentation

Semantics to Space(S2S): Embedding Semantics into Spatial Space for Zero-Shot Verb-Object Query Inferencing

DualBox: Generating BBox Pair with Strong Correspondence Via Occlusion Pattern Clustering and Proposal Refinement

Object Detection on Monocular Images with Two-Dimensional Canonical Correlation Analysis

Image-Based Table Cell Detection: A New Dataset and an Improved Detection Method

StrongPose: Bottom-up and Strong Keypoint Heat Map Based Pose Estimation

Dual Path Multi-Modal High-Order Features for Textual Content Based Visual Question Answering

FeatureNMS: Non-Maximum Suppression by Learning Feature Embeddings

Nighttime Pedestrian Detection Based on Feature Attention and Transformation

Triplet-Path Dilated Network for Detection and Segmentation of General Pathological Images

VSB^2-Net: Visual-Semantic Bi-Branch Network for Zero-Shot Hashing

EDD-Net: An Efficient Defect Detection Network

Label Incorporated Graph Neural Networks for Text Classification

P2 Net: Augmented Parallel-Pyramid Net for Attention Guided Pose Estimation

Adaptive Remote Sensing Image Attribute Learning for Active Object Detection

Deep Real-Time Hand Detection Using CFPN on Embedded Systems