ICPR2020 Paper Browser

Paper download is intended for registered attendees only and is subject to the IEEE Copyright Policy. Any other use is strictly forbidden.

Improving Visual Relation Detection Using Depth Maps

Sahand Sharifzadeh, Sina Moayed Baharlou, Max Berrendorf, Rajat Koner, Volker Tresp

Auto-TLDR; Exploiting Depth Maps for Visual Relation Detection

State-of-the-art visual relation detection methods mostly rely on object information extracted from RGB images such as 2D bounding boxes, feature maps, and predicted class probabilities. Depth maps can additionally provide valuable information on object relations, e.g. helping to detect not only spatial relations, such as standing behind, but also non-spatial relations, such as holding. In this work, we study the effect of using different object information with a focus on depth maps. To enable this study, we release a new synthetic dataset of depth maps, VG-Depth, as an extension to Visual Genome (VG). We also note that given the highly imbalanced distribution of relations in VG, typical evaluation metrics for visual relation detection cannot reveal improvements of under-represented relations. To address this problem, we propose using an additional metric, calling it Macro Recall@K, and demonstrate its remarkable performance on VG. Finally, our experiments confirm that by effective utilization of depth maps within a simple, yet competitive framework, the performance of visual relation detection can be improved by a margin of up to 8%.
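The abstract does not spell out how Macro Recall@K is computed. The sketch below illustrates one plausible reading, assuming it means computing Recall@K separately for each predicate class and then averaging over classes, so that rare relations weigh as much as frequent ones. The function names, the (subject, predicate, object) triple layout, and the pooling of triples across images are illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict


def recall_at_k(ground_truth, predictions, k):
    """Pooled Recall@K: share of all ground-truth triples that appear in the
    top-K predictions of their image (assumed simplification)."""
    hits = 0
    total = 0
    for image_id, gt_triples in ground_truth.items():
        top_k = set(predictions.get(image_id, [])[:k])
        hits += sum(1 for triple in gt_triples if triple in top_k)
        total += len(gt_triples)
    return hits / total if total else 0.0


def macro_recall_at_k(ground_truth, predictions, k):
    """Assumed Macro Recall@K: Recall@K per predicate class, then the
    unweighted mean over all predicate classes seen in the ground truth."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for image_id, gt_triples in ground_truth.items():
        top_k = set(predictions.get(image_id, [])[:k])
        for triple in gt_triples:
            predicate = triple[1]  # triples are (subject, predicate, object)
            totals[predicate] += 1
            if triple in top_k:
                hits[predicate] += 1
    if not totals:
        return 0.0
    return sum(hits[p] / totals[p] for p in totals) / len(totals)


# Toy data (hypothetical): "on" is frequent, "holding" is rare.
ground_truth = {
    "img1": [("man", "on", "street"), ("cup", "on", "table"), ("dog", "on", "grass")],
    "img2": [("man", "holding", "cup"), ("lamp", "on", "desk")],
}
# Ranked predictions per image, best first; the model misses "holding".
predictions = {
    "img1": [("man", "on", "street"), ("cup", "on", "table"), ("dog", "on", "grass")],
    "img2": [("lamp", "on", "desk"), ("man", "on", "cup")],
}

print(recall_at_k(ground_truth, predictions, k=3))
print(macro_recall_at_k(ground_truth, predictions, k=3))
```

In this toy case the pooled score stays high because the frequent predicate dominates, while the macro-averaged score drops once the single rare predicate is missed, which is the kind of effect the abstract says typical metrics cannot reveal.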

Similar papers

Using Scene Graphs for Detecting Visual Relationships

Anurag Tripathi, Siddharth Srivastava, Brejesh Lall, Santanu Chaudhury

Auto-TLDR; Relationship Detection using Context Aligned Scene Graph Embeddings

EdgeNet: Semantic Scene Completion from a Single RGB-D Image

Object Detection on Monocular Images with Two-Dimensional Canonical Correlation Analysis

Context for Object Detection Via Lightweight Global and Mid-Level Representations

Incorporating Depth Information into Few-Shot Semantic Segmentation

Multi-Modal Contextual Graph Neural Network for Text Visual Question Answering

MAGNet: Multi-Region Attention-Assisted Grounding of Natural Language Queries at Phrase Level

Cross-View Relation Networks for Mammogram Mass Detection

6D Pose Estimation with Correlation Fusion

Exploring and Exploiting the Hierarchical Structure of a Scene for Scene Graph Generation

Visual Style Extraction from Chart Images for Chart Restyling

Detective: An Attentive Recurrent Model for Sparse Object Detection

Two-Level Attention-Based Fusion Learning for RGB-D Face Recognition

Semantics to Space(S2S): Embedding Semantics into Spatial Space for Zero-Shot Verb-Object Query Inferencing

FashionGraph: Understanding Fashion Data Using Scene Graph Generation

Detecting Objects with High Object Region Percentage

HPERL: 3D Human Pose Estimation from RGB and LiDAR

Multi-Scale Relational Reasoning with Regional Attention for Visual Question Answering

Enhanced Vote Network for 3D Object Detection in Point Clouds

Yolo+FPN: 2D and 3D Fused Object Detection with an RGB-D Camera

Object Detection Using Dual Graph Network

Question-Agnostic Attention for Visual Question Answering

Multi-Stage Attention Based Visual Question Answering

Hierarchical Head Design for Object Detectors

Dual Path Multi-Modal High-Order Features for Textual Content Based Visual Question Answering

Adaptive Word Embedding Module for Semantic Reasoning in Large-Scale Detection

Human-Centric Parsing Network for Human-Object Interaction Detection

Real-Time Monocular Depth Estimation with Extremely Light-Weight Neural Network

Dynamic Guided Network for Monocular Depth Estimation

SyNet: An Ensemble Network for Object Detection in UAV Images

A Novel Attention-Based Aggregation Function to Combine Vision and Language

Extending Single Beam Lidar to Full Resolution by Fusing with Single Image Depth Estimation

Enhancing Deep Semantic Segmentation of RGB-D Data with Entangled Forests

MixedFusion: 6D Object Pose Estimation from Decoupled RGB-Depth Features

Context Aware Group Activity Recognition

FatNet: A Feature-Attentive Network for 3D Point Cloud Processing

Transformer Reasoning Network for Image-Text Matching and Retrieval

DEN: Disentangling and Exchanging Network for Depth Completion

Delivering Meaningful Representation for Monocular Depth Estimation

More Correlations Better Performance: Fully Associative Networks for Multi-Label Image Classification

Towards Efficient 3D Point Cloud Scene Completion Via Novel Depth View Synthesis

Named Entity Recognition and Relation Extraction with Graph Neural Networks in Semi Structured Documents

End-To-End Hierarchical Relation Extraction for Generic Form Understanding

A Grid-Based Representation for Human Action Recognition

Point In: Counting Trees with Weakly Supervised Segmentation Network

Multiscale Attention-Based Prototypical Network for Few-Shot Semantic Segmentation

Domain Siamese CNNs for Sparse Multispectral Disparity Estimation

A Fine-Grained Dataset and Its Efficient Semantic Segmentation for Unstructured Driving Scenarios