ICPR2020 Paper Browser

Paper download is intended for registered attendees only, and is subject to the IEEE Copyright Policy. Any other use is strongly forbidden.

6D Pose Estimation with Correlation Fusion

Yi Cheng, Hongyuan Zhu, Ying Sun, Cihan Acar, Wei Jing, Yan Wu, Liyuan Li, Cheston Tan, Joo-Hwee Lim

Auto-TLDR; Intra- and Inter-modality Fusion for 6D Object Pose Estimation with Attention Mechanism

6D object pose estimation is widely applied in robotic tasks such as grasping and manipulation. Prior methods that use RGB-only images are vulnerable to heavy occlusion and poor illumination, so it is important to complement them with depth information. However, existing methods that use RGB-D data cannot adequately exploit the consistent and complementary information between the RGB and depth modalities. In this paper, we present a novel method that uses an attention mechanism to model the correlation within and across both modalities and to learn discriminative and compact multi-modal features. We then explore effective fusion strategies for the intra- and inter-correlation modules to ensure efficient information flow between RGB and depth. To the best of our knowledge, this is the first work to explore effective intra- and inter-modality fusion in 6D pose estimation. The experimental results show that our method achieves state-of-the-art performance on the LineMOD and YCBVideo datasets. We also demonstrate that the proposed method can benefit a real-world robot grasping task by providing accurate object pose estimation.

Similar papers

MixedFusion: 6D Object Pose Estimation from Decoupled RGB-Depth Features

Hangtao Feng, Lu Zhang, Xu Yang, Zhiyong Liu

Auto-TLDR; MixedFusion: Combining Color and Point Clouds for 6D Pose Estimation

Two-Level Attention-Based Fusion Learning for RGB-D Face Recognition

Yolo+FPN: 2D and 3D Fused Object Detection with an RGB-D Camera

Dynamic Guided Network for Monocular Depth Estimation

Incorporating Depth Information into Few-Shot Semantic Segmentation

A Grid-Based Representation for Human Action Recognition

Multi-Scale Residual Pyramid Attention Network for Monocular Depth Estimation

Towards Efficient 3D Point Cloud Scene Completion Via Novel Depth View Synthesis

Extending Single Beam Lidar to Full Resolution by Fusing with Single Image Depth Estimation

Orthographic Projection Linear Regression for Single Image 3D Human Pose Estimation

Progressive Scene Segmentation Based on Self-Attention Mechanism

Light3DPose: Real-Time Multi-Person 3D Pose Estimation from Multiple Views

Multi-Stage Attention Based Visual Question Answering

Improving Visual Relation Detection Using Depth Maps

P2 Net: Augmented Parallel-Pyramid Net for Attention Guided Pose Estimation

HPERL: 3D Human Pose Estimation from RGB and LiDAR

MANet: Multimodal Attention Network Based Point-View Fusion for 3D Shape Recognition

Median-Shape Representation Learning for Category-Level Object Pose Estimation in Cluttered Environments

Exploring Severe Occlusion: Multi-Person 3D Pose Estimation with Gated Convolution

Ordinal Depth Classification Using Region-Based Self-Attention

Enhancing Deep Semantic Segmentation of RGB-D Data with Entangled Forests

RefiNet: 3D Human Pose Refinement with Depth Maps

What and How? Jointly Forecasting Human Action and Pose

Enhanced Vote Network for 3D Object Detection in Point Clouds

Vision-Based Multi-Modal Framework for Action Recognition

Pose-Aware Multi-Feature Fusion Network for Driver Distraction Recognition

PEAN: 3D Hand Pose Estimation Adversarial Network

Real-Time Monocular Depth Estimation with Extremely Light-Weight Neural Network

Deeply-Fused Attentive Network for Stereo Matching

Efficient-Receptive Field Block with Group Spatial Attention Mechanism for Object Detection

Attentive Hybrid Feature Based a Two-Step Fusion for Facial Expression Recognition

Joint Face Alignment and 3D Face Reconstruction with Efficient Convolution Neural Networks

Two-Stage Adaptive Object Scene Flow Using Hybrid CNN-CRF Model

Boundary-Aware Graph Convolution for Semantic Segmentation

Self-Supervised Detection and Pose Estimation of Logistical Objects in 3D Sensor Data

Object Detection on Monocular Images with Two-Dimensional Canonical Correlation Analysis

Integrating Historical States and Co-Attention Mechanism for Visual Dialog

Attention-Oriented Action Recognition for Real-Time Human-Robot Interaction

Dual Path Multi-Modal High-Order Features for Textual Content Based Visual Question Answering

Future Urban Scenes Generation through Vehicles Synthesis

FastCompletion: A Cascade Network with Multiscale Group-Fused Inputs for Real-Time Depth Completion

PC-Net: A Deep Network for 3D Point Clouds Analysis

NetCalib: A Novel Approach for LiDAR-Camera Auto-Calibration Based on Deep Learning

Do We Really Need Scene-Specific Pose Encoders?

Multi-Scale Cascading Network with Compact Feature Learning for RGB-Infrared Person Re-Identification

S-VoteNet: Deep Hough Voting with Spherical Proposal for 3D Object Detection

P2D: A Self-Supervised Method for Depth Estimation from Polarimetry

Delivering Meaningful Representation for Monocular Depth Estimation