ICPR2020 Paper Browser

Paper download is intended for registered attendees only and is subject to the IEEE Copyright Policy. Any other use is strictly forbidden.

Multi-Direction Convolution for Semantic Segmentation

Dehui Li, Zhiguo Cao, Ke Xian, Xinyuan Qi, Chao Zhang, Hao Lu

Auto-TLDR; Multi-Direction Convolution for Contextual Segmentation

Abstract

Context is known to be one of the crucial factors affecting the performance of semantic segmentation. However, state-of-the-art segmentation models built upon fully convolutional networks are inherently weak at encoding contextual information because they stack local operations such as convolution and pooling. Failing to capture context leads to inferior segmentation performance. Although many context modules have been proposed to relieve this problem, they still either operate in a local manner or, because of upsampling, use the same contextual information at different positions. In this paper, we introduce Multi-Direction Convolution (MDC), a novel operator capable of encoding rich contextual information. This operator is inspired by the observation that the standard convolution slides only along the spatial dimensions (the x and y directions) while the channel dimension (the z direction) is fixed, which leads to slow growth of the receptive field (RF). If the channel-fixed convolution is regarded as one-directional, MDC is multi-directional in the sense that it slides along both the spatial and channel dimensions: along x and y when z is fixed, along x and z when y is fixed, and along y and z when x is fixed. In this way, MDC encodes rich contextual information while enlarging the RF rapidly. Compared with existing context modules, the encoded context is position-sensitive because no upsampling is required. MDC is also efficient and easy to implement: it can be built from a few standard convolution layers combined with permutation. We show through extensive experiments that MDC effectively and selectively enlarges the RF and outperforms existing context modules on two standard benchmarks, Cityscapes and PASCAL VOC2012.
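The abstract says only that MDC can be built from a few standard convolution layers combined with permutation. The following is a minimal sketch under our own assumptions, not the authors' implementation: each direction is a standard convolution that treats one axis of a (C, H, W) = (z, y, x) tensor as its channel axis (so the kernel spans that axis fully and slides over the other two), the three branch outputs are summed, padding is zero "same" padding with stride 1, and all kernels are ones so that the spread of an input impulse can be counted without cancellation. The helper names (permute, conv, ones_kernel, mdc, footprint) are hypothetical, and plain Python lists stand in for framework tensors to keep the example self-contained.

```python
from itertools import product


def zeros(d0, d1, d2):
    """Nested-list tensor of shape (d0, d1, d2) filled with 0.0."""
    return [[[0.0] * d2 for _ in range(d1)] for _ in range(d0)]


def shape(t):
    return len(t), len(t[0]), len(t[0][0])


def permute(t, order):
    """Reorder the three axes, like numpy.transpose(t, order)."""
    s = shape(t)
    out = zeros(*(s[a] for a in order))
    for idx in product(*(range(n) for n in s)):
        j = [idx[a] for a in order]
        out[j[0]][j[1]][j[2]] = t[idx[0]][idx[1]][idx[2]]
    return out


def conv(t, weight):
    """Standard stride-1 convolution (cross-correlation, as in DL frameworks).

    Axis 0 of t is the channel axis and is mixed fully; the k x k kernel
    slides over axes 1 and 2 with zero 'same' padding (k must be odd).
    weight has shape (C_out, C_in, k, k).
    """
    c_in, d1, d2 = shape(t)
    c_out, k = len(weight), len(weight[0][0])
    r = k // 2
    out = zeros(c_out, d1, d2)
    for o, y, x in product(range(c_out), range(d1), range(d2)):
        acc = 0.0
        for i, dy, dx in product(range(c_in), range(k), range(k)):
            yy, xx = y + dy - r, x + dx - r
            if 0 <= yy < d1 and 0 <= xx < d2:
                acc += weight[o][i][dy][dx] * t[i][yy][xx]
        out[o][y][x] = acc
    return out


def ones_kernel(c, k=3):
    """Illustrative all-ones (c, c, k, k) weights; a trained model learns these."""
    return [[[[1.0] * k for _ in range(k)] for _ in range(c)] for _ in range(c)]


def mdc(t, k=3):
    """Hypothetical MDC layer on a (C, H, W) tensor, i.e. (z, y, x).

    Each branch is a standard convolution that treats a different axis as
    its channel axis; summing the three outputs is an assumption.
    """
    C, H, W = shape(t)
    # z fixed: slide along (y, x), the usual spatial convolution.
    a = conv(t, ones_kernel(C, k))
    # y fixed: (C, H, W) -> (H, C, W), slide along (z, x), then swap back.
    b = permute(conv(permute(t, (1, 0, 2)), ones_kernel(H, k)), (1, 0, 2))
    # x fixed: (C, H, W) -> (W, H, C), slide along (y, z), then swap back.
    c = permute(conv(permute(t, (2, 1, 0)), ones_kernel(W, k)), (2, 1, 0))
    return [[[a[i][j][l] + b[i][j][l] + c[i][j][l] for l in range(W)]
             for j in range(H)] for i in range(C)]


def footprint(t):
    """Number of nonzero entries, i.e. how far an impulse has spread."""
    return sum(1 for plane in t for row in plane for v in row if v != 0.0)


if __name__ == "__main__":
    C = H = W = 8
    x = zeros(C, H, W)
    x[4][4][4] = 1.0  # a single input impulse in the middle of the volume
    std1 = conv(x, ones_kernel(C))
    mdc1 = mdc(x)
    print(footprint(std1), footprint(mdc1))  # prints: 72 162
    print(footprint(conv(std1, ones_kernel(C))), footprint(mdc(mdc1)))  # prints: 200 485
```

With an impulse at the center of an 8×8×8 volume, one standard 3×3 convolution reaches 72 positions and one sketched MDC layer reaches 162; after two layers the counts are 200 and 485 of 512. Because these kernels are symmetric, the spread of an impulse mirrors the receptive field, so this illustrates the faster RF growth the abstract describes. One consequence of this particular reading is that the y-fixed and x-fixed branches take H and W input channels, so their weights are tied to the feature-map size; the paper may handle this differently. In a deep-learning framework, the permute, convolve, permute-back pattern maps onto a tensor transpose around an ordinary 2D convolution layer.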

Similar papers

Fast and Accurate Real-Time Semantic Segmentation with Dilated Asymmetric Convolutions

Leonel Rosas-Arias, Gibran Benitez-Garcia, Jose Portillo-Portillo, Gabriel Sanchez-Perez, Keiji Yanai

Auto-TLDR; FASSD-Net: Dilated Asymmetric Pyramidal Fusion for Real-Time Semantic Segmentation

Global-Local Attention Network for Semantic Segmentation in Aerial Images

Boundary-Aware Graph Convolution for Semantic Segmentation

Real-Time Semantic Segmentation Via Region and Pixel Context Network

Enhanced Feature Pyramid Network for Semantic Segmentation

Transitional Asymmetric Non-Local Neural Networks for Real-World Dirt Road Segmentation

Semantic Segmentation Refinement Using Entropy and Boundary-guided Monte Carlo Sampling and Directed Regional Search

PSDNet: A Balanced Architecture of Accuracy and Parameters for Semantic Segmentation

GSTO: Gated Scale-Transfer Operation for Multi-Scale Feature Learning in Semantic Segmentation

Multiscale Attention-Based Prototypical Network for Few-Shot Semantic Segmentation

Enhancing Semantic Segmentation of Aerial Images with Inhibitory Neurons

Multi-Scale Residual Pyramid Attention Network for Monocular Depth Estimation

UHRSNet: A Semantic Segmentation Network Specifically for Ultra-High-Resolution Images

Efficient-Receptive Field Block with Group Spatial Attention Mechanism for Object Detection

Progressive Scene Segmentation Based on Self-Attention Mechanism

DE-Net: Dilated Encoder Network for Automated Tongue Segmentation

Dynamic Guided Network for Monocular Depth Estimation

Triplet-Path Dilated Network for Detection and Segmentation of General Pathological Images

DARN: Deep Attentive Refinement Network for Liver Tumor Segmentation from 3D CT Volume

A Fine-Grained Dataset and Its Efficient Semantic Segmentation for Unstructured Driving Scenarios

DA-RefineNet: Dual-Inputs Attention RefineNet for Whole Slide Image Segmentation

CT-UNet: An Improved Neural Network Based on U-Net for Building Segmentation in Remote Sensing Images

Context-Aware Residual Module for Image Classification

Attention Stereo Matching Network

Do Not Treat Boundaries and Regions Differently: An Example on Heart Left Atrial Segmentation

PS^2-Net: A Locally and Globally Aware Network for Point-Based Semantic Segmentation

Deeply-Fused Attentive Network for Stereo Matching

PCANet: Pyramid Context-Aware Network for Retinal Vessel Segmentation

Encoder-Decoder Based Convolutional Neural Networks with Multi-Scale-Aware Modules for Crowd Counting

Efficient High-Resolution High-Level-Semantic Representation Learning for Human Pose Estimation

Ordinal Depth Classification Using Region-Based Self-Attention

Incorporating Depth Information into Few-Shot Semantic Segmentation

Video Semantic Segmentation Using Deep Multi-View Representation Learning

3D Semantic Labeling of Photogrammetry Meshes Based on Active Learning

Delivering Meaningful Representation for Monocular Depth Estimation

Cross-Domain Semantic Segmentation of Urban Scenes Via Multi-Level Feature Alignment

EdgeNet: Semantic Scene Completion from a Single RGB-D Image

Attention Pyramid Module for Scene Recognition

A Multi-Task Contextual Atrous Residual Network for Brain Tumor Detection & Segmentation

Joint Semantic-Instance Segmentation of 3D Point Clouds: Instance Separation and Semantic Fusion

Bidirectional Matrix Feature Pyramid Network for Object Detection

Accurate Cell Segmentation in Digital Pathology Images Via Attention Enforced Networks

Single Image Deblurring Using Bi-Attention Network

Spatial-Related and Scale-Aware Network for Crowd Counting

SFPN: Semantic Feature Pyramid Network for Object Detection

Learnable Higher-Order Representation for Action Recognition

PointSpherical: Deep Shape Context for Point Cloud Learning in Spherical Coordinates

Hierarchically Aggregated Residual Transformation for Single Image Super Resolution