ICPR2020 Paper Browser

Paper download is intended for registered attendees only, and is subject to the IEEE Copyright Policy. Any other use is strictly forbidden.

Fast and Accurate Real-Time Semantic Segmentation with Dilated Asymmetric Convolutions

Leonel Rosas-Arias, Gibran Benitez-Garcia, Jose Portillo-Portillo, Gabriel Sanchez-Perez, Keiji Yanai

Auto-TLDR; FASSD-Net: Dilated Asymmetric Pyramidal Fusion for Real-Time Semantic Segmentation

Recent works have shown promising results on real-time semantic segmentation tasks. To maintain fast inference speed, most existing networks use light decoders or omit the decoder entirely. This strategy keeps inference fast; however, the accuracy of these networks is significantly lower than that of non-real-time semantic segmentation networks. In this paper, we introduce two key modules for designing a high-performance decoder for real-time semantic segmentation, with the goal of reducing the accuracy gap between real-time and non-real-time segmentation networks. Our first module, Dilated Asymmetric Pyramidal Fusion (DAPF), is designed to substantially increase the receptive field on top of the last stage of the encoder, obtaining richer contextual features. Our second module, the Multi-resolution Dilated Asymmetric (MDA) module, fuses and refines detail and contextual information from multi-scale feature maps coming from early and deeper stages of the network. Both modules exploit contextual information without excessively increasing the computational complexity by using asymmetric convolutions. Our proposed network, named “FASSD-Net”, reaches 78.8% mIoU on the Cityscapes validation dataset at 41.1 FPS on full-resolution images (1024x2048). In addition, with a light version of our network, we reach 74.1% mIoU at 133.1 FPS (full resolution) on a single NVIDIA GTX 1080Ti card with no additional acceleration techniques. The source code and pre-trained models are available at https://github.com/GibranBenitez/FASSD-Net.
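
The abstract's claim that asymmetric convolutions add context cheaply can be illustrated with a small back-of-the-envelope calculation. The sketch below is not taken from the FASSD-Net code; it only counts the weights of a k x k convolution against a k x 1 plus 1 x k pair, and computes the extent a dilated kernel spans along one axis. The function names, the channel count (64), and the dilation rate (12) are illustrative assumptions, not values reported in the paper.

```python
def conv_params(kh, kw, c_in, c_out):
    """Number of weights in one 2D convolution with a kh x kw kernel (no bias)."""
    return kh * kw * c_in * c_out


def dilated_extent(k, dilation):
    """Span (in pixels) covered along one axis by a k-tap kernel at this dilation."""
    return dilation * (k - 1) + 1


def compare(k, channels, dilation):
    """Compare a full k x k convolution with a (k x 1) + (1 x k) asymmetric pair.

    Hypothetical helper for illustration only: both variants are assumed to
    keep the same number of input and output channels.
    """
    full = conv_params(k, k, channels, channels)
    asymmetric = (conv_params(k, 1, channels, channels)
                  + conv_params(1, k, channels, channels))
    return {
        "full": full,
        "asymmetric": asymmetric,
        "ratio": asymmetric / full,
        # Both variants span the same extent per axis at a given dilation.
        "extent": dilated_extent(k, dilation),
    }


if __name__ == "__main__":
    # Assumed example values: 3-tap kernels, 64 channels, dilation rate 12.
    print(compare(k=3, channels=64, dilation=12))
```

Under these assumed values, the asymmetric pair uses two thirds of the weights of the full 3 x 3 convolution while spanning the same 25-pixel extent per axis; the saving grows with larger kernels.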

Similar papers

Multi-Direction Convolution for Semantic Segmentation

Dehui Li, Zhiguo Cao, Ke Xian, Xinyuan Qi, Chao Zhang, Hao Lu

Auto-TLDR; Multi-Direction Convolution for Contextual Segmentation

Context is known to be one of the crucial factors affecting the performance of semantic segmentation. However, state-of-the-art segmentation models built upon fully convolutional networks are inherently weak in encoding contextual information because of stacked local operations such as convolution and pooling. Failing to capture context leads to inferior segmentation performance. Although many context modules have been proposed to relieve this problem, they still operate in a local manner or use the same contextual information in different positions (due to upsampling). In this paper, we introduce the idea of Multi-Direction Convolution (MDC), a novel operator capable of encoding rich contextual information. This operator is inspired by the observation that the standard convolution only slides along the spatial dimensions (x, y directions) while the channel dimension (z direction) is fixed, which results in slow growth of the receptive field (RF). If the channel-fixed convolution is considered one-direction, MDC is multi-direction in the sense that it slides along both spatial and channel dimensions, i.e., it slides along x, y when z is fixed, along x, z when y is fixed, and along y, z when x is fixed. In this way, MDC is able to encode rich contextual information with a fast increase of the RF. Compared to existing context modules, the encoded context is position-sensitive because no upsampling is required. MDC is also efficient and easy to implement: it can be implemented with a few standard convolution layers and permutation. We show through extensive experiments that MDC effectively and selectively enlarges the RF and outperforms existing contextual modules on two standard benchmarks, Cityscapes and PASCAL VOC2012.

Transitional Asymmetric Non-Local Neural Networks for Real-World Dirt Road Segmentation

Yooseung Wang, Jihun Park

Auto-TLDR; Transitional Asymmetric Non-Local Neural Networks for Semantic Segmentation on Dirt Roads

Similar papers

Multi-Direction Convolution for Semantic Segmentation

Transitional Asymmetric Non-Local Neural Networks for Real-World Dirt Road Segmentation

Real-Time Semantic Segmentation Via Region and Pixel Context Network

Global-Local Attention Network for Semantic Segmentation in Aerial Images

PSDNet: A Balanced Architecture of Accuracy and Parameters for Semantic Segmentation

GSTO: Gated Scale-Transfer Operation for Multi-Scale Feature Learning in Semantic Segmentation

Semantic Segmentation Refinement Using Entropy and Boundary-guided Monte Carlo Sampling and Directed Regional Search

Enhanced Feature Pyramid Network for Semantic Segmentation

Boundary-Aware Graph Convolution for Semantic Segmentation

E-DNAS: Differentiable Neural Architecture Search for Embedded Systems

Enhancing Semantic Segmentation of Aerial Images with Inhibitory Neurons

A Fine-Grained Dataset and Its Efficient Semantic Segmentation for Unstructured Driving Scenarios

FastCompletion: A Cascade Network with Multiscale Group-Fused Inputs for Real-Time Depth Completion

Multiscale Attention-Based Prototypical Network for Few-Shot Semantic Segmentation

Stage-Wise Neural Architecture Search

Context-Aware Residual Module for Image Classification

UHRSNet: A Semantic Segmentation Network Specifically for Ultra-High-Resolution Images

Encoder-Decoder Based Convolutional Neural Networks with Multi-Scale-Aware Modules for Crowd Counting

NAS-EOD: An End-To-End Neural Architecture Search Method for Efficient Object Detection

EdgeNet: Semantic Scene Completion from a Single RGB-D Image

Real-Time Monocular Depth Estimation with Extremely Light-Weight Neural Network

Triplet-Path Dilated Network for Detection and Segmentation of General Pathological Images

Attention Pyramid Module for Scene Recognition

Multiple Document Datasets Pre-Training Improves Text Line Detection with Deep Neural Networks

OCT Image Segmentation Using Neural Architecture Search and SRGAN

Attention Based Coupled Framework for Road and Pothole Segmentation

Delivering Meaningful Representation for Monocular Depth Estimation

Attention As Activation

Boosting High-Level Vision with Joint Compression Artifacts Reduction and Super-Resolution

VPU Specific CNNs through Neural Architecture Search

Efficient-Receptive Field Block with Group Spatial Attention Mechanism for Object Detection

Progressive Scene Segmentation Based on Self-Attention Mechanism

FastSal: A Computationally Efficient Network for Visual Saliency Prediction

Dual Encoder Fusion U-Net (DEFU-Net) for Cross-manufacturer Chest X-Ray Segmentation

Hierarchically Aggregated Residual Transformation for Single Image Super Resolution

LiNet: A Lightweight Network for Image Super Resolution

Neural Architecture Search for Image Super-Resolution Using Densely Connected Search Space: DeCoNAS

Operation and Topology Aware Fast Differentiable Architecture Search

Dynamic Multi-Path Neural Network

Temporal Feature Enhancement Network with External Memory for Object Detection in Surveillance Video

ResFPN: Residual Skip Connections in Multi-Resolution Feature Pyramid Networks for Accurate Dense Pixel Matching

Slimming ResNet by Slimming Shortcut

Dynamic Guided Network for Monocular Depth Estimation

Deeply-Fused Attentive Network for Stereo Matching

BiLuNet: A Multi-Path Network for Semantic Segmentation on X-Ray Images

PC-Net: A Deep Network for 3D Point Clouds Analysis

Fine-Tuning DARTS for Image Classification

FatNet: A Feature-Attentive Network for 3D Point Cloud Processing