ICPR2020 Paper Browser

Paper download is intended for registered attendees only, and is subject to the IEEE Copyright Policy. Any other use is strictly forbidden.

Revisiting Sequence-To-Sequence Video Object Segmentation with Multi-Task Loss and Skip-Memory

Fatemeh Azimi, Benjamin Bischke, Sebastian Palacio, Federico Raue, Jörn Hees, Andreas Dengel

Auto-TLDR; Sequence-to-Sequence Learning for Video Object Segmentation

Abstract

Video Object Segmentation (VOS) is an active research area in computer vision. One of its fundamental sub-tasks is semi-supervised (one-shot) learning: given only the segmentation mask for the first frame, the task is to provide pixel-accurate masks for the object over the rest of the sequence. Despite considerable progress in recent years, we observe that many existing approaches lose objects in longer sequences, especially when the object is small or briefly occluded. In this work, we build upon a sequence-to-sequence approach that employs an encoder-decoder architecture together with a memory module for exploiting the sequential data. We further improve this approach by proposing a model that processes multi-scale spatio-temporal information using memory-equipped skip connections. Furthermore, we incorporate an auxiliary task based on distance classification, which greatly enhances the quality of edges in the segmentation masks. We compare our approach to the state of the art and show considerable improvement in the contour accuracy metric and in overall segmentation accuracy.
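The abstract does not say how the distance classes for the auxiliary task are defined. As a rough illustration only, the sketch below shows one way such per-pixel targets could be derived from a binary mask: each foreground pixel is assigned a class according to its city-block distance to the nearest background pixel. The function names, the distance measure, and the `bin_edges` values are all assumptions made for this example and are not taken from the paper.

```python
from collections import deque


def distance_to_background(mask):
    """City-block distance from every pixel to the nearest background (0) pixel.

    Returns a grid of ints, or None entries if the mask has no background.
    """
    h, w = len(mask), len(mask[0])
    dist = [[None] * w for _ in range(h)]
    queue = deque()
    # Seed the breadth-first search with all background pixels (distance 0).
    for y in range(h):
        for x in range(w):
            if mask[y][x] == 0:
                dist[y][x] = 0
                queue.append((y, x))
    # Expand outwards one 4-connected step at a time.
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and dist[ny][nx] is None:
                dist[ny][nx] = dist[y][x] + 1
                queue.append((ny, nx))
    return dist


def distance_classes(mask, bin_edges=(1, 3)):
    """Turn a binary mask into per-pixel distance-class labels.

    Class 0 is background. A foreground pixel at distance d gets class
    1 + (number of bin edges that d exceeds). The bin edges are hypothetical.
    """
    labels = []
    for row in distance_to_background(mask):
        out = []
        for d in row:
            if d is None:
                # No background anywhere: treat as the deepest class.
                out.append(len(bin_edges) + 1)
            elif d == 0:
                out.append(0)
            else:
                out.append(1 + sum(d > edge for edge in bin_edges))
        labels.append(out)
    return labels


mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
for row in distance_classes(mask):
    print(row)
```

In a multi-task setup of this kind, labels like these would typically be predicted by a second output head and trained with a classification loss alongside the segmentation loss; pixels near the contour fall into their own class, which is one plausible reason such a task could sharpen mask edges.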

Similar papers

ACCLVOS: Atrous Convolution with Spatial-Temporal ConvLSTM for Video Object Segmentation

Muzhou Xu, Shan Zong, Chunping Liu, Shengrong Gong, Zhaohui Wang, Yu Xia

Auto-TLDR; Semi-supervised Video Object Segmentation using U-shape Convolution and ConvLSTM

Video Semantic Segmentation Using Deep Multi-View Representation Learning

Learning Object Deformation and Motion Adaption for Semi-Supervised Video Object Segmentation

Siamese Dynamic Mask Estimation Network for Fast Video Object Segmentation

Object Segmentation Tracking from Generic Video Cues

Gabriella: An Online System for Real-Time Activity Detection in Untrimmed Security Videos

Early Wildfire Smoke Detection in Videos

Transitional Asymmetric Non-Local Neural Networks for Real-World Dirt Road Segmentation

Tracking Fast Moving Objects by Segmentation Network

Motion U-Net: Multi-Cue Encoder-Decoder Network for Motion Segmentation

Human Segmentation with Dynamic LiDAR Data

Coarse to Fine: Progressive and Multi-Task Learning for Salient Object Detection

A Novel Region of Interest Extraction Layer for Instance Segmentation

Detective: An Attentive Recurrent Model for Sparse Object Detection

Self-Supervised Joint Encoding of Motion and Appearance for First Person Action Recognition

CAggNet: Crossing Aggregation Network for Medical Image Segmentation

Automatic Semantic Segmentation of Structural Elements related to the Spinal Cord in the Lumbar Region by Using Convolutional Neural Networks

Superpixel-Based Refinement for Object Proposal Generation

Motion-Supervised Co-Part Segmentation

A Multi-Task Contextual Atrous Residual Network for Brain Tumor Detection & Segmentation

A Fine-Grained Dataset and Its Efficient Semantic Segmentation for Unstructured Driving Scenarios

Residual Learning of Video Frame Interpolation Using Convolutional LSTM

STaRFlow: A SpatioTemporal Recurrent Cell for Lightweight Multi-Frame Optical Flow Estimation

Boundary-Aware Graph Convolution for Semantic Segmentation

Flow-Guided Spatial Attention Tracking for Egocentric Activity Recognition

TinyVIRAT: Low-Resolution Video Action Recognition

Feature Pyramid Hierarchies for Multi-Scale Temporal Action Detection

FourierNet: Compact Mask Representation for Instance Segmentation Using Differentiable Shape Decoders

RescueNet: Joint Building Segmentation and Damage Assessment from Satellite Imagery

Future Urban Scenes Generation through Vehicles Synthesis

Modeling Long-Term Interactions to Enhance Action Recognition

CT-UNet: An Improved Neural Network Based on U-Net for Building Segmentation in Remote Sensing Images

Multiscale Attention-Based Prototypical Network for Few-Shot Semantic Segmentation

Learning to Segment Clustered Amoeboid Cells from Brightfield Microscopy Via Multi-Task Learning with Adaptive Weight Selection

Directed Variational Cross-encoder Network for Few-Shot Multi-image Co-segmentation

ResFPN: Residual Skip Connections in Multi-Resolution Feature Pyramid Networks for Accurate Dense Pixel Matching

SFPN: Semantic Feature Pyramid Network for Object Detection

Point In: Counting Trees with Weakly Supervised Segmentation Network

Encoder-Decoder Based Convolutional Neural Networks with Multi-Scale-Aware Modules for Crowd Counting

FOANet: A Focus of Attention Network with Application to Myocardium Segmentation

PSDNet: A Balanced Architecture of Accuracy and Parameters for Semantic Segmentation

Attentive Visual Semantic Specialized Network for Video Captioning

Planar 3D Transfer Learning for End to End Unimodal MRI Unbalanced Data Segmentation

What and How? Jointly Forecasting Human Action and Pose

Context Matters: Self-Attention for Sign Language Recognition

Global-Local Attention Network for Semantic Segmentation in Aerial Images

Video Object Detection Using Object's Motion Context and Spatio-Temporal Feature Aggregation

Towards Practical Compressed Video Action Recognition: A Temporal Enhanced Multi-Stream Network