ICPR2020 Paper Browser

Paper downloads are intended for registered attendees only and are subject to the IEEE Copyright Policy. Any other use is strictly forbidden.

Video Semantic Segmentation Using Deep Multi-View Representation Learning

Akrem Sellami, Salvatore Tabbone

Auto-TLDR; Deep Multi-view Representation Learning for Video Object Segmentation

In this paper, we propose a deep learning model based on deep multi-view representation learning to address the video object segmentation task. The proposed model emphasizes the importance of the inherent correlation between video frames and incorporates multi-view representation learning based on deep canonically correlated autoencoders. The multi-view representation learning in our model provides an efficient mechanism for capturing these inherent correlations by jointly extracting useful features and learning better representations in a joint feature space, i.e., a shared representation. To increase the amount of training data and the learning capacity, we train the proposed model with pairs of video frames, i.e., $F_{a}$ and $F_{b}$. During the segmentation phase, the deep canonically correlated autoencoder model encodes useful features by processing multiple reference frames together, and these features are used to detect frequently reappearing objects. Our model improves on state-of-the-art deep learning-based methods, which mainly focus on learning discriminative foreground representations over appearance and motion. Experimental results on two large benchmarks demonstrate that the proposed method outperforms competitive approaches and achieves good semantic segmentation performance.
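
The abstract does not state the training objective. As a rough illustration only, the Python sketch below (assuming NumPy) computes a loss in the general form used by deep canonically correlated autoencoders: the negative sum of the top-k canonical correlations between the codes of the two views (here, the frames $F_{a}$ and $F_{b}$ of each training pair) plus a weighted reconstruction error. The function names, the weight lam, the ridge term reg, and the random-projection encoders and least-squares decoders in the demo are hypothetical stand-ins, not the authors' architecture or hyperparameters.

```python
import numpy as np


def _inv_sqrt(cov: np.ndarray) -> np.ndarray:
    """Inverse square root of a symmetric positive-definite matrix."""
    eigvals, eigvecs = np.linalg.eigh(cov)
    return eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T


def total_canonical_correlation(h_a: np.ndarray, h_b: np.ndarray,
                                k: int, reg: float = 1e-4) -> float:
    """Sum of the top-k canonical correlations between two views.

    h_a, h_b: (N, d) codes of frames F_a and F_b, one row per frame pair.
    reg: hypothetical ridge term that keeps the covariances invertible.
    """
    n = h_a.shape[0]
    ha = h_a - h_a.mean(axis=0)  # center each view
    hb = h_b - h_b.mean(axis=0)
    s_aa = ha.T @ ha / (n - 1) + reg * np.eye(ha.shape[1])
    s_bb = hb.T @ hb / (n - 1) + reg * np.eye(hb.shape[1])
    s_ab = ha.T @ hb / (n - 1)
    # The singular values of T are the canonical correlations (descending).
    t = _inv_sqrt(s_aa) @ s_ab @ _inv_sqrt(s_bb)
    corrs = np.linalg.svd(t, compute_uv=False)
    return float(corrs[:k].sum())


def dccae_style_loss(h_a, h_b, x_a, x_b, rec_a, rec_b, k, lam=0.1):
    """Negative cross-view correlation plus weighted reconstruction error."""
    rec = (np.mean(np.sum((x_a - rec_a) ** 2, axis=1))
           + np.mean(np.sum((x_b - rec_b) ** 2, axis=1)))
    return -total_canonical_correlation(h_a, h_b, k) + lam * rec


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, latent, dim, code = 500, 3, 8, 4
    z = rng.normal(size=(n, latent))  # content shared by both frames of a pair
    x_a = z @ rng.normal(size=(latent, dim)) + 0.1 * rng.normal(size=(n, dim))
    x_b = z @ rng.normal(size=(latent, dim)) + 0.1 * rng.normal(size=(n, dim))
    # Hypothetical stand-ins for the deep encoders: fixed random projections.
    h_a = np.tanh(x_a @ rng.normal(size=(dim, code)) / np.sqrt(dim))
    h_b = np.tanh(x_b @ rng.normal(size=(dim, code)) / np.sqrt(dim))
    # Hypothetical linear decoders fitted by least squares.
    rec_a = h_a @ np.linalg.lstsq(h_a, x_a, rcond=None)[0]
    rec_b = h_b @ np.linalg.lstsq(h_b, x_b, rcond=None)[0]
    print("loss:", dccae_style_loss(h_a, h_b, x_a, x_b, rec_a, rec_b, k=latent))
```

Minimizing such a loss pushes the two encoders toward codes that are highly correlated across the frames of a pair while still retaining enough information to reconstruct each frame. In an actual deep model both terms would be backpropagated through the encoders and decoders, which this NumPy sketch does not do.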

Similar papers

ACCLVOS: Atrous Convolution with Spatial-Temporal ConvLSTM for Video Object Segmentation

Muzhou Xu, Shan Zong, Chunping Liu, Shengrong Gong, Zhaohui Wang, Yu Xia

Auto-TLDR; Semi-supervised Video Object Segmentation using U-shape Convolution and ConvLSTM

Revisiting Sequence-To-Sequence Video Object Segmentation with Multi-Task Loss and Skip-Memory

Learning Object Deformation and Motion Adaption for Semi-Supervised Video Object Segmentation

Object Segmentation Tracking from Generic Video Cues

Early Wildfire Smoke Detection in Videos

Siamese Dynamic Mask Estimation Network for Fast Video Object Segmentation

Human Segmentation with Dynamic LiDAR Data

Motion U-Net: Multi-Cue Encoder-Decoder Network for Motion Segmentation

TSMSAN: A Three-Stream Multi-Scale Attentive Network for Video Saliency Detection

Residual Learning of Video Frame Interpolation Using Convolutional LSTM

Boundary-Aware Graph Convolution for Semantic Segmentation

Video Object Detection Using Object's Motion Context and Spatio-Temporal Feature Aggregation

A Grid-Based Representation for Human Action Recognition

Siamese Fully Convolutional Tracker with Motion Correction

MFI: Multi-Range Feature Interchange for Video Action Recognition

Multiscale Attention-Based Prototypical Network for Few-Shot Semantic Segmentation

Global-Local Attention Network for Semantic Segmentation in Aerial Images

Machine-Learned Regularization and Polygonization of Building Segmentation Masks

Motion-Supervised Co-Part Segmentation

Tracking Fast Moving Objects by Segmentation Network

Self-Supervised Joint Encoding of Motion and Appearance for First Person Action Recognition

Incorporating Depth Information into Few-Shot Semantic Segmentation

Coarse to Fine: Progressive and Multi-Task Learning for Salient Object Detection

Do Not Treat Boundaries and Regions Differently: An Example on Heart Left Atrial Segmentation

Video Reconstruction by Spatio-Temporal Fusion of Blurred-Coded Image Pair

Attention Based Coupled Framework for Road and Pothole Segmentation

GraphBGS: Background Subtraction Via Recovery of Graph Signals

Relevance Detection in Cataract Surgery Videos by Spatio-Temporal Action Localization

Gabriella: An Online System for Real-Time Activity Detection in Untrimmed Security Videos

3D Semantic Labeling of Photogrammetry Meshes Based on Active Learning

PHNet: Parasite-Host Network for Video Crowd Counting

Multi-Direction Convolution for Semantic Segmentation

What and How? Jointly Forecasting Human Action and Pose

RescueNet: Joint Building Segmentation and Damage Assessment from Satellite Imagery

Forground-Guided Vehicle Perception Framework

Directed Variational Cross-encoder Network for Few-Shot Multi-image Co-segmentation

Enhanced Feature Pyramid Network for Semantic Segmentation

Self-Supervised Learning of Dynamic Representations for Static Images

Modeling Long-Term Interactions to Enhance Action Recognition

CAggNet: Crossing Aggregation Network for Medical Image Segmentation

Future Urban Scenes Generation through Vehicles Synthesis

HMFlow: Hybrid Matching Optical Flow Network for Small and Fast-Moving Objects

Progressive Scene Segmentation Based on Self-Attention Mechanism

Flow-Guided Spatial Attention Tracking for Egocentric Activity Recognition

Global Feature Aggregation for Accident Anticipation

CASNet: Common Attribute Support Network for Image Instance and Panoptic Segmentation

TinyVIRAT: Low-Resolution Video Action Recognition

FOANet: A Focus of Attention Network with Application to Myocardium Segmentation