ICPR2020 Paper Browser

Paper download is intended for registered attendees only and is subject to the IEEE Copyright Policy. Any other use is strictly forbidden.

Mutual Information Based Method for Unsupervised Disentanglement of Video Representation

Aditya Sreekar P, Ujjwal Tiwari, Anoop Namboodiri

Auto-TLDR; MIPAE: Mutual Information Predictive Auto-Encoder for Video Prediction


Video prediction is the challenging task of predicting future frames from a given set of context frames that belong to a video sequence. Video prediction models have prospective applications in maneuver planning, health care, autonomous navigation and simulation. One of the major challenges in future frame generation is the high dimensionality of visual data. In this work, we propose the Mutual Information Predictive Auto-Encoder (MIPAE) framework, which simplifies the task of predicting high-dimensional video frames by factorising video representations into content and low-dimensional pose latent variables that are easy to predict. A standard LSTM network is used to predict these low-dimensional pose representations. The content and the predicted pose representations are decoded to generate future frames. Our approach leverages the temporal structure of the latent generative factors of a video and a novel mutual information loss to learn disentangled video representations. We also propose a metric based on the mutual information gap (MIG) to quantitatively assess the effectiveness of disentanglement on the DSprites and MPI3D-real datasets. MIG scores corroborate the visual superiority of frames predicted by MIPAE. We also compare our method quantitatively using the evaluation metrics LPIPS, SSIM and PSNR.
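The abstract mentions a metric based on the mutual information gap (MIG) but does not spell it out. As a rough illustration only, the sketch below computes the commonly used form of MIG for discrete data: for each ground-truth factor, take the difference between the two largest mutual information values across latent dimensions, normalise by the factor's entropy, and average over factors. The function names (`entropy`, `mutual_information`, `mig`) and the plug-in estimator are assumptions made for this example, and the paper's own metric may differ from this definition. Continuous latents would first need to be discretised, which is not shown.

```python
import math
from collections import Counter


def entropy(xs):
    """Plug-in Shannon entropy (in nats) of a sequence of discrete symbols."""
    n = len(xs)
    return -sum((c / n) * math.log(c / n) for c in Counter(xs).values())


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) = H(X) + H(Y) - H(X, Y), in nats."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def mig(latents, factors):
    """Mutual information gap for discrete codes (illustrative, hypothetical helper).

    latents: list of latent dimensions, each a list of discrete codes.
    factors: list of ground-truth factors, each a list of discrete values.
    Requires at least two latent dimensions and non-constant factors.
    """
    gaps = []
    for v in factors:
        # Mutual information between this factor and every latent dimension,
        # sorted so the two most informative dimensions come first.
        mis = sorted((mutual_information(z, v) for z in latents), reverse=True)
        gaps.append((mis[0] - mis[1]) / entropy(v))
    return sum(gaps) / len(gaps)


if __name__ == "__main__":
    shape = [0, 0, 1, 1]
    position = [0, 1, 0, 1]
    # Each latent dimension copies exactly one factor: fully disentangled.
    print(mig([shape, position], [shape, position]))
    # Both latent dimensions encode both factors jointly: fully entangled.
    joint = [0, 1, 2, 3]
    print(mig([joint, joint], [shape, position]))
```

Under this definition a score near 1 means each factor is captured by a single latent dimension, while a score near 0 means at least two dimensions carry the same information about each factor.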

Similar papers

Reducing the Variance of Variational Estimates of Mutual Information by Limiting the Critic's Hypothesis Space to RKHS

Aditya Sreekar P, Ujjwal Tiwari, Anoop Namboodiri

Auto-TLDR; Mutual Information Estimation from Variational Lower Bounds Using a Critic's Hypothesis Space


Mutual information (MI) is an information-theoretic measure of dependency between two random variables. Several methods have been proposed in the literature to estimate MI from samples of two random variables with unknown underlying probability distributions. Recent methods realize a parametric probability distribution, or critic, as a neural network to approximate unknown density ratios. The approximated density ratios are used to estimate different variational lower bounds of MI. While these methods provide reliable estimates when the true MI is low, they produce high-variance estimates when the MI is high. We argue that this high variance is due to the uncontrolled complexity of the critic's hypothesis space. In support of this argument, we use the data-driven Rademacher complexity of the hypothesis space associated with the critic's architecture to analyse the generalization error bound of variational lower bound estimates of MI. We show that it is possible to negate the high-variance characteristics of these estimators by constraining the critic's hypothesis space to a Reproducing Kernel Hilbert Space (RKHS), which corresponds to a kernel learned using Automated Spectral Kernel Learning (ASKL). By analysing the aforementioned generalization error bounds, we augment the overall optimisation objective with an effective regularisation term. We empirically demonstrate the efficacy of this regularisation in enforcing a proper bias-variance tradeoff on four variational lower bounds, namely NWJ, MINE, JS and SMILE.
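To make the idea of a variational lower bound concrete, here is a minimal sketch of two of the bounds named above, evaluated from critic outputs that are assumed to be already available. `dv_bound` is the Donsker-Varadhan form that MINE optimises, and `nwj_bound` is the NWJ form. The function names and the plain-list interface are assumptions for this example; the critic itself, the RKHS constraint and the ASKL kernel described in the abstract are not modelled.

```python
import math


def dv_bound(joint_scores, marginal_scores):
    """Donsker-Varadhan bound: E_joint[T] - log E_marginal[exp(T)].

    joint_scores: critic outputs T(x, y) on samples from the joint p(x, y).
    marginal_scores: critic outputs on samples from the product p(x)p(y).
    """
    mean_joint = sum(joint_scores) / len(joint_scores)
    mean_exp = sum(math.exp(t) for t in marginal_scores) / len(marginal_scores)
    return mean_joint - math.log(mean_exp)


def nwj_bound(joint_scores, marginal_scores):
    """NWJ bound: E_joint[T] - exp(-1) * E_marginal[exp(T)]."""
    mean_joint = sum(joint_scores) / len(joint_scores)
    mean_exp = sum(math.exp(t) for t in marginal_scores) / len(marginal_scores)
    return mean_joint - mean_exp / math.e


if __name__ == "__main__":
    # Hypothetical critic outputs, chosen only to exercise the formulas.
    joint = [2.0, 1.0]
    marginal = [0.0, -1.0]
    print(dv_bound(joint, marginal))
    print(nwj_bound(joint, marginal))
```

For the same critic outputs the Donsker-Varadhan value is never smaller than the NWJ value, because log(x) <= x / e for every positive x. Both estimates average exp(T) over the marginal samples, so a few large critic outputs can dominate that term.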

Dual-MTGAN: Stochastic and Deterministic Motion Transfer for Image-To-Video Synthesis

Fu-En Yang, Jing-Cheng Chang, Yuan-Hao Lee, Yu-Chiang Frank Wang

Auto-TLDR; Dual Motion Transfer GAN for Convolutional Neural Networks

Learning to Take Directions One Step at a Time

AVAE: Adversarial Variational Auto Encoder

Learning Interpretable Representation for 3D Point Clouds

Disentangled Representation Learning for Controllable Image Synthesis: An Information-Theoretic Perspective

Future Urban Scenes Generation through Vehicles Synthesis

The Role of Cycle Consistency for Generating Better Human Action Videos from a Single Frame

Variational Deep Embedding Clustering by Augmented Mutual Information Maximization

Variational Capsule Encoder

A Joint Representation Learning and Feature Modeling Approach for One-Class Recognition

JUMPS: Joints Upsampling Method for Pose Sequences

Disentangle, Assemble, and Synthesize: Unsupervised Learning to Disentangle Appearance and Location

Interpolation in Auto Encoders with Bridge Processes

Combining GANs and AutoEncoders for Efficient Anomaly Detection

DAG-Net: Double Attentive Graph Neural Network for Trajectory Forecasting

Let's Play Music: Audio-Driven Performance Video Generation

Phase Retrieval Using Conditional Generative Adversarial Networks

Talking Face Generation Via Learning Semantic and Temporal Synchronous Landmarks

Exemplar Guided Cross-Spectral Face Hallucination Via Mutual Information Disentanglement

Motion-Supervised Co-Part Segmentation

Video Anomaly Detection by Estimating Likelihood of Representations

Semantics-Guided Representation Learning with Applications to Visual Synthesis

Residual Learning of Video Frame Interpolation Using Convolutional LSTM

AG-GAN: An Attentive Group-Aware GAN for Pedestrian Trajectory Prediction

Shape Consistent 2D Keypoint Estimation under Domain Shift

Switching Dynamical Systems with Deep Neural Networks

Video Reconstruction by Spatio-Temporal Fusion of Blurred-Coded Image Pair

Discriminative Multi-Level Reconstruction under Compact Latent Space for One-Class Novelty Detection

Epitomic Variational Graph Autoencoder

PoseCVAE: Anomalous Human Activity Detection

GAN-Based Gaussian Mixture Model Responsibility Learning

Local Facial Attribute Transfer through Inpainting

What and How? Jointly Forecasting Human Action and Pose

Generative Deep-Neural-Network Mixture Modeling with Semi-Supervised MinMax+EM Learning

Image Representation Learning by Transformation Regression

Improved anomaly detection by training an autoencoder with skip connections on images corrupted with Stain-shaped noise

Single-Modal Incremental Terrain Clustering from Self-Supervised Audio-Visual Feature Learning

Temporally Coherent Embeddings for Self-Supervised Video Representation Learning

Pretraining Image Encoders without Reconstruction Via Feature Prediction Loss

Auto Encoding Explanatory Examples with Stochastic Paths

Learning Low-Shot Generative Networks for Cross-Domain Data

Transferable Model for Shape Optimization subject to Physical Constraints

Estimation of Clinical Tremor Using Spatio-Temporal Adversarial AutoEncoder

High Resolution Face Age Editing

Variational Inference with Latent Space Quantization for Adversarial Resilience

IDA-GAN: A Novel Imbalanced Data Augmentation GAN

Feature-Aware Unsupervised Learning with Joint Variational Attention and Automatic Clustering