ICPR2020 Paper Browser

Paper download is intended for registered attendees only and is subject to the IEEE Copyright Policy. Any other use is strictly forbidden.

Novel View Synthesis from a 6-DoF Pose by Two-Stage Networks

Xiang Guo, Bo Li, Yuchao Dai, Tongxin Zhang, Hui Deng

Auto-TLDR; Novel View Synthesis from a 6-DoF Pose Using Generative Adversarial Network

Novel view synthesis is a challenging problem in 3D vision and robotics. Unlike existing works, which require reference images or a 3D model, we propose a new paradigm for this problem: synthesizing the novel view directly from a 6-DoF pose. Although this is the most straightforward setting, few works address it. Our experiments nevertheless demonstrate that a concise CNN can yield a meaningful parametric model that reconstructs the correct scene images from the 6-DoF pose alone. To this end, we propose a two-stage learning strategy consisting of two consecutive CNNs: GenNet and RefineNet. GenNet generates a coarse image from a camera pose. RefineNet is a generative adversarial network that refines the coarse image. In this way, we decouple the geometric relationship mapping from the texture detail rendering. Extensive experiments on public datasets demonstrate the effectiveness of our method. We believe this paradigm has high research and application value and could be an important direction in novel view synthesis. We will share our code after the acceptance of this work.
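
The two-stage data flow described in the abstract (pose to coarse image, then coarse image to refined image) can be illustrated with a small, untrained sketch. Everything here is an assumption made for illustration: the names `make_gen_net`, `refine_net` and `synthesize_view` are hypothetical, the "GenNet" is a fixed random linear map with a sigmoid rather than a CNN, and the "RefineNet" is a simple residual sharpening step with no adversarial training. It shows only how the stages compose, not the authors' architecture.

```python
import math
import random

POSE_DIM = 6  # translation (x, y, z) + rotation (roll, pitch, yaw)


def make_gen_net(height, width, seed=0):
    """Hypothetical stand-in for GenNet: pose -> coarse grayscale image."""
    rng = random.Random(seed)
    # One weight row per pixel; a real GenNet would be a learned CNN decoder.
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(POSE_DIM)]
               for _ in range(height * width)]

    def gen_net(pose):
        if len(pose) != POSE_DIM:
            raise ValueError("expected a 6-DoF pose")
        # Sigmoid keeps every pixel in (0, 1).
        flat = [1.0 / (1.0 + math.exp(-sum(w * p for w, p in zip(row, pose))))
                for row in weights]
        return [flat[r * width:(r + 1) * width] for r in range(height)]

    return gen_net


def refine_net(coarse, strength=0.5):
    """Hypothetical stand-in for RefineNet: adds a detail residual to the coarse image."""
    height, width = len(coarse), len(coarse[0])
    refined = []
    for y in range(height):
        row = []
        for x in range(width):
            neighbours = [coarse[ny][nx]
                          for ny in (y - 1, y, y + 1)
                          for nx in (x - 1, x, x + 1)
                          if 0 <= ny < height and 0 <= nx < width]
            # Residual = pixel minus local mean; clamp the result to [0, 1].
            residual = coarse[y][x] - sum(neighbours) / len(neighbours)
            row.append(min(1.0, max(0.0, coarse[y][x] + strength * residual)))
        refined.append(row)
    return refined


def synthesize_view(pose, gen_net):
    """Stage 1 (geometry: pose -> coarse image), then stage 2 (texture refinement)."""
    return refine_net(gen_net(pose))
```

Calling `synthesize_view([0.1, -0.2, 0.3, 0.0, 0.5, -0.4], make_gen_net(4, 5))` returns a 4x5 nested list of pixel values in [0, 1]. Keeping the two stages as separate functions mirrors the decoupling the abstract describes: the first stage only has to get the pose-to-layout mapping right, and the second only has to add detail.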

Similar papers

5D Light Field Synthesis from a Monocular Video

Kyuho Bae, Andre Ivan, Hajime Nagahara, In Kyu Park

Auto-TLDR; Synthesis of Light Field Video from Monocular Video using Deep Learning

Commercially available light field cameras have difficulty capturing 5D (4D + time) light field videos: they either capture only still light field images or are too expensive for ordinary users to capture light field video. To tackle this problem, we propose a deep learning-based method for synthesizing a light field video from a monocular video. Because no light field video dataset is available, we introduce a new synthetic light field video dataset of photorealistic scenes rendered with Unreal Engine. The proposed deep learning framework synthesizes a light field video with a full set (9x9) of sub-aperture images from an ordinary monocular video. The proposed network consists of three sub-networks: feature extraction, 5D light field video synthesis, and temporal consistency refinement. Experimental results show that our model successfully synthesizes light field video for both synthetic and real scenes and outperforms the previous frame-by-frame method quantitatively and qualitatively.
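
To make the "5D (4D + time)" structure concrete, the following sketch works out the size of a light field video with the 9x9 angular resolution stated in the abstract. The function names and the axis order `[t][u][v][y][x][c]` are assumptions chosen for illustration, not the layout used by the authors.

```python
ANGULAR_RES = 9  # 9x9 sub-aperture views, as stated in the abstract


def views_per_frame():
    """Number of sub-aperture images making up one light field frame."""
    return ANGULAR_RES * ANGULAR_RES


def light_field_video_shape(num_frames, height, width, channels=3):
    """Shape under the assumed axis order: time, angular (u, v), spatial (y, x), color."""
    return (num_frames, ANGULAR_RES, ANGULAR_RES, height, width, channels)


def total_sub_aperture_images(num_frames):
    """Images to synthesize for a monocular clip of num_frames frames."""
    return num_frames * views_per_frame()
```

Under these assumptions, a 30-frame monocular clip expands to `total_sub_aperture_images(30)`, that is 30 x 81 = 2430 sub-aperture images.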

Hybrid Approach for 3D Head Reconstruction: Using Neural Networks and Visual Geometry

Oussema Bouafif, Bogdan Khomutenko, Mohammed Daoudi

Auto-TLDR; Recovering 3D Head Geometry from a Single Image using Deep Learning and Geometric Techniques

VITON-GT: An Image-Based Virtual Try-On Model with Geometric Transformations

Deep Realistic Novel View Generation for City-Scale Aerial Images

Free-Form Image Inpainting Via Contrastive Attention Network

A Multi-Task Neural Network for Action Recognition with 3D Key-Points

Extending Single Beam Lidar to Full Resolution by Fusing with Single Image Depth Estimation

SIDGAN: Single Image Dehazing without Paired Supervision

Towards Efficient 3D Point Cloud Scene Completion Via Novel Depth View Synthesis

Future Urban Scenes Generation through Vehicles Synthesis

GarmentGAN: Photo-Realistic Adversarial Fashion Transfer

Learning to Take Directions One Step at a Time

Continuous Learning of Face Attribute Synthesis

Unsupervised Face Manipulation Via Hallucination

Position-Aware and Symmetry Enhanced GAN for Radial Distortion Correction

Robust Pedestrian Detection in Thermal Imagery Using Synthesized Images

Do We Really Need Scene-Specific Pose Encoders?

A GAN-Based Blind Inpainting Method for Masonry Wall Images

Mask-Based Style-Controlled Image Synthesis Using a Mask Style Encoder

UCCTGAN: Unsupervised Clothing Color Transformation Generative Adversarial Network

Let's Play Music: Audio-Driven Performance Video Generation

SECI-GAN: Semantic and Edge Completion for Dynamic Objects Removal

Local Facial Attribute Transfer through Inpainting

The Role of Cycle Consistency for Generating Better Human Action Videos from a Single Frame

MixedFusion: 6D Object Pose Estimation from Decoupled RGB-Depth Features

Coherence and Identity Learning for Arbitrary-Length Face Video Generation

Image Inpainting with Contrastive Relation Network

Photometric Stereo with Twin-Fisheye Cameras

Orthographic Projection Linear Regression for Single Image 3D Human Pose Estimation

Learning to Implicitly Represent 3D Human Body from Multi-Scale Features and Multi-View Images

Super-Resolution Guided Pore Detection for Fingerprint Recognition

Semantic-Guided Inpainting Network for Complex Urban Scenes Manipulation

Global Image Sentiment Transfer

Ω-GAN: Object Manifold Embedding GAN for Image Generation by Disentangling Parameters into Pose and Shape Manifolds

Augmentation of Small Training Data Using GANs for Enhancing the Performance of Image Classification

An Unsupervised Approach towards Varying Human Skin Tone Using Generative Adversarial Networks

Efficient Shadow Detection and Removal Using Synthetic Data with Domain Adaptation

Residual Learning of Video Frame Interpolation Using Convolutional LSTM

Learning Disentangled Representations for Identity Preserving Surveillance Face Camouflage

Attributes Aware Face Generation with Generative Adversarial Networks

Dual-MTGAN: Stochastic and Deterministic Motion Transfer for Image-To-Video Synthesis

Real-Time Monocular Depth Estimation with Extremely Light-Weight Neural Network

Partially Supervised Multi-Task Network for Single-View Dietary Assessment

Video Lightening with Dedicated CNN Architecture

Unsupervised Learning of Landmarks Based on Inter-Intra Subject Consistencies

Cascade Attention Guided Residue Learning GAN for Cross-Modal Translation

Attention2AngioGAN: Synthesizing Fluorescein Angiography from Retinal Fundus Images Using Generative Adversarial Networks

Boundary Guided Image Translation for Pose Estimation from Ultra-Low Resolution Thermal Sensor