ICPR2020 Paper Browser

Paper download is intended for registered attendees only, and is subject to the IEEE Copyright Policy. Any other use is strictly forbidden.

SAILenv: Learning in Virtual Visual Environments Made Simple

Enrico Meloni, Luca Pasqualini, Matteo Tiezzi, Marco Gori, Stefano Melacci

Auto-TLDR; SAILenv: A Simple and Customizable Platform for Visual Recognition in Virtual 3D Environments

Recently, Machine Learning researchers, Computer Vision scientists, engineers and others have shown a growing interest in 3D simulators as a means to artificially create experimental settings that are very close to those of the real world. However, most existing platforms for interfacing algorithms with 3D environments are designed to set up navigation-related experiments, to study physical interactions, or to handle ad-hoc cases that are not meant to be customized; some also lack a strong photorealistic appearance and an easy-to-use software interface. In this paper, we present a novel platform, SAILenv, that is specifically designed to be simple and customizable, and that allows researchers to experiment with visual recognition in virtual 3D scenes. Only a few lines of code are needed to interface any algorithm with the virtual world, and users who are not 3D-graphics experts can easily customize the 3D environment itself, exploiting a collection of photorealistic objects. Our framework yields pixel-level semantic and instance labeling and depth, and, to the best of our knowledge, it is the only one that provides motion-related information directly inherited from the 3D engine. The client-server communication operates at a low level, avoiding the overhead of HTTP-based data exchanges. We perform experiments using a state-of-the-art object detector trained on real-world images, showing that it is able to recognize the photorealistic 3D objects of our environment. The computational cost of obtaining the optical flow compares favourably with that of estimating it using modern GPU-based convolutional networks or more classic implementations. We believe that the scientific community will benefit from the ease of use and high quality of our framework when evaluating newly proposed algorithms in their own customized, realistic conditions.

Similar papers

On Embodied Visual Navigation in Real Environments through Habitat

Marco Rosano, Antonino Furnari, Luigi Gulino, Giovanni Maria Farinella

Auto-TLDR; Learning Navigation Policies on Real World Observations using Real World Images and Sensor and Actuation Noise

The DeepHealth Toolkit: A Unified Framework to Boost Biomedical Applications

Explore and Explain: Self-Supervised Navigation and Recounting

Deep Reinforcement Learning on a Budget: 3D Control and Reasoning without a Supercomputer

Multiple Future Prediction Leveraging Synthetic Trajectories

Object-Oriented Map Exploration and Construction Based on Auxiliary Task Aided DRL

RISEdb: A Novel Indoor Localization Dataset

Future Urban Scenes Generation through Vehicles Synthesis

Surface Material Dataset for Robotics Applications (SMDRA): A Dataset with Friction Coefficient and RGB-D for Surface Segmentation

Object Segmentation Tracking from Generic Video Cues

Two-Stage Adaptive Object Scene Flow Using Hybrid CNN-CRF Model

Learning Dictionaries of Kinematic Primitives for Action Classification

Transformer Networks for Trajectory Forecasting

OmniFlowNet: A Perspective Neural Network Adaptation for Optical Flow Estimation in Omnidirectional Images

Effective Deployment of CNNs for 3DoF Pose Estimation and Grasping in Industrial Settings

A Bayesian Approach to Reinforcement Learning of Vision-Based Vehicular Control

Improving Robotic Grasping on Monocular Images Via Multi-Task Learning and Positional Loss

Polarimetric Image Augmentation

Derivation of Geometrically and Semantically Annotated UAV Datasets at Large Scales from 3D City Models

Benchmarking Cameras for OpenVSLAM Indoors

Enhancing Deep Semantic Segmentation of RGB-D Data with Entangled Forests

Learning from Learners: Adapting Reinforcement Learning Agents to Be Competitive in a Card Game

Self-Supervised Joint Encoding of Motion and Appearance for First Person Action Recognition

STaRFlow: A SpatioTemporal Recurrent Cell for Lightweight Multi-Frame Optical Flow Estimation

Low Dimensional State Representation Learning with Reward-Shaped Priors

P2D: A Self-Supervised Method for Depth Estimation from Polarimetry

DAG-Net: Double Attentive Graph Neural Network for Trajectory Forecasting

IPT: A Dataset for Identity Preserved Tracking in Closed Domains

IPN Hand: A Video Dataset and Benchmark for Real-Time Continuous Hand Gesture Recognition

RWF-2000: An Open Large Scale Video Database for Violence Detection

Weight Estimation from an RGB-D Camera in Top-View Configuration

Learning to Segment Dynamic Objects Using SLAM Outliers

Motion-Supervised Co-Part Segmentation

Learning Non-Rigid Surface Reconstruction from Spatio-Temporal Image Patches

Detecting Anomalies from Video-Sequences: A Novel Descriptor

What and How? Jointly Forecasting Human Action and Pose

End-To-End Deep Learning Methods for Automated Damage Detection in Extreme Events at Various Scales

Early Wildfire Smoke Detection in Videos

Self-Supervised Detection and Pose Estimation of Logistical Objects in 3D Sensor Data

Light3DPose: Real-Time Multi-Person 3D Pose Estimation from Multiple Views

A Novel Region of Interest Extraction Layer for Instance Segmentation

Semantic Object Segmentation in Cultural Sites Using Real and Synthetic Data

Extending Single Beam Lidar to Full Resolution by Fusing with Single Image Depth Estimation

A Grid-Based Representation for Human Action Recognition

Ground-truthing Large Human Behavior Monitoring Datasets

Developing Motion Code Embedding for Action Recognition in Videos

A Heuristic-Based Decision Tree for Connected Components Labeling of 3D Volumes

A Fine-Grained Dataset and Its Efficient Semantic Segmentation for Unstructured Driving Scenarios