Visual Prediction of Driver Behavior in Shared Road Areas

Peter Gawronski, Darius Burschka

Responsive image

Auto-TLDR; Predicting Vehicle Behavior in Shared Road Segment Intersections Using Topological Knowledge

Slides Poster

We propose a framework to analyze and predict vehicles behavior within shared road segments like intersections or at narrow passages. The system first identifies critical interaction regions based on topological knowledge. It then checks possible colliding trajectories from the current state of vehicles in the scene, defined by overlapping occupation times in road segments. For each possible interaction area, it analyzes the behavioral profile of both vehicles. Depending on right of way and (unpredictable) behavior parameters, different outcomes are expected and will be tested against input. The interaction between vehicles is analyzed over a short time horizon based on an initial action from one vehicle and the reaction by the other. The vehicle to yield most often performs the first action and the response of the opponent vehicle is measured after a reaction time. The observed reaction is classified by attention, if there was a reaction at all, and the collaboration of the opponent vehicle, whether it helps to resolve the situation or hinders it. The output is a classification of behavior of involved vehicles in terms of active participation in the interaction and assertiveness of driving style in terms of collaborative or disruptive behavior. The additional knowledge is used to refine the prediction of intention and outcome of a scene, which is then compared to the current status to catch unexpected behavior. The applicability of the concept and ideas of the approach is validated on scenarios from the recent Intersection Drone (inD) data set.

Similar papers

Vehicle Lane Merge Visual Benchmark

Kai Cordes, Hellward Broszio

Responsive image

Auto-TLDR; A Benchmark for Automated Cooperative Maneuvering Using Multi-view Video Streams and Ground Truth Vehicle Description

Slides Poster Similar

Automated driving is regarded as the most promising technology for improving road safety in the future. In this context, connected vehicles have an important role regarding their ability to perform cooperative maneuvers for challenging traffic situations. We propose a benchmark for automated cooperative maneuvers. The targeted cooperative maneuver is the vehicle lane merge where a vehicle on the acceleration lane merges into the traffic of a motorway. The benchmark enables the evaluation of vehicle localization approaches as well as the study of cooperative maneuvers. It consists of temporally synchronized multi-view video streams, highly accurate camera calibration, and ground truth vehicle descriptions, including position, heading, speed, and shape. For benchmark generation, the lane merge maneuver is performed by human drivers on a test track, resulting in 120 lane merge data sets with various traffic situations and video recording conditions.

Holistic Grid Fusion Based Stop Line Estimation

Runsheng Xu, Faezeh Tafazzoli, Li Zhang, Timo Rehfeld, Gunther Krehl, Arunava Seal

Responsive image

Auto-TLDR; Fused Multi-Sensory Data for Stop Lines Detection in Intersection Scenarios

Slides Similar

Intersection scenarios provide the most complex traffic situations in Autonomous Driving and Driving Assistance Systems. Knowing where to stop in advance in an intersection is an essential parameter in controlling the longitudinal velocity of the vehicle. Most of the existing methods in literature solely use cameras to detect stop lines, which is typically not sufficient in terms of detection range. To address this issue, we propose a method that takes advantage of fused multi-sensory data including stereo camera and lidar as input and utilizes a carefully designed convolutional neural network architecture to detect stop lines. Our experiments show that the proposed approach can improve detection range compared to camera data alone, works under heavy occlusion without observing the ground markings explicitly, is able to predict stop lines for all lanes and allows detection at a distance up to 50 meters.

Image Sequence Based Cyclist Action Recognition Using Multi-Stream 3D Convolution

Stefan Zernetsch, Steven Schreck, Viktor Kress, Konrad Doll, Bernhard Sick

Responsive image

Auto-TLDR; 3D-ConvNet: A Multi-stream 3D Convolutional Neural Network for Detecting Cyclists in Real World Traffic Situations

Slides Poster Similar

In this article, we present an approach to detect basic movements of cyclists in real world traffic situations based on image sequences, optical flow (OF) sequences, and past positions using a multi-stream 3D convolutional neural network (3D-ConvNet) architecture. To resolve occlusions of cyclists by other traffic participants or road structures, we use a wide angle stereo camera system mounted at a heavily frequented public intersection. We created a large dataset consisting of 1,639 video sequences containing cyclists, recorded in real world traffic, resulting in over 1.1 million samples. Through modeling the cyclists' behavior by a state machine of basic cyclist movements, our approach takes every situation into account and is not limited to certain scenarios. We compare our method to an approach solely based on position sequences. Both methods are evaluated taking into account frame wise and scene wise classification results of basic movements, and detection times of basic movement transitions, where our approach outperforms the position based approach by producing more reliable detections with shorter detection times. Our code and parts of our dataset are made publicly available.

Multiple Future Prediction Leveraging Synthetic Trajectories

Lorenzo Berlincioni, Federico Becattini, Lorenzo Seidenari, Alberto Del Bimbo

Responsive image

Auto-TLDR; Synthetic Trajectory Prediction using Markov Chains

Slides Poster Similar

Trajectory prediction is an important task, especially in autonomous driving. The ability to forecast the position of other moving agents can yield to an effective planning, ensuring safety for the autonomous vehicle as well for the observed entities. In this work we propose a data driven approach based on Markov Chains to generate synthetic trajectories, which are useful for training a multiple future trajectory predictor. The advantages are twofold: on the one hand synthetic samples can be used to augment existing datasets and train more effective predictors; on the other hand, it allows to generate samples with multiple ground truths, corresponding to diverse equally likely outcomes of the observed trajectory. We define a trajectory prediction model and a loss that explicitly address the multimodality of the problem and we show that combining synthetic and real data leads to prediction improvements, obtaining state of the art results.

Street-Map Based Validation of Semantic Segmentation in Autonomous Driving

Laura Von Rueden, Tim Wirtz, Fabian Hueger, Jan David Schneider, Nico Piatkowski, Christian Bauckhage

Responsive image

Auto-TLDR; Semantic Segmentation Mask Validation Using A-priori Knowledge from Street Maps

Slides Poster Similar

Artificial intelligence for autonomous driving must meet strict requirements on safety and robustness, which motivates the thorough validation of learned models. However, current validation approaches mostly require ground truth data and are thus both cost-intensive and limited in their applicability. We propose to overcome these limitations by a model agnostic validation using a-priori knowledge from street maps. In particular, we show how to validate semantic segmentation masks and demonstrate the potential of our approach using OpenStreetMap. We introduce validation metrics that indicate false positive or negative road segments. Besides the validation approach, we present a method to correct the vehicle's GPS position so that a more accurate localization can be used for the street map based validation. Lastly, we present quantitative results on the Cityscapes dataset indicating that our validation approach can indeed uncover errors in semantic segmentation masks.

Real-Time End-To-End Lane ID Estimation Using Recurrent Networks

Ibrahim Halfaoui, Fahd Bouzaraa, Onay Urfalioglu

Responsive image

Auto-TLDR; Real-Time, Vision-Only Lane Identification Using Monocular Camera

Slides Poster Similar

Acquiring information about the road lane structure is a crucial step for autonomous navigation. To this end, several approaches tackle this task from different perspectives such as lane marking detection or semantic lane segmentation.However, to the best of our knowledge, there is yet no purely vision based end-to-end solution to answer the precise question: How to estimate the relative number or "ID" of the current driven lane within a multi-lane road or a highway? In this work, we propose a real-time, vision-only (i.e. monocular camera) solution to the problem based on a dual left-right convention. We interpret this task as a classification problem by limiting the maximum number of lane candidates to eight. Our approach is designed to meet low-complexity specifications and limited runtime requirements. It harnesses the temporal dimension inherent to the input sequences to improve upon high complexity state-of-the-art models. We achieve more than 95% accuracy on a challenging test set with extreme conditions and different routes.

A Bayesian Approach to Reinforcement Learning of Vision-Based Vehicular Control

Zahra Gharaee, Karl Holmquist, Linbo He, Michael Felsberg

Responsive image

Auto-TLDR; Bayesian Reinforcement Learning for Autonomous Driving

Slides Poster Similar

In this paper, we present a state-of-the-art reinforcement learning method for autonomous driving. Our approach employs temporal difference learning in a Bayesian framework to learn vehicle control signals from sensor data. The agent has access to images from a forward facing camera, which are pre-processed to generate semantic segmentation maps. We trained our system using both ground truth and estimated semantic segmentation input. Based on our observations from a large set of experiments, we conclude that training the system on ground truth input data leads to better performance than training the system on estimated input even if estimated input is used for evaluation. The system is trained and evaluated in a realistic simulated urban environment using the CARLA simulator. The simulator also contains a benchmark that allows for comparing to other systems and methods. The required training time of the system is shown to be lower and the performance on the benchmark superior to competing approaches.

Multimodal End-To-End Learning for Autonomous Steering in Adverse Road and Weather Conditions

Jyri Sakari Maanpää, Josef Taher, Petri Manninen, Leo Pakola, Iaroslav Melekhov, Juha Hyyppä

Responsive image

Auto-TLDR; End-to-End Learning for Autonomous Steering in Adverse Road and Weather Conditions with Lidar Data

Slides Poster Similar

Autonomous driving is challenging in adverse road and weather conditions in which there might not be lane lines, the road might be covered in snow and the visibility might be poor. We extend the previous work on end-to-end learning for autonomous steering to operate in these adverse real-life conditions with multimodal data. We collected 28 hours of driving data in several road and weather conditions and trained convolutional neural networks to predict the car steering wheel angle from front-facing color camera images and lidar range and reflectance data. We compared the CNN model performances based on the different modalities and our results show that the lidar modality improves the performances of different multimodal sensor-fusion models. We also performed on-road tests with different models and they support this observation.

Attention Based Coupled Framework for Road and Pothole Segmentation

Shaik Masihullah, Ritu Garg, Prerana Mukherjee, Anupama Ray

Responsive image

Auto-TLDR; Few Shot Learning for Road and Pothole Segmentation on KITTI and IDD

Slides Poster Similar

In this paper, we propose a novel attention based coupled framework for road and pothole segmentation. In many developing countries as well as in rural areas, the drivable areas are neither well-defined, nor well-maintained. Under such circumstances, an Advance Driver Assistant System (ADAS) is needed to assess the drivable area and alert about the potholes ahead to ensure vehicle safety. Moreover, this information can also be used in structured environments for assessment and maintenance of road health. We demonstrate few shot learning approach for pothole detection to leverage accuracy even with fewer training samples. We report the exhaustive experimental results for road segmentation on KITTI and IDD datasets. We also present pothole segmentation on IDD.

RONELD: Robust Neural Network Output Enhancement for Active Lane Detection

Zhe Ming Chng, Joseph Mun Hung Lew, Jimmy Addison Lee

Responsive image

Auto-TLDR; Real-Time Robust Neural Network Output Enhancement for Active Lane Detection

Slides Poster Similar

Accurate lane detection is critical for navigation in autonomous vehicles, particularly the active lane which demarcates the single road space that the vehicle is currently traveling on. Recent state-of-the-art lane detection algorithms utilize convolutional neural networks (CNNs) to train deep learning models on popular benchmarks such as TuSimple and CULane. While each of these models works particularly well on train and test inputs obtained from the same dataset, the performance drops significantly on unseen datasets of different environments. In this paper, we present a real-time robust neural network output enhancement for active lane detection (RONELD) method to identify, track, and optimize active lanes from deep learning probability map outputs. We first adaptively extract lane points from the probability map outputs, followed by detecting curved and straight lanes before using weighted least squares linear regression on straight lanes to fix broken lane edges resulting from fragmentation of edge maps in real images. Lastly, we hypothesize true active lanes through tracking preceding frames. Experimental results demonstrate an up to two-fold increase in accuracy using RONELD on cross-dataset validation tests.

Visual Localization for Autonomous Driving: Mapping the Accurate Location in the City Maze

Dongfang Liu, Yiming Cui, Xiaolei Guo, Wei Ding, Baijian Yang, Yingjie Chen

Responsive image

Auto-TLDR; Feature Voting for Robust Visual Localization in Urban Settings

Slides Poster Similar

Accurate localization is a foundational capacity, required for autonomous vehicles to accomplish other tasks such as navigation or path planning. It is a common practice for vehicles to use GPS to acquire location information. However, the application of GPS can result in severe challenges when vehicles run within the inner city where different kinds of structures may shadow the GPS signal and lead to inaccurate location results. To address the localization challenges of urban settings, we propose a novel feature voting technique for visual localization. Different from the conventional front-view-based method, our approach employs views from three directions (front, left, and right) and thus significantly improves the robustness of location prediction. In our work, we craft the proposed feature voting method into three state-of-the-art visual localization networks and modify their architectures properly so that they can be applied for vehicular operation. Extensive field test results indicate that our approach can predict location robustly even in challenging inner-city settings. Our research sheds light on using the visual localization approach to help autonomous vehicles to find accurate location information in a city maze, within a desirable time constraint.

PolyLaneNet: Lane Estimation Via Deep Polynomial Regression

Talles Torres, Rodrigo Berriel, Thiago Paixão, Claudine Badue, Alberto F. De Souza, Thiago Oliveira-Santos

Responsive image

Auto-TLDR; Real-Time Lane Detection with Deep Polynomial Regression

Slides Poster Similar

One of the main factors that contributed to the large advances in autonomous driving is the advent of deep learning. For safer self-driving vehicles, one of the problems that has yet to be solved completely is lane detection. Since methods for this task have to work in real time (+30 FPS), they not only have to be effective (i.e., have high accuracy) but they also have to be efficient (i.e., fast). In this work, we present a novel method for lane detection that uses as input an image from a forward-looking camera mounted in the vehicle and outputs polynomials representing each lane marking in the image, via deep polynomial regression. The proposed method is shown to be competitive with existing state-of-the-art methods in the TuSimple dataset, while maintaining its efficiency (115 FPS). Additionally, extensive qualitative results on two additional public datasets are presented, alongside with limitations in the evaluation metrics used by recent works for lane detection. Finally, we provide source code and trained models that allow others to replicate all the results shown in this paper, which is surprisingly rare in state-of-the-art lane detection methods.

Ghost Target Detection in 3D Radar Data Using Point Cloud Based Deep Neural Network

Mahdi Chamseddine, Jason Rambach, Oliver Wasenmüler, Didier Stricker

Responsive image

Auto-TLDR; Point Based Deep Learning for Ghost Target Detection in 3D Radar Point Clouds

Slides Poster Similar

Ghost targets are targets that appear at wrong locations in radar data and are caused by the presence of multiple indirect reflections between the target and the sensor. In this work, we introduce the first point based deep learning approach for ghost target detection in 3D radar point clouds. This is done by extending the PointNet network architecture by modifying its input to include radar point features beyond location and introducing skip connetions. We compare different input modalities and analyze the effects of the changes we introduced. We also propose an approach for automatic labeling of ghost targets 3D radar data using lidar as reference. The algorithm is trained and tested on real data in various driving scenarios and the tests show promising results in classifying real and ghost radar targets.

DAG-Net: Double Attentive Graph Neural Network for Trajectory Forecasting

Alessio Monti, Alessia Bertugli, Simone Calderara, Rita Cucchiara

Responsive image

Auto-TLDR; Recurrent Generative Model for Multi-modal Human Motion Behaviour in Urban Environments

Slides Poster Similar

Understanding human motion behaviour is a critical task for several possible applications like self-driving cars or social robots, and in general for all those settings where an autonomous agent has to navigate inside a human-centric environment. This is non-trivial because human motion is inherently multi-modal: given a history of human motion paths, there are many plausible ways by which people could move in the future. Additionally, people activities are often driven by goals, e.g. reaching particular locations or interacting with the environment. We address both the aforementioned aspects by proposing a new recurrent generative model that considers both single agents’ future goals and interactions between different agents. The model exploits a double attention-based graph neural network to collect information about the mutual influences among different agents and integrates it with data about agents’ possible future objectives. Our proposal is general enough to be applied in different scenarios: the model achieves state-of-the-art results in both urban environments and also in sports applications.

Towards life-long mapping of dynamic environments using temporal persistence modeling

Georgios Tsamis, Ioannis Kostavelis, Dimitrios Giakoumis, Dimitrios Tzovaras

Responsive image

Auto-TLDR; Lifelong Mapping for Mobile Robot Navigation in Dynamic Environments

Slides Poster Similar

The contemporary SLAM mapping systems assume a static environment and build a map that is then used for mobile robot navigation disregarding the dynamic changes in this environment. The paper at hand presents a novel solution for the \emph{lifelong mapping} problem that continually updates a metric map represented as a 2D occupancy grid in large scale indoor environments with movable objects such as people, robots, objects etc. suitable for industrial applications. We formalize each cell's occupancy as a failure analysis problem and contribute temporal persistence modeling (TPM), an algorithm for probabilistic prediction of the time that a cell in an observed location is expected to be ``occupied" or ``empty" given sparse prior observations from a task specific mobile robot. Our work is evaluated in Gazebo simulation environment against the nominal occupancy of cells and the estimated obstacles persistence. We also show that robot navigation with lifelong mapping demands less re-plans and leads to more efficient navigation in highly dynamic environments.

Map-Based Temporally Consistent Geolocalization through Learning Motion Trajectories

Bing Zha, Alper Yilmaz

Responsive image

Auto-TLDR; Exploiting Motion Trajectories for Geolocalization of Object on Topological Map using Recurrent Neural Network

Slides Poster Similar

In this paper, we propose a novel trajectory learning method that exploits motion trajectories on topological map using recurrent neural network for temporally consistent geolocalization of object. Inspired by human's ability to both be aware of distance and direction of self-motion in navigation, our trajectory learning method learns a pattern representation of trajectories encoded as a sequence of distances and turning angles to assist self-localization. We pose the learning process as a conditional sequence prediction problem in which each output locates the object on a traversable edge in a map. Considering the prediction sequence ought to be topologically connected in the graph-structured map, we adopt two different hypotheses generation and elimination strategies to eliminate disconnected sequence prediction. We demonstrate our approach on the KITTI stereo visual odometry dataset which is a city-scale environment. The key benefits of our approach to geolocalization are that 1) we take advantage of powerful sequence modeling ability of recurrent neural network and its robustness to noisy input, 2) only require a map in the form of a graph and 3) simply use an affordable sensor that generates motion trajectory. The experiments show that the motion trajectories can be learned by training an recurrent neural network, and temporally consistent geolocation can be predicted with both of the proposed strategies.

Semantic Segmentation for Pedestrian Detection from Motion in Temporal Domain

Guo Cheng, Jiang Yu Zheng

Responsive image

Auto-TLDR; Motion Profile: Recognizing Pedestrians along with their Motion Directions in a Temporal Way

Slides Poster Similar

In autonomous driving, state-of-the-art methods detect pedestrian through appearance in 2-D spatial images. However, these approaches are typically time-consuming because of the complexity of algorithms to cope with large variations in shape, pose, action, and illumination. They also fall short of capturing temporal continuity in motion trace. In a completely different approach, this work recognizes pedestrians along with their motion directions in a temporal way. By projecting a driving video to a 2-D temporal image called Motion Profile (MP), we can robustly distinguish pedestrian in motion and standing-still against smooth background motion. To ensure non-redundant data processing of deep network on a compact motion profile further, a novel temporal-shift memory (TSM) model is developed to perform deep learning of sequential input in linear processing time. In experiments containing various pedestrian motion from sensors such as video and LiDAR, we demonstrate that, with the data size around 3/720th of video volume, this motion-based method can reach the detecting rate of pedestrians at 90% in near and mid-range on the road. With a super-fast processing speed and good accuracy, this method is promising for intelligent vehicles.

AV-SLAM: Autonomous Vehicle SLAM with Gravity Direction Initialization

Kaan Yilmaz, Baris Suslu, Sohini Roychowdhury, L. Srikar Muppirisetty

Responsive image

Auto-TLDR; VI-SLAM with AGI: A combination of three SLAM algorithms for autonomous vehicles

Slides Poster Similar

Simultaneous localization and mapping (SLAM) algorithms that are aimed at autonomous vehicles (AVs) are required to utilize sensor redundancies specific to AVs and enable accurate, fast and repeatable estimations of pose and path trajectories. In this work, we present a combination of three SLAM algorithms that utilize a different subset of available sensors such as inertial measurement unit (IMU), a gray-scale mono-camera, and a Lidar. Also, we propose a novel acceleration-based gravity direction initialization (AGI) method for the visual-inertial SLAM algorithm. We analyze the SLAM algorithms and initialization methods for pose estimation accuracy, speed of convergence and repeatability on the KITTI odometry sequences. The proposed VI-SLAM with AGI method achieves relative pose errors less than 2\%, convergence in half a minute or less and convergence time variability less than 3s, which makes it preferable for AVs.

Generic Merging of Structure from Motion Maps with a Low Memory Footprint

Gabrielle Flood, David Gillsjö, Patrik Persson, Anders Heyden, Kalle Åström

Responsive image

Auto-TLDR; A Low-Memory Footprint Representation for Robust Map Merge

Slides Poster Similar

With the development of cheap image sensors, the amount of available image data have increased enormously, and the possibility of using crowdsourced collection methods has emerged. This calls for development of ways to handle all these data. In this paper, we present new tools that will enable efficient, flexible and robust map merging. Assuming that separate optimisations have been performed for the individual maps, we show how only relevant data can be stored in a low memory footprint representation. We use these representations to perform map merging so that the algorithm is invariant to the merging order and independent of the choice of coordinate system. The result is a robust algorithm that can be applied to several maps simultaneously. The result of a merge can also be represented with the same type of low-memory footprint format, which enables further merging and updating of the map in a hierarchical way. Furthermore, the method can perform loop closing and also detect changes in the scene between the capture of the different image sequences. Using both simulated and real data — from both a hand held mobile phone and from a drone — we verify the performance of the proposed method.

Electroencephalography Signal Processing Based on Textural Features for Monitoring the Driver’s State by a Brain-Computer Interface

Giulia Orrù, Marco Micheletto, Fabio Terranova, Gian Luca Marcialis

Responsive image

Auto-TLDR; One-dimensional Local Binary Pattern Algorithm for Estimating Driver Vigilance in a Brain-Computer Interface System

Slides Poster Similar

In this study we investigate a textural processing method of electroencephalography (EEG) signal as an indicator to estimate the driver's vigilance in a hypothetical Brain-Computer Interface (BCI) system. The novelty of the solution proposed relies on employing the one-dimensional Local Binary Pattern (1D-LBP) algorithm for feature extraction from pre-processed EEG data. From the resulting feature vector, the classification is done according to three vigilance classes: awake, tired and drowsy. The claim is that the class transitions can be detected by describing the variations of the micro-patterns' occurrences along the EEG signal. The 1D-LBP is able to describe them by detecting mutual variations of the signal temporarily "close" as a short bit-code. Our analysis allows to conclude that the 1D-LBP adoption has led to significant performance improvement. Moreover, capturing the class transitions from the EEG signal is effective, although the overall performance is not yet good enough to develop a BCI for assessing the driver's vigilance in real environments.

Learning from Learners: Adapting Reinforcement Learning Agents to Be Competitive in a Card Game

Pablo Vinicius Alves De Barros, Ana Tanevska, Alessandra Sciutti

Responsive image

Auto-TLDR; Adaptive Reinforcement Learning for Competitive Card Games

Slides Poster Similar

Learning how to adapt to complex and dynamic environments is one of the most important factors that contribute to our intelligence. Endowing artificial agents with this ability is not a simple task, particularly in competitive scenarios. In this paper, we present a broad study on how popular reinforcement learning algorithms can be adapted and implemented to learn and to play a real-world implementation of a competitive multiplayer card game. We propose specific training and validation routines for the learning agents, in order to evaluate how the agents learn to be competitive and explain how they adapt to each others' playing style. Finally, we pinpoint how the behavior of each agent derives from their learning style and create a baseline for future research on this scenario.

A Fine-Grained Dataset and Its Efficient Semantic Segmentation for Unstructured Driving Scenarios

Kai Andreas Metzger, Peter Mortimer, Hans J "Joe" Wuensche

Responsive image

Auto-TLDR; TAS500: A Semantic Segmentation Dataset for Autonomous Driving in Unstructured Environments

Slides Poster Similar

Research in autonomous driving for unstructured environments suffers from a lack of semantically labeled datasets compared to its urban counterpart. Urban and unstructured outdoor environments are challenging due to the varying lighting and weather conditions during a day and across seasons. In this paper, we introduce TAS500, a novel semantic segmentation dataset for autonomous driving in unstructured environments. TAS500 offers fine-grained vegetation and terrain classes to learn drivable surfaces and natural obstacles in outdoor scenes effectively. We evaluate the performance of modern semantic segmentation models with an additional focus on their efficiency. Our experiments demonstrate the advantages of fine-grained semantic classes to improve the overall prediction accuracy, especially along the class boundaries. The dataset, code, and pretrained model are available online.

Temporal Pulses Driven Spiking Neural Network for Time and Power Efficient Object Recognition in Autonomous Driving

Wei Wang, Shibo Zhou, Jingxi Li, Xiaohua Li, Junsong Yuan, Zhanpeng Jin

Responsive image

Auto-TLDR; Spiking Neural Network for Real-Time Object Recognition on Temporal LiDAR Pulses

Slides Poster Similar

Accurate real-time object recognition from sensory data has long been a crucial and challenging task for autonomous driving. Even though deep neural networks (DNNs) have been widely applied in this area, their considerable processing latency, power consumption as well as computational complexity have been challenging issues for real-time autonomous driving applications. In this paper, we propose an approach to address the real-time object recognition problem utilizing spiking neural networks (SNNs). The proposed SNN model works directly with raw temporal LiDAR pulses without the pulse-to-point cloud preprocessing procedure, which can significantly reduce delay and power consumption. Being evaluated on various datasets derived from LiDAR and dynamic vision sensor (DVS), including Sim LiDAR, KITTI, and DVS-barrel, our proposed model has shown remarkable time and power efficiency, while achieving comparable recognition performance as the state-of-the-art methods. This paper highlights the SNN's great potentials in autonomous driving and related applications. To the best of our knowledge, this is the first attempt to use SNN to directly perform time and energy efficient object recognition on temporal LiDAR pulses in the setting of autonomous driving.

RISEdb: A Novel Indoor Localization Dataset

Carlos Sanchez Belenguer, Erik Wolfart, Álvaro Casado Coscollá, Vitor Sequeira

Responsive image

Auto-TLDR; Indoor Localization Using LiDAR SLAM and Smartphones: A Benchmarking Dataset

Slides Poster Similar

In this paper we introduce a novel public dataset for developing and benchmarking indoor localization systems. We have selected and 3D mapped a set of representative indoor environments including a large office building, a conference room, a workshop, an exhibition area and a restaurant. Our acquisition pipeline is based on a portable LiDAR SLAM backpack to map the buildings and to accurately track the pose of the user as it moves freely inside them. We introduce the calibration procedures that enable us to acquire and geo-reference live data coming from different independent sensors rigidly attached to the backpack. This has allowed us to collect long sequences of spherical and stereo images, together with all the sensor readings coming from a consumer smartphone and locate them inside the map with centimetre accuracy. The dataset addresses many of the limitations of existing indoor localization datasets regarding the scale and diversity of the mapped buildings; the number of acquired sequences under varying conditions; the accuracy of the ground-truth trajectory; the availability of a detailed 3D model and the availability of different sensor types. It enables the benchmarking of existing and the development of new indoor localization approaches, in particular for deep learning based systems that require large amounts of labeled training data.

Derivation of Geometrically and Semantically Annotated UAV Datasets at Large Scales from 3D City Models

Sidi Wu, Lukas Liebel, Marco Körner

Responsive image

Auto-TLDR; Large-Scale Dataset of Synthetic UAV Imagery for Geometric and Semantic Annotation

Slides Poster Similar

While in high demand for the development of deep learning approaches, extensive datasets of annotated UAV imagery are still scarce today. Manual annotation, however, is time-consuming and, thus, has limited the potential for creating large-scale datasets. We tackle this challenge by presenting a procedure for the automatic creation of simulated UAV image sequences in urban areas and pixel-level annotations from publicly available data sources. We synthesize photo-realistic UAV imagery from Goole Earth Studio and derive annotations from an open CityGML model that not only provides geometric but also semantic information. The first dataset we exemplarily created using our approach contains 144000 images of Berlin, Germany, with four types of annotations, namely semantic labels as well as depth, surface normals, and edge maps. In the future, a complete pipeline regarding all the technical problems will be provided, together with more accurate models to refine some of the empirical settings currently, to automatically generate a large-scale dataset with reliable ground-truth annotations over the whole city of Berlin. The dataset, as well as the source code, will be published by then. Different methods will also be facilitated to test the usability of the dataset. We believe our dataset can be used for, and not limited to, tasks like pose estimation, geo-localization, monocular depth estimation, edge detection, building/surface classification, and plane segmentation. A potential research pipeline for geo-localization based on the synthetic dataset is provided.

Uncertainty-Sensitive Activity Recognition: A Reliability Benchmark and the CARING Models

Alina Roitberg, Monica Haurilet, Manuel Martinez, Rainer Stiefelhagen

Responsive image

Auto-TLDR; CARING: Calibrated Action Recognition with Input Guidance

Slides Similar

Beyond assigning the correct class, an activity recognition model should also to be able to determine, how certain it is in its predictions. We present the first study of how well the confidence values of modern action recognition architectures indeed reflect the probability of the correct outcome and propose a learning-based approach for improving it. First, we extend two popular action recognition datasets with a reliability benchmark in form of the expected calibration error and reliability diagrams. Since our evaluation highlights that confidence values of standard action recognition architectures do not represent the uncertainty well, we introduce a new approach which learns to transform the model output into realistic confidence estimates through an additional calibration network. The main idea of our Calibrated Action Recognition with Input Guidance (CARING) model is to learn an optimal scaling parameter depending on the video representation. We compare our model with the native action recognition networks and the temperature scaling approach - a wide spread calibration method utilized in image classification. While temperature scaling alone drastically improves the reliability of the confidence values, our CARING method consistently leads to the best uncertainty estimates in all benchmark settings.

Calibration and Absolute Pose Estimation of Trinocular Linear Camera Array for Smart City Applications

Martin Ahrnbom, Mikael Nilsson, Håkan Ardö, Kalle Åström, Oksana Yastremska-Kravchenko, Aliaksei Laureshyn

Responsive image

Auto-TLDR; Trinocular Linear Camera Array Calibration for Traffic Surveillance Applications

Slides Poster Similar

A method for calibrating a Trinocular Linear Camera Array (TLCA) for traffic surveillance applications, such as towards smart cities, is presented. A TLCA-specific parametrization guarantees that the calibration finds a model where all the cameras are on a straight line. The method uses both a chequerboard close to the camera, as well as measured 3D points far from the camera: points measured in world coordinates, as well as their corresponding 2D points found manually in the images. Superior calibration accuracy can be obtained compared to standard methods using only a single data source, largely due to the use of chequerboards, while the line constraint in the parametrization allows for joint rectification. The improved triangulation accuracy, from 8-12 cm to around 6 cm when calibrating with 30-50 points in our experiment, allowing better road user analysis. The method is demonstrated by a proof-of-concept application where a point cloud is generated from multiple disparity maps, visualizing road user detections in 3D.

Learning Dictionaries of Kinematic Primitives for Action Classification

Alessia Vignolo, Nicoletta Noceti, Alessandra Sciutti, Francesca Odone, Giulio Sandini

Responsive image

Auto-TLDR; Action Understanding using Visual Motion Primitives

Slides Poster Similar

This paper proposes a method based on visual motion primitives to address the problem of action understanding. The approach builds in an unsupervised way a dictionary of kinematic primitives from a set of sub-movements obtained by segmenting the velocity profile of an action on the basis of local minima derived directly from the optical flow. The dictionary is then used to describe each sub-movement as a linear combination of atoms using sparse coding. The descriptive capability of the proposed motion representation is experimentally validated on the MoCA dataset, a collection of synchronized multi-view videos and motion capture data of cooking activities. The results show that the approach, despite its simplicity, has a good performance in action classification, especially when the motion primitives are combined over time. Also, the method is proved to be tolerant to view point changes, and can thus support cross-view action recognition. Overall, the method may be seen as a backbone of a general approach to action understanding, with potential applications in robotics.

CARRADA Dataset: Camera and Automotive Radar with Range-Angle-Doppler Annotations

Arthur Ouaknine, Alasdair Newson, Julien Rebut, Florence Tupin, Patrick Pérez

Responsive image

Auto-TLDR; CARRADA: A dataset of synchronized camera and radar recordings with range-angle-Doppler annotations for autonomous driving

Slides Poster Similar

High quality perception is essential for autonomous driving (AD) systems. To reach the accuracy and robustness that are required by such systems, several types of sensors must be combined. Currently, mostly cameras and laser scanners (lidar) are deployed to build a representation of the world around the vehicle. While radar sensors have been used for a long time in the automotive industry, they are still under-used for AD despite their appealing characteristics (notably, their ability to measure the relative speed of obstacles and to operate even in adverse weather conditions). To a large extent, this situation is due to the relative lack of automotive datasets with real radar signals that are both raw and annotated. In this work, we introduce CARRADA, a dataset of synchronized camera and radar recordings with range-angle-Doppler annotations. We also present a semi-automatic annotation approach, which was used to annotate the dataset, and a radar semantic segmentation baseline, which we evaluate on several metrics. Both our code and dataset will be released.

Deep Next-Best-View Planner for Cross-Season Visual Route Classification

Kurauchi Kanya, Kanji Tanaka

Responsive image

Auto-TLDR; Active Visual Place Recognition using Deep Convolutional Neural Network

Slides Poster Similar

This paper addresses the problem of active visual place recognition (VPR) from a novel perspective of long-term autonomy. In our approach, a next-best-view (NBV) planner plans an optimal action-observation-sequence to maximize the expected cost-performance for a visual route classification task. A difficulty arises from the fact that the NBV planner is trained and tested in different domains (times of day, weather conditions, and seasons). Existing NBV methods may be confused and deteriorated by the domain-shifts, and require significant efforts for adapting them to a new domain. We address this issue by a novel deep convolutional neural network (DNN) -based NBV planner that does not require the adaptation. Our main contributions in this paper are summarized as follows: (1) We present a novel domain-invariant NBV planner that is specifically tailored for DNN-based VPR. (2) We formulate the active VPR as a POMDP problem and present a feasible solution to address the inherent intractability. Specifically, the probability distribution vector (PDV) output by the available DNN is used as a domain-invariant observation model without the need to retrain it. (3) We verify efficacy of the proposed approach through challenging cross-season VPR experiments, where it is confirmed that the proposed approach clearly outperforms the previous single-view-based or multi-view-based VPR in terms of VPR accuracy and/or action-observation-cost.

Lane Detection Based on Object Detection and Image-To-Image Translation

Hiroyuki Komori, Kazunori Onoguchi

Responsive image

Auto-TLDR; Lane Marking and Road Boundary Detection from Monocular Camera Images using Inverse Perspective Mapping

Slides Poster Similar

In this paper, we propose a method to detect various types of lane markings and road boundaries simultaneously from a monocular camera image. This method detects lane markings and road boundaries in IPM images obtained by the Inverse Perspective Mapping of input images. First, bounding boxes surrounding a lane marking or the road boundary are extracted by the object detection network. At the same time, these areas are labelled with a solid line, a dashed line, a zebra line, a curb, a grass, a sidewall and so on. Next, in each bounding box, lane marking boundaries or road boundaries are drawn by the image-to-image translation network. We use YOLOv3 for the object detection and pix2pix for the image translation. We create our own datasets including various types of lane markings and road boundaries and evaluate our approach using these datasets qualitatively and quantitatively.

Real-Time Drone Detection and Tracking with Visible, Thermal and Acoustic Sensors

Fredrik Svanström, Cristofer Englund, Fernando Alonso-Fernandez

Responsive image

Auto-TLDR; Automatic multi-sensor drone detection using sensor fusion

Slides Poster Similar

This paper explores the process of designing an automatic multi-sensor drone detection system. Besides the common video and audio sensors, the system also includes a thermal infrared camera, which is shown to be a feasible solution to the drone detection task. Even with slightly lower resolution, the performance is just as good as a camera in visible range. The detector performance as a function of the sensor-to-target distance is also investigated. In addition, using sensor fusion, the system is made more robust than the individual sensors, helping to reduce false detections. To counteract the lack of public datasets, a novel video dataset containing 650 annotated infrared and visible videos of drones, birds, airplanes and helicopters is also presented. The database is complemented with an audio dataset of the classes drones, helicopters and background noise.

Road Network Metric Learning for Estimated Time of Arrival

Yiwen Sun, Kun Fu, Zheng Wang, Changshui Zhang, Jieping Ye

Responsive image

Auto-TLDR; Road Network Metric Learning for Estimated Time of Arrival (RNML-ETA)

Slides Poster Similar

Recently, deep learning have achieved promising results in Estimated Time of Arrival (ETA), which is considered as predicting the travel time from the origin to the destination along a given path. One of the key techniques is to use embedding vectors to represent the elements of road network, such as the links (road segments). However, the embedding suffers from the data sparsity problem that many links in the road network are traversed by too few floating cars even in large ride-hailing platforms like Uber and DiDi. Insufficient data makes the embedding vectors in an under-fitting status, which undermines the accuracy of ETA prediction. To address the data sparsity problem, we propose the Road Network Metric Learning framework for ETA (RNML ETA). It consists of two components: (1) a main regression task to predict the travel time, and (2) an auxiliary metric learning task to improve the quality of link embedding vectors. We further propose the triangle loss, a novel loss function to improve the efficiency of metric learning. We validated the effectiveness of RNML-ETA on large scale real-world datasets, by showing that our method outperforms the state-of-the-art model and the promotion concentrates on the cold links with few data.

Future Urban Scenes Generation through Vehicles Synthesis

Alessandro Simoni, Luca Bergamini, Andrea Palazzi, Simone Calderara, Rita Cucchiara

Responsive image

Auto-TLDR; Predicting the Future of an Urban Scene with a Novel View Synthesis Paradigm

Slides Poster Similar

In this work we propose a deep learning pipeline to predict the visual future appearance of an urban scene. Despite recent advances, generating the entire scene in an end-to-end fashion is still far from being achieved. Instead, here we follow a two stages approach, where interpretable information is included in the loop and each actor is modelled independently. We leverage a per-object novel view synthesis paradigm; i.e. generating a synthetic representation of an object undergoing a geometrical roto-translation in the 3D space. Our model can be easily conditioned with constraints (e.g. input trajectories) provided by state-of-the-art tracking methods or by the user itself. This allows us to generate a set of diverse realistic futures starting from the same input in a multi-modal fashion. We visually and quantitatively show the superiority of this approach over traditional end-to-end scene-generation methods on CityFlow, a challenging real world dataset.

Benchmarking Cameras for OpenVSLAM Indoors

Kevin Chappellet, Guillaume Caron, Fumio Kanehiro, Ken Sakurada, Abderrahmane Kheddar

Responsive image

Auto-TLDR; OpenVSLAM: Benchmarking Camera Types for Visual Simultaneous Localization and Mapping

Slides Poster Similar

In this paper we benchmark different types of cameras and evaluate their performance in terms of reliable localization reliability and precision in Visual Simultaneous Localization and Mapping (vSLAM). Such benchmarking is merely found for visual odometry, but never for vSLAM. Existing studies usually compare several algorithms for a given camera. %This work is the first to handle the dual of the latter, i.e. comparing several cameras for a given SLAM algorithm. The evaluation methodology we propose is applied to the recent OpenVSLAM framework. The latter is versatile enough to natively deal with perspective, fisheye, 360 cameras in a monocular or stereoscopic setup, an in RGB or RGB-D modalities. Results in various sequences containing light variation and scenery modifications in the scene assess quantitatively the maximum localization rate for 360 vision. In the contrary, RGB-D vision shows the lowest localization rate, but highest precision when localization is possible. Stereo-fisheye trades-off with localization rates and precision between 360 vision and RGB-D vision. The dataset with ground truth will be made available in open access to allow evaluating other/future vSLAM algorithms with respect to these camera types.

Anomaly Detection, Localization and Classification for Railway Inspection

Riccardo Gasparini, Andrea D'Eusanio, Guido Borghi, Stefano Pini, Giuseppe Scaglione, Simone Calderara, Eugenio Fedeli, Rita Cucchiara

Responsive image

Auto-TLDR; Anomaly Detection and Localization using thermal images in the lowlight environment

Slides Similar

The ability to detect, localize and classify objects that are anomalies is a challenging task in the computer vision community. In this paper, we tackle these tasks developing a framework to automatically inspect the railway during the night. Specifically, it is able to predict the presence, the image coordinates and the class of obstacles. To deal with the lowlight environment, the framework is based on thermal images and consists of three different modules that address the problem of detecting anomalies, predicting their image coordinates and classifying them. Moreover, due to the absolute lack of publicly released datasets collected in the railway context for anomaly detection, we introduce a new multi-modal dataset, acquired from a rail drone, used to evaluate the proposed framework. Experimental results confirm the accuracy of the framework and its suitability, in terms of computational load, performance, and inference time, to be implemented on a self-powered inspection system.

Utilising Visual Attention Cues for Vehicle Detection and Tracking

Feiyan Hu, Venkatesh Gurram Munirathnam, Noel E O'Connor, Alan Smeaton, Suzanne Little

Responsive image

Auto-TLDR; Visual Attention for Object Detection and Tracking in Driver-Assistance Systems

Slides Poster Similar

Advanced Driver-Assistance Systems (ADAS) have been attracting attention from many researchers. Vision based sensors are the closest way to emulate human driver visual behavior while driving. In this paper, we explore possible ways to use visual attention (saliency) for object detection and tracking. We investigate: 1) How a visual attention map such as a subjectness attention or saliency map and an objectness attention map can facilitate region proposal generation in a 2-stage object detector; 2) How a visual attention map can be used for tracking multiple objects. We propose a neural network that can simultaneously detect objects as and generate objectness and subjectness maps to save computational power. We further exploit the visual attention map during tracking using a sequential Monte Carlo probability hypothesis density (PHD) filter. The experiments are conducted on KITTI and DETRAC datasets. The use of visual attention and hierarchical features has shown a considerable improvement of≈8% in object detection which effectively increased tracking performance by≈4% on KITTI dataset.

Fully Convolutional Neural Networks for Raw Eye Tracking Data Segmentation, Generation, and Reconstruction

Wolfgang Fuhl, Yao Rong, Enkelejda Kasneci

Responsive image

Auto-TLDR; Semantic Segmentation of Eye Tracking Data with Fully Convolutional Neural Networks

Slides Poster Similar

In this paper, we use fully convolutional neural networks for the semantic segmentation of eye tracking data. We also use these networks for reconstruction, and in conjunction with a variational auto-encoder to generate eye movement data. The first improvement of our approach is that no input window is necessary, due to the use of fully convolutional networks and therefore any input size can be processed directly. The second improvement is that the used and generated data is raw eye tracking data (position X, Y and time) without preprocessing. This is achieved by pre-initializing the filters in the first layer and by building the input tensor along the z axis. We evaluated our approach on three publicly available datasets and compare the results to the state of the art.

Unconstrained Vision Guided UAV Based Safe Helicopter Landing

Arindam Sikdar, Abhimanyu Sahu, Debajit Sen, Rohit Mahajan, Ananda Chowdhury

Responsive image

Auto-TLDR; Autonomous Helicopter Landing in Hazardous Environments from Unmanned Aerial Images Using Constrained Graph Clustering

Slides Poster Similar

In this paper, we have addressed the problem of automated detection of safe zone(s) for helicopter landing in hazardous environments from images captured by an Unmanned Aerial Vehicle (UAV). The unconstrained motion of the image capturing drone (the UAV in our case) makes the problem further difficult. The solution pipeline consists of natural landmark detection and tracking, stereo-pair generation using constrained graph clustering, digital terrain map construction and safe landing zone detection. The main methodological contribution lies in mathematically formulating epipolar constraint and then using it in a Minimum Spanning Tree (MST) based graph clustering approach. We have also made publicly available AHL (Autonomous Helicopter Landing) dataset, a new aerial video dataset captured by a drone, with annotated ground-truths. Experimental comparisons with other competing clustering methods i) in terms of Dunn Index and Davies Bouldin Index as well as ii) for frame-level safe zone detection in terms of F-measure and confusion matrix clearly demonstrate the effectiveness of the proposed formulation.

Enhancing Depth Quality of Stereo Vision Using Deep Learning-Based Prior Information of the Driving Environment

Weifu Li, Vijay John, Seiichi Mita

Responsive image

Auto-TLDR; A Novel Post-processing Mathematical Framework for Stereo Vision

Slides Poster Similar

Generation of high density depth values of the driving environment is indispensable for autonomous driving. Stereo vision is one of the practical and effective methods to generate these depth values. However, the accuracy of the stereo vision is limited by texture-less regions, such as sky and road areas, and repeated patterns in the image. To overcome these problems, we propose to enhance the stereo generated depth by incorporating prior information of the driving environment. Prior information, generated by deep learning-based U-Net model, is utilized in a novel post-processing mathematical framework to refine the stereo generated depth. The proposed mathematical framework is formulated as an optimization problem, which refines the errors due to texture-less regions and repeated patterns. Owing to its mathematical formulation, the post-processing framework is not a black-box and is explainable, and can be readily utilized for depth maps generated by any stereo vision algorithm. The proposed framework is qualitatively validated on the acquired dataset and KITTI dataset. The results obtained show that the proposed framework improves the stereo depth generation accuracy

Transformer Networks for Trajectory Forecasting

Francesco Giuliari, Hasan Irtiza, Marco Cristani, Fabio Galasso

Responsive image

Auto-TLDR; TransformerNetworks for Trajectory Prediction of People Interactions

Slides Poster Similar

Most recent successes on forecasting the people mo-tion are based on LSTM models andallmost recent progress hasbeen achieved by modelling the social interaction among peopleand the people interaction with the scene. We question the useof the LSTM models and propose the novel use of TransformerNetworks for trajectory forecasting. This is a fundamental switchfrom the sequential step-by-step processing of LSTMs to theonly-attention-based memory mechanisms of Transformers. Inparticular, we consider both the original Transformer Network(TF) and the larger Bidirectional Transformer (BERT), state-of-the-art on all natural language processing tasks. Our proposedTransformers predict the trajectories of the individual peoplein the scene. These are “simple” models because each personis modelled separately without any complex human-human norscene interaction terms. In particular, the TF modelwithoutbells and whistlesyields the best score on the largest and mostchallenging trajectory forecasting benchmark of TrajNet [1]. Ad-ditionally, its extension which predicts multiple plausible futuretrajectories performs on par with more engineered techniqueson the 5 datasets of ETH [2]+UCY [3]. Finally, we showthat Transformers may deal with missing observations, as itmay be the case with real sensor data. Code is available atgithub.com/FGiuliari/Trajectory-Transformer

Explainable Online Validation of Machine Learning Models for Practical Applications

Wolfgang Fuhl, Yao Rong, Thomas Motz, Michael Scheidt, Andreas Markus Hartel, Andreas Koch, Enkelejda Kasneci

Responsive image

Auto-TLDR; A Reformulation of Regression and Classification for Machine Learning Algorithm Validation

Slides Poster Similar

We present a reformulation of the regression and classification, which aims to validate the result of a machine learning algorithm. Our reformulation simplifies the original problem and validates the result of the machine learning algorithm using the training data. Since the validation of machine learning algorithms must always be explainable, we perform our experiments with the kNN algorithm as well as with an algorithm based on conditional probabilities, which is proposed in this work. For the evaluation of our approach, three publicly available data sets were used and three classification and two regression problems were evaluated. The presented algorithm based on conditional probabilities is also online capable and requires only a fraction of memory compared to the kNN algorithm.

Localization of Unmanned Aerial Vehicles in Corridor Environments Using Deep Learning

Ram Padhy, Shahzad Ahmad, Sachin Verma, Sambit Bakshi, Pankaj Kumar Sa

Responsive image

Auto-TLDR; A monocular vision assisted localization algorithm for indoor corridor environments

Slides Poster Similar

We propose a monocular vision assisted localization algorithm, that will help a UAV navigate safely in indoor corridor environments. Always, the aim is to navigate the UAV through a corridor in the forward direction by keeping it at the center with no orientation either to the left or right side. The algorithm makes use of the RGB image, captured from the UAV front camera, and passes it through a trained Deep Neural Network (DNN) to predict the position of the UAV as either on the left or center or right side of the corridor. Depending upon the divergence of the UAV with respect to an imaginary central line, known as the central bisector line (CBL) of the corridor, a suitable command is generated to bring the UAV to the center. When the UAV is at the center of the corridor, a new image is passed through another trained DNN to predict the orientation of the UAV with respect to the CBL of the corridor. If the UAV is either left or right tilted, an appropriate command is generated to rectify the orientation. We also propose a new corridor dataset, named UAVCorV1, which contains images as captured by the UAV front camera when the UAV is at all possible locations of a variety of corridors. An exhaustive set of experiments in different corridors reveal the efficacy of the proposed algorithm.

Story Comparison for Estimating Field of View Overlap in a Video Collection

Thierry Malon, Sylvie Chambon, Alain Crouzil, Vincent Charvillat

Responsive image

Auto-TLDR; Finding Videos with Overlapping Fields of View Using Video Data

Slides Similar

Determining the links between large amounts of video data with no prior knowledge of the camera positions is a hard task to automate. From a collection of videos acquired from static cameras simultaneously, we propose a method for finding groups of videos with overlapping fields of view. Each video is first processed individually: at regular time steps, objects are detected and are assigned a category and an appearance descriptor. Next, the video is split into cells at different resolutions and we assign to each cell its story: it consists of the list of objects detected in the cell over time. Once the stories are established for each video, the links between cells of different videos are determined by comparing their stories: two cells are linked if they show simultaneous detections of objects of the same category with similar appearances. Pairs of videos with overlapping fields of view are identified using these links between cells. A link graph is finally returned, in which each node represents a video, and the edges indicate pairs of overlapping videos. The approach is evaluated on a set of 63 real videos from both public datasets and live surveillance videos, as well as on 84 synthetic videos, and shows promising results.

AG-GAN: An Attentive Group-Aware GAN for Pedestrian Trajectory Prediction

Yue Song, Niccolò Bisagno, Syed Zohaib Hassan, Nicola Conci

Responsive image

Auto-TLDR; An attentive group-aware GAN for motion prediction in crowded scenarios

Slides Poster Similar

Understanding human behaviors in crowded scenarios requires analyzing not only the position of the subjects in space, but also the scene context. Existing approaches mostly rely on the motion history of each pedestrian and model the interactions among people by considering the entire surrounding neighborhood. In our approach, we address the problem of motion prediction by applying coherent group clustering and a global attention mechanism on the LSTM-based Generative Adversarial Networks (GANs). The proposed model consists of an attentive group-aware GAN that observes the agents' past motion and predicts future paths, using (i) a group pooling module to model neighborhood interaction, and (ii) an attention module to specifically focus on hidden states. The experimental results demonstrate that our proposal outperforms state-of-the-art models on common benchmark datasets, and is able to generate socially-acceptable trajectories.

Extraction and Analysis of 3D Kinematic Parameters of Table Tennis Ball from a Single Camera

Jordan Calandre, Renaud Péteri, Laurent Mascarilla, Benoit Tremblais

Responsive image

Auto-TLDR; 3D Ball Trajectories Analysis using a Single Camera for Sport Gesture Analysis

Slides Poster Similar

Vision is the first indicator for coaches to assess the quality of a sport gesture. However, gesture analysis using computer vision is often restricted to laboratory experiments, far from the real conditions in which athletes train on a daily basis. In this perspective, we introduce 3D ball trajectories analysis using a single camera with very few acquisition constraints. A key point of the proposal is the estimation of the apparent ball size for obtaining ball to camera distance. For this purpose, a 2D CNN is trained using a generated dataset that enables a reliable ball size extraction, even in case of high motion blur. The final objective is not only to be able to determine ball trajectories, but most importantly to retrieve their relevant physical parameters. With a precise estimation of those trajectories, it is indeed possible to extract the ball tangential and rotation speed, related to the so-called Magnus effect. Validation experiments for characterizing table tennis strokes are presented on both a synthetic dataset and on real video sequences.

Inferring Functional Properties from Fluid Dynamics Features

Andrea Schillaci, Maurizio Quadrio, Carlotta Pipolo, Marcello Restelli, Giacomo Boracchi

Responsive image

Auto-TLDR; Exploiting Convective Properties of Computational Fluid Dynamics for Medical Diagnosis

Slides Poster Similar

In a wide range of applied problems involving fluid flows, Computational Fluid Dynamics (CFD) provides detailed quantitative information on the flow field, at various levels of fidelity and computational cost. However, CFD alone cannot predict high-level functional properties of the system that are not easily obtained from the equations of fluid motion. In this work, we present a data-driven framework to extract additional information, such as medical diagnostic output, from CFD solutions. The task is made difficult by the huge data dimensionality of CFD, together with the limited amount of training data implied by its high computational cost. By pursuing a traditional ML pipeline of pre-processing, feature extraction, and model training, we demonstrate that informative features can be extracted from CFD data. Two experiments, pertaining to different application domains, support the claim that the convective properties implicit into a CFD solution can be leveraged to retrieve functional information for which an analytical definition is missing. Despite the preliminary nature of our study and the relative simplicity of both the geometrical and CFD models, for the first time we demonstrate that the combination of ML and CFD can diagnose a complex system in terms of high-level functional information.

CCA: Exploring the Possibility of Contextual Camouflage Attack on Object Detection

Shengnan Hu, Yang Zhang, Sumit Laha, Ankit Sharma, Hassan Foroosh

Responsive image

Auto-TLDR; Contextual camouflage attack for object detection

Slides Poster Similar

Deep neural network based object detection has become the cornerstone of many real-world applications. Along with this success comes concerns about its vulnerability to malicious attacks. To gain more insight into this issue, we propose a contextual camouflage attack (CCA for short) algorithm to influence the performance of object detectors. In this paper, we use an evolutionary search strategy and adversarial machine learning in interactions with a photo-realistic simulated environment to find camouflage patterns that are effective over a huge variety of object locations, camera poses, and lighting conditions. The proposed camouflages are validated effective to most of the state-of-the-art object detectors.