Camera Calibration Using Parallel Line Segments

Gaku Nakano

Responsive image

Auto-TLDR; Closed-Form Calibration of Surveillance Cameras using Parallel 3D Line Segment Projections

Slides Poster

This paper proposes a camera calibration method suitable for surveillance cameras using the image projection of parallel 3D line segments of the same length. We assume that vertical line segments are perpendicular to the ground plane and their bottom end-points are on the ground plane. Under this assumption, the camera parameters can be directly solved by at least two line segments without estimating vanishing points. Extending the minimal solution, we derive a closed-form solution to the least squares case with more than two line segments. Lens distortion is jointly optimized in bundle adjustment. Synthetic data evaluation shows that the best depression angle of a camera is around 50 degrees. In real data evaluation, we use body joints of pedestrians as vertical line segments. The experimental results on publicly available datasets show that the proposed method with a human pose detector can correctly calibrate wide-angle cameras including radial distortion.

Similar papers

Calibration and Absolute Pose Estimation of Trinocular Linear Camera Array for Smart City Applications

Martin Ahrnbom, Mikael Nilsson, Håkan Ardö, Kalle Åström, Oksana Yastremska-Kravchenko, Aliaksei Laureshyn

Responsive image

Auto-TLDR; Trinocular Linear Camera Array Calibration for Traffic Surveillance Applications

Slides Poster Similar

A method for calibrating a Trinocular Linear Camera Array (TLCA) for traffic surveillance applications, such as towards smart cities, is presented. A TLCA-specific parametrization guarantees that the calibration finds a model where all the cameras are on a straight line. The method uses both a chequerboard close to the camera, as well as measured 3D points far from the camera: points measured in world coordinates, as well as their corresponding 2D points found manually in the images. Superior calibration accuracy can be obtained compared to standard methods using only a single data source, largely due to the use of chequerboards, while the line constraint in the parametrization allows for joint rectification. The improved triangulation accuracy, from 8-12 cm to around 6 cm when calibrating with 30-50 points in our experiment, allowing better road user analysis. The method is demonstrated by a proof-of-concept application where a point cloud is generated from multiple disparity maps, visualizing road user detections in 3D.

Minimal Solvers for Indoor UAV Positioning

Marcus Valtonen Örnhag, Patrik Persson, Mårten Wadenbäck, Kalle Åström, Anders Heyden

Responsive image

Auto-TLDR; Relative Pose Solvers for Visual Indoor UAV Navigation

Slides Poster Similar

In this paper we consider a collection of relative pose problems which arise naturally in applications for visual indoor UAV navigation. We focus on cases where additional information from an onboard IMU is available and thus provides a partial extrinsic calibration through the gravitational vector. The solvers are designed for a partially calibrated camera, for a variety of realistic indoor scenarios, which makes it possible to navigate using images of the ground floor. Current state-of-the-art solvers use more general assumptions, such as using arbitrary planar structures; however, these solvers do not yield adequate reconstructions for real scenes, nor do they perform fast enough to be incorporated in real-time systems. We show that the proposed solvers enjoy better numerical stability, are faster, and require fewer point correspondences, compared to state-of-the-art solvers. These properties are vital components for robust navigation in real-time systems, and we demonstrate on both synthetic and real data that our method outperforms other methods, and yields superior motion estimation.

Better Prior Knowledge Improves Human-Pose-Based Extrinsic Camera Calibration

Olivier Moliner, Sangxia Huang, Kalle Åström

Responsive image

Auto-TLDR; Improving Human-pose-based Extrinsic Calibration for Multi-Camera Systems

Slides Poster Similar

Accurate extrinsic calibration of wide baseline multi-camera systems enables better understanding of 3D scenes for many applications and is of great practical importance. Classical Structure-from-Motion calibration methods require special calibration equipment so that accurate point correspondences can be detected between different views. In addition, an operator with some training is usually needed to ensure that data is collected in a way that leads to good calibration accuracy. This limits the ease of adoption of such technologies. Recently, methods have been proposed to use human pose estimation models to establish point correspondences, thus removing the need for any special equipment. The challenge with this approach is that human pose estimation algorithms typically produce much less accurate feature points compared to classical patch-based methods. Another problem is that ambient human motion might not be optimal for calibration. We build upon prior works and introduce several novel ideas to improve the accuracy of human-pose-based extrinsic calibration. Our first contribution is a robust reprojection loss based on a better understanding of the sources of pose estimation error. Our second contribution is a 3D human pose likelihood model learned from motion capture data. We demonstrate significant improvements in calibration accuracy by evaluating our method on four publicly available datasets.

A Two-Step Approach to Lidar-Camera Calibration

Yingna Su, Yaqing Ding, Jian Yang, Hui Kong

Responsive image

Auto-TLDR; Closed-Form Calibration of Lidar-camera System for Ego-motion Estimation and Scene Understanding

Slides Poster Similar

Autonomous vehicles and robots are typically equipped with Lidar and camera. Hence, calibrating the Lidar-camera system is of extreme importance for ego-motion estimation and scene understanding. In this paper, we propose a two-step approach (coarse + fine) for the external calibration between a camera and a multiple-line Lidar. First, a new closed-form solution is proposed to obtain the initial calibration parameters. We compare our solution with the state-of-the-art SVD-based algorithm, and show the benefits of both the efficiency and stability. With the initial calibration parameters, the ICP-based calibration framework is used to register the point clouds which extracted from the camera and Lidar coordinate frames, respectively. Our method has been applied to two Lidar-camera systems: an HDL-64E Lidar-camera system, and a VLP-16 Lidar-camera system. Experimental results demonstrate that our method achieves promising performance and higher accuracy than two open-source methods.

Total Estimation from RGB Video: On-Line Camera Self-Calibration, Non-Rigid Shape and Motion

Antonio Agudo

Responsive image

Auto-TLDR; Joint Auto-Calibration, Pose and 3D Reconstruction of a Non-rigid Object from an uncalibrated RGB Image Sequence

Slides Poster Similar

In this paper we present a sequential approach to jointly retrieve camera auto-calibration, camera pose and the 3D reconstruction of a non-rigid object from an uncalibrated RGB image sequence, without assuming any prior information about the shape structure, nor the need for a calibration pattern, nor the use of training data at all. To this end, we propose a Bayesian filtering approach based on a sum-of-Gaussians filter composed of a bank of extended Kalman filters (EKF). For every EKF, we make use of dynamic models to estimate its state vector, which later will be Gaussianly combined to achieve a global solution. To deal with deformable objects, we incorporate a mechanical model solved by using the finite element method. Thanks to these ingredients, the resulting method is both efficient and robust to several artifacts such as missing and noisy observations as well as sudden camera motions, while being available for a wide variety of objects and materials, including isometric and elastic shape deformations. Experimental validation is proposed in real experiments, showing its strengths with respect to competing approaches.

A Globally Optimal Method for the PnP Problem with MRP Rotation Parameterization

Manolis Lourakis, George Terzakis

Responsive image

Auto-TLDR; A Direct least squares, algebraic PnP solver with modified Rodrigues parameters

Poster Similar

The perspective-n-point (PnP) problem is of fundamental importance in computer vision. A global optimality condition for PnP that is independent of a particular rotation parameterization was recently developed by Nakano. This paper puts forward a direct least squares, algebraic PnP solution that extends Nakano's work by combining his optimality condition with the modified Rodrigues parameters (MRPs) for parameterizing rotation. The result is a system of polynomials that is solved using the Groebner basis approach. An MRP vector has twice the rotational range of the classical Rodrigues (i.e., Cayley) vector used by Nakano to represent rotation. The proposed solver provides strong guarantees that the full rotation singularity associated with MRPs is avoided. Furthermore, detailed experiments provide evidence that our solver attains accuracy that is indistinguishable from Nakano's Cayley-based solution with a moderate increase in computational cost.

Photometric Stereo with Twin-Fisheye Cameras

Jordan Caracotte, Fabio Morbidi, El Mustapha Mouaddib

Responsive image

Auto-TLDR; Photometric stereo problem for low-cost 360-degree cameras

Slides Poster Similar

In this paper, we introduce and solve, for the first time, the photometric stereo problem for low-cost 360-degree cameras. In particular, we present a spherical image irradiance equation which is adapted to twin-fisheye cameras, and an original algorithm for the estimation of light directions based on the specular highlights observed on mirror balls. Extensive experiments with synthetic and real-world images captured by a Ricoh Theta V camera, demonstrate the effectiveness and robustness of the proposed 3D reconstruction pipeline. To foster reproducible research, the image dataset and code developed for this paper are made publicly available at the address: https://home.mis.u-picardie.fr/~fabio/PhotoSphere.html

Generic Merging of Structure from Motion Maps with a Low Memory Footprint

Gabrielle Flood, David Gillsjö, Patrik Persson, Anders Heyden, Kalle Åström

Responsive image

Auto-TLDR; A Low-Memory Footprint Representation for Robust Map Merge

Slides Poster Similar

With the development of cheap image sensors, the amount of available image data have increased enormously, and the possibility of using crowdsourced collection methods has emerged. This calls for development of ways to handle all these data. In this paper, we present new tools that will enable efficient, flexible and robust map merging. Assuming that separate optimisations have been performed for the individual maps, we show how only relevant data can be stored in a low memory footprint representation. We use these representations to perform map merging so that the algorithm is invariant to the merging order and independent of the choice of coordinate system. The result is a robust algorithm that can be applied to several maps simultaneously. The result of a merge can also be represented with the same type of low-memory footprint format, which enables further merging and updating of the map in a hierarchical way. Furthermore, the method can perform loop closing and also detect changes in the scene between the capture of the different image sequences. Using both simulated and real data — from both a hand held mobile phone and from a drone — we verify the performance of the proposed method.

Rotational Adjoint Methods for Learning-Free 3D Human Pose Estimation from IMU Data

Caterina Emilia Agelide Buizza, Yiannis Demiris

Responsive image

Auto-TLDR; Learning-free 3D Human Pose Estimation from Inertial Measurement Unit Data

Poster Similar

We present a new framework for learning-free 3D human pose estimation from Inertial Measurement Unit (IMU) data. The proposed method does not rely on a full motion sequence to calculate a pose for any particular time point and thus can operate in real-time. A cost function based only on joint rotations is used, removing the need for frequent transformations between rotations and 3D Cartesian coordinates. A Jacobian that preserves skeleton structure is derived using Adjoint methods from Variational Data Assimilation. To facilitate further research in IMU-based Motion Capture, we provide a dataset that combines RGB and depth images from an Intel RealSense camera, marker-based motion capture from an Optitrack system and Xsens IMU data. We have evaluated our method on both our dataset and the Total Capture dataset, showing an average error across 24 joints of 0.45 and 0.48 radians respectively.

AV-SLAM: Autonomous Vehicle SLAM with Gravity Direction Initialization

Kaan Yilmaz, Baris Suslu, Sohini Roychowdhury, L. Srikar Muppirisetty

Responsive image

Auto-TLDR; VI-SLAM with AGI: A combination of three SLAM algorithms for autonomous vehicles

Slides Poster Similar

Simultaneous localization and mapping (SLAM) algorithms that are aimed at autonomous vehicles (AVs) are required to utilize sensor redundancies specific to AVs and enable accurate, fast and repeatable estimations of pose and path trajectories. In this work, we present a combination of three SLAM algorithms that utilize a different subset of available sensors such as inertial measurement unit (IMU), a gray-scale mono-camera, and a Lidar. Also, we propose a novel acceleration-based gravity direction initialization (AGI) method for the visual-inertial SLAM algorithm. We analyze the SLAM algorithms and initialization methods for pose estimation accuracy, speed of convergence and repeatability on the KITTI odometry sequences. The proposed VI-SLAM with AGI method achieves relative pose errors less than 2\%, convergence in half a minute or less and convergence time variability less than 3s, which makes it preferable for AVs.

Generic Document Image Dewarping by Probabilistic Discretization of Vanishing Points

Gilles Simon, Salvatore Tabbone

Responsive image

Auto-TLDR; Robust Document Dewarping using vanishing points

Slides Poster Similar

Document images dewarping is still a challenge especially when documents are captured with one camera in an uncontrolled environment. In this paper we propose a generic approach based on vanishing points (VP) to reconstruct the 3D shape of document pages. Unlike previous methods we do not need to segment the text included in the documents. Therefore, our approach is less sensitive to pre-processing and segmentation errors. The computation of the VPs is robust and relies on the a-contrario framework, which has only one parameter whose setting is based on probabilistic reasoning instead of experimental tuning. Thus, our method can be applied to any kind of document including text and non-text blocks and extended to other kind of images. Experimental results show that the proposed method is robust to a variety of distortions.

NetCalib: A Novel Approach for LiDAR-Camera Auto-Calibration Based on Deep Learning

Shan Wu, Amnir Hadachi, Damien Vivet, Yadu Prabhakar

Responsive image

Auto-TLDR; Automatic Calibration of LiDAR and Cameras using Deep Neural Network

Slides Poster Similar

A fusion of LiDAR and cameras have been widely used in many robotics applications such as classification, segmentation, object detection, and autonomous driving. It is essential that the LiDAR sensor can measure distances accurately, which is a good complement to the cameras. Hence, calibrating sensors before deployment is a mandatory step. The conventional methods include checkerboards, specific patterns, or human labeling, which is trivial and human-labor extensive if we do the same calibration process every time. The main propose of this research work is to build a deep neural network that is capable of automatically finding the geometric transformation between LiDAR and cameras. The results show that our model manages to find the transformations from randomly sampled artificial errors. Besides, our work is open-sourced for the community to fully utilize the advances of the methodology for developing more the approach, initiating collaboration, and innovation in the topic.

Position-Aware and Symmetry Enhanced GAN for Radial Distortion Correction

Yongjie Shi, Xin Tong, Jingsi Wen, He Zhao, Xianghua Ying, Jinshi Hongbin Zha

Responsive image

Auto-TLDR; Generative Adversarial Network for Radial Distorted Image Correction

Slides Poster Similar

This paper presents a novel method based on the generative adversarial network for radial distortion correction. Instead of generating a corrected image, our generator predicts a pixel flow map to measure the pixel offset between the distorted and corrected image. The quality of the generated pixel flow map and the warped image are judged by the discriminator. As texture far away from the image center has strong distortion, we develop an Adaptive Inverted Foveal layer which can transform the deformation to the intensity of the image to exploit this property. Rotation symmetry enhanced convolution kernels are applied to extract geometric features of different orientations explicitly. These learned features are recalibrated using the Squeeze-and-Excitation block to assign different weights for different directions. Moreover, we construct a first real-world radial distorted image dataset RD600 annotated with ground truth to evaluate our proposed method. We conduct extensive experiments to validate the effectiveness of each part of our framework. The further experiment shows our approach outperforms previous methods in both synthetic and real-world datasets quantitatively and qualitatively.

An Adaptive Model for Face Distortion Correction

Duong H. Nguyen, Tien D. Bui

Responsive image

Auto-TLDR; Adaptive Polynomial Model for Face Distortion Correction in Selfie Photos

Poster Similar

The age of social media insists on developing devices that are able to capture and share ones' moments with high fidelity. Handheld devices such as smartphones with wide-angle cameras have shown the current trend in mobile photography. Although one can take great delight in a wide field of view through modern cameras, nearby objects or faces may be distorted significantly. Recent works have obtained impressive results in this research area, but there is still a tradeoff between image quality and processing time to consider. This work introduces an adaptive polynomial model that automatically selects faces and performs image distortion correction. Since the photos are processed locally, faces are undistorted, and the background is close to the original state. Unlike other content-aware based methods which rely on heavy computing components and high image resolution, our model is suitable for mobile devices to tackle face distortion issue in selfie photos.

Computing Stable Resultant-Based Minimal Solvers by Hiding a Variable

Snehal Bhayani, Zuzana Kukelova, Janne Heikkilä

Responsive image

Auto-TLDR; Sparse Permian-Based Method for Solving Minimal Systems of Polynomial Equations

Slides Similar

Many computer vision applications require robust and efficient estimation of camera geometry. The robust estimation is usually based on solving camera geometry problems from a minimal number of input data measurements, i.e., solving minimal problems, in a RANSAC-style framework. Minimal problems often result in complex systems of polynomial equations. The existing state-of-the-art methods for solving such systems are either based on Groebner Basis and the action matrix method, which have been extensively studied and optimized in the recent years or recently proposed approach based on a resultant computation using an extra variable. In this paper, we study an interesting alternative resultant-based method for solving sparse systems of polynomial equations by hiding one variable. This approach results in a larger eigenvalue problem than the action matrix and extra variable resultant-based methods; however, it does not need to compute an inverse or elimination of large matrices that may be numerically unstable. The proposed approach includes several improvements to the standard sparse resultant algorithms, which significantly improves the efficiency and stability of the hidden variable resultant-based solvers as we demonstrate on several interesting computer vision problems. We show that for the studied problems, our sparse resultant based approach leads to more stable solvers than the state-of-the-art Groebner Basis as well as existing resultant-based solvers, especially in close to critical configurations. Our new method can be fully automated and incorporated into existing tools for the automatic generation of efficient minimal solvers.

Benchmarking Cameras for OpenVSLAM Indoors

Kevin Chappellet, Guillaume Caron, Fumio Kanehiro, Ken Sakurada, Abderrahmane Kheddar

Responsive image

Auto-TLDR; OpenVSLAM: Benchmarking Camera Types for Visual Simultaneous Localization and Mapping

Slides Poster Similar

In this paper we benchmark different types of cameras and evaluate their performance in terms of reliable localization reliability and precision in Visual Simultaneous Localization and Mapping (vSLAM). Such benchmarking is merely found for visual odometry, but never for vSLAM. Existing studies usually compare several algorithms for a given camera. %This work is the first to handle the dual of the latter, i.e. comparing several cameras for a given SLAM algorithm. The evaluation methodology we propose is applied to the recent OpenVSLAM framework. The latter is versatile enough to natively deal with perspective, fisheye, 360 cameras in a monocular or stereoscopic setup, an in RGB or RGB-D modalities. Results in various sequences containing light variation and scenery modifications in the scene assess quantitatively the maximum localization rate for 360 vision. In the contrary, RGB-D vision shows the lowest localization rate, but highest precision when localization is possible. Stereo-fisheye trades-off with localization rates and precision between 360 vision and RGB-D vision. The dataset with ground truth will be made available in open access to allow evaluating other/future vSLAM algorithms with respect to these camera types.

RISEdb: A Novel Indoor Localization Dataset

Carlos Sanchez Belenguer, Erik Wolfart, Álvaro Casado Coscollá, Vitor Sequeira

Responsive image

Auto-TLDR; Indoor Localization Using LiDAR SLAM and Smartphones: A Benchmarking Dataset

Slides Poster Similar

In this paper we introduce a novel public dataset for developing and benchmarking indoor localization systems. We have selected and 3D mapped a set of representative indoor environments including a large office building, a conference room, a workshop, an exhibition area and a restaurant. Our acquisition pipeline is based on a portable LiDAR SLAM backpack to map the buildings and to accurately track the pose of the user as it moves freely inside them. We introduce the calibration procedures that enable us to acquire and geo-reference live data coming from different independent sensors rigidly attached to the backpack. This has allowed us to collect long sequences of spherical and stereo images, together with all the sensor readings coming from a consumer smartphone and locate them inside the map with centimetre accuracy. The dataset addresses many of the limitations of existing indoor localization datasets regarding the scale and diversity of the mapped buildings; the number of acquired sequences under varying conditions; the accuracy of the ground-truth trajectory; the availability of a detailed 3D model and the availability of different sensor types. It enables the benchmarking of existing and the development of new indoor localization approaches, in particular for deep learning based systems that require large amounts of labeled training data.

Vehicle Lane Merge Visual Benchmark

Kai Cordes, Hellward Broszio

Responsive image

Auto-TLDR; A Benchmark for Automated Cooperative Maneuvering Using Multi-view Video Streams and Ground Truth Vehicle Description

Slides Poster Similar

Automated driving is regarded as the most promising technology for improving road safety in the future. In this context, connected vehicles have an important role regarding their ability to perform cooperative maneuvers for challenging traffic situations. We propose a benchmark for automated cooperative maneuvers. The targeted cooperative maneuver is the vehicle lane merge where a vehicle on the acceleration lane merges into the traffic of a motorway. The benchmark enables the evaluation of vehicle localization approaches as well as the study of cooperative maneuvers. It consists of temporally synchronized multi-view video streams, highly accurate camera calibration, and ground truth vehicle descriptions, including position, heading, speed, and shape. For benchmark generation, the lane merge maneuver is performed by human drivers on a test track, resulting in 120 lane merge data sets with various traffic situations and video recording conditions.

User-Independent Gaze Estimation by Extracting Pupil Parameter and Its Mapping to the Gaze Angle

Sang Yoon Han, Nam Ik Cho

Responsive image

Auto-TLDR; Gaze Point Estimation using Pupil Shape for Generalization

Slides Poster Similar

Since gaze estimation plays a crucial role in recognizing human intentions, it has been researched for a long time, and its accuracy is ever increasing. However, due to the wide variation in eye shapes and focusing abilities between the individuals, accuracies of most algorithms vary depending on each person in the test group, especially when the initial calibration is not well performed. To alleviate the user-dependency, we attempt to derive features that are general for most people and use them as the input to a deep network instead of using the images as the input. Specifically, we use the pupil shape as the core feature because it is directly related to the 3D eyeball rotation, and thus the gaze direction. While existing deep learning methods learn the gaze point by extracting various features from the image, we focus on the mapping function from the eyeball rotation to the gaze point by using the pupil shape as the input. It is shown that the accuracy of gaze point estimation also becomes robust for the uncalibrated points by following the characteristics of the mapping function. Also, our gaze network learns the gaze difference to facilitate the re-calibration process to fix the calibration-drift problem that typically occurs with glass-type or head-mount devices.

Learning Non-Rigid Surface Reconstruction from Spatio-Temporal Image Patches

Matteo Pedone, Abdelrahman Mostafa, Janne Heikkilä

Responsive image

Auto-TLDR; Dense Spatio-Temporal Depth Maps of Deformable Objects from Video Sequences

Slides Poster Similar

We present a method to reconstruct a dense spatio-temporal depth map of a non-rigidly deformable object directly from a video sequence. The estimation of depth is performed locally on spatio-temporal patches of the video, and then the full depth video of the entire shape is recovered by combining them together. Since the geometric complexity of a local spatio-temporal patch of a deforming non-rigid object is often simple enough to be faithfully represented with a parametric model, we artificially generate a database of small deforming rectangular meshes rendered with different material properties and light conditions, along with their corresponding depth videos, and use such data to train a convolutional neural network. We tested our method on both synthetic and Kinect data and experimentally observed that the reconstruction error is significantly lower than the one obtained using other approaches like conventional non-rigid structure from motion.

Extending Single Beam Lidar to Full Resolution by Fusing with Single Image Depth Estimation

Yawen Lu, Yuxing Wang, Devarth Parikh, Guoyu Lu

Responsive image

Auto-TLDR; Self-supervised LIDAR for Low-Cost Depth Estimation

Slides Similar

Depth estimation is playing an important role in indoor and outdoor scene understanding, autonomous driving, augmented reality and many other tasks. Vehicles and robotics are able to use active illumination sensors such as LIDAR to receive high precision depth estimation. However, high-resolution Lidars are usually too expensive, which limits its massive production on various applications. Though single beam LIDAR enjoys the benefits of low cost, one beam depth sensing is not usually sufficient to perceive the surrounding environment in many scenarios. In this paper, we propose a learning-based framework to explore to replicate similar or even higher performance as costly LIDARs with our designed self-supervised network and a low-cost single-beam LIDAR. After the accurate calibration with a visible camera, the single beam LIDAR can adjust the scale uncertainty of the depth map estimated by the visible camera. The adjusted depth map enjoys the benefits of high resolution and sensing accuracy as high beam LIDAR and maintains low-cost as single beam LIDAR. Thus we can achieve similar sensing effect of high beam LIDAR with more than a 50-100 times cheaper price (e.g., \$80000 Velodyne HDL-64E LIDAR v.s. \$1000 SICK TIM-781 2D LIDAR and normal camera). The proposed approach is verified on our collected dataset and public dataset with superior depth-sensing performance.

Multi-Camera Sports Players 3D Localization with Identification Reasoning

Yukun Yang, Ruiheng Zhang, Wanneng Wu, Yu Peng, Xu Min

Responsive image

Auto-TLDR; Probabilistic and Identified Occupancy Map for Sports Players 3D Localization

Slides Poster Similar

Multi-camera sports players 3D localization is always a challenging task due to heavy occlusions in crowded sports scene. Traditional methods can only provide players locations without identification information. Existing methods of localization may cause ambiguous detection and unsatisfactory precision and recall, especially when heavy occlusions occur. To solve this problem, we propose a generic localization method by providing distinguishable results that have the probabilities of locations being occupied by players with unique ID labels. We design the algorithms with a multi-dimensional Bayesian model to create a Probabilistic and Identified Occupancy Map (PIOM). By using this model, we jointly apply deep learning-based object segmentation and identification to obtain sports players probable positions and their likely identification labels. This approach not only provides players 3D locations but also gives their ID information that are distinguishable from others. Experimental results demonstrate that our method outperforms the previous localization approaches with reliable and distinguishable outcomes.

Extraction and Analysis of 3D Kinematic Parameters of Table Tennis Ball from a Single Camera

Jordan Calandre, Renaud Péteri, Laurent Mascarilla, Benoit Tremblais

Responsive image

Auto-TLDR; 3D Ball Trajectories Analysis using a Single Camera for Sport Gesture Analysis

Slides Poster Similar

Vision is the first indicator for coaches to assess the quality of a sport gesture. However, gesture analysis using computer vision is often restricted to laboratory experiments, far from the real conditions in which athletes train on a daily basis. In this perspective, we introduce 3D ball trajectories analysis using a single camera with very few acquisition constraints. A key point of the proposal is the estimation of the apparent ball size for obtaining ball to camera distance. For this purpose, a 2D CNN is trained using a generated dataset that enables a reliable ball size extraction, even in case of high motion blur. The final objective is not only to be able to determine ball trajectories, but most importantly to retrieve their relevant physical parameters. With a precise estimation of those trajectories, it is indeed possible to extract the ball tangential and rotation speed, related to the so-called Magnus effect. Validation experiments for characterizing table tennis strokes are presented on both a synthetic dataset and on real video sequences.

Motion Segmentation with Pairwise Matches and Unknown Number of Motions

Federica Arrigoni, Tomas Pajdla, Luca Magri

Responsive image

Auto-TLDR; Motion Segmentation using Multi-Modelfitting andpermutation synchronization

Slides Poster Similar

In this paper we address motion segmentation, that is the problem of clustering points in multiple images according to a number of moving objects. Two-frame correspondences are assumed as input without prior knowledge about trajectories. Our method is based on principles from ''multi-model fitting'' and ''permutation synchronization'', and - differently from previous techniques working under the same assumptions - it can handle an unknown number of motions. The proposed approach is validated on standard datasets, showing that it can correctly estimate the number of motions while maintaining comparable or better accuracy than the state of the art.

Exact and Convergent Iterative Methods to Compute the Orthogonal Point-To-Ellipse Distance

Siyu Guo, Pingping Hu, Zhigang Ling, He Wen, Min Liu, Lu Tang

Responsive image

Auto-TLDR; Convergent iterative algorithm for orthogonal distance based ellipse fitting

Slides Poster Similar

Computation of the orthogonal distance from a given point to an ellipse is the basis of orthogonal distance based ellipse fitting methods. The problem of this orthogonal distance and the corresponding orthogonal contacting point on the ellipse is investigated, and two algorithms, the exact one and the convergent iterative one, are proposed. The exact algorithm utilizes the closed form solution of quartic equations, but is numerically unstable. The iterative algorithm, however, uses Newton’s method to solve the equation, and starts from an initial solution that is proven to lead to a convergent iteration. The proposed algorithms are compared in experiments with an existing rival. Although the rival algorithm is slightly faster and more accurate in realistic scenarios, divergence is likely to occur. On the other hand, both our exact and iterative algorithms can reliably produce the solution needed. While the exact algorithm encounters numeric instability, the iterative algorithm is only slightly outperformed by the existing rival in speed and accuracy, but at the same time provides more reliable computation process, thus making it a preferable method for the task.

Exploring Severe Occlusion: Multi-Person 3D Pose Estimation with Gated Convolution

Renshu Gu, Gaoang Wang, Jenq-Neng Hwang

Responsive image

Auto-TLDR; 3D Human Pose Estimation for Multi-Human Videos with Occlusion

Slides Similar

3D human pose estimation (HPE) is crucial in human behavior analysis, augmented reality/virtual reality (AR/VR) applications, and self-driving industry. Videos that contain multiple potentially occluded people captured from freely moving monocular cameras are very common in real-world scenarios, while 3D HPE for such scenarios is quite challenging, partially because there is a lack of such data with accurate 3D ground truth labels in existing datasets. In this paper, we propose a temporal regression network with a gated convolution module to transform 2D joints to 3D and recover the missing occluded joints in the meantime. A simple yet effective localization approach is further conducted to transform the normalized pose to the global trajectory. To verify the effectiveness of our approach, we also collect a new moving camera multi-human (MMHuman) dataset that includes multiple people with heavy occlusion captured by moving cameras. The 3D ground truth joints are provided by accurate motion capture (MoCap) system. From the experiments on static-camera based Human3.6M data and our own collected moving-camera based data, we show that our proposed method outperforms most state-of-the-art 2D-to-3D pose estimation methods, especially for the scenarios with heavy occlusions.

A Plane-Based Approach for Indoor Point Clouds Registration

Ketty Favre, Muriel Pressigout, Luce Morin, Eric Marchand

Responsive image

Auto-TLDR; A plane-based registration approach for indoor environments based on LiDAR data

Slides Poster Similar

Iterative Closest Point (ICP) is one of the mostly used algorithms for 3D point clouds registration. This classical approach can be impacted by the large number of points contained in a point cloud. Planar structures, which are less numerous than points, can be used in well-structured man-made environment. In this paper we propose a registration method inspired by the ICP algorithm in a plane-based registration approach for indoor environments. This method is based solely on data acquired with a LiDAR sensor. A new metric based on plane characteristics is introduced to find the best plane correspondences. The optimal transformation is estimated through a two-step minimization approach, successively performing robust plane-to-plane minimization and non-linear robust point-to-plane registration. Experiments on the Autonomous Systems Lab (ASL) dataset show that the proposed method enables to successfully register 100% of the scans from the three indoor sequences. Experiments also show that the proposed method is more robust in large motion scenarios than other state-of-the-art algorithms.

P2D: A Self-Supervised Method for Depth Estimation from Polarimetry

Marc Blanchon, Desire Sidibe, Olivier Morel, Ralph Seulin, Daniel Braun, Fabrice Meriaudeau

Responsive image

Auto-TLDR; Polarimetric Regularization for Monocular Depth Estimation

Slides Poster Similar

Monocular depth estimation is a recurring subject in the field of computer vision. Its ability to describe scenes via a depth map while reducing the constraints related to the formulation of perspective geometry tends to favor its use. However, despite the constant improvement of algorithms, most methods exploit only colorimetric information. Consequently, robustness to events to which the modality is not sensitive to, like specularity or transparency, is neglected. In response to this phenomenon, we propose using polarimetry as an input for a self-supervised monodepth network. Therefore, we propose exploiting polarization cues to encourage accurate reconstruction of scenes. Furthermore, we include a term of polarimetric regularization to state-of-the-art method to take specific advantage of the data. Our method is evaluated both qualitatively and quantitatively demonstrating that the contribution of this new information as well as an enhanced loss function improves depth estimation results, especially for specular areas.

Effective Deployment of CNNs for 3DoF Pose Estimation and Grasping in Industrial Settings

Daniele De Gregorio, Riccardo Zanella, Gianluca Palli, Luigi Di Stefano

Responsive image

Auto-TLDR; Automated Deep Learning for Robotic Grasping Applications

Slides Poster Similar

In this paper we investigate how to effectively deploy deep learning in practical industrial settings, such as robotic grasping applications. When a deep-learning based solution is proposed, usually lacks of any simple method to generate the training data. In the industrial field, where automation is the main goal, not bridging this gap is one of the main reasons why deep learning is not as widespread as it is in the academic world. For this reason, in this work we developed a system composed by a 3-DoF Pose Estimator based on Convolutional Neural Networks (CNNs) and an effective procedure to gather massive amounts of training images in the field with minimal human intervention. By automating the labeling stage, we also obtain very robust systems suitable for production-level usage. An open source implementation of our solution is provided, alongside with the dataset used for the experimental evaluation.

Unconstrained Vision Guided UAV Based Safe Helicopter Landing

Arindam Sikdar, Abhimanyu Sahu, Debajit Sen, Rohit Mahajan, Ananda Chowdhury

Responsive image

Auto-TLDR; Autonomous Helicopter Landing in Hazardous Environments from Unmanned Aerial Images Using Constrained Graph Clustering

Slides Poster Similar

In this paper, we have addressed the problem of automated detection of safe zone(s) for helicopter landing in hazardous environments from images captured by an Unmanned Aerial Vehicle (UAV). The unconstrained motion of the image capturing drone (the UAV in our case) makes the problem further difficult. The solution pipeline consists of natural landmark detection and tracking, stereo-pair generation using constrained graph clustering, digital terrain map construction and safe landing zone detection. The main methodological contribution lies in mathematically formulating epipolar constraint and then using it in a Minimum Spanning Tree (MST) based graph clustering approach. We have also made publicly available AHL (Autonomous Helicopter Landing) dataset, a new aerial video dataset captured by a drone, with annotated ground-truths. Experimental comparisons with other competing clustering methods i) in terms of Dunn Index and Davies Bouldin Index as well as ii) for frame-level safe zone detection in terms of F-measure and confusion matrix clearly demonstrate the effectiveness of the proposed formulation.

Hybrid Approach for 3D Head Reconstruction: Using Neural Networks and Visual Geometry

Oussema Bouafif, Bogdan Khomutenko, Mohammed Daoudi

Responsive image

Auto-TLDR; Recovering 3D Head Geometry from a Single Image using Deep Learning and Geometric Techniques

Slides Poster Similar

Recovering the 3D geometric structure of a face from a single input image is a challenging active research area in computer vision. In this paper, we present a novel method for reconstructing 3D heads from a single or multiple image(s) using a hybrid approach based on deep learning and geometric techniques. We propose an encoder-decoder network based on the U-net architecture and trained on synthetic data only. It predicts both pixel-wise normal vectors and landmarks maps from a single input photo. Landmarks are used for the pose computation and the initialization of the optimization problem, which, in turn, reconstructs the 3D head geometry by using a parametric morphable model and normal vector fields. State-of-the-art results are achieved through qualitative and quantitative evaluation tests on both single and multi-view settings. Despite the fact that the model was trained only on synthetic data, it successfully recovers 3D geometry and precise poses for real-world images.

3D Pots Configuration System by Optimizing Over Geometric Constraints

Jae Eun Kim, Muhammad Zeeshan Arshad, Seong Jong Yoo, Je Hyeong Hong, Jinwook Kim, Young Min Kim

Responsive image

Auto-TLDR; Optimizing 3D Configurations for Stable Pottery Restoration from irregular and noisy evidence

Slides Poster Similar

While potteries are common artifacts excavated in archaeological sites, the restoration process relies on the manual cleaning and reassembling shattered pieces. Since the number of possible 3D configurations is considerably large, the exhaustive manual trial may result in an abrasion on fractured surfaces and even failure to find the correct matches. As a result, many recent works suggest virtual reassembly from 3D scans of the fragments. The problem is challenging in the view of the conventional 3D geometric analysis, as it is hard to extract reliable shape features from the thin break lines. We propose to optimize the global configuration by combining geometric constraints with information from noisy shape features. Specifically, we enforce bijection and continuity of sequence of correspondences given estimates of corners and pair-wise matching scores between multiple break lines. We demonstrate that our pipeline greatly increases the accuracy of correspondences, resulting in the stable restoration of 3D configurations from irregular and noisy evidence.

Distortion-Adaptive Grape Bunch Counting for Omnidirectional Images

Ryota Akai, Yuzuko Utsumi, Yuka Miwa, Masakazu Iwamura, Koichi Kise

Responsive image

Auto-TLDR; Object Counting for Omnidirectional Images Using Stereographic Projection

Poster Similar

This paper proposes the first object counting method for omnidirectional images. Because conventional object counting methods cannot handle the distortion of omnidirectional images, we propose to process them using stereographic projection, which enables conventional methods to obtain a good approximation of the density function. However, the images obtained by stereographic projection are still distorted. Hence, to manage this distortion, we propose two methods. One is a new data augmentation method designed for the stereographic projection of omnidirectional images. The other is a distortion-adaptive Gaussian kernel that generates a density map ground truth while taking into account the distortion of stereographic projection. Using the counting of grape bunches as a case study, we constructed an original grape-bunch image dataset consisting of omnidirectional images and conducted experiments to evaluate the proposed method. The results show that the proposed method performs better than a direct application of the conventional method, improving mean absolute error by 14.7% and mean squared error by 10.5%.

Orthographic Projection Linear Regression for Single Image 3D Human Pose Estimation

Yahui Zhang, Shaodi You, Theo Gevers

Responsive image

Auto-TLDR; A Deep Neural Network for 3D Human Pose Estimation from a Single 2D Image in the Wild

Slides Poster Similar

3D human pose estimation from a single 2D image in the wild is an important computer vision task but yet extremely challenging. Unlike images taken from indoor and well constrained environments, 2D outdoor images in the wild are extremely complex because of varying imaging conditions. Furthermore, 2D images usually do not have corresponding 3D pose ground truth making a supervised approach ill constrained. Therefore, in this paper, we propose to associate the 3D human pose, the 2D human pose projection and the 2D image appearance through a new orthographic projection based linear regression module. Unlike existing reprojection based approaches, our orthographic projection and regression do not suffer from small angle problems, which usually lead to overfitting in the depth dimension. Hence, we propose a deep neural network which adopts the 2D pose, 3D pose regression and orthographic projection linear regression module. The proposed method shows state-of-the art performance on the Human3.6M dataset and generalizes well to in-the-wild images.

Light3DPose: Real-Time Multi-Person 3D Pose Estimation from Multiple Views

Alessio Elmi, Davide Mazzini, Pietro Tortella

Responsive image

Auto-TLDR; 3D Pose Estimation of Multiple People from a Few calibrated Camera Views using Deep Learning

Slides Poster Similar

We present an approach to perform 3D pose estimation of multiple people from a few calibrated camera views. Our architecture, leveraging the recently proposed unprojection layer, aggregates feature-maps from a 2D pose estimator backbone into a comprehensive representation of the 3D scene. Such intermediate representation is then elaborated by a fully-convolutional volumetric network and a decoding stage to extract 3D skeletons with sub-voxel accuracy. Our method achieves state of the art MPJPE on the CMU Panoptic dataset using a few unseen views and obtains competitive results even with a single input view. We also assess the transfer learning capabilities of the model by testing it against the publicly available Shelf dataset obtaining good performance metrics. The proposed method is inherently efficient: as a pure bottom-up approach, it is computationally independent of the number of people in the scene. Furthermore, even though the computational burden of the 2D part scales linearly with the number of input views, the overall architecture is able to exploit a very lightweight 2D backbone which is orders of magnitude faster than the volumetric counterpart, resulting in fast inference time. The system can run at 6 FPS, processing up to 10 camera views on a single 1080Ti GPU.

Holistic Grid Fusion Based Stop Line Estimation

Runsheng Xu, Faezeh Tafazzoli, Li Zhang, Timo Rehfeld, Gunther Krehl, Arunava Seal

Responsive image

Auto-TLDR; Fused Multi-Sensory Data for Stop Lines Detection in Intersection Scenarios

Slides Similar

Intersection scenarios provide the most complex traffic situations in Autonomous Driving and Driving Assistance Systems. Knowing where to stop in advance in an intersection is an essential parameter in controlling the longitudinal velocity of the vehicle. Most of the existing methods in literature solely use cameras to detect stop lines, which is typically not sufficient in terms of detection range. To address this issue, we propose a method that takes advantage of fused multi-sensory data including stereo camera and lidar as input and utilizes a carefully designed convolutional neural network architecture to detect stop lines. Our experiments show that the proposed approach can improve detection range compared to camera data alone, works under heavy occlusion without observing the ground markings explicitly, is able to predict stop lines for all lanes and allows detection at a distance up to 50 meters.

Inner Eye Canthus Localization for Human Body Temperature Screening

Claudio Ferrari, Lorenzo Berlincioni, Marco Bertini, Alberto Del Bimbo

Responsive image

Auto-TLDR; Automatic Localization of the Inner Eye Canthus in Thermal Face Images using 3D Morphable Face Model

Slides Poster Similar

In this paper, we propose an automatic approach for localizing the inner eye canthus in thermal face images. We first coarsely detect 5 facial keypoints corresponding to the center of the eyes, the nosetip and the ears. Then we compute a sparse 2D-3D points correspondence using a 3D Morphable Face Model (3DMM). This correspondence is used to project the entire 3D face onto the image, and subsequently locate the inner eye canthus. Detecting this location allows to obtain the most precise body temperature measurement for a person using a thermal camera. We evaluated the approach on a thermal face dataset provided with manually annotated landmarks. However, such manual annotations are normally conceived to identify facial parts such as eyes, nose and mouth, and are not specifically tailored for localizing the eye canthus region. As additional contribution, we enrich the original dataset by using the annotated landmarks to deform and project the 3DMM onto the images. Then, by manually selecting a small region corresponding to the eye canthus, we enrich the dataset with additional annotations. By using the manual landmarks, we ensure the correctness of the 3DMM projection, which can be used as ground-truth for future evaluations. Moreover, we supply the dataset with the 3D head poses and per-point visibility masks for detecting self-occlusions. The data will be publicly released.

IPT: A Dataset for Identity Preserved Tracking in Closed Domains

Thomas Heitzinger, Martin Kampel

Responsive image

Auto-TLDR; Identity Preserved Tracking Using Depth Data for Privacy and Privacy

Slides Poster Similar

We present a public dataset for Identity Preserved Tracking (IPT) consisting of sequences of depth data recorded using an Orbbec Astra depth sensor. The dataset features sequences in ten different locations with a high amount of background variation and is designed to be applicable to a wide range of tasks. Its labeling is versatile, allowing for tracking in either 3d space or image coordinates. Next to frame-by-frame 3d and inferred bounding box labeling we provide supplementary annotation of camera poses and room layouts, split in multiple semantically distinct categories. Intended use-cases are applications where both a high level understanding of scene understanding and privacy are central points of consideration, such as active and assisted living (AAL), security and industrial safety. Compared to similar public datasets IPT distinguishes itself with its sequential data format, 3d instance labeling and room layout annotation. We present baseline object detection results in image coordinates using a YOLOv3 network architecture and implement a background model suitable for online tracking applications to increase detection accuracy. Additionally we propose a novel volumetric non-maximum suppression (V-NMS) approach, taking advantage of known room geometry. Last we provide baseline person tracking results utilizing Multiple Object Tracking Challenge (MOTChallenge) evaluation metrics of the CVPR19 benchmark.

Map-Based Temporally Consistent Geolocalization through Learning Motion Trajectories

Bing Zha, Alper Yilmaz

Responsive image

Auto-TLDR; Exploiting Motion Trajectories for Geolocalization of Object on Topological Map using Recurrent Neural Network

Slides Poster Similar

In this paper, we propose a novel trajectory learning method that exploits motion trajectories on topological map using recurrent neural network for temporally consistent geolocalization of object. Inspired by human's ability to both be aware of distance and direction of self-motion in navigation, our trajectory learning method learns a pattern representation of trajectories encoded as a sequence of distances and turning angles to assist self-localization. We pose the learning process as a conditional sequence prediction problem in which each output locates the object on a traversable edge in a map. Considering the prediction sequence ought to be topologically connected in the graph-structured map, we adopt two different hypotheses generation and elimination strategies to eliminate disconnected sequence prediction. We demonstrate our approach on the KITTI stereo visual odometry dataset which is a city-scale environment. The key benefits of our approach to geolocalization are that 1) we take advantage of powerful sequence modeling ability of recurrent neural network and its robustness to noisy input, 2) only require a map in the form of a graph and 3) simply use an affordable sensor that generates motion trajectory. The experiments show that the motion trajectories can be learned by training an recurrent neural network, and temporally consistent geolocation can be predicted with both of the proposed strategies.

Learning Knowledge-Rich Sequential Model for Planar Homography Estimation in Aerial Video

Pu Li, Xiaobai Liu

Responsive image

Auto-TLDR; Sequential Estimation of Planar Homographic Transformations over Aerial Videos

Slides Poster Similar

This paper presents an unsupervised approach that leverages raw aerial videos to learn to estimate planar homographic transformation between consecutive video frames. Previous learning-based estimators work on pairs of images to estimate their planar homographic transformations but suffer from severe over-fitting issues, especially when applying over aerial videos. To address this concern, we develop a sequential estimator that directly processes a sequence of video frames and estimates their pairwise planar homographic transformations in batches. We also incorporate a set of spatial-temporal knowledge to regularize the learning of such a sequence-to-sequence model. We collect a set of challenging aerial videos and compare the proposed method to the alternative algorithms. Empirical studies suggest that our sequential model achieves significant improvement over alternative image-based methods and the knowledge-rich regularization further boosts our system performance. Our codes and dataset could be found at https://github.com/Paul-LiPu/DeepVideoHomography

Two-Stage Adaptive Object Scene Flow Using Hybrid CNN-CRF Model

Congcong Li, Haoyu Ma, Qingmin Liao

Responsive image

Auto-TLDR; Adaptive object scene flow estimation using a hybrid CNN-CRF model and adaptive iteration

Slides Poster Similar

Scene flow estimation based on stereo sequences is a comprehensive task relevant to disparity and optical flow. Some existing methods are time-consuming and often fail in the presence of reflective surfaces. In this paper, we propose a two-stage adaptive object scene flow estimation method using a hybrid CNN-CRF model (ACOSF), which benefits from high-quality features and the structured modelling capability. Meanwhile, in order to balance the computational efficiency and accuracy, we employ adaptive iteration for energy function optimization, which is flexible and efficient for various scenes. Besides, we utilize high-quality pixel selection to reduce the computation time with only a slight decrease in accuracy. Our method achieves competitive results with the state-of-the-art, which ranks second on the challenging KITTI 2015 scene flow benchmark.

Approach for Document Detection by Contours and Contrasts

Daniil Tropin, Sergey Ilyuhin, Dmitry Nikolaev, Vladimir V. Arlazarov

Responsive image

Auto-TLDR; A countor-based method for arbitrary document detection on a mobile device

Slides Poster Similar

This paper considers the task of arbitrary document detection performed on a mobile device. The classical contour-based approach often mishandles cases with occlusion, complex background, or blur. Region-based approach, which relies on the contrast between object and background, does not have limitations, however its known implementations are highly resource-consuming. We propose a modification of a countor-based method, in which the competing hypotheses of the contour location are ranked according to the contrast between the areas inside and outside the border. In the performed experiments such modification leads to the 40% decrease of alternatives ordering errors and 10% decrease of the overall number of detection errors. We updated state-of-the-art performance on the open MIDV-500 dataset and demonstrated competitive results with the state-of-the-art on the SmartDoc dataset.

A Multi-Task Neural Network for Action Recognition with 3D Key-Points

Rongxiao Tang, Wang Luyang, Zhenhua Guo

Responsive image

Auto-TLDR; Multi-task Neural Network for Action Recognition and 3D Human Pose Estimation

Slides Poster Similar

Action recognition and 3D human pose estimation are the fundamental problems in computer vision and closely related. In this work, we propose a multi-task neural network for action recognition and 3D human pose estimation. The results of the previous methods are still error-prone especially when tested against the images taken in-the-wild, leading error results in action recognition. To solve this problem, we propose a principled approach to generate high quality 3D pose ground truth given any in-the-wild image with a person inside. We achieve this by first devising a novel stereo inspired neural network to directly map any 2D pose to high quality 3D counterpart. Based on the high-quality 3D labels, we carefully design the multi-task framework for action recognition and 3D human pose estimation. The proposed architecture can utilize the shallow, deep features of the images, and the in-the-wild 3D human key-points to guide a more precise result. High quality 3D key-points can fully reflect the morphological features of motions, thus boosting the performance on action recognition. Experiments demonstrate that 3D pose estimation leads to significantly higher performance on action recognition than separated learning. We also evaluate the generalization ability of our method both quantitatively and qualitatively. The proposed architecture performs favorably against the baseline 3D pose estimation methods. In addition, the reported results on Penn Action and NTU datasets demonstrate the effectiveness of our method on the action recognition task.

RONELD: Robust Neural Network Output Enhancement for Active Lane Detection

Zhe Ming Chng, Joseph Mun Hung Lew, Jimmy Addison Lee

Responsive image

Auto-TLDR; Real-Time Robust Neural Network Output Enhancement for Active Lane Detection

Slides Poster Similar

Accurate lane detection is critical for navigation in autonomous vehicles, particularly the active lane which demarcates the single road space that the vehicle is currently traveling on. Recent state-of-the-art lane detection algorithms utilize convolutional neural networks (CNNs) to train deep learning models on popular benchmarks such as TuSimple and CULane. While each of these models works particularly well on train and test inputs obtained from the same dataset, the performance drops significantly on unseen datasets of different environments. In this paper, we present a real-time robust neural network output enhancement for active lane detection (RONELD) method to identify, track, and optimize active lanes from deep learning probability map outputs. We first adaptively extract lane points from the probability map outputs, followed by detecting curved and straight lanes before using weighted least squares linear regression on straight lanes to fix broken lane edges resulting from fragmentation of edge maps in real images. Lastly, we hypothesize true active lanes through tracking preceding frames. Experimental results demonstrate an up to two-fold increase in accuracy using RONELD on cross-dataset validation tests.

Sequential Non-Rigid Factorisation for Head Pose Estimation

Stefania Cristina, Kenneth Patrick Camilleri

Responsive image

Auto-TLDR; Sequential Shape-and-Motion Factorisation for Head Pose Estimation in Eye-Gaze Tracking

Slides Poster Similar

Within the context of eye-gaze tracking, the capability of permitting the user to move naturally is an important step towards allowing for more natural user interaction in less constrained scenarios. Natural movement can be characterised by changes in head pose, as well as non-rigid face deformations as the user performs different facial expressions. While the estimation of head pose within the domain of eye-gaze tracking is being increasingly considered, the face is most often regarded as a rigid body. The few methods that factor the challenge of handling face deformations into the head pose estimation problem, often require the availability of a pre-defined face model or a considerable amount of training data. In this paper, we direct our attention towards the application of shape-and-motion factorisation for head pose estimation, since this does not generally rely on the availability of an initial face model. Over the years, various shape-and-motion factorisation methods have been proposed to address the challenges of rigid and non-rigid shape and motion recovery, in a batch or sequential manner. However, the real-time recovery of non-rigid shape and motion by factorisation remains, in general, an open problem. Our work addresses this open problem by proposing a sequential factorisation method for non-rigid shape and motion recovery, which does not rely on the availability of a pre-defined face deformation model or training data. Quantitative and qualitative results show that our method can handle various non-rigid face deformations without deterioration of the head pose estimation accuracy.

Mobile Augmented Reality: Fast, Precise, and Smooth Planar Object Tracking

Dmitrii Matveichev, Daw-Tung Lin

Responsive image

Auto-TLDR; Planar Object Tracking with Sparse Optical Flow Tracking and Descriptor Matching

Slides Poster Similar

We propose an innovative method for combining sparse optical flow tracking and descriptor matching algorithms. The proposed approach solves the following problems that are inherent to keypoint-based and optical flow based tracking algorithms: spatial jitter, extreme scale transformation, extreme perspective transformation, degradation in the number of tracking points, and drifting of tracking points. Our algorithm provides smooth object-position tracking under six degrees of freedom transformations with a small computational cost for providing a high-quality real-time AR experience on mobile platforms. We experimentally demonstrate that our approach outperforms the state-of-the-art tracking algorithms while offering faster computational time. A mobile augmented reality (AR) application, which is developed using our approach, delivers planar object tracking with 30 FPS on modern mobile phones for a camera resolution of 1280$\times$720. Finally, we compare the performance of our AR application with that of the Vuforia-based AR application on the same planar objects database. The test results show that our AR application delivers better AR experience than Vuforia in terms of smooth transition of object-pose between video frames.

Visual Saliency Oriented Vehicle Scale Estimation

Qixin Chen, Tie Liu, Jiali Ding, Zejian Yuan, Yuanyuan Shang

Responsive image

Auto-TLDR; Regularized Intensity Matching for Vehicle Scale Estimation with salient object detection

Slides Poster Similar

Vehicle scale estimation with a single camera is a typical application for intelligent transportation and it faces the challenges from visual computing while intensity-based method and descriptor-based method should be balanced. This paper proposed a vehicle scale estimation method based on salient object detection to resolve this problem. The regularized intensity matching method is proposed in Lie Algebra to achieve robust and accurate scale estimation, and descriptor matching and intensity matching are combined to minimize the proposed loss function. The visual attention mechanism is designed to select image patches with texture and remove the occluded image patches. Then the weights are assigned to pixels from the selected image patches which alleviates the influence of noise-corrupted pixels. The experiments show that the proposed method significantly outperforms state-of-the-art methods with regard to the robustness and accuracy of vehicle scale estimation.

Generalized Conics: Properties and Applications

Aysylu Gabdulkhakova, Walter Kropatsch

Responsive image

Auto-TLDR; A Generalized Conic Representation for Distance Fields

Slides Poster Similar

In this paper the properties of the generalized conics are used to create a unified framework for generating various types of the distance fields. The main concept behind this work is a metric that measures the distance from a point to a line segment according to the definition of the ellipse. The proposed representation provides a possibility to efficiently compute the proximity, arithmetic mean of the distances and a space tessellation with regard to the given set of polygonal objects, line segments and points. In addition, the weights can be introduced for objects, their parts and combinations. This fact leads to a hierarchical representation that can be efficiently obtained using the pixel-wise operations. The practical value of the proposed ideas is demonstrated on example of applications like skeletonization, smoothing, optimal location finding and clustering.