A New Geodesic-Based Feature for Characterization of 3D Shapes: Application to Soft Tissue Organ Temporal Deformations

Karim Makki, Amine Bohi, Augustin Ogier, Marc-Emmanuel Bellemare

Auto-TLDR; Spatio-Temporal Feature Descriptors for 3D Shape Characterization from Point Clouds

Spatio-temporal feature descriptors are of great importance for characterizing the local changes of 3D deformable shapes. In this study, we propose a method for characterizing 3D shapes from point clouds and show a direct application to the study of organ temporal deformations. As an example, we characterize the behavior of the bladder during forced respiratory motion with a reduced number of 3D surface points. First, a set of equidistant points representing the vertices of a quadrilateral mesh of the organ surface is tracked throughout a long dynamic MRI sequence using a large deformation diffeomorphic metric mapping (LDDMM) framework. Second, a novel 3D shape descriptor invariant to translation, scale and rotation is proposed for characterizing the temporal organ deformations by employing an Eulerian Partial Differential Equations (PDEs) methodology. We demonstrate the robustness of our feature on both synthetic 3D shapes and realistic dynamic Magnetic Resonance Imaging (MRI) data sequences portraying the bladder deformation during a forced breathing exercise. Promising results are obtained, showing that the proposed feature may be useful for several computer vision applications such as medical imaging, aerodynamics and robotics.
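The invariance property claimed for the descriptor can be illustrated with a much simpler stand-in. The sketch below is not the geodesic, PDE-based feature of the paper: it uses sorted pairwise Euclidean distances divided by their mean, which is an assumption chosen only because its invariance to translation, uniform scale and rotation is easy to check. The function names and the test cloud are hypothetical.

```python
import math

def pairwise_descriptor(points):
    """Sorted pairwise Euclidean distances, normalized by their mean.

    Stand-in descriptor (an assumption, not the paper's feature).
    """
    dists = [math.dist(p, q) for i, p in enumerate(points) for q in points[i + 1:]]
    mean = sum(dists) / len(dists)
    return sorted(d / mean for d in dists)

def similarity_transform(points, scale, angle, shift):
    """Rotate about the z-axis by `angle`, scale uniformly, then translate."""
    c, s = math.cos(angle), math.sin(angle)
    return [(scale * (c * x - s * y) + shift[0],
             scale * (s * x + c * y) + shift[1],
             scale * z + shift[2]) for x, y, z in points]

# Hypothetical four-point cloud and an arbitrary similarity transform.
cloud = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0)]
moved = similarity_transform(cloud, scale=2.5, angle=0.7, shift=(4.0, -1.0, 2.0))

before = pairwise_descriptor(cloud)
after = pairwise_descriptor(moved)
# Largest component-wise difference; it should be at floating-point noise level.
max_diff = max(abs(a - b) for a, b in zip(before, after))
```

Any descriptor meant to compare organ shapes across time frames would need to pass this kind of check before its values can be attributed to deformation rather than to pose or size.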

Similar papers

A Plane-Based Approach for Indoor Point Clouds Registration

Ketty Favre, Muriel Pressigout, Luce Morin, Eric Marchand

Auto-TLDR; A plane-based registration approach for indoor environments based on LiDAR data

Iterative Closest Point (ICP) is one of the most widely used algorithms for 3D point cloud registration. This classical approach can be impacted by the large number of points contained in a point cloud. Planar structures, which are less numerous than points, can be used in well-structured man-made environments. In this paper we propose a registration method inspired by the ICP algorithm in a plane-based registration approach for indoor environments. This method is based solely on data acquired with a LiDAR sensor. A new metric based on plane characteristics is introduced to find the best plane correspondences. The optimal transformation is estimated through a two-step minimization approach, successively performing robust plane-to-plane minimization and non-linear robust point-to-plane registration. Experiments on the Autonomous Systems Lab (ASL) dataset show that the proposed method successfully registers 100% of the scans from the three indoor sequences. Experiments also show that the proposed method is more robust in large motion scenarios than other state-of-the-art algorithms.
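The point-to-plane term mentioned in the abstract can be sketched as follows. This is a minimal illustration under strong assumptions: a single known plane, a plain (non-robust) squared cost, and a correction restricted to a translation along the plane normal. The abstract's method is a robust two-step minimization over a full transformation, which this does not reproduce. All names and values are hypothetical.

```python
def point_to_plane_residuals(points, normal, offset):
    """Signed distances n.p + d of each point to a plane with unit normal n."""
    return [sum(n * c for n, c in zip(normal, p)) + offset for p in points]

def squared_cost(points, normal, offset):
    """Plain least-squares point-to-plane cost (no robust weighting)."""
    return sum(r * r for r in point_to_plane_residuals(points, normal, offset))

# Hypothetical floor plane z = 0 and a scan floating 0.2 above it.
normal, offset = (0.0, 0.0, 1.0), 0.0
scan = [(0.0, 0.0, 0.2), (1.0, 0.5, 0.2), (2.0, -1.0, 0.2)]
cost_before = squared_cost(scan, normal, offset)

# For a translation along the normal, the least-squares optimum is minus the
# mean residual.
residuals = point_to_plane_residuals(scan, normal, offset)
shift = -sum(residuals) / len(residuals)
aligned = [(x + shift * normal[0], y + shift * normal[1], z + shift * normal[2])
           for x, y, z in scan]
cost_after = squared_cost(aligned, normal, offset)
```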

Facetwise Mesh Refinement for Multi-View Stereo

Andrea Romanoni, Matteo Matteucci

Auto-TLDR; Facetwise Refinement of Multi-View Stereo using Delaunay Triangulations

Mesh refinement is a fundamental step for accurate Multi-View Stereo. It modifies the geometry of an initial manifold mesh to minimize the photometric error induced in a set of camera pairs. This initial mesh is usually the output of volumetric 3D reconstruction based on min-cut over Delaunay Triangulations. Such methods produce a significant number of non-manifold vertices and therefore require a vertex split step to explicitly repair them. In this paper we extend this method to preemptively fix the non-manifold vertices by reasoning directly on the Delaunay Triangulation and avoid most vertex splits. The main contribution of this paper addresses the problem of choosing the camera pairs adopted by the refinement process. We treat the problem as a mesh labeling process, where each label corresponds to a camera pair. Unlike state-of-the-art methods, which use each camera pair to refine all the visible parts of the mesh, we choose, for each facet, the best pair that enforces both the overall visibility and coverage. The refinement step is applied to each facet using only the selected camera pair. This facetwise refinement helps the process to be applied as evenly as possible.

3D Point Cloud Registration Based on Cascaded Mutual Information Attention Network

Xiang Pan, Xiaoyi Ji

Auto-TLDR; Cascaded Mutual Information Attention Network for 3D Point Cloud Registration

For 3D point cloud registration, how to improve the local feature correlation of two point clouds is a challenging problem. In this paper, we propose a cascaded mutual information attention registration network. The network improves the accuracy of point cloud registration by stacking residual structure and using lateral connection. Firstly, the local reference coordinate system is defined by spherical representation for the local point set, which improves the stability and reliability of local features under noise. Secondly, the attention structure is used to improve the network depth and ensure the convergence of the network. Furthermore, a lateral connection is introduced into the network to avoid the loss of features in the process of concatenation. In the experimental part, the results of different algorithms are compared. It can be found that the proposed cascaded network can enhance the correlation of local features between different point clouds. As a result, it improves the registration accuracy significantly over the DCP and other typical algorithms.

Transferable Model for Shape Optimization subject to Physical Constraints

Lukas Harsch, Johannes Burgbacher, Stefan Riedelbauch

Auto-TLDR; U-Net with Spatial Transformer Network for Flow Simulations

The interaction of neural networks with physical equations offers a wide range of applications. We provide a method which enables a neural network to transform objects subject to given physical constraints. To this end, a U-Net architecture is used to learn the underlying physical behaviour of fluid flows. The network is used to infer the solution of flow simulations, which is shown for a wide range of generic channel flow simulations. Physically meaningful quantities can be computed on the obtained solution, e.g. the total pressure difference or the forces on the objects. A Spatial Transformer Network with thin-plate-splines is used for the interaction between the physical constraints and the geometric representation of the objects. Thus, a transformation from an initial to a target geometry is performed such that the object fulfils the given constraints. This method is fully differentiable, i.e. gradient information can be used for the transformation. This can be seen as an inverse design process. The advantage of this method over many other proposed methods is that the physical constraints are based on the inferred flow field solution. Thus, we can apply a transferable model to varying problem setups, which is not limited to a given set of geometry parameters or physical quantities.

Graph Signal Active Contours

Olivier Lezoray

Auto-TLDR; Adaptation of Active Contour Without Edges for Graph Signal Processing

With the advent of data living on vertices of graphs, there is much interest in processing the so-called graph signals for partitioning tasks. As active contours have had much impact in the image processing community, their formulation on graphs is of importance to the field of graph signal processing. This paper proposes an adaptation on graphs of a model that combines the Geodesic Active Contour and the Active Contour Without Edges models. In addition, specific terms depending on graphs are introduced in the formulation. This adaptation is solved using a level set formulation with a gradient descent that can be expressed as a morphological front evolution process. Experimental results on different kinds of graph signals show the benefit of the approach.

Inferring Functional Properties from Fluid Dynamics Features

Andrea Schillaci, Maurizio Quadrio, Carlotta Pipolo, Marcello Restelli, Giacomo Boracchi

Auto-TLDR; Exploiting Convective Properties of Computational Fluid Dynamics for Medical Diagnosis

In a wide range of applied problems involving fluid flows, Computational Fluid Dynamics (CFD) provides detailed quantitative information on the flow field, at various levels of fidelity and computational cost. However, CFD alone cannot predict high-level functional properties of the system that are not easily obtained from the equations of fluid motion. In this work, we present a data-driven framework to extract additional information, such as medical diagnostic output, from CFD solutions. The task is made difficult by the huge data dimensionality of CFD, together with the limited amount of training data implied by its high computational cost. By pursuing a traditional ML pipeline of pre-processing, feature extraction, and model training, we demonstrate that informative features can be extracted from CFD data. Two experiments, pertaining to different application domains, support the claim that the convective properties implicit in a CFD solution can be leveraged to retrieve functional information for which an analytical definition is missing. Despite the preliminary nature of our study and the relative simplicity of both the geometrical and CFD models, for the first time we demonstrate that the combination of ML and CFD can diagnose a complex system in terms of high-level functional information.

Generalized Shortest Path-Based Superpixels for Accurate Segmentation of Spherical Images

Rémi Giraud, Rodrigo Borba Pinheiro, Yannick Berthoumieu

Auto-TLDR; SPS: Spherical Shortest Path-based Superpixels

Most existing superpixel methods are designed to segment standard planar images as pre-processing for computer vision pipelines. Nevertheless, the increasing number of applications based on wide-angle capture devices, mainly generating 360° spherical images, has reinforced the need for dedicated superpixel approaches. In this paper, we introduce a new superpixel method for spherical images called SphSPS (for Spherical Shortest Path-based Superpixels). Our approach respects the spherical geometry and generalizes the notion of shortest path between a pixel and a superpixel center on the 3D spherical acquisition space. We show that the feature information on such a path can be efficiently integrated into our clustering framework and jointly improves the respect of object contours and the shape regularity. To evaluate this last aspect in the spherical space, we also generalize a planar global regularity metric. Finally, the proposed SphSPS method obtains significantly better performance than recent planar and spherical superpixel approaches on the reference 360° spherical panorama segmentation dataset.
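On a sphere, the shortest path between a pixel and a superpixel center follows a great circle. The sketch below shows that geometric ingredient only: the great-circle angle between two directions and a point interpolated along the arc. It is not the SphSPS clustering itself. Function names are hypothetical, and antipodal pairs (where the path is not unique) are not handled.

```python
import math

def to_unit_vector(lon, lat):
    """Longitude/latitude in radians to a 3D unit vector."""
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))

def great_circle_angle(u, v):
    """Angle between unit vectors, using atan2 for numerical stability."""
    cross = (u[1] * v[2] - u[2] * v[1],
             u[2] * v[0] - u[0] * v[2],
             u[0] * v[1] - u[1] * v[0])
    dot = sum(a * b for a, b in zip(u, v))
    return math.atan2(math.hypot(*cross), dot)

def point_on_shortest_path(u, v, t):
    """Spherical linear interpolation: t=0 gives u, t=1 gives v."""
    omega = great_circle_angle(u, v)
    if omega < 1e-12:  # coincident points: the path is a single point
        return u
    a = math.sin((1.0 - t) * omega) / math.sin(omega)
    b = math.sin(t * omega) / math.sin(omega)
    return tuple(a * x + b * y for x, y in zip(u, v))

pixel = to_unit_vector(0.0, 0.0)
center = to_unit_vector(math.pi / 2, 0.0)
angle = great_circle_angle(pixel, center)
midpoint = point_on_shortest_path(pixel, center, 0.5)
```

Sampling several values of `t` yields the positions along the path at which image features could be read, which is the kind of information the abstract says is integrated into the clustering.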

Hybrid Approach for 3D Head Reconstruction: Using Neural Networks and Visual Geometry

Oussema Bouafif, Bogdan Khomutenko, Mohammed Daoudi

Auto-TLDR; Recovering 3D Head Geometry from a Single Image using Deep Learning and Geometric Techniques

Recovering the 3D geometric structure of a face from a single input image is a challenging active research area in computer vision. In this paper, we present a novel method for reconstructing 3D heads from a single or multiple image(s) using a hybrid approach based on deep learning and geometric techniques. We propose an encoder-decoder network based on the U-net architecture and trained on synthetic data only. It predicts both pixel-wise normal vectors and landmarks maps from a single input photo. Landmarks are used for the pose computation and the initialization of the optimization problem, which, in turn, reconstructs the 3D head geometry by using a parametric morphable model and normal vector fields. State-of-the-art results are achieved through qualitative and quantitative evaluation tests on both single and multi-view settings. Despite the fact that the model was trained only on synthetic data, it successfully recovers 3D geometry and precise poses for real-world images.

Neural Machine Registration for Motion Correction in Breast DCE-MRI

Federica Aprea, Stefano Marrone, Carlo Sansone

Auto-TLDR; A Neural Registration Network for Dynamic Contrast Enhanced-Magnetic Resonance Imaging

Cancer is one of the leading causes of death in the western world, with medical imaging playing a key role in early diagnosis. Focusing on breast cancer, one of the emerging imaging methodologies is Dynamic Contrast Enhanced-Magnetic Resonance Imaging (DCE-MRI). The flip side of using DCE-MRI is its long acquisition times, which often cause the patient to move, resulting in motion artefacts, namely distortions in the acquired image that can affect DCE-MRI analysis. A possible solution consists in the use of Motion Correction Techniques (MCTs), i.e. procedures intended to re-align the post-contrast image to the corresponding pre-contrast (reference) one. This task is particularly critical in DCE-MRI, due to brightness variations introduced in post-contrast images by the flow of the contrast agent. To face this problem, in this work we introduce a new MCT for breast DCE-MRI leveraging Physiologically Based PharmacoKinetic (PBPK) modelling and Artificial Neural Networks (ANN) to determine the most suitable physiologically-compliant transformation. To this aim, we propose a Neural Registration Network relying on a very task-specific loss function explicitly designed to take into account the contrast agent flow while enforcing a correct re-alignment. We compared the obtained results against some conventional motion correction techniques, evaluating the performance on a patient-by-patient basis. Results clearly show the effectiveness of the proposed approach, which performs best even when compared against other techniques designed to account for brightness variations.

Photometric Stereo with Twin-Fisheye Cameras

Jordan Caracotte, Fabio Morbidi, El Mustapha Mouaddib

Auto-TLDR; Photometric stereo problem for low-cost 360-degree cameras

In this paper, we introduce and solve, for the first time, the photometric stereo problem for low-cost 360-degree cameras. In particular, we present a spherical image irradiance equation which is adapted to twin-fisheye cameras, and an original algorithm for the estimation of light directions based on the specular highlights observed on mirror balls. Extensive experiments with synthetic and real-world images captured by a Ricoh Theta V camera, demonstrate the effectiveness and robustness of the proposed 3D reconstruction pipeline. To foster reproducible research, the image dataset and code developed for this paper are made publicly available at the address: https://home.mis.u-picardie.fr/~fabio/PhotoSphere.html

Learning Non-Rigid Surface Reconstruction from Spatio-Temporal Image Patches

Matteo Pedone, Abdelrahman Mostafa, Janne Heikkilä

Auto-TLDR; Dense Spatio-Temporal Depth Maps of Deformable Objects from Video Sequences

We present a method to reconstruct a dense spatio-temporal depth map of a non-rigidly deformable object directly from a video sequence. The estimation of depth is performed locally on spatio-temporal patches of the video, and then the full depth video of the entire shape is recovered by combining them together. Since the geometric complexity of a local spatio-temporal patch of a deforming non-rigid object is often simple enough to be faithfully represented with a parametric model, we artificially generate a database of small deforming rectangular meshes rendered with different material properties and light conditions, along with their corresponding depth videos, and use such data to train a convolutional neural network. We tested our method on both synthetic and Kinect data and experimentally observed that the reconstruction error is significantly lower than the one obtained using other approaches like conventional non-rigid structure from motion.

A Riemannian Framework for Detecting Stimulus-Relevant Fiber Pathways

Jingyong Su, Linlin Tang, Zhipeng Yang, Mengmeng Guo

Auto-TLDR; Clustering Task-Specific Fiber Pathways in Functional MRI using BOLD Signals

Functional MRI based on blood oxygenation level-dependent (BOLD) contrast is well established as a neuro-imaging technique for detecting neural activity in the cortex of the human brain. Recent studies have shown that variations of BOLD signals in white matter are also related to neural activities both in resting state and under functional loading. We develop a comprehensive framework for detecting task-specific fiber pathways. We not only study fiber tracts as open curves with different physical features (shape, scale, orientation and position), but also incorporate the BOLD signals transmitted along them to find stimulus-relevant pathways. Specifically, we propose a novel Riemannian metric, which is a weighted sum of distances in the product space of shapes and functions. This metric provides both a cost function for registration and a proper distance for comparison. Experimental results on real data have shown that we can cluster fiber pathways correctly by evaluating correlations between BOLD signals and stimuli, as well as their temporal variations and power spectra. The proposed framework can also be easily generalized to various applications where multi-modality data exist.

Total Estimation from RGB Video: On-Line Camera Self-Calibration, Non-Rigid Shape and Motion

Antonio Agudo

Auto-TLDR; Joint Auto-Calibration, Pose and 3D Reconstruction of a Non-rigid Object from an uncalibrated RGB Image Sequence

In this paper we present a sequential approach to jointly retrieve camera auto-calibration, camera pose and the 3D reconstruction of a non-rigid object from an uncalibrated RGB image sequence, without assuming any prior information about the shape structure, nor the need for a calibration pattern, nor the use of training data at all. To this end, we propose a Bayesian filtering approach based on a sum-of-Gaussians filter composed of a bank of extended Kalman filters (EKF). For every EKF, we make use of dynamic models to estimate its state vector, which later will be Gaussianly combined to achieve a global solution. To deal with deformable objects, we incorporate a mechanical model solved by using the finite element method. Thanks to these ingredients, the resulting method is both efficient and robust to several artifacts such as missing and noisy observations as well as sudden camera motions, while being available for a wide variety of objects and materials, including isometric and elastic shape deformations. Experimental validation is proposed in real experiments, showing its strengths with respect to competing approaches.
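The abstract states that the per-filter estimates are combined into a global solution but does not give the rule. One common choice for a bank of Gaussian estimates is moment matching of the weighted mixture, sketched here for scalar states. Treating this as the combination step is an assumption, and the function name and numbers are hypothetical.

```python
def combine_gaussians(weights, means, variances):
    """Moment-matched mean and variance of a weighted 1D Gaussian mixture.

    Assumed combination rule; the abstract does not specify the one used.
    """
    total = sum(weights)
    w = [x / total for x in weights]  # normalize so the weights sum to 1
    mean = sum(wi * mi for wi, mi in zip(w, means))
    # Mixture variance = within-filter variance + spread of the filter means.
    var = sum(wi * (vi + (mi - mean) ** 2)
              for wi, mi, vi in zip(w, means, variances))
    return mean, var

# Two hypothetical filters with equal weight that disagree about the state.
global_mean, global_var = combine_gaussians(
    weights=[1.0, 1.0], means=[0.0, 2.0], variances=[1.0, 1.0])
```

The spread term makes the combined variance grow when the filters disagree, so the global estimate does not appear more certain than the bank actually is.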

Directional Graph Networks with Hard Weight Assignments

Miguel Dominguez, Raymond Ptucha

Auto-TLDR; Hard Directional Graph Networks for Point Cloud Analysis

Point cloud analysis is an important field for 3D scene understanding. It has applications in self driving cars and robotics (via LIDAR sensors), 3D graphics, and computer-aided design. Neural networks have recently achieved strong results on point cloud analysis problems such as classification and segmentation. Each point cloud network has the challenge of defining a convolution that can learn useful features on unstructured points. Some recent point cloud convolutions create separate weight matrices for separate directions like a CNN, but apply every weight matrix to every neighbor with soft assignments. This increases computational complexity and makes relatively small neighborhood aggregations expensive to compute. We propose Hard Directional Graph Networks (HDGN), a point cloud model that both learns directional weight matrices and assigns a single matrix to each neighbor, achieving directional convolutions at lower computational cost. HDGN's directional modeling achieves state-of-the-art results on multiple point cloud vision benchmarks.
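The difference between soft and hard assignment can be made concrete with a small sketch. Here each neighbor is assigned to the single direction that best matches its offset from the center point, and only that direction's weight matrix is applied. The six axis-aligned directions, the dot-product score and all names are assumptions for illustration; the abstract does not describe how HDGN defines or learns its directions.

```python
# Six axis-aligned directions (an assumption made for this illustration).
DIRECTIONS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def hard_assign(center, neighbor):
    """Index of the direction with the largest dot product with the offset."""
    offset = [n - c for n, c in zip(neighbor, center)]
    scores = [sum(o * d for o, d in zip(offset, direction))
              for direction in DIRECTIONS]
    return max(range(len(DIRECTIONS)), key=scores.__getitem__)

def matvec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def aggregate_hard(center, neighbors, features, weights):
    """Sum of W[k] f over neighbors, with exactly one matrix per neighbor."""
    out = [0.0] * len(weights[0])
    for neighbor, feature in zip(neighbors, features):
        k = hard_assign(center, neighbor)
        out = [a + b for a, b in zip(out, matvec(weights[k], feature))]
    return out

# Hypothetical weights: direction k uses (k + 1) times the 2x2 identity.
weights = [[[k + 1.0, 0.0], [0.0, k + 1.0]] for k in range(6)]
neighbors = [(0.1, 0.9, -0.2), (-1.0, 0.0, 0.0)]
features = [[1.0, 2.0], [1.0, 1.0]]
result = aggregate_hard((0.0, 0.0, 0.0), neighbors, features, weights)
```

With N neighbors and D directions, this applies N matrix products, whereas a soft assignment that applies every matrix to every neighbor applies N × D.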

Distinctive 3D Local Deep Descriptors

Fabio Poiesi, Davide Boscaini

Auto-TLDR; DIPs: Local Deep Descriptors for Point Cloud Regression

We present a simple yet effective method for learning distinctive 3D local deep descriptors (DIPs) that can be used to register point clouds without requiring an initial alignment. Point cloud patches are extracted, canonicalised with respect to their estimated local reference frame and encoded into rotation-invariant compact descriptors by a PointNet-based deep neural network. DIPs can effectively generalise across different sensor modalities because they are learnt end-to-end from locally and randomly sampled points. Moreover, because DIPs encode only local geometric information, they are robust to clutter, occlusions and missing regions. We evaluate and compare DIPs against alternative hand-crafted and deep descriptors on several indoor and outdoor datasets reconstructed using different sensors. Results show that DIPs (i) achieve comparable results to the state-of-the-art on RGB-D indoor scenes (3DMatch dataset), (ii) outperform state-of-the-art by a large margin on laser-scanner outdoor scenes (ETH dataset), and (iii) generalise to indoor scenes reconstructed with the Visual-SLAM system of Android ARCore.

3D Pots Configuration System by Optimizing Over Geometric Constraints

Jae Eun Kim, Muhammad Zeeshan Arshad, Seong Jong Yoo, Je Hyeong Hong, Jinwook Kim, Young Min Kim

Auto-TLDR; Optimizing 3D Configurations for Stable Pottery Restoration from irregular and noisy evidence

While pottery is a common artifact excavated at archaeological sites, the restoration process relies on manually cleaning and reassembling shattered pieces. Since the number of possible 3D configurations is considerably large, exhaustive manual trials may result in abrasion of fractured surfaces and even failure to find the correct matches. As a result, many recent works suggest virtual reassembly from 3D scans of the fragments. The problem is challenging from the viewpoint of conventional 3D geometric analysis, as it is hard to extract reliable shape features from the thin break lines. We propose to optimize the global configuration by combining geometric constraints with information from noisy shape features. Specifically, we enforce bijection and continuity of the sequence of correspondences given estimates of corners and pair-wise matching scores between multiple break lines. We demonstrate that our pipeline greatly increases the accuracy of correspondences, resulting in the stable restoration of 3D configurations from irregular and noisy evidence.

Robust Skeletonization for Plant Root Structure Reconstruction from MRI

Jannis Horn

Auto-TLDR; Structural reconstruction of plant roots from MRI using semantic root vs soil segmentation and 3D skeletonization

Structural reconstruction of plant roots from MRI is challenging, because of low resolution and low signal-to-noise ratio of the 3D measurements which may lead to disconnectivities and wrongly connected roots. We propose a two-stage approach for this task. The first stage is based on semantic root vs. soil segmentation and finds lowest-cost paths from any root voxel to the shoot. The second stage takes the largest fully connected component generated in the first stage and uses 3D skeletonization to extract a graph structure. We evaluate our method on 22 MRI scans and compare to human expert reconstructions.
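The first stage finds lowest-cost paths from root voxels to the shoot. A standard way to compute such a path is Dijkstra's algorithm over a cost grid, sketched below on a 2D grid for brevity (MRI volumes are 3D, where the same idea applies with a larger neighborhood). Using Dijkstra, the 4-connected neighborhood and the cost values are assumptions; the abstract does not name the search algorithm or the cost definition.

```python
import heapq

def lowest_cost_path(cost, start, goal):
    """Dijkstra on a 4-connected grid; entering a cell costs cost[r][c]."""
    rows, cols = len(cost), len(cost[0])
    best = {start: cost[start[0]][start[1]]}
    parent = {}
    heap = [(best[start], start)]
    while heap:
        dist, node = heapq.heappop(heap)
        if node == goal:
            break
        if dist > best[node]:  # stale heap entry, already improved
            continue
        r, c = node
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                new_dist = dist + cost[nr][nc]
                if new_dist < best.get((nr, nc), float("inf")):
                    best[(nr, nc)] = new_dist
                    parent[(nr, nc)] = node
                    heapq.heappush(heap, (new_dist, (nr, nc)))
    # Walk back from the goal to recover the path.
    path = [goal]
    while path[-1] != start:
        path.append(parent[path[-1]])
    return best[goal], path[::-1]

# Hypothetical costs: 1 where segmentation says "root", 9 where it says "soil".
cost = [
    [1, 9, 9],
    [1, 9, 9],
    [1, 1, 1],
]
total, path = lowest_cost_path(cost, start=(2, 2), goal=(0, 0))
```

Deriving low costs from high root probability makes the path follow root-like voxels and cross soil only where the segmentation has gaps.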

Fast Blending of Planar Shapes Based on Invariant Invertible and Stable Descriptors

Emna Ghorbel, Faouzi Ghorbel, Ines Sakly, Slim Mhiri

Auto-TLDR; Fined-Fourier-based Invariant Descriptor for Planar Shape Blending

In this paper, a novel method for blending planar shapes is introduced. This approach is based on the Fined-Fourier-based Invariant Descriptor (Fined-FID), which is invertible, invariant under Euclidean transformations and stable. Our approach extracts the Fined-FID from the two shapes of interest (the source and the target). Then, the extracted descriptors are averaged, enabling the calculation of intermediate descriptors. Finally, thanks to the inversion criterion, the intermediate shapes are easily recovered by applying the inverse analytical expression to these intermediate descriptors. Compared to previous works, Fined-FID-based morphing avoids the usual registration step, naturally generates closed intermediate contours and ensures invariance under Euclidean transformations and invariance to the starting point, while being computationally efficient (almost-linear complexity). The experiments show the performance of the proposed blending approach with respect to curvature-based methods.

Weakly Supervised Geodesic Segmentation of Egyptian Mummy CT Scans

Avik Hati, Matteo Bustreo, Diego Sona, Vittorio Murino, Alessio Del Bue

Auto-TLDR; A Weakly Supervised and Efficient Interactive Segmentation of Ancient Egyptian Mummies CT Scans Using Geodesic Distance Measure and GrabCut

In this paper, we tackle the task of automatically analyzing 3D volumetric scans obtained from computed tomography (CT) devices. In particular, we address a task for which data is very limited: the segmentation of CT scans of ancient Egyptian mummies. We aim at digitally unwrapping the mummy and identifying different segments such as body, bandages and jewelry. The problem is complex because of the lack of annotated data for the different semantic regions to segment, which discourages the use of strongly supervised approaches. We therefore propose a weakly supervised and efficient interactive segmentation method to solve this challenging problem. After segmenting the wrapped mummy from its exterior region using histogram analysis and template matching, we first design a voxel distance measure to find an approximate solution for the body and bandage segments. Here, we use geodesic distances since voxel features as well as spatial relationships among voxels are incorporated in this measure. Next, we refine the solution using a GrabCut-based segmentation together with a tracking method on the slices of the scan that assigns labels to different regions in the volume, using limited supervision in the form of scribbles drawn by the user. The efficiency of the proposed method is demonstrated using visualizations and validated through quantitative measures and qualitative unwrapping of the mummy.

Nonlinear Ranking Loss on Riemannian Potato Embedding

Byung Hyung Kim, Yoonje Suh, Honggu Lee, Sungho Jo

Auto-TLDR; Riemannian Potato for Rank-based Metric Learning

We propose a rank-based metric learning method by leveraging a concept of the Riemannian Potato for better separating non-linear data. By exploring the geometric properties of Riemannian manifolds, the proposed loss function optimizes the measure of dispersion using the distribution of Riemannian distances between a reference sample and neighbors and builds a ranked list according to the similarities. We show the proposed function can learn a hypersphere for each class, preserving the similarity structure inside it on Riemannian manifold. As a result, compared with Euclidean distance-based metric, our method can further jointly reduce the intra-class distances and enlarge the inter-class distances for learned features, consistently outperforming state-of-the-art methods on three widely used non-linear datasets.

RISEdb: A Novel Indoor Localization Dataset

Carlos Sanchez Belenguer, Erik Wolfart, Álvaro Casado Coscollá, Vitor Sequeira

Auto-TLDR; Indoor Localization Using LiDAR SLAM and Smartphones: A Benchmarking Dataset

In this paper we introduce a novel public dataset for developing and benchmarking indoor localization systems. We have selected and 3D mapped a set of representative indoor environments including a large office building, a conference room, a workshop, an exhibition area and a restaurant. Our acquisition pipeline is based on a portable LiDAR SLAM backpack to map the buildings and to accurately track the pose of the user as they move freely inside them. We introduce the calibration procedures that enable us to acquire and geo-reference live data coming from different independent sensors rigidly attached to the backpack. This has allowed us to collect long sequences of spherical and stereo images, together with all the sensor readings coming from a consumer smartphone, and to locate them inside the map with centimetre accuracy. The dataset addresses many of the limitations of existing indoor localization datasets regarding the scale and diversity of the mapped buildings; the number of acquired sequences under varying conditions; the accuracy of the ground-truth trajectory; the availability of a detailed 3D model and the availability of different sensor types. It enables the benchmarking of existing and the development of new indoor localization approaches, in particular for deep learning based systems that require large amounts of labeled training data.

One Step Clustering Based on A-Contrario Framework for Detection of Alterations in Historical Violins

Alireza Rezaei, Sylvie Le Hégarat-Mascle, Emanuel Aldea, Piercarlo Dondi, Marco Malagodi

Auto-TLDR; A-Contrario Clustering for the Detection of Altered Violins using UVIFL Images

Preventive conservation is an important practice in Cultural Heritage. The constant monitoring of the state of conservation of an artwork helps us reduce the risk of damage and the number of interventions necessary. In this work, we propose a probabilistic approach for the detection of alterations on the surface of historical violins based on an a-contrario framework. Our method is a one-step NFA clustering solution which considers grey-level and spatial density information in one background model. The proposed method is robust to noise and avoids parameter tuning and any assumption about the quantity of the worn-out areas. We have used UV induced fluorescence (UVIFL) images as input in order to consider details not perceivable with visible light. Tests were conducted on image sequences included in the "Violins UVIFL imagery" dataset. Results illustrate the ability of the algorithm to distinguish the worn area from the surrounding regions. Comparisons with state-of-the-art clustering methods show improved overall precision and recall.

Incorporating a Graph-Matching Algorithm into a Muscle Mechanics Model

Jose Luis Santacruz Muñoz, Francesc Serratosa

Auto-TLDR; Recomputing the Mesh Grid for Differential Models of the Muscle Mechanics

Differential models for the simulation of muscle mechanics are based on iteratively updating a mesh grid and deducing its new state through a finite element model. Models usually assume that the mesh grid is almost regular, which degrades the simulation accuracy in long simulation sequences, since the mesh tends to become less regular as the number of iterations increases. We present a model that aims to reduce this accuracy degradation. It is based on recomputing the mesh grid returned by the model in each iteration through the concept of graph matching. The new model is currently in use to analyse the dynamics of the human heart when some pressure is applied to it. The final goal of the project (which is not shown in this paper) is to deduce the optimal position and strength of the pressure applied to the heart that increases the chance of reviving it with the minimum tissue damage. Experimental validation shows that, over a number of iterations, our model returns the muscle position with higher accuracy than classical differential models, with an insignificant increase in runtime. Thus, it is worth recomputing the mesh grid, since the simulation accuracy drastically increases at the expense of a low runtime increase.

3D Semantic Labeling of Photogrammetry Meshes Based on Active Learning

Mengqi Rong, Shuhan Shen, Zhanyi Hu

Auto-TLDR; 3D Semantic Expression of Urban Scenes Based on Active Learning

Because different urban scenes are similar but not completely consistent, and because labeling directly in 3D is complex, high-level understanding of 3D scenes has always been a tricky problem. In this paper, we propose a procedural approach for 3D semantic expression of urban scenes based on active learning. We start with a small labeled image set to fine-tune a semantic segmentation network, then project its probability map onto a 3D mesh model for fusion, and finally output a 3D semantic mesh model in which each facet has a semantic label, together with a heat model showing each facet's confidence. Our key observation is that the algorithm is iterative: in each iteration, we use the output semantic model as supervision to select several valuable images for annotation, which then take part in the fine-tuning for overall improvement. In this way, we reduce the labeling workload without reducing the quality of the 3D semantic model. Using urban areas from two different cities, we show the potential of our method and demonstrate its effectiveness.

PointSpherical: Deep Shape Context for Point Cloud Learning in Spherical Coordinates

Hua Lin, Bin Fan, Yongcheng Liu, Yirong Yang, Zheng Pan, Jianbo Shi, Chunhong Pan, Huiwen Xie

Auto-TLDR; Spherical Hierarchical Modeling of 3D Point Cloud

We propose Spherical Hierarchical modeling of 3D point clouds. Inspired by Shape Context, we design a receptive field on each 3D point by placing a spherical coordinate system on it. We sample points using the furthest point method and create overlapping balls of points. For each ball, we divide the space into radial, polar angular and azimuthal angular bins on which we form a Spherical Hierarchy. We apply 1x1 CNN convolution on points to start the initial feature extraction. Repeated 3D CNN and max pooling over the spherical bins propagate contextual information until all the information is condensed in the center bin. Extensive experiments on five datasets provide strong evidence that our method outperforms current models on various point cloud learning tasks, including 2D/3D shape classification, 3D part segmentation and 3D semantic segmentation.
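
The sampling and binning steps described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the choice of the first point as the sampling seed, and the bin counts are all assumptions made for illustration, and the learned convolution and pooling stages are omitted.

```python
import math

def farthest_point_sampling(points, k):
    """Greedy farthest point sampling: return indices of k well-spread points."""
    chosen = [0]  # assumption: seed with the first point
    dist = [math.dist(p, points[0]) for p in points]
    while len(chosen) < k:
        # Pick the point farthest from everything chosen so far.
        nxt = max(range(len(points)), key=dist.__getitem__)
        chosen.append(nxt)
        dist = [min(d, math.dist(p, points[nxt])) for d, p in zip(dist, points)]
    return chosen

def spherical_bin(center, point, radius, n_r=3, n_theta=4, n_phi=8):
    """Map a neighbor to (radial, polar, azimuthal) bin indices.

    Returns None when the point lies outside the ball around `center`.
    The bin counts are hypothetical defaults, not values from the paper.
    """
    dx, dy, dz = (point[i] - center[i] for i in range(3))
    r = math.sqrt(dx * dx + dy * dy + dz * dz)
    if r > radius:
        return None
    if r == 0.0:
        return (0, 0, 0)
    theta = math.acos(max(-1.0, min(1.0, dz / r)))  # polar angle in [0, pi]
    phi = math.atan2(dy, dx) % (2.0 * math.pi)      # azimuth in [0, 2*pi)
    i_r = min(int(r / radius * n_r), n_r - 1)
    i_theta = min(int(theta / math.pi * n_theta), n_theta - 1)
    i_phi = min(int(phi / (2.0 * math.pi) * n_phi), n_phi - 1)
    return (i_r, i_theta, i_phi)
```

Each sampled center would collect the neighbors that fall inside its ball, grouped by bin index, before any feature extraction takes place.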

Human Segmentation with Dynamic LiDAR Data

Tao Zhong, Wonjik Kim, Masayuki Tanaka, Masatoshi Okutomi

Auto-TLDR; Spatiotemporal Neural Network for Human Segmentation with Dynamic Point Clouds

Consecutive LiDAR scans and depth images compose dynamic 3D sequences, which contain more abundant spatiotemporal information than a single frame. Similar to the development history of image and video perception, dynamic 3D sequence perception starts to come into sight after inspiring research on static 3D data perception. This work proposes a spatiotemporal neural network for human segmentation with dynamic LiDAR point clouds. It takes a sequence of depth images as input. It has a two-branch structure, i.e., the spatial segmentation branch and the temporal velocity estimation branch. The velocity estimation branch is designed to capture motion cues from the input sequence and then propagate them to the other branch, so that the segmentation branch segments humans according to both spatial and temporal features. These two branches are jointly learned on a generated dynamic point cloud data set for human recognition. Our work fills a gap in dynamic point cloud perception using the spherical representation of point clouds and achieves high accuracy. The experiments indicate that introducing temporal features benefits the segmentation of dynamic point clouds.

Interpolation in Auto Encoders with Bridge Processes

Carl Ringqvist, Henrik Hult, Judith Butepage, Hedvig Kjellstrom

Auto-TLDR; Stochastic interpolations from auto encoders trained on flattened sequences

Auto encoding models have been extensively studied in recent years. They provide an efficient framework for sample generation, as well as for analysing feature learning. Furthermore, they are efficient in performing interpolations between data points in semantically meaningful ways. In this paper, we introduce a method for generating sequence samples from auto encoders trained on flattened sequences (e.g., video samples from auto encoders trained to generate a video frame), as well as a canonical, dimension-independent method for generating stochastic interpolations. The distribution of interpolation paths is represented as the distribution of a bridge process constructed from an artificial random data generating process in the latent space, having the prior distribution as its invariant distribution.

Generic Merging of Structure from Motion Maps with a Low Memory Footprint

Gabrielle Flood, David Gillsjö, Patrik Persson, Anders Heyden, Kalle Åström

Auto-TLDR; A Low-Memory Footprint Representation for Robust Map Merge

With the development of cheap image sensors, the amount of available image data has increased enormously, and the possibility of using crowdsourced collection methods has emerged. This calls for the development of ways to handle all these data. In this paper, we present new tools that will enable efficient, flexible and robust map merging. Assuming that separate optimisations have been performed for the individual maps, we show how only relevant data can be stored in a low memory footprint representation. We use these representations to perform map merging so that the algorithm is invariant to the merging order and independent of the choice of coordinate system. The result is a robust algorithm that can be applied to several maps simultaneously. The result of a merge can also be represented with the same type of low memory footprint format, which enables further merging and updating of the map in a hierarchical way. Furthermore, the method can perform loop closing and also detect changes in the scene between the capture of the different image sequences. Using both simulated and real data, from both a handheld mobile phone and a drone, we verify the performance of the proposed method.

Encoding Brain Networks through Geodesic Clustering of Functional Connectivity for Multiple Sclerosis Classification

Muhammad Abubakar Yamin, Valsasina Paola, Michael Dayan, Sebastiano Vascon, Tessadori Jacopo, Filippi Massimo, Vittorio Murino, A Rocca Maria, Diego Sona

Auto-TLDR; Geodesic Clustering of Connectivity Matrices for Multiple Sclerosis Classification

An important task in brain connectivity research is distinguishing patients from healthy subjects. In this work, we present a two-step mathematical framework that discriminates between two groups of people, with an application to multiple sclerosis. The proposed approach exploits the properties of the connectivity matrices determined using the covariances between signals of a fixed set of brain areas. These positive semi-definite matrices lie on a Riemannian manifold, which allows the use of a geodesic distance defined on this space. In order to generate a vector representation that is useful for classification but still preserves the network structures, we encoded the data by exploiting the network attractors determined by a geodesic clustering of connectivity matrices. The clustering centroids were then used as a dictionary to encode each subject's connectivity matrices as a vector of geodesic distances. A linear Support Vector Machine was then used to perform classification between subjects. To demonstrate the advantage of using geodesic metrics in this framework, we conducted the same analysis using the Euclidean metric. Experimental results confirm that employing the geodesic metric in this framework leads to higher classification performance, whereas performance with the Euclidean metric was suboptimal.
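
As a rough illustration of encoding a matrix as a vector of geodesic distances to centroids, the sketch below uses the affine-invariant Riemannian distance, restricted to strictly positive definite 2x2 matrices so that it needs only the standard library. The abstract does not say which geodesic distance the authors use, so the metric, the 2x2 restriction and the function names are all assumptions.

```python
import math

def geodesic_distance_2x2(a, b):
    """Affine-invariant distance between two 2x2 symmetric positive definite matrices.

    d(A, B) = sqrt(sum_i log(lambda_i)^2), where lambda_i are the
    eigenvalues of A^{-1} B (real and positive for SPD inputs).
    """
    (a11, a12), (a21, a22) = a
    (b11, b12), (b21, b22) = b
    det_a = a11 * a22 - a12 * a21
    # Diagonal entries of A^{-1} B are enough to get its trace.
    m11 = (a22 * b11 - a12 * b21) / det_a
    m22 = (-a21 * b12 + a11 * b22) / det_a
    trace = m11 + m22
    det = (b11 * b22 - b12 * b21) / det_a
    disc = math.sqrt(max(trace * trace - 4.0 * det, 0.0))
    lam1 = (trace + disc) / 2.0
    lam2 = det / lam1  # product of eigenvalues equals the determinant
    return math.sqrt(math.log(lam1) ** 2 + math.log(lam2) ** 2)

def encode(matrix, centroids):
    """Represent a matrix by its geodesic distances to each dictionary centroid."""
    return [geodesic_distance_2x2(matrix, c) for c in centroids]

# Toy usage with made-up matrices (not data from the paper).
identity = ((1.0, 0.0), (0.0, 1.0))
stretched = ((math.e, 0.0), (0.0, 1.0))
features = encode(((2.0, 0.5), (0.5, 1.0)), [identity, stretched])
```

The resulting feature vector has one entry per centroid and could be passed to any standard linear classifier.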

Unconstrained Vision Guided UAV Based Safe Helicopter Landing

Arindam Sikdar, Abhimanyu Sahu, Debajit Sen, Rohit Mahajan, Ananda Chowdhury

Auto-TLDR; Autonomous Helicopter Landing in Hazardous Environments from Unmanned Aerial Images Using Constrained Graph Clustering

In this paper, we address the problem of automated detection of safe zone(s) for helicopter landing in hazardous environments from images captured by an Unmanned Aerial Vehicle (UAV). The unconstrained motion of the image-capturing drone (the UAV in our case) makes the problem even more difficult. The solution pipeline consists of natural landmark detection and tracking, stereo-pair generation using constrained graph clustering, digital terrain map construction and safe landing zone detection. The main methodological contribution lies in mathematically formulating the epipolar constraint and then using it in a Minimum Spanning Tree (MST) based graph clustering approach. We have also made the AHL (Autonomous Helicopter Landing) dataset publicly available; it is a new aerial video dataset captured by a drone, with annotated ground truths. Experimental comparisons with other competing clustering methods, i) in terms of the Dunn Index and Davies-Bouldin Index and ii) for frame-level safe zone detection in terms of F-measure and confusion matrix, clearly demonstrate the effectiveness of the proposed formulation.
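
For readers unfamiliar with MST-based clustering, the generic idea can be sketched as follows. This is plain Kruskal-style clustering on Euclidean distances and deliberately leaves out the epipolar constraint that the paper contributes; the function name and the use of a fixed cluster count are assumptions.

```python
import math
from itertools import combinations

def mst_clusters(points, n_clusters):
    """Cluster points by building an MST with Kruskal's algorithm and
    stopping once n_clusters components remain (equivalent to cutting
    the n_clusters - 1 longest MST edges)."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    components = len(points)
    for _, i, j in edges:
        if components == n_clusters:
            break
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            components -= 1

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    relabel = {}
    return [relabel.setdefault(find(i), len(relabel)) for i in range(len(points))]
```

A constrained variant would change how edge weights are computed or which edges are admissible, while the merging loop stays the same.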

Vesselness Filters: A Survey with Benchmarks Applied to Liver Imaging

Jonas Lamy, Odyssée Merveille, Bertrand Kerautret, Nicolas Passat, Antoine Vacavant

Auto-TLDR; Comparison of Vessel Enhancement Filters for Liver Vascular Network Segmentation

Accurate knowledge of vascular network geometry is crucial for many clinical applications such as cardiovascular disease diagnosis and surgery planning. Vessel enhancement algorithms are often a key step to improve the robustness of vessel segmentation. A wide variety of enhancement filters exists in the literature, but they are often difficult to compare, as the applications and datasets differ from one paper to another and the code is rarely available. In this article, we compare seven vessel enhancement filters covering the last twenty years of literature in a single common framework. We focus our study on the liver vascular network, which is under-represented in the literature. The evaluation is made from three points of view: in the whole liver, in the vessel neighborhood and near the bifurcations. The study is performed on two publicly available datasets: the Ircad dataset (CT images) and the VascuSynth dataset adapted for MRI simulation. We discuss the strengths and weaknesses of each method in the hepatic context. In addition, the benchmark framework, including a C++ implementation of each compared method, is provided. An online demonstration ensures the reproducibility of the results without requiring any additional software.

Two-Stage Adaptive Object Scene Flow Using Hybrid CNN-CRF Model

Congcong Li, Haoyu Ma, Qingmin Liao

Auto-TLDR; Adaptive object scene flow estimation using a hybrid CNN-CRF model and adaptive iteration

Scene flow estimation based on stereo sequences is a comprehensive task relevant to disparity and optical flow. Some existing methods are time-consuming and often fail in the presence of reflective surfaces. In this paper, we propose a two-stage adaptive object scene flow estimation method using a hybrid CNN-CRF model (ACOSF), which benefits from high-quality features and structured modelling capability. To balance computational efficiency and accuracy, we employ adaptive iteration for energy function optimization, which is flexible and efficient for various scenes. In addition, we use high-quality pixel selection to reduce the computation time with only a slight decrease in accuracy. Our method achieves results competitive with the state of the art, ranking second on the challenging KITTI 2015 scene flow benchmark.

A Two-Step Approach to Lidar-Camera Calibration

Yingna Su, Yaqing Ding, Jian Yang, Hui Kong

Auto-TLDR; Closed-Form Calibration of Lidar-camera System for Ego-motion Estimation and Scene Understanding

Autonomous vehicles and robots are typically equipped with a Lidar and a camera. Hence, calibrating the Lidar-camera system is of extreme importance for ego-motion estimation and scene understanding. In this paper, we propose a two-step approach (coarse + fine) for the external calibration between a camera and a multiple-line Lidar. First, a new closed-form solution is proposed to obtain the initial calibration parameters. We compare our solution with the state-of-the-art SVD-based algorithm and show its benefits in both efficiency and stability. With the initial calibration parameters, the ICP-based calibration framework is used to register the point clouds extracted from the camera and Lidar coordinate frames, respectively. Our method has been applied to two Lidar-camera systems: an HDL-64E Lidar-camera system and a VLP-16 Lidar-camera system. Experimental results demonstrate that our method achieves promising performance and higher accuracy than two open-source methods.

Tensor Factorization of Brain Structural Graph for Unsupervised Classification in Multiple Sclerosis

Berardino Barile, Marzullo Aldo, Claudio Stamile, Françoise Durand-Dubief, Dominique Sappey-Marinier

Auto-TLDR; A Fully Automated Tensor-based Algorithm for Multiple Sclerosis Classification based on Structural Connectivity Graph of the White Matter Network

Analysis of longitudinal changes in brain diseases is essential for a better characterization of pathological processes and evaluation of the prognosis. This is particularly important in Multiple Sclerosis (MS) which is the first traumatic disease in young adults, with unknown etiology and characterized by complex inflammatory and degenerative processes leading to different clinical courses. In this work, we propose a fully automated tensor-based algorithm for the classification of MS clinical forms based on the structural connectivity graph of the white matter (WM) network. Using non-negative tensor factorization (NTF), we first focused on the detection of pathological patterns of the brain WM network affected by significant longitudinal variations. Second, we performed unsupervised classification of different MS phenotypes based on these longitudinal patterns, and finally, we used the latent factors obtained by the factorization algorithm to identify the most affected brain regions.

A Novel Computer-Aided Diagnostic System for Early Assessment of Hepatocellular Carcinoma

Ahmed Alksas, Mohamed Shehata, Gehad Saleh, Ahmed Shaffie, Ahmed Soliman, Mohammed Ghazal, Hadil Abukhalifeh, Abdel Razek Ahmed, Ayman El-Baz

Auto-TLDR; Classification of Liver Tumor Lesions from CE-MRI Using Structured Structural Features and Functional Features

Early assessment of liver cancer patients with hepatocellular carcinoma (HCC) is of immense importance to provide the proper treatment plan. In this paper, we have developed a two-stage classification computer-aided diagnostic (CAD) system that has the ability to detect and grade the liver observations from multiphase contrast-enhanced magnetic resonance imaging (CE-MRI). The proposed approach consists of three main steps. First, pre-processing is applied to the CE-MRI scans to delineate the tumor lesions that will be used as an ROI across the four different phases of the CE-MRI (namely, the pre-contrast, late-arterial, portal-venous, and delayed-contrast phases). Second, a group of three features is modeled to provide a quantitative discrimination between the tumor lesions, namely: i) the tumor appearance, modeled using a set of texture features (the first-order histogram, second-order gray-level co-occurrence matrix, and second-order gray-level run-length matrix) to capture any discrimination that may appear in the lesion texture; ii) the spherical harmonics (SH) based shape features, which can describe the shape complexity of the liver tumors; and iii) the functional features, which are based on calculating the wash-in/wash-out to evaluate the intensity changes across the post-contrast phases. Finally, the aforementioned individual features are integrated to obtain the combined features that are fed to a machine learning classifier to reach the final diagnostic decision. The proposed CAD system has been tested using hepatic observations that were obtained from 85 participating patients, 34 patients with benign tumors, 34 patients with intermediate tumors and 34 with malignant tumors. Using a random-forest-based classifier with leave-one-subject-out (LOSO) cross-validation, the developed CAD system achieved an 87.1% accuracy in distinguishing the malignant, intermediate and benign tumors. The classification performance is then evaluated using a k-fold (5/10-fold) cross-validation approach to examine the robustness of the system. The LR-1 lesions were distinguished from LR-2 benign lesions with 91.2% accuracy, while 85.3% accuracy was achieved in differentiating between LR-4 and LR-5 malignant tumors. The obtained results suggest that the proposed framework holds promise as a reliable noninvasive diagnostic tool for the early detection and grading of liver cancer tumors.

Learning to Implicitly Represent 3D Human Body from Multi-Scale Features and Multi-View Images

Zhongguo Li, Magnus Oskarsson, Anders Heyden

Auto-TLDR; Reconstruction of 3D human bodies from multi-view images using multi-stage end-to-end neural networks

Reconstructing 3D human bodies from images faces many challenges because it is generally an ill-posed problem. In this paper we present a method to reconstruct 3D human bodies from multi-view images by learning an implicit function to represent 3D shape, based on multi-scale features extracted by multi-stage end-to-end neural networks. Our model consists of several end-to-end hourglass networks for extracting multi-scale features from multi-view images, and a fully connected network for implicit function classification from these features. Given a 3D point, it is projected to multi-view images and these images are fed into our model to extract multi-scale features. The scales of features extracted by the hourglass networks decrease with the depth of our model, which represents the information from local to global scale. Then, the multi-scale features as well as the depth of the 3D point are combined into a new feature vector, and the fully connected network classifies the feature vector in order to predict whether the point lies inside or outside the 3D mesh. The advantage of our method is that we use both local and global features in the fully connected network and represent the 3D mesh by an implicit function, which is more memory-efficient. Experiments on public datasets demonstrate that our method surpasses previous approaches in terms of the accuracy of 3D reconstruction of human bodies from images.

Graph Approximations to Geodesics on Metric Graphs

Robin Vandaele, Yvan Saeys, Tijl De Bie

Auto-TLDR; Topological Pattern Recognition of Metric Graphs Using Proximity Graphs

In machine learning, high-dimensional point clouds are often assumed to be sampled from a topological space of which the intrinsic dimension is significantly lower than the representation dimension. Proximity graphs, such as the Rips graph or kNN graph, are often used as an intermediate representation to learn or visualize topological and geometrical properties of this space. The key idea behind this approach is that distances on the graph preserve the geodesic distances on the unknown space well, and as such, can be used to infer local and global geometric patterns of this space. Prior results provide us with conditions under which these distances are well-preserved for geodesically convex, smooth, compact manifolds. Yet, proximity graphs are ideal representations for a much broader class of spaces, such as metric graphs, i.e., graphs embedded in the Euclidean space. It turns out—as proven in this paper—that these existing conditions cannot be straightforwardly adapted to these spaces. In this work, we provide novel, flexible, and insightful characteristics and results for topological pattern recognition of metric graphs to bridge this gap.
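
The key idea that shortest-path distances on a proximity graph approximate geodesic distances can be illustrated on a toy example. The sketch below builds a symmetric kNN graph over points sampled from a unit half-circle and compares the graph distance between the two endpoints with the arc length. The sampling density, the value of k and all function names are assumptions, and the example says nothing about the metric-graph conditions studied in the paper.

```python
import heapq
import math

def knn_graph(points, k):
    """Symmetric kNN graph as an adjacency dict: node -> {neighbor: edge length}."""
    n = len(points)
    adj = {i: {} for i in range(n)}
    for i in range(n):
        nearest = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )[:k]
        for d, j in nearest:
            adj[i][j] = d
            adj[j][i] = d
    return adj

def graph_distance(adj, source, target):
    """Dijkstra shortest-path length between two nodes (inf if disconnected)."""
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            return d
        if d > best.get(u, math.inf):
            continue  # stale heap entry
        for v, w in adj[u].items():
            nd = d + w
            if nd < best.get(v, math.inf):
                best[v] = nd
                heapq.heappush(heap, (nd, v))
    return math.inf

# 101 points on a unit half-circle; the true geodesic between the ends is pi.
arc = [(math.cos(math.pi * i / 100), math.sin(math.pi * i / 100)) for i in range(101)]
approx = graph_distance(knn_graph(arc, 2), 0, 100)
straight = math.dist(arc[0], arc[100])  # Euclidean shortcut across the diameter
```

Here the graph distance follows the arc rather than cutting across it, which is the behaviour that makes proximity graphs useful as intermediate representations.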

Self-Supervised Detection and Pose Estimation of Logistical Objects in 3D Sensor Data

Nikolas Müller, Jonas Stenzel, Jian-Jia Chen

Auto-TLDR; A self-supervised and fully automated deep learning approach for object pose estimation using simulated 3D data

Localization of objects in cluttered scenes with machine learning methods is a fairly young research area. Despite the high potential of object localization for full process automation in Industry 4.0 and logistical environments, 3D data sets for training machine learning models for such applications are not openly available, and few publications exist on the topic. To the authors' knowledge, this is the first publication that describes a self-supervised and fully automated deep learning approach for object pose estimation using simulated 3D data. The solution covers the simulated generation of training data, the detection of objects in point clouds using a fully convolutional feedforward network and the computation of the pose for each detected object instance.

A Globally Optimal Method for the PnP Problem with MRP Rotation Parameterization

Manolis Lourakis, George Terzakis

Auto-TLDR; A Direct least squares, algebraic PnP solver with modified Rodrigues parameters

The perspective-n-point (PnP) problem is of fundamental importance in computer vision. A global optimality condition for PnP that is independent of a particular rotation parameterization was recently developed by Nakano. This paper puts forward a direct least squares, algebraic PnP solution that extends Nakano's work by combining his optimality condition with the modified Rodrigues parameters (MRPs) for parameterizing rotation. The result is a system of polynomials that is solved using the Groebner basis approach. An MRP vector has twice the rotational range of the classical Rodrigues (i.e., Cayley) vector used by Nakano to represent rotation. The proposed solver provides strong guarantees that the full rotation singularity associated with MRPs is avoided. Furthermore, detailed experiments provide evidence that our solver attains accuracy that is indistinguishable from Nakano's Cayley-based solution with a moderate increase in computational cost.
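
The statement about rotational range follows from the standard definitions: for a rotation by angle θ about a unit axis n, the classical Rodrigues (Cayley) vector is tan(θ/2) n, while the MRP vector is tan(θ/4) n. The sketch below computes both from a unit quaternion to show where each becomes singular. It illustrates the parameterizations only, not the proposed solver, and the function names are hypothetical.

```python
import math

def axis_angle_to_quaternion(axis, angle):
    """Unit quaternion (w, x, y, z) for a rotation of `angle` radians about `axis`."""
    norm = math.sqrt(sum(a * a for a in axis))
    s = math.sin(angle / 2.0) / norm
    return (math.cos(angle / 2.0), axis[0] * s, axis[1] * s, axis[2] * s)

def cayley_vector(q):
    """Classical Rodrigues vector tan(theta/2) * n; singular when w == 0 (180 degrees)."""
    w, x, y, z = q
    return (x / w, y / w, z / w)

def mrp_vector(q):
    """Modified Rodrigues parameters tan(theta/4) * n; singular when w == -1 (360 degrees)."""
    w, x, y, z = q
    d = 1.0 + w
    return (x / d, y / d, z / d)

# At a half turn the Cayley vector blows up while the MRP vector has unit length.
half_turn = axis_angle_to_quaternion((0.0, 0.0, 1.0), math.pi)
quarter_turn = axis_angle_to_quaternion((0.0, 0.0, 1.0), math.pi / 2.0)
```

Because the MRP singularity sits at a full turn rather than a half turn, rotations up to just under 360 degrees remain representable with finite parameters.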

PIF: Anomaly detection via preference embedding

Filippo Leveni, Luca Magri, Giacomo Boracchi, Cesare Alippi

Auto-TLDR; PIF: Anomaly Detection with Preference Embedding for Structured Patterns

We address the problem of detecting anomalies with respect to structured patterns. To this end, we conceive a novel anomaly detection method called PIF, which combines the advantages of adaptive isolation methods with the flexibility of preference embedding. Specifically, we propose to embed the data in a high-dimensional space where an efficient tree-based method, PI-FOREST, is employed to compute an anomaly score. Experiments on synthetic and real datasets demonstrate that PIF compares favorably with state-of-the-art anomaly detection techniques, and confirm that PI-FOREST is better at measuring arbitrary distances and isolating points in the preference space.

Joint Supervised and Self-Supervised Learning for 3D Real World Challenges

Antonio Alliegro, Davide Boscaini, Tatiana Tommasi

Auto-TLDR; Self-supervision for 3D Shape Classification and Segmentation in Point Clouds

Point cloud processing and 3D shape understanding are very challenging tasks for which deep learning techniques have demonstrated great potential. Still, further progress is essential to allow artificially intelligent agents to interact with the real world. In many practical conditions the amount of annotated data may be limited, and integrating new sources of knowledge becomes crucial to support autonomous learning. Here we consider several scenarios involving synthetic and real-world point clouds where supervised learning fails due to data scarcity and large domain gaps. We propose to enrich standard feature representations by leveraging self-supervision through a multi-task model that can solve a 3D puzzle while learning the main task of shape classification or part segmentation. An extensive analysis investigating few-shot, transfer learning and cross-domain settings shows the effectiveness of our approach, with state-of-the-art results for 3D shape classification and part segmentation.

Deep Space Probing for Point Cloud Analysis

Yirong Yang, Bin Fan, Yongcheng Liu, Hua Lin, Jiyong Zhang, Xin Liu, 蔡鑫宇, Shiming Xiang, Chunhong Pan

Auto-TLDR; SPCNN: Space Probing Convolutional Neural Network for Point Cloud Analysis

3D points are distributed irregularly in a continuous 3D space, so directly adapting 2D image convolution to 3D points is not an easy job. Previous works often artificially divide the space into regular grids, yet this could be suboptimal for learning geometry. In this paper, we propose SPCNN, namely, Space Probing Convolutional Neural Network, which naturally generalizes image CNN to deal with point clouds. The key idea of SPCNN is learning to probe the 3D space in an adaptive manner. Specifically, we define a pool of learnable convolutional weights, and let each point in the local region learn to choose a suitable convolutional weight from the pool. This is achieved by constructing a geometry-guided index-mapping function that implicitly establishes a correspondence between convolutional weights and some local regions in the neighborhood (Fig. 1). In this way, the index-mapping function learns to adaptively partition nearby space for local geometry pattern recognition. With this convolution as a basic operator, SPCNN, a hierarchical architecture, can be developed for effective point cloud analysis. Extensive experiments on challenging benchmarks across three tasks demonstrate that SPCNN achieves state-of-the-art or competitive performance.

NetCalib: A Novel Approach for LiDAR-Camera Auto-Calibration Based on Deep Learning

Shan Wu, Amnir Hadachi, Damien Vivet, Yadu Prabhakar

Auto-TLDR; Automatic Calibration of LiDAR and Cameras using Deep Neural Network

The fusion of LiDAR and cameras has been widely used in many robotics applications such as classification, segmentation, object detection, and autonomous driving. It is essential that the LiDAR sensor can measure distances accurately, which is a good complement to the cameras. Hence, calibrating sensors before deployment is a mandatory step. The conventional methods rely on checkerboards, specific patterns, or human labeling, which is tedious and labor-intensive if the same calibration process has to be repeated every time. The main purpose of this research work is to build a deep neural network that is capable of automatically finding the geometric transformation between LiDAR and cameras. The results show that our model manages to find the transformations from randomly sampled artificial errors. In addition, our work is open-sourced so that the community can fully utilize the methodology to develop the approach further and to initiate collaboration and innovation on the topic.

Map-Based Temporally Consistent Geolocalization through Learning Motion Trajectories

Bing Zha, Alper Yilmaz

Auto-TLDR; Exploiting Motion Trajectories for Geolocalization of Object on Topological Map using Recurrent Neural Network

In this paper, we propose a novel trajectory learning method that exploits motion trajectories on a topological map using a recurrent neural network for temporally consistent geolocalization of an object. Inspired by the human ability to be aware of both the distance and the direction of self-motion in navigation, our trajectory learning method learns a pattern representation of trajectories encoded as a sequence of distances and turning angles to assist self-localization. We pose the learning process as a conditional sequence prediction problem in which each output locates the object on a traversable edge in a map. Considering that the prediction sequence ought to be topologically connected in the graph-structured map, we adopt two different hypothesis generation and elimination strategies to eliminate disconnected sequence predictions. We demonstrate our approach on the KITTI stereo visual odometry dataset, which is a city-scale environment. The key benefits of our approach to geolocalization are that we 1) take advantage of the powerful sequence modeling ability of recurrent neural networks and their robustness to noisy input, 2) only require a map in the form of a graph and 3) simply use an affordable sensor that generates a motion trajectory. The experiments show that the motion trajectories can be learned by training a recurrent neural network, and that temporally consistent geolocation can be predicted with both of the proposed strategies.

Movement-Induced Priors for Deep Stereo

Yuxin Hou, Muhammad Kamran Janjua, Juho Kannala, Arno Solin

Auto-TLDR; Fusing Stereo Disparity Estimation with Movement-induced Prior Information

We propose a method for fusing stereo disparity estimation with movement-induced prior information. Instead of independent inference frame-by-frame, we formulate the problem as a non-parametric learning task in terms of a temporal Gaussian process prior with a movement-driven kernel for inter-frame reasoning. We present a hierarchy of three Gaussian process kernels depending on the availability of motion information, where our main focus is on a new gyroscope-driven kernel for handheld devices with low-quality MEMS sensors, thus also relaxing the requirement of having full 6D camera poses available. We show how our method can be combined with two state-of-the-art deep stereo methods. The method can either work in a plug-and-play fashion with pre-trained deep stereo networks or be further improved by jointly training the kernels together with encoder-decoder architectures, leading to consistent improvement.

Vehicle Classification from Profile Measures

Marco Patanè, Andrea Fusiello

Auto-TLDR; SliceNets: Convolutional Neural Networks for 3D Object Classification of Planar Slices

This paper proposes two novel convolutional neural networks for 3D object classification, tailored to process point clouds that are composed of planar slices (profiles). In particular, the application that we are targeting is the classification of vehicles by scanning them along planes perpendicular to the driving direction, within the context of Electronic Toll Collection. Depending on the sensor configuration, the distance between slices may or may not be measured, resulting in two types of point clouds, namely metric and non-metric. In the latter case, two coordinates are indeed metric but the third one is merely a temporal index. Our networks, named SliceNets, extract metric information from the spatial coordinates and neighborhood information from the third one (either metric or temporal), and are thus able to handle both types of point clouds. Experiments on two datasets collected in the field show the effectiveness of our networks in comparison with state-of-the-art ones.

Rotational Adjoint Methods for Learning-Free 3D Human Pose Estimation from IMU Data

Caterina Emilia Agelide Buizza, Yiannis Demiris

Auto-TLDR; Learning-free 3D Human Pose Estimation from Inertial Measurement Unit Data

We present a new framework for learning-free 3D human pose estimation from Inertial Measurement Unit (IMU) data. The proposed method does not rely on a full motion sequence to calculate a pose for any particular time point and thus can operate in real time. A cost function based only on joint rotations is used, removing the need for frequent transformations between rotations and 3D Cartesian coordinates. A Jacobian that preserves skeleton structure is derived using Adjoint methods from Variational Data Assimilation. To facilitate further research in IMU-based Motion Capture, we provide a dataset that combines RGB and depth images from an Intel RealSense camera, marker-based motion capture from an Optitrack system and Xsens IMU data. We have evaluated our method on both our dataset and the Total Capture dataset, showing an average error across 24 joints of 0.45 and 0.48 radians, respectively.
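
As an illustration of an error measured purely on joint rotations and reported in radians, here is a minimal sketch. It assumes each joint is given as a 3x3 rotation matrix and uses the standard geodesic angle between two rotations; the abstract does not specify the exact cost or error definition, so this is one plausible choice, and the function names are hypothetical.

```python
import numpy as np

def rotation_z(angle):
    # Helper for the example: rotation about the z axis by `angle` radians.
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def geodesic_angle(r_a, r_b):
    # Angle of the relative rotation r_a^T r_b, from trace = 1 + 2 cos(theta).
    rel = r_a.T @ r_b
    cos_theta = (np.trace(rel) - 1.0) / 2.0
    return float(np.arccos(np.clip(cos_theta, -1.0, 1.0)))

def mean_joint_error(predicted, target):
    # Average rotation error in radians over all joints (assumed metric).
    return float(np.mean([geodesic_angle(p, t) for p, t in zip(predicted, target)]))

predicted = [np.eye(3), np.eye(3)]
target = [rotation_z(0.2), rotation_z(0.4)]
error = mean_joint_error(predicted, target)
```

Working directly on rotations in this way avoids converting each pose to 3D joint positions before comparing it with the target.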

Cross-Regional Attention Network for Point Cloud Completion

Hang Wu, Yubin Miao

Auto-TLDR; Learning-based Point Cloud Repair with Graph Convolution

Point clouds obtained from real-world scanning are often incomplete and non-uniformly distributed, which causes structural losses in 3D shape representations. Therefore, a learning-based method is introduced in this paper to repair partial point clouds and restore the complete shapes of target objects. First, we design an encoder that takes both local features and global features into consideration. Second, we establish a graph to connect the local features together, and then implement graph convolution with multi-head attention on it. The graph enables each local feature vector to search across the regions and selectively absorb other local features based on its own features and the global features. Third, we design a coarse decoder to collect cross-region features from the graph and generate coarse point clouds with low resolution, and a folding-based decoder to generate fine point clouds with high resolution. Our network is trained on six categories of objects in the ModelNet dataset, and its performance is compared with several existing methods. The results show that our network is able to generate dense, complete point clouds with the highest accuracy.
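
The cross-region step, in which each local feature selectively absorbs other local features, can be sketched with plain scaled dot-product attention. This is a simplified, single-head illustration without learned projections, not the network described in the abstract; forming the query by adding the global feature to each local feature is an assumption made for the example, and `cross_region_attention` is a hypothetical name.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)

def cross_region_attention(local_feats, global_feat):
    # local_feats: (n_regions, d) array, global_feat: (d,) array.
    # Assumption: the query for each region is its local feature plus the
    # global feature; keys and values are the raw local features.
    queries = local_feats + global_feat
    d = local_feats.shape[1]
    scores = queries @ local_feats.T / np.sqrt(d)
    weights = softmax(scores, axis=1)  # each row sums to 1 over regions
    return weights @ local_feats, weights

rng = np.random.default_rng(0)
local_feats = rng.normal(size=(4, 8))   # 4 regions, 8-dimensional features
global_feat = local_feats.mean(axis=0)  # stand-in for a learned global feature
mixed, weights = cross_region_attention(local_feats, global_feat)
```

A multi-head variant would run several such attention maps on different learned projections of the features and concatenate the results.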