3D Pots Configuration System by Optimizing Over Geometric Constraints

Jae Eun Kim, Muhammad Zeeshan Arshad, Seong Jong Yoo, Je Hyeong Hong, Jinwook Kim, Young Min Kim

Auto-TLDR; Optimizing 3D Configurations for Stable Pottery Restoration from irregular and noisy evidence

While pottery is a common artifact excavated at archaeological sites, the restoration process relies on manually cleaning and reassembling shattered pieces. Since the number of possible 3D configurations is very large, exhaustive manual trials may abrade the fractured surfaces and can even fail to find the correct matches. As a result, many recent works suggest virtual reassembly from 3D scans of the fragments. The problem is challenging from the viewpoint of conventional 3D geometric analysis, as it is hard to extract reliable shape features from the thin break lines. We propose to optimize the global configuration by combining geometric constraints with information from noisy shape features. Specifically, we enforce bijection and continuity of the sequence of correspondences, given estimates of corners and pair-wise matching scores between multiple break lines. We demonstrate that our pipeline greatly increases the accuracy of correspondences, resulting in the stable restoration of 3D configurations from irregular and noisy evidence.
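To make the bijection constraint concrete, here is a minimal sketch, not the authors' optimization. It assumes a small, hypothetical matrix of pair-wise matching scores and searches all one-to-one assignments by brute force. The continuity constraint from the abstract is not modeled.

```python
from itertools import permutations

def best_bijection(scores):
    """Return the one-to-one assignment with the highest total score.

    scores[i][j] is a hypothetical matching score between segment i of one
    break line and segment j of another. Brute force: small inputs only.
    """
    n = len(scores)
    best_perm, best_total = None, float("-inf")
    for perm in permutations(range(n)):
        total = sum(scores[i][perm[i]] for i in range(n))
        if total > best_total:
            best_perm, best_total = perm, total
    return list(best_perm), best_total

# Illustrative scores (made up): rows 0 and 1 both prefer column 0.
scores = [
    [0.90, 0.80, 0.10],
    [0.85, 0.20, 0.10],
    [0.10, 0.30, 0.70],
]

# Picking the best column per row independently is not a bijection here.
greedy = [max(range(3), key=lambda j: row[j]) for row in scores]

# Enforcing a bijection resolves the conflict globally.
assignment, total = best_bijection(scores)
```

In this toy case the independent per-row choice maps two rows to the same column, while the constrained search assigns each column exactly once.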

Similar papers

Computing Stable Resultant-Based Minimal Solvers by Hiding a Variable

Snehal Bhayani, Zuzana Kukelova, Janne Heikkilä

Auto-TLDR; Sparse Resultant-Based Method for Solving Minimal Systems of Polynomial Equations

Many computer vision applications require robust and efficient estimation of camera geometry. The robust estimation is usually based on solving camera geometry problems from a minimal number of input data measurements, i.e., solving minimal problems, in a RANSAC-style framework. Minimal problems often result in complex systems of polynomial equations. The existing state-of-the-art methods for solving such systems are based either on Groebner bases and the action matrix method, which have been extensively studied and optimized in recent years, or on a recently proposed approach that computes a resultant using an extra variable. In this paper, we study an interesting alternative resultant-based method for solving sparse systems of polynomial equations by hiding one variable. This approach results in a larger eigenvalue problem than the action matrix and extra variable resultant-based methods; however, it does not need to invert or eliminate large matrices, operations that may be numerically unstable. The proposed approach includes several improvements to the standard sparse resultant algorithms, which significantly improve the efficiency and stability of the hidden variable resultant-based solvers, as we demonstrate on several interesting computer vision problems. We show that for the studied problems, our sparse resultant-based approach leads to more stable solvers than the state-of-the-art Groebner basis solvers as well as existing resultant-based solvers, especially in close-to-critical configurations. Our new method can be fully automated and incorporated into existing tools for the automatic generation of efficient minimal solvers.

Motion Segmentation with Pairwise Matches and Unknown Number of Motions

Federica Arrigoni, Tomas Pajdla, Luca Magri

Auto-TLDR; Motion Segmentation Using Multi-Model Fitting and Permutation Synchronization

In this paper we address motion segmentation, that is, the problem of clustering points in multiple images according to a number of moving objects. Two-frame correspondences are assumed as input, without prior knowledge about trajectories. Our method is based on principles from "multi-model fitting" and "permutation synchronization" and, unlike previous techniques working under the same assumptions, it can handle an unknown number of motions. The proposed approach is validated on standard datasets, showing that it can correctly estimate the number of motions while maintaining comparable or better accuracy than the state of the art.

Total Estimation from RGB Video: On-Line Camera Self-Calibration, Non-Rigid Shape and Motion

Antonio Agudo

Auto-TLDR; Joint Auto-Calibration, Pose and 3D Reconstruction of a Non-rigid Object from an uncalibrated RGB Image Sequence

In this paper we present a sequential approach to jointly retrieve camera auto-calibration, camera pose and the 3D reconstruction of a non-rigid object from an uncalibrated RGB image sequence, without assuming any prior information about the shape structure, without a calibration pattern, and without any training data. To this end, we propose a Bayesian filtering approach based on a sum-of-Gaussians filter composed of a bank of extended Kalman filters (EKF). For every EKF, we make use of dynamic models to estimate its state vector; these estimates are later combined within the sum-of-Gaussians filter to achieve a global solution. To deal with deformable objects, we incorporate a mechanical model solved by using the finite element method. Thanks to these ingredients, the resulting method is both efficient and robust to several artifacts such as missing and noisy observations as well as sudden camera motions, while being applicable to a wide variety of objects and materials, including isometric and elastic shape deformations. We validate the method in real experiments, showing its strengths with respect to competing approaches.

A Plane-Based Approach for Indoor Point Clouds Registration

Ketty Favre, Muriel Pressigout, Luce Morin, Eric Marchand

Auto-TLDR; A plane-based registration approach for indoor environments based on LiDAR data

Iterative Closest Point (ICP) is one of the most widely used algorithms for 3D point cloud registration. This classical approach can be slowed down by the large number of points contained in a point cloud. Planar structures, which are less numerous than points, can be used instead in well-structured man-made environments. In this paper we propose an ICP-inspired, plane-based registration method for indoor environments. This method is based solely on data acquired with a LiDAR sensor. A new metric based on plane characteristics is introduced to find the best plane correspondences. The optimal transformation is estimated through a two-step minimization approach, successively performing robust plane-to-plane minimization and non-linear robust point-to-plane registration. Experiments on the Autonomous Systems Lab (ASL) dataset show that the proposed method successfully registers 100% of the scans from the three indoor sequences. Experiments also show that the proposed method is more robust in large motion scenarios than other state-of-the-art algorithms.
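As an illustration of the quantity a point-to-plane step works with, the sketch below evaluates signed point-to-plane distances and their mean squared value. It is only the plain, non-robust cost under assumed inputs (a unit normal and a plane offset). The paper's plane metric, robust weighting and minimization are not reproduced.

```python
def point_to_plane_residuals(points, normal, offset):
    """Signed distances of 3D points to the plane {x : normal . x + offset = 0}.

    Assumes `normal` has unit length, so the value is a metric distance.
    """
    return [
        sum(n * p for n, p in zip(normal, point)) + offset
        for point in points
    ]

def mean_squared_residual(points, normal, offset):
    """Plain (non-robust) point-to-plane cost for a single plane."""
    residuals = point_to_plane_residuals(points, normal, offset)
    return sum(r * r for r in residuals) / len(residuals)

# Hypothetical example: the plane z = 1 and three sample points.
plane_normal = (0.0, 0.0, 1.0)
plane_offset = -1.0
sample_points = [(0.0, 0.0, 1.0), (1.0, 2.0, 1.5), (3.0, -1.0, 0.0)]

residuals = point_to_plane_residuals(sample_points, plane_normal, plane_offset)
cost = mean_squared_residual(sample_points, plane_normal, plane_offset)
```

A registration loop would search for the rigid transformation of the points that lowers this kind of cost, typically with a robust loss in place of the plain squares.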

A Two-Step Approach to Lidar-Camera Calibration

Yingna Su, Yaqing Ding, Jian Yang, Hui Kong

Auto-TLDR; Closed-Form Calibration of Lidar-camera System for Ego-motion Estimation and Scene Understanding

Autonomous vehicles and robots are typically equipped with a Lidar and a camera. Hence, calibrating the Lidar-camera system is of extreme importance for ego-motion estimation and scene understanding. In this paper, we propose a two-step approach (coarse + fine) for the external calibration between a camera and a multiple-line Lidar. First, a new closed-form solution is proposed to obtain the initial calibration parameters. We compare our solution with the state-of-the-art SVD-based algorithm and show benefits in both efficiency and stability. With the initial calibration parameters, an ICP-based calibration framework is used to register the point clouds extracted from the camera and Lidar coordinate frames, respectively. Our method has been applied to two Lidar-camera systems: an HDL-64E Lidar-camera system and a VLP-16 Lidar-camera system. Experimental results demonstrate that our method achieves promising performance and higher accuracy than two open-source methods.

Anime Sketch Colorization by Component-Based Matching Using Deep Appearance Features and Graph Representation

Thien Do, Pham Van, Anh Nguyen, Trung Dang, Quoc Nguyen, Bach Hoang, Giao Nguyen

Auto-TLDR; Combining Deep Learning and Graph Representation for Sketch Colorization

Sketch colorization is usually expensive and time-consuming for artists, and automating this process can have many pragmatic applications in the animation, comic book, and video game industries. However, automatic image colorization faces many challenges, because sketches not only lack texture information but also potentially entail complicated objects that require acute coloring. These difficulties usually result in incorrect color assignments that can ruin the aesthetic appeal of the final output. In this paper, we present a novel component-based matching framework that combines deep learned features and quadratic programming with a new cost function to solve this colorization problem. The proposed framework takes as input a character's sketches as well as a colored image in the same cut of a movie, and outputs a high-quality sequence of colorized frames based on the color assignment in the reference colored image. To carry out this colorization task, we first use a pretrained ResNet-34 model to extract features of elementary components and match certain pairs of components (one component from the sketch and one from the reference). Next, a graph representation is constructed in order to process and match the remaining components that could not be matched in the first step. Since the first step has reduced the number of components to be matched by the graph, we can solve this graph problem in a short computing time even when there are hundreds of different components present in each sketch. We demonstrate the effectiveness of the proposed solution by conducting comprehensive experiments and producing aesthetically pleasing results. To the best of our knowledge, our framework is the first work that combines deep learning and graph representation to colorize anime and achieves a high pixel-level accuracy at a reasonable time cost.

Generic Merging of Structure from Motion Maps with a Low Memory Footprint

Gabrielle Flood, David Gillsjö, Patrik Persson, Anders Heyden, Kalle Åström

Auto-TLDR; A Low-Memory Footprint Representation for Robust Map Merge

With the development of cheap image sensors, the amount of available image data has increased enormously, and the possibility of using crowdsourced collection methods has emerged. This calls for the development of ways to handle all these data. In this paper, we present new tools that will enable efficient, flexible and robust map merging. Assuming that separate optimisations have been performed for the individual maps, we show how only relevant data can be stored in a low memory footprint representation. We use these representations to perform map merging so that the algorithm is invariant to the merging order and independent of the choice of coordinate system. The result is a robust algorithm that can be applied to several maps simultaneously. The result of a merge can also be represented with the same type of low memory footprint format, which enables further merging and updating of the map in a hierarchical way. Furthermore, the method can perform loop closing and also detect changes in the scene between the capture of the different image sequences. Using both simulated and real data (from both a hand-held mobile phone and a drone), we verify the performance of the proposed method.

A Globally Optimal Method for the PnP Problem with MRP Rotation Parameterization

Manolis Lourakis, George Terzakis

Auto-TLDR; A Direct least squares, algebraic PnP solver with modified Rodrigues parameters

The perspective-n-point (PnP) problem is of fundamental importance in computer vision. A global optimality condition for PnP that is independent of a particular rotation parameterization was recently developed by Nakano. This paper puts forward a direct least squares, algebraic PnP solution that extends Nakano's work by combining his optimality condition with the modified Rodrigues parameters (MRPs) for parameterizing rotation. The result is a system of polynomials that is solved using the Groebner basis approach. An MRP vector has twice the rotational range of the classical Rodrigues (i.e., Cayley) vector used by Nakano to represent rotation. The proposed solver provides strong guarantees that the full rotation singularity associated with MRPs is avoided. Furthermore, detailed experiments provide evidence that our solver attains accuracy that is indistinguishable from Nakano's Cayley-based solution with a moderate increase in computational cost.
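The claim about rotational range can be checked numerically. The sketch below uses the standard definitions: the classical Rodrigues (Cayley) vector is the rotation axis scaled by tan(θ/2), and the MRP vector is the axis scaled by tan(θ/4). The function names are illustrative and the code is not the solver described in the abstract.

```python
import math

def cayley_vector(axis, angle):
    """Classical Rodrigues (Cayley) vector: unit axis scaled by tan(angle / 2).

    Grows without bound as the angle approaches 180 degrees.
    """
    scale = math.tan(angle / 2.0)
    return [scale * a for a in axis]

def mrp_vector(axis, angle):
    """Modified Rodrigues parameters: unit axis scaled by tan(angle / 4).

    Stays finite up to (but not including) a full 360 degree rotation.
    """
    scale = math.tan(angle / 4.0)
    return [scale * a for a in axis]

# A half turn (180 degrees) about the z axis.
z_axis = (0.0, 0.0, 1.0)
half_turn = math.pi

mrp = mrp_vector(z_axis, half_turn)        # magnitude close to 1
cayley = cayley_vector(z_axis, half_turn)  # magnitude numerically enormous
```

At a half turn the MRP vector has roughly unit length, while the Cayley vector is already at its singularity and evaluates to a huge floating-point value.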

Minimal Solvers for Indoor UAV Positioning

Marcus Valtonen Örnhag, Patrik Persson, Mårten Wadenbäck, Kalle Åström, Anders Heyden

Auto-TLDR; Relative Pose Solvers for Visual Indoor UAV Navigation

In this paper we consider a collection of relative pose problems which arise naturally in applications for visual indoor UAV navigation. We focus on cases where additional information from an onboard IMU is available and thus provides a partial extrinsic calibration through the gravitational vector. The solvers are designed for a partially calibrated camera, for a variety of realistic indoor scenarios, which makes it possible to navigate using images of the ground floor. Current state-of-the-art solvers use more general assumptions, such as using arbitrary planar structures; however, these solvers do not yield adequate reconstructions for real scenes, nor do they perform fast enough to be incorporated in real-time systems. We show that the proposed solvers enjoy better numerical stability, are faster, and require fewer point correspondences, compared to state-of-the-art solvers. These properties are vital components for robust navigation in real-time systems, and we demonstrate on both synthetic and real data that our method outperforms other methods, and yields superior motion estimation.

Learning Embeddings for Image Clustering: An Empirical Study of Triplet Loss Approaches

Kalun Ho, Janis Keuper, Franz-Josef Pfreundt, Margret Keuper

Auto-TLDR; Clustering Objectives for K-means and Correlation Clustering Using Triplet Loss

In this work, we evaluate two different image clustering objectives, k-means clustering and correlation clustering, in the context of Triplet Loss induced feature space embeddings. Specifically, we train a convolutional neural network to learn discriminative features by optimizing two popular versions of the Triplet Loss in order to study their clustering properties under the assumption of noisy labels. Additionally, we propose a new, simple Triplet Loss formulation, which shows desirable properties with respect to formal clustering objectives and outperforms the existing methods. We evaluate all three Triplet loss formulations for K-means and correlation clustering on the CIFAR-10 image classification dataset.
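For reference, the widely used margin-based form of the Triplet Loss can be sketched as follows. This is the textbook hinge formulation with Euclidean distance. It is not necessarily either of the two variants studied in the paper, nor the new formulation it proposes. The `margin` value is an arbitrary choice.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge triplet loss: max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the negative is farther from the anchor than the
    positive by at least `margin`.
    """
    return max(
        0.0,
        euclidean(anchor, positive) - euclidean(anchor, negative) + margin,
    )

anchor = (0.0, 0.0)
positive = (0.0, 1.0)        # distance 1 from the anchor

easy_negative = (3.0, 4.0)   # distance 5: already well separated
hard_negative = (0.0, 1.5)   # distance 1.5: violates the margin

easy_loss = triplet_loss(anchor, positive, easy_negative)
hard_loss = triplet_loss(anchor, positive, hard_negative)
```

Only the hard negative contributes a non-zero loss, which is why the way triplets are selected matters for the embedding that results.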

Generic Document Image Dewarping by Probabilistic Discretization of Vanishing Points

Gilles Simon, Salvatore Tabbone

Auto-TLDR; Robust Document Dewarping using vanishing points

Document image dewarping is still a challenge, especially when documents are captured with one camera in an uncontrolled environment. In this paper we propose a generic approach based on vanishing points (VPs) to reconstruct the 3D shape of document pages. Unlike previous methods, we do not need to segment the text included in the documents. Therefore, our approach is less sensitive to pre-processing and segmentation errors. The computation of the VPs is robust and relies on the a-contrario framework, which has only one parameter whose setting is based on probabilistic reasoning instead of experimental tuning. Thus, our method can be applied to any kind of document, including text and non-text blocks, and extended to other kinds of images. Experimental results show that the proposed method is robust to a variety of distortions.

A New Geodesic-Based Feature for Characterization of 3D Shapes: Application to Soft Tissue Organ Temporal Deformations

Karim Makki, Amine Bohi, Augustin Ogier, Marc-Emmanuel Bellemare

Auto-TLDR; Spatio-Temporal Feature Descriptors for 3D Shape Characterization from Point Clouds

Spatio-temporal feature descriptors are of great importance for characterizing the local changes of 3D deformable shapes. In this study, we propose a method for characterizing 3D shapes from point clouds and we show a direct application to a study of organ temporal deformations. As an example, we characterize the behavior of the bladder during forced respiratory motion with a reduced number of 3D surface points. First, a set of equidistant points representing the vertices of a quadrilateral mesh of the organ surface is tracked throughout a long dynamic MRI sequence using a large deformation diffeomorphic metric mapping (LDDMM) framework. Second, a novel 3D shape descriptor invariant to translation, scale and rotation is proposed for characterizing the temporal organ deformations by employing an Eulerian Partial Differential Equations (PDEs) methodology. We demonstrate the robustness of our feature on both synthetic 3D shapes and realistic dynamic Magnetic Resonance Imaging (MRI) data sequences portraying the bladder deformation during a forced breathing exercise. Promising results are obtained, showing that the proposed feature may be useful for several computer vision applications such as medical imaging, aerodynamics and robotics.

Recovery of 2D and 3D Layout Information through an Advanced Image Stitching Algorithm Using Scanning Electron Microscope Images

Aayush Singla, Bernhard Lippmann, Helmut Graeb

Auto-TLDR; Image Stitching for True Geometrical Layout Recovery in Nanoscale Dimension

Image stitching is the process of reconstructing a high-resolution image by combining multiple images. With a scanning electron microscope as the image source, individual images show patterns at the nanometer scale, whereas the combined image may cover an area of several mm². The recovery of the physical layout of modern semiconductor products manufactured in advanced technology nodes down to 22 nm requires a perfect stitching process with no deviation from the original design data, as any stitching error will result in failures during the reconstruction of the electrical design. In addition, the recovery of the complete design requires the acquisition of all individual layers of a semiconductor device, which represent a 3D structure with interconnections defining limits on the stitching error for each individual scanned image mosaic. We present an advanced stitching and alignment process that enables a true geometrical layout recovery in nanoscale dimensions, and we also apply and evaluate it on other use cases from biological applications.

Efficient Game-Theoretic Hypergraph Matching

Jian Hou, Nai-Ming Qi

Auto-TLDR; Hypergraph Matching with Game-Theoretic Clustering

Feature matching is a fundamental problem in computer vision. Compared with graph matching, hypergraph matching is able to encode more invariance between correspondences. Unlike the majority of existing hypergraph matching algorithms, a game-theoretic algorithm has been developed by transforming hypergraph matching into hypergraph clustering, which is then solved within the framework of a non-cooperative multi-player clustering game. This algorithm obtains the final matches as a cluster of consistent candidate matches and has high accuracy and robustness to outliers in comparison with its competitors. However, in further work we find that this algorithm tends to generate a small number of matches, and the number of matches can only be increased at the cost of a huge computational load. Our investigation of the algorithm shows that it requires high internal similarity within a cluster, and therefore generates small clusters of high density. This motivates us to expand the cluster so that more candidate matches are accepted as final matches. For this purpose, we define the density of vertices in a hypergraph and expand the cluster based on the relative density relationship among the vertices. In matching experiments with both synthetic and real datasets, our algorithm is shown to generate the same number of matches or more, with much less running time than the original algorithm. Meanwhile, it preserves the advantage of high accuracy and robustness to outliers in comparison with some competitors.

Multi-Scale Keypoint Matching

Sina Lotfian, Hassan Foroosh

Auto-TLDR; Multi-Scale Keypoint Matching Using Multi-Scale Information

We propose a new hierarchical method to match keypoints by exploiting information across multiple scales. Traditionally, a single scale is detected for each keypoint and the matching is done at that specific scale. We replace this approach with matching across scale-space. The holistic information from higher scales is used for early rejection of candidates that are far away in the feature space. The more localized and finer details of lower scales are then used to decide between the remaining candidate points. The proposed multi-scale solution is more consistent with the multi-scale processing that is present in the human visual system and is therefore biologically plausible. We evaluate our method on several datasets and achieve state-of-the-art accuracy, while significantly outperforming others in extraction time.

Edge-Aware Monocular Dense Depth Estimation with Morphology

Zhi Li, Xiaoyang Zhu, Haitao Yu, Qi Zhang, Yongshi Jiang

Auto-TLDR; Spatio-Temporally Smooth Dense Depth Maps Using Only a CPU

Dense depth maps play an important role in computer vision (CV) and augmented reality (AR). For CV applications, a dense depth map is the cornerstone of 3D reconstruction, allowing real objects to be precisely displayed in the computer. In AR, dense depth maps make it possible to handle occlusion relationships between virtual content and real objects correctly, for a better user experience. However, the heavy computation involved limits the practical use of dense depth maps. We present a novel algorithm that produces low-latency, spatio-temporally smooth dense depth maps using only a CPU. The depth maps exhibit sharp discontinuities at depth edges while keeping the computational complexity low. Our algorithm obtains the sparse SLAM reconstruction first, then extracts coarse depth edges from a down-sampled RGB image by morphology operations. Next, we thin the depth edges and align them with image edges. Finally, a warm-start initialization scheme and an improved optimization solver are adopted to accelerate convergence. We evaluate our proposal quantitatively, and the results show improved depth map accuracy with respect to other state-of-the-art and baseline techniques.

Distinctive 3D Local Deep Descriptors

Fabio Poiesi, Davide Boscaini

Auto-TLDR; DIPs: Local Deep Descriptors for Point Cloud Registration

We present a simple yet effective method for learning distinctive 3D local deep descriptors (DIPs) that can be used to register point clouds without requiring an initial alignment. Point cloud patches are extracted, canonicalised with respect to their estimated local reference frame and encoded into rotation-invariant compact descriptors by a PointNet-based deep neural network. DIPs can effectively generalise across different sensor modalities because they are learnt end-to-end from locally and randomly sampled points. Moreover, because DIPs encode only local geometric information, they are robust to clutter, occlusions and missing regions. We evaluate and compare DIPs against alternative hand-crafted and deep descriptors on several indoor and outdoor datasets reconstructed using different sensors. Results show that DIPs (i) achieve comparable results to the state of the art on RGB-D indoor scenes (3DMatch dataset), (ii) outperform the state of the art by a large margin on laser-scanner outdoor scenes (ETH dataset), and (iii) generalise to indoor scenes reconstructed with the Visual-SLAM system of Android ARCore.

Camera Calibration Using Parallel Line Segments

Gaku Nakano

Auto-TLDR; Closed-Form Calibration of Surveillance Cameras using Parallel 3D Line Segment Projections

This paper proposes a camera calibration method suitable for surveillance cameras using the image projection of parallel 3D line segments of the same length. We assume that vertical line segments are perpendicular to the ground plane and their bottom end-points are on the ground plane. Under this assumption, the camera parameters can be directly solved by at least two line segments without estimating vanishing points. Extending the minimal solution, we derive a closed-form solution to the least squares case with more than two line segments. Lens distortion is jointly optimized in bundle adjustment. Synthetic data evaluation shows that the best depression angle of a camera is around 50 degrees. In real data evaluation, we use body joints of pedestrians as vertical line segments. The experimental results on publicly available datasets show that the proposed method with a human pose detector can correctly calibrate wide-angle cameras including radial distortion.

Better Prior Knowledge Improves Human-Pose-Based Extrinsic Camera Calibration

Olivier Moliner, Sangxia Huang, Kalle Åström

Auto-TLDR; Improving Human-pose-based Extrinsic Calibration for Multi-Camera Systems

Accurate extrinsic calibration of wide baseline multi-camera systems enables better understanding of 3D scenes for many applications and is of great practical importance. Classical Structure-from-Motion calibration methods require special calibration equipment so that accurate point correspondences can be detected between different views. In addition, an operator with some training is usually needed to ensure that data is collected in a way that leads to good calibration accuracy. This limits the ease of adoption of such technologies. Recently, methods have been proposed to use human pose estimation models to establish point correspondences, thus removing the need for any special equipment. The challenge with this approach is that human pose estimation algorithms typically produce much less accurate feature points compared to classical patch-based methods. Another problem is that ambient human motion might not be optimal for calibration. We build upon prior works and introduce several novel ideas to improve the accuracy of human-pose-based extrinsic calibration. Our first contribution is a robust reprojection loss based on a better understanding of the sources of pose estimation error. Our second contribution is a 3D human pose likelihood model learned from motion capture data. We demonstrate significant improvements in calibration accuracy by evaluating our method on four publicly available datasets.

Learning to Find Good Correspondences of Multiple Objects

Youye Xie, Yingheng Tang, Gongguo Tang, William Hoff

Auto-TLDR; Multi-Object Inliers and Outliers for Perspective-n-Point and Object Recognition

Given a set of 3D to 2D putative matches, labeling the correspondences as inliers or outliers plays a critical role in a wide range of computer vision applications including the Perspective-n-Point (PnP) and object recognition. In this paper, we study a more generalized problem which allows the matches to belong to multiple objects with distinct poses. We propose a deep architecture to simultaneously label the correspondences as inliers or outliers and classify the inliers into multiple objects. Specifically, we discretize the 3D rotation space into twenty convex cones based on the facets of a regular icosahedron. For each facet, a facet classifier is trained to predict the probability of a correspondence being an inlier for a pose whose rotation normal vector points towards this facet. An efficient RANSAC-based post-processing algorithm is also proposed to further process the prediction results and detect the objects. Experiments demonstrate that our method is very efficient compared to existing methods and is capable of simultaneously labeling and classifying the inliers of multiple objects with high precision.

Unconstrained Vision Guided UAV Based Safe Helicopter Landing

Arindam Sikdar, Abhimanyu Sahu, Debajit Sen, Rohit Mahajan, Ananda Chowdhury

Auto-TLDR; Autonomous Helicopter Landing in Hazardous Environments from Unmanned Aerial Images Using Constrained Graph Clustering

In this paper, we address the problem of automated detection of safe zone(s) for helicopter landing in hazardous environments from images captured by an Unmanned Aerial Vehicle (UAV). The unconstrained motion of the image-capturing drone (the UAV in our case) makes the problem even more difficult. The solution pipeline consists of natural landmark detection and tracking, stereo-pair generation using constrained graph clustering, digital terrain map construction and safe landing zone detection. The main methodological contribution lies in mathematically formulating the epipolar constraint and then using it in a Minimum Spanning Tree (MST) based graph clustering approach. We have also made the AHL (Autonomous Helicopter Landing) dataset publicly available; it is a new aerial video dataset captured by a drone, with annotated ground truths. Experimental comparisons with other competing clustering methods, (i) in terms of the Dunn Index and the Davies-Bouldin Index and (ii) for frame-level safe zone detection in terms of F-measure and confusion matrix, clearly demonstrate the effectiveness of the proposed formulation.
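MST-based graph clustering in its generic form can be sketched as below: Kruskal's algorithm builds the spanning tree from the lightest edges and stops once the requested number of components remains, which is equivalent to cutting the heaviest MST edges. The edge weights here are hypothetical. The epipolar constraint that the paper builds into its formulation is not modeled.

```python
def mst_clusters(num_nodes, edges, num_clusters):
    """Cluster nodes by building an MST with Kruskal's algorithm and
    stopping early, so the heaviest MST edges are never added.

    edges: iterable of (weight, u, v) tuples with 0 <= u, v < num_nodes.
    Returns a sorted list of clusters, each a sorted list of node ids.
    """
    parent = list(range(num_nodes))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    components = num_nodes
    for _weight, u, v in sorted(edges):
        if components == num_clusters:
            break
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v
            components -= 1

    groups = {}
    for node in range(num_nodes):
        groups.setdefault(find(node), []).append(node)
    return sorted(groups.values())

# Hypothetical graph: two tight groups joined by heavy edges.
example_edges = [
    (1.0, 0, 1),
    (2.0, 1, 2),
    (9.0, 2, 3),
    (1.5, 3, 4),
    (10.0, 0, 4),
]
clusters = mst_clusters(5, example_edges, num_clusters=2)
```

With two clusters requested, the heavy edges between the groups are never used, so nodes 0 to 2 and nodes 3 to 4 end up in separate clusters.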

RISEdb: A Novel Indoor Localization Dataset

Carlos Sanchez Belenguer, Erik Wolfart, Álvaro Casado Coscollá, Vitor Sequeira

Auto-TLDR; Indoor Localization Using LiDAR SLAM and Smartphones: A Benchmarking Dataset

In this paper we introduce a novel public dataset for developing and benchmarking indoor localization systems. We have selected and 3D mapped a set of representative indoor environments including a large office building, a conference room, a workshop, an exhibition area and a restaurant. Our acquisition pipeline is based on a portable LiDAR SLAM backpack to map the buildings and to accurately track the pose of the user as they move freely inside them. We introduce the calibration procedures that enable us to acquire and geo-reference live data coming from different independent sensors rigidly attached to the backpack. This has allowed us to collect long sequences of spherical and stereo images, together with all the sensor readings coming from a consumer smartphone, and to locate them inside the map with centimetre accuracy. The dataset addresses many of the limitations of existing indoor localization datasets regarding the scale and diversity of the mapped buildings; the number of acquired sequences under varying conditions; the accuracy of the ground-truth trajectory; the availability of a detailed 3D model; and the availability of different sensor types. It enables the benchmarking of existing indoor localization approaches and the development of new ones, in particular deep learning based systems that require large amounts of labeled training data.

Quantization in Relative Gradient Angle Domain for Building Polygon Estimation

Yuhao Chen, Yifan Wu, Linlin Xu, Alexander Wong

Auto-TLDR; Relative Gradient Angle Transform for Building Footprint Extraction from Remote Sensing Data

Building footprint extraction in remote sensing data benefits many important applications, such as urban planning and population estimation. Recently, rapid development of Convolutional Neural Networks (CNNs) and open-sourced high resolution satellite building image datasets have pushed the performance boundary further for automated building extractions. However, CNN approaches often generate imprecise building morphologies including noisy edges and round corners. In this paper, we leverage the performance of CNNs, and propose a module that uses prior knowledge of building corners to create angular and concise building polygons from CNN segmentation outputs. We describe a new transform, Relative Gradient Angle Transform (RGA Transform) that converts object contours from time vs. space to time vs. angle. We propose a new shape descriptor, Boundary Orientation Relation Set (BORS), to describe angle relationship between edges in RGA domain, such as orthogonality and parallelism. Finally, we develop an energy minimization framework that makes use of the angle relationship in BORS to straighten edges and reconstruct sharp corners, and the resulting corners create a polygon. Experimental results demonstrate that our method refines CNN output from a rounded approximation to a more clear-cut angular shape of the building footprint.

Deep Space Probing for Point Cloud Analysis

Yirong Yang, Bin Fan, Yongcheng Liu, Hua Lin, Jiyong Zhang, Xin Liu, 蔡鑫宇, Shiming Xiang, Chunhong Pan

Auto-TLDR; SPCNN: Space Probing Convolutional Neural Network for Point Cloud Analysis

3D points are distributed irregularly in a continuous 3D space, so directly adapting 2D image convolution to 3D points is not straightforward. Previous works often artificially divide the space into regular grids, which can be suboptimal for learning geometry. In this paper, we propose SPCNN, namely, Space Probing Convolutional Neural Network, which naturally generalizes image CNN to deal with point clouds. The key idea of SPCNN is learning to probe the 3D space in an adaptive manner. Specifically, we define a pool of learnable convolutional weights, and let each point in the local region learn to choose a suitable convolutional weight from the pool. This is achieved by constructing a geometry guided index-mapping function that implicitly establishes a correspondence between convolutional weights and some local regions in the neighborhood (Fig. 1). In this way, the index-mapping function learns to adaptively partition nearby space for local geometry pattern recognition. With this convolution as a basic operator, a hierarchical architecture, SPCNN, can be developed for effective point cloud analysis. Extensive experiments on challenging benchmarks across three tasks demonstrate that SPCNN achieves state-of-the-art or competitive performance.

Sketch-Based Community Detection Via Representative Node Sampling

Mahlagha Sedghi, Andre Beckus, George Atia

Auto-TLDR; Sketch-based Clustering of Community Detection Using a Small Sketch

This paper proposes a sketch-based approach to the community detection problem which clusters the full graph through the use of an informative and concise sketch. The reduced sketch is built through an effective sampling approach which selects a few nodes that best represent the complete graph and operates on a pairwise node similarity measure based on the average commute time. After sampling, the proposed algorithm clusters the nodes in the sketch, and then infers the cluster membership of the remaining nodes in the full graph based on their aggregate similarity to nodes in the partitioned sketch. By sampling nodes with strong representation power, our approach can improve the success rates over full graph clustering. In challenging cases with large node degree variation, our approach not only maintains competitive accuracy with full graph clustering despite using a small sketch, but also outperforms existing sampling methods. The use of a small sketch allows considerable storage savings, and computational and timing improvements for further analysis such as clustering and visualization. We provide numerical results on synthetic data based on the homogeneous, heterogeneous and degree corrected versions of the stochastic block model, as well as experimental results on real-world data.

Scalable Direction-Search-Based Approach to Subspace Clustering

Yicong He, George Atia

Auto-TLDR; Fast Direction-Search-Based Subspace Clustering

Subspace clustering finds a multi-subspace representation that best fits a high-dimensional dataset. The computational and storage complexities of existing algorithms limit their usefulness for large scale data. In this paper, we develop a novel scalable approach to subspace clustering termed Fast Direction-Search-Based Subspace Clustering (Fast DiSC). In sharp contrast to existing scalable solutions which are mostly based on the self-expressiveness property of the data, Fast DiSC rests upon a new representation obtained from projections on computed data-dependent directions. These directions are derived from a convex formulation for optimal direction search to gauge hidden similarity relations. The computational complexity is significantly reduced by performing direction search in partitions of sampled data, followed by a retrieval step to cluster out-of-sample data using projections on the computed directions. A theoretical analysis underscores the ability of the proposed formulation to construct local similarity relations for the different data points. Experiments on both synthetic and real data demonstrate that the proposed algorithm can often outperform the state-of-the-art clustering methods.

NetCalib: A Novel Approach for LiDAR-Camera Auto-Calibration Based on Deep Learning

Shan Wu, Amnir Hadachi, Damien Vivet, Yadu Prabhakar

Auto-TLDR; Automatic Calibration of LiDAR and Cameras using Deep Neural Network

A fusion of LiDAR and cameras has been widely used in many robotics applications such as classification, segmentation, object detection, and autonomous driving. It is essential that the LiDAR sensor can measure distances accurately, which is a good complement to the cameras. Hence, calibrating sensors before deployment is a mandatory step. The conventional methods rely on checkerboards, specific patterns, or human labeling, which is tedious and labor-intensive if the same calibration process must be repeated every time. The main purpose of this research work is to build a deep neural network that is capable of automatically finding the geometric transformation between LiDAR and cameras. The results show that our model manages to find the transformations from randomly sampled artificial errors. Besides, our work is open-sourced so that the community can fully utilize the methodology to develop the approach further and to initiate collaboration and innovation on the topic.

Graph-Based Image Decoding for Multiplexed in Situ RNA Detection

Gabriele Partel, Carolina Wahlby

Auto-TLDR; A Graph-based Decoding Approach for Multiplexed In situ RNA Detection

Image-based multiplexed in situ RNA detection makes it possible to map the spatial gene expression of hundreds to thousands of genes in parallel, and thus to discern at the same time a large number of different cell types to better understand tissue development, heterogeneity, and disease. Fluorescent signals are detected over multiple fluorescent channels and imaging rounds and decoded in order to identify RNA molecules in their morphological context. Here we present a graph-based decoding approach that models the decoding process as a network flow problem jointly optimizing observation likelihoods and distances of signal detections, thus achieving robustness with respect to noise and spatial jitter of the fluorescent signals. We evaluated our method on synthetic data generated under different experimental conditions, and on real data of in situ RNA sequencing, comparing results with alternative and gold standard image decoding pipelines.

3D Dental Biometrics: Automatic Pose-Invariant Dental Arch Extraction and Matching

Zhong Xin, Zhiyuan Zhang

Auto-TLDR; Automatic Dental Arch Extraction and Matching for 3D Dental Identification using Laser-Scanned Plasters

A novel automatic pose-invariant dental arch extraction and matching framework is developed for 3D dental identification using laser-scanned dental plasters. In our previous attempts [1-5], 3D point-based algorithms were developed and showed a few advantages over existing 2D dental identification methods. This study is a continued effort to develop arch-based algorithms that extract and match dental arch features in an automatic and pose-invariant way. To the best of our knowledge, this is the first attempt at automatic dental arch extraction and matching for 3D dental identification. A Radial Ray Algorithm (RRA) is proposed that projects the dental arch shape from 3D to 2D. This algorithm is fully automatic and fast. Preliminary identification results are obtained by matching 11 postmortem (PM) samples against 200 ante-mortem (AM) samples. 72.7% of the samples achieved top 5% accuracy, 90.9% achieved top 10% accuracy, and all 11 samples (100%) achieved top 15.5% accuracy out of the 200-rank list. In addition, the time for identifying a single subject from 200 subjects has been significantly reduced from 45 minutes to 5 minutes by matching the extracted 2D dental arch. Although the extracted 2D arch feature is not as accurate and discriminative as the full 3D arch, it may serve as an important filter feature to improve the identification speed in future investigations.
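The reported rates can be unpacked with a little arithmetic. The sketch below assumes that "top X% accuracy" means the correct AM record appears within the first X% of the 200-entry ranked list; the abstract does not define the term, so this reading, like the helper names `rank_cutoff` and `hits_from_rate`, is only an illustration.

```python
# Worked arithmetic for the reported identification rates
# (11 PM queries matched against a gallery of 200 AM samples).
# Assumption: "top X%" refers to the first X% of the ranked list.

def rank_cutoff(gallery_size, top_percent):
    """Number of ranks covered by a 'top X%' threshold."""
    return gallery_size * top_percent / 100


def hits_from_rate(num_queries, rate_percent):
    """Number of queries implied by a reported percentage."""
    return round(num_queries * rate_percent / 100)


gallery, queries = 200, 11
for top_percent, rate in [(5, 72.7), (10, 90.9), (15.5, 100.0)]:
    ranks = rank_cutoff(gallery, top_percent)
    hits = hits_from_rate(queries, rate)
    print(f"top {top_percent}% = first {ranks:g} ranks: {hits} of {queries} queries")

# Matching time per subject drops from 45 minutes to 5 minutes.
speedup = 45 / 5
print(f"speed-up factor: {speedup:g}x")
```

Under this reading, 8 of the 11 queries fall within the first 10 ranks, 10 within the first 20, and all 11 within the first 31.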

Joint Supervised and Self-Supervised Learning for 3D Real World Challenges

Antonio Alliegro, Davide Boscaini, Tatiana Tommasi

Auto-TLDR; Self-supervision for 3D Shape Classification and Segmentation in Point Clouds

Point cloud processing and 3D shape understanding are very challenging tasks for which deep learning techniques have demonstrated great potential. Still, further progress is essential to allow artificial intelligent agents to interact with the real world. In many practical conditions the amount of annotated data may be limited and integrating new sources of knowledge becomes crucial to support autonomous learning. Here we consider several scenarios involving synthetic and real world point clouds where supervised learning fails due to data scarcity and large domain gaps. We propose to enrich standard feature representations by leveraging self-supervision through a multi-task model that can solve a 3D puzzle while learning the main task of shape classification or part segmentation. An extensive analysis investigating few-shot, transfer learning and cross-domain settings shows the effectiveness of our approach with state-of-the-art results for 3D shape classification and part segmentation.

Localization and Transformation Reconstruction of Image Regions: An Extended Congruent Triangles Approach

Afra'A Ahmad Alyosef, Christian Elias, Andreas Nürnberger

Auto-TLDR; Outlier Filtering of Sub-Image Relations using Geometrical Information

Most of the existing methods to localize (sub) image relations (a subclass of near-duplicate retrieval techniques) rely on the distinctiveness of matched features of the images being compared. These sets of matching features usually include a proportion of outliers, i.e. features linking non-matching regions. In approaches that are designed for retrieval purposes only, these false matches usually have a minor impact on the final ranking. However, if a localization of regions and the corresponding image transformations should also be computed, these false matches often have a more significant impact. In this paper, we propose a novel outlier filtering approach based on the geometrical information of the matched features. Our approach is similar to the RANSAC model, but instead of only randomly selecting sets of matches and employing them to derive the homography transformation between images or image regions, we additionally exploit the geometrical relation of feature matches to find the best congruent triangle matches. Based on this information we classify outliers and determine the correlation between image regions. We compare our approach with state-of-the-art approaches using different feature models and various benchmark data sets (sub-image/panorama with affine transformation, adding blur, noise or scale change). The results indicate that our approach is more robust than the state-of-the-art approaches and is able to detect correlation even when most matches are outliers. Moreover, our approach significantly reduces the pre-processing time needed to filter the matches.

PIF: Anomaly detection via preference embedding

Filippo Leveni, Luca Magri, Giacomo Boracchi, Cesare Alippi

Auto-TLDR; PIF: Anomaly Detection with Preference Embedding for Structured Patterns

We address the problem of detecting anomalies with respect to structured patterns. To this end, we conceive a novel anomaly detection method called PIF, which combines the advantages of adaptive isolation methods with the flexibility of preference embedding. Specifically, we propose to embed the data in a high dimensional space where an efficient tree-based method, PI-FOREST, is employed to compute an anomaly score. Experiments on synthetic and real datasets demonstrate that PIF compares favorably with state-of-the-art anomaly detection techniques, and confirm that PI-FOREST is better at measuring arbitrary distances and isolating points in the preference space.

Extending Single Beam Lidar to Full Resolution by Fusing with Single Image Depth Estimation

Yawen Lu, Yuxing Wang, Devarth Parikh, Guoyu Lu

Auto-TLDR; Self-supervised LIDAR for Low-Cost Depth Estimation

Depth estimation plays an important role in indoor and outdoor scene understanding, autonomous driving, augmented reality and many other tasks. Vehicles and robots are able to use active illumination sensors such as LIDAR to obtain high-precision depth estimates. However, high-resolution LIDARs are usually too expensive, which limits their mass production for various applications. Though single beam LIDAR enjoys the benefit of low cost, one-beam depth sensing is usually not sufficient to perceive the surrounding environment in many scenarios. In this paper, we propose a learning-based framework to replicate similar or even higher performance than costly LIDARs with our designed self-supervised network and a low-cost single-beam LIDAR. After accurate calibration with a visible camera, the single beam LIDAR can adjust the scale uncertainty of the depth map estimated by the visible camera. The adjusted depth map enjoys the high resolution and sensing accuracy of a high beam LIDAR while maintaining the low cost of a single beam LIDAR. Thus we can achieve a sensing effect similar to that of a high beam LIDAR at a price more than 50-100 times lower (e.g., a $80000 Velodyne HDL-64E LIDAR vs. a $1000 SICK TIM-781 2D LIDAR and a normal camera). The proposed approach is verified on our collected dataset and a public dataset with superior depth-sensing performance.
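The abstract does not say how the single-beam readings resolve the scale ambiguity of the estimated depth map. One common choice, assumed here purely for illustration, is to rescale the predicted map by the median ratio between the metric LIDAR depths and the predicted depths at the calibrated pixel locations. The function name `align_scale` and the `(row, col, depth)` sample format are hypothetical.

```python
from statistics import median


def align_scale(pred_depth, lidar_samples):
    """Rescale a relative depth map using sparse metric depths.

    pred_depth    : 2D list of predicted (scale-ambiguous) depths
    lidar_samples : list of (row, col, metric_depth) tuples, assumed to be
                    already projected into the image by the calibration
    Returns (scale, scaled_depth_map).
    """
    ratios = [
        d / pred_depth[r][c]
        for r, c, d in lidar_samples
        if d > 0 and pred_depth[r][c] > 0  # skip invalid readings
    ]
    if not ratios:
        raise ValueError("no valid LIDAR/prediction pairs")
    scale = median(ratios)  # median is robust to a few bad returns
    scaled = [[scale * v for v in row] for row in pred_depth]
    return scale, scaled


# Toy example: a 2x2 predicted map and two beam hits on the bottom row.
pred = [[1.0, 2.0], [3.0, 4.0]]
hits = [(1, 0, 6.0), (1, 1, 8.0)]
scale, metric_map = align_scale(pred, hits)
```

A learned framework such as the one described may adjust scale in a more elaborate way; the median ratio is only the simplest instance of the idea.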

One Step Clustering Based on A-Contrario Framework for Detection of Alterations in Historical Violins

Alireza Rezaei, Sylvie Le Hégarat-Mascle, Emanuel Aldea, Piercarlo Dondi, Marco Malagodi

Auto-TLDR; A-Contrario Clustering for the Detection of Altered Violins using UVIFL Images

Preventive conservation is an important practice in Cultural Heritage. The constant monitoring of the state of conservation of an artwork helps us reduce the risk of damage and the number of interventions necessary. In this work, we propose a probabilistic approach for the detection of alterations on the surface of historical violins based on an a-contrario framework. Our method is a one step NFA clustering solution which considers grey-level and spatial density information in one background model. The proposed method is robust to noise and avoids parameter tuning and any assumption about the quantity of the worn out areas. We have used UV induced fluorescence (UVIFL) images as input in order to consider details not perceivable with visible light. Tests were conducted on image sequences included in the "Violins UVIFL imagery" dataset. Results illustrate the ability of the algorithm to distinguish the worn area from the surrounding regions. Comparisons with state-of-the-art clustering methods show improved overall precision and recall.

Learning Sign-Constrained Support Vector Machines

Kenya Tajima, Kouhei Tsuchida, Esmeraldo Ronnie Rey Zara, Naoya Ohta, Tsuyoshi Kato

Auto-TLDR; Constrained Sign Constraints for Learning Linear Support Vector Machine

Domain knowledge is useful to improve the generalization performance of learning machines. Sign constraints are a handy representation to combine domain knowledge with learning machines. In this paper, we consider constraining the signs of the weight coefficients in learning the linear support vector machine, and develop two optimization algorithms for minimizing the empirical risk under the sign constraints. The first algorithm is based on the projected gradient method, in which each iteration takes O(nd) computational cost and the sublinear convergence of the objective error is guaranteed. The second algorithm is based on the Frank-Wolfe method, which also converges sublinearly and possesses a clear termination criterion. We show that each iteration of the Frank-Wolfe method also requires O(nd) cost. Furthermore, we derive the explicit expression for the minimal iteration number to ensure an epsilon-accurate solution by analyzing the curvature of the objective function. Finally, we empirically demonstrate that the sign constraints are a promising technique when similarities to the training examples compose the feature vector.
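The projected-gradient idea can be illustrated with a minimal sketch. It assumes an L2-regularized hinge loss, a subgradient step, and a 1/(lambda t) step size; none of these details are given in the abstract, and the function names are hypothetical. The projection onto the sign constraints is a per-coordinate clamp, and each iteration touches every entry of the n-by-d data matrix once, which is consistent with the stated O(nd) cost.

```python
def project(w, signs):
    """Clamp each weight to its required sign (+1: >= 0, -1: <= 0, 0: free)."""
    return [
        max(0.0, wj) if s > 0 else min(0.0, wj) if s < 0 else wj
        for wj, s in zip(w, signs)
    ]


def train_sign_constrained_svm(X, y, signs, lam=0.1, steps=200):
    """Projected subgradient descent on a regularized hinge loss (a sketch)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for t in range(1, steps + 1):
        grad = [lam * wj for wj in w]            # gradient of the regularizer
        for xi, yi in zip(X, y):                 # O(nd) pass over the data
            margin = yi * sum(a * b for a, b in zip(w, xi))
            if margin < 1:                       # hinge loss is active
                for j in range(d):
                    grad[j] -= yi * xi[j] / n
        eta = 1.0 / (lam * t)                    # assumed step-size schedule
        w = project([wj - eta * gj for wj, gj in zip(w, grad)], signs)
    return w


# Toy data: feature 0 must get a non-negative weight, feature 1 a non-positive one.
X = [[1.0, 1.0], [2.0, 0.5], [-1.0, -1.5], [-2.0, -0.5]]
y = [1, 1, -1, -1]
w = train_sign_constrained_svm(X, y, signs=[1, -1])
```

Because the projection is applied after every update, the returned weights satisfy the sign constraints by construction, whatever the data.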

3D Point Cloud Registration Based on Cascaded Mutual Information Attention Network

Xiang Pan, Xiaoyi Ji

Auto-TLDR; Cascaded Mutual Information Attention Network for 3D Point Cloud Registration

For 3D point cloud registration, how to improve the local feature correlation of two point clouds is a challenging problem. In this paper, we propose a cascaded mutual information attention registration network. The network improves the accuracy of point cloud registration by stacking residual structure and using lateral connection. Firstly, the local reference coordinate system is defined by spherical representation for the local point set, which improves the stability and reliability of local features under noise. Secondly, the attention structure is used to improve the network depth and ensure the convergence of the network. Furthermore, a lateral connection is introduced into the network to avoid the loss of features in the process of concatenation. In the experimental part, the results of different algorithms are compared. It can be found that the proposed cascaded network can enhance the correlation of local features between different point clouds. As a result, it improves the registration accuracy significantly over the DCP and other typical algorithms.

RNN Training along Locally Optimal Trajectories via Frank-Wolfe Algorithm

Yun Yue, Ming Li, Venkatesh Saligrama, Ziming Zhang

Auto-TLDR; Frank-Wolfe Algorithm for Efficient Training of RNNs

We propose a novel and efficient training method for RNNs by iteratively seeking a local minimum on the loss surface within a small region, and leveraging this directional vector for the update in an outer loop. We propose to utilize the Frank-Wolfe (FW) algorithm in this context. Although FW implicitly involves normalized gradients, which can lead to a slow convergence rate, we develop a novel RNN training method whose overall training cost, surprisingly, is empirically observed to be lower than that of back-propagation even with the additional cost. Our method leads to a new Frank-Wolfe method that is in essence an SGD algorithm with a restart scheme. We prove that under certain conditions our algorithm has a sublinear convergence rate of $O(1/\epsilon)$ for $\epsilon$ error. We then conduct empirical experiments on several benchmark datasets including those that exhibit long-term dependencies, and show significant performance improvement. We also experiment with deep RNN architectures and show efficient training performance. Finally, we demonstrate that our training method is robust to noisy data.

Hybrid Approach for 3D Head Reconstruction: Using Neural Networks and Visual Geometry

Oussema Bouafif, Bogdan Khomutenko, Mohammed Daoudi

Auto-TLDR; Recovering 3D Head Geometry from a Single Image using Deep Learning and Geometric Techniques

Recovering the 3D geometric structure of a face from a single input image is a challenging active research area in computer vision. In this paper, we present a novel method for reconstructing 3D heads from a single or multiple image(s) using a hybrid approach based on deep learning and geometric techniques. We propose an encoder-decoder network based on the U-net architecture and trained on synthetic data only. It predicts both pixel-wise normal vectors and landmarks maps from a single input photo. Landmarks are used for the pose computation and the initialization of the optimization problem, which, in turn, reconstructs the 3D head geometry by using a parametric morphable model and normal vector fields. State-of-the-art results are achieved through qualitative and quantitative evaluation tests on both single and multi-view settings. Despite the fact that the model was trained only on synthetic data, it successfully recovers 3D geometry and precise poses for real-world images.

FC-DCNN: A Densely Connected Neural Network for Stereo Estimation

Dominik Hirner, Friedrich Fraundorfer

Auto-TLDR; FC-DCNN: A Lightweight Network for Stereo Estimation

We propose a novel lightweight network for stereo estimation. Our network consists of a fully-convolutional densely connected neural network (FC-DCNN) that computes matching costs between rectified image pairs. Our FC-DCNN method learns expressive features and performs some simple but effective post-processing steps. The densely connected layer structure connects the output of each layer to the input of each subsequent layer. This network structure, in addition to getting rid of any fully-connected layers, leads to a very lightweight network. The output of this network is used to calculate matching costs and create a cost-volume. Instead of using time and memory-inefficient cost-aggregation methods such as semi-global matching or conditional random fields to improve the result, we rely on filtering techniques, namely median filter and guided filter. By computing a left-right consistency check we get rid of inconsistent values. Afterwards we use a watershed foreground-background segmentation on the disparity image with removed inconsistencies. This mask is then used to refine the final prediction. We show that our method works well for both challenging indoor and outdoor scenes by evaluating it on the Middlebury, KITTI and ETH3D benchmarks.
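The left-right consistency check mentioned above can be sketched as follows. This is the generic textbook form of the check, not necessarily the authors' implementation: it assumes rectified images, the convention that left pixel x matches right pixel x - d, a tolerance of one pixel, and a sentinel value for rejected pixels. The function name and parameters are hypothetical.

```python
def left_right_check(disp_left, disp_right, max_diff=1.0, invalid=-1.0):
    """Invalidate left disparities that disagree with the right disparity map.

    disp_left, disp_right : 2D lists of disparities for the rectified pair
    max_diff              : largest tolerated disagreement, in pixels
    invalid               : value written where the check fails
    """
    height, width = len(disp_left), len(disp_left[0])
    out = [row[:] for row in disp_left]
    for y in range(height):
        for x in range(width):
            d = disp_left[y][x]
            xr = int(round(x - d))  # matching column in the right image
            if xr < 0 or xr >= width:
                out[y][x] = invalid  # match falls outside the right image
            elif abs(d - disp_right[y][xr]) > max_diff:
                out[y][x] = invalid  # the two views disagree
    return out


# One-row toy example.
left = [[1.0, 1.0, 1.0, 3.0]]
right = [[1.0, 1.0, 1.0, 1.0]]
checked = left_right_check(left, right)
```

In the toy row, the first pixel is rejected because its match falls outside the right image, and the last one because the two maps disagree by two pixels; the middle two survive.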

A Randomized Algorithm for Sparse Recovery

Huiyuan Yu, Maggie Cheng, Yingdong Lu

Auto-TLDR; A Constrained Graph Optimization Algorithm for Sparse Signal Recovery

This paper considers the problem of sparse signal recovery where there is a structure in the signal. Efficient recovery schemes can be designed to leverage the signal structure. Following the model-based compressive sensing framework, we have developed an efficient algorithm for both head and tail approximations for the model-projection problem. The problem is modeled as a constrained graph optimization problem, which is an NP-hard optimization problem. Solving the NP-hard optimization program is then transformed to solving a linear program and finding a randomized algorithm to find an integral solution. The integral solution is optimal-in-expectation. The algorithm is proved to have the same geometric convergence as previous work. The algorithm has been tested on various compressing matrices. It worked well with matrices that have the Restricted Isometry Property (RIP), and also with some matrices that have not been shown to have RIP. The proposed algorithm demonstrated improved recoverability and used fewer iterations to recover the signal.

3D Semantic Labeling of Photogrammetry Meshes Based on Active Learning

Mengqi Rong, Shuhan Shen, Zhanyi Hu

Auto-TLDR; 3D Semantic Expression of Urban Scenes Based on Active Learning

As different urban scenes are similar but still not completely consistent, coupled with the complexity of labeling directly in 3D, high-level understanding of 3D scenes has always been a tricky problem. In this paper, we propose a procedural approach for 3D semantic expression of urban scenes based on active learning. We start with a small labeled image set to fine-tune a semantic segmentation network, then project its probability map onto a 3D mesh model for fusion, and finally output a 3D semantic mesh model in which each facet has a semantic label, together with a heat model showing each facet’s confidence. Our key observation is that our algorithm is iterative: in each iteration, we use the output semantic model as supervision to select several valuable images for annotation, which then co-participate in the fine-tuning for overall improvement. In this way, we reduce the labeling workload without reducing the quality of the 3D semantic model. Using urban areas from two different cities, we show the potential of our method and demonstrate its effectiveness.

AdaFilter: Adaptive Filter Design with Local Image Basis Decomposition for Optimizing Image Recognition Preprocessing

Aiga Suzuki, Keiichi Ito, Takahide Ibe, Nobuyuki Otsu

Auto-TLDR; Optimal Preprocessing Filtering for Pattern Recognition Using Higher-Order Local Auto-Correlation

Image preprocessing is an important process during pattern recognition which increases the recognition performance. Linear convolution filtering is a primary preprocessing method used to enhance particular local patterns of the image which are essential for recognizing the images. However, because of the vast search space of the preprocessing filter, almost no earlier studies have tackled the problem of identifying an optimal preprocessing filter that yields effective features for input images. This paper proposes a novel design method for the optimal preprocessing filter corresponding to a given task. Our method calculates local image bases of the training dataset and represents the optimal filter as a linear combination of these local image bases with the optimized coefficients to maximize the expected generalization performance. Thereby, the optimization problem of the preprocessing filter is converted to a lower-dimensional optimization problem. Our proposed method combined with a higher-order local auto-correlation (HLAC) feature extraction exhibited the best performance both in the anomaly detection task with the conventional pattern recognition algorithm and in the classification task using the deep convolutional neural network compared with typical preprocessing filters.

Learning to Sort Handwritten Text Lines in Reading Order through Estimated Binary Order Relations

Lorenzo Quirós, Enrique Vidal

Auto-TLDR; Automatic Reading Order of Text Lines in Handwritten Text Documents

Recent advances in Handwritten Text Recognition and Document Layout Analysis make it possible to extract information from digitized documents and make them accessible beyond the archive shelves. But the reading order of the elements in those documents is still an open problem that has to be solved in order to provide that information with the correct structure. Most of the studies on the reading order task are rule-based approaches that focus on printed documents, while less attention has been paid to handwritten text documents. In this work we propose a new approach to automatically determine the reading order of text lines in handwritten text documents. The task is approached as a sorting problem where the order-relation operator is learned directly from examples. We demonstrate the effectiveness of our method on three different datasets.
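Treating reading order as a sorting problem with a learned order relation can be sketched as below. The pairwise function `toy_prob_before` is a hypothetical hand-written stand-in for a trained binary classifier, and plugging it into a comparison sort is an assumption for illustration: the abstract does not describe how the estimated relations are turned into a final order, and a learned relation need not be transitive, in which case a comparison sort gives no guarantees.

```python
from functools import cmp_to_key


def sort_by_relation(items, prob_before):
    """Sort items using an estimated pairwise 'a is read before b' probability."""
    def compare(a, b):
        p = prob_before(a, b)
        if p > 0.5:
            return -1   # a comes first
        if p < 0.5:
            return 1    # b comes first
        return 0        # undecided
    return sorted(items, key=cmp_to_key(compare))


def toy_prob_before(a, b):
    """Stand-in for a learned model: top-to-bottom, then left-to-right."""
    (_, ax, ay), (_, bx, by) = a, b
    if ay != by:
        return 0.9 if ay < by else 0.1
    if ax != bx:
        return 0.9 if ax < bx else 0.1
    return 0.5


# Text lines as (id, x, y) with y growing downwards.
lines = [("l3", 0, 40), ("l1", 0, 10), ("l2", 50, 10)]
ordered = [name for name, _, _ in sort_by_relation(lines, toy_prob_before)]
```

With the toy relation, the lines come out as l1, l2, l3; a real system would replace `toy_prob_before` with the classifier's output.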

A Multilinear Sampling Algorithm to Estimate Shapley Values

Ramin Okhrati, Aldo Lipani

Auto-TLDR; A sampling method for Shapley values for multilayer Perceptrons

Shapley values are great analytical tools in game theory to measure the importance of a player in a game. Due to their axiomatic and desirable properties such as efficiency, they have become popular for feature importance analysis in data science and machine learning. However, the time complexity to compute Shapley values based on the original formula is exponential, and as the number of features increases, this becomes infeasible. Castro et al. [1] developed a sampling algorithm to estimate Shapley values. In this work, we propose a new sampling method based on a multilinear extension technique as applied in game theory. The aim is to provide a more efficient (sampling) method for estimating Shapley values. Our method is applicable to any machine learning model, in particular for either multiclass classification or regression problems. We apply the method to estimate Shapley values for multilayer perceptrons (MLPs) and, through experimentation on two datasets, we demonstrate that our method provides more accurate estimations of the Shapley values by reducing the variance of the sampling statistics.
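For orientation, the classical multilinear-extension identity from game theory writes the Shapley value of player i as the integral over q in [0, 1] of the expected marginal contribution v(S ∪ {i}) - v(S), where S contains each other player independently with probability q. The sketch below is a plain Monte Carlo estimator built on that identity, with a midpoint grid over q. It is a generic illustration, not the authors' algorithm, and the function name and parameters are hypothetical.

```python
import random


def shapley_multilinear(value, n_players, player,
                        n_q=20, samples_per_q=50, seed=0):
    """Estimate one player's Shapley value via the multilinear extension.

    value : function mapping a frozenset of player indices to a number
    n_q   : number of midpoint-rule nodes for the integral over q
    """
    rng = random.Random(seed)
    others = [j for j in range(n_players) if j != player]
    total = 0.0
    for k in range(n_q):
        q = (k + 0.5) / n_q  # midpoint of the k-th sub-interval of [0, 1]
        acc = 0.0
        for _ in range(samples_per_q):
            # Each other player joins the coalition with probability q.
            coalition = frozenset(j for j in others if rng.random() < q)
            acc += value(coalition | {player}) - value(coalition)
        total += acc / samples_per_q
    return total / n_q


# Additive toy game: the Shapley value of each player equals its own weight.
weights = [1.0, 2.0, 3.0]


def additive_game(coalition):
    return sum(weights[j] for j in coalition)


estimate = shapley_multilinear(additive_game, n_players=3, player=1)
```

In an additive game every marginal contribution equals the player's weight, so the estimate is exact here; for a model such as an MLP, `value` would evaluate the model with only the features in the coalition present, and the estimate would carry sampling noise.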

Facetwise Mesh Refinement for Multi-View Stereo

Andrea Romanoni, Matteo Matteucci

Auto-TLDR; Facetwise Refinement of Multi-View Stereo using Delaunay Triangulations

Mesh refinement is a fundamental step for accurate Multi-View Stereo. It modifies the geometry of an initial manifold mesh to minimize the photometric error induced in a set of camera pairs. This initial mesh is usually the output of volumetric 3D reconstruction based on min-cut over Delaunay Triangulations. Such methods produce a significant number of non-manifold vertices, and therefore require a vertex split step to explicitly repair them. In this paper we extend this method to preemptively fix the non-manifold vertices by reasoning directly on the Delaunay Triangulation and avoid most vertex splits. The main contribution of this paper addresses the problem of choosing the camera pairs adopted by the refinement process. We treat the problem as a mesh labeling process, where each label corresponds to a camera pair. Unlike the state-of-the-art methods, which use each camera pair to refine all the visible parts of the mesh, we choose, for each facet, the best pair that enforces both the overall visibility and coverage. The refinement step is applied for each facet using only the camera pair selected. This facetwise refinement helps the process to be applied as evenly as possible.

Weakly Supervised Geodesic Segmentation of Egyptian Mummy CT Scans

Avik Hati, Matteo Bustreo, Diego Sona, Vittorio Murino, Alessio Del Bue

Auto-TLDR; A Weakly Supervised and Efficient Interactive Segmentation of Ancient Egyptian Mummies CT Scans Using Geodesic Distance Measure and GrabCut

In this paper, we tackle the task of automatically analyzing 3D volumetric scans obtained from computed tomography (CT) devices. Specifically, we address a task for which data is very limited: the segmentation of CT scans of ancient Egyptian mummies. We aim to digitally unwrap the mummy and identify different segments such as body, bandages and jewelry. The problem is complex because annotated data for the different semantic regions is lacking, which discourages the use of strongly supervised approaches. We therefore propose a weakly supervised and efficient interactive segmentation method to solve this challenging problem. After segmenting the wrapped mummy from its exterior region using histogram analysis and template matching, we first design a voxel distance measure to find an approximate solution for the body and bandage segments. Here, we use geodesic distances, since both voxel features and spatial relationships among voxels are incorporated in this measure. Next, we refine the solution using a GrabCut-based segmentation together with a tracking method on the slices of the scan that assigns labels to different regions in the volume, using limited supervision in the form of scribbles drawn by the user. The efficiency of the proposed method is demonstrated using visualizations and validated through quantitative measures and qualitative unwrapping of the mummy.
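
To make the idea of a geodesic distance that mixes features and spatial relationships more concrete, here is a minimal sketch on a 2D grid rather than a 3D volume. It runs Dijkstra's algorithm from user-provided seed pixels, and each step between neighbours costs one spatial unit plus a penalty proportional to the intensity difference. The cost function, the `feature_weight` parameter, the toy image, and the function name are all assumptions for illustration; the abstract does not specify the actual measure used by the authors.

```python
import heapq

# Hypothetical 2D "slice": a dark region (0) next to a bright region (9).
IMAGE = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]


def geodesic_distances(image, seeds, feature_weight=1.0):
    """Shortest-path distance from the seeds, where crossing intensity edges is costly."""
    rows, cols = len(image), len(image[0])
    dist = [[float("inf")] * cols for _ in range(rows)]
    heap = []
    for r, c in seeds:
        dist[r][c] = 0.0
        heapq.heappush(heap, (0.0, r, c))
    while heap:
        d, r, c = heapq.heappop(heap)
        if d > dist[r][c]:
            continue  # stale heap entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                # Spatial term (1 per step) plus feature term (intensity jump).
                step = 1.0 + feature_weight * abs(image[nr][nc] - image[r][c])
                nd = d + step
                if nd < dist[nr][nc]:
                    dist[nr][nc] = nd
                    heapq.heappush(heap, (nd, nr, nc))
    return dist


if __name__ == "__main__":
    for row in geodesic_distances(IMAGE, seeds=[(0, 0)]):
        print(row)
```

With `feature_weight=0` the result reduces to the plain grid distance from the seed, so the feature term is what keeps pixels across an intensity edge "far" from a scribble placed on the other side.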

Are Multiple Cross-Correlation Identities Better Than Just Two? Improving the Estimate of Time Differences-Of-Arrivals from Blind Audio Signals

Danilo Greco, Jacopo Cavazza, Alessio Del Bue

Auto-TLDR; Improving Blind Channel Identification Using Cross-Correlation Identity for Time Differences-of-Arrivals Estimation

Given an unknown audio source, the estimation of time differences-of-arrivals (TDOAs) can be efficiently and robustly solved using blind channel identification and exploiting the cross-correlation identity (CCI). Prior "blind" works have improved the estimate of TDOAs by means of different algorithmic solutions and optimization strategies, while always sticking to the case of N = 2 microphones. But what if we can obtain a direct improvement in performance by just increasing N? In this paper we investigate this direction, showing that, despite its arguable simplicity, it can sharply improve upon state-of-the-art blind channel identification methods based on CCI without modifying the computational pipeline. Inspired by our results, we seek to encourage the community and practitioners by paving the way (with two concrete, yet preliminary, examples) towards joint approaches in which advances in the optimization are combined with an increased number of microphones, in order to achieve further improvements.
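
For readers unfamiliar with the cross-correlation identity, the sketch below checks it numerically on synthetic data. If microphone i records x_i = s * h_i (the source s convolved with the channel h_i), then x_i * h_j = x_j * h_i for every pair of microphones, because convolution is commutative and associative. With N microphones there is one such identity per pair. The signals, filters, and function names are made up for illustration, and the TDOA is read here from the known filters; the blind estimation of the filters, which is the actual subject of the paper, is not shown.

```python
from itertools import combinations


def convolve(a, b):
    """Full discrete convolution of two finite sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def cci_residual(x_i, x_j, h_i, h_j):
    """Largest absolute violation of x_i * h_j = x_j * h_i."""
    lhs = convolve(x_i, h_j)
    rhs = convolve(x_j, h_i)
    return max(abs(p - q) for p, q in zip(lhs, rhs))


def max_pairwise_residual(recordings, filters):
    """Check the identity over all microphone pairs (N choose 2 of them)."""
    pairs = combinations(range(len(recordings)), 2)
    return max(
        cci_residual(recordings[i], recordings[j], filters[i], filters[j])
        for i, j in pairs
    )


def direct_path_delay(h):
    """Index of the strongest tap, taken as the direct-path delay in samples."""
    return max(range(len(h)), key=lambda k: abs(h[k]))


def tdoa(h_i, h_j):
    """Time difference of arrival (in samples) of microphone j relative to i."""
    return direct_path_delay(h_j) - direct_path_delay(h_i)


# Hypothetical source and three sparse channels (direct path plus one echo).
S = [1.0, -2.0, 0.5, 3.0]
H = [
    [0.0, 0.0, 1.0, 0.0, 0.3],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.8, 0.0, 0.2],
    [0.0, 0.9, 0.0, 0.1],
]
X = [convolve(S, h) for h in H]

if __name__ == "__main__":
    print("max CCI residual:", max_pairwise_residual(X, H))
    print("TDOA mic 0 -> mic 1:", tdoa(H[0], H[1]), "samples")
```

In a blind setting the filters are unknown and the pairwise identities serve as constraints on the filters to be estimated, so adding microphones adds constraints without changing the form of each one.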

Transferable Model for Shape Optimization subject to Physical Constraints

Lukas Harsch, Johannes Burgbacher, Stefan Riedelbauch

Auto-TLDR; U-Net with Spatial Transformer Network for Flow Simulations

The interaction of neural networks with physical equations offers a wide range of applications. We provide a method which enables a neural network to transform objects subject to given physical constraints. To this end, a U-Net architecture is used to learn the underlying physical behaviour of fluid flows. The network is used to infer the solution of flow simulations, which is shown for a wide range of generic channel flow simulations. Physically meaningful quantities can be computed on the obtained solution, e.g. the total pressure difference or the forces on the objects. A Spatial Transformer Network with thin-plate splines is used for the interaction between the physical constraints and the geometric representation of the objects. Thus, a transformation from an initial to a target geometry is performed such that the object fulfils the given constraints. This method is fully differentiable, i.e., gradient information can be used for the transformation. This can be seen as an inverse design process. The advantage of this method over many other proposed methods is that the physical constraints are based on the inferred flow field solution. Thus, we can apply a transferable model to varying problem setups, which is not limited to a given set of geometry parameters or physical quantities.