2D Deep Video Capsule Network with Temporal Shift for Action Recognition

Théo Voillemin, Hazem Wannous, Jean-Philippe Vandeborre
Track 2: Biometrics, Human Analysis and Behavior Understanding
Tue 12 Jan 2021 at 14:00 in session OS T2.1

Auto-TLDR; Temporal Shift Module over Capsule Network for Action Recognition in Continuous Videos

Action recognition in continuous video streams has been a growing field over the past few years. Deep learning techniques, and in particular Convolutional Neural Networks (CNNs), have achieved good results on this task. However, intrinsic CNN limitations have begun to cap those results: 2D CNNs cannot capture temporal information, and 3D CNNs are too resource-demanding for real-time applications. The Capsule Network, an evolution of the CNN, has already proved its benefits on small, low-information datasets like MNIST, but its true potential has yet to emerge. In this paper we tackle the action recognition problem by proposing a new architecture that combines a Temporal Shift module with a deep Capsule Network. The Temporal Shift module lets us inject temporal information into a 2D Capsule Network at zero computational cost, preserving the lightness of 2D capsules and their ability to connect spatial features. Our proposed approach outperforms or comes close to state-of-the-art results on color and depth information on public datasets such as First Person Hand Action and DHG 14/28, with 10 to 40 times fewer parameters than existing approaches.
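For reference, here is a minimal sketch of the generic temporal shift operation (Lin et al.'s TSM), not the authors' exact capsule integration: a fraction of feature channels is displaced forward or backward along the time axis, mixing adjacent frames at zero FLOP cost. The tensor layout and fold ratio are illustrative assumptions.

```python
import torch

def temporal_shift(x: torch.Tensor, fold_div: int = 8) -> torch.Tensor:
    """x: (batch, time, channels, height, width)."""
    b, t, c, h, w = x.size()
    fold = c // fold_div
    out = torch.zeros_like(x)
    out[:, 1:, :fold] = x[:, :-1, :fold]                  # shift some channels forward in time
    out[:, :-1, fold:2 * fold] = x[:, 1:, fold:2 * fold]  # shift others backward
    out[:, :, 2 * fold:] = x[:, :, 2 * fold:]             # leave the rest in place
    return out

feats = torch.randn(2, 8, 64, 32, 32)   # 2 clips, 8 frames, 64-channel 32x32 maps
shifted = temporal_shift(feats)          # same shape, now mixing adjacent frames
```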

AerialMPTNet: Multi-Pedestrian Tracking in Aerial Imagery Using Temporal and Graphical Features

Maximilian Kraus, Seyed Majid Azimi, Emec Ercelik, Reza Bahmanyar, Peter Reinartz, Alois Knoll
Track 3: Computer Vision Robotics and Intelligent Systems
Tue 12 Jan 2021 at 15:00 in session PS T3.1

Auto-TLDR; AerialMPTNet: A novel approach for multi-pedestrian tracking in geo-referenced aerial imagery by fusing appearance features

Multi-pedestrian tracking in aerial imagery has several applications, such as large-scale event monitoring, disaster management, search-and-rescue missions, and input to predictive crowd dynamics models. Due to challenges such as the large number and tiny size of the pedestrians (e.g., 4 x 4 pixels), their similar appearances, the varying scales and atmospheric conditions of the images, and extremely low frame rates (e.g., 2 fps), current state-of-the-art algorithms, including deep learning-based ones, are unable to perform well. In this paper, we propose AerialMPTNet, a novel approach for multi-pedestrian tracking in geo-referenced aerial imagery that fuses appearance features from a Siamese Neural Network, movement predictions from a Long Short-Term Memory, and pedestrian interconnections from a GraphCNN. In addition, to address the lack of diverse aerial multi-pedestrian tracking datasets, we introduce the Aerial Multi-Pedestrian Tracking (AerialMPT) dataset, consisting of 307 frames with 44,740 annotated pedestrians. To the best of our knowledge, AerialMPT is the largest and most diverse dataset of its kind to date and will be released publicly. We evaluate AerialMPTNet on AerialMPT and KIT AIS, and benchmark it against several state-of-the-art tracking methods. Results indicate that AerialMPTNet significantly outperforms other methods in accuracy and time efficiency.

Dynamically Mitigating Data Discrepancy with Balanced Focal Loss for Replay Attack Detection

Yongqiang Dou, Haocheng Yang, Maolin Yang, Yanyan Xu, Dengfeng Ke
Track 2: Biometrics, Human Analysis and Behavior Understanding
Thu 14 Jan 2021 at 12:00 in session PS T2.4

Auto-TLDR; Anti-Spoofing with Balanced Focal Loss Function and Combination Features

With the advance of high-quality playback devices, it has become urgent to design effective anti-spoofing algorithms for vulnerable automatic speaker verification systems. Current studies mainly treat anti-spoofing as a binary classification problem between bonafide and spoofed utterances, while the scarcity of hard-to-distinguish samples makes it difficult to train a robust spoofing detector. In this paper, we argue that anti-spoofing should pay more attention to hard-to-distinguish samples than to easily classified ones in the modeling process, making correct discrimination a top priority. Therefore, to mitigate the data discrepancy between training and inference, we propose to leverage a balanced focal loss function as the training objective, dynamically scaling the loss based on the traits of each sample. In addition, in the experiments we select three kinds of features that contain both magnitude-based and phase-based information to form complementary and informative inputs. Experimental results on the ASVspoof2019 dataset demonstrate the superiority of the proposed methods in comparison with top-performing systems. Systems trained with the balanced focal loss perform significantly better than those trained with the conventional cross-entropy loss. With complementary features, our fusion system with only three kinds of features outperforms other systems containing five or more complex single models by 22.5% in min-tDCF and 7% in EER, achieving a min-tDCF of 0.0124 and an EER of 0.55%. Furthermore, we present and discuss evaluation results on real replay data, in addition to the simulated ASVspoof2019 data, indicating that research on anti-spoofing still has a long way to go.
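For reference, a minimal sketch of a class-balanced binary focal loss in the spirit described above (the standard formulation of Lin et al.; the paper's exact balancing scheme may differ). The alpha and gamma values are the usual defaults, not the authors' settings.

```python
import torch
import torch.nn.functional as F

def balanced_focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Binary focal loss with a class-balancing weight.
    logits: (N,); targets: (N,) in {0, 1} as floats."""
    p = torch.sigmoid(logits)
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = p * targets + (1 - p) * (1 - targets)          # prob. of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()    # down-weights easy samples

logits = torch.randn(16)
targets = torch.randint(0, 2, (16,)).float()
loss = balanced_focal_loss(logits, targets)
```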

Position-Aware and Symmetry Enhanced GAN for Radial Distortion Correction

Yongjie Shi, Xin Tong, Jingsi Wen, He Zhao, Xianghua Ying, Jinshi Hongbin Zha
Track 3: Computer Vision Robotics and Intelligent Systems
Wed 13 Jan 2021 at 12:00 in session PS T3.4

Auto-TLDR; Generative Adversarial Network for Radial Distorted Image Correction

This paper presents a novel method based on a generative adversarial network for radial distortion correction. Instead of generating a corrected image directly, our generator predicts a pixel flow map that measures the pixel offset between the distorted and corrected images. The quality of the generated pixel flow map and of the warped image is judged by the discriminator. Since texture far from the image center suffers strong distortion, we develop an Adaptive Inverted Foveal layer that transforms this geometric deformation into image intensity to exploit the property. Rotation-symmetry-enhanced convolution kernels are applied to explicitly extract geometric features of different orientations. These learned features are recalibrated using a Squeeze-and-Excitation block that assigns different weights to different directions. Moreover, we construct the first real-world radially distorted image dataset, RD600, annotated with ground truth, to evaluate our proposed method. We conduct extensive experiments to validate the effectiveness of each part of our framework. Further experiments show that our approach outperforms previous methods on both synthetic and real-world datasets, quantitatively and qualitatively.
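To illustrate the pixel-flow idea, here is a hedged sketch of warping an image with a predicted per-pixel offset map using PyTorch's grid_sample; the network that predicts the flow is out of scope, and the channel conventions are assumptions.

```python
import torch
import torch.nn.functional as F

def warp_with_flow(img, flow):
    """img: (N, C, H, W); flow: (N, 2, H, W) pixel offsets (x, y) predicted by
    the generator. Builds a sampling grid and resamples the image."""
    n, _, h, w = img.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().unsqueeze(0)  # (1, 2, H, W)
    coords = base + flow                                      # where to sample from
    # normalize coordinates to [-1, 1] as required by grid_sample
    gx = 2.0 * coords[:, 0] / (w - 1) - 1.0
    gy = 2.0 * coords[:, 1] / (h - 1) - 1.0
    grid = torch.stack((gx, gy), dim=-1)                      # (N, H, W, 2)
    return F.grid_sample(img, grid, align_corners=True)

img = torch.rand(1, 3, 64, 64)
flow = torch.zeros(1, 2, 64, 64)     # zero flow leaves the image unchanged
out = warp_with_flow(img, flow)
```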

The HisClima Database: Historical Weather Logs for Automatic Transcription and Information Extraction

Verónica Romero, Joan Andreu Sánchez
Track 4: Document and Media Analysis
Tue 12 Jan 2021 at 17:00 in session PS T4.1

Auto-TLDR; Automatic Handwritten Text Recognition and Information Extraction from Historical Weather Logs

Knowing the weather and atmospheric conditions of the past can help weather researchers build models like the ones used to predict how weather conditions are likely to change as global temperatures continue to rise. Many historical weather records, registered on a systematic basis, are available. Historical weather logs were kept aboard ships on the high seas, recording daily conditions such as wind speed, temperature, and coordinates. These historical documents represent an important source of knowledge, with valuable climatic information from several centuries ago. Such information is usually collected by experts who devote considerable time to the task. This paper presents a new database, compiled from a ship's log composed mainly of handwritten tables containing mostly numerical information, to support research in automatic handwriting recognition and information extraction. In addition, we present a study of the capability of state-of-the-art handwritten text recognition systems and information extraction techniques when applied to this database. Baseline results are reported for reference in future studies.

Stochastic 3D Rock Reconstruction Using GANs

Sergio Damas, Andrea Valsecchi
Track 1: Artificial Intelligence, Machine Learning for Pattern Analysis
Tue 12 Jan 2021 at 15:00 in session PS T1.2

Auto-TLDR; Generative Adversarial Neural Networks for 3D-to-3D Reconstruction of Porous Media

The study of the physical properties of porous media is crucial for petrophysics laboratories. Even though micro computed tomography (CT) can be useful, properly evaluating flow properties would involve acquiring a large number of representative images, which is often infeasible. Stochastic reconstruction methods aim to generate novel, realistic rock images from a small sample, thus avoiding a large acquisition process. In this contribution, we improve a previous method for 3D-to-3D reconstruction of the structure of porous media by applying generative adversarial neural networks (GANs). We compare several measures of pore morphology between simulated and acquired images. Experiments include Beadpack, Berea sandstone, and Ketton limestone images. Results show that our GAN-based method can reconstruct three-dimensional images of porous media at different scales that are representative of the morphology of the original images. Furthermore, generating multiple images is much faster than with classical image reconstruction methods.

Temporal Pulses Driven Spiking Neural Network for Time and Power Efficient Object Recognition in Autonomous Driving

Wei Wang, Shibo Zhou, Jingxi Li, Xiaohua Li, Junsong Yuan, Zhanpeng Jin
Track 3: Computer Vision Robotics and Intelligent Systems
Tue 12 Jan 2021 at 17:00 in session PS T3.2

Auto-TLDR; Spiking Neural Network for Real-Time Object Recognition on Temporal LiDAR Pulses

Accurate real-time object recognition from sensory data has long been a crucial and challenging task for autonomous driving. Even though deep neural networks (DNNs) have been widely applied in this area, their considerable processing latency, power consumption, and computational complexity remain challenging for real-time autonomous driving applications. In this paper, we propose an approach to the real-time object recognition problem that utilizes spiking neural networks (SNNs). The proposed SNN model works directly with raw temporal LiDAR pulses, without the pulse-to-point-cloud preprocessing procedure, which significantly reduces delay and power consumption. Evaluated on various datasets derived from LiDAR and dynamic vision sensors (DVS), including Sim LiDAR, KITTI, and DVS-barrel, our proposed model shows remarkable time and power efficiency while achieving recognition performance comparable to state-of-the-art methods. This paper highlights the SNN's great potential in autonomous driving and related applications. To the best of our knowledge, this is the first attempt to use an SNN to directly perform time- and energy-efficient object recognition on temporal LiDAR pulses in the setting of autonomous driving.
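As background on the computational unit involved, a toy leaky integrate-and-fire neuron is sketched below in plain NumPy: input current is integrated with leakage, and a spike is emitted and the potential reset whenever the threshold is crossed. The constants are illustrative, not the paper's.

```python
import numpy as np

def lif_neuron(input_current, threshold=1.0, tau=20.0, dt=1.0):
    """Simulate a leaky integrate-and-fire neuron over a current trace;
    returns the binary spike train."""
    v = 0.0
    spikes = np.zeros_like(input_current)
    for t, i_t in enumerate(input_current):
        v += dt * (-v / tau + i_t)   # leaky integration of the input current
        if v >= threshold:
            spikes[t] = 1.0
            v = 0.0                  # reset the membrane potential after a spike
    return spikes

rng = np.random.default_rng(0)
current = 0.08 + 0.02 * rng.standard_normal(200)   # noisy constant drive
print(int(lif_neuron(current).sum()), "spikes in 200 steps")
```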

Memetic Evolution of Training Sets with Adaptive Radial Basis Kernels for Support Vector Machines

Jakub Nalepa, Wojciech Dudzik, Michal Kawulok
Track 1: Artificial Intelligence, Machine Learning for Pattern Analysis
Wed 13 Jan 2021 at 16:30 in session PS T1.7

Auto-TLDR; Memetic Algorithm for Evolving Support Vector Machines with Adaptive Kernels

Support vector machines (SVMs) are a supervised learning technique that can be applied to both binary and multi-class classification and regression tasks, and they seamlessly handle continuous and categorical variables. Their training is, however, both time- and memory-costly for large training data, and selecting an incorrect kernel function or its hyperparameters leads to suboptimal decision hyperplanes. In this paper, we introduce a memetic algorithm for evolving SVM training sets with adaptive radial basis function kernels, not only to make the deployment of SVMs easier for emerging big data applications, but also to improve their generalization to unseen data. We build upon two observations: first, only a small subset of all training vectors, the support vectors, contributes to the position of the decision boundary; the other vectors can therefore be removed from the training set without degrading the model. Second, selecting different kernel hyperparameters for different training vectors may better reflect the subtle characteristics of the space when determining the hyperplane. Experiments on almost 100 benchmark and synthetic datasets show that our algorithm delivers models outperforming both SVMs optimized with state-of-the-art evolutionary techniques and other supervised learners.
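The first observation is easy to verify with off-the-shelf tools: an SVM retrained only on its own support vectors recovers essentially the same decision boundary. The sketch below uses plain scikit-learn on synthetic data and is not the authors' memetic algorithm.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

full = SVC(kernel="rbf", gamma="scale").fit(X_tr, y_tr)
sv = full.support_                                    # indices of the support vectors
reduced = SVC(kernel="rbf", gamma="scale").fit(X_tr[sv], y_tr[sv])

print(f"full training set:    {full.score(X_te, y_te):.3f} ({len(X_tr)} vectors)")
print(f"support vectors only: {reduced.score(X_te, y_te):.3f} ({len(sv)} vectors)")
```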

Unconstrained Vision Guided UAV Based Safe Helicopter Landing

Arindam Sikdar, Abhimanyu Sahu, Debajit Sen, Rohit Mahajan, Ananda Chowdhury
Track 3: Computer Vision Robotics and Intelligent Systems
Wed 13 Jan 2021 at 16:30 in session PS T3.5

Auto-TLDR; Autonomous Helicopter Landing in Hazardous Environments from Unmanned Aerial Images Using Constrained Graph Clustering

In this paper, we address the problem of automatically detecting safe zone(s) for helicopter landing in hazardous environments from images captured by an Unmanned Aerial Vehicle (UAV). The unconstrained motion of the image-capturing drone (the UAV in our case) makes the problem even more difficult. The solution pipeline consists of natural landmark detection and tracking, stereo-pair generation using constrained graph clustering, digital terrain map construction, and safe landing zone detection. The main methodological contribution lies in mathematically formulating the epipolar constraint and using it in a Minimum Spanning Tree (MST) based graph clustering approach. We have also made publicly available the AHL (Autonomous Helicopter Landing) dataset, a new aerial video dataset captured by a drone with annotated ground truth. Experimental comparisons with competing clustering methods, i) in terms of the Dunn Index and the Davies-Bouldin Index, and ii) for frame-level safe zone detection in terms of F-measure and confusion matrix, clearly demonstrate the effectiveness of the proposed formulation.
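For context, a minimal sketch of generic MST-based graph clustering: build the minimum spanning tree of a pairwise-distance graph, cut its k-1 heaviest edges, and read clusters off the connected components. The paper's epipolar-constrained edge weights are omitted here.

```python
import numpy as np
from scipy.sparse.csgraph import connected_components, minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

def mst_clustering(points: np.ndarray, n_clusters: int) -> np.ndarray:
    dist = squareform(pdist(points))              # dense pairwise distances
    mst = minimum_spanning_tree(dist).toarray()
    # zero out the n_clusters - 1 heaviest edges to split the tree
    heaviest = np.argsort(mst, axis=None)[::-1][: n_clusters - 1]
    mst[np.unravel_index(heaviest, mst.shape)] = 0
    _, labels = connected_components(mst, directed=False)
    return labels

pts = np.vstack([np.random.randn(30, 2), np.random.randn(30, 2) + 6])
print(mst_clustering(pts, 2))   # two well-separated blobs -> two labels
```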

Weakly Supervised Geodesic Segmentation of Egyptian Mummy CT Scans

Avik Hati, Matteo Bustreo, Diego Sona, Vittorio Murino, Alessio Del Bue
Track 3: Computer Vision Robotics and Intelligent Systems
Thu 14 Jan 2021 at 16:00 in session PS T3.9

Auto-TLDR; A Weakly Supervised and Efficient Interactive Segmentation of Ancient Egyptian Mummies CT Scans Using Geodesic Distance Measure and GrabCut

In this paper, we tackle the task of automatically analyzing 3D volumetric scans obtained from computed tomography (CT) devices. In particular, we address a task for which data is very limited: the segmentation of CT scans of ancient Egyptian mummies. We aim at digitally unwrapping the mummy and identifying different segments such as body, bandages, and jewelry. The problem is complex because of the lack of annotated data for the different semantic regions to segment, which discourages the use of strongly supervised approaches. We therefore propose a weakly supervised and efficient interactive segmentation method to solve this challenging problem. After segmenting the wrapped mummy from its exterior region using histogram analysis and template matching, we first design a voxel distance measure to find an approximate solution for the body and bandage segments. Here we use geodesic distances, since both voxel features and spatial relationships among voxels are incorporated in this measure. Next, we refine the solution using a GrabCut-based segmentation together with a tracking method on the slices of the scan that assigns labels to different regions in the volume, using limited supervision in the form of scribbles drawn by the user. The efficiency of the proposed method is demonstrated through visualizations and validated through quantitative measures and a qualitative unwrapping of the mummy.
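A toy 2D illustration of the geodesic distance measure: path cost accumulates both spatial steps and intensity changes, so distances stay small within homogeneous regions and grow sharply across boundaries. The weighting parameter and 4-connectivity are assumptions; the paper works in 3D with user scribbles as seeds.

```python
import heapq
import numpy as np

def geodesic_distance(image, seeds, beta=10.0):
    """Dijkstra over the pixel grid; edge cost mixes spatial and intensity terms."""
    h, w = image.shape
    dist = np.full((h, w), np.inf)
    heap = [(0.0, r, c) for r, c in seeds]
    for _, r, c in heap:
        dist[r, c] = 0.0
    while heap:
        d, r, c = heapq.heappop(heap)
        if d > dist[r, c]:
            continue                          # stale heap entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w:
                step = 1.0 + beta * abs(image[r, c] - image[nr, nc])
                if d + step < dist[nr, nc]:
                    dist[nr, nc] = d + step
                    heapq.heappush(heap, (d + step, nr, nc))
    return dist

img = np.zeros((64, 64)); img[:, 32:] = 1.0       # two homogeneous halves
d = geodesic_distance(img, seeds=[(32, 5)])
print(d[32, 20], d[32, 45])                       # small vs. boundary-penalized distance
```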

Ghost Target Detection in 3D Radar Data Using Point Cloud Based Deep Neural Network

Mahdi Chamseddine, Jason Rambach, Oliver Wasenmüler, Didier Stricker
Track 1: Artificial Intelligence, Machine Learning for Pattern Analysis
Fri 15 Jan 2021 at 16:00 in session PS T1.16

Auto-TLDR; Point Based Deep Learning for Ghost Target Detection in 3D Radar Point Clouds

Ghost targets are targets that appear at wrong locations in radar data, caused by multiple indirect reflections between the target and the sensor. In this work, we introduce the first point-based deep learning approach for ghost target detection in 3D radar point clouds. This is done by extending the PointNet network architecture, modifying its input to include radar point features beyond location and introducing skip connections. We compare different input modalities and analyze the effects of the changes we introduced. We also propose an approach for automatic labeling of ghost targets in 3D radar data using lidar as a reference. The algorithm is trained and tested on real data from various driving scenarios, and the tests show promising results in classifying real and ghost radar targets.
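A hedged sketch of the kind of extension described: a PointNet-style per-point classifier whose input includes radar features beyond xyz (the extra two channels stand in for, e.g., velocity and cross-section), with the early per-point features skip-connected into the classification head. Layer sizes are illustrative, not the authors' architecture.

```python
import torch
import torch.nn as nn

class RadarPointNet(nn.Module):
    def __init__(self, in_dim=5, feat_dim=64, global_dim=256):
        super().__init__()
        self.local = nn.Sequential(nn.Conv1d(in_dim, feat_dim, 1), nn.ReLU())
        self.deeper = nn.Sequential(nn.Conv1d(feat_dim, global_dim, 1), nn.ReLU())
        # the head sees local features (the skip connection), deeper per-point
        # features, and the globally max-pooled feature
        self.head = nn.Conv1d(feat_dim + 2 * global_dim, 2, 1)

    def forward(self, pts):                 # pts: (B, in_dim, N)
        local = self.local(pts)             # (B, feat_dim, N)
        deep = self.deeper(local)           # (B, global_dim, N)
        glob = deep.max(dim=2, keepdim=True).values.expand_as(deep)
        fused = torch.cat([local, deep, glob], dim=1)
        return self.head(fused)             # per-point real/ghost logits

logits = RadarPointNet()(torch.randn(4, 5, 128))   # 4 clouds of 128 points
print(logits.shape)                                # torch.Size([4, 2, 128])
```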

Learning Error-Driven Curriculum for Crowd Counting

Wenxi Li, Zhuoqun Cao, Qian Wang, Songjian Chen, Rui Feng
Track 3: Computer Vision Robotics and Intelligent Systems
Wed 13 Jan 2021 at 14:00 in session OS T3.2

Auto-TLDR; Learning Error-Driven Curriculum for Crowd Counting with TutorNet

Density regression has been widely employed in crowd counting. However, the frequency imbalance of pixel values in the density map remains an obstacle to improving performance. In this paper, we propose a novel strategy for learning an error-driven curriculum, which uses an additional network to supervise the training of the main network. A tutoring network called TutorNet is proposed to repeatedly indicate the critical errors of the main network. TutorNet generates pixel-level weights that formulate the curriculum for the main network during training, so that the main network assigns higher weights to hard examples than to easy ones (see the sketch below). Furthermore, we scale the density map by a factor to enlarge the inter-example distance, which is well known to improve performance. Extensive experiments on two challenging benchmark datasets show that our method achieves state-of-the-art performance.
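A minimal sketch of the training objective this implies: a per-pixel weighted regression loss on the density map, where the weight map would come from TutorNet. The tensors below are random stand-ins.

```python
import torch

def weighted_density_loss(pred, target, weights):
    """pred, target, weights: (N, 1, H, W); weights emphasize hard pixels."""
    return (weights * (pred - target) ** 2).mean()

pred = torch.rand(2, 1, 64, 64, requires_grad=True)   # main network output
target = torch.rand(2, 1, 64, 64)                     # (scaled) density map
weights = torch.rand(2, 1, 64, 64) + 0.5              # would come from TutorNet
weighted_density_loss(pred, target, weights).backward()
```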

Exploiting Knowledge Embedded Soft Labels for Image Recognition

Lixian Yuan, Riquan Chen, Hefeng Wu, Tianshui Chen, Wentao Wang, Pei Chen
Track 1: Artificial Intelligence, Machine Learning for Pattern Analysis
Wed 13 Jan 2021 at 12:00 in session PS T1.4

Auto-TLDR; A Soft Label Vector for Image Recognition

Objects from correlated classes usually share highly similar appearances, while objects from uncorrelated classes look very different. Most current image recognition works treat each class independently, which ignores these class correlations and inevitably leads to sub-optimal performance in many cases. Fortunately, object classes inherently form a hierarchy with different levels of abstraction, and this hierarchy encodes rich correlations among classes. In this work, we utilize a soft label vector that encodes the prior knowledge of class correlations as extra regularization to train image classifiers. Specifically, for each class, instead of simply using a one-hot vector, we assign a high value to its correlated classes and small values to the uncorrelated ones, generating knowledge-embedded soft labels. We conduct experiments on both general and fine-grained image recognition benchmarks and demonstrate the method's superiority over existing approaches.
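A small sketch of constructing such a soft label vector and training against it with a KL objective; the value split and the example correlation set are made-up assumptions.

```python
import torch
import torch.nn.functional as F

def soft_label(num_classes, target, correlated, high=0.9, mid=0.08):
    """Target class gets `high`, correlated classes share `mid`, and the
    remaining mass is spread over the uncorrelated classes."""
    rest = max(num_classes - 1 - len(correlated), 1)
    v = torch.full((num_classes,), (1.0 - high - mid) / rest)
    v[list(correlated)] = mid / max(len(correlated), 1)
    v[target] = high
    return v

# hypothetical hierarchy: class 0 ("husky") correlates with classes 1 and 2
y = soft_label(num_classes=10, target=0, correlated=[1, 2]).unsqueeze(0)
logits = torch.randn(1, 10)
loss = F.kl_div(F.log_softmax(logits, dim=1), y, reduction="batchmean")
```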

Are Spoofs from Latent Fingerprints a Real Threat for the Best State-Of-Art Liveness Detectors?

Roberto Casula, Giulia Orrù, Daniele Angioni, Xiaoyi Feng, Gian Luca Marcialis, Fabio Roli
Track 2: Biometrics, Human Analysis and Behavior Understanding
Fri 15 Jan 2021 at 13:00 in session OS T2.3

Auto-TLDR; ScreenSpoof: Attacks using latent fingerprints against state-of-art fingerprint liveness detectors and verification systems

We investigated the threat level of realistic attacks using latent fingerprints against sensors equipped with state-of-the-art liveness detectors, and against fingerprint verification systems integrating such liveness algorithms. To the best of our knowledge, only one previous investigation has been done with spoofs from latent prints. In this paper, we focus on using snapshot pictures of latent fingerprints. These pictures provide molds that, after some digital processing, allow the fabrication of high-quality spoofs. Taking a snapshot picture is much simpler than developing fingerprints left on a surface with magnetic powder and lifting the trace with tape. Our interest here is a preliminary evaluation of the extent to which attacks of this kind can be considered a real threat to state-of-the-art fingerprint liveness detectors and verification systems. To this aim, we collected a novel dataset of live and spoof images fabricated from snapshot pictures of latent fingerprints. This dataset provides a set of attacks under the most favorable conditions. We refer to this method and the related dataset as "ScreenSpoof". We then tested against it the performance of the best liveness detection algorithms, namely the three winners of the LivDet competition. The reported results point out that the ScreenSpoof method is a threat of the same level, in terms of detection and verification errors, as attacks using spoofs fabricated with the full consent of the victim. We think this is a notable result, never reported in previous work.

Toward Text-Independent Cross-Lingual Speaker Recognition Using English-Mandarin-Taiwanese Dataset

Yi-Chieh Wu, Wen-Hung Liao
Track 2: Biometrics, Human Analysis and Behavior Understanding
Fri 15 Jan 2021 at 15:00 in session PS T2.5

Auto-TLDR; Cross-lingual Speech for Biometric Recognition

Over 40% of the world's population is bilingual. Existing speaker identification/verification systems, however, assume the same language for both the enrollment and recognition stages. In this work, we investigate the feasibility of employing multilingual speech for biometric applications. We establish a dataset containing audio recorded in English, Mandarin, and Taiwanese. Three acoustic features, namely i-vector, d-vector, and x-vector, have been evaluated for both speaker verification (SV) and identification (SI) tasks. Preliminary experimental results indicate that the x-vector achieves the best overall performance. Additionally, the model trained with hybrid data demonstrates the highest accuracy, at the cost of additional data collection effort. In SI tasks, we obtained over 91% cross-lingual accuracy for all models using 3-second audio. In SV tasks, the EER among cross-lingual tests is at most 6.52%, observed on the model trained on the English corpus. These outcomes suggest the feasibility of adopting cross-lingual speech for building text-independent speaker recognition systems.
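For reference, the EER reported above can be computed from trial scores roughly as follows: sweep the decision threshold and find the point where the false acceptance and false rejection rates cross. This is a generic helper, not the authors' evaluation code.

```python
import numpy as np

def equal_error_rate(scores, labels):
    """scores: similarity scores; labels: 1 for same-speaker trials, 0 otherwise."""
    order = np.argsort(scores)[::-1]                  # descending by score
    labels = np.asarray(labels)[order]
    far = np.cumsum(1 - labels) / max((labels == 0).sum(), 1)   # false accepts
    frr = 1 - np.cumsum(labels) / max((labels == 1).sum(), 1)   # false rejects
    i = np.argmin(np.abs(far - frr))                  # crossing point
    return (far[i] + frr[i]) / 2

scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0]
print(f"EER = {equal_error_rate(scores, labels):.3f}")
```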

Face Super-Resolution Network with Incremental Enhancement of Facial Parsing Information

Shuang Liu, Chengyi Xiong, Zhirong Gao
Track 5: Image and Signal Processing
Tue 12 Jan 2021 at 17:00 in session PS T5.2

Auto-TLDR; Learning-based Face Super-Resolution with Incremental Boosting Facial Parsing Information

Recently, face super-resolution (SR) methods based on facial priors have obtained significant performance gains on extremely degraded facial images, and facial priors have also proven useful in facilitating the inference of face images. Consequently, how to fully fuse facial priors into deep features to improve face SR performance has attracted major attention. In this paper, we propose a learning-based face SR approach with incremental boosting of facial parsing information (IFPSR) for high-magnification of low-resolution faces. The proposed IFPSR method consists of three main parts: i) a three-stage parsing-map-embedded features upsampling network, in which image recovery and prior estimation are performed simultaneously and progressively to improve image resolution; ii) a progressive training method and a joint facial attention and heatmap loss to obtain better facial attributes; and iii) a channel attention strategy in residual dense blocks to adaptively learn facial features. Extensive experimental results show that, in terms of quantitative and qualitative metrics, our approach achieves an outstanding balance between SR image quality and network complexity compared with the state-of-the-art methods.
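Part iii) refers to channel attention; the generic Squeeze-and-Excitation block (Hu et al.) it is commonly built on can be sketched as follows, with an illustrative reduction ratio.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):                        # x: (N, C, H, W)
        w = x.mean(dim=(2, 3))                   # squeeze: global average pool
        w = self.fc(w).unsqueeze(-1).unsqueeze(-1)
        return x * w                             # excite: reweight channels

out = SEBlock(64)(torch.randn(2, 64, 32, 32))    # same shape, channels rescaled
```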

Mask-Based Style-Controlled Image Synthesis Using a Mask Style Encoder

Jaehyeong Cho, Wataru Shimoda, Keiji Yanai
Track 5: Image and Signal Processing
Fri 15 Jan 2021 at 15:00 in session PS T5.7

Auto-TLDR; Style-controlled Image Synthesis from Semantic Segmentation masks using GANs

In recent years, advances in Generative Adversarial Networks (GANs) have shown impressive results for image generation and translation tasks. In particular, image-to-image translation learns a mapping from a source domain to a target domain and synthesizes an image; it can be applied to a variety of tasks, making it possible to quickly and easily synthesize realistic images from semantic segmentation masks. However, existing image-to-image translation methods offer limited control over the style of the translated image, and it is not easy to synthesize an image while controlling the style of each mask element in detail. Therefore, we propose an image synthesis method that controls the style of each element by improving the existing image-to-image translation approach. In the proposed method, we implement a style encoder that extracts style features for each mask element. The extracted style features are concatenated with the semantic mask in the normalization layer and used for style-controlled synthesis of each mask element (see the sketch below). In experiments, we train style-controlled image synthesis on datasets consisting of semantic segmentation masks and real images. The results show that the proposed method performs well at style-controlled image synthesis for each element.
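A hedged sketch of what concatenating per-element style features into the normalization layer can look like, in the spirit of SPADE-style conditional normalization; the broadcasting of style codes over mask regions and all sizes are assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class StyleMaskNorm(nn.Module):
    def __init__(self, channels, n_classes, style_dim):
        super().__init__()
        self.norm = nn.InstanceNorm2d(channels, affine=False)
        self.gamma = nn.Conv2d(n_classes + style_dim, channels, 3, padding=1)
        self.beta = nn.Conv2d(n_classes + style_dim, channels, 3, padding=1)

    def forward(self, x, mask, styles):
        # mask: (N, n_classes, H, W) one-hot; styles: (N, n_classes, style_dim)
        style_map = torch.einsum("nkhw,nkd->ndhw", mask, styles)  # style per region
        cond = torch.cat([mask, style_map], dim=1)                # concat with mask
        return self.norm(x) * (1 + self.gamma(cond)) + self.beta(cond)

x = torch.randn(1, 64, 32, 32)
mask = torch.zeros(1, 5, 32, 32); mask[:, 0, :, :16] = 1; mask[:, 1, :, 16:] = 1
out = StyleMaskNorm(64, 5, 8)(x, mask, torch.randn(1, 5, 8))
```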

Automatically Gather Address Specific Dwelling Images Using Google Street View

Salman Khan, Carl Salvaggio
Track 3: Computer Vision Robotics and Intelligent Systems
Tue 12 Jan 2021 at 15:00 in session PS T3.1

Auto-TLDR; Automatic Address Specific Dwelling Image Collection Using Google Street View Data

Exciting research is being conducted using Google's Street View imagery. Researchers have access to training data that enables CNN training for topics ranging from assessing neighborhood environments to estimating the age of a building. However, due to the uncontrolled nature of imagery available via Google's Street View API, data collection can be lengthy and tedious. In an effort to help researchers gather address-specific dwelling images efficiently, we developed a novel way of automatically performing this task. This was accomplished by exploiting Google's publicly available platform with a combination of three separate network types and post-processing techniques. Our custom-developed NMS technique (a generic version is sketched below) helped achieve 99.4% valid, address-specific dwelling images.
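The generic greedy non-maximum suppression routine referenced above looks like this for axis-aligned boxes; the authors' customized variant may differ in its details.

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """boxes: (N, 4) as [x1, y1, x2, y2]; returns indices of kept boxes."""
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]               # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # intersection of the top box with every remaining box
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        order = order[1:][iou <= iou_thresh]     # drop heavily overlapping boxes
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], float)
print(nms(boxes, np.array([0.9, 0.8, 0.7])))     # -> [0, 2]
```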

On Learning Random Forests for Random Forest Clustering

Manuele Bicego, Francisco Escolano
Track 1: Artificial Intelligence, Machine Learning for Pattern Analysis
Tue 12 Jan 2021 at 15:00 in session PS T1.2

Auto-TLDR; Learning Random Forests for Clustering

In this paper we study the poorly investigated problem of learning Random Forests for distance-based Random Forest clustering. We studied both classic schemes and alternative approaches that are novel in this context. In particular, we investigated the suitability of Gaussian Density Forests, Random Forests specifically designed for density estimation. Further, we introduce a novel variant of Random Forest based on an effective non-parametric bypass estimator of the Rényi entropy, which can be useful when the parametric assumption is too strict. An empirical evaluation involving different datasets and different RF-clustering strategies confirms that the learning step is crucial for RF-clustering. We also present a set of practical guidelines for determining the most suitable variant of RF-clustering for the problem under examination.
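As a baseline illustration of distance-based RF clustering, the sketch below derives a proximity matrix from leaf co-occurrence (the fraction of trees routing two samples to the same leaf) and clusters 1 - proximity. Growing the forest on random labels is a crude stand-in for the learning schemes the paper actually studies.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestClassifier

X, _ = make_blobs(n_samples=200, centers=3, random_state=0)
rng = np.random.default_rng(0)
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X, rng.integers(0, 2, len(X)))       # labels only serve to grow the trees

leaves = rf.apply(X)                        # (n_samples, n_trees) leaf indices
proximity = (leaves[:, None, :] == leaves[None, :, :]).mean(axis=2)
labels = AgglomerativeClustering(
    n_clusters=3, metric="precomputed", linkage="average"
).fit_predict(1.0 - proximity)
```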

Human Embryo Cell Centroid Localization and Counting in Time-Lapse Sequences

Lisette Lockhart, Parvaneh Saeedi, Jason Au, Jon Havelock
Track 5: Image and Signal Processing
Tue 12 Jan 2021 at 17:00 in session PS T5.1

Auto-TLDR; Automated Estimation of Embryo Cell Stage in Time-Lapse Sequences

Couples suffering from infertility often turn to In Vitro Fertilization (IVF) treatment. Continuous embryo monitoring with time-lapse imaging enables time-based development metrics, alongside visual features, for assessing an embryo's quality before transfer. Tracking embryonic cell development provides valuable information about the likelihood of a positive pregnancy. Automating this task is challenging due to cell overlap, occlusion, and variation. In this paper, cell stage is identified by counting detected cell centroids in early-embryo time-lapse sequences. A convolutional regression network is trained on Gaussian-annotated centroid maps to localize cell centroids. Added network attention blocks encode the spatio-temporal relationships in time-lapse sequences, emphasizing relevant features in the current frame based on the previous frame and cell (i.e., blastomere) movement. The proposed approach was applied to 108 embryo sequences covering the 1- to 4-cell stages, achieving a cell centroid localization error of 3.98 pixels, a cell detection rate of 80.9%, and a cell counting accuracy of 80.2%.
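A small sketch of the annotation-and-counting scheme described: each centroid is rendered as a Gaussian blob in a target map, and cells are counted by detecting local maxima in a predicted map (here the ground-truth map stands in). Kernel width and peak threshold are assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter

def centroid_map(shape, centroids, sigma=3.0):
    """Render one Gaussian blob per annotated centroid."""
    m = np.zeros(shape)
    for r, c in centroids:
        m[r, c] = 1.0
    return gaussian_filter(m, sigma)

def count_cells(pred_map, min_peak=0.005):
    """Count local maxima above a small threshold."""
    peaks = (pred_map == maximum_filter(pred_map, size=9)) & (pred_map > min_peak)
    return int(peaks.sum())

target = centroid_map((128, 128), [(40, 40), (60, 80), (90, 50)])
print(count_cells(target))   # -> 3
```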