Exploring Seismocardiogram Biometrics with Wavelet Transform

Po-Ya Hsu, Po-Han Hsu, Hsin-Li Liu

Auto-TLDR; Seismocardiogram Biometric Matching Using Wavelet Transform and Deep Learning Models

The seismocardiogram (SCG) has become easily accessible in the past decade owing to advances in sensor technology. However, SCG biometrics have not been widely explored. In this paper, we propose combining the wavelet transform with deep learning models, machine learning classifiers, or the structural similarity metric to perform SCG biometric matching tasks. We validate the proposed methods on a publicly available dataset from the PhysioNet database, which contains one hour of electrocardiogram, breathing, and SCG data from 20 subjects. We train the models on the first five minutes of SCG and conduct identification on the last five minutes. We evaluate identification and authentication performance with the recognition rate and the equal error rate, respectively. Based on the results, we show that wavelet-transformed SCG biometrics can achieve state-of-the-art performance when combined with deep learning models, machine learning classifiers, or structural similarity.
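
To make the matching idea concrete, below is a minimal sketch of one variant described in the abstract: comparing wavelet scalograms of two SCG segments with the structural similarity metric. The wavelet choice, scale range, and the assumption of equal-length, beat-aligned segments are illustrative, not the authors' exact settings.

```python
import numpy as np
import pywt
from skimage.metrics import structural_similarity

def scg_scalogram(segment, scales=np.arange(1, 65), wavelet="morl"):
    """Continuous wavelet transform of a 1-D SCG segment -> normalised 2-D scalogram."""
    coeffs, _ = pywt.cwt(segment, scales, wavelet)
    mag = np.abs(coeffs)
    return (mag - mag.min()) / (np.ptp(mag) + 1e-12)  # scale to [0, 1]

def scg_match_score(enrolled_segment, probe_segment):
    """Higher score -> more likely the same subject (segments must be equal length)."""
    a = scg_scalogram(enrolled_segment)
    b = scg_scalogram(probe_segment)
    return structural_similarity(a, b, data_range=1.0)
```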

Similar papers

A Low-Complexity R-Peak Detection Algorithm with Adaptive Thresholding for Wearable Devices

Tiago Rodrigues, Hugo Plácido Da Silva, Ana Luisa Nobre Fred, Sirisack Samoutphonh

Auto-TLDR; Real-Time and Low-Complexity R-peak Detection for Single Lead ECG Signals

A reliable detection of R-peaks in an electrocardiogram (ECG) time series is a fundamental step for further rhythm and heart rate variability (HRV) analyses, biometric recognition techniques, and additional ECG-waveform-based analyses. In this paper, a novel real-time, low-complexity R-peak detection algorithm is presented for single-lead ECG signals. The detection algorithm is divided into two stages. In the first, pre-processing stage, the QRS complex is enhanced by taking the double derivative, squaring, and moving-window integration. In the second stage, the R-peak is detected using a finite state machine approach. The detection threshold is dynamically adapted and follows an exponential decay after each detection, making the algorithm suitable for R-peak detection under fast heart rate and R-wave amplitude changes with no additional search-back. The proposed algorithm was evaluated on a private single-lead ECG database acquired using a FieldWiz wearable device. The database comprises five recordings from four subjects, recorded during dynamic conditions: running, trail running, and gym sessions. The raw ECG signals were annotated for the R-peak, and the proposed method was benchmarked against common QRS detectors. The combined acquisition setup and presented approach resulted in an R-peak detection sensitivity (Se) of 99.77% and a positive predictive value (PPV) of 99.18%, comparable to state-of-the-art real-time QRS detectors. Due to its low computational complexity, this method can be implemented in embedded wearable systems, making it suited for R-peak detection in cardiovascular tracking devices under dynamic use cases.
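
The two-stage pipeline lends itself to a compact sketch. The version below follows the description above (double derivative, squaring, moving-window integration, then an exponentially decaying threshold with a refractory period); the window length, decay constant, and refractory period are illustrative assumptions rather than the paper's exact parameters.

```python
import numpy as np

def qrs_enhance(ecg, fs):
    d2 = np.diff(ecg, n=2)                    # double derivative emphasises QRS slope
    sq = d2 ** 2                              # squaring rectifies the signal
    win = max(1, int(0.10 * fs))              # ~100 ms moving-window integration
    return np.convolve(sq, np.ones(win) / win, mode="same")

def detect_r_peaks(ecg, fs, decay_per_s=0.97, refractory_s=0.25):
    feat = qrs_enhance(ecg, fs)
    thr = 0.5 * feat[: int(2 * fs)].max()     # bootstrap threshold from the first 2 s
    peaks, last = [], -len(feat)
    for i in range(1, len(feat) - 1):
        thr *= decay_per_s ** (1.0 / fs)      # exponential decay between detections
        if feat[i - 1] < feat[i] >= feat[i + 1] and feat[i] > thr \
                and (i - last) / fs > refractory_s:
            peaks.append(i)
            last = i
            thr = feat[i]                     # threshold resets after each detection
    return np.array(peaks)
```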

EasiECG: A Novel Inter-Patient Arrhythmia Classification Method Using ECG Waves

Chuanqi Han, Ruoran Huang, Fang Yu, Xi Huang, Li Cui

Auto-TLDR; EasiECG: Attention-based Convolution Factorization Machines for Arrhythmia Classification

In an ECG record, the PQRST waves are of important medical significance, providing ample information that reflects heartbeat activity. In this paper, we propose a novel arrhythmia classification method, EasiECG, characterized by simplicity and accuracy. Compared with other works, EasiECG takes the configuration of these five key waves into account and does not require complicated feature engineering. Meanwhile, an additional encoding of the extracted features makes EasiECG applicable even to samples with missing waves. To automatically capture the interactions among the processed features that contribute to the classification, a novel classification model named Attention-based Convolution Factorization Machines (ACFM) is proposed. In detail, the ACFM learns both linear and high-order interactions, from linear regression and from convolution on outer-product feature-interaction maps, respectively. An attention mechanism implemented in the model then assigns different importance to these interactions when predicting certain types of heartbeats. To validate the effectiveness and practicability of EasiECG, extensive experiments under the inter-patient paradigm are conducted on the benchmark MIT-BIH arrhythmia database. To tackle the imbalanced-sample problem in this dataset, the focal loss is adopted during training. The experimental results show that our method is competitive with other state-of-the-art methods, especially in classifying supraventricular ectopic beats. Moreover, EasiECG achieves an overall accuracy of 87.6% on samples with a missing wave in the related experiment, demonstrating the robustness of the proposed method.

Hybrid Network for End-To-End Text-Independent Speaker Identification

Wajdi Ghezaiel, Luc Brun, Olivier Lezoray

Auto-TLDR; Text-Independent Speaker Identification with Scattering Wavelet Network and Convolutional Neural Networks

Deep learning has recently improved the performance of speaker identification (SI) systems, and promising results have been obtained with convolutional neural networks (CNNs). This success is mostly driven by the advent of large datasets. However, in the context of commercial applications, collecting a large amount of training data is not always possible. In addition, the robustness of an SI system is adversely affected by short utterances. SI with only a few short utterances is therefore a challenging problem. In this paper, we propose a novel text-independent speaker identification system that can identify speakers by learning from only a few short training utterances. To achieve this, we combine a CNN with a scattering wavelet network: a two-stage feature extraction framework in which a two-layer wavelet scattering network is coupled with a CNN. The proposed architecture takes variable-length speech segments. To evaluate the effectiveness of the proposed approach, the TIMIT and LibriSpeech datasets are used in the experiments. These experiments show that our hybrid architecture performs successfully for SI, even with a small number of short training samples. In comparison with related methods, the obtained results show that the hybrid architecture achieves better performance.
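
As a rough illustration of the two-stage front end, the sketch below assumes the kymatio library for the wavelet scattering transform and feeds its coefficients to a small CNN head; J, Q, the utterance length, and the number of speakers are all illustrative assumptions.

```python
import torch
import torch.nn as nn
from kymatio.torch import Scattering1D

T = 16000                                     # e.g. a 1 s utterance at 16 kHz
scattering = Scattering1D(J=6, shape=T, Q=8)  # two-layer wavelet scattering network

x = torch.randn(4, T)                         # a mini-batch of 4 raw waveforms
Sx = scattering(x)                            # -> (4, C, T // 2**J) scattering paths

cnn = nn.Sequential(                          # CNN head on the scattering coefficients
    nn.Conv1d(Sx.shape[1], 64, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool1d(1), nn.Flatten(),    # pooling copes with variable-length input
    nn.Linear(64, 10),                        # 10 = assumed number of speakers
)
logits = cnn(Sx)
```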

Epileptic Seizure Prediction: A Semi-Dilated Convolutional Neural Network Architecture

Ramy Hussein, Rabab K. Ward, Soojin Lee, Martin Mckeown

Auto-TLDR; Semi-Dilated Convolutional Network for Seizure Prediction using EEG Scalograms

Despite many recent advances in machine learning and time-series classification, accurate prediction of seizures remains elusive. In this work, we develop a convolutional network module that uses electroencephalogram (EEG) scalograms to distinguish between pre-seizure and normal brain activity. Since the EEG scalogram is a rectangular image with many more temporal bins than spectral bins, the presented module uses "semi-dilated convolutions" to create a proportional, non-square receptive field. The proposed semi-dilated convolutions support exponential expansion of the receptive field over the long dimension (image width, i.e., time) while maintaining high resolution over the short dimension (image height, i.e., frequency). The proposed architecture comprises a set of cooperative semi-dilated convolutional blocks, each of which has a stack of parallel semi-dilated convolutional modules with different dilation rates. Results show that our proposed seizure prediction solution outperforms state-of-the-art methods, achieving a seizure prediction sensitivity of 88.45% and 89.52% on the American Epilepsy Society and Melbourne University EEG datasets, respectively.
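
A minimal sketch of the semi-dilated idea is shown below: dilation grows only along the time axis (image width), keeping dense sampling along frequency. Channel counts and dilation rates are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SemiDilatedBlock(nn.Module):
    """Parallel 3x3 convolutions whose dilation grows only along the time axis."""
    def __init__(self, in_ch, out_ch, rates=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, kernel_size=3,
                      dilation=(1, r), padding=(1, r))   # (frequency, time) dilation
            for r in rates
        )
    def forward(self, x):                     # x: (batch, ch, frequency, time)
        return torch.cat([b(x) for b in self.branches], dim=1)

scalogram = torch.randn(2, 1, 64, 512)        # 64 spectral bins, 512 temporal bins
out = SemiDilatedBlock(1, 8)(scalogram)       # -> (2, 32, 64, 512)
```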

Generalized Iris Presentation Attack Detection Algorithm under Cross-Database Settings

Mehak Gupta, Vishal Singh, Akshay Agarwal, Mayank Vatsa, Richa Singh

Auto-TLDR; MVNet: A Deep Learning-based PAD Network for Iris Recognition against Presentation Attacks

The deployment of person identification based on biometric features has increased significantly, from border access to mobile unlock to electronic transactions. Iris recognition is considered one of the most accurate biometric modalities for person identification. However, the vulnerability of this modality to presentation attacks, especially 3D contact lenses, can limit its potential deployment. Textured lenses are so effective in hiding the real texture of the iris that they can fool not only automatic recognition algorithms but also human examiners. Several presentation attack detection (PAD) algorithms have been presented in the literature; however, a significant limitation is generalizability to an unseen database, an unseen sensor, or a different imaging environment. Inspired by the success of hybrid algorithms and the fusion of multiple detection networks, we propose a deep learning-based PAD network that utilizes multiple feature representation layers. Computational complexity is an essential factor in training deep neural networks; therefore, to limit the computational complexity while learning multiple feature representation layers, the same base model is kept. The network is trained end-to-end using a softmax classifier. We evaluate the performance of the proposed network, termed MVNet, using multiple databases, such as IIITD-WVU MUIPA and IIITD-WVU UnMIPA, under cross-database training-testing settings. Extensive experiments are performed to assess the generalizability of the proposed algorithm.

Super-Resolution Guided Pore Detection for Fingerprint Recognition

Syeda Nyma Ferdous, Ali Dabouei, Jeremy Dawson, Nasser M. Nasrabadi

Auto-TLDR; Super-Resolution Generative Adversarial Network for Fingerprint Recognition Using Pore Features

The performance of fingerprint recognition algorithms relies substantially on the fine features extracted from fingerprints. Apart from minutiae and ridge patterns, pore features have proven usable for fingerprint recognition. Although minutiae and ridge-pattern features are quite attainable from low-resolution images, using pore features is practical only if the fingerprint image is of high resolution, which necessitates a model that enhances the image quality of conventional 500 ppi legacy fingerprints while preserving fine details. To recover pore information from low-resolution fingerprints, we adopt a joint learning-based approach that combines super-resolution and pore detection networks. Our modified single-image Super-Resolution Generative Adversarial Network (SRGAN) framework reliably reconstructs high-resolution fingerprint samples from low-resolution ones, assisting the pore detection network in identifying pores with high accuracy. The network jointly learns a distinctive feature representation from a real low-resolution fingerprint sample and successfully synthesizes a high-resolution sample from it. To add discriminative information and uniqueness for all subjects, we integrate features extracted from a deep fingerprint verifier with the SRGAN quality discriminator. We also add a ridge reconstruction loss, utilizing ridge patterns to make the best use of the extracted features. Our proposed method solves the recognition problem by improving the quality of fingerprint images. The high recognition accuracy of the synthesized samples, close to that achieved using the original high-resolution images, validates the effectiveness of our proposed model.

Automatic Classification of Human Granulosa Cells in Assisted Reproductive Technology Using Vibrational Spectroscopy Imaging

Marina Paolanti, Emanuele Frontoni, Giorgia Gioacchini, Elisabetta Giorgini, Valentina Notarstefano, Carlotta Zacà, Oliana Carnevali, Andrea Borini, Marco Mameli

Auto-TLDR; Predicting Oocyte Quality in Assisted Reproductive Technology Using Machine Learning Techniques

In the field of reproductive technology, the biochemical composition of female gametes has been successfully investigated with the use of vibrational spectroscopy. Currently, in assisted reproductive technology (ART), there are no shared criteria for the choice of oocyte, and automatic classification methods for the best-quality oocytes have not yet been applied. In this paper, considering this lack of criteria, we use machine learning (ML) techniques to predict oocyte quality for a successful pregnancy. To improve the chances of successful implantation and minimize complications during pregnancy, Fourier transform infrared microspectroscopy (FTIRM) analysis was applied to granulosa cells (GCs) collected along with the oocytes during oocyte aspiration, as is routinely done in ART, and specific spectral biomarkers were selected by multivariate statistical analysis. A proprietary biological reference dataset (BRD) was successfully collected to predict the best oocyte for a successful pregnancy. Personal health information is stored, maintained, and backed up using a cloud computing service. Through a user-friendly interface, the user evaluates whether or not the selected oocyte will have a positive result; this interface includes a dashboard for retrospective analysis, reporting, real-time processing, and statistical analysis. The experimental results are promising and confirm the efficiency of the method in terms of the classification metrics: precision, recall, and F1-score (F1).

Exploring Spatial-Temporal Representations for fNIRS-based Intimacy Detection via an Attention-enhanced Cascade Convolutional Recurrent Neural Network

Chao Li, Qian Zhang, Ziping Zhao

Auto-TLDR; Intimate Relationship Prediction by Attention-enhanced Cascade Convolutional Recurrent Neural Network Using Functional Near-Infrared Spectroscopy

The detection of intimacy plays a crucial role in improving intimate relationships, which in turn promotes family and social harmony. Previous studies have shown that different degrees of intimacy produce significant differences in brain imaging. Recently, a few works have emerged that recognise intimacy automatically using machine learning techniques. Considering the temporal dynamics of intimate relationships in their neural mechanisms, how to effectively model spatio-temporal dynamics for intimacy prediction is still a challenge. In this paper, we propose a novel method to explore deep spatial-temporal representations for intimacy prediction with an Attention-enhanced Cascade Convolutional Recurrent Neural Network (ACCRNN). Given its advantageous time-frequency resolution for analysing complex neuronal activity, we use functional near-infrared spectroscopy (fNIRS) to analyse and infer intimate relationships, and we collect an fNIRS-based dataset for this purpose. Forty-two-channel fNIRS signals were recorded from the prefrontal cortex of 44 subjects while they watched a total of 18 photos of lovers, friends, and strangers for 30 seconds per photo. The experimental results show that our proposed method outperforms the others in terms of accuracy, with a precision of 96.5%. To the best of our knowledge, this is the first time that such a hybrid deep architecture has been employed for fNIRS-based intimacy prediction.

3D Facial Matching by Spiral Convolutional Metric Learning and a Biometric Fusion-Net of Demographic Properties

Soha Sadat Mahdi, Nele Nauwelaers, Philip Joris, Giorgos Bouritsas, Sergiy Bokhnyak, Susan Walsh, Mark Shriver, Michael Bronstein, Peter Claes

Auto-TLDR; Multi-biometric Fusion for Biometric Verification using 3D Facial Meshes

Face recognition is a widely accepted biometric verification tool, as the face contains a lot of information about the identity of a person. In this study, a two-step neural pipeline is presented for matching 3D facial shape to multiple DNA-related properties (sex, age, BMI, and genomic background). The first step is a triplet-loss-based metric learner that compresses facial shape into a lower-dimensional embedding while preserving information about the property of interest. Most studies in the field of metric learning have focused only on Euclidean data; in this work, geometric deep learning is employed to learn directly from 3D facial meshes. To this end, spiral convolutions are used along with a novel mesh-sampling scheme that retains uniformly sampled 3D points at different levels of resolution. The second step is multi-biometric fusion by a fully connected neural network. The network takes an ensemble of embeddings and property labels as input and returns genuine and impostor scores. Since embeddings are accepted as input, there is no need to train classifiers for the different properties, and the available data can be used more efficiently. Results obtained by 10-fold cross-validation for biometric verification show that combining multiple properties leads to stronger biometric systems. Furthermore, the proposed neural pipeline outperforms a linear baseline consisting of principal component analysis followed by classification with linear support vector machines and a Naïve Bayes-based score fuser.

One-Shot Representational Learning for Joint Biometric and Device Authentication

Sudipta Banerjee, Arun Ross

Auto-TLDR; Joint Biometric and Device Recognition from a Single Biometric Image

In this work, we propose a method to simultaneously perform (i) biometric recognition (i.e., identifying the individual) and (ii) device recognition (i.e., identifying the device) from a single biometric image, say a face image, using a one-shot schema. Such a joint recognition scheme can be useful in devices such as smartphones for enhancing security as well as privacy. We propose to automatically learn a joint representation that encapsulates both biometric-specific and sensor-specific features. We evaluate the proposed approach using iris, face, and periocular images acquired with near-infrared iris sensors and smartphone cameras. Experiments conducted on 14,451 images from 13 sensors resulted in a rank-1 identification accuracy of up to 99.81% and a verification accuracy of up to 100% at a false match rate of 1%.

Multi-Scale and Attention Based ResNet for Heartbeat Classification

Haojie Zhang, Gongping Yang, Yuwen Huang, Feng Yuan, Yilong Yin

Auto-TLDR; A Multi-Scale and Attention based ResNet for ECG heartbeat classification in intra-patient and inter-patient paradigms

This paper presents a novel deep learning framework for electrocardiogram (ECG) heartbeat classification. Although some studies have reported excellent overall accuracy, they have been less accurate in diagnosing arrhythmia classes such as the supraventricular ectopic beat (SVEB) and ventricular ectopic beat (VEB). In our work, we propose a multi-scale and attention-based ResNet for heartbeat classification in the intra-patient and inter-patient paradigms, respectively. Firstly, we extract shallow features from a convolutional layer. Secondly, the shallow features are sent into three branches with different convolution kernels in order to combine receptive fields of different sizes. Finally, fully connected layers are used to classify the heartbeat. In addition, we design a new attention mechanism based on the characteristics of heartbeat data. Extensive experiments on a benchmark dataset demonstrate the effectiveness of our proposed model.
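
The multi-scale branching step can be sketched as follows; the kernel sizes, channel widths, and segment length are illustrative assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn

class MultiScaleBlock(nn.Module):
    """Three parallel branches with different kernel sizes, outputs concatenated."""
    def __init__(self, ch):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv1d(ch, ch, kernel_size=k, padding=k // 2) for k in (3, 5, 7)
        )
    def forward(self, x):                     # x: (batch, channels, samples)
        return torch.cat([b(x) for b in self.branches], dim=1)

beats = torch.randn(8, 32, 280)               # 8 heartbeats, 32 shallow feature channels
fused = MultiScaleBlock(32)(beats)            # -> (8, 96, 280), ready for the FC classifier
```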

Explainable Online Validation of Machine Learning Models for Practical Applications

Wolfgang Fuhl, Yao Rong, Thomas Motz, Michael Scheidt, Andreas Markus Hartel, Andreas Koch, Enkelejda Kasneci

Auto-TLDR; A Reformulation of Regression and Classification for Machine Learning Algorithm Validation

We present a reformulation of regression and classification that aims to validate the result of a machine learning algorithm. Our reformulation simplifies the original problem and validates the result of the machine learning algorithm using the training data. Since the validation of machine learning algorithms must always be explainable, we perform our experiments with the kNN algorithm as well as with an algorithm based on conditional probabilities, which is proposed in this work. For the evaluation of our approach, three publicly available data sets were used, covering three classification and two regression problems. The presented algorithm based on conditional probabilities is also online-capable and requires only a fraction of the memory of the kNN algorithm.

EEG-Based Cognitive State Assessment Using Deep Ensemble Model and Filter Bank Common Spatial Pattern

Debashis Das Chakladar, Shubhashis Dey, Partha Pratim Roy, Masakazu Iwamura

Auto-TLDR; A Deep Ensemble Model for Cognitive State Assessment using EEG-based Cognitive State Analysis

Electroencephalography (EEG) is the physiological measure most used to efficiently evaluate the cognitive state of a user. As EEG inherently suffers from poor spatial resolution, features extracted from each EEG channel may not be used efficiently for cognitive state assessment. In this paper, EEG-based cognitive state assessment is performed during a mental arithmetic experiment, which includes two cognitive states (task and rest) of a user. To obtain both the temporal and the spatial resolution of the EEG signal, we combine the Filter Bank Common Spatial Pattern (FBCSP) method and a Long Short-Term Memory (LSTM)-based deep ensemble model for classifying the cognitive state of a user. Subject-wise data distribution is used owing to the execution of a large volume of data in a low-computing environment. In the FBCSP method, the input EEG is decomposed into multiple equal-sized frequency bands, and spatial features of each frequency band are extracted using the Common Spatial Pattern (CSP) algorithm. Next, a feature selection algorithm is applied to identify the most informative features for classification. The proposed deep ensemble model consists of multiple similarly structured LSTM networks that work in parallel. The output of the ensemble model (i.e., the cognitive state of a user) is computed as the weighted average of the individual model predictions. The proposed model achieves 87% classification accuracy and can effectively estimate the cognitive state of a user in a low-computing environment.
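
A compact sketch of the FBCSP stage is given below: band-pass the multi-channel EEG into equal-width bands, then extract log-variance CSP features per band. The band edges, filter order, and number of spatial filters are illustrative assumptions.

```python
import numpy as np
from scipy.signal import butter, filtfilt
from scipy.linalg import eigh

def bandpass(eeg, lo, hi, fs, order=4):
    b, a = butter(order, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    return filtfilt(b, a, eeg, axis=-1)            # eeg: (channels, samples)

def csp_filters(trials_task, trials_rest, n_pairs=2):
    """Each trial is a (channels, samples) array; returns (2*n_pairs, channels)."""
    cov = lambda ts: np.mean([t @ t.T / np.trace(t @ t.T) for t in ts], axis=0)
    Ct, Cr = cov(trials_task), cov(trials_rest)
    vals, vecs = eigh(Ct, Ct + Cr)                 # generalized eigenproblem
    keep = np.r_[np.argsort(vals)[:n_pairs], np.argsort(vals)[-n_pairs:]]
    return vecs[:, keep].T                         # most discriminative spatial filters

def csp_features(trial, W):
    var = np.var(W @ trial, axis=1)                # variance of spatially filtered signals
    return np.log(var / var.sum())                 # classic log-variance CSP features
```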

Finger Vein Recognition and Intra-Subject Similarity Evaluation of Finger Veins Using the CNN Triplet Loss

Georg Wimmer, Bernhard Prommegger, Andreas Uhl

Auto-TLDR; Finger vein recognition using CNNs and hard triplet online selection

Finger vein recognition deals with the identification of subjects based on the venous pattern within their fingers. There is substantial prior work using handcrafted features, but only little work using CNN-based recognition systems. This article proposes a new CNN-based approach that utilizes the triplet loss function together with hard triplet online selection for finger vein recognition. The CNNs are used for three different use cases: (1) the classical recognition use case, where every finger of a subject is considered a separate class; (2) an evaluation of the similarity of left- and right-hand fingers from the same subject; and (3) an evaluation of the similarity of different fingers of the same subject. The results show that the proposed approach achieves superior results compared to prior work on finger vein recognition using the triplet loss function. Furthermore, we show that different fingers of the same subject, especially the same fingers from the left and right hands, show enough similarity to perform recognition. This last statement contradicts the current understanding in the finger vein biometrics literature, in which it is assumed that different fingers of the same subject are unique identities.
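
Triplet loss with hard online selection can be sketched in a few lines: within a batch, each anchor is paired with its hardest positive and hardest negative. The margin value and the assumption that every batch contains several samples per class are illustrative.

```python
import torch

def batch_hard_triplet_loss(embeddings, labels, margin=0.2):
    """Assumes each batch holds several samples per class (e.g. PK sampling)."""
    d = torch.cdist(embeddings, embeddings)                    # pairwise distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)
    hardest_pos = d.masked_fill(~same, float("-inf")).max(dim=1).values
    hardest_neg = d.masked_fill(same, float("inf")).min(dim=1).values
    return torch.relu(hardest_pos - hardest_neg + margin).mean()
```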

Learning Disentangled Representations for Identity Preserving Surveillance Face Camouflage

Jingzhi Li, Lutong Han, Hua Zhang, Xiaoguang Han, Jingguo Ge, Xiaochu Cao

Auto-TLDR; Individual Face Privacy under Surveillance Scenario with Multi-task Loss Function

In this paper, we focus on protecting personal face privacy under surveillance scenarios, where the goal is to change the visual appearance of faces while keeping them recognizable by current face recognition systems. This is a challenging problem, as we must retain the most important structures of the captured facial images while altering the salient facial regions to protect personal privacy. To address this problem, we introduce a novel individual face protection model, which can camouflage the face appearance from the perspective of human visual perception while preserving the identity features used for face authentication. To that end, we develop an encoder-decoder network architecture that disentangles the person's feature representation into an appearance code and an identity code. Specifically, we first randomly divide the face images into two groups, a source set and a target set, where the source set is used to extract the identity code and the target set provides the appearance code. Then, we recombine the identity and appearance codes to synthesize a new face that has the same identity as the source subject. Finally, the synthesized face is used to replace the original face and protect the privacy of the individual. Furthermore, our model is trained end-to-end with a multi-task loss function, which better preserves identity and stabilizes the training loss. Experiments conducted on the Cross-Age Celebrity dataset demonstrate the effectiveness of our model and validate its superiority in terms of visual quality and scalability.

Electroencephalography Signal Processing Based on Textural Features for Monitoring the Driver’s State by a Brain-Computer Interface

Giulia Orrù, Marco Micheletto, Fabio Terranova, Gian Luca Marcialis

Auto-TLDR; One-dimensional Local Binary Pattern Algorithm for Estimating Driver Vigilance in a Brain-Computer Interface System

In this study, we investigate a textural processing method for the electroencephalography (EEG) signal as an indicator to estimate the driver's vigilance in a hypothetical Brain-Computer Interface (BCI) system. The novelty of the proposed solution lies in employing the one-dimensional Local Binary Pattern (1D-LBP) algorithm for feature extraction from pre-processed EEG data. From the resulting feature vector, classification is done according to three vigilance classes: awake, tired, and drowsy. The claim is that class transitions can be detected by describing the variations of the micro-patterns' occurrences along the EEG signal. The 1D-LBP describes them by encoding mutual variations of temporally "close" samples as a short bit-code. Our analysis allows us to conclude that the adoption of the 1D-LBP leads to a significant performance improvement. Moreover, capturing the class transitions from the EEG signal is effective, although the overall performance is not yet good enough to develop a BCI for assessing the driver's vigilance in real environments.
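
Since the 1D-LBP descriptor is simple, a minimal sketch is given below: each sample is thresholded against its P temporal neighbours, and the histogram of the resulting bit-codes serves as the feature vector. The neighbourhood size is an illustrative assumption.

```python
import numpy as np

def lbp_1d_histogram(signal, p=8):
    """Occurrence distribution of 1D-LBP codes along a 1-D signal."""
    half = p // 2
    codes = []
    for i in range(half, len(signal) - half):
        neighbours = np.r_[signal[i - half:i], signal[i + 1:i + half + 1]]
        bits = (neighbours >= signal[i]).astype(int)   # compare against the centre sample
        codes.append(int("".join(map(str, bits)), 2))  # short bit-code per position
    hist, _ = np.histogram(codes, bins=2 ** p, range=(0, 2 ** p))
    return hist / max(hist.sum(), 1)
```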

SoftmaxOut Transformation-Permutation Network for Facial Template Protection

Hakyoung Lee, Cheng Yaw Low, Andrew Teoh

Auto-TLDR; SoftmaxOut Transformation-Permutation Network for Cancellable Biometrics

In this paper, we propose a data-driven cancellable biometrics scheme, referred to as the SoftmaxOut Transformation-Permutation Network (SOTPN). The SOTPN is a neural version of the Random Permutation Maxout (RPM) transform, which was introduced for facial template protection. We present a specialized SoftmaxOut layer integrated with permutable MaxOut units and a parameterized softmax function to approximate the non-differentiable permutation and winner-takes-all operations of the RPM transform. On top of that, a novel pairwise ArcFace loss and a code-balancing loss are formulated to ensure that the SOTPN-transformed facial template is cancellable, discriminative, high-entropy, and free from quantization errors when coupled with the SoftmaxOut layer. The proposed SOTPN is evaluated on three face datasets, namely LFW, YouTube Faces, and FaceScrub, and our experimental results show that the SOTPN significantly outperforms the RPM transform.

Influence of Event Duration on Automatic Wheeze Classification

Bruno M Rocha, Diogo Pessoa, Alda Marques, Paulo Carvalho, Rui Pedro Paiva

Auto-TLDR; Experimental Design of the Non-wheeze Class for Wheeze Classification

Patients with respiratory conditions typically exhibit adventitious respiratory sounds, such as wheezes. Wheeze events have variable duration. In this work we studied the influence of event duration on wheeze classification, namely how the creation of the non-wheeze class affected the classifiers' performance. First, we evaluated several classifiers on an open access respiratory sound database, with the best one reaching sensitivity and specificity values of 98% and 95%, respectively. Then, by changing one parameter in the design of the non-wheeze class, i.e., event duration, the best classifier only reached sensitivity and specificity values of 53% and 75%, respectively. These results demonstrate the importance of experimental design on the assessment of wheeze classification algorithms' performance.

Face Anti-Spoofing Using Spatial Pyramid Pooling

Lei Shi, Zhuo Zhou, Zhenhua Guo

Auto-TLDR; Spatial Pyramid Pooling for Face Anti-Spoofing

Face recognition systems are vulnerable to many kinds of presentation attacks, so effectively detecting whether an image comes from a real face is particularly important. At present, many deep learning-based anti-spoofing methods have been proposed, but these approaches have some limitations: for example, global average pooling (GAP) easily loses local information of faces, single-scale features easily ignore information differences across scales, and a complex network is prone to overfitting. In this paper, we propose a face anti-spoofing approach using spatial pyramid pooling (SPP). Firstly, we use ResNet-18, which has a small number of parameters, as the basic model to avoid overfitting. Further, we use a spatial pyramid pooling module in the single model to enhance local features while fusing multi-scale information. The effectiveness of the proposed method is evaluated on three databases: CASIA-FASD, Replay-Attack, and CASIA-SURF. The experimental results show that the proposed approach can achieve state-of-the-art performance.
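
A hedged sketch of the SPP module follows: the final feature maps are pooled at several grid resolutions and concatenated, preserving local detail alongside the global summary. The pyramid levels are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SPP(nn.Module):
    """Spatial pyramid pooling over the final convolutional feature maps."""
    def __init__(self, levels=(1, 2, 4)):
        super().__init__()
        self.pools = nn.ModuleList(nn.AdaptiveAvgPool2d(n) for n in levels)
    def forward(self, x):                  # x: (batch, ch, h, w)
        return torch.cat([p(x).flatten(1) for p in self.pools], dim=1)

feats = torch.randn(4, 512, 7, 7)          # e.g. ResNet-18 final feature maps
pooled = SPP()(feats)                      # -> (4, 512 * (1 + 4 + 16)) feature vector
```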

Wireless Localisation in WiFi Using Novel Deep Architectures

Peizheng Li, Han Cui, Aftab Khan, Usman Raza, Robert Piechocki, Angela Doufexi, Tim Farnham

Auto-TLDR; Deep Neural Networks for Indoor Localisation of WiFi Devices

This paper studies the indoor localisation of WiFi devices based on a commodity chipset and standard channel sounding. First, we present a novel shallow neural network (SNN) in which features are extracted from the channel state information (CSI) corresponding to WiFi subcarriers received on different antennas and used to train the model. The single-layer architecture of this localisation neural network makes it lightweight and easy to deploy on devices with stringent constraints on computational resources. We further investigate the use of deep learning models for localisation and design novel architectures for a convolutional neural network (CNN) and a long short-term memory (LSTM) network. We extensively evaluate these localisation algorithms for continuous tracking in indoor environments. Experimental results show that even an SNN model, after careful handcrafted feature extraction, can achieve accurate localisation, while with a well-organised architecture the neural network models can be trained directly on raw CSI data, from which localisation features are automatically extracted to achieve accurate position estimates. We also found that the performance of neural network-based methods is directly affected by the number of anchor access points (APs), regardless of their structure. With three APs, all neural network models proposed in this paper can obtain a localisation accuracy of around 0.5 metres. In addition, the proposed deep NN architecture reduces the data pre-processing time by 6.5 hours compared with a shallow NN using the data collected in our testbed. In the deployment phase, the inference time is also significantly reduced, to 0.1 ms per sample. We also demonstrate the generalisation capability of the proposed method by evaluating models on target movement characteristics different from the ones on which they were trained.

Personalized Models in Human Activity Recognition Using Deep Learning

Hamza Amrani, Daniela Micucci, Paolo Napoletano

Auto-TLDR; Incremental Learning for Personalized Human Activity Recognition

Current sensor-based human activity recognition techniques that rely on a user-independent model struggle to generalize to new users and to changes that a person may make over time in how he or she carries out activities. Incremental learning is a technique that makes it possible to obtain personalized models, which may improve classifier performance thanks to continuous learning based on user data. Finally, deep learning techniques have been proven more effective than traditional ones in generating user-independent models. The aim of our work is therefore to combine deep learning with incremental learning in order to obtain personalized models that perform better than both the user-independent model and personalized models obtained with traditional machine learning techniques. The experimentation was done by comparing the results obtained by a state-of-the-art technique with those obtained by two neural networks (a ResNet and a simplified CNN) on three datasets. The experimentation showed that the neural networks adapt to a new user faster than the baseline.

BAT Optimized CNN Model Identifies Water Stress in Chickpea Plant Shoot Images

Shiva Azimi, Taranjit Kaur, Tapan Gandhi

Auto-TLDR; BAT-Optimized ResNet-18 for Stress Classification of Chickpea Shoot Images under Water Deficiency

Stress due to water deficiency in plants can significantly lower agricultural yield. It can affect many visible plant traits, such as size and surface area, the number of leaves, and their color. In recent years, computer vision-based plant phenomics has emerged as a promising tool for plant research and management. Such techniques have the advantage of being non-destructive, non-invasive, and fast, and they offer high levels of automation. Pulses like chickpeas play an important role in ensuring food security in poor countries owing to their high protein and nutritional content. In the present work, we have built a dataset comprising shoot images of two chickpea varieties under different moisture-stress conditions. Specifically, we propose a BAT-optimized ResNet-18 model for classifying stress induced by water deficiency using chickpea shoot images. The BAT algorithm identifies the optimal value of the mini-batch size to be used for training, rather than employing the traditional manual approach of trial and error. Experimentation on two crop varieties (JG and Pusa) reveals that the BAT-optimized approach achieves accuracies of 96% and 91% for the JG and Pusa varieties, respectively, better than the traditional method by 4%. The experimental results are also compared with state-of-the-art CNN models such as AlexNet, GoogleNet, and ResNet-50; the comparison demonstrates that the proposed BAT-optimized ResNet-18 model achieves higher performance than its counterparts.

Deep Transfer Learning for Alzheimer’s Disease Detection

Nicole Cilia, Claudio De Stefano, Francesco Fontanella, Claudio Marrocco, Mario Molinara, Alessandra Scotto Di Freca

Auto-TLDR; Automatic Detection of Handwriting Alterations for Alzheimer's Disease Diagnosis using Dynamic Features

Early detection of Alzheimer's Disease (AD) is essential in order to initiate therapies that can reduce the effects of the disease, improving both the quality of life and the life expectancy of patients. Among all the activities carried out in our daily life, handwriting seems to be one of the first influenced by the onset of neurodegenerative diseases. For this reason, the analysis of handwriting and the study of its alterations has become of great interest in this research field, in order to make a diagnosis as early as possible. In recent years, many studies have tried to use classification algorithms applied to handwriting to implement decision support systems for AD diagnosis. A key issue for the use of these techniques is the detection of effective features that allow the system to distinguish the natural handwriting alterations due to age from those caused by neurodegenerative disorders. In this context, many interesting results have been published in the literature, with features typically selected by hand, generally considering the dynamics of the handwriting process in order to detect motor disorders closely related to AD. Features directly derived from handwriting generation models can also be very helpful for AD diagnosis. It should be remarked, however, that the above features do not consider changes in the shape of handwritten traces, which may occur as a consequence of neurodegenerative diseases, nor the correlation between shape alterations and changes in the dynamics of the handwriting process. Moving from these considerations, the aim of this study is to verify whether the combined use of both shape and dynamic features allows a decision support system to improve performance for AD diagnosis. To this purpose, starting from a database of on-line handwriting samples, we generated for each of them a synthetic off-line colour image, where the colour of each elementary trait encodes, in the three RGB channels, the dynamic information associated with that trait. Finally, we exploited the capability of deep neural networks (DNNs) to automatically extract features from raw images. The experimental comparison between the results obtained using standard features and those extracted according to the above procedure confirmed the effectiveness of our approach.
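
As a loose sketch of the on-line-to-image conversion, the snippet below renders each stroke sample as a coloured pixel whose R, G, B channels encode three dynamic quantities; the specific choice of speed, pressure, and acceleration (and the image size) is an assumption for illustration, since the abstract only states that dynamic information fills the three channels.

```python
import numpy as np

def dynamics_to_rgb(x, y, t, pressure, size=256):
    """Render an on-line handwriting trace; R/G/B encode speed, pressure, acceleration."""
    img = np.zeros((size, size, 3), dtype=np.uint8)
    vx, vy = np.gradient(x, t), np.gradient(y, t)
    speed = np.hypot(vx, vy)
    accel = np.abs(np.gradient(speed, t))
    to8 = lambda v: (255 * (v - v.min()) / (np.ptp(v) + 1e-12)).astype(np.uint8)
    cols = ((size - 1) * (x - x.min()) / (np.ptp(x) + 1e-12)).astype(int)
    rows = ((size - 1) * (1 - (y - y.min()) / (np.ptp(y) + 1e-12))).astype(int)
    img[rows, cols] = np.stack([to8(speed), to8(pressure), to8(accel)], axis=1)
    return img
```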

Face Anti-Spoofing Based on Dynamic Color Texture Analysis Using Local Directional Number Pattern

Junwei Zhou, Ke Shu, Peng Liu, Jianwen Xiang, Shengwu Xiong

Auto-TLDR; LDN-TOP Representation followed by ProCRC Classification for Face Anti-Spoofing

Face anti-spoofing is becoming increasingly indispensable for face recognition systems, which are vulnerable to various spoofing attacks performed using fake photos and videos. In this paper, a novel "LDN-TOP representation followed by ProCRC classification" pipeline for face anti-spoofing is proposed. We use the local directional number pattern (LDN) with the derivative-Gaussian mask to capture detailed appearance information while resisting illumination variations and noise, which can influence the texture pattern distribution. To further capture motion information, we extend LDN to a spatial-temporal variant named local directional number pattern from three orthogonal planes (LDN-TOP). The multi-scale LDN-TOP, which captures complete information, is extracted from color images to generate a feature vector with powerful representation capacity. Finally, the feature vector is fed into the probabilistic collaborative representation based classifier (ProCRC) for face anti-spoofing. Our method is evaluated on three challenging public datasets, namely the CASIA FASD, Replay-Attack, and UVAD databases, using a sequence-based evaluation protocol. The experimental results show that our method can achieve promising performance, with 0.37% EER on CASIA and 5.73% HTER on UVAD. The performance on the Replay-Attack database is also competitive.

Documents Counterfeit Detection through a Deep Learning Approach

Darwin Danilo Saire Pilco, Salvatore Tabbone

Auto-TLDR; End-to-End Learning for Counterfeit Documents Detection using Deep Neural Network

The main topic of this work is the detection of counterfeit documents, especially banknotes. We propose an end-to-end learning model using a deep learning approach based on AdapNet++, which manages feature extraction at multiple scale levels using several residual units. Unlike previous models based on regions of interest (ROI) and high-resolution documents, our network is fed with simple input images (i.e., a single patch), and we do not need high-resolution images. Besides, discriminative regions can be visualized at different scales. Our network learns by itself which regions of interest predict the best results. Experimental results show that we are competitive with the state-of-the-art, and our deep neural network has a good ability to generalize and can be applied to other kinds of documents, such as identity or administrative ones.

How Unique Is a Face: An Investigative Study

Michal Balazia, S L Happy, Francois Bremond, Antitza Dantcheva

Auto-TLDR; Uniqueness of Face Recognition: Exploring the Impact of Factors such as image resolution, feature representation, database size, age and gender

Face recognition has been widely accepted as a means of identification in applications ranging from border control to security in the banking sector. Surprisingly, despite this wide acceptance, we still lack an understanding of the uniqueness or distinctiveness of the face as a biometric characteristic. In this work, we study the impact of factors such as image resolution, feature representation, database size, age, and gender on uniqueness, denoted by the Kullback-Leibler divergence between the genuine and impostor distributions. Towards understanding this impact, we present experimental results on the AT&T, LFW, IMDb-Face, and ND-TWINS datasets, with the feature extraction algorithms VGGFace, VGG16, ResNet50, InceptionV3, MobileNet, and DenseNet121, that reveal the quantitative impact of the named factors. While these are early results, our findings indicate the need for a better understanding of the concept of biometric uniqueness and its implications for face recognition.
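
The uniqueness measure lends itself to a small worked sketch: estimate the genuine and impostor score distributions with histograms and compute their Kullback-Leibler divergence. The bin count and smoothing constant are illustrative assumptions.

```python
import numpy as np

def kl_uniqueness(genuine_scores, impostor_scores, bins=50):
    """KL divergence between genuine and impostor score distributions (larger -> more distinctive)."""
    lo = min(genuine_scores.min(), impostor_scores.min())
    hi = max(genuine_scores.max(), impostor_scores.max())
    p, _ = np.histogram(genuine_scores, bins=bins, range=(lo, hi))
    q, _ = np.histogram(impostor_scores, bins=bins, range=(lo, hi))
    p, q = p + 1e-12, q + 1e-12            # smoothing avoids log(0) and division by 0
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))
```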

Level Three Synthetic Fingerprint Generation

Andre Wyzykowski, Mauricio Pamplona Segundo, Rubisley Lemes

Auto-TLDR; Synthesis of High-Resolution Fingerprints with Pore Detection Using CycleGAN

Today's legal restrictions protecting the privacy of biometric data are hampering fingerprint recognition research. For instance, all high-resolution fingerprint databases have ceased to be publicly available. To address this problem, we present a novel hybrid approach to synthesize realistic, high-resolution fingerprints. First, we improved Anguli, a handcrafted fingerprint generator, to obtain dynamic ridge maps with sweat pores and scratches. Then, we trained a CycleGAN to transform these maps into realistic fingerprints. Unlike other CNN-based works, we can generate several images for the same identity. We used our approach to create a synthetic database of 7400 images in an attempt to propel further studies in this field without raising legal issues. We included sweat pore annotations in 740 images to encourage research developments in pore detection. In our experiments, we employed two fingerprint matching approaches to confirm that the real and synthetic databases have similar performance. We conducted a human perception analysis in which sixty volunteers could hardly differentiate between real and synthesized fingerprints. Given that we also compare favorably with the most advanced works in the literature, our experimentation suggests that our approach is the new state of the art.

Fully Convolutional Neural Networks for Raw Eye Tracking Data Segmentation, Generation, and Reconstruction

Wolfgang Fuhl, Yao Rong, Enkelejda Kasneci

Auto-TLDR; Semantic Segmentation of Eye Tracking Data with Fully Convolutional Neural Networks

In this paper, we use fully convolutional neural networks for the semantic segmentation of eye tracking data. We also use these networks for reconstruction and, in conjunction with a variational auto-encoder, to generate eye movement data. The first improvement of our approach is that, due to the use of fully convolutional networks, no input window is necessary, and any input size can be processed directly. The second improvement is that the used and generated data is raw eye tracking data (position X, Y, and time) without preprocessing. This is achieved by pre-initializing the filters in the first layer and by building the input tensor along the z axis. We evaluated our approach on three publicly available datasets and compare the results to the state of the art.

Creating Classifier Ensembles through Meta-Heuristic Algorithms for Aerial Scene Classification

Álvaro Roberto Ferreira Jr., Gustavo Henrique De Rosa, Joao Paulo Papa, Gustavo Carneiro, Fabio Augusto Faria

Auto-TLDR; Univariate Marginal Distribution Algorithm for Aerial Scene Classification Using Meta-Heuristic Optimization

Aerial scene classification is a challenging task in the remote sensing area, and deep learning approaches, such as Convolutional Neural Networks (CNNs), are widely employed to tackle it. Nevertheless, it is not straightforward to find a single CNN model that can solve all aerial scene classification tasks, which motivates a better alternative: fusing CNN-based classifiers into an ensemble. However, an appropriate choice of the classifiers that will belong to the ensemble is a critical factor, as it is unfeasible to employ all possible classifiers in the literature. Therefore, this work proposes a novel framework based on meta-heuristic optimization for creating optimized ensembles in the context of aerial scene classification. The experiments were performed across nine meta-heuristic algorithms and three aerial scene literature datasets, compared in terms of effectiveness (accuracy), efficiency (execution time), and behavioral performance in different scenarios. One can observe that the Univariate Marginal Distribution Algorithm (UMDA) overcame popular meta-heuristic algorithms, such as Genetic Programming and Particle Swarm Optimization, under the criteria adopted in the performed experiments.
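
A hedged sketch of UMDA for ensemble selection is shown below: each bit of a candidate switches one CNN classifier on or off, and the bit-wise marginal probabilities are re-estimated from the best candidates each generation. The population size, elite count, and probability clamping are illustrative assumptions, and the `fitness` function is hypothetical.

```python
import numpy as np

def umda(fitness, n_bits, pop=40, elite=10, gens=30, seed=0):
    rng = np.random.default_rng(seed)
    probs = np.full(n_bits, 0.5)                          # marginal probability per bit
    for _ in range(gens):
        population = rng.random((pop, n_bits)) < probs    # sample candidate ensembles
        scores = np.array([fitness(ind) for ind in population])
        elites = population[np.argsort(scores)[-elite:]]  # keep the best candidates
        probs = np.clip(elites.mean(axis=0), 0.05, 0.95)  # re-estimate marginals
    return population[np.argmax(scores)]                  # best mask of the final generation

# fitness(mask) could return, e.g., the validation accuracy of the ensemble formed
# by majority-voting the CNN classifiers selected by the boolean `mask`.
```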

Depth Videos for the Classification of Micro-Expressions

Ankith Jain Rakesh Kumar, Bir Bhanu, Christopher Casey, Sierra Cheung, Aaron Seitz

Auto-TLDR; RGB-D Dataset for the Classification of Facial Micro-expressions

Facial micro-expressions are spontaneous, subtle, involuntary muscle movements occurring briefly on the face. The spotting and recognition of these expressions are difficult due to their subtle behavior, and their duration of about half a second makes it difficult for humans to identify them. Micro-expressions have many applications in our daily life, such as in the fields of online learning, game playing, lie detection, and therapy sessions. Traditionally, researchers use RGB images/videos to spot and classify micro-expressions, which poses challenging problems, such as illumination, privacy concerns, and pose variation. The use of depth videos solves these issues to some extent, as depth videos are not susceptible to variations in illumination. This paper describes the collection of a first RGB-D dataset for the classification of facial micro-expressions into 6 universal expressions: Anger, Happy, Sad, Fear, Disgust, and Surprise. It presents a comparison between the RGB and depth videos for the classification of facial micro-expressions. Further, the results show that depth videos alone, used in a decision tree structure with traditional and deep learning approaches, can classify facial micro-expressions correctly with good classification accuracy. The dataset will be released to the public in the near future.

Automatic Tuberculosis Detection Using Chest X-Ray Analysis with Position Enhanced Structural Information

Hermann Jepdjio Nkouanga, Szilard Vajda

Auto-TLDR; Automatic Chest X-ray Screening for Tuberculosis in Rural Populations using a Localized Region of Interest

For Tuberculosis (TB) detection, besides more expensive diagnostic solutions such as culture or sputum smear analysis, one could consider the automatic analysis of the chest X-ray (CXR). This mimics the radiologist's reading of the lung region and provides a cheap way to analyze and diagnose pulmonary abnormalities such as TB, which often co-occurs with HIV. Such software-based pulmonary screening can be a reliable and affordable solution for rural populations in different parts of the world, such as India, Africa, etc. Our fully automatic system processes the incoming CXR image by applying image processing techniques to detect the region of interest (ROI), followed by computationally cheap feature extraction involving edge detection using the Laplacian of Gaussian, which we enrich by counting the local distribution of intensities. The choice to "zoom in" on the ROI and look for abnormalities locally is motivated by the fact that some pulmonary abnormalities are localized in specific regions of the lungs. The classifiers then decide on the normal or abnormal nature of each lung X-ray. Our goal is to find a single simple feature, instead of the combinations of several features proposed and promoted in recent years' literature, that can properly describe the different pathological alterations in the lungs. Our experiments report results on two publicly available data collections, namely the Shenzhen and Montgomery collections. For performance evaluation, measures such as the area under the curve (AUC) and accuracy (ACC) were considered, achieving AUC = 0.81 (ACC = 83.33%) and AUC = 0.96 (ACC = 96.35%) for the Montgomery and Shenzhen collections, respectively. Several comparisons with other state-of-the-art systems reported recently in the field are also provided.
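
The feature extraction described above can be sketched compactly: a Laplacian-of-Gaussian edge response over the lung ROI, enriched with local intensity histograms on a grid of patches. The sigma, grid size, and bin count are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_laplace

def cxr_features(roi, sigma=2.0, grid=8, bins=16):
    """Feature vector for a lung ROI: LoG edge statistics + local intensity histograms."""
    edges = gaussian_laplace(roi.astype(float), sigma=sigma)   # LoG edge response
    feats = []
    h, w = roi.shape
    for i in range(grid):
        for j in range(grid):
            patch = roi[i * h // grid:(i + 1) * h // grid,
                        j * w // grid:(j + 1) * w // grid]
            hist, _ = np.histogram(patch, bins=bins, range=(0, 255), density=True)
            feats.append(hist)                                 # local intensity distribution
    return np.concatenate([[edges.mean(), edges.std()], np.concatenate(feats)])
```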

Quality-Based Representation for Unconstrained Face Recognition

Nelson Méndez-Llanes, Katy Castillo-Rosado, Heydi Mendez-Vazquez, Massimo Tistarelli

Auto-TLDR; Activation map for face recognition in unconstrained environments

Significant advances have been achieved in face recognition in the last decade thanks to the development of deep learning methods. However, recognizing faces captured in uncontrolled environments is still a challenging problem for the scientific community. In these scenarios, the performance of most existing deep learning-based methods falls abruptly, due to the poor quality of the face images. In this work, we propose to use an activation map to represent the quality information in a face image. Different face regions are analyzed to determine their quality, and then only the regions with good quality are used to perform recognition with a given deep face model. For experimental evaluation, in order to simulate unconstrained environments, three challenging databases with different variations in appearance were selected: the Labeled Faces in the Wild database, the Celebrities in Frontal-Profile in the Wild database, and the AR database. Three deep face models were used to evaluate the proposal on these databases, and in all cases the use of the proposed activation map improved the recognition rates obtained by the original models by between 0.3 and 31%. The obtained results experimentally demonstrate that the proposal is able to select the face areas with higher discriminative power and enough identifying information, while ignoring those with spurious information.

Detection of Makeup Presentation Attacks Based on Deep Face Representations

Christian Rathgeb, Pawel Drozdowski, Christoph Busch

Auto-TLDR; An Attack Detection Scheme for Face Recognition Using Makeup Presentation Attacks

Facial cosmetics can substantially alter the facial appearance, which can negatively affect the decisions of a face recognition system. In addition, it was recently shown that the application of makeup can be abused to launch so-called makeup presentation attacks, in which the attacker applies heavy makeup in order to achieve the facial appearance of a target subject for the purpose of impersonation. In this work, we assess the vulnerability of a COTS face recognition system to makeup presentation attacks employing the publicly available Makeup Induced Face Spoofing (MIFS) database. It is shown that makeup presentation attacks may seriously impact the security of the face recognition system. Further, we propose an attack detection scheme that distinguishes makeup presentation attacks from genuine authentication attempts by analysing differences in the deep face representations obtained from potential makeup presentation attacks and the corresponding target face images. The proposed detection system employs a machine learning-based classifier trained with synthetically generated makeup presentation attacks, produced using a generative adversarial network for facial makeup transfer in conjunction with image warping. Experimental evaluations conducted on the MIFS database reveal a detection equal error rate of 0.7% for the task of separating genuine authentication attempts from makeup presentation attacks.

An Investigation of Feature Selection and Transfer Learning for Writer-Independent Offline Handwritten Signature Verification

Victor Souza, Adriano Oliveira, Rafael Menelau Oliveira E Cruz, Robert Sabourin

Auto-TLDR; Overfitting of SigNet using Binary Particle Swarm Optimization

SigNet is a state-of-the-art model for the feature representation used in handwritten signature verification (HSV). This representation is based on a Deep Convolutional Neural Network (DCNN) and contains 2048 dimensions. When transposed to a dissimilarity space generated by the dichotomy transformation (DT), related to the writer-independent (WI) approach, these features may include redundant information. This paper investigates the presence of overfitting when using Binary Particle Swarm Optimization (BPSO) to perform feature selection in wrapper mode. We propose a method based on a global validation strategy with an external archive to control overfitting during the search for the most discriminant representation. Moreover, an investigation is also carried out to evaluate the use of the selected features in a transfer learning context. The analysis is carried out with a writer-independent approach on the CEDAR, MCYT, and GPDS-960 datasets. The experimental results showed the presence of overfitting when no validation is used during the optimization process, and the improvement obtained when the global validation strategy with an external archive is used. Also, the space generated after feature selection can be used in a transfer learning context.

Viability of Optical Coherence Tomography for Iris Presentation Attack Detection

Renu Sharma, Arun Ross

Auto-TLDR; Optical Coherence Tomography Imaging for Iris Presentation Attack Detection

In this paper, we first propose the use of Optical Coherence Tomography (OCT) imaging for the problem of iris presentation attack (PA) detection. Secondly, we assess its viability by comparing its performance with traditional modalities, viz., near-infrared (NIR) and visible-spectrum imaging. OCT imaging provides a cross-sectional view of an eye, whereas NIR and visible-spectrum imaging provide 2D iris textural information. Implementation is performed using three state-of-the-art deep architectures (VGG19, ResNet50 and DenseNet121) to differentiate between bonafide and PA samples for each of the three imaging modalities. Experiments are performed on a dataset of 2,169 bonafide, 177 Van Dyke eye and 360 cosmetic contact images acquired using all three imaging modalities, under both intra-attack (known PAs) and cross-attack (unknown PAs) scenarios. We observe promising results demonstrating OCT as a viable solution for iris PA detection.
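
Per modality, the setup reduces to fine-tuning a standard backbone as a two-class (bonafide vs. PA) classifier; a minimal sketch with DenseNet121 is shown below, where the input size and the absence of pre-trained weights are assumptions for a self-contained example.

```python
import torch
import torch.nn as nn
from torchvision.models import densenet121

# Binary bonafide-vs-PA classifier for one imaging modality; the same recipe
# is repeated for VGG19 and ResNet50 and for OCT / NIR / visible inputs.
model = densenet121(weights=None)                 # ImageNet weights in practice
model.classifier = nn.Linear(model.classifier.in_features, 2)

batch = torch.randn(4, 3, 224, 224)               # e.g., OCT cross-section images
logits = model(batch)
print(logits.shape)                               # torch.Size([4, 2])
```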

Two-Level Attention-Based Fusion Learning for RGB-D Face Recognition

Hardik Uppal, Alireza Sepas-Moghaddam, Michael Greenspan, Ali Etemad

Auto-TLDR; Fused RGB-D Facial Recognition using Attention-Aware Feature Fusion

With recent advances in RGB-D sensing technologies as well as improvements in machine learning and fusion techniques, RGB-D facial recognition has become an active area of research. A novel attention-aware method is proposed to fuse the two image modalities, RGB and depth, for enhanced RGB-D facial recognition. The proposed method first extracts features from both modalities using a convolutional feature extractor. These features are then fused using a two-layer attention mechanism. The first layer focuses on the fused feature maps generated by the feature extractor, exploiting the relationship between feature maps using LSTM recurrent learning. The second layer focuses on the spatial features of those maps using convolution. The training database is preprocessed and augmented through a set of geometric transformations, and the learning process is further aided by transfer learning from a pure 2D RGB image training process. Comparative evaluations demonstrate that the proposed method outperforms other state-of-the-art approaches, including both traditional and deep neural network-based methods, on the challenging CurtinFaces and IIIT-D RGB-D benchmark databases, achieving classification accuracies over 98.2% and 99.3%, respectively. The proposed attention mechanism is also compared with other attention mechanisms, demonstrating more accurate results.
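
A compact PyTorch sketch of a two-level attention fusion of this flavor is given below; the hidden size, kernel size, and pooling are illustrative guesses rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn

class TwoLevelAttentionFusion(nn.Module):
    """Fuses RGB and depth feature maps with (1) an LSTM-based attention over
    feature-map channels and (2) a convolutional spatial attention."""
    def __init__(self, channels, h, w, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=h * w, hidden_size=hidden, batch_first=True)
        self.chan_score = nn.Linear(hidden, 1)
        self.spatial = nn.Conv2d(2 * channels, 1, kernel_size=7, padding=3)

    def forward(self, rgb_feat, depth_feat):
        x = torch.cat([rgb_feat, depth_feat], dim=1)   # (B, 2C, H, W)
        seq = x.flatten(2)                             # channels as an LSTM sequence
        out, _ = self.lstm(seq)                        # relate feature maps recurrently
        w_chan = torch.sigmoid(self.chan_score(out))   # (B, 2C, 1) channel weights
        x = x * w_chan.unsqueeze(-1)                   # first attention level
        w_spat = torch.sigmoid(self.spatial(x))        # (B, 1, H, W) spatial weights
        x = x * w_spat                                 # second attention level
        return x.mean(dim=(2, 3))                      # pooled fused descriptor

fusion = TwoLevelAttentionFusion(channels=128, h=7, w=7)
rgb, depth = torch.randn(2, 128, 7, 7), torch.randn(2, 128, 7, 7)
print(fusion(rgb, depth).shape)                        # torch.Size([2, 256])
```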

Improving Gravitational Wave Detection with 2D Convolutional Neural Networks

Siyu Fan, Yisen Wang, Yuan Luo, Alexander Michael Schmitt, Shenghua Yu

Auto-TLDR; Two-dimensional Convolutional Neural Networks for Gravitational Wave Detection from Time Series with Background Noise

Sensitive gravitational wave (GW) detectors such as the Laser Interferometer Gravitational-Wave Observatory (LIGO) enable the direct observation of GW signals that confirm Einstein's general theory of relativity. However, it remains challenging to quickly detect faint GW signals in a large number of time series with background noise of unknown probability distribution. Traditional methods such as matched filtering generally assume Additive White Gaussian Noise (AWGN) and are far from real-time due to their high computational complexity. To avoid these weaknesses, one-dimensional (1D) Convolutional Neural Networks (CNNs) have been introduced to achieve fast online detection in milliseconds, but they do not adequately consider the trade-off between frequency and time features. This paper revisits that trade-off through data pre-processing and two-dimensional (2D) CNNs during offline training, improving online detection sensitivity. In this work, the input data are pre-processed into a 2D spectrum by the Short-Time Fourier Transform (STFT), so that frequency features are extracted without learning. Two 1D convolutions are then carried out across the time and frequency axes, respectively, and the time-amplitude and frequency-amplitude feature maps are subsequently concatenated in equal proportion, so that frequency and time features are treated equally as the input to the following 2D CNNs. These ideas are evaluated on a generated dataset with uniformly varying SNR (2-17), which combines GW signals generated by PyCBC with background noise sampled directly from LIGO. While satisfying the real-time online detection requirement without assumptions on the noise distribution, the experiments demonstrate better performance on average compared to 1D CNNs, especially at lower SNR (4-9).
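
The pre-processing step can be illustrated in a few lines: a noisy 1D time series is converted into a 2D time-frequency image with the STFT before any learning happens. The sampling rate, the toy chirp, and the window parameters below are assumptions.

```python
import numpy as np
from scipy.signal import stft

fs = 4096                                   # sampling rate (Hz), an assumption
t = np.arange(0, 1.0, 1 / fs)
# toy "signal + noise": a chirp standing in for a GW strain segment
x = np.sin(2 * np.pi * (50 + 200 * t) * t) + 0.5 * np.random.randn(t.size)

# Short-Time Fourier Transform: frequency features extracted without learning
f, seg_t, Z = stft(x, fs=fs, nperseg=256, noverlap=192)
spec = np.abs(Z)                            # 2D time-frequency image for the 2D CNN
print(spec.shape)                           # (n_freq_bins, n_time_frames)
```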

Attribute-Based Quality Assessment for Demographic Estimation in Face Videos

Fabiola Becerra-Riera, Annette Morales-González, Heydi Mendez-Vazquez, Jean-Luc Dugelay

Auto-TLDR; Facial Demographic Estimation in Video Scenarios Using Quality Assessment

Most existing works on facial demographic estimation are focused on still-image datasets, although the need to analyze video content in real applications is increasing. We propose to tackle gender, age and ethnicity estimation in the context of video scenarios. Our main contribution is to use an attribute-specific quality assessment procedure to select the best-quality frames from a video sequence for each of the three demographic modalities. The best-quality frames are classified with fine-tuned MobileNet models, and a final video prediction is obtained with a majority-voting strategy among the selected frames. Our validation on three different datasets and our comparison with state-of-the-art models show the effectiveness of the proposed demographic classifiers and of the quality pipeline, which reduces both the number of frames to be classified and the processing time in practical applications, while improving the soft-biometrics prediction accuracy.
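
A minimal sketch of the selection-plus-voting stage, assuming per-frame quality scores and per-frame classifier outputs have already been computed; the number of selected frames k is an illustrative choice.

```python
import numpy as np

def video_prediction(frame_scores, frame_preds, k=5):
    """Pick the k frames with the highest attribute-specific quality and
    return the majority-vote label among their per-frame predictions."""
    best = np.argsort(frame_scores)[-k:]       # indices of the best-quality frames
    votes = np.bincount(frame_preds[best])     # count votes among selected frames
    return votes.argmax()

rng = np.random.default_rng(0)
scores = rng.uniform(size=40)                  # per-frame quality, e.g., for gender
preds = rng.integers(0, 2, size=40)            # per-frame classifier outputs
print(video_prediction(scores, preds))
```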

ResMax: Detecting Voice Spoofing Attacks with Residual Network and Max Feature Map

Il-Youp Kwak, Sungsu Kwag, Junhee Lee, Jun Ho Huh, Choong-Hoon Lee, Youngbae Jeon, Jeonghwan Hwang, Ji Won Yoon

Auto-TLDR; ASVspoof 2019: A Lightweight Automatic Speaker Verification Spoofing and Countermeasures System

The ``2019 Automatic Speaker Verification Spoofing and Countermeasures Challenge'' (ASVspoof) competition aimed to facilitate the design of highly accurate voice spoofing attack detection systems. The competition did not emphasize model complexity and latency requirements, yet such constraints are strict and integral in real-world deployment. Hence, most of the top-performing solutions from the competition used an ensemble approach, combining multiple complex deep learning models to maximize detection accuracy -- an approach that sits uneasily with real-world deployment constraints. To design a lightweight system, we combined the notions of skip connection (from ResNet) and max feature map (from Light CNN), and evaluated the accuracy of the system using the ASVspoof 2019 dataset. With an optimized constant Q transform (CQT) feature, our single model achieved a replay attack detection equal error rate (EER) of 0.37% on the evaluation set, outperforming the top ensemble system from the competition, which achieved an EER of 0.39%.
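
The two building blocks named above combine naturally, as in the PyTorch sketch below: a Max Feature Map activation inside a residual block. The widths and kernel sizes are illustrative, not the exact ResMax configuration.

```python
import torch
import torch.nn as nn

class MaxFeatureMap(nn.Module):
    """Max Feature Map activation (from Light CNN): split the channels into
    two halves and take their element-wise maximum, halving the channel count."""
    def forward(self, x):
        a, b = torch.chunk(x, 2, dim=1)
        return torch.max(a, b)

class ResMaxBlock(nn.Module):
    """A residual block using MFM instead of ReLU."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, 2 * channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, 2 * channels, 3, padding=1)
        self.mfm = MaxFeatureMap()

    def forward(self, x):
        y = self.mfm(self.conv1(x))    # conv doubles channels, MFM halves them
        y = self.mfm(self.conv2(y))
        return x + y                   # skip connection (from ResNet)

block = ResMaxBlock(32)
cqt = torch.randn(4, 32, 84, 100)      # a batch of CQT-like feature maps
print(block(cqt).shape)                # torch.Size([4, 32, 84, 100])
```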

Trainable Spectrally Initializable Matrix Transformations in Convolutional Neural Networks

Michele Alberti, Angela Botros, Schuetz Narayan, Rolf Ingold, Marcus Liwicki, Mathias Seuret

Auto-TLDR; Trainable and Spectrally Initializable Matrix Transformations for Neural Networks

In this work, we introduce a new architectural component for Neural Networks (NN): trainable and spectrally initializable matrix transformations on feature maps. While previous literature has already demonstrated the possibility of adding static spectral transformations as feature processors, our focus is on more general trainable transforms. We study the transforms in various architectural configurations on four datasets of different nature: from medical (ColorectalHist, HAM10000) and natural (Flowers) images to historical documents (CB55). With rigorous experiments that control for the number of parameters and randomness, we show that networks utilizing the introduced matrix transformations outperform vanilla neural networks, with accuracy increasing appreciably across all datasets. In addition, we show that spectral initialization leads to significantly faster convergence than randomly initialized matrix transformations. The transformations are implemented as auto-differentiable PyTorch modules that can be incorporated into any neural network architecture. The entire code base is open-source.
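
One plausible minimal instance of the idea is a trainable linear transform over the flattened spatial dimension of each feature map, initialized with the DCT basis; the placement and shape below are assumptions, not the paper's exact modules.

```python
import numpy as np
import torch
import torch.nn as nn
from scipy.fft import dct

class SpectralLinear(nn.Module):
    """Trainable matrix transformation over the flattened spatial dimension of
    a feature map, initialized with the orthonormal DCT-II basis (or randomly)."""
    def __init__(self, n, spectral_init=True):
        super().__init__()
        W = dct(np.eye(n), norm="ortho") if spectral_init \
            else np.random.randn(n, n) / np.sqrt(n)
        self.weight = nn.Parameter(torch.tensor(W, dtype=torch.float32))

    def forward(self, x):                  # x: (B, C, H, W) with H*W == n
        b, c, h, w = x.shape
        y = x.flatten(2) @ self.weight.T   # transform each feature map
        return y.view(b, c, h, w)

layer = SpectralLinear(49)
print(layer(torch.randn(8, 16, 7, 7)).shape)  # torch.Size([8, 16, 7, 7])
```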

Learning Metric Features for Writer-Independent Signature Verification Using Dual Triplet Loss

Qian Wan, Qin Zou

Auto-TLDR; A dual triplet loss based method for offline writer-independent signature verification

Handwritten signature has long been a widely accepted biometric, applied in many verification scenarios. However, automatic signature verification remains an open research problem, mainly for three reasons. 1) Skilled forgeries, produced by persons who imitate the original writing pattern, are very difficult to distinguish from genuine signatures. This is especially so for offline signatures, where only the signature image is captured as a feature for verification. 2) Most state-of-the-art models are writer-dependent, requiring a specific model to be trained whenever a new user is registered, which is quite inconvenient. 3) Writer-independent models often have unsatisfactory performance. To this end, we propose a novel metric learning based method for offline writer-independent signature verification. Specifically, a dual triplet loss is used to train the model, where two different triplets are constructed for random and skilled forgeries, respectively. Experiments on three alphabetic datasets — GPDS Synthetic, MCYT and CEDAR — show that the proposed method achieves competitive or superior performance compared with state-of-the-art methods. Experiments are also conducted on a new offline Chinese signature dataset — CSIG-WHU — and the results show that the proposed method is highly feasible on character-based signatures.
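
The dual triplet loss can be sketched directly with two standard triplet terms, one per forgery type; the margins and equal weighting below are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def dual_triplet_loss(anchor, positive, neg_random, neg_skilled,
                      margin_r=1.0, margin_s=0.5):
    """Dual triplet loss: one triplet term for random forgeries and one for
    skilled forgeries. All inputs are (B, D) embeddings from a shared encoder;
    margins and the 1:1 weighting are guesses, not the paper's exact values."""
    loss_r = F.triplet_margin_loss(anchor, positive, neg_random, margin=margin_r)
    loss_s = F.triplet_margin_loss(anchor, positive, neg_skilled, margin=margin_s)
    return loss_r + loss_s

a, p = torch.randn(16, 128), torch.randn(16, 128)
nr, ns = torch.randn(16, 128), torch.randn(16, 128)
print(dual_triplet_loss(a, p, nr, ns).item())
```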

Rotation Detection in Finger Vein Biometrics Using CNNs

Bernhard Prommegger, Georg Wimmer, Andreas Uhl

Auto-TLDR; A CNN based rotation detector for finger vein recognition

Finger vein recognition deals with the identification of subjects based on the venous pattern within their fingers. The recognition accuracy of finger vein recognition systems suffers from different internal and external factors. One of the major problems is the misplacement of the finger during acquisition. In particular, longitudinal finger rotation poses a severe problem for such recognition systems. Detecting and correcting such rotations is difficult, as finger vein scanners typically acquire only a single image of the vein pattern. Therefore, information needed for rotation detection, such as the shape of the finger or the depth of the veins within the finger, is not available. This work presents a CNN-based rotation detector that is capable of estimating the rotational difference between vein images of the same finger without any additional information. The experiments show not only that the method delivers highly accurate results, but also that it generalizes: the trained CNN can be applied to datasets that were not included in its training. Correcting the rotation difference between images using the CNN's rotation prediction leads to EER improvements of 50-260% for a well-established vein-pattern-based method (Maximum Curvature) on four public finger vein databases.
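
A minimal sketch of such a rotation detector: a small CNN that takes a pair of vein images stacked as two channels and regresses their rotational difference. The architecture and the angle parameterization are assumptions.

```python
import torch
import torch.nn as nn

class RotationRegressor(nn.Module):
    """Predicts the longitudinal rotation angle between two vein images of
    the same finger, stacked as a 2-channel input."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(2, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, 1)           # rotation difference, e.g., degrees

    def forward(self, img_a, img_b):
        x = torch.cat([img_a, img_b], dim=1)   # (B, 2, H, W) image pair
        return self.head(self.features(x).flatten(1)).squeeze(1)

model = RotationRegressor()
a = torch.randn(4, 1, 64, 128)                 # grayscale vein images
b = torch.randn(4, 1, 64, 128)
print(model(a, b).shape)                       # torch.Size([4])
```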

Handwritten Signature and Text Based User Verification Using Smartwatch

Raghavendra Ramachandra, Sushma Venkatesh, Raja Kiran, Christoph Busch

Auto-TLDR; A novel technique for user verification using a smartwatch based on writing pattern or signing pattern

Wrist-wearable devices such as smartwatches have gained popularity as they provide quick access to information and to multiple applications. Among the various applications of the smartwatch, user verification based on handwriting has recently been investigated. In this paper, we present a novel technique for user verification using a smartwatch, based on the user's writing or signing pattern. The proposed technique leverages accelerometer data captured from the smartwatch, which are represented using the 2D Continuous Wavelet Transform (CWT) and deep features extracted using a pre-trained ResNet50. The comparison is performed using an ensemble of classifiers. Extensive experiments are carried out on a newly captured dataset using two different smartwatches and three different writing scenarios (or activities). The article provides key insights and an analysis of the results in such a verification scenario.
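
The feature pipeline can be sketched as follows: a CWT scalogram is built per accelerometer axis, the axes are stacked as image channels, and a ResNet50 backbone yields a 2048-d descriptor. The scales, wavelet choice, and resizing are assumptions; the ensemble classification stage is omitted.

```python
import numpy as np
import pywt
import torch
import torch.nn as nn
from torchvision.models import resnet50

rng = np.random.default_rng(0)
acc = rng.normal(size=(3, 1024))              # toy 3-axis accelerometer signal

# 2D CWT scalogram per axis -> stack the axes as image channels
scales = np.arange(1, 65)
chans = [np.abs(pywt.cwt(acc[i], scales, "morl")[0]) for i in range(3)]
img = torch.tensor(np.stack(chans), dtype=torch.float32).unsqueeze(0)
img = nn.functional.interpolate(img, size=(224, 224))   # ResNet input size

# deep features from ResNet50 (ImageNet weights would be loaded in practice)
backbone = resnet50(weights=None)
backbone.fc = nn.Identity()                   # keep the 2048-d pooled feature
backbone.eval()
with torch.no_grad():
    feat = backbone(img)
print(feat.shape)                             # torch.Size([1, 2048])
# `feat` would then be scored by an ensemble of classifiers per user
```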

Appliance Identification Using a Histogram Post-Processing of 2D Local Binary Patterns for Smart Grid Applications

Yassine Himeur, Abdullah Alsalemi, Faycal Bensaali, Abbes Amira

Auto-TLDR; LBP-BEVM based Local Binary Patterns for Appliances Identification in the Smart Grid

Identifying domestic appliances in the smart grid leads to better power usage management and further helps in detecting appliance-level abnormalities. An efficient identification can be achieved only if a robust feature extraction scheme is developed, with a high ability to discriminate between the different appliances on the smart grid. Accordingly, we propose in this paper a novel method to extract electrical power signatures after transforming the power signal to 2D space, which offers more encoding possibilities. Next, an improved local binary pattern (LBP) descriptor is proposed, which strengthens the discriminative ability of conventional LBP using a post-processing stage. A binarized eigenvalue map (BEVM) is extracted from the 2D power matrix and then used to post-process the generated LBP representation. Two histograms, namely the up and down histograms, are then constructed and concatenated to form the global histogram. A comprehensive performance evaluation is performed on two different datasets, GREEND and WHITED, in which power data were collected at 1 Hz and 44 kHz sampling rates, respectively. The obtained results reveal the superiority of the proposed LBP-BEVM based system in terms of identification performance over other 2D descriptors and existing identification frameworks.
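
The sketch below shows the general shape of the pipeline: an LBP map computed on the 2D power matrix, split into up/down histograms by a binary mask and concatenated. A simple sign-based mask stands in for the paper's eigenvalue-derived BEVM, whose exact construction is not reproduced here.

```python
import numpy as np
from skimage.feature import local_binary_pattern

rng = np.random.default_rng(0)
power_2d = rng.normal(size=(64, 64))          # power signal reshaped to 2D space

lbp = local_binary_pattern(power_2d, P=8, R=1, method="uniform")

# post-processing with a binary map: split LBP codes into "up"/"down"
# histograms according to the mask, then concatenate. A mean-threshold mask
# is a stand-in for the binarized eigenvalue map (BEVM) of the paper.
bevm = power_2d > power_2d.mean()
bins = np.arange(0, 11)                       # 10 uniform LBP codes for P=8
h_up, _ = np.histogram(lbp[bevm], bins=bins)
h_down, _ = np.histogram(lbp[~bevm], bins=bins)
signature = np.concatenate([h_up, h_down])    # global appliance signature
print(signature.shape)                        # (20,)
```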

A Novel Random Forest Dissimilarity Measure for Multi-View Learning

Hongliu Cao, Simon Bernard, Robert Sabourin, Laurent Heutte

Auto-TLDR; Multi-view Learning with Random Forest Relation Measure and Instance Hardness

Multi-view learning is a learning task in which data is described by several concurrent representations. Its main challenge is most often to exploit the complementarities between these representations to help solve a classification/regression task. This challenge can be met nowadays if there is a large amount of data available for learning. However, this is not necessarily true for all real-world problems, where data are sometimes scarce (e.g. problems related to the medical environment). In these situations, an effective strategy is to use intermediate representations based on the dissimilarities between instances. This work presents new ways of constructing these dissimilarity representations, learning them from data with Random Forest classifiers. More precisely, two methods are proposed which modify the Random Forest proximity measure to adapt it to the context of High Dimension Low Sample Size (HDLSS) multi-view classification problems. The second method, based on an Instance Hardness measure, is significantly more accurate than other state-of-the-art measures, including the original RF proximity measure and the Large Margin Nearest Neighbor (LMNN) metric learning measure.
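
As a reference point, the classic Random Forest proximity that both proposed methods modify can be computed from shared leaf memberships, as sketched below on synthetic data; the paper's adaptations (e.g., the Instance Hardness weighting) are not reproduced.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=30, random_state=0)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Classic RF proximity: fraction of trees in which two instances share a leaf
leaves = rf.apply(X)                               # (n_samples, n_trees) leaf ids
prox = (leaves[:, None, :] == leaves[None, :, :]).mean(axis=2)
dissim = 1.0 - prox                                # dissimilarity representation

print(dissim.shape, dissim[0, :5])
```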

AdaFilter: Adaptive Filter Design with Local Image Basis Decomposition for Optimizing Image Recognition Preprocessing

Aiga Suzuki, Keiichi Ito, Takahide Ibe, Nobuyuki Otsu

Auto-TLDR; Optimal Preprocessing Filtering for Pattern Recognition Using Higher-Order Local Auto-Correlation

Image preprocessing is an important step in pattern recognition that improves recognition performance. Linear convolution filtering is a primary preprocessing method used to enhance the particular local patterns of an image that are essential for recognizing it. However, because of the vast search space of possible preprocessing filters, almost no earlier studies have tackled the problem of identifying an optimal preprocessing filter that yields effective features for the input images. This paper proposes a novel design method for the optimal preprocessing filter for a given task. Our method calculates local image bases of the training dataset and represents the optimal filter as a linear combination of these local image bases, with the coefficients optimized to maximize the expected generalization performance. The optimization problem of the preprocessing filter is thereby converted to a lower-dimensional optimization problem. Combined with higher-order local auto-correlation (HLAC) feature extraction, our proposed method exhibited the best performance, compared with typical preprocessing filters, both in an anomaly detection task with a conventional pattern recognition algorithm and in a classification task using a deep convolutional neural network.
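
A hedged sketch of the construction: local image bases are obtained as principal components of training patches, and the preprocessing filter is a linear combination of those bases. The coefficient optimization itself is omitted; the patch size, basis count, and random coefficients are placeholders.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view
from scipy.signal import convolve2d

rng = np.random.default_rng(0)
images = rng.normal(size=(50, 32, 32))            # toy training images

# local image bases: principal components of 5x5 patches
patches = sliding_window_view(images, (5, 5), axis=(1, 2)).reshape(-1, 25)
patches = patches - patches.mean(axis=0)
_, _, Vt = np.linalg.svd(patches[::10], full_matrices=False)
bases = Vt[:8].reshape(8, 5, 5)                   # top-8 local bases

# the preprocessing filter is a linear combination of these bases; the
# coefficients would be optimized for expected generalization performance
coeffs = rng.normal(size=8)                       # placeholder, not optimized here
filt = np.tensordot(coeffs, bases, axes=1)        # (5, 5) filter kernel

filtered = convolve2d(images[0], filt, mode="same")
print(filtered.shape)                             # (32, 32)
```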

Weight Estimation from an RGB-D Camera in Top-View Configuration

Marco Mameli, Marina Paolanti, Nicola Conci, Filippo Tessaro, Emanuele Frontoni, Primo Zingaretti

Auto-TLDR; Top-View Weight Estimation using Deep Neural Networks

The development of so-called soft biometrics aims at providing information related to the physical and behavioural characteristics of a person. This paper focuses on body-weight estimation based on observations from a top-view RGB-D camera. The capability to estimate a person's weight can be of help in many different applications, from health-related scenarios to business intelligence and retail analytics. To deal with this task, a TVWE (Top-View Weight Estimation) framework is proposed. The approach relies on Deep Neural Networks (DNNs) trained on depth data, each modified in its top section to replace the classification output with a regression output. The performance of five state-of-the-art DNNs has been compared, namely VGG16, ResNet, Inception, DenseNet and EfficientNet; in addition, a convolutional auto-encoder has been included for completeness. Given the limited literature in this domain, the TVWE framework has been evaluated on a new publicly available dataset, the “VRAI Weight Estimation Dataset”, which also collects, for each subject, labels for weight, gender, and height. The experimental results demonstrate that the proposed methods are suitable for this task, bringing different and significant insights for applying the solution in different domains.
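
The top-section modification amounts to swapping the classification head for a single regression output, as in the sketch below; ResNet18 and the single-channel depth input stand in for the paper's five backbones and data format.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

# Replace the classification top with a single regression output for weight.
model = resnet18(weights=None)                # pre-trained weights in practice
model.conv1 = nn.Conv2d(1, 64, 7, stride=2, padding=3, bias=False)  # depth input
model.fc = nn.Linear(model.fc.in_features, 1)

depth = torch.randn(4, 1, 224, 224)           # top-view depth frames
pred_kg = model(depth).squeeze(1)             # predicted body weight (kg)
loss = nn.functional.mse_loss(pred_kg, torch.tensor([70.0, 55.0, 82.0, 64.0]))
print(pred_kg.shape, loss.item())
```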

Age Gap Reducer-GAN for Recognizing Age-Separated Faces

Daksha Yadav, Naman Kohli, Mayank Vatsa, Richa Singh, Afzel Noore

Auto-TLDR; Generative Adversarial Network for Age-separated Face Recognition

In this paper, we propose a novel algorithm for matching faces with temporal variations caused by age progression. The proposed generative adversarial network algorithm is a unified framework that combines facial age estimation and age-separated face verification. The key idea of this approach is to learn the age variations across time by conditioning the input image on the subject's gender and the target age group to which the face needs to be progressed. The loss function accounts for reducing the age gap between the original image and the generated face image, as well as preserving the identity. Both visual fidelity and quantitative evaluations demonstrate the efficacy of the proposed architecture on different facial age databases for age-separated face recognition.
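
A common way to realize this kind of conditioning, sketched below, is to tile one-hot gender and target age-group codes into feature maps and concatenate them with the input image; this is a generic conditional-GAN pattern, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

# Condition a generator input on gender and target age group (toy example).
n_age_groups = 6
img = torch.randn(2, 3, 64, 64)                  # input face images
gender = torch.tensor([0, 1])                    # 0 = male, 1 = female (toy coding)
age_grp = torch.tensor([2, 5])                   # target age-group indices

cond = torch.cat([nn.functional.one_hot(gender, 2),
                  nn.functional.one_hot(age_grp, n_age_groups)], dim=1).float()
cond_maps = cond[:, :, None, None].expand(-1, -1, 64, 64)  # tile codes spatially
gen_input = torch.cat([img, cond_maps], dim=1)   # (2, 3 + 8, 64, 64)
print(gen_input.shape)
```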