Epileptic Seizure Prediction: A Semi-Dilated Convolutional Neural Network Architecture

Ramy Hussein, Rabab K. Ward, Soojin Lee, Martin Mckeown

Responsive image

Auto-TLDR; Semi-Dilated Convolutional Network for Seizure Prediction using EEG Scalograms

Poster

Despite many recent advances in machine learning and time-series classification, accurate prediction of seizures remains elusive. In this work, we develop a convolutional network module that uses Electroencephalogram (EEG) scalograms to distinguish between the pre-seizure and normal brain activities. Since the EEG scalogram takes rectangular image format with many more temporal bins than spectral bins, the presented module uses "semi-dilated convolutions" to also create a proportional non-square receptive field. The proposed semi-dilated convolutions support exponential expansion of the receptive field over the long dimension (image width, i.e. time) while maintaining high resolution over the short dimension (image height, i.e., frequency). The proposed architecture comprises a set of co-operative semi-dilated convolutional blocks, each block has a stack of parallel semi-dilated convolutional modules with different dilation rates. Results show that our proposed seizure prediction solution outperforms the state-of-the-art methods, achieving a seizure prediction sensitivity of 88.45% and 89.52% for the American Epilepsy Society and Melbourne University EEG datasets, respectively.

Similar papers

Multi-Scale and Attention Based ResNet for Heartbeat Classification

Haojie Zhang, Gongping Yang, Yuwen Huang, Feng Yuan, Yilong Yin

Responsive image

Auto-TLDR; A Multi-Scale and Attention based ResNet for ECG heartbeat classification in intra-patient and inter-patient paradigms

Slides Poster Similar

This paper presents a novel deep learning framework for the electrocardiogram (ECG) heartbeat classification. Although there have been some studies with excellent overall accuracy, these studies have not been very accurate in the diagnosis of arrhythmia classes especially such as supraventricular ectopic beat (SVEB) and ventricular ectopic beat (VEB). In our work, we propose a Multi-Scale and Attention based ResNet for heartbeat classification in intra-patient and inter-patient paradigms respectively. Firstly, we extract shallow features from a convolutional layer. Secondly, the shallow features are sent into three branches with different convolution kernels in order to combine receptive fields of different sizes. Finally, fully connected layers are used to classify the heartbeat. Besides, we design a new attention mechanism based on the characteristics of heartbeat data. At last, extensive experiments on benchmark dataset demonstrate the effectiveness of our proposed model.

Exploring Spatial-Temporal Representations for fNIRS-based Intimacy Detection via an Attention-enhanced Cascade Convolutional Recurrent Neural Network

Chao Li, Qian Zhang, Ziping Zhao

Responsive image

Auto-TLDR; Intimate Relationship Prediction by Attention-enhanced Cascade Convolutional Recurrent Neural Network Using Functional Near-Infrared Spectroscopy

Slides Poster Similar

The detection of intimacy plays a crucial role in the improvement of intimate relationship, which contributes to promote the family and social harmony. Previous studies have shown that different degrees of intimacy have significant differences in brain imaging. Recently, a few of work has emerged to recognise intimacy automatically by using machine learning technique. Moreover, considering the temporal dynamic characteristics of intimacy relationship on neural mechanism, how to model spatio-temporal dynamics for intimacy prediction effectively is still a challenge. In this paper, we propose a novel method to explore deep spatial-temporal representations for intimacy prediction by Attention-enhanced Cascade Convolutional Recurrent Neural Network (ACCRNN). Given the advantages of time-frequency resolution in complex neuronal activities analysis, this paper utilizes functional near-infrared spectroscopy (fNIRS) to analyse and infer to intimate relationship. We collect a fNIRS-based dataset for the analysis of intimate relationship. Forty-two-channel fNIRS signals are recorded from the 44 subjects' prefrontal cortex when they watched a total of 18 photos of lovers, friends and strangers for 30 seconds per photo. The experimental results show that our proposed method outperforms the others in terms of accuracy with the precision of 96.5%. To the best of our knowledge, this is the first time that such a hybrid deep architecture has been employed for fNIRS-based intimacy prediction.

Influence of Event Duration on Automatic Wheeze Classification

Bruno M Rocha, Diogo Pessoa, Alda Marques, Paulo Carvalho, Rui Pedro Paiva

Responsive image

Auto-TLDR; Experimental Design of the Non-wheeze Class for Wheeze Classification

Slides Poster Similar

Patients with respiratory conditions typically exhibit adventitious respiratory sounds, such as wheezes. Wheeze events have variable duration. In this work we studied the influence of event duration on wheeze classification, namely how the creation of the non-wheeze class affected the classifiers' performance. First, we evaluated several classifiers on an open access respiratory sound database, with the best one reaching sensitivity and specificity values of 98% and 95%, respectively. Then, by changing one parameter in the design of the non-wheeze class, i.e., event duration, the best classifier only reached sensitivity and specificity values of 53% and 75%, respectively. These results demonstrate the importance of experimental design on the assessment of wheeze classification algorithms' performance.

DE-Net: Dilated Encoder Network for Automated Tongue Segmentation

Hui Tang, Bin Wang, Jun Zhou, Yongsheng Gao

Responsive image

Auto-TLDR; Automated Tongue Image Segmentation using De-Net

Slides Poster Similar

Automated tongue recognition is a growing research field due to global demand for personal health care. Using mobile devices to take tongue pictures is convenient and of low cost for tongue recognition. It is particularly suitable for self-health evaluation of the public. However, images taken by mobile devices are easily affected by various imaging environment, which makes fine segmentation a more challenging task compared with those taken by specialized acquisition devices. Deep learning approaches are promising for tongue image segmentation because they have powerful feature learning and representation capability. However, the successive pooling operations in these methods lead to loss of information on image details, making them fail when segmenting low-quality images captured by mobile devices. To address this issue, we propose a dilated encoder network (DE-Net) to capture more high-level features and get high-resolution output for automated tongue image segmentation. In addition, we construct two tongue image datasets which contain images taken by specialized devices and mobile devices, respectively, to verify the effectiveness of the proposed method. Experimental results on both datasets demonstrate that the proposed method outperforms the state-of-the-art methods in tongue image segmentation.

Exploring Seismocardiogram Biometrics with Wavelet Transform

Po-Ya Hsu, Po-Han Hsu, Hsin-Li Liu

Responsive image

Auto-TLDR; Seismocardiogram Biometric Matching Using Wavelet Transform and Deep Learning Models

Slides Poster Similar

Seismocardiogram (SCG) has become easily accessible in the past decade owing to the advance of sensor technology. However, SCG biometric has not been widely explored. In this paper, we propose combining wavelet transform together with deep learning models, machine learning classifiers, or structural similarity metric to perform SCG biometric matching tasks. We validate the proposed methods on the publicly available dataset from PhysioNet database. The dataset contains one hour long electrocardiogram, breathing, and SCG data of 20 subjects. We train the models on the first five minute SCG and conduct identification on the last five minute SCG. We evaluate the identification and authentication performance with recognition rate and equal error rate, respectively. Based on the results, we show that wavelet transformed SCG biometric can achieve state-of-the-art performance when combined with deep learning models, machine learning classifiers, or structural similarity.

EEG-Based Cognitive State Assessment Using Deep Ensemble Model and Filter Bank Common Spatial Pattern

Debashis Das Chakladar, Shubhashis Dey, Partha Pratim Roy, Masakazu Iwamura

Responsive image

Auto-TLDR; A Deep Ensemble Model for Cognitive State Assessment using EEG-based Cognitive State Analysis

Slides Poster Similar

Electroencephalography (EEG) is the most used physiological measure to evaluate the cognitive state of a user efficiently. As EEG inherently suffers from a poor spatial resolution, features extracted from each EEG channel may not efficiently used for cognitive state assessment. In this paper, the EEG-based cognitive state assessment has been performed during the mental arithmetic experiment, which includes two cognitive states (task and rest) of a user. To obtain the temporal as well as spatial resolution of the EEG signal, we combined the Filter Bank Common Spatial Pattern (FBCSP) method and Long Short-Term Memory (LSTM)-based deep ensemble model for classifying the cognitive state of a user. Subject-wise data distribution has been performed due to the execution of a large volume of data in a low computing environment. In the FBCSP method, the input EEG is decomposed into multiple equal-sized frequency bands, and spatial features of each frequency bands are extracted using the Common Spatial Pattern (CSP) algorithm. Next, a feature selection algorithm has been applied to identify the most informative features for classification. The proposed deep ensemble model consists of multiple similar structured LSTM networks that work in parallel. The output of the ensemble model (i.e., the cognitive state of a user) is computed using the average weighted combination of individual model prediction. This proposed model achieves 87\% classification accuracy, and it can also effectively estimate the cognitive state of a user in a low computing environment.

Automatic Classification of Human Granulosa Cells in Assisted Reproductive Technology Using Vibrational Spectroscopy Imaging

Marina Paolanti, Emanuele Frontoni, Giorgia Gioacchini, Giorgini Elisabetta, Notarstefano Valentina, Zacà Carlotta, Carnevali Oliana, Andrea Borini, Marco Mameli

Responsive image

Auto-TLDR; Predicting Oocyte Quality in Assisted Reproductive Technology Using Machine Learning Techniques

Slides Poster Similar

In the field of reproductive technology, the biochemical composition of female gametes has been successfully investigated with the use of vibrational spectroscopy. Currently, in assistive reproductive technology (ART), there are no shared criteria for the choice of oocyte, and automatic classification methods for the best quality oocytes have not yet been applied. In this paper, considering the lack of criteria in Assisted Reproductive Technology (ART), we use Machine Learning (ML) techniques to predict oocyte quality for a successful pregnancy. To improve the chances of successful implantation and minimize any complications during the pregnancy, Fourier transform infrared microspectroscopy (FTIRM) analysis has been applied on granulosa cells (GCs) collected along with the oocytes during oocyte aspiration, as it is routinely done in ART, and specific spectral biomarkers were selected by multivariate statistical analysis. A proprietary biological reference dataset (BRD) was successfully collected to predict the best oocyte for a successful pregnancy. Personal health information are stored, maintained and backed up using a cloud computing service. Using a user-friendly interface, the user will evaluate whether or not the selected oocyte will have a positive result. This interface includes a dashboard for retrospective analysis, reporting, real-time processing, and statistical analysis. The experimental results are promising and confirm the efficiency of the method in terms of classification metrics: precision, recall, and F1-score (F1) measures.

Electroencephalography Signal Processing Based on Textural Features for Monitoring the Driver’s State by a Brain-Computer Interface

Giulia Orrù, Marco Micheletto, Fabio Terranova, Gian Luca Marcialis

Responsive image

Auto-TLDR; One-dimensional Local Binary Pattern Algorithm for Estimating Driver Vigilance in a Brain-Computer Interface System

Slides Poster Similar

In this study we investigate a textural processing method of electroencephalography (EEG) signal as an indicator to estimate the driver's vigilance in a hypothetical Brain-Computer Interface (BCI) system. The novelty of the solution proposed relies on employing the one-dimensional Local Binary Pattern (1D-LBP) algorithm for feature extraction from pre-processed EEG data. From the resulting feature vector, the classification is done according to three vigilance classes: awake, tired and drowsy. The claim is that the class transitions can be detected by describing the variations of the micro-patterns' occurrences along the EEG signal. The 1D-LBP is able to describe them by detecting mutual variations of the signal temporarily "close" as a short bit-code. Our analysis allows to conclude that the 1D-LBP adoption has led to significant performance improvement. Moreover, capturing the class transitions from the EEG signal is effective, although the overall performance is not yet good enough to develop a BCI for assessing the driver's vigilance in real environments.

Ballroom Dance Recognition from Audio Recordings

Tomas Pavlin, Jan Cech, Jiri Matas

Responsive image

Auto-TLDR; A CNN-based approach to classify ballroom dances given audio recordings

Slides Poster Similar

We propose a CNN-based approach to classify ten genres of ballroom dances given audio recordings, five latin and five standard, namely Cha Cha Cha, Jive, Paso Doble, Rumba, Samba, Quickstep, Slow Foxtrot, Slow Waltz, Tango and Viennese Waltz. We utilize a spectrogram of an audio signal and we treat it as an image that is an input of the CNN. The classification is performed independently by 5-seconds spectrogram segments in sliding window fashion and the results are then aggregated. The method was tested on following datasets: Publicly available Extended Ballroom dataset collected by Marchand and Peeters, 2016 and two YouTube datasets collected by us, one in studio quality and the other, more challenging, recorded on mobile phones. The method achieved accuracy 93.9%, 96.7% and 89.8% respectively. The method runs in real-time. We implemented a web application to demonstrate the proposed method.

Planar 3D Transfer Learning for End to End Unimodal MRI Unbalanced Data Segmentation

Martin Kolarik, Radim Burget, Carlos M. Travieso-Gonzalez, Jan Kocica

Responsive image

Auto-TLDR; Planar 3D Res-U-Net Network for Unbalanced 3D Image Segmentation using Fluid Attenuation Inversion Recover

Slides Similar

We present a novel approach of 2D to 3D transfer learning based on mapping pre-trained 2D convolutional neural network weights into planar 3D kernels. The method is validated by proposed planar 3D res-u-net network with encoder transferred from the 2D VGG-16 which is applied for a single-stage unbalanced 3D image data segmentation. In particular, we evaluate the method on the MICCAI 2016 MS lesion segmentation challenge dataset utilizing solely Fluid Attenuation Inversion Recover (FLAIR) sequence without brain extraction for training and inference to simulate real medical praxis. The planar 3D res-u-net network performed the best both in sensitivity and Dice score amongst end to end methods processing raw MRI scans and achieved comparable Dice score to a state-of-the-art unimodal not end to end approach. Complete source code was released under the open-source license and this paper is in compliance with the Machine learning Reproducibility Checklist. By implementing practical transfer learning for 3D data representation we were able to successfully segment heavily unbalanced data without selective sampling and achieved more reliable results using less training data in single modality. From medical perspective, the unimodal approach gives an advantage in real praxis as it does not require co-registration nor additional scanning time during examination. Although modern medical imaging methods capture high resolution 3D anatomy scans suitable for computer aided detection system processing, deployment of automatic systems for interpretation of radiology imaging is still rather theoretical in many medical areas. Our work aims to bridge the gap offering solution for partial research questions.

Hybrid Network for End-To-End Text-Independent Speaker Identification

Wajdi Ghezaiel, Luc Brun, Olivier Lezoray

Responsive image

Auto-TLDR; Text-Independent Speaker Identification with Scattering Wavelet Network and Convolutional Neural Networks

Slides Poster Similar

Deep learning has recently improved the performance of Speaker Identification (SI) systems. Promising results have been obtained with Convolutional Neural Networks (CNNs). This success are mostly driven by the advent of large datasets. However in the context of commercial applications, collection of large amount of training data is not always possible. In addition, robustness of a SI system is adversely effected by short utterances. SI with only a few and short utterances is a challenging problem. Therefore, in this paper, we propose a novel text-independent speaker identification system. The proposed system can identify speakers by learning from only few training short utterances examples. To achieve this, we combine CNN with Scattering Wavelet Network. We propose a two-stage feature extraction framework using a two-layer wavelet scattering network coupled with a CNN for SI system. The proposed architecture takes variable length speech segments. To evaluate the effectiveness of the proposed approach, Timit and Librispeech datasets are used in the experiments. These conducted experiments show that our hybrid architecture performs successfully for SI, even with a small number and short duration of training samples. In comparaison with related methods, the obtained results shows that an hybrid architecture achieve better performance.

Automatic Tuberculosis Detection Using Chest X-Ray Analysis with Position Enhanced Structural Information

Hermann Jepdjio Nkouanga, Szilard Vajda

Responsive image

Auto-TLDR; Automatic Chest X-ray Screening for Tuberculosis in Rural Population using Localized Region on Interest

Slides Poster Similar

For Tuberculosis (TB) detection beside the more expensive diagnosis solutions such as culture or sputum smear analysis one could consider the automatic analysis of the chest X-ray (CXR). This could mimic the lung region reading by the radiologist and it could provide a cheap solution to analyze and diagnose pulmonary abnormalities such as TB which often co- occurs with HIV. This software based pulmonary screening can be a reliable and affordable solution for rural population in different parts of the world such as India, Africa, etc. Our fully automatic system is processing the incoming CXR image by applying image processing techniques to detect the region on interest (ROI) followed by a computationally cheap feature extraction involving edge detection using Laplacian of Gaussian which we enrich by counting the local distribution of the intensities. The choice to ”zoom in” the ROI and look for abnormalities locally is motivated by the fact that some pulmonary abnormalities are localized in specific regions of the lungs. Later on the classifiers can decide about the normal or abnormal nature of each lung X-ray. Our goal is to find a simple feature, instead of a combination of several ones, -proposed and promoted in recent years’ literature, which can properly describe the different pathological alterations in the lungs. Our experiments report results on two publicly available data collections1, namely the Shenzhen and the Montgomery collection. For performance evaluation, measures such as area under the curve (AUC), and accuracy (ACC) were considered, achieving AUC = 0.81 (ACC = 83.33%) and AUC = 0.96 (ACC = 96.35%) for the Montgomery and Schenzen collections, respectively. Several comparisons are also provided to other state- of-the-art systems reported recently in the field.

Segmentation of Intracranial Aneurysm Remnant in MRA Using Dual-Attention Atrous Net

Subhashis Banerjee, Ashis Kumar Dhara, Johan Wikström, Robin Strand

Responsive image

Auto-TLDR; Dual-Attention Atrous Net for Segmentation of Intracranial Aneurysm Remnant from MRA Images

Slides Poster Similar

Due to the advancement of non-invasive medical imaging modalities like Magnetic Resonance Angiography (MRA), an increasing number of Intracranial Aneurysm (IA) cases are being reported in recent years. The IAs are typically treated by so-called endovascular coiling, where blood flow in the IA is prevented by embolization with a platinum coil. Accurate quantification of the IA Remnant (IAR), i.e. the volume with blood flow present post treatment is the utmost important factor in choosing the right treatment planning. This is typically done by manually segmenting the aneurysm remnant from the MRA volume. Since manual segmentation of volumetric images is a labour-intensive and error-prone process, development of an automatic volumetric segmentation method is required. Segmentation of small structures such as IA, that may largely vary in size, shape, and location is considered extremely difficult. Similar intensity distribution of IAs and surrounding blood vessels makes it more challenging and susceptible to false positive. In this paper we propose a novel 3D CNN architecture called Dual-Attention Atrous Net (DAtt-ANet), which can efficiently segment IAR volumes from MRA images by reconciling features at different scales using the proposed Parallel Atrous Unit (PAU) along with the use of self-attention mechanism for extracting fine-grained features and intra-class correlation. The proposed DAtt-ANet model is trained and evaluated on a clinical MRA image dataset (prospective research project, approved by the local ethical committee) of IAR consisting of 46 subjects, annotated by an expert radiologist from our group. We compared the proposed DAtt-ANet with five state-of-the-art CNN models based on their segmentation performance. The proposed DAtt-ANet outperformed all other methods and was able to achieve a five-fold cross-validation DICE score of $0.73\pm0.06$.

Improving Gravitational Wave Detection with 2D Convolutional Neural Networks

Siyu Fan, Yisen Wang, Yuan Luo, Alexander Michael Schmitt, Shenghua Yu

Responsive image

Auto-TLDR; Two-dimensional Convolutional Neural Networks for Gravitational Wave Detection from Time Series with Background Noise

Poster Similar

Sensitive gravitational wave (GW) detectors such as that of Laser Interferometer Gravitational-wave Observatory (LIGO) realize the direct observation of GW signals that confirm Einstein's general theory of relativity. However, it remains challenges to quickly detect faint GW signals from a large number of time series with background noise under unknown probability distributions. Traditional methods such as matched-filtering in general assume Additive White Gaussian Noise (AWGN) and are far from being real-time due to its high computational complexity. To avoid these weaknesses, one-dimensional (1D) Convolutional Neural Networks (CNNs) are introduced to achieve fast online detection in milliseconds but do not have enough consideration on the trade-off between the frequency and time features, which will be revisited in this paper through data pre-processing and subsequent two-dimensional (2D) CNNs during offline training to improve the online detection sensitivity. In this work, the input data is pre-processed to form a 2D spectrum by Short-time Fourier transform (STFT), where frequency features are extracted without learning. Then, carrying out two 1D convolutions across time and frequency axes respectively, and concatenating the time-amplitude and frequency-amplitude feature maps with equal proportion subsequently, the frequency and time features are treated equally as the input of our following two-dimensional CNNs. The simulation of our above ideas works on a generated data set with uniformly varying SNR (2-17), which combines the GW signal generated by PYCBC and the background noise sampled directly from LIGO. Satisfying the real-time online detection requirement without noise distribution assumption, the experiments of this paper demonstrate better performance in average compared to that of 1D CNNs, especially in the cases of lower SNR (4-9).

Trainable Spectrally Initializable Matrix Transformations in Convolutional Neural Networks

Michele Alberti, Angela Botros, Schuetz Narayan, Rolf Ingold, Marcus Liwicki, Mathias Seuret

Responsive image

Auto-TLDR; Trainable and Spectrally Initializable Matrix Transformations for Neural Networks

Slides Poster Similar

In this work, we introduce a new architectural component to Neural Networks (NN), i.e., trainable and spectrally initializable matrix transformations on feature maps. While previous literature has already demonstrated the possibility of adding static spectral transformations as feature processors, our focus is on more general trainable transforms. We study the transforms in various architectural configurations on four datasets of different nature: from medical (ColorectalHist, HAM10000) and natural (Flowers) images to historical documents (CB55). With rigorous experiments that control for the number of parameters and randomness, we show that networks utilizing the introduced matrix transformations outperform vanilla neural networks. The observed accuracy increases appreciably across all datasets. In addition, we show that the benefit of spectral initialization leads to significantly faster convergence, as opposed to randomly initialized matrix transformations. The transformations are implemented as auto-differentiable PyTorch modules that can be incorporated into any neural network architecture. The entire code base is open-source.

EasiECG: A Novel Inter-Patient Arrhythmia Classification Method Using ECG Waves

Chuanqi Han, Ruoran Huang, Fang Yu, Xi Huang, Li Cui

Responsive image

Auto-TLDR; EasiECG: Attention-based Convolution Factorization Machines for Arrhythmia Classification

Slides Poster Similar

Abstract—In an ECG record, the PQRST waves are of important medical significance which provide ample information reflecting heartbeat activities. In this paper, we propose a novel arrhythmia classification method namely EasiECG, characterized by simplicity and accuracy. Compared with other works, the EasiECG takes the configuration of these five key waves into account and does not require complicated feature engineering. Meanwhile, an additional encoding of the extracted features makes the EasiECG applicable even on samples with missing waves. To automatically capture interactions that contribute to the classification among the processed features, a novel adapted classification model named Attention-based Convolution Factorization Machines (ACFM) is proposed. In detail, the ACFM can learn both linear and high-order interactions from linear regression and convolution on outer-product feature interaction maps, respectively. After that, an attention mechanism implemented in the model can further assign different importance of these interactions when predicting certain types of heartbeats. To validate the effectiveness and practicability of our EasiECG, extensive experiments of inter-patient paradigm on the benchmark MIT-BIH arrhythmia database are conducted. To tackle the imbalanced sample problem in this dataset, an ingenious loss function: focal loss is adopted when training. The experiment results show that our method is competitive compared with other state-of-the-arts, especially in classifying the Supraventricular ectopic beats. Besides, the EasiECG achieves an overall accuracy of 87.6% on samples with a missing wave in the related experiment, demonstrating the robustness of our proposed method.

Inception Based Deep Learning Architecture for Tuberculosis Screening of Chest X-Rays

Dipayan Das, K.C. Santosh, Umapada Pal

Responsive image

Auto-TLDR; End to End CNN-based Chest X-ray Screening for Tuberculosis positive patients in the severely resource constrained regions of the world

Slides Poster Similar

The motivation for this work is the primary need of screening Tuberculosis (TB) positive patients in the severely resource constrained regions of the world. Chest X-ray (CXR) is considered to be a promising indicator for the onset of TB, but the lack of skilled radiologists in such regions degrades the situation. Therefore, several computer aided diagnosis (CAD) systems have been proposed to solve the decision making problem, which includes hand engineered feature extraction methods to deep learning or Convolutional Neural Network (CNN) based methods. Feature extraction, being a time and resource intensive process, often delays the process of mass screening. Hence an end to end CNN architecture is proposed in this work to solve the problem. Two benchmark CXR datasets have been used in this work, collected from Shenzhen (China) and Montgomery County (USA), on which the proposed methodology achieved a maximum abnormality detection accuracy (ACC) of 91.7\% (0.96 AUC) and 87.47\% (0.92 AUC) respectively. To the greatest of our knowledge, the obtained results are marginally superior to the state of the art results that have solely used deep learning methodologies on the aforementioned datasets.

Classification of Spatially Enriched Pixel Time Series with Convolutional Neural Networks

Mohamed Chelali, Camille Kurtz, Anne Puissant, Nicole Vincent

Responsive image

Auto-TLDR; Spatio-Temporal Features Extraction from Satellite Image Time Series Using Random Walk

Slides Poster Similar

Satellite Image Time Series (SITS), MRI sequences, and more generally image time series, constitute 2D+t data providing spatial and temporal information about an observed scene. Given a pattern recognition task such as image classification, considering jointly such rich information is crucial during the decision process. Nevertheless, due to the complex representation of the data-cube, spatio-temporal features extraction from 2D+t data remains difficult to handle. We present in this article an approach to learn such features from this data, and then to proceed to their classification. Our strategy consists in enriching pixel time series with spatial information. It is based on Random Walk to build a novel segment-based representation of the data, passing from a 2D+t dimension to a 2D one, without loosing too much spatial information. Such new representation is then involved in an end-to-end learning process with a classical 2D Convolutional Neural Network (CNN) in order to learn spatio-temporal features for the classification of image time series. Our approach is evaluated on a remote sensing application for the mapping of agricultural crops. Thanks to a visual attention mechanism, the proposed $2D$ spatio-temporal representation makes also easier the interpretation of a SITS to understand spatio-temporal phenomenons related to soil management practices.

A Comparison of Neural Network Approaches for Melanoma Classification

Maria Frasca, Michele Nappi, Michele Risi, Genoveffa Tortora, Alessia Auriemma Citarella

Responsive image

Auto-TLDR; Classification of Melanoma Using Deep Neural Network Methodologies

Slides Poster Similar

Melanoma is the deadliest form of skin cancer and it is diagnosed mainly visually, starting from initial clinical screening and followed by dermoscopic analysis, biopsy and histopathological examination. A dermatologist’s recognition of melanoma may be subject to errors and may take some time to diagnose it. In this regard, deep learning can be useful in the study and classification of skin cancer. In particular, by classifying images with Deep Neural Network methodologies, it is possible to obtain comparable or even superior results compared to those of dermatologists. In this paper, we propose a methodology for the classification of melanoma by adopting different deep learning techniques applied to a common dataset, composed of images from the ISIC dataset and consisting of different types of skin diseases, including melanoma on which we applied a specific pre-processing phase. In particular, a comparison of the results is performed in order to select the best effective neural network to be applied to the problem of recognition and classification of melanoma. Moreover, we also evaluate the impact of the pre- processing phase on the final classification. Different metrics such as accuracy, sensitivity, and specificity have been selected to assess the goodness of the adopted neural networks and compare them also with the manual classification of dermatologists.

Using Machine Learning to Refer Patients with Chronic Kidney Disease to Secondary Care

Lee Au-Yeung, Xianghua Xie, Timothy Marcus Scale, James Anthony Chess

Responsive image

Auto-TLDR; A Machine Learning Approach for Chronic Kidney Disease Prediction using Blood Test Data

Slides Poster Similar

There has been growing interest recently in using machine learning techniques as an aid in clinical medicine. Machine learning offers a range of classification algorithms which can be applied to medical data to aid in making clinical predictions. Recent studies have demonstrated the high predictive accuracy of various classification algorithms applied to clinical data. Several studies have already been conducted in diagnosing or predicting chronic kidney disease at various stages using different sets of variables. In this study we are investigating the use machine learning techniques with blood test data. Such a system could aid renal teams in making recommendations to primary care general practitioners to refer patients to secondary care where patients may benefit from earlier specialist assessment and medical intervention. We are able to achieve an overall accuracy of 88.48\% using logistic regression, 87.12\% using ANN and 85.29\% using SVM. ANNs performed with the highest sensitivity at 89.74\% compared to 86.67\% for logistic regression and 85.51\% for SVM.

Graph Convolutional Neural Networks for Power Line Outage Identification

Jia He, Maggie Cheng

Responsive image

Auto-TLDR; Graph Convolutional Networks for Power Line Outage Identification

Poster Similar

In this paper, we consider the power line outage identification problem as a graph signal classification problem, where the signal at each vertex is given as a time series. We propose graph convolutional networks (GCNs) for the task of classifying signals supported on graphs. An important element of the GCN design is filter design. We consider filtering signals in either the vertex (spatial) domain, or the frequency (spectral) domain. Two basic architectures are proposed. In the spatial GCN architecture, the GCN uses a graph shift operator as the basic building block to incorporate the underlying graph structure into the convolution layer. The spatial filter directly utilizes the graph connectivity information. It defines the filter to be a polynomial in the graph shift operator to obtain the convolved features that aggregate neighborhood information of each node. In the spectral GCN architecture, a frequency filter is used instead. A graph Fourier transform operator first transforms the raw graph signal from the vertex domain to the frequency domain, and then a filter is defined using the graph's spectral parameters. The spectral GCN then uses the output from the graph Fourier transform to compute the convolved features. There are additional challenges to classify the time-evolving graph signal as the signal value at each vertex changes over time. The GCNs are designed to recognize different spatiotemporal patterns from high-dimensional data defined on a graph. The application of the proposed methods to power line outage identification shows that these GCN architectures can successfully classify abnormal signal patterns and identify the outage location.

Automatic Semantic Segmentation of Structural Elements related to the Spinal Cord in the Lumbar Region by Using Convolutional Neural Networks

Jhon Jairo Sáenz Gamboa, Maria De La Iglesia-Vaya, Jon Ander Gómez

Responsive image

Auto-TLDR; Semantic Segmentation of Lumbar Spine Using Convolutional Neural Networks

Slides Poster Similar

This work addresses the problem of automatically segmenting the MR images corresponding to the lumbar spine. The purpose is to detect and delimit the different structural elements like vertebrae, intervertebral discs, nerves, blood vessels, etc. This task is known as semantic segmentation. The approach proposed in this work is based on convolutional neural networks whose output is a mask where each pixel from the input image is classified into one of the possible classes. Classes were defined by radiologists and correspond to structural elements and tissues. The proposed network architectures are variants of the U-Net. Several complementary blocks were used to define the variants: spatial attention models, deep supervision and multi-kernels at input, this last block type is based on the idea of inception. Those architectures which got the best results are described in this paper, and their results are discussed. Two of the proposed architectures outperform the standard U-Net used as baseline.

Malware Detection by Exploiting Deep Learning over Binary Programs

Panpan Qi, Zhaoqi Zhang, Wei Wang, Chang Yao

Responsive image

Auto-TLDR; End-to-End Malware Detection without Feature Engineering

Slides Poster Similar

Malware evolves rapidly over time, which makes existing solutions being ineffective in detecting newly released malware. Machine learning models that can learn to capture malicious patterns directly from the data play an increasingly important role in malware analysis. However, traditional machine learning models heavily depend on feature engineering. The extracted static features are vulnerable as hackers could create new malware with different feature values to deceive the machine learning models. In this paper, we propose an end-to-end malware detection framework consisting of convolutional neural network, autoencoder and neural decision trees. It learns the features from multiple domains for malware detection without feature engineering. In addition, since anti-virus products should have a very low false alarm rate to avoid annoying users, we propose a special loss function, which optimizes the recall for a fixed low false positive rate (e.g., less than 0.1%). Experiments show that the proposed framework has achieved a better recall than the baseline models, and the derived loss function also makes a difference.

The Application of Capsule Neural Network Based CNN for Speech Emotion Recognition

Xincheng Wen, Kunhong Liu

Responsive image

Auto-TLDR; CapCNN: A Capsule Neural Network for Speech Emotion Recognition

Slides Poster Similar

Moreover, the abstraction of audio features makes it impossible to fully use the inherent relationship among audio features. This paper proposes a model that combines a convolutional neural network(CNN) and a capsule neural network (CapsNet), named as CapCNN. The advantage of CapCNN lies in that it provides a solution to solve time sensitivity and focus on the overall characteristics. In this study, it is found that CapCNN can well handle the speech emotion recognition task. Compared with other state-of-art methods, our algorithm shows high performances on the CASIA and EMODB datasets. The detailed analysis confirms that our method provides balanced results on the various classes.

Deep Transfer Learning for Alzheimer’s Disease Detection

Nicole Cilia, Claudio De Stefano, Francesco Fontanella, Claudio Marrocco, Mario Molinara, Alessandra Scotto Di Freca

Responsive image

Auto-TLDR; Automatic Detection of Handwriting Alterations for Alzheimer's Disease Diagnosis using Dynamic Features

Slides Poster Similar

Early detection of Alzheimer’s Disease (AD) is essential in order to initiate therapies that can reduce the effects of such a disease, improving both life quality and life expectancy of patients. Among all the activities carried out in our daily life, handwriting seems one of the first to be influenced by the arise of neurodegenerative diseases. For this reason, the analysis of handwriting and the study of its alterations has become of great interest in this research field in order to make a diagnosis as early as possible. In recent years, many studies have tried to use classification algorithms applied to handwritings to implement decision support systems for AD diagnosis. A key issue for the use of these techniques is the detection of effective features, that allow the system to distinguish the natural handwriting alterations due to age, from those caused by neurodegenerative disorders. In this context, many interesting results have been published in the literature in which the features have been typically selected by hand, generally considering the dynamics of the handwriting process in order to detect motor disorders closely related to AD. Features directly derived from handwriting generation models can be also very helpful for AD diagnosis. It should be remarked, however, that the above features do not consider changes in the shape of handwritten traces, which may occur as a consequence of neurodegenerative diseases, as well as the correlation among shape alterations and changes in the dynamics of the handwriting process. Moving from these considerations, the aim of this study is to verify if the combined use of both shape and dynamic features allows a decision support system to improve performance for AD diagnosis. To this purpose, starting from a database of on-line handwriting samples, we generated for each of them a synthetic off-line colour image, where the colour of each elementary trait encodes, in the three RGB channels, the dynamic information associated to that trait. Finally, we exploited the capability of Deep Neural Networks (DNN) to automatically extract features from raw images. The experimental comparison of the results obtained by using standard features and features extracted according the above procedure, confirmed the effectiveness of our approach.

Vertex Feature Encoding and Hierarchical Temporal Modeling in a Spatio-Temporal Graph Convolutional Network for Action Recognition

Konstantinos Papadopoulos, Enjie Ghorbel, Djamila Aouada, Bjorn Ottersten

Responsive image

Auto-TLDR; Spatio-Temporal Graph Convolutional Network for Skeleton-Based Action Recognition

Slides Poster Similar

Spatio-temporal Graph Convolutional Networks (ST-GCNs) have shown great performance in the context of skeleton-based action recognition. Nevertheless, ST-GCNs use raw skeleton data as vertex features. Such features have low dimensionality and might not be optimal for action discrimination. Moreover, a single layer of temporal convolution is used to model short-term temporal dependencies but can be insufficient for capturing both long-term. In this paper, we extend the Spatio-Temporal Graph Convolutional Network for skeleton-based action recognition by introducing two novel modules, namely, the Graph Vertex Feature Encoder (GVFE) and the Dilated Hierarchical Temporal Convolutional Network (DH-TCN). On the one hand, the GVFE module learns appropriate vertex features for action recognition by encoding raw skeleton data into a new feature space. On the other hand, the DH-TCN module is capable of capturing both short-term and long-term temporal dependencies using a hierarchical dilated convolutional network. Experiments have been conducted on the challenging NTU RGB-D 60, NTU RGB-D 120 and Kinetics datasets. The obtained results show that our method competes with state-of-the-art approaches while using a smaller number of layers and parameters; thus reducing the required training time and memory.

Prediction of Obstructive Coronary Artery Disease from Myocardial Perfusion Scintigraphy using Deep Neural Networks

Ida Arvidsson, Niels Christian Overgaard, Miguel Ochoa Figueroa, Jeronimo Rose, Anette Davidsson, Kalle Åström, Anders Heyden

Responsive image

Auto-TLDR; A Deep Learning Algorithm for Multi-label Classification of Myocardial Perfusion Scintigraphy for Stable Ischemic Heart Disease

Slides Poster Similar

For diagnosis and risk assessment in patients with stable ischemic heart disease, myocardial perfusion scintigraphy is one of the most common cardiological examinations performed today. There are however many motivations for why an artificial intelligence algorithm would provide useful input to this task. For example to reduce the subjectiveness and save time for the nuclear medicine physicians working with this time consuming task. In this work we have developed a deep learning algorithm for multi-label classification based on a modified convolutional neural network to estimate probability of obstructive coronary artery disease in the left anterior artery, left circumflex artery and right coronary artery. The prediction is based on data from myocardial perfusion scintigraphy studies conducted in a dedicated Cadmium-Zinc-Telluride cardio camera (D-SPECT Spectrum Dynamics). Data from 588 patients was available, with stress images in both upright and supine position, as well as a number of auxiliary parameters such as angina symptoms and BMI. The data was used to train and evaluate the algorithm using 5-fold cross-validation. We achieve state-of-the-art results for this task with an area under the receiver operating characteristics curve of 0.89 as average on per-vessel level and 0.94 on per-patient level.

ESResNet: Environmental Sound Classification Based on Visual Domain Models

Andrey Guzhov, Federico Raue, Jörn Hees, Andreas Dengel

Responsive image

Auto-TLDR; Environmental Sound Classification with Short-Time Fourier Transform Spectrograms

Slides Poster Similar

Environmental Sound Classification (ESC) is an active research area in the audio domain and has seen a lot of progress in the past years. However, many of the existing approaches achieve high accuracy by relying on domain-specific features and architectures, making it harder to benefit from advances in other fields (e.g., the image domain). Additionally, some of the past successes have been attributed to a discrepancy of how results are evaluated (i.e., on unofficial splits of the UrbanSound8K (US8K) dataset), distorting the overall progression of the field. The contribution of this paper is twofold. First, we present a model that is inherently compatible with mono and stereo sound inputs. Our model is based on simple log-power Short-Time Fourier Transform (STFT) spectrograms and combines them with several well-known approaches from the image domain (i.e., ResNet, Siamese-like networks and attention). We investigate the influence of cross-domain pre-training, architectural changes, and evaluate our model on standard datasets. We find that our model out-performs all previously known approaches in a fair comparison by achieving accuracies of 97.0 % (ESC-10), 91.5 % (ESC-50) and 84.2 % / 85.4 % (US8K mono / stereo). Second, we provide a comprehensive overview of the actual state of the field, by differentiating several previously reported results on the US8K dataset between official or unofficial splits. For better reproducibility, our code (including any re-implementations) is made available.

PC-Net: A Deep Network for 3D Point Clouds Analysis

Zhuo Chen, Tao Guan, Yawei Luo, Yuesong Wang

Responsive image

Auto-TLDR; PC-Net: A Hierarchical Neural Network for 3D Point Clouds Analysis

Slides Poster Similar

Due to the irregularity and sparsity of 3D point clouds, applying convolutional neural networks directly on them can be nontrivial. In this work, we propose a simple but effective approach for 3D Point Clouds analysis, named PC-Net. PC-Net directly learns on point sets and is equipped with three new operations: first, we apply a novel scale-aware neighbor search for adaptive neighborhood extracting; second, for each neighboring point, we learn a local spatial feature as a complement to their associated features; finally, at the end we use a distance re-weighted pooling to aggregate all the features from local structure. With this module, we design hierarchical neural network for point cloud understanding. For both classification and segmentation tasks, our architecture proves effective in the experiments and our models demonstrate state-of-the-art performance over existing deep learning methods on popular point cloud benchmarks.

Spatial-Related and Scale-Aware Network for Crowd Counting

Lei Li, Yuan Dong, Hongliang Bai

Responsive image

Auto-TLDR; Spatial Attention for Crowd Counting

Slides Poster Similar

Crowd counting aims to estimate the number of people in images. Although promising progresses have been made with the prevalence of deep Convolutional Neural Networks, there still remains a challenging task due to cluttered backgrounds and varying scales of people within an image. In this paper, we propose a learnable spatial attention module which can get the spatial relations to diminish the negative impact of backgrounds. Besides, a dense hybrid dilated convolution module is also brought up to preserve information derived from varied scales. With these two modules, our network can deal with the problem caused by scale variance and background interference. To demonstrate the effectiveness of our method, we compare it with state-of-the-art algorithms on three representative crowd counting benchmarks (ShanghaiTech UCF-QNRF,UCF_CC_50). Experimental results show that our proposed network can achieve significant improvements on all the three datasets.

Wireless Localisation in WiFi Using Novel Deep Architectures

Peizheng Li, Han Cui, Aftab Khan, Usman Raza, Robert Piechocki, Angela Doufexi, Tim Farnham

Responsive image

Auto-TLDR; Deep Neural Network for Indoor Localisation of WiFi Devices in Indoor Environments

Slides Poster Similar

This paper studies the indoor localisation of WiFi devices based on a commodity chipset and standard channel sounding. First, we present a novel shallow neural network (SNN) in which features are extracted from the channel state information (CSI) corresponding to WiFi subcarriers received on different antennas and used to train the model. The single layer architecture of this localisation neural network makes it lightweight and easy-to-deploy on devices with stringent constraints on computational resources. We further investigate for localisation the use of deep learning models and design novel architectures for convolutional neural network (CNN) and long-short term memory (LSTM). We extensively evaluate these localisation algorithms for continuous tracking in indoor environments. Experimental results prove that even an SNN model, after a careful handcrafted feature extraction, can achieve accurate localisation. Meanwhile, using a well-organised architecture, the neural network models can be trained directly with raw data from the CSI and localisation features can be automatically extracted to achieve accurate position estimates. We also found that the performance of neural network-based methods are directly affected by the number of anchor access points (APs) regardless of their structure. With three APs, all neural network models proposed in this paper can obtain localisation accuracy of around 0.5 metres. In addition the proposed deep NN architecture reduces the data pre-processing time by 6.5 hours compared with a shallow NN using the data collected in our testbed. In the deployment phase, the inference time is also significantly reduced to 0.1 ms per sample. We also demonstrate the generalisation capability of the proposed method by evaluating models using different target movement characteristics to the ones in which they were trained.

Dual Encoder Fusion U-Net (DEFU-Net) for Cross-manufacturer Chest X-Ray Segmentation

Zhang Lipei, Aozhi Liu, Jing Xiao

Responsive image

Auto-TLDR; Inception Convolutional Neural Network with Dilation for Chest X-Ray Segmentation

Slides Similar

A number of methods based on the deep learning have been applied to medical image segmentation and have achieved state-of-the-art performance. The most famous technique is U-Net which has been used to many medical datasets including the Chest X-ray. Due to the importance of chest x- ray data in studying COVID-19, there is a demand for state-of- art models capable of precisely segmenting chest x-rays. In this paper, we propose a dual encoder fusion U-Net framework for Chest X-rays based on Inception Convolutional Neural Network with dilation, Densely Connected Recurrent Convolutional Neural Network, which is named DEFU-Net. The densely connected recurrent path extends the network deeper for facilitating context feature extraction. In order to increase the width of network and enrich representation of features, the inception blocks with dilation have been used. The inception blocks can capture globally and locally spatial information with various receptive fields to avoid information loss caused by max-pooling. Meanwhile, the features fusion of two path by summation preserve the context and the spatial information for decoding part. We applied this model in Chest X-ray dataset from two different manufacturers (Montgomery and Shenzhen hospital). The DEFU-Net achieves the better performance than basic U-Net, residual U-Net, BCDU- Net, R2U-Net and attention R2U-Net. This model approaches state-of-the-art in this mixed dataset. The open source code for this proposed framework is public available.

DenseRecognition of Spoken Languages

Jaybrata Chakraborty, Bappaditya Chakraborty, Ujjwal Bhattacharya

Responsive image

Auto-TLDR; DenseNet: A Dense Convolutional Network Architecture for Speech Recognition in Indian Languages

Slides Poster Similar

In the present study, we have, for the first time, con- sidered a large number of Indian languages for recog- nition from their audio signals of different sources. A dense convolutional network architecture (DenseNet) has been proposed for this classification problem. Dy- namic elimination of low energy frames from the input speech signal has been considered as a preprocessing operation. Mel-spectrogram of pre-processed speech signal is fed to a DenseNet architecture for recogni- tion of its language. Recognition performance of the proposed architecture has been compared with that of several state-of-the-art deep architectures which include a traditional convolutional neural network (CNN), multiple ResNet architectures, CNN-BLSTM and DenseNet-BLSTM hybrid architectures. Addition- ally, we obtained recognition performances of a stacked BLSTM architecture fed with different sets of hand- crafted features for comparison purpose. Simulations have been performed on two different standard datasets which include (i) IITKGP-MLILSC dataset of news clips in 27 different Indian languages and (ii) Linguistic Data Consortium (LDC) dataset of telephonic conver- sations in 5 different Indian languages. Recognition performance of the proposed framework has been found to be consistently and significantly better than all other frameworks implemented in this study.

FOANet: A Focus of Attention Network with Application to Myocardium Segmentation

Zhou Zhao, Elodie Puybareau, Nicolas Boutry, Thierry Geraud

Responsive image

Auto-TLDR; FOANet: A Hybrid Loss Function for Myocardium Segmentation of Cardiac Magnetic Resonance Images

Slides Poster Similar

In myocardium segmentation of cardiac magnetic resonance images, ambiguities often appear near the boundaries of the target domains due to tissue similarities. To address this issue, we propose a new architecture, called FOANet, which can be decomposed in three main steps: a localization step, a Gaussian-based contrast enhancement step, and a segmentation step. This architecture is supplied with a hybrid loss function that guides the FOANet to study the transformation relationship between the input image and the corresponding label in a threelevel hierarchy (pixel-, patch- and map-level), which is helpful to improve segmentation and recovery of the boundaries. We demonstrate the efficiency of our approach on two public datasets in terms of regional and boundary segmentations.

Two-Level Attention-Based Fusion Learning for RGB-D Face Recognition

Hardik Uppal, Alireza Sepas-Moghaddam, Michael Greenspan, Ali Etemad

Responsive image

Auto-TLDR; Fused RGB-D Facial Recognition using Attention-Aware Feature Fusion

Slides Poster Similar

With recent advances in RGB-D sensing technologies as well as improvements in machine learning and fusion techniques, RGB-D facial recognition has become an active area of research. A novel attention aware method is proposed to fuse two image modalities, RGB and depth, for enhanced RGB-D facial recognition. The proposed method first extracts features from both modalities using a convolutional feature extractor. These features are then fused using a two layer attention mechanism. The first layer focuses on the fused feature maps generated by the feature extractor, exploiting the relationship between feature maps using LSTM recurrent learning. The second layer focuses on the spatial features of those maps using convolution. The training database is preprocessed and augmented through a set of geometric transformations, and the learning process is further aided using transfer learning from a pure 2D RGB image training process. Comparative evaluations demonstrate that the proposed method outperforms other state-of-the-art approaches, including both traditional and deep neural network-based methods, on the challenging CurtinFaces and IIIT-D RGB-D benchmark databases, achieving classification accuracies over 98.2% and 99.3% respectively. The proposed attention mechanism is also compared with other attention mechanisms, demonstrating more accurate results.

Attention Pyramid Module for Scene Recognition

Zhinan Qiao, Xiaohui Yuan, Chengyuan Zhuang, Abolfazl Meyarian

Responsive image

Auto-TLDR; Attention Pyramid Module for Multi-Scale Scene Recognition

Slides Poster Similar

The unrestricted open vocabulary and diverse substances of scenery images bring significant challenges to scene recognition. However, most deep learning architectures and attention methods are developed on general-purpose datasets and omit the characteristics of scene data. In this paper, we exploit the attention pyramid module (APM) to tackle the predicament of scene recognition. Our method streamlines the multi-scale scene recognition pipeline, learns comprehensive scene features at various scales and locations, addresses the interdependency among scales, and further assists feature re-calibration as well as aggregation process. APM is extremely light-weighted and can be easily plugged into existing network architectures in a parameter-efficient manner. By simply integrating APM into ResNet-50, we obtain a 3.54\% boost in terms of top-1 accuracy on the benchmark scene dataset. Comprehensive experiments show that APM achieves better performance comparing with state-of-the-art attention methods using significant less computation budget. Code and pre-trained models will be made publicly available.

Which are the factors affecting the performance of audio surveillance systems?

Antonio Greco, Antonio Roberto, Alessia Saggese, Mario Vento

Responsive image

Auto-TLDR; Sound Event Recognition Using Convolutional Neural Networks and Visual Representations on MIVIA Audio Events

Slides Similar

Sound event recognition systems are rapidly becoming part of our life, since they can be profitably used in several vertical markets, ranging from audio security applications to scene classification and multi-modal analysis in social robotics. In the last years, a not negligible part of the scientific community started to apply Convolutional Neural Networks (CNNs) to image-based representations of the audio stream, due to their successful adoption in almost all the computer vision tasks. In this paper, we carry out a detailed benchmark of various widely used CNN architectures and visual representations on a popular dataset, namely the MIVIA Audio Events database. Our analysis is aimed at understanding how these factors affect the sound event recognition performance with a particular focus on the false positive rate, very relevant in audio surveillance solutions. In fact, although most of the proposed solutions achieve a high recognition rate, the capability of distinguishing the events-of-interest from the background is often not yet sufficient for real systems, and prevent its usage in real applications. Our comprehensive experimental analysis investigates this aspect and allows to identify useful design guidelines for increasing the specificity of sound event recognition systems.

Context-Aware Residual Module for Image Classification

Jing Bai, Ran Chen

Responsive image

Auto-TLDR; Context-Aware Residual Module for Image Classification

Slides Poster Similar

Attention module has achieved great success in numerous vision tasks. However, existing visual attention modules generally consider the features of a single-scale, and cannot make full use of their multi-scale contextual information. Meanwhile, the multi-scale spatial feature representation has demonstrated its outstanding performance in a wide range of applications. However, the multi-scale features are always represented in a layer-wise manner, i.e. it is impossible to know their contextual information at a granular level. Focusing on the above issue, a context-aware residual module for image classification is proposed in this paper. It consists of a novel multi-scale channel attention module MSCAM to learn refined channel weights by considering the visual features of its own scale and its surrounding fields, and a multi-scale spatial aware module MSSAM to further capture more spatial information. Either or both of the two modules can be plugged into any CNN-based backbone image classification architecture with a short residual connection to obtain the context-aware enhanced features. The experiments on public image recognition datasets including CIFAR10, CIFAR100,Tiny-ImageNet and ImageNet consistently demonstrate that our proposed modules significantly outperforms a wide-used state-of-the-art methods, e.g., ResNet and the lightweight networks of MobileNet and SqueezeeNet.

Generalized Iris Presentation Attack Detection Algorithm under Cross-Database Settings

Mehak Gupta, Vishal Singh, Akshay Agarwal, Mayank Vatsa, Richa Singh

Responsive image

Auto-TLDR; MVNet: A Deep Learning-based PAD Network for Iris Recognition against Presentation Attacks

Slides Poster Similar

The deployment of biometrics features based person identification has increased significantly from border access to mobile unlock to electronic transactions. Iris recognition is considered as one of the most accurate biometric modality for person identification. However, the vulnerability of this recognition towards presentation attacks, especially towards the 3D contact lenses, can limit its potential deployments. The textured lenses are so effective in hiding the real texture of iris that it can fool not only the automatic recognition algorithms but also the human examiners. While in literature, several presentation attack detection (PAD) algorithms are presented; however, the significant limitation is the generalizability against an unseen database, unseen sensor, and different imaging environment. Inspired by the success of the hybrid algorithm or fusion of multiple detection networks, we have proposed a deep learning-based PAD network that utilizes multiple feature representation layers. The computational complexity is an essential factor in training the deep neural networks; therefore, to limit the computational complexity while learning multiple feature representation layers, a base model is kept the same. The network is trained end-to-end using a softmax classifier. We have evaluated the performance of the proposed network termed as MVNet using multiple databases such as IIITD-WVU MUIPA, IIITD-WVU UnMIPA database under cross-database training-testing settings. The experiments are performed extensively to assess the generalizability of the proposed algorithm.

Personalized Models in Human Activity Recognition Using Deep Learning

Hamza Amrani, Daniela Micucci, Paolo Napoletano

Responsive image

Auto-TLDR; Incremental Learning for Personalized Human Activity Recognition

Slides Poster Similar

Current sensor-based human activity recognition techniques that rely on a user-independent model struggle to generalize to new users and on to changes that a person may make over time to his or her way of carrying out activities. Incremental learning is a technique that allows to obtain personalized models which may improve the performance on the classifiers thanks to a continuous learning based on user data. Finally, deep learning techniques have been proven to be more effective with respect to traditional ones in the generation of user-independent models. The aim of our work is therefore to put together deep learning techniques with incremental learning in order to obtain personalized models that perform better with respect to user-independent model and personalized model obtained using traditional machine learning techniques. The experimentation was done by comparing the results obtained by a technique in the state of the art with those obtained by two neural networks (ResNet and a simplified CNN) on three datasets. The experimentation showed that neural networks adapt faster to a new user than the baseline.

Appliance Identification Using a Histogram Post-Processing of 2D Local Binary Patterns for Smart Grid Applications

Yassine Himeur, Abdullah Alsalemi, Faycal Bensaali, Abbes Amira

Responsive image

Auto-TLDR; LBP-BEVM based Local Binary Patterns for Appliances Identification in the Smart Grid

Similar

Identifying domestic appliances in the smart grid leads to a better power usage management and further helps in detecting appliance-level abnormalities. An efficient identification can be achieved only if a robust feature extraction scheme is developed with a high ability to discriminate between different appliances on the smart grid. Accordingly, we propose in this paper a novel method to extract electrical power signatures after transforming the power signal to 2D space, which has more encoding possibilities. Following, an improved local binary patterns (LBP) is proposed that relies on improving the discriminative ability of conventional LBP using a post-processing stage. A binarized eigenvalue map (BEVM) is extracted from the 2D power matrix and then used to post-process the generated LBP representation. Next, two histograms are constructed, namely up and down histograms, and are then concatenated to form the global histogram. A comprehensive performance evaluation is performed on two different datasets, namely the GREEND and WITHED, in which power data were collected at 1 Hz and 44000 Hz sampling rates, respectively. The obtained results revealed the superiority of the proposed LBP-BEVM based system in terms of the identification performance versus other 2D descriptors and existing identification frameworks.

Bridging the Gap between Natural and Medical Images through Deep Colorization

Lia Morra, Luca Piano, Fabrizio Lamberti, Tatiana Tommasi

Responsive image

Auto-TLDR; Transfer Learning for Diagnosis on X-ray Images Using Color Adaptation

Slides Poster Similar

Deep learning has thrived by training on large-scale datasets. However, in many applications, as for medical image diagnosis, getting massive amount of data is still prohibitive due to privacy, lack of acquisition homogeneity and annotation cost. In this scenario transfer learning from natural image collections is a standard practice that attempts to tackle shape, texture and color discrepancy all at once through pretrained model fine-tuning. In this work we propose to disentangle those challenges and design a dedicated network module that focuses on color adaptation. We combine learning from scratch of the color module with transfer learning of different classification backbones obtaining an end-to-end, easy-to-train architecture for diagnostic image recognition on X-ray images. Extensive experiments show how our approach is particularly efficient in case of data scarcity and provides a new path for further transferring the learned color information across multiple medical datasets.

PCANet: Pyramid Context-Aware Network for Retinal Vessel Segmentation

Yi Zhang, Yixuan Chen, Kai Zhang

Responsive image

Auto-TLDR; PCANet: Adaptive Context-Aware Network for Automated Retinal Vessel Segmentation

Slides Poster Similar

Automated retinal vessel segmentation plays an important role in the diagnosis of some diseases such as diabetes, arteriosclerosis and hypertension. Recent works attempt to improve segmentation performance by exploring either global or local contexts. However, the context demands are varying from regions in each image and different levels of network. To address these problems, we propose Pyramid Context-aware Network (PCANet), which can adaptively capture multi-scale context representations. Specifically, PCANet is composed of multiple Adaptive Context-aware (ACA) blocks arranged in parallel, each of which can adaptively obtain the context-aware features by estimating affinity coefficients at a specific scale under the guidance of global contextual dependencies. Meanwhile, we import ACA blocks with specific scales in different levels of the network to obtain a coarse-to-fine result. Furthermore, an integrated test-time augmentation method is developed to further boost the performance of PCANet. Finally, extensive experiments demonstrate the effectiveness of the proposed PCANet, and state-of-the-art performances are achieved with AUCs of 0.9866, 0.9886 and F1 Scores of 0.8274, 0.8371 on two public datasets, DRIVE and STARE, respectively.

GazeMAE: General Representations of Eye Movements Using a Micro-Macro Autoencoder

Louise Gillian C. Bautista, Prospero Naval

Responsive image

Auto-TLDR; Fast and Slow Eye Movement Representations for Sentiment-agnostic Eye Tracking

Slides Poster Similar

Eye movements are intricate and dynamic events that contain a wealth of information about the subject and the stimuli. We propose an abstract representation of eye movements that preserve the important nuances in gaze behavior while being stimuli-agnostic. We consider eye movements as raw position and velocity signals and train a deep temporal convolutional autoencoder to learn micro-scale and macro-scale representations corresponding to the fast and slow features of eye movements. These joint representations are evaluated by fitting a linear classifier on various tasks and outperform other works in biometrics and stimuli classification. Further experiments highlight the validity and generalizability of this method, bringing eye tracking research closer to real-world applications.

Dealing with Scarce Labelled Data: Semi-Supervised Deep Learning with Mix Match for Covid-19 Detection Using Chest X-Ray Images

Saúl Calderón Ramirez, Raghvendra Giri, Shengxiang Yang, Armaghan Moemeni, Mario Umaña, David Elizondo, Jordina Torrents-Barrena, Miguel A. Molina-Cabello

Responsive image

Auto-TLDR; Semi-supervised Deep Learning for Covid-19 Detection using Chest X-rays

Slides Poster Similar

Coronavirus (Covid-19) is spreading fast, infecting people through contact in various forms including droplets from sneezing and coughing. Therefore, the detection of infected subjects in an early, quick and cheap manner is urgent. Currently available tests are scarce and limited to people in danger of serious illness. The application of deep learning to chest X-ray images for Covid-19 detection is an attractive approach. However, this technology usually relies on the availability of large labelled datasets, a requirement hard to meet in the context of a virus outbreak. To overcome this challenge, a semi-supervised deep learning model using both labelled and unlabelled data is proposed. We developed and tested a semi-supervised deep learning framework based on the Mix Match architecture to classify chest X-rays into Covid-19, pneumonia and healthy cases. The presented approach was calibrated using two publicly available datasets. The results show an accuracy increase of around $15\%$ under low labelled / unlabelled data ratio. This indicates that our semi-supervised framework can help improve performance levels towards Covid-19 detection when the amount of high-quality labelled data is scarce. Also, we introduce a semi-supervised deep learning boost coefficient which is meant to ease the scalability of our approach and performance comparison.

Tensor Factorization of Brain Structural Graph for Unsupervised Classification in Multiple Sclerosis

Berardino Barile, Marzullo Aldo, Claudio Stamile, Françoise Durand-Dubief, Dominique Sappey-Marinier

Responsive image

Auto-TLDR; A Fully Automated Tensor-based Algorithm for Multiple Sclerosis Classification based on Structural Connectivity Graph of the White Matter Network

Slides Poster Similar

Analysis of longitudinal changes in brain diseases is essential for a better characterization of pathological processes and evaluation of the prognosis. This is particularly important in Multiple Sclerosis (MS) which is the first traumatic disease in young adults, with unknown etiology and characterized by complex inflammatory and degenerative processes leading to different clinical courses. In this work, we propose a fully automated tensor-based algorithm for the classification of MS clinical forms based on the structural connectivity graph of the white matter (WM) network. Using non-negative tensor factorization (NTF), we first focused on the detection of pathological patterns of the brain WM network affected by significant longitudinal variations. Second, we performed unsupervised classification of different MS phenotypes based on these longitudinal patterns, and finally, we used the latent factors obtained by the factorization algorithm to identify the most affected brain regions.

3D Medical Multi-Modal Segmentation Network Guided by Multi-Source Correlation Constraint

Tongxue Zhou, Stéphane Canu, Pierre Vera, Su Ruan

Responsive image

Auto-TLDR; Multi-modality Segmentation with Correlation Constrained Network

Slides Poster Similar

In the field of multimodal segmentation, the correlation between different modalities can be considered for improving the segmentation results. In this paper, we propose a multi-modality segmentation network with a correlation constraint. Our network includes N model-independent encoding paths with N image sources, a correlation constrain block, a feature fusion block, and a decoding path. The model-independent encoding path can capture modality-specific features from the N modalities. Since there exists a strong correlation between different modalities, we first propose a linear correlation block to learn the correlation between modalities, then a loss function is used to guide the network to learn the correlated features based on the correlation representation block. This block forces the network to learn the latent correlated features which are more relevant for segmentation. Considering that not all the features extracted from the encoders are useful for segmentation, we propose to use dual attention based fusion block to recalibrate the features along the modality and spatial paths, which can suppress less informative features and emphasize the useful ones. The fused feature representation is finally projected by the decoder to obtain the segmentation result. Our experiment results tested on BraTS-2018 dataset for brain tumor segmentation demonstrate the effectiveness of our proposed method.

Fast and Accurate Real-Time Semantic Segmentation with Dilated Asymmetric Convolutions

Leonel Rosas-Arias, Gibran Benitez-Garcia, Jose Portillo-Portillo, Gabriel Sanchez-Perez, Keiji Yanai

Responsive image

Auto-TLDR; FASSD-Net: Dilated Asymmetric Pyramidal Fusion for Real-Time Semantic Segmentation

Slides Poster Similar

Recent works have shown promising results applied to real-time semantic segmentation tasks. To maintain fast inference speed, most of the existing networks make use of light decoders, or they simply do not use them at all. This strategy helps to maintain a fast inference speed; however, their accuracy performance is significantly lower in comparison to non-real-time semantic segmentation networks. In this paper, we introduce two key modules aimed to design a high-performance decoder for real-time semantic segmentation for reducing the accuracy gap between real-time and non-real-time segmentation networks. Our first module, Dilated Asymmetric Pyramidal Fusion (DAPF), is designed to substantially increase the receptive field on the top of the last stage of the encoder, obtaining richer contextual features. Our second module, Multi-resolution Dilated Asymmetric (MDA) module, fuses and refines detail and contextual information from multi-scale feature maps coming from early and deeper stages of the network. Both modules exploit contextual information without excessively increasing the computational complexity by using asymmetric convolutions. Our proposed network entitled “FASSD-Net” reaches 78.8% of mIoU accuracy on the Cityscapes validation dataset at 41.1 FPS on full resolution images (1024x2048). Besides, with a light version of our network, we reach 74.1% of mIoU at 133.1 FPS (full resolution) on a single NVIDIA GTX 1080Ti card with no additional acceleration techniques. The source code and pre-trained models are available at https://github.com/GibranBenitez/FASSD-Net.