A Systematic Investigation on Deep Architectures for Automatic Skin Lesions Classification

Pierluigi Carcagni, Marco Leo, Andrea Cuna, Giuseppe Celeste, Cosimo Distante

Auto-TLDR; RegNet: Deep Investigation of Convolutional Neural Networks for Automatic Classification of Skin Lesions

Computer vision techniques are increasingly employed in healthcare and medicine, principally to support experienced medical staff in making quick and correct diagnoses. One of the hot topics in this area is the automatic classification of skin lesions. Several promising works exist, mainly leveraging Convolutional Neural Networks (CNNs), but the proposed pipelines mainly rely on complex data preprocessing, and there is no systematic investigation of whether the available deep models can actually reach the accuracy needed for real applications. To overcome these drawbacks, this work introduces an end-to-end pipeline in which some of the most recent CNN architectures are included and compared on the largest common benchmark dataset recently introduced. To this aim, for the first time in this application context, a new network design paradigm, namely RegNet, has been exploited to get the best models among a population of configurations. The paper makes a threefold contribution with respect to the previous literature: a deep investigation of several CNN architectures leading to a consistent improvement in lesion recognition accuracy, the exploitation of a new network design paradigm able to study the behavior of populations of models, and a deep discussion of the pros and cons of each analyzed method, paving the way towards new research lines.

Similar papers

A Systematic Investigation on End-To-End Deep Recognition of Grocery Products in the Wild

Marco Leo, Pierluigi Carcagni, Cosimo Distante

Auto-TLDR; Automatic Recognition of Products on grocery shelf images using Convolutional Neural Networks

Automatic recognition of products on grocery shelf images is a new and attractive topic in computer vision and machine learning, since it can be exploited in different application areas. This paper introduces a complete end-to-end pipeline (without the preliminary radiometric and spatial transformations usually involved when dealing with this problem) and provides a systematic investigation of recent machine learning models based on convolutional neural networks for the product recognition task, exploiting the proposed pipeline on a recent challenging grocery product dataset. The investigated models had never been used in this context: they derive from the successful and more generic object recognition task and have been properly tuned to address this specific problem. In addition, ensembles of nets built on the most advanced theoretical foundations have been taken into account. The classification results were very encouraging, since the recognition accuracy was improved by up to 15% with respect to the leading state-of-the-art approaches on the same dataset. The pros and cons of the investigated solutions are discussed, paving the way towards new research lines.

A Comparison of Neural Network Approaches for Melanoma Classification

Maria Frasca, Michele Nappi, Michele Risi, Genoveffa Tortora, Alessia Auriemma Citarella

Auto-TLDR; Classification of Melanoma Using Deep Neural Network Methodologies

Melanoma is the deadliest form of skin cancer and is diagnosed mainly visually, starting from an initial clinical screening followed by dermoscopic analysis, biopsy and histopathological examination. A dermatologist's recognition of melanoma may be subject to errors, and the diagnosis may take some time. In this regard, deep learning can be useful in the study and classification of skin cancer. In particular, by classifying images with Deep Neural Network methodologies, it is possible to obtain results comparable or even superior to those of dermatologists. In this paper, we propose a methodology for the classification of melanoma that adopts different deep learning techniques applied to a common dataset, composed of images from the ISIC dataset and covering different types of skin diseases, including melanoma, to which we applied a specific pre-processing phase. In particular, the results are compared in order to select the most effective neural network for the recognition and classification of melanoma. Moreover, we evaluate the impact of the pre-processing phase on the final classification. Different metrics such as accuracy, sensitivity, and specificity have been selected to assess the quality of the adopted neural networks and to compare them with the manual classification of dermatologists.

Supporting Skin Lesion Diagnosis with Content-Based Image Retrieval

Stefano Allegretti, Federico Bolelli, Federico Pollastri, Sabrina Longhitano, Giovanni Pellacani, Costantino Grana

Auto-TLDR; Skin Images Retrieval Using Convolutional Neural Networks for Skin Lesion Classification and Segmentation

Given the relevance of skin cancer, many attempts have been dedicated to the creation of automated devices that could assist both expert and beginner dermatologists towards fast and early diagnosis of skin lesions. In recent years, tasks such as skin lesion classification and segmentation have been extensively addressed with deep learning algorithms, which in some cases reach a diagnostic accuracy comparable to that of expert physicians. However, the general lack of interpretability and reliability severely hinders the ability of those approaches to actually support dermatologists in the diagnosis process. In this paper a novel skin image retrieval system is presented, which exploits features extracted by Convolutional Neural Networks to gather similar images from a publicly available dataset, in order to assist the diagnosis process of both expert and novice practitioners. In the proposed framework, ResNet-50 is initially trained for the classification of dermoscopic images; then, the feature extraction part is isolated, and an embedding network is built on top of it. The embedding learns an alternative representation, which allows image similarity to be checked by means of a distance measure. Experimental results reveal that the proposed method is able to select meaningful images, which can effectively boost the classification accuracy of human dermatologists.
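
The retrieval step described above, ranking gallery images by a distance measure in the embedding space, can be sketched as follows. This is a minimal illustration only: the Euclidean distance, the two-dimensional embeddings and the image names are all assumptions made for the example, and the abstract does not say which distance or embedding size the authors use.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def retrieve(query, gallery, k=3):
    """Return the names of the k gallery items closest to the query embedding."""
    ranked = sorted(gallery, key=lambda item: euclidean(query, item[1]))
    return [name for name, _ in ranked[:k]]

# Hypothetical gallery of (image name, embedding) pairs.
gallery = [
    ("nevus_01", [0.1, 0.9]),
    ("melanoma_07", [0.8, 0.2]),
    ("nevus_14", [0.2, 0.8]),
    ("melanoma_03", [0.9, 0.1]),
]
query = [0.82, 0.2]  # hypothetical embedding of the image under examination

print(retrieve(query, gallery, k=2))  # ['melanoma_07', 'melanoma_03']
```

In a real system the embeddings would come from the trained network and the gallery would be searched with an index structure rather than a full sort.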

Investigating and Exploiting Image Resolution for Transfer Learning-Based Skin Lesion Classification

Amirreza Mahbod, Gerald Schaefer, Chunliang Wang, Rupert Ecker, Georg Dorffner, Isabella Ellinger

Auto-TLDR; Fine-tuned Neural Networks for Skin Lesion Classification Using Dermoscopic Images

Skin cancer is among the most common cancer types. Dermoscopic image analysis improves the diagnostic accuracy for detection of malignant melanoma and other pigmented skin lesions when compared to unaided visual inspection. Hence, computer-based methods to support medical experts in the diagnostic procedure are of great interest. Fine-tuning pre-trained convolutional neural networks (CNNs) has been shown to work well for skin lesion classification. Pre-trained CNNs are usually trained with natural images of a fixed size, which is typically much smaller than that of captured skin lesion images; consequently, dermoscopic images are downsampled for fine-tuning. However, useful medical information may be lost during this transformation. In this paper, we explore the effect of input image size on the skin lesion classification performance of fine-tuned CNNs. For this, we resize dermoscopic images to different resolutions, ranging from 64x64 to 768x768 pixels, and investigate the resulting classification performance of three well-established CNNs, namely DenseNet-121, ResNet-18, and ResNet-50. Our results show that using very small images (of size 64x64 pixels) degrades the classification performance, while images of size 128x128 pixels and above support good performance, with larger image sizes leading to slightly improved classification. We further propose a novel fusion approach based on a three-level ensemble strategy that exploits multiple fine-tuned networks trained with dermoscopic images at various sizes. When applied to the ISIC 2017 skin lesion classification challenge, our fusion approach yields an area under the receiver operating characteristic curve of 89.2% and 96.6% for melanoma classification and seborrheic keratosis classification, respectively, outperforming state-of-the-art algorithms.

Skin Lesion Classification Using Weakly-Supervised Fine-Grained Method

Xi Xue, Sei-Ichiro Kamata, Daming Luo

Auto-TLDR; Different Region proposal module for skin lesion classification

In recent years, skin cancer has become one of the most common cancers. Among all types of skin cancer, melanoma is the most fatal, and many people die of this disease every year. Early detection can greatly reduce the death rate and save more lives. Skin lesions are one of the early symptoms of melanoma and other types of skin cancer, so accurately recognizing various skin lesions at an early stage is of great significance. Many existing works use convolutional neural networks (CNNs) for skin lesion classification, but they seldom consider the similarity among different lesions. For example, we find that some melanoma and nevus lesions look similar in appearance, which makes it hard for a neural network to distinguish the categories of skin lesions. Inspired by fine-grained image classification, we propose a novel network to distinguish each category accurately. We design an effective module, the distinct region proposal module (DRPM), to extract the distinct regions from each image. Spatial attention and channel-wise attention are both utilized to enrich feature maps and guide the network to focus on the highlighted areas in a weakly-supervised way. In addition, two preprocessing steps are added to help the network obtain better results. We demonstrate the potential of the proposed method on the ISIC 2017 dataset. Experiments show that our approach is effective and efficient.

Confidence Calibration for Deep Renal Biopsy Immunofluorescence Image Classification

Federico Pollastri, Juan Maroñas, Federico Bolelli, Giulia Ligabue, Roberto Paredes, Riccardo Magistroni, Costantino Grana

Auto-TLDR; A Probabilistic Convolutional Neural Network for Immunofluorescence Classification in Renal Biopsy

With this work we tackle immunofluorescence classification in renal biopsy, employing state-of-the-art Convolutional Neural Networks. In this setting, the aim of the probabilistic model is to assist an expert practitioner towards identifying the location pattern of antibody deposits within a glomerulus. Since modern neural networks often provide overconfident outputs, we stress the importance of having a reliable prediction, demonstrating that Temperature Scaling, a recently introduced re-calibration technique, can be successfully applied to immunofluorescence classification in renal biopsy. Experimental results demonstrate that the designed model yields good accuracy on the specific task, and that Temperature Scaling is able to provide reliable probabilities, which are highly valuable for such a task given the low inter-rater agreement.
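
As a rough illustration of Temperature Scaling, the sketch below divides the logits by a single scalar T before the softmax and picks the T that minimises the negative log-likelihood on held-out data. The logits, labels and the grid search are assumptions made for this example (a gradient-based optimiser is more commonly used in practice); none of them come from the paper.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax of the logits after dividing them by a temperature T > 0."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def nll(logit_rows, labels, temperature):
    """Mean negative log-likelihood of the true labels at a given temperature."""
    losses = [-math.log(softmax(row, temperature)[y])
              for row, y in zip(logit_rows, labels)]
    return sum(losses) / len(losses)

def fit_temperature(logit_rows, labels, candidates=None):
    """Choose T by grid search over candidate values (a simple stand-in optimiser)."""
    if candidates is None:
        candidates = [0.5 + 0.1 * i for i in range(46)]  # 0.5, 0.6, ..., 5.0
    return min(candidates, key=lambda t: nll(logit_rows, labels, t))

# Hypothetical validation logits from an overconfident 3-class model.
val_logits = [[8.0, 1.0, 0.5], [7.5, 6.0, 0.2], [0.3, 9.0, 1.0], [6.0, 5.5, 5.0]]
val_labels = [0, 1, 1, 2]

T = fit_temperature(val_logits, val_labels)
before = softmax(val_logits[0])
after = softmax(val_logits[0], T)
print(T, max(before), max(after))
```

Because every logit is divided by the same positive T, the predicted class never changes; only the confidence attached to it is softened or sharpened.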

Fine-Tuning Convolutional Neural Networks: A Comprehensive Guide and Benchmark Analysis for Glaucoma Screening

Amed Mvoulana, Rostom Kachouri, Mohamed Akil

Auto-TLDR; Fine-tuning Convolutional Neural Networks for Glaucoma Screening

This work aims to give a comprehensive and detailed guide to fine-tuning Convolutional Neural Networks (CNNs) for glaucoma screening. Transfer learning is a promising alternative to training CNNs from scratch, as it avoids the huge data and resource requirements. After a thorough study of five state-of-the-art CNN architectures, a complete and well-explained strategy for fine-tuning these networks is proposed, using hyperparameter grid search and a two-phase training approach. Excellent performance is reached in model evaluation, with a validation AUROC of 0.9772, giving rise to reliable glaucoma diagnosis-support systems. In addition, a benchmark analysis is conducted across all fine-tuned models, studying them according to performance indices such as model complexity and size, AUROC density and inference time. This in-depth analysis allows a rigorous comparison between model characteristics and gives practitioners important reference points for prospective applications and deployments.

Dual Stream Network with Selective Optimization for Skin Disease Recognition in Consumer Grade Images

Krishnam Gupta, Jaiprasad Rampure, Monu Krishnan, Ajit Narayanan, Nikhil Narayan

Auto-TLDR; A Deep Network Architecture for Skin Disease Localisation and Classification on Consumer Grade Images

Skin disease localisation and classification on consumer-grade images is more challenging than on dermoscopic images. Consumer-grade images are images taken with commonly available imaging devices such as a mobile camera or a hand-held digital camera. Such images, in addition to showing the skin condition of interest in a very small area of the image, contain other noisy non-clinical details introduced by the lighting conditions and by the distance of the hand-held device from the anatomy at the time of acquisition. We propose a novel deep network architecture and a new optimization strategy for classification with implicit localisation of skin diseases from clinical/consumer-grade images. A weakly supervised segmentation algorithm is first employed to extract Regions of Interest (RoIs) from the image; the RoI and the original image form the two input streams of the proposed architecture. The two streams learn high-level and low-level features from the original image and the RoI, respectively. The two streams are independently optimised until the loss stops decreasing, after which both streams are optimised jointly with the help of a third combiner sub-network. This strategy resulted in a 5% increase in accuracy over the current state-of-the-art methods on the publicly available SD-198 dataset. The proposed algorithm is also validated on a new dataset containing over 12,000 images across 75 different skin conditions. We intend to release this dataset as SD-75 to aid the advancement of research on skin condition classification on consumer-grade images.

Planar 3D Transfer Learning for End to End Unimodal MRI Unbalanced Data Segmentation

Martin Kolarik, Radim Burget, Carlos M. Travieso-Gonzalez, Jan Kocica

Auto-TLDR; Planar 3D Res-U-Net Network for Unbalanced 3D Image Segmentation using Fluid-Attenuated Inversion Recovery

We present a novel approach to 2D-to-3D transfer learning based on mapping pre-trained 2D convolutional neural network weights into planar 3D kernels. The method is validated with the proposed planar 3D res-u-net network, whose encoder is transferred from the 2D VGG-16 and which is applied to single-stage segmentation of unbalanced 3D image data. In particular, we evaluate the method on the MICCAI 2016 MS lesion segmentation challenge dataset, using solely the Fluid-Attenuated Inversion Recovery (FLAIR) sequence without brain extraction for training and inference, to simulate real medical practice. The planar 3D res-u-net network performed best in both sensitivity and Dice score among end-to-end methods processing raw MRI scans, and achieved a Dice score comparable to that of a state-of-the-art unimodal approach that is not end-to-end. The complete source code was released under an open-source license, and this paper complies with the Machine Learning Reproducibility Checklist. By implementing practical transfer learning for 3D data representation, we were able to successfully segment heavily unbalanced data without selective sampling and achieved more reliable results using less training data in a single modality. From a medical perspective, the unimodal approach is an advantage in real practice, as it requires neither co-registration nor additional scanning time during examination. Although modern medical imaging methods capture high-resolution 3D anatomy scans suitable for processing by computer-aided detection systems, the deployment of automatic systems for the interpretation of radiology imaging is still rather theoretical in many medical areas. Our work aims to bridge this gap by offering a solution to part of the open research questions.
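
One plausible reading of "mapping pre-trained 2D weights into planar 3D kernels" is sketched below, under the assumption that a planar kernel is a 3D kernel with unit extent along one axis. The abstract does not specify the axis or the exact mapping, so the function names and the choice of the depth axis are hypothetical. With that assumption, the 3D convolution reduces to applying the original 2D kernel to every slice independently.

```python
def conv2d_valid(image, kernel):
    """Plain 'valid' 2D cross-correlation on nested lists."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

def to_planar_3d(kernel2d):
    """Wrap a (kh, kw) kernel as a (1, kh, kw) kernel: unit extent along depth."""
    return [[row[:] for row in kernel2d]]

def conv3d_valid(volume, kernel3d):
    """Plain 'valid' 3D cross-correlation on nested lists."""
    kd, kh, kw = len(kernel3d), len(kernel3d[0]), len(kernel3d[0][0])
    out_d = len(volume) - kd + 1
    out_h = len(volume[0]) - kh + 1
    out_w = len(volume[0][0]) - kw + 1
    return [[[sum(volume[z + c][i + a][j + b] * kernel3d[c][a][b]
                  for c in range(kd) for a in range(kh) for b in range(kw))
              for j in range(out_w)]
             for i in range(out_h)]
            for z in range(out_d)]

# Hypothetical pre-trained 2D kernel and a tiny 2-slice volume.
kernel2d = [[1, 0], [0, -1]]
volume = [
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    [[9, 8, 7], [6, 5, 4], [3, 2, 1]],
]

planar_out = conv3d_valid(volume, to_planar_3d(kernel2d))
slicewise_out = [conv2d_valid(slice_, kernel2d) for slice_ in volume]
print(planar_out == slicewise_out)  # True
```

The appeal of such a mapping is that the 2D weights can be reused unchanged while the network still operates on 3D tensors.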

Trainable Spectrally Initializable Matrix Transformations in Convolutional Neural Networks

Michele Alberti, Angela Botros, Schuetz Narayan, Rolf Ingold, Marcus Liwicki, Mathias Seuret

Auto-TLDR; Trainable and Spectrally Initializable Matrix Transformations for Neural Networks

In this work, we introduce a new architectural component to Neural Networks (NN), i.e., trainable and spectrally initializable matrix transformations on feature maps. While previous literature has already demonstrated the possibility of adding static spectral transformations as feature processors, our focus is on more general trainable transforms. We study the transforms in various architectural configurations on four datasets of different nature: from medical (ColorectalHist, HAM10000) and natural (Flowers) images to historical documents (CB55). With rigorous experiments that control for the number of parameters and randomness, we show that networks utilizing the introduced matrix transformations outperform vanilla neural networks. The observed accuracy increases appreciably across all datasets. In addition, we show that the benefit of spectral initialization leads to significantly faster convergence, as opposed to randomly initialized matrix transformations. The transformations are implemented as auto-differentiable PyTorch modules that can be incorporated into any neural network architecture. The entire code base is open-source.

Weight Estimation from an RGB-D Camera in Top-View Configuration

Marco Mameli, Marina Paolanti, Nicola Conci, Filippo Tessaro, Emanuele Frontoni, Primo Zingaretti

Auto-TLDR; Top-View Weight Estimation using Deep Neural Networks

The development of so-called soft biometrics aims at providing information related to the physical and behavioural characteristics of a person. This paper focuses on body weight estimation based on observations from a top-view RGB-D camera. The capability to estimate the weight of a person can be of help in many different applications, from health-related scenarios to business intelligence and retail analytics. To address this problem, a TVWE (Top-View Weight Estimation) framework is proposed with the aim of predicting the weight. The approach relies on Deep Neural Networks (DNNs) that have been trained on depth data. Each network has been modified in its top section to replace classification with regression. The performance of five state-of-the-art DNNs has been compared, namely VGG16, ResNet, Inception, DenseNet and EfficientNet. In addition, a convolutional auto-encoder has been included for completeness. Considering the limited literature in this domain, the TVWE framework has been evaluated on a new publicly available dataset, the “VRAI Weight estimation Dataset”, which also collects, for each subject, labels related to weight, gender, and height. The experimental results demonstrate that the proposed methods are suitable for this task and provide significant insights for applying the solution in different domains.

Bridging the Gap between Natural and Medical Images through Deep Colorization

Lia Morra, Luca Piano, Fabrizio Lamberti, Tatiana Tommasi

Auto-TLDR; Transfer Learning for Diagnosis on X-ray Images Using Color Adaptation

Deep learning has thrived by training on large-scale datasets. However, in many applications, such as medical image diagnosis, obtaining massive amounts of data is still prohibitive due to privacy, lack of acquisition homogeneity and annotation cost. In this scenario, transfer learning from natural image collections is a standard practice that attempts to tackle shape, texture and color discrepancies all at once through fine-tuning of a pretrained model. In this work we propose to disentangle those challenges and design a dedicated network module that focuses on color adaptation. We combine learning the color module from scratch with transfer learning of different classification backbones, obtaining an end-to-end, easy-to-train architecture for diagnostic image recognition on X-ray images. Extensive experiments show that our approach is particularly efficient in cases of data scarcity and provides a new path for further transferring the learned color information across multiple medical datasets.

Cross-View Relation Networks for Mammogram Mass Detection

Ma Jiechao, Xiang Li, Hongwei Li, Ruixuan Wang, Bjoern Menze, Wei-Shi Zheng

Auto-TLDR; Multi-view Modeling for Mass Detection in Mammogram

In medical image analysis, multi-view modeling is crucial for pathology detection when the target lesion is presented in different views, e.g. mass lesions in the breast. Currently, mammography is the most effective imaging modality for detecting mass lesions of breast cancer at an early stage. The pathological information from the two paired views (i.e., medio-lateral oblique and cranio-caudal) is highly related and complementary, which is crucial for diagnosis in clinical practice. Existing mass detection methods do not consider learning synergistic features from the two related views. For the first time, we propose a novel mass detection framework to capture the latent relation information from the two paired views of the same mass in a mammogram. We evaluate our model on a public mammogram dataset and a large-scale private dataset, demonstrating that the proposed method outperforms existing feature fusion approaches and state-of-the-art mass detection methods. We further analyze the performance gains from the relation modeling. Our quantitative and qualitative results suggest that jointly learning cross-view features boosts the detection performance of existing models, which is a promising avenue for the mass detection task in mammograms.

Learn to Segment Retinal Lesions and Beyond

Qijie Wei, Xirong Li, Weihong Yu, Xiao Zhang, Yongpeng Zhang, Bojie Hu, Bin Mo, Di Gong, Ning Chen, Dayong Ding, Youxin Chen

Auto-TLDR; Multi-task Lesion Segmentation and Disease Classification for Diabetic Retinopathy Grading

Towards automated retinal screening, this paper makes an endeavor to simultaneously achieve pixel-level retinal lesion segmentation and image-level disease classification. Such a multi-task approach is crucial for accurate and clinically interpretable disease diagnosis. Prior art is insufficient due to three challenges, i.e., lesions lacking objective boundaries, clinical importance of lesions irrelevant to their size, and the lack of one-to-one correspondence between lesion and disease classes. This paper attacks the three challenges in the context of diabetic retinopathy (DR) grading. We propose Lesion-Net, a new variant of fully convolutional networks, with its expansive path re-designed to tackle the first challenge. A dual Dice loss that leverages both semantic segmentation and image classification losses is introduced to resolve the second challenge. Lastly, we build a multi-task network that employs Lesion-Net as a side-attention branch for both DR grading and result interpretation. A set of 12K fundus images is manually segmented by 45 ophthalmologists for 8 DR-related lesions, resulting in 290K manual segments in total. Extensive experiments on this large-scale dataset show that our proposed approach surpasses the prior art for multiple tasks including lesion segmentation, lesion classification and DR grading.

Automatic Semantic Segmentation of Structural Elements related to the Spinal Cord in the Lumbar Region by Using Convolutional Neural Networks

Jhon Jairo Sáenz Gamboa, Maria De La Iglesia-Vaya, Jon Ander Gómez

Auto-TLDR; Semantic Segmentation of Lumbar Spine Using Convolutional Neural Networks

This work addresses the problem of automatically segmenting MR images of the lumbar spine. The purpose is to detect and delimit the different structural elements such as vertebrae, intervertebral discs, nerves and blood vessels. This task is known as semantic segmentation. The approach proposed in this work is based on convolutional neural networks whose output is a mask in which each pixel of the input image is classified into one of the possible classes. The classes were defined by radiologists and correspond to structural elements and tissues. The proposed network architectures are variants of the U-Net. Several complementary blocks were used to define the variants: spatial attention models, deep supervision, and multi-kernels at the input; this last block type is based on the idea of Inception. The architectures that obtained the best results are described in this paper, and their results are discussed. Two of the proposed architectures outperform the standard U-Net used as a baseline.

Deep Learning in the Ultrasound Evaluation of Neonatal Respiratory Status

Michela Gravina, Diego Gragnaniello, Giovanni Poggi, Luisa Verdoliva, Carlo Sansone, Iuri Corsini, Carlo Dani, Fabio Meneghin, Gianluca Lista, Salvatore Aversa, Migliaro Migliaro, Raimondi Francesco

Auto-TLDR; Lung Ultrasound Imaging with Deep Learning Networks and Training Strategies: An Analysis and Adaptation

Lung ultrasound imaging is attracting growing interest from the scientific community. On the one hand, thanks to its harmlessness and high descriptive power, this kind of diagnostic imaging has been widely adopted in sensitive applications, such as the diagnosis and follow-up of preterm newborns in neonatal intensive care units. On the other hand, novel image analysis and pattern recognition approaches can fully exploit the rich information contained in these data, making them attractive for the research community. In this work, we present a thorough analysis of recent deep learning networks and training strategies, conducted on a vast and challenging multicenter dataset comprising 87 patients with different diseases and gestational ages. These approaches are first discussed in the context of assessing lung respiratory status through ultrasound imaging and then evaluated against a reference marker. The analysis sheds some light on this problem by highlighting the critical issues that can mislead the training procedure and by proposing some adaptations to the specific problem. The achieved results appreciably outperform those obtained by previous work based on textural features, and narrow the gap with the visual score assigned by human experts.

Video Face Manipulation Detection through Ensemble of CNNs

Nicolo Bonettini, Edoardo Daniele Cannas, Sara Mandelli, Luca Bondi, Paolo Bestagini, Stefano Tubaro

Auto-TLDR; Face Manipulation Detection in Video Sequences Using Convolutional Neural Networks

In the last few years, several techniques for facial manipulation in videos have been successfully developed and made available to the masses (e.g., FaceSwap, deepfake). These methods enable anyone to easily edit faces in video sequences with incredibly realistic results and very little effort. Despite the usefulness of these tools in many fields, if used maliciously, they can have a significantly bad impact on society (e.g., fake news spreading, cyber bullying through fake revenge porn). The ability to objectively detect whether a face has been manipulated in a video sequence is therefore a task of utmost importance. In this paper, we tackle the problem of face manipulation detection in video sequences, targeting modern facial manipulation techniques. In particular, we study the ensembling of different trained Convolutional Neural Network (CNN) models. In the proposed solution, different models are obtained starting from a base network (i.e., EfficientNetB4) by making use of two different concepts: (i) attention layers; (ii) siamese training. We show that combining these networks leads to promising face manipulation detection results on two publicly available datasets with more than 119,000 videos.

A Lumen Segmentation Method in Ureteroscopy Images Based on a Deep Residual U-Net Architecture

Jorge Lazo, Marzullo Aldo, Sara Moccia, Michele Catellani, Benoit Rosa, Elena De Momi, Michel De Mathelin, Francesco Calimeri

Auto-TLDR; A Deep Neural Network for Ureteroscopy with Residual Units

Ureteroscopy is becoming the first surgical treatment option for the majority of urinary tract disorders. This procedure is carried out using an endoscope, which provides the surgeon with the visual and spatial information necessary to navigate inside the urinary tract. With a view to developing surgical assistance systems that could enhance the surgeon's performance, lumen segmentation is a fundamental task, since the lumen is the visual reference that marks the path the endoscope should follow. This has not been analyzed in ureteroscopy data before. However, the task presents several challenges given the image quality and the conditions of ureteroscopy procedures themselves. In this paper, we study the implementation of a Deep Neural Network that exploits the advantages of residual units in an architecture based on U-Net. For the training of these networks, we analyze the use of two different color spaces: gray-scale and RGB images. We found that training on gray-scale images gives the best results, obtaining mean values of Dice Score, Precision, and Recall of 0.73, 0.58, and 0.92, respectively. The results show that the residual U-Net could be a suitable model for further development of a computer-aided system for navigation and guidance through the urinary system.
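
For readers unfamiliar with the three reported metrics, the sketch below computes Dice score, precision and recall for a single pair of binary masks. The flattened toy masks are made up for illustration and the conventions for empty masks are assumptions; the paper reports mean values over its own test images, which this example does not reproduce.

```python
def segmentation_scores(pred, target):
    """Dice, precision and recall for two flattened binary masks (0/1 values)."""
    tp = sum(1 for p, t in zip(pred, target) if p and t)
    fp = sum(1 for p, t in zip(pred, target) if p and not t)
    fn = sum(1 for p, t in zip(pred, target) if not p and t)
    # Convention assumed here: two empty masks count as a perfect match.
    dice = 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 1.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return dice, precision, recall

# Hypothetical over-segmented prediction: the whole lumen is covered (recall 1.0),
# but half of the predicted pixels fall outside it (precision 0.5).
pred = [1, 1, 1, 1, 0, 0]
target = [1, 1, 0, 0, 0, 0]

dice, precision, recall = segmentation_scores(pred, target)
print(round(dice, 3), precision, recall)  # 0.667 0.5 1.0
```

A high recall paired with a lower precision, as in this toy case, indicates a model that tends to predict regions larger than the true lumen.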

Classify Breast Histopathology Images with Ductal Instance-Oriented Pipeline

Beibin Li, Ezgi Mercan, Sachin Mehta, Stevan Knezevich, Corey Arnold, Donald Weaver, Joann Elmore, Linda Shapiro

Auto-TLDR; DIOP: Ductal Instance-Oriented Pipeline for Diagnostic Classification

In this study, we propose the Ductal Instance-Oriented Pipeline (DIOP), which contains a duct-level instance segmentation model, a tissue-level semantic segmentation model, and three levels of features for diagnostic classification. Based on recent advancements in instance segmentation and the Mask R-CNN model, our duct-level segmenter tries to identify each individual duct inside a microscopic image; then, it extracts tissue-level information from the identified ductal instances. Leveraging three levels of information obtained from these ductal instances and from the histopathology image, the proposed DIOP outperforms previous approaches (both feature-based and CNN-based) in all diagnostic tasks; for the four-way classification task, the DIOP achieves performance comparable to that of general pathologists on this unique dataset. The proposed DIOP takes only a few seconds to run at inference time, so it could be used interactively on most modern computers. More clinical explorations are needed to study the robustness and generalizability of this system in the future.

Which are the factors affecting the performance of audio surveillance systems?

Antonio Greco, Antonio Roberto, Alessia Saggese, Mario Vento

Auto-TLDR; Sound Event Recognition Using Convolutional Neural Networks and Visual Representations on MIVIA Audio Events

Sound event recognition systems are rapidly becoming part of our lives, since they can be profitably used in several vertical markets, ranging from audio security applications to scene classification and multi-modal analysis in social robotics. In recent years, a non-negligible part of the scientific community has started to apply Convolutional Neural Networks (CNNs) to image-based representations of the audio stream, owing to their successful adoption in almost all computer vision tasks. In this paper, we carry out a detailed benchmark of various widely used CNN architectures and visual representations on a popular dataset, namely the MIVIA Audio Events database. Our analysis is aimed at understanding how these factors affect sound event recognition performance, with a particular focus on the false positive rate, which is very relevant in audio surveillance solutions. In fact, although most of the proposed solutions achieve a high recognition rate, their capability of distinguishing the events of interest from the background is often not yet sufficient for real systems, which prevents their usage in real applications. Our comprehensive experimental analysis investigates this aspect and identifies useful design guidelines for increasing the specificity of sound event recognition systems.

A Benchmark Dataset for Segmenting Liver, Vasculature and Lesions from Large-Scale Computed Tomography Data

Bo Wang, Zhengqing Xu, Wei Xu, Qingsen Yan, Liang Zhang, Zheng You

Auto-TLDR; The Biggest Treatment-Oriented Liver Cancer Dataset for Segmentation

How to build a high-performance liver-related computer-assisted diagnosis system is an open question of great interest. However, the performance of state-of-the-art algorithms is always limited by the amount of data and the quality of the labels. To address this problem, we propose the biggest treatment-oriented liver cancer dataset for liver surgery and treatment planning. This dataset provides 216 cases (about 268K frames in total) of contrast-enhanced computed tomography (CT) scans. We labeled all the CT images with liver, liver vasculature and liver tumor segmentation ground truth for training and tuning segmentation algorithms. Based on that, we evaluate several recent and state-of-the-art segmentation algorithms, including 7 deep learning methods, on CT sequences. All results are compared to reference segmentations using five error metrics that highlight different aspects of segmentation accuracy. In general, our dataset proves challenging compared with previous datasets. To our knowledge, the proposed dataset and benchmark allow for the first time a systematic exploration of such issues, and will be made available to allow for further research in this field.

Creating Classifier Ensembles through Meta-Heuristic Algorithms for Aerial Scene Classification

Álvaro Roberto Ferreira Jr., Gustavo Henrique De Rosa, Joao Paulo Papa, Gustavo Carneiro, Fabio Augusto Faria

Auto-TLDR; Univariate Marginal Distribution Algorithm for Aerial Scene Classification Using Meta-Heuristic Optimization

Aerial scene classification is a challenging task in the remote sensing area, and deep learning approaches, such as Convolutional Neural Networks (CNNs), are being widely employed to address it. Nevertheless, it is not straightforward to find single CNN models that can solve all aerial scene classification tasks, which motivates a better alternative: fusing CNN-based classifiers into an ensemble. However, an appropriate choice of the classifiers that will belong to the ensemble is a critical factor, as it is infeasible to employ all the possible classifiers in the literature. Therefore, this work proposes a novel framework based on meta-heuristic optimization for creating optimized ensembles in the context of aerial scene classification. The experiments were performed across nine meta-heuristic algorithms and three aerial scene datasets from the literature, and the algorithms were compared in terms of effectiveness (accuracy), efficiency (execution time), and behavioral performance in different scenarios. Finally, one can observe that the Univariate Marginal Distribution Algorithm (UMDA) outperformed popular meta-heuristic algorithms from the literature, such as Genetic Programming and Particle Swarm Optimization, under the criteria adopted in the experiments.
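To make the role of UMDA in ensemble selection concrete, here is a minimal textbook-style sketch. It assumes each candidate ensemble is a binary mask over classifiers and that a fitness function scores a mask; the `gains` values, the toy fitness and all parameter defaults are hypothetical and do not reproduce the paper's framework, where fitness would presumably be a validation metric of the fused ensemble.

```python
import random


def umda_select(fitness, n_bits, pop_size=40, n_keep=10, n_iters=30, seed=0):
    """Univariate Marginal Distribution Algorithm over binary masks."""
    rng = random.Random(seed)
    probs = [0.5] * n_bits  # independent probability of switching each bit on
    best, best_fit = None, float("-inf")
    for _ in range(n_iters):
        # Sample a population from the current per-bit marginals.
        population = [
            [1 if rng.random() < p else 0 for p in probs]
            for _ in range(pop_size)
        ]
        population.sort(key=fitness, reverse=True)
        top_fit = fitness(population[0])
        if top_fit > best_fit:
            best, best_fit = population[0][:], top_fit
        # Re-estimate each marginal from the fittest individuals,
        # clamped so that no bit becomes permanently fixed.
        elite = population[:n_keep]
        probs = [
            min(0.95, max(0.05, sum(ind[i] for ind in elite) / n_keep))
            for i in range(n_bits)
        ]
    return best, best_fit


# Hypothetical contribution of each of six classifiers to the ensemble score.
gains = [0.9, -0.2, 0.5, -0.4, 0.3, -0.1]


def fitness(mask):
    # Toy stand-in for "validation accuracy of the ensemble selected by mask".
    return sum(g * b for g, b in zip(gains, mask))


best, best_fit = umda_select(fitness, n_bits=len(gains))
```

The defining trait of UMDA is that each bit is modeled independently, so the search needs no crossover or mutation operators, only repeated sampling and re-estimation of the marginals.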

Relevance Detection in Cataract Surgery Videos by Spatio-Temporal Action Localization

Negin Ghamsarian, Mario Taschwer, Doris Putzgruber, Stephanie Sarny, Klaus Schoeffmann

Auto-TLDR; relevance-based retrieval in cataract surgery videos

In cataract surgery, the operation is performed with the help of a microscope. Since the microscope allows at most two people to watch the surgery in real time, a major part of surgical training is conducted using recorded videos. To optimize the training procedure with the video content, surgeons require an automatic relevance detection approach. In addition to relevance-based retrieval, these results can be further used for skill assessment and irregularity detection in cataract surgery videos. In this paper, a three-module framework is proposed to detect and classify the relevant phase segments in cataract videos. Taking advantage of an idle frame recognition network, the video is divided into idle and action segments. To boost the performance in relevance detection, Mask R-CNN is utilized to detect the cornea in each frame, where the relevant surgical actions are conducted. The spatio-temporally localized segments, which contain higher-resolution information about the pupil texture and actions as well as complementary temporal information from the same phase, are fed into the relevance detection module. This module consists of four parallel recurrent CNNs responsible for detecting four relevant phases that have been defined with medical experts. The results are then integrated to classify the action phases as irrelevant or as one of the four relevant phases. Experimental results reveal that the proposed approach outperforms static CNNs and different configurations of feature-based and end-to-end recurrent networks.

Transfer Learning through Weighted Loss Function and Group Normalization for Vessel Segmentation from Retinal Images

Abdullah Sarhan, Jon Rokne, Reda Alhajj, Andrew Crichton

Auto-TLDR; Deep Learning for Segmentation of Blood Vessels in Retinal Images

The vascular structure of blood vessels is important in diagnosing retinal conditions such as glaucoma and diabetic retinopathy. Accurate segmentation of these vessels can help in detecting retinal objects such as the optic disc and optic cup and hence in determining whether these areas are damaged. Moreover, the structure of the vessels can help in diagnosing glaucoma. The rapid development of digital imaging and computer-vision techniques has increased the potential for developing approaches for segmenting retinal vessels. In this paper, we propose an approach for segmenting retinal vessels that uses deep learning along with transfer learning. We adapted the U-Net structure to use a customized InceptionV3 as the encoder and used multiple skip connections to form the decoder. Moreover, we used a weighted loss function to handle the issue of class imbalance in retinal images. Furthermore, we contributed a new dataset to this field. We tested our approach on six publicly available datasets and a newly created dataset. We achieved an average accuracy of 95.60% and a Dice coefficient of 80.98%. The results obtained from comprehensive experiments demonstrate the robustness of our approach to the segmentation of blood vessels in retinal images obtained from different sources. Our approach results in greater segmentation accuracy than other approaches.
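The abstract does not state the exact form of its weighted loss. As an illustration only, the sketch below assumes a common choice, binary cross-entropy with an extra weight on the rare vessel (positive) class; `weighted_bce` and `pos_weight` are hypothetical names.

```python
import math


def weighted_bce(probs, labels, pos_weight):
    """Mean binary cross-entropy with the positive-class term scaled by pos_weight."""
    eps = 1e-12  # keeps log() away from zero
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1 - eps)
        total += -(pos_weight * y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(probs)


# Hypothetical 4-pixel patch: one vessel pixel, three background pixels,
# and an uninformative prediction of 0.5 everywhere.
labels = [1, 0, 0, 0]
probs = [0.5, 0.5, 0.5, 0.5]
plain = weighted_bce(probs, labels, pos_weight=1.0)
weighted = weighted_bce(probs, labels, pos_weight=3.0)
```

With `pos_weight=3.0`, the single vessel pixel contributes as much to the loss as the three background pixels together, which is the kind of rebalancing a weighted loss is meant to provide.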

SyNet: An Ensemble Network for Object Detection in UAV Images

Berat Mert Albaba, Sedat Ozer

Auto-TLDR; SyNet: Combining Multi-Stage and Single-Stage Object Detection for Aerial Images

Recent advances in camera-equipped drone applications and their widespread use have increased the demand for vision-based object detection algorithms for aerial images. Object detection is inherently a challenging task as a generic computer vision problem; moreover, since the use of object detection algorithms on UAVs (or drones) is a relatively new area, detecting objects in aerial images remains an even more challenging problem. There are several reasons for this, including: (i) the lack of large drone datasets with large object variance, (ii) the large orientation and scale variance in drone images compared to ground images, and (iii) the difference in texture and shape features between ground and aerial images. Deep learning based object detection algorithms can be classified under two main categories: (a) single-stage detectors and (b) multi-stage detectors. Both single-stage and multi-stage solutions have advantages and disadvantages relative to each other. However, a technique that combines the strengths of each could yield a stronger solution than either one individually. In this paper, we propose an ensemble network, SyNet, that combines a multi-stage method with a single-stage one, with the motivation of decreasing the high false negative rate of multi-stage detectors and increasing the quality of the single-stage detector proposals. As building blocks, CenterNet and Cascade R-CNN with pretrained feature extractors are utilized along with an ensembling strategy. We report state-of-the-art results obtained by our proposed solution on two datasets, MS-COCO and VisDrone: 52.1% $mAP_{IoU = 0.75}$ on the MS-COCO val2017 dataset and 26.2% $mAP_{IoU = 0.75}$ on the VisDrone test set. Our code is available at: https://github.com/mertalbaba/SyNet

Uncertainty-Aware Data Augmentation for Food Recognition

Eduardo Aguilar, Bhalaji Nagarajan, Rupali Khatun, Marc Bolaños, Petia Radeva

Auto-TLDR; Data Augmentation for Food Recognition Using Epistemic Uncertainty

Food recognition has recently attracted the attention of many researchers. However, high food ambiguity, inter-class variability and intra-class similarity pose a real challenge for Deep Learning and Computer Vision algorithms. In order to improve their performance, it is necessary to better understand what the model learns and, from this, to determine the type of additional data that would be most beneficial to the training procedure. In this paper, we propose a new data augmentation strategy that estimates and uses the epistemic uncertainty to guide the model training. The method follows an active learning framework, where new synthetic images are generated from the hard-to-classify real ones present in the training data, based on the epistemic uncertainty. Hence, it allows the food recognition algorithm to focus on difficult images in order to learn their discriminative features. At the same time, avoiding data generation from images that do not contribute to the recognition makes it faster and more efficient. We show that the proposed method improves food recognition and provides a better trade-off between micro- and macro-recall measures.

Automatic Classification of Human Granulosa Cells in Assisted Reproductive Technology Using Vibrational Spectroscopy Imaging

Marina Paolanti, Emanuele Frontoni, Giorgia Gioacchini, Giorgini Elisabetta, Notarstefano Valentina, Zacà Carlotta, Carnevali Oliana, Andrea Borini, Marco Mameli

Auto-TLDR; Predicting Oocyte Quality in Assisted Reproductive Technology Using Machine Learning Techniques

In the field of reproductive technology, the biochemical composition of female gametes has been successfully investigated with the use of vibrational spectroscopy. Currently, in Assisted Reproductive Technology (ART), there are no shared criteria for the choice of oocyte, and automatic classification methods for the best quality oocytes have not yet been applied. In this paper, considering this lack of criteria, we use Machine Learning (ML) techniques to predict oocyte quality for a successful pregnancy. To improve the chances of successful implantation and minimize any complications during the pregnancy, Fourier transform infrared microspectroscopy (FTIRM) analysis has been applied to granulosa cells (GCs) collected along with the oocytes during oocyte aspiration, as is routinely done in ART, and specific spectral biomarkers were selected by multivariate statistical analysis. A proprietary biological reference dataset (BRD) was successfully collected to predict the best oocyte for a successful pregnancy. Personal health information is stored, maintained and backed up using a cloud computing service. Using a user-friendly interface, the user can evaluate whether or not the selected oocyte will have a positive result. This interface includes a dashboard for retrospective analysis, reporting, real-time processing, and statistical analysis. The experimental results are promising and confirm the efficiency of the method in terms of classification metrics: precision, recall, and F1-score (F1) measures.

An Evaluation of DNN Architectures for Page Segmentation of Historical Newspapers

Manuel Burghardt, Bernhard Liebl

Auto-TLDR; Evaluation of Backbone Architectures for Optical Character Segmentation of Historical Documents

One important and particularly challenging step in the optical character recognition of historical documents with complex layouts, such as newspapers, is the separation of text from non-text content (e.g. page borders or illustrations). This step is commonly referred to as page segmentation. While various rule-based algorithms have been proposed, the applicability of Deep Neural Networks for this task recently has gained a lot of attention. In this paper, we perform a systematic evaluation of 11 different published backbone architectures and 9 different tiling and scaling configurations for separating text, tables or table column lines. We also show the influence of the number of labels and the number of training pages on the segmentation quality, which we measure using the Matthews Correlation Coefficient. Our results show that (depending on the task) Inception-ResNet-v2 and EfficientNet backbones work best, vertical tiling is generally preferable to other tiling approaches, and training data that comprises 30 to 40 pages will be sufficient most of the time.
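For readers unfamiliar with the Matthews Correlation Coefficient used above as the quality measure, the following minimal sketch computes its binary form from confusion-matrix counts. Whether the paper uses the binary or a multi-class variant is not stated in the abstract, so the binary form is an assumption; the function name is hypothetical.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Binary MCC from confusion-matrix counts; returns 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


perfect = matthews_corrcoef(tp=5, tn=5, fp=0, fn=0)   # every pixel correct
inverted = matthews_corrcoef(tp=0, tn=0, fp=5, fn=5)  # every pixel wrong
chance = matthews_corrcoef(tp=5, tn=5, fp=5, fn=5)    # no better than chance
```

Unlike plain accuracy, the coefficient stays informative when text and non-text pixels are strongly imbalanced, because all four confusion-matrix cells enter the formula.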

Improving Model Accuracy for Imbalanced Image Classification Tasks by Adding a Final Batch Normalization Layer: An Empirical Study

Veysel Kocaman, Ofer M. Shir, Thomas Baeck

Auto-TLDR; Exploiting Batch Normalization before the Output Layer in Deep Learning for Minority Class Detection in Imbalanced Data Sets

Some real-world domains, such as Agriculture and Healthcare, comprise early-stage disease indications whose recording constitutes a rare event, and yet whose precise detection at that stage is critical. In highly imbalanced classification problems of this type, which encompass complex features, deep learning (DL) is much needed because of its strong detection capabilities. At the same time, DL is observed in practice to favor majority over minority classes and consequently to suffer from inaccurate detection of the targeted early-stage indications. To simulate such scenarios, we artificially generate skewness (99% vs. 1%) for certain plant types out of the PlantVillage dataset as a basis for classification of scarce visual cues through transfer learning. By randomly and unevenly picking healthy and unhealthy samples from certain plant types to form a training set, we define a base experiment that consists of fine-tuning ResNet34 and VGG19 architectures and then testing the model performance on a balanced dataset of healthy and unhealthy images. We empirically observe that the initial F1 test score jumps from 0.29 to 0.95 for the minority class upon adding a final Batch Normalization (BN) layer just before the output layer in VGG19. We demonstrate that utilizing an additional BN layer before the output layer in modern CNN architectures has a considerable impact in terms of minimizing the training time and testing error for minority classes in highly imbalanced data sets. Moreover, when the final BN is employed, trying to minimize validation and training losses may not be an optimal way of obtaining a high F1 test score for minority classes in anomaly detection problems. That is, the network might perform better even if it is not ‘confident’ enough while making a prediction, leading to another discussion about why softmax output is not a good uncertainty measure for DL models.

The DeepHealth Toolkit: A Unified Framework to Boost Biomedical Applications

Michele Cancilla, Laura Canalini, Federico Bolelli, Stefano Allegretti, Salvador Carrión, Roberto Paredes, Jon Ander Gómez, Simone Leo, Marco Enrico Piras, Luca Pireddu, Asaf Badouh, Santiago Marco-Sola, Lluc Alvarez, Miquel Moreto, Costantino Grana

Auto-TLDR; DeepHealth Toolkit: An Open Source Deep Learning Toolkit for Cloud Computing and HPC

Given the overwhelming impact of machine learning on the last decade, several libraries and frameworks have been developed in recent years to simplify the design and training of neural networks, providing array-based programming, automatic differentiation and user-friendly access to hardware accelerators. None of those tools, however, was designed with native and transparent support for Cloud Computing or heterogeneous High-Performance Computing (HPC). The DeepHealth Toolkit is an open source deep learning toolkit aimed at boosting productivity of data scientists operating in the medical field by providing a unified framework for the distributed training of neural networks, that is able to leverage hybrid HPC and Cloud environments in a way transparent to the user. The toolkit is composed of a computer vision library, a deep learning library, and a front-end for non-expert users; all of the components are focused on the medical domain, but they are general purpose and can be applied to any other field. In this paper, the principles driving the design of the DeepHealth libraries are described, along with details about the implementation and the interaction between the different elements composing the toolkit. Finally, experiments on common benchmarks prove the efficiency of each separate component, and of the DeepHealth Toolkit overall.

Deep Recurrent-Convolutional Model for Automated Segmentation of Craniomaxillofacial CT Scans

Francesca Murabito, Simone Palazzo, Federica Salanitri Proietto, Francesco Rundo, Ulas Bagci, Daniela Giordano, Rosalia Leonardi, Concetto Spampinato

Auto-TLDR; Automated Segmentation of Anatomical Structures in Craniomaxillofacial CT Scans using Fully Convolutional Deep Networks

In this paper we define a deep learning architecture for automated segmentation of anatomical structures in Craniomaxillofacial (CMF) CT scans that leverages the recent success of encoder-decoder models for semantic segmentation of natural images. In particular, we propose a fully convolutional deep network that combines the advantages of recent fully convolutional models, such as Tiramisu, with squeeze-and-excitation blocks for feature recalibration, integrated with convolutional LSTMs to model spatio-temporal correlations between consecutive slices. The proposed segmentation network shows superior performance and generalization capabilities (to different structures and imaging modalities) than state of the art methods on automated segmentation of CMF structures (e.g., mandibles and airways) in several standard benchmarks (e.g., MICCAI datasets) and on new datasets proposed herein, effectively facing shape variability.

Multi-Attribute Learning with Highly Imbalanced Data

Lady Viviana Beltran Beltran, Mickaël Coustaty, Nicholas Journet, Juan C. Caicedo, Antoine Doucet

Auto-TLDR; Data Imbalance in Multi-Attribute Deep Learning Models: Adaptation to face each one of the problems derived from imbalance

Data is one of the most important keys to success when studying a simple or a complex phenomenon. With the use of deep learning exploding and becoming democratized, non-computer science experts may struggle to use highly complex deep learning architectures, even when straightforward models would offer them suitable performance. In this article, we study the specific and common problem of data imbalance in real databases, as most poor-performance problems are due to the data itself. We examine two cases. First, when the data contains different levels of imbalance, classical imbalanced learning strategies cannot be directly applied to multi-attribute deep learning models, i.e., multi-task and multi-label architectures; one of our contributions is therefore a set of adaptations that address each of the problems derived from imbalance. Second, we demonstrate that with little to no imbalance, straightforward deep learning models work well. However, for non-experts, these models can be seen as black boxes, where all the effort is put into pre-processing the data. To simplify the problem, we performed the classification task ignoring information that is costly to extract, such as part localization, which is widely used in the state of the art of attribute classification. We use a widely known attribute database, CUB-200-2011 (CUB), as our main use case due to its deeply imbalanced nature, along with two better-structured databases: CelebA and AwA2. All of them contain multi-attribute annotations. The results of highly fine-grained attribute learning over CUB demonstrate that, in the presence of imbalance, our proposed strategies make it possible to obtain results competitive with the state of the art while taking advantage of multi-attribute deep learning models. We also report results for the two better-structured databases, on which our models outperform the state of the art.

Robust Localization of Retinal Lesions Via Weakly-Supervised Learning

Ruohan Zhao, Qin Li, Jane You

Auto-TLDR; Weakly Learning of Lesions in Fundus Images Using Multi-level Feature Maps and Classification Score

Retinal fundus images reveal the condition of the retina, blood vessels and optic nerve. Retinal imaging is becoming widely adopted in clinical work because any subtle changes to the structures at the back of the eyes can affect the eyes and indicate overall health. Machine learning, in particular deep learning with convolutional neural networks (CNNs), has been increasingly adopted for computer-aided detection (CAD) of retinal lesions. However, a significant barrier to the high performance of CNN-based CAD approaches is the lack of sufficient labeled ground-truth image samples for training. Unlike fully-supervised learning, which relies on pixel-level annotation of pathology in fundus images, this paper presents a new approach that localizes various lesions based on image-level labels via weakly-supervised learning. More specifically, our proposed method leverages multi-level feature maps and the classification score to cope with both bright and red lesions in fundus images. To enhance the capability of learning less discriminative parts of objects (e.g. small blobs of microaneurysms as opposed to the bulk of exudates), the classifier is regularized by refining images with corresponding labels. The experimental results of the performance evaluation and benchmarking at both image level and pixel level on the public DIARETDB1 dataset demonstrate the feasibility and excellent potential of our method in practice.

Multimodal Side-Tuning for Document Classification

Stefano Zingaro, Giuseppe Lisanti, Maurizio Gabbrielli

Auto-TLDR; Side-tuning for Multimodal Document Classification

In this paper, we propose to exploit the side-tuning framework for multimodal document classification. Side-tuning is a methodology for network adaptation recently introduced to solve some of the problems related to previous approaches. Thanks to this technique it is actually possible to overcome model rigidity and catastrophic forgetting of transfer learning by fine-tuning. The proposed solution uses off-the-shelf deep learning architectures leveraging the side-tuning framework to combine a base model with a tandem of two side networks. We show that side-tuning can be successfully employed also when different data sources are considered, e.g. text and images in document classification. The experimental results show that this approach pushes further the limit for document classification accuracy with respect to the state of the art.

Contextual Classification Using Self-Supervised Auxiliary Models for Deep Neural Networks

Sebastian Palacio, Philipp Engler, Jörn Hees, Andreas Dengel

Auto-TLDR; Self-Supervised Autogenous Learning for Deep Neural Networks

Classification problems solved with deep neural networks (DNNs) typically rely on a closed-world paradigm and optimize over a single objective (e.g., minimization of the cross-entropy loss). This setup dismisses all kinds of supporting signals that can be used to reinforce the existence or absence of particular patterns. The increasing need for models that are interpretable by design makes the inclusion of such contextual signals a crucial necessity. To this end, we introduce the notion of Self-Supervised Autogenous Learning (SSAL). An SSAL objective is realized through one or more additional targets that are derived from the original supervised classification task, following architectural principles found in multi-task learning. SSAL branches impose low-level priors on the optimization process (e.g., grouping). The ability to use SSAL branches during inference allows models to converge faster, focusing on a richer set of class-relevant features. We equip state-of-the-art DNNs with SSAL objectives and report consistent improvements for all of them on CIFAR100 and ImageNet. We show that SSAL models outperform similar state-of-the-art methods focused on contextual loss functions, auxiliary branches and hierarchical priors.

BAT Optimized CNN Model Identifies Water Stress in Chickpea Plant Shoot Images

Shiva Azimi, Taranjit Kaur, Tapan Gandhi

Auto-TLDR; BAT Optimized ResNet-18 for Stress Classification of chickpea shoot images under water deficiency

Stress due to water deficiency in plants can significantly lower the agricultural yield. It can affect many visible plant traits, such as size and surface area, the number of leaves and their color. In recent years, computer vision-based plant phenomics has emerged as a promising tool for plant research and management. Such techniques have the advantage of being non-destructive, non-invasive and fast, and of offering high levels of automation. Pulses like chickpeas play an important role in ensuring food security in poor countries owing to their high protein and nutrition content. In the present work, we have built a dataset comprising images of the shoots of two varieties of chickpea plant under different moisture stress conditions. Specifically, we propose a BAT optimized ResNet-18 model for classifying stress induced by water deficiency using chickpea shoot images. The BAT algorithm identifies the optimal value of the mini-batch size to be used for training, rather than relying on the traditional manual approach of trial and error. Experimentation on two crop varieties (JG and Pusa) reveals that the BAT optimized approach achieves an accuracy of 96% and 91% for the JG and Pusa varieties, respectively, which is better than the traditional method by 4%. The experimental results are also compared with state-of-the-art CNN models such as AlexNet, GoogleNet, and ResNet-50. The comparison demonstrates that the proposed BAT optimized ResNet-18 model achieves higher performance than its counterparts.

Attention Based Multi-Instance Thyroid Cytopathological Diagnosis with Multi-Scale Feature Fusion

Shuhao Qiu, Yao Guo, Chuang Zhu, Wenli Zhou, Huang Chen

Auto-TLDR; A weakly supervised multi-instance learning framework based on attention mechanism with multi-scale feature fusion for thyroid cytopathological diagnosis

In recent years, deep learning has been widely combined with cytopathology diagnosis. Using whole slide images (WSIs) scanned by electronic scanners at clinics, researchers have developed many algorithms to classify slides as benign or malignant. However, the key area that supports the diagnosis can be relatively small in a thyroid WSI, and only the global label can be acquired, which makes the direct use of a strongly supervised learning framework infeasible. Moreover, because the clinical diagnosis of thyroid cells requires visual features at different scales, a generic feature extraction approach may not achieve good performance. In this paper, we propose a weakly supervised multi-instance learning framework based on an attention mechanism with multi-scale feature fusion (MSF) using a convolutional neural network (CNN) for thyroid cytopathological diagnosis. We treat each WSI as a bag; each bag contains multiple instances, which are different regions of the WSI. Our framework is trained to learn the key area automatically and make the classification. We also propose a feature fusion structure that merges the low-level features into the final feature map and adds an instance-level attention module, which improves the classification accuracy. Our model is trained and tested on the collected clinical data and reaches an accuracy of 93.2%, outperforming other existing methods. We also tested our model on a public histopathology dataset, where it achieves better results than the state-of-the-art deep multi-instance method.
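The bag-and-instance idea can be sketched as attention pooling: each region's feature vector receives a weight, and the slide-level feature is the weighted average. The abstract does not give the attention formulation, so the softmax weighting below is an assumption in the style commonly used for attention-based multi-instance learning; in a real model the raw scores would come from a learned scoring network, whereas here they are plain hypothetical inputs.

```python
import math


def attention_pool(instances, scores):
    """Softmax-weighted average of instance feature vectors.

    instances: list of equal-length feature vectors, one per WSI region.
    scores: one raw attention score per instance.
    """
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    dim = len(instances[0])
    bag = [
        sum(w * inst[d] for w, inst in zip(weights, instances))
        for d in range(dim)
    ]
    return bag, weights


# Hypothetical bag with two regions described by 2-D features.
instances = [[1.0, 0.0], [3.0, 2.0]]
bag_equal, w_equal = attention_pool(instances, scores=[0.0, 0.0])
bag_focus, w_focus = attention_pool(instances, scores=[0.0, 5.0])
```

With equal scores the bag feature is the plain mean of the regions; a much higher score on the second region pulls the bag feature toward it, which is how a small diagnostic area can dominate the slide-level decision.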

A Novel Computer-Aided Diagnostic System for Early Assessment of Hepatocellular Carcinoma

Ahmed Alksas, Mohamed Shehata, Gehad Saleh, Ahmed Shaffie, Ahmed Soliman, Mohammed Ghazal, Hadil Abukhalifeh, Abdel Razek Ahmed, Ayman El-Baz

Auto-TLDR; Classification of Liver Tumor Lesions from CE-MRI Using Structured Structural Features and Functional Features

Early assessment of liver cancer patients with hepatocellular carcinoma (HCC) is of immense importance for choosing the proper treatment plan. In this paper, we develop a two-stage classification computer-aided diagnostic (CAD) system that detects and grades liver observations from multiphase contrast-enhanced magnetic resonance imaging (CE-MRI). The proposed approach consists of three main steps. First, pre-processing is applied to the CE-MRI scans to delineate the tumor lesions, which are used as regions of interest (ROIs) across the four phases of the CE-MRI (pre-contrast, late-arterial, portal-venous, and delayed-contrast). Second, three groups of features are modeled to discriminate quantitatively between the tumor lesions: i) the tumor appearance, modeled using a set of texture features (the first-order histogram, the second-order gray-level co-occurrence matrix, and the second-order gray-level run-length matrix) to capture any discrimination that may appear in the lesion texture; ii) spherical harmonics (SH) based shape features, which describe the shape complexity of the liver tumors; and iii) functional features, based on the wash-in/wash-out calculation, which evaluate the intensity changes across the post-contrast phases. Finally, the individual features are integrated into a combined feature set that is fed to a machine learning classifier to obtain the final diagnostic decision. The proposed CAD system has been tested using hepatic observations that were obtained from 85 participating patients, 34 patients with benign tumors, 34 patients with intermediate tumors and 34 with malignant tumors. Using a random-forest classifier with leave-one-subject-out (LOSO) cross-validation, the developed CAD system achieved 87.1% accuracy in distinguishing malignant, intermediate and benign tumors. The classification performance is then evaluated using k-fold (5- and 10-fold) cross-validation to examine the robustness of the system. LR-1 lesions were distinguished from LR-2 benign lesions with 91.2% accuracy, while 85.3% accuracy was achieved in differentiating between LR-4 and LR-5 malignant tumors. The results suggest that the proposed framework is promising as a reliable noninvasive diagnostic tool for the early detection and grading of liver cancer tumors.
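
To make the leave-one-subject-out (LOSO) protocol mentioned in the abstract concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' pipeline: the helper names (`loso_splits`, `loso_accuracy`, `one_nn`), the toy features and the 1-nearest-neighbour stand-in classifier are all hypothetical, and the stand-in replaces the random-forest classifier used in the paper only to keep the example dependency-free.

```python
from collections import defaultdict


def loso_splits(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices), one fold per subject."""
    by_subject = defaultdict(list)
    for idx, sid in enumerate(subject_ids):
        by_subject[sid].append(idx)
    for sid, test_idx in by_subject.items():
        train_idx = [i for i, s in enumerate(subject_ids) if s != sid]
        yield sid, train_idx, test_idx


def one_nn(train_x, train_y, test_x):
    """Hypothetical stand-in classifier (1-nearest neighbour), NOT a random forest."""
    def sq_dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    preds = []
    for x in test_x:
        nearest = min(range(len(train_x)), key=lambda j: sq_dist(train_x[j], x))
        preds.append(train_y[nearest])
    return preds


def loso_accuracy(features, labels, subject_ids, fit_predict):
    """Fraction of samples classified correctly when each subject is held out once."""
    correct = 0
    for _, train_idx, test_idx in loso_splits(subject_ids):
        preds = fit_predict(
            [features[i] for i in train_idx],
            [labels[i] for i in train_idx],
            [features[i] for i in test_idx],
        )
        correct += sum(p == labels[i] for p, i in zip(preds, test_idx))
    return correct / len(labels)


# Toy data invented for illustration: one 1-D feature vector per subject.
features = [[0.0], [0.1], [1.0], [1.1], [2.0], [2.1]]
labels = ["benign", "benign", "intermediate", "intermediate", "malignant", "malignant"]
subjects = ["p1", "p2", "p3", "p4", "p5", "p6"]
accuracy = loso_accuracy(features, labels, subjects, one_nn)
```

Because every sample from the held-out subject is excluded from training, the score reflects performance on unseen patients; any classifier with the same `fit_predict(train_x, train_y, test_x)` shape could be swapped in for the stand-in.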

A Novel Region of Interest Extraction Layer for Instance Segmentation

Leonardo Rossi, Akbar Karimi, Andrea Prati

Auto-TLDR; Generic RoI Extractor for Two-Stage Neural Network for Instance Segmentation

Given the wide diffusion of deep neural network architectures for computer vision tasks, several new applications are becoming more and more feasible. Among them, particular attention has recently been given to instance segmentation, exploiting the results achievable by two-stage networks (such as Mask R-CNN or Faster R-CNN) derived from R-CNN. In these complex architectures, a crucial role is played by the Region of Interest (RoI) extraction layer, devoted to extracting a coherent subset of features from a single Feature Pyramid Network (FPN) layer attached on top of a backbone. This paper is motivated by the need to overcome the limitations of existing RoI extractors, which select only one (the best) layer from the FPN. Our intuition is that all the layers of the FPN retain useful information. Therefore, the proposed layer (called Generic RoI Extractor - GRoIE) introduces non-local building blocks and attention mechanisms to boost the performance. A comprehensive ablation study at component level is conducted to find the best set of algorithms and parameters for the GRoIE layer. Moreover, GRoIE can be integrated seamlessly with every two-stage architecture for both object detection and instance segmentation tasks. Therefore, the improvements brought by the use of GRoIE in different state-of-the-art architectures are also evaluated. The proposed layer yields gains of up to 1.1% AP on bounding box detection and 1.7% AP on instance segmentation. The code is publicly available on GitHub at https://github.com/IMPLabUniPr/mmdetection-groie
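
The contrast drawn in the abstract, between picking a single FPN layer per RoI and using all of them, can be sketched in a few lines of Python. This is an illustrative sketch only: `fpn_level` implements the size-based level assignment commonly used with FPN (the default values `k0=4`, `canonical=224` and the level range 2 to 5 are assumptions here), and `aggregate_all_levels` uses a plain element-wise sum as a hypothetical aggregation. The abstract does not specify how GRoIE combines layers, and the non-local blocks and attention mechanisms it mentions are omitted.

```python
import math


def fpn_level(width, height, k0=4, canonical=224, k_min=2, k_max=5):
    """Single-layer selection: map an RoI of the given size to one pyramid level."""
    k = math.floor(k0 + math.log2(math.sqrt(width * height) / canonical))
    return max(k_min, min(k_max, k))  # clamp to the available levels


def aggregate_all_levels(pooled_per_level):
    """Hypothetical multi-layer alternative: element-wise sum of the RoI features
    pooled from every level (all vectors must have the same length)."""
    return [sum(values) for values in zip(*pooled_per_level)]


# A 112x112 RoI is assigned to one level only under the single-layer rule ...
selected_level = fpn_level(112, 112)

# ... whereas the aggregation keeps a contribution from every level.
pooled = [[1.0, 2.0], [0.5, 0.5], [0.25, 0.0], [0.25, 0.5]]  # toy pooled features
fused = aggregate_all_levels(pooled)
```

With single-layer selection, features pooled from the other levels are simply discarded for that RoI, which is the limitation the paper sets out to address.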

Inception Based Deep Learning Architecture for Tuberculosis Screening of Chest X-Rays

Dipayan Das, K.C. Santosh, Umapada Pal

Auto-TLDR; End to End CNN-based Chest X-ray Screening for Tuberculosis positive patients in the severely resource constrained regions of the world

The motivation for this work is the primary need to screen Tuberculosis (TB) positive patients in the severely resource-constrained regions of the world. Chest X-ray (CXR) is considered to be a promising indicator for the onset of TB, but the lack of skilled radiologists in such regions worsens the situation. Therefore, several computer-aided diagnosis (CAD) systems have been proposed to solve the decision-making problem, ranging from hand-engineered feature extraction methods to deep learning or Convolutional Neural Network (CNN) based methods. Feature extraction, being a time- and resource-intensive process, often delays mass screening. Hence, an end-to-end CNN architecture is proposed in this work to solve the problem. Two benchmark CXR datasets have been used in this work, collected from Shenzhen (China) and Montgomery County (USA), on which the proposed methodology achieved a maximum abnormality detection accuracy (ACC) of 91.7% (0.96 AUC) and 87.47% (0.92 AUC) respectively. To the best of our knowledge, the obtained results are marginally superior to the state-of-the-art results that have solely used deep learning methodologies on the aforementioned datasets.
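
For readers unfamiliar with the two metrics reported above, the following minimal Python sketch shows how accuracy (ACC) and the area under the ROC curve (AUC) can be computed from classifier scores. The function names, the 0.5 decision threshold and the toy scores are assumptions made for illustration; they are not taken from the paper.

```python
def accuracy(scores, labels, threshold=0.5):
    """Fraction of samples whose thresholded score matches the 0/1 label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def roc_auc(scores, labels):
    """AUC as the probability that a random positive outranks a random negative
    (ties count as one half). Requires at least one sample of each class."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy scores invented for illustration: 1 = abnormal CXR, 0 = normal CXR.
toy_scores = [0.9, 0.8, 0.3, 0.1]
toy_labels = [1, 0, 1, 0]
toy_acc = accuracy(toy_scores, toy_labels)
toy_auc = roc_auc(toy_scores, toy_labels)
```

Accuracy depends on the chosen threshold, while AUC summarises ranking quality over all thresholds, which is why abstracts such as this one often report both.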

Dealing with Scarce Labelled Data: Semi-Supervised Deep Learning with Mix Match for Covid-19 Detection Using Chest X-Ray Images

Saúl Calderón Ramirez, Raghvendra Giri, Shengxiang Yang, Armaghan Moemeni, Mario Umaña, David Elizondo, Jordina Torrents-Barrena, Miguel A. Molina-Cabello

Auto-TLDR; Semi-supervised Deep Learning for Covid-19 Detection using Chest X-rays

Coronavirus (Covid-19) is spreading fast, infecting people through contact in various forms, including droplets from sneezing and coughing. Therefore, detecting infected subjects early, quickly and cheaply is urgent. Currently available tests are scarce and limited to people in danger of serious illness. The application of deep learning to chest X-ray images for Covid-19 detection is an attractive approach. However, this technology usually relies on the availability of large labelled datasets, a requirement that is hard to meet in the context of a virus outbreak. To overcome this challenge, a semi-supervised deep learning model using both labelled and unlabelled data is proposed. We developed and tested a semi-supervised deep learning framework based on the Mix Match architecture to classify chest X-rays into Covid-19, pneumonia and healthy cases. The presented approach was calibrated using two publicly available datasets. The results show an accuracy increase of around 15% under a low labelled-to-unlabelled data ratio. This indicates that our semi-supervised framework can help improve performance levels for Covid-19 detection when high-quality labelled data is scarce. We also introduce a semi-supervised deep learning boost coefficient, which is meant to ease the scalability of our approach and performance comparison.
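
As a rough illustration of how a MixMatch-style method exploits unlabelled images, the Python sketch below shows three of its core operations: averaging predictions over several augmentations of one image to guess a label, sharpening that guess, and mixing pairs of samples with MixUp. It is a simplified sketch under stated assumptions, not the authors' implementation: the function names, the temperature of 0.5, the Beta parameter of 0.75 and the toy prediction vectors are illustrative choices, and the model, the augmentations and the loss terms are left out.

```python
import random


def average_predictions(pred_list):
    """Average the class-probability vectors predicted for K augmentations."""
    k = len(pred_list)
    return [sum(column) / k for column in zip(*pred_list)]


def sharpen(probs, temperature=0.5):
    """Lower the entropy of a guessed label distribution (temperature < 1)."""
    powered = [p ** (1.0 / temperature) for p in probs]
    total = sum(powered)
    return [p / total for p in powered]


def mixup(x1, y1, x2, y2, alpha=0.75, rng=random):
    """Convex combination of two samples; lam >= 0.5 keeps the result closer
    to the first sample, as in MixMatch."""
    lam = rng.betavariate(alpha, alpha)
    lam = max(lam, 1.0 - lam)
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam


# Toy predictions for two augmentations of one unlabelled X-ray
# (class order assumed here: Covid-19, pneumonia, healthy).
guess = average_predictions([[0.5, 0.3, 0.2], [0.7, 0.2, 0.1]])
sharp_guess = sharpen(guess)

# Mix the unlabelled sample with a labelled one (toy 2-pixel "images").
mixed_x, mixed_y, lam = mixup([0.2, 0.4], sharp_guess, [0.9, 0.1], [0.0, 1.0, 0.0],
                              rng=random.Random(0))
```

Sharpening pushes the guessed distribution towards its most likely class, which encourages confident predictions on unlabelled data, while MixUp blends labelled and unlabelled samples into a single training batch.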

Semi-Supervised Generative Adversarial Networks with a Pair of Complementary Generators for Retinopathy Screening

Yingpeng Xie, Qiwei Wan, Hai Xie, En-Leng Tan, Yanwu Xu, Baiying Lei

Auto-TLDR; Generative Adversarial Networks for Retinopathy Diagnosis via Fundus Images

Several typical types of retinopathy are major causes of blindness. However, early detection of retinopathy is not easy, since few symptoms are observable in the early stage. The development of non-mydriatic retinal cameras, which produce high-resolution retinal fundus images, makes Computer-Aided Diagnosis (CAD) via deep learning possible as an aid to diagnosing retinopathy. Deep learning algorithms usually rely on a large number of labelled images, which are expensive and time-consuming to obtain in the medical imaging area. Moreover, the random distribution of various lesions, which often vary greatly in size, also makes it challenging to learn discriminative information from high-resolution fundus images. In this paper, we present generative adversarial networks simultaneously equipped with a "good" generator and a "bad" generator (GBGANs) to make up for the incomplete data distribution provided by limited fundus images. To improve the generative feasibility of the generators, we introduce a pre-trained feature extractor to acquire a condensed feature for each fundus image in advance. Experimental results on three integrated public iChallenge datasets show that the proposed GBGANs can fully utilize the available fundus images to identify retinopathy with little labelling cost.

CAggNet: Crossing Aggregation Network for Medical Image Segmentation

Xu Cao, Yanghao Lin

Auto-TLDR; Crossing Aggregation Network for Medical Image Segmentation

In this paper, we present the Crossing Aggregation Network (CAggNet), a novel densely connected semantic segmentation method for medical image analysis. The crossing aggregation network builds on the idea of deep layer aggregation and makes significant innovations in layer connection and semantic information fusion. In this architecture, the traditional skip-connection structure of the general U-Net is replaced by aggregations of multi-level down-sampling and up-sampling layers. This enables the network to fuse information that flows interactively across different levels of layers in semantic segmentation. It also introduces a weighted aggregation module to aggregate multi-scale output information. We have evaluated and compared CAggNet with several advanced U-Net based methods on two public medical image datasets: the 2018 Data Science Bowl nuclei detection dataset and the 2015 MICCAI gland segmentation competition dataset. Experimental results indicate that CAggNet improves medical object recognition and achieves a more accurate and efficient segmentation than existing improved U-Net and UNet++ structures.

NephCNN: A Deep-Learning Framework for Vessel Segmentation in Nephrectomy Laparoscopic Videos

Alessandro Casella, Sara Moccia, Chiara Carlini, Emanuele Frontoni, Elena De Momi, Leonardo Mattos

Auto-TLDR; Adversarial Fully Convolutional Neural Networks for kidney vessel segmentation from nephrectomy laparoscopic videos

Objective: In recent years, robot-assisted partial nephrectomy (RAPN) has been establishing itself as the treatment of choice for renal cell carcinoma (RCC). Reduced field of view, field occlusions by surgical tools, and reduced maneuverability may cause accidents, such as unwanted vessel resection with consequent bleeding. Surgical Data Science (SDS) can provide effective context-aware tools for supporting surgeons. However, no tools have so far been exploited for automatic vessel segmentation from nephrectomy laparoscopic videos. Herein, we propose a new approach based on adversarial Fully Convolutional Neural Networks (FCNNs) for kidney vessel segmentation from nephrectomy laparoscopic vision. Methods: The proposed approach enhances an existing segmentation framework by (i) encoding 3D kernels for spatio-temporal feature extraction to enforce pixel connectivity in time, and (ii) performing training in an adversarial fashion, which constrains the vessel shape. Results: We performed a preliminary study using 8 different RAPN videos (1871 frames), the first in the field, achieving a median Dice Similarity Coefficient of 71.76%. Conclusions: Results showed that the proposed approach could be a valuable solution for assisting surgeons during RAPN.
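
The Dice Similarity Coefficient reported in the Results is defined for a predicted mask A and a reference mask B as 2|A ∩ B| / (|A| + |B|). The minimal Python sketch below computes it per frame and takes the median over frames. The function names, the toy masks and the convention of returning 1.0 for two empty masks are assumptions made for illustration, not details from the paper.

```python
from statistics import median


def dice_coefficient(pred_mask, true_mask):
    """Dice score between two binary masks given as nested lists of 0/1 values."""
    pred = [bool(v) for row in pred_mask for v in row]
    true = [bool(v) for row in true_mask for v in row]
    intersection = sum(p and t for p, t in zip(pred, true))
    total = sum(pred) + sum(true)
    if total == 0:
        return 1.0  # convention chosen here: two empty masks count as a match
    return 2.0 * intersection / total


def median_dice(frame_pairs):
    """Median Dice score over (predicted_mask, reference_mask) pairs."""
    return median(dice_coefficient(p, t) for p, t in frame_pairs)


# Toy 2x2 masks invented for illustration (1 = vessel pixel).
frames = [
    ([[1, 1], [0, 0]], [[1, 0], [0, 0]]),  # partial overlap
    ([[1, 0], [0, 1]], [[1, 0], [0, 1]]),  # perfect overlap
    ([[0, 1], [0, 0]], [[1, 0], [0, 0]]),  # no overlap
]
score = median_dice(frames)
```

Reporting the median rather than the mean makes the summary less sensitive to a few frames where the segmentation fails completely.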

Merged 1D-2D Deep Convolutional Neural Networks for Nerve Detection in Ultrasound Images

Mohammad Alkhatib, Adel Hafiane, Pierre Vieyres

Auto-TLDR; Merged 1D-2D Deep Neural Networks to Detect the Median Nerve in Ultrasound-Guided Regional Anesthesia

Ultrasound-Guided Regional Anesthesia (UGRA) has become a standard procedure in surgical operations and contributes to pain management. It offers the advantage of targeted nerve detection and provides visualization of regions of interest such as anatomical structures. However, nerve detection is one of the most challenging tasks that anesthetists can encounter in the UGRA procedure. A computer-aided system that can automatically detect the nerve region would facilitate the anesthetist's daily routine and allow them to concentrate more on the anesthetic delivery. In this paper, we propose a new method based on merging deep learning models from different data to detect the median nerve. The merged architecture consists of two branches: a one-dimensional (1D) convolutional neural network (CNN) branch and a 2D CNN branch. It aims to learn high-level features from 1D handcrafted noise-robust features and 2D ultrasound images. The obtained results show the validity, high accuracy and robustness of the proposed approach.

A Versatile Crack Inspection Portable System Based on Classifier Ensemble and Controlled Illumination

Milind Gajanan Padalkar, Carlos Beltran-Gonzalez, Matteo Bustreo, Alessio Del Bue, Vittorio Murino

Auto-TLDR; Lighting Conditions for Crack Detection in Ceramic Tile

This paper presents a novel setup for automatic visual inspection of cracks in ceramic tile and studies the effect of various classifiers and height-varying illumination conditions on this task. The intuition behind this setup is that cracks can be visualized better under some lighting conditions than under others. Our setup, which is designed for field work with constraints on its maximum dimensions, can acquire images for crack detection under multiple lighting conditions using illumination sources placed at multiple heights. Crack detection is then performed by classifying patches extracted from the acquired images in a sliding-window fashion. We study the effect of lights placed at various heights by training classifiers on both customized and state-of-the-art architectures and evaluating their performance at both patch level and image level, demonstrating the effectiveness of our setup. More importantly, ours is the first study that demonstrates how height-varying illumination conditions can affect crack detection with existing state-of-the-art classifiers. We provide insight into the illumination conditions that can help improve crack detection in a challenging real-world industrial environment.
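
The sliding-window patch classification described above can be sketched as follows in Python. This is a minimal illustration under stated assumptions: the function names, the patch size and stride, the "any patch flagged means the image contains a crack" rule, and the dark-pixel threshold used as a stand-in for a trained patch classifier are all hypothetical and not taken from the paper.

```python
def sliding_windows(height, width, patch, stride):
    """Yield the (top, left) corner of every patch that fits inside the image."""
    for top in range(0, height - patch + 1, stride):
        for left in range(0, width - patch + 1, stride):
            yield top, left


def extract_patch(image, top, left, patch):
    """Cut a patch x patch block out of an image stored as nested lists."""
    return [row[left:left + patch] for row in image[top:top + patch]]


def classify_image(image, patch, stride, classify_patch):
    """Patch-level results plus a simple image-level decision (assumed rule:
    the image is flagged if any patch is flagged)."""
    height, width = len(image), len(image[0])
    flagged = [
        (top, left)
        for top, left in sliding_windows(height, width, patch, stride)
        if classify_patch(extract_patch(image, top, left, patch))
    ]
    return bool(flagged), flagged


def dark_pixel_classifier(patch_pixels, threshold=50):
    """Hypothetical stand-in for a trained patch classifier: flags dark pixels."""
    return any(v < threshold for row in patch_pixels for v in row)


# Toy 4x4 grey-level image invented for illustration, with one dark "crack" pixel.
tile = [[200] * 4 for _ in range(4)]
tile[2][3] = 10
has_crack, crack_patches = classify_image(tile, patch=2, stride=2,
                                          classify_patch=dark_pixel_classifier)
```

Note that with this simple loop, pixels near the right and bottom borders are skipped whenever the image size minus the patch size is not a multiple of the stride; a real pipeline would pad the image or add a final window at the border.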

Personalized Models in Human Activity Recognition Using Deep Learning

Hamza Amrani, Daniela Micucci, Paolo Napoletano

Auto-TLDR; Incremental Learning for Personalized Human Activity Recognition

Current sensor-based human activity recognition techniques that rely on a user-independent model struggle to generalize to new users and to the changes that a person may make over time in his or her way of carrying out activities. Incremental learning is a technique that makes it possible to obtain personalized models, which may improve the performance of the classifiers thanks to continuous learning based on user data. Moreover, deep learning techniques have proven more effective than traditional ones in generating user-independent models. The aim of our work is therefore to combine deep learning techniques with incremental learning in order to obtain personalized models that perform better than both user-independent models and personalized models obtained using traditional machine learning techniques. The experiments compared the results obtained by a state-of-the-art technique with those obtained by two neural networks (ResNet and a simplified CNN) on three datasets. They showed that neural networks adapt faster to a new user than the baseline.

Vision-Based Layout Detection from Scientific Literature Using Recurrent Convolutional Neural Networks

Huichen Yang, William Hsu

Auto-TLDR; Transfer Learning for Scientific Literature Layout Detection Using Convolutional Neural Networks

We present an approach for adapting convolutional neural networks for object recognition and classification to scientific literature layout detection (SLLD), a shared subtask of several information extraction problems. Scientific publications contain multiple types of information sought by researchers in various disciplines, organized into an abstract, bibliography, and sections documenting related work, experimental methods, and results; however, there is no effective way to extract this information due to their diverse layout. In this paper, we present a novel approach to developing an end-to-end learning framework to segment and classify major regions of a scientific document. We consider scientific document layout analysis as an object detection task over digital images, without any additional text features that need to be added into the network during the training process. Our technical objective is to implement transfer learning via fine-tuning of pre-trained networks and thereby demonstrate that this deep learning architecture is suitable for tasks that lack very large document corpora for training. As part of the experimental test bed for empirical evaluation of this approach, we created a merged multi-corpus data set for scientific publication layout detection tasks. Our results show good improvement with fine-tuning of a pre-trained base network using this merged data set, compared to the baseline convolutional neural network architecture.