Classify Breast Histopathology Images with Ductal Instance-Oriented Pipeline

Beibin Li, Ezgi Mercan, Sachin Mehta, Stevan Knezevich, Corey Arnold, Donald Weaver, Joann Elmore, Linda Shapiro

Responsive image

Auto-TLDR; DIOP: Ductal Instance-Oriented Pipeline for Diagnostic Classification

Slides Poster

In this study, we propose the Ductal Instance-Oriented Pipeline (DIOP) that contains a duct-level instance segmentation model, a tissue-level semantic segmentation model, and three-levels of features for diagnostic classification. Based on recent advancements in instance segmentation and the Mask R-CNN model, our duct-level segmenter tries to identify each ductal individual inside a microscopic image; then, it extracts tissue-level information from the identified ductal instances. Leveraging three levels of information obtained from these ductal instances and also the histopathology image, the proposed DIOP outperforms previous approaches (both feature-based and CNN-based) in all diagnostic tasks; for the four-way classification task, the DIOP achieves comparable performance to general pathologists in this unique dataset. The proposed DIOP only takes a few seconds to run in the inference time, which could be used interactively on most modern computers. More clinical explorations are needed to study the robustness and generalizability of this system in the future.

Similar papers

Attention Based Multi-Instance Thyroid Cytopathological Diagnosis with Multi-Scale Feature Fusion

Shuhao Qiu, Yao Guo, Chuang Zhu, Wenli Zhou, Huang Chen

Responsive image

Auto-TLDR; A weakly supervised multi-instance learning framework based on attention mechanism with multi-scale feature fusion for thyroid cytopathological diagnosis

Slides Poster Similar

In recent years, deep learning has been popular in combining with cytopathology diagnosis. Using the whole slide images (WSI) scanned by electronic scanners at clinics, researchers have developed many algorithms to classify the slide (benign or malignant). However, the key area that support the diagnosis result can be relatively small in a thyroid WSI, and only the global label can be acquired, which make the direct use of the strongly supervised learning framework infeasible. What’s more, because the clinical diagnosis of the thyroid cells requires the use of visual features in different scales, a generic feature extraction way may not achieve good performance. In this paper, we propose a weakly supervised multi-instance learning framework based on attention mechanism with multi-scale feature fusion (MSF) using convolutional neural network (CNN) for thyroid cytopathological diagnosis. We take each WSI as a bag, each bag contains multiple instances which are the different regions of the WSI, our framework is trained to learn the key area automatically and make the classification. We also propose a feature fusion structure, merge the low-level features into the final feature map and add an instance-level attention module in it, which improves the classification accuracy. Our model is trained and tested on the collected clinical data, reaches the accuracy of 93.2%, which outperforms the other existing methods. We also tested our model on a public histopathology dataset and achieves better result than the state-of-the-art deep multi-instance method.

A Systematic Investigation on Deep Architectures for Automatic Skin Lesions Classification

Pierluigi Carcagni, Marco Leo, Andrea Cuna, Giuseppe Celeste, Cosimo Distante

Responsive image

Auto-TLDR; RegNet: Deep Investigation of Convolutional Neural Networks for Automatic Classification of Skin Lesions

Slides Poster Similar

Computer vision-based techniques are more and more employed in healthcare and medical fields nowadays in order, principally, to be as a support to the experienced medical staff to help them to make a quick and correct diagnosis. One of the hot topics in this arena concerns the automatic classification of skin lesions. Several promising works exist about it, mainly leveraging Convolutional Neural Networks (CNN), but proposed pipeline mainly rely on complex data preprocessing and there is no systematic investigation about how available deep models can actually reach the accuracy needed for real applications. In order to overcome these drawbacks, in this work, an end-to-end pipeline is introduced and some of the most recent Convolutional Neural Networks (CNNs) architectures are included in it and compared on the largest common benchmark dataset recently introduced. To this aim, for the first time in this application context, a new network design paradigm, namely RegNet, has been exploited to get the best models among a population of configurations. The paper introduces a threefold level of contribution and novelty with respect the previous literature: the deep investigation of several CNN architectures driving to a consistent improvement of the lesions recognition accuracy, the exploitation of a new network design paradigm able to study the behavior of populations of models and a deep discussion about pro and cons of each analyzed method paving the path towards new research lines.

Cross-View Relation Networks for Mammogram Mass Detection

Ma Jiechao, Xiang Li, Hongwei Li, Ruixuan Wang, Bjoern Menze, Wei-Shi Zheng

Responsive image

Auto-TLDR; Multi-view Modeling for Mass Detection in Mammogram

Slides Poster Similar

In medical image analysis, multi-view modeling is crucial for pathology detection when the target lesion is presented in different views, e.g. mass lesions in breast. Currently mammogram is the most effective imaging modality for mass lesion detection of breast cancer at the early stage. The pathological information from the two paired views (i.e., medio-lateral oblique and cranio-caudal) are highly relational and complementary, which is crucial for diagnosis in clinical practice. Existing mass detection methods do not consider learning synergistic features from the two relational views. For the first time, we propose a novel mass detection framework to capture the latent relation information from the two paired views of a same mass in mammogram. We evaluate our model on a public mammogram dataset and a large-scale private dataset, demonstrating that the proposed method outperforms existing feature fusion approaches and state-of-the-art mass detection methods. We further analyze the performance gains from the relation modeling. Our quantitative and qualitative results suggest that jointly learning cross-view features boosts the detection performance of existing models, which is a promising avenue for mass detection task in mammogram.

Triplet-Path Dilated Network for Detection and Segmentation of General Pathological Images

Jiaqi Luo, Zhicheng Zhao, Fei Su, Limei Guo

Responsive image

Auto-TLDR; Triplet-path Network for One-Stage Object Detection and Segmentation in Pathological Images

Slides Similar

Deep learning has been widely applied in the field of medical image processing. However, compared with flourishing visual tasks in natural images, the progress achieved in pathological images is not remarkable, and detection and segmentation, which are among basic tasks of computer vision, are regarded as two independent tasks. In this paper, we make full use of existing datasets and construct a triplet-path network using dilated convolutions to cooperatively accomplish one-stage object detection and nuclei segmentation for general pathological images. First, in order to meet the requirement of detection and segmentation, a novel structure called triplet feature generation (TFG) is designed to extract high-resolution and multiscale features, where features from different layers can be properly integrated. Second, considering that pathological datasets are usually small, a location-aware and partially truncated loss function is proposed to improve the classification accuracy of datasets with few images and widely varying targets. We compare the performance of both object detection and instance segmentation with state-of-the-art methods. Experimental results demonstrate the effectiveness and efficiency of the proposed network on two datasets collected from multiple organs.

End-To-End Multi-Task Learning for Lung Nodule Segmentation and Diagnosis

Wei Chen, Qiuli Wang, Dan Yang, Xiaohong Zhang, Chen Liu, Yucong Li

Responsive image

Auto-TLDR; A novel multi-task framework for lung nodule diagnosis based on deep learning and medical features

Slides Similar

Computer-Aided Diagnosis (CAD) systems for lung nodule diagnosis based on deep learning have attracted much attention in recent years. However, most existing methods ignore the relationships between the segmentation and classification tasks, which leads to unstable performances. To address this problem, we propose a novel multi-task framework, which can provide lung nodule segmentation mask, malignancy prediction, and medical features for interpretable diagnosis at the same time. Our framework mainly contains two sub-network: (1) Multi-Channel Segmentation Sub-network (MSN) for lung nodule segmentation, and (2) Joint Classification Sub-network (JCN) for interpretable lung nodule diagnosis. In the proposed framework, we use U-Net down-sampling processes for extracting low-level deep learning features, which are shared by two sub-networks. The JCN forces the down-sampling processes to learn better lowlevel deep features, which lead to a better construct of segmentation masks. Meanwhile, two additional channels constructed by OTSU and super-pixel (SLIC) methods, are utilized as the guideline of the feature extraction. The proposed framework takes advantages of deep learning methods and classical methods, which can significantly improve the performances of all tasks. We evaluate the proposed framework on public dataset LIDCIDRI. Our framework achieves a promising Dice score of 86.43% in segmentation, 87.07% in malignancy level prediction, and convincing results in interpretable medical feature predictions.

A Benchmark Dataset for Segmenting Liver, Vasculature and Lesions from Large-Scale Computed Tomography Data

Bo Wang, Zhengqing Xu, Wei Xu, Qingsen Yan, Liang Zhang, Zheng You

Responsive image

Auto-TLDR; The Biggest Treatment-Oriented Liver Cancer Dataset for Segmentation

Slides Poster Similar

How to build a high-performance liver-related computer assisted diagnosis system is an open question of great interest. However, the performance of the state-of-art algorithm is always limited by the amount of data and quality of the label. To address this problem, we propose the biggest treatment-oriented liver cancer dataset for liver surgery and treatment planning. This dataset provides 216 cases (totally about 268K frames) scanned images in contrast-enhanced computed tomography (CT). We labeled all the CT images with the liver, liver vasculature and liver tumor segmentation ground truth for train and tune segmentation algorithms in advance. Based on that, we evaluate several recent and state-of-the-art segmentation algorithms, including 7 deep learning methods, on CT sequences. All results are compared to reference segmentations five error metrics that highlight different aspects of segmentation accuracy. In general, compared with previous datasets, our dataset is really a challenging dataset. To our knowledge, the proposed dataset and benchmark allow for the first time systematic exploration of such issues, and will be made available to allow for further research in this field.

A Comparison of Neural Network Approaches for Melanoma Classification

Maria Frasca, Michele Nappi, Michele Risi, Genoveffa Tortora, Alessia Auriemma Citarella

Responsive image

Auto-TLDR; Classification of Melanoma Using Deep Neural Network Methodologies

Slides Poster Similar

Melanoma is the deadliest form of skin cancer and it is diagnosed mainly visually, starting from initial clinical screening and followed by dermoscopic analysis, biopsy and histopathological examination. A dermatologist’s recognition of melanoma may be subject to errors and may take some time to diagnose it. In this regard, deep learning can be useful in the study and classification of skin cancer. In particular, by classifying images with Deep Neural Network methodologies, it is possible to obtain comparable or even superior results compared to those of dermatologists. In this paper, we propose a methodology for the classification of melanoma by adopting different deep learning techniques applied to a common dataset, composed of images from the ISIC dataset and consisting of different types of skin diseases, including melanoma on which we applied a specific pre-processing phase. In particular, a comparison of the results is performed in order to select the best effective neural network to be applied to the problem of recognition and classification of melanoma. Moreover, we also evaluate the impact of the pre- processing phase on the final classification. Different metrics such as accuracy, sensitivity, and specificity have been selected to assess the goodness of the adopted neural networks and compare them also with the manual classification of dermatologists.

Dual Stream Network with Selective Optimization for Skin Disease Recognition in Consumer Grade Images

Krishnam Gupta, Jaiprasad Rampure, Monu Krishnan, Ajit Narayanan, Nikhil Narayan

Responsive image

Auto-TLDR; A Deep Network Architecture for Skin Disease Localisation and Classification on Consumer Grade Images

Slides Poster Similar

Skin disease localisation and classification on consumer-grade images is more challenging compared to that on dermoscopic imaging. Consumer grade images refer to the images taken using commonly available imaging devices such as a mobile camera or a hand held digital camera. Such images, in addition to having the skin condition of interest in a very small area of the image, has other noisy non-clinical details introduced due to the lighting conditions and the distance of the hand held device from the anatomy at the time of acquisition. We propose a novel deep network architecture \& a new optimization strategy for classification with implicit localisation of skin diseases from clinical/consumer grade images. A weakly supervised segmentation algorithm is first employed to extract Region of Interests (RoI) from the image, the RoI and the original image form the two input streams of the proposed architecture. Each stream of the architecture learns high level and low level features from the original image and the RoI, respectively. The two streams are independently optimised until the loss stops decreasing after which both the streams are optimised collectively with the help of a third combiner sub-network. Such a strategy resulted in a 5% increase of accuracy over the current state-of-the-art methods on SD-198 dataset, which is publicly available. The proposed algorithm is also validated on a new dataset containing over 12,000 images across 75 different skin conditions. We intend to release this dataset as SD-75 to aid in the advancement of research on skin condition classification on consumer grade images.

Self-Supervised Learning with Graph Neural Networks for Region of Interest Retrieval in Histopathology

Yigit Ozen, Selim Aksoy, Kemal Kosemehmetoglu, Sevgen Onder, Aysegul Uner

Responsive image

Auto-TLDR; Self-supervised Contrastive Learning for Deep Representation Learning of Histopathology Images

Slides Poster Similar

Deep learning has achieved successful performance in representation learning and content-based retrieval of histopathology images. The commonly used setting in deep learning-based approaches is supervised training of deep neural networks for classification, and using the trained model to extract representations that are used for computing and ranking the distances between images. However, there are two remaining major challenges. First, supervised training of deep neural networks requires large amount of manually labeled data which is often limited in the medical field. Transfer learning has been used to overcome this challenge, but its success remained limited. Second, the clinical practice in histopathology necessitates working with regions of interest (ROI) of multiple diagnostic classes with arbitrary shapes and sizes. The typical solution to this problem is to aggregate the representations of fixed-sized patches cropped from these regions to obtain region-level representations. However, naive methods cannot sufficiently exploit the rich contextual information in the complex tissue structures. To tackle these two challenges, we propose a generic method that utilizes graph neural networks (GNN), combined with a self-supervised training method using a contrastive loss. GNN enables representing arbitrarily-shaped ROIs as graphs and encoding contextual information. Self-supervised contrastive learning improves quality of learned representations without requiring labeled data. The experiments using a challenging breast histopathology data set show that the proposed method achieves better performance than the state-of-the-art.

Supporting Skin Lesion Diagnosis with Content-Based Image Retrieval

Stefano Allegretti, Federico Bolelli, Federico Pollastri, Sabrina Longhitano, Giovanni Pellacani, Costantino Grana

Responsive image

Auto-TLDR; Skin Images Retrieval Using Convolutional Neural Networks for Skin Lesion Classification and Segmentation

Slides Poster Similar

Given the relevance of skin cancer, many attempts have been dedicated to the creation of automated devices that could assist both expert and beginner dermatologists towards fast and early diagnosis of skin lesions. In recent years, tasks such as skin lesion classification and segmentation have been extensively addressed with deep learning algorithms, which in some cases reach a diagnostic accuracy comparable to that of expert physicians. However, the general lack of interpretability and reliability severely hinders the ability of those approaches to actually support dermatologists in the diagnosis process. In this paper a novel skin images retrieval system is presented, which exploits features extracted by Convolutional Neural Networks to gather similar images from a publicly available dataset, in order to assist the diagnosis process of both expert and novice practitioners. In the proposed framework, Resnet-50 is initially trained for the classification of dermoscopic images; then, the feature extraction part is isolated, and an embedding network is build on top of it. The embedding learns an alternative representation, which allows to check image similarity by means of a distance measure. Experimental results reveal that the proposed method is able to select meaningful images, which can effectively boost the classification accuracy of human dermatologists.

End-To-End Deep Learning Methods for Automated Damage Detection in Extreme Events at Various Scales

Yongsheng Bai, Alper Yilmaz, Halil Sezen

Responsive image

Auto-TLDR; Robust Mask R-CNN for Crack Detection in Extreme Events

Slides Poster Similar

Robust Mask R-CNN (Mask Regional Convolutional Neural Network) methods are proposed and tested for automatic detection of cracks on structures or their components that may be damaged during extreme events, such as earth-quakes. We curated a new dataset with 2,021 labeled images for training and validation and aimed to find end-to-end deep neural networks for crack detection in the field. With data augmentation and parameters fine-tuning, Path Aggregation Network (PANet) with spatial attention mechanisms and High-resolution Network (HRNet) are introduced into Mask R-CNNs. The tests on three public datasets with low- or high-resolution images demonstrate that the proposed methods can achieve a big improvement over alternative networks, so the proposed method may be sufficient for crack detection for a variety of scales in real applications.

Point In: Counting Trees with Weakly Supervised Segmentation Network

Pinmo Tong, Shuhui Bu, Pengcheng Han

Responsive image

Auto-TLDR; Weakly Tree counting using Deep Segmentation Network with Localization and Mask Prediction

Slides Poster Similar

For tree counting tasks, since traditional image processing methods require expensive feature engineering and are not end-to-end frameworks, this will cause additional noise and cannot be optimized overall, so this method has not been widely used in recent trends of tree counting application. Recently, many deep learning based approaches are designed for this task because of the powerful feature extracting ability. The representative way is bounding box based supervised method, but time-consuming annotations are indispensable for them. Moreover, these methods are difficult to overcome the occlusion or overlap. To solve this problem, we propose a weakly tree counting network (WTCNet) based on deep segmentation network with only point supervision. It can simultaneously complete tree counting with localization and output mask of each tree at the same time. We first adopt a novel feature extractor network (FENet) to get features of input images, and then an effective strategy is introduced to deal with different mask predictions. In the end, we propose a basic localization guidance accompany with rectification guidance to train the network. We create two different datasets and select an existing challenging plant dataset to evaluate our method on three different tasks. Experimental results show the good performance improvement of our method compared with other existing methods. Further study shows that our method has great potential to reduce human labor and provide effective ground-truth masks and the results show the superiority of our method over the advanced methods.

A Novel Computer-Aided Diagnostic System for Early Assessment of Hepatocellular Carcinoma

Ahmed Alksas, Mohamed Shehata, Gehad Saleh, Ahmed Shaffie, Ahmed Soliman, Mohammed Ghazal, Hadil Abukhalifeh, Abdel Razek Ahmed, Ayman El-Baz

Responsive image

Auto-TLDR; Classification of Liver Tumor Lesions from CE-MRI Using Structured Structural Features and Functional Features

Slides Poster Similar

Early assessment of liver cancer patients with hepatocellular carcinoma (HCC) is of immense importance to provide the proper treatment plan. In this paper, we have developed a two-stage classification computer-aided diagnostic (CAD) system that has the ability to detect and grade the liver observations from multiphase contrast enhanced magnetic resonance imaging (CE-MRI). The proposed approach consists of three main steps. First, a pre-processing is applied to the CE-MRI scans to delineate the tumor lesions that will be used as an ROI across the four different phases of the CE-MRI, (namely, the pre-contrast, late-arterial, portal-venous, and delayed-contrast). Second, a group of three features are modeled to provide a quantitative discrimination between the tumor lesions; namely: i) the tumor appearance that is modeled using a set of texture features, (namely; the first-order histogram, second-order gray-level co-occurrence matrix, and second-order gray-level run-length matrix), to capture any discrimination that may appear in the lesion texture, ii) the spherical harmonics (SH) based shape features that have the ability to describe the shape complexity of the liver tumors, and iii) the functional features that are based on the calculation of the wash-in/wash-out through that evaluate the intensity changes across the post-contrast phases. Finally, the aforementioned individual features were then integrated together to obtain the combined features to be fed to a machine learning classifier towards getting the final diagnostic decision. The proposed CAD system has been tested using hepatic observations that was obtained from 85 participating patients, 34 patients with benign tumors, 34 patients with intermediate tumors and 34 with malignant tumors. Using a random forests based classifier with a leave-one-subject-out (LOSO) cross-validation, the developed CAD system achieved an 87.1% accuracy in distinguishing the malignant, intermediate and benign tumors. The classification performance is then evaluated using k-fold (5/10-fold) cross-validation approach to examine the robustness of the system. The LR-1 lesions were classified from LR-2 benign lesions with 91.2% accuracy, while 85.3% accuracy was achieved differentiating between LR-4 and LR-5 malignant tumors. The obtained results hold a promise of the proposed framework to be reliably used as a noninvasive diagnostic tool for the early detection and grading of liver cancer tumors.

Automatic Semantic Segmentation of Structural Elements related to the Spinal Cord in the Lumbar Region by Using Convolutional Neural Networks

Jhon Jairo Sáenz Gamboa, Maria De La Iglesia-Vaya, Jon Ander Gómez

Responsive image

Auto-TLDR; Semantic Segmentation of Lumbar Spine Using Convolutional Neural Networks

Slides Poster Similar

This work addresses the problem of automatically segmenting the MR images corresponding to the lumbar spine. The purpose is to detect and delimit the different structural elements like vertebrae, intervertebral discs, nerves, blood vessels, etc. This task is known as semantic segmentation. The approach proposed in this work is based on convolutional neural networks whose output is a mask where each pixel from the input image is classified into one of the possible classes. Classes were defined by radiologists and correspond to structural elements and tissues. The proposed network architectures are variants of the U-Net. Several complementary blocks were used to define the variants: spatial attention models, deep supervision and multi-kernels at input, this last block type is based on the idea of inception. Those architectures which got the best results are described in this paper, and their results are discussed. Two of the proposed architectures outperform the standard U-Net used as baseline.

Robust Localization of Retinal Lesions Via Weakly-Supervised Learning

Ruohan Zhao, Qin Li, Jane You

Responsive image

Auto-TLDR; Weakly Learning of Lesions in Fundus Images Using Multi-level Feature Maps and Classification Score

Slides Poster Similar

Retinal fundus images reveal the condition of retina, blood vessels and optic nerve. Retinal imaging is becoming widely adopted in clinical work because any subtle changes to the structures at the back of the eyes can affect the eyes and indicate the overall health. Machine learning, in particular deep learning by convolutional neural network (CNN), has been increasingly adopted for computer-aided detection (CAD) of retinal lesions. However, a significant barrier to the high performance of CNN based CAD approach is caused by the lack of sufficient labeled ground-truth image samples for training. Unlike the fully-supervised learning which relies on pixel-level annotation of pathology in fundus images, this paper presents a new approach to discriminate the location of various lesions based on image-level labels via weakly learning. More specifically, our proposed method leverages multi-level feature maps and classification score to cope with both bright and red lesions in fundus images. To enhance capability of learning less discriminative parts of objects (e.g. small blobs of microaneurysms opposed to bulk of exudates), the classifier is regularized by refining images with corresponding labels. The experimental results of the performance evaluation and benchmarking at both image-level and pixel-level on the public DIARETDB1 dataset demonstrate the feasibility and excellent potentials of our method in practice.

CAggNet: Crossing Aggregation Network for Medical Image Segmentation

Xu Cao, Yanghao Lin

Responsive image

Auto-TLDR; Crossing Aggregation Network for Medical Image Segmentation

Slides Poster Similar

In this paper, we present Crossing Aggregation Network (CAggNet), a novel densely connected semantic segmentation method for medical image analysis. The crossing aggregation network absorbs the idea of deep layer aggregation and makes significant innovations in layer connection and semantic information fusion. In this architecture, the traditional skip-connection structure of general U-Net is replaced by aggregations of multi-level down-sampling and up-sampling layers. This enables the network to fuse information interactively flows at different levels of layers in semantic segmentation. It also introduces weighted aggregation module to aggregate multi-scale output information. We have evaluated and compared our CAggNet with several advanced U-Net based methods in two public medical image datasets, including the 2018 Data Science Bowl nuclei detection dataset and the 2015 MICCAI gland segmentation competition dataset. Experimental results indicate that CAggNet improves medical object recognition and achieves a more accurate and efficient segmentation compared to existing improved U-Net and UNet++ structure.

Automatic Tuberculosis Detection Using Chest X-Ray Analysis with Position Enhanced Structural Information

Hermann Jepdjio Nkouanga, Szilard Vajda

Responsive image

Auto-TLDR; Automatic Chest X-ray Screening for Tuberculosis in Rural Population using Localized Region on Interest

Slides Poster Similar

For Tuberculosis (TB) detection beside the more expensive diagnosis solutions such as culture or sputum smear analysis one could consider the automatic analysis of the chest X-ray (CXR). This could mimic the lung region reading by the radiologist and it could provide a cheap solution to analyze and diagnose pulmonary abnormalities such as TB which often co- occurs with HIV. This software based pulmonary screening can be a reliable and affordable solution for rural population in different parts of the world such as India, Africa, etc. Our fully automatic system is processing the incoming CXR image by applying image processing techniques to detect the region on interest (ROI) followed by a computationally cheap feature extraction involving edge detection using Laplacian of Gaussian which we enrich by counting the local distribution of the intensities. The choice to ”zoom in” the ROI and look for abnormalities locally is motivated by the fact that some pulmonary abnormalities are localized in specific regions of the lungs. Later on the classifiers can decide about the normal or abnormal nature of each lung X-ray. Our goal is to find a simple feature, instead of a combination of several ones, -proposed and promoted in recent years’ literature, which can properly describe the different pathological alterations in the lungs. Our experiments report results on two publicly available data collections1, namely the Shenzhen and the Montgomery collection. For performance evaluation, measures such as area under the curve (AUC), and accuracy (ACC) were considered, achieving AUC = 0.81 (ACC = 83.33%) and AUC = 0.96 (ACC = 96.35%) for the Montgomery and Schenzen collections, respectively. Several comparisons are also provided to other state- of-the-art systems reported recently in the field.

DA-RefineNet: Dual-Inputs Attention RefineNet for Whole Slide Image Segmentation

Ziqiang Li, Rentuo Tao, Qianrun Wu, Bin Li

Responsive image

Auto-TLDR; DA-RefineNet: A dual-inputs attention network for whole slide image segmentation

Slides Poster Similar

Automatic medical image segmentation techniques have wide applications for disease diagnosing, however, its much more challenging than natural optical image segmentation tasks due to the high-resolution of medical images and the corresponding huge computation cost. Sliding window was a commonly used technique for whole slide image (WSI) segmentation, however, for these methods that based on sliding window, the main drawback was lacking of global contextual information for supervision. In this paper, we proposed a dual-inputs attention network (denoted as DA-RefineNet) for WSI segmentation, where both local fine-grained information and global coarse information can be efficiently utilized. Sufficient comparative experiments were conducted to evaluate the effectiveness of the proposed method, the results proved that the proposed method can achieve better performance on WSI segmentation tasks compared to methods rely on single-input.

BiLuNet: A Multi-Path Network for Semantic Segmentation on X-Ray Images

Van Luan Tran, Huei-Yung Lin, Rachel Liu, Chun-Han Tseng, Chun-Han Tseng

Responsive image

Auto-TLDR; BiLuNet: Multi-path Convolutional Neural Network for Semantic Segmentation of Lumbar vertebrae, sacrum,

Similar

Semantic segmentation and shape detection of lumbar vertebrae, sacrum, and femoral heads from clinical X-ray images are important and challenging tasks. In this paper, we propose a new multi-path convolutional neural network, BiLuNet, for semantic segmentation on X-ray images. The network is capable of medical image segmentation with very limited training data. With the shape fitting of the bones, we can identify the location of the target regions very accurately for lumbar vertebra inspection. We collected our dataset and annotated by doctors for model training and performance evaluation. Compared to the state-of-the-art methods, the proposed technique provides better mIoUs and higher success rates with the same training data. The experimental results have demonstrated the feasibility of our network to perform semantic segmentation for lumbar vertebrae, sacrum, and femoral heads.

CASNet: Common Attribute Support Network for Image Instance and Panoptic Segmentation

Xiaolong Liu, Yuqing Hou, Anbang Yao, Yurong Chen, Keqiang Li

Responsive image

Auto-TLDR; Common Attribute Support Network for instance segmentation and panoptic segmentation

Slides Poster Similar

Instance segmentation and panoptic segmentation is being paid more and more attention in recent years. In comparison with bounding box based object detection and semantic segmentation, instance segmentation can provide more analytical results at pixel level. Given the insight that pixels belonging to one instance have one or more common attributes of current instance, we bring up an one-stage instance segmentation network named Common Attribute Support Network (CASNet), which realizes instance segmentation by predicting and clustering common attributes. CASNet is designed in the manner of fully convolutional and can implement training and inference from end to end. And CASNet manages predicting the instance without overlaps and holes, which problem exists in most of current instance segmentation algorithms. Furthermore, it can be easily extended to panoptic segmentation through minor modifications with little computation overhead. CASNet builds a bridge between semantic and instance segmentation from finding pixel class ID to obtaining class and instance ID by operations on common attribute. Through experiment for instance and panoptic segmentation, CASNet gets mAP 32.8\% and PQ 59.0\% on Cityscapes validation dataset by joint training, and mAP 36.3\% and PQ 66.1\% by separated training mode. For panoptic segmentation, CASNet gets state-of-the-art performance on the Cityscapes validation dataset.

Skin Lesion Classification Using Weakly-Supervised Fine-Grained Method

Xi Xue, Sei-Ichiro Kamata, Daming Luo

Responsive image

Auto-TLDR; Different Region proposal module for skin lesion classification

Slides Poster Similar

In recent years, skin cancer has become one of the most common cancers. Among all types of skin cancers, melanoma is the most fatal one and many people die of this disease every year. Early detection can greatly reduce the death rate and save more lives. Skin lesions are one of the early symptoms of melanoma and other types of skin cancer. So accurately recognizing various skin lesions in early stage are of great significance. There have been lots of existing works based on convolutional neural networks (CNN) to solve skin lesion classification but seldom do them involve the similarity among different lesions. For example, we find that some lesions of melanoma and nevi look similar in appearance which is hard for neural network to distinguish categories of skin lesions. Inspired by fine-grained image classification, we propose a novel network to distinguish each category accurately. In our paper, we design an effective module, distinct region proposal module (DRPM), to extract the distinct regions from each image. Spatial attention and channel-wise attention are both utilized to enrich feature maps and guide the network to focus on the highlighted areas in a weakly-supervised way. In addition, two preprocessing steps are added to ensure the network to get better results. We demonstrate the potential of the proposed method on ISIC 2017 dataset. Experiments show that our approach is effective and efficient.

Unsupervised Detection of Pulmonary Opacities for Computer-Aided Diagnosis of COVID-19 on CT Images

Rui Xu, Xiao Cao, Yufeng Wang, Yen-Wei Chen, Xinchen Ye, Lin Lin, Wenchao Zhu, Chao Chen, Fangyi Xu, Yong Zhou, Hongjie Hu, Shoji Kido, Noriyuki Tomiyama

Responsive image

Auto-TLDR; A computer-aided diagnosis of COVID-19 from CT images using unsupervised pulmonary opacity detection

Slides Poster Similar

COVID-19 emerged towards the end of 2019 which was identified as a global pandemic by the world heath organization (WHO). With the rapid spread of COVID-19, the number of infected and suspected patients has increased dramatically. Chest computed tomography (CT) has been recognized as an efficient tool for the diagnosis of COVID-19. However, the huge CT data make it difficult for radiologist to fully exploit them on the diagnosis. In this paper, we propose a computer-aided diagnosis system that can automatically analyze CT images to distinguish the COVID-19 against to community-acquired pneumonia (CAP). The proposed system is based on an unsupervised pulmonary opacity detection method that locates opacity regions by a detector unsupervisedly trained from CT images with normal lung tissues. Radiomics based features are extracted insides the opacity regions, and fed into classifiers for classification. We evaluate the proposed CAD system by using 200 CT images collected from different patients in several hospitals. The accuracy, precision, recall, f1-score and AUC achieved are 95.5%, 100%, 91%, 95.1% and 95.9% respectively, exhibiting the promising capacity on the differential diagnosis of COVID-19 from CT images.

A Deep Learning Approach for the Segmentation of Myocardial Diseases

Khawala Brahim, Abdull Qayyum, Alain Lalande, Arnaud Boucher, Anis Sakly, Fabrice Meriaudeau

Responsive image

Auto-TLDR; Segmentation of Myocardium Infarction Using Late GADEMRI and SegU-Net

Slides Poster Similar

Cardiac left ventricular (LV) segmentation is of paramount essential step for both diagnosis and treatment of cardiac pathologies such as ischemia, myocardial infarction, arrhythmia and myocarditis. However, this segmentation is challenging due to high variability across patients and the potential lack of contrast between structures. In this work, we propose and evaluate a (2.5D) SegU-Net model based on the fusion of two deep learning techniques (U-Net and Seg-Net) for automated LGEMRI (Late gadolinium enhanced magnetic resonance imaging) myocardial disease (infarct core and no reflow region) quantification in a new multifield expert annotated dataset. Given that the scar tissue represents a small part of the whole MRI slices, we focused on myocardium area. Segmentation results show that this preprocessing step facilitate the learning procedure. In order to solve the class imbalance problem, we propose to apply the Jaccard loss and the Focal Loss as optimization loss function and to integrate a class weights strategy into the objective function. Late combination has been used to merge the output of the best trained models on a different set of hyperparameters. The final network segmentation performances will be useful for future comparison of new method to the current related work for this task. A total number of 2237 of slices (320 cases) were used for training/validation and 210 slices (35 cases) were used for testing. Experiments over our proposed dataset, using several evaluation metrics such Jaccard distance (IOU), Accuracy and Dice similarity coefficient (DSC), demonstrate efficiency performance in quantifying different zones of myocardium infarction across various patients. As compared to the second intra-observer study, our testing results showed that the SegUNet prediction model leads to these average dice coefficients over all segmented tissue classes, respectively : 'Background': 0.99999, 'Myocardium': 0.99434, 'Infarctus': 0.95587, 'Noreflow': 0.78187.

Learn to Segment Retinal Lesions and Beyond

Qijie Wei, Xirong Li, Weihong Yu, Xiao Zhang, Yongpeng Zhang, Bojie Hu, Bin Mo, Di Gong, Ning Chen, Dayong Ding, Youxin Chen

Responsive image

Auto-TLDR; Multi-task Lesion Segmentation and Disease Classification for Diabetic Retinopathy Grading

Poster Similar

Towards automated retinal screening, this paper makes an endeavor to simultaneously achieve pixel-level retinal lesion segmentation and image-level disease classification. Such a multi-task approach is crucial for accurate and clinically interpretable disease diagnosis. Prior art is insufficient due to three challenges, i.e., lesions lacking objective boundaries, clinical importance of lesions irrelevant to their size, and the lack of one-to-one correspondence between lesion and disease classes. This paper attacks the three challenges in the context of diabetic retinopathy (DR) grading. We propose Lesion-Net, a new variant of fully convolutional networks, with its expansive path re- designed to tackle the first challenge. A dual Dice loss that leverages both semantic segmentation and image classification losses is introduced to resolve the second challenge. Lastly, we build a multi-task network that employs Lesion-Net as a side- attention branch for both DR grading and result interpretation. A set of 12K fundus images is manually segmented by 45 ophthalmologists for 8 DR-related lesions, resulting in 290K manual segments in total. Extensive experiments on this large- scale dataset show that our proposed approach surpasses the prior art for multiple tasks including lesion segmentation, lesion classification and DR grading.

Investigating and Exploiting Image Resolution for Transfer Learning-Based Skin Lesion Classification

Amirreza Mahbod, Gerald Schaefer, Chunliang Wang, Rupert Ecker, Georg Dorffner, Isabella Ellinger

Responsive image

Auto-TLDR; Fine-tuned Neural Networks for Skin Lesion Classification Using Dermoscopic Images

Slides Poster Similar

Skin cancer is among the most common cancer types. Dermoscopic image analysis improves the diagnostic accuracy for detection of malignant melanoma and other pigmented skin lesions when compared to unaided visual inspection. Hence, computer-based methods to support medical experts in the diagnostic procedure are of great interest. Fine-tuning pre-trained convolutional neural networks (CNNs) has been shown to work well for skin lesion classification. Pre-trained CNNs are usually trained with natural images of a fixed image size which is typically significantly smaller than captured skin lesion images and consequently dermoscopic images are downsampled for fine-tuning. However, useful medical information may be lost during this transformation. In this paper, we explore the effect of input image size on skin lesion classification performance of fine-tuned CNNs. For this, we resize dermoscopic images to different resolutions, ranging from 64x64 to 768x768 pixels and investigate the resulting classification performance of three well-established CNNs, namely DenseNet-121, ResNet-18, and ResNet-50. Our results show that using very small images (of size 64x64 pixels) degrades the classification performance, while images of size 128x128 pixels and above support good performance with larger image sizes leading to slightly improved classification. We further propose a novel fusion approach based on a three-level ensemble strategy that exploits multiple fine-tuned networks trained with dermoscopic images at various sizes. When applied on the ISIC 2017 skin lesion classification challenge, our fusion approach yields an area under the receiver operating characteristic curve of 89.2% and 96.6% for melanoma classification and seborrheic keratosis classification, respectively, outperforming state-of-the-art algorithms.

MTGAN: Mask and Texture-Driven Generative Adversarial Network for Lung Nodule Segmentation

Wei Chen, Qiuli Wang, Kun Wang, Dan Yang, Xiaohong Zhang, Chen Liu, Yucong Li

Responsive image

Auto-TLDR; Mask and Texture-driven Generative Adversarial Network for Lung Nodule Segmentation

Slides Poster Similar

Accurate segmentation for lung nodules in lung computed tomography (CT) scans plays a key role in the early diagnosis of lung cancer. Many existing methods, especially UNet, have made significant progress in lung nodule segmentation. However, due to the complex shapes of lung nodules and the similarity of visual characteristics between nodules and lung tissues, an accurate segmentation with few false positives of lung nodules is still a challenging problem. Considering the fact that both boundary and texture information of lung nodules are important for obtaining an accurate segmentation result, we propose a novel Mask and Texture-driven Generative Adversarial Network (MTGAN) with a joint multi-scale L1 loss for lung nodule segmentation, which takes full advantages of U-Net and adversarial training. The proposed MTGAN leverages adversarial learning strategy guided by the boundary and texture information of lung nodules to generate more accurate segmentation results with lesser false positives. We validate our model with the LIDC–IDRI dataset, and experimental results show that our method achieves excellent segmentation results for a variety of lung nodules, especially for juxtapleural nodules and low-dense nodules. Without any bells and whistles, the proposed MTGAN achieves significant segmentation performance with the Dice similarity coefficient (DSC) of 85.24% on the LIDC–IDRI dataset.

Revisiting Sequence-To-Sequence Video Object Segmentation with Multi-Task Loss and Skip-Memory

Fatemeh Azimi, Benjamin Bischke, Sebastian Palacio, Federico Raue, Jörn Hees, Andreas Dengel

Responsive image

Auto-TLDR; Sequence-to-Sequence Learning for Video Object Segmentation

Slides Poster Similar

Video Object Segmentation (VOS) is an active research area of the visual domain. One of its fundamental sub-tasks is semi-supervised / one-shot learning: given only the segmentation mask for the first frame, the task is to provide pixel-accurate masks for the object over the rest of the sequence. Despite much progress in the last years, we noticed that many of the existing approaches lose objects in longer sequences, especially when the object is small or briefly occluded. In this work, we build upon a sequence-to-sequence approach that employs an encoder-decoder architecture together with a memory module for exploiting the sequential data. We further improve this approach by proposing a model that manipulates multi-scale spatio-temporal information using memory-equipped skip connections. Furthermore, we incorporate an auxiliary task based on distance classification which greatly enhances the quality of edges in segmentation masks. We compare our approach to the state of the art and show considerable improvement in the contour accuracy metric and the overall segmentation accuracy.

Bridging the Gap between Natural and Medical Images through Deep Colorization

Lia Morra, Luca Piano, Fabrizio Lamberti, Tatiana Tommasi

Responsive image

Auto-TLDR; Transfer Learning for Diagnosis on X-ray Images Using Color Adaptation

Slides Poster Similar

Deep learning has thrived by training on large-scale datasets. However, in many applications, as for medical image diagnosis, getting massive amount of data is still prohibitive due to privacy, lack of acquisition homogeneity and annotation cost. In this scenario transfer learning from natural image collections is a standard practice that attempts to tackle shape, texture and color discrepancy all at once through pretrained model fine-tuning. In this work we propose to disentangle those challenges and design a dedicated network module that focuses on color adaptation. We combine learning from scratch of the color module with transfer learning of different classification backbones obtaining an end-to-end, easy-to-train architecture for diagnostic image recognition on X-ray images. Extensive experiments show how our approach is particularly efficient in case of data scarcity and provides a new path for further transferring the learned color information across multiple medical datasets.

Semantic Segmentation of Breast Ultrasound Image with Pyramid Fuzzy Uncertainty Reduction and Direction Connectedness Feature

Kuan Huang, Yingtao Zhang, Heng-Da Cheng, Ping Xing, Boyu Zhang

Responsive image

Auto-TLDR; Uncertainty-Based Deep Learning for Breast Ultrasound Image Segmentation

Slides Poster Similar

Deep learning approaches have achieved impressive results in breast ultrasound (BUS) image segmentation. However, these methods did not solve uncertainty and noise in BUS images well. To address this issue, we present a novel deep learning structure for BUS image semantic segmentation by analyzing the uncertainty using a pyramid fuzzy block and generating a novel feature based on connectedness. Firstly, feature maps in the proposed network are down-sampled to different resolutions. Fuzzy transformation and uncertainty representation are applied to each resolution to obtain the uncertainty degree on different scales. Meanwhile, the BUS images contain layer structures. From top to bottom, there are skin layer, fat layer, mammary layer, muscle layer, and background area. A spatial recurrent neural network (RNN) is utilized to calculate the connectedness between each pixel and the pixels on the four boundaries in horizontal and vertical lines. The spatial-wise context feature can introduce the characteristic of layer structure to deep neural network. Finally, the original convolutional features are combined with connectedness feature according to the uncertainty degrees. The proposed methods are applied to two datasets: a BUS image benchmark with two categories (background and tumor) and a five-category BUS image dataset with fat layer, mammary layer, muscle layer, background, and tumor. The proposed method achieves the best results on both datasets compared with eight state-of-the-art deep learning-based approaches.

Dealing with Scarce Labelled Data: Semi-Supervised Deep Learning with Mix Match for Covid-19 Detection Using Chest X-Ray Images

Saúl Calderón Ramirez, Raghvendra Giri, Shengxiang Yang, Armaghan Moemeni, Mario Umaña, David Elizondo, Jordina Torrents-Barrena, Miguel A. Molina-Cabello

Responsive image

Auto-TLDR; Semi-supervised Deep Learning for Covid-19 Detection using Chest X-rays

Slides Poster Similar

Coronavirus (Covid-19) is spreading fast, infecting people through contact in various forms including droplets from sneezing and coughing. Therefore, the detection of infected subjects in an early, quick and cheap manner is urgent. Currently available tests are scarce and limited to people in danger of serious illness. The application of deep learning to chest X-ray images for Covid-19 detection is an attractive approach. However, this technology usually relies on the availability of large labelled datasets, a requirement hard to meet in the context of a virus outbreak. To overcome this challenge, a semi-supervised deep learning model using both labelled and unlabelled data is proposed. We developed and tested a semi-supervised deep learning framework based on the Mix Match architecture to classify chest X-rays into Covid-19, pneumonia and healthy cases. The presented approach was calibrated using two publicly available datasets. The results show an accuracy increase of around $15\%$ under low labelled / unlabelled data ratio. This indicates that our semi-supervised framework can help improve performance levels towards Covid-19 detection when the amount of high-quality labelled data is scarce. Also, we introduce a semi-supervised deep learning boost coefficient which is meant to ease the scalability of our approach and performance comparison.

Detecting Marine Species in Echograms Via Traditional, Hybrid, and Deep Learning Frameworks

Porto Marques Tunai, Alireza Rezvanifar, Melissa Cote, Alexandra Branzan Albu, Kaan Ersahin, Todd Mudge, Stephane Gauthier

Responsive image

Auto-TLDR; End-to-End Deep Learning for Echogram Interpretation of Marine Species in Echograms

Slides Poster Similar

This paper provides a comprehensive comparative study of traditional, hybrid, and deep learning (DL) methods for detecting marine species in echograms. Acoustic backscatter data obtained from multi-frequency echosounders is visualized as echograms and typically interpreted by marine biologists via manual or semi-automatic methods, which are time-consuming. Challenges related to automatic echogram interpretation are the variable size and acoustic properties of the biological targets (marine life), along with significant inter-class similarities. Our study explores and compares three types of approaches that cover the entire range of machine learning methods. Based on our experimental results, we conclude that an end-to-end DL-based framework, that can be readily scaled to accommodate new species, is overall preferable to other learning approaches for echogram interpretation, even when only a limited number of annotated training samples is available.

Planar 3D Transfer Learning for End to End Unimodal MRI Unbalanced Data Segmentation

Martin Kolarik, Radim Burget, Carlos M. Travieso-Gonzalez, Jan Kocica

Responsive image

Auto-TLDR; Planar 3D Res-U-Net Network for Unbalanced 3D Image Segmentation using Fluid Attenuation Inversion Recover

Slides Similar

We present a novel approach of 2D to 3D transfer learning based on mapping pre-trained 2D convolutional neural network weights into planar 3D kernels. The method is validated by proposed planar 3D res-u-net network with encoder transferred from the 2D VGG-16 which is applied for a single-stage unbalanced 3D image data segmentation. In particular, we evaluate the method on the MICCAI 2016 MS lesion segmentation challenge dataset utilizing solely Fluid Attenuation Inversion Recover (FLAIR) sequence without brain extraction for training and inference to simulate real medical praxis. The planar 3D res-u-net network performed the best both in sensitivity and Dice score amongst end to end methods processing raw MRI scans and achieved comparable Dice score to a state-of-the-art unimodal not end to end approach. Complete source code was released under the open-source license and this paper is in compliance with the Machine learning Reproducibility Checklist. By implementing practical transfer learning for 3D data representation we were able to successfully segment heavily unbalanced data without selective sampling and achieved more reliable results using less training data in single modality. From medical perspective, the unimodal approach gives an advantage in real praxis as it does not require co-registration nor additional scanning time during examination. Although modern medical imaging methods capture high resolution 3D anatomy scans suitable for computer aided detection system processing, deployment of automatic systems for interpretation of radiology imaging is still rather theoretical in many medical areas. Our work aims to bridge the gap offering solution for partial research questions.

Leveraging Unlabeled Data for Glioma Molecular Subtype and Survival Prediction

Nicholas Nuechterlein, Beibin Li, Mehmet Saygin Seyfioglu, Sachin Mehta, Patrick Cimino, Linda Shapiro

Responsive image

Auto-TLDR; Multimodal Brain Tumor Segmentation Using Unlabeled MR Data and Genomic Data for Cancer Prediction

Slides Poster Similar

In this paper, we address two long-standing challenges in neuro-oncology: (1) how to leverage large amounts of unlabeled magnetic resonance (MR) imaging data for radiogenomic tasks and (2) how to unite glioma MR imaging with genomic data. We examine multi-parametric MR data from 542 patients in the combined training, validation, and testing sets of the 2018 Multimodal Brain Tumor Segmentation Challenge and somatic copy number alteration (SCNA) data from 1090 patients in The Cancer Genome Archive's (TCGA) lower-grade glioma and glioblastoma projects. We propose a novel application of multi-task learning (MTL) that leverages unlabeled MR data by jointly learning tumor segmentation masks with glioma molecular subtype markers and allows for SCNA input when available. There are 235 patients in the intersection of these MR and SCNA datasets, which we divide into an unlabeled training set, a labeled training set, and a validation set. Our MTL model significantly outperforms comparable classification models trained only on labeled MR data for both IDH1/2 mutation and 1p/19q co-deletion glioma subtype marker prediction tasks. We also observe that models trained on genomic and imaging data improve survival prediction results achieved by models trained on either alone. We will release our source code for future research.

Attack-Agnostic Adversarial Detection on Medical Data Using Explainable Machine Learning

Matthew Watson, Noura Al Moubayed

Responsive image

Auto-TLDR; Explainability-based Detection of Adversarial Samples on EHR and Chest X-Ray Data

Slides Poster Similar

Explainable machine learning has become increasingly prevalent, especially in healthcare where explainable models are vital for ethical and trusted automated decision making. Work on the susceptibility of deep learning models to adversarial attacks has shown the ease of designing samples to mislead a model into making incorrect predictions. In this work, we propose an explainability-based method for the accurate detection of adversarial samples on two datasets with different complexity and properties: Electronic Health Record (EHR) and chest X-ray (CXR) data. On the MIMIC-III and Henan-Renmin EHR datasets, we report a detection accuracy of 77% against the Longitudinal Adversarial Attack. On the MIMIC-CXR dataset, we achieve an accuracy of 88%; significantly improving on the state of the art of adversarial detection in both datasets by over 10% in all settings. We propose an anomaly detection based method using explainability techniques to detect adversarial samples which is able to generalise to different attack methods without a need for retraining.

Superpixel-Based Refinement for Object Proposal Generation

Christian Wilms, Simone Frintrop

Responsive image

Auto-TLDR; Superpixel-based Refinement of AttentionMask for Object Segmentation

Slides Poster Similar

Precise segmentation of objects is an important problem in tasks like class-agnostic object proposal generation or instance segmentation. Deep learning-based systems usually generate segmentations of objects based on coarse feature maps, due to the inherent downsampling in CNNs. This leads to segmentation boundaries not adhering well to the object boundaries in the image. To tackle this problem, we introduce a new superpixel-based refinement approach on top of the state-of-the-art object proposal system AttentionMask. The refinement utilizes superpixel pooling for feature extraction and a novel superpixel classifier to determine if a high precision superpixel belongs to an object or not. Our experiments show an improvement of up to 26.0% in terms of average recall compared to original AttentionMask. Furthermore, qualitative and quantitative analyses of the segmentations reveal significant improvements in terms of boundary adherence for the proposed refinement compared to various deep learning-based state-of-the-art object proposal generation systems.

Classification of Intestinal Gland Cell-Graphs Using Graph Neural Networks

Linda Studer, Jannis Wallau, Heather Dawson, Inti Zlobec, Andreas Fischer

Responsive image

Auto-TLDR; Graph Neural Networks for Classification of Dysplastic Gland Glands using Graph Neural Networks

Slides Poster Similar

We propose to classify intestinal glands as normal or dysplastic using cell-graphs and graph-based deep learning methods. Dysplastic intestinal glands can lead to colorectal cancer, which is one of the three most common cancer types in the world. In order to assess the cancer stage and thus the treatment of a patient, pathologists analyse tissue samples of affected patients. Among other factors, they look at the changes in morphology of different tissues, such as the intestinal glands. Cell-graphs have a high representational power and can describe topological and geometrical properties of intestinal glands. However, classical graph-based methods have a high computational complexity and there is only a limited range of machine learning methods available. In this paper, we propose Graph Neural Networks (GNNs) as an efficient learning-based approach to classify cell-graphs. We investigate different variants of so-called Message Passing Neural Networks and compare them with a classical graph-based approach based on approximated Graph Edit Distance and k-nearest neighbours classifier. A promising classification accuracy of 94.1% is achieved by the proposed method on the pT1 Gland Graph dataset, which is an increase of 11.5% over the baseline result.

A Novel Region of Interest Extraction Layer for Instance Segmentation

Leonardo Rossi, Akbar Karimi, Andrea Prati

Responsive image

Auto-TLDR; Generic RoI Extractor for Two-Stage Neural Network for Instance Segmentation

Slides Poster Similar

Given the wide diffusion of deep neural network architectures for computer vision tasks, several new applications are nowadays more and more feasible. Among them, a particular attention has been recently given to instance segmentation, by exploiting the results achievable by two-stage networks (such as Mask R-CNN or Faster R-CNN), derived from R-CNN. In these complex architectures, a crucial role is played by the Region of Interest (RoI) extraction layer, devoted to extract a coherent subset of features from a single Feature Pyramid Network (FPN) layer attached on top of a backbone. This paper is motivated by the need to overcome to the limitations of existing RoI extractors which select only one (the best) layer from FPN. Our intuition is that all the layers of FPN retain useful information. Therefore, the proposed layer (called Generic RoI Extractor - GRoIE) introduces non-local building blocks and attention mechanisms to boost the performance. A comprehensive ablation study at component level is conducted to find the best set of algorithms and parameters for the GRoIE layer. Moreover, GRoIE can be integrated seamlessly with every two-stage architecture for both object detection and instance segmentation tasks. Therefore, the improvements brought by the use of GRoIE in different state-of-the-art architectures are also evaluated. The proposed layer leads up to gain a 1.1% AP on bounding box detection and 1.7% AP on instance segmentation. The code is publicly available on GitHub repository at https://github.com/IMPLabUniPr/mmdetection-groie

A Fine-Grained Dataset and Its Efficient Semantic Segmentation for Unstructured Driving Scenarios

Kai Andreas Metzger, Peter Mortimer, Hans J "Joe" Wuensche

Responsive image

Auto-TLDR; TAS500: A Semantic Segmentation Dataset for Autonomous Driving in Unstructured Environments

Slides Poster Similar

Research in autonomous driving for unstructured environments suffers from a lack of semantically labeled datasets compared to its urban counterpart. Urban and unstructured outdoor environments are challenging due to the varying lighting and weather conditions during a day and across seasons. In this paper, we introduce TAS500, a novel semantic segmentation dataset for autonomous driving in unstructured environments. TAS500 offers fine-grained vegetation and terrain classes to learn drivable surfaces and natural obstacles in outdoor scenes effectively. We evaluate the performance of modern semantic segmentation models with an additional focus on their efficiency. Our experiments demonstrate the advantages of fine-grained semantic classes to improve the overall prediction accuracy, especially along the class boundaries. The dataset, code, and pretrained model are available online.

Confidence Calibration for Deep Renal Biopsy Immunofluorescence Image Classification

Federico Pollastri, Juan Maroñas, Federico Bolelli, Giulia Ligabue, Roberto Paredes, Riccardo Magistroni, Costantino Grana

Responsive image

Auto-TLDR; A Probabilistic Convolutional Neural Network for Immunofluorescence Classification in Renal Biopsy

Slides Poster Similar

With this work we tackle immunofluorescence classification in renal biopsy, employing state-of-the-art Convolutional Neural Networks. In this setting, the aim of the probabilistic model is to assist an expert practitioner towards identifying the location pattern of antibody deposits within a glomerulus. Since modern neural networks often provide overconfident outputs, we stress the importance of having a reliable prediction, demonstrating that Temperature Scaling, a recently introduced re-calibration technique, can be successfully applied to immunofluorescence classification in renal biopsy. Experimental results demonstrate that the designed model yields good accuracy on the specific task, and that Temperature Scaling is able to provide reliable probabilities, which are highly valuable for such a task given the low inter-rater agreement.

Merged 1D-2D Deep Convolutional Neural Networks for Nerve Detection in Ultrasound Images

Mohammad Alkhatib, Adel Hafiane, Pierre Vieyres

Responsive image

Auto-TLDR; A Deep Neural Network for Deep Neural Networks to Detect Median Nerve in Ultrasound-Guided Regional Anesthesia

Slides Poster Similar

Ultrasound-Guided Regional Anesthesia (UGRA) becomes a standard procedure in surgical operations and contributes to pain management. It offers the advantages of the targeted nerve detection and provides the visualization of regions of interest such as anatomical structures. However, nerve detection is one of the most challenging tasks that anesthetists can encounter in the UGRA procedure. A computer-aided system that can detect automatically the nerve region would facilitate the anesthetist's daily routine and allow them to concentrate more on the anesthetic delivery. In this paper, we propose a new method based on merging deep learning models from different data to detect the median nerve. The merged architecture consists of two branches, one being one dimensional (1D) convolutional neural networks (CNN) branch and another 2D CNN branch. The merged architecture aims to learn the high-level features from 1D handcrafted noise-robust features and 2D ultrasound images. The obtained results show the validity and high accuracy of the proposed approach and its robustness.

Early Wildfire Smoke Detection in Videos

Taanya Gupta, Hengyue Liu, Bir Bhanu

Responsive image

Auto-TLDR; Semi-supervised Spatio-Temporal Video Object Segmentation for Automatic Detection of Smoke in Videos during Forest Fire

Poster Similar

Recent advances in unmanned aerial vehicles and camera technology have proven useful for the detection of smoke that emerges above the trees during a forest fire. Automatic detection of smoke in videos is of great interest to Fire department. To date, in most parts of the world, the fire is not detected in its early stage and generally it turns catastrophic. This paper introduces a novel technique that integrates spatial and temporal features in a deep learning framework using semi-supervised spatio-temporal video object segmentation and dense optical flow. However, detecting this smoke in the presence of haze and without the labeled data is difficult. Considering the visibility of haze in the sky, a dark channel pre-processing method is used that reduces the amount of haze in video frames and consequently improves the detection results. Online training is performed on a video at the time of testing that reduces the need for ground-truth data. Tests using the publicly available video datasets show that the proposed algorithms outperform previous work and they are robust across different wildfire-threatened locations.

A General End-To-End Method for Characterizing Neuropsychiatric Disorders Using Free-Viewing Visual Scanning Tasks

Hong Yue Sean Liu, Jonathan Chung, Moshe Eizenman

Responsive image

Auto-TLDR; A general, data-driven, end-to-end framework that extracts relevant features of attentional bias from visual scanning behaviour and uses these features

Slides Poster Similar

The growing availability of eye-gaze tracking technology has allowed for its employment in a wide variety of applications, one of which is the objective diagnosis and monitoring of neuropsychiatric disorders from features of attentional bias extracted from visual scanning patterns. Current techniques in this field are largely comprised of non-generalizable methodologies that rely on domain expertise and study-specific assumptions. In this paper, we present a general, data-driven, end-to-end framework that extracts relevant features of attentional bias from visual scanning behaviour and uses these features to classify between subject groups with standard machine learning techniques. During the free-viewing task, subjects view sets of slides with thematic images while their visual scanning patterns (sets of ordered fixations) are monitored by an eye-tracking system. We encode fixations into relative visual attention maps (RVAMs) to describe measurement errors, and two data-driven methods are proposed to segment regions of interests from RVAMs: 1) using group average RVAMs, and 2) using difference of group average RVAMs. Relative fixation times within regions of interest are calculated and used as input features for a vanilla multilayered perceptron to classify between patient groups. The methods were evaluated on data from an anorexia nervosa (AN) study with 37 subjects and a bipolar/major depressive disorder (BD-MDD) study with 73 subjects. Using leave-one-subject-out cross validation, our technique is able to achieve an area under the receiver operating curve (AUROC) score of 0.935 for the AN study and 0.888 for the BD-MDD study, the latter of which exceeds the performance of the state-of-the-art analysis model designed specifically for the BD-MDD study, which had an AUROC of 0.879. The results validate the proposed methods' efficacy as generalizable, standard baselines for analyzing visual scanning data.

3D Medical Multi-Modal Segmentation Network Guided by Multi-Source Correlation Constraint

Tongxue Zhou, Stéphane Canu, Pierre Vera, Su Ruan

Responsive image

Auto-TLDR; Multi-modality Segmentation with Correlation Constrained Network

Slides Poster Similar

In the field of multimodal segmentation, the correlation between different modalities can be considered for improving the segmentation results. In this paper, we propose a multi-modality segmentation network with a correlation constraint. Our network includes N model-independent encoding paths with N image sources, a correlation constrain block, a feature fusion block, and a decoding path. The model-independent encoding path can capture modality-specific features from the N modalities. Since there exists a strong correlation between different modalities, we first propose a linear correlation block to learn the correlation between modalities, then a loss function is used to guide the network to learn the correlated features based on the correlation representation block. This block forces the network to learn the latent correlated features which are more relevant for segmentation. Considering that not all the features extracted from the encoders are useful for segmentation, we propose to use dual attention based fusion block to recalibrate the features along the modality and spatial paths, which can suppress less informative features and emphasize the useful ones. The fused feature representation is finally projected by the decoder to obtain the segmentation result. Our experiment results tested on BraTS-2018 dataset for brain tumor segmentation demonstrate the effectiveness of our proposed method.

Accurate Cell Segmentation in Digital Pathology Images Via Attention Enforced Networks

Zeyi Yao, Kaiqi Li, Guanhong Zhang, Yiwen Luo, Xiaoguang Zhou, Muyi Sun

Responsive image

Auto-TLDR; AENet: Attention Enforced Network for Automatic Cell Segmentation

Slides Poster Similar

Automatic cell segmentation is an essential step in the pipeline of computer-aided diagnosis (CAD), such as the detection and grading of breast cancer. Accurate segmentation of cells can not only assist the pathologists to make a more precise diagnosis, but also save much time and labor. However, this task suffers from stain variation, cell inhomogeneous intensities, background clutters and cells from different tissues. To address these issues, we propose an Attention Enforced Network (AENet), which is built on spatial attention module and channel attention module, to integrate local features with global dependencies and weight effective channels adaptively. Besides, we introduce a feature fusion branch to bridge high-level and low-level features. Finally, the marker controlled watershed algorithm is applied to post-process the predicted segmentation maps for reducing the fragmented regions. In the test stage, we present an individual color normalization method to deal with the stain variation problem. We evaluate this model on the MoNuSeg dataset. The quantitative comparisons against several prior methods demonstrate the priority of our approach.

RescueNet: Joint Building Segmentation and Damage Assessment from Satellite Imagery

Rohit Gupta, Mubarak Shah

Responsive image

Auto-TLDR; RescueNet: End-to-End Building Segmentation and Damage Classification for Humanitarian Aid and Disaster Response

Slides Poster Similar

Accurate and fine-grained information about the extent of damage to buildings is essential for directing Humanitarian Aid and Disaster Response (HADR) operations in the immediate aftermath of any natural calamity. In recent years, satellite and UAV (drone) imagery has been used for this purpose, sometimes aided by computer vision algorithms. Existing Computer Vision approaches for building damage assessment typically rely on a two stage approach, consisting of building detection using an object detection model, followed by damage assessment through classification of the detected building tiles. These multi-stage methods are not end-to-end trainable, and suffer from poor overall results. We propose RescueNet, a unified model that can simultaneously segment buildings and assess the damage levels to individual buildings and can be trained end-to end. In order to to model the composite nature of this problem, we propose a novel localization aware loss function, which consists of a Binary Cross Entropy loss for building segmentation, and a foreground only selective Categorical Cross-Entropy loss for damage classification, and show significant improvement over the widely used Cross-Entropy loss. RescueNet is tested on the large scale and diverse xBD dataset and achieves significantly better building segmentation and damage classification performance than previous methods and achieves generalization across varied geographical regions and disaster types.

Relevance Detection in Cataract Surgery Videos by Spatio-Temporal Action Localization

Negin Ghamsarian, Mario Taschwer, Doris Putzgruber, Stephanie. Sarny, Klaus Schoeffmann

Responsive image

Auto-TLDR; relevance-based retrieval in cataract surgery videos

Slides Similar

In cataract surgery, the operation is performed with the help of a microscope. Since the microscope enables watching real-time surgery by up to two people only, a major part of surgical training is conducted using the recorded videos. To optimize the training procedure with the video content, the surgeons require an automatic relevance detection approach. In addition to relevance-based retrieval, these results can be further used for skill assessment and irregularity detection in cataract surgery videos. In this paper, a three-module framework is proposed to detect and classify the relevant phase segments in cataract videos. Taking advantage of an idle frame recognition network, the video is divided into idle and action segments. To boost the performance in relevance detection Mask R-CNN is utilized to detect the cornea in each frame where the relevant surgical actions are conducted. The spatio-temporal localized segments containing higher-resolution information about the pupil texture and actions, and complementary temporal information from the same phase are fed into the relevance detection module. This module consists of four parallel recurrent CNNs being responsible to detect four relevant phases that have been defined with medical experts. The results will then be integrated to classify the action phases as irrelevant or one of four relevant phases. Experimental results reveal that the proposed approach outperforms static CNNs and different configurations of feature-based and end-to-end recurrent networks.

Segmentation of Axillary and Supraclavicular Tumoral Lymph Nodes in PET/CT: A Hybrid CNN/Component-Tree Approach

Diana Lucia Farfan Cabrera, Nicolas Gogin, David Morland, Benoît Naegel, Dimitri Papathanassiou, Nicolas Passat

Responsive image

Auto-TLDR; Coupling Convolutional Neural Networks and Component-Trees for Lymph node Segmentation from PET/CT Images

Slides Similar

The analysis of axillary and supraclavicular lymph nodes is a primary prognostic factor for the staging of breast cancer. However, due to the size of lymph nodes and the low resolution of PET data, their segmentation is challenging. We investigate the relevance of considering axillary and supraclavicular lymph node segmentation from PET/CT images by coupling Convolutional Neural Networks (CNNs) and Component-Trees (C-Trees). Building upon the U-Net architecture, we propose a framework that couples a multi-modal U-Net fed with PET and CT, coupled with a hierarchical model obtained from the PET that provides additional high-level region-based features as input channels. Our working hypotheses are twofold. First, we take advantage of both anatomical information from CT for detecting the nodes, and from functional information from PET for detecting the pathological ones. Second, we consider region-based attributes extracted from C-Tree analysis of 3D PET/CT images to improve the CNN segmentation. We carried out experiments on a dataset of 240 pathological lymph nodes from 52 patients scans, and compared our outputs with human expert-defined ground-truth, leading to promising results.

An Integrated Approach of Deep Learning and Symbolic Analysis for Digital PDF Table Extraction

Mengshi Zhang, Daniel Perelman, Vu Le, Sumit Gulwani

Responsive image

Auto-TLDR; Deep Learning and Symbolic Reasoning for Unstructured PDF Table Extraction

Slides Poster Similar

Deep learning has shown great success at interpreting unstructured data such as object recognition in images. Symbolic/logical-reasoning techniques have shown great success in interpreting structured data such as table extraction in webpages, custom text files, spreadsheets. The tables in PDF documents are often generated from such structured sources (text-based Word/Latex documents, spreadsheets, webpages) but end up being unstructured. We thus explore novel combinations of deep learning and symbolic reasoning techniques to build an effective solution for PDF table extraction. We evaluate effectiveness without granting partial credit for matching part of a table (which may cause silent errors in downstream data processing). Our method achieves a 0.725 F1 score (vs. 0.339 for the state-of-the-art) on detecting correct table bounds---a much stricter metric than the common one of detecting characters within tables---in a well known public benchmark (ICDAR 2013) and a 0.404 F1 score (vs. 0.144 for the state-of-the-art) on our private benchmark with more widely varied table structures.