Improving Model Accuracy for Imbalanced Image Classification Tasks by Adding a Final Batch Normalization Layer: An Empirical Study

Veysel Kocaman, Ofer M. Shir, Thomas Baeck

Responsive image

Auto-TLDR; Exploiting Batch Normalization before the Output Layer in Deep Learning for Minority Class Detection in Imbalanced Data Sets

Slides Poster

Some real-world domains, such as Agriculture and Healthcare, comprise early-stage disease indications whose recording constitutes a rare event, and yet, whose precise detection at that stage is critical. In this type of highly imbalanced classification problems, which encompass complex features, deep learning (DL) is much needed because of its strong detection capabilities. At the same time, DL is observed in practice to favor majority over minority classes and consequently suffer from inaccurate detection of the targeted early-stage indications. To simulate such scenarios, we artificially generate skewness (99% vs. 1%) for certain plant types out of the PlantVillage dataset as a basis for classification of scarce visual cues through transfer learning. By randomly and unevenly picking healthy and unhealthy samples from certain plant types to form a training set, we consider a base experiment as fine-tuning ResNet34 and VGG19 architectures and then testing the model performance on a balanced dataset of healthy and unhealthy images. We empirically observe that the initial F1 test score jumps from 0.29 to 0.95 for the minority class upon adding a final Batch Normalization (BN) layer just before the output layer in VGG19. We demonstrate that utilizing an additional BN layer before the output layer in modern CNN architectures has a considerable impact in terms of minimizing the training time and testing error for minority classes in highly imbalanced data sets. Moreover, when the final BN is employed, trying to minimize validation and training losses may not be an optimal way for getting a high F1 test score for minority classes in anomaly detection problems. That is, the network might perform better even if it is not ‘confident’ enough while making a prediction; leading to another discussion about why softmax output is not a good uncertainty measure for DL models.

Similar papers

Generalization Comparison of Deep Neural Networks Via Output Sensitivity

Mahsa Forouzesh, Farnood Salehi, Patrick Thiran

Responsive image

Auto-TLDR; Generalization of Deep Neural Networks using Sensitivity

Slides Similar

Although recent works have brought some insights into the performance improvement of techniques used in state-of-the-art deep-learning models, more work is needed to understand their generalization properties. We shed light on this matter by linking the loss function to the output's sensitivity to its input. We find a rather strong empirical relation between the output sensitivity and the variance in the bias-variance decomposition of the loss function, which hints on using sensitivity as a metric for comparing the generalization performance of networks, without requiring labeled data. We find that sensitivity is decreased by applying popular methods which improve the generalization performance of the model, such as (1) using a deep network rather than a wide one, (2) adding convolutional layers to baseline classifiers instead of adding fully-connected layers, (3) using batch normalization, dropout and max-pooling, and (4) applying parameter initialization techniques.

Confidence Calibration for Deep Renal Biopsy Immunofluorescence Image Classification

Federico Pollastri, Juan Maroñas, Federico Bolelli, Giulia Ligabue, Roberto Paredes, Riccardo Magistroni, Costantino Grana

Responsive image

Auto-TLDR; A Probabilistic Convolutional Neural Network for Immunofluorescence Classification in Renal Biopsy

Slides Poster Similar

With this work we tackle immunofluorescence classification in renal biopsy, employing state-of-the-art Convolutional Neural Networks. In this setting, the aim of the probabilistic model is to assist an expert practitioner towards identifying the location pattern of antibody deposits within a glomerulus. Since modern neural networks often provide overconfident outputs, we stress the importance of having a reliable prediction, demonstrating that Temperature Scaling, a recently introduced re-calibration technique, can be successfully applied to immunofluorescence classification in renal biopsy. Experimental results demonstrate that the designed model yields good accuracy on the specific task, and that Temperature Scaling is able to provide reliable probabilities, which are highly valuable for such a task given the low inter-rater agreement.

Improving Batch Normalization with Skewness Reduction for Deep Neural Networks

Pak Lun Kevin Ding, Martin Sarah, Baoxin Li

Responsive image

Auto-TLDR; Batch Normalization with Skewness Reduction

Slides Poster Similar

Batch Normalization (BN) is a well-known technique used in training deep neural networks. The main idea behind batch normalization is to normalize the features of the layers ($i.e.$, transforming them to have a mean equal to zero and a variance equal to one). Such a procedure encourages the optimization landscape of the loss function to be smoother, and improve the learning of the networks for both speed and performance. In this paper, we demonstrate that the performance of the network can be improved, if the distributions of the features of the output in the same layer are similar. As normalizing based on mean and variance does not necessarily make the features to have the same distribution, we propose a new normalization scheme: Batch Normalization with Skewness Reduction (BNSR). Comparing with other normalization approaches, BNSR transforms not just only the mean and variance, but also the skewness of the data. By tackling this property of a distribution, we are able to make the output distributions of the layers to be further similar. The nonlinearity of BNSR may further improve the expressiveness of the underlying network. Comparisons with other normalization schemes are tested on the CIFAR-100 and ImageNet datasets. Experimental results show that the proposed approach can outperform other state-of-the-arts that are not equipped with BNSR.

Probability Guided Maxout

Claudio Ferrari, Stefano Berretti, Alberto Del Bimbo

Responsive image

Auto-TLDR; Probability Guided Maxout for CNN Training

Slides Poster Similar

In this paper, we propose an original CNN training strategy that brings together ideas from both dropout-like regularization methods and solutions that learn discriminative features. We propose a dropping criterion that, differently from dropout and its variants, is deterministic rather than random. It grounds on the empirical evidence that feature descriptors with larger $L2$-norm and highly-active nodes are strongly correlated to confident class predictions. Thus, our criterion guides towards dropping a percentage of the most active nodes of the descriptors, proportionally to the estimated class probability. We simultaneously train a per-sample scaling factor to balance the expected output across training and inference. This further allows us to keep high the descriptor's L2-norm, which we show enforces confident predictions. The combination of these two strategies resulted in our ``Probability Guided Maxout'' solution that acts as a training regularizer. We prove the above behaviors by reporting extensive image classification results on the CIFAR10, CIFAR100, and Caltech256 datasets.

Fine-Tuning Convolutional Neural Networks: A Comprehensive Guide and Benchmark Analysis for Glaucoma Screening

Amed Mvoulana, Rostom Kachouri, Mohamed Akil

Responsive image

Auto-TLDR; Fine-tuning Convolutional Neural Networks for Glaucoma Screening

Slides Poster Similar

This work aimed at giving a comprehensive and in-detailed guide on the route to fine-tuning Convolutional Neural Networks (CNNs) for glaucoma screening. Transfer learning consists in a promising alternative to train CNNs from stratch, to avoid the huge data and resources requirements. After a thorough study of five state-of-the-art CNNs architectures, a complete and well-explained strategy for fine-tuning these networks is proposed, using hyperparameter grid-searching and two-phase training approach. Excellent performance is reached on model evaluation, with a 0.9772 AUROC validation rate, giving arise to reliable glaucoma diagosis-help systems. Also, a benchmark analysis is conducted across all fine-tuned models, studying them according to performance indices such as model complexity and size, AUROC density and inference time. This in-depth analysis allows a rigorous comparison between model characteristics, and is useful for giving practioners important trademarks for prospective applications and deployments.

WeightAlign: Normalizing Activations by Weight Alignment

Xiangwei Shi, Yunqiang Li, Xin Liu, Jan Van Gemert

Responsive image

Auto-TLDR; WeightAlign: Normalization of Activations without Sample Statistics

Slides Poster Similar

Batch normalization (BN) allows training very deep networks by normalizing activations by mini-batch sample statistics which renders BN unstable for small batch sizes. Current small-batch solutions such as Instance Norm, Layer Norm, and Group Norm use channel statistics which can be computed even for a single sample. Such methods are less stable than BN as they critically depend on the statistics of a single input sample. To address this problem, we propose a normalization of activation without sample statistics. We present WeightAlign: a method that normalizes the weights by the mean and scaled standard derivation computed within a filter, which normalizes activations without computing any sample statistics. Our proposed method is independent of batch size and stable over a wide range of batch sizes. Because weight statistics are orthogonal to sample statistics, we can directly combine WeightAlign with any method for activation normalization. We experimentally demonstrate these benefits for classification on CIFAR-10, CIFAR-100, ImageNet, for semantic segmentation on PASCAL VOC 2012 and for domain adaptation on Office-31.

Towards Tackling Multi-Label Imbalances in Remote Sensing Imagery

Dominik Koßmann, Thorsten Wilhelm, Gernot Fink

Responsive image

Auto-TLDR; Class imbalance in land cover datasets using attribute encoding schemes

Slides Poster Similar

Recent advances in automated image analysis have lead to an increased number of proposed datasets in remote sensing applications. This permits the successful employment of data hungry state-of-the-art deep neural networks. However, the Earth is not covered equally by semantically meaningful classes. Thus, many land cover datasets suffer from a severe class imbalance. We show that by taking appropriate measures, the performance in the minority classes can be improved by up to 30 percent without affecting the performance in the majority classes strongly. Additionally, we investigate the use of an attribute encoding scheme to represent the inherent class hierarchies commonly observed in land cover analysis.

Uncertainty-Sensitive Activity Recognition: A Reliability Benchmark and the CARING Models

Alina Roitberg, Monica Haurilet, Manuel Martinez, Rainer Stiefelhagen

Responsive image

Auto-TLDR; CARING: Calibrated Action Recognition with Input Guidance

Slides Similar

Beyond assigning the correct class, an activity recognition model should also to be able to determine, how certain it is in its predictions. We present the first study of how well the confidence values of modern action recognition architectures indeed reflect the probability of the correct outcome and propose a learning-based approach for improving it. First, we extend two popular action recognition datasets with a reliability benchmark in form of the expected calibration error and reliability diagrams. Since our evaluation highlights that confidence values of standard action recognition architectures do not represent the uncertainty well, we introduce a new approach which learns to transform the model output into realistic confidence estimates through an additional calibration network. The main idea of our Calibrated Action Recognition with Input Guidance (CARING) model is to learn an optimal scaling parameter depending on the video representation. We compare our model with the native action recognition networks and the temperature scaling approach - a wide spread calibration method utilized in image classification. While temperature scaling alone drastically improves the reliability of the confidence values, our CARING method consistently leads to the best uncertainty estimates in all benchmark settings.

Conditional Multi-Task Learning for Plant Disease Identification

Sue Han Lee, Herve Goëau, Pierre Bonnet, Alexis Joly

Responsive image

Auto-TLDR; A conditional multi-task learning approach for plant disease identification

Slides Poster Similar

Several recent studies have proposed an automatic plant disease identification system based on deep learning. Although successful, these approaches are generally based on learned classification models with target classes of joint host species-disease pairs that may not allow optimal use of the available information. This is due to the fact that they require distinguishing between similar host species or diseases. In fact, these approaches have limited scalability because the size of a network gradually increases as new classes are added, even if information on host species or diseases is already available. This constraint is all the more important as it can be difficult to collect/establish a specific list of all diseases for each host plant species in an actual application. In this paper, we address the problems by proposing a new conditional multi-task learning (CMTL) approach which allows the distribution of host species and disease characteristics learned simultaneously with a conditional link between them. This conditioning is formed in such a way that the knowledge to infer the prediction of one concept (the diseases) depends on the other concept (the host species), which corresponds to the way plant pathologists used to infer the diseases of the host species. We show that our approach can improve the performance of plant disease identification compared to the usual species-disease pair modeling in the previous studies. Meanwhile, we also compose a new dataset on plant disease identification that could serve as an important benchmark in this field.

Filtered Batch Normalization

András Horváth, Jalal Al-Afandi

Responsive image

Auto-TLDR; Batch Normalization with Out-of-Distribution Activations in Deep Neural Networks

Slides Poster Similar

It is a common assumption that the activation of different layers in neural networks follow Gaussian distribution. This distribution can be transformed using normalization techniques, such as batch-normalization, increasing convergence speed and improving accuracy. In this paper we would like to demonstrate, that activations do not necessarily follow Gaussian distribution in all layers. Neurons in deeper layers are more and more specific which can result extremely large, out-of-distribution activations. We will demonstrate that one can create more consistent mean and variance values for batch normalization during training by filtering out these activations which can further improve convergence speed and yield higher validation accuracy.

Norm Loss: An Efficient yet Effective Regularization Method for Deep Neural Networks

Theodoros Georgiou, Sebastian Schmitt, Thomas Baeck, Wei Chen, Michael Lew

Responsive image

Auto-TLDR; Weight Soft-Regularization with Oblique Manifold for Convolutional Neural Network Training

Slides Poster Similar

Convolutional neural network training can suffer from diverse issues like exploding or vanishing gradients, scaling-based weight space symmetry and covariant-shift. In order to address these issues, researchers develop weight regularization methods and activation normalization methods. In this work we propose a weight soft-regularization method based on the Oblique manifold. The proposed method uses a loss function which pushes each weight vector to have a norm close to one, i.e. the weight matrix is smoothly steered toward the so-called Oblique manifold. We evaluate our method on the very popular CIFAR-10, CIFAR-100 and ImageNet 2012 datasets using two state-of-the-art architectures, namely the ResNet and wide-ResNet. Our method introduces negligible computational overhead and the results show that it is competitive to the state-of-the-art and in some cases superior to it. Additionally, the results are less sensitive to hyperparameter settings such as batch size and regularization factor.

Revisiting the Training of Very Deep Neural Networks without Skip Connections

Oyebade Kayode Oyedotun, Abd El Rahman Shabayek, Djamila Aouada, Bjorn Ottersten

Responsive image

Auto-TLDR; Optimization of Very Deep PlainNets without shortcut connections with 'vanishing and exploding units' activations'

Slides Poster Similar

Deep neural networks (DNNs) with many layers of feature representations yield state-of-the-art results on several difficult learning tasks. However, optimizing very deep DNNs without shortcut connections known as PlainNets, is a notoriously hard problem. Considering the growing interest in this area, this paper investigates holistically two scenarios that plague the training of very deep PlainNets: (1) the relatively popular challenge of 'vanishing and exploding units' activations', and (2) the less investigated 'singularity' problem, which is studied in details in this paper. In contrast to earlier works that study only the saturation and explosion of units' activations in isolation, this paper harmonizes the inconspicuous coexistence of the aforementioned problems for very deep PlainNets. Particularly, we argue that the aforementioned problems would have to be tackled simultaneously for the successful training of very deep PlainNets. Finally, different techniques that can be employed for tackling the optimization problem are discussed, and a specific combination of simple techniques that allows the successful training of PlainNets having up to 100 layers is demonstrated.

A Systematic Investigation on Deep Architectures for Automatic Skin Lesions Classification

Pierluigi Carcagni, Marco Leo, Andrea Cuna, Giuseppe Celeste, Cosimo Distante

Responsive image

Auto-TLDR; RegNet: Deep Investigation of Convolutional Neural Networks for Automatic Classification of Skin Lesions

Slides Poster Similar

Computer vision-based techniques are more and more employed in healthcare and medical fields nowadays in order, principally, to be as a support to the experienced medical staff to help them to make a quick and correct diagnosis. One of the hot topics in this arena concerns the automatic classification of skin lesions. Several promising works exist about it, mainly leveraging Convolutional Neural Networks (CNN), but proposed pipeline mainly rely on complex data preprocessing and there is no systematic investigation about how available deep models can actually reach the accuracy needed for real applications. In order to overcome these drawbacks, in this work, an end-to-end pipeline is introduced and some of the most recent Convolutional Neural Networks (CNNs) architectures are included in it and compared on the largest common benchmark dataset recently introduced. To this aim, for the first time in this application context, a new network design paradigm, namely RegNet, has been exploited to get the best models among a population of configurations. The paper introduces a threefold level of contribution and novelty with respect the previous literature: the deep investigation of several CNN architectures driving to a consistent improvement of the lesions recognition accuracy, the exploitation of a new network design paradigm able to study the behavior of populations of models and a deep discussion about pro and cons of each analyzed method paving the path towards new research lines.

A Comparison of Neural Network Approaches for Melanoma Classification

Maria Frasca, Michele Nappi, Michele Risi, Genoveffa Tortora, Alessia Auriemma Citarella

Responsive image

Auto-TLDR; Classification of Melanoma Using Deep Neural Network Methodologies

Slides Poster Similar

Melanoma is the deadliest form of skin cancer and it is diagnosed mainly visually, starting from initial clinical screening and followed by dermoscopic analysis, biopsy and histopathological examination. A dermatologist’s recognition of melanoma may be subject to errors and may take some time to diagnose it. In this regard, deep learning can be useful in the study and classification of skin cancer. In particular, by classifying images with Deep Neural Network methodologies, it is possible to obtain comparable or even superior results compared to those of dermatologists. In this paper, we propose a methodology for the classification of melanoma by adopting different deep learning techniques applied to a common dataset, composed of images from the ISIC dataset and consisting of different types of skin diseases, including melanoma on which we applied a specific pre-processing phase. In particular, a comparison of the results is performed in order to select the best effective neural network to be applied to the problem of recognition and classification of melanoma. Moreover, we also evaluate the impact of the pre- processing phase on the final classification. Different metrics such as accuracy, sensitivity, and specificity have been selected to assess the goodness of the adopted neural networks and compare them also with the manual classification of dermatologists.

Quasibinary Classifier for Images with Zero and Multiple Labels

Liao Shuai, Efstratios Gavves, Changyong Oh, Cees Snoek

Responsive image

Auto-TLDR; Quasibinary Classifiers for Zero-label and Multi-label Classification

Slides Poster Similar

The softmax and binary classifier are commonly preferred for image classification applications. However, as softmax is specifically designed for categorical classification, it assumes each image has just one class label. This limits its applicability for problems where the number of labels does not equal one, most notably zero- and multi-label problems. In these challenging settings, binary classifiers are, in theory, better suited. However, as they ignore the correlation between classes, they are not as accurate and scalable in practice. In this paper, we start from the observation that the only difference between binary and softmax classifiers is their normalization function. Specifically, while the binary classifier self-normalizes its score, the softmax classifier combines the scores from all classes before normalization. On the basis of this observation we introduce a normalization function that is learnable, constant, and shared between classes and data points. By doing so, we arrive at a new type of binary classifier that we coin quasibinary classifier. We show in a variety of image classification settings, and on several datasets, that quasibinary classifiers are considerably better in classification settings where regular binary and softmax classifiers suffer, including zero-label and multi-label classification. What is more, we show that quasibinary classifiers yield well-calibrated probabilities allowing for direct and reliable comparisons, not only between classes but also between data points.

On-Manifold Adversarial Data Augmentation Improves Uncertainty Calibration

Kanil Patel, William Beluch, Dan Zhang, Michael Pfeiffer, Bin Yang

Responsive image

Auto-TLDR; On-Manifold Adversarial Data Augmentation for Uncertainty Estimation

Slides Similar

Uncertainty estimates help to identify ambiguous, novel, or anomalous inputs, but the reliable quantification of uncertainty has proven to be challenging for modern deep networks. To improve uncertainty estimation, we propose On-Manifold Adversarial Data Augmentation or OMADA, which specifically attempts to generate challenging examples by following an on-manifold adversarial attack path in the latent space of an autoencoder that closely approximates the decision boundaries between classes. On a variety of datasets and for multiple network architectures, OMADA consistently yields more accurate and better calibrated classifiers than baseline models, and outperforms competing approaches such as Mixup, as well as achieving similar performance to (at times better than) post-processing calibration methods such as temperature scaling. Variants of OMADA can employ different sampling schemes for ambiguous on-manifold examples based on the entropy of their estimated soft labels, which exhibit specific strengths for generalization, calibration of predicted uncertainty, or detection of out-of-distribution inputs.

BAT Optimized CNN Model Identifies Water Stress in Chickpea Plant Shoot Images

Shiva Azimi, Taranjit Kaur, Tapan Gandhi

Responsive image

Auto-TLDR; BAT Optimized ResNet-18 for Stress Classification of chickpea shoot images under water deficiency

Slides Poster Similar

Stress due to water deficiency in plants can significantly lower the agricultural yield. It can affect many visible plant traits such as size and surface area, the number of leaves and their color, etc. In recent years, computer vision-based plant phenomics has emerged as a promising tool for plant research and management. Such techniques have the advantage of being non-destructive, non-evasive, fast, and offer high levels of automation. Pulses like chickpeas play an important role in ensuring food security in poor countries owing to their high protein and nutrition content. In the present work, we have built a dataset comprising of two varieties of chickpea plant shoot images under different moisture stress conditions. Specifically, we propose a BAT optimized ResNet-18 model for classifying stress induced by water deficiency using chickpea shoot images. BAT algorithm identifies the optimal value of the mini-batch size to be used for training rather than employing the traditional manual approach of trial and error. Experimentation on two crop varieties (JG and Pusa) reveals that BAT optimized approach achieves an accuracy of 96% and 91% for JG and Pusa varieties that is better than the traditional method by 4%. The experimental results are also compared with state of the art CNN models like Alexnet, GoogleNet, and ResNet-50. The comparison results demonstrate that the proposed BAT optimized ResNet-18 model achieves higher performance than the comparison counterparts.

A Close Look at Deep Learning with Small Data

Lorenzo Brigato, Luca Iocchi

Responsive image

Auto-TLDR; Low-Complex Neural Networks for Small Data Conditions

Slides Poster Similar

In this work, we perform a wide variety of experiments with different Deep Learning architectures in small data conditions. We show that model complexity is a critical factor when only a few samples per class are available. Differently from the literature, we improve the state of the art using low complexity models. We show that standard convolutional neural networks with relatively few parameters are effective in this scenario. In many of our experiments, low complexity models outperform state-of-the-art architectures. Moreover, we propose a novel network that uses an unsupervised loss to regularize its training. Such architecture either improves the results either performs comparably well to low capacity networks. Surprisingly, experiments show that the dynamic data augmentation pipeline is not beneficial in this particular domain. Statically augmenting the dataset might be a promising research direction while dropout maintains its role as a good regularizer.

Knowledge Distillation Beyond Model Compression

Fahad Sarfraz, Elahe Arani, Bahram Zonooz

Responsive image

Auto-TLDR; Knowledge Distillation from Teacher to Student

Slides Poster Similar

Knowledge distillation (KD) is commonly deemed as an effective model compression technique in which a compact model (student) is trained under the supervision of a larger pretrained model or an ensemble of models (teacher). Various techniques have been proposed since the original formulation, which mimics different aspects of the teacher such as the representation space, decision boundary or intra-data relationship. Some methods replace the one way knowledge distillation from a static teacher with collaborative learning between a cohort of students. Despite the recent advances, a clear understanding of where knowledge resides in a deep neural network and optimal method for capturing knowledge from teacher and transferring it to student still remains an open question. In this study we provide an extensive study on 9 different knowledge distillation methods which covers a broad spectrum of approaches to capture and transfer knowledge. We demonstrate the versatility of the KD framework on different datasets and network architectures under varying capacity gaps between the teacher and student. The study provides intuition for the effects of mimicking different aspects of the teacher and derives insights from the performance of the different distillation approaches to guide the the design of more effective KD methods . Furthermore, our study shows the effectiveness of the KD framework in learning efficiently under varying severity levels of label noise and class imbalance, consistently providing significant generalization gains over standard training. We emphasize that the efficacy of KD goes much beyond a model compression technique and should be considered as a general purpose training paradigm which offers more robustness to common challenges in the real-world datasets compared to the standard training procedure.

Stochastic Label Refinery: Toward Better Target Label Distribution

Xi Fang, Jiancheng Yang, Bingbing Ni

Responsive image

Auto-TLDR; Stochastic Label Refinery for Deep Supervised Learning

Slides Poster Similar

This paper proposes a simple yet effective strategy for improving deep supervised learning, named Stochastic Label Refinery (SLR), by refining training labels to more informative labels. When training a neural network, target distributions (or ground-truth) are typically "hard", which means the target label of each category consists of only 0 and 1. However, the fixed "hard" target distributions do not capture association between categories or that between objects. In this study, instead of using the hard target distributions, we iteratively generate "soft" target label distributions for training the neural networks, which leads to better performances. The soft target distributions are obtained via an Expectation-Maximization (EM) iteration, where the "true" target distributions and the learned models are regarded as hidden variables. In E step, the models are optimized to approximate the target distributions on stochastic splits of training data; In M step, the target distributions are updated with predicted pseudo-label on leave-out splits. Extensive experiments on classification and ordinal regression tasks, empirically prove that the refined target distribution consistently leads to considerable performance improvements even applied on competitive baselines. Notably, in DeepDR 2020 Diabetic Retinopathy Grading (DeepDRiD) challenge, our method improves the quadratic weighted kappa on official validation set from 0.8247 to 0.8348 and achieves a state-of-the-art score on online test set. The proposed SLR technique is easy to implement and practically applicable. The code will be open sourced soon.

Bridging the Gap between Natural and Medical Images through Deep Colorization

Lia Morra, Luca Piano, Fabrizio Lamberti, Tatiana Tommasi

Responsive image

Auto-TLDR; Transfer Learning for Diagnosis on X-ray Images Using Color Adaptation

Slides Poster Similar

Deep learning has thrived by training on large-scale datasets. However, in many applications, as for medical image diagnosis, getting massive amount of data is still prohibitive due to privacy, lack of acquisition homogeneity and annotation cost. In this scenario transfer learning from natural image collections is a standard practice that attempts to tackle shape, texture and color discrepancy all at once through pretrained model fine-tuning. In this work we propose to disentangle those challenges and design a dedicated network module that focuses on color adaptation. We combine learning from scratch of the color module with transfer learning of different classification backbones obtaining an end-to-end, easy-to-train architecture for diagnostic image recognition on X-ray images. Extensive experiments show how our approach is particularly efficient in case of data scarcity and provides a new path for further transferring the learned color information across multiple medical datasets.

Trainable Spectrally Initializable Matrix Transformations in Convolutional Neural Networks

Michele Alberti, Angela Botros, Schuetz Narayan, Rolf Ingold, Marcus Liwicki, Mathias Seuret

Responsive image

Auto-TLDR; Trainable and Spectrally Initializable Matrix Transformations for Neural Networks

Slides Poster Similar

In this work, we introduce a new architectural component to Neural Networks (NN), i.e., trainable and spectrally initializable matrix transformations on feature maps. While previous literature has already demonstrated the possibility of adding static spectral transformations as feature processors, our focus is on more general trainable transforms. We study the transforms in various architectural configurations on four datasets of different nature: from medical (ColorectalHist, HAM10000) and natural (Flowers) images to historical documents (CB55). With rigorous experiments that control for the number of parameters and randomness, we show that networks utilizing the introduced matrix transformations outperform vanilla neural networks. The observed accuracy increases appreciably across all datasets. In addition, we show that the benefit of spectral initialization leads to significantly faster convergence, as opposed to randomly initialized matrix transformations. The transformations are implemented as auto-differentiable PyTorch modules that can be incorporated into any neural network architecture. The entire code base is open-source.

Investigating and Exploiting Image Resolution for Transfer Learning-Based Skin Lesion Classification

Amirreza Mahbod, Gerald Schaefer, Chunliang Wang, Rupert Ecker, Georg Dorffner, Isabella Ellinger

Responsive image

Auto-TLDR; Fine-tuned Neural Networks for Skin Lesion Classification Using Dermoscopic Images

Slides Poster Similar

Skin cancer is among the most common cancer types. Dermoscopic image analysis improves the diagnostic accuracy for detection of malignant melanoma and other pigmented skin lesions when compared to unaided visual inspection. Hence, computer-based methods to support medical experts in the diagnostic procedure are of great interest. Fine-tuning pre-trained convolutional neural networks (CNNs) has been shown to work well for skin lesion classification. Pre-trained CNNs are usually trained with natural images of a fixed image size which is typically significantly smaller than captured skin lesion images and consequently dermoscopic images are downsampled for fine-tuning. However, useful medical information may be lost during this transformation. In this paper, we explore the effect of input image size on skin lesion classification performance of fine-tuned CNNs. For this, we resize dermoscopic images to different resolutions, ranging from 64x64 to 768x768 pixels and investigate the resulting classification performance of three well-established CNNs, namely DenseNet-121, ResNet-18, and ResNet-50. Our results show that using very small images (of size 64x64 pixels) degrades the classification performance, while images of size 128x128 pixels and above support good performance with larger image sizes leading to slightly improved classification. We further propose a novel fusion approach based on a three-level ensemble strategy that exploits multiple fine-tuned networks trained with dermoscopic images at various sizes. When applied on the ISIC 2017 skin lesion classification challenge, our fusion approach yields an area under the receiver operating characteristic curve of 89.2% and 96.6% for melanoma classification and seborrheic keratosis classification, respectively, outperforming state-of-the-art algorithms.

CQNN: Convolutional Quadratic Neural Networks

Pranav Mantini, Shishir Shah

Responsive image

Auto-TLDR; Quadratic Neural Network for Image Classification

Slides Poster Similar

Image classification is a fundamental task in computer vision. A variety of deep learning models based on the Convolutional Neural Network (CNN) architecture have proven to be an efficient solution. Numerous improvements have been proposed over the years, where broader, deeper, and denser networks have been constructed. However, the atomic operation for these models has remained a linear unit (single neuron). In this work, we pursue an alternative dimension by hypothesizing the atomic operation to be performed by a quadratic unit. We construct convolutional layers using quadratic neurons for feature extraction and subsequently use dense layers for classification. We perform analysis to quantify the implication of replacing linear neurons with quadratic units. Results show a keen improvement in classification accuracy with quadratic neurons over linear neurons.

MaxDropout: Deep Neural Network Regularization Based on Maximum Output Values

Claudio Filipi Gonçalves Santos, Danilo Colombo, Mateus Roder, Joao Paulo Papa

Responsive image

Auto-TLDR; MaxDropout: A Regularizer for Deep Neural Networks

Slides Poster Similar

Different techniques have emerged in the deep learning scenario, such as Convolutional Neural Networks, Deep Belief Networks, and Long Short-Term Memory Networks, to cite a few. In lockstep, regularization methods, which aim to prevent overfitting by penalizing the weight connections, or turning off some units, have been widely studied either. In this paper, we present a novel approach called MaxDropout, a regularizer for deep neural network models that works in a supervised fashion by removing (shutting off) the prominent neurons (i.e., most active) in each hidden layer. The model forces fewer activated units to learn more representative information, thus providing sparsity. Regarding the experiments, we show that it is possible to improve existing neural networks and provide better results in neural networks when Dropout is replaced by MaxDropout. The proposed method was evaluated in image classification, achieving comparable results to existing regularizers, such as Cutout and RandomErasing, also improving the accuracy of neural networks that uses Dropout by replacing the existing layer by MaxDropout.

Minority Class Oriented Active Learning for Imbalanced Datasets

Umang Aggarwal, Adrian Popescu, Celine Hudelot

Responsive image

Auto-TLDR; Active Learning for Imbalanced Datasets

Slides Poster Similar

Active learning aims to optimize the dataset annotation process when resources are constrained. Most existing methods are designed for balanced datasets. Their practical applicability is limited by the fact that a majority of real-life datasets are actually imbalanced. Here, we introduce a new active learning method which is designed for imbalanced datasets. It favors samples likely to be in minority classes so as to reduce the imbalance of the labeled subset and create a better representation for these classes. We also compare two training schemes for active learning: (1) the one commonly deployed in deep active learning using model fine tuning for each iteration and (2) a scheme which is inspired by transfer learning and exploits generic pre-trained models and train shallow classifiers for each iteration. Evaluation is run with three imbalanced datasets. Results show that the proposed active learning method outperforms competitive baselines. Equally interesting, they also indicate that the transfer learning training scheme outperforms model fine tuning if features are transferable from the generic dataset to the unlabeled one. This last result is surprising and should encourage the community to explore the design of deep active learning methods.

Verifying the Causes of Adversarial Examples

Honglin Li, Yifei Fan, Frieder Ganz, Tony Yezzi, Payam Barnaghi

Responsive image

Auto-TLDR; Exploring the Causes of Adversarial Examples in Neural Networks

Slides Poster Similar

The robustness of neural networks is challenged by adversarial examples that contain almost imperceptible perturbations to inputs which mislead a classifier to incorrect outputs in high confidence. Limited by the extreme difficulty in examining a high-dimensional image space thoroughly, research on explaining and justifying the causes of adversarial examples falls behind studies on attacks and defenses. In this paper, we present a collection of potential causes of adversarial examples and verify (or partially verify) them through carefully-designed controlled experiments. The major causes of adversarial examples include model linearity, one-sum constraint, and geometry of the categories. To control the effect of those causes, multiple techniques are applied such as $L_2$ normalization, replacement of loss functions, construction of reference datasets, and novel models using multi-layer perceptron probabilistic neural networks (MLP-PNN) and density estimation (DE). Our experiment results show that geometric factors tend to be more direct causes and statistical factors magnify the phenomenon, especially for assigning high prediction confidence. We hope this paper will inspire more studies to rigorously investigate the root causes of adversarial examples, which in turn provide useful guidance on designing more robust models.

The Color Out of Space: Learning Self-Supervised Representations for Earth Observation Imagery

Stefano Vincenzi, Angelo Porrello, Pietro Buzzega, Marco Cipriano, Pietro Fronte, Roberto Cuccu, Carla Ippoliti, Annamaria Conte, Simone Calderara

Responsive image

Auto-TLDR; Satellite Image Representation Learning for Remote Sensing

Slides Poster Similar

The recent growth in the number of satellite images fosters the development of effective deep-learning techniques for Remote Sensing (RS). However, their full potential is untapped due to the lack of large annotated datasets. Such a problem is usually countered by fine-tuning a feature extractor that is previously trained on the ImageNet dataset. Unfortunately, the domain of natural images differs from the RS one, which hinders the final performance. In this work, we propose to learn meaningful representations from satellite imagery, leveraging its high-dimensionality spectral bands to reconstruct the visible colors. We conduct experiments on land cover classification (BigEarthNet) and West Nile Virus detection, showing that colorization is a solid pretext task for training a feature extractor. Furthermore, we qualitatively observe that guesses based on natural images and colorization rely on different parts of the input. This paves the way to an ensemble model that eventually outperforms both the above-mentioned techniques.

Is the Meta-Learning Idea Able to Improve the Generalization of Deep Neural Networks on the Standard Supervised Learning?

Xiang Deng, Zhongfei Zhang

Responsive image

Auto-TLDR; Meta-learning Based Training of Deep Neural Networks for Few-Shot Learning

Slides Poster Similar

Substantial efforts have been made on improving the generalization abilities of deep neural networks (DNNs) in order to obtain better performances without introducing more parameters. On the other hand, meta-learning approaches exhibit powerful generalization on new tasks in few-shot learning. Intuitively, few-shot learning is more challenging than the standard supervised learning as each target class only has a very few or no training samples. The natural question that arises is whether the meta-learning idea can be used for improving the generalization of DNNs on the standard supervised learning. In this paper, we propose a novel meta-learning based training procedure (MLTP) for DNNs and demonstrate that the meta-learning idea can indeed improve the generalization abilities of DNNs. MLTP simulates the meta-training process by considering a batch of training samples as a task. The key idea is that the gradient descent step for improving the current task performance should also improve a new task performance, which is ignored by the current standard procedure for training neural networks. MLTP also benefits from all the existing training techniques such as dropout, weight decay, and batch normalization. We evaluate MLTP by training a variety of small and large neural networks on three benchmark datasets, i.e., CIFAR-10, CIFAR-100, and Tiny ImageNet. The experimental results show a consistently improved generalization performance on all the DNNs with different sizes, which verifies the promise of MLTP and demonstrates that the meta-learning idea is indeed able to improve the generalization of DNNs on the standard supervised learning.

Dealing with Scarce Labelled Data: Semi-Supervised Deep Learning with Mix Match for Covid-19 Detection Using Chest X-Ray Images

Saúl Calderón Ramirez, Raghvendra Giri, Shengxiang Yang, Armaghan Moemeni, Mario Umaña, David Elizondo, Jordina Torrents-Barrena, Miguel A. Molina-Cabello

Responsive image

Auto-TLDR; Semi-supervised Deep Learning for Covid-19 Detection using Chest X-rays

Slides Poster Similar

Coronavirus (Covid-19) is spreading fast, infecting people through contact in various forms including droplets from sneezing and coughing. Therefore, the detection of infected subjects in an early, quick and cheap manner is urgent. Currently available tests are scarce and limited to people in danger of serious illness. The application of deep learning to chest X-ray images for Covid-19 detection is an attractive approach. However, this technology usually relies on the availability of large labelled datasets, a requirement hard to meet in the context of a virus outbreak. To overcome this challenge, a semi-supervised deep learning model using both labelled and unlabelled data is proposed. We developed and tested a semi-supervised deep learning framework based on the Mix Match architecture to classify chest X-rays into Covid-19, pneumonia and healthy cases. The presented approach was calibrated using two publicly available datasets. The results show an accuracy increase of around $15\%$ under low labelled / unlabelled data ratio. This indicates that our semi-supervised framework can help improve performance levels towards Covid-19 detection when the amount of high-quality labelled data is scarce. Also, we introduce a semi-supervised deep learning boost coefficient which is meant to ease the scalability of our approach and performance comparison.

Enhancing Semantic Segmentation of Aerial Images with Inhibitory Neurons

Ihsan Ullah, Sean Reilly, Michael Madden

Responsive image

Auto-TLDR; Lateral Inhibition in Deep Neural Networks for Object Recognition and Semantic Segmentation

Slides Poster Similar

In a Convolutional Neural Network, each neuron in the output feature map takes input from the neurons in its receptive field. This receptive field concept plays a vital role in today's deep neural networks. However, inspired by neuro-biological research, it has been proposed to add inhibitory neurons outside the receptive field, which may enhance the performance of neural network models. In this paper, we begin with deep network architectures such as VGG and ResNet, and propose an approach to add lateral inhibition in each output neuron to reduce its impact on its neighbours, both in fine-tuning pre-trained models and training from scratch. Our experiments show that notable improvements upon prior baseline deep models can be achieved. A key feature of our approach is that it is easy to add to baseline models; it can be adopted in any model containing convolution layers, and we demonstrate its value in applications including object recognition and semantic segmentation of aerial images, where we show state-of-the-art result on the Aeroscape dataset. On semantic segmentation tasks, our enhancement shows 17.43% higher mIoU than a single baseline model on a single source (the Aeroscape dataset), 13.43% higher performance than an ensemble model on the same single source, and 7.03% higher than an ensemble model on multiple sources (segmentation datasets). Our experiments illustrate the potential impact of using inhibitory neurons in deep learning models, and they also show better results than the baseline models that have standard convolutional layer.

ARCADe: A Rapid Continual Anomaly Detector

Ahmed Frikha, Denis Krompass, Volker Tresp

Responsive image

Auto-TLDR; ARCADe: A Meta-Learning Approach for Continuous Anomaly Detection

Slides Poster Similar

Although continual learning and anomaly detection have separately been well-studied in previous works, their intersection remains rather unexplored. The present work addresses a learning scenario where a model has to incrementally learn a sequence of anomaly detection tasks, i.e. tasks from which only examples from the normal (majority) class are available for training. We define this novel learning problem of continual anomaly detection (CAD) and formulate it as a meta-learning problem. Moreover, we propose \emph{A Rapid Continual Anomaly Detector (ARCADe)}, an approach to train neural networks to be robust against the major challenges of this new learning problem, namely catastrophic forgetting and overfitting to the majority class. The results of our experiments on three datasets show that, in the CAD problem setting, ARCADe substantially outperforms baselines from the continual learning and anomaly detection literature. Finally, we provide deeper insights into the learning strategy yielded by the proposed meta-learning algorithm.

Contextual Classification Using Self-Supervised Auxiliary Models for Deep Neural Networks

Sebastian Palacio, Philipp Engler, Jörn Hees, Andreas Dengel

Responsive image

Auto-TLDR; Self-Supervised Autogenous Learning for Deep Neural Networks

Slides Poster Similar

Classification problems solved with deep neural networks (DNNs) typically rely on a closed world paradigm, and optimize over a single objective (e.g., minimization of the cross- entropy loss). This setup dismisses all kinds of supporting signals that can be used to reinforce the existence or absence of particular patterns. The increasing need for models that are interpretable by design makes the inclusion of said contextual signals a crucial necessity. To this end, we introduce the notion of Self-Supervised Autogenous Learning (SSAL). A SSAL objective is realized through one or more additional targets that are derived from the original supervised classification task, following architectural principles found in multi-task learning. SSAL branches impose low-level priors into the optimization process (e.g., grouping). The ability of using SSAL branches during inference, allow models to converge faster, focusing on a richer set of class-relevant features. We equip state-of-the-art DNNs with SSAL objectives and report consistent improvements for all of them on CIFAR100 and Imagenet. We show that SSAL models outperform similar state-of-the-art methods focused on contextual loss functions, auxiliary branches and hierarchical priors.

Rethinking of Deep Models Parameters with Respect to Data Distribution

Shitala Prasad, Dongyun Lin, Yiqun Li, Sheng Dong, Zaw Min Oo

Responsive image

Auto-TLDR; A progressive stepwise training strategy for deep neural networks

Slides Poster Similar

The performance of deep learning models are driven by various parameters but to tune all of them every time, for every dataset, is a heuristic practice. In this paper, unlike the common practice of decaying the learning rate, we propose a step-wise training strategy where the learning rate and the batch size are tuned based on the dataset size. Here, the given dataset size is progressively increased during the training to boost the network performance without saturating the learning curve, after certain epochs. We conducted extensive experiments on multiple networks and datasets to validate the proposed training strategy. The experimental results proves our hypothesis that the learning rate, the batch size and the data size are interrelated and can improve the network accuracy if an optimal progressive stepwise training strategy is applied. The proposed strategy also the overall training computational cost is reduced.

Modeling the Distribution of Normal Data in Pre-Trained Deep Features for Anomaly Detection

Oliver Rippel, Patrick Mertens, Dorit Merhof

Responsive image

Auto-TLDR; Deep Feature Representations for Anomaly Detection in Images

Slides Poster Similar

Anomaly Detection (AD) in images is a fundamental computer vision problem and refers to identifying images and/or image substructures that deviate significantly from the norm. Popular AD algorithms commonly try to learn a model of normality from scratch using task specific datasets, but are limited to semi-supervised approaches employing mostly normal data due to the inaccessibility of anomalies on a large scale combined with the ambiguous nature of anomaly appearance. We follow an alternative approach and demonstrate that deep feature representations learned by discriminative models on large natural image datasets are well suited to describe normality and detect even subtle anomalies. Our model of normality is established by fitting a multivariate Gaussian to deep feature representations of classification networks trained on ImageNet using normal data only in a transfer learning setting. By subsequently applying the Mahalanobis distance as the anomaly score we outperform the current state of the art on the public MVTec AD dataset, achieving an Area Under the Receiver Operating Characteristic curve of 95.8 +- 1.2 % (mean +- SEM) over all 15 classes. We further investigate why the learned representations are discriminative to the AD task using Principal Component Analysis. We find that the principal components containing little variance in normal data are the ones crucial for discriminating between normal and anomalous instances. This gives a possible explanation to the often sub-par performance of AD approaches trained from scratch using normal data only. By selectively fitting a multivariate Gaussian to these most relevant components only, we are able to further reduce model complexity while retaining AD performance. We also investigate setting the working point by selecting acceptable False Positive Rate thresholds based on the multivariate Gaussian assumption.

Meta Soft Label Generation for Noisy Labels

Görkem Algan, Ilkay Ulusoy

Responsive image

Auto-TLDR; MSLG: Meta-Learning for Noisy Label Generation

Slides Poster Similar

The existence of noisy labels in the dataset causes significant performance degradation for deep neural networks (DNNs). To address this problem, we propose a Meta Soft Label Generation algorithm called MSLG, which can jointly generate soft labels using meta-learning techniques and learn DNN parameters in an end-to-end fashion. Our approach adapts the meta-learning paradigm to estimate optimal label distribution by checking gradient directions on both noisy training data and noise-free meta-data. In order to iteratively update soft labels, meta-gradient descent step is performed on estimated labels, which would minimize the loss of noise-free meta samples. In each iteration, the base classifier is trained on estimated meta labels. MSLG is model-agnostic and can be added on top of any existing model at hand with ease. We performed extensive experiments on CIFAR10, Clothing1M and Food101N datasets. Results show that our approach outperforms other state-of-the-art methods by a large margin. Our code is available at \url{https://github.com/gorkemalgan/MSLG_noisy_label}.

Recursive Convolutional Neural Networks for Epigenomics

Aikaterini Symeonidi, Anguelos Nicolaou, Frank Johannes, Vincent Christlein

Responsive image

Auto-TLDR; Recursive Convolutional Neural Networks for Epigenomic Data Analysis

Slides Poster Similar

Deep learning for epigenomic data analysis has demonstrated to be very promising for analysis of genomic and epigenomic data. In this paper we introduce the use of Recursive Convolutional Neural Networks (RCNN) as tool for epigenomic data analysis. We focus on the task of predicting gene expression from the intensity of histone modifications. The proposed RCNN architecture can be applied on data of an arbitrary size and has a single meta-parameter that quantifies the models capacity making it flexible for experimenting. The proposed architecture outperforms state-of-the-art systems while having several orders of magnitude fewer parameters.

Multi-Attribute Learning with Highly Imbalanced Data

Lady Viviana Beltran Beltran, Mickaël Coustaty, Nicholas Journet, Juan C. Caicedo, Antoine Doucet

Responsive image

Auto-TLDR; Data Imbalance in Multi-Attribute Deep Learning Models: Adaptation to face each one of the problems derived from imbalance

Slides Poster Similar

Data is one of the most important keys for success when studying a simple or a complex phenomenon. With the use of deep-learning exploding and its democratization, non-computer science experts may struggle to use highly complex deep learning architectures, even when straightforward models offer them suitable performances. In this article, we study the specific and common problem of data imbalance in real databases as most of the bad performance problems are due to the data itself. We review two points: first, when the data contains different levels of imbalance. Classical imbalanced learning strategies cannot be directly applied when using multi-attribute deep learning models, i.e., multi-task and multi-label architectures. Therefore, one of our contributions is our proposed adaptations to face each one of the problems derived from imbalance. Second, we demonstrate that with little to no imbalance, straightforward deep learning models work well. However, for non-experts, these models can be seen as black boxes, where all the effort is put in pre-processing the data. To simplify the problem, we performed the classification task ignoring information that is costly to extract, such as part localization which is widely used in the state of the art of attribute classification. We make use of a widely known attribute database, CUB-200-2011 - CUB as our main use case due to its deeply imbalanced nature, along with two better structured databases: celebA and Awa2. All of them contain multi-attribute annotations. The results of highly fine-grained attribute learning over CUB demonstrate that in the presence of imbalance, by using our proposed strategies is possible to have competitive results against the state of the art, while taking advantage of multi-attribute deep learning models. We also report results for two better-structured databases over which our models over-perform the state of the art.

On Resource-Efficient Bayesian Network Classifiers and Deep Neural Networks

Wolfgang Roth, Günther Schindler, Holger Fröning, Franz Pernkopf

Responsive image

Auto-TLDR; Quantization-Aware Bayesian Network Classifiers for Small-Scale Scenarios

Slides Poster Similar

We present two methods to reduce the complexity of Bayesian network (BN) classifiers. First, we introduce quantization-aware training using the straight-through gradient estimator to quantize the parameters of BNs to few bits. Second, we extend a recently proposed differentiable tree-augmented naive Bayes (TAN) structure learning approach to also consider the model size. Both methods are motivated by recent developments in the deep learning community, and they provide effective means to trade off between model size and prediction accuracy, which is demonstrated in extensive experiments. Furthermore, we contrast quantized BN classifiers with quantized deep neural networks (DNNs) for small-scale scenarios which have hardly been investigated in the literature. We show Pareto optimal models with respect to model size, number of operations, and test error and find that both model classes are viable options.

Towards Robust Learning with Different Label Noise Distributions

Diego Ortego, Eric Arazo, Paul Albert, Noel E O'Connor, Kevin Mcguinness

Responsive image

Auto-TLDR; Distribution Robust Pseudo-Labeling with Semi-supervised Learning

Slides Similar

Noisy labels are an unavoidable consequence of labeling processes and detecting them is an important step towards preventing performance degradations in Convolutional Neural Networks. Discarding noisy labels avoids a harmful memorization, while the associated image content can still be exploited in a semi-supervised learning (SSL) setup. Clean samples are usually identified using the small loss trick, i.e. they exhibit a low loss. However, we show that different noise distributions make the application of this trick less straightforward and propose to continuously relabel all images to reveal a discriminative loss against multiple distributions. SSL is then applied twice, once to improve the clean-noisy detection and again for training the final model. We design an experimental setup based on ImageNet32/64 for better understanding the consequences of representation learning with differing label noise distributions and find that non-uniform out-of-distribution noise better resembles real-world noise and that in most cases intermediate features are not affected by label noise corruption. Experiments in CIFAR-10/100, ImageNet32/64 and WebVision (real-world noise) demonstrate that the proposed label noise Distribution Robust Pseudo-Labeling (DRPL) approach gives substantial improvements over recent state-of-the-art. Code will be made available.

Data Normalization for Bilinear Structures in High-Frequency Financial Time-Series

Dat Thanh Tran, Juho Kanniainen, Moncef Gabbouj, Alexandros Iosifidis

Responsive image

Auto-TLDR; Bilinear Normalization for Financial Time-Series Analysis and Forecasting

Slides Poster Similar

Financial time-series analysis and forecasting have been extensively studied over the past decades, yet still remain as a very challenging research topic. Since the financial market is inherently noisy and stochastic, a majority of financial time-series of interests are non-stationary, and often obtained from different modalities. This property presents great challenges and can significantly affect the performance of the subsequent analysis/forecasting steps. Recently, the Temporal Attention augmented Bilinear Layer (TABL) has shown great performances in tackling financial forecasting problems. In this paper, by taking into account the nature of bilinear projections in TABL networks, we propose Bilinear Normalization (BiN), a simple, yet efficient normalization layer to be incorporated into TABL networks to tackle potential problems posed by non-stationarity and multimodalities in the input series. Our experiments using a large scale Limit Order Book (LOB) consisting of more than 4 million order events show that BiN-TABL outperforms TABL networks using other state-of-the-arts normalization schemes by a large margin.

P-DIFF: Learning Classifier with Noisy Labels Based on Probability Difference Distributions

Wei Hu, Qihao Zhao, Yangyu Huang, Fan Zhang

Responsive image

Auto-TLDR; P-DIFF: A Simple and Effective Training Paradigm for Deep Neural Network Classifier with Noisy Labels

Slides Poster Similar

Learning deep neural network (DNN) classifier with noisy labels is a challenging task because the DNN can easily over- fit on these noisy labels due to its high capability. In this paper, we present a very simple but effective training paradigm called P-DIFF, which can train DNN classifiers but obviously alleviate the adverse impact of noisy labels. Our proposed probability difference distribution implicitly reflects the probability of a training sample to be clean, then this probability is employed to re-weight the corresponding sample during the training process. P-DIFF can also achieve good performance even without prior- knowledge on the noise rate of training samples. Experiments on benchmark datasets also demonstrate that P-DIFF is superior to the state-of-the-art sample selection methods.

Estimation of Abundance and Distribution of SaltMarsh Plants from Images Using Deep Learning

Jayant Parashar, Suchendra Bhandarkar, Jacob Simon, Brian Hopkinson, Steven Pennings

Responsive image

Auto-TLDR; CNN-based approaches to automated plant identification and localization in salt marsh images

Poster Similar

Recent advances in computer vision and machine learning, most notably deep convolutional neural networks (CNNs), are exploited to identify and localize various plant species in salt marsh images. Three different approaches are explored that provide estimations of abundance and spatial distribution at varying levels of granularity in terms of spatial resolution. In the coarsest-grained approach, CNNs are tasked with identifying which of six plant species are present/absent in large patches within the salt marsh images. CNNs with diverse topological properties and attention mechanisms are shown capable of providing accurate estimations with >90 % precision and recall in the case of the more abundant plant species whereas the performance declines for less common plant species. Estimation of percent cover of each plant species is performed at a finer spatial resolution, where smaller image patches are extracted and the CNNs tasked with identifying the plant species or substrate at the center of the image patch. For the percent cover estimation task, the CNNs are observed to exhibit a performance profile similar to that for the presence/absence estimation task, but with an ~ 5-10% reduction in precision and recall. Finally, fine-grained estimation of the spatial distribution of the various plant species is performed via semantic segmentation. The Deeplab-V3 semantic segmentation architecture is observed to provide very accurate estimations for abundant plant species; however,a significant degradation in performance is observed in the case of less abundant plant species and, in extreme cases, rare plant classes are seen to be ignored entirely. Overall, a clear trade-off is observed between the CNN estimation quality and the spatial resolution of the underlying estimation thereby offering guidance for ecological applications of CNN-based approaches to automated plant identification and localization in salt marsh images.

Can Data Placement Be Effective for Neural Networks Classification Tasks? Introducing the Orthogonal Loss

Brais Cancela, Veronica Bolon-Canedo, Amparo Alonso-Betanzos

Responsive image

Auto-TLDR; Spatial Placement for Neural Network Training Loss Functions

Slides Poster Similar

Traditionally, a Neural Network classification training loss function follows the same principle: minimizing the distance between samples that belong to the same class, while maximizing the distance to the other classes. There are no restrictions on the spatial placement of deep features (last layer input). This paper addresses this issue when dealing with Neural Networks, providing a set of loss functions that are able to train a classifier by forcing the deep features to be projected over a predefined orthogonal basis. Experimental results shows that these `data placement' functions can overcome the training accuracy provided by the classic cross-entropy loss function.

Dual Stream Network with Selective Optimization for Skin Disease Recognition in Consumer Grade Images

Krishnam Gupta, Jaiprasad Rampure, Monu Krishnan, Ajit Narayanan, Nikhil Narayan

Responsive image

Auto-TLDR; A Deep Network Architecture for Skin Disease Localisation and Classification on Consumer Grade Images

Slides Poster Similar

Skin disease localisation and classification on consumer-grade images is more challenging compared to that on dermoscopic imaging. Consumer grade images refer to the images taken using commonly available imaging devices such as a mobile camera or a hand held digital camera. Such images, in addition to having the skin condition of interest in a very small area of the image, has other noisy non-clinical details introduced due to the lighting conditions and the distance of the hand held device from the anatomy at the time of acquisition. We propose a novel deep network architecture \& a new optimization strategy for classification with implicit localisation of skin diseases from clinical/consumer grade images. A weakly supervised segmentation algorithm is first employed to extract Region of Interests (RoI) from the image, the RoI and the original image form the two input streams of the proposed architecture. Each stream of the architecture learns high level and low level features from the original image and the RoI, respectively. The two streams are independently optimised until the loss stops decreasing after which both the streams are optimised collectively with the help of a third combiner sub-network. Such a strategy resulted in a 5% increase of accuracy over the current state-of-the-art methods on SD-198 dataset, which is publicly available. The proposed algorithm is also validated on a new dataset containing over 12,000 images across 75 different skin conditions. We intend to release this dataset as SD-75 to aid in the advancement of research on skin condition classification on consumer grade images.

Supporting Skin Lesion Diagnosis with Content-Based Image Retrieval

Stefano Allegretti, Federico Bolelli, Federico Pollastri, Sabrina Longhitano, Giovanni Pellacani, Costantino Grana

Responsive image

Auto-TLDR; Skin Images Retrieval Using Convolutional Neural Networks for Skin Lesion Classification and Segmentation

Slides Poster Similar

Given the relevance of skin cancer, many attempts have been dedicated to the creation of automated devices that could assist both expert and beginner dermatologists towards fast and early diagnosis of skin lesions. In recent years, tasks such as skin lesion classification and segmentation have been extensively addressed with deep learning algorithms, which in some cases reach a diagnostic accuracy comparable to that of expert physicians. However, the general lack of interpretability and reliability severely hinders the ability of those approaches to actually support dermatologists in the diagnosis process. In this paper a novel skin images retrieval system is presented, which exploits features extracted by Convolutional Neural Networks to gather similar images from a publicly available dataset, in order to assist the diagnosis process of both expert and novice practitioners. In the proposed framework, Resnet-50 is initially trained for the classification of dermoscopic images; then, the feature extraction part is isolated, and an embedding network is build on top of it. The embedding learns an alternative representation, which allows to check image similarity by means of a distance measure. Experimental results reveal that the proposed method is able to select meaningful images, which can effectively boost the classification accuracy of human dermatologists.

IDA-GAN: A Novel Imbalanced Data Augmentation GAN

Hao Yang, Yun Zhou

Responsive image

Auto-TLDR; IDA-GAN: Generative Adversarial Networks for Imbalanced Data Augmentation

Slides Poster Similar

Class imbalance is a widely existed and challenging problem in real-world applications such as disease diagnosis, fraud detection, network intrusion detection and so on. Due to the scarce of data, it could significantly deteriorate the accuracy of classification. To address this challenge, we propose a novel Imbalanced Data Augmentation Generative Adversarial Networks (GAN) named IDA-GAN as an augmentation tool to deal with the imbalanced dataset. This is a great challenge because it is hard to train a GAN model under this situation. We overcome this issue by coupling Variational autoencoder along with GAN training. Specifically, we introduce the Variational autoencoder to learn the majority and minority class distributions in the latent space, and use the generative model to utilize each class distribution for the subsequent GAN training. The generative model learns useful features to generate target minority-class samples. By comparing with the state-of-the-art GAN models, the experimental results demonstrate that our proposed IDA-GAN could generate more diverse minority samples with better qualities, and it consistently benefits the imbalanced classification task in terms of several widely-used evaluation metrics on five benchmark datasets: MNIST, Fashion-MNIST, SVHN, CIFAR-10 and GTRSB.

Multi-Modal Deep Clustering: Unsupervised Partitioning of Images

Guy Shiran, Daphna Weinshall

Responsive image

Auto-TLDR; Multi-Modal Deep Clustering for Unlabeled Images

Slides Poster Similar

The clustering of unlabeled raw images is a daunting task, which has recently been approached with some success by deep learning methods. Here we propose an unsupervised clustering framework, which learns a deep neural network in an end-to-end fashion, providing direct cluster assignments of images without additional processing. Multi-Modal Deep Clustering (MMDC), trains a deep network to align its image embeddings with target points sampled from a Gaussian Mixture Model distribution. The cluster assignments are then determined by mixture component association of image embeddings. Simultaneously, the same deep network is trained to solve an additional self-supervised task. This pushes the network to learn more meaningful image representations and stabilizes the training. Experimental results show that MMDC achieves or exceeds state-of-the-art performance on four challenging benchmarks. On natural image datasets we improve on previous results with significant margins of up to 11% absolute accuracy points, yielding an accuracy of 70% on CIFAR-10 and 61% on STL-10.