Background Invariance by Adversarial Learning

Ricardo Cruz, Ricardo M. Prates, Eduardo F. Simas Filho, Joaquim F. Pinto Costa, Jaime S. Cardoso

Responsive image

Auto-TLDR; Improving Convolutional Neural Networks for Overhead Power Line Insulators Detection using a Drone

Slides Poster

Convolutional neural networks are shown to be vulnerable to changes in the background. The proposed method is an end-to-end method that augments the training set by introducing new backgrounds during the training process. These backgrounds are created by a generative network that is trained as an adversary to the model. A case study is explored based on overhead power line insulators detection using a drone – a training set is prepared from photographs taken inside a laboratory and then evaluated using photographs that are harder to collect from outside the laboratory. The proposed method improves performance by over 20% for this case study.

Similar papers

Phase Retrieval Using Conditional Generative Adversarial Networks

Tobias Uelwer, Alexander Oberstraß, Stefan Harmeling

Responsive image

Auto-TLDR; Conditional Generative Adversarial Networks for Phase Retrieval

Slides Poster Similar

In this paper, we propose the application of conditional generative adversarial networks to solve various phase retrieval problems. We show that including knowledge of the measurement process at training time leads to an optimization at test time that is more robust to initialization than existing approaches involving generative models. In addition, conditioning the generator network on the measurements enables us to achieve much more detailed results. We empirically demonstrate that these advantages provide meaningful solutions to the Fourier and the compressive phase retrieval problem and that our method outperforms well-established projection-based methods as well as existing methods that are based on neural networks. Like other deep learning methods, our approach is very robust to noise and can therefore be very useful for real-world applications.

A GAN-Based Blind Inpainting Method for Masonry Wall Images

Yahya Ibrahim, Balázs Nagy, Csaba Benedek

Responsive image

Auto-TLDR; An End-to-End Blind Inpainting Algorithm for Masonry Wall Images

Slides Poster Similar

In this paper we introduce a novel end-to-end blind inpainting algorithm for masonry wall images, performing the automatic detection and virtual completion of occluded or damaged wall regions. For this purpose, we propose a three-stage deep neural network that comprises a U-Net-based sub-network for wall segmentation into brick, mortar and occluded regions, which is followed by a two-stage adversarial inpainting model. The first adversarial network predicts the schematic mortar-brick pattern of the occluded areas based on the observed wall structure, providing in itself valuable structural information for archeological and architectural applications. Finally, the second adversarial network predicts the RGB pixel values yielding a realistic visual experience for the observer. While the three stages implement a sequential pipeline, they interact through dependencies of their loss functions admitting the consideration of hidden feature dependencies between the different network components. For training and testing the network a new dataset has been created, and an extensive qualitative and quantitative evaluation versus the state-of-the-art is given.

Automatic Semantic Segmentation of Structural Elements related to the Spinal Cord in the Lumbar Region by Using Convolutional Neural Networks

Jhon Jairo Sáenz Gamboa, Maria De La Iglesia-Vaya, Jon Ander Gómez

Responsive image

Auto-TLDR; Semantic Segmentation of Lumbar Spine Using Convolutional Neural Networks

Slides Poster Similar

This work addresses the problem of automatically segmenting the MR images corresponding to the lumbar spine. The purpose is to detect and delimit the different structural elements like vertebrae, intervertebral discs, nerves, blood vessels, etc. This task is known as semantic segmentation. The approach proposed in this work is based on convolutional neural networks whose output is a mask where each pixel from the input image is classified into one of the possible classes. Classes were defined by radiologists and correspond to structural elements and tissues. The proposed network architectures are variants of the U-Net. Several complementary blocks were used to define the variants: spatial attention models, deep supervision and multi-kernels at input, this last block type is based on the idea of inception. Those architectures which got the best results are described in this paper, and their results are discussed. Two of the proposed architectures outperform the standard U-Net used as baseline.

Boundary Optimised Samples Training for Detecting Out-Of-Distribution Images

Luca Marson, Vladimir Li, Atsuto Maki

Responsive image

Auto-TLDR; Boundary Optimised Samples for Out-of-Distribution Input Detection in Deep Convolutional Networks

Slides Poster Similar

This paper presents a new approach to the problem of detecting out-of-distribution (OOD) inputs in image classifications with deep convolutional networks. We leverage so-called boundary samples to enforce low confidence (maximum softmax probabilities) for inputs far away from the training data. In particular, we propose the boundary optimised samples (named BoS) training algorithm for generating them. Unlike existing approaches, it does not require extra generative adversarial network, but achieves the goal by simply back propagating the gradient of an appropriately designed loss function to the input samples. At the end of the BoS training, all the boundary samples are in principle located on a specific level hypersurface with respect to the designed loss. Our contributions are i) the BoS training as an efficient alternative to generate boundary samples, ii) a robust algorithm therewith to enforce low confidence for OOD samples, and iii) experiments demonstrating improved OOD detection over the baseline. We show the performance using standard datasets for training and different test sets including Fashion MNIST, EMNIST, SVHN, and CIFAR-100, preceded by evaluations with a synthetic 2-dimensional dataset that provide an insight for the new procedure.

GAP: Quantifying the Generative Adversarial Set and Class Feature Applicability of Deep Neural Networks

Edward Collier, Supratik Mukhopadhyay

Responsive image

Auto-TLDR; Approximating Adversarial Learning in Deep Neural Networks Using Set and Class Adversaries

Slides Poster Similar

Recent work in deep neural networks has sought to characterize the nature in which a network learns features and how applicable learnt features are to various problem sets. Deep neural network applicability can be split into three sub-problems; set applicability, class applicability, and instance applicability. In this work we seek to quantify the applicability of features learned during adversarial training, focusing specifically on set and class applicability. We apply techniques for measuring applicability to both generators and discriminators trained on various data sets to quantify applicability and better observe how both a generator and a discriminator, and generative models as a whole, learn features during adversarial training.

Evaluation of Anomaly Detection Algorithms for the Real-World Applications

Marija Ivanovska, Domen Tabernik, Danijel Skocaj, Janez Pers

Responsive image

Auto-TLDR; Evaluating Anomaly Detection Algorithms for Practical Applications

Slides Poster Similar

Anomaly detection in complex data structures is oneof the most challenging problems in computer vision. In manyreal-world problems, for example in the quality control in modernmanufacturing, the anomalous samples are usually rare, resultingin (highly) imbalanced datasets. However, in current researchpractice, these scenarios are rarely modeled, and as a conse-quence, evaluation of anomaly detection algorithms often do notreproduce results that are useful for practical applications. First,even in case of highly unbalanced input data, anomaly detectionalgorithms are expected to significantly reduce the proportionof anomalous samples, detecting ”almost all” anomalous samples(with exact specifications depending on the target customer). Thisplaces high importance on only the small part of the ROC curve,possibly rendering the standard metrics such as AUC (AreaUnder Curve) and AP (Average Precision) useless. Second, thetarget of automatic anomaly detection in practical applicationsis significant reduction in manual work required, and standardmetrics are poor predictor of this feature. Finally, the evaluationmay produce erratic results for different randomly initializedtraining runs of the neural network, producing evaluation resultsthat may not reproduce well in practice. In this paper, we presentan evaluation methodology that avoids these pitfalls.

Variational Inference with Latent Space Quantization for Adversarial Resilience

Vinay Kyatham, Deepak Mishra, Prathosh A.P.

Responsive image

Auto-TLDR; A Generalized Defense Mechanism for Adversarial Attacks on Data Manifolds

Slides Poster Similar

Despite their tremendous success in modelling highdimensional data manifolds, deep neural networks suffer from the threat of adversarial attacks - Existence of perceptually valid input-like samples obtained through careful perturbation that lead to degradation in the performance of the underlying model. Major concerns with existing defense mechanisms include non-generalizability across different attacks, models and large inference time. In this paper, we propose a generalized defense mechanism capitalizing on the expressive power of regularized latent space based generative models. We design an adversarial filter, devoid of access to classifier and adversaries, which makes it usable in tandem with any classifier. The basic idea is to learn a Lipschitz constrained mapping from the data manifold, incorporating adversarial perturbations, to a quantized latent space and re-map it to the true data manifold. Specifically, we simultaneously auto-encode the data manifold and its perturbations implicitly through the perturbations of the regularized and quantized generative latent space, realized using variational inference. We demonstrate the efficacy of the proposed formulation in providing resilience against multiple attack types (black and white box) and methods, while being almost real-time. Our experiments show that the proposed method surpasses the stateof-the-art techniques in several cases.

Machine-Learned Regularization and Polygonization of Building Segmentation Masks

Stefano Zorzi, Ksenia Bittner, Friedrich Fraundorfer

Responsive image

Auto-TLDR; Automatic Regularization and Polygonization of Building Segmentation masks using Generative Adversarial Network

Slides Poster Similar

We propose a machine learning based approach for automatic regularization and polygonization of building segmentation masks. Taking an image as input, we first predict building segmentation maps exploiting generic fully convolutional network (FCN). A generative adversarial network (GAN) is then involved to perform a regularization of building boundaries to make them more realistic, i.e., having more rectilinear outlines which construct right angles if required. This is achieved through the interplay between the discriminator which gives a probability of input image being true and generator that learns from discriminator’s response to create more realistic images. Finally, we train the backbone convolutional neural network (CNN) which is adapted to predict sparse outcomes corresponding to building corners out of regularized building segmentation results. Experiments on three building segmentation datasets demonstrate that the proposed method is not only capable of obtaining accurate results, but also of producing visually pleasing building outlines parameterized as polygons.

Semi-Supervised Generative Adversarial Networks with a Pair of Complementary Generators for Retinopathy Screening

Yingpeng Xie, Qiwei Wan, Hai Xie, En-Leng Tan, Yanwu Xu, Baiying Lei

Responsive image

Auto-TLDR; Generative Adversarial Networks for Retinopathy Diagnosis via Fundus Images

Slides Poster Similar

Several typical types of retinopathy are major causes of blindness. However, early detection of retinopathy is quite not easy since few symptoms are observable in the early stage, attributing to the development of non-mydriatic retinal camera. These camera produces high-resolution retinal fundus images provide the possibility of Computer-Aided-Diagnosis (CAD) via deep learning to assist diagnosing retinopathy. Deep learning algorithms usually rely on a great number of labelled images which are expensive and time-consuming to obtain in the medical imaging area. Moreover, the random distribution of various lesions which often vary greatly in size also brings significant challenges to learn discriminative information from high-resolution fundus image. In this paper, we present generative adversarial networks simultaneously equipped with "good" generator and "bad" generator (GBGANs) to make up for the incomplete data distribution provided by limited fundus images. To improve the generative feasibility of generator, we introduce into pre-trained feature extractor to acquire condensed feature for each fundus image in advance. Experimental results on integrated three public iChallenge datasets show that the proposed GBGANs could fully utilize the available fundus images to identify retinopathy with little label cost.

Augmentation of Small Training Data Using GANs for Enhancing the Performance of Image Classification

Shih-Kai Hung, John Q. Gan

Responsive image

Auto-TLDR; Generative Adversarial Network for Image Training Data Augmentation

Slides Poster Similar

It is difficult to achieve high performance without sufficient training data for deep convolutional neural networks (DCNNs) to learn. Data augmentation plays an important role in improving robustness and preventing overfitting in machine learning for many applications such as image classification. In this paper, a novel method for data augmentation is proposed to solve the problem of machine learning with small training datasets. The proposed method can synthesise similar images with rich diversity from only a single original training sample to increase the number of training data by using generative adversarial networks (GANs). It is expected that the synthesised images possess class-informative features, which may be in the validation or testing data but not in the training data due to that the training dataset is small, and thus they can be effective as augmented training data to improve classification accuracy of DCNNs. The experimental results have demonstrated that the proposed method with a novel GAN framework for image training data augmentation can significantly enhance the classification performance of DCNNs for applications where original training data is limited.

IDA-GAN: A Novel Imbalanced Data Augmentation GAN

Hao Yang, Yun Zhou

Responsive image

Auto-TLDR; IDA-GAN: Generative Adversarial Networks for Imbalanced Data Augmentation

Slides Poster Similar

Class imbalance is a widely existed and challenging problem in real-world applications such as disease diagnosis, fraud detection, network intrusion detection and so on. Due to the scarce of data, it could significantly deteriorate the accuracy of classification. To address this challenge, we propose a novel Imbalanced Data Augmentation Generative Adversarial Networks (GAN) named IDA-GAN as an augmentation tool to deal with the imbalanced dataset. This is a great challenge because it is hard to train a GAN model under this situation. We overcome this issue by coupling Variational autoencoder along with GAN training. Specifically, we introduce the Variational autoencoder to learn the majority and minority class distributions in the latent space, and use the generative model to utilize each class distribution for the subsequent GAN training. The generative model learns useful features to generate target minority-class samples. By comparing with the state-of-the-art GAN models, the experimental results demonstrate that our proposed IDA-GAN could generate more diverse minority samples with better qualities, and it consistently benefits the imbalanced classification task in terms of several widely-used evaluation metrics on five benchmark datasets: MNIST, Fashion-MNIST, SVHN, CIFAR-10 and GTRSB.

Can Data Placement Be Effective for Neural Networks Classification Tasks? Introducing the Orthogonal Loss

Brais Cancela, Veronica Bolon-Canedo, Amparo Alonso-Betanzos

Responsive image

Auto-TLDR; Spatial Placement for Neural Network Training Loss Functions

Slides Poster Similar

Traditionally, a Neural Network classification training loss function follows the same principle: minimizing the distance between samples that belong to the same class, while maximizing the distance to the other classes. There are no restrictions on the spatial placement of deep features (last layer input). This paper addresses this issue when dealing with Neural Networks, providing a set of loss functions that are able to train a classifier by forcing the deep features to be projected over a predefined orthogonal basis. Experimental results shows that these `data placement' functions can overcome the training accuracy provided by the classic cross-entropy loss function.

A Joint Representation Learning and Feature Modeling Approach for One-Class Recognition

Pramuditha Perera, Vishal Patel

Responsive image

Auto-TLDR; Combining Generative Features and One-Class Classification for Effective One-class Recognition

Slides Poster Similar

One-class recognition is traditionally approached either as a representation learning problem or a feature modelling problem. In this work, we argue that both of these approaches have their own limitations; and a more effective solution can be obtained by combining the two. The proposed approach is based on the combination of a generative framework and a one-class classification method. First, we learn generative features using the one-class data with a generative framework. We augment the learned features with the corresponding reconstruction errors to obtain augmented features. Then, we qualitatively identify a suitable feature distribution that reduces the redundancy in the chosen classifier space. Finally, we force the augmented features to take the form of this distribution using an adversarial framework. We test the effectiveness of the proposed method on three one-class classification tasks and obtain state-of-the-art results.

Leveraging Synthetic Subject Invariant EEG Signals for Zero Calibration BCI

Nik Khadijah Nik Aznan, Amir Atapour-Abarghouei, Stephen Bonner, Jason Connolly, Toby Breckon

Responsive image

Auto-TLDR; SIS-GAN: Subject Invariant SSVEP Generative Adversarial Network for Brain-Computer Interface

Slides Similar

Recently, substantial progress has been made in the area of Brain-Computer Interface (BCI) using modern machine learning techniques to decode and interpret brain signals. While Electroencephalography (EEG) has provided a non-invasive method of interfacing with a human brain, the acquired data is often heavily subject and session dependent. This makes seamless incorporation of such data into real-world applications intractable as the subject and session data variance can lead to long and tedious calibration requirements and cross-subject generalisation issues. Focusing on a Steady State Visual Evoked Potential (SSVEP) classification systems, we propose a novel means of generating highly-realistic synthetic EEG data invariant to any subject, session or other environmental conditions. Our approach, entitled the Subject Invariant SSVEP Generative Adversarial Network (SIS-GAN), produces synthetic EEG data from multiple SSVEP classes using a single network. Additionally, by taking advantage of a fixed-weight pre-trained subject classification network, we ensure that our generative model remains agnostic to subject-specific features and thus produces subject-invariant data that can be applied to new previously unseen subjects. Our extensive experimental evaluation demonstrates the efficacy of our synthetic data, leading to superior performance, with improvements of up to 16% in zero-calibration classification tasks when trained using our subject-invariant synthetic EEG signals.

Interpreting the Latent Space of GANs Via Correlation Analysis for Controllable Concept Manipulation

Ziqiang Li, Rentuo Tao, Hongjing Niu, Bin Li

Responsive image

Auto-TLDR; Exploring latent space of GANs by analyzing correlation between latent variables and semantic contents in generated images

Slides Poster Similar

Generative adversarial nets (GANs) have been successfully applied in many fields like image generation, inpainting, super-resolution and drug discovery, etc., by now, the inner process of GANs is far from been understood. To get deeper insight of the intrinsic mechanism of GANs, in this paper, a method for interpreting the latent space of GANs by analyzing the correlation between latent variables and the corresponding semantic contents in generated images is proposed. Unlike previous methods that focus on dissecting models via feature visualization, the emphasis of this work is put on the variables in latent space, i.e. how the latent variables affect the quantitative analysis of generated results. Given a pretrained GAN model with weights fixed, the latent variables are intervened to analyze their effect on the semantic content in generated images. A set of controlling latent variables can be derived for specific content generation, and the controllable semantic content manipulation be achieved. The proposed method is testified on the datasets Fashion-MNIST and UT Zappos50K, experiment results show its effectiveness

Local Facial Attribute Transfer through Inpainting

Ricard Durall, Franz-Josef Pfreundt, Janis Keuper

Responsive image

Auto-TLDR; Attribute Transfer Inpainting Generative Adversarial Network

Slides Poster Similar

The term attribute transfer refers to the tasks of altering images in such a way, that the semantic interpretation of a given input image is shifted towards an intended direction, which is quantified by semantic attributes. Prominent example applications are photo realistic changes of facial features and expressions, like changing the hair color, adding a smile, enlarging the nose or altering the entire context of a scene, like transforming a summer landscape into a winter panorama. Recent advances in attribute transfer are mostly based on generative deep neural networks, using various techniques to manipulate images in the latent space of the generator. In this paper, we present a novel method for the common sub-task of local attribute transfers, where only parts of a face have to be altered in order to achieve semantic changes (e.g. removing a mustache). In contrast to previous methods, where such local changes have been implemented by generating new (global) images, we propose to formulate local attribute transfers as an inpainting problem. Removing and regenerating only parts of images, our Attribute Transfer Inpainting Generative Adversarial Network (ATI-GAN) is able to utilize local context information to focus on the attributes while keeping the background unmodified resulting in visually sound results.

Thermal Image Enhancement Using Generative Adversarial Network for Pedestrian Detection

Mohamed Amine Marnissi, Hajer Fradi, Anis Sahbani, Najoua Essoukri Ben Amara

Responsive image

Auto-TLDR; Improving Visual Quality of Infrared Images for Pedestrian Detection Using Generative Adversarial Network

Slides Poster Similar

Infrared imaging has recently played an important role in a wide range of applications including surveillance, robotics and night vision. However, infrared cameras often suffer from some limitations, essentially about low-contrast and blurred details. These problems contribute to the loss of observation of target objects in infrared images, which could limit the feasibility of different infrared imaging applications. In this paper, we mainly focus on the problem of pedestrian detection on thermal images. Particularly, we emphasis the need for enhancing the visual quality of images beforehand performing the detection step. % to ensure effective results. To address that, we propose a novel thermal enhancement architecture based on Generative Adversarial Network, and composed of two modules contrast enhancement and denoising modules with a post-processing step for edge restoration in order to improve the overall quality. The effectiveness of the proposed architecture is assessed by means of visual quality metrics and better results are obtained compared to the original thermal images and to the obtained results by other existing enhancement methods. These results have been conduced on a subset of KAIST dataset. Using the same dataset, the impact of the proposed enhancement architecture has been demonstrated on the detection results by obtaining better performance with a significant margin using YOLOv3 detector.

Generative Latent Implicit Conditional Optimization When Learning from Small Sample

Idan Azuri, Daphna Weinshall

Responsive image

Auto-TLDR; GLICO: Generative Latent Implicit Conditional Optimization for Small Sample Learning

Slides Poster Similar

We revisit the long-standing problem of learning from small sample. The generation of new samples from a small training set of labeled points has attracted increased attention in recent years. In this paper, we propose a novel such method called GLICO (Generative Latent Implicit Conditional Optimization). GLICO learns a mapping from the training examples to a latent space and a generator that generates images from vectors in the latent space. Unlike most recent work, which rely on access to large amounts of unlabeled data, GLICO does not require access to any additional data other than the small set of labeled points. In fact, GLICO learns to synthesize completely new samples for every class using as little as 5 or 10 examples per class, with as few as 10 such classes and no data from unknown classes. GLICO is then used to augment the small training set while training a classifier on the small sample. To this end, our proposed method samples the learned latent space using spherical interpolation (slerp) and generates new examples using the trained generator. Empirical results show that the new sampled set is diverse enough, leading to improvement in image classification in comparison with the state of the art when trained on small samples obtained from CIFAR-10, CIFAR-100, and CUB-200.

A Close Look at Deep Learning with Small Data

Lorenzo Brigato, Luca Iocchi

Responsive image

Auto-TLDR; Low-Complex Neural Networks for Small Data Conditions

Slides Poster Similar

In this work, we perform a wide variety of experiments with different Deep Learning architectures in small data conditions. We show that model complexity is a critical factor when only a few samples per class are available. Differently from the literature, we improve the state of the art using low complexity models. We show that standard convolutional neural networks with relatively few parameters are effective in this scenario. In many of our experiments, low complexity models outperform state-of-the-art architectures. Moreover, we propose a novel network that uses an unsupervised loss to regularize its training. Such architecture either improves the results either performs comparably well to low capacity networks. Surprisingly, experiments show that the dynamic data augmentation pipeline is not beneficial in this particular domain. Statically augmenting the dataset might be a promising research direction while dropout maintains its role as a good regularizer.

Future Urban Scenes Generation through Vehicles Synthesis

Alessandro Simoni, Luca Bergamini, Andrea Palazzi, Simone Calderara, Rita Cucchiara

Responsive image

Auto-TLDR; Predicting the Future of an Urban Scene with a Novel View Synthesis Paradigm

Slides Poster Similar

In this work we propose a deep learning pipeline to predict the visual future appearance of an urban scene. Despite recent advances, generating the entire scene in an end-to-end fashion is still far from being achieved. Instead, here we follow a two stages approach, where interpretable information is included in the loop and each actor is modelled independently. We leverage a per-object novel view synthesis paradigm; i.e. generating a synthetic representation of an object undergoing a geometrical roto-translation in the 3D space. Our model can be easily conditioned with constraints (e.g. input trajectories) provided by state-of-the-art tracking methods or by the user itself. This allows us to generate a set of diverse realistic futures starting from the same input in a multi-modal fashion. We visually and quantitatively show the superiority of this approach over traditional end-to-end scene-generation methods on CityFlow, a challenging real world dataset.

Combining GANs and AutoEncoders for Efficient Anomaly Detection

Fabio Carrara, Giuseppe Amato, Luca Brombin, Fabrizio Falchi, Claudio Gennaro

Responsive image

Auto-TLDR; CBIGAN: Anomaly Detection in Images with Consistency Constrained BiGAN

Slides Poster Similar

In this work, we propose CBiGAN --- a novel method for anomaly detection in images, where a consistency constraint is introduced as a regularization term in both the encoder and decoder of a BiGAN. Our model exhibits fairly good modeling power and reconstruction consistency capability. We evaluate the proposed method on MVTec AD --- a real-world benchmark for unsupervised anomaly detection on high-resolution images --- and compare against standard baselines and state-of-the-art approaches. Experiments show that the proposed method improves the performance of BiGAN formulations by a large margin and performs comparably to expensive state-of-the-art iterative methods while reducing the computational cost. We also observe that our model is particularly effective in texture-type anomaly detection, as it sets a new state of the art in this category. The code will be publicly released.

UCCTGAN: Unsupervised Clothing Color Transformation Generative Adversarial Network

Shuming Sun, Xiaoqiang Li, Jide Li

Responsive image

Auto-TLDR; An Unsupervised Clothing Color Transformation Generative Adversarial Network

Slides Poster Similar

Clothing color transformation refers to changing the clothes color in an original image to the clothes color in a target image. In this paper, we propose an Unsupervised Clothing Color Transformation Generative Adversarial Network (UCCTGAN) for the task. UCCTGAN adopts the color histogram of a target clothes as color guidance and an improved U-net architecture called AntennaNet is put forward to fuse the extracted color information with the original image. Meanwhile, to accomplish unsupervised learning, the loss function is carefully designed according to color moment, which evaluates the chromatic aberration between the target clothing and the generated clothing. Experimental results show that our network has the ability to generate convincing color transformation results.

Combining Similarity and Adversarial Learning to Generate Visual Explanation: Application to Medical Image Classification

Martin Charachon, Roberto Roberto Ardon, Celine Hudelot, Paul-Henry Cournède, Camille Ruppli

Responsive image

Auto-TLDR; Explaining Black-Box Machine Learning Models with Visual Explanation

Slides Poster Similar

Recently, due to their success and increasing applications, explaining the decision of black-box machine learning models has become a critical task. It is particularly the case in sensitive domains such as medical image interpretation. Various explanation approaches have been proposed in the literature, among which perturbation based approaches are very promising. Within this class of methods, we leverage a learning framework to produce our visual explanations method. From a given classifier, we train two generators to produce from an input image the so called similar and adversarial images. The similar (resp. adversarial) image shall be classified as (resp. not as) the input image. We show that visual explanation, outperforming state of the art methods, can be derived from these. Our method is model-agnostic and, at test time, only requires a single forward pass to generate explanation. Therefore, the proposed approach is adapted for real-time systems such as medical image analysis. Finally, we show that random geometric augmentations applied on the original image acts as a regularization that improves all state of the art explanation methods. We validate our approach on a large chest X-ray database.

Improved anomaly detection by training an autoencoder with skip connections on images corrupted with Stain-shaped noise

Anne-Sophie Collin, Christophe De Vleeschouwer

Responsive image

Auto-TLDR; Autoencoder with Skip Connections for Anomaly Detection

Slides Poster Similar

In industrial vision, the anomaly detection problem can be addressed with an autoencoder trained to map an arbitrary image, i.e. with or without any defect, to a clean image, i.e. without any defect. In this approach, anomaly detection relies conventionally on the reconstruction residual or, alternatively, on the reconstruction uncertainty. To improve the sharpness of the reconstruction, we consider an autoencoder architecture with skip connections. In the common scenario where only clean images are available for training, we propose to corrupt them with a synthetic noise model to prevent the convergence of the network towards the identity mapping, and introduce an original Stain noise model for that purpose. We show that this model favors the reconstruction of clean images from arbitrary real-world images, regardless of the actual defects appearance. In addition to demonstrating the relevance of our approach, our validation provides the first consistent assessment of reconstruction-based methods, by comparing their performance over the MVTec AD dataset [ref.], both for pixel- and image-wise anomaly detection.

MTGAN: Mask and Texture-Driven Generative Adversarial Network for Lung Nodule Segmentation

Wei Chen, Qiuli Wang, Kun Wang, Dan Yang, Xiaohong Zhang, Chen Liu, Yucong Li

Responsive image

Auto-TLDR; Mask and Texture-driven Generative Adversarial Network for Lung Nodule Segmentation

Slides Poster Similar

Accurate segmentation for lung nodules in lung computed tomography (CT) scans plays a key role in the early diagnosis of lung cancer. Many existing methods, especially UNet, have made significant progress in lung nodule segmentation. However, due to the complex shapes of lung nodules and the similarity of visual characteristics between nodules and lung tissues, an accurate segmentation with few false positives of lung nodules is still a challenging problem. Considering the fact that both boundary and texture information of lung nodules are important for obtaining an accurate segmentation result, we propose a novel Mask and Texture-driven Generative Adversarial Network (MTGAN) with a joint multi-scale L1 loss for lung nodule segmentation, which takes full advantages of U-Net and adversarial training. The proposed MTGAN leverages adversarial learning strategy guided by the boundary and texture information of lung nodules to generate more accurate segmentation results with lesser false positives. We validate our model with the LIDC–IDRI dataset, and experimental results show that our method achieves excellent segmentation results for a variety of lung nodules, especially for juxtapleural nodules and low-dense nodules. Without any bells and whistles, the proposed MTGAN achieves significant segmentation performance with the Dice similarity coefficient (DSC) of 85.24% on the LIDC–IDRI dataset.

Towards Explaining Adversarial Examples Phenomenon in Artificial Neural Networks

Ramin Barati, Reza Safabakhsh, Mohammad Rahmati

Responsive image

Auto-TLDR; Convolutional Neural Networks and Adversarial Training from the Perspective of convergence

Slides Poster Similar

In this paper, we study the adversarial examples existence and adversarial training from the standpoint of convergence and provide evidence that pointwise convergence in ANNs can explain these observations. The main contribution of our proposal is that it relates the objective of the evasion attacks and adversarial training with concepts already defined in learning theory. Also, we extend and unify some of the other proposals in the literature and provide alternative explanations on the observations made in those proposals. Through different experiments, we demonstrate that the framework is valuable in the study of the phenomenon and is applicable to real-world problems.

Verifying the Causes of Adversarial Examples

Honglin Li, Yifei Fan, Frieder Ganz, Tony Yezzi, Payam Barnaghi

Responsive image

Auto-TLDR; Exploring the Causes of Adversarial Examples in Neural Networks

Slides Poster Similar

The robustness of neural networks is challenged by adversarial examples that contain almost imperceptible perturbations to inputs which mislead a classifier to incorrect outputs in high confidence. Limited by the extreme difficulty in examining a high-dimensional image space thoroughly, research on explaining and justifying the causes of adversarial examples falls behind studies on attacks and defenses. In this paper, we present a collection of potential causes of adversarial examples and verify (or partially verify) them through carefully-designed controlled experiments. The major causes of adversarial examples include model linearity, one-sum constraint, and geometry of the categories. To control the effect of those causes, multiple techniques are applied such as $L_2$ normalization, replacement of loss functions, construction of reference datasets, and novel models using multi-layer perceptron probabilistic neural networks (MLP-PNN) and density estimation (DE). Our experiment results show that geometric factors tend to be more direct causes and statistical factors magnify the phenomenon, especially for assigning high prediction confidence. We hope this paper will inspire more studies to rigorously investigate the root causes of adversarial examples, which in turn provide useful guidance on designing more robust models.

Enlarging Discriminative Power by Adding an Extra Class in Unsupervised Domain Adaptation

Hai Tran, Sumyeong Ahn, Taeyoung Lee, Yung Yi

Responsive image

Auto-TLDR; Unsupervised Domain Adaptation using Artificial Classes

Slides Poster Similar

We study the problem of unsupervised domain adaptation that aims at obtaining a prediction model for the target domain using labeled data from the source domain and unlabeled data from the target domain. There exists an array of recent research based on the idea of extracting features that are not only invariant for both domains but also provide high discriminative power for the target domain. In this paper, we propose an idea of improving the discriminativeness: Adding an extra artificial class and training the model on the given data together with the GAN-generated samples of the new class. The trained model based on the new class samples is capable of extracting the features that are more discriminative by repositioning data of current classes in the target domain and therefore increasing the distances among the target clusters in the feature space. Our idea is highly generic so that it is compatible with many existing methods such as DANN, VADA, and DIRT-T. We conduct various experiments for the standard data commonly used for the evaluation of unsupervised domain adaptations and demonstrate that our algorithm achieves the SOTA performance for many scenarios.

GAN-Based Gaussian Mixture Model Responsibility Learning

Wanming Huang, Yi Da Xu, Shuai Jiang, Xuan Liang, Ian Oppermann

Responsive image

Auto-TLDR; Posterior Consistency Module for Gaussian Mixture Model

Slides Poster Similar

Mixture Model (MM) is a probabilistic framework allows us to define dataset containing $K$ different modes. When each of the modes is associated with a Gaussian distribution, we refer to it as Gaussian MM or GMM. Given a data point $x$, a GMM may assume the existence of a random index $k \in \{1, \dots , K \}$ identifying which Gaussian the particular data is associated with. In a traditional GMM paradigm, it is straightforward to compute in closed-form, the conditional likelihood $p(x |k, \theta)$ as well as the responsibility probability $p(k|x, \theta)$ describing the distribution weights for each data. Computing the responsibility allows us to retrieve many important statistics of the overall dataset, including the weights of each of the modes/clusters. Modern large data-sets are often containing multiple unlabelled modes, such as paintings dataset may contain several styles; fashion images containing several unlabelled categories. In its raw representation, the Euclidean distances between the data (e.g., images) do not allow them to form mixtures naturally, nor it's feasible to compute responsibility distribution analytically, making GMM unable to apply. In this paper, we utilize the Generative Adversarial Network (GAN) framework to achieve a plausible alternative method to compute these probabilities. The key insight is that we compute them at the data's latent space $z$ instead of $x$. However, this process of $z \rightarrow x$ is irreversible under GAN which renders the computation of responsibility $p(k|x, \theta)$ infeasible. Our paper proposed a novel method to solve it by using a so-called Posterior Consistency Module (PCM). PCM acts like a GAN, except its Generator $C_{\text{PCM}}$ does not output the data, but instead it outputs a distribution to approximate $p(k|x, \theta)$. The entire network is trained in an ``end-to-end'' fashion. Trough these techniques, it allows us to model the dataset of very complex structure using GMM and subsequently to discover interesting properties of an unsupervised dataset, including its segments, as well as generating new ``out-distribution" data by smooth linear interpolation across any combinations of the modes in a completely unsupervised manner.

Data Augmentation Via Mixed Class Interpolation Using Cycle-Consistent Generative Adversarial Networks Applied to Cross-Domain Imagery

Hiroshi Sasaki, Chris G. Willcocks, Toby Breckon

Responsive image

Auto-TLDR; C2GMA: A Generative Domain Transfer Model for Non-visible Domain Classification

Slides Poster Similar

Machine learning driven object detection and classification within non-visible imagery has an important role in many fields such as night vision, all-weather surveillance and aviation security. However, such applications often suffer due to the limited quantity and variety of non-visible spectral domain imagery, in contrast to the high data availability of visible-band imagery that readily enables contemporary deep learning driven detection and classification approaches. To address this problem, this paper proposes and evaluates a novel data augmentation approach that leverages the more readily available visible-band imagery via a generative domain transfer model. The model can synthesise large volumes of non-visible domain imagery by image-to-image (I2I) translation from the visible image domain. Furthermore, we show that the generation of interpolated mixed class (non-visible domain) image examples via our novel Conditional CycleGAN Mixup Augmentation (C2GMA) methodology can lead to a significant improvement in the quality of non-visible domain classification tasks that otherwise suffer due to limited data availability. Focusing on classification within the Synthetic Aperture Radar (SAR) domain, our approach is evaluated on a variation of the Statoil/C-CORE Iceberg Classifier Challenge dataset and achieves 75.4% accuracy, demonstrating a significant improvement when compared against traditional data augmentation strategies (Rotation, Mixup, and MixCycleGAN).

Signal Generation Using 1d Deep Convolutional Generative Adversarial Networks for Fault Diagnosis of Electrical Machines

Russell Sabir, Daniele Rosato, Sven Hartmann, Clemens Gühmann

Responsive image

Auto-TLDR; Large Dataset Generation from Faulty AC Machines using Deep Convolutional GAN

Slides Poster Similar

AC machines may be subjected to different electrical or mechanical faults during their operation. Fault patterns can be detected in the DC current from the machine’s E-Drive system with the help of Deep or Machine Learning algorithms. However, Deep or Machine Learning algorithms require large amounts of dataset for training and without the availability of a large dataset the algorithms fail to generalize or give their optimal performance. Collecting large amounts of data from faulty machine can be a tedious task. It is expensive and not always possible. In some cases, the machine is completely damaged even before sufficient amount of data can be collected. Also, data collection from defected machine may cause permanent damage to the connected system. Therefore, in this paper the problem of small dataset is tackled by presenting a methodology for large dataset generation by using the well-known generative model, Generative Adversarial Networks (GAN). As an example, the stator open circuit fault in a synchronous machine is considered. DC currents from the machine’s E-Drive system are measured from different healthy and faulty machines and are used for training of two 1d DCGANs (Deep Convolutional GANs), one for the healthy and the other for the current signal from the faulty machine. Conventional GANs are difficult to train, however in this paper, training parameters of 1d DCGAN are tuned which results an improved training process. The performance of generator during the training of 1d DCGAN is evaluated by using the Fréchet Inception Distance (FID) metric. The proposed 1d DCGAN model is said to converge when FID score between the real and generated signal reaches below a certain threshold. The generated signals from the trained 1d DCGAN are further evaluated using the PDF (Probability Density Function), frequency domain analysis and other measures which check for duplication of the real data and their statistical diversity. The trained 1d DCGAN is able to generate DC current signals for building large datasets for the training of Deep or Machine learning models.

Rethinking Domain Generalization Baselines

Francesco Cappio Borlino, Antonio D'Innocente, Tatiana Tommasi

Responsive image

Auto-TLDR; Style Transfer Data Augmentation for Domain Generalization

Slides Poster Similar

Despite being very powerful in standard learning settings, deep learning models can be extremely brittle when deployed in scenarios different from those on which they were trained. Domain generalization methods investigate this problem and data augmentation strategies have shown to be helpful tools to increase data variability, supporting model robustness across domains. In our work we focus on style transfer data augmentation and we present how it can be implemented with a simple and inexpensive strategy to improve generalization. Moreover, we analyze the behavior of current state of the art domain generalization methods when integrated with this augmentation solution: our thorough experimental evaluation shows that their original effect almost always disappears with respect to the augmented baseline. This issue open new scenarios for domain generalization research, highlighting the need of novel methods properly able to take advantage of the introduced data variability.

Image Representation Learning by Transformation Regression

Xifeng Guo, Jiyuan Liu, Sihang Zhou, En Zhu, Shihao Dong

Responsive image

Auto-TLDR; Self-supervised Image Representation Learning using Continuous Parameter Prediction

Slides Poster Similar

Self-supervised learning is a thriving research direction since it can relieve the burden of human labeling for machine learning by seeking for supervision from data instead of human annotation. Although demonstrating promising performance in various applications, we observe that the existing methods usually model the auxiliary learning tasks as classification tasks with finite discrete labels, leading to insufficient supervisory signals, which in turn restricts the representation quality. In this paper, to solve the above problem and make full use of the supervision from data, we design a regression model to predict the continuous parameters of a group of transformations, i.e., image rotation, translation, and scaling. Surprisingly, this naive modification stimulates tremendous potential from data and the resulting supervisory signal has largely improved the performance of image representation learning. Extensive experiments on four image datasets, including CIFAR10, CIFAR100, STL10, and SVHN, indicate that our proposed algorithm outperforms the state-of-the-art unsupervised learning methods by a large margin in terms of classification accuracy. Crucially, we find that with our proposed training mechanism as an initialization, the performance of the existing state-of-the-art classification deep architectures can be preferably improved.

Rethinking Experience Replay: A Bag of Tricks for Continual Learning

Pietro Buzzega, Matteo Boschini, Angelo Porrello, Simone Calderara

Responsive image

Auto-TLDR; Experience Replay for Continual Learning: A Practical Approach

Slides Poster Similar

In Continual Learning, a Neural Network is trained on a stream of data whose distribution shifts over time. Under these assumptions, it is especially challenging to improve on classes appearing later in the stream while remaining accurate on previous ones. This is due to the infamous problem of catastrophic forgetting, which causes a quick performance degradation when the classifier focuses on learning new categories. Recent literature proposed various approaches to tackle this issue, often resorting to very sophisticated techniques. In this work, we show that naive rehearsal can be patched to achieve similar performance. We point out some shortcomings that restrain Experience Replay (ER) and propose five tricks to mitigate them. Experiments show that ER, thus enhanced, displays an accuracy gain of 51.2 and 26.9 percentage points on the CIFAR-10 and CIFAR-100 datasets respectively (memory buffer size 1000). As a result, it surpasses current state-of-the-art rehearsal-based methods.

Learning with Multiplicative Perturbations

Xiulong Yang, Shihao Ji

Responsive image

Auto-TLDR; XAT and xVAT: A Multiplicative Adversarial Training Algorithm for Robust DNN Training

Slides Poster Similar

Adversarial Training (AT) and Virtual Adversarial Training (VAT) are the regularization techniques that train Deep Neural Networks (DNNs) with adversarial examples generated by adding small but worst-case perturbations to input examples. In this paper, we propose xAT and xVAT, new adversarial training algorithms that generate multiplicative perturbations to input examples for robust training of DNNs. Such perturbations are much more perceptible and interpretable than their additive counterparts exploited by AT and VAT. Furthermore, the multiplicative perturbations can be generated transductively or inductively, while the standard AT and VAT only support a transductive implementation. We conduct a series of experiments that analyze the behavior of the multiplicative perturbations and demonstrate that xAT and xVAT match or outperform state-of-the-art classification accuracies across multiple established benchmarks while being about 30% faster than their additive counterparts. Our source code can be found at https://github.com/sndnyang/xvat

Revisiting Sequence-To-Sequence Video Object Segmentation with Multi-Task Loss and Skip-Memory

Fatemeh Azimi, Benjamin Bischke, Sebastian Palacio, Federico Raue, Jörn Hees, Andreas Dengel

Responsive image

Auto-TLDR; Sequence-to-Sequence Learning for Video Object Segmentation

Slides Poster Similar

Video Object Segmentation (VOS) is an active research area of the visual domain. One of its fundamental sub-tasks is semi-supervised / one-shot learning: given only the segmentation mask for the first frame, the task is to provide pixel-accurate masks for the object over the rest of the sequence. Despite much progress in the last years, we noticed that many of the existing approaches lose objects in longer sequences, especially when the object is small or briefly occluded. In this work, we build upon a sequence-to-sequence approach that employs an encoder-decoder architecture together with a memory module for exploiting the sequential data. We further improve this approach by proposing a model that manipulates multi-scale spatio-temporal information using memory-equipped skip connections. Furthermore, we incorporate an auxiliary task based on distance classification which greatly enhances the quality of edges in segmentation masks. We compare our approach to the state of the art and show considerable improvement in the contour accuracy metric and the overall segmentation accuracy.

Uncertainty-Aware Data Augmentation for Food Recognition

Eduardo Aguilar, Bhalaji Nagarajan, Rupali Khatun, Marc Bolaños, Petia Radeva

Responsive image

Auto-TLDR; Data Augmentation for Food Recognition Using Epistemic Uncertainty

Slides Poster Similar

Food recognition has recently attracted attention of many researchers. However, high food ambiguity, inter-class variability and intra-class similarity define a real challenge for the Deep learning and Computer Vision algorithms. In order to improve their performance, it is necessary to better understand what the model learns and, from this, to determine the type of data that should be additionally included for being the most beneficial to the training procedure. In this paper, we propose a new data augmentation strategy that estimates and uses the epistemic uncertainty to guide the model training. The method follows an active learning framework, where the new synthetic images are generated from the hard to classify real ones present in the training data based on the epistemic uncertainty. Hence, it allows the food recognition algorithm to focus on difficult images in order to learn their discriminatives features. On the other hand, avoiding data generation from images that do not contribute to the recognition makes it faster and more efficient. We show that the proposed method allows to improve food recognition and provides a better trade-off between micro- and macro-recall measures.

Aerial Road Segmentation in the Presence of Topological Label Noise

Corentin Henry, Friedrich Fraundorfer, Eleonora Vig

Responsive image

Auto-TLDR; Improving Road Segmentation with Noise-Aware U-Nets for Fine-Grained Topology delineation

Slides Poster Similar

The availability of large-scale annotated datasets has enabled Fully-Convolutional Neural Networks to reach outstanding performance on road extraction in aerial images. However, high-quality pixel-level annotation is expensive to produce and even manually labeled data often contains topological errors. Trading off quality for quantity, many datasets rely on already available yet noisy labels, for example from OpenStreetMap. In this paper, we explore the training of custom U-Nets built with ResNet and DenseNet backbones using noise-aware losses that are robust towards label omission and registration noise. We perform an extensive evaluation of standard and noise-aware losses, including a novel Bootstrapped DICE-Coefficient loss, on two challenging road segmentation benchmarks. Our losses yield a consistent improvement in overall extraction quality and exhibit a strong capacity to cope with severe label noise. Our method generalizes well to two other fine-grained topology delineation tasks: surface crack detection for quality inspection and cell membrane extraction in electron microscopy imagery.

Accuracy-Perturbation Curves for Evaluation of Adversarial Attack and Defence Methods

Jaka Šircelj, Danijel Skocaj

Responsive image

Auto-TLDR; Accuracy-perturbation Curve for Robustness Evaluation of Adversarial Examples

Slides Poster Similar

With more research published on adversarial examples, we face a growing need for strong and insightful methods for evaluating the robustness of machine learning solutions against their adversarial threats. Previous work contains problematic and overly simplified evaluation methods, where different methods for generating adversarial examples are compared, even though they produce adversarial examples of differing perturbation magnitudes. This creates a biased evaluation environment, as higher perturbations yield naturally stronger adversarial examples. We propose a novel "accuracy-perturbation curve" that visualizes a classifiers classification accuracy response to adversarial examples of different perturbations. To demonstrate the utility of the curve we perform evaluation of responses of different image classifier architectures to four popular adversarial example methods. We also show how adversarial training improves the robustness of a classifier using the "accuracy-perturbation curve".

Augmented Cyclic Consistency Regularization for Unpaired Image-To-Image Translation

Takehiko Ohkawa, Naoto Inoue, Hirokatsu Kataoka, Nakamasa Inoue

Responsive image

Auto-TLDR; Augmented Cyclic Consistency Regularization for Unpaired Image-to-Image Translation

Slides Poster Similar

Unpaired image-to-image (I2I) translation has received considerable attention in pattern recognition and computer vision because of recent advancements in generative adversarial networks (GANs). However, due to the lack of explicit supervision, unpaired I2I models often fail to generate realistic images, especially in challenging datasets with different backgrounds and poses. Hence, stabilization is indispensable for real-world applications and GANs. Herein, we propose Augmented Cyclic Consistency Regularization (ACCR), a novel regularization method for unpaired I2I translation. Our main idea is to enforce consistency regularization originating from semi-supervised learning on the discriminators leveraging real, fake, reconstructed, and augmented samples. We regularize the discriminators to output similar predictions when fed pairs of original and perturbed images. We qualitatively clarify the generation property between unpaired I2I models and standard GANs, and explain why consistency regularization on fake and reconstructed samples works well. Quantitatively, our method outperforms the consistency regularized GAN (CR-GAN) in real-world translations and demonstrates efficacy against several data augmentation variants and cycle-consistent constraints.

UDBNET: Unsupervised Document Binarization Network Via Adversarial Game

Amandeep Kumar, Shuvozit Ghose, Pinaki Nath Chowdhury, Partha Pratim Roy, Umapada Pal

Responsive image

Auto-TLDR; Three-player Min-max Adversarial Game for Unsupervised Document Binarization

Slides Poster Similar

Degraded document image binarization is one of the most challenging tasks in the domain of document image analysis. In this paper, we present a novel approach towards document image binarization by introducing three-player min-max adversarial game. We train the network in an unsupervised setup by assuming that we do not have any paired-training data. In our approach, an Adversarial Texture Augmentation Network (ATANet) first superimposes the texture of a degraded reference image over a clean image. Later, the clean image along with its generated degraded version constitute the pseudo paired-data which is used to train the Unsupervised Document Binarization Network (UDBNet). Following this approach, we have enlarged the document binarization datasets as it generates multiple images having same content feature but different textual feature. These generated noisy images are then fed into the UDBNet to get back the clean version. The joint discriminator which is the third-player of our three-player min-max adversarial game tries to couple both the ATANet and UDBNet. The three-player min-max adversarial game stops, when the distributions modelled by the ATANet and the UDBNet align to the same joint distribution over time. Thus, the joint discriminator enforces the UDBNet to perform better on real degraded image. The experimental results indicate the superior performance of the proposed model over existing state-of-the-art algorithm on widely used DIBCO datasets. The source code of the proposed system is publicly available at https://github.com/VIROBO-15/UDBNET.

SECI-GAN: Semantic and Edge Completion for Dynamic Objects Removal

Francesco Pinto, Andrea Romanoni, Matteo Matteucci, Phil Torr

Responsive image

Auto-TLDR; SECI-GAN: Semantic and Edge Conditioned Inpainting Generative Adversarial Network

Slides Poster Similar

Image inpainting aims at synthesizing the missing content of damaged and corrupted images to produce visually realistic restorations; typical applications are in image restoration, automatic scene editing, super-resolution, and dynamic object removal. In this paper, we propose Semantic and Edge Conditioned Inpainting Generative Adversarial Network (SECI-GAN), an architecture that jointly exploits the high-level cues extracted by semantic segmentation and the fine-grained details captured by edge extraction to condition the image inpainting process. SECI-GAN is designed with a particular focus on recovering big regions belonging to the same object (e.g. cars or pedestrians) in the context of dynamic object removal from complex street views. To demonstrate the effectiveness of SECI-GAN, we evaluate our results on the Cityscapes dataset, showing that SECI-GAN is better than competing state-of-the-art models at recovering the structure and the content of the missing parts while producing consistent predictions.

Position-Aware and Symmetry Enhanced GAN for Radial Distortion Correction

Yongjie Shi, Xin Tong, Jingsi Wen, He Zhao, Xianghua Ying, Jinshi Hongbin Zha

Responsive image

Auto-TLDR; Generative Adversarial Network for Radial Distorted Image Correction

Slides Poster Similar

This paper presents a novel method based on the generative adversarial network for radial distortion correction. Instead of generating a corrected image, our generator predicts a pixel flow map to measure the pixel offset between the distorted and corrected image. The quality of the generated pixel flow map and the warped image are judged by the discriminator. As texture far away from the image center has strong distortion, we develop an Adaptive Inverted Foveal layer which can transform the deformation to the intensity of the image to exploit this property. Rotation symmetry enhanced convolution kernels are applied to extract geometric features of different orientations explicitly. These learned features are recalibrated using the Squeeze-and-Excitation block to assign different weights for different directions. Moreover, we construct a first real-world radial distorted image dataset RD600 annotated with ground truth to evaluate our proposed method. We conduct extensive experiments to validate the effectiveness of each part of our framework. The further experiment shows our approach outperforms previous methods in both synthetic and real-world datasets quantitatively and qualitatively.

MBD-GAN: Model-Based Image Deblurring with a Generative Adversarial Network

Li Song, Edmund Y. Lam

Responsive image

Auto-TLDR; Model-Based Deblurring GAN for Inverse Imaging

Slides Poster Similar

This paper presents a methodology to tackle inverse imaging problems by leveraging the synergistic power of imaging model and deep learning. The premise is that while learning-based techniques have quickly become the methods of choice in various applications, they often ignore the prior knowledge embedded in imaging models. Incorporating the latter has the potential to improve the image estimation. Specifically, we first provide a mathematical basis of using generative adversarial network (GAN) in inverse imaging through considering an optimization framework. Then, we develop the specific architecture that connects the generator and discriminator networks with the imaging model. While this technique can be applied to a variety of problems, from image reconstruction to super-resolution, we take image deblurring as the example here, where we show in detail the implementation and experimental results of what we call the model-based deblurring GAN (MBD-GAN).

Attack Agnostic Adversarial Defense via Visual Imperceptible Bound

Saheb Chhabra, Akshay Agarwal, Richa Singh, Mayank Vatsa

Responsive image

Auto-TLDR; Robust Adversarial Defense with Visual Imperceptible Bound

Slides Poster Similar

High susceptibility of deep learning algorithms against structured and unstructured perturbations has motivated the development of efficient adversarial defense algorithms. However, the lack of generalizability of existing defense algorithms and the high variability in the performance of the attack algorithms for different databases raises several questions on the effectiveness of the defense algorithms. In this research, we aim to design a defense model that is robust within the certain bound against both seen and unseen adversarial attacks. This bound is related to the visual appearance of an image and we termed it as \textit{Visual Imperceptible Bound (VIB)}. To compute this bound, we propose a novel method that uses the database characteristics. The VIB is further used to compute the effectiveness of attack algorithms. In order to design a defense model, we propose a defense algorithm which makes the model robust within the VIB against both seen and unseen attacks. The performance of the proposed defense algorithm and the method to compute VIB are evaluated on MNIST, CIFAR-10, and Tiny ImageNet databases on multiple attacks including C\&W ($l_2$) and DeepFool. The proposed defense algorithm is not only able to increase the robustness against several attacks but also retain or improve the classification accuracy on an original clean test set. Experimentally, it is demonstrated that the proposed defense is better than existing strong defense algorithms based on adversarial retraining. We have additionally performed the PGD attack in white box settings and compared the results with the existing algorithms. The proposed defense is independent of the target model and adversarial attacks, and therefore can be utilized against any attack.

Trainable Spectrally Initializable Matrix Transformations in Convolutional Neural Networks

Michele Alberti, Angela Botros, Schuetz Narayan, Rolf Ingold, Marcus Liwicki, Mathias Seuret

Responsive image

Auto-TLDR; Trainable and Spectrally Initializable Matrix Transformations for Neural Networks

Slides Poster Similar

In this work, we introduce a new architectural component to Neural Networks (NN), i.e., trainable and spectrally initializable matrix transformations on feature maps. While previous literature has already demonstrated the possibility of adding static spectral transformations as feature processors, our focus is on more general trainable transforms. We study the transforms in various architectural configurations on four datasets of different nature: from medical (ColorectalHist, HAM10000) and natural (Flowers) images to historical documents (CB55). With rigorous experiments that control for the number of parameters and randomness, we show that networks utilizing the introduced matrix transformations outperform vanilla neural networks. The observed accuracy increases appreciably across all datasets. In addition, we show that the benefit of spectral initialization leads to significantly faster convergence, as opposed to randomly initialized matrix transformations. The transformations are implemented as auto-differentiable PyTorch modules that can be incorporated into any neural network architecture. The entire code base is open-source.

Hierarchical Mixtures of Generators for Adversarial Learning

Alper Ahmetoğlu, Ethem Alpaydin

Responsive image

Auto-TLDR; Hierarchical Mixture of Generative Adversarial Networks

Slides Similar

Generative adversarial networks (GANs) are deep neural networks that allow us to sample from an arbitrary probability distribution without explicitly estimating the distri- bution. There is a generator that takes a latent vector as input and transforms it into a valid sample from the distribution. There is also a discriminator that is trained to discriminate such fake samples from true samples of the distribution; at the same time, the generator is trained to generate fakes that the discriminator cannot tell apart from the true samples. Instead of learning a global generator, a recent approach involves training multiple generators each responsible from one part of the distribution. In this work, we review such approaches and propose the hierarchical mixture of generators, inspired from the hierarchical mixture of experts model, that learns a tree structure implementing a hierarchical clustering with soft splits in the decision nodes and local generators in the leaves. Since the generators are combined softly, the whole model is continuous and can be trained using gradient-based optimization, just like the original GAN model. Our experiments on five image data sets, namely, MNIST, FashionMNIST, UTZap50K, Oxford Flowers, and CelebA, show that our proposed model generates samples of high quality and diversity in terms of popular GAN evaluation metrics. The learned hierarchical structure also leads to knowledge extraction.

Quantifying the Use of Domain Randomization

Mohammad Ani, Hector Basevi, Ales Leonardis

Responsive image

Auto-TLDR; Evaluating Domain Randomization for Synthetic Image Generation by directly measuring the difference between realistic and synthetic data distributions

Slides Poster Similar

Synthetic image generation provides the ability to efficiently produce large quantities of labeled data, which addresses both the data volume requirements of state-of-the-art vision systems and the expense of manually labeling data. However, systems trained on synthetic data typically under-perform systems trained on realistic data due to mismatch between the synthetic and realistic data distributions. Domain Randomization (DR) is a method of broadening a synthetic data distribution to encompass a realistic data distribution, and so provide better performance, when the exact characteristics of the realistic data distribution are not known or cannot be simulated. However, there is no consensus in the literature on the best method of performing DR. We propose a novel method of ranking DR methods by directly measuring the difference between realistic and DR data distributions. This avoids the need to measure task-specific performance and the associated expense of training and evaluation. We compare different methods for measuring distribution differences including the Wasserstein, and Fr\'echet Inception distances. We also examine the effect of performing this evaluation directly on images, and on features generated by an image classification backbone. Finally, we show that the ranking generated by our method is reflected in actual task performance.