Enlarging Discriminative Power by Adding an Extra Class in Unsupervised Domain Adaptation

Hai Tran, Sumyeong Ahn, Taeyoung Lee, Yung Yi

Responsive image

Auto-TLDR; Unsupervised Domain Adaptation using Artificial Classes

Slides Poster

We study the problem of unsupervised domain adaptation that aims at obtaining a prediction model for the target domain using labeled data from the source domain and unlabeled data from the target domain. There exists an array of recent research based on the idea of extracting features that are not only invariant for both domains but also provide high discriminative power for the target domain. In this paper, we propose an idea of improving the discriminativeness: Adding an extra artificial class and training the model on the given data together with the GAN-generated samples of the new class. The trained model based on the new class samples is capable of extracting the features that are more discriminative by repositioning data of current classes in the target domain and therefore increasing the distances among the target clusters in the feature space. Our idea is highly generic so that it is compatible with many existing methods such as DANN, VADA, and DIRT-T. We conduct various experiments for the standard data commonly used for the evaluation of unsupervised domain adaptations and demonstrate that our algorithm achieves the SOTA performance for many scenarios.

Similar papers

Adversarially Constrained Interpolation for Unsupervised Domain Adaptation

Mohamed Azzam, Aurele Tohokantche Gnanha, Hau-San Wong, Si Wu

Responsive image

Auto-TLDR; Unsupervised Domain Adaptation with Domain Mixup Strategy

Slides Poster Similar

We address the problem of unsupervised domain adaptation (UDA) which aims at adapting models trained on a labeled domain to a completely unlabeled domain. One way to achieve this goal is to learn a domain-invariant representation. However, this approach is subject to two challenges: samples from two domains are insufficient to guarantee domain-invariance at most part of the latent space, and neighboring samples from the target domain may not belong to the same class on the low-dimensional manifold. To mitigate these shortcomings, we propose two strategies. First, we incorporate a domain mixup strategy in domain adversarial learning model by linearly interpolating between the source and target domain samples. This allows the latent space to be continuous and yields an improvement of the domain matching. Second, the domain discriminator is regularized via judging the relative difference between both domains for the input mixup features, which speeds up the domain matching. Experiment results show that our proposed model achieves a superior performance on different tasks under various domain shifts and data complexity.

Class Conditional Alignment for Partial Domain Adaptation

Mohsen Kheirandishfard, Fariba Zohrizadeh, Farhad Kamangar

Responsive image

Auto-TLDR; Multi-class Adversarial Adaptation for Partial Domain Adaptation

Slides Poster Similar

Adversarial adaptation models have demonstrated significant progress towards transferring knowledge from a labeled source dataset to an unlabeled target dataset. Partial domain adaptation (PDA) investigates the scenarios in which the source domain is large and diverse, and the target label space is a subset of the source label space. The main purpose of PDA is to identify the shared classes between the domains and promote learning transferable knowledge from these classes. In this paper, we propose a multi-class adversarial architecture for PDA. The proposed approach jointly aligns the marginal and class-conditional distributions in the shared label space by minimaxing a novel multi-class adversarial loss function. Furthermore, we incorporate effective regularization terms to encourage selecting the most relevant subset of source domain classes. In the absence of target labels, the proposed approach is able to effectively learn domain-invariant feature representations, which in turn can enhance the classification performance in the target domain. Comprehensive experiments on three benchmark datasets Office-$31$, Office-Home, and Caltech-Office corroborate the effectiveness of the proposed approach in addressing different partial transfer learning tasks.

Cross-Domain Semantic Segmentation of Urban Scenes Via Multi-Level Feature Alignment

Bin Zhang, Shengjie Zhao, Rongqing Zhang

Responsive image

Auto-TLDR; Cross-Domain Semantic Segmentation Using Generative Adversarial Networks

Slides Poster Similar

Semantic segmentation is an essential task in plenty of real-life applications such as virtual reality, video analysis, autonomous driving, etc. Recent advancements in fundamental vision-based tasks ranging from image classification to semantic segmentation have demonstrated deep learning-based models' high capability in learning complicated representation on large datasets. Nevertheless, manually labeling semantic segmentation dataset with pixel-level annotation is extremely labor-intensive. To address this problem, we propose a novel multi-level feature alignment framework for cross-domain semantic segmentation of urban scenes by exploiting generative adversarial networks. In the proposed multi-level feature alignment method, we first translate images from one domain to another one. Then the discriminative feature representations extracted by the deep neural network are concatenated, followed by domain adversarial learning to make the intermediate feature distribution of the target domain images close to those in the source domain. With these domain adaptation techniques, models trained with images in the source domain where the labels are easy to acquire can be deployed to the target domain where the labels are scarce. Experimental evaluations on various mainstream benchmarks confirm the effectiveness as well as robustness of our approach.

Unsupervised Multi-Task Domain Adaptation

Shih-Min Yang, Mei-Chen Yeh

Responsive image

Auto-TLDR; Unsupervised Domain Adaptation with Multi-task Learning for Image Recognition

Slides Poster Similar

With abundant labeled data, deep convolutional neural networks have shown great success in various image recognition tasks. However, these models are often less powerful when applied to novel datasets due to a phenomenon known as domain shift. Unsupervised domain adaptation methods aim to address this problem, allowing deep models trained on the labeled source domain to be used on a different target domain (without labels). In this paper, we investigate whether the generalization ability of an unsupervised domain adaptation method can be improved through multi-task learning, with learned features required to be both domain invariant and discriminative for multiple different but relevant tasks. Experiments evaluating two fundamental recognition tasks---including image recognition and segmentation--- show that the generalization ability empowered by multi-task learning may not benefit recognition when the model is directly applied on the target domain, but the multi-task setting can boost the performance of state-of-the-art unsupervised domain adaptation methods by a non-negligible margin.

Supervised Domain Adaptation Using Graph Embedding

Lukas Hedegaard, Omar Ali Sheikh-Omar, Alexandros Iosifidis

Responsive image

Auto-TLDR; Domain Adaptation from the Perspective of Multi-view Graph Embedding and Dimensionality Reduction

Slides Poster Similar

Getting deep convolutional neural networks to perform well requires a large amount of training data. When the available labelled data is small, it is often beneficial to use transfer learning to leverage a related larger dataset (source) in order to improve the performance on the small dataset (target). Among the transfer learning approaches, domain adaptation methods assume that distributions between the two domains are shifted and attempt to realign them. In this paper, we consider the domain adaptation problem from the perspective of multi-view graph embedding and dimensionality reduction. Instead of solving the generalised eigenvalue problem to perform the embedding, we formulate the graph-preserving criterion as loss in the neural network and learn a domain-invariant feature transformation in an end-to-end fashion. We show that the proposed approach leads to a powerful Domain Adaptation framework which generalises the prior methods CCSA and d-SNE, and enables simple and effective loss designs; an LDA-inspired instantiation of the framework leads to performance on par with the state-of-the-art on the most widely used Domain Adaptation benchmarks, Office31 and MNIST to USPS datasets.

Unsupervised Domain Adaptation with Multiple Domain Discriminators and Adaptive Self-Training

Teo Spadotto, Marco Toldo, Umberto Michieli, Pietro Zanuttigh

Responsive image

Auto-TLDR; Unsupervised Domain Adaptation for Semantic Segmentation of Urban Scenes

Slides Poster Similar

Unsupervised Domain Adaptation (UDA) aims at improving the generalization capability of a model trained on a source domain to perform well on a target domain for which no labeled data is available. In this paper, we consider the semantic segmentation of urban scenes and we propose an approach to adapt a deep neural network trained on synthetic data to real scenes addressing the domain shift between the two different data distributions. We introduce a novel UDA framework where a standard supervised loss on labeled synthetic data is supported by an adversarial module and a self-training strategy aiming at aligning the two domain distributions. The adversarial module is driven by a couple of fully convolutional discriminators dealing with different domains: the first discriminates between ground truth and generated maps, while the second between segmentation maps coming from synthetic or real world data. The self-training module exploits the confidence estimated by the discriminators on unlabeled data to select the regions used to reinforce the learning process. Furthermore, the confidence is thresholded with an adaptive mechanism based on the per-class overall confidence. Experimental results prove the effectiveness of the proposed strategy in adapting a segmentation network trained on synthetic datasets like GTA5 and SYNTHIA, to real world datasets like Cityscapes and Mapillary.

Self-Supervised Domain Adaptation with Consistency Training

Liang Xiao, Jiaolong Xu, Dawei Zhao, Zhiyu Wang, Li Wang, Yiming Nie, Bin Dai

Responsive image

Auto-TLDR; Unsupervised Domain Adaptation for Image Classification

Slides Poster Similar

We consider the problem of unsupervised domain adaptation for image classification. To learn target-domain-aware features from the unlabeled data, we create a self-supervised pretext task by augmenting the unlabeled data with a certain type of transformation (specifically, image rotation) and ask the learner to predict the properties of the transformation. However, the obtained feature representation may contain a large amount of irrelevant information with respect to the main task. To provide further guidance, we force the feature representation of the augmented data to be consistent with that of the original data. Intuitively, the consistency introduces additional constraints to representation learning, therefore, the learned representation is more likely to focus on the right information about the main task. Our experimental results validate the proposed method and demonstrate state-of-the-art performance on classical domain adaptation benchmarks.

Teacher-Student Competition for Unsupervised Domain Adaptation

Ruixin Xiao, Zhilei Liu, Baoyuan Wu

Responsive image

Auto-TLDR; Unsupervised Domain Adaption with Teacher-Student Competition

Slides Poster Similar

With the supervision from source domain only in class-level, existing unsupervised domain adaption (UDA) methods mainly learn the domain-invariant representations from a shared feature extractor, which cause the source-bias problem. This paper proposes an unsupervised domain adaption approach with Teacher-Student Competition (TSC). In particular, a student network is introduced to learn the target-specific feature space, and we design a novel competition mechanism to select more credible pseudo-labels for the training of student network. We introduce a teacher network with the structure of existing conventional UDA method, and both teacher and student networks compete to provide target pseudo-labels to constrain every target sample's training in student network. Extensive experiments demonstrate that our proposed TSC framework significantly outperforms the state-of-the-art domain adaption methods on Office-31 and ImageCLEF-DA benchmarks.

A Simple Domain Shifting Network for Generating Low Quality Images

Guruprasad Hegde, Avinash Nittur Ramesh, Kanchana Vaishnavi Gandikota, Michael Möller, Roman Obermaisser

Responsive image

Auto-TLDR; Robotic Image Classification Using Quality degrading networks

Slides Poster Similar

Deep Learning systems have proven to be extremely successful for image recognition tasks for which significant amounts of training data is available, e.g., on the famous ImageNet dataset. We demonstrate that for robotics applications with cheap camera equipment, the low image quality, however, influences the classification accuracy, and freely available data bases cannot be exploited in a straight forward way to train classifiers to be used on a robot. As a solution we propose to train a network on degrading the quality images in order to mimic specific low quality imaging systems. Numerical experiments demonstrate that classification networks trained by using images produced by our quality degrading network along with the high quality images outperform classification networks trained only on high quality data when used on a real robot system, while being significantly easier to use than competing zero-shot domain adaptation techniques.

A Unified Framework for Distance-Aware Domain Adaptation

Fei Wang, Youdong Ding, Huan Liang, Yuzhen Gao, Wenqi Che

Responsive image

Auto-TLDR; distance-aware domain adaptation

Slides Poster Similar

Unsupervised domain adaptation has achieved significant results by leveraging knowledge from a source domain to learn a related but unlabeled target domain. Previous methods are insufficient to model domain discrepancy and class discrepancy, which may lead to misalignment and poor adaptation performance. To address this problem, in this paper, we propose a unified framework, called distance-aware domain adaptation, which is fully aware of both cross-domain distance and class-discriminative distance. In addition, second-order statistics distance and manifold alignment are also exploited to extract more information from data. In this manner, the generalization error of the target domain in classification problems can be reduced substantially. To validate the proposed method, we conducted experiments on five public datasets and an ablation study. The results demonstrate the good performance of our proposed method.

Local Clustering with Mean Teacher for Semi-Supervised Learning

Zexi Chen, Benjamin Dutton, Bharathkumar Ramachandra, Tianfu Wu, Ranga Raju Vatsavai

Responsive image

Auto-TLDR; Local Clustering for Semi-supervised Learning

Slides Similar

The Mean Teacher (MT) model of Tarvainen and Valpola has shown favorable performance on several semi-supervised benchmark datasets. MT maintains a teacher model's weights as the exponential moving average of a student model's weights and minimizes the divergence between their probability predictions under diverse perturbations of the inputs. However, MT is known to suffer from confirmation bias, that is, reinforcing incorrect teacher model predictions. In this work, we propose a simple yet effective method called Local Clustering (LC) to mitigate the effect of confirmation bias. In MT, each data point is considered independent of other points during training; however, data points are likely to be close to each other in feature space if they share similar features. Motivated by this, we cluster data points locally by minimizing the pairwise distance between neighboring data points in feature space. Combined with a standard classification cross-entropy objective on labeled data points, the misclassified unlabeled data points are pulled towards high-density regions of their correct class with the help of their neighbors, thus improving model performance. We demonstrate on semi-supervised benchmark datasets SVHN and CIFAR-10 that adding our LC loss to MT yields significant improvements compared to MT and performance comparable to the state of the art in semi-supervised learning.

Shape Consistent 2D Keypoint Estimation under Domain Shift

Levi Vasconcelos, Massimiliano Mancini, Davide Boscaini, Barbara Caputo, Elisa Ricci

Responsive image

Auto-TLDR; Deep Adaptation for Keypoint Prediction under Domain Shift

Slides Poster Similar

Recent unsupervised domain adaptation methods based on deep architectures have shown remarkable performance not only in traditional classification tasks but also in more complex problems involving structured predictions (e.g. semantic segmentation, depth estimation). Following this trend, in this paper we present a novel deep adaptation framework for estimating keypoints under \textit{domain shift}, i.e. when the training (\textit{source}) and the test (\textit{target}) images significantly differ in terms of visual appearance. Our method seamlessly combines three different components: feature alignment, adversarial training and self-supervision. Specifically, our deep architecture leverages from domain-specific distribution alignment layers to perform target adaptation at the feature level. Furthermore, a novel loss is proposed which combines an adversarial term for ensuring aligned predictions in the output space and a geometric consistency term which guarantees coherent predictions between a target sample and its perturbed version. Our extensive experimental evaluation conducted on three publicly available benchmarks shows that our approach outperforms state-of-the-art domain adaptation methods in the 2D keypoint prediction task.

Semi-Supervised Domain Adaptation Via Selective Pseudo Labeling and Progressive Self-Training

Yoonhyung Kim, Changick Kim

Responsive image

Auto-TLDR; Semi-supervised Domain Adaptation with Pseudo Labels

Slides Poster Similar

Domain adaptation (DA) is a representation learning methodology that transfers knowledge from a label-sufficient source domain to a label-scarce target domain. While most of early methods are focused on unsupervised DA (UDA), several studies on semi-supervised DA (SSDA) are recently suggested. In SSDA, a small number of labeled target images are given for training, and the effectiveness of those data is demonstrated by the previous studies. However, the previous SSDA approaches solely adopt those data for embedding ordinary supervised losses, overlooking the potential usefulness of the few yet informative clues. Based on this observation, in this paper, we propose a novel method that further exploits the labeled target images for SSDA. Specifically, we utilize labeled target images to selectively generate pseudo labels for unlabeled target images. In addition, based on the observation that pseudo labels are inevitably noisy, we apply a label noise-robust learning scheme, which progressively updates the network and the set of pseudo labels by turns. Extensive experimental results show that our proposed method outperforms other previous state-of-the-art SSDA methods.

Revisiting ImprovedGAN with Metric Learning for Semi-Supervised Learning

Jaewoo Park, Yoon Gyo Jung, Andrew Teoh

Responsive image

Auto-TLDR; Improving ImprovedGAN with Metric Learning for Semi-supervised Learning

Slides Poster Similar

Semi-supervised Learning (SSL) is a classical problem where a model needs to solve classification as it is trained on a partially labeled train data. After the introduction of generative adversarial network (GAN) and its success, the model has been modified to be applicable to SSL. ImprovedGAN as a representative model for GAN-based SSL, it showed promising performance on the SSL problem. However, the inner mechanism of this model has been only partially revealed. In this work, we revisit ImprovedGAN with a fresh perspective based on metric learning. In particular, we interpret ImprovedGAN by general pair weighting, a recent framework in metric learning. Based on this interpretation, we derive two theoretical properties of ImprovedGAN: (i) its discriminator learns to make confident predictions over real samples, (ii) the adversarial interaction in ImprovedGAN along with semi-supervision results in cluster separation by reducing intra-class variance and increasing the inter-class variance, thereby improving the model generalization. These theoretical implications are experimentally supported. Motivated by the findings, we propose a variant of ImprovedGAN, called Intensified ImprovedGAN (I2GAN), where its cluster separation characteristic is enhanced by two proposed techniques: (a) the unsupervised discriminator loss is scaled up and (b) the generated batch size is enlarged. As a result, I2GAN produces better class-wise cluster separation and, hence, generalization. Extensive experiments on the widely known benchmark data sets verify the effectiveness of our proposed method, showing that its performance is better than or comparable to other GAN based SSL models.

Randomized Transferable Machine

Pengfei Wei, Tze Yun Leong

Responsive image

Auto-TLDR; Randomized Transferable Machine for Suboptimal Feature-based Transfer Learning

Slides Poster Similar

Feature-based transfer method is one of the most effective methodologies for transfer learning. Existing works usually claim the learned new feature representation is truly \emph{domain-invariant}, and thus directly train a transfer model $\mathcal{M}$ on source domain. In this paper, we work on a more realistic scenario where the new feature representation is suboptimal where small divergence still exists across domains. We propose a new learning strategy and name the transfer model following the learning strategy as Randomized Transferable Machine (RTM). More specifically, we work on source data with the new feature representation learned from existing feature-based transfer methods. Our key idea is to enlarge source training data populations by randomly corrupting source data using some noises, and then train a transfer model $\widetilde{\mathcal{M}}$ performing well on all these corrupted source data populations. In principle, the more corruptions are made, the higher probability of the target data can be covered by the constructed source populations and thus a better transfer performance can be achieved by $\widetilde{\mathcal{M}}$. An ideal case is with infinite corruptions, which however is infeasible in reality. We instead develop a marginalized solution. With a marginalization trick, we can train an RTM that is equivalently trained using infinite source noisy populations without truly conducting any corruption. More importantly, such an RTM has a closed-form solution, which enables a super fast and efficient training. Extensive experiments on various real-world transfer tasks show that RTM is a very promising transfer model.

GAP: Quantifying the Generative Adversarial Set and Class Feature Applicability of Deep Neural Networks

Edward Collier, Supratik Mukhopadhyay

Responsive image

Auto-TLDR; Approximating Adversarial Learning in Deep Neural Networks Using Set and Class Adversaries

Slides Poster Similar

Recent work in deep neural networks has sought to characterize the nature in which a network learns features and how applicable learnt features are to various problem sets. Deep neural network applicability can be split into three sub-problems; set applicability, class applicability, and instance applicability. In this work we seek to quantify the applicability of features learned during adversarial training, focusing specifically on set and class applicability. We apply techniques for measuring applicability to both generators and discriminators trained on various data sets to quantify applicability and better observe how both a generator and a discriminator, and generative models as a whole, learn features during adversarial training.

Unsupervised Domain Adaptation for Person Re-Identification through Source-Guided Pseudo-Labeling

Fabian Dubourvieux, Romaric Audigier, Angélique Loesch, Ainouz-Zemouche Samia, Stéphane Canu

Responsive image

Auto-TLDR; Pseudo-labeling for Unsupervised Domain Adaptation for Person Re-Identification

Slides Poster Similar

Person Re-Identification (re-ID) aims at retrieving images of the same person taken by different cameras. A challenge for re-ID is the performance preservation when a model is used on data of interest (target data) which belong to a different domain from the training data domain (source data). Unsupervised Domain Adaptation (UDA) is an interesting research direction for this challenge as it avoids a costly annotation of the target data. Pseudo-labeling methods achieve the best results in UDA-based re-ID. They incrementally learn with identity pseudo-labels which are initialized by clustering features in the source re-ID encoder space. Surprisingly, labeled source data are discarded after this initialization step. However, we believe that pseudo-labeling could further leverage the labeled source data in order to improve the post-initialization training steps. In order to improve robustness against erroneous pseudo-labels, we advocate the exploitation of both labeled source data and pseudo-labeled target data during all training iterations. To support our guideline, we introduce a framework which relies on a two-branch architecture optimizing classification in source and target domains, respectively, in order to allow adaptability to the target domain while ensuring robustness to noisy pseudo-labels. Indeed, shared low and mid-level parameters benefit from the source classification signal while high-level parameters of the target branch learn domain-specific features. Our method is simple enough to be easily combined with existing pseudo-labeling UDA approaches. We show experimentally that it is efficient and improves performance when the base method has no mechanism to deal with pseudo-label noise. And it maintains performance when combined with base method that already manages pseudo-label noise. Our approach reaches state-of-the-art performance when evaluated on commonly used datasets, Market-1501 and DukeMTMC-reID, and outperforms the state of the art when targeting the bigger and more challenging dataset MSMT.

Respecting Domain Relations: Hypothesis Invariance for Domain Generalization

Ziqi Wang, Marco Loog, Jan Van Gemert

Responsive image

Auto-TLDR; Learning Hypothesis Invariant Representations for Domain Generalization

Slides Poster Similar

In domain generalization, multiple labeled non-independent and non-identically distributed source domains are available during training while neither the data nor the labels of target domains are. Currently, learning so-called domain invariant representations (DIRs) is the prevalent approach to domain generalization. In this work, we define DIRs employed by existing works in probabilistic terms and show that by learning DIRs, overly strict requirements are imposed concerning the invariance. Particularly, DIRs aim to perfectly align representations of different domains, i.e. their input distributions. This is, however, not necessary for good generalization to a target domain and may even dispose of valuable classification information. We propose to learn so-called hypothesis invariant representations (HIRs), which relax the invariance assumptions. We report experimental results on public domain generalization datasets to show that learning HIRs is more effective than learning DIRs. In fact, our approach can even compete with approaches using prior knowledge about domains.

Energy-Constrained Self-Training for Unsupervised Domain Adaptation

Xiaofeng Liu, Xiongchang Liu, Bo Hu, Jun Lu, Jonghye Woo, Jane You

Responsive image

Auto-TLDR; Unsupervised Domain Adaptation with Energy Function Minimization

Slides Poster Similar

Unsupervised domain adaptation (UDA) aims to transfer the knowledge on a labeled source domain distribution to perform well on an unlabeled target domain. Recently, the deep self-training involves an iterative process of predicting on the target domain and then taking the confident predictions as hard pseudo-labels for retraining. However, the pseudo-labels are usually unreliable, and easily leading to deviated solutions with propagated errors. In this paper, we resort to the energy-based model and constrain the training of the unlabeled target sample with the energy function minimization objective. It can be applied as a simple additional regularization. In this framework, it is possible to gain the benefits of the energy-based model, while retaining strong discriminative performance following a plug-and-play fashion. The convergence property and its connection with classification expectation minimization are investigated. We deliver extensive experiments on the most popular and large scale UDA benchmarks of image classification as well as semantic segmentation to demonstrate its generality and effectiveness.

Foreground-Focused Domain Adaption for Object Detection

Yuchen Yang, Nilanjan Ray

Responsive image

Auto-TLDR; Unsupervised Domain Adaptation for Unsupervised Object Detection

Slides Similar

Object detectors suffer from accuracy loss caused by domain shift from a source to a target domain. Unsupervised domain adaptation (UDA) approaches mitigate this loss by training with unlabeled target domain images. A popular processing pipeline applies adversarial training that aligns the distributions of the features from the two domains. We advocate that aligning the full image level features is not ideal for UDA object detection due to the presence of varied background areas during inference. Thus, we propose a novel foreground-focused domain adaptation (FFDA) framework which mines the loss of the domain discriminators to concentrate on the backpropagation of foreground loss. We obtain mining masks by collecting target predictions and source labels to outline foreground regions, and apply the masks to image and instance level domain discriminators to allow backpropagation only on the mined regions. By reinforcing this foreground-focused adaptation throughout multiple layers in the detector model, we gain a significant accuracy boost on the target domain prediction. Compared to previous works, our method reaches the new state-of-the-art accuracy on adapting Cityscape to Foggy Cityscape dataset and demonstrates competitive accuracy on other datasets that include various scenarios for autonomous driving applications.

Open Set Domain Recognition Via Attention-Based GCN and Semantic Matching Optimization

Xinxing He, Yuan Yuan, Zhiyu Jiang

Responsive image

Auto-TLDR; Attention-based GCN and Semantic Matching Optimization for Open Set Domain Recognition

Slides Poster Similar

Open set domain recognition has got the attention in recent years. The task aims to specifically classify each sample in the practical unlabeled target domain, which consists of all known classes in the manually labeled source domain and target-specific unknown categories. The absence of annotated training data or auxiliary attribute information for unknown categories makes this task especially difficult. Moreover, exiting domain discrepancy in label space and data distribution further distracts the knowledge transferred from known classes to unknown classes. To address these issues, this work presents an end-to-end model based on attention-based GCN and semantic matching optimization, which first employs the attention mechanism to enable the central node to learn more discriminating representations from its neighbors in the knowledge graph. Moreover, a coarse-to-fine semantic matching optimization approach is proposed to progressively bridge the domain gap. Experimental results validate that the proposed model not only has superiority on recognizing the images of known and unknown classes, but also can adapt to various openness of the target domain.

IDA-GAN: A Novel Imbalanced Data Augmentation GAN

Hao Yang, Yun Zhou

Responsive image

Auto-TLDR; IDA-GAN: Generative Adversarial Networks for Imbalanced Data Augmentation

Slides Poster Similar

Class imbalance is a widely existed and challenging problem in real-world applications such as disease diagnosis, fraud detection, network intrusion detection and so on. Due to the scarce of data, it could significantly deteriorate the accuracy of classification. To address this challenge, we propose a novel Imbalanced Data Augmentation Generative Adversarial Networks (GAN) named IDA-GAN as an augmentation tool to deal with the imbalanced dataset. This is a great challenge because it is hard to train a GAN model under this situation. We overcome this issue by coupling Variational autoencoder along with GAN training. Specifically, we introduce the Variational autoencoder to learn the majority and minority class distributions in the latent space, and use the generative model to utilize each class distribution for the subsequent GAN training. The generative model learns useful features to generate target minority-class samples. By comparing with the state-of-the-art GAN models, the experimental results demonstrate that our proposed IDA-GAN could generate more diverse minority samples with better qualities, and it consistently benefits the imbalanced classification task in terms of several widely-used evaluation metrics on five benchmark datasets: MNIST, Fashion-MNIST, SVHN, CIFAR-10 and GTRSB.

Rethinking Domain Generalization Baselines

Francesco Cappio Borlino, Antonio D'Innocente, Tatiana Tommasi

Responsive image

Auto-TLDR; Style Transfer Data Augmentation for Domain Generalization

Slides Poster Similar

Despite being very powerful in standard learning settings, deep learning models can be extremely brittle when deployed in scenarios different from those on which they were trained. Domain generalization methods investigate this problem and data augmentation strategies have shown to be helpful tools to increase data variability, supporting model robustness across domains. In our work we focus on style transfer data augmentation and we present how it can be implemented with a simple and inexpensive strategy to improve generalization. Moreover, we analyze the behavior of current state of the art domain generalization methods when integrated with this augmentation solution: our thorough experimental evaluation shows that their original effect almost always disappears with respect to the augmented baseline. This issue open new scenarios for domain generalization research, highlighting the need of novel methods properly able to take advantage of the introduced data variability.

Learning Low-Shot Generative Networks for Cross-Domain Data

Hsuan-Kai Kao, Cheng-Che Lee, Wei-Chen Chiu

Responsive image

Auto-TLDR; Learning Generators for Cross-Domain Data under Low-Shot Learning

Slides Poster Similar

We tackle a novel problem of learning generators for cross-domain data under a specific scenario of low-shot learning. Basically, given a source domain with sufficient amount of training data, we aim to transfer the knowledge of its generative process to another target domain, which not only has few data samples but also contains the domain shift with respect to the source domain. This problem has great potential in practical use and is different from the well-known image translation task, as the target-domain data can be generated without requiring any source-domain ones and the large data consumption for learning target-domain generator can be alleviated. Built upon a cross-domain dataset where (1) each of the low shots in the target domain has its correspondence in the source and (2) these two domains share the similar content information but different appearance, two approaches are proposed: a Latent-Disentanglement-Orientated model (LaDo) and a Generative-Hierarchy-Oriented (GenHo) model. Our LaDo and GenHo approaches address the problem from different perspectives, where the former relies on learning the disentangled representation composed of domain-invariant content features and domain-specific appearance ones; while the later decomposes the generative process of a generator into two parts for synthesizing the content and appearance sequentially. We perform extensive experiments under various settings of cross-domain data and show the efficacy of our models for generating target-domain data with the abundant content variance as in the source domain, which lead to the favourable performance in comparison to several baselines.

Text Recognition in Real Scenarios with a Few Labeled Samples

Jinghuang Lin, Cheng Zhanzhan, Fan Bai, Yi Niu, Shiliang Pu, Shuigeng Zhou

Responsive image

Auto-TLDR; Few-shot Adversarial Sequence Domain Adaptation for Scene Text Recognition

Slides Poster Similar

Scene text recognition (STR) is still a hot research topic in computer vision field due to its various applications. Existing works mainly focus on learning a general model with a huge number of synthetic text images to recognize unconstrained scene texts, and have achieved substantial progress. However, these methods are not quite applicable in many real-world scenarios where 1) high recognition accuracy is required, while 2) labeled samples are lacked. To tackle this challenging problem, this paper proposes a few-shot adversarial sequence domain adaptation (FASDA) approach to build sequence adaptation between the synthetic source domain (with many synthetic labeled samples) and a specific target domain (with only some or a few real labeled samples). This is done by simultaneously learning each character’s feature representation with an attention mech- anism and establishing the corresponding character-level latent subspace with adversarial learning. Our approach can maximize the character-level confusion between the source domain and the target domain, thus achieves the sequence-level adaptation with even a small number of labeled samples in the target domain. Extensive experiments on various datasets show that our method significantly outperforms the finetuning scheme, and obtains comparable performance to the state-of-the-art STR methods.

Efficient Shadow Detection and Removal Using Synthetic Data with Domain Adaptation

Rui Guo, Babajide Ayinde, Hao Sun

Responsive image

Auto-TLDR; Shadow Detection and Removal with Domain Adaptation and Synthetic Image Database

Poster Similar

In recent years, learning based shadow detection and removal approaches have shown prospects and, in most cases, yielded state-of-the-art results. The performance of these approaches, however, relies heavily on the construction of training database of shadow images, shadow-free versions, and shadow maps as ground truth. This conventional data gathering method is time-consuming, expensive, or even practically intractable to realize especially for outdoor scenes with complicated shadow patterns, thus limiting the size of the data available for training. In this paper, we leverage on large high quality synthetic image database and domain adaptation to eliminate the bottlenecks resulting from insufficient training samples and domain bias. Specifically, our approach utilizes adversarial training to predict near-pixel-perfect shadow map from synthetic shadow image for downstream shadow removal steps. At inference time, we capitalize on domain adaptation via image style transfer to map the style of real- world scene to that of synthetic scene for the purpose of detecting and subsequently removing shadow. Comprehensive experiments indicate that our approach outperforms state-of-the-art methods on select benchmark datasets.

Domain Generalized Person Re-Identification Via Cross-Domain Episodic Learning

Ci-Siang Lin, Yuan Chia Cheng, Yu-Chiang Frank Wang

Responsive image

Auto-TLDR; Domain-Invariant Person Re-identification with Episodic Learning

Slides Poster Similar

Aiming at recognizing images of the same person across distinct camera views, person re-identification (re-ID) has been among active research topics in computer vision. Most existing re-ID works require collection of a large amount of labeled image data from the scenes of interest. When the data to be recognized are different from the source-domain training ones, a number of domain adaptation approaches have been proposed. Nevertheless, one still needs to collect labeled or unlabelled target-domain data during training. In this paper, we tackle an even more challenging and practical setting, domain generalized (DG) person re-ID. That is, while a number of labeled source-domain datasets are available, we do not have access to any target-domain training data. In order to learn domain-invariant features without knowing the target domain of interest, we present an episodic learning scheme which advances meta learning strategies to exploit the observed source-domain labeled data. The learned features would exhibit sufficient domain-invariant properties while not overfitting the source-domain data or ID labels. Our experiments on four benchmark datasets confirm the superiority of our method over the state-of-the-arts.

Spatial-Aware GAN for Unsupervised Person Re-Identification

Fangneng Zhan, Changgong Zhang

Responsive image

Auto-TLDR; Unsupervised Unsupervised Domain Adaptation for Person Re-Identification

Similar

The recent person re-identification research has achieved great success by learning from a large number of labeled person images. On the other hand, the learned models often experience significant performance drops when applied to images collected in a different environment. Unsupervised domain adaptation (UDA) has been investigated to mitigate this constraint, but most existing systems adapt images at pixel level only and ignore obvious discrepancies at spatial level. This paper presents an innovative UDA-based person re-identification network that is capable of adapting images at both spatial and pixel levels simultaneously. A novel disentangled cycle-consistency loss is designed which guides the learning of spatial-level and pixel-level adaptation in a collaborative manner. In addition, a novel multi-modal mechanism is incorporated which is capable of generating images of different geometry views and augmenting training images effectively. Extensive experiments over a number of public datasets show that the proposed UDA network achieves superior person re-identification performance as compared with the state-of-the-art.

Data Augmentation Via Mixed Class Interpolation Using Cycle-Consistent Generative Adversarial Networks Applied to Cross-Domain Imagery

Hiroshi Sasaki, Chris G. Willcocks, Toby Breckon

Responsive image

Auto-TLDR; C2GMA: A Generative Domain Transfer Model for Non-visible Domain Classification

Slides Poster Similar

Machine learning driven object detection and classification within non-visible imagery has an important role in many fields such as night vision, all-weather surveillance and aviation security. However, such applications often suffer due to the limited quantity and variety of non-visible spectral domain imagery, in contrast to the high data availability of visible-band imagery that readily enables contemporary deep learning driven detection and classification approaches. To address this problem, this paper proposes and evaluates a novel data augmentation approach that leverages the more readily available visible-band imagery via a generative domain transfer model. The model can synthesise large volumes of non-visible domain imagery by image-to-image (I2I) translation from the visible image domain. Furthermore, we show that the generation of interpolated mixed class (non-visible domain) image examples via our novel Conditional CycleGAN Mixup Augmentation (C2GMA) methodology can lead to a significant improvement in the quality of non-visible domain classification tasks that otherwise suffer due to limited data availability. Focusing on classification within the Synthetic Aperture Radar (SAR) domain, our approach is evaluated on a variation of the Statoil/C-CORE Iceberg Classifier Challenge dataset and achieves 75.4% accuracy, demonstrating a significant improvement when compared against traditional data augmentation strategies (Rotation, Mixup, and MixCycleGAN).

Beyond Cross-Entropy: Learning Highly Separable Feature Distributions for Robust and Accurate Classification

Arslan Ali, Andrea Migliorati, Tiziano Bianchi, Enrico Magli

Responsive image

Auto-TLDR; Gaussian class-conditional simplex loss for adversarial robust multiclass classifiers

Slides Poster Similar

Deep learning has shown outstanding performance in several applications including image classification. However, deep classifiers are known to be highly vulnerable to adversarial attacks, in that a minor perturbation of the input can easily lead to an error. Providing robustness to adversarial attacks is a very challenging task especially in problems involving a large number of classes, as it typically comes at the expense of an accuracy decrease. In this work, we propose the Gaussian class-conditional simplex (GCCS) loss: a novel approach for training deep robust multiclass classifiers that provides adversarial robustness while at the same time achieving or even surpassing the classification accuracy of state-of-the-art methods. Differently from other frameworks, the proposed method learns a mapping of the input classes onto target distributions in a latent space such that the classes are linearly separable. Instead of maximizing the likelihood of target labels for individual samples, our objective function pushes the network to produce feature distributions yielding high inter-class separation. The mean values of the distributions are centered on the vertices of a simplex such that each class is at the same distance from every other class. We show that the regularization of the latent space based on our approach yields excellent classification accuracy and inherently provides robustness to multiple adversarial attacks, both targeted and untargeted, outperforming state-of-the-art approaches over challenging datasets.

Semi-Supervised Generative Adversarial Networks with a Pair of Complementary Generators for Retinopathy Screening

Yingpeng Xie, Qiwei Wan, Hai Xie, En-Leng Tan, Yanwu Xu, Baiying Lei

Responsive image

Auto-TLDR; Generative Adversarial Networks for Retinopathy Diagnosis via Fundus Images

Slides Poster Similar

Several typical types of retinopathy are major causes of blindness. However, early detection of retinopathy is quite not easy since few symptoms are observable in the early stage, attributing to the development of non-mydriatic retinal camera. These camera produces high-resolution retinal fundus images provide the possibility of Computer-Aided-Diagnosis (CAD) via deep learning to assist diagnosing retinopathy. Deep learning algorithms usually rely on a great number of labelled images which are expensive and time-consuming to obtain in the medical imaging area. Moreover, the random distribution of various lesions which often vary greatly in size also brings significant challenges to learn discriminative information from high-resolution fundus image. In this paper, we present generative adversarial networks simultaneously equipped with "good" generator and "bad" generator (GBGANs) to make up for the incomplete data distribution provided by limited fundus images. To improve the generative feasibility of generator, we introduce into pre-trained feature extractor to acquire condensed feature for each fundus image in advance. Experimental results on integrated three public iChallenge datasets show that the proposed GBGANs could fully utilize the available fundus images to identify retinopathy with little label cost.

DAPC: Domain Adaptation People Counting Via Style-Level Transfer Learning and Scene-Aware Estimation

Na Jiang, Xingsen Wen, Zhiping Shi

Responsive image

Auto-TLDR; Domain Adaptation People counting via Style-Level Transfer Learning and Scene-Aware Estimation

Slides Poster Similar

People counting concentrates on predicting the number of people in surveillance images. It remains challenging due to the rich variations in scene type and crowd density. Besides, the limited closed-set with ground truth from reality significantly increase the difficulty of people counting in actual open-set. Targeting to solve these problems, this paper proposes a domain adaptation people counting via style-level transfer learning (STL) and scene-aware estimation (SAE). The style-level transfer learning explicitly leverages the style constraint and content similarity between images to learn effective knowledge transfer, which narrows the gap between closed-set and open-set by generating domain adaptation images. The scene-aware estimation introduces scene classifier to provide scene-aware weights for adaptively fusing density maps, which alleviates interference of variations in scene type and crowd density on domain adaptation people counting. Extensive experimental results demonstrate that images generated by STL are more suitable for domain adaptation learning and our proposed approach significantly outperforms the state-of-the-art methods on multiple cross-domain pairs.

Augmented Cyclic Consistency Regularization for Unpaired Image-To-Image Translation

Takehiko Ohkawa, Naoto Inoue, Hirokatsu Kataoka, Nakamasa Inoue

Responsive image

Auto-TLDR; Augmented Cyclic Consistency Regularization for Unpaired Image-to-Image Translation

Slides Poster Similar

Unpaired image-to-image (I2I) translation has received considerable attention in pattern recognition and computer vision because of recent advancements in generative adversarial networks (GANs). However, due to the lack of explicit supervision, unpaired I2I models often fail to generate realistic images, especially in challenging datasets with different backgrounds and poses. Hence, stabilization is indispensable for real-world applications and GANs. Herein, we propose Augmented Cyclic Consistency Regularization (ACCR), a novel regularization method for unpaired I2I translation. Our main idea is to enforce consistency regularization originating from semi-supervised learning on the discriminators leveraging real, fake, reconstructed, and augmented samples. We regularize the discriminators to output similar predictions when fed pairs of original and perturbed images. We qualitatively clarify the generation property between unpaired I2I models and standard GANs, and explain why consistency regularization on fake and reconstructed samples works well. Quantitatively, our method outperforms the consistency regularized GAN (CR-GAN) in real-world translations and demonstrates efficacy against several data augmentation variants and cycle-consistent constraints.

CANU-ReID: A Conditional Adversarial Network for Unsupervised Person Re-IDentification

Guillaume Delorme, Yihong Xu, Stéphane Lathuiliere, Radu Horaud, Xavier Alameda-Pineda

Responsive image

Auto-TLDR; Unsupervised Person Re-Identification with Clustering and Adversarial Learning

Slides Similar

Unsupervised person re-ID is the task of identifying people on a target data set for which the ID labels are unavailable during training. In this paper, we propose to unify two trends in unsupervised person re-ID: clustering & fine-tuning and adversarial learning. On one side, clustering groups training images into pseudo-ID labels, and uses them to fine-tune the feature extractor. On the other side, adversarial learning is used, inspired by domain adaptation, to match distributions from different domains. Since target data is distributed across different camera viewpoints, we propose to model each camera as an independent domain, and aim to learn domain-independent features. Straightforward adversarial learning yields negative transfer, we thus introduce a conditioning vector to mitigate this undesirable effect. In our framework, the centroid of the cluster to which the visual sample belongs is used as conditioning vector of our conditional adversarial network, where the vector is permutation invariant (clusters ordering does not matter) and its size is independent of the number of clusters. To our knowledge, we are the first to propose the use of conditional adversarial networks for unsupervised person re-ID. We evaluate the proposed architecture on top of two state-of-the-art clustering-based unsupervised person re-identification (re-ID) methods on four different experimental settings with three different data sets and set the new state-of-the-art performance on all four of them. Our code and model will be made publicly available at https://team.inria.fr/perception/canu-reid/.

On the Evaluation of Generative Adversarial Networks by Discriminative Models

Amirsina Torfi, Mohammadreza Beyki, Edward Alan Fox

Responsive image

Auto-TLDR; Domain-agnostic GAN Evaluation with Siamese Neural Networks

Slides Poster Similar

Generative Adversarial Networks (GANs) can accurately model complex multi-dimensional data and generate realistic samples. However, due to their implicit estimation of data distributions, their evaluation is a challenging task. The majority of research efforts associated with tackling this issue were validated by qualitative visual evaluation. Such approaches do not generalize well beyond the image domain. Since many of those evaluation metrics are proposed and bound to the vision domain, they are difficult to apply to other domains. Quantitative measures are necessary to better guide the training and comparison of different GANs models. In this work, we leverage Siamese neural networks to propose a domain-agnostic evaluation metric: (1) with a qualitative evaluation that is consistent with human evaluation, (2) that is robust relative to common GAN issues such as mode dropping and invention, and (3) does not require any pretrained classifier. The empirical results in this paper demonstrate the superiority of this method compared to the popular Inception Score and are competitive with the FID score.

Image Representation Learning by Transformation Regression

Xifeng Guo, Jiyuan Liu, Sihang Zhou, En Zhu, Shihao Dong

Responsive image

Auto-TLDR; Self-supervised Image Representation Learning using Continuous Parameter Prediction

Slides Poster Similar

Self-supervised learning is a thriving research direction since it can relieve the burden of human labeling for machine learning by seeking for supervision from data instead of human annotation. Although demonstrating promising performance in various applications, we observe that the existing methods usually model the auxiliary learning tasks as classification tasks with finite discrete labels, leading to insufficient supervisory signals, which in turn restricts the representation quality. In this paper, to solve the above problem and make full use of the supervision from data, we design a regression model to predict the continuous parameters of a group of transformations, i.e., image rotation, translation, and scaling. Surprisingly, this naive modification stimulates tremendous potential from data and the resulting supervisory signal has largely improved the performance of image representation learning. Extensive experiments on four image datasets, including CIFAR10, CIFAR100, STL10, and SVHN, indicate that our proposed algorithm outperforms the state-of-the-art unsupervised learning methods by a large margin in terms of classification accuracy. Crucially, we find that with our proposed training mechanism as an initialization, the performance of the existing state-of-the-art classification deep architectures can be preferably improved.

Multi-Modal Deep Clustering: Unsupervised Partitioning of Images

Guy Shiran, Daphna Weinshall

Responsive image

Auto-TLDR; Multi-Modal Deep Clustering for Unlabeled Images

Slides Poster Similar

The clustering of unlabeled raw images is a daunting task, which has recently been approached with some success by deep learning methods. Here we propose an unsupervised clustering framework, which learns a deep neural network in an end-to-end fashion, providing direct cluster assignments of images without additional processing. Multi-Modal Deep Clustering (MMDC), trains a deep network to align its image embeddings with target points sampled from a Gaussian Mixture Model distribution. The cluster assignments are then determined by mixture component association of image embeddings. Simultaneously, the same deep network is trained to solve an additional self-supervised task. This pushes the network to learn more meaningful image representations and stabilizes the training. Experimental results show that MMDC achieves or exceeds state-of-the-art performance on four challenging benchmarks. On natural image datasets we improve on previous results with significant margins of up to 11% absolute accuracy points, yielding an accuracy of 70% on CIFAR-10 and 61% on STL-10.

Local Facial Attribute Transfer through Inpainting

Ricard Durall, Franz-Josef Pfreundt, Janis Keuper

Responsive image

Auto-TLDR; Attribute Transfer Inpainting Generative Adversarial Network

Slides Poster Similar

The term attribute transfer refers to the tasks of altering images in such a way, that the semantic interpretation of a given input image is shifted towards an intended direction, which is quantified by semantic attributes. Prominent example applications are photo realistic changes of facial features and expressions, like changing the hair color, adding a smile, enlarging the nose or altering the entire context of a scene, like transforming a summer landscape into a winter panorama. Recent advances in attribute transfer are mostly based on generative deep neural networks, using various techniques to manipulate images in the latent space of the generator. In this paper, we present a novel method for the common sub-task of local attribute transfers, where only parts of a face have to be altered in order to achieve semantic changes (e.g. removing a mustache). In contrast to previous methods, where such local changes have been implemented by generating new (global) images, we propose to formulate local attribute transfers as an inpainting problem. Removing and regenerating only parts of images, our Attribute Transfer Inpainting Generative Adversarial Network (ATI-GAN) is able to utilize local context information to focus on the attributes while keeping the background unmodified resulting in visually sound results.

A Joint Representation Learning and Feature Modeling Approach for One-Class Recognition

Pramuditha Perera, Vishal Patel

Responsive image

Auto-TLDR; Combining Generative Features and One-Class Classification for Effective One-class Recognition

Slides Poster Similar

One-class recognition is traditionally approached either as a representation learning problem or a feature modelling problem. In this work, we argue that both of these approaches have their own limitations; and a more effective solution can be obtained by combining the two. The proposed approach is based on the combination of a generative framework and a one-class classification method. First, we learn generative features using the one-class data with a generative framework. We augment the learned features with the corresponding reconstruction errors to obtain augmented features. Then, we qualitatively identify a suitable feature distribution that reduces the redundancy in the chosen classifier space. Finally, we force the augmented features to take the form of this distribution using an adversarial framework. We test the effectiveness of the proposed method on three one-class classification tasks and obtain state-of-the-art results.

Generative Latent Implicit Conditional Optimization When Learning from Small Sample

Idan Azuri, Daphna Weinshall

Responsive image

Auto-TLDR; GLICO: Generative Latent Implicit Conditional Optimization for Small Sample Learning

Slides Poster Similar

We revisit the long-standing problem of learning from small sample. The generation of new samples from a small training set of labeled points has attracted increased attention in recent years. In this paper, we propose a novel such method called GLICO (Generative Latent Implicit Conditional Optimization). GLICO learns a mapping from the training examples to a latent space and a generator that generates images from vectors in the latent space. Unlike most recent work, which rely on access to large amounts of unlabeled data, GLICO does not require access to any additional data other than the small set of labeled points. In fact, GLICO learns to synthesize completely new samples for every class using as little as 5 or 10 examples per class, with as few as 10 such classes and no data from unknown classes. GLICO is then used to augment the small training set while training a classifier on the small sample. To this end, our proposed method samples the learned latent space using spherical interpolation (slerp) and generates new examples using the trained generator. Empirical results show that the new sampled set is diverse enough, leading to improvement in image classification in comparison with the state of the art when trained on small samples obtained from CIFAR-10, CIFAR-100, and CUB-200.

SSDL: Self-Supervised Domain Learning for Improved Face Recognition

Samadhi Poornima Kumarasinghe Wickrama Arachchilage, Ebroul Izquierdo

Responsive image

Auto-TLDR; Self-supervised Domain Learning for Face Recognition in unconstrained environments

Slides Poster Similar

Face recognition in unconstrained environments is challenging due to variations in illumination, quality of sensing, motion blur and etc. An individual’s face appearance can vary drastically under different conditions creating a gap between train (source) and varying test (target) data. The domain gap could cause decreased performance levels in direct knowledge transfer from source to target. Despite fine-tuning with domain specific data could be an effective solution, collecting and annotating data for all domains is extremely expensive. To this end, we propose a self-supervised domain learning (SSDL) scheme that trains on triplets mined from unlabelled data. A key factor in effective discriminative learning, is selecting informative triplets. Building on most confident predictions, we follow an “easy-to-hard” scheme of alternate triplet mining and self-learning. Comprehensive experiments on four different benchmarks show that SSDL generalizes well on different domains.

Adversarial Encoder-Multi-Task-Decoder for Multi-Stage Processes

Andre Mendes, Julian Togelius, Leandro Dos Santos Coelho

Responsive image

Auto-TLDR; Multi-Task Learning and Semi-Supervised Learning for Multi-Stage Processes

Similar

In multi-stage processes, decisions occur in an ordered sequence of stages. Early stages usually have more observations with general information (easier/cheaper to collect), while later stages have fewer observations but more specific data. This situation can be represented by a dual funnel structure, in which the sample size decreases from one stage to the other while the information increases. Training classifiers in this scenario is challenging since information in the early stages may not contain distinct patterns to learn (underfitting). In contrast, the small sample size in later stages can cause overfitting. We address both cases by introducing a framework that combines adversarial autoencoders (AAE), multi-task learning (MTL), and multi-label semi-supervised learning (MLSSL). We improve the decoder of the AAE with an MTL component so it can jointly reconstruct the original input and use feature nets to predict the features for the next stages. We also introduce a sequence constraint in the output of an MLSSL classifier to guarantee the sequential pattern in the predictions. Using real-world data from different domains (selection process, medical diagnosis), we show that our approach outperforms other state-of-the-art methods.

Boundary Optimised Samples Training for Detecting Out-Of-Distribution Images

Luca Marson, Vladimir Li, Atsuto Maki

Responsive image

Auto-TLDR; Boundary Optimised Samples for Out-of-Distribution Input Detection in Deep Convolutional Networks

Slides Poster Similar

This paper presents a new approach to the problem of detecting out-of-distribution (OOD) inputs in image classifications with deep convolutional networks. We leverage so-called boundary samples to enforce low confidence (maximum softmax probabilities) for inputs far away from the training data. In particular, we propose the boundary optimised samples (named BoS) training algorithm for generating them. Unlike existing approaches, it does not require extra generative adversarial network, but achieves the goal by simply back propagating the gradient of an appropriately designed loss function to the input samples. At the end of the BoS training, all the boundary samples are in principle located on a specific level hypersurface with respect to the designed loss. Our contributions are i) the BoS training as an efficient alternative to generate boundary samples, ii) a robust algorithm therewith to enforce low confidence for OOD samples, and iii) experiments demonstrating improved OOD detection over the baseline. We show the performance using standard datasets for training and different test sets including Fashion MNIST, EMNIST, SVHN, and CIFAR-100, preceded by evaluations with a synthetic 2-dimensional dataset that provide an insight for the new procedure.

Manual-Label Free 3D Detection Via an Open-Source Simulator

Zhen Yang, Chi Zhang, Zhaoxiang Zhang, Huiming Guo

Responsive image

Auto-TLDR; DA-VoxelNet: A Novel Domain Adaptive VoxelNet for LIDAR-based 3D Object Detection

Slides Poster Similar

LiDAR based 3D object detectors typically need a large amount of detailed-labeled point cloud data for training, but these detailed labels are commonly expensive to acquire. In this paper, we propose a manual-label free 3D detection algorithm that leverages the CARLA simulator to generate a large amount of self-labeled training samples and introduces a novel Domain Adaptive VoxelNet (DA-VoxelNet) that can cross the distribution gap from the synthetic data to the real scenario. The self-labeled training samples are generated by a set of high quality 3D models embedded in a CARLA simulator and a proposed LiDAR-guided sampling algorithm. Then a DA-VoxelNet that integrates both a sample-level DA module and an anchor-level DA module is proposed to enable the detector trained by the synthetic data to adapt to real scenario. Experimental results show that the proposed unsupervised DA 3D detector on KITTI evaluation set can achieve 76.66% and 56.64% mAP on BEV mode and 3D mode respectively. The results reveal a promising perspective of training a LIDAR-based 3D detector without any hand-tagged label.

Unsupervised Domain Adaptation for Object Detection in Cultural Sites

Giovanni Pasqualino, Antonino Furnari, Giovanni Maria Farinella

Responsive image

Auto-TLDR; Unsupervised Domain Adaptation for Object Detection in Cultural Sites

Slides Similar

The ability to detect objects in cultural sites from the egocentric point of view of the user can enable interesting applications for both the visitors and the manager of the site. Unfortunately, current object detection algorithms have to be trained on large amounts of labeled data, the collection of which is costly and time-consuming. While synthetic data generated from the 3D model of the cultural site can be used to train object detection algorithms, a significant drop in performance is generally observed when such algorithms are deployed to work with real images. In this paper, we consider the problem of unsupervised domain adaptation for object detection in cultural sites. Specifically, we assume the availability of synthetic labeled images and real unlabeled images for training. To study the problem, we propose a dataset containing 75244 synthetic and 2190 real images with annotations for 16 different artworks. We hence investigate different domain adaptation techniques based on image-to-image translation and feature alignment. Our analysis points out that such techniques can be useful to address the domain adaptation issue, while there is still plenty of space for improvement on the proposed dataset. We release the dataset at our web page to encourage research on this challenging topic: https://iplab.dmi.unict.it/EGO-CH-OBJ-ADAPT/.

Joint Supervised and Self-Supervised Learning for 3D Real World Challenges

Antonio Alliegro, Davide Boscaini, Tatiana Tommasi

Responsive image

Auto-TLDR; Self-supervision for 3D Shape Classification and Segmentation in Point Clouds

Slides Similar

Point cloud processing and 3D shape understanding are very challenging tasks for which deep learning techniques have demonstrated great potentials. Still further progresses are essential to allow artificial intelligent agents to interact with the real world. In many practical conditions the amount of annotated data may be limited and integrating new sources of knowledge becomes crucial to support autonomous learning. Here we consider several scenarios involving synthetic and real world point clouds where supervised learning fails due to data scarcity and large domain gaps. We propose to enrich standard feature representations by leveraging self-supervision through a multi-task model that can solve a 3D puzzle while learning the main task of shape classification or part segmentation. An extensive analysis investigating few-shot, transfer learning and cross-domain settings shows the effectiveness of our approach with state-of-the-art results for 3D shape classification and part segmentation.

Optimal Transport As a Defense against Adversarial Attacks

Quentin Bouniot, Romaric Audigier, Angélique Loesch

Responsive image

Auto-TLDR; Sinkhorn Adversarial Training with Optimal Transport Theory

Slides Poster Similar

Deep learning classifiers are now known to have flaws in the representations of their class. Adversarial attacks can find a human-imperceptible perturbation for a given image that will mislead a trained model. The most effective methods to defend against such attacks trains on generated adversarial examples to learn their distribution. Previous work aimed to align original and adversarial image representations in the same way as domain adaptation to improve robustness. Yet, they partially align the representations using approaches that do not reflect the geometry of space and distribution. In addition, it is difficult to accurately compare robustness between defended models. Until now, they have been evaluated using a fixed perturbation size. However, defended models may react differently to variations of this perturbation size. In this paper, the analogy of domain adaptation is taken a step further by exploiting optimal transport theory. We propose to use a loss between distributions that faithfully reflect the ground distance. This leads to SAT (Sinkhorn Adversarial Training), a more robust defense against adversarial attacks. Then, we propose to quantify more precisely the robustness of a model to adversarial attacks over a wide range of perturbation sizes using a different metric, the Area Under the Accuracy Curve (AUAC). We perform extensive experiments on both CIFAR-10 and CIFAR-100 datasets and show that our defense is globally more robust than the state-of-the-art.

Learning with Multiplicative Perturbations

Xiulong Yang, Shihao Ji

Responsive image

Auto-TLDR; XAT and xVAT: A Multiplicative Adversarial Training Algorithm for Robust DNN Training

Slides Poster Similar

Adversarial Training (AT) and Virtual Adversarial Training (VAT) are the regularization techniques that train Deep Neural Networks (DNNs) with adversarial examples generated by adding small but worst-case perturbations to input examples. In this paper, we propose xAT and xVAT, new adversarial training algorithms that generate multiplicative perturbations to input examples for robust training of DNNs. Such perturbations are much more perceptible and interpretable than their additive counterparts exploited by AT and VAT. Furthermore, the multiplicative perturbations can be generated transductively or inductively, while the standard AT and VAT only support a transductive implementation. We conduct a series of experiments that analyze the behavior of the multiplicative perturbations and demonstrate that xAT and xVAT match or outperform state-of-the-art classification accuracies across multiple established benchmarks while being about 30% faster than their additive counterparts. Our source code can be found at https://github.com/sndnyang/xvat