Decision Snippet Features

Pascal Welke, Fouad Alkhoury, Christian Bauckhage, Stefan Wrobel

Auto-TLDR; Decision Snippet Features for Interpretability

Decision trees excel at interpretability of their prediction results. To achieve the required prediction accuracy, however, large ensembles of decision trees (random forests) are often used, and their size reduces interpretability. Their size also slows down inference on modern hardware and restricts their applicability on low-memory embedded devices. We introduce Decision Snippet Features, which are obtained from small subtrees that appear frequently in trained random forests. We then show that linear models on top of these features achieve comparable and sometimes even better predictive performance than the original random forest, while reducing the model size by up to two orders of magnitude.
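As a rough illustration of the idea in the abstract above, the sketch below maps an input to binary features by recording which leaf of each small subtree ("snippet") it reaches, and then scores it with a linear model. The snippet encoding, the one-hot leaf features, the hand-written snippets and the weights are all assumptions made for illustration; the abstract does not specify how the features are computed, and mining frequent subtrees from a trained forest is not shown.

```python
from typing import List, Sequence, Tuple, Union

# Hypothetical encoding: a snippet is either a leaf id (int) or a split
# node (feature_index, threshold, left_subtree, right_subtree).
# Leaf ids inside one snippet must be 0, 1, ..., n_leaves - 1.
Snippet = Union[int, Tuple[int, float, "Snippet", "Snippet"]]


def count_leaves(snippet: Snippet) -> int:
    """Number of leaves in a snippet."""
    if isinstance(snippet, int):
        return 1
    _, _, left, right = snippet
    return count_leaves(left) + count_leaves(right)


def leaf_reached(snippet: Snippet, x: Sequence[float]) -> int:
    """Follow the splits for x and return the id of the leaf it ends in."""
    while not isinstance(snippet, int):
        feature, threshold, left, right = snippet
        snippet = left if x[feature] <= threshold else right
    return snippet


def snippet_features(snippets: List[Snippet], x: Sequence[float]) -> List[int]:
    """Concatenate one one-hot block per snippet (assumed feature map)."""
    features: List[int] = []
    for snippet in snippets:
        block = [0] * count_leaves(snippet)
        block[leaf_reached(snippet, x)] = 1
        features.extend(block)
    return features


def linear_score(weights: Sequence[float], bias: float,
                 features: Sequence[int]) -> float:
    """Plain linear model on top of the binary snippet features."""
    return sum(w * f for w, f in zip(weights, features)) + bias


# Two hand-written snippets; in practice they would come from a forest.
SNIPPETS: List[Snippet] = [
    (0, 0.5, 0, (1, 2.0, 1, 2)),  # three leaves, ids 0..2
    (1, 1.0, 0, 1),               # two leaves, ids 0..1
]
# Made-up weights, one per feature; a real model would learn them.
WEIGHTS = [0.5, -1.0, 2.0, 0.25, -0.25]
BIAS = 0.1

if __name__ == "__main__":
    feats = snippet_features(SNIPPETS, [0.7, 1.5])
    print(feats, linear_score(WEIGHTS, BIAS, feats))
```

The resulting model stores only a handful of small subtrees plus one weight per leaf, which is the kind of size reduction the abstract is aiming at.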

Similar papers

On Learning Random Forests for Random Forest Clustering

Manuele Bicego, Francisco Escolano

Auto-TLDR; Learning Random Forests for Clustering

In this paper we study the little-investigated problem of learning Random Forests for distance-based Random Forest clustering. We study both classic schemes and alternative approaches that are novel in this context. In particular, we investigate the suitability of Gaussian Density Forests, Random Forests specifically designed for density estimation. Further, we introduce a novel variant of Random Forest, based on an effective non-parametric by-pass estimator of the Renyi entropy, which can be useful when the parametric assumption is too strict. An empirical evaluation involving different datasets and different RF-clustering strategies confirms that the learning step is crucial for RF-clustering. We also present a set of practical guidelines for determining the most suitable variant of RF-clustering for the problem under examination.

A Novel Random Forest Dissimilarity Measure for Multi-View Learning

Hongliu Cao, Simon Bernard, Robert Sabourin, Laurent Heutte

Auto-TLDR; Multi-view Learning with Random Forest Relation Measure and Instance Hardness

Multi-view learning is a learning task in which data is described by several concurrent representations. Its main challenge is most often to exploit the complementarities between these representations to help solve a classification/regression task. This is a challenge that can be met nowadays if there is a large amount of data available for learning. However, this is not necessarily true for all real-world problems, where data are sometimes scarce (e.g. problems related to the medical environment). In these situations, an effective strategy is to use intermediate representations based on the dissimilarities between instances. This work presents new ways of constructing these dissimilarity representations, learning them from data with Random Forest classifiers. More precisely, two methods are proposed, which modify the Random Forest proximity measure, to adapt it to the context of High Dimension Low Sample Size (HDLSS) multi-view classification problems. The second method, based on an Instance Hardness measurement, is significantly more accurate than other state-of-the-art measurements including the original RF Proximity measurement and the Large Margin Nearest Neighbor (LMNN) metric learning measurement.
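For context, the classic Random Forest proximity that the abstract above takes as its starting point counts how often two instances fall into the same leaf. The sketch below computes that baseline and turns it into a dissimilarity matrix; it does not implement the two modified measures the paper proposes. The leaf-id table is made-up data, and using 1 minus proximity as the dissimilarity is one common choice, assumed here.

```python
from typing import List, Sequence


def rf_proximity(leaves_a: Sequence[int], leaves_b: Sequence[int]) -> float:
    """Fraction of trees in which two instances land in the same leaf."""
    if len(leaves_a) != len(leaves_b) or not leaves_a:
        raise ValueError("both instances need one leaf id per tree")
    shared = sum(1 for a, b in zip(leaves_a, leaves_b) if a == b)
    return shared / len(leaves_a)


def dissimilarity_matrix(leaf_ids: List[List[int]]) -> List[List[float]]:
    """Pairwise dissimilarity, taken here as 1 - proximity (assumption)."""
    return [[1.0 - rf_proximity(a, b) for b in leaf_ids] for a in leaf_ids]


# Made-up example: LEAF_IDS[i][t] is the leaf reached by instance i in tree t.
LEAF_IDS = [
    [3, 1, 4, 2],
    [3, 1, 0, 2],
    [5, 6, 0, 7],
]

if __name__ == "__main__":
    for row in dissimilarity_matrix(LEAF_IDS):
        print(row)
```

Each row of the matrix can then serve as the intermediate representation of one instance, with one such matrix per view.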

Hierarchical Routing Mixture of Experts

Wenbo Zhao, Yang Gao, Shahan Ali Memon, Bhiksha Raj, Rita Singh

Auto-TLDR; A Binary Tree-structured Hierarchical Routing Mixture of Experts for Regression

In regression tasks the distribution of the data is often too complex to be fitted by a single model. In contrast, partition-based models are developed where data is divided and fitted by local models. These models partition the input space and do not leverage the input-output dependency of multimodal-distributed data, and strong local models are needed to make good predictions. Addressing these problems, we propose a binary tree-structured hierarchical routing mixture of experts (HRME) model that has classifiers as non-leaf node experts and simple regression models as leaf node experts. The classifier nodes jointly soft-partition the input-output space based on the natural separateness of multimodal data. This enables simple leaf experts to be effective for prediction. Further, we develop a probabilistic framework for the HRME model, and propose a recursive Expectation-Maximization (EM) based algorithm to learn both the tree structure and the expert models. Experiments on a collection of regression tasks validate the effectiveness of our method compared to a variety of other regression models.

Malware Detection by Exploiting Deep Learning over Binary Programs

Panpan Qi, Zhaoqi Zhang, Wei Wang, Chang Yao

Auto-TLDR; End-to-End Malware Detection without Feature Engineering

Malware evolves rapidly over time, which makes existing solutions ineffective at detecting newly released malware. Machine learning models that can learn to capture malicious patterns directly from the data play an increasingly important role in malware analysis. However, traditional machine learning models depend heavily on feature engineering. The extracted static features are vulnerable, as hackers could create new malware with different feature values to deceive the machine learning models. In this paper, we propose an end-to-end malware detection framework consisting of a convolutional neural network, an autoencoder and neural decision trees. It learns features from multiple domains for malware detection without feature engineering. In addition, since anti-virus products should have a very low false alarm rate to avoid annoying users, we propose a special loss function that optimizes the recall for a fixed low false positive rate (e.g., less than 0.1%). Experiments show that the proposed framework achieves better recall than the baseline models, and that the derived loss function also makes a difference.
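The quantity targeted in the abstract above, recall at a fixed low false positive rate, can be measured as in the sketch below. This only evaluates the metric from detector scores; it is not the paper's loss function, which the abstract does not spell out. The function name, the strict "score above threshold" rule and the example scores are assumptions.

```python
from typing import List, Sequence, Tuple


def recall_at_fpr(scores: Sequence[float], labels: Sequence[int],
                  max_fpr: float) -> Tuple[float, float]:
    """Return (recall, threshold) for the lowest threshold whose false
    positive rate stays at or below max_fpr.

    A sample is flagged as malware when its score is strictly above the
    threshold; labels are 1 for malware and 0 for benign.
    """
    negatives: List[float] = sorted(
        (s for s, y in zip(scores, labels) if y == 0), reverse=True)
    positives = [s for s, y in zip(scores, labels) if y == 1]
    if not negatives or not positives:
        raise ValueError("need at least one sample of each class")
    # Number of benign samples we may flag without exceeding max_fpr.
    allowed = int(max_fpr * len(negatives))
    if allowed >= len(negatives):
        threshold = float("-inf")
    else:
        # At most `allowed` benign scores lie strictly above this value.
        threshold = negatives[allowed]
    recall = sum(1 for s in positives if s > threshold) / len(positives)
    return recall, threshold


# Made-up detector scores: four malware samples, then ten benign ones.
SCORES = [0.95, 0.8, 0.7, 0.5,
          0.9, 0.6, 0.4, 0.35, 0.3, 0.25, 0.2, 0.15, 0.1, 0.05]
LABELS = [1, 1, 1, 1] + [0] * 10

if __name__ == "__main__":
    print(recall_at_fpr(SCORES, LABELS, max_fpr=0.1))
```

With real data the false positive budget of 0.1% only becomes meaningful when there are at least a thousand benign samples, since fewer samples round the allowed count down to zero.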

Proximity Isolation Forests

Antonella Mensi, Manuele Bicego, David Tax

Auto-TLDR; Proximity Isolation Forests for Non-vectorial Data

Isolation Forests are a very successful approach for solving outlier detection tasks. Isolation Forests are based on classical Random Forest classifiers that require feature vectors as input. There are many situations where vectorial data is not readily available, for instance when dealing with input sequences or strings. In these situations, one can extract higher level characteristics from the input, which is typically hard and often loses valuable information. An alternative is to define a proximity between the input objects, which can be more intuitive. In this paper we propose the Proximity Isolation Forests that extend the Isolation Forests to non-vectorial data. The introduced methodology has been thoroughly evaluated on 8 different problems and it achieves very good results also when compared to other techniques.

Algorithm Recommendation for Data Streams

Jáder Martins Camboim De Sá, Andre Luis Debiaso Rossi, Gustavo Enrique De Almeida Prado Alves Batista, Luís Paulo Faina Garcia

Auto-TLDR; Meta-Learning for Algorithm Selection in Time-Changing Data Streams

Over the last decades, many companies have taken advantage of massive, high-frequency data generation, using knowledge discovery to identify valuable information. Machine learning techniques can be employed for knowledge discovery, since they are able to extract patterns from data and induce models to predict future events. However, dynamic and evolving environments generate streams of data that are usually non-stationary. Models induced in these scenarios may degrade over time due to seasonality or concept drift. Periodic retraining could help, but the hypothesis space of a fixed algorithm may no longer be appropriate. An alternative solution is to use meta-learning for periodic algorithm selection in time-changing environments, choosing the bias that best suits the current data. In this paper, we present an enhanced framework for data stream algorithm selection based on MetaStream. Our approach uses meta-learning and incremental learning to actively select the best algorithm for the current concept in a time-changing environment. Unlike previous works, it uses a set of cutting-edge meta-features and an incremental learning approach at the meta-level based on LightGBM. The results show that this new strategy recommends the best algorithm more accurately in time-changing data.

A Heuristic-Based Decision Tree for Connected Components Labeling of 3D Volumes

Maximilian Söchting, Stefano Allegretti, Federico Bolelli, Costantino Grana

Auto-TLDR; Entropy Partitioning Decision Tree for Connected Components Labeling

Connected Components Labeling represents a fundamental step for many Computer Vision and Image Processing pipelines. Since the first appearance of the task in the sixties, many algorithmic solutions to optimize the computational load needed to label an image have been proposed. Among them, block-based scan approaches and decision trees have proven to be some of the most valuable strategies. However, due to the cost of the manual construction of optimal decision trees and the computational limitations of automatic strategies employed in the past, the application of blocks and decision trees has been restricted to small masks, and thus to 2D algorithms. In this paper we present a novel heuristic algorithm based on decision tree learning methodology, called Entropy Partitioning Decision Tree (EPDT). It makes it possible to compute near-optimal decision trees for large scan masks. Experimental results demonstrate that algorithms based on the generated decision trees outperform state-of-the-art competitors.

Using Meta Labels for the Training of Weighting Models in a Sample-Specific Late Fusion Classification Architecture

Peter Bellmann, Patrick Thiam, Friedhelm Schwenker

Auto-TLDR; A Late Fusion Architecture for Multiple Classifier Systems

The performance of multiple classifier systems can be significantly improved by the use of intelligent classifier combination approaches. In this study, we introduce a novel late fusion architecture, which can be interpreted as a combination of the well-known mixture of experts and stacked generalization methods. Our proposed method aggregates the outputs of classification models and corresponding sample-specific weighting models. A special feature of our proposed architecture is that each weighting model is trained on an individual set of meta labels. Using individual sets of meta labels allows each weighting model to separate regions, on which the predictions of the corresponding classification model can be associated to an estimated confidence value. We test our proposed architecture on a set of publicly available databases, including different benchmark data sets. The experimental evaluation shows the effectiveness and potential of our proposed method. Moreover, we discuss different approaches for further improvement of our proposed architecture.

Automatically Mining Relevant Variable Interactions Via Sparse Bayesian Learning

Ryoichiro Yafune, Daisuke Sakuma, Yasuo Tabei, Noritaka Saito, Hiroto Saigo

Auto-TLDR; Sparse Bayes for Interpretable Non-linear Prediction

With the rapid increase in the availability of large amounts of data, prediction is becoming increasingly popular and has spread throughout our daily life. However, powerful non-linear prediction methods such as deep learning and SVMs suffer from an interpretability problem, making them hard to use in domains where the reason for a decision is required. In this paper, we develop an interpretable non-linear model called itemset Sparse Bayes (iSB), which builds a Bayesian probabilistic model while simultaneously considering variable interactions. In order to suppress the resulting large number of variables, sparsity is imposed on regression weights by a sparsity-inducing prior. As a subroutine to search for variable interactions, an itemset enumeration algorithm is employed with a novel bounding condition. In computational experiments using real-world data, the proposed method performed better than a decision tree by 10% in terms of r-squared. We also demonstrated the advantage of our method in a Bayesian optimization setting, in which the proposed approach could successfully find the maximum of an unknown function faster than a Gaussian process. The interpretability of iSB carries over naturally to Bayesian optimization, thereby giving us a clue as to which variable interactions are important in optimizing an unknown function.

Mean Decision Rules Method with Smart Sampling for Fast Large-Scale Binary SVM Classification

Alexandra Makarova, Mikhail Kurbakov, Valentina Sulimova

Auto-TLDR; Improving Mean Decision Rule for Large-Scale Binary SVM Problems

This paper builds on the Mean Decision Rule (MDR) method for solving large-scale binary SVM problems. MDR takes small random samples of the full dataset, trains on each of them separately, and then averages the individual decision rules to obtain a final one. This paper proposes two new approaches to improve it. The first is a new sampling technique that exploits SVM and MDR properties to quickly form so-called smart samples by selecting only the objects that are candidates to be support objects. The proposed technique substantially speeds up MDR convergence and reaches the highest quality in less time. In the case of kernel-based MDR (KMDR), the proposed sampling technique additionally reduces the number of support objects in the final decision rule and, as a result, decreases the recognition time. The second is a new data strategy to accelerate random access to large datasets stored in the traditional libsvm format. The proposed strategy makes it possible to quickly extract random subsets of objects from a file and load them into RAM, and it is also suitable for any sampling-based method, including stochastic gradient methods. Using the proposed approaches together with (K)MDR yields the best (or near-best) solution of large-scale binary SVM problems faster than existing SVM solvers.

The eXPose Approach to Crosslier Detection

Antonio Barata, Frank Takes, Hendrik Van Den Herik, Cor Veenman

Auto-TLDR; EXPose: Crosslier Detection Based on Supervised Category Modeling

Transit of wasteful materials within the European Union is highly regulated through a system of permits. Waste processing costs vary greatly depending on the waste category of a permit. Therefore, companies may have a financial incentive to allege transporting waste with erroneous categorisation. Our goal is to assist inspectors in selecting potentially manipulated permits for further investigation, making their task more effective and efficient. Due to data limitations, a supervised learning approach based on historical cases is not possible. Standard unsupervised approaches, such as outlier detection and data quality-assurance techniques, are not suited since we are interested in targeting non-random modifications in both category and category-correlated features. For this purpose we (1) introduce the concept of crosslier: an anomalous instance of a category which lies across other categories; (2) propose eXPose: a novel approach to crosslier detection based on supervised category modelling; and (3) present the crosslier diagram: a visualisation tool specifically designed for domain experts to easily assess crossliers. We compare eXPose against traditional outlier detection methods in various benchmark datasets with synthetic crossliers and show the superior performance of our method in targeting these instances.

Dual-Memory Model for Incremental Learning: The Handwriting Recognition Use Case

Mélanie Piot, Bérangère Bourdoulous, Aurelia Deshayes, Lionel Prevost

Auto-TLDR; A dual memory model for handwriting recognition

In this paper, we propose a dual memory model inspired by neuroscience. Short-term memory processes the data stream before integrating it into long-term memory, which generalizes. The use case is learning the ability to recognize handwriting. This begins with the learning of prototypical letters. It continues throughout life and gives the individual the ability to recognize increasingly varied handwriting. This second task is achieved by incrementally training our dual-memory model. We used a convolutional network for encoding and random forests as the memory model. Indeed, the latter have the advantage of being easily extended to integrate new data and new classes. Performance on the MNIST database is very encouraging, exceeding 95%, while the complexity of the model remains reasonable.

Memetic Evolution of Training Sets with Adaptive Radial Basis Kernels for Support Vector Machines

Jakub Nalepa, Wojciech Dudzik, Michal Kawulok

Auto-TLDR; Memetic Algorithm for Evolving Support Vector Machines with Adaptive Kernels

Support vector machines (SVMs) are a supervised learning technique that can be applied in both binary and multi-class classification and regression tasks. SVMs seamlessly handle continuous and categorical variables. Their training is, however, both time- and memory-costly for large training data, and selecting an incorrect kernel function or its hyperparameters leads to suboptimal decision hyperplanes. In this paper, we introduce a memetic algorithm for evolving SVM training sets with adaptive radial basis function kernels to not only make the deployment of SVMs easier for emerging big data applications, but also to improve their generalization abilities over the unseen data. We build upon two observations: first, only a small subset of all training vectors, called the support vectors, contribute to the position of the decision boundary, hence the other vectors can be removed from the training set without deteriorating the performance of the model. Second, selecting different kernel hyperparameters for different training vectors may help better reflect the subtle characteristics of the space while determining the hyperplane. The experiments over almost 100 benchmark and synthetic sets showed that our algorithm delivers models outperforming both SVMs optimized using state-of-the-art evolutionary techniques, and other supervised learners.

Using Machine Learning to Refer Patients with Chronic Kidney Disease to Secondary Care

Lee Au-Yeung, Xianghua Xie, Timothy Marcus Scale, James Anthony Chess

Auto-TLDR; A Machine Learning Approach for Chronic Kidney Disease Prediction using Blood Test Data

There has been growing interest recently in using machine learning techniques as an aid in clinical medicine. Machine learning offers a range of classification algorithms which can be applied to medical data to aid in making clinical predictions. Recent studies have demonstrated the high predictive accuracy of various classification algorithms applied to clinical data. Several studies have already been conducted on diagnosing or predicting chronic kidney disease at various stages using different sets of variables. In this study we investigate the use of machine learning techniques with blood test data. Such a system could aid renal teams in making recommendations to primary care general practitioners to refer patients to secondary care, where patients may benefit from earlier specialist assessment and medical intervention. We achieve an overall accuracy of 88.48% using logistic regression, 87.12% using ANN and 85.29% using SVM. ANNs performed with the highest sensitivity at 89.74%, compared to 86.67% for logistic regression and 85.51% for SVM.

Categorizing the Feature Space for Two-Class Imbalance Learning

Rosa Sicilia, Ermanno Cordelli, Paolo Soda

Auto-TLDR; Efficient Ensemble of Classifiers for Minority Class Inference

Class imbalance limits the performance of most learning algorithms, resulting in a low recognition rate for samples belonging to the minority class. Although there are different strategies to address this problem, methods that generate ensembles of classifiers have proven to be effective in several applications. This paper presents a new strategy to construct the training set of each classifier in the ensemble by exploiting information in the feature space that can give rise to unreliable classifications, which are determined by a novel algorithm introduced here. The performance of our proposal is compared against multiple standard ensemble approaches on 25 publicly available datasets, showing promising results.

Supervised Classification Using Graph-Based Space Partitioning for Multiclass Problems

Nicola Yanev, Ventzeslav Valev, Adam Krzyzak, Karima Ben Suliman

Auto-TLDR; Box Classifier for Multiclass Classification

We introduce and investigate, in the multiclass setting, an efficient classifier which partitions the training data by means of multidimensional parallelepipeds called boxes. We show that the multiclass classification problem at hand can be solved by integrating the heuristic minimum clique cover approach and the k-nearest neighbor rule. Our algorithm is motivated by an algorithm for partitioning a graph into a minimal number of maximal cliques. The main advantage of the new classifier, called the Box classifier, is that it optimally utilizes the geometrical structure of the training set by decomposing the l-class problem (l > 2) into l binary classification problems. We discuss the computational complexity of the proposed Box classifier. Extensive experiments performed on simulated and real data for binary and multiclass problems show that in almost all cases the Box classifier performs significantly better than k-NN, SVM and decision trees.

Comparison of Stacking-Based Classifier Ensembles Using Euclidean and Riemannian Geometries

Vitaliy Tayanov, Adam Krzyzak, Ching Y Suen

Auto-TLDR; Classifier Stacking in Riemannian Geometries using Cascades of Random Forest and Extra Trees

This paper considers three different classifier stacking algorithms: simple stacking, cascades of classifier ensembles and a nonlinear version of classifier stacking based on classifier interactions. Classifier interactions can be expressed using a classifier prediction pairwise matrix (CPPM). As a meta-learner for the last algorithm, Convolutional Neural Networks (CNNs) and two other classifier stacking algorithms (simple classifier stacking and cascades of classifier ensembles) have been applied. This allows applying classical stacking and cascade-based recursive stacking in the Euclidean and the Riemannian geometries. The cascades of random forests (RFs) and extra trees (ETs) are considered as a forest-based alternative to deep neural networks [1]. Our goal is to compare the accuracies of the cascades of RFs and CNN-based stacking or deep multi-layer perceptrons (MLPs) for different classification problems. We use the gesture phase dataset from the UCI repository [2] to compare and analyze cascades of RFs and ETs in both geometries and the CNN-based version of classifier stacking. This data set was selected because motion is generally considered a nonlinear process (patterns do not lie in Euclidean vector space) in computer vision applications. Thus we can assess how well forest-based deep learning and Riemannian manifolds (R-manifolds) perform when applied to nonlinear processes. Several more datasets from the UCI repository were used to compare the aforementioned algorithms to some other well-known classifiers and their stacking-based versions in both geometries. Experimental results show that classifier stacking algorithms in Riemannian geometry (R-geometry) are less dependent on some properties of individual classifiers (e.g. depth of decision trees in RFs or ETs) than in Euclidean geometry. More independent individual classifiers make it possible to obtain R-manifolds with better properties for classification. Generally, the accuracy of classification using classifier stacking in R-geometry is higher than in Euclidean geometry.

Boundary Bagging to Address Training Data Issues in Ensemble Classification

Samia Boukir, Wei Feng

Auto-TLDR; Bagging Ensemble Learning for Multi-Class Imbalanced Classification

The characteristics of training data are a fundamental consideration when constructing any supervised classifier. Class mislabelling and imbalance are major training data issues that often adversely affect machine learning algorithms, including ensembles. This work proposes extended bagging algorithms to better handle noisy and multi-class imbalanced classification tasks. These algorithms upgrade the sampling procedure by exploiting the confidence in the ensemble classification outcome. The underlying idea is that a bagging ensemble learning algorithm can achieve greater performance if it is allowed to choose the data from which it learns. The effectiveness of the proposed methods is demonstrated by performing classification on 10 different data sets.

On Resource-Efficient Bayesian Network Classifiers and Deep Neural Networks

Wolfgang Roth, Günther Schindler, Holger Fröning, Franz Pernkopf

Auto-TLDR; Quantization-Aware Bayesian Network Classifiers for Small-Scale Scenarios

We present two methods to reduce the complexity of Bayesian network (BN) classifiers. First, we introduce quantization-aware training using the straight-through gradient estimator to quantize the parameters of BNs to few bits. Second, we extend a recently proposed differentiable tree-augmented naive Bayes (TAN) structure learning approach to also consider the model size. Both methods are motivated by recent developments in the deep learning community, and they provide effective means to trade off between model size and prediction accuracy, which is demonstrated in extensive experiments. Furthermore, we contrast quantized BN classifiers with quantized deep neural networks (DNNs) for small-scale scenarios which have hardly been investigated in the literature. We show Pareto optimal models with respect to model size, number of operations, and test error and find that both model classes are viable options.

Automatic Classification of Human Granulosa Cells in Assisted Reproductive Technology Using Vibrational Spectroscopy Imaging

Marina Paolanti, Emanuele Frontoni, Giorgia Gioacchini, Giorgini Elisabetta, Notarstefano Valentina, Zacà Carlotta, Carnevali Oliana, Andrea Borini, Marco Mameli

Auto-TLDR; Predicting Oocyte Quality in Assisted Reproductive Technology Using Machine Learning Techniques

In the field of reproductive technology, the biochemical composition of female gametes has been successfully investigated with the use of vibrational spectroscopy. Currently, in assisted reproductive technology (ART), there are no shared criteria for the choice of oocyte, and automatic classification methods for the best quality oocytes have not yet been applied. In this paper, considering the lack of criteria in ART, we use Machine Learning (ML) techniques to predict oocyte quality for a successful pregnancy. To improve the chances of successful implantation and minimize any complications during the pregnancy, Fourier transform infrared microspectroscopy (FTIRM) analysis has been applied to granulosa cells (GCs) collected along with the oocytes during oocyte aspiration, as is routinely done in ART, and specific spectral biomarkers were selected by multivariate statistical analysis. A proprietary biological reference dataset (BRD) was successfully collected to predict the best oocyte for a successful pregnancy. Personal health information is stored, maintained and backed up using a cloud computing service. Using a user-friendly interface, the user can evaluate whether or not the selected oocyte will have a positive result. This interface includes a dashboard for retrospective analysis, reporting, real-time processing, and statistical analysis. The experimental results are promising and confirm the efficiency of the method in terms of classification metrics: precision, recall, and F1-score (F1).

On Morphological Hierarchies for Image Sequences

Caglayan Tuna, Alain Giros, François Merciol, Sébastien Lefèvre

Auto-TLDR; Comparison of Hierarchies for Image Sequences

Morphological hierarchies form a popular framework that emphasizes the multiscale structure of digital images by performing an unsupervised spatial partitioning of the data. These hierarchies have recently been extended to cope with image sequences, and different strategies have been proposed to allow their construction from spatio-temporal data. In this paper, we compare these hierarchical representation strategies for image sequences according to their structural properties. We introduce a projection method to make these representations comparable. Furthermore, we extend one of these recent strategies in order to obtain more efficient hierarchical representations for image sequences. Experiments were conducted on both synthetic and real datasets, the latter consisting of satellite image time series. We show that building one hierarchy by using spatial and temporal information together is more efficient than other existing strategies.

Hierarchical Classification with Confidence Using Generalized Logits

James W. Davis, Tong Liang, James Enouen, Roman Ilin

Auto-TLDR; Generalized Logits for Hierarchical Classification

We present a bottom-up approach to hierarchical classification based on posteriors conditioned with logits. Beginning with the output logits for a set of terminal labels from a base classifier, an initial hypothesis is repeatedly generalized (softened) to a weaker label until a particular confidence measure is achieved. As conditioning the probabilistic model with the full set of terminal logits quickly becomes intractable for large label sets, we propose an alternative approach employing "generalized logits" spanning relevant hypotheses within the label hierarchy. Experimental results are compared with related methods on multiple datasets and base classifiers. The proposed approach provides an efficient and effective hierarchical classification framework with monotonic, non-decreasing inference behavior.
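The bottom-up generalization loop described in the abstract above can be sketched as follows: start at the most likely terminal label and move to its parent until the label's probability reaches a confidence threshold. This sketch uses a plain softmax over all terminal logits and sums leaf probabilities per node, which is the straightforward baseline rather than the "generalized logits" model the paper proposes. The label hierarchy, logits and function names are made up for illustration.

```python
import math
from typing import Dict, List, Optional, Tuple

# Hypothetical label hierarchy: child -> parent, with None for the root.
PARENT: Dict[str, Optional[str]] = {
    "husky": "dog", "beagle": "dog", "tabby": "cat",
    "dog": "animal", "cat": "animal", "animal": None,
}


def softmax(logits: Dict[str, float]) -> Dict[str, float]:
    """Numerically stable softmax over the terminal labels."""
    peak = max(logits.values())
    exps = {k: math.exp(v - peak) for k, v in logits.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


def lineage(label: str) -> List[str]:
    """The label itself followed by all of its ancestors up to the root."""
    chain: List[str] = []
    node: Optional[str] = label
    while node is not None:
        chain.append(node)
        node = PARENT[node]
    return chain


def node_probability(node: str, leaf_probs: Dict[str, float]) -> float:
    """Probability of a node: sum over the terminal labels beneath it."""
    return sum(p for leaf, p in leaf_probs.items() if node in lineage(leaf))


def generalize(logits: Dict[str, float], threshold: float) -> Tuple[str, float]:
    """Soften the top terminal label until its probability >= threshold."""
    probs = softmax(logits)
    label = max(probs, key=probs.get)
    while node_probability(label, probs) < threshold and PARENT[label] is not None:
        label = PARENT[label]
    return label, node_probability(label, probs)


# Made-up terminal logits from a base classifier.
LOGITS = {"husky": 2.0, "beagle": 1.8, "tabby": 0.0}

if __name__ == "__main__":
    for t in (0.5, 0.9, 0.99):
        print(t, generalize(LOGITS, t))
```

Because a parent's probability is never smaller than that of any of its children, each generalization step can only keep or raise the confidence, which matches the monotonic, non-decreasing behavior mentioned in the abstract.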

Killing Four Birds with One Gaussian Process: The Relation between Different Test-Time Attacks

Kathrin Grosse, Michael Thomas Smith, Michael Backes

Auto-TLDR; Security of Gaussian Process Classifiers against Attack Algorithms

In machine learning (ML) security, attacks like evasion, model stealing or membership inference are generally studied individually. Previous work has also shown a relationship between some attacks and the decision function curvature of the targeted model. Consequently, we study an ML model allowing direct control over the decision surface curvature: Gaussian Process classifiers (GPCs). For evasion, we find that changing a GPC's curvature to be robust against one attack algorithm boils down to enabling a different norm or attack algorithm to succeed. This is backed up by our formal analysis showing that static security guarantees are opposed to learning. Concerning intellectual property, we show formally that lazy learning does not necessarily leak all information when applied. In practice, a seemingly secure curvature can often be found. For example, we are able to secure a GPC against empirical membership inference by proper configuration. In this configuration, however, the GPC's hyper-parameters are leaked, i.e., model reverse engineering succeeds. We conclude that attacks on classification should not be studied in isolation, but in relation to each other.

Explainable Online Validation of Machine Learning Models for Practical Applications

Wolfgang Fuhl, Yao Rong, Thomas Motz, Michael Scheidt, Andreas Markus Hartel, Andreas Koch, Enkelejda Kasneci

Auto-TLDR; A Reformulation of Regression and Classification for Machine Learning Algorithm Validation

We present a reformulation of regression and classification that aims to validate the result of a machine learning algorithm. Our reformulation simplifies the original problem and validates the result of the machine learning algorithm using the training data. Since the validation of machine learning algorithms must always be explainable, we perform our experiments with the kNN algorithm as well as with an algorithm based on conditional probabilities, which is proposed in this work. For the evaluation of our approach, three publicly available data sets were used, and three classification and two regression problems were evaluated. The presented algorithm based on conditional probabilities is also online capable and requires only a fraction of the memory of the kNN algorithm.

Multi-Attribute Learning with Highly Imbalanced Data

Lady Viviana Beltran Beltran, Mickaël Coustaty, Nicholas Journet, Juan C. Caicedo, Antoine Doucet

Auto-TLDR; Data Imbalance in Multi-Attribute Deep Learning Models: Adaptation to face each one of the problems derived from imbalance

Data is one of the most important keys for success when studying a simple or a complex phenomenon. With the use of deep learning exploding and its democratization, non-computer science experts may struggle to use highly complex deep learning architectures, even when straightforward models offer them suitable performance. In this article, we study the specific and common problem of data imbalance in real databases, as most of the bad performance problems are due to the data itself. We review two points. First, when the data contains different levels of imbalance, classical imbalanced learning strategies cannot be directly applied when using multi-attribute deep learning models, i.e., multi-task and multi-label architectures. Therefore, one of our contributions is our proposed adaptations to face each one of the problems derived from imbalance. Second, we demonstrate that with little to no imbalance, straightforward deep learning models work well. However, for non-experts, these models can be seen as black boxes, where all the effort is put into pre-processing the data. To simplify the problem, we performed the classification task ignoring information that is costly to extract, such as part localization, which is widely used in the state of the art of attribute classification. We make use of a widely known attribute database, CUB-200-2011 (CUB), as our main use case due to its deeply imbalanced nature, along with two better-structured databases: celebA and Awa2. All of them contain multi-attribute annotations. The results of highly fine-grained attribute learning over CUB demonstrate that, in the presence of imbalance, it is possible with our proposed strategies to obtain competitive results against the state of the art while taking advantage of multi-attribute deep learning models. We also report results for the two better-structured databases, over which our models outperform the state of the art.

Kernel-Based LIME with Feature Dependency Sampling

Sheng Shi, Yangzhou Du, Fan Wei

Auto-TLDR; Local Interpretable Model-agnostic Explanation with Feature Dependency Sampling

While deep learning makes significant achievements in Artificial Intelligence (AI), the lack of transparency has limited its broad application in various vertical domains. Explainability is not only a gateway between AI and society, but also a powerful feature to detect flaws in the models and bias in the data. Local Interpretable Model-agnostic Explanation (LIME) is a widely accepted technique that explains the predictions of any classifier faithfully by learning an interpretable model locally around the predicted instance. However, the sampling operation in the standard implementation of LIME is defective. Perturbed samples are generated from a uniform distribution, ignoring the complicated correlation between features. Moreover, as the local decision boundary is non-linear for most complex networks, linear approximation may produce serious errors. This paper proposes a high-interpretability and high-fidelity local explanation method, known as Kernel-based LIME with Feature Dependency Sampling (KLFDS). Given an instance being explained, KLFDS enhances interpretability by feature sampling with intrinsic dependency. Besides, KLFDS improves the local explanation fidelity by approximating the nonlinear boundary of the local decision. We evaluate our method on image classification tasks, and the results show that KLFDS's explanation of the black-box model achieves much better performance than the original LIME in terms of interpretability and fidelity.

Hyperspectral Imaging for Analysis and Classification of Plastic Waste

Jakub Kraśniewski, Łukasz Dąbała, Lewandowski Marcin

Auto-TLDR; A Hyperspectral Camera for Material Classification

Environmental protection is one of the main challenges facing society nowadays. Even with constantly growing awareness, not all of the sorting can be done by people themselves, because the differences between materials are not visible to the human eye. For that reason, we present the use of a hyperspectral camera as a capture device, which allows us to obtain the full spectrum of the material. In this work we propose a method for efficient recognition of the substance of an item. We conducted several experiments and analyses of the spectra of different materials in different conditions on a special measuring stand. That enabled identification of the best features, which can later be used during classification, as was confirmed during the extensive testing procedure.

TreeRNN: Topology-Preserving Deep Graph Embedding and Learning

Yecheng Lyu, Ming Li, Xinming Huang, Ulkuhan Guler, Patrick Schaumont, Ziming Zhang

Auto-TLDR; TreeRNN: Recurrent Neural Network for General Graph Classification

General graphs are difficult for learning due to their irregular structures. Existing works employ message passing along graph edges to extract local patterns using customized graph kernels, but few of them are effective for the integration of such local patterns into global features. In contrast, in this paper we study methods to transfer the graphs into trees so that explicit orders are learned to direct the feature integration from local to global. To this end, we apply breadth-first search (BFS) to construct trees from the graphs, which adds direction to the graph edges from the center node to the peripheral nodes. In addition, we propose a novel projection scheme that transfers the trees to image representations, which are suitable for conventional convolutional neural networks (CNNs) and recurrent neural networks (RNNs). To best learn the patterns from the graph-tree-images, we propose TreeRNN, a 2D RNN architecture that recurrently integrates the image pixels by rows and columns to help classify the graph categories. We evaluate the proposed method on several graph classification datasets, and demonstrate accuracy comparable with the state of the art on the MUTAG, PTC-MR and NCI1 datasets.
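
The BFS step described above can be illustrated with a minimal sketch. It only shows how a breadth-first traversal from a chosen center node turns an undirected graph into a tree whose edges point from the center outwards; the adjacency-list format and the function name `bfs_tree` are assumptions made for this illustration, and the projection to images and the TreeRNN itself are not covered.

```python
from collections import deque

def bfs_tree(adjacency, root):
    """Return directed (parent, child) edges of the BFS tree rooted at `root`.

    `adjacency` is a hypothetical dict mapping each node to its neighbours.
    """
    parent = {root: None}
    order = []                      # nodes in the order BFS visits them
    queue = deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbour in adjacency[node]:
            if neighbour not in parent:   # first visit fixes the parent
                parent[neighbour] = node
                queue.append(neighbour)
    # Each edge points away from the root, towards the periphery.
    return [(parent[v], v) for v in order if parent[v] is not None]

# Small undirected graph containing a cycle (0-1-2) and a pendant node 3.
graph = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}
tree_edges = bfs_tree(graph, root=0)
```

The cycle edge between nodes 1 and 2 is dropped, because node 2 is already reached from the root before node 1 is expanded.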

Making Every Label Count: Handling Semantic Imprecision by Integrating Domain Knowledge

Clemens-Alexander Brust, Björn Barz, Joachim Denzler

Auto-TLDR; Class Hierarchies for Imprecise Label Learning and Annotation eXtrapolation

Noisy data, crawled from the web or supplied by volunteers such as Mechanical Turkers or citizen scientists, is considered an alternative to professionally labeled data. There has been research focused on mitigating the effects of label noise. It is typically modeled as inaccuracy, where the correct label is replaced by an incorrect label from the same set. We consider an additional dimension of label noise: imprecision. For example, a non-breeding snow bunting is labeled as a bird. This label is correct, but not as precise as the task requires. Standard softmax classifiers cannot learn from such a weak label because they consider all classes mutually exclusive, which non-breeding snow bunting and bird are not. We propose CHILLAX (Class Hierarchies for Imprecise Label Learning and Annotation eXtrapolation), a method based on hierarchical classification, to fully utilize labels of any precision. Experiments on noisy variants of NABirds and ILSVRC2012 show that our method outperforms strong baselines by as much as 16.4 percentage points, and the current state of the art by up to 3.9 percentage points.

Learning Natural Thresholds for Image Ranking

Somayeh Keshavarz, Quang Nhat Tran, Richard Souvenir

Auto-TLDR; Image Representation Learning and Label Discretization for Natural Image Ranking

For image ranking tasks with naturally continuous output, such as age and scenicness estimation, it is common to discretize the label range and apply methods from (ordered) classification analysis. In this paper, we propose a data-driven approach for simultaneous representation learning and label discretization. Compared to arbitrarily selecting thresholds, we seek to learn thresholds and image representations by minimizing a novel loss function in an end-to-end model. We demonstrate our combined approach on a variety of image ranking tasks and demonstrate that it outperforms task-specific methods. Additionally, our learned partitioning scheme can be transferred to improve methods that rely on discretization.

Neuron-Based Network Pruning Based on Majority Voting

Ali Alqahtani, Xianghua Xie, Ehab Essa, Mark W. Jones

Auto-TLDR; Large-Scale Neural Network Pruning using Majority Voting

The achievement of neural networks in a variety of applications is accompanied by a dramatic increase in computational costs and memory requirements. In this paper, we propose an efficient method to simultaneously identify the critical neurons and prune the model during training without involving any pre-training or fine-tuning procedures. Unlike existing methods, which accomplish this task in a greedy fashion, we propose a majority voting technique to compare the activation values among neurons and assign a voting score to quantitatively evaluate their importance. This mechanism helps to effectively reduce model complexity by eliminating the less influential neurons and aims to determine a subset of the whole model that can represent the reference model with much fewer parameters within the training process. Experimental results show that majority voting efficiently compresses the network with no drop in model accuracy, pruning more than 79% of the original model parameters on CIFAR10 and more than 91% of the original parameters on MNIST. Moreover, we show that with our proposed method, sparse models can be further pruned into even smaller models by removing more than 60% of the parameters, whilst preserving the reference model accuracy.

A Multilinear Sampling Algorithm to Estimate Shapley Values

Ramin Okhrati, Aldo Lipani

Auto-TLDR; A sampling method for Shapley values for multilayer Perceptrons

Shapley values are great analytical tools in game theory to measure the importance of a player in a game. Due to their axiomatic and desirable properties such as efficiency, they have become popular for feature importance analysis in data science and machine learning. However, the time complexity of computing Shapley values based on the original formula is exponential, and as the number of features increases, this becomes infeasible. Castro et al. [1] developed a sampling algorithm to estimate Shapley values. In this work, we propose a new sampling method based on a multilinear extension technique as applied in game theory. The aim is to provide a more efficient (sampling) method for estimating Shapley values. Our method is applicable to any machine learning model, in particular to either multiclass classification or regression problems. We apply the method to estimate Shapley values for multilayer perceptrons (MLPs), and through experimentation on two datasets, we demonstrate that our method provides more accurate estimations of the Shapley values by reducing the variance of the sampling statistics.
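
To make the sampling idea concrete, here is a minimal sketch of a generic permutation-based Monte Carlo estimator of Shapley values, in the spirit of the sampling baseline mentioned above. It is not the multilinear method proposed in the paper. The function names, the `value` callback and the toy additive game are assumptions introduced only for this illustration.

```python
import random

def shapley_permutation_estimate(value, n_players, n_samples=2000, seed=0):
    """Estimate Shapley values by averaging marginal contributions
    over randomly sampled player orderings.

    `value` is a hypothetical callback mapping a frozenset of player
    indices to the worth of that coalition.
    """
    rng = random.Random(seed)
    totals = [0.0] * n_players
    players = list(range(n_players))
    for _ in range(n_samples):
        rng.shuffle(players)
        coalition = set()
        previous = value(frozenset(coalition))
        for p in players:
            coalition.add(p)
            current = value(frozenset(coalition))
            totals[p] += current - previous   # marginal contribution of p
            previous = current
    return [t / n_samples for t in totals]

def additive_game(coalition):
    """Toy game: a coalition is worth the sum of its members' weights."""
    weights = [1.0, 2.0, 3.0]
    return sum(weights[i] for i in coalition)

estimates = shapley_permutation_estimate(additive_game, 3, n_samples=200)
```

In an additive game every marginal contribution equals the player's own weight, so the estimate matches the exact Shapley values regardless of the sampled orderings. For a real model, `value` would evaluate the model with only the features in the coalition present, and the estimates would carry sampling variance.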

GPSRL: Learning Semi-Parametric Bayesian Survival Rule Lists from Heterogeneous Patient Data

Ameer Hamza Shakur, Xiaoning Qian, Zhangyang Wang, Bobak Mortazavi, Shuai Huang

Auto-TLDR; Semi-parametric Bayesian Survival Rule List Model for Heterogeneous Survival Data

Survival data is often collected in medical applications from a heterogeneous population of patients. While in the past, popular survival models focused on modeling the average effect of the co-variates on survival outcomes, rapidly advancing sensing and information technologies have provided opportunities to further model the heterogeneity of the population as well as the non-linearity of the survival risk. With this motivation, we propose a new semi-parametric Bayesian Survival Rule List model in this paper. Our model derives a rule-based decision-making approach, while within the regime defined by each rule, survival risk is modelled via a Gaussian process latent variable model. Markov Chain Monte Carlo with a nested Laplace approximation for the latent variable model is used to search over the posterior of the rule lists efficiently. The use of ordered rule lists enables us to model heterogeneity while keeping the model complexity in check. Performance evaluations on a synthetic heterogeneous survival dataset and a real world sepsis survival dataset demonstrate the effectiveness of our model.

Creating Classifier Ensembles through Meta-Heuristic Algorithms for Aerial Scene Classification

Álvaro Roberto Ferreira Jr., Gustavo Henrique De Rosa, Joao Paulo Papa, Gustavo Carneiro, Fabio Augusto Faria

Auto-TLDR; Univariate Marginal Distribution Algorithm for Aerial Scene Classification Using Meta-Heuristic Optimization

Aerial scene classification is a challenging task to be solved in the remote sensing area, whereas deep learning approaches, such as Convolutional Neural Networks (CNN), are being widely employed to overcome such a problem. Nevertheless, it is not straightforward to find single CNN models that can solve all aerial scene classification tasks, allowing the nurturing of a better alternative, which is to fuse CNN-based classifiers into an ensemble. However, an appropriate choice of the classifiers that will belong to the ensemble is a critical factor, as it is unfeasible to employ all the possible classifiers in the literature. Therefore, this work proposes a novel framework based on meta-heuristic optimization for creating optimized-ensembles in the context of aerial scene classification. The experimental results were performed across nine meta-heuristic algorithms and three aerial scene literature datasets, being compared in terms of effectiveness (accuracy), efficiency (execution time), and behavioral performance in different scenarios. Finally, one can observe that the Univariate Marginal Distribution Algorithm (UMDA) overcame popular literature meta-heuristic algorithms, such as Genetic Programming and Particle Swarm Optimization considering the adopted criteria in the performed experiments.

How to Define a Rejection Class Based on Model Learning?

Sarah Laroui, Xavier Descombes, Aurelia Vernay, Florent Villiers, Francois Villalba, Eric Debreuve

Auto-TLDR; An innovative learning strategy for supervised classification that is able, by design, to reject a sample as not belonging to any of the known classes

In supervised classification, the learning process typically trains a classifier to optimize the accuracy of classifying data into the classes that appear in the learning set, and only them. While this framework fits many use cases, there are situations where the learning process is knowingly performed using a learning set that only represents the data that have been observed so far among a virtually unconstrained variety of possible samples. It is then crucial to define a classifier which has the ability to reject a sample, i.e., to classify it into a rejection class that has not yet been defined. Although obvious solutions can add this ability a posteriori to a classifier that has been learned classically, a better approach seems to be to directly account for this requirement in the classifier design. In this paper, we propose an innovative learning strategy for supervised classification that is able, by design, to reject a sample as not belonging to any of the known classes. For that, we rely on modeling each class as the combination of a probability density function (PDF) and a threshold that is computed with respect to the other classes. Several alternatives are proposed and compared in this framework. A comparison with straightforward approaches is also provided.

Deep Transfer Learning for Alzheimer’s Disease Detection

Nicole Cilia, Claudio De Stefano, Francesco Fontanella, Claudio Marrocco, Mario Molinara, Alessandra Scotto Di Freca

Auto-TLDR; Automatic Detection of Handwriting Alterations for Alzheimer's Disease Diagnosis using Dynamic Features

Early detection of Alzheimer’s Disease (AD) is essential in order to initiate therapies that can reduce the effects of such a disease, improving both life quality and life expectancy of patients. Among all the activities carried out in our daily life, handwriting seems one of the first to be influenced by the onset of neurodegenerative diseases. For this reason, the analysis of handwriting and the study of its alterations has become of great interest in this research field in order to make a diagnosis as early as possible. In recent years, many studies have tried to use classification algorithms applied to handwriting to implement decision support systems for AD diagnosis. A key issue for the use of these techniques is the detection of effective features that allow the system to distinguish the natural handwriting alterations due to age from those caused by neurodegenerative disorders. In this context, many interesting results have been published in the literature in which the features have been typically selected by hand, generally considering the dynamics of the handwriting process in order to detect motor disorders closely related to AD. Features directly derived from handwriting generation models can also be very helpful for AD diagnosis. It should be remarked, however, that the above features do not consider changes in the shape of handwritten traces, which may occur as a consequence of neurodegenerative diseases, as well as the correlation among shape alterations and changes in the dynamics of the handwriting process. Moving from these considerations, the aim of this study is to verify whether the combined use of both shape and dynamic features allows a decision support system to improve performance for AD diagnosis.
To this purpose, starting from a database of on-line handwriting samples, we generated for each of them a synthetic off-line colour image, where the colour of each elementary trait encodes, in the three RGB channels, the dynamic information associated with that trait. Finally, we exploited the capability of Deep Neural Networks (DNN) to automatically extract features from raw images. The experimental comparison of the results obtained by using standard features and features extracted according to the above procedure confirmed the effectiveness of our approach.

Watermelon: A Novel Feature Selection Method Based on Bayes Error Rate Estimation and a New Interpretation of Feature Relevance and Redundancy

Xiang Xie, Wilhelm Stork

Auto-TLDR; Feature Selection Using Bayes Error Rate Estimation for Dynamic Feature Selection

Feature selection has become a crucial part of many classification problems in which high-dimensional datasets may contain tens of thousands of features. In this paper, we propose a novel feature selection method that scores the features by estimating the Bayes error rate based on kernel density estimation. Additionally, we update the scores of features dynamically by quantitatively interpreting the effects of feature relevance and redundancy in a new way. In contrast to the common heuristic applied by many feature selection methods, which prefers choosing features that are not relevant to each other, our approach penalizes only monotonically correlated features and rewards any other kind of relevance among features, based on Spearman’s rank correlation coefficient and normalized mutual information. We conduct extensive experiments on seventeen diverse classification benchmarks; the results show that our approach outperforms seventeen other popular state-of-the-art feature selection methods in most cases.

Learning Parameter Distributions to Detect Concept Drift in Data Streams

Johannes Haug, Gjergji Kasneci

Auto-TLDR; A novel framework for the detection of concept drift in streaming environments

Data distributions in streaming environments are usually not stationary. In order to maintain a high predictive quality at all times, online learning models need to adapt to distributional changes, which are known as concept drift. The timely and robust identification of concept drift can be difficult, as we never have access to the true distribution of streaming data. In this work, we propose a novel framework for the detection of real concept drift, called ERICS. By treating the parameters of a predictive model as random variables, we show that concept drift corresponds to a change in the distribution of optimal parameters. To this end, we adopt common measures from information theory. The proposed framework is completely model-agnostic. By choosing an appropriate base model, ERICS is also capable of detecting concept drift at the input level, which is a significant advantage over existing approaches. An evaluation on several synthetic and real-world data sets suggests that the proposed framework identifies concept drift more effectively and precisely than various existing works.

Hierarchical Mixtures of Generators for Adversarial Learning

Alper Ahmetoğlu, Ethem Alpaydin

Auto-TLDR; Hierarchical Mixture of Generative Adversarial Networks

Generative adversarial networks (GANs) are deep neural networks that allow us to sample from an arbitrary probability distribution without explicitly estimating the distribution. There is a generator that takes a latent vector as input and transforms it into a valid sample from the distribution. There is also a discriminator that is trained to discriminate such fake samples from true samples of the distribution; at the same time, the generator is trained to generate fakes that the discriminator cannot tell apart from the true samples. Instead of learning a global generator, a recent approach involves training multiple generators, each responsible for one part of the distribution. In this work, we review such approaches and propose the hierarchical mixture of generators, inspired by the hierarchical mixture of experts model, that learns a tree structure implementing a hierarchical clustering with soft splits in the decision nodes and local generators in the leaves. Since the generators are combined softly, the whole model is continuous and can be trained using gradient-based optimization, just like the original GAN model. Our experiments on five image data sets, namely, MNIST, FashionMNIST, UTZap50K, Oxford Flowers, and CelebA, show that our proposed model generates samples of high quality and diversity in terms of popular GAN evaluation metrics. The learned hierarchical structure also leads to knowledge extraction.

A Novel Adaptive Minority Oversampling Technique for Improved Classification in Data Imbalanced Scenarios

Ayush Tripathi, Rupayan Chakraborty, Sunil Kumar Kopparapu

Auto-TLDR; Synthetic Minority OverSampling Technique for Imbalanced Data

Imbalance in the proportion of training samples belonging to different classes often degrades the performance of conventional classifiers. This is primarily due to the tendency of the classifier to be biased towards the majority classes in the imbalanced dataset. In this paper, we propose a novel three-step technique to address imbalanced data. As a first step, we significantly oversample the minority class distribution by employing the traditional Synthetic Minority OverSampling Technique (SMOTE) algorithm using the neighborhood of the minority class samples, and in the next step we partition the generated samples using a Gaussian-Mixture Model based clustering algorithm. In the final step, synthetic data samples are chosen based on the weight associated with the cluster, the weight itself being determined by the distribution of the majority class samples. Extensive experiments on several standard datasets from diverse domains show the usefulness of the proposed technique in comparison with the original SMOTE and its state-of-the-art variants.
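
As an illustration of the first step only, the following is a minimal sketch of the basic SMOTE interpolation idea: each synthetic sample lies on the line segment between a minority sample and one of its nearest minority neighbours. The clustering and cluster-weighted selection steps of the proposed technique are not shown. The function name, parameters and toy data are assumptions made for this example.

```python
import math
import random

def smote_oversample(minority, n_new, k=3, seed=0):
    """Generate `n_new` synthetic points by interpolating between a
    randomly chosen minority point and one of its k nearest minority
    neighbours (basic SMOTE idea; hypothetical helper)."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest minority neighbours of the base point, excluding itself.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()            # interpolation factor in [0, 1)
        synthetic.append([b + gap * (q - b) for b, q in zip(base, neighbour)])
    return synthetic

# Toy minority class: the four corners of the unit square.
minority_points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
synthetic_points = smote_oversample(minority_points, n_new=10)
```

Because every synthetic point is a convex combination of two minority points, it always stays inside the bounding box of the minority class.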

Separation of Aleatoric and Epistemic Uncertainty in Deterministic Deep Neural Networks

Denis Huseljic, Bernhard Sick, Marek Herde, Daniel Kottke

Auto-TLDR; AE-DNN: Modeling Uncertainty in Deep Neural Networks

Despite the success of deep neural networks (DNN) in many applications, their ability to model uncertainty is still significantly limited. For example, in safety-critical applications such as autonomous driving, it is crucial to obtain a prediction that reflects different types of uncertainty to address life-threatening situations appropriately. In such cases, it is essential to be aware of the risk (i.e., aleatoric uncertainty) and the reliability (i.e., epistemic uncertainty) that comes with a prediction. We present AE-DNN, a model allowing the separation of aleatoric and epistemic uncertainty while maintaining a proper generalization capability. AE-DNN is based on deterministic DNN, which can determine the respective uncertainty measures in a single forward pass. In analyses with synthetic and image data, we show that our method improves the modeling of epistemic uncertainty while providing an intuitively understandable separation of risk and reliability.

Automatic Tuberculosis Detection Using Chest X-Ray Analysis with Position Enhanced Structural Information

Hermann Jepdjio Nkouanga, Szilard Vajda

Auto-TLDR; Automatic Chest X-ray Screening for Tuberculosis in Rural Population using Localized Region of Interest

For Tuberculosis (TB) detection, besides the more expensive diagnostic solutions such as culture or sputum smear analysis, one could consider the automatic analysis of the chest X-ray (CXR). This could mimic the lung region reading by the radiologist and it could provide a cheap solution to analyze and diagnose pulmonary abnormalities such as TB, which often co-occurs with HIV. This software-based pulmonary screening can be a reliable and affordable solution for rural populations in different parts of the world such as India, Africa, etc. Our fully automatic system processes the incoming CXR image by applying image processing techniques to detect the region of interest (ROI), followed by a computationally cheap feature extraction involving edge detection using Laplacian of Gaussian, which we enrich by counting the local distribution of the intensities. The choice to "zoom in" on the ROI and look for abnormalities locally is motivated by the fact that some pulmonary abnormalities are localized in specific regions of the lungs. Later on, the classifiers can decide about the normal or abnormal nature of each lung X-ray. Our goal is to find a simple feature, instead of a combination of several ones as proposed and promoted in recent years' literature, which can properly describe the different pathological alterations in the lungs. Our experiments report results on two publicly available data collections, namely the Shenzhen and the Montgomery collections. For performance evaluation, measures such as area under the curve (AUC) and accuracy (ACC) were considered, achieving AUC = 0.81 (ACC = 83.33%) and AUC = 0.96 (ACC = 96.35%) for the Montgomery and Shenzhen collections, respectively. Several comparisons to other state-of-the-art systems reported recently in the field are also provided.

Low-Cost Lipschitz-Independent Adaptive Importance Sampling of Stochastic Gradients

Huikang Liu, Xiaolu Wang, Jiajin Li, Man-Cho Anthony So

Auto-TLDR; Adaptive Importance Sampling for Stochastic Gradient Descent

Stochastic gradient descent (SGD) usually samples training data based on the uniform distribution, which may not be a good choice because of the high variance of its stochastic gradient. Thus, importance sampling methods are considered in the literature to improve the performance. Most previous work on SGD-based methods with importance sampling requires knowledge of the Lipschitz constants of all component gradients, which are in general difficult to estimate. In this paper, we study an adaptive importance sampling method for common SGD-based methods by exploiting the local first-order information without knowing any Lipschitz constants. In particular, we periodically change the sampling distribution by only utilizing the gradient norms in the past few iterations. We prove that our adaptive importance sampling non-asymptotically reduces the variance of the stochastic gradients in SGD, and thus better convergence bounds than those for vanilla SGD can be obtained. We extend this sampling method to several other widely used stochastic gradient algorithms including SGD with momentum and ADAM. Experiments on common convex learning problems and deep neural networks illustrate notably enhanced performance using the adaptive sampling strategy.
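
The general mechanism can be sketched as follows, under simplifying assumptions: sampling probabilities are set proportional to recently observed per-sample gradient norms, and each sampled gradient is reweighted by 1/(n p_i) so that the estimator of the average gradient stays unbiased. This is a generic importance-sampling sketch, not the update schedule or the analysis from the paper; all names and numbers are hypothetical.

```python
import random

def sampling_distribution(recent_grad_norms, floor=1e-8):
    """Probabilities proportional to recent gradient norms.
    The small floor keeps every sample reachable."""
    scores = [max(g, floor) for g in recent_grad_norms]
    total = sum(scores)
    return [s / total for s in scores]

def sample_with_weight(probs, rng):
    """Draw an index i with probability probs[i] and return it together
    with the weight 1 / (n * probs[i]) that keeps the reweighted
    gradient an unbiased estimate of the average gradient."""
    n = len(probs)
    i = rng.choices(range(n), weights=probs, k=1)[0]
    return i, 1.0 / (n * probs[i])

# Toy scalar "gradients", reused here as their own norms.
grad_norms = [0.5, 1.5, 2.0]
probs = sampling_distribution(grad_norms)

# Exact expectation of the reweighted estimator under `probs`.
expected = sum(p * (1.0 / (len(probs) * p)) * g
               for p, g in zip(probs, grad_norms))
uniform_mean = sum(grad_norms) / len(grad_norms)

index, weight = sample_with_weight(probs, random.Random(0))
```

The expectation of the reweighted estimator equals the plain average of the gradients for any strictly positive sampling distribution; the choice of distribution only affects the variance.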

Bayesian Active Learning for Maximal Information Gain on Model Parameters

Kasra Arnavaz, Aasa Feragen, Oswin Krause, Marco Loog

Auto-TLDR; Bayesian assumptions for Bayesian classification

The fact that machine learning models, despite their advancements, are still trained on randomly gathered data is proof that a lasting solution to the problem of optimal data gathering has not yet been found. In this paper, we investigate whether a Bayesian approach to the classification problem can provide assumptions under which one is guaranteed to perform at least as well as random sampling. For a logistic regression model, we show that maximal expected information gain on model parameters is a promising criterion for selecting samples, assuming that our classification model is well-matched to the data. Our derived criterion is closely related to the maximum model change. We experiment with data sets which satisfy this assumption to varying degrees to see how sensitive our performance is to the violation of our assumption in practice.

Dimensionality Reduction for Data Visualization and Linear Classification, and the Trade-Off between Robustness and Classification Accuracy

Martin Becker, Jens Lippel, Thomas Zielke

Responsive image

Auto-TLDR; Robustness Assessment of Deep Autoencoder for Data Visualization using Scatter Plots

This paper has three intertwined goals. The first is to introduce a new similarity measure for scatter plots. It uses Delaunay triangulations to compare two scatter plots regarding their relative positioning of clusters. The second is to apply this measure for the robustness assessment of a recent deep neural network (DNN) approach to dimensionality reduction (DR) for data visualization. It uses a nonlinear generalization of Fisher's linear discriminant analysis (LDA) as the encoder network of a deep autoencoder (DAE). The DAE's decoder network acts as a regularizer. The third goal is to look at different variants of the DNN: ones that promise robustness and ones that promise high classification accuracies. This is to study the trade-off between these two objectives -- our results support the recent claim that robustness may be at odds with accuracy; however, results that are balanced regarding both objectives are achievable. We see a restricted Boltzmann machine (RBM) pretraining and the DAE based regularization as important building blocks for achieving balanced results. As a means of assessing the robustness of DR methods, we propose a measure that is based on our similarity measure for scatter plots. The robustness measure comes with a superimposition view of Delaunay triangulations, which allows a fast comparison of results from multiple DR methods.

Weakly Supervised Learning through Rank-Based Contextual Measures

João Gabriel Camacho Presotto, Lucas Pascotti Valem, Nikolas Gomes De Sá, Daniel Carlos Guimaraes Pedronette, Joao Paulo Papa

Auto-TLDR; Exploiting Unlabeled Data for Weakly Supervised Classification of Multimedia Data

Machine learning approaches have achieved remarkable advances over the last decades, especially in supervised learning tasks such as classification. Meanwhile, multimedia data and applications have experienced explosive growth, becoming ubiquitous in diverse domains. Due to the huge increase in multimedia data collections and the lack of labeled data in several scenarios, creating methods capable of exploiting unlabeled data and operating under weak supervision is imperative. In this work, we propose a rank-based model that exploits contextual information encoded in the unlabeled data in order to perform weakly supervised classification. We employ different rank-based correlation measures to identify strong similarity relationships and expand the labeled set in an unsupervised way. Subsequently, the extended labeled set is used by a classifier to achieve better accuracy. The proposed weakly supervised approach was evaluated on multimedia classification tasks, considering several combinations of rank correlation measures and classifiers. An experimental evaluation was conducted on four public image datasets and different features. Very positive gains were achieved in comparison with various semi-supervised and supervised classifiers taken as baselines when considering the same amount of labeled data.
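The label-expansion step can be sketched as follows. This is an illustrative assumption, not the paper's model: it uses the Jaccard overlap of top-k neighbor lists as the rank-based similarity, and a fixed `threshold` decides whether an unlabeled item inherits the label of its most similar labeled item. The names `top_k`, `jaccard`, and `expand_labels` are hypothetical.

```python
def top_k(dist_row, k):
    """Indices of the k nearest items according to one row of a distance matrix."""
    return set(sorted(range(len(dist_row)), key=lambda j: dist_row[j])[:k])

def jaccard(a, b):
    """Overlap of two neighbor sets, in [0, 1]."""
    return len(a & b) / len(a | b)

def expand_labels(dist, labels, k=3, threshold=0.5):
    """Propagate labels to unlabeled items whose ranked neighborhoods
    strongly overlap with that of a labeled item.

    dist:   square distance matrix as a list of lists
    labels: dict mapping item index -> label for the labeled subset
    """
    neighbors = [top_k(row, k) for row in dist]
    expanded = dict(labels)
    for u in range(len(dist)):
        if u in labels:
            continue
        # Find the labeled item whose top-k list is most similar to u's.
        best = max(labels, key=lambda l: jaccard(neighbors[u], neighbors[l]))
        if jaccard(neighbors[u], neighbors[best]) >= threshold:
            expanded[u] = labels[best]
    return expanded

if __name__ == "__main__":
    points = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
    dist = [[abs(p - q) for q in points] for p in points]
    print(expand_labels(dist, {0: "a", 3: "b"}))
```

The expanded dictionary would then serve as the training set for any standard classifier, which corresponds to the second stage described in the abstract.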

PIF: Anomaly detection via preference embedding

Filippo Leveni, Luca Magri, Giacomo Boracchi, Cesare Alippi

Auto-TLDR; PIF: Anomaly Detection with Preference Embedding for Structured Patterns

We address the problem of detecting anomalies with respect to structured patterns. To this end, we conceive a novel anomaly detection method called PIF, which combines the advantages of adaptive isolation methods with the flexibility of preference embedding. Specifically, we propose to embed the data in a high-dimensional space where an efficient tree-based method, PI-FOREST, is employed to compute an anomaly score. Experiments on synthetic and real datasets demonstrate that PIF compares favorably with state-of-the-art anomaly detection techniques, and confirm that PI-FOREST is better at measuring arbitrary distances and isolating points in the preference space.

Comparison of Deep Learning and Hand Crafted Features for Mining Simulation Data

Theodoros Georgiou, Sebastian Schmitt, Thomas Baeck, Nan Pu, Wei Chen, Michael Lew

Auto-TLDR; Automated Data Analysis of Flow Fields in Computational Fluid Dynamics Simulations

Computational Fluid Dynamics (CFD) simulations are a very important tool for many industrial applications, such as aerodynamic optimization of engineering designs like car shapes, airplane parts, etc. The output of such simulations, in particular the calculated flow fields, is usually very complex and hard to interpret for realistic three-dimensional real-world applications, especially if time-dependent simulations are investigated. Automated data analysis methods are warranted, but the very large dimensionality of the data poses a non-trivial obstacle. A flow field typically consists of six measurement values for each point of the computational grid in 3D space and time (velocity vector values, turbulent kinetic energy, pressure and viscosity). In this paper we address the task of extracting meaningful results in an automated manner from such high-dimensional data sets. We propose deep learning methods that are capable of processing such data and that can be trained to solve relevant tasks on simulation data, e.g. predicting drag and lift forces applied on an airfoil. We also propose an adaptation of the classical hand-crafted features known from computer vision to address the same problem and compare a large variety of descriptors and detectors. Finally, we compile a large dataset of 2D simulations of the flow field around airfoils, containing 16000 flow fields, with which we tested and compared approaches. Our results show that the deep learning-based methods, as well as hand-crafted feature-based approaches, are well capable of accurately describing the content of the CFD simulation output on the proposed dataset.