Learning Sign-Constrained Support Vector Machines

Kenya Tajima, Kouhei Tsuchida, Esmeraldo Ronnie Rey Zara, Naoya Ohta, Tsuyoshi Kato

Responsive image

Auto-TLDR; Constrained Sign Constraints for Learning Linear Support Vector Machine

Poster

Domain knowledge is useful to improve the generalization performance of learning machines. Sign constraints are a handy representation to combine domain knowledge with learning machine. In this paper, we consider constraining the signs of the weight coefficients in learning the linear support vector machine, and develop two optimization algorithms for minimizing the empirical risk under the sign constraints. One of the two algorithms is based on the projected gradient method, in which each iteration of the projected gradient method takes O(nd) computational cost and the sublinear convergence of the objective error is guaranteed. The second algorithm is based on the Frank-Wolfe method that also converges sublinearly and possesses a clear termination criterion. We show that each iteration of the Frank-Wolfe also requires O(nd) cost. Furthermore, we derive the explicit expression for the minimal iteration number to ensure an epsilon-accurate solution by analyzing the curvature of the objective function. Finally, we empirically demonstrate that the sign constraints are a promising technique when similarities to the training examples compose the feature vector.

Similar papers

Low-Cost Lipschitz-Independent Adaptive Importance Sampling of Stochastic Gradients

Huikang Liu, Xiaolu Wang, Jiajin Li, Man-Cho Anthony So

Responsive image

Auto-TLDR; Adaptive Importance Sampling for Stochastic Gradient Descent

Slides Similar

Stochastic gradient descent (SGD) usually samples training data based on the uniform distribution, which may not be a good choice because of the high variance of its stochastic gradient. Thus, importance sampling methods are considered in the literature to improve the performance. Most previous work on SGD-based methods with importance sampling requires the knowledge of Lipschitz constants of all component gradients, which are in general difficult to estimate. In this paper, we study an adaptive importance sampling method for common SGD-based methods by exploiting the local first-order information without knowing any Lipschitz constants. In particular, we periodically changes the sampling distribution by only utilizing the gradient norms in the past few iterations. We prove that our adaptive importance sampling non-asymptotically reduces the variance of the stochastic gradients in SGD, and thus better convergence bounds than that for vanilla SGD can be obtained. We extend this sampling method to several other widely used stochastic gradient algorithms including SGD with momentum and ADAM. Experiments on common convex learning problems and deep neural networks illustrate notably enhanced performance using the adaptive sampling strategy.

Classification and Feature Selection Using a Primal-Dual Method and Projections on Structured Constraints

Michel Barlaud, Antonin Chambolle, Jean_Baptiste Caillau

Responsive image

Auto-TLDR; A Constrained Primal-dual Method for Structured Feature Selection on High Dimensional Data

Slides Poster Similar

This paper deals with feature selection using supervised classification on high dimensional datasets. A classical approach is to project data on a low dimensional space and classify by minimizing an appropriate quadratic cost. Our first contribution is to introduce a matrix of centers in the definition of this cost. Moreover, as quadratic costs are not robust to outliers, we propose to use an $\ell_1$ cost instead (or Huber loss to mitigate overfitting issues). While control on sparsity is commonly obtained by adding an $\ell_1$ constraint on the vectorized matrix of weights used for projecting the data, our second contribution is to enforce structured sparsity. To this end we propose constraints that take into account the matrix structure of the data, based either on the nuclear norm, on the $\ell_{2,1}$ norm, or on the $\ell_{1,2}$ norm for which we provide a new projection algorithm. We optimize simultaneously the projection matrix and the matrix of centers thanks to a new tailored constrained primal-dual method. The primal-dual framework is general enough to encompass the various robust losses and structured constraints we use, and allows a convergence analysis. We demonstrate the effectiveness of the approach on three biological datasets. Our primal-dual method with robust losses, adaptive centers and structured constraints does significantly better than classical methods, both in terms of accuracy and computational time.

Unveiling Groups of Related Tasks in Multi-Task Learning

Jordan Frecon, Saverio Salzo, Massimiliano Pontil

Responsive image

Auto-TLDR; Continuous Bilevel Optimization for Multi-Task Learning

Slides Poster Similar

A common approach in multi-task learning is to encourage the tasks to share a low dimensional representation. This has led to the popular method of trace norm regularization, which has proved effective in many applications. In this paper, we extend this approach by allowing the tasks to partition into different groups, within which trace norm regularization is separately applied. We propose a continuous bilevel optimization framework to simultaneously identify groups of related tasks and learn a low dimensional representation within each group. Hinging on recent results on the derivative of generalized matrix functions, we devise a smooth approximation of the upper-level objective via a dual forward-backward algorithm with Bregman distances. This allows us to solve the bilevel problem by a gradient-based scheme. Numerical experiments on synthetic and benchmark datasets support the effectiveness of the proposed method.

RNN Training along Locally Optimal Trajectories via Frank-Wolfe Algorithm

Yun Yue, Ming Li, Venkatesh Saligrama, Ziming Zhang

Responsive image

Auto-TLDR; Frank-Wolfe Algorithm for Efficient Training of RNNs

Slides Poster Similar

We propose a novel and efficient training method for RNNs by iteratively seeking a local minima on the loss surface within a small region, and leverage this directional vector for the update, in an outer-loop. We propose to utilize the Frank-Wolfe (FW) algorithm in this context. Although, FW implicitly involves normalized gradients, which can lead to a slow convergence rate, we develop a novel RNN training method that, surprisingly, even with the additional cost, the overall training cost is empirically observed to be lower than back-propagation. Our method leads to a new Frank-Wolfe method, that is in essence an SGD algorithm with a restart scheme. We prove that under certain conditions our algorithm has a sublinear convergence rate of $O(1/\epsilon)$ for $\epsilon$ error. We then conduct empirical experiments on several benchmark datasets including those that exhibit long-term dependencies, and show significant performance improvement. We also experiment with deep RNN architectures and show efficient training performance. Finally, we demonstrate that our training method is robust to noisy data.

Automatically Mining Relevant Variable Interactions Via Sparse Bayesian Learning

Ryoichiro Yafune, Daisuke Sakuma, Yasuo Tabei, Noritaka Saito, Hiroto Saigo

Responsive image

Auto-TLDR; Sparse Bayes for Interpretable Non-linear Prediction

Slides Poster Similar

With the rapid increase in the availability of large amount of data, prediction is becoming increasingly popular, and has widespread through our daily life. However, powerful non- linear prediction methods such as deep learning and SVM suffer from interpretability problem, making it hard to use in domains where the reason for decision making is required. In this paper, we develop an interpretable non-linear model called itemset Sparse Bayes (iSB), which builds a Bayesian probabilistic model, while simultaneously considering variable interactions. In order to suppress the resulting large number of variables, sparsity is imposed on regression weights by a sparsity inducing prior. As a subroutine to search for variable interactions, itemset enumeration algorithm is employed with a novel bounding condition. In computational experiments using real-world dataset, the proposed method performed better than decision tree by 10% in terms of r-squared . We also demonstrated the advantage of our method in Bayesian optimization setting, in which the proposed approach could successfully find the maximum of an unknown function faster than Gaussian process. The interpretability of iSB is naturally inherited to Bayesian optimization, thereby gives us a clue to understand which variables interactions are important in optimizing an unknown function.

Learning Sparse Deep Neural Networks Using Efficient Structured Projections on Convex Constraints for Green AI

Michel Barlaud, Frederic Guyard

Responsive image

Auto-TLDR; Constrained Deep Neural Network with Constrained Splitting Projection

Slides Poster Similar

In recent years, deep neural networks (DNN) have been applied to different domains and achieved dramatic performance improvements over state-of-the-art classical methods. These performances of DNNs were however often obtained with networks containing millions of parameters and which training required heavy computational power. In order to cope with this computational issue a huge literature deals with proximal regularization methods which are time consuming.\\ In this paper, we propose instead a constrained approach. We provide the general framework for our new splitting projection gradient method. Our splitting algorithm iterates a gradient step and a projection on convex sets. We study algorithms for different constraints: the classical $\ell_1$ unstructured constraint and structured constraints such as the nuclear norm, the $\ell_{2,1} $ constraint (Group LASSO). We propose a new $\ell_{1,1} $ structured constraint for which we provide a new projection algorithm We demonstrate the effectiveness of our method on three popular datasets (MNIST, Fashion MNIST and CIFAR). Experiments on these datasets show that our splitting projection method with our new $\ell_{1,1} $ structured constraint provides the best reduction of memory and computational power. Experiments show that fully connected linear DNN are more efficient for green AI.

Feature Extraction by Joint Robust Discriminant Analysis and Inter-Class Sparsity

Fadi Dornaika, Ahmad Khoder

Responsive image

Auto-TLDR; Robust Discriminant Analysis with Feature Selection and Inter-class Sparsity (RDA_FSIS)

Slides Similar

Feature extraction methods have been successfully applied to many real-world applications. The classical Linear Discriminant Analysis (LDA) and its variants are widely used as feature extraction methods. Although they have been used for different classification tasks, these methods have some shortcomings. The main one is that the projection axes obtained are not informative about the relevance of original features. In this paper, we propose a linear embedding method that merges two interesting properties: Robust LDA and inter-class sparsity. Furthermore, the targeted projection transformation focuses on the most discriminant original features. The proposed method is called Robust Discriminant Analysis with Feature Selection and Inter-class Sparsity (RDA_FSIS). Two kinds of sparsity are explicitly included in the proposed model. The first kind is obtained by imposing the $\ell_{2,1}$ constraint on the projection matrix in order to perform feature ranking. The second kind is obtained by imposing the inter-class sparsity constraint used for getting a common sparsity structure in each class. Comprehensive experiments on five real-world image datasets demonstrate the effectiveness and advantages of our framework over existing linear methods.

Stochastic Runge-Kutta Methods and Adaptive SGD-G2 Stochastic Gradient Descent

Gabriel Turinici, Imen Ayadi

Responsive image

Auto-TLDR; Adaptive Stochastic Runge Kutta for the Minimization of the Loss Function

Slides Poster Similar

The minimization of the loss function is of paramount importance in deep neural networks. Many popular optimization algorithms have been shown to correspond to some evolution equation of gradient flow type. Inspired by the numerical schemes used for general evolution equations, we introduce a second-order stochastic Runge Kutta method and show that it yields a consistent procedure for the minimization of the loss function. In addition, it can be coupled, in an adaptive framework, with the Stochastic Gradient Descent (SGD) to adjust automatically the learning rate of the SGD The resulting adaptive SGD, called SGD-G2, shows good results in terms of convergence speed when tested on standard data-sets.

Wasserstein k-Means with Sparse Simplex Projection

Takumi Fukunaga, Hiroyuki Kasai

Responsive image

Auto-TLDR; SSPW $k$-means: Sparse Simplex Projection-based Wasserstein $ k$-Means Algorithm

Slides Poster Similar

This paper presents a proposal of a faster Wasserstein $k$-means algorithm for histogram data by reducing Wasserstein distance computations exploiting sparse simplex projection. We shrink data samples, centroids and ground cost matrix, which enables significant reduction of the computations to solve optimal transport problems without loss of clustering quality. Furthermore, we dynamically reduce computational complexity by removing lower-valued data samples harnessing sparse simplex projection while keeping degradation of clustering quality lower. We designate this proposed algorithm as sparse simplex projection-based Wasserstein $k$-means, for short, SSPW $k$-means. Numerical evaluations against Wasserstein $k$-means algorithm demonstrate the effectiveness of the proposed SSPW $k$-means on real-world datasets.

Double Manifolds Regularized Non-Negative Matrix Factorization for Data Representation

Jipeng Guo, Shuai Yin, Yanfeng Sun, Yongli Hu

Responsive image

Auto-TLDR; Double Manifolds Regularized Non-negative Matrix Factorization for Clustering

Slides Poster Similar

Non-negative matrix factorization (NMF) is an important method in learning latent data representation. The local geometrical structure can make the learned representation more effectively and significantly improve the performance of NMF. However, most of existing graph-based learning methods are determined by a predefined similarity graph which may be not optimal for specific tasks. To solve the above the problem, we propose the Double Manifolds Regularized NMF (DMR-NMF) model which jointly learns an adaptive affinity matrix with the non-negative matrix factorization. The learned affinity matrix can guide the NMF to fit the clustering task. Moreover, we develop the iterative updating optimization schemes for DMR-NMF, and provide the strict convergence proof of our optimization strategy. Empirical experiments on four different real-world data sets demonstrate the state-of-the-art performance of DMR-NMF in comparison with the other related algorithms.

Mean Decision Rules Method with Smart Sampling for Fast Large-Scale Binary SVM Classification

Alexandra Makarova, Mikhail Kurbakov, Valentina Sulimova

Responsive image

Auto-TLDR; Improving Mean Decision Rule for Large-Scale Binary SVM Problems

Slides Poster Similar

This paper relies on the Mean Decision Rule (MDR) method for solving large-scale binary SVM problems. It consists in taking small random samples of the full dataset and separate training for each of them with consecutive averaging the respective individual decision rules to obtain a final one. This paper proposes two new approaches to improve it. The first proposed approach is a new sampling technique that exploits SVM and MDR properties to fast form so called smart samples by selecting only the objects, that are candidates to be the support ones. The proposed technique essentially increases MDR convergence and allows to reach the highest quality in less time. In the case of kernel-based MDR (KMDR) the proposed sampling technique allows additionally to reduce the number of support objects in the final decision rule and, as a result, to decrease the recognition time. The second proposed approach is a new data strategy to accelerate random access to large datasets stored in the traditional libsvm format. The proposed strategy allows to quickly extract random subsets of objects from a file and load them into RAM, and is it also suitable for any sampling-based methods, including stochastic gradient methods. Joint using of the proposed approaches with (K)MDR allows to obtain the best (or near the best) decision of large-scale binary SVM problems faster, compared to the existing SVM solvers.

An Efficient Empirical Solver for Localized Multiple Kernel Learning Via DNNs

Ziming Zhang

Responsive image

Auto-TLDR; Localized Multiple Kernel Learning using LMKL-Net

Slides Poster Similar

In this paper we propose solving localized multiple kernel learning (LMKL) using LMKL-Net, a feedforward deep neural network (DNN). In contrast to previous works, as a learning principle we propose parameterizing the gating function for learning kernel combination weights and the multiclass classifier using an attentional network (AN) and a multilayer perceptron (MLP), respectively. Such interpretability helps us better understand how the network solves the problem. Thanks to stochastic gradient descent (SGD), our approach has {\em linear} computational complexity in training. Empirically on benchmark datasets we demonstrate that with comparable or better accuracy than the state-of-the-art, our LMKL-Net can be trained about {\bf two orders of magnitude} faster with about {\bf two orders of magnitude} smaller memory footprint for large-scale learning.

A Randomized Algorithm for Sparse Recovery

Huiyuan Yu, Maggie Cheng, Yingdong Lu

Responsive image

Auto-TLDR; A Constrained Graph Optimization Algorithm for Sparse Signal Recovery

Poster Similar

This paper considers the problem of sparse signal recovery where there is a structure in the signal. Efficient recovery schemes can be designed to leverage the signal structure. Following the model-based compressive sensing framework, we have developed an efficient algorithm for both head and tail approximations for the model-projection problem. The problem is modeled as a constrained graph optimization problem, which is an NP-hard optimization problem. Solving the NP-hard optimization program is then transformed to solving a linear program and finding a randomized algorithm to find an integral solution. The integral solution is optimal-in-expectation. The algorithm is proved to have the same geometric convergence as previous work. The algorithm has been tested on various compressing matrices. It worked well with the matrices with the Restricted Isometry Property (RIP), also worked well with some matrices that have not been shown to have RIP. The proposed algorithm demonstrated improved recoverability and used fewer number of iterations to recover the signal.

Adaptive Sampling of Pareto Frontiers with Binary Constraints Using Regression and Classification

Raoul Heese, Michael Bortz

Responsive image

Auto-TLDR; Adaptive Optimization for Black-Box Multi-Objective Optimizing Problems with Binary Constraints

Poster Similar

We present a novel adaptive optimization algorithm for black-box multi-objective optimization problems with binary constraints on the foundation of Bayes optimization. Our method is based on probabilistic regression and classification models, which act as a surrogate for the optimization goals and allow us to suggest multiple design points at once in each iteration. The proposed acquisition function is intuitively understandable and can be tuned to the demands of the problems at hand. We also present a novel ellipsoid truncation method to speed up the expected hypervolume calculation in a straightfoward way for regression models with a normal probability density. We benchmark our approach with an evolutionary algorithm on multiple test problems.

A Multilinear Sampling Algorithm to Estimate Shapley Values

Ramin Okhrati, Aldo Lipani

Responsive image

Auto-TLDR; A sampling method for Shapley values for multilayer Perceptrons

Slides Poster Similar

Shapley values are great analytical tools in game theory to measure the importance of a player in a game. Due to their axiomatic and desirable properties such as efficiency, they have become popular for feature importance analysis in data science and machine learning. However, the time complexity to compute Shapley values based on the original formula is exponential, and as the number of features increases, this becomes infeasible. Castro et al. [1] developed a sampling algorithm, to estimate Shapley values. In this work, we propose a new sampling method based on a multilinear extension technique as applied in game theory. The aim is to provide a more efficient (sampling) method for estimating Shapley values. Our method is applicable to any machine learning model, in particular for either multiclass classifications or regression problems. We apply the method to estimate Shapley values for multilayer Perceptrons (MLPs) and through experimentation on two datasets, we demonstrate that our method provides more accurate estimations of the Shapley values by reducing the variance of the sampling statistics

Subspace Clustering Via Joint Unsupervised Feature Selection

Wenhua Dong, Xiaojun Wu, Hui Li, Zhenhua Feng, Josef Kittler

Responsive image

Auto-TLDR; Unsupervised Feature Selection for Subspace Clustering

Poster Similar

Any high-dimensional data arising from practical applications usually contains irrelevant features, which may impact on the performance of existing subspace clustering methods. This paper proposes a novel subspace clustering method, which reconstructs the feature matrix by the means of unsupervised feature selection (UFS) to achieve a better dictionary for subspace clustering (SC). Different from most existing clustering methods, the proposed approach uses a reconstructed feature matrix as the dictionary rather than the original data matrix. As the feature matrix reconstructed by representative features is more discriminative and closer to the ground-truth, it results in improved performance. The corresponding non-convex optimization problem is effectively solved using the half-quadratic and augmented Lagrange multiplier methods. Extensive experiments on four real datasets demonstrate the effectiveness of the proposed method.

Low Rank Representation on Product Grassmann Manifolds for Multi-viewSubspace Clustering

Jipeng Guo, Yanfeng Sun, Junbin Gao, Yongli Hu, Baocai Yin

Responsive image

Auto-TLDR; Low Rank Representation on Product Grassmann Manifold for Multi-View Data Clustering

Slides Poster Similar

Clustering high dimension multi-view data with complex intrinsic properties and nonlinear manifold structure is a challenging task since these data are always embedded in low dimension manifolds. Inspired by Low Rank Representation (LRR), some researchers extended classic LRR on Grassmann manifold or Product Grassmann manifold to represent data with non-linear metrics. However, most of these methods utilized convex nuclear norm to leverage a low-rank structure, which was over-relaxation of true rank and would lead to the results deviated from the true underlying ones. And, the computational complexity of singular value decomposition of matrix is high for nuclear norm minimization. In this paper, we propose a new low rank model for high-dimension multi-view data clustering on Product Grassmann Manifold with the matrix tri-factorization which is used to control the upper bound of true rank of representation matrix. And, the original problem can be transformed into the nuclear norm minimization with smaller scale matrices. An effective solution and theoretical analysis are also provided. The experimental results show that the proposed method obviously outperforms other state-of-the-art methods on several multi-source human/crowd action video datasets.

Feature Extraction and Selection Via Robust Discriminant Analysis and Class Sparsity

Ahmad Khoder, Fadi Dornaika

Responsive image

Auto-TLDR; Hybrid Linear Discriminant Embedding for supervised multi-class classification

Slides Poster Similar

The main goal of discriminant embedding is to extract features that can be compact and informative representations of the original set of features. This paper introduces a hybrid scheme for linear feature extraction for supervised multi-class classification. We introduce a unifying criterion that is able to retain the advantages of robust sparse LDA and Inter-class sparsity. Thus, the estimated transformation includes two types of discrimination which are the inter-class sparsity and robust Linear Discriminant Analysis with feature selection. In order to optimize the proposed objective function, we deploy an iterative alternating minimization scheme for estimating the linear transformation and the orthogonal matrix. The introduced scheme is generic in the sense that it can be used for combining and tuning many other linear embedding methods. In the lights of the experiments conducted on six image datasets including faces, objects, and digits, the proposed scheme was able to outperform competing methods in most of the cases.

Sketch-Based Community Detection Via Representative Node Sampling

Mahlagha Sedghi, Andre Beckus, George Atia

Responsive image

Auto-TLDR; Sketch-based Clustering of Community Detection Using a Small Sketch

Slides Poster Similar

This paper proposes a sketch-based approach to the community detection problem which clusters the full graph through the use of an informative and concise sketch. The reduced sketch is built through an effective sampling approach which selects few nodes that best represent the complete graph and operates on a pairwise node similarity measure based on the average commute time. After sampling, the proposed algorithm clusters the nodes in the sketch, and then infers the cluster membership of the remaining nodes in the full graph based on their aggregate similarity to nodes in the partitioned sketch. By sampling nodes with strong representation power, our approach can improve the success rates over full graph clustering. In challenging cases with large node degree variation, our approach not only maintains competitive accuracy with full graph clustering despite using a small sketch, but also outperforms existing sampling methods. The use of a small sketch allows considerable storage savings, and computational and timing improvements for further analysis such as clustering and visualization. We provide numerical results on synthetic data based on the homogeneous, heterogeneous and degree corrected versions of the stochastic block model, as well as experimental results on real-world data.

A New Convex Loss Function for Multiple Instance Support Vector Machines

Sang-Baeg Kim, Jung-Man Bae

Responsive image

Auto-TLDR; WR-SVM: A Novel Multiple Instance SVM for Video Classification

Slides Poster Similar

We develop a novel multiple instance SVM based on maximizing the minimum witness rate (WR) among positive bags. We solve this nonlinear integer programming problem by the multiple instance SVM formulation with a convex loss function, where unknown integer labels of instances in positive bags are relaxed to real numbers between -1 and 1 using the tanh(.) to estimate WR of a positive bag. Our new model, WR-SVM, also can incorporates the multiple instance learning (MIL) problem into a simple deep neural network framework with no additional MIL pooling layers. WR-SVM is robust to input perturbation by eliminating the imbalance between positive instances allocated to positive bags. A better generalization ability is expected by the large margin due to the balanced allocation of positive instances to positive bags. We perform experiments on various video datasets to verify the effectiveness of our method for video classification. The results of WR-SVM outperform the state-of-the-art for MIL-based video classification models.

Label Self-Adaption Hashing for Image Retrieval

Jianglin Lu, Zhihui Lai, Hailing Wang, Jie Zhou

Responsive image

Auto-TLDR; Label Self-Adaption Hashing for Large-Scale Image Retrieval

Slides Poster Similar

Hashing has attracted widespread attention in image retrieval because of its fast retrieval speed and low storage cost. Compared with supervised methods, unsupervised hashing methods are more reasonable and suitable for large-scale image retrieval since it is always difficult and expensive to collect true labels of the massive data. Without label information, however, unsupervised hashing methods can not guarantee the quality of learned binary codes. To resolve this dilemma, this paper proposes a novel unsupervised hashing method called Label Self-Adaption Hashing (LSAH), which contains effective hashing function learning part and self-adaption label generation part. In the first part, we utilize anchor graph to keep the local structure of the data and introduce joint sparsity into the model to extract effective features for high-quality binary code learning. In the second part, a self-adaptive cluster label matrix is learned from the data under the assumption that the nearest neighbor points should have a large probability to be in the same cluster. Therefore, the proposed LSAH can make full use of the potential discriminative information of the data to guide the learning of binary code. It is worth noting that LSAH can learn effective binary codes, hashing function and cluster labels simultaneously in a unified optimization framework. To solve the resulting optimization problem, an Augmented Lagrange Multiplier based iterative algorithm is elaborately designed. Extensive experiments on three large-scale data sets indicate the promising performance of the proposed LSAH.

Scalable Direction-Search-Based Approach to Subspace Clustering

Yicong He, George Atia

Responsive image

Auto-TLDR; Fast Direction-Search-Based Subspace Clustering

Slides Similar

Subspace clustering finds a multi-subspace representation that best fits a high-dimensional dataset. The computational and storage complexities of existing algorithms limit their usefulness for large scale data. In this paper, we develop a novel scalable approach to subspace clustering termed Fast Direction-Search-Based Subspace Clustering (Fast DiSC). In sharp contrast to existing scalable solutions which are mostly based on the self-expressiveness property of the data, Fast DiSC rests upon a new representation obtained from projections on computed data-dependent directions. These directions are derived from a convex formulation for optimal direction search to gauge hidden similarity relations. The computational complexity is significantly reduced by performing direction search in partitions of sampled data, followed by a retrieval step to cluster out-of-sample data using projections on the computed directions. A theoretical analysis underscores the ability of the proposed formulation to construct local similarity relations for the different data points. Experiments on both synthetic and real data demonstrate that the proposed algorithm can often outperform the state-of-the-art clustering methods.

Learning Connectivity with Graph Convolutional Networks

Hichem Sahbi

Responsive image

Auto-TLDR; Learning Graph Convolutional Networks Using Topological Properties of Graphs

Slides Poster Similar

Learning graph convolutional networks (GCNs) is an emerging field which aims at generalizing convolutional operations to arbitrary non-regular domains. In particular, GCNs operating on spatial domains show superior performances compared to spectral ones, however their success is highly dependent on how the topology of input graphs is defined. In this paper, we introduce a novel framework for graph convolutional networks that learns the topological properties of graphs. The design principle of our method is based on the optimization of a constrained objective function which learns not only the usual convolutional parameters in GCNs but also a transformation basis that conveys the most relevant topological relationships in these graphs. Experiments conducted on the challenging task of skeleton-based action recognition shows the superiority of the proposed method compared to handcrafted graph design as well as the related work.

Discrete Semantic Matrix Factorization Hashing for Cross-Modal Retrieval

Jianyang Qin, Lunke Fei, Shaohua Teng, Wei Zhang, Genping Zhao, Haoliang Yuan

Responsive image

Auto-TLDR; Discrete Semantic Matrix Factorization Hashing for Cross-Modal Retrieval

Slides Poster Similar

Hashing has been widely studied for cross-modal retrieval due to its promising efficiency and effectiveness in massive data analysis. However, most existing supervised hashing has the limitations of inefficiency for very large-scale search and intractable discrete constraint for hash codes learning. In this paper, we propose a new supervised hashing method, namely, Discrete Semantic Matrix Factorization Hashing (DSMFH), for cross-modal retrieval. First, we conduct the matrix factorization via directly utilizing the available label information to obtain a latent representation, so that both the inter-modality and intra-modality similarities are well preserved. Then, we simultaneously learn the discriminative hash codes and corresponding hash functions by deriving the matrix factorization into a discrete optimization. Finally, we adopt an alternatively iterative procedure to efficiently optimize the matrix factorization and discrete learning. Extensive experimental results on three widely used image-tag databases demonstrate the superiority of the DSMFH over state-of-the-art cross-modal hashing methods.

Uniform and Non-Uniform Sampling Methods for Sub-Linear Time K-Means Clustering

Yuanhang Ren, Ye Du

Responsive image

Auto-TLDR; Sub-linear Time Clustering with Constant Approximation Ratio for K-Means Problem

Slides Poster Similar

The $k$-means problem is arguably the most well-known clustering problem in machine learning, and lots of approximation algorithms have been proposed for it. However, many of these algorithms may become infeasible when data is huge. Sub-linear time algorithms with constant approximation ratios are desirable in this scenario. In this paper, we first improve the analysis of the algorithm proposed by \cite{Mohan:2017:BNA:3172077.3172235} by sharpening the approximation ratio from $4(\alpha+\beta)$ to $\alpha+\beta$. Moreover, on mild assumptions of the data, a constant approximation ratio can be achieved in poly-logarithmic time by the algorithm. Furthermore, we propose a novel sub-linear time clustering algorithm called {\it Double-K-M$\text{C}^2$ sampling} as well. Experiments on the data clustering task and the image segmentation task have validated the effectiveness of our algorithms.

Watermelon: A Novel Feature Selection Method Based on Bayes Error Rate Estimation and a New Interpretation of Feature Relevance and Redundancy

Xiang Xie, Wilhelm Stork

Responsive image

Auto-TLDR; Feature Selection Using Bayes Error Rate Estimation for Dynamic Feature Selection

Slides Poster Similar

Feature selection has become a crucial part of many classification problems in which high-dimensional datasets may contain tens of thousands of features. In this paper, we propose a novel feature selection method scoring the features through estimating the Bayes error rate based on kernel density estimation. Additionally, we update the scores of features dynamically by quantitatively interpreting the effects of feature relevance and redundancy in a new way. Distinguishing from the common heuristic applied by many feature selection methods, which prefers choosing features that are not relevant to each other, our approach penalizes only monotonically correlated features and rewards any other kind of relevance among features based on Spearman’s rank correlation coefficient and normalized mutual information. We conduct extensive experiments on seventeen diverse classification benchmarks, the results show that our approach overperforms other seventeen popular state-of-the-art feature selection methods in most cases.

Interpretable Structured Learning with Sparse Gated Sequence Encoder for Protein-Protein Interaction Prediction

Kishan K C, Feng Cui, Anne Haake, Rui Li

Responsive image

Auto-TLDR; Predicting Protein-Protein Interactions Using Sequence Representations

Slides Poster Similar

Predicting protein-protein interactions (PPIs) by learning informative representations from amino acid sequences is a challenging yet important problem in biology. Although various deep learning models in Siamese architecture have been proposed to model PPIs from sequences, these methods are computationally expensive for a large number of PPIs due to the pairwise encoding process. Furthermore, these methods are difficult to interpret because of non-intuitive mappings from protein sequences to their sequence representation. To address these challenges, we present a novel deep framework to model and predict PPIs from sequence alone. Our model incorporates a bidirectional gated recurrent unit to learn sequence representations by leveraging contextualized and sequential information from sequences. We further employ a sparse regularization to model long-range dependencies between amino acids and to select important amino acids (protein motifs), thus enhancing interpretability. Besides, the novel design of the encoding process makes our model computationally efficient and scalable to an increasing number of interactions. Experimental results on up-to-date interaction datasets demonstrate that our model achieves superior performance compared to other state-of-the-art methods. Literature-based case studies illustrate the ability of our model to provide biological insights to interpret the predictions.

Fast Subspace Clustering Based on the Kronecker Product

Lei Zhou, Xiao Bai, Liang Zhang, Jun Zhou, Edwin Hancock

Responsive image

Auto-TLDR; Subspace Clustering with Kronecker Product for Large Scale Datasets

Slides Poster Similar

Subspace clustering is a useful technique for many computer vision applications in which the intrinsic dimension of high-dimensional data is often smaller than the ambient dimension. Spectral clustering, as one of the main approaches to subspace clustering, often takes on a sparse representation or a low-rank representation to learn a block diagonal self-representation matrix for subspace generation. However, existing methods require solving a large scale convex optimization problem with a large set of data, with computational complexity reaches O(N^3) for N data points. Therefore, the efficiency and scalability of traditional spectral clustering methods can not be guaranteed for large scale datasets. In this paper, we propose a subspace clustering model based on the Kronecker product. Due to the property that the Kronecker product of a block diagonal matrix with any other matrix is still a block diagonal matrix, we can efficiently learn the representation matrix which is formed by the Kronecker product of k smaller matrices. By doing so, our model significantly reduces the computational complexity to O(kN^{3/k}). Furthermore, our model is general in nature, and can be adapted to different regularization based subspace clustering methods. Experimental results on two public datasets show that our model significantly improves the efficiency compared with several state-of-the-art methods. Moreover, we have conducted experiments on synthetic data to verify the scalability of our model for large scale datasets.

Object Classification of Remote Sensing Images Based on Optimized Projection Supervised Discrete Hashing

Qianqian Zhang, Yazhou Liu, Quansen Sun

Responsive image

Auto-TLDR; Optimized Projection Supervised Discrete Hashing for Large-Scale Remote Sensing Image Object Classification

Slides Poster Similar

Recently, with the increasing number of large-scale remote sensing images, the demand for large-scale remote sensing image object classification is growing and attracting the interest of many researchers. Hashing, because of its low memory requirements and high time efficiency, has been widely solve the problem of large-scale remote sensing image. Supervised hashing methods mainly leverage the label information of remote sensing image to learn hash function, however, the similarity of the original feature space cannot be well preserved, which can not meet the accurate requirements for object classification of remote sensing image. To solve the mentioned problem, we propose a novel method named Optimized Projection Supervised Discrete Hashing(OPSDH), which jointly learns a discrete binary codes generation and optimized projection constraint model. It uses an effective optimized projection method to further constraint the supervised hash learning and generated hash codes preserve the similarity based on the data label while retaining the similarity of the original feature space. The experimental results show that OPSDH reaches improved performance compared with the existing hash learning methods and demonstrate that the proposed method is more efficient for operational applications

Temporal Pattern Detection in Time-Varying Graphical Models

Federico Tomasi, Veronica Tozzo, Annalisa Barla

Responsive image

Auto-TLDR; A dynamical network inference model that leverages on kernels to consider general temporal patterns

Slides Poster Similar

Graphical models allow to describe the interplay among variables of a system through a compact representation, suitable when relations evolve over time. For example, in a biological setting, genes interact differently depending on external environmental or metabolic factors. To incorporate this dynamics a viable strategy is to estimate a sequence of temporally related graphs assuming similarity among samples in different time points. While adjacent time points may direct the analysis towards a robust estimate of the underlying graph, the resulting model will not incorporate long-term or recurrent temporal relationships. In this work we propose a dynamical network inference model that leverages on kernels to consider general temporal patterns (such as circadian rhythms or seasonality). We show how our approach may also be exploited when the recurrent patterns are unknown, by coupling the network inference with a clustering procedure that detects possibly non-consecutive similar networks. Such clusters are then used to build similarity kernels. The convexity of the functional is determined by whether we impose or infer the kernel. In the first case, the optimisation algorithm exploits efficiently proximity operators with closed-form solutions. In the other case, we resort to an alternating minimisation procedure which jointly learns the temporal kernel and the underlying network. Extensive analysis on synthetic data shows the efficacy of our models compared to state-of-the-art methods. Finally, we applied our approach on two real-world applications to show how considering long-term patterns is fundamental to have insights on the behaviour of a complex system.

Exploiting Non-Linear Redundancy for Neural Model Compression

Muhammad Ahmed Shah, Raphael Olivier, Bhiksha Raj

Responsive image

Auto-TLDR; Compressing Deep Neural Networks with Linear Dependency

Slides Poster Similar

Deploying deep learning models with millions, even billions, of parameters is challenging given real world memory, power and compute constraints. In an effort to make these models more practical, in this paper, we propose a novel model compression approach that exploits linear dependence between the activations in a layer to eliminate entire structural units (neurons/convolutional filters). Our approach also adjusts the weights of the layer in a manner that is provably lossless while training if the removed neuron was perfectly predictable. We combine this approach with an annealing algorithm that may be applied during training, or even on a trained model, and demonstrate, using popular datasets, that our technique can reduce the parameters of VGG and AlexNet by more than 97\% on \cifar, 85\% on \caltech, and 19\% on ImageNet at less than 2\% loss in accuracy. Furthermore, we provide theoretical results showing that in overparametrized, locally linear (ReLU) neural networks where redundant features exist, and with correct hyperparameter selection, our method is indeed able to capture and suppress those dependencies.

Embedding Shared Low-Rank and Feature Correlation for Multi-View Data Analysis

Zhan Wang, Lizhi Wang, Hua Huang

Responsive image

Auto-TLDR; embedding shared low-rank and feature correlation for multi-view data analysis

Slides Poster Similar

The diversity of multimedia data in the real-world usually forms multi-view features. How to explore the structure information and correlations among multi-view features is still an open problem. In this paper, we propose a novel multi-view subspace learning method, named embedding shared low-rank and feature correlation (ESLRFC), for multi-view data analysis. First, in the embedding subspace, we propose a robust low-rank model on each feature set and enforce a shared low-rank constraint to characterize the common structure information of multiple feature data. Second, we develop an enhanced correlation analysis in the embedding subspace for simultaneously removing the redundancy of each feature set and exploring the correlations of multiple feature data. Finally, we incorporate the low-rank model and the correlation analysis into a unified framework. The shared low-rank constraint not only depicts the data distribution consistency among multiple feature data, but also assists robust subspace learning. Experimental results on recognition tasks demonstrate the superior performance and noise robustness of the proposed method.

Soft Label and Discriminant Embedding Estimation for Semi-Supervised Classification

Fadi Dornaika, Abdullah Baradaaji, Youssof El Traboulsi

Responsive image

Auto-TLDR; Semi-supervised Semi-Supervised Learning for Linear Feature Extraction and Label Propagation

Slides Poster Similar

In recent times, graph-based semi-supervised learning proved to be a powerful paradigm for processing and mining large datasets. The main advantage relies on the fact that these methods can be useful in propagating a small set of known labels to a large set of unlabeled data. The scarcity of labeled data may affect the performance of the semi-learning. This paper introduces a new semi-supervised framework for simultaneous linear feature extraction and label propagation. The proposed method simultaneously estimates a discriminant transformation and the unknown label by exploiting both labeled and unlabeled data. In addition, the unknowns of the learning model are estimated by integrating two types of graph-based smoothness constraints. The resulting semi-supervised model is expected to learn more discriminative information. Experiments are conducted on six public image datasets. These experimental results show that the performance of the proposed method can be better than that of many state-of-the-art graph-based semi-supervised algorithms.

3D Pots Configuration System by Optimizing Over Geometric Constraints

Jae Eun Kim, Muhammad Zeeshan Arshad, Seong Jong Yoo, Je Hyeong Hong, Jinwook Kim, Young Min Kim

Responsive image

Auto-TLDR; Optimizing 3D Configurations for Stable Pottery Restoration from irregular and noisy evidence

Slides Poster Similar

While potteries are common artifacts excavated in archaeological sites, the restoration process relies on the manual cleaning and reassembling shattered pieces. Since the number of possible 3D configurations is considerably large, the exhaustive manual trial may result in an abrasion on fractured surfaces and even failure to find the correct matches. As a result, many recent works suggest virtual reassembly from 3D scans of the fragments. The problem is challenging in the view of the conventional 3D geometric analysis, as it is hard to extract reliable shape features from the thin break lines. We propose to optimize the global configuration by combining geometric constraints with information from noisy shape features. Specifically, we enforce bijection and continuity of sequence of correspondences given estimates of corners and pair-wise matching scores between multiple break lines. We demonstrate that our pipeline greatly increases the accuracy of correspondences, resulting in the stable restoration of 3D configurations from irregular and noisy evidence.

Automatic Classification of Human Granulosa Cells in Assisted Reproductive Technology Using Vibrational Spectroscopy Imaging

Marina Paolanti, Emanuele Frontoni, Giorgia Gioacchini, Giorgini Elisabetta, Notarstefano Valentina, Zacà Carlotta, Carnevali Oliana, Andrea Borini, Marco Mameli

Responsive image

Auto-TLDR; Predicting Oocyte Quality in Assisted Reproductive Technology Using Machine Learning Techniques

Slides Poster Similar

In the field of reproductive technology, the biochemical composition of female gametes has been successfully investigated with the use of vibrational spectroscopy. Currently, in assistive reproductive technology (ART), there are no shared criteria for the choice of oocyte, and automatic classification methods for the best quality oocytes have not yet been applied. In this paper, considering the lack of criteria in Assisted Reproductive Technology (ART), we use Machine Learning (ML) techniques to predict oocyte quality for a successful pregnancy. To improve the chances of successful implantation and minimize any complications during the pregnancy, Fourier transform infrared microspectroscopy (FTIRM) analysis has been applied on granulosa cells (GCs) collected along with the oocytes during oocyte aspiration, as it is routinely done in ART, and specific spectral biomarkers were selected by multivariate statistical analysis. A proprietary biological reference dataset (BRD) was successfully collected to predict the best oocyte for a successful pregnancy. Personal health information are stored, maintained and backed up using a cloud computing service. Using a user-friendly interface, the user will evaluate whether or not the selected oocyte will have a positive result. This interface includes a dashboard for retrospective analysis, reporting, real-time processing, and statistical analysis. The experimental results are promising and confirm the efficiency of the method in terms of classification metrics: precision, recall, and F1-score (F1) measures.

Supervised Classification Using Graph-Based Space Partitioning for Multiclass Problems

Nicola Yanev, Ventzeslav Valev, Adam Krzyzak, Karima Ben Suliman

Responsive image

Auto-TLDR; Box Classifier for Multiclass Classification

Slides Poster Similar

We introduce and investigate in multiclass setting an efficient classifier which partitions the training data by means of multidimensional parallelepipeds called boxes. We show that multiclass classification problem at hand can be solved by integrating the heuristic minimum clique cover approach and the k-nearest neighbor rule. Our algorithm is motivated an algorithm for partitioning a graph into a minimal number of maximal. The main advantage of the new classifier called Box classifier is that it optimally utilizes the geometrical structure of the training set by decomposing the l-class problem (l > 2) into l binary classification problems. We discuss computational complexity of the proposed Box classifier. The extensive experiments performed on the simulated and real data for binary and multiclass problems show that in almost all cases the Box classifier performs significantly better than k-NN, SVM and decision trees.

A Joint Representation Learning and Feature Modeling Approach for One-Class Recognition

Pramuditha Perera, Vishal Patel

Responsive image

Auto-TLDR; Combining Generative Features and One-Class Classification for Effective One-class Recognition

Slides Poster Similar

One-class recognition is traditionally approached either as a representation learning problem or a feature modelling problem. In this work, we argue that both of these approaches have their own limitations; and a more effective solution can be obtained by combining the two. The proposed approach is based on the combination of a generative framework and a one-class classification method. First, we learn generative features using the one-class data with a generative framework. We augment the learned features with the corresponding reconstruction errors to obtain augmented features. Then, we qualitatively identify a suitable feature distribution that reduces the redundancy in the chosen classifier space. Finally, we force the augmented features to take the form of this distribution using an adversarial framework. We test the effectiveness of the proposed method on three one-class classification tasks and obtain state-of-the-art results.

Are Multiple Cross-Correlation Identities Better Than Just Two? Improving the Estimate of Time Differences-Of-Arrivals from Blind Audio Signals

Danilo Greco, Jacopo Cavazza, Alessio Del Bue

Responsive image

Auto-TLDR; Improving Blind Channel Identification Using Cross-Correlation Identity for Time Differences-of-Arrivals Estimation

Slides Poster Similar

Given an unknown audio source, the estimation of time differences-of-arrivals (TDOAs) can be efficiently and robustly solved using blind channel identification and exploiting the cross-correlation identity (CCI). Prior "blind" works have improved the estimate of TDOAs by means of different algorithmic solutions and optimization strategies, while always sticking to the case N = 2 microphones. But what if we can obtain a direct improvement in performance by just increasing N? In this paper we try to investigate this direction, showing that, despite the arguable simplicity, this is capable of (sharply) improving upon state-of-the-art blind channel identification methods based on CCI, without modifying the computational pipeline. Inspired by our results, we seek to warm up the community and the practitioners by paving the way (with two concrete, yet preliminary, examples) towards joint approaches in which advances in the optimization are combined with an increased number of microphones, in order to achieve further improvements.

Sparse-Dense Subspace Clustering

Shuai Yang, Wenqi Zhu, Yuesheng Zhu

Responsive image

Auto-TLDR; Sparse-Dense Subspace Clustering with Piecewise Correlation Estimation

Slides Poster Similar

Subspace clustering refers to the problem of clustering high-dimensional data into a union of low-dimensional subspaces. Current subspace clustering approaches are usually based on a two-stage framework. In the first stage, an affinity matrix is generated from data. In the second one, spectral clustering is applied on the affinity matrix. However, the affinity matrix produced by two-stage methods cannot fully reveal the similarity between data points from the same subspace, resulting in inaccurate clustering. Besides, most approaches fail to solve large-scale clustering problems due to poor efficiency. In this paper, we first propose a new scalable sparse method called Iterative Maximum Correlation (IMC) to learn the affinity matrix from data. Then we develop Piecewise Correlation Estimation (PCE) to densify the intra-subspace similarity produced by IMC. Finally we extend our work into a Sparse-Dense Subspace Clustering (SDSC) framework with a dense stage to optimize the affinity matrix for two-stage methods. We show that IMC is efficient for large-scale tasks, and PCE ensures better performance for IMC. We show the universality of our SDSC framework for current two-stage methods as well. Experiments on benchmark data sets demonstrate the effectiveness of our approaches.

Fast Discrete Cross-Modal Hashing Based on Label Relaxation and Matrix Factorization

Donglin Zhang, Xiaojun Wu, Zhen Liu, Jun Yu, Josef Kittler

Responsive image

Auto-TLDR; LRMF: Label Relaxation and Discrete Matrix Factorization for Cross-Modal Retrieval

Poster Similar

In recent years, cross-media retrieval has drawn considerable attention due to the exponential growth of multimedia data. Many hashing approaches have been proposed for the cross-media search task. However, there are still open problems that warrant investigation. For example, most existing supervised hashing approaches employ a binary label matrix, which achieves small margins between wrong labels (0) and true labels (1). This may affect the retrieval performance by generating many false negatives and false positives. In addition, some methods adopt a relaxation scheme to solve the binary constraints, which may cause large quantization errors. There are also some discrete hashing methods that have been presented, but most of them are time-consuming. To conquer these problems, we present a label relaxation and discrete matrix factorization method (LRMF) for cross-modal retrieval. It offers a number of innovations. First of all, the proposed approach employs a novel label relaxation scheme to control the margins adaptively, which has the benefit of reducing the quantization error. Second, by virtue of the proposed discrete matrix factorization method designed to learn the binary codes, large quantization errors caused by relaxation can be avoided. The experimental results obtained on two widely-used databases demonstrate that LRMF outperforms state-of-the-art cross-media methods.

Multi-annotator Probabilistic Active Learning

Marek Herde, Daniel Kottke, Denis Huseljic, Bernhard Sick

Responsive image

Auto-TLDR; MaPAL: Multi-annotator Probabilistic Active Learning

Slides Poster Similar

Classifiers require annotations of instances, i.e., class labels, for training. An annotation process is often costly due to its manual execution through human annotators. Active learning (AL) aims at reducing the annotation costs by selecting instances from which the classifier is expected to learn the most. Many AL strategies assume the availability of a single omniscient annotator. In this article, we overcome this limitation by considering multiple error-prone annotators. We propose a novel AL strategy multi-annotator probabilistic active learning (MaPAL). Due to the nature of learning with error-prone annotators, it must not only select instances but annotators, too. MaPAL builds on a decision-theoretic framework and selects instance-annotator pairs maximizing the classifier's expected performance. Experiments on a variety of data sets demonstrate MaPAL's superior performance compared to five related AL strategies.

Unsupervised Feature Learning for Event Data: Direct vs Inverse Problem Formulation

Dimche Kostadinov, Davide Scarammuza

Responsive image

Auto-TLDR; Unsupervised Representation Learning from Local Event Data for Pattern Recognition

Slides Poster Similar

Event-based cameras record asynchronous streamof per-pixel brightness changes. As such, they have numerous advantages over the common frame-based cameras, including high temporal resolution, high dynamic range, and no motion blur. Due to the asynchronous nature, efficient learning of compact representation for event data is challenging. While the extend to which the spatial and temporal event "information" is useful for pattern recognition tasks is not fully explored. In this paper, we focus on single layer architectures. We analyze the performance of two general problem formulations,i.e., the direct and the inverse, for unsupervised feature learning from local event data,i.e., local volumes of events that are described in space and time. We identify and show the main advantages of each approach. Theoretically, we analyze guarantees for local optimal solution, possibility for asynchronous and parallel parameter update as well as the computational complexity. We present numerical experiments for the task of object recognition, where we evaluate the solution under the direct and the inverse problem.We give a comparison with the state-of-the-art methods. Our empirical results highlight the advantages of the both approaches for representation learning from event data. Moreover, we show improvements of up to 9% in the recognition accuracy compared to the state-of-the-art methods from the same class of methods.

3CS Algorithm for Efficient Gaussian Process Model Retrieval

Fabian Berns, Kjeld Schmidt, Ingolf Bracht, Christian Beecks

Responsive image

Auto-TLDR; Efficient retrieval of Gaussian Process Models for large-scale data using divide-&-conquer-based approach

Slides Poster Similar

Gaussian Process Models (GPMs) have been applied for various pattern recognition tasks due to their analytical tractability, ability to quantify uncertainty for their own results as well as to subsume prominent other regression techniques. Despite these promising prospects their super-quadratic computation time complexity for model selection and evaluation impedes its broader application for more than a few thousand data points. Although there have been many proposals towards Gaussian Processes for large-scale data, those only offer a linearly scaling improvement to a cubical scaling problem. In particular, solutions like the Nystrom approximation or sparse matrices are only taking fractions of the given data into account and subsequently lead to inaccurate models. In this paper, we thus propose a divide-&-conquer-based approach, that allows to efficiently retrieve GPMs for large-scale data. The resulting model is composed of independent pattern representations for non-overlapping segments of the given data and consequently reduces computation time significantly. Our performance analysis indicates that our proposal is able to outperform state-of-the-art algorithms for GPM retrieval with respect to the qualities of efficiency and accuracy.

Adversarial Encoder-Multi-Task-Decoder for Multi-Stage Processes

Andre Mendes, Julian Togelius, Leandro Dos Santos Coelho

Responsive image

Auto-TLDR; Multi-Task Learning and Semi-Supervised Learning for Multi-Stage Processes

Similar

In multi-stage processes, decisions occur in an ordered sequence of stages. Early stages usually have more observations with general information (easier/cheaper to collect), while later stages have fewer observations but more specific data. This situation can be represented by a dual funnel structure, in which the sample size decreases from one stage to the other while the information increases. Training classifiers in this scenario is challenging since information in the early stages may not contain distinct patterns to learn (underfitting). In contrast, the small sample size in later stages can cause overfitting. We address both cases by introducing a framework that combines adversarial autoencoders (AAE), multi-task learning (MTL), and multi-label semi-supervised learning (MLSSL). We improve the decoder of the AAE with an MTL component so it can jointly reconstruct the original input and use feature nets to predict the features for the next stages. We also introduce a sequence constraint in the output of an MLSSL classifier to guarantee the sequential pattern in the predictions. Using real-world data from different domains (selection process, medical diagnosis), we show that our approach outperforms other state-of-the-art methods.

T-SVD Based Non-Convex Tensor Completion and Robust Principal Component Analysis

Tao Li, Jinwen Ma

Responsive image

Auto-TLDR; Non-Convex tensor rank surrogate function and non-convex sparsity measure for tensor recovery

Slides Poster Similar

In this paper, we propose a novel non-convex tensor rank surrogate function and a novel non-convex sparsity measure. The basic idea is to sidestep the bias of $\ell_1-$norm by introducing the concavity. Furthermore, we employ this non-convex penalty in tensor recovery problems such as tensor completion and tensor robust principal component analysis. Due to the concavity, the parameters of these models are difficult to solve. To tackle this problem, we devise a majorization minimization algorithm that can optimize the upper bound of the original function in each iteration, and every sub-problem is solved by the alternating direction multiplier method. We also analyze the theoretical properties of the proposed algorithm. Finally, the experimental results on natural and hyperspectral images demonstrate the efficacy and efficiency of the proposed method.

Towards Explaining Adversarial Examples Phenomenon in Artificial Neural Networks

Ramin Barati, Reza Safabakhsh, Mohammad Rahmati

Responsive image

Auto-TLDR; Convolutional Neural Networks and Adversarial Training from the Perspective of convergence

Slides Poster Similar

In this paper, we study the adversarial examples existence and adversarial training from the standpoint of convergence and provide evidence that pointwise convergence in ANNs can explain these observations. The main contribution of our proposal is that it relates the objective of the evasion attacks and adversarial training with concepts already defined in learning theory. Also, we extend and unify some of the other proposals in the literature and provide alternative explanations on the observations made in those proposals. Through different experiments, we demonstrate that the framework is valuable in the study of the phenomenon and is applicable to real-world problems.

Tensorized Feature Spaces for Feature Explosion

Ravdeep Pasricha, Pravallika Devineni, Evangelos Papalexakis, Ramakrishnan Kannan

Responsive image

Auto-TLDR; Tensor Rank Decomposition for Hyperspectral Image Classification

Slides Poster Similar

In this paper, we present a novel framework that uses tensor factorization to generate richer feature spaces for pixel classification in hyperspectral images. In particular, we assess the performance of different tensor rank decomposition methods as compared to the traditional kernel-based approaches for the hyperspectral image classification problem. We propose ORION, which takes as input a hyperspectral image tensor and a rank and outputs an enhanced feature space from the factor matrices of the decomposed tensor. Our method is a feature explosion technique that inherently maps low dimensional input space in R^K to high dimensional space in R^R, where R >> K, say in the order of 1000x, like a kernel. We show how the proposed method exploits the multi-linear structure of hyperspectral three-dimensional tensor. We demonstrate the effectiveness of our method with experiments on three publicly available hyperspectral datasets with labeled pixels and compare their classification performance against traditional linear and non-linear supervised learning methods such as SVM with Linear, Polynomial, RBF kernels, and the Multi-Layer Perceptron model. Finally, we explore the relationship between the rank of the tensor decomposition and the classification accuracy using several hyperspectral datasets with ground truth.

Aggregating Dependent Gaussian Experts in Local Approximation

Hamed Jalali, Gjergji Kasneci

Responsive image

Auto-TLDR; A novel approach for aggregating the Gaussian experts by detecting strong violations of conditional independence

Slides Poster Similar

Distributed Gaussian processes (DGPs) are prominent local approximation methods to scale Gaussian processes (GPs) to large datasets. Instead of a global estimation, they train local experts by dividing the training set into subsets, thus reducing the time complexity. This strategy is based on the conditional independence assumption, which basically means that there is a perfect diversity between the local experts. In practice, however, this assumption is often violated, and the aggregation of experts leads to sub-optimal and inconsistent solutions. In this paper, we propose a novel approach for aggregating the Gaussian experts by detecting strong violations of conditional independence. The dependency between experts is determined by using a Gaussian graphical model, which yields the precision matrix. The precision matrix encodes conditional dependencies between experts and is used to detect strongly dependent experts and construct an improved aggregation. Using both synthetic and real datasets, our experimental evaluations illustrate that our new method outperforms other state-of-the-art (SOTA) DGP approaches while being substantially more time-efficient than SOTA approaches, which build on independent experts.