Label Self-Adaption Hashing for Image Retrieval

Jianglin Lu, Zhihui Lai, Hailing Wang, Jie Zhou

Auto-TLDR; Label Self-Adaption Hashing for Large-Scale Image Retrieval

Hashing has attracted widespread attention in image retrieval because of its fast retrieval speed and low storage cost. Compared with supervised methods, unsupervised hashing methods are more suitable for large-scale image retrieval, since collecting true labels for massive data is difficult and expensive. Without label information, however, unsupervised hashing methods cannot guarantee the quality of the learned binary codes. To resolve this dilemma, this paper proposes a novel unsupervised hashing method called Label Self-Adaption Hashing (LSAH), which consists of an effective hashing function learning part and a self-adaptive label generation part. In the first part, we utilize an anchor graph to preserve the local structure of the data and introduce joint sparsity into the model to extract effective features for high-quality binary code learning. In the second part, a self-adaptive cluster label matrix is learned from the data under the assumption that nearest-neighbor points should have a large probability of being in the same cluster. Therefore, the proposed LSAH can make full use of the potential discriminative information of the data to guide the learning of binary codes. It is worth noting that LSAH can learn effective binary codes, the hashing function, and cluster labels simultaneously in a unified optimization framework. To solve the resulting optimization problem, an iterative algorithm based on the Augmented Lagrange Multiplier method is designed. Extensive experiments on three large-scale data sets indicate the promising performance of the proposed LSAH.
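
As a rough illustration of why binary codes give fast retrieval and low storage, the sketch below ranks database items by Hamming distance to a query code. It is not the LSAH algorithm: the fixed projection rows stand in for a learned hashing function, and `sign_hash`, `hamming`, and `retrieve` are hypothetical names introduced here.

```python
def sign_hash(x, projections):
    """Pack the sign bits of w . x (one bit per projection row w) into an int.

    The projection rows are an assumption standing in for a learned
    hashing function; they are not learned here.
    """
    code = 0
    for w in projections:
        dot = sum(wi * xi for wi, xi in zip(w, x))
        code = (code << 1) | (1 if dot >= 0 else 0)
    return code


def hamming(a, b):
    """Number of differing bits between two integer codes."""
    return bin(a ^ b).count("1")


def retrieve(query_code, database_codes, top_k=3):
    """Return indices of the top_k database codes closest to the query."""
    order = sorted(range(len(database_codes)),
                   key=lambda i: hamming(query_code, database_codes[i]))
    return order[:top_k]


if __name__ == "__main__":
    projections = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
    database = [[0.9, 0.8], [-1.0, -0.5], [0.7, -0.9]]
    codes = [sign_hash(x, projections) for x in database]
    print(retrieve(sign_hash([1.0, 0.6], projections), codes, top_k=2))
```

Each item is stored as a single small integer, and comparing two items costs one XOR plus a bit count, which is the source of the speed and storage advantages the abstract mentions.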

Similar papers

Object Classification of Remote Sensing Images Based on Optimized Projection Supervised Discrete Hashing

Qianqian Zhang, Yazhou Liu, Quansen Sun

Auto-TLDR; Optimized Projection Supervised Discrete Hashing for Large-Scale Remote Sensing Image Object Classification

Recently, with the increasing number of large-scale remote sensing images, the demand for large-scale remote sensing image object classification is growing and attracting the interest of many researchers. Hashing, because of its low memory requirements and high time efficiency, has been widely used to solve large-scale remote sensing image problems. Supervised hashing methods mainly leverage the label information of remote sensing images to learn the hash function; however, the similarity of the original feature space is not well preserved, which fails to meet the accuracy requirements of remote sensing image object classification. To solve this problem, we propose a novel method named Optimized Projection Supervised Discrete Hashing (OPSDH), which jointly learns discrete binary code generation and an optimized projection constraint model. It uses an effective optimized projection method to further constrain the supervised hash learning, so that the generated hash codes preserve the similarity based on the data labels while retaining the similarity of the original feature space. The experimental results show that OPSDH achieves improved performance compared with existing hash learning methods and demonstrate that the proposed method is more efficient for operational applications.

Discrete Semantic Matrix Factorization Hashing for Cross-Modal Retrieval

Jianyang Qin, Lunke Fei, Shaohua Teng, Wei Zhang, Genping Zhao, Haoliang Yuan

Auto-TLDR; Discrete Semantic Matrix Factorization Hashing for Cross-Modal Retrieval

Hashing has been widely studied for cross-modal retrieval due to its promising efficiency and effectiveness in massive data analysis. However, most existing supervised hashing methods are inefficient for very large-scale search and face an intractable discrete constraint in hash code learning. In this paper, we propose a new supervised hashing method, namely Discrete Semantic Matrix Factorization Hashing (DSMFH), for cross-modal retrieval. First, we conduct matrix factorization by directly utilizing the available label information to obtain a latent representation, so that both the inter-modality and intra-modality similarities are well preserved. Then, we simultaneously learn the discriminative hash codes and the corresponding hash functions by turning the matrix factorization into a discrete optimization problem. Finally, we adopt an alternating iterative procedure to efficiently optimize the matrix factorization and discrete learning. Extensive experimental results on three widely used image-tag databases demonstrate the superiority of DSMFH over state-of-the-art cross-modal hashing methods.

Improved Deep Classwise Hashing with Centers Similarity Learning for Image Retrieval

Ming Zhang, Hong Yan

Auto-TLDR; Deep Classwise Hashing for Image Retrieval Using Center Similarity Learning

Deep supervised hashing for image retrieval has attracted researchers' attention due to its high efficiency and superior retrieval performance. Most existing deep supervised hashing works, which are based on pairwise/triplet labels, suffer from expensive computational cost and insufficient utilization of semantic information. Recently, deep classwise hashing introduced a classwise loss supervised by class label information instead; however, we find that it still has drawbacks. In this paper, we propose an improved deep classwise hashing method, which enables hashing learning and class center learning simultaneously. Specifically, we design a two-step strategy for center similarity learning. It interacts with the classwise loss to attract each class center toward its intra-class samples while pushing other class centers as far away as possible. The center similarity learning contributes to generating more compact and discriminative hash codes. We conduct experiments on three benchmark datasets. The results show that the proposed method surpasses the original method and outperforms state-of-the-art baselines under various commonly used evaluation metrics for image retrieval.

Fast Discrete Cross-Modal Hashing Based on Label Relaxation and Matrix Factorization

Donglin Zhang, Xiaojun Wu, Zhen Liu, Jun Yu, Josef Kittler

Auto-TLDR; LRMF: Label Relaxation and Discrete Matrix Factorization for Cross-Modal Retrieval

In recent years, cross-media retrieval has drawn considerable attention due to the exponential growth of multimedia data. Many hashing approaches have been proposed for the cross-media search task. However, there are still open problems that warrant investigation. For example, most existing supervised hashing approaches employ a binary label matrix, which achieves small margins between wrong labels (0) and true labels (1). This may affect the retrieval performance by generating many false negatives and false positives. In addition, some methods adopt a relaxation scheme to solve the binary constraints, which may cause large quantization errors. There are also some discrete hashing methods that have been presented, but most of them are time-consuming. To conquer these problems, we present a label relaxation and discrete matrix factorization method (LRMF) for cross-modal retrieval. It offers a number of innovations. First of all, the proposed approach employs a novel label relaxation scheme to control the margins adaptively, which has the benefit of reducing the quantization error. Second, by virtue of the proposed discrete matrix factorization method designed to learn the binary codes, large quantization errors caused by relaxation can be avoided. The experimental results obtained on two widely-used databases demonstrate that LRMF outperforms state-of-the-art cross-media methods.

Hierarchical Deep Hashing for Fast Large Scale Image Retrieval

Yongfei Zhang, Cheng Peng, Zhang Jingtao, Xianglong Liu, Shiliang Pu, Changhuai Chen

Auto-TLDR; Hierarchical indexed deep hashing for fast large scale image retrieval

Fast image retrieval is of great importance in many computer vision tasks, especially in practical applications. Deep hashing, the state-of-the-art fast image retrieval scheme, uses deep learning to learn the hash functions and generate binary hash codes, and outperforms other image retrieval methods in terms of accuracy. However, all existing deep hashing methods can only generate single-level hash codes and require a linear traversal of all the hash codes to find the closest one when a new query arrives, which is very time-consuming and even intractable for large-scale applications. In this work, we propose a Hierarchical Deep HASHing (HDHash) scheme to speed up state-of-the-art deep hashing methods. More specifically, hierarchical deep hash codes of multiple levels can be generated and indexed with tree structures rather than linear ones, and pruning irrelevant branches can sharply decrease the retrieval time. To the best of our knowledge, this is the first work to introduce hierarchical indexed deep hashing for fast large-scale image retrieval. Extensive experimental results on three benchmark datasets demonstrate that the proposed HDHash scheme achieves better or comparable accuracy with significantly improved efficiency and reduced memory compared with state-of-the-art fast image retrieval schemes.
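
The idea of indexing multi-level codes in a tree and pruning branches can be sketched with a depth-two structure: a coarse code selects a bucket, and only that bucket is scanned with the fine code. This is a simplified illustration under stated assumptions, not the HDHash scheme itself; the codes are given by hand rather than produced by a deep network, and `TwoLevelIndex` is a hypothetical name.

```python
from collections import defaultdict


def hamming(a, b):
    """Number of differing bits between two integer codes."""
    return bin(a ^ b).count("1")


class TwoLevelIndex:
    """Depth-two tree: coarse code -> bucket of (fine code, item id)."""

    def __init__(self):
        self.buckets = defaultdict(list)

    def add(self, coarse, fine, item_id):
        self.buckets[coarse].append((fine, item_id))

    def query(self, coarse, fine, top_k=1):
        if not self.buckets:
            return []
        # Pruning step: keep only the bucket whose coarse code is nearest
        # to the query; every other branch is skipped without scanning.
        nearest = min(self.buckets, key=lambda c: hamming(c, coarse))
        ranked = sorted(self.buckets[nearest],
                        key=lambda entry: hamming(entry[0], fine))
        return [item_id for _, item_id in ranked[:top_k]]


if __name__ == "__main__":
    index = TwoLevelIndex()
    index.add(0b00, 0b1111, "a")
    index.add(0b00, 0b1100, "b")
    index.add(0b11, 0b1111, "c")
    print(index.query(0b00, 0b1000))
```

Compared with a linear scan over all codes, the query touches one bucket only, at the cost of missing neighbors that fall into a different coarse bucket.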

VSB^2-Net: Visual-Semantic Bi-Branch Network for Zero-Shot Hashing

Xin Li, Xiangfeng Wang, Bo Jin, Wenjie Zhang, Jun Wang, Hongyuan Zha

Auto-TLDR; VSB^2-Net: inductive zero-shot hashing for image retrieval

Zero-shot hashing aims to learn a hashing model from seen classes that is capable of generalizing to unseen classes for image retrieval. Inspired by zero-shot learning, existing zero-shot hashing methods usually transfer the supervised knowledge from seen to unseen classes by embedding the Hamming space into a shared semantic space. However, this makes instances difficult to distinguish due to the limited number of hashing bits, especially for semantically similar unseen classes. We propose a novel inductive zero-shot hashing framework, i.e., VSB^2-Net, where both the semantic space and the visual feature space are embedded into the same Hamming space instead. The reconstructive semantic relationships are established in the Hamming space, preserving local similarity relationships and explicitly enlarging the discrepancy between semantic Hamming vectors. A two-task architecture, comprising a classification module and a visual feature reconstruction module, is employed to enhance the generalization and transfer abilities. Extensive evaluation results on several benchmark datasets demonstrate the superiority of our proposed method compared to several state-of-the-art baselines.

Feature Extraction by Joint Robust Discriminant Analysis and Inter-Class Sparsity

Fadi Dornaika, Ahmad Khoder

Auto-TLDR; Robust Discriminant Analysis with Feature Selection and Inter-class Sparsity (RDA_FSIS)

Feature extraction methods have been successfully applied to many real-world applications. The classical Linear Discriminant Analysis (LDA) and its variants are widely used as feature extraction methods. Although they have been used for different classification tasks, these methods have some shortcomings. The main one is that the projection axes obtained are not informative about the relevance of original features. In this paper, we propose a linear embedding method that merges two interesting properties: Robust LDA and inter-class sparsity. Furthermore, the targeted projection transformation focuses on the most discriminant original features. The proposed method is called Robust Discriminant Analysis with Feature Selection and Inter-class Sparsity (RDA_FSIS). Two kinds of sparsity are explicitly included in the proposed model. The first kind is obtained by imposing the $\ell_{2,1}$ constraint on the projection matrix in order to perform feature ranking. The second kind is obtained by imposing the inter-class sparsity constraint used for getting a common sparsity structure in each class. Comprehensive experiments on five real-world image datasets demonstrate the effectiveness and advantages of our framework over existing linear methods.
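
For reference, the standard definition of the $\ell_{2,1}$ norm is given below. The symbols are assumptions introduced here, since the abstract does not name them: $W \in \mathbb{R}^{d \times k}$ is the projection matrix, with one row $w^{i}$ per original feature.

```latex
\|W\|_{2,1} \;=\; \sum_{i=1}^{d} \|w^{i}\|_{2}
            \;=\; \sum_{i=1}^{d} \sqrt{\sum_{j=1}^{k} W_{ij}^{2}}
```

Under this convention, penalizing $\|W\|_{2,1}$ drives entire rows of $W$ toward zero, so the remaining row norms $\|w^{i}\|_{2}$ can serve as relevance scores for ranking the original features.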

Leveraging Quadratic Spherical Mutual Information Hashing for Fast Image Retrieval

Nikolaos Passalis, Anastasios Tefas

Auto-TLDR; Quadratic Mutual Information for Large-Scale Hashing and Information Retrieval

Several deep supervised hashing techniques have been proposed to allow for querying large image databases. However, it is often overlooked that the process of information retrieval can be modeled using information-theoretic metrics, which leads to optimizing various proxies for the problem at hand instead. In contrast, we propose a deep supervised hashing algorithm that optimizes the learned codes using an information-theoretic measure, the Quadratic Mutual Information (QMI). The proposed method is adapted to the needs of large-scale hashing and information retrieval, leading to a novel information-theoretic measure, the Quadratic Spherical Mutual Information (QSMI), which is inspired by QMI but leads to significantly better retrieval precision. The effectiveness of the proposed method is demonstrated under several different scenarios, using different datasets and network architectures, outperforming existing deep supervised image hashing techniques.

Soft Label and Discriminant Embedding Estimation for Semi-Supervised Classification

Fadi Dornaika, Abdullah Baradaaji, Youssof El Traboulsi

Auto-TLDR; Semi-Supervised Learning for Linear Feature Extraction and Label Propagation

In recent times, graph-based semi-supervised learning has proved to be a powerful paradigm for processing and mining large datasets. Its main advantage is that these methods can propagate a small set of known labels to a large set of unlabeled data. The scarcity of labeled data may affect the performance of semi-supervised learning. This paper introduces a new semi-supervised framework for simultaneous linear feature extraction and label propagation. The proposed method simultaneously estimates a discriminant transformation and the unknown labels by exploiting both labeled and unlabeled data. In addition, the unknowns of the learning model are estimated by integrating two types of graph-based smoothness constraints. The resulting semi-supervised model is expected to learn more discriminative information. Experiments are conducted on six public image datasets. The experimental results show that the proposed method can perform better than many state-of-the-art graph-based semi-supervised algorithms.

Cross-Media Hash Retrieval Using Multi-head Attention Network

Zhixin Li, Feng Ling, Chuansheng Xu, Canlong Zhang, Huifang Ma

Auto-TLDR; Unsupervised Cross-Media Hash Retrieval Using Multi-Head Attention Network

The cross-media hash retrieval method encodes multimedia data into a common binary hash space, which can effectively measure the correlation between samples from different modalities. To further improve retrieval accuracy, this paper proposes an unsupervised cross-media hash retrieval method based on a multi-head attention network. First, we use a multi-head attention network to better match images and texts that contain rich semantic information. At the same time, an auxiliary similarity matrix is constructed to integrate the original neighborhood information from different modalities. Therefore, this method can capture the potential correlations between different modalities and within the same modality, so as to make up for the differences between different modalities and within the same modality. Second, the method is unsupervised and does not require additional semantic labels, so it has the potential to achieve large-scale cross-media retrieval. In addition, batch normalization and replacement hash code generation functions are adopted to optimize the model, and two loss functions are designed, which make the performance of this method exceed that of many supervised deep cross-media hash methods. Experiments on three datasets show that the average performance of this method is about 5 to 6 percentage points higher than that of the state-of-the-art unsupervised method, which demonstrates the effectiveness of this method.

Double Manifolds Regularized Non-Negative Matrix Factorization for Data Representation

Jipeng Guo, Shuai Yin, Yanfeng Sun, Yongli Hu

Auto-TLDR; Double Manifolds Regularized Non-negative Matrix Factorization for Clustering

Non-negative matrix factorization (NMF) is an important method for learning latent data representations. The local geometrical structure can make the learned representation more effective and significantly improve the performance of NMF. However, most existing graph-based learning methods rely on a predefined similarity graph, which may not be optimal for specific tasks. To solve this problem, we propose the Double Manifolds Regularized NMF (DMR-NMF) model, which jointly learns an adaptive affinity matrix with the non-negative matrix factorization. The learned affinity matrix can guide the NMF to fit the clustering task. Moreover, we develop iterative updating optimization schemes for DMR-NMF and provide a strict convergence proof of our optimization strategy. Empirical experiments on four different real-world data sets demonstrate the state-of-the-art performance of DMR-NMF in comparison with other related algorithms.

DFH-GAN: A Deep Face Hashing with Generative Adversarial Network

Bo Xiao, Lanxiang Zhou, Yifei Wang, Qiangfang Xu

Auto-TLDR; Deep Face Hashing with GAN for Face Image Retrieval

Face image retrieval is one of the key research directions in the computer vision field. Thanks to the rapid development of deep neural networks in recent years, deep hashing has achieved good performance in image retrieval. But for large-scale face image retrieval, the performance needs to be further improved. In this paper, we propose Deep Face Hashing with GAN (DFH-GAN), a novel deep hashing method for face image retrieval, which mainly consists of three components: a generator network for generating synthesized images, a discriminator network with a shared CNN to learn multi-domain face features, and a hash encoding network to generate compact binary hash codes. The generator network is used to perform data augmentation so that the model can learn from both real images and diverse synthesized images. We adopt a two-stage training strategy. In the first stage, the GAN is trained to generate fake images. In the second stage, to make the network converge faster, the model inherits the trained shared CNN of the discriminator to train the DFH model, using many different supervised loss functions not only in the last layer but also in the middle layer of the network. Extensive experiments on two widely used datasets demonstrate that DFH-GAN can generate high-quality binary hash codes and greatly exceeds the performance of the state-of-the-art model.

Feature Extraction and Selection Via Robust Discriminant Analysis and Class Sparsity

Ahmad Khoder, Fadi Dornaika

Auto-TLDR; Hybrid Linear Discriminant Embedding for supervised multi-class classification

The main goal of discriminant embedding is to extract features that are compact and informative representations of the original set of features. This paper introduces a hybrid scheme for linear feature extraction for supervised multi-class classification. We introduce a unifying criterion that retains the advantages of robust sparse LDA and inter-class sparsity. Thus, the estimated transformation includes two types of discrimination: inter-class sparsity and robust Linear Discriminant Analysis with feature selection. To optimize the proposed objective function, we deploy an iterative alternating minimization scheme for estimating the linear transformation and the orthogonal matrix. The introduced scheme is generic in the sense that it can be used for combining and tuning many other linear embedding methods. In experiments conducted on six image datasets including faces, objects, and digits, the proposed scheme outperformed competing methods in most cases.

Subspace Clustering Via Joint Unsupervised Feature Selection

Wenhua Dong, Xiaojun Wu, Hui Li, Zhenhua Feng, Josef Kittler

Auto-TLDR; Unsupervised Feature Selection for Subspace Clustering

High-dimensional data arising from practical applications usually contain irrelevant features, which may affect the performance of existing subspace clustering methods. This paper proposes a novel subspace clustering method, which reconstructs the feature matrix by means of unsupervised feature selection (UFS) to achieve a better dictionary for subspace clustering (SC). Different from most existing clustering methods, the proposed approach uses a reconstructed feature matrix as the dictionary rather than the original data matrix. As the feature matrix reconstructed from representative features is more discriminative and closer to the ground truth, it results in improved performance. The corresponding non-convex optimization problem is effectively solved using the half-quadratic and augmented Lagrange multiplier methods. Extensive experiments on four real datasets demonstrate the effectiveness of the proposed method.

Sparse-Dense Subspace Clustering

Shuai Yang, Wenqi Zhu, Yuesheng Zhu

Auto-TLDR; Sparse-Dense Subspace Clustering with Piecewise Correlation Estimation

Subspace clustering refers to the problem of clustering high-dimensional data into a union of low-dimensional subspaces. Current subspace clustering approaches are usually based on a two-stage framework. In the first stage, an affinity matrix is generated from data. In the second one, spectral clustering is applied on the affinity matrix. However, the affinity matrix produced by two-stage methods cannot fully reveal the similarity between data points from the same subspace, resulting in inaccurate clustering. Besides, most approaches fail to solve large-scale clustering problems due to poor efficiency. In this paper, we first propose a new scalable sparse method called Iterative Maximum Correlation (IMC) to learn the affinity matrix from data. Then we develop Piecewise Correlation Estimation (PCE) to densify the intra-subspace similarity produced by IMC. Finally we extend our work into a Sparse-Dense Subspace Clustering (SDSC) framework with a dense stage to optimize the affinity matrix for two-stage methods. We show that IMC is efficient for large-scale tasks, and PCE ensures better performance for IMC. We show the universality of our SDSC framework for current two-stage methods as well. Experiments on benchmark data sets demonstrate the effectiveness of our approaches.

Embedding Shared Low-Rank and Feature Correlation for Multi-View Data Analysis

Zhan Wang, Lizhi Wang, Hua Huang

Auto-TLDR; embedding shared low-rank and feature correlation for multi-view data analysis

The diversity of multimedia data in the real-world usually forms multi-view features. How to explore the structure information and correlations among multi-view features is still an open problem. In this paper, we propose a novel multi-view subspace learning method, named embedding shared low-rank and feature correlation (ESLRFC), for multi-view data analysis. First, in the embedding subspace, we propose a robust low-rank model on each feature set and enforce a shared low-rank constraint to characterize the common structure information of multiple feature data. Second, we develop an enhanced correlation analysis in the embedding subspace for simultaneously removing the redundancy of each feature set and exploring the correlations of multiple feature data. Finally, we incorporate the low-rank model and the correlation analysis into a unified framework. The shared low-rank constraint not only depicts the data distribution consistency among multiple feature data, but also assists robust subspace learning. Experimental results on recognition tasks demonstrate the superior performance and noise robustness of the proposed method.

Fast Subspace Clustering Based on the Kronecker Product

Lei Zhou, Xiao Bai, Liang Zhang, Jun Zhou, Edwin Hancock

Auto-TLDR; Subspace Clustering with Kronecker Product for Large Scale Datasets

Subspace clustering is a useful technique for many computer vision applications in which the intrinsic dimension of high-dimensional data is often smaller than the ambient dimension. Spectral clustering, as one of the main approaches to subspace clustering, often uses a sparse representation or a low-rank representation to learn a block diagonal self-representation matrix for subspace generation. However, existing methods require solving a large-scale convex optimization problem over a large set of data, with computational complexity reaching O(N^3) for N data points. Therefore, the efficiency and scalability of traditional spectral clustering methods cannot be guaranteed for large-scale datasets. In this paper, we propose a subspace clustering model based on the Kronecker product. Due to the property that the Kronecker product of a block diagonal matrix with any other matrix is still a block diagonal matrix, we can efficiently learn the representation matrix, which is formed by the Kronecker product of k smaller matrices. By doing so, our model significantly reduces the computational complexity to O(kN^{3/k}). Furthermore, our model is general in nature and can be adapted to different regularization-based subspace clustering methods. Experimental results on two public datasets show that our model significantly improves the efficiency compared with several state-of-the-art methods. Moreover, we have conducted experiments on synthetic data to verify the scalability of our model for large-scale datasets.
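
The block diagonal property quoted above can be checked numerically. The sketch below is a minimal pure-Python illustration, not the paper's model: `kron` is a hypothetical helper written for this example, and the matrices are arbitrary small values.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows.

    Entry (i * rows_b + p, j * cols_b + q) of the result is a[i][j] * b[p][q].
    """
    rows_b, cols_b = len(b), len(b[0])
    return [
        [a[i][j] * b[p][q] for j in range(len(a[0])) for q in range(cols_b)]
        for i in range(len(a))
        for p in range(rows_b)
    ]


if __name__ == "__main__":
    # Block diagonal matrix with a 2x2 block and a 1x1 block.
    block_diag = [[1, 2, 0],
                  [3, 4, 0],
                  [0, 0, 5]]
    dense = [[1, 1],
             [1, 1]]
    # The product is 6x6 with a 4x4 block and a 2x2 block on the diagonal.
    for row in kron(block_diag, dense):
        print(row)
```

On the complexity claim, taking N = 10^6 and k = 3 as an illustrative case, N^3 is 10^18 while kN^{3/k} is 3 x 10^6. These are orders of growth that ignore constant factors.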

Low Rank Representation on Product Grassmann Manifolds for Multi-view Subspace Clustering

Jipeng Guo, Yanfeng Sun, Junbin Gao, Yongli Hu, Baocai Yin

Auto-TLDR; Low Rank Representation on Product Grassmann Manifold for Multi-View Data Clustering

Clustering high-dimensional multi-view data with complex intrinsic properties and nonlinear manifold structure is a challenging task, since these data are always embedded in low-dimensional manifolds. Inspired by Low Rank Representation (LRR), some researchers have extended classic LRR to the Grassmann manifold or Product Grassmann manifold to represent data with non-linear metrics. However, most of these methods use the convex nuclear norm to impose a low-rank structure, which over-relaxes the true rank and leads to results that deviate from the true underlying ones. Moreover, the computational complexity of the singular value decomposition required for nuclear norm minimization is high. In this paper, we propose a new low-rank model for high-dimensional multi-view data clustering on the Product Grassmann Manifold with matrix tri-factorization, which is used to control the upper bound of the true rank of the representation matrix. The original problem can then be transformed into nuclear norm minimization with smaller-scale matrices. An effective solution and theoretical analysis are also provided. The experimental results show that the proposed method clearly outperforms other state-of-the-art methods on several multi-source human/crowd action video datasets.

Cross-spectrum Face Recognition Using Subspace Projection Hashing

Hanrui Wang, Xingbo Dong, Jin Zhe, Jean-Luc Dugelay, Massimo Tistarelli

Auto-TLDR; Subspace Projection Hashing for Cross-Spectrum Face Recognition

Cross-spectrum face recognition, e.g. visible to thermal matching, remains a challenging task due to the large variation originating from different domains. This paper proposes subspace projection hashing (SPH) to enable the cross-spectrum face recognition task. The core idea behind SPH is to project the features from different domains onto a common subspace, where matching faces from different domains can be accomplished. Notably, we propose a new loss function that can (i) preserve both inter-domain and intra-domain similarity and (ii) regularize a scaled-up pairwise distance between hashed codes, to optimize the projection matrix. Three datasets, Wiki, EURECOM VIS-TH paired face, and TDFace, are adopted to evaluate the proposed SPH. The experimental results indicate that the proposed SPH outperforms the original linear subspace ranking hashing (LSRH) on the benchmark dataset (Wiki) and demonstrates reasonably good performance for visible-thermal and visible-near-infrared face recognition, which suggests the feasibility and effectiveness of the proposed SPH.

Scalable Direction-Search-Based Approach to Subspace Clustering

Yicong He, George Atia

Auto-TLDR; Fast Direction-Search-Based Subspace Clustering

Subspace clustering finds a multi-subspace representation that best fits a high-dimensional dataset. The computational and storage complexities of existing algorithms limit their usefulness for large scale data. In this paper, we develop a novel scalable approach to subspace clustering termed Fast Direction-Search-Based Subspace Clustering (Fast DiSC). In sharp contrast to existing scalable solutions which are mostly based on the self-expressiveness property of the data, Fast DiSC rests upon a new representation obtained from projections on computed data-dependent directions. These directions are derived from a convex formulation for optimal direction search to gauge hidden similarity relations. The computational complexity is significantly reduced by performing direction search in partitions of sampled data, followed by a retrieval step to cluster out-of-sample data using projections on the computed directions. A theoretical analysis underscores the ability of the proposed formulation to construct local similarity relations for the different data points. Experiments on both synthetic and real data demonstrate that the proposed algorithm can often outperform the state-of-the-art clustering methods.

Joint Learning Multiple Curvature Descriptor for 3D Palmprint Recognition

Lunke Fei, Bob Zhang, Jie Wen, Chunwei Tian, Peng Liu, Shuping Zhao

Auto-TLDR; Joint Feature Learning for 3D palmprint recognition using curvature data vectors

3D palmprint-based biometric recognition has drawn growing research attention due to its several merits over its 2D counterpart, such as robust structural measurement of a palm surface and high anti-counterfeiting capability. However, most existing 3D palmprint descriptors are hand-crafted and usually extract stationary features from 3D palmprint images. In this paper, we propose a feature learning method to jointly learn a compact curvature feature descriptor for 3D palmprint recognition. We first form multiple curvature data vectors to completely sample the intrinsic curvature information of 3D palmprint images. Then, we jointly learn a feature projection function that projects curvature data vectors into binary feature codes, which have the maximum inter-class variance and minimum intra-class distance so that they are discriminative. Moreover, we learn the collaborative binary representation of the multiple curvature feature codes by minimizing the information loss between the final representation and the multiple curvature features, so that the proposed method is more compact in feature representation and efficient in matching. Experimental results on the baseline 3D palmprint database demonstrate the superiority of the proposed method in terms of recognition performance in comparison with state-of-the-art 3D palmprint descriptors.

Constrained Spectral Clustering Network with Self-Training

Xinyue Liu, Shichong Yang, Linlin Zong

Auto-TLDR; Constrained Spectral Clustering Network: A Constrained Deep spectral clustering network

Deep spectral clustering networks have shown their superiority due to the integration of feature learning and cluster assignment, and the ability to deal with non-convex clusters. Nevertheless, deep spectral clustering is still an ill-posed problem. Specifically, the affinity learned by the most remarkable SpectralNet is not guaranteed to be consistent with local invariance and thus hurts the final clustering performance. In this paper, we propose a novel framework of Constrained Spectral Clustering Network (CSCN) by incorporating pairwise constraints and clustering-oriented fine-tuning to deal with the ill-posedness. To the best of our knowledge, this is the first constrained deep spectral clustering method. Another advantage of CSCN over existing constrained deep clustering networks is that it propagates pairwise constraints throughout the entire dataset. In addition, we design a clustering-oriented loss by self-training to simultaneously fine-tune feature representations and perform cluster assignments, which further improves the quality of clustering. Extensive experiments on benchmark datasets demonstrate that our approach outperforms the state-of-the-art clustering methods.

Supervised Feature Embedding for Classification by Learning Rank-Based Neighborhoods

Ghazaal Sheikhi, Hakan Altincay

Auto-TLDR; Supervised Feature Embedding with Representation Learning of Rank-based Neighborhoods

In feature embedding, the recovery of associated discriminative information in the reduced subspace is critical for downstream classifiers. In this study, a supervised feature embedding method is proposed, inspired by the well-known word embedding technique word2vec. The proposed embedding method is implemented as representation learning of rank-based neighborhoods. The notion of context words in word2vec is extended to neighboring instances within a given window. Neighborship is defined using ranks of instances rather than their values so that regions with different densities are captured properly. Each sample is represented by a unique one-hot vector whereas its neighbors are encoded by several two-hot vectors. The two-hot vectors are identical for neighboring samples of the same class. A feed-forward neural network with a continuous projection layer then learns the mapping from one-hot vectors to multiple two-hot vectors. The hidden layer determines the reduced subspace for the training samples. The obtained transformation is then applied to test data to find a lower-dimensional representation. The proposed method is tested in classification problems on 10 UCI data sets. Experimental results confirm that the proposed method is effective in finding a discriminative representation of the features and outperforms several supervised embedding approaches in terms of classification performance.
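To illustrate why ranks rather than raw values are used to define neighborhoods, the following minimal Python sketch builds rank-based neighbor lists for a single numeric feature. The function name `rank_neighbors` and the one-dimensional setting are assumptions made for illustration only; the abstract does not state how ranks are computed for multi-dimensional data, so this should not be read as the authors' procedure.

```python
def rank_neighbors(values, window):
    """Return, for each instance index, the indices within `window` rank positions.

    Hypothetical helper: neighbors are chosen by position in the sorted order,
    not by numeric distance, so sparse and dense regions get equally sized
    neighborhoods.
    """
    # Indices of the instances sorted by their feature value (i.e. by rank).
    order = sorted(range(len(values)), key=lambda i: values[i])
    neighbors = {}
    for pos, i in enumerate(order):
        lo = max(0, pos - window)
        hi = min(len(order), pos + window + 1)
        # Every instance inside the rank window except the instance itself.
        neighbors[i] = [order[p] for p in range(lo, hi) if p != pos]
    return neighbors


if __name__ == "__main__":
    # The last two values are far apart, yet each still has close-in-rank neighbors.
    print(rank_neighbors([0.0, 0.1, 0.2, 10.0, 50.0], window=1))
```

With a value-based radius, the isolated points 10.0 and 50.0 would have no neighbors at all; with a rank window of one they each keep the instances adjacent in sorted order, which is the density-insensitivity the abstract refers to.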

Sketch-Based Community Detection Via Representative Node Sampling

Mahlagha Sedghi, Andre Beckus, George Atia

Auto-TLDR; Sketch-based Clustering of Community Detection Using a Small Sketch

This paper proposes a sketch-based approach to the community detection problem which clusters the full graph through the use of an informative and concise sketch. The reduced sketch is built through an effective sampling approach which selects few nodes that best represent the complete graph and operates on a pairwise node similarity measure based on the average commute time. After sampling, the proposed algorithm clusters the nodes in the sketch, and then infers the cluster membership of the remaining nodes in the full graph based on their aggregate similarity to nodes in the partitioned sketch. By sampling nodes with strong representation power, our approach can improve the success rates over full graph clustering. In challenging cases with large node degree variation, our approach not only maintains competitive accuracy with full graph clustering despite using a small sketch, but also outperforms existing sampling methods. The use of a small sketch allows considerable storage savings, and computational and timing improvements for further analysis such as clustering and visualization. We provide numerical results on synthetic data based on the homogeneous, heterogeneous and degree corrected versions of the stochastic block model, as well as experimental results on real-world data.

Classification and Feature Selection Using a Primal-Dual Method and Projections on Structured Constraints

Michel Barlaud, Antonin Chambolle, Jean-Baptiste Caillau

Auto-TLDR; A Constrained Primal-dual Method for Structured Feature Selection on High Dimensional Data

This paper deals with feature selection using supervised classification on high dimensional datasets. A classical approach is to project data on a low dimensional space and classify by minimizing an appropriate quadratic cost. Our first contribution is to introduce a matrix of centers in the definition of this cost. Moreover, as quadratic costs are not robust to outliers, we propose to use an $\ell_1$ cost instead (or Huber loss to mitigate overfitting issues). While control on sparsity is commonly obtained by adding an $\ell_1$ constraint on the vectorized matrix of weights used for projecting the data, our second contribution is to enforce structured sparsity. To this end we propose constraints that take into account the matrix structure of the data, based either on the nuclear norm, on the $\ell_{2,1}$ norm, or on the $\ell_{1,2}$ norm for which we provide a new projection algorithm. We optimize simultaneously the projection matrix and the matrix of centers thanks to a new tailored constrained primal-dual method. The primal-dual framework is general enough to encompass the various robust losses and structured constraints we use, and allows a convergence analysis. We demonstrate the effectiveness of the approach on three biological datasets. Our primal-dual method with robust losses, adaptive centers and structured constraints does significantly better than classical methods, both in terms of accuracy and computational time.

A Spectral Clustering on Grassmann Manifold Via Double Low Rank Constraint

Xinglin Piao, Yongli Hu, Junbin Gao, Yanfeng Sun, Xin Yang, Baocai Yin

Auto-TLDR; Double Low Rank Representation for High-Dimensional Data Clustering on Grassmann Manifold

High-dimensional data clustering is a fundamental topic in machine learning and data mining. In recent years, researchers have proposed a series of effective methods based on Low Rank Representation (LRR), which can effectively explore the low-dimensional subspace structure embedded in the original data. Traditional LRR methods usually treat the original data as samples in Euclidean space and generally adopt a linear metric to measure the distance between two data points. However, high-dimensional data (such as video clips or image sets) are always considered non-linear manifold data, such as data on a Grassmann manifold. Therefore, the traditional linear Euclidean metric is no longer suitable for such data. In addition, traditional LRR clustering methods always adopt the nuclear norm as the low rank constraint, which leads to suboptimal solutions and decreases the clustering accuracy. In this paper, we propose a new low rank method on the Grassmann manifold for the high-dimensional data clustering task. In the proposed method, a double low rank representation approach is obtained by combining the nuclear norm and bilinear representation to better construct the representation matrix. The experimental results on several public datasets show that the proposed method outperforms the state-of-the-art clustering methods.

Wasserstein k-Means with Sparse Simplex Projection

Takumi Fukunaga, Hiroyuki Kasai

Auto-TLDR; SSPW $k$-means: Sparse Simplex Projection-based Wasserstein $k$-Means Algorithm

This paper proposes a faster Wasserstein $k$-means algorithm for histogram data that reduces Wasserstein distance computations by exploiting sparse simplex projection. We shrink data samples, centroids and the ground cost matrix, which enables a significant reduction of the computations needed to solve optimal transport problems without loss of clustering quality. Furthermore, we dynamically reduce computational complexity by removing lower-valued data samples using sparse simplex projection while keeping the degradation of clustering quality low. We designate the proposed algorithm as sparse simplex projection-based Wasserstein $k$-means, or SSPW $k$-means for short. Numerical evaluations against the Wasserstein $k$-means algorithm demonstrate the effectiveness of the proposed SSPW $k$-means on real-world datasets.
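As a rough illustration of how a projection can shrink a histogram while keeping it a valid probability vector, the sketch below implements the standard sort-based Euclidean projection onto the probability simplex, plus a simple "keep the largest entries, then project" variant. The function names and the top-`keep` truncation rule are assumptions for illustration; the abstract does not define the sparse simplex projection operator, so this is not necessarily the operator used in SSPW $k$-means.

```python
def project_simplex(v, z=1.0):
    """Euclidean projection of v onto {x : x >= 0, sum(x) = z} (sort-based method)."""
    u = sorted(v, reverse=True)
    cumsum = 0.0
    theta = 0.0
    for k, uk in enumerate(u, start=1):
        cumsum += uk
        t = (cumsum - z) / k
        # The condition holds for a prefix of the sorted entries; keep the last valid t.
        if uk - t > 0:
            theta = t
    return [max(x - theta, 0.0) for x in v]


def sparse_simplex_projection(v, keep):
    """Hypothetical variant: zero all but the `keep` largest entries, then project.

    The truncation rule is an assumption made for this sketch.
    """
    idx = sorted(range(len(v)), key=lambda i: v[i], reverse=True)[:keep]
    projected = project_simplex([v[i] for i in idx])
    out = [0.0] * len(v)
    for i, p in zip(idx, projected):
        out[i] = p
    return out


if __name__ == "__main__":
    hist = [0.4, 0.1, 0.3, 0.2]
    # Only two bins remain non-zero, and the result still sums to one.
    print(sparse_simplex_projection(hist, keep=2))
```

A histogram with fewer non-zero bins yields a smaller optimal transport problem, which is the kind of saving the abstract describes; how many bins can be dropped without hurting clustering quality is an empirical question that this sketch does not address.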

Exploiting Local Indexing and Deep Feature Confidence Scores for Fast Image-To-Video Search

Savas Ozkan, Gözde Bozdağı Akar

Auto-TLDR; Fast and Robust Image-to-Video Retrieval Using Local and Global Descriptors

Cost-effective visual representation and fast query-by-example search are two challenging goals that should be met for web-scale visual retrieval tasks on moderate hardware. In this paper, we introduce a fast yet robust method that achieves both of these goals while obtaining state-of-the-art results for an image-to-video search scenario. To this end, we present important enhancements to commonly used indexing and visual representation techniques by promoting faster, better and more moderate retrieval performance. We also boost the effectiveness of the method against visual distortion by exploiting the individual decision results of local and global descriptors at query time. In this way, local content descriptors effectively represent copied / duplicated scenes with large geometric deformations, while global descriptors are more practical for near-duplicate and semantic searches. Experiments are conducted on the large-scale Stanford I2V dataset. The experimental results show that the method is effective in terms of complexity and query processing time for large-scale visual retrieval scenarios, even if local and global representations are used together. In addition, the proposed method is fairly accurate and achieves state-of-the-art performance based on the mAP score of the dataset. Lastly, we report additional mAP scores after updating the ground-truth annotations using the retrieval results of the proposed method, which show the actual performance more clearly.

Unsupervised Feature Learning for Event Data: Direct vs Inverse Problem Formulation

Dimche Kostadinov, Davide Scaramuzza

Auto-TLDR; Unsupervised Representation Learning from Local Event Data for Pattern Recognition

Event-based cameras record an asynchronous stream of per-pixel brightness changes. As such, they have numerous advantages over the common frame-based cameras, including high temporal resolution, high dynamic range, and no motion blur. Due to the asynchronous nature, efficient learning of compact representations for event data is challenging. Moreover, the extent to which the spatial and temporal event "information" is useful for pattern recognition tasks is not fully explored. In this paper, we focus on single-layer architectures. We analyze the performance of two general problem formulations, i.e., the direct and the inverse, for unsupervised feature learning from local event data, i.e., local volumes of events that are described in space and time. We identify and show the main advantages of each approach. Theoretically, we analyze guarantees for a locally optimal solution, the possibility of asynchronous and parallel parameter updates, as well as the computational complexity. We present numerical experiments for the task of object recognition, where we evaluate the solution under the direct and the inverse problem. We give a comparison with the state-of-the-art methods. Our empirical results highlight the advantages of both approaches for representation learning from event data. Moreover, we show improvements of up to 9% in the recognition accuracy compared to the state-of-the-art methods from the same class of methods.

Learning Sign-Constrained Support Vector Machines

Kenya Tajima, Kouhei Tsuchida, Esmeraldo Ronnie Rey Zara, Naoya Ohta, Tsuyoshi Kato

Auto-TLDR; Constrained Sign Constraints for Learning Linear Support Vector Machine

Domain knowledge is useful to improve the generalization performance of learning machines. Sign constraints are a handy representation to combine domain knowledge with learning machines. In this paper, we consider constraining the signs of the weight coefficients in learning the linear support vector machine, and develop two optimization algorithms for minimizing the empirical risk under the sign constraints. One of the two algorithms is based on the projected gradient method, in which each iteration takes O(nd) computational cost and the sublinear convergence of the objective error is guaranteed. The second algorithm is based on the Frank-Wolfe method, which also converges sublinearly and possesses a clear termination criterion. We show that each iteration of the Frank-Wolfe method also requires O(nd) cost. Furthermore, we derive the explicit expression for the minimal iteration number to ensure an epsilon-accurate solution by analyzing the curvature of the objective function. Finally, we empirically demonstrate that the sign constraints are a promising technique when similarities to the training examples compose the feature vector.
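The projected gradient idea can be sketched in a few lines of Python: take a (sub)gradient step on the objective, then clip each weight back onto its allowed sign. The sketch below assumes an L2-regularized hinge loss and a 1/(lam * t) step size; the names `project_signs` and `train_sign_constrained_svm`, the loss, and the step rule are illustrative assumptions and may differ from the formulation analyzed in the paper.

```python
def project_signs(w, signs):
    """Project w onto the sign-constrained set.

    signs[j] = +1 forces w[j] >= 0, -1 forces w[j] <= 0, 0 leaves w[j] free.
    """
    out = []
    for wj, sj in zip(w, signs):
        if sj > 0:
            out.append(max(wj, 0.0))
        elif sj < 0:
            out.append(min(wj, 0.0))
        else:
            out.append(wj)
    return out


def train_sign_constrained_svm(X, y, signs, lam=0.1, steps=200):
    """Projected subgradient descent on an L2-regularized hinge loss (assumed objective)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for t in range(1, steps + 1):
        # Subgradient of lam/2 * ||w||^2 + (1/n) * sum_i max(0, 1 - y_i * <w, x_i>).
        grad = [lam * wj for wj in w]
        for xi, yi in zip(X, y):
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            if margin < 1.0:
                for j in range(d):
                    grad[j] -= yi * xi[j] / n
        step = 1.0 / (lam * t)  # decaying step size, an assumption of this sketch
        w = [wj - step * gj for wj, gj in zip(w, grad)]
        # The projection costs O(d); the pass over the data above costs O(nd).
        w = project_signs(w, signs)
    return w


if __name__ == "__main__":
    X = [[1.0, -1.0], [2.0, -0.5], [-1.0, 1.0], [-2.0, 0.5]]
    y = [1, 1, -1, -1]
    # Require both weights to be non-negative.
    print(train_sign_constrained_svm(X, y, signs=[1, 1]))
```

In this toy data an unconstrained solver would prefer a negative second weight; the projection holds it at zero while the first weight still separates the classes, and each iteration touches every feature of every example once, consistent with the O(nd) per-iteration cost quoted above.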

Adaptive Matching of Kernel Means

Miao Cheng, Xinge You

Auto-TLDR; Adaptive Matching of Kernel Means for Knowledge Discovery and Feature Learning

As a promising step, the performance of data analysis and feature learning can be improved if a certain pattern matching mechanism is available. One feasible solution is the importance estimation of instances, and consequently, kernel mean matching (KMM) has become an important method for knowledge discovery and novelty detection in general. Furthermore, the existing KMM methods have focused on concrete learning frameworks. In this work, a novel approach to adaptive matching of kernel means is proposed, and selected data with high importance are adopted to achieve calculation efficiency with optimization. In addition, scalable learning can be conducted in the proposed method as a generalized solution with appended data. The experimental results on a wide variety of real-world data sets demonstrate that the proposed method is able to give outstanding performance compared with several state-of-the-art methods, while calculation efficiency can be preserved.

Feature-Aware Unsupervised Learning with Joint Variational Attention and Automatic Clustering

Wang Ru, Lin Li, Peipei Wang, Liu Peiyu

Auto-TLDR; Deep Variational Attention Encoder-Decoder for Clustering

Deep clustering aims to cluster unlabeled real-world samples by mining deep feature representations. Most existing methods still struggle to handle high-dimensional data while simultaneously exploring the complementarity of deep feature representation and clustering. In this paper, we propose a novel Deep Variational Attention Encoder-decoder for Clustering (DVAEC). Our DVAEC improves the representation learning ability by fusing variational attention. Specifically, we design a feature-aware automatic clustering module to mitigate the unreliability of similarity calculation and guide network learning. Besides, to further boost the performance of deep clustering from a global perspective, we define a joint optimization objective to promote feature representation learning and automatic clustering synergistically. Extensive experimental results show the promising performance achieved by our DVAEC on six datasets compared with several popular baseline clustering methods.

Novel View Synthesis from a 6-DoF Pose by Two-Stage Networks

Xiang Guo, Bo Li, Yuchao Dai, Tongxin Zhang, Hui Deng

Auto-TLDR; Novel View Synthesis from a 6-DoF Pose Using Generative Adversarial Network

Novel view synthesis is a challenging problem in 3D vision and robotics. Different from existing works, which need reference images or a 3D model, we propose a novel paradigm for this problem: we synthesize the novel view from a 6-DoF pose directly. Although this setting is the most straightforward, few works address it. Nevertheless, our experiments demonstrate that, with a concise CNN, we can obtain a meaningful parametric model that reconstructs the correct scenery images from the 6-DoF pose alone. To this end, we propose a two-stage learning strategy, which consists of two consecutive CNNs: GenNet and RefineNet. The GenNet generates a coarse image from a camera pose. The RefineNet is a generative adversarial network that refines the coarse image. In this way, we decouple the geometric relationship mapping and texture detail rendering. Extensive experiments conducted on public datasets demonstrate the effectiveness of our method. We believe this paradigm is of high research and application value and could be an important direction in novel view synthesis. We will share our code after the acceptance of this work.

Generative Deep-Neural-Network Mixture Modeling with Semi-Supervised MinMax+EM Learning

Nilay Pande, Suyash Awate

Auto-TLDR; Semi-supervised Deep Neural Networks for Generative Mixture Modeling and Clustering

Deep neural networks (DNNs) for generative mixture modeling typically rely on unsupervised learning that employs hard clustering schemes, or variational learning with loose / approximate bounds, or under-regularized modeling. We propose a novel statistical framework for a DNN mixture model using a single generative adversarial network. Our learning formulation proposes a novel data-likelihood term relying on a well-regularized / constrained Gaussian mixture model in the latent space along with a prior term on the DNN weights. Our min-max learning increases the data likelihood using a tight variational lower bound using expectation maximization (EM). We leverage our min-max EM learning scheme for semi-supervised learning. Results on three real-world datasets demonstrate the benefits of our compact modeling and learning formulation over the state of the art for mixture modeling and clustering.

Self-Paced Bottom-Up Clustering Network with Side Information for Person Re-Identification

Mingkun Li, Chun-Guang Li, Ruo-Pei Guo, Jun Guo

Auto-TLDR; Self-Paced Bottom-up Clustering Network with Side Information for Unsupervised Person Re-identification

Person re-identification (Re-ID) has attracted a lot of research attention in recent years. However, supervised methods demand an enormous amount of manually annotated data. In this paper, we propose a Self-Paced bottom-up Clustering Network with Side Information (SPCNet-SI) for unsupervised person Re-ID, where the side information comes from the serial number of the camera associated with each image. Specifically, our proposed SPCNet-SI exploits the camera side information to guide the feature learning and uses soft labels in the bottom-up clustering process, in which the camera association information is used in the repelled loss and the soft-label-based cluster information is used to select the candidate cluster pairs to merge. Moreover, a self-paced dynamic mechanism is developed to regularize the merging process such that the clustering is implemented in an easy-to-hard way with a slow-to-fast merging process. Experiments on two benchmark datasets, Market-1501 and DukeMTMC-ReID, demonstrate promising performance.

Color Texture Description Based on Holistic and Hierarchical Order-Encoding Patterns

Tiecheng Song, Jie Feng, Yuanlin Wang, Chenqiang Gao

Auto-TLDR; Holistic and Hierarchical Order-Encoding Patterns for Color Texture Classification

Local binary pattern (LBP), as one of the most representative texture operators, has attracted much attention in computer vision applications. Many LBP variants were developed in the literature. However, most of them were designed for gray images and their performance remains to be improved for color images. In this paper, we propose a novel color image descriptor named Holistic and Hierarchical Order-Encoding Patterns (H2OEP) for texture classification. In H2OEP, the holistic order-encoding pattern compactly encodes color order variation tendencies for each pixel in color space. The hierarchical order-encoding pattern leverages min ordering, median ordering and max ordering to encode local neighboring relationships across different color channels. Finally, the generated order-encoding patterns are aggregated via central pixel encoding to build 3D joint histograms for image representation. Experiments on four benchmark texture databases demonstrate the effectiveness of the proposed descriptor for color texture classification.

Mean Decision Rules Method with Smart Sampling for Fast Large-Scale Binary SVM Classification

Alexandra Makarova, Mikhail Kurbakov, Valentina Sulimova

Auto-TLDR; Improving Mean Decision Rule for Large-Scale Binary SVM Problems

This paper builds on the Mean Decision Rule (MDR) method for solving large-scale binary SVM problems. MDR consists of taking small random samples of the full dataset, training separately on each of them, and then averaging the respective individual decision rules to obtain a final one. This paper proposes two new approaches to improve it. The first proposed approach is a new sampling technique that exploits SVM and MDR properties to quickly form so-called smart samples by selecting only the objects that are candidates to be support objects. The proposed technique substantially speeds up MDR convergence and allows the highest quality to be reached in less time. In the case of kernel-based MDR (KMDR), the proposed sampling technique additionally reduces the number of support objects in the final decision rule and, as a result, decreases the recognition time. The second proposed approach is a new data strategy to accelerate random access to large datasets stored in the traditional libsvm format. The proposed strategy allows random subsets of objects to be quickly extracted from a file and loaded into RAM, and it is also suitable for any sampling-based methods, including stochastic gradient methods. Using the proposed approaches jointly with (K)MDR makes it possible to obtain the best (or near-best) solution of large-scale binary SVM problems faster than the existing SVM solvers.

Nearest Neighbor Classification Based on Activation Space of Convolutional Neural Network

Xinbo Ju, Shuo Shao, Huan Long, Weizhe Wang

Auto-TLDR; Convolutional Neural Network with Convex Hull Based Classifier

In this paper, we propose a new image classifier based on combining the nearest neighbor algorithm with the activation space of a convolutional neural network. The classifier has been successfully applied to some state-of-the-art models and further improves their performance. The main technical tools we use are convex hull based classification and its acceleration. We find that 1) in several cases, the classifier can reach higher accuracy than the original CNN; 2) by sampling, the classifier can work more efficiently; 3) the centroid of each convex hull shows surprising ability in classification. Most of the work has a strong geometric meaning, which gives us a new understanding of convolutional layers.

On the Information of Feature Maps and Pruning of Deep Neural Networks

Mohammadreza Soltani, Suya Wu, Jie Ding, Robert Ravier, Vahid Tarokh

Auto-TLDR; Compressing Deep Neural Models Using Mutual Information

A technique for compressing deep neural models that achieves performance competitive with state-of-the-art methods is proposed. The approach utilizes the mutual information between the feature maps and the output of the model in order to prune the redundant layers of the network. Extensive numerical experiments on the CIFAR-10, CIFAR-100, and Tiny ImageNet data sets demonstrate that the proposed method can be effective in compressing deep models, both in terms of the number of parameters and the number of operations. For instance, by applying the proposed approach to a DenseNet model with 0.77 million parameters and 293 million operations for classification of the CIFAR-10 data set, reductions of 62.66% and 41.00% in the number of parameters and the number of operations, respectively, are achieved, while increasing the test error by less than 1%.
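To make the quoted reductions concrete, the short Python calculation below converts the percentages into the approximate size of the pruned DenseNet. Only the four input figures come from the abstract; the helper name `remaining` is an assumption, and the resulting values are derived arithmetic rather than numbers reported by the authors.

```python
def remaining(total, reduction):
    """Amount left after removing the given fraction of `total`."""
    return total * (1.0 - reduction)


if __name__ == "__main__":
    # Figures quoted in the abstract (in millions) and the reported reductions.
    params_left = remaining(0.77, 0.6266)
    ops_left = remaining(293.0, 0.4100)
    print(f"{params_left:.3f} M parameters, {ops_left:.2f} M operations")
```

Under these figures the pruned model keeps roughly 0.29 million parameters and about 173 million operations.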

Supervised Domain Adaptation Using Graph Embedding

Lukas Hedegaard, Omar Ali Sheikh-Omar, Alexandros Iosifidis

Auto-TLDR; Domain Adaptation from the Perspective of Multi-view Graph Embedding and Dimensionality Reduction

Getting deep convolutional neural networks to perform well requires a large amount of training data. When the available labelled data is small, it is often beneficial to use transfer learning to leverage a related larger dataset (source) in order to improve the performance on the small dataset (target). Among the transfer learning approaches, domain adaptation methods assume that distributions between the two domains are shifted and attempt to realign them. In this paper, we consider the domain adaptation problem from the perspective of multi-view graph embedding and dimensionality reduction. Instead of solving the generalised eigenvalue problem to perform the embedding, we formulate the graph-preserving criterion as loss in the neural network and learn a domain-invariant feature transformation in an end-to-end fashion. We show that the proposed approach leads to a powerful Domain Adaptation framework which generalises the prior methods CCSA and d-SNE, and enables simple and effective loss designs; an LDA-inspired instantiation of the framework leads to performance on par with the state-of-the-art on the most widely used Domain Adaptation benchmarks, Office31 and MNIST to USPS datasets.

Rethinking Deep Active Learning: Using Unlabeled Data at Model Training

Oriane Siméoni, Mateusz Budnik, Yannis Avrithis, Guillaume Gravier

Auto-TLDR; Unlabeled Data for Active Learning

Active learning typically focuses on training a model on few labeled examples alone, while unlabeled ones are only used for acquisition. In this work we depart from this setting by using both labeled and unlabeled data during model training across active learning cycles. We do so by using unsupervised feature learning at the beginning of the active learning pipeline and semi-supervised learning at every active learning cycle, on all available data. The former has not been investigated before in active learning, while the study of the latter in the context of deep learning is scarce and recent findings are not conclusive with respect to its benefit. Our idea is orthogonal to acquisition strategies by using more data, much like ensemble methods use more models. By systematically evaluating on a number of popular acquisition strategies and datasets, we find that the use of unlabeled data during model training brings a spectacular accuracy improvement in image classification, compared to the differences between acquisition strategies. We thus explore smaller label budgets, even one label per class.

Unveiling Groups of Related Tasks in Multi-Task Learning

Jordan Frecon, Saverio Salzo, Massimiliano Pontil

Auto-TLDR; Continuous Bilevel Optimization for Multi-Task Learning

A common approach in multi-task learning is to encourage the tasks to share a low dimensional representation. This has led to the popular method of trace norm regularization, which has proved effective in many applications. In this paper, we extend this approach by allowing the tasks to partition into different groups, within which trace norm regularization is separately applied. We propose a continuous bilevel optimization framework to simultaneously identify groups of related tasks and learn a low dimensional representation within each group. Hinging on recent results on the derivative of generalized matrix functions, we devise a smooth approximation of the upper-level objective via a dual forward-backward algorithm with Bregman distances. This allows us to solve the bilevel problem by a gradient-based scheme. Numerical experiments on synthetic and benchmark datasets support the effectiveness of the proposed method.

Graph Spectral Feature Learning for Mixed Data of Categorical and Numerical Type

Saswata Sahoo, Souradip Chakraborty

Auto-TLDR; Feature Learning in Mixed Type of Variable by an undirected graph

Feature learning in the presence of mixed types of variables, numerical and categorical, is important for related modeling problems. In this work, we propose a novel strategy to explicitly model the probabilistic dependence structure among the mixed types of variables by an undirected graph. The dependence structure among different pairs of variables is encoded by a suitable mapping function to estimate the edges of the graph. Spectral decomposition of the graph Laplacian provides the desired feature transformation. We numerically validate the implications of the feature learning strategy on various datasets in terms of data clustering.

T-SVD Based Non-Convex Tensor Completion and Robust Principal Component Analysis

Tao Li, Jinwen Ma

Auto-TLDR; Non-Convex tensor rank surrogate function and non-convex sparsity measure for tensor recovery

In this paper, we propose a novel non-convex tensor rank surrogate function and a novel non-convex sparsity measure. The basic idea is to sidestep the bias of the $\ell_1$-norm by introducing concavity. Furthermore, we employ this non-convex penalty in tensor recovery problems such as tensor completion and tensor robust principal component analysis. Due to the concavity, the parameters of these models are difficult to solve. To tackle this problem, we devise a majorization minimization algorithm that can optimize the upper bound of the original function in each iteration, and every sub-problem is solved by the alternating direction multiplier method. We also analyze the theoretical properties of the proposed algorithm. Finally, the experimental results on natural and hyperspectral images demonstrate the efficacy and efficiency of the proposed method.

Learning Sparse Deep Neural Networks Using Efficient Structured Projections on Convex Constraints for Green AI

Michel Barlaud, Frederic Guyard

Auto-TLDR; Constrained Deep Neural Network with Constrained Splitting Projection

In recent years, deep neural networks (DNNs) have been applied to different domains and achieved dramatic performance improvements over state-of-the-art classical methods. However, these performances were often obtained with networks containing millions of parameters whose training required heavy computational power. To cope with this computational issue, a large body of literature deals with proximal regularization methods, which are time consuming. In this paper, we propose instead a constrained approach. We provide the general framework for our new splitting projection gradient method. Our splitting algorithm iterates a gradient step and a projection on convex sets. We study algorithms for different constraints: the classical $\ell_1$ unstructured constraint and structured constraints such as the nuclear norm and the $\ell_{2,1}$ constraint (Group LASSO). We propose a new $\ell_{1,1}$ structured constraint for which we provide a new projection algorithm. We demonstrate the effectiveness of our method on three popular datasets (MNIST, Fashion MNIST and CIFAR). Experiments on these datasets show that our splitting projection method with our new $\ell_{1,1}$ structured constraint provides the best reduction of memory and computational power. Experiments show that fully connected linear DNNs are more efficient for green AI.

Aggregating Dependent Gaussian Experts in Local Approximation

Hamed Jalali, Gjergji Kasneci

Auto-TLDR; A novel approach for aggregating the Gaussian experts by detecting strong violations of conditional independence

Distributed Gaussian processes (DGPs) are prominent local approximation methods to scale Gaussian processes (GPs) to large datasets. Instead of a global estimation, they train local experts by dividing the training set into subsets, thus reducing the time complexity. This strategy is based on the conditional independence assumption, which basically means that there is a perfect diversity between the local experts. In practice, however, this assumption is often violated, and the aggregation of experts leads to sub-optimal and inconsistent solutions. In this paper, we propose a novel approach for aggregating the Gaussian experts by detecting strong violations of conditional independence. The dependency between experts is determined by using a Gaussian graphical model, which yields the precision matrix. The precision matrix encodes conditional dependencies between experts and is used to detect strongly dependent experts and construct an improved aggregation. Using both synthetic and real datasets, our experimental evaluations illustrate that our new method outperforms other state-of-the-art (SOTA) DGP approaches while being substantially more time-efficient than SOTA approaches, which build on independent experts.
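To see why the conditional independence assumption matters, consider a minimal sketch of the product-of-experts rule, one standard way to combine Gaussian predictions when experts are treated as independent. The abstract does not state which aggregation rule the authors start from, so this choice and the function name `poe_aggregate` are assumptions made for illustration only.

```python
import numpy as np

def poe_aggregate(means, variances):
    """Combine Gaussian expert predictions, assuming independent experts.

    means, variances: arrays of shape (n_experts, n_points).
    Returns the aggregated mean and variance, each of shape (n_points,).
    """
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    precisions = 1.0 / variances
    # Under independence the precisions simply add up
    agg_precision = precisions.sum(axis=0)
    # The aggregated mean is the precision-weighted average of expert means
    agg_mean = (precisions * means).sum(axis=0) / agg_precision
    return agg_mean, 1.0 / agg_precision

# Two hypothetical experts predicting at a single test point
mean, var = poe_aggregate([[0.0], [2.0]], [[1.0], [1.0]])
```

Because precisions add, every additional expert shrinks the aggregated variance. If two experts are in fact strongly dependent, their shared information is counted twice and the combined prediction becomes overconfident, which is the kind of violation the abstract proposes to detect.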

Heterogeneous Graph-Based Knowledge Transfer for Generalized Zero-Shot Learning

Junjie Wang, Xiangfeng Wang, Bo Jin, Junchi Yan, Wenjie Zhang, Hongyuan Zha

Auto-TLDR; Heterogeneous Graph-based Knowledge Transfer for Generalized Zero-Shot Learning

Generalized zero-shot learning (GZSL) tackles the problem of learning to classify instances involving both seen classes and unseen ones. The key issue is how to effectively transfer the model learned from seen classes to unseen classes. Existing works in GZSL usually assume that some prior information about unseen classes is available. However, such an assumption is unrealistic when new unseen classes appear dynamically. To this end, we propose a novel heterogeneous graph-based knowledge transfer method (HGKT) for GZSL, agnostic to unseen classes and instances, by leveraging graph neural networks. Specifically, a structured heterogeneous graph is constructed with high-level representative nodes for seen classes, which are chosen through the Wasserstein barycenter in order to simultaneously capture inter-class and intra-class relationships. The aggregation and embedding functions can be learned through a graph neural network, which can be used to compute the embeddings of unseen classes by transferring the knowledge from their neighbors. Extensive experiments on public benchmark datasets show that our method achieves state-of-the-art results.

Supervised Classification Using Graph-Based Space Partitioning for Multiclass Problems

Nicola Yanev, Ventzeslav Valev, Adam Krzyzak, Karima Ben Suliman

Auto-TLDR; Box Classifier for Multiclass Classification

We introduce and investigate, in the multiclass setting, an efficient classifier which partitions the training data by means of multidimensional parallelepipeds called boxes. We show that the multiclass classification problem at hand can be solved by integrating the heuristic minimum clique cover approach and the k-nearest neighbor rule. Our algorithm is motivated by an algorithm for partitioning a graph into a minimal number of maximal cliques. The main advantage of the new classifier, called the Box classifier, is that it optimally utilizes the geometrical structure of the training set by decomposing the l-class problem (l > 2) into l binary classification problems. We discuss the computational complexity of the proposed Box classifier. Extensive experiments performed on simulated and real data for binary and multiclass problems show that in almost all cases the Box classifier performs significantly better than k-NN, SVM and decision trees.