Bhiksha Raj

Papers from this author

Exploiting Non-Linear Redundancy for Neural Model Compression

Muhammad Ahmed Shah, Raphael Olivier, Bhiksha Raj

Auto-TLDR; Compressing Deep Neural Networks with Linear Dependency

Deploying deep learning models with millions, even billions, of parameters is challenging given real-world memory, power, and compute constraints. To make these models more practical, we propose a novel model compression approach that exploits linear dependence between the activations in a layer to eliminate entire structural units (neurons/convolutional filters). Our approach also adjusts the weights of the layer in a manner that is provably lossless during training if the removed neuron was perfectly predictable from the others. We combine this approach with an annealing algorithm that may be applied during training, or even to a trained model, and demonstrate, on popular datasets, that our technique can reduce the parameters of VGG and AlexNet by more than 97% on CIFAR, 85% on Caltech, and 19% on ImageNet, at less than 2% loss in accuracy. Furthermore, we provide theoretical results showing that in overparametrized, locally linear (ReLU) neural networks where redundant features exist, and with correct hyperparameter selection, our method is indeed able to capture and suppress those dependencies.
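
To make the lossless weight adjustment concrete, below is a minimal NumPy sketch of the idea (an illustration under idealized assumptions, not the authors' implementation): when one neuron's activation is an exact linear combination of its layer-mates, the neuron can be removed and its dependence coefficients folded into the next layer's weights without changing the network's output.

```python
# Hypothetical sketch: prune a neuron whose activation is an exact linear
# combination of the other neurons in its layer, folding that dependence
# into the next layer's weights so the output is unchanged.
import numpy as np

rng = np.random.default_rng(0)

# Toy layer: 5 neurons, where neuron 4 is perfectly predictable from the rest.
A = rng.standard_normal((100, 4))          # activations of 4 independent neurons
c = np.array([0.5, -1.0, 2.0, 0.25])       # dependence coefficients
acts = np.column_stack([A, A @ c])         # column 4 = linear combo of columns 0..3

W_next = rng.standard_normal((5, 3))       # next layer: 5 inputs -> 3 outputs

# Estimate the dependence of neuron 4 on the others via least squares.
coef, *_ = np.linalg.lstsq(acts[:, :4], acts[:, 4], rcond=None)

# Remove neuron 4 and absorb its contribution: since a4 = sum_i coef_i * a_i,
# setting W'[i] = W[i] + coef_i * W[4] leaves the next layer's inputs intact.
W_pruned = W_next[:4] + np.outer(coef, W_next[4])

out_full = acts @ W_next
out_pruned = acts[:, :4] @ W_pruned
print(np.allclose(out_full, out_pruned))   # True: the removal is lossless here
```

In practice the dependence is only approximate, which is where the paper's annealing algorithm comes in; this sketch covers only the exactly predictable case.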

Hierarchical Routing Mixture of Experts

Wenbo Zhao, Yang Gao, Shahan Ali Memon, Bhiksha Raj, Rita Singh

Auto-TLDR; A Binary Tree-structured Hierarchical Routing Mixture of Experts for Regression

In regression tasks the distribution of the data is often too complex to be fitted by a single model. In such cases, partition-based models can be used, in which the data is divided into regions that are each fitted by a local model. However, these models partition only the input space and do not leverage the input-output dependency of multimodally distributed data, so strong local models are needed to make good predictions. To address these problems, we propose a binary tree-structured hierarchical routing mixture of experts (HRME) model that has classifiers as non-leaf node experts and simple regression models as leaf node experts. The classifier nodes jointly soft-partition the input-output space based on the natural separateness of the multimodal data, which enables simple leaf experts to be effective predictors. Further, we develop a probabilistic framework for the HRME model and propose a recursive Expectation-Maximization (EM) based algorithm to learn both the tree structure and the expert models. Experiments on a collection of regression tasks validate the effectiveness of our method compared with a variety of other regression models.
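
The routing idea can be illustrated with an assumed depth-one instance of the model: a single logistic classifier gate soft-partitions the data between two linear leaf experts, and EM alternates between computing expert responsibilities and refitting the gate and the experts. The recursive tree-structure learning of the full HRME model is beyond this sketch.

```python
# Minimal sketch (assumed depth-1 tree, not the paper's full algorithm):
# a logistic gate routes samples to two linear leaf experts, trained by EM.
import numpy as np

def gaussian(y, mu, var):
    return np.exp(-(y - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 1))
X1 = np.column_stack([X, np.ones(len(X))])      # add bias term
# Bimodal target: two different linear regimes, mixed by the sign of x.
y = np.where(X[:, 0] > 0, 2 * X[:, 0] + 1, -X[:, 0] - 2)
y = y + 0.1 * rng.standard_normal(500)

v = rng.standard_normal(2) * 0.1                # gate (logistic) weights
w = rng.standard_normal((2, 2))                 # two linear leaf experts
var = np.array([1.0, 1.0])                      # per-expert noise variances

for _ in range(50):
    # E-step: responsibility of expert 0 for each sample.
    g = 1 / (1 + np.exp(-X1 @ v))
    p0 = g * gaussian(y, X1 @ w[0], var[0])
    p1 = (1 - g) * gaussian(y, X1 @ w[1], var[1])
    r = p0 / np.maximum(p0 + p1, 1e-300)
    # M-step: weighted least squares for each leaf expert.
    for k, rk in enumerate([r, 1 - r]):
        Wd = X1 * rk[:, None]
        w[k] = np.linalg.solve(X1.T @ Wd + 1e-6 * np.eye(2), Wd.T @ y)
        var[k] = (rk * (y - X1 @ w[k]) ** 2).sum() / rk.sum()
    # M-step for the gate: gradient steps of weighted logistic regression.
    for _ in range(10):
        g = 1 / (1 + np.exp(-X1 @ v))
        v += 0.1 * X1.T @ (r - g) / len(y)

g = 1 / (1 + np.exp(-X1 @ v))
pred = g * (X1 @ w[0]) + (1 - g) * (X1 @ w[1])
print("RMSE:", np.sqrt(np.mean((pred - y) ** 2)))
```

Because the gate conditions on the input while the responsibilities condition on both input and output, the split reflects the input-output structure of the data, which is what lets the simple linear leaves fit a bimodal target.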

Optimal Strategies for Comparing Covariates to Solve Matching Problems

Muhammad Ahmed Shah, Raphael Olivier, Bhiksha Raj

Auto-TLDR; Covariate Matching for Pairwise Verification and Ranking

Many machine learning tasks can be posed as matching problems in which we are given a "probe" entry that we expect to match some of the entries in our "gallery". The general solution to these problems is to retrieve matching entries based on statistical dependencies between the probe and the gallery data that are learned using complex models. Often, however, there are other common covariates of the probe and gallery data which can be easily inferred and may explain some of the statistical dependencies between the two. In this paper we present a probabilistic framework to derive optimal matching strategies based only on covariate features for three broad tasks, namely N-way classification, pairwise verification, and ranking. We use canonical metrics to determine the maximum performance that can be expected if only covariate features are used, and thereby the marginal gain of using complex models. We find that covariate matching achieves an equal error rate (EER) within 10% of a CNN on the verification task, and a mean average precision (MAP) within 22% of a DNN-based model on the ranking task.
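
To illustrate what covariate-only matching looks like in the pairwise verification setting, here is a small hypothetical sketch: a single scalar covariate (say, an estimated physical attribute) scores probe-gallery pairs by similarity, and the equal error rate (EER) is read off where the false accept rate equals the false reject rate. The data and the scoring rule here are assumptions for illustration, not the paper's exact framework.

```python
# Illustrative sketch: pairwise verification from one scalar covariate,
# scoring each (probe, gallery) pair by covariate similarity and computing
# the equal error rate (EER) on synthetic data.
import numpy as np

rng = np.random.default_rng(0)

n = 2000
identity_cov = rng.standard_normal(n)             # true covariate per identity
# Two noisy observations of each identity's covariate.
probe = identity_cov + 0.3 * rng.standard_normal(n)
gallery = identity_cov + 0.3 * rng.standard_normal(n)

# Genuine pairs share an identity; impostor pairs are mismatched at random.
genuine = -np.abs(probe - gallery)
impostor = -np.abs(probe - gallery[rng.permutation(n)])

# Sweep thresholds to find where false accepts balance false rejects.
thresholds = np.sort(np.concatenate([genuine, impostor]))
far = np.array([(impostor >= t).mean() for t in thresholds])
frr = np.array([(genuine < t).mean() for t in thresholds])
i = np.argmin(np.abs(far - frr))
print(f"EER ~ {(far[i] + frr[i]) / 2:.3f}")
```

The gap between this covariate-only EER and that of a learned model is exactly the kind of marginal gain the paper's framework quantifies.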