ICPR2020 Paper Browser

Paper download is intended for registered attendees only, and is subject to the IEEE Copyright Policy. Any other use is strictly forbidden.

Low-Cost Lipschitz-Independent Adaptive Importance Sampling of Stochastic Gradients

Huikang Liu, Xiaolu Wang, Jiajin Li, Man-Cho Anthony So

Auto-TLDR; Adaptive Importance Sampling for Stochastic Gradient Descent

Stochastic gradient descent (SGD) usually samples training data from the uniform distribution, which may not be a good choice because of the high variance of the resulting stochastic gradient. Importance sampling methods have therefore been considered in the literature to improve performance. Most previous work on SGD-based methods with importance sampling requires knowledge of the Lipschitz constants of all component gradients, which are in general difficult to estimate. In this paper, we study an adaptive importance sampling method for common SGD-based methods that exploits local first-order information without knowing any Lipschitz constants. In particular, we periodically change the sampling distribution using only the gradient norms from the past few iterations. We prove that our adaptive importance sampling non-asymptotically reduces the variance of the stochastic gradients in SGD, and thus better convergence bounds than those for vanilla SGD can be obtained. We extend this sampling method to several other widely used stochastic gradient algorithms, including SGD with momentum and ADAM. Experiments on common convex learning problems and deep neural networks illustrate notably enhanced performance using the adaptive sampling strategy.
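
The idea in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: the abstract does not give the exact update rule, so the function names (`update_probs`, `adaptive_is_sgd`), the refresh interval `period`, the uniform mixing weight `floor`, and the toy one-dimensional least-squares problem are all assumptions made for illustration. The sketch draws sample i with probability p_i, rescales its gradient by 1/(n p_i) so that the step remains an unbiased estimate of the average gradient, and every `period` iterations resets p_i in proportion to the gradient norms observed since the last refresh.

```python
import random


def update_probs(norm_sum, counts, floor):
    """Build new sampling probabilities from recently observed gradient norms.

    Assumed rule (not from the paper): score = mean recent gradient norm,
    unseen samples get the average score, and the result is mixed with the
    uniform distribution so that every p_i >= floor / n.
    """
    n = len(counts)
    seen = [norm_sum[i] / counts[i] for i in range(n) if counts[i] > 0]
    if not seen or sum(seen) == 0.0:
        return [1.0 / n] * n  # nothing informative observed: stay uniform
    fallback = sum(seen) / len(seen)
    scores = [norm_sum[i] / counts[i] if counts[i] > 0 else fallback
              for i in range(n)]
    total = sum(scores)
    return [(1.0 - floor) * s / total + floor / n for s in scores]


def adaptive_is_sgd(xs, ys, steps=5000, lr=0.02, period=50, floor=0.1, seed=0):
    """SGD on f(w) = (1/n) * sum_i 0.5 * (w * x_i - y_i)^2 with adaptive sampling."""
    rng = random.Random(seed)
    n = len(xs)
    w = 0.0
    probs = [1.0 / n] * n      # start from uniform sampling
    norm_sum = [0.0] * n       # gradient norms accumulated in this period
    counts = [0] * n           # number of draws per sample in this period
    for t in range(1, steps + 1):
        i = rng.choices(range(n), weights=probs)[0]
        g = (w * xs[i] - ys[i]) * xs[i]      # gradient of the i-th component
        w -= lr * g / (n * probs[i])         # 1/(n p_i) keeps the step unbiased
        norm_sum[i] += abs(g)
        counts[i] += 1
        if t % period == 0:                  # periodic refresh of the distribution
            probs = update_probs(norm_sum, counts, floor)
            norm_sum = [0.0] * n
            counts = [0] * n
    return w, probs


if __name__ == "__main__":
    xs = [0.5, 1.0, 1.5, 2.0]
    ys = [3.0 * x for x in xs]   # noise-free data, true weight is 3
    w, probs = adaptive_is_sgd(xs, ys)
    print(w, probs)
```

Mixing with the uniform distribution is a design choice of this sketch: it bounds the factor 1/(n p_i) by 1/`floor`, which keeps individual steps from blowing up when a sample's recent gradient norms happen to be small.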

Similar papers

RNN Training along Locally Optimal Trajectories via Frank-Wolfe Algorithm

Yun Yue, Ming Li, Venkatesh Saligrama, Ziming Zhang

Auto-TLDR; Frank-Wolfe Algorithm for Efficient Training of RNNs

Stochastic Runge-Kutta Methods and Adaptive SGD-G2 Stochastic Gradient Descent

Learning Sign-Constrained Support Vector Machines

Learning Sparse Deep Neural Networks Using Efficient Structured Projections on Convex Constraints for Green AI

Classification and Feature Selection Using a Primal-Dual Method and Projections on Structured Constraints

Mean Decision Rules Method with Smart Sampling for Fast Large-Scale Binary SVM Classification

Unveiling Groups of Related Tasks in Multi-Task Learning

Uniform and Non-Uniform Sampling Methods for Sub-Linear Time K-Means Clustering

An Efficient Empirical Solver for Localized Multiple Kernel Learning Via DNNs

A New Convex Loss Function for Multiple Instance Support Vector Machines

A Multilinear Sampling Algorithm to Estimate Shapley Values

Multi-Modal Deep Clustering: Unsupervised Partitioning of Images

Regularized Flexible Activation Function Combinations for Deep Neural Networks

Improving Batch Normalization with Skewness Reduction for Deep Neural Networks

Automatically Mining Relevant Variable Interactions Via Sparse Bayesian Learning

Hierarchical Routing Mixture of Experts

Bayesian Active Learning for Maximal Information Gain on Model Parameters

Learning to Prune in Training via Dynamic Channel Propagation

Aggregating Dependent Gaussian Experts in Local Approximation

Is the Meta-Learning Idea Able to Improve the Generalization of Deep Neural Networks on the Standard Supervised Learning?

Overcoming Noisy and Irrelevant Data in Federated Learning

A Randomized Algorithm for Sparse Recovery

On Resource-Efficient Bayesian Network Classifiers and Deep Neural Networks

Exploiting Non-Linear Redundancy for Neural Model Compression

Adaptive Matching of Kernel Means

Norm Loss: An Efficient yet Effective Regularization Method for Deep Neural Networks

Rethinking Experience Replay: A Bag of Tricks for Continual Learning

Progressive Learning Algorithm for Efficient Person Re-Identification

Learning Stable Deep Predictive Coding Networks with Weight Norm Supervision

P-DIFF: Learning Classifier with Noisy Labels Based on Probability Difference Distributions

Compression Strategies and Space-Conscious Representations for Deep Neural Networks

E-DNAS: Differentiable Neural Architecture Search for Embedded Systems

Revisiting Graph Neural Networks: Graph Filtering Perspective

Improved Deep Classwise Hashing with Centers Similarity Learning for Image Retrieval

Edge-Aware Graph Attention Network for Ratio of Edge-User Estimation in Mobile Networks

Subspace Clustering Via Joint Unsupervised Feature Selection

Double Manifolds Regularized Non-Negative Matrix Factorization for Data Representation

Class-Incremental Learning with Pre-Allocated Fixed Classifiers

Feature Extraction by Joint Robust Discriminant Analysis and Inter-Class Sparsity

HFP: Hardware-Aware Filter Pruning for Deep Convolutional Neural Networks Acceleration

Probabilistic Latent Factor Model for Collaborative Filtering with Bayesian Inference

Adaptive Noise Injection for Training Stochastic Student Networks from Deterministic Teachers

Adaptive Sampling of Pareto Frontiers with Binary Constraints Using Regression and Classification

Meta Soft Label Generation for Noisy Labels

Sketch-Based Community Detection Via Representative Node Sampling

Learning Connectivity with Graph Convolutional Networks

Dynamic Multi-Path Neural Network

T-SVD Based Non-Convex Tensor Completion and Robust Principal Component Analysis