ICPR2020 Paper Browser

Paper download is intended for registered attendees only and is subject to the IEEE Copyright Policy. Any other use is strictly forbidden.

Stochastic Runge-Kutta Methods and Adaptive SGD-G2 Stochastic Gradient Descent

Gabriel Turinici, Imen Ayadi

Auto-TLDR; Adaptive Stochastic Runge-Kutta for the Minimization of the Loss Function


The minimization of the loss function is of paramount importance in deep neural networks. Many popular optimization algorithms have been shown to correspond to some evolution equation of gradient flow type. Inspired by the numerical schemes used for general evolution equations, we introduce a second-order stochastic Runge-Kutta method and show that it yields a consistent procedure for the minimization of the loss function. In addition, it can be coupled, in an adaptive framework, with Stochastic Gradient Descent (SGD) to automatically adjust the learning rate of SGD. The resulting adaptive SGD, called SGD-G2, shows good results in terms of convergence speed when tested on standard datasets.
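To make the idea of a second-order Runge-Kutta step on a gradient flow concrete, here is a minimal sketch using Heun's method on mini-batch gradients. It is an illustration only and not the authors' SGD-G2 algorithm: the abstract does not give the adaptive learning-rate rule, so none is reproduced here. The names `heun_sgd_step` and `lsq_grad`, the toy least-squares problem, and the step size `h = 0.1` are all assumptions made for the example.

```python
import numpy as np

def heun_sgd_step(theta, grad_fn, batch, h):
    """One Heun (explicit second-order Runge-Kutta) step on the gradient flow
    d(theta)/dt = -grad f(theta), reusing the same mini-batch for both stages."""
    g1 = grad_fn(theta, batch)           # slope at the current point
    theta_euler = theta - h * g1         # explicit Euler predictor (a plain SGD step)
    g2 = grad_fn(theta_euler, batch)     # slope at the predicted point
    return theta - 0.5 * h * (g1 + g2)   # corrector: average the two slopes

def lsq_grad(theta, batch):
    """Gradient of the mean squared residual 0.5 * mean((X theta - y)^2)."""
    X, y = batch
    return X.T @ (X @ theta - y) / len(y)

# Hypothetical toy problem: noiseless linear regression.
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 3))
true_theta = np.array([1.0, -2.0, 0.5])
y = X @ true_theta

theta = np.zeros(3)
for _ in range(500):
    idx = rng.choice(len(y), size=32, replace=False)  # draw a mini-batch
    theta = heun_sgd_step(theta, lsq_grad, (X[idx], y[idx]), h=0.1)
```

Each Heun step costs two gradient evaluations, compared with one for plain SGD, which is the usual price of the higher-order scheme.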

Similar papers

Low-Cost Lipschitz-Independent Adaptive Importance Sampling of Stochastic Gradients

Huikang Liu, Xiaolu Wang, Jiajin Li, Man-Cho Anthony So

Auto-TLDR; Adaptive Importance Sampling for Stochastic Gradient Descent


Stochastic gradient descent (SGD) usually samples training data from the uniform distribution, which may not be a good choice because of the high variance of its stochastic gradient. Importance sampling methods have therefore been considered in the literature to improve performance. Most previous work on SGD-based methods with importance sampling requires knowledge of the Lipschitz constants of all component gradients, which are in general difficult to estimate. In this paper, we study an adaptive importance sampling method for common SGD-based methods that exploits local first-order information without knowing any Lipschitz constants. In particular, we periodically change the sampling distribution using only the gradient norms from the past few iterations. We prove that our adaptive importance sampling non-asymptotically reduces the variance of the stochastic gradients in SGD, so that better convergence bounds than those for vanilla SGD can be obtained. We extend this sampling method to several other widely used stochastic gradient algorithms, including SGD with momentum and ADAM. Experiments on common convex learning problems and deep neural networks illustrate notably enhanced performance using the adaptive sampling strategy.
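The general mechanism behind gradient-norm importance sampling can be sketched as follows. This is a generic illustration, not the authors' method: it assumes the sampling probabilities are simply proportional to the current per-example gradient norms, whereas the abstract describes using norms from the past few iterations. The function names `sampling_probs` and `is_gradient` and the random toy gradients are hypothetical.

```python
import numpy as np

def sampling_probs(grad_norms, eps=1e-12):
    """Probabilities proportional to gradient norms (eps keeps every entry positive)."""
    w = np.asarray(grad_norms, dtype=float) + eps
    return w / w.sum()

def is_gradient(per_example_grads, probs, rng):
    """Draw one example i with probability probs[i] and reweight its gradient
    by 1 / (n * probs[i]) so the estimator stays unbiased for the mean gradient."""
    n = len(probs)
    i = rng.choice(n, p=probs)
    return per_example_grads[i] / (n * probs[i])

# Hypothetical per-example gradients for 5 examples in 2 dimensions.
rng = np.random.default_rng(0)
grads = rng.normal(size=(5, 2))
probs = sampling_probs(np.linalg.norm(grads, axis=1))

# Exact expectation of the reweighted estimator under probs.
expected = sum(p * g / (len(probs) * p) for p, g in zip(probs, grads))
estimate = is_gradient(grads, probs, rng)
```

The reweighting factor is what keeps the estimator unbiased: `expected` equals the plain average of the per-example gradients for any strictly positive choice of `probs`, so changing the distribution affects only the variance.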

RNN Training along Locally Optimal Trajectories via Frank-Wolfe Algorithm

Yun Yue, Ming Li, Venkatesh Saligrama, Ziming Zhang

Auto-TLDR; Frank-Wolfe Algorithm for Efficient Training of RNNs

Unveiling Groups of Related Tasks in Multi-Task Learning

A Multilinear Sampling Algorithm to Estimate Shapley Values

Bayesian Active Learning for Maximal Information Gain on Model Parameters

Learning Sign-Constrained Support Vector Machines

Separation of Aleatoric and Epistemic Uncertainty in Deterministic Deep Neural Networks

Dimensionality Reduction for Data Visualization and Linear Classification, and the Trade-Off between Robustness and Classification Accuracy

Learning Sparse Deep Neural Networks Using Efficient Structured Projections on Convex Constraints for Green AI

Learning Stable Deep Predictive Coding Networks with Weight Norm Supervision

Classification and Feature Selection Using a Primal-Dual Method and Projections on Structured Constraints

Mean Decision Rules Method with Smart Sampling for Fast Large-Scale Binary SVM Classification

Can Data Placement Be Effective for Neural Networks Classification Tasks? Introducing the Orthogonal Loss

Generalization Comparison of Deep Neural Networks Via Output Sensitivity

Speeding-Up Pruning for Artificial Neural Networks: Introducing Accelerated Iterative Magnitude Pruning

Improving Batch Normalization with Skewness Reduction for Deep Neural Networks

Energy Minimum Regularization in Continual Learning

Improved Time-Series Clustering with UMAP Dimension Reduction Method

Hcore-Init: Neural Network Initialization Based on Graph Degeneracy

A Close Look at Deep Learning with Small Data

Interpolation in Auto Encoders with Bridge Processes

Auto Encoding Explanatory Examples with Stochastic Paths

ResNet-Like Architecture with Low Hardware Requirements

Towards Explaining Adversarial Examples Phenomenon in Artificial Neural Networks

An Efficient Empirical Solver for Localized Multiple Kernel Learning Via DNNs

Is the Meta-Learning Idea Able to Improve the Generalization of Deep Neural Networks on the Standard Supervised Learning?

Exploiting Non-Linear Redundancy for Neural Model Compression

Uniform and Non-Uniform Sampling Methods for Sub-Linear Time K-Means Clustering

3CS Algorithm for Efficient Gaussian Process Model Retrieval

Norm Loss: An Efficient yet Effective Regularization Method for Deep Neural Networks

Probability Guided Maxout

A Randomized Algorithm for Sparse Recovery

Meta Soft Label Generation for Noisy Labels

Feature Extraction and Selection Via Robust Discriminant Analysis and Class Sparsity

Sparse Network Inversion for Key Instance Detection in Multiple Instance Learning

Deep Transformation Models: Tackling Complex Regression Problems with Neural Network Based Transformation Models

Generative Latent Implicit Conditional Optimization When Learning from Small Sample

Adaptive Sampling of Pareto Frontiers with Binary Constraints Using Regression and Classification

Naturally Constrained Online Expectation Maximization

Fractional Adaptation of Activation Functions in Neural Networks

Boundary Optimised Samples Training for Detecting Out-Of-Distribution Images

Neuron-Based Network Pruning Based on Majority Voting

Quantifying Model Uncertainty in Inverse Problems Via Bayesian Deep Gradient Descent

Understanding Integrated Gradients with SmoothTaylor for Deep Neural Network Attribution

Supervised Domain Adaptation Using Graph Embedding

Contextual Classification Using Self-Supervised Auxiliary Models for Deep Neural Networks

Killing Four Birds with One Gaussian Process: The Relation between Different Test-Time Attacks

Progressive Gradient Pruning for Classification, Detection and Domain Adaptation