ICPR2020 Paper Browser

Paper download is intended for registered attendees only, and is subject to the IEEE Copyright Policy. Any other use is strictly forbidden.

RNN Training along Locally Optimal Trajectories via Frank-Wolfe Algorithm

Yun Yue, Ming Li, Venkatesh Saligrama, Ziming Zhang

Auto-TLDR; Frank-Wolfe Algorithm for Efficient Training of RNNs


We propose a novel and efficient training method for RNNs that iteratively seeks a local minimum on the loss surface within a small region and uses the resulting directional vector for the update in an outer loop. We propose to use the Frank-Wolfe (FW) algorithm in this context. Although FW implicitly involves normalized gradients, which can lead to a slow convergence rate, we develop a novel RNN training method whose overall training cost, surprisingly, is empirically observed to be lower than that of back-propagation even with the additional cost. Our method leads to a new Frank-Wolfe method that is in essence an SGD algorithm with a restart scheme. We prove that under certain conditions our algorithm has a sublinear convergence rate of $O(1/\epsilon)$ for $\epsilon$ error. We then conduct empirical experiments on several benchmark datasets, including those that exhibit long-term dependencies, and show significant performance improvement. We also experiment with deep RNN architectures and show efficient training performance. Finally, we demonstrate that our training method is robust to noisy data.
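
The inner/outer structure described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm. It assumes an L2 ball of radius `radius` around the current weights as the "small region", the textbook Frank-Wolfe step size 2/(k+2), and a toy quadratic loss standing in for an RNN loss. All function and parameter names (`fw_local_direction`, `train`, `outer_lr`, and so on) are hypothetical.

```python
import math

def grad_quadratic(w, target):
    """Gradient of the toy loss 0.5 * ||w - target||^2 (a stand-in for an RNN loss)."""
    return [wi - ti for wi, ti in zip(w, target)]

def loss_quadratic(w, target):
    """Toy loss 0.5 * ||w - target||^2."""
    return 0.5 * sum((wi - ti) ** 2 for wi, ti in zip(w, target))

def fw_local_direction(w0, grad_fn, radius=0.5, inner_steps=10):
    """Run Frank-Wolfe inside the L2 ball of the given radius around w0.

    Returns the directional vector w_K - w0 (assumed form of the outer update).
    """
    w = list(w0)
    for k in range(inner_steps):
        g = grad_fn(w)
        norm = math.sqrt(sum(gi * gi for gi in g))
        if norm == 0.0:
            break  # stationary point: nothing to do
        # Linear minimization oracle over the ball: the point minimizing <g, s>.
        # Note the normalized gradient g / ||g|| that appears here.
        s = [w0i - radius * gi / norm for w0i, gi in zip(w0, g)]
        gamma = 2.0 / (k + 2.0)  # textbook FW step size (assumption)
        w = [(1.0 - gamma) * wi + gamma * si for wi, si in zip(w, s)]
    return [wi - w0i for wi, w0i in zip(w, w0)]

def train(w_init, grad_fn, outer_steps=50, outer_lr=1.0, radius=0.5, inner_steps=10):
    """Outer loop: restart the inner FW search from each newly updated point."""
    w = list(w_init)
    for _ in range(outer_steps):
        d = fw_local_direction(w, grad_fn, radius, inner_steps)
        w = [wi + outer_lr * di for wi, di in zip(w, d)]
    return w

if __name__ == "__main__":
    target = [3.0, -2.0]
    w_final = train([0.0, 0.0], lambda w: grad_quadratic(w, target))
    print(loss_quadratic([0.0, 0.0], target), loss_quadratic(w_final, target))
```

Because every inner iterate is a convex combination of points in the ball, the returned direction never has norm larger than `radius`, which keeps each outer update local in this sketch.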

Similar papers

Learning Stable Deep Predictive Coding Networks with Weight Norm Supervision

Guo Ruohao

Auto-TLDR; Stability of Predictive Coding Network with Weight Norm Supervision

Low-Cost Lipschitz-Independent Adaptive Importance Sampling of Stochastic Gradients

Learning Sparse Deep Neural Networks Using Efficient Structured Projections on Convex Constraints for Green AI

An Efficient Empirical Solver for Localized Multiple Kernel Learning Via DNNs

Stochastic Runge-Kutta Methods and Adaptive SGD-G2 Stochastic Gradient Descent

Improving Batch Normalization with Skewness Reduction for Deep Neural Networks

Norm Loss: An Efficient yet Effective Regularization Method for Deep Neural Networks

Revisiting the Training of Very Deep Neural Networks without Skip Connections

Regularized Flexible Activation Function Combinations for Deep Neural Networks

Exploiting Non-Linear Redundancy for Neural Model Compression

Learning Sign-Constrained Support Vector Machines

Meta Learning Via Learned Loss

Learning Connectivity with Graph Convolutional Networks

Classification and Feature Selection Using a Primal-Dual Method and Projections on Structured Constraints

Unveiling Groups of Related Tasks in Multi-Task Learning

On the Global Self-attention Mechanism for Graph Convolutional Networks

Learning with Multiplicative Perturbations

Energy Minimum Regularization in Continual Learning

Is the Meta-Learning Idea Able to Improve the Generalization of Deep Neural Networks on the Standard Supervised Learning?

Generalization Comparison of Deep Neural Networks Via Output Sensitivity

Trajectory-User Link with Attention Recurrent Networks

Switching Dynamical Systems with Deep Neural Networks

Neuron-Based Network Pruning Based on Majority Voting

WeightAlign: Normalizing Activations by Weight Alignment

Speeding-Up Pruning for Artificial Neural Networks: Introducing Accelerated Iterative Magnitude Pruning

Adaptive Noise Injection for Training Stochastic Student Networks from Deterministic Teachers

A Multilinear Sampling Algorithm to Estimate Shapley Values

P-DIFF: Learning Classifier with Noisy Labels Based on Probability Difference Distributions

MA-LSTM: A Multi-Attention Based LSTM for Complex Pattern Extraction

N2D: (Not Too) Deep Clustering Via Clustering the Local Manifold of an Autoencoded Embedding

Dimensionality Reduction for Data Visualization and Linear Classification, and the Trade-Off between Robustness and Classification Accuracy

Rethinking Experience Replay: A Bag of Tricks for Continual Learning

Kernel-based Graph Convolutional Networks

Meta Soft Label Generation for Noisy Labels

Resource-efficient DNNs for Keyword Spotting using Neural Architecture Search and Quantization

E-DNAS: Differentiable Neural Architecture Search for Embedded Systems

Boundaries of Single-Class Regions in the Input Space of Piece-Wise Linear Neural Networks

Feature Engineering and Stacked Echo State Networks for Musical Onset Detection

Can Data Placement Be Effective for Neural Networks Classification Tasks? Introducing the Orthogonal Loss

Towards Explaining Adversarial Examples Phenomenon in Artificial Neural Networks

Understanding Integrated Gradients with SmoothTaylor for Deep Neural Network Attribution

On Resource-Efficient Bayesian Network Classifiers and Deep Neural Networks

Revisiting Graph Neural Networks: Graph Filtering Perspective

Trainable Spectrally Initializable Matrix Transformations in Convolutional Neural Networks

A Randomized Algorithm for Sparse Recovery

Quaternion Capsule Networks

HFP: Hardware-Aware Filter Pruning for Deep Convolutional Neural Networks Acceleration

Compression Strategies and Space-Conscious Representations for Deep Neural Networks