ICPR2020 Paper Browser

Paper download is intended for registered attendees only and is subject to the IEEE Copyright Policy. Any other use is strictly forbidden.

Self-Play or Group Practice: Learning to Play Alternating Markov Game in Multi-Agent System

Chin-Wing Leung, Shuyue Hu, Ho-Fung Leung

Auto-TLDR; Group Practice for Deep Reinforcement Learning

Research in reinforcement learning has achieved great success in strategic game playing. These successes stem from combining deep reinforcement learning (DRL) and Monte Carlo Tree Search (MCTS) in agents trained under a self-play (SP) environment. Through self-play, agents are provided with an incrementally more difficult curriculum, which in turn facilitates learning. However, recent research suggests that agents trained via self-play can easily get stuck in local equilibria. In this paper, we consider a population of agents, each of which independently learns to play an alternating Markov game (AMG). We propose a new training framework, group practice (GP), for a population of decentralized RL agents. Under group practice, agents are assigned to multiple learning groups during training; for every episode of games, an agent is randomly paired with, and practices against, another agent in its learning group. Convergence to the optimal value function and to the Nash equilibrium is proved under the GP framework. An experimental study applies GP to the Q-learning algorithm and to deep Q-learning with Monte Carlo Tree Search on the games of Connect Four and Hex. We verify that GP is a more efficient training scheme than SP given the same amount of training. We also show that learning effectiveness can be improved further by applying local grouping to agents.
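The pairing scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`assign_groups`, `sample_pairings`), the round-robin group assignment, and the rule that an odd agent sits out an episode are all assumptions made for illustration, and the learning update itself (Q-learning or deep Q-learning with MCTS) is omitted.

```python
import random


def assign_groups(agent_ids, num_groups, rng):
    """Shuffle agents and deal them round-robin into num_groups learning groups.

    Assumption: the abstract does not say how groups are formed.
    """
    shuffled = list(agent_ids)
    rng.shuffle(shuffled)
    return [shuffled[i::num_groups] for i in range(num_groups)]


def sample_pairings(group, rng):
    """Randomly pair agents inside one learning group for a single episode.

    Assumption: with an odd group size, the last agent sits this episode out.
    """
    members = list(group)
    rng.shuffle(members)
    return [(members[i], members[i + 1]) for i in range(0, len(members) - 1, 2)]


def self_play_pairings(agent_ids):
    """Self-play baseline for comparison: every agent plays against itself."""
    return [(a, a) for a in agent_ids]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
groups = assign_groups(range(8), num_groups=2, rng=rng)

# One training episode: every group draws fresh random pairings.
episode_pairs = [pair for g in groups for pair in sample_pairings(g, rng)]
```

Calling `sample_pairings` again for the next episode draws new opponents, so each agent practices against varying partners from its own group, whereas `self_play_pairings` always matches an agent with itself.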

Similar papers

Learning from Learners: Adapting Reinforcement Learning Agents to Be Competitive in a Card Game

Pablo Vinicius Alves De Barros, Ana Tanevska, Alessandra Sciutti

Auto-TLDR; Adaptive Reinforcement Learning for Competitive Card Games

AVD-Net: Attention Value Decomposition Network for Deep Multi-Agent Reinforcement Learning

Detecting and Adapting to Crisis Pattern with Context Based Deep Reinforcement Learning

The Effect of Multi-Step Methods on Overestimation in Deep Reinforcement Learning

Deep Reinforcement Learning on a Budget: 3D Control and Reasoning without a Supercomputer

Object-Oriented Map Exploration and Construction Based on Auxiliary Task Aided DRL

Low Dimensional State Representation Learning with Reward-Shaped Priors

A Bayesian Approach to Reinforcement Learning of Vision-Based Vehicular Control

Can Reinforcement Learning Lead to Healthy Life?: Simulation Study Based on User Activity Logs

Deep Reinforcement Learning for Autonomous Driving by Transferring Visual Features

Meta Learning Via Learned Loss

Trajectory Representation Learning for Multi-Task NMRDP Planning

Vacant Parking Space Detection Based on Task Consistency and Reinforcement Learning

Explore and Explain: Self-Supervised Navigation and Recounting

A Multilinear Sampling Algorithm to Estimate Shapley Values

A Novel Actor Dual-Critic Model for Remote Sensing Image Captioning

Adaptive Remote Sensing Image Attribute Learning for Active Object Detection

ActionSpotter: Deep Reinforcement Learning Framework for Temporal Action Spotting in Videos

AOAM: Automatic Optimization of Adjacency Matrix for Graph Convolutional Network

On Embodied Visual Navigation in Real Environments through Habitat

RLST: A Reinforcement Learning Approach to Scene Text Detection Refinement

DAG-Net: Double Attentive Graph Neural Network for Trajectory Forecasting

Visual Object Tracking in Drone Images with Deep Reinforcement Learning

An Intransitivity Model for Matchup and Pairwise Comparison

ILS-SUMM: Iterated Local Search for Unsupervised Video Summarization

Creating Classifier Ensembles through Meta-Heuristic Algorithms for Aerial Scene Classification

RNN Training along Locally Optimal Trajectories via Frank-Wolfe Algorithm

Low-Cost Lipschitz-Independent Adaptive Importance Sampling of Stochastic Gradients

Improving Visual Question Answering Using Active Perception on Static Images

Unveiling Groups of Related Tasks in Multi-Task Learning

Switching Dynamical Systems with Deep Neural Networks

Leveraging Sequential Pattern Information for Active Learning from Sequential Data

Deep Next-Best-View Planner for Cross-Season Visual Route Classification

Recurrent Deep Attention Network for Person Re-Identification

Hierarchical Multimodal Attention for Deep Video Summarization

Learning Stable Deep Predictive Coding Networks with Weight Norm Supervision

Uniform and Non-Uniform Sampling Methods for Sub-Linear Time K-Means Clustering

AG-GAN: An Attentive Group-Aware GAN for Pedestrian Trajectory Prediction

A Heuristic-Based Decision Tree for Connected Components Labeling of 3D Volumes

Aggregating Dependent Gaussian Experts in Local Approximation

Augmented Bi-Path Network for Few-Shot Learning

Multiple Future Prediction Leveraging Synthetic Trajectories

Automatically Mining Relevant Variable Interactions Via Sparse Bayesian Learning

SAILenv: Learning in Virtual Visual Environments Made Simple

Progressive Learning Algorithm for Efficient Person Re-Identification

Active Sampling for Pairwise Comparisons via Approximate Message Passing and Information Gain Maximization

Improving Robotic Grasping on Monocular Images Via Multi-Task Learning and Positional Loss

Stochastic Runge-Kutta Methods and Adaptive SGD-G2 Stochastic Gradient Descent