Self-Play or Group Practice: Learning to Play Alternating Markov Game in Multi-Agent System

Chin-Wing Leung, Shuyue Hu, Ho-Fung Leung

Responsive image

Auto-TLDR; Group Practice for Deep Reinforcement Learning

Slides Poster

The research in reinforcement learning has achieved great success in strategic game playing. These successes are thanks to the incorporation of deep reinforcement learning (DRL) and Monte Carlo Tree Search (MCTS) to the agent trained under the self-play (SP) environment. By self-play, agents are provided with an incrementally more difficult curriculum which in turn facilitate learning. However, recent research suggests that agents trained via self-play may easily lead to getting stuck in local equilibria. In this paper, we consider a population of agents each independently learns to play an alternating Markov game (AMG). We propose a new training framework---group practice---for a population of decentralized RL agents. By group practice (GP), agents are assigned into multiple learning groups during training, for every episode of games, an agent is randomly paired up and practices with another agent in the learning group. The convergence result to the optimal value function and the Nash equilibrium are proved under the GP framework. Experimental study is conducted by applying GP to Q-learning algorithm and the deep Q-learning with Monte-Carlo tree search on the game of Connect Four and the game of Hex. We verify that GP is the more efficient training scheme than SP given the same amount of training. We also show that the learning effectiveness can even be improved when applying local grouping to agents.

Similar papers

Learning from Learners: Adapting Reinforcement Learning Agents to Be Competitive in a Card Game

Pablo Vinicius Alves De Barros, Ana Tanevska, Alessandra Sciutti

Responsive image

Auto-TLDR; Adaptive Reinforcement Learning for Competitive Card Games

Slides Poster Similar

Learning how to adapt to complex and dynamic environments is one of the most important factors that contribute to our intelligence. Endowing artificial agents with this ability is not a simple task, particularly in competitive scenarios. In this paper, we present a broad study on how popular reinforcement learning algorithms can be adapted and implemented to learn and to play a real-world implementation of a competitive multiplayer card game. We propose specific training and validation routines for the learning agents, in order to evaluate how the agents learn to be competitive and explain how they adapt to each others' playing style. Finally, we pinpoint how the behavior of each agent derives from their learning style and create a baseline for future research on this scenario.

AVD-Net: Attention Value Decomposition Network for Deep Multi-Agent Reinforcement Learning

Zhang Yuanxin, Huimin Ma, Yu Wang

Responsive image

Auto-TLDR; Attention Value Decomposition Network for Cooperative Multi-agent Reinforcement Learning

Slides Poster Similar

Multi-agent reinforcement learning (MARL) is of importance for variable real-world applications but remains more challenges like stationarity and scalability. While recently value function factorization methods have obtained empirical good results in cooperative multi-agent environment, these works mostly focus on the decomposable learning structures. Inspired by the application of attention mechanism in machine translation and other related domains, we propose an attention based approach called attention value decomposition network (AVD-Net), which capitalizes on the coordination relations between agents. AVD-Net employs centralized training with decentralized execution (CTDE) paradigm, which factorizes the joint action-value functions with only local observations and actions of agents. Our method is evaluated on multi-agent particle environment (MPE) and StarCraft micromanagement environment (SMAC). The experiment results show the strength of our approach compared to existing methods with state-of-the-art performance in cooperative scenarios.

Detecting and Adapting to Crisis Pattern with Context Based Deep Reinforcement Learning

Eric Benhamou, David Saltiel Saltiel, Jean-Jacques Ohana Ohana, Jamal Atif Atif

Responsive image

Auto-TLDR; Deep Reinforcement Learning for Financial Crisis Detection and Dis-Investment

Slides Poster Similar

Deep reinforcement learning (DRL) has reached super human levels in complexes tasks like game solving (Go, StarCraft II), and autonomous driving. However, it remains an open question whether DRL can reach human level in applications to financial problems and in particular in detecting pattern crisis and consequently dis-investing. In this paper, we present an innovative DRL framework consisting in two sub-networks fed respectively with portfolio strategies past performances and standard deviation as well as additional contextual features. The second sub network plays an important role as it captures dependencies with common financial indicators features like risk aversion, economic surprise index and correlations between assets that allows taking into account context based information. We compare different network architectures either using layers of convolutions to reduce network's complexity or LSTM block to capture time dependency and whether previous allocations is important in the modeling. We also use adversarial training to make the final model more robust. Results on test set show this approach substantially over-performs traditional portfolio optimization methods like Markovitz and is able to detect and anticipate crisis like the current Covid one.

The Effect of Multi-Step Methods on Overestimation in Deep Reinforcement Learning

Lingheng Meng, Rob Gorbet, Dana Kulić

Responsive image

Auto-TLDR; Multi-Step DDPG for Deep Reinforcement Learning

Slides Poster Similar

Multi-step (also called n-step) methods in reinforcement learning (RL) have been shown to be more efficient than the 1-step method due to faster propagation of the reward signal, both theoretically and empirically, in tasks exploiting tabular representation of the value-function. Recently, research in Deep Reinforcement Learning (DRL) also shows that multi-step methods improve learning speed and final performance in applications where the value-function and policy are represented with deep neural networks. However, there is a lack of understanding about what is actually contributing to the boost of performance. In this work, we analyze the effect of multi-step methods on alleviating the overestimation problem in DRL, where multi-step experiences are sampled from a replay buffer. Specifically building on top of Deep Deterministic Policy Gradient (DDPG), we experiment with Multi-step DDPG (MDDPG), where different step sizes are manually set, and with a variant called Mixed Multi-step DDPG (MMDDPG) where an average over different multi-step backups is used as target Q-value. Empirically, we show that both MDDPG and MMDDPG are significantly less affected by the overestimation problem than DDPG with 1-step backup, which consequently results in better final performance and learning speed. We also discuss the advantages and disadvantages of different ways to do multi-step expansion in order to reduce approximation error, and expose the tradeoff between overestimation and underestimation that underlies offline multi-step methods. Finally, we compare the computational resource needs of TD3 and our proposed methods, since they show comparable final performance and learning speed.

Deep Reinforcement Learning on a Budget: 3D Control and Reasoning without a Supercomputer

Edward Beeching, Jilles Steeve Dibangoye, Olivier Simonin, Christian Wolf

Responsive image

Auto-TLDR; Deep Reinforcement Learning in Mobile Robots Using 3D Environment Scenarios

Slides Poster Similar

An important goal of research in Deep Reinforcement Learning in mobile robotics is to train agents capableof solving complex tasks, which require a high level of scene understanding and reasoning from an egocentric perspective.When trained from simulations, optimal environments should satisfy a currently unobtainable combination of high-fidelity photographic observations, massive amounts of different environment configurations and fast simulation speeds. In this paper we argue that research on training agents capable of complex reasoning can be simplified by decoupling from the requirement of high fidelity photographic observations. We present a suite of tasks requiring complex reasoning and exploration in continuous,partially observable 3D environments. The objective is to provide challenging scenarios and a robust baseline agent architecture that can be trained on mid-range consumer hardware in under 24h. Our scenarios combine two key advantages: (i) they are based on a simple but highly efficient 3D environment (ViZDoom)which allows high speed simulation (12000fps); (ii) the scenarios provide the user with a range of difficulty settings, in order to identify the limitations of current state of the art algorithms and network architectures. We aim to increase accessibility to the field of Deep-RL by providing baselines for challenging scenarios where new ideas can be iterated on quickly. We argue that the community should be able to address challenging problems in reasoning of mobile agents without the need for a large compute infrastructure.

Object-Oriented Map Exploration and Construction Based on Auxiliary Task Aided DRL

Junzhe Xu, Jianhua Zhang, Shengyong Chen, Honghai Liu

Responsive image

Auto-TLDR; Auxiliary Task Aided Deep Reinforcement Learning for Environment Exploration by Autonomous Robots

Similar

Environment exploration by autonomous robots through deep reinforcement learning (DRL) based methods has attracted more and more attention. However, existing methods usually focus on robot navigation to single or multiple fixed goals, while ignoring the perception and construction of external environments. In this paper, we propose a novel environment exploration task based on DRL, which requires a robot fast and completely perceives all objects of interest, and reconstructs their poses in a global environment map, as much as the robot can do. To this end, we design an auxiliary task aided DRL model, which is integrated with the auxiliary object detection and 6-DoF pose estimation components. The outcome of auxiliary tasks can improve the learning speed and robustness of DRL, as well as the accuracy of object pose estimation. Comprehensive experimental results on the indoor simulation platform AI2-THOR have shown the effectiveness and robustness of our method.

Low Dimensional State Representation Learning with Reward-Shaped Priors

Nicolò Botteghi, Ruben Obbink, Daan Geijs, Mannes Poel, Beril Sirmacek, Christoph Brune, Abeje Mersha, Stefano Stramigioli

Responsive image

Auto-TLDR; Unsupervised Learning for Unsupervised Reinforcement Learning in Robotics

Poster Similar

Reinforcement Learning has been able to solve many complicated robotics tasks without any need of feature engineering in an end-to-end fashion. However, learning the optimal policy directly from the sensory inputs, i.e the observations, often requires processing and storage of huge amount of data. In the context of robotics, the cost of data from real robotics hardware is usually very high, thus solutions that achieves high sample-efficiency are needed. We propose a method that aims at learning a mapping from the observations into a lower dimensional state space. This mapping is learned with unsupervised learning using loss functions shaped to incorporate prior knowledge of the environment and the task. Using the samples from the state space, the optimal policy is quickly and efficiently learned. We test the method on several mobile robot navigation tasks in simulation environment and also on a real robot.

A Bayesian Approach to Reinforcement Learning of Vision-Based Vehicular Control

Zahra Gharaee, Karl Holmquist, Linbo He, Michael Felsberg

Responsive image

Auto-TLDR; Bayesian Reinforcement Learning for Autonomous Driving

Slides Poster Similar

In this paper, we present a state-of-the-art reinforcement learning method for autonomous driving. Our approach employs temporal difference learning in a Bayesian framework to learn vehicle control signals from sensor data. The agent has access to images from a forward facing camera, which are pre-processed to generate semantic segmentation maps. We trained our system using both ground truth and estimated semantic segmentation input. Based on our observations from a large set of experiments, we conclude that training the system on ground truth input data leads to better performance than training the system on estimated input even if estimated input is used for evaluation. The system is trained and evaluated in a realistic simulated urban environment using the CARLA simulator. The simulator also contains a benchmark that allows for comparing to other systems and methods. The required training time of the system is shown to be lower and the performance on the benchmark superior to competing approaches.

Can Reinforcement Learning Lead to Healthy Life?: Simulation Study Based on User Activity Logs

Masami Takahashi, Masahiro Kohjima, Takeshi Kurashima, Hiroyuki Toda

Responsive image

Auto-TLDR; Reinforcement Learning for Healthy Daily Life

Slides Poster Similar

The importance of developing an application based on intervention technology that leads to a healthier life is widely recognized. A challenging part of realizing the application is the need for planning, i.e., considering a user's health goal (e.g., sleep at 10:00 p.m. to get enough sleep), providing intervention at the appropriate timing to help the user achieve the goal. The reinforcement learning (RL) approach is well suited to this type of problem since it is a methodology for planning; RL finds the optimal strategy as that which maximizes future expected profit. The purpose of this study is to clarify the effects of intervention based on RL to support healthy daily life. Therefore, we (i) collect real daily activity data from participants, (ii) generate a user model that imitates the user's response to system interventions, (iii) examine valuable goals and design them as rewards in RL and (iv) obtain optimal intervention strategies by RL via simulations given a user model and goals. We evaluate a generated user model and verify by simulations whether our method could successfully achieve the goal. In addition, we analyze the cases that demonstrated higher probability of achieving the goal and report the features.

Deep Reinforcement Learning for Autonomous Driving by Transferring Visual Features

Hongli Zhou, Guanwen Zhang, Wei Zhou

Responsive image

Auto-TLDR; Deep Reinforcement Learning for Autonomous Driving by Transferring Visual Features

Slides Poster Similar

Deep reinforcement learning (DRL) has achieved great success in processing vision-based driving tasks. However, the end-to-end training manner makes DRL agents suffer from overfitting training scenes. The agents easily fail to generalize to unseen environments. In this paper, we propose a deep reinforcement learning for autonomous driving by transferring visual features. We formulate the DRL training as a perception and control module and introduce adversarial training mechanism for autonomous driving. The perception module is able to extract invariant features between different domains through adversarial training. While the DRL agent can then be trained on the basis of low dimensional states. In this manner, the proposed approach enables trained agents to adapt to unseen environments by learning robust features invariant across various scenes. We evaluate the proposed approach by transferring visual features between different simulators. The experimental results demonstrate the driving policy trained in the source domain can be directly applied in the target domain, and achieve great efficient and effective performance for autonomous driving.

Meta Learning Via Learned Loss

Sarah Bechtle, Artem Molchanov, Yevgen Chebotar, Edward Thomas Grefenstette, Ludovic Righetti, Gaurav Sukhatme, Franziska Meier

Responsive image

Auto-TLDR; meta-learning for learning parametric loss functions that generalize across different tasks and model architectures

Slides Similar

Typically, loss functions, regularization mechanisms and other important aspects of training parametric models are chosen heuristically from a limited set of options. In this paper, we take the first step towards automating this process, with the view of producing models which train faster and more robustly. Concretely, we present a meta-learning method for learning parametric loss functions that can generalize across different tasks and model architectures. We develop a pipeline for “meta-training” such loss functions, targeted at maximizing the performance of the model trained under them. The loss landscape produced by our learned losses significantly improves upon the original task-specific losses in both supervised and reinforcement learning tasks. Furthermore, we show that our meta-learning framework is flexible enough to incorporate additional information at meta-train time. This information shapes the learned loss function such that the environment does not need to provide this information during meta-test time.

Trajectory Representation Learning for Multi-Task NMRDP Planning

Firas Jarboui, Vianney Perchet

Responsive image

Auto-TLDR; Exploring Non Markovian Reward Decision Processes for Reinforcement Learning

Slides Poster Similar

Expanding Non Markovian Reward Decision Processes (NMRDP) into Markov Decision Processes (MDP) enables the use of state of the art Reinforcement Learning (RL) techniques to identify optimal policies. In this paper an approach to exploring NMRDPs and expanding them into MDPs, without the prior knowledge of the reward structure, is proposed. The non Markovianity of the reward function is disentangled under the assumption that sets of similar and dissimilar trajectory batches can be sampled. More precisely, within the same batch, measuring the similarity between any couple of trajectories is permitted, although comparing trajectories from different batches is not possible. A modified version of the triplet loss is optimised to construct a representation of the trajectories under which rewards become Markovian.

Vacant Parking Space Detection Based on Task Consistency and Reinforcement Learning

Manh Hung Nguyen, Tzu-Yin Chao, Ching-Chun Huang

Responsive image

Auto-TLDR; Vacant Space Detection via Semantic Consistency Learning

Slides Poster Similar

In this paper, we proposed a novel task-consistency learning method that allows training a vacant space detection network (target task) based on the logistic consistency with the semantic outcomes from a naive flow-based motion behavior classifier (source task) in a parking lot. By well designing the reward mechanism upon semantic consistency, we show the possibility to train the target network in a reinforcement learning setting. Compared with conventional supervised detection methods, the major contribution of this work is to learn a vacant space detector via semantic consistency rather than supervised labels. The dynamic learning property may make the proposed detector been deployed in different lots easily without heavy training loads. The experiments show that based on the task consistency rewards from the motion behavior classifier, the vacant space detector can be trained successfully.

Explore and Explain: Self-Supervised Navigation and Recounting

Roberto Bigazzi, Federico Landi, Marcella Cornia, Silvia Cascianelli, Lorenzo Baraldi, Rita Cucchiara

Responsive image

Auto-TLDR; Exploring a Photorealistic Environment for Explanation and Navigation

Slides Similar

Embodied AI has been recently gaining attention as it aims to foster the development of autonomous and intelligent agents. In this paper, we devise a novel embodied setting in which an agent needs to explore a previously unknown environment while recounting what it sees during the path. In this context, the agent needs to navigate the environment driven by an exploration goal, select proper moments for description, and output natural language descriptions of relevant objects and scenes. Our model integrates a novel self-supervised exploration module with penalty, and a fully-attentive captioning model for explanation. Also, we investigate different policies for selecting proper moments for explanation, driven by information coming from both the environment and the navigation. Experiments are conducted on photorealistic environments from the Matterport3D dataset and investigate the navigation and explanation capabilities of the agent as well as the role of their interactions.

A Multilinear Sampling Algorithm to Estimate Shapley Values

Ramin Okhrati, Aldo Lipani

Responsive image

Auto-TLDR; A sampling method for Shapley values for multilayer Perceptrons

Slides Poster Similar

Shapley values are great analytical tools in game theory to measure the importance of a player in a game. Due to their axiomatic and desirable properties such as efficiency, they have become popular for feature importance analysis in data science and machine learning. However, the time complexity to compute Shapley values based on the original formula is exponential, and as the number of features increases, this becomes infeasible. Castro et al. [1] developed a sampling algorithm, to estimate Shapley values. In this work, we propose a new sampling method based on a multilinear extension technique as applied in game theory. The aim is to provide a more efficient (sampling) method for estimating Shapley values. Our method is applicable to any machine learning model, in particular for either multiclass classifications or regression problems. We apply the method to estimate Shapley values for multilayer Perceptrons (MLPs) and through experimentation on two datasets, we demonstrate that our method provides more accurate estimations of the Shapley values by reducing the variance of the sampling statistics

A Novel Actor Dual-Critic Model for Remote Sensing Image Captioning

Ruchika Chavhan, Biplab Banerjee, Xiao Xiang Zhu, Subhasis Chaudhuri

Responsive image

Auto-TLDR; Actor Dual-Critic Training for Remote Sensing Image Captioning Using Deep Reinforcement Learning

Slides Poster Similar

We deal with the problem of generating textual captions from optical remote sensing (RS) images using the notion of deep reinforcement learning. Due to the high inter-class similarity in reference sentences describing remote sensing data, jointly encoding the sentences and images encourages prediction of captions that are semantically more precise than the ground truth in many cases. To this end, we introduce an Actor Dual-Critic training strategy where a second critic model is deployed in the form of an encoder-decoder RNN to encode the latent information corresponding to the original and generated captions. While all actor-critic methods use an actor to predict sentences for an image and a critic to provide rewards, our proposed encoder-decoder RNN guarantees high-level comprehension of images by sentence-to-image translation. We observe that the proposed model generates sentences on the test data highly similar to the ground truth and is successful in generating even better captions in many critical cases. Extensive experiments on the benchmark Remote Sensing Image Captioning Dataset (RSICD) and the UCM-captions dataset confirm the superiority of the proposed approach in comparison to the previous state-of-the-art where we obtain a gain of sharp increments in both the ROUGE-L and CIDEr measures.

Adaptive Remote Sensing Image Attribute Learning for Active Object Detection

Nuo Xu, Chunlei Huo, Chunhong Pan

Responsive image

Auto-TLDR; Adaptive Image Attribute Learning for Active Object Detection

Slides Similar

In recent years, deep learning methods bring incredible progress to the field of object detection. However, in the field of remote sensing image processing, existing methods neglect the relationship between imaging configuration and detection performance, and do not take into account the importance of detection performance feedback for improving image quality. Therefore, detection performance is limited by the passive nature of the conventional object detection framework. In order to solve the above limitations, this paper takes adaptive brightness adjustment and scale adjustment as examples, and proposes an active object detection method based on deep reinforcement learning. The goal of adaptive image attribute learning is to maximize the detection performance. With the help of active object detection and image attribute adjustment strategies, low-quality images can be converted into high-quality images, and the overall performance is improved without retraining the detector.

ActionSpotter: Deep Reinforcement Learning Framework for Temporal Action Spotting in Videos

Guillaume Vaudaux-Ruth, Adrien Chan-Hon-Tong, Catherine Achard

Responsive image

Auto-TLDR; ActionSpotter: A Reinforcement Learning Algorithm for Action Spotting in Video

Slides Poster Similar

Action spotting has recently been proposed as an alternative to action detection and key frame extraction. However, the current state-of-the-art method of action spotting requires an expensive ground truth composed of the search sequences employed by human annotators spotting actions - a critical limitation. In this article, we propose to use a reinforcement learning algorithm to perform efficient action spotting using only the temporal segments from the action detection annotations, thus opening an interesting solution for video understanding. Experiments performed on THUMOS14 and ActivityNet datasets show that the proposed method, named ActionSpotter, leads to good results and outperforms state-of-the-art detection outputs redrawn for this application. In particular, the spotting mean Average Precision on THUMOS14 is significantly improved from 59.7% to 65.6% while skipping 23% of video.

AOAM: Automatic Optimization of Adjacency Matrix for Graph Convolutional Network

Yuhang Zhang, Hongshuai Ren, Jiexia Ye, Xitong Gao, Yang Wang, Kejiang Ye, Cheng-Zhong Xu

Responsive image

Auto-TLDR; Adjacency Matrix for Graph Convolutional Network in Non-Euclidean Space

Slides Poster Similar

Graph Convolutional Network (GCN) is adopted to tackle the problem of the convolution operation in non-Euclidean space. Although previous works on GCN have made some progress, one of their limitations is that their input Adjacency Matrix (AM) is designed manually and requires domain knowledge, which is cumbersome, tedious and error-prone. In addition, entries of this fixed Adjacency Matrix are generally designed as binary values (i.e., ones and zeros) which can not reflect more complex relationship between nodes. However, many applications require a weighted and dynamic Adjacency Matrix instead of an unweighted and fixed Adjacency Matrix. To this end, there are few works focusing on designing a more flexible Adjacency Matrix. In this paper, we propose an end-to-end algorithm to improve the GCN performance by focusing on the Adjacency Matrix. We first provide a calculation method that called node information entropy to update the matrix. Then, we analyze the search strategy in a continuous space and introduce the Deep Deterministic Policy Gradient (DDPG) method to overcome the demerit of the discrete space search. Finally, we integrate the GCN and reinforcement learning into an end-to-end framework. Our method can automatically define the adjacency matrix without artificial knowledge. At the same time, the proposed approach can deal with any size of the matrix and provide a better value for the network. Four popular datasets are selected to evaluate the capability of our algorithm. The method in this paper achieves the state-of-the-art performance on Cora and Pubmed datasets, respectively, with the accuracy of 84.6% and 81.6%.

On Embodied Visual Navigation in Real Environments through Habitat

Marco Rosano, Antonino Furnari, Luigi Gulino, Giovanni Maria Farinella

Responsive image

Auto-TLDR; Learning Navigation Policies on Real World Observations using Real World Images and Sensor and Actuation Noise

Slides Poster Similar

Visual navigation models based on deep learning can learn effective policies when trained on large amounts of visual observations through reinforcement learning. Unfortunately, collecting the required experience deploying a robotic platform in the real world is expensive and time-consuming. To deal with this limitation, several simulation platforms have been proposed in order to train visual navigation policies on virtual environments efficiently. Despite the advantages they offer, simulators present a limited realism in terms of appearance and physical dynamics, leading to navigation policies that do not generalize in the real world. In this paper, we propose a tool based on the Habitat simulator which exploits real world images of the environment, together with sensor and actuator noise models, to produce more realistic navigation episodes. We perform a range of experiments using virtual, real and images transformed with a simple domain adaptation approach. We also assess the impact of sensor and actuation noise on the navigation performance and investigate whether they allow to learn more robust navigation policies. We show that our tool can effectively help to train and evaluate navigation policies on real world observations without running navigation episodes in the real world.

RLST: A Reinforcement Learning Approach to Scene Text Detection Refinement

Xuan Peng, Zheng Huang, Kai Chen, Jie Guo, Weidong Qiu

Responsive image

Auto-TLDR; Saccadic Eye Movements and Peripheral Vision for Scene Text Detection using Reinforcement Learning

Slides Poster Similar

Within the research of scene text detection, some previous work has already achieved significant accuracy and efficiency. However, most of the work was generally done without considering about the implicit relationship between detection and eye movements. In this paper, we propose a new method for scene text detection especially for its refinement based on reinforcement learning. The idea of this method is inspired by Saccadic Eye Movements and Peripheral Vision. A saccade makes it possible for humans to orient the gaze to the location where a visual object has appeared. Peripheral vision gathers visual information of surroundings which provides supplement to foveal vision during gazing. We propose a simple pipeline, imitating the way human eyes do a saccade and collect peripheral information, to locate scene text roughly and to refine multi-scale vision field iteratively using reinforcement learning. For both training and evaluation, we use ICDAR2015 Challenge 4 dataset as a base and design several criteria to measure the feasibility of our work.

DAG-Net: Double Attentive Graph Neural Network for Trajectory Forecasting

Alessio Monti, Alessia Bertugli, Simone Calderara, Rita Cucchiara

Responsive image

Auto-TLDR; Recurrent Generative Model for Multi-modal Human Motion Behaviour in Urban Environments

Slides Poster Similar

Understanding human motion behaviour is a critical task for several possible applications like self-driving cars or social robots, and in general for all those settings where an autonomous agent has to navigate inside a human-centric environment. This is non-trivial because human motion is inherently multi-modal: given a history of human motion paths, there are many plausible ways by which people could move in the future. Additionally, people activities are often driven by goals, e.g. reaching particular locations or interacting with the environment. We address both the aforementioned aspects by proposing a new recurrent generative model that considers both single agents’ future goals and interactions between different agents. The model exploits a double attention-based graph neural network to collect information about the mutual influences among different agents and integrates it with data about agents’ possible future objectives. Our proposal is general enough to be applied in different scenarios: the model achieves state-of-the-art results in both urban environments and also in sports applications.

Visual Object Tracking in Drone Images with Deep Reinforcement Learning

Derya Gözen, Sedat Ozer

Responsive image

Auto-TLDR; A Deep Reinforcement Learning based Single Object Tracker for Drone Applications

Slides Poster Similar

There is an increasing demand on utilizing camera equipped drones and their applications in many domains varying from agriculture to entertainment and from sports events to surveillance. In such drone applications, an essential and a common task is tracking an object of interest visually. Drone (or UAV) images have different properties when compared to the ground taken (natural) images and those differences introduce additional complexities to the existing object trackers to be directly applied on drone applications. Some important differences among those complexities include (i) smaller object sizes to be tracked and (ii) different orientations and viewing angles yielding different texture and features to be observed. Therefore, new algorithms trained on drone images are needed for the drone-based applications. In this paper, we introduce a deep reinforcement learning (RL) based single object tracker that tracks an object of interest in drone images by estimating a series of actions to find the location of the object in the next frame. This is the first work introducing a single object tracker using a deep RL-based technique for drone images. Our proposed solution introduces a novel reward function that aims to reduce the total number of actions taken to estimate the object's location in the next frame and also introduces a different backbone network to be used on low resolution images. Additionally, we introduce a set of new actions into the action library to better deal with the above-mentioned complexities. We compare our proposed solutions to a state of the art tracking algorithm from the recent literature and demonstrate up to 3.87\% improvement in precision and 3.6\% improvement in IoU values on the VisDrone2019 dataset. We also provide additional results on OTB-100 dataset and show up to 3.15\% improvement in precision on the OTB-100 dataset when compared to the same previous state of the art algorithm. Lastly, we analyze the ability to handle some of the challenges faced during tracking, including but not limited to occlusion, deformation, and scale variation for our proposed solutions.

An Intransitivity Model for Matchup and Pairwise Comparison

Yan Gu, Jiuding Duan, Hisashi Kashima

Responsive image

Auto-TLDR; Blade-Chest: A Low-Rank Matrix Approach for Probabilistic Ranking of Players

Slides Poster Similar

Ranking is a ubiquitous problem appearing in many real-world applications. The superior players or objects are oftentimes determined by a matchup or pairwise comparison. Various models have been developed to integrate the matchup results into a single ranking list of players and to further predict the results of future matchups. Amongst them, the Bradley-Terry model is a mainstream model that achieves the goals by constructing explicit probabilistic interpretation. However, the model suffers from its strong assumption of transitive relationships and becomes vulnerable in practices where intransitive relationships exist. Blade-Chest model is an alternative solution to this intransitivity challenge by allowing the multi-dimensional representation of players. In this paper, we propose a low-rank matrix approach to characterize all players and generalize the related works by introducing a unified framework. Our experimental results on synthetic datasets and real-world datasets show that the proposed model is stably competitive with the standard models in terms of the consistency of probabilistic model interpretation and the predictive performance in out-of-sample tests.

ILS-SUMM: Iterated Local Search for Unsupervised Video Summarization

Yair Shemer, Daniel Rotman, Nahum Shimkin

Responsive image

Auto-TLDR; ILS-SUMM: Iterated Local Search for Video Summarization

Slides Similar

In recent years, there has been an increasing interest in building video summarization tools, where the goal is to automatically create a short summary of an input video that properly represents the original content. We consider shot-based video summarization where the summary consists of a subset of the video shots which can be of various lengths. A straightforward approach to maximize the representativeness of a subset of shots is by minimizing the total distance between shots and their nearest selected shots. We formulate the task of video summarization as an optimization problem with a knapsack-like constraint on the total summary duration. Previous studies have proposed greedy algorithms to solve this problem approximately, but no experiments were presented to measure the ability of these methods to obtain solutions with low total distance. Indeed, our experiments on video summarization datasets show that the success of current methods in obtaining results with low total distance still has much room for improvement. In this paper, we develop ILS-SUMM, a novel video summarization algorithm to solve the subset selection problem under the knapsack constraint. Our algorithm is based on the well-known metaheuristic optimization framework -- Iterated Local Search (ILS), known for its ability to avoid weak local minima and obtain a good near-global minimum. Extensive experiments show that our method finds solutions with significantly better total distance than previous methods. Moreover, to indicate the high scalability of ILS-SUMM, we introduce a new dataset consisting of videos of various lengths.

Creating Classifier Ensembles through Meta-Heuristic Algorithms for Aerial Scene Classification

Álvaro Roberto Ferreira Jr., Gustavo Gustavo Henrique De Rosa, Joao Paulo Papa, Gustavo Carneiro, Fabio Augusto Faria

Responsive image

Auto-TLDR; Univariate Marginal Distribution Algorithm for Aerial Scene Classification Using Meta-Heuristic Optimization

Slides Poster Similar

Aerial scene classification is a challenging task to be solved in the remote sensing area, whereas deep learning approaches, such as Convolutional Neural Networks (CNN), are being widely employed to overcome such a problem. Nevertheless, it is not straightforward to find single CNN models that can solve all aerial scene classification tasks, allowing the nurturing of a better alternative, which is to fuse CNN-based classifiers into an ensemble. However, an appropriate choice of the classifiers that will belong to the ensemble is a critical factor, as it is unfeasible to employ all the possible classifiers in the literature. Therefore, this work proposes a novel framework based on meta-heuristic optimization for creating optimized-ensembles in the context of aerial scene classification. The experimental results were performed across nine meta-heuristic algorithms and three aerial scene literature datasets, being compared in terms of effectiveness (accuracy), efficiency (execution time), and behavioral performance in different scenarios. Finally, one can observe that the Univariate Marginal Distribution Algorithm (UMDA) overcame popular literature meta-heuristic algorithms, such as Genetic Programming and Particle Swarm Optimization considering the adopted criteria in the performed experiments.

RNN Training along Locally Optimal Trajectories via Frank-Wolfe Algorithm

Yun Yue, Ming Li, Venkatesh Saligrama, Ziming Zhang

Responsive image

Auto-TLDR; Frank-Wolfe Algorithm for Efficient Training of RNNs

Slides Poster Similar

We propose a novel and efficient training method for RNNs by iteratively seeking a local minima on the loss surface within a small region, and leverage this directional vector for the update, in an outer-loop. We propose to utilize the Frank-Wolfe (FW) algorithm in this context. Although, FW implicitly involves normalized gradients, which can lead to a slow convergence rate, we develop a novel RNN training method that, surprisingly, even with the additional cost, the overall training cost is empirically observed to be lower than back-propagation. Our method leads to a new Frank-Wolfe method, that is in essence an SGD algorithm with a restart scheme. We prove that under certain conditions our algorithm has a sublinear convergence rate of $O(1/\epsilon)$ for $\epsilon$ error. We then conduct empirical experiments on several benchmark datasets including those that exhibit long-term dependencies, and show significant performance improvement. We also experiment with deep RNN architectures and show efficient training performance. Finally, we demonstrate that our training method is robust to noisy data.

Low-Cost Lipschitz-Independent Adaptive Importance Sampling of Stochastic Gradients

Huikang Liu, Xiaolu Wang, Jiajin Li, Man-Cho Anthony So

Responsive image

Auto-TLDR; Adaptive Importance Sampling for Stochastic Gradient Descent

Slides Similar

Stochastic gradient descent (SGD) usually samples training data based on the uniform distribution, which may not be a good choice because of the high variance of its stochastic gradient. Thus, importance sampling methods are considered in the literature to improve the performance. Most previous work on SGD-based methods with importance sampling requires the knowledge of Lipschitz constants of all component gradients, which are in general difficult to estimate. In this paper, we study an adaptive importance sampling method for common SGD-based methods by exploiting the local first-order information without knowing any Lipschitz constants. In particular, we periodically changes the sampling distribution by only utilizing the gradient norms in the past few iterations. We prove that our adaptive importance sampling non-asymptotically reduces the variance of the stochastic gradients in SGD, and thus better convergence bounds than that for vanilla SGD can be obtained. We extend this sampling method to several other widely used stochastic gradient algorithms including SGD with momentum and ADAM. Experiments on common convex learning problems and deep neural networks illustrate notably enhanced performance using the adaptive sampling strategy.

Improving Visual Question Answering Using Active Perception on Static Images

Theodoros Bozinis, Nikolaos Passalis, Anastasios Tefas

Responsive image

Auto-TLDR; Fine-Grained Visual Question Answering with Reinforcement Learning-based Active Perception

Slides Poster Similar

Visual Question Answering (VQA) is one of the most challenging emerging applications of deep learning. Providing powerful attention mechanisms is crucial for VQA, since the model must correctly identify the region of an image that is relevant to the question at hand. However, existing models analyze the input images at a fixed and typically small resolution, often leading to discarding valuable fine-grained details. To overcome this limitation, in this work we propose a reinforcement learning-based active perception approach that works by applying a series of transformation operations on the images (translation, zoom) in order to facilitate answering the question at hand. This allows for performing fine-grained analysis, effectively increasing the resolution at which the models process information. The proposed method is orthogonal to existing attention mechanisms and it can be combined with most existing VQA methods. The effectiveness of the proposed method is experimentally demonstrated on a challenging VQA dataset.

Unveiling Groups of Related Tasks in Multi-Task Learning

Jordan Frecon, Saverio Salzo, Massimiliano Pontil

Responsive image

Auto-TLDR; Continuous Bilevel Optimization for Multi-Task Learning

Slides Poster Similar

A common approach in multi-task learning is to encourage the tasks to share a low dimensional representation. This has led to the popular method of trace norm regularization, which has proved effective in many applications. In this paper, we extend this approach by allowing the tasks to partition into different groups, within which trace norm regularization is separately applied. We propose a continuous bilevel optimization framework to simultaneously identify groups of related tasks and learn a low dimensional representation within each group. Hinging on recent results on the derivative of generalized matrix functions, we devise a smooth approximation of the upper-level objective via a dual forward-backward algorithm with Bregman distances. This allows us to solve the bilevel problem by a gradient-based scheme. Numerical experiments on synthetic and benchmark datasets support the effectiveness of the proposed method.

Switching Dynamical Systems with Deep Neural Networks

Cesar Ali Ojeda Marin, Kostadin Cvejoski, Bogdan Georgiev, Ramses J. Sanchez

Responsive image

Auto-TLDR; Variational RNN for Switching Dynamics

Slides Poster Similar

The problem of uncovering different dynamicalregimes is of pivotal importance in time series analysis. Switchingdynamical systems provide a solution for modeling physical phe-nomena whose time series data exhibit different dynamical modes.In this work we propose a novel variational RNN model forswitching dynamics allowing for both non-Markovian and non-linear dynamical behavior between and within dynamic modes.Attention mechanisms are provided to inform the switchingdistribution. We evaluate our model on synthetic and empiricaldatasets of diverse nature and successfully uncover differentdynamical regimes and predict the switching dynamics.

Leveraging Sequential Pattern Information for Active Learning from Sequential Data

Raul Fidalgo-Merino, Lorenzo Gabrielli, Enrico Checchi

Responsive image

Auto-TLDR; Sequential Pattern Information for Active Learning

Slides Poster Similar

This paper presents a novel active learning technique aimed at the selection of sequences for manual annotation from a database of unlabelled sequences. Supervised machine learning algorithms can employ these sequences to build better models than those based on using random sequences for training. The main contribution of the proposed method is the use of sequential pattern information contained in the database to select representative and diverse sequences for annotation. These two characteristics ensure the proper coverage of the instance space of sequences and, at the same time, avoids over-fitting the trained model. The approach, called SPIAL (Sequential Pattern Information for Active Learning), uses sequential pattern mining algorithms to extract frequently occurring sub-sequences from the database and evaluates how representative and diverse each sequence is, based on this information. The output is a list of sequences for annotation sorted by representativeness and diversity. The algorithm is modular and, unlike current techniques, independent of the features taken into account by the machine learning algorithm that trains the model. Experiments done on well-known benchmarks involving sequential data show that the models trained using SPIAL increase their convergence speed while reducing manual effort by selecting small sets of very informative sequences for annotation. In addition, the computation cost using SPIAL is much lower than for the state-of-the-art algorithms evaluated.

Deep Next-Best-View Planner for Cross-Season Visual Route Classification

Kurauchi Kanya, Kanji Tanaka

Responsive image

Auto-TLDR; Active Visual Place Recognition using Deep Convolutional Neural Network

Slides Poster Similar

This paper addresses the problem of active visual place recognition (VPR) from a novel perspective of long-term autonomy. In our approach, a next-best-view (NBV) planner plans an optimal action-observation-sequence to maximize the expected cost-performance for a visual route classification task. A difficulty arises from the fact that the NBV planner is trained and tested in different domains (times of day, weather conditions, and seasons). Existing NBV methods may be confused and deteriorated by the domain-shifts, and require significant efforts for adapting them to a new domain. We address this issue by a novel deep convolutional neural network (DNN) -based NBV planner that does not require the adaptation. Our main contributions in this paper are summarized as follows: (1) We present a novel domain-invariant NBV planner that is specifically tailored for DNN-based VPR. (2) We formulate the active VPR as a POMDP problem and present a feasible solution to address the inherent intractability. Specifically, the probability distribution vector (PDV) output by the available DNN is used as a domain-invariant observation model without the need to retrain it. (3) We verify efficacy of the proposed approach through challenging cross-season VPR experiments, where it is confirmed that the proposed approach clearly outperforms the previous single-view-based or multi-view-based VPR in terms of VPR accuracy and/or action-observation-cost.

Recurrent Deep Attention Network for Person Re-Identification

Changhao Wang, Jun Zhou, Xianfei Duan, Guanwen Zhang, Wei Zhou

Responsive image

Auto-TLDR; Recurrent Deep Attention Network for Person Re-identification

Slides Poster Similar

Person re-identification (re-id) is an important task in video surveillance. It is challenging due to the appearance of person varying a wide range acrossnon-overlapping camera views. Recent years, attention-based models are introduced to learn discriminative representation. In this paper, we consider the attention selection in a natural way as like human moving attention on different parts of the visual field for person re-id. In concrete, we propose a Recurrent Deep Attention Network (RDAN) with an attention selection mechanism based on reinforcement learning. The RDAN aims to adaptively observe the identity-sensitive regions to build up the representation of individuals step by step. Extensive experiments on three person re-id benchmarks Market-1501, DukeMTMC-reID and CUHK03-NP demonstrate the proposed method can achieve competitive performance.

Hierarchical Multimodal Attention for Deep Video Summarization

Melissa Sanabria, Frederic Precioso, Thomas Menguy

Responsive image

Auto-TLDR; Automatic Summarization of Professional Soccer Matches Using Event-Stream Data and Multi- Instance Learning

Slides Poster Similar

The way people consume sports on TV has drastically evolved in the last years, particularly under the combined effects of the legalization of sport betting and the huge increase of sport analytics. Several companies are nowadays sending observers in the stadiums to collect live data of all the events happening on the field during the match. Those data contain meaningful information providing a very detailed description of all the actions occurring during the match to feed the coaches and staff, the fans, the viewers, and the gamblers. Exploiting all these data, sport broadcasters want to generate extra content such as match highlights, match summaries, players and teams analytics, etc., to appeal subscribers. This paper explores the problem of summarizing professional soccer matches as automatically as possible using both the aforementioned event-stream data collected from the field and the content broadcasted on TV. We have designed an architecture, introducing first (1) a Multiple Instance Learning method that takes into account the sequential dependency among events and then (2) a hierarchical multimodal attention layer that grasps the importance of each event in an action. We evaluate our approach on matches from two professional European soccer leagues, showing its capability to identify the best actions for automatic summarization by comparing with real summaries made by human operators.

Learning Stable Deep Predictive Coding Networks with Weight Norm Supervision

Guo Ruohao

Responsive image

Auto-TLDR; Stability of Predictive Coding Network with Weight Norm Supervision

Slides Poster Similar

Predictive Coding Network (PCN) is an important neural network inspired by visual processing models in neuroscience. It combines the feedforward and feedback processing and has the architecture of recurrent neural networks (RNNs). This type of network is usually trained with backpropagation through time (BPTT). With infinite recurrent steps, PCN is a dynamic system. However, as one of the most important properties, stability is rarely studied in this type of network. Inspired by reservoir computing, we investigate the stability of hierarchical RNNs from the perspective of dynamic systems, and propose a sufficient condition for their echo state property (ESP). Our study shows the global stability is determined by stability of the local layers and the feedback between neighboring layers. Based on it, we further propose Weight Norm Supervision, a new algorithm that controls the stability of PCN dynamics by imposing different weight norm constraints on different parts of the network. We compare our approach with other training methods in terms of stability and prediction capability. The experiments show that our algorithm learns stable PCNs with a reliable prediction precision in the most effective and controllable way.

Uniform and Non-Uniform Sampling Methods for Sub-Linear Time K-Means Clustering

Yuanhang Ren, Ye Du

Responsive image

Auto-TLDR; Sub-linear Time Clustering with Constant Approximation Ratio for K-Means Problem

Slides Poster Similar

The $k$-means problem is arguably the most well-known clustering problem in machine learning, and lots of approximation algorithms have been proposed for it. However, many of these algorithms may become infeasible when data is huge. Sub-linear time algorithms with constant approximation ratios are desirable in this scenario. In this paper, we first improve the analysis of the algorithm proposed by \cite{Mohan:2017:BNA:3172077.3172235} by sharpening the approximation ratio from $4(\alpha+\beta)$ to $\alpha+\beta$. Moreover, on mild assumptions of the data, a constant approximation ratio can be achieved in poly-logarithmic time by the algorithm. Furthermore, we propose a novel sub-linear time clustering algorithm called {\it Double-K-M$\text{C}^2$ sampling} as well. Experiments on the data clustering task and the image segmentation task have validated the effectiveness of our algorithms.

AG-GAN: An Attentive Group-Aware GAN for Pedestrian Trajectory Prediction

Yue Song, Niccolò Bisagno, Syed Zohaib Hassan, Nicola Conci

Responsive image

Auto-TLDR; An attentive group-aware GAN for motion prediction in crowded scenarios

Slides Poster Similar

Understanding human behaviors in crowded scenarios requires analyzing not only the position of the subjects in space, but also the scene context. Existing approaches mostly rely on the motion history of each pedestrian and model the interactions among people by considering the entire surrounding neighborhood. In our approach, we address the problem of motion prediction by applying coherent group clustering and a global attention mechanism on the LSTM-based Generative Adversarial Networks (GANs). The proposed model consists of an attentive group-aware GAN that observes the agents' past motion and predicts future paths, using (i) a group pooling module to model neighborhood interaction, and (ii) an attention module to specifically focus on hidden states. The experimental results demonstrate that our proposal outperforms state-of-the-art models on common benchmark datasets, and is able to generate socially-acceptable trajectories.

A Heuristic-Based Decision Tree for Connected Components Labeling of 3D Volumes

Maximilian Söchting, Stefano Allegretti, Federico Bolelli, Costantino Grana

Responsive image

Auto-TLDR; Entropy Partitioning Decision Tree for Connected Components Labeling

Slides Poster Similar

Connected Components Labeling represents a fundamental step for many Computer Vision and Image Processing pipelines. Since the first appearance of the task in the sixties, many algorithmic solutions to optimize the computational load needed to label an image have been proposed. Among them, block-based scan approaches and decision trees revealed to be some of the most valuable strategies. However, due to the cost of the manual construction of optimal decision trees and the computational limitations of automatic strategies employed in the past, the application of blocks and decision trees has been restricted to small masks, and thus to 2D algorithms. With this paper we present a novel heuristic algorithm based on decision tree learning methodology, called Entropy Partitioning Decision Tree (EPDT). It allows to compute near-optimal decision trees for large scan masks. Experimental results demonstrate that algorithms based on the generated decision trees outperform state-of-the-art competitors.

Aggregating Dependent Gaussian Experts in Local Approximation

Hamed Jalali, Gjergji Kasneci

Responsive image

Auto-TLDR; A novel approach for aggregating the Gaussian experts by detecting strong violations of conditional independence

Slides Poster Similar

Distributed Gaussian processes (DGPs) are prominent local approximation methods to scale Gaussian processes (GPs) to large datasets. Instead of a global estimation, they train local experts by dividing the training set into subsets, thus reducing the time complexity. This strategy is based on the conditional independence assumption, which basically means that there is a perfect diversity between the local experts. In practice, however, this assumption is often violated, and the aggregation of experts leads to sub-optimal and inconsistent solutions. In this paper, we propose a novel approach for aggregating the Gaussian experts by detecting strong violations of conditional independence. The dependency between experts is determined by using a Gaussian graphical model, which yields the precision matrix. The precision matrix encodes conditional dependencies between experts and is used to detect strongly dependent experts and construct an improved aggregation. Using both synthetic and real datasets, our experimental evaluations illustrate that our new method outperforms other state-of-the-art (SOTA) DGP approaches while being substantially more time-efficient than SOTA approaches, which build on independent experts.

Augmented Bi-Path Network for Few-Shot Learning

Baoming Yan, Chen Zhou, Bo Zhao, Kan Guo, Yang Jiang, Xiaobo Li, Zhang Ming, Yizhou Wang

Responsive image

Auto-TLDR; Augmented Bi-path Network for Few-shot Learning

Slides Poster Similar

Few-shot Learning (FSL) which aims to learn from few labeled training data is becoming a popular research topic, due to the expensive labeling cost in many real-world applications. One kind of successful FSL method learns to compare the testing (query) image and training (support) image by simply concatenating the features of two images and feeding it into the neural network. However, with few labeled data in each class, the neural network has difficulty in learning or comparing the local features of two images. Such simple image-level comparison may cause serious mis-classification. To solve this problem, we propose Augmented Bi-path Network (ABNet) for learning to compare both global and local features on multi-scales. Specifically, the salient patches are extracted and embedded as the local features for every image. Then, the model learns to augment the features for better robustness. Finally, the model learns to compare global and local features separately, \emph{i.e.}, in two paths, before merging the similarities. Extensive experiments show that the proposed ABNet outperforms the state-of-the-art methods. Both quantitative and visual ablation studies are provided to verify that the proposed modules lead to more precise comparison results.

Multiple Future Prediction Leveraging Synthetic Trajectories

Lorenzo Berlincioni, Federico Becattini, Lorenzo Seidenari, Alberto Del Bimbo

Responsive image

Auto-TLDR; Synthetic Trajectory Prediction using Markov Chains

Slides Poster Similar

Trajectory prediction is an important task, especially in autonomous driving. The ability to forecast the position of other moving agents can yield to an effective planning, ensuring safety for the autonomous vehicle as well for the observed entities. In this work we propose a data driven approach based on Markov Chains to generate synthetic trajectories, which are useful for training a multiple future trajectory predictor. The advantages are twofold: on the one hand synthetic samples can be used to augment existing datasets and train more effective predictors; on the other hand, it allows to generate samples with multiple ground truths, corresponding to diverse equally likely outcomes of the observed trajectory. We define a trajectory prediction model and a loss that explicitly address the multimodality of the problem and we show that combining synthetic and real data leads to prediction improvements, obtaining state of the art results.

Automatically Mining Relevant Variable Interactions Via Sparse Bayesian Learning

Ryoichiro Yafune, Daisuke Sakuma, Yasuo Tabei, Noritaka Saito, Hiroto Saigo

Responsive image

Auto-TLDR; Sparse Bayes for Interpretable Non-linear Prediction

Slides Poster Similar

With the rapid increase in the availability of large amount of data, prediction is becoming increasingly popular, and has widespread through our daily life. However, powerful non- linear prediction methods such as deep learning and SVM suffer from interpretability problem, making it hard to use in domains where the reason for decision making is required. In this paper, we develop an interpretable non-linear model called itemset Sparse Bayes (iSB), which builds a Bayesian probabilistic model, while simultaneously considering variable interactions. In order to suppress the resulting large number of variables, sparsity is imposed on regression weights by a sparsity inducing prior. As a subroutine to search for variable interactions, itemset enumeration algorithm is employed with a novel bounding condition. In computational experiments using real-world dataset, the proposed method performed better than decision tree by 10% in terms of r-squared . We also demonstrated the advantage of our method in Bayesian optimization setting, in which the proposed approach could successfully find the maximum of an unknown function faster than Gaussian process. The interpretability of iSB is naturally inherited to Bayesian optimization, thereby gives us a clue to understand which variables interactions are important in optimizing an unknown function.

SAILenv: Learning in Virtual Visual Environments Made Simple

Enrico Meloni, Luca Pasqualini, Matteo Tiezzi, Marco Gori, Stefano Melacci

Responsive image

Auto-TLDR; SAILenv: A Simple and Customized Platform for Visual Recognition in Virtual 3D Environment

Slides Poster Similar

Recently, researchers in Machine Learning algorithms, Computer Vision scientists, engineers and others, showed a growing interest in 3D simulators as a mean to artificially create experimental settings that are very close to those in the real world. However, most of the existing platforms to interface algorithms with 3D environments are often designed to setup navigation-related experiments, to study physical interactions, or to handle ad-hoc cases that are not thought to be customized, sometimes lacking a strong photorealistic appearance and an easy-to-use software interface. In this paper, we present a novel platform, SAILenv, that is specifically designed to be simple and customizable, and that allows researchers to experiment visual recognition in virtual 3D scenes. A few lines of code are needed to interface every algorithm with the virtual world, and non-3D-graphics experts can easily customize the 3D environment itself, exploiting a collection of photorealistic objects. Our framework yields pixel-level semantic and instance labeling, depth, and, to the best of our knowledge, it is the only one that provides motion-related information directly inherited from the 3D engine. The client-server communication operates at a low level, avoiding the overhead of HTTP-based data exchanges. We perform experiments using a state-of-the-art object detector trained on real-world images, showing that it is able to recognize the photorealistic 3D objects of our environment. The computational burden of the optical flow compares favourably with the estimation performed using modern GPU-based convolutional networks or more classic implementations. We believe that the scientific community will benefit from the easiness and high-quality of our framework to evaluate newly proposed algorithms in their own customized realistic conditions.

Progressive Learning Algorithm for Efficient Person Re-Identification

Zhen Li, Hanyang Shao, Liang Niu, Nian Xue

Responsive image

Auto-TLDR; Progressive Learning Algorithm for Large-Scale Person Re-Identification

Slides Poster Similar

This paper studies the problem of Person Re-Identification (ReID) for large-scale applications. Recent research efforts have been devoted to building complicated part models, which introduce considerably high computational cost and memory consumption, inhibiting its practicability in large-scale applications. This paper aims to develop a novel learning strategy to find efficient feature embeddings while maintaining the balance of accuracy and model complexity. More specifically, we find by enhancing the classical triplet loss together with cross-entropy loss, our method can explore the hard examples and build a discriminant feature embedding yet compact enough for large-scale applications. Our method is carried out progressively using Bayesian optimization, and we call it the Progressive Learning Algorithm (PLA). Extensive experiments on three large-scale datasets show that our PLA is comparable or better than the state-of-the-arts. Especially, on the challenging Market-1501 dataset, we achieve Rank-1=94.7\%/mAP=89.4\% while saving at least 30\% parameters than strong part models.

Active Sampling for Pairwise Comparisons via Approximate Message Passing and Information Gain Maximization

Aliaksei Mikhailiuk, Clifford Wilmot, Maria Perez-Ortiz, Dingcheng Yue, Rafal Mantiuk

Responsive image

Auto-TLDR; ASAP: An Active Sampling Algorithm for Pairwise Comparison Data

Slides Similar

Pairwise comparison data arise in many domains with subjective assessment experiments, for example in image and video quality assessment. In these experiments observers are asked to express a preference between two conditions. However, many pairwise comparison protocols require a large number of comparisons to infer accurate scores, which may be unfeasible when each comparison is time-consuming (e.g. videos) or expensive (e.g. medical imaging). This motivates the use of an active sampling algorithm that chooses only the most informative pairs for comparison. In this paper we propose ASAP, an active sampling algorithm based on approximate message passing and expected information gain maximization. Unlike most existing methods, which rely on partial updates of the posterior distribution, we are able to perform full updates and therefore much improve the accuracy of the inferred scores. The algorithm relies on three techniques for reducing computational cost: inference based on approximate message passing, selective evaluations of the information gain, and selecting pairs in a batch that forms a minimum spanning tree of the inverse of information gain. We demonstrate, with real and synthetic data, that ASAP offers the highest accuracy of inferred scores compared to the existing methods. We also provide an open-source GPU implementation of ASAP for large-scale experiments.

Improving Robotic Grasping on Monocular Images Via Multi-Task Learning and Positional Loss

William Prew, Toby Breckon, Magnus Bordewich, Ulrik Beierholm

Responsive image

Auto-TLDR; Improving grasping performance from monocularcolour images in an end-to-end CNN architecture with multi-task learning

Slides Poster Similar

In this paper we introduce two methods of improv-ing real-time objecting grasping performance from monocularcolour images in an end-to-end CNN architecture. The first isthe addition of an auxiliary task during model training (multi-task learning). Our multi-task CNN model improves graspingperformance from a baseline average of 72.04% to 78.14% onthe large Jacquard grasping dataset when performing a supple-mentary depth reconstruction task. The second is introducinga positional loss function that emphasises loss per pixel forsecondary parameters (gripper angle and width) only on points ofan object where a successful grasp can take place. This increasesperformance from a baseline average of 72.04% to 78.92% aswell as reducing the number of training epochs required. Thesemethods can be also performed in tandem resulting in a furtherperformance increase to 79.12%, while maintaining sufficientinference speed to enable processing at 50FPS

Stochastic Runge-Kutta Methods and Adaptive SGD-G2 Stochastic Gradient Descent

Gabriel Turinici, Imen Ayadi

Responsive image

Auto-TLDR; Adaptive Stochastic Runge Kutta for the Minimization of the Loss Function

Slides Poster Similar

The minimization of the loss function is of paramount importance in deep neural networks. Many popular optimization algorithms have been shown to correspond to some evolution equation of gradient flow type. Inspired by the numerical schemes used for general evolution equations, we introduce a second-order stochastic Runge Kutta method and show that it yields a consistent procedure for the minimization of the loss function. In addition, it can be coupled, in an adaptive framework, with the Stochastic Gradient Descent (SGD) to adjust automatically the learning rate of the SGD The resulting adaptive SGD, called SGD-G2, shows good results in terms of convergence speed when tested on standard data-sets.