ICPR2020 Paper Browser

Prithwijit Guha

Paper download is intended for registered attendees only, and is subjected to the IEEE Copyright Policy. Any other use is strongly forbidden.

Papers from this author

Multi-Stage Attention Based Visual Question Answering

Aakansha Mishra, Ashish Anand, Prithwijit Guha

Auto-TLDR; Alternative Bi-directional Attention for Visual Question Answering

Abstract Poster Similar

Recent developments in the field of Visual Question Answering (VQA) have witnessed promising improvements in performance through contributions in attention based networks. Most such approaches have focused on unidirectional attention that leverage over attention from textual domain (question) on visual space. These approaches mostly focused on learning high-quality attention in the visual space. In contrast, this work proposes an alternating bi-directional attention framework. First, a question to image attention helps to learn the robust visual space embedding, and second, an image to question attention helps to improve the question embedding. This attention mechanism is realized in an alternating fashion i.e. question-to-image followed by image-to-question and is repeated for maximizing performance. We believe that this process of alternating attention generation helps both the modalities and leads to better representations for the VQA task. This proposal is benchmark on TDIUC dataset and against state-of-art approaches. Our ablation analysis shows that alternate attention is the key to achieve high performance in VQA.

Siamese Fully Convolutional Tracker with Motion Correction

Mathew Francis, Prithwijit Guha

Auto-TLDR; A Siamese Ensemble for Visual Tracking with Appearance and Motion Components

Abstract Slides Poster Similar

Visual tracking algorithms use cues like appearance, structure, motion etc. for locating an object in a video. We propose an ensemble tracker with appearance and motion components. A siamese tracker that learns object appearance from a static image and motion vectors computed between consecutive frames with a flow network forms the ensemble. Motion predicted object localization is used to correct the appearance component in the ensemble. Complementary nature of the components bring performance improvement as observed in experiments performed on VOT2018 and VOT2019 datasets.