Multi-Scale Relational Reasoning with Regional Attention for Visual Question Answering
Auto-TLDR; Question-Guided Relational Reasoning for Visual Question Answering
Similar papers
Multi-Stage Attention Based Visual Question Answering
Aakansha Mishra, Ashish Anand, Prithwijit Guha
Auto-TLDR; Alternative Bi-directional Attention for Visual Question Answering
Question-Agnostic Attention for Visual Question Answering
Moshiur R Farazi, Salman Hameed Khan, Nick Barnes
Auto-TLDR; Question-Agnostic Attention for Visual Question Answering
Dual Path Multi-Modal High-Order Features for Textual Content Based Visual Question Answering
Yanan Li, Yuetan Lin, Hongrui Zhao, Donghui Wang
Auto-TLDR; TextVQA: An End-to-End Visual Question Answering Model for Text-Based VQA
Multi-Modal Contextual Graph Neural Network for Text Visual Question Answering
Yaoyuan Liang, Xin Wang, Xuguang Duan, Wenwu Zhu
Auto-TLDR; Multi-modal Contextual Graph Neural Network for Text Visual Question Answering
Integrating Historical States and Co-Attention Mechanism for Visual Dialog
Tianling Jiang, Yi Ji, Chunping Liu
Auto-TLDR; Integrating Historical States and Co-attention for Visual Dialog
A Novel Attention-Based Aggregation Function to Combine Vision and Language
Matteo Stefanini, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara
Auto-TLDR; Fully-Attentive Reduction for Vision and Language
Answer-Checking in Context: A Multi-Modal Fully Attention Network for Visual Question Answering
Hantao Huang, Tao Han, Wei Han, Deep Yap, Cheng-Ming Chiang
Auto-TLDR; Fully Attention Based Visual Question Answering
Transformer Reasoning Network for Image-Text Matching and Retrieval
Nicola Messina, Fabrizio Falchi, Andrea Esuli, Giuseppe Amato
Auto-TLDR; A Transformer Encoder Reasoning Network for Image-Text Matching in Large-Scale Information Retrieval
P ≈ NP, at Least in Visual Question Answering
Shailza Jolly, Sebastian Palacio, Joachim Folz, Federico Raue, Jörn Hees, Andreas Dengel
Auto-TLDR; Polar vs Non-Polar VQA: A Cross-over Analysis of Feature Spaces for Joint Training
Visual Style Extraction from Chart Images for Chart Restyling
Danqing Huang, Jinpeng Wang, Guoxin Wang, Chin-Yew Lin
Auto-TLDR; Exploiting Visual Properties from Reference Chart Images for Chart Restyling
Improving Visual Question Answering Using Active Perception on Static Images
Theodoros Bozinis, Nikolaos Passalis, Anastasios Tefas
Auto-TLDR; Fine-Grained Visual Question Answering with Reinforcement Learning-based Active Perception
MAGNet: Multi-Region Attention-Assisted Grounding of Natural Language Queries at Phrase Level
Amar Shrestha, Krittaphat Pugdeethosapol, Haowen Fang, Qinru Qiu
Auto-TLDR; MAGNet: A Multi-Region Attention-Aware Grounding Network for Free-form Textual Queries
Graph Discovery for Visual Test Generation
Neil Hallonquist, Laurent Younes, Donald Geman
Auto-TLDR; Visual Question Answering over Graphs: A Probabilistic Framework for VQA
Reinforcement Learning with Dual Attention Guided Graph Convolution for Relation Extraction
Zhixin Li, Yaru Sun, Suqin Tang, Canlong Zhang, Huifang Ma
Auto-TLDR; Dual Attention Graph Convolutional Network for Relation Extraction
Visual Oriented Encoder: Integrating Multimodal and Multi-Scale Contexts for Video Captioning
Auto-TLDR; Visual Oriented Encoder for Video Captioning
Improving Visual Relation Detection Using Depth Maps
Sahand Sharifzadeh, Sina Moayed Baharlou, Max Berrendorf, Rajat Koner, Volker Tresp
Auto-TLDR; Exploiting Depth Maps for Visual Relation Detection
VSR++: Improving Visual Semantic Reasoning for Fine-Grained Image-Text Matching
Hui Yuan, Yan Huang, Dongbo Zhang, Zerui Chen, Wenlong Cheng, Liang Wang
Auto-TLDR; Improving Visual Semantic Reasoning for Fine-Grained Image-Text Matching
Context Visual Information-Based Deliberation Network for Video Captioning
Min Lu, Xueyong Li, Caihua Liu
Auto-TLDR; Context visual information-based deliberation network for video captioning
MANet: Multimodal Attention Network Based Point-View Fusion for 3D Shape Recognition
Yaxin Zhao, Jichao Jiao, Ning Li
Auto-TLDR; Fusion Network for 3D Shape Recognition based on Multimodal Attention Mechanism
Using Scene Graphs for Detecting Visual Relationships
Anurag Tripathi, Siddharth Srivastava, Brejesh Lall, Santanu Chaudhury
Auto-TLDR; Relationship Detection using Context Aligned Scene Graph Embeddings
Two-Level Attention-Based Fusion Learning for RGB-D Face Recognition
Hardik Uppal, Alireza Sepas-Moghaddam, Michael Greenspan, Ali Etemad
Auto-TLDR; Fused RGB-D Facial Recognition using Attention-Aware Feature Fusion
Attentive Visual Semantic Specialized Network for Video Captioning
Jesus Perez-Martin, Benjamin Bustos, Jorge Pérez
Auto-TLDR; Adaptive Visual Semantic Specialized Network for Video Captioning
PrivAttNet: Predicting Privacy Risks in Images Using Visual Attention
Chen Zhang, Thivya Kandappu, Vigneshwaran Subbaraju
Auto-TLDR; PrivAttNet: A Visual Attention Based Approach for Privacy Sensitivity in Images
More Correlations Better Performance: Fully Associative Networks for Multi-Label Image Classification
Auto-TLDR; Fully Associative Network for Fully Exploiting Correlation Information in Multi-Label Classification
Zero-Shot Text Classification with Semantically Extended Graph Convolutional Network
Tengfei Liu, Yongli Hu, Junbin Gao, Yanfeng Sun, Baocai Yin
Auto-TLDR; Semantically Extended Graph Convolutional Network for Zero-shot Text Classification
ConvMath: A Convolutional Sequence Network for Mathematical Expression Recognition
Zuoyu Yan, Xiaode Zhang, Liangcai Gao, Ke Yuan, Zhi Tang
Auto-TLDR; Convolutional Sequence Modeling for Mathematical Expression Recognition
Object Detection Using Dual Graph Network
Shengjia Chen, Zhixin Li, Feicheng Huang, Canlong Zhang, Huifang Ma
Auto-TLDR; A Graph Convolutional Network for Object Detection with Key Relation Information
From Early Biological Models to CNNs: Do They Look Where Humans Look?
Marinella Iole Cadoni, Andrea Lagorio, Enrico Grosso, Jia Huei Tan, Chee Seng Chan
Auto-TLDR; Comparing Neural Networks to Human Fixations for Semantic Learning
Efficient-Receptive Field Block with Group Spatial Attention Mechanism for Object Detection
Jiacheng Zhang, Zhicheng Zhao, Fei Su
Auto-TLDR; E-RFB: Efficient-Receptive Field Block for Deep Neural Network for Object Detection
Information Graphic Summarization Using a Collection of Multimodal Deep Neural Networks
Edward Kim, Connor Onweller, Kathleen F. Mccoy
Auto-TLDR; A multimodal deep learning framework that can generate summarization text supporting the main idea of an information graphic for presentation to blind or visually impaired
Attentive Hybrid Feature Based a Two-Step Fusion for Facial Expression Recognition
Jun Weng, Yang Yang, Zichang Tan, Zhen Lei
Auto-TLDR; Attentive Hybrid Architecture for Facial Expression Recognition
Adaptive Word Embedding Module for Semantic Reasoning in Large-Scale Detection
Yu Zhang, Xiaoyu Wu, Ruolin Zhu
Auto-TLDR; Adaptive Word Embedding Module for Object Detection
SIMCO: SIMilarity-Based Object COunting
Marco Godi, Christian Joppi, Andrea Giachetti, Marco Cristani
Auto-TLDR; SIMCO: An Unsupervised Multi-class Object Counting Approach on InShape
Multi-Scale Residual Pyramid Attention Network for Monocular Depth Estimation
Jing Liu, Xiaona Zhang, Zhaoxin Li, Tianlu Mao
Auto-TLDR; Multi-scale Residual Pyramid Attention Network for Monocular Depth Estimation
Enhanced User Interest and Expertise Modeling for Expert Recommendation
Tongze He, Caili Guo, Yunfei Chu
Auto-TLDR; A Unified Framework for Expert Recommendation in Community Question Answering
6D Pose Estimation with Correlation Fusion
Yi Cheng, Hongyuan Zhu, Ying Sun, Cihan Acar, Wei Jing, Yan Wu, Liyuan Li, Cheston Tan, Joo-Hwee Lim
Auto-TLDR; Intra- and Inter-modality Fusion for 6D Object Pose Estimation with Attention Mechanism
Context for Object Detection Via Lightweight Global and Mid-Level Representations
Mesut Erhan Unal, Adriana Kovashka
Auto-TLDR; Context-Based Object Detection with Semantic Similarity
Exploring and Exploiting the Hierarchical Structure of a Scene for Scene Graph Generation
Ikuto Kurosawa, Tetsunori Kobayashi, Yoshihiko Hayashi
Auto-TLDR; A Hierarchical Model for Scene Graph Generation
P2 Net: Augmented Parallel-Pyramid Net for Attention Guided Pose Estimation
Luanxuan Hou, Jie Cao, Yuan Zhao, Haifeng Shen, Jian Tang, Ran He
Auto-TLDR; Parallel-Pyramid Net with Partial Attention for Human Pose Estimation
Beyond the Deep Metric Learning: Enhance the Cross-Modal Matching with Adversarial Discriminative Domain Regularization
Li Ren, Kai Li, Liqiang Wang, Kien Hua
Auto-TLDR; Adversarial Discriminative Domain Regularization for Efficient Cross-Modal Matching
Cross-View Relation Networks for Mammogram Mass Detection
Ma Jiechao, Xiang Li, Hongwei Li, Ruixuan Wang, Bjoern Menze, Wei-Shi Zheng
Auto-TLDR; Multi-view Modeling for Mass Detection in Mammogram
Global-Local Attention Network for Semantic Segmentation in Aerial Images
Minglong Li, Lianlei Shan, Weiqiang Wang
Auto-TLDR; GLANet: Global-Local Attention Network for Semantic Segmentation
Text Synopsis Generation for Egocentric Videos
Aidean Sharghi, Niels Lobo, Mubarak Shah
Auto-TLDR; Egocentric Video Summarization Using Multi-task Learning for End-to-End Learning
Enriching Video Captions with Contextual Text
Philipp Rimle, Pelin Dogan, Markus Gross
Auto-TLDR; Contextualized Video Captioning Using Contextual Text
Transformer-Encoder Detector Module: Using Context to Improve Robustness to Adversarial Attacks on Object Detection
Faisal Alamri, Sinan Kalkan, Nicolas Pugeault
Auto-TLDR; Context Module for Robust Object Detection with Transformer-Encoder Detector Module
PICK: Processing Key Information Extraction from Documents Using Improved Graph Learning-Convolutional Networks
Wenwen Yu, Ning Lu, Xianbiao Qi, Ping Gong, Rong Xiao
Auto-TLDR; PICK: A Graph Learning Framework for Key Information Extraction from Documents
Multimodal Side-Tuning for Document Classification
Stefano Zingaro, Giuseppe Lisanti, Maurizio Gabbrielli
Auto-TLDR; Side-tuning for Multimodal Document Classification
PIN: A Novel Parallel Interactive Network for Spoken Language Understanding
Peilin Zhou, Zhiqi Huang, Fenglin Liu, Yuexian Zou
Auto-TLDR; Parallel Interactive Network for Spoken Language Understanding