ICPR2020 Paper Browser

Paper download is intended for registered attendees only and is subject to the IEEE Copyright Policy. Any other use is strictly forbidden.

Cross-Media Hash Retrieval Using Multi-head Attention Network

Zhixin Li, Feng Ling, Chuansheng Xu, Canlong Zhang, Huifang Ma

Auto-TLDR; Unsupervised Cross-Media Hash Retrieval Using Multi-Head Attention Network

Cross-media hash retrieval encodes multimedia data into a common binary hash space, in which the correlation between samples from different modalities can be measured effectively. To further improve retrieval accuracy, this paper proposes an unsupervised cross-media hash retrieval method based on a multi-head attention network. First, a multi-head attention network is used to better match images and texts, which contain rich semantic information. At the same time, an auxiliary similarity matrix is constructed to integrate the original neighborhood information from the different modalities. The method can therefore capture potential correlations both between modalities and within a single modality, compensating for the differences in both cases. Second, the method is unsupervised and requires no additional semantic labels, so it has the potential to scale to large cross-media retrieval tasks. In addition, batch normalization and replacement hash code generation functions are adopted to optimize the model, and two loss functions are designed; together these allow the method to outperform many supervised deep cross-media hashing methods. Experiments on three datasets show that the method's average performance is about 5 to 6 percentage points higher than that of the state-of-the-art unsupervised method, which demonstrates its effectiveness and superiority.
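The two ideas named in the abstract, a similarity matrix that merges neighborhood information from both modalities and retrieval in a shared binary hash space, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the cosine-based similarity, the convex fusion weight, the sign binarization, and every function name below are assumptions chosen for illustration, and the multi-head attention network, batch normalization, and loss functions are not modeled.

```python
import numpy as np


def cosine_similarity_matrix(features: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of `features`."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)  # guard against zero rows
    return unit @ unit.T


def fused_similarity(image_feats: np.ndarray, text_feats: np.ndarray,
                     weight: float = 0.5) -> np.ndarray:
    """Hypothetical fusion rule (an assumption, not taken from the paper):
    a convex combination of the two intra-modal similarity matrices.
    Row i of both inputs must describe the same image-text pair."""
    s_img = cosine_similarity_matrix(image_feats)
    s_txt = cosine_similarity_matrix(text_feats)
    return weight * s_img + (1.0 - weight) * s_txt


def to_hash_codes(embeddings: np.ndarray) -> np.ndarray:
    """Binarize real-valued embeddings into {-1, +1} codes by sign."""
    return np.where(embeddings >= 0, 1, -1).astype(np.int8)


def hamming_distances(query_code: np.ndarray,
                      database_codes: np.ndarray) -> np.ndarray:
    """For {-1, +1} codes, Hamming distance = (n_bits - dot product) / 2."""
    n_bits = database_codes.shape[1]
    dots = database_codes.astype(np.int32) @ query_code.astype(np.int32)
    return (n_bits - dots) // 2


def retrieve(query_code: np.ndarray, database_codes: np.ndarray,
             top_k: int = 3) -> np.ndarray:
    """Indices of the `top_k` database items closest to the query."""
    distances = hamming_distances(query_code, database_codes)
    return np.argsort(distances, kind="stable")[:top_k]
```

In a cross-media setting the query code would come from one modality (for example, a text embedding) and the database codes from the other (image embeddings); because both live in the same hash space, ranking reduces to the Hamming comparison above.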

Similar papers

Discrete Semantic Matrix Factorization Hashing for Cross-Modal Retrieval

Jianyang Qin, Lunke Fei, Shaohua Teng, Wei Zhang, Genping Zhao, Haoliang Yuan

Auto-TLDR; Discrete Semantic Matrix Factorization Hashing for Cross-Modal Retrieval

Fast Discrete Cross-Modal Hashing Based on Label Relaxation and Matrix Factorization

VSB^2-Net: Visual-Semantic Bi-Branch Network for Zero-Shot Hashing

Hierarchical Deep Hashing for Fast Large Scale Image Retrieval

Object Classification of Remote Sensing Images Based on Optimized Projection Supervised Discrete Hashing

DFH-GAN: A Deep Face Hashing with Generative Adversarial Network

Improved Deep Classwise Hashing with Centers Similarity Learning for Image Retrieval

Leveraging Quadratic Spherical Mutual Information Hashing for Fast Image Retrieval

Label Self-Adaption Hashing for Image Retrieval

VSR++: Improving Visual Semantic Reasoning for Fine-Grained Image-Text Matching

A Novel Attention-Based Aggregation Function to Combine Vision and Language

Beyond the Deep Metric Learning: Enhance the Cross-Modal Matching with Adversarial Discriminative Domain Regularization

Cross-spectrum Face Recognition Using Subspace Projection Hashing

Transformer Reasoning Network for Image-Text Matching and Retrieval

Deep Composer: A Hash-Based Duplicative Neural Network for Generating Multi-Instrument Songs

MANet: Multimodal Attention Network Based Point-View Fusion for 3D Shape Recognition

Webly Supervised Image-Text Embedding with Noisy Tag Refinement

Multi-Scale Cascading Network with Compact Feature Learning for RGB-Infrared Person Re-Identification

Integrating Historical States and Co-Attention Mechanism for Visual Dialog

Zero-Shot Text Classification with Semantically Extended Graph Convolutional Network

Global Context-Based Network with Transformer for Image2latex

Aggregating Object Features Based on Attention Weights for Fine-Grained Image Retrieval

RGB-Infrared Person Re-Identification Via Image Modality Conversion

Exploiting Local Indexing and Deep Feature Confidence Scores for Fast Image-To-Video Search

JECL: Joint Embedding and Cluster Learning for Image-Text Pairs

Object Detection Using Dual Graph Network

Object Detection Model Based on Scene-Level Region Proposal Self-Attention

More Correlations Better Performance: Fully Associative Networks for Multi-Label Image Classification

Edge-Aware Graph Attention Network for Ratio of Edge-User Estimation in Mobile Networks

Rethinking ReID: Multi-Feature Fusion Person Re-Identification Based on Orientation Constraints

Supporting Skin Lesion Diagnosis with Content-Based Image Retrieval

Decoupled Self-Attention Module for Person Re-Identification

A Base-Derivative Framework for Cross-Modality RGB-Infrared Person Re-Identification

Reinforcement Learning with Dual Attention Guided Graph Convolution for Relation Extraction

Adaptive Image Compression Using GAN Based Semantic-Perceptual Residual Compensation

Picture-To-Amount (PITA): Predicting Relative Ingredient Amounts from Food Images

Attentive Part-Aware Networks for Partial Person Re-Identification

Multi-Modal Contextual Graph Neural Network for Text Visual Question Answering

Attention-Based Deep Metric Learning for Near-Duplicate Video Retrieval

Context Visual Information-Based Deliberation Network for Video Captioning

Attentive Visual Semantic Specialized Network for Video Captioning

Sketch-SNet: Deeper Subdivision of Temporal Cues for Sketch Recognition

Cross-Lingual Text Image Recognition Via Multi-Task Sequence to Sequence Learning

A CNN-RNN Framework for Image Annotation from Visual Cues and Social Network Metadata

TAAN: Task-Aware Attention Network for Few-Shot Classification

SL-DML: Signal Level Deep Metric Learning for Multimodal One-Shot Action Recognition

Price Suggestion for Online Second-Hand Items

RWMF: A Real-World Multimodal Foodlog Database