ICPR2020 Paper Browser

Paper download is intended for registered attendees only, and is subjected to the IEEE Copyright Policy. Any other use is strongly forbidden.

Attention-Based Deep Metric Learning for Near-Duplicate Video Retrieval

Kuan-Hsun Wang, Chia Chun Cheng, Yi-Ling Chen, Yale Song, Shang-Hong Lai

Auto-TLDR; Attention-based Deep Metric Learning for Near-duplicate Video Retrieval

Abstract Slides

Near-duplicate video retrieval (NDVR) is an important and challenging problem due to the increasing amount of videos uploaded to the Internet. In this paper, we propose an attention-based deep metric learning method for NDVR. Our method is based on well-established principles: We leverage two-stream networks to combine RGB and optical flow features, and incorporate an attention module to effectively deal with distractor frames commonly observed in near duplicate videos. We further aggregate the features corresponding to multiple video segments to enhance the discriminative power. The whole system is trained using a deep metric learning objective with a Siamese architecture. Our experiments show that the attention module helps eliminate redundant and noisy frames, while focusing on visually relevant frames for solving NVDR. We evaluate our approach on recent large-scale NDVR datasets, CC_WEB_VIDEO, VCDB, FIVR and SVD. To demonstrate the generalization ability of our approach, we report results in both within- and cross-dataset settings, and show that the proposed method significantly outperforms state-of-the-art approaches.

Similar papers

Audio-Based Near-Duplicate Video Retrieval with Audio Similarity Learning

Pavlos Avgoustinakis, Giorgos Kordopatis-Zilos, Symeon Papadopoulos, Andreas L. Symeonidis, Ioannis Kompatsiaris

Auto-TLDR; AuSiL: Audio Similarity Learning for Near-duplicate Video Retrieval

Attention-Based Deep Metric Learning for Near-Duplicate Video Retrieval

Similar papers

Audio-Based Near-Duplicate Video Retrieval with Audio Similarity Learning

Exploiting Local Indexing and Deep Feature Confidence Scores for Fast Image-To-Video Search

Not 3D Re-ID: Simple Single Stream 2D Convolution for Robust Video Re-Identification

Multi-Level Deep Learning Vehicle Re-Identification Using Ranked-Based Loss Functions

G-FAN: Graph-Based Feature Aggregation Network for Video Face Recognition

Generalized Local Attention Pooling for Deep Metric Learning

AttendAffectNet: Self-Attention Based Networks for Predicting Affective Responses from Movies

RWF-2000: An Open Large Scale Video Database for Violence Detection

A Grid-Based Representation for Human Action Recognition

Rotation Invariant Aerial Image Retrieval with Group Convolutional Metric Learning

Temporally Coherent Embeddings for Self-Supervised Video Representation Learning

A Novel Attention-Based Aggregation Function to Combine Vision and Language

Video Face Manipulation Detection through Ensemble of CNNs

SL-DML: Signal Level Deep Metric Learning for Multimodal One-Shot Action Recognition

RMS-Net: Regression and Masking for Soccer Event Spotting

Attention-Driven Body Pose Encoding for Human Activity Recognition

Loop-closure detection by LiDAR scan re-identification

Deep Top-Rank Counter Metric for Person Re-Identification

Nonlinear Ranking Loss on Riemannian Potato Embedding

Improved Deep Classwise Hashing with Centers Similarity Learning for Image Retrieval

On Identification and Retrieval of Near-Duplicate Biological Images: A New Dataset and Protocol

MFI: Multi-Range Feature Interchange for Video Action Recognition

SSDL: Self-Supervised Domain Learning for Improved Face Recognition

Multi-Scale Keypoint Matching

Augmented Bi-Path Network for Few-Shot Learning

Learning Embeddings for Image Clustering: An Empirical Study of Triplet Loss Approaches

Aggregating Object Features Based on Attention Weights for Fine-Grained Image Retrieval

Building Computationally Efficient and Well-Generalizing Person Re-Identification Models with Metric Learning

Comparison of Deep Learning and Hand Crafted Features for Mining Simulation Data

Multi-Scale Cascading Network with Compact Feature Learning for RGB-Infrared Person Re-Identification

Feature Pyramid Hierarchies for Multi-Scale Temporal Action Detection

Global Feature Aggregation for Accident Anticipation

Progressive Learning Algorithm for Efficient Person Re-Identification

Relevance Detection in Cataract Surgery Videos by Spatio-Temporal Action Localization

MixTConv: Mixed Temporal Convolutional Kernels for Efficient Action Recognition

Attentive Part-Aware Networks for Partial Person Re-Identification

VTT: Long-Term Visual Tracking with Transformers

Automated Whiteboard Lecture Video Summarization by Content Region Detection and Representation

Total Whitening for Online Signature Verification Based on Deep Representation

DFH-GAN: A Deep Face Hashing with Generative Adversarial Network

Enriching Video Captions with Contextual Text

3D Attention Mechanism for Fine-Grained Classification of Table Tennis Strokes Using a Twin Spatio-Temporal Convolutional Neural Networks

Hierarchical Deep Hashing for Fast Large Scale Image Retrieval

TinyVIRAT: Low-Resolution Video Action Recognition

Hierarchical Multimodal Attention for Deep Video Summarization

Unsupervised Co-Segmentation for Athlete Movements and Live Commentaries Using Crossmodal Temporal Proximity

Modeling Long-Term Interactions to Enhance Action Recognition

One-Shot Representational Learning for Joint Biometric and Device Authentication