Audio-Visual Speech Recognition Using a Two-Step Feature Fusion Strategy
![Responsive image](/icpr/media/video_thumbnails/11074.jpg)
Auto-TLDR; A Two-Step Feature Fusion Network for Speech Recognition
Similar papers
Robust Audio-Visual Speech Recognition Based on Hybrid Fusion
Hong Liu, Wenhao Li, Bing Yang
![Responsive image](/icpr/media/video_thumbnails/11792.jpg)
Auto-TLDR; Hybrid Fusion Based AVSR with Residual Networks and Bidirectional Gated Recurrent Unit for Robust Speech Recognition in Noise Conditions
Mutual Alignment between Audiovisual Features for End-To-End Audiovisual Speech Recognition
Hong Liu, Yawei Wang, Bing Yang
![Responsive image](/icpr/media/video_thumbnails/11510.jpg)
Auto-TLDR; Mutual Iterative Attention for Audio Visual Speech Recognition
Audio-Visual Predictive Coding for Self-Supervised Visual Representation Learning
Mani Kumar Tellamekala, Michel Valstar, Michael Pound, Timo Giesbrecht
![Responsive image](/icpr/media/video_thumbnails/12083.jpg)
Auto-TLDR; AV-PPC: A Multi-task Learning Framework for Learning Semantic Visual Features from Unlabeled Video Data
Three-Dimensional Lip Motion Network for Text-Independent Speaker Recognition
Jianrong Wang, Tong Wu, Shanyu Wang, Mei Yu, Qiang Fang, Ju Zhang, Li Liu
![Responsive image](/icpr/media/video_thumbnails/11259.jpg)
Auto-TLDR; Lip Motion Network for Text-Independent and Text-Dependent Speaker Recognition
Visual Oriented Encoder: Integrating Multimodal and Multi-Scale Contexts for Video Captioning
![Responsive image](/icpr/media/video_thumbnails/10852.jpg)
Auto-TLDR; Visual Oriented Encoder for Video Captioning
End-To-End Triplet Loss Based Emotion Embedding System for Speech Emotion Recognition
Puneet Kumar, Sidharth Jain, Balasubramanian Raman, Partha Pratim Roy, Masakazu Iwamura
![Responsive image](/icpr/media/video_thumbnails/11937.jpg)
Auto-TLDR; End-to-End Neural Embedding System for Speech Emotion Recognition
MFI: Multi-Range Feature Interchange for Video Action Recognition
Sikai Bai, Qi Wang, Xuelong Li
![Responsive image](/icpr/media/video_thumbnails/11676.jpg)
Auto-TLDR; Multi-range Feature Interchange Network for Action Recognition in Videos
Audio-Video Detection of the Active Speaker in Meetings
Francisco Madrigal, Frederic Lerasle, Lionel Pibre, Isabelle Ferrané
![Responsive image](/icpr/media/video_thumbnails/11154.jpg)
Auto-TLDR; Active Speaker Detection with Visual and Contextual Information from Meeting Context
DenseRecognition of Spoken Languages
Jaybrata Chakraborty, Bappaditya Chakraborty, Ujjwal Bhattacharya
![Responsive image](/icpr/media/video_thumbnails/12052.jpg)
Auto-TLDR; DenseNet: A Dense Convolutional Network Architecture for Speech Recognition in Indian Languages
3D Audio-Visual Speaker Tracking with a Novel Particle Filter
Hong Liu, Yongheng Sun, Yidi Li, Bing Yang
![Responsive image](/icpr/media/video_thumbnails/11762.jpg)
Auto-TLDR; 3D audio-visual speaker tracking using particle filter based method
AttendAffectNet: Self-Attention Based Networks for Predicting Affective Responses from Movies
Thi Phuong Thao Ha, Bt Balamurali, Herremans Dorien, Roig Gemma
![Responsive image](/icpr/media/video_thumbnails/11931.jpg)
Auto-TLDR; AttendAffectNet: A Self-Attention Based Network for Emotion Prediction from Movies
Improving Mix-And-Separate Training in Audio-Visual Sound Source Separation with an Object Prior
Quan Nguyen, Simone Frintrop, Timo Gerkmann, Mikko Lauri, Julius Richter
![Responsive image](/icpr/media/video_thumbnails/11572.jpg)
Auto-TLDR; Object-Prior: Learning the 1-to-1 correspondence between visual and audio signals by audio-visual sound source methods
Person Recognition with HGR Maximal Correlation on Multimodal Data
Yihua Liang, Fei Ma, Yang Li, Shao-Lun Huang
![Responsive image](/icpr/media/video_thumbnails/11111.jpg)
Auto-TLDR; A correlation-based multimodal person recognition framework that learns discriminative embeddings of persons by joint learning visual features and audio features
Attentive Hybrid Feature Based a Two-Step Fusion for Facial Expression Recognition
Jun Weng, Yang Yang, Zichang Tan, Zhen Lei
![Responsive image](/icpr/media/video_thumbnails/11643.jpg)
Auto-TLDR; Attentive Hybrid Architecture for Facial Expression Recognition
Hybrid Network for End-To-End Text-Independent Speaker Identification
Wajdi Ghezaiel, Luc Brun, Olivier Lezoray
![Responsive image](/icpr/media/video_thumbnails/11130.jpg)
Auto-TLDR; Text-Independent Speaker Identification with Scattering Wavelet Network and Convolutional Neural Networks
Unsupervised Co-Segmentation for Athlete Movements and Live Commentaries Using Crossmodal Temporal Proximity
Yasunori Ohishi, Yuki Tanaka, Kunio Kashino
![Responsive image](/icpr/media/video_thumbnails/11983.jpg)
Auto-TLDR; A guided attention scheme for audio-visual co-segmentation
Learnable Higher-Order Representation for Action Recognition
![Responsive image](/icpr/media/thumbnails/2495_FI.pdf.jpg)
Auto-TLDR; Learnable Higher-Order Operations for Spatiotemporal Dynamics in Video Recognition
SAT-Net: Self-Attention and Temporal Fusion for Facial Action Unit Detection
Zhihua Li, Zheng Zhang, Lijun Yin
![Responsive image](/icpr/media/video_thumbnails/11470.jpg)
Auto-TLDR; Temporal Fusion and Self-Attention Network for Facial Action Unit Detection
Region-Based Non-Local Operation for Video Classification
![Responsive image](/icpr/media/video_thumbnails/12096.jpg)
Auto-TLDR; Regional-based Non-Local Operation for Deep Self-Attention in Convolutional Neural Networks
The Application of Capsule Neural Network Based CNN for Speech Emotion Recognition
![Responsive image](/icpr/media/video_thumbnails/12010.jpg)
Auto-TLDR; CapCNN: A Capsule Neural Network for Speech Emotion Recognition
Continuous Sign Language Recognition with Iterative Spatiotemporal Fine-Tuning
Kenessary Koishybay, Medet Mukushev, Anara Sandygulova
![Responsive image](/icpr/media/video_thumbnails/12121.jpg)
Auto-TLDR; A Deep Neural Network for Continuous Sign Language Recognition with Iterative Gloss Recognition
Global Context-Based Network with Transformer for Image2latex
Nuo Pang, Chun Yang, Xiaobin Zhu, Jixuan Li, Xu-Cheng Yin
![Responsive image](/icpr/media/video_thumbnails/11421.jpg)
Auto-TLDR; Image2latex with Global Context block and Transformer
ESResNet: Environmental Sound Classification Based on Visual Domain Models
Andrey Guzhov, Federico Raue, Jörn Hees, Andreas Dengel
![Responsive image](/icpr/media/video_thumbnails/11458.jpg)
Auto-TLDR; Environmental Sound Classification with Short-Time Fourier Transform Spectrograms
Learning Visual Voice Activity Detection with an Automatically Annotated Dataset
Stéphane Lathuiliere, Pablo Mesejo, Radu Horaud
![Responsive image](/icpr/media/video_thumbnails/11447.jpg)
Auto-TLDR; Deep Visual Voice Activity Detection with Optical Flow
Towards Practical Compressed Video Action Recognition: A Temporal Enhanced Multi-Stream Network
Bing Li, Longteng Kong, Dongming Zhang, Xiuguo Bao, Di Huang, Yunhong Wang
![Responsive image](/icpr/media/video_thumbnails/11305.jpg)
Auto-TLDR; TEMSN: Temporal Enhanced Multi-Stream Network for Compressed Video Action Recognition
ConvMath: A Convolutional Sequence Network for Mathematical Expression Recognition
Zuoyu Yan, Xiaode Zhang, Liangcai Gao, Ke Yuan, Zhi Tang
![Responsive image](/icpr/media/video_thumbnails/11410.jpg)
Auto-TLDR; Convolutional Sequence Modeling for Mathematical Expressions Recognition
MixTConv: Mixed Temporal Convolutional Kernels for Efficient Action Recognition
Kaiyu Shan, Yongtao Wang, Zhi Tang, Ying Chen, Yangyan Li
![Responsive image](/icpr/media/video_thumbnails/11055.jpg)
Auto-TLDR; Mixed Temporal Convolution for Action Recognition
Two-Stream Temporal Convolutional Network for Dynamic Facial Attractiveness Prediction
Nina Weng, Jiahao Wang, Annan Li, Yunhong Wang
![Responsive image](/icpr/media/video_thumbnails/12098.jpg)
Auto-TLDR; 2S-TCN: A Two-Stream Temporal Convolutional Network for Dynamic Facial Attractiveness Prediction
Attention-Driven Body Pose Encoding for Human Activity Recognition
Bappaditya Debnath, Swagat Kumar, Marry O'Brien, Ardhendu Behera
![Responsive image](/icpr/media/video_thumbnails/11578.jpg)
Auto-TLDR; Attention-based Body Pose Encoding for Human Activity Recognition
Context Matters: Self-Attention for Sign Language Recognition
Fares Ben Slimane, Mohamed Bouguessa
![Responsive image](/icpr/media/video_thumbnails/11830.jpg)
Auto-TLDR; Attentional Network for Continuous Sign Language Recognition
Video-Based Facial Expression Recognition Using Graph Convolutional Networks
Daizong Liu, Hongting Zhang, Pan Zhou
![Responsive image](/icpr/media/video_thumbnails/10908.jpg)
Auto-TLDR; Graph Convolutional Network for Video-based Facial Expression Recognition
Exploring Spatial-Temporal Representations for fNIRS-based Intimacy Detection via an Attention-enhanced Cascade Convolutional Recurrent Neural Network
Chao Li, Qian Zhang, Ziping Zhao
![Responsive image](/icpr/media/video_thumbnails/11948.jpg)
Auto-TLDR; Intimate Relationship Prediction by Attention-enhanced Cascade Convolutional Recurrent Neural Network Using Functional Near-Infrared Spectroscopy
Flow-Guided Spatial Attention Tracking for Egocentric Activity Recognition
![Responsive image](/icpr/media/video_thumbnails/11376.jpg)
Auto-TLDR; Flow-Guided Spatial Attention Tracking for Egocentric Activity Recognition
A Two-Stream Recurrent Network for Skeleton-Based Human Interaction Recognition
Qianhui Men, Edmond S. L. Ho, Shum Hubert P. H., Howard Leung
![Responsive image](/icpr/media/video_thumbnails/11184.jpg)
Auto-TLDR; Two-Stream Recurrent Neural Network for Human-Human Interaction Recognition
Global Feature Aggregation for Accident Anticipation
Mishal Fatima, Umar Karim Khan, Chong Min Kyung
![Responsive image](/icpr/media/video_thumbnails/11189.jpg)
Auto-TLDR; Feature Aggregation for Predicting Accidents in Video Sequences
A Neural Lip-Sync Framework for Synthesizing Photorealistic Virtual News Anchors
Ruobing Zheng, Zhou Zhu, Bo Song, Ji Changjiang
![Responsive image](/icpr/media/video_thumbnails/11502.jpg)
Auto-TLDR; Lip-sync: Synthesis of a Virtual News Anchor for Low-Delayed Applications
Audio-Based Near-Duplicate Video Retrieval with Audio Similarity Learning
Pavlos Avgoustinakis, Giorgos Kordopatis-Zilos, Symeon Papadopoulos, Andreas L. Symeonidis, Ioannis Kompatsiaris
![Responsive image](/icpr/media/video_thumbnails/11570.jpg)
Auto-TLDR; AuSiL: Audio Similarity Learning for Near-duplicate Video Retrieval
RMS-Net: Regression and Masking for Soccer Event Spotting
Matteo Tomei, Lorenzo Baraldi, Simone Calderara, Simone Bronzin, Rita Cucchiara
![Responsive image](/icpr/media/video_thumbnails/11807.jpg)
Auto-TLDR; An Action Spotting Network for Soccer Videos
Talking Face Generation Via Learning Semantic and Temporal Synchronous Landmarks
Aihua Zheng, Feixia Zhu, Hao Zhu, Mandi Luo, Ran He
![Responsive image](/icpr/media/video_thumbnails/11298.jpg)
Auto-TLDR; A semantic and temporal synchronous landmark learning method for talking face generation
More Correlations Better Performance: Fully Associative Networks for Multi-Label Image Classification
![Responsive image](/icpr/media/video_thumbnails/12021.jpg)
Auto-TLDR; Fully Associative Network for Fully Exploiting Correlation Information in Multi-Label Classification
ReADS: A Rectified Attentional Double Supervised Network for Scene Text Recognition
Qi Song, Qianyi Jiang, Xiaolin Wei, Nan Li, Rui Zhang
![Responsive image](/icpr/media/video_thumbnails/11041.jpg)
Auto-TLDR; ReADS: Rectified Attentional Double Supervised Network for General Scene Text Recognition
RWF-2000: An Open Large Scale Video Database for Violence Detection
Ming Cheng, Kunjing Cai, Ming Li
![Responsive image](/icpr/media/video_thumbnails/11360.jpg)
Auto-TLDR; Flow Gated Network for Violence Detection in Surveillance Cameras
Improving Gravitational Wave Detection with 2D Convolutional Neural Networks
Siyu Fan, Yisen Wang, Yuan Luo, Alexander Michael Schmitt, Shenghua Yu
![Responsive image](/icpr/media/video_thumbnails/12500.jpg)
Auto-TLDR; Two-dimensional Convolutional Neural Networks for Gravitational Wave Detection from Time Series with Background Noise
What and How? Jointly Forecasting Human Action and Pose
Yanjun Zhu, Yanxia Zhang, Qiong Liu, Andreas Girgensohn
![Responsive image](/icpr/media/video_thumbnails/10928.jpg)
Auto-TLDR; Forecasting Human Actions and Motion Trajectories with Joint Action Classification and Pose Regression
Responsive Social Smile: A Machine-Learning Based Multimodal Behavior Assessment Framework towards Early Stage Autism Screening
Yueran Pan, Kunjing Cai, Ming Cheng, Xiaobing Zou, Ming Li
![Responsive image](/icpr/media/video_thumbnails/12509.jpg)
Auto-TLDR; Responsive Social Smile: A Machine Learning-based Assessment Framework for Early ASD Screening
Channel-Wise Dense Connection Graph Convolutional Network for Skeleton-Based Action Recognition
Michael Lao Banteng, Zhiyong Wu
![Responsive image](/icpr/media/video_thumbnails/11311.jpg)
Auto-TLDR; Two-stream channel-wise dense connection GCN for human action recognition
3D Attention Mechanism for Fine-Grained Classification of Table Tennis Strokes Using a Twin Spatio-Temporal Convolutional Neural Networks
Pierre-Etienne Martin, Jenny Benois-Pineau, Renaud Péteri, Julien Morlier
![Responsive image](/icpr/media/video_thumbnails/11594.jpg)
Auto-TLDR; Attentional Blocks for Action Recognition in Table Tennis Strokes
Self-Supervised Joint Encoding of Motion and Appearance for First Person Action Recognition
Mirco Planamente, Andrea Bottino, Barbara Caputo
![Responsive image](/icpr/media/video_thumbnails/11935.jpg)
Auto-TLDR; A Single Stream Architecture for Egocentric Action Recognition from the First-Person Point of View