ICPR2020 Paper Browser

Paper download is intended for registered attendees only and is subject to the IEEE Copyright Policy. Any other use is strictly forbidden.

Exploiting the Logits: Joint Sign Language Recognition and Spell-Correction

Christina Runkel, Stefan Dorenkamp, Hartmut Bauermeister, Michael Möller

Auto-TLDR; A Convolutional Neural Network for Spell-correction in Sign Language Videos

Machine learning techniques have excelled in the automatic semantic analysis of images, reaching human-level performance on challenging benchmarks. Yet, the semantic analysis of videos remains challenging due to the significantly higher dimensionality of the input data and, correspondingly, the significantly higher need for annotated training examples. By studying the automatic recognition of German sign language videos, we demonstrate that on the relatively scarce training data of 2,800 videos, modern deep learning architectures for video analysis (such as ResNeXt), combined with transfer learning from large gesture recognition tasks, can achieve about 75% character accuracy. Since this leaves a probability of under 25% that a five-letter word is spelled correctly, spell-correction systems are crucial for producing readable outputs. The contribution of this paper is a convolutional neural network for spell-correction that takes the softmax outputs of the character recognition network (instead of a misspelled word) as its input. We demonstrate that learning purely on softmax inputs with scarce training data leads to overfitting, as the network learns the inputs by heart. In contrast, training the network on several variants of the logits of the classification output, i.e., scaling by a constant factor, adding random noise, mixing softmax and hardmax inputs, or training purely on hardmax inputs, leads to better generalization while still benefiting from the significant information hidden in these outputs (which have 98% top-5 accuracy), yielding readable text despite the comparatively low character accuracy.
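To make the input variants named in the abstract concrete, here is a minimal sketch in plain Python of how per-character logits could be turned into spell-corrector inputs. It is an illustration under stated assumptions, not the authors' implementation: the function names (`word_accuracy`, `softmax`, `hardmax`, `augment_word`) are hypothetical, the default scale factor, the Gaussian noise with its standard deviation, and the reading of "mixing" as a random per-character choice between hardmax and softmax are all assumptions. It also checks the word-level arithmetic: assuming independent character errors, 0.75^5 ≈ 0.237, which is indeed under 25%.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def word_accuracy(char_acc: float, length: int) -> float:
    """Probability that all `length` characters are correct (assumes independent errors)."""
    return char_acc ** length


def softmax(logits: Sequence[float], scale: float = 1.0) -> Vector:
    """Numerically stable softmax of `scale * logits`."""
    scaled = [scale * z for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def hardmax(logits: Sequence[float]) -> Vector:
    """One-hot vector at the arg-max position."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return [1.0 if i == best else 0.0 for i in range(len(logits))]


def augment_word(word_logits: Sequence[Sequence[float]], mode: str,
                 rng: random.Random, scale: float = 0.5,
                 noise_std: float = 1.0, p_hard: float = 0.5) -> List[Vector]:
    """Turn per-character logits into one (hypothetical) spell-corrector input.

    mode: "soft"  - plain softmax (the variant the abstract reports as overfitting)
          "scale" - softmax of logits multiplied by a constant factor (value assumed)
          "noise" - softmax of logits plus Gaussian noise (distribution assumed)
          "mix"   - each character is hardmax with prob. p_hard, else softmax (assumed reading)
          "hard"  - hardmax only
    """
    out = []
    for z in word_logits:
        if mode == "soft":
            out.append(softmax(z))
        elif mode == "scale":
            out.append(softmax(z, scale=scale))
        elif mode == "noise":
            out.append(softmax([v + rng.gauss(0.0, noise_std) for v in z]))
        elif mode == "mix":
            out.append(hardmax(z) if rng.random() < p_hard else softmax(z))
        elif mode == "hard":
            out.append(hardmax(z))
        else:
            raise ValueError(f"unknown mode: {mode}")
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy example: a 2-character word over a 3-letter alphabet.
    logits = [[2.0, 0.5, -1.0], [0.1, 0.3, 0.2]]
    for mode in ("soft", "scale", "noise", "mix", "hard"):
        rows = augment_word(logits, mode, rng)
        print(mode, [[round(p, 2) for p in row] for row in rows])
    print(round(word_accuracy(0.75, 5), 3))  # 0.237
```

Every variant keeps each row a probability vector, so a single network can be trained on a random mixture of them. The scale and noise variants soften or perturb the confidence pattern the network might otherwise memorize, while the hardmax variants remove it entirely.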

Similar papers

Continuous Sign Language Recognition with Iterative Spatiotemporal Fine-Tuning

Kenessary Koishybay, Medet Mukushev, Anara Sandygulova

Auto-TLDR; A Deep Neural Network for Continuous Sign Language Recognition with Iterative Gloss Recognition

Context Matters: Self-Attention for Sign Language Recognition

Concept Embedding through Canonical Forms: A Case Study on Zero-Shot ASL Recognition

Recognizing American Sign Language Nonmanual Signal Grammar Errors in Continuous Videos

Cross-People Mobile-Phone Based Airwriting Character Recognition

Location Prediction in Real Homes of Older Adults based on K-Means in Low-Resolution Depth Videos

KoreALBERT: Pretraining a Lite BERT Model for Korean Language Understanding

Applying (3+2+1)D Residual Neural Network with Frame Selection for Hong Kong Sign Language Recognition

IPN Hand: A Video Dataset and Benchmark for Real-Time Continuous Hand Gesture Recognition

A Prototype-Based Generalized Zero-Shot Learning Framework for Hand Gesture Recognition

Improving Robotic Grasping on Monocular Images Via Multi-Task Learning and Positional Loss

ESResNet: Environmental Sound Classification Based on Visual Domain Models

Fully Convolutional Neural Networks for Raw Eye Tracking Data Segmentation, Generation, and Reconstruction

Occlusion-Tolerant and Personalized 3D Human Pose Estimation in RGB Images

LODENet: A Holistic Approach to Offline Handwritten Chinese and Japanese Text Line Recognition

Uncertainty-Sensitive Activity Recognition: A Reliability Benchmark and the CARING Models

The HisClima Database: Historical Weather Logs for Automatic Transcription and Information Extraction

Self-Supervised Joint Encoding of Motion and Appearance for First Person Action Recognition

Personalized Models in Human Activity Recognition Using Deep Learning

Developing Motion Code Embedding for Action Recognition in Videos

A Grid-Based Representation for Human Action Recognition

Temporal Binary Representation for Event-Based Action Recognition

Pose-Based Body Language Recognition for Emotion and Psychiatric Symptom Interpretation

Textual-Content Based Classification of Bundles of Untranscribed Manuscript Images

3D Facial Matching by Spiral Convolutional Metric Learning and a Biometric Fusion-Net of Demographic Properties

A Transformer-Based Radical Analysis Network for Chinese Character Recognition

Enriching Video Captions with Contextual Text

From Human Pose to On-Body Devices for Human-Activity Recognition

Recognizing Bengali Word Images - A Zero-Shot Learning Perspective

Fast Approximate Modelling of the Next Combination Result for Stopping the Text Recognition in a Video

Learning Dictionaries of Kinematic Primitives for Action Classification

SSDL: Self-Supervised Domain Learning for Improved Face Recognition

Enhancing Handwritten Text Recognition with N-Gram Sequence Decomposition and Multitask Learning

To Honor Our Heroes: Analysis of the Obituaries of Australians Killed in Action in WWI and WWII

Domain Siamese CNNs for Sparse Multispectral Disparity Estimation

Ballroom Dance Recognition from Audio Recordings

Improving Batch Normalization with Skewness Reduction for Deep Neural Networks

Tackling Contradiction Detection in German Using Machine Translation and End-To-End Recurrent Neural Networks

Dimensionality Reduction for Data Visualization and Linear Classification, and the Trade-Off between Robustness and Classification Accuracy

Smart Inference for Multidigit Convolutional Neural Network Based Barcode Decoding

Single View Learning in Action Recognition

Conditional-UNet: A Condition-Aware Deep Model for Coherent Human Activity Recognition from Wearables

Explainable Online Validation of Machine Learning Models for Practical Applications

Recursive Recognition of Offline Handwritten Mathematical Expressions

A Close Look at Deep Learning with Small Data

SL-DML: Signal Level Deep Metric Learning for Multimodal One-Shot Action Recognition

Transformer Networks for Trajectory Forecasting

GazeMAE: General Representations of Eye Movements Using a Micro-Macro Autoencoder