ICPR2020 Paper Browser

Paper download is intended for registered attendees only and is subject to the IEEE Copyright Policy. Any other use is strictly forbidden.

Image Sequence Based Cyclist Action Recognition Using Multi-Stream 3D Convolution

Stefan Zernetsch, Steven Schreck, Viktor Kress, Konrad Doll, Bernhard Sick

Auto-TLDR; 3D-ConvNet: A Multi-stream 3D Convolutional Neural Network for Detecting Cyclists in Real World Traffic Situations

In this article, we present an approach to detect basic movements of cyclists in real-world traffic situations based on image sequences, optical flow (OF) sequences, and past positions, using a multi-stream 3D convolutional neural network (3D-ConvNet) architecture. To resolve occlusions of cyclists by other traffic participants or road structures, we use a wide-angle stereo camera system mounted at a heavily frequented public intersection. We created a large dataset consisting of 1,639 video sequences containing cyclists, recorded in real-world traffic, resulting in over 1.1 million samples. By modeling the cyclists' behavior as a state machine of basic cyclist movements, our approach takes every situation into account and is not limited to certain scenarios. We compare our method to an approach based solely on position sequences. Both methods are evaluated on frame-wise and scene-wise classification results of basic movements and on detection times of basic movement transitions; our approach outperforms the position-based approach by producing more reliable detections with shorter detection times. Our code and parts of our dataset are made publicly available.
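
The two frame-level measures named in the abstract, frame-wise classification and the detection time of a movement transition, can be illustrated with a minimal sketch. This is not the authors' evaluation code. The movement names ("wait", "start"), the 50 Hz frame rate, and both function names are hypothetical, and the delay is measured only from the labeled transition onward, so early detections are not counted.

```python
from typing import List, Optional


def frame_wise_accuracy(truth: List[str], pred: List[str]) -> float:
    """Fraction of frames whose predicted movement equals the labeled one."""
    if len(truth) != len(pred) or not truth:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(t == p for t, p in zip(truth, pred)) / len(truth)


def detection_delay(truth: List[str], pred: List[str],
                    frame_rate_hz: float) -> Optional[float]:
    """Seconds between the first labeled transition and the first frame,
    at or after that transition, where the prediction shows the new movement.

    Returns None if the labels contain no transition or the new movement
    is never predicted. (Simplifying assumption: only the first transition
    is considered.)
    """
    if len(truth) != len(pred):
        raise ValueError("sequences must be of equal length")
    for i in range(1, len(truth)):
        if truth[i] != truth[i - 1]:
            new_state = truth[i]
            for j in range(i, len(pred)):
                if pred[j] == new_state:
                    return (j - i) / frame_rate_hz
            return None
    return None


# Hypothetical example: the cyclist starts moving at frame 4,
# and the classifier reports "start" two frames later.
truth = ["wait"] * 4 + ["start"] * 4
pred = ["wait"] * 6 + ["start"] * 2

accuracy = frame_wise_accuracy(truth, pred)
delay = detection_delay(truth, pred, frame_rate_hz=50.0)
```

A scene-wise result could be derived in a similar way by aggregating the frame-wise outcomes per video sequence; how the paper performs that aggregation is not stated in the abstract.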

Similar papers

What and How? Jointly Forecasting Human Action and Pose

Yanjun Zhu, Yanxia Zhang, Qiong Liu, Andreas Girgensohn

Auto-TLDR; Forecasting Human Actions and Motion Trajectories with Joint Action Classification and Pose Regression

Visual Prediction of Driver Behavior in Shared Road Areas

Self-Supervised Joint Encoding of Motion and Appearance for First Person Action Recognition

AerialMPTNet: Multi-Pedestrian Tracking in Aerial Imagery Using Temporal and Graphical Features

Late Fusion of Bayesian and Convolutional Models for Action Recognition

Transformer Networks for Trajectory Forecasting

DAG-Net: Double Attentive Graph Neural Network for Trajectory Forecasting

Modeling Long-Term Interactions to Enhance Action Recognition

Holistic Grid Fusion Based Stop Line Estimation

Gabriella: An Online System for Real-Time Activity Detection in Untrimmed Security Videos

Developing Motion Code Embedding for Action Recognition in Videos

Motion Complementary Network for Efficient Action Recognition

Towards Practical Compressed Video Action Recognition: A Temporal Enhanced Multi-Stream Network

Vehicle Lane Merge Visual Benchmark

Tracking Fast Moving Objects by Segmentation Network

A Grid-Based Representation for Human Action Recognition

Global Feature Aggregation for Accident Anticipation

AG-GAN: An Attentive Group-Aware GAN for Pedestrian Trajectory Prediction

Multiple Future Prediction Leveraging Synthetic Trajectories

IPT: A Dataset for Identity Preserved Tracking in Closed Domains

Audio-Video Detection of the Active Speaker in Meetings

3D Attention Mechanism for Fine-Grained Classification of Table Tennis Strokes Using a Twin Spatio-Temporal Convolutional Neural Networks

HPERL: 3D Human Pose Estimation from RGB and LiDAR

Real-Time End-To-End Lane ID Estimation Using Recurrent Networks

Temporal Binary Representation for Event-Based Action Recognition

Estimation of Clinical Tremor Using Spatio-Temporal Adversarial AutoEncoder

Uncertainty-Sensitive Activity Recognition: A Reliability Benchmark and the CARING Models

Single View Learning in Action Recognition

Semantic Segmentation for Pedestrian Detection from Motion in Temporal Domain

Pose-Based Body Language Recognition for Emotion and Psychiatric Symptom Interpretation

Better Prior Knowledge Improves Human-Pose-Based Extrinsic Camera Calibration

RWF-2000: An Open Large Scale Video Database for Violence Detection

Early Wildfire Smoke Detection in Videos

Learning Dictionaries of Kinematic Primitives for Action Classification

Depth Videos for the Classification of Micro-Expressions

TinyVIRAT: Low-Resolution Video Action Recognition

Ghost Target Detection in 3D Radar Data Using Point Cloud Based Deep Neural Network

A Detection-Based Approach to Multiview Action Classification in Infants

IPN Hand: A Video Dataset and Benchmark for Real-Time Continuous Hand Gesture Recognition

Uncertainty Guided Recognition of Tiny Craters on the Moon

Attention-Oriented Action Recognition for Real-Time Human-Robot Interaction

Precise Temporal Action Localization with Quantified Temporal Structure of Actions

Knowledge Distillation for Action Anticipation Via Label Smoothing

Detecting Anomalies from Video-Sequences: A Novel Descriptor

A Bayesian Approach to Reinforcement Learning of Vision-Based Vehicular Control

PolyLaneNet: Lane Estimation Via Deep Polynomial Regression

Domain Siamese CNNs for Sparse Multispectral Disparity Estimation

Correlation-Based ConvNet for Small Object Detection in Videos