ICPR2020 Paper Browser

Paper download is intended for registered attendees only, and is subject to the IEEE Copyright Policy. Any other use is strictly forbidden.

Relevance Detection in Cataract Surgery Videos by Spatio-Temporal Action Localization

Negin Ghamsarian, Mario Taschwer, Doris Putzgruber, Stephanie Sarny, Klaus Schoeffmann

Auto-TLDR; relevance-based retrieval in cataract surgery videos

In cataract surgery, the operation is performed with the help of a microscope. Since the microscope allows at most two people to watch the surgery in real time, a major part of surgical training is conducted using recorded videos. To optimize training with the video content, surgeons require an automatic relevance detection approach. In addition to relevance-based retrieval, the detection results can be used for skill assessment and irregularity detection in cataract surgery videos. In this paper, a three-module framework is proposed to detect and classify the relevant phase segments in cataract videos. Using an idle frame recognition network, the video is divided into idle and action segments. To boost relevance detection performance, Mask R-CNN is used to detect the cornea in each frame, where the relevant surgical actions are conducted. The spatio-temporally localized segments, which contain higher-resolution information about the pupil texture and actions along with complementary temporal information from the same phase, are fed into the relevance detection module. This module consists of four parallel recurrent CNNs responsible for detecting four relevant phases that have been defined with medical experts. The results are then integrated to classify the action phases as irrelevant or as one of the four relevant phases. Experimental results reveal that the proposed approach outperforms static CNNs and different configurations of feature-based and end-to-end recurrent networks.
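As a rough illustration of the first and last steps described in the abstract (splitting a video into idle and action segments, then integrating the four detector outputs), a minimal Python sketch follows. It is not the authors' implementation. The phase names, the one-detector-per-phase reading, the 0.5 threshold, and the "highest score wins" fusion rule are all assumptions made for this example; the abstract does not specify them. The per-frame idle flags and per-phase scores are taken as given inputs, standing in for the outputs of the idle frame recognition network and the recurrent CNNs.

```python
from itertools import groupby
from typing import List, Sequence, Tuple

# Hypothetical placeholder names: the abstract does not name the four phases.
PHASES = ("phase_1", "phase_2", "phase_3", "phase_4")


def split_segments(idle_flags: Sequence[bool]) -> List[Tuple[int, int, bool]]:
    """Group consecutive frames with the same idle flag.

    Returns (start, end, is_idle) triples, with `end` exclusive.
    """
    segments = []
    start = 0
    for is_idle, run in groupby(idle_flags):
        length = sum(1 for _ in run)
        segments.append((start, start + length, is_idle))
        start += length
    return segments


def integrate(scores: Sequence[float], threshold: float = 0.5) -> str:
    """Fuse four per-phase detector scores into one label for an action segment.

    Assumed rule (not taken from the paper): choose the highest-scoring phase
    if its score reaches `threshold`; otherwise label the segment irrelevant.
    """
    if len(scores) != len(PHASES):
        raise ValueError("expected one score per phase detector")
    best = max(range(len(scores)), key=lambda i: scores[i])
    return PHASES[best] if scores[best] >= threshold else "irrelevant"
```

For example, `split_segments([True, True, False, False, False, True])` yields one idle segment over frames 0 to 1, one action segment over frames 2 to 4, and a final idle segment at frame 5; only the action segment would then be passed to `integrate` together with its four detector scores.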

Similar papers

Early Wildfire Smoke Detection in Videos

Taanya Gupta, Hengyue Liu, Bir Bhanu

Auto-TLDR; Semi-supervised Spatio-Temporal Video Object Segmentation for Automatic Detection of Smoke in Videos during Forest Fire

Recent advances in unmanned aerial vehicles and camera technology have proven useful for detecting the smoke that emerges above the trees during a forest fire. Automatic detection of smoke in videos is of great interest to fire departments. To date, in most parts of the world, fires are not detected at an early stage and generally turn catastrophic. This paper introduces a novel technique that integrates spatial and temporal features in a deep learning framework using semi-supervised spatio-temporal video object segmentation and dense optical flow. However, detecting smoke in the presence of haze and without labeled data is difficult. Considering the visibility of haze in the sky, a dark channel pre-processing method is used that reduces the amount of haze in video frames and consequently improves the detection results. Online training is performed on a video at test time, which reduces the need for ground-truth data. Tests on publicly available video datasets show that the proposed algorithms outperform previous work and are robust across different wildfire-threatened locations.

RMS-Net: Regression and Masking for Soccer Event Spotting

Matteo Tomei, Lorenzo Baraldi, Simone Calderara, Simone Bronzin, Rita Cucchiara

Auto-TLDR; An Action Spotting Network for Soccer Videos

Gabriella: An Online System for Real-Time Activity Detection in Untrimmed Security Videos

Modeling Long-Term Interactions to Enhance Action Recognition

A Grid-Based Representation for Human Action Recognition

A Novel Region of Interest Extraction Layer for Instance Segmentation

A Systematic Investigation on Deep Architectures for Automatic Skin Lesions Classification

Self-Supervised Joint Encoding of Motion and Appearance for First Person Action Recognition

TinyVIRAT: Low-Resolution Video Action Recognition

Video Semantic Segmentation Using Deep Multi-View Representation Learning

ActionSpotter: Deep Reinforcement Learning Framework for Temporal Action Spotting in Videos

Late Fusion of Bayesian and Convolutional Models for Action Recognition

Video Object Detection Using Object's Motion Context and Spatio-Temporal Feature Aggregation

Anomaly Detection, Localization and Classification for Railway Inspection

Flow-Guided Spatial Attention Tracking for Egocentric Activity Recognition

Detecting Objects with High Object Region Percentage

A Detection-Based Approach to Multiview Action Classification in Infants

SyNet: An Ensemble Network for Object Detection in UAV Images

SAT-Net: Self-Attention and Temporal Fusion for Facial Action Unit Detection

Precise Temporal Action Localization with Quantified Temporal Structure of Actions

RWF-2000: An Open Large Scale Video Database for Violence Detection

Learning Object Deformation and Motion Adaption for Semi-Supervised Video Object Segmentation

Video Face Manipulation Detection through Ensemble of CNNs

Tracking Fast Moving Objects by Segmentation Network

3D Attention Mechanism for Fine-Grained Classification of Table Tennis Strokes Using a Twin Spatio-Temporal Convolutional Neural Networks

MFI: Multi-Range Feature Interchange for Video Action Recognition

2D Deep Video Capsule Network with Temporal Shift for Action Recognition

Triplet-Path Dilated Network for Detection and Segmentation of General Pathological Images

Motion U-Net: Multi-Cue Encoder-Decoder Network for Motion Segmentation

Feature Pyramid Hierarchies for Multi-Scale Temporal Action Detection

Dual Stream Network with Selective Optimization for Skin Disease Recognition in Consumer Grade Images

Automated Whiteboard Lecture Video Summarization by Content Region Detection and Representation

What and How? Jointly Forecasting Human Action and Pose

Pose-Based Body Language Recognition for Emotion and Psychiatric Symptom Interpretation

Attention-Based Deep Metric Learning for Near-Duplicate Video Retrieval

Continuous Sign Language Recognition with Iterative Spatiotemporal Fine-Tuning

Stratified Multi-Task Learning for Robust Spotting of Scene Texts

Automatic Semantic Segmentation of Structural Elements related to the Spinal Cord in the Lumbar Region by Using Convolutional Neural Networks

End-To-End Deep Learning Methods for Automated Damage Detection in Extreme Events at Various Scales

Vision-Based Layout Detection from Scientific Literature Using Recurrent Convolutional Neural Networks

Detecting Marine Species in Echograms Via Traditional, Hybrid, and Deep Learning Frameworks

Revisiting Sequence-To-Sequence Video Object Segmentation with Multi-Task Loss and Skip-Memory

Siamese Dynamic Mask Estimation Network for Fast Video Object Segmentation

SFPN: Semantic Feature Pyramid Network for Object Detection

Deep Recurrent-Convolutional Model for Automated Segmentation of Craniomaxillofacial CT Scans

A Multi-Task Contextual Atrous Residual Network for Brain Tumor Detection & Segmentation

Text Synopsis Generation for Egocentric Videos

Audio-Based Near-Duplicate Video Retrieval with Audio Similarity Learning