ICPR2020 Paper Browser

Paper download is intended for registered attendees only, and is subject to the IEEE Copyright Policy. Any other use is strictly forbidden.

PHNet: Parasite-Host Network for Video Crowd Counting

Shiqiao Meng, Jiajie Li, Weiwei Guo, Jinfeng Jiang, Lai Ye

Auto-TLDR; PHNet: A Parasite-Host Network for Video Crowd Counting

Crowd counting plays an increasingly important role in public security. Many crowd counting methods for single images have been proposed recently, but few studies have focused on using temporal information from video image sequences to improve prediction performance. Existing video-based crowd estimation methods model temporal and spatial features jointly, which makes spatiotemporal feature extraction less efficient and makes prediction performance difficult to improve. To solve these problems, this paper proposes a Parasite-Host Network (PHNet), composed of a Parasite branch and a Host branch that extract temporal and spatial features, respectively. To specifically extract the transform features in the time domain, we propose a novel architecture termed the “Relational Extractor” (RE), which models the multiplicative interaction features of adjacent frames. In addition, the Host branch extracts spatial features from the current frame and can be replaced with any model that uses a single image for prediction. We conducted experiments with PHNet on four video crowd counting benchmarks: Venice, UCSD, FDST, and CrowdFlow. Experimental results show that PHNet outperforms state-of-the-art methods on all four datasets.

Similar papers

Encoder-Decoder Based Convolutional Neural Networks with Multi-Scale-Aware Modules for Crowd Counting

Pongpisit Thanasutives, Ken-Ichi Fukui, Masayuki Numao, Boonserm Kijsirikul

Auto-TLDR; M-SFANet and M-SegNet for Crowd Counting Using Multi-Scale Fusion Networks

Learning Error-Driven Curriculum for Crowd Counting

Spatial-Related and Scale-Aware Network for Crowd Counting

VGG-Embedded Adaptive Layer-Normalized Crowd Counting Net with Scale-Shuffling Modules

Multi-Resolution Fusion and Multi-Scale Input Priors Based Crowd Counting

HANet: Hybrid Attention-Aware Network for Crowd Counting

Learning from Web Data: Improving Crowd Counting Via Semi-Supervised Learning

DAPC: Domain Adaptation People Counting Via Style-Level Transfer Learning and Scene-Aware Estimation

TSMSAN: A Three-Stream Multi-Scale Attentive Network for Video Saliency Detection

Point In: Counting Trees with Weakly Supervised Segmentation Network

AerialMPTNet: Multi-Pedestrian Tracking in Aerial Imagery Using Temporal and Graphical Features

Distortion-Adaptive Grape Bunch Counting for Omnidirectional Images

RWF-2000: An Open Large Scale Video Database for Violence Detection

Human Segmentation with Dynamic LiDAR Data

Residual Learning of Video Frame Interpolation Using Convolutional LSTM

Self-Supervised Joint Encoding of Motion and Appearance for First Person Action Recognition

Wavelet Attention Embedding Networks for Video Super-Resolution

ACCLVOS: Atrous Convolution with Spatial-Temporal ConvLSTM for Video Object Segmentation

A Duplex Spatiotemporal Filtering Network for Video-Based Person Re-Identification

Not 3D Re-ID: Simple Single Stream 2D Convolution for Robust Video Re-Identification

Video Semantic Segmentation Using Deep Multi-View Representation Learning

MFI: Multi-Range Feature Interchange for Video Action Recognition

A Grid-Based Representation for Human Action Recognition

Learnable Higher-Order Representation for Action Recognition

Weight Estimation from an RGB-D Camera in Top-View Configuration

TinyVIRAT: Low-Resolution Video Action Recognition

PSDNet: A Balanced Architecture of Accuracy and Parameters for Semantic Segmentation

Enhanced Feature Pyramid Network for Semantic Segmentation

MixTConv: Mixed Temporal Convolutional Kernels for Efficient Action Recognition

Two-Stream Temporal Convolutional Network for Dynamic Facial Attractiveness Prediction

Exploring Severe Occlusion: Multi-Person 3D Pose Estimation with Gated Convolution

Efficient-Receptive Field Block with Group Spatial Attention Mechanism for Object Detection

Activity Recognition Using First-Person-View Cameras Based on Sparse Optical Flows

SAT-Net: Self-Attention and Temporal Fusion for Facial Action Unit Detection

What and How? Jointly Forecasting Human Action and Pose

2D Deep Video Capsule Network with Temporal Shift for Action Recognition

Learning a Dynamic High-Resolution Network for Multi-Scale Pedestrian Detection

An Improved Bilinear Pooling Method for Image-Based Action Recognition

Flow-Guided Spatial Attention Tracking for Egocentric Activity Recognition

Selective Kernel and Motion-Emphasized Loss Based Attention-Guided Network for HDR Imaging of Dynamic Scenes

Learning Object Deformation and Motion Adaption for Semi-Supervised Video Object Segmentation

Early Wildfire Smoke Detection in Videos

Nighttime Pedestrian Detection Based on Feature Attention and Transformation

Revisiting Sequence-To-Sequence Video Object Segmentation with Multi-Task Loss and Skip-Memory

You Ought to Look Around: Precise, Large Span Action Detection

Multi-Scale Residual Pyramid Attention Network for Monocular Depth Estimation

Coarse-To-Fine Foreground Segmentation Based on Co-Occurrence Pixel-Block and Spatio-Temporal Attention Model

Coarse to Fine: Progressive and Multi-Task Learning for Salient Object Detection