Di Huang
Paper download is intended for registered attendees only, and is
subjected to the IEEE Copyright Policy. Any other use is strongly forbidden.
Papers from this author
Towards Practical Compressed Video Action Recognition: A Temporal Enhanced Multi-Stream Network
Bing Li, Longteng Kong, Dongming Zhang, Xiuguo Bao, Di Huang, Yunhong Wang
Auto-TLDR; TEMSN: Temporal Enhanced Multi-Stream Network for Compressed Video Action Recognition
Abstract Slides Poster Similar
Current compressed video action recognition methods are mainly based on completely received compressed videos. However, in real transmission, the compressed video packets are usually disorderly received and lost due to network jitters or congestion. It is of great significance to recognize actions in early phases with limited packets, e.g. forecasting the potential risks from videos quickly. In this paper, we proposed a Temporal Enhanced Multi-Stream Network (TEMSN) for practical compressed video action recognition. First, we use three compressed modalities as complementary cues and build a multi-stream network to capture the rich information from compressed video packets. Second, we design a temporal enhanced module based on Encoder-Decoder structure applied on each stream to infer the missing packets, and generate more complete action dynamics. Thanks to the rich modalities and temporal enhancement, our approach is able to better modeling the action with limited compressed packets. Experiments on HMDB-51 and UCF-101 dataset validate its effectiveness and efficiency.