Shenghao Zhang

Papers from this author

Attention-Oriented Action Recognition for Real-Time Human-Robot Interaction

Ziyang Song, Ziyi Yin, Zejian Yuan, Chong Zhang, Wanchao Chi, Yonggen Ling, Shenghao Zhang

Auto-TLDR; Attention-Oriented Multi-Level Network for Action Recognition in Interaction Scenes

Despite notable progress in action recognition, little work has addressed action recognition specifically for human-robot interaction. In this paper, we examine the characteristics of the action recognition task in interaction scenes and propose an attention-oriented multi-level network framework to meet the need for real-time interaction. Specifically, a Pre-Attention network first roughly focuses on the interactor in the scene at low resolution and then performs fine-grained pose estimation at high resolution. A second, compact CNN receives the extracted skeleton sequence as input for action recognition, using attention-like mechanisms to capture local spatial-temporal patterns and global semantic information effectively. To evaluate our approach, we construct a new action dataset specifically for the recognition task in interaction scenes. Experimental results on our dataset, together with high efficiency (112 fps at 640 x 480 RGBD) on a mobile computing platform (Nvidia Jetson AGX Xavier), demonstrate the applicability of our method to action recognition in real-time human-robot interaction.
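
The coarse-to-fine step described in the abstract above (locate the interactor at low resolution, then work on that region at high resolution) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `scale_box`, the box format, and the `margin` parameter are assumptions made only for illustration. It shows how a bounding box found on a downscaled frame might be mapped back to full-resolution pixel coordinates before cropping.

```python
import math


def scale_box(box, low_size, high_size, margin=0.1):
    """Map a box from low-res to high-res pixel coordinates.

    box:       (x0, y0, x1, y1) in low-res pixels (hypothetical format)
    low_size:  (width, height) of the low-res frame
    high_size: (width, height) of the high-res frame
    margin:    fraction of the box size added on each side (assumption)
    """
    low_w, low_h = low_size
    high_w, high_h = high_size
    sx, sy = high_w / low_w, high_h / low_h

    x0, y0, x1, y1 = box
    pad_x = (x1 - x0) * margin
    pad_y = (y1 - y0) * margin

    # Scale to high resolution, then clamp to the image bounds.
    hx0 = max(0, math.floor((x0 - pad_x) * sx))
    hy0 = max(0, math.floor((y0 - pad_y) * sy))
    hx1 = min(high_w, math.ceil((x1 + pad_x) * sx))
    hy1 = min(high_h, math.ceil((y1 + pad_y) * sy))
    return hx0, hy0, hx1, hy1


# A box found on an 80 x 60 preview, mapped onto a 640 x 480 frame.
region = scale_box((10, 5, 30, 25), (80, 60), (640, 480), margin=0.0)
```

Only the cropped region would then be passed to the more expensive high-resolution stage, which is where the speed benefit of this kind of design comes from.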

FastCompletion: A Cascade Network with Multiscale Group-Fused Inputs for Real-Time Depth Completion

Ang Li, Zejian Yuan, Yonggen Ling, Wanchao Chi, Shenghao Zhang, Chong Zhang

Auto-TLDR; Efficient Depth Completion with Clustered Hourglass Networks

Completing sparse data captured with commercial depth sensors is a vital and fundamental procedure for many computer vision applications. For execution in real-world scenarios, a good trade-off between accuracy and speed is increasingly in demand for depth completion methods. Most previous methods achieve satisfactory accuracy on standard benchmarks. However, they rely extensively on heavy models to handle diverse structures and require additional run time on multimodal data. In this paper, we present an efficient method for depth completion. We propose a grouped fusion strategy for efficiently extracting depth and guidance features in parallel and fusing them naturally in the feature spaces to achieve high performance. Instead of a monolithic architecture, we employ cascaded hourglass networks, each of which is specialized for certain structures and has a lightweight architecture. Given the sparsity of the depth maps, we downsample the inputs to multiple scales to further accelerate the computation. Our model runs at over 39 FPS on an embedded GPU with high-resolution inputs. Evaluations on the KITTI benchmark demonstrate that the proposed model is an ideal approach for real-world applications.
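
The abstract above mentions downsampling sparse depth inputs to multiple scales but does not say how. As a hedged illustration only, the sketch below uses validity-aware average pooling, a common choice for sparse depth that is assumed here and not taken from the paper. The function name `downsample_sparse` and the convention that 0 marks a missing measurement are likewise assumptions.

```python
def downsample_sparse(depth, factor=2):
    """Downsample a sparse depth map by averaging only valid pixels.

    depth:  list of rows; 0 marks a missing measurement (assumption)
    factor: side length of each pooling block
    Rows or columns that do not fill a whole block are dropped.
    """
    height, width = len(depth), len(depth[0])
    out = []
    for i in range(0, height - height % factor, factor):
        row = []
        for j in range(0, width - width % factor, factor):
            # Collect the valid (non-zero) depths in this block.
            valid = [
                depth[i + di][j + dj]
                for di in range(factor)
                for dj in range(factor)
                if depth[i + di][j + dj] > 0
            ]
            # An empty block stays missing instead of averaging in zeros.
            row.append(sum(valid) / len(valid) if valid else 0.0)
        out.append(row)
    return out


sparse = [
    [0, 2, 0, 0],
    [4, 0, 0, 0],
    [0, 0, 0, 6],
    [0, 0, 0, 0],
]
half = downsample_sparse(sparse, factor=2)
```

Averaging only over valid pixels keeps missing measurements from pulling the pooled depth toward zero, which plain average pooling would do on inputs this sparse.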