Haowen Tang

Papers from this author

Inferring Tasks and Fluents in Videos by Learning Causal Relations

Haowen Tang, Ping Wei, Huan Li, Nanning Zheng


Auto-TLDR; Joint Learning of Complex Task and Fluent States in Videos


Recognizing time-varying object states in complex tasks is an important and challenging problem. In this paper, we propose a novel model that jointly infers object fluents and complex tasks in videos. A task is a complex, goal-driven human activity, and a fluent is a time-varying object state. A hierarchical graph represents a task as a human action stream together with multiple concurrent object fluents, which vary as the human performs the actions. In this process, the human actions are the causes of object state changes, and these changes in turn reflect the effects of the actions. Given an input video, a causal sampling beam search (CSBS) algorithm jointly infers the task category and the states of objects in each video frame. For model learning, a structural SVM framework is adopted to jointly train the task, fluent, cause, and effect parameters. We collected a new large-scale dataset of tasks and fluents in third-person view videos. It contains 14 categories of tasks, 24 categories of object fluents, 50 categories of object states, 809 videos, and 333,351 frames. Experimental results demonstrate the effectiveness of the proposed method.
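
To make the idea of frame-wise state inference with beam search concrete, here is a minimal sketch of a generic beam search over per-frame object states. It is not the paper's CSBS algorithm: the abstract does not describe the causal sampling step, so none is modeled here. The function name `beam_search_states`, the per-frame score dictionaries, and the `transition_score` callback are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def beam_search_states(
    frame_scores: Sequence[Dict[str, float]],
    transition_score: Callable[[str, str], float],
    beam_width: int = 3,
) -> Tuple[List[str], float]:
    """Return the highest-scoring state sequence found by a plain beam search.

    frame_scores[t][s] is a hypothetical score for state s in frame t.
    transition_score(prev, cur) is a hypothetical score for moving between
    states in consecutive frames. Neither comes from the paper.
    """
    if not frame_scores:
        return [], 0.0

    # Initialize the beam with the best single-state hypotheses for frame 0.
    beam = sorted(
        (([state], score) for state, score in frame_scores[0].items()),
        key=lambda hyp: hyp[1],
        reverse=True,
    )[:beam_width]

    for scores in frame_scores[1:]:
        candidates = []
        for path, total in beam:
            # Extend every surviving hypothesis with every possible next state.
            for state, score in scores.items():
                new_total = total + score + transition_score(path[-1], state)
                candidates.append((path + [state], new_total))
        # Keep only the top beam_width hypotheses.
        beam = sorted(candidates, key=lambda hyp: hyp[1], reverse=True)[:beam_width]

    return beam[0]


# Toy usage: a two-state fluent ("closed"/"open") over three frames,
# with a penalty for switching state between consecutive frames.
toy_scores = [
    {"closed": 1.0, "open": 0.2},
    {"closed": 0.4, "open": 0.6},
    {"closed": 0.1, "open": 0.9},
]
best_path, best_score = beam_search_states(
    toy_scores, lambda prev, cur: 0.0 if prev == cur else -0.5, beam_width=2
)
```

In this toy setup the switching penalty plays the role of a smoothness prior over a single fluent. A joint model like the one described above would also need to score the task category and the cause and effect terms, which this sketch leaves out.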