Robotics and Computer Vision Laboratory, KAIST

[International Conference] Human Action Recognition by Attention and Object Network
22nd Korea-Japan Joint Workshop on Frontiers of Computer Vision, February 2016
FCV2016thesis_bgsim.pdf (2.2M)
Visual recognition in video currently has no clear winner, although combining a CNN with an LSTM is a
popular way to describe a video. In this paper, we propose a method to recognize human actions together
with the objects that depend on those actions. We also propose a simple attention method that focuses on
the important region where a human interacts with objects. We first extract a temporal stream from a video
sequence, and train an RGB CNN and an optical flow CNN separately. After that, an action tube, which
directs attention to the important region, is estimated and described by the activations from each network.
From the activations, we finally use LSTM blocks to predict action/object classes. Our architecture is small
enough (810 fps) but shows competitive performance against state-of-the-art methods on the MPII Cooking
dataset.
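
As a rough illustration of the attention step described in the abstract, the sketch below pools RGB and optical flow activations inside a per-frame action tube box and concatenates them into one descriptor per frame. It is not the authors' implementation: the names `pool_in_box` and `describe_tube`, the single-channel feature maps, the box format, and the use of average pooling are all assumptions made for illustration. The resulting per-frame sequence is the kind of input that could then be passed to LSTM blocks, which are not shown here.

```python
from typing import List, Tuple

# Hypothetical types for this sketch (not from the paper).
Box = Tuple[int, int, int, int]   # (top, left, bottom, right); bottom/right exclusive
FeatureMap = List[List[float]]    # H x W activation map, one channel for simplicity


def pool_in_box(fmap: FeatureMap, box: Box) -> float:
    """Average the activations that fall inside the box."""
    top, left, bottom, right = box
    values = [fmap[r][c] for r in range(top, bottom) for c in range(left, right)]
    if not values:
        raise ValueError("box selects no activations")
    return sum(values) / len(values)


def describe_tube(
    rgb_maps: List[FeatureMap],
    flow_maps: List[FeatureMap],
    tube: List[Box],
) -> List[List[float]]:
    """Return one [rgb, flow] descriptor per frame, pooled inside the tube box."""
    if not (len(rgb_maps) == len(flow_maps) == len(tube)):
        raise ValueError("rgb_maps, flow_maps and tube must have one entry per frame")
    return [
        [pool_in_box(rgb, box), pool_in_box(flow, box)]
        for rgb, flow, box in zip(rgb_maps, flow_maps, tube)
    ]


if __name__ == "__main__":
    # Two toy frames with 2x2 activation maps from each (assumed) network.
    rgb = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 8.0]]]
    flow = [[[2.0, 2.0], [2.0, 2.0]], [[1.0, 3.0], [5.0, 7.0]]]
    tube = [(0, 0, 2, 2), (1, 1, 2, 2)]  # one box per frame
    print(describe_tube(rgb, flow, tube))
```

Restricting the pooling to the tube box is what makes this an attention mechanism in the simple sense used in the abstract: activations outside the region where the action takes place do not contribute to the descriptor.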
This work was supported by the Technology Innovation Program (No. 10048320), funded by the Korea government (MOTIE).

