Detecting Activities of Daily Living in First-person Camera Views

We present a novel dataset and novel algorithms for the problem of detecting activities of daily living (ADL) in first-person camera views. We have collected a dataset of one million frames of dozens of people performing unscripted, everyday activities. The dataset is annotated with activities, object tracks, hand positions, and interaction events. ADLs differ from typical actions in that they can involve long-scale temporal structure (making tea can take a few minutes) and complex object interactions (a fridge looks different when its door is open). We develop novel representations including (1) temporal pyramids, which generalize the well-known spatial pyramid to approximate temporal correspondence when scoring a model and (2) composite object models that exploit the fact that objects look different when being interacted with. We perform an extensive empirical evaluation and demonstrate that our novel representations produce a two-fold improvement over traditional approaches. Our analysis suggests that real-world ADL recognition is "all about the objects," and in particular, "all about the objects being interacted with."
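The temporal pyramid can be made concrete with a small sketch. Following the spatial pyramid, per-frame object-detector scores are pooled over the whole video and again over recursively halved segments, and the results are concatenated so that comparing two descriptors approximates a temporal correspondence. The sketch below is illustrative Python, not the released code; the input format, function name, and the choice of average pooling are assumptions (the paper accumulates object detections into histogram features).

    import numpy as np

    def temporal_pyramid(frame_scores, levels=3):
        """Concatenate pooled object scores over a temporal pyramid.

        frame_scores: (num_frames, num_objects) array of per-frame
        object-detector scores (hypothetical input format).
        Level l splits the video into 2**l equal segments, so the
        descriptor has num_objects * (2**levels - 1) dimensions.
        """
        num_frames, num_objects = frame_scores.shape
        feats = []
        for level in range(levels):
            # segment boundaries for this pyramid level
            bounds = np.linspace(0, num_frames, 2 ** level + 1).astype(int)
            for lo, hi in zip(bounds[:-1], bounds[1:]):
                seg = frame_scores[lo:hi]
                # average-pool scores within the segment (assumed pooling)
                feats.append(seg.mean(axis=0) if hi > lo else np.zeros(num_objects))
        return np.concatenate(feats)

    # e.g. 1000 frames scored against 26 object categories
    phi = temporal_pyramid(np.random.rand(1000, 26))
    print(phi.shape)   # (182,) = 26 objects * (1 + 2 + 4) segments

A descriptor like this can then be fed to a standard linear classifier; the composite object models effectively enlarge the object vocabulary, since an object being interacted with (an open fridge, for instance) is scored by a separate "active" detector.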

Hamed Pirsiavash and Deva Ramanan. "Detecting Activities of Daily Living in First-person Camera Views." Computer Vision and Pattern Recognition (CVPR), Providence, Rhode Island, June 2012. [PDF] [slides (PPT, 53M)] [slides (PDF, 7M)] [poster (PDF, 4M)]

Movies

Downloads

Filename                    Description                                                            Size
README                      Description of contents                                                1.8 KB
ADL_videos (20 files)       ADL videos                                                             43.1 GB
ADL_annotations.zip         Object and action annotations                                          2.2 MB
ADL_detected_objects.zip    Results of running part-based object detectors                         496 MB
ADL_code.zip                Train and test code for detecting ADL using object-centric features    4.3 MB