(To appear. ACM Computing Surveys.)
Human Activity Analysis: A Review
J. K. Aggarwal¹ and M. S. Ryoo¹,²
¹The University of Texas at Austin
²Electronics and Telecommunications Research Institute
Human activity recognition is an important area of computer vision research. Its applications include surveillance systems, patient monitoring systems, and a variety of systems that involve interactions between persons and electronic devices such as human-computer interfaces. Most of these applications require an automated recognition of high-level activities, composed of multiple simple (or atomic) actions of persons. This paper provides a detailed overview of various state-of-the-art research papers on human activity recognition. We discuss both the methodologies developed for simple human actions and those for high-level activities. An approach-based taxonomy is chosen, comparing the advantages and limitations of each approach.
Recognition methodologies for an analysis of simple actions of a single person are first presented in the paper. Space-time volume approaches and sequential approaches that represent and recognize activities directly from input images are discussed. Next, hierarchical recognition methodologies for high-level activities are presented and compared. Statistical approaches, syntactic approaches, and description-based approaches for hierarchical recognition are discussed in the paper. In addition, we further discuss the papers on the recognition of human-object interactions and group activities. Public datasets designed for the evaluation of the recognition methodologies are illustrated in our paper as well, comparing the methodologies’ performances. This review will provide the impetus for future research in more productive areas.
Categories and Subject Descriptors: I.2.10 [Artificial Intelligence]: Vision and Scene Understanding—motion; I.4.8 [Image Processing]: Scene Analysis; I.5.4 [Pattern Recognition]: Applications—computer vision
General Terms: Algorithms
Additional Key Words and Phrases: computer vision; human activity recognition; event detection; activity analysis; video recognition
This work was supported partly by Texas Higher Education Coordinating Board under award no. 003658-0140-2007.
Authors’ address: J. K. Aggarwal, Computer and Vision Research Center, Department of Electrical and Computer Engineering, the University of Texas at Austin, Austin, TX 78705, U.S.A.; M. S. Ryoo, Robot Research Department, Electronics and Telecommunications Research Institute, Daejeon 305-700, Korea; Correspondence e-mail: kr
Permission to make digital/hard copy of all or part of this material without fee for personal or classroom use provided that the copies are not made or distributed for profit or commercial advantage, the ACM copyright/server notice, the title of the publication, and its date appear, and notice is given that copying is by permission of the ACM, Inc. To copy otherwise, to republish, to post on servers, or to redistribute to lists requires prior specific permission and/or a fee.
© 20YY ACM 0000-0000/20YY/0000-0001 $5.00

1. INTRODUCTION
Human activity recognition is an important area of computer vision research today. The goal of human activity recognition is to automatically analyze ongoing activities from an unknown video (i.e., a sequence of image frames). In a simple case where a video is segmented to contain only one execution of a human activity, the objective of the system is to correctly classify the video into its activity category. In more general cases, the continuous recognition of human activities must be performed, detecting starting and ending times of all occurring activities from an input video.
The ability to recognize complex human activities from videos enables the construction of several important applications. Automated surveillance systems in public places like airports and subway stations require detection of abnormal and suspicious activities as opposed to normal activities. For instance, an airport surveillance system must be able to automatically recognize suspicious activities like ‘a person leaving a bag’ or ‘a person placing his/her bag in a trash bin’. Recognition of human activities also enables the real-time monitoring of patients, children, and elderly persons. The construction of gesture-based human-computer interfaces and vision-based intelligent environments becomes possible as well with an activity recognition system.
There are various types of human activities. Depending on their complexity, we conceptually categorize human activities into four different levels: gestures, actions, interactions, and group activities. Gestures are elementary movements of a person’s body part, and are the atomic components describing the meaningful motion of a person. ‘Stretching an arm’ and ‘raising a leg’ are good examples of gestures. Actions are single-person activities that may be composed of multiple gestures organized temporally, such as ‘walking’, ‘waving’, and ‘punching’. Interactions are human activities that involve two or more persons and/or objects. For example, ‘two persons fighting’ is an interaction between two humans and ‘a person stealing a suitcase from another’ is a human-object interaction involving two humans and one object. Finally, group activities are the activities performed by conceptual groups composed of multiple persons and/or objects. ‘A group of persons marching’, ‘a group having a meeting’, and ‘two groups fighting’ are typical examples of them.
The objective of this paper is to provide a complete overview of state-of-the-art human activity recognition methodologies. We discuss various types of approaches designed for the recognition of different levels of activities. The previous review written by Aggarwal and Cai [1999] covered several essential low-level components for the understanding of human motion, such as tracking and body posture analysis. However, the motion analysis methodologies themselves were insufficient to describe and annotate ongoing human activities with complex structures, and most approaches in the 1990s focused on the recognition of gestures and simple actions. In this new review, we concentrate on high-level activity recognition methodologies designed for the analysis of human actions, interactions, and group activities, discussing recent research trends in activity recognition.
Figure 1 illustrates an overview of the tree-structured taxonomy that our review follows. We have chosen an approach-based taxonomy. All activity recognition methodologies are first classified into two categories: single-layered approaches and hierarchical approaches. Single-layered approaches are approaches that represent and recognize human activities directly based on sequences of images. Due to their nature, single-layered approaches are suitable for the recognition of gestures and actions with sequential characteristics. On the other hand, hierarchical approaches represent high-level human activities by describing them in terms of other simpler activities, which are generally called sub-events. Recognition systems composed of multiple layers are constructed, making them suitable for the analysis of complex activities.
Fig. 1. The hierarchical approach-based taxonomy of this review.
Single-layered approaches are again classified into two types depending on how they model human activities: space-time approaches and sequential approaches. Space-time approaches view an input video as a 3-dimensional (XYT) volume while sequential approaches interpret it as a sequence of observations. Space-time approaches are further divided into three categories based on what features they use from the 3-D space-time volumes: volumes themselves, trajectories, or local interest point descriptors. Sequential approaches are classified depending on whether they use exemplar-based recognition methodologies or model-based recognition methodologies. Figure 2 shows a detailed taxonomy used for single-layered approaches covered in the review, together with a number of publications corresponding to each category.
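To make the two views of a video concrete, the following is a minimal Python sketch, not taken from any surveyed system. It assumes frames are small grayscale images stored as nested lists; the names `stack_frames`, `volume_shape`, and `as_observations`, and the choice of mean intensity as the per-frame feature, are illustrative assumptions only.

```python
from typing import List

Frame = List[List[int]]   # one grayscale image, indexed [y][x]
Volume = List[Frame]      # frames stacked along time, indexed [t][y][x]


def stack_frames(frames: List[Frame]) -> Volume:
    """Space-time view: concatenate equally sized frames along the time axis."""
    if not frames:
        raise ValueError("need at least one frame")
    height, width = len(frames[0]), len(frames[0][0])
    for frame in frames:
        if len(frame) != height or any(len(row) != width for row in frame):
            raise ValueError("all frames must share the same size")
    # Copy the rows so the volume does not alias the input frames.
    return [[row[:] for row in frame] for frame in frames]


def volume_shape(volume: Volume) -> tuple:
    """Return (T, Y, X) for a non-empty volume."""
    return (len(volume), len(volume[0]), len(volume[0][0]))


def as_observations(frames: List[Frame]) -> List[List[float]]:
    """Sequential view: one feature vector per frame.

    The single feature used here (mean intensity) is a hypothetical stand-in
    for whatever per-frame descriptor a real system would extract.
    """
    return [[sum(map(sum, f)) / (len(f) * len(f[0]))] for f in frames]


# Three tiny 2x2 frames whose brightness increases over time.
frames = [[[0, 0], [0, 0]], [[1, 1], [1, 1]], [[2, 2], [2, 2]]]
volume = stack_frames(frames)
pixel_over_time = [frame[1][0] for frame in volume]  # temporal slice at y=1, x=0
observations = as_observations(frames)
```

In this sketch the same input is either indexed jointly in space and time (`volume[t][y][x]`) or reduced to an ordered list of per-frame observations, which mirrors the distinction drawn between space-time and sequential approaches.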
Hierarchical approaches are classified based on the recognition methodologies they use: statistical approaches, syntactic approaches, and description-based approaches. Statistical approaches construct statistical state-based models concatenated hierarchically (e.g., layered hidden Markov models) to represent and recognize high-level human activities. Similarly, syntactic approaches use a grammar syntax such as stochastic context-free grammar (SCFG) to model sequential activities. Essentially, they model a high-level activity as a string of atomic-level activities. Description-based approaches represent human activities by describing sub-events of the activities and their temporal, spatial, and logical structures. Figure 3 presents lists of representative publications corresponding to each category.
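The idea of treating a high-level activity as a string of atomic-level activities can be illustrated with a toy recognizer. The sketch below uses a plain (non-stochastic) context-free grammar, so it omits the probabilities an SCFG would attach to each rule. The grammar, the activity names, and the function `recognizes` are hypothetical, and the recognizer assumes a grammar with no empty productions and no cycles of single-symbol rules.

```python
from functools import lru_cache

# Hypothetical toy grammar: a 'Fight' is an approach followed by one or more blows.
# Keys are non-terminals; any symbol that is not a key is an atomic action label.
GRAMMAR = {
    "Fight": [("Approach", "Blows")],
    "Blows": [("Strike",), ("Strike", "Blows")],
    "Approach": [("walk",)],
    "Strike": [("punch",), ("kick",)],
}


def recognizes(start, tokens):
    """Return True if `start` can generate exactly the given atomic-action string."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def derives(symbol, i, j):
        # Can `symbol` generate tokens[i:j]?
        if symbol not in GRAMMAR:  # terminal: must match exactly one token
            return j - i == 1 and tokens[i] == symbol
        return any(matches(rhs, i, j) for rhs in GRAMMAR[symbol])

    def matches(rhs, i, j):
        # Can the symbol sequence `rhs` generate tokens[i:j]?
        if len(rhs) == 1:
            return derives(rhs[0], i, j)
        head, rest = rhs[0], rhs[1:]
        # Each symbol consumes at least one token, which bounds the split point k.
        return any(
            derives(head, i, k) and matches(rest, k, j)
            for k in range(i + 1, j - len(rest) + 1)
        )

    return derives(start, 0, len(tokens))


is_fight = recognizes("Fight", ["walk", "punch", "kick", "punch"])
is_not_fight = recognizes("Fight", ["punch", "walk"])
```

Here the atomic actions are assumed to have been detected already by a lower layer; the grammar only checks whether their order forms a valid high-level activity.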
In addition, in Figures 2 and 3, we have indicated previous works that recognize human-object interactions and group activities by using different colors and by attaching ‘O’ (object) and ‘G’ (group) tags to the right-hand side. The recognition of human-object interactions requires the analysis of interplays between object recognition and activity analysis. This paper provides a survey on the methodologies focusing on the analysis of such interplays for the improved recognition of human activities. Similarly, the recognition of groups and the analysis of their structures is necessary for group activity detection, and we cover them as well in this review.
This review paper is organized as follows: Section 2 covers single-layered approaches. In Section 3, we review hierarchical recognition approaches for the analysis of high-level activities. Subsection 4.1 discusses recognition methodologies for interactions between humans and objects, while especially concentrating on how previous works handled interplays between object recognition and motion analysis. Subsection 4.2 presents works on group activity recognition. In Subsection 5.1, we review public datasets available and compare systems tested on them. In addition, Subsection 5.2 covers real-time systems for human activity recognition. Section 6 concludes the paper.
Fig. 2. Detailed taxonomy for single-layered approaches and the lists of selected publications corresponding to each category.
[Figure 3: chart of hierarchical approaches. Columns: statistical approaches, syntactic approaches, description-based approaches. Rows: human actions, human-human interactions, human-object interactions, group activities.]
Publications listed without a tag (human actions and human-human interactions): [Pinhanez and Bobick ’98], [Gupta et al. ’09], [Nguyen et al. ’05], [Intille and Bobick ’99], [Vu et al. ’03], [Ghanem et al. ’04], [Ryoo and Aggarwal ’06, ’09a], [Ivanov and Bobick ’00], [Joo and Chellappa ’06], [Oliver et al. ’02].
Publications tagged ‘O’ (human-object interactions): [Shi et al. ’04], [Yu and Aggarwal ’06], [Damen and Hogg ’09], [Siskind ’01], [Nevatia et al. ’03, ’04], [Ryoo and Aggarwal ’07], [Moore and Essa ’02], [Minnen et al. ’03].
Publications tagged ‘G’ (group activities): [Cupillard et al. ’02], [Gong and Xiang ’03], [Zhang et al. ’06], [Dai et al. ’08], [Ryoo and Aggarwal ’08].
Fig. 3. Detailed taxonomy for hierarchical approaches and the lists of publications corresponding to each category.
1.1 Comparison with previous review papers
There have been other related surveys on human activity recognition. Several previous reviews on human motion analysis [Cedras and Shah 1995; Gavrila 1999; Aggarwal and Cai 1999] discussed human action recognition approaches as a part of their review. Kruger et al. [2007] reviewed human action recognition approaches while classifying them based on the complexity of features involved in the action recognition process. Their review especially focused on the planning aspect of human action recognition, considering their potential application to robotics. The survey by Turaga et al. [2008] covered human activity recognition approaches, similar to ours. In their paper, approaches are first categorized based on the complexity of the activities that they want to recognize, and then classified in terms of the recognition methodologies they use.
However, most of the previous reviews have focused on the introduction and summarization of activity recognition methodologies, and are lacking in the aspect of comparing different types of human activity recognition approaches. In this review, we present inter-class and intra-class comparisons between approaches, while providing an overview of human activity recognition approaches categorized based on the approach-based taxonomy presented above. Comparisons among abilities of recognition methodologies are essential for one to take advantage of them. Our goal is to enable a reader (even one from a different field) to understand the context of human activity recognition’s developments, and comprehend advantages and disadvantages of different approach categories.
We use a more elaborate taxonomy and compare and contrast each approach category in detail. For example, differences between single-layered approaches and hierarchical approaches are discussed at the highest level of our review, while space-time approaches are compared with sequential approaches at an intermediate level. We present a comparison among abilities of previous systems within each class as well, pointing out what they are able to recognize and what they are not. Furthermore, our review covers recognition methodologies for complex human activities including human-object interactions and group activities, which previous reviews have not focused on. Finally, we discuss the public datasets used by the systems, and compare the recognition methodologies’ performances on the datasets.
2. SINGLE-LAYERED APPROACHES
Single-layered approaches recognize human activities directly from video data. These approaches consider an activity as a particular class of image sequences, and recognize the activity from an unknown image sequence (i.e., an input) by categorizing it into its class. Various representation methodologies and matching algorithms have been developed to enable the recognition system to make an accurate decision whether an image sequence belongs to a certain activity class or not. For the recognition from continuous videos, most single-layered approaches have adopted a sliding windows technique that classifies all possible sub-sequences. Single-layered approaches are most effective when a particular sequential pattern describing an activity can be captured from training sequences. Due to their nature, the main objective of the single-layered approaches has been to analyze relatively simple (and short) sequential movements of humans, such as walking, jumping, and waving.
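The sliding windows technique mentioned above can be sketched in a few lines of Python. This is an illustrative sketch under simplifying assumptions, not the procedure of any particular surveyed system: it uses a single fixed window length instead of all possible sub-sequence lengths, each frame is reduced to one scalar "motion energy" value, and the function names, the threshold, and the ‘waving’ label are hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def sliding_window_detect(
    frames: Sequence[float],
    window: int,
    stride: int,
    classify: Callable[[Sequence[float]], Optional[str]],
) -> List[Tuple[int, int, str]]:
    """Classify every fixed-length sub-sequence and keep the labelled ones.

    Returns (start, end, label) triples, with `end` exclusive.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    detections = []
    for start in range(0, len(frames) - window + 1, stride):
        label = classify(frames[start:start + window])
        if label is not None:
            detections.append((start, start + window, label))
    return detections


def toy_classifier(segment: Sequence[float]) -> Optional[str]:
    """Hypothetical classifier: call a segment 'waving' if its mean energy is high."""
    return "waving" if sum(segment) / len(segment) > 0.5 else None


# One scalar per frame; the activity occupies the middle of the video.
motion_energy = [0.0, 0.1, 0.9, 0.8, 0.9, 0.1, 0.0]
detections = sliding_window_detect(motion_energy, window=3, stride=1,
                                   classify=toy_classifier)
```

Overlapping windows typically fire on the same activity, as they do here, so a practical system would usually merge or suppress overlapping detections before reporting starting and ending times.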
In this review, we categorize single-layered approaches into two classes: space-time approaches and sequential approaches. Space-time approaches model a human activity as a particular 3-D volume in a space-time dimension or a set of features extracted from the volume. The video volumes are constructed by concatenating image frames along a time axis, and are compared to measure their similarities. On the other hand, sequential approaches treat a human activity as a sequence