首页 > 美文阅读

挖掘交通数据的时空相关性

更新时间:2023-07-27 03:14:06 阅读：评论：0

Discovering Spatio-Temporal Causal Interactions

in Trafﬁc Data Streams

Wei Liu

University of Sydney,

Sydney,Australia wei.liu@sydney.edu.au

Yu Zheng

Microsoft Rearch Asia,

Beijing,China

Sanjay Chawla

University of Sydney,

Sydney,Australia

sanjay.chawla@sydney.edu.au

Jing Yuan

University of Science and

Technology of China yuanjing@mail.ustc.edu

Xing Xie Microsoft Rearch Asia, Beijing,China

ABSTRACT

The detection of outliers in spatio-temporal traﬃc data is an important rearch problem in the data mining and knowl-edge discovery community.However to the best of our knowl-edge,the discovery of relationships,especially causal inter-actions,among detected traﬃc outliers has not been inves-tigated before.In this paper we propo algorithms which construct outlier causality trees bad on temporal and spa-tial properties of detected outliers.Frequent substructures of the causality trees reveal not only recurring interac-tions among spatio-temporal outliers,but potentialﬂaws in the design

of existing traﬃc networks.The eﬀectiveness and strength of our algorithms are validated by experiments on a very large volume of real taxi trajectories in an urban road network.

Categories and Subject Descriptors

H.2.8[Databa Applications]:Data mining

General Terms

Algorithms

Keywords

Spatio-temporal outliers;outlier causalities;frequent sub-structures;urban computing and planning;

1.INTRODUCTION

The increasing availability of location-acquisition tech-nologies including GPS and WIFI have resulted in huge volumes of spatio-temporal data,especially in the form of Permission to make digital or hard copies of all or part of this work for personal or classroom u is granted without fee provide

d that copies are not made or distributed for proﬁt or commercial advantage and that copies bear this notice and the full citation on theﬁrst page.To copy otherwi,to republish,to post on rvers or to redistribute to lists,requires prior speciﬁc permission and/or a fee.

KDD’11,August21–24,2011,San Diego,California,USA.

Copyright2011ACM978-1-4503-0813-7/$ajectories[2,5,7,15,14,19,17].Unusual patterns of mov-ing objects’trajectories generally reﬂect abnormal traﬃc streams on road networks,which could be caud by aperi-odic events including celebrations,parades,large-scale busi-ness promotions,protests,traﬃc control and traﬃc jams. Therefore,the detection of outliers/anomalies from trajec-tory data can help in nsing abnormal events and plan for their impact on the smoothﬂow of traﬃc.In this study, we treat both known(planned)and unknown(unplanned) events that behave diﬀerently from daily network traﬃcs as anomalies.

Challenges and Contributions

In order to successfully detect outliers and causal interac-tions among them,the following challenges need to be ad-dresd:(i)Heterogeneous traﬃc patterns:the traﬃc pat-terns on roads vary across days of a week and hours of a day. Diﬀerent road gments have often distinct time-variant traf-ﬁc p

atterns.It is diﬃcult to u one model to detect outliers across the road network at diﬀerent time periods.(ii)Data sparness and distribution skewness:even though we could have a large number of nsors(taxis)probing the traﬃc on roads,there are many roads that have only a small number of samples given a large size of road networks in a major city. Moreover,a few road gments are traveled by thousands of vehicles in a few hours,while some gments may be only driven on a few times in a day.The two properties to-gether result in unique challenges in detecting outliers from traﬃc data.(iii)Causality among outliers:we not only need to discover outliers from the traﬃc,but also infer causal re-lationships and interactions among them,especially given the large number of outliers that could be identiﬁed.So a challenge is how to detect the appearance,growth,disap-pearance and transformation of outliers by ,prop-agation of a traﬃc jam).

In this paper we design veral steps to address the above challenges and propo solutions to the problem of detect-ing spatio-temporal outliers and causal relationships among them from traﬃc data streams.We u contexts of road networks in this study,however,algorithms propod in this paper can be generally applied to domains of networking[12,

Figure 1:An example using traﬃc networks in the city of Beijing.Bad on major roads in the traﬃc network,the entire city (subﬁgure (a))is partitioned into regions (subﬁgure (b)).Trajectories of moving objects (such as a moving taxi shown by a blue trajectory in subﬁgure (c))connect neighboring regions,bad on which we create the notion of links (subﬁgure (d)).13]and climate change [18,23,8]etc.More speciﬁcally,the contributions we make in this paper are:

1.City-wide traﬃc modeling :we partition the urban area of a city into regions using the framework of the city’s road network.Then we build a region graph where a node is a region and a link captures the traﬃc ﬂow among two regions.We formulate the outlier detec-tion problem as identifying the most outlying “links”from the region graph in terms of the spatio-temporal properties of a link.

2.Outlier tree construction :we propo the STOTree al-gorithm bad on both spatial and temporal properties of detected outliers (which are certain “links”in a time frame)to construct outlier trees,which uncover causal relationships among the outliers.

3.Frequent outlier subtree discovery :we propo the fre-qentSubtree algorithm,inspired by association rule mining,which generates the most frequent sub-structure (subtree)from all discovered outlier trees.The fre-quent subtrees reveal recurrent abnormalities in the data and suggest inherent problems in existing road networks.The rest of the paper is structured as follows.In Section 3we introduce the overall framework of our model,includ-ing preliminary concepts and notations that we u in this paper.In Section 4we propo algorithms for discovering spatio-temporal outliers and causal relationships.Exper-iments and their analysis are reported in Section 5.We conclude in Section 6with directions for future work.

2.RELATED WORK

To the best of our knowledge this is the ﬁrst paper that propos the problem of discovering casual relationships among spatio-temporal outliers.However,there have been a num-ber of eﬀorts on detecting only outliers from spatially and temporally distributed data.For example,principle compo-ne

nt analysis (PCA)has been ud for network-wide anomaly detection [13,26,20,1].However,PCA results (as we will show)cannot capture volume heterogeneity and are also very nsitive to parameter ttings which are highly data depen-dent.Under certain circumstances,large anomalies in turn can eﬀect the PCA computation leading to both fal posi-tives and fal negatives [20].

The problem of outlier monitoring has also been studied in [2]which builds local clusters on trajectory streams and monitors outliers that are deﬁned by a “trajectory”(instead of a spatial link as ours).Another method is Sun et al.[22]where a measure,spatial local outlier measure (SLOM),is propod to capture the local behavior of datum in their spa-tial neighborhood.This measure takes into account the local stability around a data point and suppress the reporting of outliers in highly unstable areas.A generalized local sta-tistical (GLS)framework is propod in [6]which studies the performance of local bad methods on detecting outliers in geo-statistical data with either linear or nonlinear trends,and compares them against global bad methods.Wu et al.[23]design algorithms detecting the most abnormal dis-crepancy regions in precipitation data,where they u four sweep lines to form grids which are treated as regions.How-ever,none of the approaches model and capture tempo-ral relations (causalities)among detected outlying regions.Lee et al.[15,14]have designed a “partition-and-group”framework for clustering and detecting trajectory outliers.In their app

roach,they ﬁrst partition the trajectories into small gments and then u both distance and density to detect abnormal sub-trajectories.This is diﬀerent from our work as we detect abnormal regions and links of the entire traﬃc network (instead of objects moving in the network).Moving objects are usually associated with periodic be-havioral patterns,and there have been veral methods pro-pod to address the problem of detecting such periodic movements [3,4,10,9,16].Cao et al.[3,4]propod abbre-viated list tables (ALT)to ﬁnd subquences that appear periodically and frequently in data quences,but the pe-riodic patterns they detect are very nsitive to parameter ttings.Similarly,Elfeky et.al.[9]have propod speciﬁc deﬁnitions of periodicities and algorithms for identifying the periodic patterns.

名下房产查询3.OVERVIEW

In this ction,we introduce our notations,deﬁnitions and the main structure of the propod model.

农业供给侧改革3.1Preliminary Concepts

The overall traﬃc map is partitioned into regions (Rgn )bounded by high level (i.e.major)roads,each of which may consist of a number of road gments.Figure 1(a)and 1(b)demonstrate an example of region formations.

Definition 1.Trajectory :A trajectory T r is a trace cre-

Figure2:The overall structure of our model for detecting spatio-temporal outliers and their inter-causalities.

ated by a moving object in geographical space.A T r is rep-rented by of a t of time-ordered T r:p1→p2→...→p n,where each point consists of a geospatial co-ordinate t and a p=(longitude,latitude, timestamp).

Definition2.Transition:Given a trajectory T r:p1→p2→...→p n,there exists a transition S between Rgn1and Rgn2,if there exists adjacent points p i and p i+1(1≤i≤n+1)such that p i is in Rgn1and p i+1is in Rgn2,and Rgn1is not the same to Rgn2.A transition is associated with a leaving time(timestamp of p i)and an arriving time (timestamp of p i+1).

班主任系统Definition3.Link:A link(Lnk)is comprid of a pair of regions(<Rgn o,Rgn d>)indicating a virtual spatial con-nection between the origin region and the destination re-gion.There exists a link from one region(Rgn o)to another (Rgn d)if and only if there exists at least one object moving from Rgn o at timestamp ts i to Rgn d at ts i+1.Figure1(c) and1(d)give examples of links.

Definition4.Time frame:A time frame(tf)is a t of concutive time intervals1.Figure3(a)shows an example of a time frame.

Definition5.Spatio-temporal outlier:A spatio-tem-poral outlier(ST O)is a link who non-spatial and non-temporal attribute values are very diﬀerent from the values of its spatio-temporal neighbors.Spatio-temporal neighbors of a link Lnk i are the links who locations and timestamps are clo to tho of Lnk i.

Definition6.Outlier causality:ST O2(with a region pair<Rgn o2,Rgn d2>and a time frame tf2)is cau

d by ST O1(containing a region pair<Rgn o1,Rgn d1>and a time frame tf1)if and only if the following conditions hold true: (i)The destination of ST O1is the same as the origin of ST Rgn d1=Rgn o2);(ii)Time frames tf1and tf2 are concutive to each other and tf1is ahead of tf2. During the construction of an outlier tree,an outlier ST O i is a a child of another outlier ST O j if ST O i is caud by ST O j.

3.2Framework

The main structure of our model is illustrated in Figure2. The three main steps are preprocessing traﬃc data to build a region graph,detecting outliers andﬁnally discover causal 1We call a“time interval”a“timebin”in the rest of lationships between the discovered outliers.The cond and the third step have three and two sub-steps respectively. Details of the(sub-)steps are described in the following ction.

黄未

4.METHODOLOGY

In this ction,we provide details of our model as shown in Figure2.Speciﬁcally,we focus on the detection of spatio-temporal outliers bad on each link’s“distort”,construct outlier causality trees bad on the outliers,and discover the most frequent causal trees which are indicative of recur-ren

t traﬃc abnormalities.

4.1Building the Graph of Regions

有你好幸福

In our study,we assume the map of traﬃc network,the t of major roads,and the trajectories of objects are all known. Although it is more straightforward to apply a simple“n×m”grid on maps to deﬁne regions,cells of a grid are equal-sized and do not reﬂect natural diﬀerences of regions in a traﬃc network.So instead of using equal-sized grids,we deﬁne regions of a traﬃc network by road gments as illustrated in Figure1.In detail,we build a graph of regions according to the following three steps.

1.Region Partitioning:As shown in Figure1(a)and Fig-

ure1(b),we partition a city into dis-jointed regions using the major roads of the city.Here we employ Connected Components Labeling(an image gment method)[21]to partition a map into regions eﬀectively and eﬃciently,since the problem of subdivisions in a polygonal region is known to be NP-complete[11].

2.Formulating transitions:By scanning the trajectory

额手称庆什么意思

data t,we transfer each trajectory into a quence of transitions between pairs of regions in terms of deﬁ-nition2.As demonstrated in Figure1(c),a trajectory passing three regions a,b,and c results in two transi-tions:a⇒b,and b⇒c.

3.Generating links:If there is a transition generated be-

tween two regions,we connect the two regions with a link(refer to Deﬁnition3).In timebin j,a link i(Lnk i

=<Rgn o,Rgn d>)is associated with a feature vector of three properties f i,j≡<#Obj,Pct o,Pct d>:

(a)#Obj:Total number of objects on the

objects moving from Rgn o to Rgn d in this time-

bin);

(b)Pct o:The proportion of#Obj among all objects

moving out of Rgn o in this timebin;

(c)Pct d:The proportion of#Obj among all objects

moving into Rgn d in this timebin;

Then,using Figure1(d)as an example(where the number shown on each link is the number of transitions pertaining to the link),the property of link a⇒b is

i,j=<#Obj=2,Pct o=

2+3

=0.4,Pct d=2

2+6

=0.25>.

4.2Detecting Outliers from Graph Links Assume each time frame is comprid of aﬁxed number of q timebins.Given a time frame tf j,we denote the quence of feature values of a link Lnk i in this time frame by:

F i,j=< f i,j−q+1, f i,j−q+2,..., f i,j>.(1)

(a)An example of time frame deﬁned by the gment between two vertical(green)lines.

(b)Time frames to be compared with the one in(a)

Figure3:An example of the number of taxis on a certain link in four adjacent days.Gaps between gre

en vertical lines reprent time frames,each of which is comprid of v-eral timebins.One unit of timebin in x-axis reprents30 48timebins compri a day).The value of minDistort of the time frame in subﬁgure(a)is obtained by calculating the smallest diﬀerence between the time frame

in(a)and the ones at the same time in adjacent days as shown in subﬁgure(b).

For each link(Lnk i)in each time frame tf j,we calculate the distortion between two time frames(denoted by minDistort i,j) by arching for the minimum diﬀerence between tf j and the same time frames of the same days on concutive weeks. With this approach minDistort is capable of capturing the special pattern of traﬃc data that similar behaviors are ob-rved during the same time of diﬀerent days or the same day of diﬀerent weeks etc.

Algorithm1shows the procedure of calculating distorts.

In line7of the algorithm,we obtain the diﬀerence between two time frames of a link by computing their Euclidean dis-

tance:

Distance(tf j,tf t,Lnk i)=

q−1

k=0

f i,j−k− f i,t−k 2(2)

We u minDistort i,j obtained from Algorithm1as the “non-spatial and non-temporal attributes”(e Deﬁnition5) of each link in each time frame.Extreme values among minDistort of all links are identiﬁed as temporal outliers. By subtracting the min and dividing by the max the feature values of the links are in the range of[0,1].The normal-ization removes the eﬀect of diﬀerent regions and volume sizes.Another advantage of using minDistort is that it prevents the examination of many repeating patterns(where minDistort≈0).

Then for each time frame,there is a corresponding three dimensional vector(formed by features<#Obj,Pct o,Pct d>) shown in Figure4.As each point reprents a link,we iden-tify the most extreme points as outliers in that time frame. To normalize the eﬀect of variances along diﬀerent direc-tions,we u the Mahalanobis distance(instead of Euclidean distance)to measure the the extremeness of data points.We Algorithm1minDistort:calculating minimum distort of time quences

Input:Lnk i:a link;tf j:a time frame;t:number of adjacent weeks to check

Output:minDistort i,j:the degree of distort for link Lnk i in time frame tf j菖蒲怎么读

——————————————–

1:minDist←+Inﬁnity;

2:T←tf j±u weeks,u∈{−t,...,t}

3:for All time frames tf t in T do

4:if tf t overlap with tf j then

5:Continue;

6:end if

7:currentDist←Distance(tf j,tf t,Lnk i);

8:if currentDist<minDist then

9:minDist←currentDist;

10:end if

11:end for

12:Return minDist;

手拉着手

Figure4:An illustration of the three-dimensional unit cube formed by three features<#Obj,Pct o,Pct d>before nor-malization for computing the Mahalanobis distance.We la-bel the most“extreme”points among all points as outliers. For example,outlier points are the ones who distance are the farthest to the center of the data cluster.

u the Mahalanobis distance here in order to normalize the distance by the variance in diﬀerent directions.

In this way,the outliers detected are links who features have the largest diﬀerence from both their temporal neighbors(for using“minDistort”)and spa-tial neighbors(for being detected“among all links”)–so they are spatio-temporal outliers(STOs).Another ad-vantage of identifying“extreme points”as outliers is that it can detect abnormal links with either too low volumes or too high volumes since extremeness of points are bad on their Mahalanobis distances.

Now each STO is a spatial link associated with a time frame.We reprent a STO by its link Lnk i(containing an original region and a destination region)and its time frame tf ,STO i,j=<Rgn i,o,Rgn i,d,tf j>.

4.3Constructing Outlier Trees

We propo an algorithm named STOTree thatﬁnds out-lier dependencies by looking at the relationship of outliers from the earliest time frame through the last.The main insight of STOTree is that an outlier ST O1is a par-ent of another outlier ST O2if ST O1occurred before ST O2in time and they are spatially correlated.Al-

Algorithm2STOTree:constructing all outlier trees Input:ST Outlier:a t of spatial-temporal outliers of size t×k

where t is the number of time frames,and k is the number of outliers to examine in a time frame.

Output:ST OT rees:a list of roots of spatial-temporal trees.——————————————–

1:ST OT rees←an empty t{};

2:for Each time frame i(i∈(1,...,t))do

3:for Each outlier j(j∈(1,...,k))in time frame i do

4:ST ORoot i,j←FindAllChildren(ST Outlier i,j,i);

5:ST OT rees←ST OT rees∪ST ORoot i,j;

6:end for

7:end for

8:Return ST OT rees;

——————————————–

Subroutine:FindAllChildren(ST Outlier i,j,i)

9:if Time frame i is the last time frame then

10:Return ST Outlier i,j;

11:end if

12:ST Outlier i,j.subnodes←an empty t{};

13:for Each outlier u(u∈(1,...,k))in time frame i+1do 14:if ST OT rees contains ST Outlier i+1,u then

15:continue;

16:end if

17:if ST Outlier i,j.Rgn d==ST Outlier i+1,u.Rgn o then 18:ST Outlier i,j.subnodes←ST Outlier i,j.subnodes∪FindAllChildren(ST Outlier i+1,u,i+1);

19:end if

20:end for

21:Return ST Outlier i,j;

gorithm2demonstrates the process of constructing outlier trees from discovered outliers.Note the algorithm results in a collection of trees(a forest).The subroutine(Line9 to21)is a recursive function ud to retrieve all possible descendants of a node.For each time frame,this recursive function is called on each outlier of the current time frame to compare with each outlier of next time frame,unless the “current”outlier tree already contains outliers of next time frame(Line14to16).So the overall time complexity of the outlier tree construction process on each time frame is upper bounded by O(k2),where k is the number of outliers in a time frame.

We do not place a restriction on the maximum size of outlier trees in the STOTree algorithm,under the assump-tion that abnormal events caud by one single accident are not expected to last for a long time and the size outlier trees should not grow inﬁnitely.In Section5.3we provide empirical evidence that conﬁrms the maximum size of trees is usually small.

Now we give an example by using Figure5to demonstrate the process of Algorithm2for building outlier trees.Fig-ure5us top3outliers in three concutive time frames,so the input parameters in Algorithm2in this ca are k=3 and t=3.The algorithm starts from time frame1(Line2 of Algorithm2),a

nd for each of the top three outlying links (Line3to6),i.e.,A⇒B,C⇒D and E⇒F,the algo-rithm arches in time frame2(Line13to20)and checks whether there is any following link that can be a child of a previous link(Line17to19).This allows the algorithm to ﬁnd outlying links B⇒G and B⇒E as children of A⇒B; and similarly it identiﬁes link H⇒K in time frame3as a child of J⇒H in time frame2.Therefore two outlier trees are built up as shown in the right side of Figure5.In this way,Algorithm2scans through all time frames of traﬃc data we have,and builds a forest of various outlier

trees.Figure5:A synthetic example for demonstrating the process of building a forest of two outlier trees.The subﬁgure on the left illustrates top3outlying links in three concutive time frames,and the one on the right shows two outlier trees obtained from the time frames.

4.4Causal Outlier Detection

Denote by T the forest containing all outlier trees.The most signiﬁcant and recurring causal relationships corre-spond to the most frequent subtrees of T.The mechanism of discovering frequent subtrees from all outlier trees is in-spired by the process of mining frequent item ts,except that we design our own strategy to generate frequent subtree candidates(through node inrtion on trees).

The process of discovering frequent substructures from constructed outlier trees is shown in Algorithm3.Given a predeﬁned support threshold ,weﬁrstﬁnd all single nodes who supports exceed (Line3of the algorithm),then we u this t of frequent single nodes to form candidates of frequent subtrees.The“while”iteration(Line6to30)ﬁrst generates candidates of subtrees(Line9to15),and then checks the support of each candidate and performsﬁltering (Line18to29)according to .

When generating subtree candidates,new subtrees(who sizes are incread by one)are created by inrting a frequent single node into previous frequent subtrees.This node in-rtion process is gi

ven in Algorithm4.The single node to be inrted isﬁrst compared with the root of the tree,and is inrted as a subnode of the root(Line1to3of Algorithm4) if the root can be a parent of the single node and its existing children do not contain the single node.Otherwi,the sin-gle node is compared and checked whether it can be inrted into branches below the a recursive process shown in Line8to12).When counting the support of a candidate subtree,we increa the frequency of the candidate by one if all nodes(with their immediate subnodes)of the candidate have an exact match with a discovered outlier tree(Line21 to23).

The eﬀectiveness and strengths of our algorithms,STOTree and frequentSubtree,are evaluated in the next ction. 5.EXPERIMENTS AND ANALYSIS

In this ction we report on the experiments carried out on taxi trajectory data on the road network of Beijing city. Our experiments are conducted on a64bit rver with3.2 GHz CPU and8GB memory.We note that although we are using road traﬃc data,our methods and algorithms can be

本文发布于:2023-07-27 03:14:06，感谢您对本站的认可！

本文链接：https://www.wtabcd.cn/fanwen/fan/82/1119258.html

上一篇：二分类和多分类问题下的评价指标详析（Precision,Recall,F1，Micro，M。。。

下一篇：机器学习系列机器学习评价指标

标签：改革查询系统额手称庆农业名下班主任

留言与评论（共有 0 条评论）