PAKDD 2009, Bangkok, Thailand. April 29, 2009

Change Analysis in Spatial Data by Combining Contouring Algorithms with Supervised Density Functions


Chun Sheng Chen¹, Vadeerat Rinsurongkawong¹, Christoph F. Eick¹, and Michael D. Twa²
¹ Department of Computer Science, University of Houston
² College of Optometry, University of Houston

Abstract
Detecting changes in spatial datasets is important for many fields, such as early warning systems that monitor environmental conditions or sudden disease outbreaks, epidemiology, crime monitoring, and automatic surveillance. To address this need, this paper introduces a novel methodology and algorithms that discover patterns of change in spatial datasets.

Outline
Introduction
Contributions
Supervised Density Estimation
Contour Clustering Algorithm
Contour Polygons
Change Analysis Approaches
Change Analysis Predicates
Demonstration
Related Work
Summary and Future Work

1. Introduction
We are interested in finding which patterns emerge between two datasets, Oold and Onew, sampled at different time frames. Change analysis centers on identifying changes concerning interesting regions with respect to Oold and Onew. The approach employs supervised density functions [Jiang 2007] that create density maps from spatial datasets. Regions (contiguous areas in the spatial subspace) where density functions take high (or low) values are considered interesting by this approach. Interesting regions are identified using contouring techniques.

2. Contributions
In general, our work is a first step towards analyzing complex change patterns. The contributions of this paper include: 1) using density functions in a contouring algorithm; 2) conducting change analysis by comparing interestingness; 3) computing degrees of change using polygon operations; and 4) a novel change analysis approach that compares clusters derived from supervised density functions.

3. Supervised Density Estimation

Density estimation is called supervised because, in addition to the locations of the objects, it takes the variable of interest z(o) into account when measuring density. In contrast to past work in density estimation, our approach employs weighted influence functions to measure the density in a dataset O: the influence of an object o on a point v is weighted by z(o) and measured as the product of z(o) and a Gaussian kernel function.

The figure depicts an example of supervised density estimation. Figure a shows a dataset O whose objects belong to two classes, shown in blue and yellow. Each object has spatial attributes (x, y) and an attribute of interest z, where z = +1 if the object belongs to the yellow class and z = -1 if it belongs to the blue class. Figure b visualizes the supervised density function of O. Figure c shows the density contour map for the density threshold 10 (red) and -10 (blue).

In particular, the influence of an object $o \in O$ on a point $v \in F$ is defined as:
$$f_{\text{Influence}}(v, o) = z(o)\, e^{-\frac{d(v,o)^2}{2\sigma^2}}$$
where d(v, o) is the distance between v and o, and σ is the width of the Gaussian kernel.

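The overall influence of all data objects $o_i \in O$, $1 \le i \le n$, on a point $v \in F$ is measured by the density function $\psi_O(v)$, which is defined as follows:
$$\psi_O(v) = \sum_{i=1}^{n} f_{\text{Influence}}(v, o_i) = \sum_{i=1}^{n} z(o_i)\, e^{-\frac{d(v,o_i)^2}{2\sigma^2}}$$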

We assume that the objects o in a dataset $O = \{o_1, \dots, o_n\}$ have the form ((x, y), z), where (x, y) is the location of object o and z, denoted z(o), is the value of the variable of interest for o. Supervised density estimation considers not only the frequency with which spatial events occur but also the value of the variable of interest: density increases as frequency and z(o) increase.
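To make the influence and density definitions concrete, here is a minimal Python sketch of the supervised density function. It assumes Euclidean distance for d(v, o) and a caller-supplied kernel width sigma, neither of which the slides pin down; the names influence and supervised_density are hypothetical, not taken from the authors' code.

```python
import math
from typing import Sequence, Tuple

# An object o has the form ((x, y), z): a location and its value of interest z(o).
SpatialObject = Tuple[Tuple[float, float], float]


def influence(v: Tuple[float, float], o: SpatialObject, sigma: float) -> float:
    """Influence of object o on point v: z(o) times a Gaussian kernel of d(v, o)."""
    (x, y), z = o
    d_squared = (v[0] - x) ** 2 + (v[1] - y) ** 2  # squared Euclidean distance (assumed metric)
    return z * math.exp(-d_squared / (2.0 * sigma ** 2))


def supervised_density(v: Tuple[float, float], objects: Sequence[SpatialObject],
                       sigma: float) -> float:
    """psi_O(v): the sum of the influences of all objects in O on point v."""
    return sum(influence(v, o, sigma) for o in objects)


# Usage: two "yellow" objects (z = +1) near the origin, one "blue" object (z = -1) far away.
O = [((0.0, 0.0), 1.0), ((0.5, 0.0), 1.0), ((5.0, 5.0), -1.0)]
print(supervised_density((0.0, 0.0), O, sigma=1.0))  # positive: dominated by the z = +1 objects
print(supervised_density((5.0, 5.0), O, sigma=1.0))  # negative: dominated by the z = -1 object
```

Because z(o) can be negative, the density can be negative too, which is why a contour map can use both a positive and a negative threshold, as in Figure c.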

4. DCONTOUR: A Contour Clustering Algorithm
We have developed a contour clustering algorithm named DCONTOUR that combines contouring algorithms and density estimation techniques.

First, the space is subdivided into grid cells. Then, a density map is generated. Next, DCONTOUR computes contour intersection points on grid cell edges using binary search and interpolation. Finally, contour polygons are created by connecting the intersection points.
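The first steps above (subdividing the space into grid cells, generating the density map, and finding the cell edges the contour passes through) can be sketched in Python as follows. This is a hypothetical illustration, not the authors' DCONTOUR implementation: psi stands for any density function of a 2-D point, and the grid bounds and resolution nx, ny are assumed parameters. An edge is marked when its two endpoint densities lie on opposite sides of the threshold d, as in level curve tracing.

```python
import math
from typing import Callable, List, Tuple

Point = Tuple[float, float]
Edge = Tuple[Tuple[int, int], Tuple[int, int]]  # pair of (row, col) grid-point indices


def density_grid(psi: Callable[[Point], float], x_min: float, x_max: float,
                 y_min: float, y_max: float, nx: int, ny: int) -> List[List[float]]:
    """Evaluate psi at the (ny + 1) x (nx + 1) intersection points of an nx x ny grid."""
    dx = (x_max - x_min) / nx
    dy = (y_max - y_min) / ny
    return [[psi((x_min + i * dx, y_min + j * dy)) for i in range(nx + 1)]
            for j in range(ny + 1)]


def crossed_edges(grid: List[List[float]], d: float) -> List[Edge]:
    """Return the grid edges whose endpoint densities lie on opposite sides of d."""
    edges: List[Edge] = []
    rows, cols = len(grid), len(grid[0])
    for j in range(rows):
        for i in range(cols):
            # Check the edge to the right neighbour and the edge to the neighbour above.
            for nj, ni in ((j, i + 1), (j + 1, i)):
                if nj < rows and ni < cols:
                    if (grid[j][i] >= d) != (grid[nj][ni] >= d):
                        edges.append(((j, i), (nj, ni)))
    return edges


# Usage: a single object with z = +1 at the origin and 2 * sigma^2 = 1 (assumed values).
def psi(p: Point) -> float:
    return math.exp(-(p[0] ** 2 + p[1] ** 2))


grid = density_grid(psi, -2.0, 2.0, -2.0, 2.0, nx=4, ny=4)
edges = crossed_edges(grid, d=0.5)  # only the centre point reaches 0.5, so its 4 edges are marked
```

Comparing `>= d` on both endpoints classifies every grid point consistently, so a point whose density equals d exactly is treated as inside the contour.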

5. Contour Polygons
In our approach, interesting regions (clusters) are represented by polygons. Our change analysis operates on sets of polygons using polygon operations such as intersection, union, difference, and size (area).
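The polygon operations can be illustrated with a small self-contained Python sketch: the shoelace formula for area and Sutherland-Hodgman clipping for intersection. Sutherland-Hodgman is a swapped-in textbook technique that is only correct when the clipping polygon is convex (and, in this sketch, listed counter-clockwise); contour polygons are generally not convex, so a practical system would use a general polygon-clipping library instead. The sizes of union and difference then follow by inclusion-exclusion.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Polygon = List[Point]  # vertices in order; the first vertex is not repeated at the end


def area(poly: Polygon) -> float:
    """Polygon area via the shoelace formula (works for either vertex orientation)."""
    s = 0.0
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def clip(subject: Polygon, clipper: Polygon) -> Polygon:
    """Sutherland-Hodgman intersection; assumes `clipper` is convex and counter-clockwise."""
    def inside(p: Point, a: Point, b: Point) -> bool:
        # True if p lies on the left of (or on) the directed edge a -> b.
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0

    def intersect(p: Point, q: Point, a: Point, b: Point) -> Point:
        # Intersection of segment p-q with the infinite line through a and b.
        (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p, q, a, b
        den = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
        t = ((x1 - x3) * (y3 - y4) - (y1 - y3) * (x3 - x4)) / den
        return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))

    output = subject
    for a, b in zip(clipper, clipper[1:] + clipper[:1]):
        inp, output = output, []
        if not inp:
            break  # the intersection is already empty
        prev = inp[-1]
        for cur in inp:
            if inside(cur, a, b):
                if not inside(prev, a, b):
                    output.append(intersect(prev, cur, a, b))
                output.append(cur)
            elif inside(prev, a, b):
                output.append(intersect(prev, cur, a, b))
            prev = cur
    return output


# Usage: two overlapping 2 x 2 squares (hypothetical regions).
A = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
B = [(1.0, 1.0), (3.0, 1.0), (3.0, 3.0), (1.0, 3.0)]
inter = area(clip(A, B))            # |A ∩ B| = 1.0
union = area(A) + area(B) - inter   # |A ∪ B| = 7.0 by inclusion-exclusion
difference = area(A) - inter        # |A minus B| = 3.0
```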


6. Change Analysis Approaches

The change analysis approach introduced in this paper is summarized by the flow on the left of the figure. First, an interestingness perspective is defined on the variable of interest. Next, a density map is created by supervised density estimation with respect to that interestingness perspective. Then, contour clusters that represent interesting regions are generated by DCONTOUR. Finally, changes between the interesting regions of two consecutive time frames are analyzed using a set of change predicates.

7. Change Analysis Predicates
We introduce basic predicates that capture different relationships for change analysis. Change analysis predicates operate on polygons. The agreement between r and r' can be computed as follows:
$$\text{Agreement}(r, r') = \frac{|r \cap r'|}{|r \cup r'|}$$
The most similar region r' in X' with respect to r in X is the region r' for which Agreement(r, r') has the highest value.

Given two clusterings X and X' for Onew and Oold, respectively, the relationships between the regions that belong to X and X' can be analyzed. Let r be a region in X and r' be a region in X'. The agreement of r and r' is the size of the intersection of r and r' divided by the size of their union.
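A minimal Python sketch of Agreement and of selecting the most similar region follows. For simplicity it assumes, unlike the polygon representation used in the paper, that each region is approximated by the set of grid cells it covers, so region size is a cell count; the names agreement and most_similar are hypothetical.

```python
from typing import FrozenSet, Iterable, Optional, Tuple

Cell = Tuple[int, int]       # (row, col) index of a grid cell
Region = FrozenSet[Cell]     # a region approximated by the grid cells it covers


def agreement(r: Region, r_prime: Region) -> float:
    """Agreement(r, r') = |r ∩ r'| / |r ∪ r'|, with size measured in grid cells."""
    union = r | r_prime
    # Two empty regions are assigned agreement 0.0 (an assumed convention).
    return len(r & r_prime) / len(union) if union else 0.0


def most_similar(r: Region, candidates: Iterable[Region]) -> Optional[Region]:
    """Return the candidate r' with the highest Agreement(r, r'), or None if there is none."""
    return max(candidates, key=lambda rp: agreement(r, rp), default=None)


# Usage with toy regions (hypothetical data).
r = frozenset({(0, 0), (0, 1), (1, 0), (1, 1)})
a = frozenset({(1, 1), (1, 2), (2, 1), (2, 2)})   # shares 1 of 7 covered cells with r
b = frozenset({(0, 0), (0, 1), (1, 0)})           # shares 3 of 4 covered cells with r
print(most_similar(r, [a, b]) == b)               # True
```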

In addition to agreement, we also define the predicates novelty, relative novelty, disappearance, and relative disappearance:
$$\text{Novelty}(r') = r' \setminus (r_1 \cup \dots \cup r_k)$$
$$\text{Relative-Novelty}(r') = \frac{|r' \setminus (r_1 \cup \dots \cup r_k)|}{|r'|}$$
$$\text{Disappearance}(r) = r \setminus (r'_1 \cup \dots \cup r'_k)$$
$$\text{Relative-Disappearance}(r) = \frac{|r \setminus (r'_1 \cup \dots \cup r'_k)|}{|r|}$$
We claim that these and similar measurements are useful for identifying what is new in a changing environment. Moreover, the predicates introduced so far can be used as building blocks to define more complex predicates.

Let $r, r_1, r_2, \dots, r_k$ be regions discovered at time t, and $r', r'_1, r'_2, \dots, r'_k$ be regions obtained for time t+1. Novelty captures regions that were not interesting in the past; conversely, disappearance identifies regions where those characteristics are disappearing. Relative novelty and relative disappearance express novelty and disappearance as a fraction of the size of the region under consideration.
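The four predicates translate directly into set operations. The Python sketch below again assumes, for simplicity, that regions are approximated by sets of grid cells rather than polygons, so that set difference and union stand in for the polygon operations; all function names are hypothetical.

```python
from typing import FrozenSet, Sequence, Tuple

Cell = Tuple[int, int]       # (row, col) index of a grid cell
Region = FrozenSet[Cell]     # a region approximated by the grid cells it covers


def _cover(regions: Sequence[Region]) -> Region:
    """Union r_1 ∪ ... ∪ r_k of a list of regions (an empty list gives an empty region)."""
    return frozenset().union(*regions)


def novelty(r_new: Region, old_regions: Sequence[Region]) -> Region:
    """Novelty(r'): the part of r' (time t+1) not covered by any region of time t."""
    return r_new - _cover(old_regions)


def relative_novelty(r_new: Region, old_regions: Sequence[Region]) -> float:
    """|Novelty(r')| / |r'|; assumes r' is non-empty."""
    return len(novelty(r_new, old_regions)) / len(r_new)


def disappearance(r_old: Region, new_regions: Sequence[Region]) -> Region:
    """Disappearance(r): the part of r (time t) not covered by any region of time t+1."""
    return r_old - _cover(new_regions)


def relative_disappearance(r_old: Region, new_regions: Sequence[Region]) -> float:
    """|Disappearance(r)| / |r|; assumes r is non-empty."""
    return len(disappearance(r_old, new_regions)) / len(r_old)


# Usage with toy regions (hypothetical data).
old = [frozenset({(0, 0), (0, 1)}), frozenset({(5, 5)})]
r_new = frozenset({(0, 1), (0, 2), (0, 3), (0, 4)})
print(sorted(novelty(r_new, old)))               # [(0, 2), (0, 3), (0, 4)]
print(relative_novelty(r_new, old))              # 0.75
print(relative_disappearance(old[0], [r_new]))   # 0.5
```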

8. Demonstration
We uniformly sampled earthquakes from two time periods:
Oold: January 1986 to November 1991
Onew: December 1991 to January 1996
Each dataset contains 4132 earthquakes. We analyze changes in strong positive or negative correlations between the depth and the severity of earthquakes. The variable of interest z(o) is defined as follows:

We demonstrate the proposed methodology and algorithms by analyzing the co-location of earthquake depth and severity.


Contour polygons generated by DCONTOUR for Oold (upper-left figure) and Onew (lower-right figure).
The figures show areas with positive and negative correlations in dataset Oold on the left and in dataset Onew on the right. Red polygons are areas with positive correlations (areas where earthquakes are deep and strong, or shallow and weak). Blue polygons indicate areas with significant negative correlations (deep earthquakes tend to be less severe and shallow earthquakes tend to be strong).


Overlap of contour polygons of Oold and Onew; novel polygons of Onew with respect to Oold.
$$\text{Agreement}(r, r') = \frac{|r \cap r'|}{|r \cup r'|}$$
$$\text{Novelty}(r') = r' \setminus (r_1 \cup \dots \cup r_k)$$
The upper-left figure shows the intersection regions of the datasets Oold and Onew (orange fill: positively correlated areas; green fill: negatively correlated areas). The lower-right figure shows the novel polygons in dataset Onew with respect to dataset Oold.


Contour polygons generated by DCONTOUR for Oold (left figure) and Onew (right figure).

Overlap of contour polygons of Oold and Onew; novel polygons of Onew with respect to Oold.
The upper figures show areas with positive and negative correlations in dataset Oold on the left and in dataset Onew on the right. Red polygons are areas with positive correlations (areas where earthquakes are deep and strong, or shallow and weak); blue polygons indicate areas with significant negative correlations (deep earthquakes tend to be less severe and shallow earthquakes tend to be strong). The lower-left figure shows the intersection regions of the two datasets in the upper figures (orange fill: positively correlated areas; green fill: negatively correlated areas). The lower-right figure shows the novel polygons in dataset Onew with respect to dataset Oold.

9. Related Work
Our change analysis approach relies on clustering analysis. The advantage of our change analysis approaches over previous work [Asur 2007], [Fleder 2006], [Spiliopoulou 2006] is that we can detect various types of changes in data with continuous attributes and unknown object identity. Existing contour plotting algorithms can be seen as variations of two basic approaches. Level curve tracing algorithms [Watson 1992] scan a grid and mark the grid-cell boundaries that are crossed by the level curve; contour polygons are then created by connecting the marked edges. Recursive subdivision algorithms [Bruss 1977] start with a coarse initial grid and recursively divide the grid cells that are crossed by the level curve. DCONTOUR uses level curve tracing.


10. Summary
Developing techniques for discovering change in spatial datasets is important. The advantages of the work described here are that it provides methods that detect change for continuous attributes and for objects that are not identified a priori. In this paper, we propose change analysis techniques that compare clusters for the old and new data using a set of change predicates. We also introduce a novel contour clustering algorithm, DCONTOUR, that combines supervised density functions with contouring algorithms.

The Ultimate Vision of this Research
1) Developing change analysis systems that automatically detect important changes in spatial datasets
2) Providing reusable change analysis components that can be used for any problem that requires continuous monitoring of spatio-temporal events
3) Embedding the change analysis system into larger systems that solve critical problems of our society, such as automatic surveillance systems, early warning systems, and diagnostic tools
4) Mining the patterns of change themselves to detect complex patterns such as the progression of pollution and diseases
5) Contributing to important scientific disciplines, such as epidemiology, that require the analysis of complex patterns of change

Thank you for your attention

Questions?

DCONTOUR algorithm
Input: density function ψ, density threshold d.
Output: density polygons for density threshold d.
1. Subdivide the space into D grid cells.
2. Compute densities at the grid intersection points using the density function ψ.
3. Compute contour intersection points b on grid cell edges where ψ(b) = d, using binary search and interpolation.
4. Compute contour polygons from the contour intersection points b.
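Step 3 can be sketched in Python as follows. The slides do not say how binary search and interpolation are combined, so this hypothetical helper assumes a fixed number of bisection steps along the edge, followed by one linear interpolation of ψ between the final bracket endpoints.

```python
from typing import Callable, Tuple

Point = Tuple[float, float]


def contour_point_on_edge(psi: Callable[[Point], float], p: Point, q: Point,
                          d: float, iterations: int = 20) -> Point:
    """Approximate the point b on edge p-q where psi(b) = d.

    Requires psi(p) and psi(q) to lie on opposite sides of d. Bisection shrinks
    the bracket [lo, hi]; a final linear interpolation of psi between lo and hi
    gives the returned point.
    """
    if (psi(p) >= d) == (psi(q) >= d):
        raise ValueError("the level d is not crossed between p and q")
    lo, hi = p, q
    f_lo, f_hi = psi(lo) - d, psi(hi) - d
    for _ in range(iterations):
        mid = ((lo[0] + hi[0]) / 2.0, (lo[1] + hi[1]) / 2.0)
        f_mid = psi(mid) - d
        if (f_mid >= 0) == (f_lo >= 0):
            lo, f_lo = mid, f_mid      # mid is on lo's side of the level d
        else:
            hi, f_hi = mid, f_mid      # mid is on hi's side of the level d
    t = f_lo / (f_lo - f_hi)           # nonzero: f_lo and f_hi lie on opposite sides of 0
    return (lo[0] + t * (hi[0] - lo[0]), lo[1] + t * (hi[1] - lo[1]))


# Usage: psi(x, y) = x**2 reaches d = 0.25 at x = 0.5 on the edge from (0, 0) to (1, 0).
b = contour_point_on_edge(lambda v: v[0] ** 2, (0.0, 0.0), (1.0, 0.0), d=0.25)
print(b)  # (0.5, 0.0)
```

Bisection alone guarantees the bracket contains a crossing; the interpolation step then refines the estimate cheaply, and is exact whenever ψ is linear along the edge.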