Change Analysis in Spatial Data by Combining Contouring
Algorithms with Supervised Density Functions
PAKDD 2009, Bangkok, Thailand. April 29, 2009

Chun Sheng Chen¹, Vadeerat Rinsurongkawong¹, Christoph F. Eick¹, and Michael D. Twa²
¹ Department of Computer Science, University of Houston
² College of Optometry, University of Houston

Abstract
Detecting changes in spatial datasets is important for many fields, such as early warning systems that monitor environmental conditions or sudden disease outbreaks, epidemiology, crime monitoring, and automatic surveillance. To address this need, this paper introduces a novel methodology and algorithms that discover patterns of change in spatial datasets.

Outline
Introduction
Contributions
Supervised Density Estimation
Contour Clustering Algorithm
Contour Polygons
Change Analysis Approaches
Change Analysis Predicates
Demonstration
Related Work
Summary and Future Work
1. Introduction
We are interested in finding the patterns that emerged between two datasets, O_old and O_new, sampled at different time frames. Change analysis centers on identifying changes concerning interesting regions with respect to O_old and O_new. The approach employs supervised density functions [Jiang 2007] that create density maps from spatial datasets. Regions (contiguous areas in the spatial subspace) where density functions take high (or low) values are considered interesting by this approach, and these interesting regions are identified using contouring techniques.
2. Contributions
In general, our work is a first step towards analyzing complex change patterns. The contributions of this paper include: 1) using density functions in contouring algorithms; 2) conducting change analysis by comparing interestingness; 3) computing degrees of change using polygon operations; 4) introducing a novel change analysis approach that compares clusters derived from supervised density functions.
3. Supervised Density Estimation
Density estimation is called supervised because, in addition to the density based on the locations of objects, we take the variable of interest z(o) into account when measuring density. In contrast to past work in density estimation, our approach employs weighted influence functions to measure the density in a dataset O: the influence of o on v is weighted by z(o) and measured as the product of z(o) and a Gaussian kernel function.

The figure depicts an example result of supervised density estimation. The objects in dataset O belong to two classes, shown in blue and yellow. Each object has spatial attributes (x, y) and an attribute of interest z, where z = +1 if the object belongs to the yellow class and z = -1 if it belongs to the blue class. Figure (b) visualizes the supervised density function of O, and Figure (c) shows the density contour map for density threshold 10 in red and -10 in blue.

In particular, the influence of object o ∈ O on a point v ∈ F is defined as:
$$f_{\text{Influence}}(v, o) = z(o)\, e^{-\frac{\mathrm{dist}(v, o)^2}{2\sigma^2}}$$
where dist(v, o) is the distance between v and the location of o, and σ is the width of the Gaussian kernel. The overall influence of all data objects o_i ∈ O, 1 ≤ i ≤ n, on a point v ∈ F is measured by the density function ψ_O(v), defined as follows:
$$\psi_O(v) = \sum_{i=1}^{n} f_{\text{Influence}}(v, o_i) = \sum_{i=1}^{n} z(o_i)\, e^{-\frac{\mathrm{dist}(v, o_i)^2}{2\sigma^2}}$$
We assume that objects o in a dataset O = {o_1, ..., o_n} have the form ((x, y), z), where (x, y) is the location of object o and z, denoted as z(o), is the value of the variable of interest of object o. Supervised density estimation considers not only the frequency with which spatial events occur but also the value of the variable of interest: density increases as frequency and z(o) increase.
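To make the weighting concrete, here is a minimal Python sketch of the influence and density functions described above (the product of z(o) and a Gaussian kernel, summed over all objects). It assumes Euclidean distance and a user-chosen kernel width `sigma`; the function names `influence` and `supervised_density` and the toy dataset are hypothetical, not taken from the paper's implementation.

```python
import math


def influence(v, o, sigma):
    """Influence of object o = ((x, y), z) on point v = (vx, vy):
    z(o) times a Gaussian kernel of the Euclidean distance between v and o."""
    (x, y), z = o
    dist_sq = (v[0] - x) ** 2 + (v[1] - y) ** 2
    return z * math.exp(-dist_sq / (2 * sigma ** 2))


def supervised_density(v, objects, sigma):
    """psi_O(v): sum of the influences of all objects in O on point v."""
    return sum(influence(v, o, sigma) for o in objects)


# Toy dataset in the spirit of the figure: z = +1 ("yellow"), z = -1 ("blue")
O = [((0.0, 0.0), +1), ((0.2, 0.1), +1), ((3.0, 3.0), -1)]
print(supervised_density((0.0, 0.0), O, sigma=0.5))  # positive: near the +1 objects
print(supervised_density((3.0, 3.0), O, sigma=0.5))  # negative: near the -1 object
```

Because z(o) can be negative, the density can take both high positive and strongly negative values, which is why the contour map uses one threshold for each sign.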
4. DCONTOUR: A Contour Clustering Algorithm
We have developed a contour clustering algorithm named DCONTOUR that combines contouring algorithms and density estimation techniques:
1. First, the space is subdivided into grid cells.
2. Then, a density map is generated.
3. DCONTOUR computes intersection points on grid cell edges using binary search and interpolation.
4. Finally, contour polygons are created by connecting the intersection points.
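Step 3 is the least obvious part of the algorithm, so the following hedged Python sketch shows one way to locate a contour point on a single grid-cell edge. It assumes the contour for threshold d crosses the edge (the density minus d changes sign between its endpoints); the fixed number of bisection steps followed by one linear interpolation is our reading of "binary search and interpolation", and the name `edge_crossing` is hypothetical.

```python
import math


def edge_crossing(psi, p, q, d, steps=20):
    """Find a point b on the grid-cell edge p -> q with psi(b) ~= d.

    Assumes psi(p) - d and psi(q) - d have opposite signs. A fixed number
    of bisection steps narrows the bracket; linear interpolation then
    places b inside the final bracket.
    """
    def at(t):  # point at fraction t along the edge
        return (p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1]))

    lo, hi = 0.0, 1.0
    f_lo, f_hi = psi(*at(lo)) - d, psi(*at(hi)) - d
    if f_lo == 0:
        return at(lo)
    if f_hi == 0:
        return at(hi)
    if f_lo * f_hi > 0:
        raise ValueError("the contour does not cross this edge")
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        f_mid = psi(*at(mid)) - d
        if f_mid == 0:
            return at(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid  # crossing lies in [mid, hi]
        else:
            hi, f_hi = mid, f_mid  # crossing lies in [lo, mid]
    # Linear interpolation of the zero of psi - d between the bracket ends
    return at(lo + (hi - lo) * f_lo / (f_lo - f_hi))


def gauss(x, y):
    """Toy density with a single peak at the origin."""
    return math.exp(-(x * x + y * y))


b = edge_crossing(gauss, (0.0, 0.0), (2.0, 0.0), d=0.3)
print(b)  # close to (sqrt(-ln 0.3), 0.0), i.e. about (1.0973, 0.0)
```

Bisection alone would need many more density evaluations for the same accuracy; the final interpolation exploits the fact that the density is nearly linear over a tiny bracket.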
5. Contour Polygons
In our approach, interesting regions (clusters) are represented by polygons. Our change analysis operates on sets of polygons using polygon operations such as intersection, union, difference, and size (area).
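The polygon operations are named but not spelled out here. As a hedged, dependency-free Python sketch: the shoelace formula gives the size of a polygon, and Sutherland-Hodgman clipping gives the intersection, but only when the clipping polygon is convex, which contour polygons generally are not (a general-purpose geometry library such as Shapely would be needed in practice). The size of a union then follows from inclusion-exclusion, |r ∪ r'| = |r| + |r'| - |r ∩ r'|. All function names are hypothetical.

```python
def area(poly):
    """Size |r| of a simple polygon via the shoelace formula."""
    s = 0.0
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def clip_convex(subject, clip):
    """Sutherland-Hodgman clipping: subject ∩ clip, where `clip` must be
    CONVEX with counter-clockwise vertex order."""
    def inside(p, a, b):  # p lies left of (or on) the directed edge a -> b
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0

    def cross_point(p, q, a, b):  # where segment p -> q meets the line through a, b
        dx, dy = q[0] - p[0], q[1] - p[1]
        ex, ey = b[0] - a[0], b[1] - a[1]
        t = ((a[0] - p[0]) * ey - (a[1] - p[1]) * ex) / (dx * ey - dy * ex)
        return (p[0] + t * dx, p[1] + t * dy)

    output = list(subject)
    for a, b in zip(clip, clip[1:] + clip[:1]):
        polygon, output = output, []
        for k, cur in enumerate(polygon):
            prev = polygon[k - 1]
            if inside(cur, a, b):
                if not inside(prev, a, b):
                    output.append(cross_point(prev, cur, a, b))
                output.append(cur)
            elif inside(prev, a, b):
                output.append(cross_point(prev, cur, a, b))
    return output


def agreement_convex(r, r2):
    """|r ∩ r'| / |r ∪ r'|, with the union size from inclusion-exclusion."""
    inter = area(clip_convex(r, r2))
    return inter / (area(r) + area(r2) - inter)


r_old = [(0, 0), (2, 0), (2, 2), (0, 2)]
r_new = [(1, 1), (3, 1), (3, 3), (1, 3)]
print(area(clip_convex(r_old, r_new)))  # 1.0
print(agreement_convex(r_old, r_new))   # 1/7, about 0.142857
```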
6. Change Analysis Approaches
The change analysis approach introduced in this paper is summarized by the flow on the left of the figure:
1. First, an interestingness perspective is defined on the variable of interest.
2. Next, a density map is created by supervised density estimation techniques with respect to the interestingness perspective.
3. Then, contour clusters that represent interesting regions are generated by DCONTOUR.
4. Finally, changes between the interesting regions of two consecutive time frames are analyzed using a set of change predicates.

7. Change Analysis Predicates
We introduce basic predicates that capture different relationships for change analysis; these predicates operate on polygons. Given two clusterings X and X' for O_old and O_new, respectively, the relationships between the regions that belong to X and X' can be analyzed. Let r be a region in X and r' a region in X'. The agreement of r and r' is the area of their intersection divided by the area of their union:
$$\text{Agreement}(r, r') = \frac{|r \cap r'|}{|r \cup r'|}$$
The most similar region r' in X' with respect to r in X is the region r' for which Agreement(r, r') has the highest value.
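A short Python sketch of Agreement and of selecting the most similar region follows. To keep it self-contained it swaps exact polygon areas for a raster approximation: each region is a hypothetical set of grid cells, so |r ∩ r'| and |r ∪ r'| become cell counts (a Jaccard index on cells). The helper `box` and the example regions are illustrative only.

```python
from typing import FrozenSet, List, Tuple

Cell = Tuple[int, int]
Region = FrozenSet[Cell]


def agreement(r: Region, r2: Region) -> float:
    """|r ∩ r'| / |r ∪ r'|, with area approximated by the number of cells."""
    union = r | r2
    return len(r & r2) / len(union) if union else 0.0


def most_similar(r: Region, candidates: List[Region]) -> Region:
    """The region r' among the candidates with the highest Agreement(r, r')."""
    return max(candidates, key=lambda r2: agreement(r, r2))


def box(x0: int, y0: int, x1: int, y1: int) -> Region:
    """Hypothetical helper: all cells (i, j) with x0 <= i < x1, y0 <= j < y1."""
    return frozenset((i, j) for i in range(x0, x1) for j in range(y0, y1))


r = box(0, 0, 4, 4)            # a region in X (16 cells)
X_new = [box(2, 0, 6, 4),      # 8 shared cells, union 24 -> 1/3
         box(1, 1, 4, 4),      # contained in r: 9 / 16
         box(10, 10, 12, 12)]  # disjoint -> 0
best = most_similar(r, X_new)
print(agreement(r, best))  # 0.5625
```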
In addition to agreement, we define the predicates novelty, relative-novelty, disappearance, and relative-disappearance. Let r, r_1, ..., r_k be regions discovered at time t, and r', r'_1, ..., r'_k be regions obtained for time t+1. Novelty captures regions that were not interesting in the past; disappearance, on the other hand, discovers regions where those characteristics are disappearing. Relative-novelty and relative-disappearance express novelty and disappearance as fractions of the region's size:
$$\begin{aligned}
\text{Novelty}(r') &= r' - (r_1 \cup \dots \cup r_k)\\
\text{Relative-Novelty}(r') &= \frac{|r' - (r_1 \cup \dots \cup r_k)|}{|r'|}\\
\text{Disappearance}(r) &= r - (r'_1 \cup \dots \cup r'_k)\\
\text{Relative-Disappearance}(r) &= \frac{|r - (r'_1 \cup \dots \cup r'_k)|}{|r|}
\end{aligned}$$
We claim that these and similar measurements are useful for identifying what is new in a changing environment. Moreover, the predicates introduced so far can be used as building blocks to define more complex predicates.
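The novelty and disappearance predicates are set differences against a union of regions. The hedged Python sketch below uses the same raster simplification as an assumption: regions are hypothetical sets of grid cells, so the set operations are exact on cells and sizes are cell counts. Old regions are r_1, ..., r_k (time t) and new regions are r'_1, ..., r'_k (time t+1).

```python
from functools import reduce


def union_all(regions):
    """r_1 ∪ ... ∪ r_k for regions given as sets of grid cells."""
    return reduce(lambda acc, reg: acc | reg, regions, frozenset())


def novelty(r_new, old_regions):
    """Novelty(r') = r' - (r_1 ∪ ... ∪ r_k): the part of r' not interesting at time t."""
    return r_new - union_all(old_regions)


def relative_novelty(r_new, old_regions):
    return len(novelty(r_new, old_regions)) / len(r_new)


def disappearance(r_old, new_regions):
    """Disappearance(r) = r - (r'_1 ∪ ... ∪ r'_k)."""
    return r_old - union_all(new_regions)


def relative_disappearance(r_old, new_regions):
    return len(disappearance(r_old, new_regions)) / len(r_old)


r_old = frozenset({(0, 0), (0, 1), (1, 0), (1, 1)})  # region at time t
r_new = frozenset({(1, 0), (1, 1), (2, 0), (2, 1)})  # region at time t+1
print(sorted(novelty(r_new, [r_old])))        # [(2, 0), (2, 1)]
print(relative_novelty(r_new, [r_old]))       # 0.5
print(relative_disappearance(r_old, [r_new]))  # 0.5
```

Combining these building blocks (for example, thresholding relative-novelty) is one way to express the more complex predicates the text mentions.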
8. Demonstration
We demonstrate our proposed methodologies and algorithms by analyzing the co-location of depth and severity of earthquakes. We uniformly sampled earthquakes for two periods:
O_old: January 1986 to November 1991
O_new: December 1991 to January 1996
Each dataset contains 4132 earthquakes. We analyze changes in strong positive or negative correlations between the depth of an earthquake and its severity. The variable of interest z(o) is defined as follows:
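The formula for z(o) is not given here. Purely as a hypothetical illustration consistent with how the results are interpreted (positive for deep-and-strong or shallow-and-weak earthquakes, negative for deep-and-weak or shallow-and-strong ones), the sketch below assumes z(o) is the product of the z-scores of depth and severity. This definition, the use of magnitude as severity, and the toy values are all assumptions, not the paper's stated method.

```python
import statistics


def z_scores(values):
    """Standardize values: (v - mean) / population standard deviation."""
    mu = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [(v - mu) / sd for v in values]


def z_of_interest(depths, severities):
    """HYPOTHETICAL z(o): product of the z-scores of depth and severity.
    Positive when an earthquake is deep and strong or shallow and weak,
    negative when it is deep and weak or shallow and strong."""
    return [a * b for a, b in zip(z_scores(depths), z_scores(severities))]


depths = [10.0, 300.0, 15.0, 280.0]  # km (toy values)
severities = [4.0, 6.8, 6.5, 4.2]    # e.g. magnitude (toy values)
z = z_of_interest(depths, severities)
print([value > 0 for value in z])  # [True, True, False, False]
```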
[Figure: contour polygons generated by DCONTOUR for O_old (left) and O_new (right).]
The figures show areas with positive and negative correlations in dataset O_old on the left and in dataset O_new on the right. Red polygons are areas with positive correlations (areas where earthquakes are deep and strong, or shallow and weak). Blue polygons indicate areas with significant negative correlations (deep earthquakes tend to be less severe, and shallow earthquakes tend to be strong).
[Figures: overlap of the contour polygons of O_old and O_new; novel polygons of O_new with respect to O_old.]
$$\text{Agreement}(r, r') = \frac{|r \cap r'|}{|r \cup r'|}, \qquad \text{Novelty}(r') = r' - (r_1 \cup \dots \cup r_k)$$
The upper-left figure shows the intersection regions of the datasets O_old and O_new (orange fill: positively correlated areas; green fill: negatively correlated areas). The lower-right figure shows the novel polygons in dataset O_new with respect to dataset O_old.
9. Related Work
Our change analysis approach relies on clustering analysis. The advantage of our change analysis approaches over previous work [Asur 2007], [Fleder 2006], [Spiliopoulou 2006] is that we can detect various types of changes in data with continuous attributes and unknown object identity.
Existing contour plotting algorithms can be seen as variations of two basic approaches:
- Level curve tracing algorithms [Watson 1992] scan a grid and mark the grid-cell boundaries that the level curve passes through; contour polygons are created by connecting the marked edges.
- Recursive subdivision algorithms [Bruss 1977] start with a coarse initial grid and recursively divide the grid cells that the level curve passes through.
DCONTOUR uses level curve tracing.
10. Summary
Developing techniques for discovering change in spatial datasets is important. The work described here provides methods to detect change for continuous (numeric) attributes and for objects whose identity is not known a priori. We propose change analysis techniques that compare clusters for the old and new data based on a set of change predicates, and we introduce a novel contour clustering algorithm named DCONTOUR that combines supervised density functions with contouring algorithms.

The Ultimate Vision of this Research
- Develop change analysis systems that automatically detect important changes in spatial datasets.
- Provide reusable components in the change analysis system that can be used for any problem requiring continuous monitoring of spatio-temporal events.
- Embed the change analysis system into larger systems that solve critical problems of our society, such as automatic surveillance systems, early warning systems, and diagnostic tools.
- Mine the patterns of change themselves to detect complex patterns, such as the progression of pollution and diseases.
- Contribute to important scientific disciplines, such as epidemiology, that require the analysis of complex patterns of change.
Thank you for your attention. Questions?

DCONTOUR algorithm
Input: density function ψ, density threshold d.
Output: density polygons for density threshold d.
1. Subdivide the space into D grid cells.
2. Compute densities at the grid intersection points using the density function ψ.
3. Compute contour intersection points b on grid cell edges where ψ(b) = d, using binary search and interpolation.
4. Compute contour polygons from the contour intersection points b.
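The four steps can be sketched end to end in dependency-free Python, under stated assumptions: a square n × n grid stands in for the D grid cells; step 3 is simplified to plain linear interpolation along each crossed edge (no binary search); and step 4 connects intersection points with a marching-squares style pairing of the crossed edges in each cell, then chains the resulting segments into polygons. The text does not say how DCONTOUR connects the points, so this connection scheme and the name `dcontour_polygons` are hypothetical.

```python
import math
from collections import defaultdict


def dcontour_polygons(psi, bounds, n, d):
    """Contour polygons of psi at threshold d on an n x n grid over
    bounds = (xmin, xmax, ymin, ymax). Returns lists of (x, y) vertices."""
    xmin, xmax, ymin, ymax = bounds
    # Step 1: subdivide the space into n * n grid cells
    xs = [xmin + (xmax - xmin) * i / n for i in range(n + 1)]
    ys = [ymin + (ymax - ymin) * j / n for j in range(n + 1)]
    # Step 2: densities at grid intersection points, dens[j][i] = psi(xs[i], ys[j])
    dens = [[psi(x, y) for x in xs] for y in ys]
    inside = [[v >= d for v in row] for row in dens]

    def ends(key):  # grid nodes (i, j) at both ends of edge ("h"|"v", i, j)
        kind, i, j = key
        return (i, j), ((i + 1, j) if kind == "h" else (i, j + 1))

    def crossed(key):
        (i0, j0), (i1, j1) = ends(key)
        return inside[j0][i0] != inside[j1][i1]

    def point(key):  # Step 3 (simplified): interpolate psi = d along the edge
        (i0, j0), (i1, j1) = ends(key)
        t = (d - dens[j0][i0]) / (dens[j1][i1] - dens[j0][i0])
        return (xs[i0] + t * (xs[i1] - xs[i0]), ys[j0] + t * (ys[j1] - ys[j0]))

    # Step 4a: in each cell, pair up the crossed edges into contour segments
    adj = defaultdict(list)
    for j in range(n):
        for i in range(n):
            # Edges in counter-clockwise order: bottom, right, top, left
            edges = [("h", i, j), ("v", i + 1, j), ("h", i, j + 1), ("v", i, j)]
            hits = [e for e in edges if crossed(e)]
            # 2 hits: one segment; 4 hits (saddle cell): pairing chosen arbitrarily
            for a, b in zip(hits[::2], hits[1::2]):
                adj[a].append(b)
                adj[b].append(a)

    # Step 4b: chain segments; open chains start at grid-boundary edges
    visited, polygons = set(), []
    for start in [e for e in adj if len(adj[e]) == 1] + list(adj):
        if start in visited:
            continue
        chain, cur = [start], start
        visited.add(start)
        while True:
            nxt = next((e for e in adj[cur] if e not in visited), None)
            if nxt is None:
                break
            chain.append(nxt)
            visited.add(nxt)
            cur = nxt
        polygons.append([point(e) for e in chain])
    return polygons


def gauss(x, y):
    """Toy density with a single peak at the origin."""
    return math.exp(-(x * x + y * y))


polys = dcontour_polygons(gauss, (-2.0, 2.0, -2.0, 2.0), 8, 0.3)
print(len(polys))  # 1: a single closed contour around the peak
```

Keying intersection points by grid edge rather than by coordinates lets neighbouring cells agree exactly on shared points, so the chaining never has to compare floating-point positions.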