STING: A Statistical Information Grid Approach to Spatial Data Mining
Presentation 2 (Group 14)
CSE 590 Data Mining, Prof. Anita Wasilewska
SUNY Stony Brook
Presented by: Tejas Somani, Nikhil Pujari
STING: A Statistical Information Grid Approach to
atial_data.html
Jiawei Han and Micheline Kamber. Data Mining: Concepts and Techniques (Chapter 8). Morgan Kaufmann, 2002.
Peter Grabusts and Arkady Borisov. Using Grid-clustering Methods in Data Classification. Riga Technical University.
What is Spatial Data?
Spatial data may be thought of as features located on or referenced to the Earth's surface, such as roads, streams, political boundaries, schools, land use classifications, property ownership parcels, drinking water intakes, and pollution discharge sites. In short, anything that can be mapped.

Spatial Area: the area that encompasses the locations of all the spatial data is called the spatial area.
http://www.webopedia.com/TERM/S/spatial_data.html
STING: The Overview
• STING is a grid-based method for efficiently processing many common region-oriented queries on a set of points.
• A set of points satisfying some criterion defines a region.
• It is a hierarchical method. The idea is to capture statistical information associated with spatial cells in such a manner that whole classes of queries can be answered without referring to the individual objects.
• We want to cluster the records in a spatial table in terms of location.
• The placement of a record in a grid cell is completely determined by its physical location.
4) i.e., a cell in level i corresponds to the union of the areas of its children at level i + 1.
The size of the leaf-level cells is dependent on the density of objects.
http://georges.gardarin.free.fr/Cours_XMLDM_Master2/Sting.PDF
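The mapping from a record's location to its grid cell can be sketched as follows. This is only an illustration under stated assumptions: the function names `cell_index` and `parent_index`, the `bounds` tuple, and the choice of a single root cell at level 0 with 2 x 2 children per cell are all hypothetical and not taken from the slides.

```python
def cell_index(x, y, level, bounds):
    """Return (row, col) of the cell containing point (x, y) at a given level.

    Assumption: level 0 is a single root cell and every cell splits into
    2 x 2 children, so level k has 2**k cells per side.
    """
    x_min, y_min, x_max, y_max = bounds
    cells_per_side = 2 ** level
    # Normalise the coordinates to [0, 1] and scale to the grid size.
    col = int((x - x_min) / (x_max - x_min) * cells_per_side)
    row = int((y - y_min) / (y_max - y_min) * cells_per_side)
    # Points lying exactly on the upper boundary go into the last cell.
    col = min(col, cells_per_side - 1)
    row = min(row, cells_per_side - 1)
    return row, col


def parent_index(row, col):
    """Index of the parent cell one level up (assumes 2 x 2 children)."""
    return row // 2, col // 2
```

Because the cell is derived from the coordinates alone, the cell of a point at level i is always the parent of its cell at level i + 1, which is what makes the hierarchy consistent.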
Hierarchical Structure for STING Clustering
Data Mining: Concepts and Techniques by Jiawei Han, Micheline Kamber
Statistical Parameters
For each cell we have attribute-dependent and attribute-independent parameters.
The attribute-independent parameter is the number of objects in a cell, n.
For attribute-dependent parameters, it is assumed that the attributes of each object have numerical values.
For each numerical attribute we have the following five parameters:

Statistical Parameters..
• m: the mean of all values in this cell
• s: the standard deviation of all values in this cell
• min: the minimum value of the attribute in this cell
• max: the maximum value of the attribute in this cell
• distribution: the type of distribution that the attribute values in this cell follow
Data Mining: Concepts and Techniques by Jiawei Han, Micheline Kamber
Statistical Parameters..
Statistical information regarding the attributes in each grid cell, for each layer, is pre-computed and stored beforehand.
The statistical parameters for the cells in the lowest layer are computed directly from the values present in the table when the data are loaded into the database.
The statistical parameters for the cells in all other levels are computed from their respective children cells in the next lower level.
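The bottom-up computation described above can be sketched as follows. The class `CellStats` and the functions `leaf_stats` and `merge_children` are hypothetical names chosen for illustration. The sketch assumes every cell holds at least one object and uses the population standard deviation (dividing by n), so that a parent's values can be recovered exactly from its children. The distribution parameter is left out.

```python
import math
from dataclasses import dataclass


@dataclass
class CellStats:
    n: int          # number of objects in the cell
    m: float        # mean of the attribute values
    s: float        # population standard deviation of the values
    minimum: float  # smallest attribute value in the cell
    maximum: float  # largest attribute value in the cell


def leaf_stats(values):
    """Compute the parameters of a bottom-layer cell directly from the data."""
    n = len(values)
    m = sum(values) / n
    s = math.sqrt(sum((v - m) ** 2 for v in values) / n)
    return CellStats(n, m, s, min(values), max(values))


def merge_children(children):
    """Compute a parent's parameters from its children, without raw data."""
    n = sum(c.n for c in children)
    # Weighted mean of the children's means.
    m = sum(c.m * c.n for c in children) / n
    # Each child's second moment is s**2 + m**2; average them, weighted by n.
    second_moment = sum((c.s ** 2 + c.m ** 2) * c.n for c in children) / n
    # Clamp at zero to absorb tiny negative values from rounding.
    s = math.sqrt(max(second_moment - m ** 2, 0.0))
    return CellStats(
        n, m, s,
        min(c.minimum for c in children),
        max(c.maximum for c in children),
    )
```

Merging the statistics of two leaf cells this way gives the same n, m, s, min and max as computing them over the combined raw values, which is why the upper layers never need to touch the table again.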
Query Types and Query Processing
1) Query Types
An SQL-like language is used to describe queries.
Two types of common queries are found: one finds regions satisfying certain constraints, and the other takes a region and returns some attribute of that region.
2) Query Processing
We use a top-down approach to answer spatial data queries.
Start from a pre-selected layer, typically one with a small number of cells.
Query Processing..
The pre-selected layer does not have to be the topmost layer.
For each cell in the current layer, compute the confidence interval (or estimated range of probability) reflecting the cell's relevance to the given query.
The confidence interval is calculated using the statistical parameters of each cell.
From the calculated interval, we label each cell as relevant or irrelevant for this query.
Remove irrelevant cells from further consideration.
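One way a cell's stored parameters could be turned into a relevant/irrelevant label is sketched below. This is a stand-in, not the calculation used by STING itself. It assumes the attribute is normally distributed within the cell, that the query asks for at least `min_count` objects with values in [lo, hi], and that a normal approximation to the binomial gives an acceptable interval on the count. All function and parameter names are hypothetical.

```python
import math


def normal_cdf(x, mean, std):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def estimated_fraction_in_range(m, s, lo, hi):
    """Estimated fraction of a cell's objects with values in [lo, hi]."""
    if s == 0:
        # All values equal the mean.
        return 1.0 if lo <= m <= hi else 0.0
    return normal_cdf(hi, m, s) - normal_cdf(lo, m, s)


def count_interval(n, p, z=1.96):
    """Approximate interval for the number of matching objects.

    Uses the normal approximation to the binomial (an assumption).
    """
    half_width = z * math.sqrt(n * p * (1.0 - p))
    return max(0.0, n * p - half_width), min(float(n), n * p + half_width)


def is_relevant(n, m, s, cell_min, cell_max, lo, hi, min_count):
    """Label a cell relevant if it may hold at least min_count matches."""
    # min and max give a cheap exact rejection: no value can fall in range.
    if cell_max < lo or cell_min > hi:
        return False
    p = estimated_fraction_in_range(m, s, lo, hi)
    _, upper = count_interval(n, p)
    return upper >= min_count
```

Only the stored parameters n, m, s, min and max are consulted, so the label is assigned without reading any individual object.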
Query Processing..
When finished with the current layer, proceed to the next lower level.
Processing of the next lower level examines only the children of the remaining relevant cells.
Repeat this process until the bottom layer is reached.
At this point, if the query specifications are met, the regions of relevant cells that satisfy the query are returned.
Otherwise, the data that fall into the relevant cells are retrieved and further processed until they meet the requirements of the query.
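The top-down traversal described in these steps can be sketched as follows. The `Cell` class, the function `top_down_query`, and the caller-supplied `is_relevant` predicate are hypothetical names for illustration. The sketch assumes all leaf cells sit at the same depth and stops at the bottom layer; the final step of retrieving the underlying data is not shown.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Cell:
    name: str
    children: List["Cell"] = field(default_factory=list)


def top_down_query(start_layer: List[Cell],
                   is_relevant: Callable[[Cell], bool]) -> List[Cell]:
    """Return the relevant bottom-layer cells reachable from start_layer."""
    current = list(start_layer)
    while True:
        # Label the cells of this layer and drop the irrelevant ones.
        relevant = [cell for cell in current if is_relevant(cell)]
        # Only children of relevant cells are examined at the next level.
        next_layer = [child for cell in relevant for child in cell.children]
        if not next_layer:
            # Bottom layer reached, or nothing relevant is left.
            return relevant
        current = next_layer
```

Because irrelevant cells are discarded layer by layer, their descendants are never examined, which is where the savings over scanning every bottom-layer cell come from.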
Different Grid Levels during Query Processing