
Jun 09, 2015

Page 1: STING: A Statistical Information Grid Approach to Spatial Data ...

STING: A Statistical Information Grid Approach to Spatial Data Mining

Presentation 2 (Group 14)

CSE 590 Data Mining

Prof. Anita Wasilewska

SUNY Stony Brook

Presented By: Tejas Somani, Nikhil Pujari

Page 2: STING: A Statistical Information Grid Approach to Spatial Data ...

STING: A Statistical Information Grid Approach to Spatial Data Mining

Paper by:

Wei Wang, Department of Computer Science, University of California, Los Angeles, CA 90095, U.S.A.

Jiong Yang, Department of Computer Science, University of California, Los Angeles, CA 90095, U.S.A.

Richard Muntz, Department of Computer Science, University of California, Los Angeles, CA 90095, U.S.A.

VLDB Conference, Athens, Greece, 1997

Page 3: STING: A Statistical Information Grid Approach to Spatial Data ...

References

http://georges.gardarin.free.fr/Cours_XMLDM_Master2/Sting.PDF

http://www.webopedia.com/TERM/S/spatial_data.html

Jiawei Han and Micheline Kamber. Data Mining: Concepts and Techniques (Chapter 8). Morgan Kaufmann, 2002.

Peter Grabusts and Arkady Borisov. Using Grid-clustering Methods in Data Classification. Riga Technical University.

Page 4: STING: A Statistical Information Grid Approach to Spatial Data ...

What is Spatial Data?

Spatial data may be thought of as features located on or referenced to the Earth's surface, such as roads, streams, political boundaries, schools, land use classifications, property ownership parcels, drinking water intakes, and pollution discharge sites; in short, anything that can be mapped.

Spatial Area: The area that encompasses the locations of all the spatial data is called the spatial area.

http://www.webopedia.com/TERM/S/spatial_data.html

Page 5: STING: A Statistical Information Grid Approach to Spatial Data ...

STING: The Overview

• STING is a grid-based method for efficiently processing many common region-oriented queries on a set of points.

• A set of points satisfying some criterion defines a region.

• It is a hierarchical method. The idea is to capture statistical information associated with spatial cells in such a manner that whole classes of queries can be answered without referring to the individual objects.

We want to cluster the records that are in a spatial table in terms of location.

Placement of a record in a grid cell is completely determined by its physical location.

http://georges.gardarin.free.fr/Cours_XMLDM_Master2/Sting.PDF
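As a minimal sketch of how a record's location alone fixes its grid cell, the function below maps a point to a cell index at a given level. The name `locate_cell`, the `bounds` tuple layout, and the convention that level 0 is a single cell split 2 x 2 at every level are assumptions made for illustration, not details taken from the slides.

```python
def locate_cell(x, y, bounds, level):
    """Return (row, col) of the cell at `level` that contains point (x, y).

    Assumption: level 0 is one cell covering `bounds`, and every cell is
    split into 2 x 2 children, so level k has 2**k cells per side.
    `bounds` is (x_min, y_min, x_max, y_max).
    """
    x_min, y_min, x_max, y_max = bounds
    if not (x_min <= x <= x_max and y_min <= y <= y_max):
        raise ValueError("point lies outside the spatial area")
    cells_per_side = 2 ** level
    # Scale each coordinate to a cell index; clamp so that points on the
    # upper boundary fall into the last cell instead of one past it.
    col = min(int((x - x_min) / (x_max - x_min) * cells_per_side), cells_per_side - 1)
    row = min(int((y - y_min) / (y_max - y_min) * cells_per_side), cells_per_side - 1)
    return row, col
```

For example, with `bounds = (0, 0, 10, 10)`, the point (2.5, 7.5) lands in cell (3, 1) at level 2. No attribute value of the record is consulted, only its coordinates.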

Page 6: STING: A Statistical Information Grid Approach to Spatial Data ...

Grid Cell Hierarchy

The spatial area is divided into rectangular cells.

The cells form a hierarchical structure with several levels, each level corresponding to a different resolution.

Each cell at a higher level is partitioned into a number of cells at the next lower level (here 4), i.e., a cell at level i corresponds to the union of the areas of its children at level i + 1.

The size of the leaf-level cells depends on the density of objects.

http://georges.gardarin.free.fr/Cours_XMLDM_Master2/Sting.PDF
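The parent/child relationship can be sketched with plain index arithmetic. The `(level, row, col)` addressing and the function names below are hypothetical; the only thing taken from the slide is that each cell has 4 children at the next level.

```python
def children(level, row, col):
    """Return the four cells at level + 1 whose union is cell (level, row, col).

    Assumption: cells are addressed as (level, row, col) and each cell is
    split 2 x 2, matching the "here 4" fan-out on the slide.
    """
    return [(level + 1, 2 * row + dr, 2 * col + dc)
            for dr in (0, 1) for dc in (0, 1)]


def parent(level, row, col):
    """Return the cell at level - 1 that contains cell (level, row, col)."""
    if level == 0:
        raise ValueError("the top-level cell has no parent")
    return (level - 1, row // 2, col // 2)
```

With this addressing, no explicit pointers are needed: every child of a cell maps back to it through `parent`.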

Page 7: STING: A Statistical Information Grid Approach to Spatial Data ...

Hierarchical Structure for STING Clustering

Data Mining: Concepts and Techniques by Jiawei Han, Micheline Kamber

Page 8: STING: A Statistical Information Grid Approach to Spatial Data ...

Statistical Parameters

For each cell we have attribute-dependent and attribute-independent parameters.

The attribute-independent parameter is n, the number of objects in the cell.

For attribute-dependent parameters, it is assumed that each object's attributes have numerical values.

For each numerical attribute we have the following five parameters:

Page 9: STING: A Statistical Information Grid Approach to Spatial Data ...

Statistical Parameters..

m: the mean of all values in this cell

s: the standard deviation of all values in this cell

min: the minimum value of the attribute in this cell

max: the maximum value of the attribute in this cell

distribution: the type of distribution that the attribute values in this cell follow

Data Mining: Concepts and Techniques by Jiawei Han, Micheline Kamber
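One way to hold these parameters for a single attribute is a small record type, filled in directly from the raw values of a bottom-level cell. This is a sketch under assumptions: the class and function names are invented, the distribution is a label supplied by the caller with `"NONE"` as a placeholder, the standard deviation is the population one, and the zero values used for an empty cell are an arbitrary choice.

```python
from dataclasses import dataclass
from statistics import fmean, pstdev


@dataclass
class CellStats:
    n: int                       # attribute-independent: number of objects
    m: float                     # mean of the attribute values
    s: float                     # standard deviation of the attribute values
    minimum: float               # "min" on the slide
    maximum: float               # "max" on the slide
    distribution: str = "NONE"   # assumed label, e.g. "normal" or "uniform"


def leaf_stats(values, distribution="NONE"):
    """Compute the parameters of a bottom-level cell from its raw values."""
    values = list(values)
    if not values:
        # Assumption: an empty cell stores zeros and no distribution.
        return CellStats(0, 0.0, 0.0, 0.0, 0.0, "NONE")
    return CellStats(
        n=len(values),
        m=fmean(values),
        s=pstdev(values),  # population standard deviation (divides by n)
        minimum=min(values),
        maximum=max(values),
        distribution=distribution,
    )
```

A cell would carry one such record per numerical attribute, with `n` shared across attributes.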

Page 10: STING: A Statistical Information Grid Approach to Spatial Data ...

Statistical Parameters..

Statistical information regarding the attributes in each grid cell, for each layer, is pre-computed and stored beforehand.

The statistical parameters for the cells in the lowest layer are computed directly from the values that are present in the table, when data are loaded into the database.

The statistical parameters for the cells in all the other levels are computed from their respective children cells in the next lower level.
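The bottom-up step can be sketched as follows: a parent's n, m, s, min and max are obtained from its children's parameters alone, without revisiting the raw data. The `Stats` type and `combine` function are illustrative names. The sketch assumes s is a population standard deviation and leaves out the distribution parameter, which needs a separate rule that is not covered here.

```python
import math
from typing import NamedTuple


class Stats(NamedTuple):
    n: int
    m: float
    s: float
    minimum: float
    maximum: float


def combine(children):
    """Derive a parent cell's parameters from its children's parameters."""
    kids = [c for c in children if c.n > 0]   # empty children contribute nothing
    n = sum(c.n for c in kids)
    if n == 0:
        return Stats(0, 0.0, 0.0, 0.0, 0.0)   # assumed convention for empty cells
    # Weighted mean: m = sum(m_i * n_i) / n
    m = sum(c.m * c.n for c in kids) / n
    # E[X^2] over the parent is the weighted average of (s_i^2 + m_i^2),
    # so the parent variance is that average minus m^2.
    second_moment = sum((c.s ** 2 + c.m ** 2) * c.n for c in kids) / n
    s = math.sqrt(max(second_moment - m ** 2, 0.0))  # guard against rounding
    return Stats(n, m, s,
                 min(c.minimum for c in kids),
                 max(c.maximum for c in kids))
```

Because only five numbers per child are read, building each upper level costs time proportional to the number of cells rather than the number of objects.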

Page 11: STING: A Statistical Information Grid Approach to Spatial Data ...

Query Types and Query Processing

1) Query Types

An SQL-like language is used to describe queries.

Two types of common queries are found: one finds regions satisfying certain constraints, and the other takes in a region and returns some attribute of that region.

2) Query Processing

We use a top-down approach to answer spatial data queries.

Start from a pre-selected layer, typically one with a small number of cells.

Page 12: STING: A Statistical Information Grid Approach to Spatial Data ...

Query Processing..

The pre-selected layer does not have to be the topmost layer.

For each cell in the current layer, compute the confidence interval (or estimated range of probability) reflecting the cell's relevance to the given query.

The confidence interval is calculated by using the statistical parameters of each cell.

From the calculated interval, we label each cell as relevant or irrelevant for this query.

Remove irrelevant cells from further consideration.
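The labelling step might look like the sketch below for a query asking that at least a given proportion of a cell's objects have an attribute value in `[low, high]`. This is a deliberately simplified stand-in and not the exact procedure of the paper: it assumes the cell's distribution is normal, estimates the proportion from m and s, and uses a normal-approximation interval with `z = 1.96`. All function and parameter names are hypothetical.

```python
import math


def normal_cdf(x, mean, std):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def proportion_interval(n, m, s, low, high, z=1.96):
    """Approximate interval for the share of objects with value in [low, high].

    Assumptions: values in the cell are normally distributed with mean m and
    standard deviation s; the interval is the usual normal approximation
    p +/- z * sqrt(p * (1 - p) / n), clipped to [0, 1].
    """
    if n == 0:
        return (0.0, 0.0)
    if s == 0:
        p = 1.0 if low <= m <= high else 0.0   # all values equal m
    else:
        p = normal_cdf(high, m, s) - normal_cdf(low, m, s)
    half_width = z * math.sqrt(p * (1.0 - p) / n)
    return (max(0.0, p - half_width), min(1.0, p + half_width))


def label_cell(n, m, s, low, high, required_proportion):
    """Label a cell from the upper end of its estimated interval."""
    _, upper = proportion_interval(n, m, s, low, high)
    return "relevant" if upper >= required_proportion else "irrelevant"
```

Testing the upper end of the interval keeps a cell whenever it could plausibly satisfy the query, so pruning at a coarse level errs on the side of keeping cells.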

Page 13: STING: A Statistical Information Grid Approach to Spatial Data ...

Query Processing..

When finished with the current layer, proceed to the next lower level.

Processing of the next lower level examines only the remaining relevant cells.

Repeat this process until the bottom layer is reached.

At this point, if the query specifications are met, the regions of relevant cells that satisfy the query are returned.

Otherwise, the data that fall into the relevant cells are retrieved and further processed until they meet the requirements of the query.
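The layer-by-layer descent described on these slides can be sketched as a short loop. To keep the example small, relevance is decided by a simple overlap test between the query range and a cell's stored [min, max], which stands in for the confidence-interval test. The `Cell` type, `overlaps` and `top_down_query` are illustrative names, and all leaves are assumed to sit at the same depth.

```python
from dataclasses import dataclass, field


@dataclass
class Cell:
    name: str
    minimum: float
    maximum: float
    children: list = field(default_factory=list)


def overlaps(cell, low, high):
    """Stand-in relevance test: can the cell hold a value in [low, high]?"""
    return cell.minimum <= high and cell.maximum >= low


def top_down_query(start_layer, low, high):
    """Return the bottom-level cells that remain relevant to the query.

    Assumption: every leaf is at the same depth, so a layer is either all
    internal cells or all leaves.
    """
    current = list(start_layer)
    while True:
        # Label the cells of the current layer and drop the irrelevant ones.
        relevant = [c for c in current if overlaps(c, low, high)]
        # Only children of relevant cells are examined at the next level.
        next_layer = [child for c in relevant for child in c.children]
        if not next_layer:
            return relevant   # bottom layer reached, or nothing left to refine
        current = next_layer
```

Usage: build the hierarchy with `Cell(..., children=[...])` and pass the cells of the pre-selected layer as `start_layer`; it need not be the root. Subtrees under an irrelevant cell are never visited.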

Page 14: STING: A Statistical Information Grid Approach to Spatial Data ...

Different Grid Levels during Query Processing

http://georges.gardarin.free.fr/Cours_XMLDM_Master2/Sting.PDF

Page 15: STING: A Statistical Information Grid Approach to Spatial Data ...

Finally..

Strengths and Weaknesses of STING

Strengths:

The grid structure facilitates parallel processing and incremental updating.

It is very efficient: the cost of processing a query is O(g), where g is the total number of grid cells at the lowest level (usually much smaller than n, the total number of objects).

It is query-independent, as the statistical information stored in the cells is summary information about the data and does not depend on any particular query.

Weakness:

All cluster boundaries are either horizontal or vertical; no diagonal boundary is detected.
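A short count shows why the O(g) query cost listed under the strengths is plausible. It assumes the fan-out of 4 used earlier in these slides and a complete hierarchy of height h, so that g = 4^h. The symbol h is introduced here only for the derivation.

```latex
\[
\underbrace{g + \frac{g}{4} + \frac{g}{16} + \dots + 1}_{\text{cells on all levels}}
  \;=\; g \sum_{k=0}^{h} 4^{-k}
  \;=\; \frac{4g - 1}{3}
  \;<\; \frac{4}{3}\, g ,
  \qquad g = 4^{h}.
\]
```

Even a query that visits every cell on every level therefore touches fewer than 4g/3 cells, which is O(g).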

Page 16: STING: A Statistical Information Grid Approach to Spatial Data ...

Thank You

All the BEST for FINALS!!!