Top Banner
Geographic Data Mining Marc van Kreveld Seminar for GIVE Block 1, 2003/2004
40

Geographic Data Mining Marc van Kreveld Seminar for GIVE Block 1, 2003/2004.

Dec 23, 2015

Download

Documents

Hollie Powers
Welcome message from author
This document is posted to help you gain knowledge. Please leave a comment to let me know what you think about it! Share it to your friends and learn new things together.
Transcript
Page 1: Geographic Data Mining Marc van Kreveld Seminar for GIVE Block 1, 2003/2004.

Geographic Data Mining

Marc van KreveldSeminar for GIVE

Block 1, 2003/2004

Page 2: Geographic Data Mining Marc van Kreveld Seminar for GIVE Block 1, 2003/2004.

About …

• A form of geographical analysis• Current topic of interest in GIS research

(and database research and AI research)

• Finding hidden information in large collections of geographic data

Page 3: Geographic Data Mining Marc van Kreveld Seminar for GIVE Block 1, 2003/2004.

This seminar

• Learning about a topic together• Presenting to each other + interaction• Added value by good examples:

– for important concepts, algorithms– possibly self-thought of, or extended– referring to GIS data and issues (hence the

GIS course prerequisite)

• Written assignment: joint survey

Page 4: Geographic Data Mining Marc van Kreveld Seminar for GIVE Block 1, 2003/2004.

Material

• Book by Harvey Miller and Jiawei Han (editors): selected chapters

• Possibly: papers from conference proceedings

• Mostly provided by me

Page 5: Geographic Data Mining Marc van Kreveld Seminar for GIVE Block 1, 2003/2004.

Weeks

• Week 36-46• Probably:

– Not September 4 (this Thursday)– Not in week 40 (Sept. 29 & Oct. 2)– Not October 23

• The above depending on participation!

Page 6: Geographic Data Mining Marc van Kreveld Seminar for GIVE Block 1, 2003/2004.

Overview of Geographic Data Mining & Knowledge

Discovery

Chapter 1 of the book

• KDD: knowledge discovery in databases• Data warehouses• Data mining• Geographic aspects of the above

Page 7: Geographic Data Mining Marc van Kreveld Seminar for GIVE Block 1, 2003/2004.

Knowledge Discovery in Databases (KDD)

• Large databases contain interesting patterns: non-random properties and relationships that are:– valid (general enough to apply to new data)– novel (non-trivial and unexpected)– useful (leads to effective action: decision

making or investigation)– ultimately understandable (simple, and

interpretable by humans)

Page 8: Geographic Data Mining Marc van Kreveld Seminar for GIVE Block 1, 2003/2004.

Knowledge Discovery in Databases (KDD)

• Because of quantity of data nowadays• Because we want information, not data• Because computing power allows it

Page 9: Geographic Data Mining Marc van Kreveld Seminar for GIVE Block 1, 2003/2004.

KDD opposed to statistics

• Statistics– small and clean numeric database– scientifically sampled– specific questions in mind

• KDD: none of the above

Page 10: Geographic Data Mining Marc van Kreveld Seminar for GIVE Block 1, 2003/2004.

KDD techniques

• Statistics• Machine learning• Pattern recognition• Numeric search (?)• Scientific visualization

Page 11: Geographic Data Mining Marc van Kreveld Seminar for GIVE Block 1, 2003/2004.

Data warehouse

• Large repository of data• For analytical processing

(DB: transactional processing)• Heterogeneous: different sources and

formats (DB: homogeneous)

• Supports OLAP tools (OnLine Analytical Processing)

Page 12: Geographic Data Mining Marc van Kreveld Seminar for GIVE Block 1, 2003/2004.

OLAP Example

• Measure of interest: sales• Dimensions of interest: item, store, week

• (item, store, week) money

[quantity sold times price ]

Page 13: Geographic Data Mining Marc van Kreveld Seminar for GIVE Block 1, 2003/2004.

OLAP Example

• 2-dim. aggregation:(item, store, . ) money

• Another 2-dim. aggregation: sales by store and by week

• 1-dim. aggregation: sales by week (all items and stores)

• Data cube: all 2d possible aggregations, different types of summaries

Page 14: Geographic Data Mining Marc van Kreveld Seminar for GIVE Block 1, 2003/2004.

KDD steps

• Data selection• Data pre-processing• Data enrichment• Data reduction and projection• Data mining• Interpretation and reporting

Presence of steps and order not fixed

Page 15: Geographic Data Mining Marc van Kreveld Seminar for GIVE Block 1, 2003/2004.

KDD steps

• Data selection: which records, variables chosen?

Page 16: Geographic Data Mining Marc van Kreveld Seminar for GIVE Block 1, 2003/2004.

KDD steps

• Data selection• Data pre-processing: removing noise,

duplicate records, handling missing data, …

Page 17: Geographic Data Mining Marc van Kreveld Seminar for GIVE Block 1, 2003/2004.

KDD steps

• Data selection• Data pre-processing• Data enrichment: combining the

selected data with external data

Page 18: Geographic Data Mining Marc van Kreveld Seminar for GIVE Block 1, 2003/2004.

KDD steps

• Data selection• Data pre-processing• Data enrichment• Data reduction and projection:

reduction in number, reducing dimension

Page 19: Geographic Data Mining Marc van Kreveld Seminar for GIVE Block 1, 2003/2004.

KDD steps

• Data selection• Data pre-processing• Data enrichment• Data reduction and projection• Data mining: uncovering information,

interesting patterns

Page 20: Geographic Data Mining Marc van Kreveld Seminar for GIVE Block 1, 2003/2004.

KDD steps

• Data selection• Data pre-processing• Data enrichment• Data reduction and projection• Data mining• Interpretation and reporting: evaluating,

understanding, communicating

Page 21: Geographic Data Mining Marc van Kreveld Seminar for GIVE Block 1, 2003/2004.

Data mining

• Segmentation• Dependency analysis• Deviation and outlier analysis• Trend detection• Generalization and characterization

Page 22: Geographic Data Mining Marc van Kreveld Seminar for GIVE Block 1, 2003/2004.

DM - segmentation

Description:• Clustering: finding a

finite set of implicit classes

• Classification: mapping data items into pre-defined classes

Techniques:• Cluster analysis• Bayesian

classification• Decision or

classification trees• Artificial neural

networks

Page 23: Geographic Data Mining Marc van Kreveld Seminar for GIVE Block 1, 2003/2004.

DM - segmentation

clustering

given classes

classification

Page 24: Geographic Data Mining Marc van Kreveld Seminar for GIVE Block 1, 2003/2004.

DM – dependency analysis

Description:• Finding rules to

predict the value of some attribute based on other attributes

Techniques:• Bayesian networks• Association rules

(4, 12, 0.24)(3, 14, 0.21)(7, 13, 0.43)(2, 9, 0.78)(11, 11, 0.55)

(5, 11, ???)

(???, 12, 0.51)

Page 25: Geographic Data Mining Marc van Kreveld Seminar for GIVE Block 1, 2003/2004.

DM – dependency analysis

• Confidence and support measures for association rules of the form:[ if X then Y ]:

confidence = #(X and Y in DB) / #(X in DB)support = #(X and Y in DB) / #(all in DB)

Page 26: Geographic Data Mining Marc van Kreveld Seminar for GIVE Block 1, 2003/2004.

DM – deviation & outlier analysis

Description:• Finding data with

unusual deviations (=errors, or data of particular interest)

Techniques:• Clustering, other

mining methods• Outlier analysis

Page 27: Geographic Data Mining Marc van Kreveld Seminar for GIVE Block 1, 2003/2004.

DM – trend detection

Description:• Finding lines, curves,

summarizing the database (often as a function over time)

Techniques:• Regression• Sequential pattern

extraction

Page 28: Geographic Data Mining Marc van Kreveld Seminar for GIVE Block 1, 2003/2004.

DM – generalization and characterization

Description:• Obtaining compact

descriptions of the data

Techniques:• Summary rules• Attribute-oriented

induction

concept hierarchy low level concept

higher level concept

Page 29: Geographic Data Mining Marc van Kreveld Seminar for GIVE Block 1, 2003/2004.

Visualization and knowledge discovery

• KDD is difficult to automate steered by human intelligence

• Visualization helps to understand the data and which data mining techniques to try

Page 30: Geographic Data Mining Marc van Kreveld Seminar for GIVE Block 1, 2003/2004.

KD + geography

• Special case of KDD• Other special cases

– marketing– biology– astronomy

• Main features: location, distance, dimen-sionality (with dependent dimensions)

Page 31: Geographic Data Mining Marc van Kreveld Seminar for GIVE Block 1, 2003/2004.

KD + geography

(attr1, attr2, attr3, attr4); attr’s are numbers and (relatively) independent: statistics

(attr1, attr2, attr3, attr4); attr’s can also be on other measurement scales: KDD

(attr1, attr2, attr3, attr4); attr’s are often dependent and can be shapes: KD + geography

Often: (lat., long., attr1, attr2, …)or: (shape description, attr1, attr2, …)

Page 32: Geographic Data Mining Marc van Kreveld Seminar for GIVE Block 1, 2003/2004.

KD + geography

• Study of scalable versions of DM tasks (in lat. and long.)

• Certain dimensions can be non-metric (travel time need not be symmetric)

• DM in data that is not in the form of tuples: sets of thematic map layers

Page 33: Geographic Data Mining Marc van Kreveld Seminar for GIVE Block 1, 2003/2004.

Geographic data mining

• Spatial segmentation (clustering, classification)

• Spatial dependency (spatial association rules)

• Spatial trend detection• Geographic characterization and

generalization

Page 34: Geographic Data Mining Marc van Kreveld Seminar for GIVE Block 1, 2003/2004.

GDM – spatial association rules

• Example: If a location is within 500 m from water and the average winter temperature is at least –2 degreesthen there are frogs around

distance relationship

Page 35: Geographic Data Mining Marc van Kreveld Seminar for GIVE Block 1, 2003/2004.

GDM – spatial trend detection

• Patterns of change with respect to neighborhood of some object

• Example: (North America) Further from Pacific ocean fewer earthquakes

Page 36: Geographic Data Mining Marc van Kreveld Seminar for GIVE Block 1, 2003/2004.

GDM - applications

• Map interpretation• Remote sensing interpretation• Environmental mapping (soil type, etc.)• Extracting spatio-temporal patterns for

cyclones, crimes• Spatial interaction (movement/flow of

people, capital, goods)

Page 37: Geographic Data Mining Marc van Kreveld Seminar for GIVE Block 1, 2003/2004.

Conclusions

• GDM & GKD is an extension of (tool for) geographical analysis

• GDM is different from DM due to– Geographic spaces, not attribute space– Neighborhood is extremely important– Scale issues– Data is different– Applications (interesting patterns to mine

for) are different

Page 38: Geographic Data Mining Marc van Kreveld Seminar for GIVE Block 1, 2003/2004.

This seminar on GDM

• First: chapters from the book– CH 1: GDM & KD: an overview (today)– CH 2: Paradigms for spatial and spatio-temporal DM(11-9)– CH 3: Fundamentals of spatial DW for GKD (15-9)– CH 7: Algorithms and applications of SDM (Ronny)(18-9)– CH 8: Spatial clustering in DM (22-9)– CH 6: Modeling spatial dependencies (25-9)

(not: 29-9 and 2-10)

– CH 9: Detecting outliers (6-10)– CH 10: Knowledge construction based on GVis and KDD– CH 14: Mining mobile trajectories

Page 39: Geographic Data Mining Marc van Kreveld Seminar for GIVE Block 1, 2003/2004.

This seminar

• All PowerPoint presentations on the Web page of the course

• Survey paper or written exam; possible topics for survey:– Hierarchical clustering– Clustering with obstacles– Proximity relationship mining– …

• Or: joint survey of (geometric) algorithms for GDM

Page 40: Geographic Data Mining Marc van Kreveld Seminar for GIVE Block 1, 2003/2004.

Each presentation

• The chapter contents• Additional (spatial) examples

(from the Web links or self-constructed)• Detect and present algorithmic

problems that appear together: report on algorithmic issues in GDM

• Present your chapter; don’t be afraid of overlap with other chapters