Top Banner
Guest Lecture Guest Lecture Tonight Tonight • Room 205 in EBII (under Dr. Keoghs office) • Starts at 6:10pm • Attendance is mandatory • Best behavior is mandatory • Having an intelligent question is mandatory
35

Guest Lecture Tonight Room 205 in EBII (under Dr. Keoghs office) Starts at 6:10pm Attendance is mandatory Best behavior is mandatory Having an intelligent.

Dec 22, 2015

Download

Documents

Welcome message from author
This document is posted to help you gain knowledge. Please leave a comment to let me know what you think about it! Share it to your friends and learn new things together.
Transcript
Page 1: Guest Lecture Tonight Room 205 in EBII (under Dr. Keoghs office) Starts at 6:10pm Attendance is mandatory Best behavior is mandatory Having an intelligent.

Guest Lecture TonightGuest Lecture Tonight• Room 205 in EBII (under Dr. Keoghs office)

• Starts at 6:10pm

• Attendance is mandatory

• Best behavior is mandatory

• Having an intelligent question is mandatory

Page 2: Guest Lecture Tonight Room 205 in EBII (under Dr. Keoghs office) Starts at 6:10pm Attendance is mandatory Best behavior is mandatory Having an intelligent.

Spatial Access MethodsSpatial Access Methods

Dr Eamonn KeoghDr Eamonn KeoghComputer Science & Engineering Department

University of California - RiversideRiverside,CA [email protected]

Page 3: Guest Lecture Tonight Room 205 in EBII (under Dr. Keoghs office) Starts at 6:10pm Attendance is mandatory Best behavior is mandatory Having an intelligent.

Mathematical Preliminaries I

(1,1)

What is the distance between the points Blue (1,1) and Red (5,7) ?

The formula for the Euclidian distance between two points is…

22 )()(),( yxedRBlueD

22 )71()51(),( edRBlueD

(5,7)

21.7),( edRBlueD

x

y

Page 4: Guest Lecture Tonight Room 205 in EBII (under Dr. Keoghs office) Starts at 6:10pm Attendance is mandatory Best behavior is mandatory Having an intelligent.

Consider a classic database…

Name ID Type Phone Location GradeMarios Pizza 1 ITA 888-1212 244, 365 D

Joes Bugers 2 US 848-1298 34, 764 A

Tinas Mexican 3 MEX 878-1333 123, 32 A

Sues Pasta 4 ITA 878-1342 876, 65 B

Assume that record(2).Name returns “Joes Bugers” and record(1).Location returns (244, 365) etc.

Page 5: Guest Lecture Tonight Room 205 in EBII (under Dr. Keoghs office) Starts at 6:10pm Attendance is mandatory Best behavior is mandatory Having an intelligent.

B876,65878-1342ITA4Sues Pasta

A123,32878-1333MEX3Tinas Mexican

A34,764848-1298US2Joes Bugers

D244,365888-1212ITA1Marios Pizza

GradeLocationPhoneTypeIDName

B876,65878-1342ITA4Sues Pasta

A123,32878-1333MEX3Tinas Mexican

A34,764848-1298US2Joes Bugers

D244,365888-1212ITA1Marios Pizza

GradeLocationPhoneTypeIDName

Given such a database we can easily answer queries such as• List all Mexican restaurants.• List all Grade A restaurants.

However, classic databases do not allow queries such as• List all Mexican restaurants within five miles of UCR• List the 5 pizza restaurants nearest to 91 and 60.

These kinds of queries are called spatial queries.

Page 6: Guest Lecture Tonight Room 205 in EBII (under Dr. Keoghs office) Starts at 6:10pm Attendance is mandatory Best behavior is mandatory Having an intelligent.

One possible way to do spatial queries Algorithm One_Nearest_Neighbor_Query(location) best_so_far = infinity;

for i = 1 to number_items_in_database

retrieve record(i) from database;

if distance(location,record(i).location )

best_so_far = distance(location,record(i).location )

NNpointer= i; end;

end;

disp(‘The nearest neighbor is’,record(NNpointer ).name );

end;

But this does not work well because…

Page 7: Guest Lecture Tonight Room 205 in EBII (under Dr. Keoghs office) Starts at 6:10pm Attendance is mandatory Best behavior is mandatory Having an intelligent.

…so we need to index the data

Indexing some kinds of data is trivial…

We can index the names field simply by creating 26 files that have the names plus pointers back to the original database.

My informal definition of indexing. Organizing the data such that queries can be answered in sub linear time.

Adams Fish House 23Al’s Burgers 34Al’s Pizza 1Amom Tacos 45

Bills Sushi 56Bobs Burgers 78Bombs Away! 12Brian’s eats 99

Can o BeansCocos Champagnes Chris's Fish House

Page 8: Guest Lecture Tonight Room 205 in EBII (under Dr. Keoghs office) Starts at 6:10pm Attendance is mandatory Best behavior is mandatory Having an intelligent.

So, we call always index 1-dimensional data (if you can sort it, you can index it), such that we can answer 1-nearest neighbor queries by accessing just O(log(n) ) of the database. (n is the number of items in the database).

But we can’t sort 2 dimensional data…

This is an important problem which has attracted a great deal of research interest….

We can in fact index 2 (and higher) dimensional reasonably efficiently with special data structures known as Spatial Access Methods (SAM) or Multidimensional Access Methods.

Although there are many variations we will focus on the R-Tree, introduced by Guttman in the 1984 SIGMOD conference.

Page 9: Guest Lecture Tonight Room 205 in EBII (under Dr. Keoghs office) Starts at 6:10pm Attendance is mandatory Best behavior is mandatory Having an intelligent.

We have seen the formula for the distance between two points. We will find it useful to have a formula for the distance between a point and the closest possible point within an Minimum Bounding Rectangle MBR…

1) Suppose we have a query point Q and one known point R

Could any of the points in the MBR be closer to Q than R is?

MBR = {(6,1),(8,4)}

Q = (3,5)

R = (1,7)

Page 10: Guest Lecture Tonight Room 205 in EBII (under Dr. Keoghs office) Starts at 6:10pm Attendance is mandatory Best behavior is mandatory Having an intelligent.

We have seen the formula for the distance between two points. We will find it useful to have a formula for the distance between a point and the closest possible point within an Minimum Bounding Rectangle MBR…

Suppose we have a query point Q and one known point B

Could any of the points in the MBR be closer to Q than B is?

MBR = {(6,1),(8,4)}

Q = (3,5)

B = (1,1)

Page 11: Guest Lecture Tonight Room 205 in EBII (under Dr. Keoghs office) Starts at 6:10pm Attendance is mandatory Best behavior is mandatory Having an intelligent.

The formula for the distance between a point and the closest possible point within an MBR

MBR = {(L.x,L.y)(U.x,U.y)}

MINDIST(Q,MBR)

if L.x < x < U.x and L.y < y < U.y then 0elseif L.x < x < U.x then min( (L.y -y)2 , (U.y -y)2 )elseif ….

Q = (x,y)

Page 12: Guest Lecture Tonight Room 205 in EBII (under Dr. Keoghs office) Starts at 6:10pm Attendance is mandatory Best behavior is mandatory Having an intelligent.

We can visualize the locations of interest simply as points in space...

Page 13: Guest Lecture Tonight Room 205 in EBII (under Dr. Keoghs office) Starts at 6:10pm Attendance is mandatory Best behavior is mandatory Having an intelligent.

Q1

Q2

…then both nearest neighbor queries and range queries have a simple geometric interpretation…

Imagine that each point has a ID pointer back to the original database…

23

B876,65878-1342ITA4Sues Pasta

A123,32878-1333MEX3Tinas Mexican

A34,764848-1298US2Joes Bugers

D244,365888-1212ITA1Marios Pizza

GradeLocationPhoneTypeIDName

B876,65878-1342ITA4Sues Pasta

A123,32878-1333MEX3Tinas Mexican

A34,764848-1298US2Joes Bugers

D244,365888-1212ITA1Marios Pizza

GradeLocationPhoneTypeIDName

1

2

Page 14: Guest Lecture Tonight Room 205 in EBII (under Dr. Keoghs office) Starts at 6:10pm Attendance is mandatory Best behavior is mandatory Having an intelligent.

R1

R2R5

R3

R7R9

R8

R6

R4

We can group clusters of datapoints into “boxes”, called Minimum Bounding Rectangles (MBRs).

Each MBR can be represented with just two points. The lower left corner, and the upper right corner.

We can further recursively group MBRs into larger MBRs….

Page 15: Guest Lecture Tonight Room 205 in EBII (under Dr. Keoghs office) Starts at 6:10pm Attendance is mandatory Best behavior is mandatory Having an intelligent.

R10 R11 R12

R1 R2 R3 R4 R5 R6 R7 R8 R9

Data nodes containing points

R10 R11

R12

…these nested MBRs are organized as a tree (called a spatial access tree or a multidimensional tree).

Page 16: Guest Lecture Tonight Room 205 in EBII (under Dr. Keoghs office) Starts at 6:10pm Attendance is mandatory Best behavior is mandatory Having an intelligent.

R10

R1 R2 R3

Data nodes

R12

(3,4) 77(1,3) 88(2,3) 22(5,4) 13

(2,2) 47(3,0) 86(7,9) 52

(5,1) 32(1,4) 45(5,6) 27(7,8) 73

{(1,3),(5,4)} {(2,0),(7,9)} {(1,1),(7,8)}

{(1,0),(7,9)}

At the leave nodes we have the location, and a pointer to the record in question.

At the internal nodes, we just have MBR information.

Page 17: Guest Lecture Tonight Room 205 in EBII (under Dr. Keoghs office) Starts at 6:10pm Attendance is mandatory Best behavior is mandatory Having an intelligent.

We can define a function, MINDIST(point, MBR), which tells us the minimum possible distance between any point and any MBR, at any level of the tree.

MINDIST(point, MBR) = 5 MINDIST(point, MBR) = 0

Page 18: Guest Lecture Tonight Room 205 in EBII (under Dr. Keoghs office) Starts at 6:10pm Attendance is mandatory Best behavior is mandatory Having an intelligent.

R10 R11 R12

R1 R2 R3 R4 R5 R6 R7 R8 R9

Data nodes containing points

We can use the MINDIST(point, MBR), to do fast search..

MIN

DIST(

, R10

) = 0

MIN

DIST(

, R11

) =10

MIN

DIST(

, R12

) =17

R10 R11

R12

Page 19: Guest Lecture Tonight Room 205 in EBII (under Dr. Keoghs office) Starts at 6:10pm Attendance is mandatory Best behavior is mandatory Having an intelligent.

R10 R11 R12

R1 R2 R3 R4 R5 R6 R7 R8 R9

Data nodes containing points

We can use the MINDIST(point, MBR), to do fast search..

R10 R11

R12

MINDIST( , R1) = 0

0 10 17

MINDIST( , R2) = 2

MINDIST( , R3) =10

Page 20: Guest Lecture Tonight Room 205 in EBII (under Dr. Keoghs office) Starts at 6:10pm Attendance is mandatory Best behavior is mandatory Having an intelligent.

R10 R11 R12

R1 R2 R3 R4 R5 R6 R7 R8 R9

Data nodes containing points

R10 R11

R12

0 10 17

0 2 10

We now go to disk, and retrieve all the data objects whose pointers are in the green node. We measure the true distance between our query and those objects. Let us imagine two scenarios, the closest object, the “best-so-far” has a value of..• 1.5 units (we are done searching!)• 4.0 units (we have to look in R2, but then we are done)

Page 21: Guest Lecture Tonight Room 205 in EBII (under Dr. Keoghs office) Starts at 6:10pm Attendance is mandatory Best behavior is mandatory Having an intelligent.
Page 22: Guest Lecture Tonight Room 205 in EBII (under Dr. Keoghs office) Starts at 6:10pm Attendance is mandatory Best behavior is mandatory Having an intelligent.

We have seen that searching R_trees is easy!

But building the MBRs looks like a non-trivial problem.

I will allow a simplifying assumption to make it much easer to understand and code…

R10 R11

R12

R10 R11 R12

R1 R2 R3 R4 R5 R6 R7 R8 R9

Data nodes containing points

A simplifying assumption to A simplifying assumption to make coding the R-Tree easiermake coding the R-Tree easier

Page 23: Guest Lecture Tonight Room 205 in EBII (under Dr. Keoghs office) Starts at 6:10pm Attendance is mandatory Best behavior is mandatory Having an intelligent.

0,0

12,12

We have a collection of datapoints in 2 dimensions.

Before we begin, we must decide on the max number of points per leaf MBR, lets say 3…

Page 24: Guest Lecture Tonight Room 205 in EBII (under Dr. Keoghs office) Starts at 6:10pm Attendance is mandatory Best behavior is mandatory Having an intelligent.

0,0

12,12The definition of an MBR is the smallest (axis parallel) rectangle that can contain a set of points.

We can represent the MBR with just 2 points, the lower left corner (L) and the upper right corner (U).

MBR1 = {(L.x,L.y)(U.x,U.y)}

MBR1 = {(1,1)(9.6,8.9)}

Page 25: Guest Lecture Tonight Room 205 in EBII (under Dr. Keoghs office) Starts at 6:10pm Attendance is mandatory Best behavior is mandatory Having an intelligent.

MBR = {(1,1)(9.6,8.9)}This is the root of our R-Tree

Page 26: Guest Lecture Tonight Room 205 in EBII (under Dr. Keoghs office) Starts at 6:10pm Attendance is mandatory Best behavior is mandatory Having an intelligent.

MBR1 = {(1,1)(9.6,8.9)}

MBR2 = {(1,1)(4.3,3.95)}

MBR2

MBR3 = {(1,3.95)(4.3,8.9)}

MBR3

MBR4

MBR5

MBR4 =

Page 27: Guest Lecture Tonight Room 205 in EBII (under Dr. Keoghs office) Starts at 6:10pm Attendance is mandatory Best behavior is mandatory Having an intelligent.

MBR1 = {(1,1)(9.6,8.9)}

MBR2 = {(1,1)(4.3,3.95)}

MBR2

MBR3 = {(1,3.95)(4.3,8.9)}

MBR4

MBR5

MBR4 =

MBR10 = {(1,3.95)(2.65,6.42)} MBR11 = {(1,6.42)(2.65,8.9)}

MBR10

MBR11

MBR12

MBR13(1.1, 4.4) 44(1.3, 4.1) 34(1.5, 4.3) 12

Page 28: Guest Lecture Tonight Room 205 in EBII (under Dr. Keoghs office) Starts at 6:10pm Attendance is mandatory Best behavior is mandatory Having an intelligent.

Under this definition of an R-tree (which is not the classic definition)

• Every internal node has exactly 4 children.• Every MBR at a given level of the tree is mutually exclusive (a point cannot be in two MBRs at a given level).• Every leaf node is in one of these two classes

•It does not cover any points, and therefore contains a null pointer.•It covers at least one point, and has a pointer to a file which contains those points, together with the primary key.

Page 29: Guest Lecture Tonight Room 205 in EBII (under Dr. Keoghs office) Starts at 6:10pm Attendance is mandatory Best behavior is mandatory Having an intelligent.

Project Report Format

• A new file on the webpage gives details on the format of your final report (the white binders)

Page 30: Guest Lecture Tonight Room 205 in EBII (under Dr. Keoghs office) Starts at 6:10pm Attendance is mandatory Best behavior is mandatory Having an intelligent.

Title PageRequirements Analysis.

Requirements Elicitation.System Specification.

System Design:Program Design:Coding:Unit and Integration Testing:System Testing:Acceptance Testing: A Maintenance Plan: Assume that like the vast majority of software that is commissioned, your contract requires you to maintain the software.Conclusions: Assume that you write this just before handing in your document. Summarize briefly what you did for you project. Address the following; what most surprised you about the process of creating the project. What would you do differently if you had to do it again?References: Every, book, paper, webpage you used must be cited in a standard format. You could use the American Psychological format [1, 2], or the IEEE standard [3], or any other format so long as you are consistent and complete. You should also reference any standard template libraries used in your code. Appendices: Including source code, printed in 2 “pages” per page format. You should also list a professional quality, one-page resume for each member of the group (use the same template for each).Acknowledgements: Your chance to thank anyone who helped you complete the project.Two (identical) Cd roms in a high quality plastic wallet: Containing source code, test data, all the weekly archives, and a readme.txt file that contains a brief (one line) description of all the files.

Page 31: Guest Lecture Tonight Room 205 in EBII (under Dr. Keoghs office) Starts at 6:10pm Attendance is mandatory Best behavior is mandatory Having an intelligent.

How I grade the project…

Page 32: Guest Lecture Tonight Room 205 in EBII (under Dr. Keoghs office) Starts at 6:10pm Attendance is mandatory Best behavior is mandatory Having an intelligent.

Grading Rubric for CS179. Database project.

Group name (all last names): The project binder is in the correct basic format, and is neat. The format is consistent.

1 2 3 4 5

Notes:

The project is well documented (the students carefully referenced, the books, papers, URLs that they used)

1 2 3 4 5

Notes:

Page 33: Guest Lecture Tonight Room 205 in EBII (under Dr. Keoghs office) Starts at 6:10pm Attendance is mandatory Best behavior is mandatory Having an intelligent.

The students made some effort to do elicitation. The came up with interview questions in advance, they noted the answers to the questions. (optional, they also use some other elicitation technique )

1 2 3 4 5

Notes:

The students made some effort to do requirements specification. The specification considers things like the minimum hardware requirements, usability factors, and user scenarios, privacy issues. The specification actually says what the system will do under normal scenarios, and how it will handle errors.

1 2 3 4 5

Notes:

Page 34: Guest Lecture Tonight Room 205 in EBII (under Dr. Keoghs office) Starts at 6:10pm Attendance is mandatory Best behavior is mandatory Having an intelligent.

The students actually designed the project (as opposed to going straight to coding) Design may include flowcharts or pseudocode or English descriptions.

1 2 3 4 5

Notes:

The code is documented, indented etc..

1 2 3 4 5

Notes:

Page 35: Guest Lecture Tonight Room 205 in EBII (under Dr. Keoghs office) Starts at 6:10pm Attendance is mandatory Best behavior is mandatory Having an intelligent.

The project was tested. The students demonstrated how they tested the project. There was unit testing and integration testing.

1 2 3 4 5

Notes:

The overall quality of the finished product

1 2 3 4 5

Notes:

General remarks.

Tentative grade: (may be modified after second pass).Final grade:

Special Circumstances: Students in same group get different grades because…