Database Management Systems, CS 564
Lecture #1
(with some slides integrated from those of Raghu Ramakrishnan, Jeff Ullman, Alon Halevy, and Dan Suciu.)

Dec 19, 2015


Yes. This is the Room for CS 564

• We moved from Humanities 1111

• All future lectures/discussions will be in this room

• Please sit a bit closer to the screen, so that I don’t have to shout

• Room doors are usually locked; I will unlock them 15 minutes before each class


A Bit about Myself

Born in Vietnam; grew up in a fishing village

Nice name: AnHai Doan

“Nghe An” “Hai Phong”

Until my brother was born as HaiAn Doan


Vietnam, Hungary, US

High school in Vietnam

Undergrad in Hungary
– had a lot of beer
– learned seven languages: Hungarian, English, C, C++, Ada, Pascal, PL/I

When the Iron Curtain fell, back in 1993, I was one of the first to reach the US to study


Wisconsin, Seattle, Illinois, Wisconsin

Master’s at Wisconsin-Milwaukee

Ph.D. at Washington-Seattle
– where I failed to take “CS 564”

Started at Univ of Illinois-Urbana
– with corn, cow, campus

In Madison since 2006
– where the four major food groups are


Random Comments from Students

• Take instruction seriously, … gave lots of really excellent dating advice

• All in-class examples revolve around beer

• His accent is very annoying …

• His accent is great. It’s so hard to understand that I’m forced to concentrate in lectures …

• His accent is a bonus feature of the class. Prepared me to work in Silicon Valley

• I now love databases … When I own Oracle, I will pay you back.


What is this Course about?

• Numerous applications must deal with a lot of data

• They typically put data into a database

• The database will be managed by a system called a database management system (DBMS)

• Applications then interact with this system to access and use the database


An Illustration

[Figure: App 1 and App 2 access DB 1, DB 2, and DB 3 through the database management system]


Questions

• What form should the data be in?
– way back in the 1970s, people suggested storing data in tables
– so each database is a set of tables

Students
ID | First Name | Last Name
1  | Barack     | Obama
2  | George     | Bush

Addresses
ID | City          | State
1  | Washington DC | Washington DC
2  | Dallas        | TX


Questions

• What form should the data be in?
– each table can be thought of as a relation in the mathematical sense
– so such a database is referred to as a relational DB
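
To make the "relation in the mathematical sense" idea concrete, here is a minimal Python sketch that treats the Students table as a set of tuples. The names `STUDENT_ATTRS`, `students`, and `as_rows` are made up for illustration; the rows are the ones shown in the Students table.

```python
# A relation in the mathematical sense: a set of tuples over named attributes.
# Attribute names and rows are copied from the Students table on the slide.
STUDENT_ATTRS = ("ID", "First Name", "Last Name")
students = {
    (1, "Barack", "Obama"),
    (2, "George", "Bush"),
}


def as_rows(attrs, relation):
    """Pair each tuple with the attribute names, sorted for stable display."""
    return [dict(zip(attrs, t)) for t in sorted(relation)]


# A set holds no duplicates, so adding an existing tuple changes nothing.
students.add((1, "Barack", "Obama"))

print(len(students))
print(as_rows(STUDENT_ATTRS, students))
```

Because a relation is a set, row order carries no meaning and duplicate rows cannot occur. SQL tables relax this and allow duplicate rows, so read the sketch as the mathematical picture only.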


So the management system is called a relational database management system (or RDBMS for short)

[Figure: App 1 and App 2 access DB 1, DB 2, and DB 3 through the database management system]

Since the 1970s, RDBMSs have been studied intensively, and have taken over the world

• They are now a cornerstone of the modern world

• Powering virtually all data-intensive apps

• $20B industry

• Bought island in Hawaii

• Since then, new types of data have emerged
– that are not very well suited to being modeled as tables


• New types of database management systems have also emerged
– e.g., NoSQL systems

• But RDBMSs remain foundational and pervasive, and will be so in the future

• This class focuses on RDBMSs
– we will learn how to design a relational database
– how to store it in an RDBMS
– how to use an RDBMS
– look into the internals of an RDBMS


• Lessons that you learn in this class will carry over to newer types of database management systems

• You will learn the fundamentals of managing a large amount of data
– critical as the world is becoming increasingly data-centric

• Good for you when you go applying for a job
– many jobs require knowing how to use RDBMSs

• It’s fun


• If you are interested in more data management stuff
– CS 764: gory details about RDBMSs
– CS 784: newer types of data and how to manage them (beyond RDBMSs)


Course Logistics


Prerequisite

• Must have data structure and algorithm background
– CS 367 is a must; CS 537 might be useful

• For the project
– a lot of programming will be required
– in a high-level language of your own choosing (or rather your team’s choosing)
– could be Java, C, C++, Perl, Python, etc.
– must know how to build a Web-based application or be willing to learn


Textbook

– There is no ideal textbook, unfortunately
– Database Management Systems, by R. Ramakrishnan and J. Gehrke, third edition
– Database Systems: The Complete Book, by Garcia-Molina, Ullman and Widom, second edition
– The best thing to do is to attend the lectures, make notes, and read the lecture notes
– Consult the textbooks
– If you do this, you will be fine


Course Format

• For all students
– two 75-min lectures / week
– project: programming, 4-5 stages, may include some basic homework questions
– a midterm and a final exam

• Attending lectures on Wed/Fri is important

• We also use the Mon slots occasionally for make-up lectures

• So if you can’t make Monday 2:25-3:15, do not take the class


• In fact, next week I’m traveling on Wed and Fri

• So we will have a make-up lecture on Monday, Jan 26


Lectures

• Lecture slides in ppt format will be posted shortly before or after the lecture
– they are meant to complement the lectures

• Many issues discussed in the lectures will be covered in the exams
– hence try to attend lectures regularly

• Will not cover ALL materials on the slides
– attending lectures will tell you which is covered and which is not


Project

• Select an application that needs a database

• Build a database application from start to finish

• Significant amount of programming

• Will be done in stages
– you will submit some work at the end of each stage

• May have to show a demo at semester end


Project Groups

• Project will be done in groups of 3-4 students
– a lot of work, difficult to design so that one person can do it all
– learn how to work in a group: a valuable skill
– groups are like broccoli, they are good for you

• Try to form groups as soon as possible
– can start by posting requests on Piazza

• There will be a deadline later for forming groups

• If you have not formed groups by then
– we will help assign you to groups


More on Grouping

• All group members receive the same grade

• If someone drops out, the rest pick up the work


Exams

• Midterm & final
– dates will be announced shortly
– check the dates and make sure there is no conflict!

• There may be some brief review before each exam

• If you have conflicts
– do let us know in advance

• The Uncle problem


Tentative Grading Breakdown

• Midterm: 25%

• Final: 35%

• Project: 40%

• Will attempt to grade on an absolute scale as much as possible
– not on a curve


Contacting the staff ...


Staff & Office Hours

• Instructor: AnHai Doan

• TAs:
– Avinaash Gupta
– Harneet Singh

• See class homepage for office hours, contact information


Communications

• class homepage
– www.cs.wisc.edu/~anhai/courses/564-sp15

• mailing list: [email protected]
– vitally important!
– make sure to check it regularly for new announcements

• Piazza: will be set up shortly

• If you have a question/problem
– talk to people in your group first
– post your question on Piazza
– email TA
– go to office hours to talk to TA or instructor


Now onto database studies ...


At the Beginning

• A program typically consists of code + data

• E.g., need to sort 1000 numbers
– 2, 4, 6, 8, 1, 13, 9, ...

• Store these numbers in an array

• Write some code to sort

• Both code + data are stored in memory, and mixed together
– this is typical of the sort programs you learned in CS 367


• Eventually people realized that
– the data part could be huge; maybe not sorting 1000 numbers, but 1 trillion numbers
– this posed serious problems: what happens if the data doesn’t fit into memory?
– another issue is that many apps may want to access and do the same thing with data
– should we write duplicate code for each of these apps?
– maybe we should factor out common code
– thus the motivation for databases and DB management systems


An Illustration

[Figure: App 1 and App 2 access DB 1, DB 2, and DB 3 through the database management system]


Another Motivating Example

• Suppose we want to store, manipulate, and query information about:
– students
– courses
– professors
– who takes what, who teaches what


Application Requirements

• store the data for a long period of time
– large amounts (100s of GB)
– protect against crashes
– protect against unauthorized use

• allow users to query/update:
– who teaches “CS 367”
– enroll “Mary” in “CS 564”


• allow several (100s, 1000s) users to access the data simultaneously

• allow administrators to change the schema
– add information about TAs


Trying Without a DBMS

• Why Direct Implementation Won’t Work:

• Storing data: file system is limited
– size less than 4GB (on 32-bit machines)
– when the system crashes we may lose data
– password-based authorization insufficient

• Query/update:
– need to write a new C++/Java program for every new query
– need to worry about performance


• Concurrency: limited protection
– need to worry about interfering with other users
– need to offer different views to different users (e.g. registrar, students, professors)

• Schema change:
– entails changing file formats
– need to rewrite virtually all applications

• Better let a database system handle it


What Can a DBMS Do for Us?

• Data Definition Language - DDL

• Data Manipulation Language - DML
– query language

• Storage management

• Transaction Management
– concurrency control
– recovery

• Think buying a plane ticket! Can you do it without a DBMS?
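
As a rough illustration of the DDL / DML / transaction split listed above, here is a minimal sketch of the plane-ticket scenario using Python's built-in `sqlite3` module. The tables `seats` and `tickets`, the function names, and the passenger names are all hypothetical; SQLite merely stands in for whichever DBMS an airline would really use, and the sketch shows only the all-or-nothing (recovery) side of transactions on a single connection, not concurrency control between many users.

```python
import sqlite3


def run_booking_demo():
    """Return (first_ok, second_ok, ticket_count, seat_owner)."""
    conn = sqlite3.connect(":memory:")

    # DDL: define the tables (hypothetical schema for a tiny airline).
    conn.execute("CREATE TABLE seats (seat TEXT PRIMARY KEY, passenger TEXT)")
    conn.execute("CREATE TABLE tickets (passenger TEXT, seat TEXT)")

    # DML: insert one free seat.
    conn.execute("INSERT INTO seats (seat, passenger) VALUES ('1A', NULL)")
    conn.commit()

    def book(seat, passenger):
        """Issue a ticket and claim the seat as one all-or-nothing unit."""
        try:
            # Used as a context manager, the connection commits if the block
            # succeeds and rolls back if the block raises.
            with conn:
                conn.execute(
                    "INSERT INTO tickets (passenger, seat) VALUES (?, ?)",
                    (passenger, seat),
                )
                cur = conn.execute(
                    "UPDATE seats SET passenger = ? "
                    "WHERE seat = ? AND passenger IS NULL",
                    (passenger, seat),
                )
                if cur.rowcount != 1:
                    raise ValueError("seat already taken")
            return True
        except ValueError:
            return False

    first_ok = book("1A", "Mary")
    second_ok = book("1A", "Dan")  # fails, so Dan's ticket row is rolled back

    # DML: query the final state.
    ticket_count = conn.execute("SELECT COUNT(*) FROM tickets").fetchone()[0]
    seat_owner = conn.execute(
        "SELECT passenger FROM seats WHERE seat = '1A'"
    ).fetchone()[0]
    conn.close()
    return first_ok, second_ok, ticket_count, seat_owner


print(run_booking_demo())
```

The point of the second booking is that the ticket insert and the seat update succeed or fail together: without a transaction, Dan would be left holding a ticket for a seat he never got.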


What Can a DBMS Do for Us?

• Automate a lot of boring/mundane operations on data
– so that we don’t have to program them over and over
– so that we can write complex data manipulations in just a few lines, and concentrate on app logic

• Make execution very fast
– so that it scales up to very large data sets

• Make concurrent access/modification possible
– so that many users can use the data at the same time


Building an Application with a DBMS

• Requirements modeling (conceptual, pictures)
– Decide what entities should be part of the application and how they should be linked.

• Schema design and implementation
– Decide on a set of tables, attributes.
– Define the tables in the database system.
– Populate database (insert tuples).

• Write application programs using the DBMS
– way easier now that the data management is taken care of.


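Conceptual Modeling

[Figure: entity-relationship diagram with entities Professor (name, address, field), Student (ssn, name, category), and Course (cid, name, quarter), connected by the relationships Advises, Takes, and Teaches]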


Schema Design and Implementation

• Tables:

Students:
SSN         | Name    | Category
123-45-6789 | Charles | undergrad
234-56-7890 | Dan     | grad
…           | …

Takes:
SSN         | CID
123-45-6789 | CSE444
123-45-6789 | CSE444
234-56-7890 | CSE142

Courses:
CID    | Name              | Quarter
CSE444 | Databases         | fall
CSE541 | Operating systems | winter

• Separates the logical view from the physical view of the data.
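
The "define the tables, then populate them" step for the Students, Takes, and Courses tables might look roughly like this with Python's built-in `sqlite3` module. This is a sketch under assumptions: the slide does not give column types or keys, so the `TEXT` types and `PRIMARY KEY` choices below are guesses, and `SCHEMA` and `build_database` are made-up names. The rows come from the slide, with the repeated Takes row entered once.

```python
import sqlite3

# Assumed column types and keys; the slide shows only column names.
SCHEMA = """
CREATE TABLE Students (ssn TEXT PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE Courses  (cid TEXT PRIMARY KEY, name TEXT, quarter TEXT);
CREATE TABLE Takes    (ssn TEXT, cid TEXT, PRIMARY KEY (ssn, cid));
"""


def build_database():
    """Create the three tables in memory and insert the rows from the slide."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)  # define the tables
    conn.executemany(
        "INSERT INTO Students VALUES (?, ?, ?)",
        [
            ("123-45-6789", "Charles", "undergrad"),
            ("234-56-7890", "Dan", "grad"),
        ],
    )
    conn.executemany(
        "INSERT INTO Courses VALUES (?, ?, ?)",
        [
            ("CSE444", "Databases", "fall"),
            ("CSE541", "Operating systems", "winter"),
        ],
    )
    # Distinct (ssn, cid) pairs only, since (ssn, cid) is declared as the key.
    conn.executemany(
        "INSERT INTO Takes VALUES (?, ?)",
        [
            ("123-45-6789", "CSE444"),
            ("234-56-7890", "CSE142"),
        ],
    )
    conn.commit()
    return conn


conn = build_database()
tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
print(tables)
print(conn.execute("SELECT COUNT(*) FROM Students").fetchone()[0])
```

Nothing here says how the rows are laid out on disk, which is the logical/physical separation the slide refers to: the application sees tables, and the DBMS decides on files, pages, and indexes.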


Querying a Database

• Find all courses that “Mary” takes

• S(tructured) Q(uery) L(anguage)

• Query processor figures out how to answer the query efficiently.

select C.name
from Students S, Takes T, Courses C
where S.name = 'Mary' and S.ssn = T.ssn and T.cid = C.cid
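
To try the query end to end, the sketch below runs it against an in-memory SQLite database via Python's `sqlite3` module. The sample rows (including a student named Mary and the SSNs) are invented for the example, since the tables on the earlier slide contain no Mary; `courses_taken_by` is a hypothetical helper name, and the student name is passed as a parameter rather than written into the query text.

```python
import sqlite3


def courses_taken_by(student_name):
    """Run the slide's three-table query for one student name."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical sample data, just enough to give the query an answer.
    conn.executescript(
        """
        CREATE TABLE Students (ssn TEXT, name TEXT);
        CREATE TABLE Courses  (cid TEXT, name TEXT);
        CREATE TABLE Takes    (ssn TEXT, cid TEXT);
        INSERT INTO Students VALUES ('111-11-1111', 'Mary'), ('222-22-2222', 'Dan');
        INSERT INTO Courses  VALUES ('CSE444', 'Databases'), ('CSE541', 'Operating systems');
        INSERT INTO Takes    VALUES ('111-11-1111', 'CSE444'), ('222-22-2222', 'CSE541');
        """
    )
    rows = conn.execute(
        "select C.name "
        "from Students S, Takes T, Courses C "
        "where S.name = ? and S.ssn = T.ssn and T.cid = C.cid",
        (student_name,),
    ).fetchall()
    conn.close()
    return [name for (name,) in rows]


print(courses_taken_by("Mary"))
```

Note that the query says only what is wanted (course names for Mary), not how to find them; choosing the join order and algorithms is left to the query processor.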


Query Optimization

Goal: go from a declarative SQL query to an imperative query execution plan

Declarative SQL query:

select C.name
from Students S, Takes T, Courses C
where S.name = 'Mary' and S.ssn = T.ssn and T.cid = C.cid

Imperative query execution plan:

[Figure: operator tree over Students, Takes, and Courses, with joins on sid=sid and cid=cid, a selection name=“Mary”, and a projection on sname]

Plan: tree of Relational Algebra operators, choice of algorithms at each operator
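
For a feel of what an "imperative plan" means, here is one possible plan for the Mary query written by hand in Python: a selection, two hash joins, and a projection. It is only a sketch. The operator functions, the sample rows, and the choice to apply the selection before the joins are all assumptions made for illustration (the tree on the slide may order the operators differently), and the join columns are called ssn here to match the tables, where the figure labels them sid.

```python
from collections import defaultdict

# Hypothetical sample data; tuple layouts are noted beside each table.
STUDENTS = [("111-11-1111", "Mary"), ("222-22-2222", "Dan")]          # (ssn, name)
TAKES = [("111-11-1111", "CSE444"), ("222-22-2222", "CSE541")]         # (ssn, cid)
COURSES = [("CSE444", "Databases"), ("CSE541", "Operating systems")]   # (cid, name)


def select(rows, predicate):
    """Selection: keep only the rows that satisfy the predicate."""
    return [r for r in rows if predicate(r)]


def hash_join(left, right, left_key, right_key):
    """Equi-join: build a hash index on `right`, then probe it with `left`."""
    index = defaultdict(list)
    for r in right:
        index[r[right_key]].append(r)
    return [l + r for l in left for r in index.get(l[left_key], [])]


def project(rows, column):
    """Projection: keep a single column."""
    return [r[column] for r in rows]


def plan(student_name):
    """One possible execution plan for the query."""
    chosen = select(STUDENTS, lambda s: s[1] == student_name)  # (ssn, name)
    with_takes = hash_join(chosen, TAKES, 0, 0)       # (ssn, name, ssn, cid)
    with_courses = hash_join(with_takes, COURSES, 3, 0)  # ... + (cid, course name)
    return project(with_courses, 5)


print(plan("Mary"))
```

A query optimizer considers many such plans (different join orders, different join algorithms) and picks one it estimates to be cheap; the SQL text stays the same either way.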


Database Industry

• Relational databases are a great success of theoretical ideas.

• Big DBMS companies are among the largest software companies in the world.

• Oracle

• IBM (with DB2)

• Microsoft (SQL Server, Microsoft Access)

• Others

• $20B industry.


The Study of DBMS

• Several aspects:
– Modeling and design of databases
– Database programming: querying and update operations
– Database implementation

• DBMS study cuts across many fields of Computer Science: OS, languages, AI, Logic, multimedia, theory...