
Jun 02, 2020


Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1

*** The “Online” Edition ***
Introduction to Data Management

Lecture #2 – Part 1 (Course Trailer, cont.)

Instructor: Mike Carey [email protected]

SQL


Today’s Notices

❖ Reminder: First “online” CS122A...
❖ Frequently check the course wiki page:
   ▪ http://www.ics.uci.edu/~cs122a/
❖ And camp out on the Piazza page:
   ▪ http://piazza.com/uci/spring2020/cs122a/home
❖ Keep partnering up!
   ▪ We’ll share the plan in the first HW assignment
❖ Working on possible free textbook access
   ▪ We’ll know tomorrow either way
❖ Any questions?
   ▪ Just kidding.... :)


From Last Time: University DB

❖ Conceptual schema:
   ▪ Students(sid: string, name: string, login: string, age: integer, gpa: real)
   ▪ Courses(cid: string, cname: string, credits: integer)
   ▪ Enrolled(sid: string, cid: string, grade: string)
❖ Physical schema:
   ▪ Relations stored as unordered files
   ▪ Indexes on first and third columns of Students
❖ External schema (a.k.a. view):
   ▪ CourseInfo(cid: string, cname: string, enrollment: integer)
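
As a rough illustration, the three schema levels could be set up in SQLite from Python along the following lines. The SQLite type names, the index names, the sample rows, and the definition of the CourseInfo view are assumptions made for this sketch; the slide gives only the relation signatures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Conceptual schema (SQLite type names approximate string/integer/real)
    CREATE TABLE Students (sid TEXT, name TEXT, login TEXT, age INTEGER, gpa REAL);
    CREATE TABLE Courses  (cid TEXT, cname TEXT, credits INTEGER);
    CREATE TABLE Enrolled (sid TEXT, cid TEXT, grade TEXT);

    -- Physical schema: indexes on the first and third columns of Students
    -- (index names are hypothetical)
    CREATE INDEX idx_students_sid   ON Students(sid);
    CREATE INDEX idx_students_login ON Students(login);

    -- External schema: one plausible definition of the CourseInfo view
    CREATE VIEW CourseInfo AS
        SELECT c.cid, c.cname, COUNT(e.sid) AS enrollment
        FROM Courses c LEFT JOIN Enrolled e ON e.cid = c.cid
        GROUP BY c.cid, c.cname;

    -- Made-up sample data
    INSERT INTO Courses  VALUES ('CS122A', 'Introduction to Data Management', 4);
    INSERT INTO Enrolled VALUES ('s1', 'CS122A', 'A'), ('s2', 'CS122A', 'B');
""")

# The view derives its enrollment column from the Enrolled table.
course_info = conn.execute(
    "SELECT cid, cname, enrollment FROM CourseInfo"
).fetchall()
print(course_info)
```

The view stores no data of its own, which is what lets the external schema stay stable while the tables underneath change.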


Data Independence

❖ Applications are insulated (at multiple levels) from how data is actually structured and stored, thanks to schema layering and high-level queries
   ▪ Logical data independence: Protection from changes in the logical structure of data
   ▪ Physical data independence: Protection from changes in the physical structure of data
❖ One of the most important benefits of DBMS use!
   ▪ Allows changes to occur – w/o application rewrites!
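
A minimal sketch of both kinds of independence, using SQLite from Python. The `application_query` function, the index name, the split of Courses into `CourseNames` and `CourseCredits`, and the sample rows are all hypothetical; the point is only that the application's query text never changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Courses  (cid TEXT, cname TEXT, credits INTEGER);
    CREATE TABLE Enrolled (sid TEXT, cid TEXT, grade TEXT);
    INSERT INTO Courses  VALUES ('C1', 'Computer Game Design', 4);
    INSERT INTO Enrolled VALUES ('s1', 'C1', 'A'), ('s2', 'C1', 'B');
    CREATE VIEW CourseInfo AS
        SELECT c.cid, c.cname, COUNT(e.sid) AS enrollment
        FROM Courses c LEFT JOIN Enrolled e ON e.cid = c.cid
        GROUP BY c.cid, c.cname;
""")

def application_query(conn):
    """The 'application': it only ever talks to the CourseInfo view."""
    return conn.execute(
        "SELECT cid, enrollment FROM CourseInfo WHERE cname = ?",
        ("Computer Game Design",),
    ).fetchall()

before = application_query(conn)

# Physical change: add an index. Only performance can change, not results.
conn.execute("CREATE INDEX idx_enrolled_cid ON Enrolled(cid)")
after_physical_change = application_query(conn)

# Logical change: split Courses into two tables and redefine the view
# so that it still exposes the same columns to the application.
conn.executescript("""
    DROP VIEW CourseInfo;
    CREATE TABLE CourseNames   AS SELECT cid, cname   FROM Courses;
    CREATE TABLE CourseCredits AS SELECT cid, credits FROM Courses;
    DROP TABLE Courses;
    CREATE VIEW CourseInfo AS
        SELECT n.cid, n.cname, COUNT(e.sid) AS enrollment
        FROM CourseNames n LEFT JOIN Enrolled e ON e.cid = n.cid
        GROUP BY n.cid, n.cname;
""")
after_logical_change = application_query(conn)

print(before, after_physical_change, after_logical_change)
```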


University DB Example (cont.)

❖ End user query (in SQL, against the external schema):
   ▪ SELECT c.cid, c.enrollment
     FROM CourseInfo c
     WHERE c.cname = 'Computer Game Design'
❖ Equivalent query (against the conceptual schema):
   ▪ SELECT e.cid, count(*)
     FROM Enrolled e, Courses c
     WHERE e.cid = c.cid AND c.cname = 'Computer Game Design'
     GROUP BY e.cid
❖ Under the hood (against the physical schema):
   1. Access Courses – use index on cname to find associated cid
   2. Access Enrolled – use index on cid to count the enrollments
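
The equivalence of the two queries can be checked with a small SQLite sketch. The view definition, index names, and sample rows are assumptions; the conceptual-schema query is written with `COUNT(*)` and `GROUP BY e.cid` so that it runs on SQLite. `EXPLAIN QUERY PLAN` is SQLite's way of showing the "under the hood" access steps, and its output text varies by SQLite version, so it is only printed here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Courses  (cid TEXT, cname TEXT, credits INTEGER);
    CREATE TABLE Enrolled (sid TEXT, cid TEXT, grade TEXT);
    -- Hypothetical indexes matching the two access steps on the slide
    CREATE INDEX idx_courses_cname ON Courses(cname);
    CREATE INDEX idx_enrolled_cid  ON Enrolled(cid);
    CREATE VIEW CourseInfo AS
        SELECT c.cid, c.cname, COUNT(e.sid) AS enrollment
        FROM Courses c LEFT JOIN Enrolled e ON e.cid = c.cid
        GROUP BY c.cid, c.cname;
    INSERT INTO Courses  VALUES ('C1', 'Computer Game Design', 4),
                                ('C2', 'Databases', 4);
    INSERT INTO Enrolled VALUES ('s1', 'C1', 'A'), ('s2', 'C1', 'B'),
                                ('s1', 'C2', 'A');
""")

external_query = """
    SELECT c.cid, c.enrollment
    FROM CourseInfo c
    WHERE c.cname = 'Computer Game Design'
"""
conceptual_query = """
    SELECT e.cid, COUNT(*)
    FROM Enrolled e, Courses c
    WHERE e.cid = c.cid AND c.cname = 'Computer Game Design'
    GROUP BY e.cid
"""

external_rows = conn.execute(external_query).fetchall()
conceptual_rows = conn.execute(conceptual_query).fetchall()
print(external_rows, conceptual_rows)

# The last column of each EXPLAIN QUERY PLAN row is a human-readable step.
for row in conn.execute("EXPLAIN QUERY PLAN " + conceptual_query):
    print(row[-1])
```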


Concurrency and Recovery

❖ Concurrent execution of user programs is essential to achieve good DBMS performance.
   ▪ Disk accesses are frequent and slow, so it’s important to keep the CPUs busy by serving multiple users’ programs concurrently.
   ▪ Interleaving multiple programs’ actions can lead to inconsistency: e.g., a bank transfer while a customer’s assets are being totaled.
❖ Errors or crashes may occur during, or soon after, the execution of users’ programs.
   ▪ This could lead to undesirable partial results or to lost results.
❖ DBMS answer: Let users/programmers pretend that they’re using a reliable, single-user system!
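
The "no partial results" half of that promise can be sketched with a bank transfer wrapped in a transaction. The `Accounts` table, the `transfer` helper, and the balances are hypothetical, and this single-connection SQLite example shows only rollback on failure, not concurrent interleaving.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Accounts ("
    " owner TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO Accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money between accounts as one all-or-nothing unit."""
    try:
        # 'with conn' commits if the block succeeds and rolls back if it raises.
        with conn:
            conn.execute(
                "UPDATE Accounts SET balance = balance + ? WHERE owner = ?",
                (amount, dst),
            )
            # This debit violates the CHECK constraint if funds are short,
            # which undoes the credit above as well.
            conn.execute(
                "UPDATE Accounts SET balance = balance - ? WHERE owner = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn):
    return dict(conn.execute("SELECT owner, balance FROM Accounts"))

failed = transfer(conn, "alice", "bob", 500)   # too large: must leave no trace
after_failed = balances(conn)
succeeded = transfer(conn, "alice", "bob", 30)
after_success = balances(conn)
print(failed, after_failed, succeeded, after_success)
```

The application code never has to undo the half-finished credit itself; the rollback does it.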


Structure of a DBMS

❖ A typical DBMS has a layered architecture.
❖ This figure leaves out the concurrency control and recovery components.
❖ This is one of several possible architectures; each RDBMS has its own variations.

[Figure: layered DBMS architecture. Layers, top to bottom: Query Optimization and Execution; Relational Operators; Files and Access Methods; Buffer Management; Disk Space Management; DB. Input label: SQL. Side note: these layers must consider concurrency control and recovery.]


DBMS Structure (More Detail)

[Figure: detailed DBMS architecture. Components shown: Query Parser; Query Optimizer; Plan Executor; Relational Operators (+ Utilities); Files of Records; Access Methods (Indices); Buffer Manager; Disk Space and I/O Manager; Lock Manager; Transaction Manager; Log Manager. Storage shown: Data Files; Index Files; Catalog Files; WAL. Interface labels: SQL; Query plans; API calls. Course annotations: (CS223), (CS122C).]


Components’ Roles

❖ Query Parser
   ▪ Parses and analyzes the SQL query
   ▪ Makes sure the query is valid and talking about tables, etc., that indeed exist
❖ Query Optimizer (usually has 2 steps)
   ▪ Rewrite the query logically
   ▪ Perform cost-based optimization
   ▪ Goal is finding a “good” query plan considering
      • Available access paths (files & indexes)
      • Data statistics (if known)
      • Costs of the various relational operations
   (Cost differences are often orders of magnitude!!!)

Example query:
   SELECT e.title, e.lastname
   FROM Employees e, Departments d
   WHERE e.dept_id = d.dept_id AND
         year(e.birthday) >= 1970 AND
         d.dept_name = 'Engineering'
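
To make the cost-based step concrete, here is a deliberately tiny cost model in Python. The two cost formulas, the plan names, and all the statistics are assumptions chosen for illustration; real optimizers use far richer models.

```python
def full_scan_cost(table_pages):
    """Page I/Os to read every page of the table once."""
    return table_pages

def index_lookup_cost(index_height, matching_rows):
    """Page I/Os to descend an index, then fetch one page per matching row
    (assumes an unclustered index)."""
    return index_height + matching_rows

# Hypothetical statistics the optimizer might find in the catalog.
table_pages = 10_000
index_height = 3
matching_rows = 20

plan_costs = {
    "full scan of Employees": full_scan_cost(table_pages),
    "index lookup on Employees.dept_id": index_lookup_cost(index_height, matching_rows),
}

# Pick the cheapest candidate plan.
best_plan = min(plan_costs, key=plan_costs.get)
print(best_plan, plan_costs)
```

With these made-up numbers the two plans differ by more than two orders of magnitude, which is the effect the slide is pointing at.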


Components’ Roles (continued)

❖ Plan Executor + Relational Operators
   ▪ Runtime side of query processing
   ▪ Query plan is a tree of relational operators (drawn from the relational algebra, which you will learn all about in this class)
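
One common way to picture a plan as a tree of operators is to write each operator as a Python generator that pulls rows from its child. The operator names, the sample rows, and the `birth_year` column are hypothetical; this is a sketch of the idea, not how any particular DBMS implements its executor.

```python
def scan(rows):
    """Leaf operator: produce every row of a stored table."""
    for row in rows:
        yield row

def select(child, predicate):
    """Keep only the rows that satisfy the predicate."""
    for row in child:
        if predicate(row):
            yield row

def project(child, columns):
    """Keep only the named columns of each row."""
    for row in child:
        yield {c: row[c] for c in columns}

def nested_loop_join(outer, inner_rows, condition):
    """Pair each outer row with every matching inner row."""
    for o in outer:
        for i in inner_rows:
            if condition(o, i):
                yield {**o, **i}

employees = [
    {"lastname": "Lee", "title": "Engineer", "dept_id": 1, "birth_year": 1985},
    {"lastname": "Kim", "title": "Manager", "dept_id": 1, "birth_year": 1965},
    {"lastname": "Diaz", "title": "Analyst", "dept_id": 2, "birth_year": 1990},
]
departments = [
    {"dept_id": 1, "dept_name": "Engineering"},
    {"dept_id": 2, "dept_name": "Finance"},
]

# The plan is a tree: project( select( join( scan(Employees), Departments ) ) )
plan = project(
    select(
        nested_loop_join(
            scan(employees),
            departments,
            lambda e, d: e["dept_id"] == d["dept_id"],
        ),
        lambda r: r["birth_year"] >= 1970 and r["dept_name"] == "Engineering",
    ),
    ["title", "lastname"],
)

result = list(plan)
print(result)
```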


Components’ Roles (continued)

❖ Files of Records
   ▪ DBMSs have record-based APIs under the hood
      • Record = set of fields
      • Fields are typed
      • Records reside on pages of files
❖ Access Methods
   ▪ Index structures for lookups based on field values
   ▪ We’ll look in more depth at B+ tree indexes in this class (the most commonly used index type for both commercial and open source DBMSs)
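
A minimal sketch of typed records living on fixed-size pages, with an index from a field value to a record id. The record layout (an integer sid, a 20-byte name, a 32-bit gpa), the 4096-byte page size, and the `HeapFile` class are assumptions; the index is a plain Python dict standing in for a real structure such as a B+ tree.

```python
import struct

RECORD_FORMAT = "<i20sf"                      # sid: int, name: 20 bytes, gpa: float32
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)  # bytes per fixed-length record
PAGE_SIZE = 4096
SLOTS_PER_PAGE = PAGE_SIZE // RECORD_SIZE

class HeapFile:
    """An unordered file of fixed-length records stored on pages."""

    def __init__(self):
        self.pages = []
        self.count = 0

    def insert(self, sid, name, gpa):
        """Append a record and return its record id (page number, slot)."""
        page_no, slot = divmod(self.count, SLOTS_PER_PAGE)
        if page_no == len(self.pages):
            self.pages.append(bytearray(PAGE_SIZE))  # allocate a new page
        struct.pack_into(
            RECORD_FORMAT, self.pages[page_no], slot * RECORD_SIZE,
            sid, name.encode("utf-8"), gpa,
        )
        self.count += 1
        return (page_no, slot)

    def read(self, rid):
        """Fetch the record stored at a record id."""
        page_no, slot = rid
        sid, raw_name, gpa = struct.unpack_from(
            RECORD_FORMAT, self.pages[page_no], slot * RECORD_SIZE
        )
        return sid, raw_name.rstrip(b"\x00").decode("utf-8"), gpa

heap = HeapFile()
sid_index = {}  # field value -> record id (stand-in for a B+ tree)
for sid, name, gpa in [(1, "Ana", 3.5), (2, "Bob", 3.25)]:
    sid_index[sid] = heap.insert(sid, name, gpa)

# Index lookup: go straight to the right page and slot instead of scanning.
found = heap.read(sid_index[2])
print(found)
```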


Components’ Roles (continued)

❖ Buffer Manager
   ▪ The DBMS answer to main memory management!
   ▪ All disk page accesses go via the buffer pool
   ▪ Buffer manager caches pages from files and indexes
❖ Disk Space and I/O Managers
   ▪ Manage space on disk (pages)
   ▪ Also manage I/O (sync, async, prefetch, …)
   ▪ Remember: database data is persistent (!)
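
The caching role of the buffer manager can be sketched as a small page cache. The `BufferPool` class, the least-recently-used eviction policy, and the fake disk reader are assumptions for illustration (the slide names no replacement policy), and the sketch omits pinning and dirty-page write-back.

```python
from collections import OrderedDict

class BufferPool:
    """Caches up to `capacity` pages; evicts the least recently used one."""

    def __init__(self, capacity, read_page_from_disk):
        self.capacity = capacity
        self.read_page_from_disk = read_page_from_disk
        self.frames = OrderedDict()  # page_id -> page contents, oldest first
        self.hits = 0
        self.misses = 0

    def get_page(self, page_id):
        if page_id in self.frames:
            self.hits += 1
            self.frames.move_to_end(page_id)  # mark as most recently used
            return self.frames[page_id]
        self.misses += 1
        if len(self.frames) >= self.capacity:
            self.frames.popitem(last=False)   # evict least recently used
        page = self.read_page_from_disk(page_id)
        self.frames[page_id] = page
        return page

disk_reads = []

def fake_disk_read(page_id):
    """Stand-in for the disk space and I/O manager."""
    disk_reads.append(page_id)
    return f"contents of page {page_id}"

pool = BufferPool(capacity=2, read_page_from_disk=fake_disk_read)
for page_id in [1, 2, 1, 3, 2]:
    pool.get_page(page_id)

print(pool.hits, pool.misses, disk_reads)
```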


Components’ Roles (continued)

❖ System Catalog (or “Metadata”)
   ▪ Info about tables (name, columns, column types, …)
   ▪ Data statistics (e.g., counts, value distributions, …)
   ▪ Info about indexes (tables, index kinds, …)
   ▪ And so on! (Views, security, …)
❖ Transaction Management
   ▪ ACID (Atomicity, Consistency, Isolation, Durability)
   ▪ Lock Manager for Consistency + Isolation
   ▪ Log Manager for Atomicity + Durability
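
Most systems let you query the catalog itself. As one example, SQLite exposes it through the `sqlite_master` table and the `table_info` pragma; other DBMSs use different catalog tables. The table and index below are made up for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Students (sid TEXT, name TEXT, login TEXT, age INTEGER, gpa REAL);
    CREATE INDEX idx_students_login ON Students(login);
""")

# Info about tables and indexes lives in SQLite's catalog table.
tables = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]
indexes = conn.execute(
    "SELECT name, tbl_name FROM sqlite_master WHERE type = 'index'"
).fetchall()

# Column names and types: each row is (cid, name, type, notnull, dflt_value, pk).
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(Students)")]

print(tables, indexes, columns)
```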


Miscellany: Terms (for parties :)

❖ Data Definition Language (DDL)
   ▪ Used to express views + logical schemas (using a syntactic form of a data model, e.g., relational)
❖ Data Manipulation Language (DML)
   ▪ Used to access and update the data in the database (again in terms of a data model, e.g., relational)
❖ Query Language (QL)
   ▪ Synonym for DML, or for its retrieval (i.e., data access or query) sublanguage
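
In SQL all three terms map onto statement kinds, which a short SQLite sketch can label. The `Students` columns used here, the `HonorRoll` view, and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define a logical schema and a view over it.
conn.execute("CREATE TABLE Students (sid TEXT, name TEXT, gpa REAL)")
conn.execute(
    "CREATE VIEW HonorRoll AS SELECT sid, name FROM Students WHERE gpa >= 3.5"
)

# DML: insert and update data.
conn.execute("INSERT INTO Students VALUES ('s1', 'Ana', 3.9), ('s2', 'Ben', 3.1)")
conn.execute("UPDATE Students SET gpa = 3.6 WHERE sid = 's2'")

# Query language: the retrieval part of the DML.
honor_roll = conn.execute("SELECT sid, name FROM HonorRoll ORDER BY sid").fetchall()
print(honor_roll)
```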


Miscellany (cont’d.): Roles

❖ Database Administrator (DBA)
   ▪ The “super user” for a database or a DBMS
   ▪ Handles physical DB design, tuning, performance monitoring, backup/restore, user/group management
❖ Application Developer
   ▪ Builds data-centric applications (take CS122b!)
   ▪ Involved with logical DB design, queries, and DB application tools (e.g., JDBC, ORM, …)
❖ Data Analyst or End User
   ▪ Non-expert who uses tools to interact w/ the data
❖ Data Engineer (new)
   ▪ Develops/constructs/maintains Big Data platforms and data flows
   ▪ Uses multiple Big Data (etc.) tools and technologies to prepare data products for consumption by Data Scientists


A Brief History of Databases

❖ Pre-relational era: 1960’s, early 1970’s
❖ Codd’s relational model paper: 1970
❖ Basic RDBMS R&D: 1970-80 (System R, Ingres)
❖ RDBMS improvements: 1980-85
❖ Relational goes mainstream: 1985-90
❖ Distributed DBMS research: 1980-90
❖ Parallel DBMS research: 1985-95
❖ Extensible DBMS research: 1985-95
❖ OLAP and warehouse research: 1990-2000
❖ Stream DB and XML DB research: 2000-2010
❖ “Big Data” R&D (also including “NoSQL”): 2005-present


Introductory Recap

❖ DBMSs are used to maintain & query large datasets.
❖ Benefits include recovery from system crashes, concurrent access, quick application development, data integrity and security.
❖ Levels of abstraction give data independence.
❖ A DBMS typically has a layered architecture.
❖ DBAs (and Data Engineers) hold responsible jobs and they are also well-paid! (:)
❖ Data-related R&D is one of the broadest, most exciting areas in CS.


So Now What?

❖ Time to dive into the first tech topic:
   ▪ Logical DB design (ER model)
❖ Read the first two chapters of the book
   ▪ Intro and ER – see the syllabus on the wiki
❖ Immediate to-do’s for you are:
   ▪ Again, be sure that you’re signed up on Piazza
   ▪ And, stockpile sleep – no homework just yet (:)
❖ Let’s switch gears to database design…


To Be Continued...