File and Databases CS208

File and Databases CS208. File Organization The three principal file organizations are Sequential Direct Indexed Sequential.

Dec 21, 2015


File and Databases

CS208


File Organization

The three principal file organizations are:
- Sequential
- Direct
- Indexed Sequential


Sequential File Organization

Records are physically stored one after another, in an order determined by the key field.

Advantages:
- Very efficient when many or all records in a file need to be accessed
- Cost of tape and drives is very low

Disadvantages:
- Major drawback is very slow access time for a particular record
- All records following an inserted record must be rewritten
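The trade-off described for sequential files can be sketched in a few lines of Python. This is only an illustration, not part of the original slides: the record values and the helper names `find_sequential` and `insert_sequential` are assumptions, and a Python list stands in for a tape or sequential disk file.

```python
import bisect

# Hypothetical records (key, surname), kept in key order as in a sequential file.
records = [(101, "Doe"), (205, "Smith"), (310, "Johnson")]

def find_sequential(records, key):
    """Read records from the start; return (record or None, records read)."""
    reads = 0
    for record in records:
        reads += 1
        if record[0] == key:
            return record, reads
        if record[0] > key:  # passed the place where the key would be stored
            break
    return None, reads

def insert_sequential(records, new_record):
    """Insert in key order; return (new file, number of records rewritten)."""
    keys = [r[0] for r in records]
    position = bisect.bisect_left(keys, new_record[0])
    rewritten = len(records) - position  # every later record has to move
    return records[:position] + [new_record] + records[position:], rewritten
```

Finding key 310 reads all three records, and inserting key 150 forces the two records after it to be rewritten.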


Direct File Organization

- Records are stored at a specific address, determined by their key field
- A mathematical technique called hashing converts the key field value into a corresponding address

Advantages:
- A record can be accessed quickly by going directly to its address

Disadvantages:
- Requires random access storage (disk/optical), which costs more than sequential storage
- Can only use one key
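As a rough sketch of how hashing turns a key into an address, the Python below uses division-remainder hashing. The slot count, the function names, and the choice of hash function are all assumptions made for illustration; the slides do not say which hashing technique is meant. The sketch also simply refuses to store a record when two different keys map to the same address, whereas real systems need a strategy for that case.

```python
NUM_SLOTS = 7  # hypothetical number of record addresses

slots = [None] * NUM_SLOTS

def address_of(key):
    """Division-remainder hashing: map a key field value to an address."""
    return key % NUM_SLOTS

def store(key, data):
    """Store a record at its hashed address and return that address."""
    addr = address_of(key)
    if slots[addr] is not None and slots[addr][0] != key:
        raise ValueError("collision: two keys hash to the same address")
    slots[addr] = (key, data)
    return addr

def fetch(key):
    """Go directly to the hashed address; no scan of other records."""
    record = slots[address_of(key)]
    return record if record is not None and record[0] == key else None
```

For example, `store(101, "Doe")` places the record at address 3 (101 mod 7), and `fetch(101)` retrieves it without reading any other record.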


Indexed Sequential File Organization

- Compromise between direct and sequential methods
- Records are stored sequentially
- An index is created that records the address of each record

Advantages:
- Good compromise between the previous two methods
- Can have multiple index tables to use multiple keys

Disadvantages:
- Slower than direct access
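A minimal Python sketch of the indexed sequential idea follows, assuming a list position stands in for a disk address and a dictionary stands in for an index table. The data values and the names `data_file`, `index_by_id`, `index_by_name`, and `lookup` are hypothetical.

```python
# Records stored sequentially; the list position plays the role of the address.
data_file = [(101, "Doe"), (205, "Smith"), (310, "Johnson")]

# One index table per key: each maps a key value to a record address.
index_by_id = {record[0]: addr for addr, record in enumerate(data_file)}
index_by_name = {record[1]: addr for addr, record in enumerate(data_file)}

def lookup(index, key):
    """Consult the index first, then read the record at the address it gives."""
    addr = index.get(key)
    return None if addr is None else data_file[addr]
```

Each lookup costs an extra step (index, then record), which is why this is slower than direct access, but the same file can be reached through more than one key.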


What is a “database”?

A “database” is just a collection of related data.

Databases can exist in many forms. Examples:
- Sheets of paper in folders in a file cabinet
- A book (think of it as a collection of sentences and illustrations)
- Books in a collection (e.g., a library)
- Sets of 3"x5" cards containing notes
- Maps and other geographic information systems
- Blood samples in a medical laboratory


Common Elements

- Sets of data and information composed of, and/or represented by:
  - bits
  - alphanumeric symbols
  - lines and shapes in drawings, pictures, and maps
  - audio and video recordings
  - actual substances
- A means by which the sets of data and information are organized in order to facilitate access to individual desired sets


Examples of Access Methods

Phone book - Collection of several independent databases, each consisting of names and corresponding phone numbers:

- Blue-pages governmental listings:
  - primary arrangement alphabetical by type of government (city, county, state, federal),
  - secondary arrangement alphabetical by agency within type of government,
  - tertiary arrangement alphabetical by office within agency


Examples of Access Methods (continued)

- White-pages personal listings:
  - arranged alphabetically by surname
  - within surname, by first names
- Yellow-pages listings:
  - primary arrangement by type of business,
  - secondary arrangement alphabetically by company within type of business,
  - plus various special groupings (e.g., restaurants by ethnic type)


Flat File Databases (DBs)

- Flat-file DBs are like the DBs you can construct in a single spreadsheet page
- All the information in the DB is in one file consisting of one array of rows and columns.

For example:

SSN          Surname  First Name(s)  Telephone Number
123-45-6789  Doe      John X.        303-555-1234
987-65-4321  Smith    Martin         720-111-2222
567-89-0123  Johnson  Billy Bob      303-444-5555


Flat File Advantages/Disadvantages

Advantages:
- Simple
- Good for few records with few fields

Disadvantages:
- Unnecessary duplication of data (data redundancy)
- Inconsistent, incomplete, or inaccurate data, lacking data integrity
- Changes in data are difficult to implement
- Separate and isolated data with limited data sharing


Data Hierarchy in Computerized Databases

- Character - single letter, number or special character
- Field - a set of related characters
- Record - collection of related fields
- File - collection of related records
- Database - collection of related files
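One way to picture the five levels is to nest ordinary Python values, as in the sketch below. The variable names are hypothetical and the sample values are borrowed from the flat-file example; a real DBMS stores these levels very differently.

```python
# Character: a single letter, number or special character
character = "D"

# Field: a set of related characters
surname_field = "Doe"

# Record: a collection of related fields
record = {"surname": surname_field, "first_name": "John X.", "phone": "303-555-1234"}

# File: a collection of related records
people_file = [
    record,
    {"surname": "Smith", "first_name": "Martin", "phone": "720-111-2222"},
]

# Database: a collection of related files
database = {"people": people_file}
```

Reading from the outside in, `database["people"][0]["surname"][0]` walks from database to file to record to field to character.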


Types of Databases

Individual - often on a PC used by one person

Company or shared - usually on a mainframe and managed by a database administrator
- Example: Common operational databases contain information about company operations


Types of Databases (continued)

Distributed databases - have data stored in multiple locations, but the data is accessible through communications networks

Proprietary databases - created by an organization, and stored information is offered to others for a fee
- Examples include Dialog Information Services and Dow Jones Interactive Publishing


Database Terminology

Database Management System (DBMS) – Allows a user to deal with data in logical terms, without having to understand the computer's physical view.

Logical data view – How humans see things

Physical data view – How things are stored in a computer


Database Management System (DBMS)

DBMS – provides storage, retrieval, analysis, sorting, and printing of information in a database.

DBMS is:

• A collection of program-independent, interrelated data

• A set of programs to access the data

• Information about a particular enterprise

• An environment that is both convenient and efficient to use.


Current DBMS Systems

Mainframe database vendors:
- Oracle
- IBM DB2
- Microsoft (SQL Server)
- Sybase
- Informix

Desktop:
- MS Access
- Borland Paradox

Some free database systems (Unix):
- Postgres
- MySQL, mSQL
- Predator


DBMS Uses

The processing power of a DBMS allows it to:
- Sort
- Match
- Link
- Aggregate
- Skip fields
- Calculate
- Arrange


Database Administrator (DBA)

Coordinates all the activities of the database system.

Must have a good understanding of the enterprise’s information resources and needs.


Database Administrator (DBA)

Database administrator's duties include:
- Database definition
- Database modification
- Granting user authority to access the database for security and privacy
- Specifying integrity constraints
- Acting as liaison with users
- Monitoring DB performance and responding to changes in requirements
- Deciding on a strategy for backup and recovery


Types of DBMS Users

End-User:
- Non-specialist accessing data via a query language
- Naïve user accessing data via a special-purpose interface
- Performs data retrieval and update (extend/modify)

Applications Programmer:
- Writes programs that use the DB by embedding queries to the DB in a high-level language (HLL)
- Develops interfaces for the naïve user


DBMS Organization

The four principal DBMS organizations are:
- Hierarchical
- Network
- Relational
- Object-Oriented


Hierarchical Databases

- Viewed as branches of an upside-down tree
  - Each item is subordinate to its parent item
  - Only one parent per item
- Any element (node) in the database is linked only to the elements directly above it and directly below it
  - If a parent node is deleted, all of its child nodes are deleted as well
  - A new parent node must be created before adding a new child node
  - No direct relationships between child nodes


Hierarchical Databases (continued)

- Limited by rigid structure
- Typically require custom programming
- Example: Many early computer-based databases were designed for banking. Hierarchical databases were appropriate for such purposes, e.g.:
  - individual accounts can be grouped by family or business;
  - sets of accounts, grouped by branch;
  - accounts in different branches, grouped by city;
  - accounts in different cities, grouped by state.
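The banking grouping can be sketched as a tree in which every item records its single parent. This is an illustrative Python sketch only: the state, city, branch, family, and account names are invented, and the helper names `path_from_top` and `delete_with_children` are assumptions.

```python
# Each item maps to its one parent; the top of the tree has no parent.
parent_of = {
    "Colorado": None,
    "Denver": "Colorado",
    "Downtown Branch": "Denver",
    "Doe Family": "Downtown Branch",
    "Account 1001": "Doe Family",
}

def path_from_top(item):
    """Return the chain of parents from the top of the tree down to item."""
    path = []
    while item is not None:
        path.append(item)
        item = parent_of[item]
    return list(reversed(path))

def delete_with_children(item):
    """Deleting a parent also deletes every node beneath it."""
    for child in [c for c, p in parent_of.items() if p == item]:
        delete_with_children(child)
    del parent_of[item]
```

`path_from_top("Account 1001")` shows that an account is reached only by way of its state, city, branch, and family, and `delete_with_children("Denver")` removes the branch, family, and account along with the city.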


Hierarchical Model

- Must always begin at the top to search for data
- One parent per child; no other relationships allowed

[Diagram: Department, Courses, Students, and Professors record types arranged as a tree. Field labels shown: DeptNum, Name; CourseNum, Name, Section; IDNum, Name; IDNum, Name, Course.]


Network Databases

Permits links among all components (i.e., elements can be linked to other elements anywhere in the database, not just those directly above and below)

The interconnected design allows for access via multiple pathways

Can be extremely difficult to manage

The World-Wide Web is a very large example of a network database.


Network Model

- Added data paths across the tree (instead of just up and down)
- Reduced time required to access data, but increased overhead space requirements

[Diagram: Department, Courses, Students, and Professors record types, with links across the tree as well as up and down. Field labels shown: DeptNum, Name; CourseNum, Name, Section; IDNum, Name; IDNum, Name, Course.]


Relational Databases

A relational database is a set of one or more tables that together embody information about a set of related concepts and entities.

The tables are connected (related) via fields within the table that are shared by a pair of tables.


Relational Model

Based on tables of objects (the data), rather than specific paths (ways to access the data)

[Diagram: four tables. Department (DeptNum, Name); Professors (IDNum, Name, DeptNum, Course); Courses (Num, ProfID, Section); Students (IDNum, Name, Course).]


Relational Database Rules

- Each row is unique (distinct)
- Each column name is unique within a table
- It is permissible to have the same name for a column in two different tables in the same database
  - To distinguish between them, we use a qualified name: Pet.Name vs Family.Name
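The qualified-name rule can be tried out with Python's built-in sqlite3 module. This is a sketch under assumptions: only the table names Pet and Family and the column Name come from the slide; the PetID and FamilyID columns and the sample rows are invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Family (FamilyID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Pet (PetID INTEGER PRIMARY KEY, Name TEXT, FamilyID INTEGER);
    INSERT INTO Family VALUES (1, 'Doe');
    INSERT INTO Pet VALUES (10, 'Rex', 1);
""")

# Qualified names say which table each Name column comes from.
rows = conn.execute("""
    SELECT Pet.Name, Family.Name
    FROM Pet JOIN Family ON Pet.FamilyID = Family.FamilyID
""").fetchall()

# An unqualified Name is ambiguous when both tables are in the query.
try:
    conn.execute(
        "SELECT Name FROM Pet JOIN Family ON Pet.FamilyID = Family.FamilyID"
    )
    ambiguous = False
except sqlite3.OperationalError:
    ambiguous = True
```

Here `rows` holds the pet's name next to the family's name, while the unqualified query is rejected by SQLite.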


Relational Databases

In a Relational DB, the information content of a table does not depend on either:
- The order of the rows; or
- The order of the columns

In other words, the rows and columns of a table can be rearranged at will without affecting the table's information content


Relational DB Terminology

- File (Table) = Relation
- Rows (Tuples) = Records
- Columns (Attributes) = Fields

Relationships between objects are defined by common attributes

Courses
Num    Name             Credits
CS208  CS Fundamentals  3
MT360  Calculus         4
CS320  C Programming    3

In the Courses table, each row is a record (row, tuple) and each column is a field (column, attribute).


Primary Keys

In a Relational DB, each table:
- Must have a primary key (unique identifier)
- Must have no duplicate rows

A primary key is:
- A data attribute (column), or a combination of attributes, that uniquely identifies each record in the table
  - A simple key consists of a single attribute
  - A composite key consists of two or more attributes


Primary Keys (continued)

Primary Key
- Provides a unique way to identify each record
- Can be obvious from the structure of the table. If there is no easy natural choice, you can add a column containing a unique identifier.
- May consist of the entire record (especially with two-column tables, which occur often in the development of RDBs)
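A short sketch of simple and composite primary keys, using Python's built-in sqlite3 module. The table layouts loosely follow the student example used in the normalization slides; the column spellings, sample values, and the helper `try_insert` are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simple key: a single attribute identifies each record.
    CREATE TABLE Student (StudentID INTEGER PRIMARY KEY, LastName TEXT);

    -- Composite key: two attributes together identify each record.
    CREATE TABLE Phone (
        StudentID INTEGER,
        PhoneType TEXT,
        PhoneNumber TEXT,
        PRIMARY KEY (StudentID, PhoneType)
    );
    INSERT INTO Phone VALUES (1, 'Home', '303-555-1234');
    INSERT INTO Phone VALUES (1, 'Cell', '720-111-2222');
""")

def try_insert(sql):
    """Return True if the row was accepted, False if a key rule rejected it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

# A second 'Home' phone for student 1 repeats an existing key value.
duplicate_accepted = try_insert(
    "INSERT INTO Phone VALUES (1, 'Home', '303-444-5555')"
)
phone_rows = conn.execute("SELECT COUNT(*) FROM Phone").fetchone()[0]
```

The same student may appear twice in Phone because the phone type differs, but repeating the full (StudentID, PhoneType) pair is rejected.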


Other Types of Keys

Secondary Key
- A column that is used to aid in the retrieval of information from a table
- A secondary key is not required to have unique values in each of its rows

Foreign Key
- A column used to retrieve information from one table (i.e., is a secondary key) that is also the primary key in another table
- Foreign keys are a major tool in Relational DBs
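The following sketch shows a foreign key in action with Python's built-in sqlite3 module. The Department and Professor tables echo the relational model diagram, but the exact columns and sample rows are assumptions. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`; other DBMSs behave differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable enforcement
conn.executescript("""
    CREATE TABLE Department (DeptNum INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Professor (
        IDNum INTEGER PRIMARY KEY,
        Name TEXT,
        -- Foreign key: must match a primary key value in Department.
        DeptNum INTEGER REFERENCES Department(DeptNum)
    );
    INSERT INTO Department VALUES (1, 'Computer Science');
    INSERT INTO Professor VALUES (100, 'Smith', 1);
""")

# Department 99 does not exist, so this professor would be an orphan.
try:
    conn.execute("INSERT INTO Professor VALUES (101, 'Jones', 99)")
    orphan_accepted = True
except sqlite3.IntegrityError:
    orphan_accepted = False
```

The second professor is rejected because its DeptNum does not match any primary key in Department.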


Relational Database Design

Design Goals:

Avoid redundant data

Ensure that relationships among attributes are represented

Facilitate checking updates for violation of database integrity constraints


Creating a Student Database

- Student ID
- First Name
- Middle Initial
- Last Name
- Home Address
- School Address
- Street Address
- City
- State
- Zip
- Home Phone
- Work Phone
- Cell Phone
- Course ID
- Course Name
- Course Instructor


Database Normalization

Optimize the tables by:

Storing each piece of data once and only once (i.e. Eliminate redundant data).

Ensuring data dependencies make sense (only storing related data within a particular table).

Maintaining data integrity.


First Normal Form

Eliminate repeating groups in individual tables.

Create a separate table for each set of related data.

Identify each set of related data with a primary key.


1st Normal Student Database

Student: Student ID, First Name, Middle Initial, Last Name
Address: Student ID, Address Type, Street Address, City, State, Zip
Phone: Student ID, Phone Type, Phone Number
StudentCourse: Student ID, Course ID, Course Name, Course Instructor


Second Normal Form

Create separate tables for sets of values that apply to multiple records (i.e., remove partial data dependencies).

Relate these tables using a foreign key.


2nd Normal Student Database

Student: Student ID, First Name, Middle Initial, Last Name
Address: Student ID, Address Type, Street Address, Zip
Phone: Student ID, Phone Type, Phone Number
StudentCourse: Student ID, Course ID, Course Name, Course Instructor
ZipCode: Zip, City, State


Third Normal Form

Eliminate any fields that do not depend on the key.


3rd Normal Student Database

Student: Student ID, First Name, Middle Initial, Last Name
Address: Student ID, Address Type, Street Address, Zip
Phone: Student ID, Phone Type, Phone Number
StudentCourse: Student ID, Course ID
ZipCode: Zip, City, State
Course: Course ID, Course Name, Course Instructor
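To see why moving course details into their own table helps, here is a sketch of part of the third normal form design using Python's built-in sqlite3 module. Only the Student, Course, and StudentCourse tables are shown; the column spellings, the sample students, and the instructor names are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Student (
        StudentID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT
    );
    CREATE TABLE Course (
        CourseID TEXT PRIMARY KEY, CourseName TEXT, CourseInstructor TEXT
    );
    CREATE TABLE StudentCourse (
        StudentID INTEGER REFERENCES Student(StudentID),
        CourseID TEXT REFERENCES Course(CourseID),
        PRIMARY KEY (StudentID, CourseID)
    );
    INSERT INTO Student VALUES (1, 'John', 'Doe'), (2, 'Martin', 'Smith');
    INSERT INTO Course VALUES ('CS208', 'CS Fundamentals', 'Instructor A');
    INSERT INTO StudentCourse VALUES (1, 'CS208'), (2, 'CS208');
""")

# The instructor is stored once, so one UPDATE corrects it for every student.
conn.execute(
    "UPDATE Course SET CourseInstructor = 'Instructor B' WHERE CourseID = 'CS208'"
)

rows = conn.execute("""
    SELECT Student.LastName, Course.CourseInstructor
    FROM StudentCourse
    JOIN Student ON Student.StudentID = StudentCourse.StudentID
    JOIN Course ON Course.CourseID = StudentCourse.CourseID
    ORDER BY Student.LastName
""").fetchall()
```

Because the course name and instructor live only in Course, a single update is seen by both enrolled students; in the earlier StudentCourse layout the same change would have to be repeated on every enrollment row.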


Relational Design Summary

Data is stored in records, inside of tables

Primary keys uniquely identify a record

Foreign keys link data in one table to the primary key in another table

Designs should maintain data integrity

Normalization concepts should be used


Dis/Advantages of Relational Databases

Disadvantages:
- Require more overhead

Advantages:
- Cut down on needless repetition of information
- Ensure more accuracy
- Facilitate updating and deletion of information
- Design avoids errors that occur when adding/deleting information from flat files


Query Languages

Query languages allow non-programmers to question the DBMS

Structured Query Language (SQL) – the standard query language for relational databases


Structured Query Language (SQL)

Pronounced either "S, Q, L" or "sequel"

Widely used standard set of commands and syntax for doing things with Relational DBMSs

Used especially for query and retrieval

Includes commands for defining Relational DBs, conducting transactions, storing data, etc.

Each Relational DBMS also has additional features unique to it, because SQL does not handle all the practical details involved in using a Relational DB


Select

The select clause directs the DBMS to choose a subset of fields from one or more tables.

For example,

SELECT last_name, first_name

chooses all last names and first names and will place them in this order in the result set.


From clause

- The from clause directs the query toward one or more tables.
  - from Student (directs the query to use the Student table)
- Used with the SELECT clause:

SELECT last_name, first_name
FROM Student

chooses the last and first name of all rows in the Student table.

Note: As long as the column names within the tables used in the from clause are unique, qualified names are not required.
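The SELECT and FROM clauses can be run as-is against a small table using Python's built-in sqlite3 module. The Student table contents below are invented for the illustration (the names are borrowed from the flat-file example); only the query text comes from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Student (last_name TEXT, first_name TEXT);
    INSERT INTO Student VALUES ('Doe', 'John');
    INSERT INTO Student VALUES ('Smith', 'Martin');
    INSERT INTO Student VALUES ('Johnson', 'Billy Bob');
""")

# Each result row has the fields in the order named in the SELECT clause.
rows = conn.execute("SELECT last_name, first_name FROM Student").fetchall()
```

Every row of Student appears in `rows`, each as a (last_name, first_name) pair. Without an ORDER BY clause the order of the rows themselves is not guaranteed.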


Where clause

A where clause specifies a selection criterion that we will use to limit our choice of records.

A where clause follows the name of the table.

Format:

SELECT field1,…, fieldN

FROM tablename

WHERE (boolean expression)


Boolean Expressions

Boolean expressions used in where clauses typically involve a field compared to another field or a field compared to a value.

For example:

WHERE GPA > 3.50

WHERE last_name > 'M'

WHERE amount_owed >= amount_paid


Comparison Operators

= is equal to

> is greater than

< is less than

>= is greater than or equal to

<= is less than or equal to

<> is not equal to
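The WHERE conditions and comparison operators can be exercised with Python's built-in sqlite3 module. The Student rows, their GPA and payment values, and the helper `names_where` are assumptions for illustration. The helper pastes a condition into the query text, which is acceptable for a fixed demonstration but should never be done with untrusted input.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Student (
        last_name TEXT, GPA REAL, amount_owed REAL, amount_paid REAL
    );
    INSERT INTO Student VALUES ('Doe', 3.8, 100, 100);
    INSERT INTO Student VALUES ('Smith', 3.2, 50, 80);
    INSERT INTO Student VALUES ('Johnson', 3.6, 200, 0);
""")

def names_where(condition):
    """Return the last names of rows that satisfy the WHERE condition."""
    sql = f"SELECT last_name FROM Student WHERE {condition} ORDER BY last_name"
    return [row[0] for row in conn.execute(sql)]

high_gpa = names_where("GPA > 3.50")                   # field compared to a value
late_alphabet = names_where("last_name > 'M'")         # text compares alphabetically
not_overpaid = names_where("amount_owed >= amount_paid")  # field compared to field
not_3_2 = names_where("GPA <> 3.2")                    # not equal to
```

With this data, only Smith sorts after 'M', and Smith is also the only student who has paid more than was owed.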


Simple SQL Query

Textbooks
Title       Price   Category   Publisher
Java Intro  109.99  Computers  Prentice Hall
Calculus    129.99  Math       Kaufman
Advanced C  115.99  Computers  Tech Inc
Philosophy   83.99  Lib Arts   Prentice Hall

SELECT *
FROM Textbooks
WHERE category = 'Computers'

Result ("selection"):
Title       Price   Category   Publisher
Java Intro  109.99  Computers  Prentice Hall
Advanced C  115.99  Computers  Tech Inc


Simple SQL Query

Textbooks
Title       Price   Category   Publisher
Java Intro  109.99  Computers  Prentice Hall
Calculus    129.99  Math       Kaufman
Advanced C  115.99  Computers  Tech Inc
Philosophy   83.99  Lib Arts   Prentice Hall

SELECT Title, Price, Publisher
FROM Textbooks
WHERE Price < 110

Result ("selection" and "projection"):
Title       Price   Publisher
Java Intro  109.99  Prentice Hall
Philosophy   83.99  Prentice Hall
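Both simple queries can be checked against the Textbooks data with Python's built-in sqlite3 module. The table contents and the two queries come from the slides; the column types and the added ORDER BY clauses (used only to make the row order predictable) are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Textbooks (Title TEXT, Price REAL, Category TEXT, Publisher TEXT);
    INSERT INTO Textbooks VALUES
        ('Java Intro', 109.99, 'Computers', 'Prentice Hall'),
        ('Calculus', 129.99, 'Math', 'Kaufman'),
        ('Advanced C', 115.99, 'Computers', 'Tech Inc'),
        ('Philosophy', 83.99, 'Lib Arts', 'Prentice Hall');
""")

# Selection: keep whole rows that satisfy the WHERE condition.
selection = conn.execute(
    "SELECT * FROM Textbooks WHERE Category = 'Computers' ORDER BY Price"
).fetchall()

# Selection and projection: filter the rows, then keep only some columns.
projection = conn.execute(
    "SELECT Title, Price, Publisher FROM Textbooks "
    "WHERE Price < 110 ORDER BY Price DESC"
).fetchall()
```

Selection narrows the rows, and projection narrows the columns, so `projection` contains three-field rows for the two books priced under 110.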


Joins

You can use more than one table in a query.

The join clause tells the DBMS that you are using two tables.

You must also specify the fields on which you are linking the files, using the ON clause.


Example

SELECT Textbooks.Title, Price, Required
FROM Textbooks JOIN CourseTexts
ON Textbooks.Title = CourseTexts.Title
WHERE Category = 'Computers'

The dot (.) notation names the table and then the field within that table, which resolves the ambiguity when both tables have a field with the same name (here, Title).


Join SQL Query

Textbooks
Title       Price   Category   Publisher
Java Intro  109.99  Computers  Prentice Hall
Calculus    129.99  Math       Kaufman
Adv C       115.99  Computers  Tech Inc
Philosophy   83.99  Lib Arts   Prentice Hall

CourseTexts
Title       Course  Required
Java Intro  CS434   Yes
Adv C       CS422   No

SELECT Textbooks.Title, Price, Required
FROM Textbooks JOIN CourseTexts
ON Textbooks.Title = CourseTexts.Title
WHERE Category = 'Computers'

Result:
Title       Price   Required
Java Intro  109.99  Yes
Adv C       115.99  No
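The join can be run against the two tables with Python's built-in sqlite3 module. The table contents come from the slide; the column types and the added ORDER BY (used only to fix the row order) are assumptions, and the query is written with an explicit JOIN keyword followed by the ON clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Textbooks (Title TEXT, Price REAL, Category TEXT, Publisher TEXT);
    CREATE TABLE CourseTexts (Title TEXT, Course TEXT, Required TEXT);
    INSERT INTO Textbooks VALUES
        ('Java Intro', 109.99, 'Computers', 'Prentice Hall'),
        ('Calculus', 129.99, 'Math', 'Kaufman'),
        ('Adv C', 115.99, 'Computers', 'Tech Inc'),
        ('Philosophy', 83.99, 'Lib Arts', 'Prentice Hall');
    INSERT INTO CourseTexts VALUES
        ('Java Intro', 'CS434', 'Yes'),
        ('Adv C', 'CS422', 'No');
""")

# Title exists in both tables, so it must be qualified; Price, Required and
# Category each exist in only one table and can be left unqualified.
rows = conn.execute("""
    SELECT Textbooks.Title, Price, Required
    FROM Textbooks JOIN CourseTexts
    ON Textbooks.Title = CourseTexts.Title
    WHERE Category = 'Computers'
    ORDER BY Price
""").fetchall()
```

Each result row combines fields from both tables for a textbook whose Title appears in each.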


Database Concerns

Privacy/Unauthorized Access to Data

Data is easier to gather and exploit using a computer, so precautions must be taken to guard the data.

Accuracy/Completeness

Owner of database must ensure accuracy of the data.

Users must take data with a grain of salt and verify it.