Lecture 1 Introduction to the Design and Specification of File Structures

Jan 04, 2016


Megan Holmes
Page 1: Lecture 1 Introduction to the Design and Specification of File Structures.

Lecture 1: Introduction to the Design and Specification of File Structures

Page 2: Lecture 1 Introduction to the Design and Specification of File Structures.

Previous Lecture

Page 3: Lecture 1 Introduction to the Design and Specification of File Structures.

Lecture 0

Course Outline

• Course Aims and Objectives
• Course Contents
• Course Textbook and Schedule
• Course Link

http://hedar.info/Courses/INF211/

Page 4: Lecture 1 Introduction to the Design and Specification of File Structures.

Today's Lecture

Page 5: Lecture 1 Introduction to the Design and Specification of File Structures.

Lecture Objectives

• Introduce the primary design issues that characterize file structure design.
• Survey the history of file structures.
• Introduce the notions of file structure literacy and of a conceptual toolkit for file structure design.
• Discuss the need for precise specification of data structures and operations.
• Develop an object-oriented toolkit that makes file structures easy to use.

Page 6: Lecture 1 Introduction to the Design and Specification of File Structures.

Lecture Contents

1. The heart of file structure design.
2. A short history of file structure design.
3. A conceptual toolkit: File structure literacy.
4. An object-oriented toolkit: Making file structure usable.

Page 7: Lecture 1 Introduction to the Design and Specification of File Structures.

Section 1.1

The heart of file structure design

Page 8: Lecture 1 Introduction to the Design and Specification of File Structures.

File Structure Definition & Functions

Definition: A combination of representations for data in files and of operations for accessing the data.

Functions: A file structure allows applications to read, write, and modify data. It might also support:
• finding the data that matches some search criteria
• reading through the data in some particular order

Page 9: Lecture 1 Introduction to the Design and Specification of File Structures.

File Structure Design: Need of Study

Data Storage

Computer data can be stored in three kinds of locations:

• Primary storage (memory): computer memory.
• Secondary storage: online disk, tape, and CD-ROM, “accessed by the computer”.
• Tertiary storage (archival data): offline disk, tape, and CD-ROM, “not accessed by the computer”.

Page 10: Lecture 1 Introduction to the Design and Specification of File Structures.

File Structure Design: Need of Study

Memory versus Secondary Storage

• Secondary storage such as disks can pack thousands of megabytes into a small physical space.
• Computer memory (RAM) is limited.
• Compared to memory, access to secondary storage is extremely slow. Getting information from slow RAM takes 120 × 10^-9 seconds (= 120 nanoseconds), while getting information from disk takes 30 × 10^-3 seconds (= 30 milliseconds).
• Roughly, 20 seconds on RAM ≈ 58 days on disk.
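
The comparison above can be checked with a few lines of arithmetic. The sketch below uses only the two access times quoted on this slide; the constant and function names are illustrative assumptions, not part of the lecture. It gives a ratio of about 250,000, which turns 20 seconds of RAM accesses into roughly 58 days of disk accesses.

```python
# Access times quoted in the lecture (order-of-magnitude figures, not measurements).
RAM_ACCESS_S = 120e-9    # 120 nanoseconds
DISK_ACCESS_S = 30e-3    # 30 milliseconds

SECONDS_PER_DAY = 24 * 60 * 60


def disk_equivalent_days(ram_seconds: float) -> float:
    """Scale a duration of RAM work by the disk/RAM access-time ratio, in days."""
    ratio = DISK_ACCESS_S / RAM_ACCESS_S
    return ram_seconds * ratio / SECONDS_PER_DAY


ratio = DISK_ACCESS_S / RAM_ACCESS_S
print(f"disk is about {ratio:,.0f} times slower than RAM")
print(f"20 s of RAM accesses ~ {disk_equivalent_days(20):.1f} days of disk accesses")
```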

Page 11: Lecture 1 Introduction to the Design and Specification of File Structures.

File Structure Design: Need of Study

Improve Secondary Storage Access Time

The representation of the data and the implementation of the operations together determine the efficiency of the file structure for particular applications.

Page 12: Lecture 1 Introduction to the Design and Specification of File Structures.

File Structure Design

General Goals

Get the information we need with one access to the disk.

If that’s not possible, then get the information with as few accesses as possible.

Group information so that we are likely to get everything we need with only one trip to the disk.
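
As a rough illustration of the grouping goal, the sketch below packs fixed-length records into blocks so that one seek and one read bring back several related records at once. The record layout, the blocking factor, and the in-memory `io.BytesIO` stand-in for a disk file are all assumptions made for the example.

```python
import io
import struct

RECORD = struct.Struct("<i16s")    # hypothetical layout: int id + 16-byte name
RECORDS_PER_BLOCK = 4              # assumed blocking factor
BLOCK_SIZE = RECORD.size * RECORDS_PER_BLOCK


def read_block(f, block_no):
    """Fetch one whole block with a single seek and a single read."""
    f.seek(block_no * BLOCK_SIZE)
    data = f.read(BLOCK_SIZE)
    return [RECORD.unpack_from(data, i * RECORD.size)
            for i in range(len(data) // RECORD.size)]


# Build an in-memory stand-in for a disk file holding 8 records (2 blocks).
f = io.BytesIO()
for i in range(8):
    f.write(RECORD.pack(i, f"name{i}".encode()))

block = read_block(f, 1)           # the second block's records arrive together
ids = [rec_id for rec_id, _ in block]
print(ids)
```

On a real disk the saving comes from paying the seek cost once per block rather than once per record.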

Page 13: Lecture 1 Introduction to the Design and Specification of File Structures.

File Structure Design

Fixed versus Dynamic Files

It is relatively easy to come up with file structure designs that meet the general goals when the files never change.

When files grow or shrink as information is added or deleted, it is much more difficult.

Page 14: Lecture 1 Introduction to the Design and Specification of File Structures.

Section 1.2

A short history of file structure design

Page 15: Lecture 1 Introduction to the Design and Specification of File Structures.

Early Work

Early work assumed that files were on tape.

Access was sequential and the cost of access grew in direct proportion to the size of the file.

Page 16: Lecture 1 Introduction to the Design and Specification of File Structures.

The emergence of Disks and Indexes

As files grew very large, unaided sequential access was not a good solution.

Disks allowed for direct access.

Indexes made it possible to keep a list of keys and pointers in a small file that could be searched very quickly.

With the key and pointer, the user had direct access to the large, primary file.
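
A minimal sketch of the key-and-pointer idea follows. Here the "pointer" is a byte offset into the primary file, the index is a Python dictionary held in memory rather than a separate small file, and the record format and sample data are invented for the example.

```python
import io

# In-memory stand-in for the large primary file: one variable-length line per record.
primary = io.BytesIO()
index = {}                          # key -> byte offset of the record in the primary file
for key, payload in [("ames", "Ames, Mary"), ("mason", "Mason, Alan"), ("zed", "Zed, Bo")]:
    index[key] = primary.tell()     # remember where this record starts
    primary.write(f"{key}|{payload}\n".encode())


def fetch(key):
    """Look the key up in the small index, then go straight to the record."""
    primary.seek(index[key])
    return primary.readline().decode().rstrip("\n")


record = fetch("mason")
print(record)
```

Only the index is searched; the primary file is touched once, at the offset the index supplies.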

Page 17: Lecture 1 Introduction to the Design and Specification of File Structures.

The emergence of Tree Structures

As indexes also have a sequential flavor, when they grew too large, they also became difficult to manage.

The idea of using tree structures to manage the index emerged in the early 1960s.

However, trees can grow very unevenly as records are added and deleted, resulting in long searches that require many disk accesses to find a record.
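
The uneven growth is easy to reproduce with a plain binary search tree, used here as a simple in-memory stand-in for a tree-structured index. In this sketch (class and function names are illustrative), the same 15 keys produce a tree of height 15 when inserted in sorted order and a tree of height 4 when inserted in a favorable order.

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None


def insert(root, key):
    """Plain (unbalanced) binary search tree insertion, iterative."""
    if root is None:
        return Node(key)
    node = root
    while True:
        if key < node.key:
            if node.left is None:
                node.left = Node(key)
                break
            node = node.left
        else:
            if node.right is None:
                node.right = Node(key)
                break
            node = node.right
    return root


def height(root):
    """Number of nodes on the longest root-to-leaf path, counted level by level."""
    level, h = ([root] if root else []), 0
    while level:
        h += 1
        level = [c for n in level for c in (n.left, n.right) if c]
    return h


def build(keys):
    root = None
    for k in keys:
        root = insert(root, k)
    return root


sorted_height = height(build(range(1, 16)))    # keys arrive in sorted order
mixed_height = height(build([8, 4, 12, 2, 6, 10, 14, 1, 3, 5, 7, 9, 11, 13, 15]))
print(sorted_height, mixed_height)
```

If each node visit cost one disk access, the first tree would need up to 15 accesses per search and the second at most 4.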

Page 18: Lecture 1 Introduction to the Design and Specification of File Structures.

Balanced Trees

In 1963, researchers came up with the idea of AVL trees for data in memory.
• The heights of the two child subtrees of any node differ by at most one.
• The structure is named after its two inventors, G.M. Adelson-Velsky and E.M. Landis.

AVL trees, however, did not apply to files because they work well when tree nodes are composed of single records rather than dozens or hundreds of them.

Page 19: Lecture 1 Introduction to the Design and Specification of File Structures.

Balanced Trees

In the 1970s came the idea of B-trees.
• In B-trees, internal nodes can have a variable number of child nodes within some pre-defined range.
• When data is inserted or removed from a node, its number of child nodes changes.
• B-trees can guarantee that one can find one file entry among millions of others with only 3 or 4 trips to the disk.
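
The "3 or 4 trips" figure can be made plausible with a small calculation. The sketch below counts how many levels are needed before a tree with a given fanout can reach a given number of entries. The fanout of 100 children per node is an assumed value for illustration, and the estimate treats every node as full, whereas a real B-tree only guarantees that nodes stay within their pre-defined range.

```python
def levels_needed(entries: int, fanout: int) -> int:
    """Smallest number of levels L such that fanout**L >= entries."""
    levels, reachable = 0, 1
    while reachable < entries:
        reachable *= fanout
        levels += 1
    return levels


million = levels_needed(1_000_000, 100)              # assumed fanout of 100
hundred_million = levels_needed(100_000_000, 100)
binary = levels_needed(1_000_000, 2)                 # a binary tree, for contrast
print(million, hundred_million, binary)
```

Under these assumptions one million entries need 3 levels and one hundred million need 4, while a perfectly balanced binary tree over one million entries already needs 20.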

Page 20: Lecture 1 Introduction to the Design and Specification of File Structures.

Hash Tables

Retrieving entries in 3 or 4 accesses is good, but it does not reach the goal of accessing data with a single request.

From early on, Hashing was a good way to reach this goal with files that do not change size greatly over time.

More recently, extendible, dynamic hashing has made it possible to guarantee one or at most two disk accesses no matter how big a file becomes.
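
The single-access goal can be sketched with static hashing: the key itself determines which bucket to read, so no index has to be searched first. This is a minimal in-memory illustration, not extendible hashing. The bucket count, the choice of CRC-32 as the hash function, and the sample keys are assumptions.

```python
import zlib

NUM_BUCKETS = 8          # assumed fixed table size (static hashing)


def bucket_of(key: str) -> int:
    """Map a key to a bucket number with a deterministic hash (CRC-32 here)."""
    return zlib.crc32(key.encode()) % NUM_BUCKETS


buckets = [[] for _ in range(NUM_BUCKETS)]   # each list stands in for one disk bucket


def put(key, value):
    buckets[bucket_of(key)].append((key, value))


def get(key):
    """One 'disk access': go straight to the computed bucket, then scan only it."""
    for k, v in buckets[bucket_of(key)]:
        if k == key:
            return v
    return None


for name in ["ames", "mason", "zed"]:
    put(name, name.upper())

found = get("mason")
missing = get("nobody")
print(found, missing)
```

With a fixed number of buckets, the scheme degrades as the file grows and buckets overflow, which is the limitation the dynamic variants address.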

Page 21: Lecture 1 Introduction to the Design and Specification of File Structures.

Section 1.3

A conceptual toolkit: File structure literacy

Page 22: Lecture 1 Introduction to the Design and Specification of File Structures.

Conceptual tools For File Structure Design

• Decrease the number of disk accesses by collecting data into buffers, blocks, or buckets.
• Manage their growth by splitting them.
• Find a way to increase our address or index space.
• Find new ways to combine the basic tools.
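
The first two tools can be sketched together: keys are collected into blocks of limited capacity, and a block that overflows is split in two. The capacity of 4, the list-of-lists representation, and the function name are assumptions for illustration only.

```python
import bisect

CAPACITY = 4             # assumed maximum number of keys per block


def insert_key(blocks, key):
    """Insert key into a list of sorted blocks, splitting a block that overflows."""
    # Choose the last block whose first key is <= key (default: the first block).
    i = 0
    for j, b in enumerate(blocks):
        if b and b[0] <= key:
            i = j
    bisect.insort(blocks[i], key)
    if len(blocks[i]) > CAPACITY:
        mid = len(blocks[i]) // 2
        blocks[i:i + 1] = [blocks[i][:mid], blocks[i][mid:]]   # split in place


blocks = [[]]
for k in [10, 20, 30, 40, 25]:
    insert_key(blocks, k)
print(blocks)
```

The fifth insertion overflows the only block, which is then split into `[10, 20]` and `[25, 30, 40]`; later insertions go to whichever block covers the key.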

Basic tools: sequential access, direct access, tree structure.

Page 23: Lecture 1 Introduction to the Design and Specification of File Structures.

Section 1.4

An object-oriented toolkit: Making file structure usable

Page 24: Lecture 1 Introduction to the Design and Specification of File Structures.

Object-Oriented Toolkit For File Structure Design

Making file structures usable requires application programming interfaces.

This calls for an object-oriented approach: data types and operations are defined as classes.
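
As a rough illustration of defining a data type and its operations together, the sketch below wraps a fixed-length record layout and its read and write operations in one class. The class name, the record layout, and the in-memory stream are hypothetical and are not taken from the course toolkit.

```python
import io
import struct


class FixedRecordFile:
    """Hypothetical class bundling a record layout with its file operations."""

    def __init__(self, stream, fmt="<i16s"):
        self._stream = stream                  # any seekable binary stream
        self._layout = struct.Struct(fmt)      # assumed layout: int id + 16-byte name

    def write(self, rec_no, rec_id, name):
        """Store one record at its slot, addressed by record number."""
        self._stream.seek(rec_no * self._layout.size)
        self._stream.write(self._layout.pack(rec_id, name.encode()))

    def read(self, rec_no):
        """Return (id, name) for the record stored at the given slot."""
        self._stream.seek(rec_no * self._layout.size)
        rec_id, raw = self._layout.unpack(self._stream.read(self._layout.size))
        return rec_id, raw.rstrip(b"\0").decode()


people = FixedRecordFile(io.BytesIO())
people.write(0, 101, "Ames")
people.write(1, 102, "Mason")
second = people.read(1)
print(second)
```

Callers work with record numbers and field values; the byte layout and the seek arithmetic stay hidden inside the class.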

Page 25: Lecture 1 Introduction to the Design and Specification of File Structures.

Object-Oriented Toolkit Difficulties

Describing the classes for file structure design is:
• Progressive: new classes are often modifications or extensions of other classes.
• Complicated: the details of the data representations and operations become more complex.

Page 26: Lecture 1 Introduction to the Design and Specification of File Structures.

Next Lecture

Page 27: Lecture 1 Introduction to the Design and Specification of File Structures.

Fundamental File Processing Operations

• Physical and logical files
• Opening and closing files
• Reading and writing
• Seeking
• Special characters in files
• Physical devices and logical files
• File-related header files

Page 28: Lecture 1 Introduction to the Design and Specification of File Structures.

Questions?