Data Storage
Jan 05, 2016
Outline
• The memory hierarchy
• Moore’s law
• Disks
• Access Times
• I/O model of computation
• Sorting on disk
• Optimize disk access
The memory hierarchy
• Where is data stored in a computer system?
  – Capacities
  – Speeds
  – Costs
Cache
• On the same chip as the microprocessor
• Megabytes
• Cache I/O takes nanoseconds (10^-9 s), matching CPU speed
• About 100 nanoseconds to exchange data between cache and main memory
Main memory
• 100 MB-10 GB or more
• Fast random access
• 10-100 nanoseconds per memory access
Virtual memory
• Program space: virtual memory address space
  – On a 32-bit machine, the address space is 2^32 bytes = 4 GB
  – When it is larger than actual main memory, data is stored on disk
  – Main-memory database systems manage data through virtual memory, relying on the paging mechanism of the OS
Secondary storage: disk
• Compared to memory:
  – Slower (by ~10^5): one disk I/O takes 10-30 ms
  – More capacious (by ~10^2): 100 GB or more
  – Cheaper
• Magnetic or optical
• Supports:
  – Sequential access: fast
  – Random access: slower
• Related concepts:
  – Virtual memory: pages
  – File systems: files
• Disk read: moving a block from disk to main memory
• Disk write: moving a block from main memory to disk
Buffering for disk I/O
• Using buffers to read files on disk
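As a sketch of the idea above (an illustrative example, not from the slides): reading a file through a main-memory buffer, one block-sized chunk per disk I/O, instead of issuing one I/O per byte. The block size is an assumption.

```python
# Reading a file in block-sized chunks so each disk I/O fills a whole
# main-memory buffer, rather than one I/O per byte.
import os
import tempfile

BLOCK_SIZE = 4096  # assumed block size in bytes

def read_in_blocks(path, block_size=BLOCK_SIZE):
    """Yield the file's contents one buffer-sized block at a time."""
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)  # one buffered disk read
            if not block:
                break
            yield block

# Usage: write 10,000 bytes, then read them back in 4 KB blocks.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 10_000)
blocks = list(read_in_blocks(tmp.name))
os.unlink(tmp.name)
print(len(blocks), sum(len(b) for b in blocks))  # 3 10000
```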
Tertiary storage
• Capacity: terabytes (1 TB = 1024 GB)
• Compared to disk:
  – Higher read/write time (by ~10^3)
  – More capacious (by ~10^3)
  – Cheaper (cost per byte)
• Supports only sequential access
• Typical tertiary storage devices:
  – Ad-hoc tape storage
  – Optical-disk jukeboxes
Comparison (log scale)

Memory Hierarchy (access time, price $/MB):
  Processor cache:        1 ns        ~$100/MB
  RAM:                    x10         ~$10/MB
  Disks:                  x10^6       ~$0.2/MB
  Tapes / Optical Disks:  x10^10      ~$0.2/MB
Volatile vs. nonvolatile
• Volatile: data is lost when power is off
  – Cache
  – Main memory
• Nonvolatile:
  – Magnetic disks
  – Tapes
  – CD-ROM
Outline
• The memory hierarchy
• Moore’s law
• Disks
• Access Times
• I/O model of computation
• Sorting on disk
• Optimize disk access
Exponential Growth
• Moore’s law: doubling every 18 months
  – Speed of processors
  – Cost of storage (per bit, falling rather than rising)
  – 2x / 18 months ~ 100x / 10 years
http://www.intel.com/research/silicon/moorespaper.pdf
Consequences of “Moore’s law”
• Storage access becomes relatively ‘slower’
  – The latency between data access and computation grows
• Data flood: storage becomes relatively ‘smaller’
  – The gap between the required data capacity and the actual data capacity grows
Storage access becomes ‘slower’
• Storage access time improves only slowly
• “Latency” becomes progressively larger:
  – The time to move data between levels of the memory hierarchy
  – vs. the time to compute
Data Flood
• Disk sales double every nine months
  – Because the volume of stored data increases: data warehouses, Internet logs, web archives, sky surveys
  – Because media price drops much faster than areal density grows
[Graph: disk sales (petabytes per year) growing faster than Moore’s law. Graph courtesy of Joe Hellerstein. Source: J. Porter, Disk/Trend, Inc., http://www.disktrend.com/pdf/portrpkg.pdf]
Outline
• The memory hierarchy
• Moore’s law
• Disks
• Access Times
• I/O model of computation
• Sorting on disk
• Optimize disk access
Mechanics of disks
(CS 245 Notes 2)
Disk surfaces
• Block: the logical unit for transferring data between disk and main memory
• One block = one or more sectors
Disk controller
• Controls the movement of the head assembly
• Selects surfaces and sectors
• Transfers data
• A disk controller can control multiple disks
Disk storage characteristics
• Rotation speed of the disk assembly
  – E.g., 5400 RPM, higher or lower
• Number of platters per unit
  – E.g., 5 platters, 10 surfaces
• Number of tracks per surface
  – E.g., 20,000 tracks
• Number of bytes per track
  – E.g., a million bytes
Disk access characteristics
• How to access a block on disk?
  – Step 1: move the heads to the proper cylinder -> seek time
  – Step 2: rotate the disk to the sectors containing the block -> rotational delay
  – Step 3: transfer the data -> transfer time
Outline•The memory hierarchy•Moore’s law• Disks• Access Times• I/O model of computation• Sorting on disk• Optimize disk access
CS 245 Notes 2 26
Time = Seek Time + Rotational Delay + Transfer Time + Other
Disk Access Time
[Diagram: a request “I want block X” goes to the disk; eventually block X arrives in memory]
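The access-time formula above can be evaluated numerically. The sketch below is illustrative; the parameter values (seek time, RPM, block size, transfer rate) are assumptions matching the “typical” figures on the following slides.

```python
# time = seek + rotational delay + transfer + other
def disk_access_time_ms(seek_ms, rpm, block_bytes, transfer_mb_s, other_ms=0.0):
    """Average time to read one random block, in milliseconds."""
    rotational_delay_ms = 0.5 * 60_000 / rpm           # half a revolution
    transfer_ms = block_bytes / (transfer_mb_s * 1e6) * 1000
    return seek_ms + rotational_delay_ms + transfer_ms + other_ms

# Usage: 10 ms seek, 3600 RPM, 1 KB block, 1 MB/s transfer rate.
t = disk_access_time_ms(seek_ms=10, rpm=3600, block_bytes=1024, transfer_mb_s=1)
print(round(t, 2))  # ~ 10 + 8.33 + 1.02 = 19.36 ms
```

The result is close to the 20 ms random-I/O rule of thumb quoted later in these notes.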
Seek time
• Proportional to the distance traveled
• Measured in milliseconds
[Graph: seek time vs. cylinders traveled (1 to N); roughly linear, a short seek costing x and a long seek costing 3x to 5x]
Average Random Seek Time

S = [ Σ_{i=1..N} Σ_{j=1..N, j≠i} SEEKTIME(i -> j) ] / (N(N-1))

“Typical” S: 10 ms to 40 ms
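The double sum above can be computed directly for a given seek-time model. The linear model below (1 ms startup plus 1 ms per 1000 cylinders) is an illustrative assumption, borrowed from the elevator-algorithm example later in these notes.

```python
# Average of SEEKTIME(i -> j) over all ordered cylinder pairs i != j.
# O(N^2), fine for a few thousand cylinders.
def average_seek_ms(n_cyl, seek=lambda d: 1 + d / 1000):
    total = 0.0
    for i in range(n_cyl):
        for j in range(n_cyl):
            if i != j:
                total += seek(abs(i - j))
    return total / (n_cyl * (n_cyl - 1))

# Usage: 1000 cylinders; average distance is about N/3, so S ~ 1.33 ms
# under this (unrealistically fast) toy model.
print(round(average_seek_ms(1000), 3))
```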
Rotational Delay
[Diagram: the block I want is part of a revolution away from where the head currently is]
Average Rotational Delay
R = 1/2 revolution
“typical” R = 8.33 ms (3600 RPM)
Transfer Rate: t
• “Typical” t: 1-3 MB/second
• Transfer time = (block size) / t
Other Delays
• CPU time to issue I/O
• Contention for controller
• Contention for bus, memory
“Typical” Value: 0
• So far: Random Block Access
• What about: Reading “Next” block?
If we do things right (e.g., double buffering, staggered blocks…):

Time to get next block = (block size) / t + negligible costs:
  - skipping the gap
  - switching tracks
  - once in a while, moving to the next cylinder
Rule of Thumb: Random I/O is expensive; Sequential I/O costs much less
• Ex: 1 KB block
  – Random I/O: 20 ms
  – Sequential I/O: 1 ms
Cost for Writing is similar to Reading

…unless we want to verify that the block was written correctly! Then we need to add a (full) rotation + (block size) / t, since the head cannot immediately go back to re-read the block.
• To modify a block?

To Modify a Block:
(a) Read the block
(b) Modify it in memory
(c) Write the block
[(d) Verify?]
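The read-modify-write cycle above can be sketched as follows; ordinary file I/O stands in for raw disk block access (an assumption for illustration).

```python
# Read-modify-write of one block, with optional verification.
import os
import tempfile

BLOCK = 4096  # assumed block size in bytes

def modify_block(path, block_no, offset, new_bytes, verify=True):
    with open(path, "r+b") as f:
        f.seek(block_no * BLOCK)
        buf = bytearray(f.read(BLOCK))                   # (a) read block
        buf[offset:offset + len(new_bytes)] = new_bytes  # (b) modify in memory
        f.seek(block_no * BLOCK)
        f.write(buf)                                     # (c) write block
        if verify:                                       # (d) verify: re-read
            f.seek(block_no * BLOCK)
            assert f.read(BLOCK) == bytes(buf)

# Usage: patch 5 bytes inside block 1 of a two-block file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\0" * (2 * BLOCK))
modify_block(tmp.name, 1, 100, b"hello")
with open(tmp.name, "rb") as f:
    f.seek(BLOCK + 100)
    patched = f.read(5)
os.unlink(tmp.name)
print(patched)  # b'hello'
```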
Outline
• The memory hierarchy
• Moore’s law
• Disks
• Access Times
• I/O model of computation
• Sorting on disk
• Optimize disk access
Computation model for DBMS
• The traditional RAM model assumes that data resides in main memory, and that accessing any item of data takes as much time as any other
• Is the RAM model suitable for a DBMS?
• DBMS assumptions:
  – Data does not fit into main memory
  – Must support secondary and even tertiary storage
I/O model of computation
• I/O model of computation
  – The time taken to perform a disk I/O is much larger than the time to manipulate the same data in main memory
  – Quantity to minimize: the number of block accesses (I/Os)
Outline
• The memory hierarchy
• Moore’s law
• Disks
• Access Times
• I/O model of computation
• Sorting on disk
• Optimize disk access
Merge-sort
• A main-memory sorting algorithm
• Complexity:
  – T(n) = 2T(n/2) + an  =>  T(n) = O(n log n)
• Procedure:
  – Basis: for a one-element list, do nothing
  – Induction:
    • Divide the list into two equal sublists
    • Sort the two sublists recursively
    • Merge the two sorted sublists
Merge two sorted lists
• Time linear in the combined size of the two lists
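The recursive merge-sort and its linear-time merge step can be sketched as:

```python
def merge(left, right):
    """Merge two sorted lists into one sorted list (linear time)."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    out.extend(left[i:])   # at most one of these
    out.extend(right[j:])  # extends is non-empty
    return out

def merge_sort(xs):
    """Basis: a list of <= 1 element is sorted.
    Induction: split in half, sort each half, merge.
    T(n) = 2T(n/2) + an  =>  O(n log n)."""
    if len(xs) <= 1:
        return xs
    mid = len(xs) // 2
    return merge(merge_sort(xs[:mid]), merge_sort(xs[mid:]))

print(merge_sort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]
```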
Two-phase, Multiway Merge-sort (TPMMS)
• Two phases:
  – Phase 1: sort main-memory-sized pieces of the data, producing a number of sorted sublists
  – Phase 2: merge all sorted sublists into a single sorted list
Phase 2
• 1. Find the smallest key among the first remaining elements of all the sublists
• 2. Move that smallest element to the first available position of the output buffer
• 3. If the output block is full, write it to disk and reinitialize the output buffer
• 4. If an input buffer is exhausted, read the next block of that sublist into the buffer
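The Phase 2 steps above can be sketched with a heap that finds the smallest remaining key among all sublists. Each sublist is a Python iterator standing in for a disk-resident run read one block at a time; the output-block flushing of step 3 is not modeled (both are simplifying assumptions).

```python
import heapq

def multiway_merge(sublists):
    """Repeatedly move the smallest remaining element to the output.
    Assumes keys are comparable and not None."""
    iters = [iter(s) for s in sublists]
    heap = []
    for idx, it in enumerate(iters):
        first = next(it, None)
        if first is not None:
            heapq.heappush(heap, (first, idx))
    out = []
    while heap:
        key, idx = heapq.heappop(heap)   # step 1: smallest first element
        out.append(key)                  # step 2: move to output
        nxt = next(iters[idx], None)     # step 4: refill from that sublist
        if nxt is not None:
            heapq.heappush(heap, (nxt, idx))
    return out

# Usage: three sorted runs merge into one sorted list.
runs = [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
print(multiway_merge(runs))  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```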
I/O cost of TPMMS
• Given a relation occupying R blocks,
• the number of I/Os of TPMMS is 4R: each block is read once and written once in each of the two phases
Upper bound of TPMMS
• Block size: B bytes; main memory size: M bytes; records take R bytes each
• Total number of records we can sort = (maximal number of sublists) × (size of each sublist)
• Number of buffers in main memory: M/B
  – One for output
  – M/B - 1 for input = maximal number of sorted sublists
• Each sublist contains at most M/R records
• Total number of records we can sort: (M/R)(M/B - 1), approximately M^2/(RB)
• E.g., with M = 10^8, B = 2^14, R = 160: M^2/(RB) ≈ 4.2 billion records ≈ 2/3 TB
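The bound above can be worked through numerically. The parameter values below (memory 10^8 bytes, 2^14-byte blocks, records of roughly 160 bytes) are taken as assumptions; the exact record count depends on them, and here R is the record size in bytes, not the relation size in blocks from the previous slide.

```python
# Worked arithmetic for the TPMMS capacity bound (M/R)(M/B - 1).
M = 10**8      # main memory, bytes
B = 2**14      # block size, bytes
R = 160        # record size, bytes (assumed)

buffers = M // B                  # block-sized buffers that fit in memory
sublists = buffers - 1            # one buffer reserved for output
records_per_sublist = M // R      # each sublist sorted entirely in memory
total_records = records_per_sublist * sublists   # ~ M^2 / (R*B)
total_bytes = total_records * R

print(f"{sublists} sublists x {records_per_sublist} records = "
      f"{total_records:.2e} records, {total_bytes/1e12:.2f} TB")
```

With these assumptions the result is a few billion records, on the order of the slide’s 2/3 TB figure.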
Multiway Merging of Larger Relations
• Use TPMMS to sort groups of M^2/(RB) records, turning each group into a sorted sublist
• In a third phase, merge up to (M/B) - 1 of these sublists in a final multiway merge
• Capability: M^3/(RB^2) records
• In the example above, this supports about 27 trillion records, about 4.3 PB
Outline
• The memory hierarchy
• Moore’s law
• Disks
• Access Times
• I/O model of computation
• Sorting on disk
• Optimize disk access
Accelerating access to secondary storage
• Organizing data by cylinders
• Using multiple disks
• Mirroring disks
• Disk scheduling algorithms
• Prefetching and large-scale buffering
Organizing data by cylinders
• Place blocks that are accessed together on the same cylinder or on adjacent cylinders
• For example, to read blocks consecutively, only one seek and one rotational delay are needed
• For TPMMS, this strategy can significantly reduce the time of Phase 1, but it gives no benefit in Phase 2
Using multiple disks
• All times associated with reading and writing the disk are divided by the number of disks
• Precondition: the disk controller, bus, and main memory can handle data transferred at the higher rate
• One typical example: striping
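Striping can be sketched as a block-to-disk mapping (an illustrative layout, not a full RAID implementation): logical block i lives on disk i mod n, so n consecutive blocks can be read from n disks in parallel.

```python
def striped_location(block_no, n_disks):
    """Return (disk index, block offset on that disk) for a logical block."""
    return block_no % n_disks, block_no // n_disks

# Usage: with 4 disks, blocks 0..7 round-robin across disks 0..3.
print([striped_location(i, 4) for i in range(8)])
# [(0, 0), (1, 0), (2, 0), (3, 0), (0, 1), (1, 1), (2, 1), (3, 1)]
```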
Mirroring disks
• Mirrored disks: two or more disks hold identical copies of the data
• Two motivations:
  – High availability
  – Faster access to data
• If we have n copies of a disk, we can read any n blocks in parallel
• Even with fewer copies, we gain speed by choosing which disk to read from (the disk whose head is closest to the desired block)
• Mirroring does not speed up writing, but neither does it slow writing down
Disk scheduling algorithms
• Elevator algorithm
  – Accelerates multiple independent access requests
  – The heads make sweeps from the innermost to the outermost cylinder and then back again
  – When the heads reach a position with no requests ahead of them in their direction of travel, they reverse direction
• Example assumptions:
  – Average rotation time + block access time = 4.43 ms
  – Seek time = 1 + (number of tracks traveled)/1000 ms
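The elevator algorithm can be sketched with the cost model above (seek = 1 + distance/1000 ms, plus 4.43 ms of rotation and block access per request). The sketch simplifies the real setting by serving a static queue of requests; in practice new requests arrive while the heads move.

```python
def elevator_schedule(head, requests):
    """Serve cylinder requests by sweeping in one direction, then reversing."""
    pending = sorted(requests)
    order, pos, direction, time_ms = [], head, +1, 0.0
    while pending:
        ahead = [c for c in pending if (c - pos) * direction >= 0]
        if not ahead:                 # nothing ahead of the heads: reverse
            direction = -direction
            continue
        nxt = min(ahead, key=lambda c: abs(c - pos))
        time_ms += 1 + abs(nxt - pos) / 1000 + 4.43   # seek + rotation/access
        order.append(nxt)
        pending.remove(nxt)
        pos = nxt
    return order, time_ms

# Usage: heads at cylinder 2000, four outstanding requests.
order, t = elevator_schedule(head=2000, requests=[8000, 24000, 56000, 16000])
print(order)  # one upward sweep: [8000, 16000, 24000, 56000]
```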
Elevator algorithm vs. first-come-first-served
[Table comparing request completion times under the elevator algorithm and first-come-first-served scheduling]
Prefetching and large-scale buffering
• Prefetching
  – When we can predict the order of block accesses, we can load blocks before they are needed
• Single buffering vs. double buffering
• Large-scale buffering
  – Track-sized or cylinder-sized buffers
  – One seek per track or cylinder instead of one per block
  – Disadvantage: additional memory
Double Buffering

Problem: we have a file, a sequence of blocks B1, B2, …, and a program that processes B1, then B2, then B3, …
Single Buffer Solution
(1) Read B1 into the buffer
(2) Process the data in the buffer
(3) Read B2 into the buffer
(4) Process the data in the buffer
...
Let P = time to process one block
    R = time to read in one block
    n = number of blocks

Single buffer time = n(P + R)
Double Buffering
[Diagram: blocks A, B, C, D, E, F, G stream from disk into two memory buffers; while one buffer is being processed, the next block is read into the other]
Say P ≥ R. What is the total processing time?

P = processing time per block
R = I/O time per block
n = number of blocks

• Double buffering time = R + nP
• Single buffering time = n(R + P)
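The two formulas can be compared numerically. With double buffering, every read after the first overlaps the processing of the previous block (assuming P ≥ R), so total time drops from n(R + P) to R + nP. The timing values below are illustrative assumptions.

```python
def single_buffer_time(n, R, P):
    """No overlap: each block is read, then processed."""
    return n * (R + P)

def double_buffer_time(n, R, P):
    """Reads overlap processing; only the first read is exposed."""
    return R + n * P

# Usage: 1000 blocks, 20 ms to read and 30 ms to process each block.
n, R, P = 1000, 20.0, 30.0
print(single_buffer_time(n, R, P))  # 50000.0 ms
print(double_buffer_time(n, R, P))  # 30020.0 ms
```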