Data Storage
Jan 05, 2016
Outline
• The memory hierarchy
• Moore’s law
• Disks
• Access Times
• I/O model of computation
• Sorting on disk
• Optimize disk access
The memory hierarchy
• Where is data stored in a computer system?
  – Capacities
  – Speeds
  – Costs
Cache
• On the same chip as the microprocessor
• Megabytes
• Cache I/O takes nanoseconds (10^-9 s), matching CPU speed
• About 100 nanoseconds to exchange data between cache and main memory
Main memory
• 100 MB-10 GB or more
• Fast random access
• 10-100 nanoseconds per memory access
Virtual memory
• Program space: virtual memory address space
  – On a 32-bit machine, the address space is 2^32 bytes = 4 GB
  – When it is larger than actual main memory, data is stored on disk
  – Main-memory database systems manage data through virtual memory, relying on the paging mechanism of the OS
Secondary storage: disk
• Compared to memory:
  – Slower (by ~10^5): one disk I/O takes 10-30 ms
  – More capacious (by ~10^2): 100 GB or more
  – Cheaper
• Magnetic or optical
• Supports:
  – Sequential access: fast
  – Random access: slower
• Related concepts:
  – Virtual memory: pages
  – File systems: files
• Disk read: moving a block from disk to main memory
• Disk write: moving a block from main memory to disk
Buffering for disk I/O
• Using buffers to read files on disk
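As a sketch of the idea above (an illustrative example, not from the slides): reading a file through a main-memory buffer, one block-sized chunk per disk I/O, instead of issuing one I/O per byte. The block size is an assumption.

```python
# Reading a file in block-sized chunks so each disk I/O fills a whole
# main-memory buffer, rather than one I/O per byte.
import os
import tempfile

BLOCK_SIZE = 4096  # assumed block size in bytes

def read_in_blocks(path, block_size=BLOCK_SIZE):
    """Yield the file's contents one buffer-sized block at a time."""
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)  # one buffered disk read
            if not block:
                break
            yield block

# Usage: write 10,000 bytes, then read them back in 4 KB blocks.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 10_000)
blocks = list(read_in_blocks(tmp.name))
os.unlink(tmp.name)
print(len(blocks), sum(len(b) for b in blocks))  # 3 10000
```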
Tertiary storage
• Capacity: terabytes (1 TB = 1024 GB)
• Compared to disk:
  – Higher read/write time (by ~10^3)
  – More capacious (by ~10^3)
  – Cheaper (cost per byte)
• Supports only sequential access
• Typical tertiary storage devices:
  – Ad-hoc tape storage
  – Optical-disk jukeboxes
Comparison (log scale)

Memory Hierarchy (access time, price $/MB):
  Processor cache:        1 ns        ~$100/MB
  RAM:                    x10         ~$10/MB
  Disks:                  x10^6       ~$0.2/MB
  Tapes / Optical Disks:  x10^10      ~$0.2/MB
Volatile vs. nonvolatile
• Volatile: data is lost when power is off
  – Cache
  – Main memory
• Nonvolatile:
  – Magnetic disks
  – Tapes
  – CD-ROM
Outline
• The memory hierarchy
• Moore’s law
• Disks
• Access Times
• I/O model of computation
• Sorting on disk
• Optimize disk access
Exponential Growth
• Moore’s law: doubling every 18 months
  – Speed of processors
  – Cost of storage (per bit, falling rather than rising)
  – 2x / 18 months ~ 100x / 10 years
http://www.intel.com/research/silicon/moorespaper.pdf
Consequences of “Moore’s law”
• Storage access becomes relatively ‘slower’
  – The latency between data access and computation grows
• Data flood: storage becomes relatively ‘smaller’
  – The gap between the required data capacity and the actual data capacity grows
Storage access becomes ‘slower’
• Storage access time improves only slowly
• “Latency” becomes progressively larger:
  – The time to move data between levels of the memory hierarchy
  – vs. the time to compute
Data Flood
• Disk sales double every nine months
  – Because the volume of stored data increases: data warehouses, Internet logs, web archives, sky surveys
  – Because media price drops much faster than areal density grows
[Graph: disk sales (petabytes per year) growing faster than Moore’s law. Graph courtesy of Joe Hellerstein. Source: J. Porter, Disk/Trend, Inc., http://www.disktrend.com/pdf/portrpkg.pdf]
Outline
• The memory hierarchy
• Moore’s law
• Disks
• Access Times
• I/O model of computation
• Sorting on disk
• Optimize disk access
Mechanics of disks
(CS 245 Notes 2)
Disk surfaces
• Block: the logical unit for transferring data between disk and main memory
• One block = one or more sectors
Disk controller
• Controls the movement of the head assembly
• Selects surfaces and sectors
• Transfers data
• A disk controller can control multiple disks
Disk storage characteristics
• Rotation speed of the disk assembly
  – E.g., 5400 RPM, higher or lower
• Number of platters per unit
  – E.g., 5 platters, 10 surfaces
• Number of tracks per surface
  – E.g., 20,000 tracks
• Number of bytes per track
  – E.g., a million bytes
Disk access characteristics
• How to access a block on disk?
  – Step 1: move the heads to the proper cylinder -> seek time
  – Step 2: rotate the disk to the sectors containing the block -> rotational delay
  – Step 3: transfer the data -> transfer time
Outline•The memory hierarchy•Moore’s law• Disks• Access Times• I/O model of computation• Sorting on disk• Optimize disk access
CS 245 Notes 2 26
Time = Seek Time + Rotational Delay + Transfer Time + Other
Disk Access Time
[Diagram: a request “I want block X” goes to the disk; eventually block X arrives in memory]
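The access-time formula above can be evaluated numerically. The sketch below is illustrative; the parameter values (seek time, RPM, block size, transfer rate) are assumptions matching the “typical” figures on the following slides.

```python
# time = seek + rotational delay + transfer + other
def disk_access_time_ms(seek_ms, rpm, block_bytes, transfer_mb_s, other_ms=0.0):
    """Average time to read one random block, in milliseconds."""
    rotational_delay_ms = 0.5 * 60_000 / rpm           # half a revolution
    transfer_ms = block_bytes / (transfer_mb_s * 1e6) * 1000
    return seek_ms + rotational_delay_ms + transfer_ms + other_ms

# Usage: 10 ms seek, 3600 RPM, 1 KB block, 1 MB/s transfer rate.
t = disk_access_time_ms(seek_ms=10, rpm=3600, block_bytes=1024, transfer_mb_s=1)
print(round(t, 2))  # ~ 10 + 8.33 + 1.02 = 19.36 ms
```

The result is close to the 20 ms random-I/O rule of thumb quoted later in these notes.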
Seek time
• Proportional to the distance traveled
• Measured in milliseconds
[Graph: seek time vs. cylinders traveled (1 to N); roughly linear, a short seek costing x and a long seek costing 3x to 5x]
Average Random Seek Time

S = [ Σ_{i=1..N} Σ_{j=1..N, j≠i} SEEKTIME(i -> j) ] / (N(N-1))

“Typical” S: 10 ms to 40 ms
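The double sum above can be computed directly for a given seek-time model. The linear model below (1 ms startup plus 1 ms per 1000 cylinders) is an illustrative assumption, borrowed from the elevator-algorithm example later in these notes.

```python
# Average of SEEKTIME(i -> j) over all ordered cylinder pairs i != j.
# O(N^2), fine for a few thousand cylinders.
def average_seek_ms(n_cyl, seek=lambda d: 1 + d / 1000):
    total = 0.0
    for i in range(n_cyl):
        for j in range(n_cyl):
            if i != j:
                total += seek(abs(i - j))
    return total / (n_cyl * (n_cyl - 1))

# Usage: 1000 cylinders; average distance is about N/3, so S ~ 1.33 ms
# under this (unrealistically fast) toy model.
print(round(average_seek_ms(1000), 3))
```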
Rotational Delay
[Diagram: the block I want is part of a revolution away from where the head currently is]
Average Rotational Delay
R = 1/2 revolution
“typical” R = 8.33 ms (3600 RPM)
Transfer Rate: t
• “Typical” t: 1-3 MB/second
• Transfer time = (block size) / t
Other Delays
• CPU time to issue I/O
• Contention for controller
• Contention for bus, memory
“Typical” Value: 0
• So far: Random Block Access
• What about: Reading “Next” block?
If we do things right (e.g., double buffering, staggered blocks…):

Time to get next block = (block size) / t + negligible costs:
  - skipping the gap
  - switching tracks
  - once in a while, moving to the next cylinder
Rule of Thumb: Random I/O is expensive; Sequential I/O costs much less
• Ex: 1 KB block
  – Random I/O: 20 ms
  – Sequential I/O: 1 ms
Cost for Writing is similar to Reading

…unless we want to verify that the block was written correctly! Then we need to add a (full) rotation + (block size) / t, since the head cannot immediately go back to re-read the block.
• To modify a block?

To Modify a Block:
(a) Read the block
(b) Modify it in memory
(c) Write the block
[(d) Verify?]
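The read-modify-write cycle above can be sketched as follows; ordinary file I/O stands in for raw disk block access (an assumption for illustration).

```python
# Read-modify-write of one block, with optional verification.
import os
import tempfile

BLOCK = 4096  # assumed block size in bytes

def modify_block(path, block_no, offset, new_bytes, verify=True):
    with open(path, "r+b") as f:
        f.seek(block_no * BLOCK)
        buf = bytearray(f.read(BLOCK))                   # (a) read block
        buf[offset:offset + len(new_bytes)] = new_bytes  # (b) modify in memory
        f.seek(block_no * BLOCK)
        f.write(buf)                                     # (c) write block
        if verify:                                       # (d) verify: re-read
            f.seek(block_no * BLOCK)
            assert f.read(BLOCK) == bytes(buf)

# Usage: patch 5 bytes inside block 1 of a two-block file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\0" * (2 * BLOCK))
modify_block(tmp.name, 1, 100, b"hello")
with open(tmp.name, "rb") as f:
    f.seek(BLOCK + 100)
    patched = f.read(5)
os.unlink(tmp.name)
print(patched)  # b'hello'
```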
Outline
• The memory hierarchy
• Moore’s law
• Disks
• Access Times
• I/O model of computation
• Sorting on disk
• Optimize disk access
Computation model for DBMS
• The traditional RAM model assumes that data resides in main memory, and that accessing any item of data takes as much time as any other
• Is the RAM model suitable for a DBMS?
• DBMS assumptions:
  – Data does not fit into main memory
  – Must support secondary and even tertiary storage
I/O model of computation
• I/O model of computation
  – The time taken to perform a disk I/O is much larger than the time to manipulate the same data in main memory
  – Quantity to minimize: the number of block accesses (I/Os)
Outline
• The memory hierarchy
• Moore’s law
• Disks
• Access Times
• I/O model of computation
• Sorting on disk
• Optimize disk access
Merge-sort
• A main-memory sorting algorithm
• Complexity:
  – T(n) = 2T(n/2) + an  =>  T(n) = O(n log n)
• Procedure:
  – Basis: for a one-element list, do nothing
  – Induction:
    • Divide the list into two equal sublists
    • Sort the two sublists recursively
    • Merge the two sorted sublists
Merge two sorted lists
• Time linear in the combined size of the two lists
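The recursive merge-sort and its linear-time merge step can be sketched as:

```python
def merge(left, right):
    """Merge two sorted lists into one sorted list (linear time)."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    out.extend(left[i:])   # at most one of these
    out.extend(right[j:])  # extends is non-empty
    return out

def merge_sort(xs):
    """Basis: a list of <= 1 element is sorted.
    Induction: split in half, sort each half, merge.
    T(n) = 2T(n/2) + an  =>  O(n log n)."""
    if len(xs) <= 1:
        return xs
    mid = len(xs) // 2
    return merge(merge_sort(xs[:mid]), merge_sort(xs[mid:]))

print(merge_sort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]
```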
Two-phase, Multiway Merge-sort (TPMMS)
• Two phases:
  – Phase 1: sort main-memory-sized pieces of the data, producing a number of sorted sublists
  – Phase 2: merge all sorted sublists into a single sorted list
Phase 2
• 1. Find the smallest key among the first remaining elements of all the sublists
• 2. Move that smallest element to the first available position of the output buffer
• 3. If the output block is full, write it to disk and reinitialize the output buffer
• 4. If an input buffer is exhausted, read the next block of that sublist into the buffer
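The Phase 2 steps above can be sketched with a heap that finds the smallest remaining key among all sublists. Each sublist is a Python iterator standing in for a disk-resident run read one block at a time; the output-block flushing of step 3 is not modeled (both are simplifying assumptions).

```python
import heapq

def multiway_merge(sublists):
    """Repeatedly move the smallest remaining element to the output.
    Assumes keys are comparable and not None."""
    iters = [iter(s) for s in sublists]
    heap = []
    for idx, it in enumerate(iters):
        first = next(it, None)
        if first is not None:
            heapq.heappush(heap, (first, idx))
    out = []
    while heap:
        key, idx = heapq.heappop(heap)   # step 1: smallest first element
        out.append(key)                  # step 2: move to output
        nxt = next(iters[idx], None)     # step 4: refill from that sublist
        if nxt is not None:
            heapq.heappush(heap, (nxt, idx))
    return out

# Usage: three sorted runs merge into one sorted list.
runs = [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
print(multiway_merge(runs))  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```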
I/O cost of TPMMS
• Given a relation occupying R blocks,
• the number of I/Os of TPMMS is 4R: each block is read once and written once in each of the two phases
Upper bound of TPMMS
• Block size: B bytes; main memory size: M bytes; records take R bytes each
• Total number of records we can sort = (maximal number of sublists) × (size of each sublist)
• Number of buffers in main memory: M/B
  – One for output
  – M/B - 1 for input = maximal number of sorted sublists
• Each sublist contains at most M/R records
• Total number of records we can sort: (M/R)(M/B - 1), approximately M^2/(RB)
• E.g., with M = 10^8, B = 2^14, R = 160: M^2/(RB) ≈ 4.2 billion records ≈ 2/3 TB
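The bound above can be worked through numerically. The parameter values below (memory 10^8 bytes, 2^14-byte blocks, records of roughly 160 bytes) are taken as assumptions; the exact record count depends on them, and here R is the record size in bytes, not the relation size in blocks from the previous slide.

```python
# Worked arithmetic for the TPMMS capacity bound (M/R)(M/B - 1).
M = 10**8      # main memory, bytes
B = 2**14      # block size, bytes
R = 160        # record size, bytes (assumed)

buffers = M // B                  # block-sized buffers that fit in memory
sublists = buffers - 1            # one buffer reserved for output
records_per_sublist = M // R      # each sublist sorted entirely in memory
total_records = records_per_sublist * sublists   # ~ M^2 / (R*B)
total_bytes = total_records * R

print(f"{sublists} sublists x {records_per_sublist} records = "
      f"{total_records:.2e} records, {total_bytes/1e12:.2f} TB")
```

With these assumptions the result is a few billion records, on the order of the slide’s 2/3 TB figure.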
Multiway Merging of Larger Relations
• Use TPMMS to sort groups of M^2/(RB) records, turning each group into a sorted sublist
• In a third phase, merge up to (M/B) - 1 of these sublists in a final multiway merge
• Capability: M^3/(RB^2) records
• In the example above, this supports about 27 trillion records, about 4.3 PB
Outline
• The memory hierarchy
• Moore’s law
• Disks
• Access Times
• I/O model of computation
• Sorting on disk
• Optimize disk access
Accelerating access to secondary storage
• Organizing data by cylinders
• Using multiple disks
• Mirroring disks
• Disk scheduling algorithms
• Prefetching and large-scale buffering
Organizing data by cylinders
• Place blocks that are accessed together on the same cylinder or on adjacent cylinders
• For example, to read blocks consecutively, only one seek and one rotational delay are needed
• For TPMMS, this strategy can significantly reduce the time of Phase 1, but it gives no benefit in Phase 2
Using multiple disks
• All times associated with reading and writing the disk are divided by the number of disks
• Precondition: the disk controller, bus, and main memory can handle data transferred at the higher rate
• One typical example: striping
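Striping can be sketched as a block-to-disk mapping (an illustrative layout, not a full RAID implementation): logical block i lives on disk i mod n, so n consecutive blocks can be read from n disks in parallel.

```python
def striped_location(block_no, n_disks):
    """Return (disk index, block offset on that disk) for a logical block."""
    return block_no % n_disks, block_no // n_disks

# Usage: with 4 disks, blocks 0..7 round-robin across disks 0..3.
print([striped_location(i, 4) for i in range(8)])
# [(0, 0), (1, 0), (2, 0), (3, 0), (0, 1), (1, 1), (2, 1), (3, 1)]
```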
Mirroring disks
• Mirrored disks: two or more disks hold identical copies of the data
• Two motivations:
  – High availability
  – Faster access to data
• If we have n copies of a disk, we can read any n blocks in parallel
• Even with fewer copies, we gain speed by choosing which disk to read from (the disk whose head is closest to the desired block)
• Mirroring does not speed up writing, but neither does it slow writing down
Disk scheduling algorithms
• Elevator algorithm
  – Accelerates multiple independent access requests
  – The heads make sweeps from the innermost to the outermost cylinder and then back again
  – When the heads reach a position with no requests ahead of them in their direction of travel, they reverse direction
• Example assumptions:
  – Average rotation time + block access time = 4.43 ms
  – Seek time = 1 + (number of tracks traveled)/1000 ms
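The elevator algorithm can be sketched with the cost model above (seek = 1 + distance/1000 ms, plus 4.43 ms of rotation and block access per request). The sketch simplifies the real setting by serving a static queue of requests; in practice new requests arrive while the heads move.

```python
def elevator_schedule(head, requests):
    """Serve cylinder requests by sweeping in one direction, then reversing."""
    pending = sorted(requests)
    order, pos, direction, time_ms = [], head, +1, 0.0
    while pending:
        ahead = [c for c in pending if (c - pos) * direction >= 0]
        if not ahead:                 # nothing ahead of the heads: reverse
            direction = -direction
            continue
        nxt = min(ahead, key=lambda c: abs(c - pos))
        time_ms += 1 + abs(nxt - pos) / 1000 + 4.43   # seek + rotation/access
        order.append(nxt)
        pending.remove(nxt)
        pos = nxt
    return order, time_ms

# Usage: heads at cylinder 2000, four outstanding requests.
order, t = elevator_schedule(head=2000, requests=[8000, 24000, 56000, 16000])
print(order)  # one upward sweep: [8000, 16000, 24000, 56000]
```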
Elevator algorithm vs. first-come-first-served
[Table comparing request completion times under the elevator algorithm and first-come-first-served scheduling]
Prefetching and large-scale buffering
• Prefetching
  – When we can predict the order of block accesses, we can load blocks before they are needed
• Single buffering vs. double buffering
• Large-scale buffering
  – Track-sized or cylinder-sized buffers
  – One seek per track or cylinder instead of one per block
  – Disadvantage: additional memory
Double Buffering

Problem: we have a file, a sequence of blocks B1, B2, …, and a program that processes B1, then B2, then B3, …
Single Buffer Solution
(1) Read B1 into the buffer
(2) Process the data in the buffer
(3) Read B2 into the buffer
(4) Process the data in the buffer
...
Let P = time to process one block
    R = time to read in one block
    n = number of blocks

Single buffer time = n(P + R)
Double Buffering
[Diagram: blocks A, B, C, D, E, F, G stream from disk into two memory buffers; while one buffer is being processed, the next block is read into the other]
Say P ≥ R. What is the total processing time?

P = processing time per block
R = I/O time per block
n = number of blocks

• Double buffering time = R + nP
• Single buffering time = n(R + P)
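The two formulas can be compared numerically. With double buffering, every read after the first overlaps the processing of the previous block (assuming P ≥ R), so total time drops from n(R + P) to R + nP. The timing values below are illustrative assumptions.

```python
def single_buffer_time(n, R, P):
    """No overlap: each block is read, then processed."""
    return n * (R + P)

def double_buffer_time(n, R, P):
    """Reads overlap processing; only the first read is exposed."""
    return R + n * P

# Usage: 1000 blocks, 20 ms to read and 30 ms to process each block.
n, R, P = 1000, 20.0, 30.0
print(single_buffer_time(n, R, P))  # 50000.0 ms
print(double_buffer_time(n, R, P))  # 30020.0 ms
```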