Introduction to I/O Efficient Algorithms (External Memory Model) Jeff M. Phillips August 30, 2013

Page 1: Introduction to I/O Efficient Algorithms (External Memory Model)

Introduction to I/O Efficient Algorithms (External Memory Model)

Jeff M. Phillips

August 30, 2013

Page 2: Introduction to I/O Efficient Algorithms (External Memory Model)

Von Neumann Architecture

Model:

- CPU and Memory
- Read, Write, and Operations (+, −, ∗, ...) take constant time
- Polynomially equivalent to a Turing Machine

[Figure: CPU connected to RAM]

Page 3: Introduction to I/O Efficient Algorithms (External Memory Model)

Memory as Disk

Reality:

- CPU and Memory
- CPU Operations (+, −, ∗, ...) take constant time
- Read, Write are not constant time (at least starting in the 1980s).

[Figure: CPU connected to a Disk with a read/write head]

Page 4: Introduction to I/O Efficient Algorithms (External Memory Model)

Cache

- Through the 1970s: cache access similar to memory access
- First commercially available 1982 (CP/M operating system)
- SmartDrive in Microsoft MS-DOS in 1988

[Figure: CPU, RAM, and Disk with read/write head]

Page 5: Introduction to I/O Efficient Algorithms (External Memory Model)

Memory Hierarchy

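- 1980s → 1990s: hierarchy expanded
- 1989: 486 processor has L1 cache in CPU, had L2 off CPU on motherboard
- L2 popular as motherboard speed rose

[Figure: memory hierarchy of CPU, L1, L2, RAM, and Disk with read/write head; levels closer to the CPU are faster, levels farther away are bigger. Size labels in the figure: 4 MB, 2 GB, 164 GB]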

Page 6: Introduction to I/O Efficient Algorithms (External Memory Model)

Block Transfer

- Disk access is faster when sequential ($B$ = 8-16 KB).
- Sends a whole block to RAM (size $B$).
- RAM has size $M > B^2$.
- Disk access is $10^6$ times more expensive than RAM access.
- Each block transfer is 1 I/O.
- Bound the number of I/Os.

[Figure: CPU, RAM, and Disk with read/write head; block transfer between RAM and Disk]

Page 7: Introduction to I/O Efficient Algorithms (External Memory Model)

Block Transfer

The difference in time between modern CPU and disk technologies is analogous to the difference in speed in sharpening a pencil using a sharpener on one’s desk or by taking an airplane to the other side of the world and using a sharpener on someone else’s desk.
- (Douglas Comer)

Page 8: Introduction to I/O Efficient Algorithms (External Memory Model)

Scalability

- Most programs are developed in the RAM model.
- Why don’t they always thrash?
- A sophisticated OS shifts blocks under the hood (paging and prefetching).
- Massive data and scattered access still spell doom.

[Figure: CPU, RAM, and Disk with read/write head; block transfer between RAM and Disk]
[Plot: runtime versus data size]

Page 11: Introduction to I/O Efficient Algorithms (External Memory Model)

External Memory Model

- $N$ = size of problem instance
- $B$ = size of disk block
- $M$ = number of items that fit in memory
- $T$ = number of items in output
- I/O = block move between memory and disk

[Figure: P (processor), M (memory), and D (disk), with block I/O between M and D]

[Aggarwal and Vitter ’88] [Floyd ’72]
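The cost measure can be made concrete with a small simulation. This is only a sketch under stated assumptions: memory holds $M/B$ blocks, eviction is least-recently-used (the model itself leaves the eviction choice to the algorithm), only block reads are counted, and the function name `count_ios` and the parameter values are made up for illustration.

```python
from collections import OrderedDict
import random


def count_ios(accesses, B, M):
    """Count block reads for a sequence of item indices.

    Assumes memory holds M // B blocks and evicts the least recently
    used block when full. Write-backs are not counted.
    """
    capacity = M // B
    in_memory = OrderedDict()  # block id -> True, ordered by recency
    ios = 0
    for item in accesses:
        block = item // B
        if block in in_memory:
            in_memory.move_to_end(block)   # hit: refresh recency, no I/O
        else:
            ios += 1                       # miss: one block transfer
            in_memory[block] = True
            if len(in_memory) > capacity:
                in_memory.popitem(last=False)  # evict the LRU block
    return ios


# Hypothetical parameters, chosen small so the run is instant.
N, B, M = 10_000, 100, 1_000

sequential = count_ios(range(N), B, M)

order = list(range(N))
random.Random(0).shuffle(order)
scattered = count_ios(order, B, M)

print("sequential scan:", sequential, "I/Os")
print("scattered access:", scattered, "I/Os")
```

A sequential scan touches each block once, so it costs exactly $N/B$ I/Os here. The shuffled order visits the same items but misses far more often, because the block it needs has usually been evicted.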

Page 12: Introduction to I/O Efficient Algorithms (External Memory Model)

Fundamental Bounds

            Internal         External
Scanning:   $O(N)$           $O(N/B)$
Sorting:    $O(N \log N)$    $O((N/B) \log_{M/B}(N/B))$
Permuting:  $O(N)$           $O(\min\{N, (N/B) \log_{M/B}(N/B)\})$
Searching:  $O(\log_2 N)$    $O(\log_B N)$

- Linear I/O: $O(N/B)$
- Permuting not linear
- Permuting and sorting equal (practically)
- $B$ factor very important: $\frac{N}{B} < \frac{N}{B} \log_{M/B} \frac{N}{B} \ll N$
- Cannot sort optimally with search tree
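To get a feel for the gap between these bounds, the expressions can be evaluated directly. The sketch below drops all constant factors, so the numbers are orders of magnitude rather than exact I/O counts. The function names are illustrative, and $M = 10^8$ is a hypothetical memory size chosen only to satisfy $M > B^2$.

```python
import math


def scan_ios(N, B):
    """Linear I/O: ceil(N / B) block transfers."""
    return -(-N // B)


def sort_ios(N, M, B):
    """(N/B) * log_{M/B}(N/B), with the log factor at least 1."""
    n = N / B  # problem size in blocks
    m = M / B  # memory size in blocks
    return n * max(1.0, math.log(n, m))


def permute_ios(N, M, B):
    """min{N, sorting bound}."""
    return min(N, sort_ios(N, M, B))


def search_ios(N, B):
    """log_B N, the height of a search tree with fan-out B."""
    return math.log(N, B)


# N and B as in the linked-list example; M is an assumed value.
N, B, M = 256 * 10**6, 8000, 10**8

print("scan   :", scan_ios(N, B))
print("sort   :", round(sort_ios(N, M, B)))
print("permute:", round(permute_ios(N, M, B)))
print("search :", round(search_ios(N, B), 2))
print("N      :", N)
```

With these values the log factor is close to 1, so sorting costs only slightly more than scanning, and both are thousands of times smaller than $N$. This is why permuting and sorting are practically equal.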

Page 20: Introduction to I/O Efficient Algorithms (External Memory Model)

Difference Between N and N/B

Consider traversing a linked list.

- Naive: $O(N)$ blocks, each hop goes to a new block.
- Smart: $O(N/B)$ blocks, if sequential nodes are stored in a single block.

Example: $N = 256 \times 10^6$, $B = 8000$, 1 ms disk access time

- $N$ I/Os take $256 \times 10^3$ sec ≈ 4267 min ≈ 71 hours
- $N/B$ I/Os take $256/8$ sec = 32 sec
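The arithmetic in the example can be checked in a few lines. The variable names are illustrative. Times are kept in whole milliseconds until the final division so the results are exact.

```python
N = 256 * 10**6        # number of list nodes
B = 8000               # nodes per disk block
ACCESS_MS = 1          # one disk access takes 1 ms

naive_ms = N * ACCESS_MS           # one I/O per node
smart_ms = (N // B) * ACCESS_MS    # one I/O per block of B nodes

naive_seconds = naive_ms / 1000
smart_seconds = smart_ms / 1000
naive_hours = naive_seconds / 3600

print(f"naive: {naive_seconds:.0f} s = {naive_hours:.1f} hours")
print(f"smart: {smart_seconds:.0f} s")
```

The ratio between the two running times is exactly $B$: packing consecutive nodes into one block saves a factor of 8000.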

Page 22: Introduction to I/O Efficient Algorithms (External Memory Model)

TPIE

Templated Portable I/O Environment
Open source library of I/O-efficient data structures.

- External memory merge sort
- B-Tree
- Priority queue
- Simple buffered stacks and queues

http://www.madalgo.au.dk/tpie/
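To illustrate the first item, here is a minimal sketch of external memory merge sort. It does not use TPIE and does not reflect its API. It assumes integer items stored one per line in temporary files, and it lets Python's file buffering stand in for block transfers. The names `external_merge_sort`, `M`, and `fan_in` are illustrative. In the model, the fan-in would be about $M/B$.

```python
import heapq
import itertools
import os
import random
import tempfile


def _write_run(sorted_items, tmpdir):
    """Write an already sorted stream to a new run file; return its path."""
    fd, path = tempfile.mkstemp(dir=tmpdir, text=True)
    with os.fdopen(fd, "w") as f:
        f.writelines(f"{x}\n" for x in sorted_items)
    return path


def _read_run(path):
    """Stream the integers of a run file back, one at a time."""
    with open(path) as f:
        for line in f:
            yield int(line)


def external_merge_sort(items, M, fan_in):
    """Sort ints, sorting at most M items in memory at a time."""
    if M < 1 or fan_in < 2:
        raise ValueError("need M >= 1 and fan_in >= 2")
    with tempfile.TemporaryDirectory() as tmpdir:
        # Phase 1: cut the input into sorted runs of at most M items.
        stream = iter(items)
        runs = []
        while True:
            chunk = list(itertools.islice(stream, M))
            if not chunk:
                break
            chunk.sort()
            runs.append(_write_run(chunk, tmpdir))
        # Phase 2: merge fan_in runs at a time until one run remains.
        while len(runs) > 1:
            next_runs = []
            for i in range(0, len(runs), fan_in):
                group = runs[i:i + fan_in]
                merged = heapq.merge(*(_read_run(p) for p in group))
                next_runs.append(_write_run(merged, tmpdir))
                for p in group:
                    os.remove(p)
            runs = next_runs
        # For the demo the result is returned as a list; a real
        # external sort would leave the final run on disk.
        return list(_read_run(runs[0])) if runs else []


rng = random.Random(1)
data = [rng.randrange(10**6) for _ in range(5000)]
result = external_merge_sort(data, M=500, fan_in=4)
print(result == sorted(data))  # prints True
```

Each merge pass reads and writes every item once, which is a linear number of I/Os. The number of passes is logarithmic in the number of runs, with base `fan_in`. That is where the $\log_{M/B}$ factor in the sorting bound comes from.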

Page 23: Introduction to I/O Efficient Algorithms (External Memory Model)

Attribution

These slides are heavily based on slides by Lars Arge (a leading expert in the area of External Memory algorithms).
See: http://www.daimi.au.dk/~large/ioS09/