Data Structure and Storage

The modern world has a false sense of superiority because it relies on the mass of knowledge that it can use, but what is important is the extent to which knowledge is organized and mastered.

Goethe, 1810

Dec 22, 2015

Data Structures

The goal is to minimize disk accesses

Disks are relatively slow compared to main memory
• Writing a letter compared to a telephone call

Disks are a bottleneck

Appropriate data structures can reduce disk accesses

Database access

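Request path between the DBMS, the file manager, and the disk manager:
• DBMS to file manager: record request
• File manager to disk manager: page request
• Disk manager to disk: read page command
• Disk to disk manager: page read
• Disk manager to file manager: page returned
• File manager to DBMS: record returned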
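The request chain above can be sketched as two cooperating layers. This is a minimal illustration, not how any particular DBMS is built: the class names DiskManager and FileManager, the in-memory page dictionary, and the (page, slot) record identifiers are all hypothetical.

```python
class DiskManager:
    """Sees the disk as a collection of numbered pages (hypothetical model)."""

    def __init__(self, pages):
        self.pages = pages          # page number -> list of records on that page
        self.reads = 0              # count of physical page reads

    def read_page(self, page_no):
        self.reads += 1             # "read page command" followed by "page read"
        return self.pages[page_no]


class FileManager:
    """Sees the disk as stored files; maps a record id to (page, slot)."""

    def __init__(self, disk, directory):
        self.disk = disk
        self.directory = directory  # record id -> (page number, slot in page)

    def get_record(self, record_id):
        page_no, slot = self.directory[record_id]   # "page request"
        page = self.disk.read_page(page_no)          # "page returned"
        return page[slot]                            # "record returned"


disk = DiskManager({0: ["rec-a", "rec-b"], 1: ["rec-c"]})
files = FileManager(disk, {"A": (0, 0), "B": (0, 1), "C": (1, 0)})
print(files.get_record("C"))   # rec-c
print(disk.reads)              # 1
```

Each record request from the DBMS costs exactly one page read here because the directory lookup happens in memory.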

Disks

Data are stored on tracks on a surface

A disk drive can have multiple surfaces

Rotational delay
• Waiting for the physical storage location of the data to appear under the read/write head
• Around 4 msec for a magnetic disk
• Set by the manufacturer

Access arm delay
• Moving the read/write head to the track on which the storage location can be found
• Around 9 msec for a magnetic disk
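The rotational figure follows from spindle speed: on average the head waits half a revolution. The sketch below assumes a 7,200 rpm drive (an assumption, not stated on the slide) and ignores transfer time; the function names are hypothetical.

```python
def avg_rotational_delay_ms(rpm):
    """On average the head waits half a revolution for the data to arrive."""
    ms_per_revolution = 60_000 / rpm
    return ms_per_revolution / 2


def avg_access_time_ms(seek_ms, rpm):
    """Access arm (seek) delay plus average rotational delay; transfer time ignored."""
    return seek_ms + avg_rotational_delay_ms(rpm)


print(round(avg_rotational_delay_ms(7200), 2))  # 4.17
print(round(avg_access_time_ms(9, 7200), 2))    # 13.17
```

Under these assumptions a single random page read costs roughly 13 msec, which is why the number of disk accesses dominates query cost.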

Minimizing data access times

Rotational delay is fixed by the manufacturer

Access arm delay can be reduced by storing files on
• The same track
• The same track on each surface, that is, a cylinder

Clustering

Records that are often retrieved together should be stored together

Intra-file clustering
• Records within one file
• e.g., a sequential file

Inter-file clustering
• Records in different files
• e.g., a nation and its stocks

Disk manager

Manages physical I/O

Sees the disk as a collection of pages

Has a directory of each page on a disk

Retrieves, replaces, and manages free pages

File manager

Manages the storage of files

Sees the disk as a collection of stored files

Each file has a unique identifier

Each record within a file has a unique record identifier

File manager's tasks

Create a file
Delete a file
Retrieve a record from a file
Update a record in a file
Add a new record to a file
Delete a record from a file

Sequential retrieval

Consider a file of 10,000 records, each occupying one page

Queries that require processing all records will require 10,000 accesses
• e.g., Find all items of type 'E'

Many disk accesses are wasted if few records meet the condition

Indexing

An index is a small file that has data for one field of a file

Indexes reduce disk accesses

Querying with an index

Read the index into memory

Search the index to find records meeting the condition

Access only those records containing required data

Disk accesses are substantially reduced when the query involves few records
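The difference between scanning a file and querying through an index can be made concrete by counting page reads. This is a minimal sketch under stated assumptions: one record per page as on the sequential retrieval slide, ten matching records, and an index that takes a hypothetical 20 page reads to load into memory. The function names are illustrative only.

```python
from collections import defaultdict

# Hypothetical file: 10,000 records, one per page (page number = position).
types = ["E" if i % 1000 == 0 else "A" for i in range(10_000)]


def sequential_query(file_types, wanted):
    """Read every page and test each record."""
    accesses, hits = 0, []
    for page_no, t in enumerate(file_types):
        accesses += 1                 # every page must be read
        if t == wanted:
            hits.append(page_no)
    return hits, accesses


def build_index(file_types):
    """Index on the type field: type code -> list of page numbers."""
    index = defaultdict(list)
    for page_no, t in enumerate(file_types):
        index[t].append(page_no)
    return index


def indexed_query(index, wanted, index_pages):
    accesses = index_pages            # read the (small) index into memory
    hits = index.get(wanted, [])      # search the index in memory
    accesses += len(hits)             # read only the matching records
    return hits, accesses


print(sequential_query(types, "E")[1])                       # 10000
print(indexed_query(build_index(types), "E", index_pages=20)[1])  # 30
```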

Maintaining an index

Adding a record requires at least two disk accesses
• Update the file
• Update the index

Trade-off
• Faster queries
• Slower maintenance

Using indexes

Sequential processing of a portion of a file
• Find all items with a type code in the range 'E' to 'K'

Direct processing
• Find all items with a type code of 'E' or 'N'

Existence testing
• Determining whether a record meeting the criteria exists without having to retrieve it
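The three uses of an index can be sketched with a sorted index searched by binary search (Python's standard bisect module). The index contents below are hypothetical; the point is that a range query, a direct query, and an existence test can all be answered from the index, and the existence test never touches the data file.

```python
from bisect import bisect_left, bisect_right

# Hypothetical index: (type code, page number) pairs kept sorted by type code.
index = sorted([("C", 7), ("E", 3), ("E", 9), ("H", 1), ("K", 4), ("N", 2), ("Q", 8)])
keys = [code for code, _ in index]


def range_lookup(lo, hi):
    """Sequential processing of a portion: pages with lo <= code <= hi."""
    start = bisect_left(keys, lo)
    end = bisect_right(keys, hi)
    return [page for _, page in index[start:end]]


def direct_lookup(*codes):
    """Direct processing: pages whose code is one of the given values."""
    pages = []
    for code in codes:
        pages.extend(range_lookup(code, code))
    return pages


def exists(code):
    """Existence test answered from the index alone, without reading the file."""
    i = bisect_left(keys, code)
    return i < len(keys) and keys[i] == code


print(range_lookup("E", "K"))     # [3, 9, 1, 4]
print(direct_lookup("E", "N"))    # [3, 9, 2]
print(exists("D"), exists("Q"))   # False True
```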

Multiple indexes

Find red items of type 'C'
• Both indexes can be searched to identify records to retrieve
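Combining two indexes amounts to intersecting their lists of record locations in memory before any record is fetched. A minimal sketch, with hypothetical index contents and disk addresses:

```python
# Hypothetical inverted lists: each maps a field value to a set of record locations.
color_index = {"red": {"d1", "d4", "d7"}, "blue": {"d2", "d5"}, "green": {"d3", "d6"}}
type_index = {"C": {"d4", "d5", "d7"}, "E": {"d1", "d2", "d3", "d6"}}


def red_items_of_type(t):
    # Intersect the two lists of record locations in memory; only the
    # records in the intersection need to be fetched from disk.
    return color_index.get("red", set()) & type_index.get(t, set())


print(sorted(red_items_of_type("C")))  # ['d4', 'd7']
```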

Multiple indexes

Indexes are also called inverted lists
• A file of record locations rather than data

Trade-off
• Faster retrieval
• Slower maintenance

Sparse indexes

Taking advantage of the physical sequence of a file

Assume 2 records per page

Trade-offs
• Fewer disk accesses required to read the index
• Existence tests not possible from the index alone
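A sparse index stores one entry per page rather than per record, which only works because the file is in key sequence. The sketch below assumes 2 records per page, as on the slide; the keys are hypothetical. It also shows why the index cannot settle an existence test by itself: the candidate page must still be read.

```python
from bisect import bisect_right

# Hypothetical sequential file, sorted on key, 2 records per page.
pages = [["A01", "A05"], ["B02", "B10"], ["C03", "C08"], ["D01", "D07"]]

# Sparse index: one entry per page (the lowest key on that page).
sparse_index = [page[0] for page in pages]   # ['A01', 'B02', 'C03', 'D01']


def lookup(key):
    """Return (found, page_reads). Apart from keys below the first key in the
    file, the candidate page must be read to know whether the key exists."""
    i = bisect_right(sparse_index, key) - 1  # last page whose first key <= key
    if i < 0:
        return False, 0                      # below the smallest key in the file
    page = pages[i]                          # one disk access
    return key in page, 1


print(lookup("C08"))  # (True, 1)
print(lookup("C05"))  # (False, 1): the page had to be read to find out
```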

B-tree

A form of inverted list

Frequently used for relational systems

Basis of IBM’s VSAM underlying DB2

Supports sequential and direct access

Has two parts
• Sequence set
• Index set

B-tree

The sequence set is a single-level index with pointers to records

The index set is a tree-structured index to the sequence set

B+ tree

The combination of the index set (the B-tree) and the sequence set is called a B+ tree

The number of data values and pointers for any given node is not restricted

Free space is set aside to permit rapid expansion of a file

Trade-offs
• Fast retrieval when pages are packed with data values and pointers
• Slow updates when pages are packed with data values and pointers

B-tree

(From Weiss: Algorithms and Data Structures using Java)

• The top two levels of the tree can be held in RAM

• A record can then be found with a single disk access, or two if the table is so large that the index needs three levels.

An index node corresponds to one page on disk.

A page might be 8 KB, for example. If the key field is 12 bytes and a disk address is 4 bytes, an index node will hold about 500 values. Two index levels can then reach 500 × 500, or 250,000, pages on disk.
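The fan-out arithmetic above can be checked directly. This sketch assumes 8 KB means 8,192 bytes and ignores any per-node header, so the exact fan-out comes out at 512 rather than the rounded figure of about 500; the function names are hypothetical.

```python
def index_fanout(page_bytes, key_bytes, pointer_bytes):
    """Number of (key, disk address) entries that fit in one index node (page)."""
    return page_bytes // (key_bytes + pointer_bytes)


def pages_reachable(fanout, levels):
    """Data pages addressable through the given number of index levels."""
    return fanout ** levels


f = index_fanout(8 * 1024, 12, 4)
print(f)                          # 512, i.e. "about 500"
print(pages_reachable(500, 2))    # 250000, the rounded figure
print(pages_reachable(f, 2))      # 262144 with the exact fan-out
```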

Hashing

A technique for reducing disk accesses for direct access

Avoids an index

Number of accesses per record can be close to one

The hash field is converted to a hash address by a hash function

Shortcomings of hashing

Different hash field values can convert to the same hash address
• Synonyms
• Store the colliding record in an overflow area

Long synonym chains degrade performance

There can be only one hash field

The file can no longer be processed sequentially

Hashing

hash address = remainder after dividing SSN by 10000
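A minimal sketch of a hashed file using this hash function, with synonyms placed in an overflow area. The class, its in-memory dictionaries, and the sample SSNs are hypothetical; counting one access for the home slot and one per overflow entry examined is a simplifying assumption.

```python
class HashedFile:
    """Hypothetical hashed file: 10,000 home slots plus an overflow area."""

    def __init__(self, slots=10_000):
        self.slots = slots
        self.home = {}        # hash address -> (ssn, record)
        self.overflow = {}    # hash address -> list of colliding (ssn, record)

    def address(self, ssn):
        # Remainder after dividing SSN by 10000.
        return ssn % self.slots

    def insert(self, ssn, record):
        addr = self.address(ssn)
        if addr not in self.home:
            self.home[addr] = (ssn, record)
        else:
            # A synonym: store the colliding record in the overflow area.
            self.overflow.setdefault(addr, []).append((ssn, record))

    def find(self, ssn):
        """Return (record, accesses): 1 for the home slot, plus 1 per
        overflow entry examined along the synonym chain."""
        addr = self.address(ssn)
        accesses = 1
        entry = self.home.get(addr)
        if entry and entry[0] == ssn:
            return entry[1], accesses
        for other_ssn, record in self.overflow.get(addr, []):
            accesses += 1
            if other_ssn == ssn:
                return record, accesses
        return None, accesses


f = HashedFile()
f.insert(123456789, "Ann")
f.insert(987656789, "Bob")   # same last four digits: a synonym
print(f.find(987656789))     # ('Bob', 2)
```

Every synonym adds an access to the chain, which is why long synonym chains degrade performance.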

Linked list

A structure for inter-file clustering

An example of a parent/child structure

Linked lists

There can be two-way pointers, forward and backward, to speed up deletion

Each child can have a pointer to its parent
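The parent/child chain with forward, backward, and parent pointers can be sketched in memory as below. The class names and stock codes are hypothetical; the backward pointer is what lets delete unlink a child without walking the chain from the parent.

```python
class Stock:
    """Child record with forward, backward, and parent pointers."""

    def __init__(self, code, nation):
        self.code = code
        self.parent = nation   # pointer from child back to its parent
        self.next = None       # forward pointer
        self.prev = None       # backward pointer


class Nation:
    """Parent record that points to the first child in its chain."""

    def __init__(self, natcode):
        self.natcode = natcode
        self.first = None

    def add_stock(self, code):
        stock = Stock(code, self)
        stock.next = self.first          # insert at the head of the chain
        if self.first:
            self.first.prev = stock
        self.first = stock
        return stock

    def stocks(self):
        node, codes = self.first, []
        while node:
            codes.append(node.code)
            node = node.next
        return codes


def delete(stock):
    """Unlink a child directly, using its backward pointer."""
    if stock.prev:
        stock.prev.next = stock.next
    else:
        stock.parent.first = stock.next  # it was the head: fix the parent's pointer
    if stock.next:
        stock.next.prev = stock.prev


uk = Nation("UK")
s1, s2, s3 = uk.add_stock("S1"), uk.add_stock("S2"), uk.add_stock("S3")
delete(s2)
print(uk.stocks())  # ['S3', 'S1']
```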

Bit map indexes

Uses a single bit, rather than multiple bytes, to indicate the specific value of a field

Color can have only three values, so use three bits

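(Red, Green, and Blue are the Color bits; A and N are the Code bits)

Itemcode | Red | Green | Blue | A | N | Disk address
1001 | 0 | 0 | 1 | 0 | 1 | d1
1002 | 1 | 0 | 0 | 1 | 0 | d2
1003 | 1 | 0 | 0 | 1 | 0 | d3
1004 | 0 | 1 | 0 | 1 | 0 | d4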
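Queries on a bit map index become bitwise operations. In this sketch each bitmap is a Python integer whose bit i corresponds to row i of the table above; the helper names are hypothetical.

```python
# Rows of the bit map index: item code, color, code, disk address.
rows = [
    ("1001", "Blue", "N", "d1"),
    ("1002", "Red", "A", "d2"),
    ("1003", "Red", "A", "d3"),
    ("1004", "Green", "A", "d4"),
]


def build_bitmaps(rows, column):
    """One bitmap per distinct value; bit i is set when row i has that value."""
    bitmaps = {}
    for i, row in enumerate(rows):
        bitmaps[row[column]] = bitmaps.get(row[column], 0) | (1 << i)
    return bitmaps


color = build_bitmaps(rows, 1)
code = build_bitmaps(rows, 2)


def matching_addresses(mask):
    """Disk addresses of the rows whose bit is set in mask."""
    return [rows[i][3] for i in range(len(rows)) if (mask >> i) & 1]


# "Red items with code A": a single bitwise AND of two bitmaps.
print(matching_addresses(color["Red"] & code["A"]))  # ['d2', 'd3']
```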

Bit map indexes

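A bit map index saves space and time compared to a standard index, such as the one below, which stores Color as Char(8) and Code as Char(1): 9 bytes per item, versus 5 bits in the bit map index

Itemcode | Color Char(8) | Code Char(1) | Disk address
1001 | Blue | N | d1
1002 | Red | A | d2
1003 | Red | A | d3
1004 | Green | A | d4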

Join indexes

Speed up joins by creating an index for the primary key and foreign key pair

nation index
natcode | Disk address
UK | d1
USA | d2

stock index
natcode | Disk address
UK | d101
UK | d102
UK | d103
USA | d104
USA | d105

join index
nation disk address | stock disk address
d1 | d101
d1 | d102
d1 | d103
d2 | d104
d2 | d105
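The join index can be derived from the two single-table indexes by pairing each stock's disk address with the disk address of the nation its foreign key refers to. A minimal sketch using the index values above; the function name is hypothetical.

```python
# Single-table indexes: natcode -> disk address.
nation_index = {"UK": "d1", "USA": "d2"}
stock_index = [("UK", "d101"), ("UK", "d102"), ("UK", "d103"),
               ("USA", "d104"), ("USA", "d105")]


def build_join_index(parent_index, child_index):
    """Pair each parent's disk address (primary key side) with the disk
    address of every child whose foreign key matches it."""
    return [(parent_index[fk], child_addr)
            for fk, child_addr in child_index
            if fk in parent_index]


join_index = build_join_index(nation_index, stock_index)
print(join_index)
```

A join can then be answered by reading the pairs directly, without comparing natcode values at query time.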

Data coding standards

ASCII

Unicode

ASCII

Each alphabetic, numeric, or special character is represented by a 7-bit code

128 possible characters

ASCII code usually occupies one byte

Unicode

A unique binary code for every character, no matter what the platform, program, or language

At the time of writing, contained 34,168 distinct characters derived from 24 supported language scripts; later versions contain many more

Covers the principal written languages

Encoding forms include
• A default 16-bit form (UTF-16)
• An 8-bit form called UTF-8, for ease of use with existing ASCII-based systems

The character set underlying HTML and XML

The basis of global software
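The practical difference between ASCII and the Unicode encoding forms shows up in how many bytes a character needs. This short sketch uses Python's built-in str.encode; the sample characters are arbitrary, and "utf-16-be" is used so that no byte-order mark is counted.

```python
def encoded_sizes(ch):
    """Return (code point, fits in 7-bit ASCII?, UTF-8 bytes, UTF-16 bytes)."""
    cp = ord(ch)
    return cp, cp < 128, len(ch.encode("utf-8")), len(ch.encode("utf-16-be"))


for ch in ["A", "é", "€", "中"]:
    print(ch, encoded_sizes(ch))
# A (65, True, 1, 2)
# é (233, False, 2, 2)
# € (8364, False, 3, 2)
# 中 (20013, False, 3, 2)
```

ASCII characters keep their single byte in UTF-8, which is why UTF-8 works well with existing ASCII-based systems.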

Data storage devices

What will the data storage device be used for?

On-line data
• Access speed
• Capacity

Back-up files
• Security against data loss

Archival data
• Long-term storage

Key variables

Data volume
Data volatility
Access speed
Storage cost
Medium reliability
Legal standing of stored data

Magnetic technology

Up to 50% of IS hardware budgets are spent on magnetic storage

A $50 billion market

The major form of data storage

A mature and widely used technology

Strong magnetic fields can erase data

Magnetization decays with time

Fixed disks

Sealed, permanently mounted

Highly reliable

Access times of 4-10 msec

Transfer rates as high as 1,300 Mbytes per second

Capacities of Gbytes to Tbytes

A disk storage unit

RAID

Redundant arrays of inexpensive or independent drives

Exploits economies of scale of disk manufacturing for the personal computer market

Can also give greater security

Increases a system's fault tolerance

Not a replacement for regular backup

Mirroring

Mirroring

Write
• Identical copies of a file are written to each drive in an array

Read
• Alternate pages are read simultaneously from each drive
• Pages are put together in memory
• Access time is reduced by a factor of approximately the number of disks in the array

Read error
• Read the required page from another drive

Trade-offs
• Reduced access time
• Greater security
• More disk space required

Striping

Striping

Three-drive model

Write
• Half of file to first drive
• Half of file to second drive
• Parity bit to third drive

Read
• Portions from each drive are put together in memory

Read error
• Lost bits are reconstructed from the third drive’s parity data

Trade-offs
• Increased data security
• Requires less storage capacity than mirroring
• Not as fast as mirroring
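Parity of this kind is typically computed with exclusive OR (XOR), so any one lost block can be rebuilt from the other two. This sketch follows the three-drive model literally (first half of the file on one drive, second half on another), which is a simplification; real arrays interleave smaller blocks. Odd-length data is padded with a zero byte, and the function names are hypothetical.

```python
def stripe(data: bytes):
    """Split data across two drives and compute a parity block for the third."""
    if len(data) % 2:
        data += b"\x00"                  # pad so both halves are the same length
    half = len(data) // 2
    d1, d2 = data[:half], data[half:]
    parity = bytes(a ^ b for a, b in zip(d1, d2))
    return d1, d2, parity


def rebuild(surviving: bytes, parity: bytes):
    """XOR of the surviving drive's data and the parity recovers the lost drive."""
    return bytes(a ^ p for a, p in zip(surviving, parity))


d1, d2, p = stripe(b"RAIDDATA")
print(rebuild(d1, p) == d2)   # True: drive 2 reconstructed from drive 1 + parity
```

Three drives hold two drives' worth of data here, compared with one drive's worth for two mirrored drives.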

RAID levels

All levels, except 0, have common features
• The operating system sees a set of physical drives as one logical drive
• Data are distributed across physical drives
• Parity is used for data recovery

RAID levels

Level 0
• Data spread across multiple drives
• No data recovery when a drive fails

Level 1
• Mirroring
• Critical non-stop applications

Level 3
• Striping

Level 5
• A variation of striping
• Parity data is spread across drives
• Requires less extra capacity than level 1
• Higher I/O rates than level 3
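How much space each level gives up for redundancy can be estimated with simple arithmetic. This sketch assumes identical drives, two-way mirroring for level 1, and one drive's worth of parity for levels 3 and 5; it ignores hot spares and nested levels, and the function name is hypothetical.

```python
def usable_capacity(level, drives, drive_tb):
    """Usable space in TB for a simple array of identical drives."""
    if level == 0:
        return drives * drive_tb              # striping only, no redundancy
    if level == 1:
        return drives * drive_tb / 2          # every page is stored twice
    if level in (3, 5):
        return (drives - 1) * drive_tb        # one drive's worth of parity
    raise ValueError("level not covered by this sketch")


for level in (0, 1, 5):
    print(level, usable_capacity(level, 4, 2))
# 0 8
# 1 4.0
# 5 6
```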

RAID 5

RAID at UUS

Magnetic technology

Removable magnetic disk
Magnetic tape
Magnetic tape cartridge
Mass storage

Mass storage at UUS

Solid State

Arrays of memory chips

Can be 50 times faster than magnetic storage

$1,400 per Gbyte
• Magnetic disk is about $1 per Gbyte

Used for stock trading and video-streaming applications

Flash drive

Small
Removable
Solid state
USB connector
Up to 2 Gbytes capacity
Around $100 per Gbyte

Optical technology

A more recent development than magnetic technology

Uses a laser for reading and writing data

High storage densities

Low cost

Direct access

Long storage life

Not susceptible to head crashes

Optical technology

CD-ROM

CD can store data as well as sound

Economies of scale because of common components for CD players and CD-ROM drives

ROM: read-only memory

Capacity of 650 Mbytes

Relatively slow device
• 100 msec access time

Magneto-optical disk

High-capacity read-write medium

3.5" disk can store up to 256 Mbytes

Not as fast as fixed disk
• 10 msec access time

Compact

Reliable

Suitable for data transfer, backup, and archival purposes

Digital Versatile Disc (DVD)

The same physical size as a CD-ROM but up to roughly 26 times the capacity (17 Gbytes versus 650 Mbytes)

DVD drives are likely to have transfer rates of around 2.76 Mbytes/sec and access times of 150 msec

A DVD-ROM drive will play both audio CDs and CD-ROMs

Read-only versions
• DVD-Video (movies)
• DVD-ROM (software)
• DVD-Audio (songs)

DVD-R
• Recordable (write once, read many)

DVD-RAM
• Erasable (write many, read many)

SAN

Storage area network

Supports dynamic sharing of large amounts of data, regardless of operating system or application

Communicates via pipelines that consist of an interface called Fibre Channel
• A high-speed data connection between computer devices

Prices range from $20,000-30,000 to $5 million

Storage life

Merit of data storage devices

Device | Access speed | Volume | Volatility | Cost per megabyte | Reliability | Legal standing
Solid state | *** | * | *** | * | ** | *
Fixed disk | *** | *** | *** | ** | ** | *
RAID | *** | *** | *** | ** | *** | *
Removable disk | ** | ** | *** | ** | ** | *
Floppy | * | * | *** | * | * | *
Tape | * | ** | * | *** | ** | *
Cartridge | ** | *** | * | *** | ** | *
Mass storage | ** | *** | * | *** | ** | *
SAN | *** | *** | *** | ** | *** | *
CD-ROM | * | ** | * | ** | *** | ***
CD-R | * | ** | * | ** | *** | **
CD-RW | * | ** | * | ** | *** | *
WORM | * | *** | * | *** | *** | **
Magneto-optical | ** | *** | ** | *** | *** | *
DVD-ROM | * | *** | * | *** | *** | ***
DVD-R | * | *** | * | *** | *** | **
DVD-RAM | * | *** | ** | *** | *** | *

Data compression

Encoding digital data so it requires less storage space and thus less network bandwidth

Lossless
• File can be restored to its original state

Lossy
• File cannot be restored to its original state
• Used for graphics, video, and audio files
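Lossless compression can be demonstrated with Python's standard zlib module, which restores the input exactly. The lossy half is a deliberately simple stand-in (rounding samples to a coarse step, with a hypothetical quantize helper), not a real graphics, video, or audio codec.

```python
import zlib

# Lossless: the decompressed bytes are identical to the original.
original = b"Data Structure and Storage " * 100   # highly repetitive sample
packed = zlib.compress(original)
restored = zlib.decompress(packed)
print(len(packed) < len(original), restored == original)   # True True


def quantize(samples, step):
    """Lossy: round each sample to a multiple of step; finer detail is discarded."""
    return [round(s / step) * step for s in samples]


samples = [0.12, 0.49, 0.51]
print(quantize(samples, 0.5))   # [0.0, 0.5, 0.5]: the original values cannot be recovered
```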

Key points

Disk drives are relatively slow compared to main memory

A variety of techniques are used to overcome the disk access bottleneck

Storage devices vary on several parameters

Select a storage device based on storage and retrieval goals