Slide 8-1 Copyright © 2003 Pearson Education, Inc. Overzicht Informatica (Overview of Computer Science), Lecture 10 – October 31. Computer Science: An Overview, Edition 7 / 8, J. Glenn Brookshear
Mar 31, 2015
CHAPTER 8 (was Chapter 7)
Data Structures
• Abstractions of the actual data organization in main memory
• Allow users to perceive data as ‘logical units’ (e.g.: arrangement in rows and columns)
Data Structure Basics: Pointers
• Pointers:
– pointer = location in memory that contains the address of another location in memory
– so: a pointer points to data positioned elsewhere in memory
[Figure: pointers in memory; addresses shown: F02A, FF8C, 64B0]
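The idea of a cell that holds the address of another cell can be sketched as follows. This is only an illustration: memory is modelled as a Python dictionary, and although the three addresses are taken from the slide, the way they chain together and the stored value "data" are assumptions.

```python
# Main memory modelled as a mapping from address to cell contents.
# The three addresses appear on the slide; how they chain together
# and the stored value "data" are assumptions made for illustration.
memory = {
    0xF02A: 0xFF8C,  # pointer: this cell holds the address of another cell
    0xFF8C: 0x64B0,  # pointer: this cell holds the address of the data cell
    0x64B0: "data",  # ordinary cell holding data
}


def follow(pointer_address):
    """Return the contents of the cell that the pointer at pointer_address points to."""
    target_address = memory[pointer_address]
    return memory[target_address]


print(follow(0xFF8C))       # data
print(hex(follow(0xF02A)))  # 0x64b0
```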
Static versus Dynamic Data Structures
• Static:
– shape & size of structure do not change over time
– example in C: int Table[2][9];
[Figure: Table, showing the entries 65 and 32]
• Dynamic:
– shape & size may change
– example: Stack
[Figure: Stack holding 48, 97, 17]
Arrays
• Example: to store 24 hourly temperature readings…
• … a convenient storage structure is a 1-D homogeneous array of 24 elements (e.g. in C: float Readings[24] )
• In main memory (one cell per reading, starting at address x):
[Figure: memory cells at addresses x … x + 23]
Address of Readings[i] = x + (i - 1), with entries numbered from 1 (C itself numbers them from 0)
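A small sketch of this address calculation, assuming entries are numbered from 1 and each entry occupies one memory cell. The function name and the start address 1000 are hypothetical.

```python
def address_1d(x, i):
    """Address of Readings[i]: entries numbered from 1, one cell per entry."""
    return x + (i - 1)


x = 1000  # hypothetical start address of the array
print(address_1d(x, 1))   # 1000 (first reading sits at the start address)
print(address_1d(x, 24))  # 1023 (last reading sits at x + 23)
```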
Two-dimensional Arrays
• Sometimes 2-D homogeneous arrays are more useful (e.g. in C: float Table[4][5] )
=> Address of Table[i][j] = x + (nr_of_columns × (i - 1)) + (j - 1), with rows and columns numbered from 1
=> Example: address of Table[3][4] = x + (5 × 2) + 3 = x + 13
Exercise:
Suppose an array with 6 rows and 8 columns is stored starting at address 20. If each entry in the array requires only one memory cell, what is the address of the entry in the 3rd row and 4th column? What if each entry requires two cells?
• If 1 cell per entry:
– address at [3, 4]: 20 + 8 × (3-1) + (4-1) = 39.
• If 2 cells per entry:
– address at [3, 4]: 20 + 2 × (8 × (3-1) + (4-1)) = 58.
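The two answers can be checked with a short sketch of the row-by-row addressing formula. The function name and the cells_per_entry parameter are hypothetical; rows and columns are numbered from 1, as on the slides.

```python
def address_2d(x, columns, i, j, cells_per_entry=1):
    """Address of the entry in row i, column j of an array stored row by row."""
    entries_before = columns * (i - 1) + (j - 1)  # entries stored ahead of [i, j]
    return x + cells_per_entry * entries_before


print(address_2d(20, 8, 3, 4))                     # 39
print(address_2d(20, 8, 3, 4, cells_per_entry=2))  # 58
```

With a start address of 0 and 5 columns, the same function gives 13 for entry [3, 4], matching the x + 13 of the Table[3][4] example.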
Exercise:
Describe a method for storing 3-D homogeneous arrays. What addressing formula would be used to locate the entry in the i-th plane, the j-th row, and the k-th column?
• 3D: place each 2D plane consecutively in memory
• Addressing formula (with x as start address, r rows and c columns per plane):
– x + r × c × (i - 1) + c × (j - 1) + (k - 1)
• Compare 2D:
– x + c × (i - 1) + (j - 1)
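The 3-D formula can be sketched in the same way, assuming one memory cell per entry and planes, rows and columns numbered from 1. The function name and the array dimensions used below are hypothetical.

```python
def address_3d(x, rows, columns, i, j, k):
    """Address of the entry in plane i, row j, column k (planes stored one after another)."""
    return x + rows * columns * (i - 1) + columns * (j - 1) + (k - 1)


# Hypothetical array with 4 rows and 5 columns per plane, starting at address 0.
print(address_3d(0, 4, 5, 1, 1, 1))  # 0  (very first entry)
print(address_3d(0, 4, 5, 2, 1, 1))  # 20 (first entry of the second plane)
print(address_3d(0, 4, 5, 2, 3, 4))  # 33
```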
Lists
• To store an ordered list of names we could use a 2-D homogeneous array (in C: char Names[10][8])
• However:
– addition & removal of names requires expensive data movements!
Linked Lists
• Data movements can be avoided by using a ‘linked list’, including pointers to list entries
Deleting an Entry from a Linked List
• A list entry is removed by changing a single pointer:
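A minimal sketch of such a deletion, assuming each entry is an object with a name and a pointer to the next entry. The class, the function names and the three names in the list are hypothetical.

```python
class Node:
    """One list entry: a name plus a pointer to the next entry (None at the end)."""

    def __init__(self, name, next_node=None):
        self.name = name
        self.next = next_node


def delete_after(previous):
    """Remove the entry that follows `previous` by changing a single pointer."""
    removed = previous.next
    if removed is not None:
        previous.next = removed.next  # skip over the removed entry
    return removed


def names(head):
    """Collect the names by following the pointers from the head entry."""
    result = []
    while head is not None:
        result.append(head.name)
        head = head.next
    return result


head = Node("Ann", Node("Bob", Node("Cid")))
delete_after(head)   # removes "Bob"; no other entry is moved
print(names(head))   # ['Ann', 'Cid']
```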
Inserting an Entry into a Linked List
• A new entry is inserted by setting the pointer of
– (1) the new entry to the address of the entry that is to follow
– (2) the preceding entry to the address of the new entry
Exercise:
Which of the following routines correctly inserts 'NewEntry' immediately after the entry called 'PreviousEntry' in a linked list?
Routine 1
1. Copy pointer field of 'PreviousEntry' into the pointer field of 'NewEntry'.
2. Change pointer field of 'PreviousEntry' to the address of 'NewEntry'.
Routine 2
1. Change pointer field of 'PreviousEntry' to the address of 'NewEntry'.
2. Copy pointer field of 'PreviousEntry' into the pointer field of 'NewEntry'.
=> Routine 1 is correct: Routine 2 overwrites the pointer to the following entry before it has been copied.
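The difference between the two routines can be made visible with a short sketch. The Entry class, the function names and the entry names are hypothetical; only the order of the two pointer assignments follows the exercise.

```python
class Entry:
    """One list entry: a name plus a pointer field (None marks the end of the list)."""

    def __init__(self, name, next_entry=None):
        self.name = name
        self.next = next_entry


def insert_routine_1(previous_entry, new_entry):
    new_entry.next = previous_entry.next  # 1. copy the pointer field
    previous_entry.next = new_entry       # 2. redirect the previous entry


def insert_routine_2(previous_entry, new_entry):
    previous_entry.next = new_entry       # 1. old pointer is overwritten here
    new_entry.next = previous_entry.next  # 2. copies the address of new_entry itself


def walk(head, limit=10):
    """Follow the pointers from head, stopping after `limit` entries."""
    names = []
    while head is not None and len(names) < limit:
        names.append(head.name)
        head = head.next
    return names


def make_list():
    return Entry("Previous", Entry("Following"))


good = make_list()
insert_routine_1(good, Entry("New"))
print(walk(good))          # ['Previous', 'New', 'Following']

bad = make_list()
insert_routine_2(bad, Entry("New"))
print(walk(bad, limit=4))  # ['Previous', 'New', 'New', 'New']
```

After Routine 2 the new entry points to itself, so the entry that used to follow can no longer be reached.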
Stacks
• Disadvantage of contiguous array structures:
– insertion / removal requires costly data movements
• Still okay if insertion / removal restricted to end of array:
– stack (with push & pop operations)
A Stack in Memory
• Here:
– conceptual structure is nearly identical to the actual structure in memory
• If maximum stack-size unknown:
– pointers can be used ( => conceptual ≠ actual structure )
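A stack kept in a contiguous block of cells can be sketched as below, assuming a fixed maximum size and a stack pointer that marks the top. The class and its method names are hypothetical; the values 48, 97 and 17 are borrowed from the stack figure earlier in the deck.

```python
class Stack:
    """Stack stored in a fixed block of cells; `top` counts the entries in use."""

    def __init__(self, capacity):
        self.cells = [None] * capacity  # reserved block of memory
        self.top = 0                    # index of the next free cell

    def push(self, value):
        if self.top == len(self.cells):
            raise OverflowError("stack is full")
        self.cells[self.top] = value
        self.top += 1

    def pop(self):
        if self.top == 0:
            raise IndexError("stack is empty")
        self.top -= 1
        return self.cells[self.top]


stack = Stack(capacity=5)
for value in (48, 97, 17):
    stack.push(value)
print(stack.pop())  # 17 (last in, first out)
print(stack.pop())  # 97
```

Push and pop touch only the cell at the top, so no other entries have to be moved.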
Queues
• List where insertions take place at one end, and deletions at the other: ‘queue’
– To serve ‘objects’ in the order of their arrival (waiting-line)
Queue “crawling” (1)
• Problem with queue shown so far:
– queue moves downward in memory, destroying any other data in its path
Queue “crawling” (2)
• Can be overcome by:
– circular movement of insertions / deletions through pre-designated area of memory
[Figure: conceptual view of circular queue]
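One possible sketch of a circular queue: insertions and deletions move through a fixed block of cells and wrap around at the end, so the queue never crawls out of its area. The class, its fields and the capacity of 3 are hypothetical choices.

```python
class CircularQueue:
    """Queue that cycles through a fixed, pre-designated block of cells."""

    def __init__(self, capacity):
        self.cells = [None] * capacity
        self.head = 0   # position of the oldest entry
        self.count = 0  # number of entries currently in the queue

    def enqueue(self, value):
        if self.count == len(self.cells):
            raise OverflowError("queue is full")
        tail = (self.head + self.count) % len(self.cells)  # wraps around
        self.cells[tail] = value
        self.count += 1

    def dequeue(self):
        if self.count == 0:
            raise IndexError("queue is empty")
        value = self.cells[self.head]
        self.head = (self.head + 1) % len(self.cells)      # wraps around
        self.count -= 1
        return value


queue = CircularQueue(capacity=3)
for name in ("a", "b", "c"):
    queue.enqueue(name)
print(queue.dequeue())  # a
queue.enqueue("d")      # reuses cell 0, which "a" has just left
print(queue.cells)      # ['d', 'b', 'c']
print(queue.dequeue())  # b
```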
Exercise:
Describe a data structure suitable for representing a board configuration during a chess game
• Simplest:
– 8×8 homogeneous array, where each entry contains one of the values {empty, king, queen, bishop, knight, rook, pawn}
• Other:
– 2×16 homogeneous array: 1st dimension used to distinguish between black / white; 2nd to enumerate remaining pieces of one color, incl. board position. To save memory space this 2nd dimension could be implemented as a linked list.
• Many, many more possibilities
Chapter 8 - Data Structures: Conclusions
• Pointers:
– basic aid in definition of dynamic data structures
• Often used data structures:
– Arrays (multi-dimensional)
– Lists (contiguous & linked)
– Stacks
– Queues (crawling & circular)
– Trees (not discussed…)
– …
‘CHAPTER’ 9.5 (was Chapter 8)
File Structures
• Abstractions of the actual data organization on mass storage (hard disks, tapes, CDs…)
• Again: differences between conceptual and actual data organization
Files, Directories & the Operating System
• OS storage structure:
– conceptual hierarchy of directories and files
[Figure: directory tree containing files]
Files: Conceptual vs. Actual View
• View at OS-level is conceptual
– actual storage may differ significantly!
Text Files
• Sequential file consisting of a long string of encoded characters (e.g. ASCII-code)
– But: character-string still interpreted by word processor!
[Figure: a file in “Notepad” and the same file in “MS Word”]
From actual storage to conceptual view
[Figure: actual storage → assembly by Operating System → sequential view (sequential buffer) → interpretation by Application Program → conceptual view]
Quick File Access
• Disadvantage of sequential files:
– no quick access to particular file data
• Two techniques to overcome this problem:
– (1) Indexing or (2) Hashing
• Indexing:
Indexed File (first column: keys)
12N67  John Smith        23-Jul-71  17,000.00  New York         …
13C08  Andrew White      27-Jun-70  24,500.00  Boston           …
23G19  Mary Jackson      5-Mar-39   41,000.00  San Francisco    …
24X17  Eleanor Tracy     17-Sep-63   9,635.00  Fort Lauderdale  …
26X28  Michael Flanagan  1-Nov-44   18,800.00  Washington       …
32E76  Glenn White       29-Feb-68  17,000.00  Detroit          …
36Z05  Virginia Moore    27-Jun-70  32,000.00  San Francisco    …
:      :                 :          :          :                …
Index (loaded into main memory when opened)
12N67  location
13C08  location
23G19  location
24X17  location
26X28  location
32E76  location
36Z05  location
:      :
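A rough sketch of an index lookup, using a few of the records shown above shortened to key and name. A list position stands in for a location on mass storage, which is an assumption; the function names are hypothetical.

```python
# Records from the slide, shortened to key and name. A list position
# stands in for a location on mass storage (assumption).
indexed_file = [
    "12N67 John Smith",
    "13C08 Andrew White",
    "23G19 Mary Jackson",
    "24X17 Eleanor Tracy",
]


def build_index(records):
    """Map each key (the first field of a record) to the record's location."""
    return {record.split()[0]: location for location, record in enumerate(records)}


index = build_index(indexed_file)  # kept in main memory while the file is open


def find(key):
    """One lookup in the index, then one direct access to the file."""
    return indexed_file[index[key]]


print(index["23G19"])  # 2
print(find("23G19"))   # 23G19 Mary Jackson
```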
Hashing
• Disadvantage of indexing is… the index
– requires extra space + includes 1 extra indirection
• Solution: ‘hashing’
– finds position in file using a key value (as in indexing)…
– … simply by identifying location directly from the key
• How?
– define set of ‘buckets’ & ‘hash function’ that converts keys to bucket numbers …
[Figure: key value → hash function → bucket number (buckets 0, 1, 2, 3, …, N)]
Hash Function: Example
• If storage space divided into 40 buckets and hash function is division (bucket number = remainder after dividing the key by 40):
– key values 14, 54, & 94 all map onto the same bucket, number 14 (collision)
[Figure: key values mapped onto buckets]
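The division hash function can be sketched in one line, assuming integer keys and taking the remainder as the bucket number. The function name is hypothetical.

```python
def bucket_number(key, bucket_count):
    """Division hash: the remainder after dividing the key by the number of buckets."""
    return key % bucket_count


print([bucket_number(key, 40) for key in (14, 54, 94)])  # [14, 14, 14]
```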
Key field value can be anything
Handling Bucket Overflow
• When bucket-sizes are fixed:
– buckets can fill up and overflow
• One solution:
– designate special overflow storage area (not fixed in size!)
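One way to picture this solution is the sketch below: fixed-size buckets plus a single overflow area that can grow. The class, its method names, the bucket size of 2 and the record values are hypothetical, and searching the whole overflow area sequentially is a simplification.

```python
class HashFile:
    """Fixed-size buckets with one shared overflow area that is not fixed in size."""

    def __init__(self, bucket_count, bucket_size):
        self.buckets = [[] for _ in range(bucket_count)]
        self.bucket_size = bucket_size
        self.overflow = []  # grows as needed

    def insert(self, key, record):
        bucket = self.buckets[key % len(self.buckets)]
        if len(bucket) < self.bucket_size:
            bucket.append((key, record))
        else:
            self.overflow.append((key, record))  # bucket is full

    def find(self, key):
        """Search the key's bucket first, then the overflow area."""
        bucket = self.buckets[key % len(self.buckets)]
        for stored_key, record in bucket + self.overflow:
            if stored_key == key:
                return record
        return None


hash_file = HashFile(bucket_count=40, bucket_size=2)
for key, record in ((14, "first"), (54, "second"), (94, "third")):
    hash_file.insert(key, record)
print(len(hash_file.buckets[14]))  # 2 (bucket 14 is now full)
print(hash_file.overflow)          # [(94, 'third')]
print(hash_file.find(94))          # third
```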
Exercise:
If we use division as a hash function and have 23 buckets, in which bucket should we search to find the record whose key is interpreted as the integer value 101?
• Division: 101 / 23 = 4, remainder 9
• => bucket number: 9
[Figure: key value 101 → bucket number 9]
Exercise:
a) What advantage does an indexed file have over a hash file?
b) What advantage does a hash file have over an indexed file?
• a) When the key is unique: the index points directly to the required data, while hashing often requires an additional (sequential) bucket search (incl. bucket overflow).
• b) No additional index file storage is required.
Chapter 9.5 - File Structures: Conclusions
• File Structures:
– abstractions of actual data organization on mass storage
• Changes of ‘view’:
– actual storage -> sequential view by OS -> conceptual view presented to user
• Quick access to particular file data by
– (1) indexing
– (2) hashing (requires no index, but requires bucket search!)
CHAPTER 9
Database Structures
• (Large) integrated collections of data that can be accessed quickly
• Combination of data structures and file structures
Historical Perspective
• Originally: departments of large organizations stored all data separately in flat files
• Problems: redundancy & inconsistencies
Integrated Database System
• Better approach: integrate all data in a single system, to be accessed by all departments
Disadvantages of Data Integration
• Disadvantages:
– Control of access to sensitive data?!
  • For example: the personnel department has no business with personal data stored by the company doctor!
– Misinterpretation of integrated data
  • A supermarket database says that a customer buys a lot of medicine. What does this mean? What if this customer applies for a job at the supermarket chain?
– What about the right to hold/collect/interpret data?
  • Does a credit card company have the right to use/sell data about people's purchasing behavior?
Conceptual Database Layers
• Compare:
[Figure: the Operating System turns the actual data storage into data seen in terms of a sequential view]
The Relational Model
• Relational Model
– shows data as being stored in rectangular tables, called relations
– a row in a relation is called a ‘tuple’
– a column in a relation is called an ‘attribute’
Issues of Relational Design
• So, relations make up a relational database…
• … but this is not so straightforward:
• Problem: more than one concept combined in single relation
Redesign by extraction of 3 concepts
• Any information obtained by combining information from multiple relations
Example:
• Finding all departments in which employee 23Y34 has worked:
Relational Operations
• Extracting information from a relational database by way of relational operations
– Most important ones:
  • (1) extract tuples (rows) : SELECT
  • (2) extract attributes (columns) : PROJECT
  • (3) combine relations : JOIN
• Such operations on relations produce other relations
– so: they can be used in combination, to create complex database requests (or ‘queries’)
The SELECT operation
The PROJECT operation
The JOIN operation
Exercise:
What is RESULT after each of the following operations?
X relation
U  V  W
A  Z  5
B  D  3
C  Q  5
Y relation
R  S
3  J
4  K
• RESULT := PROJECT W from X
• RESULT := SELECT from X where W=5
• RESULT := PROJECT S from Y
• RESULT := JOIN X and Y where X.W > Y.R
RESULT (of the JOIN)
X.U  X.V  X.W  Y.R  Y.S
A    Z    5    3    J
A    Z    5    4    K
C    Q    5    3    J
C    Q    5    4    K
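The operations in this exercise can be tried out with a small sketch in which a relation is a list of dictionaries, one per tuple. The function signatures are hypothetical, and dropping duplicate tuples in PROJECT is an assumption (relations treated as sets).

```python
X = [{"U": "A", "V": "Z", "W": 5}, {"U": "B", "V": "D", "W": 3}, {"U": "C", "V": "Q", "W": 5}]
Y = [{"R": 3, "S": "J"}, {"R": 4, "S": "K"}]


def select(relation, condition):
    """SELECT: keep the tuples (rows) that satisfy the condition."""
    return [t for t in relation if condition(t)]


def project(relation, attributes):
    """PROJECT: keep only the named attributes (columns).
    Duplicate tuples are dropped here, which is an assumption."""
    result = []
    for t in relation:
        row = {a: t[a] for a in attributes}
        if row not in result:
            result.append(row)
    return result


def join(left, left_name, right, right_name, condition):
    """JOIN: combine every pair of tuples that satisfies the condition."""
    result = []
    for l in left:
        for r in right:
            combined = {f"{left_name}.{a}": v for a, v in l.items()}
            combined.update({f"{right_name}.{a}": v for a, v in r.items()})
            if condition(combined):
                result.append(combined)
    return result


print(project(X, ["W"]))                    # [{'W': 5}, {'W': 3}]
print(select(X, lambda t: t["W"] == 5))     # the tuples with U = A and U = C
print(project(Y, ["S"]))                    # [{'S': 'J'}, {'S': 'K'}]
for t in join(X, "X", Y, "Y", lambda t: t["X.W"] > t["Y.R"]):
    print(list(t.values()))                 # four rows, as in the RESULT table
```

Because every operation returns a relation in the same form, the output of one call can be passed directly to the next.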
Exercise:
PART relation
PartName  Weight
Bolt 2X   1
Bolt 2Z   1.5
Nut V5    0.5
MANUFACTURER relation
CompanyName  PartName  Cost
Company X    Bolt 2Z   .03
Company X    Nut V5    .01
Company Y    Bolt 2X   .02
Company Y    Nut V5    .01
Company Y    Bolt 2Z   .04
Company Z    Nut V5    .01
• a) Which companies make Bolt 2Z?
– NEW := SELECT from MANUFACTURER where PartName = Bolt 2Z
– RESULT := PROJECT CompanyName from NEW
Exercise (same PART and MANUFACTURER relations):
• b) Obtain a list of the parts (+ cost) made by Company X.
– NEW := SELECT from MANUFACTURER where CompanyName = Company X
– RESULT := PROJECT PartName, Cost from NEW
Exercise (same PART and MANUFACTURER relations):
• c) Which companies make a part with weight 1?
– NEW1 := JOIN MANUFACTURER and PART where MANUFACTURER.PartName = PART.PartName
– NEW2 := SELECT from NEW1 where PART.Weight = 1
– RESULT := PROJECT MANUFACTURER.CompanyName from NEW2
Exercise (same PART and MANUFACTURER relations):
• c) Which companies make a part with weight 1? (alternative: SELECT before JOIN)
– NEW1 := SELECT from PART where Weight = 1
– NEW2 := JOIN MANUFACTURER and NEW1 where MANUFACTURER.PartName = NEW1.PartName
– RESULT := PROJECT MANUFACTURER.CompanyName from NEW2
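Both orderings for question c) can be compared with a short sketch. The relations are written as lists of tuples; the function names and the use of list comprehensions in place of a real DBMS are assumptions made for illustration.

```python
PART = [("Bolt 2X", 1), ("Bolt 2Z", 1.5), ("Nut V5", 0.5)]
MANUFACTURER = [
    ("Company X", "Bolt 2Z", 0.03), ("Company X", "Nut V5", 0.01),
    ("Company Y", "Bolt 2X", 0.02), ("Company Y", "Nut V5", 0.01),
    ("Company Y", "Bolt 2Z", 0.04), ("Company Z", "Nut V5", 0.01),
]


def join_then_select():
    # NEW1 := JOIN MANUFACTURER and PART on equal PartName
    new1 = [(m, p) for m in MANUFACTURER for p in PART if m[1] == p[0]]
    # NEW2 := SELECT from NEW1 where PART.Weight = 1
    new2 = [(m, p) for (m, p) in new1 if p[1] == 1]
    # RESULT := PROJECT MANUFACTURER.CompanyName from NEW2
    return len(new1), sorted({m[0] for (m, p) in new2})


def select_then_join():
    # NEW1 := SELECT from PART where Weight = 1
    new1 = [p for p in PART if p[1] == 1]
    # NEW2 := JOIN MANUFACTURER and NEW1 on equal PartName
    new2 = [(m, p) for m in MANUFACTURER for p in new1 if m[1] == p[0]]
    # RESULT := PROJECT MANUFACTURER.CompanyName from NEW2
    return len(new2), sorted({m[0] for (m, p) in new2})


print(join_then_select())  # (6, ['Company Y'])
print(select_then_join())  # (1, ['Company Y'])
```

The first number in each result is the size of the joined relation: both orderings give the same answer, but selecting first keeps the intermediate relation smaller.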
Chapter 9 - Database Structures: Conclusions
• Database Structures:
– (large) integrated collections of data that can be accessed quickly
• Database Management System
– provides high-level view of actual data storage (database model)
• Relational Model most often used
– relational operations: SELECT, PROJECT, JOIN, …
– high-level language for database access: SQL
Overzicht Informatica – Exam (1)
• Most important sections (edition 8) & keywords:
– Ch. 0 - 1, 3, 4: abstraction / algorithm
– Ch. 1 - 1, 2, 4, 5, 6, 7: bits / data storage & representation (ASCII, etc.) / Boolean operations / flip-flops / memory types and characteristics / number systems (binary, hexadecimal, etc.) / overflow & truncation errors
– Ch. 2 - 1, 2, 3, 4, 6: CPU architecture / machine language & instructions / program execution / machine cycle / alternative architectures
– Ch. 3 - 1, 2, 3, 4: operating systems / batch processing / time-sharing / multitasking / OS components / process vs. program / competition
– Ch. 4 - 1, 2, 3, 4: network topologies / bridges / routers / client-server / the internet / world wide web / network protocols / the grid
Overzicht Informatica – Exam (2)
• Most important sections (edition 8) & keywords:
– Ch. 5 - 1, 2, 4, 5, 6: algorithm (formal) / primitives / pseudo-code / syntax / semantics / iteration / loop control / recursion / efficiency
– Ch. 6 - 1, 2, 3, 4, 5: generations: 1st, 2nd, 3rd / assembly language / compilers / machine independence / paradigms / imperative / object-oriented / programming concepts / procedures / parameters / call by value/reference / translation/compilation process
– Ch. 7 - 1, 2, 3: software life cycle / development phase / modularity / coupling / cohesion / documentation / complexity measure for software
– Ch. 8 - 1, 2: data structures / abstraction / static vs. dynamic / pointers / (arrays, lists, stacks, queues, etc.)
Overzicht Informatica – Exam (3)
• Most important sections (edition 8) & keywords:
– Ch. 9 - 1, 2, 5: databases vs. ‘flat’ files / relations / tuples / attributes / relational operations: SELECT, PROJECT, JOIN / files / sequential / text / indexed / hashing
– Ch. 10 – 1, 3, 4: intelligent agents / Turing-test / production systems / search trees / heuristics / artificial neural networks / training vs test
– Ch. 11 – 1, 2, 4: computability / Turing Machines / the ‘halting’ problem
Good luck!