2 December 2005
Introduction to Databases Introduction and Conceptual Modelling
Prof. Beat Signer
Department of Computer Science
Vrije Universiteit Brussel
http://www.beatsigner.com
Beat Signer - Department of Computer Science - [email protected]
2 February 14, 2014
Course Organisation
Prof. Beat Signer
Vrije Universiteit Brussel
10 G 731d
+32 2 629 12 39
Reinout Roels
Vrije Universiteit Brussel
10 F 730
+32 2 629 37 53
Beat Signer - Department of Computer Science - [email protected]
3 February 14, 2014
Course Information
Course book Database System Concepts
(Sixth Edition), Abraham Silberschatz, Henry Korth and S. Sudarshan, McGraw-Hill, 2010
additional information from the book is available online
- http://highered.mcgraw-hill.com/sites/0073523321
Course information (lecture slides, exercises, …)
available on PointCarré http://pointcarre.vub.ac.be/index.php?application=weblcms&go=course_viewer&course=2321
Beat Signer - Department of Computer Science - [email protected]
4 February 14, 2014
Exercises
Course content is going to be applied in the
exercise sessions
Weekly exercise sessions starting on February 20
- group 1: computer room E.1.4, Thursday 11:00-13:00
- group 2: computer room E.1.7, Thursday 14:00-16:00
assistants: Reinout Roels
Additional content may be covered in exercise sessions exam covers content of lectures and exercises
Beat Signer - Department of Computer Science - [email protected]
5 February 14, 2014
Exam
Written closed book exam in Dutch / English covers content of lectures, specific book chapters
and exercises
Beat Signer - Department of Computer Science - [email protected]
6 February 14, 2014
Course Overview
1. Introduction overview
conceptual modelling and ER model
2. Extended ER Model and other Modelling Languages
3. Relational Model and Relational Algebra
4. Relational Database Design reduction
functional dependencies and normalisation
5. Structured Query Language (SQL)
6. Advanced SQL
Beat Signer - Department of Computer Science - [email protected]
7 February 14, 2014
Course Overview …
7. DBMS Architectures and Features DBMS components
client-server architecture
parallelisation and distribution
8. Storage Management
9. Access Methods indexing and hashing
10.Query Processing and Optimisation
11.Transaction Management transactions
concurrency and recovery
Beat Signer - Department of Computer Science - [email protected]
8 February 14, 2014
Course Overview …
12.Object and Object-Relational Databases
13.Future Trends and Review
Beat Signer - Department of Computer Science - [email protected]
9 February 14, 2014
Databases in Action
Online shops product information, customer
data, order data, ...
e.g. Amazon
- hundreds of millions of customers
- more than 50 terrabytes of data
Human resources course registration, student
grades, employee records, salary information, tax information, ...
e.g. PointCarré
- course registration
Beat Signer - Department of Computer Science - [email protected]
10 February 14, 2014
Databases in Action ...
Banking and trading customer data, account
information, transactions, ...
e.g. London Stock Exchange
- almost 1 million trades per day
Reservation systems book flights from multiple
airlines, hotel rooms etc.
e.g. Amadeus systems
- Global Distribution System (GDS)
founded by Lufthansa, Air France
and other partners
Beat Signer - Department of Computer Science - [email protected]
11 February 14, 2014
Databases in Action ...
Digital archives persistently store various
types of digital media
e.g. Internet Archive project
- access to more than 150 billion
archived web pages
[archive.org]
Libraries index for traditional paper-
based libraries as well as digital libraries
e.g. Open Library project
- over 23 million indexed books
[openlibrary.org]
Beat Signer - Department of Computer Science - [email protected]
12 February 14, 2014
Databases in Action ...
Geographic Information
Systems (GIS) store raster (bitmap) or
vector data representing real world objects
geospatial query language
Scientific databases sensor data, classifications
(e.g. human genome) as well as data from simulations
e.g. LHC Computing Grid
- LHC experiments at CERN
- 15 petabytes of data per year
Beat Signer - Department of Computer Science - [email protected]
13 February 14, 2014
Databases in Action ...
Many everyday devices
contain databases TVs, washing machines,
mobile phones, ...
e.g. Android phones with SQLite database
Embedded databases in
cars, airplanes etc. manage configurations and
store sensor data
e.g. db4o object database used in BMW's Car IT system
Beat Signer - Department of Computer Science - [email protected]
14 February 14, 2014
Databases in Action ...
Databases in the WISE
research lab (VUB) database-driven cross-media
publishing
database extensions for hypermedia services
personal information management
data visualisation
- e.g. ArtVis
human-information interaction
paper-digital interfaces
Beat Signer - Department of Computer Science - [email protected]
15 February 14, 2014
Databases in Action ...
Databases touch all aspects of our daily life!
Numerous large database software companies e.g. Oracle was the 3rd largest software company in 2011
Beat Signer - Department of Computer Science - [email protected]
16 February 14, 2014
Basic Terminology
Database collection of logically related data
database schema describes the database design
- format and relationships between stored data (often rather static)
collection of data stored in a database at a given time is called an instance of the database
Database Management Systems (DBMS) tools (programs) to efficiently store, maintain and retrieve
information from a database
- support of create, read, update and delete data (CRUD operations)
- data definition language (DDL) to define the database schema
- data manipulation language (DML) to query and update the data
• often declarative fourth generation languages (4GLs) such as SQL
- data access control, transactions and concurrency control
Beat Signer - Department of Computer Science - [email protected]
17 February 14, 2014
File-Processing System
Why should we not just use multiple files in a
file system to store our data?
There are various disadvantages of such an approach data redundancy and inconsistency
- different file formats over time
- duplication of information in different files
limited data access
- we have to write new programs to carry out new tasks
- data cannot be retrieved in a convenient and efficient manner
data isolation
- data may be distributed over different files without a common format
integrity
- integrity constraints (e.g. balance > 0) are hidden in the program code and not
explicitly stated and checked
Beat Signer - Department of Computer Science - [email protected]
18 February 14, 2014
File-Processing System …
missing atomic operations
- system failures and crashes may leave the data in an inconsistent state
(e.g. only parts of a operation have been carried out)
- example: transfer of money from one account to another account
concurrent update anomalies
- concurrent updates may leave the data in an inconsistent state
- example: two programs simultaneously removing money from a single account
limited security control
- difficult to give a user only access to parts of a file
DBMSs offer solutions to all these problems concepts and algorithms to solve the problems with
file-processing systems
Beat Signer - Department of Computer Science - [email protected]
19 February 14, 2014
General Database Architecture
Access
Methods
System
Buffers
Authorisation
Control
Integrity
Checker
Command
Processor
Program
Object Code
DDL
Compiler
File
Manager
Buffer
Manager
Recovery
Manager
Scheduler
Query
Optimiser
Transaction
Manager
Query
Compiler
Queries
Catalogue
Manager
DML
Preprocessor
Database
Schema
Application
Programs
Data, Indices and
System Catalogue
Database
Manager
Data
Manager
DBMS
Programmers Users DB Admins
Based on 'Components of a DBMS', Database Systems,
T. Connolly and C. Begg, Addison-Wesley 2010
Beat Signer - Department of Computer Science - [email protected]
20 February 14, 2014
Data Abstraction
Physical level physical schema describes how the data
is stored (complex low-level data structures)
Logical level logical schema describes what data is stored
- simple structures: attribute names, data types and relationships between data
implementation of simple structures might be based on complex physical-level structures but the user of the logical level should not be aware of that physical data independence
View level subschemas provide only access to parts of the database
- reduce complexity and introduce security
multiple views might be defined for a single database
Viewn
Physical Level
Logical Level
View1
Beat Signer - Department of Computer Science - [email protected]
21 February 14, 2014
Duality of the Database Schema
Database Schema describes
describes
Application
Concepts
Database
Concepts
Application
World
Computer
World
Beat Signer - Department of Computer Science - [email protected]
22 February 14, 2014
Data Models and History of DBMSs
Punched cards used since 1725 Hollerith cards later used by IBM for data processing
1950s: Data processing with magnetic tapes as storage only sequential access to data
reading from one or multiple tapes and writing to a new tape
sometimes combined with input from punched cards etc.
Magnetic Tape Punched Card
Beat Signer - Department of Computer Science - [email protected]
23 February 14, 2014
Data Models and History of DBMSs ...
1960s: Widespread use of hard disks direct access (random access) to data
opened possibilities for new Navigational DBMSs
- IBM's IMS (1968), hierarchical database
- Integrated Data Store (IDS), network database
IBM 350 Disk
Storage Unit
Hard Disk
Beat Signer - Department of Computer Science - [email protected]
24 February 14, 2014
Data Models and History of DBMSs ...
A data model is a collection of conceptual tools describes data, data relationships, data semantics and constraints
Hierarchical model data organised in a tree structure
used in early mainframe DBMS
- e.g. IBM's Information Management System (IMS)
XML documents also described by a hierarchical model
Network model generalised graph structure
two main constructs
- records contain fields and sets define relationships between records
navigational operations
- follow the relationship from one record to another record
Beat Signer - Department of Computer Science - [email protected]
25 February 14, 2014
Data Models and History of DBMSs ...
Relational model collection of tables (relations) containing records
described in a paper by Edgar F. Codd in 1970
1970s: Relational DBMS IBM's System R (1974) "based" on Codd's paper; SQL added later
Entity-Relationship (ER) model representation of basic objects (entities) and their relationships
widely used in conceptual database design
Object-based data model introduces object identity, encapsulation and methods
1980s: Object Databases initial work on object databases
Beat Signer - Department of Computer Science - [email protected]
26 February 14, 2014
Data Models and History of DBMSs ...
1990s: Web Interfaces to Databases databases deployed much more extensively
Semistructured data model no clear separation between data and the schema
("self-describing" data)
individual data items of the same type may have different attributes
XML is widely used to represent semistructured data
2000s: XML and XQuery relational databases often still form the core
Later 2000s: Extremely Large-scale distributed DBMS BigTable or Hadoop and Hbase
"NoSQL databases"
Beat Signer - Department of Computer Science - [email protected]
27 February 14, 2014
Database Design
Conceptual Design define an abstract conceptual application model containing the
main domain concepts
- interact with domain experts to get the requirements
describe the entities (with attributes) and their relationships
specify the functional requirements (operations)
- ensure that operations can be realised based on the conceptual model
Database implementation based on conceptual model logical design phase
- mapping of the conceptual schema to the implementation data model
• e.g. reduction from the ER model to the relational data model
- define the logical database schema
physical design phase
- define the physical database layout based on the logical database schema
Beat Signer - Department of Computer Science - [email protected]
28 February 14, 2014
Database Design ...
Two major database design pitfalls have to be avoided we should avoid any redundancy where information is repeated at
multiple places since this might lead to inconsistent data
- e.g. a lecture management system where a student's name is stored for each
lecture they are attending instead of storing it in a separate student entity
a database design may be incomplete and not enable the representation of certain aspects of the application domain
- e.g. in a shopping application where the customer information is stored as part
of an order we cannot enter new customer data without having an order
There is often more than one "good design" e.g. when do we model something as a relationship and when as
a separate entity?
modelling is a challenging task that requires a combination of engineering skills and "good taste"
Beat Signer - Department of Computer Science - [email protected]
29 February 14, 2014
Entity-Relationship (ER) Model
Conceptual model based on a set
of entities and relationships
An entity is a "thing" or "object" that can
be distinguished from other objects
A relationship describes an association
between multiple entities
"Introduced" and formalised by Peter Chen P. Chen, The Entity-Relationship Model - Toward a Unified View
of Data, ACM Transactions on Database Systems 1 (1), March 1976
Peter Chen
Beat Signer - Department of Computer Science - [email protected]
30 February 14, 2014
Entities
An entity represents a distinguishable object e.g. specific person, car or company
An entity is described by a number of attributes has to be uniquely identifiable by its attributes (ovals)
An entity set is a set of entities with the same type the extension of the entity set (rectangle) are its entities
an entity may belong to multiple entity sets
Beat Signer 1234
1576 Lode Hoste
3212 William Van Woensel
Employees
Employees
id name
note that we will use a slightly
different notation than in the book!
Beat Signer - Department of Computer Science - [email protected]
31 February 14, 2014
Attributes
The set of permitted attribute values is called the domain
or value set entity instances can be described by a set of (name,value) pairs
e.g. {(id, 1576),(name, Lode Hoste)}
The ER model supports the following attribute types simple attributes
composite attributes
- hierarchy of sub-attributes
single-valued attributes
multivalued attributes
- optional lower and upper bounds
derived attributes
- computed via relationships or other attribute values
Beat Signer - Department of Computer Science - [email protected]
32 February 14, 2014
Attributes ...
A multivalued attribute is represented by a double ellipse
Derived attributes are indicated by dashed ellipses
address is an example of composite attribute
LocatedAt OfficesEmployees
id name
birthday
phone age #offices
address
street city
0..1 0..*
Beat Signer - Department of Computer Science - [email protected]
33 February 14, 2014
Keys
An entity's attribute values must uniquely identify the
entity
A subset of attributes that uniquely identify an entity is
called superkey
A minimal superkey without any unnecessary attributes
is called a candidate key
The primary key is one of the candidate keys chosen by
the database designer for unique entity identification in the ER model, the primary key is highlighted by the set of
underlined attributes
the value of a primary key should change very rarely
Beat Signer - Department of Computer Science - [email protected]
34 February 14, 2014
Relationships
A relationship is an association between multiple entities
A relationship set (diamond) is a set of relationships of the
same type e.g. LocatedAt
Beat Signer 1234
1576 Lode Hoste
3212 William Van Woensel
Employees
10F718
10F705
10F721
10F703
Offices
LocatedAt OfficesEmployees
id name name address
Beat Signer - Department of Computer Science - [email protected]
35 February 14, 2014
Relationships ...
We can have binary or n-ary relationship sets {(e1, e2,..., en) | e1 E1, e2E2,..., enEn}
Each relationship instance in an ER schema represents
an association between the involved entities
The role defines an entity's function in a relationship has to be explicitly defined if the same entity set participates more
than once in a relationship set (recursive relationship)
A relationship may contain descriptive attributes
A relationship instance must be uniquely identifiable by
its entities (without any descriptive attributes) i.e. a relationship set cannot contain two relationship entities that
only differ in their descriptive attributes
Beat Signer - Department of Computer Science - [email protected]
36 February 14, 2014
Relationship with Roles and Attributes
LocatedAt OfficesEmployees
id name name address
WorksFor
boss employee
duration
Beat Signer - Department of Computer Science - [email protected]
37 February 14, 2014
Example of a 3-ary Relationship
WorksFor CompaniesEmployees
id name name address
Durations
from to
Beat Signer - Department of Computer Science - [email protected]
38 February 14, 2014
Cardinality Constraints
A relationship can be one-to-one, one-to-many,
many-to-one or many-to-many
An arrow indicates a to-one relationship cardinality constraints may also be expressed by numbers
- e.g. 0..*, 1..*, 0..1, 1..1, 2..5
a double line or 1..* indicates a total participation constraint
MarriedTo WomenMen
Teaches CoursesTeachers
StarsIn FilmsFilmstars
Beat Signer - Department of Computer Science - [email protected]
39 February 14, 2014
Weak Entity Sets
An entity set with a primary key is called a strong
entity set
A weak entity set (double rectangle) does not have
enough attributes to form a primary key
Offers SeatsCinemas
id name number colour
Beat Signer - Department of Computer Science - [email protected]
40 February 14, 2014
Weak Entity Sets ...
A weak entity set must be associated with an
identifying entity set via an identifying relationship
(double diamond) a weak entity set is existence dependent on an identifying
entity set
- can also participate in other non-identifying relationships
a weak entity set must relate to the identifying entity set via a total participation constraint and each weak entity instance can only be related to one identifying entity instance
a discriminator or partial key (underlined dashed attributes) uniquely identifies a weak entity relative to a strong entity
In some cases a weak entity may also be expressed as a
multivalued composite attribute
Beat Signer - Department of Computer Science - [email protected]
41 February 14, 2014
ER Design Issues
When do we model something as an attribute and when
as an entity set? there is no general answer and the choice depends on the
specific application domain to be modelled
When do we model something as a relationship set and
when as an entity set? a relationship set often corresponds to an action between entities
In general we should try to avoid higher level n-ary
relationship sets (3-ary relationship sets should be the
maximum and even these should be used carefully)
There should be no redundant attributes
Beat Signer - Department of Computer Science - [email protected]
42 February 14, 2014
Homework
Study the following two chapters of the
Database System Concepts book chapter 1
- Introduction
chapter 7
- sections 7.1-7.5 and 7.7
- Database Design and the ER Model
Beat Signer - Department of Computer Science - [email protected]
43 February 14, 2014
Exercise 1
Conceptual modelling
Entity-Relationship (ER) model
Beat Signer - Department of Computer Science - [email protected]
44 February 14, 2014
References
A. Silberschatz, H. Korth and S. Sudarshan,
Database System Concepts (Sixth Edition),
McGraw-Hill, 2010
P. Chen, The Entity-Relationship Model - Toward a
Unified View of Data, ACM Transactions on Database
Systems 1 (1), March 1976
WISE Lab http://wise.vub.ac.be
ArtVis Project http://wise.vub.ac.be/content/artvis-exploring-information-through-advanced-visualisation-techniques
2 December 2005
Next Lecture Extended ER Model and
other Modelling Languages