Jan 04, 2016
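• Basic concepts
– The concept of a database
– The concept and functions of a database management system
– The concept and components of a database system
– The development of data management technology
• Data models
– Conceptual models and their E-R diagram representation
– The three elements of a data model
– Data models commonly used in the database field
– The relational model: data structure, main operations, integrity constraints
• Database system structure
– The three-level schema structure (architecture) of a database
– The concept of data independence and how it is achieved
• Relational databases
– Relational data structure and its formal definition
  • Domain, Cartesian product, relation, primary key, foreign key, relation schema
– Relational algebra operators
– Describing user queries with relational algebra
• SQL
– Features of SQL
– SQL commands and their use
• Relational data theory
– Basic concepts such as functional dependency
– Definitions of 1NF, 2NF, 3NF, and BCNF, and how to test for them
– Problems that a poorly designed relation schema can cause
• Database design
– Designing an E-R diagram and converting it into relation schemas
• Transactions
– The concept and properties of a transaction
• Database recovery
– Functions of the recovery mechanism
– Types of failure and their possible effects on the database
– Recovery techniques (backup + log)
• Concurrency control
– Functions of the concurrency control mechanism
– Definition of serializability of concurrent schedules
– The concept of locking
• The concept of database security and security control techniques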
Basic Notions
• Database (DB)
– In essence, it is nothing more than a collection of information that exists over a long period of time.
– In common parlance, it refers to a collection of data managed by a database management system (DBMS), or just database system (DBS).
DBMS
• Database Management System (DBMS)
– A collection of programs that enables you to store, modify, and extract information from a database.
– There are many different DBMSs, such as Oracle, Sybase, SQL Server 2000, MySQL, Access, …
Basic functions of DBMS
• Data definition
• Data manipulation
• Operation management of DB
• Creation and maintenance of DB
Database System (DBS)
• Includes: the DB, the DBMS, development tools, DB applications, the DB administrator (DBA), and users
[Figure: components of a database system: DB, OS, DBMS, development tools, DB applications, users, DBA]
Database Administrator
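• DBA
– The person responsible for the management and maintenance of the DB.
• Specific tasks
– Decide the information content and structure of the database
– Define the storage structure and access methods
– Define the security and integrity constraints on the data
– Improve and reorganize the database system
– Monitor the use and operation of the database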
History of Data Management
• File systems
– Problems: limited support for defining data schemas, no direct support for a DML (Data Manipulation Language), no support for efficient concurrent or secure access, etc.
• Early DBMS
– Evolved from file systems.
– Based on the hierarchical model and the network model.
– Problem: no support for high-level query languages.
• Relational DBMS
– Data is organized as tables called relations.
– Users need not be concerned with the storage structure, and queries are expressed in a very high-level language (SQL).
– Used in most DBMSs today.
Architecture of DBS
Three-Schema Architecture of DBS
• External schema: also called user schema
– Defines one view of the data as seen by a specific set of applications or end users.
– There may be many external schemas in a DB.
• Schema: also called conceptual schema or logical schema
– Defines the data from the perspective of the system designer.
– Independent of end users and of the data storage mechanism.
– There is only one conceptual schema in a DB.
• Internal schema: also called storage schema
– Defines how data is organized, stored, and manipulated inside the system.
– Totally dependent on the particular implementation.
– There is only one internal schema in a DB.
[Figure: three-schema architecture. Applications A to E access External Schemas 1 to 3, which map to the single Schema, which maps to the Internal Schema and the stored DB]
Independence of Data and Program
• Logical independence
– Achieved via the external schema/schema mapping.
– One schema corresponds to many external schemas; every external schema has its own external schema/schema mapping.
– When the schema changes, the DBA changes the external schema/schema mappings, so application programs need not be changed.
• Physical independence
– Achieved via the schema/internal schema mapping.
– The schema/internal schema mapping is unique.
– When the internal schema changes, the DBA changes the schema/internal schema mapping, so application programs need not be changed.
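A minimal sketch of both kinds of independence, using Python's standard sqlite3 module. It treats a view as the external schema and the view definition as the external schema/schema mapping; the table, view, and index names (Student, StudentBasic, StudentNames, idx_name) are hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Student (sno TEXT PRIMARY KEY, name TEXT, age INTEGER);
    INSERT INTO Student VALUES ('S1', 'Alice', 20);
    -- External schema: the application only ever sees this view.
    CREATE VIEW StudentNames AS SELECT sno, name FROM Student;
""")

# The application program is written against the external schema only.
APP_QUERY = "SELECT name FROM StudentNames WHERE sno = 'S1'"
before = conn.execute(APP_QUERY).fetchall()

# Logical independence: the schema changes (Student is split in two),
# and only the mapping (the view definition) is rewritten.
conn.executescript("""
    CREATE TABLE StudentBasic  (sno TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE StudentDetail (sno TEXT PRIMARY KEY, age INTEGER);
    INSERT INTO StudentBasic  SELECT sno, name FROM Student;
    INSERT INTO StudentDetail SELECT sno, age  FROM Student;
    DROP VIEW StudentNames;
    DROP TABLE Student;
    CREATE VIEW StudentNames AS SELECT sno, name FROM StudentBasic;
""")
after_schema_change = conn.execute(APP_QUERY).fetchall()

# Physical independence: a storage-level change (a new index)
# leaves both the schema and the application query untouched.
conn.execute("CREATE INDEX idx_name ON StudentBasic (name)")
after_index = conn.execute(APP_QUERY).fetchall()

print(before, after_schema_change, after_index)
```

The application query string never changes; only the definitions maintained by the DBA do.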
Abstraction and Modeling
Two steps of abstraction and modeling:
1. Objects in the real world are abstracted into a concept model.
2. The concept model is converted to a data model that is supported by some DBMS.
[Figure: real world (objective objects) → information world (concept model) → machine world (data model)]
Concept model
• Also called information model.
• Models data from the viewpoint of users.
• Mainly used for database design.
• Usually represented by Entity-Relationship diagrams.
Data model
• Models data from the viewpoint of the computer.
• Mainly used for DBMS implementation.
• Traditional data models
– Hierarchical model
– Network model
– Relational model
• Main elements of a data model
– Data structure
– Data operations
– Integrity constraints
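Elements of a data model
• Data structure
– The set of data under study and the relationships among the data
• Data operations
– The operations allowed on the data and the rules governing them, such as retrieval, insertion, deletion, and modification
• Data constraints
– A set of rules that the data and their relationships must obey; they restrict the database states that conform to the data model, and the changes between states, to ensure that the data is correct, valid, and compatible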
E/R Diagrams
• Entity set: drawn as a rectangle
• Attribute: drawn as an ellipse
• Key: shown by underlining
• Relationship: drawn as a diamond
– Between two entity sets: 1:1, 1:n, and m:n
– Among more than two entity sets
– Within a single entity set
Relational Model
• Relation: a two-dimensional table.
• Attributes: names for the columns of the relation.
• Schema: the name of a relation and the set of attributes for that relation.
• Tuples: the rows of a relation.
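As an illustration of these four terms, the sketch below builds one relation with Python's standard sqlite3 module. The relation Student(sno, name, age) and its rows are hypothetical examples, not taken from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Relation schema: the relation name plus its set of attributes.
conn.execute("CREATE TABLE Student (sno TEXT, name TEXT, age INTEGER)")

# Tuples: the rows of the relation.
conn.executemany(
    "INSERT INTO Student VALUES (?, ?, ?)",
    [("S1", "Alice", 20), ("S2", "Bob", 21)],
)

cur = conn.execute("SELECT * FROM Student ORDER BY sno")
attributes = [col[0] for col in cur.description]  # column names
tuples = cur.fetchall()                           # one Python tuple per row

print("Student", attributes)
for t in tuples:
    print(t)
```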
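• Entity integrity rule: if attribute A is a prime attribute of a base relation R, then A cannot take a null value
• Foreign key: let F be an attribute or set of attributes of the referencing relation R that is not a key of R; if F corresponds to the primary key of the referenced relation S, then F is called a foreign key of R
• Referential integrity rule: the value of every tuple of the referencing relation R on the foreign key F must either be null or equal the primary key value of some tuple in S
• User-defined integrity rule: a constraint defined by the user that the data of a specific application must satisfy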
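The three kinds of integrity rule can be tried out with Python's standard sqlite3 module. This is a sketch under assumptions: the Dept and Emp tables and the age rule are hypothetical, and two SQLite specifics matter here. SQLite only checks foreign keys after `PRAGMA foreign_keys = ON`, and it needs an explicit NOT NULL on a TEXT primary key to reject nulls.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite ignores foreign keys otherwise

conn.execute("CREATE TABLE Dept (dno TEXT NOT NULL PRIMARY KEY, dname TEXT)")
conn.execute("""
    CREATE TABLE Emp (
        eno TEXT NOT NULL PRIMARY KEY,      -- entity integrity
        age INTEGER CHECK (age >= 18),      -- user-defined integrity
        dno TEXT REFERENCES Dept(dno)       -- referential integrity
    )
""")
conn.execute("INSERT INTO Dept VALUES ('D1', 'Sales')")

def try_insert(sql):
    """Return True if the DBMS accepts the row, False on a constraint violation."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

valid_row   = try_insert("INSERT INTO Emp VALUES ('E1', 30, 'D1')")
null_fk     = try_insert("INSERT INTO Emp VALUES ('E2', 25, NULL)")  # null FK is allowed
null_pk     = try_insert("INSERT INTO Emp VALUES (NULL, 25, 'D1')")  # violates entity integrity
dangling_fk = try_insert("INSERT INTO Emp VALUES ('E3', 25, 'D9')")  # no Dept 'D9'
bad_age     = try_insert("INSERT INTO Emp VALUES ('E4', 10, 'D1')")  # violates the CHECK

print(valid_row, null_fk, null_pk, dangling_fk, bad_age)
```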
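Rules for converting an E-R diagram to the relational model
• An entity is converted into one relation schema; the attributes of the entity become the attributes of the relation, and the key of the entity becomes the key of the relation
• For relationships between entities
– A 1:1 relationship can be converted into a separate relation schema, or merged with the relation schema of either participating entity
– A 1:n relationship can be converted into a separate relation schema, or merged with the relation schema of the n side
– An m:n relationship is converted into one relation schema
– A relationship among three or more entities can be converted into one relation schema
– Relation schemas with the same key can be merged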
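The conversion rules can be sketched on a small hypothetical E-R design: entities Dept, Student, and Course, a 1:n relationship between Dept and Student, and an m:n relationship (with a grade attribute) between Student and Course. All names are assumptions made for this example, which uses Python's standard sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Each entity becomes a relation; the entity key becomes the primary key.
    CREATE TABLE Dept   (dno TEXT PRIMARY KEY, dname TEXT);
    CREATE TABLE Course (cno TEXT PRIMARY KEY, title TEXT);

    -- The 1:n relationship Dept-Student is merged into the n side:
    -- Student carries the key of Dept as a foreign key.
    CREATE TABLE Student (sno TEXT PRIMARY KEY, name TEXT,
                          dno TEXT REFERENCES Dept(dno));

    -- The m:n relationship Student-Course becomes its own relation,
    -- keyed by the keys of both participating entities.
    CREATE TABLE Enroll (sno   TEXT REFERENCES Student(sno),
                         cno   TEXT REFERENCES Course(cno),
                         grade INTEGER,
                         PRIMARY KEY (sno, cno));
""")

tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]

# PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk);
# pk > 0 marks the columns that belong to the primary key.
enroll_key = [row[1] for row in conn.execute("PRAGMA table_info(Enroll)")
              if row[5] > 0]

print(tables)
print(enroll_key)
```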
Functional Dependencies
• Functional dependency
– X -> A is an assertion about a relation R that whenever two tuples of R agree on all the attributes of X, they must also agree on the attribute A.
• Full and partial functional dependency
– In relation R, if X -> Y and no proper subset X' of X satisfies X' -> Y, we say Y is fully functionally dependent on X, written X -F> Y.
– Otherwise X -P> Y: Y is partially functionally dependent on X.
• Transitive functional dependency
– If the FDs A -> B and B -> C both hold for R, C is said to depend on A transitively, via B.
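The definitions above can be checked mechanically against a concrete set of tuples. The sketch below is illustrative only: the function names and the sample rows are hypothetical, and a check on one instance can refute an FD but cannot prove that it holds for every legal instance of the schema. Only non-empty proper subsets of X are tried.

```python
from itertools import combinations

def fd_holds(rows, lhs, rhs):
    """True if tuples that agree on lhs also agree on rhs (rows: list of dicts)."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # setdefault stores y the first time x is seen, then returns the stored value
        if seen.setdefault(x, y) != y:
            return False
    return True

def is_full_fd(rows, lhs, rhs):
    """True if lhs -> rhs holds and no non-empty proper subset of lhs determines rhs."""
    if not fd_holds(rows, lhs, rhs):
        return False
    for size in range(1, len(lhs)):
        for subset in combinations(lhs, size):
            if fd_holds(rows, subset, rhs):
                return False  # a proper subset suffices: partial dependency
    return True

rows = [
    {"sno": "S1", "cno": "C1", "grade": 90, "sname": "Alice"},
    {"sno": "S1", "cno": "C2", "grade": 85, "sname": "Alice"},
    {"sno": "S2", "cno": "C1", "grade": 70, "sname": "Bob"},
]

grade_is_full = is_full_fd(rows, ("sno", "cno"), ("grade",))  # (sno, cno) -F> grade
sname_is_full = is_full_fd(rows, ("sno", "cno"), ("sname",))  # partial: sno -> sname
print(grade_is_full, sname_is_full)
```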
• Anomalies
– Problems that occur when we try to cram too much into a single relation are called anomalies.
  • Redundancy: information may be repeated unnecessarily in several tuples.
  • Insertion anomalies: inserting a tuple may fail because some other information is missing from the current database.
  • Deletion anomalies: if a set of values becomes empty, we may lose other information as a side effect.
  • Update anomalies: we may change information in one tuple but leave the same information unchanged in another.
• Normalization procedure for database schema design
– The successive reduction of a given collection of relation schemas to some more desirable form.
1NF, 2NF, 3NF, BCNF
• 1NF
– A relation R is in 1NF if and only if every tuple contains exactly one value for each attribute.
  • Relations in a relational database are always in 1NF.
• 2NF
– A relation R is in 2NF if and only if it is in 1NF and every non-key attribute is fully functionally dependent on the primary key.
– Example: if relation R(A, B, C) has the functional dependency (A, B) -> C, and neither A -> C nor B -> C holds, then (A, B) -F> C and R is in 2NF.
• 3NF
– A relation R is in 3NF if it is in 2NF and no non-key attribute is transitively dependent on the key.
– Example: if relation R(A, B, C) has the functional dependencies A -> B and B -> C, then R is not in 3NF.
• BCNF
– A relation R is in BCNF if, whenever X -> A is a nontrivial FD that holds in R, X is a superkey.
  • Nontrivial means A is not a member of set X.
  • A superkey is any superset of a key (not necessarily a proper superset).
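The BCNF condition can be tested with the standard attribute-closure computation, which the slides do not show; it is swapped in here as one common way to decide whether X is a superkey. The sketch assumes single-letter attribute names written as strings, so "AB" stands for the set {A, B}, and all function names are hypothetical.

```python
def closure(attrs, fds):
    """All attributes determined by attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already have the whole left side, we gain the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_superkey(attrs, all_attrs, fds):
    """X is a superkey when its closure covers every attribute of the relation."""
    return closure(attrs, fds) == set(all_attrs)

def is_bcnf(all_attrs, fds):
    """True if every nontrivial FD in fds has a superkey on its left side."""
    for lhs, rhs in fds:
        nontrivial = not set(rhs) <= set(lhs)
        if nontrivial and not is_superkey(lhs, all_attrs, fds):
            return False
    return True

# R(A, B, C) with A -> B and B -> C: B is not a superkey, so R is not in BCNF
# (this is also the slide's example of a relation that is not in 3NF).
chain = [("A", "B"), ("B", "C")]
# R(A, B, C) with only (A, B) -> C: the left side is the key, so R is in BCNF.
composite = [("AB", "C")]

print(sorted(closure("A", chain)), is_bcnf("ABC", chain), is_bcnf("ABC", composite))
```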
Relational Algebra

• Union, intersection, and difference.
– The usual set operations, but both operands must have the same relation schema.
• Selection: picking certain rows.
• Projection: picking certain columns.
• Products and joins: compositions of relations.
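A rough sketch of these operations in plain Python, with a relation represented as a list of dicts (one dict per tuple). The function names and the sample relations are hypothetical; the final expression shows how a user query is written as a composition of operators.

```python
def select(rel, pred):
    """Selection: keep the rows that satisfy pred."""
    return [t for t in rel if pred(t)]

def project(rel, attrs):
    """Projection: keep the listed columns and drop duplicate rows."""
    out = []
    for t in rel:
        row = {a: t[a] for a in attrs}
        if row not in out:
            out.append(row)
    return out

def natural_join(r, s):
    """Natural join: combine rows that agree on all shared attributes."""
    out = []
    for t in r:
        for u in s:
            common = t.keys() & u.keys()
            if all(t[a] == u[a] for a in common):
                out.append({**t, **u})
    return out

def union(r, s):
    """Union of two relations with the same schema (r is assumed duplicate-free)."""
    return r + [t for t in s if t not in r]

def difference(r, s):
    """Difference: rows of r that do not appear in s."""
    return [t for t in r if t not in s]

student = [{"sno": "S1", "name": "Alice"}, {"sno": "S2", "name": "Bob"}]
enroll = [{"sno": "S1", "cno": "C1"}, {"sno": "S2", "cno": "C2"},
          {"sno": "S1", "cno": "C2"}]

# Query: names of the students enrolled in course C1.
names_c1 = project(
    select(natural_join(student, enroll), lambda t: t["cno"] == "C1"),
    ["name"],
)
print(names_c1)
```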
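Operators
Category                          Symbol   Meaning
Set operators                     ∪        Union
                                  −        Difference
                                  ∩        Intersection
                                  ×        Extended Cartesian product
Comparison operators              >        Greater than
                                  >=       Greater than or equal to
                                  <        Less than
                                  <=       Less than or equal to
                                  =        Equal to
                                  ≠        Not equal to
Specialized relational operators  σ        Selection
                                  π        Projection
                                  ⋈        Join
                                  ÷        Division
Logical operators                 ¬        Not
                                  ∧        And
                                  ∨        Or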
SQL
• SQL is a very-high-level language.
– It says “what to do” rather than “how to do it.”
– It avoids many of the data-manipulation details needed in procedural languages like C++ or Java.
• Usage
– SELECT
  • DISTINCT, LIKE, IN, ORDER BY, GROUP BY
  • SUM, MAX, MIN, COUNT, ANY, ALL
– INSERT, DELETE, UPDATE
– CREATE (TABLE, INDEX, VIEW)
– DROP, ALTER
– GRANT, REVOKE
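A short, hedged tour of several of these commands, run through Python's standard sqlite3 module. The Enroll table and its rows are hypothetical. SQLite does not implement GRANT, REVOKE, ANY, or ALL, so those are left out of the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Enroll (sno TEXT, cno TEXT, grade INTEGER);
    INSERT INTO Enroll VALUES ('S1', 'C1', 90), ('S1', 'C2', 80),
                              ('S2', 'C1', 70), ('S3', 'C2', 60),
                              ('S4', 'C1', 40);
    UPDATE Enroll SET grade = grade + 5 WHERE sno = 'S3';  -- 60 becomes 65
    DELETE FROM Enroll WHERE grade < 50;                   -- removes S4
""")

# DISTINCT and ORDER BY
distinct_courses = conn.execute(
    "SELECT DISTINCT cno FROM Enroll ORDER BY cno").fetchall()

# GROUP BY with the aggregates COUNT, MAX, and SUM
per_course = conn.execute("""
    SELECT cno, COUNT(*), MAX(grade), SUM(grade)
    FROM Enroll GROUP BY cno ORDER BY cno
""").fetchall()

# IN and LIKE in the WHERE clause
in_c1 = conn.execute("""
    SELECT sno FROM Enroll
    WHERE sno IN ('S1', 'S2') AND cno LIKE 'C1'
    ORDER BY sno
""").fetchall()

print(distinct_courses)
print(per_course)
print(in_c1)
```

Each query states only which rows are wanted; how the DBMS scans, groups, and sorts them is left to the system.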
Transactions
• A transaction is a collection of one or more operations on the database that must be executed atomically; that is, either all operations are performed or none are.
• Transaction
– A user-defined sequence of read and write operations on the database
– An indivisible unit of work
ACID Transactions
• A DBMS is expected to support “ACID transactions,” which are:
– Atomic: all or none is done.
– Consistent: database constraints are preserved.
– Isolated: it appears to the user as if only one process executes at a time.
– Durable: effects of a process do not get lost if the system crashes.
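Properties of transactions (ACID)
• Atomicity
– The operations in a transaction are either all performed or none are (all or none)
• Consistency
– The result of executing a transaction must take the database from one consistent state to another consistent state
– Closely related to atomicity
• Isolation
– Concurrently executing transactions must not interfere with each other
• Durability (permanence)
– Once a transaction commits, its updates to the database are no longer affected by subsequent operations or failures
★ Transaction processing in a DBMS must guarantee the ACID properties; only then can the safety and correctness of the data in the database be ensured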
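Statements for defining transactions in SQL
• SQL statements that define a transaction
– Begin transaction (start of the transaction)
– Commit (commit the transaction; the updates are written to disk)
– Rollback (roll back the transaction; all updates already made by the transaction are undone)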
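The three statements can be seen working together in a small sketch with Python's standard sqlite3 module. The Account table, the balance rule, and the transfer function are hypothetical. Passing `isolation_level=None` is a choice made here so that the module does not start transactions on its own and the explicit BEGIN, COMMIT, and ROLLBACK are the only transaction control.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("""
    CREATE TABLE Account (
        name    TEXT PRIMARY KEY,
        balance INTEGER CHECK (balance >= 0)  -- consistency rule: no overdraft
    )
""")
conn.execute("INSERT INTO Account VALUES ('A', 100), ('B', 50)")

def transfer(src, dst, amount):
    """Move amount from src to dst as one transaction; True if it committed."""
    conn.execute("BEGIN TRANSACTION")
    try:
        conn.execute(
            "UPDATE Account SET balance = balance + ? WHERE name = ?", (amount, dst))
        conn.execute(
            "UPDATE Account SET balance = balance - ? WHERE name = ?", (amount, src))
        conn.execute("COMMIT")
        return True
    except sqlite3.IntegrityError:
        # The debit broke the CHECK: undo the credit already applied to dst.
        conn.execute("ROLLBACK")
        return False

def balances():
    return dict(conn.execute("SELECT name, balance FROM Account"))

first = transfer("A", "B", 30)    # succeeds and commits
second = transfer("A", "B", 500)  # would overdraw A, so it is rolled back
print(first, second, balances())
```

The failed transfer leaves no trace: the credit to B that ran before the failing debit is undone by the rollback, which is the all-or-none behavior described above.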
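Transaction management in a DBMS
• The transaction is the basic unit of recovery and concurrency control
• If a transaction is forcibly terminated by some failure while running, database consistency is broken and recovery is needed
• When multiple transactions run in parallel, the operations of different transactions interleave; concurrency control is needed to keep the transactions from interfering with each other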
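Types of failure
• Transaction-internal failure
– The transaction is terminated before reaching its normal end point (commit or rollback)
– Includes
  • Failures the transaction's program can handle itself, such as an unsatisfied condition
  • Failures the transaction's program cannot handle, such as arithmetic overflow
• System failure
– System restart, OS failure, DBMS code errors, power loss, etc.
• Media failure
– Disk damage, etc.
• Computer virus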
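Possible effects of each type of failure on the database
• The database itself is damaged, so all or part of its data is lost
– For example, system failure, media failure, computer viruses
• The database is not damaged, but the data loses consistency (correctness) because a running transaction was terminated abnormally
– For example, transaction-internal failure, system failure, computer viruses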
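Serializability of transaction schedules
• The operations of multiple concurrent transactions are interleaved
• A scheduling policy that runs all transactions serially (serial execution) does not break database consistency, so it is always correct
• Serializable schedule: a concurrent execution of multiple transactions is correct if and only if its result is the same as that of executing them serially in some order
• Serializability is the criterion for judging whether the operations of concurrent transactions are correct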
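The definition can be applied literally on a toy example: run a schedule, then compare its result with the result of every serial order. This is only a sketch under assumptions. The two transactions, the step functions, and the initial state are hypothetical, and comparing results for a single initial state is a simplification of the definition rather than a test a real DBMS would run.

```python
from itertools import permutations

# Each step takes the shared database and the transaction's private workspace.
def t1_read(db, local):  local["a"] = db["A"]
def t1_write(db, local): db["A"] = local["a"] + 10
def t2_read(db, local):  local["a"] = db["A"]
def t2_write(db, local): db["A"] = local["a"] * 2

TRANSACTIONS = {"T1": [t1_read, t1_write], "T2": [t2_read, t2_write]}

def execute(schedule, initial):
    """Run a schedule (a list of (transaction name, step) pairs) on a copy of initial."""
    db = dict(initial)
    workspaces = {}
    for name, step in schedule:
        step(db, workspaces.setdefault(name, {}))
    return db

def serial_results(transactions, initial):
    """Final states of every serial order of the transactions."""
    results = []
    for order in permutations(transactions.items()):
        schedule = [(name, step) for name, steps in order for step in steps]
        results.append(execute(schedule, initial))
    return results

def is_serializable(schedule, transactions, initial):
    """True if the schedule ends in the same state as some serial execution."""
    return execute(schedule, initial) in serial_results(transactions, initial)

initial = {"A": 100}
# Lost update: both transactions read A before either one writes it.
interleaved = [("T1", t1_read), ("T2", t2_read), ("T1", t1_write), ("T2", t2_write)]
# T1 runs to completion before T2 starts.
one_after_other = [("T1", t1_read), ("T1", t1_write), ("T2", t2_read), ("T2", t2_write)]

print(execute(interleaved, initial), is_serializable(interleaved, TRANSACTIONS, initial))
print(is_serializable(one_after_other, TRANSACTIONS, initial))
```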
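The main technique of concurrency control: locking
• Concept
– Before operating on a data object (such as a database, table, or record), transaction T first sends a lock request to the system in order to obtain the corresponding control over the data object
– Until transaction T releases the lock it holds, other transactions cannot update the data object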
Locks
• Lock types
– Exclusive lock (X): write lock
– Shared lock (S): read lock
• Lock compatibility matrix (rows: T1; columns: T2; Y = compatible, N = incompatible, - = no lock)
T1 \ T2   X   S   -
X         N   N   Y
S         N   Y   Y
-         Y   Y   Y
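The compatibility matrix translates directly into a lookup table. In this minimal sketch the names COMPATIBLE and can_grant are hypothetical, and the "no lock" row and column are represented by an empty list of held locks.

```python
# (mode already held by another transaction, mode requested) -> compatible?
COMPATIBLE = {
    ("X", "X"): False,
    ("X", "S"): False,
    ("S", "X"): False,
    ("S", "S"): True,
}

def can_grant(requested, held_modes):
    """True if the requested lock is compatible with every lock held by others.

    held_modes lists the modes other transactions hold on the same data object;
    an empty list means the object is unlocked, so any request is granted.
    """
    return all(COMPATIBLE[(held, requested)] for held in held_modes)

print(can_grant("S", ["S"]))  # readers can share the object
print(can_grant("X", ["S"]))  # a writer must wait for readers
print(can_grant("X", []))     # nothing held, so the request is granted
```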