Lecture 7: Data Normalization and Other Topics
EE4509, EE4253, EE6133, Semester 1, 2013/2014. Dr. Đào Trung Kiên, Hanoi University of Science and Technology
Data normalization
Concepts
Designing a relational database means building a relational schema that:
allows the desired data to be stored
minimizes data redundancy
allows information to be retrieved easily
uses normal forms: sets of criteria that a database should meet
Data normalization is the process of structuring a relational database to minimize data redundancy and dependency (based on keys and functional dependencies)
The normal forms
First normal form (1NF): 1970
Second normal form (2NF): 1971
Third normal form (3NF): 1971
Boyce-Codd normal form (BCNF): 1974
Fourth normal form (4NF): 1977
Fifth normal form (5NF): 1979
Sixth normal form (6NF): 2003
First normal form (1NF)
An entity satisfies 1NF if it has no repeating group of attributes
Multivalued attributes are eliminated
Counter-example: the Order entity violates 1NF because the group (item_name, item_number, item_price) is repeated 9 times
This is the simplest normal form
Converting to 1NF:
Split the repeating attribute groups into smaller relations
Use key attributes and foreign keys
Link the relations with a 1..n association
Order
id: int
shipdate: date
customer: string
address: string
item_name1: string
item_number1: int
item_price1: int
item_name2: string
item_number2: int
item_price2: int
...
item_name9: string
item_number9: int
item_price9: int
Example: converting an entity to 1NF
Define an additional relation: OrderItem
Link it to Order with a 1..n association
Only one extra attribute is needed, the foreign key order_id
Order
id: int
shipdate: date
customer: string
address: string
OrderItem
order_id: int
item_name: string
item_number: int
item_price: int
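The Order / OrderItem split above can be sketched as a small schema. This is only an illustration: it uses Python's built-in sqlite3 module rather than any particular DBMS from the course, maps date and string to SQLite's TEXT type, and the sample order and items are made up.

```python
import sqlite3

# In-memory SQLite database, used only for illustration.
orders_db = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
orders_db.execute("PRAGMA foreign_keys = ON")
orders_db.executescript("""
CREATE TABLE "Order" (              -- quoted because ORDER is a reserved word
    id       INTEGER PRIMARY KEY,
    shipdate TEXT,
    customer TEXT,
    address  TEXT
);
CREATE TABLE OrderItem (
    order_id    INTEGER NOT NULL REFERENCES "Order"(id),  -- foreign key (1..n)
    item_name   TEXT,
    item_number INTEGER,
    item_price  INTEGER
);
""")

# Hypothetical sample data: one order with three items.
orders_db.execute(
    """INSERT INTO "Order" VALUES (1, '2013-10-01', 'Alice', 'Hanoi')"""
)
orders_db.executemany(
    "INSERT INTO OrderItem VALUES (?, ?, ?, ?)",
    [(1, "pen", 101, 5), (1, "notebook", 102, 20), (1, "ruler", 103, 8)],
)


def items_of(order_id):
    """Return the item names of one order, ordered by item number."""
    rows = orders_db.execute(
        "SELECT item_name FROM OrderItem WHERE order_id = ? ORDER BY item_number",
        (order_id,),
    )
    return [row[0] for row in rows]
```

An order can now hold any number of items, not just 9, and an OrderItem row that points at a non-existent order is rejected by the foreign key.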
Second normal form (2NF)
A relation satisfies 2NF if and only if both of the following hold:
it satisfies 1NF
no non-key attribute is determined by a proper subset of the key
student-id  class  name
123         CSDL   Trần Khánh Linh
123         KTLT   Trần Khánh Linh
456         LTM    Bill Gates
456         CSDL   Bill Gates
456         KTLT   Bill Gates
789         VXL    Lý Liên Kiệt
789         LTM    Lý Liên Kiệt
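Counter-example: the relation above does not satisfy 2NF because:
name is fully determined by student-id
student-id is a proper subset of the key (student-id, class)
Data redundancy: each name is stored several times
Converting to 2NF
To convert a relation to 2NF:
Split it into smaller relations
Link them with a 1..n association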
student-id  name
123         Trần Khánh Linh
456         Bill Gates
789         Lý Liên Kiệt

student-id  class
123         CSDL
123         KTLT
456         LTM
456         CSDL
456         KTLT
789         VXL
789         LTM
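The decomposition above can be checked with a short sketch: project the original relation onto the two smaller ones, then join them back on student-id and confirm that nothing was lost. The variable names are illustrative only.

```python
# Rows of the original relation: (student_id, class, name).
enrolled = [
    (123, "CSDL", "Trần Khánh Linh"),
    (123, "KTLT", "Trần Khánh Linh"),
    (456, "LTM", "Bill Gates"),
    (456, "CSDL", "Bill Gates"),
    (456, "KTLT", "Bill Gates"),
    (789, "VXL", "Lý Liên Kiệt"),
    (789, "LTM", "Lý Liên Kiệt"),
]

# Projection onto (student_id, name); the set removes the duplicates.
students = sorted({(sid, name) for sid, _cls, name in enrolled})
# Projection onto (student_id, class).
classes = sorted({(sid, cls) for sid, cls, _name in enrolled})

# Natural join on student_id rebuilds the original rows.
name_of = dict(students)
rejoined = sorted((sid, cls, name_of[sid]) for sid, cls in classes)

print(len(students), "names stored instead of", len(enrolled))
print("lossless:", rejoined == sorted(enrolled))
```

Each name is now stored once, and the join still recovers every original row.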
Functional dependencies
2NF relies on the concept of functional dependencies, which generalizes the concept of a key
Definition: on a relation $R$, let $X \subseteq R$ and $Y \subseteq R$ be two sets of attributes of $R$. $Y$ is functionally dependent on $X$ (written $X \to Y$) if and only if each value of $X$ determines exactly one value of $Y$.
In the previous example we have:
$\text{student-id} \to \text{name}$
With this concept, 2NF can be restated: no set of non-key attributes is functionally dependent on a proper subset of the key attributes
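As a rough illustration of the definition, the sketch below tests whether a dependency $X \to Y$ is violated by a given set of rows. The helper name fd_holds is made up for this example. Note that a functional dependency is a statement about the schema: data can show that a dependency fails, but can never prove that it holds in general.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # setdefault stores y the first time x is seen, then returns the stored value.
        if seen.setdefault(x, y) != y:
            return False
    return True


# The student relation from the 2NF example.
rows = [
    {"student_id": s, "class": c, "name": n}
    for s, c, n in [
        (123, "CSDL", "Trần Khánh Linh"),
        (123, "KTLT", "Trần Khánh Linh"),
        (456, "LTM", "Bill Gates"),
        (456, "CSDL", "Bill Gates"),
        (456, "KTLT", "Bill Gates"),
        (789, "VXL", "Lý Liên Kiệt"),
        (789, "LTM", "Lý Liên Kiệt"),
    ]
]

print(fd_holds(rows, ["student_id"], ["name"]))   # student-id -> name
print(fd_holds(rows, ["student_id"], ["class"]))  # student-id -> class
```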
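Properties of functional dependencies
Armstrong's axioms
Reflexivity (trivial dependency): if $Y \subseteq X$ then $X \to Y$
Augmentation: if $X \to Y$ then $(X, Z) \to (Y, Z)$
Transitivity: if $X \to Y$ and $Y \to Z$ then $X \to Z$
Some other properties
Union: if $X \to Y$ and $X \to Z$ then $X \to (Y, Z)$
Decomposition: if $X \to (Y, Z)$ then $X \to Y$ and $X \to Z$
Pseudo-transitivity: if $X \to Y$ and $(Y, W) \to Z$ then $(X, W) \to Z$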
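In practice these rules are usually applied through the attribute closure: the set of all attributes determined by a given set $X$. The sketch below is one common way to compute it, not something taken from the slides; the attribute names A to E and the dependencies are invented for the example.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the dependencies fds.

    fds is a list of (lhs, rhs) pairs of attribute sets.
    """
    result = set(attrs)  # reflexivity: attrs determines itself
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If lhs is already determined, rhs is determined too (transitivity).
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# Hypothetical dependencies: A -> B, B -> C, (B, D) -> E.
fds = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"B", "D"}, {"E"})]

print(sorted(closure({"A"}, fds)))       # transitivity gives A -> C
print(sorted(closure({"A", "D"}, fds)))  # pseudo-transitivity gives (A, D) -> E
```

A dependency $X \to Y$ follows from the given ones exactly when $Y$ is contained in the closure of $X$.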
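Third normal form (3NF)
A relation satisfies 3NF if and only if both of the following hold:
it satisfies 2NF
there is no functional dependency between non-key attributes (no non-key attribute is determined by another non-key attribute)
Counter-example:
(There is a functional dependency: $\text{author} \to \text{author-birth-year}$)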
title                       year  author           author-birth-year
The universe in a nutshell  2001  Stephen Hawking  1942
The Da Vinci code           2003  Dan Brown        1964
A brief history of time     1988  Stephen Hawking  1942
Digital fortress            1998  Dan Brown        1964
The lost symbol             2009  Dan Brown        1964
Converting to 3NF
To convert a relation to 3NF (as with 2NF):
Split it into smaller relations
Link them with a 1..n association
Add a key attribute for the new relation
title                       year  author
The universe in a nutshell  2001  Stephen Hawking
The Da Vinci code           2003  Dan Brown
A brief history of time     1988  Stephen Hawking
Digital fortress            1998  Dan Brown
The lost symbol             2009  Dan Brown

author           author-birth-year
Stephen Hawking  1942
Dan Brown        1964
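The two relations above can be sketched as tables linked by a foreign key. The example below is only an illustration: it uses Python's built-in sqlite3 module, and the table and column names (Book, Author, birth_year) are chosen for the sketch. Here the author name itself serves as the key of the new relation; a generated numeric id would be another reasonable choice.

```python
import sqlite3

books_db = sqlite3.connect(":memory:")
books_db.executescript("""
CREATE TABLE Author (
    author     TEXT PRIMARY KEY,   -- key of the new relation
    birth_year INTEGER
);
CREATE TABLE Book (
    title  TEXT PRIMARY KEY,
    year   INTEGER,
    author TEXT REFERENCES Author(author)   -- 1..n: one author, many books
);
""")
books_db.executemany(
    "INSERT INTO Author VALUES (?, ?)",
    [("Stephen Hawking", 1942), ("Dan Brown", 1964)],
)
books_db.executemany(
    "INSERT INTO Book VALUES (?, ?, ?)",
    [
        ("The universe in a nutshell", 2001, "Stephen Hawking"),
        ("The Da Vinci code", 2003, "Dan Brown"),
        ("A brief history of time", 1988, "Stephen Hawking"),
        ("Digital fortress", 1998, "Dan Brown"),
        ("The lost symbol", 2009, "Dan Brown"),
    ],
)

# A join rebuilds the original four-column relation on demand.
joined = books_db.execute("""
    SELECT b.title, b.year, b.author, a.birth_year
    FROM Book AS b JOIN Author AS a ON a.author = b.author
    ORDER BY b.year
""").fetchall()

for row in joined:
    print(row)
```

Each birth year is stored once, so correcting it means updating a single row.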
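Boyce-Codd normal form (BCNF)
BCNF was defined to complement 3NF and is also called 3.5NF
Definition: a relation $R$ satisfies BCNF if and only if, for every functional dependency $X \to Y$, one of the following two conditions holds:
$X \to Y$ is a trivial functional dependency (that is, $Y \subseteq X$)
$X$ is a key (more precisely, a superkey) of $R$
Properties:
If $R$ satisfies BCNF, then it satisfies 3NF
The converse is not always true, although relations that satisfy 3NF but not BCNF are rare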
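The BCNF definition translates fairly directly into a check, sketched below under the assumption that the relation's dependencies are given explicitly. The function names are made up for this example. For a relation with a given set of dependencies it is enough to test the listed dependencies; testing a relation obtained by decomposition would require its projected dependencies, which this sketch does not compute.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the dependencies fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Return the dependencies X -> Y that are neither trivial nor have a superkey X."""
    bad = []
    for lhs, rhs in fds:
        trivial = set(rhs) <= set(lhs)
        superkey = closure(lhs, fds) >= set(attributes)
        if not trivial and not superkey:
            bad.append((lhs, rhs))
    return bad


# The book relation from the 3NF example, before decomposition.
book_attrs = {"title", "year", "author", "birth_year"}
book_fds = [({"title"}, {"year", "author"}), ({"author"}, {"birth_year"})]
print(bcnf_violations(book_attrs, book_fds))

# After decomposition, each smaller relation passes the check.
print(bcnf_violations({"title", "year", "author"}, [({"title"}, {"year", "author"})]))
print(bcnf_violations({"author", "birth_year"}, [({"author"}, {"birth_year"})]))
```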
Views
Concepts
Views are virtual, purely logical relations built on top of the real relations to make the data more convenient to use
Examples:
In an employee database, hide salary and home address information from ordinary users
In a student database, combine the relations SinhVien, LopHoc, DangKy into a single virtual relation that is easier to use
Modifying or retrieving data through a view must give the same result as operating directly on the real relations
Main reasons for using views
Work with only part of the data
Combine several relations into one virtual relation
Create relations that can be tailored closely to how they will be used
A view stores no additional data; it is evaluated against the real relations
Views also help protect information
Hide the parts of the data that should not be exposed
SQL
Creating a view:
create view name as select ...;
The definition of the view depends on the select statement
Dropping a view:
drop view name;
Once created, a view is queried and updated in the same way as an ordinary relation
Example (MySQL)
mysql> select * from t;
+------+-------+
| qty  | price |
+------+-------+
|    3 |    50 |
|    5 |    60 |
|    2 |    20 |
+------+-------+
3 rows in set (0.00 sec)

mysql> create view t1 as select qty, price as value from t where qty>2;
Query OK, 0 rows affected (0.02 sec)

mysql> select * from t1;
+------+-------+
| qty  | value |
+------+-------+
|    3 |    50 |
|    5 |    60 |
+------+-------+
2 rows in set (0.01 sec)
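The same session can be reproduced as a script. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL, so it runs without a server; the extra row (7, 70) is made up to show that the view stores no data of its own. One difference to keep in mind: SQLite views are read-only, so the remark about updating through a view does not carry over to this sketch.

```python
import sqlite3

view_db = sqlite3.connect(":memory:")
view_db.execute("CREATE TABLE t (qty INTEGER, price INTEGER)")
view_db.executemany("INSERT INTO t VALUES (?, ?)", [(3, 50), (5, 60), (2, 20)])

# Same definition as in the MySQL session above.
view_db.execute("CREATE VIEW t1 AS SELECT qty, price AS value FROM t WHERE qty > 2")


def view_rows():
    """Query the view like an ordinary table."""
    return view_db.execute("SELECT qty, value FROM t1 ORDER BY qty").fetchall()


before = view_rows()
# Insert into the base table; the view is re-evaluated on the next query.
view_db.execute("INSERT INTO t VALUES (7, 70)")
after = view_rows()

print(before)
print(after)
```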
Indexing
Concepts
Searching sorted data is much faster than searching unsorted data
Example: looking up a word in a dictionary
Indexing is the creation of auxiliary data structures (trees, hash tables, ...) that make searching the data faster
Several indexes can be created for each relation
Indexes should be created on attributes that are often used in search conditions (the where clause)
Not every search condition can make use of an index (e.g. substring search, ...)
SQL
Creating an index:
create index index-name
on relation-name(attribute-name);
Dropping an index:
drop index index-name on relation-name;
Listing the indexes:
(MySQL) show indexes from relation-name;
(SQL Server) exec sp_helpindex relation-name;
After an index is created, the relations are used exactly as before. The DBMS decides automatically whether and when to use an index.
Example (MySQL)
mysql> select count(*) from thivien_poem where AUTHOR=20;
+----------+
| count(*) |
+----------+
|      158 |
+----------+
1 row in set (1.74 sec)

mysql> create index thivien_poem_AUTHOR on thivien_poem(AUTHOR);
Query OK, 40349 rows affected (1 min 14.00 sec)
Records: 40349  Duplicates: 0  Warnings: 0

mysql> select count(*) from thivien_poem where AUTHOR=20;
+----------+
| count(*) |
+----------+
|      158 |
+----------+
1 row in set (0.00 sec)
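To see the DBMS choose the index by itself, one can compare the query plan before and after creating it. The sketch below uses Python's built-in sqlite3 module and SQLite's EXPLAIN QUERY PLAN statement instead of MySQL; the poem table and its 10,000 generated rows are made up and only loosely mirror the thivien_poem table above. The wording of the plan text varies between SQLite versions, so it is printed rather than relied on.

```python
import sqlite3

index_db = sqlite3.connect(":memory:")
index_db.execute(
    "CREATE TABLE poem (id INTEGER PRIMARY KEY, author INTEGER, title TEXT)"
)
# Hypothetical data: 10,000 poems spread evenly over 100 authors.
index_db.executemany(
    "INSERT INTO poem (author, title) VALUES (?, ?)",
    [(i % 100, f"poem {i}") for i in range(10000)],
)

QUERY = "SELECT count(*) FROM poem WHERE author = 20"


def plan(sql):
    """Return the plan SQLite would use for sql, as one line of text."""
    rows = index_db.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each plan row is a human-readable description.
    return " | ".join(row[-1] for row in rows)


plan_before = plan(QUERY)
index_db.execute("CREATE INDEX poem_author ON poem(author)")
plan_after = plan(QUERY)
count = index_db.execute(QUERY).fetchone()[0]

print("before:", plan_before)
print("after: ", plan_after)
print("count: ", count)
```

The query text is identical in both runs; only the plan changes, which is the point made above about the DBMS using indexes automatically.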
Exercises
1. Determine which normal form the following relation is in: ITEM (SKU, PromID, Vendor, Style, Price)
$(\text{SKU}, \text{PromID}) \to (\text{Vendor}, \text{Style}, \text{Price})$
$\text{SKU} \to (\text{Vendor}, \text{Style})$
2. Normalize the relation above to a higher normal form
3. Choose a key and list the functional dependencies for: ITEMS (PONum, ItemNum, PartNum, Desc, Price, Qty)