Why SQL DBMS’s Still Lack Full Declarative Constraint Support (A polite excuse) Toon Koppelaars
Why SQL DBMS’s Still
Lack Full Declarative
Constraint Support
(A polite excuse)
Toon Koppelaars
Who am I?
• A database guy, relational dude
– Developer (with DBA experience)
• Oracle technology user since 1987
• Co-author of this book
• Today’s talk = last chapter
Agenda
• SQL Assertions?
– A few examples
• Validation Execution Models
– More efficient along the way
• Serializability
– Probably the biggest problem
• Conclusions
SQL Assertions?
• Examples
1. There is at most one president
2. One cannot manage more than two
departments
3. Department with president and/or
manager should have an administrator
Data integrity constraints
– Just like: CHECK, PK, UK, FK
– ‘all the other ones’
They constrain
data allowed in
our tables
They constrain
data allowed in
our tables
Syntax?
• Create assertion [some_name] as
check([some_SQL_expression]);
• For over 25 years part of Ansi/Iso
SQL standard
Example 1
• create assertion
at_most_one_president
as check
(1 =>
(select count(*)
from EMP e
where e.JOB = ‘PRESIDENT’
)
);
• Task of DBMS: make sure it’s true at all times
- Closed SQL
expression (no
free variables)
- Evaluates to
true or false
Example 2
• create assertion
cannot_manage_more_than_2
as check
(not exists
(select ‘x’
from (select d.MGR
,count(*) as cnt
from DEPT d
group by d.MGR)
where cnt > 2
)
);
Example 3
• create assertion admin_in_dept_with_vip as check (not exists (select ‘x’ from (select distinct e.DEPTNO from EMP e) e where exists (select ‘y’ from EMP e1 where e1.DEPTNO = e.DEPTNO and e1.JOB in (‘P’,’M’)) and not exists (select ‘z’ from EMP e2 where e2.DEPTNO = e.DEPTNO and e2.JOB = ‘ADMIN’) ) );
• Can this be the killer feature of
OracleXIII?
Imagine...
• If DBMS vendor would support these
– How much less lines of code one would
need to write in application
development
– How much less bugs this results into
– How easy we could accomodate
change requested by the business
– How data quality would improve
Point to be made
• SQL assertions ‘cover’ all the other
declarative constraints available
– CHECK can be written as assertion
– PK/UK can be written as assertion
– FK can be written as assertion
• We just have shorthands for these,
since these are so common in every
database design
Check writen as assertion
• create assertion hire_only_18_plus
as check
(not exists
(select ‘x’
from EMP e
where e.HIRED - e.BORN <
18*365
)
);
CHECK((HIRED – BORN) >= 18*365)
PK/UK written as assertion
• create assertion empno_is_unique
as check
(not exists
(select ‘x’
from EMP e1
,EMP e2
where e1.EMPNO = e2.EMPNO
and e1.rowid != e2.rowid
)
);
PRIMARY KEY (EMPNO)
FK written as assertion
• create assertion work_in_known_dept as check (not exists (select ‘x’ from EMP e where not exists (select ‘y’ from DEPT d where d.deptno = e.deptno ) ) ); FOREIGN KEY (DEPTNO) REFERENCES DEPT(DEPTNO)
Scientific Problem
• Why is CREATE ASSERTION not
supported?
• Think about this:
– The DBMS has our assertion expression
but:
• Would you accept full evaluation of given
expression for every insert/update/delete?
– No.
• Performance would be horrible in real world db’s
• You require a much more sophisticated execution
model in the real world
Scientific Problem
• The challenge is:
– Given: arbitrary complex assertion expression
+
– Given: arbitrary complex SQL statement
– What is the minimum required check that
needs to be executed by the DBMS for the
assertion to remain true?
• Can be ‘nothing’ if assertion is immune for the given
DML
For instance...
• Department that employs president or manager should also employ administrator – Obviously only when DML operates on EMP
1.Insert into emp values(100,’Smith’,’SALESREP’
,’1/1/80’,’20/11/07’,7000,20);
2.Insert into emp values(:b0,:b1,:b2
,:b3,:b4,:b5,:b6);
3.Insert into emp (select * from emp_loaded);
4.Update emp set msal=1.05*msal;
5.Delete from emp where empno=101;
Execution Models
• Using triggers:
– EM1: always
– EM2: involved tables
– EM3: involved columns
– EM4: table polarities
– EM5: involved literals (TE)
– EM6: TE + minimal check
Execution Model 1
• Evaluate every assertion on every
DML statement
– Evaluating every <boolean
expression with subqueries>
– On every DML statement
• In after-statement table trigger
– WOULD BE *VERY* INEFFICIENT
• Let’s quickly forget this “EM1”
Execution Model 2
• Only evaluate the assertions that involve the table that is being “DML-led”
– Finding involved tables by parsing the
assertion expression
– Per assertion 100% code generation of a
(insert/update/delete) “after statement” table
trigger
• Select from dual where [expression]
• Found? OK
• Not found raise_application_error
Execution Model 2
• Using examples 2 and 3: – Admin_in_dept_with_vip EMP table
– Cannot_manage_more_than_2 DEPT table
Execution Model 2
create trigger EMP_AIUDS_EMP01
after insert or update or delete on EMP
declare pl_dummy varchar(40);
begin
--
select 'Constraint EMP01 is satisfied' into pl_dummy
from DUAL
where not exists(select ‘a violation’
from (select distinct deptno from EMP) d
where exists(select e2.*
from EMP e2
where e2.DEPTNO = d.DEPTNO
and e2.JOB in ('PRESIDENT','MANAGER'))
and not exists(select e3.*
from EMP e3
where e3.DEPTNO = d.DEPTNO
and e3.JOB = 'ADMIN'));
--
exception when no_data_found then
raise_application_error(-20999,'Constraint EMP01 is violated.');
end;
Assertion
predicate
Execution Model 2
create trigger DEPT_AIUDS_DEPT01
after insert or update or delete on DEPT
declare pl_dummy varchar(1);
begin
--
select ‘a' into pl_dummy
from DUAL
where not exists (select ‘a violation’ from (select d.MGR ,count(*) as cnt from DEPT d group by d.MGR) where cnt > 2));
--
exception when no_data_found then
raise_application_error(-20999,'Constraint DEPT01 is violated.');
end;
EM2 could be supported declaratively
Execution Model 2
• Inefficiencies:
– EMP01 and DEPT01 are checked when updating columns that are not involved, for instance:
• Updating EMP.ENAME
• Updating DEPT.LOC
Execution Model 3
• For inserts + deletes: EM3 == EM2
• For updates: only evaluate
assertions that involve *columns*
being changed
– Simple parse will find columns
– Assumes ‘clean specification’
• Create trigger syntax allows
specification of columns that are
being changed
Execution Model 3
create trigger EMP_AUS_EMP01
after update of DEPTNO,JOB on EMP
declare pl_dummy varchar(40);
Begin
--
select 'Constraint EMP01 is satisfied' into pl_dummy
from DUAL
where not exists(select ‘department in violation’
from (select distinct deptno from EMP) d
where exists(select e2.*
from EMP e2
where e2.DEPTNO = d.DEPTNO
and e2.JOB in ('PRESIDENT','MANAGER'))
and not exists(select e3.*
from EMP e3
where e3.DEPTNO = d.DEPTNO
and e3.JOB = 'ADMIN'));
--
exception when no_data_found then
raise_application_error(-20999,'Constraint EMP01 is violated.');
end;
Execution Model 3
create trigger DEPT_AIUDS_DEPT01
after update of MGR on DEPT
declare pl_dummy varchar(40);
begin
--
select ‘a' into pl_dummy
from DUAL
where not exists (select ‘x’ from (select d.MGR ,count(*) as cnt from DEPT d group by d.MGR) where cnt > 2));
--
exception when no_data_found then
raise_application_error(-20999,'Constraint DEPT01 is violated.');
end;
EM3 could be supported declaratively
Execution Model 3
• Inefficiencies:
– Sometimes inserts (e)or deletes can
never violate a constraint
– For Cannot_manage_more_than_2,
deleting a department does not require
re-validation
– For Admin_in_dept_with_vip, both
inserts and deletes do require re-
validation
Execution Model 4
• For updates: EM4 = EM3
• For deletes and inserts EM4 drops unnecessary delete (e)or insert table triggers
– Polarity of a table for a given constraint • Positive: inserts can violate
• Negative: deletes can violate
• Neutral: both can violate
• Undefined: table is not involved
Polarity of table for given constraint can be
computed via special parsing
Execution Model 4
• EM4 maintains:
– All EM3 triggers, except for one:
• Drops:
– Cannot_manage_more_than_2 delete
trigger
EM4 could be supported declaratively
Execution Model 4
• Inefficiencies: – Admin_in_dept_with_vip: eg. inserting
SALESMAN, or deleting TRAINER does not require re-validation
• Start involving literals mentioned in assertions
• If assertion does not have literals then next EM is same as EM4 – Cannot_manage_more_than_2 has no literals
Execution Model 5
• How do we see: – Salesman is inserted? Trainer deleted?
• Could parse the DML statement
– But does not always work due to absence of literals
• Make use of column values of affected rows – Requires “Transition Effect” (TE) of a DML statement
• Common concept (aka. “delta”-tables)
• Inserted_rows, Updated_rows, Deleted_rows
– Maintaining TE is straightforward (see book)
• EM5: Only check the assertion when a property holds in the TE
Execution Model 5
• TE property for
Admin_in_dept_with_vip:
1. Inserted_rows holds a president or a
manager
or,
2. Deleted_rows holds an admininstrator
3. Updated rows shows that ...
Execution Model 5
create trigger EMP_AIS_EMP01
after insert on EMP
declare pl_dummy varchar(40);
begin
-- If this returns no rows, then EMP01 cannot be violated.
select 'EMP01 must be validated' into pl_dummy
from DUAL
where exists
(select 'A president or manager has just been inserted'
from inserted_rows
where JOB in ('PRESIDENT','MANAGER'));
--
begin
--
<same trigger code as EM4>
--
end;
exception when no_data_found then
-- No need to validate EMP01.
null;
--
end;
Execution Model 5
create trigger EMP_ADS_EMP01
after delete on EMP
declare pl_dummy varchar(40);
begin
-- If this returns no rows, then EMP01 cannot be violated.
select 'EMP01 must be validated' into pl_dummy
from DUAL
where exists
(select 'An administrator has just been deleted'
from deleted_rows
where JOB = 'ADMIN');
--
begin
--
<same trigger code as EM4>
--
end;
exception when no_data_found then
-- No need to validate EMP01.
null;
--
end;
Execution Model 5
• Update TE-property for EMP01:
select 'EMP01 is in need of validation'
from DUAL
where exists
(select 'Some department just won a president/
manager or just lost an administrator'
from updated_rows
where (n_job in ('PRESIDENT','MANAGER') and
o_job not in ('PRESIDENT','MANAGER')
or (o_job='ADMIN' and n_job<>'ADMIN')
or (o_deptno<>n_deptno and
(o_job='ADMIN' or n_job in
('PRESIDENT','MANAGER')))
• Can be deduced from insert + delete properties
• EM5 fully declarative too? – Here it gets complex...
Execution Model 5
• Inefficiencies:
– Admin_in_dept_with_vip:
triggers validate all departments
• Unacceptable in real-world databases
• Only some require re-validation
– Cannot_manage_more_than_2:
triggers validate all department
managers
• Unacceptable in real-world databases
• Only some require re-validation
Execution Model 6
• On TE-property + optimized
validation query
– Use the TE-query to find:
• Which deptno-values require re-validation
• Which mgr-values require re-validation
– Then use these values in the assertion-
expression
create trigger EMP_AIS_EMP01
after insert on EMP
declare pl_dummy varchar(40);
begin
--
for r in (select distinct deptno
from inserted_rows
where JOB in ('PRESIDENT','MANAGER'));
loop
begin
-- Note: this now uses r.deptno value from preceeding TE-query.
select 'Constraint EMP01 is satisfied' into pl_dummy
from DUAL
where not exists(select ‘department in violation’
from (select distinct deptno from EMP where deptno = r.deptno) d
where exists(select e2.*
from EMP e2
where e2.DEPTNO = d.DEPTNO
and e2.JOB in ('PRESIDENT','MANAGER'))
and not exists(select e3.*
from EMP e3
where e3.DEPTNO = d.DEPTNO
and e3.JOB = 'ADMIN'));
--
exception when no_data_found then
--
raise_application_error(-20999,
'Constraint EMP01 is violated for department '||to_char(r.deptno)||'.');
--
end;
end loop;
end;
Execution Model 6
• This requires:
– Detecting that the ASSERTION can be (re)written as a universal quantification
– Can sometimes be done in multiple ways
• Which to choose?
– EM6 fully declarative? • Complexity introduced in EM5 further
increases
• Remember: given any arbitrary complex assertion + dml-statement
Still not there yet...
• Then there is something else too...
– Which is often overseen by database
professionals
– And, which is neglected in every
research paper (I’ve read...) that deals
with generating constraint validation
code
Serializability
Concurrent Transactions
• Deptno 13 has two admins and one
manager
– TX1 deletes an admin from 13
• Does not yet commit
– TX2 deletes the other admin 13
• Commits
– TX1 commits
• Constraint is violated for deptno 13!
– TX1 and TX2 must be serialized
Concurrent Transactions
• Note: this is *not* about locking rows of data, but rather: locking a constraint No two TX’s can validate at same time
• We can use DBMS_LOCK to serialize these transactions – See book for example code
• Again complexity further increases
Concurrent Transactions
• Concurrency impact of acquiring rule locks
– EM1: One TX at a time
– EM2: One TX per table at a time
– EM3, EM4, EM5 slowly relaxes • Up to EM5: not acceptable
– EM6: Only if two TX’s actually validate *and* involve same deptno (EMP01 assertion)
• Acceptable
Another complicating
factor
• Deferrabilty...
– Involves temporarily storing violation
cases for re-evaluation at commit time
– More comments on that in chapter 11
of the book
The Polite Excuse
• Inefficient EM’s
– Could be supported, but: • Are unacceptable wgt. Performance &
transaction concurrency
• Efficient EM’s
– Aligned with business requirements • Performance and TX concurrency
But,
– Need more research to determine if they could be declaratively supported