Page 1: HOW TO IMPLEMENT DATA INTEGRITY IN SQL SERVER Louis Davidson (drsql.org) drsql@hotmail.com.

Why did I choose data integrity as a topic?

• Answer 1
  • If I obviously lie to you, will you trust me?
  • If your data obviously lies to your customer, will they trust it?
  • For data to become information, it has to be as trustworthy as reasonably possible.

• Answer 2
  • If I were the judge and was convicting someone of poor data integrity, I would sentence them to write/maintain ETL
  • I wrote this slide at 12:49am, 8/14/2013 because I had to get up and fix a data integrity issue

First Line of Defense – Testing and Requirements

• First, know what your user wants (Requirements)

• Build queries to check the data is within tolerances as you build
  • Define both illegal values and exceptional values
  • Age for DayCare Student:
    • Legal: 1-8 Illegal: Everything else
    • Outside the norm, but perhaps possible: 1, 2, 6, 7, 8
  • Save these queries as you go

• Test during all phases of the project
  • Design
  • Development
  • Customer testing
  • Production

• Even if you ignore all of the advice that follows and let your tables go naked, these scripts can be used to verify data is within tolerances
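
A saved tolerance check for the daycare age rule might look something like the script below. This is only a sketch: the table DayCare.Student and its columns are hypothetical names, and missing ages are reported separately because the slide does not say whether an unknown age is legal.

```sql
-- Hypothetical schema: DayCare.Student (StudentId int, Age int NULL).

-- Illegal values: anything outside the legal range of 1-8.
SELECT StudentId, Age, 'Illegal' AS Finding
FROM   DayCare.Student
WHERE  Age < 1 OR Age > 8

UNION ALL

-- Exceptional values: legal, but outside the norm, so worth a second look.
SELECT StudentId, Age, 'Exceptional' AS Finding
FROM   DayCare.Student
WHERE  Age IN (1, 2, 6, 7, 8)

UNION ALL

-- Missing values: whether these are legal is a requirements question.
SELECT StudentId, Age, 'Missing' AS Finding
FROM   DayCare.Student
WHERE  Age IS NULL;
```

A script like this returns no rows when the data is clean, which makes it easy to rerun in every phase of the project.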

Requirements tell you what to test for, now WHERE/HOW to implement?

DB
• Classic client server: “May I save this data, please?”
• Very trustworthy data protection

Middle Tier, Rules Engine
• Built in tools that most programmers understand
• Flexible to change as users need them to

UI
• Friendly for the user
• Provide immediate feedback for nominal rules, limiting bandwidth utilization

No one place can satisfy well enough (But…)

DB
• No interactive protection, limit to 100% true rules
• Extremely limited flexibility

Middle Tier, Rules Engine
• Suffers under highly concurrent situations
• Difficult for Inter-row, Inter-table rules
• Difficult to use with tools like SQL, SSIS

UI
• Must be recoded for every form/screen
• Very limited rule set that can be enforced

Database Layer Responsibilities

• 100% Rules
  • Always true
  • Usually very simple rules
  • Failure to meet the prescribed condition would be harmful to the software (and possibly the users of the software)

• Other layers repeat some rules and implement everything else

Database tier layered approach

• Keep it simple

• Enforce integrity via (Our Agenda for the next 1hr)

• Structure - providing correct places to store data

• Keys - protecting uniqueness

• Relationships - foreign keys

• Domains - limiting data points to size/values that are legit

• Conditions - required situations (Customer may have only 1 primary address; No overlapping ranges; etc)

But We Don’t Want Errors from the Data Tier!

• A frequent concern of non-data tier programmers

• Even if you put no constraints you are apt to get errors
  • You are always likely to get deadlocks
  • And if your indexing isn’t great, you may get them frequently

• Best to code error handlers that handle any error condition regardless

• If the other tiers handle all of the errors, then the database protection should remain silent
  • Except perhaps during testing/coding
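
One possible shape for such a catch-all handler in T-SQL is sketched below. The table, column, and key values are made up for illustration, and THROW assumes SQL Server 2012 or later.

```sql
-- Hypothetical table and values; the handler pattern is the point.
BEGIN TRY
    BEGIN TRANSACTION;

    UPDATE dbo.Invoice
    SET    CustomerId = 42
    WHERE  InvoiceId = 1001;

    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    -- Undo any open work before deciding what to do with the error.
    IF XACT_STATE() <> 0
        ROLLBACK TRANSACTION;

    -- Error numbers a caller may want to treat specially:
    --   1205        deadlock victim (often worth a retry)
    --   547         FOREIGN KEY or CHECK constraint violation
    --   2627, 2601  duplicate key (constraint / unique index)
    -- Re-raise so the calling tier sees the original error.
    THROW;
END CATCH;
```

The same handler covers deadlocks and constraint violations alike, so adding constraints does not add a new category of error handling.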

Structure

• Match the user's needs precisely to the design with room for growth

• Getting design to match the user's needs will get you way down the road to integrity

• Normalization will usually get the car fueled up and started
  • Naming stuff well doesn’t hurt either…

• Getting it right can only be done by understanding the users’ requirements
  • I promise, no more requirement talk

If your structure is wrong… users will find a way

• Requirement: Store information about books

• What is wrong with this table?
  • Lots of books have > 1 Author.

• What are common ways users would “solve” the problem?
  • Any way they think of!

• What’s another common way someone might fix this?

BookISBN     BookTitle      BookPublisher    Author
===========  -------------  ---------------  -----------
111111111    Normalization  Apress           Louis
222222222    T-SQL          Apress           Michael
333333333    Indexing       Microsoft        Kim
444444444    DB Design      Apress           Jessica
444444444-1  DB Design      Apress           Louis

Author values users might type for book 444444444 instead: "Jessica, Louis", "Jessica & Louis", "Jessica and Louis"

Close, but still quite messy

• Add a repeating group?

• But now how to represent who was the primary author on the book?

BookISBN     BookTitle      BookPublisher    …  Author1      Author2      Author3
===========  -------------  ---------------     -----------  -----------  -----------
111111111    Normalization  Apress           …  Louis
222222222    T-SQL          Apress           …  Michael
333333333    Indexing       Microsoft        …  Kim
444444444    Design         Apress           …  Jessica      Louis

Now, the structure protects the data…

• And it gives you easy expansion

BookISBN     BookTitle      BookPublisher
===========  -------------  ---------------
111111111    Normalization  Apress
222222222    T-SQL          Apress
333333333    Indexing       Microsoft
444444444    Design         Apress

BookISBN     Author         ContributionType
===========  =============  ----------------
111111111    Louis          Principal Author
222222222    Michael        Principal Author
333333333    Kim            Principal Author
444444444    Jessica        Contributor
444444444    Louis          Principal Author
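
The two tables above could be declared roughly as follows. This is a sketch under assumptions: the table names (dbo.Book, dbo.BookAuthor), data types, lengths, and constraint names are not from the slides, and the CHECK constraint simply lists the two contribution types that appear in the sample rows.

```sql
CREATE TABLE dbo.Book
(
    BookISBN      varchar(20)  NOT NULL CONSTRAINT PKBook PRIMARY KEY,
    BookTitle     varchar(100) NOT NULL,
    BookPublisher varchar(50)  NOT NULL
);

-- One row per book/author pair, so a book can have any number of authors.
CREATE TABLE dbo.BookAuthor
(
    BookISBN         varchar(20) NOT NULL
        CONSTRAINT FKBookAuthor_Book
            FOREIGN KEY REFERENCES dbo.Book (BookISBN),
    Author           varchar(50) NOT NULL,
    ContributionType varchar(20) NOT NULL
        CONSTRAINT CHKBookAuthor_ContributionType
            CHECK (ContributionType IN ('Principal Author', 'Contributor')),
    -- The same author cannot be listed twice for one book.
    CONSTRAINT PKBookAuthor PRIMARY KEY (BookISBN, Author)
);
```

Adding a third or fourth author is now just another row, with no new columns and no creative punctuation in the Author value.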

Keys

• Defending against duplication of data where it oughtn't be duplicated

• Artificial Key (Identity/GUID/Sequence generated value) should NOT be the only key

• When employed, Artificial Key is for tuning, Natural Key is for the user

• Avoid giving users sequentially created values
  • Well, I am account 0000001, what about account 0000002?

Uniqueness Counts

• Requirement: Table of school mascots

• For a row to be truly unique, some manner of constraint needs to be on column(s) that have meaning

• It is a good idea to unit test your structures by putting in data that looks really wrong and see if it stops you, warns you, or something!

MascotId     Name         Color        School
===========  -----------  -----------  ----------------
1            Smokey       Black/Brown  UT
112          Smokey       Black/White  Central High
4567         Smokey       Smoky        Less Central High
979796       Smokey       Brown        Southwest Middle

~~~~~~~~~~~ ~~~~~~~~~~~

Key Constraints

• Applied to protect data from duplication
  • May help performance, but should exist even if never used for a query

• Part of the data structure – applied with ALTER TABLE – unlike indexes, which are generally attached for performance

• NULLs
  • Primary Key – No NULLs Allowed
  • Unique – NULL allowed, but treated as a single value (so only one NULL per single-column key)

• Table Clustering
  • Usually makes sense for the primary key to be clustered (not a hard and fast rule though)
  • Key constraints valuable with or without clustering
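
As an illustration, the mascot table from the earlier slide could carry both an artificial key and a natural key. The data types and constraint names are assumptions, and so is the choice of (Name, School) as the natural key: it assumes a school never has two mascots with the same name.

```sql
CREATE TABLE dbo.Mascot
(
    MascotId int         NOT NULL,
    Name     varchar(30) NOT NULL,
    Color    varchar(30) NOT NULL,
    School   varchar(50) NOT NULL
);

-- Artificial key: here also the clustering key.
ALTER TABLE dbo.Mascot
    ADD CONSTRAINT PKMascot PRIMARY KEY CLUSTERED (MascotId);

-- Natural key: the columns that have meaning to the user.
ALTER TABLE dbo.Mascot
    ADD CONSTRAINT AKMascot_NameSchool UNIQUE (Name, School);
```

With only the primary key in place, four rows named Smokey for the same school would all be accepted; the UNIQUE constraint is what rejects the duplicates.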

Demo – Key Constraints (and a wee bit more)

Relationships

• Establishes a connection between two tables

• Probably the most trouble to implement from outside of the database
  • Concurrent users means data can change
  • Caching all data is really costly (particularly to keep up to date with multiple caching servers for inserts, updates, and deletes!)

• Using foreign key constraints means these types of queries always return the same value (assuming InvoiceLineItem.InvoiceId does not allow NULL):
  • SELECT COUNT(*) FROM InvoiceLineItem
  • SELECT COUNT(*) FROM Invoice JOIN InvoiceLineItem ON Invoice.InvoiceId = InvoiceLineItem.InvoiceId

Foreign Key Constraints

• Like CHECK CONSTRAINTs, are part of the table structure

• One table can reference another’s PRIMARY KEY key columns, or even the UNIQUE key columns

• Indexing the child’s reference key can be helpful in many cases

• Usually extremely fast, even on very large tables
  • As long as the keys’ underlying indexes are maintained
  • For integer keys, a B-Tree index can search millions of rows in a few reads
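
A minimal sketch of the Invoice / InvoiceLineItem relationship, including an index on the child's reference column. Only the table names and InvoiceId come from the slides; the other column, the data types, and the constraint and index names are assumptions.

```sql
CREATE TABLE dbo.Invoice
(
    InvoiceId int NOT NULL CONSTRAINT PKInvoice PRIMARY KEY
);

CREATE TABLE dbo.InvoiceLineItem
(
    InvoiceLineItemId int NOT NULL CONSTRAINT PKInvoiceLineItem PRIMARY KEY,
    -- NOT NULL plus the foreign key: every line item belongs to a real invoice.
    InvoiceId         int NOT NULL
        CONSTRAINT FKInvoiceLineItem_Invoice
            FOREIGN KEY REFERENCES dbo.Invoice (InvoiceId)
);

-- SQL Server does not index the referencing column automatically;
-- this index helps joins and the checks done when a parent row is deleted.
CREATE INDEX IXInvoiceLineItem_InvoiceId
    ON dbo.InvoiceLineItem (InvoiceId);
```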

Foreign Key Cascading

• Can define cascading operations

• DELETE CASCADE – Deleting the parent deletes the children

• UPDATE SET NULL – Updating the parent key sets the child reference key to NULL

• DELETE SET DEFAULT – Deleting the parent row sets the child reference key to its default

• UPDATE NO ACTION – Fail if any child rows exist – THE DEFAULT

• Or other combinations of DELETE and UPDATE with CASCADE, SET NULL, SET DEFAULT, or NO ACTION

• DELETE CASCADE operations should be limited, to avoid surprises

• Use UPDATE CASCADE where you have updatable primary keys. Changing a primary key with references is messy.

• Multiple or Cyclic cascade paths require INSTEAD OF triggers or procedures to implement
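
The cascade options are declared on the foreign key itself. The sketch below uses made-up tables (dbo.Customer and dbo.CustomerPhone) with an updatable natural key, which is the case where UPDATE CASCADE is suggested above.

```sql
-- Hypothetical tables; CustomerNumber is a natural key that can change.
CREATE TABLE dbo.Customer
(
    CustomerNumber char(10) NOT NULL CONSTRAINT PKCustomer PRIMARY KEY
);

CREATE TABLE dbo.CustomerPhone
(
    CustomerNumber char(10)    NOT NULL,
    PhoneNumber    varchar(20) NOT NULL,
    CONSTRAINT PKCustomerPhone PRIMARY KEY (CustomerNumber, PhoneNumber),
    CONSTRAINT FKCustomerPhone_Customer
        FOREIGN KEY (CustomerNumber)
        REFERENCES dbo.Customer (CustomerNumber)
        ON DELETE NO ACTION   -- the default: deleting a referenced parent fails
        ON UPDATE CASCADE     -- key changes on the parent flow to the children
);
```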

Demo – Foreign Keys

Domains

• Defining the domain of an object or column
  • Table - Customers? All customers or certain types?
  • Column
    • Integer? Or Whole number between 0 and 10,000,000
    • True Unicode Value accepting 64K Characters? Or simple AlphaNumeric?
    • Can you accept 2GB of Text (varchar(max))?

• Goal 0% chance of defects
  • No situational intelligence
  • If there can be ANY variation, then the domain includes the variations
  • Can't fight users doing dumb stuff

Please don’t do this. Please?

CREATE TABLE object
(
    objectId uniqueidentifier,
    fillMeUp varchar(max)
)

Extreme Bucket Datatypes

• numeric(38,2)
  • Max value: 999,999,999,999,999,999,999,999,999,999,999,999.99
  • Bill Gates’s net worth: < $99,999,999,999.99
  • US National Debt + All personal Debt: < $99,999,999,999,999.99
  • For a nutty value: Distance to nearest galaxy in inches, yes, inches: ~74488200000000000000000.00

Extreme Bucket Datatypes - Strings

• varchar(8000)

• abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz

• That is just 780 characters!

• Note: If you allow N characters, your apps should minimally test for N (successfully), and N + 1 characters (error)
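
The N and N + 1 test can be tried directly against the database. This sketch uses a throwaway temp table with a made-up 10-character limit, and assumes the default ANSI_WARNINGS ON setting, under which an over-length insert raises an error instead of truncating silently.

```sql
-- Hypothetical temp table with a 10-character limit (N = 10).
CREATE TABLE #LengthTest (Value varchar(10) NOT NULL);

-- N characters: expected to succeed.
INSERT INTO #LengthTest (Value) VALUES (REPLICATE('a', 10));

-- N + 1 characters: expected to fail with a truncation error.
BEGIN TRY
    INSERT INTO #LengthTest (Value) VALUES (REPLICATE('a', 11));
    PRINT 'Unexpected: N + 1 characters were accepted';
END TRY
BEGIN CATCH
    PRINT CONCAT('Rejected, error number ', ERROR_NUMBER());
END CATCH;

DROP TABLE #LengthTest;
```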

On the other hand, don’t be over restrictive

“Why did they name me 555 95472? Now I can’t go to school because of the stupid school database!”

http://peanuts.wikia.com/wiki/File:555.jpg

Single Column Domain

• What data is EVER "legal" for a column

• Most data integrity issues are due to lack of domain control
  • Missperllings: TN, TNN, TENN, TENNESEE, TINNESEE
  • Bad values: -1 for Age, NULL for required value, Random default value chosen

• Implementation Includes
  • Intrinsic data type
  • Optionality (NULL v NOT NULL)
  • Default Value
  • Simple predicates
    • Check constraint
    • Domain table
  • Forcing the Issue: Trigger
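
Most of the items in that list can be seen in one small table definition. The sketch below is illustrative only: the tables (dbo.StateCode, dbo.Student), columns, and constraint names are assumptions, with the age range borrowed from the daycare example earlier in the deck.

```sql
-- Domain table: the only state codes that are ever legal.
CREATE TABLE dbo.StateCode
(
    StateCode char(2) NOT NULL CONSTRAINT PKStateCode PRIMARY KEY
);

CREATE TABLE dbo.Student
(
    StudentId    int     NOT NULL CONSTRAINT PKStudent PRIMARY KEY,

    -- Intrinsic data type + optionality + simple predicate (CHECK).
    Age          tinyint NOT NULL
        CONSTRAINT CHKStudent_Age CHECK (Age BETWEEN 1 AND 8),

    -- Domain table: TNN, TENN, and friends have no matching row, so they fail.
    StateCode    char(2) NOT NULL
        CONSTRAINT FKStudent_StateCode
            FOREIGN KEY REFERENCES dbo.StateCode (StateCode),

    -- Default value, used when the caller does not supply one.
    EnrolledDate date    NOT NULL
        CONSTRAINT DFLTStudent_EnrolledDate
            DEFAULT (CAST(SYSDATETIME() AS date))
);
```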

Multiple column

• Where the domain of one column is affected by the domain/value of another

• Examples:
  • If col1 = 1 then col2 in (1,2,3) else col2 in (3,4,5)
  • If col1 = 'bob' then col2 is NOT NULL

• Usually implemented with a CHECK constraint
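
Both example rules translate into CHECK constraints by rewriting "if A then B" as "(NOT A) OR B". In this sketch the table name is made up, and the second rule uses col3 and col4 (rather than col1 and col2) only because one table cannot hold col1 as both a number and a string.

```sql
CREATE TABLE dbo.MultiColumnExample
(
    col1 int         NOT NULL,
    col2 int         NOT NULL,
    col3 varchar(10) NOT NULL,
    col4 int         NULL,

    -- If col1 = 1 then col2 in (1,2,3), else col2 in (3,4,5).
    CONSTRAINT CHKMultiColumnExample_Col2ByCol1
        CHECK (   (col1 = 1  AND col2 IN (1, 2, 3))
               OR (col1 <> 1 AND col2 IN (3, 4, 5))),

    -- If col3 = 'bob' then col4 is NOT NULL.
    CONSTRAINT CHKMultiColumnExample_Col4RequiredForBob
        CHECK (col3 <> 'bob' OR col4 IS NOT NULL)
);
```

Because both constraints refer to more than one column, they have to be declared at the table level rather than attached to a single column.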

Multiple Column Concerns

• Minimize these conditions to only where necessary to avoid illogical/illegal data
  • RefusedToGiveBirthDateFlag = 1 AND BirthDate is null
  • Questionable: if DiscountPercent > .5, then ApproverUserId is not null.

• Likely Contraindicated - Processing Situations
  • The user enters Date1 always before Date2
  • The ship date must be after the order date

• Avoid domains based on data in other tables because data in other tables can shift, leading to messy situations
  • discountPercent > .5 and savingUser.needsApproverFlag = 1 then ApproverUserId is not null
  • What happens if you change/delete the user that is referenced in savingUser?

Check Constraints

• Applied to complete implementation of 99.9% of simple domains
  • May help performance because it gives the optimizer knowledge of the data

• Part of the data structure – applied with ALTER TABLE

• Simple predicate implementation
  • If any column allows NULL, the expectation is that NULL is an acceptable answer unless specifically coded for
  • Hence, to fail CHECK condition, the answer must be FALSE (unlike WHERE clauses that succeed only when the result is TRUE)
    • 1=1 TRUE – Acceptable for WHERE or CHECK
    • 1=NULL UNKNOWN – Passes a CHECK constraint on a nullable column, but does not satisfy a WHERE clause
    • 1=2 FALSE – Fails for both
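
The difference between CHECK and WHERE can be seen with a small experiment. The temp table and values below are made up; the constraint is left unnamed because named constraints on temp tables can collide between sessions.

```sql
CREATE TABLE #CheckDemo
(
    Age int NULL CHECK (Age BETWEEN 1 AND 8)
);

INSERT INTO #CheckDemo (Age) VALUES (5);     -- predicate is TRUE: accepted
INSERT INTO #CheckDemo (Age) VALUES (NULL);  -- predicate is UNKNOWN: accepted

BEGIN TRY
    INSERT INTO #CheckDemo (Age) VALUES (9); -- predicate is FALSE: rejected
END TRY
BEGIN CATCH
    PRINT ERROR_MESSAGE();
END CATCH;

-- WHERE keeps only rows where the predicate is TRUE,
-- so the NULL row is stored in the table but not counted here.
SELECT COUNT(*) AS RowsMatchingWhere
FROM   #CheckDemo
WHERE  Age BETWEEN 1 AND 8;

DROP TABLE #CheckDemo;
```

If NULL should not be an acceptable answer, declare the column NOT NULL or add "Age IS NOT NULL" to the predicate.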

Demo – Domains

Conditions

• Making sure that some condition is met reliably

• Examples
  • Row Modification Details
  • Overlapping Ranges

• Big decisions here
  • Non-trivial to implement
  • Feels natural to do it in non-data tier code

• However non-data tier code:
  • Can be less reliable
  • Can be greatly affected by concurrency

Tools

• Triggers
  • INSTEAD OF trigger to automatically maintain values
  • AFTER trigger to validate complex conditions that must be constantly true

• SQL
  • Optimistic Locking to avoid heavy locking without lost updates
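
A rough sketch of two of these tools follows: an AFTER trigger that rejects overlapping ranges, and an optimistic-locking update based on a rowversion column. The table dbo.ProductPrice, its columns, the error numbers, and the variable values are all hypothetical, and this is not the code from the demo. Under the default isolation level the trigger cannot see uncommitted rows from other sessions, so a production version would need to consider locking or isolation as well.

```sql
-- Hypothetical table: one row per product price period. The rowversion
-- column is changed by SQL Server automatically on every update.
CREATE TABLE dbo.ProductPrice
(
    ProductPriceId  int           NOT NULL CONSTRAINT PKProductPrice PRIMARY KEY,
    ProductId       int           NOT NULL,
    StartDate       date          NOT NULL,
    EndDate         date          NOT NULL,
    Price           numeric(10,2) NOT NULL,
    RowVersionValue rowversion    NOT NULL,
    CONSTRAINT CHKProductPrice_DateOrder CHECK (StartDate <= EndDate)
);
GO

-- AFTER trigger: reject any change that leaves overlapping ranges.
CREATE TRIGGER dbo.ProductPrice_NoOverlap
ON dbo.ProductPrice
AFTER INSERT, UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    -- Two ranges overlap when each one starts before the other ends.
    IF EXISTS (SELECT 1
               FROM   inserted AS i
                      JOIN dbo.ProductPrice AS p
                          ON  p.ProductId = i.ProductId
                          AND p.ProductPriceId <> i.ProductPriceId
                          AND p.StartDate <= i.EndDate
                          AND i.StartDate <= p.EndDate)
    BEGIN
        ROLLBACK TRANSACTION;
        THROW 50000, 'Price date ranges may not overlap for a product.', 1;
    END;
END;
GO

-- Optimistic locking: update only if the row is unchanged since it was read.
DECLARE @ProductPriceId int           = 1,
        @NewPrice       numeric(10,2) = 19.99,
        @RowVersionRead binary(8)     = 0x00000000000007D1; -- made-up value read earlier

UPDATE dbo.ProductPrice
SET    Price = @NewPrice
WHERE  ProductPriceId = @ProductPriceId
  AND  RowVersionValue = @RowVersionRead;

IF @@ROWCOUNT = 0
BEGIN
    THROW 50001, 'Row was changed or deleted by someone else.', 1;
END;
```

No locks are held between reading the row and updating it; a competing change is detected at update time instead of being silently overwritten.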

Demo – Protecting against Conditions

Performance Concerns…

• Most of what you will commonly need can be based on basic declarative integrity constraints

• By now, there will be some concern about performance

• Performance WILL be impacted
  • Done well: almost negligible
  • Done poorly: can cause lots of pain

• The next demo will do a non-scientific, single user job of showing the performance hit is noticeable, but not tremendous…

Demo – Performance

Summary

• Getting the structure correct is a great start towards data integrity

• Make sure column values are always within an acceptable tolerance so software doesn’t break

• Employ all of the tools SQL Server gives you to help ensure data integrity

• Use non-data tier software to ensure errors that return from the data tier are extremely rare

• The key word is: teamwork. You can’t do an adequate job of protecting data in the UI, Business/Object or Data tiers alone

Trust but verify

• Never stop testing the data, even into production

• Be vigilant
  • Test the structures to make sure constraints are not disabled and are trusted
  • Test data that is not constrained in a 100% manner
  • Use your slow periods wisely, running tests regularly

• Even 1 bad row that a customer notices means they may no longer trust the data…
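
One way to check for disabled or untrusted constraints is to query the catalog views, as sketched below. The views and columns are standard SQL Server metadata; the table and constraint named in the final statement are hypothetical.

```sql
-- FOREIGN KEY and CHECK constraints that are disabled or not trusted.
SELECT OBJECT_SCHEMA_NAME(parent_object_id) AS SchemaName,
       OBJECT_NAME(parent_object_id)        AS TableName,
       name                                 AS ConstraintName,
       type_desc                            AS ConstraintType,
       is_disabled,
       is_not_trusted
FROM   sys.foreign_keys
WHERE  is_disabled = 1 OR is_not_trusted = 1

UNION ALL

SELECT OBJECT_SCHEMA_NAME(parent_object_id),
       OBJECT_NAME(parent_object_id),
       name,
       type_desc,
       is_disabled,
       is_not_trusted
FROM   sys.check_constraints
WHERE  is_disabled = 1 OR is_not_trusted = 1;

-- Re-enable a constraint and re-validate the existing rows so that it is
-- trusted again (hypothetical table and constraint names).
ALTER TABLE dbo.InvoiceLineItem
    WITH CHECK CHECK CONSTRAINT FKInvoiceLineItem_Invoice;
```

An empty result from the query is the goal; any row it returns is a rule that the database is no longer guaranteeing.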