
ADM Extract v9.2

Mar 27, 2015


Vidhya Sankaran
Page 1: ADM Extract v9.2

Advanced Data Modeling extract © Clariteq Systems Consulting Ltd.

Clariteq ADM extract

© 2010 Clariteq, contact [email protected]

Extract from Clariteq’s workshop:

Advanced Data Modeling - Communication, Consistency, and Complexity

Alec Sharp, Senior Consultant, Clariteq Systems Consulting Ltd., West Vancouver, BC, Canada
Mobile – 604 418-3352, [email protected], www.clariteq.com

Proprietary material – please do not distribute!

Thanks, Alec

Clariteq ADM extract

Page 2: ADM Extract v9.2


“Thanks!” from Alec for participating!

• Me: [email protected]

• My company: www.clariteq.com

• My book: Workflow Modeling, Second Edition (a complete rewrite of the first edition, not just a minor refresh)

• Microblog: www.twitter.com/alecsharp

• Data Modeling blog: www.erwin.com/expert_blogs/authors/22/

Alec’s bio:

Alec Sharp, a senior consultant with Clariteq Systems Consulting, has deep expertise in a rare combination of fields – business process analysis and redesign, application requirements specification, and data modeling. With almost 30 years of hands-on consulting experience, his practical approaches and global reputation in model-driven methods have made him a sought-after resource in locations as diverse as Ireland, Illinois, and India.

He is also a popular conference speaker, mixing content and insight with irreverence and humour. Among his many top-rated presentations are “The Lost Art of Conceptual Modeling,” “The Human Side of Data Modeling,” “Crossing the Chasm - From Process Model to IT Requirements,” and “Getting Traction for Process – What the Experts Forget.”

Alec literally wrote the book on business process modeling – he is the principal author of “Workflow Modeling: Tools for Process Improvement and Application Development, Second Edition.” The first edition was published in 2001, and the second edition in 2009. It has consistently been the top-selling title on business process modeling, and is widely used as a consulting guide and as an MBA textbook.

Alec’s popular workshops on Workflow Process Modeling, Data Modeling (introductory and advanced), and Requirements Modeling (with Use Cases and Services) are conducted at many of the world’s best-known organizations. His classes are practical, energetic, and fun, with the most common participant comments being “best course (or best instructor) I’ve ever had.”

Page 3: ADM Extract v9.2


Clariteq courses for analysts

Workflow Process Modeling – Defining, Mapping, and Analyzing Business Processes 2 days

Business processes matter, because business processes are how value is delivered. Understanding how to work with business processes is now a core skill for business analysts, process and application architects, functional area managers, and even corporate executives. But too often, material on the topic either floats around in generalities and familiar case studies, or descends rapidly into technical details and incomprehensible models. This workshop is different – in a practical way, it shows how to discover and scope a business process, clarify its context, model its workflow with progressive detail, assess it, and design a new process. Everything is backed up with real-world examples, and clear, repeatable guidelines.

Data Modeling – A Business-Oriented Approach to Entity-Relationship Modeling 2 days

Data modeling is critical to the design of quality databases, but is also essential to other requirements techniques such as workflow modeling and requirements modeling (use cases and services) because it ensures a common understanding of the things – the entities – that processes and applications deal with. This workshop introduces entity-relationship modeling from a non-technical perspective, provides tips and guidelines for the analyst, and explores contextual, conceptual, and detailed modeling techniques that maximize user involvement.

Requirements Modeling – Proven Techniques for Use Cases and Service Specifications 2 days

Use cases have offered great promise as a requirements definition technique, but many analysts get disappointing results. That’s because published methods are often inconsistent, complex, or focused on internal design. This unique workshop clears up the confusion. It shows how to employ use cases to discover external requirements – how users wish to interact with an application – and how to use service specifications to define internal requirements – the validation, rules, and data manipulation performed behind the scenes. Better yet, it shows in concrete terms how the two perspectives interact, and demonstrates synergies with data modeling and business process workflow modeling.

Advanced Data Modeling – Communication, Consistency, and Complexity 2 or 3 days

After gaining some practical experience, data modelers encounter situations such as the enforcement of complex business rules, handling recurring patterns, satisfying regulatory requirements to capture complex changes and corrections, dealing with existing databases or packaged applications, integrating with dimensional modeling, and other issues not covered in introductory data modeling classes. This highly participative workshop provides approaches for many advanced data modeling situations, as well as techniques for improving communication between data modelers and subject matter experts.

Facilitation & Presentation – Session Techniques for Business Analysts 2 days

The primary approach for discovering and validating business requirements has shifted from one-on-one interviews to facilitated workshops. This began with JAD or “joint application development” sessions, and has now become the norm. Just as important as gathering information in a facilitated session are skills in presenting that information for validation and to inform a wider audience. While there are many general-purpose courses available on these topics, there is very little available that is specifically designed for the needs of the business analyst. This unique workshop will provide specific methods and techniques in both skills – facilitation and presentation.

Now available! Business Analysis Overview – Model-Driven Techniques for Processes, Applications, and Data 2 days

Essential content from Clariteq’s Process, Requirements, and Data Modeling workshops.

Page 4: ADM Extract v9.2


Seven typical problems

The problem… / Why it’s a problem…

1. Missing the point altogether – We’re designing businesses, not databases. (Think about it – do architects bring hammers and saws to their first meeting with a client?)

2. Starting with a data modeling lecture – You’ll turn potential participants into actual non-participants (by putting them to sleep).

3. Not investigating the “as-is” model – You need it to show how much better life will be with the “to-be,” and to know what has to be left in place, and what has to be converted or integrated.

4. Fear of asking “dumb” questions – You need to show that they’re the experts, and someone will be glad that you asked. (And besides, you never really know the business as well as you think you do.)

5. Not applying graphic principles – An ERD is a graphic; otherwise, why bother? (Maybe you could just give them the DDL.)

6. Getting stuck in a data modeling rut – You won’t get full participation, understanding, and buy-in, because for some of the folks, you aren’t using the right language.

7. Generalizing too much, too soon – You’re really just showing off; give us mere mortals a break! (Besides, your “elegant” model is probably wrong if no one can validate it.)

Page 5: ADM Extract v9.2


Seven positive behaviors

The behavior / What it means / For example…

1. Accessibility – Data modeling can be challenging enough to participate in, so make it easy for everyone to get involved. For example: “Just do it!” Don’t start with a lecture on data modeling.

2. Directionality – Like process models and org charts, data models are easiest to understand if they have a direction. For example: draw models so that dependency is visually obvious.

3. Simplicity – The forces of complexity are everywhere – resist them! Use simple techniques and frameworks, at least at the beginning. For example: use methods that let you start simple, and add detail in layers.

4. Consistency – Like children, adults learn from repetition: always do the same things the same way, and they’ll learn modeling by osmosis. For example: follow the same “script” whenever adding a new entity.

5. Visibility – It’s best if your clients spot the need for things like generalization, so be patient, and give them every chance. For example: draw models so that generalization, etc. are visually obvious.

6. Relevance – Data models can be quite abstract to many people, so “attach” concrete, relevant artifacts and issues to them. For example: use familiar “props” like forms or reports to illuminate models.

7. Plurality – Data modeling, and data model diagrams, appeal to some, but not all, so use other techniques to involve everyone. For example: use scenarios and narratives in addition to E-R diagrams.

And maybe…

8. Patience (is a virtue)

9. Humility (Don’t be afraid to ask! Spend more time saying “tell me more.”)

10. Empathy (Feel their pain! Put yourself in their shoes!)

Page 6: ADM Extract v9.2


What is a data model?

• A description of a business in terms of the things it needs to know about

• Things (Entities) and Facts about Things (Attributes & Relationships)

• “Real world”, not technical implementation

• Graham Witt – “A narrative supported by a graphic”

Key Point: Not the same as database design

[Figure: sample data model. Customer places Order (Order placed by Customer); Order Line contained in Order; Order Line specifies Product. Attributes shown: Customer (Customer ID, Name, Billing Address, Shipping Address, etc.); Order (Order ID, Placed Date, Delivery Date, Status, etc.); Order Line (Order ID, Product ID, Quantity, etc.); Product (Product ID, Description, Unit Price, etc.)]

Entity: a distinct thing of interest about which the business must maintain information

Relationship: a named association between two entities

Attribute: a property of an entity that can be expressed as a piece of data

Identifier: one or more attributes that can be used to uniquely specify a single instance (only in detailed data models)

Customer definition: A Customer is a person or organization that is a past, present, or potential user of our products or services. Excludes the company itself when we use our own products or services, but includes cases where the Customer doesn’t have to pay (e.g., a charity).

Plus “Assertions” (rules):
- Each Order must contain one or more Order Lines (i.e., at least one Order Line)
- Each Order Line is contained in exactly one Order
- Each Order can contain at most one Order Line per Product

There are many ways to describe a business...

• How it works – Process Model

• How it’s organized – Organization Chart

• Where it operates – Location Map

and…

• What it needs to maintain records about – Data Model

Data modeling symbols will vary slightly among the different “dialects”, but the meaning is constant. The symbols are much more standardized than they used to be.

Data modeling involves:

• Gathering knowledge from Subject Matter Experts (the hard part!)

• Representing that knowledge using a set of standard symbols and conventions (the easier part!)

Not just for Database Design anymore!
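The three assertions above are precise enough to check mechanically. As a minimal sketch, assuming a hypothetical in-memory representation (a list of Order IDs plus `OrderLine` records; neither comes from the workshop material), each assertion becomes one loop:

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class OrderLine:
    order_id: str
    product_id: str
    quantity: int


def check_order_assertions(order_ids: List[str], lines: List[OrderLine]) -> List[str]:
    """Return one message per violated assertion (an empty list means all hold)."""
    violations: List[str] = []
    known_orders = set(order_ids)
    lines_per_order = Counter(line.order_id for line in lines)

    # "Each Order must contain one or more Order Lines."
    for order_id in order_ids:
        if lines_per_order[order_id] == 0:
            violations.append(f"Order {order_id} has no Order Lines")

    # "Each Order Line is contained in exactly one Order." A single order_id
    # field already rules out "more than one", so only check that it exists.
    for line in lines:
        if line.order_id not in known_orders:
            violations.append(
                f"Order Line for Product {line.product_id} refers to unknown Order {line.order_id}"
            )

    # "Each Order can contain at most one Order Line per Product."
    per_product = Counter((line.order_id, line.product_id) for line in lines)
    for (order_id, product_id), count in per_product.items():
        if count > 1:
            violations.append(f"Order {order_id} has {count} Order Lines for Product {product_id}")

    return violations
```

For example, two lines for Product P1 on Order O1, an Order O2 with no lines, and a line pointing at a non-existent Order O3 each produce one message. The point is not the code itself but that a well-phrased assertion is testable.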

Page 7: ADM Extract v9.2


Entity types and conventions

Reference or Type – Independent. Classifies or categorizes other entities and/or allows the recording of allowable values for a descriptive attribute. Drawn diagonally out from or beside the classified entity.

Characteristic – Dependent on one parent. Records multi-valued facts about a parent entity that have been “cast out” from that entity. Drawn below the parent.

Associative – Dependent on two or more parents. Records facts about a relationship (association) between two or more parent entities; is often the resolution of a M:M relationship between the parents. Drawn between and below the parents.

Kernel – Independent. A fundamental thing of interest to the enterprise whose existence does not depend on any other entity; it can “stand alone”. Drawn at the top of its area.

Recursive relationship – A relationship between instances of the same entity. Can be 1:1, 1:M, or M:M.

Supertype – Contains facts (attributes and relationships) that are common to all instances of the entity. Any kind of entity can be a supertype.

Subtype – Contains facts that are specific to a particular subset of instances of the entity.
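The dependency rules above can be read as a small classification: no parents means kernel (or reference), one parent means characteristic, two or more means associative, and the top-down drawing convention places each dependent entity below its lowest parent. The sketch below assumes a hypothetical course model stored as a Python dictionary; it ignores supertypes, subtypes, and recursive relationships, and it does not attempt the "diagonally out from or beside" placement of reference entities.

```python
from typing import Dict, List, Optional, Set

# Hypothetical course model: each entity maps to the parents it depends on.
# Reference relationships (e.g., Course classified by Delivery Method) are left
# out on purpose, because a reference entity classifies rather than owns.
PARENTS: Dict[str, List[str]] = {
    "Course": [],
    "Student": [],
    "Delivery Method": [],
    "Section": ["Course"],
    "Section Lecture Time": ["Section"],
    "Enrollment": ["Student", "Section"],
}
REFERENCE_ENTITIES: Set[str] = {"Delivery Method"}


def classify(entity: str) -> str:
    """Apply the parent-count rule for entity types."""
    n = len(PARENTS[entity])
    if n == 0:
        return "reference" if entity in REFERENCE_ENTITIES else "kernel"
    return "characteristic" if n == 1 else "associative"


def drawing_level(entity: str, memo: Optional[Dict[str, int]] = None) -> int:
    """Row from the top: independent entities at 0, dependents below their lowest parent."""
    memo = {} if memo is None else memo
    if entity not in memo:
        parents = PARENTS[entity]
        memo[entity] = 0 if not parents else 1 + max(drawing_level(p, memo) for p in parents)
    return memo[entity]
```

Here Enrollment lands on row 2, below Section (row 1), which in turn sits below Course and Student (row 0), so dependency reads top to bottom.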

Page 8: ADM Extract v9.2


Three types of data models

Different levels of detail support different perspectives

Type of data model – The need

1. Contextual (Scope) – Agreement on “big picture” and vocabulary for process or subject

2. Conceptual (Overview) – Agreements on basic concepts, more vocabulary, and rules

3. Logical (Detail) – Excruciating detail for physical design

Upper levels often lost because…
• Tool provided no support
• Didn’t know they were important
• Started at too low a level

Remember…
• Maintain SME involvement
• Get maximum value from the technique

Page 9: ADM Extract v9.2


Summary – data model types

1. Contextual (Scope)
• Agreement on “big picture”, main terms and definitions
• May be a simple block diagram, or primarily textual – a list
• Optional – not necessary on smaller projects
• Later in this course, we’ll look at some important techniques for dealing with contextual models

2. Conceptual (Overview)
• Agreements on basic concepts and rules

3. Logical (Detail)
• Excruciating detail for physical design

Main differences

Conceptual (Overview):
• Ensures that everyone is on the same wavelength before diving into the details
• Overview: main entities, attributes, and relationships
• Lots of M:M relationships
• Relationships show multiplicity
• No keys
• No reference entities except where they are “structural”
• Many attributes will be non-atomic and multi-valued
• Verified by direct inspection
• A “one-pager”
• 20% of the modeling effort

Logical (Detail):
• Provides all detail for first-cut physical database design and requirements specification
• Detailed: ~5 times as many entities as the conceptual model
• M:M relationships resolved
• Relationship optionality added
• Primary, foreign, alternate keys
• Lots of reference entities
• Fully normalized – no multi-valued, redundant, or non-atomic attributes. All attributes defined and “propertized”
• May be verified by other means: sample data, report mockups, …
• May be partitioned
• 80% of the modeling effort

Note that across the industry, there is a lack of consistency in defining these types of models. In the “Zachman Framework” these would be the planner’s, owner’s, and designer’s views.

Analogies:

- The contextual model is like the site plan with a definition of what will be built. The focus is scope or “footprint.”

- The conceptual model is like a floor plan and sketches for a building. The focus is the essential terms, definitions, and facts / rules.

- The logical data model is like the detailed blueprints for a building. The focus is on the individual data items the enterprise needs, and the rules that govern them.

A basic message for conceptual modeling: “Resist the urge to normalize or generalize until it matters!!!”

The logical model is not necessarily the “as built” model – the physical database design. The database designer or DBA will make changes in the interest of performance, recoverability, distribution, etc.

Everyone who currently supports an application should:

- draw the application’s logical data model following strict top-down drawing conventions

- abstract the model “up” to a conceptual data model

- at least consider reviewing the conceptual model with analysts, developers, and subject matter experts to ensure that it reflects the intentions of the business.

Page 10: ADM Extract v9.2


Contextual data model

• A list of the main topics (“subject areas”) in scope, and an associated vocabulary or glossary

• The glossary may include items other than entities, e.g., processes, transactions, industry terminology, Key Performance Indicators (KPIs), etc.

• Primarily textual; optionally, a diagram showing the topics and their interrelationships

Main use: “Do we understand the scope and the main terms?”

Page 11: ADM Extract v9.2


Conceptual data model

• Shows main or core entities, relationships, and attributes

• Gets the “concept” across

• Great for communication, but not for database design

• Best done before any significant process modeling or application requirements (use cases and service specifications)

Let’s see what happens when we take these three entities to the “Logical” level...

The conceptual model is the “crossroads” at which both business and IT can communicate – both parties have “shared accountability” to ensure that there is a common understanding of the basics.

As you add detail, your conceptual model will evolve into a logical data model, but don’t lose the conceptual view!!! It is an absolutely vital tool for presentations, training, and so on.

After logical data modeling, the next stage in the progression is to turn your logical data model into a physical database design for your particular implementation environment (MS Access, SQL Server, Oracle, DB2, etc.)

Page 12: ADM Extract v9.2


Logical data model

• All necessary detail – it’s the data specifications

• Input to first-cut physical database design

• Completed after use cases and service specifications are finalized

This could be made even more detailed

• we haven’t shown entities like “Semester”, “Building”, or “Room”

• we haven’t shown reference entities like “Course Method” or “Degree Level”

Page 13: ADM Extract v9.2


From conceptual to initial logical

The progression from conceptual to logical is largely based on identifying and dealing with three attribute characteristics:

• Multi-valued – the attribute can have multiple different values for one instance of the entity, either “at a time” or “over time.” E.g., “Employee Name” if aliases or previous names are tracked.
  - move it down to the “many” end of a 1:M relationship into a characteristic entity
  - if it’s a fact about a M:M relationship between entities, move it down to the “many” end of a 1:M relationship into an associative entity
  - both moves put the data structure into First Normal Form (1NF)

• Redundant – the same attribute value is recorded multiple times, in different entity instances, possibly inconsistently. E.g., “Company Name” in a “Department” entity.
  - move it up to the “one” end of a M:1 relationship to one of the parent (or higher) entities (Second Normal Form, 2NF)
  - you might have to create a new parent entity where none existed before

• Constrained – a descriptive attribute needs to be restricted to a set of standardized values to improve integrity and reporting. E.g., “Employee Type”.
  - move it out to the “one” end of a M:1 relationship to a reference or other related entity (Third Normal Form, 3NF)

For multi-valued attributes, ask “On what basis does the attribute repeat?” The answer should be in the form “It occurs once per …”, which provides a clue as to which entity the multi-valued attribute should be moved to.

Two variations of the same example:

- If a Resource has multiple Chargeout Rates over time, then the Chargeout Rate doesn’t vary in relation to some other entity. We could say that the Chargeout Rate attribute repeats “within” the Resource entity, so we’ll simply move it down (“cast it out”) into a characteristic entity called Resource Chargeout Rate. It will need the attributes Effective Date and End Date in addition to Amount.

- If a Resource has multiple Chargeout Rates, one per Project that the Resource is contracted to, then we could say that the Chargeout Rate attribute repeats “in relation to” the Project entity. In other words, Chargeout Rate is a fact about the relationship between Resource and Project, and belongs in an associative entity between them. That associative entity may depict a contract or agreement, and might have the word “Contract” in its name.

Another example:

- If the attribute Expected Duration is in the Project entity, and it is multi-valued, with one value per project phase, then Expected Duration should be moved down into a characteristic (of Project) entity called Project Phase. The Task entity would likely be a characteristic of Project Phase.
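The two Chargeout Rate variations lead to structurally different entities: one keyed by Resource plus Effective Date, the other by Resource plus Project. A minimal sketch, assuming Python dataclasses as a stand-in for entities, an inclusive End Date, and `None` meaning "still in effect" (the class, function, and field names are illustrative assumptions, not part of the original example):

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class ResourceChargeoutRate:
    """Characteristic of Resource: the rate repeats over time "within" a Resource."""
    resource_id: str
    effective_date: date
    end_date: Optional[date]  # None = still in effect; assumed inclusive ("through")
    amount: float


@dataclass(frozen=True)
class ResourceProjectContract:
    """Associative of Resource and Project: one rate per Project the Resource is contracted to."""
    resource_id: str
    project_id: str
    amount: float


def rate_on(rates: List[ResourceChargeoutRate], resource_id: str, on: date) -> Optional[float]:
    """Answer "what was this Resource's rate on this date?" (variation 1)."""
    for r in rates:
        in_range = r.effective_date <= on and (r.end_date is None or on <= r.end_date)
        if r.resource_id == resource_id and in_range:
            return r.amount
    return None


def rate_for_project(
    contracts: List[ResourceProjectContract], resource_id: str, project_id: str
) -> Optional[float]:
    """Answer "what does this Resource charge on this Project?" (variation 2)."""
    for c in contracts:
        if c.resource_id == resource_id and c.project_id == project_id:
            return c.amount
    return None
```

The question each function answers is the "It occurs once per …" answer in code form: once per period for the characteristic, once per Project for the associative.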

Page 14: ADM Extract v9.2


Migrating multi-valued attributes

Attributes can’t repeat within an entity – “repeating” or “multi-valued” attributes are moved into a characteristic entity.

For each Section, there can be several Lecture times; depending on the type of Course, there may be none.

For each Section, there can be one or more Tutorial times. There will always be at least one.

We must move each “repeating group” into a child entity.

Note – later, we’ll discuss the inclusion of primary keys and the added relationship symbols.

This is one of the rules for normalization – entities are in First Normal Form once all the repeating attributes or groups of attributes have been sent (“cast out”) to their own entities.
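Casting out a repeating group can be shown on plain records. This sketch assumes a hypothetical dictionary-based Section record whose `lecture_times` list is the repeating group; the helper `cast_out` is an illustration, not a tool from the workshop.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def cast_out(parents: List[Row], key: str, repeating_attr: str, child_attr: str) -> Tuple[List[Row], List[Row]]:
    """Split a repeating group into one child row per value (moving toward 1NF)."""
    normalized_parents: List[Row] = []
    children: List[Row] = []
    for row in parents:
        # Each child row carries the parent's key, so it depends on that parent.
        for value in row.get(repeating_attr, []):
            children.append({key: row[key], child_attr: value})
        # The parent keeps everything except the repeating group.
        normalized_parents.append({k: v for k, v in row.items() if k != repeating_attr})
    return normalized_parents, children
```

A Section with no Lecture times simply contributes no child rows, which matches the rule that some Course types have none; Tutorial times would be cast out the same way into their own characteristic entity.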

Page 15: ADM Extract v9.2


Migrating attributes of relationships

When the multi-valued attribute is actually a fact about a relationship, we create an associative entity:

“When did John Smith enroll in Math 100?” “What grade did John get at midterm?” “What was his final grade?” “What is the average grade for Math 100 Section 3?”

These required facts are not about Student, or Section, but about the relationship between a Student and a Section.

We need to create a new associative entity.

“Many to many” relationships will almost always get a “promotion” to an entity, as in the example above, because there are usually attributes about the relationship that must be recorded.

This is a variation on putting data into First Normal Form.
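All four questions are answered by an associative entity that carries one row per Student per Section. As a sketch, with the hypothetical name `Enrollment` and illustrative attribute names (the workshop slide does not name them):

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Enrollment:
    """Associative of Student and Section: facts about the relationship itself."""
    student_id: str
    section_id: str
    enrolled_on: date
    midterm_grade: Optional[float]
    final_grade: Optional[float]


def find_enrollment(enrollments: List[Enrollment], student_id: str, section_id: str) -> Optional[Enrollment]:
    """Answers "When did John enroll?" and "What grades did he get?"."""
    for e in enrollments:
        if e.student_id == student_id and e.section_id == section_id:
            return e
    return None


def average_final_grade(enrollments: List[Enrollment], section_id: str) -> Optional[float]:
    """Answers "What is the average grade for this Section?", ignoring ungraded rows."""
    grades = [e.final_grade for e in enrollments if e.section_id == section_id and e.final_grade is not None]
    return sum(grades) / len(grades) if grades else None
```

None of these attributes could live in Student or Section alone without repeating, which is the signal that the M:M relationship deserves its "promotion".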

Page 16: ADM Extract v9.2


Migrating redundant attributes

We eliminate redundancy by ensuring that every attribute is in the entity that it describes, so that the attribute value is recorded only once.

• Before migration, attribute values about a Department would be recorded redundantly with every Course offered by that Department, so it is moved up to a parent entity.

• Before migration, values of the Delivery Method Description attribute would be carried redundantly in many instances of Course, so it is moved out to a “type” (or “reference” or “lookup” or “classification”) entity.

Eliminating redundancy puts entities into Second Normal Form if the redundant attributes move “up” the parentage hierarchy, and into Third Normal Form if the attributes move “out” to a related entity (often a “type” entity).
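Moving a redundant attribute "up" or "out" amounts to pulling the repeated facts into one record per key. The sketch below assumes hypothetical dictionary records and a helper named `move_redundant`; it also surfaces the "possibly inconsistently" case by refusing to merge conflicting values recorded for the same key.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def move_redundant(rows: List[Row], key_attr: str, dependent_attrs: List[str]) -> Tuple[Dict[object, Row], List[Row]]:
    """Move attributes that depend only on `key_attr` into a lookup keyed by it."""
    lookup: Dict[object, Row] = {}
    reduced: List[Row] = []
    for row in rows:
        key = row[key_attr]
        facts = {a: row[a] for a in dependent_attrs}
        # Redundant copies must agree; conflicting copies are exactly the
        # "possibly inconsistently" problem, so stop and report them.
        if key in lookup and lookup[key] != facts:
            raise ValueError(f"Conflicting values for {key_attr}={key!r}: {lookup[key]} vs {facts}")
        lookup[key] = facts
        # The original row keeps only the key that now refers to the lookup.
        reduced.append({k: v for k, v in row.items() if k not in dependent_attrs})
    return lookup, reduced
```

Applied once with `dept_code` the Department facts move up to a parent (2NF); applied again with a delivery method code the description moves out to a reference entity (3NF). The mechanics are the same; only the kind of entity receiving the facts differs.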

Page 17: ADM Extract v9.2


World’s shortest course on normalization

• Unnormalized (UNF or 0NF)
  - Contains a “repeating group”

• First Normal Form (1NF)
  - Repeating attributes moved down to Characteristic or Associative entities

• Second Normal Form (2NF)
  - Only applies to dependent entities
  - No attributes in a child entity are really facts about a parent (or grandparent or…)
  - That is, no Characteristic or Associative entity redundantly contains facts from its parent(s); if it does, move the fact(s) up (create a new parent entity if necessary)

• Third Normal Form (3NF)
  - If any entity redundantly contains facts from a related (non-parent) entity, move the fact(s) out to the other entity (create a new entity if necessary)

• BCNF (Boyce-Codd NF)
  - Not an issue if you keep your wits about you

• Fourth and Fifth Normal Form (4NF, 5NF)
  - “Large” (3-way or more) associatives need to be broken down into more granular entities

UNF → 1NF → 2NF → 3NF → 4NF, 5NF?...

Other normal forms – forget about it!

The reason we’re covering this? You have to be able to make it simpler for the data “layperson.”
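The 2NF test ("no attributes in a child entity are really facts about a parent") can be probed against sample data by checking whether an attribute is determined by only part of a compound key. A sketch under the assumption that rows are dictionaries and the function names are hypothetical; note that sample data can only disprove a dependency, never prove one, so every suspect it reports is a question for the subject matter expert rather than a verdict.

```python
from itertools import combinations
from typing import Dict, List, Sequence

Row = Dict[str, object]


def determines(rows: List[Row], determinant: Sequence[str], dependent: str) -> bool:
    """True if, in these rows, each determinant value maps to only one dependent value."""
    seen: Dict[tuple, object] = {}
    for row in rows:
        key = tuple(row[a] for a in determinant)
        value = row[dependent]
        # setdefault returns the first value recorded for this key.
        if seen.setdefault(key, value) != value:
            return False
    return True


def partial_dependencies(rows: List[Row], key: Sequence[str], attrs: Sequence[str]) -> List[str]:
    """Attributes that appear to depend on only part of a compound key (2NF suspects)."""
    suspects: List[str] = []
    for attr in attrs:
        for size in range(1, len(key)):
            if any(determines(rows, subset, attr) for subset in combinations(key, size)):
                suspects.append(attr)
                break
    return suspects
```

On Order Line rows keyed by Order ID plus Product ID, a Product Description that repeats with every line would be flagged, suggesting it be moved up to Product; Quantity, which genuinely varies per line, would not.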

Page 18: ADM Extract v9.2


Script – adding a dependent entity

An “orderly script” – adding a new characteristic or associative entity to a logical model:

1. Place the entity (and relationships) on the diagram according to dependency

2. Ask “What is one of these things?” then name and define the entity accordingly

3. Add relationship names, and add multiplicity (or confirm, if it was already specified)

4. Add attributes

5. Perform further attribute migration, dealing with multi-valued attributes first, and reference data last (1NF, 2NF, 3NF in sequence)… and only then worry about…

6. Relationship optionality

7. Primary keys or uniqueness constraints

8. Additional constraints (e.g., rules on date ranges)

Whenever you add a new entity:

• check to see if attributes or relationships from nearby entities should be moved to the new entity

• check that you haven’t introduced transitivity (clue: “loops”)
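The "loops" clue can be checked mechanically by treating entities as nodes and relationships as undirected edges: any relationship that joins two entities already connected by another path closes a loop. This is a standard union-find cycle check, shown as a sketch with hypothetical relationship pairs. A loop is only a clue: some loops are legitimate, and the modeler still has to decide whether the closing relationship is derivable from the others.

```python
from typing import Dict, List, Tuple


def loop_closing_relationships(relationships: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return each relationship whose two entities were already connected by earlier ones."""
    parent: Dict[str, str] = {}

    def root(entity: str) -> str:
        parent.setdefault(entity, entity)
        while parent[entity] != entity:
            parent[entity] = parent[parent[entity]]  # path halving keeps lookups short
            entity = parent[entity]
        return entity

    closing: List[Tuple[str, str]] = []
    for a, b in relationships:
        ra, rb = root(a), root(b)
        if ra == rb:
            closing.append((a, b))  # a and b were already linked: this closes a loop
        else:
            parent[ra] = rb  # merge the two connected groups
    return closing
```

For instance, a direct Customer to Order Line relationship added alongside Customer places Order and Order Line contained in Order is reported, prompting the question of whether it is simply transitive. A recursive relationship (the same entity at both ends) is also reported, since it is a loop by definition.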

Consistency is very important to engaging your clients in the data modeling process. Have a method, or have scripts – do the same things the same way, and draw the same things the same way. If you do this, participants will learn modeling “by osmosis” and will learn what to expect (e.g., that a M:M relationship will eventually get resolved).

Page 19: ADM Extract v9.2


Seven questions for date ranges and dates

For records dependent on the same parent…

1. Can there be gaps between date ranges of adjacent (in time) records?

2. Must the date ranges be contiguous (no gaps)?

3. Can the date ranges overlap?

For any date range…

4. Can a date range begin in the future?

5. Is a date range inclusive or exclusive of the End Date? (“until” or “through”?)

6. Must a date range fit within the date range of a parent entity?

7. Will the dates have to handle global time zones?

Note that in this example, we could ask the questions for both date ranges:

- Effective / End Date

- Recorded / Corrected Date

To clear up confusion around question 5, some organizations have standardized on “Last Valid Date” instead of “End Date.”
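Once the answers are known, several of the seven questions (1, 2, 3, 5, and 6) can be turned into checks. This sketch assumes the answer to question 5 is "through" (End Date inclusive) and represents an open-ended range as `None`; both are assumptions for illustration, as are the function names. It does not address question 4 (future-dated ranges) or question 7 (time zones).

```python
from datetime import date, timedelta
from typing import List, Optional, Tuple

# (effective_date, end_date): end_date is treated as INCLUSIVE ("through"),
# and None means the range is open-ended. Both are assumptions of this sketch.
DateRange = Tuple[date, Optional[date]]


def _end(r: DateRange) -> date:
    # Stand in for an open-ended range with the largest representable date.
    return date.max if r[1] is None else r[1]


def gaps_and_overlaps(ranges: List[DateRange]) -> Tuple[List[Tuple[date, date]], List[DateRange]]:
    """Compare ranges belonging to the same parent (questions 1 to 3)."""
    gaps: List[Tuple[date, date]] = []
    overlaps: List[DateRange] = []
    covered_through: Optional[date] = None  # latest end date seen so far
    for r in sorted(ranges, key=lambda r: r[0]):
        start = r[0]
        if covered_through is not None:
            if covered_through >= start:
                overlaps.append(r)  # starts on a day that is already covered
            elif covered_through + timedelta(days=1) < start:
                gaps.append((covered_through + timedelta(days=1), start - timedelta(days=1)))
        covered_through = _end(r) if covered_through is None else max(covered_through, _end(r))
    return gaps, overlaps


def fits_within(child: DateRange, parent: DateRange) -> bool:
    """Question 6: does the child's range sit inside its parent's range?"""
    return parent[0] <= child[0] and _end(child) <= _end(parent)
```

Contiguity (question 2) holds exactly when the returned gap list is empty. Under "until" (exclusive) semantics, the `timedelta(days=1)` adjustments would change, which is why question 5 has to be settled before any such rule is written.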

Page 20: ADM Extract v9.2


Script – meeting a new requirement…

1) State the new requirement as an assertion
• Start out using the client’s language
• Then, ensure that the assertion uses terms from the data model (entity names, relationship names, etc.). This “leads” you to the solution.
• Confirm it!

2) Develop a conceptual solution
• Look for the simplest option first: no change needed, a new reference attribute, a multi-valued attribute(s), M:M relationship, new entity
• Explore rules, like “what is the basis for multi-valued?”
• Confirm it!

3) Develop a logical solution
• Fully normalized, fully attributed
• Follow an “orderly script” – don’t get ahead of yourself or the client
• Confirm it!, possibly using other easy-to-follow formats such as screen or report mock-ups.

Confirm and extend the model:
• discover new requirements, using a variety of techniques

Philosophy
• don’t dive in – start simple, add detail in layers
• start out in “natural language”

Issues in meeting new requirements:

The original modeler moves on, often without properly documenting the model, and subsequent modelers don’t really understand the conceptual underpinnings of the model.

The requirement is not confirmed with the subject matter expert, often because techniques like narrative assertions or concrete examples are skipped in favour of jumping too quickly into the details (keys, normalization, detailed attributes, reference data, etc.)

When dealing with new requirements, the modeler/DBA works at the physical level instead of at the conceptual level. The result is a tendency to “bolt on” new tables (entities) rather than properly “building in” the new requirement, which results in more complexity than is really needed.

Page 21: ADM Extract v9.2


Refining the logical

As the model nears completion, the entities have been made as granular (normalized) as necessary. Once the model meets known requirements, we’ll also “granularize” the attributes by finding and resolving the following:

• Non-atomic attributes: The attribute has “internal structure” – it could be decomposed into more granular (“atomic”) attributes. E.g., “Employee Address” is non-atomic; “Employee Address Street Name” is atomic – it is at the finest level of granularity that will ever be manipulated or displayed.

• Semantically overloaded attributes: The attribute is “overworked” – it contains multiple different attributes, typically encoded into a single attribute.
  - in the earlier days of systems, this was done deliberately by designers to save space (think of the Y2K problem…)
  - now, it is more likely to be done inadvertently by business people who don’t know the negative consequences of overloaded coding schemes

Finally, name and define attributes, and document attribute properties.

The distinction between non-atomic and semantic overload can be confusing:

A non-atomic attribute needs to be broken down into finer attributes, each of which is a “smaller” part of the same overall attribute. See page 36 for more information and examples.

A semantically overloaded attribute also needs to be broken down, but into distinctly different attributes as opposed to smaller pieces of the same attribute. See page 57 for more information and examples.

Note – we don’t typically do this until after we’ve searched for, discovered, and satisfied outstanding requirements using the techniques that we’ll look at shortly.
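A minimal Python sketch may make the distinction concrete. The attribute names, the sample address, and the layout of the overloaded “Account Code” are all hypothetical, chosen only for illustration: decomposing the non-atomic address yields smaller pieces of one fact, while decoding the overloaded code yields several different facts.

```python
from dataclasses import dataclass


@dataclass
class EmployeeAddress:
    """A non-atomic "Employee Address" broken into atomic attributes.

    Each field is a smaller piece of the same overall fact: where the employee lives.
    """
    street_number: str
    street_name: str
    city: str
    postal_code: str


@dataclass
class AccountCodeParts:
    """Distinctly different facts that a hypothetical overloaded code packs together."""
    region_code: str
    status_code: str
    sequence_number: int


def decode_account_code(code: str) -> AccountCodeParts:
    """Split a hypothetical 7-character code: 2-char region, 1-char status, 4-digit sequence."""
    if len(code) != 7 or not code[3:].isdigit():
        raise ValueError(f"unexpected account code format: {code!r}")
    return AccountCodeParts(
        region_code=code[:2],
        status_code=code[2],
        sequence_number=int(code[3:]),
    )


address = EmployeeAddress("123", "Main Street", "Springfield", "A1A 1A1")
parts = decode_account_code("BCA0042")
print(parts)  # AccountCodeParts(region_code='BC', status_code='A', sequence_number=42)
```

Once decoded, each part would normally become its own attribute (often a relationship to a reference entity such as Region), so changing an account’s status no longer means rewriting a code that also carries its region.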

Page 22: ADM Extract v9.2

A natural progression

Get into the high “value-added” space:
• Contextual – helpful for large models
• Conceptual – a great way to add value
• Improve communication among all players
• Highlight disconnects – terms, rules, scope, …

Contextual – Focus: scope; context and boundaries, glossary of main terms and definitions
Conceptual – Focus: overview; business perspective, all terms and definitions, overall structure, major facts and rules
Logical – Focus: detail; all facts, detailed rules, input to 1st cut physical design
Physical DB Design – The “Danger Zone”: analysts shouldn’t worry about physical design issues while data modeling.

(Diagram arrows: “traditional modeling and development” and “reverse engineering”.)

Page 23: ADM Extract v9.2

Three phases in data modeling

1) Establish initial Conceptual Data Model
• Focus is on developing a core set of entities:
  • named
  • defined
  • minimally attributed
  • bound by basic rules and relationships
  • placed on an ERD
• Might start bottom-up: brainstorm details, then synthesize “up”
• Might start top-down: build a contextual model, then flesh out required details, analyzing “down”
• Experiment with alternatives
• Refine the contextual model, if you had one

2) Develop initial Logical Data Model
• Focus shifts to attribute rigor and structure when going to the logical level
• First check attributes for:
  • completeness
  • necessity
  • name and definition
  • placement
• Resolve attributes that are:
  • multi-valued
  • redundant
  • constrained
• Continue experimenting with alternate structures
• Refine conceptual model

3) Refine & extend Logical Data Model
• Focus is on refinement, and validation via new requirements using…
  • …an event-based approach: fast and easy…
  • …or full business analysis:
    • process workflow model
    • use cases (external)
    • service specs (internal)
    • informational needs
  • Profiling existing data
• Resolve attributes that are semantically overloaded, non-atomic, or derived
• Document attribute properties and validation
• Specify identifiers
• Refine conceptual model

Of course, step 0) is to establish Project Scope and Objectives.

We covered all of the previous material so that you’ll be able to simplify some of these techniques for others.

Page 24: ADM Extract v9.2

Reminder – the four Ds of data modeling

Definition
• “What is one of these things?”
• List common and unusual instances
• “Are there any known anomalies?”
• “What are the potential differences of opinion?”

Dependency
• “What type of entity is this?”
• “What other entity does it depend on?”
• Essentially, is it a free-standing thing, a type of thing, or repeating detail about some other thing?

Detail
• Keep it in its place!
• GEFN! HPDL!

Demonstration
• Sample instances
• Schematics
• Props

Page 25: ADM Extract v9.2

Entity definition basics

Key Point: “What is one of these things?”

Definitions must focus on what a single instance is:
• Not “how they’re used” or “how they’re created” or “why we care” or “how the process works” or “interesting problems and tidbits” etc.
• Ask “What is one of these things?”

The most useful questions:

“Can anyone think of examples that might surprise someone else – that is, anomalies or potential sources of confusion?” E.g., to define “Customer”:
• “In our area, other divisions are treated as customers.”
• “We record recipients of charitable donations as customers.”

“Could we list some examples?”
• Rita Smith, Acme Auto, Ministry of Finance, homeowners… (aha!)

“Does this deal with ‘kinds of things’ or ‘specific things’?”
• “kind” – Customer Category vs. “specific” – an individual Customer
• if it’s a specific thing, still ask if there are recognized types (e.g., Personal, Corporate, Government; Lead, Prospect, Active)

The entity definition tells which things in the real world are included within our understanding of that entity. For instance:
• The world has hundreds of millions of people who are “students”
• Which ones would we expect to find in a specific university’s Student database?
• Which ones would be excluded?

Two other useful questions:
• Are there life cycle issues to consider? For instance, Applicant to Candidate to Employee to Retiree – does “Employee” include “Applicant” and “Retiree”?
• Does the same real-world thing appear as multiple entities? E.g., one person could be a “Driver,” a “Registered Vehicle Owner,” and a “Legal Vehicle Owner.” If this is of interest, you might need to “generalize” by creating a “Person” entity.

A common error in entity definition – describing the current implementation instead of the “essence” of what the entity is. E.g., “This entity is the ASF-72 created by Emily down in Personnel.”

Another common error – using the entity name to define itself. E.g., “A Contract is a contract between the corporation and …”

Finally, note that the last example on the slide indicates two separate “type” classifications – Customer Legal Entity Type and Customer Status Type.

Page 26: ADM Extract v9.2

Entity definition format and example

Entity definition format:

1. A description of which real-world things will be included in scope. This might be developed from a list of standard “thing types” – person, organization, request, transfer, item, location, activity, etc. Be sure to identify specific inclusions or exclusions.

2. Illustrate with examples:
• 5 – 10 sample instances
• diagrams
• current “props” like reports or forms

3. Interesting points – anomalies, synonyms, common points of confusion, etc.

Customer: A Customer is a person or organization that is a past, present, or potential user of our products or services.

Current examples include Solectron (contract manufacturer), Cisco Systems (OEM), Arrow Electronics (distributor), Best Buy (retailer), M&P PCs (assembler), and individual consumers.

Excludes the company itself when we use our own products or services, but includes cases where the Customer doesn’t have to pay (e.g., a charity).

Contrast this with a weaker version that describes why we care rather than what a Customer is:

Customer: We have a variety of Customers that operate in multiple geographies, and these must be tracked in order to consolidate purchasing statistics and enable our rating process to identify our best Customers.
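The three-part format can also be captured as a simple record so that incomplete definitions stand out. This Python sketch is hypothetical (the class, field names, and checks are ours, not part of any modeling tool), and its circularity test is deliberately crude; it also flags the “A Contract is a contract…” error from the previous page.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EntityDefinition:
    name: str
    description: str                                    # 1. which real-world things are in scope
    inclusions: List[str] = field(default_factory=list)
    exclusions: List[str] = field(default_factory=list)
    examples: List[str] = field(default_factory=list)   # 2. sample instances, diagrams, props
    notes: List[str] = field(default_factory=list)      # 3. anomalies, synonyms, confusion

    def problems(self) -> List[str]:
        """Return likely problems with this definition; an empty list means none found."""
        issues = []
        # Crude circularity check: look for the entity name after the first " is ".
        defining_part = self.description.lower().split(" is ", 1)[-1]
        if self.name.lower() in defining_part:
            issues.append("uses the entity name to define itself")
        if not 5 <= len(self.examples) <= 10:
            issues.append("needs 5 - 10 sample instances")
        if not (self.inclusions or self.exclusions):
            issues.append("needs specific inclusions or exclusions")
        return issues


customer = EntityDefinition(
    name="Customer",
    description="A Customer is a person or organization that is a past, present, "
                "or potential user of our products or services.",
    inclusions=["Customers who don't have to pay, e.g., a charity"],
    exclusions=["the company itself, when we use our own products or services"],
    examples=["Solectron", "Cisco Systems", "Arrow Electronics", "Best Buy",
              "M&P PCs", "individual consumers"],
)
contract = EntityDefinition("Contract", "A Contract is a contract between the corporation and ...")
print(customer.problems())  # []
print(contract.problems())
```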

Page 27: ADM Extract v9.2

Guidelines for working with assertions

1. Focus on the appropriate case – most assertions begin with the word “Each”
2. Exclusively use terms from the data model – entity, relationship, and attribute names
  • If there’s a concept that can’t be described with existing terms, you’ll need to add to the data model
3. If the assertion describes a relationship, you must state it in both directions
4. If the assertion describes a relationship, be clear on whether cardinality is “one” or “one or more”

Each Instructor teaches one or more Sections (Sounds good…)
Each Section is taught by one Instructor (Really…?)

Entity definitions and uniqueness constraints are also assertions.
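Guidelines 3 and 4 can be applied mechanically once a relationship’s names and cardinalities are known. The following Python sketch generates the assertion pair for one relationship; the `Relationship` record and its field names are hypothetical, and the plural is formed naively by adding “s”.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Relationship:
    parent: str
    child: str
    parent_to_child: str   # verb phrase read from parent to child
    child_to_parent: str   # verb phrase read from child to parent
    child_many: bool       # can one parent be related to more than one child?
    parent_many: bool      # can one child be related to more than one parent?


def quantity(entity: str, many: bool) -> str:
    # Naive pluralization (adds "s"); fine for regular nouns only.
    return f"one or more {entity}s" if many else f"one {entity}"


def assertions(r: Relationship) -> Tuple[str, str]:
    """State the relationship in both directions, each assertion starting with "Each"."""
    forward = f"Each {r.parent} {r.parent_to_child} {quantity(r.child, r.child_many)}"
    backward = f"Each {r.child} {r.child_to_parent} {quantity(r.parent, r.parent_many)}"
    return forward, backward


teaches = Relationship("Instructor", "Section", "teaches", "is taught by",
                       child_many=True, parent_many=False)
for sentence in assertions(teaches):
    print(sentence)
# Each Instructor teaches one or more Sections
# Each Section is taught by one Instructor
```

Reading the second assertion aloud is what prompts the “Really…?”. If team teaching is possible, `parent_many` becomes `True`, the relationship is really M:M, and the model needs an associative entity between Instructor and Section.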

Page 28: ADM Extract v9.2

Two important time concepts

Logical Time
• Effective date/time, Start date/time, Begin date/time, etc.
• Time that data reflects the intent of the business at the time of update
• Reality
• Remember: can be updated

Physical Time
• Recorded date/time, Transaction date/time, Update date/time, etc.
• Time when a record was written to the database
• Representation
• Remember: cannot be updated

Wrong – with developments like Sarbanes-Oxley, we don’t change stored data, we add new records.

A third type of time is “User Time” – any other date/time of interest to the business (e.g., Reservation Arrival Date).
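As a minimal sketch of keeping both kinds of time (Python with the standard-library `sqlite3` module; the table, columns, and figures are hypothetical), each row carries an effective date (logical time) and a recorded timestamp (physical time). A correction is a new row rather than an update, so we can still ask what the database believed at any earlier moment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employee_salary (
        salary_row_id  INTEGER PRIMARY KEY,  -- surrogate key; effective date is NOT in the key
        employee_id    INTEGER NOT NULL,
        effective_date TEXT    NOT NULL,     -- logical time: when the business says it applies
        recorded_at    TEXT    NOT NULL,     -- physical time: when the row was written
        salary_amount  INTEGER NOT NULL
    )""")

conn.executemany(
    "INSERT INTO employee_salary (employee_id, effective_date, recorded_at, salary_amount)"
    " VALUES (?, ?, ?, ?)",
    [
        (101, "2024-01-01", "2023-12-15T09:00", 50000),
        (101, "2024-07-01", "2024-06-20T10:00", 52000),
        # The July raise was really 53000: add a corrected row, never UPDATE the old one.
        (101, "2024-07-01", "2024-08-02T14:30", 53000),
    ],
)


def salary_as_of(employee_id, effective_on, known_at):
    """Salary in effect on `effective_on`, as the database knew it at `known_at`.

    ISO-8601 strings in a consistent format compare correctly as text, so <= works here.
    """
    row = conn.execute(
        """SELECT salary_amount FROM employee_salary
           WHERE employee_id = ? AND effective_date <= ? AND recorded_at <= ?
           ORDER BY effective_date DESC, recorded_at DESC
           LIMIT 1""",
        (employee_id, effective_on, known_at),
    ).fetchone()
    return row[0] if row else None


print(salary_as_of(101, "2024-07-15", "2024-07-01T00:00"))  # 52000 (before the correction)
print(salary_as_of(101, "2024-07-15", "2024-09-01T00:00"))  # 53000 (after the correction)
```

The surrogate key is a deliberate choice: because an effective date can itself be corrected, it is kept out of the key here.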

Page 29: ADM Extract v9.2

Time dependent data – key points

• Facts that change independently should be recorded independently
• Never name the entity “History” – it probably includes present and future values
• Distinguish between
  • business Effective Date
  • database Recorded Date
• It’s tempting to put “Effective Date” in the key, but it might change
• Be sure to define what End / Expiry date means
• Capture the need (the “reality”) first in the model, then factor in performance considerations
• You might need to consider time zones
  • GMT / UTC
  • Local offset

Plus –
• don’t change stored values, add new records
• check for “one at a time, many over time” vs. “many at a time, many over time”
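One way to pin down what End / Expiry date means, and to sidestep local-offset confusion, is sketched below in Python. The convention shown (start inclusive, end exclusive, instants compared in UTC) is an assumption for illustration rather than the only valid choice; the point is that the model states one explicitly. The names and the -08:00 offset are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class ValidityPeriod:
    start: datetime                 # inclusive
    end: Optional[datetime] = None  # exclusive; None means no expiry recorded yet

    def contains(self, instant: datetime) -> bool:
        return self.start <= instant and (self.end is None or instant < self.end)


def to_utc(local_time: datetime) -> datetime:
    """Convert an offset-aware timestamp to UTC before storing or comparing it."""
    if local_time.tzinfo is None:
        raise ValueError("timestamp has no offset, so it cannot be placed in time")
    return local_time.astimezone(timezone.utc)


utc = timezone.utc
old_rate = ValidityPeriod(datetime(2024, 1, 1, tzinfo=utc), datetime(2024, 7, 1, tzinfo=utc))
new_rate = ValidityPeriod(datetime(2024, 7, 1, tzinfo=utc))

# 20:00 on June 30 at a hypothetical -08:00 offset is 04:00 on July 1 in UTC
local = datetime(2024, 6, 30, 20, 0, tzinfo=timezone(timedelta(hours=-8)))
instant = to_utc(local)
print(instant.isoformat())                                     # 2024-07-01T04:00:00+00:00
print(old_rate.contains(instant), new_rate.contains(instant))  # False True
```

Because the end is exclusive, the instant at which one period ends is exactly the instant the next begins, so no moment falls into two periods or into neither.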

Page 30: ADM Extract v9.2

Four key points about complex associations

1. You can’t tell whether a model is correct or not simply by inspecting it – you must have business involvement

This gives rise to the other three points…

2. You must draw the model in a top-down fashion (or other systematic approach) so you can actually see dependencies
3. You must state your assumptions or understanding in narrative form as assertions, using terms (entity names, relationship names, and attribute names) from the data model
4. You must illuminate the data model by using sample data, schematic diagrams, scenarios, or some other understandable form

Page 31: ADM Extract v9.2

A quick exercise…

1. The company decides which items will be carried at which stockrooms.
2. The company qualifies suppliers to provide specific items. (A supplier can be qualified to provide multiple items, and an item may be provided by multiple suppliers.)
3. The company enters into a contract with qualified suppliers for each item they will provide to a specific stockroom.

Will this model satisfy the business constraints? If not, identify specific problems and develop a better model.

A 5NF violation occurs if independent relationships between pairs of entities have been lumped together with other independent relationships.

Page 32: ADM Extract v9.2

4th Normal Form

• 4NF – “Primary Key cannot contain 2 or more independent, multivalued attributes of another entity”
• The classic example: Employees may have Skills and/or Languages

This version is incorrect, because Skill and Language are independent
This version is correct

Again, the rule is: if only certain combinations of entities are valid, create an associative entity to record those combinations.

The associative should be as “small” as possible. That is, two entities each having a two-part key are preferable to one entity with a three-part key, if each “small” entity with a two-part key could exist independently of the other.

If Language and Skill weren’t independent, then the original model is okay (for example, if each Skill could only be practiced in certain Languages).

4NF is pretty obvious. Things get trickier when we look at 5NF.
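The cost of the incorrect version can be counted directly. In this Python sketch (the employee, skills, and languages are made up), the single three-part association must hold every Skill × Language combination for the employee, while the two small associative entities hold each fact once; joining them back reproduces the combined rows, so nothing is lost.

```python
from itertools import product

skills = {"Welding", "First Aid"}
languages = {"English", "French", "Punjabi"}

# Incorrect (4NF violation): one associative entity keyed on Employee + Skill + Language,
# which must hold every Skill x Language combination for the employee.
combined = {("E1", skill, lang) for skill, lang in product(skills, languages)}

# Correct: two independent associative entities, each with a two-part key.
employee_skill = {("E1", skill) for skill in skills}
employee_language = {("E1", lang) for lang in languages}


def rejoin(emp_skill, emp_lang):
    """Join the two small entities on Employee to rebuild the combined rows."""
    return {(emp, skill, lang)
            for (emp, skill) in emp_skill
            for (emp2, lang) in emp_lang
            if emp == emp2}


print(len(combined), len(employee_skill) + len(employee_language))  # 6 5
print(rejoin(employee_skill, employee_language) == combined)        # True
```

The gap grows with use: one more language for this employee is one new row in the two-entity design but one new row per Skill in the combined version, and missing any of them leaves the data inconsistent.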

Page 33: ADM Extract v9.2

5th Normal Form

• How we model three or more related entities depends on the rules
• Agents represent Manufacturers in Regions – if any combination is valid, the model to the right is fine
• What if there are additional constraints?
  – “business rules”
  – only certain combinations are valid

(Diagram: Agent (Agent ID), Manufacturer (Manufacturer ID), and Region (Region ID), associated through Representation (Agent ID, Manufacturer ID, Region ID).)

Fifth Normal Form deals with associations between three (or more) entities when there are independent relationships between two (or more) of those entities.

Page 34: ADM Extract v9.2

5th Normal Form

• Assume the following constraints:
  – Agents only represent certain Manufacturers
  – Manufacturers only distribute in certain Regions
  – Regions are only covered by certain Agents
• Now we have a “cyclic dependency” within the key of Representation – violates 5NF

“Cyclic dependency”: Agents are related to Manufacturers, Manufacturers are related to Regions, and Regions are related to Agents

What are the problems with the form shown above?

“Independent multi-valued relationships” and “cyclic dependency” are the usual normalization bafflegab that hides the real issue – a 5NF violation occurs if independent relationships between pairs of entities have been lumped together with other independent relationships.
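A minimal Python sketch of the alternative, assuming the cyclic rule holds exactly: an Agent represents a Manufacturer in a Region whenever the Agent represents that Manufacturer, the Manufacturer distributes in that Region, and the Agent covers that Region. The three pairwise relationships are stored independently and Representation is derived by joining them; all IDs are made up.

```python
# Three independent pairwise relationships (each an associative entity in the model)
agent_manufacturer = {("A1", "M1"), ("A1", "M2"), ("A2", "M1")}
manufacturer_region = {("M1", "R1"), ("M2", "R1"), ("M2", "R2")}
region_agent = {("R1", "A1"), ("R2", "A1"), ("R1", "A2")}


def representation(am, mr, ra):
    """Derive valid (Agent, Manufacturer, Region) triples by joining the three pairs."""
    return {
        (agent, mfr, region)
        for (agent, mfr) in am
        for (mfr2, region) in mr
        if mfr == mfr2 and (region, agent) in ra
    }


def project(triples, i, j):
    """Project triples onto two positions, e.g. (0, 1) -> (Agent, Manufacturer)."""
    return {(t[i], t[j]) for t in triples}


reps = representation(agent_manufacturer, manufacturer_region, region_agent)
print(sorted(reps))
# [('A1', 'M1', 'R1'), ('A1', 'M2', 'R1'), ('A1', 'M2', 'R2'), ('A2', 'M1', 'R1')]
```

Projecting the derived triples back onto each pair returns the original pairwise sets, which is the sense in which the three-part entity added nothing. If the business instead approves specific triples that cannot be derived from the pairs, the cyclic rule does not hold and the three-part Representation entity is the right model after all.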

Page 35: ADM Extract v9.2

Two sides of the house

Corporate mission, strategy, goals, and objectives

Operational side (Entity-Relationship Model):
• Operational Business Processes
• supported by Operational Applications
• supported by Operational Data

Information delivery side (Star Schema or Dimensional Model):
• Executive Functions and Processes
• supported by DSS, EIS, BI, reporting, etc. facilities
• supported by Data Mart, ODS, …
• loaded via ETML* from the Atomic Data Warehouse

We’ve looked at techniques that are appropriate for the operational side of things… but other techniques are appropriate for the information delivery environment.

* extract, transform, move, load

Page 36: ADM Extract v9.2

Oh-oh…

A detailed data model might be too complex to present to business folks for query, OLAP, BI, etc.

Page 37: ADM Extract v9.2

Dimensional models

(Diagram: a central Fact surrounded by four Dimensions.)

• Used to model and implement data structures for various types of business intelligence tools
• One or more dimensional models per warehouse model
• We’ll use the terms dimensional model and star schema interchangeably
• Any combination of dimensions can be used in a query
  • the same dimension will appear in many dimensional models
  • should be managed as “shared dimensions”

Page 38: ADM Extract v9.2

Dimensional model concepts

“Facts”
• the central thing you want to count or measure
• has a count, usually “1”
• often details of a transaction or other core Associative entity (e.g., Sale, Shipment, Crime, Claim, …)
• can have attributes, but when they apply to a Fact they are called measures (e.g., Sale has Total Amount, Time, Payment Method)

“Dimensions”
• how you want to organize or summarize the facts
• often a Type or Kernel entity (e.g., Region, Time Period, Product, Customer, …)
• can have attributes (e.g., Product has Category, Price, and Color)

Page 39: ADM Extract v9.2

Dimensional model – example

• The fact is usually an associative entity from somewhere quite “low” in the ERD
• The fact will usually include a “count” of something, even if the value is implicitly “1”
  • e.g., “dollars” or “hours” or “units”
• The dimensions are “clusters” of the fact’s parent, grandparent, etc. entities
• Any combination of dimensions can be used in a query
  • the same dimension will appear in many dimensional models
  • should be managed as “shared dimensions”

(Diagram: a Crime fact with Calendar, Police Force – Location, Court, and Statute dimensions.)
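The Crime example can be sketched as tables. This Python/`sqlite3` sketch is a minimal illustration with hypothetical table and column names and made-up rows: the fact table holds one row per Crime with an explicit count of 1, and each dimension is a single table the fact points to, so any combination of dimensions can drive a GROUP BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE calendar_dim (date_key TEXT PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE location_dim (location_key INTEGER PRIMARY KEY, police_force TEXT, city TEXT);
CREATE TABLE court_dim    (court_key INTEGER PRIMARY KEY, court_name TEXT);
CREATE TABLE statute_dim  (statute_key INTEGER PRIMARY KEY, statute_name TEXT);

-- Fact: one row per Crime; crime_count makes the implicit "1" explicit
CREATE TABLE crime_fact (
    crime_id     INTEGER PRIMARY KEY,
    date_key     TEXT    REFERENCES calendar_dim,
    location_key INTEGER REFERENCES location_dim,
    court_key    INTEGER REFERENCES court_dim,
    statute_key  INTEGER REFERENCES statute_dim,
    crime_count  INTEGER NOT NULL DEFAULT 1
);

INSERT INTO calendar_dim VALUES ('2024-03-02', 2024, 3), ('2024-04-11', 2024, 4);
INSERT INTO location_dim VALUES (1, 'North Force', 'Northville'), (2, 'South Force', 'Southport');
INSERT INTO court_dim    VALUES (1, 'Provincial Court');
INSERT INTO statute_dim  VALUES (1, 'Theft'), (2, 'Fraud');
INSERT INTO crime_fact (date_key, location_key, court_key, statute_key) VALUES
    ('2024-03-02', 1, 1, 1), ('2024-03-02', 2, 1, 1), ('2024-04-11', 1, 1, 2);
""")

# Slice by any combination of dimensions; here, Statute by month
rows = conn.execute("""
    SELECT s.statute_name, c.month, SUM(f.crime_count)
    FROM crime_fact f
    JOIN statute_dim  s ON s.statute_key = f.statute_key
    JOIN calendar_dim c ON c.date_key    = f.date_key
    GROUP BY s.statute_name, c.month
    ORDER BY s.statute_name, c.month
""").fetchall()
print(rows)  # [('Fraud', 4, 1), ('Theft', 3, 2)]
```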

Page 40: ADM Extract v9.2

The classic methodology

Step 1 – Identify questions: What sorts of relationships among the data are of interest? E.g., we want to study sales by product color and customer, or by region and employee seniority.

Step 2 – Identify facts: What is the central thing (or things) of interest? Often a transaction or event entity with multiple parents and classifications, e.g., a Sale.

Step 3 – Identify dimensions: How will facts be organized? Usually an entity related to the fact entity (a foreign key), e.g., Employee, Customer, … May be hierarchic, e.g., Country, Region, “State”, …

Step 4 – Add attributes: What additional detail is needed? Facts have “measures” and dimensions have “attributes”, e.g., Sale units, total price, time of day, …

Step 5 – Add calculations: Identify calculations such as totals, averages, or projections that should be pre-defined, e.g., average sale price, total sales per month, …

You may end up producing more than one star schema. Each will get collapsed into a single table (named for the “fact”). Tables will then have to be joined, but these joins will be far simpler than what would otherwise be necessary.

A few guidelines:
• Don’t try to get all your operational data perfect first, or you’ll never get anywhere.
• Accept that after the data structure is in use, the questions will change. Embrace iteration.
• Manage the volume. Combining two “facts” (star schemas) into one table can multiply the volume dramatically. Focus initially on the critical measures and attributes.
• Start with a good, normalized data model that clearly shows dependency, as we’ll demonstrate in a minute…
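Step 5 can be illustrated with a small Python sketch that pre-computes two of the example calculations (total sales per month, and average sale price per product color) from flattened Sale facts. The field names and figures are made up; in a real warehouse these would more likely be database views or aggregate tables.

```python
from collections import defaultdict
from statistics import mean

# Flattened Sale facts: measures plus the dimension attributes they were joined to
sales = [
    {"month": "2024-01", "product_color": "Red",  "units": 2, "total_price": 40.0},
    {"month": "2024-01", "product_color": "Blue", "units": 1, "total_price": 25.0},
    {"month": "2024-02", "product_color": "Red",  "units": 3, "total_price": 60.0},
]


def total_sales_per_month(facts):
    """Sum the total_price measure within each month."""
    totals = defaultdict(float)
    for f in facts:
        totals[f["month"]] += f["total_price"]
    return dict(totals)


def average_sale_price_by_color(facts):
    """Average total_price per Sale, grouped by the product color attribute."""
    by_color = defaultdict(list)
    for f in facts:
        by_color[f["product_color"]].append(f["total_price"])
    return {color: mean(prices) for color, prices in by_color.items()}


print(total_sales_per_month(sales))        # {'2024-01': 65.0, '2024-02': 60.0}
print(average_sale_price_by_color(sales))  # {'Red': 50.0, 'Blue': 25.0}
```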

Page 41: ADM Extract v9.2

But it’s easier with an ERD

(Diagram: library E-R model, annotated with one Fact, several Dimensions, and Publisher marked “Not a dimension”.)

Entities and attributes:
• Cardholder: Cardholder ID, Name, Number, Member Since Date
• Loan: Loan ID, Date, Cardholder ID (fk, nn)
• Loan Item: Loan ID, Title ID, Copy SID, Due Date, Return Date, Status Code
• Copy: Title ID, Copy SID, Purchase Price Amt, Acquisition Date, Status Code, Format Type Code (fk)
• Title: Title ID, Name, Author
• Format Type: Format Type Code, Name
• Publisher: Publisher ID, Name – not a dimension

Relationships:
• Cardholder takes Loan / Loan is taken by Cardholder
• Loan Item is part of Loan
• Copy is an instance of Title
• Copy is classified by Format Type
• Title available from Publisher

Page 42: ADM Extract v9.2

From E-R to dimensional

• Any parent (or grandparent or…) entities that are encountered following M:1 relationships from the fact are possible dimensions
• Any entities that are 1:M or M:M from the fact cannot be dimensions without “faking” the data
• Additional dimensions not in the original structure (e.g., Time Period) can be added
• Essentially, a basic dimensional model (no snowflakes) collapses an ER model to a two-level structure with a 1:M relationship between each dimension and the fact

(Diagram: the Loan fact with Calendar, Cardholder, Author, and Title as dimensions.)
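The first two rules amount to a graph walk: starting at the fact, follow only M:1 relationships toward parents and grandparents, and everything reached is a candidate dimension. This Python sketch encodes the library model from the previous page; treating Title – Publisher (“available from”) as M:M is our assumption, consistent with Publisher being marked “Not a dimension”.

```python
from collections import deque

# (from_entity, to_entity, cardinality read from from_entity toward to_entity)
relationships = [
    ("Loan Item", "Loan", "M:1"),
    ("Loan Item", "Copy", "M:1"),
    ("Loan", "Cardholder", "M:1"),
    ("Copy", "Title", "M:1"),
    ("Copy", "Format Type", "M:1"),
    ("Title", "Publisher", "M:M"),  # assumption: a Title is available from many Publishers
]


def candidate_dimensions(fact, rels):
    """Entities reachable from `fact` by following only M:1 relationships (breadth-first)."""
    parents = {}
    for child, parent, cardinality in rels:
        if cardinality == "M:1":
            parents.setdefault(child, []).append(parent)
    found, queue = set(), deque([fact])
    while queue:
        for parent in parents.get(queue.popleft(), []):
            if parent not in found:
                found.add(parent)
                queue.append(parent)
    return found


print(sorted(candidate_dimensions("Loan Item", relationships)))
# ['Cardholder', 'Copy', 'Format Type', 'Loan', 'Title']
```

Calendar does not appear because it is not in the E-R model; per the third rule it is added separately. Which reachable entities get clustered into one dimension (e.g., Author folded in with Title) remains a design decision.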

Page 43: ADM Extract v9.2

Exercise: dimensional modeling

Jim’s sister-in-law June has just returned from a BI conference, and she has Jim all wound up about building a query database so he can analyze sales (purchases by customers).

Construct a dimensional model for Jim, using the following E-R model as a starting point. At this point, don’t worry about individual attributes – just which entities would collapse into which fact or dimension. A few notes:

- Jim’s has grown to a nationwide chain, with stores in many regions. Most regions cover one or more states, although some regions only cover part of a state (e.g., Northern California and Southern California). Each store is in a single city, though, and each city is in only one region.

- The layout of stores (Sections, Aisles, Store Categories, etc.) varies widely across the stores.

- The “Store Category” indicates if the store is a mall location, streetfront, “captive” (contained within another retail outlet), etc. Web sales are not a factor.

Jim is especially interested in how the same Title sells depending on where in the Store it is displayed, because the same Title might end up in different Sections. He also wants to look at Sales by Store, Region, Artist, Publisher, Supplier, Category, … well, just about everything! You’ll have to decide what’s possible, and then be prepared to explain it to Jim!

Page 44: ADM Extract v9.2

Dimensional modeling exercise

Page 45: ADM Extract v9.2

Solution: dimensional model

As it turns out, having an E-R model is invaluable in producing a valid star schema, although many data warehouse experts will argue the point…

Page 46: ADM Extract v9.2

Handling “vectors” of attributes

• fixed number of repeating attributes
• may be an “array”; e.g., for each Quarter, also record:
  • Target Sales Amount
  • Sales Per Employee Amount
  • … ?

Divisional Sales (in 1,000,000s)

Year   Q1     Q2     Q3     Q4
2005   1.45   1.37   1.40   1.67
2006   1.46   1.40   1.63   1.91
2007   2.11   2.32   …      …

Each row is a vector

Page 47: ADM Extract v9.2

Alternatives for modeling vectors

“Row-wise” table
• one row per vector; attributes go in separate columns
Advantages:
• familiar layout
• from “row to screen” is easier
• fewer tables and joins
• more suitable in DW/DSS environment

“Column-wise” table
• multiple rows per vector; attributes go in a single column
Advantages:
• same handling as for other multi-valued attributes
• easier SQL queries (e.g., average sales)
• more efficient for sparse data
• flexible:
  – change vector length
  – add additional attributes (like Top Sales Rep for each Quarter)

Has anyone had experience with this situation?

The point – don’t be too quick to translate reporting layouts into operational data structures.
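Here is a minimal Python/`sqlite3` sketch of both layouts, loaded with the complete 2005 and 2006 rows of the divisional sales table (table and column names are hypothetical). The row-wise table mirrors the report; in the column-wise table every amount shares one column, so an average across all quarters is a single plain aggregate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Row-wise: one row per vector (year), one column per quarter
CREATE TABLE division_sales_rowwise (
    year INTEGER PRIMARY KEY, q1 REAL, q2 REAL, q3 REAL, q4 REAL
);
INSERT INTO division_sales_rowwise VALUES
    (2005, 1.45, 1.37, 1.40, 1.67),
    (2006, 1.46, 1.40, 1.63, 1.91);

-- Column-wise: one row per (year, quarter); all amounts share a single column
CREATE TABLE division_sales_columnwise (
    year      INTEGER,
    quarter   INTEGER CHECK (quarter BETWEEN 1 AND 4),
    sales_amt REAL,
    PRIMARY KEY (year, quarter)
);
INSERT INTO division_sales_columnwise
    SELECT year, 1, q1 FROM division_sales_rowwise UNION ALL
    SELECT year, 2, q2 FROM division_sales_rowwise UNION ALL
    SELECT year, 3, q3 FROM division_sales_rowwise UNION ALL
    SELECT year, 4, q4 FROM division_sales_rowwise;
""")

# Column-wise: the average over all quarters is one plain aggregate...
avg_columnwise = conn.execute(
    "SELECT AVG(sales_amt) FROM division_sales_columnwise").fetchone()[0]
# ...row-wise: the query has to spell out every quarter column
avg_rowwise = conn.execute(
    "SELECT AVG((q1 + q2 + q3 + q4) / 4.0) FROM division_sales_rowwise").fetchone()[0]
print(round(avg_columnwise, 5), round(avg_rowwise, 5))
```

Adding a measure such as Target Sales Amount would be one new column in the column-wise table but four (one per quarter) in the row-wise table.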

Page 48: ADM Extract v9.2

Recursion

• When one entity occurrence can be related to another occurrence of the same entity type
• Three variations – 1:1, 1:M, M:M
• Recursion and generalization often go together

(Diagram: Organization Unit generalizes Division, Department, and Section.)
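A 1:M recursive relationship is usually implemented as a foreign key that points back at the same table. This Python/`sqlite3` sketch (hypothetical names and rows) stores Division, Department, and Section as one generalized Organization Unit and uses a recursive common table expression to list everything beneath a given unit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE organization_unit (
    org_unit_id        INTEGER PRIMARY KEY,
    org_unit_name      TEXT NOT NULL,
    org_unit_type      TEXT NOT NULL
                       CHECK (org_unit_type IN ('Division', 'Department', 'Section')),
    parent_org_unit_id INTEGER REFERENCES organization_unit (org_unit_id)  -- the recursion
);
INSERT INTO organization_unit VALUES
    (1, 'Operations', 'Division',   NULL),
    (2, 'Logistics',  'Department', 1),
    (3, 'Shipping',   'Section',    2),
    (4, 'Receiving',  'Section',    2),
    (5, 'Finance',    'Division',   NULL);
""")


def units_under(org_unit_id):
    """All units below the given one, at any depth, with their depth below it."""
    return conn.execute("""
        WITH RECURSIVE subtree (org_unit_id, org_unit_name, depth) AS (
            SELECT org_unit_id, org_unit_name, 0
            FROM organization_unit WHERE org_unit_id = ?
            UNION ALL
            SELECT child.org_unit_id, child.org_unit_name, subtree.depth + 1
            FROM organization_unit AS child
            JOIN subtree ON child.parent_org_unit_id = subtree.org_unit_id
        )
        SELECT org_unit_name, depth FROM subtree WHERE depth > 0
        ORDER BY depth, org_unit_name""", (org_unit_id,)).fetchall()


print(units_under(1))  # [('Logistics', 1), ('Receiving', 2), ('Shipping', 2)]
```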

Page 49: ADM Extract v9.2

Recursion - recognizing the data structure

The name on the M:M (network) relationship could be more descriptive:

• contains / contained in

• precedes / follows

• substitute with / substitute for

Drawing out examples (the fourth “D” in data modeling) will always help
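For the M:M (network) variation, the recursive relationship becomes an associative entity with two foreign keys to the same entity. The Python sketch below uses a hypothetical “contains / contained in” parts structure; because a part can be contained in several assemblies, walking the network multiplies quantities along each path. It assumes the network has no cycles.

```python
from collections import defaultdict

# Associative entity for the M:M recursion: (assembly, component, quantity).
# Read left to right as "contains"; right to left as "is contained in".
part_structure = [
    ("Bicycle", "Wheel", 2),
    ("Bicycle", "Frame", 1),
    ("Wheel", "Spoke", 32),
    ("Wheel", "Hub", 1),
    ("Tricycle", "Wheel", 3),  # the same Wheel is contained in a second assembly
]

contains = defaultdict(list)
contained_in = defaultdict(list)
for assembly, component, qty in part_structure:
    contains[assembly].append((component, qty))
    contained_in[component].append(assembly)


def explode(part, multiplier=1, totals=None):
    """Total quantity of every component beneath `part` (assumes no cycles)."""
    if totals is None:
        totals = defaultdict(int)
    for component, qty in contains.get(part, []):
        totals[component] += multiplier * qty
        explode(component, multiplier * qty, totals)
    return dict(totals)


print(contained_in["Wheel"])  # ['Bicycle', 'Tricycle']
print(explode("Bicycle"))     # {'Wheel': 2, 'Spoke': 64, 'Hub': 2, 'Frame': 1}
```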

Page 50: ADM Extract v9.2

Supertypes and subtypes

• Breaks an entity down into two or more 'subtypes', or generalizes two or more into a single 'supertype'
  • common relationships and attributes go into supertype
  • unique relationships and attributes go into subtype
• subtypes are mutually exclusive and mandatory – there is exactly one subtype instance for each supertype
• a.k.a. generalization-specialization, or gen-spec

(Diagram: supertype Job (Job Title, Creation Date, Job Type Code) with subtypes Bargaining Unit Job and Management Job; the subtype attributes shown are Hourly Wage Amt, Confidential Flag, and Salary Amount. Employee performs Job, and Job describes duties of Employee, for all jobs. Certification is required for Job, and Job requires Certification, for B.U. jobs only.)
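One common way to implement a supertype with mutually exclusive subtypes is sketched below with Python and `sqlite3`. It assumes Hourly Wage Amt belongs to Bargaining Unit Job and Confidential Flag and Salary Amount to Management Job, and all table names and rows are hypothetical. Each subtype shares the supertype’s key and carries the Job Type Code, so a composite foreign key stops a Job from having a subtype row of the wrong type. Note that these constraints enforce exclusivity only; the “mandatory” half (every Job has a subtype row) would need additional checking.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE job (
    job_id        INTEGER PRIMARY KEY,
    job_title     TEXT NOT NULL,
    creation_date TEXT NOT NULL,
    job_type_code TEXT NOT NULL CHECK (job_type_code IN ('BU', 'MGMT')),  -- discriminator
    UNIQUE (job_id, job_type_code)
);
-- Each subtype shares the supertype's key and can only point at a Job of its own type
CREATE TABLE bargaining_unit_job (
    job_id          INTEGER PRIMARY KEY,
    job_type_code   TEXT NOT NULL DEFAULT 'BU' CHECK (job_type_code = 'BU'),
    hourly_wage_amt NUMERIC NOT NULL,
    FOREIGN KEY (job_id, job_type_code) REFERENCES job (job_id, job_type_code)
);
CREATE TABLE management_job (
    job_id            INTEGER PRIMARY KEY,
    job_type_code     TEXT NOT NULL DEFAULT 'MGMT' CHECK (job_type_code = 'MGMT'),
    confidential_flag INTEGER NOT NULL CHECK (confidential_flag IN (0, 1)),
    salary_amount     NUMERIC NOT NULL,
    FOREIGN KEY (job_id, job_type_code) REFERENCES job (job_id, job_type_code)
);
INSERT INTO job VALUES (1, 'Electrician', '2024-01-15', 'BU');
INSERT INTO job VALUES (2, 'Finance Manager', '2024-02-01', 'MGMT');
INSERT INTO bargaining_unit_job (job_id, hourly_wage_amt) VALUES (1, 38.50);
""")


def try_insert(sql, params=()):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


add_mgmt = ("INSERT INTO management_job (job_id, confidential_flag, salary_amount) "
            "VALUES (?, ?, ?)")
print(try_insert(add_mgmt, (2, 1, 95000)))  # True: Job 2 is a management job
print(try_insert(add_mgmt, (1, 0, 90000)))  # False: Job 1 is typed 'BU'
```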

Page 51: ADM Extract v9.2

Generalization vs. subtyping

• “Generalization” is the usual bottom-up O-O term; “subtyping” is the usual top-down E-R term
• Generalize whenever two or more entities, each with their own distinct attributes and relationships, also share other attributes and relationships
• Automobile, Aircraft, and Vessel have common attributes that could be generalized into Vehicle…
• …or, Vehicle could be subtyped into Automobile, Aircraft, and Vessel, with the same outcome
• Note that it’s common for a subtyped entity to also be classified by a type or reference entity
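Since “generalization” is the O-O term, the Vehicle example maps naturally onto class inheritance. This Python sketch uses dataclasses with hypothetical attributes: shared attributes live in the Vehicle supertype, each subtype adds its own, and code written against Vehicle works for every subtype.

```python
from dataclasses import dataclass


@dataclass
class Vehicle:
    """Generalization (supertype): attributes shared by every vehicle."""
    vehicle_id: str
    manufacturer_name: str
    year_built: int


@dataclass
class Automobile(Vehicle):
    licence_plate: str = ""


@dataclass
class Aircraft(Vehicle):
    tail_number: str = ""


@dataclass
class Vessel(Vehicle):
    hull_number: str = ""


fleet = [
    Automobile("V1", "Acme Motors", 2019, licence_plate="ABC 123"),
    Aircraft("V2", "Acme Aero", 2005, tail_number="T-0001"),
    Vessel("V3", "Acme Marine", 2012, hull_number="H-77"),
]

# Code written against the supertype works for every subtype
oldest = min(fleet, key=lambda v: v.year_built)
print(type(oldest).__name__, oldest.vehicle_id)  # Aircraft V2
```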

Page 52: ADM Extract v9.2

Facilitation – models are built in “sessions.” Why?

1 – The plan: orderly one-on-one interviews
2 – The reality: “the analyst as messenger”
3 – The response: facilitated sessions

Advantages:
• speed and quality
• commitment
• communication, team building
• business understanding

Disadvantages:
• longer elapsed time
• incompleteness
• encourages parochialism
• no real communication or consensus

Page 53: ADM Extract v9.2

Should I always use facilitated sessions?

Conceptual Data Models
• up to 8 or 10 content experts
  • cross-functional
  • mid to senior level
• up to 3 or 4 analysts
  • facilitator, analyst, …
• up to 3 or 4 technical experts – architect, DBA, developer, …
• Focus is agreeing on concepts, terminology, rules
• Sessions are essential!

Logical Data Models
• multiple, smaller groups of content experts, or individuals
  • specialists
  • managers or supervisors
  • “front line” contributors
• small number of IT specialists (or just one) – analyst, DBA, developer, …
• Focus might be on Process or Application Requirements
• Sessions are less suitable!

Key point! – Conceptual and Logical data modeling require substantially different skill sets.

A conceptual model to support the “Fill Order” process will involve cross-functional reps.

We may separate into multiple logical modeling sessions for:
• Customer Relationship piece
• Sales
• Manufacturing Planning and Manufacturing
• Logistics
• Accounts Receivable

Page 54: ADM Extract v9.2

Facilities requirements

The facilities really do influence session results...
• comfortable, roomy, and away from work area
• wide U-shaped layout
• lots of whiteboard space and “plain” wall space

(Room layout diagram: whiteboard, two flipcharts, participant seating, refreshments, etc., and facilitator’s supplies.)
• No empty seats – “energy holes”
• Room for everyone to work on the wall

Don’t forget flipchart pens, whiteboard pens, “wall safe” masking tape, flipchart stands & paper, rolls of plotter paper or butcher paper, Post-its, rubber bands, note paper, …

As an alternative to the U shape, you might have “rounds” of 4 or 5 people each.

Page 55: ADM Extract v9.2

Attitude – “I’m here to do a job, not work a miracle”

Facilitator
DO –
• Help develop objectives and plan
• Enforce rules & plan
• Maintain focus on topic
• Press for completion and quality
• Help everyone participate
• Ensure recording
DON’T –
• Develop content
• Push a point of view

Participant
• Participate!
• Provide information
• Suggest ideas
• Make decisions

Sponsor
• Confirm scope and objectives
• Determine and “invite” participants
• Arrange other resources
• Resolve difficult decisions

Everyone has a job to do – don’t try to be Atlas!

Page 56: ADM Extract v9.2

The world’s shortest course on facilitation

What You Do → What They’ll Do
• Write something up → Tell you if it’s wrong
• Watch facial expressions, and ask → Appreciate the opportunity
• Find areas of agreement → Take care of the disagreement
• Use alternate forms of information → Build a better product
• Take time to think, and use the group → Use the time too, and generate the way forward
• Remember your role – facilitate, not participate → Do their job – you stick to yours
• Acknowledge what is → Deal with it

Page 57: ADM Extract v9.2

a) Getting started bottom-up

Don’t begin with a lecture on data modeling

“Before I begin my speech, let’s cover a few of the basic rules of grammar. A noun is any...”

“Before we begin our data modeling session, let’s go over some key points about data modeling. First, an Entity is any uniquely identifiable person, place, thing, event, concept, or organization of interest to the enterprise about which facts may be recorded. Any questions? I didn’t think so…”

Avoid starting with the theory and practice…
• Data modeling sessions go better
• Allows use of data modeling in non-typical situations

If you can get away with it, don’t even call it “data modeling”

Why not?
• “Purple monkey water wrench” – a phrase I saw in an article making the point that our IT terms (foreign key, referential integrity, cardinality, …) aren’t any clearer to the client
• May lead to boredom and mental shutdown
• May lead to resentment and non-participation
• It’s unnecessary! Some things are easier to just do. Coaching basketball – initially, by example.

Non-typical situations
• Goal Setting and Planning
• BPx
• Package Evaluation and Selection

Page 58: ADM Extract v9.2

Do begin with a brainstorm

When in doubt – make a list! Not always, but it’s a good default. It gets everyone involved easily, and level-sets (“role induction”).

CoRSE: The Facilitator’s Friend
• Problem or question
• Collect (Brainstorm) → lots of suggestions
• Reduce (eliminate, cluster, …) → selected set of answers or points
• Sequence (dependency, priority, …) → organized set of points or topics
• Expand → useful result

If your data model isn’t going to start with brainstorming, maybe do a “venting” brainstorm.

Page 59: ADM Extract v9.2


“CoRSE”: the specifics

Collect (Brainstorm)
• State the problem or question
• Going clockwise (fast), everyone makes one suggestion
  - “pass” if nothing to add
  - “pure” brainstorming is random, not “in turn”
• Stop when everyone is “passing”, when there’s agreement to stop, or when time’s up
• Record without editorializing
  - might ask for a short phrase
  - might paraphrase for confirmation
• Keep it moving, enforce rules
  - No discussion
  - quick clarification or positive comments okay
  - absolutely no negative commentary

Reduce
• Eliminate: redundant, out of scope, …
• Cluster
• Select

Sequence
• Goal: a workable sequence
• By dependency, chronology, priority, …
• Not permanent – just to organize the session

Expand
• Collect more info: definitions, alternatives, pro/con, …
• Apply CoRSE on each item

Page 60: ADM Extract v9.2


Applying CoRSE to starting a model

1. Collect: Brainstorm for “anything related to data or information in any way, shape, or form” (e.g., things of interest, information needs, facts, queries, calculations, reports, etc.). Or, simply gather nouns.

2. Reduce: For each item, ask “Is this a thing, a fact about a thing, or other stuff?”
• Circle things
• Cluster facts around the appropriate thing
• Other stuff will include reports, forms, systems, departments, processes, etc. – use these as clues for more things and facts about things

3. Sequence: Choose the fundamental terms
• Kernels, then their dependents

4. Expand: Entity definitions and major attributes
• Focus on anomalies and “likely sources of confusion”
• Don’t worry about normalization, generalization, keys, …

Accessibility – no jargon! Again – this is “role induction”

“Fact about a thing” – attributes or relationships. Don’t worry about keys!!! (or normalization or atomic attributes or generalization or ANY of that stuff)

Page 61: ADM Extract v9.2


b) - Getting started top-down

“Draw five boxes. Any five boxes.”

Examples:
• Quotation, Booking, Confirmation & Ticketing, Amendment, Flight
• Stockroom Item, Supplier, Inventory, Availability & Agreements
• Intake, Diagnosis, Service Assessment, Treatment Planning

At this point, these could be subject areas, activities, states, … – it doesn’t matter!

Page 62: ADM Extract v9.2


Working with the “big picture”

Sources:
• Review “artifacts” such as
  - input formats (screens, web pages, forms…)
  - output formats (reports, queries…)
  - training materials or periodicals on the topic
  - other written documentation
• Again, search for nouns and verbs

What to do with the five boxes:
• Have clients describe what they need to know about each “box,” what they do, or what the problems are… Just keep listening for and noting:
  - nouns – possible entities
  - verbs – possible relationships and processes
  - rules – constraints
  - issues (problems) and opportunities

Page 63: ADM Extract v9.2


Building the storyboard

1. Draw 5 "bubbles"

2. Fill in the last (your "closer" - the purpose)

3. Fill in the first (your "hook")

4. Fill in the middle ones (the "body") –add or subtract bubbles as needed

5. Allocate details to bubbles

6. Iterate until it flows and builds properly

Only include detail that matters!

Example storyboard (bubbles from the slide):
• Business issues have changed
• Therefore systems have changed
• So methods for building have changed
• Making these changes will be difficult
• But it is vital to our survival

Details allocated to the bubbles: operational to informational; distributed, component-based; cross-functional.

Presentations – it’s a story, so storyboard it

Used to evaluate merit and sequence

Presentation should flow like a story

• does it make sense?

• does it build to the conclusion?

Page 64: ADM Extract v9.2


How not to present a data model

“Let’s start here with Special Tax Rate Variation Comment Type…”

Our models should aid understanding by:
• Using visual cues consistently
• Having a starting point and direction
• Abstracting
• Masking unnecessary detail
• Highlighting what matters

Page 65: ADM Extract v9.2


Presenting data models

Start simple, and add details in layers…
• begin with two or three fundamental things
• work “across” the model, not a “deep dive” in one area
• draw the model on a whiteboard as you speak to it
• save detail like optionality until later, and primary/foreign keys until much later

Speak exclusively in the language of the business
• don’t use terms like “entity”, “optionality”, etc.
• point to the relevant entity while addressing a concept

Back it up with sample data, queries, and scenarios

Identify specific business issues or opportunities, and show how the data model helps

We’ll now walk through a successful data model presentation, followed by discussion of key points

Page 66: ADM Extract v9.2


Presenting – some specifics

• Draw it on a whiteboard while you present it, even if you have a laptop presentation. “If it’s too complicated to draw, it’s too complicated to present.”

• Draw it top down, adding a few entities at a time.

• Constantly illustrate the model with sample instances, definitions, schematics, etc.

• Regularly highlight features and constraints of the model, in business terms. For example: “Currently we can allocate a Product to only one Product Category, but this model enables us to allocate a Product to multiple Product Categories at a time, and to record changes in categorization over time.”

• Encourage participation – the more questions and comments, the better!

Page 67: ADM Extract v9.2


The five techniques that really matter

1. Organize their minds to receive the presentation
   Why?
   • Otherwise, you’re just “noise”
   • “Why is this person telling me these things?”
   How?
   • “Here’s the point I want to make.”
   • “This is why you care, and how I know.” (even if it’s obvious)
   • “These are the caveats and limitations.”
   • “This is how I’ll make my point.” (storyboard!)

2. Big picture first
   Why?
   • Provides context and perspective
   • Makes subsequent detail understandable
   How?
   • Show contextual data model first, build up detailed models later
   • Process context first, process flow later
   • Describe 5 problem areas first, specifics of each area later

3. Do it live
   Why?
   • Focuses, demands that they watch
   • Involves them / you
   • It means “attending has value”
   How?
   • Use memory triggers, not a script
   • Build up content progressively on white board, flip chart, or screen
   • Add brainstorming, discussion, or questions
   • Have them physically “do stuff”

4. Present information in various forms
   Why?
   • Adds interest
   • Different forms have different strengths
   How?
   • Supplement PowerPoint slides with flip charts, white boards, Post-Its, handouts, etc.
   • Use props – the thing itself, not a description
   • Use visual, auditory, and kinesthetic approaches

5. Show, then tell
   Why?
   • Point is more meaningful if experienced firsthand
   • Saves time, simplifies
   How?
   • Scenario / example first, then concept / abstraction
   • Problem first, solution second
   • Thing first, description / discussion second

Page 68: ADM Extract v9.2


Data Modeling in context with other BA techniques

Framework layer (technique): what it covers

• Business Objectives (Project Charter): The mission, strategies (customers / markets, products / services, differentiators), goals, objectives, and measures (e.g., Key Performance Indicators) for the organisation. (MSGO – Mission, Strategies, Goals, Objectives)

• Business Process (Workflow modelling): The activities the business carries out in order to meet its objectives. Includes the actors involved, the sequence of steps they carry out (workflow), and the result(s) produced. Provides context – a framework for developing Use Cases and Service Specifications.

• Presentation Services (Use Cases): A mechanism through which an actor in a business process interacts with a system. Usually a GUI (graphical user interface) and reports, but could involve scanners, IVR (telephone) systems, etc.

• Business Services (Service Specification): A “service” offered by a system – a specific function. Includes the business rules and data updates it is responsible for. Requires Event Analysis, State Transition Analysis, etc.

• Data Management Services (Data modelling): Files and databases that provide a system’s record-keeping functions. Determines the things a system “knows” about, and the data that is maintained about those things. Provides a platform – language and structure for developing Use Cases and Services.

[Diagram: the layers grouped as Goals, Process, Application, and Data]

THIS IS NOT A SEQUENCE!!! There should always be an initial emphasis on defining objectives (the “top” layer) and also a “scope level” statement of the business processes, application functions, and data topics / subject areas that are in scope. Also, we always do some “guerrilla” data modelling, during which we at least clarify the primary terms and definitions, and ideally develop an initial conceptual model. After that, you could choose to go through the layers in whatever order makes sense given the situation.

The benefits:

• Divide and conquer

• Everything in its place

• Cross-validation

Other terms:

• Presentation Services = User Interface

• Business Services = Application Logic or Business Logic

• Data Management Services = Persistence Services

Page 69: ADM Extract v9.2


Linkages – top-down and bottom-up

[Diagram: one example traced through the layers – Goals (Business Objectives), Process, Application, and Data]

Business Process (workflow), with swimlanes for the Registrar’s Office and the Department Advisor:
• Print Student Summary Report
• Attach Reg Form and forward
• Check Reg Form for data changes
• Enroll Student

Use Case (actor – verb – noun): Advisor Enrolls Student
• When advisor enters five characters of Last Name, Then System lists matching Students
• When advisor selects list item, Then System displays expanded Student view
• When advisor etc.

Service (verb – noun): Enroll Student
• Input Message: Student Number, Course ID, Section ID
• Steps: Verify Student Status; Check Student pre-reqs; Check Section availability; Create Enrollment
• Output Message: Result Code

Entity (noun): Student
• Student (Number, Name, GPA) enrolls in Section (Dates, Times, Locations)
• Course (Department, Number) offers Section
• Instructor (ID, Name, Rating Code) teaches Section

Each layer interacts with its neighbor.

Not all methodologies address each perspective equally well.

• Information Engineering was weak to non-existent in addressing the business process (workflow) and presentation (use cases) layers

• Most O-O and RAD/JAD techniques don’t address business process well, if at all

Noun - A thing of interest

• “Customer”

Verb – Noun

• An activity that must be performed (process, sub-process, service, …)

• “Register Customer”

Actor – Verb – Noun

• A Use Case or a step within a workflow model

• The intersection!

• “Sales Rep Registers Customer”
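To make the service layer of the linkage example concrete, here is a minimal Python sketch of the Enroll Student service: the input message becomes parameters, the four listed steps become three validations and one update, and the Result Code becomes the return value. Everything beyond those names is an assumption: the in-memory dictionaries stand in for the data management layer, and the result-code values, the `Section` class, and the sample data are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    capacity: int
    enrolled: set[str] = field(default_factory=set)


# Hypothetical in-memory stand-ins for the data management services layer.
STUDENT_STATUS = {"S100": "Registered", "S200": "Suspended"}
COMPLETED_COURSES = {"S100": {"MATH101"}, "S200": set()}
PREREQS = {"MATH201": {"MATH101"}}
SECTIONS = {("MATH201", "A"): Section(capacity=30)}


def enroll_student(student_number: str, course_id: str, section_id: str) -> str:
    """Business service "Enroll Student" (verb - noun).

    Input message: Student Number, Course ID, Section ID.
    Output message: a Result Code (the code values are illustrative only).
    """
    # Verify Student Status
    if STUDENT_STATUS.get(student_number) != "Registered":
        return "STUDENT_NOT_ELIGIBLE"
    # Check Student pre-reqs: every prerequisite must already be completed
    required = PREREQS.get(course_id, set())
    if not required <= COMPLETED_COURSES.get(student_number, set()):
        return "PREREQ_MISSING"
    # Check Section availability
    section = SECTIONS.get((course_id, section_id))
    if section is None or len(section.enrolled) >= section.capacity:
        return "SECTION_UNAVAILABLE"
    # Create Enrollment
    section.enrolled.add(student_number)
    return "OK"


print(enroll_student("S100", "MATH201", "A"))  # OK
print(enroll_student("S200", "MATH201", "A"))  # STUDENT_NOT_ELIGIBLE
```

A use case such as Advisor Enrolls Student would call this service after the advisor has picked the student and section; the service knows nothing about screens, which is the point of keeping the presentation and business service layers separate.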

Page 70: ADM Extract v9.2


Progressive detail for all analysis techniques

Clariteq business analysis framework: three levels of detail for ALL modelling
• Scope (Plan)
• Concept (Understand)
• Detail (Specify)

Project Charter: starts at “Scope” level, may evolve.

Workflow modelling (Business Process)
• Scope: Overall Process Map showing target and related processes. Process “framed,” and initial assessment and goals stated.
• Concept: As-is (and later, to-be) Workflow Models for the process’ main variations (cases) to the Handoff level.
• Detail: As-is Workflow Models to the appropriate detail, and to the Service level for to-be. Optionally, document procedures for manual to-be steps.

Use Cases (Presentation Services)
• Scope: List of the main Use Cases in the form: Actor + Service + (optionally) Technology / Platform.
• Concept: Initial Use Case description (goal, stakeholder interests, and use case abstract) for each Use Case.
• Detail: Use Case dialogues at the “clause” (“when-then”) level of detail, including alternate sequences. Optionally, Use Case Scenarios.

Service Specification (Business Services)
• Scope: List of main Events and corresponding Services.
• Concept: Initial Service description – result, main actions, cross-referenced to Conceptual Data Model.
• Detail: Each service fully documented, including input/output messages, validation, business rules, and data updates to the attribute level.

Data modelling (Data Management Services)
• Scope: Contextual Data Model (optional) and a glossary defining the main entities and other important terms.
• Concept: Conceptual Data Model showing main entities, relationships, attributes, and constraints.
• Detail: Fully normalised Logical Data Model with all attributes fully defined and documented.

The reason that the “concept” level is important, and that we don’t dive right into the “detail” level, is that the level of precision, rigor, and detail you need in order to build something is far greater than, and different in nature from, what a business person needs in order to know whether they’re going to like what you build!

Page 71: ADM Extract v9.2


Different roles for different perspectives

The same framework as on the previous slide (Project Charter; Workflow modelling, Use Cases, Service Specification, and Data modelling at the Scope, Concept, and Detail levels), with roles overlaid:
• Scope (Plan): Planners, Enterprise Architects, and Business Analysts
• Concept (Understand): Business Analysts
• Detail (Specify): a Specialist for each perspective

Note – this is just one possibility for roles.

On a smaller project, the same person might work on all perspectives at all levels of detail; the larger the project, the more likely it is that different, specialized roles will be involved.

Page 72: ADM Extract v9.2


Other perspectives improve data modeling

• Mission, Strategy, Goals, Objectives: reporting requirements; EIS, BI, OLAP, etc. needs. Is the data there?

• Business Process Workflow (similar to use of events or services): inspect each step in the workflow and discuss data needs. Is the necessary data in the data model?

• Presentation Services: develop use cases, describe reports & queries. Is the necessary data in the data model?

• Business Services: describe rules for an event (service). Is the necessary data in the data model?

• Data Management Services: get some real data and conduct data profiling. Does the data have a home? Did profiling uncover “hidden” needs?
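For the data management perspective, even a very basic profile of real data can prompt the “does the data have a home?” question. A minimal Python sketch of column profiling (row count, missing values, distinct values, most common values); the `profile` helper and the customer sample are hypothetical, and a real exercise would run over an actual extract rather than a hand-built list.

```python
from collections import Counter


def profile(rows: list[dict], column: str, top: int = 3) -> dict:
    """Basic column profile: row count, missing values, distinct values, most common values."""
    values = [row.get(column) for row in rows]
    present = [v for v in values if v not in (None, "")]
    return {
        "rows": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
        "most_common": Counter(present).most_common(top),
    }


# Hypothetical sample: a free-text Comment column that turns out to carry a
# recurring, structured value the data model has no home for.
customers = [
    {"Customer ID": 1, "Comment": "VIP - do not call"},
    {"Customer ID": 2, "Comment": ""},
    {"Customer ID": 3, "Comment": "VIP - do not call"},
    {"Customer ID": 4, "Comment": None},
]
print(profile(customers, "Comment"))
```

A profile proves nothing by itself; here it would simply raise the question of whether a “do not call” indicator or a customer category is hiding in a comment field and deserves a proper attribute.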

Page 73: ADM Extract v9.2


Techniques and methodologies

The same techniques are used in different sequences, with different emphasis, in different methodologies

Always start with

• scope and objectives (Project Charter)

• agreement on a fundamental vocabulary (a little Data Modeling)

Small projects are often best handled “inside-out” and are more suitable for “Agile” techniques

• start by identifying the main objects the system will deal with (Data Modeling)

• then identify the events and services that act on the main objects (Events, Service Specifications, State Transitions)

• then identify how these Services will be invoked (Use Cases, then overall Process Workflow)

Large projects are best handled “outside in” and aren’t suitable for all Agile techniques

• start with an understanding of the overall workflow and the jobs or departments that are involved (Process Workflow)

Page 74: ADM Extract v9.2


State diagrams

The concept

• Events happen

• Whether or not an event is legitimate depends on the entity’s current state

• If the event is legitimate, one or more entities will be updated, and their state may change – a state transition

Depicts the allowable states for an entity, the transitions between them, and the rules governing those transitions

No other style of diagram depicts so many important aspects of a system without getting unreadable.

A State Diagram encompasses:

• an entity

• events

• entity states

• allowable state transitions (business rules!)

Page 75: ADM Extract v9.2


The basic pattern

Example: Section life cycle
• States: Scheduled, Available, Filled, Closed, Completed, Cancelled
• Events: Section is scheduled; Time to open enrollment; Student enrolls; Student drops/transfers; Time to finalize rosters; Time to end term; Section is canceled
• Callouts:
  - Starts with an entity occurrence in the null state; leaves when the occurrence is created
  - States are entered and left in response to events. All states “matter”, and are mutually exclusive
  - Eventually, states are entered where no further update is possible

Key Point
• The diagram is linear or circular

All entity state diagrams begin with the entity in the null state, and the first event is always something that causes the creation of the entity occurrence.

An entity can be in one and only one state at a time – states are mutually exclusive. The most common error when people are learning this technique is to come up with “overlapping” states.

It’s common to return to the null state if the entity occurrence is deleted, although this example doesn’t show it (the Registrar saves everything!).

All states “matter” in the sense that the only reason for a state to exist is to enforce a business rule. For instance, it appears that Students can’t drop or transfer once the Class is “Closed”, and the Class can’t be cancelled. If these rules weren’t in place, we wouldn’t need the state “Closed”.

Note that this example is different from the one on the previous page, even though they’re for the same entity – the reason: different business rules.
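The basic pattern maps naturally onto a transition table keyed by (current state, event). The Python sketch below is one hedged reading of the Section diagram: `None` stands for the null state, and the exact set of arrows (for example, which states allow cancellation, and treating “Student enrolls” as the enrollment that takes the last seat) is an assumption reconstructed from the slide text, not a copy of the original diagram. Any pair missing from the table is rejected, which is how a state enforces a business rule.

```python
from typing import Optional

# Hypothetical encoding of the Section life cycle; None is the null state.
TRANSITIONS: dict[tuple[Optional[str], str], str] = {
    (None, "Section is scheduled"): "Scheduled",
    ("Scheduled", "Time to open enrollment"): "Available",
    ("Available", "Student enrolls"): "Filled",  # simplified: takes the last seat
    ("Filled", "Student drops/transfers"): "Available",
    ("Available", "Time to finalize rosters"): "Closed",
    ("Filled", "Time to finalize rosters"): "Closed",
    ("Closed", "Time to end term"): "Completed",
    ("Scheduled", "Section is canceled"): "Cancelled",
    ("Available", "Section is canceled"): "Cancelled",
    ("Filled", "Section is canceled"): "Cancelled",
}

# States with no outgoing transitions: no further update is possible.
TERMINAL = {"Completed", "Cancelled"}


class InvalidEvent(Exception):
    """Raised when an event is not legitimate for the current state."""


def apply_event(state: Optional[str], event: str) -> str:
    """Return the new state, or raise InvalidEvent if the rules forbid the event."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise InvalidEvent(f"{event!r} is not valid in state {state!r}") from None


# Walk one occurrence through its mainstream life cycle.
state = None
for event in ["Section is scheduled", "Time to open enrollment", "Student enrolls",
              "Time to finalize rosters", "Time to end term"]:
    state = apply_event(state, event)
print(state)  # Completed
```

Because “Closed” has no entry for “Student drops/transfers” or “Section is canceled”, both rules described above fall out of the table without any extra code.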

Page 76: ADM Extract v9.2


Why bother?

- Systems Perspective -
• Get up-front agreement on the rules that must be enforced at the UI (use cases) and in Business Services (service specifications)
• Integrates events, services, and data modeling

- Business Perspective -
• Understandable – clients can participate in important systems decisions
• “See” and assess rules for the first time
• Identify inconsistent or undefined rules

Key Point
• Clients get started with almost no explanation

… this may seem like extra work, BUT… “pay me now, or pay me later”

The state diagramming technique, in practice, is quite intuitive for clients to pick up. We’ve been at many sessions where the facilitator drew a simple state diagram on the whiteboard and clients immediately started discussing and correcting it with no explanation whatsoever of the technique.

It never fails to amaze (and amuse) us how many different versions of “the rules” there are in the average organization. Naturally, everyone thinks their set of rules is correct, and they are usually surprised at the alternatives.

Page 77: ADM Extract v9.2


Four basic structures

Example: Employment Term life cycle, with callouts identifying the four basic structures: 1. null state, 2. state, 3. state transition, 4. event
• States: Probation, Active, Disability, Inactive
• Events: Employee is hired; Employee passes probation; Probation term is extended; Employee is put on Probation; Employee goes on disability; Employee returns from disability; Employment is terminated; Employment Term is Purged

This example is circular, which is less common now – it gets quite awkward.

Can you spot the error?

Page 78: ADM Extract v9.2


Components: 1 - The Null State

The entity in a state of non-existence (hasn’t been created yet)
• An occurrence which hasn’t been created
• For a single instance of the entity
• Indicates which entity’s life cycle is depicted

The simplest life cycle: Create (birth) takes the entity from the null state to “entity exists”; Update (pay taxes) leaves it in “entity exists”; Delete (death) returns it to the null state.

In the UML, the state diagram begins with a solid (filled in) circle.

Page 79: ADM Extract v9.2


Components: 2 - States

A distinct stage in the life of an entity
• A status or condition
• Events are only valid against particular states
• The only reason a state is created is to enforce a business rule
• States are mutually exclusive

Example: Order – taken → shipped
• An order can’t be cancelled once it has been shipped, so we only need the states “Taken” and “Shipped”

Example: Order – taken → picked → loaded → shipped
• An order can be cancelled without penalty if picked, with penalty if loaded, and not at all if shipped

State
• May be determined by inspecting relationships or attribute values
• Usually summarized in a “Status” or “State” attribute
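The second Order example, and the point that a state can be derived from attribute values and summarized in a Status attribute, can be sketched together in Python. Assumptions, labelled as such: the per-milestone timestamp attributes are hypothetical, and the slide doesn’t say what cancelling a “Taken” order costs, so “no penalty” is assumed for it.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Order:
    # Hypothetical attributes: one timestamp per milestone.
    taken_at: datetime
    picked_at: Optional[datetime] = None
    loaded_at: Optional[datetime] = None
    shipped_at: Optional[datetime] = None


def derive_status(order: Order) -> str:
    """Summarize attribute values into a single, mutually exclusive state."""
    if order.shipped_at is not None:
        return "Shipped"
    if order.loaded_at is not None:
        return "Loaded"
    if order.picked_at is not None:
        return "Picked"
    return "Taken"


# The cancellation rule each state exists to enforce ("Taken" is an assumption).
CANCEL_RULE = {
    "Taken": "no penalty",
    "Picked": "no penalty",
    "Loaded": "penalty",
    "Shipped": "not cancelable",
}


def cancellation_terms(order: Order) -> str:
    return CANCEL_RULE[derive_status(order)]


o = Order(taken_at=datetime(2010, 3, 1), picked_at=datetime(2010, 3, 2))
print(derive_status(o), cancellation_terms(o))  # Picked no penalty
```

The checks run from the latest milestone backwards, so the derived states stay mutually exclusive even though a shipped order also has picked and loaded timestamps.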

Page 80: ADM Extract v9.2


Components: 3 - State Transitions

A change of an entity instance from one state to another (“from” state → event → “to” state)
• Depicts dependencies of entity states
• Shows pre-conditions – which state(s) an event is valid against
• Shows post-conditions – which state(s) result from an event

Key Point
• Visual business rules

Page 81: ADM Extract v9.2


State transitions - special cases

- Conjunction -
• An event may be valid from multiple states with the same resultant state (A → C and B → C)

- Bifurcation -
• From a given state, an event can have different outcomes (A → B or A → C)

Example (Class life cycle): states Enrolling, Filled, Cancelled, Completed; events Schedule Class, Enroll Student, Cancel Class, Complete Class, Purge Class; transitions labelled “recursive” and “simple”.

Bifurcation often occurs at the “boundary condition” of a repetitive operation. For example, each Enroll Student event leaves the Class in Enrolling until the class is full, at which point it moves to Filled.
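Both special cases can be sketched in a few lines of Python against a simplified Class life cycle. The bifurcation is a single event whose outcome depends on the boundary condition (seats left); the conjunction is a single event accepted from several states. Which states allow Cancel Class is an assumption made for the sketch, and the function names are hypothetical.

```python
def enroll(state: str, enrolled: int, capacity: int) -> tuple[str, int]:
    """Bifurcation: one event from "Enrolling" with two possible outcomes."""
    if state != "Enrolling":
        raise ValueError(f"Enroll Student is not valid in state {state!r}")
    enrolled += 1
    # Boundary condition of a repetitive operation: stay in "Enrolling"
    # until the last seat is taken, then move to "Filled".
    return ("Filled" if enrolled >= capacity else "Enrolling"), enrolled


# Conjunction: one event, valid from several states, same resulting state.
# Which states allow Cancel Class is an assumption for this sketch.
CANCEL_FROM = {"Enrolling", "Filled"}


def cancel_class(state: str) -> str:
    if state not in CANCEL_FROM:
        raise ValueError(f"Cancel Class is not valid in state {state!r}")
    return "Cancelled"


state, count = "Enrolling", 0
for _ in range(3):
    state, count = enroll(state, count, capacity=3)
print(state, count)         # Filled 3
print(cancel_class(state))  # Cancelled
```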

Page 82: ADM Extract v9.2


Components: 4 - events / services

Two versions of the same Class life cycle (states: Enrolling, Filled, Cancelled, Completed):
• Labelled with events: Class is scheduled, Enrollment is completed, Class is canceled, Class is completed
• Labelled with services: Schedule Class, Complete Enrollment, Cancel Class, Complete Class

Events or services can be shown as the cause of the state transition.

State analysis is an ideal “bottom-up” means of discovering additional services.

Key Point
• You can show events or services or both

Page 83: ADM Extract v9.2


Guidelines – a diagram for each entity?

• Perform state analysis for all Kernel and major Associative entities
• Subtypes may all be covered by the Supertype’s life cycle
• Subtypes may each have their own unique life cycle
• Type and minor Characteristic entities with a simple “Create-Update-Delete” life cycle may not warrant a diagram

Example entities shown: Client, Claim, Policy, Prior Address, Policy Type, and the subtypes Home, Individual, Marine, Auto, and Group.

Page 84: ADM Extract v9.2


Guidelines – an event can affect multiple entities

• An event affecting a characteristic or associative entity is often constrained by a parent’s state (and vice versa, less often)

• An event changing the state of an entity may also cause a state change in a parent or child entity

Example (entities Class, Enrollment, and Student):
• Class life cycle: Schedule Class → Enrolling; Complete Enrollment → Filled
• Enrollment life cycle: Complete Enrollment → Active

“Enrollment is completed” is constrained by the state of a parent entity… and also causes a state change in its parent’s life cycle.

Key Point
• Start ST (state transition) analysis at the “bottom” – with entities that have no dependents

Class and Enrollment each have their own life cycles, but they are related.
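The cross-entity behaviour of Complete Enrollment can be sketched as one function that touches both life cycles: it checks the parent’s state, starts the child’s life cycle, and may move the parent to a new state. A minimal Python sketch; `ClassOffering` stands in for the Class entity (since `class` is a Python keyword), and the capacity attribute and names are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Enrollment:
    student_id: str
    state: str = "Active"          # created by the event: null -> Active


@dataclass
class ClassOffering:               # stands in for the "Class" entity
    capacity: int
    state: str = "Enrolling"
    enrollments: list[Enrollment] = field(default_factory=list)


def complete_enrollment(offering: ClassOffering, student_id: str) -> Enrollment:
    """One event, two entity life cycles."""
    # 1. Constrained by the parent's state: only valid while the Class is Enrolling.
    if offering.state != "Enrolling":
        raise ValueError(f"Cannot complete enrollment: Class is {offering.state}")
    # 2. Starts the child's life cycle (null -> Active).
    enrollment = Enrollment(student_id)
    offering.enrollments.append(enrollment)
    # 3. May cause a state change in the parent (Enrolling -> Filled).
    if len(offering.enrollments) >= offering.capacity:
        offering.state = "Filled"
    return enrollment


c = ClassOffering(capacity=2)
complete_enrollment(c, "S1")
complete_enrollment(c, "S2")
print(c.state)  # Filled
```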

Page 85: ADM Extract v9.2


Building a state diagram

1 – First Cut
• Get the event list for the entity
• Brainstorm for valid states
• Select “mainstream” states
• Start at the null state, then select the initial state from the list
• Ask “What typically happens next?”, and select the next state
• Continue until the initial State Diagram is done

Key Point
• Mainstream first, exceptions later
• “Bottom up” – dependents first, parents later

2 – Refine
• Ensure that states are mutually exclusive
• Identify the event for each state transition
• Ask “Can it cause transitions to or from other states?” (e.g., conjunction or bifurcation)
• Check each event to see if it is constrained by or affects the state of parent or child entities
• If sub-types are involved, check whether the state diagram works for all sub-types

Key Point
• Extremely iterative within and between state diagrams

3 – Complete
• Add remaining “non-mainstream” states or events
• Check each event against each state
• Eliminate unused states & events as appropriate

Key Point
• Lots of detailed cross checking
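The “check each event against each state” step is mechanical enough to sketch: build the full event × state grid from a transition table and list every combination the diagram doesn’t define, so each one gets either a deliberate “yes, that’s invalid” or a new transition. A minimal Python sketch; the Class transition table is a hypothetical, simplified reading of the earlier Class diagrams (it ignores the bifurcation on Complete Enrollment).

```python
from itertools import product
from typing import Optional

# Hypothetical, simplified Class life cycle: (from_state, event) -> to_state.
# None stands for the null state.
TRANSITIONS: dict[tuple[Optional[str], str], str] = {
    (None, "Schedule Class"): "Enrolling",
    ("Enrolling", "Complete Enrollment"): "Filled",   # simplified: last seat taken
    ("Enrolling", "Cancel Class"): "Cancelled",
    ("Filled", "Cancel Class"): "Cancelled",
    ("Filled", "Complete Class"): "Completed",
}


def event_state_matrix(transitions):
    """Map every (state, event) pair to True if the diagram defines it."""
    named = {s for s, _ in transitions if s is not None} | set(transitions.values())
    states = [None] + sorted(named)
    events = sorted({e for _, e in transitions})
    return {(s, e): (s, e) in transitions for s, e in product(states, events)}


matrix = event_state_matrix(TRANSITIONS)
undefined = [pair for pair, defined in matrix.items() if not defined]
for state, event in undefined:
    print(f"Review: is {event!r} really invalid in state {state!r}?")
```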

Page 86: ADM Extract v9.2


A checklist for state analysis

1. Every state must matter
   • Recognizable to business people
   • Restricts operations in some unique way

2. All states must be mutually exclusive

3. Each event is “essential”
   • e.g., “Enrollment is completed” (what), not “Student enrolls via web” (who and how)

4. Start with the “most dependent” entity (bottom of the data model) to guard against “overloading” life cycles

5. All states (including those of parent and child entities) are checked against each event

6. Mainstream first… exceptions later!

Page 87: ADM Extract v9.2


Update service specs

1. Create new services for any newly-discovered events.

2. For each service, build a “state table” summarizing “from” and “to” states for each entity impacted by the event.

3. Refine validation, calculations, and updates in service documentation. Optionally, describe logic with a UML Activity Diagram or other format.

Example state table:

Entity                 State Before         State After
Student                Registered           Registered
Enrollment (“from”)    Active               Ended
Class (“from”)         Filled, Available    Available
Enrollment (“to”)      Null                 Active
Class (“to”)           Available            Available, Filled
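A state table like this one can double as a pre- and post-condition check inside the service. A minimal Python sketch: the slide doesn’t name the service (it reads like moving a Student’s enrollment from one Class to another), so the role keys, the `violations` helper, and the sample states are hypothetical.

```python
# State table from the slide: role -> (allowed "before" states, allowed "after" states).
# "Null" means the occurrence does not exist yet.
STATE_TABLE = {
    "Student": ({"Registered"}, {"Registered"}),
    "Enrollment (from)": ({"Active"}, {"Ended"}),
    "Class (from)": ({"Filled", "Available"}, {"Available"}),
    "Enrollment (to)": ({"Null"}, {"Active"}),
    "Class (to)": ({"Available"}, {"Available", "Filled"}),
}

BEFORE, AFTER = 0, 1


def violations(states: dict[str, str], column: int) -> list[str]:
    """Return one message per entity whose state the table doesn't allow."""
    return [f"{role} is {state!r}"
            for role, state in states.items()
            if state not in STATE_TABLE[role][column]]


# Pre-condition check: the target class is already full, so the service must refuse.
current = {"Student": "Registered", "Enrollment (from)": "Active",
           "Class (from)": "Filled", "Enrollment (to)": "Null",
           "Class (to)": "Filled"}
print(violations(current, BEFORE))  # ["Class (to) is 'Filled'"]
```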

Page 88: ADM Extract v9.2


Exercise: Handling state transitions

1) Design a generalized data model to record valid state transitions. If a particular response is required (such as an error message) when an invalid event arrives, be sure to handle that as well.

2) (Optional) It can provide useful analytic information to maintain a history of state changes for the instances of important entities. For example, in the actual project that the stock exchange exercise earlier in the course was based on, it was useful to have a history of state changes for the “Listing” and “Trade Order” entities. Develop a data model to record a history of state changes for an entity like “Listing”.

Page 89: ADM Extract v9.2


Solution - Valid state transitions
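The solution diagram isn’t reproduced in this extract. As a hedged illustration only, and not Clariteq’s solution, here is one way both parts of the exercise could be sketched, with Python dataclasses standing in for tables: `ValidTransition` records which event moves an entity type from one state to another, `InvalidEventResponse` holds the response required when an event isn’t valid, and `StateChange` is the optional history. The Listing states and events are hypothetical, since the stock exchange exercise isn’t part of this extract.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


# Each dataclass stands in for one table of a generalized state-transition model.
@dataclass(frozen=True)
class ValidTransition:
    entity_type: str            # e.g. "Listing" or "Trade Order"
    from_state: Optional[str]   # None represents the null state
    event: str
    to_state: str


@dataclass(frozen=True)
class InvalidEventResponse:
    entity_type: str
    event: str
    message: str                # response required when the event is not valid


@dataclass(frozen=True)
class StateChange:              # history: one row per transition actually taken
    entity_type: str
    instance_id: str
    from_state: Optional[str]
    to_state: str
    event: str
    occurred_at: datetime


@dataclass
class StateModel:
    transitions: list[ValidTransition]
    responses: list[InvalidEventResponse]
    history: list[StateChange] = field(default_factory=list)

    def fire(self, entity_type: str, instance_id: str, current: Optional[str],
             event: str, when: datetime) -> str:
        """Apply an event: record history and return the new state, or raise."""
        for t in self.transitions:
            if (t.entity_type, t.from_state, t.event) == (entity_type, current, event):
                self.history.append(StateChange(
                    entity_type, instance_id, current, t.to_state, event, when))
                return t.to_state
        for r in self.responses:
            if (r.entity_type, r.event) == (entity_type, event):
                raise ValueError(r.message)
        raise ValueError(f"{event!r} is not valid for {entity_type} in state {current!r}")


# Hypothetical Listing life cycle (states and events are illustrative only).
model = StateModel(
    transitions=[
        ValidTransition("Listing", None, "Listing approved", "Listed"),
        ValidTransition("Listing", "Listed", "Trading halted", "Halted"),
        ValidTransition("Listing", "Halted", "Trading resumed", "Listed"),
    ],
    responses=[InvalidEventResponse("Listing", "Trading resumed",
                                    "Listing is not halted")],
)
state = model.fire("Listing", "L-1", None, "Listing approved", datetime(2010, 1, 4))
state = model.fire("Listing", "L-1", state, "Trading halted", datetime(2010, 2, 1))
print(state, len(model.history))  # Halted 2
```

Keeping transitions as rows rather than code means a new state or rule is a data change. A fuller model would probably add a State table (so “from” and “to” states are validated like foreign keys) and might key responses by current state as well as by event.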

Page 90: ADM Extract v9.2


A few additional slides

I’ve added a few slides from our introductory Data Modeling workshop covering:

- Attribute naming with classwords

- Some conventions for assigning meaningless (surrogate) primary keys

- Checking for transitivity

These are some of the topics that often require clarification during the Advanced Data Modeling workshop.

Page 91: ADM Extract v9.2


Apply attribute naming conventions

Naming format: entity name (implied) + optional qualifiers + classword

Class Word | Abbrev. | Description
Amount | AMT | Dollars and cents, or other currency (e.g., Penalty Assessed Amount)
Code | CDE | Decodes into a name and/or description via lookup (e.g., Vehicle Type Code)
Constant | CNS | A fixed value, usually numeric (e.g., Pi Constant – 3.1415…)
Count | CNT | Like Quantity, but specifically for a quantity of items (e.g., Requested Count or On Hand Count)
Description | DSC | Multi-line descriptive text (e.g., Incident Description)
Date | DTE | YYYY/MM/DD (e.g., Incident Date)
Identifier | ID or IDN | Attribute that uniquely identifies an entity occurrence, usually system-generated (e.g., Customer ID)
Indicator or Flag | IND or FLG | Yes/No (True/False) attribute (e.g., Time Period Available Flag)
Name | NME | Single line of name text (e.g., First Name or Last Name)
Number | NMB | A unique identifier assigned by an organization (e.g., Driver License Number)
Secondary ID | SID | Forms a unique identifier when combined with identifiers inherited from the parent (e.g., Dependent SID)
Percent | PCT | Integer or number percentage (e.g., Penalty Percent)
Quantity | QTY | A count of anything, either items (like Count) or a unit of measure like gallons or feet (e.g., Maximum Width Feet Quantity). Variations are Volume (VOL), Length (LNG), or Area (ARE).
Rate | RTE | A ratio using a defined numerator and denominator (Percent is a Rate attribute with a denominator of 100) (e.g., ???)
Text | TXT | Multi-line alphanumeric data other than Name or Description (e.g., Standard Disclaimer Text)
Time | TME | HHMMSSNN… to the needed fraction of a second (e.g., Incident Time)
Timestamp | TMS | Date and time in a single attribute (e.g., Record Creation Timestamp)
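To illustrate the naming format, the following sketch converts a business attribute name into an abbreviated physical column name by upper-casing the qualifiers and replacing the final classword with its abbreviation from the table. The underscore-separated, upper-case physical style and the function name abbreviate are assumptions for illustration, not a convention stated in the workshop.

```python
# Classword -> abbreviation, from the table above. "Identifier" is usually
# written "ID" in attribute names (e.g., "Customer ID"), so both are mapped.
CLASSWORD_ABBREV = {
    "Amount": "AMT", "Code": "CDE", "Constant": "CNS", "Count": "CNT",
    "Description": "DSC", "Date": "DTE", "Identifier": "ID", "ID": "ID",
    "Indicator": "IND", "Flag": "FLG", "Name": "NME", "Number": "NMB",
    "SID": "SID", "Percent": "PCT", "Quantity": "QTY", "Rate": "RTE",
    "Text": "TXT", "Time": "TME", "Timestamp": "TMS",
}


def abbreviate(business_name):
    """Turn e.g. 'Penalty Assessed Amount' into 'PENALTY_ASSESSED_AMT'.

    Raises ValueError if the name does not end in a known classword.
    """
    *qualifiers, classword = business_name.split()
    if classword not in CLASSWORD_ABBREV:
        raise ValueError(f"{business_name!r} does not end in a known classword")
    return "_".join([q.upper() for q in qualifiers] + [CLASSWORD_ABBREV[classword]])


print(abbreviate("Vehicle Type Code"))  # VEHICLE_TYPE_CDE
```

Rejecting names without a classword doubles as a simple check that every attribute follows the naming format.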

A variety of naming formats are in general use; mixed case with words separated by blanks (e.g., “Effective Date”) is the most readable.

Certain date-related attributes, such as “Effective Date”, “End Date”, “Create Date”, and “Superseded Date”, occur many times in almost every model. Agree on standard names (e.g., choose one of “Effective Date”, “Start Date”, or “Begin Date”) and then use them consistently.
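Once a standard is agreed, a simple check can flag non-standard synonyms in a list of attribute names. This sketch assumes, purely for illustration, that “Effective Date” was chosen and that “Start Date” and “Begin Date” are therefore non-standard; the mapping and the function name are hypothetical.

```python
# Hypothetical agreement: "Effective Date" is the standard; these are not.
NON_STANDARD_DATE_NAMES = {
    "Start Date": "Effective Date",
    "Begin Date": "Effective Date",
}


def check_date_names(attribute_names):
    """Return (name, suggested name) pairs for attributes using a non-standard synonym."""
    problems = []
    for name in attribute_names:
        for synonym, standard in NON_STANDARD_DATE_NAMES.items():
            # Match whole words only, so "Restart Date" is not flagged
            if name == synonym or name.endswith(" " + synonym):
                problems.append((name, name[: -len(synonym)] + standard))
    return problems


print(check_date_names(["Contract Start Date", "Policy Effective Date", "Begin Date"]))
```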

An attribute definition should explain the meaning and purpose of the attribute; in other words, how to interpret its values. It should not be:

• … a restatement of the attribute name. For instance, for “Person Social Security Number”, the definition “The Social Security Number of a Person” tells us nothing new. A better definition would be “A number issued to wage earners by the Social Security Administration for the purpose of crediting employees with contributions to future retirement pay as stipulated in the Federal Insurance Contributions Act.”

• … a description of how the attribute is handled by current systems. For instance, “Budget Center Code is an 11 character code captured in the GL system and assigned to a Department.”

Page 92: ADM Extract v9.2


Primary keys – essential concepts

What they are…
• One or more attributes with a unique value for each instance of an entity
• There might be many identifiers; one is chosen as the primary identifier, and the rest are alternates
• A way to reference an instance of an entity (e.g., a row of a table)
• Used to establish relationships between entities (or tables)

What they’re not…
• The only access or search path
• The fundamental way the business distinguishes:
  – one instance from another
  – a new instance from an existing one (e.g., a Customer applying for credit)

In short, how we relate entities is not necessarily how the client distinguishes or accesses them.

Customer: Possible keys:
• Customer Name + Postal Code
• Sales Region + Customer Number
• Account Number

Part: Possible keys:
• Part Category + Manufacturer Prod #

Employee: Possible keys:
• SIN or SSN
• Name + Address
• Name + Birthdate
• Portrait + Voice

Reservation: Possible key:
• Room Number + Start Date
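A candidate key must have a unique value for each instance. The sketch below tests that property against a sample of rows, using hypothetical Customer data and a hypothetical helper named is_unique. A sample can only disprove uniqueness: passing this check does not make a set of attributes a good primary key (an Account Number, for example, may change, or one customer may hold several).

```python
def is_unique(rows, attributes):
    """True if the attributes are present and jointly unique in every row of the sample."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attributes)
        if None in value or value in seen:  # missing or duplicate value
            return False
        seen.add(value)
    return True


# Hypothetical sample of Customer instances
customers = [
    {"Customer Name": "J. Smith", "Postal Code": "V7T 1A1", "Account Number": "A-100"},
    {"Customer Name": "J. Smith", "Postal Code": "V7T 1A1", "Account Number": "A-205"},
    {"Customer Name": "R. Patel", "Postal Code": None, "Account Number": "A-311"},
]

print(is_unique(customers, ["Customer Name", "Postal Code"]))  # False: two J. Smiths
print(is_unique(customers, ["Account Number"]))                # True, for this sample only
```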

Assigning primary and foreign keys is really part of physical database design, but the concepts are important, so we’ll cover them here.

As modelers, we should focus initially on understanding how the client determines the uniqueness of entity instances, and how they search for particular instances.

What’s wrong with the possible primary keys shown above?

Page 93: ADM Extract v9.2


Meaningless primary keys

Applying the essential characteristics below almost invariably eliminates every choice except keys made up from meaningless, system-generated ID or Secondary ID (SID) components.

Customer: Customer ID … is better than …
• Customer Name + Postal Code
• Sales Region + Customer Number
• Account Number

Part: Part ID … is better than …
• Part Category + Manufacturer Prod #

Employee: Employee ID … is better than …
• SIN
• Name + Address
• Name + Birthdate
• Portrait + Voice

Reservation: Reservation ID … is better than …
• Room Number + Start Date

Essential characteristics:
• stable (unchanging)
  – under your control
  – contains no meaningful data, because meaningful data will eventually change (and no “special values” like Customer Number 9999999)
  – the “key hierarchy” is unchanging when an inherited key is used as part of an identifier
• available
  – known, or can be assigned, at instance creation

Key problems:

• embedded meaning

– Customer 99999

– Customer ID with Head Office Region Code built in

• insufficient for expansion

– 1-digit code field
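The following sketch, using Python's sqlite3 module with hypothetical table names, shows why a stable, meaningless key helps: when a guest moves to a different room, only a non-key column of the Reservation changes, and the rows that reference it (here, hypothetical charges) are unaffected. With Room Number + Start Date as the primary key, the same move would change the key itself and every reference to it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE reservation (
    reservation_id INTEGER PRIMARY KEY,  -- meaningless, system-generated
    room_number    TEXT NOT NULL,        -- meaningful data, free to change
    start_date     TEXT NOT NULL
);
CREATE TABLE reservation_charge (
    reservation_id INTEGER NOT NULL REFERENCES reservation (reservation_id),
    charge_sid     INTEGER NOT NULL,
    amount         REAL NOT NULL,
    PRIMARY KEY (reservation_id, charge_sid)
);
""")
res_id = conn.execute(
    "INSERT INTO reservation (room_number, start_date) VALUES ('214', '2015-03-27')"
).lastrowid
conn.execute("INSERT INTO reservation_charge VALUES (?, 1, 120.00)", (res_id,))

# The guest moves rooms: one non-key column changes, the key does not,
# so the charge still references the same reservation.
conn.execute("UPDATE reservation SET room_number = '305' WHERE reservation_id = ?", (res_id,))

print(conn.execute(
    "SELECT r.reservation_id, r.room_number, c.amount "
    "FROM reservation r JOIN reservation_charge c ON c.reservation_id = r.reservation_id"
).fetchall())
```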

Page 94: ADM Extract v9.2


Keys - summary

• A means of specifying a particular instance of an entity
• Typically:
  – Kernel - a system-assigned ID
  – Characteristic - the key of the parent plus an SID
  – Associative - the keys of all parents, plus an SID if necessary (if the same parent instances can be associated multiple times). Important associatives are often given their own ID (e.g., Order ID).
  – Reference or Type - a recognizable Code or a meaningless ID
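A minimal sqlite3 sketch of the typical key patterns listed above: a kernel with a system-assigned ID, a reference table keyed by a recognizable code, a characteristic entity keyed by its parent's key plus an SID, and an associative entity keyed by both parents' keys plus an SID. The Employee and Employee Dependent columns follow the figure; the Student, Class Offering, and Enrollment tables, and the assumption that a student may enroll in the same class more than once, are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
-- Kernel: a system-assigned ID
CREATE TABLE employee (
    employee_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
-- Reference or Type: a recognizable code
CREATE TABLE relationship_type (
    relationship_code TEXT NOT NULL PRIMARY KEY,
    description       TEXT NOT NULL
);
-- Characteristic: the key of the parent plus an SID
CREATE TABLE employee_dependent (
    employee_id       INTEGER NOT NULL REFERENCES employee (employee_id),
    emp_dep_sid       INTEGER NOT NULL,
    name              TEXT NOT NULL,
    relationship_code TEXT NOT NULL REFERENCES relationship_type (relationship_code),
    PRIMARY KEY (employee_id, emp_dep_sid)
);
-- Associative: the keys of both parents, plus an SID because (by assumption)
-- the same student may enroll in the same class offering more than once
CREATE TABLE student (student_id INTEGER PRIMARY KEY);
CREATE TABLE class_offering (class_id INTEGER PRIMARY KEY);
CREATE TABLE enrollment (
    student_id     INTEGER NOT NULL REFERENCES student (student_id),
    class_id       INTEGER NOT NULL REFERENCES class_offering (class_id),
    enrollment_sid INTEGER NOT NULL,
    PRIMARY KEY (student_id, class_id, enrollment_sid)
);
""")

conn.execute("INSERT INTO employee VALUES (1, 'A. Lee')")
conn.execute("INSERT INTO relationship_type VALUES ('CHD', 'Child')")
conn.execute("INSERT INTO employee_dependent VALUES (1, 1, 'B. Lee', 'CHD')")
conn.execute("INSERT INTO employee_dependent VALUES (1, 2, 'C. Lee', 'CHD')")  # next SID
```

The inherited employee_id is both part of the dependent's primary key and a foreign key, so a dependent cannot exist without its Employee.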

[Figure: example model with the entities Organization Unit, Job, Position, Building, Employee, and Employee Dependent. The Primary Key is shown above the dashed line.]
• Employee: Employee ID; Name, Address, Birth Date, Gov’t ID Number
• Employee Dependent: Employee ID, Emp. Dep. SID; Name, Relationship Code, Birth Date
• Job: Job Code (PK); Title, Description
• Position: Org. Unit ID, Position SID; Building ID (FK), Job Code (FK)
• Organization Unit: Org. Unit ID
• Building: Building ID; Name, Address
• Relationship labels: is located at / is the location of; is assigned to / is filled by; is contained in / contains; classifies / is classified by
• An Alternate Key is also labeled.

Callouts:
• Employee ID is an inherited key that forms part of the primary key of Employee Dependent in combination with the SID (Secondary ID). It also acts as a foreign key.
• “Job Code (PK)” is an alternate method of showing that the identifier of Job is Job Code.
• Building ID is a foreign key that implements the relationship to Building.

There can be many “candidate” or “alternate” keys, also referred to as “business identifiers” or “natural keys”:

• for instance, Employee may have a unique Government ID Number, Employee Number, and System Logon ID

• one of these could be chosen as the Primary Key if it meets the criteria; otherwise (and normally), assign a system-generated identifier

• the rest are called Alternate Keys (or something similar) and must also be unique (put a unique index on them)
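Alternate keys can be enforced with unique indexes, as the last bullet suggests. In this sqlite3 sketch (hypothetical column, index, and function names), the Employee gets a system-generated primary key, and Government ID Number, Employee Number, and System Logon ID each get a unique index; an insert that repeats any of them is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (
    employee_id     INTEGER PRIMARY KEY,  -- system-generated primary key
    govt_id_number  TEXT,
    employee_number TEXT NOT NULL,
    system_logon_id TEXT                  -- may not be assigned yet
);
-- Alternate keys: each must be unique, so each gets a unique index
CREATE UNIQUE INDEX ak_employee_govt_id  ON employee (govt_id_number);
CREATE UNIQUE INDEX ak_employee_number   ON employee (employee_number);
CREATE UNIQUE INDEX ak_employee_logon_id ON employee (system_logon_id);
""")


def add_employee(govt_id, employee_number, logon_id):
    """Insert an Employee; return its new ID, or None if an alternate key is duplicated."""
    try:
        with conn:  # rolls back the failed insert before re-raising
            cur = conn.execute(
                "INSERT INTO employee (govt_id_number, employee_number, system_logon_id) "
                "VALUES (?, ?, ?)",
                (govt_id, employee_number, logon_id),
            )
        return cur.lastrowid
    except sqlite3.IntegrityError:
        return None


print(add_employee("123-456-789", "E001", "alee"))  # 1
print(add_employee("123-456-789", "E002", "bkim"))  # None: Gov't ID Number repeated
```

SQLite allows several rows with NULL in a uniquely indexed column, so employees who do not yet have a logon ID do not collide; other DBMSs may treat NULLs in unique indexes differently.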

Some methods use a “shorthand” technique for showing inherited keys in associative or characteristic entities: the relationship through which parent keys are inherited is marked as an “identifying” relationship. In one technique, an “I” is put across the relationship line; in another, identifying relationships are drawn with a solid line, while the others (“non-identifying”) are drawn with a dashed line. Normally, we show the complete inherited primary key.

Page 95: ADM Extract v9.2


Key propagation rules

An exception: dependent entities (associative or characteristic) are assigned a meaningless ID if they can be “transferred” to another parent, or if they are very deep in the hierarchy.

Also, if an associative entity only has one parent (e.g., “Order”, where the connection to the other parent is via another dependent associative entity), it may get its own meaningless ID. This is often true of associatives that represent an important transaction and are therefore almost like Kernels, e.g., Order, Sale, Contract, and Shipment.
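To illustrate the “transferred to another parent” exception, this sqlite3 sketch gives a hypothetical Position its own meaningless ID rather than Org. Unit ID + Position SID. Transferring the Position to another Organization Unit then changes only a foreign key column; the Employee Assignment rows (also hypothetical) that reference the Position keep pointing at the same ID. Under the inherited-key alternative, the transfer would change the Position's primary key and every foreign key that references it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE org_unit (org_unit_id INTEGER PRIMARY KEY);
-- Transferable dependent: its own meaningless ID, parent held as an ordinary FK
CREATE TABLE job_position (
    position_id INTEGER PRIMARY KEY,
    org_unit_id INTEGER NOT NULL REFERENCES org_unit (org_unit_id)
);
CREATE TABLE employee_assignment (
    employee_id INTEGER NOT NULL,
    position_id INTEGER NOT NULL REFERENCES job_position (position_id),
    start_date  TEXT NOT NULL,
    PRIMARY KEY (employee_id, position_id, start_date)
);
INSERT INTO org_unit VALUES (10), (20);
INSERT INTO job_position VALUES (1, 10);
INSERT INTO employee_assignment VALUES (500, 1, '2015-01-05');
""")

# Transfer the position to org unit 20: one row, one non-key column.
updated = conn.execute(
    "UPDATE job_position SET org_unit_id = 20 WHERE position_id = 1"
).rowcount
print(updated)
print(conn.execute("SELECT position_id FROM employee_assignment").fetchall())
```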

Note: keys always propagate to the “many” end of the relationship. How would you decide where to place the foreign key in a fully optional 1:1 relationship?

Whether you show the propagated foreign keys on your diagram or instead flag relationships as “identifying” is a matter of personal preference or organizational standards. In this workshop, we’ll always show the propagated foreign keys.

Page 96: ADM Extract v9.2


How far to go?

Each of the alternatives above employs the concept of “meaningless identifiers”, but differently:

• the one on the left assigns an ID to kernel entities, while associative and characteristic entities inherit the ID of their parent(s)

• the one on the right assigns all entities a unique ID

In teams, discuss the relative strengths and weaknesses of the two approaches. Which would you choose?

Page 97: ADM Extract v9.2


Transitivity

• A “loop” (two or more paths between a pair of entities) might indicate a problem:
  – if the two paths record the same information, one of the relationships is redundant (a.k.a. “transitivity” or “a transitive relationship”)
  – like redundant attributes, redundant relationships introduce data integrity problems
• Are the two paths between “Order” and “Customer” transitive? We can’t tell just by looking…
• The presence of a “loop” (a “cyclic relationship”) is only a clue that there is a problem; for proof, we must perform Information Loss Analysis (fancy name, simple method).

Page 98: ADM Extract v9.2


Checking for transitivity

• Check for transitivity using Information Loss Analysis:
  – one at a time, check each relationship in the loop
  – ask: “Could this relationship be eliminated without losing necessary information?”
  – if “Yes”, the relationship is redundant and can be removed from the data model
  – if “No”, the relationship is necessary and remains in the data model
• If the two paths clearly have different meanings, there is probably no redundancy, and therefore no need to apply Information Loss Analysis.
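Information Loss Analysis is a question about meaning, but sample data can support it. This sketch assumes a hypothetical loop in which Order relates to Customer directly and also via Account (the Account entity, the data, and the function name are all hypothetical). It pretends the direct relationship is removed, rebuilds it from the other path, and reports any Order for which the answer differs. A non-empty result shows that removing the relationship would lose information; an empty result does not prove redundancy, since a sample cannot show what the business means.

```python
# Hypothetical loop: Order -> Customer directly, and Order -> Account -> Customer.
order_customer = {"O1": "C1", "O2": "C2", "O3": "C1"}  # direct relationship
order_account = {"O1": "A1", "O2": "A2", "O3": "A3"}
account_customer = {"A1": "C1", "A2": "C2", "A3": "C2"}


def information_lost(direct, first_hop, second_hop):
    """Rebuild the direct relationship from the other path.

    Returns {instance: (recorded, derived)} for every instance where the
    rebuilt value differs, i.e. where removing the relationship loses information.
    """
    lost = {}
    for instance, recorded in direct.items():
        derived = second_hop.get(first_hop.get(instance))
        if derived != recorded:
            lost[instance] = (recorded, derived)
    return lost


print(information_lost(order_customer, order_account, account_customer))
# {'O3': ('C1', 'C2')}: Order O3 is for C1 but is charged to an account held by C2
```

In this sample the direct Order-to-Customer relationship is not redundant, because it records something the Account path cannot reproduce.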

Page 99: ADM Extract v9.2


Transitivity - examples