• My book: Workflow Modeling, Second Edition (a complete rewrite of the first edition, not just a minor refresh)
• Microblog: www.twitter.com/alecsharp
• Data Modeling blog: www.erwin.com/expert_blogs/authors/22/

Alec’s bio:
Alec Sharp, a senior consultant with Clariteq Systems Consulting, has deep expertise in a rare combination of fields – business process analysis and redesign, application requirements specification, and data modeling. With almost 30 years of hands-on consulting experience, his practical approaches and global reputation in model-driven methods have made him a sought-after resource in locations as diverse as Ireland, Illinois, and India.
He is also a popular conference speaker, mixing content and insight with irreverence and humour. Among his many top-rated presentations are “The Lost Art of Conceptual Modeling,” “The Human Side of Data Modeling,” “Crossing the Chasm - From Process Model to IT Requirements,” and “Getting Traction for Process – What the Experts Forget.”
Alec literally wrote the book on business process modeling – he is the principal author of “Workflow Modeling: Tools for Process Improvement and Application Development, Second Edition.” The first edition was published in 2001, and the second edition was published in 2009. It has consistently been the top-selling title on business process modeling, and is widely used as a consulting guide and as an MBA textbook.
Alec’s popular workshops on Workflow Process Modeling, Data Modeling (introductory and advanced), and Requirements Modeling (with Use Cases and Services) are conducted at many of the world’s best-known organizations. His classes are practical, energetic, and fun, with the most common participant comments being “best course (or best instructor) I’ve ever had.”
Workflow Process Modeling – Defining, Mapping, and Analyzing Business Processes 2 days
Business processes matter, because business processes are how value is delivered. Understanding how to work with business processes is now a core skill for business analysts, process and application architects, functional area managers, and even corporate executives. But too often, material on the topic either floats around in generalities and familiar case studies, or descends rapidly into technical details and incomprehensible models. This workshop is different – in a practical way, it shows how to discover and scope a business process, clarify its context, model its workflow with progressive detail, assess it, and design a new process. Everything is backed up with real-world examples, and clear, repeatable guidelines.
Data Modeling – A Business-Oriented Approach to Entity-Relationship Modeling 2 days
Data modeling is critical to the design of quality databases, but is also essential to other requirements techniques such as workflow modeling and requirements modeling (use cases and services) because it ensures a common understanding of the things – the entities – that processes and applications deal with. This workshop introduces entity-relationship modeling from a non-technical perspective, provides tips and guidelines for the analyst, and explores contextual, conceptual, and detailed modeling techniques that maximize user involvement.
Requirements Modeling – Proven Techniques for Use Cases and Service Specifications 2 days
Use cases have offered great promise as a requirements definition technique, but many analysts get disappointing results. That’s because published methods are often inconsistent, complex, or focused on internal design. This unique workshop clears up the confusion. It shows how to employ use cases to discover external requirements – how users wish to interact with an application – and how to use service specifications to define internal requirements – the validation, rules, and data manipulation performed behind the scenes. Better yet, it shows in concrete terms how the two perspectives interact, and demonstrates synergies with data modeling and business process workflow modeling.
Advanced Data Modeling – Communication, Consistency, and Complexity 2 or 3 days
After gaining some practical experience, data modelers encounter situations such as the enforcement of complex business rules, handling recurring patterns, satisfying regulatory requirements to capture complex changes and corrections, dealing with existing databases or packaged applications, integrating with dimensional modeling, and other issues not covered in introductory data modeling classes. This highly participative workshop provides approaches for many advanced data modeling situations, as well as techniques for improving communication between data modelers and subject matter experts.
Facilitation & Presentation – Session Techniques for Business Analysts 2 days
The primary approach for discovering and validating business requirements has shifted from one-on-one interviews to facilitated workshops. This began with JAD or “joint application development” sessions, and has now become the norm. Just as important as gathering information in a facilitated session are skills in presenting that information for validation and to inform a wider audience. While there are many general-purpose courses available on these topics, there is very little available that is specifically designed for the needs of the business analyst. This unique workshop will provide specific methods and techniques in both skills – facilitation and presentation.
Now available! Business Analysis Overview – Model-Driven Techniques for Processes, Applications, and Data 2 days
Essential content from Clariteq’s Process, Requirements, and Data Modeling workshops.
• A description of a business in terms of the things it needs to know about
• Things (Entities) and Facts about Things (Attributes & Relationships)
• “Real world”, not technical implementation
• Graham Witt – “A narrative supported by a graphic”
Customer definition: A Customer is a person or organization that is a past, present, or potential user of our products or services. Excludes the company itself when we use our own products or services, but includes cases where the Customer doesn’t have to pay (e.g., a charity).
Plus “Assertions” (rules):
- Each Order must contain one or more Order Lines (i.e., at least one Order Line)
- Each Order Line is contained in exactly one Order
- Each Order can contain at most one Order Line per Product
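These assertions can also be checked mechanically. A minimal sketch (not from the course material; the dict layout and field names are illustrative assumptions) — note the second assertion, “each Order Line is contained in exactly one Order”, is enforced structurally here by nesting lines inside their order:

```python
def check_order_assertions(order):
    """Return a list of violated assertions for one order (a dict)."""
    violations = []
    lines = order.get("order_lines", [])
    # Each Order must contain one or more Order Lines
    if len(lines) < 1:
        violations.append("Order must contain at least one Order Line")
    # Each Order can contain at most one Order Line per Product
    products = [line["product_id"] for line in lines]
    if len(products) != len(set(products)):
        violations.append("At most one Order Line per Product")
    return violations
```

For example, an order with two lines for the same product would return the “at most one Order Line per Product” violation, while an order with one line per product returns an empty list.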
What is a data model?

Key Point: Not the same as database design.

[E-R diagram: Customer “places” / “placed by” Order; Order “contains” Order Lines; each Order Line is “contained in” one Order and “specifies” a Product]

Entity – a distinct thing of interest about which the business must maintain information
Relationship – a named association between two entities
Attribute – a property of an entity that can be expressed as a piece of data
Identifier – one or more attributes that can be used to uniquely specify a single instance (only in detailed data models)
There are many ways to describe a business...
• How it works - Process Model
• How it’s organized - Organization Chart
• Where it operates - Location Map
and…
• What it needs to maintain records about - Data Model
Data modeling symbols will vary slightly among the different “dialects”, but the meaning is constant.
The symbols are much more standardized than they used to be.
Data Modeling involves:
Gathering knowledge from Subject Matter Experts (the hard part!)
Representing that knowledge using a set of standard symbols and conventions (the easier part!)
Reference or Type – Independent. Classifies or categorizes other entities and/or allows the recording of allowable values for a descriptive attribute. Drawn diagonally out from or beside the classified entity.
Characteristic – Dependent on one parent. Records multi-valued facts about a parent entity that have been “cast out” from that entity. Drawn below parent.
Associative – Dependent on two or more parents. Records facts about a relationship (association) between two or more parent entities – is often the resolution of a M:M relationship between the parents. Drawn between and below parents.
Kernel – Independent. A fundamental thing of interest to the enterprise whose existence does not depend on any other entity – it can “stand alone”. Drawn at the top of its area.
Recursive relationship – a relationship between instances of the same entity. Can be 1:1, 1:M, or M:M.
Supertype – contains facts (attributes and relationships) that are common to all instances of the entity. Any kind of entity can be a supertype.
Subtype – contains facts that are specific to a particular subset of instances of the entity.
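The supertype/subtype idea maps naturally onto class inheritance. A sketch (entity and attribute names are assumptions, not from the course):

```python
from dataclasses import dataclass

@dataclass
class Party:                 # supertype: facts common to all instances
    party_id: int
    name: str

@dataclass
class Person(Party):         # subtype: facts specific to persons
    birth_date: str

@dataclass
class Organization(Party):   # subtype: facts specific to organizations
    registration_number: str
```

Every Person or Organization instance carries the common Party facts plus its own subtype-specific facts.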
• Multi-valued - the attribute can have multiple different values for one instance of the entity, either “at a time” or “over time”. E.g., “Employee Name” if aliases or previous names are tracked
• move it down to the “many” end of a 1:M relationship into a characteristic entity
• if it’s a fact about a M:M relationship between entities, move it down to the “many” end of a 1:M relationship into an associative entity
• both move data structure into 1st Normal Form – 1NF
• Redundant - the same attribute value is recorded multiple times, in different entity instances, possibly inconsistently. E.g., “Company Name” in a “Department” entity
• move it up to the “one” end of a M:1 relationship to one of the parent (or higher) entities (2nd Normal Form – 2NF)
• you might have to create a new parent entity where none existed before
• Constrained - a descriptive attribute needs to be restricted to a set of standardized values to improve integrity and reporting. E.g., “Employee Type”
• move it out to the “one” end of a M:1 relationship to a reference or other related entity (3rd Normal Form - 3NF)
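The three migrations above can be sketched with plain data structures. This is illustrative only (entity and attribute names are assumptions):

```python
# Unnormalized: one Employee record carries a multi-valued attribute,
# a fact about a parent entity, and an unconstrained descriptive value.
employee = {
    "employee_id": 7,
    "names": ["A. Smith", "A. Jones"],   # multi-valued (previous names)
    "company_name": "Acme",              # really a fact about a parent
    "employee_type": "contractor",       # should be constrained
}

# 1NF: cast the multi-valued attribute down into a characteristic entity
employee_names = [
    {"employee_id": employee["employee_id"], "name": n}
    for n in employee["names"]
]

# 2NF-style move: the company name moves up to a parent entity,
# recorded once instead of per employee
company = {"company_id": 1, "company_name": employee["company_name"]}

# 3NF-style move: the type is constrained via a reference entity
employee_types = {"C": "contractor", "P": "permanent"}
employee_row = {"employee_id": 7, "company_id": 1,
                "employee_type_code": "C"}
```

Each move leaves every attribute in the entity it actually describes, recorded once.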
The progression from conceptual to logical is largely based on identifying and dealing with three attribute characteristics.
For multi-valued attributes, ask “On what basis does the attribute repeat?” The answer should be in the
form “It occurs once per …” This will provide a clue as to what entity the multi-valued attribute should
be moved to.
Two variations of the same example:
- If a Resource has multiple Chargeout Rates over time, then the Chargeout Rate doesn’t vary in
relation to some other entity. We could say that the Chargeout Rate attribute repeats “within” the
Resource entity, so we’ll simply move it down (“cast it out”) into a characteristic entity called
Resource Chargeout Rate. It will need the attributes Effective Date and End Date in addition to
Amount.
- If a Resource has multiple Chargeout Rates, one per Project that the Resource is contracted to, then
we could say that the Chargeout Rate attribute repeats “in relation to” the Project entity. In other
words, we know that Chargeout Rate is a fact about the relationship between Resource and Project, and
belongs in an associative between them. That associative may depict a contract or agreement, and
might have the word “Contract” in its name.
Another example:
- If the attribute Expected Duration is in the Project entity, and it is multi-valued, with one value per
project phase, then Expected Duration should be moved down into a Characteristic (of Project) entity
called Project Phase. The Task entity would likely be a characteristic of Project Phase.
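The two Chargeout Rate variations above can be sketched as entity shapes (attribute names like Effective Date are from the example; the rest are assumptions):

```python
from dataclasses import dataclass
from typing import Optional

# Variation 1: the rate repeats "within" Resource over time,
# so it is cast out into a characteristic entity.
@dataclass
class ResourceChargeoutRate:
    resource_id: int             # key inherited from the parent Resource
    effective_date: str
    end_date: Optional[str]
    amount: float

# Variation 2: the rate repeats "in relation to" Project, so it is a fact
# about the Resource-Project relationship and lives in an associative
# entity, perhaps named "Contract".
@dataclass
class Contract:
    resource_id: int             # key inherited from Resource
    project_id: int              # key inherited from Project
    chargeout_rate: float
```

Note how the question “on what basis does the attribute repeat?” decides which shape the new entity takes.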
When the multi-valued attribute is actually a fact about a relationship, we create an associative entity:
“When did John Smith enroll in Math 100?” “What grade did John get at midterm?” “What was his final grade?” “What is the average grade for Math 100 Section 3?”
These required facts are not about Student, or Section, but the relationship between a Student and a Section.
We need to create a new associative entity.
“Many to many” relationships will almost always get a “promotion” to an entity, as in the example
above, because there are usually attributes about the relationship that must be recorded.
This is a variation on putting data into First Normal Form.
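A sketch of the resulting associative entity (field names are assumptions based on the questions above):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Enrollment:
    student_id: int                     # key inherited from Student
    section_id: int                     # key inherited from Section
    enrollment_date: str
    midterm_grade: Optional[str] = None
    final_grade: Optional[str] = None

def section_average(enrollments, section_id, grade_points):
    """Average final grade for one section, via a grade->points map."""
    points = [grade_points[e.final_grade]
              for e in enrollments
              if e.section_id == section_id and e.final_grade]
    return sum(points) / len(points) if points else None
```

With the facts stored on the relationship, a question like “what is the average grade for Section 3?” becomes a simple aggregation over Enrollment instances.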
We eliminate redundancy by ensuring that every attribute is in the entity that it describes, so that the attribute value is recorded only once.
• Before migration, attribute values about a Department would be recorded redundantly with every Course offered by that Department, so it is moved up to a parent entity.
• Before migration, values of the Delivery Method Description attribute would be carried redundantly in many instances of Course, so it is moved out to a “type” (or “reference” or “lookup” or “classification”) entity.
Eliminating redundancy puts entities into Second Normal Form if the redundant attributes move “up” the parentage hierarchy, and into Third Normal Form if the attributes move “out” to a related entity.
• Unnormalized (UNF or 0NF) – contains a “repeating group”
• First Normal Form (1NF) – repeating attributes moved down to Characteristic or Associative entities
• Second Normal Form (2NF) – only applies to dependent entities. No attributes in a child entity are really facts about a parent (or grandparent or…). That is, no Characteristic or Associative entity redundantly contains facts from its parent(s) – if it does, move the fact(s) up (create a new parent entity if necessary)
• Third Normal Form (3NF) – if any entity redundantly contains facts from a related (non-parent) entity, move the fact(s) out to the other entity (create a new entity if necessary)
• BCNF (Boyce-Codd NF) – not an issue if you keep your wits about you
• Fourth and Fifth Normal Form (4NF, 5NF) – “large” (3-way or more) associatives need to be broken down into more granular entities
UNF → 1NF → 2NF → 3NF → 4NF, 5NF?...
Other normal forms – forget about it!
The reason we’re covering this? You have to be able to make it simpler for the data “layperson”.
An “orderly script” – adding a new characteristic or associative entity to a logical model
1. Place the entity (and relationships) on the diagram according to dependency
2. Ask “What is one of these things?” then name and define the entity accordingly
3. Add relationship names, and add multiplicity (or confirm, if it was already specified)
4. Add attributes
5. Perform further attribute migration, dealing with multi-valued attributes first, and reference data last (1NF, 2NF, 3NF in sequence)… and only then worry about…
6. Relationship optionality
7. Primary keys or uniqueness constraints
8. Additional constraints (e.g., rules on date ranges)
Whenever you add a new entity
• check to see if attributes or relationships from nearby entities should be moved to the new entity
• check that you haven’t introduced transitivity (clue: “loops”)
Consistency is very important to engaging your clients in the data modeling process. Have a method, or have scripts – do the same things the same way, and draw the same things the same way. If you do this, participants will learn modeling “by osmosis” and will learn what to expect. (E.g., that a M:M relationship will be resolved into an associative entity.)
• Non-atomic attributes: the attribute has “internal structure” - it could be decomposed into more granular (“atomic”) attributes. E.g., “Employee Address” is non-atomic, “Employee Address Street Name” is atomic – it is at the finest level of granularity that will ever be manipulated or displayed
• Semantically overloaded attributes: the attribute is “overworked” - it contains multiple different attributes, typically encoded into a single attribute
• in the earlier days of systems, this was done deliberately by designers to save space (think of the Y2K problem…)
• now, it will more likely be done inadvertently by business people who don’t know the negative consequences of overloaded coding schemes
As the model nears completion, the entities have been made as granular (normalized) as necessary. Once the model meets known requirements, we’ll also “granularize” the attributes by finding and resolving the following:
Finally, name and define attributes, and document attribute properties
The distinction between non-atomic and semantic overload can be confusing:
A non-atomic attribute needs to be broken down into finer attributes, each of which is a “smaller” part
of the same overall attribute. See page 36 for more information and examples.
A semantically overloaded attribute also needs to be broken down, but into distinctly different
attributes as opposed to smaller pieces of the same attribute. See page 57 for more information and
examples.
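The distinction can be sketched in code. Both functions below are illustrative only; the address format and the encoded item code are invented examples, not from the course:

```python
# Non-atomic: one attribute broken into finer parts of the SAME fact.
def split_address(address):
    """Split a 'street, city, state' string into atomic attributes."""
    street, city, state = [part.strip() for part in address.split(",")]
    return {"street": street, "city": city, "state": state}

# Semantically overloaded: one encoded value hiding DIFFERENT facts,
# e.g. a hypothetical legacy code packing channel, region, and bin type.
def decode_item_code(code):
    """Decompose an overloaded code into its distinct attributes."""
    channel, region, bin_type = code.split("-")
    return {"channel": channel, "region": region, "bin_type": bin_type}
```

The address parts are all pieces of one address; the decoded item code yields three unrelated facts that belong in three separate attributes.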
Note – we don’t typically do this until after we’ve searched for, discovered, and satisfied outstanding
requirements using the techniques that we’ll look at shortly.
1. A description of which real-world things will be included in scope. This might be developed from a list of standard “thing types” – person, organization, request, transfer, item, location, activity, etc.
Be sure to identify specific inclusions or exclusions.
2. Illustrate with examples:
• 5 – 10 sample instances
• diagrams
• current “props” like reports or forms
3. Interesting points – anomalies, synonyms, common points of confusion, etc.
Customer: A Customer is a person or organization that is a past, present, or potential user of our products or services.
Current examples include Solectron (contract manufacturer), Cisco Systems (OEM), Arrow Electronics (distributor), Best Buy (retailer), M&P PCs (assembler), and individual consumers.
Excludes the company itself when we use our own products or services, but includes cases where the Customer doesn’t have to pay (e.g., a charity).
Entity definition format and example
Customer: We have a variety of Customers that operate in multiple geographies, and these must be tracked in order to consolidate purchasing statistics and enable our rating process to identify our best Customers.
What sorts of relationships among the data are of interest? E.g., want to study sales by product color and customer, or by region and employee seniority.
What is the central thing (or things) of interest? Often a transaction or event entity with multiple parents and classifications. E.g., a Sale.
How will facts be organized? Usually an entity related to the fact entity (a foreign key). E.g., Employee, Customer, … May be hierarchic, e.g., Country, Region, “State”, …
What additional detail is needed? Facts have “measures” and dimensions have “attributes”. E.g., Sale units, total price, time of day, …
Identify calculations such as totals, averages, or projections that should be pre-defined. E.g., average sale price, total sales per month.
The classic methodology
You may end up producing more than one star schema. Each will get collapsed into a single table
(named for the “fact”). Tables will then have to be joined (but these will be far simpler than what
would otherwise be necessary)
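A minimal sketch of one collapsed star schema, using dicts in place of tables (all entity, key, and measure names are assumptions for illustration):

```python
# Dimension tables, keyed by surrogate keys
date_dim = {1: {"date": "2024-06-01", "month": "2024-06"}}
store_dim = {10: {"store": "Downtown", "region": "West"}}
product_dim = {100: {"title": "Blue Album", "category": "Rock"}}

# The fact table (named for the "fact"): one row per Sale,
# with foreign keys to each dimension plus the measures
sale_facts = [
    {"date_key": 1, "store_key": 10, "product_key": 100,
     "units": 2, "total_price": 31.98},
]

def units_by(facts, dim, dim_key, attr):
    """Total units grouped by one dimension attribute."""
    totals = {}
    for fact in facts:
        value = dim[fact[dim_key]][attr]
        totals[value] = totals.get(value, 0) + fact["units"]
    return totals
```

A query like “units sold by region” is then a single join from the fact table out to one dimension, which is the simplicity the star schema buys.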
A few guidelines:
• Don’t try to get all your operational data perfect first, or you’ll never get anywhere
• Accept that after the data structure is in use, the questions will change. Embrace iteration.
• Manage the volume. Combining two “facts” (star schemas) into one table may cause exponential
volume increase. Focus initially on the critical measures and attributes.
• Start with a good, normalized data model that clearly shows dependency, as we’ll demonstrate in the exercise that follows.
Jim’s sister-in-law June has just returned from a BI conference, and she has Jim all wound up about building a query database so he can analyze sales (purchases by customers).
Construct a dimensional model for Jim, using the following E-R model as a starting point. At this point, don’t worry about individual attributes – just which entities would collapse into which fact or dimension. A few notes:
- Jim’s has grown to a nationwide chain, with stores in many regions. Most regions cover one or more states, although some regions only cover part of a state (e.g., Northern California and Southern California). Each store is in a single city, though, and each city is in only one region.
- The layout of stores (Sections, Aisles, Store Categories, etc.) varies widely across the stores.
- The “Store Category” indicates if the store is a mall location, streetfront, “captive” (contained within another retail outlet), etc. Web sales are not a factor.
Jim is especially interested in how the same Title sells depending on where in the Store it is displayed, because the same Title might end up in different Sections. He also wants to look at Sales by Store, Region, Artist, Publisher, Supplier, Category, … well, just about everything! You’ll have to decide what’s possible, and then be prepared to explain it to Jim!
Attitude – “I’m here to do a job, not work a miracle”
Facilitator
DO:
• Help develop objectives and plan
• Enforce rules & plan
• Maintain focus on topic
• Press for completion and quality
• Help everyone participate
• Ensure recording
DON’T:
• Develop content
• Push a point of view

Participant
• Participate!
• Provide information
• Suggest ideas
• Make decisions

Sponsor
• Confirm scope and objectives
• Determine and “invite” participants
• Arrange other resources
• Resolve difficult decisions
“Before I begin my speech, let’s cover a few of the basic rules of grammar. A noun is any... ”
“Before we begin our data modeling session, let’s go over some key points about data modeling. First, an Entity is any uniquely identifiable person, place, thing, event, concept, or organization of interest to the enterprise about which facts may be recorded. Any questions? I didn’t think so…”
Don’t begin with a lecture on data modeling
Avoid starting with the theory and practice…
Data modeling sessions go better
Allows use of data modeling in non-typical situations
a) - Getting started bottom-up
If you can get away with it, don’t even call it “data modeling”
Why not?
• “Purple monkey water wrench” – a phrase I saw in an article making the point that our IT terms
(foreign key, referential integrity, cardinality, …) aren’t any clearer to the client
• May lead to boredom and mental shutdown
• May lead to resentment and non-participation
• It’s unnecessary! Some things are easier to just do. Coaching basketball - initially, by example.
• Draw it on a whiteboard while you present it, even if you have a laptop presentation. “If it’s too complicated to draw, it’s too complicated to present.”
• Draw it top down, adding a few entities at a time.
• Constantly illustrate the model with sample instances, definitions, schematics, etc.
• Regularly highlight features and constraints of the model, in business terms. E.g., currently we can allocate a Product to one Product Category, but this model enables us to allocate a Product to multiple Product Categories at a time, and to record changes in categorization over time.
• Encourage participation – the more questions and comments, the better!
1) Design a generalized data model to record valid state transitions. If a particular response is required (such as an error message) when an invalid event arrives, be sure to handle that as well.
2) (Optional) Maintaining a history of state changes for the instances of important entities can provide useful analytic information. For example, in the actual project that the stock exchange exercise earlier in the course was based on, it was useful to have a history of state changes for the “Listing” and “Trade Order” entities. Develop a data model to record a history of state changes for an entity like “Listing”.
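One way to sketch such a generalized model in code: the valid transitions become data (rather than logic), invalid events raise a defined response, and every change is appended to a history. The states and events below are illustrative assumptions for a “Listing”-like entity:

```python
# Valid transitions as data: (current state, event) -> new state
VALID_TRANSITIONS = {
    ("Pending", "approve"): "Listed",
    ("Listed", "suspend"): "Suspended",
    ("Suspended", "reinstate"): "Listed",
}

def apply_event(current_state, event, history):
    """Apply an event; record the change; raise on an invalid event."""
    key = (current_state, event)
    if key not in VALID_TRANSITIONS:
        # The required response for an invalid event
        raise ValueError(f"Invalid event {event!r} in state {current_state!r}")
    new_state = VALID_TRANSITIONS[key]
    history.append({"from": current_state, "event": event, "to": new_state})
    return new_state
```

Because the transitions are rows of data, adding a new state or event is a data change, not a code change, which is the point of the generalized model.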
Naming format: entity name (implied) + optional qualifiers + class word
Class Word Abbrev. Description
Amount AMT Dollars and cents, or other currency (e.g., Penalty Assessed Amount)
Code CDE Decodes into a name and/or description via lookup (e.g., Vehicle Type Code)
Constant CNS A fixed value, usually numeric (e.g., Pi Constant – 3.1415…)
Count CNT Like Quantity, but specifically for a quantity of items (e.g., Requested Count or On Hand Count)
Description DSC Multi-line descriptive text (e.g., Incident Description)
Date DTE YYYY/MM/DD (e.g., Incident Date)
Identifier ID or IDN Attribute that uniquely identifies an entity occurrence, usually system-generated (e.g., Customer ID)
Indicator or Flag IND or FLG Yes/No (True/False) attribute (e.g., Time Period Available Flag)
Name NME Single line of name text (e.g., First Name or Last Name)
Number NMB A unique identifier assigned by an organization (e.g., Driver License Number)
Secondary ID SID Forms a unique identifier when combined with identifiers inherited from the parent (e.g., Dependent SID)
Percent PCT Integer or number percentage (e.g., Penalty Percent)
Quantity QTY A count of anything – either items (like Count) or of a unit of measure like gallons or feet. (e.g., Maximum Width Feet Quantity) Variations are Volume (VOL), Length (LNG), or Area (ARE)
Rate RTE A ratio using defined numerator and denominator (Percent is a Rate attribute with a numerator of 100) (e.g., ???)
Text TXT Multi-line alphanumeric data other than Name or Description (e.g., Standard Disclaimer Text)
Time TME HHMMSSNN… to the needed fraction of a second (e.g., Incident Time)
Timestamp TMS Date and time in a single attribute (e.g., Record Creation Timestamp)
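A convention like this can be enforced with a trivial check during model review. A sketch (the class-word list is a subset of the table above; the helper itself is an assumption, not part of the course material):

```python
# Subset of the agreed class words from the table above
CLASS_WORDS = {"Amount", "Code", "Constant", "Count", "Description",
               "Date", "Identifier", "Flag", "Indicator", "Name",
               "Number", "Percent", "Quantity", "Rate", "Text",
               "Time", "Timestamp"}

def has_class_word(attribute_name):
    """True if the attribute name ends in an agreed class word."""
    return attribute_name.split()[-1] in CLASS_WORDS
```

For instance, “Incident Date” passes, while a vague name like “Customer Info” is flagged for renaming.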
There are a variety of naming formats in general use - mixed case with words separated by blanks (e.g., “Effective Date”) is the most readable.
There are certain date-related attributes that will occur many times in all models, such as “Effective
Date”, “End Date”, “Create Date”, “Superseded Date”. Agree on standard names (e.g., choose
“Effective Date”, “Start Date”, or “Begin Date”) and then use them consistently.
Attribute definition should explain the meaning and purpose of the attribute - in other words, how to
interpret attribute values. Not:
• … a restatement of the attribute name. For instance, for “Person Social Security Number”, the
definition “The Social Security Number of a Person” tells us nothing new. A better definition
would be “A number issued to wage earners by the Social Security Administration for the purpose
of crediting employees with contributions to future retirement pay as stipulated in the Federal
Insurance Contributions Act.”
• … a description of how the attribute is handled by current systems. For instance, “Budget Center
Code is an 11 character code captured in the GL system and assigned to a Department.”
• A means of specifying a particular instance of an entity
• Typically:
– Kernel: a system-assigned ID
– Characteristic: the key of the parent plus an SID
– Associative: the key of all parents, plus an SID if necessary (if the same parent instances can be associated multiple times). Important associatives are often given their own ID (e.g., Order ID)
– Reference or Type: a recognizable Code or a meaningless ID
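These key patterns translate directly into DDL. A sketch using the standard-library sqlite3 module (table and column names are illustrative assumptions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Kernel: a system-assigned ID
CREATE TABLE employee (
    employee_id INTEGER PRIMARY KEY,
    name        TEXT
);
-- Characteristic: the key of the parent plus an SID
CREATE TABLE employee_dependent (
    employee_id INTEGER REFERENCES employee,
    emp_dep_sid INTEGER,
    name        TEXT,
    PRIMARY KEY (employee_id, emp_dep_sid)
);
-- Associative: the key of all parents
CREATE TABLE enrollment (
    student_id  INTEGER,
    section_id  INTEGER,
    final_grade TEXT,
    PRIMARY KEY (student_id, section_id)
);
""")
```

The composite primary keys make the dependency visible: a dependent row cannot be identified without its parent key(s).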
Keys - summary

[Example detailed model: Organization Unit, Job, Position, Building, Employee, and Employee Dependent. Employee carries Employee ID, Name, Address, Birth Date, and Gov’t ID Number; Job carries Job Code (PK), Title, and Description, with an Alternate Key annotated; Position carries Org. Unit ID, Position SID, Building ID (FK), and Job Code (FK); Organization Unit carries Org. Unit ID; Employee Dependent carries Employee ID, Emp. Dep. SID, Name, Relationship Code, and Birth Date; Building carries Building ID, Name, and Address. Relationship names include “is located at” / “is the location of”, “is assigned to”, “is filled by”, “contains” / “is contained in”, “classifies” / “is classified by”, and “places” annotations as appropriate.]

Notes on the example:
• The Primary Key is shown above the dashed line.
• Employee ID is an inherited key that forms part of the primary key of Employee Dependent in combination with the SID (Secondary ID). It also acts as a foreign key.
• Writing “Job Code (PK)” is an alternate method of showing that the identifier of Job is Job Code.
• Building ID is a foreign key that implements the relationship to Building.
There can be many “candidate” or “alternate” keys, also referred to as “business identifiers” or “natural
keys”
• for instance, Employee may have a unique Government ID Number, Employee Number, and
System Logon ID
• one of these could be chosen as the Primary Key, if they meet the criteria; otherwise (normally)
assign a system-generated identifier
• the rest are called Alternate Keys or something similar, and must also be unique (put a unique index
on them)
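A sketch of that guidance as DDL, again via sqlite3 (column names follow the Employee example above; the specific values are invented): a system-generated primary key, with each alternate key enforced by a unique constraint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (
    employee_id     INTEGER PRIMARY KEY,   -- system-generated PK
    govt_id_number  TEXT NOT NULL UNIQUE,  -- alternate key
    employee_number TEXT NOT NULL UNIQUE,  -- alternate key
    system_logon_id TEXT NOT NULL UNIQUE   -- alternate key
);
""")
conn.execute("INSERT INTO employee VALUES (1, 'G-001', 'E-100', 'jdoe')")
```

Attempting to insert a second row with the same Government ID Number now fails with an integrity error, which is exactly the uniqueness the alternate key promises.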
Some methods use a “shorthand” technique for showing inherited keys in associative or characteristic
entities - the relationship via which parent keys are inherited is marked as an “identifying” relationship.
In one technique, an “I” is put across the relationship line, and in another, identifying relationships are
drawn with a solid line, while others (“non-identifying”) are drawn with a dashed line. Normally, we