
WRDC-TR-90-8007
Volume V
Part 1

AD-A250 448

INTEGRATED INFORMATION SUPPORT SYSTEM (IISS)
Volume V - Common Data Model Subsystem
Part 1 - CDM Administrator's Manual

M. Apicella, R. Palumbo, S. Singh

Control Data Corporation
Integration Technology Services
2970 Presidential Drive
Fairborn, OH 45324-6209

September 1990

Final Report for Period 1 April 1987 - 31 December 1990

Approved for Public Release; Distribution is Unlimited

MANUFACTURING TECHNOLOGY DIRECTORATE
WRIGHT RESEARCH AND DEVELOPMENT CENTER
AIR FORCE SYSTEMS COMMAND
WRIGHT-PATTERSON AIR FORCE BASE, OHIO 45433-6533


NOTICE

When Government drawings, specifications, or other data are used for any purpose other than in connection with a definitely related Government procurement operation, the United States Government thereby incurs no responsibility nor any obligation whatsoever, regardless whether or not the government may have formulated, furnished, or in any way supplied the said drawings, specifications, or other data. It should not, therefore, be construed or implied by any person, persons, or organization that the Government is licensing or conveying any rights or permission to manufacture, use, or market any patented invention that may in any way be related thereto.

This technical report has been reviewed and is approved for publication. This report is releasable to the National Technical Information Service (NTIS). At NTIS, it will be available to the general public, including foreign nations.

DAVID L. JUDSON, Project Manager                DATE
Wright-Patterson AFB, OH 45433-6533

FOR THE COMMANDER:

BRUCE A. RASMUSSEN, Chief                       DATE
WRDC/MTI
Wright-Patterson AFB, OH 45433-6533

If your address has changed, if you wish to be removed from our mailing list, or if the addressee is no longer employed by your organization, please notify WRDC/MTI, Wright-Patterson Air Force Base, OH 45433-6533 to help us maintain a current mailing list.

Copies of this report should not be returned unless return is required by security considerations, contractual obligations, or notice on a specific document.

REPORT DOCUMENTATION PAGE

SECURITY CLASSIFICATION OF THIS PAGE: Unclassified

1a. REPORT SECURITY CLASSIFICATION: Unclassified
1b. RESTRICTIVE MARKINGS:
2a. SECURITY CLASSIFICATION AUTHORITY:
2b. DECLASSIFICATION/DOWNGRADING SCHEDULE:
3. DISTRIBUTION/AVAILABILITY OF REPORT: Approved for Public Release; Distribution is Unlimited.
4. PERFORMING ORGANIZATION REPORT NUMBER(S): UM 620341001
5. MONITORING ORGANIZATION REPORT NUMBER(S): WRDC-TR-90-8007 Vol. V, Part 1
6a. NAME OF PERFORMING ORGANIZATION: Control Data Corporation; Integration Technology Services
6b. OFFICE SYMBOL (if applicable):
6c. ADDRESS (City, State, and ZIP Code): 2970 Presidential Drive, Fairborn, OH 45324-6209
7a. NAME OF MONITORING ORGANIZATION: WRDC/MTI
7b. ADDRESS (City, State, and ZIP Code): WPAFB, OH 45433-6533
8a. NAME OF FUNDING/SPONSORING ORGANIZATION: Wright Research and Development Center, Air Force Systems Command, USAF
8b. OFFICE SYMBOL (if applicable): WRDC/MTI
8c. ADDRESS (City, State, and ZIP Code): Wright-Patterson AFB, Ohio 45433-6533
9. PROCUREMENT INSTRUMENT IDENTIFICATION NUM.: F33600-87-C-0464
10. SOURCE OF FUNDING NOS. (PROGRAM ELEMENT NO., PROJECT NO., TASK NO., WORK UNIT NO.): 78011 F 595600 F95600 20950607
11. TITLE (Include Security Classification): See Block 19
12. PERSONAL AUTHOR(S): Control Data Corporation: Apicella, M. L., Palumbo, R., and Singh, S.
13a. TYPE OF REPORT: Final Report
13b. TIME COVERED: 4/1/87 - 12/31/90
14. DATE OF REPORT (Yr., Mo., Day): 1990 September 30
15. PAGE COUNT: 222
16. SUPPLEMENTARY NOTATION: WRDC/MTI Project Priority 6203
17. COSATI CODES (FIELD, GROUP, SUB GR.): 1308 10905
18. SUBJECT TERMS (Continue on reverse if necessary and identify block no.):
19. ABSTRACT (Continue on reverse if necessary and identify block number):

This document is the Common Data Model Administrator's User Manual. Its purposes are several and include:

o Describing the philosophical and practical objectives of the CDM Administrator.
o Discussing the CDM, its design, and its role in the IISS environment.
o Describing the steps necessary to entering and maintaining data kept in the CDM.

Block 11 - INTEGRATED INFORMATION SUPPORT SYSTEM (IISS)
Vol V - Common Data Model Subsystem
Part 1 - CDM Administrator's Manual

20. DISTRIBUTION/AVAILABILITY OF ABSTRACT: UNCLASSIFIED/UNLIMITED x SAME AS RPT. DTIC USERS
21. ABSTRACT SECURITY CLASSIFICATION: Unclassified
22a. NAME OF RESPONSIBLE INDIVIDUAL: David L. Judson
22b. TELEPHONE NO. (Include Area Code): (513) 255-7371
22c. OFFICE SYMBOL: WRDC/MTI

DD FORM 1473, 83 APR. EDITION OF 1 JAN 73 IS OBSOLETE.

SECURITY CLASSIFICATION OF THIS PAGE: Unclassified

UM 620341001
30 September 1990

FOREWORD

This technical report covers work performed under Air Force Contract F33600-87-C-0464, DAPro Project. This contract is sponsored by the Manufacturing Technology Directorate, Air Force Systems Command, Wright-Patterson Air Force Base, Ohio. It was administered under the technical direction of Mr. Bruce A. Rasmussen, Branch Chief, Integration Technology Division, Manufacturing Technology Directorate, through Mr. David L. Judson, Project Manager. The Prime Contractor was Integration Technology Services, Software Programs Division, of the Control Data Corporation, Dayton, Ohio, under the direction of Mr. W. A. Osborne. The DAPro Project Manager for Control Data Corporation was Mr. Jimmy P. Maxwell.

The DAPro project was created to continue the development, test, and demonstration of the Integrated Information Support System (IISS). The IISS technology work comprises enhancements to IISS software and the establishment and operation of IISS test bed hardware and communications for developers and users.

The following list names the Control Data Corporation subcontractors and their contributing activities:

SUBCONTRACTOR                 ROLE

Control Data Corporation      Responsible for the overall Common
                              Data Model design, development, and
                              implementation, IISS integration and
                              test, and technology transfer of IISS.

D. Appleton Company           Responsible for providing software
                              information services for the Common
                              Data Model and IDEF1X integration
                              methodology.

ONTEK                         Responsible for defining and testing a
                              representative integrated system base
                              in Artificial Intelligence techniques
                              to establish fitness for use.

Simpact Corporation           Responsible for Communication
                              development.

Structural Dynamics           Responsible for User Interfaces,
Research Corporation          Virtual Terminal Interface, and Network
                              Transaction Manager design,
                              development, implementation, and
                              support.

Arizona State University      Responsible for test bed operations
                              and support.

Table of Contents

SECTION 1. INTRODUCTION
1.1 Managing Data as a Corporate Resource

SECTION 2. CDM OVERVIEW
2.1 The Fundamental Approach
2.1.1 The Three-Schema Architecture
2.1.2 Representation of the Three Types of Schemas
2.1.3 Integration Methodology
2.1.4 Contributions to IRRIASSPA
2.2 Basic Components of the Design
2.2.1 The CDM Database
2.2.2 CDM1
2.2.3 The CDM Processor

SECTION 3. RESPONSIBILITIES OF THE CDM ADMINISTRATOR
3.1 Establishing Data Standards
3.2 Maintaining the CDM
3.3 Protecting the CDM
3.4 Facilitating Use of the CDM

SECTION 4. MAINTAINING THE CONCEPTUAL SCHEMA
4.1 Methodology Overview
4.1.1 CS Structure
4.1.2 Basic Approach
4.1.3 Modeling Forms
4.2 Building the Initial CS
4.2.1 Phase 0: Starting the Project
4.2.2 Phase 1: Defining Entity Classes
4.2.3 Phase 2: Defining Relation Classes
4.2.4 Phase 3: Defining Key Classes
4.2.5 Phase 4: Defining Nonkey Attribute Classes
4.3 Expanding the CS
4.3.1 Phase 0: Starting the Project
4.3.2 Phase 1: Defining Entity Classes
4.3.3 Phase 2: Defining Relation Classes
4.3.4 Phase 3: Defining Key Classes
4.3.5 Phase 4: Defining Nonkey Attribute Classes

SECTION 5. MAINTAINING THE CDM
5.1 Methodology Overview
5.1.1 Generic NDDL Commands
5.1.2 Transaction NDDL Commands
5.2 Loading the Initial CS Description
5.2.1 Loading Domains
5.2.2 Defining the Model
5.2.3 Loading Attribute Classes
5.2.4 Loading Entity Classes
5.2.5 Loading Key Classes and Relation Classes
5.3 Modifying/Deleting CS Objects
5.3.1 Domain Class Changes
5.3.2 Model Changes/Deletes
5.3.3 Attribute Class Changes/Deletes
5.3.4 Entity Class Changes/Deletes
5.3.5 Relation Class Changes/Deletes
5.4 Modeling & Validating Tools
5.5 Reviewing the Contents of the CDM

SECTION 6. MAINTAINING INTERNAL SCHEMAS AND MAPPINGS
6.1 Methodology Overview
6.1.1 Internal Schema and CS-IS Mapping Structure
6.1.2 CS-IS Mapping Modeling Forms
6.2 Loading The Initial Internal Schema
6.2.1 Loading The Distributed Database Environment
6.2.2 Loading User-Defined data types
6.2.3 Loading Databases
6.2.4 Loading Record Types And Data Fields
6.3 Loading the Initial CS-IS Mapping Definition
6.3.1 Loading CS to IS Mappings
6.3.2 Loading Record Unions
6.3.3 Loading Horizontal Partitions
6.3.4 Loading Transformational Algorithms
6.4 Modifying/Deleting IS Objects
6.4.1 Distributed Database Environment Changes
6.4.2 Modifying User-Defined data types
6.4.3 Database Changes/Deletes
6.4.4 Record Type Changes/Deletes
6.4.5 Datafield Changes/Deletes
6.4.6 Modifying/Deleting CS-IS Mappings
6.4.7 Record Union Changes/Deletes
6.4.8 Horizontal Partition Changes/Deletes
6.5 Specific Considerations
6.5.1 IMS Specific Considerations
6.5.2 VSAM Specific Considerations
6.5.3 Sequential Files Specific Considerations

SECTION 7. MAINTAINING EXTERNAL SCHEMAS MAPPINGS
7.1 Methodology Overview
7.1.1 External Schemas and CS-ES Mapping Structure
7.1.2 Modeling Forms
7.2 Loading the Initial ES & CS-ES Mapping Definition
7.2.1 Loading User-Defined data types
7.2.2 Loading User Views and Data Items
7.2.3 Loading Transformation Algorithms
7.3 Modifying/Deleting ES Elements and CS-ES Mappings
7.3.1 Modifying User-Defined data types
7.3.2 User View Changes/Deletes

APPENDIX A GLOSSARY

APPENDIX B USEFUL REFERENCES

List of Illustrations

Figure  Title

1-1     Data as an Integral Part of the Decision-Making Process
2-1     Two Fundamentally Different Views of Data: Logical and Physical
2-2     Direct Mapping of Logical and Physical Views
2-3     The Three-Schema Architecture
4-1     Relation Classes Form
4-2     Relation Classes Form Example
4-3     Owned Attribute Classes Form
4-4     Owned Attribute Classes Form Example
4-5     Inherited Attribute Classes Form
4-6     Inherited Attribute Classes Form Example
4-7     Refinements of Nonspecific Relation Classes Example
4-8     Triads and Other Dual-Path Structures
4-9     Migration Through Two Relation Classes
4-10    Guidelines for Determining Key Classes of Dependent Entity Classes
5-1     CDM Objects
5-2     CDM Object Description
5-3     CDM Conceptual Schema
5-4     Owned Attribute Classes Form Example
5-5     Entity Class Glossary Form Example
5-6     Inherited Attribute Classes Form Example
5-7     Relation Classes Form Example
6-1     Entity Class/Record Type Mapping
6-2     Join Examples
6-3     Join Structures
6-4     Record Type/Entity Class Mapping Form
6-5     Record Type/Entity Class Mapping Form Example
6-6     Record Type Join Structures Diagram
6-7     Record Type Join Structures Diagram Example
6-8     Data Field/Attribute Use Class Mapping
6-9     Data Field/Attribute Use Class Mapping Example
6-10    Set Type/Relation Class Mapping
6-11    Set Type/Relation Class Mapping Example
6-12    Data Field/Attribute Use Class Mapping Example
6-13    Record Type Join Structure Diagram Example
6-14    Incomplete Join Structure Example
6-15    CDM Tables Distributed Data Bases
6-16    CDM Tables Domains and Data Types for Internal Schema
6-17    CDM Tables Relational Database Internal Schema
6-18    CODASYL Internal Schema
6-19    CS to IS Entity Mapping
6-20    Record Type/Entity Class Mapping
6-21    CS to IS Attribute and Relation Mapping
6-22    Datafield to Attribute Use Class Mapping
6-23    Set Type to Relation Class Mapping
6-24    Record Union
6-25    Horizontal Partition
6-26    Complex Mapping Algorithm
6-27    IMS Internal Schema
6-28    IMS Internal Schema
7-1     Data Item/Attribute Use Class Mappings
7-2     Vertical Partition
7-3     Entity Joins
7-4     ES-CS Join Examples
7-5     ES-CS Join Structures
7-6     Single Entity Views
7-7     Domains and Data Types External Schema
7-8     External Schema and CS/ES Mapping

SECTION 1

INTRODUCTION

The purposes of this document are several and include:

a) Describing the philosophical and practical objectives of the Common Data Model (CDM) Administrator;

b) Discussing the CDM itself, its underlying design, and its role in the IISS environment;

c) Describing in detail the steps necessary in entering and maintaining data kept in the CDM.

After reading and understanding this document, the CDM Administrator should be able not only to collect, enter, and maintain CDM-related data, but also to understand the reasons why such activities are performed.

The NDDL statements used to perform the actual CDM maintenance activities are described in detail in the NDDL User Guide.

1.1 Managing Data as a Corporate Resource

Managing data as a corporate resource is a philosophy about the importance of data to an organization. The approach recognizes that data are assets to be managed along with the other more generally recognized resources of an enterprise, including its personnel, inventories, capital, and so forth. Organizations spend tremendous sums of money collecting and manipulating data, trying to extract information needed to support decision making. The CDM Administrator has as one of his or her primary objectives the preservation of that continuing, substantial investment in data resources. The CDM Administrator plays a major role in protecting and properly managing that investment by managing common data rather than just managing applications that access data.

Data management includes all the activities that ensure that quality data are available to produce needed information and knowledge. The objective of data management is to keep data assets resilient, flexible, and adaptable to supporting decision-making activities in the business. Data management responsibilities include: 1) the representation, storage, and organization of data so that they can be selectively and efficiently accessed, 2) the manipulation and presentation of data so that they support the user environment effectively, and 3) the protection of data so that they retain their value.

The philosophy of the CDM recognizes that data are absolutely necessary to the decision-making cycles of organizations (Figure 1-1). Individuals must not only be able to collect and retain data for their own use, but also be able to share data and pool their knowledge resources. The ability to correlate information across traditional applications boundaries and to provide information that supports all levels of decision making, from operational through tactical through strategic, is increasingly important as management at all levels is becoming more aware of the potential power of information systems.

The CDM provides the capability to pull the enterprise's database resources together to form an integrated, common source of information to support decision making.

The objectives of data management include the following:

o Independence of data access from data descriptions
o Increased data accessibility
o Improved data integrity
o Improved data shareability
o Improved data resiliency
o Improved data administration and control
o Improved data security
o Improved performance

The CDM Administrator needs to understand each of these objectives.

Independence between data access and data descriptions improves control over the data descriptions, facilitates standardization of data-naming conventions, and reduces the programming effort required to accommodate modified data descriptions. Data independence is perhaps the single most important factor in determining the long-range success of a data-driven environment.

[Figure 1-1 diagram labels: Data Pool, Facts, Information, Knowledge, Decisions, Actions]

Figure 1-1. Data as an Integral Part of the Decision-Making Process.

Data accessibility refers to the capability for a user to extract needed information from the data resource. Data accessibility is enhanced by user-friendly interface languages and well designed screens. Good accessibility is characterized by being able to relate data in many different ways to produce information, and by being able to represent that information in a variety of suitable forms. Data accessibility is improved by the CDM in its support of multiple access paths and retrieval sequences through the physical databases. Programming effort for data manipulation is decreased and cost-effective, general-purpose query facilities such as the NDML become possible.

Data integrity is essential to maintain the quality of the data resource. Data integrity is measured by the completeness and consistency of the data resource. Does it contain the data that are relevant to the decision-making needs of the user? Does it contain all required interrelationships among types of data, and are all consistency constraints satisfied?

Data shareability is needed to keep common data truly common. Without shareability, data proliferate and their quality becomes uncontrollable. Without shareability, data are private and personal; their quality is each individual user's responsibility. The main difficulty with this distribution and redundancy of control is that it results in no control at all. Improved shareability can be achieved by supporting multiple access paths through the physical databases, thereby enabling them to serve many diverse needs. Shareability is also achieved by separating individual users' views of the data resource from the actual physical implementation of databases.

Data shareability refers not just to database contents, but also to logic that accesses and manages data. Reduced data duplication streamlines data access, reduces the programming effort required for updating data, and reduces the potential for inconsistent data. Reduced redundancy in the data management effort improves the productivity of data processing personnel.

Data recoverability is needed to keep the data resource resilient in the wake of errors. Error conditions need to be detected and corrected. Better yet, errors should be prevented from occurring in the first place. Part of the difficulty in providing a resilient data resource is continuing to make the data available to users while recovering from errors.

The CDM Administrator should help to ensure that the data resource continues to satisfy users' information needs, even as those needs change through time. Many organizations have successfully established data administration functions to help develop and protect data assets. The CDM Administrator plays a similar role for the integrated, overall data resource.

Data security is essential to prevent unauthorized access to data. Certainly not all environments require the same, elaborate security schemes, but nearly all organizations' data assets need to have some degree of access protection. Some data are wide open to public retrieve-only access; others require strict authentication to provide retrieval. Many databases have more stringent restrictions on accesses that will change database contents than on accesses that only read database contents.

Performance of the data resource has two facets: efficiency and effectiveness. Efficiency is a measure of how well the data system utilizes physical computer support, while effectiveness is a measure of how well the data system meets users' information needs. The characteristics are closely related; for example, a user may be totally dissatisfied with the system if response time is measured in hours rather than seconds. Response time is generally considered to be an efficiency measure, but it certainly has an impact on effectiveness.

SECTION 2

CDM OVERVIEW

2.1 The Fundamental Approach

2.1.1 The Three-Schema Architecture

A key to implementing effective data-oriented environments lies in a framework that is called the Three-Schema Architecture. This approach was proposed in the mid-1970s, then developed, and finally published in 1977 in a report from a committee of the American National Standards Institute - "The ANSI/X3/SPARC DBMS Framework: Report of the Study Group on Data Base Management Systems."

The basic concepts proposed in the report have the power to lead us to more effective information resource management. They are implemented in the CDM.

The Three-Schema Architecture is based upon several fundamental facts:

o Computers and users need to be able to view the same data in different ways

o Different users need to be able to view the same data in different ways

o It is (more or less) frequently desirable for users and computers to change the ways they view data

o It is undesirable for the computer to dictate or constrain the ways that users view data

Thus, it is necessary to be able to support different types of views of a data resource. Users need to be able to work with logical representations of data, which are independent of any physical considerations of how the data are actually stored and managed on computer facilities. Users view data in terms of high-level entities, e.g., staff members, tools, vehicles, products, orders, and customers. Meanwhile, computer facilities, access methods, operating systems, and DBMSs, for example, need to be able to work with more physical representations. They view data in terms of records and files, with index structures, B-trees, linked lists, pointers, addresses, pages, and so forth.

These requirements lead us to conclude first that there are two fundamentally different types of data views: logical and physical. The logical views are user-oriented, while the physical views are computer-oriented (Figure 2-1).

A second conclusion is that there must be a mapping or transformation between the logical and physical views. After all, the ultimate objective is to enable users to gain access to their data that reside on computerized media. This mapping might be simple if there were only one user view and one database, but that is not the real-world situation. Rather, there are multitudes of user views and commonly many (sometimes hundreds or thousands) databases in an enterprise.

Each user view could be mapped directly to the underlying databases (Figure 2-2). This solution suffers, however, when change is introduced in either type of view. If a physical database is restructured on a disk to provide more efficient performance, then the mapping to each of the user views that references that database can be affected. If a logical view is revised to present information in a somewhat different way, then the mapping to each of the referenced databases may be affected. Independence of logical and physical considerations would not have been achieved, and we would find that physical computer factors would constrain the ways that users logically view their data. This is undesirable.

Using three-schema architecture terminology, "external schemas" represent user views of data, while "internal schemas" represent physical implementations of databases. Schemas are metadata, i.e., they are data about data. As a simple example, CUSTOMER-NAME and CHARACTER (17) are metadata describing the data value CHRISTOPHER ROBIN.
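The distinction between metadata and data can be made concrete with a short sketch. The Python below is only an illustration and not part of the CDM or the NDDL; the class `FieldMetadata` and the function `conforms` are hypothetical names introduced here, and the only facts taken from the text are the item name CUSTOMER-NAME, the type CHARACTER (17), and the value CHRISTOPHER ROBIN.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldMetadata:
    """Metadata (data about data) describing one data item. Hypothetical class."""
    name: str       # e.g. CUSTOMER-NAME
    data_type: str  # e.g. CHARACTER
    length: int     # maximum number of characters


def conforms(value: str, meta: FieldMetadata) -> bool:
    """Return True if a data value fits the description given by its metadata."""
    return meta.data_type == "CHARACTER" and len(value) <= meta.length


# The metadata from the example in the text: CUSTOMER-NAME, CHARACTER (17).
customer_name = FieldMetadata(name="CUSTOMER-NAME", data_type="CHARACTER", length=17)

# The data value itself, which the metadata describes.
value = "CHRISTOPHER ROBIN"

print(conforms(value, customer_name))
```

The schema object says nothing about any particular customer; it only describes what a valid value looks like, which is the sense in which schemas are data about data.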

To enable multiple users to share a data resource that is implemented on potentially many physical databases, we insert between the users' views and the physical views a neutral, integrated view of the data resource. This view is called a "conceptual schema" in three-schema architecture terminology. Others sometimes call it an "enterprise view."
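The benefit of the neutral view can be illustrated with simple counting. The sketch below assumes a worst case that this manual does not state: every user view references every database. Under that assumption, direct mapping needs one mapping per view/database pair, while mapping through a conceptual schema needs one mapping per view plus one per database. The function names and the sample counts are hypothetical.

```python
def direct_mappings(num_views: int, num_databases: int) -> int:
    """Worst case for direct mapping: each user view maps to each database."""
    return num_views * num_databases


def mappings_via_conceptual_schema(num_views: int, num_databases: int) -> int:
    """Each view and each database maps once to the single conceptual schema."""
    return num_views + num_databases


# Hypothetical enterprise: 50 user views and 20 physical databases.
views, databases = 50, 20
print(direct_mappings(views, databases))                 # 50 * 20
print(mappings_via_conceptual_schema(views, databases))  # 50 + 20
```

Under the same assumption, restructuring one database can disturb up to one mapping per user view in the direct scheme, but only that database's single mapping to the conceptual schema in the three-schema scheme.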

[Figure 2-1 diagram labels: Logical Data Views, Physical Data Views]

Figure 2-1. Two Fundamentally Different Views of Data: Logical and Physical

[Figure 2-2 diagram labels: User View 1, User View 2, User View 3, and the databases to which they map]

Figure 2-2. Direct Mapping of Logical and Physical Views

As the vehicle for data integration and sharing, the conceptual schema also carries metadata for enforcement of data integrity rules. It is extensible, consistent, accessible, shareable, and enables the data resource to evolve as needs change and mature.

Figure 2-3 illustrates the relationships between the three types of schemas. The schemas and the mappings between them are the mechanism for achieving both data independence and support of multiple views. An internal schema can be changed to improve efficiency and take advantage of new technical developments without altering the conceptual schema.

The conceptual schema represents knowledge of shareable data. There may be access controls and security restrictions placed upon these common data, but they are not restricted to access by only one user. The conceptual schema does not describe personal data.

The scope of the conceptual schema expands through time. The conceptual schema extension methodology continually expands the conceptual schema to include knowledge of more shared data. The external-conceptual mappings protect the external schemas and the transactions/programs that depend on them from most modifications incurred in evolving the conceptual schema.

Adding data to the integrated, common resource does not start over in defining the data resource, nor does it create another stand-alone database. Rather, development of its database must examine questions of how those data relate to what is already known by the conceptual schema. The result will be an integrated data resource whose scope is expanded gradually. It is absolute folly to approach integration of the data resources of an organization all at once; the job must be taken on piecemeal. The conceptual schema is the integrator.

The CDM contains all three types of schemas, as well as the interschema mappings. It not only documents these metadata, but also supplies appropriate metadata to support transaction processing.

[Figure 2-3 diagram labels: External Schemas, Conceptual Schema, Internal Schemas]

Figure 2-3. The Three-Schema Architecture: One Conceptual Schema That Provides for Integration and Independence of Many External Schemas and Many Internal Schemas

2.1.2 Representation of the Three Types of Schemas

In the IISS, the Three-Schema Architecture is implemented through the CDM facilities to store each of the three types of schemas and the interschema mappings. An appropriate representation mode has been selected for each of the three types of schemas.

The conceptual schema is represented by an IDEF1 model. The CDM stores this model in terms of entity classes, attribute classes, and relation classes.

The external schemas are represented by tables. The user views the common data resource in terms of flat, simple tables. The mappings between these tables and the IDEF1 model of the conceptual schema are part of the CDM database.

The internal schemas are represented in terms of physical database components, including record types and inter-record relationships. The CDM Processor routines convert the users' data access requests, which are phrased in terms of tables, into requests against the conceptual schema IDEF1 model, then into requests against the physical database structures described in the internal schema part of the CDM.
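The two-stage conversion described above can be sketched as a pair of lookups. This is a minimal illustration under stated assumptions, not the CDM Processor's actual design: the dictionaries, the function `translate`, and every table, entity class, attribute class, database, record type, and data field name are hypothetical.

```python
# All names below are hypothetical; they only illustrate the two mapping stages.

# External-to-conceptual mapping:
# (user table, column) -> (entity class, attribute class)
ES_TO_CS = {
    ("CUSTOMER_VIEW", "NAME"): ("CUSTOMER", "CUSTOMER-NAME"),
    ("CUSTOMER_VIEW", "CITY"): ("CUSTOMER", "CUSTOMER-CITY"),
}

# Conceptual-to-internal mapping:
# (entity class, attribute class) -> (database, record type, data field)
CS_TO_IS = {
    ("CUSTOMER", "CUSTOMER-NAME"): ("SALESDB", "CUSTREC", "CNAME"),
    ("CUSTOMER", "CUSTOMER-CITY"): ("SALESDB", "CUSTREC", "CCITY"),
}


def translate(table, columns, es_to_cs=ES_TO_CS, cs_to_is=CS_TO_IS):
    """Convert a table-level request into physical data field references."""
    # Stage 1: rephrase the table columns as conceptual schema attributes.
    conceptual = [es_to_cs[(table, column)] for column in columns]
    # Stage 2: rephrase the conceptual attributes as physical data fields.
    return [cs_to_is[item] for item in conceptual]


print(translate("CUSTOMER_VIEW", ["NAME", "CITY"]))

# Restructuring the physical database touches only the CS-to-IS mapping;
# the user's table and the external-to-conceptual mapping are unchanged.
restructured = dict(CS_TO_IS)
restructured[("CUSTOMER", "CUSTOMER-CITY")] = ("SALESDB", "ADDRREC", "CITY")
print(translate("CUSTOMER_VIEW", ["NAME", "CITY"], cs_to_is=restructured))
```

Keeping the two mappings separate is what lets an internal schema change without disturbing the user's view of the data.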

2.1.3 Integration Methodology

The Integration Methodology is the set of procedures and guidelines that are used to expand the conceptual schema and to increase the sphere of common data available to support users and applications. The schemas and schema mappings in the CDM are built, maintained, and accessed using the Integration Methodology and the CDM Processor (CDMP).

The Integration Methodology is intended to guide the CDM Administrator in building and maintaining the conceptual schema and in keeping its mappings to the internal and external schemas highly accurate. This methodology consists of a set of techniques for building the conceptual schema in gradual increments, for building external and internal schemas from portions of the conceptual schema, for developing schema mappings, and for keeping these various CDM components current.

The first step in populating the CDM is to select a portion of the data and to document it in the conceptual schema. Then external and internal schemas for those data are built and mapped to the conceptual schema. Subsequently, other portions of the data resource are incorporated into the conceptual schema, and new external and internal schemas and mappings are developed. The CDM is populated gradually, in increments, rather than all at once. It evolves through time.

A conceptual schema is represented by a semantic data model. The IISS uses the IDEF1 methodology, with certain extensions from DACOM's Data Modeling Technique. (Subsequent to the development of the CDM subsystem, IDEF1 was formally extended. See Appendix B for references.) The data model reflects business policy, provides a rigorous view of the meaning of the data resource, and is independent of the physical implementation of the data resource.

Building a data model is a rigorous procedure, whose objective is to discover and document the semantic data structure in its most fundamental terms. The modeling is a multi-step process that requires substantial input from users who are expert in the subject area.

The fundamental steps of the CDM Integration Methodology are as follows:

1. Identify the scope of the initial increment of the conceptual schema.

2. Develop the data model for that initial increment of the conceptual schema.

3. Load the data model into the CDM database.

4. Identify any physical databases or files within the scope of data in the conceptual schema.

5. Load their internal schemas into the CDM database.

6. Build the conceptual-to-internal schema mappings for the internal schemas loaded in Step 5.

7. Load the conceptual-to-internal schema mappings into the CDM database.

8. Determine which users/application programs should have external schemas mapped from the conceptual schema.

9. Design the external schemas identified in Step 8, and their mappings to the conceptual schema.

10. Load the external schemas and external-to-conceptual schema mappings into the CDM database.

11. Identify the scope of the next increment to the conceptual schema.

12. Develop the data model for the next increment of the conceptual schema.

13. Integrate the data model from Step 12 with the data model of the existing conceptual schema.

14. Load the integrated data model into the CDM database.

15. Verify that the conceptual-to-internal and external-to-conceptual schema mappings are still valid, correcting them as needed.

16. Identify any additional physical databases or files that are now within the scope of the extended conceptual schema.


17. Load their internal schemas into the CDM database.

18. Build the conceptual-to-internal schema mappings for the incremented portions of the conceptual schema.

19. Load the conceptual-to-internal schema mappings into the CDM database.

20. Identify any additional users or application programs that should be supported by the extended conceptual schema.

21. Design external schemas to support the users/application programs identified in Step 20, and develop their external-to-conceptual schema mappings.

22. Load the external schemas and external-to-conceptual schema mappings from Step 21 into the CDM database.

23. Repeat Steps 11 through 22 for each increment to the conceptual schema.
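The incremental cycle of Steps 1 through 23 can be pictured as a loop that extends the conceptual schema, loads new schemas and mappings, and then re-verifies every mapping (Step 15). The Python sketch below is a deliberately reduced illustration under stated assumptions: the `Cdm` and `Increment` classes are hypothetical, the conceptual schema is reduced to a set of entity class names, and model integration (Step 13) is reduced to a set union. None of this reflects the actual CDM database layout.

```python
from dataclasses import dataclass, field

@dataclass
class Increment:
    entities: set                                   # entity classes modeled in this increment
    internal: dict = field(default_factory=dict)    # internal schema -> {field: entity class}
    external: dict = field(default_factory=dict)    # external schema -> {column: entity class}

@dataclass
class Cdm:
    conceptual: set = field(default_factory=set)    # entity classes known to the CDM
    cs_to_is: dict = field(default_factory=dict)    # conceptual-to-internal mappings
    es_to_cs: dict = field(default_factory=dict)    # external-to-conceptual mappings

def invalid_mappings(cdm):
    """Step 15: list mapped items whose entity class is unknown to the conceptual schema."""
    bad = []
    for kind, schemas in (("internal", cdm.cs_to_is), ("external", cdm.es_to_cs)):
        for schema, mapping in schemas.items():
            for item, entity in mapping.items():
                if entity not in cdm.conceptual:
                    bad.append((kind, schema, item))
    return bad

def apply_increment(cdm, inc):
    """Load one increment, then return any mappings that need correction."""
    cdm.conceptual |= inc.entities       # Steps 2-3 and 12-14 (integration as set union)
    cdm.cs_to_is.update(inc.internal)    # Steps 4-7 and 16-19
    cdm.es_to_cs.update(inc.external)    # Steps 8-10 and 20-22
    return invalid_mappings(cdm)

# Hypothetical increments: the second maps a field to an entity class not yet modeled.
cdm = Cdm()
first = Increment({"Part"},
                  internal={"INVDB": {"PNUM": "Part"}},
                  external={"PART_VIEW": {"PART_NO": "Part"}})
second = Increment({"Plant"},
                   internal={"SITEDB": {"SNAME": "Plant", "OWNER": "Supplier"}})
problems_after_first = apply_increment(cdm, first)
problems_after_second = apply_increment(cdm, second)
```

In this example the first increment loads cleanly, while the second reports the `OWNER` field because its entity class has not yet been brought into the conceptual schema.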

The evolutionary strategy for the conceptual schema should be developed early in the life of the above cycle. The strategy should ensure that the common data resource evolves in a manner that serves the enterprise's need for controlled, shared data. One tactic is to define the initial scope by that of an existing database that has a corresponding data model. Ideally, that database would contain core information of high interest to the target user community.

Perhaps the most important point to understand about the CDM Integration Methodology is that the incorporation of additional data into the common data resource MUST be done in conjunction with the existing conceptual schema. No data can be accessed using the CDM integrated facilities, including the Neutral Data Manipulation Language, unless they are known to the CDM. Adding data causes the conceptual schema to expand in a consistent manner that enables integration to occur. By contrast, adding data to an environment that does not use conceptual schema technology just adds more fragmentation to what is probably already at best an interfaced (not integrated) system.

Applying the CDM Integration Methodology is not like swallowing a pill. It requires precise knowledge of the meanings of the data that are to be available in the integrated common data resource. It means not just building IDEF1 models for those databases, but also analyzing the models for overlap, synonyms, homonyms, and all the incipient anomalies and quirks that somehow have crept into our database structures over the years. The cost is measured in man-months of effort; the benefits are integration and a knowledge base that can be built on and evolved in the future.


2.1.4 Contributions to IRRIASSPA

The use of the Common Data Model and the Three-Schema Architecture allows an organization to benefit from contributions to IRRIASSPA, which are part of the objectives of the USA's Integrated Computer Aided Manufacturing (ICAM) project to develop the Integrated Information Support System (IISS). The contributions can best be summarized as follows:

Independence - the IISS allows the separation of the description and manipulation of logical data structures from the actual physical data representations and isolates implementation changes from user views and programs.

Relatability - the NDDL used in building the CDM allows the CDM Administrator to define, modify, and maintain relationships among data.

Resiliency/Recoverability - although not specifically addressed by the CDM, the design of the CDM Processor provides the ability to recover from failures without damage to the data resource.

Integrity - is provided through the use of data integrity constraints, which the application may specify and the CDM Processor enforces.

Accessibility - the NDDL allows the definition of data that resides not only in different databases but also on different computers.

Security - not expressly addressed by the CDM.

Shareability - is provided by support of multiple user views (i.e., external schemas) of the data resource.

Performance - the NDML, by use of the CDM, allows data from multiple resources to be addressed in a cost-effective manner in a distributed environment.

Administration - by providing a means of documenting the meanings in the data resource and of providing a vehicle by which consistency can be maintained even as the scope of the CDM is extended. It also allows the maintenance of information about data in different databases.

2.2 Basic Components of the Design

The Common Data Model (CDM) subsystem comprises three components:

1. The CDM database, which is the database dictionary of the IISS

2. A logical model of the CDM database called CDM1

3. The CDM Processor (CDMP), which is the distributed database manager of the IISS


This section will briefly discuss each of these basic components and show how they interrelate, one with another.

2.2.1 The CDM Database

The CDM database is the database dictionary of the IISS. It captures knowledge of the locations, characteristics, and interrelationships of all shared data in the system. The most significant feature of the CDM database is that it implements the ANSI/X3/SPARC concepts of the three-schema approach to data management. These three types of schemas are the conceptual schema (CS), the internal schemas (IS), and the external schemas (ES).

The conceptual schema describes a neutral, integrated view of the shared data resource. There is one conceptual schema in an enterprise. It is independent of physical database structures and boundaries and is neutral to biases of individual applications. Each external schema represents a user or application view of data. Requests are made against external schemas. Each internal schema represents an external schema to the local DBMS.

The CDM database is implemented as a relational database, which presently resides on a VAX 11/780 computer. It is accessed by the CDMP at compile-time to generate appropriate local DBMS calls against internal schemas to process a user's NDML request against an external schema.

The CDM database is represented logically using a semantic data modeling technique called IDEF1. This method of data modeling is a hybrid of the entity-relationship approach, the relational model, and the Smith's 2D data abstraction approach. This logical model of the CDM database is called CDM1.

2.2.2 CDM1

CDM1 is a model of metadata, i.e., data about data. It gives the logical structure of the CDM database which maintains the metadata. These metadata describe the meanings and characteristics of user data.

The conceptual schema portion of the CDM1 model is related to portions that describe internal and external schemas. An internal schema describes a local database structure in just enough detail to give the CDMP adequate information to generate code that can be processed by the pertinent local DBMS. Because one of the requirements of the IISS is that it provide integration of data in existing databases, the mappings between the conceptual schema metadata and the internal schema metadata are not simple. IISS does not have the luxury of supporting only certain clean database structures. It is very likely that an attribute may be represented by one or more data files, which may be in different databases and even on different computers, or by relationships between record types.

An external schema describes the portion of the conceptual schema that is within the purview of a user or application. An external schema is equivalent to a view in the relational model.


The conceptual-to-external schema mapping part of the CDM1 is straightforward. The present implementation of the CDM subsystem supports any external schema that can be formed by joining conceptual schema entities and selecting attributes.

Thus, the CDM1 model is a semantic data model that describes the logical structure of the CDM database. The CDM1 represents the conceptual schema, the internal schemas and their mappings from the conceptual schema, and the external schemas and their mappings from the conceptual schema.

2.2.3 The CDM Processor

The CDMP is the distributed database manager of the IISS. It builds on top of local DBMS services to provide data access. The CDMP plays both a compile-time and a run-time role in the processing of transactions. The compile-time component is called the CDMP Precompiler. The run-time components are called the CDMP Distributed Request Supervisor (DRS) and the CDMP Aggregator.

2.2.3.1 CDMP Precompiler

The CDMP Precompiler performs the following functions for each data request:

1. Parses the request

2. Transforms the request from an external schema access to a conceptual schema access

3. Decomposes the request into subrequests, each of which accesses one internal schema

4. Determines an appropriate access path for each subrequest, generating code that can be processed by the pertinent local DBMS

5. Generates code to transform any data to be extracted from local databases from internal to conceptual schema format (this code is called a Request Processor or RP)

6. Generates code to transform any data results from conceptual to external schema format and to perform statistical calculations (this code is called a C/E Transformer or CEX)

7. Generates code to invoke appropriate RPs and CEXs at run-time, via calls to the NTM Subsystem

The CDMP Precompiler accesses the CDM database to find metadata for the inter-schema transforms and integrity constraints for update requests.

After successful precompilation of a user's program, which contains embedded data requests in a SQL-like language called the Neutral Definition/Manipulation Language (NDML), the CDMP has produced the following code modules:


1. A modified user program that will activate the appropriate processes (RPs and CEXs) at run-time.

2. One Request Processor (RP) per DBMS that manages data to be accessed by the user program.

3. One Conceptual-to-External Transformer (CEX), which will deliver query results to the modified user program at run-time.
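The decomposition performed by the Precompiler (function 3) and the resulting "one RP per DBMS" outcome can be sketched in Python as a grouping of requested columns by the database that holds them. This is an illustration only, not the Precompiler's algorithm: the two metadata dictionaries stand in for lookups against the CDM database, and every column, attribute, database, and field name is hypothetical.

```python
from collections import defaultdict

# Hypothetical metadata standing in for lookups against the CDM database.
ES_TO_CS = {                      # external column -> conceptual attribute
    "PART_NO": "Part.Number",
    "SITE": "Plant.Name",
}
CS_TO_IS = {                      # conceptual attribute -> (database, physical field)
    "Part.Number": ("DB_A", "PARTS.PNUM"),
    "Plant.Name": ("DB_B", "PLANT.NAME"),
}

def decompose(columns):
    """Group the requested columns into one subrequest per database."""
    subrequests = defaultdict(list)
    for column in columns:
        attribute = ES_TO_CS[column]            # external -> conceptual access
        database, physical = CS_TO_IS[attribute]
        subrequests[database].append(physical)  # one subrequest per internal schema
    return dict(subrequests)
```

Under these assumptions, a request for `PART_NO` and `SITE` yields two subrequests, one for each database, and therefore two Request Processors would be generated.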

2.2.3.2 Distributed Request Supervisor

There are presently two CDMP Distributed Request Supervisors (DRSs), one residing on the IBM node and the other on the VAX, which have responsibility for scheduling and coordinating the various subrequests of user transactions. The DRS uses request graphs produced by the CDMP Precompiler to determine which operations are to be performed where. The DRS also uses knowledge of communications costs and intermediate result volumes in its algorithm for scheduling RPs.

Request Processors always deliver results as relations. The relations are operated upon by the Aggregators.

2.2.3.3 Aggregators

An Aggregator is called to perform a single function (for example, a union, a join, or an outer join) on two sets of data, each of which exists in a single sequential file. These data sets are the results of an RP or another Aggregator.

An Aggregator always deals with data in conceptual schema format.
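The three Aggregator functions named above are standard relational operations. The Python sketch below shows one way to express them, assuming each data set has already been read into memory as a list of rows (dictionaries with hashable values) and that the two sets share only the join column. The real Aggregators work on sequential files; the function names here are hypothetical.

```python
def union(r, s):
    """Union of two relations with the same columns; duplicate rows are dropped."""
    seen, out = set(), []
    for row in r + s:
        key = tuple(sorted(row.items()))  # order-independent identity of the row
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out

def _index(rows, on):
    """Group rows by their value in column `on`."""
    index = {}
    for row in rows:
        index.setdefault(row[on], []).append(row)
    return index

def join(r, s, on):
    """Inner equi-join of r and s on column `on`."""
    index = _index(s, on)
    return [{**a, **b} for a in r for b in index.get(a[on], [])]

def left_outer_join(r, s, on):
    """Like join, but rows of r with no match in s are kept, padded with None."""
    index = _index(s, on)
    s_only = {col for row in s for col in row} - {on}
    out = []
    for a in r:
        matches = index.get(a[on])
        if matches:
            out.extend({**a, **b} for b in matches)
        else:
            out.append({**a, **dict.fromkeys(s_only)})
    return out

# Hypothetical data sets, as might be delivered by two Request Processors.
plans = [{"plan": "P1", "status": "open"}, {"plan": "P2", "status": "closed"}]
ops = [{"plan": "P1", "op": "drill"}]
```

With these data sets, `join(plans, ops, "plan")` returns only the row for `P1`, while `left_outer_join(plans, ops, "plan")` also keeps `P2` with `op` set to `None`.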


SECTION 3

RESPONSIBILITIES OF THE CDM ADMINISTRATOR

The role that the CDM Administrator (CDMA) plays in the IISS environment is not unlike that of the database administrator in that the CDMA is responsible for the following:

1. Establishing Data Standards

2. Maintaining the CDM

3. Protecting the CDM

4. Facilitating Use of the CDM

Each of these areas is of major importance to the organization, and a failure to properly administer any of these areas of responsibility can cost the organization dearly.

3.1 Establishing Data Standards

One of the early roles of the CDMA is the establishment of data standards. Part of this work has already been initiated during the development of the CDM1. The work that remains is to determine what types of standards to implement and to gain acceptance for the use of these standards. It should be noted that, without acceptable standards, it will be difficult, if not impossible, for the CDMA to enforce any level of standardization.

3.2 Maintaining the CDM

The CDMA must maintain the CDM. This entails the building of the initial conceptual schema (CS), internal schemas (IS), CS to IS mappings, external schemas (ES), and ES to CS mappings, as well as extending the model and modifying and deleting elements as needed. It is to be expected that the need for extending and modifying the CDM will grow over time, slowly at first, then rapidly as the benefits of the concept are proved, before leveling off after several years.

3.3 Protecting the CDM

One of the most important responsibilities of the CDMA is the protection of the CDM against loss, theft, and corruption, be it intentional or not. At issue is the substantial investment that went into the development of the CDM and the potential damage that can be caused to the enterprise should the data fall into the wrong hands.

3.4 Facilitating Use of the CDM

The CDMA must make the CDM available to all those who can potentially gain from the use of the CDM and have legitimate reason to do so. This may involve making the CDM available on other computers in the network. It also involves communicating with the CDM users and potential users as to the contents and performance of the CDM, as well as the usability of the data. Part of this communication will involve solving problems, answering questions, and reporting the status of the CDM.


SECTION 4

MAINTAINING THE CONCEPTUAL SCHEMA

4.1 Methodology Overview

This section and its subsections (4.2 - 4.3) introduce the methodology for building and updating a conceptual schema. The portion of the CDM database that contains a conceptual schema is described, and the basic approach to developing a conceptual schema is presented. Detailed instructions for filling out the modeling forms are included.

4.1.1 CS Structure

A conceptual schema is essentially a single IDEF1 model that describes all of the common data in an enterprise. Consequently, its components are those of any IDEF1 model:

Entity Classes
Relation Classes
Attribute Classes
Attribute Use Classes
Inherited Attribute Use Classes
Key Classes
Key Class Members

Detailed explanations of these can be found in the IDEF1 documentation. (Extensions to the IDEF1 language, referenced in Appendix C, simplify the IDEF1 terminology used here.)

In addition to the usual metadata (data about data) contained in any IDEF1 model, the conceptual schema requires certain new elements of metadata. Key class numbers are assigned to enable alternate key classes for the same entity class to be distinguished from one another. Tag numbers, tags (names), and tag labels are assigned to enable attribute use classes within the same entity class to be distinguished from one another. Data types and sizes are identified for all attribute classes.

The conceptual schema must conform to several rules that cause the data relationships and descriptions to be as explicit as possible. (Note: In these rules the phrase "any number" includes the possibility of zero.)

1. Single-Owner Rule: An entity class can own any number of attribute classes. Every attribute class is owned by exactly one entity class.

2. Every entity class contains one or more attribute use classes. Every attribute use class is contained in exactly one entity class.


3. Every attribute class appears as exactly one attribute use class in its owner entity class. An attribute class can also appear as any number of attribute use classes in any number of other entity classes. Every attribute use class corresponds to exactly one attribute class.

4. Every entity class has one or more key classes. Every key class is for exactly one entity class.

5. Every key class is composed of one or more key class members. Every key class member is in exactly one key class.

6. An attribute use class can be used as a member of any number of key classes for the entity class in which it is contained. An attribute use class cannot be used as more than one member of the same key class; i.e., every member of a key class must be a different attribute use class. An attribute use class in one entity class cannot be used as a member of a key class for any other entity class. Every key class member is exactly one attribute use class.

7. An entity class can be independent in any number of relation classes and dependent in any number. An entity class cannot be both independent and dependent in the same relation class. Every relation class has exactly two entity classes: one independent, one dependent.

8. A key class can migrate through any number of relation classes in which its entity class is independent. A key class cannot migrate through a relation class in which its entity class is dependent or one in which its entity class is not involved. Every relation class has exactly one key class from the independent entity class migrating through it into the dependent entity class.

9. Every relation class is a migration path for one or more inherited attribute use classes, one for each member of the key class that migrates through it. Every inherited attribute use class has exactly one relation class as its migration path.

10. Every member of the key class that migrates through a relation class creates exactly one inherited attribute use class in the dependent entity class for that relation class. Every inherited attribute use class is created from exactly one key class member.

11. Every attribute use class in an entity class represents either one attribute class that is owned by that entity class or one inherited attribute use class that migrated into that entity class. Every inherited attribute use class is represented by exactly one attribute use class.


12. Unique-Key Rule: No two entity instances in an entity class can have identical values in the same key class for that entity class. For a multi-member key class, instances can have identical values for some members, but not for all.

13. No-Null Rule: Every entity instance in an entity class has a value in each attribute use class in that entity class.

14. No-Repeat Rule: No entity instance in an entity class can have more than one value in any attribute use class in that entity class. This rule is equivalent to the first normal form in the relational database model.

15. Full-Functional-Dependency Rule: No entity instance in an entity class can have a value in an owned, nonkey attribute use class that can be identified by less than the entire key value for that entity instance. This rule applies only to entity classes with multi-member key classes and is equivalent to the second normal form in the relational database model.

16. No-Transitive-Dependency Rule: No entity instance in an entity class can have a value in an owned, nonkey attribute use class that can be identified by the value in another owned or inherited, nonkey attribute use class in that entity class. This rule is equivalent to the third normal form in the relational database model.

17. Smallest-Key-Class Rule: No entity class with a multi-member key class can be split into two or more entity classes, each with fewer members in its key class, without losing some information. This rule is a combination and extension of the fourth and fifth normal forms in the relational database model.
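Rules 12 through 14 constrain entity instances rather than the model structure, so they can be checked mechanically against sample data. The Python sketch below is one hypothetical way to do so; it is not a CDM utility. It assumes each entity instance is a dictionary keyed by attribute use class name, that a missing or `None` value means "no value", and that a list, tuple, or set stands for a repeating value. The function name and the violation labels are invented for the example.

```python
def check_instances(instances, key_classes, attributes):
    """Return (rule, instance index, name) tuples for violations of Rules 12-14.

    instances   -- list of dicts, one per entity instance
    key_classes -- {key class name: tuple of member attribute use class names}
    attributes  -- every attribute use class name in the entity class
    """
    violations = []

    # Rules 13 and 14: exactly one value in every attribute use class.
    for i, instance in enumerate(instances):
        for attr in attributes:
            value = instance.get(attr)
            if value is None:
                violations.append(("no-null", i, attr))
            elif isinstance(value, (list, tuple, set)):
                violations.append(("no-repeat", i, attr))

    # Rule 12: no two instances share the full value of any key class.
    for name, members in key_classes.items():
        seen = set()
        for i, instance in enumerate(instances):
            # repr() keeps the key hashable even when a value is a list.
            key = tuple(repr(instance.get(m)) for m in members)
            if key in seen:
                violations.append(("unique-key", i, name))
            seen.add(key)
    return violations

# Hypothetical instances of one entity class with a single-member key class K1.
sample = [
    {"no": 1, "name": "a"},
    {"no": 1, "name": "b"},          # repeats the key value of the first instance
    {"no": 2, "name": None},         # no value for "name"
    {"no": 3, "name": ["x", "y"]},   # repeating value for "name"
]
found = check_instances(sample, {"K1": ("no",)}, ("no", "name"))
```

Rules 15 through 17 concern functional dependencies, which depend on the meaning of the data and cannot be confirmed from a sample of instances alone, so the sketch leaves them out.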

4.1.2 Basic Approach (Onion Concept)

The complete conceptual schema for an enterprise contains thousands of entity classes and a corresponding number of relation classes, attribute classes, etc. It is much too large to be built all at once. Instead, it must be built in increments -- each one building on the prior ones, until the conceptual schema is complete. The increments are like the layers of an onion; as each layer is added, the onion gets a little larger.

The process of "growing" the conceptual schema involves two procedures, both of which are enhanced versions of the IDEF1 modeling procedure. The first is used to build the initial increment only. The second is used to build each additional increment. The only difference between the two is that the second must be concerned about the integration of the new increment with the existing conceptual schema. This involves being continually aware of which components of the conceptual schema are within the scope of the new increment and how any of those components will be affected by the addition of the new increment. These two procedures are in Sections 4.2 and 4.3, respectively.


4.1.3 Modeling Forms

Because the methodology for maintaining the conceptual schema is based on the IDEF1 information modeling methodology, it uses most of the IDEF1 forms:

Source Material Log
Source Data List
Entity Class Pool
Entity Class Definition
Relation Class Matrix
Attribute Class Pool
Kit Cover Sheet
Entity Class Diagram (optional)
Relation Class Definition (optional)
Attribute Class Diagram (optional)
Entity Class/Attribute Class Matrix (optional)
Attribute Class Migration Index (optional)
Author Page Control Log (optional)
Index Control Log (optional)
Kit Control Log (optional)
Text Control Log (optional)
FEO Control Log (optional)
Entity Class Set Control Log (optional)
Entity Class/Function View Matrix (optional)

Please refer to the IDEF1 documentation for detailed descriptions of these forms.

A few of the regular IDEF1 forms have certain shortcomings that make them unsuitable for use in directly loading the conceptual schema tables into the CDM database. The forms listed below were designed to eliminate those shortcomings:

Relation Classes
Owned Attribute Classes
Inherited Attribute Classes

The rest of this section contains a detailed description and two samples (one blank, one filled in) of each of these forms.

NOTE: When using the NDDL (see Neutral Data Definition Language Users Guide, Pub. No. UM 620341100) for maintaining the conceptual schema in the CDM database, names should be substituted for any/all numbers on the modeling forms. A discussion of the NDDL can be found in Subsection 5.1.1.

Relation Classes Form

Purpose:

To provide a single source of information about relation classes that are to be described in the conceptual schema.


Instructions:

Fill in one or more pages for each entity class that is independent in a relation class. List only those relation classes in which the entity class is independent; do not list any relation classes in which it is dependent. Do not fill in a page for an entity class that is dependent in all of its relation classes.

Form Area: Explanation

1. Independent Entity Class Name: Name of the entity class that is independent in the relation class. This will be the same for all relation classes entered on a page. It is included only to make the entry readable; it is not used in loading the conceptual schema.

2. Relation Class Label: Label of the relation class. This is part of the unique identification of a relation class.

3. R.C. Card.: Symbol for the cardinality of the relation class.

4. Dependent Entity Class Name: Name of the entity class that is dependent in the relation class. It is included only to make the entry readable; it is not used in loading the conceptual schema.

5. Dep. E.C. No.: Number of the entity class that is dependent in the relation class.

6. Ind. K.C. No.: Number of the key class in the independent entity class that migrates through the relation class into the dependent entity class.

7. Node: Number of the entity class that is independent in all of the relation classes listed on the page.

All other form areas correspond to areas on the regular IDEF1 forms. Please refer to the IDEF1 documentation for details about those areas.
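One line of the Relation Classes form can be pictured as a small record in which the two name areas exist only for readability. The Python sketch below is illustrative: the class, the function, the dictionary keys, and the sample values (including the cardinality symbol) are all hypothetical and do not describe the actual CDM load format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RelationClassRow:
    independent_entity_name: str  # area 1, readability only
    label: str                    # area 2, relation class label
    cardinality: str              # area 3, cardinality symbol
    dependent_entity_name: str    # area 4, readability only
    dependent_ec_no: str          # area 5, dependent entity class number
    independent_kc_no: str        # area 6, migrating key class number

def load_record(node, row):
    """Keep only the areas that the form says are used in loading.

    `node` is area 7: the number of the entity class that is independent
    in every relation class listed on the page.
    """
    return {
        "ind_ec_no": node,
        "label": row.label,
        "cardinality": row.cardinality,
        "dep_ec_no": row.dependent_ec_no,
        "ind_kc_no": row.independent_kc_no,
    }

# Hypothetical form line on the page for entity class E12.
row = RelationClassRow("Op Exec Plan", "Has", "1:M", "Operation", "E10", "K1")
record = load_record("E12", row)
```

Dropping the two name areas in `load_record` mirrors the form's statement that they are not used in loading the conceptual schema.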


Figure 4-1. Relation Classes Form

Figure 4-2. Relation Classes Form Example


Owned Attribute Classes Form

Purpose:

To provide a single source of information about owned attribute use classes that are to be described in the conceptual schema.

Instructions:

Fill in one or more pages for each entity class that owns an attribute use class, either key or nonkey. List only those attribute use classes that are owned by the entity class; do not list any attribute use classes that are inherited by the entity class. Do not fill in a page for an entity class that contains only inherited attribute use classes.

Form Area: Explanation

1. Tag No.: Tag number for the attribute use class.

2. A.C. Name & Label: Name, label, and any synonyms of the attribute use class. The name is listed first. The label is enclosed in parentheses and placed on the line below the name. If the name and label are identical, the label can be omitted. If the attribute use class has any synonyms, the term "Synonyms:" is placed below the name and label and the synonyms are listed under it.

3. A.C. No.: Attribute class number for the attribute use class.

4. A.C. Definition: Definition of the attribute use class.

5. Type ID.: Format description for the attribute use class indicating data type (numeric, character, etc.), length, and decimal length (if applicable). The data type must be one from the CDM Data Type Table.

6. Mbr. of K.C. No.: Number(s) of the key class(es) to which the attribute use class belongs, if any.

7. Node: Number of the entity class that owns all of the attribute use classes listed on the page.


All other form areas correspond to areas on the regular IDEF1 forms. Please refer to the IDEF1 documentation for details about those areas.

Inherited Attribute Classes Form

Purpose:

To provide a single source of information about inherited attribute use classes that are to be described in the conceptual schema.

Instructions:

Fill in one or more pages for each entity class that inherits an attribute use class. List only those attribute use classes that are inherited by the entity class; do not list any attribute use classes that are owned by the entity class. Do not fill in a page for an entity class that contains only owned attribute use classes.

Form Area: Explanation

1. Tag No.: Tag number for the attribute use class.

2. Tag & Label: Name, label, and any synonyms of the attribute use class. The name is listed first. The label is enclosed in parentheses and placed on the line below the name. If the name and label are identical, the label can be omitted. If the attribute use class has any synonyms, the term "Synonyms:" is placed below the name and label, and the synonyms are listed under it.

3. A.C. No.: Attribute class number for the attribute use class.

4. Ind. E.C. No.: Number of the independent entity class from which the attribute use class was inherited.

5. Ind. K.C. No.: Number of the key class in the independent entity class that migrated through the relation class named in the "Migration Path R.C. Label" area.

6. Ind. Tag No.: Tag number of the attribute use class in the independent entity class that migrated to become this attribute use class.

7. Migration Path R.C. Label: Label of the relation class through which the attribute use class was inherited.

8. Mbr. of K.C. No.: Number(s) of the key class(es) to which the attribute use class belongs, if any.

9. Node: Number of the entity class that contains all of the attribute use classes listed on the page.

All other form areas correspond to areas on the regular IDEF1 forms. Please refer to the IDEF1 documentation for details about those areas.
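Several areas of the Inherited Attribute Classes form follow directly from the key-migration rules of Subsection 4.1.1: each member of the key class that migrates through a relation class gives rise to one inherited attribute use class in the dependent entity class. The Python sketch below derives those areas for one relation class. It is illustrative only and rests on stated assumptions: the class and function names and all sample numbers are hypothetical, and the inherited attribute use class is assumed to carry the same attribute class number as the key class member it came from. The new tag number, tag, and label are assigned by the modeler and are not derived here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RelationClass:
    label: str       # relation class label (the migration path)
    ind_ec_no: str   # independent entity class number
    dep_ec_no: str   # dependent entity class number
    ind_kc_no: str   # key class that migrates through the relation class

@dataclass(frozen=True)
class InheritedRow:
    ac_no: str           # area 3, attribute class number (assumed unchanged)
    ind_ec_no: str       # area 4
    ind_kc_no: str       # area 5
    ind_tag_no: str      # area 6
    migration_path: str  # area 7
    node: str            # area 9, the dependent entity class

def inherited_rows(relation, key_members):
    """Create one inherited row per member of the migrating key class.

    key_members -- {(entity class no., key class no.): [(tag no., A.C. no.), ...]}
    """
    members = key_members[(relation.ind_ec_no, relation.ind_kc_no)]
    return [
        InheritedRow(ac_no, relation.ind_ec_no, relation.ind_kc_no,
                     tag_no, relation.label, relation.dep_ec_no)
        for tag_no, ac_no in members
    ]

# Hypothetical two-member key class K1 of entity class E20 migrating into E61.
key_members = {("E20", "K1"): [("T28", "A09"), ("T29", "A11")]}
relation = RelationClass("Is For", "E20", "E61", "K1")
rows = inherited_rows(relation, key_members)
```

A two-member key class therefore produces two lines on the dependent entity class's form, both naming the same migration path.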


USDA AU71,401% DATE O"( IWAIV1I flAT i I EMoACf ntv O NF (

IO T S 1 2 3 4 S 7

_ _ t o I_

_ __IA IO_

Tag AC Type Mb OfNo A C Name & Label No. A C. Defnlion 1_ KC No

0 0 © 0

CO( ( TIVL Owned Atribute Classes NU4 n

Figure 4-3. Owned Attribute Classes Form

4-11

UM 62034100130 September 1990

US OT UIK)I ACF (EMon) AT Ag19e3 IX VIN 41I All It fli CONTEXTJIO,(C T 6201M MCMM ntv f IAr

NOES 12 3a5 6 7 0 I j'll C OMW P D _____

Tog A N &IiA C. AC DiType Mbr ofNo. ACNae&LblNo. _______ Definition ___ ID K C. No

737 Opetamon Eveculoon Plan Group ldenbticaIton AtO A unklue idinler aSSqned To wientify Nfdf K01(01 P 0VID) groups ol opefalion excclool pLins

T134 Stalus A34 A CO that 'inifatei uwee a gimi, of c(sfOperalen .secubon plA14 IS WdhWn 13Slile cyci.

TM3 Toua Opetation Esecubion Plans A35 The Wlo riumbt of operalion erpectron Mitt(70WEPS)plansthVat makie up Ihe cyoup

F~t 12l Owned Altribute Classes NUM1 69

Figure 4-4. Owned Attribute Classes Form Example


[Form image: blank Inherited Attribute Classes Form. Column headings: Tag No.; Tag & Label; A.C. No.; Ind. E.C. No.; Ind. K.C. No.; Ind. Tag No.; Migration Path R.C. Label; Mbr. of K.C. No.]

Figure 4-5. Inherited Attribute Classes Form


[Form image: completed Inherited Attribute Classes Form. The entries are largely illegible in this copy; one legible row is Destination Manufacturing Area (Tag No. T192, A.C. No. A07, Ind. E.C. No. E24).]

Figure 4-6. Inherited Attribute Classes Form Example


4.2 Building the Initial CS

This section and its subsections (4.2.1 - 4.2.5) describe the procedure for initiating an enterprise's conceptual schema. The procedure is concerned with creating a detailed description (an information model) of a portion of the enterprise's common data and with collecting the data required to place that description in the CDM database as the first piece of the conceptual schema (the first layer of the onion). It is not concerned with deciding which portion of the common data to describe nor with setting up the CDM database and its utilities; these things must be done before starting the procedure. The procedure consists of six phases, the first five of which are patterned after those in IDEF1. The five IDEF1 phases are as follows:

o Phase 0 - Starting the Project

o Phase 1 - Defining Entity Classes

o Phase 2 - Defining Relation Classes

o Phase 3 - Defining Key Classes

o Phase 4 - Defining Nonkey Attribute Classes

The procedure for the sixth phase, which consists of populating the CDM database with the conceptual schema, is described in Section 5. Each IDEF1 phase is described in a subsequent subsection.

4.2.1 Phase 0: Starting the Project

Objectives:

o State the purpose, scope, and viewpoint for the information model.

o Establish the project team.

o Develop a phase-level project schedule.

o Collect and catalog relevant source material.

This phase is patterned after Phase 0 of IDEF1, and the description presented here is less detailed than the one in the IDEF1 documentation. Please refer to that documentation for further information.

Tasks:

1. The CDM Administrator appoints a project manager.

Usually, this will be the CDM Administrator.

2. The project manager states the purpose for building the information model.


This explains why the model is needed, i.e., what it will be used for. A model built with this procedure is primarily used to initiate the enterprise's conceptual schema. (It is not necessary to explain why the conceptual schema is needed.) If the model has other purposes, they should also be mentioned.

3. The project manager states the scope of the information model.

This sets the boundary of the model. It should be specific enough to be useful in deciding whether or not a particular element of common data should be included in the model. Some of the things that can be used as the basis for scoping a model are the following:

o Information subjects: parts, employees, sales orders, etc.

o Functions: engineering release, shop floor control, etc.

o Existing computer files or databases

o Existing computer application systems

4. The project manager states the viewpoint for the information model.

This explains the mental attitude or role that people should adopt when looking at and thinking about the model, i.e., in whose place they should put themselves. Usually, this will be the job title of someone who is intimately involved with the common data being modeled.

5. The project manager appoints the project team members.

The four roles to be filled are as follows:

o Modeler - one or two IDEF1 experts.

o Source - several subject experts, i.e., people who have in-depth knowledge about some or all of the common data being modeled.

o Reviewer - several subject experts; some sources may also serve as reviewers. The CDM Administrator must also serve as a reviewer to ensure that the model, as it is developed, is properly documented for loading into the CDM database tables.

o Librarian - a person who is trained and experienced in coordinating kit reviews and in maintaining files of model documentation; a modeler may also serve as the librarian.


6. The project manager appoints the acceptance review committee members.

This committee should consist of subject experts from the area being modeled and from other, related areas.

7. The project manager schedules the project phases.

Estimate the amount of effort needed to complete each phase (usually in man-weeks or man-months) and then convert those estimates to elapsed times and milestones based on the availability of the project team members. At this point, only the phases are scheduled; the individual tasks within a phase will be scheduled when that phase is started.
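The conversion from effort to elapsed time can be sketched as a simple division by the team's available capacity. This is a rough illustration, assuming the work divides evenly among the team members; the function name and the figures are hypothetical.

```python
from typing import Sequence


def elapsed_weeks(effort_man_weeks: float, availability: Sequence[float]) -> float:
    """Convert an effort estimate to elapsed calendar weeks.

    availability holds one fraction per team member (1.0 = full time).
    Assumes the work divides evenly among members, which is a simplification.
    """
    capacity = sum(availability)  # man-weeks of effort delivered per calendar week
    if capacity <= 0:
        raise ValueError("at least one team member must be available")
    return effort_man_weeks / capacity


# Hypothetical figures: 6 man-weeks of work, two people at half time each.
phase_1 = elapsed_weeks(6, [0.5, 0.5])
```

With two half-time members the team delivers one man-week per calendar week, so the 6 man-week phase spans 6 calendar weeks.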

8. The project manager schedules the remaining Phase 0 tasks.

Estimate the amount of effort needed to perform each remaining task in this phase (usually in man-hours or man-days) and then convert those estimates to elapsed times and milestones based on the availability of the project team members who will perform those tasks. The schedules for the subsequent phases should be adjusted if they are inconsistent with these task schedules.

9. The modeler develops a data collection plan.

Determine what kinds of source material are needed and where and how to get that material.

10. The project manager conducts a project kick-off meeting attended by the project team members.

The objectives of the meeting are as follows:

o To introduce the team members to one another and to the roles they will be performing.

o To determine which members need IDEF1 training.

o To present, discuss, and finalize the statements of purpose, scope, and viewpoint.

o To present and discuss the project schedule.

o To present, discuss, and finalize the data collection plan.

11. The modeler collects source material from the sources.

Gather the documents, policies, procedures, database designs, etc., and interview the sources in accordance with the data collection plan (Task 9).


12. The modeler catalogs the source material.

Prepare Source Material Log Forms and Source Data List Forms. If a database design is among the source material, the record names and data field names should be included in the source data list.

13. The modeler explains any author conventions.

These are deviations from or additions to the regular IDEF1 methodology. Mention the use of the three specially designed modeling forms: Relation Classes Form, Owned Attribute Classes Form, and Inherited Attribute Classes Form.

Deviation from IDEF1:

Usually, kits are not used to accomplish the review of the Phase 0 model documentation; the essentials are reviewed during the kick-off meeting (Task 10). However, the project manager may require that kits be used to supplement or replace the kick-off meeting.

4.2.2 Phase 1: Defining Entity Classes

Objective:

o Identify and define the apparent entity classes that are within the scope of the model.

This phase is patterned after Phase 1 of IDEF1, and the description presented here is less detailed than the one in the IDEF1 documentation. Please refer to that documentation for further information.

Tasks:

1. The project manager decides what method to use to review the Phase 1 model.

The options are to distribute review kits, to hold a walk-through meeting, or to do both. The factors to consider are the following:

o Some team members may have to travel to attend a walk-through. How many trips can the project budget afford?

o A review can usually be accomplished faster with a walk-through than with kits. Is there enough time to circulate kits, perhaps two or three times?

o Some reviewers may have very limited time to spend on the project. How can their time be used most effectively, by reviewing a kit or by attending a walk-through? Will they devote time to reviewing a kit on their own?


2. The project manager schedules the Phase 1 tasks.

Estimate the amount of effort needed to perform each task in this phase (usually in man-hours or man-days) and then convert those estimates to elapsed times and milestones based on the availability of the project team members who will perform those tasks. The schedules for the subsequent phases should be adjusted if they are inconsistent with these task schedules.

3. The modeler builds an entity class pool.

Examine the entries in the source data list and deduce what sort of thing each entry identifies, describes, refers to, etc. For example:

o Employee number, name, birth date, and salary are data elements about an employee; hence, an "Employee" entity class.

o Part number, description, and dimensions are all about a part; hence, a "Part" entity class.

Each sort of thing is represented by an entity class. Talk to the sources when additional information is needed. The entity instances within an entity class should be distinguishable from one another by some unique identifier. Assign an entity class number to each entity class, and record it on an Entity Class Pool Form.

When examining record names from a database design, be careful to think about the "real-world thing" that each kind of record represents. Realize that several kinds of records may represent the same thing or, conversely, that one kind of record may represent several different things. Also, realize that certain kinds of records may be present for technical reasons only (performance, backup/recovery, etc.). Such records do not represent "real-world things" and should not result in entity classes being added to the pool.
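The pooling step in Task 3 can be sketched as grouping source data list entries by the thing each one describes. This is a minimal sketch, assuming the modeler has already decided what each entry is about; the function name, the "E1"-style numbering, and the sample technical record name are assumptions for illustration.

```python
def build_entity_class_pool(source_data_list):
    """Group source data entries by the real-world thing they describe.

    source_data_list: iterable of (data_element, thing) pairs, where thing is
    None for entries that exist for technical reasons only.
    Returns a list of (entity_class_number, entity_class_name, data_elements).
    """
    pool = {}
    for element, thing in source_data_list:
        if thing is None:
            continue  # technical-only entries add nothing to the pool
        pool.setdefault(thing, []).append(element)
    # Number the entity classes in the order they were first encountered.
    return [(f"E{n}", thing, elements)
            for n, (thing, elements) in enumerate(pool.items(), start=1)]


entries = [
    ("Employee number", "Employee"),
    ("Part number", "Part"),
    ("Employee name", "Employee"),
    ("Part description", "Part"),
    ("CHECKPOINT-RECORD", None),  # hypothetical backup/recovery record
]
entity_pool = build_entity_class_pool(entries)
```

Here `entity_pool` holds an "Employee" entity class and a "Part" entity class, and the technical record is left out.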

4. The modeler defines each entity class.

Fill out an Entity Class Definition Form for each entity class in the pool. Talk to the sources when additional information about an entity class is needed. Check off each pool entry as it is dealt with.

Watch for synonyms (different names for the same thing) and homonyms (same name for different things). When there are synonyms for something, there is only one entity class to define. Use the most commonly used name as the "official" entity class name, and record it and the corresponding entity class number on an Entity Class Definition Form. Record the other names as synonyms on the form. In the pool, add a note to each synonym entry referring to the official name or number.


For a homonym, there are two or more entity classes to define, one for each thing that the term represents. Pick a new name for each thing to clarify the differences. Record the new names in the entity class pool along with a new entity class number for each, and fill out Entity Class Definition Forms. For example, if an order can be either something received by an enterprise from a customer, or something sent by an enterprise to a vendor, call the first a sales order and the second a purchase order, and fill out two definition forms.
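The synonym and homonym handling in Task 4 can be sketched as two small helpers. Both are illustrative assumptions: the usage counts, the "E14"-style numbers, and the vendor names are hypothetical, and judging which name is "most commonly used" remains a matter for the modeler and the sources.

```python
def resolve_synonyms(usage_counts):
    """Pick the official name among synonyms for one thing.

    usage_counts: dict mapping each name to how often the sources use it.
    Returns (official_name, other_names_recorded_as_synonyms).
    """
    official = max(usage_counts, key=usage_counts.get)
    synonyms = sorted(name for name in usage_counts if name != official)
    return official, synonyms


def split_homonym(new_names, next_number):
    """Give each distinct meaning of a homonym its own entity class number."""
    return [(f"E{next_number + i}", name) for i, name in enumerate(new_names)]


# Hypothetical usage counts for three names that mean the same thing.
official, synonyms = resolve_synonyms({"Vendor": 12, "Supplier": 5, "Seller": 1})

# "Order" means two different things, so it becomes two entity classes.
orders = split_homonym(["Sales Order", "Purchase Order"], next_number=14)
```

The first call keeps "Vendor" as the official name with the other two recorded as synonyms; the second produces one numbered entry per meaning, each of which then needs its own Entity Class Definition Form.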

5. The modeler, reviewers, and librarian participate in reviewing the Phase 1 model.

The method of review was selected in Task 1. The modelers prepare the review materials (kits or walk-through handouts), the reviewers read and comment on the materials, and the modelers respond to the comments. If kits are used, the librarian coordinates their circulation. The CDM Administrator reviews the model to ensure that all model documents are prepared properly for loading the CDM database tables.

4.2.3 Phase 2: Defining Relation Classes

Objective:

o Identify and define the apparent relation classes that are within the scope of the model.

This phase is patterned after Phase 2 of IDEF1, and the description presented here is less detailed than the one in the IDEF1 documentation. Please refer to that documentation for further information.

Tasks:


1. The project manager decides what method to use to review the Phase 2 model.

See Phase 1, Task 1, for the options and factors to consider.


2. The project manager schedules the Phase 2 tasks.

See Phase 1, Task 2, for details.

3. The modeler builds a relation class matrix.

List all of the entity classes across the top and down the left side of Relation Class Matrix Forms or on a large sheet of grid paper; the matrix is easier to work with when it is all on one sheet of paper. Then, determine which pairs of entity classes are related to each other. Look for data about one thing that is also data about another. For example:

o Customer and Sales Order


A sales order has some data about the customer that placed it, such as customer number, name, address, etc.

o Part and Purchase Order

A purchase order contains some data about the parts being ordered, such as part numbers, descriptions, dimensions, etc.

o Department and Employee

One element of data about an employee is the department to which he/she is assigned, such as department number, name, etc.

o Manufacturing Order and Employee

A manufacturing order has some data about the employees who performed its operations, such as employee numbers, names, etc.

Such sharing of data implies a relationship of some sort. Talk to the sources when additional information about such sharing of data is needed. If a database design is among the source material, the relationships it depicts may be useful. Place an "X" in the matrix at the intersection of each pair of related entity classes.
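The relation class matrix described in Task 3 can be sketched as a square grid indexed by entity class. This is an illustration only; marking both intersections of a related pair (row/column and column/row) is an assumption of the sketch, since the form itself may be filled in on one side of the diagonal.

```python
def build_relation_matrix(entity_classes, related_pairs):
    """Return a square matrix with "X" at the intersections of related pairs."""
    index = {name: i for i, name in enumerate(entity_classes)}
    size = len(entity_classes)
    matrix = [[" "] * size for _ in range(size)]
    for first, second in related_pairs:
        i, j = index[first], index[second]
        matrix[i][j] = "X"
        matrix[j][i] = "X"  # assumption: mark both intersections of the pair
    return matrix


entity_classes = ["Customer", "Sales Order", "Part", "Purchase Order",
                  "Department", "Employee", "Manufacturing Order"]
related_pairs = [("Customer", "Sales Order"), ("Part", "Purchase Order"),
                 ("Department", "Employee"), ("Manufacturing Order", "Employee")]
matrix = build_relation_matrix(entity_classes, related_pairs)
```

Each "X" in `matrix` later becomes a relation class drawn on at least one overview diagram.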

4. The modeler prepares overview diagrams (FEOs).

These diagrams are intended to show all of the entity and relation classes on just a few pages. Reviewers can usually understand overview diagrams better than individual entity class diagrams, so they will be the primary (or sole) depiction of the model. Each diagram should focus on a particular subject with which the reviewers will be comfortable (e.g., major activities), and each should contain about 10 to 20 entity classes and their relation classes. Use large sheets of paper (e.g., 11x17) and photo-reduction, if necessary.

Every entity and relation class in the matrix must appear in at least one diagram. Use some author convention to signify the entity classes that appear in more than one diagram (e.g., by broadening or double-lining the entity class boxes) and to identify which other diagrams they are in (e.g., by listing the diagram numbers near the entity class boxes). For example, if entity class E27 is in diagrams F1, F3, and F4:

o List F3 and F4 near E27's box on F1.
o List F1 and F4 near E27's box on F3.
o List F1 and F3 near E27's box on F4.
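The cross-referencing convention in the E27 example can be sketched as follows. The function name and the extra entity classes (E3, E5) and diagram F2 are assumptions added to round out the illustration.

```python
def cross_references(diagrams):
    """For each entity class shown on several diagrams, list the other diagrams.

    diagrams: dict mapping a diagram number to the set of entity class numbers
    drawn on it. Returns a dict keyed by (entity_class, diagram) whose value is
    the sorted list of other diagrams to note near that entity class box.
    """
    membership = {}
    for diagram, entities in diagrams.items():
        for entity in entities:
            membership.setdefault(entity, set()).add(diagram)
    notes = {}
    for entity, shown_on in membership.items():
        if len(shown_on) > 1:  # only multi-diagram entity classes get notes
            for diagram in shown_on:
                notes[(entity, diagram)] = sorted(shown_on - {diagram})
    return notes


diagrams = {
    "F1": {"E27", "E3"},
    "F2": {"E5"},
    "F3": {"E27"},
    "F4": {"E27", "E5"},
}
notes = cross_references(diagrams)
```

For E27 the result reproduces the three notes listed above; E3 appears on one diagram only and gets no note.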


Add the appropriate cardinality and a meaningful label to each relation class as it is drawn in a diagram. Talk to the sources when additional information about a relation class label and cardinality is needed. Cardinalities may be either specific or nonspecific; derived entity classes should not be introduced yet to avoid getting ahead of the reviewers. Check off each relation class in the matrix as it is drawn in a diagram (e.g., by circling the X in the matrix).

5. The modeler defines any additional entity classes that are introduced during this phase.

Whenever a new entity class is introduced, immediately document it by performing the tasks in Phases 1 and 2 that are needed to:

o Update the entity class pool.
o Prepare an Entity Class Definition Form.
o Update the relation class matrix if it has been started.
o Update the overview diagrams if they have been started.

6. The modeler, reviewers, and librarian participate in reviewing the Phase 2 model.

See Phase 1, Task 5, for details.

Deviation from IDEF1:

Usually, individual entity class diagrams are not prepared because the overview diagrams are easier to understand and review, and Relation Class Definition Forms are not filled out because the relation class labels are supposed to be self-descriptive. Also, the Related Entity Class Node Cross-Reference Form is replaced by the specially designed Relation Classes Form, which is called for in Phase 3. However, the project manager may require the use of any or all of these to supplement the model documentation called for above.

4.2.4 Phase 3: Defining Key Classes

Objectives:

o Refine all nonspecific relation classes in the model.

o Identify the apparent attribute classes that are within the scope of the model.

o Identify and define a key class for each entity class in the model.

o Validate every relation class in the model via key class migration.

This phase is patterned after Phase 3 of IDEF1, and the description presented here is less detailed than the one in the IDEF1 documentation. Please refer to that documentation for further information. Also, please refer to Subsection 5.2.2.1 for details on how to fill out the Relation Classes, Owned Attribute Classes, and Inherited Attribute Classes Forms.

Tasks:


1. The project manager decides what method to use to review the Phase 3 model.

See Phase 1, Task 1, for the options and factors to consider.


2. The project manager schedules the Phase 3 tasks.

See Phase 1, Task 2, for details.

3. The modeler refines the nonspecific relation classes.

Introduce a derived entity class for each nonspecific relation class and convert that relation class to a pair of specific relation classes as shown in Figure 4-7 at the end of this section. Assign entity class numbers to the derived entity classes, record them in the entity class pool, and fill out Entity Class Definition Forms. The sources may be able to recommend appropriate names and definitions for some derived entity classes.

Remove the nonspecific relation classes from the relation class matrix and the overview diagrams. Add the derived entity classes and the specific relation classes to the matrix and the diagrams. Retain the same focus for each diagram unless the reviewers suggested a change.

Also, update any optional documents that are affected.
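The refinement in Task 3 can be sketched as replacing one nonspecific relation class with a derived entity class and two specific relation classes, each running from one of the original entity classes to the derived one. This follows the usual IDEF1 treatment and is offered as an assumption about what Figure 4-7 depicts; the dictionary layout, the derived name "Purchase Order Line", the number E31, and the labels are hypothetical.

```python
def refine_nonspecific(relation, derived_name, derived_number, labels):
    """Replace a nonspecific relation class with a derived entity class.

    relation: dict with a "between" pair naming the two related entity classes.
    labels: two labels, one for each new specific relation class.
    Returns (derived_entity_class, [specific_relation_1, specific_relation_2]).
    """
    first, second = relation["between"]
    derived = {"number": derived_number, "name": derived_name, "derived": True}
    specific = [
        # The derived entity class is the dependent end of both new relations.
        {"independent": first, "dependent": derived_name, "label": labels[0]},
        {"independent": second, "dependent": derived_name, "label": labels[1]},
    ]
    return derived, specific


nonspecific = {"between": ("Part", "Purchase Order"), "label": "Is Ordered On"}
derived, specific = refine_nonspecific(
    nonspecific, "Purchase Order Line", "E31", ("Is Requested By", "Contains"))
```

After this step the nonspecific relation class is removed from the matrix and diagrams, and the derived entity class and both specific relation classes are added in its place.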

4. The modeler eliminates any unneeded triads or other dual-path structures.

A dual-path structure is one composed of two or more related entity classes in which:

o There are two paths connecting one entity class to another

o One path is a single relation class

o The other path is a series of relation classes (unless the structure has only two entity classes, in which case the second path is a single relation class also)

See the examples in Figure 4-8 at the end of this section. Talk to the sources to determine whether the two paths are equal, unequal, or indeterminate. The paths are equal if, for each dependent entity instance, they both lead to the same independent entity instance. The paths are unequal if, for each dependent entity instance, they each lead to a different independent entity instance. The paths are indeterminate if they are equal for some dependent entity instances and unequal for others. If the paths are equal, the single-relation-class path is redundant and must be removed from the relation class matrix and the overview diagrams (and from any optional documents in which it appears).
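The three-way test for a dual-path structure can be sketched as a comparison over sample dependent entity instances. In practice the answer comes from talking to the sources rather than from data; the function name and the sample department identifiers are hypothetical.

```python
def classify_paths(reached):
    """Classify the two paths of a dual-path structure.

    reached: list of (via_single_relation, via_other_path) pairs, one per
    dependent entity instance, naming the independent entity instance that
    each path leads to.
    """
    if not reached:
        raise ValueError("at least one dependent entity instance is required")
    same = [single == other for single, other in reached]
    if all(same):
        return "equal"          # single-relation-class path is redundant
    if not any(same):
        return "unequal"
    return "indeterminate"      # equal for some instances, unequal for others


# Hypothetical instances: the department reached by each of the two paths.
result_equal = classify_paths([("D10", "D10"), ("D20", "D20")])
result_mixed = classify_paths([("D10", "D10"), ("D20", "D30")])
```

Only the "equal" outcome calls for removing the single-relation-class path.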

5. The modeler fills out Relation Classes Forms.

Record each relation class on a Relation Classes Form. Leave the Ind. K.C. No. column blank for now. As each relation class is recorded on a form, check it off on a copy of each overview diagram in which it appears (e.g., by circling the relation class labels).

6. The modeler builds an attribute class pool.

Examine the entries in the source data list and deduce what sort of characteristic each represents, where a characteristic is a data element that identifies, describes, refers to, etc., a thing being modeled. Each sort of characteristic is represented by an attribute class. Talk to the sources when additional information is needed. Assign an attribute class number to each attribute class, and record it on an Attribute Class Pool Form.

When examining data field names from a database design, realize that several data fields may represent the same kind of "real-world characteristic" or, conversely, that one data field may represent several different characteristics. For example:

o SALES-ORDER-CUSTOMER-NUMBER, INVOICE-CUSTOMER-NUMBER, and ACCOUNTS-RECEIVABLE-CUSTOMER-NUMBER all represent the same characteristic of a customer, i.e., customer number.

o SALESMAN-ASSIGNMENT-CODE may represent both the territory and the product for which the salesman is responsible.

Also, realize that certain data fields may be present for technical reasons only (e.g., record codes) and should not be included in the attribute class pool.

7. The modeler defines the key classes of the totally independent entity classes.
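The field-to-characteristic mapping in Task 6 can be sketched as follows. The attribute class names "Territory" and "Product", the RECORD-CODE field, and the "A1"-style numbering are assumptions made for this illustration; deciding what each field represents is still the modeler's job.

```python
def build_attribute_class_pool(field_map):
    """Group data fields by the real-world characteristic they represent.

    field_map: dict mapping a data field name to the list of characteristics it
    represents (empty for fields that exist for technical reasons only).
    Returns a list of (attribute_class_number, characteristic, data_fields).
    """
    pool = {}
    for field_name, characteristics in field_map.items():
        for characteristic in characteristics:
            pool.setdefault(characteristic, []).append(field_name)
    return [(f"A{n}", characteristic, fields)
            for n, (characteristic, fields) in enumerate(pool.items(), start=1)]


field_map = {
    "SALES-ORDER-CUSTOMER-NUMBER": ["Customer Number"],
    "INVOICE-CUSTOMER-NUMBER": ["Customer Number"],
    "ACCOUNTS-RECEIVABLE-CUSTOMER-NUMBER": ["Customer Number"],
    "SALESMAN-ASSIGNMENT-CODE": ["Territory", "Product"],  # one field, two characteristics
    "RECORD-CODE": [],  # technical field; excluded from the pool
}
attribute_pool = build_attribute_class_pool(field_map)
```

Three fields collapse into one attribute class, one field yields two attribute classes, and the technical field yields none.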

A totally independent entity class is one that is not dependent in any relation class. Select any one and find the attribute classes in the pool that make up its key class. Watch for attribute class synonyms and homonyms, and handle them like those for entity classes (Phase 1, Task 4). A few totally independent entity classes have two or more alternate key classes (e.g., employees can be uniquely identified by either employee numbers or Social Security Numbers). Be sure to identify all key classes for such an entity class. Also, be sure each key class conforms to the following rules:

o Single-Owned Rule
o Unique-Key Rule
o No-Null Rule
o No-Repeat Rule
o Smallest-Key-Class Rule

See Section 4.1.1 for explanations of these rule
