Pp t 0000015

Jun 03, 2018

bengej
Transcript
  • 8/12/2019 Pp t 0000015

    1/65

    Chapter 9 Data Design

    2/65

    Explain file-oriented systems and how they differ from database management systems
    Explain data design terminology, including entities, fields, common fields, records, files, tables, and key fields
    Describe data relationships, draw an entity-relationship diagram, define cardinality, and use cardinality notation

    3/65

    Explain the concept of normalization
    Explain the importance of codes and describe various coding schemes
    Explain data warehousing and data mining
    Differentiate between logical and physical storage and records
    Explain data control measures

    4/65

    Data Structures
      A framework for organizing, storing, and managing data
      Consists of files or tables that interact in various ways
      Each file or table contains data about people, places, things, or events

    FIGURE 9-1 Typical data design task list

    5/65

    Mario and Danica: A Data Design Example

    Mario's Auto Shop
      Mario relies on two file-oriented systems that store data in separate files that are not connected
      The MECHANIC SYSTEM uses the MECHANIC file to store data about shop employees
      The JOB SYSTEM uses the JOB file to store data about work performed at the shop

    Danica's Auto Shop
      Uses a database management system (DBMS) with two separate tables that are joined, so they act like one large table
      In Danica's SHOP OPERATIONS SYSTEM, the tables are linked by the Mechanic No field, which is called a common field because it connects the tables

    FIGURE 9-2 In the example shown here, data about the mechanic, the customer, and the brake job might be stored in a file-oriented system or in a database system
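The common-field idea in Danica's design can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table layouts, column names, and sample rows are assumptions, and only the Mechanic No common field comes from the slide.

```python
import sqlite3

# Hypothetical tables and sample rows; only the Mechanic No link is from the slide.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mechanic (mechanic_no INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE job (job_no INTEGER PRIMARY KEY, mechanic_no INTEGER, work TEXT)"
)
conn.execute("INSERT INTO mechanic VALUES (112, 'Sample Mechanic')")
conn.execute("INSERT INTO job VALUES (1, 112, 'Brake job')")

# Joining on the common field makes the two tables act like one large table,
# so the mechanic's name is stored once and never re-entered per job.
joined = conn.execute(
    "SELECT job.job_no, job.work, mechanic.name "
    "FROM job JOIN mechanic ON job.mechanic_no = mechanic.mechanic_no"
).fetchall()
print(joined)
```

In a file-oriented design like Mario's, the mechanic's name would have to be typed into both the MECHANIC file and the JOB file, which is where the duplicate entry and possible inconsistency come from.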

    6/65

    FIGURE 9-3 (Mario's Auto Shop) Mario's shop uses two separate systems, so certain data must be entered twice. This redundancy is inefficient, and can produce data errors

    FIGURE 9-4 (Danica's Auto Shop) Danica's SHOP OPERATIONS SYSTEM uses a database design, which avoids duplication. The data can be viewed as if it were one large table, regardless of where the data is stored physically

    7/65

    Is File Processing Still Important?
      Handles large volumes of structured data on a regular basis
      Can be cost-effective
      Great for transaction processing

    The Database Environment
      A database management system (DBMS) is a collection of tools, features, and interfaces that enables users to add, update, manage, access, and analyze data

    8/65

    FIGURE 9-5 A credit card company that posts thousands of daily transactions might consider a file processing option

    9/65

    DBMS Advantages
      Scalability - A system can be expanded, modified, or downsized
      Economy of scale - Database design allows better utilization of hardware
      Enterprise-wide application - A database administrator (DBA) assesses overall requirements and maintains the database for the entire organization
      Stronger standards - Standards for data names, formats, and documentation are followed uniformly throughout the organization

    10/65

    DBMS Advantages
      Better security - The DBA ensures that only legitimate users access the database and different users have different levels of access
      Data independence - Systems that interact with a DBMS are relatively independent of how the physical data is maintained
        That design provides the DBA flexibility to alter data structures without modifying information systems that use the data

    11/65

    Interfaces for Users, Database Administrators, and Related Systems

    USERS
      Typically work with predefined queries and switchboard commands, but also use query languages to access stored data

    DATABASE ADMINISTRATORS
      Concerned with data security and integrity, preventing unauthorized access, providing backup and recovery, audit trails, maintaining the database, and supporting user needs

    RELATED INFORMATION SYSTEMS
      A DBMS can support several related information systems that provide input to, and require specific data from, the DBMS

    FIGURE 9-7 In addition to interfaces for users, database administrators, and related information systems, a DBMS also has a data manipulation language, a schema and subschemas, and a physical data repository

    12/65

    Data Manipulation Language
      A data manipulation language (DML) controls database operations, including storing, retrieving, updating, and deleting data

    Schema
      The complete definition of a database, including descriptions of all fields, tables, and relationships, is called a schema

    Physical Data Repository
      The physical data repository holds the stored data together with the schema and subschemas
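The four DML operations named above (storing, retrieving, updating, and deleting) can be illustrated with SQL, which is used here as one common data manipulation language; the slide does not name a specific one. The customer table and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Creating the table is part of the schema definition, not DML.
conn.execute("CREATE TABLE customer (customer_no INTEGER PRIMARY KEY, name TEXT)")

# Store
conn.execute("INSERT INTO customer VALUES (?, ?)", (1, "Pat Lee"))

# Retrieve
name_before = conn.execute(
    "SELECT name FROM customer WHERE customer_no = ?", (1,)
).fetchone()[0]

# Update
conn.execute("UPDATE customer SET name = ? WHERE customer_no = ?", ("Pat Lee-Smith", 1))
name_after = conn.execute(
    "SELECT name FROM customer WHERE customer_no = ?", (1,)
).fetchone()[0]

# Delete
conn.execute("DELETE FROM customer WHERE customer_no = ?", (1,))
remaining = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]

print(name_before, name_after, remaining)
```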

    13/65

    Overview
      A data manipulation language (DML) controls database operations, including storing, retrieving, updating, and deleting data

    Connecting to the Web
      The objective is to connect the database to the Web and enable data to be viewed and updated
      Middleware - software that integrates different applications and allows them to exchange data. It interprets client requests in HTML form and then translates the requests into commands that the database can execute

    Data Security
      Web-based data must be secure, yet easily accessible to authorized users

    14/65

    FIGURE 9-9 Web-based design characteristics include global access, ease of use, multiple platforms, cost effectiveness, security issues, and adaptability issues. In a Web-based design, the Internet serves as the front end, or interface, for the database management system. Access to the database requires only a Web browser and an Internet connection

    15/65

    FIGURE 9-10 When a client workstation requests a Web page (1), the Web server uses middleware to generate a data query to the database server (2). The database server responds (3), and middleware translates the retrieved data into an HTML page that can be sent by the Web server and displayed by the user's browser (4)
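The four numbered steps in Figure 9-10 can be sketched as a single function. This is a minimal illustration, not a real Web server: the handle_request function, the product table, and the product_no form field are all hypothetical names chosen for the example.

```python
import html
import sqlite3
from urllib.parse import parse_qs

# Hypothetical database standing in for the database server.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE product (product_no TEXT PRIMARY KEY, description TEXT)")
db.execute("INSERT INTO product VALUES ('P100', 'Brake pads')")


def handle_request(query_string: str) -> str:
    """Hypothetical middleware step: form data in, HTML page out."""
    form = parse_qs(query_string)                      # (1) interpret the client request
    product_no = form.get("product_no", [""])[0]
    row = db.execute(                                  # (2) query sent to the database
        "SELECT description FROM product WHERE product_no = ?", (product_no,)
    ).fetchone()                                       # (3) the database responds
    body = html.escape(row[0]) if row else "Not found"
    return f"<html><body><p>{body}</p></body></html>"  # (4) data translated into HTML


page = handle_request("product_no=P100")
print(page)
```

The query uses a parameter placeholder rather than string concatenation, which is one simple way to keep Web-supplied input from being executed as a database command.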

    16/65

    Definitions:

    ENTITY
      An entity is a person, place, thing, or event for which data is collected and maintained

    TABLE OR FILE
      A table, or file, contains a set of related records that store data about a specific entity

    FIELD
      A field, also called an attribute, is a single characteristic or fact about an entity

    RECORD
      A record, also called a tuple (rhymes with couple), is a set of related fields that describes one instance, or occurrence, of an entity, such as one customer, one order, or one product

    17/65

    Key Fields:

    PRIMARY KEY
      A field or combination of fields that uniquely and minimally identifies a particular member of an entity

    CANDIDATE KEY
      Any field that could serve as a primary key is called a candidate key

    FOREIGN KEY
      A common field that exists in more than one table and can be used to form a relationship, or link, between the tables

    SECONDARY KEY
      A field or combination of fields that can be used to access or retrieve records
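One way to screen a field (or field combination) as a possible candidate key is to check that no two records share the same value. The sketch below assumes records are held as Python dictionaries; the function name and the sample student data are hypothetical. Uniqueness in a sample is necessary but not sufficient, since a true key must be unique for every record the table could ever hold.

```python
def is_unique_key(records, fields):
    """Return True if the field combination has a distinct value in every record."""
    seen = set()
    for record in records:
        value = tuple(record[f] for f in fields)
        if value in seen:
            return False
        seen.add(value)
    return True


# Hypothetical sample records.
students = [
    {"student_no": 1035, "email": "a@example.edu", "major": "Math"},
    {"student_no": 3397, "email": "b@example.edu", "major": "Math"},
]

print(is_unique_key(students, ["student_no"]))  # unique in this sample
print(is_unique_key(students, ["major"]))       # duplicated, so not a key
```

Here student_no and email both pass the check, so either could be a candidate key, while major could at most serve as a secondary key for retrieving groups of records.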

    18/65

    Referential Integrity:
      A set of rules that avoids data inconsistency and quality problems. In a relational database, referential integrity means that a foreign key value cannot be entered in one table unless it matches an existing primary key in another table

    FIGURE 9-13 Microsoft Access allows a user to specify that referential integrity rules will be enforced in a relational database design
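The rule can be seen in action with any relational DBMS that enforces foreign keys. The sketch below uses SQLite through Python's sqlite3 module instead of Microsoft Access; the customer and orders tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE customer (customer_no INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE orders ("
    "order_no INTEGER PRIMARY KEY, "
    "customer_no INTEGER NOT NULL REFERENCES customer(customer_no))"
)
conn.execute("INSERT INTO customer VALUES (1, 'Pat Lee')")
conn.execute("INSERT INTO orders VALUES (100, 1)")       # accepted: customer 1 exists

try:
    conn.execute("INSERT INTO orders VALUES (101, 99)")  # no customer 99 exists
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(rejected, order_count)
```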

    19/65

    Drawing an ERD
      The first step is to list the entities that you identified during the systems analysis phase and to consider the nature of the relationships that link them

    Types of Relationships
      Three types of relationships can exist between entities:
        One-to-one
        One-to-many
        Many-to-many

    FIGURE 9-14 In an entity-relationship diagram, entities are labeled with singular nouns and relationships are labeled with verbs. The relationship is interpreted as a simple English sentence.

    20/65

    A one-to-one relationship, abbreviated 1:1, exists when exactly one of the second entity occurs for each instance of the first entity
      Figure 9-15 shows examples of several 1:1 relationships
      A number 1 is placed alongside each of the two connecting lines to indicate the 1:1 relationship

    FIGURE 9-15 Examples of one-to-one (1:1) relationships

    21/65

    A one-to-many relationship, abbreviated 1:M, exists when one occurrence of the first entity can relate to many instances of the second entity, but each instance of the second entity can associate with only one instance of the first entity

    FIGURE 9-16 Examples of one-to-many (1:M) relationships

    22/65

    A many-to-many relationship, abbreviated M:N, exists when one instance of the first entity can relate to many instances of the second entity, and one instance of the second entity can relate to many instances of the first entity

    FIGURE 9-17 Examples of many-to-many (M:N) relationships. Notice that the event or transaction that links the two entities is an associative entity with its own set of attributes and characteristics

    23/65

    FIGURE 9-18 An entity-relationship diagram for SALES REP, CUSTOMER, ORDER, PRODUCT, and WAREHOUSE. Notice that the ORDER and PRODUCT entities are joined by an associative entity named ORDER LINE
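The ORDER LINE associative entity can be sketched as a table whose primary key combines the keys of the two entities it links. The column names, the quantity attribute, and the sample rows below are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders  (order_no INTEGER PRIMARY KEY);
    CREATE TABLE product (product_no TEXT PRIMARY KEY, description TEXT);
    -- Associative entity: one row per order/product pair, with its own attribute.
    CREATE TABLE order_line (
        order_no   INTEGER REFERENCES orders(order_no),
        product_no TEXT    REFERENCES product(product_no),
        quantity   INTEGER,
        PRIMARY KEY (order_no, product_no)
    );
    INSERT INTO orders VALUES (1), (2);
    INSERT INTO product VALUES ('A', 'Widget'), ('B', 'Gadget');
    INSERT INTO order_line VALUES (1, 'A', 5), (1, 'B', 2), (2, 'A', 1);
""")

# One order relates to many products...
products_on_order_1 = [
    row[0] for row in conn.execute(
        "SELECT product_no FROM order_line WHERE order_no = 1 ORDER BY product_no"
    )
]
# ...and one product relates to many orders.
orders_for_product_a = [
    row[0] for row in conn.execute(
        "SELECT order_no FROM order_line WHERE product_no = 'A' ORDER BY order_no"
    )
]
print(products_on_order_1, orders_for_product_a)
```

The M:N relationship between ORDER and PRODUCT is thereby replaced by two 1:M relationships, each ending at ORDER LINE.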

    24/65

    FIGURE 9-19 Crow's foot notation is a common method of indicating cardinality. The four examples show how you can use various symbols to describe the relationships between entities

    Cardinality
      Describes the numeric relationship between two entities and shows how instances of one entity relate to instances of another entity
      A common method of cardinality notation is called crow's foot notation because of the shapes, which include circles, bars, and symbols, that indicate various possibilities

    25/65

    FIGURE 9-20
      In the first example of cardinality notation, one and only one CUSTOMER can place anywhere from zero to many of the ORDER entity.
      In the second example, one and only one ORDER can include one ITEM ORDERED or many.
      In the third example, one and only one EMPLOYEE can have one SPOUSE or none.
      In the fourth example, one EMPLOYEE, or many employees, or none, can be assigned to one PROJECT, or many projects, or none
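Each crow's foot symbol can be read as a minimum and a maximum count. The sketch below encodes the four examples from Figure 9-20 that way; the Cardinality class and its allows method are hypothetical helpers, not part of any notation standard.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cardinality:
    minimum: int            # 0 = optional, 1 = mandatory
    maximum: Optional[int]  # None stands for "many"

    def allows(self, count: int) -> bool:
        """Check whether a given number of related instances is permitted."""
        return count >= self.minimum and (self.maximum is None or count <= self.maximum)


ONE_AND_ONLY_ONE = Cardinality(1, 1)
ZERO_OR_ONE = Cardinality(0, 1)
ONE_OR_MANY = Cardinality(1, None)
ZERO_OR_MANY = Cardinality(0, None)

# (first entity side, second entity side) for the four examples in Figure 9-20.
examples = {
    ("CUSTOMER", "ORDER"): (ONE_AND_ONLY_ONE, ZERO_OR_MANY),
    ("ORDER", "ITEM ORDERED"): (ONE_AND_ONLY_ONE, ONE_OR_MANY),
    ("EMPLOYEE", "SPOUSE"): (ONE_AND_ONLY_ONE, ZERO_OR_ONE),
    ("EMPLOYEE", "PROJECT"): (ZERO_OR_MANY, ZERO_OR_MANY),
}

spouse_side = examples[("EMPLOYEE", "SPOUSE")][1]
print(spouse_side.allows(0), spouse_side.allows(1), spouse_side.allows(2))
```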

    26/65

    FIGURE 9-21 An ERD for a library system drawn with Visible Analyst. Notice that crow's foot notation has been used and relationships are described in both directions

    27/65

    Normalization is the process of creating table designs by assigning specific fields or attributes to each table in the database

    Normalization involves applying a set of rules that can help you identify and correct inherent problems and complexities in your table designs

    The normalization process typically involves four stages:
      Unnormalized design
      First normal form
      Second normal form
      Third normal form

    28/65

    Standard Notation Format
      Starts with the name of the table, followed by a parenthetical expression that contains the field names separated by commas. The primary key field(s) is underlined, like this:
        NAME (FIELD 1, FIELD 2, FIELD 3)

    A repeating group is a set of one or more fields that can occur any number of times in a single record, with each occurrence having different values

    29/65

    FIGURE 9-22 In the ORDER table design, two orders have repeating groups that contain several products. ORDER is the primary key for the ORDER table, and PRODUCT NUMBER serves as a primary key for the repeating group. Because it contains repeating groups, the ORDER table is unnormalized

    30/65

    First Normal Form (1NF)
      A table is in first normal form (1NF) if it does not contain a repeating group
      When you eliminate the repeating group, additional records emerge, one for each combination of a specific order and a specific product
      The result is more records, but a greatly simplified design

    31/65

    FIGURE 9-23 The ORDER table as it appears in 1NF. The repeating groups have been eliminated. Notice that the repeating group for order 86223 has become three separate records, and the repeating group for order 86390 has become two separate records. The 1NF primary key is a combination of ORDER and PRODUCT NUMBER, which uniquely identifies each record
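The step from an unnormalized design to 1NF amounts to flattening each repeating group into separate records. In the sketch below, the order numbers and the three-and-two split come from Figure 9-23; the dates, product numbers, and quantities are made-up sample values, and to_1nf is a hypothetical helper.

```python
# Unnormalized records: each order carries a repeating group of products.
# Dates, product numbers, and quantities are hypothetical sample values.
unnormalized = [
    {"order": 86223, "date": "2013-09-13", "products": [
        {"product_number": 304, "quantity": 7},
        {"product_number": 633, "quantity": 1},
        {"product_number": 684, "quantity": 4},
    ]},
    {"order": 86390, "date": "2013-09-14", "products": [
        {"product_number": 128, "quantity": 12},
        {"product_number": 304, "quantity": 3},
    ]},
]


def to_1nf(orders):
    """Emit one record per (order, product) combination."""
    rows = []
    for order in orders:
        for item in order["products"]:
            rows.append({"order": order["order"], "date": order["date"], **item})
    return rows


first_normal_form = to_1nf(unnormalized)
for row in first_normal_form:
    print(row)
```

The order date is now repeated on every row of the same order, which is the kind of redundancy that the later normal forms address.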

    32/65

    Second Normal Form (2NF)
      Must understand the concept of functional dependence
        Field A is functionally dependent on Field B if the value of Field A depends on Field B
        A DATE value is functionally dependent on an ORDER, because for a specific order number, there can be only one date
      The objective is to break the original table into two or more new tables and reassign the fields so that each non-key field will depend on the entire primary key in its table
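Functional dependence can be tested against sample data: Field A depends on Field B if no value of B ever appears with two different values of A. The function name and the sample rows below are hypothetical, and a check on sample data can only disprove a dependency, not prove it for all future records.

```python
def is_functionally_dependent(records, dependent, determinant):
    """True if each determinant value maps to exactly one dependent value."""
    seen = {}
    for record in records:
        key = record[determinant]
        if key in seen and seen[key] != record[dependent]:
            return False
        seen[key] = record[dependent]
    return True


# Hypothetical 1NF rows with the combined key (order, product_number).
rows = [
    {"order": 86223, "product_number": 304, "date": "2013-09-13", "quantity": 7},
    {"order": 86223, "product_number": 633, "date": "2013-09-13", "quantity": 1},
    {"order": 86390, "product_number": 304, "date": "2013-09-14", "quantity": 3},
]

date_on_order = is_functionally_dependent(rows, "date", "order")
quantity_on_order = is_functionally_dependent(rows, "quantity", "order")
print(date_on_order, quantity_on_order)
```

DATE depends on ORDER alone, which is only part of the combined key, so it belongs in a separate ORDER table; QUANTITY needs both ORDER and PRODUCT NUMBER, so it stays with the full key.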

    33/65

    FIGURE 9-24 ORDER, PRODUCT, and ORDER LINE tables in 2NF. All fields are functionally dependent on the primary key

    34/65

    Third Normal Form (3NF)
      A design is in 3NF if every non-key field depends on the key, the whole key, and nothing but the key
      A 3NF design avoids redundancy and data integrity problems that still can exist in 2NF designs
      To convert the table to 3NF, you must remove all fields from the 2NF table that depend on another non-key field and place them in a new table that uses the non-key field as a primary key

    35/65

    FIGURE 9-25 When the PRODUCT table is transformed from 2NF to 3NF, the result is two separate tables: PRODUCT and SUPPLIER. Note that in 3NF, all fields depend on the key, the whole key, and nothing but the key!
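The 2NF-to-3NF conversion of PRODUCT can be sketched as follows. The field names and sample rows are assumptions; the idea taken from the slide is that the supplier's details depend on a non-key field (the supplier number) and therefore move to a new SUPPLIER table keyed by that field.

```python
# Hypothetical 2NF PRODUCT rows: supplier_name depends on supplier_number,
# which is a non-key field.
product_2nf = [
    {"product_number": 128, "description": "Steel widget",
     "supplier_number": "A-602", "supplier_name": "Acme"},
    {"product_number": 304, "description": "Blue gadget",
     "supplier_number": "J-995", "supplier_name": "Jones"},
    {"product_number": 633, "description": "Assembly",
     "supplier_number": "A-602", "supplier_name": "Acme"},
]


def to_3nf(rows):
    """Split PRODUCT into PRODUCT and SUPPLIER tables."""
    product = [
        {k: r[k] for k in ("product_number", "description", "supplier_number")}
        for r in rows
    ]
    # supplier_number becomes the primary key of the new table.
    supplier = {r["supplier_number"]: r["supplier_name"] for r in rows}
    return product, supplier


product_3nf, supplier_3nf = to_3nf(product_2nf)
print(product_3nf)
print(supplier_3nf)
```

After the split, each supplier name is stored once, so renaming a supplier means changing a single SUPPLIER record; supplier_number stays in PRODUCT as the foreign key that links the two tables.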

    36/65

    Example 1: Crossroads College

    FIGURE 9-27 An initial entity-relationship diagram for ADVISOR, STUDENT, and COURSE

    FIGURE 9-28 The STUDENT table is unnormalized because it contains a repeating group that represents the courses each student has taken

    37/65

    FIGURE 9-29 The STUDENT table in 1NF. Notice that the primary key has been expanded to include STUDENT NUMBER and COURSE NUMBER

    38/65

    FIGURE 9-30 The STUDENT, COURSE, and GRADE tables in 2NF. Notice that all fields are functionally dependent on the entire primary key of their respective tables

    39/65

    FIGURE 9-31 STUDENT, ADVISOR, COURSE, and GRADE tables in 3NF. When the STUDENT table is transformed from 2NF to 3NF, the result is two tables: STUDENT and ADVISOR

    40/65

    FIGURE 9-32 The entity-relationship diagram for STUDENT, ADVISOR, and COURSE after normalization. The GRADE entity was identified during the normalization process. GRADE is an associative entity that links the STUDENT and COURSE tables

    41/65

    Example 2: Magic Maintenance

    FIGURE 9-33 A relational database design for a computer service company uses common fields to link the tables and form an overall data structure. Notice the one-to-many notation symbols, and the primary keys, which are indicated with gold-colored key symbols

    42/65

    FIGURE 9-34 Sample data, primary keys, and common fields for the database shown in Figure 9-33. The design is in 3NF. Notice that all non-key fields functionally depend on a primary key, the whole primary key, and nothing but the primary key

    43/65

    Suppose you work in IT, and the sales team needs answers to three specific questions
      Did any customers receive service after 12/14/2013? If so, who were they?
      Did technician Marie Johnson put in more than six hours of labor on any service calls? If so, which ones?
      Were any parts used on service calls in Washington? If so, what were the part numbers, descriptions, and quantities?
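The first two questions can be expressed as queries against linked tables. The sketch below is only an illustration in SQL through Python's sqlite3 module: the table names, column names, and sample rows are hypothetical and are not the Magic Maintenance design shown in Figures 9-33 and 9-34.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer   (cust_no INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE technician (tech_no INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE service_call (
        call_no     INTEGER PRIMARY KEY,
        cust_no     INTEGER REFERENCES customer(cust_no),
        tech_no     INTEGER REFERENCES technician(tech_no),
        call_date   TEXT,   -- ISO format, so text comparison follows date order
        labor_hours REAL
    );
    INSERT INTO customer   VALUES (1, 'Customer A'), (2, 'Customer B');
    INSERT INTO technician VALUES (21, 'Marie Johnson'), (22, 'Other Technician');
    INSERT INTO service_call VALUES
        (10, 1, 21, '2013-12-10', 7.5),
        (11, 2, 21, '2013-12-20', 2.0),
        (12, 2, 22, '2013-12-22', 8.0);
""")

# Question 1: customers who received service after 12/14/2013.
q1 = [row[0] for row in conn.execute(
    "SELECT DISTINCT customer.name FROM service_call "
    "JOIN customer USING (cust_no) "
    "WHERE call_date > '2013-12-14' ORDER BY customer.name"
)]

# Question 2: service calls where Marie Johnson logged more than six hours.
q2 = [row[0] for row in conn.execute(
    "SELECT call_no FROM service_call "
    "JOIN technician USING (tech_no) "
    "WHERE technician.name = 'Marie Johnson' AND labor_hours > 6"
)]
print(q1, q2)
```

Both answers require combining tables through their common fields, which is what the linked design makes possible.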

    44/65

    FIGURE 9-35 Question 1

    45/65

    FIGURE 9-36 Question 2

    46/65

    FIGURE 9-37 Question 3

    47/65

    Overview of Codes
      Because codes can represent data and they are shorter than the data they represent, they save storage space and costs, reduce data transmission time, and decrease data entry time
      Codes can be used to reveal or conceal information
      Codes can reduce data input errors
        Coded data is easier to remember
        The code itself can provide immediate verification that the entry is correct

    48/65

    Types of Codes
      Codes should be easy to learn and apply
      Sequence codes
        Numbers or letters assigned in a specific order
        Contain no additional information other than an indication of order of entry into the system
      Block sequence codes
        Use blocks of numbers for different classifications
          100-level courses are freshman-level
          200-level courses are sophomore-level
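A block sequence code can be decoded by finding which block a number falls in. This small sketch covers only the two blocks named above; the course_level function is a hypothetical helper, and any other block is reported as unknown rather than guessed.

```python
# Only the two blocks named on the slide are defined here.
LEVELS = {1: "freshman", 2: "sophomore"}


def course_level(course_number: int) -> str:
    """Map a course number to its classification by its hundreds block."""
    block = course_number // 100
    return LEVELS.get(block, "unknown")


print(course_level(101), course_level(250), course_level(300))
```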

    49/65

    Types of Codes (Cont.)
      Alphabetic codes
        Use alphabet letters to distinguish one item from another
        Category codes identify a group of related items
          A department store may use a two-character category code to identify the department
        Abbreviation codes are alphabetic abbreviations
          State codes include NY for New York, ME for Maine, and MN for Minnesota
        Some abbreviation codes are called mnemonic codes because they use a specific combination of letters that are easy to remember

    50/65

    FIGURE 9-39 This image shows abbreviations for the world's 30 busiest airports. How many can you identify?

    51/65

    Types of Codes (Cont.)
      Significant digit codes
        Distinguish items by using a series of subgroups of digits
        Postal codes are significant digit codes
      Derivation codes
        Combine data from different item attributes, or characteristics
      Cipher codes
        Use a keyword to encode a number
        A retail store, for example, might use a 10-letter word, such as CAMPGROUND, to code wholesale prices, where the letter C represents 1, A represents 2, and so on. Thus, the code GRAND indicates that the store paid $562.90 for the item
      Action codes
        Indicate what action is to be taken with an associated item
        X (to exit the program)
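The CAMPGROUND cipher example can be worked through in a few lines. Two details are assumptions inferred from the $562.90 result rather than stated on the slide: the tenth letter, D, stands for 0, and the last two digits are cents. The decode_price function is a hypothetical helper.

```python
KEYWORD = "CAMPGROUND"  # ten distinct letters: C=1, A=2, ..., N=9, D=0 (assumed)


def decode_price(code: str, keyword: str = KEYWORD) -> float:
    """Translate a cipher code into a price, treating the last two digits as cents."""
    digit_for = {letter: str((i + 1) % 10) for i, letter in enumerate(keyword)}
    digits = "".join(digit_for[letter] for letter in code.upper())
    return int(digits) / 100


print(decode_price("GRAND"))  # G=5, R=6, A=2, N=9, D=0
```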

    52/65

    FIGURE 9-40 Sample of a code that uses significant digits to pinpoint the location of an inventory item

    FIGURE 9-41 A magazine subscriber code is derived from various parts of the name and address

    53/65

    Designing Codes
      Keep codes concise
      Allow for expansion
      Keep codes stable
      Make codes unique
      Use sortable codes
      Use a simple structure
      Avoid confusion
      Make codes meaningful
      Use a code for a single purpose
      Keep codes consistent

    54/65

    Tools and Techniques
      Companies use data warehousing and data mining as strategic tools to help manage the huge quantities of data they need for business operations and decisions
        Data warehousing
        Data mining

    55/65

    Data Warehousing
      An integrated collection of data that can include seemingly unrelated information, no matter where it is stored in the company

    FIGURE 9-42 A data warehouse stores data from several systems. By selecting data dimensions, a user can retrieve specific information without having to know how or where the data is stored

    56/65

    Data Mining
      Looks for meaningful data patterns and relationships in large amounts of data

    FIGURE 9-43 North Carolina State University's clickable map can take you to a collection of IT ethics issues. Here, the map points to the data mining area

    57/65

    Logical versus Physical Storage
      Logical storage refers to data that a user can view, understand, and access, regardless of how or where that information actually is organized or stored
      Physical storage is strictly hardware-related because it involves the process of reading and writing binary data to physical media such as a hard drive, CD-ROM, or network-based storage device

    58/65

    Data Coding
      EBCDIC (Extended Binary Coded Decimal Interchange Code - pronounced EB-see-dik)
        A coding method used on mainframe computers and high-capacity servers
      ASCII (American Standard Code for Information Interchange - pronounced ASK-ee)
        A coding method used on most personal computers
      BINARY
        Represents numbers as actual binary values, rather than as coded numeric digits
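The difference between these coding methods shows up in the bytes they produce. The sketch below uses Python's built-in codecs; cp037 is one common EBCDIC code page, chosen here as an assumption because EBCDIC exists in several variants.

```python
# The same letter has different byte values under ASCII and EBCDIC.
letter_ascii = "A".encode("ascii")
letter_ebcdic = "A".encode("cp037")   # cp037: one common EBCDIC code page

# A number stored as coded digits versus as an actual binary value.
number = 1234
as_ascii_digits = str(number).encode("ascii")    # one byte per digit
as_binary = number.to_bytes(2, byteorder="big")  # two bytes hold values up to 65535

print(letter_ascii.hex(), letter_ebcdic.hex())
print(len(as_ascii_digits), len(as_binary), as_binary.hex())
```

Storing 1234 as coded digits takes four bytes, while the binary form fits in two, which is why binary storage is more compact for numeric data.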

    59/65

    Data Coding (Cont.)
      UNICODE
        Supports virtually all languages and has become a global standard

    FIGURE 9-44 Unicode is an international coding format that represents characters as integers (code points), originally using 16 bits per character. The Unicode Consortium maintains standards and support for Unicode

    60/65

    Data Coding (Cont.)
      STORING DATES
        Y2K Issue
        International Organization for Standardization (ISO) requires a format of four digits for the year, two for the month, and two for the day (YYYYMMDD)

    FIGURE 9-45 Microsoft Excel uses absolute dates in calculations. In this example, September 27, 2013, is displayed as 41544, and July 13, 2012, is displayed as 41103. The difference between the dates is 441 days
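Both date formats can be reproduced with Python's datetime module. The sketch assumes an epoch of December 30, 1899, an offset commonly used to match Excel's serial numbers for dates after February 1900; the function names are hypothetical.

```python
from datetime import date

# Assumed offset that reproduces Excel's serial numbers for dates after February 1900.
EXCEL_EPOCH = date(1899, 12, 30)


def excel_serial(d: date) -> int:
    """Absolute date: the number of days elapsed since the epoch."""
    return (d - EXCEL_EPOCH).days


def iso_basic(d: date) -> str:
    """ISO basic format: four-digit year, two-digit month, two-digit day."""
    return d.strftime("%Y%m%d")


later = date(2013, 9, 27)
earlier = date(2012, 7, 13)
print(excel_serial(later), excel_serial(earlier))
print(excel_serial(later) - excel_serial(earlier))
print(iso_basic(later))
```

Because absolute dates are plain integers, the interval between two dates is a simple subtraction, while the YYYYMMDD form sorts correctly as text and avoids the two-digit-year ambiguity behind the Y2K issue.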

    61/65

    A well-designed DBMS must provide built-in control and security features, including subschemas, passwords, encryption, audit trail files, and backup and recovery procedures to maintain data

    62/65

    A database consists of linked tables that form an overall data structure
    A database management system (DBMS) is a collection of tools, features, and interfaces that enable users to add, update, manage, access, and analyze data in a database
    DBMS designs are more powerful and flexible than traditional file-oriented systems

    63/65

    DBMS components include interfaces for users, database administrators, and related systems; a data manipulation language; a schema; and a physical data repository
    In an information system, an entity is a person, place, thing, or event for which data is collected and maintained
    A primary key is the field or field combination that uniquely and minimally identifies a specific record; a candidate key is any field that could serve as a primary key

    64/65

    An entity-relationship diagram (ERD) is a graphic representation of all system entities and the relationships among them
    The relationship between two entities also is referred to as cardinality
    Normalization is a process for avoiding problems in data design
    Data design tasks include creating an initial ERD; assigning data elements to an entity; normalizing all table designs; and completing the data dictionary entries for files, records, and data elements

    65/65

    A code is a set of letters or numbers used to represent data in a system
    Logical storage is information seen through a user's eyes, regardless of how or where that information actually is organized or stored
    Physical storage is hardware related and involves reading and writing binary data to physical media
    File and database control measures include limiting access to the data, data encryption, backup/recovery procedures, audit-trail files, and internal audit fields