Source: djmoon/db/db-notes/db-intro.pdf · 2020-01-20
Intro: Databases
• Database (DB):
– Collection of related data
• Has following characteristics:
– Logically coherent collection of data with inherent meaning
– Designed for specific purpose
– Represents some aspect of the real world: the miniworld or universe of discourse
• DB is not a random collection of facts
• Concept of DB independent of Database Management System (DBMS)
• Prior to DBMS concept, DBs maintained as flat file (traditional) systems
Intro: Flat File Systems
• Flat file system:
– One or more data files accessed via dedicated programs
• Typical large organization (e.g., university) has many departments, each with specific needs
– Each department has own set of data
– Each department has own set of apps for processing data
– Data stored in one or more data files accessed by app programs which define
– Correspond closely to way data represented physically, while hiding the details
– Applicable to external and conceptual DBMS levels
– Better than OO for representing structure
– Poorer than OO for representing constraints
– Example paradigms:
(a) Relational model
∗ Represent DB as tables
(b) Network (legacy)
∗ Represent DB as collections of records
∗ Represent relations as sets of records
∗ Graph-based representation
(c) Hierarchical (legacy)
∗ Same representation as Network, but record may only have 1 parent
∗ Tree-based representation
– Relational preferred because the other 2 require knowledge of physical representation
3. Physical models
– Describe DB at physical level
– Example paradigms:
(a) Unifying
(b) Frame memory
Intro: Data Definition Languages
• Once models and schemas have been established for a particular domain, the schemas must be installed in the DBMS
• Schemas defined in terms of data definition languages (DDLs)
• Need a DDL for each model used in the DB
• Potentially, need 3:
1. One for the model at the external level (view DDL)
2. One for the model at the conceptual level (DDL)
3. One for the model at the internal level (storage DDL)
• In practice, usually a single DDL used for all levels
• DDL statements compiled and results stored in catalog
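A minimal sketch of the points above, assuming SQLite accessed through Python's standard `sqlite3` module (the slides do not name a DBMS). A single DDL, SQL, defines a table (conceptual level), a view (external level), and an index (internal level), and the results are then read back from SQLite's catalog table `sqlite_master`. The names `student`, `cs_student`, and `idx_student_major` are hypothetical.

```python
import sqlite3

# In-memory database so the sketch leaves nothing behind.
conn = sqlite3.connect(":memory:")

# Conceptual level: a table definition (hypothetical schema).
conn.execute("""
    CREATE TABLE student (
        id    INTEGER PRIMARY KEY,
        name  TEXT NOT NULL,
        major TEXT
    )
""")

# External level: a view exposing only part of the data.
conn.execute("""
    CREATE VIEW cs_student AS
    SELECT id, name FROM student WHERE major = 'CS'
""")

# Internal level: an index is an access path, not part of the logical schema.
conn.execute("CREATE INDEX idx_student_major ON student (major)")

# SQLite records the results of these DDL statements in its catalog.
catalog = conn.execute(
    "SELECT type, name FROM sqlite_master ORDER BY name"
).fetchall()
for obj_type, obj_name in catalog:
    print(obj_type, obj_name)
```

All three definitions use the same language, which matches the remark that one DDL usually serves every level in practice.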
Intro: Data Manipulation Languages
• Queries (requests for data) are posed in terms of data manipulation languages (DMLs)
• Used to add, delete, retrieve, and modify data
• 2 general types:
1. Non-procedural
– High-level
– Declarative - specify what to retrieve, not how
– Retrieve sets of records per query
2. Procedural
– Specify what to retrieve and how to retrieve it
– Retrieve single record per query
• Stand-alone DML called query language
• Fourth generation languages
– Higher-level than set-at-a-time languages
– Non-procedural
– Examples:
∗ Form generators
∗ Report generators
∗ Graphics generators
∗ Application generators
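The contrast between the two DML types can be sketched as follows, again assuming SQLite via Python's `sqlite3`. The `employee` table and its rows are made up for illustration. The loop only imitates record-at-a-time access on top of SQL; it is not a real procedural DML.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, dept TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [("Ann", "CS", 90), ("Bob", "CS", 70), ("Cy", "Math", 80)],
)

# Non-procedural: state WHAT is wanted; the DBMS decides how to find it,
# and the answer comes back as a set of records.
declarative = conn.execute(
    "SELECT name FROM employee WHERE dept = 'CS' AND salary > 75"
).fetchall()

# Procedural style: fetch one record at a time and spell out HOW
# each record is tested and collected.
procedural = []
cursor = conn.execute("SELECT name, dept, salary FROM employee")
while True:
    record = cursor.fetchone()
    if record is None:
        break
    name, dept, salary = record
    if dept == "CS" and salary > 75:
        procedural.append((name,))

print(declarative, procedural)
```

Both approaches select the same record, but only the first leaves the choice of access strategy to the DBMS.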
Intro: Data Abstraction
• One of key concepts underlying ANSI-SPARC 3-level architecture is data abstraction
– The user is insulated from implementational details
• Data abstraction is achieved by having 3 levels, each potentially with its own data model
• The key to data abstraction is the catalog
– Catalog stores
1. Schemas for each level and mappings
2. Data names, types, sizes
3. Relation names
4. Integrity constraints
5. Indices
6. Access paths
7. Authorized user names
– In addition to enabling data abstraction, other benefits include
1. Metadata stored centrally; provides control
2. May identify who owns data
3. Redundancies/inconsistencies more easily identified
4. Impact of change determined prior to implementation
5. Security enforced
6. Integrity enforced
• When a query is posed in a DML, it must be converted into the implementational representation (physical level), and the results converted into a user-oriented representation (view level)
• In order to do this, the DBMS must support mapping between
1. The schema at the external level and the schema at the conceptual level, and
2. The schema at the conceptual level and the schema at the internal level
Intro: Data Abstraction (2)
• These mappings are stored in the catalog
• By storing all schemas and mappings in the catalog, a DBMS can be independent of any particular domain
• Physical data independence is the ability to change the internal schema without having to alter higher-level schemas or applications
– Examples of such changes are
1. File reorganization
2. Change of access path
– This does not include changes to the data itself
• Logical data independence is the ability to change the conceptual schema without having to alter the external schema or applications
– Examples of such changes are
1. Adding/deleting record types
2. Extending a record type
3. Changing constraints
– More difficult to achieve than physical data independence
• The above refer to program-data independence
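Both kinds of independence can be illustrated with a small sketch, assuming SQLite via Python's `sqlite3`. Here a view stands in for the external schema, and `app_query` stands in for an application written against it. All names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, major TEXT)")
conn.execute("INSERT INTO student VALUES (1, 'Ann', 'CS'), (2, 'Bob', 'Math')")

# External schema: the view is all that the application sees.
conn.execute(
    "CREATE VIEW cs_student AS SELECT id, name FROM student WHERE major = 'CS'"
)

def app_query(connection):
    """Stands in for an application written against the external schema."""
    return connection.execute(
        "SELECT name FROM cs_student ORDER BY id"
    ).fetchall()

before = app_query(conn)

# Physical change: add an access path. Nothing above the internal level changes.
conn.execute("CREATE INDEX idx_major ON student (major)")
after_physical = app_query(conn)

# Logical change: extend a record type with a new field.
conn.execute("ALTER TABLE student ADD COLUMN email TEXT")
after_logical = app_query(conn)

print(before, after_physical, after_logical)
```

The application code is identical in all three calls. A logical change that removed or renamed a column used by the view would require the view's mapping to be redefined, which is one reason logical independence is the harder of the two.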
Intro: Data Abstraction (3)
• Program-operation independence refers to the ability to change implementation of data operations without having to change the interface
– This is primarily related to OO models
Intro: People Involved with DBMS
• DBMS staff
1. DataBase Administrator (DBA)
– Person or group of people with overall responsibility for DBMS
– Involved with
(a) Designing DB (schemas, etc.)
(b) Monitoring performance
(c) Modifying DB as needed
(d) Granting privileges
(e) Evaluating and acquiring supplementary software
2. Systems analysts
– Determine requirements of end users
– Develop specs for canned transactions
3. Applications programmers
4. Systems designers and implementers
– Create DBMS itself
5. Tool developers
– Design and implement software packages to facilitate use of DBMS
6. Operators and maintenance personnel
– Sys admin personnel
• Types of end users:
1. Naive/parametric
– Have no knowledge of DBMS details
– Interact via canned transactions
2. Casual users
– Interact via query languages
3. Sophisticated users
– Interact at all levels
Intro: Primary DB Modules
1. Stored data manager
• Controls all access to the data
Intro: Primary DB Modules (2)
2. Compilers
(a) DDL compiler
• Accessed by DB staff
• Handles DB definition and privileged commands
• Results stored in catalog
(b) Query compiler
• Accessed by casual users
• Handles general DB queries
• Results passed to query optimizer for efficient data access
(c) Precompiler
• Accessed by application programmers
• Converts embedded DB code in app to object code
• Rest of program compiled by host language compiler
• 2 results linked into single program
3. Run time DB processor
• Executes user "programs"
(a) Privileged commands
(b) Executable query plans
(c) Canned transactions
• Interacts with catalog and data manager
4. Concurrency control
5. Recovery and backup
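The hand-off from query compiler to query optimizer can be observed in a small sketch, assuming SQLite via Python's `sqlite3`. SQLite's `EXPLAIN QUERY PLAN` statement reports the access strategy chosen for a query. The table, index, and helper `plan` are hypothetical, and the exact wording of the plan text varies between SQLite versions, so no output is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, dept TEXT)")

def plan(connection, sql):
    """Return the human-readable steps of the plan SQLite chose for sql."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[-1] for row in rows]  # last column holds the description

query = "SELECT name FROM employee WHERE dept = 'CS'"

# With no index available, the only option is to scan the table.
plan_without_index = plan(conn, query)

# After an access path is added, the same query text gets a different plan.
conn.execute("CREATE INDEX idx_employee_dept ON employee (dept)")
plan_with_index = plan(conn, query)

print(plan_without_index)
print(plan_with_index)
```

The query is never rewritten by the user; only the executable plan changes.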
Intro: Secondary DB Software
1. Utilities
• Loaders
– Convert existing data files into format accessible by DBMS
– Conversion tool converts from one DBMS format to another
• Backup
• DB storage reorganization
– Convert from one file organization to another
– Used to improve performance
• Performance monitors
2. Tools and environments
• CASE tools
– Computer Aided Software Engineering
– DB design
• Data dictionary
– Expanded catalog
– Stores additional info:
∗ Usage statistics
∗ Design rationale
∗ Semantics
– Benefits of catalog
∗ Serves as documentation of DB design
∗ Useful for maintenance and performance monitoring
• Application development environments
• Communications software
Intro: Client/Server Model
• Early DBMSs centralized
– All processing performed on a central machine
– User access via remote terminals with no processing capability
• Client/server model enabled by intelligent terminals
– Server hosts DBMS software
– Client executes apps locally
– Client accesses server when needs specialized resources
– Connected via network
• Client/server architectures characterized in terms of tiers
1. 2-tier model
– 1 server, multiple clients
– Functionality can be allocated in several ways
(a) Transaction/query server model
∗ Server hosts DB, query, and transaction functionality
∗ Client executes apps that contact server when need to access DB
∗ Connections achieved using standards like ODBC and JDBC
(b) OODBMS approach
∗ Uses a "more integrated" (i.e., arbitrary) approach
∗ Much functionality migrated to client
· Client may host user interface, data dictionary functions, compilers, optimizers, ...
· Server hosts DB (data storage), concurrency control, recovery
∗ Server often referred to as data server as primary task is repository for DB
2. n-tier model
– Has 1 or more intermediate layers
– Middle tier often called application/web server
– Stores rules, checks client credentials, ...
– Accepts client requests and forwards to server
– Forwards (partially) processed results to client
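The division of labor in an n-tier model can be sketched in a single process, assuming Python with `sqlite3` as the data store. No network is involved, so this shows only the responsibilities of each tier, not a deployment. The class names, the `registrar` user, and the pass mark of 80 are all hypothetical.

```python
import sqlite3

class DataServer:
    """Bottom tier: owns the DB and answers queries."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE grade (student TEXT, score INTEGER)")
        self.conn.execute("INSERT INTO grade VALUES ('Ann', 91), ('Bob', 78)")

    def scores(self):
        return self.conn.execute(
            "SELECT student, score FROM grade ORDER BY student"
        ).fetchall()

class AppServer:
    """Middle tier: checks credentials, applies rules, forwards requests."""

    def __init__(self, data_server, authorized_users):
        self.data_server = data_server
        self.authorized_users = set(authorized_users)

    def handle(self, user):
        if user not in self.authorized_users:
            raise PermissionError(f"{user} is not authorized")
        rows = self.data_server.scores()
        # Partially process results before returning them (hypothetical rule).
        return [{"student": s, "passed": score >= 80} for s, score in rows]

def client(app_server, user):
    """Top tier: presents results; never contacts the data server directly."""
    return [
        f"{r['student']}: {'pass' if r['passed'] else 'fail'}"
        for r in app_server.handle(user)
    ]

app = AppServer(DataServer(), authorized_users=["registrar"])
report = client(app, "registrar")
print(report)  # ['Ann: pass', 'Bob: fail']
```

Calling `client(app, "guest")` raises `PermissionError` in the middle tier, before any request reaches the data server.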
Intro: DBMS Classification
• DBMSs can be classified along a number of dimensions:
1. Data model
– Primary means of identification
– Primary model is relational
– Newer models include object and object-relational
– Legacy models include hierarchical and network
2. Number of users
– Single
– Multi-user
3. Number of sites hosting DBMS
– Centralized
∗ Single site
– Distributed
∗ Software resides on multiple servers connected via network
∗ Variations:
· Homogeneous: Same software at all sites
· Heterogeneous: Multiple autonomous DBs at several sites
· Federated DBMS: Loosely coupled DBMSs with some autonomy
4. Cost
5. Access path
6. General vs. special purpose
– General not associated with any application
– Special purpose designed for one particular app
Intro: Advantages of DBMS Approach
1. Control of data redundancy
2. Data consistency
3. More info from the same data
4. Sharing of data
5. Improved data integrity
• Integrity results from consistency and validity
• Expressed in terms of constraints
• Integrity constraints (referred to as business rules) implemented as rules that verify data on entry, modification, or deletion
6. Improved security
7. Enforcement of standards
• Design controlled by central authority
8. Economy of scale
• As a result of reduced redundancy
9. Balance among conflicting requirements
• Needs of different users may be at odds
• DBA can make informed decisions based on various needs
• Resulting schemas (should) be those with greatest overall benefit to organization
10. Improved accessibility to data and responsiveness
• As a result of software utilities like report generators, query languages, etc.
11. Increased productivity
• Users are insulated from implementational details
12. Improved maintenance
• Data independence allows changes to be made at one level without needing to make changes at other levels
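As an illustration of item 5 (integrity constraints that verify data on entry), here is a minimal sketch assuming SQLite via Python's `sqlite3`. The tables, the constraints, and the helper `try_insert` are hypothetical. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.execute("CREATE TABLE dept (code TEXT PRIMARY KEY)")
conn.execute("""
    CREATE TABLE employee (
        name   TEXT NOT NULL,
        salary INTEGER CHECK (salary >= 0),
        dept   TEXT REFERENCES dept (code)
    )
""")
conn.execute("INSERT INTO dept VALUES ('CS')")

def try_insert(row):
    """Attempt an insert; report whether the DBMS accepted it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO employee VALUES (?, ?, ?)", row)
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"

results = [
    try_insert(("Ann", 90, "CS")),       # valid row
    try_insert(("Bob", -5, "CS")),       # violates CHECK (salary >= 0)
    try_insert(("Cy", 80, "Physics")),   # violates the foreign key
    try_insert((None, 80, "CS")),        # violates NOT NULL
]
for line in results:
    print(line)
```

The rules live in the schema, so every application that writes to `employee` is checked in the same way.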
Intro: Advantages of DBMS Approach (2)
13. Increased concurrency
14. Improved backup and recovery
• Result of transaction and concurrency control, and centralized access of DB
15. Persistent storage of program objects
• Relevant to OO DB
• Objects exist independently of apps
• DBMS provides for storage and conversion between program representation and DBMS format
16. Efficient query processing
• DBMS provides access paths for efficient retrieval of data
• Compilers may optimize queries
17. Multiple interfaces
18. Ability to represent complex relationships among data
19. Allow inference and actions via rules
• Rules may be associated with DB
• Allow inference of new data
• DBMS may also allow procedures stored independently of apps
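One way a rule can be associated with a DB and trigger an action (item 19) is an SQL trigger. The sketch below assumes SQLite via Python's `sqlite3`. The tables, the trigger name `low_stock`, and the threshold of 5 are hypothetical. A trigger is only one form of rule; it is not the deductive inference that some DBMSs provide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stock (item TEXT PRIMARY KEY, qty INTEGER);
    CREATE TABLE reorder (item TEXT);

    -- Rule stored with the DB: when quantity drops below 5, record a reorder.
    CREATE TRIGGER low_stock AFTER UPDATE OF qty ON stock
    WHEN NEW.qty < 5
    BEGIN
        INSERT INTO reorder VALUES (NEW.item);
    END;

    INSERT INTO stock VALUES ('pen', 10), ('ink', 10);
""")

# The application only updates quantities; it contains no reorder logic.
conn.execute("UPDATE stock SET qty = 3 WHERE item = 'pen'")
conn.execute("UPDATE stock SET qty = 8 WHERE item = 'ink'")

reorders = conn.execute("SELECT item FROM reorder").fetchall()
print(reorders)  # [('pen',)]
```

Because the rule is stored in the DB rather than in an app, it applies to every program that updates `stock`.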
Intro: Disadvantages of DBMS Approach
1. Complexity
• DBMS is a complex piece of software
• To use to advantage, must understand all aspects
• Poor decisions at early stages of DB design can be costly
2. Cost
• DBMS is expensive
• Often requires additional hardware (processing power, disk space, memory, ...)
• Cost of conversion
(a) Converting existing apps
(b) Converting existing data
(c) Training personnel
3. Size (see above)
4. Performance
• Because DBMS software is domain-independent, queries are slower due to mappings between levels
5. Higher impact of failure
• If DBMS crashes, entire organization affected
6. Higher impact of security breach
• While DBMS provides greater security measures, if security is breached the entire organization is affected