www.vidyarthiplus.com
UNIT II RELATIONAL MODEL
The relational model
The relational model for database management is a database model based on first-order
predicate logic, first formulated and proposed in 1969 by Edgar F. Codd.[1][2] In the relational
model of a database, all data is represented in terms of tuples, grouped into relations. A database
organized in terms of the relational model is a relational database.
Diagram of an example database according to the Relational model.
In the relational model, related records are linked together with a "key".
The purpose of the relational model is to provide a declarative method for specifying data and
queries: users directly state what information the database contains and what information they
want from it, and let the database management system software take care of describing data
structures for storing the data and retrieval procedures for answering queries.
Most implementations of the relational model use the SQL data definition and query language. A
table in an SQL database schema corresponds to a predicate variable; the contents of a table to a
relation; key constraints, other constraints, and SQL queries correspond to predicates. However,
SQL databases, including DB2, deviate from the relational model in many details; Codd fiercely
argued against deviations that compromise the original principles.
The catalog: types of keys
* Alternate key - any candidate key that is not selected to be the primary key.
* Candidate key - a field or combination of fields that can uniquely identify each record in a table, and can therefore act as that table's primary key.
Cartesian product (×)
Like the Cartesian product of sets, the cardinality of the result is the product of the cardinalities of its factors, i.e., |R × S| = |R| × |S|. In addition, for the Cartesian product to be defined, the two relations involved must have disjoint headers, that is, they must not have a common attribute name.
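The cardinality rule can be checked with a small sketch in Python, modeling a relation as a list of attribute-to-value dicts (the relations and attribute names below are invented for illustration):

```python
from itertools import product

# Each relation: a list of tuples, each tuple a dict from attribute name to value.
R = [{"empId": 1}, {"empId": 2}, {"empId": 3}]
S = [{"deptName": "Sales"}, {"deptName": "HR"}]

# The headers must be disjoint, otherwise the Cartesian product is undefined.
assert not (set(R[0]) & set(S[0]))

# R × S: merge each pair of tuples into one wider tuple.
RxS = [{**r, **s} for r, s in product(R, S)]

print(len(RxS))  # |R × S| = |R| × |S| = 3 × 2 = 6
```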
Projection (π)
A projection is a unary operation written as π_{a1, ..., an}(R), where a1, ..., an is a set of attribute names. The result of such a projection is the set obtained when all tuples in R are restricted to the set {a1, ..., an}.
This specifies the subset of columns (attributes of each tuple) to be retrieved. To obtain the names and phone numbers from an address book, the projection might be written π_{contactName, contactPhoneNumber}(addressBook). The result of that projection would be a relation which contains only the contactName and contactPhoneNumber attributes for each unique entry in addressBook.
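A minimal sketch of projection in Python, again treating a relation as a list of dicts (the addressBook data is made up for illustration):

```python
def project(relation, attrs):
    """π_{attrs}(relation): restrict every tuple to attrs, dropping duplicates."""
    result = []
    for t in relation:
        restricted = {a: t[a] for a in attrs}
        if restricted not in result:  # relations are sets: no duplicate tuples
            result.append(restricted)
    return result

addressBook = [
    {"contactName": "Ann", "contactPhoneNumber": "555-0101", "isFriend": True},
    {"contactName": "Bob", "contactPhoneNumber": "555-0102", "isFriend": False},
]

# π_{contactName, contactPhoneNumber}(addressBook)
names = project(addressBook, ["contactName", "contactPhoneNumber"])
print(names)  # only the two requested attributes survive
```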
Selection (σ)
A generalized selection is a unary operation written as σ_φ(R), where φ is a propositional formula that consists of atoms as allowed in the normal selection and the logical operators ∧ (and), ∨ (or) and ¬ (negation). This selection selects all those tuples in R for which φ holds.
To obtain a listing of all friends or business associates in an address book, the selection might be written as σ_{isFriend = true ∨ isBusinessContact = true}(addressBook). The result would be a relation containing every attribute of every unique record where isFriend is true or where isBusinessContact is true.
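Selection can be sketched the same way, passing the formula φ as a predicate function (sample data invented for illustration):

```python
def select(relation, predicate):
    """σ_φ(relation): keep exactly the tuples for which the formula φ holds."""
    return [t for t in relation if predicate(t)]

addressBook = [
    {"contactName": "Ann", "isFriend": True,  "isBusinessContact": False},
    {"contactName": "Bob", "isFriend": False, "isBusinessContact": False},
    {"contactName": "Eve", "isFriend": False, "isBusinessContact": True},
]

# σ_{isFriend = true ∨ isBusinessContact = true}(addressBook)
contacts = select(addressBook, lambda t: t["isFriend"] or t["isBusinessContact"])
print([t["contactName"] for t in contacts])  # ['Ann', 'Eve']
```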
Rename (ρ)
A rename is a unary operation written as ρ_{a/b}(R), where the result is identical to R except that the b attribute in all tuples is renamed to an a attribute. This is simply used to rename an attribute of a relation, or the relation itself.
To rename the isFriend attribute to isBusinessContact in a relation, ρ_{isBusinessContact/isFriend}(addressBook) might be used.
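Rename completes the set of unary operators; a sketch in the same style (sample data invented for illustration):

```python
def rename(relation, a, b):
    """ρ_{a/b}(relation): rename attribute b to a in every tuple."""
    return [{(a if k == b else k): v for k, v in t.items()} for t in relation]

addressBook = [{"contactName": "Ann", "isFriend": True}]

# ρ_{isBusinessContact/isFriend}(addressBook)
renamed = rename(addressBook, "isBusinessContact", "isFriend")
print(renamed)  # [{'contactName': 'Ann', 'isBusinessContact': True}]
```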
Domain relational calculus
In computer science, domain relational calculus (DRC) is a calculus that was introduced by
Michel Lacroix and Alain Pirotte as a declarative database query language for the relational data
model.[1]
In DRC, queries have the form:
{⟨X1, X2, ..., Xn⟩ | p(⟨X1, X2, ..., Xn⟩)}
where each Xi is either a domain variable or constant, and p(⟨X1, X2, ..., Xn⟩) denotes a DRC formula. The result of the query is the set of tuples ⟨X1, ..., Xn⟩ that make the DRC formula true.
This language uses the same operators as tuple calculus, the logical connectives ∧ (and), ∨ (or)
and ¬ (not). The existential quantifier (∃) and the universal quantifier (∀) can be used to bind the
variables.
Its computational expressiveness is equivalent to that of Relational algebra.[2]
Examples
Let (A, B, C) mean (Rank, Name, ID) and (D, E, F) mean (Name, DeptName, ID).
Find all captains of the starship USS Enterprise:
{⟨A, B, C⟩ | ⟨A, B, C⟩ ∈ Enterprise ∧ A = "Captain"}
In this example, A, B, C denote both the result set and a set in the table Enterprise.
Find the names of Enterprise crew members who are in Stellar Cartography:
{⟨B⟩ | ∃A, C (⟨A, B, C⟩ ∈ Enterprise ∧ ∃D, E, F (⟨D, E, F⟩ ∈ Departments ∧ F = C ∧ E = "Stellar Cartography"))}
In this example, we are only looking for the name, and that is B. F = C is a requirement, because we need to find Enterprise crew members AND they must be in the Stellar Cartography department.
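The Stellar Cartography query can also be run as SQL; the sketch below builds throwaway Enterprise and Departments tables in SQLite (the rows, and the Departments table name, are assumptions made purely for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Enterprise  (Rank TEXT, Name TEXT, ID INTEGER);
CREATE TABLE Departments (Name TEXT, DeptName TEXT, ID INTEGER);
INSERT INTO Enterprise  VALUES ('Captain', 'Picard', 1), ('Ensign', 'Crusher', 2);
INSERT INTO Departments VALUES ('Crusher', 'Stellar Cartography', 2);
""")

# DRC: {⟨B⟩ | ∃A,C (⟨A,B,C⟩ ∈ Enterprise
#              ∧ ∃D,E,F (⟨D,E,F⟩ ∈ Departments ∧ F = C ∧ E = '...'))}
rows = con.execute("""
    SELECT e.Name FROM Enterprise e
    JOIN Departments d ON d.ID = e.ID            -- the F = C requirement
    WHERE d.DeptName = 'Stellar Cartography'     -- the E = '...' requirement
""").fetchall()
print(rows)  # [('Crusher',)]
```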
Tuple relational calculus
Tuple calculus is a calculus that was introduced by Edgar F. Codd as part of the relational
model, in order to provide a declarative database-query language for this data model. It formed
the inspiration for the database-query languages QUEL and SQL, of which the latter, although
far less faithful to the original relational model and calculus, is now the de-facto-standard
database-query language; indeed, a dialect of SQL is used by nearly every relational database management system. Lacroix and Pirotte later proposed domain relational calculus, which is closer to first-order logic, and showed that both of these calculi (as well as relational algebra) are equivalent in expressive power. Subsequently, query languages for the relational model were called relationally complete if they could express at least all of these queries.
Definition of the calculus
Relational database
Since the calculus is a query language for relational databases we first have to define a relational database. The basic relational building block is the domain, or data type. A tuple is an ordered multiset of attributes, which are ordered pairs of domain and value; or just a row. A relvar (relation variable) is a set of ordered pairs of domain and name, which serves as the header for a relation. A relation is a set of tuples. Although these relational concepts are mathematically defined, those definitions map loosely to traditional database concepts. A table is an accepted visual representation of a relation; a tuple is similar to the concept of row.
We first assume the existence of a set C of column names, examples of which are "name", "author", "address" et cetera. We define headers as finite subsets of C. A relational database schema is defined as a tuple S = (D, R, h) where D is the domain of atomic values (see relational
model for more on the notions of domain and atomic value), R is a finite set of relation names, and
h : R → 2^C
a function that associates a header with each relation name in R. (Note that this is a simplification of the full relational model, where there is more than one domain and a header is not just a set of column names but also maps these column names to a domain.) Given a domain D we define a tuple over D as a partial function
t:C→D
that maps some column names to an atomic value in D. An example would be (name : "Harry", age : 25).
The set of all tuples over D is denoted T_D. The subset of C for which a tuple t is defined is called the domain of t (not to be confused with the domain in the schema) and denoted dom(t).
Finally, we define a relational database given a schema S = (D, R, h) as a function
db : R → 2^(T_D)
that maps the relation names in R to finite subsets of T_D, such that for every relation name r in R and tuple t in db(r) it holds that
dom(t) = h(r).
The latter requirement simply says that all the tuples in a relation should contain the same column names, namely those defined for it in the schema.
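These definitions translate almost directly into Python, modeling a tuple as a dict whose keys play the role of dom(t); the one-relation schema below is a toy assumption for illustration:

```python
# Header function h : R → 2^C, mapping a relation name to a set of column names.
h = {"person": {"name", "age"}}

# A tuple over D is a partial function t : C → D, i.e. a dict.
t = {"name": "Harry", "age": 25}

def dom(t):
    """dom(t): the subset of C on which the tuple t is defined."""
    return set(t)

# A database db maps each relation name r to tuples with dom(t) = h(r).
db = {"person": [t]}
ok = all(dom(u) == h[r] for r, tuples in db.items() for u in tuples)
print(ok)  # True: every tuple carries exactly its relation's column names
```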
Fundamental operations – Additional operations
SQL or Structured Query Language is a special-purpose programming language designed for managing data in relational database management systems (RDBMS).
Originally based upon relational algebra and tuple relational calculus, its scope includes data insert, query, update and delete, schema creation and modification, and data access control.
SQL was one of the first commercial languages for Edgar F. Codd's relational model, as described in his influential 1970 paper, "A Relational Model of Data for Large Shared Data Banks".[4] Despite not adhering to the relational model as described by Codd, it became the most widely used database language.[5][6] Although SQL is often described as, and to a great extent is, a declarative language, it also includes procedural elements. SQL became a standard of the American National Standards Institute (ANSI) in 1986, and of the International Organization for Standardization (ISO) in 1987. Since then, the standard has been enhanced several times with added
features. However, issues of SQL code portability between major RDBMS products still exist due to lack of full compliance with, or different interpretations of, the standard. Among the reasons mentioned are the large size and incomplete specification of the standard, as well as vendor lock-in.
SQL fundamentals
Language elements
The SQL language is subdivided into several language elements, including:
* Clauses, which are constituent components of statements and queries. (In some cases, these are optional.)[10]
* Expressions, which can produce either scalar values or tables consisting of columns and rows of data.
* Predicates, which specify conditions that can be evaluated to SQL three-valued logic (3VL) or Boolean (true/false/unknown) truth values, and which are used to limit the effects of statements and queries or to change program flow.
* Queries, which retrieve data based on specific criteria. This is the most important element of SQL.
* Statements, which may have a persistent effect on schemata and data, or which may control transactions, program flow, connections, sessions, or diagnostics. SQL statements also include the semicolon (";") statement terminator. Though not required on every platform, it is defined as a standard part of the SQL grammar.
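A single query exercises most of these elements at once; the sketch below, run against SQLite with made-up data, labels each one in comments:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Statements: persistent effect on schema and data; each ends with ';'.
con.executescript("""
CREATE TABLE employees (name TEXT, salary INTEGER);
INSERT INTO employees VALUES ('Ann', 50000), ('Bob', 30000);
""")

rows = con.execute(
    "SELECT name, salary * 12 "   # expression producing a scalar per row
    "FROM employees "             # FROM clause
    "WHERE salary > 40000 "       # predicate, evaluated in three-valued logic
    "ORDER BY name"               # ORDER BY clause (optional)
).fetchall()
print(rows)  # [('Ann', 600000)]
```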
Integrity
In computing, data integrity refers to maintaining and assuring the accuracy and consistency of data over its entire life-cycle,[1] and is an especially important feature of a database or RDBMS. Data warehousing and business intelligence in general demand the accuracy, validity and correctness of data despite hardware failures, software bugs or human error. Data that has integrity is maintained identically during any operation, such as transfer, storage or retrieval.
All characteristics of data, including business rules, rules for how pieces of data relate, dates, definitions and lineage must be correct for its data integrity to be complete. When functions operate on the data, the functions must ensure integrity. Examples include transforming the data, storing history and storing metadata.
Types of integrity constraints
Data integrity is normally enforced in a database system by a series of integrity constraints or rules. Three types of integrity constraints are an inherent part of the relational data model: entity integrity, referential integrity and domain integrity:
Entity integrity concerns the concept of a primary key. Entity integrity is an integrity rule which states that every table must have a primary key and that the column or columns chosen to be the primary key should be unique and not null.

Referential integrity concerns the concept of a foreign key. The referential integrity rule states that any foreign-key value can only be in one of two states. The usual state of affairs is that the foreign-key value refers to a primary-key value of some table in the database. Occasionally, and this will depend on the rules of the data owner, a foreign-key value can be null. In this case we are explicitly saying either that there is no relationship between the objects represented in the database or that this relationship is unknown.

Domain integrity specifies that all columns in a relational database must be declared upon a defined domain. The primary unit of data in the relational data model is the data item. Such data items are said to be non-decomposable or atomic. A domain is a set of values of the same type. Domains are therefore pools of values from which the actual values appearing in the columns of a table are drawn.
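All three constraint types can be declared in ordinary SQL DDL; a minimal SQLite sketch (table and column names invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
con.executescript("""
CREATE TABLE dept (
    deptId INTEGER PRIMARY KEY                     -- entity integrity
);
CREATE TABLE emp (
    empId  INTEGER PRIMARY KEY,                    -- entity integrity
    age    INTEGER CHECK (age BETWEEN 0 AND 120),  -- domain integrity
    deptId INTEGER REFERENCES dept(deptId)         -- referential integrity
);
INSERT INTO dept VALUES (10);
INSERT INTO emp  VALUES (1, 30, 10);
""")

violated = False
try:
    con.execute("INSERT INTO emp VALUES (2, 30, 99)")  # no dept 99 exists
except sqlite3.IntegrityError:
    violated = True
print(violated)  # True: the referential-integrity rule rejected the row
```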
If a database supports these features, it is the responsibility of the database to ensure data integrity as well as the consistency model for the data storage and retrieval. If a database does not support these features, it is the responsibility of the applications to ensure data integrity while the database supports the consistency model for the data storage and retrieval.
Having a single, well-controlled, and well-defined data-integrity system increases:
* stability (one centralized system performs all data integrity operations)
* performance (all data integrity operations are performed in the same tier as the consistency model)
* re-usability (all applications benefit from a single centralized data integrity system)
* maintainability (one centralized system for all data integrity administration)
As of 2012, since all modern databases support these features (see Comparison of relational database management systems), it has become the de facto responsibility of the database to ensure data integrity. Outdated and legacy systems that use file systems (text, spreadsheets, ISAM, flat files, etc.) for their consistency model lack any kind of data-integrity model. This requires organizations to invest a large amount of time, money, and personnel in building data-integrity systems on a per-application basis that effectively just duplicate the existing data-integrity systems found in modern databases. Many companies, and indeed many database systems themselves, offer products and services to migrate outdated and legacy systems to modern databases to provide these data-integrity features. This offers organizations substantial savings in time, money, and resources because they do not have to develop per-application data-integrity systems that must be re-factored each time business requirements change.
Trigger
A database trigger is procedural code that is automatically executed in response to certain events on a particular table or view in a database. The trigger is mostly used for maintaining the integrity of the information on the database. For example, when a new record (representing a
new worker) is added to the employees table, new records should also be created in the tables of the taxes, vacations and salaries.
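The employees/salaries scenario can be tried directly in SQLite (schema invented for illustration; SQL Server trigger syntax differs slightly):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE employees (empId INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE salaries  (empId INTEGER, amount INTEGER);

-- Fire after every insert on employees and create a default salary record.
CREATE TRIGGER new_worker AFTER INSERT ON employees
BEGIN
    INSERT INTO salaries VALUES (NEW.empId, 0);
END;

INSERT INTO employees VALUES (1, 'Ann');
""")
rows = con.execute("SELECT * FROM salaries").fetchall()
print(rows)  # [(1, 0)] -- created automatically by the trigger
```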
Triggers in Microsoft SQL Server
Microsoft SQL Server supports triggers either after or instead of an insert, update or delete operation. They can be set on tables and views with the constraint that a view can be referenced only by an INSTEAD OF trigger.
Microsoft SQL Server 2005 introduced support for Data Definition Language (DDL) triggers, which can fire in reaction to a very wide range of events, including:
* Drop table
* Create table
* Alter table
* Login events
A full list is available on MSDN.
Performing conditional actions in triggers (or testing data following modification) is done through accessing the temporary Inserted and Deleted tables.
Security
SQL Server 2012
By default, both DML and DDL triggers execute under the context of the user that calls the trigger. The caller of a trigger is the user that executes the statement that causes the trigger to run. For example, if user Mary executes a DELETE statement that causes DML trigger DML_trigMary to run, the code inside DML_trigMary executes in the context of the user privileges for Mary. This default behavior can be exploited by users who want to introduce malicious code in the database or server instance. For example, the following DDL trigger is created by user JohnDoe:
CREATE TRIGGER DDL_trigJohnDoe
ON DATABASE
FOR ALTER_TABLE
AS
GRANT CONTROL SERVER TO JohnDoe ;
GO
What this trigger means is that as soon as a user that has permission to execute a GRANT CONTROL SERVER statement, such as a member of the sysadmin fixed server role, executes an ALTER TABLE statement, JohnDoe is granted CONTROL SERVER permission. In other words, although JohnDoe cannot grant CONTROL SERVER permission to himself, he enabled the trigger code that grants him this permission to execute under escalated privileges. Both DML and DDL triggers are open to this kind of security threat.
Advanced SQL features
Simple Features (officially Simple Feature Access) is both an OpenGIS and ISO standard (ISO 19125) that specifies a common storage model of mostly two-dimensional geographical data (point, line, polygon, multi-point, multi-line, etc.).
The ISO 19125 standard comes in two parts. Part one, ISO 19125-1 (SFA-CA for "common architecture"), defines a model for two-dimensional simple features, with linear interpolation between vertices. The data model defined in SFA-CA is a hierarchy of classes. This part also defines representation using Well-Known Text (and Binary). Part 2 of the standard, ISO 19125-2 (SFA-SQL), defines an implementation using SQL.[1] The OpenGIS standard(s) cover implementations in CORBA and OLE/COM as well, although these have lagged behind the SQL one and are not standardized by ISO.
The ISO/IEC 13249-3 SQL/MM Spatial extends the Simple Features data model mainly with circular interpolations (e.g. circular arcs) and adds other features like coordinate transformations and methods for validating geometries as well as Geography Markup Language support.
Embedded SQL
Embedded SQL is a method of combining the computing power of a programming language and the database manipulation capabilities of SQL. Embedded SQL statements are SQL statements written inline with the program source code of the host language. The embedded SQL statements are parsed by an embedded SQL preprocessor and replaced by host-language calls to a code library. The output from the preprocessor is then compiled by the host compiler. This allows programmers to embed SQL statements in programs written in any number of languages such as: C/C++, COBOL and Fortran.
The ANSI SQL standards committee defined the embedded SQL standard in two steps: a formalism called Module Language was defined, then the embedded SQL standard was derived from Module Language.[1] The SQL standard defines embedding of SQL as embedded SQL and the language in which SQL queries are embedded is referred to as the host language. A popular host language is C. The mixed C and embedded SQL is called Pro*C in Oracle and Sybase database management systems. In the PostgreSQL database management system this precompiler is called ECPG. Other embedded SQL precompilers are Pro*Ada, Pro*COBOL, Pro*FORTRAN, Pro*Pascal, and Pro*PL/I.
PL/SQL supports variables, conditions, loops and exceptions. Arrays are also supported, though in a somewhat unusual way, involving the use of PL/SQL collections. PL/SQL collections is a slightly advanced topic.
Implementations from version 8 of Oracle Database onwards have included features associated with object-orientation.
Dynamic SQL
Once the program units have been stored into the database, they become available for execution at a later time.
While programmers can readily embed Data Manipulation Language (DML) statements directly into their PL/SQL code using straightforward SQL statements, Data Definition Language (DDL) requires more complex "Dynamic SQL" statements to be written in the PL/SQL code. However, DML statements underpin the majority of PL/SQL code in typical software applications.
In the case of PL/SQL dynamic SQL, early versions of the Oracle Database required the use of a complicated Oracle DBMS_SQL package library. More recent versions have however introduced a simpler "Native Dynamic SQL", along with an associated EXECUTE IMMEDIATE syntax.
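Outside PL/SQL, the same idea of building a statement string at run time and then executing it can be sketched with Python's sqlite3 module (an analogue of EXECUTE IMMEDIATE, not Oracle syntax; the table name is hypothetical):

```python
import sqlite3

con = sqlite3.connect(":memory:")

# DDL built at run time: the table name is not known until execution.
table_name = "audit_2024"  # hypothetical name, chosen at run time
ddl = f"CREATE TABLE {table_name} (id INTEGER, note TEXT)"
con.execute(ddl)           # comparable to: EXECUTE IMMEDIATE ddl;

# DML should still bind values with placeholders, never string concatenation.
con.execute(f"INSERT INTO {table_name} VALUES (?, ?)", (1, "created"))
count = con.execute(f"SELECT COUNT(*) FROM {table_name}").fetchone()[0]
print(count)  # 1
```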
Oracle Corporation customarily extends package functionality with each successive release of the Oracle Database.
Introduction to distributed databases and client/server databases.
A distributed database is a database in which storage devices are not all attached to a common
processing unit such as the CPU. It may be stored in multiple computers located in the same
physical location, or may be dispersed over a network of interconnected computers. Unlike
parallel systems, in which the processors are tightly coupled and constitute a single database
system, a distributed database system consists of loosely coupled sites that share no physical
components.
Collections of data (e.g. in a database) can be distributed across multiple physical locations. A
distributed database can reside on network servers on the Internet, on corporate intranets or
extranets, or on other company networks. The replication and distribution of databases improves
database performance at end-user worksites.[1]
To ensure that distributed databases are up to date and current, there are two processes: replication and duplication. Replication involves using specialized software that looks for changes in the distributed database. Once the changes have been identified, the replication process makes all the databases look the same. The replication process can be very complex and time consuming, depending on the size and number of the distributed databases, and can also require considerable computer resources. Duplication, on the other hand, is not as complicated. It basically identifies one database as a master and then duplicates that database. The duplication process is normally done at a set time after hours, to ensure that each distributed location has the same data. In the duplication process, changes are allowed only to the master database, to ensure that local data will not be overwritten. Both processes can keep the data current in all distributed locations.[2]
Besides distributed database replication and fragmentation, there are many other distributed database design technologies, for example local autonomy and synchronous and asynchronous distributed database technologies. The implementation of these technologies can and does depend on the needs of the business and the sensitivity/confidentiality of the data to be stored in the database, and hence the price the business is willing to spend on ensuring data security,