www.vidyarthiplus.com
UNIT II RELATIONAL MODEL
The relational model
The relational model for database management is a database model based on first-order
predicate logic, first formulated and proposed in 1969 by Edgar F. Codd.[1][2] In the relational
model of a database, all data is represented in terms of tuples, grouped into relations. A database
organized in terms of the relational model is a relational database.
Diagram of an example database according to the Relational model.
In the relational model, related records are linked together with a "key".
The purpose of the relational model is to provide a declarative method for specifying data and
queries: users directly state what information the database contains and what information they
want from it, and let the database management system software take care of describing data
structures for storing the data and retrieval procedures for answering queries.
Most implementations of the relational model use the SQL data definition and query language. A
table in an SQL database schema corresponds to a predicate variable; the contents of a table to a
relation; key constraints, other constraints, and SQL queries correspond to predicates. However,
SQL databases, including DB2, deviate from the relational model in many details; Codd fiercely
argued against deviations that compromise the original principles.
The catalog: types of keys
* Alternate key - any candidate key that is not selected to be the primary key.
* Candidate key - a field or combination of fields that can uniquely identify each record in a table, and can therefore act as that table's primary key.
Cartesian product (×)
Like the Cartesian product of sets, the cardinality of the result is the product of the cardinalities of its factors, i.e., |R × S| = |R| × |S|. In addition, for the Cartesian product to be defined, the two relations involved must have disjoint headers, that is, they must not have a common attribute name.
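The cardinality rule can be checked with a small sketch in Python, modeling a relation as a list of attribute-to-value dicts (the relations and attribute names below are invented for illustration):

```python
from itertools import product

# Each relation: a list of tuples, each tuple a dict from attribute name to value.
R = [{"empId": 1}, {"empId": 2}, {"empId": 3}]
S = [{"deptName": "Sales"}, {"deptName": "HR"}]

# The headers must be disjoint, otherwise the Cartesian product is undefined.
assert not (set(R[0]) & set(S[0]))

# R × S: merge each pair of tuples into one wider tuple.
RxS = [{**r, **s} for r, s in product(R, S)]

print(len(RxS))  # |R × S| = |R| × |S| = 3 × 2 = 6
```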
Projection (π)
A projection is a unary operation written as π_{a1, ..., an}(R), where a1, ..., an is a set of attribute names. The result of such a projection is the set obtained when all tuples in R are restricted to the set {a1, ..., an}.
This specifies the subset of columns (attributes of each tuple) to be retrieved. To obtain the names and phone numbers from an address book, the projection might be written π_{contactName, contactPhoneNumber}(addressBook). The result of that projection would be a relation which contains only the contactName and contactPhoneNumber attributes for each unique entry in addressBook.
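A minimal sketch of projection in Python, again treating a relation as a list of dicts (the addressBook data is made up for illustration):

```python
def project(relation, attrs):
    """π_{attrs}(relation): restrict every tuple to attrs, dropping duplicates."""
    result = []
    for t in relation:
        restricted = {a: t[a] for a in attrs}
        if restricted not in result:  # relations are sets: no duplicate tuples
            result.append(restricted)
    return result

addressBook = [
    {"contactName": "Ann", "contactPhoneNumber": "555-0101", "isFriend": True},
    {"contactName": "Bob", "contactPhoneNumber": "555-0102", "isFriend": False},
]

# π_{contactName, contactPhoneNumber}(addressBook)
names = project(addressBook, ["contactName", "contactPhoneNumber"])
print(names)  # only the two requested attributes survive
```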
Selection (σ)
A generalized selection is a unary operation written as σ_φ(R), where φ is a propositional formula that consists of atoms as allowed in the normal selection and the logical operators ∧ (and), ∨ (or) and ¬ (negation). This selection selects all those tuples in R for which φ holds.
To obtain a listing of all friends or business associates in an address book, the selection might be written as σ_{isFriend = true ∨ isBusinessContact = true}(addressBook). The result would be a relation containing every attribute of every unique record where isFriend is true or where isBusinessContact is true.
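Selection can be sketched the same way, passing the formula φ as a predicate function (sample data invented for illustration):

```python
def select(relation, predicate):
    """σ_φ(relation): keep exactly the tuples for which the formula φ holds."""
    return [t for t in relation if predicate(t)]

addressBook = [
    {"contactName": "Ann", "isFriend": True,  "isBusinessContact": False},
    {"contactName": "Bob", "isFriend": False, "isBusinessContact": False},
    {"contactName": "Eve", "isFriend": False, "isBusinessContact": True},
]

# σ_{isFriend = true ∨ isBusinessContact = true}(addressBook)
contacts = select(addressBook, lambda t: t["isFriend"] or t["isBusinessContact"])
print([t["contactName"] for t in contacts])  # ['Ann', 'Eve']
```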
Rename (ρ)
A rename is a unary operation written as ρ_{a/b}(R), where the result is identical to R except that the b attribute in all tuples is renamed to an a attribute. This is simply used to rename an attribute of a relation, or the relation itself.
To rename the isFriend attribute to isBusinessContact in a relation, ρ_{isBusinessContact/isFriend}(addressBook) might be used.
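Rename completes the set of unary operators; a sketch in the same style (sample data invented for illustration):

```python
def rename(relation, a, b):
    """ρ_{a/b}(relation): rename attribute b to a in every tuple."""
    return [{(a if k == b else k): v for k, v in t.items()} for t in relation]

addressBook = [{"contactName": "Ann", "isFriend": True}]

# ρ_{isBusinessContact/isFriend}(addressBook)
renamed = rename(addressBook, "isBusinessContact", "isFriend")
print(renamed)  # [{'contactName': 'Ann', 'isBusinessContact': True}]
```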
Domain relational calculus
In computer science, domain relational calculus (DRC) is a calculus that was introduced by
Michel Lacroix and Alain Pirotte as a declarative database query language for the relational data
model.[1]
In DRC, queries have the form:
{⟨X1, X2, ..., Xn⟩ | p(⟨X1, X2, ..., Xn⟩)}
where each Xi is either a domain variable or constant, and p(⟨X1, X2, ..., Xn⟩) denotes a DRC formula. The result of the query is the set of tuples ⟨X1, ..., Xn⟩ that make the DRC formula true.
This language uses the same operators as tuple calculus, the logical connectives ∧ (and), ∨ (or)
and ¬ (not). The existential quantifier (∃) and the universal quantifier (∀) can be used to bind the
variables.
Its computational expressiveness is equivalent to that of Relational algebra.[2]
Examples
Let (A, B, C) mean (Rank, Name, ID) and (D, E, F) mean (Name, DeptName, ID).
Find all captains of the starship USS Enterprise:
{⟨A, B, C⟩ | ⟨A, B, C⟩ ∈ Enterprise ∧ A = "Captain"}
In this example, A, B, C denote both the result set and a set in the table Enterprise.
Find the names of Enterprise crew members who are in Stellar Cartography:
{⟨B⟩ | ∃A, C (⟨A, B, C⟩ ∈ Enterprise ∧ ∃D, E, F (⟨D, E, F⟩ ∈ Departments ∧ F = C ∧ E = "Stellar Cartography"))}
In this example, we are only looking for the name, and that is B. F = C is a requirement, because we need to find Enterprise crew members AND they must be in the Stellar Cartography department.
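The Stellar Cartography query can also be run as SQL; the sketch below builds throwaway Enterprise and Departments tables in SQLite (the rows, and the Departments table name, are assumptions made purely for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Enterprise  (Rank TEXT, Name TEXT, ID INTEGER);
CREATE TABLE Departments (Name TEXT, DeptName TEXT, ID INTEGER);
INSERT INTO Enterprise  VALUES ('Captain', 'Picard', 1), ('Ensign', 'Crusher', 2);
INSERT INTO Departments VALUES ('Crusher', 'Stellar Cartography', 2);
""")

# DRC: {⟨B⟩ | ∃A,C (⟨A,B,C⟩ ∈ Enterprise
#              ∧ ∃D,E,F (⟨D,E,F⟩ ∈ Departments ∧ F = C ∧ E = '...'))}
rows = con.execute("""
    SELECT e.Name FROM Enterprise e
    JOIN Departments d ON d.ID = e.ID            -- the F = C requirement
    WHERE d.DeptName = 'Stellar Cartography'     -- the E = '...' requirement
""").fetchall()
print(rows)  # [('Crusher',)]
```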
Tuple relational calculus
Tuple calculus is a calculus that was introduced by Edgar F. Codd as part of the relational
model, in order to provide a declarative database-query language for this data model. It formed
the inspiration for the database-query languages QUEL and SQL, of which the latter, although
far less faithful to the original relational model and calculus, is now the de-facto-standard
database-query language; indeed, a dialect of SQL is used by nearly every relational database management system. Lacroix and Pirotte later proposed domain relational calculus, which is closer to first-order logic, and showed that both of these calculi (as well as relational algebra) are equivalent in expressive power. Subsequently, query languages for the relational model were called relationally complete if they could express at least all of these queries.
Definition of the calculus
Relational database
Since the calculus is a query language for relational databases we first have to define a relational database. The basic relational building block is the domain, or data type. A tuple is an ordered multiset of attributes, which are ordered pairs of domain and value; or just a row. A relvar (relation variable) is a set of ordered pairs of domain and name, which serves as the header for a relation. A relation is a set of tuples. Although these relational concepts are mathematically defined, those definitions map loosely to traditional database concepts. A table is an accepted visual representation of a relation; a tuple is similar to the concept of row.
We first assume the existence of a set C of column names, examples of which are "name", "author", "address" et cetera. We define headers as finite subsets of C. A relational database schema is defined as a tuple S = (D, R, h) where D is the domain of atomic values (see relational
model for more on the notions of domain and atomic value), R is a finite set of relation names, and
h : R → 2^C
a function that associates a header with each relation name in R. (Note that this is a simplification of the full relational model, where there is more than one domain and a header is not just a set of column names but also maps these column names to a domain.) Given a domain D we define a tuple over D as a partial function
t:C→D
that maps some column names to an atomic value in D. An example would be (name : "Harry", age : 25).
The set of all tuples over D is denoted T_D. The subset of C for which a tuple t is defined is called the domain of t (not to be confused with the domain in the schema) and denoted dom(t).
Finally, we define a relational database given a schema S = (D, R, h) as a function
db : R → 2^(T_D)
that maps the relation names in R to finite subsets of T_D, such that for every relation name r in R and tuple t in db(r) it holds that
dom(t) = h(r).
The latter requirement simply says that all the tuples in a relation should contain the same column names, namely those defined for it in the schema.
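These definitions translate almost directly into Python, modeling a tuple as a dict whose keys play the role of dom(t); the one-relation schema below is a toy assumption for illustration:

```python
# Header function h : R → 2^C, mapping a relation name to a set of column names.
h = {"person": {"name", "age"}}

# A tuple over D is a partial function t : C → D, i.e. a dict.
t = {"name": "Harry", "age": 25}

def dom(t):
    """dom(t): the subset of C on which the tuple t is defined."""
    return set(t)

# A database db maps each relation name r to tuples with dom(t) = h(r).
db = {"person": [t]}
ok = all(dom(u) == h[r] for r, tuples in db.items() for u in tuples)
print(ok)  # True: every tuple carries exactly its relation's column names
```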
Fundamental operations – Additional operations
SQL or Structured Query Language is a special-purpose programming language designed for managing data in relational database management systems (RDBMS).
Originally based upon relational algebra and tuple relational calculus, its scope includes data insert, query, update and delete, schema creation and modification, and data access control.
SQL was one of the first commercial languages for Edgar F. Codd's relational model, as described in his influential 1970 paper, "A Relational Model of Data for Large Shared Data Banks".[4] Despite not adhering to the relational model as described by Codd, it became the most widely used database language.[5][6] Although SQL is often described as, and to a great extent is, a declarative language, it also includes procedural elements. SQL became a standard of the American National Standards Institute (ANSI) in 1986, and of the International Organization for Standardization (ISO) in 1987. Since then, the standard has been enhanced several times with added
features. However, issues of SQL code portability between major RDBMS products still exist due to lack of full compliance with, or different interpretations of, the standard. Among the reasons mentioned are the large size and incomplete specification of the standard, as well as vendor lock-in.
SQL fundamentals
Language elements
The SQL language is subdivided into several language elements, including:
* Clauses, which are constituent components of statements and queries. (In some cases, these are optional.)[10]
* Expressions, which can produce either scalar values or tables consisting of columns and rows of data.
* Predicates, which specify conditions that can be evaluated to SQL three-valued logic (3VL) or Boolean (true/false/unknown) truth values, and which are used to limit the effects of statements and queries or to change program flow.
* Queries, which retrieve data based on specific criteria. This is the most important element of SQL.
* Statements, which may have a persistent effect on schemata and data, or which may control transactions, program flow, connections, sessions, or diagnostics. SQL statements also include the semicolon (";") statement terminator. Though not required on every platform, it is defined as a standard part of the SQL grammar.
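A single query exercises most of these elements at once; the sketch below, run against SQLite with made-up data, labels each one in comments:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Statements: persistent effect on schema and data; each ends with ';'.
con.executescript("""
CREATE TABLE employees (name TEXT, salary INTEGER);
INSERT INTO employees VALUES ('Ann', 50000), ('Bob', 30000);
""")

rows = con.execute(
    "SELECT name, salary * 12 "   # expression producing a scalar per row
    "FROM employees "             # FROM clause
    "WHERE salary > 40000 "       # predicate, evaluated in three-valued logic
    "ORDER BY name"               # ORDER BY clause (optional)
).fetchall()
print(rows)  # [('Ann', 600000)]
```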
Integrity
In computing, data integrity refers to maintaining and assuring the accuracy and consistency of data over its entire life-cycle,[1] and is an especially important feature of a database or RDBMS. Data warehousing and business intelligence in general demand the accuracy, validity and correctness of data despite hardware failures, software bugs or human error. Data that has integrity is maintained identically during any operation, such as transfer, storage or retrieval.
All characteristics of data, including business rules, rules for how pieces of data relate, dates, definitions and lineage must be correct for its data integrity to be complete. When functions operate on the data, the functions must ensure integrity. Examples include transforming the data, storing history and storing metadata.
Types of integrity constraints
Data integrity is normally enforced in a database system by a series of integrity constraints or rules. Three types of integrity constraints are an inherent part of the relational data model: entity integrity, referential integrity and domain integrity:
Entity integrity concerns the concept of a primary key. Entity integrity is an integrity rule which states that every table must have a primary key and that the column or columns chosen to be the primary key should be unique and not null.

Referential integrity concerns the concept of a foreign key. The referential integrity rule states that any foreign-key value can only be in one of two states. The usual state of affairs is that the foreign-key value refers to a primary-key value of some table in the database. Occasionally, and this will depend on the rules of the data owner, a foreign-key value can be null. In this case we are explicitly saying either that there is no relationship between the objects represented in the database or that this relationship is unknown.

Domain integrity specifies that all columns in a relational database must be declared upon a defined domain. The primary unit of data in the relational data model is the data item. Such data items are said to be non-decomposable or atomic. A domain is a set of values of the same type. Domains are therefore pools of values from which the actual values appearing in the columns of a table are drawn.
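All three constraint types can be declared in ordinary SQL DDL; a minimal SQLite sketch (table and column names invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
con.executescript("""
CREATE TABLE dept (
    deptId INTEGER PRIMARY KEY                     -- entity integrity
);
CREATE TABLE emp (
    empId  INTEGER PRIMARY KEY,                    -- entity integrity
    age    INTEGER CHECK (age BETWEEN 0 AND 120),  -- domain integrity
    deptId INTEGER REFERENCES dept(deptId)         -- referential integrity
);
INSERT INTO dept VALUES (10);
INSERT INTO emp  VALUES (1, 30, 10);
""")

violated = False
try:
    con.execute("INSERT INTO emp VALUES (2, 30, 99)")  # no dept 99 exists
except sqlite3.IntegrityError:
    violated = True
print(violated)  # True: the referential-integrity rule rejected the row
```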
If a database supports these features, it is the responsibility of the database to ensure data integrity as well as the consistency model for the data storage and retrieval. If a database does not support these features, it is the responsibility of the applications to ensure data integrity while the database supports the consistency model for the data storage and retrieval.
Having a single, well-controlled, and well-defined data-integrity system increases:
* stability (one centralized system performs all data integrity operations)
* performance (all data integrity operations are performed in the same tier as the consistency model)
* re-usability (all applications benefit from a single centralized data integrity system)
* maintainability (one centralized system for all data integrity administration)
As of 2012, since all modern databases support these features (see Comparison of relational database management systems), it has become the de facto responsibility of the database to ensure data integrity. Outdated and legacy systems that use file systems (text, spreadsheets, ISAM, flat files, etc.) for their consistency model lack any kind of data-integrity model. This requires organizations to invest a large amount of time, money, and personnel in building data-integrity systems on a per-application basis that effectively just duplicate the existing data-integrity systems found in modern databases. Many companies, and indeed many database systems themselves, offer products and services to migrate outdated and legacy systems to modern databases to provide these data-integrity features. This offers organizations substantial savings in time, money, and resources because they do not have to develop per-application data-integrity systems that must be re-factored each time business requirements change.
Trigger
A database trigger is procedural code that is automatically executed in response to certain events on a particular table or view in a database. The trigger is mostly used for maintaining the integrity of the information on the database. For example, when a new record (representing a
new worker) is added to the employees table, new records should also be created in the tables of the taxes, vacations and salaries.
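The employees/salaries scenario can be tried directly in SQLite (schema invented for illustration; SQL Server trigger syntax differs slightly):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE employees (empId INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE salaries  (empId INTEGER, amount INTEGER);

-- Fire after every insert on employees and create a default salary record.
CREATE TRIGGER new_worker AFTER INSERT ON employees
BEGIN
    INSERT INTO salaries VALUES (NEW.empId, 0);
END;

INSERT INTO employees VALUES (1, 'Ann');
""")
rows = con.execute("SELECT * FROM salaries").fetchall()
print(rows)  # [(1, 0)] -- created automatically by the trigger
```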
Triggers in Microsoft SQL Server
Microsoft SQL Server supports triggers either after or instead of an insert, update or delete operation. They can be set on tables and views with the constraint that a view can be referenced only by an INSTEAD OF trigger.
Microsoft SQL Server 2005 introduced support for Data Definition Language (DDL) triggers, which can fire in reaction to a very wide range of events, including:
* Drop table
* Create table
* Alter table
* Login events
A full list is available on MSDN.
Performing conditional actions in triggers (or testing data following modification) is done through accessing the temporary Inserted and Deleted tables.
Security
SQL Server 2012
By default, both DML and DDL triggers execute under the context of the user that calls the trigger. The caller of a trigger is the user that executes the statement that causes the trigger to run. For example, if user Mary executes a DELETE statement that causes DML trigger DML_trigMary to run, the code inside DML_trigMary executes in the context of the user privileges for Mary. This default behavior can be exploited by users who want to introduce malicious code in the database or server instance. For example, the following DDL trigger is created by user JohnDoe:
CREATE TRIGGER DDL_trigJohnDoe
ON DATABASE
FOR ALTER_TABLE
AS
GRANT CONTROL SERVER TO JohnDoe ;
GO
What this trigger means is that as soon as a user that has permission to execute a GRANT CONTROL SERVER statement, such as a member of the sysadmin fixed server role, executes an ALTER TABLE statement, JohnDoe is granted CONTROL SERVER permission. In other words, although JohnDoe cannot grant CONTROL SERVER permission to himself, he enabled the trigger code that grants him this permission to execute under escalated privileges. Both DML and DDL triggers are open to this kind of security threat.
Advanced SQL features
Simple Features (officially Simple Feature Access) is both an OpenGIS and ISO standard (ISO 19125) that specifies a common storage model of mostly two-dimensional geographical data (point, line, polygon, multi-point, multi-line, etc.).
The ISO 19125 standard comes in two parts. Part one, ISO 19125-1 (SFA-CA for "common architecture"), defines a model for two-dimensional simple features, with linear interpolation between vertices. The data model defined in SFA-CA is a hierarchy of classes. This part also defines representation using Well-Known Text (and Binary). Part 2 of the standard, ISO 19125-2 (SFA-SQL), defines an implementation using SQL.[1] The OpenGIS standard(s) cover implementations in CORBA and OLE/COM as well, although these have lagged behind the SQL one and are not standardized by ISO.
The ISO/IEC 13249-3 SQL/MM Spatial extends the Simple Features data model mainly with circular interpolations (e.g. circular arcs) and adds other features like coordinate transformations and methods for validating geometries as well as Geography Markup Language support.
Embedded SQL
Embedded SQL is a method of combining the computing power of a programming language and the database manipulation capabilities of SQL. Embedded SQL statements are SQL statements written inline with the program source code of the host language. The embedded SQL statements are parsed by an embedded SQL preprocessor and replaced by host-language calls to a code library. The output from the preprocessor is then compiled by the host compiler. This allows programmers to embed SQL statements in programs written in any number of languages such as: C/C++, COBOL and Fortran.
The ANSI SQL standards committee defined the embedded SQL standard in two steps: a formalism called Module Language was defined, then the embedded SQL standard was derived from Module Language.[1] The SQL standard defines embedding of SQL as embedded SQL and the language in which SQL queries are embedded is referred to as the host language. A popular host language is C. The mixed C and embedded SQL is called Pro*C in Oracle and Sybase database management systems. In the PostgreSQL database management system this precompiler is called ECPG. Other embedded SQL precompilers are Pro*Ada, Pro*COBOL, Pro*FORTRAN, Pro*Pascal, and Pro*PL/I.
PL/SQL supports variables, conditions, loops and exceptions. Arrays are also supported, though in a somewhat unusual way, involving the use of PL/SQL collections. PL/SQL collections is a slightly advanced topic.
Implementations from version 8 of Oracle Database onwards have included features associated with object-orientation.
Dynamic SQL
Once the program units have been stored into the database, they become available for execution at a later time.
While programmers can readily embed Data Manipulation Language (DML) statements directly into their PL/SQL code using straightforward SQL statements, Data Definition Language (DDL) requires more complex "Dynamic SQL" statements to be written in the PL/SQL code. However, DML statements underpin the majority of PL/SQL code in typical software applications.
In the case of PL/SQL dynamic SQL, early versions of the Oracle Database required the use of a complicated Oracle DBMS_SQL package library. More recent versions have however introduced a simpler "Native Dynamic SQL", along with an associated EXECUTE IMMEDIATE syntax.
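Outside PL/SQL, the same idea of building a statement string at run time and then executing it can be sketched with Python's sqlite3 module (an analogue of EXECUTE IMMEDIATE, not Oracle syntax; the table name is hypothetical):

```python
import sqlite3

con = sqlite3.connect(":memory:")

# DDL built at run time: the table name is not known until execution.
table_name = "audit_2024"  # hypothetical name, chosen at run time
ddl = f"CREATE TABLE {table_name} (id INTEGER, note TEXT)"
con.execute(ddl)           # comparable to: EXECUTE IMMEDIATE ddl;

# DML should still bind values with placeholders, never string concatenation.
con.execute(f"INSERT INTO {table_name} VALUES (?, ?)", (1, "created"))
count = con.execute(f"SELECT COUNT(*) FROM {table_name}").fetchone()[0]
print(count)  # 1
```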
Oracle Corporation customarily extends package functionality with each successive release of the Oracle Database.
Introduction to distributed databases and client/server databases.
A distributed database is a database in which storage devices are not all attached to a common
processing unit such as the CPU. It may be stored in multiple computers located in the same
physical location, or may be dispersed over a network of interconnected computers. Unlike
parallel systems, in which the processors are tightly coupled and constitute a single database
system, a distributed database system consists of loosely coupled sites that share no physical
components.
Collections of data (e.g. in a database) can be distributed across multiple physical locations. A
distributed database can reside on network servers on the Internet, on corporate intranets or
extranets, or on other company networks. The replication and distribution of databases improves
database performance at end-user worksites.[1]
To ensure that distributed databases are up to date and current, there are two processes: replication and duplication. Replication involves using specialized software that looks for changes in the distributed database. Once the changes have been identified, the replication process makes all the databases look the same. The replication process can be very complex and time consuming, depending on the size and number of the distributed databases, and can also require considerable computer resources. Duplication, on the other hand, is not as complicated. It basically identifies one database as a master and then duplicates that database. The duplication process is normally done at a set time after hours, to ensure that each distributed location has the same data. In the duplication process, changes are allowed only to the master database, to ensure that local data will not be overwritten. Both processes can keep the data current in all distributed locations.[2]
Besides distributed database replication and fragmentation, there are many other distributed database design technologies, for example local autonomy and synchronous and asynchronous distributed database technologies. The implementation of these technologies can and does depend on the needs of the business and the sensitivity/confidentiality of the data to be stored in the database, and hence the price the business is willing to spend on ensuring data security,