
Master of Computer Application (MCA) – Semester II

MC0067 – Database Management – 4 Credits (Book ID: B0716 & B0717)

Assignment Set – 1

BOOK ID: B0716 & B0717

1. Write about the following, with respect to Hashing Techniques:

Linear Search

Collision Chain

Answer: Linear Search:

Linear search, also known as sequential search, means starting at the beginning of the data and checking each item in turn until either the desired item is found or the end of the data is reached. It is suitable for searching a list of data for a particular value, and it operates by examining every element of the list, one at a time and in sequence, until a match is found.

Linear search is not very efficient. If the item to be found is at the end of the list, then all previous items must be read and checked before the item that matches the search criteria is found. The algorithm is a very straightforward loop comparing every element in the array with the key. As soon as an equal value is found, it returns; if the loop finishes without finding a match, the search has failed and -1 is returned. For small arrays, linear search is a good solution because it is so straightforward, but in an array of a million elements it will take 500,000 comparisons on average to find the key. For a much faster search, take a look at binary search.

Algorithm

For each item in the database
    if the item matches the wanted info
        exit with this item
Continue loop
Report that the wanted item is not in the database
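A minimal runnable version of this algorithm, as a sketch in Python (the sample data is invented):

def linear_search(items, key):
    """Scan items left to right; return the index of the first match, or -1."""
    for i, item in enumerate(items):
        if item == key:
            return i          # found: stop at the first equal value
    return -1                 # loop finished without a match: search failed

print(linear_search([4, 9, 1, 7, 5], 7))    # 3
print(linear_search([4, 9, 1, 7, 5], 42))   # -1 (not present)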

Collision Chain:

In computer science, a hash table or hash map is a data structure that uses a hash function to map identifying values, known as keys (e.g., a person's name), to their associated values (e.g., their telephone number). Thus, a hash table implements an associative array. The hash function is used to transform the key into the index (the hash) of an array element (the slot or bucket) where the corresponding value is to be sought.

Ideally, the hash function should map each possible key to a unique slot index, but this ideal is rarely achievable in practice (unless the hash keys are fixed; i.e. new entries are never added to the table after it is created). Instead, most hash table designs assume that hash collisions - different keys that map to the same hash value - will occur and must be accommodated in some way.
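A minimal sketch of collision chaining in Python (Python's built-in hash() stands in for the hash function; each slot holds a list, the collision chain, of key/value pairs):

class ChainedHashTable:
    """Hash table that resolves collisions by chaining entries in each bucket."""

    def __init__(self, slots=8):
        self.buckets = [[] for _ in range(slots)]    # one chain (list) per slot

    def _slot(self, key):
        return hash(key) % len(self.buckets)         # map key -> bucket index

    def put(self, key, value):
        chain = self.buckets[self._slot(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:                             # key already present: update
                chain[i] = (key, value)
                return
        chain.append((key, value))                   # new key (or collision): extend chain

    def get(self, key):
        for k, v in self.buckets[self._slot(key)]:   # walk the collision chain
            if k == key:
                return v
        raise KeyError(key)

table = ChainedHashTable()
table.put("Shekhar", "120001")
table.put("Raj", "120002")
print(table.get("Raj"))    # 120002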

2. Write about: Integrity Rules

Relational Operators with examples for each

Answer: Integrity Rules:

These are the rules which a relational database follows in order to stay accurate and accessible. They govern which operations can be performed on the data and on the structure of the database. There are three integrity rules defined for a relational database, which are:

Distinct Rows in a Table - this rule says that all the rows of a table should be distinct, to avoid ambiguity while accessing the rows of that table. Most modern database management systems can be configured to avoid duplicate rows.

Entity Integrity (A Primary Key or part of it cannot be null) - this rule says that 'null' is a special value in a relational database; it does not mean blank or zero, but the unavailability of data, and hence a 'null' primary key would not be a complete identifier. This integrity rule is also termed entity integrity.

Referential Integrity - this rule says that if a foreign key is defined on a table, then a value matching that foreign key value must exist as the primary key of a row in some other table.

The following are the integrity rules to be satisfied by any relation.

• No component of the Primary Key can be null.

• The database must not contain any unmatched Foreign Key values. This is called the referential integrity rule.

Unlike the case of Primary Keys, there is no integrity rule saying that no component of the foreign key can be null. This can be logically explained with the help of the following example:

Consider the relations Employee and Account as given below.

Employee

Emp#   EmpName   EmpCity   EmpAcc#
X101   Shekhar   Bombay    120001
X102   Raj       Pune      120002
X103   Sharma    Nagpur    Null
X104   Vani      Bhopal    120003

Account

Acc#     OpenDate      BalAmt
120001   30-Aug-1998   5000
120002   29-Oct-1998   1200
120003   01-Jan-1999   3000
120004   04-Mar-1999   500

EmpAcc# in the Employee relation is a foreign key creating a reference from Employee to Account. Here, a Null value in the EmpAcc# attribute is logically possible if an Employee does not have a bank account. If the business rules allow an employee to exist in the system without opening an account, a Null value can be allowed for EmpAcc# in the Employee relation.

In the case example given, Cust# in Ord_Aug cannot accept Null if the business rule insists that the Customer No. needs to be stored for every order placed.

Relational Operators:

In the relational model, the database objects seen so far have specific names:

Name              Meaning
Relation          Table
Tuple             Record (Row)
Attribute         Field (Column)
Cardinality       Number of Records (Rows)
Degree (Arity)    Number of Fields (Columns)
View              Query/Answer table

On these objects, a set of operators (relational operators) is provided to manipulate them:

1. Restrict

2. Project

3. Union

4. Difference

5. Product

6. Intersection

7. Join

8. Divide

Restrict:

Restrict simply extracts records from a table. It is also known as "Select", but it is not the same SELECT as defined in SQL.

Project:

Project selects zero or more fields from a table and generates a new table that contains all of the records and only the selected fields (with no duplications).

Union:

Union creates a new table by adding the records of one table to another. The tables must be union-compatible: they must have the same number of fields, and each pair of corresponding fields must take values from the same domain.

Difference:

The difference of two tables is a third table which contains the records which appear in the first BUT NOT in the second.

Page 5: mc0067

Product:

The product of two tables is a third table which contains every record of the first combined with each of the records of the second.

Intersection:

The intersection of two tables is a third table which contains the records that are common to both.

Join:

The join of two tables is a third table which contains the records of the first and the second that are related to each other.

Divide:

Dividing a table by another table gives all the records in the first which have values in their fields matching ALL the records in the second.
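Several of these operators correspond directly to set operations when a relation is modeled as a set of tuples. A small Python sketch (the order data is invented for illustration):

# Relations as sets of tuples: (Ord#, Cust#) pairs for two hypothetical months.
ord_jul = {(101, "001"), (102, "003")}
ord_aug = {(103, "002"), (102, "003")}

print(ord_jul | ord_aug)             # UNION: orders placed in July or August
print(ord_jul - ord_aug)             # DIFFERENCE: in July but not in August
print(ord_jul & ord_aug)             # INTERSECTION: common to both
print({j + a for j in ord_jul
             for a in ord_aug})      # PRODUCT: every July row joined to every August row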

The eight relational algebra operators are

1. SELECT – To retrieve specific tuples/rows from a relation.

Ord#   OrdDate    Cust#
101    02-08-94   002
104    18-09-94   002

2. PROJECT – To retrieve specific attributes/columns from a relation.


Description    Price
Power Supply   4000
101-Keyboard   2000
Mouse          800
MS-DOS 6.0     5000
MS-Word 6.0    8000

3. PRODUCT – To obtain all possible combination of tuples from two relations.

Ord#   OrdDate    O.Cust#   C.Cust#   CustName     City
101    02-08-94   002       001       Shah         Bombay
101    02-08-94   002       002       Srinivasan   Madras
101    02-08-94   002       003       Gupta        Delhi
101    02-08-94   002       004       Banerjee     Calcutta
101    02-08-94   002       005       Apte         Bombay
102    11-08-94   003       001       Shah         Bombay
102    11-08-94   003       002       Srinivasan   Madras

4. UNION – To retrieve tuples appearing in either or both the relations participating in the UNION.


Eg: Consider the relation Ord_Jul (orders placed in July) alongside Ord_Aug from the case example. Ord_Jul UNION Ord_Aug gives:

Ord#   OrdDate    Cust#
101    03-07-94   001
102    27-07-94   003
101    02-08-94   002
102    11-08-94   003
103    21-08-94   003
104    28-08-94   002
105    30-08-94   005

Note: The union operation shown above logically implies retrieval of records of Orders placed in July or in August

5. INTERSECT – To retrieve tuples appearing in both the relations participating in the INTERSECT.

Eg: To retrieve Cust# of Customers who’ve placed orders in July and in August

Cust#

003


6. DIFFERENCE – To retrieve tuples appearing in the first relation participating in the DIFFERENCE but not the second.

Eg: To retrieve Cust# of Customers who’ve placed orders in July but not in August

Cust#

001

7. JOIN – To retrieve combinations of tuples in two relations based on a common field in both the relations.

Eg:

ORD_AUG join CUSTOMERS (here, the common column is Cust#)

Ord#   OrdDate    Cust#   CustName     City
101    02-08-94   002     Srinivasan   Madras
102    11-08-94   003     Gupta        Delhi
103    21-08-94   003     Gupta        Delhi
104    28-08-94   002     Srinivasan   Madras
105    30-08-94   005     Apte         Bombay

Note: The above join operation logically implies retrieval of the details of all orders and the details of the corresponding customers who placed the orders. Such a join operation, where only those rows having corresponding rows in both the relations are retrieved, is called the natural join or inner join. This is the most common join operation.

Consider the example of EMPLOYEE and ACCOUNT relations.

EMPLOYEE

Emp#   EmpName   EmpCity   Acc#
X101   Shekhar   Bombay    120001
X102   Raj       Pune      120002
X103   Sharma    Nagpur    Null
X104   Vani      Bhopal    120003

ACCOUNT

Acc#     OpenDate      BalAmt
120001   30-Aug-1998   5000
120002   29-Oct-1998   1200
120003   01-Jan-1999   3000
120004   04-Mar-1999   500

A join can be formed between the two relations based on the common column Acc#. The result of the (inner) join is:

Emp#   EmpName   EmpCity   Acc#     OpenDate      BalAmt
X101   Shekhar   Bombay    120001   30-Aug-1998   5000
X102   Raj       Pune      120002   29-Oct-1998   1200
X104   Vani      Bhopal    120003   01-Jan-1999   3000

Note that, from each table, only those records which have corresponding records in the other table appear in the result set. This means that result of the inner join shows the details of those employees who hold an account along with the account details.

The other type of join is the outer join which has three variations – the left outer join, the right outer join and the full outer join. These three joins are explained as follows:

The left outer join retrieves all rows from the left-side (of the join operator) table. If there are corresponding or related rows in the right-side table, the correspondence will be shown. Otherwise, columns of the right-side table will take null values.


EMPLOYEE left outer join ACCOUNT gives:

Emp#   EmpName   EmpCity   Acc#     OpenDate      BalAmt
X101   Shekhar   Bombay    120001   30-Aug-1998   5000
X102   Raj       Pune      120002   29-Oct-1998   1200
X103   Sharma    Nagpur    NULL     NULL          NULL
X104   Vani      Bhopal    120003   01-Jan-1999   3000

The right outer join retrieves all rows from the right-side (of the join operator) table. If there are corresponding or related rows in the left-side table, the correspondence will be shown. Otherwise, columns of the left-side table will take null values.

EMPLOYEE right outer join ACCOUNT gives:

Emp#   EmpName   EmpCity   Acc#     OpenDate      BalAmt
X101   Shekhar   Bombay    120001   30-Aug-1998   5000
X102   Raj       Pune      120002   29-Oct-1998   1200
X104   Vani      Bhopal    120003   01-Jan-1999   3000
NULL   NULL      NULL      120004   04-Mar-1999   500

(Assume that Acc# 120004 belongs to someone who is not an employee and hence the details of the Account holder are not available here)

The full outer join retrieves all rows from both the tables. If there is a correspondence or relation between rows from the tables of either side, the correspondence will be shown. Otherwise, related columns will take null values.

EMPLOYEE full outer join ACCOUNT gives:


Emp#   EmpName   EmpCity   Acc#     OpenDate      BalAmt
X101   Shekhar   Bombay    120001   30-Aug-1998   5000
X102   Raj       Pune      120002   29-Oct-1998   1200
X103   Sharma    Nagpur    NULL     NULL          NULL
X104   Vani      Bhopal    120003   01-Jan-1999   3000
NULL   NULL      NULL      120004   04-Mar-1999   500

8. DIVIDE

Consider the following three relations, R1, R2, and R3. [The tables for R1, R2, and R3 appear as figures in the original document.] R1 divided by R2 per R3 gives:

a

Thus the result contains those values from R1 whose corresponding R2 values in R3 include all R2 values.

3. Write about: Three Level Architecture of a database

Services of a Database System

Answer: Data are actually stored as bits, or numbers and strings, but it is difficult to work with data at this level.

It is necessary to view data at different levels of abstraction.

Schema:

Description of data at some level. Each level has its own schema.

We will be concerned with three forms of schemas: physical, conceptual, and external.

• Physical (internal) schema: describes how the data are actually stored, in terms of files, records, and indexes.

• Conceptual schema: describes the whole database in terms of the data model of the DBMS (for a relational DBMS: the relations, their attributes, and the constraints on them), hiding the storage details.

• External schema (view): describes the portion of the database that a particular user group sees, in the form that group prefers.

Users work at the external level; the DBMS maps each external schema onto the conceptual schema, and the conceptual schema onto the physical schema. This separation between the levels is what gives a database system its data independence.

4. Explain the SQL syntax for: Table Creation with constraint imposing using an example

Functions Count, Sum, Average with appropriate examples

Answer: [The original document reproduces the SQLite syntax diagrams for create-table-stmt, column-def, type-name, column-constraint, table-constraint, and foreign-key-clause here.]

The "CREATE TABLE" command is used to create a new table in an SQLite database. A CREATE TABLE command specifies the following attributes of the new table:

The name of the new table. The database in which the new table is created. Tables may be

created in the main database, the temp database, or in any attached database.

The name of each column in the table. The declared type of each column in the table. A default value or expression for each column in the table. A default collation sequence to use with each column. Optionally, a PRIMARY KEY for the table. Both single column

and composite (multiple column) primary keys are supported. A set of SQL constraints for each table. SQLite supports

UNIQUE, NOT NULL, CHECK and FOREIGN KEY constraints.

Every CREATE TABLE statement must specify a name for the new table. Table names that begin with "sqlite_" are reserved for internal use. It is an error to attempt to create a table with a name that starts with "sqlite_".

If a <database-name> is specified, it must be either "main", "temp", or the name of an attached database. In this case the new table is created in the named database. If the "TEMP" or "TEMPORARY" keyword occurs between the "CREATE" and "TABLE" then the new table is created in the temp database. It is an error to specify both a <database-name> and the TEMP or TEMPORARY keyword, unless the <database-name> is "temp". If no database name is specified and the TEMP keyword is not present then the table is created in the main database.

It is usually an error to attempt to create a new table in a database that already contains a table, index or view of the same name. However, if the "IF NOT EXISTS" clause is specified as part of the CREATE TABLE statement and a table or view of the same name already exists, the CREATE TABLE command simply has no effect (and no error message is returned). An error is still returned if the table cannot be created because of an existing index, even if the "IF NOT EXISTS" clause is specified.

It is not an error to create a table that has the same name as an existing trigger.

Tables are removed using the DROP TABLE statement.
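Since the question asks for a concrete example of table creation with constraints, here is one possible sketch, run through Python's sqlite3 module (the account/employee tables and all names are invented for illustration; note that SQLite only enforces FOREIGN KEY constraints once the foreign_keys pragma is switched on):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite leaves FK enforcement off by default

conn.executescript("""
CREATE TABLE account (
    acc_no    INTEGER PRIMARY KEY,                  -- single-column primary key
    open_date TEXT NOT NULL,
    bal_amt   REAL NOT NULL CHECK (bal_amt >= 0)    -- CHECK constraint
);
CREATE TABLE employee (
    emp_no   TEXT PRIMARY KEY,
    emp_name TEXT NOT NULL,
    emp_city TEXT,
    emp_acc  INTEGER UNIQUE,                        -- at most one employee per account
    FOREIGN KEY (emp_acc) REFERENCES account(acc_no)
);
""")

conn.execute("INSERT INTO account VALUES (120001, '1998-08-30', 5000)")
conn.execute("INSERT INTO employee VALUES ('X101', 'Shekhar', 'Bombay', 120001)")

try:   # violates the FOREIGN KEY constraint: no such account
    conn.execute("INSERT INTO employee VALUES ('X102', 'Raj', 'Pune', 999999)")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)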

CREATE TABLE ... AS SELECT Statements

A "CREATE TABLE ... AS SELECT" statement creates and populates a database table based on the results of a SELECT statement. The table has the same number of columns as the rows returned by the SELECT statement. The name of each column is the same as the name of the corresponding column in the result set of the SELECT statement. The declared type of each column is determined by the expression affinity of the corresponding expression in the result set of the SELECT statement, as follows:

Expression Affinity   Column Declared Type
TEXT                  "TEXT"
NUMERIC               "NUM"
INTEGER               "INT"
REAL                  "REAL"
NONE                  "" (empty string)

A table created using CREATE TABLE AS has no PRIMARY KEY and no constraints of any kind. The default value of each column is NULL. The default collation sequence for each column of the new table is BINARY.

Tables created using CREATE TABLE AS are initially populated with the rows of data returned by the SELECT statement. Rows are assigned contiguously ascending rowid values, starting with 1, in the order that they are returned by the SELECT statement.

Column Definitions

Unless it is a CREATE TABLE ... AS SELECT statement, a CREATE TABLE includes one or more column definitions, optionally followed by a list of table constraints. Each column definition consists of the name of the column, optionally followed by the declared type of the column, then one or more optional column constraints. Included in the definition of "column constraints" for the purposes of the previous statement are the COLLATE and DEFAULT clauses, even though these are not really constraints in the sense that they do not restrict the data that the table may contain. The other constraints - NOT NULL, CHECK, UNIQUE, PRIMARY KEY and FOREIGN KEY constraints - impose restrictions on the table's data, and are described under SQL Data Constraints below.

Unlike most SQL databases, SQLite does not restrict the type of data that may be inserted into a column based on the column's declared type. Instead, SQLite uses dynamic typing. The declared type of a column is used only to determine the affinity of the column.

The DEFAULT clause specifies a default value to use for the column if no value is explicitly provided by the user when doing an INSERT. If there is no explicit DEFAULT clause attached to a column definition, then the default value of the column is NULL. An explicit DEFAULT clause may specify that the default value is NULL, a string constant, a blob constant, a signed-number, or any constant expression enclosed in parentheses. An explicit default value may also be one of the special case-independent keywords CURRENT_TIME, CURRENT_DATE or CURRENT_TIMESTAMP. For the purposes of the DEFAULT clause, an expression is considered constant provided that it does not contain any sub-queries or string constants enclosed in double quotes.

Each time a row is inserted into the table by an INSERT statement that does not provide explicit values for all table columns, the values stored in the new row are determined by their default values, as follows:

If the default value of the column is a constant NULL, text, blob or signed-number value, then that value is used directly in the new row.

If the default value of a column is an expression in parentheses, then the expression is evaluated once for each row inserted and the results used in the new row.

If the default value of a column is CURRENT_TIME, CURRENT_DATE or CURRENT_TIMESTAMP, then the value used in the new row is a text representation of the current UTC date and/or time. For CURRENT_TIME, the format of the value is "HH:MM:SS". For CURRENT_DATE, "YYYY-MM-DD". The format for CURRENT_TIMESTAMP is "YYYY-MM-DD HH:MM:SS".
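A quick check of these default-value rules through sqlite3 (the log table is hypothetical, and the timestamp in the output will of course vary):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE log (
    id      INTEGER PRIMARY KEY,
    level   TEXT DEFAULT 'info',             -- constant default, used directly
    created TEXT DEFAULT CURRENT_TIMESTAMP   -- current UTC 'YYYY-MM-DD HH:MM:SS'
)""")
conn.execute("INSERT INTO log DEFAULT VALUES")   # no explicit values supplied
print(conn.execute("SELECT level, created FROM log").fetchone())
# e.g. ('info', '2024-01-01 12:34:56')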

The COLLATE clause specifies the name of a collating sequence to use as the default collation sequence for the column. If no COLLATE clause is specified, the default collation sequence is BINARY.

The number of columns in a table is limited by the SQLITE_MAX_COLUMN compile-time parameter. A single row of a table cannot store more than SQLITE_MAX_LENGTH bytes of data. Both of these limits can be lowered at runtime using the sqlite3_limit() C/C++ interface.

SQL Data Constraints

Each table in SQLite may have at most one PRIMARY KEY. If the keywords PRIMARY KEY are added to a column definition, then the primary key for the table consists of that single column. Or, if a PRIMARY KEY clause is specified as a table-constraint, then the primary key of the table consists of the list of columns specified as part of the PRIMARY KEY clause. If there is more than one PRIMARY KEY clause in a single CREATE TABLE statement, it is an error.

If a table has a single column primary key, and the declared type of that column is "INTEGER", then the column is known as an INTEGER PRIMARY KEY. See below for a description of the special properties and behaviors associated with an INTEGER PRIMARY KEY.

Each row in a table with a primary key must feature a unique combination of values in its primary key columns. For the purposes of determining the uniqueness of primary key values, NULL values are considered distinct from all other values, including other NULLs. If an INSERT or UPDATE statement attempts to modify the table content so that two or more rows feature identical primary key values, it is a constraint violation. According to the SQL standard, PRIMARY KEY should always imply NOT NULL. Unfortunately, due to a long-standing coding oversight, this is not the case in SQLite. Unless the column is an INTEGER PRIMARY KEY SQLite allows NULL values in a PRIMARY KEY column. We could change SQLite to conform to the standard (and we might do so in the future), but by the time the oversight was discovered, SQLite was in such wide use that we feared breaking legacy code if we fixed the problem. So for now we have chosen to continue allowing NULLs in PRIMARY KEY columns. Developers should be aware, however, that we may change SQLite to conform to the SQL standard in future and should design new programs accordingly.

A UNIQUE constraint is similar to a PRIMARY KEY constraint, except that a single table may have any number of UNIQUE constraints. For each UNIQUE constraint on the table, each row must feature a unique combination of values in the columns identified by the UNIQUE constraint. As with PRIMARY KEY constraints, for the purposes of UNIQUE constraints NULL values are considered distinct from all other values (including other NULLs). If an INSERT or UPDATE statement attempts to modify the table content so that two or more rows feature identical values in a set of columns that are subject to a UNIQUE constraint, it is a constraint violation.

INTEGER PRIMARY KEY columns aside, both UNIQUE and PRIMARY KEY constraints are implemented by creating an index in the database (in the same way as a "CREATE UNIQUE INDEX" statement would). Such an index is used like any other index in the database to optimize queries. As a result, there is often no advantage (but significant overhead) in creating an index on a set of columns that are already collectively subject to a UNIQUE or PRIMARY KEY constraint.

A CHECK constraint may be attached to a column definition or specified as a table constraint. In practice it makes no difference. Each time a new row is inserted into the table or an existing row is updated, the expression associated with each CHECK constraint is evaluated and cast to a NUMERIC value in the same way as a CAST expression. If the result is zero (integer value 0 or real value 0.0), then a constraint violation has occurred. If the CHECK expression evaluates to NULL, or any other non-zero value, it is not a constraint violation. The expression of a CHECK constraint may not contain a subquery.

CHECK constraints have been supported since version 3.3.0. Prior to version 3.3.0, CHECK constraints were parsed but not enforced.
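The cast-to-NUMERIC rule above has a consequence that is easy to verify: a CHECK expression that evaluates to NULL is not a violation. A small sqlite3 probe (the table is hypothetical):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (qty INTEGER CHECK (qty > 0))")
conn.execute("INSERT INTO t VALUES (5)")      # CHECK is true: accepted
conn.execute("INSERT INTO t VALUES (NULL)")   # CHECK is NULL: accepted, not a violation
try:
    conn.execute("INSERT INTO t VALUES (0)")  # CHECK is 0 (false): violation
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)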

A NOT NULL constraint may only be attached to a column definition, not specified as a table constraint. Not surprisingly, a NOT NULL constraint dictates that the associated column may not contain a NULL value. Attempting to set the column value to NULL when inserting a new row or updating an existing one causes a constraint violation.

Exactly how a constraint violation is dealt with is determined by the constraint conflict resolution algorithm. Each PRIMARY KEY, UNIQUE, NOT NULL and CHECK constraint has a default conflict resolution algorithm. PRIMARY KEY, UNIQUE and NOT NULL constraints may be explicitly assigned a default conflict resolution algorithm by including a conflict-clause in their definitions. Or, if a constraint definition does not include a conflict-clause or it is a CHECK constraint, the default conflict resolution algorithm is ABORT. Different constraints within the same table may have different default conflict resolution algorithms. See the section titled ON CONFLICT for additional information.


Aggregate functions compute a single result value from a set of input values. The built-in aggregate functions are listed in Table 9-37 and Table 9-38 below.

Table 9-37. General-Purpose Aggregate Functions

avg(expression)
  Argument type: smallint, int, bigint, real, double precision, numeric, or interval
  Return type: numeric for any integer-type argument, double precision for a floating-point argument, otherwise the same as the argument data type
  Description: the average (arithmetic mean) of all input values

bit_and(expression)
  Argument type: smallint, int, bigint, or bit
  Return type: same as argument data type
  Description: the bitwise AND of all non-null input values, or null if none

bit_or(expression)
  Argument type: smallint, int, bigint, or bit
  Return type: same as argument data type
  Description: the bitwise OR of all non-null input values, or null if none

bool_and(expression)
  Argument type: bool
  Return type: bool
  Description: true if all input values are true, otherwise false

bool_or(expression)
  Argument type: bool
  Return type: bool
  Description: true if at least one input value is true, otherwise false

count(*)
  Return type: bigint
  Description: number of input rows

count(expression)
  Argument type: any
  Return type: bigint
  Description: number of input rows for which the value of expression is not null

every(expression)
  Argument type: bool
  Return type: bool
  Description: equivalent to bool_and

max(expression)
  Argument type: any array, numeric, string, or date/time type
  Return type: same as argument type
  Description: maximum value of expression across all input values

min(expression)
  Argument type: any array, numeric, string, or date/time type
  Return type: same as argument type
  Description: minimum value of expression across all input values

sum(expression)
  Argument type: smallint, int, bigint, real, double precision, numeric, or interval
  Return type: bigint for smallint or int arguments, numeric for bigint arguments, double precision for floating-point arguments, otherwise the same as the argument data type
  Description: sum of expression across all input values

It should be noted that, except for count, these functions return a null value when no rows are selected. In particular, sum of no rows returns null, not zero as one might expect. The coalesce function may be used to substitute zero for null when necessary.
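To tie this back to the question, here are COUNT, SUM and AVG in action (a sketch using Python's sqlite3 rather than PostgreSQL; the account data follows the earlier example). The last query also shows the null-for-no-rows behaviour and the coalesce workaround just described:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (acc_no INTEGER PRIMARY KEY, bal_amt REAL)")
conn.executemany("INSERT INTO account VALUES (?, ?)",
                 [(120001, 5000), (120002, 1200), (120003, 3000), (120004, 500)])

print(conn.execute(
    "SELECT count(*), sum(bal_amt), avg(bal_amt) FROM account").fetchone())
# (4, 9700.0, 2425.0)

# sum over an empty set of rows is null (None), not zero; coalesce substitutes 0.
print(conn.execute(
    "SELECT sum(bal_amt), coalesce(sum(bal_amt), 0) "
    "FROM account WHERE bal_amt > 99999").fetchone())
# (None, 0)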

Table 9-38 shows aggregate functions typically used in statistical analysis. (These are separated out merely to avoid cluttering the listing of more-commonly-used aggregates.) Where the description mentions N, it means the number of input rows for which all the input expressions are non-null. In all cases, null is returned if the computation is meaningless, for example when N is zero.

Table 9-38. Aggregate Functions for Statistics

corr(Y, X)
  Argument type: double precision
  Return type: double precision
  Description: correlation coefficient

covar_pop(Y, X)
  Argument type: double precision
  Return type: double precision
  Description: population covariance

covar_samp(Y, X)
  Argument type: double precision
  Return type: double precision
  Description: sample covariance

regr_avgx(Y, X)
  Argument type: double precision
  Return type: double precision
  Description: average of the independent variable (sum(X)/N)

regr_avgy(Y, X)
  Argument type: double precision
  Return type: double precision
  Description: average of the dependent variable (sum(Y)/N)

regr_count(Y, X)
  Argument type: double precision
  Return type: bigint
  Description: number of input rows in which both expressions are nonnull

regr_intercept(Y, X)
  Argument type: double precision
  Return type: double precision
  Description: y-intercept of the least-squares-fit linear equation determined by the (X, Y) pairs

regr_r2(Y, X)
  Argument type: double precision
  Return type: double precision
  Description: square of the correlation coefficient

regr_slope(Y, X)
  Argument type: double precision
  Return type: double precision
  Description: slope of the least-squares-fit linear equation determined by the (X, Y) pairs

regr_sxx(Y, X)
  Argument type: double precision
  Return type: double precision
  Description: sum(X^2) - sum(X)^2/N ("sum of squares" of the independent variable)

regr_sxy(Y, X)
  Argument type: double precision
  Return type: double precision
  Description: sum(X*Y) - sum(X) * sum(Y)/N ("sum of products" of independent times dependent variable)

regr_syy(Y, X)
  Argument type: double precision
  Return type: double precision
  Description: sum(Y^2) - sum(Y)^2/N ("sum of squares" of the dependent variable)

stddev(expression)
  Argument type: smallint, int, bigint, real, double precision, or numeric
  Return type: double precision for floating-point arguments, otherwise numeric
  Description: historical alias for stddev_samp

stddev_pop(expression)
  Argument type: smallint, int, bigint, real, double precision, or numeric
  Return type: double precision for floating-point arguments, otherwise numeric
  Description: population standard deviation of the input values

stddev_samp(expression)
  Argument type: smallint, int, bigint, real, double precision, or numeric
  Return type: double precision for floating-point arguments, otherwise numeric
  Description: sample standard deviation of the input values

variance(expression)
  Argument type: smallint, int, bigint, real, double precision, or numeric
  Return type: double precision for floating-point arguments, otherwise numeric
  Description: historical alias for var_samp

var_pop(expression)
  Argument type: smallint, int, bigint, real, double precision, or numeric
  Return type: double precision for floating-point arguments, otherwise numeric
  Description: population variance of the input values (square of the population standard deviation)

var_samp(expression)
  Argument type: smallint, int, bigint, real, double precision, or numeric
  Return type: double precision for floating-point arguments, otherwise numeric
  Description: sample variance of the input values (square of the sample standard deviation)

This section describes the SQL-compliant subquery expressions available in PostgreSQL. All of the expression forms documented in this section return Boolean (true/false) results.

EXISTS

EXISTS (subquery)

The argument of EXISTS is an arbitrary SELECT statement, or subquery. The subquery is evaluated to determine whether it returns any rows. If it returns at least one row, the result of EXISTS is "true"; if the subquery returns no rows, the result of EXISTS is "false".

The subquery can refer to variables from the surrounding query, which will act as constants during any one evaluation of the subquery.

The subquery will generally only be executed far enough to determine whether at least one row is returned, not all the way to completion. It is unwise to write a subquery that has any side effects (such as calling sequence functions); whether the side effects occur or not may be difficult to predict.

Since the result depends only on whether any rows are returned, and not on the contents of those rows, the output list of the subquery is normally uninteresting. A common coding convention is to write all EXISTS tests in the form EXISTS(SELECT 1 WHERE ...).

There are exceptions to this rule however, such as subqueries that use INTERSECT.

This simple example is like an inner join on col2, but it produces at most one output row for each tab1 row, even if there are multiple matching tab2 rows:

SELECT col1 FROM tab1

WHERE EXISTS(SELECT 1 FROM tab2 WHERE col2 = tab1.col2);

IN

expression IN (subquery)

The right-hand side is a parenthesized subquery, which must return exactly one column. The left-hand expression is evaluated and compared to each row of the subquery result. The result of IN is "true" if any equal subquery row is found. The result is "false" if no equal row is found (including the special case where the subquery returns no rows).

Note that if the left-hand expression yields null, or if there are no equal right-hand values and at least one right-hand row yields null, the result of the IN construct will be null, not false. This is in accordance with SQL's normal rules for Boolean combinations of null values.

As with EXISTS, it's unwise to assume that the subquery will be evaluated completely.

row_constructor IN (subquery)

The left-hand side of this form of IN is a row constructor, as described in Section 4.2.11. The right-hand side is a parenthesized subquery, which must return exactly as many columns as there are expressions in the left-hand row. The left-hand expressions are evaluated and compared row-wise to each row of the subquery result. The result of IN is "true" if any equal subquery row is found. The result is "false" if no equal row is found (including the special case where the subquery returns no rows).

As usual, null values in the rows are combined per the normal rules of SQL Boolean expressions. Two rows are considered equal if all their corresponding members are non-null and equal; the rows are unequal if any corresponding members are non-null and unequal; otherwise the result of that row comparison is unknown (null). If all the per-row results are either unequal or null, with at least one null, then the result of IN is null.

NOT IN

expression NOT IN (subquery)


The right-hand side is a parenthesized subquery, which must return exactly one column. The left-hand expression is evaluated and compared to each row of the subquery result. The result of NOT IN is "true" if only unequal subquery rows are found (including the special case where the subquery returns no rows). The result is "false" if any equal row is found.

Note that if the left-hand expression yields null, or if there are no equal right-hand values and at least one right-hand row yields null, the result of the NOT IN construct will be null, not true. This is in accordance with SQL's normal rules for Boolean combinations of null values.

As with EXISTS, it's unwise to assume that the subquery will be evaluated completely.
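This null behaviour of NOT IN is a classic trap, so it is worth seeing once. A sqlite3 illustration (SQLite applies the same three-valued logic; a literal value list stands in for the subquery result):

import sqlite3

conn = sqlite3.connect(":memory:")
# 1 is compared with (2, NULL): no row is equal, one comparison is unknown,
# so NOT IN yields null rather than true, and the WHERE clause filters the row out.
print(conn.execute("SELECT 1 NOT IN (2, 3)").fetchone())                  # (1,)  true
print(conn.execute("SELECT 1 NOT IN (2, NULL)").fetchone())               # (None,)
print(conn.execute("SELECT 'kept' WHERE 1 NOT IN (2, NULL)").fetchall())  # []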

row_constructor NOT IN (subquery)

The left-hand side of this form of NOT IN is a row constructor, as described in Section 4.2.11. The right-hand side is a parenthesized subquery, which must return exactly as many columns as there are expressions in the left-hand row. The left-hand expressions are evaluated and compared row-wise to each row of the subquery result. The result of NOT IN is "true" if only unequal subquery rows are found (including the special case where the subquery returns no rows). The result is "false" if any equal row is found.

As usual, null values in the rows are combined per the normal rules of SQL Boolean expressions. Two rows are considered equal if all their corresponding members are non-null and equal; the rows are unequal if any corresponding members are non-null and unequal; otherwise the result of that row comparison is unknown (null). If all the per-row results are either unequal or null, with at least one null, then the result of NOT IN is null.


ANY/SOME

expression operator ANY (subquery)

expression operator SOME (subquery)

The right-hand side is a parenthesized subquery, which must return exactly one column. The left-hand expression is evaluated and compared to each row of the subquery result using the given operator, which must yield a Boolean result. The result of ANY is "true" if any true result is obtained. The result is "false" if no true result is found (including the special case where the subquery returns no rows).

SOME is a synonym for ANY. IN is equivalent to = ANY.

Note that if there are no successes and at least one right-hand row yields null for the operator's result, the result of the ANY construct will be null, not false. This is in accordance with SQL's normal rules for Boolean combinations of null values.

As with EXISTS, it's unwise to assume that the subquery will be evaluated completely.

row_constructor operator ANY (subquery)

row_constructor operator SOME (subquery)

The left-hand side of this form of ANY is a row constructor, as described in Section 4.2.11. The right-hand side is a parenthesized subquery, which must return exactly as many columns as there are expressions in the left-hand row. The left-hand expressions are evaluated and compared row-wise to each row of the subquery result, using the given operator. The result of ANY is "true" if the comparison returns true for any subquery row. The result is "false" if the comparison returns false for every subquery row (including the special case where the subquery returns no rows). The result is NULL if the comparison does not return true for any row, and it returns NULL for at least one row.

ALL

expression operator ALL (subquery)

The right-hand side is a parenthesized subquery, which must return exactly one column. The left-hand expression is evaluated and compared to each row of the subquery result using the given operator, which must yield a Boolean result. The result of ALL is "true" if all rows yield true (including the special case where the subquery returns no rows). The result is "false" if any false result is found. The result is NULL if the comparison does not return false for any row, and it returns NULL for at least one row.

NOT IN is equivalent to <> ALL.

As with EXISTS, it's unwise to assume that the subquery will be evaluated completely.

row_constructor operator ALL (subquery)

The left-hand side of this form of ALL is a row constructor, as described in Section 4.2.11. The right-hand side is a parenthesized subquery, which must return exactly as many columns as there are expressions in the left-hand row. The left-hand expressions are evaluated and compared row-wise to each row of the subquery result, using the given operator. The result of ALL is "true" if the comparison returns true for all subquery rows (including the special case where the subquery returns no rows). The result is "false" if the comparison returns false for any subquery row. The result is NULL if the comparison does not return false for any subquery row, and it returns NULL for at least one row.

Row-wise Comparison

row_constructor operator (subquery)

The left-hand side is a row constructor, as described in Section 4.2.11. The right-hand side is a parenthesized subquery, which must return exactly as many columns as there are expressions in the left-hand row. Furthermore, the subquery cannot return more than one row. (If it returns zero rows, the result is taken to be null.) The left-hand side is evaluated and compared row-wise to the single subquery result row.

5. Compare and Contrast the Centralized and Client / Server Architecture for DBMS

Answer: In centralized database systems, the database system, application programs, and user interface are all executed on a single system, and dumb terminals are connected to it. The processing power of the single system is utilized, and the dumb terminals are used only to display the information. As personal computers became faster, more powerful, and cheaper, database systems started to exploit the available processing power at the user's side, which led to the development of client/server architecture. In client/server architecture, the processing power of the computer system at the user's end is utilized by processing the user interface on that system.

A client is a computer system that sends requests to the server connected to the network, and a server is a computer system that receives the request, processes it, and returns the requested information back to the client. Client and server are usually present at different sites. The end users (remote database users) work on the client computer systems and the database system runs on the server. Servers can be of several types, for example, file servers, printer servers, web servers, database servers, etc. The client machines have user interfaces that help users utilize the servers. They also provide users the local processing power to run local applications on the client side.

There are two approaches to implement client/server architecture. In the first approach, the user interface and application programs are placed on the client side and the database system on the server side. This architecture is called two-tier architecture. The application programs that reside at the client side invoke the DBMS at the server side. Application program interface standards like Open Database Connectivity (ODBC) and Java Database Connectivity (JDBC) are used for the interaction between client and server.

[Figure: two-tier architecture, showing a CLIENT connected to a SERVER.]

The second approach, that is, three-tier architecture, is primarily used for web-based applications. It adds an intermediate layer known as the application server (or web server) between the client and the database server. The client communicates with the application server, which in turn communicates with the database server. The application server stores the business rules (procedures and constraints) used for accessing data from the database server. It checks the client's credentials before forwarding a request to the database server. Hence, it improves database security.

When a client requests information, the application server accepts the request, processes it, and sends the corresponding database commands to the database server. The database server sends the result back to the application server, which converts it into GUI format and presents it to the client. The figure below shows the three-tier architecture.

[Figure: Three-tier architecture - CLIENT (GUI / web interface), APPLICATION SERVER / WEB SERVER (application programs, web pages), DATABASE SERVER (database system).]

6. Taking an example Enterprise System, List out the Entity types, Entity Sets, Attributes and Keys

Answer : Entity Types, Entity Sets, Attributes, and Keys



Entities and Attributes

Entities and Their Attributes. The basic object that the ER model represents is an entity, which is a "thing" in the real world with an independent existence. An entity may be an object with a physical existence (for example, a particular person, car, house, or employee) or it may be an object with a conceptual existence (for example, a company, a job, or a university course). Each entity has attributes—the particular properties that describe it. For example, an employee entity may be described by the employee’s name, age, address, salary, and job. A particular entity will have a value for each of its attributes. The attribute values that describe each entity become a major part of the data stored in the database.

[Figure: an E-R schema diagram for the company database.]

Figure 2.3 shows two entities and the values of their attributes. The employee entity e1 has four attributes: Name, Address, Age, and HomePhone; their values are "Mahesh Kumar," "2311 Ameerpet, Hyderabad, AP 500001," "55," and "402459672," respectively. The company entity c1 has three attributes: Name, Headquarters, and President; their values are "CDAC," "Hyderabad," and "Mahesh Kumar," respectively.


Several types of attributes occur in the ER model: simple versus composite, single-valued versus multivalued, and stored versus derived. We first define these attribute types and illustrate their use via examples. We then introduce the concept of a null value for an attribute.

Composite versus Simple (Atomic) Attributes: Composite attributes can be divided into smaller subparts, which represent more basic attributes with independent meanings. For example, the Address attribute of the employee entity shown in Figure 2.3 can be subdivided into StreetAddress, City, State, and Zip, with the values "2311 Ameerpet," "Hyderabad," "AP," and "500001." Attributes that are not divisible are called simple or atomic attributes. Composite attributes can form a hierarchy. For example, StreetAddress can be further subdivided into three simple attributes: Number, Street, and ApartmentNumber, as shown in Figure 2.4. The value of a composite attribute is the concatenation of the values of its constituent simple attributes.

[Figure 2.4: a hierarchy of composite attributes.]

Composite attributes are useful to model situations in which a user sometimes refers to the composite attribute as a unit but at other times refers specifically to its components. If the composite attribute is referenced only as a whole, there is no need to subdivide it into component attributes. For example, if there is no need to refer to the individual components of an address (zip code, street, and so on), then the whole address can be designated as a simple attribute.

Single-Valued versus Multivalued Attributes: Most attributes have a single value for a particular entity; such attributes are called single-valued. For example, Age is a single-valued attribute of a person. In some cases an attribute can have a set of values for the same entity - for example, a Colors attribute for a car, or a CollegeDegrees attribute for a person. Cars with one color have a single value, whereas two-tone cars have two values for Colors. Similarly, one person may not have a college degree, another person may have one, and a third person may have two or more degrees; therefore, different persons can have different numbers of values for the CollegeDegrees attribute. Such attributes are called multivalued. A multivalued attribute may have lower and upper bounds to constrain the number of values allowed for each individual entity. For example, the Colors attribute of a car may have between one and three values, if we assume that a car can have at most three colors.

Stored versus Derived Attributes. In some cases, two (or more) attribute values are related - for example, the Age and DOB attributes of a person. For a particular person entity, the value of Age can be determined from the current (today's) date and the value of that person's date of birth (DOB). The Age attribute is hence called a derived attribute and is said to be derivable from the DOB attribute, which is called a stored attribute. Some attribute values can be derived from related entities. For example, an attribute NumberOfEmployees of a department entity can be derived by counting the number of employees related to (working for) that department.
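For instance, a derived Age attribute would typically be computed on demand rather than stored; a minimal Python sketch (the date values are invented):

from datetime import date

def age(dob, today=None):
    """Derived attribute: Age is computed from the stored DOB attribute."""
    today = today or date.today()
    # subtract one if this year's birthday has not happened yet
    return today.year - dob.year - ((today.month, today.day) < (dob.month, dob.day))

print(age(date(1998, 8, 30), today=date(2024, 1, 1)))   # 25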


Null Values. In some cases a particular entity may not have an applicable value for an attribute. For example, the ApartmentNumber attribute of an address applies only to addresses that are in apartment buildings and not to other types of residences, such as single-family homes. Similarly, a CollegeDegrees attribute applies only to persons with college degrees. For such situations, a special value called null is created. An address of a single-family home would have null for its ApartmentNumber attribute, and a person with no college degree would have null for CollegeDegrees. Null can also be used if we do not know the value of an attribute for a particular entity - for example, if we do not know the home phone of "Mahesh Kumar". The meaning of the former type of null is not applicable, whereas the meaning of the latter is unknown. The "unknown" category of null can be further classified into two cases. The first case arises when it is known that the attribute value exists but is missing - for example, if the Height attribute of a person is listed as null. The second case arises when it is not known whether the attribute value exists - for example, if the HomePhone attribute of a person is null.

Complex Attributes. Notice that composite and multivalued attributes can be nested in an arbitrary way. We can represent arbitrary nesting by grouping components of a composite attribute between parentheses () and separating the components with commas, and by displaying multivalued attributes between braces {}. Such attributes are called complex attributes. For example, if a person can have more than one residence and each residence can have multiple phones, an attribute AddressPhone for a person can be specified as shown in Figure 2.5.

[Figure 2.5: a complex attribute, AddressPhone.]

Entity Types, Entity Sets, Keys, and Value Sets

Entity Types and Entity Sets. A database usually contains groups of entities that are similar. For example, a company employing hundreds of employees may want to store similar information concerning each of the employees. These employee entities share the same attributes, but each entity has its own value(s) for each attribute. An entity type defines a collection (or set) of entities that have the same attributes. Each entity type in the database is described by its name and attributes. Figure 2.6 shows two entity types, named EMPLOYEE and COMPANY, and a list of attributes for each. A few individual entities of each type are also illustrated, along with the values of their attributes. The collection of all entities of a particular entity type in the database at any point in time is called an entity set; the entity set is usually referred to using the same name as the entity type. For example, EMPLOYEE refers both to a type of entity and to the current set of all employee entities in the database.

An entity type is represented in ER diagrams as a rectangular box enclosing the entity type name. Attribute names are enclosed in ovals and are attached to their entity type by straight lines. Composite attributes are attached to their component attributes by straight lines. Multivalued attributes are displayed in double ovals.

An entity type describes the schema or intension for a set of entities that share the same structure. The collection of entities of a particular entity type are grouped into an entity set, which is also called the extension of the entity type.

Key Attributes of an Entity Type

An important constraint on the entities of an entity type is the key or uniqueness constraint on attributes. An entity type usually has an attribute whose values are distinct for each individual entity in the entity set. Such an attribute is called a key attribute, and its values can be used to identify each entity uniquely. For example, the Name attribute is a key of the COMPANY entity type in Figure 2.6, because no two companies are allowed to have the same name. Sometimes, several attributes together form a key, meaning that the combination of the attribute values must be distinct for each entity; a set of attributes possessing this property forms a composite key.

Specifying that an attribute is a key of an entity type means that the preceding uniqueness property must hold for every entity set of the entity type. Hence, it is a constraint that prohibits any two entities from having the same value for the key attribute at the same time. It is not the property of a particular extension; rather, it is a constraint on all extensions of the entity type. This key constraint (and other constraints we discuss later) is derived from the constraints of the miniworld that the database represents.

Some entity types have more than one key attribute. For example, each of the VehicleID and Registration attributes of the entity type CAR is a key in its own right. The Registration attribute is an example of a composite key formed from two simple component attributes, RegistrationNumber and State, neither of which is a key on its own. An entity type may also have no key, in which case it is called a weak entity type.

Value Sets (Domains) of Attributes

Each simple attribute of an entity type is associated with a value set (or domain of values), which specifies the set of values that may be assigned to that attribute for each individual entity. We can specify the value set for the Name attribute as being the set of strings of alphabetic characters separated by blank characters, and so on. Value sets are not displayed in ER diagrams. Value sets are typically specified using the basic data types available in most programming languages, such as integer, string, boolean, float, enumerated type, subrange, and so on. Additional data types to represent date, time, and other concepts are also employed.

Mathematically, an attribute A of entity type E whose value set is V can be defined as a function from E to the power set P(V) of V:

A : E → P(V)

We refer to the value of attribute A for entity e as A(e). The previous definition covers both single-valued and multivalued attributes, as well as nulls. A null value is represented by the empty set. For single-valued attributes, A(e) is restricted to being a singleton set for each entity e in E, whereas there is no restriction on multivalued attributes. For a composite attribute A, the value set V is the Cartesian product of P(V1), P(V2), …, P(Vn), where V1, V2, …, Vn are the value sets of the simple component attributes that form A:

V = P(V1) × P(V2) × … × P(Vn)

7. Illustrate with an example of your own the Relational Model Notations

Answer: Relational Data Model: The model uses the concept of a mathematical relation, which looks somewhat like a table of values, as its basic building block, and has its theoretical basis in set theory and first-order predicate logic. The relational model represents the database as a collection of relations. Each relation resembles a table of values or, to some extent, a "flat" file of records. When a relation is thought of as a table of values, each row in the table represents a collection of related data values. In the relational model, each row in the table represents a fact that typically corresponds to a real-world entity or relationship. The table name and column names are used to help in interpreting the meaning of the values in each row. In the formal relational model terminology, a row is called a tuple, a column header is called an attribute, and the table is called a relation. The data type describing the kinds of values that can appear in each column is represented by a domain of possible values.

ER Model: An entity-relationship model (ERM) is an abstract and conceptual representation of data. Entity-relationship modeling is a database modeling method, used to produce a type of conceptual schema or semantic data model of a system, often a relational database, and its requirements in a top-down fashion. Diagrams created by this process are called entity-relationship diagrams, ER diagrams, or ERDs.

The first stage of information system design uses these models during the requirements analysis to describe information needs or the type of information that is to be stored in a database. In the case of the design of an information system that is based on a database, the conceptual data model is, at a later stage (usually called logical design), mapped to a logical data model, such as the relational model; this in turn is mapped to a physical model during physical design. Sometimes both of these phases are referred to as "physical design." We create a relational schema from an entity-relationship (ER) schema. Key elements of this model are entities, attributes, identifiers and relationships.

Correspondence between ER and Relational Models:

ER Model                       Relational Model
Entity type                    "Entity" relation
1:1 or 1:N relationship type   Foreign key
M:N relationship type          "Relationship" relation and two foreign keys
n-ary relationship type        "Relationship" relation and n foreign keys
Simple attributes              Attributes
Composite attributes           Set of simple component attributes
Multivalued attributes         Relation and foreign key
Value set                      Domain
Key attribute                  Primary key or secondary key

The COMPANY ER schema is below: [Figure: the COMPANY ER schema diagram.]

Result of mapping the company ER schema into a relational database schema:

EMPLOYEE (FNAME, INITIAL, LNAME, ENO, DOB, ADDRESS, SEX, SALARY, SUPERENO, DNO)

DEPARTMENT (DNAME, DNUMBER, MGRENO, MGRSTARTDATE)

DEPT_LOCATIONS (DNUMBER, DLOCATION)

PROJECT (PNAME, PNUMBER, PLOCATION, DNUM)

WORKS_ON (EENO, PNO, HOURS)

DEPENDENT (EENO, DEPENDENT_NAME, SEX, DOB, RELATIONSHIP)
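As a hedged illustration of part of this mapping in SQL (the column data types and sizes are assumptions, not given in the text):

-- Illustrative DDL for part of the mapped COMPANY schema (types are assumed).
CREATE TABLE DEPARTMENT (
    DNAME        VARCHAR(30) NOT NULL,
    DNUMBER      INT         PRIMARY KEY,
    MGRENO       CHAR(9),    -- foreign key to EMPLOYEE(ENO); constraint added once EMPLOYEE exists
    MGRSTARTDATE DATE
);

CREATE TABLE EMPLOYEE (
    FNAME    VARCHAR(15) NOT NULL,
    INITIAL  CHAR(1),
    LNAME    VARCHAR(15) NOT NULL,
    ENO      CHAR(9)     PRIMARY KEY,
    DOB      DATE,
    ADDRESS  VARCHAR(50),
    SEX      CHAR(1),
    SALARY   DECIMAL(10,2),
    SUPERENO CHAR(9) REFERENCES EMPLOYEE(ENO),      -- recursive 1:N supervision
    DNO      INT     REFERENCES DEPARTMENT(DNUMBER) -- 1:N relationship becomes a foreign key
);

-- The M:N WORKS_ON relationship type becomes its own relation with two foreign keys:
CREATE TABLE WORKS_ON (
    EENO  CHAR(9) REFERENCES EMPLOYEE(ENO),
    PNO   INT,    -- foreign key to PROJECT(PNUMBER), omitted here for brevity
    HOURS DECIMAL(4,1),
    PRIMARY KEY (EENO, PNO)
);

Note how each rule from the correspondence table above shows up: the 1:N relationship as the DNO foreign key, the recursive relationship as SUPERENO, and the M:N relationship as the separate WORKS_ON relation.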

8. Consider a University Database System and develop the ER Conceptual Schema diagram i.e. E-R Diagram for the same 

Answer : Schemas, Instances and Database States


In any data model, it is important to distinguish between the description of the database and the database itself. The description of a database is called the database schema, which is specified during database design and is not expected to change frequently. Most data models have certain conventions for displaying schemas as diagrams. A displayed schema is called a schema diagram. Figure 1.1 (not reproduced here) shows a schema diagram for a database; the diagram displays the structure of each record type but not the actual instances of records. We call each object in the schema – such as STUDENT or COURSE – a schema construct.

A schema diagram displays only some aspects of a schema, such as the names of record types and data items, and some types of constraints. Other aspects are not specified in the schema diagram. Many types of constraints are not represented in schema diagrams. A constraint such as "students majoring in computer science must take CS1310 before the end of their second year" is quite difficult to represent.

The actual data in a database may change quite frequently. The data in the database at a particular moment in time is called a database state or snapshot. It is also called the current set of occurrences or instances in the database. In a given database state, each schema construct has its own current set of instances. Many database states can be constructed to correspond to a particular database schema. Every time we insert or delete a record or change the value of a data item in a record, we change one state of the database into another state.

The distinction between database schema and database state is very important. When we define a new database, we specify its database schema only to the DBMS. At this point, the corresponding database state is the empty state with no data. We get the initial state of the database when the database is first populated or loaded with the initial data. From then on, every time an update operation is applied to the database, we get another database state. At any point in time, the database has a current state. The DBMS is partly responsible for ensuring that every state of the database is a valid state – that is, a state that satisfies the structure and constraints specified in the schema. Hence, specifying a correct schema to the DBMS is extremely important, and the schema must be designed with the utmost care. The DBMS stores the descriptions of the schema constructs and constraints – also called the meta-data – in the DBMS catalog so that DBMS software can refer to the schema whenever it needs to. The schema is sometimes called the intension, and a database state an extension of the schema.
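To make the schema/state distinction concrete, here is a minimal SQL sketch (the table and column names are invented for illustration):

-- Specifying the schema (the intension): done once, at design time.
CREATE TABLE STUDENT (
    ROLLNO INT PRIMARY KEY,
    NAME   VARCHAR(30),
    MAJOR  VARCHAR(10)
);

-- Each update moves the database from one state (extension) to the next:
INSERT INTO STUDENT VALUES (1, 'Smith', 'CS');    -- empty state -> state 1
INSERT INTO STUDENT VALUES (2, 'Brown', 'MATH');  -- state 1 -> state 2
UPDATE STUDENT SET MAJOR = 'CS' WHERE ROLLNO = 2; -- state 2 -> state 3

The CREATE TABLE statement is stored once in the catalog as meta-data; the INSERT and UPDATE statements only change the current state, never the schema.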

Although, as mentioned earlier, the schema is not supposed to change frequently, it is not uncommon that changes need to be occasionally applied to the schema as the application requirements change. Most modern DBMSs include some operations for schema evolution that can be applied while the database is operational.


MC0067 – Database Management
Assignment Set – 2

BOOK ID: B0716 & BO717

1. Write about: Physical Storage Structure of DBMS

Indexing

Answer : Physical Storage Structure of DBMS:

The physical design of the database specifies the physical configuration of the database on the storage media. This includes detailed specification of data elements, data types, indexing options and other parameters residing in the DBMS data dictionary. It is the detailed design of a system that includes modules and the database's hardware and software specifications. Physical structures are those that can be seen and operated on from the operating system, such as the physical files that store data on a disk.

Basic storage concepts (hard disk):

• disk access time = seek time + rotational delay
• disk access times are much slower than access to main memory
• the overriding DBMS performance objective is to minimise the number of disk accesses (disk I/Os)
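For a rough sense of scale (the figures here are typical values assumed for illustration, not taken from the text): with an average seek time of about 9 ms and an average rotational delay of about 4 ms, one disk access costs roughly 13 ms, while a main-memory access costs on the order of 100 ns, about five orders of magnitude less. This is why saving even a few disk I/Os per query dominates every other tuning concern.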

Indexing:

An index is a data structure allowing a DBMS to locate particular records more quickly and hence speed up queries. A book index has index terms (stored in alphabetic order) with page numbers; a database index (on a particular attribute) has attribute values (stored in order) with memory addresses. An index gives direct access to a record and prevents having to scan every record sequentially to find the one required.

Using SUPPLIER(Supp#, SName, SCity), consider the query "Get all the suppliers in a certain city" (e.g. London). There are two possible strategies:

a. Search the entire supplier file for records with city 'London'.
b. Create an index on cities, access its 'London' entries and follow the pointers to the corresponding records.

SCity index entries (in order), each pointing to the matching record:

Dublin, London, London, Paris, Paris

SUPPLIER file:

Supp#   SName   SCity
S1      Smith   London
S2      Jones   Paris
S3      Brown   Paris
S4      Clark   London
S5      Ellis   Dublin
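Strategy (b) can be sketched in SQL along these lines (the index name is an assumption; exact syntax varies slightly by DBMS):

-- Build the index on the SCity attribute once:
CREATE INDEX SCITY_IDX ON SUPPLIER (SCITY);

-- The DBMS can now answer this query by following the 'London' index
-- entries directly instead of scanning the whole supplier file:
SELECT SUPP#, SNAME
FROM   SUPPLIER
WHERE  SCITY = 'London';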

2. Write about: Application Logic

One Tier Architecture

Client / Server Architecture

Answer : Application Logic

Database architectures can be distinguished by examining the way application logic is distributed throughout the system.  Application logic consists of three components:  Presentation Logic, Processing Logic, and Storage Logic. 

The presentation logic component is responsible for formatting and presenting data on the user's screen. The processing logic component handles data processing logic, business rules logic, and data management logic. Finally, the storage logic component is responsible for storage and retrieval from actual devices such as a hard drive or RAM. By determining which tier(s) these components are processed on, we can get a good idea of what type of architecture and subtype we are dealing with.

 One Tier Architectures

Imagine a person on a desktop computer who uses Microsoft Access to load up a list of personal addresses and phone numbers that he or she has saved in MS Windows’ “My Documents” folder.  This is an example of a one-tier database architecture.  The program (Microsoft Access) runs on the user’s local machine, and references a file that is stored on that machine’s hard drive, thus using a single physical resource to access and process information.

Another example of a one-tier architecture is a file server architecture.  In this scenario, a workgroup database is stored in a shared location on a single machine. Workgroup members use a software package such as Microsoft Access to load the data and then process it on their local machine.  In this case, the data may be shared among different users, but all of the processing occurs on the local machine.  Essentially, the file-server is just an extra hard drive from which to retrieve files.

Yet another form of one-tier architecture appeared in mainframe computing. In this outdated model, large machines provide directly connected unintelligent terminals with the means necessary to access, view and manipulate data. Even though this may superficially resemble a client-server system, since all of the processing power (for both data and applications) resides on a single machine, we have a one-tier architecture.

One-tier architectures can be beneficial when we are dealing with data that is relevant to a single user (or small number of users) and we have a relatively small amount of data.  They are somewhat inexpensive to deploy and maintain.

Client/Server Architectures

A two-tier architecture is one that is familiar to many of today’s computer users.  A common implementation of this type of system is that of a Microsoft Windows based client program that accesses a server database such as Oracle or SQL Server.  Users interact through a GUI (Graphical User Interface) to communicate with the database server across a network via SQL (Structured Query Language).


In two-tier architectures it is important to note that two configurations exist. A thin-client (fat-server) configuration exists when most of the processing occurs on the server tier. Conversely, a fat-client (thin-server) configuration exists when most of the processing occurs on the client machine.

Another example of a two-tier architecture can be seen in web-based database applications. In this case, users interact with the database through applications that are hosted on a web server and displayed through a web browser such as Internet Explorer. The web server processes the web application, which can be written in a language such as PHP or ASP. The web application connects to a database server to pass along SQL statements, which in turn are used to access, view, and modify data. The database server then passes back the requested data, which the web server formats for the user.

Although this appears to be a three-tier system because of the number of machines required to complete the process, it is not.  The web-server does not normally house any of the business rules and therefore should be considered part of the client tier in partnership with the web-browser. Two-tier architectures can prove to be beneficial when we have a relatively small number of users on the system (100-150) and we desire an increased level of scalability.

[Figure: Client-Server Architecture]

3. Write about: Basic Constructs of E-R Modeling

E-R Notations with examples

Answer : Basic Constructs of E-R Modeling:

The ER model views the real world as a construct of entities and associations between entities. The basic constructs of ER modeling are entities, attributes, and relationships.

Entity:

An entity may be defined as a thing which is recognized as being capable of an independent existence and which can be uniquely identified. An entity is an abstraction from the complexities of some domain. When we speak of an entity, we normally speak of some aspect of the real world which can be distinguished from other aspects of the real world.

An entity may be a physical object such as a house or a car, an event such as a house sale or a car service, or a concept such as a customer transaction or order. Although the term entity is the one most commonly used, following Chen we should really distinguish between an entity and an entity-type. An entity-type is a category. An entity, strictly speaking, is an instance of a given entity-type. There are usually many instances of an entity-type. Because the term entity-type is somewhat cumbersome, most people tend to use the term entity as a synonym for this term.

Entities can be thought of as nouns. Examples: a computer, an employee, a song, a mathematical theorem.

Relationship:


A relationship captures how two or more entities are related to one another. Relationships can be thought of as verbs, linking two or more nouns. Examples: an owns relationship between a company and a computer, a supervises relationship between an employee and a department, a performs relationship between an artist and a song, a proved relationship between a mathematician and a theorem.

Attributes:

Entities and relationships can both have attributes. Examples: an employee entity might have a Social Security Number (SSN) attribute; the proved relationship may have a date attribute.

 ER Notation

There is no standard for representing data objects in ER diagrams. Each modeling methodology uses its own notation. The original notation used by Chen is widely used in academic texts and journals but rarely seen in either CASE tools or publications by non-academics. Today, a number of notations are used; among the more common are Bachman, crow's foot, and IDEF1X.

All notational styles represent entities as rectangular boxes and relationships as lines connecting boxes. Each style uses a special set of symbols to represent the cardinality of a connection. The notation used in this document is from Martin. The symbols used for the basic ER constructs are:

· entities are represented by labeled rectangles. The label is the name of the entity. Entity names should be singular nouns.

· relationships are represented by a solid line connecting two entities. The name of the relationship is written above the line. Relationship names should be verbs.

· attributes, when included, are listed inside the entity rectangle. Attributes which are identifiers are underlined. Attribute names should be singular nouns.

· cardinality of many is represented by a line ending in a crow’s foot. If the crow’s foot is omitted, the cardinality is one.

· existence is represented by placing a circle or a perpendicular bar on the line. Mandatory existence is shown by the bar (which looks like a 1) next to the entity for which an instance is required. Optional existence is shown by placing a circle next to the entity that is optional.

Examples of these symbols are shown in the ER notation figure (not reproduced here).

4. Write about: Types of Discretionary Privileges

Propagation of Privileges using Grant Option

with appropriate examples for each

Answer : Types of Discretionary Privileges:


The concept of an authorization identifier is used to refer to a user account. The DBMS must provide selective access to each relation in the database based on specific accounts. There are two levels for assigning privileges to use the database system:

The account level: At this level, the DBA specifies the particular privileges that each account holds independently of the relations in the database.

The relation (or table) level: At this level, the DBA can control the privilege to access each individual relation or view in the database.

The privileges at the account level apply to the capabilities provided to the account itself and can include the CREATE SCHEMA or CREATE TABLE privilege, to create a schema or base relation; the CREATE VIEW privilege; the ALTER privilege, to apply schema changes such as adding or removing attributes from relations; the DROP privilege, to delete relations or views; the MODIFY privilege, to insert, delete, or update tuples; and the SELECT privilege, to retrieve information from the database by using a SELECT query. The second level of privileges applies to the relation level, whether they are base relations or virtual (view) relations.

The granting and revoking of privileges generally follow an authorization model for discretionary privileges known as the access matrix model, where the rows of a matrix M represent subjects (users, accounts, programs) and the columns represent objects (relations, records, columns, views, operations). Each position M(i,j) in the matrix represents the types of privileges (read, write, update) that subject i holds on object j.

To control the granting and revoking of relation privileges, each relation R in a database is assigned an owner account, which is typically the account that was used when the relation was created in the first place. The owner of a relation is given all privileges on that relation, and can pass privileges on R to other users by granting privileges to their accounts. In SQL the following types of privileges can be granted on each individual relation R:

SELECT (retrieval or read) privilege on R: Gives the account retrieval privilege. In SQL this gives the account the privilege to use the SELECT statement to retrieve tuples from R.

MODIFY privileges on R: This gives the account the capability to modify tuples of R. In SQL this privilege is further divided into UPDATE, DELETE, and INSERT privileges to apply the corresponding SQL command to R. In addition, both the INSERT and UPDATE privileges can specify that only certain attributes can be updated or inserted by the account.

REFERENCES privilege on R: This gives the account the capability to reference relation R when specifying integrity constraints. The privilege can also be restricted to specific attributes of R.

Propagation of Privileges using the GRANT OPTION:

Whenever the owner A of a relation R grants a privilege on R to another account B, the privilege can be given to B with or without the GRANT OPTION. If the GRANT OPTION is given, this means that B can also grant that privilege on R to other accounts. Suppose that B is given the GRANT OPTION by A and that B then grants the privilege on R to a third account C, also with the GRANT OPTION. In this way, privileges on R can propagate to other accounts without the knowledge of the owner of R. If the owner account A now revokes the privilege granted to B, all the privileges that B propagated based on that privilege should automatically be revoked by the system.
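As a hedged sketch of how this propagation looks in SQL (the account names A, B, C and the relation EMPLOYEE follow the discussion above; exact revoke semantics vary by DBMS):

-- A, the owner of EMPLOYEE, grants B the privilege plus the right to pass it on:
GRANT SELECT ON EMPLOYEE TO B WITH GRANT OPTION;

-- B (executing as B, possibly without A's knowledge) propagates the privilege to C:
GRANT SELECT ON EMPLOYEE TO C WITH GRANT OPTION;

-- When A revokes B's privilege, CASCADE also revokes everything B granted
-- on the strength of it, including C's privilege:
REVOKE SELECT ON EMPLOYEE FROM B CASCADE;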

5. Describe the following Association Rules: Classification

Clustering

Classification:

Classification is the process of learning a model that describes different classes of data. The classes are predetermined. For example, in a banking application, customers who apply for a credit card may be classified as a "poor risk," a "fair risk," or a "good risk." Hence this type of activity is also called supervised learning. Once the model is built, it can be used to classify new data. The first step, of learning the model, is accomplished by using a training set of data that has already been classified. Each record in the training data contains an attribute, called the class label, that indicates which class the record belongs to. The model that is produced is usually in the form of a decision tree or a set of rules. Some of the important issues with regard to the model and the algorithm that produces the model include the model's ability to predict the correct class of new data, the computational cost associated with the algorithm, and the scalability of the algorithm.

We will examine the approach where our model is in the form of a decision tree. A decision tree is simply a graphical representation of the description of each class or, in other words, a representation of the classification rules.
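The rules such a model encodes can be written down directly. As a purely hypothetical sketch of the banking example (the attribute names, table name and thresholds are invented, not taken from any trained model):

SELECT applicant_id,
       CASE
           WHEN income < 20000 THEN 'poor risk'   -- first rule tested
           WHEN income < 50000 THEN 'fair risk'   -- i.e. 20000 <= income < 50000
           ELSE                     'good risk'   -- income >= 50000
       END AS predicted_class
FROM   credit_card_applicants;

A real decision tree would be learned from a labeled training set rather than written by hand; the point is only that the final model amounts to a readable set of if-then classification rules.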

Clustering:

Association rules and clustering are fundamental data mining techniques used for different goals. The two can be unified by showing that association support and rule confidence can be bounded and estimated from clusters on binary dimensions. Three support metrics are introduced: lower, upper and average support; three confidence metrics are proposed: lower, upper and average confidence. Clusters then represent a simple model that allows understanding and approximating association rules, instead of searching for them in a large transaction data set.

6. Write about: Categories of Data Models

Schemas, Instances and Database States

With an example for each

Answer : Categories of Data Models

A model is a representation of reality: 'real world' objects and events, and their associations. It is an abstraction that concentrates on the essential, inherent aspects of an organization and ignores the accidental properties. A data model represents the organization itself. It should provide the basic concepts and notations that will allow database designers and end users to communicate their understanding of the organizational data unambiguously and accurately.

Data Model can be defined as an integrated collection of concepts for describing and manipulating data, relationships between data, and constraints on the data in an organization.

A data model comprises three components:

• A structural part, consisting of a set of rules according to which databases can be constructed.

• A manipulative part, defining the types of operations that are allowed on the data (this includes the operations that are used for updating or retrieving data from the database and for changing the structure of the database).

• Possibly a set of integrity rules, which ensures that the data is accurate.

The purpose of a data model is to represent data and to make the data understandable. There have been many data models proposed in the literature. They fall into three broad categories:

• Object Based Data Models
• Physical Data Models
• Record Based Data Models

The object based and record based data models are used to describe data at the conceptual and external levels, the physical data model is used to describe data at the internal level.

A) Object Based Logical Models:

These models are used to describe data at the logical and view levels. The following are the well-known models in this group:

The Entity Relationship Model
The Object-Oriented Model
The Semantic Data Model
The Functional Data Model

Entity Relationship Model:

Entity: An entity is an object or a thing, such as a person or a place, about which an organization keeps information. Any two objects or things are distinguishable.
E.g.: each student is an entity.

Attribute: The describing properties of an entity are called attributes.
E.g.: for a student entity, name, sex and date of birth are attributes.

Relationship: An association among entities is called a relationship.


The data model that consists of a set of entities and a set of relationships among those entities is called the ER model. The set of all entities of the same type is called an entity set, and the set of all relationships of the same type is called a relationship set.

The Object-Oriented Model:

The object-oriented model is a data model based on a collection of objects. Each object has a unique identity. A group of objects containing the same types of values and the same methods is called a class.

The Semantic Data Model:

These models were based on semantic networks. Interdependencies among the entities can be expressed in this data model.

The Functional Data Model:

In this model, objects, properties of objects and their relationships are viewed uniformly and are defined as functions.

B) Physical Data Models:

These models are used to represent data at the lowest level. Two important physical data models are:

The Unifying Model
The Frame Memory Model

C) Record Based Logical Models:

These models are used to describe data at the logical and view levels. The database is structured in fixed-format records of different types. Each record type has a fixed number of fields, and each field is of fixed length. The following are the three important record based logical models:

The Relational Model
The Network Model
The Hierarchical Model

Relational Model:

A data model in which both data and their relationships are represented by means of tables is called the relational model. The relation is the only data structure used in this model to represent both entities and their interrelationships. A relation is a two-dimensional table with a unique name. Each row of a table is called a tuple and each column of a table is called an attribute. The set of all possible values in an attribute is called the domain of the attribute.

Network Model:

The network model uses two different structures. The data are represented by a collection of records, and the relationships among data are represented by links.

Hierarchical Model:

In the hierarchical model, data are represented by records and relationships among data are represented by links. But unlike in the network model, data are organized in an ordered tree structure, which is called a hierarchical structure.

Schemas

A database schema is described in a formal language supported by the database management system (DBMS). In a relational database, the schema defines the tables, the fields in each table, and the relationships between fields and tables.

Schemas are generally stored in a data dictionary. Although a schema is defined in a text-based database language, the term is often used to refer to a graphical depiction of the database structure.

Levels of database schema

1. Conceptual schema: a map of concepts and their relationships
2. Logical schema: a map of entities and their attributes and relations
3. Physical schema: a particular implementation of a logical schema
4. Schema object: an Oracle database object
5. Schema: the overall structure of the database

The following examples illustrate common schema designs based on the design considerations that are essential to usability and performance.


This example illustrates a multi-star schema (figure not reproduced here), in which the primary and foreign keys are not composed of the same set of columns. This design also contains a family of fact tables: a Bookings table, an Actual table, and a Promo_Schedule table.

This database tracks reservations (bookings) and actual accommodations rented for a chain of hotels, as well as various promotions. It also maintains information about customers, promotions, and each hotel in the chain.

In cases where payment is received in advance (for example, reservation deposits, cable TV subscriptions, automobile insurance), in accordance with proper accounting procedures, transactions must be made to reflect income as it is earned, not when it is received, and the database must be designed to accommodate such transactions.
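A minimal DDL sketch of one fact table from this description (all table and column names beyond Bookings are assumptions made for illustration):

-- Two small dimension tables, invented for the sketch:
CREATE TABLE HOTEL    (HOTEL_ID    INT PRIMARY KEY, CITY VARCHAR(30));
CREATE TABLE CUSTOMER (CUSTOMER_ID INT PRIMARY KEY, NAME VARCHAR(30));

-- The Bookings fact table: its primary key and its foreign keys are
-- deliberately not the same set of columns (a multi-star trait).
CREATE TABLE BOOKINGS (
    HOTEL_ID     INT REFERENCES HOTEL(HOTEL_ID),
    CUSTOMER_ID  INT REFERENCES CUSTOMER(CUSTOMER_ID),
    BOOKING_DATE DATE,
    NIGHTS       INT,
    DEPOSIT_PAID DECIMAL(10,2),  -- advance payment, recognized as income only when earned
    PRIMARY KEY (HOTEL_ID, CUSTOMER_ID, BOOKING_DATE)
);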

Instances

Every running Oracle database is associated with an Oracle instance. When a database is started on a database server (regardless of the type of computer), Oracle allocates a memory area called the System Global Area (SGA) and starts one or more Oracle processes. This combination of the SGA and the Oracle processes is called an Oracle instance. The memory and processes of an instance manage the associated database's data efficiently and serve the one or multiple users of the database.

[Figure: an Oracle instance – not reproduced here]

Example: An organization with an employee’s database might have three different instances: production (used to contain live data), pre-production (used to test new functionality prior to release into production) and development (used by database developers to create new functionality).

Database States

When a configuration is enabled, its databases can be in one of several states that direct the behavior of Data Guard, for example transmitting redo data or applying redo data. The broker does not manage the state of the database (that is, mounted or opened). The table below describes the various database states.

Database States and Descriptions:

Primary – TRANSPORT-ON
Redo transport services are set up to transmit redo data to the standby databases when the primary database is open for read/write access. If this is an Oracle RAC database, all instances open in read/write mode will have redo transport services running. This is the default state for a primary database when it is enabled for the first time.

Primary – TRANSPORT-OFF
Redo transport services are stopped on the primary database. If this is an Oracle RAC database, redo transport services are not running on any instances.

Physical standby – APPLY-ON
Redo Apply is started on the physical standby database. If the standby database is an Oracle RAC database, the broker starts Redo Apply on exactly one standby instance, called the apply instance. If this instance fails, the broker automatically chooses another instance that is either mounted or open read-only; this new instance then becomes the apply instance. This is the default state for a physical standby database when it is enabled for the first time. If a license for the Oracle Active Data Guard option has been purchased, a physical standby database can be open while Redo Apply is active; this capability is known as real-time query.

Physical standby – APPLY-OFF
Redo Apply is stopped. If this is an Oracle RAC database, there is no instance running Apply Services until you change the database state to APPLY-ON.

Snapshot standby – APPLY-OFF
Redo data is received from the primary database but is not applied. The database is opened for read/write access.

Logical standby – APPLY-ON
SQL Apply is started on the logical standby database when it is opened and the logical standby database guard is on. If this is an Oracle RAC database, SQL Apply is running on one instance, the apply instance. If this instance fails, the broker automatically chooses another open instance; this new instance becomes the apply instance. This is the default state for a logical standby database when it is enabled for the first time.

Logical standby – APPLY-OFF
SQL Apply is not running on the logical standby database. The logical standby database guard is on. If this is an Oracle RAC database, there is no instance running SQL Apply until you change the state to APPLY-ON.

We can use the DGMGRL EDIT DATABASE command to explicitly change the state of a database. For example, the following command changes the state of the North_Sales database to TRANSPORT-OFF.

DGMGRL> EDIT DATABASE 'North_Sales' SET STATE='TRANSPORT-OFF';
Succeeded.

7. Taking an example Enterprise System, List out the Relationship types, Relationship sets and roles

Answer : There are three types of relationships:

1) One to one

2) One to many

3) Many to many

Say we have table1 and table2. For a one-to-one relationship, a record (row) in table1 has at most one matching record (row) in table2; i.e. it must not have two or more matching records in table2, and in a strict one-to-one relationship it has exactly one. For a one-to-many relationship, a record in table1 can have more than one matching record in table2, but not vice versa.

Let's take an example. Say we have a database which stores information about guys and whom they are dating. We have two tables in our database: Guys and Girls.

Guy id   Guy name
1        Andrew
2        Bob
3        Craig

Girl id  Girl name
1        Girl1
2        Girl2
3        Girl3

Here, in the above example, Guy id and Girl id are the primary keys of their respective tables. Say Andrew is dating Girl1, Bob is dating Girl2, and Craig is dating Girl3; we have a one-to-one relationship here. In this case we need to modify the Girls table to carry a Guy id foreign key:

Girl id  Girl name  Guy id
1        Girl1      1
2        Girl2      2
3        Girl3      3

Now let's say one guy has started dating more than one girl, i.e. Andrew has started dating Girl1 and a new girl, Girl4. That takes us to a one-to-many relationship from the Guys table to the Girls table. To accommodate this change we can modify our Girls table like this:

Girl Id  Girl Name  Guy Id
1        Girl1      1
2        Girl2      2
3        Girl3      3
4        Girl4      1

Now say that after a few days the girls have also started dating more than one guy, i.e. a many-to-many relationship.

The thing to do here is to add another table, called a junction table, associative table or linking table, which will contain the primary key columns of both the Guys and Girls tables.

Let's see it with an example:

Guy id   Guy name
1        Andrew
2        Bob
3        Craig

Girl id  Girl name
1        Girl1
2        Girl2
3        Girl3

Andrew is now dating Girl1 and Girl2, and Girl3 has started dating Bob and Craig, so our junction table will look like this:

Guy ID  Girl ID
1       1
1       2
2       2
2       3
3       3

It contains the primary keys of both the Guys and Girls tables; see the DDL sketch below.
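A hedged DDL sketch of the three tables (data types are assumptions):

CREATE TABLE GUYS  (GUY_ID  INT PRIMARY KEY, GUY_NAME  VARCHAR(30));
CREATE TABLE GIRLS (GIRL_ID INT PRIMARY KEY, GIRL_NAME VARCHAR(30));

-- The junction (linking) table: the composite primary key pairs the two
-- foreign keys, so each guy/girl combination can appear at most once.
CREATE TABLE DATING (
    GUY_ID  INT REFERENCES GUYS(GUY_ID),
    GIRL_ID INT REFERENCES GIRLS(GIRL_ID),
    PRIMARY KEY (GUY_ID, GIRL_ID)
);

INSERT INTO DATING VALUES (1, 1);  -- Andrew dates Girl1
INSERT INTO DATING VALUES (1, 2);  -- Andrew dates Girl2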

A relationship type R among n entity types E1, E2, …, En is a set of associations among entities from these types. Formally, R is a set of relationship instances ri, where each ri is an n-tuple of entities (e1, e2, …, en), and each entity ej in ri is a member of entity type Ej, 1 ≤ j ≤ n. Hence, a relationship type is a mathematical relation on E1, E2, …, En; alternatively, it can be defined as a subset of the Cartesian product E1 × E2 × … × En. The current set of relationship instances among the entity types E1, E2, …, En is called a relationship set.

Relationship instance: Each relationship instance ri in R is an association of entities, where the association includes exactly one entity from each participating entity type. Each such relationship instance ri represents the fact that the entities participating in ri are related in some way in the corresponding miniworld situation. For example, the relationship type WORKS_FOR associates EMPLOYEE and DEPARTMENT, relating each employee to the department for which the employee works; each relationship instance in the relationship set WORKS_FOR associates one EMPLOYEE entity and one DEPARTMENT entity.


8. With an example show that Serializability can be guaranteed with two-phase locking

Answer : In databases and transaction processing, two-phase locking (2PL) is a concurrency control method that guarantees serializability.[1][2] It is also the name of the resulting set of database transaction schedules (histories). The protocol utilizes locks, applied by a transaction to data, which may block (interpreted as signals to stop) other transactions from accessing the same data during the transaction's life.

By the 2PL protocol, locks are applied and removed in two phases:

Expanding phase: locks are acquired and no locks are released.
Shrinking phase: locks are released and no locks are acquired.

Two types of locks are utilized by the basic protocol: shared and exclusive locks. Refinements of the basic protocol may utilize more lock types. Because it uses locks that block processes, 2PL may be subject to deadlocks that result from the mutual blocking of two or more transactions.

2PL is a superset of strong strict two-phase locking (SS2PL),[3] also called rigorousness,[4] which has been widely utilized for concurrency control in general-purpose database systems since the 1970s. SS2PL implementations have many variants. SS2PL used to be called strict 2PL,[1] but this usage is no longer recommended; strict 2PL (S2PL) now denotes the intersection of strictness and 2PL, which is different from SS2PL. SS2PL is also a special case of commitment ordering (CO)[3] and inherits many of CO's useful properties. SS2PL actually comprises only one phase: phase 2 does not exist, and all locks are released only after transaction end, so this useful 2PL type is not two-phased at all.

Neither 2PL nor S2PL in their general forms is known to be used in practice. Thus 2PL by itself does not have much practical importance, and whenever 2PL or S2PL utilization has been mentioned in the literature, the intention has usually been SS2PL. What has made SS2PL so popular (probably the most utilized serializability mechanism) is the effective and efficient locking-based combination of two ingredients (the first does not exist in general 2PL or S2PL; the second does not exist in general 2PL):

Commitment ordering, which provides serializability as well as effective distributed and global serializability, and
Strictness, which provides cascadelessness (ACA, cascadeless recoverability) and (independently) allows efficient database recovery from failure.

Additionally, SS2PL is easier to implement, with less overhead, than both 2PL and S2PL; it provides exactly the same locking but sometimes releases locks later. In practice such later lock release occurs only slightly later, so this apparent disadvantage is insignificant next to the advantages of SS2PL. Thus the importance of general two-phase locking (2PL) is largely historic, while strong strict two-phase locking (SS2PL) is the practically important mechanism and resulting schedule property. Between any two schedule classes (defined by their schedules' respective properties) that have common schedules, either one contains the other (strictly contains if they are not equal) or they are incomparable. The containment relationships among the 2PL classes and other major schedule classes are summarized in a diagram (not reproduced here). 2PL and its subclasses are inherently blocking, which means that no optimistic implementations of them exist (and whenever "optimistic 2PL" is mentioned it refers to a different mechanism whose class also includes schedules not in the 2PL class).

Deadlocks in 2PL:

Locks block data-access operations.
Mutual blocking between transactions results in a deadlock, where execution of these transactions is stalled and no completion can be reached. Thus deadlocks need to be resolved to complete the transactions' executions and release related computing resources. A deadlock is a reflection of a potential cycle in the precedence graph that would occur without the blocking, and it is resolved by aborting a transaction involved in such a potential cycle, thereby breaking the cycle. It is often detected using a wait-for graph (a graph of conflicts blocked by locks from being materialized; conflicts not materialized in the database due to blocked operations are not reflected in the precedence graph and do not affect serializability), which indicates which transaction is "waiting for" a lock release by which transaction; a cycle in this graph means a deadlock. Aborting one transaction per cycle is sufficient to break the cycle. Transactions aborted due to deadlock resolution are executed again immediately.

In a distributed environment, an atomic commitment protocol, typically the two-phase commit (2PC) protocol, is utilized for atomicity. When recoverable data (data under transaction control) are partitioned among the 2PC participants (i.e., each data object is controlled by a single 2PC participant), distributed (global) deadlocks, i.e. deadlocks involving two or more participants in 2PC, are resolved automatically as follows.

When SS2PL is effectively utilized in a distributed environment, global deadlocks due to locking generate voting deadlocks in 2PC and are resolved automatically by 2PC (see commitment ordering (CO), in the exact characterization of voting-deadlocks by global cycles). For the general case of 2PL, global deadlocks are similarly resolved automatically by the synchronization point protocol of phase-1 end in a distributed transaction: the synchronization point is achieved by "voting" (notifying local phase-1 end) and being propagated to the participants in a distributed transaction the same way as a decision point in atomic commitment. In analogy to the decision point in CO, a conflicting operation in 2PL cannot happen before the phase-1 end synchronization point, which yields the same voting deadlock in the case of a global data-access deadlock. The voting deadlock (which is also a locking-based global deadlock) is automatically resolved by the protocol aborting some transaction involved, with a missing vote, typically using a timeout.
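The question asks for an example, so here is a small worked schedule (constructed for illustration; the lock-X/lock-S notation is the usual textbook one). T1 transfers 100 from account A to account B; T2 reads both accounts. Both transactions obey the 2PL rule that every lock is acquired before any lock is released:

T1: lock-X(A); read(A); A := A - 100; write(A);
    lock-X(B);                        -- growing phase of T1 ends here
    unlock(A);                        -- shrinking phase begins
    read(B); B := B + 100; write(B); unlock(B);

T2: lock-S(A); read(A);
    lock-S(B);                        -- growing phase of T2 ends here
    unlock(A); read(B); unlock(B);

If T2 requests lock-S(A) while T1 holds lock-X(A), T2 waits; by the time T1 releases A it already holds the lock on B, so T2's later read of B is also ordered after T1's write of B. Every conflict between the two transactions is therefore ordered the same way, the precedence graph stays acyclic, and the interleaved execution is equivalent to the serial schedule T1 followed by T2 (or T2 followed by T1, if T2 locks A first). In particular, T2 can never observe the inconsistent state where A has been debited but B not yet credited, which is exactly what serializability guarantees.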
