Page 1: ADBMS

Subject : Advanced Databases

ORAL QUESTIONS

GENERAL DBMS QUESTIONS

1. What is data abstraction?

Data abstraction is the enforcement of a clear separation between the abstract properties of a data type and the concrete details of its implementation. The abstract properties are those that are visible to client code that makes use of the data type (the interface to the data type), while the concrete implementation is kept entirely private, and indeed can change, for example to incorporate efficiency improvements over time. The idea is that such changes are not supposed to have any impact on client code, since they involve no difference in the abstract behaviour.

For example, one could define an abstract data type called lookup table, where keys are uniquely associated with values, and values may be retrieved by specifying their corresponding keys. Such a lookup table may be implemented in various ways: as a hash table, a binary search tree, or even a simple linear list. As far as client code is concerned, the abstract properties of the type are the same in each case.
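The lookup table example can be sketched in Python as follows. This is only an illustration: the class names LookupTable, HashLookupTable and ListLookupTable and the client function are hypothetical, chosen here to show that client code depends only on the abstract interface.

```python
from abc import ABC, abstractmethod


class LookupTable(ABC):
    """Abstract interface: the only thing client code may rely on."""

    @abstractmethod
    def put(self, key, value):
        """Associate value with key, replacing any earlier value."""

    @abstractmethod
    def get(self, key):
        """Return the value stored for key, or raise KeyError."""


class HashLookupTable(LookupTable):
    """Implementation backed by a hash table (Python dict)."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data[key]


class ListLookupTable(LookupTable):
    """Implementation backed by a simple linear list of (key, value) pairs."""

    def __init__(self):
        self._pairs = []

    def put(self, key, value):
        for i, (k, _) in enumerate(self._pairs):
            if k == key:
                self._pairs[i] = (key, value)  # keep keys unique
                return
        self._pairs.append((key, value))

    def get(self, key):
        for k, v in self._pairs:
            if k == key:
                return v
        raise KeyError(key)


def client(table):
    """Client code: works unchanged with either implementation."""
    table.put("alice", 1)
    table.put("bob", 2)
    table.put("alice", 3)
    return table.get("alice"), table.get("bob")
```

Calling client(HashLookupTable()) and client(ListLookupTable()) gives the same result, although the two implementations differ in speed.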

2. What are the 3 levels of data abstraction?

Since many users of database systems are not deeply familiar with computer data structures, database developers often hide complexity through the following levels:

Data abstraction levels of a database system

Physical level: The lowest level of abstraction describes how the data is actually stored. The physical level describes complex low-level data structures in detail.

Logical level: The next higher level of abstraction describes what data are stored in the database, and what relationships exist among those data. The logical level thus describes an entire database in terms of a small number of relatively simple structures. Although implementation of the simple structures at the logical level may involve complex physical level structures, the user of the logical level does not need to be aware of this complexity. Database administrators, who must decide what information to keep in a database, use the logical level of abstraction.

View level: The highest level of abstraction describes only part of the entire database. Even though the logical level uses simpler structures, complexity remains because of the variety of information stored in a large database. Many users of a database system do not need all this information; instead, they need to access only a part of the database. The view level of abstraction exists to simplify their interaction with the system. The system may provide many views for the same database.

3. Explain normalization, 1NF, 2NF, 3NF, Boyce-Codd NF, 4NF

Database normalization, sometimes referred to as canonical synthesis, is a technique for designing relational database tables to minimize duplication of information and, in so doing, to safeguard the database against certain types of logical or structural problems, namely data anomalies. For example, when multiple instances of a given piece of information occur in a table, the possibility exists that these instances will not be kept consistent when the data within the table is updated, leading to a loss of data integrity. A table that is sufficiently normalized is less vulnerable to problems of this kind, because its structure reflects the basic assumptions for when multiple instances of the same information should be represented by a single instance only.

1NF: A table is in first normal form (1NF) if and only if it represents a relation.[3] Given that database tables embody a relation-like form, the defining characteristic of one in first normal form is that it does not allow duplicate rows or nulls. Simply put, a table with a unique key (which, by definition, prevents duplicate rows) and without any nullable columns is in 1NF.

2NF: The table must be in 1NF.

None of the non-prime attributes of the table are functionally dependent on a part (proper subset) of a candidate key; in other words, all functional dependencies of non-prime attributes on candidate keys are full functional dependencies.[7] For example, consider an "Employees' Skills" table whose attributes are Employee ID, Employee Name, and Skill; and suppose that the combination of Employee ID and Skill uniquely identifies records within the table. Given that Employee Name depends on only one of those attributes, namely Employee ID, the table is not in 2NF.

In simple terms, a table is in 2NF if it is in 1NF and all fields are dependent on the whole of the primary key; equivalently, a relation is in 2NF if it is in 1NF and every non-key attribute is fully dependent on each candidate key of the relation.

Note that if none of a 1NF table's candidate keys are composite – i.e. every candidate key consists of just one attribute – then we can say immediately that the table is in 2NF.

All columns must be a fact about the entire key, and not a subset of the key.

3NF: The criteria for third normal form (3NF) are: the table must be in 2NF; transitive dependencies must be eliminated; all non-key attributes must rely only on the primary key. So, if a database has a table with columns Student ID, Student, Company, and Company Phone Number, it is not in 3NF. This is because the Company Phone Number relies on the Company, not on the Student ID. So, for it to be in 3NF, there must be a second table with Company and Company Phone Number columns; the Company Phone Number column in the first table would be removed.
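The 3NF decomposition of the student/company table can be sketched in Python as follows. The sample rows, the column names and the helper functions decompose_to_3nf and phone_for_student are made up for illustration only.

```python
# Rows of the original table: company_phone depends on company,
# not on student_id, so it is a transitive dependency.
students = [
    {"student_id": 1, "student": "Asha", "company": "Acme", "company_phone": "555-0100"},
    {"student_id": 2, "student": "Ravi", "company": "Acme", "company_phone": "555-0100"},
    {"student_id": 3, "student": "Mei", "company": "Globex", "company_phone": "555-0199"},
]


def decompose_to_3nf(rows):
    """Split the rows into a student table and a company table."""
    student_table = [
        {"student_id": r["student_id"], "student": r["student"], "company": r["company"]}
        for r in rows
    ]
    # Each company's phone number is now stored exactly once.
    company_table = {r["company"]: r["company_phone"] for r in rows}
    return student_table, company_table


def phone_for_student(student_id, student_table, company_table):
    """Recover the phone number by joining the two tables on company."""
    for s in student_table:
        if s["student_id"] == student_id:
            return company_table[s["company"]]
    raise KeyError(student_id)
```

After the split, changing a company's phone number touches a single entry of the company table, so the copies cannot drift apart.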

Boyce-Codd NF: A table is in Boyce-Codd normal form (BCNF) if and only if, for every one of its non-trivial functional dependencies X → Y, X is a superkey, that is, X is either a candidate key or a superset thereof.

4NF: A table is in fourth normal form (4NF) if and only if, for every one of its non-trivial multivalued dependencies X ↠ Y, X is a superkey, that is, X is either a candidate key or a superset thereof.[9]

For example, if an entity can have two phone number values and two email address values, then these independent multivalued facts should not be stored in the same table.

4. What is an object-oriented database?

In an object database (also object-oriented database), information is represented in the form of objects as used in object-oriented programming. When database capabilities are combined with object programming language capabilities, the result is an object database management system (ODBMS). An ODBMS makes database objects appear as programming language objects in one or more object programming languages. An ODBMS extends the programming language with transparently persistent data, concurrency control, data recovery, associative queries, and other capabilities.

5. What is an ORDBMS?

An object-relational database (ORD) or object-relational database management system (ORDBMS) is a database management system (DBMS) similar to a relational database, but with an object-oriented database model: objects, classes and inheritance are directly supported in database schemas and in the query language. In addition, it supports extension of the data model with custom data types and methods.

One aim for this type of system is to bridge the gap between conceptual data modeling techniques such as entity-relationship diagram (ERD) and object-relational mapping (ORM), which often use classes and inheritance, and relational databases, which do not directly support them.

Another, related, aim is to bridge the gap between relational databases and the object-oriented modeling techniques used in programming languages such as Java, C++ or C#. However, a more popular alternative for achieving such a bridge is to use a standard relational database system with some form of ORM software.

6. Explain

7. What are ACID properties?

In computer science, ACID (Atomicity, Consistency, Isolation, Durability) is a set of properties that guarantee that database transactions are processed reliably. In the context of databases, a single logical operation on the data is called a transaction.

Atomicity

Atomicity refers to the ability of the DBMS to guarantee that either all of the tasks of a transaction are performed or none of them are. For example, the transfer of funds can be completed or it can fail for a multitude of reasons, but atomicity guarantees that one account won't be debited if the other is not credited. Atomicity states that database modifications must follow an “all or nothing” rule. Each transaction is said to be “atomic.” If one part of the transaction fails, the entire transaction fails. It is critical that the database management system maintain the atomic nature of transactions in spite of any DBMS, operating system or hardware failure. (This is a different sense from the atomicity of attribute values in normalization, where an attribute is atomic when it can no longer be broken down any further.)
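The funds transfer example can be tried out with Python's built-in sqlite3 module. This is a minimal sketch under assumptions: the account table, the CHECK constraint and the function names make_bank, transfer and balances are invented for illustration.

```python
import sqlite3


def make_bank():
    """Create an in-memory database with two accounts (illustrative data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account ("
        "name TEXT PRIMARY KEY, "
        "balance INTEGER CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one all-or-nothing transaction."""
    try:
        # 'with conn' commits if the block succeeds and rolls back
        # every statement in it if an exception is raised.
        with conn:
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The debit violated the CHECK constraint; the credit was undone too.
        return False


def balances(conn):
    """Return the current balances as a dict."""
    return dict(conn.execute("SELECT name, balance FROM account"))
```

In this sketch the credit is deliberately executed before the debit, so a failed debit shows that the already executed credit is rolled back as well.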

Consistency

The Consistency property ensures that the database remains in a consistent state before the start of the transaction and after the transaction is over (whether successful or not).

Consistency states that only valid data will be written to the database. If, for some reason, a transaction is executed that violates the database’s consistency rules, the entire transaction will be rolled back and the database will be restored to a state consistent with those rules. On the other hand, if a transaction successfully executes, it will take the database from one state that is consistent with the rules to another state that is also consistent with the rules.

Isolation

Isolation refers to the requirement that other operations cannot access or see the data in an intermediate state during a transaction. This constraint is required to maintain the performance as well as the consistency between transactions in a DBMS.

Durability

Durability refers to the guarantee that once the user has been notified of success, the transaction will persist, and not be undone. This means it will survive system failure, and that the database system has checked the integrity constraints and won't need to abort the transaction. Many databases implement durability by writing all transactions into a log that can be played back to recreate the system state right before a failure. A transaction can only be deemed committed after it is safely in the log.

UNIT 1

1. Explain the architecture of a parallel database, with an example

There are 4 types of parallel database (PDB) architectures, based on the arrangement of processors, disks and memory: shared memory, shared disk, shared nothing and hierarchical.


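2. Explain speedup and scaleup w.r.t. parallel databases

Speedup: a task of fixed size runs in less time as the degree of parallelism increases. Speedup = Ts/Tl, where Ts is the execution time on the smaller system and Tl is the execution time on the larger system. Speedup is linear if it equals N when the larger system has N times the resources.

Scaleup: a proportionally larger task (or a larger number of transactions) is handled in the same time by increasing the resources. Scaleup = Ts/Tl, where Ts is the time for the original task on the smaller system and Tl is the time for the N times larger task on the N times larger system. Scaleup is linear if Ts/Tl = 1.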
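A small numeric sketch of the two ratios may help. The function names and the sample timings are assumptions made for illustration; both ratios are Ts/Tl, and only the meaning of the two measured times differs.

```python
def speedup(t_small_system, t_large_system):
    """Speedup = Ts/Tl for the SAME task on a smaller and a larger system."""
    return t_small_system / t_large_system


def is_linear_speedup(t_small_system, t_large_system, n, tol=1e-9):
    """Linear speedup: an N times larger system runs the task N times faster."""
    return abs(speedup(t_small_system, t_large_system) - n) <= tol


def scaleup(t_task_on_small, t_bigger_task_on_large):
    """Scaleup = Ts/Tl, where the larger system runs an N times larger task."""
    return t_task_on_small / t_bigger_task_on_large
```

For example, a query taking 100 s on 1 processor and 25 s on 4 processors shows a speedup of 4 (linear). If a 4 times larger query on the 4 processor system takes 125 s, the scaleup is 100/125 = 0.8, which is below the linear value of 1.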

3. Explain different partitioning techniques

1. Round robin
2. Hash partitioning
3. Range partitioning
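The three partitioning techniques can be sketched as follows, with each "disk" modelled as a Python list. The function names and the convention used for the range partitioning vector are assumptions of this sketch.

```python
from bisect import bisect_right


def round_robin_partition(tuples, n):
    """Send the i-th tuple to disk i mod n."""
    parts = [[] for _ in range(n)]
    for i, t in enumerate(tuples):
        parts[i % n].append(t)
    return parts


def hash_partition(tuples, n, key):
    """Send each tuple to the disk chosen by hashing its partitioning attribute."""
    parts = [[] for _ in range(n)]
    for t in tuples:
        parts[hash(key(t)) % n].append(t)
    return parts


def range_partition(tuples, vector, key):
    """Partition by a sorted vector [v0, v1, ...].

    Values below v0 go to disk 0, values with v0 <= x < v1 go to disk 1,
    and so on; values >= the last entry go to the last disk.
    """
    parts = [[] for _ in range(len(vector) + 1)]
    for t in tuples:
        parts[bisect_right(vector, key(t))].append(t)
    return parts
```

Round robin spreads tuples evenly but gives no help for point or range queries; hash partitioning suits point queries on the partitioning attribute; range partitioning suits range queries but can be skewed if the vector is chosen badly.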

4. Explain intraquery and interquery parallelism

Intraquery: operations within a single query are executed in parallel. This improves both throughput and response time (speedup).

Interquery: different queries or transactions are executed in parallel with one another. This increases throughput but not the response time of an individual query (scaleup).

5. Describe a good way to parallelize

a. The difference operation

b. Count, avg:
1. Partition the relation on the grouping attributes.
2. Compute the aggregate value locally on each processor.

c. Join: any of the 4 methods
1. Partitioned join
2. Fragment-and-replicate join
3. Partitioned parallel hash join
4. Parallel nested-loop join
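The two steps for count and avg can be sketched as below. The processors are simulated sequentially here, and the function names and row layout are assumptions. Because the relation is partitioned on the grouping attribute, every group lives entirely on one processor, so the local results only need to be collected.

```python
from collections import defaultdict


def partition_on_group(rows, n, group_key):
    """Step 1: partition the relation on the grouping attribute."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[hash(group_key(row)) % n].append(row)
    return parts


def local_count_avg(part, group_key, value):
    """Step 2: compute (count, avg) per group on one processor."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in part:
        g = group_key(row)
        sums[g] += value(row)
        counts[g] += 1
    return {g: (counts[g], sums[g] / counts[g]) for g in counts}


def parallel_count_avg(rows, n, group_key, value):
    """Collect the local results; groups never span two partitions."""
    result = {}
    for part in partition_on_group(rows, n, group_key):
        result.update(local_count_avg(part, group_key, value))
    return result
```

If the relation were instead partitioned on some other attribute, each processor would have to ship partial sums and counts, and the average would be computed only after combining them.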

6. Explain skew handling w.r.t. parallel databases

1. A balanced range-partitioning vector can be constructed by sorting.
2. Use virtual processors to distribute the work.

UNIT 2

7. What is a distributed database system? Explain with an example

A distributed database management system is a software system that permits the management of a distributed database and makes the distribution transparent to the users. A distributed database is a collection of multiple, logically interrelated databases distributed over a computer network. Sometimes "distributed database system" is used to refer jointly to the distributed database and the distributed DBMS.


A distributed database is a database that is under the control of a central database management system (DBMS) in which storage devices are not all attached to a common CPU. It may be stored in multiple computers located in the same physical location, or may be dispersed over a network of interconnected computers.

Collections of data (e.g. in a database) can be distributed across multiple physical locations. A distributed database is divided into separate partitions/fragments. Each partition/fragment of a distributed database may be replicated (i.e. kept as redundant fail-over copies, RAID-like).

Besides distributed database replication and fragmentation, there are many other distributed database design technologies. For example, local autonomy, synchronous and asynchronous distributed database technologies. These technologies' implementation can and does depend on the needs of the business and the sensitivity/confidentiality of the data to be stored in the database, and hence the price the business is willing to spend on ensuring data security, consistency and integrity.

8. What are homogeneous and heterogeneous distributed systems?

Homogeneous DDBS: all sites use the same schema and DBMS, and sites are aware of each other.

Heterogeneous DDBS: sites use different schemas and DBMSs, and sites are unaware of each other.

9. What is distributed data storage?

It is how a relation is stored at different sites: by replication, fragmentation or both.

10. What is the role of the transaction manager in a distributed system?

To manage access to data stored at that site.
To maintain a log for recovery purposes.
To run a concurrency control scheme that controls concurrent execution of transactions at THAT site.

11. What is the role of the transaction coordinator in a distributed system?

To coordinate execution of transactions (local/global) initiated at that site.
To start a transaction.
To divide it into subtransactions and distribute the subtransactions to appropriate sites.
To coordinate termination of transactions.

12. What are the system failure modes in a distributed system?

Site failure.
Lost messages.
Communication link failure.
Network partition.

13. Explain availability w.r.t. a distributed system

The system should continue normal functioning even if some site fails. Concurrency protocols can be modified to allow availability:
1. Majority-based approach
2. Read one, write all available approach
A failed site should be reintegrated properly.

In case of coordinator failure, a backup coordinator can be used or a new coordinator can be selected by an election algorithm.

14. Consider a distributed system with 2 sites A and B. Can site A distinguish among the following?

a. B goes down
b. The link between A and B goes down
c. B is extremely loaded

No. It cannot distinguish between the above cases. Site A can detect a failure, but it cannot determine the reason for the failure.

15. Explain the working of the election algorithm

Every site is given a unique ID, and the live site with the highest ID becomes the coordinator.
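The election rule can be sketched in a few lines. This is only the selection rule, not a full message-passing protocol; the function name elect_coordinator and the is_alive callback are assumptions of this sketch.

```python
def elect_coordinator(site_ids, is_alive):
    """Return the highest ID among the sites that currently respond."""
    live = [s for s in site_ids if is_alive(s)]
    if not live:
        raise RuntimeError("no live site to elect")
    return max(live)
```

For example, with sites 1 to 4 and site 4 (the old coordinator) down, site 3 is elected. In a real system the is_alive check would be a timeout on a message sent to each higher numbered site.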

16. What are directories?

A directory is a listing of information about some class of objects.

17. Give examples of directories

Telephone directory.
Favourites in a web browser.

18. What is a directory system?

In software engineering, a directory is similar to a dictionary; it enables the look up of a name and information associated with that name. As a word in a dictionary may have multiple definitions, in a directory, a name may be associated with multiple, different pieces of information. Likewise, as a word may have different parts and different definitions, a name in a directory may have many different types of data. Based on this rudimentary explanation of a directory, a directory service is simply the software system that stores, organizes and provides access to information in a directory.

Directories may be very narrow in scope, supporting only a small set of node types and data types, or they may be very broad, supporting an arbitrary or extensible set of types. In a telephone directory, the nodes are names and the data items are telephone numbers. In the DNS the nodes are domain names or internet addresses. In a directory used by a network operating system, the nodes represent resources that are managed by the OS, including users, computers, printers and other shared resources. Many different directory services have been used since the advent of the Internet; an important family among them descends from the X.500 directory service.

19. What is LDAP?

The Lightweight Directory Access Protocol, or LDAP (IPA: [ˈɛl dæp]), is an application protocol for querying and modifying directory services running over TCP/IP.[1]

A directory is a set of objects with similar attributes organised in a logical and hierarchical manner. The most common example is the telephone directory, which consists of a series of names (either of persons or organizations) organized alphabetically, with each name having an address and phone number attached.

An LDAP directory tree often reflects various political, geographic, and/or organizational boundaries, depending on the model chosen. LDAP deployments today tend to use Domain name system (DNS) names for structuring the topmost levels of the hierarchy. Deeper inside the directory might appear entries representing people, organizational units, printers, documents, groups of people or anything else that represents a given tree entry (or multiple entries).

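20. Explain the LDIF format for LDAP

LDIF stands for LDAP Data Interchange Format. It is a plain-text format for representing directory entries and updates to them.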


21. How does the querying mechanism work in LDAP?

Queries consist of simple selections and projections, with no joins. A query can be fired directly or through an API. A query consists of a search condition, base, return attributes, limit and scope.
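The five parts of a query can be illustrated with a toy in-memory directory. This is not the API of any real LDAP library: the entries, the search function and the scope names "base", "one" and "sub" are a simplified model written for this sketch, and escaped commas in names are ignored.

```python
# Toy directory: distinguished name -> attributes (made-up data).
entries = {
    "dc=example,dc=com": {"objectClass": "domain"},
    "ou=people,dc=example,dc=com": {"objectClass": "organizationalUnit"},
    "cn=alice,ou=people,dc=example,dc=com": {
        "objectClass": "person", "cn": "alice", "mail": "alice@example.com"},
    "cn=bob,ou=people,dc=example,dc=com": {
        "objectClass": "person", "cn": "bob", "mail": "bob@example.com"},
}


def search(entries, base, scope, condition, attrs, limit=None):
    """Selection (condition) plus projection (attrs) under a base; no joins."""
    results = []
    for dn, entry in entries.items():
        if scope == "base":      # only the base entry itself
            in_scope = dn == base
        elif scope == "one":     # only direct children of the base
            in_scope = dn.endswith("," + base) and "," not in dn[: -len(base) - 1]
        else:                    # "sub": the base and everything below it
            in_scope = dn == base or dn.endswith("," + base)
        if in_scope and condition(entry):
            results.append((dn, {a: entry[a] for a in attrs if a in entry}))
            if limit is not None and len(results) >= limit:
                break
    return results
```

A call such as search(entries, "dc=example,dc=com", "sub", lambda e: e.get("objectClass") == "person", ["mail"]) returns only the mail attribute of the two person entries.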

22. How does LDAP work at the client side?

The client uses an API to access the LDAP server. A query is transparently processed using referrals.

23. How does LDAP work?

24. What are LDAP backends?

LDAP servers??

25. What are LDAP objects?

26. What are LDAP attributes?

Entries in LDAP can have attributes.

27. How does the access control mechanism work in LDAP?

28. Explain configuration file sections

29. What are the benefits of LDAP?

Simple network protocol to access directory information.
Referrals allow transparent access to a distributed LDAP tree.

UNIT 3

30. What is XML?

Extensible Markup Language. It provides a standard data format for data exchange between applications over the web.
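As a small illustration of XML as an exchange format, the sketch below uses Python's standard xml.etree.ElementTree module: one application builds a document and another reads it. The order document, its element names and the helper functions are made up for this example.

```python
import xml.etree.ElementTree as ET

# A made-up order document exchanged between two applications.
order_xml = """<order id="42">
  <customer>Asha</customer>
  <item sku="A1" qty="2"/>
  <item sku="B7" qty="1"/>
</order>"""


def total_quantity(xml_text):
    """Receiver side: parse the text and add up the qty attributes."""
    root = ET.fromstring(xml_text)
    return sum(int(item.get("qty")) for item in root.findall("item"))


def to_xml(order_id, customer, items):
    """Sender side: build the same structure from plain Python values."""
    root = ET.Element("order", id=str(order_id))
    ET.SubElement(root, "customer").text = customer
    for sku, qty in items:
        ET.SubElement(root, "item", sku=sku, qty=str(qty))
    return ET.tostring(root, encoding="unicode")
```

Because both sides agree only on the element and attribute names, the sender and receiver can be written in different languages.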

31. Name the XML parsers and explain the working of each in brief?


32. What is two, three and multi tier architecture?

2 tier architecture:
tier 1: web server and application server are combined
tier 2: data server

3 tier architecture:
tier 1: web server
tier 2: application server
tier 3: data server

N tier architecture: client has presentation GUI
tier 1: presentation logic
tier 2: business logic tier/proxy tier (SOAP)
tier 3: database access tier
tier 4: data tier

33. Explain XML DTD

A document type definition (DTD) specifies the schema of XML documents. It specifies:
1. What elements may occur
2. How they may be nested
3. What their attributes are

34. Explain SOAP

Simple Object Access Protocol (SOAP) is a standard for invoking procedures by specifying a standard XML format for procedure parameters and return values, which are embedded in the SOAP XML message. SOAP procedures can be invoked by any application and are programming language independent. SOAP typically uses HTTP as its transport protocol.

UNIT 4

35. What is the need for data warehousing?

1. Normal processing in the operational database will slow down if time is spent processing analytical queries.
2. Analysis and decision making need historic data.

36. Explain OLAP

Online analytical processing (OLAP) is an interactive system which provides summaries of multidimensional data.

37. Explain OLTP

Online transaction processing (OLTP) is an interactive system which handles storing records about data created, stored and used by business transactions.

38. What is the difference between OLTP and OLAP?

1. OLTP: views current data. OLAP: views historic data.
2. OLTP: handles operational data. OLAP: handles informational data.
3. OLTP: transaction oriented. OLAP: decision-making oriented.
4. OLTP: E-R model. OLAP: multidimensional data model.

39. What is ROLAP, MOLAP and hybrid OLAP?

Relational OLAP (ROLAP): an extension of a relational database acting between front-end tools and the back-end relational DB; highly scalable.
Multidimensional OLAP (MOLAP): uses a multidimensional data model; very fast computation.
Hybrid OLAP (HOLAP): a combination of both.

40. What are the different OLAP operations?

1. Roll up
2. Drill down
3. Slice and dice
4. Pivot
5. Drill across
6. Drill through
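A few of these operations can be sketched on a tiny fact table. The sales data, the DIMS mapping and the function names are invented for this illustration; aggregating over fewer dimensions corresponds to roll up, over more dimensions to drill down.

```python
from collections import defaultdict

# Made-up fact rows: (year, quarter, region, product, amount)
sales = [
    (2023, "Q1", "North", "pen", 10),
    (2023, "Q1", "South", "pen", 20),
    (2023, "Q2", "North", "ink", 5),
    (2024, "Q1", "North", "pen", 7),
]
DIMS = {"year": 0, "quarter": 1, "region": 2, "product": 3}


def aggregate(facts, dims):
    """Total amount grouped by the given dimensions.

    Fewer dimensions = roll up; more dimensions = drill down.
    """
    totals = defaultdict(int)
    for f in facts:
        totals[tuple(f[DIMS[d]] for d in dims)] += f[4]
    return dict(totals)


def slice_(facts, dim, value):
    """Slice: fix one dimension to a single value."""
    return [f for f in facts if f[DIMS[dim]] == value]


def dice(facts, **conds):
    """Dice: restrict several dimensions to sets of allowed values."""
    return [f for f in facts if all(f[DIMS[d]] in vals for d, vals in conds.items())]
```

Here aggregate(sales, ["year", "quarter"]) is the detailed view, and aggregate(sales, ["year"]) is the same data rolled up from quarter to year.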

41. What are the schemas for multidimensional databases?

Star, snowflake, fact constellation.

42. What is a decision support system?

It makes decision making easier for managers by providing statistical data.

43. What is a data cube?

The generalization of a cross-tab to n dimensions, visualized as an n-dimensional cube, is called a data cube.

44. Explain the architecture of a data warehouse

3-tier architecture of a data warehouse:

Bottom tier: data warehouse server

Middle tier: OLAP server

Top tier: front-end tools

45. What is a data mart?

It is a subset of corporate-wide data that has value to a specific group of users.

46. What are the different phases of a data warehouse? Explain each

Extract data
Clean data
Transform data
Load
Refresh

47. What are the forms of data pre-processing?

Data cleaning
Data integration
Data transformation
Data reduction

48. What is the need for cleaning data?

Data in the warehouse can be incomplete, noisy and inconsistent. Hence it needs to be cleaned before it is used for data mining.

UNIT 5

49. What is a materialized view?

A materialized view is a database object that contains the results of a query. Materialized views can be local copies of data located remotely, or can be used to create summary tables based on aggregations of a table's data. Materialized views that store data based on remote tables are also known as snapshots. In Oracle, snapshots were renamed materialized views and the query rewrite feature was added in Oracle 8i.

Unlike an ordinary view, which is recomputed each time it is queried, a materialized view caches the query result as a concrete table that may be updated from the original base tables from time to time. This enables much more efficient access, at the cost of some data being potentially out-of-date. It is most useful in data warehousing scenarios, where frequent queries of the actual base tables can be extremely expensive.

50. What is data mining?

It is the process of extracting knowledge from large amounts of data.

51. Explain the architecture of data mining

1. Data cleaning, integration and selection
2. Data warehouse server
3. Data mining engine
4. Knowledge base
5. Pattern evaluation
6. User interface

52. What is a frequent pattern?

53. What is a sequential pattern?

54. Explain support and confidence w.r.t. association rule mining

D: set of transactions
A, B: itemsets in the rule A => B
Support: P(A U B), the percentage of transactions in D containing (A U B), i.e. containing every item of both A and B
Confidence: P(B/A), the percentage of transactions in D containing A that also contain B
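Support and confidence can be computed directly from these definitions. The sketch below returns fractions rather than percentages; the function names and the sample transactions are illustrative assumptions.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item of itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)


def confidence(transactions, a, b):
    """P(B/A): among transactions containing A, the fraction also containing B."""
    return support(transactions, set(a) | set(b)) / support(transactions, a)
```

For D = [{bread, milk}, {bread, butter}, {bread, milk, butter}, {milk}], the rule bread => milk has support 2/4 = 0.5 and confidence 2/3, since 3 transactions contain bread and 2 of those also contain milk.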

55. What is association rule mining?

Associations among data in large transactional databases can be found by performing frequent itemset mining. Association rule mining is a 2 step process:
1. Find all frequent itemsets.
2. Generate strong association rules from the frequent itemsets: these rules must satisfy minimum support and minimum confidence.

56. What is a frequent itemset?

It is an itemset which occurs at least as frequently as a predetermined minimum support count.

57. What is a closed itemset?

An itemset X is closed in dataset D if there exists no proper super-itemset Y in D which has the same support count as X.

58. What is a closed frequent itemset?

An itemset X is a closed frequent itemset if it is both closed and frequent in D.

59. Explain the Apriori algorithm

It is an algorithm for finding frequent itemsets using candidate generation.

Principle: All non-empty subsets of a frequent itemset must also be frequent.

Input:
D: set of transactions
min_sup: minimum support count threshold

Output: frequent itemsets in D

Steps:
1. Join
2. Prune
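The join and prune steps can be sketched as follows. This is a compact, unoptimized illustration (support is recounted by scanning all transactions), not a tuned implementation; the function name apriori and its return format are assumptions.

```python
from itertools import combinations


def apriori(transactions, min_sup):
    """Return {frequent itemset: support count} for support count >= min_sup."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidate):
        return sum(1 for t in transactions if candidate <= t)

    items = {i for t in transactions for i in t}
    # Frequent 1-itemsets.
    level = {frozenset([i]) for i in items if count(frozenset([i])) >= min_sup}
    frequent = {}
    k = 1
    while level:
        for itemset in level:
            frequent[itemset] = count(itemset)
        k += 1
        # Join: combine frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: drop candidates that have an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
        level = {c for c in candidates if count(c) >= min_sup}
    return frequent
```

The prune step is where the Apriori principle is used: a candidate with any infrequent subset cannot be frequent, so its support never needs to be counted.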

60. Explain generation of association rules from frequent itemsets

Association rule mining is a 2 step process:
1. Find all frequent itemsets.
2. Generate strong association rules from the frequent itemsets: these rules must satisfy minimum support and minimum confidence.

61. What is correlation analysis?

62. What is classification?

Given past instances and the classes to which they belong, the problem is to find the class to which a new item belongs.

63. What is prediction?

Prediction models a continuous-valued function, unlike classification, which gives categorical values.

64. What is the difference between classification and prediction?

Classification finds categorical labels, while prediction models a continuous-valued function.


The accuracy of a classifier refers to its ability to find the accurate class label, while the accuracy of a predictor refers to how accurately it can guess the value.

65. Explain decision tree induction with algorithm

Input:
1. Set of training tuples and their associated class labels
2. Attribute list
3. Attribute selection method

Output: decision tree giving classification rules

Method: build nodes by repeatedly selecting from the list the best splitting attribute, i.e. the one that best partitions the tuples into distinct classes.

66. What is Bayesian classification?

It is a statistical classification method that predicts class membership probabilities.

67. What is Bayes' theorem?

Let H be the hypothesis that X belongs to a specific class C. Then the posterior probability of H conditioned on X is
P(H/X) = P(X/H) P(H) / P(X)

68. How to predict a class label using naïve Bayesian classification

X is a tuple and C1, C2, ..., Cn are the classes. Then X is assigned to the class having the highest posterior probability, i.e. the class Ci such that P(Ci/X) > P(Cj/X) for all j from 1 to n with j ≠ i.
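Both the theorem and the class prediction rule can be sketched numerically. The probabilities below are made up, and the function names are assumptions. Since P(X) is the same for every class, the sketch compares P(X/Ci) P(Ci) only, with P(X/Ci) taken as the product of per-attribute probabilities (the naive independence assumption).

```python
def posterior(prior_h, likelihood_x_given_h, evidence_x):
    """Bayes' theorem: P(H/X) = P(X/H) P(H) / P(X)."""
    return likelihood_x_given_h * prior_h / evidence_x


def naive_bayes_predict(priors, likelihoods, x):
    """Pick the class Ci maximizing P(X/Ci) P(Ci).

    priors:      class -> P(Ci)
    likelihoods: class -> list with one dict per attribute,
                 mapping attribute value -> P(value/Ci)
    x:           tuple of attribute values
    """
    scores = {}
    for c, prior in priors.items():
        score = prior
        for k, value in enumerate(x):
            score *= likelihoods[c][k].get(value, 0.0)
        scores[c] = score
    return max(scores, key=scores.get), scores
```

With priors yes = 0.6, no = 0.4 and a tuple (young, low), the scores in the test data work out to 0.6 x 0.5 x 0.75 = 0.225 for yes and 0.4 x 0.25 x 0.5 = 0.05 for no, so the predicted class is yes.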

69. What is cluster analysis?

Clustering is the grouping of physical or abstract objects into classes of similar objects.

70. Explain the centroid-based technique: K-means algorithm

It partitions a set of n objects into K clusters so that the resulting intracluster similarity is high and intercluster similarity is low.
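A minimal K-means sketch follows. It assumes points are tuples of numbers and that the initial centroids are supplied by the caller (real implementations usually pick them randomly); the function names are illustrative.

```python
def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(points):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def kmeans(points, centroids, max_iter=100):
    """Alternate assignment and centroid update until nothing changes."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist2(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        new_centroids = [
            mean(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters
```

Each iteration can only keep or lower the total squared distance of points to their centroids, which is why intracluster similarity ends up high.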

71. What is outlier analysis?

It is the detection and analysis of outlier data.


72. What is text mining?

It is the process of mining information stored in text documents.

UNIT 6

73. What is an information retrieval system?

Systems which are used to store and query unstructured textual data such as documents.

74. What is the difference between IR and database systems?

Database systems handle structured data which has a complex data model. Querying the data is relatively easier.
IR systems handle unstructured data which follows a simple data model. They have to deal with problems such as approximate keyword searching and relevance ranking.

75. What is the need for relevance ranking?

The user cannot state the query precisely. Also, keyword search returns a large number of matching documents. Hence, the IR system needs to order answers on the basis of relevance.

76. What is the functionality of a web search engine?

It finds web pages relevant to given keywords and ranks them according to relevance.

77. Explain the architecture of a web search engine

1. Search engine software
2. Web crawler
3. Index database
4. Relevance ranking algorithm

78. Explain different ways of relevance ranking

1. Relevance ranking using terms
2. Relevance ranking using hyperlinks

79. Explain the PageRank algorithm

The measure of popularity of a page is based on the popularity of the pages that link to that page. The PageRank of a page gives the probability that a random walker will visit that page.
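The random walker idea can be sketched with a simple power iteration. The damping factor of 0.85, the handling of pages without outgoing links and the function name are assumptions of this sketch; it also assumes every link target appears as a key of the links dict.

```python
def pagerank(links, damping=0.85, iterations=100):
    """links: page -> list of pages it links to. Returns page -> rank."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # With probability (1 - damping) the walker jumps to a random page.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p, outs in links.items():
            if outs:
                # Otherwise it follows one of the page's links at random.
                share = damping * rank[p] / len(outs)
                for q in outs:
                    new_rank[q] += share
            else:
                # A page with no links passes its rank to all pages equally.
                for q in pages:
                    new_rank[q] += damping * rank[p] / n
        rank = new_rank
    return rank
```

The ranks sum to 1, so each value can be read as the long-run probability that the walker is on that page.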

80. Explain the HITS algorithm

It computes the popularity of pages using only pages that contain the keyword.

81. What are hubs and authorities w.r.t. the HITS algorithm?

A hub is a page that stores links to many related pages; it may not in itself contain information on the topic, but it points to pages that contain the actual information.
An authority is a page which contains actual information on the topic, although it may not store links to many related pages.

82. How to evaluate a ranked list?

83. What are the measures of text retrieval?

Precision: percentage of retrieved pages that are relevant.
Recall: percentage of relevant pages that are retrieved.
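Both measures follow from the overlap between the retrieved set and the relevant set. The sketch below returns fractions rather than percentages; the function name and document identifiers are illustrative.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # documents that are both retrieved and relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall
```

If 4 documents are retrieved, 3 documents are relevant and 2 documents are in both sets, precision is 2/4 = 0.5 and recall is 2/3.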

84. What is a web crawler?

It is a process which recursively follows hyperlinks and stores indexes and information about web pages.

85. What is indexing? Name the data structure used for indexing

Indexing is storing key-pointer pairs for fast retrieval of data. Here, an inverted index is used for storing the list Si of document identifiers which contain a particular keyword Ki. A B+ tree is used to implement the index.

86. Explain inverted list for indexing

An inverted index stores, for each keyword Ki, the list Si of identifiers of the documents which contain Ki.
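An inverted index can be sketched with a dict from keyword to a set of document identifiers. Splitting on whitespace and lowercasing is a simplifying assumption, as are the function names.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """docs: document id -> text. Returns keyword Ki -> set Si of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search_all(index, keywords):
    """Documents containing every keyword: intersect the lists Si."""
    sets = [index.get(k.lower(), set()) for k in keywords]
    return set.intersection(*sets) if sets else set()
```

A query with several keywords is answered by intersecting their lists, without scanning any document text.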


87. What is the need for concept-based querying?

The problem of homonyms can be solved by concept-based querying. Here, the concept that each word in the document represents is determined, and the word is replaced by that concept. This disambiguation is done by looking at the surrounding words in the document.

88. What is an ontology?

Ontologies are hierarchical structures which represent relationships between concepts.

89. What are synonyms and homonyms?

Synonyms: different words with the same meaning.
Homonyms: the same word having different meanings.

PRACTICAL ASSN QUESTIONS

90. Explain the architecture of MySQL, Oracle, SQL Server
91. Name ETL tools
92. Explain ETL tool working
93. Compare different ETL tools available