The database management system, or DBMS, is one of the oldest classes of software associated with computers. A DBMS is a program designed to manage all of the databases installed on a system hard drive or network. Different types of database management systems exist, some of them designed for the oversight and control of databases configured for specific purposes. Here are some examples of the various incarnations of DBMS technology currently in use, and some of the basic elements of DBMS software applications.

As the tool that is employed in the broad practice of managing databases, the DBMS is marketed in many forms. Some of the more popular examples of DBMS solutions include Microsoft Access, FileMaker, DB2, and Oracle. All these products provide for the creation of a series of rights or privileges that can be associated with a specific user. This means that it is possible to designate one or more database administrators who may control each function, as well as provide other users with various levels of administration rights. This flexibility makes the task of using DBMS methods to oversee a system something that can be centrally controlled, or allocated to several different people.

There are four essential elements found in just about every example of DBMS currently on the market. The first is the implementation of a modeling language that serves to define the schema of each database hosted via the DBMS. There are several approaches currently in use: hierarchical, network, relational, and object models. Essentially, the modeling language ensures the ability of the databases to communicate with the DBMS and thus operate on the system.

Second, data structures also are administered by the DBMS. Examples of data that are organized by this function are individual profiles or records, files, fields and their definitions, and objects such as visual media. Data structures are what allow the DBMS to interact with the data without causing any damage to the integrity of the data itself.

A third component of DBMS software is the data query language. This element is involved in maintaining the security of the database, by monitoring the use of login data, the assignment of access rights and privileges, and the definition of the criteria that must be employed to add data to the system. The data query language works with the data structures to help prevent irrelevant or invalid data from being entered into any of the databases in use on the system.

Last, a transaction mechanism is an essential element of any DBMS. It allows multiple users concurrent access to the database, prevents the manipulation of one record by two users at the same time, and prevents the creation of duplicate records.

Database management system

A database management system (DBMS) is computer software that manages databases. DBMSes may use any of a variety of database models, such as the network model or relational model. In large systems, a DBMS allows users and other software to store and retrieve data in a structured way.

Overview

A DBMS is a set of prewritten software programs that controls the organization, storage, management, and retrieval of data in a database. DBMSs are categorized according to their data structures or types. The DBMS accepts requests for data from the application program and instructs the operating system to transfer the appropriate data. When a DBMS is used, information systems can be changed much more easily as the organization's information requirements change. New categories of data can be added to the database without disruption to the existing system.

Organizations may use one kind of DBMS for daily transaction processing and then move the detail onto another computer that uses another DBMS better suited for random inquiries and analysis. Overall systems design decisions are performed by data administrators and systems analysts. Detailed database design is performed by database administrators.

Database servers are computers that hold the actual databases and run only the DBMS and related software. Database servers are usually multiprocessor computers, with generous memory and RAID disk arrays used for stable storage. Connected to one or more servers via a high-speed channel, hardware database accelerators are also used in large volume transaction processing environments. DBMSs are found at the heart of most database applications. Sometimes DBMSs are built around a private multitasking kernel with built-in networking support, although nowadays these functions are left to the operating system.

History

Databases have been in use since the earliest days of electronic computing. Unlike modern systems, which can be applied to widely different databases and needs, the vast majority of older systems were tightly linked to custom databases in order to gain speed at the expense of flexibility. Originally DBMSs were found only in large organizations with the computer hardware needed to support large data sets.

1960s Navigational DBMS

As computers grew in capability, this trade-off became increasingly unnecessary and a number of general-purpose database systems emerged; by the mid-1960s there were a number of such systems in commercial use. Interest in a standard began to grow, and Charles Bachman, author of one such product, Integrated Data Store (IDS), founded the "Database Task Group" within CODASYL, the group responsible for the creation and standardization of COBOL. In 1971 they delivered their standard, which generally became known as the "Codasyl approach", and soon there were a number of commercial products based on it available.

The Codasyl approach was based on the "manual" navigation of a linked data set which was formed into a large network. When the database was first opened, the program was handed back a link to the first record in the database, which also contained pointers to other pieces of data. To find any particular record the programmer had to step through these pointers one at a time until the required record was returned. Simple queries like "find all the people in India" required the program to walk the entire data set and collect the matching results. There was, essentially, no concept of "find" or "search". This might sound like a serious limitation today, but in an era when the data was most often stored on magnetic tape such operations were too expensive to contemplate anyway.

IBM also had its own DBMS in 1968, known as IMS. IMS was a development of software written for the Apollo program on the System/360. IMS was generally similar in concept to Codasyl, but used a strict hierarchy for its model of data navigation instead of Codasyl's network model. Both concepts later became known as navigational databases due to the way data was accessed, and Bachman's 1973 Turing Award presentation was The Programmer as Navigator. IMS is classified as a hierarchical database. IDS and IDMS, both CODASYL databases, as well as Cincom's TOTAL database, are classified as network databases.

1970s Relational DBMS

Edgar Codd worked at IBM in San Jose, California, in one of their offshoot offices that was primarily involved in the development of hard disk systems. He was unhappy with the navigational model of the Codasyl approach, notably the lack of a "search" facility which was becoming increasingly useful. In 1970, he wrote a number of papers that outlined a new approach to database construction that eventually culminated in the groundbreaking A Relational Model of Data for Large Shared Data Banks.[1]

In this paper, he described a new system for storing and working with large databases. Instead of records being stored in some sort of linked list of free-form records as in Codasyl, Codd's idea was to use a "table" of fixed-length records. A linked-list system would be very inefficient when storing "sparse" databases where some of the data for any one record could be left empty. The relational model solved this by splitting the data into a series of normalized tables, with optional elements being moved out of the main table to where they would take up room only if needed.

In the relational model, related records are linked together with a "key".

For instance, a common use of a database system is to track information about users, their name, login information, various addresses and phone numbers. In the navigational approach all of these data would be placed in a single record, and unused items would simply not be placed in the database. In the relational approach, the data would be normalized into a user table, an address table and a phone number table (for instance). Records would be created in these optional tables only if the address or phone numbers were actually provided.

Linking the information back together is the key to this system. In the relational model, some bit of information was used as a "key", uniquely defining a particular record. When information was being collected about a user, information stored in the optional (or related) tables would be found by searching for this key. For instance, if the login name of a user is unique, addresses and phone numbers for that user would be recorded with the login name as their key. This "re-linking" of related data back into a single collection is something that traditional computer languages are not designed for.
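As an illustrative sketch (the table and column names here are invented, not taken from any particular product), the normalization described above might look like this in modern SQL, with the login name serving as the key that relates the optional tables back to the user table:

CREATE TABLE users (
    login      VARCHAR(32) PRIMARY KEY,   -- the unique key described above
    full_name  VARCHAR(100) NOT NULL
);

CREATE TABLE addresses (
    login   VARCHAR(32) NOT NULL REFERENCES users (login),
    street  VARCHAR(100),
    city    VARCHAR(50)
);

CREATE TABLE phone_numbers (
    login  VARCHAR(32) NOT NULL REFERENCES users (login),
    phone  VARCHAR(20)
);

-- Rows appear in the optional tables only when an address or a
-- phone number is actually provided for a user.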

Just as the navigational approach would require programs to loop in order to collect records, the relational approach would require loops to collect information about any one record. Codd's solution to the necessary looping was a set-oriented language, a suggestion that would later spawn the ubiquitous SQL. Using a branch of mathematics known as tuple calculus, he demonstrated that such a system could support all the operations of normal databases (inserting, updating, etc.) as well as providing a simple system for finding and returning sets of data in a single operation.
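Continuing the hypothetical tables sketched above, a single set-oriented statement collects a user's related data in one operation, with no programmer-written loop:

SELECT u.full_name, a.city, p.phone
FROM users u
LEFT JOIN addresses a     ON a.login = u.login
LEFT JOIN phone_numbers p ON p.login = u.login
WHERE u.login = 'jsmith';   -- one statement returns the whole set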

Codd's paper was picked up by two people at Berkeley, Eugene Wong and Michael Stonebraker. They started a project known as INGRES using funding that had already been allocated for a geographical database project, using student programmers to produce code. Beginning in 1973, INGRES delivered its first test products, which were generally ready for widespread use in 1979. During this time, a number of people had moved "through" the group — perhaps as many as 30 people worked on the project, about five at a time. INGRES was similar to System R in a number of ways, including the use of a "language" for data access, known as QUEL — QUEL was in fact relational, having been based on Codd's own Alpha language, but has since been corrupted to follow SQL, thus violating much the same concepts of the relational model as SQL itself.

IBM itself did only one test implementation of the relational model, PRTV, and a production one, Business System 12, both now discontinued. Honeywell did MRDS for Multics, and now there are two new implementations: Alphora Dataphor and Rel. All other DBMS implementations usually called relational are actually SQL DBMSs. In 1968, the University of Michigan began development of the Micro DBMS relational database management system. It was used to manage very large data sets by the US Department of Labor, the Environmental Protection Agency and researchers from University of Alberta, the University of Michigan and Wayne State University. It ran on mainframe computers using Michigan Terminal System. The system remained in production until 1996.

End 1970s SQL DBMS

IBM started working on a prototype system loosely based on Codd's concepts as System R in the early 1970s. The first "quickie" version was ready in 1974/5, and work then started on multi-table systems in which the data could be broken down so that all of the data for a record (much of which is often optional) did not have to be stored in a single large "chunk". Subsequent multi-user versions were tested by customers in 1978 and 1979, by which time a standardized query language, SQL, had been added. Codd's ideas were establishing themselves as both workable and superior to Codasyl, pushing IBM to develop a true production version of System R, known as SQL/DS, and, later, Database 2 (DB2).

Many of the people involved with INGRES became convinced of the future commercial success of such systems, and formed their own companies to commercialize the work but with an SQL interface. Sybase, Informix, NonStop SQL and eventually Ingres itself were all being sold as offshoots to the original INGRES product in the 1980s. Even Microsoft SQL Server is actually a re-built version of Sybase, and thus, INGRES. Only Larry Ellison's Oracle started from a different chain, based on IBM's papers on System R, and beat IBM to market when its first version was released in 1979.

Stonebraker went on to apply the lessons from INGRES to develop a new database, Postgres, which is now known as PostgreSQL. PostgreSQL is primarily used for global mission critical applications (the .org and .info domain name registries use it as their primary data store, as do many large companies and financial institutions).

In Sweden, Codd's paper was also read and Mimer SQL was developed from the mid-70s at Uppsala University. In 1984, this project was consolidated into an independent enterprise. In the early 1980s, Mimer introduced transaction handling for high robustness in applications, an idea that was subsequently implemented on most other DBMS.

DBMS building blocks

A DBMS includes four main parts: modeling language, data structure, database query language, and transaction mechanism.

Modeling language

A data modeling language defines the schema of each database hosted in the DBMS, according to the DBMS database model. The four most common organizations are the hierarchical model, network model, relational model, and object model.

Inverted lists and other methods are also used. A given database management system may provide one or more of the four models. The optimal structure depends on the natural organization of the application's data, and on the application's requirements (which include transaction rate (speed), reliability, maintainability, scalability, and cost).

The dominant model in use today is the ad hoc one embedded in SQL, despite the objections of purists who believe this model is a corruption of the relational model, since it violates several of its fundamental principles for the sake of practicality and performance. Many DBMSs also support the Open Database Connectivity API, which provides a standard way for programmers to access the DBMS.

Data structure

Data structures (fields, records, files and objects) optimized to deal with very large amounts of data stored on a permanent data storage device (which implies relatively slow access compared to volatile main memory).

Database query language

A database query language and report writer allows users to interactively interrogate the database, analyze its data and update it according to the user's privileges on the data. It also controls the security of the database. Data security prevents unauthorized users from viewing or updating the database. Using passwords, users are allowed access to the entire database or subsets of it called subschemas. For example, an employee database can contain all the data about an individual employee, but one group of users may be authorized to view only payroll data, while others are allowed access to only work history and medical data.
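As a sketch of how such a subschema might be expressed in SQL (the table, view, and group names are invented for illustration), a view can expose only the payroll columns of an employee table, and access rights can then be granted on the view rather than on the underlying table:

CREATE TABLE employees (
    emp_id        INTEGER PRIMARY KEY,
    name          VARCHAR(100),
    salary        DECIMAL(10,2),
    work_history  VARCHAR(1000),
    medical_data  VARCHAR(1000)
);

-- A subschema: payroll users see only payroll-related columns.
CREATE VIEW payroll_view AS
    SELECT emp_id, name, salary FROM employees;

GRANT SELECT ON payroll_view TO payroll_group;  -- payroll staff
GRANT SELECT ON employees TO hr_group;          -- broader access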

If the DBMS provides a way to interactively enter and update the database, as well as interrogate it, this capability allows for managing personal databases. However, it may not leave an audit trail of actions or provide the kinds of controls necessary in a multi-user organization. These controls are only available when a set of application programs are customized for each data entry and updating function.

Transaction mechanism

A database transaction mechanism ideally guarantees ACID properties in order to ensure data integrity despite concurrent user access (concurrency control) and faults (fault tolerance). The DBMS can maintain the integrity of the database by not allowing more than one user to update the same record at the same time. It can also help prevent duplicate records via unique index constraints; for example, no two customers with the same customer number (key field) can be entered into the database. See ACID properties for more information on redundancy avoidance.
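A minimal sketch of the duplicate-prevention idea in SQL (table and column names are illustrative): a unique constraint on the key field makes the DBMS reject the second insert below.

CREATE TABLE customers (
    customer_no  INTEGER NOT NULL UNIQUE,  -- key field: duplicates rejected
    name         VARCHAR(100)
);

INSERT INTO customers (customer_no, name) VALUES (1001, 'Ann Smith');
INSERT INTO customers (customer_no, name) VALUES (1001, 'Ann Smyth');  -- fails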

DBMS Topics

Logical and physical view

Traditional View of Data[2]

A database management system provides the ability for many different users to share data and process resources. But as there can be many different users, there are many different database needs. The question now is: How can a single, unified database meet the differing requirements of so many users?

A DBMS minimizes these problems by providing two views of the database data: a logical (external) view and a physical (internal) view. The logical view, or user's view, of a database program represents data in a format that is meaningful to a user and to the software programs that process those data. That is, the logical view tells the user, in user terms, what is in the database. The physical view deals with the actual, physical arrangement and location of data in the direct access storage devices (DASDs). Database specialists use the physical view to make efficient use of storage and processing resources. With the logical view, users can see data differently from how they are stored, without needing to know all the technical details of physical storage. After all, a business user is primarily interested in using the information, not in how it is stored.

One strength of a DBMS is that while there is only one physical view of the data, there can be an endless number of different logical views. This feature allows users to see database information in a more business-related way rather than from a technical, processing viewpoint. Thus the logical view refers to the way users view data, and the physical view to the way the data are physically stored and processed.
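A short sketch of this idea in SQL (hypothetical table and views): two user groups can be given two different logical views of the same physically stored data.

-- One physical table...
CREATE TABLE orders (
    order_id   INTEGER PRIMARY KEY,
    customer   VARCHAR(100),
    amount     DECIMAL(10,2),
    warehouse  VARCHAR(20)
);

-- ...and two logical views of it, each in the terms its users care about.
CREATE VIEW sales_view AS
    SELECT customer, amount FROM orders;

CREATE VIEW logistics_view AS
    SELECT order_id, warehouse FROM orders;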

DBMS Features and capabilities

Alternatively, and especially in connection with the relational model of database management, the relation between attributes drawn from a specified set of domains can be seen as being primary. For instance, the database might indicate that a car that was originally "red" might fade to "pink" in time, provided it was of some particular "make" with an inferior paint job. Such higher arity relationships provide information on all of the underlying domains at the same time, with none of them being privileged above the others.

Throughout recent history specialized databases have existed for scientific, geospatial, imaging, document storage and like uses. Functionality drawn from such applications has lately begun appearing in mainstream DBMSs as well. However, the main focus there, at least when aimed at the commercial data processing market, is still on descriptive attributes on repetitive record structures.

Thus, the DBMSs of today roll together frequently-needed services or features of attribute management. By externalizing such functionality to the DBMS, applications effectively share code with each other and are relieved of much internal complexity. Features commonly offered by database management systems include:

Query ability Querying is the process of requesting attribute information from various perspectives and combinations of factors. Example: "How many 2-door cars in Texas are green?" A database query language and report writer allow users to interactively interrogate the database, analyze its data and update it according to the user's privileges on the data.
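The example question translates directly into a query; a sketch in SQL, assuming a hypothetical cars table with doors, state and color columns:

SELECT COUNT(*)
FROM cars
WHERE doors = 2
  AND state = 'TX'
  AND color = 'green';  -- "How many 2-door cars in Texas are green?"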

Backup and replication Copies of attributes need to be made regularly in case primary disks or other equipment fails. A periodic copy of attributes may also be created for a distant organization that cannot readily access the original. DBMSs usually provide utilities to facilitate the process of extracting and disseminating attribute sets. When data is replicated between database servers, so that the information remains consistent throughout the database system and users cannot tell or even know which server in the DBMS they are using, the system is said to exhibit replication transparency.

Rule enforcement Often one wants to apply rules to attributes so that the attributes are clean and reliable. For example, we may have a rule that says each car can have only one engine associated with it (identified by Engine Number). If somebody tries to associate a second engine with a given car, we want the DBMS to deny such a request and display an error message. However, with changes in the model specification such as, in this example, hybrid gas-electric cars, rules may need to change. Ideally such rules should be able to be added and removed as needed without significant data layout redesign.
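A minimal sketch of such a rule in SQL (hypothetical table): a named uniqueness constraint ensures each car has at most one engine, and it can later be dropped without redesigning the data layout if the rule changes.

CREATE TABLE engines (
    engine_no  INTEGER PRIMARY KEY,
    car_id     INTEGER NOT NULL,
    CONSTRAINT one_engine_per_car UNIQUE (car_id)
);

INSERT INTO engines (engine_no, car_id) VALUES (501, 7);
INSERT INTO engines (engine_no, car_id) VALUES (502, 7);  -- denied with an error

-- When hybrid gas-electric cars make the rule obsolete:
ALTER TABLE engines DROP CONSTRAINT one_engine_per_car;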

Security Often it is desirable to limit who can see or change which attributes or groups of attributes. This may be managed directly on an individual basis, or by the assignment of individuals and privileges to groups, or (in the most elaborate models) through the assignment of individuals and groups to roles which are then granted entitlements.
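A sketch of the role-based variant in SQL (role, table and user names are invented; syntax details vary by product): individuals are assigned to roles, and roles are granted entitlements.

CREATE ROLE clerk;
CREATE ROLE registrar;
GRANT SELECT ON cars TO clerk;              -- clerks may only read
GRANT SELECT, UPDATE ON cars TO registrar;  -- registrars may also change data
GRANT clerk TO alice;                       -- assign an individual to a role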

Computation There are common computations requested on attributes such as counting, summing, averaging, sorting, grouping, cross-referencing, etc. Rather than have each computer application implement these from scratch, they can rely on the DBMS to supply such calculations.
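For instance, rather than coding the arithmetic by hand, an application can ask the DBMS to count, average, group and sort in a single statement (again using the hypothetical cars table):

SELECT color, COUNT(*) AS how_many, AVG(price) AS avg_price
FROM cars
GROUP BY color
ORDER BY how_many DESC;  -- counting, averaging, grouping and sorting at once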

Change and access logging Often one wants to know who accessed what attributes, what was changed, and when it was changed. Logging services allow this by keeping a record of access occurrences and changes.

Automated optimization If there are frequently occurring usage patterns or requests, some DBMS can adjust themselves to improve the speed of those interactions. In some cases the DBMS will merely provide tools to monitor performance, allowing a human expert to make the necessary adjustments after reviewing the statistics collected.

Meta-data repository

Metadata is data describing data. For example, a listing that describes what attributes are allowed to be in data sets is called "meta-information".

Examples of Database Management Systems

Computhink's ViewWise, Microsoft SQL Server, Adabas, Alpha Five, DataEase, Oracle database, IBM DB2, Adaptive Server Enterprise, FileMaker, Firebird, Ingres, Informix, Mark Logic, Microsoft Access, InterSystems Caché, Microsoft Visual FoxPro, MonetDB, MySQL, PostgreSQL, Progress, SQLite, Teradata, CSQL, OpenLink Virtuoso, Daffodil DB, OpenOffice.org Base, Linter SQL RDBMS, SQL Anywhere

See also

Column-oriented DBMS, Data warehouse, Database-centric architecture, Directory service, Distributed database management system, Document management system, Hierarchical model, Navigational database, Network model, Object model, Object database, Object-relational database, Real time database, Associative model of data, Relational model, Relational database management system, Run Book Automation, Comparison of relational database management systems, SQL

Computer software

Computer software, or just software, is a general term used to describe a collection of computer programs, procedures and documentation that perform some tasks on a computer system.[1]

The term includes:

Application software such as word processors which perform productive tasks for users.

Firmware, which is software resident in electrically programmable memory devices on mainboards or other integrated hardware carriers.

Middleware which controls and co-ordinates distributed systems.

System software such as operating systems, which interface with hardware to provide the necessary services for application software.

Software testing is a domain independent of development and programming. It consists of various methods to test and declare a software product fit before it can be launched for use by either an individual or a group. Many tests of functionality, performance and appearance are conducted by modern testers with various tools such as QTP, LoadRunner and black-box testing, checking the developed code against a list of requirements. ISTQB is a certification that is in demand for engineers who want to pursue a career in testing.[2]

Testware, which is an umbrella term for all utilities and application software that serve in combination for testing a software package but do not necessarily contribute to operational purposes. As such, testware is not a standing configuration but merely a working environment for application software or subsets thereof.

Software includes websites, programs, video games, and the like, which are written in programming languages such as C and C++.

"Software" is sometimes used in a broader context to mean anything which is not hardware but which is used with hardware, such as film, tapes and records.[3]

Overview

Computer software is often regarded as anything but hardware, meaning that the "hard" parts are those that are tangible while the "soft" parts are the intangible objects inside the computer. Software encompasses an extremely wide array of products and technologies developed using different techniques such as programming languages, scripting languages, or even microcode or an FPGA configuration. The types of software include web pages developed with technologies like HTML, PHP, Perl, JSP, ASP.NET and XML, and desktop applications like Microsoft Word and OpenOffice developed with technologies like C, C++, Java and C#. Software usually runs on an underlying operating system such as Microsoft Windows or Linux. Software also includes video games and the logic systems of modern consumer devices such as automobiles, televisions and toasters.

Relationship to computer hardware

Computer software is so called to distinguish it from computer hardware, which encompasses the physical interconnections and devices required to store and execute (or run) the software. At the lowest level, software consists of a machine language specific to an individual processor. A machine language consists of groups of binary values signifying processor instructions which change the state of the computer from its preceding state. Software is an ordered sequence of instructions for changing the state of the computer hardware in a particular sequence. It is usually written in high-level programming languages that are easier and more efficient for humans to use (closer to natural language) than machine language. High-level languages are compiled or interpreted into machine language object code. Software may also be written in an assembly language, essentially, a mnemonic representation of a machine language using a natural language alphabet. Assembly language must be assembled into object code via an assembler.

The term "software" was first used in this sense by John W. Tukey in 1958.[4] In computer science and software engineering, computer software is all computer programs. The theory that is the basis for most modern software was first proposed by Alan Turing in his 1935 essay Computable numbers with an application to the Entscheidungsproblem.[5]

Types of software

A layer structure showing where Operating System is located on generally used software systems on desktops

Practical computer systems divide software systems into three major classes: system software, programming software and application software, although the distinction is arbitrary, and often blurred.

System software

System software helps run the computer hardware and computer system. It includes:

device drivers, operating systems, servers, utilities, windowing systems

(these things need not be distinct)

The purpose of systems software is to unburden the applications programmer from the details of the particular computer complex being used, including such accessory devices as communications, printers, readers, displays and keyboards, and also to partition the computer's resources, such as memory and processor time, in a safe and stable manner.

Programming software

Programming software usually provides tools to assist a programmer in writing computer programs and software in different programming languages in a more convenient way. The tools include:

compilers, debuggers, interpreters, linkers, text editors

An Integrated development environment (IDE) is a single application that attempts to manage all these functions.

Application software

Application software allows end users to accomplish one or more specific (not directly computer development related) tasks. Typical applications include:

industrial automation, business software, computer games, telecommunications (i.e. the Internet and everything that flows on it), databases, educational software, medical software

Application software exists for and has impacted a wide variety of topics.

Software topics

Architecture

See also: Software architecture

Users often see things differently than programmers. People who use modern general purpose computers (as opposed to embedded systems, analog computers, supercomputers, etc.) usually see three layers of software performing a variety of tasks: platform, application, and user software.

Platform software: Platform includes the firmware, device drivers, an operating system, and typically a graphical user interface which, in total, allow a user to interact with the computer and its peripherals (associated equipment). Platform software often comes bundled with the computer. On a PC you will usually have the ability to change the platform software.

Application software: Application software or Applications are what most people think of when they think of software. Typical examples include office suites and video games. Application software is often purchased separately from computer hardware. Sometimes applications are bundled with the computer, but that does not change the fact that they run as independent applications. Applications are almost always independent programs from the operating system, though they are often tailored for specific platforms. Most users think of compilers, databases, and other "system software" as applications.

User-written software: End-user development tailors systems to meet users' specific needs. User software includes spreadsheet templates, word processor macros, scientific simulations, and scripts for graphics and animations. Even email filters are a kind of user software. Users create this software themselves and often overlook how important it is. Depending on how competently the user-written software has been integrated into default application packages, many users may not be aware of the distinction between the original packages and what has been added by co-workers.

Documentation

Main article: Software documentation

Most software has software documentation so that the end user can understand the program: what it does and how to use it. Without clear documentation, software can be hard to use, especially if it is very specialized and relatively complex software like Photoshop or AutoCAD.

Developer documentation may also exist, either with the code as comments and/or as separate files, detailing how the program works and how it can be modified.

Library

Main article: Software library

An executable is almost always not sufficiently complete for direct execution. Software libraries include collections of functions and functionality that may be embedded in other applications. Operating systems include many standard software libraries, and applications are often distributed with their own libraries.

Standard

Main article: Software standard

Since software can be designed using many different programming languages and on many different operating systems and operating environments, software standards are needed so that different software can understand and exchange information with each other. For instance, an email sent from Microsoft Outlook should be readable in Yahoo! Mail and vice versa.

Execution

Main article: Execution (computing)

Computer software has to be "loaded" into the computer's storage (such as a hard drive or memory). Once the software has loaded, the computer is able to execute the software. This involves passing instructions from the application software, through the system software, to the hardware which ultimately receives the instruction as machine code. Each instruction causes the computer to carry out an operation – moving data, carrying out a computation, or altering the control flow of instructions.

Data movement is typically from one place in memory to another. Sometimes it involves moving data between memory and registers which enable high-speed data access in the CPU. Moving data, especially large amounts of it, can be costly. So, this is sometimes avoided by using "pointers" to data instead. Computations include simple operations such as incrementing the value of a variable data element. More complex computations may involve many operations and data elements together.

Quality and reliability

Main articles: Software quality, Software testing, and Software reliability

Software quality is very important, especially for commercial and system software like Microsoft Office, Microsoft Windows, Linux, etc. If software is faulty (buggy), it can delete a person's work, crash the computer and do other unexpected things. Faults and errors are called "bugs". Many bugs are discovered and eliminated (debugged) through software testing. However, software testing rarely – if ever – eliminates every bug; some programmers say that "every program has at least one more bug" (Lubarsky's Law). All major software companies, such as Microsoft, Novell and Sun Microsystems, have their own software testing departments with the specific goal of just testing. Software can be tested through unit testing, regression testing and other methods, which are done manually, or most commonly, automatically, since the amount of code to be tested can be quite large. For instance, NASA has extremely rigorous software testing procedures for its Space Shuttle and other programs because faulty software can crash the whole program and make the vehicle not functional, at great expense.

License

Main article: Software license

The software's license gives the user the right to use the software in the licensed environment. Some software comes with the license when purchased off the shelf, or an OEM license when bundled with hardware. Other software comes with a free software license, granting the recipient the rights to modify and redistribute the software. Software can also be in the form of freeware or shareware. See also License Management.

Patents

Main articles: Software patent and Software patent debate

Software can be patented; however, software patents are controversial in the software industry, with many people holding different views about them. The controversy is that a patent grants exclusive rights to a specific algorithm or technique, so that it cannot be duplicated by others, and unauthorized use can be treated as infringement of intellectual property. Some people believe that software patents hinder software development, while others argue that software patents provide an important incentive to spur software innovation.

Ethics and rights

Main article: Computer ethics

There is more than one approach to creating, licensing, and distributing software. For instance, the free software and open source communities produce software under licensing that makes it free for inspection of its code, modification of its code, and distribution. While software released under an open source license (such as the General Public License, or GPL for short) can be sold for money,[6] the distribution cannot be restricted in the same way as software with copyright and patent restrictions (used by corporations to require licensing fees).

While some advocates of free software use slogans such as "information wants to be free," hinting that it is easy to copy digital data and that the licenses (enforced through laws) are unnatural restrictions, other creators and users of open source software recognize it to be one model among many for software creation, licensing, and distribution. And the laws themselves are put into place for the ostensible purpose of increasing creative output, by allowing the creators to control and profit most effectively from their intellectual property.

Design and implementation

Main articles: Software development, Computer programming, and Software engineering

Design and implementation of software varies depending on the complexity of the software. For instance, the design and creation of Microsoft Word took much longer than designing and developing Microsoft Notepad, because of the difference in functionality between the two.

Software is usually designed and created (coded/written/programmed) in integrated development environments (IDEs) like Emacs, XEmacs, Microsoft Visual Studio and Eclipse, which can simplify the process and compile the program. As noted in a different section, software is usually created on top of existing software and the application programming interface (API) that the underlying software provides, like GTK+, JavaBeans or Swing. Libraries (APIs) are categorized for different purposes. For instance, the JavaBeans library is used for designing enterprise applications, the Windows Forms library is used for designing graphical user interface (GUI) applications like Microsoft Word, and Windows Communication Foundation is used for designing web services. There are also underlying concepts in computer programming, like quicksort, hash table, array and binary tree, that can be useful in creating software. When a program is designed, it relies on the API.

For instance, if a user is designing a Microsoft Windows desktop application, he or she might use the .NET Windows Forms library to design the desktop application and call its APIs like Form1.Close() and Form1.Show() to close or open the application, writing the additional operations the application needs him or herself. Without these APIs, the programmer would have to write this functionality entirely on his or her own. Companies like Sun Microsystems, Novell and Microsoft provide their own APIs so that many applications are written using their software libraries, which usually have numerous APIs in them.

Software has special economic characteristics that make its design, creation, and distribution different from most other economic goods.[7][8]

A person who creates software is called a programmer, software engineer, or software developer, terms that all have essentially the same meaning.

Industry and organizations

Main article: Software industry

Software has its own niche industry, called the software industry, made up of the different entities and people that produce software; as a result there are many software companies and programmers in the world. Because software is increasingly used in many different areas, such as finance, search, mathematics, space exploration, gaming and mining, software companies and people usually specialize in certain areas. For instance, Electronic Arts primarily creates video games.

Selling software can also be quite profitable. For instance, Bill Gates, the founder of Microsoft, was the second richest man in the world in 2008 largely by selling the Microsoft Windows and Microsoft Office software programs; the same goes for Larry Ellison, largely through his Oracle database software.

There are also many non-profit software organizations, like the Free Software Foundation, the GNU Project and the Mozilla Foundation. There are also many software standards organizations, like the W3C, the IETF and others, that develop standards so that different software can work and interoperate with each other, such as XML, HTML, HTTP and FTP.

Some of the well known software companies include Microsoft, Apple, IBM, Oracle, Novell, SAP, HP, etc.[9]

See also

Lists: List of basic computer programming topics, List of computer programming topics, Origins of computer terms

Types of software: Custom software, Free software, Freeware, Open source software, Proprietary software, Scientific software, Shareware

Related subjects: Software as a Service, Software development process, Software ecosystem, Software industry, Software license

Database

This article is principally about managing and structuring the collections of data held on computers. For a fuller discussion of DBMS software, see Database management system. For database content libraries, see Online database.

A database is a structured collection of records or data that is stored in a computer system. The structure is achieved by organizing the data according to a database model. The model in most common use today is the relational model. Other models such as the hierarchical model and the network model use a more explicit representation of relationships.

Database topics

Architecture

Depending on the intended use, there are a number of database architectures in use. Many databases use a combination of strategies. On-line Transaction Processing systems (OLTP) often use a row-oriented datastore architecture, while data-warehouse and other retrieval-focused applications like Google's BigTable, or bibliographic database (library catalogue) systems may use a Column-oriented DBMS architecture.

Document-oriented databases, XML databases and knowledge bases, as well as frame databases and RDF stores (aka triple stores), may also use a combination of these architectures in their implementation.

Finally, it should be noted that not all databases have or need a database 'schema' (so called schema-less databases).

Over many years the database industry has been dominated by General Purpose database systems, which offer a wide range of functions that are applicable to many, if not most circumstances in modern data processing. These have been enhanced with extensible datatypes, pioneered in the PostgreSQL project, to allow a very wide range of applications to be developed.

There are also other types of database which cannot be classified as relational databases.

Database management systems

A computer database relies on software to organize the storage of data. This software is known as a database management system (DBMS). Database management systems are categorized according to the database model that they support. The model tends to determine the query languages that are available to access the database. A great deal of the internal engineering of a DBMS, however, is independent of the data model, and is concerned with managing factors such as performance, concurrency, integrity, and recovery from hardware failures. In these areas there are large differences between products.

A Relational Database Management System (RDBMS) implements the features of the relational model outlined above. In this context, Date's "Information Principle" states: "the entire information content of the database is represented in one and only one way. Namely as explicit values in column positions (attributes) and rows in relations (tuples). Therefore, there are no explicit pointers between related tables."
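A small sketch of what the principle means in practice (hypothetical tables): related rows are connected only by matching values, here a customer number stored in the orders table, never by physical pointers.

CREATE TABLE customers (
    customer_no  INTEGER PRIMARY KEY,
    name         VARCHAR(100)
);

CREATE TABLE orders (
    order_id     INTEGER PRIMARY KEY,
    customer_no  INTEGER REFERENCES customers (customer_no)  -- a value, not a pointer
);

-- The relationship is recovered by matching values at query time:
SELECT o.order_id, c.name
FROM orders o
JOIN customers c ON c.customer_no = o.customer_no;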

Database models

Main article: Database model

Post-relational database models

Products offering a more general data model than the relational model are sometimes classified as post-relational. The data model in such products incorporates relations but is not constrained by the Information Principle, which requires that all information is represented by data values in relations.

Some of these extensions to the relational model actually integrate concepts from technologies that pre-date the relational model. For example, they allow representation of a directed graph with trees on the nodes.

Some products implementing such models have been built by extending relational database systems with non-relational features. Others, however, have arrived in much the same place by adding relational features to pre-relational systems. Paradoxically, this allows products that are historically pre-relational, such as PICK and MUMPS, to make a plausible claim to be post-relational in their current architecture.

Object database models

In recent years, the object-oriented paradigm has been applied to database technology, creating a new programming model known as object databases. These databases attempt to bring the database world and the application programming world closer together, in particular by ensuring that the database uses the same type system as the application program. This aims to avoid the overhead (sometimes referred to as the impedance mismatch) of converting information between its representation in the database (for example as rows in tables) and its representation in the application program (typically as objects). At the same time, object databases attempt to introduce the key ideas of object programming, such as encapsulation and polymorphism, into the world of databases.

A variety of these ways have been tried for storing objects in a database. Some products have approached the problem from the application programming end, by making the objects manipulated by the program persistent. This also typically requires the addition of some kind of query language, since conventional programming languages do not have the ability to find objects based on their information content. Others have attacked the problem from the database end, by defining an object-oriented data model for the database, and defining a database programming language that allows full programming capabilities as well as traditional query facilities.

Database storage structures

Main article: Database storage structures

Relational database tables and indexes are typically stored in memory or on hard disk in one of many forms: ordered/unordered flat files, ISAM, heaps, hash buckets or B+ trees. These have various advantages and disadvantages discussed further in the main article on this topic. The most commonly used are B+ trees and ISAM.

Object databases use a range of storage mechanisms. Some use virtual memory mapped files to make the native language (C++, Java etc.) objects persistent. This can be highly efficient but it can make multi-language access more difficult. Others break the objects down into fixed and varying length components that are then clustered tightly together in fixed sized blocks on disk and reassembled into the appropriate format either for the client or in the client address space. Another popular technique is to store the objects in tuples, much like a relational database, which the database server then reassembles for the client.

Other important design choices relate to the clustering of data by category (such as grouping data by month, or location), creating pre-computed views known as materialized views, partitioning data by range or hash. Memory management and storage topology can be important design choices for database designers as well. Just as normalization is used to reduce storage requirements and improve the extensibility of the database, conversely denormalization is often used to reduce join complexity and reduce execution time for queries.[1]

Indexing

All of these databases can take advantage of indexing to increase their speed. This technology has advanced tremendously since its early uses in the 1960s and 1970s. The most common kind of index is a sorted list of the contents of some particular table column, with pointers to the row associated with the value. An index allows a set of table rows matching some criterion to be located quickly. Typically, indexes are also stored in the various forms of data-structure mentioned above (such as B-trees, hashes, and linked lists). Usually, a specific technique is chosen by the database designer to increase efficiency in the particular case of the type of index required.

Most relational DBMSs and some object DBMSs have the advantage that indexes can be created or dropped without changing existing applications that make use of them. The database chooses between many different strategies based on which one it estimates will run the fastest. In other words, indexes are transparent to the application or end user querying the database; while they affect performance, any SQL command will run with or without an index to compute the result of an SQL statement. The RDBMS will produce a plan of how to execute the query, which is generated by analyzing the run times of the different algorithms and selecting the quickest. Some of the key algorithms that deal with joins are the nested loop join, sort-merge join and hash join. Which of these is chosen depends on whether an index exists, what type it is, and its cardinality.
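A brief sketch (table and index names are illustrative): creating or dropping the index below changes how fast the query runs, but not its result, and no application code that issues the query needs to change.

CREATE INDEX idx_cars_state ON cars (state);

-- Returns the same rows with or without the index; the plan
-- simply uses an index scan when one is available.
SELECT * FROM cars WHERE state = 'TX';

DROP INDEX idx_cars_state;  -- applications keep working unchanged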

An index speeds up access to data, but it has disadvantages as well. First, every index increases the amount of storage on the hard drive necessary for the database file, and second, the index must be updated each time the data are altered, and this costs time. (Thus an index saves time in the reading of data, but it costs time in entering and altering data. It thus depends on the use to which the data are to be put whether an index is on the whole a net plus or minus in the quest for efficiency.)

A special case of an index is a primary index, or primary key, which is distinguished in that the primary index must ensure a unique reference to a record. Often, for this purpose one simply uses a running index number (ID number). Primary indexes play a significant role in relational databases, and they can speed up access to data considerably.

Transactions and concurrency

In addition to their data model, most practical databases ("transactional databases") attempt to enforce a database transaction. Ideally, the database software should enforce the ACID rules, summarized here:

Atomicity: Either all the tasks in a transaction must be done, or none of them. The transaction must be completed, or else it must be undone (rolled back).

Consistency: Every transaction must preserve the integrity constraints — the declared consistency rules — of the database. It cannot place the data in a contradictory state.

Isolation: Two simultaneous transactions cannot interfere with one another. Intermediate results within a transaction are not visible to other transactions.

Durability: Completed transactions cannot be aborted later or their results discarded. They must persist through (for instance) restarts of the DBMS after crashes.

In practice, many DBMSs allow most of these rules to be selectively relaxed for better performance.
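A compact sketch of atomicity in SQL (an accounts table is assumed purely for illustration): the transfer below either completes as a whole or is undone as a whole.

START TRANSACTION;

UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;
UPDATE accounts SET balance = balance + 100 WHERE account_id = 2;

-- If any step fails or would violate a declared constraint,
-- ROLLBACK undoes both updates; otherwise COMMIT makes them durable.
COMMIT;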

Concurrency control is a method used to ensure that transactions are executed in a safe manner and follow the ACID rules. The DBMS must be able to ensure that only serializable, recoverable schedules are allowed, and that no actions of committed transactions are lost while undoing aborted transactions.

Replication

Replication of databases is closely related to transactions. If a database can log its individual actions, it is possible to create a duplicate of the data in real time. The duplicate can be used to improve performance or availability of the whole database system. Common replication concepts include:

Master/Slave Replication: All write requests are performed on the master and then replicated to the slaves

Quorum: The result of read and write requests is calculated by querying a "majority" of replicas.

Multimaster: Two or more replicas sync each other via a transaction identifier.

Parallel synchronous replication of databases enables transactions to be replicated on multiple servers simultaneously, which provides a method for backup and security as well as data availability.

Security

Database security denotes the system, processes, and procedures that protect a database from unintended activity.

Security is usually enforced through access control, auditing, and encryption.

Access control ensures and restricts who can connect to the database and what can be done to it.

Auditing logs what action or change has been performed, when and by whom.

Encryption: Since security has become a major issue in recent years, many commercial database vendors provide built-in encryption mechanisms. Data is encoded natively into the tables and deciphered "on the fly" when a query comes in. Connections can also be secured and encrypted if required using DSA, MD5, SSL or legacy encryption standards.

Enforcing security is one of the major tasks of the DBA.

In the United Kingdom, legislation protecting the public from unauthorized disclosure of personal information held on databases falls under the Office of the Information Commissioner. United Kingdom-based organizations holding personal data in electronic format (databases for example) are required to register with the Information Commissioner.[2]

Locking

Locking is how the database handles multiple concurrent operations. It is how concurrency and some form of basic integrity are managed within the database system. Such locks can be applied on a row level, or on other levels like page (a basic data block), extent (an array of multiple pages) or even an entire table. This helps maintain the integrity of the data by ensuring that only one process at a time can modify the same data.

In a basic filesystem, only one lock at a time can be set on a file or folder, restricting usage to one process only. Databases, on the other hand, can set and hold multiple locks at the same time on different levels of the physical data structure. How locks are set and how long they last is determined by the database engine's locking scheme, based on the SQL or transactions submitted by the users. Generally speaking, no activity on the database should translate into no or very light locking.

For most DBMSs on the market, locks are either shared or exclusive. An exclusive lock means that no other lock can be acquired on the data object as long as the exclusive lock lasts.


Exclusive locks are usually set while the database needs to change data, such as during an UPDATE or DELETE operation.

Shared locks, by contrast, can coexist on the same data structure. Shared locks are usually used while the database is reading data, during a SELECT operation. The number and nature of locks, and the time a lock holds a data block, can have a huge impact on database performance. Bad locking can lead to disastrous performance (usually the result of poor SQL requests or an inadequate physical database structure).

Default locking behavior is enforced by the isolation level of the data server. Changing the isolation level affects how shared or exclusive locks must be set on the data for the entire database system. The default isolation level is generally level 1, in which data cannot be read while it is being modified; this prevents "ghost" data from being returned to the end user.
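
For illustration, the isolation level can typically be adjusted with a statement along the following lines (SQL-92 syntax; the supported levels, the default, and where the statement may appear vary by product; the accounts table is again hypothetical):

    -- Request the strictest isolation level for the next transaction.
    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

    BEGIN TRANSACTION;
    SELECT balance FROM accounts WHERE account_id = 1;    -- typically acquires shared locks
    UPDATE accounts SET balance = 0 WHERE account_id = 1; -- acquires an exclusive lock
    COMMIT;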

At some point, intensive or inappropriate exclusive locking can lead to a "deadlock" between two transactions, in which neither lock can be released because each is waiting to acquire a resource held by the other. The database has a fail-safe mechanism and will automatically "sacrifice" one of the locks, releasing the resource; the processes or transactions involved in the deadlock are then rolled back.

Databases can also be locked for other reasons, such as access restrictions for given levels of user. Some databases are also locked for routine database maintenance, which prevents changes being made during the maintenance (see, for example, the section "Locking tables and databases" in IBM's documentation). However, many modern databases don't lock the database during routine maintenance; see, for example, "Routine Database Maintenance" for PostgreSQL.

Applications of databases

Databases are used in many applications, spanning virtually the entire range of computer software. Databases are the preferred method of storage for large multiuser applications, where coordination between many users is needed. Even individual users find them convenient, and many electronic mail programs and personal organizers are based on standard database technology. Software database drivers are available for most database platforms so that application software can use a common Application Programming Interface to retrieve the information stored in a database. Two commonly used database APIs are JDBC and ODBC.


See also

Comparison of relational database management systems
Comparison of database tools
Database-centric architecture
Database theory
Government database
Object database
Online database
Real time database
Relational database

Database model


A database model or database schema is the structure or format of a database, described in a formal language supported by the database management system. Schemas are generally stored in a data dictionary.

Collage of five types of database models.


Although a schema is defined in a text database language, the term is often used to refer to a graphical depiction of the database structure.[1]

Overview

A database model is a theory or specification describing how a database is structured and used. Several such models have been suggested.

Common models include:

Hierarchical model
Network model
Relational model
Entity-relationship
Object-relational model
Object model

A data model is not just a way of structuring data: it also defines a set of operations that can be performed on the data. The relational model, for example, defines operations such as select, project, and join. Although these operations may not be explicit in a particular query language, they provide the foundation on which a query language is built.

Models

Various techniques are used to model data structure. Most database systems are built around one particular data model, although it is increasingly common for products to offer support for more than one model. For any one logical model various physical implementations may be possible, and most products will offer the user some level of control in tuning the physical implementation, since the choices that are made have a significant effect on performance. An example of this is the relational model: all serious implementations of the relational model allow the creation of indexes which provide fast access to rows in a table if the values of certain columns are known.

Flat model


Flat File Model.[1]

This may not strictly qualify as a data model, as defined above.

The flat (or table) model consists of a single, two-dimensional array of data elements, where all members of a given column are assumed to be similar values, and all members of a row are assumed to be related to one another. For instance, a flat model might have columns for name and password, used as part of a system security database; each row would hold the password associated with an individual user. Columns of the table often have a type associated with them, defining them as character data, date or time information, integers, or floating point numbers.
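
A minimal sketch of such a table in SQL (the table and column names are invented for the example):

    -- One flat table: each column holds values of a single type,
    -- and each row groups the related values for one user.
    CREATE TABLE system_users (
        name     VARCHAR(50),  -- character data
        password VARCHAR(50),  -- character data
        created  DATE          -- date information
    );

    INSERT INTO system_users VALUES ('alice', 's3cret', DATE '2009-01-15');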

Hierarchical model

Hierarchical Model.[1]

Main article: Hierarchical model

In a hierarchical model, data is organized into a tree-like structure, implying a single upward link in each record to describe the nesting, and a sort field to keep the records in a particular order in each same-level list. Hierarchical structures were widely used in the early mainframe database management systems, such as the Information Management System (IMS) by IBM, and now describe the structure of XML documents. This structure allows one-to-many (1:N) relationships between two types of data. It is very efficient for describing many relationships in the real world: recipes, tables of contents, the ordering of paragraphs or verses, and any other nested and sorted information. However, the hierarchical structure is inefficient for certain database operations when a full path (as opposed to an upward link and sort field) is not also included for each record.

One limitation of the hierarchical model is its inability to efficiently represent redundancy in data. Entity-Attribute-Value database models like Caboodle by Swink are based on this structure.

Parent–child relationship: a child may only have one parent, but a parent can have multiple children. Parents and children are tied together by links called "pointers". A parent will have a list of pointers to each of its children.

Network model

Network Model.[1]

Main article: Network model

The network model (defined by the CODASYL specification) organizes data using two fundamental constructs, called records and sets. Records contain fields (which may be organized hierarchically, as in the programming language COBOL). Sets (not to be confused with mathematical sets) define one-to-many relationships between records: one owner, many members. A record may be an owner in any number of sets, and a member in any number of sets.

The network model is a variation on the hierarchical model, to the extent that it is built on the concept of multiple branches (lower-level structures) emanating from one or more nodes (higher-level structures), while the model differs from the hierarchical model in that branches can be connected to multiple nodes. The network model is able to represent redundancy in data more efficiently than the hierarchical model.


The operations of the network model are navigational in style: a program maintains a current position, and navigates from one record to another by following the relationships in which the record participates. Records can also be located by supplying key values.

Although it is not an essential feature of the model, network databases generally implement the set relationships by means of pointers that directly address the location of a record on disk. This gives excellent retrieval performance, at the expense of operations such as database loading and reorganization.

Most object databases use the navigational concept to provide fast navigation across networks of objects, generally using object identifiers as "smart" pointers to related objects. Objectivity/DB, for instance, implements named 1:1, 1:many, many:1, and many:many relationships that can cross databases. Many object databases also support SQL, combining the strengths of both models.

Relational model

Example of a Relational Model.[1]

The relational model was introduced by E. F. Codd in 1970[2] as a way to make database management systems more independent of any particular application. It is a mathematical model defined in terms of predicate logic and set theory.

The products that are generally referred to as relational databases in fact implement a model that is only an approximation to the mathematical model defined by Codd. Three key terms are used extensively in relational database models: relations, attributes, and domains. A relation is a table with columns and rows. The named columns of the relation are called attributes, and the domain is the set of values the attributes are allowed to take.

The basic data structure of the relational model is the table, where information about a particular entity (say, an employee) is represented in columns and rows (also called tuples). Thus, the "relation" in "relational database" refers to the various tables in the database; a relation is a set of tuples. The columns enumerate the various attributes of the entity (the employee's name, address or phone number, for example), and a row is an actual instance of the entity (a specific employee) that is represented by the relation. As a result, each tuple of the employee table represents various attributes of a single employee.

All relations (and, thus, tables) in a relational database have to adhere to some basic rules to qualify as relations. First, the ordering of columns is immaterial in a table. Second, there can't be identical tuples or rows in a table. And third, each tuple will contain a single value for each of its attributes.

A relational database contains multiple tables, each similar to the one in the "flat" database model. One of the strengths of the relational model is that, in principle, any value occurring in two different records (belonging to the same table or to different tables), implies a relationship among those two records. Yet, in order to enforce explicit integrity constraints, relationships between records in tables can also be defined explicitly, by identifying or non-identifying parent-child relationships characterized by assigning cardinality (1:1, (0)1:M, M:M). Tables can also have a designated single attribute or a set of attributes that can act as a "key", which can be used to uniquely identify each tuple in the table.

A key that can be used to uniquely identify a row in a table is called a primary key. Keys are commonly used to join or combine data from two or more tables. For example, an Employee table may contain a column named Location which contains a value that matches the key of a Location table. Keys are also critical in the creation of indexes, which facilitate fast retrieval of data from large tables. Any column can be a key, or multiple columns can be grouped together into a compound key. It is not necessary to define all the keys in advance; a column can be used as a key even if it was not originally intended to be one.
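
A sketch of these ideas in SQL, using hypothetical Employee and Location tables modeled on the example above:

    -- Location has a primary key; Employee holds a matching column.
    CREATE TABLE Location (
        LocationID INTEGER PRIMARY KEY,
        City       VARCHAR(50)
    );

    CREATE TABLE Employee (
        EmployeeID INTEGER PRIMARY KEY,   -- surrogate key
        Name       VARCHAR(50),
        Location   INTEGER REFERENCES Location(LocationID)
    );

    -- The key is used to join data from the two tables:
    SELECT e.Name, l.City
    FROM Employee e
    JOIN Location l ON e.Location = l.LocationID;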

A key that has an external, real-world meaning (such as a person's name, a book's ISBN, or a car's serial number) is sometimes called a "natural" key. If no natural key is suitable (think of the many people named Brown), an arbitrary or surrogate key can be assigned (such as by giving employees ID numbers). In practice, most databases have both generated and natural keys, because generated keys can be used internally to create links between rows that cannot break, while natural keys can be used, less reliably, for searches and for integration with other databases. (For example, records in two independently developed databases could be matched up by social security number, except when the social security numbers are incorrect, missing, or have changed.)

Dimensional model

The dimensional model is a specialized adaptation of the relational model used to represent data in data warehouses in such a way that data can be easily summarized using OLAP queries. In the dimensional model, a database consists of a single large table of facts that are described using dimensions and measures. A dimension provides the context of a fact (such as who participated, when and where it happened, and its type) and is used in queries to group related facts together. Dimensions tend to be discrete and are often hierarchical; for example, the location might include the building, state, and country. A measure is a quantity describing the fact, such as revenue. It is important that measures can be meaningfully aggregated: for example, the revenue from different locations can be added together.

In an OLAP query, dimensions are chosen and the facts are grouped and added together to create a summary.

The dimensional model is often implemented on top of the relational model using a star schema, consisting of one table containing the facts and surrounding tables containing the dimensions. Particularly complicated dimensions might be represented using multiple tables, resulting in a snowflake schema.

A data warehouse can contain multiple star schemas that share dimension tables, allowing them to be used together. Coming up with a standard set of dimensions is an important part of dimensional modeling.
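
A minimal star schema might look like the following in SQL; the fact and dimension tables are invented for the example:

    -- One fact table surrounded by a dimension table (a star schema).
    CREATE TABLE dim_location (
        location_id INTEGER PRIMARY KEY,
        building    VARCHAR(50),
        state       VARCHAR(50),
        country     VARCHAR(50)
    );

    CREATE TABLE fact_sales (
        location_id INTEGER REFERENCES dim_location(location_id),
        sale_date   DATE,
        revenue     DECIMAL(10,2)   -- an additive measure
    );

    -- An OLAP-style query: group facts by a dimension and aggregate the measure.
    SELECT l.country, SUM(f.revenue) AS total_revenue
    FROM fact_sales f
    JOIN dim_location l ON f.location_id = l.location_id
    GROUP BY l.country;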

Object database models


Example of an Object-Oriented Model.[1]

Main article: Object-relational model
Main article: Object model

In recent years, the object-oriented paradigm has been applied to database technology, creating a new programming model known as object databases. These databases attempt to bring the database world and the application programming world closer together, in particular by ensuring that the database uses the same type system as the application program. This aims to avoid the overhead (sometimes referred to as the impedance mismatch) of converting information between its representation in the database (for example as rows in tables) and its representation in the application program (typically as objects). At the same time, object databases attempt to introduce the key ideas of object programming, such as encapsulation and polymorphism, into the world of databases.

A variety of ways have been tried for storing objects in a database. Some products have approached the problem from the application programming end, by making the objects manipulated by the program persistent. This also typically requires the addition of some kind of query language, since conventional programming languages do not have the ability to find objects based on their information content. Others have attacked the problem from the database end, by defining an object-oriented data model for the database, and defining a database programming language that allows full programming capabilities as well as traditional query facilities.

Object databases suffered because of a lack of standardization: although standards were defined by ODMG, they were never implemented well enough to ensure interoperability between products. Nevertheless, object databases have been used successfully in many applications: usually specialized applications such as engineering databases or molecular biology databases rather than mainstream commercial data processing. However, object database ideas were picked up by the relational vendors and influenced extensions made to these products and indeed to the SQL language.

See also

Associative
Concept-oriented
Entity-attribute-value
Information model
Multi-dimensional model
Semantic data model
Semi-structured
Star schema
XML database

Network model

The network model is a database model conceived as a flexible way of representing objects and their relationships.

Example of a Network Model.

The network model's original inventor was Charles Bachman, and it was developed into a standard specification published in 1969 by the CODASYL Consortium.

Overview


Where the hierarchical model structures data as a tree of records, with each record having one parent record and many children, the network model allows each record to have multiple parent and child records, forming a lattice structure.

The chief argument in favour of the network model, in comparison to the hierarchic model, was that it allowed a more natural modeling of relationships between entities. Although the model was widely implemented and used, it failed to become dominant for two main reasons. Firstly, IBM chose to stick to the hierarchical model with semi-network extensions in their established products such as IMS and DL/I. Secondly, it was eventually displaced by the relational model, which offered a higher-level, more declarative interface. Until the early 1980s the performance benefits of the low-level navigational interfaces offered by hierarchical and network databases were persuasive for many large-scale applications, but as hardware became faster, the extra productivity and flexibility of the relational model led to the gradual obsolescence of the network model in corporate enterprise usage.

Some Well-known Network Databases

TurboIMAGE
IDMS (Integrated Database Management System)
RDM Embedded
RDM Server

History

In 1969, the Conference on Data Systems Languages (CODASYL) established the first specification of the network database model. This was followed by a second publication in 1971, which became the basis for most implementations. Subsequent work continued into the early 1980s, culminating in an ISO specification, but this had little influence on products.

See also

CODASYL
Navigational database
Semantic Web

Relational model


The relational model for database management is a database model based on first-order predicate logic, first formulated and proposed in 1969 by E.F. Codd.[1][2][3]

Example of a Relational model.[4]

Overview

Its core idea is to describe a database as a collection of predicates over a finite set of predicate variables, describing constraints on the possible values and combinations of values. The content of the database at any given time is a finite (logical) model of the database, i.e. a set of relations, one per predicate variable, such that all predicates are satisfied. A request for information from the database (a database query) is also a predicate.

Relational model concepts.


In the relational model, related records are linked together with a "key".

The purpose of the relational model is to provide a declarative method for specifying data and queries: we directly state what information the database contains and what information we want from it, and let the database management system software take care of describing data structures for storing the data and retrieval procedures for getting queries answered.

IBM's original implementation of Codd's ideas was System R. There have been several commercial and open source products based on Codd's ideas, including IBM's DB2, Oracle Database, Microsoft SQL Server, PostgreSQL, MySQL, and many others. Most of these use the SQL data definition and query language. A table in an SQL database schema corresponds to a predicate variable; the contents of a table to a relation; key constraints, other constraints, and SQL queries correspond to predicates. However, SQL databases, including DB2, deviate from the relational model in many details; Codd fiercely argued against deviations that compromise the original principles.[5]

Alternatives to the relational model

Other models are the hierarchical model and network model. Some systems using these older architectures are still in use today in data centers with high data volume needs, or where existing systems are so complex and abstract that it would be cost-prohibitive to migrate to systems employing the relational model. Also of note are newer object-oriented databases. A recent development is the Object-Relation type-Object model, which is based on the assumption that any fact can be expressed in the form of one or more binary relationships. The model is used in Object Role Modeling (ORM), RDF/Notation 3 (N3), and in Gellish English.

The relational model was the first formal database model. After it was defined, informal models were made to describe hierarchical databases (the hierarchical model) and network databases (the network model). Hierarchical and network databases existed before relational databases, but were only described as models after the relational model was defined, in order to establish a basis for comparison.

Implementation

There have been several attempts to produce a true implementation of the relational database model as originally defined by Codd and explained by Date, Darwen and others, but none have been popular successes so far. Rel is one of the more recent attempts to do this.

History

The relational model was invented by E.F. (Ted) Codd as a general model of data, and subsequently maintained and developed by Chris Date and Hugh Darwen among others. In The Third Manifesto (first published in 1995) Date and Darwen show how the relational model can accommodate certain desired object-oriented features.

Controversies

Codd himself, some years after publication of his 1970 model, proposed a three-valued logic (True, False, Missing or NULL) version of it in order to deal with missing information, and in his The Relational Model for Database Management Version 2 (1990) he went a step further with a four-valued logic (True, False, Missing but Applicable, Missing but Inapplicable) version. But these have never been implemented, presumably because of the attendant complexity. SQL's NULL construct was intended to be part of a three-valued logic system, but fell short of that due to logical errors in the standard and in its implementations.

Relational model topics

The model

The fundamental assumption of the relational model is that all data is represented as mathematical n-ary relations, an n-ary relation being a subset of the Cartesian product of n domains. In the mathematical model, reasoning about such data is done in two-valued predicate logic, meaning there are two possible evaluations for each proposition: either true or false (and in particular no third value such as unknown, or not applicable, either of which are often associated with the concept of NULL). Some think two-valued logic is an important part of the relational model, while others think a system that uses a form of three-valued logic can still be considered relational.

Data are operated upon by means of a relational calculus or relational algebra, these being equivalent in expressive power.

The relational model of data permits the database designer to create a consistent, logical representation of information. Consistency is achieved by including declared constraints in the database design, which is usually referred to as the logical schema. The theory includes a process of database normalization whereby a design with certain desirable properties can be selected from a set of logically equivalent alternatives. The access plans and other implementation and operation details are handled by the DBMS engine, and are not reflected in the logical model. This contrasts with common practice for SQL DBMSs in which performance tuning often requires changes to the logical model.

The basic relational building block is the domain or data type, usually abbreviated nowadays to type. A tuple is an unordered set of attribute values. An attribute is an ordered pair of attribute name and type name. An attribute value is a specific valid value for the type of the attribute. This can be either a scalar value or a more complex type.

A relation consists of a heading and a body. A heading is a set of attributes. A body (of an n-ary relation) is a set of n-tuples. The heading of the relation is also the heading of each of its tuples.

A relation is defined as a set of n-tuples. In both mathematics and the relational database model, a set is an unordered collection of items, although some DBMSs impose an order on their data. In mathematics, a tuple has an order, and allows for duplication. E.F. Codd originally defined tuples using this mathematical definition.[6] Later, it was one of E.F. Codd's great insights that using attribute names instead of an ordering would be so much more convenient (in general) in a computer language based on relations. This insight is still being used today. Though the concept has changed, the name "tuple" has not. An immediate and important consequence of this distinguishing feature is that in the relational model the Cartesian product becomes commutative.


A table is an accepted visual representation of a relation; a tuple is similar to the concept of a row, but note that in the database language SQL the columns and the rows of a table are ordered.

A relvar is a named variable of some specific relation type, to which at all times some relation of that type is assigned, though the relation may contain zero tuples.

The basic principle of the relational model is the Information Principle: all information is represented by data values in relations. In accordance with this Principle, a relational database is a set of relvars and the result of every query is presented as a relation.

The consistency of a relational database is enforced, not by rules built into the applications that use it, but rather by constraints, declared as part of the logical schema and enforced by the DBMS for all applications. In general, constraints are expressed using relational comparison operators, of which just one, "is subset of" (⊆), is theoretically sufficient. In practice, several useful shorthands are expected to be available, of which the most important are candidate key (really, superkey) and foreign key constraints.

Interpretation

To fully appreciate the relational model of data it is essential to understand the intended interpretation of a relation.

The body of a relation is sometimes called its extension. This is because it is to be interpreted as a representation of the extension of some predicate, this being the set of true propositions that can be formed by replacing each free variable in that predicate by a name (a term that designates something).

There is a one-to-one correspondence between the free variables of the predicate and the attribute names of the relation heading. Each tuple of the relation body provides attribute values to instantiate the predicate by substituting each of its free variables. The result is a proposition that is deemed, on account of the appearance of the tuple in the relation body, to be true. Contrariwise, every tuple whose heading conforms to that of the relation but which does not appear in the body is deemed to be false. This assumption is known as the closed world assumption.

For a formal exposition of these ideas, see the section Set Theory Formulation, below.


Application to databases

A type as used in a typical relational database might be the set of integers, the set of character strings, the set of dates, or the two boolean values true and false, and so on. The corresponding type names for these types might be the strings "int", "char", "date", "boolean", etc. It is important to understand, though, that relational theory does not dictate what types are to be supported; indeed, nowadays provisions are expected to be available for user-defined types in addition to the built-in ones provided by the system.

Attribute is the term used in the theory for what is commonly referred to as a column. Similarly, table is commonly used in place of the theoretical term relation (though in SQL the term is by no means synonymous with relation). A table data structure is specified as a list of column definitions, each of which specifies a unique column name and the type of the values that are permitted for that column. An attribute value is the entry in a specific column and row, such as "John Doe" or "35".

A tuple is basically the same thing as a row, except in an SQL DBMS, where the column values in a row are ordered. (Tuples are not ordered; instead, each attribute value is identified solely by the attribute name and never by its ordinal position within the tuple.) An attribute name might be "name" or "age".

A relation is a table structure definition (a set of column definitions) along with the data appearing in that structure. The structure definition is the heading and the data appearing in it is the body, a set of rows. A database relvar (relation variable) is commonly known as a base table. The heading of its assigned value at any time is as specified in the table declaration and its body is that most recently assigned to it by invoking some update operator (typically, INSERT, UPDATE, or DELETE). The heading and body of the table resulting from evaluation of some query are determined by the definitions of the operators used in the expression of that query. (Note that in SQL the heading is not always a set of column definitions as described above, because it is possible for a column to have no name and also for two or more columns to have the same name. Also, the body is not always a set of rows because in SQL it is possible for the same row to appear more than once in the same body.)

SQL and the relational model

SQL, initially pushed as the standard language for relational databases, deviates from the relational model in several places. The current ISO SQL standard doesn't mention the relational model or use relational terms or concepts. However, it is possible to create a database conforming to the relational model using SQL if one does not use certain SQL features.

The following deviations from the relational model have been noted in SQL. Note that few database servers implement the entire SQL standard, and in particular many do not allow some of these deviations: whereas NULL is ubiquitous, for example, allowing duplicate column names within a table or anonymous columns is uncommon.

Duplicate rows: The same row can appear more than once in an SQL table. The same tuple cannot appear more than once in a relation.

Anonymous columns: A column in an SQL table can be unnamed and thus unable to be referenced in expressions. The relational model requires every attribute to be named and referenceable.

Duplicate column names: Two or more columns of the same SQL table can have the same name and therefore cannot be referenced, on account of the obvious ambiguity. The relational model requires every attribute to be referenceable.

Column order significance: The order of columns in an SQL table is defined and significant, one consequence being that SQL's implementations of Cartesian product and union are both noncommutative. The relational model requires there to be no significance to any ordering of the attributes of a relation.

Views without CHECK OPTION: Updates to a view defined without CHECK OPTION can be accepted, but the resulting update to the database does not necessarily have the expressed effect on its target. For example, an invocation of INSERT can be accepted but the inserted rows might not all appear in the view, or an invocation of UPDATE can result in rows disappearing from the view. The relational model requires updates to a view to have the same effect as if the view were a base relvar.

Columnless tables unrecognized: SQL requires every table to have at least one column, but there are two relations of degree zero (of cardinality one and zero) and they are needed to represent extensions of predicates that contain no free variables.

NULL: This special mark can appear instead of a value wherever a value can appear in SQL, in particular in place of a column value in some row (see the example after this list). The deviation from the relational model arises from the fact that the implementation of this ad hoc concept in SQL involves the use of three-valued logic, under which the comparison of NULL with itself does not yield true but instead yields the third truth value, unknown; similarly, the comparison of NULL with something other than itself does not yield false but instead yields unknown. It is because of this behaviour in comparisons that NULL is described as a mark rather than a value. The relational model depends on the law of excluded middle, under which anything that is not true is false and anything that is not false is true; it also requires every tuple in a relation body to have a value for every attribute of that relation. This particular deviation is disputed by some, if only because E.F. Codd himself eventually advocated the use of special marks and a 4-valued logic; but this was based on his observation that there are two distinct reasons why one might want to use a special mark in place of a value, which led opponents of the use of such logics to discover more distinct reasons (at least 19 have been noted, which would require a 21-valued logic). SQL itself uses NULL for several purposes other than to represent "value unknown". For example, the sum of the empty set is NULL, meaning zero, the average of the empty set is NULL, meaning undefined, and NULL appearing in the result of a LEFT JOIN can mean "no value because there is no matching row in the right-hand operand".

Concepts: SQL uses the concepts "table", "column", and "row" instead of "relvar", "attribute", and "tuple". These are not merely differences in terminology. For example, a "table" may contain duplicate rows, whereas the same tuple cannot appear more than once in a relation.
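
The behaviour of NULL in comparisons can be demonstrated with a short sketch (the table t and column col are hypothetical):

    -- NULL compared with anything, including itself, yields unknown,
    -- so neither predicate below ever selects a row:
    SELECT * FROM t WHERE col = NULL;    -- never true
    SELECT * FROM t WHERE col <> NULL;   -- never true either

    -- The IS NULL predicate must be used instead:
    SELECT * FROM t WHERE col IS NULL;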

Relational operations

Users (or programs) request data from a relational database by sending it a query that is written in a special language, usually a dialect of SQL. Although SQL was originally intended for end-users, it is much more common for SQL queries to be embedded into software that provides an easier user interface. Many web sites, such as Wikipedia, perform SQL queries when generating pages.

In response to a query, the database returns a result set, which is just a list of rows containing the answers. The simplest query is just to return all the rows from a table, but more often, the rows are filtered in some way to return just the answer wanted.


Often, data from multiple tables are combined into one, by doing a join. Conceptually, this is done by taking all possible combinations of rows (the Cartesian product), and then filtering out everything except the answer. In practice, relational database management systems rewrite ("optimize") queries to perform faster, using a variety of techniques.

There are a number of relational operations in addition to join. These include project (the process of eliminating some of the columns), restrict (the process of eliminating some of the rows), union (a way of combining two tables with similar structures), difference (which lists the rows in one table that are not found in the other), intersect (which lists the rows found in both tables), and product (mentioned above, which combines each row of one table with each row of the other). Depending on which other sources you consult, there are a number of other operators, many of which can be defined in terms of those listed above. These include semi-join, outer operators such as outer join and outer union, and various forms of division. There are also operators to rename columns, and summarizing or aggregating operators; and if you permit relation values as attributes (RVA, relation-valued attribute), then operators such as group and ungroup. The SELECT statement in SQL serves to handle all of these except for the group and ungroup operators.
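
For illustration, a join can be written in SQL either as an explicit JOIN or, conceptually, as a restriction of the Cartesian product; the Orders and Customers tables are hypothetical, and a real optimizer would not materialize the full product:

    -- Cartesian product followed by restriction ...
    SELECT o.OrderNo, c.Name
    FROM Orders o, Customers c
    WHERE o.CustomerID = c.CustomerID;

    -- ... is equivalent to an explicit join.
    SELECT o.OrderNo, c.Name
    FROM Orders o
    JOIN Customers c ON o.CustomerID = c.CustomerID;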

The flexibility of relational databases allows programmers to write queries that were not anticipated by the database designers. As a result, relational databases can be used by multiple applications in ways the original designers did not foresee, which is especially important for databases that might be used for a long time (perhaps several decades). This has made the idea and implementation of relational databases very popular with businesses.

Database normalization

Main article: Database normalization

Relations are classified based upon the types of anomalies to which they're vulnerable. A database that's in the first normal form is vulnerable to all types of anomalies, while a database that's in the domain/key normal form has no modification anomalies. Normal forms are hierarchical in nature. That is, the lowest level is the first normal form, and the database cannot meet the requirements for higher level normal forms without first having met all the requirements of the lesser normal forms.[7]
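
As a sketch of the idea, the following hypothetical design moves a functional dependency (each supplier has exactly one city) out of an orders table into its own relation, removing the update anomaly that comes from repeating the supplier's city on every order row:

    -- Normalized design: each supplier's city is stored exactly once.
    CREATE TABLE supplier (
        supplier_name VARCHAR(50) PRIMARY KEY,
        city          VARCHAR(50)
    );

    CREATE TABLE orders (
        order_no      INTEGER PRIMARY KEY,
        supplier_name VARCHAR(50) REFERENCES supplier(supplier_name),
        qty           INTEGER
    );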


Examples

Database

An idealized, very simple example of a description of some relvars and their attributes:

Customer(Customer ID, Tax ID, Name, Address, City, State, Zip, Phone)

Order(Order No, Customer ID, Invoice No, Date Placed, Date Promised, Terms, Status)

Order Line(Order No, Order Line No, Product Code, Qty)
Invoice(Invoice No, Customer ID, Order No, Date, Status)
Invoice Line(Invoice No, Invoice Line No, Product Code, Qty Shipped)
Product(Product Code, Product Description)

In this design we have six relvars: Customer, Order, Order Line, Invoice, Invoice Line and Product. The bold, underlined attributes are candidate keys. The non-bold, underlined attributes are foreign keys.

Usually one candidate key is arbitrarily chosen to be called the primary key and used in preference over the other candidate keys, which are then called alternate keys.

A candidate key is a unique identifier enforcing that no tuple will be duplicated; duplication would make the relation into something else, namely a bag, by violating the basic definition of a set. Both foreign keys and superkeys (which include candidate keys) can be composite, that is, they can be composed of several attributes. Below is a tabular depiction of a relation of our example Customer relvar; a relation can be thought of as a value that can be attributed to a relvar.

Customer relation

Customer ID    Tax ID         Name         Address            [More fields....]
1234567890     555-5512222    Munmun       323 Broadway       ...
2223344556     555-5523232    SS4 Vegeta   1200 Main Street   ...
3334445563     555-5533323    Ekta         871 1st Street     ...
4232342432     555-5325523    E. F. Codd   123 It Way         ...

If we attempted to insert a new customer with the ID 1234567890, this would violate the design of the relvar since Customer ID is a primary key and we already have a customer 1234567890. The DBMS must reject a transaction such as this that would render the database inconsistent by a violation of an integrity constraint.

Foreign keys are integrity constraints enforcing that the value of the attribute set is drawn from a candidate key in another relation. For example in the Order relation the attribute Customer ID is a foreign key. A join is the operation that draws on information from several relations at once. By joining relvars from the example above we could query the database for all of the Customers, Orders, and Invoices. If we only wanted the tuples for a specific customer, we would specify this using a restriction condition.

If we wanted to retrieve all of the Orders for Customer 1234567890, we could query the database to return every row in the Order table with Customer ID 1234567890 and join the Order table to the Order Line table based on Order No.

There is a flaw in our database design above. The Invoice relvar contains an Order No attribute. So, each tuple in the Invoice relvar will have one Order No, which implies that there is precisely one Order for each Invoice. But in reality an invoice can be created against many orders, or indeed for no particular order. Additionally the Order relvar contains an Invoice No attribute, implying that each Order has a corresponding Invoice. But again this is not always true in the real world. An order is sometimes paid through several invoices, and sometimes paid without an invoice. In other words there can be many Invoices per Order and many Orders per Invoice. This is a many-to-many relationship between Order and Invoice (also called a non-specific relationship). To represent this relationship in the database a new relvar should be introduced whose role is to specify the correspondence between Orders and Invoices:

OrderInvoice(Order No, Invoice No)

Now, the Order relvar has a one-to-many relationship to the OrderInvoice table, as does the Invoice relvar. If we want to retrieve every Invoice for a particular Order, we can query for all orders where Order No in the Order relation equals the Order No in OrderInvoice, and where Invoice No in OrderInvoice equals the Invoice No in Invoice.
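
In SQL, the associative relvar and the query just described might be sketched as follows. The identifiers are simplified to OrderNo and InvoiceNo, the Order relvar is renamed Orders because ORDER is a reserved word, and the parent tables are assumed to exist:

    -- Associative table resolving the many-to-many relationship.
    CREATE TABLE OrderInvoice (
        OrderNo   INTEGER REFERENCES Orders(OrderNo),
        InvoiceNo INTEGER REFERENCES Invoice(InvoiceNo),
        PRIMARY KEY (OrderNo, InvoiceNo)   -- composite key
    );

    -- Every Invoice for a particular Order:
    SELECT i.*
    FROM Invoice i
    JOIN OrderInvoice oi ON oi.InvoiceNo = i.InvoiceNo
    WHERE oi.OrderNo = 1001;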

Set-theoretic formulation

Basic notions in the relational model are relation names and attribute names. We will represent these as strings such as "Person" and "name", and we will usually use the variables r, s, t and a, b, c to range over them. Another basic notion is the set of atomic values that contains values such as numbers and strings.

Our first definition concerns the notion of tuple, which formalizes the notion of row or record in a table:

Tuple: A tuple is a partial function from attribute names to atomic values.

Header: A header is a finite set of attribute names.

Projection: The projection of a tuple t on a finite set of attributes A is t[A] = { (a, v) : (a, v) ∈ t, a ∈ A }.

The next definition defines a relation, which formalizes the contents of a table as it is defined in the relational model.

Relation: A relation is a tuple (H, B) with H the header and B the body, a set of tuples that all have the domain H.

Such a relation closely corresponds to what is usually called the extension of a predicate in first-order logic except that here we identify the places in the predicate with attribute names. Usually in the relational model a database schema is said to consist of a set of relation names, the headers that are associated with these names and the constraints that should hold for every instance of the database schema.

Relation universe: A relation universe U over a header H is a non-empty set of relations with header H.

Relation schema: A relation schema (H, C) consists of a header H and a predicate C(R) that is defined for all relations R with header H. A relation satisfies a relation schema (H, C) if it has header H and satisfies C.

Key constraints and functional dependencies

One of the simplest and most important types of relation constraints is the key constraint. It tells us that in every instance of a certain relational schema the tuples can be identified by their values for certain attributes.

Superkey: A superkey is written as a finite set of attribute names. A superkey K holds in a relation (H, B) if K ⊆ H and there exist no two distinct tuples t1, t2 ∈ B such that t1[K] = t2[K]. A superkey holds in a relation universe U if it holds in all relations in U.

Theorem: A superkey K holds in a relation universe U over H if and only if K ⊆ H and K → H holds in U.

Candidate key: A superkey K holds as a candidate key for a relation universe U if it holds as a superkey for U and there is no proper subset of K that also holds as a superkey for U.

Functional dependency: A functional dependency (FD for short) is written as X → Y for X, Y finite sets of attribute names. A functional dependency X → Y holds in a relation (H, B) if X, Y ⊆ H and, for all tuples t1, t2 ∈ B, t1[X] = t2[X] implies t1[Y] = t2[Y]. A functional dependency holds in a relation universe U if it holds in all relations in U.

Trivial functional dependency: A functional dependency is trivial under a header H if it holds in all relation universes over H.

Theorem: An FD X → Y is trivial under a header H if and only if Y ⊆ X ⊆ H.

Closure

Armstrong's axioms: The closure of a set of FDs S under a header H, written as S+, is the smallest superset of S such that:

X → Y ∈ S+ if Y ⊆ X ⊆ H (reflexivity)
X → Z ∈ S+ if X → Y ∈ S+ and Y → Z ∈ S+ (transitivity)
(X ∪ Z) → (Y ∪ Z) ∈ S+ if X → Y ∈ S+ and Z ⊆ H (augmentation)

Theorem: Armstrong's axioms are sound and complete; given a header H and a set S of FDs that only contain subsets of H, X → Y ∈ S+ if and only if X → Y holds in all relation universes over H in which all FDs in S hold.

Completion: The completion of a finite set of attributes X under a finite set of FDs S, written as X+, is the smallest superset of X such that Y → Z ∈ S and Y ⊆ X+ imply Z ⊆ X+. The completion of an attribute set can be used to compute whether a certain dependency is in the closure of a set of FDs.

Theorem: Given a set S of FDs, X → Y ∈ S+ if and only if Y ⊆ X+.

Irreducible cover

An irreducible cover of a set S of FDs is a set T of FDs such that:

S+ = T+;
there exists no proper subset U of T such that S+ = U+;
Y is a singleton set for every X → Y in T; and
(X − {a}) → Y ∉ S+ for every X → Y in T and every attribute a in X.

Algorithm to derive candidate keys from functional dependencies

INPUT: a set S of FDs that contain only subsets of a header H
OUTPUT: the set C of superkeys that hold as candidate keys in all relation universes over H in which all FDs in S hold

begin
  C := ∅;       // found candidate keys
  Q := { H };   // superkeys that contain candidate keys
  while Q <> ∅ do
    let K be some element from Q;
    Q := Q - { K };
    minimal := true;
    for each X -> Y in S do
      K' := (K - Y) ∪ X;   // derive new superkey
      if K' ⊂ K then
        minimal := false;
        Q := Q ∪ { K' };
      end if
    end for
    if minimal and there is not a subset of K in C then
      remove all supersets of K from C;
      C := C ∪ { K };
    end if
  end while
end

See also

Domain relational calculus
Relation
Life cycle of a relational database
List of relational database management systems
Query language
  o Database query language
  o Information retrieval query language
Relational database
Relational database management system
The Third Manifesto (TTM)
TransRelational model
Tuple-versioning

Query language

Query languages are computer languages used to make queries into databases and information systems.

Broadly, query languages can be classified according to whether they are database query languages or information retrieval query languages. Examples include:

.QL is a proprietary object-oriented query language for querying relational databases;
Common Query Language (CQL) is a formal language for representing queries to information retrieval systems such as web indexes or bibliographic catalogues;
CODASYL;
CxQL is the query language used for writing and customizing queries on CxAudit by Checkmarx;
D is a query language for truly relational database management systems (TRDBMS);
DMX is a query language for Data Mining models;
Datalog is a query language for deductive databases;
ERROL is a query language over the Entity-relationship model (ERM) which mimics major natural language constructs (of the English language and possibly other languages). It is especially tailored for relational databases;
Gellish English is a language that can be used for queries in Gellish English Databases [1], for dialogues (requests and responses) as well as for information modeling and knowledge modeling;
ISBL is a query language for PRTV, one of the earliest relational database management systems;
LDAP is an application protocol for querying and modifying directory services running over TCP/IP;
MQL is a cheminformatics query language for a substructure search, allowing numerical properties besides nominal ones;
MDX is a query language for OLAP databases;
OQL is Object Query Language;
OCL (Object Constraint Language); despite its name, OCL is also an object query language and an OMG standard;
OPath, intended for use in querying WinFS stores;
Poliqarp Query Language is a special query language designed to analyze annotated text, used in the Poliqarp search engine;
QUEL is a relational database access language, similar in most ways to SQL;
SMARTS is the cheminformatics standard for a substructure search;
SPARQL is a query language for RDF graphs;
SQL is a well known query language for relational databases;
SuprTool is a proprietary query language for SuprTool, a database access program used for accessing data in Image/SQL (TurboIMAGE) and Oracle databases;
TMQL (Topic Map Query Language) is a query language for Topic Maps;
XQuery is a query language for XML data sources;
XPath is a language for navigating XML documents;
XSQL

Hierarchical model

A hierarchical data model is a data model in which the data is organized into a tree-like structure. The structure allows repeating information using parent/child relationships: each parent can have many children but each child only has one parent. All attributes of a specific record are listed under an entity type.


Example of a Hierarchical Model.

In a database, an entity type is the equivalent of a table; each individual record is represented as a row, and an attribute as a column. Entity types are related to each other using 1:N mappings, also known as one-to-many relationships. The most recognized example of a hierarchical model database is IMS, designed by IBM.

History

Prior to the development of the first database management system (DBMS), access to data was provided by application programs that accessed flat files. Data integrity problems and the inability of such file processing systems to represent logical data relationships led to the first data model: the hierarchical data model. This model, which was implemented primarily by IBM's Information Management System (IMS), only allows one-to-one or one-to-many relationships between entities. Any entity at the many end of the relationship can be related only to one entity at the one end.[1]

A relational database implementation of this type of data model was first discussed in publication form in 1992[2] (see also nested set model).

Example

An example of a hierarchical data model would be if an organization had records of employees in a table (entity type) called "Employees". In the table there would be attributes/columns such as First Name, Last Name, Job Name and Wage. The company also has data about the employee’s children in a separate table called "Children" with attributes such as First Name, Last Name, and date of birth. The Employee table represents a parent segment and the Children table represents a Child segment. These two segments form a hierarchy where an employee may have many children, but each child may only have one parent.

Consider the following structure:

EmpNo   Designation      ReportsTo
10      Director
20      Senior Manager   10
30      Typist           20
40      Programmer       20

In this structure, the "child" is of the same type as the "parent". The hierarchy (EmpNo 10 is the boss of 20, and 30 and 40 each report to 20) is represented by the ReportsTo column. In relational database terms, the ReportsTo column is a foreign key referencing the EmpNo column. If the "child" data type were different, it would be in a different table, but there would still be a foreign key referencing the EmpNo column of the employees table.

This simple model is commonly known as the adjacency list model, and was introduced by Dr. Edgar F. Codd after initial criticisms surfaced that the relational model could not model hierarchical data.
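
A sketch of this adjacency list in SQL, using the structure above (the table name is illustrative):

    -- Self-referencing foreign key: each row points at its parent.
    CREATE TABLE Employee (
        EmpNo       INTEGER PRIMARY KEY,
        Designation VARCHAR(50),
        ReportsTo   INTEGER REFERENCES Employee(EmpNo)  -- NULL for the Director
    );

    INSERT INTO Employee VALUES (10, 'Director',       NULL);
    INSERT INTO Employee VALUES (20, 'Senior Manager', 10);
    INSERT INTO Employee VALUES (30, 'Typist',         20);
    INSERT INTO Employee VALUES (40, 'Programmer',     20);

    -- Everyone reporting directly to employee 20:
    SELECT EmpNo, Designation FROM Employee WHERE ReportsTo = 20;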


Entity-relationship model

In software engineering, an Entity-Relationship Model (ERM) is an abstract and conceptual representation of data. Entity-relationship modeling is a database modeling method, used to produce a type of conceptual schema or semantic data model of a system, often a relational database, and its requirements in a top-down fashion.

Diagrams created using this process are called entity-relationship diagrams, or ER diagrams or ERDs for short.

The definitive reference for entity relationship modelling is generally given as Peter Chen's 1976 paper[1]. However, variants of the idea existed previously (see for example A.P.G. Brown[2]) and have been devised subsequently.



Overview

The first stage of information system design uses these models during the requirements analysis to describe information needs or the type of information that is to be stored in a database. The data modeling technique can be used to describe any ontology (i.e. an overview and classifications of used terms and their relationships) for a certain universe of discourse (i.e. area of interest). In the case of the design of an information system that is based on a database, the conceptual data model is, at a later stage (usually called logical design), mapped to a logical data model, such as the relational model; this in turn is mapped to a physical model during physical design. Note that sometimes, both of these phases are referred to as "physical design".

There are a number of conventions for entity-relationship diagrams (ERDs). The classical notation is described in the remainder of this article, and mainly relates to conceptual modeling. There are a range of notations more typically employed in logical and physical database design, such as IDEF1X.

Connection

Two related entities

An entity with an attribute

A relationship with an attribute

Primary key

An entity may be defined as a thing which is recognized as being capable of an independent existence and which can be uniquely identified. An entity is an abstraction from the complexities of some domain. When we speak of an entity, we normally speak of some aspect of the real world which can be distinguished from other aspects of the real world.[3]

An entity may be a physical object such as a house or a car, an event such as a house sale or a car service, or a concept such as a customer transaction or order. Although the term entity is the one most commonly used, following Chen we should really distinguish between an entity and an entity-type. An entity-type is a category. An entity, strictly speaking, is an instance of a given entity-type. There are usually many instances of an entity-type. Because the term entity-type is somewhat cumbersome, most people tend to use the term entity as a synonym for this term.

Entities can be thought of as nouns. Examples: a computer, an employee, a song, a mathematical theorem. Entities are represented as rectangles.

A relationship captures how two or more entities are related to one another. Relationships can be thought of as verbs, linking two or more nouns. Examples: an owns relationship between a company and a computer, a supervises relationship between an employee and a department, a performs relationship between an artist and a song, a proved relationship between a mathematician and a theorem. Relationships are represented as diamonds, connected by lines to each of the entities in the relationship.

Entities and relationships can both have attributes. Examples: an employee entity might have a Social Security Number (SSN) attribute; the proved relationship may have a date attribute. Attributes are represented as ellipses connected to their owning entity sets by a line.

Every entity (unless it is a weak entity) must have a minimal set of uniquely identifying attributes, which is called the entity's primary key.

Entity-relationship diagrams don't show single entities or single instances of relations. Rather, they show entity sets and relationship sets. Example: a particular song is an entity. The collection of all songs in a database is an entity set. The eaten relationship between a child and her lunch is a single relationship. The set of all such child-lunch relationships in a database is a relationship set.

Lines are drawn between entity sets and the relationship sets they are involved in. If all entities in an entity set must participate in the relationship set, a thick or double line is drawn. This is called a participation constraint. If each entity of the entity set can participate in at most one relationship in the relationship set, an arrow is drawn from the entity set to the relationship set. This is called a key constraint. To indicate that each entity in the entity set is involved in exactly one relationship, a thick arrow is drawn.
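To make the mapping concrete, the following is a minimal sketch of how entity sets, relationship sets, attributes, and keys of the kind described above might be translated into SQL tables. The names (Department, Employee, WorksOn) are illustrative assumptions, not taken from any particular diagram:

CREATE TABLE Department (
    DeptNo INTEGER     PRIMARY KEY,   -- primary key: a uniquely identifying attribute
    Name   VARCHAR(64) NOT NULL
);

-- Key constraint: each employee is supervised by at most one department, so the
-- relationship can be folded into the Employee table itself; NOT NULL additionally
-- expresses total participation (every employee must take part in the relationship).
CREATE TABLE Employee (
    SSN          CHAR(9)     PRIMARY KEY,
    Name         VARCHAR(64) NOT NULL,
    SupervisedBy INTEGER     NOT NULL REFERENCES Department (DeptNo)
);

-- A many-to-many relationship set gets a table of its own; an attribute of the
-- relationship (here, a date) becomes a column of that table.
CREATE TABLE WorksOn (
    SSN    CHAR(9) REFERENCES Employee (SSN),
    DeptNo INTEGER REFERENCES Department (DeptNo),
    Since  DATE,
    PRIMARY KEY (SSN, DeptNo)
);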

Alternative diagramming conventions

[Figure: two related entities shown using Crow's Foot notation]

Chen's notation for entity-relationship modelling uses rectangles to represent entities, and diamonds to represent relationships. This notation is appropriate because Chen's relationships are first-class objects: they can have attributes and relationships of their own.


Alternative conventions, some of them of mainly historical interest, include:

IDEF1X[4]
the Bachman notation of Charles Bachman
the Martin notation of James Martin
the (min, max)-notation of Jean-Raymond Abrial (1974)
the UML standard
the EXPRESS language

Crow's Foot

One alternative notation, known as "crow's foot" notation, was developed independently: in these diagrams, entities are represented by boxes, and relationships by labelled arcs.

The "Crow's Foot" notation represents relationships with connecting lines between entities, and pairs of symbols at the ends of those lines to represent the cardinality of the relationship. Crow's Foot notation is used in Barker's Notation and in methodologies such as SSADM and Information Engineering.

For a while Chen's notation was more popular in the United States, while Crow's Foot notation was used primarily in the UK, being used in the 1980s by the then-influential consultancy practice CACI. Many of the consultants at CACI (including Barker) subsequently moved to Oracle UK, where they developed the early versions of Oracle's CASE tools; this had the effect of introducing the notation to a wider audience, and it is now used in many tools including System Architect, Visio, PowerDesigner, Toad Data Modeler, DeZign for Databases, OmniGraffle, MySQL Workbench and Dia. Crow's foot notation has the following benefits:

Clarity in identifying the many, or child, side of the relationship, using the crow's foot.

Concise notation for identifying a mandatory relationship, using a perpendicular bar, or an optional relationship, using an open circle (see the sketch after this list).

An overall notation that remains clear and concise.
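As a rough sketch with hypothetical tables (the article itself defines none), the mandatory/optional distinction marked by the bar and the circle typically surfaces in SQL as the nullability of a foreign key, while the crow's foot (the many side) is expressed by which table carries that key:

CREATE TABLE Customer (
    CustId INTEGER PRIMARY KEY
);

-- Invoice is the many (crow's foot) side: it carries the foreign keys.
CREATE TABLE Invoice (
    InvoiceNo  INTEGER PRIMARY KEY,
    -- mandatory relationship (perpendicular bar): NOT NULL foreign key
    CustId     INTEGER NOT NULL REFERENCES Customer (CustId),
    -- optional relationship (open circle): nullable foreign key
    ReferredBy INTEGER REFERENCES Customer (CustId)
);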

ER diagramming tools

There are many ER diagramming tools. Some of the proprietary ER diagramming tools are Avolution, dbForge Studio for MySQL, DeZign for Databases, ConceptDraw, ER/Studio, ERwin, MEGA International, OmniGraffle, Oracle Designer, PowerDesigner, Rational Rose, SmartDraw, Sparx Enterprise Architect, SQLyog, Toad Data Modeler, Microsoft Visio, and Visual Paradigm. A freeware ER tool that can generate database and application layer code (webservices) is the RISE Editor.

Some free software ER diagramming tools that can interpret and generate ER models, generate SQL, and perform database analysis are StarUML, MySQL Workbench, and SchemaSpy[5]. Other free diagram tools, such as Kivio and Dia, only draw the shapes, without any knowledge of what they mean and without generating SQL, although Dia diagrams can be translated with tedia2sql.

See also

Associative entity
Data model
Data structure diagram
Enhanced Entity-Relationship Model
Object Role Modeling
Three schema approach
Unified Modeling Language
Value range structure diagrams

Object-relational database

An object-relational database (ORD) or object-relational database management system (ORDBMS) is a database management system (DBMS) similar to a relational database, but with an object-oriented database model: objects, classes and inheritance are directly supported in database schemas and in the query language. In addition, it supports extension of the data model with custom data-types and methods.


[Figure: example of an object-oriented database model.[1]]

An object-relational database can be said to provide a middle ground between relational databases and object-oriented databases (OODBMS). In object-relational databases, the approach is essentially that of relational databases: the data resides in the database and is manipulated collectively with queries in a query language; at the other extreme are OODBMSes in which the database is essentially a persistent object store for software written in an object-oriented programming language, with a programming API for storing and retrieving objects, and little or no specific support for querying.


Overview

One aim of object-relational databases is to bridge the gap between conceptual data modeling techniques such as entity-relationship diagrams (ERD) and object-relational mapping (ORM), which often use classes and inheritance, and relational databases, which do not directly support them.

Another, related, aim is to bridge the gap between relational databases and the object-oriented modeling techniques used in programming languages such as Java, C++ or C#. However, a more popular alternative for achieving such a bridge is to use standard relational database systems with some form of ORM software.

Whereas traditional RDBMS or SQL-DBMS products focused on the efficient management of data drawn from a limited set of data-types (defined by the relevant language standards), an object-relational DBMS allows software developers to integrate their own types, and the methods that apply to them, into the DBMS. ORDBMS technology aims to allow developers to raise the level of abstraction at which they view the problem domain. This goal is not universally shared; proponents of relational databases often argue that object-oriented specification lowers the abstraction level.

Many SQL ORDBMSs on the market today are extensible with user-defined types (UDT) and custom-written functions (e.g. stored procedures). Some (e.g. Microsoft SQL Server) allow such functions to be written in object-oriented programming languages, but this by itself doesn't make them object-oriented databases; in an object-oriented database, object orientation is a feature of the data model.

History

Object-relational database management systems grew out of research that occurred in the early 1990s. That research extended existing relational database concepts by adding object concepts. The researchers aimed to retain a declarative query-language based on predicate calculus as a central component of the architecture. Probably the most notable research project, Postgres (UC Berkeley), spawned two products tracing their lineage to that research: Illustra and PostgreSQL.

In the mid-1990s, early commercial products appeared. These included Illustra[2] (Illustra Information Systems, acquired by Informix Software which was in turn acquired by IBM), Omniscience (Omniscience Corporation, acquired by Oracle Corporation and became the original Oracle Lite), and UniSQL (UniSQL, Inc., acquired by KCOMS). Ukrainian developer Ruslan Zasukhin, founder of Paradigma Software, Inc., developed and shipped the first version of Valentina database in the mid-1990s as a C++ SDK. By the next decade, PostgreSQL had become a commercially viable database and is the basis for several products today which maintain its ORDBMS features.

Computer scientists came to refer to these products as "object-relational database management systems" or ORDBMSs.[3]

Many of the ideas of the early object-relational database efforts have since been incorporated into SQL:1999. In fact, any product that adheres to the object-oriented aspects of SQL:1999 could be described as an object-relational database management product. For example, IBM's DB2, Oracle Database, and Microsoft SQL Server all claim to support this technology, and do so with varying degrees of success.

Comparison to RDBMS

An RDBMS might commonly involve SQL statements such as these:

CREATE TABLE Customers (
    Id        CHAR(12)    NOT NULL PRIMARY KEY,
    Surname   VARCHAR(32) NOT NULL,
    FirstName VARCHAR(32) NOT NULL,
    DOB       DATE        NOT NULL
);

SELECT InitCap(Surname) || ', ' || InitCap(FirstName)
FROM Customers
WHERE Month(DOB) = Month(getdate())
  AND Day(DOB) = Day(getdate());

Most current SQL databases allow the crafting of custom functions, which would allow the query to appear as:

SELECT Formal(Id)
FROM Customers
WHERE Birthday(Id) = Today();

In an object-relational database, one might see something like this, with user-defined data-types and expressions such as BirthDay():

CREATE TABLE Customers (
    Id   Cust_Id    NOT NULL PRIMARY KEY,
    Name PersonName NOT NULL,
    DOB  DATE       NOT NULL
);

SELECT Formal(C.Id)
FROM Customers C
WHERE BirthDay(C.DOB) = TODAY;
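The Cust_Id and PersonName types and the Formal() function are not defined in the example. As a hedged sketch of what such definitions might look like, in PostgreSQL-flavoured syntax (other ORDBMSs spell this differently):

-- A composite user-defined type for names
CREATE TYPE PersonName AS (
    Surname   VARCHAR(32),
    FirstName VARCHAR(32)
);

-- A domain restricting customer identifiers
CREATE DOMAIN Cust_Id AS CHAR(12);

-- Formal() can then be written once, as an ordinary function over the type
CREATE FUNCTION Formal(n PersonName) RETURNS TEXT AS $$
    SELECT InitCap((n).Surname) || ', ' || InitCap((n).FirstName);
$$ LANGUAGE SQL;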

The object-relational model can offer another advantage in that the database can make use of the relationships between data to easily collect related records. In an address book application, an additional table would be added to the ones above to hold zero or more addresses for each user. Using a traditional RDBMS, collecting information for both the user and their address requires a "join":

SELECT InitCap(C.Surname) || ', ' || InitCap(C.FirstName), A.city
FROM Customers C
JOIN Addresses A ON A.Cust_Id = C.Id  -- the join
WHERE A.city = 'New York';

The same query in an object-relational database appears more simply:

SELECT Formal(C.Name)
FROM Customers C
WHERE C.address.city = 'New York'  -- the linkage is 'understood' by the ORDB
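The Addresses table assumed by both of the preceding queries is never defined in the article. A plausible declaration for the relational version might be the following sketch; in the object-relational version, address would instead be understood as a nested attribute of Customers, which is what the C.address.city path expression suggests:

CREATE TABLE Addresses (
    Addr_Id INTEGER     PRIMARY KEY,
    Cust_Id CHAR(12)    NOT NULL REFERENCES Customers (Id),  -- links each address to its customer
    city    VARCHAR(32) NOT NULL
);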

Object model


In computing, object model has two related but distinct meanings:


1. The properties of objects in general, in a specific computer programming language, technology, notation or methodology that uses them. For example, the Java object model, the COM object model, or the object model of OMT. Such object models are usually defined using concepts such as class, message, inheritance, polymorphism, and encapsulation. There is an extensive literature on formalized object models as a subset of the formal semantics of programming languages.

2. A collection of objects or classes through which a program can examine and manipulate some specific parts of its world. In other words, the object-oriented interface to some service or system. Such an interface is said to be the object model of the represented service or system. For example, the Document Object Model (DOM) [1] is a collection of objects that represent a page in a web browser, used by script programs to examine and dynamically change the page. There is a Microsoft Excel object model [2] for controlling Microsoft Excel from another program, and the ASCOM Telescope Driver [3] is an object model for controlling an astronomical telescope.

See also

Object-Oriented Programming
Object-oriented analysis and design
Object Management Group
Domain-driven design

Conceptual schema


A conceptual schema or conceptual data model is a map of concepts and their relationships. This describes the semantics of an organization and represents a series of assertions about its nature. Specifically, it describes the things of significance to an organization (entity classes), about which it is inclined to collect information, and characteristics of (attributes) and associations between pairs of those things of significance (relationships).


Overview

Because a conceptual schema represents the semantics of an organization, and not a database design, it may exist on various levels of abstraction. The original ANSI four-schema architecture began with the set of external schemas that each represent one person's view of the world around him or her. These are consolidated into a single conceptual schema that is the superset of all of those external views. A data model can be as concrete as each person's perspective, but this tends to make it inflexible. If that person's world changes, the model must change. Conceptual data models take a more abstract perspective, identifying the fundamental things, of which the things an individual deals with are just examples.

The model does allow for what is called inheritance in object-oriented terms. The set of instances of an entity class may be subdivided into entity classes in their own right. Thus, each instance of a sub-type entity class is also an instance of the entity class's super-type, and each instance of the super-type entity class, then, is also an instance of one of the sub-type entity classes.

Super-type/sub-type relationships may be exclusive or not. A methodology may require that each instance of a super-type may only be an instance of one sub-type. Similarly, a super-type/sub-type relationship may be exhaustive or not. It is exhaustive if the methodology requires that each instance of a super-type must be an instance of a sub-type.

Example relationships

Each PERSON may be the vendor in one or more ORDERS.

Each ORDER must be from one and only one PERSON.

PERSON is a sub-type of PARTY (meaning that every instance of PERSON is also an instance of PARTY).

Each EMPLOYEE may have a supervisor within EMPLOYEE.
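A minimal sketch of how these example relationships might be realized as SQL tables (the names are hypothetical, and a real design would carry more attributes):

CREATE TABLE Party (
    PartyId INTEGER PRIMARY KEY
);

-- PERSON is a sub-type of PARTY: the sub-type shares the super-type's key.
CREATE TABLE Person (
    PartyId INTEGER     PRIMARY KEY REFERENCES Party (PartyId),
    Name    VARCHAR(64) NOT NULL
);

-- Each ORDER must be from one and only one PERSON (a mandatory, single-valued
-- foreign key), while each PERSON may be the vendor in zero or more ORDERS.
CREATE TABLE Orders (
    OrderNo  INTEGER PRIMARY KEY,
    VendorId INTEGER NOT NULL REFERENCES Person (PartyId)
);

-- Each EMPLOYEE may have a supervisor within EMPLOYEE: a nullable self-reference.
CREATE TABLE Employee (
    EmpId        INTEGER PRIMARY KEY,
    SupervisorId INTEGER REFERENCES Employee (EmpId)
);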

Data structure diagram


[Figure: data structure diagram]

A data structure diagram (DSD) is a data model or diagram used to describe conceptual data models by providing graphical notations which document entities and their relationships, and the constraints that bind them.

See also

Concept mapping
Data modeling
Entity-relationship model
Object-relationship modelling
Object role modeling
Knowledge representation
Logical data model
Mindmap
Ontology
Physical data model
Semantic Web
Three schema approach

Semantic data model



[Figure: semantic data models.[1]]

A semantic data model in software engineering is a data modeling technique to define the meaning of data within the context of its interrelationships with other data. A semantic data model is an abstraction which defines how the stored symbols relate to the real world.[1] A semantic data model is sometimes called a conceptual data model.


Overview

The logical data structure of a database management system (DBMS), whether hierarchical, network, or relational, cannot totally satisfy the requirements for a conceptual definition of data, because it is limited in scope and biased toward the implementation strategy employed by the DBMS. Therefore, the need to define data from a conceptual view has led to the development of semantic data modeling techniques, that is, techniques to define the meaning of data within the context of its interrelationships with other data. As illustrated in the figure, the real world, in terms of resources, ideas, events, etc., is symbolically defined within physical data stores. A semantic data model is an abstraction which defines how the stored symbols relate to the real world. Thus, the model must be a true representation of the real world.[1]

The overall goal of semantic data models is to capture more of the meaning of data by integrating relational concepts with more powerful abstraction concepts known from the Artificial Intelligence field. The idea is to provide high-level modeling primitives as an integral part of a data model, in order to facilitate the representation of real-world situations.[2]

History

The need for semantic data models was first recognized by the U.S. Air Force in the mid-1970s as a result of the Integrated Computer-Aided Manufacturing (ICAM) Program. The objective of this program was to increase manufacturing productivity through the systematic application of computer technology. The ICAM Program identified a need for better analysis and communication techniques for people


involved in improving manufacturing productivity. As a result, the ICAM Program developed a series of techniques known as the IDEF (ICAM Definition) Methods which included the following:[1]

IDEF0, used to produce a "function model", a structured representation of the activities or processes within the environment or system.

IDEF1, used to produce an "information model", which represents the structure and semantics of information within the environment or system.

IDEF2, used to produce a "dynamics model", which represents the time-varying behavioral characteristics of the environment or system.

Applications

A semantic data model can be used to serve many purposes. Some key objectives include:[1]

Planning of Data Resources: A preliminary data model can be used to provide an overall view of the data required to run an enterprise. The model can then be analyzed to identify and scope projects to build shared data resources.

Building of Shareable Databases: A fully developed model can be used to define an application independent view of data which can be validated by users and then transformed into a physical database design for any of the various DBMS technologies. In addition to generating databases which are consistent and shareable, development costs can be drastically reduced through data modeling.

Evaluation of Vendor Software: Since a data model actually represents the infrastructure of an organization, vendor software can be evaluated against a company’s data model in order to identify possible inconsistencies between the infrastructure implied by the software and the way the company actually does business.

Integration of Existing Databases: By defining the contents of existing databases with semantic data models, an integrated data definition can be derived. With the proper technology, the resulting conceptual schema can be used to control transaction processing in a distributed database environment. The U.S. Air Force Integrated Information Support System (I2S2) is an experimental development and demonstration of this type of technology applied to a heterogeneous DBMS environment.


IDEF1X is a semantic data modeling technique. It is used to produce a graphical information model which represents the structure and semantics of information within an environment or system. Use of this standard permits the construction of semantic data models which may serve to support the management of data as a resource, the integration of information systems, and the building of computer databases.

See also

Conceptual schema
Entity-relationship model
Information model
Relational Model/Tasmania
Three schema approach
QuakeSim

Top-down and bottom-up design


Top-down and bottom-up are strategies of information processing and knowledge ordering, mostly involving software, but also other humanistic and scientific theories (see systemics). In practice, they can be seen as a style of thinking and teaching. In many cases top-down is used as a synonym of analysis or decomposition, and bottom-up of synthesis.

A top-down approach is essentially breaking down a system to gain insight into its compositional sub-systems. In a top-down approach an overview of the system is first formulated, specifying but not detailing any first-level subsystems. Each subsystem is then refined in yet greater detail, sometimes in many additional subsystem levels, until the entire specification is reduced to base elements. A top-down model is often specified with the assistance of "black boxes" that make it easier to manipulate. However, black boxes may fail to elucidate elementary mechanisms, or may not be detailed enough to realistically validate the model.

A bottom-up approach is piecing together systems to give rise to grander systems, thus making the original systems sub-systems of the emergent system. In a bottom-up approach the individual base elements of the system are first specified in great detail. These elements are then linked together to form larger subsystems, which then in turn are linked, sometimes in many levels, until a complete top-level system is formed. This strategy often resembles a "seed" model, whereby the beginnings are small but eventually grow in complexity and completeness. However, "organic strategies" may result in a tangle of elements and subsystems, developed in isolation and subject to local optimization as opposed to meeting a global purpose.


Computer science

Software development

Part of this section is from the Perl Design Patterns Book.

In the software development process, the top-down and bottom-up approaches play a key role.

Top-down approaches emphasize planning and a complete understanding of the system. It is inherent that no coding can begin until a sufficient level of detail has been reached in the design of at least some part of the system. In a top-down approach, testing can proceed early by attaching stubs in place of modules that have not yet been written, but this delays testing of the ultimate functional units of a system until significant design is complete. Bottom-up emphasizes coding and early testing, which can begin as soon as the first module has been specified. This approach, however, runs the risk that modules may be coded without a clear idea of how they link to other parts of the system, and that such linking may not be as easy as first thought. Re-usability of code is one of the main benefits of the bottom-up approach.

Top-down design was promoted in the 1970s by IBM researcher Harlan Mills and by Niklaus Wirth. Mills developed structured programming concepts for practical use and tested them in a 1969 project to automate the New York Times morgue index. The engineering and management success of this project led to the spread of the top-down approach through IBM and the rest of the computer industry. Among other achievements, Niklaus Wirth, the developer of the Pascal programming language, wrote the influential paper Program Development by Stepwise Refinement. Since Wirth went on to develop languages such as Modula and Oberon (where one could define a module before knowing about the entire program specification), one can infer that top-down programming was not strictly what he promoted. Top-down methods were favored in software engineering until the late 1980s, and object-oriented programming assisted in demonstrating that both the top-down and bottom-up aspects of programming could be utilized.

Modern software design approaches usually combine both top-down and bottom-up approaches. Although an understanding of the complete system is usually considered necessary for good design, leading theoretically to a top-down approach, most software projects attempt to make use of existing code to some degree. Pre-existing modules give designs a bottom-up flavour. Some approaches design and code a partially functional system to completion, and then expand it to fulfill all the requirements of the project.

Programming

Top-down programming is a programming style, the mainstay of traditional procedural languages, in which design begins by specifying complex pieces and then dividing them into successively smaller pieces. Eventually, the components are specific enough to be coded and the program is written. This is the exact opposite of the bottom-up programming approach which is common in object-oriented languages such as C++ or Java.

The technique for writing a program using top-down methods is to write a main procedure that names all the major functions it will need. Later, the programming team looks at the requirements of each of those functions and the process is repeated. These compartmentalized sub-routines eventually will perform actions so simple they can be easily and concisely coded. When all the various sub-routines have been coded the program is done.

By defining how the application comes together at a high level, lower level work can be self-contained. By defining how the lower level objects are expected to integrate into a higher level object, interfaces become clearly defined.


Advantages of top-down programming:

Separating the low-level work from the higher-level objects leads to a modular design.

Modular design means development can be self-contained.

Having "skeleton" code illustrates clearly how low-level modules integrate.

Fewer operations errors, because each module is processed separately and programmers can devote adequate time to each.

Much less time-consuming, since each programmer is involved in only a part of the larger project.

A well-optimized way of processing, since each programmer applies his or her own knowledge and experience to a particular module.

Easy to maintain: if an error occurs in the output, it is easy to identify which module of the program generated it.

Disadvantages of top-down programming:

Functionality either needs to be inserted into low-level objects by making them return "canned answers" (manually constructed objects, similar to what you would specify if you were mocking them in a test), or functionality will be lacking until development of the low-level objects is complete.

Bottom-up approach

[Figure: a LEGO racer modeled in Pro/ENGINEER WF4.0] Pro/ENGINEER parts are a good example of bottom-up design, because the parts are created first and then assembled without regard, at first, to how the parts will work in the assembly.



Object-oriented programming (OOP) is a programming paradigm that uses "objects" to design applications and computer programs.

In mechanical engineering, with software such as Pro/ENGINEER and SolidWorks, users can design products as individual pieces rather than as parts of a whole, and later combine those pieces to form assemblies, much like building with LEGO bricks. Engineers call this piece-part design.

This bottom-up approach has one weakness: a good deal of intuition is needed to decide what functionality each module should provide. If a system is to be built from an existing system, however, the approach is more suitable, since it starts from existing modules.

Parsing

Parsing is the process of analyzing an input sequence (such as that read from a file or a keyboard) in order to determine its grammatical structure. This method is used in the analysis of both natural languages and computer languages, as in a compiler.

Bottom-up parsing is a strategy for analyzing unknown data relationships that attempts to identify the most fundamental units first, and then to infer higher-order structures from them. Top-down parsers, on the other hand, hypothesize general parse tree structures and then consider whether the known fundamental structures are compatible with the hypothesis. See Top-down parsing and Bottom-up parsing.

Nanotechnology

Main article: Nanotechnology

Top-down and bottom-up are two approaches for the manufacture of products. These terms were first applied to the field of nanotechnology by the Foresight Institute in 1989 in order to distinguish between molecular manufacturing (to mass-produce large atomically precise objects) and conventional manufacturing (which can mass-produce large objects that are not atomically precise). Bottom-up approaches seek to have smaller (usually molecular) components built up into more complex assemblies, while top-down approaches seek to create nanoscale devices by using larger, externally-controlled ones to direct their assembly.

The top-down approach often uses the traditional workshop or microfabrication methods, in which externally-controlled tools are used to cut, mill, and shape materials into the desired shape and order. Micropatterning techniques, such as photolithography and inkjet printing, belong to this category. Bottom-up approaches, in contrast, use the chemical properties of single molecules to cause single-molecule components to (a) self-organize or self-assemble into some useful conformation, or (b) rely on positional assembly. These approaches utilize the concepts of molecular self-assembly and/or molecular recognition. See also Supramolecular chemistry.

Such bottom-up approaches should, broadly speaking, be able to produce devices in parallel and much more cheaply than top-down methods, but they could potentially be overwhelmed as the size and complexity of the desired assembly increases.

Neuroscience and psychology

[Figure: an example of top-down processing. Even though the second letter in each word is ambiguous, top-down processing allows for easy disambiguation based on the context.]

These terms are also employed in neuroscience and psychology. The study of visual attention provides an example. If your attention is drawn to a flower in a field, it may be simply that the flower is more visually salient than the surrounding field. The information that caused you to attend to the flower came to you in a bottom-up fashion — your attention was not contingent upon knowledge of the flower; the outside stimulus was sufficient on its own.

Contrast this situation with one in which you are looking for a flower. You have a representation of what you are looking for. When you see the object you are looking for, it is salient. This is an example of the use of top-down information.

In cognitive terms, two thinking approaches are distinguished. "Top-down" (or "big chunk") is stereotypically the visionary, the person who sees the larger picture and overview; such people focus on the big picture and from that derive the details to support it. "Bottom-up" (or "small chunk") cognition is akin to focusing on the detail primarily, rather than the landscape. The expression "seeing the wood for the trees" references the two styles of cognition.

Management and organization

In management and organizational arenas, the terms "top down" and "bottom up" are used to indicate how decisions are made.

A "top down" approach is one where an executive, decision maker, or other person or body makes a decision. This approach is disseminated under their authority to lower levels in the hierarchy, who are, to a greater or lesser extent, bound by them. For example, a structure in which decisions either are approved by a manager, or approved by his authorised representatives based on the manager's prior guidelines, is top-down management.

A "bottom up" approach is one that works from the grassroots — from a large number of people working together, causing a decision to arise from their joint involvement. A decision by a number of activists, students, or victims of some incident to take action is a "bottom-up" decision.

Positive aspects of top-down approaches include their efficiency and superb overview of higher levels. Also, external effects can be internalized. On the negative side, if reforms are perceived to be imposed ‘from above’, it can be difficult for lower levels to accept them (e.g. Bresser Pereira, Maravall, and Przeworski 1993). Evidence suggests this to be true regardless of the content of reforms (e.g. Dubois 2002). A bottom-up approach allows for more experimentation and a better feeling for what is needed at the bottom.

State organization

Both approaches can be found in the organization of states, where they involve political decisions.

In bottom-up organizations, e.g. ministries and their subordinate entities, decisions are prepared by experts in their fields, who define, out of their expertise, the policy they deem necessary. If they cannot agree, even on a compromise, they escalate the problem to the next higher level of the hierarchy, where a decision will be sought. Ultimately, the highest common superior may have to take the decision. Information is owed by the inferior to the superior; that is, the inferior has a duty to keep the superior informed. In effect, as soon as the inferiors agree, the head of the organization merely provides his or her "face" for the decision which the inferiors have agreed upon.

Among several countries, the German political system provides one of the purest forms of a bottom-up approach. The German Federal Act on the Public Service provides that any subordinate has to consult and support his or her superiors, that he or she has to follow only the "general guidelines" of the superiors, that he or she is fully responsible for any of his or her own acts in office, and that he or she must follow a specific, formal complaint procedure if in doubt of the legality of an order.[1] Frequently, German politicians have had to leave office on the allegation that they took wrong decisions because of their resistance to the opinions of subordinate experts (commonly called being "beratungsresistent", or resistant to consultation, in German). The historical foundation of this approach lies in the fact that, in the 19th century, many politicians were noblemen without appropriate education, who increasingly became forced to rely on the consultation of educated experts, who (in particular after the Prussian reforms of Stein and Hardenberg) enjoyed the status of financially and personally independent, non-dismissible, and neutral experts as Beamte (public servants under public law).

A similar approach can be found in British police law, where the powers of a police constable are vested in the constable in person and not in the police as an administrative agency, leaving the individual constable fully responsible for his or her own acts in office, in particular their legality. The experience of two dictatorships in Germany and, after the end of those regimes, the emerging calls for legal responsibility of the "aides of the aides" (Helfershelfer) of such regimes, also furnished calls for the principle of personal responsibility of every expert for any decision made, leading to a strengthening of the bottom-up approach, which requires maximum responsibility of the superiors.

By contrast, the French administration is based on a top-down approach, where regular public servants have no task other than to execute the decisions made by their superiors. As those superiors also require consultation, it is provided by the members of a cabinet, which is distinct from the regular ministry staff in terms of personnel and organization. Officials who are not members of the cabinet are not entitled to make suggestions or to take decisions of any political dimension.

The advantage of the bottom-up approach is the great level of expertise provided, combined with the motivating experience for every member of the administration of being responsible, and finally the independent "engine" of progress within each field of personal responsibility. A disadvantage is the lack of democratic control and transparency, which leads, from a democratic viewpoint, to the deferment of actual policy-making power to faceless, perhaps even unknown, public servants. The fact that certain politicians may "provide their face" for the actual decisions of their inferiors does little to mitigate this effect; rather, it is strong parliamentary rights of control and influence in legislative procedures (as they exist in the example of Germany) that do.

The advantage of the top-down principle is that political and administrative responsibilities are clearly distinguished from each other, and that responsibility for political failures can be clearly identified with the relevant office holder. Disadvantages are that the system demotivates inferiors, who know that their ideas for innovative approaches may not be welcome simply because of their position, and that the decision-makers cannot make use of the full range of expertise which their inferiors will have collected.

Administrations in dictatorships traditionally work according to a strict top-down approach. Because civil servants below the level of the political leadership are discouraged from making suggestions, such administrations tend to suffer from the lack of expertise that could be provided by their inferiors, which regularly leads to a breakdown of the system after a few decades. Modern communist states, of which the People's Republic of China is an example, therefore prefer to define a framework of permissible, or even encouraged, criticism and self-determination by inferiors, which does not affect the major state doctrine but allows professional and expertise-driven knowledge to reach the decision-making persons in office.

Architectural

Often, the École des Beaux-Arts school of design is said to have primarily promoted top-down design because it taught that an architectural design should begin with a parti, a basic plan drawing of the overall project.

By contrast, the Bauhaus focused on bottom-up design. This method manifested itself in the study of translating small-scale organizational systems to a larger, more architectural scale (as with wood-panel carving and furniture design).

Ecological


In ecology, top-down control refers to when a top predator controls the structure or population dynamics of the ecosystem. The classic example is of kelp forest ecosystems. In such ecosystems, sea otters are a keystone predator. They prey on urchins, which in turn eat kelp. When otters are removed, urchin populations grow and reduce the kelp forest, creating urchin barrens. In other words, such ecosystems are controlled not by the productivity of the kelp but rather by a top predator.

Bottom-up control in ecosystems refers to ecosystems in which the nutrient supply and the productivity and type of primary producers (plants and phytoplankton) control the ecosystem structure. An example would be how plankton populations are controlled by the availability of nutrients. Plankton populations tend to be higher and more complex in areas where upwelling brings nutrients to the surface.

There are many different examples of these concepts. It is not uncommon for populations to be influenced by both types of control.

Entity



An entity is something that has a distinct, separate existence, though it need not be a material existence. In particular, abstractions and legal fictions are usually regarded as entities. In general, there is also no presumption that an entity is animate. Entities are used in system development models to display, for example, the communications and internal processing involved in handling documents as compared to processing orders.

An entity could be viewed as a set containing subsets. In philosophy, such sets are said to be abstract objects.

Sometimes, the word entity is used in a general sense of a being, whether or not the referent has material existence; e.g., a language is often referred to as an entity even though it has no corporeal form. It is also often used to refer to ghosts and other spirits. Taken further, entity sometimes refers to existence or being itself. For example, the former U.S. diplomat George F. Kennan once said that "the policy of the government of the United States is to seek . . . to preserve Chinese territorial and administrative entity."

The word entitative is the adjective form of the noun entity. Something that is entitative is "considered as pure entity; abstracted from all circumstances", that is, regarded as entity alone, apart from attendant circumstances.

Specialized uses

A DBMS entity is either a thing in the modeled world or a drawing element in an ERD.

In SUMO, Entity is the root node and stands for the universal class of individuals.

In VHDL, entity is the keyword for defining a new object.

An SGML entity is an abbreviation for some expanded piece of SGML text.

An open systems architecture entity is an active routine within a layer.

In computer games and game engines, an entity is a dynamic object such as a non-player character or an item.

In HTML, an entity is a code snippet (e.g. "&reg;" for the "Registered Trademark" symbol) which is interpreted by web browsers to display special characters. See List of XML and HTML character entity references.

In law, a legal entity is an entity that is capable of bearing legal rights and obligations, such as a natural person or an artificial person (e.g. a business entity or a corporate entity).