Introduction to Databases By Dr. Kamal Gulati
Terminologies of Databases
• Data
• Information
• Knowledge
• Decision
• Information Systems
• CBIS
• Record
• Field
• Attribute
• Domain
• Tuples
• Database
• Table
• Relation
• Relationship
• Management System
• Front End
• Back End
• Views of data
• Schema
• Components of Database System
• DBMS
• Data Independence
• Examples of Database System
• Advantages of DBMS
• Disadvantages of DBMS
• Properties of Databases
• Architecture of DBMS
• Database Model
• Types of Database
• Categories of DBMS
• History of DBMS
• File Organization
• Data Hierarchy
• Traditional File Processing
• SQL
• Oracle
• MS Access
• dBase II
• Components of DBMS
Data vs. Information
• Data consists of raw facts (e.g., a list of numbers).
• Information is a collection of facts organized (or processed) in such a way that they have additional value (e.g., a list of class grades based on the exam scores).
– In a way, information is data that has been transformed into a more useful form.
– Turning data into information is a process performed to achieve a defined outcome, and it requires knowledge.
Data
• Data: Facts, figures, statistics etc. having no particular meaning (e.g. 1, ABC, 19 etc.).
• Refers to a collection of facts usually collected as the result of experience, observation or experiment, or processes within a computer system, or a set of premises.
• This may consist of numbers, words, or images, particularly as measurements or observations of a set of variables.
• Data is often viewed as the lowest level of abstraction from which information and knowledge are derived.
DATA EXAMPLES
Yes, Yes, No, Yes, No, Yes, No, Yes
42, 63, 96, 74, 56, 86
111192, 111234
None of the above data sets have any meaning until they are given a CONTEXT and PROCESSED into a usable form.
Collections of data
Data may be collected, manipulated and retrieved in various ways:
• Plain text editor - simple editing and retrieval
• Word processor - adds tables and simple calculations
• Spreadsheet programmes - add more sophisticated calculations
• Database Management System (DBMS) - adds formats, structures, rules, ...
INFORMATION
Data that has been processed within a context to give it meaning
OR
Data that has been processed into a form that gives it meaning
EXAMPLE 1
Raw Data: Yes, Yes, No, Yes, No, Yes, No, Yes, No, Yes, Yes
Context: Responses to the market research question – “Would you buy brand x at price y?”
Processing: ???
Information: ???
EXAMPLE 2
Raw Data: 42, 63, 96, 74, 56, 86
Context: Jaya’s scores in the six AS/A2 ICT modules
Processing: ???
Information: ???
EXAMPLE 3
Raw Data: 111192, 111234
Context: The previous and current readings of a customer’s gas meter
Processing: ???
Information: ???
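The three examples leave the Processing and Information boxes open. One possible way to fill them in is sketched below in Python. The choice of processing (a count, an average, a difference) is an assumption made for illustration, not the only valid answer.

```python
# Raw data copied from the three examples above.
responses = ["Yes", "Yes", "No", "Yes", "No", "Yes", "No", "Yes", "No", "Yes", "Yes"]
scores = [42, 63, 96, 74, 56, 86]
previous_reading, current_reading = 111192, 111234

# Example 1: count the answers to the market research question.
yes_count = responses.count("Yes")
no_count = responses.count("No")

# Example 2: average the six module scores.
average_score = sum(scores) / len(scores)

# Example 3: units of gas used between the two meter readings.
units_used = current_reading - previous_reading

print(f"{yes_count} of {len(responses)} respondents would buy brand x at price y")
print(f"Average module score: {average_score:.1f}")
print(f"Gas used since the previous reading: {units_used} units")
```

In each case the numbers only become information once the context says what they measure.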
Data
• raw facts
• no context
• just numbers and text

Information
• data with context
• processed data
• value-added to data
– summarized
– organized
– analyzed
Data vs. Information: Example 1
• Data: 102515
• Information:
– 10/25/15: the date of your final exam.
– $102,515: the average starting salary of a DBA in an IT company.
– 102515: a zip code.
Data vs. Information: Example 2
Data: 6.34, 6.45, 6.39, 6.62, 6.57, 6.64, 6.71, 6.82, 7.12, 7.06
[Chart: SIRIUS SATELLITE RADIO INC. Y-axis: Stock Price ($5.80 to $7.20). X-axis: Last 10 Days (1 to 10).]
KNOWLEDGE
Knowledge is the understanding of rules needed to interpret information
“…the capability of understanding the relationship between pieces of information and what to actually do with the information”
KNOWLEDGE EXAMPLES
Using the 3 previous examples:
A Marketing Manager could use this information to decide whether to raise or lower price y.
Jaya’s teacher could analyse the results to determine whether it would be worth her re-sitting a module.
Looking at the pattern of the customer’s previous gas bills may identify that the figure is abnormally low and that they are tampering with the gas meter.
SUMMARY
Information = Data + Context + Meaning
(Data becomes information through Processing)
Data – raw facts and figures
Information – data that has been processed (in a context) to give it meaning
Data, Information, Knowledge
Ways of turning Data into Information:
Summarizing the data
Averaging the data
Selecting part of the data
Graphing the data
Adding context
Adding value
A basic system is composed of 5 components:
– Input, Output, Processing, Feedback, Control
• Typically, processing helps transform data into information.
[Diagram: Input (Raw Data) → Processing → Output (Information)]
INFORMATION SYSTEMS
Why Do People Need Information?
Individuals - Entertainment and enlightenment
Businesses - Decision making, problem solving and control
[Diagram: Information Systems at the intersection of Management, Organizations, and Technology]
An information system is a system which assembles, stores, processes, and delivers information relevant to an organization (or to society) in such a way that the information is accessible and useful to those who wish to use it, including managers, staff, clients, and citizens.
An information system is a human activity (social) system which may or may not involve the use of computer systems.
An information system is a collection of components that collects, processes, stores, analyzes, and disseminates information for a specific purpose.
The major components of a Computer-Based Information System (CBIS) can include:
(1) Hardware
(2) Software
(3) Database
(4) Network
(5) Procedures
(6) People
The system operates in a Social Context, and the software usually includes application programs which perform specific tasks for users.
Record
• Collection of related data items. In the earlier example (1, ABC, 19), the three data items had no meaning on their own, but if we organize them in the following way, they collectively represent meaningful information.

Roll  Name  Age
1     ABC   19
Table or Relation
• Table or Relation: Collection of related records.

Roll  Name  Age
1     ABC   19
2     DEF   22
3     XYZ   28

The columns of this relation are called Fields or Attributes; the set of values an attribute may take is its Domain. The rows are called Tuples or Records.
Database
• Collection of related relations. Consider the following collection of tables:

T1
Roll  Name  Age
1     ABC   19
2     DEF   22
3     XYZ   28

T2
Roll  Address
1     VA
2     CA
3     NY

T3
Roll No  Quarter
1        I
2        II
3        I

T4
Year  Hostel
I     H1
II    H2
Relationship
• We now have a collection of 4 tables. They can be called a “related collection” because there are common attributes in selected pairs of tables.
• Because of these common attributes, we can combine the data of two or more tables to find the complete details of a student. Questions like “Which hostel does the youngest student live in?” can now be answered, although the Age and Hostel attributes are in different tables.
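The hostel question can be answered by joining the tables on their common attributes. The sketch below uses Python's built-in sqlite3 module with the T1, T3 and T4 data from the slides. It assumes that T3's Quarter column holds the same values as T4's Year column, since that is the only attribute the two tables could share. The column name RollNo is a spelling chosen here for the slide's "Roll No".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE T1 (Roll INTEGER, Name TEXT, Age INTEGER);
CREATE TABLE T3 (RollNo INTEGER, Quarter TEXT);
CREATE TABLE T4 (Year TEXT, Hostel TEXT);
INSERT INTO T1 VALUES (1, 'ABC', 19), (2, 'DEF', 22), (3, 'XYZ', 28);
INSERT INTO T3 VALUES (1, 'I'), (2, 'II'), (3, 'I');
INSERT INTO T4 VALUES ('I', 'H1'), ('II', 'H2');
""")

# Follow the common attributes: T1.Roll -> T3.RollNo, then T3.Quarter -> T4.Year.
youngest = conn.execute("""
    SELECT T1.Name, T4.Hostel
    FROM T1
    JOIN T3 ON T3.RollNo = T1.Roll
    JOIN T4 ON T4.Year = T3.Quarter
    ORDER BY T1.Age ASC
    LIMIT 1
""").fetchone()

print(youngest)  # ('ABC', 'H1')
```

Age lives in T1 and Hostel in T4, yet one query reaches both because each pair of tables shares a column.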
Management System
• A management system is a set of rules and procedures which help us to create, organize and manipulate the database.
• It also helps us to add, modify and delete data items in the database.
• The management system can be either manual or computerized.
Front End Vs Back End
Three Views of Data (ANSI-SPARC)
Schema
• The word schema means arrangement – how we want to arrange things that we have to store.
• Structure of Database
Internal or Physical Schema
• The lowest level, called the Internal or Physical schema, deals with the description of how raw data items (like 1, ABC, KOL, H2 etc.) are stored in the physical storage (Hard Disc, CD, Tape Drive etc.).
• It also describes the data type of these data items, the size of the items in the storage media, the location (physical address) of the items in the storage device and so on.
• This schema is useful for database application developers and database administrators.
Conceptual or Logical Schema
• The middle level is known as the Conceptual or Logical Schema, and deals with the structure of the entire database.
• Here we are interested in the structure of the database. This means we want to know the attributes of each table, the common attributes in different tables that allow them to be combined, what kind of data can be input into these attributes, and so on.
• Conceptual or Logical schema is very useful for database administrators whose responsibility is to maintain the entire database.
External or View Schema
• The highest level of abstraction is the External or View Schema. This is targeted at the end users.
• The database administrator may want to create custom-made tables, keeping in mind the specific needs of each user.
• These tables are also known as virtual tables, because they have no separate physical existence.
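A virtual table of this kind corresponds to a SQL view. The sketch below, using Python's sqlite3 module, builds one for a hypothetical user who only needs names and addresses. The table name student and the view name mailing_list are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (Roll INTEGER PRIMARY KEY, Name TEXT, Age INTEGER, Address TEXT);
INSERT INTO student VALUES (1, 'ABC', 19, 'VA'), (2, 'DEF', 22, 'CA'), (3, 'XYZ', 28, 'NY');

-- External schema for one user: only names and addresses are exposed.
CREATE VIEW mailing_list AS SELECT Name, Address FROM student;
""")

# The view stores no rows of its own, so a change to the base table
# shows up the next time the view is queried.
conn.execute("UPDATE student SET Address = 'TX' WHERE Roll = 2")
rows = conn.execute("SELECT Name, Address FROM mailing_list ORDER BY Name").fetchall()

print(rows)  # [('ABC', 'VA'), ('DEF', 'TX'), ('XYZ', 'NY')]
```

The user of mailing_list never sees Roll or Age, and no second copy of the data exists.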
Components of Database System
A database system is composed of 5 major parts:
1. Hardware
2. Software (DBMS)
3. People / Users
4. Procedures
5. Data
DBMS
• A "Database Management System" is software that defines a database, stores the data, supports a query language, produces reports, and creates data entry screens.
• It is a collection of components that support data acquisition, dissemination, storage, maintenance, retrieval, and formatting.
• The role of a DBMS in a larger system is to allow other software, or users, to store and retrieve data in a structured way.
Reasons for a DBMS
• A DBMS is a software package for defining and managing a database.
• A ‘real’ database includes:
– definitions of field names
– data formats (text, binary, integer, etc.)
– record structures (fixed-length, pointers, field order, etc.)
– file structures (sequential, indexed, etc.)
– rules for validating and manipulating data
• A DBMS provides data independence.
Data Independence
• It is the property of the database which tries to ensure that if we make a change at any level of the schema, the schema immediately above it requires minimal or no change.
• Example: Construction of Building
Data Independence
• One must be able to change storage mechanisms and formats without having to modify all application programmes. For example:
– method of representation of alphanumeric data (e.g., changing the date format to avoid the Year 2000 problem)
– method of representation of numeric data (e.g., integer vs. long integer or floating-point)
– units (e.g., metric)
– file structure (sequential, sorted, indexed, etc.)
Data independence can be classified into the following two types:
• 1. Physical Data Independence: when a change is made to the physical schema, the logical schema needs minimal or no change.
• 2. Logical Data Independence: when a change is made to the logical schema, the external schema needs minimal or no change.
Example for Physical Data Independence
• You have bought an Audio CD of a recently released film and one of your friends has bought an Audio Cassette of the same film. If we consider the physical schema, they are entirely different.
• The first is a digital recording on an optical medium, where random access is possible. The second is a magnetic recording on a magnetic medium, with strictly sequential access. What is interesting is how little of this difference shows up in the logical schema.
• For music tracks, the logical schema for both the CD and the Cassette is the title card imprinted on their back. We have information like Track no, Name of the Song, Name of the Artist and Duration of the Track, things which are identical for both the CD and the Cassette. We can clearly say that we have achieved the physical data independence here.
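The same idea can be tried on a real DBMS. In the sketch below (Python's sqlite3 module), adding an index changes how rows are located in storage, but the query written against the logical schema stays the same and returns the same rows. The table name, index name and track data are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE track (TrackNo INTEGER, Song TEXT, Artist TEXT, Duration INTEGER)")
conn.executemany(
    "INSERT INTO track VALUES (?, ?, ?, ?)",
    [(1, "Song A", "Artist X", 210), (2, "Song B", "Artist Y", 185), (3, "Song C", "Artist X", 240)],
)

# The application's query is written against the logical schema only.
query = "SELECT Song FROM track WHERE Artist = ? ORDER BY TrackNo"
before = conn.execute(query, ("Artist X",)).fetchall()

# Physical change: add an index, which alters how matching rows are found.
conn.execute("CREATE INDEX idx_track_artist ON track (Artist)")
after = conn.execute(query, ("Artist X",)).fetchall()

print(before == after)  # True
```

Like the CD and the cassette, the storage differs while the title card stays the same.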
Example for Logical Data Independence
• The CD you have bought contains 6 songs, and some of your friends are interested in copying some of those songs (the ones they like in the film) into their favorite collections.
• One friend wants the songs 1, 2, 4, 5, 6, another wants 1, 3, 4, 5 and another wants 1, 2, 3, 6. Each of these collections can be compared to a view schema for that friend.
• Now by some mistake, a scratch has appeared in the CD and you cannot extract the song 3.
• Obviously, you will have to ask the friends who have song 3 in their proposed collection to alter their view by deleting song 3 from their proposed collection as well.
Example usage of Database System
• Membership and subscription mailing lists
• Accounting and bookkeeping information
• The data obtained from scientific research
• Customer information and inventory information
• Personal records
• Library information
Examples of Business Databases
• Telephone book
• Student data
• Music
• Fingerprint database
• Dictionaries
• Customer data
• Real estate listings
• Hospital/patient data
• Inventory
– Barcode scanner keeps inventory in database
Database System Environment
Advantages of DBMS
1. Reduction of Redundancy
2. Sharing of Data
3. Data Integrity
4. Data Security
Disadvantages of DBMS
1. We have to invest a good amount in acquiring the hardware, software and installation facilities, and in training users.
2. We have to keep regular backups because a failure can occur at any time. Taking a backup is a lengthy process, and the computer system may not be able to perform other jobs during this time.
3. While the data security system is a benefit of using a DBMS, it must be very robust. If someone can bypass the security system, the database becomes open to any kind of mishandling.
Properties of databases
1. Completeness
2. Integrity
3. Flexibility
4. Efficiency
5. Usability
1. Completeness
• Ensures that users can access the data they want. This includes ad hoc queries, which would not be explicitly given as part of a statement of data requirements.
• The database has to support the requirements.
• It requires a complete understanding of the database structure, relationships and constraints.
2. Integrity
• Ensures that data is both consistent (no contradictory data) and correct (no invalid data), and ensures that users trust the database.
• Database integrity ensures that data entered into the database is accurate, valid, and consistent.
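A DBMS typically enforces integrity through declared constraints. The sketch below (Python's sqlite3 module) declares three such rules on a hypothetical student table and shows invalid rows being rejected. The table layout and the age range of 0 to 150 are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE student (
    Roll INTEGER PRIMARY KEY,                    -- no two students share a Roll
    Name TEXT NOT NULL,                          -- every student must have a name
    Age  INTEGER CHECK (Age BETWEEN 0 AND 150)   -- only plausible ages are valid
)
""")
conn.execute("INSERT INTO student VALUES (1, 'ABC', 19)")

bad_rows = [
    (1, "DEF", 22),   # contradictory: Roll 1 already exists
    (2, None, 22),    # incomplete: missing name
    (3, "XYZ", -5),   # invalid: negative age
]
rejected = []
for row in bad_rows:
    try:
        conn.execute("INSERT INTO student VALUES (?, ?, ?)", row)
    except sqlite3.IntegrityError as exc:
        rejected.append((row, str(exc)))

count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
print(len(rejected), "rows rejected;", count, "row stored")
```

Because the rules live in the database, every application that writes to the table is held to them.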
3. Flexibility
• Ensures that a database can evolve (without requiring excessive effort) to satisfy changing user requirements.
• The ability to upgrade or change the functionality of the database to meet current needs.
4. Efficiency
• Ensures that users do not have unduly long response times when accessing data.
• The database should be able to perform effectively.
• The designer has to choose the right DBMS and the right access paths in order to improve efficiency.
5. Usability
• Ensures that data can be accessed and manipulated in ways which match user requirements.
• The database design significantly impacts the quality and usability of the data.
• A poorly designed database may place the entire organization at risk due to the incomplete or incorrect information.
Client-Server Architecture of DBMS
Types of Data Model
• Object Oriented
• Entity Relationship
• Functional
• Semantic
• Network
• Hierarchical
• Relational
• Unify
• Frame Memory
Database organization / Database Model
Five models of database organization:
1. Flat (TABULAR)
2. Relational (RDBMS)
3. Hierarchical (HDBMS)
4. Object-oriented (OODBMS)
5. Network (NDBMS)
Flat databases
• These databases are the simplest and consist of single files stored in a tabular form. These files have no relationships that can be built with other files. Simple spreadsheet programs that allow users to input, query, and manipulate information within a single disk file are an example of a tabular database.
• A single kind of record with a fixed number of fields.
• Notice the repetition of data, and thus an increased chance of errors.
Relational Model
• The relational model (RDBMS, relational database management system): The data is stored in two-dimensional tables (rows and columns). The data is manipulated based on the mathematical theory of relations.
• For example, a data set containing all the real estate transactions in a town can be grouped by the year the transaction occurred; or it can be grouped by the sale price of the transaction; or it can be grouped by the buyer's last name; and so on.
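The real estate example can be sketched with SQL's GROUP BY clause, here run through Python's sqlite3 module. The table name sale, its columns and the four transactions are made-up sample data, not figures from any real town.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (Year INTEGER, Price INTEGER, BuyerLastName TEXT)")
conn.executemany(
    "INSERT INTO sale VALUES (?, ?, ?)",
    [(2014, 250000, "Rao"), (2014, 310000, "Singh"),
     (2015, 275000, "Rao"), (2015, 420000, "Mehta")],
)

# The same stored rows, grouped by the year of the transaction ...
by_year = conn.execute(
    "SELECT Year, COUNT(*), SUM(Price) FROM sale GROUP BY Year ORDER BY Year"
).fetchall()

# ... and grouped by the buyer's last name.
by_buyer = conn.execute(
    "SELECT BuyerLastName, COUNT(*) FROM sale GROUP BY BuyerLastName ORDER BY BuyerLastName"
).fetchall()

print(by_year)   # [(2014, 2, 560000), (2015, 2, 695000)]
print(by_buyer)  # [('Mehta', 1), ('Rao', 2), ('Singh', 1)]
```

The data is stored once; each grouping is just a different query over the same table.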
The Relational Data Model
Hierarchical Model
• These databases have a tree-like structure, with every node of the tree representing a different tabular file. Each file is related to the others through a link to the file above or below it. Lateral links between files are not allowed.
• The data is sorted hierarchically, using a downward tree. This model uses pointers to navigate between stored data. It was the first DBMS model.
Each of the boxes in the diagram represents one database. The top database in the hierarchical model is called the "parent" database. The databases under it are called "child" databases. One "parent" can have many "children," but a "child" can only have one "parent." The child databases are all connected to the parent database via links called "pointers."
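The parent/child rule can be sketched as a small tree structure in Python. Each node keeps exactly one parent pointer and any number of child pointers. The Node class and the node names below are hypothetical and chosen only to illustrate the shape.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str
    # A child has exactly one parent; the root has none.
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)
    # A parent may have many children.
    children: List["Node"] = field(default_factory=list, repr=False, compare=False)

    def add_child(self, name: str) -> "Node":
        child = Node(name, parent=self)
        self.children.append(child)
        return child

    def path(self) -> str:
        """Follow parent pointers up to the root, as a hierarchical DBMS navigates."""
        parts = []
        node: Optional[Node] = self
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return " / ".join(reversed(parts))


root = Node("Employee")
compensation = root.add_child("Compensation")
root.add_child("Job Assignments")
salary = compensation.add_child("Salary History")

print(salary.path())  # Employee / Compensation / Salary History
```

Reaching "Salary History" from "Job Assignments" means going up to the root and down again, which is the cost of having no lateral links.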
A Hierarchical Database for a Human Resources System
Network Model
• Like the hierarchical model, this model uses pointers toward stored data. However, it does not necessarily use a downward tree structure.
Network Model
• Depicts data logically as many-to-many relationships
Object Based Data Model / Object-oriented Database
• An object database (also object-oriented database) is a database model in which information is represented in the form of objects as used in object-oriented programming.
• The object oriented model defines a database in terms of objects, their properties, and their operations.
• The most common object-based data models are the entity-relationship, semantic, object-oriented, and functional data models.
The Object-Oriented Model
• The object model (ODBMS, object-oriented database management system)
• The data is stored in the form of objects. Objects are instances of structures called classes, and the classes define the fields that each object holds.
Physical Based Data Model
• Physical based data model describes how data is stored in the computer by representing information such as record formats, record orderings, and access paths.
• It is the process of choosing specific storage structures and access paths for the database files to achieve good performance for the various database applications.
TYPES OF DATABASE
• CENTRALIZED
• DISTRIBUTED
[Diagram: CENTRALIZED DATABASE. Several ATMs connected to one database server over telecom lines or a LAN.]
[Diagram: DISTRIBUTED DATABASE. Database servers and workstations in India, China and the USA, connected by a network.]
Distributed Database
• Databases can be decentralized either by partitioning or by replicating
• Partitioned database: Database is divided into segments or regions.
• For example, a customer database can be divided into Eastern customers and Western customers, and two separate databases maintained in the two regions.
• RAID
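The Eastern/Western customer example can be sketched in a few lines of Python. Two dictionaries stand in for the two regional databases, and the helper names save_customer and find_customer are hypothetical. A real distributed DBMS would add networking, replication and failure handling that this sketch leaves out.

```python
# Two in-memory stores standing in for two separate regional databases.
partitions = {"East": {}, "West": {}}


def save_customer(customer_id, name, region):
    """Store the record only in the partition for the customer's region."""
    partitions[region][customer_id] = name


def find_customer(customer_id):
    """Without knowing the region, every partition has to be asked."""
    for region, store in partitions.items():
        if customer_id in store:
            return region, store[customer_id]
    return None


save_customer(101, "ABC", "East")
save_customer(202, "DEF", "West")

print(find_customer(202))  # ('West', 'DEF')
print(find_customer(999))  # None
```

Each region holds only its own customers, so lookups that know the region touch one store, while lookups that do not must search both.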
DISTRIBUTED DATABASE CAN BE:
HOMOGENEOUS
HETEROGENEOUS
Common DBMS Products / Categories of DBMS
Server DBMS
• Oracle
• Microsoft SQL Server
• IBM DB2
• Open Source: MySQL, Firebird, PostgreSQL
Desktop DBMS
• Microsoft Access
• FoxPro
• Paradox
• Approach
• FileMaker Pro
• Lotus
Benefits of Desktop Database
• Easy Management
– Simple functionality to modify and maintain the database
• Low Running Cost
– No need for extra hardware support
– No need to hire experts
• Easy to use
– No advanced technical knowledge is needed
– Programs are normally very intuitive and easy to learn
Benefits of Server Database
• Increased Scalability
– Any element can be upgraded when needed
• Increased Flexibility
– New technology can be easily integrated into the system
• Increased Accessibility
– The server can be accessed remotely and across multiple platforms
• Increased Performance
– Different CPUs process applications in parallel
– Easier to tune the server machine, since its only task is database processing
• Increased Consistency
– Centralization: access, resources, and data security are controlled through the server
Things to Consider to Select DBMS
1. Data Model
2. Number of users
3. Number of sites
4. Cost
5. Purpose
Data Model
• A set of concepts to describe the structure of a database and certain constraints that the database should obey.
• Types of data model:
– Hierarchical
– Network
– Relational
– Object-oriented
• Most current commercial databases use the relational data model.
• Object-oriented: has been implemented but has not had widespread use.
Number of users
• Single user – supports only one user at a time
• Multi user – supports multiple users at a time
Number of sites
• Centralized
– Data is stored at a single computer site.
– The DBMS can support multiple users, but the DBMS and the database reside entirely at a single computer site.
• Distributed
– The actual database and the DBMS software can be distributed over many sites, connected by a computer network.
Cost
• It is difficult to recommend a DBMS on cost alone, because products at different prices provide different types of services.
• Open source products: MySQL, PostgreSQL
Purpose
• General Purpose
– Does not involve many transactions
• Special Purpose
– Requires many transactions
– When performance is the primary consideration, a special-purpose DBMS can be designed.
– Online Transaction Processing (OLTP) systems support a large number of concurrent transactions without imposing excessive delay.
– Example: Airline Reservation System
Comparison between DBMS

DBMS | Operating System | Estimated Price | Transaction Support | Interface | Max DB size
Oracle | Windows, Mac, Unix | $40000 - $12800 | Yes | GUI, SQL | Unlimited
IBM DB2 | Windows, Mac, Unix | $25000 - $800000 | Yes | GUI, SQL | 512 TB
SQL Server | Windows | | Yes | GUI, SQL | 524258 TB
MySQL | Windows, Linux, Mac, Solaris, Netware | Open Source | Yes | GUI, SQL | 256 TB
Microsoft Access | Windows | Package with Microsoft products | | GUI, SQL | 2 GB
Group Discussion
• You are responsible for selecting a new DBMS product for a group of users in your organization. How should you go about evaluating and selecting the best DBMS product?
The Impact of Information Technology on Work and Society
1969: The ARPANET is introduced, funded by the U.S. Department of Defense.
1970: The first automatic teller machine is introduced.
1971: The first single-chip central processing unit, the Intel 4004, is introduced. The first network e-mail message is sent by Ray Tomlinson of Bolt Beranek and Newman.
1972: Lexitron, Wang and VYTEC introduce word processing systems.
1973: The Xerox Palo Alto Research Center develops the Alto, an experimental computer that uses a graphical user interface and a mouse.
1978: Ron Rivest, Adi Shamir and Leonard Adleman introduce the RSA cipher as a public key cryptosystem.
1979: The first electronic spreadsheet program is introduced.
1981: IBM introduces its first personal computer, with an operating system developed by Microsoft.
1983: The switchover to the TCP/IP protocol marks the beginning of the global Internet.
1985: Microsoft releases the Windows operating system.
1989: The World Wide Web project is proposed to the European Council for Nuclear Research (CERN).
1990: Windows version 3.0 is released, bringing a stable graphical user interface to the IBM Personal Computer.
1993: NCSA Mosaic is developed by the National Center for Supercomputing Applications.
1995: Toy Story, the first full-length feature film created entirely by computer, is released.
Late 1990’s:The emergence of electronic commerce.
Contribution of Database Technology to Society
• Reduced Application Development Time
– Less time is needed to create a new application using a DBMS.
– Example: printing reports, retrieving data
• Flexibility
– Allows evolutionary changes to the structure of the database without affecting the stored data and existing applications.
• Availability of Up-to-Date Information
– Available to all users
– As soon as an update is applied, it is available to all users.
• Economy of Scale
– A DBMS can be shared among various departments and activities, thus reducing data redundancy.
HISTORY OF DBMS
• Database systems were first developed in the 1960’s.
• They were then mostly used for business applications with large amounts of structured data, typically in the banking, insurance, and airline industries.
• Today, virtually all large corporations use database systems to keep track of customers, suppliers, reservations, orders, deliveries, invoices, employees, etc. As database systems became more versatile, powerful, and user friendly, their use spread to a number of other areas.
• Examples include management information systems (MIS), decision support systems (DSS), ad hoc query systems, inventory control systems, point of sale (POS) systems, and more.
File Organization Terms and Concepts
• Bit: Smallest unit of data; binary digit (0,1)
• Byte: Group of bits that represents a single character
• Field: Group of words or a complete number
• Record: Group of related fields
• File: Group of records of same type
Data Storage

Name | Symbol | Binary | Number of Bytes | Equal to
Kilobyte | KB | 2^10 | 1,024 | 1024 B
Megabyte | MB | 2^20 | 1,048,576 | 1024 KB
Gigabyte | GB | 2^30 | 1,073,741,824 | 1024 MB
Terabyte | TB | 2^40 | 1,099,511,627,776 | 1024 GB
Petabyte | PB | 2^50 | 1,125,899,906,842,624 | 1024 TB
Exabyte | EB | 2^60 | 1,152,921,504,606,846,976 | 1024 PB
Zettabyte | ZB | 2^70 | 1,180,591,620,717,411,303,424 | 1024 EB
Yottabyte | YB | 2^80 | 1,208,925,819,614,629,174,706,176 | 1024 ZB
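The byte counts in the storage table follow one rule: each unit is 1024 (2^10) times the previous one. A short Python sketch can regenerate the exact figures, since Python integers do not lose precision at this size. The variable names are chosen here for illustration.

```python
units = [
    ("Kilobyte", "KB"), ("Megabyte", "MB"), ("Gigabyte", "GB"), ("Terabyte", "TB"),
    ("Petabyte", "PB"), ("Exabyte", "EB"), ("Zettabyte", "ZB"), ("Yottabyte", "YB"),
]

sizes = {}
for step, (name, symbol) in enumerate(units, start=1):
    exponent = 10 * step          # 10, 20, ..., 80
    sizes[name] = 2 ** exponent   # exact integer, no rounding
    print(f"{name:<10} {symbol}  2^{exponent:<2}  {sizes[name]:,}")
```

Spreadsheets that store these values as floating-point numbers round them after about 15 significant digits, which is a common source of trailing zeros in tables like this one.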
File Organization Terms and Concepts (Continued)
• Database: Group of related files
• Entity: Person, place, thing, event about which information is maintained
• Attribute: Description of a particular entity
• Key field: Identifier field used to retrieve, update, sort a record
The Data Hierarchy
Traditional File Processing
SQL
• SQL, or Structured Query Language, is a language designed for managing data in a relational database management system (RDBMS).
• Its scope includes inserting, updating and deleting data, as well as creating and modifying the database structure.
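The operations in SQL's scope can be seen side by side in a short session. The sketch below runs standard SQL statements through Python's sqlite3 module; the student table and its rows are sample data reused from earlier slides, and SQLite is chosen only because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Creation and modification of structure.
conn.execute("CREATE TABLE student (Roll INTEGER PRIMARY KEY, Name TEXT, Age INTEGER)")
conn.execute("ALTER TABLE student ADD COLUMN Address TEXT")

# Insert, update and delete of data.
conn.execute("INSERT INTO student (Roll, Name, Age) VALUES (1, 'ABC', 19)")
conn.execute("INSERT INTO student (Roll, Name, Age) VALUES (2, 'DEF', 22)")
conn.execute("UPDATE student SET Address = 'VA' WHERE Roll = 1")
conn.execute("DELETE FROM student WHERE Roll = 2")

# Retrieval.
rows = conn.execute("SELECT Roll, Name, Age, Address FROM student").fetchall()
print(rows)  # [(1, 'ABC', 19, 'VA')]
```

Each statement says what result is wanted, and the DBMS decides how to carry it out.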
Oracle
• Oracle DBMS is one of the most widely deployed DBMS products. It has proven itself in many of the largest data stores, supporting entire corporations.
• Oracle secures and protects the privacy of sensitive business information, manages an organization's internet content, and reduces analysis time by processing data faster.
MS-Access
• Microsoft Access is a relational database management system from Microsoft.
• MS Access stores data in its own format and can also import or link directly to data stored in other applications and databases.
dBase II
• dBase II, published by Ashton-Tate, was the first widely used database management system for microcomputers.
• For handling data, dBase II provided detailed procedural commands to open and traverse records in data files (USE, SKIP, GO TOP, GO BOTTOM, REPLACE, STORE).
Components of DBMS:
• Data definition language: Specifies content and structure of database and defines each data element
• Data manipulation language: Used to process data in a database
• Data dictionary: Stores definitions of data elements and data characteristics
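The three components can be observed in a small DBMS such as SQLite, shown here through Python's sqlite3 module. The hostel table reuses T4 from earlier. The sqlite_master catalog and the table_info pragma are SQLite's own way of exposing its data dictionary, and other DBMS products use different catalog names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Data definition language: specify the structure of the database.
conn.execute("CREATE TABLE hostel (Year TEXT PRIMARY KEY, Hostel TEXT NOT NULL)")

# Data manipulation language: process the data held in that structure.
conn.executemany("INSERT INTO hostel VALUES (?, ?)", [("I", "H1"), ("II", "H2")])

# Data dictionary: the DBMS stores the definitions themselves and can report them.
tables = [name for (name,) in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
)]
# Each table_info row is (cid, name, type, notnull, dflt_value, pk).
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(hostel)")]

print(tables)   # ['hostel']
print(columns)  # [('Year', 'TEXT'), ('Hostel', 'TEXT')]
```

The dictionary is queried like any other data, which is how tools discover a database's structure.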
Linking Internal Databases to the Web
SQL (Structured Query Language)
• SQL is a language that is used to manage data in relational databases. Microsoft SQL Server is a well-known example of a relational database system that uses SQL. It is used by applications to store and retrieve data, either on the same computer or over a network.
Basic features of SQL server
• A relational database is a set of tables containing data fitted into predefined categories.
• Each table contains one or more data categories in columns.
• Each row contains a unique instance of data for the categories defined by the columns.
• Users can access data from the database without knowing how the database tables are physically stored.
Limitations for SQL database
• Scalability: Users have to scale relational databases on powerful servers that are expensive and difficult to handle. To scale further, a relational database has to be distributed across multiple servers, and handling tables across different servers is difficult.
• Complexity: In SQL databases, data has to fit into tables. If your data does not fit naturally into tables, you need to design a database structure that will be complex and, again, difficult to handle.
NoSQL database
• In the past few years, the "one size fits all" thinking concerning data stores has been questioned by both science and web companies, which has led to the emergence of a great variety of alternative databases. The movement, as well as the new data stores, is commonly subsumed under the term NoSQL.
• The basic quality of NoSQL databases is that they may not require fixed table schemas, usually avoid join operations, and typically scale horizontally. Academic researchers typically refer to these databases as structured storage, a term that includes classic relational databases as a subset.
• NoSQL databases also trade off "ACID" guarantees (atomicity, consistency, isolation and durability). To varying degrees, they even allow the schema of the data to differ from record to record. If there is no schema or table in NoSQL, then how do you visualize the database structure? The following link discusses this:
• http://www.thewindowsclub.com/difference-sql-nosql-comparision
Q) Problems with the Traditional File Environment
Q) Difference Between DBMS & RDBMS
Q) Characteristics of DBMS
Q) Disadvantages of Hierarchical and Network DBMS
Q) ACID Properties of DBMS
Q) Difference between SQL and NoSQL
Class Discussion
Thank You!
My areas of interest are IT Project Management, Big Data Analytics, Internet & Web Technology, Social Media Marketing, Management Information Systems, Database Management Systems, Networking, Advanced Excel with Visual Basic, and Decision Support Systems.
Feel free to contact me at:
Email: [email protected]
LinkedIn: https://in.linkedin.com/in/drkamalgulati
Dr. Kamal Gulati
Ph.D., M.Sc. (CS), M.C.A., M.B.A.
Certification in Big Data (Wiley), CCNA, MCP, Brainbench (Windows)