Unit - IV
Application Structure
Structure of DBMS:
DBMS (Database Management System) acts as an interface between the user and the database.
The user requests the DBMS to perform various operations such as insert, delete, update and
retrieval on the database.
The components of DBMS perform these requested operations on the database and provide
necessary data to the users.
Applications: - An application can be considered a user-friendly web page where the user enters
requests. The user simply enters the required details and presses buttons to get the
data.
End User: - They are the real users of the database. They can be developers, designers,
administrators or the actual users of the database.
DDL: - Data Definition Language (DDL) is the set of commands used to create the database, schemas,
tables, mappings etc. in the database. These commands create
objects such as tables and indexes in the database for the first time. In other words, they create
the structure of the database.
DDL Compiler: - This part of the database is responsible for processing the DDL
commands. That means this compiler breaks down each command into
machine-understandable code. It is also responsible for storing the metadata,
such as table name, space used by the table, number of columns in it, mapping
information etc.
DML Compiler: - When the user inserts, deletes, updates or retrieves a record from the
database, the request is sent in a form the user understands, for example by pressing some buttons.
But for the database to understand and act on the request, it must be broken down into
object code. This is done by the DML compiler. One can compare this to the way a question asked
of a person is broken down into signals that reach the brain.
Query Optimizer: - When a user fires a request, the user is not concerned with how it will be
executed on the database and need not be aware of the database or how it performs. But
whatever the request, it should fetch, insert, update or delete
the data efficiently. The query optimizer decides the best way to execute the
user request, which it receives from the DML compiler. It is similar to selecting the
best nerve to carry the signals to the brain.
Stored Data Manager: - This is also known as the Database Control System. It is one of the
main central systems of the database. It is responsible for various tasks:
o It converts the requests received from the query optimizer to a machine-understandable
form and makes the actual request inside the database. It is like
fetching the exact part of the brain needed to answer.
o It helps to maintain consistency and integrity by applying the constraints. For
example, it does not allow deleting a record, or updating its key, while child
entries still refer to it. Similarly, it does not allow entering a duplicate value into a
column that must be unique.
o It controls concurrent access. If multiple users access the database at
the same time, it makes sure all of them see correct data. It guarantees that
no data loss or data mismatch happens between the transactions of
multiple users.
o It helps to back up the database and recover data whenever required. Because the
database is huge, reverting the changes after an unexpected transaction failure
is not easy. It maintains a backup of all data, so that it
can be recovered.
Data Files: - These hold the real data. They can be stored on magnetic tapes, magnetic
disks or optical disks.
Compiled DML: - Some of the processed DML statements (insert, update, delete) are
stored here so that they can be re-used if similar requests arrive.
Data Dictionary: - It contains all the information about the database. As the name
suggests, it is the dictionary of all the data items. It contains descriptions of all the
tables, views, materialized views, constraints, indexes, triggers etc.
User Interfaces
A user interface is the view of a database interface that is seen by the user. User interfaces are
often graphical or at least partly graphical (GUI - graphical user interface) and offer
tools which make the interaction with the database easier.
Form-based interfaces
This interface consists of forms which are adapted to the user. The user can fill in all of the fields
to make new entries in the database, or only some of the fields to query the other ones. However,
some operations might be restricted by the application.
Form-based user interfaces are widespread and are a very important means of interacting with a
DBMS. They are easy to use and have the advantage that the user does not need special
knowledge about database languages like SQL.
Text-based interfaces
For database administration and for other professional users, it is possible to
communicate with the DBMS directly in the query language (in code form) via an input/output
window.
We will see this possibility later in the lesson Structured Query Language SQL.
Text-based interfaces are very powerful tools and allow a comprehensive interaction with a
DBMS. However, the use of these is based on active knowledge of the respective database
language.
GIS Interface
A GIS user interface often integrates features of a database interface. The database interaction
takes place through the combination of different interfaces:
Graphical interaction via a selection on the map
Combination of form-based and text-based interaction (e.g. special Query-Wizards for the easier
creation of database queries)
Transaction
A transaction can be defined as a group of tasks. A single task is the minimum processing unit
which cannot be divided further.
Let’s take an example of a simple transaction. Suppose a bank employee transfers Rs 500 from
A's account to B's account. This very simple and small transaction involves several low-level
tasks.
A’s Account
Open_Account(A)
Old_Balance = A.balance
New_Balance = Old_Balance - 500
A.balance = New_Balance
Close_Account(A)
B’s Account
Open_Account(B)
Old_Balance = B.balance
New_Balance = Old_Balance + 500
B.balance = New_Balance
Close_Account(B)
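The low-level tasks above can be sketched as one database transaction. This is a minimal illustration using Python's built-in sqlite3 module, not the way any particular banking system works; the table name account, its columns, and the starting balances are assumptions made up for the example.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from account `src` to account `dst` as one transaction."""
    # `with conn` commits if the block succeeds and rolls back if it raises.
    with conn:
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src))
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst))

def balances(conn):
    """Return {account name: balance} for every account."""
    return dict(conn.execute("SELECT name, balance FROM account"))

# Hypothetical schema and starting balances, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 1000), ("B", 200)])
conn.commit()

transfer(conn, "A", "B", 500)
print(balances(conn))  # {'A': 500, 'B': 700}
```

Both UPDATE statements belong to the same transaction, so the debit from A and the credit to B become visible together.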
ACID Properties
A transaction is a very small unit of a program and it may contain several low-level tasks. A
transaction in a database system must maintain Atomicity, Consistency, Isolation, and Durability
− commonly known as ACID properties − in order to ensure accuracy, completeness, and data
integrity.
Atomicity − This property states that a transaction must be treated as an atomic unit, that is,
either all of its operations are executed or none. There must be no state in a database where a
transaction is left partially completed. States should be defined either before the execution of the
transaction or after the execution/abortion/failure of the transaction.
Consistency − The database must remain in a consistent state after any transaction. No
transaction should have any adverse effect on the data residing in the database. If the database
was in a consistent state before the execution of a transaction, it must remain consistent after the
execution of the transaction as well.
Durability − The database should be durable enough to hold all its latest updates even if the
system fails or restarts. If a transaction updates a chunk of data in a database and commits, then
the database will hold the modified data. If a transaction commits but the system fails before the
data could be written on to the disk, then that data will be updated once the system springs back
into action.
Isolation − In a database system where more than one transaction is being executed
simultaneously and in parallel, the property of isolation states that each transaction will be
carried out and executed as if it is the only transaction in the system. No transaction will affect
the execution of any other transaction.
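Atomicity and consistency can be observed with a small experiment. The sketch below again uses Python's sqlite3 module with a hypothetical account table; the CHECK constraint stands in for a consistency rule ("no negative balance"), and the transfer deliberately credits the receiver first so that the failure happens after one write has already been made.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"  # consistency rule (assumed)
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 300), ("B", 200)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Return True if the transfer committed, False if it was rolled back."""
    try:
        with conn:  # one transaction: commit on success, rollback on error
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst))
            # This debit violates the CHECK constraint when funds are insufficient.
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

ok = transfer(conn, "A", "B", 500)  # A only has 300
after = dict(conn.execute("SELECT name, balance FROM account"))
print(ok, after)  # False {'A': 300, 'B': 200}
```

The credit to B was executed but never committed, so the database shows either the whole transfer or none of it.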
States of Transactions
A transaction in a database can be in one of the following states –
Active − In this state, the transaction is being executed. This is the initial state of every
transaction.
Partially Committed − When a transaction executes its final operation, it is said to be in a
partially committed state.
Failed − A transaction is said to be in a failed state if any of the checks made by the database
recovery system fails. A failed transaction can no longer proceed further.
Aborted − If any of the checks fails and the transaction has reached a failed state, then the
recovery manager rolls back all its write operations on the database to bring the database back to
its original state where it was prior to the execution of the transaction. Transactions in this state
are called aborted. The database recovery module can select one of the two operations after a
transaction aborts −
Re-start the transaction
Kill the transaction
Committed − If a transaction executes all its operations successfully, it is said to be committed.
All its effects are now permanently established on the database system.
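The states above form a small state machine. The following Python sketch is one way to model it; the transition table is an assumption based on the descriptions above (including the common textbook convention that a partially committed transaction can still fail), and the class and function names are invented for the example.

```python
from enum import Enum

class State(Enum):
    ACTIVE = "active"
    PARTIALLY_COMMITTED = "partially committed"
    FAILED = "failed"
    ABORTED = "aborted"
    COMMITTED = "committed"

# Allowed transitions (assumed from the state descriptions above).
ALLOWED = {
    State.ACTIVE: {State.PARTIALLY_COMMITTED, State.FAILED},
    State.PARTIALLY_COMMITTED: {State.COMMITTED, State.FAILED},
    State.FAILED: {State.ABORTED},
    State.ABORTED: set(),     # a restart is modelled as a brand-new Transaction
    State.COMMITTED: set(),
}

class Transaction:
    def __init__(self):
        self.state = State.ACTIVE          # every transaction starts as active
        self.history = [State.ACTIVE]

    def move_to(self, new_state):
        """Change state, rejecting transitions the table does not allow."""
        if new_state not in ALLOWED[self.state]:
            raise ValueError(
                f"illegal transition {self.state.name} -> {new_state.name}")
        self.state = new_state
        self.history.append(new_state)

good = Transaction()
good.move_to(State.PARTIALLY_COMMITTED)
good.move_to(State.COMMITTED)

bad = Transaction()
bad.move_to(State.FAILED)
bad.move_to(State.ABORTED)   # recovery manager has rolled back the writes
```

Restarting an aborted transaction is represented here by creating a new Transaction object, which starts again in the active state.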
Form Events
Events are things that happen to objects, such as the clicking of a command button or the
opening and closing of a form, etc. These events trigger an action to be carried out.
Events occur for forms when you open or close a form, move between forms, or work with data
on a form. An easy way to view the available Events for a Form (Control or Report) is to open
the properties window of the form.
For example, in Microsoft Access, when you open a form the following events occur in the
following order:
Open
Load
Resize
Activate
(GotFocus)
Current
GotFocus is shown in parentheses because it is conditional. If there is no active control on a
form, the focus is on the form itself rather than on a control, and the form's GotFocus event will
occur. If there is an active control, the focus goes to that control and the form's GotFocus event
will not occur.
On Open Event
The Open Event occurs when a form is opened, but before the first record is displayed.
Therefore, attach code here that you wish to run as soon as the form is opened. The Open event
will not occur when you activate (move to a previously opened form), i.e. if you open a second
form from the first form, then close the second form, the first form’s On Open Event will not
occur as the first form has not been closed and re-opened, it has just been hidden behind the
second form.
If the Form is based on a Query, the Query is run prior to the On Open Event.
On Load Event
Whereas the On Open Event occurs when the form is opened and before the first record is
displayed, the On Load Event occurs when the first record is displayed.
On Resize Event
The Resize Event occurs when a form is opened or whenever the form’s size changes.
On Activate Event
The Activate Event occurs when a form receives the focus and becomes the active window.
You make a form active by: Opening it; Clicking on form with your mouse, or clicking a control
on the form; Invoking the SetFocus method. The On Activate event can only occur if the form is
visible. If the form is not visible an error will occur.
On GotFocus Event
This rarely used event occurs when the form gets focus, but only if
there are no visible, enabled controls on the form, which is unlikely.
On Current Event
The On Current event occurs when the focus moves to a new or different record making it the
current record, or when the Form is Refreshed or Requeried.
This Event occurs when a form is opened, whenever the focus leaves one record and moves to
another, and when the Form’s underlying Table or Query is requeried. This event is one of the
more commonly used Events. If you wish to run code whenever a record is displayed, this is the
place to put it.
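The order of events described above can be mimicked with a toy model. The Python sketch below is not Microsoft Access code and does not use any Access API; the Form class, its methods, and the handler mechanism are hypothetical and only show how handlers attached to events would run in the documented order.

```python
class Form:
    """Toy model of the events a form raises when it opens (not real Access code)."""

    OPEN_SEQUENCE = ["Open", "Load", "Resize", "Activate", "GotFocus", "Current"]

    def __init__(self, has_enabled_controls=True):
        self.has_enabled_controls = has_enabled_controls
        self.handlers = {}   # event name -> function to call
        self.log = []        # every event raised, in order
        self.record = 1

    def on(self, event, handler):
        """Attach code to an event, like writing an event procedure."""
        self.handlers[event] = handler

    def _fire(self, event):
        self.log.append(event)
        handler = self.handlers.get(event)
        if handler:
            handler(self)

    def open(self):
        for event in self.OPEN_SEQUENCE:
            # The form's GotFocus is skipped when a control can take the focus.
            if event == "GotFocus" and self.has_enabled_controls:
                continue
            self._fire(event)

    def move_to_record(self, number):
        self.record = number
        self._fire("Current")   # Current runs whenever a record is displayed

shown = []
form = Form()
form.on("Current", lambda f: shown.append(f.record))
form.open()
form.move_to_record(2)
print(form.log)  # ['Open', 'Load', 'Resize', 'Activate', 'Current', 'Current']
```

The handler attached to Current runs once when the form opens and again when the focus moves to the second record, which is why that event is the usual place for per-record code.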
4.6 Custom Reports
Reports are typically printed on paper, but they are increasingly being created for
direct display on the screen.
Reports can be used to format the data and present results from complex analysis.
4.6.1 VB.NET Crystal Reports for Beginners
Open Visual Studio .NET and select a new Visual Basic .NET Project.
Create a new Crystal Report for the Product table from the database
crystalDB.
The Product table has three fields (Product_id, Product_name, Product_price) and
we are showing the whole table data in the Crystal Report.
From the main menu in Visual Studio select PROJECT --> Add New Item. The
Add New Item dialogue will appear; select Crystal Reports from the dialogue
box.
Select the report type from the Crystal Reports gallery.
Accept the default settings and click OK.
The next step is to select the appropriate connection to your database. Here we are
going to select an OLE DB connection for SQL Server.
Select OLE DB (ADO) from Create New Connection.
Select Microsoft OLE DB Provider for SQL Server.
The next screen is the SQL Server authentication screen. Select your SQL Server name,
enter the user ID and password, and select your database name. Click Next. The screen then shows
the OLE DB property values; leave them as they are and click Finish.
Your server name now appears under OLE DB Connection. From there select the database name
(crystalDB) and click Tables to see all the tables in your database.
From the tables list, move the Product table to the right-side list.
Click the Next button.
Move all fields from the Product table to the right-side list.
Click the Finish button. The Crystal Reports designer window then appears. You can arrange
the design according to your requirements. Your screen will look like the following picture.
Now the designing part is over and the next step is to call the created Crystal Report in VB.NET
through the CrystalReportViewer control.
Select the default form (Form1.vb) you created in VB.NET and drag a button
and a CrystalReportViewer control onto your form.
Open the form's source code view and put the following line at the top:
Imports CrystalDecisions.CrystalReports.Engine
Put the following source code in the button click event:
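Imports CrystalDecisions.CrystalReports.Engine

Public Class Form1
    ' Runs when Button1 is clicked: loads the report and shows it in the viewer.
    Private Sub Button1_Click(ByVal sender As System.Object,
            ByVal e As System.EventArgs) Handles Button1.Click
        Dim cryRpt As New ReportDocument
        ' Replace the placeholder with the full path to your .rpt file.
        cryRpt.Load("PUT CRYSTAL REPORT PATH HERE\CrystalReport1.rpt")
        ' Bind the loaded report to the viewer control and redraw it.
        CrystalReportViewer1.ReportSource = cryRpt
        CrystalReportViewer1.Refresh()
    End Sub
End Class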
NOTES:
cryRpt.Load("PUT CRYSTAL REPORT PATH HERE\CrystalReport1.rpt")
The Crystal Report file is in your project location; there you can see CrystalReport1.rpt. Give
the full path name of the report here.
After you run the source code you will get the report like this.
4.7 DISTRIBUTED APPLICATIONS
Distributed applications (distributed apps) are applications or software that run on
multiple computers within a network at the same time and can be stored on servers or
with cloud computing.
These applications interact in order to achieve a specific goal or task.
Traditional applications relied on a single system to run them.
Even in the client-server model, the application software had to run on either the
client, or the server that the client was accessing.
Unlike traditional applications, distributed applications run on multiple systems
simultaneously for a single task or job.
With distributed applications, if a node that is running a particular application goes
down, another node can resume the task.
4.7.1 DISTRIBUTED APPLICATIONS PARADIGM
Distributed applications are broken up into two separate programs:
1. client software
2. server software.
The client software or computer accesses the data from the server
or cloud environment, while the server or cloud processes the data.
Cloud computing can be used instead of servers or hardware to process a distributed
application's data or programs.
4.7.2 Uses of distributed Applications
Distributed applications allow multiple users to access the apps at once.
Many developers, IT professionals or enterprises choose to store distributed apps in
the cloud because of cloud's elasticity and scalability, as well as its ability to handle
large applications or workloads.
4.7.3 Benefits of a Distributed Application
Scalability
The load an application can sustain can be increased by:
- placing extra server processes in a group;
- adding machines to the application and redistributing the groups across the
machines;
- replicating a group onto other machines within the application and using load
balancing;
- segmenting a database and using data-dependent routing to reach the groups
dealing with these separate database segments.
Ease of development/maintainability
- The separation of the business application logic into services or components that
communicate through well-defined messages or interfaces allows both
development and maintenance to be similarly separated and so simplified.
Resilience
- When multiple machines are in use and one fails, the remainder can continue
operation.
- Similarly, when multiple server processes are within a group and one fails, the
others are present to perform work.
- Finally, if a machine should break, but there are multiple machines within the
application, these other machines can be used to perform the work of the
application.
Coordination of autonomous actions
- If you have separate applications, you can coordinate autonomous actions among
the applications.
- You can coordinate autonomous actions as a single logical unit of work.
- Autonomous actions are actions that involve multiple server groups and/or
multiple resource manager interfaces.
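Two of the benefits listed above, data-dependent routing (under Scalability) and resilience, can be sketched in a few lines of Python. Everything here is hypothetical: the server group names, the use of a customer ID as the routing key, and the assumption that the next group in the list holds a replica of the data and can take over.

```python
import zlib

SERVER_GROUPS = ["group-0", "group-1", "group-2"]   # hypothetical group names

def route(customer_id, groups=SERVER_GROUPS):
    """Data-dependent routing: the key decides which group owns the data."""
    # crc32 is stable across runs, unlike Python's built-in hash() for strings.
    return groups[zlib.crc32(customer_id.encode("utf-8")) % len(groups)]

def route_with_failover(customer_id, alive, groups=SERVER_GROUPS):
    """Try the owning group first, then the following groups in order.

    Assumes the other groups hold replicas of the owner's data segment.
    """
    start = groups.index(route(customer_id, groups))
    for offset in range(len(groups)):
        candidate = groups[(start + offset) % len(groups)]
        if candidate in alive:
            return candidate
    raise RuntimeError("no server group available")

owner = route("cust-42")
survivors = set(SERVER_GROUPS) - {owner}          # simulate the owner failing
fallback = route_with_failover("cust-42", survivors)
```

The same key always reaches the same group while that group is up, and a request is still served by another group when the owner fails.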
4.7.4 Characteristics of Distributing an Application
A distributed application has the following characteristics:
Enlarges the client and/or server model
Establishes multiple server groups
Allows data-dependent partitioning of data
Enables management of multiple resources
Supports a networked model
4.7.5 REAL TIME EXAMPLES OF DISTRIBUTED APPLICATIONS
For example,
1. In an e-commerce platform, each of the computers may be responsible for specific tasks
such as:
- sending and receiving emails about special offers to current customers;
- compiling a list of customers and their purchase history to better target products
to them;
- updating the customer list with new customers who have registered with the
online market;
- accepting product reviews from each patron for future product decision-making;
- accepting various payment methods at checkout;
- answering customers’ questions online, whether through a person behind the computer
or a chatbot; etc.
Each of these tasks will be carried out by one or more systems on the network, but all
systems communicate with each other to ensure that the customer buys and receives the product
that is beneficial to him or her.
2. In the cryptoeconomy, the blockchain used by most cryptocurrencies uses Distributed
Apps to maintain an efficient digital marketplace.
- Rather than the conventional client-server network adopted by most centralized
organizations, blockchains run on a peer-to-peer network where transactional
information carried out between two parties is recorded and shared across
multiple computers on the network.
- These computers are referred to as nodes.
- Each node acts as an administrator in the Bitcoin markets and joins the network
voluntarily for the opportunity to receive Bitcoins as a reward.
- Each node has a duplicate copy of an original transaction, which gets continually
reconciled by the network.
- So whatever entry that node A has on its record for a Bitcoin transaction between
Jane and John cannot differ from what nodes B, C, D, E, and F have.
- This means that since a version of events can be verified against different
computers, a hacker who gets into one system to tweak the
transaction would also need to get into most of the other systems spread across various
geographical locations to corrupt the recorded data. This feat is practically infeasible,
which is what makes the Bitcoin blockchain transparent and hard to corrupt.
- Also, by storing blocks of information across various nodes on a blockchain
network, the blockchain cannot be brought to ruins by the failure of one system.
When a computer or system fails, the other systems act as backups and keep
running regardless of the down system.
- Once the active nodes have received and verified a transaction as valid,
the block (i.e. a group of verified transactions) is added to the chain (i.e. the general ledger) for
public access.
- The ability of all nodes to keep functioning even when one or two nodes drop out
of the network, ensures that users are constantly getting their transactions
recorded and confirmed in an uninterrupted and timely manner.
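The idea that every node holds a copy of the ledger and that a tampered copy stands out can be illustrated with a simple hash chain. The Python sketch below is a teaching toy, not how Bitcoin is implemented: it leaves out mining, consensus, signatures and networking, and the block layout and function names are assumptions made for the example.

```python
import copy
import hashlib
import json

GENESIS = "0" * 64   # placeholder "previous hash" for the first block

def block_hash(index, transactions, prev_hash):
    """SHA-256 over the block contents and the previous block's hash."""
    payload = json.dumps([index, transactions, prev_hash])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def make_chain(batches):
    """Build a chain where each block stores the hash of the block before it."""
    chain, prev = [], GENESIS
    for index, transactions in enumerate(batches):
        digest = block_hash(index, transactions, prev)
        chain.append({"index": index, "transactions": transactions,
                      "prev_hash": prev, "hash": digest})
        prev = digest
    return chain

def is_valid(chain):
    """Recompute every hash and check that the links between blocks hold."""
    prev = GENESIS
    for block in chain:
        expected = block_hash(block["index"], block["transactions"],
                              block["prev_hash"])
        if block["prev_hash"] != prev or block["hash"] != expected:
            return False
        prev = block["hash"]
    return True

node_a = make_chain([["Jane pays John 1"], ["John pays Ann 2"]])
node_b = copy.deepcopy(node_a)                          # another node's copy
node_a[0]["transactions"][0] = "Jane pays John 100"     # tampering on node A

print(is_valid(node_a), is_valid(node_b))  # False True
```

Node A's altered copy no longer matches its own hashes or node B's copy, so the change is detectable by comparing with the other nodes.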
Data Storage Methods
Databases are stored in file formats, which contain records. At physical level, the actual data is
stored in electromagnetic format on some device. These storage devices can be broadly
categorized into three types −
Primary Storage − The memory storage that is directly accessible to the CPU comes
under this category. CPU's internal memory (registers), fast memory (cache), and main
memory (RAM) are directly accessible to the CPU, as they are all placed on the
motherboard or CPU chipset. This storage is typically very small, ultra-fast, and volatile.
Primary storage requires continuous power supply in order to maintain its state. In case
of a power failure, all its data is lost.
Secondary Storage − Secondary storage devices are used to store data for future use or
as backup. Secondary storage includes memory devices that are not a part of the CPU
chipset or motherboard, for example, magnetic disks, optical disks (DVD, CD, etc.),
hard disks, flash drives, and magnetic tapes.
Tertiary Storage − Tertiary storage is used to store huge volumes of data. Since such
storage devices are external to the computer system, they are the slowest in speed. These
storage devices are mostly used to take the back up of an entire system. Optical disks
and magnetic tapes are widely used as tertiary storage.
Memory Hierarchy
A computer system has a well-defined hierarchy of memory. A CPU has direct access to its main
memory as well as its inbuilt registers. Main memory is much slower than
the CPU. To minimize this speed mismatch, cache memory is introduced. Cache
memory is faster than main memory and it contains the data that is most frequently accessed by
the CPU.
The memory with the fastest access is the costliest one. Larger storage devices are slower
and less expensive per unit of storage; however, they can store huge volumes of data compared to CPU
registers or cache memory.
Magnetic Disks
Hard disk drives are the most common secondary storage devices in present computer systems.
These are called magnetic disks because they use the concept of magnetization to store
information. Hard disks consist of metal disks coated with magnetizable material. These disks
are placed vertically on a spindle. A read/write head moves in between the disks and is used to
magnetize or de-magnetize the spot under it. A magnetized spot can be recognized as 0 (zero) or
1 (one).
Hard disks are formatted in a well-defined order to store data efficiently. A hard disk plate has
many concentric circles on it, called tracks. Every track is further divided into sectors. A sector
on a hard disk typically stores 512 bytes of data.
Redundant Array of Independent Disks
RAID, or Redundant Array of Independent Disks, is a technology to connect multiple secondary
storage devices and use them as a single storage medium.
RAID consists of an array of disks in which multiple disks are connected together to achieve
different goals. RAID levels define the use of disk arrays.
RAID 0
In this level, a striped array of disks is implemented. The data is broken down into blocks and
the blocks are distributed among disks. Each disk receives a block of data to write/read in
parallel. It enhances the speed and performance of the storage device. There is no parity and
backup in Level 0.
RAID 1
RAID 1 uses mirroring techniques. When data is sent to a RAID controller, it sends a copy of
data to all the disks in the array. RAID level 1 is also called mirroring and provides 100%
redundancy in case of a failure.
RAID 2
RAID 2 records an Error Correction Code (ECC), based on Hamming codes, for its data, which is
striped across different disks. As in level 0, the data is striped, but here at the bit level: each
data bit in a word is recorded on a separate disk, and the ECC codes of the
data words are stored on a different set of disks. Due to its complex structure and high cost, RAID
2 is not commercially available.
RAID 3
RAID 3 stripes the data onto multiple disks. The parity bit generated for each data word is stored on
a different disk. This technique allows it to survive a single disk failure.
RAID 4
In this level, an entire block of data is written onto data disks and then the parity is generated
and stored on a different disk. Note that level 3 uses byte-level striping, whereas level 4 uses
block-level striping. Both level 3 and level 4 require at least three disks to implement RAID.
RAID 5
RAID 5 writes whole data blocks onto different disks, but the parity bits generated for each data
block stripe are distributed among all the data disks rather than stored on a separate
dedicated disk.
RAID 6
RAID 6 is an extension of level 5. In this level, two independent parities are generated and
stored in distributed fashion among multiple disks. Two parities provide additional fault
tolerance. This level requires at least four disk drives to implement RAID.
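Striping and parity can be demonstrated on byte strings. The Python sketch below is a simplified model, not a RAID implementation: the stripe function deals blocks round-robin as in level 0, and xor_blocks computes the XOR parity that parity-based levels such as 3, 4 and 5 rely on to rebuild one lost block. The block contents and function names are made up for the example.

```python
from functools import reduce

def stripe(data, n_disks, block_size):
    """RAID 0 style striping: deal fixed-size blocks round-robin onto disks."""
    disks = [[] for _ in range(n_disks)]
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    for i, block in enumerate(blocks):
        disks[i % n_disks].append(block)
    return disks

def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks (the parity calculation)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Striping: four 2-byte blocks spread over two disks.
disks = stripe(b"abcdefgh", n_disks=2, block_size=2)

# Parity: three data blocks plus one parity block form a stripe.
data_blocks = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data_blocks)

# Suppose the disk holding the second block fails: XOR the survivors
# with the parity block to rebuild the missing data.
rebuilt = xor_blocks([data_blocks[0], data_blocks[2], parity])
print(rebuilt)  # b'BBBB'
```

XOR parity can rebuild exactly one missing block per stripe, which is why single-parity levels tolerate one disk failure and level 6 adds a second, independent parity.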
Data Clustering and Partitioning
Cluster is a group of objects that belongs to the same class. In other words, similar objects are
grouped in one cluster and dissimilar objects are grouped in another cluster.
What is Clustering?
Clustering is the process of making a group of abstract objects into classes of similar objects.
Requirements of Clustering in Data Mining
The following points throw light on why clustering is required in data mining −
Scalability − We need highly scalable clustering algorithms to deal with large databases.
Ability to deal with different kinds of attributes − Algorithms should be applicable to
any kind of data such as interval-based (numerical) data, categorical, and binary data.
Discovery of clusters with arbitrary shape − The clustering algorithm should be capable of
detecting clusters of arbitrary shape. It should not be limited to distance measures that
tend to find only spherical clusters of small size.
High dimensionality − The clustering algorithm should not only be able to handle low-
dimensional data but also the high dimensional space.
Ability to deal with noisy data − Databases contain noisy, missing or erroneous data. Some
algorithms are sensitive to such data and may lead to poor quality clusters.
Interpretability − The clustering results should be interpretable, comprehensible, and usable.
Clustering Methods
Clustering methods can be classified into the following categories −
Partitioning Method
Hierarchical Method
Density-based Method
Grid-Based Method
Model-Based Method
Constraint-based Method
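As an illustration of the partitioning method, here is a minimal k-means sketch in plain Python (it needs Python 3.8 or later for math.dist). It is a simplified version for teaching: it uses the first k points as the initial centroids, which is an assumption made to keep the example deterministic, whereas practical implementations choose starting points more carefully. The sample points are invented.

```python
import math

def kmeans(points, k, iterations=10):
    """Group points into k clusters by repeatedly moving centroids to cluster means."""
    centroids = list(points[:k])        # naive, deterministic initialisation
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[i])   # keep an empty cluster's centroid
        if new_centroids == centroids:               # converged
            break
        centroids = new_centroids
    return centroids, clusters

points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.0), (1.0, 0.5), (9.0, 8.5)]
centroids, clusters = kmeans(points, k=2)
```

With these points the algorithm separates the three points near (1, 1) from the three points near (9, 9). Because it relies on distance to a centroid, it tends to find roughly spherical clusters, which is the limitation noted in the requirements above.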
Partitioning
Partitioning is the database process where very large tables are divided into multiple smaller parts. By splitting
a large table into smaller, individual tables, queries that access only a fraction of the data can run faster
because there is less data to scan. The main goal of partitioning is to aid in maintenance of large tables and
to reduce the overall response time to read and load data for particular SQL operations.
Vertical Partitioning on SQL Server tables
Vertical table partitioning is mostly used to increase SQL Server performance, especially in cases where a
query retrieves all columns from a table that contains a number of very wide text or BLOB columns. In this
case, to reduce access times, the BLOB columns can be split off into their own table. Another example is to restrict
access to sensitive data, e.g. passwords, salary information etc. Vertical partitioning splits a table into two or
more tables containing different columns:
Example
-- Table with one very wide column (ReportDescription), a candidate for vertical partitioning
CREATE TABLE EmployeeReports
(
    ReportID int IDENTITY (1,1) NOT NULL,
    ReportName varchar (100),
    ReportNumber varchar (20),
    ReportDescription varchar (max),
    CONSTRAINT EReport_PK PRIMARY KEY CLUSTERED (ReportID)
)

-- Fill the table with 99,999 sample rows inside a single transaction
DECLARE @i int
SET @i = 1
BEGIN TRAN
WHILE @i < 100000
BEGIN
    INSERT INTO EmployeeReports (ReportName, ReportNumber, ReportDescription)
    VALUES ('ReportName', CONVERT (varchar (20), @i), REPLICATE ('Report', 1000))
    SET @i = @i + 1
END
COMMIT TRAN
GO
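The listing above only creates and fills the wide table. The split itself can be sketched as follows, using Python's sqlite3 module rather than SQL Server so that the example is self-contained. The table names ReportsData and ReportsDesc are hypothetical, the row count is reduced to 100, and SQLite's types stand in for the SQL Server ones.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The original wide table, filled with a small sample (100 rows).
conn.execute("""
    CREATE TABLE EmployeeReports (
        ReportID INTEGER PRIMARY KEY,
        ReportName TEXT,
        ReportNumber TEXT,
        ReportDescription TEXT
    )""")
conn.executemany(
    "INSERT INTO EmployeeReports (ReportName, ReportNumber, ReportDescription)"
    " VALUES (?, ?, ?)",
    [("ReportName", str(i), "Report" * 1000) for i in range(1, 101)],
)

# Vertical split: narrow columns in one table, the wide column in another,
# both keyed by ReportID so the rows can be joined back together.
conn.executescript("""
    CREATE TABLE ReportsData (
        ReportID INTEGER PRIMARY KEY,
        ReportName TEXT,
        ReportNumber TEXT
    );
    CREATE TABLE ReportsDesc (
        ReportID INTEGER PRIMARY KEY REFERENCES ReportsData (ReportID),
        ReportDescription TEXT
    );
    INSERT INTO ReportsData
        SELECT ReportID, ReportName, ReportNumber FROM EmployeeReports;
    INSERT INTO ReportsDesc
        SELECT ReportID, ReportDescription FROM EmployeeReports;
""")

# Queries on the narrow columns never touch the wide description data.
narrow = conn.execute(
    "SELECT ReportName, ReportNumber FROM ReportsData WHERE ReportID = 42"
).fetchone()

# The full row is still available through a join on the shared key.
full = conn.execute("""
    SELECT d.ReportName, d.ReportNumber, x.ReportDescription
    FROM ReportsData AS d JOIN ReportsDesc AS x ON x.ReportID = d.ReportID
    WHERE d.ReportID = 42""").fetchone()
```

Queries that need only the report name and number read the small ReportsData table, and the description is fetched through the join only when it is actually required.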
Horizontal Partitioning on SQL Server tables
Horizontal partitioning divides a table into multiple tables that contain the same number of columns, but fewer
rows. For example, if a table contains a large number of rows that represent monthly reports it could be
partitioned horizontally into tables by years, with each table representing all monthly reports for a specific
year. This way queries requiring data for a specific year will only reference the appropriate table. Tables
should be partitioned in a way that queries reference as few tables as possible.
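The yearly split described above can be sketched with Python's sqlite3 module. This is a hand-rolled illustration with one table per year, not SQL Server's built-in partitioning feature; the table naming scheme reports_YYYY and the sample rows are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Sample monthly reports as (month "YYYY-MM", title) pairs.
reports = [
    ("2021-03", "March 2021 report"),
    ("2021-11", "November 2021 report"),
    ("2022-01", "January 2022 report"),
]

def table_for(month):
    """Name of the yearly table that stores a given month."""
    year = int(month[:4])        # int() keeps the interpolated name safe
    return f"reports_{year}"

for month, title in reports:
    table = table_for(month)
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {table} (month TEXT PRIMARY KEY, title TEXT)")
    conn.execute(f"INSERT INTO {table} VALUES (?, ?)", (month, title))
conn.commit()

def reports_for_year(conn, year):
    """A query for one year reads only that year's table."""
    return conn.execute(
        f"SELECT month, title FROM reports_{int(year)} ORDER BY month").fetchall()

rows_2021 = reports_for_year(conn, 2021)
print(rows_2021)
# [('2021-03', 'March 2021 report'), ('2021-11', 'November 2021 report')]
```

All yearly tables have the same columns; only the rows differ, so a query for 2021 never scans the 2022 data.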
Applications of Clustering
Clustering has a large number of applications spread across various domains. Some of the most
popular applications of clustering are:
Recommendation engines
Market segmentation
Social network analysis
Search result grouping
Medical imaging
Image segmentation
Anomaly detection