
Unit - IV Application Structure · Unit - IV Application Structure Structure of DBMS: DBMS (Database Management System) acts as an interface between the user and the database. The

Aug 24, 2020


dariahiddleston

Unit - IV

Application Structure

Structure of DBMS:

DBMS (Database Management System) acts as an interface between the user and the database.

The user requests the DBMS to perform various operations such as insert, delete, update and

retrieval on the database.

The components of the DBMS perform these requested operations on the database and provide the necessary data to the users.

Applications: - An application can be thought of as a user-friendly web page where the user enters requests. The user simply enters the details he needs and presses buttons to get the data.

End User: - End users are the real users of the database. They can be developers, designers, administrators or the actual users of the database.

DDL: - Data Definition Language (DDL) is the set of commands used to create databases, schemas, tables, mappings, etc. in the database. These commands create objects such as tables and indexes in the database for the first time; in other words, they define the structure of the database.

DDL Compiler: - This part of the DBMS is responsible for processing DDL commands: the compiler breaks each command down into a machine-understandable form. It is also responsible for storing metadata such as the table name, the space used by the table, the number of columns in it, mapping information, etc.
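To make the DDL description concrete, the sketch below uses SQLite through Python's standard sqlite3 module as a stand-in DBMS (an assumption; the notes do not name a product). The product table and the idx_product_name index are hypothetical, modelled on the Product table used later in this unit. After the CREATE statements run, PRAGMA table_info reads back part of the metadata SQLite stored for the table.

```python
import sqlite3

def create_schema(conn: sqlite3.Connection) -> list[tuple[str, str]]:
    """Run DDL statements, then return the (column name, declared type)
    pairs that SQLite recorded for the new table."""
    # DDL defines structure (a table and an index), not data.
    conn.execute(
        """CREATE TABLE product (
               product_id    INTEGER PRIMARY KEY,
               product_name  TEXT NOT NULL,
               product_price REAL
           )"""
    )
    conn.execute("CREATE INDEX idx_product_name ON product (product_name)")
    # PRAGMA table_info returns one row per column:
    # (cid, name, type, notnull, dflt_value, pk)
    rows = conn.execute("PRAGMA table_info(product)").fetchall()
    return [(row[1], row[2]) for row in rows]

conn = sqlite3.connect(":memory:")
print(create_schema(conn))
# [('product_id', 'INTEGER'), ('product_name', 'TEXT'), ('product_price', 'REAL')]
```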


DML Compiler: - When the user inserts, deletes, updates or retrieves records, he sends a request in a form he understands, for example by pressing some buttons. For the database to process the request, it must be broken down into object code. This is done by the DML compiler. One can compare this to the way a question put to a person is broken down into waves that reach the brain!
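Whatever buttons the application shows, the request typically reaches the DBMS as DML statements like the ones below. This is a minimal sketch assuming SQLite via Python's sqlite3 module and a hypothetical product table; SQLite compiles each statement into its own internal bytecode before running it, which plays the role of the DML compiler described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (product_id INTEGER PRIMARY KEY, "
             "product_name TEXT, product_price REAL)")

# Each call sends one DML statement. The ? placeholders keep the
# user's values separate from the SQL text that gets compiled.
conn.execute("INSERT INTO product VALUES (?, ?, ?)", (1, "Pen", 10.0))
conn.execute("INSERT INTO product VALUES (?, ?, ?)", (2, "Book", 150.0))
conn.execute("UPDATE product SET product_price = ? WHERE product_id = ?", (12.0, 1))
conn.execute("DELETE FROM product WHERE product_id = ?", (2,))
rows = conn.execute(
    "SELECT product_id, product_name, product_price FROM product").fetchall()
print(rows)  # [(1, 'Pen', 12.0)]
```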

Query Optimizer: - When a user fires a request, he is not concerned with how it will be executed on the database; he need not know anything about the database or how it performs. Whatever the request, however, it must be executed efficiently enough to fetch, insert, update or delete the data. The query optimizer decides the best way to execute the user request received from the DML compiler. It is similar to selecting the best nerve to carry the waves to the brain!
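Most DBMSs let you ask the optimizer which plan it chose. The sketch below assumes SQLite, whose EXPLAIN QUERY PLAN statement returns rows whose last column is a readable description of each step; the exact wording of that text varies between SQLite versions. The product table and index names are hypothetical. The same query is planned as a full table scan before an index exists and as an index search afterwards.

```python
import sqlite3

def plan(conn: sqlite3.Connection, sql: str, params=()) -> list[str]:
    """Return the optimizer's chosen plan as a list of description strings."""
    # The last column of each EXPLAIN QUERY PLAN row is the readable detail.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (product_id INTEGER PRIMARY KEY, "
             "product_name TEXT, product_price REAL)")
query = "SELECT * FROM product WHERE product_name = ?"

before = plan(conn, query, ("Pen",))   # no index yet: the table is scanned
conn.execute("CREATE INDEX idx_product_name ON product (product_name)")
after = plan(conn, query, ("Pen",))    # the optimizer can now search the index
print(before)
print(after)
```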

Stored Data Manager: - This is also known as the Database Control System. It is one of the main central components of the database and is responsible for various tasks:

o It converts the requests received from the query optimizer into a machine-understandable form and makes the actual requests inside the database. It is like fetching the exact part of the brain that holds the answer.

o It helps to maintain consistency and integrity by applying constraints. For example, it does not allow deleting or updating a row that still has child entries, or inserting a child row whose parent does not exist. Similarly, it does not allow entering duplicate values into columns that must be unique.

o It controls concurrent access. If multiple users access the database at the same time, it makes sure that all of them see correct data and that no data loss or data mismatch occurs between the transactions of the different users.

o It helps to back up the database and recover data whenever required. Because the database is huge, reverting the changes after an unexpected transaction failure is not easy, so it maintains a backup of all data from which the data can be recovered.
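The constraint checks in the second task can be observed directly. This sketch assumes SQLite via Python's sqlite3 module and hypothetical customer and orders tables; note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON. Each statement passed to the rejected helper (also hypothetical) is refused with sqlite3.IntegrityError.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, "
             "email TEXT UNIQUE)")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, "
             "customer_id INTEGER NOT NULL REFERENCES customer(customer_id))")
conn.execute("INSERT INTO customer VALUES (1, 'a@example.com')")
conn.execute("INSERT INTO orders VALUES (100, 1)")  # child entry for customer 1

def rejected(sql: str) -> bool:
    """Return True if the DBMS refuses the statement because of a constraint."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False

print(rejected("DELETE FROM customer WHERE customer_id = 1"))        # parent still has a child
print(rejected("INSERT INTO orders VALUES (101, 99)"))               # no such parent
print(rejected("INSERT INTO customer VALUES (2, 'a@example.com')"))  # duplicate value
```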

Data Files: - These hold the actual data. They can be stored on magnetic tapes, magnetic disks or optical disks.

Compiled DML: - Some of the processed DML statements (insert, update, delete) are stored here so that they can be reused when similar requests arrive.
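Python's sqlite3 module offers something comparable on the client side: each connection keeps a cache of compiled (prepared) statements, sized by the cached_statements argument of sqlite3.connect. The sketch below is an analogy under that assumption, not a description of how any particular DBMS organises its Compiled DML store; the product table is hypothetical.

```python
import sqlite3

# cached_statements sets how many compiled statements this connection keeps
# for reuse; 64 is an arbitrary illustrative value.
conn = sqlite3.connect(":memory:", cached_statements=64)
conn.execute("CREATE TABLE product (product_id INTEGER PRIMARY KEY, "
             "product_name TEXT, product_price REAL)")

insert_sql = "INSERT INTO product VALUES (?, ?, ?)"
new_rows = [(1, "Pen", 10.0), (2, "Book", 150.0), (3, "Bag", 499.0)]

# executemany prepares insert_sql once and re-executes that compiled
# statement with each parameter tuple.
conn.executemany(insert_sql, new_rows)

# Running the identical SQL text again can be served from the statement cache.
conn.execute(insert_sql, (4, "Ink", 25.0))
count = conn.execute("SELECT COUNT(*) FROM product").fetchone()[0]
print(count)  # 4
```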

Data Dictionary: - It contains all the information about the database. As the name suggests, it is the dictionary of all the data items. It contains descriptions of all the tables, views, materialized views, constraints, indexes, triggers, etc.
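In SQLite, the data dictionary is exposed as the sqlite_master table, which has one row per table, index, view and trigger. The schema below (product, product_log, cheap_product, product_audit) is hypothetical and exists only to populate the catalog; SQLite has no materialized views, so none appear.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (product_id INTEGER PRIMARY KEY, product_name TEXT,
                          product_price REAL CHECK (product_price >= 0));
    CREATE TABLE product_log (product_id INTEGER, deleted_at TEXT);
    CREATE INDEX idx_product_name ON product (product_name);
    CREATE VIEW cheap_product AS
        SELECT product_name FROM product WHERE product_price < 100;
    CREATE TRIGGER product_audit AFTER DELETE ON product
    BEGIN
        INSERT INTO product_log VALUES (OLD.product_id, datetime('now'));
    END;
""")

# sqlite_master is SQLite's catalog: one row per table, index, view and trigger.
catalog = conn.execute(
    "SELECT type, name FROM sqlite_master ORDER BY type, name"
).fetchall()
for obj_type, name in catalog:
    print(obj_type, name)
```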

User Interfaces

A user interface is the view of a database interface that is seen by the user. User interfaces are often wholly or partly graphical (GUI - graphical user interface) and offer tools that make interaction with the database easier.


Form-based interfaces

This interface consists of forms adapted to the user. The user can fill in all of the fields to make new entries in the database, or only some of the fields to query the others. However, some operations might be restricted by the application.

Form-based user interfaces are widespread and are a very important means of interacting with a DBMS. They are easy to use and have the advantage that the user does not need special knowledge of database languages like SQL.
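Behind a form, the application usually turns whichever fields the user filled in into a query. The function query_from_form below is a hypothetical sketch of that idea, assuming SQLite via Python's sqlite3 module and a product table; empty fields are ignored, column names are checked against a whitelist, and the values are passed as bound parameters rather than pasted into the SQL text.

```python
import sqlite3

def query_from_form(conn: sqlite3.Connection, form: dict[str, str]) -> list[tuple]:
    """Query the product table using only the fields the user filled in.

    Empty fields are ignored; each filled field becomes an equality condition.
    """
    allowed = {"product_id", "product_name", "product_price"}  # column whitelist
    filled = {k: v for k, v in form.items() if k in allowed and v != ""}
    sql = "SELECT product_id, product_name, product_price FROM product"
    if filled:
        # Column names come from the whitelist; values are bound as parameters.
        sql += " WHERE " + " AND ".join(f"{col} = ?" for col in filled)
    return conn.execute(sql + " ORDER BY product_id", list(filled.values())).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (product_id INTEGER PRIMARY KEY, "
             "product_name TEXT, product_price REAL)")
conn.executemany("INSERT INTO product VALUES (?, ?, ?)",
                 [(1, "Pen", 10.0), (2, "Book", 150.0), (3, "Pen", 12.0)])

# The user typed only a name; the other fields were left blank.
print(query_from_form(conn, {"product_id": "", "product_name": "Pen", "product_price": ""}))
# [(1, 'Pen', 10.0), (3, 'Pen', 12.0)]
```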

Text-based interfaces

For administering the database, and for other professional users, it is possible to communicate with the DBMS directly in the query language (in code form) via an input/output window.

We will see this possibility later in the lesson Structured Query Language SQL.

Text-based interfaces are very powerful tools and allow comprehensive interaction with a DBMS. However, using them requires working knowledge of the respective database language.


GIS Interface

A GIS user interface often integrates features of a database interface. The database interaction

takes place through the combination of different interfaces:

Graphical interaction via a selection on the map

Combination of form-based and text-based interaction (e.g. special Query-Wizards for the easier

creation of database queries)


Transaction

A transaction can be defined as a group of tasks. A single task is the minimum processing unit

which cannot be divided further.

Let’s take an example of a simple transaction. Suppose a bank employee transfers Rs 500 from

A's account to B's account. This very simple and small transaction involves several low-level

tasks.

A’s Account

Open_Account(A)

Old_Balance = A.balance

New_Balance = Old_Balance - 500

A.balance = New_Balance

Close_Account(A)

B’s Account

Open_Account(B)

Old_Balance = B.balance


New_Balance = Old_Balance + 500

B.balance = New_Balance

Close_Account(B)
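The same transfer can be written as a real transaction. This sketch assumes SQLite via Python's sqlite3 module and a hypothetical account table with name and balance columns; opening with isolation_level=None means the module does not start transactions on its own, so the explicit BEGIN, COMMIT and ROLLBACK statements mark the boundaries. The read-then-write steps mirror the Old_Balance/New_Balance lines above.

```python
import sqlite3

def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> None:
    """Move `amount` from account src to account dst as a single transaction."""
    conn.execute("BEGIN")  # start the transaction explicitly
    try:
        # Source account: Old_Balance = balance; balance = Old_Balance - amount
        (old,) = conn.execute("SELECT balance FROM account WHERE name = ?",
                              (src,)).fetchone()
        conn.execute("UPDATE account SET balance = ? WHERE name = ?", (old - amount, src))
        # Target account: Old_Balance = balance; balance = Old_Balance + amount
        (old,) = conn.execute("SELECT balance FROM account WHERE name = ?",
                              (dst,)).fetchone()
        conn.execute("UPDATE account SET balance = ? WHERE name = ?", (old + amount, dst))
        conn.execute("COMMIT")    # make both updates permanent together
    except Exception:
        conn.execute("ROLLBACK")  # undo any partial work, then report the error
        raise

# isolation_level=None: the sqlite3 module leaves BEGIN/COMMIT to our code.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 1000), ("B", 200)])
transfer(conn, "A", "B", 500)
print(conn.execute("SELECT name, balance FROM account ORDER BY name").fetchall())
# [('A', 500), ('B', 700)]
```

If the target account does not exist, fetchone() returns None, the unpacking raises TypeError, and the ROLLBACK restores the debit already made to the source account.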

ACID Properties

A transaction is a very small unit of a program, and it may contain several low-level tasks. A transaction in a database system must maintain Atomicity, Consistency, Isolation, and Durability − commonly known as ACID properties − in order to ensure accuracy, completeness, and data integrity.

Atomicity − This property states that a transaction must be treated as an atomic unit, that is, either all of its operations are executed or none are. There must be no state in a database where a transaction is left partially completed: the state of the database must be the one that existed either before the transaction executed or after it completed, was aborted, or failed.

Consistency − The database must remain in a consistent state after any transaction. No

transaction should have any adverse effect on the data residing in the database. If the database

was in a consistent state before the execution of a transaction, it must remain consistent after the

execution of the transaction as well.

Durability − The database should be durable enough to hold all its latest updates even if the

system fails or restarts. If a transaction updates a chunk of data in a database and commits, then

the database will hold the modified data. If a transaction commits but the system fails before the

data could be written on to the disk, then that data will be updated once the system springs back

into action.

Isolation − In a database system where more than one transaction is being executed simultaneously and in parallel, the property of isolation states that each transaction will be carried out and executed as if it were the only transaction in the system. No transaction will interfere with any other transaction.
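Atomicity and consistency can be seen together in a failed transfer. This is a minimal sketch assuming SQLite via Python's sqlite3 module; the CHECK (balance >= 0) rule and the account data are hypothetical. The credit to B is applied first, the debit from A then violates the CHECK constraint, and the ROLLBACK removes the credit as well, so no partially completed transfer remains.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # manual BEGIN/COMMIT
# Hypothetical consistency rule: a balance may never go negative.
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, "
             "balance INTEGER NOT NULL CHECK (balance >= 0))")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 300), ("B", 200)])

def transfer(src: str, dst: str, amount: int) -> bool:
    """Try the transfer; return True if it committed, False if it was rolled back."""
    conn.execute("BEGIN")
    try:
        # Credit first, so a failing debit leaves half-done work to undo.
        conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?",
                     (amount, dst))
        conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?",
                     (amount, src))  # violates the CHECK if src lacks funds
        conn.execute("COMMIT")
        return True
    except sqlite3.IntegrityError:
        conn.execute("ROLLBACK")  # atomicity: the credit to dst is undone too
        return False

print(transfer("A", "B", 500))  # False: A has only 300
print(conn.execute("SELECT name, balance FROM account ORDER BY name").fetchall())
# [('A', 300), ('B', 200)]
```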

States of Transactions

A transaction in a database can be in one of the following states –


Active − In this state, the transaction is being executed. This is the initial state of every

transaction.

Partially Committed − When a transaction executes its final operation, it is said to be in a

partially committed state.

Failed − A transaction is said to be in a failed state if any of the checks made by the database

recovery system fails. A failed transaction can no longer proceed further.

Aborted − If any of the checks fails and the transaction has reached a failed state, then the

recovery manager rolls back all its write operations on the database to bring the database back to

its original state where it was prior to the execution of the transaction. Transactions in this state

are called aborted. The database recovery module can select one of the two operations after a

transaction aborts −

Re-start the transaction

Kill the transaction

Committed − If a transaction executes all its operations successfully, it is said to be committed.

All its effects are now permanently established on the database system.
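The states and the moves between them can be summarised as a small state machine. The transitions in TRANSITIONS follow the usual textbook state diagram (an assumption, since the text above lists the states but not every arrow); in particular, a restarted transaction is treated as a new transaction that begins again in the active state. The names TRANSITIONS and run are hypothetical.

```python
# Allowed moves between the transaction states described above.
TRANSITIONS = {
    "active": {"partially committed", "failed"},
    "partially committed": {"committed", "failed"},
    "failed": {"aborted"},
    "aborted": set(),    # terminal: recovery then restarts or kills the transaction
    "committed": set(),  # terminal: effects are permanent
}

def run(path: list[str]) -> str:
    """Walk a sequence of states starting from 'active'; reject illegal moves."""
    state = "active"  # every transaction starts here
    for nxt in path:
        if nxt not in TRANSITIONS[state]:
            raise ValueError(f"illegal transition {state!r} -> {nxt!r}")
        state = nxt
    return state

print(run(["partially committed", "committed"]))  # committed
print(run(["failed", "aborted"]))                  # aborted
```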

Form Events

Events are things that happen to objects, such as the clicking of a command button or the

opening and closing of a form, etc. These events trigger an action to be carried out.

Events occur for forms when you open or close a form, move between forms, or work with data

on a form. An easy way to view the available Events for a Form (Control or Report) is to open

the properties window of the form.

For example, in Microsoft Access:

If you open a Form, the following Events occur in the following order:

Open

Load

Resize

Activate

(GotFocus)

Current

The form's GotFocus event occurs only if there is no active control on the form. If a control is active, the focus goes to that control rather than to the form itself, so the form's GotFocus event does not occur.

On Open Event

The Open event occurs when a form is opened, but before the first record is displayed. Therefore, attach here any code that you wish to run as soon as the form is opened. The Open event does not occur when you merely activate a previously opened form. For example, if you open a second form from the first form and then close the second form, the first form's Open event will not occur, because the first form was never closed and re-opened; it was only hidden behind the second form.

If the Form is based on a Query, the Query is run prior to the On Open Event.

On Load Event

Whereas the Open event occurs when the form is opened and before the first record is displayed, the Load event occurs when the first record is displayed.

On Resize Event

The Resize Event occurs when a form is opened or whenever the form’s size changes.


On Activate Event

The Activate event occurs when a form receives the focus and becomes the active window. You make a form active by opening it, by clicking the form or a control on it with your mouse, or by invoking its SetFocus method. The Activate event can only occur if the form is visible; if the form is not visible, an error will occur.

On GotFocus Event

This rarely used event occurs only when the form gets the focus and there are no visible, enabled controls on the form, which is unlikely.

On Current Event

The On Current event occurs when the focus moves to a new or different record, making it the current record, or when the form is refreshed or requeried.

This Event occurs when a form is opened, whenever the focus leaves one record and moves to

another, and when the Form’s underlying Table or Query is requeried. This event is one of the

more commonly used Events. If you wish to run code whenever a record is displayed, this is the

place to put it.

4.6 Custom Reports

Reports are typically printed on paper, but they are increasingly being created for

direct display on the screen.

Reports can be used to format data and present the results of complex analyses.

4.6.1 VB.NET Crystal Reports for Beginners

Open Visual Studio .NET and select a new Visual Basic .NET Project.


Create a new Crystal Report for the Product table from the above database crystalDB.

The Product table has three fields (Product_id, Product_name, Product_price), and we show the whole table's data in the Crystal Report.

From the main menu in Visual Studio, select PROJECT-->Add New Item. The Add New Item dialog box appears; select Crystal Reports from it.

Select Report type from Crystal Reports gallery.


Accept the default settings and click OK.

The next step is to select the appropriate connection to your database. Here we select an OLE DB connection for SQL Server.

Select OLE DB (ADO) from Create New Connection.

Select Microsoft OLE DB Provider for SQL Server.

The next screen is the SQL Server authentication screen. Select your SQL Server name, enter the user ID and password, and select your database name. Click Next; the screen then shows the OLE DB property values. Leave them as they are and click Finish.

You will then see your server name under OLE DB Connection. From there, select the database name (Crystaldb) and click Tables to see all the tables in your database.

From the tables list, move the Product table to the right-side list.

Click the Next button.

Move all fields of the Product table to the right-side list.

Click the Finish button. You will then see the Crystal Reports designer window, where you can arrange the design according to your requirements. Your screen will look like the following picture.

Now the design part is over, and the next step is to call the created Crystal Report in VB.NET through the Crystal Reports Viewer control.

Select the default form (Form1.vb) you created in VB.NET and drag a button

and CrystalReportViewer control to your form.

Switch to the form's source code view and put the following line at the top:

Imports CrystalDecisions.CrystalReports.Engine

Put the following source code in the button click event:

Imports CrystalDecisions.CrystalReports.Engine

Public Class Form1

    Private Sub Button1_Click(ByVal sender As System.Object,
                              ByVal e As System.EventArgs) Handles Button1.Click
        ' Create a report document and load the .rpt file designed above.
        ' Replace the placeholder with the full path of CrystalReport1.rpt.
        Dim cryRpt As New ReportDocument
        cryRpt.Load("PUT CRYSTAL REPORT PATH HERE\CrystalReport1.rpt")

        ' Show the loaded report in the CrystalReportViewer control on the form.
        CrystalReportViewer1.ReportSource = cryRpt
        CrystalReportViewer1.Refresh()
    End Sub

End Class

NOTES:

cryRpt.Load("PUT CRYSTAL REPORT PATH HERE\CrystalReport1.rpt")

The report file CrystalReport1.rpt is in your project location, so give its full path name here.

After you run the code, you will get a report like this.


4.7 DISTRIBUTED APPLICATIONS

Distributed applications (distributed apps) are applications or software that run on multiple computers within a network at the same time and can be hosted on servers or in the cloud.

These applications interact in order to achieve a specific goal or task.

Traditional applications relied on a single system to run them.

Even in the client-server model, the application software had to run on either the

client, or the server that the client was accessing.

Unlike traditional applications, distributed applications run on multiple systems

simultaneously for a single task or job.

With distributed applications, if a node that is running a particular application goes

down, another node can resume the task.

4.7.1 DISTRIBUTED APPLICATIONS PARADIGM

Distributed applications are broken up into two separate programs:

1. client software

2. server software.

The client software or computer accesses the data from the server

or cloud environment, while the server or cloud processes the data.

Cloud computing can be used instead of servers or hardware to process a distributed

application's data or programs.

4.7.2 Uses of distributed Applications

Distributed applications allow multiple users to access the apps at once.

Many developers, IT professionals and enterprises choose to host distributed apps in the cloud because of the cloud's elasticity and scalability, as well as its ability to handle large applications or workloads.

4.7.3 Benefits of a Distributed Application


Scalability

- The load an application can sustain can be increased by placing extra server processes in a group;

- by adding machines to the application and redistributing the groups across the machines;

- by replicating a group onto other machines within the application and using load balancing;

- by segmenting a database and using data-dependent routing to reach the groups that deal with the separate database segments.
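Data-dependent routing, mentioned in the last point, can be sketched as a lookup from a key range to the server group that holds that database segment. The ranges and the group names GROUP_A to GROUP_C below are hypothetical, as is the route function; the sketch only shows the routing decision, not the messaging between groups.

```python
import bisect

# Hypothetical layout: customer IDs are split into ranges, and each range is
# served by the server group that holds that database segment.
UPPER_BOUNDS = [10_000, 20_000, 30_000]   # exclusive upper bound of each range
GROUPS = ["GROUP_A", "GROUP_B", "GROUP_C"]

def route(customer_id: int) -> str:
    """Pick the server group whose segment holds this customer's data."""
    i = bisect.bisect_right(UPPER_BOUNDS, customer_id)
    if i == len(GROUPS):
        raise ValueError(f"no segment for customer_id {customer_id}")
    return GROUPS[i]

print(route(42), route(15_000), route(29_999))  # GROUP_A GROUP_B GROUP_C
```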

Ease of development/maintainability

- The separation of the business application logic into services or components that

communicate through well-defined messages or interfaces allows both

development and maintenance to be similarly separated and so simplified.

Resilience

- When multiple machines are in use and one fails, the remainder can continue

operation.

- Similarly, when multiple server processes are within a group and one fails, the

others are present to perform work.

- Finally, if a machine should break, but there are multiple machines within the

application, these other machines can be used to perform the work of the

application.
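The resilience idea can be illustrated with a simple failover loop: a request is offered to each node in turn until one that is working handles it. The nodes here are simulated by plain Python functions (make_node, NodeDown and call_with_failover are hypothetical names), so this is a sketch of the principle rather than of any middleware product.

```python
class NodeDown(Exception):
    """Raised by a simulated node that is not responding."""

def make_node(name: str, alive: bool):
    """Return a function that plays the role of one machine in the application."""
    def handle(request: str) -> str:
        if not alive:
            raise NodeDown(name)
        return f"{name} handled {request}"
    return handle

def call_with_failover(nodes, request: str) -> str:
    """Try each node in turn; the first healthy node performs the work."""
    for node in nodes:
        try:
            return node(request)
        except NodeDown:
            continue  # this machine failed; let the remaining ones carry on
    raise RuntimeError("all nodes are down")

nodes = [make_node("node1", alive=False), make_node("node2", alive=True)]
print(call_with_failover(nodes, "order #7"))  # node2 handled order #7
```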

Coordination of autonomous actions

- If you have separate applications, you can coordinate autonomous actions among

the applications.

- You can coordinate autonomous actions as a single logical unit of work.

- Autonomous actions are actions that involve multiple server groups and/or

multiple resource manager interfaces.


4.7.4. Characteristics of Distributing an Application

A distributed application has the following characteristics:

Enlarges the client and/or server model

Establishes multiple server groups

Allows data-dependent partitioning of data

Enables management of multiple resources

Supports a networked model

4.7.5. REAL TIME EXAMPLES OF DISTRIBUTED APPLICATIONS

For example,

1. In an e-commerce platform, each of the computers may be responsible for specific tasks

such as sending and receiving emails about special offers to current customers;

- compiling a list of customers and their purchase history to better target products

to them;

- updating the customer list with new customers who have registered with the

online market;

- accepting product reviews from each patron for future product decision-making;

- accepting various payment methods at checkout; answering customers’ questions

online whether as a person behind the computer or a chatbot; etc.

Each of these tasks will be carried out by one or more systems on the network, but all

systems communicate with each other to ensure that the customer buys and receives the product

that is beneficial to him or her.

2. In the cryptoeconomy, the blockchain used by most cryptocurrencies uses Distributed

Apps to maintain an efficient digital marketplace.

- Rather than the conventional client-server network adopted by most centralized organizations, blockchains run on a peer-to-peer network where information about transactions between two parties is recorded and shared across multiple computers on the network.

- These computers are referred to as nodes.

- Each node acts as an administrator in the Bitcoin markets and joins the network

voluntarily for the opportunity to receive Bitcoins as a reward.

- Each node has a duplicate copy of an original transaction, which gets continually

reconciled by the network.

- So whatever entry that node A has on its record for a Bitcoin transaction between

Jane and John cannot differ from what nodes B, C, D, E, and F have.

- This means that, since a version of events can be verified by different computers, a hacker who gets into one system to tweak a transaction would also have to alter the copies held by the other systems spread across various geographical locations in order to corrupt the recorded data. This is practically infeasible, which makes the Bitcoin blockchain transparent and highly tamper-resistant.

- Also, by storing blocks of information across various nodes on a blockchain

network, the blockchain cannot be brought to ruins by the failure of one system.

When a computer or system fails, the other systems act as backups and keep

running regardless of the down system.

- Once a transaction has been verified as valid by the network's nodes, the block containing it is added to the chain (i.e. the general ledger) for public access.

- The ability of the remaining nodes to keep functioning even when one or two nodes drop out of the network ensures that users constantly get their transactions recorded and confirmed in an uninterrupted and timely manner.
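The tamper-evidence described above comes from each block storing the hash of the block before it. The sketch below is greatly simplified (an assumption throughout): each block holds one transaction, and there is no proof-of-work, Merkle tree or network. It uses SHA-256 from Python's hashlib; block_hash, build_chain and is_valid are hypothetical helper names.

```python
import hashlib
import json

def block_hash(block: dict) -> str:
    # Hash a canonical JSON encoding of the block's contents.
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

def build_chain(transactions: list[str]) -> list[dict]:
    """Link blocks so each one stores the hash of its predecessor."""
    chain, prev = [], "0" * 64  # the first block points at a dummy hash
    for tx in transactions:
        block = {"tx": tx, "prev_hash": prev}
        chain.append(block)
        prev = block_hash(block)
    return chain

def is_valid(chain: list[dict]) -> bool:
    """Each block must store the hash of the block before it."""
    return all(chain[i]["prev_hash"] == block_hash(chain[i - 1])
               for i in range(1, len(chain)))

ledger = build_chain(["Jane pays John 1 BTC", "John pays Ann 0.5 BTC",
                      "Ann pays Bob 0.2 BTC"])
print(is_valid(ledger))               # True
tampered = [dict(b) for b in ledger]  # one node's altered copy
tampered[0]["tx"] = "Jane pays John 100 BTC"
print(is_valid(tampered))             # False
```

The check above only covers the links between blocks; a change to the newest block would be caught by comparing the copies held by different nodes, as the text describes.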

Data Storage Methods

Databases are stored in file formats, which contain records. At the physical level, the actual data is stored in electromagnetic format on some device. These storage devices can be broadly categorized into three types −


Primary Storage − The memory storage that is directly accessible to the CPU comes

under this category. CPU's internal memory (registers), fast memory (cache), and main

memory (RAM) are directly accessible to the CPU, as they are all placed on the

motherboard or CPU chipset. This storage is typically very small, ultra-fast, and volatile.

Primary storage requires continuous power supply in order to maintain its state. In case

of a power failure, all its data is lost.

Secondary Storage − Secondary storage devices are used to store data for future use or

as backup. Secondary storage includes memory devices that are not a part of the CPU

chipset or motherboard, for example, magnetic disks, optical disks (DVD, CD, etc.),

hard disks, flash drives, and magnetic tapes.

Tertiary Storage − Tertiary storage is used to store huge volumes of data. Since such storage devices are external to the computer system, they are the slowest in speed. These storage devices are mostly used to back up an entire system. Optical disks and magnetic tapes are widely used as tertiary storage.

Memory Hierarchy

A computer system has a well-defined hierarchy of memory. A CPU has direct access to its main memory as well as its inbuilt registers. Main memory is considerably slower than the CPU. To minimize this speed mismatch, cache memory is introduced. Cache memory provides faster access than main memory, and it contains the data most frequently accessed by the CPU.

The memory with the fastest access is the costliest. Larger storage devices are slower and less expensive, but they can store huge volumes of data compared to CPU registers or cache memory.


Magnetic Disks

Hard disk drives are the most common secondary storage devices in present computer systems. They are called magnetic disks because they use magnetization to store information. Hard disks consist of metal disks coated with magnetizable material. These disks are stacked on a spindle. A read/write head moves between the disks and is used to magnetize or de-magnetize the spot under it. The magnetization state of a spot is read as 0 (zero) or 1 (one).

Hard disks are formatted in a well-defined order to store data efficiently. A hard disk platter has many concentric circles on it, called tracks. Every track is further divided into sectors. A sector on a hard disk typically stores 512 bytes of data.

Redundant Array of Independent Disks

RAID, or Redundant Array of Independent Disks, is a technology to connect multiple secondary storage devices and use them as a single storage medium.

RAID consists of an array of disks in which multiple disks are connected together to achieve

different goals. RAID levels define the use of disk arrays.

RAID 0

In this level, a striped array of disks is implemented. The data is broken down into blocks and

the blocks are distributed among disks. Each disk receives a block of data to write/read in

parallel. It enhances the speed and performance of the storage device. There is no parity and

backup in Level 0.
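Block-level striping can be simulated with byte strings standing in for disks. In this sketch (stripe and unstripe are hypothetical names, and the block size of 2 bytes is chosen only to keep the output readable) blocks are dealt to the disks round-robin; a real controller would read and write the disks in parallel rather than in a Python loop.

```python
def stripe(data: bytes, n_disks: int, block_size: int) -> list[list[bytes]]:
    """RAID 0: split data into blocks and deal them round-robin across disks."""
    disks = [[] for _ in range(n_disks)]
    for i in range(0, len(data), block_size):
        disks[(i // block_size) % n_disks].append(data[i:i + block_size])
    return disks

def unstripe(disks: list[list[bytes]]) -> bytes:
    """Read the blocks back in the same round-robin order."""
    out, row = [], 0
    while True:
        for disk in disks:
            if row >= len(disk):  # the first disk to run out marks the end
                return b"".join(out)
            out.append(disk[row])
        row += 1

disks = stripe(b"ABCDEFGHIJ", n_disks=3, block_size=2)
print(disks)            # [[b'AB', b'GH'], [b'CD', b'IJ'], [b'EF']]
print(unstripe(disks))  # b'ABCDEFGHIJ'
```

Note that no block is stored twice and there is no parity, which is why losing any one disk loses data in level 0.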

RAID 1

RAID 1 uses mirroring techniques. When data is sent to a RAID controller, it sends a copy of

data to all the disks in the array. RAID level 1 is also called mirroring and provides 100%

redundancy in case of a failure.


RAID 2

RAID 2 records an Error Correction Code, computed with Hamming codes, for its data, which is striped across different disks. As in level 0 the data is striped, but here each data bit in a word is recorded on a separate disk, and the ECC codes of the data words are stored on a different set of disks. Due to its complex structure and high cost, RAID 2 is not commercially available.

RAID 3

RAID 3 stripes the data onto multiple disks. The parity bit generated for each data word is stored on a different disk. This technique allows the array to survive a single disk failure.

RAID 4

In this level, an entire block of data is written onto data disks and then the parity is generated

and stored on a different disk. Note that level 3 uses byte-level striping, whereas level 4 uses

block-level striping. Both level 3 and level 4 require at least three disks to implement RAID.
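The parity used by RAID 3, 4 and 5 is normally the bitwise XOR of the data blocks, which is what lets a single lost disk be rebuilt: XOR-ing the surviving blocks with the parity block gives back the missing one. The byte values below are arbitrary examples, and xor_blocks is a hypothetical helper.

```python
from functools import reduce

def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

data_disks = [b"\x10\x20", b"\x0f\x0f", b"\xaa\x55"]  # one block per data disk
parity_disk = xor_blocks(data_disks)                   # kept on a separate disk

# Suppose data disk 1 fails: XOR of the survivors and the parity rebuilds it.
survivors = [data_disks[0], data_disks[2], parity_disk]
rebuilt = xor_blocks(survivors)
print(parity_disk.hex())         # b57a
print(rebuilt == data_disks[1])  # True
```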

RAID 5


RAID 5 writes whole data blocks onto different disks, but the parity bits generated for each block stripe are distributed among all the disks rather than being stored on a separate dedicated disk.
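The difference from RAID 4 is where the parity lives. The sketch prints which disk holds parity (P) and which hold data (D) in each stripe, assuming one simple rotation scheme in which parity moves one disk to the left per stripe; real controllers use several layout variants, and raid5_layout is a hypothetical name.

```python
def raid5_layout(n_disks: int, n_stripes: int) -> list[list[str]]:
    """Return, for each stripe, which disks hold parity (P) and data (D)."""
    layout = []
    for stripe in range(n_stripes):
        # Parity starts on the last disk and moves one disk left per stripe.
        parity_disk = (n_disks - 1 - stripe) % n_disks
        layout.append(["P" if d == parity_disk else "D" for d in range(n_disks)])
    return layout

for row in raid5_layout(n_disks=4, n_stripes=4):
    print(" ".join(row))
# D D D P
# D D P D
# D P D D
# P D D D
```

Because every disk takes a turn holding parity, parity writes are spread over all disks instead of all landing on one dedicated parity disk as in RAID 4.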

RAID 6

RAID 6 is an extension of level 5. In this level, two independent parities are generated and stored in a distributed fashion across multiple disks. The second parity provides additional fault tolerance: the array can survive the simultaneous failure of any two disks. This level requires at least four disk drives.
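The notes do not say how the two parities are computed. One common scheme (used, for example, by the Linux md driver) takes P as plain XOR parity and Q as a Reed-Solomon syndrome over GF(2^8), weighting data disk i by g^i with g = 2. With two independent equations per byte, any two lost data disks can be solved for. A minimal sketch under those assumptions, with parity placement ignored; all function names are hypothetical.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x^2 + 1 (0x11D)."""
    result = 0
    while b:
        if b & 1:
            result ^= a  # addition in GF(2^8) is XOR
        a <<= 1
        if a & 0x100:
            a ^= 0x11D  # reduce back into 8 bits
        b >>= 1
    return result


def gf_pow(a, n):
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result


def gf_inv(a):
    """Inverse of a non-zero element: a^254, since a^255 = 1 in GF(2^8)."""
    return gf_pow(a, 254)


def raid6_syndromes(data_disks):
    """P = XOR of all disks; Q = XOR of g^i * D_i with g = 2, byte by byte."""
    length = len(data_disks[0])
    p, q = bytearray(length), bytearray(length)
    for i, disk in enumerate(data_disks):
        coeff = gf_pow(2, i)
        for k, byte in enumerate(disk):
            p[k] ^= byte
            q[k] ^= gf_mul(coeff, byte)
    return bytes(p), bytes(q)


def raid6_recover_two(data_disks, p, q, x, y):
    """Rebuild lost data disks x and y (x != y) from the survivors plus P and Q."""
    zero = bytes(len(p))
    survivors = [zero if i in (x, y) else d for i, d in enumerate(data_disks)]
    p_s, q_s = raid6_syndromes(survivors)  # contribution of the surviving disks
    gx, gy = gf_pow(2, x), gf_pow(2, y)
    inv = gf_inv(gx ^ gy)
    dx, dy = bytearray(len(p)), bytearray(len(p))
    for k in range(len(p)):
        pxy = p[k] ^ p_s[k]  # = Dx ^ Dy
        qxy = q[k] ^ q_s[k]  # = g^x*Dx ^ g^y*Dy
        dx[k] = gf_mul(qxy ^ gf_mul(gy, pxy), inv)  # solve the 2x2 system for Dx
        dy[k] = pxy ^ dx[k]
    return bytes(dx), bytes(dy)


disks = [b"acct", b"hr__", b"ops_", b"logs"]
p, q = raid6_syndromes(disks)
damaged = [disks[0], None, disks[2], None]     # disks 1 and 3 fail together
print(raid6_recover_two(damaged, p, q, 1, 3))  # → (b'hr__', b'logs')
```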

Data Clustering and Partitioning

A cluster is a group of objects that belong to the same class. In other words, similar objects are grouped into one cluster and dissimilar objects are grouped into other clusters.

What is Clustering?

Clustering is the process of grouping a set of abstract objects into classes of similar objects.
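k-means, one of the partitioning methods named under Clustering Methods, is a compact way to see "grouping similar objects" in action: it alternates between assigning each object to its nearest cluster centre and moving each centre to the mean of its members. A minimal sketch, assuming 2-D numeric points and squared Euclidean distance as the similarity measure; `kmeans` is a hypothetical helper, not a library function.

```python
import random


def kmeans(points, k, iterations=20, seed=0):
    """Minimal k-means (Lloyd's algorithm) for 2-D points given as (x, y) tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct input points
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for x, y in points:
            nearest = min(range(k), key=lambda c: (x - centroids[c][0]) ** 2
                                                  + (y - centroids[c][1]) ** 2)
            clusters[nearest].append((x, y))
        # Update step: move each centroid to the mean of its members.
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = (sum(p[0] for p in members) / len(members),
                                sum(p[1] for p in members) / len(members))
    return centroids, clusters


points = [(1, 1), (1.5, 2), (2, 1), (8, 8), (8.5, 9), (9, 8)]
centroids, clusters = kmeans(points, 2)
print(sorted(len(c) for c in clusters))  # → [3, 3]
```

k-means needs the number of clusters in advance and favours compact, roughly spherical groups, which is one reason other methods exist.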

Requirements of Clustering in Data Mining


The following points describe the requirements that clustering algorithms should meet in data mining −

Scalability − We need highly scalable clustering algorithms to deal with large databases.

Ability to deal with different kinds of attributes − Algorithms should be applicable to any kind of data, such as interval-based (numerical), categorical, and binary data.

Discovery of clusters with arbitrary shape − The clustering algorithm should be capable of detecting clusters of arbitrary shape. It should not be limited to distance measures that tend to find small, spherical clusters.

High dimensionality − The clustering algorithm should be able to handle not only low-dimensional data but also high-dimensional spaces.

Ability to deal with noisy data − Databases often contain noisy, missing, or erroneous data. Some algorithms are sensitive to such data and may produce poor-quality clusters.

Interpretability − The clustering results should be interpretable, comprehensible, and usable.
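One way to meet the "different kinds of attributes" requirement is a dissimilarity measure that treats each attribute according to its type. The sketch below is a simplified form of Gower's coefficient: numeric differences are scaled by the attribute's range, and categorical or binary attributes score 0 on a match and 1 otherwise. It assumes the caller supplies attribute kinds and numeric ranges; `mixed_dissimilarity` is a hypothetical helper.

```python
def mixed_dissimilarity(a, b, kinds, ranges):
    """Gower-style dissimilarity between two records with mixed attribute types.

    kinds[i] is "numeric", "categorical" or "binary"; ranges[i] is the value
    range (max - min) of numeric attribute i over the whole data set.
    """
    total = 0.0
    for i, kind in enumerate(kinds):
        if kind == "numeric":
            span = ranges[i]
            total += abs(a[i] - b[i]) / span if span else 0.0  # scaled to [0, 1]
        else:
            total += 0.0 if a[i] == b[i] else 1.0  # simple matching
    return total / len(kinds)  # average over attributes, result in [0, 1]


kinds = ["numeric", "categorical", "binary"]
ranges = [40, None, None]          # assume ages in the data set span 40 years
alice = (30, "sales", True)
bob = (50, "sales", False)
print(mixed_dissimilarity(alice, bob, kinds, ranges))  # → 0.5
```

Any distance-based clustering algorithm can then use this function in place of Euclidean distance.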

Clustering Methods

Clustering methods can be classified into the following categories −

Partitioning Method

Hierarchical Method

Density-based Method

Grid-Based Method

Model-Based Method

Constraint-based Method
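The methods differ mainly in how they define a cluster. As one example, DBSCAN, a density-based method, grows clusters from "core" points that have at least `min_pts` neighbours within radius `eps`. This lets it find clusters of arbitrary shape and label isolated points as noise, two of the requirements listed earlier. A minimal sketch with a brute-force O(n²) neighbour search (real implementations use spatial indexes); `dbscan` is a hypothetical helper.

```python
def dbscan(points, eps, min_pts):
    """Minimal DBSCAN for 2-D points; returns one label per point (-1 = noise)."""

    def neighbours(i):
        xi, yi = points[i]
        return [j for j, (xj, yj) in enumerate(points)
                if (xi - xj) ** 2 + (yi - yj) ** 2 <= eps ** 2]

    labels = [None] * len(points)  # None = not yet visited
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:   # not a core point (may later join as a border point)
            labels[i] = -1
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:    # noise reachable from a core point becomes a border point
                labels[j] = cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            more = neighbours(j)
            if len(more) >= min_pts:  # j is also a core point: keep expanding
                queue.extend(more)
    return labels


points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(points, eps=1.5, min_pts=3))  # → [0, 0, 0, 0, 1, 1, 1, -1]
```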

Partitioning

Partitioning is the database process of dividing very large tables into multiple smaller parts. By splitting a large table into smaller, individual tables, queries that access only a fraction of the data can run faster because there is less data to scan. The main goal of partitioning is to ease the maintenance of large tables and to reduce the overall response time for reading and loading data in particular SQL operations.
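The "less data to scan" argument can be made concrete: a partition function maps each row to a partition by key range, and a range query only needs the partitions whose ranges overlap it. A minimal sketch in plain Python, assuming a hypothetical orders table split into yearly ranges on an `order_date` column. It mimics a SQL Server RANGE RIGHT partition function (each boundary value belongs to the partition on its right, partitions numbered from 1) but is not SQL Server's API.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical yearly boundaries (ISO dates sort correctly as strings).
BOUNDARIES = ["2019-01-01", "2020-01-01", "2021-01-01"]


def partition_for(key, boundaries=BOUNDARIES):
    """Partition number (from 1) for key; a boundary value belongs to the partition on its right."""
    return bisect_right(boundaries, key) + 1


def load(rows):
    """Route every row to its partition."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[partition_for(row["order_date"])].append(row)
    return partitions


def query_range(partitions, start, end):
    """Rows with start <= order_date < end, scanning only partitions that can hold them."""
    hits = []
    for p in range(partition_for(start), partition_for(end) + 1):
        hits.extend(r for r in partitions.get(p, []) if start <= r["order_date"] < end)
    return hits


rows = [
    {"id": 1, "order_date": "2018-06-01"},
    {"id": 2, "order_date": "2019-03-15"},
    {"id": 3, "order_date": "2020-07-04"},
    {"id": 4, "order_date": "2021-02-28"},
]
parts = load(rows)
print(sorted(parts))                                                     # → [1, 2, 3, 4]
print([r["id"] for r in query_range(parts, "2020-01-01", "2021-01-01")])  # → [3]
```

The query for 2020 never looks at partitions 1 and 2, which is the saving partitioning is meant to provide on large tables.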

Vertical Partitioning on SQL Server tables

Vertical table partitioning is mostly used to improve SQL Server performance, especially when queries read from a table that contains several very wide text or BLOB columns. In this case, to reduce access times, the BLOB columns can be split into their own table, so that queries that do not need them read less data. Another use is to restrict access to sensitive data, e.g. passwords or salary information. Vertical partitioning splits a table into two or more tables containing different columns:


Example

-- Unpartitioned table: the wide ReportDescription column is stored with the other columns
CREATE TABLE EmployeeReports
(
    ReportID int IDENTITY (1,1) NOT NULL,
    ReportName varchar (100),
    ReportNumber varchar (20),
    ReportDescription varchar (max),
    CONSTRAINT EReport_PK PRIMARY KEY CLUSTERED (ReportID)
)

-- Load 99,999 sample rows, each with a 6,000-character description
DECLARE @i int
SET @i = 1
BEGIN TRAN
WHILE @i < 100000
BEGIN
    INSERT INTO EmployeeReports (ReportName, ReportNumber, ReportDescription)
    VALUES ('ReportName', CONVERT (varchar (20), @i), REPLICATE ('Report', 1000))
    SET @i = @i + 1
END
COMMIT TRAN
GO
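The T-SQL example creates and fills only the unpartitioned EmployeeReports table. The vertical split itself can be sketched with Python's built-in `sqlite3` module, used here only so the example runs anywhere (SQL Server syntax differs): `ReportDescription` moves into a hypothetical `ReportsDesc` table that shares the `ReportID` key, and the row count is reduced to 1,000 for speed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Narrow table: the columns most queries read.
    CREATE TABLE EmployeeReports (
        ReportID     INTEGER PRIMARY KEY,
        ReportName   TEXT,
        ReportNumber TEXT
    );
    -- Wide table: the large description column, split out and keyed by ReportID.
    CREATE TABLE ReportsDesc (
        ReportID          INTEGER PRIMARY KEY REFERENCES EmployeeReports (ReportID),
        ReportDescription TEXT
    );
""")

for i in range(1, 1001):
    conn.execute("INSERT INTO EmployeeReports VALUES (?, ?, ?)", (i, "ReportName", str(i)))
    conn.execute("INSERT INTO ReportsDesc VALUES (?, ?)", (i, "Report" * 1000))
conn.commit()

# Queries that do not need the description touch only the narrow table.
print(conn.execute("SELECT COUNT(*) FROM EmployeeReports").fetchone())  # → (1000,)

# When the description is needed, join the two partitions on the shared key.
row = conn.execute("""
    SELECT er.ReportNumber, LENGTH(rd.ReportDescription)
    FROM EmployeeReports AS er
    JOIN ReportsDesc AS rd ON rd.ReportID = er.ReportID
    WHERE er.ReportID = 42
""").fetchone()
print(row)  # → ('42', 6000)
```

The trade-off is that queries needing every column now pay for a join, so the split only helps when most queries skip the wide column.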

Horizontal Partitioning on SQL Server tables

Horizontal partitioning divides a table into multiple tables that contain the same number of columns but fewer rows. For example, if a table contains a large number of rows representing monthly reports, it could be partitioned horizontally into tables by year, with each table holding all monthly reports for a specific year. This way, queries requiring data for a specific year reference only the appropriate table. Tables should be partitioned so that queries reference as few tables as possible.
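The monthly-reports example can be sketched with one table per year plus a UNION ALL view for the occasional cross-year query (SQL Server calls this pattern a partitioned view). This uses Python's `sqlite3` module so it runs anywhere; the `Reports<year>` tables and the `AllReports` view are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
YEARS = [2019, 2020, 2021]  # hypothetical range of report years

# One table per year; every table has the same columns.
# Table names come from the fixed integer list above, so the f-strings are safe.
for year in YEARS:
    conn.execute(f"CREATE TABLE Reports{year} (Month INTEGER, Title TEXT)")
    conn.executemany(
        f"INSERT INTO Reports{year} VALUES (?, ?)",
        [(m, f"Report {year}-{m:02d}") for m in range(1, 13)],
    )

# A view that stitches the partitions back together for cross-year queries.
conn.execute("CREATE VIEW AllReports AS " + " UNION ALL ".join(
    f"SELECT {year} AS Year, Month, Title FROM Reports{year}" for year in YEARS))

# A query for one year references only that year's table.
print(conn.execute("SELECT COUNT(*) FROM Reports2020").fetchone())  # → (12,)
# A query spanning years goes through the view.
print(conn.execute("SELECT COUNT(*) FROM AllReports WHERE Month = 1").fetchone())  # → (3,)
```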


Applications of Clustering

Clustering has a large number of applications across various domains. Some of the most popular applications of clustering are:

Recommendation engines

Market segmentation

Social network analysis

Search result grouping

Medical imaging

Image segmentation

Anomaly detection