Presentation Title: DATA MINING

Post on 23-Feb-2016

Department of Computer Science, Sir Syed University of Engineering & Technology, Karachi-Pakistan.

Presentation Title: DATA MINING

Submitted By:

Osama Ghulam Mohammad (2010-CS-20)

Noureen Chagani (2010-CS-11)

Naveed Usman (2010-CS-23)

TABLE OF CONTENTS

• What is data mining?
• Data mining consists of five major elements
• Why Mine Data? Commercial Viewpoint
• Why Mine Data? Scientific Viewpoint
• Some of the techniques used for data mining

What is data mining?

Data mining, also known as Knowledge Discovery in Databases (KDD), is the process of automatically searching large volumes of data for patterns.

It is the process of extracting knowledge from large datasets:

• Extremely large datasets
• Useful knowledge that can improve processes

Data mining consists of five major elements:

1. Extract, transform, and load transaction data onto the data warehouse system.
2. Store and manage the data in a multidimensional database system.
3. Provide data access to business analysts and information technology professionals.
4. Analyze the data by application software.
5. Present the data in a useful format, such as a graph or table.
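The five elements can be pictured as a small pipeline. The sketch below is only an illustration under stated assumptions: the CSV data, the `sales` table, and every function name are hypothetical, and an in-memory SQLite database stands in for the data warehouse (it is not a multidimensional database system).

```python
import csv
import io
import sqlite3

# Hypothetical raw transaction export (not from the presentation).
RAW_CSV = """store,item,amount
north,milk,2.50
north,bread,1.75
south,milk,2.60
south,eggs,3.10
"""

def extract(text):
    """Extract: parse raw CSV text into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: normalise names and convert amounts to integer cents."""
    return [(r["store"].strip().lower(), r["item"].strip().lower(),
             round(float(r["amount"]) * 100)) for r in rows]

def load(conn, records):
    """Load and store: keep the cleaned records in a warehouse table."""
    conn.execute("CREATE TABLE sales (store TEXT, item TEXT, cents INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)

def analyze(conn):
    """Analyze: total sales per store, in cents."""
    query = "SELECT store, SUM(cents) FROM sales GROUP BY store ORDER BY store"
    return conn.execute(query).fetchall()

def present(totals):
    """Present: format the result as a small text table."""
    return "\n".join(f"{store:<8}{cents / 100:>8.2f}" for store, cents in totals)

conn = sqlite3.connect(":memory:")  # stand-in for the warehouse
load(conn, transform(extract(RAW_CSV)))
totals = analyze(conn)
print(present(totals))
```

Here data access (element 3) is simply the SQL interface of the database; a real system would add users, permissions, and reporting tools.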

Why Mine Data? Commercial Viewpoint

• Lots of data is being collected and warehoused
  - Web data, e-commerce
  - Purchases at department/grocery stores
  - Bank/credit card transactions
• Computers have become cheaper and more powerful
• Competitive pressure is strong
  - Provide better, customized services for an edge (e.g. in Customer Relationship Management)

Why Mine Data? Scientific Viewpoint

• Data collected and stored at enormous speeds (GB/hour)
  - Remote sensors on a satellite
  - Telescopes scanning the skies
  - Microarrays generating gene expression data
  - Scientific simulations generating terabytes of data
• Traditional techniques infeasible for raw data
• Data mining may help scientists in classifying and segmenting data

Some of the techniques used for data mining are:

Artificial neural networks

Neural networks are useful for pattern recognition or data classification through a learning process. They are non-linear predictive models that learn through training and resemble biological neural networks in structure.

Neural Network

• Neural networks map a set of input nodes to a set of output nodes
• The number of inputs/outputs is variable
• The network itself is composed of an arbitrary number of nodes with an arbitrary topology

[Figure: Neural Network, mapping inputs Input 0, Input 1, ..., Input n to outputs Output 0, Output 1, ..., Output m]
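A minimal sketch of how such a network maps n inputs to m outputs. It assumes one particular topology (fully connected layers with sigmoid nodes) and random, untrained weights; the names `make_network` and `forward` are illustrative only, and no learning step is shown.

```python
import math
import random

def make_network(layer_sizes, seed=0):
    """Create random weights for a fully connected feedforward network.

    layer_sizes = [n_inputs, hidden..., n_outputs]. Each node holds one
    weight per node in the previous layer, plus a bias as its last entry.
    """
    rng = random.Random(seed)
    return [[[rng.uniform(-1, 1) for _ in range(n_in + 1)]
             for _ in range(n_out)]
            for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]

def sigmoid(x):
    """Non-linear activation squashing any value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def forward(network, inputs):
    """Map a list of input values to a list of output values."""
    values = list(inputs)
    for layer in network:
        # zip stops at the last weight, so node[-1] (the bias) is added once.
        values = [sigmoid(sum(w * v for w, v in zip(node, values)) + node[-1])
                  for node in layer]
    return values

net = make_network([3, 4, 2])          # 3 inputs, 4 hidden nodes, 2 outputs
outputs = forward(net, [0.5, -1.0, 2.0])
print(outputs)
```

Changing the `layer_sizes` list changes the number of inputs, outputs, and intermediate nodes, which is the sense in which the sizes are variable.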

Decision tree

Tree-shaped structures that represent sets of decisions. These decisions generate rules for the classification of a dataset.

Decision tree (data)

height  hair   eyes   class
short   blond  blue   A
tall    blond  brown  B
tall    red    blue   A
short   dark   blue   B
tall    dark   blue   B
tall    blond  blue   A
tall    dark   brown  B
short   blond  brown  B

[Figure: decision tree for the data above]

hair
  dark  -> B
  red   -> A
  blond -> eyes
             blue  -> A
             brown -> B
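The tree for this table tests hair first and, for blond hair only, eyes. The sketch below writes those decisions as rules and checks them against the eight rows. It also computes information gain for each attribute, which is one common way to choose the root; this is an assumption, since the slides do not say which splitting criterion was used.

```python
from collections import Counter
from math import log2

# Rows from the table: (height, hair, eyes, class).
DATA = [
    ("short", "blond", "blue", "A"),
    ("tall", "blond", "brown", "B"),
    ("tall", "red", "blue", "A"),
    ("short", "dark", "blue", "B"),
    ("tall", "dark", "blue", "B"),
    ("tall", "blond", "blue", "A"),
    ("tall", "dark", "brown", "B"),
    ("short", "blond", "brown", "B"),
]
COLUMNS = {"height": 0, "hair": 1, "eyes": 2}

def classify(height, hair, eyes):
    """Apply the tree as plain rules; height is never tested."""
    if hair == "dark":
        return "B"
    if hair == "red":
        return "A"
    # Remaining case is blond hair: the tree tests eye colour next.
    return "A" if eyes == "blue" else "B"

def entropy(rows):
    """Shannon entropy of the class column, in bits."""
    counts = Counter(row[3] for row in rows)
    return -sum(c / len(rows) * log2(c / len(rows)) for c in counts.values())

def information_gain(rows, attribute):
    """Entropy reduction from splitting the rows on one attribute."""
    col = COLUMNS[attribute]
    remainder = 0.0
    for value in {row[col] for row in rows}:
        subset = [row for row in rows if row[col] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(rows) - remainder

gains = {name: information_gain(DATA, name) for name in COLUMNS}
best = max(gains, key=gains.get)
print(all(classify(h, hair, e) == c for h, hair, e, c in DATA))
print(best, {name: round(g, 3) for name, g in gains.items()})
```

Under this criterion hair has the largest gain on the table, which is consistent with hair sitting at the root of the tree.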

The nearest neighbor method

A classification technique that classifies each record based on the records most similar to it in a historical database.
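A minimal sketch of this idea, assuming numeric features, Euclidean distance as the similarity measure, and a majority vote among the k closest records. The `HISTORY` data and the function name `knn_classify` are hypothetical.

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+

# Hypothetical historical records: (features, label).
HISTORY = [
    ((1.0, 1.0), "low"),
    ((1.5, 2.0), "low"),
    ((2.0, 1.5), "low"),
    ((6.0, 6.5), "high"),
    ((7.0, 6.0), "high"),
    ((6.5, 7.5), "high"),
]

def knn_classify(history, query, k=3):
    """Label a query by majority vote among its k nearest records."""
    nearest = sorted(history, key=lambda rec: dist(rec[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(knn_classify(HISTORY, (1.2, 1.4)))
print(knn_classify(HISTORY, (6.8, 6.9)))
```

No model is built in advance: every classification searches the stored records, so the cost of a query grows with the size of the database.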

An important technique for data mining is:

CLUSTERING

Clustering (Definition)

• Clustering can be considered the most important unsupervised learning technique; like every other problem of this kind, it deals with finding a structure in a collection of unlabeled data.

• Clustering is “the process of organizing objects into groups whose members are similar in some way”.

• A cluster is therefore a collection of objects which are “similar” to one another and “dissimilar” to the objects belonging to other clusters.

Clustering

The greater the similarity (or homogeneity) within a group, and the greater the difference between groups, the “better” or more distinct the clustering.
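One way to make this criterion concrete is to compare the average distance between points in the same group with the average distance between points in different groups. The sketch below assumes Euclidean distance as the (dis)similarity measure; the points and the function names are hypothetical.

```python
from itertools import combinations, product
from math import dist

def mean_within(clusters):
    """Average distance between pairs of points in the same cluster."""
    pairs = [dist(p, q) for c in clusters for p, q in combinations(c, 2)]
    return sum(pairs) / len(pairs)

def mean_between(clusters):
    """Average distance between pairs of points in different clusters."""
    pairs = [dist(p, q)
             for a, b in combinations(clusters, 2)
             for p, q in product(a, b)]
    return sum(pairs) / len(pairs)

# Hypothetical points forming two visibly separate groups.
left = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)]
right = [(8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]

good = [left, right]                                # follows the structure
bad = [left[:2] + right[:1], left[2:] + right[1:]]  # mixes the two groups

for name, clusters in (("good", good), ("bad", bad)):
    print(name, round(mean_within(clusters), 2), round(mean_between(clusters), 2))
```

The grouping that follows the structure has the smaller within-group distance and the larger between-group distance, so it is the more distinct clustering by this measure.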

Why clustering?

A few good reasons ...

• Simplifications
• Pattern detection

The K-Means Clustering Method

Basic K-means algorithm for finding K clusters:

1. Select K points as the initial centroids.
2. Assign all points to the closest centroid.
3. Recompute the centroid of each cluster.
4. Repeat steps 2 and 3 until the centroids don’t change.
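The four steps can be sketched as follows. This is a minimal illustration, assuming Euclidean distance, initial centroids supplied by the caller, and an iteration cap as a safeguard; the data and the names `kmeans` and `sse` are hypothetical. The example runs the same data from two different initial selections to show that the result depends on step 1.

```python
from math import dist

def kmeans(points, centroids, max_iter=100):
    """Basic K-means; `centroids` holds the K initial centroids (step 1)."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        # Step 2: assign every point to its closest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Step 3: recompute each centroid as the mean of its cluster.
        # An empty cluster keeps its old centroid (an assumption; the
        # steps above do not say how to handle this case).
        updated = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
        # Step 4: stop once the centroids no longer change.
        if updated == centroids:
            break
        centroids = updated
    return centroids, clusters

def sse(centroids, clusters):
    """Sum of squared distances from each point to its centroid."""
    return sum(dist(p, c) ** 2
               for c, cluster in zip(centroids, clusters) for p in cluster)

# Hypothetical data: the four corners of a wide rectangle, K = 2.
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]

left_right = kmeans(points, [(0.0, 0.0), (4.0, 0.0)])
top_bottom = kmeans(points, [(0.0, 0.0), (0.0, 1.0)])
print(left_right[0], sse(*left_right))
print(top_bottom[0], sse(*top_bottom))
```

Both runs stop with unchanged centroids, but the first groups the points by side (squared error 1.0) while the second groups them by row (squared error 16.0), so a stable result is not necessarily the best one.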

Figure 10a shows the case when the cluster centers coincide with the circle centers. This is a global minimum. Figure 10b shows a local minimum.

Cluster Example

“The key in business is to know something that nobody else knows.”
- Aristotle Onassis

“To understand is to perceive patterns.”
- Sir Isaiah Berlin

Thank You
