DATA MINING TOOLS & ACTIVITIES

Presented By:

Agenda

1. Definition
2. Overview
3. History
4. Evolution
5. Scope
6. Stages
7. Process
8. Relationships
9. Elements
10. Data Warehousing vs Data Mining
11. Data Mining Tools
12. Knowledge Discovery in Databases
13. Advantages/Disadvantages

Definition

“Data mining is the process of discovering meaningful new correlations, patterns and trends by sifting through large amounts of data stored in repositories, using pattern recognition technologies as well as statistical and mathematical techniques.”

Overview

Data mining tools predict future trends and behaviors, allowing businesses to make proactive, knowledge-driven decisions.

The prospective analyses offered by data mining move beyond the analyses of past events provided by the retrospective tools typical of decision support systems.

Data mining tools can answer business questions that traditionally were too time consuming to resolve.

They scour databases for hidden patterns, finding predictive information that experts may miss because it lies outside their expectations.

Data mining techniques can be implemented rapidly on existing software and hardware platforms to enhance the value of existing information resources, and can be integrated with new products and systems as they are brought on-line.

History

Data mining is the evolution of a field with a long history, but the term itself was only introduced relatively recently, in the 1990s.

Statistics are the foundation of most technologies on which data mining is built.

Its roots can be traced back along three family lines:
› Classical statistics
› Artificial intelligence
› Machine learning

It is finding increasing acceptance in science and business areas which need to analyze large amounts of data to discover trends which they could not otherwise find.

Classical Statistics

Classical statistics embraces concepts such as regression analysis, standard distribution, standard deviation, variance and cluster analysis, all of which are used to study data and data relationships.

These are the building blocks that underpin more advanced statistical analyses.

Within the heart of today’s data mining tools and techniques, classical statistical analysis plays a significant role.
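
As a rough illustration of these building blocks, the sketch below computes a sample variance, a standard deviation and a one-variable least-squares regression using only the Python standard library. The figures and the names ad_spend and unit_sales are invented for the example and do not come from the slides.

```python
from statistics import mean, stdev, variance

# Hypothetical monthly figures, invented for illustration only.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]      # predictor
unit_sales = [3.0, 5.0, 7.0, 9.0, 11.0]   # response (exactly 2 * x + 1)


def least_squares(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


slope, intercept = least_squares(ad_spend, unit_sales)
sales_variance = variance(unit_sales)   # sample variance
sales_stdev = stdev(unit_sales)         # sample standard deviation

print(f"sales = {slope:.2f} * spend + {intercept:.2f}")
print(f"variance = {sales_variance:.2f}, std dev = {sales_stdev:.2f}")
```

Real data mining tools wrap the same quantities in far larger pipelines, but the arithmetic underneath is of this kind.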

Artificial Intelligence (AI)

AI is built upon heuristics (methods that often lead rapidly to a solution that is usually close to the best possible answer) as opposed to statistics, and attempts to apply human-thought-like processing to statistical problems.

Since this approach requires vast computer processing power, it was not practical until the early 1980s, when computers began to offer useful power at reasonable prices.

Certain AI concepts were adopted by some high-end commercial products, such as query optimization modules for Relational Database Management Systems (RDBMS).

Machine Learning

Machine learning is the union of statistics and artificial intelligence.

It is an evolution of artificial intelligence because it blends artificial intelligence heuristics with advanced statistical analysis.

Machine learning attempts to let computer programs learn about the data they study, such that programs make different decisions based on the qualities of the studied data, using statistics for fundamental concepts, and adding more advanced AI heuristics and algorithms to achieve its goals.

Evolution of Data Mining

Evolutionary Step: Data Collection (1960s)
Business Question: "What was my total revenue in the last five years?"
Enabling Technologies: Computers, tapes, disks
Product Providers: IBM, CDC
Purpose: Retrospective, static data delivery

Evolutionary Step: Data Access (1980s)
Business Question: "What were unit sales in New England last March?"
Enabling Technologies: Relational databases (RDBMS), Structured Query Language (SQL), ODBC
Product Providers: Oracle, Sybase, Informix, IBM, Microsoft
Purpose: Retrospective, dynamic data delivery at record level

Evolutionary Step: Data Warehousing & Decision Support (1990s)
Business Question: "What were unit sales in New England last March? Drill down to Boston."
Enabling Technologies: On-line analytic processing (OLAP), multidimensional databases, data warehouses
Product Providers: Pilot, Comshare, Arbor, Cognos, Microstrategy
Purpose: Retrospective, dynamic data delivery at multiple levels

Evolutionary Step: Data Mining (Emerging Today)
Business Question: "What’s likely to happen to Boston unit sales next month? Why?"
Enabling Technologies: Advanced algorithms, multiprocessor computers, massive databases
Product Providers: Pilot, Lockheed, IBM, SGI, numerous startups (nascent industry)
Purpose: Prospective, proactive information delivery

Scope of Data Mining

Automated prediction of trends and behaviors. A typical example of a predictive problem is targeted marketing. Data mining uses data on past promotional mailings to identify the targets most likely to maximize return on investment in future mailings.
› Example: forecasting bankruptcy.
› Example: identifying segments of a population likely to respond similarly to given events.

Automated discovery of previously unknown patterns. Data mining tools sweep through databases and identify previously hidden patterns in one step.
› Example: analysis of retail sales data to identify seemingly unrelated products that are often purchased together (e.g., beer and diapers).
› Example: detecting fraudulent credit card transactions and identifying anomalous data that could represent data entry keying errors.
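
One simple way to picture the anomalous-data example is a z-score check: values that lie many standard deviations from the mean are flagged for review. The sketch below is only an illustration under stated assumptions. The amounts and the threshold of 2.0 are invented, and production fraud detection uses far richer models.

```python
from statistics import mean, stdev

# Hypothetical card transaction amounts; the last one looks like a keying error.
amounts = [20.0, 22.0, 19.0, 21.0, 20.0, 23.0, 18.0, 21.0, 500.0]

Z_THRESHOLD = 2.0  # assumed cut-off, chosen only for this small example


def flag_anomalies(values, threshold=Z_THRESHOLD):
    """Return the values whose absolute z-score exceeds the threshold."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation
    return [v for v in values if abs(v - mu) / sigma > threshold]


flagged = flag_anomalies(amounts)
print(flagged)
```

With only a handful of values, a single extreme point inflates the standard deviation, so the threshold has to stay low. Median-based measures are a common, more robust alternative.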

Stages

Stage 1: Exploration
› Data preparation, cleaning and transformation.

Stage 2: Model building and validation
› Considering various models and choosing the best one based on their performance.

Stage 3: Deployment
› Taking the model selected as best in Stage 2 and applying it to new data in order to generate predictions or estimates of the expected outcome.
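
The three stages can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the records are invented, the two candidate models (a constant mean and a straight line) are deliberately simple, and mean squared error on held-out rows stands in for "performance".

```python
from statistics import mean

# Stage 1: exploration. Hypothetical (x, y) records; None marks a missing value.
raw = [(1, 2.1), (2, 3.9), (3, None), (4, 8.2), (5, 9.8),
       (6, 12.1), (7, 13.9), (8, 16.2)]
clean = [(x, y) for x, y in raw if y is not None]   # cleaning: drop incomplete rows
train, valid = clean[:5], clean[5:]                 # hold out the last rows


# Stage 2: model building and validation.
def fit_mean(data):
    """Baseline model: always predict the average training outcome."""
    y_bar = mean(y for _, y in data)
    return lambda x: y_bar


def fit_line(data):
    """Straight-line model fitted by ordinary least squares."""
    x_bar = mean(x for x, _ in data)
    y_bar = mean(y for _, y in data)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in data)
    sxx = sum((x - x_bar) ** 2 for x, _ in data)
    slope = sxy / sxx
    return lambda x: y_bar + slope * (x - x_bar)


def mse(model, data):
    """Mean squared error of a model on a list of (x, y) pairs."""
    return mean((model(x) - y) ** 2 for x, y in data)


candidates = {"mean": fit_mean(train), "line": fit_line(train)}
scores = {name: mse(model, valid) for name, model in candidates.items()}
best_name = min(scores, key=scores.get)             # lowest validation error wins

# Stage 3: deployment. Apply the chosen model to a new, unseen input.
prediction = candidates[best_name](9)
print(best_name, round(prediction, 2))
```

Holding out rows the models never saw during fitting is what makes the Stage 2 comparison meaningful.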

Data Mining Process

Relationships

Classes: Stored data is used to locate data in predetermined groups. For example, a restaurant chain could mine customer purchase data to determine when customers visit and what they typically order. This information could be used to increase traffic by having daily specials.

Clusters: Data items are grouped according to logical relationships or consumer preferences. For example, data can be mined to identify market segments or consumer affinities.

Associations: Data can be mined to identify associations. The beer-diaper example is an example of associative mining.

Sequential patterns: Data is mined to anticipate behavior patterns and trends. For example, an outdoor equipment retailer could predict the likelihood of a backpack being purchased based on a consumer's purchase of sleeping bags and hiking shoes.
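
The association relationship can be made concrete with the two measures usually quoted for it, support and confidence. The sketch below counts item pairs in a few invented shopping baskets. The basket contents and the helper names support and confidence are assumptions for illustration, not part of any particular tool.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, invented for illustration.
baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"milk", "bread"},
    {"beer", "diapers", "milk"},
    {"bread", "diapers"},
]

# Count how many baskets contain each item and each (sorted) pair of items.
item_counts = Counter(item for basket in baskets for item in basket)
pair_counts = Counter(
    pair for basket in baskets for pair in combinations(sorted(basket), 2)
)


def support(a, b):
    """Fraction of all baskets that contain both a and b."""
    return pair_counts[tuple(sorted((a, b)))] / len(baskets)


def confidence(a, b):
    """Among baskets containing a, the fraction that also contain b."""
    return pair_counts[tuple(sorted((a, b)))] / item_counts[a]


print(support("beer", "diapers"))
print(confidence("beer", "diapers"))
print(confidence("diapers", "beer"))
```

Confidence is directional: in this toy data every beer basket contains diapers, but not every diaper basket contains beer.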

Elements

Extract, transform, and load transaction data onto the data warehouse system.

Store and manage the data in a multidimensional database system.

Provide data access to business analysts and information technology professionals.

Analyze the data with application software.

Present the data in a useful format, such as a graph or table.
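
The five elements can be walked through end to end on a toy scale. The sketch below is only an illustration: the CSV text is invented, and an in-memory SQLite database stands in for the warehouse, although SQLite is a relational store rather than the multidimensional database system the slide mentions.

```python
import csv
import io
import sqlite3

# Extract: a hypothetical transaction export (invented data).
raw_csv = """region,month,units
Boston,2024-03,120
boston,2024-03,30
Hartford,2024-03,80
"""
rows = list(csv.DictReader(io.StringIO(raw_csv)))

# Transform: normalise region names and convert unit counts to integers.
records = [(r["region"].strip().title(), r["month"], int(r["units"])) for r in rows]

# Load and store: an in-memory SQLite table stands in for the warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, month TEXT, units INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)

# Access and analyze: total units per region.
totals = conn.execute(
    "SELECT region, SUM(units) FROM sales GROUP BY region ORDER BY region"
).fetchall()

# Present: a small fixed-width table.
report = "\n".join(f"{region:<10}{units:>6}" for region, units in totals)
print(report)
```

The transform step matters here: without normalising "boston" to "Boston", the same region would be reported as two separate groups.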

Data Warehousing vs. Data Mining

Data Warehouse: “is a repository (or archive) of information gathered from multiple sources, stored under a unified schema, at a single site.” (Silberschatz)
› Collect data
› Store in a single repository
› Allows for easier query development, as a single repository can be queried.

Data Mining:
› Analyzing databases or data warehouses to discover patterns about the data to gain knowledge.
› Knowledge is power

Data Mining Tools

Data mining tools are software components and theories that allow users to extract information from data. The tools provide individuals and companies with the ability to gather large amounts of data and use it to make determinations about a particular user or groups of users.

Data mining tools can be classified into one of three categories:

1. traditional data mining tools,
2. dashboards, and
3. text-mining tools.

1. Traditional Data Mining Tools.

Help companies establish data patterns and trends by using a number of complex algorithms and techniques.

Some of these tools are installed on the desktop to monitor the data and highlight trends and others capture information residing outside a database.

The majority are available in both Windows and UNIX versions, although some specialize in one operating system only.

While some may concentrate on one database type, most will be able to handle any data using online analytical processing or a similar technology.

http://en.wikipedia.org/wiki/OLAP

2. Dashboards.

Installed in computers to monitor information in a database.

Dashboards reflect data changes and updates onscreen — often in the form of a chart or table — enabling the user to see how the business is performing.

Historical data also can be referenced, enabling the user to see where things have changed (e.g., increase in sales from the same period last year).

This functionality makes dashboards easy to use and particularly appealing to managers who wish to have an overview of the company's performance.

3. Text-mining Tools.

These tools are named for their ability to mine data from different kinds of text, from Microsoft Word and Acrobat PDF documents to simple text files.

These tools scan content and convert the selected data into a format that is compatible with the tool's database, thus providing users with an easy and convenient way of accessing data without the need to open different applications.

Scanned content can be unstructured (i.e., information is scattered almost randomly across the document, including e-mails, Internet pages, audio and video data) or structured (i.e., the data's form and purpose is known, such as content found in a database).
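
To give a feel for how unstructured text becomes structured data, the sketch below turns two short passages into per-document term counts. The document names, the text and the tiny stop-word list are all invented for illustration. Real text-mining tools also handle file formats such as Word and PDF, which this sketch does not attempt.

```python
import re
from collections import Counter

# Hypothetical plain-text documents, invented for illustration.
documents = {
    "memo.txt": "Sales in Boston rose. Boston sales beat the forecast.",
    "email.txt": "Forecast for Hartford sales is flat.",
}

# A tiny assumed stop-word list; real tools ship much larger ones.
STOP_WORDS = {"in", "the", "for", "is"}


def term_counts(text):
    """Lower-case the text, split it into words and count the non-stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


# Structured result: one term-frequency table per document.
index = {name: term_counts(text) for name, text in documents.items()}

for name, counts in index.items():
    print(name, counts.most_common(3))
```

Once the text is in this tabular form it can be stored and queried like any other structured data.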

Knowledge Discovery in Databases (KDD)

KDD is the most prevalent process used in data mining.

The term KDD was coined in 1989 by Gregory Piatetsky-Shapiro.

Users are able to process raw data, mine the data for information and interpret the various results in the form of information management.

The raw data can include information like financials, client lists, policy and procedure documents, shareholder registers, and even electronic copies of contractual agreements with customers and vendors.

With a data mining tool, it is possible to conduct a focused search for data that is needed, rather than having to pore through all the stored data manually.

Stages of KDD

1) Selection
We select data relevant to some criteria, e.g., transactions for credit card customers.

2) Preprocessing
Unnecessary information is removed.

3) Transformation
Data is transformed in order to be suitable for data mining.

4) Data Mining
Extraction of patterns from data.

5) Interpretation and Evaluation
Patterns obtained in the data mining stage are converted into knowledge, which in turn is used to support decision making.

6) Data Visualization
Makes it possible for the analyst to gain a deeper, more intuitive understanding of the data. It helps users to examine large volumes of data and detect patterns visually.
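
The six stages can be traced on a toy data set. The sketch below is a minimal illustration under stated assumptions: the customer records are invented, the "pattern" mined is nothing more than average card spend per customer, and the threshold of 50.0 is arbitrary.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (customer, kind, amount); None marks a missing amount.
records = [
    ("ann", "card", 40.0), ("ann", "card", 60.0), ("ann", "loan", 900.0),
    ("bob", "card", 10.0), ("bob", "card", None), ("bob", "card", 30.0),
]

# 1) Selection: keep only credit card transactions.
selected = [r for r in records if r[1] == "card"]

# 2) Preprocessing: remove rows with missing amounts.
cleaned = [r for r in selected if r[2] is not None]

# 3) Transformation: group amounts by customer.
by_customer = defaultdict(list)
for customer, _, amount in cleaned:
    by_customer[customer].append(amount)

# 4) Data mining: a deliberately trivial pattern, average spend per customer.
avg_spend = {c: mean(a) for c, a in by_customer.items()}

# 5) Interpretation and evaluation: turn the pattern into a decision aid.
high_spenders = sorted(c for c, avg in avg_spend.items() if avg >= 50.0)

# 6) Data visualization: a text bar chart, one '#' per 10 units of spend.
chart = "\n".join(
    f"{c:<5}{'#' * int(avg // 10)}" for c, avg in sorted(avg_spend.items())
)
print(chart)
print("high spenders:", high_spenders)
```

Each stage consumes only the output of the one before it, which is what lets real KDD systems swap in a different mining algorithm without touching selection or cleaning.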

Advantages

Historical data can be used to predict future trends.

Knowledge about new trends can be used to improve products and services

Extracting knowledge hidden in large volumes of data

Data mining is used in developing models to predict outcomes of future situations.

Disadvantages

Limited information

Noise & missing data

User interaction & prior knowledge

Uncertainty

THANK YOU!!!
