Toward Knowledge Discovery in Databases Attached to Grids

Peter Brezany

Institute for Software Science (Institut für Softwarewissenschaft), University of Vienna (Universität Wien)

E-mail: [email protected]
Dec 21, 2015


Media That Radically Influenced Society

• 1500s: Printing Press

• 1840s: Penny Post

• 1850s: Telegraph

• 1920s: Telephone

• 1930s: Radio

• 1950s: TV

• 1990s: Web

• 20xx: Grid

Talk Outline

• Data Mining on the Grid – Background Information

• Application Examples

• Architecture of a Traditional Data Mining System

• GridMiner – A framework for Data Mining on the Grid

• GridMiner Architecture

• Functional and Data Access Model

• Conclusions

Data Mining on the Grid

• Data mining on the Grid (DMG): finding unknown data patterns in an environment with geographically distributed data and computation.

• Data may be highly heterogeneous and frequently updated.

• A good DMG algorithm analyzes data in a distributed fashion with modest data communication overhead.

• A typical DMG algorithm involves local data analysis followed by the generation of a global data model.
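
The "local analysis, then global model" pattern described above can be sketched as follows. This is a minimal illustration, not GridMiner code: the `LocalSummary` structure, the function names, and the sample values are all assumptions. Each site reduces its data to three numbers, so only a small summary crosses the network rather than the raw records.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class LocalSummary:
    """Sufficient statistics computed at one site (hypothetical structure)."""
    count: int
    total: float
    total_sq: float


def analyze_locally(values: Iterable[float]) -> LocalSummary:
    """Local data analysis: runs where the data lives, returns a small summary."""
    count, total, total_sq = 0, 0.0, 0.0
    for v in values:
        count += 1
        total += v
        total_sq += v * v
    return LocalSummary(count, total, total_sq)


def build_global_model(summaries: List[LocalSummary]) -> dict:
    """Global model: mean and population variance over all sites combined."""
    n = sum(s.count for s in summaries)
    if n == 0:
        raise ValueError("no data at any site")
    mean = sum(s.total for s in summaries) / n
    variance = sum(s.total_sq for s in summaries) / n - mean * mean
    # Guard against tiny negative values caused by floating-point rounding.
    return {"count": n, "mean": mean, "variance": max(variance, 0.0)}


# Illustrative data held at three sites; the values are made up.
site_data = {
    "site_a": [1.0, 2.0, 3.0],
    "site_b": [4.0, 5.0],
    "site_c": [6.0],
}
summaries = [analyze_locally(values) for values in site_data.values()]
global_model = build_global_model(summaries)
print(global_model)
```

The communication cost here is constant per site regardless of how many records the site holds, which is one way to read "modest data communication overhead".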

Application Examples

• Determining how the emergence of hepatitis C depends on weather patterns requires access to a large hepatitis C DB at one location and an environmental DB at another location.

• Two major financial organizations want to cooperate. They need to share data patterns relevant to the data mining task, but they do not want to share the data itself because it is sensitive, so combining the databases may not be feasible.

• Federating Brain Data Project: integrating several neuroscience DBs.

• A major multinational corporation wants to analyze its customer transaction records in order to develop successful business strategies quickly.
- It has thousands of establishments throughout the world.
- Collecting all the data into a centralized data warehouse and then analyzing it with existing commercial data mining software takes too long.
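
The financial-organizations example above (sharing patterns rather than data) can be illustrated with a small sketch. It is an assumption-laden toy, not a method taken from the talk: the function names, the product labels, and the support threshold are all hypothetical, and exchanging counts is not by itself a privacy guarantee. Each organization counts co-occurring item pairs in its own transactions and publishes only those counts.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

Pair = FrozenSet[str]


def local_pair_counts(transactions: List[Set[str]]) -> Counter:
    """Count every item pair at one organization; raw transactions stay local."""
    counts: Counter = Counter()
    for items in transactions:
        for a, b in combinations(sorted(items), 2):
            counts[frozenset((a, b))] += 1
    return counts


def global_frequent_pairs(
    local_counts: List[Counter], total_transactions: int, min_support: float
) -> Dict[Pair, float]:
    """Merge the shared counts and keep pairs whose global support is high enough."""
    merged: Counter = Counter()
    for counts in local_counts:
        merged.update(counts)  # Counter.update adds counts together
    return {
        pair: n / total_transactions
        for pair, n in merged.items()
        if n / total_transactions >= min_support
    }


# Made-up transactions held separately by two organizations.
org_a = [{"loan", "card"}, {"loan", "card", "savings"}, {"savings"}]
org_b = [{"loan", "card"}, {"card", "savings"}, {"loan"}]

shared = [local_pair_counts(org_a), local_pair_counts(org_b)]
frequent = global_frequent_pairs(shared, len(org_a) + len(org_b), min_support=0.5)
print(frequent)
```

Because every pair count is shared (not only the locally frequent ones), the merged support values equal what a centralized count over the combined data would give.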

Telemedical Applications: AMG – Austrian Medical Grid

[Figure labels: Web; Raw Medical Data; Reconstructed Medical Data; Derived Medical Data; Database; Database]

Telemedical Collaboration - Example

A patient living in a remote village has a heart problem.

An EEG is taken by the local doctor, and all the patient's details are stored in the doctor's PC-based telemedical system.

MRI and CT scans are taken within different departments of a general hospital and stored in the telemedical DB. A consultant compiles a report and saves it in the DB.

If necessary, a 3D ultrasound scan is taken in a specialized clinic and a further report is compiled.

If complicated surgery is required, an external specialist using virtual reality techniques defines how the surgery should be planned. The resulting operation is recorded on video, e.g., for education.

Data mining support/assistance is needed.

Architecture of a Data Mining System

Components shown in the figure:

• Graphical user interface

• Pattern evaluation

• Data mining engine

• Database or data warehouse server

• Knowledge base

• Data cleaning, data integration, and filtering

• Database and data warehouse

On Line Analytical Mining (OLAM)

GridMiner – A Framework for Data Mining on Grids

System requirements:
- Algorithm and data publishing and integration
- Compatibility with Grid infrastructure and Grid awareness
- Openness
- Scalability
- Security and data privacy

Functionality requirements:
- Mining different kinds of knowledge in databases
- Incremental data mining algorithms
- Interactive mining of knowledge at multiple levels of abstraction

GridMiner (Layered) Architecture (based on K.F. Jeffery's idea)

Functional and Data Access Model

MDS

Example: Mining Patterns for Data Classification and Associations

use database dat1, dat2
mine classifications
analyze credit_rating
using g_parsimony
display as tree

use database DBs attributes
mine associations
using method attributes
display as rules

Knowledge Grid Architecture Layers

High level layer:
- Data Access Service
- Tools and Algorithms Access Service
- Execution Plan Management
- Result Presentation Service

Core layer:
- Knowledge Directory Service
- Resource Allocation Execution Management

Generic Grid and Data Grid Services

Conclusions

• Grid data mining is a relevant research topic.

• The GridMiner approach may contribute to this research domain.

• Collaborations are needed.

• IPG (Information Power Grid) is the only Grid project that aims to address knowledge discovery issues.

• Looking for pilot application(s).

• Open issues:
- Basic Grid technology: Globus, DataGrid, Jini, JXTA?

Data Storage and the Components

• Site A, Site B, Site C, Site D: each site performs Preprocessing, followed by Local DM

• Construction of the Global Model

• GUI (Site E)
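
The flow in this figure (per-site preprocessing, local data mining, then construction of a global model) might be sketched as below. Everything concrete here is an assumption made for illustration: the nearest-class-mean local classifier, the majority-vote combination, the record layout, and the sample values are not taken from the talk.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional, Tuple

Record = Tuple[Optional[float], str]  # (feature value or None, class label)
Classifier = Callable[[float], str]


def preprocess(records: List[Record]) -> List[Tuple[float, str]]:
    """Preprocessing at a site: drop records whose feature value is missing."""
    return [(value, label) for value, label in records if value is not None]


def local_dm(records: List[Tuple[float, str]]) -> Classifier:
    """Local DM: learn per-class means; classify by the nearest class mean."""
    sums: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for value, label in records:
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    means = {label: sums[label] / counts[label] for label in sums}

    def classify(x: float) -> str:
        return min(means, key=lambda label: abs(x - means[label]))

    return classify


def construct_global_model(local_models: List[Classifier]) -> Classifier:
    """Global model: majority vote over the classifiers sent by the sites."""
    def classify(x: float) -> str:
        votes = Counter(model(x) for model in local_models)
        return votes.most_common(1)[0][0]

    return classify


# Made-up records for Sites A to D; only the learned models leave each site.
sites = {
    "A": [(1.0, "low"), (2.0, "low"), (8.0, "high"), (None, "high")],
    "B": [(2.0, "low"), (9.0, "high"), (10.0, "high")],
    "C": [(0.0, "low"), (6.0, "high")],
    "D": [(3.0, "low"), (7.0, "high")],
}
local_models = [local_dm(preprocess(records)) for records in sites.values()]
global_model = construct_global_model(local_models)

# A GUI at Site E would query the global model; here we simply print results.
for x in (1.0, 4.0, 9.0):
    print(x, global_model(x))
```

Majority voting is only one of several ways to combine local models; it is used here because it needs no access to the sites' data once the local classifiers exist.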