In the name of God. Data Mining and Its Applications in Medicine. Student: Babak Razzaghi, student ID 85233510. Advisor: Dr. Tohidkhah (seminar for the course Applications of Information Technology in Medicine).
Why data mining?
Necessity is the mother of invention. We produce huge amounts of data, electronic records of our decisions:
choices in the supermarket, financial records, our comings and goings.
We swipe our way through the world; every swipe is a record in a database.
Data rich, but information poor: lying hidden in all this data is information!
Extracting or “mining” knowledge from large amounts of data
Data-driven discovery and modeling of hidden patterns in large volumes of data
Extraction of implicit, previously unknown, unexpected, and potentially extremely useful information from data
Large database: patterns can be sought by data mining or by data visualization
Data visualization: ways of seeing patterns in large data sets that exploit the efficiency of human pattern recognition
Other names (by analogy with gold mining): knowledge mining from databases, knowledge extraction, data/pattern analysis, Knowledge Discovery in Databases (KDD)
Figure: Knowledge Discovery Process. Raw data → integration & understanding → data warehouse → selection & cleaning → target data → transformation → transformed data → data mining → patterns and rules → interpretation & evaluation → knowledge
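The Knowledge Discovery Process can be sketched as a chain of small functions, one per stage. This is a minimal illustration under stated assumptions, not the seminar's method: the raw data is a hypothetical list of dict records (with `None` marking a missing value), the stage functions are hypothetical helpers, and the "data mining" step is a simple frequency count standing in for a real algorithm.

```python
from collections import Counter

# Hypothetical raw records; None marks a missing value.
RAW_DATA = [
    {"age": 70, "smoker": "yes", "outcome": "death"},
    {"age": 45, "smoker": "no", "outcome": "survived"},
    {"age": None, "smoker": "yes", "outcome": "death"},
    {"age": 66, "smoker": "yes", "outcome": "death"},
    {"age": 52, "smoker": "no", "outcome": "survived"},
]

def select_and_clean(records):
    """Selection & cleaning: keep only complete records (the target data)."""
    return [r for r in records if all(v is not None for v in r.values())]

def transform(records):
    """Transformation: replace the numeric age with a coarse age band."""
    return [
        {"age_band": "old" if r["age"] > 60 else "young",
         "smoker": r["smoker"], "outcome": r["outcome"]}
        for r in records
    ]

def mine(records):
    """Data mining (stand-in): count each (age_band, smoker, outcome) pattern."""
    return Counter((r["age_band"], r["smoker"], r["outcome"]) for r in records)

def evaluate(pattern_counts, min_count=2):
    """Interpretation & evaluation: keep patterns seen at least min_count times."""
    return {p: c for p, c in pattern_counts.items() if c >= min_count}

knowledge = evaluate(mine(transform(select_and_clean(RAW_DATA))))
print(knowledge)
```

Keeping each stage as a separate function mirrors the figure: any stage can be swapped (for example, a real mining algorithm in place of `mine`) without touching the others.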
Find true patterns and avoid overfitting (false patterns due to randomness)
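To see how randomness can masquerade as a pattern, this sketch (an illustration, not a method from the seminar) fits a 1-nearest-neighbour classifier to labels that are pure coin flips. The model scores perfectly on its own training data, yet on fresh data it does no better than chance, which is exactly the kind of false pattern a held-out test set exposes. The data generator, seed, and sample sizes are arbitrary assumptions.

```python
import random

def nearest_neighbor_predict(train, x):
    """1-nearest-neighbour: return the label of the training point closest to x."""
    _, label = min(train, key=lambda point: abs(point[0] - x))
    return label

def accuracy(train, data):
    """Fraction of (x, y) pairs in data that the 1-NN model labels correctly."""
    hits = sum(nearest_neighbor_predict(train, x) == y for x, y in data)
    return hits / len(data)

def make_data(rng, n):
    """n points with a random feature x and a coin-flip label unrelated to x."""
    return [(rng.random(), rng.choice([0, 1])) for _ in range(n)]

rng = random.Random(0)
train = make_data(rng, 200)
test = make_data(rng, 500)

train_acc = accuracy(train, train)  # every training point is its own nearest neighbour
test_acc = accuracy(train, test)    # on new data the "pattern" turns out to be noise
print(f"training accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```

Training accuracy is 1.00 by construction, while test accuracy hovers around 0.50, the chance level for two balanced classes.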
Classification: predicting an item's class
Clustering: finding clusters in data
Associations: e.g., A & B & C occur together frequently
Visualization: facilitating human discovery
Summarization: describing a group
Estimation: predicting a continuous value
Deviation detection: finding changes
Link analysis: finding relationships
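The association task rests on two measures: the support of an itemset (the fraction of transactions that contain it) and the confidence of a rule X → Y, support(X ∪ Y) / support(X). This minimal sketch enumerates every itemset up to size 3 by brute force, which also shows why investigating all possibilities is computationally expensive; practical algorithms such as Apriori prune this search. The transactions and the 0.6 support threshold are hypothetical.

```python
from itertools import combinations

# Hypothetical transactions (e.g. items bought together, or findings recorded together).
TRANSACTIONS = [
    {"A", "B", "C"},
    {"A", "B", "C", "D"},
    {"A", "B"},
    {"B", "C", "D"},
    {"A", "B", "C"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Confidence of the rule antecedent -> consequent."""
    return (support(antecedent | consequent, transactions)
            / support(antecedent, transactions))

def frequent_itemsets(transactions, min_support, max_size=3):
    """Brute force: score every itemset of up to max_size items."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            s = support(frozenset(combo), transactions)
            if s >= min_support:
                result[frozenset(combo)] = s
    return result

freq = frequent_itemsets(TRANSACTIONS, min_support=0.6)
print(support(frozenset("ABC"), TRANSACTIONS))  # 0.6
```

Here A & B & C occur together in 3 of 5 transactions, and the rule {A, B} → {C} has confidence of about 0.75, because three of the four transactions containing A and B also contain C.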
Computationally expensive to investigate all possibilities
Dealing with noise, missing information, and errors in data
Finding the minimal attribute space
Finding adequate evaluation function(s)
Extracting meaningful information
Avoiding overfitting
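One of the simplest ways of dealing with missing information is mean imputation, which replaces each missing entry with the mean of the observed values for that attribute. This is a minimal sketch of that one technique, chosen for illustration; alternatives include dropping incomplete records or modelling the missing values, and mean imputation artificially shrinks the variance of the filled-in attribute. The readings are hypothetical.

```python
from statistics import mean

def impute_with_mean(values):
    """Replace None entries with the mean of the observed (non-None) values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to impute from")
    fill = mean(observed)
    return [fill if v is None else v for v in values]

# Hypothetical systolic blood pressure readings (mmHg) with two missing entries.
readings = [120, None, 135, 110, None, 125]
print(impute_with_mean(readings))  # [120, 122.5, 135, 110, 122.5, 125]
```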
Data mining tools: Insightful Miner, Angoss KnowledgeACCESS, ARMiner, Eudaptics Viscovery SOMine, Goal TV, MDR, SPSS
Science: chemistry, physics, bioscience
Bioscience: sequence-based analysis, protein structure and function prediction, protein family classification, microarray gene expression
Financial industry (banks, businesses, e-commerce): stock and investment analysis
Pharmaceutical companies
Health care
Sports and entertainment
Clinical data mining processes:
Digital format for all pertinent data
Create structure
Obtain coded information
Natural language understanding
Create a widely accessible repository
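The steps "create structure", "obtain coded information", and "natural language understanding" can be illustrated with a deliberately naive sketch that maps phrases in a free-text note to codes and stores the result in a shared list. Everything here is an assumption for illustration: the keyword table and `code_note` are hypothetical, and I10 (essential hypertension) and E11 (type 2 diabetes mellitus) are ICD-10 codes chosen only as examples. Plain keyword matching cannot handle negation ("no hypertension" would still be coded), which is why real systems need genuine natural language understanding.

```python
import re

# Hypothetical keyword-to-code table; a real system would use a full terminology.
KEYWORD_CODES = {
    "hypertension": "I10",
    "type 2 diabetes": "E11",
}

def code_note(patient_id, note):
    """Turn a free-text note into a structured record with coded findings."""
    text = note.lower()
    codes = sorted({
        code for phrase, code in KEYWORD_CODES.items()
        # \b ... \b matches whole words only, so "hypertensions" would not match
        if re.search(r"\b" + re.escape(phrase) + r"\b", text)
    })
    return {"patient_id": patient_id, "codes": codes, "note": note}

# The "widely accessible repository" is modelled here as a plain list of records.
repository = [
    code_note("p001", "History of hypertension and type 2 diabetes."),
    code_note("p002", "No known chronic conditions."),
]
print(repository[0]["codes"])  # ['E11', 'I10']
```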
Classification tree (Class 1: survivors; Class 2: early death):
Minimum systolic blood pressure over the 24-hour period following admission to the hospital
  <= 91: Class 2, early death
  > 91: Age of patient
    <= 62.5: Class 1, survivors
    > 62.5: Was there sinus tachycardia?
      Yes: Class 2, early death
      No: Class 1, survivors
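Once learned, a classification tree is just a sequence of nested tests. This sketch hard-codes the slide's thresholds, assuming this branch assignment: minimum systolic blood pressure <= 91 leads to Class 2 (early death); otherwise age <= 62.5 leads to Class 1 (survivors); otherwise sinus tachycardia leads to Class 2 and its absence to Class 1. The function name `classify` is hypothetical, units (mmHg, years) are assumed, and in practice a tree-induction algorithm, not hand-coding, chooses the splits.

```python
def classify(min_systolic_bp, age, sinus_tachycardia):
    """Walk the tree: return 1 (survivors) or 2 (early death)."""
    if min_systolic_bp <= 91:
        return 2  # low blood pressure in the first 24 hours
    if age <= 62.5:
        return 1  # younger patients with adequate blood pressure
    return 2 if sinus_tachycardia else 1

print(classify(85, 50, False))   # 2
print(classify(120, 55, True))   # 1
print(classify(120, 70, True))   # 2
print(classify(120, 70, False))  # 1
```

Note that age and tachycardia are never examined for a patient whose blood pressure already falls at or below 91: a tree asks only the questions needed to reach a leaf.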
An organism’s genome is the “program” for making the organism, encoded in DNA.
Human DNA has roughly 20,000 protein-coding genes (the 30,000-35,000 figure is an older estimate).
A gene is a segment of DNA that specifies how to make a protein.
Cells are different because of differential gene expression.
About 40% of human genes are expressed at one time.
Microarray devices measure gene expression.
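Microarrays report expression as signal intensities, and a common first analysis, especially for two-colour arrays, compares each gene's intensity in a sample against a reference on a log2 scale. This is a minimal sketch, assuming background-corrected intensities are already available as dicts; the gene names, intensities, and the 2-fold cut-off (threshold 1.0 on the log2 scale) are hypothetical, and real analyses also normalize the arrays and test for statistical significance.

```python
import math

def log2_ratios(sample, reference):
    """Per-gene log2(sample / reference); positive means higher in the sample."""
    return {gene: math.log2(sample[gene] / reference[gene]) for gene in sample}

def differential_genes(ratios, threshold=1.0):
    """Genes changed at least 2**threshold-fold, up or down."""
    return sorted(g for g, r in ratios.items() if abs(r) >= threshold)

# Hypothetical background-corrected intensities for three genes.
sample = {"GENE_A": 800.0, "GENE_B": 210.0, "GENE_C": 50.0}
reference = {"GENE_A": 200.0, "GENE_B": 200.0, "GENE_C": 200.0}

ratios = log2_ratios(sample, reference)
print(ratios["GENE_A"])            # 2.0
print(differential_genes(ratios))  # ['GENE_A', 'GENE_C']
```

The log scale makes up- and down-regulation symmetric: a 4-fold increase (GENE_A) gives +2 and a 4-fold decrease (GENE_C) gives -2.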