Trend Analysis and Risk Identification
Lenka Nováková 1, Jiří Kléma 1, Michal Jakob 1, Simon Rawles 2, Olga Štěpánková 1
1 The Gerstner Laboratory for Intelligent Decision Making and Control, Czech Technical University, Prague
2 Department of Computer Science, University of Bristol, Bristol, UK
PKDD 2003, Discovery Challenge
Outline
STULONG data, orientation towards CVD
Tools used
– SumatraTT, Statistica, Weka
Techniques used
– mainly statistical tests: ANOVA, chi-square, etc.
Exploratory analysis and subgroup discovery
– Entry table
Trend analysis
– Entry and Control tables
– three principal ways of preprocessing
– derived aggregated attributes
– univariate and multivariate analysis
STULONG Data
Four tables: Entry, Control, Letter, Death
Dependent variable: CVD
– CardioVascular Disease
– boolean attribute derived from the A2 questionnaire (Control table)
CVD = false: the patient has no coronary disease.
CVD = true: at least one of the attributes Hodn1, Hodn2, Hodn3, Hodn11, Hodn13, Hodn14 is true for the patient.
We remove patients who have only diabetes (Hodn4) or cancer (Hodn15).
Conditions labelled on the slide: positive angina pectoris, (silent) myocardial infarction, cerebrovascular accident, ischaemic heart disease
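The CVD rule on this slide can be sketched in Python as follows. This is only a minimal illustration, not the authors' preprocessing: it assumes one aggregated record per patient, stored as a dict with boolean Hodn fields, and the function name `derive_cvd` and the sample records are hypothetical. The removal rule is read here as "no CVD attribute is set, but diabetes or cancer is".

```python
# Attribute names come from the slide; the record layout is an assumption.
CVD_ATTRS = ("Hodn1", "Hodn2", "Hodn3", "Hodn11", "Hodn13", "Hodn14")
EXCLUDE_ATTRS = ("Hodn4", "Hodn15")  # diabetes, cancer


def derive_cvd(record):
    """Return True/False for the CVD label, or None if the patient is removed."""
    if any(record.get(attr, False) for attr in CVD_ATTRS):
        return True
    if any(record.get(attr, False) for attr in EXCLUDE_ATTRS):
        return None  # only diabetes or cancer: dropped from the analysis
    return False


# Invented example records (missing fields are treated as False).
patients = [
    {"id": 1, "Hodn2": True},
    {"id": 2, "Hodn4": True},
    {"id": 3},
    {"id": 4, "Hodn4": True, "Hodn13": True},
]
labels = {p["id"]: derive_cvd(p) for p in patients}
kept = {pid: cvd for pid, cvd in labels.items() if cvd is not None}
print(kept)
```

Returning `None` for removed patients keeps the exclusion step separate from the boolean label, so the filter can be applied or inspected on its own.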
ENTRY - subgroup discovery
AQ no. 6: Are there any differences in the ENTRY examination for different CVD groups?
Statistica 6.0
– module for interactive decision tree induction
– two-tailed t-test or chi-square test to assess the significance of subgroups
Dependencies are relatively weak
Interesting dependencies found
– social characteristics: derived attribute AGE_of_ENTRY
– alcohol: positive effect of beer, no effect of wine
– sugar consumption increases CVD risk
– well-known dependencies are not mentioned (smoking, BMI, cholesterol)
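To illustrate how the significance of a subgroup might be assessed with a chi-square test, here is a small Python sketch that uses only the standard library. The counts are invented for illustration and are not STULONG results. The function name `chi_square_2x2` is hypothetical, and Statistica's own procedure may differ, for instance by applying a continuity correction, which is not done here.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table
        [[a, b],
         [c, d]]
    Returns (statistic, p_value) for 1 degree of freedom."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs a non-zero total")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom the upper tail equals erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Made-up counts: rows = inside / outside the subgroup,
# columns = CVD true / CVD false.
stat, p = chi_square_2x2(30, 70, 45, 255)
print(f"chi2 = {stat:.3f}, p = {p:.4f}")
```

A small p-value would suggest that the CVD rate inside the subgroup differs from the rate outside it. For a numeric attribute such as AGE_of_ENTRY, the two-tailed t-test named on the slide would be the analogous check.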