Ch.9 Data Analysis
Sumit Ghosh, Saurabh Vishal
Feb 24, 2016
Definition
Analysis of data is a process of inspecting, cleaning, transforming, and modeling data with the goal of highlighting useful information, suggesting conclusions, and supporting decision making.
Rotary Clinker kiln
Rotary Clinker kiln
The aim of the stoker is to keep the kiln in a "proper" state. The following quantities are involved:
Kiln revolutions (KR)
Coal worm revolutions (CWR)
Burning zone temperature (BZT)
Burning zone color (BZC)
Clinker granulation (CG)
Kiln inside color (KIC)
Attributes
The condition attributes:
a - burning zone temperature (BZT)
b - burning zone color (BZC)
c - clinker granulation (CG)
d - kiln inside color (KIC)
The decision attributes:
e - kiln revolutions (KR)
f - coal worm revolutions (CWR)
Domain of Attributes
Burning zone temperature (BZT): 1 - (1380-1420 °C), 2 - (1421-1440 °C), 3 - (1441-1480 °C), 4 - (1481-1500 °C)
Burning zone color (BZC): 1 - scarlet, 2 - dark pink, 3 - bright pink, 4 - very bright pink, 5 - rose white
Clinker granulation (CG): 1 - fines, 2 - fines with small lumps, 3 - granulation, 4 - lumps
Kiln inside color (KIC): 1 - dark streaks, 2 - indistinct dark streaks, 3 - lack of dark streaks
Kiln revolutions (KR): 1 - 0.9 rpm, 2 - 1.22 rpm
Coal worm revolutions (CWR): 1 - 0 rpm, 2 - 15 rpm, 3 - 30 rpm, 4 - 40 rpm
Stoker's observations (Table 1)
(a = BZT, b = BZC, c = CG, d = KIC, e = KR, f = CWR)
TIME  a b c d e f
1     3 2 2 2 2 4
2     3 2 2 1 2 4
3     3 2 2 1 2 4
4     2 2 2 1 1 4
5     2 2 2 2 1 4
6     2 2 2 1 1 4
7     2 2 2 1 1 4
8     2 2 2 1 1 4
9     2 2 2 2 1 4
10    2 2 2 2 1 4
11    2 2 2 2 1 4
12    3 2 2 2 2 4
13    3 2 2 2 2 4
14    3 2 2 3 2 3
15    3 2 2 3 2 3
16    3 2 2 3 2 3
17    3 3 2 3 2 3
18    3 3 2 3 2 3
19    3 3 2 3 2 3
20    4 3 2 3 2 3
21    4 3 2 3 2 3
22    4 3 2 3 2 3
23    4 3 3 3 2 3
24    4 3 3 3 2 2
25    4 3 3 3 2 2
26    4 4 3 3 2 2
27    4 4 3 3 2 2
28    4 4 3 3 2 2
29    4 4 3 3 2 2
30    4 4 3 3 2 2
31    4 4 3 2 2 2
32    4 4 3 2 2 2
33    4 3 3 2 2 2
34    4 3 3 2 2 2
35    4 3 3 2 2 2
36    4 2 3 2 2 2
37    4 2 3 2 2 2
38    3 2 2 2 2 4
39    3 2 2 2 2 4
40    3 2 2 2 2 4
41    3 3 2 2 2 4
42    3 3 2 2 2 4
43    3 3 2 2 2 4
44    3 3 2 3 2 3
45    3 3 2 3 2 3
46    4 3 2 3 2 3
47    4 3 2 3 2 3
48    4 3 2 3 2 2
49    4 3 3 3 2 2
50    4 4 3 3 2 2
51    4 4 3 2 2 2
52    4 4 3 3 2 2
After elimination of identical rows (Table 2)
U   a b c d e f
1   3 3 2 2 2 4
2   3 2 2 2 2 4
3   3 2 2 1 2 4
---------------
4   2 2 2 1 1 4
5   2 2 2 2 1 4
---------------
6   3 2 2 3 2 3
7   3 3 2 3 2 3
8   4 3 2 3 2 3
---------------
9   4 3 3 3 2 2
10  4 4 3 3 2 2
11  4 4 3 2 2 2
12  4 3 3 2 2 2
13  4 2 3 2 2 2
Removing attribute a (Table 3)
U   b c d e f
1   3 2 2 2 4
2   2 2 2 2 4
3   2 2 1 2 4
-------------
4   2 2 1 1 4
5   2 2 2 1 4
-------------
6   2 2 3 2 3
7   3 2 3 2 3
8   3 2 3 2 3
-------------
9   3 3 3 2 2
10  4 3 3 2 2
11  4 3 2 2 2
12  3 3 2 2 2
13  2 3 2 2 2
Inconsistent (Table 3)
Table 3 is inconsistent because the following pairs of decision rules are inconsistent:
(i) b2c2d1 → e2f4 (rule 3) and b2c2d1 → e1f4 (rule 4)
(ii) b2c2d2 → e2f4 (rule 2) and b2c2d2 → e1f4 (rule 5)
Removing attribute b (Table 4)
U   a c d e f
1   3 2 2 2 4
2   3 2 2 2 4
3   3 2 1 2 4
-------------
4   2 2 1 1 4
5   2 2 2 1 4
-------------
6   3 2 3 2 3
7   3 2 3 2 3
8   4 2 3 2 3
-------------
9   4 3 3 2 2
10  4 3 3 2 2
11  4 3 2 2 2
12  4 3 2 2 2
13  4 3 2 2 2
It is easily seen that all decision rules in the table are consistent; hence attribute b is superfluous.
Removing attribute c (Table 5)
U   a b d e f
1   3 3 2 2 4
2   3 2 2 2 4
3   3 2 1 2 4
-------------
4   2 2 1 1 4
5   2 2 2 1 4
-------------
6   3 2 3 2 3
7   3 3 3 2 3
8   4 3 3 2 3
-------------
9   4 3 3 2 2
10  4 4 3 2 2
11  4 4 2 2 2
12  4 3 2 2 2
13  4 2 2 2 2
Inconsistent (Table 5)
Table 5 is inconsistent because the following pair of decision rules is inconsistent:
a4b3d3 → e2f3 (rule 8) and a4b3d3 → e2f2 (rule 9)
Removing attribute d (Table 6)
U   a b c e f
1   3 3 2 2 4
2   3 2 2 2 4
3   3 2 2 2 4
-------------
4   2 2 2 1 4
5   2 2 2 1 4
-------------
6   3 2 2 2 3
7   3 3 2 2 3
8   4 3 2 2 3
-------------
9   4 3 3 2 2
10  4 4 3 2 2
11  4 4 3 2 2
12  4 3 3 2 2
13  4 2 3 2 2
Inconsistent (Table 6)
Table 6 is inconsistent because the following pairs of decision rules are inconsistent:
(i) a3b3c2 → e2f4 (rule 1) and a3b3c2 → e2f3 (rule 7)
(ii) a3b2c2 → e2f4 (rule 3) and a3b2c2 → e2f3 (rule 6)
Result
Thus, without any one of the attributes a, c or d, Table 2 becomes inconsistent, whereas without attribute b the table remains consistent. Attribute b can therefore be dropped from the table.
We recall that if there are two or more identical decision rules in a table, we should drop all but one arbitrary representative.
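The dispensability test behind this result can be sketched in Python. This is a minimal sketch, assuming Table 2 is encoded as tuples (a, b, c, d, e, f); the names `TABLE2`, `is_consistent` and `dispensable_attributes` are hypothetical. Each condition attribute is dropped in turn, and the remaining table is checked for two rules with equal condition values but different decisions.

```python
# Table 2 as (a, b, c, d, e, f): a-d are condition attributes, e-f decision attributes
TABLE2 = [
    (3, 3, 2, 2, 2, 4), (3, 2, 2, 2, 2, 4), (3, 2, 2, 1, 2, 4),
    (2, 2, 2, 1, 1, 4), (2, 2, 2, 2, 1, 4),
    (3, 2, 2, 3, 2, 3), (3, 3, 2, 3, 2, 3), (4, 3, 2, 3, 2, 3),
    (4, 3, 3, 3, 2, 2), (4, 4, 3, 3, 2, 2), (4, 4, 3, 2, 2, 2),
    (4, 3, 3, 2, 2, 2), (4, 2, 3, 2, 2, 2),
]
CONDITIONS = ("a", "b", "c", "d")
DECISION_COLUMNS = (4, 5)


def is_consistent(rows, cond_cols, dec_cols=DECISION_COLUMNS):
    """Consistent means: equal condition values always imply equal decisions."""
    decision_for = {}
    for row in rows:
        key = tuple(row[i] for i in cond_cols)
        decision = tuple(row[i] for i in dec_cols)
        if decision_for.setdefault(key, decision) != decision:
            return False  # two rules share conditions but disagree on the decision
    return True


def dispensable_attributes(rows):
    """Condition attributes whose removal keeps the table consistent."""
    result = []
    for i, name in enumerate(CONDITIONS):
        remaining = [j for j in range(len(CONDITIONS)) if j != i]
        if is_consistent(rows, remaining):
            result.append(name)
    return result


print(is_consistent(TABLE2, range(4)))  # True: the full table is consistent
print(dispensable_attributes(TABLE2))   # ['b']
```

Only single-attribute removals are tested here, as on the slides; finding every reduct in general would also require checking larger subsets of attributes.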
Checking Table 4 (Table 2 without attribute b) for duplicate rows shows that rules 1 and 2, rules 6 and 7, rules 9 and 10, and rules 11, 12 and 13 are identical.
After removing duplicate rules (Table 7)
U   a c d e f
1   3 2 2 2 4
2   3 2 1 2 4
-------------
3   2 2 1 1 4
4   2 2 2 1 4
-------------
5   3 2 3 2 3
6   4 2 3 2 3
-------------
7   4 3 3 2 2
8   4 3 2 2 2
Substitute decisions
In this decision table there are four kinds of possible decisions, specified by the following pairs of values of the decision attributes e and f: (e2, f4) → I, (e1, f4) → II, (e2, f3) → III and (e2, f2) → IV.
After substituting decisions (Table 8)
U   a c d  e,f
1   3 2 2  I
2   3 2 1  I
-------------
3   2 2 1  II
4   2 2 2  II
-------------
5   3 2 3  III
6   4 2 3  III
-------------
7   4 3 3  IV
8   4 3 2  IV
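Removing identical rules and substituting decision classes are mechanical steps. A possible Python sketch follows, assuming Table 4 is encoded as tuples (a, c, d, e, f); `TABLE4`, `CLASS_OF`, `drop_duplicate_rules` and `substitute_decisions` are illustrative names, not part of the original material.

```python
# Table 4 (Table 2 without attribute b) as (a, c, d, e, f)
TABLE4 = [
    (3, 2, 2, 2, 4), (3, 2, 2, 2, 4), (3, 2, 1, 2, 4),
    (2, 2, 1, 1, 4), (2, 2, 2, 1, 4),
    (3, 2, 3, 2, 3), (3, 2, 3, 2, 3), (4, 2, 3, 2, 3),
    (4, 3, 3, 2, 2), (4, 3, 3, 2, 2), (4, 3, 2, 2, 2),
    (4, 3, 2, 2, 2), (4, 3, 2, 2, 2),
]
# Decision classes as defined on the slide: (e, f) -> class label
CLASS_OF = {(2, 4): "I", (1, 4): "II", (2, 3): "III", (2, 2): "IV"}


def drop_duplicate_rules(rows):
    """Keep one representative (the first copy) of each identical rule, preserving order."""
    return list(dict.fromkeys(rows))


def substitute_decisions(rows):
    """Replace the (e, f) pair of each rule by its decision class."""
    return [(a, c, d, CLASS_OF[(e, f)]) for a, c, d, e, f in rows]


TABLE7 = drop_duplicate_rules(TABLE4)
TABLE8 = substitute_decisions(TABLE7)
for u, rule in enumerate(TABLE8, start=1):
    print(u, *rule)
```

`dict.fromkeys` keeps insertion order (guaranteed since Python 3.7), so the renumbering follows the order of first appearance; any other choice of representative would be equally valid.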
Removing superfluous values
Next we remove superfluous values of condition attributes from the table.
For this purpose we have to compute which attribute values are dispensable or indispensable with respect to each decision class, and find the core values and reduct values of each decision rule. That is, we keep only those attribute values which are necessary to distinguish all decision classes, i.e. which preserve the consistency of the table.
Calculating the core for rule 1
Let us compute the core values and reduct values for the first decision rule a3c2d2 → e2f4 (rule 1) in Table 8. Values a3 and d2 are indispensable in the rule, since the following pairs of rules are inconsistent:
(i) c2d2 → e2f4 (rule 1) and c2d2 → e1f4 (rule 4)
(ii) a3c2 → e2f4 (rule 1) and a3c2 → e2f3 (rule 5)
whereas the attribute value c2 is dispensable, since the decision rule a3d2 → e2f4 is consistent. Thus a3 and d2 are the core values of the decision rule a3c2d2 → e2f4.
Computing the core using Proposition 7.1
To this end we have to check whether the inclusions |c2d2| ⊆ |e2f4|, |a3d2| ⊆ |e2f4| and |a3c2| ⊆ |e2f4| are valid, where |φ| denotes the set of rules of Table 8 satisfying φ; a rule φ → ψ is true exactly when |φ| ⊆ |ψ|. Because |c2d2| = {1, 4}, |a3c2| = {1, 2, 5}, |a3d2| = {1} and |e2f4| = {1, 2}, only the decision rule a3d2 → e2f4 is true, and consequently the core values of the first decision rule are a3 and d2.
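The inclusion test can be checked mechanically. The following Python sketch assumes Table 8 is stored with its original (e, f) values and rules numbered 1 to 8; `meaning` and `is_true` are hypothetical helper names. `meaning` computes |φ| for a conjunction of attribute-value pairs, and a rule is treated as true when |φ| ⊆ |ψ|.

```python
# Table 8 with the original decision values e and f, rules numbered 1..8
TABLE8 = {
    1: {"a": 3, "c": 2, "d": 2, "e": 2, "f": 4},
    2: {"a": 3, "c": 2, "d": 1, "e": 2, "f": 4},
    3: {"a": 2, "c": 2, "d": 1, "e": 1, "f": 4},
    4: {"a": 2, "c": 2, "d": 2, "e": 1, "f": 4},
    5: {"a": 3, "c": 2, "d": 3, "e": 2, "f": 3},
    6: {"a": 4, "c": 2, "d": 3, "e": 2, "f": 3},
    7: {"a": 4, "c": 3, "d": 3, "e": 2, "f": 2},
    8: {"a": 4, "c": 3, "d": 2, "e": 2, "f": 2},
}


def meaning(formula, table=TABLE8):
    """|formula|: the set of rules satisfying every attribute-value pair in `formula`."""
    return {u for u, row in table.items()
            if all(row[attr] == value for attr, value in formula.items())}


def is_true(condition, decision, table=TABLE8):
    """A rule condition -> decision is true iff |condition| is a subset of |decision|."""
    return meaning(condition, table) <= meaning(decision, table)


decision_rule1 = {"e": 2, "f": 4}
candidates = {
    "c2d2": {"c": 2, "d": 2},
    "a3d2": {"a": 3, "d": 2},
    "a3c2": {"a": 3, "c": 2},
}
for name, condition in candidates.items():
    print(name, sorted(meaning(condition)), is_true(condition, decision_rule1))
```

This prints the three meaning sets from the slide together with the verdict; only `a3d2` yields `True`.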
Core values (Table 9)
U   a c d  e,f
1   3 - 2  I
2   3 - 1  I
-------------
3   2 - -  II
4   2 - -  II
-------------
5   - - 3  III
6   - 2 -  III
-------------
7   - 3 -  IV
8   - - -  IV
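The core values in Table 9 can be reproduced by dropping one value of a rule at a time and testing whether the shortened rule is still consistent with the rest of Table 8. A minimal Python sketch of that procedure, with Table 8 encoded as (a, c, d, class) tuples and hypothetical names (`TABLE8`, `rule_is_consistent`, `core_values`):

```python
# Table 8 as (a, c, d, decision class)
TABLE8 = [
    (3, 2, 2, "I"), (3, 2, 1, "I"),
    (2, 2, 1, "II"), (2, 2, 2, "II"),
    (3, 2, 3, "III"), (4, 2, 3, "III"),
    (4, 3, 3, "IV"), (4, 3, 2, "IV"),
]
ATTRS = ("a", "c", "d")


def rule_is_consistent(cols, values, table):
    """True if all rules agreeing with `values` on columns `cols` share one decision."""
    decisions = {row[-1] for row in table
                 if all(row[i] == v for i, v in zip(cols, values))}
    return len(decisions) == 1


def core_values(u, table=TABLE8):
    """Indispensable condition values of rule u (numbered from 1), as {attribute: value}."""
    row = table[u - 1]
    core = {}
    for i, name in enumerate(ATTRS):
        kept = [j for j in range(len(ATTRS)) if j != i]
        # The value is indispensable if dropping it makes the shortened rule inconsistent.
        if not rule_is_consistent(kept, [row[j] for j in kept], table):
            core[name] = row[i]
    return core


for u in range(1, len(TABLE8) + 1):
    core = core_values(u)
    print(u, *(core.get(name, "-") for name in ATTRS), TABLE8[u - 1][-1])
```

The printed rows match Table 9, including the empty core of rule 8.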
Reducts for I and II
It can easily be seen that in decision classes I and II the set of core values of each decision rule is also a reduct, because the rules
a3d2 → e2f4
a3d1 → e2f4
a2 → e1f4
are true.
Reducts for III and IV
For decision classes III and IV, however, the core values do not form value reducts. For example, the decision rules
d3 → e2f3 (rule 5)
d3 → e2f2 (rule 7)
are inconsistent, and so are the decision rules
c2 → e2f3 (rule 6)
c2 → e1f4 (rule 4);
hence, according to the definition, these core values do not form reducts.
Reduct values (Table 10); X marks a value removed from the rule
U    a c d  e,f
1    3 X 2  I
2    3 X 1  I
--------------
3    2 X X  II
4    2 X X  II
--------------
5    X 2 3  III
5'   3 X 3  III
6    4 2 X  III
6'   X 2 3  III
--------------
7    X 3 X  IV
8    4 3 X  IV
8'   X 3 2  IV
8''  4 X 2  IV
Minimal solution
It is easy to see that there are no superfluous decision rules in classes I and II. For decision class III we have two minimal solutions:
c2d3 → e2f3
and
a4c2 → e2f3
a3d3 → e2f3
and for class IV we have one minimal solution:
c3 → e2f2
Minimal algorithms
Hence we have the following two minimal decision algorithms:
a3d2 → e2f4
a3d1 → e2f4
a2 → e1f4
c2d3 → e2f3
c3 → e2f2
and
a3d2 → e2f4
a3d1 → e2f4
a2 → e1f4
a3d3 → e2f3
a4c2 → e2f3
c3 → e2f2
Combined forms
The combined forms of these algorithms are
a3d1 ∨ a3d2 → e2f4
a2 → e1f4
c2d3 → e2f3
c3 → e2f2
and
a3d1 ∨ a3d2 → e2f4
a2 → e1f4
a3d3 ∨ a4c2 → e2f3
c3 → e2f2
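A simple sanity check for both minimal algorithms is to apply them to every rule of Table 8: each row should trigger at least one rule, and every rule it triggers should prescribe that row's decision class. The Python sketch below is illustrative; `ALGORITHM_1`, `ALGORITHM_2`, `fires` and `reproduces_table` are hypothetical names, and classes I to IV stand for e2f4, e1f4, e2f3 and e2f2.

```python
# Table 8: condition values of each rule and its decision class
TABLE8 = [
    ({"a": 3, "c": 2, "d": 2}, "I"), ({"a": 3, "c": 2, "d": 1}, "I"),
    ({"a": 2, "c": 2, "d": 1}, "II"), ({"a": 2, "c": 2, "d": 2}, "II"),
    ({"a": 3, "c": 2, "d": 3}, "III"), ({"a": 4, "c": 2, "d": 3}, "III"),
    ({"a": 4, "c": 3, "d": 3}, "IV"), ({"a": 4, "c": 3, "d": 2}, "IV"),
]

# The two minimal algorithms as (condition, class) pairs
ALGORITHM_1 = [
    ({"a": 3, "d": 2}, "I"), ({"a": 3, "d": 1}, "I"), ({"a": 2}, "II"),
    ({"c": 2, "d": 3}, "III"), ({"c": 3}, "IV"),
]
ALGORITHM_2 = [
    ({"a": 3, "d": 2}, "I"), ({"a": 3, "d": 1}, "I"), ({"a": 2}, "II"),
    ({"a": 3, "d": 3}, "III"), ({"a": 4, "c": 2}, "III"), ({"c": 3}, "IV"),
]


def fires(condition, state):
    """A rule fires when the state matches every attribute-value pair of its condition."""
    return all(state[attr] == value for attr, value in condition.items())


def reproduces_table(algorithm, table=TABLE8):
    """True if every row fires at least one rule and all fired rules agree with the row."""
    for state, decision in table:
        fired = {cls for condition, cls in algorithm if fires(condition, state)}
        if fired != {decision}:
            return False
    return True


print(reproduces_table(ALGORITHM_1), reproduces_table(ALGORITHM_2))  # True True
```

Dropping any rule from either algorithm leaves some row uncovered, so the check then returns `False`.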
Another Approach
An example of cement kiln control (cf. Sandness (1986)), in which the actions of the stoker are based not on the kiln state but on the quality of the cement produced.
The cement quality is described by the following attributes, which are assumed to be condition attributes:
a - granularity
b - viscosity
c - color
d - pH level
Again there are two decision (action) attributes:
e - rotation speed
f - temperature
U   a b c d e f
1   2 1 1 1 1 3
2   2 1 1 0 1 3
3   2 2 1 1 1 3
4   1 1 1 0 0 3
5   1 1 1 1 0 3
6   2 1 1 2 1 2
7   2 2 1 2 1 2
8   3 2 1 2 1 2
9   3 2 2 2 1 1
10  3 3 2 2 1 1
11  3 3 2 1 1 1
12  3 2 2 1 1 1
13  3 0 2 1 1 1
Interesting Note
The table is not obtained by observing the stoker's actions, and it does not represent the stoker's knowledge.
Instead, it contains the prescription which the stoker should follow in order to produce cement of the required quality.
Dispensable Attribute
We find that attribute b is again dispensable with respect to the decision attributes, which means that viscosity is a superfluous condition that can be dropped without affecting the decision procedure.
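Renumbered decision rules
After dropping b and keeping one copy of each identical rule, the decision rules can be renumbered and the decisions replaced by the classes I (e1f3), II (e0f3), III (e1f2) and IV (e1f1):
U   a c d  e,f
1   2 1 0  I
2   2 1 1  I
-------------
3   1 1 0  II
4   1 1 1  II
-------------
5   2 1 2  III
6   3 1 2  III
-------------
7   3 2 2  IV
8   3 2 1  IV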
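The dispensability of b and the simplified table can be reproduced with a short Python sketch, assuming the cement table is encoded as (a, b, c, d, e, f) tuples. The mapping of (e, f) pairs to classes I to IV (I = e1f3, II = e0f3, III = e1f2, IV = e1f1) is inferred by matching the rows of the two tables; all names in the code are hypothetical. The sketch keeps the first copy of each identical rule, so the two rules of class I come out in the opposite order to the slide.

```python
# Cement kiln table as (a, b, c, d, e, f)
CEMENT = [
    (2, 1, 1, 1, 1, 3), (2, 1, 1, 0, 1, 3), (2, 2, 1, 1, 1, 3),
    (1, 1, 1, 0, 0, 3), (1, 1, 1, 1, 0, 3),
    (2, 1, 1, 2, 1, 2), (2, 2, 1, 2, 1, 2), (3, 2, 1, 2, 1, 2),
    (3, 2, 2, 2, 1, 1), (3, 3, 2, 2, 1, 1), (3, 3, 2, 1, 1, 1),
    (3, 2, 2, 1, 1, 1), (3, 0, 2, 1, 1, 1),
]
# Inferred class labels for the (e, f) decision pairs
CLASS_OF = {(1, 3): "I", (0, 3): "II", (1, 2): "III", (1, 1): "IV"}


def project(rows, cols):
    """Keep only the given columns of each row."""
    return [tuple(row[i] for i in cols) for row in rows]


def is_consistent(rows, cond_cols, dec_cols=(4, 5)):
    """Equal condition values must always come with equal decisions."""
    decision_for = {}
    for cond, dec in zip(project(rows, cond_cols), project(rows, dec_cols)):
        if decision_for.setdefault(cond, dec) != dec:
            return False
    return True


dispensable = [name for i, name in enumerate("abcd")
               if is_consistent(CEMENT, [j for j in range(4) if j != i])]

# Drop b, keep one copy of each identical rule, and name the decision classes.
reduced = list(dict.fromkeys(project(CEMENT, (0, 2, 3, 4, 5))))
simplified = [(a, c, d, CLASS_OF[(e, f)]) for a, c, d, e, f in reduced]

print(dispensable)  # ['b']
for rule in simplified:
    print(*rule)
```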
Computing the core values
U   a c d  e,f
1   2 - 0  I
2   2 - 1  I
-------------
3   1 - -  II
4   1 - -  II
-------------
5   - - 2  III
6   - 1 -  III
-------------
7   - 2 -  IV
8   - - -  IV
The case of inconsistent data
When a decision table is the result of observations or measurements, it may happen that the table is inconsistent, because some observed or measured data can be conflicting. This leads to a partial dependency between the decision and condition attributes. Although we are usually more interested in consistent data, inconsistent data can sometimes also be of interest.
U  a        b        c        d        e
1  normal   absent   absent   absent   absent
2  normal   absent   present  present  absent
3  subfeb.  absent   present  present  present
4  subfeb.  present  absent   absent   absent
5  subfeb.  present  absent   absent   present
6  high     absent   absent   absent   absent
7  high     present  absent   absent   absent
8  high     present  absent   absent   present
9  high     present  present  present  present
Condition and decision attributes
Condition attributes:
a - temperature
b - dry cough
c - headache
d - muscle pain
Decision attribute:
e - influenza
Decision rules 4 and 5 are inconsistent:
Rule 4: if (temperature, subfeb.) and (dry cough, present) and (muscle pain, absent) then (influenza, absent)
Rule 5: if (temperature, subfeb.) and (dry cough, present) and (muscle pain, absent) then (influenza, present)
The same holds for rules 7 and 8. The remaining 5 decision rules are true.
Dependency
Since only 5 of the 9 decision rules are true, the degree of dependency between the decision and condition attributes is 5/9. This means that the condition attributes are not sufficient to decide whether a patient has influenza or not.
However, patients covered by the consistent decision rules can still be classified as having influenza or not.
Decomposing the decision table into consistent and inconsistent parts
The inconsistent part consists of rules 4, 5, 7 and 8.
The remaining rules (1, 2, 3, 6 and 9) form the consistent part.
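The degree of dependency and the decomposition can both be computed from the positive region, i.e. the set of rules whose condition values never occur with a different decision. This Python sketch assumes the one-letter value codes N, S, H, A, P (normal, subfeb., high, absent, present); `FLU` and `positive_region` are hypothetical names.

```python
from collections import defaultdict
from fractions import Fraction

# Influenza table as (a temperature, b dry cough, c headache, d muscle pain, e influenza)
FLU = {
    1: ("N", "A", "A", "A", "A"),
    2: ("N", "A", "P", "P", "A"),
    3: ("S", "A", "P", "P", "P"),
    4: ("S", "P", "A", "A", "A"),
    5: ("S", "P", "A", "A", "P"),
    6: ("H", "A", "A", "A", "A"),
    7: ("H", "P", "A", "A", "A"),
    8: ("H", "P", "A", "A", "P"),
    9: ("H", "P", "P", "P", "P"),
}
CONDITIONS = (0, 1, 2, 3)
DECISION = 4


def positive_region(table, cond_cols=CONDITIONS, dec_col=DECISION):
    """Rules whose condition values always lead to a single decision."""
    decisions = defaultdict(set)  # condition values -> decisions seen with them
    for row in table.values():
        decisions[tuple(row[i] for i in cond_cols)].add(row[dec_col])
    return {u for u, row in table.items()
            if len(decisions[tuple(row[i] for i in cond_cols)]) == 1}


pos = positive_region(FLU)
gamma = Fraction(len(pos), len(FLU))  # degree of dependency
consistent_part = {u: FLU[u] for u in sorted(pos)}
inconsistent_part = {u: row for u, row in FLU.items() if u not in pos}
print(gamma, sorted(consistent_part), sorted(inconsistent_part))
```

This prints `5/9 [1, 2, 3, 6, 9] [4, 5, 7, 8]`, matching the dependency and the decomposition stated above.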
Consistent rules in the table
Rule 1: if (temperature, normal) and (dry cough, absent) and (muscle pain, absent) then (influenza, absent)
Rule 2: if (temperature, normal) and (dry cough, absent) and (muscle pain, present) then (influenza, absent)
Rule 3: if (temperature, subfeb.) and (dry cough, absent) and (muscle pain, present) then (influenza, present)
Rule 6: if (temperature, high) and (dry cough, absent) and (muscle pain, absent) then (influenza, absent)
Rule 9: if (temperature, high) and (dry cough, present) and (muscle pain, present) then (influenza, present)
Next, we compute the core of the condition attributes.
Shortened notation for the attribute values (N = normal, S = subfeb., H = high, A = absent, P = present)
U  a b c d e
1  N A A A A
2  N A P P A
3  S A P P P
4  S P A A A
5  S P A A P
6  H A A A A
7  H P A A A
8  H P A A P
9  H P P P P
Note that attributes c and d are equivalent, so we can drop one of them (here d):
U  a b c e
1  N A A A
2  N A P A
3  S A P P
4  S P A A
5  S P A P
6  H A A A
7  H P A A
8  H P A P
9  H P P P
Computing the core of the attributes: removing a
U  b c e
1  A A A
2  A P A
3  A P P
4  P A A
5  P A P
6  A A A
7  P A A
8  P A P
9  P P P
Rules 2 and 3 become inconsistent, which changes the set of consistent rules (the positive region) of the decision algorithm; so a is indispensable.
Removing attribute b
U  a c e
1  N A A
2  N P A
3  S P P
4  S A A
5  S A P
6  H A A
7  H A A
8  H A P
9  H P P
Rules 6 and 8 become inconsistent: rule 6 is no longer true, and the positive region (the set of consistent rules) of the decision algorithm changes; so b is indispensable.
Removing attribute c
U  a b e
1  N A A
2  N A A
3  S A P
4  S P A
5  S P P
6  H A A
7  H P A
8  H P P
9  H P P
Rules 7 and 9 become inconsistent: rule 9 is no longer true, and the positive region of the decision algorithm changes; so c is indispensable and belongs to the core.
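The three checks above amount to comparing positive regions: an attribute is indispensable if removing it changes the set of consistent rules. A minimal Python sketch, assuming the table with d already dropped is encoded as (a, b, c, e) tuples in the one-letter notation; `FLU`, `positive_region` and `core` are hypothetical names.

```python
from collections import defaultdict

# Influenza table with d dropped, as (a, b, c, e)
FLU = {
    1: ("N", "A", "A", "A"),
    2: ("N", "A", "P", "A"),
    3: ("S", "A", "P", "P"),
    4: ("S", "P", "A", "A"),
    5: ("S", "P", "A", "P"),
    6: ("H", "A", "A", "A"),
    7: ("H", "P", "A", "A"),
    8: ("H", "P", "A", "P"),
    9: ("H", "P", "P", "P"),
}
ATTRS = ("a", "b", "c")


def positive_region(table, cond_cols):
    """Rules whose values on `cond_cols` always lead to a single decision (last column)."""
    decisions = defaultdict(set)
    for row in table.values():
        decisions[tuple(row[i] for i in cond_cols)].add(row[-1])
    return {u for u, row in table.items()
            if len(decisions[tuple(row[i] for i in cond_cols)]) == 1}


full = positive_region(FLU, (0, 1, 2))
core = []
for i, name in enumerate(ATTRS):
    reduced = positive_region(FLU, [j for j in range(len(ATTRS)) if j != i])
    print(f"without {name}: positive region {sorted(reduced)}")
    if reduced != full:
        core.append(name)  # removing this attribute shrinks the positive region
print("core:", core)
```

Every attribute shrinks the positive region when removed, so the core is the whole set {a, b, c}.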
Further simplification
The set of condition attributes {a, b, c} is independent, and it forms a reduct of the condition attributes.
The decision table can be further simplified to minimal solutions by eliminating superfluous attribute values from all consistent decision rules in the table.
Core values of consistent rules
U  a b c e
1  - - - A
2  N - - A
3  S - - P
4  S P A A
5  S P A P
6  - A - A
7  H P A A
8  H P A P
9  - - P P
Finally, the reducts of the consistent rules (X marks a removed value)
U    a b c e
1    N X X A
1'   X A A A
2    N X X A
3    S A X P
3'   S X P P
4    S P A A
5    S P A P
6    H A X A
6'   X A A A
7    H P A A
8    H P A P
9    H X P P
9'   X P P P
Result table
Rule 1 has 2 reducts, rule 2 has 1 reduct, and each of the remaining consistent rules (3, 6 and 9) has 2 reducts.
Rule 1 is superfluous, since its reduct N → A coincides with the only reduct of rule 2.
Hence there are 2 × 2 × 2 = 8 minimal solutions in total.
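The reducts in the table, and the count of eight minimal solutions, can be reproduced by brute force. For each consistent rule, the sketch tries subsets of its condition values in increasing size and keeps those that still determine the rule's decision and contain no smaller such subset. It is a Python sketch with hypothetical names, using the one-letter encoding of the table with d dropped. Following the slide, rule 1 is left out of the count because its reduct aN → A is also rule 2's only reduct.

```python
from itertools import combinations
from math import prod

# Influenza table with d dropped, as (a, b, c, e)
FLU = {
    1: ("N", "A", "A", "A"), 2: ("N", "A", "P", "A"), 3: ("S", "A", "P", "P"),
    4: ("S", "P", "A", "A"), 5: ("S", "P", "A", "P"), 6: ("H", "A", "A", "A"),
    7: ("H", "P", "A", "A"), 8: ("H", "P", "A", "P"), 9: ("H", "P", "P", "P"),
}
ATTRS = ("a", "b", "c")
CONSISTENT_RULES = (1, 2, 3, 6, 9)


def is_true(cols, row, table):
    """True if every rule agreeing with `row` on `cols` has the same decision."""
    decisions = {r[-1] for r in table.values()
                 if all(r[i] == row[i] for i in cols)}
    return len(decisions) == 1


def value_reducts(u, table=FLU):
    """Minimal sets of condition values of rule u that still determine its decision."""
    row = table[u]
    found = []
    for size in range(1, len(ATTRS) + 1):
        for cols in combinations(range(len(ATTRS)), size):
            if any(set(prev) <= set(cols) for prev in found):
                continue  # contains a smaller reduct, so it is not minimal
            if is_true(cols, row, table):
                found.append(cols)
    # Write each reduct in the slides' style, e.g. "aN" or "bAcA"
    return ["".join(ATTRS[i] + row[i] for i in cols) for cols in found]


reducts = {u: value_reducts(u) for u in CONSISTENT_RULES}
for u, rs in reducts.items():
    print(u, rs)
print("minimal solutions:", prod(len(reducts[u]) for u in (3, 6, 9)))
```

Exhaustive search is cheap for three attributes; for wider tables the number of candidate subsets grows exponentially.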
Deterministic decision algorithm
aN → eA
aHbA → eA
aSbA → eP
aHcP → eP
Extended Form
Rule 1: if (temperature, normal) then (influenza, absent)
Rule 6: if (temperature, high) and (dry cough, absent) then (influenza, absent)
Rule 3: if (temperature, subfeb.) and (dry cough, absent) then (influenza, present)
Rule 9: if (temperature, high) and (headache, present) then (influenza, present)
Another form of representing the algorithm
Rule 1: if (temperature, normal) or ((temperature, high) and (dry cough, absent)) then (influenza, absent)
Rule 2: if ((temperature, subfeb.) and (dry cough, absent)) or ((temperature, high) and (headache, present)) then (influenza, present)
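The deterministic algorithm can be turned into a small classifier. This Python sketch is illustrative only: the rule encoding and the `diagnose` helper are hypothetical, and value codes follow the tabular notation (N, S, H for temperature; A, P for absent and present). Cases from the inconsistent part of the table (e.g. patients 4 and 7) trigger no rule and yield `None`.

```python
# Deterministic decision algorithm as (condition, decision) pairs:
#   aN -> eA, aHbA -> eA, aSbA -> eP, aHcP -> eP
RULES = [
    ({"a": "N"}, "A"),
    ({"a": "H", "b": "A"}, "A"),
    ({"a": "S", "b": "A"}, "P"),
    ({"a": "H", "c": "P"}, "P"),
]


def diagnose(case):
    """Return 'A' or 'P' for influenza, or None when no rule applies."""
    decisions = {dec for cond, dec in RULES
                 if all(case[attr] == value for attr, value in cond.items())}
    if len(decisions) > 1:
        raise ValueError(f"conflicting rules fired for {case}")
    return decisions.pop() if decisions else None


# Patients from the table (a temperature, b dry cough, c headache)
patients = {
    1: {"a": "N", "b": "A", "c": "A"},
    2: {"a": "N", "b": "A", "c": "P"},
    3: {"a": "S", "b": "A", "c": "P"},
    4: {"a": "S", "b": "P", "c": "A"},
    6: {"a": "H", "b": "A", "c": "A"},
    7: {"a": "H", "b": "P", "c": "A"},
    9: {"a": "H", "b": "P", "c": "P"},
}
for u, case in patients.items():
    print(u, diagnose(case))
```

For combinations that do not occur in the table, such as a = H, b = A, c = P, two rules with different decisions can fire at once; the sketch raises an error in that case rather than silently picking one.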
Thanks
Q&A Session