Classification Trees and Regression Trees

What Are Classification Trees and Regression Trees?

Classification trees and regression trees predict responses to data. To predict a response, follow the decisions in the tree from the root (beginning) node down to a leaf node. The leaf node contains the response. Classification trees give responses that are nominal, such as 'true' or 'false'. Regression trees give numeric responses.

Statistics Toolbox trees are binary. Each step in a prediction involves checking the value of one predictor (variable). For example, here is a simple classification tree:

This tree predicts classifications based on two predictors, x1 and x2. To predict, start at the top node, represented by a triangle. The first decision is whether x1 is smaller than 0.5. If so, follow the left branch, and see that the tree classifies the data as type 0.

If, however, x1 exceeds 0.5, then follow the right branch to the lower-right triangle node. Here the tree asks if x2 is smaller than 0.5. If so, then follow the left branch to see that the tree classifies the data as type 0. If not, then follow the right branch to see that the tree classifies the data as type 1.
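The walk from root to leaf described above can be sketched in a few lines of base MATLAB. This is only an illustration: the `nodes` matrix is a hypothetical encoding of the example tree invented for this sketch, not the way a Statistics Toolbox tree object stores its splits.

```matlab
% Hypothetical encoding of the example tree (not a toolbox object).
% Columns: [predictor index, cut point, left child, right child, class].
% Leaf rows have predictor index 0 and carry the class label.
nodes = [1 0.5 2 3 NaN;    % node 1: is x1 < 0.5 ?
         0 NaN 0 0 0;      % node 2: leaf, type 0
         2 0.5 4 5 NaN;    % node 3: is x2 < 0.5 ?
         0 NaN 0 0 0;      % node 4: leaf, type 0
         0 NaN 0 0 1];     % node 5: leaf, type 1

x = [0.8 0.9];             % one observation: x1 = 0.8, x2 = 0.9
k = 1;                     % start at the root node
while nodes(k,1) > 0       % descend until a leaf is reached
    if x(nodes(k,1)) < nodes(k,2)
        k = nodes(k,3);    % condition true: follow the left branch
    else
        k = nodes(k,4);    % condition false: follow the right branch
    end
end
label = nodes(k,5)         % class stored at the leaf
```

Because x1 is not smaller than 0.5 and x2 is not smaller than 0.5, the observation follows the right branch twice and ends at the leaf of type 1.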

To learn how to prepare your data for classification or regression using decision trees, see Steps in Supervised Learning (Machine Learning).

Example: Creating a Classification Tree

To create a classification tree for the ionosphere data:

load ionosphere % contains X and Y variables
ctree = ClassificationTree.fit(X,Y)

ctree =

  ClassificationTree:
           PredictorNames: {1x34 cell}
    CategoricalPredictors: []
             ResponseName: 'Y'
               ClassNames: {'b' 'g'}
           ScoreTransform: 'none'
            NObservations: 351

Example: Creating a Regression Tree

To create a regression tree for the carsmall data, based on the Horsepower and Weight vectors as predictor data and the MPG vector as the response:

load carsmall % contains Horsepower, Weight, MPG
X = [Horsepower Weight];
rtree = RegressionTree.fit(X,MPG)

rtree =

  RegressionTree:
           PredictorNames: {'x1' 'x2'}
    CategoricalPredictors: []
             ResponseName: 'Y'
        ResponseTransform: 'none'
            NObservations: 94

Viewing a Tree

There are two ways to view a tree:

• view(tree) returns a text description of the tree.
• view(tree,'mode','graph') returns a graphic description of the tree.

Example: Creating a Classification Tree has the following two views:

load fisheriris
ctree = ClassificationTree.fit(meas,species);
view(ctree)

Decision tree for classification
1  if x3<2.45 then node 2 elseif x3>=2.45 then node 3 else setosa
2  class = setosa
3  if x4<1.75 then node 4 elseif x4>=1.75 then node 5 else versicolor
4  if x3<4.95 then node 6 elseif x3>=4.95 then node 7 else versicolor
5  class = virginica
6  if x4<1.65 then node 8 elseif x4>=1.65 then node 9 else versicolor
7  class = virginica
8  class = versicolor
9  class = virginica

view(ctree,'mode','graph')

Similarly, Example: Creating a Regression Tree has the following two views:

load carsmall % contains Horsepower, Weight, MPG
X = [Horsepower Weight];
rtree = RegressionTree.fit(X,MPG,'MinParent',30);
view(rtree)

Decision tree for regression
1  if x2<3085.5 then node 2 elseif x2>=3085.5 then node 3 else 23.7181
2  if x1<89 then node 4 elseif x1>=89 then node 5 else 28.7931
3  if x1<115 then node 6 elseif x1>=115 then node 7 else 15.5417
4  if x2<2162 then node 8 elseif x2>=2162 then node 9 else 30.9375
5  fit = 24.0882
6  fit = 19.625
7  fit = 14.375
8  fit = 33.3056
9  fit = 29

view(rtree,'mode','graph')

How the Fit Methods Create Trees

The ClassificationTree.fit and RegressionTree.fit methods perform the following steps to create decision trees:

1. Start with all input data, and examine all possible binary splits on every predictor.
2. Select a split with best optimization criterion.

• If the split leads to a child node having too few observations (less than the MinLeaf parameter), select a split with the best optimization criterion subject to the MinLeaf constraint.

3. Impose the split.
4. Repeat recursively for the two child nodes.

The explanation requires two more items: a description of the optimization criterion, and the stopping rule.

Stopping rule: Stop splitting when any of the following hold:

• The node is pure.

o For classification, a node is pure if it contains only observations of one class.

o For regression, a node is pure if the mean squared error (MSE) for the observed response in this node drops below the MSE for the observed response in the entire data multiplied by the tolerance on quadratic error per node (qetoler parameter).

• There are fewer than MinParent observations in this node.
• Any split imposed on this node would produce children with fewer than MinLeaf observations.
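Steps 1 and 2 above can be illustrated for a single continuous predictor and a numeric response. The following is a minimal sketch in base MATLAB, not the toolbox implementation; the data, the variable names, and the `minLeaf` value are made up for the illustration. It tries a cut point halfway between each pair of adjacent unique values and keeps the cut with the smallest mean squared error that respects the leaf-size constraint.

```matlab
x = [1 2 3 10 11 12]';             % one continuous predictor (toy data)
y = [5 6 5 20 21 19]';             % numeric response
minLeaf = 1;                       % smallest allowed child size (assumed value)

u = unique(x);                     % sorted unique predictor values
cuts = (u(1:end-1) + u(2:end))/2;  % candidate cuts halfway between neighbors
bestCut = NaN;
bestErr = Inf;
for c = cuts'                      % loop over the candidate cut points
    inLeft = x < c;
    if sum(inLeft) < minLeaf || sum(~inLeft) < minLeaf
        continue                   % split would violate the leaf-size constraint
    end
    % squared error around each child's mean, averaged over all observations
    err = (sum((y(inLeft)  - mean(y(inLeft))).^2) + ...
           sum((y(~inLeft) - mean(y(~inLeft))).^2)) / numel(y);
    if err < bestErr
        bestErr = err;
        bestCut = c;
    end
end
bestCut, bestErr
```

A full tree builder would repeat this search over every predictor, impose the winning split, and recurse on the two children until one of the stopping rules applies.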


Optimization criterion:

• Regression: mean-squared error (MSE). Choose a split to minimize the MSE of predictions compared to the training data.

• Classification: One of three measures, depending on the setting of the SplitCriterion name-value pair:

o 'gdi' (Gini's diversity index, the default)

o 'twoing'

o 'deviance'

For details, see ClassificationTree Definitions.
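As a rough illustration of the default classification criterion, Gini's diversity index of a node is commonly defined as one minus the sum of squared class fractions at that node. The sketch below computes it in base MATLAB for a made-up label vector; it shows the standard definition only and is not taken from the toolbox source.

```matlab
labels = [1 1 1 2 2 3];                 % hypothetical class labels at one node
classes = unique(labels);
p = zeros(size(classes));
for i = 1:numel(classes)
    p(i) = mean(labels == classes(i));  % fraction of the node in class i
end
gdi = 1 - sum(p.^2)                     % 0 for a pure node, larger when mixed

pureNode = [2 2 2 2];                   % a node with a single class
gdiPure = 1 - sum(mean(pureNode == 2).^2)
```

A pure node has an index of 0, which matches the stopping rule for classification: there is nothing left to gain by splitting it.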

For a continuous predictor, a tree can split halfway between any two adjacent unique values found for this predictor. For a categorical predictor with L levels, a classification tree needs to consider $2^{L-1} - 1$ splits. To obtain this formula, observe that you can assign L distinct values to the left and right nodes in $2^L$ ways. Two out of these $2^L$ configurations would leave either the left or the right node empty, and therefore should be discarded. Now divide by 2 because left and right can be swapped. A classification tree can thus process only categorical predictors with a moderate number of levels. A regression tree employs a computational shortcut: it sorts the levels by the observed mean response, and considers only the $L - 1$ splits between the sorted levels.
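The counting argument can be checked by brute force. The sketch below (base MATLAB, with an example value of L chosen for illustration) enumerates every assignment of L levels to the left or right child, skips the two assignments that leave one side empty, and counts each mirror-image pair once by requiring that the first level go left.

```matlab
L = 4;                            % number of category levels (example value)
nSplits = 0;
for mask = 1:(2^L - 2)            % 0 and 2^L-1 would leave one child empty
    left = bitget(mask, 1:L);     % left(i) == 1 means level i goes left
    if left(1) == 1               % keep one of each left/right mirror pair
        nSplits = nSplits + 1;
    end
end
formula = 2^(L-1) - 1;            % count a classification tree must consider
shortcut = L - 1;                 % count after sorting levels by mean response
[nSplits formula shortcut]
```

The gap between the two counts grows quickly: with 20 levels the exhaustive search covers 524287 splits, while the sorted shortcut covers 19.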

Predicting Responses With Classification Trees and Regression Trees

After creating a tree, you can easily predict responses for new data. Suppose Xnew is new data that has the same number of columns as the original data X. To predict the classification or regression based on the tree and the new data, enter

Ynew = predict(tree,Xnew);

For each row of data in Xnew, predict runs through the decisions in tree and gives the resulting prediction in the corresponding element of Ynew. For more information for classification, see the classification predict reference page; for regression, see the regression predict reference page.

For example, to find the predicted classification of a point at the mean of the ionosphere data:

load ionosphere % contains X and Y variables
ctree = ClassificationTree.fit(X,Y);
Ynew = predict(ctree,mean(X))

Ynew =
    'g'

To find the predicted MPG of a point at the mean of the carsmall data:

load carsmall % contains Horsepower, Weight, MPG
X = [Horsepower Weight];
rtree = RegressionTree.fit(X,MPG);
Ynew = predict(rtree,mean(X))

Ynew =
   28.7931

Improving Classification Trees and Regression Trees

You can tune trees by setting name-value pairs in ClassificationTree.fit and RegressionTree.fit. The remainder of this section describes how to determine the quality of a tree, how to decide which name-value pairs to set, and how to control the size of a tree:

• Examining Resubstitution Error
• Cross Validation
• Control Depth or "Leafiness"
• Pruning

Examining Resubstitution Error

Resubstitution error is the difference between the response training data and the predictions the tree makes of the response based on the input training data. If the resubstitution error is high, you cannot expect the predictions of the tree to be good. However, having low resubstitution error does not guarantee good predictions for new data. Resubstitution error is often an overly optimistic estimate of the predictive error on new data.

Example: Resubstitution Error of a Classification Tree. Examine the resubstitution error of a default classification tree for the Fisher iris data:

load fisheriris
ctree = ClassificationTree.fit(meas,species);
resuberror = resubLoss(ctree)

resuberror =
    0.0200

The tree classifies nearly all the Fisher iris data correctly.

Cross Validation

To get a better sense of the predictive accuracy of your tree for new data, cross validate the tree. By default, cross validation splits the training data into 10 parts at random. It trains 10 new trees, each one on nine parts of the data. It then examines the predictive accuracy of each new tree on the data not included in training that tree. This method gives a good estimate of the predictive accuracy of the resulting tree, since it tests the new trees on new data.
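The partitioning idea can be sketched in base MATLAB as follows. This is an illustration of 10-fold splitting only, with made-up sizes and variable names; it is not how the toolbox's own cross-validation routine is implemented, and no tree is actually trained here.

```matlab
n = 150;                           % number of observations (example value)
k = 10;                            % number of parts (folds)
fold = mod(randperm(n), k) + 1;    % random fold label 1..k for each observation

for j = 1:k
    testIdx  = (fold == j);        % the part held out for testing
    trainIdx = ~testIdx;           % the other nine parts, used for training
    % a tree would be trained on rows trainIdx and scored on rows testIdx
end

counts = accumarray(fold(:), 1)'   % size of each part
```

Every observation is held out exactly once, so the averaged test error uses all of the data while never scoring a tree on rows it was trained on.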

Example: Cross Validating a Regression Tree. Examine the resubstitution and cross-validation accuracy of a regression tree for predicting mileage based on the carsmall data:

load carsmall
X = [Acceleration Displacement Horsepower Weight];
rtree = RegressionTree.fit(X,MPG);
resuberror = resubLoss(rtree)

resuberror =
    4.7188

The resubstitution loss for a regression tree is the mean-squared error. The resulting value indicates that a typical predictive error for the tree is about the square root of 4.7, or a bit over 2.

Now calculate the error by cross validating the tree:

cvrtree = crossval(rtree);
cvloss = kfoldLoss(cvrtree)

cvloss =
   23.4808

The cross-validated loss is almost 25, meaning a typical predictive error for the tree on new data is about 5. This demonstrates that cross-validated loss is usually higher than simple resubstitution loss.

Control Depth or "Leafiness"

When you grow a decision tree, consider its simplicity and predictive power. A deep tree with many leaves is usually highly accurate on the training data. However, the tree is not guaranteed to show a comparable accuracy on an independent test set. A leafy tree tends to overtrain, and its test accuracy is often far less than its training (resubstitution) accuracy. In contrast, a shallow tree does not attain high training accuracy. But a shallow tree can be more robust: its training accuracy could be close to that of a representative test set. Also, a shallow tree is easy to interpret.

If you do not have enough data for training and test, estimate tree accuracy by cross validation.

For an alternative method of controlling the tree depth, see Pruning.

Example: Selecting Appropriate Tree Depth. This example shows how to control the depth of a decision tree, and how to choose an appropriate depth.

1. Load the ionosphere data:

load ionosphere

2. Generate minimum leaf occupancies for classification trees from 10 to 100, spaced exponentially apart:

leafs = logspace(1,2,10);

3. Create cross validated classification trees for the ionosphere data with minimum leaf occupancies from leafs:

N = numel(leafs);
err = zeros(N,1);
for n=1:N
    t = ClassificationTree.fit(X,Y,'crossval','on',...
        'minleaf',leafs(n));
    err(n) = kfoldLoss(t);
end
plot(leafs,err);
xlabel('Min Leaf Size');
ylabel('cross-validated error');

The best leaf size is between about 20 and 50 observations per leaf.

4. Compare the near-optimal tree with at least 40 observations per leaf with the default tree, which uses 10 observations per parent node and 1 observation per leaf.

DefaultTree = ClassificationTree.fit(X,Y);
view(DefaultTree,'mode','graph')

OptimalTree = ClassificationTree.fit(X,Y,'minleaf',40);
view(OptimalTree,'mode','graph')

resubOpt = resubLoss(OptimalTree);
lossOpt = kfoldLoss(crossval(OptimalTree));
resubDefault = resubLoss(DefaultTree);
lossDefault = kfoldLoss(crossval(DefaultTree));
resubOpt,resubDefault,lossOpt,lossDefault

resubOpt =
    0.0883

resubDefault =
    0.0114

lossOpt =
    0.1054

lossDefault =
    0.1026

The near-optimal tree is much smaller and gives a much higher resubstitution error. Yet it gives similar accuracy for cross-validated data.


Pruning

Pruning optimizes tree depth (leafiness) by merging leaves on the same tree branch. Control Depth or "Leafiness" describes one method for selecting the optimal depth for a tree. Unlike in that section, you do not need to grow a new tree for every node size. Instead, grow a deep tree, and prune it to the level you choose.

Prune a tree at the command line using the prune method (classification) or prune method (regression). Alternatively, prune a tree interactively with the tree viewer:

view(tree,'mode','graph')

To prune a tree, the tree must contain a pruning sequence. By default, both ClassificationTree.fit and RegressionTree.fit calculate a pruning sequence for a tree during construction. If you construct a tree with the 'Prune' name-value pair set to 'off', or if you prune a tree to a smaller level, the tree does not contain the full pruning sequence. Generate the full pruning sequence with the prune method (classification) or prune method (regression).

Example: Pruning a Classification Tree. This example creates a classification tree for the ionosphere data, and prunes it to a good level.

1. Load the ionosphere data:

load ionosphere

2. Construct a default classification tree for the data:

tree = ClassificationTree.fit(X,Y);

3. View the tree in the interactive viewer:

view(tree,'mode','graph')


The pruned tree is the same as the near-optimal tree in Example: Selecting Appropriate Tree Depth.

10. Set 'treesize' to 'se' (default) to find the maximal pruning level for which the tree error does not exceed the error from the best level plus one standard deviation:

[~,~,~,bestlevel] = cvLoss(tree,'subtrees','all')

bestlevel =

In this case the level is the same for either setting of 'treesize'.

14. Prune the tree to use it for other purposes:

tree = prune(tree,'Level',6);
view(tree,'mode','graph')


Alternative: classregtree

The ClassificationTree and RegressionTree classes are new in MATLAB R2011a. Previously, you represented both classification trees and regression trees with a classregtree object. The new classes provide all the functionality of the classregtree class, and are more convenient when used in conjunction with Ensemble Methods.

Before the classregtree class, there were treefit, treedisp, treeval, treeprune, and treetest functions. Statistics Toolbox software maintains these only for backward compatibility.

Example: Creating Classification Trees Using classregtree

This example uses Fisher's iris data in fisheriris.mat to create a classification tree for predicting species using measurements of sepal length, sepal width, petal length, and petal width as predictors. Here, the predictors are continuous and the response is categorical.

1. Load the data and use the classregtree constructor of the classregtree class to create the classification tree:

load fisheriris

t = classregtree(meas,species,...
                 'names',{'SL' 'SW' 'PL' 'PW'})
t =
Decision tree for classification
1  if PL<2.45 then node 2 elseif PL>=2.45 then node 3 else setosa
2  class = setosa
3  if PW<1.75 then node 4 elseif PW>=1.75 then node 5 else versicolor
4  if PL<4.95 then node 6 elseif PL>=4.95 then node 7 else versicolor
5  class = virginica
6  if PW<1.65 then node 8 elseif PW>=1.65 then node 9 else versicolor
7  class = virginica
8  class = versicolor
9  class = virginica

t is a classregtree object and can be operated on with any class method.

2. Use the type method of the classregtree class to show the type of the tree:

treetype = type(t)
treetype =
classification

classregtree creates a classification tree because species is a cell array of strings, and the response is assumed to be categorical.

3. To view the tree, use the view method of the classregtree class:

view(t)

The tree predicts the response values at the circular leaf nodes based on a series of questions about the iris at the triangular branching nodes. A true answer to any question follows the branch to the left. A false follows the branch to the right.

4. The tree does not use sepal measurements for predicting species. These can go unmeasured in new data, and you can enter them as NaN values for predictions. For example, to use the tree to predict the species of an iris with petal length 4.8 and petal width 1.6, type:

predicted = t([NaN NaN 4.8 1.6])
predicted =

The object allows for functional evaluation, of the form t(X). This is a shorthand way of calling the eval method of the classregtree class. The predicted species is the left leaf node at the bottom of the tree in the previous view.

5. You can use a variety of methods of the classregtree class, such as cutvar and cuttype, to get more information about the split at node 6 that makes the final distinction between versicolor and virginica:

var6 = cutvar(t,6) % What variable determines the split?
var6 =
    'PW'

type6 = cuttype(t,6) % What type of split is it?
type6 =
    'continuous'

6. Classification trees fit the original (training) data well, but can do a poor job of classifying new values. Lower branches, especially, can be strongly affected by outliers. A simpler tree often avoids overfitting. You can use the prune method of the classregtree class to find the next largest tree from an optimal pruning sequence:

pruned = prune(t,'level',1)
pruned =
Decision tree for classification
1  if PL<2.45 then node 2 elseif PL>=2.45 then node 3 else setosa
2  class = setosa
3  if PW<1.75 then node 4 elseif PW>=1.75 then node 5 else versicolor
4  if PL<4.95 then node 6 elseif PL>=4.95 then node 7 else versicolor
5  class = virginica
6  class = versicolor
7  class = virginica

view(pruned)

To find the best classification tree, employing the techniques of resubstitution and cross validation, use the test method of the classregtree class.

Example: Creating Regression Trees Using classregtree

This example uses the data on cars in carsmall.mat to create a regression tree for predicting mileage using measurements of weight and the number of cylinders as predictors. Here, one predictor (weight) is continuous and the other (cylinders) is categorical. The response (mileage) is continuous.

1. Load the data and use the classregtree constructor of the classregtree class to create the regression tree:

load carsmall

t = classregtree([Weight, Cylinders],MPG,...
                 'cat',2,'splitmin',20,...
                 'names',{'W','C'})

t =

Decision tree for regression
 1  if W<3085.5 then node 2 elseif W>=3085.5 then node 3 else 23.7181
 2  if W<2371 then node 4 elseif W>=2371 then node 5 else 28.7931
 3  if C=8 then node 6 elseif C in {4 6} then node 7 else 15.5417
 4  if W<2162 then node 8 elseif W>=2162 then node 9 else 32.0741
 5  if C=6 then node 10 elseif C=4 then node 11 else 25.9355
 6  if W<4381 then node 12 elseif W>=4381 then node 13 else 14.2963
 7  fit = 19.2778
 8  fit = 33.3056
 9  fit = 29.6111
10  fit = 23.25
11  if W<2827.5 then node 14 elseif W>=2827.5 then node 15 else 27.2143
12  if W<3533.5 then node 16 elseif W>=3533.5 then node 17 else 14.8696
13  fit = 11
14  fit = 27.6389
15  fit = 24.6667
16  fit = 16.6
17  fit = 14.3889

t is a classregtree object and can be operated on with any of the methods of the class.

2. Use the type method of the classregtree class to show the type of the tree:

treetype = type(t)
treetype =
regression

classregtree creates a regression tree because MPG is a numerical vector, and the response is assumed to be continuous.

3. To view the tree, use the view method of the classregtree class:

view(t)

Introduction to MANOVA

The analysis of variance technique in Example: One-Way ANOVA takes a set of grouped data and determines whether the mean of a variable differs significantly among groups. Often there are multiple response variables, and you are interested in determining whether the entire set of means is different from one group to the next. There is a multivariate version of analysis of variance that can address the problem.


ANOVA with Multiple Responses

The carsmall data set has measurements on a variety of car models from the years 1970, 1976, and 1982. Suppose you are interested in whether the characteristics of the cars have changed over time.

First, load the data.

load carsmall
whos

  Name            Size      Bytes  Class
  Acceleration    100x1       800  double array
  Cylinders       100x1       800  double array
  Displacement    100x1       800  double array
  Horsepower      100x1       800  double array
  MPG             100x1       800  double array
  Model           100x36     7200  char array
  Model_Year      100x1       800  double array
  Origin          100x7      1400  char array
  Weight          100x1       800  double array

Four of these variables (Acceleration, Displacement, Horsepower, and MPG) are continuous measurements on individual car models. The variable Model_Year indicates the year in which the car was made. You can create a grouped plot matrix of these variables using the gplotmatrix function.

x = [MPG Horsepower Displacement Weight];
gplotmatrix(x,[],Model_Year,[],'+xo')

(When the second argument of gplotmatrix is empty, the function graphs the columns of the x argument against each other, and places histograms along the diagonals. The empty fourth argument produces a graph with the default colors. The fifth argument controls the symbols used to distinguish between groups.)

It appears the cars do differ from year to year. The upper right plot, for example, is a graph of MPG versus Weight. The 1982 cars appear to have higher mileage than the older cars, and they appear to weigh less on average. But as a group, are the three years significantly different from one another? The manova1 function can answer that question.

[d,p,stats] = manova1(x,Model_Year)
d =
     2
p =
  1.0e-006 *
         0
    0.1141
stats =
           W: [4x4 double]
           B: [4x4 double]
           T: [4x4 double]
         dfW: 90
         dfB: 2
         dfT: 92
      lambda: [2x1 double]
       chisq: [2x1 double]
     chisqdf: [2x1 double]
    eigenval: [4x1 double]
    eigenvec: [4x4 double]
       canon: [100x4 double]
       mdist: [100x1 double]
      gmdist: [3x3 double]

The manova1 function produces three outputs:

• The first output, d, is an estimate of the dimension of the group means. If the means were all the same, the dimension would be 0, indicating that the means are at the same point. If the means differed but fell along a line, the dimension would be 1. In the example the dimension is 2, indicating that the group means fall in a plane but not along a line. This is the largest possible dimension for the means of three groups.

• The second output, p, is a vector of p-values for a sequence of tests. The first p-value tests whether the dimension is 0, the next whether the dimension is 1, and so on. In this case both p-values are small. That's why the estimated dimension is 2.

• The third output, stats, is a structure containing several fields, described in the following section.

The Fields of the stats Structure

The W, B, and T fields are matrix analogs to the within, between, and total sums of squares in ordinary one-way analysis of variance. The next three fields are the degrees of freedom for these matrices. Fields lambda, chisq, and chisqdf are the ingredients of the test for the dimensionality of the group means. (The p-values for these tests are the second output argument of manova1.)

The next three fields are used to do a canonical analysis. Recall that in principal components analysis (Principal Component Analysis (PCA)) you look for the combination of the original variables that has the largest possible variation. In multivariate analysis of variance, you instead look for the linear combination of the original variables that has the largest separation between groups. It is the single variable that would give the most significant result in a univariate one-way analysis of variance. Having found that combination, you next look for the combination with the second highest separation, and so on.

The eigenvec field is a matrix that defines the coefficients of the linear combinations of the original variables. The eigenval field is a vector measuring the ratio of the between-group variance to the within-group variance for the corresponding linear combination. The canon field is a matrix of the canonical variable values. Each column is a linear combination of the mean-centered original variables, using coefficients from the eigenvec matrix.

A grouped scatter plot of the first two canonical variables shows more separation between groups than a grouped scatter plot of any pair of original variables. In this example it shows three clouds of points, overlapping but with distinct centers. One point in the bottom right sits apart from the others. By using the gname function, you can see that this is the 20th point.

c1 = stats.canon(:,1);
c2 = stats.canon(:,2);
gscatter(c2,c1,Model_Year,[],'oxs')
gname


attenuation in the high frequency components. If discontinuities exist in the NN interval series, either because of the presence of abnormal beats or because of gaps or extreme noise in the original ECG recording, traditional approaches require either discarding the data or guesswork to estimate the locations of missing normal beats. To eliminate the need for evenly sampled data required by Fourier or maximum entropy methods, frequency domain spectra can be calculated using the Lomb periodogram for unevenly sampled data [7, 8] (the method used in this toolkit).

Although the long term (24-hour) statistics of SDANN, SDNNIDX and ULF power can be calculated for shorter data lengths, they will become increasingly unreliable. For short-term data (less than 15 minutes in length), only the time domain measures of AVNN, SDNN, rMSSD and pNN50 and the frequency domain measures of total power, VLF power, HF power and LF/HF ratio should be used.

A number of the HRV measures are highly correlated with each other. These include SDNN, SDANN, total power and ULF power; SDNNIDX, VLF power and LF power; and rMSSD, pNN50 and HF power. The LF/HF ratio does not correlate strongly with any other HRV measures [4].

Heart rate variability (HRV) has been widely applied in basic and clinical research studies. Its clinical application is very limited at present, however. These limitations are due to lack of standardization of methodology and application to different non-comparable subsets of subjects, as well as to the confounding effects of age, gender, drugs, health status, and chronobiologic variations, among others. Furthermore, outliers due to ectopy and artifact can have major effects on computed HRV values. In elderly subjects, especially, a spuriously high value of certain measures may be due to the effects of erratic supraventricular rhythm [9] due to subtle atrial ectopy, wandering atrial pacemaker, or sinus node conduction abnormalities. Additional information on heart rate dynamics and analysis techniques, including non-linear and complexity based measures, can be found in the HRV 2006 course notes and elsewhere on PhysioNet (see for example: Detrended Fluctuation Analysis, Multiscale Entropy Analysis, and Information-Based Similarity, among others).

PhysioNet's HRV Toolkit, available here, is a rigorously validated package of open source software for HRV analysis, including visualization of NN interval time series, automated outlier removal, and calculation of the basic time- and frequency-domain HRV statistics widely used in the literature, including all of those listed in the tables below.

Several other high-quality, freely available HRV toolkits may also be of interest to researchers; links to them are provided at the end of this page.

Table 1: Commonly used time-domain measures

AVNN *     Average of all NN intervals

SDNN *     Standard deviation of all NN intervals

SDANN      Standard deviation of the averages of NN intervals in all 5-minute segments of a 24-hour recording

SDNNIDX    Mean of the standard deviations of NN intervals in all 5-minute segments of a 24-hour recording

rMSSD *    Square root of the mean of the squares of differences between adjacent NN intervals

pNN50 *    Percentage of differences between adjacent NN intervals that are greater than 50 ms; a member of the larger pNNx family [6]

* Short-term HRV statistics
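The short-term time-domain measures follow directly from their definitions in the table. The sketch below computes them in base MATLAB for a made-up series of NN intervals; it is an illustration of the definitions only, not the toolkit's statnn program. In particular, the use of std (which normalizes by N-1) and the assumption that every interval in the toy series is a valid NN interval are choices made for this sketch.

```matlab
nn = [0.80 0.82 0.90 0.84 0.78 0.85];  % NN intervals in seconds (toy values)
d  = diff(nn);                         % differences between adjacent intervals

avnn  = mean(nn);                      % AVNN: average of all NN intervals
sdnn  = std(nn);                       % SDNN: standard deviation (N-1 form)
rmssd = sqrt(mean(d.^2));              % rMSSD: root mean square of differences
pnn50 = 100 * mean(abs(d) > 0.050);    % pNN50: percent of differences > 50 ms

[avnn sdnn rmssd pnn50]
```

In this toy series four of the five adjacent differences exceed 50 ms, so pNN50 is 80 percent. Real recordings need outlier and ectopic-beat handling before these formulas are meaningful.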

Table 2: Commonly used frequency-domain measures

TOTPWR *   Total spectral power of all NN intervals up to 0.04 Hz

ULF        Total spectral power of all NN intervals up to 0.003 Hz

VLF *      Total spectral power of all NN intervals between 0.003 and 0.04 Hz

LF *       Total spectral power of all NN intervals between 0.04 and 0.15 Hz

HF *       Total spectral power of all NN intervals between 0.15 and 0.4 Hz

LF/HF *    Ratio of low to high frequency power

* Short-term HRV statistics (VLF = spectral power between 0 and 0.04 Hz.)

Selected References:

[1] Wolf MM, Varigos GA, Hunt D, Sloman JG. Sinus arrhythmia in acute myocardial infarction. Med J Aust 1978;2:52-53.

[2] Kleiger RE, Miller JP, Bigger JT, Moss AJ, and the Multicenter Post-Infarction Research Group. Decreased heart rate variability and its association with increased mortality after acute myocardial infarction. Am J Cardiol 1987;59:256-262.

[3] Task Force of the European Society of Cardiology and the North American Society of Pacing and Electrophysiology. Heart rate variability: Standards of measurement, physiological interpretation, and clinical use. Circulation 1996;93:1043-1065.

[4] Mietus JE. Time domain measures: from variance to pNNx. http://physionet.org/events/hrv-2006/mietus-1.pdf

[5] Parati G, Mancia G, Di Rienzo M, Castiglioni P, Taylor JA, Studinger P. Point-Counterpoint: Cardiovascular variability is/is not an index of autonomic control of circulation. J Appl Physiol 2006;101:676-682.

[6] Mietus JE, Peng C-K, Henry I, Goldsmith RL, Goldberger AL. The pNNx-files: Reexamining a widely-used heart rate variability measure. Heart 2002;88:378-380.

[7] Press WH, Teukolsky SA, Vetterling WT, Flannery BP. Numerical Recipes in C: The Art of Scientific Computing, 2nd ed. Cambridge Univ. Press, 1992, pp. 575-584.

[8] Moody GB. Spectral analysis of heart rate without resampling. Computers in Cardiology 1993;715-718.

[9] Stein PK, Yanez D, Domitrovich PP, Gottdiener J, Chaves P, Kronmal R, Rautaharju P. Heart rate variability is confounded by the presence of erratic sinus rhythm. Computers in Cardiology 2002;669-672.

II. Obtaining the HRV Toolkit

The HRV Toolkit: Contents

The HRV Toolkit consists of a Usages file, a Makefile, and the following programs:

plt_rrs     a shell script for plotting RR/NN interval series

get_hrv     a shell script for calculating time and frequency domain HRV statistics

Scripts above use the programs below, and others from the WFDB and plt packages

rrlist.c    C code for extracting an RR interval list from an annotation file

filt.c      C code for filtering RR/NN intervals

filtnn.c    C code for filtering RR/NN intervals

statnn.c    C code for calculating time domain statistics

pwr.c       C code for calculating power in up to 10 frequency bands from a power spectrum

seconds.c   C code for converting hh:mm:ss to seconds

hours.c     C code for converting seconds to hh:mm:ss

plt_rrs and get_hrv are the main scripts used in calculating HRV as illustrated below. These two scripts call the various C programs to accomplish their calculations. The C programs can be used separately, and their usage and various options can be found in Usages, or by running any of these programs with the -h option.

*ownloading and &nstalling the HRV Tool0it!nstalling the HRV tool+it is eas*:

1. n indo)s> install the free </g)in soft)are first- see o"r t"torial for details. </g)inin#l"des the +cc < #o!+iler and all other "tilities needed to ("ild and r"n the#o!+onents of the V ool?it on indo)s. Start a </g)in @ter!inal e!"lator

)indo) and "se it for all of the re!aining ste+s.2. $nstall the free :DB and +lt soft)are +a#?ages.

3. he V tool?it is availa(le as a tar(all of so"r#es @for all +latfor!s or as tar(alls of+re("ilt (inaries for CN8; in" @ 6 > &a# S % @ 6 >Solaris @S,A < > or

indo)s;</g)in . Do)nload the version of /o"r #hoi#e.

4. 8n+a#? the V tool?it tar(all /o" do)nloaded. @See the ,h/sioNet :AE forinfor!ation on "n+a#?ing tar(alls .

5. $f /o" do)nloaded the so"r#es> enter the so"r#e dire#tor/ @ M.src and #o!+ileand install the tool?it (/ t/+ing=

<. &aAe install

$f /o" do)nloaded the (inaries> !ove the #ontents of the M dire#tor/ into so!edire#tor/ in /o"r $@T @or add the M dire#tor/ to /o"r $@T . he (inaries re'"ire

the sa!e additional +a#?ages as the so"r#e distri("tion @ 3N8H>plt > and @onindo)s </g)in .

III. Using the HRV toolkit

User interface

The HRV toolkit does not include a graphical user interface. Its components are command line tools that must be run from a terminal window (under MS Windows, a Cygwin window) or by a shell script. (Even plt_rrs must be started from the command line or a script, although it produces graphical output.)

Input data format

Both plt_rrs (for plotting the RR/NN interval series) and get_hrv (for calculating the HRV statistics) can take as input either a PhysioBank-compatible beat annotation file or a text file containing an RR interval list. RR interval lists can be in any of four formats:

• 3 columns (T, RR, A)

• 2 columns (RR, A)

• 2 columns (T, RR)

• 1 column (RR)

where T is the time of occurrence of the beginning of the RR interval, RR is the duration of the RR interval, and A is a beat label. Normal sinus beats are labeled N.
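The four layouts can be told apart from the number of fields on a line and from whether the second field is numeric. The following Python sketch illustrates that idea; it is not how the toolkit's own programs parse their input. The function name is hypothetical, and it assumes whitespace-separated fields and non-numeric beat labels.

```python
def parse_rr_line(line: str):
    """Return (T, RR, A) for one line of an RR interval list.

    Fields that the line does not carry are returned as None.
    """
    def is_number(token: str) -> bool:
        try:
            float(token)
            return True
        except ValueError:
            return False

    fields = line.split()
    if len(fields) == 3:                      # T, RR, A
        return float(fields[0]), float(fields[1]), fields[2]
    if len(fields) == 2:
        if is_number(fields[1]):              # T, RR
            return float(fields[0]), float(fields[1]), None
        return None, float(fields[0]), fields[1]   # RR, A
    if len(fields) == 1:                      # RR only
        return None, float(fields[0]), None
    raise ValueError(f"unrecognized RR interval line: {line!r}")
```

A caller can then decide which plots are possible: without A there is no way to select NN intervals, and without T the times have to be derived from the intervals themselves.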

Although T is assumed to be expressed in seconds by default, the -I option of plt_rrs and get_hrv (see below) can be used if the RR interval list contains T values expressed as clock time (hh:mm:ss.xxx), hours (hh.xxxxxxx) or minutes (mm.xxxxx). Similarly, although RR is assumed to be expressed in seconds, intervals in milliseconds can be input by using the -m option. (See details below or in Usages.)

Beat annotation files are available for most of the PhysioBank records that include ECGs. For information about record and annotation conventions, see the PhysioNet FAQ. If you wish to study a recording for which no beat annotation file or RR interval listing is available, you may be able to create an annotation file using software from the WFDB software package. Additional information on RR intervals, heart rate, and HRV can be found at the RR/HR/HRV Howto.

Input used in this tutorial

The examples below read the ecg annotations from record chf03 of the BIDMC Congestive Heart Failure Database. To reproduce the results shown below, it is not necessary to download this annotation file, because applications that use the WFDB library (including those in the HRV Toolkit that read annotation files) can locate and read the copy from the PhysioNet web server if no local copy exists. For this to work, the path to the record from the PhysioBank archive directory must be specified within the record name, as shown in these examples (use chfdb/chf03 rather than simply chf03). Examples that illustrate input from RR interval lists assume that the input file is named chf03.rr, and that it is in the current (local) directory.

plt_rrs: plotting the RR/NN interval time series

NN interval outliers due to missed or false beat detections can seriously corrupt HRV statistics. Most frequency domain measures are especially susceptible to outliers, particularly LF and HF power, which can be in error by greater than 1000%. Most time domain measures are less affected by outliers, but still can give errors in excess of 100%. AVNN, pNN50 and ULF power are least affected, generally with errors less than 10% (see Time domain Measures: From Variance to pNNx).


plt_rrs can be used with either an annotation file or an RR interval listing (see below) and has the following options:

plt_rrs [options] -R rrfile | record annotator [start [end]]
  Plot intervals or interval heart rates
  options :
    [-P 2|4|8|16|24|32] : plot 2, 4, 8, 16, 24 or 32 hours per page
                          (default: scale page length to data length)
    [-R rrfile] : interval file : time (sec), interval
    [-N] : plot NN intervals instead of RR intervals
    [-H] : plot RR/NN interval heart rate
    [-F "filt hwin"] : filter intervals, plot filtered data
    [-f "filt hwin"] : filter intervals
    [-p] : plot points
    [-I c|h|m] : input time format: hh:mm:ss, hours, minutes
                 (default: seconds)
    [-m] : intervals in msec
    [-y "ymin ymax"] : y-axis limits
    [-o] : output postscript

By default, plt_rrs will open a plt window and display the plot on the terminal screen. To output the plot as a PostScript file, use the -o option. To print the plot to a PostScript printer, use

plt_rrs -o other-arguments | lpr

Using an annotation file, like this:

plt_rrs chfdb/chf03 ecg

will plot the entire interval series for the congestive heart failure record chfdb/chf03 using the ecg annotator. By default, plt_rrs scales the plot so that the entire interval series is plotted on one page, with a maximum of 8 lines per page. To plot a specified number of hours per page, use the -P option. For example, -P 4 will print 4 hours per page (15 minutes per line), -P 16 will print 16 hours per page (2 hours per line), etc.

To plot selected portions of the data, specify a start time and optionally an end time. For example,

plt_rrs -P 8 chfdb/chf03 ecg 00:00:00 01:00:00

plots the first hour of the interval sequence:


With the -p option, as in

plt_rrs -P 8 -p chfdb/chf03 ecg 00:00:00 01:00:00

plt_rrs plots the interval sequence as individual points:

With the -N option, as in

plt_rrs -P 8 -N chfdb/chf03 ecg 00:00:00 01:00:00

plt_rrs plots NN intervals only, omitting intervals that are not bounded by normal sinus beats at both ends:

The "NN:RR = 3823:4203 = 0.910 [380 non-NN]" in the title indicates the total number of NN intervals, the total number of RR intervals, the fraction of RR intervals that are NN intervals, and the number of non-NN intervals.
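The counting behind these title figures can be sketched as follows. This Python fragment is an illustration only, with hypothetical function names; it assumes one label per beat in time order and treats an interval as NN when the beats at both of its ends are labeled N.

```python
def count_nn_intervals(beat_labels):
    """Return (number of NN intervals, number of RR intervals)."""
    # Each RR interval is bounded by two consecutive beats.
    pairs = list(zip(beat_labels, beat_labels[1:]))
    nn_total = sum(1 for first, second in pairs
                   if first == "N" and second == "N")
    return nn_total, len(pairs)


def title_summary(nn_total, rr_total):
    """Format the counts in the style of the plot title."""
    fraction = nn_total / rr_total
    return (f"NN:RR = {nn_total}:{rr_total} = {fraction:.3f} "
            f"[{rr_total - nn_total} non-NN]")
```

With the counts quoted above, 3823 NN intervals out of 4203 RR intervals give a fraction of 0.910 and leave 380 non-NN intervals.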

To filter outliers, use the -F or -f options, as in


This procedure will plot the first hour of the filtered NN intervals sequence. Here the "-F 0.2 20 -x 0.4 2.0" specifies the filtering parameters as follows. First, any intervals less than 0.4 sec or greater than 2.0 sec are excluded. Next, using a window of 41 intervals (20 intervals on either side of the central point), the average over the window is calculated excluding the central interval. If the central interval lies outside 20% (0.2) of the window average, this interval is flagged as an outlier and excluded. Then the window is advanced to the next interval. These parameters can be adjusted as appropriate for different data sets.
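The filtering procedure just described can be sketched in Python as shown below. This is a reading of the prose, not the toolkit's filtnn code. The function name is hypothetical, and three details are assumptions: range-excluded intervals are dropped before windowing, the window is simply truncated near the ends of the series, and flagged outliers still contribute to the averages of neighbouring windows.

```python
def filter_nn(intervals, filt=0.2, hwin=20, lo=0.4, hi=2.0):
    """Split intervals (in seconds) into (kept, outliers)."""
    # Step 1: exclude intervals outside the absolute limits.
    in_range = [x for x in intervals if lo <= x <= hi]

    kept, outliers = [], []
    for i, x in enumerate(in_range):
        # Step 2: up to hwin neighbours on each side, centre excluded.
        start = max(0, i - hwin)
        neighbours = in_range[start:i] + in_range[i + 1:i + 1 + hwin]
        if not neighbours:
            kept.append(x)
            continue
        average = sum(neighbours) / len(neighbours)
        # Step 3: flag the centre if it deviates by more than filt * average.
        if abs(x - average) > filt * average:
            outliers.append(x)
        else:
            kept.append(x)
    return kept, outliers
```

Returning the outliers separately mirrors the difference between the two options: a plot can either mark the excluded intervals or leave them out.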

Using -F allows plt_rrs to plot the excluded intervals as small filled circles:

The -f option suppresses output of excluded intervals:

The "Filt NN:NN:RR = 3762:3823:4203 = 0.984:0.910 = 0.895 [61 Filtered, 380 non-NN]" in the title of the two plots above gives the total number of NN intervals remaining after filtering, the total number of NN intervals before filtering, the total number of RR intervals, the fraction of NN intervals remaining after filtering, the fraction of RR intervals that are NN intervals, the fraction of the total number of RR intervals that are NN intervals remaining after filtering, and the number of NN intervals filtered out together with the number of non-NN intervals.
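The fractions in this title follow directly from the three counts. The short Python sketch below recomputes them; the function and key names are hypothetical and chosen only for illustration.

```python
def filtered_title_figures(nn_filtered, nn_total, rr_total):
    """Derive the title fractions and counts from the three totals."""
    return {
        "kept_of_nn": nn_filtered / nn_total,   # NN remaining after filtering
        "nn_of_rr": nn_total / rr_total,        # RR intervals that are NN
        "kept_of_rr": nn_filtered / rr_total,   # filtered NN among all RR
        "filtered_out": nn_total - nn_filtered,
        "non_nn": rr_total - nn_total,
    }
```

Applied to 3762 filtered NN intervals, 3823 NN intervals and 4203 RR intervals, this gives 0.984, 0.910 and 0.895, with 61 intervals filtered out and 380 non-NN intervals.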

Filtering the NN interval data may not be necessary if there are no extreme outliers.

To plot the RR interval series of an RR interval file containing three columns of data (time in seconds, interval in seconds and beat label), use the -R option followed by the name of the file. For example, if chf03.rr contains

1.356 0.952 N
2.312 0.956 N
3.252 0.940 N
4.212 0.960 N
5.156 0.944 N
6.112 0.956 N
7.048 0.936 N
7.996 0.948 N
8.960 0.964 N
9.900 0.940 N
...

use plt_rrs -R chf03.rr with any of the above options to plot its contents.

The above RR interval list was generated using the command

rrlist ecg chfdb/chf03 -s >chf03.rr

(Notice that the record and annotator arguments appear in reverse order in rrlist commands.) The various other options available for rrlist are given in Usages.

If an RR interval listing contains only time in seconds and interval in seconds, e.g.,

1.356 0.952
2.312 0.956
3.252 0.940
4.212 0.960
5.156 0.944
6.112 0.956
7.048 0.936
7.996 0.948
8.960 0.964
9.900 0.940
...

then a command such as plt_rrs -R chf03.rr plots the interval sequence correctly, but will not be able to plot the NN interval sequence using the -N option, since the beat labels are missing.

To plot an RR interval listing containing only RR intervals in seconds and beat labels, e.g.,

0.952 N
0.956 N
0.940 N
0.960 N
0.944 N
0.956 N
0.936 N
0.948 N
0.964 N
0.940 N
...


Finally, if the RR interval listing contains only RR intervals in seconds, then plt_rrs -R chf03.rr calculates the time of each interval from the interval sequence, but as in a previous example will not be able to plot the NN interval sequence using the -N option.

If the RR intervals in any of the above formats are listed in milliseconds rather than seconds, use plt_rrs with the -m option.
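Deriving times from a bare interval sequence amounts to a running sum, with an optional scaling step when the intervals are given in milliseconds. The Python sketch below illustrates this; it is not the toolkit's code. The function name is hypothetical, and it assumes that the first interval begins at time zero and that T marks the beginning of each interval.

```python
from itertools import accumulate


def times_from_intervals(rr, in_msec=False):
    """Return a list of (T, RR) pairs in seconds from a list of intervals."""
    # Convert to seconds first when the input is in milliseconds.
    rr_sec = [x / 1000.0 for x in rr] if in_msec else list(rr)
    ends = list(accumulate(rr_sec))          # time at the end of each interval
    starts = [0.0] + ends[:-1]               # time at its beginning
    return list(zip(starts, rr_sec))
```

The in_msec flag plays the role that the -m option plays on the command line.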

get_hrv: calculating the HRV statistics

To calculate the time and frequency domain HRV statistics, use get_hrv, where the options are:

get_hrv [options] -R rrfile | record annotator [start [end]]
  Get HRV statistics :
    REC : NN/RR AVNN SDNN SDANN SDNNIDX rMSSD pNN : TOTPWR ULF VLF LF HF LF/HF
  options :
    [-R rrfile] : interval file : time (sec), interval
    [-f "filt hwin"] : filter outliers
    [-p "nndiff ..."] : nn difference for pnn (default: 50 msec)
    [-P "lo1 hi1 lo2 hi2 lo3 hi3 lo4 hi4"] : power bands
                 (default : 0 0.0033 0.0033 0.04 0.04 0.15 0.15 0.4)
    [-s] : short term stats of
           REC : NN/RR AVNN SDNN rMSSD pNN : TOTPWR VLF LF HF LF/HF
    [-I c|h|m] : input time format: hh:mm:ss, hours, minutes
                 (default: seconds)
    [-m] : intervals in msec
    [-M] : output statistics in msec rather than sec
    [-L] : output statistics on one line
    [-S] : plot HRV results on screen
  plotting options :
    [-F "filt hwin"] : filter outliers, plot filtered data
    [-y "ymin ymax"] : time series y-axis limits ("- -" for self-scaling)
    [-X maxfreq] : fft maximum frequency (default : 0.4 Hz)
    [-Y fftmax] : fft maximum ("-" for self-scaling)
    [-o] : output plot in postscript

get_hrv uses statnn to calculate the time domain statistics, and lomb (from the WFDB software package) and pwr to calculate the frequency domain statistics. NN/RR is the fraction of total RR intervals that are classified as normal-to-normal (NN) intervals and included in the calculation of HRV statistics. This ratio can be used as a measure of data reliability. For example, if the NN/RR ratio is less than 0.8, fewer than 80% of the RR intervals are classified as NN intervals, and the results will be somewhat unreliable.
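Summing a power spectrum over frequency bands, the step that pwr performs, can be illustrated as follows. This Python sketch is not the pwr program. The names are hypothetical, the band edges are the defaults listed for the -P option, and treating each band as half-open (lower edge included, upper edge excluded) is an assumption.

```python
# Default band edges in Hz, in the order ULF, VLF, LF, HF.
DEFAULT_BANDS = [(0.0, 0.0033), (0.0033, 0.04), (0.04, 0.15), (0.15, 0.4)]


def band_powers(freqs, power, bands=DEFAULT_BANDS):
    """Sum the spectral power falling inside each (low, high) band."""
    totals = []
    for low, high in bands:
        totals.append(sum(p for f, p in zip(freqs, power)
                          if low <= f < high))
    return totals
```

The LF/HF ratio reported by get_hrv is then the third total divided by the fourth.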

The command

get_hrv -f "0.2 20 -x 0.4 2.0" -p "20 50" chfdb/chf03 ecg

should give the following output:

chfdb/chf03 :
NN/RR = 0.942899
AVNN = 0.892206
SDNN = 0.054712
SDANN = 0.0466773
SDNNIDX = 0.0241124
rMSSD = 0.0177694
pNN20 = 0.06869
pNN50 = 0.0252389
TOTPWR = 0.0033882
ULF PWR = 0.00270487
VLF PWR = 0.0003901
LF PWR = 0.000124104
HF PWR = 0.000169027
LF/HF = 0.734226

In this case AVNN, SDNN, SDANN, SDNNIDX and rMSSD are given in seconds, pNN values as ratios and PWR values in seconds squared. Note that the exact values obtained on different platforms may vary by approximately 10^-6 due to round off errors.
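Several of the time domain statistics can be computed from a list of NN intervals with the conventional definitions. The Python sketch below is not the statnn program. The function name is hypothetical, the use of the sample standard deviation (divisor n - 1) is an assumption, and every pair of consecutive list entries is treated as a pair of adjacent NN intervals.

```python
import math


def time_domain_stats(nn_sec, pnn_thresholds_msec=(20, 50)):
    """Return AVNN, SDNN, rMSSD (seconds) and pNNx (ratios)."""
    mean = sum(nn_sec) / len(nn_sec)
    variance = sum((x - mean) ** 2 for x in nn_sec) / (len(nn_sec) - 1)
    # Differences between successive NN intervals.
    diffs = [b - a for a, b in zip(nn_sec, nn_sec[1:])]
    stats = {
        "AVNN": mean,
        "SDNN": math.sqrt(variance),
        "rMSSD": math.sqrt(sum(d * d for d in diffs) / len(diffs)),
    }
    for threshold in pnn_thresholds_msec:
        exceeding = sum(1 for d in diffs if abs(d) * 1000.0 > threshold)
        stats[f"pNN{threshold}"] = exceeding / len(diffs)
    return stats
```

SDANN and SDNNIDX are left out here because they depend on how the series is divided into segments.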

By default, get_hrv outputs the HRV statistics in seconds. To output results in milliseconds, use the -M option, as in

get_hrv -M -f "0.2 20 -x 0.4 2.0" -p "20 50" chfdb/chf03 ecg

This should give the following output:

chfdb/chf03 :
NN/RR = 0.942899
AVNN = 892.206
SDNN = 54.712
SDANN = 46.6773
SDNNIDX = 24.1124
rMSSD = 17.7694
pNN20 = 6.869
pNN50 = 2.52389
TOTPWR = 3388.65
ULF PWR = 2705.17
VLF PWR = 390.25
LF PWR = 124.156
HF PWR = 169.057
LF/HF = 0.734403

In this case AVNN, SDNN, SDANN, SDNNIDX and rMSSD are given in milliseconds, pNN values as percentages and PWR values in milliseconds squared.
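The relationship between the two sets of units can be written out explicitly. The Python sketch below converts statistics expressed in seconds into the millisecond-based units; the function and constant names are hypothetical, and the statistics are assumed to be held in a dictionary keyed by the names used in the output.

```python
TIME_STATS = {"AVNN", "SDNN", "SDANN", "SDNNIDX", "rMSSD"}


def seconds_to_msec_units(stats):
    """Convert statistics from second-based to millisecond-based units."""
    converted = {}
    for name, value in stats.items():
        if name in TIME_STATS:
            converted[name] = value * 1e3      # sec -> msec
        elif name.startswith("pNN"):
            converted[name] = value * 100.0    # ratio -> percentage
        elif name.endswith("PWR"):
            converted[name] = value * 1e6      # sec^2 -> msec^2
        else:
            converted[name] = value            # NN/RR and LF/HF are unitless
    return converted
```

Ratios such as NN/RR and LF/HF pass through unchanged, since they do not depend on the time unit.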


intervals of the NN interval time series: the non-normal beats as small open circles and outliers as small filled circles. For example

get_hrv -S -s -M -F "0.2 20 -x 0.4 2.0" -p "20 50" chf03 ecg 0:00:00 1:00:00

will generate the following plot:

