A Robust Approach for Dealing with Missing Values in Compositional Data Karel Hron, Matthias Templ, Peter Filzmoser ICORS’08, Antalya, 8. 9. 2008

Dec 20, 2015

A Robust Approach for Dealing with Missing Values in Compositional Data

Karel Hron, Matthias Templ, Peter Filzmoser

ICORS’08, Antalya, 8. 9. 2008

Compositional data (CoDa)

• ... D-part composition

• a composition and any positive multiple of it (x and c·x, c > 0) contain essentially the same information

• simplex – sample space of D-part compositions

• D-1 dimensionality of compositions
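
A minimal sketch of why only ratios matter, assuming the usual convention that rescaling a composition by a positive constant c leaves its information unchanged. The helper `closure` and the example values are illustrative and not taken from the slides.

```python
def closure(parts, total=1.0):
    """Rescale positive parts so that they sum to `total`."""
    s = sum(parts)
    return [total * p / s for p in parts]

x = [2.0, 3.0, 5.0]           # raw amounts of a 3-part composition
cx = [10.0 * p for p in x]    # the same composition in other units (c = 10)

closed_x = closure(x)
closed_cx = closure(cx)       # identical to closed_x up to rounding

# After closure the D parts sum to a constant, so only D-1 of them are free.
print(closed_x, closed_cx)
```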

Standard statistics and CoDa

• difficulties when applying standard statistical methods (like correlation analysis and PCA)

• the results can be completely useless

• reason: the sample space of CoDa induces a different geometrical structure (Aitchison geometry)

• solution: a family of logratio transformations from the simplex to real space (Aitchison, 1986)

• in the case of missing values in CoDa, these transformations allow for a reasonable imputation

Isometric logratio transformations

• ilr for short (Egozcue et al., 2003); the result lies in (D-1)-dimensional real space

• the transformed data are regular (non-singular), which is necessary for robust statistical methods

• isometry
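
The isometry property can be checked numerically. The sketch below assumes one common choice of ilr basis (each part against the geometric mean of the parts after it); the slides' own formula did not survive extraction, so this particular basis and the names `ilr` and `aitchison_distance` are assumptions.

```python
import math

def ilr(x):
    """ilr coordinates: part i versus the geometric mean of the parts after it."""
    logs = [math.log(v) for v in x]
    z = []
    for i in range(len(x) - 1):
        rest = logs[i + 1:]
        k = len(rest)
        z.append(math.sqrt(k / (k + 1)) * (logs[i] - sum(rest) / k))
    return z

def aitchison_distance(x, y):
    """Aitchison distance computed from all pairwise log-ratios."""
    D = len(x)
    total = 0.0
    for i in range(D):
        for j in range(D):
            total += (math.log(x[i] / x[j]) - math.log(y[i] / y[j])) ** 2
    return math.sqrt(total / (2 * D))

x = [0.1, 0.3, 0.6]
y = [0.2, 0.5, 0.3]

zx, zy = ilr(x), ilr(y)                      # D-1 = 2 real coordinates each
euclid = math.sqrt(sum((a - b) ** 2 for a, b in zip(zx, zy)))

# Isometry: Euclidean distance of the ilr coordinates equals the Aitchison distance.
print(euclid, aitchison_distance(x, y))
```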

Ilr and balances

• interpretation of ilr coordinates (balances) in the sense of original compositional parts is not possible

• reason: definition of CoDa

• solution: split the parts into separated groups and order balances

• this construction is provided using a special procedure, called sequential binary partition
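
A sequential binary partition can be written as a sign table, one row per split. The sketch below computes the corresponding balances with the standard normalised log-ratio of group geometric means. The 4-part partition, the data, and the function name `balance` are hypothetical examples, not the partition used in the talk.

```python
import math

def balance(x, signs):
    """Balance between the parts marked +1 and the parts marked -1."""
    plus = [math.log(v) for v, sign in zip(x, signs) if sign > 0]
    minus = [math.log(v) for v, sign in zip(x, signs) if sign < 0]
    r, s = len(plus), len(minus)
    # normalised log-ratio of the geometric means of the two groups
    return math.sqrt(r * s / (r + s)) * (sum(plus) / r - sum(minus) / s)

# Hypothetical SBP for 4 parts: +1 = first subgroup, -1 = second subgroup,
# 0 = part not involved in this split.
sbp = [
    [+1, -1, -1, -1],   # x1 against {x2, x3, x4}
    [0, +1, -1, -1],    # x2 against {x3, x4}
    [0, 0, +1, -1],     # x3 against x4
]

x = [0.1, 0.2, 0.3, 0.4]
balances = [balance(x, row) for row in sbp]   # D-1 = 3 ordered balances
print(balances)
```

With this particular partition the first balance carries all the relative information about x1.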

Ilr and balances

• result of a special choice of sequential binary partition (SBP)

Outliers and CoDa

1) caused by Aitchison geometry:

• it provides a measure of differences between the compositions in a natural way, respecting their relative scale property

• it distinguishes between the following two differences within compositional parts: 0.500 and 0.501 vs. 0.001 and 0.002

• consequence: the error term in the parts is not the same for values close to the barycentre as for values close to the border of the simplex
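
The two pairs above differ by the same absolute amount but by very different ratios. A small check, using the log-ratio as the relative measure (variable names are illustrative):

```python
import math

pairs = [(0.500, 0.501), (0.001, 0.002)]

absolute = [b - a for a, b in pairs]            # both about 0.001
relative = [math.log(b / a) for a, b in pairs]  # about 0.002 and about 0.693

print(absolute, relative)
```

On the relative scale the second difference (near the border of the simplex) is several hundred times larger than the first (near the centre).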

Outliers and CoDa

• solution: using ilr transformation and outlier detection (Filzmoser and Hron, 2008)

Outliers and CoDa

2) caused by definition of CoDa:

• each observed composition is a member of the corresponding equivalence class

• every two compositions from the same class have zero Aitchison distance

• low and high values of c can simultaneously cause high Euclidean distance
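
A short illustration of the last two bullets, assuming c denotes a positive scaling constant applied to a composition. The Aitchison distance is computed here via centred log-ratios; the example values and function names are not from the slides.

```python
import math

def clr(x):
    """Centred log-ratio coefficients."""
    logs = [math.log(v) for v in x]
    mean = sum(logs) / len(logs)
    return [value - mean for value in logs]

def euclidean(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def aitchison_distance(x, y):
    return euclidean(clr(x), clr(y))

x = [1.0, 2.0, 7.0]
for c in (0.01, 100.0):
    cx = [c * v for v in x]   # same equivalence class as x
    # Euclidean distance grows with the scaling, Aitchison distance stays zero
    print(c, euclidean(x, cx), aitchison_distance(x, cx))
```

A Euclidean-based outlier rule would therefore flag rescaled observations that are compositionally identical to the bulk of the data.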

Outliers and CoDa

Missing values in CoDa sets

• most statistical methods cannot be directly applied to data sets with missing information

• removing incomplete observations can cause an unacceptable loss of information

• most imputation methods rely on assumptions like missing at random (MAR) and normality of the data

• outliers could have a dramatic influence on the estimation of missing values

Missing values in CoDa sets

• with robust imputation methods the estimation of missing values is based on the majority of the data

• existing robust methods may not be suitable for compositional data (different geometry of the data and wrong identification of outliers)

=> a more effective way of dealing with CoDa for imputation, one that respects the Aitchison geometry, is needed

Robust imputation of missing values for CoDa

• we propose an iterative procedure to estimate the missing values

• initialization of the missing values: fast kNN (Aitchison distance)

• the compositional part with the highest number of missing values is chosen and the data are transformed using a proper ilr transformation – missing values from the chosen part (x1) appear in one ilr variable and do not contaminate the others
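
Why a suitably chosen ilr transformation confines x1 to a single coordinate can be seen in a small check. It assumes the ilr basis in which each part is compared with the geometric mean of the parts after it; the function name `pivot_ilr` and the numbers are illustrative.

```python
import math

def pivot_ilr(x):
    """ilr coordinates: part i versus the geometric mean of the parts after it."""
    logs = [math.log(v) for v in x]
    z = []
    for i in range(len(x) - 1):
        rest = logs[i + 1:]
        k = len(rest)
        z.append(math.sqrt(k / (k + 1)) * (logs[i] - sum(rest) / k))
    return z

x = [0.2, 0.3, 0.1, 0.4]
x_changed = [0.9] + x[1:]       # only the first part differs

z = pivot_ilr(x)
z_changed = pivot_ilr(x_changed)

# Only z1 reacts to x1; z2, ..., zD-1 depend on ratios among the other parts.
print(z, z_changed)
```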

Robust imputation of missing values for CoDa

• consequently, fast LTS regression (which can also handle large data sets) of z1 on z2, …, zD-1 is preferred, but other robust methods can also be considered

• missing values are imputed for each variable in turn (starting with the one with the highest number of missing values)

• the procedure is repeated iteratively until convergence
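
The loop structure can be sketched as follows. This is a deliberately simplified, non-robust illustration and not the authors' implementation. It handles 3-part data only, ordinary least squares stands in for LTS regression, and a column geometric mean stands in for the kNN initialization. All function names and the toy data are hypothetical.

```python
import math

def pivot_ilr(x):
    logs = [math.log(v) for v in x]
    z = []
    for i in range(len(x) - 1):
        rest = logs[i + 1:]
        k = len(rest)
        z.append(math.sqrt(k / (k + 1)) * (logs[i] - sum(rest) / k))
    return z

def fit_simple_ols(u, v):
    """Least-squares line v ~ a + b*u (non-robust stand-in for LTS)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    sxx = sum((p - mu) ** 2 for p in u)
    b = sum((p - mu) * (q - mv) for p, q in zip(u, v)) / sxx
    return mv - b * mu, b

def impute(data, max_iter=50, tol=1e-8):
    """data: rows of 3 positive parts, with None marking a missing part."""
    D = 3
    missing = [[v is None for v in row] for row in data]
    X = [list(row) for row in data]
    for j in range(D):  # crude initialization (stand-in for kNN)
        obs = [row[j] for row in data if row[j] is not None]
        gm = math.exp(sum(math.log(v) for v in obs) / len(obs))
        for r in range(len(X)):
            if missing[r][j]:
                X[r][j] = gm
    # parts that have missing values, most missing values first
    order = sorted((j for j in range(D) if any(m[j] for m in missing)),
                   key=lambda j: -sum(m[j] for m in missing))
    for _ in range(max_iter):
        change = 0.0
        for j in order:
            perm = [j] + [k for k in range(D) if k != j]  # chosen part first
            Z = [pivot_ilr([row[k] for k in perm]) for row in X]
            train = [r for r in range(len(X)) if not missing[r][j]]
            a, b = fit_simple_ols([Z[r][1] for r in train],
                                  [Z[r][0] for r in train])
            for r in range(len(X)):
                if missing[r][j]:
                    z1 = a + b * Z[r][1]  # predicted first coordinate
                    rest = [X[r][k] for k in perm[1:]]
                    gm_rest = math.exp(sum(math.log(v) for v in rest) / len(rest))
                    new = gm_rest * math.exp(z1 * math.sqrt(D / (D - 1)))
                    change += (math.log(new) - math.log(X[r][j])) ** 2
                    X[r][j] = new  # back-transform, observed parts untouched
        if change < tol:
            break
    return X

# Toy data in which z1 is exactly linear in z2; one value of x1 is held out.
rows = [[t ** 1.5, float(t), 1.0] for t in (1, 2, 3, 4, 5)]
true_value = rows[2][0]
rows[2][0] = None
result = impute(rows)
print(result[2][0], true_value)
```

Because the toy data contain no outliers and follow the regression exactly, the held-out value is recovered; with contaminated data the least-squares step is precisely what a robust fit such as LTS is meant to replace.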

Simulation study

Simulation study

References

• Aitchison, J., 1986, The statistical analysis of compositional data. Chapman and Hall, London.

• Egozcue, J.J., Pawlowsky-Glahn, V., Mateu-Figueras, G., Barceló-Vidal, C., 2003, Isometric logratio transformations for compositional data analysis. Math. Geol., vol. 35, no. 3, p. 279-300.

• Filzmoser, P., Hron, K., 2008, Outlier detection for compositional data using robust methods. Math. Geosci., vol. 40, no. 3, p. 233-248.