8/6/2019 Chemo Metrics Application 070511
Data Treatment
Chemometrics: defined as the application of mathematical, statistical, graphical or symbolic methods to maximize the chemical information that can be extracted from the data
Selection of Suitable Microwave Digestion Method
PCA (Principal Component Analysis)
SIMCA (Soft Independent Modeling of Class Analogies)
PROMETHEE (Preference Ranking Organization METHod for Enrichment Evaluation)
GAIA (Geometrical Analysis for Interactive Aid)
Fuzzy Clustering
(From Kokot, et al., 1992. Anal. Chim. Acta, 259, 267-279)
Principal Component Analysis (PCA)
A summarization and data reduction technique
Examines the interrelationships among a large number of variables and then attempts to explain them in terms of their common underlying dimensions, referred to as components
PCA
Based on the derivation of linear combinations of the original variables to produce principal components, characterized by scores and loadings
$PC_{jk} = a_{j1}x_{k1} + a_{j2}x_{k2} + \dots + a_{jn}x_{kn}$
where $PC_{jk}$ = the score for object k on component j, $a_{ji}$ = the loading of variable i on component j, $x_{ki}$ = the measured value of variable i on object k, and n = the total number of original variables
SCORES: projections of objects onto a particular component
LOADINGS: reflect the contribution of each variable to a particular component
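The score equation above can be illustrated with a short Python sketch. It assumes mean-centred data and extracts only the first component, using power iteration on the covariance matrix as one simple way to obtain the dominant eigenvector (statistical packages normally use a full eigen- or singular-value decomposition). The data set and all function names are hypothetical.

```python
import math
import random

def mean_center(data):
    """Subtract each variable's (column's) mean from every object (row)."""
    n = len(data)
    means = [sum(row[i] for row in data) / n for i in range(len(data[0]))]
    return [[value - mean for value, mean in zip(row, means)] for row in data]

def covariance(centered):
    """Sample covariance matrix (variables x variables) of mean-centred data."""
    n, m = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(m)]
            for i in range(m)]

def first_loadings(cov, iterations=500, seed=0):
    """Power iteration: the dominant eigenvector of the covariance matrix
    gives the unit-length loadings a_1i of the first principal component."""
    rng = random.Random(seed)
    vec = [rng.random() + 0.1 for _ in cov]   # arbitrary positive start vector
    for _ in range(iterations):
        new = [sum(c * v for c, v in zip(row, vec)) for row in cov]
        norm = math.sqrt(sum(x * x for x in new))
        vec = [x / norm for x in new]
    return vec

def scores(centered, loadings):
    """PC_1k = a_11*x_k1 + a_12*x_k2 + ... + a_1n*x_kn for every object k."""
    return [sum(a * x for a, x in zip(loadings, row)) for row in centered]

# Hypothetical data: 4 objects (rows) x 2 variables (columns)
data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
centered = mean_center(data)
loadings = first_loadings(covariance(centered))
pc1 = scores(centered, loadings)
print(loadings, pc1)
```

Because the second variable here is exactly twice the first, the loadings come out proportional to (1, 2), i.e. both variables contribute to PC1 with the second weighted twice as heavily. The sign of an eigenvector is arbitrary, so other software may report the same component with all signs flipped.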
PCA
BIPLOT: displays scaled scores and loadings in a PC plane
1st component: accounts for the largest amount of variation
Subsequent components: account for decreasing amounts of data variance
From Kokot, et al., 1992. Anal. Chim. Acta, 259, 267-279
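The ordering of components by variance can be checked numerically: each component's share of the total variance is its covariance-matrix eigenvalue divided by the sum of all eigenvalues. The sketch below assumes a two-variable case, so the eigenvalues have a closed form; the covariance matrix is hypothetical.

```python
import math

def eigenvalues_2x2(cov):
    """Eigenvalues of a symmetric 2 x 2 covariance matrix, largest first."""
    (a, b), (_, d) = cov
    half_trace = (a + d) / 2
    # max() guards against a tiny negative value from rounding when eigenvalues coincide
    root = math.sqrt(max(0.0, half_trace ** 2 - (a * d - b * b)))
    return [half_trace + root, half_trace - root]

def explained_variance(eigenvalues):
    """Fraction of the total variance carried by each component, largest first."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    return [lam / total for lam in ordered]

cov = [[2.0, 1.0], [1.0, 2.0]]          # hypothetical covariance matrix
fractions = explained_variance(eigenvalues_2x2(cov))
print(fractions)   # [0.75, 0.25]
```

With this matrix the first component carries 75% of the variance and the second the remaining 25%, matching the "largest first, then decreasing" pattern described above.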
Extracted Information:
The objects (methods of digestion) appear to cluster in at least two groups based on the six metal variables (Cu, Pb, Ni, Cr, Co, and Zn).
Group I: methods 4Cb, 7Ab, and HPb. No hydrofluoric acid (HF) in the acid mixtures.
Group II: methods that contain HF in their acid digest.
The presence of HF plays a major role in the discrimination of methods into groups.
An atypical method of digestion, 8Ab, appeared either as an outlier or as a single-member group.
The metals Cr and Pb are the two most discriminating variables.
From Kokot, et al., 1992. Anal. Chim. Acta, 259, 267-279
Soft Independent Modeling of Class Analogies (SIMCA)
Uses PCA to model the shape and position of the object formed by the samples in row space for class definition
The shape of a class depends on the number of components used in the model
To predict the classification of future samples, it is necessary to determine what region of measurement space each sample occupies
SIMCA
Procedure:
1. Compute the residual standard deviation (RSD) for the class as a whole (mean distance between the objects of a class and the class model).
2. Compute the RSD for each object (orthogonal distance between the object and the class model).
3. Compute the F value from the computed residuals. If Fcal < Fcrit, the unknown sample is a member of the class.
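A minimal Python sketch of the three steps follows. It assumes the class model is already known (class mean plus A orthonormal loading vectors) and uses one common SIMCA convention for the degrees of freedom, s0² = Σe² / ((N − A − 1)(M − A)) for the class and s_i² = Σe² / (M − A) for a single object, with N training objects and M variables; the source does not state which convention Kokot et al. used. F_crit must be taken from an F table (about 6.94 for 2 and 4 degrees of freedom at the 95% level). The data and function names are hypothetical.

```python
import math

def residuals(x, mean, loadings):
    """Orthogonal residual vector of object x from a PCA class model
    (class mean plus A orthonormal loading vectors)."""
    centred = [xi - mi for xi, mi in zip(x, mean)]
    resid = list(centred)
    for p in loadings:
        t = sum(c * pi for c, pi in zip(centred, p))      # score on this component
        resid = [r - t * pi for r, pi in zip(resid, p)]   # remove that component
    return resid

def class_rsd(training, mean, loadings):
    """Step 1: residual standard deviation s0 of the class as a whole."""
    n, m, a = len(training), len(mean), len(loadings)
    ss = sum(e * e for x in training for e in residuals(x, mean, loadings))
    return math.sqrt(ss / ((n - a - 1) * (m - a)))

def object_rsd(x, mean, loadings):
    """Step 2: residual standard deviation s_i of one object."""
    m, a = len(mean), len(loadings)
    return math.sqrt(sum(e * e for e in residuals(x, mean, loadings)) / (m - a))

def simca_member(x, training, mean, loadings, f_crit):
    """Step 3: F = (s_i / s0)^2; the object is a class member if F < F_crit."""
    f_cal = (object_rsd(x, mean, loadings) / class_rsd(training, mean, loadings)) ** 2
    return f_cal, f_cal < f_crit

# Hypothetical class: 4 objects, 3 variables, spread along the first axis
training = [[-2.0, 0.1, -0.1], [-1.0, -0.1, 0.1], [1.0, 0.1, 0.1], [2.0, -0.1, -0.1]]
mean = [0.0, 0.0, 0.0]
loadings = [[1.0, 0.0, 0.0]]     # one-component class model (A = 1)
F_CRIT = 6.94                    # tabulated F(0.95; 2, 4), supplied by the user
print(simca_member([0.5, 0.05, -0.05], training, mean, loadings, F_CRIT))
print(simca_member([0.0, 1.0, 1.0], training, mean, loadings, F_CRIT))
```

The first unknown lies close to the class model (F ≈ 0.125, accepted); the second sits far off the model axis (F ≈ 50, rejected).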
Extracted Information: Only method 4C and probably 11A could be part of the training sets consisting of digestion methods with HF in their acid mixtures.
This means that methods 4C and 11A could perform comparably to the digestion methods with HF included in their acid mixtures, based on the defined variables.
From Kokot, et al., 1992. Anal. Chim. Acta, 259, 267-279
Preference Ranking Organization METHod for Enrichment Evaluation (PROMETHEE) and Geometrical Analysis for Interactive Aid (GAIA)
PROMETHEE
Designed to rank a number of actions (objects) in the context of constraints present in or imposed on the data
Ranking is performed according to a set of user-supplied preference conditions, which are applied to the criteria (variables)
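To make the ranking idea concrete, here is a sketch of how PROMETHEE aggregates pairwise preferences into positive (φ⁺) and negative (φ⁻) outranking flows. It assumes the simplest "usual criterion" preference function (full preference for any positive difference) and equal criterion weights; the preference conditions actually used by Kokot et al. are not reproduced here, and the table and function names are hypothetical.

```python
def usual_preference(d):
    """'Usual criterion': full preference (1) for any positive difference, else 0."""
    return 1.0 if d > 0 else 0.0

def outranking_flows(table, maximize, weights=None):
    """Positive and negative outranking flows for each action (row of table).

    table    -- actions x criteria values
    maximize -- per criterion, True if larger values are preferred
    weights  -- criterion weights summing to 1 (equal weights if omitted)
    """
    n, k = len(table), len(table[0])
    w = weights if weights is not None else [1.0 / k] * k

    def pi(a, b):
        """Aggregated preference index of action a over action b."""
        total = 0.0
        for j in range(k):
            d = table[a][j] - table[b][j]
            total += w[j] * usual_preference(d if maximize[j] else -d)
        return total

    plus = [sum(pi(a, b) for b in range(n) if b != a) / (n - 1) for a in range(n)]
    minus = [sum(pi(b, a) for b in range(n) if b != a) / (n - 1) for a in range(n)]
    return plus, minus

# Hypothetical: 3 actions, criterion 1 to maximize, criterion 2 to minimize
table = [[3.0, 1.0], [2.0, 2.0], [1.0, 3.0]]
plus, minus = outranking_flows(table, maximize=[True, False])
print(plus, minus)   # [1.0, 0.5, 0.0] [0.0, 0.5, 1.0]
```

φ⁺ measures how strongly an action outranks all others and φ⁻ how strongly it is outranked; the first action, best on both criteria, gets φ⁺ = 1 and φ⁻ = 0.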
Metal content (µg g⁻¹)
Method      Cu           Pb         Co           Zn
2B          103          155        12.6         441
4A          99.5         166        15           432
4B          91.4         159        16           433
6A          103          145        13.7         432
8B          98           159        13           421
8C          99           164        13           435
NBS 2704    98.6 ± 5.0   161 ± 17   14.0 ± 0.6   438 ± 12
Rows are the objects (methods of digestion); columns are the variables (criteria), to which the preference conditions are applied.
Interpretation of flow chart
Actions (methods of digestion) that are comparable are joined by one or more arrows.
Any comparable action to the left of another is preferred.
Any actions that are incomparable remain unconnected.
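PROMETHEE I builds the flow chart from both outranking flows: action a is placed above action b only if it is at least as good on both φ⁺ and φ⁻ (and not tied on both); if the two flows disagree, the pair is left unconnected as incomparable. The sketch below follows that standard definition; the flow values and function names are hypothetical, chosen so that A and B are incomparable.

```python
def compare(a, b, plus, minus):
    """PROMETHEE I relation between actions a and b (indices into the flow lists)."""
    if plus[a] == plus[b] and minus[a] == minus[b]:
        return "indifferent"
    if plus[a] >= plus[b] and minus[a] <= minus[b]:
        return "a preferred"
    if plus[b] >= plus[a] and minus[b] <= minus[a]:
        return "b preferred"
    return "incomparable"

def arrows(labels, plus, minus):
    """Edges (better, worse) of the PROMETHEE I flow chart; incomparable pairs get none."""
    return [(labels[i], labels[j])
            for i in range(len(labels)) for j in range(len(labels))
            if i != j and compare(i, j, plus, minus) == "a preferred"]

labels = ["A", "B", "C"]            # hypothetical actions
plus = [0.625, 0.5, 0.375]          # positive outranking flows
minus = [0.5625, 0.3125, 0.625]     # negative outranking flows
print(compare(0, 1, plus, minus))   # incomparable
print(arrows(labels, plus, minus))  # [('A', 'C'), ('B', 'C')]
```

A has the larger φ⁺ but also the larger φ⁻, so A and B stay unconnected, while both are joined to C by an arrow. This mirrors the situation where two methods outrank the rest but cannot be compared with each other.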
Extracted Information:
For the BSR set of data (denoted by a b in the label, e.g. 7Ab), methods 12b and 4Bb outranked the others, but they could not be compared with each other because each method performed differently on the six metal variables.
For the NBS 2704 data (denoted by labels without a b, e.g. 7A), the performance of method 8C is comparable to that of methods 4A and 8B. However, 8C is located to the left of 4A and 8B; thus the former method is preferred to the latter two.
From Kokot, et al., 1992. Anal. Chim. Acta, 259, 267-279
PROMETHEE II
Applied to eliminate the indecisive results (incomparable actions) and to produce a simple ranking scale
Computes the net outranking flow value: the difference between the associated positive and negative outranking flows
Results are less reliable than those of PROMETHEE I
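Computing the net flow is a single subtraction per action, and sorting by it gives the complete PROMETHEE II ranking. The hypothetical flows below are chosen so that A and B would be incomparable under PROMETHEE I (A has the larger φ⁺ but also the larger φ⁻); the net flow forces an order between them. Collapsing two flows into one number in this way is also why the result is considered less reliable. Function names are illustrative.

```python
def net_flows(plus, minus):
    """Net outranking flow phi(a) = phi+(a) - phi-(a) for every action."""
    return [p - m for p, m in zip(plus, minus)]

def complete_ranking(labels, plus, minus):
    """PROMETHEE II: sort actions by decreasing net flow (no incomparabilities)."""
    phi = net_flows(plus, minus)
    return sorted(zip(labels, phi), key=lambda pair: pair[1], reverse=True)

labels = ["A", "B", "C"]            # hypothetical actions
plus = [0.625, 0.5, 0.375]          # positive outranking flows
minus = [0.5625, 0.3125, 0.625]     # negative outranking flows
print(complete_ranking(labels, plus, minus))
# [('B', 0.1875), ('A', 0.0625), ('C', -0.25)]
```

Net flows always sum to zero over all actions, so a method with a "considerably higher" net flow stands out clearly from the rest.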
Extracted Information:
The net flow values for the two most preferred BSR methods are very similar and are well above the values for the next methods, 11Bb, 6Ab, and 8Ab.
For NBS 2704, method 8C had a considerably higher value than each of the next two methods.
HF in the acid mixtures (methods 4Bb, 12b, and 11Bb) plays a major role in the digestion of the BSR sample.
HCl in the acid mixtures (methods 8C, 8B, and 4A) determines the efficiency of digestion of the NBS 2704 sample.
PROMETHEE II ranking for complete NBS 2704, BSR and polished combined data. (From Kokot, et al., 1992. Anal. Chim. Acta, 259, 267-279)
GAIA: method for investigating the PROMETHEE results
Net outranking flows are decomposed into a form suitable for PCA
Biplot facilitates the interpretation of the significance of the criteria
From Kokot, et al., 1992. Anal. Chim. Acta, 259, 267-279
Extracted Information:
The GAIA biplot shows the discrimination of the methods of digestion according to the acid digest composition (PC2) and the origin of the rock/sediment sample (PC1).
The diagram is very similar to the exploratory PCA; however, the cluster separation appears to be sharper.
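The decomposition that GAIA relies on can be sketched as follows: for each criterion j, a single-criterion net flow φ_j(a) is computed, and the overall net flow is their weighted sum, φ(a) = Σ_j w_j φ_j(a). The resulting actions × criteria matrix is what PCA is applied to in GAIA; that PCA step is not repeated here. The sketch assumes the "usual criterion" preference function, equal weights and hypothetical data.

```python
def unicriterion_net_flows(table, maximize):
    """phi_j(a) = 1/(n-1) * sum over b of [P_j(a, b) - P_j(b, a)], usual criterion."""
    n, k = len(table), len(table[0])

    def pref(a, b, j):
        d = table[a][j] - table[b][j]
        if not maximize[j]:
            d = -d
        return 1.0 if d > 0 else 0.0

    return [[sum(pref(a, b, j) - pref(b, a, j) for b in range(n) if b != a) / (n - 1)
             for j in range(k)]
            for a in range(n)]

table = [[3.0, 1.0], [2.0, 2.0], [1.0, 3.0]]   # hypothetical actions x criteria
maximize = [True, False]
weights = [0.5, 0.5]
phi_j = unicriterion_net_flows(table, maximize)        # matrix that GAIA projects by PCA
phi = [sum(w * f for w, f in zip(weights, row)) for row in phi_j]
print(phi_j)   # [[1.0, 1.0], [0.0, 0.0], [-1.0, -1.0]]
print(phi)     # [1.0, 0.0, -1.0]
```

Each column of `phi_j` shows how one criterion alone ranks the actions. Projecting the actions and the criterion axes onto the first two principal components of this matrix gives the GAIA plane.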
FUZZY CLUSTERING
Attempts to assign a degree of class membership for a given object over several classes
Classification is performed with the aid of a membership function
m(x) = 1 c /x a/ p
where a and c are constants, p is a positive or constructed with reference to the data
Sum of the membership values for each object is 1
Main advantage: facilitates the distinction between objects that clearly belong to one cluster (membership value of 1 or close to 1) and those that are members of several clusters (membership value of 1/(no. of clusters)).
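As an illustration of membership values that sum to 1, the sketch below uses the fuzzy c-means membership formula, u_i = 1 / Σ_j (d_i / d_j)^(2/(m − 1)), for fixed cluster centres. This is a swapped-in, widely used fuzzy clustering formulation, not necessarily the membership function used by Kokot et al.; the centres, the fuzzifier m = 2 and the objects are hypothetical.

```python
import math

def memberships(obj, centres, fuzzifier=2.0):
    """Fuzzy c-means membership of one object in each cluster; values sum to 1."""
    dists = [math.dist(obj, c) for c in centres]
    if 0.0 in dists:                       # object lies exactly on a cluster centre
        hits = dists.count(0.0)
        return [1.0 / hits if d == 0.0 else 0.0 for d in dists]
    power = 2.0 / (fuzzifier - 1.0)
    return [1.0 / sum((d_i / d_j) ** power for d_j in dists) for d_i in dists]

centres = [[0.0, 0.0], [10.0, 0.0]]       # hypothetical cluster centres
print(memberships([1.0, 0.0], centres))   # about [0.988, 0.012]: clear member of cluster 1
print(memberships([5.0, 0.0], centres))   # [0.5, 0.5] = 1/(no. of clusters)
```

An object halfway between two centres receives 1/(no. of clusters) in each, which is exactly the "member of several clusters" case described above.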
Fuzzy Clustering
From Kokot, et al., 1992. Anal. Chim. Acta, 259, 267-279
Extracted Information:
Results are in good agreement with the other chemometrics procedures.
For NBS 2704, methods 2B, 4C, 7A, and 11A are the important class members of one cluster. This cluster is characterized by the exclusion of HF from the acid mixtures.
The second cluster is composed of methods 4A, 4B, 8B, and 8C. These methods have HF in their acid digest.
In the case of BSR, a 3-cluster model is more appropriate for analysis. The use of a 2-cluster model is heavily influenced by the atypical method 8Ab.
Some methods are in intermediate positions, e.g. 2A. They are classified as members of two clusters.
Selection of Suitable Digestion Method
It was shown from the previous examples that all the chemometrics procedures provide consistent information about outliers, groupings and trends.
However, only the multicriteria decision-making method PROMETHEE provides the rank-order information that helps in the selection of a suitable microwave digestion method.
SIMCA and fuzzy clustering (FC) are the most preferred methods for the purposes of sample classification.
Chemometrics
Extraction of Latent Information
Ordination Diagram
Used in the determination of carrier substances for trace metals in sediments
Involves simple correlation analysis on the set of major and trace elements
The positive correlation coefficient matrices obtained are graphically pictured in a 2-dimensional diagram
Interpretation of Ordination Diagram
The proximity of two variables on the diagram is a measure of their statistical dependence.
If a trace element is significantly correlated to one or several major elements, it is possible that the mineral phase containing the major elements is its carrier.
The validity of this hypothesis should be verified by a chemical speciation analysis.
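The correlation step behind an ordination diagram can be sketched in Python: compute Pearson coefficients between a trace element and each major element, and keep the strong positive ones as candidate carriers. The fixed threshold used here is an assumption standing in for a proper significance test, and the element concentrations and function names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two concentration series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def candidate_carriers(trace, majors, threshold):
    """Major elements whose positive correlation with the trace element reaches the
    threshold; their mineral phase is only a candidate carrier (verify by speciation)."""
    result = {}
    for name, values in majors.items():
        r = pearson(trace, values)
        if r >= threshold:
            result[name] = r
    return result

# Hypothetical concentrations in five sediment samples
trace_metal = [1.0, 2.0, 3.0, 4.0, 5.0]
majors = {"Fe": [2.0, 4.0, 6.0, 8.0, 10.0],
          "Ca": [5.0, 4.0, 3.0, 2.0, 1.0],
          "Al": [1.0, 3.0, 2.0, 5.0, 4.0]}
print(candidate_carriers(trace_metal, majors, threshold=0.9))
```

Only Fe passes here (r = 1); Al (r = 0.8) falls below the assumed threshold and Ca is negatively correlated, so it is not a candidate under the positive-correlation rule.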
Ordination Diagram (From Jaquet, et al. 1982. Hydrobiologia, 91, 139-146.)
Three major carrier substances in the whole study area:
1. Organic matter: Cd, Pb, Ag, Cu and Hg
2. Phosphates: Zn and Sn
3. Silicates: Ni, V, Co, Be, Cr and Zn
Figure: ordination diagrams for the contaminated, dolomitic silicate, mixed silicate, mixed carbonate and autochthonous carbonate facies
Ordination Diagram
Figure: ordination diagrams for the autochthonous carbonate, mixed carbonate, mixed silicate, dolomitic silicate and contaminated facies
Organic matter does not act as a carrier for any metal in facies 1 (autochthonous carbonate), unlike in most of the other facies. This exceptional behaviour has been attributed to the fact that organic carbon in facies 1 is mostly autochthonous, whereas in the other facies, particularly facies 7, allochthonous, anthropogenic organic matter predominates.
(From Jaquet, et al. 1982. Hydrobiologia, 91, 139-146.)
Linear Regression Analysis
Develops linear equations from collected experimental data to make predictions about the values of a dependent variable based on the values of one or more independent variables.
Simple Linear Regression: one independent variable is used to predict the value of the dependent variable
Eqn.: $Y = a + bX$
where Y = dependent variable
a = constant; intercept
b = slope; regression coefficient
X = independent variable
Multiple Linear Regression: more than one independent variable is used to predict the criterion (dependent variable).
Eqn.: $Y = a + b_1X_1 + b_2X_2 + \dots + b_nX_n$
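Both equations can be fitted by ordinary least squares. The sketch below solves the normal equations (XᵀX)β = Xᵀy with Gaussian elimination, so the simple case is just the multiple case with a single X. It is a minimal illustration on hypothetical, noise-free data; for real work a numerical library (e.g. a QR-based least-squares solver) is preferable.

```python
def fit_linear(xs, y):
    """Least-squares estimates [a, b1, ..., bn] for Y = a + b1*X1 + ... + bn*Xn.

    xs -- one list of independent-variable values per observation
    y  -- the observed dependent-variable values
    """
    rows = [[1.0] + list(x) for x in xs]        # leading 1 carries the intercept a
    p = len(rows[0])
    # Normal equations: (X^T X) beta = X^T y
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    rhs = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    # Gaussian elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda i: abs(A[i][col]))
        A[col], A[pivot] = A[pivot], A[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        for i in range(col + 1, p):
            factor = A[i][col] / A[col][col]
            for j in range(col, p):
                A[i][j] -= factor * A[col][j]
            rhs[i] -= factor * rhs[col]
    # Back substitution
    beta = [0.0] * p
    for i in reversed(range(p)):
        beta[i] = (rhs[i] - sum(A[i][j] * beta[j] for j in range(i + 1, p))) / A[i][i]
    return beta

# Simple linear regression (hypothetical data on Y = 3 + 2X)
print(fit_linear([[1], [2], [3], [4]], [5, 7, 9, 11]))
# Multiple linear regression (hypothetical data on Y = 1 + 2X1 + 3X2)
print(fit_linear([[1, 0], [0, 1], [1, 1], [2, 1], [0, 2]], [3, 4, 6, 8, 7]))
```

The first call recovers a ≈ 3 and b ≈ 2; the second recovers a ≈ 1, b1 ≈ 2 and b2 ≈ 3.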
Application: Normalization Procedures
Metal:grain size normalization
Metal:reference metal normalization
Multi-element normalization
Concept: If the concentration of the metal is related to changing sediment particle size, the concentration will change in a constant relation to grain size or its proxy
(Loring and Rantala, 1992. Earth Science Reviews, 235-283)
Why do we use normalization procedures?
To reduce or eliminate grain size effects on chemical data
Identification of anomalous metal concentrations in sediments
Determination of factors that control the trace metal distribution in sediments (multiple regression)
Example: Metal:Reference Metal Normalization
(From Windom, et al., 1989. Environ. Sci. Technol., 23, 314-320)
Al was used as a proxy for the granular variations of the aluminosilicate fractions
Concentrations of metals covary with Al, except for Cd
Data points outside the 95% confidence band were considered contaminated
The slopes of these regression equations can be compared to the metal to aluminum ratios computed for average continental rocks and for average continental soils
Figure: metal vs. Al regressions with 95% confidence band
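A sketch of the metal:Al normalization idea: regress the metal on Al and flag samples that fall outside a 95% band around the line. It assumes the band is the usual two-sided prediction interval for individual observations (the source does not state the exact band formula used by Windom et al.), takes the t value as an input (2.306 for 8 degrees of freedom), and uses hypothetical data with one deliberately enriched sample.

```python
import math

def flag_outside_band(al, metal, t_crit):
    """Fit metal = a + b*Al by least squares; return (a, b, indices of samples outside
    the prediction band y_hat +/- t_crit * s * sqrt(1 + 1/n + (x - x_mean)^2 / Sxx))."""
    n = len(al)
    mean_x, mean_y = sum(al) / n, sum(metal) / n
    sxx = sum((x - mean_x) ** 2 for x in al)
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(al, metal)) / sxx
    a = mean_y - b * mean_x
    s = math.sqrt(sum((y - a - b * x) ** 2 for x, y in zip(al, metal)) / (n - 2))
    flagged = []
    for i, (x, y) in enumerate(zip(al, metal)):
        half_width = t_crit * s * math.sqrt(1 + 1 / n + (x - mean_x) ** 2 / sxx)
        if abs(y - (a + b * x)) > half_width:
            flagged.append(i)                  # candidate contaminated sample
    return a, b, flagged

al = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]          # hypothetical Al
metal = [2.0, 4.0, 6.0, 8.0, 30.0, 12.0, 14.0, 16.0, 18.0, 20.0]   # hypothetical metal
a, b, flagged = flag_outside_band(al, metal, t_crit=2.306)         # t(0.975, n - 2 = 8)
print(round(b, 3), flagged)   # 1.879 [4]
```

Only the enriched fifth sample is flagged. Note that it also pulls the fitted slope below the "natural" value of 2; refitting without flagged samples before comparing the slope with crustal metal/Al ratios is one possible refinement, not something the source prescribes.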
Hierarchical Cluster Analysis (HCA)
Seeks to minimize within-group variance and maximize between-group variance, and represents that information in the form of a two-dimensional plot called a dendrogram
Result is a number of groups that differ from one another (heterogeneous) but have homogeneous contents
Classifies objects or variables into several mutually exclusive groups based on the similarity of the characteristics they possess
Develops hypotheses about the nature of the data or examines previously stated hypotheses
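A minimal agglomerative (bottom-up) clustering sketch: start with every object in its own cluster and repeatedly merge the two closest clusters, recording the merge distances that a dendrogram draws. It assumes Euclidean distance and average linkage; other linkage rules (e.g. Ward's method, which directly targets within-group variance) and, when clustering variables such as elements, correlation-based distances are common alternatives. The points and function name are hypothetical.

```python
import math

def average_linkage(points):
    """Agglomerative clustering with average linkage.

    Returns the merge history as (cluster_1, cluster_2, distance) tuples,
    i.e. the information a dendrogram plots.
    """
    clusters = [[i] for i in range(len(points))]
    history = []

    def distance(c1, c2):
        # mean Euclidean distance over all cross-cluster pairs of objects
        return (sum(math.dist(points[i], points[j]) for i in c1 for j in c2)
                / (len(c1) * len(c2)))

    while len(clusters) > 1:
        d, i, j = min((distance(clusters[i], clusters[j]), i, j)
                      for i in range(len(clusters))
                      for j in range(i + 1, len(clusters)))
        history.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return history

# Hypothetical one-variable objects forming two obvious groups
for step in average_linkage([[0.0], [0.1], [5.0], [5.2]]):
    print(step)
```

The two tight pairs merge first at small distances and join each other only at the last, much larger step; cutting the tree below that step yields two homogeneous groups.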
HCA
Figure: dendrogram (Clusters I to IV)
Cluster I: The group of Fe, Mn, Zn, Pb and Li. The presence of naturally occurring Li, Fe and Mn in this group suggests that the other elements (Zn and Pb) may also be of similar (natural) origin, or that they may have been distributed evenly in the coastal sediments by tidal activity. The oxides of Fe and Mn probably play an important role in their distribution.
Cluster II: The group of organic carbon and Cu indicates the role of organic matter in the distribution of Cu.
(From Angelidis and Aloupi, 2000. Mar. Poll. Bull., 77-82)
Cluster III: The group of Al, Cr, and Ni. The fact that Cr and Ni appear in the same group as the naturally derived Al suggests that weathering of natural rocks may play an important role in the distribution of those metals in the sediments of the study area.
Cluster IV: The group of Cd. Cadmium forms a group of its own, which indicates that the metal has a different distribution process compared to the other metals.
Figure: dendrogram (Clusters III and IV)
(From Angelidis and Aloupi, 2000. Mar. Poll. Bull., 77-82)
Principal Component Analysis (PCA)
A summarization and data reduction technique
Examines the interrelationships among a large number of variables and then attempts to explain them in terms of their common underlying dimensions, referred to as components
Provides a visual display of the data that is often more enlightening than comparison of only one or two variables at a time
Used to delimit areas of the most contaminated sediments and the relative importance of the major anthropogenic metal inputs
Spatial distribution of metals is explained by two PCs, which account for 77.9% of the variance.
Identified three end members:
the clean Buzzards Bay sediments
the less contaminated outer harbor sediments
the contaminated inner harbor sediments
The first PC has separated the clean Buzzards Bay samples from the contaminated samples in New Bedford Harbor.
The second PC has further separated the samples from New Bedford Harbor based on the types of metals present in the sediments.
(From Shine, et al., 1995. Environ. Sci. Technol., 29, 1781-1788)
Figure: score plot (PC A)
Co, Mn and Ni define the clean Buzzards Bay sediments
Zn and Pb define the outer portion of New Bedford Harbor
Cu, Cd and Cr define the contaminated inner portion of New Bedford Harbor
Each of these three clusters of metals has loadings positioned similarly to the corresponding geographical cluster in the score plot.
(From Shine, et al., 1995. Environ. Sci. Technol., 29, 1781-1788)
Figure: loading plot (PC A)