-
A Tool for the Spatio-Temporal Screening of AirBase Datasets for Abnormal Values

Example of PM10 datasets of AT, CZ, DE, FR, ES, IT, UK and NL (2006-2007)

Oliver Kracht, Hannes I. Reuter and Michel Gerboles

2013

Report EUR 25787 EN
-
European Commission
Joint Research Centre
Institute for Environment and Sustainability

Contact information
Oliver Kracht, Michel Gerboles
Address: Joint Research Centre, Via Enrico Fermi 2749, TP 442, 21027 Ispra (VA), Italy
E-mail: [email protected]
Tel.: +39 0332 78 5652
Fax: +39 0332 78 9931
http://ies.jrc.ec.europa.eu/
http://www.jrc.ec.europa.eu/

Legal Notice
Neither the European Commission nor any person acting on behalf of the Commission is responsible for the use which might be made of this publication.

Europe Direct is a service to help you find answers to your questions about the European Union
Freephone number (*): 00 800 6 7 8 9 10 11
(*) Certain mobile telephone operators do not allow access to 00 800 numbers or these calls may be billed.

A great deal of additional information on the European Union is available on the Internet. It can be accessed through the Europa server http://europa.eu/.

JRC78437
EUR 25787 EN
ISBN 978-92-79-28286-7 (PDF)
ISSN 1831-9424 (online)
doi:10.2788/81552

Luxembourg: Publications Office of the European Union, 2013
© European Union, 2013
Reproduction is authorised provided the source is acknowledged.
Printed in 2013
-
Summary
In order to provide scientifically sound information for regulatory purposes and environmental impact assessment, long-term meso- to large-scale datasets of ambient air quality provide an important means for air pollution monitoring, evaluation and validation. However, the collection of high quality datasets with suitable spatial coverage for air pollution management and decision support poses many challenges. It is thus critical to establish expedient tools for the efficient assessment and data quality control of air pollution measurements in large-scale national and international monitoring networks.

The European Environment Agency collects, in the air quality database named AirBase, measurements of ambient air pollution at more than 6000 monitoring stations from over 30 countries. The quality of these data depends on the measurement method chosen and on the QA/QC procedures applied by each country. We present a novel methodology to automatically screen the AirBase records for internal consistency and to detect spatio-temporal outliers nested in the data.

We implemented a spatio-temporal toolset for screening abnormal values which considers both attribute values and spatial relationships. The algorithms are based on an adaptation of the "Smooth Spatial Attribute method" that was first developed for the identification of outliers in traffic sensors. The method relies on the definition of a neighbourhood for each air pollutant measurement, corresponding to a spatio-temporal domain limited in time (e.g., ± 2 days) and distance (e.g., ± 1 degree) around location x. It is assumed that within a given spatio-temporal domain, in which the attribute values of neighbours are related through the emission, transport and reaction of air pollutants, abnormal values can be detected by extreme values of their attributes compared to the attribute values of their neighbours.

The application of this method is demonstrated by a comprehensive simulation and data analysis study based on the 2006 and 2007 AirBase background station records of daily PM10 values for a selection of 8 countries (AT, CZ, DE, ES, FR, GB, IT and NL). These datasets covered a range of different country sizes and comprised between 35561 and 166436 records each. Of these, the share of data points identified as abnormal ranged between 2 % and 4.1 % of the individual country datasets. However, not all records fulfilled the selection criteria for being included in the computations. Furthermore, the set-up of the abnormal values test can also lead to some mathematical dead ends restricting the verifiability of individual records. In consequence, a certain percentage of the data records (between 9 % and 40 % of the records per individual country) had to be flagged as non-verifiable. Those data points had to be excluded from the investigation and from the screening for irregularities, for safety of the conclusions.

The implemented method can be of interest as the basis of a data quality screening system when countries report their measurements to the European Environment Agency. Beyond this, it can also provide a simple solution to investigate the accuracy of station classification in AirBase. Seen from another viewpoint, it can also be used as a tool to detect irregular air pollution emission events (e.g. the influence of fires, wind erosion events, or other accidental situations).
-
Contents
1 Introduction
2 Airbase
3 Methodology
4 Robustness, sensitivity and optimisation of the screening tool
4.1 Normality of datasets and log transformation
4.2 Optimisation of the parameters used in the abnormal value screening
4.2.1 Spatio-temporal limits of the neighbourhood
4.2.2 Test threshold for z-test
4.2.3 Limit value for including zi in the computation of θ
4.2.4 Window width for the computation of θ
4.3 Manual calculations
5 Results
Annex: Z(Sx) 2006/2007 time series and abnormal data points identification summaries
-
1 Introduction
The European Commission has worked intensively on the implementation of a harmonized programme for the monitoring of air pollutants. The harmonization programme relies on the adopted European Directives 2008/50/EC and 2004/107/EC [1,2]. These directives define limit and target values for air pollution that should not be exceeded. Exceedances of these limits may have legal consequences that trigger mitigation plans. To avoid measurement artefacts triggering such measures, the Directives endeavour to improve the quality of the measurements by defining data quality objectives (DQOs) that represent the highest allowed relative expanded uncertainty of measurements. The reference methods have been standardized by the European Committee for Standardization (CEN). These standards describe the methodology to be applied for the estimation of the measurement uncertainty. This estimation of the uncertainty of measurements is a long and tedious procedure that may require considerable experimental work.

From another perspective, it is possible to derive the uncertainty of spatially referenced measurements from the nugget effect of variogram analysis. The nugget effect represents fluctuations of the measurements on a very small scale (tending towards 0). Gerboles et al. [3] have shown the possibility to automatically derive the uncertainty of measurements of ambient air pollutants using an innovative method based on geostatistical analysis. During this study, it became clear that abnormal values influence the geostatistical calculation. Therefore, a detection module was developed in order to exclude abnormal value stations responsible for high discrepancies from the geostatistical evaluations.

When the method was presented at the meeting of the AQUILA Network of National Air Quality Reference Laboratories (Ispra, June 2010), Member States representatives and the European Environment Agency officer considered the abnormal value module a valuable tool able to supply important information. This report gives details about a consolidated screening method for the detection of abnormal values, and an example of warnings on abnormal values for the 2006-2007 time series of PM10 datasets in AirBase. This report is intended for the following stakeholders:
• Local authorities that may use the indicator to check the consistency of their stations' measurement system or classification.
• The European Environment Agency (EEA), to take into account the robustness of station outcomes when estimating trends and statistics about air pollution in Europe.
• Researchers and scientists using AirBase data, in particular modellers in charge of the validation of models against field measurements. They could use the quality indicators provided by our method to better understand differences between air pollution estimates and field measurements.
1 Directive 2004/107/EC of the European Parliament and of the Council of 15 December 2004 relating to arsenic, cadmium, mercury, nickel and polycyclic aromatic hydrocarbons in ambient air. Official Journal L 23, 26/01/2005.
2 Directive 2008/50/EC of the European Parliament and the Council of 21 May 2008 on Ambient Air Quality and Cleaner Air for Europe, Official Journal of the European Union L 152/1 of 11.6.2008.
3 M. Gerboles and H. I. Reuter, Estimation of the measurement uncertainty of ambient air pollution datasets using geostatistical analysis, EUR 24475 EN, ISBN 978-92-79-16358-6, ISSN 1018-5593, DOI 10.2788/44902, 2010.
-
Due to the envisioned group of final users, a free and extensible simulation platform was considered an important point to start from. All computer codes were created in the R environment, which is freely available under the GNU General Public License [4].
2 Airbase
The European Environment Agency (EEA) maintains a database on behalf of the participating countries throughout Europe, the EIONET network. Member States (MS) are due to report on the basis of Council Decision 97/101/EC [5], with amendments 2001/752/EC [6]. Between 2006 and 2007, over 6738 stations were in this database, each providing different components of multi-annual time series of air quality measurements starting in 1981. Geographically, the stations are spread all over Europe, with data collected in 36 different countries, including 27 European Union Member States.

The locations of the measuring stations of the EIONET network are in general clustered, due to the nature of the measuring network. About 155 parameters are reported in AirBase, ranging from the concentrations of inorganic/organic gases and particulate matter concentrations to wet and dry deposition with their speciation. In 2008, about 66 % of all values in AirBase came from four different parameters: O3 (21.2 %), NO2 (17.2 %)/NO (8.2 %), SO2 (18.8 %), carbon monoxide (9.4 %) and particulate matter (PM10 9.0 %, PM2.5 0.5 %, black smoke 1.1 %, Total Suspended Particulate 2.9 % and Pb/Cd/As/Ni 1.5 %).

The quality of the data depends on the measurement method chosen and on the QA/QC procedures applied by each country. The data in AirBase have undergone additional quality control performed during the upload of the data from the MS to the EEA's database using a specifically designed software called DEM (Data Exchange Module). The European Topic Centre on Air and Climate Change (ETC/ACC) is also involved in data quality checking.
3 Methodology
The abnormal value procedure was implemented based on existing literature. Chang-Tien Lu et al. [7] have outlined and classified several algorithms [8, 9, 10, 11, 12, 13, 14, 15, 16], as summarised in Figure 1. Two families of abnormal value detection methods can be distinguished. The first comprises the methods which calculate statistics of the distribution of the pollutant in one dimension and ignore geographical location [9, 11]. The second family, the spatial-set abnormal value detection methods, considers both attribute values and spatial relationships. From within this family we used the "Smooth Spatial Attribute method" [7], which was first developed for the identification of abnormal values in traffic sensors. This method is thought to be fit for the identification of abnormal values in a given homogeneous dataset of air quality data, which in a similar way represents a quantity measured in time and space.

Figure 1: Several methods to detect abnormal values in multi-dimensional datasets ([7])

4 R Development Core Team (2011): R: A language and environment for statistical computing. http://www.R-project.org/
5 Council Decision 97/101/EC of 27 January 1997 establishing a reciprocal exchange of information and data from networks and individual stations measuring ambient air pollution within the Member States, Official Journal L 035, 05/02/1997, P. 0014 - 0022.
6 Commission Decision 2001/752/EC of 17 October 2001 amending the Annexes to Council Decision 97/101/EC establishing a reciprocal exchange of information and data from networks and individual stations measuring ambient air pollution within the Member States.
7 Chang-Tien Lu, Dechang Chen, Yufeng Kou, "Detecting Spatial Outliers with Multiple Attributes," ictai, pp. 122, 15th IEEE International Conference on Tools with Artificial Intelligence (ICTAI'03), 2003.
8 M. Ankerst, M. Breunig, H. Kriegel and J. Sander. OPTICS: Ordering points to identify the clustering structure. In Proceedings of the 1999 ACM SIGMOD Int. Conf. on Management of Data, Philadelphia, Pennsylvania, USA, pages 49-60, 1999.
9 V. Barnett and T. Lewis. Outliers in Statistical Data. John Wiley, New York, 3rd Ed., 1994.
10 M. Breunig, H. Kriegel, R. T. Ng and J. Sander. OPTICS-OF: Identifying Local Outliers. In Proc. of PKDD '99, Prague, Czech Republic, Lecture Notes in Computer Science (LNAI 1704), pp. 262-270, Springer Verlag, 1999.
11 R. Johnson. Applied Multivariate Statistical Analysis. Prentice Hall, 1992.
12 E. Knorr and R. Ng. Algorithms for Mining Distance-Based Outliers in Large Datasets. In Proc. 24th VLDB Conference, 1998.
13 M. Kraak and F. Ormeling. Cartography: Visualization of Spatial Data. Longman, 1996.
14 F. Preparata and M. Shamos. Computational Geometry: An Introduction. Springer Verlag, 1998.
15 I. Ruts and P. Rousseeuw. Computing Depth Contours of Bivariate Point Clouds. In Computational Statistics and Data Analysis, 23:153-168, 1996.
16 D. Yu, G. Sheikholeslami and A. Zhang. FindOut: Finding Outliers in Very Large Datasets. Department of Computer Science and Engineering, State University of New York at Buffalo, Technical report 99-03, http://www.cse.buffalo.edu/tech-reports/, 1999.

The Smooth Spatial Attribute method relies on the definition of a neighbourhood for each air pollutant measurement. It corresponds to a spatio-temporal domain limited in time (± 2 days) and distance (± 1 spherical degree) around a location x. The neighbourhood is better understood by observing the diagram in Figure 2. We hypothesise that within a given spatio-temporal domain the non-spatial attribute values (air pollutants) of neighbours are related due to the distribution, transport, emission and reaction of air pollution. The idea of the method is that abnormal values are detected by the extreme value of their attribute compared to the attribute values of their neighbours. The computation cost of the method is dominated by the large number of calculations of statistical properties per neighbourhood. An important constraint of the method is the normality of the distribution of the attribute values of neighbours.

Taking into account the high spatial variability of PM10 concentrations around industrial and traffic stations, it was decided to apply the screening method for detection of possible abnormal values only to stations of background type, but for all area types (urban, suburban and rural).

In the following text, we use x to denote a spatial object whose attributes are (i) the concentration of a pollutant, and (ii) its location. Within each neighbourhood of x, several measurements xn,i of the same compound, performed at different locations and different times, are available. Equation 1 allows the computation of a weighted average of all available measurements (non-spatial attributes of xn,i) within each neighbourhood. The weights wi correspond to the inverse distance in space and time between xn,i and x. Note that our algorithm allows for a dynamic expansion of the neighbourhood extent in time, up to a maximum of five times the base extent, in case insufficient data are retrieved with the base settings.

Figure 2: Definition of a spatial-temporal neighborhood of sampling site x with an interval of selection of ± 1 spherical degree and ± 1 day

The weighting factors wi are calculated using two different methods: (A) the inverse squared normalized Euclidean distance, and (B) the inverse squared Mahalanobis distance. Both parameters characterize the distance in space and time between the central station (x) and each neighbourhood station (xn,i). The spatial attributes of x and xn,i are defined as 3-dimensional multivariate vectors:
• The spatio-temporal position of x (the central station) is defined by (x1, x2, x3), where x1 is the longitude in decimal degrees, x2 is the latitude in decimal degrees, and x3 is the time in days.
• The spatio-temporal positions of the xn,i (the neighbourhood stations) are defined by (xn,i,1, xn,i,2, xn,i,3), where xn,i,1 is the longitude in decimal degrees, xn,i,2 is the latitude in decimal degrees, and xn,i,3 is the time in days.

The normalized Euclidean distance is computed using Equation 2, where the sj are the standard deviations of the attribute values xn,i,1, xn,i,2 and xn,i,3 over the neighbourhood set of n stations (excluding the central station x, to avoid division by zero for the weight of spatio-temporal distance zero). sj² is an unbiased estimator of the population variance of the j-th attribute variable, estimated using the sample variance of independent and identically distributed observations (hence using the denominator n - 1) (Equation 3). The Mahalanobis distance is computed using Equation 4, where S is the covariance matrix of the xn,i of the whole neighbourhood set (excluding the central station x).

As a simple control step, the normalized Euclidean weighting factors should be symmetrical around the point in time of observation. This cannot necessarily be expected for the Mahalanobis distance based weighting factors, which makes them more difficult to check. One may notice that if the covariance matrix S is diagonal, the Mahalanobis distance reduces to the normalized Euclidean distance. For control purposes, we set up the code for the computation of the Mahalanobis distance in a way that it can be modified by artificially setting all non-diagonal elements of S to zero.

The weighting factors wi can finally be calculated by computing the inverse of the square of the normalized Euclidean distance or of the square of the Mahalanobis distance.
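The distance definitions above can be illustrated with a short sketch. It is written in Python for compactness (the report's own code was written in R and is not reproduced here), the station coordinates are made up, and the function names are illustrative only. It checks the control property mentioned in the text: with a diagonal covariance matrix, the Mahalanobis distance coincides with the normalized Euclidean distance.

```python
import math

def sample_std(values):
    """Unbiased sample standard deviation (denominator n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))

def normalized_euclidean(x, xi, s):
    """Euclidean distance with each axis scaled by its standard deviation."""
    return math.sqrt(sum(((a - b) / sj) ** 2 for a, b, sj in zip(xi, x, s)))

def mahalanobis_diagonal(x, xi, variances):
    """Mahalanobis distance for the special case of a diagonal covariance
    matrix, whose inverse is the diagonal of reciprocal variances."""
    return math.sqrt(sum((a - b) ** 2 / v for a, b, v in zip(xi, x, variances)))

# Hypothetical central station and neighbours: (longitude, latitude, day).
x = (9.0, 45.0, 10.0)
neighbours = [(9.4, 45.2, 9.0), (8.7, 44.6, 10.0),
              (9.9, 45.8, 11.0), (8.2, 45.1, 12.0)]

# Standard deviations per axis, computed over the neighbours only.
s = [sample_std([p[j] for p in neighbours]) for j in range(3)]

d_euclid = [normalized_euclidean(x, p, s) for p in neighbours]
d_mahal = [mahalanobis_diagonal(x, p, [sj ** 2 for sj in s]) for p in neighbours]

# Inverse squared distance weights; the central station is not part of the
# neighbourhood set, so no distance is zero here.
weights = [1.0 / d ** 2 for d in d_euclid]
```

A full Mahalanobis implementation would invert the complete 3 x 3 covariance matrix; the diagonal case is used here only because it is the control configuration described in the text.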
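$$\bar{x}_n = \frac{\sum_{i=1}^{n} w_i \, x_{n,i}}{\sum_{i=1}^{n} w_i}$$   Equation 1

$$D_{\text{normalized Euclidean}}(x, x_{n,i}) = \sqrt{\sum_{j=1}^{3} \frac{\left(x_{n,i,j} - x_j\right)^2}{s_j^2}}$$   Equation 2

$$s_j^2 = \frac{1}{n-1} \sum_{i=1}^{n} \left(x_{n,i,j} - \bar{x}_{n,j}\right)^2$$   Equation 3

$$D_{\text{Mahalanobis}}(x, x_{n,i}) = \sqrt{\left(x_{n,i} - x\right)^{T} S^{-1} \left(x_{n,i} - x\right)}$$   Equation 4

$$Sx = f(x) - \bar{f}(x_n)$$   Equation 5

$$z = \frac{Sx - \overline{Sx}_n}{s_{Sx_n}}$$   Equation 6

$$\overline{Sx}_n = \frac{\sum_{i=1}^{n} w_i \, Sx_{n,i}}{\sum_{i=1}^{n} w_i}$$   Equation 7

$$s_{Sx_n} = \sqrt{\frac{\sum_{i=1}^{n} w_i \left(Sx_{n,i} - \overline{Sx}_n\right)^2}{\frac{n'-1}{n'} \sum_{i=1}^{n} w_i}}$$   Equation 8

$$\theta - 1.96 \le z_i \le \theta + 1.96$$   Equation 9

After a log-transformation of non-Gaussian data, we compute the weighted average $\bar{x}_n$ according to Equation 1 and the differences Sx between the non-spatial attribute value f(x) (pollutant concentration) at location x and the average attribute value of its neighbours, $\bar{f}(x_n)$, according to Equation 5.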
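A minimal sketch of this step, under stated assumptions: the records, field names and function names below are hypothetical, the window is the base ± 2 days and ± 1 degree without dynamic expansion, a plain natural logarithm stands in for the log-transformation, and the minimum of 20 neighbours required by the report is not enforced on this toy data. It is written in Python rather than the R used by the authors.

```python
import math

def neighbourhood(records, x, max_days=2.0, max_deg=1.0):
    """Records within +/- max_days and +/- max_deg of x, excluding x itself."""
    return [r for r in records
            if r is not x
            and abs(r["day"] - x["day"]) <= max_days
            and abs(r["lon"] - x["lon"]) <= max_deg
            and abs(r["lat"] - x["lat"]) <= max_deg]

def sample_std(values):
    """Unbiased sample standard deviation (denominator n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))

def inverse_square_weights(x, neighbours, keys=("lon", "lat", "day")):
    """Weights as the inverse of the squared normalized Euclidean distance."""
    s = {k: sample_std([r[k] for r in neighbours]) for k in keys}
    return [1.0 / sum(((r[k] - x[k]) / s[k]) ** 2 for k in keys)
            for r in neighbours]

def s_x(x, neighbours, weights):
    """Log concentration at x minus the weighted mean log concentration
    of its neighbours."""
    logs = [math.log(r["pm10"]) for r in neighbours]
    mean = sum(w * v for w, v in zip(weights, logs)) / sum(weights)
    return math.log(x["pm10"]) - mean

# Made-up daily PM10 records (concentration in ug/m3).
x = {"lon": 9.0, "lat": 45.0, "day": 10, "pm10": 80.0}
records = [
    x,
    {"lon": 9.4, "lat": 45.2, "day": 9, "pm10": 30.0},
    {"lon": 8.7, "lat": 44.6, "day": 10, "pm10": 35.0},
    {"lon": 9.9, "lat": 45.8, "day": 11, "pm10": 28.0},
    {"lon": 8.2, "lat": 45.1, "day": 12, "pm10": 32.0},
    {"lon": 12.0, "lat": 45.0, "day": 10, "pm10": 300.0},  # too far east
    {"lon": 9.1, "lat": 45.1, "day": 14, "pm10": 25.0},    # too late
]

nb = neighbourhood(records, x)
w = inverse_square_weights(x, nb)
sx = s_x(x, nb, w)
```

Because the central record reports a higher concentration than every retained neighbour, `sx` is positive here; the two records outside the window do not contribute.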
-
Within each neighbourhood, the Sx values are normalised to centre the data at 0 with a standard deviation of 1 using Equation 6. In this equation, $\overline{Sx}_n$ and $s_{Sx_n}$ are the weighted average and the weighted standard deviation of all Sxi attribute values calculated over all stations within the neighbourhood of x [17]. $\overline{Sx}_n$ and $s_{Sx_n}$ are calculated using Equation 7 and Equation 8, where n' is the number of non-zero weights within the wi vector of length n. Note that, because the weights are calculated from the inverse of the squared spatio-temporal distances, the wi are always non-zero and therefore n = n' in our application.

Another approach could have been to estimate the mean and the standard deviation over the whole dataset [18]. However, since air pollution time series exhibit a strong seasonality effect, applying such a method would have led to an overestimation of $\overline{Sx}_n$ and $s_{Sx_n}$, resulting in a number of undetected possible abnormal values (false negatives) when applying the abnormal value test (Equation 9).

Finally, the test for detecting an abnormal value, given in Equation 9, searches for zi values exceeding a limit value given by the moving average θ of five consecutive zi values plus a predefined threshold of 1.96, corresponding to a confidence interval in which 95 % of zi values should lie. Some limitations were applied:
• In case of |zi| exceeding a value of 1.96, zi was not taken into account for the calculation of the moving average.
• In case of θ estimated based on less than three zi values, a moving average was not calculated. Thus the abnormal value test was not performed at this position.
• As a further restriction, outliers were only flagged when the reference point neighbourhood contained a minimum number of data points (threshold set to 20 data points).

In contrast to the paper by Lu [7], we deliberately did not use the absolute value of the z-transformation. Indeed, the sign of the abnormal value is of interest to us, as we want to understand whether a station is measuring too low or too high quantities compared to its neighbourhood stations with the same classification (background stations). By comparing zi against the moving average of the zi plus/minus the threshold value, abnormal values can be identified.
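The normalisation and the signed test can be sketched as follows. This is a hedged illustration, not the authors' R code: the function names and the z series are invented, and the moving-average window is assumed to be centred on the tested value (the report does not state whether it is centred or trailing). The exclusion of |zi| > 1.96 from the average and the minimum of three usable values follow the limitations listed above.

```python
import math

def weighted_mean_std(values, weights):
    """Weighted mean and weighted standard deviation, with n' taken as the
    number of non-zero weights."""
    n_nonzero = sum(1 for w in weights if w != 0)
    total = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / total
    var = (sum(w * (v - mean) ** 2 for w, v in zip(weights, values))
           / ((n_nonzero - 1) / n_nonzero * total))
    return mean, math.sqrt(var)

def screen(z, threshold=1.96, window=5, min_values=3):
    """Flag each z value as 'high', 'low', 'ok' or 'non-verified'."""
    half = window // 2
    flags = []
    for i, zi in enumerate(z):
        if i < half or i + half >= len(z):
            flags.append("non-verified")  # window does not fit at the edges
            continue
        # Values beyond the threshold are left out of the moving average.
        usable = [v for v in z[i - half:i + half + 1] if abs(v) <= threshold]
        if len(usable) < min_values:
            flags.append("non-verified")  # too few values to estimate theta
            continue
        theta = sum(usable) / len(usable)
        if zi > theta + threshold:
            flags.append("high")
        elif zi < theta - threshold:
            flags.append("low")
        else:
            flags.append("ok")
    return flags

# With equal weights the weighted statistics reduce to the ordinary sample
# mean and sample standard deviation.
mean_sx, s_sx = weighted_mean_std([1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 1.0, 1.0])

# Made-up z series for one station over ten consecutive days.
z_series = [0.1, -0.3, 0.2, 3.5, 0.0, -0.2, 0.4, -2.9, 0.1, 0.3]
flags = screen(z_series)
```

Keeping the sign of the comparison, rather than testing |zi|, is what lets the sketch distinguish a station reading too high from one reading too low, as discussed in the text.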
4 Robustness, sensitivity and optimisation of the screening tool

Among AT, CZ, DE, ES, FR, GB, IT and NL, a few negative values were observed in the AIRBASE_2007 PM10 datasets (70 values for France out of 168153 records and 9 for GB out of 48872 records). These values were discarded because they disturb the transformation of datasets for normalisation.

The design of the outlier test implies some limitations and can lead to mathematical dead ends:
• Lack of a minimum of 20 data points in the neighbourhood.
• The Mahalanobis distance calculation requires an inversion of the S-matrix. The S-matrix, however, proved to be non-invertible for some data cases. For this reason, the use of the normalized Euclidean distance was introduced as a first alternative solution.
• Other statistical parameters might likewise not be retrievable in case of collinearities in the spatial structure of the neighbourhood of a data point.
• The first and last days of a time series cannot be tested because θ values cannot be computed.
• More generally, when less than 3 zi values are available to calculate θ, computation stops and abnormal data point thresholding cannot be performed for this data point. We however observed a considerable amount of |zi| suspected to be higher than 1.96 which are accepted for safety of the conclusions.

All these shortcoming cases are summarized under the data category "non-verified data". However, it is possible that a considerable part of these unverified values corresponds to abnormal values. This might especially be the case when calculations stop because several zi values exceeding the threshold value are discarded, which in consequence can prevent the continuous computation of the 5-day moving average used for θ.

In AirBase a few stations report PM10 values for more than one measurement method. For example, a few stations may use one manual method integrated over 24 hours and an automatic method producing hourly values. In some cases, stations report values from two automatic methods. However, it was checked that within the table of daily values only one unique method was used per station and per day, making it unnecessary to check the robustness of the screening tool at stations with multiple measuring methods.

17 Ref: Shekhar et al., "A Unified approach to detecting spatial outliers", page 141, Example 1.
18 Dissertation of Yufeng Kou, "Abnormal Pattern Recognition in Spatial Data", page 19, lines 4 to 8.

4.1 Normality of datasets and log transformation

Our test for abnormal values loosely assumes that the PM10 datasets are normally distributed. A significant violation of the assumption of normality could increase the chances of unreliable detections, consisting of either a Type I (false positive) or a Type II (false negative) error, depending on the nature of the non-normality. The non-normality of PM10 datasets is a real feature due to the nature of the air pollutant that is easily observed (see Figure 3), rather than being caused by data entry errors, missing values or the presence of outlier values. Misclassification of stations might also be a source of skewness, e.g. traffic or industrial stations wrongly classified as background stations. Visual inspection of Figure 3 shows right-skewed distributions (mean value higher than the mode value) with skewness coefficients of 2.51 (DE), 2.38 (FR), 2.25 (GB) and 1.87 (IT).
-
Figure 3: Density of PM10 datasets in Airbase for DE, FR, GB and IT in 2006-2007

A common transformation for normalising data is the so-called square root transformation. The square root of every value was taken after discarding the few negative PM10 values nested in some datasets (see annex 1). A constant equal to (1 - minimum PM10 per country) was added to move the minimum value of the distribution to 1, in order to avoid having numbers between 0 and 1 becoming larger while numbers above 1 would become smaller. Figure 4 shows that the distributions of square-root transformed PM10 datasets still show some skewness (0.95 for DE, 0.97 for FR, 0.86 for GB and 0.78 for IT).
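The effect of this transformation on skewness can be sketched on a small sample. The values below are invented and are not AirBase data, and the skewness is computed as the moment coefficient m3 / m2^1.5, which is an assumption because the report does not state which skewness estimator was used. Python is used here for illustration only.

```python
import math

def skewness(values):
    """Moment coefficient of skewness, m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def sqrt_transform(values):
    """Drop negative values, shift the minimum to 1, then take square roots."""
    kept = [v for v in values if v >= 0]
    shift = 1.0 - min(kept)
    return [math.sqrt(v + shift) for v in kept]

# Made-up right-skewed daily PM10 values, including one negative record.
pm10 = [8, 10, 12, 12, 15, 18, 20, 25, 31, 40, 62, 110, -3]

raw = [v for v in pm10 if v >= 0]
transformed = sqrt_transform(pm10)

skew_raw = skewness(raw)
skew_sqrt = skewness(transformed)
```

On this sample the square root reduces the skewness without removing it, which mirrors the qualitative behaviour reported for the country datasets.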
-
Figure 4: Density of square-root transformed PM10 datasets for DE, FR, GB and IT in 2006-2007

Since a simple square root transformation was ineffective in completely removing the skewness of the initial distributions of PM10 values, more sophisticated techniques could be applied, for example logarithmic or inverse transformation. A natural logarithm of PM10 data plus 0.5 was applied (see Figure 5). As the logarithm of any null or negative number is undefined, such PM10 values were discarded prior to transformation. Additionally, and for the same reason as for the square root transformation, we move the minimum value of the distribution to 1 by adding a constant equal to (1 - minimum PM10 per country).

Moreover, we have investigated the use of another class of transformations called power transformations or Box-Cox transformation [19]. Power transformations are transformations that raise numbers to an exponent (see Equation 10, where λ ≠ 0). The Box-Cox transformation is a generalisation of a group of other transformations which includes the square root, logarithmic and inverse transformation. For example, a square root transformation can be characterized as x^(1/2), inverse transformations can be characterized as x^(-1), and so forth. The values of λ have been set by an optimization algorithm able to minimize the skewness of each distribution. The following λ values were obtained: 0.093, 0.10, 0.13 and 0.14 for DE, FR, GB and IT respectively. Consequently the skewness of DE, FR, GB and IT decreased to 0.01, 0.17, 0.03 and 0.01, respectively. These values can be compared to the skewness figures of the log transformation of -0.03, 0.141, -0.38 and -0.16 for DE, FR, GB and IT, respectively.

$$PM_{10}' = \frac{PM_{10}^{\lambda} - 1}{\lambda}$$   Equation 10

19 Osborne, J. W. "Improving Your Data Transformations: Applying the Box-Cox Transformation." Practical Assessment, Research & Evaluation 15, no. 12 (2010): 1-9.

Figure 5: Density of Box-Cox transformed PM10 datasets in Airbase for Germany, France, Great Britain and Italy in 2006-2007

Comparing the skewness of log transformed and Box-Cox transformed values suggests that both transformations successfully reach the goal of producing symmetrical distributions of the PM10 datasets.

However, as shown in Figure 6, a symmetrical distribution for the whole 2006-2007 dataset does not ensure that each individual neighbourhood dataset will present a Gaussian distribution, too. A log-transformation within each neighbourhood is impossible because Equation 7 and Equation 8 require that the Sx values between neighbourhoods are consistent. Anyhow, it is likely that breaching the normality assumption does not jeopardize the implementation of the z-test, provided that the threshold value 1.96 set in Equation 9 remains effective in detecting abnormal values.
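A minimal sketch of choosing λ by minimising the absolute skewness is given below. It uses a plain grid search over (0, 1] as a stand-in; the report does not specify the optimisation algorithm the authors used, and the sample values, function names and moment-based skewness estimator are assumptions for illustration.

```python
def skewness(values):
    """Moment coefficient of skewness, m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def box_cox(values, lam):
    """Box-Cox power transformation for lam != 0: (x**lam - 1) / lam."""
    return [(v ** lam - 1.0) / lam for v in values]

def best_lambda(values, grid):
    """Grid value of lambda giving the smallest absolute skewness."""
    return min(grid, key=lambda lam: abs(skewness(box_cox(values, lam))))

# Made-up, strictly positive, right-skewed PM10 sample.
pm10 = [8, 10, 12, 12, 15, 18, 20, 25, 31, 40, 62, 110]

grid = [k / 100 for k in range(1, 101)]  # 0.01, 0.02, ..., 1.00
lam = best_lambda(pm10, grid)

skew_before = skewness(pm10)
skew_after = skewness(box_cox(pm10, lam))
```

Since λ = 1 is a linear rescaling that leaves skewness unchanged, including it in the grid guarantees that the selected λ does no worse than the untransformed data.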
-
Figure 6: Histogram of PM10 values and of their logarithmic transformation of selected stations on 01-02-2006 (DE and FR) and 01/02/2007 (GB and IT) in their neighbourhood
-
16
4.2
OptimisationoftheparametersusedintheabnormalvaluescreeningThechoiceofdifferent
functionalparametersthataffect
theoutcomeoftheabnormalvaluescreening has been investigated. This
includes the temporal/spatial limits of
theneighbourhood(initially±2days,±1ºlongitudeand±1ºlatitude),thethresholdvalue1.96setinEquation9,thetestvalueforacceptingvaluesinthemovingaverageofθandthewidthofwindowusedtocalculatethecriteriaforthemovingaverageofθ(5consecutiveziallowingfor2missingvalues).Thesensitivityofthescreeningresultstothesevalueswasinvestigatedby
simulations usingPM10datasets. The findings from this sensitivity
analysis allow for
anoptimizedselectionofparametervalues,andforavalidationofparameterselection.4.2.1
Spatio‐temporallimitsoftheneighbourhood
For these simulations, the neighbourhood domainwas
systematically adjusted in time andspace.We testedall
combinationsofneighbourhoodsizes from±1 to±4days in
timeandfrom±1to±4degrees in
longitudeandlatitude.Byextendingthelimitsofneighbourhoodoutside the
given station conditions, these simulation increased the
probability of
falsedetectionofabnormalvalues.Validationoftheneighbourhoodlimitswasperformedforallselectedcountries(AT,CZ,DE,ES,FR,GB,ITandNL)forallbackgroundstationofallareatypes(rural,urbanandsuburban)usingthePM10datasetsof2006to2007.TheresultsofthesesimulationsaregiveninTable1andFigure7.Note
that for thesesimulationsnodynamicexpansionof the
timeandspatiallimitsofneighbourhoodhavebeenallowed.On the contrary
to initial anticipation, the selection of the time and spatial
limits of
theneighbourhood,doesnothaveastrongeffectonthenumberofdetectedabnormalvalues.Infact,therelativestandarddeviations,whichappeartobeindependentofthetotalnumberofabnormalvalues,withintheresponsesurfacevaluesare10%(AT),11%(CZ),14%(ES),4%(FR),6%(GB),10%(IT),and15%(NL),respectively.Table1showsthatbetweenthesmallestandlargestneighbourhood,thetotalnumberofabnormalvaluesisonlytwiceasbigforNL.Itcan
be concluded that the weighting algorithms presented in chapter 3
make the methodreasonably independent of the preselected extent of
the neighbourhood. The effect of theweighting factors ismuch
stronger than the preselected limitations of the
spatio‐temporalneighbourhoodboundaries.An absolute definition of
abnormal values is not feasible. Consequently, we do not
have reference data for the optimum number of abnormal values against which the output of the screening tool could be compared. Only expert judgement or rational indicators (i.e. lack of continuity of the total number of abnormal values) can be used to select the best combination of spatial limits and time limits. Since the screening tool could be used as a warning system for doubtful values by various stakeholders, a combination of limits producing reasonably high figures should be selected. At the same time, the extent of the neighbourhood should be as parsimonious as possible, both to save CPU time in the computations and to produce z’ indicators that are characteristic of measurements in the vicinity of the tested stations.

As mentioned above, with regard to the scattering of the number of abnormal values, all combinations of time and space limits produce comparable numbers of abnormal values. However, the variations along the time and spatial dimensions are different. A multiple analysis of variance showed (see footnote 20) that country is the main influence affecting the number of abnormal values, while the time window had double the effect of the space window. Moreover, one may observe several steep decreases of the number of abnormal values occurring at a time limit of 1 day and 1 spherical degree. Consequently, it was decided to select a time window of two days (with additional possibility of expansion) to avoid closeness to the steep gradient. For AT and IT, one can also observe that the total number of abnormal values fluctuates more along the space dimension. It is likely that orography, characterised by a rapid change between mountains and valleys in these two countries, produces these fluctuations. Following this observation, and in order to limit possible false positives and false negatives, it was decided to set the spatial limits of the neighbourhood to the smallest space dimension without the possibility of expansion. These figures represent, in our view, the best equilibrium between avoiding unverified data, obtaining a high number of detected abnormal values, avoiding the extreme figures characterised by a lack of continuity of the number of abnormal values, and limiting the CPU time needed to perform these calculations.

Footnote 20: Note that FR was discarded from this analysis because it gave a high number of abnormal values, ...

Table 1: Effect of changing the spatial and temporal limits on the detection of abnormal values for Germany for the background - urban - 2007 - PM10 out of 236797 total records - constant threshold for the z value and constant value for the rolling mean value

Time window [days]  Spatial window [°]     AT     CZ     DE     ES     FR     GB     IT     NL
±1                  ±1                    611   1240   2899    506   4959    471    837    146
±1                  ±2                    594   1227   2693    844   5508    570    926    248
±1                  ±3                    579   1190   2444    892   5473    569    939    238
±1                  ±4                    582   1170   2321    885   5388    584   1141    236
±2                  ±1                    714   1214   3058    688   5750    553    825    227
±2                  ±2                    566   1127   2586    803   5821    523    917    304
±2                  ±3                    546   1054   2227    773   5704    515    937    280
±2                  ±4                    564   1020   2082    771   5769    522   1071    278
±3                  ±1                    661   1100   2939    726   5883    511    809    316
±3                  ±2                    544   1030   2396    756   5854    535    933    294
±3                  ±3                    503    961   2100    720   5616    530    913    266
±3                  ±4                    543    919   1930    714   5755    507   1022    266
±4                  ±1                    611   1014   2707    713   5552    492    788    293
±4                  ±2                    509    937   2240    694   5568    543    913    277
±4                  ±3                    482    911   1999    665   5375    511    871    257
±4                  ±4                    528    897   1864    658   5465    491    967    255
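The role of the two window sizes can be illustrated with a minimal sketch. It assumes a simple box-shaped neighbourhood of ± `space_deg` degrees and ± `time_days` days around the tested record and excludes all records of the tested station itself. The `Record` structure, the function name `neighbourhood` and the sample station codes, coordinates and values are hypothetical and are not taken from the screening tool.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class Record:
    """One daily PM10 value (hypothetical structure, for illustration only)."""
    station: str
    lon: float   # decimal degrees east
    lat: float   # decimal degrees north
    day: date
    pm10: float  # µg/m³


def neighbourhood(records: List[Record], base: Record,
                  space_deg: float = 1.0, time_days: int = 2) -> List[Record]:
    """Select records within ±space_deg degrees and ±time_days days of `base`.

    Records of the base station itself are excluded (assumption of this sketch).
    """
    return [
        r for r in records
        if r.station != base.station
        and abs(r.lon - base.lon) <= space_deg
        and abs(r.lat - base.lat) <= space_deg
        and abs((r.day - base.day).days) <= time_days
    ]


# Synthetic records: made-up station codes, coordinates and values.
base = Record("S0", 14.0, 48.0, date(2006, 2, 1), 30.0)
records = [
    base,
    Record("S1", 14.5, 48.4, date(2006, 2, 2), 28.0),  # inside both windows
    Record("S2", 15.6, 48.1, date(2006, 2, 1), 35.0),  # outside the ±1° box
    Record("S3", 13.8, 47.9, date(2006, 2, 5), 22.0),  # outside the ±2 day window
]
selected = neighbourhood(records, base)
print([r.station for r in selected])
```

Widening either window admits more neighbours per tested value, which is why larger windows increase the computing cost and dilute the local character of the indicator.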
Figure 7: Influence of time and spatial extent in the determination of abnormal values for PM10 datasets for AT, CZ, DE, FR, GB, IT and NL in 2006-2007. Note the different axis orientation per graph. Note also that the spatial and temporal extents are given as extensions around a centre point. For example, a spatial extent of 1 describes an edge length of ±1, thus 2° in longitude and 2° in latitude.
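To give a feeling for what a window expressed in degrees means on the ground, the following sketch converts a ± 1° box into approximate kilometres. It assumes a spherical Earth with a mean radius of 6371 km; the function name `box_size_km` is hypothetical, and the latitude 47.77° N is that of station AT0002R listed in the annex.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation (assumption)


def box_size_km(lat_deg: float, half_width_deg: float = 1.0):
    """Approximate (east-west, north-south) size in km of a ±half_width_deg box."""
    km_per_degree = math.pi * EARTH_RADIUS_KM / 180.0  # length of 1° of latitude
    north_south = 2.0 * half_width_deg * km_per_degree
    # Meridians converge towards the poles, so 1° of longitude shrinks with cos(latitude).
    east_west = north_south * math.cos(math.radians(lat_deg))
    return east_west, north_south


for lat in (0.0, 47.77, 60.0):
    ew, ns = box_size_km(lat)
    print(f"lat {lat:5.2f}°: {ew:6.1f} km east-west x {ns:6.1f} km north-south")
```

Under these assumptions the same window in degrees covers a noticeably narrower east-west distance at higher latitudes, so a neighbourhood defined in degrees is not the same size in kilometres for all countries.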
4.2.2 Test threshold for z-test
From a statistical point of view, the test threshold to detect abnormal values should be around 1.96 for a simple z-test. However, the experiments have shown that this value might be too conservative. We ran a series of experiments using the results of screenings for AT, CZ, DE, ES, FR, GB, IT and NL to further investigate this parameter. We observed that for thresholds higher than 3 the number of identified abnormal points rapidly converges towards zero. Over the whole range of threshold values, the number of unverified values remains constant. Figure 8 shows that the test threshold strongly affects the output of the screening tool regarding the number of abnormal values. However, as for the optimisation of the limits of the neighbourhood, without reference values for the number of abnormal values we cannot easily decide which threshold to use. Further investigations are needed to find rules and mechanisms to set this parameter. Furthermore, the selection of this parameter will strongly depend on the specific objectives of the intended application.
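As a point of reference for the threshold discussion, the sketch below computes the share of exceedances that an ideal standard normal variable would produce for a given threshold. This is only the textbook case: the screening tool applies several data transformations, so its observed percentages need not follow these values. The function name `two_sided_tail` is hypothetical.

```python
import math


def two_sided_tail(threshold: float) -> float:
    """P(|Z| > threshold) for a standard normal variable Z."""
    return math.erfc(threshold / math.sqrt(2.0))


for t in (1.0, 1.96, 2.0, 3.0, 4.0):
    share = 100.0 * two_sided_tail(t)
    print(f"threshold {t:4.2f}: expected share of exceedances {share:.3f} %")
```

For an ideal normal distribution the expected share is about 5 % at a threshold of 1.96 and about 0.27 % at a threshold of 3, which is in line with the observation that the number of identified abnormal points falls off quickly for thresholds above 3.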
Figure 8: Percentage of abnormal values with respect to different choices for the z-test threshold
[Eight panels: Austria, Czech Republic, Germany, Spain, France, United Kingdom, Italy and Netherlands (2006 - 2007). Horizontal axis: threshold value for z test (0 to 6). Vertical axes: % of abnormal values in verified records and % of unverified records.]
4.2.3 Limit value for including zi in the computation of θ
The z-test for detecting abnormal values (Equation 9) is based on the computation of θ, a moving average of 5 consecutive zi values. zi values are included in the moving average provided that they do not exceed a predefined threshold, which is currently set to 1.96. All |zi| exceeding a value of 1.96 are discarded from the computation of the moving average. This produces unverified records when several consecutive zi are rejected, which prevents a continuous calculation of θ. Figure 9 shows the influence of the threshold for accepting zi values. Tuning this parameter towards “strict” values (low threshold) causes a large number of unverified records in the evaluation. The influence on the number of identified abnormal points is complex and indicates the superimposition of two or more effects. First, the reduction of the number of unverified records (by using less strict threshold values) seems to be directly connected to an increase of identified abnormal records (examples of ES, FR and IT). This indicates that a large proportion of abnormal records had been hidden within the non-verifiable ones. Second, however, towards higher threshold values the effect can also be the opposite (decrease of identified abnormal records in the examples of DE, GB and NL). Another important observation is that it is not feasible to set this parameter so that it yields both the highest number of abnormal values and the lowest number of unverified records.
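The mechanism by which a strict acceptance limit creates unverified records can be sketched as follows. This is a minimal illustration, not the tool's implementation: it assumes a centred window, shortens the window at the ends of the series, and requires at least 60 % of the window width to be valid values (the figure quoted in section 4.2.4). The function name `moving_theta` and the sample z values are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def moving_theta(z: Sequence[float], width: int = 5, limit: float = 1.96,
                 min_valid_fraction: float = 0.6) -> List[Optional[float]]:
    """Moving average of z that skips values with |z| > limit.

    Returns None ("unverified") wherever fewer than min_valid_fraction * width
    accepted values are available in the (assumed centred) window.
    """
    half = width // 2
    # Small tolerance guards against floating-point noise in the product.
    min_valid = math.ceil(min_valid_fraction * width - 1e-9)
    theta: List[Optional[float]] = []
    for i in range(len(z)):
        window = z[max(0, i - half): i + half + 1]
        accepted = [v for v in window if abs(v) <= limit]
        if len(accepted) >= min_valid:
            theta.append(sum(accepted) / len(accepted))
        else:
            theta.append(None)  # too few accepted values: record stays unverified
    return theta


# Synthetic z values with a run of three large consecutive values.
z = [0.1, 0.3, 2.5, 3.0, 2.2, 0.2, -0.1]
strict = moving_theta(z, limit=1.96)
lenient = moving_theta(z, limit=3.0)
print("unverified with limit 1.96:", strict.count(None))
print("unverified with limit 3.00:", lenient.count(None))
```

In this synthetic series the run of large values starves the window of accepted values when the limit is strict, while a more lenient limit lets θ be computed throughout, at the price of letting extreme values influence the average.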
Figure 9: Effect of the upper limit value (currently 1.96) for including zi-values into the moving average computation of θ
[Eight panels: Austria, Czech Republic, Germany, Spain, France, United Kingdom, Italy and Netherlands (2006 - 2007). Horizontal axis: limit value for accepting zi values (0 to 6). Vertical axes: % of abnormal values in verified records and % of unverified records.]
4.2.4 Window width for the computation of θ
The effect of the width of the time window of the moving average (θ) on the number of detected abnormal values was studied for the results of the screening tool for AT, CZ, DE, ES, FR, GB, IT and NL. In these calculations, we assumed that for any window width the required minimum percentage of valid zi for partial calculations of the moving average was set to 60%. Figure 10 indicates that the time window of the moving average should not be set to values lower than 4 days, to avoid a strong decrease of the percentage of detected abnormal values. Conversely, for time window widths over 5 days, only a slight increase of the percentage of abnormal values takes place. This latter effect might be due to instability of weather conditions over longer time spans; therefore the time window should be rather short. We therefore chose a value of 5 days, as this seems to be a good compromise between stable thresholding and not indicating too many abnormal values due to false positives. This parameter does not seem to influence the percentage of unverified records, although some noise can be observed for time windows of less than 10 days.
Figure 10: Influence of the moving window width used for the moving average computation of θ
[Eight panels: Austria, Czech Republic, Germany, Spain, France, United Kingdom, Italy and Netherlands (2006 - 2007). Horizontal axis: moving-window width [days] (0 to 45). Vertical axes: % of abnormal values in verified records and % of unverified records.]
4.3 Manual calculations
Checks by manual calculation were performed for a set of stations in different countries, including stations FR34032 and FR34052 on day 2006-02-01, DETH025 on day 2006-02-01, GB0643A on day 2007-02-01 and IT1186A on day 2007-02-01. The checks were carried out using two versions of AirBase, one with datasets ending in 2007 and another with datasets ending in 2010. The check consisted in confirming the list of extracted stations within neighbourhoods for the selected dates, within the spatial limits of the neighbourhood and for the correct combination of station type (background) and area type (all area types: urban, suburban and rural). Equation 1 to Equation 9 were computed for the manual calculations and their results agreed with the results of the screening tool. A few differences were observed between AirBase_2007 and AirBase_2010, mainly consisting of a few stations present in AirBase_2010 that were missing in AirBase_2007. Moreover, the values of stations in the neighbourhood of DETH025 were slightly different in AirBase_2010 (about half of the values differed by less than 0.2 µg/m³, without changing the output of the screening tools). Station GB0788A had PM10 values of 36, 24, 31 and 35 µg/m³ in AirBase_2007 and 34, 22, 20 and 25 µg/m³ in AirBase_2010. Moreover, when extracting the neighbourhoods from AirBase_2007, the following stations were missing:
- DEHE055 and DETH042 for the neighbourhood of DETH025
- IT0940A and IT1672A for the neighbourhood of IT1186A
5 Results
A complete set of time series plots of daily PM10 abnormal values for the background stations of AT, CZ, DE, ES, FR, GB, IT and NL is given in Annex 1. The graphs in Annex 1 are considered to be useful for local authorities in order to question the consistency of the detected abnormal values of their stations. Modellers can use this information when estimating the performance of models compared to field measurements. Table 2 summarizes the outcome of the screening tool applied per country.

From a theoretical perspective, a screening procedure that looks at extreme values within normalized distributions implies that a certain percentage of abnormal value detections should be expected. However, because of the different data transformations employed, we cannot anticipate a detection of 5 % of abnormal values corresponding to the selected level of confidence. In fact, taking all countries into consideration, the percentages of abnormal value identifications range between 1.5 and 4.1 %. However, once the matter of unverified data is settled, the number of abnormal values per station may increase when a larger number of extreme zi values are accepted in the estimation of θ.

We have looked at the correlation between the percentages of abnormal values per country and different variables in Table 2. To our surprise, the highest percentages of unverified data were not correlated with the density of monitoring stations of each country, nor with the homogeneity of the PM10 measurement method (gravimetry, TEOM or β-ray) per country, nor with the homogeneity of area types of stations per country (urban, suburban or rural). At first glance, one may observe that the percentage of abnormal values is generally higher for the countries reporting the highest number of records. Finally, by visual inspection of the graphs of the annex, rural sites appear to produce more abnormal values than urban or suburban areas, indicating that the presence of rural stations in the “All background” category should be further studied. The above conclusions are somewhat premature. We would like to emphasize that the reported figures are somewhat dependent on the parameter values chosen in the tools, and that these are still going to be fine-tuned further. For the next developments of the method, we want to give a definitive evaluation of what can be achieved with the screening tools. Our short term objective consists of:
- Investigate whether unverified records partly represent abnormal values; decrease the percentage of unverified records by modifying the calculation of the θ moving average (e.g. by applying a Kolmogorov-Zurbenko type of filter).
- Compare the current screening tool using the normalised Euclidean distance with the findings using the Mahalanobis distance. Investigate which power of the inverse distance (currently 2) is best suited to estimate the weighting factors. In fact, stations may have one very close neighbour. The resulting problem is that the weighting factor for this one close neighbour becomes very large, and the neighbourhood mean depends too much on this single attribute value.
- Validate the currently optimised parameter values (neighbourhood limits, averaging windows for θ, threshold value for the z-test and for accepting zi values) by spiking PM10 datasets to artificially produce outliers. Study the possibility of improving the tool by setting its parameters per individual day.
- Currently, spatial distances are in decimal degrees, but should rather be evaluated in kilometres. Therefore we will implement a geodetic projection procedure for coordinate transformations.
- Currently the base station is not part of the selection for the calculation of neighbourhood statistics. This limitation is a consequence of inverse distances for weighting factor calculations becoming undefined otherwise. We will try to circumvent or improve this calculation limitation.
- Study whether including rural stations in the “All background” category of tested stations is appropriate, as this type of area in the “All background” category produces too many abnormal values. Evaluate the possibility of running the screening separately for the urban, suburban and rural area types and for the traffic and industrial types of stations.
- Evaluate the feasibility of an iterative procedure where, once an abnormal value is detected, immediate corrections are made, such as replacing the attribute value of this abnormal data point by the average attribute value of its neighbours and updating the subsequent computation. The effect of these corrections is to avoid normal points close to the true abnormal points being claimed as possible abnormal points, too.
- Determination of abnormal values for all PM10 datasets and for the last version of AirBase over the 10 last available years for all countries having sufficient PM10 records.
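The concern about a single very close neighbour dominating the inverse-distance weights can be made concrete with a small sketch. It assumes normalised weights proportional to distance raised to the power -p; the function name `idw_weights` and the sample distances are hypothetical, and the report's own weighting formula may differ in detail.

```python
from typing import List, Sequence


def idw_weights(distances: Sequence[float], power: float = 2.0) -> List[float]:
    """Normalised inverse-distance weights w_i proportional to d_i ** -power."""
    if any(d <= 0 for d in distances):
        # A zero distance (e.g. the tested station itself) has no finite weight.
        raise ValueError("distances must be strictly positive")
    raw = [d ** -power for d in distances]
    total = sum(raw)
    return [w / total for w in raw]


# Synthetic case: one neighbour at 0.05° and three neighbours at 1°.
distances = [0.05, 1.0, 1.0, 1.0]
for p in (1.0, 2.0):
    share = idw_weights(distances, power=p)[0]
    print(f"power {p:.0f}: weight of the closest neighbour = {share:.3f}")
```

With these made-up distances the closest neighbour carries almost the entire weight at power 2 and still most of it at power 1, which shows why the choice of exponent matters. The `ValueError` branch mirrors the reason given above for leaving the base station out of the neighbourhood statistics.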
Our middle term objective is:
- List and map the stations continually producing z indicators higher or lower than the other stations in their neighbourhood, in order to check station classifications.
- Apply the screening tools to NO2 and O3 datasets, if found feasible.
- Investigation of transboundary effects on PM10 records; the cluster effect will be evaluated by including stations belonging to more than one country in the neighbourhood of stations near borders.
- Re-evaluate the measurement uncertainty for PM10, according to the method developed in Gerboles and Reuter, 2010 [3] and taking advantage of the consolidated screening tool.
Finally, our long term objective is linked with investigations like:
- Application of the screening tool for checking of data quality in the framework of near to real time data reporting.
- Evaluate the perspectives and feasibilities of developing the screening tool into an online application for operational use and accessibility by individual station managers.
Table 2: Summary of the output of the screening tool per country including numbers and density of background stations, total number of records, percentages of unverified records and detected abnormal values, types of measuring methods and area type of stations

Country  Backgr.   Density              Records  Unverified     Abnormal      Affected
         stations  [stations/10³ km²]            records        data*         stations
AT        63       0.75                  40471    5697 (14%)     722 (2.1%)    57 (90%)
CZ        96       1.22                  64996    6545 (10%)    1214 (2.1%)    87 (91%)
DE       240       0.67                 160083   16575 (10%)    3070 (2.1%)   224 (93%)
ES       134       0.26                  59668   24980 (42%)     729 (2.1%)    81 (60%)
FR       286       0.52                 165443   49385 (30%)    6306 (5.4%)   259 (91%)
GB        56       0.24                  35561   12342 (35%)     600 (2.6%)    41 (73%)
IT       108       0.36                  49656   18527 (37%)     871 (2.8%)    82 (76%)
NL        24       0.58                  16135    3004 (19%)     227 (1.7%)    22 (92%)

Country  Urban  Suburban  Rural  Gravimetry | TEOM | Beta ray | Unknown and others
         area   area      area   (non-empty cells, in column order)
AT       31%    35%       34%    20 % | 56 % | 22 % | 1 %, Reflect. 1 %
CZ       32%    24%       45%    30 % | 70 % | 0 %
DE       42%    31%       27%    29 % | 8 % | 40 % | 22 %, Chrom. 1 %
ES       33%    33%       34%    39 % | 3 % | 30 % | 0 %, DOAS 9 %, AAS 20 %
FR       59%    35%       6%     85 % | 15 %
GB       84%    6%        10%    5 % | 94 % | 1 % | 0 %
IT       59%    26%       14%    20 % | 8 % | 59 % | 4 %, Cond. 1 %, Neph. 6 %
NL       37%    32%       32%    100 % | 0 %

*Percentages of the verified records
TEOM: tapered element oscillating microbalance
Cond.: conductimetry
Neph.: nephelometry
Chrom.: chromatography
DOAS: differential optical absorption spectrometry
AAS: atomic absorption spectrometry
Reflect.: reflectometry
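The footnote to Table 2 states that the abnormal-data percentages refer to the verified records only. The short sketch below recomputes them from the record counts given in the table as abnormal / (records - unverified); the dictionary and function names are hypothetical, the numbers are copied from Table 2.

```python
# (records, unverified records, abnormal values) per country, copied from Table 2.
TABLE_2 = {
    "AT": (40471, 5697, 722),
    "CZ": (64996, 6545, 1214),
    "DE": (160083, 16575, 3070),
    "ES": (59668, 24980, 729),
    "FR": (165443, 49385, 6306),
    "GB": (35561, 12342, 600),
    "IT": (49656, 18527, 871),
    "NL": (16135, 3004, 227),
}


def abnormal_share_of_verified(records: int, unverified: int, abnormal: int) -> float:
    """Abnormal values as a percentage of the verified (records - unverified) records."""
    return 100.0 * abnormal / (records - unverified)


for country, (records, unverified, abnormal) in TABLE_2.items():
    unverified_pct = 100.0 * unverified / records
    abnormal_pct = abnormal_share_of_verified(records, unverified, abnormal)
    print(f"{country}: unverified {unverified_pct:4.0f} %, abnormal {abnormal_pct:4.1f} % of verified")
```

Rounded to one decimal, the recomputed shares reproduce the percentages printed in the table, which confirms that the denominator is the number of verified records rather than the total number of records.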
ANNEX: Z(Sx) 2006 / 2007 time series and abnormal datapoint identification summaries

Austria
AT0002R (background, rural), long = 16.766 deg E, lat = 47.77 deg N
[Time series plot of z(sx), Jan 2006 to Jan 2008. Legend: normal point, threshold limits, abnormal point, non-verifiable]
number of datapoints investigated for AT0002R: 728
identified abnormal datapoints: 17
abnormal datapoints content: 2.34 %
abnormal datapoints station ranking = 17 within a total of 63 stations investigated for AT
non verifiable datapoints: 0
AT0003A (background, urban), long = 14.678 deg E, lat = 47.179 deg N
[Time series plot of z(sx), Jan 2006 to Jan 2008. Legend: normal point, threshold limits, abnormal point, non-verifiable]
number of datapoints investigated for AT0003A: 725
identified abnormal datapoints: 3
abnormal datapoints content: 0.41 %
abnormal datapoints station ranking = 49 within a total of 63 stations investigated for AT
non verifiable datapoints: 1
AT0005R (background, rural), long = 12.972 deg E, lat = 46.68 deg N
[Time series plot of z(sx), Jan 2006 to Jan 2008. Legend: normal point, threshold limits, abnormal point, non-verifiable]
number of datapoints investigated for AT0005R: 682
identified abnormal datapoints: 12
abnormal datapoints content: 1.76 %
abnormal datapoints station ranking = 25 within a total of 63 stations investigated for AT
non verifiable datapoints: 526
AT0012A (background, urban), long = 14.036 deg E, lat = 48.165 deg N
[Time series plot of z(sx), Jan 2006 to Jan 2008. Legend: normal point, threshold limits, abnormal point, non-verifiable]
number of datapoints investigated for AT0012A: 726
identified abnormal datapoints: 1
abnormal datapoints content: 0.14 %
abnormal datapoints station ranking = 56 within a total of 63 stations investigated for AT
non verifiable datapoints: 1
AT0016A (background, suburban), long = 14.239 deg E, lat = 48.225 deg N
[Time series plot of z(sx), Jan 2006 to Jan 2008. Legend: normal point, threshold limits, abnormal point, non-verifiable]
number of datapoints investigated for AT0016A: 724
identified abnormal datapoints: 1
abnormal datapoints content: 0.14 %
abnormal datapoints station ranking = 55 within a total of 63 stations investigated for AT
non verifiable datapoints: 2
AT0020A (background, suburban), long = 16.303 deg E, lat = 48.236 deg N
[Time series plot of z(sx), Jan 2006 to Jan 2008]