A Comparison of Strategies for Estimation of Ultrafine Particle Number Concentrations in Urban Air Pollution Monitoring Networks
Matteo Reggente a,*,1, Jan Peters a, Jan Theunis a, Martine Van Poppel a, Michael Rademaker b, Bernard De Baets b, Prashant Kumar c,d
a VITO, Flemish Institute for Technological Research, Boeretang 200, B-2400 Mol, Belgium
b Department of Mathematical Modelling, Statistics and Bioinformatics, Ghent University, Coupure links 653, 9000 GENT, Belgium
c Department of Civil and Environmental Engineering, Faculty of Engineering and Physical Science (FEPS), University of Surrey, GU2 7XH, United Kingdom
d Environmental Flow (EnFlo) Research Centre, FEPS, University of Surrey, GU2 7XH, United Kingdom
Abstract
We propose three estimation strategies (local, remote and mixed) for ultrafine particles (UFP) at
three sites in an urban air pollution monitoring network. Estimates are obtained through Gaussian
process regression based on concentrations of gaseous pollutants (NOx, O3, CO) and UFP. As
local strategy, we use local measurements of gaseous pollutants (local covariates) to estimate
UFP at the same site. As remote strategy, we use measurements of gaseous pollutants and UFP
from two independent sites (remote covariates) to estimate UFP at a third site. As mixed strategy,
we use local and remote covariates to estimate UFP. The results suggest: UFP can be estimated
with good accuracy based on NOx measurements at the same location; it is possible to estimate
UFP at one location based on measurements of NOx or UFP at two remote locations; the addition
of remote UFP to local NOx, O3 or CO measurements improves the models' performance.
Capsule abstract:
UFP can be estimated with good accuracy at one location based on NOx measurements at the
same location and based on measurements of NOx or UFP at two remote locations.
Key words: Ultrafine particles estimation; urban air pollution; pollution monitoring network;
Gaussian process regression; statistical modelling.
Figure 3: Local estimation. The left column shows the time series plots of the estimated and the
measured UFP number concentrations. The dashed grey line is the estimated UFP number
concentration and the black line is the measured UFP number concentration relative to the
evaluation period (Deval). The middle column shows the scatterplots, R2 and RMSE between the
estimated and measured UFP number concentrations. The grey lines have slope 1 and an
intercept of 0 (ideal case, when the estimated and measured values are equal). The dashed grey
lines delimit the FAC2 area. The right column shows the QQ plots between the estimated and
measured UFP number concentrations. The top row refers to site 1, the middle row refers to site
2 and the bottom row refers to site 3.
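The evaluation metrics named in this caption (R2, RMSE and the FAC2 area) can be illustrated with a minimal Python sketch. It is not the authors' code: the function names and the sample values are hypothetical, R2 is assumed to be defined as 1 - SS_res / SS_tot, and FAC2 is assumed to be the fraction of estimates within a factor of two of the measurement.

```python
import math

def rmse(measured, estimated):
    """Root mean square error between paired samples."""
    n = len(measured)
    return math.sqrt(sum((m - e) ** 2 for m, e in zip(measured, estimated)) / n)

def r_squared(measured, estimated):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_m = sum(measured) / len(measured)
    ss_res = sum((m - e) ** 2 for m, e in zip(measured, estimated))
    ss_tot = sum((m - mean_m) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot

def fac2(measured, estimated):
    """Fraction of estimates within a factor of two of the measurement."""
    inside = sum(1 for m, e in zip(measured, estimated)
                 if m > 0 and 0.5 <= e / m <= 2.0)
    return inside / len(measured)

# Hypothetical UFP number concentrations, for illustration only.
measured = [10000.0, 20000.0, 30000.0, 40000.0]
estimated = [12000.0, 18000.0, 33000.0, 90000.0]

print("RMSE:", rmse(measured, estimated))
print("R2:  ", r_squared(measured, estimated))
print("FAC2:", fac2(measured, estimated))
```

In this toy example the last estimate is 2.25 times the measurement, so it falls outside the FAC2 area and three of the four points remain inside it.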
3.2.2 Remote estimation
Tables 4 and 5 show the results obtained in the remote estimation configuration. In Table 4, the
evaluation refers to those models that use UFP data recorded at one or both of the other two sites
(remote covariates) to estimate UFP at a third site. In Table 5, the evaluation refers to those
models that use NOx measurements as remote covariates to estimate UFP at a third site.
The models selected in the training phase (highest ML), at all three sites, are the ones that use
both UFP measurements recorded at the other two sites (the highest-ML models in Table 4).
Moreover, the results based on the unseen measurements (R2 and RMSE metrics in Tables 4 and 5)
confirm that the selected models outperform the others. Those models explain between 69%
(site 1) and 87% (site 2) of the variance.
Comparison of these results with those obtained in the local estimation configuration (Table 3)
shows that the model performances at sites 1 and 3 are weaker than in the local estimation, and
similar at site 2. The weaker performance at two sites can be explained by the absence of local
covariates.
Table 4: Remote estimation based on UFP covariates: evaluation of the models at half hour
resolution and 14 days of training in terms of ML, R2 and RMSE. The models with the highest ML
are marked with an asterisk (*). Remote covariates: X = used, - = not used.
Target model               UFP Site 1  UFP Site 2  UFP Site 3  R2    RMSE  ML    Deval (days)
UFP Site 1
  GPS1(UFPS2, UFPS3) *     -           X           X           0.69  0.58  -398  14
  GPS1(UFPS2)              -           X           -           0.68  0.58  -433  14
  GPS1(UFPS3)              -           -           X           0.58  0.65  -557  14
UFP Site 2
  GPS2(UFPS1, UFPS3) *     X           -           X           0.87  0.35  -190  14
  GPS2(UFPS1)              X           -           -           0.65  0.59  -383  14
  GPS2(UFPS3)              -           -           X           0.82  0.42  -345  14
UFP Site 3
  GPS3(UFPS1, UFPS2) *     X           X           -           0.81  0.42  -423  14
  GPS3(UFPS1)              X           -           -           0.56  0.68  -526  14
  GPS3(UFPS2)              -           X           -           0.80  0.45  -442  14
Table 5: Remote estimation based on NO/NO2 covariates: evaluation of the models at half hour
resolution and 14 days of training in terms of ML, R2 and RMSE. Remote covariates: X = used,
- = not used.
Target model               NO/NO2 Site 1  NO/NO2 Site 2  NO/NO2 Site 3  R2    RMSE  ML    Deval (days)
UFP Site 1
  GPS1(NOxS2, NOxS3)       -              X              X              0.67  0.61  -405  9
  GPS1(NOxS2)              -              X              -              0.67  0.63  -440  9
  GPS1(NOxS3)              -              -              X              0.61  0.76  -556  14
UFP Site 2
  GPS2(NOxS1, NOxS3)       X              -              X              0.80  0.47  -280  14
  GPS2(NOxS1)              X              -              -              0.76  0.51  -408  14
  GPS2(NOxS3)              -              -              X              0.74  0.52  -454  14
UFP Site 3
  GPS3(NOxS1, NOxS2)       X              X              -              0.80  0.47  -443  9
  GPS3(NOxS1)              X              -              -              0.79  0.49  -462  9
  GPS3(NOxS2)              -              X              -              0.68  0.57  -489  14
In the case of models that use NOx measurements (Table 5) recorded at two sites (remote
covariates) to estimate UFP at a third site, the best models are obtained using remote NOx
measurements from two sites simultaneously. At sites 1 and 3, those models perform similarly to
the models that use UFP as covariates, and at site 2 they perform worse. They explain between
67% (site 1) and 80% (sites 2 and 3) of the variance.
Caution is needed when comparing the model performances reported in Tables 4 and 5. At site 2,
gaseous measurements are limited to 9 days due to monitor malfunctioning (Section 2.4).
Therefore, the performance of the models that use NOx covariates recorded at site 2 is computed
using a shorter evaluation dataset (Deval) than the others (9 days instead of 14 days).
Figure 4: Remote estimation. The left column shows the time series plots of the estimated and
the measured UFP number concentrations. The dashed grey line is the estimated UFP number
concentration and the black line is the measured UFP number concentration relative to the
evaluation period (Deval). The middle column shows the scatterplots, R2 and RMSE between the
estimated and measured UFP number concentrations. The grey lines have slope 1 and an
intercept of 0 (ideal case, when the estimated and measured values are equal). The dashed grey
lines delimit the FAC2 area. The right column shows the QQ plots between the estimated and
measured UFP number concentrations. The top row refers to site 1, the middle row refers to site
2 and the bottom row refers to site 3.
Tables 4 and 5 also show that the models based on two remote locations perform at least as well
as the models based on covariates from one remote location. At sites 1 and 3, the models that use
only the remote covariates from site 2 perform similarly to the ones that use the covariates from
the other two remote sites simultaneously. On the other hand, at site 1, the models that use only
the covariates from site 3, and at site 3, the models that use only the covariates from site 1,
perform worse than the models that use the covariates from two remote sites simultaneously. At
site 2, all the models that use covariates from only site 1 or only site 3 perform worse than the
ones that use the covariates from both remote sites simultaneously.
From Figure 4, we can observe that the model tends to overestimate low values of UFP at site 1
and underestimate low values at site 2.
In summary: (i) model results are comparable when using remote UFP only or when using
remote NOx only to estimate UFP at a distant location; (ii) models that use covariates from only
one remote site have fair performance only if there is a priori knowledge of which of the two
sites is more informative; (iii) models that use covariates from two remote sites do not need a
priori knowledge of which of the two sites is more informative, because during the training
period the models learn which covariate to weight more heavily by maximising the likelihood
between the covariates and the target function.
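The idea of ranking candidate covariate sets by their marginal likelihood (ML) can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation (the acknowledgement credits the GPML Toolbox, which this code does not reproduce): the squared-exponential kernel, the fixed hyperparameters, the function names and the synthetic series are all hypothetical, and no hyperparameter optimisation is performed.

```python
import math

def rbf_kernel(a, b, length_scale=1.0, signal_var=1.0):
    """Squared-exponential covariance between two covariate vectors."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return signal_var * math.exp(-0.5 * sq_dist / length_scale ** 2)

def cholesky(matrix):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower

def log_marginal_likelihood(inputs, targets, noise_var=0.05):
    """log p(y | X) for a zero-mean GP with fixed (assumed) hyperparameters."""
    n = len(targets)
    cov = [[rbf_kernel(inputs[i], inputs[j]) + (noise_var if i == j else 0.0)
            for j in range(n)] for i in range(n)]
    lower = cholesky(cov)
    # Forward substitution: solve L z = y, so that y^T K^-1 y = z^T z.
    z = []
    for i in range(n):
        z.append((targets[i] - sum(lower[i][k] * z[k] for k in range(i))) / lower[i][i])
    data_fit = -0.5 * sum(v * v for v in z)
    complexity = -sum(math.log(lower[i][i]) for i in range(n))  # equals -0.5 * log|K|
    return data_fit + complexity - 0.5 * n * math.log(2.0 * math.pi)

# Synthetic stand-ins for standardised series (NOT data from the paper).
n = 30
site1 = [math.cos(2.1 * i) for i in range(n)]  # weakly informative covariate
site2 = [math.sin(0.3 * i) for i in range(n)]  # strongly informative covariate
target = [1.5 * s2 + 0.1 * s1 for s1, s2 in zip(site1, site2)]

candidates = {
    "S1": [[s1] for s1 in site1],
    "S2": [[s2] for s2 in site2],
    "S1+S2": [[s1, s2] for s1, s2 in zip(site1, site2)],
}
scores = {name: log_marginal_likelihood(x, target) for name, x in candidates.items()}
best = max(scores, key=scores.get)
print(scores)
print("selected covariate set:", best)
```

In this toy setting the covariate set that contains only the weakly informative series receives a much lower log marginal likelihood than the sets that include the informative one, which mirrors the selection criterion used in the tables. A model with one length scale per covariate and optimised hyperparameters would additionally learn how much weight each covariate deserves; that step is omitted here.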
3.2.3 Mixed estimation
Tables 6 and 7 show the performances of the models for the mixed estimation configuration.
In Table 6 the evaluation refers to models that use local gaseous covariates (NOx, O3 and CO
recorded at the same site where the estimations are made) in addition to UFP concentrations
recorded at the other two sites (remote covariates). Table 7 shows the results of cases where only
remote NOx (but not UFP) recorded at two sites is added to the local covariates (NOx, O3, CO).
Comparison of Tables 3–6 shows that the performances of the models are improved when the
remote UFP are combined with the local gaseous covariates. The best performances (the
highest-ML models in Table 6) are obtained using the local NOx plus remote UFP; the models
explain more than 90% of the variance at all sites.
The models that combine remote UFP with local O3 or CO perform better than both the models
that use only local O3 or CO covariates (Table 3) and the models based on remote UFP (Table 4).
Table 6: Mixed estimation: evaluation of the models at half hour resolution and 14 days of
training in terms of ML, R2 and RMSE. The models with the highest ML are marked with an
asterisk (*). Remote covariates: X = used, - = not used.
Target model                   Local covariate  UFP Site 1  UFP Site 2  UFP Site 3  R2    RMSE  ML    Deval (days)
UFP Site 1
  GPS1(NOxS1, UFPS2, UFPS3) *  NO/NO2           -           X           X           0.91  0.32  -114  14
  GPS1(O3S1, UFPS2, UFPS3)     O3               -           X           X           0.70  0.58  -282  14
  GPS1(COS1, UFPS2, UFPS3)     CO               -           X           X           0.77  0.50  -231  14
UFP Site 2
  GPS2(NOxS2, UFPS1, UFPS3) *  NO/NO2           X           -           X           0.91  0.35  -69   9
  GPS2(O3S2, UFPS1, UFPS3)     O3               X           -           X           0.89  0.34  -177  9
UFP Site 3
  GPS3(NOxS3, UFPS1, UFPS2) *  NO/NO2           X           X           -           0.92  0.32  -265  14
  GPS3(O3S3, UFPS1, UFPS2)     O3               X           X           -           0.82  0.42  -404  14
  GPS3(COS3, UFPS1, UFPS2)     CO               X           X           -           0.80  0.50  -367  14
Table 7: Mixed estimation: evaluation of the models at half hour resolution and 14 days of
training in terms of ML, R2 and RMSE. Remote covariates: X = used, - = not used.
Target model                   Local covariate  NO/NO2 Site 1  NO/NO2 Site 2  NO/NO2 Site 3  R2    RMSE  ML    Deval (days)
UFP Site 1
  GPS1(NOxS1, NOxS2, NOxS3)    NO/NO2           -              X              X              0.91  0.35  -136  9
  GPS1(O3S1, NOxS2, NOxS3)     O3               -              X              X              0.74  0.62  -340  9
  GPS1(COS1, NOxS2, NOxS3)     CO               -              X              X              0.81  0.53  -299  9
UFP Site 2
  GPS2(NOxS2, NOxS1, NOxS3)    NO/NO2           X              -              X              0.84  0.42  -188  9
  GPS2(O3S2, NOxS1, NOxS3)     O3               X              -              X              0.80  0.48  -275  9
UFP Site 3
  GPS3(NOxS3, NOxS1, NOxS2)    NO/NO2           X              X              -              0.89  0.39  -332  9
  GPS3(O3S3, NOxS1, NOxS2)     O3               X              X              -              0.82  0.44  -411  9
  GPS3(COS3, NOxS1, NOxS2)     CO               X              X              -              0.80  0.50  -412  9
Comparison between Tables 3 and 7 shows that the models that use NOx measurements from all
the sites perform similarly to the models that use only the local covariates. In other words, the
remote NOx measurements do not improve the estimations based on local gaseous components
only. On the other hand, comparing Tables 3, 5 and 7, we note that models that combine remote
NOx with local O3 or CO perform better than both the models that use local O3 or CO and the
models based on remote NOx.
From Figure 5, we can observe that the model tends to underestimate low and high values of
UFP at site 2. However, these deviations are not substantial, and the estimated distributions seem
to describe the measurements well.
In summary: (i) the addition of remote UFP to local NOx results in improved model
performance; (ii) the addition of remote NOx to local NOx does not improve the estimation
based on local NOx measurements; (iii) the addition of remote UFP or NOx to local O3 or CO
results in improved estimations compared to models that use only local O3 or CO measurements.
3.3 Training length
In practical situations such as designing the measurement campaign and planning the facilities
needed, it is useful to know how the model performs according to the amount of data used for
training. In Figure 6, the model performance for each site and for each monitoring strategy is
evaluated for different numbers of training days at 30 min resolution (solid lines). One day of
training refers to the day before the first day of evaluation, two days of training means the two
days before the first day of evaluation, and so on up to 14 days.
The plots show that the performance of the models increases with the training length. A
training period of at least seven days (of which at least two are weekend days) seems suitable, in
terms of a trade-off between costs and model performance, to let the model learn the UFP
dynamics under different types of traffic.
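The way the training window grows backwards from the first day of evaluation can be sketched as follows. This is a minimal illustration, assuming a regularly sampled series at 30 min resolution (48 samples per day); the function name, the 28-day series and the index layout are hypothetical and are not taken from the study's data handling.

```python
SAMPLES_PER_DAY = 48  # 30 min resolution gives 48 samples per day

def training_window(series, eval_start, n_days, samples_per_day=SAMPLES_PER_DAY):
    """Return the n_days of samples immediately before index eval_start."""
    start = eval_start - n_days * samples_per_day
    if start < 0:
        raise ValueError("not enough data before the evaluation period")
    return series[start:eval_start]

# Hypothetical 28-day series; sample indices stand in for measurements.
series = list(range(28 * SAMPLES_PER_DAY))
eval_start = 14 * SAMPLES_PER_DAY  # evaluation begins after 14 days

for n_days in (1, 7, 14):
    window = training_window(series, eval_start, n_days)
    print(n_days, "day(s):", len(window), "samples, indices", window[0], "to", window[-1])
```

Every window ends at the sample just before the evaluation period, so a longer training length only adds older data; a seven-day window therefore always contains the most recent full week, including its weekend days.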
Figure 5: Mixed estimation. The left column shows the time series plots of the estimated and the
measured UFP number concentrations. The dashed grey line is the estimated UFP number
concentration and the black line is the measured UFP number concentration relative to the
evaluation period (Deval). The middle column shows the scatterplots, R2 and RMSE between the
estimated and measured UFP number concentrations. The grey lines have slope 1 and an
intercept of 0 (ideal case, when the estimated and measured values are equal). The dashed grey
lines delimit the FAC2 area. The right column shows the QQ plots between the estimated and
measured UFP number concentrations. The top row refers to site 1, the middle row refers to site
2 and the bottom row refers to site 3.
3.4 Models at 5 min resolution
All the above results are based on half hour resolution. Considering the high variability of UFP, it
is also interesting to have models with a higher time resolution. In Figure 6, the performances of
the models for each site and for each monitoring strategy are evaluated for different numbers of
training days at 5 min resolution (dashed lines). The results of these models, as for the half hour
models, show a good correspondence of the modelled UFP values with the measured values.
At site 1, the local and mixed estimation models explain up to 85% of the variance, and the
remote estimation model around 60%. At site 2, the mixed estimation model explains up to 85%
of the variance, and the local and remote models up to 78% of the variance. At site 3, the mixed
estimation model explains up to 90% of the variance, the local estimation model explains up to
86% of the variance and the remote estimation model explains up to 72% of the variance.
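The relation between the two time resolutions can be illustrated with a short sketch. The text does not state how the half hour and 5 min series were derived from the raw measurements, so simple block averaging (six 5 min samples per half hour value) is an assumption here, and the function name and sample values are hypothetical.

```python
def block_means(samples, block_size=6):
    """Average consecutive blocks; six 5 min samples make one 30 min value.

    A trailing incomplete block is dropped.
    """
    means = []
    for start in range(0, len(samples) - block_size + 1, block_size):
        block = samples[start:start + block_size]
        means.append(sum(block) / block_size)
    return means

# Hypothetical 5 min values covering one hour, for illustration only.
five_min = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0,
            30.0, 30.0, 30.0, 30.0, 30.0, 30.0]
half_hour = block_means(five_min)
print(half_hour)  # [15.0, 30.0]
```

At 5 min resolution a day contributes 288 samples instead of 48, so the same number of training days gives the model six times as many points and exposes short-lived peaks that the half hour averages smooth out.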
3.5 Network complexity
The three estimation strategies have different levels of complexity. The local estimation
strategy requires, at the estimation site, the local covariate monitors or sensors (e.g. NOx) for
the whole period (training and estimation), plus the UFP monitor for the training period. The
remote estimation strategy requires local UFP for the training, and remote NOx or UFP for the
training and estimation periods. The mixed estimation strategy requires UFP data at the
estimation site for the training period, plus local NOx and remote UFP or NOx data for the
training and estimation periods. The mixed strategy is, however, a costly solution compared
with the local estimation case, given the number of monitoring devices needed and the rather
limited increase in estimation accuracy.
Figure 6: Performances of the GP models at half hour (solid lines) and 5 min (dashed lines)
resolution evaluated on different days of training. First row: coefficient of determination (R2);
second row: root mean square error (RMSE). First column refers to site 1, middle column refers
to site 2 and the right column refers to site 3. One day of training refers to the day before the first
day of evaluation, two days of training means two days before the first day of evaluation and so
on up to 14 days.
3.6 Limitations
The applied modelling approach also has its limitations. For instance, there is no guarantee that
the proposed model structure is optimal. However, different covariates (e.g. traffic and
meteorological data) could be easily added to the proposed structure. For instance, considering
that the rain influences the concentrations of gaseous pollutants and UFP differently, models that
include weather conditions may have better performances.
The models are developed and trained primarily for use at traffic locations within city
boundaries. All three locations in this study are urban traffic locations, and their pollution profile
is dominated by traffic emissions. The three locations are distinct from each other in terms of
traffic intensity, distance to traffic and surrounding street pattern. We have tested the method
simultaneously at these three different traffic locations, and the results were found to be
encouraging. Therefore, we assume that the proposed method could be applied to other traffic
locations to address part of the spatial inhomogeneity of UFP between sites within a city reported
in the literature (Mejia et al., 2008; Buonanno et al., 2011; Mishra et al., 2012; Birmili et al., 2013;
Kumar et al., 2014). However, this assumption could not be tested with the available data set.
Moreover, this study cannot assess how models trained in one area/city perform in other
areas/cities with different fleet composition, traffic dynamics and meteorological circumstances.
The transferability of these models to other areas is probably limited when circumstances differ
substantially. In that case, a new data collection period should be carried out for model training.
A further limitation of the data set used is that it is only one month long, and considering that
half of it has been used for training, only half a month was left for the evaluation. This restricted
the possibility to assess questions such as how long the proposed model will perform
satisfactorily, and how often the training has to be performed.
The measurements used in this study were performed during winter, when the influence of
photochemical reactions is rather limited. Considering that ratios of NO-NO2-O3 are strongly
influenced by photochemistry, and secondary formation of UFP is partially driven by
atmospheric photochemical reactions and conversion (Westerdahl et al., 2005; Seinfeld and
Pandis, 2006), it could be interesting to study the long-term performance of the proposed
models by applying them to data sets that cover various seasons.
Finally, the lower cut-off limit of UFP used here does not account for the nucleation mode
particles that are volatile and much more dynamic. It would be interesting to use the model on
such a data set to evaluate its performance.
4. Summary and conclusions
In this work, we investigated strategies to estimate UFP at specific locations based on
concentrations of gaseous components at the same and remote locations, and/or UFP at remote
locations. We have used Gaussian process regression to estimate UFP at three sites in an air
pollution monitoring network.
In the local estimation, we found that the models that use NOx have the best performances. This
strategy would be especially interesting in case a dense network of low-cost gas sensors can be
deployed: novel low-cost gas sensors are being developed with an increasing level of performance
(Mead et al., 2013; Kumar et al., 2015).
The case of the remote estimation reflects the situation where one tries to estimate UFP in
locations where no local measurements are available. We used the measurements from two
locations to estimate UFP at a third location. On a practical level this corresponds to the
installation of permanent monitoring devices at two locations, and training the models at all
similar locations of interest. The results also suggest that it is possible to estimate UFP at one
location based on measurements of NOx at two remote locations. This would give rise to the
possibility to install a limited number of NOx monitors at specific locations to estimate UFP at
all similar locations in the same city.
The case of the mixed estimation examines combinations of remote and local measurements to
improve the model performance. This strategy requires the highest number of monitoring
devices, and thus presents a trade-off between higher accuracy and increased costs. In practical
terms we can conclude that estimations based on remote UFP are improved by adding local
covariates to take into account local variability.
Acknowledgement
This research is part of the IDEA (Intelligent, Distributed Environmental Assessment) project,
financially supported by IWT-Vlaanderen (IWT-SBO 080054). The authors thank Carl
Rasmussen and Hannes Nickisch for making the GPML Toolbox available.
References
Alvarez R, Weilenmann M, Favez, J.Y, 2008. Evidence of increased mass fraction of NO2 within
real-world NOx emissions of modern light vehicles – derived from a reliable online