A comparison of strategies for estimation of ultrafine particle number concentrations in urban air pollution monitoring networks


Matteo Reggente a,*,1, Jan Peters a, Jan Theunis a, Martine Van Poppel a, Michael Rademaker b, Bernard De Baets b, Prashant Kumar c,d

a VITO, Flemish Institute for Technological Research, Boeretang 200, B-2400 Mol, Belgium

b Department of Mathematical Modelling, Statistics and Bioinformatics, Ghent University, Coupure links 653, 9000 Gent, Belgium

c Department of Civil and Environmental Engineering, Faculty of Engineering and Physical Science (FEPS), University of Surrey, GU2 7XH, United Kingdom

d Environmental Flow (EnFlo) Research Centre, FEPS, University of Surrey, GU2 7XH, United Kingdom

Abstract

We propose three estimation strategies (local, remote and mixed) for ultrafine particles (UFP) at three sites in an urban air pollution monitoring network. Estimates are obtained through Gaussian process regression based on concentrations of gaseous pollutants (NOx, O3, CO) and UFP. In the local strategy, we use local measurements of gaseous pollutants (local covariates) to estimate UFP at the same site. In the remote strategy, we use measurements of gaseous pollutants and UFP from two independent sites (remote covariates) to estimate UFP at a third site. In the mixed strategy, we use local and remote covariates to estimate UFP. The results suggest that: UFP can be estimated with good accuracy based on NOx measurements at the same location; it is possible to estimate UFP at one location based on measurements of NOx or UFP at two remote locations; and the addition of remote UFP to local NOx, O3 or CO measurements improves the models' performance.

Capsule abstract:

UFP can be estimated with good accuracy at one location based on NOx measurements at the same location and based on measurements of NOx or UFP at two remote locations.

Key words: Ultrafine particles estimation; urban air pollution; pollution monitoring network; Gaussian process regression; statistical modelling.

* Corresponding author. E-mail address: [email protected] (Matteo Reggente). 1 Present address: Atmospheric Particle Research Laboratory, Ecole Polytechnique Fédérale de Lausanne (EPFL), CH-1015 Lausanne, Switzerland. Tel: +41 21 69 36331.

1. Introduction

Exposure to traffic-related pollution, especially UFP and nitrogen oxides (NOx), is of great concern in urban environments because of their adverse impact on human health (Hong et al., 2002; de Hartog et al., 2003; Atkinson et al., 2010; Jacobs et al., 2010; Bos et al., 2011; Kumar et al., 2011a; Kumar et al., 2014).

UFP are commonly defined as particles having a diameter of less than 100 nm (Morawska et al., 1998), and the consensus is that these particles contribute most (around 80%) of the total particle number concentration (PNC) (Heal et al., 2012; Kumar et al., 2011b; Morawska et al., 2008; Charron and Harrison, 2003), whereas their corresponding mass accounts for less than 20% of the total particle mass concentration (Kittelson, 1998). UFP can be classified into the "nucleation", "Aitken" and "accumulation" modes. In terms of size ranges, the nucleation, Aitken and accumulation modes typically encompass 1–30, 20–100 and 30–300 nm, respectively. Particles with a diameter below 30 nm contain nearly 30% of total PNC (Morawska et al., 2008; Kumar et al., 2010).

Road vehicle emissions in polluted urban environments can contribute up to 90% of the total PNC (Kumar et al., 2010; Pey et al., 2009). The UFP along the roadside show an association with the vehicle flow characteristics. For instance, increasing vehicle speed increases the emissions of UFP (Kittelson et al., 2004). Among the road vehicles, diesel engines dominate road traffic emission of UFP, and heavy duty vehicles have an average factor of magnitude of two with respect to the light duty engine (Beddows and Harrison, 2008).

UFP vary spatially between the sources and the receptors living or travelling close to the roads (Kumar et al., 2014). This variation depends on many factors such as source type and strength, meteorological and dilution conditions, location geometry and transformation processes, among others (Heal et al., 2012; Goel and Kumar, 2014).

Currently there is no limit value to control ambient UFP. Consequently, few UFP monitors are deployed as part of governmental monitoring stations. On the other hand, NOx, ozone (O3) and carbon monoxide (CO) are regulated pollutants (Directive 2008/50/EC) and their monitors are spread all over Europe. Nitric oxide (NO) and nitrogen dioxide (NO2) together make up NOx. Emissions of NOx are associated with all types of high-temperature combustion but, as with UFP, their most important sources in urban areas remain road vehicles (Westmoreland et al., 2007; Alvarez et al., 2008; Kumar and Imam, 2013).

The dispersion modelling of pollutants mostly fits into two categories: deterministic and statistical. Deterministic dispersion models provide a link between theory and measurements and account for source dynamics and physico-chemical processes explicitly (Holmes and Morawska, 2006). A drawback of these models is that they need detailed information (e.g. boundary conditions), which is not always available. Statistical models do not describe the actual physical processes, but they treat the input data as random variables to derive a statistical description of the target distribution using a set of measurements. A few studies have used a statistical approach in the past (Hussein et al., 2006; Clifford et al., 2011; Mølgaard et al., 2012; Sabaliauskas et al., 2012; Reggente et al., 2014).

We employ a statistical modelling approach, Gaussian process (GP) regression, to estimate UFP in an urban air pollution monitoring network based on local and remote concentrations of NOx, O3, CO and UFP.

2. Materials and Methods

2.1. Instrumentation

We recorded UFP and gaseous pollutants for one month at a sampling interval of 5 min and then averaged the data on a half-hourly basis.
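The half-hourly averaging can be sketched as follows. This is a minimal illustration, not the authors' processing code: the function name `half_hourly_means` is hypothetical, and discarding an incomplete trailing block is an assumption, since the text does not say how gaps or partial half hours were handled.

```python
from statistics import mean

def half_hourly_means(samples, per_block=6):
    """Average consecutive 5 min samples into half-hourly values (6 samples each)."""
    blocks = [samples[i:i + per_block] for i in range(0, len(samples), per_block)]
    # Assumption: keep only complete blocks, so every mean covers a full half hour.
    return [mean(block) for block in blocks if len(block) == per_block]

# Hypothetical 5 min UFP readings: two complete half hours plus one stray sample.
five_min_ufp = [100, 110, 120, 130, 140, 150, 200, 200, 200, 200, 200, 200, 50]
print(half_hourly_means(five_min_ufp))
```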

Measurements of UFP were obtained using the GRIMM Nano-Check model 1.320. The Nano-Check can count total PNCs between 25 and 300 nm, and provides the mean diameter of the measured size range.

Chemiluminescence (EN 14211), ultraviolet photometry (EN 14625) and non-dispersive infrared (EN 14626) analysers (Air-pointer) were used to measure NOx, O3 and CO, respectively. The lowest detectable concentration was 1 µg m-3 for NOx and O3, and 50 µg m-3 for CO.

Vehicle counts were recorded in four categories (cars, vans, small and big trucks/buses) using double inductive loop detectors at sites 1 and 3; video counting was performed to obtain traffic data at site 2 (Table 1).

Table 1: Description of the measurement sites.

| Site | Distance from traffic (m) | Weekday traffic volume (veh/day) | Weekend traffic volume (veh/day) | Heavy duty vehicles on weekday (weekend) (% total) |
|---|---|---|---|---|
| Site 1 | ~3 | 5000 | 4000 | 5% (2%) |
| Site 2 | ~2 | 4000 | 3000 | 4% (2%) |
| Site 3 | ~20-30 | 37,000 | 25,000 | 7% (3%) |

2.2 Description of the sampling locations

Measurements were carried out in the Borgerhout district (51º 13′ N and 4º 26′ E) of Antwerp, Belgium. Borgerhout is a typical urban commercial and residential area with busy traffic. Measurements were carried out simultaneously for one month (12/02/2010–12/03/2010) at three different sites (Figure 1). Sites 1 and 2 were located in two street canyons with two traffic lanes and moderate levels of traffic. The monitoring devices were deployed in parking lots (a few metres from the traffic). Site 3 was located in a parking area ∼30 m from a major access road with busy traffic intersections and four lanes (two in each direction) and ∼200 m from a highway.

Figure 1: Map of the measurement sites (Antwerp, Belgium) and their distances from each other. The images show the deployed instrumentation at each site. The black arrow in the image of site 3 shows the location of the deployed monitors.

2.3. Description of the model

2.3.1 Gaussian process regression

We treat the estimation problem as a non-parametric regression problem, and solve it using Gaussian process (GP) regression.

Definition: A Gaussian process is a collection of random variables, any finite number of which have a joint Gaussian distribution (Rasmussen and Williams, 2006). We want to learn, from a set of measurements (D), a function f(·) describing the relationship between the set of covariates x (NOx, CO, O3, UFP) and the target variable, UFP (y), assuming that the observed data y is generated with Gaussian noise around the underlying function f:

$$\mathbf{y} = f(\mathbf{x}) + \epsilon \qquad (1)$$

Because of the nature of the dataset used, we do not assume independent noise, and the dependencies are modelled by adding a noise term to the covariance function (kNoise). This method has been suggested by Rasmussen and Williams (2006) and by Murray-Smith and Girard (2001).

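Prior beliefs about the properties of the latent function are included in the mean m(x) and covariance k(x,x′) functions. In order to estimate UFP based on data, we consider the joint Gaussian prior of the training observations y and the test outputs f∗. The posterior distribution is obtained by conditioning the prior on the observed training outputs, such that the conditional distribution of f∗ only contains those functions from the prior that are consistent with the training data:

$$p(\mathbf{f}_* \mid \mathbf{X}_*, \mathbf{X}, \mathbf{y}) = \mathcal{N}(\boldsymbol{\mu}_*, \boldsymbol{\Sigma}_*) \qquad (2)$$

where

$$\boldsymbol{\mu}_* = K(\mathbf{X}_*, \mathbf{X})\left[K(\mathbf{X}, \mathbf{X}) + \sigma_n^2 \mathbf{I}\right]^{-1}\mathbf{y} \qquad (3)$$

$$\boldsymbol{\Sigma}_* = K(\mathbf{X}_*, \mathbf{X}_*) - K(\mathbf{X}_*, \mathbf{X})\left[K(\mathbf{X}, \mathbf{X}) + \sigma_n^2 \mathbf{I}\right]^{-1}K(\mathbf{X}, \mathbf{X}_*) \qquad (4)$$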

are the posterior mean and the posterior variance, respectively. K is the covariance matrix, which is built from a covariance function (or kernel) k(x,x′). X and X∗ are the matrices of the training and test inputs, respectively.
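As a rough illustration of Eqs. (3) and (4), the sketch below computes the posterior mean and the pointwise posterior variance for scalar inputs. It is not the authors' implementation: the squared exponential kernel `sq_exp`, the helper names and the toy data are assumptions made for the example, and a Cholesky factorisation is used in place of an explicit matrix inverse.

```python
import math

def sq_exp(a, b, ell=1.0):
    """Squared exponential kernel for scalar inputs (illustrative choice only)."""
    return math.exp(-((a - b) ** 2) / (2.0 * ell ** 2))

def cholesky(A):
    """Lower-triangular L with L L^T = A, for symmetric positive definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def chol_solve(L, b):
    """Solve (L L^T) x = b by forward then backward substitution."""
    n = len(b)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_posterior(x_train, y_train, x_test, kernel, noise_var):
    """Posterior mean (Eq. 3) and pointwise variance (diagonal of Eq. 4)."""
    K = [[kernel(a, b) + (noise_var if i == j else 0.0)
          for j, b in enumerate(x_train)] for i, a in enumerate(x_train)]
    L = cholesky(K)
    alpha = chol_solve(L, y_train)        # [K + sigma_n^2 I]^{-1} y
    means, variances = [], []
    for xs in x_test:
        k_star = [kernel(xs, a) for a in x_train]
        means.append(sum(k * a for k, a in zip(k_star, alpha)))
        v = chol_solve(L, k_star)         # [K + sigma_n^2 I]^{-1} k_*
        variances.append(kernel(xs, xs) - sum(k * w for k, w in zip(k_star, v)))
    return means, variances

# Toy data: with little noise the mean reproduces a training target, and far
# from the data the variance returns to the prior value k(x, x) = 1.
means, variances = gp_posterior([0.0, 1.0, 2.0], [0.0, 1.0, 0.0],
                                [1.0, 10.0], sq_exp, noise_var=1e-6)
print(means, variances)
```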

We have assumed that the mean of the GP prior is zero everywhere. At first glance, this could appear restrictive, but in practice it is not, because offsets and simple trends can be eliminated before modelling. The covariance function defines similarity between data points and it is chosen such that it reflects the prior beliefs about the function to be learned. Because UFP follow from a dynamic process, we have used a non-stationary kernel based on the addition of a linear (kLin) and a rational quadratic kernel (kRQ). Moreover, we also include a noise term (kNoise) to take into account noise dependencies. The sum of kernels allows us to model the data as a superposition of independent functions representing different structures:

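$$k(\mathbf{x}, \mathbf{x}') = k_{Lin}(\mathbf{x}, \mathbf{x}') + k_{RQ}(\mathbf{x}, \mathbf{x}') + k_{Noise}(\mathbf{x}, \mathbf{x}') \qquad (5)$$

$$k_{Lin}(\mathbf{x}, \mathbf{x}') = \sigma_0^2 + \mathbf{x}^{T}\mathbf{x}' \qquad (6)$$

$$k_{RQ}(\mathbf{x}, \mathbf{x}') = \sigma^2\left(1 + \frac{\lVert\mathbf{x} - \mathbf{x}'\rVert^2}{2\alpha\ell^2}\right)^{-\alpha} \qquad (7)$$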

For the noise model (kNoise) we use the sum of a squared exponential (kSE) contribution and an independent component:

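$$k_{Noise}(\mathbf{x}, \mathbf{x}') = \sigma_l^2\exp\left(-\frac{\lVert\mathbf{x} - \mathbf{x}'\rVert^2}{2\ell_l^2}\right) + \sigma_n^2\,\delta_{\mathbf{x}\mathbf{x}'} \qquad (8)$$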
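The composite covariance function of Eqs. (5) to (8) can be sketched in a few lines. This is an illustrative reading rather than the authors' code: the function names, the dictionary `theta`, and the assignment of σl and ℓl to the squared exponential part and σn to the independent part are assumptions, and the rational quadratic term is written in the standard form given by Rasmussen and Williams (2006).

```python
import math

def sq_dist(x, xp):
    """Squared Euclidean distance between two covariate vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, xp))

def k_lin(x, xp, sigma0):
    """Linear kernel with offset (Eq. 6)."""
    return sigma0 ** 2 + sum(a * b for a, b in zip(x, xp))

def k_rq(x, xp, sigma, alpha, ell):
    """Rational quadratic kernel in its standard form (Eq. 7)."""
    return sigma ** 2 * (1.0 + sq_dist(x, xp) / (2.0 * alpha * ell ** 2)) ** (-alpha)

def k_noise(x, xp, sigma_l, ell_l, sigma_n, same_point):
    """Squared exponential part plus independent noise on identical data points (Eq. 8)."""
    correlated = sigma_l ** 2 * math.exp(-sq_dist(x, xp) / (2.0 * ell_l ** 2))
    return correlated + (sigma_n ** 2 if same_point else 0.0)

def k_total(x, xp, theta, same_point=False):
    """Sum of the three kernels (Eq. 5); same_point plays the role of the Kronecker delta."""
    return (k_lin(x, xp, theta["sigma0"])
            + k_rq(x, xp, theta["sigma"], theta["alpha"], theta["ell"])
            + k_noise(x, xp, theta["sigma_l"], theta["ell_l"], theta["sigma_n"], same_point))

# Hypothetical hyperparameter values, chosen only to make the arithmetic easy to follow.
theta = {"sigma0": 1.0, "sigma": 1.0, "alpha": 1.0, "ell": 1.0,
         "sigma_l": 1.0, "ell_l": 1.0, "sigma_n": 1.0}
print(k_total((0.0, 0.0), (0.0, 0.0), theta, same_point=True))
```

The `same_point` flag is set when building the diagonal of the training covariance matrix, so the independent noise term is added only where a data point is paired with itself.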

The hyperparameters are σ0 (offset of the model), σ, σn and σl (magnitudes), α (relative weighting), ℓ and ℓl (length-scales). The reliability of the regression depends on how well we select the covariance function and therefore the covariance hyperparameters θ. The hyperparameters are selected by minimising the negative log marginal likelihood with respect to θ. Since by assumption the distribution of the data is Gaussian, the log marginal likelihood is:

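$$\mathcal{L} = \log p(\mathbf{y} \mid \mathbf{X}, \boldsymbol{\theta}) = -\frac{1}{2}\mathbf{y}^{T}K(\mathbf{X}, \mathbf{X})^{-1}\mathbf{y} - \frac{1}{2}\log\lvert K(\mathbf{X}, \mathbf{X})\rvert - \frac{n}{2}\log 2\pi \qquad (9)$$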

The values of the hyperparameters that optimize the marginal likelihood are found using its partial derivatives in conjunction with a numerical optimization routine based on conjugate gradients. We refer to Chapters 2 and 5 in Rasmussen and Williams (2006) for a detailed description of GP models.

The major limitation of GP regression is the computational complexity, since it requires matrix inversion, which has a complexity of O(n³), where n is the number of training data points. Different solutions have been proposed to cope with this problem (e.g. Higdon, 1998; Rasmussen and Ghahramani, 2002; Snelson and Ghahramani, 2006). In this work, in the case of 5 min data, we have used the FITC approximation (Snelson and Ghahramani, 2006).

2.3.2. Estimation strategies

Figure 2 depicts the three different strategies that we have employed to estimate UFP at each site in the monitoring network. For the sake of brevity, we show only the estimation of UFP number concentration (target of the model) at site 1 (black dot) in each panel of Figure 2.

Considering the high cost of the pollutant monitors (~10,000 Euro), we have evaluated models that use covariates from only one monitor at each site. The inclusion of additional covariates requires the inclusion of one monitor for each covariate, increasing the costs of the instrumentation and maintenance.

Local estimation: At each site, we use local measurements of NOx, O3 and CO (local covariates) to estimate UFP at the same site.

Remote estimation: In this strategy, we use either UFP or NOx measurements from two sites to estimate UFP at a third site. For this strategy, we evaluate the models for the cases in which UFP measurements are either included or not in the set of covariates.

Mixed estimation: In this strategy, we use combinations of local gaseous pollutants measurements (local covariates) and remote UFP or gaseous pollutants measurements (remote covariates) to estimate UFP at the target site. Also for this strategy, we evaluate the models for the cases in which remote UFP number concentration measurements are either included or not in the set of covariates.
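The three covariate sets described above can be written down compactly. The sketch below is only a bookkeeping illustration: the function `covariates`, its argument names and the strategy labels are hypothetical, and it assumes one local gas monitor at the target site and one remote variable measured at the other two sites.

```python
def covariates(strategy, target, local_gas="NOx", remote_var="UFP", sites=(1, 2, 3)):
    """List the (variable, site) pairs used as model inputs for one target site."""
    local = [(local_gas, target)]
    remote = [(remote_var, s) for s in sites if s != target]
    if strategy == "local":
        return local
    if strategy == "remote":
        return remote
    if strategy == "mixed":
        return local + remote
    raise ValueError(f"unknown strategy: {strategy}")

# Covariate sets for estimating UFP at site 1 under each strategy.
for name in ("local", "remote", "mixed"):
    print(name, covariates(name, target=1))
```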

Figure 2: Estimation strategies and their set of covariates for: local estimation (left), remote estimation (middle), and mixed estimation (right). Black dots depict the site of the UFP estimation.

2.4. Model evaluation

In order to evaluate the model we have followed the steps suggested by Bennett et al. (2013). First, we have divided the dataset at each site into two disjoint datasets. The data collected during the first two weeks of the measurement campaign have been used as training set (Dtrain). The data collected during the third and fourth weeks of the measurement campaign have been used as unseen data to evaluate the proposed model (Deval). At site 2, the evaluation dataset is limited to 9 days due to monitor malfunctioning. In the second step, we have used the highest marginal likelihood (ML) to select the models that combine a good fit with a low complexity. At this stage, we have compared models (at half hour resolution) based on the maximum length (14 days) of training. In the third step, we have evaluated the selected models in terms of their ability to estimate unseen measurements. We have used the R2 and RMSE metrics because their wide usage aids communication of the model performance.

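2.4.1. Marginal Likelihood (ML)

The log marginal likelihood is

$$\log p(\mathbf{y} \mid \mathbf{X}, \boldsymbol{\theta}) = -\frac{1}{2}\mathbf{y}^{T}K(\mathbf{X}, \mathbf{X})^{-1}\mathbf{y} - \frac{1}{2}\log\lvert K(\mathbf{X}, \mathbf{X})\rvert - \frac{n}{2}\log 2\pi \qquad (14)$$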

The first term gives a measure of the quality of the model fit. It is the only term that involves observed targets. The second term is a complexity penalty term, which measures and penalizes the complexity of the model. The third term is a log normalization term. Models with a higher ML should be preferred to models with a lower ML.
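To make the role of the three terms concrete, the sketch below evaluates them separately for a given covariance matrix. It is an illustration under stated assumptions, not the authors' routine: the function name is hypothetical, K is assumed to be symmetric positive definite and to already contain any noise contribution, and a Cholesky factorisation replaces the explicit inverse and determinant.

```python
import math

def log_marginal_likelihood_terms(K, y):
    """Return the (data fit, complexity penalty, normalisation) terms of the log ML."""
    n = len(y)
    # Cholesky factor L (lower triangular) with L L^T = K.
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    # Forward substitution: solve L z = y, so that y^T K^{-1} y = z^T z.
    z = [0.0] * n
    for i in range(n):
        z[i] = (y[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    data_fit = -0.5 * sum(v * v for v in z)
    # log|K| = 2 * sum(log L_ii), hence -0.5 * log|K| = -sum(log L_ii).
    complexity = -sum(math.log(L[i][i]) for i in range(n))
    normalisation = -0.5 * n * math.log(2.0 * math.pi)
    return data_fit, complexity, normalisation

# Toy example with a diagonal covariance matrix.
terms = log_marginal_likelihood_terms([[2.0, 0.0], [0.0, 2.0]], [2.0, 0.0])
print(terms, sum(terms))
```

Only the first returned value changes when the targets change, which matches the observation that it is the only term involving the observed targets.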

2.4.2. Coefficient of determination (R2)

The coefficient of determination R2 indicates the fraction of variance of observations explained by the model:

$$R^2 = 1 - \frac{\sum_{m=1}^{M}\left(y_m - y_{*,m}\right)^2}{\sum_{m=1}^{M}\left(y_m - \bar{y}\right)^2} \qquad (15)$$

where ym and y∗,m are the measured and estimated UFP for evaluation measurement m; ȳ is the mean of the observed UFP; M is the number of evaluation measurements.

2.4.3. Root Mean Square Error (RMSE)

The root mean square error (RMSE) is calculated as the square root of the mean squared difference between the measured UFP and the estimated ones:

$$\mathrm{RMSE} = \sqrt{\frac{1}{M}\sum_{m=1}^{M}\left(y_m - y_{*,m}\right)^2} \qquad (16)$$

where ym and y∗,m are the measured and estimated UFP for evaluation measurement m and M is the number of evaluation measurements.
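Both evaluation metrics are straightforward to compute from paired measured and estimated values. The following sketch uses hypothetical function names and toy numbers; it follows the definitions in Sections 2.4.2 and 2.4.3 directly.

```python
import math

def r_squared(measured, estimated):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    y_bar = sum(measured) / len(measured)
    ss_res = sum((ym - ys) ** 2 for ym, ys in zip(measured, estimated))
    ss_tot = sum((ym - y_bar) ** 2 for ym in measured)
    return 1.0 - ss_res / ss_tot

def rmse(measured, estimated):
    """Root mean square error between measured and estimated values."""
    m = len(measured)
    return math.sqrt(sum((ym - ys) ** 2 for ym, ys in zip(measured, estimated)) / m)

# Toy example: estimating the mean everywhere explains none of the variance.
measured = [1.0, 2.0, 3.0]
print(r_squared(measured, [2.0, 2.0, 2.0]), rmse(measured, [2.0, 2.0, 2.0]))
```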

3. Results and discussion

In this section, first we present a statistical summary of UFP concentrations, diameters and gaseous pollutant concentrations recorded over the entire sampling period. Second, for each model strategy, we show the model selection results (based on ML). Third, we evaluate and discuss the performance of the selected GP models by comparing the estimated UFP with the measured ones. We conclude by assessing the models on different amounts of training data at half hour resolution and their performance at 5 min resolution. All the results are based on log-transformed and standardized data with zero mean and unit variance.
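The log transformation and standardization can be sketched as below. Several details are assumptions rather than statements from the text: the function names are hypothetical, the population standard deviation is used, the statistics are estimated on one set of values and reused for others, and all inputs must be strictly positive (the text does not say how zero concentrations were treated).

```python
import math
from statistics import mean, pstdev

def fit_log_standardiser(train_values):
    """Estimate mean and standard deviation of the log-transformed training values."""
    logs = [math.log(v) for v in train_values]   # requires strictly positive values
    return mean(logs), pstdev(logs)

def log_standardise(values, mu, sd):
    """Apply the log transform and rescale to zero mean and unit variance."""
    return [(math.log(v) - mu) / sd for v in values]

# Hypothetical concentrations whose logarithms are 0, 1 and 2.
train = [1.0, math.e, math.e ** 2]
mu, sd = fit_log_standardiser(train)
z = log_standardise(train, mu, sd)
print(z)
```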

3.1 Summary statistics

Table 2 shows that the UFP concentrations are within the same range at all three sites, although traffic density at site 3 is almost one order of magnitude higher than that at the other two sites (Table 1). Dilution effects probably lead to the lower UFP concentration at site 3 compared to what would have been expected from the traffic counts. The summary of data in Table 2 confirms that the traffic volume is not the only decisive factor in the variations of UFP. The distance from the moving traffic, the site morphology and dispersion conditions specific to individual sampling locations are also contributing factors (Kumar et al., 2014). These observations are also in agreement with the findings of Kumar et al. (2008), Buonanno et al. (2009), Kumar et al. (2009), Buonanno et al. (2011), Fujitani et al. (2012) and Peters et al. (2013).

The mean UFP diameters at the sites are also similar and vary between 48 nm (site 3) and 52 nm (site 1), with the maximum values ranging between 80 nm (sites 2 and 3) and 96 nm (site 1).

The NOx, CO and O3 concentrations (according to the medians and the inter-quartile ranges) are similar at all three sites. The higher traffic intensity at site 3 as compared to sites 1 and 2 is again probably offset by the larger distance to the traffic (Table 1) and resulting pollutant dilution.

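Table 2: Summary statistics of UFP number concentrations measured at the three sites.

| Site | Variable | Mean | Std | Median | Min | Max | Q1 | Q3 |
|---|---|---|---|---|---|---|---|---|
| Site 1 | UFP (# cm-3) | 22,810 | 12,934 | 20,628 | 1768 | 88,004 | 13,190 | 29,316 |
| Site 1 | UFP diameter (nm) | 52 | 9 | 50 | 32 | 96 | 46 | 57 |
| Site 1 | NO (µg m-3) | 57 | 69 | 36 | 0 | 571 | 14 | 70 |
| Site 1 | NO2 (µg m-3) | 56 | 26 | 54 | 5 | 150 | 36 | 73 |
| Site 1 | O3 (µg m-3) | 32 | 21 | 32 | 1 | 88 | 14 | 49 |
| Site 1 | CO (µg m-3) | 435 | 220 | 376 | 84 | 1658 | 286 | 515 |
| Site 2 | UFP (# cm-3) | 21,586 | 11,249 | 19,278 | 2168 | 80,355 | 13,866 | 27,486 |
| Site 2 | UFP diameter (nm) | 51 | 8 | 49 | 29 | 80 | 44 | 56 |
| Site 2 | NO (µg m-3) | 78 | 80 | 53 | 1 | 716 | 29 | 97 |
| Site 2 | NO2 (µg m-3) | 72 | 30 | 70 | 11 | 170 | 50 | 89 |
| Site 2 | O3 (µg m-3) | 27 | 19 | 23 | 3 | 83 | 9 | 41 |
| Site 2 | CO (µg m-3) | - | - | - | - | - | - | - |
| Site 3 | UFP (# cm-3) | 23,219 | 14,129 | 19,518 | 2528 | 87,210 | 13,703 | 28,883 |
| Site 3 | UFP diameter (nm) | 48 | 8 | 46 | 28 | 80 | 42 | 53 |
| Site 3 | NO (µg m-3) | 69 | 101 | 31 | 1 | 854 | 11 | 85 |
| Site 3 | NO2 (µg m-3) | 62 | 33 | 57 | 7 | 218 | 37 | 82 |
| Site 3 | O3 (µg m-3) | 30 | 21 | 26 | 1 | 91 | 10 | 47 |
| Site 3 | CO (µg m-3) | 322 | 221 | 270 | 25 | 1606 | 164 | 403 |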

3.2 Model selection and evaluation

3.2.1 Local estimation

The ML metric (based on Dtrain; Table 3) reveals that the models that use NOx as covariates (GPSn(NOxSn), n = 1,…,3, in bold) outperform the models that use O3 (GPSn(O3Sn), n = 1,…,3) or CO as covariates (GPSm(COSm), m = 1, 3). Moreover, the results based on the unseen measurements (R2 and RMSE metrics in Table 3) confirm that the selected models (GPSn(NOxSn)) outperform the others, and they show a good correspondence between the modelled and the measured values. At all three sites, the models explain between 87% (site 2) and 90% (site 1) of the variance.

These results are probably due to the strong correlation of UFP with NOx. In more detail, road vehicles are the major sources of UFP in urban environments (Kumar et al., 2014; Harrison et al., 2011; Pey et al., 2009). These vehicles also generate NO (from all types of combustion engines) and primary NO2 (especially from diesel cars equipped with after-treatment technologies including oxidation catalysts) at the same time. Moreover, the Belgian car fleet has a high share of diesel vehicles (64.3%; Beckx et al., 2013), which have high emissions of both UFP and NOx (Beddows and Harrison, 2008).

From Figure 3, we can observe that the model tends to underestimate high and low values of UFP at site 2 as opposed to underestimation of low values at site 3. It should be emphasized that these deviations are not substantial, and the estimated distributions seem to describe the measurements well. In particular, the models do not tend to underestimate high concentrations.

In summary: the GP models that use NOx as covariates outperform the models that use CO and O3 as covariates.

Table 3: Local estimation: evaluation of the models at half hour resolution and 14 days of training in terms of ML, R2 and RMSE. In bold are denoted the models with the highest ML.

| Target | Model | Local covariates | R2 | RMSE | ML | Deval (days) |
|---|---|---|---|---|---|---|
| UFP Site 1 | **GPS1(NOxS1)** | NO/NO2 | 0.90 | 0.35 | -115 | 14 |
| UFP Site 1 | GPS1(O3S1) | O3 | 0.57 | 0.67 | -466 | 14 |
| UFP Site 1 | GPS1(COS1) | CO | 0.55 | 0.67 | -511 | 14 |
| UFP Site 2 | **GPS2(NOxS2)** | NO/NO2 | 0.87 | 0.45 | -168 | 9 |
| UFP Site 2 | GPS2(O3S2) | O3 | 0.53 | 0.73 | -392 | 9 |
| UFP Site 3 | **GPS3(NOxS3)** | NO/NO2 | 0.88 | 0.39 | -403 | 14 |
| UFP Site 3 | GPS3(O3S3) | O3 | 0.65 | 0.62 | -520 | 14 |
| UFP Site 3 | GPS3(COS3) | CO | 0.40 | 0.78 | -588 | 14 |

Figure 3: Local estimation. The left column shows the time series plots of the estimated and the measured UFP number concentrations. The dashed grey line is the estimated UFP number concentration and the black line is the measured UFP number concentration relative to the evaluation period (Deval). The middle column shows the scatterplots, R2 and RMSE between the estimated and measured UFP number concentrations. The grey lines have slope 1 and an intercept of 0 (ideal case, when the estimated and measured values are equal). The dashed grey lines delimit the FAC2 area. The right column shows the QQ plots between the estimated and measured UFP number concentrations. The top row refers to site 1, the middle row refers to site 2 and the bottom row refers to site 3.

3.2.2 Remote estimation

Tables 4 and 5 show the results obtained in the remote estimation configuration. In Table 4, the evaluation refers to those models that use UFP data recorded at one or both of the other two sites (remote covariates) to estimate UFP at a third site. In Table 5, the evaluation refers to those models that use NOx measurements as remote covariates to estimate UFP at a third site.

The models selected in the training phase (higher ML), at all three sites, are the ones that use both UFP measurements recorded at the other two sites (in bold in Table 4). Moreover, the results based on the unseen measurements (R2 and RMSE metrics in Tables 4 and 5) confirm that the selected models outperform the others. Those models explain between 69% (site 1) and 87% (site 2) of the variance.

Comparison of these results with those obtained in the local estimation configuration (Table 3) shows that the model performances at sites 1 and 3 are weaker compared with the local estimation and similar at site 2. The weaker performance at two sites can be explained by the absence of local covariates.

Table 4: Remote estimation based on UFP covariates: evaluation of the models at half hour resolution and 14 days of training in terms of ML, R2 and RMSE. In bold are denoted the models with the highest ML. An X marks the remote covariates used.

| Target | Model | UFP Site 1 | UFP Site 2 | UFP Site 3 | R2 | RMSE | ML | Deval (days) |
|---|---|---|---|---|---|---|---|---|
| UFP Site 1 | **GPS1(UFPS2, UFPS3)** | | X | X | 0.69 | 0.58 | -398 | 14 |
| UFP Site 1 | GPS1(UFPS2) | | X | | 0.68 | 0.58 | -433 | 14 |
| UFP Site 1 | GPS1(UFPS3) | | | X | 0.58 | 0.65 | -557 | 14 |
| UFP Site 2 | **GPS2(UFPS1, UFPS3)** | X | | X | 0.87 | 0.35 | -190 | 14 |
| UFP Site 2 | GPS2(UFPS1) | X | | | 0.65 | 0.59 | -383 | 14 |
| UFP Site 2 | GPS2(UFPS3) | | | X | 0.82 | 0.42 | -345 | 14 |
| UFP Site 3 | **GPS3(UFPS1, UFPS2)** | X | X | | 0.81 | 0.42 | -423 | 14 |
| UFP Site 3 | GPS3(UFPS1) | X | | | 0.56 | 0.68 | -526 | 14 |
| UFP Site 3 | GPS3(UFPS2) | | X | | 0.80 | 0.45 | -442 | 14 |

Table 5: Remote estimation based on NO/NO2 covariates: evaluation of the models at half hour resolution and 14 days of training in terms of ML, R2 and RMSE. An X marks the remote covariates used.

| Target | Model | NO/NO2 Site 1 | NO/NO2 Site 2 | NO/NO2 Site 3 | R2 | RMSE | ML | Deval (days) |
|---|---|---|---|---|---|---|---|---|
| UFP Site 1 | GPS1(NOxS2, NOxS3) | | X | X | 0.67 | 0.61 | -405 | 9 |
| UFP Site 1 | GPS1(NOxS2) | | X | | 0.67 | 0.63 | -440 | 9 |
| UFP Site 1 | GPS1(NOxS3) | | | X | 0.61 | 0.76 | -556 | 14 |
| UFP Site 2 | | | | | | | | |
| UFP Site 3 | | | | | | | | |


In the case of models that use NOx measurements (Table 5) recorded at two sites (remote covariates) to estimate UFP at a third site, the best models are obtained using remote NOx measurements from two sites simultaneously. Compared with the models that use UFP as covariates, those models perform similarly at sites 1 and 3 and worse at site 2; they explain between 67% (site 1) and 80% (sites 2 and 3) of the variance.

Caution has to be taken when comparing the model performances reported in Tables 4 and 5. At site 2, gaseous measurements are limited to 9 days due to monitor malfunctioning (Section 2.4). Therefore, the performance of the models that use NOx covariates recorded at site 2 is computed using a shorter dataset (Deval) than the others (9 days instead of 14 days).

Figure 4: Remote estimation. The left column shows the time series plots of the estimated and the measured UFP number concentrations. The dashed grey line is the estimated UFP number








Tables 4 and 5 also show that the models based on two remote locations perform at least as well as models based on covariates from one remote location. For example, at sites 1 and 3, the models that use the remote covariates from site 2 perform similarly to the ones that use the covariates from the other two remote sites simultaneously. On the other hand, at site 1, the models that use the covariates from site 3, and at site 3, the models that use the covariates from site 1, have a weaker performance than the models that use the covariates from two remote sites simultaneously. At site 2, by contrast, all the models that use covariates from only site 1 or only site 3 have a weaker performance than the ones that use the covariates from two remote sites simultaneously.

From Figure 4, we can observe that the model tends to overestimate low values of UFP at site 1 and underestimate low values at site 2.

In summary: (i) model results are comparable when using remote UFP only or when using remote NOx only to estimate UFP at a distant location; (ii) models that use covariates from only one remote site have fair performance only if there is a priori knowledge of which of the two sites is more informative; (iii) models that use covariates from two remote sites do not need a priori knowledge of which of the two sites is more informative, because the models learn during the training period which covariate to give more importance to, by maximising the likelihood between the covariates and the target function.

3.2.3 Mixed estimation

Tables 6 and 7 show the performances of the models for the mixed estimation configuration. In Table 6 the evaluation refers to models that use local gaseous covariates (NOx, O3 and CO recorded at the same site where the estimations are made) in addition to UFP concentrations recorded at the other two sites (remote covariates). Table 7 shows the results of cases where only remote NOx (but not UFP) recorded at two sites is added to the local covariates (NOx, O3, CO).

Comparison of Tables 3–6 shows that the performances of models are improved when the remote UFP are combined with the local gaseous covariates. The best performances (in bold in Table 6) are obtained using the local NOx plus remote UFP; the models explain more than 90% of the variance at all sites.

The models that combine remote UFP with local O3 or CO perform better than both the models that use only local O3 or CO covariates (Table 3) and the models based on remote UFP only (Table 4).

Table 6: Mixed estimation: evaluation of the models at half hour resolution and 14 days of training in terms of ML, R2 and RMSE. In bold are denoted the models with the highest ML. An X marks the remote covariates used.

| Target | Model | Local covariates | UFP Site 1 | UFP Site 2 | UFP Site 3 | R2 | RMSE | ML | Deval (days) |
|---|---|---|---|---|---|---|---|---|---|
| UFP Site 1 | **GPS1(NOxS1, UFPS2, UFPS3)** | NO/NO2 | | X | X | 0.91 | 0.32 | -114 | 14 |
| UFP Site 1 | GPS1(O3S1, UFPS2, UFPS3) | O3 | | X | X | 0.70 | 0.58 | -282 | 14 |
| UFP Site 1 | GPS1(COS1, UFPS2, UFPS3) | CO | | X | X | 0.77 | 0.50 | -231 | 14 |
| UFP Site 2 | **GPS2(NOxS2, UFPS1, UFPS3)** | NO/NO2 | X | | X | 0.91 | 0.35 | -69 | 9 |
| UFP Site 2 | GPS2(O3S2, UFPS1, UFPS3) | O3 | X | | X | 0.89 | 0.34 | -177 | 9 |
| UFP Site 3 | **GPS3(NOxS3, UFPS1, UFPS2)** | NO/NO2 | X | X | | 0.92 | 0.32 | -265 | 14 |
| UFP Site 3 | GPS3(O3S3, UFPS1, UFPS2) | O3 | X | X | | 0.82 | 0.42 | -404 | 14 |
| UFP Site 3 | GPS3(COS3, UFPS1, UFPS2) | CO | X | X | | 0.80 | 0.50 | -367 | 14 |

Table 7: Mixed estimation: evaluation of the models at half hour resolution and 14 days of training in terms of ML, R2 and RMSE. An X marks the remote covariates used.

| Target | Model | Local covariates | NO/NO2 Site 1 | NO/NO2 Site 2 | NO/NO2 Site 3 | R2 | RMSE | ML | Deval (days) |
|---|---|---|---|---|---|---|---|---|---|
| UFP Site 1 | GPS1(NOxS1, NOxS2, NOxS3) | NO/NO2 | | X | X | 0.91 | 0.35 | -136 | 9 |
| UFP Site 1 | GPS1(O3S1, NOxS2, NOxS3) | O3 | | X | X | 0.74 | 0.62 | -340 | 9 |
| UFP Site 1 | GPS1(COS1, NOxS2, NOxS3) | CO | | X | X | 0.81 | 0.53 | -299 | 9 |
| UFP Site 2 | GPS2(NOxS2, NOxS1, NOxS3) | NO/NO2 | X | | X | 0.84 | 0.42 | -188 | 9 |
| UFP Site 2 | GPS2(O3S2, NOxS1, NOxS3) | O3 | X | | X | 0.80 | 0.48 | -275 | 9 |
| UFP Site 3 | GPS3(NOxS3, NOxS1, NOxS2) | NO/NO2 | X | X | | 0.89 | 0.39 | -332 | 9 |
| UFP Site 3 | GPS3(O3S3, NOxS1, NOxS2) | O3 | X | X | | 0.82 | 0.44 | -411 | 9 |
| UFP Site 3 | GPS3(COS3, NOxS1, NOxS2) | CO | X | X | | 0.80 | 0.50 | -412 | 9 |

Comparison between Tables 3 and 7 shows that the models that use NOx measurements from all the sites have similar performances compared to the models that use only the local NOx covariates. In other words, the remote NOx measurements do not improve the estimations based on local NOx only. On the other hand, comparing Tables 3, 5 and 7, we note that models that combine remote NOx with local O3 or CO perform better than both the models that use only local O3 or CO and the models based on remote NOx only.

From Figure 5, we can observe that the model tends to underestimate low and high values of UFP at site 2. However, these deviations are not substantial, and the estimated distributions seem to describe the measurements well.

In summary: (i) the addition of remote UFP to local NOx results in improved model performance; (ii) the addition of remote NOx to local NOx does not improve the estimation based on local NOx measurements; (iii) the addition of remote UFP or NOx to local O3 or CO results in improved estimations compared to models that use only local O3 or CO measurements.

3.3 Training length

In practical situations such as designing the measurement campaign and planning the facilities needed, it is useful to know how the model performs according to the amount of data used for training. In Figure 6, the model performance for each site and for each monitoring strategy is evaluated for different numbers of training days at 30 min resolution (solid lines). One day of training refers to the day before the first day of evaluation, two days of training means the two days before the first day of evaluation, and so on up to 14 days.
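The way the training window grows backwards from the start of the evaluation period can be sketched as follows. The function name, the record layout and the evaluation start date used here are hypothetical and serve only to illustrate the selection rule.

```python
from datetime import datetime, timedelta

def training_window(records, eval_start, n_days):
    """Keep the (timestamp, value) records from the n_days immediately before eval_start."""
    window_start = eval_start - timedelta(days=n_days)
    return [(t, v) for t, v in records if window_start <= t < eval_start]

# Hypothetical half-hourly series covering the 14 days before the evaluation period.
eval_start = datetime(2010, 2, 26)
records = [(eval_start - timedelta(minutes=30 * i), float(i))
           for i in range(1, 14 * 48 + 1)]
for n_days in (1, 7, 14):
    print(n_days, len(training_window(records, eval_start, n_days)))
```

With half-hourly data, each additional training day adds 48 records to the window.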

The plots show that model performance increases with the training length. A training period of at least seven days (of which at least two are weekend days) seems suitable, in terms of a trade-off between costs and model performance, to let the model learn the UFP dynamics under different types of traffic.
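The training-window convention described above can be sketched as follows. This is a minimal illustration and not the authors' code: the function names and the first evaluation day are hypothetical and chosen only for the example.

```python
from datetime import date, timedelta
from typing import List


def training_days(first_eval_day: date, n_days: int) -> List[date]:
    """Return the n_days calendar days immediately before first_eval_day."""
    if not 1 <= n_days <= 14:
        raise ValueError("the study considers 1 to 14 training days")
    return [first_eval_day - timedelta(days=k) for k in range(n_days, 0, -1)]


def n_samples(n_days: int, resolution_min: int = 30) -> int:
    """Number of time steps in n_days at the given resolution (in minutes)."""
    return n_days * 24 * 60 // resolution_min


def n_weekend_days(days: List[date]) -> int:
    """Count Saturdays and Sundays (date.weekday() returns 5 or 6)."""
    return sum(1 for d in days if d.weekday() >= 5)


# Hypothetical first evaluation day, used only for illustration.
first_eval = date(2013, 2, 15)
window = training_days(first_eval, 7)

print("training window:", window[0], "to", window[-1])
print("samples at 30 min:", n_samples(7))
print("samples at 5 min:", n_samples(7, resolution_min=5))
print("weekend days in window:", n_weekend_days(window))
```

Any window of seven consecutive days contains exactly two weekend days, which is consistent with the suggested minimum training period.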


Figure 5: Mixed estimation. The left column shows the time series plots of the estimated and the measured UFP number concentrations. The dashed grey line is the estimated UFP number








3.4 Models at 5 min resolution

All the above results are based on half-hour resolution. Considering the high variability of UFP, it is also interesting to have models with a higher time resolution. In Figure 6, the performance of the models at 5 min resolution is evaluated for each site and for each monitoring strategy for different numbers of training days (dashed lines). As for the half-hour models, the results show a good correspondence between the modelled and the measured UFP values. At site 1, the local and mixed estimation models explain up to 85% of the variance, and the remote estimation model around 60%. At site 2, the mixed estimation model explains up to 85% of the variance, and the local and remote models up to 78%. At site 3, the mixed estimation model explains up to 90% of the variance, the local estimation model up to 86% and the remote estimation model up to 72%.
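As an illustration of the two performance measures reported above, the following sketch computes R2 and RMSE for a pair of measured and estimated series. It assumes the common definition R2 = 1 - SS_res/SS_tot, which may differ from the exact formula used in this study; the function names and the numbers are made up for the example.

```python
import math
from typing import Sequence


def rmse(measured: Sequence[float], estimated: Sequence[float]) -> float:
    """Root mean square error between measured and estimated values."""
    n = len(measured)
    return math.sqrt(sum((m - e) ** 2 for m, e in zip(measured, estimated)) / n)


def r_squared(measured: Sequence[float], estimated: Sequence[float]) -> float:
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_m = sum(measured) / len(measured)
    ss_res = sum((m - e) ** 2 for m, e in zip(measured, estimated))
    ss_tot = sum((m - mean_m) ** 2 for m in measured)
    if ss_tot == 0.0:
        raise ValueError("measured values are constant; R2 is undefined")
    return 1.0 - ss_res / ss_tot


# Made-up values, not data from the study.
measured = [1.0, 2.0, 3.0, 4.0]
estimated = [2.0, 2.0, 3.0, 3.0]

print("R2:", r_squared(measured, estimated))
print("RMSE:", rmse(measured, estimated))
```

Under this definition, an R2 of 0.85 corresponds to a model that explains 85% of the variance of the measured series.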

3.5 Network complexity

The three estimation strategies have different levels of complexity. The local estimation strategy requires, at the estimation site, the local covariate monitors or sensors (e.g. NOx) for the whole period (training and estimation), plus the UFP monitor for the training period. The remote estimation strategy requires local UFP for the training period, and remote NOx or UFP for the training and estimation periods. The mixed estimation strategy requires UFP data at the estimation site for the training period, plus local NOx and remote UFP or NOx data for the training and estimation periods. The mixed strategy is, however, a costly solution compared with the local estimation case, given the number of monitoring devices needed and the rather limited increase in estimation accuracy.
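The data requirements of the three strategies can be written down as a small lookup table. The sketch below is illustrative only: the class and function names are hypothetical, and the monitor count assumes one instrument per site and pollutant and two remote sites, as in the network studied here.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Requirement:
    where: str                # "local" (estimation site) or "remote"
    what: str                 # measured quantity
    periods: Tuple[str, ...]  # subset of ("training", "estimation")


BOTH = ("training", "estimation")

STRATEGIES: Dict[str, Tuple[Requirement, ...]] = {
    "local": (
        Requirement("local", "gaseous covariate (e.g. NOx)", BOTH),
        Requirement("local", "UFP", ("training",)),
    ),
    "remote": (
        Requirement("local", "UFP", ("training",)),
        Requirement("remote", "NOx or UFP", BOTH),
    ),
    "mixed": (
        Requirement("local", "UFP", ("training",)),
        Requirement("local", "NOx", BOTH),
        Requirement("remote", "UFP or NOx", BOTH),
    ),
}


def monitors_needed(strategy: str, period: str, n_remote_sites: int = 2) -> int:
    """Count monitors in use during `period` (assumes one per site and pollutant)."""
    total = 0
    for req in STRATEGIES[strategy]:
        if period in req.periods:
            total += n_remote_sites if req.where == "remote" else 1
    return total


for name in STRATEGIES:
    print(name,
          "training:", monitors_needed(name, "training"),
          "estimation:", monitors_needed(name, "estimation"))
```

Under these assumptions the mixed strategy needs the most instruments in both periods, which reflects the cost argument made above.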


Figure 6: Performances of the GP models at half hour (solid lines) and 5 min (dashed lines) resolution evaluated on different days of training. First row: coefficient of determination (R2); second row: root mean square error (RMSE). The left column refers to site 1, the middle column to site 2 and the right column to site 3. One day of training refers to the day before the first day of evaluation, two days of training to the two days before the first day of evaluation, and so on up to 14 days.

3.6 Limitations

The applied modelling approach also has its limitations. For instance, there is no guarantee that the proposed model structure is optimal. However, additional covariates (e.g. traffic and meteorological data) could easily be added to the proposed structure. For example, considering that rain influences the concentrations of gaseous pollutants and UFP differently, models that include weather conditions may perform better.
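To illustrate why adding a covariate is straightforward in this framework, the sketch below implements the standard Gaussian process predictive equations (Rasmussen and Williams, 2006) in plain Python. It is not the GPML Toolbox code used in this study: the squared-exponential covariance function, the fixed hyperparameters, the made-up standardised data and the rain indicator are all assumptions for illustration. In practice the hyperparameters would be fitted, for example by maximising the marginal likelihood.

```python
import math
from typing import List, Sequence, Tuple


def sq_exp(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared-exponential covariance, unit length scale and signal variance (assumed)."""
    return math.exp(-0.5 * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def cholesky(A: List[List[float]]) -> List[List[float]]:
    """Lower-triangular L with L L^T = A, for symmetric positive definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(A[i][i] - s) if i == j else (A[i][j] - s) / L[j][j]
    return L


def chol_solve(L: List[List[float]], b: Sequence[float]) -> List[float]:
    """Solve (L L^T) x = b by forward and backward substitution."""
    n = len(b)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_predict(X, y, x_new, noise_var: float = 0.01) -> Tuple[float, float]:
    """Zero-mean GP posterior mean and latent variance at x_new."""
    n = len(X)
    K = [[sq_exp(X[i], X[j]) + (noise_var if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    L = cholesky(K)
    k_star = [sq_exp(xi, x_new) for xi in X]
    mean = sum(k * a for k, a in zip(k_star, chol_solve(L, y)))
    var = sq_exp(x_new, x_new) - sum(k * v for k, v in zip(k_star, chol_solve(L, k_star)))
    return mean, var


# Made-up standardised data: columns are local NOx and remote UFP.
X2 = [[-1.0, -0.8], [0.0, 0.1], [1.0, 0.9], [2.0, 1.7]]
y = [-0.9, 0.0, 1.1, 1.8]  # standardised local UFP (made up)
# Adding a covariate only adds a column, here a hypothetical rain indicator.
X3 = [row + [rain] for row, rain in zip(X2, [0.0, 1.0, 0.0, 1.0])]

m2, v2 = gp_predict(X2, y, [0.5, 0.4])
m3, v3 = gp_predict(X3, y, [0.5, 0.4, 0.0])
print("two covariates:", m2, v2)
print("three covariates:", m3, v3)
```

The model code is unchanged between the two calls; only the input vectors gain a dimension.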

The models are developed and trained primarily for use at traffic locations within city boundaries. All three locations in this study are urban traffic locations, and their pollution profile is dominated by traffic emissions. The three locations are distinct from each other in terms of traffic intensity, distance to traffic and surrounding street pattern. We have tested the method simultaneously at these three different traffic locations, and the results were found to be encouraging. Therefore, we assume that the proposed method could be applied to other traffic locations to address part of the spatial inhomogeneity of UFP between sites within a city reported in the literature (Mejía et al., 2008; Buonanno et al., 2011; Mishra et al., 2012; Birmili et al., 2013; Kumar et al., 2014). However, this assumption could not be tested with the available data set. Moreover, this study cannot assess how models trained in one area/city perform in other areas/cities with different fleet composition, traffic dynamics and meteorological conditions. The transferability of these models to other areas is probably limited when circumstances differ substantially. In that case, a new data collection period should be carried out for model training.

A further limitation of the data set is that it is only one month long; since half of it was used for training, only half a month was left for the evaluation. This restricted the possibility of assessing questions such as how long the proposed model will perform satisfactorily, and how often the training has to be repeated.

The measurements used in this study were performed during winter, when the influence of photochemical reactions is rather limited. Considering that the NO-NO2-O3 ratios are strongly influenced by photochemistry, and that secondary formation of UFP is partially driven by atmospheric photochemical reactions and conversion (Westerdahl et al., 2005; Seinfeld and Pandis, 2006), it would be interesting to study the long-term performance of the proposed models by applying them to data sets that cover various seasons.

Finally, the lower cut-off limit of UFP used here does not account for the nucleation mode particles, which are volatile and much more dynamic. It would be interesting to apply the model to a data set that includes these particles to evaluate its performance.

4. Summary and conclusions

In this work, we investigated strategies to estimate UFP at specific locations based on concentrations of gaseous components at the same and remote locations, and/or UFP at remote locations. We have used Gaussian process regression to estimate UFP at three sites in an air pollution monitoring network.

In the local estimation, we found that the models that use NOx perform best. This strategy would be especially interesting if a dense network of low-cost gas sensors can be deployed: novel low-cost gas sensors are being developed with increasing levels of performance (Mead et al., 2013; Kumar et al., 2015).

The case of the remote estimation reflects the situation where one tries to estimate UFP at locations where no local measurements are available. We used the measurements from two locations to estimate UFP at a third location. On a practical level, this corresponds to installing permanent monitoring devices at two locations, and training the models at all similar locations of interest. The results also suggest that it is possible to estimate UFP at one location based on measurements of NOx at two remote locations. This would make it possible to install a limited number of NOx monitors at specific locations to estimate UFP at all similar locations in the same city.

The case of the mixed estimation examines combinations of remote and local measurements to improve the model performance. This strategy requires the highest number of monitoring devices, and thus presents a trade-off between higher accuracy and increased costs. In practical terms, we can conclude that estimations based on remote UFP are improved by adding local covariates to take into account local variability.

Acknowledgement

This research is part of the IDEA (Intelligent, Distributed Environmental Assessment) project, financially supported by IWT-Vlaanderen (IWT-SBO 080054). The authors thank Carl Rasmussen and Hannes Nickisch for making the GPML Toolbox available.

References

Alvarez R, Weilenmann M, Favez J.Y, 2008. Evidence of increased mass fraction of NO2 within real-world NOx emissions of modern light vehicles – derived from a reliable online measuring method. Atmospheric Environment 42, 4699–4707.

Atkinson R.W, Fuller G.W, Anderson H.R, Harrison R.M, Armstrong B, 2010. Urban ambient particle metrics and health: a time-series analysis. Epidemiology 21, 501–511.

Beddows D.C.S, Harrison R.M, 2008. Comparison of average particle number emission factors for heavy and light duty vehicles derived from rolling chassis dynamometer and field studies. Atmospheric Environment 42, 7954–7966.

Beckx C, Denys T, Michiels H, 2013. Analysis of the Belgian Car Fleet 2012 – Report for the Flemish, the Walloon and the Brussels Capital Region. Technical Report. VITO.

Bennett N.D, Croke B.F.W, Guariso G, Guillaume J.H.A, Hamilton S.H, Jakeman A.J, Marsili-Libelli S, Newham L.T.H, Norton J.P, Perrin C, Pierce S.A, Robson B, Seppelt R, Voinov A.A, Fath B.D, Andreassian V, 2013. Characterising performance of environmental models. Environmental Modelling & Software 40, 1–20.

Birmili W, Tomsche L, Sonntag A, Opelt C, Weinhold K, Nordmann S, Schmidt W, 2013. Variability of aerosol particles in the urban atmosphere of Dresden (Germany): effects of spatial scale and particle size. Meteorologische Zeitschrift 22, 195–211.

Bos I, Jacobs L, Nawrot T.S, de Geus B, Torfs R, Int Panis L, Degraeuwe B, Meeusen R, 2011. No exercise-induced increase in serum BDNF after cycling near a major traffic road. Neuroscience Letters 500, 129–132.

Buonanno G, Lall A.A, Stabile L, 2009. Temporal size distribution and concentration of particles near a major highway. Atmospheric Environment 43, 1100–1105.

Buonanno G, Fuoco F.C, Stabile L, 2011. Influential parameters on particle exposure of pedestrians in urban microenvironments. Atmospheric Environment 45, 1434–1443.

Charron A, Harrison R.M, 2003. Primary particle formation from vehicle emissions during exhaust dilution in the roadside atmosphere. Atmospheric Environment 37, 4109–4119.

Clifford S, Low Choy S, Hussein T, Mengersen K, Morawska L, 2011. Using the Generalised Additive Model to model the particle number count of ultrafine particles. Atmospheric Environment 45, 5934–5945.

de Hartog J.J, Hoek G, Peters A, Timonen K.L, Ibald-Mulli A, Brunekreef B, Heinrich J, Tiittanen P, van Wijnen J.H, Kreyling W, Kulmala M, Pekkanen J, 2003. Effects of fine and ultrafine particles on cardiorespiratory symptoms in elderly subjects with coronary heart disease: the ULTRA study. American Journal of Epidemiology 157, 613–623.

Fujitani Y, Kumar P, Tamura K, Fushimi A, Hasegawa S, Takahashi K, Tanabe K, Kobayashi S, Hirano S, 2012. Seasonal differences of the atmospheric particle size distribution in a metropolitan area in Japan. Science of The Total Environment 437, 339–347.

Goel A, Kumar P, 2014. A review of fundamental drivers governing the emissions, dispersion and exposure to vehicle-emitted nanoparticles at signalised traffic intersections. Atmospheric Environment 97, 316–331.

Harrison R.M, Beddows D.C.S, Dall’Osto M, 2011. PMF analysis of wide-range particle size spectra collected on a major highway. Environmental Science & Technology 45, 5522–5528.

Heal M.R, Kumar P, Harrison R.M, 2012. Particles, air quality, policy and health. Chemical Society Reviews 41, 6606–6630.

Higdon D, 1998. A process-convolution approach to modelling temperatures in the North Atlantic Ocean. Environmental and Ecological Statistics 5, 173–190.

Holmes N.S, Morawska L, 2006. A review of dispersion modelling and its application to the dispersion of particles: An overview of different dispersion models available. Atmospheric Environment 40, 5902–5928.

Hong Y.C, Lee J.T, Kim H, Ha E.H, Schwartz J, Christiani D.C, 2002. Effects of air pollutants on acute stroke mortality. Environmental Health Perspectives 110, 187–191.

Hussein T, Karppinen A, Kukkonen J, Härkönen J, Aalto P.P, Hämeri K, Kerminen V.M, Kulmala M, 2006. Meteorological dependence of size-fractionated number concentrations of urban aerosol particles. Atmospheric Environment 40, 1427–1440.

Jacobs L, Nawrot T.S, de Geus B, Meeusen R, Degraeuwe B, Bernard A, Sughis M, Nemery B, Int Panis L, 2010. Subclinical responses in healthy cyclists briefly exposed to traffic-related air pollution: an intervention study. Environmental Health 9, 64–71.

Kittelson D.B, 1998. Engines and nanoparticles: a review. Journal of Aerosol Science 29, 575–588.

Kittelson D.B, Watts W.F, Johnson J.P, 2004. Nanoparticle emissions on Minnesota highways.


Kumar P, Fennell P, Langley D, Britter R, 2008. Pseudo-simultaneous measurements for the vertical variation of coarse, fine and ultrafine particles in an urban street canyon. Atmospheric Environment 42, 4304–4319.

Kumar P, Robins A, Britter R, 2009. Fast response measurements of the dispersion of nanoparticles in a vehicle wake and a street canyon. Atmospheric Environment 43, 6110–6118.

Kumar P, Robins A, Vardoulakis S, Britter R, 2010. A review of the characteristics of nanoparticles in the urban atmosphere and the prospects for developing regulatory controls.


Kumar P, Gurjar B.R, Nagpure A.S, Harrison R.M, 2011a. Preliminary estimates of nanoparticle number emissions from road vehicles in megacity Delhi and associated health impacts. Environmental Science & Technology 45, 5514–5521.

Kumar P, Robins A, Vardoulakis S, Quincey P, 2011b. Technical challenges in tackling regulatory concerns for urban atmospheric nanoparticles. Particuology 9, 566–571.

Kumar P, Imam B, 2013. Footprints of air pollution and changing environment on the sustainability of built infrastructure. Science of The Total Environment 444, 85–101.

Kumar P, Morawska L, Birmili W, Paasonen P, Hu M, Kulmala M, Harrison R.M, Norford L, Britter R, 2014. Ultrafine particles in cities. Environment International 66, 1–10.

Kumar P, Morawska L, Martani C, Biskos G, Neophytou M, Di Sabatino S, Bell M, Norford L, Britter R, 2015. The rise of low-cost sensing for managing air pollution in cities. Environment International 75, 199–205.

Mead M.I, Popoola O.A.M, Stewart G.B, Landshoff P, Calleja M, Hayes M, Baldovi J.J, McLeod M.W, Hodgson T.F, Dicks J, Lewis A, Cohen J, Baron R, Saffell J.R, Jones R.L, 2013. The use of electrochemical sensors for monitoring urban air quality in low-cost, high-density networks. Atmospheric Environment 70, 186–203.

Mejía J.F, Morawska L, Mengersen K, 2008. Spatial variation in particle number size distributions in a large metropolitan area. Atmospheric Chemistry and Physics 8, 1127–1138.

Mishra V.K, Kumar P, Van Poppel M, Bleux N, Frijns E, Reggente M, Berghmans P, Int Panis L, Samson R, 2012. Wintertime spatio-temporal variation of ultrafine particles in a Belgian city. Science of The Total Environment 431, 307–313.

Mølgaard B, Hussein T, Corander J, Hämeri K, 2012. Forecasting size-fractionated particle number concentrations in the urban atmosphere. Atmospheric Environment 46, 155–163.

Morawska L, Thomas S, Bofinger N, Wainwright D, Neale D, 1998. Comprehensive characterization of aerosols in a subtropical urban atmosphere: particle size distribution and correlation with gaseous pollutants. Atmospheric Environment 32, 2467–2478.

Morawska L, Ristovski Z, Jayaratne E.R, Keogh D.U, Ling X, 2008. Ambient nano and ultrafine particles from motor vehicle emissions: Characteristics, ambient processing and implications on human exposure. Atmospheric Environment 42, 8113–8138.

Murray-Smith R, Girard A, 2001. Gaussian Process priors with ARMA noise models. Irish Signals and Systems Conference, 147–153.

Peters J, Theunis J, Van Poppel M, Berghmans P, 2013. Monitoring PM10 and ultrafine particles in urban environments using mobile measurements. Aerosol and Air Quality Research 13, 509–522.

Pey J, Querol X, Alastuey A, Rodríguez S, Putaud J.P, Van Dingenen R, 2009. Source apportionment of urban fine and ultra-fine particle number concentration in a Western Mediterranean city. Atmospheric Environment 43, 4407–4415.

Rasmussen C.E, Ghahramani Z, 2002. Infinite mixtures of Gaussian process experts. Advances in Neural Information Processing Systems 14, MIT Press, 881–888.

Rasmussen C.E, Williams C.K.I, 2006. Gaussian Processes for Machine Learning. MIT Press. Available free of charge at http://www.gaussianprocess.org

Reggente M, Peters J, Theunis J, Van Poppel M, Rademaker M, Kumar P, De Baets B, 2014. Prediction of ultrafine particle number concentrations in urban environments by means of Gaussian process regression based on measurements of oxides of nitrogen. Environmental Modelling & Software 61, 135–150.

Sabaliauskas K, Jeong C.H, Yao X, Jun Y.S, Jadidian P, Evans G.J, 2012. Five-year roadside measurements of ultrafine particles in a major Canadian city. Atmospheric Environment 49, 245–256.

Seinfeld J.H, Pandis S.N, 2006. Atmospheric Chemistry and Physics: From Air Pollution to Climate Change, 2nd Edition. Wiley, Hoboken, New Jersey.

Snelson E, Ghahramani Z, 2006. Sparse Gaussian processes using pseudo-inputs. Advances in Neural Information Processing Systems 18, MIT Press, 1257–1264.

Westerdahl D, Fruin S, Sax T, Fine P.M, Sioutas C, 2005. Mobile platform measurements of ultrafine particles and associated pollutant concentrations on freeways and residential streets in Los Angeles. Atmospheric Environment 39, 3597–3610.

Westmoreland E.J, Carslaw N, Carslaw D.C, Gillah A, Bates E, 2007. Analysis of air quality within a street canyon using statistical and dispersion modelling techniques. Atmospheric Environment 41, 9195–9205.
