8/3/2019 Population Density in Urban Areas
A cokriging method for estimating population density in urban areas
Changshan Wu a,*, Alan T. Murray b,1
a Department of Geography, University of Wisconsin-Milwaukee, P.O. Box 413,
Milwaukee, WI 53201-0413, USA
b Department of Geography, The Ohio State University, Columbus, OH 43210-1361, USA
Abstract
Population information is typically available for analysis in aggregate socioeconomic
reporting zones, such as census blocks in the United States and enumeration districts in the
United Kingdom. However, such data mask underlying individual population distributions
and may be incompatible with other information sources (e.g. school districts, transportation
analysis zones, metropolitan statistical areas, etc.). Moreover, it is well known that there are
potential significance issues associated with scale and reporting units, the modifiable areal unit
problem (MAUP), when such data are used in analysis. This may lead to biased results in
spatial modeling approaches. In this study, impervious surface fraction derived from Thematic
Mapper (TM) imagery was applied to derive the underlying population of an urban region.
A cokriging method was developed to interpolate population density by modeling the spatial
correlation and cross-correlation of population and impervious surface fraction. Results
suggest that population density can be accurately estimated using cokriging applied to impervious
surface fraction. In particular, the relative population estimation error is 0.3% for the entire
study area and 10–15% at block group and tract levels. Moreover, unlike other interpolation
methods, cokriging gives estimation variance at the TM pixel level.
© 2005 Elsevier Ltd. All rights reserved.
0198-9715/$ - see front matter © 2005 Elsevier Ltd. All rights reserved.
doi:10.1016/j.compenvurbsys.2005.01.006
* Corresponding author. Tel.: +1 414 2294860; fax: +1 414 2293981.
E-mail addresses: cswu@uwm.edu (C. Wu), murray.308@osu.edu (A.T. Murray).
1 Tel.: +1 614 688 5441; fax: +1 614 292 6213.
Computers, Environment and Urban Systems
29 (2005) 558–579
www.elsevier.com/locate/compenvurbsys
Keywords: Population interpolation; Cokriging; Remote sensing
1. Introduction
The difficulties associated with the application of zone-based census population
data in geographical analyses have been well documented in previous studies
(Fotheringham & Wong, 1991; Martin, 1989, 1996). One important issue is data
aggregation. In many applications, census data cannot sufficiently represent the
underlying geographical distribution of population because it is reported through
aggregating individual population counts in irregular areal units, which can be geo-
graphically meaningless. This aggregation tends to smooth local variability and
requires an assumption of uniformly distributed population within a reporting unit
(Moon & Farmer, 2001). While there are legitimate reasons for reporting census
information in this way (i.e. privacy of census respondents), business and service
planning benefit substantially from greater resolution population data (Longley
& Clarke, 1995). For example, Martin and Williams (1992) and Beguin, Thomas,
and Vandenbussche (1992) emphasized the importance of detailed population
information in the location analyses of health-care centers and public libraries.
Moreover, in urban sustainability studies Harris and Longley (2000) point out that
census-based models tend to overestimate residential area because of its coarse
resolution.
Another difficulty with zone-based population data is related to incompatible
spatial information layers (Bracken, 1993; Goodchild, Anselin, & Deichmann,
1993). Different departments and agencies collect and distribute data in varying zo-
nal arrangements (e.g. school districts, transportation analysis zones, metropolitan
statistical areas, etc.). As a consequence, a significant problem arises in regional
analysis and modeling, in which multiple data sources must be integrated before
analysis can be implemented (Goodchild et al., 1993). Moreover, the boundaries
of areal units in census data are not data derived, but rather are the result of enu-
meration and reporting. The modifiable areal unit problem (MAUP) may exist
when utilizing such data in geographical applications. In particular, the relationship
between variables may only be valid for one particular zonal arrangement
and scale, potentially biasing results obtained in statistical and spatial analyses
(Martin, 1996; Openshaw, 1977).
One approach for dealing with the above problems is to transform aggregated
census data to grid-based population estimates using areal interpolation (Langford,
Maguire, & Unwin, 1991; Martin, 1989; Okabe & Sadahiro, 1997). Areal interpola-
tion methods may be grouped into two categories: simple interpolation and intelli-
gent interpolation (Okabe & Sadahiro, 1997). Simple interpolation involves
transferring data from irregular polygons to regular grids without any supplementary
data (Lam, 1983; Martin, 1996; Tobler, 1999). This method is preferred when fast
computation is important or additional information is unavailable (Okabe &
Sadahiro, 1997). In contrast, intelligent interpolation transfers data with the help
of additional information (Harris & Longley, 2000; Langford et al., 1991). This
method has proven more accurate than simple interpolation, although greater
computational processing is required (Fisher & Langford, 1995; Sadahiro, 1999).
Regression analyses supplemented with land use and land cover data are often applied
in intelligent interpolation (Langford et al., 1991; Langford & Unwin, 1994).
However, detailed biophysical information is usually lost in producing land use data
from remotely sensed images (Jensen, 1983). As a result, limited land use types are
too coarse for estimating detailed population density. Moreover, the basic assump-
tions of regression analyses (e.g. spatial independence) are unlikely to be satisfied in
geographical applications (Griffith & Can, 1996). Impervious surface fraction in res-
idential areas may be useful for supplementing the developed interpolation process.
Detailed information on residential areas can thus be maintained, providing clues on
population distribution (Ji & Jensen, 1999). Spatial autocorrelation in impervious
surface fraction and population, and the cross-correlation between these two spatial
variables, are explored and modeled in this paper using geostatistical techniques.
Based on modeled spatial relationships, cokriging is applied in this paper to deter-
mine population density in Columbus, OH.
The organization of this paper is as follows. Our study area and data sources
are described in Section 2. The process of deriving impervious surface fraction
in residential areas from remotely sensed imagery is described in Section 3. In par-
ticular, we detail the creation of impervious surface fraction from ETM+ imagery
for the entire study region and describe a procedure for delineating residential
areas within this region. Population density estimation using cokriging combined
with residential impervious surface fraction is reported in Section 4. Accuracy
assessment of the population estimates is addressed in Section 5. Section 6 reports
an adjustment of the population estimates. Finally, conclusions and discussion are
provided in Section 7.
2. Study area and data sources
A portion of the Columbus metropolitan area in Franklin County, OH, USA was
chosen as our study region for this research. This region is 47.4 km² and is divided
into 36 tracts, 125 block groups, and 2445 blocks in the 2000 US Census (see Fig. 1).
The 2000 Census data were acquired from the ESRI website in the shapefile format
(United States Census Bureau, 2002). Landsat 7 ETM+ imagery, which was utilized
to derive residential impervious surface fraction, was acquired on July 8, 1999. Addi-
tional data, such as Digital Orthophoto Quarterquadrangles (DOQQs) from the
Ohio Geographically Referenced Information Program (OGRIP, 1999) and Na-
tional Land Cover Data (NLCD) from the Multi-Resolution Land Characteristics
Consortium (Multi-Resolution Land Characteristics Consortium, 2002), were utilized
to examine residential classification accuracy and select training samples. Moreover,
parcel data from the Franklin County Auditor (2002) and address-based
employment data from the Mid-Ohio Regional Planning Commission (MORPC,
2002) were utilized to identify possible misclassified pixels since these data maintain
detailed local information about land use and employment.
3. Estimating impervious surface fraction in residential areas
Impervious surface is any material prohibiting the infiltration of water into soil.
As a major component of urban infrastructure, impervious surface has become a pri-
mary variable in urban planning and environmental management (Ji & Jensen, 1999;
Ridd, 1995). Impervious surface fraction, calculated as the proportion of impervious
surface over a small area, has been found to reveal more information about built-up
areas than land use and land cover classification (Ji & Jensen, 1999). For population
estimation, as an example, impervious surface in residential areas generally corresponds
to housing, which serves as an indicator of people.
Fig. 1. Study area as part of the Columbus metropolitan area in Franklin County, OH, USA (left) and
Landsat ETM+ image acquired on July 8, 1999 for this area (right).
3.1. Impervious surface fraction estimation
Methods for quantifying impervious surface from remotely sensed data are typi-
cally based on either fuzzy classification or spectral mixture analysis (Ji & Jensen,
1999; Phinn, Stanford, Scarth, Murray, & Shyy, 2002; Rashed, Weeks, Gadalla, &
Hill, 2001). In this study, a spectral mixture analysis method was applied to estimate
impervious surface fraction from an ETM+ image (Wu & Murray, 2003). Four end-
members (see Fig. 2), vegetation, high albedo, low albedo and soil, were selected to
represent heterogeneous urban land use and land cover through the analysis of the
spectral feature spaces of a transformed ETM+ image using the maximum noise
fraction (MNF) transformation, the details of which are given in Green, Berman,
Switzer, and Craig (1988) and Lee, Woodyatt, and Berman (1990). Consequently,
a fully constrained four-endmember linear mixing model was applied to calculate
each endmember fraction from the Landsat ETM+ data (see Fig. 3). Furthermore,
impervious surface fraction in each ETM+ pixel was modeled by adding the frac-
tions of low albedo and high albedo endmembers after removing the effects of water
and clouds (see Fig. 4).
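The last step of this procedure can be sketched in a few lines. The snippet below is only an illustration under stated assumptions: the function name, the dictionary keys and the boolean mask flag are hypothetical, and the paper does not describe how the water and cloud masking was actually implemented.

```python
from typing import Optional


def impervious_fraction(fractions: dict, is_water_or_cloud: bool) -> Optional[float]:
    """Return low albedo + high albedo for one pixel, or None if masked.

    `fractions` holds the four endmember fractions of a fully constrained
    model: each fraction is non-negative and together they sum to 1.
    """
    total = sum(fractions.values())
    if abs(total - 1.0) > 1e-6 or any(f < 0 for f in fractions.values()):
        raise ValueError("fractions must be non-negative and sum to 1")
    if is_water_or_cloud:
        return None  # water and cloud pixels are removed before summing
    return fractions["low_albedo"] + fractions["high_albedo"]


# Hypothetical pixel, not taken from the study data.
pixel = {"vegetation": 0.35, "high_albedo": 0.20, "low_albedo": 0.30, "soil": 0.15}
result = impervious_fraction(pixel, is_water_or_cloud=False)
masked = impervious_fraction(pixel, is_water_or_cloud=True)
print(result, masked)
```

Returning `None` for masked pixels keeps them distinguishable from pixels whose impervious fraction is genuinely zero.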
Fig. 2. ETM+ reflectance spectra of selected endmembers. These endmembers were chosen by analyzing
the spectral feature spaces of the MNF transformed ETM+ image.
3.2. Residential area classification
To this point we have detailed impervious surface fraction estimation for the
entire study area. However, we know that population (the major interest in this
research) is generally restricted to residential areas. Therefore, it is necessary to
identify residential land use within the study area. A maximum likelihood classification
was applied to delineate residential pixels. Similar approaches have been utilized in
classifying residential land uses by Lo (1995), Mesev (1998), and Chen (2002). Six
classes, vegetation, soil, water, commercial and transportation, low density residen-
tial, and high density residential, were specified in selecting training samples with the
help of DOQQ data, NLCD data, and the original ETM+ image. The classification
(see Fig. 5) was conducted using a maximum likelihood classifier provided in ERDAS
Imagine 8.4 (ERDAS Imagine, 1997). After deriving this image, we grouped
the six classes into two major classes: residential and non-residential.
Fig. 3. Endmember fraction images calculated through a fully constrained four-endmember linear mixing
model: (a) vegetation fraction image; (b) high albedo fraction image; (c) low albedo fraction image
(including water); (d) soil fraction image.
Fig. 4. Impervious surface fraction image calculated through adding low albedo and high albedo
endmember fractions after removing the effects of water and clouds.
Fig. 5. A maximum likelihood classification of the ETM+ image for the Columbus metropolitan area.
Since we are estimating detailed population density, residential classification accuracy
is essential in this research. Therefore, we performed post-processing to identify
possible misclassified pixels. In particular, pixels within zero population census
blocks should obviously not be classified as residential land use. Such pixels were
identified and reclassified as non-residential. Conversely, pixels within high population
density census blocks were also subject to further scrutiny. If these pixels are
not classified as residential, they are possibly misclassified and require further analysis.
In this study, we utilized parcel and employment data to identify potential misclassified
pixels. Group-quarter populations, people in institutions, shelters, and
nursing homes, and students in university dormitories (Plane & Rogerson, 1994), were
typically found in these misclassified pixels. Such areas are difficult to classify using
only remotely sensed data because they share similar spectral signatures to commercial
land uses. With the help of parcel and employment data, we were able to identify
these pixels and reclassify them as residential areas.
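The two post-processing rules described above can be written as a small decision function. This is a sketch only: the function name, the argument names and, in particular, the density threshold that defines a "high population density" block are assumptions made for illustration, since the paper does not report a numerical threshold.

```python
def postprocess_pixel(classified_residential: bool,
                      block_population: int,
                      block_density: float,
                      ancillary_says_residential: bool,
                      high_density_threshold: float = 10.0) -> bool:
    """Return the corrected residential flag for one pixel.

    `ancillary_says_residential` stands for the evidence taken from parcel
    and employment data; `high_density_threshold` is a hypothetical value.
    """
    # Rule 1: a block without residents cannot contain residential pixels.
    if block_population == 0:
        return False
    # Rule 2: a non-residential pixel inside a high density block is
    # reclassified when the ancillary data point to residential use.
    if (not classified_residential
            and block_density >= high_density_threshold
            and ancillary_says_residential):
        return True
    return classified_residential


print(postprocess_pixel(True, 0, 0.0, False))     # zero population block
print(postprocess_pixel(False, 800, 25.0, True))  # e.g. a dormitory pixel
```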
The classification accuracy of residential land use after the maximum likelihood
classification and post-processing (see Fig. 6) was examined using 400 stratified ran-
domly selected samples. The DOQQ images acquired between 1994 and 1995 were
used in this study for ground truthing. These DOQQs were co-registered with the
ETM+ image. A 3 by 3 sampling unit was adopted to avoid geometric errors. The
overall classification accuracy is 90% and the overall kappa coefficient is 0.7942
(see Table 1).
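The accuracy figures can be reproduced from the four counts reported in Table 1. The snippet below is a minimal sketch using the standard definitions of overall accuracy, Cohen's kappa, and commission and omission errors; the variable names are ours.

```python
# Rows: classified image; columns: reference image (counts from Table 1).
matrix = [[146, 15],    # classified residential
          [25, 214]]    # classified non-residential

n = sum(sum(row) for row in matrix)
row_totals = [sum(row) for row in matrix]
col_totals = [sum(col) for col in zip(*matrix)]

# Observed agreement and agreement expected by chance.
overall = sum(matrix[i][i] for i in range(2)) / n
expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
kappa = (overall - expected) / (1 - expected)

# Commission error is relative to the classified (row) totals,
# omission error to the reference (column) totals.
commission = [100 * (1 - matrix[i][i] / row_totals[i]) for i in range(2)]
omission = [100 * (1 - matrix[i][i] / col_totals[i]) for i in range(2)]

print(f"overall accuracy = {overall:.4f}, kappa = {kappa:.4f}")
print([round(c, 2) for c in commission], [round(o, 2) for o in omission])
```

With these counts the script gives an overall accuracy of 0.9000 and a kappa of 0.7942, and the commission and omission errors match the percentages in Table 1.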
With impervious surface fraction for the entire study area (Fig. 4) and the iden-
tified residential land use areas (Fig. 6), impervious surface fraction in residential
areas was easily obtained (see Fig. 7).
Fig. 6. Residential land use classification after the maximum likelihood classification and post-processing.
Table 1
Residential land use classification accuracy assessment
Classified image      Reference image
                      Residential   Non-residential   Commission error (%)
Residential           146           15                9.32
Non-residential       25            214               10.46
Omission error (%)    14.62         6.55
Overall accuracy = 90.00%, overall kappa statistic = 0.7942.
4. Interpolating population density using cokriging
Once impervious surface fraction has been obtained for residential areas, it can be utilized
as supplementary data to interpolate population density. Population density is
usually estimated using a regression approach, which models the relationship between
population and supplementary data derived from remote sensing imagery
(Chen, 2002; Harvey, 2002; Lo, 1995). An implicit assumption of regression analysis
is that population density is spatially independent. However, many researchers have
questioned this assumption, claiming that simple regression may lead to biased re-
sults (Griffith, 1993; Griffith & Can, 1996). Therefore, a model considering spatial
autocorrelation is more appropriate. Cokriging may improve the estimation preci-
sion by accounting simultaneously for spatial autocorrelation in population density
and impervious surface fraction and the cross-correlation between these spatial vari-
ables. Moreover, it is suitable when the variable to be estimated (e.g. population den-
sity) is under-sampled while other supplementary variables are abundant (e.g.
impervious surface fraction).
Cokriging is a geostatistical method originating from mining applications (Cressie,
1993; Journel & Huijbregts, 1978) and widely applied in soil science (Vauclin,
Vieira, Vachaud, & Nielsen, 1983; Webster, 1985; Webster & Burgess, 1980). Geosta-
tistical methods were introduced in remote sensing in the late 1980s (Curran, 1988;
Woodcock, Strahler, & Jupp, 1988). Now geostatistics are commonly applied in soil
science, biogeography, climatology, and environmental studies (Atkinson, Webster,
& Curran, 1992, 1994; Oliver, Webster, & Gerrard, 1989a, 1989b). A review of geo-
statistical methods and associated applications may be found in Cressie (1993), Cur-
ran and Atkinson (1998), and Curran (2001). Although widely applied in physical
geography, cokriging has rarely been utilized in estimating socio-economic conditions,
such as population densities. In this paper, population density is estimated
using a cokriging method in which the impervious surface fraction is taken as a secondary
variable to improve estimation accuracy.
Fig. 7. Impervious surface fraction in residential areas.
4.1. Cokriging theory
As an extension of ordinary kriging to two or more variables, cokriging is based
on regionalized variable theory (Journel & Huijbregts, 1978; Oliver et al., 1989a).
According to this theory, any regionalized variable z(x) can be considered a realization
of a random function Z(x), which is a combination of a deterministic component,
m(x), and random fluctuation, $\varepsilon(x)$:
$$z(x) = m(x) + \varepsilon(x) \tag{1}$$
where x denotes the geographical coordinates in one, two, or three dimensions; m(x)
indicates a geographical trend or drift; and $\varepsilon(x)$ is the spatially dependent random
error with mean zero. In most applications, the deterministic component, m(x), is
assumed to be locally constant,
$$m(x) = \mu \tag{2}$$
and for any given distance and direction h, the variance of differences between z(x)
and z(x + h) is finite and independent of x:
$$\operatorname{var}[z(x) - z(x+h)] = E\left[\{z(x) - z(x+h)\}^2\right] = 2\gamma(h) \tag{3}$$
where vector h, the lag, is a given separation distance and direction from x, and $\gamma(h)$
is the variogram. $\gamma(h)$ has been found to be an important tool in modeling spatial
autocorrelation (Journel & Huijbregts, 1978). Moreover, if two or more variables
are needed, a cross-variogram is defined as follows:
$$\gamma_{uv}(h) = \tfrac{1}{2} E\left[\{z_u(x) - z_u(x+h)\}\{z_v(x) - z_v(x+h)\}\right] \tag{4}$$
Based on regionalized variable theory, it is necessary to estimate an under-sampled
variable using cokriging. This method ensures unbiased estimates with minimum and
known variance (Curran, 2001). If we consider estimating a variable u in a block B
with sampling points of u and a second variable v, our estimate will be
$$\hat{z}_u(B) = \sum_{i=1}^{N_u} \lambda_{ui}\, z_u(x_{ui}) + \sum_{j=1}^{N_v} \lambda_{vj}\, z_v(x_{vj}) \tag{5}$$
in which $N_u$ and $N_v$ are the numbers of sampling points for variables u and v; $x_{ui}$ and
$x_{vj}$ are the locations of sampling points for variables u and v, respectively; and $\lambda_{ui}$ and
$\lambda_{vj}$ are the weights to be calculated.
In order to ensure unbiasedness, the following constraints must be satisfied
(Aboufirassi & Marino, 1984):
$$\sum_{i=1}^{N_u} \lambda_{ui} = 1 \tag{6}$$
$$\sum_{j=1}^{N_v} \lambda_{vj} = 0 \tag{7}$$
The first constraint indicates that at least one observation of the primary variable u is
necessary for cokriging. Moreover, constraint (7) ensures that the summation of the
weights for the secondary variable v is zero. Subject to these constraints, we minimize
the estimation variance:
$$\sigma_u^2(B) = E\left[\{z_u(B) - \hat{z}_u(B)\}^2\right] \tag{8}$$
This is an optimization problem in which $\lambda_{ui}$ and $\lambda_{vj}$ are the decision variables and
$\sigma_u^2(B)$ is the objective function. Standard Lagrangian techniques can be applied to
solve this problem. This results in the following:
$$\sum_{i=1}^{N_u} \lambda_{ui}\,\gamma_{uu}(x_{ui}, x_{uk}) + \sum_{j=1}^{N_v} \lambda_{vj}\,\gamma_{uv}(x_{uk}, x_{vj}) + \psi_u = \gamma_{uu}(B, x_{uk}), \quad k = 1, \ldots, N_u \tag{9}$$
$$\sum_{i=1}^{N_u} \lambda_{ui}\,\gamma_{uv}(x_{ui}, x_{vl}) + \sum_{j=1}^{N_v} \lambda_{vj}\,\gamma_{vv}(x_{vj}, x_{vl}) + \psi_v = \gamma_{uv}(B, x_{vl}), \quad l = 1, \ldots, N_v \tag{10}$$
Here $\gamma_{uu}(x_{ui}, x_{uk})$ is the semi-variogram of variable u between sites i and k, and
$\gamma_{uv}(x_{uk}, x_{vj})$ is the cross semi-variogram between variables u and v at sites k and j.
Finally, $\gamma_{uv}(B, x_{vl})$ is the cross semi-variogram between variables u and v at block B
and site l, and $\psi_u$ and $\psi_v$ are the Lagrange multipliers associated with constraints (6) and (7).
Using this method, there are $N_u + N_v + 2$ equations and $N_u + N_v + 2$ variables,
which can be easily solved by linear algebra. After obtaining the parameters $\lambda_{ui}$
and $\lambda_{vj}$, $\hat{z}_u(B)$ may be estimated using Eq. (5). The cokriging variance can be obtained
as a byproduct of the cokriging process as follows:
$$\sigma_u^2(B) = \sum_{i=1}^{N_u} \lambda_{ui}\,\gamma_{uu}(B, x_{ui}) + \sum_{j=1}^{N_v} \lambda_{vj}\,\gamma_{uv}(B, x_{vj}) + \psi_u - \gamma_{uu}(B, B) \tag{11}$$
Matrix formulations of these equations can be found in Myers (1982), McBratney
and Webster (1983), and Aboufirassi and Marino (1984). Details on solving this
problem using Lagrangian techniques are given in Vauclin et al. (1983) and Atkinson
et al. (1992).
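To make the structure of the linear system concrete, the following sketch assembles and solves Eqs. (6), (7), (9) and (10) for a handful of points. It is not the authors' implementation, and several simplifications are assumed: the block B is replaced by a single target point x0 (point support), the sample coordinates are invented, the variograms are the exponential models with the Table 2 coefficients, and a small hand-written Gaussian elimination stands in for a linear algebra library.

```python
import math


def exp_variogram(h, c0, c1, r):
    """Exponential model: 0 at h = 0, else c0 + c1 * (1 - exp(-h / r))."""
    return 0.0 if h == 0 else c0 + c1 * (1.0 - math.exp(-h / r))


# Table 2 coefficients (u: sqrt of population density, v: impervious fraction).
def g_uu(h): return exp_variogram(h, 0.196, 0.176, 1000.0)
def g_vv(h): return exp_variogram(h, 0.007, 0.0089, 1000.0)
def g_uv(h): return exp_variogram(h, 0.012, 0.030, 1000.0)


def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])


def solve(a, b):
    """Gaussian elimination with partial pivoting for a x = b."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            f = m[row][col] / m[col][col]
            for c in range(col, n + 1):
                m[row][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def cokrige(xu, xv, x0):
    """Return (weights for u, weights for v, cokriging variance) at x0."""
    nu, nv = len(xu), len(xv)
    n = nu + nv + 2
    a = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    for k in range(nu):                       # Eq. (9), one row per u sample
        for i in range(nu):
            a[k][i] = g_uu(dist(xu[i], xu[k]))
        for j in range(nv):
            a[k][nu + j] = g_uv(dist(xu[k], xv[j]))
        a[k][nu + nv] = 1.0                   # multiplier psi_u
        b[k] = g_uu(dist(x0, xu[k]))
    for l in range(nv):                       # Eq. (10), one row per v sample
        for i in range(nu):
            a[nu + l][i] = g_uv(dist(xu[i], xv[l]))
        for j in range(nv):
            a[nu + l][nu + j] = g_vv(dist(xv[j], xv[l]))
        a[nu + l][nu + nv + 1] = 1.0          # multiplier psi_v
        b[nu + l] = g_uv(dist(x0, xv[l]))
    for i in range(nu):                       # Eq. (6): u weights sum to 1
        a[nu + nv][i] = 1.0
    b[nu + nv] = 1.0
    for j in range(nv):                       # Eq. (7): v weights sum to 0
        a[nu + nv + 1][nu + j] = 1.0
    sol = solve(a, b)
    lam_u, lam_v, psi_u = sol[:nu], sol[nu:nu + nv], sol[nu + nv]
    # Eq. (11); the term gamma_uu(x0, x0) vanishes for point support.
    variance = (sum(w * g for w, g in zip(lam_u, b[:nu]))
                + sum(w * g for w, g in zip(lam_v, b[nu:nu + nv])) + psi_u)
    return lam_u, lam_v, variance


# Hypothetical coordinates in metres: few u samples, more v samples.
xu = [(0.0, 0.0), (600.0, 0.0), (0.0, 800.0)]
xv = xu + [(300.0, 300.0), (500.0, 600.0)]
lam_u, lam_v, variance = cokrige(xu, xv, (300.0, 400.0))
print(sum(lam_u), sum(lam_v), variance)
```

The two printed sums should equal 1 and 0 up to rounding, which is simply a restatement of constraints (6) and (7). A block estimate as used in the paper would replace the point-to-sample variogram values on the right-hand side with averages over the block.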
4.2. Variogram estimation
From Eqs. (6), (7), (9) and (10), it is clear that the parameters $\lambda_{ui}$ and $\lambda_{vj}$ are
dependent on the variograms associated with variables u and v, their cross-variogram, and
block size. In this study, block size is defined to be the same as the TM image resolution
(30 m by 30 m). Therefore, once the variograms and cross-variogram have
been derived, cokriging is a straightforward process (Atkinson et al., 1992; Atkinson,
Webster, & Curran, 1994). In practice, the variograms are typically estimated using
sampling points as follows:
$$\gamma(h) = \frac{1}{2N(h)} \sum_{i=1}^{N(h)} \{z(x_i) - z(x_i + h)\}^2 \tag{12}$$
where $z(x_i)$ are known values of variable u or v at sampling point $x_i$, and N(h) is the
number of sampling point pairs separated by lag h. Similarly, the cross-variogram
can be estimated as follows:
$$\gamma_{uv}(h) = \frac{1}{2N(h)} \sum_{i=1}^{N(h)} \{z_u(x_i) - z_u(x_i + h)\}\{z_v(x_i) - z_v(x_i + h)\} \tag{13}$$
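The two estimators in Eqs. (12) and (13) are easy to illustrate on a toy data set. The sketch below assumes a regular one-dimensional transect on which both variables are observed at the same points, so that a lag is simply an integer number of sampling intervals; the function name and the sample values are invented for illustration.

```python
def experimental_variograms(zu, zv, lag):
    """Eqs. (12) and (13) for two co-located series on a regular transect.

    `lag` is an integer number of sampling intervals. Returns the
    experimental variograms of u and v and their cross-variogram.
    """
    if len(zu) != len(zv):
        raise ValueError("both variables must share the same sample points")
    pairs = range(len(zu) - lag)
    n_h = len(pairs)  # number of point pairs separated by the lag
    if n_h == 0:
        raise ValueError("lag is too large for this transect")
    g_u = sum((zu[i] - zu[i + lag]) ** 2 for i in pairs) / (2 * n_h)
    g_v = sum((zv[i] - zv[i + lag]) ** 2 for i in pairs) / (2 * n_h)
    g_uv = sum((zu[i] - zu[i + lag]) * (zv[i] - zv[i + lag])
               for i in pairs) / (2 * n_h)
    return g_u, g_v, g_uv


zu = [1.0, 2.0, 4.0, 3.0, 5.0]   # hypothetical primary variable
zv = [0.1, 0.2, 0.5, 0.4, 0.6]   # hypothetical secondary variable
g_u, g_v, g_uv = experimental_variograms(zu, zv, lag=1)
print(g_u, g_v, g_uv)
```

For irregularly located samples, as in this study, point pairs would instead be grouped into distance classes before averaging.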
After obtaining the variogram and cross-variogram, a theoretical model is needed to
fit them. Such a model needs to be positive definite and coregionalized to ensure the
cokriging variance is non-negative. More discussion about choosing theoretical func-
tions can be found in McBratney and Webster (1986) and Curran (1988). In this
study, we chose the model satisfying the positive definite and coregionalized require-
ments, the details of which are discussed later in this paper.
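One common way to check such a requirement, assuming that the three fitted functions form a linear model of coregionalization (the same nugget and exponential structures with different coefficients), is to verify that each 2 by 2 matrix of coefficients is positive semi-definite. The sketch below applies this test to the coefficients that the paper reports in Table 2; whether the authors performed exactly this check is not stated.

```python
import math

# (nugget C0, partial sill C1) for each fitted function, from Table 2.
uu = (0.196, 0.176)    # square root of population density
vv = (0.007, 0.0089)   # residential impervious surface fraction
uv = (0.012, 0.030)    # cross-variogram


def is_valid_lmc(uu, vv, uv):
    """True if every coregionalization matrix [[a, c], [c, b]] is PSD.

    For a 2 x 2 symmetric matrix this reduces to a >= 0, b >= 0 and
    |c| <= sqrt(a * b), checked once per structure (nugget, sill).
    """
    return all(a >= 0 and b >= 0 and abs(c) <= math.sqrt(a * b)
               for a, b, c in zip(uu, vv, uv))


valid = is_valid_lmc(uu, vv, uv)
print(valid)
```

Both structures pass: 0.012 is below sqrt(0.196 × 0.007) ≈ 0.037 and 0.030 is below sqrt(0.176 × 0.0089) ≈ 0.040.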
4.3. Interpolating population density using cokriging
In this study population density is considered the primary variable to be esti-
mated. In addition, residential impervious surface fraction is considered a secondary
variable used to increase estimation accuracy. One issue is that reported census sta-
tistics are not based on a sampling point, but rather on an areal unit like a block. The
centroid of a census block may be used as the sampling point for the assignment of
population density. However, this method is not realistic because there may not
actually be people at the centroid of a block. Martin (1989) solved this problem
by using a population-weighted point as the representative point of a census block.
In a similar manner, in this research the central point of the pixel whose impervious
surface fraction is approximately equal to the block mean is used as a population-weighted
block point. In addition, we assign impervious surface fraction of the pixel
and average population density of the block to this sampling point. After obtaining
the impervious surface fraction and population density on these samples, the characteristics
of the data are explored. If they are not second-order stationary, i.e. do not have a
constant mean and variance, the accuracy of the estimated experimental variogram and
associated cokriging will be degraded (Cressie, 1993). The histograms for population
density (see Fig. 8a) and impervious surface fraction (see Fig. 9) were computed based
on the sampling points. It is clear that population density is highly positively skewed
and may be approximated by a Poisson function with its variance proportional to its
mean value (Bailey & Gatrell, 1995; Harvey, 2002). A square root transformation
was performed on population density to stabilize its variance. The histogram of
the transformed population density (see Fig. 8b) shows that its distribution is near
normal and its variance is approximately constant. The histogram of impervious sur-
face fraction is slightly negatively skewed, but may be considered approximately nor-
mal. Thus, no transformation was conducted on impervious surface fraction. We
excluded zero population density census blocks because no interpolation is necessary
for these blocks.
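The construction of one sampling point per block, as described above, might look as follows. The data layout (a list of pixel tuples per block) and the function name are assumptions made for this sketch; only the selection rule, the square root transformation and the exclusion of zero-density blocks come from the text.

```python
import math


def block_sample(pixels, block_density):
    """Build the sampling point of one census block, or None if unpopulated.

    `pixels` lists (x, y, impervious_fraction) for the residential pixels
    of the block; `block_density` is its average population density.
    """
    if block_density == 0 or not pixels:
        return None  # zero-density blocks are excluded from interpolation
    mean_frac = sum(p[2] for p in pixels) / len(pixels)
    # Representative pixel: impervious fraction closest to the block mean.
    x, y, frac = min(pixels, key=lambda p: abs(p[2] - mean_frac))
    return {"x": x, "y": y,
            "v": frac,                        # secondary variable
            "u": math.sqrt(block_density)}    # primary variable (transformed)


# Hypothetical block with four 30 m pixels (centre coordinates in metres).
pixels = [(15, 15, 0.20), (45, 15, 0.50), (15, 45, 0.80), (45, 45, 0.45)]
sample = block_sample(pixels, block_density=9.0)
print(sample)
```

Here the block mean fraction is 0.4875, so the pixel with fraction 0.50 is selected and the transformed density is 3.0.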
In this study, the primary variable u is the square root of population density, and
the secondary variable v is impervious surface fraction. Experimental variograms
and cross-variograms were calculated using Eqs. (3) and (4). Gstat software was utilized
to fit these variograms to theoretical functions (Pebesma & Wesseling, 1998).
The weighted least squares method and visualization were applied in modeling the
experimental variograms (Cressie, 1985). Directional variograms were also computed
and no obvious anisotropies were found. Therefore, the variograms were assumed
to be isotropic and were fitted using an exponential model of the following
form:
$$\gamma(h) = \begin{cases} C_0 + C_1\{1 - e^{-h/r}\} & \text{for } h > 0 \\ 0 & \text{for } h = 0 \end{cases} \tag{14}$$
Here C0 is the nugget representing unexplained variance and r defines the spatial
scale of the variation. The model approaches its sill, C0 + C1, asymptotically; in practice,
it reaches C0 + 0.95C1 at a lag of 3r. In this
study, the parameters were calculated for the variograms of the square root of population
density and impervious surface fraction, and also for their cross-variogram
(see Table 2 and Fig. 10).
Fig. 8. Histogram of (a) population density and (b) square root of population density at sampling points.
It shows that population density may be described by a Poisson distribution, while the square root
transformation is a reasonable approximation of a normal distribution.
Fig. 9. Histogram of impervious surface fraction at sampling points.
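The exponential model of Eq. (14) can be evaluated directly with the fitted coefficients from Table 2. The short sketch below, with names of our own choosing, confirms that at a lag of 3r the structured part of each function has reached about 95% of C1.

```python
import math


def exponential_variogram(h, c0, c1, r):
    """Eq. (14): nugget c0, partial sill c1, distance parameter r."""
    if h < 0:
        raise ValueError("lag distance must be non-negative")
    return 0.0 if h == 0 else c0 + c1 * (1.0 - math.exp(-h / r))


# (C0, C1, r) as reported in Table 2.
TABLE_2 = {
    "sqrt population density": (0.196, 0.176, 1000.0),
    "impervious surface": (0.007, 0.0089, 1000.0),
    "cross-variogram": (0.012, 0.030, 1000.0),
}

reached = {}
for name, (c0, c1, r) in TABLE_2.items():
    value = exponential_variogram(3 * r, c0, c1, r)
    reached[name] = (value - c0) / c1  # share of C1 reached at h = 3r
    print(f"{name}: gamma(3r) = {value:.4f}, share of C1 = {reached[name]:.3f}")
```

Because 1 − e^(−3) ≈ 0.950, the share is the same for all three functions regardless of their coefficients.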
After obtaining the variograms of impervious surface fraction, square root of
population density, and their cross-variogram, a block cokriging was performed to
interpolate population density (see Fig. 11) using Gstat software embedded in
Idrisi (Harmon, 2002). Fig. 11 shows a clear geographical pattern of population
distribution in the study region. In particular, few people live in the CBD except
group-quarter populations. High-density household-based populations are adjacent
to the CBD in the southern and northwestern portions of the study region. Moreover,
low-density household-based populations reside relatively far away from the
CBD (in the eastern and southern portions).
Table 2
Coefficients of the theoretical variogram and cross-variogram functions
                                        C0      C1      r
Population density                      0.196   0.176   1000
Impervious surface                      0.007   0.0089  1000
Population density–impervious surface   0.012   0.030   1000
Fig. 10. Variograms of (a) square root of population density, (b) residential impervious surface fraction,
and (c) the cross-variogram between square root of population density and impervious surface fraction.
Exponential functions with r = 1000 are chosen to model these variograms.
5. Accuracy assessment
Using the cokriging variance approach defined in Eq. (11) for the square root of
population density, the mean cokriging variance is 23.5% (minimum of 21.3% and
maximum of 50.3%). Fig. 12 shows the distribution of cokriging variance in the
study area. In particular, cokriging variance is high along the study area boundary
because few samples are used in estimating population density in this portion of
the region.
It is possible to examine population count estimation accuracies at each census
zonal level using the root mean square error (ERMS) and coefficient of variation
(V) to evaluate the absolute and relative error as follows:
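$$E_{RMS} = \left[\frac{1}{n} \sum_{i=1}^{n} \left(P_i - \hat{P}_i\right)^2\right]^{1/2} \tag{15}$$
$$V = \frac{1}{P} \sum_{i=1}^{n} \left|P_i - \hat{P}_i\right| \tag{16}$$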
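Both error measures are simple to compute once census counts and estimated counts are available per zone. In the sketch below the function names and the sample counts are invented, and one point is an explicit assumption: the definition of P in Eq. (16) is not recoverable from this excerpt, so P is taken to be the total census population over the n zones, which makes V reduce to a plain relative error when the whole study area is treated as a single zone.

```python
import math


def rmse(census, estimated):
    """Eq. (15): root mean square error over n zones."""
    n = len(census)
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(census, estimated)) / n)


def relative_error(census, estimated):
    """Eq. (16), assuming P is the total census population of all zones."""
    total = sum(census)
    return sum(abs(p - q) for p, q in zip(census, estimated)) / total


# Hypothetical zone counts, not taken from the study.
census = [100, 50, 150]
estimated = [90, 60, 155]
e_rms = rmse(census, estimated)
v = relative_error(census, estimated)
print(f"E_RMS = {e_rms:.3f}, V = {100 * v:.1f}%")
```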
Fig. 11. Estimated population density using developed cokriging method. The height indicates the value
of population density for each TM pixel. The average population density is 4.28, with a maximum of 52,
and a minimum of 0.
residential areas within each census block. Applied to our study area, the model is as
follows (see Table 4):
$$\hat{P}^A_i = 6.5798\, I_{Li} + 9.4650\, I_{Hi} \tag{19}$$
where $I_{Li}$ and $I_{Hi}$ are the fractions of impervious surface in low and high residential
areas in a census block and $\hat{P}^A_i$ is the expected population density in a census block
using this alternative regression approach. In both regression models, the area of
each census block was chosen as a weighting factor to reduce the effects of zone size.
Moreover, the intercepts in these regression models are not included because they are
not statistically significant (furthermore, their meaning in population estimation is not clear).
The explanatory variables are statistically significant (p ≤ 0.0001), which shows the
strong correlation between population density and the chosen explanatory variables
(see Tables 3 and 4).
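A regression of this kind, with two explanatory variables, no intercept and block area as weight, can be fitted by solving a 2 by 2 system of normal equations. The sketch below is illustrative only: the function name and the synthetic data are ours, and applying the area weights to the squared residuals is an assumption about how the weighting was done.

```python
def fit_weighted_no_intercept(x1, x2, y, w):
    """Weighted least squares for y = b1 * x1 + b2 * x2 (no intercept)."""
    s11 = sum(wi * a * a for wi, a in zip(w, x1))
    s22 = sum(wi * b * b for wi, b in zip(w, x2))
    s12 = sum(wi * a * b for wi, a, b in zip(w, x1, x2))
    s1y = sum(wi * a * yi for wi, a, yi in zip(w, x1, y))
    s2y = sum(wi * b * yi for wi, b, yi in zip(w, x2, y))
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("explanatory variables are collinear")
    # Cramer's rule on the normal equations.
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    return b1, b2


# Synthetic blocks generated from Eq. (19) itself, so the fit should
# recover the reported coefficients exactly (up to rounding).
i_low = [0.2, 0.0, 0.4, 0.1]
i_high = [0.0, 0.5, 0.1, 0.3]
area = [1.0, 2.0, 1.5, 0.5]
density = [6.5798 * a + 9.4650 * b for a, b in zip(i_low, i_high)]
b_low, b_high = fit_weighted_no_intercept(i_low, i_high, density, area)
print(round(b_low, 4), round(b_high, 4))
```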
Comparative results (see Table 5) show that the cokriging method is the most
accurate. In particular, the coefficient of variation is relatively low at the census
block level (34.7%), low at the block group and tract levels (15.2% and 10.2% respectively),
and near zero for the entire study area (0.3%). The estimation accuracies of
the two regression models are reported in Table 5 as well. Neither regression model
performs as well as the cokriging method in terms of estimation accuracy. As an
example, the coefficients of variation for the census tract level in the regression mod-
els are 22.9% and 21.0% respectively, substantially higher than the variation ob-
tained using cokriging (10.2%). Comparing the two regression models, regression
with impervious surface fraction is slightly better than with land use classes (e.g.
21.0% vs. 22.9% estimate error at the census tract level). This result is consistent with
the literature showing that impervious surface fraction performs better than land
use/cover in urban analysis (Ji & Jensen, 1999).
Table 4
Coefficients of the regression model with residential impervious surface fraction as explanatory variables
Coefficients   Value    Std. error   t value   Pr(>|t|)
IL             6.5798   0.4335       15.1793   0.0000
IH             9.4650   0.1687       56.1212   0.0000
Table 5
Absolute and relative estimation errors of the cokriging and regression models
Zones               Average      Cokriging        Regression with    Regression with
                    population                    land cover         impervious surface
                                 ERMS    V        ERMS    V          ERMS    V
Block (2445)        40.99        45.3    34.7%    47.9    48.8%      45.5    46.6%
Block group (125)   801.74       215.0   15.2%    325.6   27.8%      290.7   25.2%
Tract (36)          2825.84      411.0   10.2%    967.6   22.9%      846.0   21.0%
Total study area    100,200      -       0.3%     -       1.0%       -       2.6%
6. Population density adjustment
The cokriging approach gives unbiased estimates for the square root of population
density with minimum variance. However, the population count estimation errors
evaluated at the census block level are still somewhat large (34.7%). As discussed
in previous studies (Langford & Unwin, 1994; Fisher & Langford, 1995; Martin,
1996), interpolation methods should preserve population counts in each reporting
zone. One option is to add a volume-preserving constraint to the cokriging model.
However, this would make the model more complex, since it would have a quadratic objective
function and a quadratic regional constraint. In fact, it is not clear that the resulting
model can be solved, exactly or heuristically. An alternative option is to rescale the
population estimates on every pixel to satisfy this zonal constraint:
$$P_{ij} = \hat{P}_{ij} \, \frac{P_i}{\hat{P}_i} \qquad (20)$$
Here $P_{ij}$ is the rescaled population estimate of pixel j in census block i, $\hat{P}_{ij}$ is the
population estimate obtained through cokriging, and $P_i$ and $\hat{P}_i$ are the population counts
of block i (census count and cokriging estimate, respectively). This rescaled population
density (see Fig. 13) generally maintains the estimates obtained using cokriging,
but emphasizes local variation as well. For example, the cokriging method
tends to underestimate population counts in multi-story and high-rise buildings
(the middle portion of Fig. 11). In contrast, the rescaling approach adjusts these
inaccuracies and obtains more accurate population density estimates.
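The rescaling in Eq. (20) can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function name `rescale_to_zone_counts`, the input layout (one estimate and one block identifier per pixel, plus a dictionary of census counts), and the handling of blocks whose estimates sum to zero are all assumptions made for the example.

```python
import numpy as np


def rescale_to_zone_counts(pixel_estimates, zone_ids, census_counts):
    """Rescale pixel estimates so each zone sums to its census count (Eq. 20).

    pixel_estimates: cokriging estimate for each pixel (P_hat_ij)
    zone_ids:        census block identifier of each pixel
    census_counts:   dict mapping block identifier -> census count (P_i)
    """
    pixel_estimates = np.asarray(pixel_estimates, dtype=float)
    zones, inverse = np.unique(np.asarray(zone_ids), return_inverse=True)
    inverse = inverse.ravel()
    # P_hat_i: sum of the pixel estimates within each block.
    estimated_totals = np.bincount(inverse, weights=pixel_estimates)
    census_totals = np.array([census_counts[z] for z in zones.tolist()], dtype=float)
    # Ratio P_i / P_hat_i. Blocks whose estimates sum to zero keep a ratio of
    # zero; this fallback is an assumption, the paper does not discuss the case.
    ratio = np.divide(census_totals, estimated_totals,
                      out=np.zeros_like(census_totals),
                      where=estimated_totals > 0)
    return pixel_estimates * ratio[inverse]


# Hypothetical toy data: five pixels in two blocks.
estimates = [2.0, 3.0, 5.0, 1.0, 1.0]
blocks = [1, 1, 1, 2, 2]
counts = {1: 20, 2: 1}
rescaled = rescale_to_zone_counts(estimates, blocks, counts)
print(rescaled)
```

In the toy data, block 1 is underestimated (10 against a census count of 20), so its pixels are doubled, while block 2 is overestimated and its pixels are halved. The relative pattern within each block is unchanged.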
Fig. 13. Adjusted population density that preserves zonal population counts. The height indicates the
value of population density for each TM pixel. The average population density is 4.40, with a maximum of
143, and a minimum of 0.
7. Conclusion
In this paper a cokriging method was developed for interpolating residential population
density using census count data and impervious surface fraction. The results
are clearly better than those of regression-based interpolation approaches. In particular, the
relative population estimation error for the entire study area is 0.3%, which is better
than the results obtained using regression methods (1.0%–2.6% estimation error).
Moreover, the estimation errors at the census block group and tract levels (15.2%
and 10.2% respectively) are roughly 10–13 percentage points lower than those calculated using
regression models (about 25–28% and 21–23% respectively). At the census block level, the
estimation error is about 12–14 percentage points lower than those reported for the regression
models (see Table 5). These results demonstrate that cokriging applied to residential impervious
surface fraction is a superior alternative to traditional regression-based interpolation
approaches using land use and land cover data.
One reason cokriging performs well is that it addresses spatial
autocorrelation and cross-autocorrelation associated with the distribution of people
in urban areas. Instead of ignoring spatial dependence, it models the spatial autocor-
relation of population and impervious surface fraction through variograms, and ap-
plies them in population interpolation. Moreover, unlike other interpolation
methods, it provides estimation variance (see Eq. (11) and Fig. 12) at the TM pixel
level (30 by 30 meter). This estimation variance is an important tool for assessing
population estimation error, without aggregating to census reporting zones.
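To make the role of variograms concrete, the following sketch computes an empirical cross-variogram by binning pairwise distances. It uses the standard estimator, half the mean product of paired differences within each distance bin, and reduces to the ordinary semivariogram when the same variable is passed twice. The function name, the binning scheme, and the toy data are assumptions for illustration; the fitting of variogram models used in the study is not reproduced here.

```python
import numpy as np


def empirical_cross_variogram(coords, z1, z2, bin_edges):
    """Empirical cross-variogram of z1 and z2 over distance bins.

    For each bin, returns 0.5 * mean[(z1_i - z1_j) * (z2_i - z2_j)] over all
    point pairs whose separation falls in the bin (NaN for empty bins).
    Passing z2 = z1 gives the ordinary semivariogram of z1.
    """
    coords = np.asarray(coords, dtype=float)
    z1 = np.asarray(z1, dtype=float)
    z2 = np.asarray(z2, dtype=float)
    # Each unordered pair of points exactly once.
    i, j = np.triu_indices(len(coords), k=1)
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    products = (z1[i] - z1[j]) * (z2[i] - z2[j])
    gamma = np.full(len(bin_edges) - 1, np.nan)
    for b in range(len(bin_edges) - 1):
        in_bin = (dist >= bin_edges[b]) & (dist < bin_edges[b + 1])
        if in_bin.any():
            gamma[b] = 0.5 * products[in_bin].mean()
    return gamma


# Hypothetical toy data: three points on a line, one unit apart.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
density = [0.0, 1.0, 2.0]      # stand-in for (transformed) population density
impervious = [0.0, 2.0, 4.0]   # stand-in for impervious surface fraction
edges = [0.5, 1.5, 2.5]
semivariogram = empirical_cross_variogram(points, density, density, edges)
cross_variogram = empirical_cross_variogram(points, density, impervious, edges)
print(semivariogram, cross_variogram)
```

The pairwise approach needs memory proportional to the square of the number of points, so for an image of TM pixels a sampled subset of locations would be needed in practice.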
Another interesting aspect of this work is that residential impervious surface frac-
tion was found to be an effective replacement for land use and land cover data typ-
ically used in modeling population density. This makes sense intuitively given that
impervious surface fraction is closely related to housing development, and thus pop-
ulation density. Moreover, the cross-variogram (see Fig. 10c and Table 2) clearly
shows that population density and impervious surface fraction are co-regionalized
variables, with only 25% variance unexplained. Also, regression analyses show that
the regression model with impervious surface fraction consistently performs better
than the other utilizing land use classes.
A final point is that the obtained population estimates are essential for urban
planning applications. As an example, in sustainability studies, residential population
density is a primary indicator of automobile-dependent regions (Harris & Longley,
2000). In addition, the estimates of population density may be utilized in
transportation analyses. The traffic analysis zone (TAZ) is typically used as a basic
unit in traffic demand estimation and trip generation. However, there are significant
problems with traditional TAZ definitions as well as difficulties with the associated travel
distance calculation (Daganzo, 1980; Miller, 1999). Detailed population information
may be helpful in redefining TAZs in order to achieve more
homogeneous population densities and socio-economic characteristics, thus potentially
eliminating the modifiable areal unit problem in a range of transportation analysis
approaches.
While the developed approach is a considerable improvement for estimating population
density at a fine scale, there are potential improvements that may be worth
exploring. One improvement would be satisfying the volume-preserving constraint
during the interpolation process, requiring that interpolated population counts in
every census zone be equal to observed counts. In this study, we satisfied this constraint
by rescaling population density in every pixel after interpolation. Although
the population counts in every census zone are maintained, this adjustment may
introduce bias and increase estimation variance. More sophisticated models might
increase population density estimation accuracy and maintain population counts
in every census zone simultaneously.
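The volume-preserving constraint can be written out explicitly. The short derivation below is a sketch under one assumption that the text implies but does not state outright: that the block-level cokriging estimate $\hat{P}_i$ is the sum of the pixel-level estimates $\hat{P}_{ij}$ within block i, and that it is nonzero.

```latex
\sum_{j \in i} P_{ij}
  = \sum_{j \in i} \hat{P}_{ij}\,\frac{P_i}{\hat{P}_i}
  = \frac{P_i}{\hat{P}_i} \sum_{j \in i} \hat{P}_{ij}
  = \frac{P_i}{\hat{P}_i}\,\hat{P}_i
  = P_i
```

Under that assumption the rescaling of Eq. (20) meets the constraint exactly in every zone. It does so after interpolation rather than within the cokriging optimization, so the unbiasedness and minimum-variance properties of the cokriging estimates are not guaranteed to carry over to the adjusted values.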
References
Aboufirassi, M., & Marino, M. A. (1984). Cokriging of aquifer transmissivities from field measurements of
transmissivity and specific capacity. Mathematical Geology, 16(1), 19–35.
Atkinson, P. M., Webster, R., & Curran, P. J. (1992). Cokriging with ground-based radiometry. Remote
Sensing of Environment, 41, 45–60.
Atkinson, P. M., Webster, R., & Curran, P. J. (1994). Cokriging with airborne MSS imagery. Remote
Sensing of Environment, 50, 335–345.
Bailey, T., & Gatrell, A. C. (1995). Chapter 7: The analysis of area data. Interactive Spatial Data Analysis,
Longman Group Limited.
Beguin, H., Thomas, I., & Vandenbussche, D. (1992). Weight variation with a set of demand points, and
location–allocation issues: A case study of public libraries. Environment and Planning A, 24, 1769–1779.
Bracken, I. (1993). An extensive surface model database for population related information: Concept and
application. Environment and Planning B, 20, 13–27.
Chen, K. (2002). An approach to linking remotely sensed data and areal census data. International Journal
of Remote Sensing, 23, 37–48.
Cressie, N. (1985). Fitting variogram models by weighted least squares. Mathematical Geology, 17,
563–586.
Cressie, N. (1993). Statistics for spatial data (revised edition). New York: Wiley.
Curran, P. J. (1988). The semivariogram in remote sensing: An introduction. Remote Sensing of
Environment, 24, 493–507.
Curran, P. J. (2001). Remote sensing: Using the spatial domain. Environmental and Ecological Statistics, 8,
331–344.
Curran, P. J., & Atkinson, P. M. (1998). Geostatistics and remote sensing. Progress in Physical Geography,
22(1), 61–78.
Daganzo, C. F. (1980). Network representation, continuum approximations and a solution to the spatial
aggregation problem of traffic assignment. Transportation Research, 14B, 229–239.
ERDAS Imagine (1997). ERDAS Imagine tour guides (4th ed.). Atlanta, Georgia: ERDAS, Inc.
Fisher, P. F., & Langford, M. (1995). Modeling the errors in areal interpolation between zonal systems by
Monte Carlo simulation. Environment and Planning A, 27, 211–224.
Fotheringham, A. S., & Wong, D. W. S. (1991). The modifiable areal unit problem in multivariate
statistical analysis. Environment and Planning A, 23, 1025–1034.
Franklin County Auditor (2002). Franklin county auditor's interactive geographic information system.
<http://209.51.193.83/search.html>.
Goodchild, M. F., Anselin, L., & Deichmann, U. (1993). A framework for the areal interpolation of
socioeconomic data. Environment and Planning A, 25, 383–397.
Green, A. A., Berman, M., Switzer, P., & Craig, M. D. (1988). A transformation for ordering multispectral
data in terms of image quality with implications for noise removal. IEEE Transactions on Geoscience
and Remote Sensing, 26, 65–74.
Griffith, D. A. (1993). Spatial regression analysis on the PC: Spatial statistics using SAS. Washington, DC:
Association of American Geographers.
Griffith, D. A., & Can, A. (1996). Spatial statistical/econometric version of simple urban population
density models. In S. L. Arlinghaus & D. A. Griffith (Eds.), Practical handbook of spatial statistics.
CRC Press.
Harmon, D. (2002). Quick Take Reviews: Idrisi32 Release 2. GEOWorld, March, pp. 50–51.
Harris, R. J., & Longley, P. A. (2000). New data and approaches for urban analysis: Modeling residential
densities. Transactions in GIS, 4(3), 217–234.
Harvey, J. T. (2002). Estimating census district populations from satellite imagery: Some approaches and
limitations. International Journal of Remote Sensing, 23(10), 2071–2095.
Jensen, J. R. (1983). Biophysical remote sensing. Annals of the Association of American Geographers, 73,
111–132.
Ji, M., & Jensen, J. R. (1999). Effectiveness of subpixel analysis in detecting and quantifying urban
imperviousness from Landsat Thematic Mapper imagery. Geocarto International, 14(4), 31–39.
Journel, A. G., & Huijbregts, C. J. (1978). Mining geostatistics. New York: Academic Press.
Lam, N. S. (1983). Spatial interpolation methods: A review. American Cartographer, 10(2), 129–149.
Langford, M., Maguire, D. J., & Unwin, D. J. (1991). The areal interpolation problem: Estimating
population using remote sensing in a GIS framework. In I. Masser & M. Blakemore (Eds.), Handling
geographical information: Methodology and potential applications (pp. 55–77). Harlow, Essex:
Longman.
Langford, M., & Unwin, D. J. (1994). Generating and mapping population density surfaces within a
geographical information system. Cartographic Journal, 31, 21–26.
Lee, J. B., Woodyatt, A. S., & Berman, M. (1990). Enhancement of high spectral resolution remote sensing
data by a noise-adjusted principal components transformation. IEEE Transactions on Geoscience and
Remote Sensing, 28, 295–304.
Lo, C. P. (1995). Automated population and dwelling unit estimation from high-resolution satellite
images: A GIS approach. International Journal of Remote Sensing, 16(1), 17–34.
Longley, P., & Clarke, G. (1995). GIS for business and service planning. Cambridge: GeoInformation
International.
Martin, D. (1989). Mapping population data from zone centroid locations. Transactions of the Institute of
British Geographers, 14, 90–97.
Martin, D. (1996). An assessment of surface and zonal models of population. International Journal of
Geographical Information Systems, 10(8), 973–989.
Martin, D., & Williams, H. C. W. L. (1992). Market-area analysis and accessibility to primary health-care
centers. Environment and Planning A, 24, 1009–1019.
McBratney, A. B., & Webster, R. (1983). Optimal interpolation and isarithmic mapping of soil properties:
5. Co-regionalization and multiple sampling strategy. Journal of Soil Science, 34(1), 137–162.
McBratney, A. B., & Webster, R. (1986). Choosing functions for semi-variograms of soil properties and
fitting them to sampling estimates. Journal of Soil Science, 37, 617–639.
Mesev, V. (1998). The use of census data in urban image classification. Photogrammetric Engineering and
Remote Sensing, 64, 431–438.
Mid-Ohio Regional Planning Commission (MORPC) (2002). GIS technology.
<http://www.morpcsoft.org/GIS/gis.htm>.
Miller, H. J. (1999). Potential contributions of spatial analysis to geographical information systems for
transportation (GIS-T). Geographical Analysis, 31(4), 373–399.
Moon, Z. K., & Farmer, F. L. (2001). Population density surface: A new approach to an old problem.
Society and Natural Resources, 14, 39–49.
Multi-Resolution Land Characteristics Consortium (MRLC) (2002). National land cover data (NLCD).
<http://www.epa.gov/mrlc/nlcd.html>.
Myers, D. E. (1982). Matrix formulation of co-kriging. Mathematical Geology, 14(3), 250–257.
Ohio Geographically Referenced Information Program (OGRIP) (1999). Digital orthophoto quarter-
quadrangles.
Okabe, A., & Sadahiro, Y. (1997). Variation in count data transferred from a set of irregular zones to a set
of regular zones through the point-in-polygon method. International Journal of Geographical
Information Science, 11(1), 93–106.
Oliver, M., Webster, R., & Gerrard, J. (1989a). Geostatistics in physical geography, Part I: Theory.
Transactions of the Institute of British Geographers, 14, 259–269.
Oliver, M., Webster, R., & Gerrard, J. (1989b). Geostatistics in physical geography, Part II: Applications.
Transactions of the Institute of British Geographers, 14, 270–286.
Openshaw, S. (1977). Optimal zoning systems for spatial interaction models. Environment and Planning A,
9, 169–184.
Pebesma, E. J., & Wesseling, C. G. (1998). Gstat: A program for geostatistical modeling, prediction and
simulation. Computers and Geosciences, 24(1), 17–31.
Phinn, S., Stanford, M., Scarth, P., Murray, A. T., & Shyy, T. (2002). Monitoring the composition and
form of urban environments based on the vegetation–impervious surface–soil (VIS) model by sub-pixel
analysis techniques. International Journal of Remote Sensing, 23, 4131–4153.
Plane, D. A., & Rogerson, P. A. (1994). The geographical analysis of population with applications to
business and planning. New York: Wiley.
Rashed, T., Weeks, J. R., Gadalla, M. S., & Hill, A. G. (2001). Revealing the anatomy of cities through
spectral mixture analysis of multispectral satellite imagery: A case study of the Greater Cairo region,
Egypt. Geocarto International, 16(4), 5–15.
Ridd, M. K. (1995). Exploring a VIS (vegetation–impervious surface–soil) model for urban ecosystem
analysis through remote sensing: Comparative anatomy for cities. International Journal of Remote
Sensing, 16, 2165–2185.
Sadahiro, Y. (1999). Accuracy of areal interpolation: A comparison of alternative methods. Journal of
Geographical Systems, 1, 323–346.
Tobler, W. (1999). Linear pycnophylactic reallocation – comment on a paper by D. Martin. International
Journal of Geographical Information Science, 13(1), 85–90.
United States Census Bureau (2002). United States Census 2000.
<http://www.census.gov/main/www/cen2000.html>.
Vauclin, M., Vieira, S. R., Vachaud, G., & Nielsen, D. R. (1983). The use of cokriging with limited field
soil observations. Soil Science Society of America Journal, 47(2), 175–184.
Webster, R. (1985). Quantitative spatial analysis of soil in the field. Advances in Soil Science, 3, 1–70.
Webster, R., & Burgess, T. M. (1980). Optimal interpolation and isarithmic mapping of soil properties:
III. Changing drift and universal kriging. Journal of Soil Science, 31, 505–524.
Woodcock, C. E., Strahler, A. H., & Jupp, D. L. B. (1988). The use of variograms in remote sensing: I.
Scene models and simulated images. Remote Sensing of Environment, 25, 323–348.
Wu, C., & Murray, A. T. (2003). Estimating impervious surface distribution by spectral mixture analysis.
Remote Sensing of Environment, 84, 493–505.