
Population Density in Urban Areas

Apr 06, 2018


John Pissourios
  • 8/3/2019 Population Density in Urban Areas

    1/22

    A cokriging method for estimating population density in urban areas

    Changshan Wu a,*, Alan T. Murray b,1

    a Department of Geography, University of Wisconsin-Milwaukee, P.O. Box 413, Milwaukee, WI 53201-0413, USA

    b Department of Geography, The Ohio State University, Columbus, OH 43210-1361, USA

    Abstract

    Population information is typically available for analysis in aggregate socioeconomic

    reporting zones, such as census blocks in the United States and enumeration districts in the United Kingdom. However, such data mask underlying individual population distributions

    and may be incompatible with other information sources (e.g. school districts, transportation

    analysis zones, metropolitan statistical areas, etc.). Moreover, it is well known that there are

    potential significance issues associated with scale and reporting units, the modifiable areal unit

    problem (MAUP), when such data are used in analysis. This may lead to biased results in spa-

    tial modeling approaches. In this study, impervious surface fraction derived from Thematic

    Mapper (TM) imagery was applied to derive the underlying population of an urban region.

    A cokriging method was developed to interpolate population density by modeling the spatial

    correlation and cross-correlation of population and impervious surface fraction. Results sug-

    gest that population density can be accurately estimated using cokriging applied to impervious

    surface fraction. In particular, the relative population estimation error is 0.3% for the entire study area and 10–15% at block group and tract levels. Moreover, unlike other interpolation

    methods, cokriging gives estimation variance at the TM pixel level.

    © 2005 Elsevier Ltd. All rights reserved.

    0198-9715/$ - see front matter © 2005 Elsevier Ltd. All rights reserved.

    doi:10.1016/j.compenvurbsys.2005.01.006

    * Corresponding author. Tel.: +1 414 2294860; fax: +1 414 2293981.

    E-mail addresses: [email protected] (C. Wu), [email protected] (A.T. Murray).

    1 Tel.: +1 614 688 5441; fax: +1 614 292 6213.

    Computers, Environment and Urban Systems

    29 (2005) 558–579

    www.elsevier.com/locate/compenvurbsys


    Keywords: Population interpolation; Cokriging; Remote sensing

    1. Introduction

    The difficulties associated with the application of zone-based census population

    data in geographical analyses have been well documented in previous studies

    (Fotheringham & Wong, 1991; Martin, 1989, 1996). One important issue is data

    aggregation. In many applications, census data cannot sufficiently represent the

    underlying geographical distribution of population because it is reported through

    aggregating individual population counts in irregular areal units, which can be geo-

    graphically meaningless. This aggregation tends to smooth local variability and

    requires an assumption of uniformly distributed population within a reporting unit (Moon & Farmer, 2001). While there are legitimate reasons for reporting census

    information in this way (i.e. privacy of census respondents), business and service

    planning benefit substantially from greater resolution population data (Longley

    & Clarke, 1995). For example, Martin and Williams (1992) and Beguin, Thomas,

    and Vandenbussche (1992) emphasized the importance of detailed population

    information in the location analyses of health-care centers and public libraries.

    Moreover, in urban sustainability studies Harris and Longley (2000) point out that

    census-based models tend to overestimate residential area because of their coarse

    resolution.

    Another difficulty with zone-based population data is related to incompatible

    spatial information layers (Bracken, 1993; Goodchild, Anselin, & Deichmann,

    1993). Different departments and agencies collect and distribute data in varying zo-

    nal arrangements (e.g. school districts, transportation analysis zones, metropolitan

    statistical areas, etc.). As a consequence, a significant problem arises in regional

    analysis and modeling, in which multiple data sources must be integrated before

    analysis can be implemented (Goodchild et al., 1993). Moreover, the boundaries

    of areal units in census data are not data derived, but rather are the result of enu-

    meration and reporting. The modifiable areal unit problem (MAUP) may exist

    when utilizing such data in geographical applications. In particular, the relationship between variables may only be valid for one particular zonal arrangement

    and scale, potentially biasing results obtained in statistical and spatial analyses

    (Martin, 1996; Openshaw, 1977).

    One approach for dealing with the above problems is to transform aggregated

    census data to grid-based population estimates using areal interpolation (Langford,

    Maguire, & Unwin, 1991; Martin, 1989; Okabe & Sadahiro, 1997). Areal interpola-

    tion methods may be grouped into two categories: simple interpolation and intelli-

    gent interpolation (Okabe & Sadahiro, 1997). Simple interpolation involves

    transferring data from irregular polygons to regular grids without any supplementary data (Lam, 1983; Martin, 1996; Tobler, 1999). This method is preferred when fast computation is important or additional information is unavailable (Okabe &

    Sadahiro, 1997). In contrast, intelligent interpolation transfers data with the help

    C. Wu, A.T. Murray / Comput., Environ. and Urban Systems 29 (2005) 558–579 559


    of additional information (Harris & Longley, 2000; Langford et al., 1991). This

    method has proven more accurate than simple interpolation, although greater

    computational processing is required (Fisher & Langford, 1995; Sadahiro, 1999).

    Regression analyses supplemented with land use and land cover data are often applied in intelligent interpolation (Langford et al., 1991; Langford & Unwin, 1994).

    However, detailed biophysical information is usually lost in producing land use data

    from remotely sensed images (Jensen, 1983). As a result, limited land use types are

    too coarse for estimating detailed population density. Moreover, the basic assump-

    tions of regression analyses (e.g. spatial independence) are unlikely to be satisfied in

    geographical applications (Griffith & Can, 1996). Impervious surface fraction in res-

    idential areas may be useful for supplementing the developed interpolation process.

    Detailed information on residential areas can thus be maintained, providing clues on

    population distribution (Ji & Jensen, 1999). Spatial autocorrelation in impervious

    surface fraction and population, and the cross-correlation between these two spatial variables, are explored and modeled in this paper using geostatistical techniques.

    Based on modeled spatial relationships, cokriging is applied in this paper to deter-

    mine population density in Columbus, OH.

    The organization of this paper is as follows. Our study area and data sources

    are described in Section 2. The process of deriving impervious surface fraction

    in residential areas from remotely sensed imagery is described in Section 3. In par-

    ticular, we detail the creation of impervious surface fraction from ETM+ imagery

    for the entire study region and describe a procedure for delineating residential

    areas within this region. Population density estimation using cokriging combined

    with residential impervious surface fraction is reported in Section 4. Accuracy

    assessment of the population estimates is addressed in Section 5. Section 6 reports

    an adjustment of the population estimates. Finally, conclusions and discussion are

    provided in Section 7.

    2. Study area and data sources

    A portion of the Columbus metropolitan area in Franklin County, OH, USA was chosen as our study region for this research. This region is 47.4 km² and is divided

    into 36 tracts, 125 block groups, and 2445 blocks in the 2000 US Census (see Fig. 1).

    The 2000 Census data were acquired from the ESRI website in the shapefile format

    (United States Census Bureau, 2002). Landsat 7 ETM+ imagery, which was utilized

    to derive residential impervious surface fraction, was acquired on July 8, 1999. Addi-

    tional data, such as Digital Orthophoto Quarterquadrangles (DOQQs) from the

    Ohio Geographically Referenced Information Program (OGRIP, 1999) and Na-

    tional Land Cover Data (NLCD) from the Multi-Resolution Land Characteristics

    Consortium (Multi-Resolution Land Characteristics Consortium, 2002), were utilized to examine residential classification accuracy and select training samples. Moreover, parcel data from the Franklin County Auditor (2002) and address-based

    employment data from the Mid-Ohio Regional Planning Commission (MORPC,


    2002) were utilized to identify possible misclassified pixels since these data maintain

    detailed local information about land use and employment.

    3. Estimating impervious surface fraction in residential areas

    Impervious surface is any material prohibiting the infiltration of water into soil.

    As a major component of urban infrastructure, impervious surface has become a pri-

    mary variable in urban planning and environmental management (Ji & Jensen, 1999;

    Ridd, 1995). Impervious surface fraction, calculated as the proportion of impervious surface over a small area, has been found to reveal more information about built-up

    areas than land use and land cover classification (Ji & Jensen, 1999). For population

    Fig. 1. Study area as part of the Columbus metropolitan area in Franklin County, OH, USA (left) and

    Landsat ETM+ image acquired on July 8, 1999 for this area (right).


    estimation, as an example, impervious surface in residential areas generally corre-

    sponds to housing, which serves as an indicator of people.

    3.1. Impervious surface fraction estimation

    Methods for quantifying impervious surface from remotely sensed data are typi-

    cally based on either fuzzy classification or spectral mixture analysis (Ji & Jensen,

    1999; Phinn, Stanford, Scarth, Murray, & Shyy, 2002; Rashed, Weeks, Gadalla, &

    Hill, 2001). In this study, a spectral mixture analysis method was applied to estimate

    impervious surface fraction from an ETM+ image (Wu & Murray, 2003). Four end-

    members (see Fig. 2), vegetation, high albedo, low albedo and soil, were selected to

    represent heterogeneous urban land use and land cover through the analysis of the

    spectral feature spaces of a transformed ETM+ image using the maximum noise

    fraction (MNF) transformation, the details of which are given in Green, Berman, Switzer, and Craig (1988) and Lee, Woodyatt, and Berman (1990). Consequently,

    a fully constrained four-endmember linear mixing model was applied to calculate

    each endmember fraction from the Landsat ETM+ data (see Fig. 3). Furthermore,

    impervious surface fraction in each ETM+ pixel was modeled by adding the frac-

    tions of low albedo and high albedo endmembers after removing the effects of water

    and clouds (see Fig. 4).
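The mixing model described above can be summarized in standard spectral-mixture notation. This is only a sketch: the symbols $R_b$ (reflectance of a pixel in band $b$), $f_i$ (fraction of endmember $i$), $R_{i,b}$ (reflectance of endmember $i$ in band $b$) and $e_b$ (residual) are notation assumed here for illustration; the authors' exact formulation is the one given in Wu and Murray (2003).

```latex
% Fully constrained four-endmember linear mixing model (assumed notation)
R_b = \sum_{i=1}^{4} f_i \, R_{i,b} + e_b ,
\qquad \sum_{i=1}^{4} f_i = 1 ,
\qquad f_i \ge 0 .

% Impervious surface fraction of a pixel, after water and clouds are masked out
f_{\mathrm{imp}} = f_{\mathrm{low\ albedo}} + f_{\mathrm{high\ albedo}}
```

The sum-to-one and non-negativity constraints are what "fully constrained" refers to; the fractions are usually chosen to minimize the residual over all bands.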

    3.2. Residential area classification

    To this point we have detailed impervious surface fraction estimation for the

    entire study area. However, we know that population (the major interest in this

    Fig. 2. ETM+ reflectance spectra of selected endmembers. These endmembers were chosen by analyzing

    the spectral feature spaces of the MNF transformed ETM+ image.


    research) is generally restricted to residential areas. Therefore, it is necessary to identify residential land use within the study area. A maximum likelihood classification

    was applied to delineate residential pixels. Similar approaches have been utilized in

    classifying residential land uses by Lo (1995), Mesev (1998), and Chen (2002). Six

    classes, vegetation, soil, water, commercial and transportation, low density residen-

    tial, and high density residential, were specified in selecting training samples with the

    help of DOQQ data, NLCD data, and the original ETM+ image. The classification

    (see Fig. 5) was conducted using a maximum likelihood classifier provided in ER-

    DAS Imagine 8.4 (ERDAS Imagine, 1997). After deriving this image, we grouped

    the six classes into two major classes: residential and non-residential.

    Since we are estimating detailed population density, residential classification accuracy is essential in this research. Therefore, we performed post-processing to identify

    possible misclassified pixels. In particular, pixels within zero population census

    Fig. 3. Endmember fraction images calculated through a fully constrained four-endmember linear mixing

    model: (a) vegetation fraction image; (b) high albedo fraction image; (c) low albedo fraction image

    (including water); (d) soil fraction image.


    blocks should obviously not be classified as residential land use. Such pixels were

    identified and reclassified as non-residential. Alternatively, pixels within high popu-

    lation density census blocks were also subject to further scrutiny. If these pixels are

    not classified as residential, they are possibly misclassified and require further anal-

    ysis. In this study, we utilized parcel and employment data to identify potential mis-

    classified pixels. Group-quarter populations, people in institutions, shelters, and

    nursing homes, and students in university dormitories (Plane & Rogerson, 1994), were typically found in these misclassified pixels. Such areas are difficult to classify using

    only remotely sensed data because they share similar spectral signatures to commer-

    Fig. 4. Impervious surface fraction image calculated through adding low albedo and high albedo

    endmember fractions after removing the effects of water and clouds.

    Fig. 5. A maximum likelihood classification of the ETM+ image for the Columbus metropolitan area.


    cial land uses. With the help of parcel and employment data, we were able to identify

    these pixels and reclassify them as residential areas.

    The classification accuracy of residential land use after the maximum likelihood

    classification and post-processing (see Fig. 6) was examined using 400 stratified ran-

    domly selected samples. The DOQQ images acquired between 1994 and 1995 were

    used in this study for ground truthing. These DOQQs were co-registered with the ETM+ image. A 3 by 3 sampling unit was adopted to avoid geometric errors. The

    overall classification accuracy is 90% and the overall kappa coefficient is 0.7942

    (see Table 1).
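The accuracy figures quoted here follow directly from the error matrix in Table 1. The short Python sketch below recomputes them from the four cell counts; the function and variable names are illustrative only and are not taken from the paper or from ERDAS Imagine.

```python
def accuracy_report(matrix):
    """Summarize an error matrix.

    matrix[i][j] is the number of samples classified as class i whose
    reference (ground truth) class is j.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    row_totals = [sum(row) for row in matrix]                      # classified totals
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]  # reference totals
    diagonal = [matrix[i][i] for i in range(n)]

    overall = sum(diagonal) / total
    # Agreement expected by chance, from the marginal totals.
    chance = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (overall - chance) / (1 - chance)

    commission = [100 * (1 - d / r) for d, r in zip(diagonal, row_totals)]
    omission = [100 * (1 - d / c) for d, c in zip(diagonal, col_totals)]
    return overall, kappa, commission, omission


# Cell counts from Table 1: rows = classified image, columns = reference image,
# in the order (residential, non-residential).
TABLE_1 = [[146, 15], [25, 214]]

overall, kappa, commission, omission = accuracy_report(TABLE_1)
print(f"overall accuracy = {overall:.2%}, kappa = {kappa:.4f}")
print("commission errors (%):", [round(c, 2) for c in commission])
print("omission errors (%):", [round(o, 2) for o in omission])
```

Run on the Table 1 counts, this reproduces the reported overall accuracy, kappa, and commission and omission errors to the precision printed in the table.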

    With impervious surface fraction for the entire study area (Fig. 4) and the iden-

    tified residential land use areas (Fig. 6), impervious surface fraction in residential

    areas was easily obtained (see Fig. 7).

    4. Interpolating population density using cokriging

    After obtaining impervious surface fraction for residential areas, it can be utilized

    as supplementary data to interpolate population density. Population density is

    Fig. 6. Residential land use classification after the maximum likelihood classification and post-processing.

    Table 1
    Residential land use classification accuracy assessment

    Classified image       Reference image
                           Residential    Non-residential    Commission error (%)
    Residential            146            15                 9.32
    Non-residential        25             214                10.46
    Omission error (%)     14.62          6.55

    Overall accuracy = 90.00%; overall kappa statistic = 0.7942.


    usually estimated using a regression approach, which models the relationship be-

    tween population and supplementary data derived from remote sensing imagery

    (Chen, 2002; Harvey, 2002; Lo, 1995). An implicit assumption of regression analysis

    is that population density is spatially independent. However, many researchers have

    questioned this assumption, claiming that simple regression may lead to biased re-

    sults (Griffith, 1993; Griffith & Can, 1996). Therefore, a model considering spatial

    autocorrelation is more appropriate. Cokriging may improve the estimation preci-

    sion by accounting simultaneously for spatial autocorrelation in population density

    and impervious surface fraction and the cross-correlation between these spatial vari-

    ables. Moreover, it is suitable when the variable to be estimated (e.g. population den-

    sity) is under-sampled while other supplementary variables are abundant (e.g.

    impervious surface fraction).

    Cokriging is a geostatistical method originating from mining applications (Cressie, 1993; Journel & Huijbregts, 1978) and widely applied in soil science (Vauclin,

    Vieira, Vachaud, & Nielsen, 1983; Webster, 1985; Webster & Burgess, 1980). Geosta-

    tistical methods were introduced in remote sensing in the late 1980s (Curran, 1988;

    Woodcock, Strahler, & Jupp, 1988). Now geostatistics are commonly applied in soil

    science, biogeography, climatology, and environmental studies (Atkinson, Webster,

    & Curran, 1992, 1994; Oliver, Webster, & Gerrard, 1989a, 1989b). A review of geo-

    statistical methods and associated applications may be found in Cressie (1993), Cur-

    ran and Atkinson (1998), and Curran (2001). Although widely applied in physical

    geography, cokriging has rarely been utilized in estimating socio-economic conditions, such as population densities. In this paper, population density is estimated using a cokriging method in which the impervious surface fraction is taken as a secondary variable to improve estimation accuracy.

    Fig. 7. Impervious surface fraction in residential areas.


    4.1. Cokriging theory

    As an extension of ordinary kriging to two or more variables, cokriging is based on regionalized variable theory (Journel & Huijbregts, 1978; Oliver et al., 1989a). According to this theory, any regionalized variable $z(x)$ can be considered a realization of a random function $Z(x)$, which is a combination of a deterministic component, $m(x)$, and random fluctuation, $\varepsilon(x)$:

    $$z(x) = m(x) + \varepsilon(x) \tag{1}$$

    where $x$ denotes the geographical coordinates in one, two, or three dimensions; $m(x)$ indicates a geographical trend or drift; and $\varepsilon(x)$ is the spatially dependent random error with mean zero. In most applications, the deterministic component, $m(x)$, is assumed to be locally constant,

    $$m(x) = \mu \tag{2}$$

    and for any given distance and direction $h$, the variance of differences between $z(x)$ and $z(x + h)$ is finite and independent of $x$:

    $$\operatorname{var}[z(x) - z(x + h)] = E[\{z(x) - z(x + h)\}^2] = 2\gamma(h) \tag{3}$$

    where vector $h$, the lag, is a given separation distance and direction from $x$, and $\gamma(h)$ is the variogram. $\gamma(h)$ has been found to be an important tool in modeling spatial autocorrelation (Journel & Huijbregts, 1978). Moreover, if two or more variables are needed, a cross-variogram is defined as follows:

    $$\gamma_{uv}(h) = \tfrac{1}{2} E[\{z_u(x) - z_u(x + h)\}\{z_v(x) - z_v(x + h)\}] \tag{4}$$

    Based on regionalized variable theory, it is necessary to estimate an under-sampled variable using cokriging. This method ensures unbiased estimates with minimum and known variance (Curran, 2001). If we consider estimating a variable $u$ in a block $B$ with sampling points of $u$ and a second variable $v$, our estimate will be

    $$\hat{z}_u(B) = \sum_{i=1}^{N_u} \lambda_{ui}\, z_u(x_{ui}) + \sum_{j=1}^{N_v} \lambda_{vj}\, z_v(x_{vj}) \tag{5}$$

    in which $N_u$ and $N_v$ are the numbers of sampling points for variables $u$ and $v$; $x_{ui}$ and $x_{vj}$ are the locations of sampling points for variables $u$ and $v$, respectively; and $\lambda_{ui}$ and $\lambda_{vj}$ are the weights to be calculated.

    In order to ensure unbiasedness, the following constraints must be satisfied (Aboufirassi & Marino, 1984):

    $$\sum_{i=1}^{N_u} \lambda_{ui} = 1 \tag{6}$$

    $$\sum_{j=1}^{N_v} \lambda_{vj} = 0 \tag{7}$$


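    The first constraint indicates that at least one observation of the primary variable $u$ is necessary for cokriging. Moreover, constraint (7) ensures that the summation of the weights for the secondary variable $v$ is zero. Subject to these constraints, we minimize the estimation variance:

    $$\sigma_u^2(B) = E[\{z_u(B) - \hat{z}_u(B)\}^2] \tag{8}$$

    This is an optimization problem in which $\lambda_{ui}$ and $\lambda_{vj}$ are the decision variables and $\sigma_u^2(B)$ is the objective function. Standard Lagrangian techniques can be applied to solve this problem. This results in the following:

    $$\sum_{i=1}^{N_u} \lambda_{ui}\, \gamma_{uu}(x_{ui}, x_{uk}) + \sum_{j=1}^{N_v} \lambda_{vj}\, \gamma_{uv}(x_{uk}, x_{vj}) + \psi_u = \gamma_{uu}(B, x_{uk}), \quad k = 1, \ldots, N_u \tag{9}$$

    $$\sum_{i=1}^{N_u} \lambda_{ui}\, \gamma_{uv}(x_{ui}, x_{vl}) + \sum_{j=1}^{N_v} \lambda_{vj}\, \gamma_{vv}(x_{vj}, x_{vl}) + \psi_v = \gamma_{uv}(B, x_{vl}), \quad l = 1, \ldots, N_v \tag{10}$$

    Here $\gamma_{uu}(x_{ui}, x_{uk})$ is the semi-variogram of variable $u$ between sites $i$ and $k$, $\gamma_{uv}(x_{uk}, x_{vj})$ is the cross semi-variogram between variables $u$ and $v$ at sites $k$ and $j$, and $\psi_u$ and $\psi_v$ are the Lagrange multipliers associated with constraints (6) and (7). Finally, $\gamma_{uv}(B, x_{vl})$ is the cross semi-variogram between variables $u$ and $v$ at block $B$ and site $l$.

    Using this method, there are $N_u + N_v + 2$ equations and $N_u + N_v + 2$ variables, which can be easily solved by linear algebra. After obtaining the parameters $\lambda_{ui}$ and $\lambda_{vj}$, $\hat{z}_u(B)$ may be estimated using Eq. (5). The cokriging variance can be obtained as a byproduct of the cokriging process as follows:

    $$\sigma_u^2(B) = \sum_{i=1}^{N_u} \lambda_{ui}\, \gamma_{uu}(B, x_{ui}) + \sum_{j=1}^{N_v} \lambda_{vj}\, \gamma_{uv}(B, x_{vj}) + \psi_u - \gamma_{uu}(B, B) \tag{11}$$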

    Matrix formulations of these equations can be found in Myers (1982), McBratney

    and Webster (1983), and Aboufirassi and Marino (1984). Details on solving this

    problem using Lagrangian techniques are given in Vauclin et al. (1983) and Atkinson

    et al. (1992).
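To make the structure of the linear system in Eqs. (6), (7), (9) and (10) concrete, the following Python sketch assembles and solves it for a handful of sample points. It is an illustration under stated assumptions, not the authors' Gstat implementation: the target is treated as a single point rather than a 30 m block (so the block term in Eq. (11) is zero), the sample coordinates and values are hypothetical, the exponential variogram coefficients are those reported in Table 2, and all function names are invented for this example.

```python
import math

# Exponential (cross-)variogram coefficients (C0, C1, r) from Table 2.
PARAMS = {
    "uu": (0.196, 0.176, 1000.0),   # square root of population density
    "vv": (0.007, 0.0089, 1000.0),  # impervious surface fraction
    "uv": (0.012, 0.030, 1000.0),   # cross-variogram
}


def gamma(kind, p, q):
    """Exponential (cross-)variogram between points p and q; zero at lag 0."""
    c0, c1, r = PARAMS[kind]
    h = math.dist(p, q)
    return 0.0 if h == 0 else c0 + c1 * (1.0 - math.exp(-h / r))


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def cokrige(x0, pts_u, z_u, pts_v, z_v):
    """Point cokriging of variable u at x0; returns (estimate, variance, lam_u, lam_v)."""
    nu, nv = len(pts_u), len(pts_v)
    n = nu + nv + 2                      # weights plus two Lagrange multipliers
    a = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    for k in range(nu):                  # Eq. (9), one row per primary sample
        for i in range(nu):
            a[k][i] = gamma("uu", pts_u[i], pts_u[k])
        for j in range(nv):
            a[k][nu + j] = gamma("uv", pts_u[k], pts_v[j])
        a[k][nu + nv] = 1.0              # coefficient of psi_u
        b[k] = gamma("uu", x0, pts_u[k])
    for l in range(nv):                  # Eq. (10), one row per secondary sample
        row = nu + l
        for i in range(nu):
            a[row][i] = gamma("uv", pts_u[i], pts_v[l])
        for j in range(nv):
            a[row][nu + j] = gamma("vv", pts_v[j], pts_v[l])
        a[row][nu + nv + 1] = 1.0        # coefficient of psi_v
        b[row] = gamma("uv", x0, pts_v[l])
    for i in range(nu):                  # Eq. (6): primary weights sum to one
        a[nu + nv][i] = 1.0
    b[nu + nv] = 1.0
    for j in range(nv):                  # Eq. (7): secondary weights sum to zero
        a[nu + nv + 1][nu + j] = 1.0
    sol = solve(a, b)
    lam_u, lam_v, psi_u = sol[:nu], sol[nu:nu + nv], sol[nu + nv]
    estimate = sum(w * z for w, z in zip(lam_u, z_u)) + sum(w * z for w, z in zip(lam_v, z_v))
    # Eq. (11) with point support, where the gamma_uu(B, B) term vanishes.
    variance = (sum(lam_u[i] * b[i] for i in range(nu))
                + sum(lam_v[j] * b[nu + j] for j in range(nv)) + psi_u)
    return estimate, variance, lam_u, lam_v


# Hypothetical samples: u is sparse, v is available at more locations.
PTS_U = [(0.0, 0.0), (600.0, 0.0), (0.0, 800.0)]
Z_U = [2.0, 3.0, 1.5]
PTS_V = PTS_U + [(300.0, 300.0), (500.0, 600.0)]
Z_V = [0.30, 0.45, 0.20, 0.40, 0.35]

est, var, lam_u, lam_v = cokrige((300.0, 300.0), PTS_U, Z_U, PTS_V, Z_V)
print(f"estimate = {est:.3f}, cokriging variance = {var:.3f}")
```

A useful sanity check on any such solver is that the primary weights sum to one, the secondary weights sum to zero, and the estimate at a primary sample location reproduces the sample value with zero variance.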

    4.2. Variogram estimation

    From Eqs. (6), (7), (9) and (10), it is clear that parameters $\lambda_{ui}$ and $\lambda_{vj}$ are dependent on the variograms associated with variables $u$ and $v$, their cross-variogram, and block size. In this study, block size is defined to be the same as the TM image resolution (30 m by 30 m). Therefore, once the variograms and cross-variogram have been derived, cokriging is a straightforward process (Atkinson et al., 1992; Atkinson, Webster, & Curran, 1994). In practice, the variograms are typically estimated using sampling points as follows:

    $$\gamma(h) = \frac{1}{2N(h)} \sum_{i=1}^{N(h)} \{z(x_i) - z(x_i + h)\}^2 \tag{12}$$


    where $z(x_i)$ are known values of variable $u$ or $v$ at sampling point $x_i$, and $N(h)$ is the number of sampling point pairs separated by lag $h$. Similarly, the cross-variogram can be estimated as follows:

    $$\gamma_{uv}(h) = \frac{1}{2N(h)} \sum_{i=1}^{N(h)} \{z_u(x_i) - z_u(x_i + h)\}\{z_v(x_i) - z_v(x_i + h)\} \tag{13}$$

    After obtaining the variogram and cross-variogram, a theoretical model is needed to

    fit them. Such a model needs to be positive definite and coregionalized to ensure the

    cokriging variance is non-negative. More discussion about choosing theoretical func-

    tions can be found in McBratney and Webster (1986) and Curran (1988). In this

    study, we chose the model satisfying the positive definite and coregionalized require-

    ments, the details of which are discussed later in this paper.
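The estimators in Eqs. (12) and (13) amount to averaging products of paired differences over all sample pairs at a given lag. A minimal Python sketch follows, assuming an isotropic (direction-independent) lag with a distance tolerance; the tolerance binning, the function name and the toy transect data are assumptions for illustration and are not described in the paper.

```python
import math


def experimental_variogram(points, z_u, z_v, lag, tol):
    """Estimate the cross-variogram of Eq. (13) at one lag distance.

    Each unordered pair of points whose separation is within `tol` of `lag`
    is counted once. Passing the same list for z_u and z_v gives the
    ordinary variogram of Eq. (12).
    Returns (gamma, number_of_pairs); gamma is None if no pair qualifies.
    """
    total, n_pairs = 0.0, 0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if abs(math.dist(points[i], points[j]) - lag) <= tol:
                total += (z_u[i] - z_u[j]) * (z_v[i] - z_v[j])
                n_pairs += 1
    if n_pairs == 0:
        return None, 0
    return total / (2 * n_pairs), n_pairs


# Toy transect with unit spacing (hypothetical values).
POINTS = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
U = [1.0, 2.0, 3.0, 4.0]
V = [2.0, 4.0, 6.0, 8.0]

for lag in (1.0, 2.0):
    g_u, n = experimental_variogram(POINTS, U, U, lag, tol=0.1)
    g_uv, _ = experimental_variogram(POINTS, U, V, lag, tol=0.1)
    print(f"lag {lag}: gamma_u = {g_u}, gamma_uv = {g_uv}, pairs = {n}")
```

In practice the estimates at a series of lags are then fitted with a theoretical model, which is the step the text turns to next.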

    4.3. Interpolating population density using cokriging

    In this study population density is considered the primary variable to be esti-

    mated. In addition, residential impervious surface fraction is considered a secondary

    variable used to increase estimation accuracy. One issue is that reported census sta-

    tistics are not based on a sampling point, but rather on an areal unit like a block. The

    centroid of a census block may be used as the sampling point for the assignment of

    population density. However, this method is not realistic because there may not

    actually be people at the centroid of a block. Martin (1989) solved this problem

    by using a population-weighted point as the representative point of a census block. In a similar manner, in this research the central point of the pixel whose impervious

    surface fraction is approximately equal to the block mean is used as a population-

    weighted block point. In addition, we assign impervious surface fraction of the pixel

    and average population density of the block to this sampling point. After obtaining

    the impervious surface fraction and population density on these samples, the characteristics of the data are explored. If they are not second-order stationary, i.e. do not have the same mean and variance, the accuracy of the estimated experimental variogram and

    same mean and variance, the accuracy of the estimated experimental variogram and

    associated cokriging will be degraded (Cressie, 1993). The histograms for population

    density (see Fig. 8a) and impervious surface fraction (see Fig. 9) were captured based

    on the sampling points. It is clear that population density is highly positively skewed and may be approximated by a Poisson function with its variance proportional to its

    mean value (Bailey & Gatrell, 1995; Harvey, 2002). A square root transformation

    was performed on population density to stabilize its variance. The histogram of

    the transformed population density (see Fig. 8b) shows that its distribution is near

    normal and its variance is approximately constant. The histogram of impervious sur-

    face fraction is slightly negatively skewed, but may be considered approximately nor-

    mal. Thus, no transformation was conducted on impervious surface fraction. We

    excluded zero population density census blocks because no interpolation is necessary

    for these blocks.
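The construction of sampling points described above can be sketched in a few lines of Python. This is a hedged illustration: the pixel tuples, the block population and area, the density unit, and the function names are all hypothetical, and ties between equally close pixels are broken arbitrarily here because the paper does not say how they were handled.

```python
def representative_pixel(pixels):
    """Pick the pixel whose impervious fraction is closest to the block mean.

    pixels: list of (x, y, impervious_fraction) tuples for the residential
    pixels of one census block, with (x, y) the pixel center.
    Returns (chosen_pixel, block_mean_fraction).
    """
    mean = sum(frac for _, _, frac in pixels) / len(pixels)
    chosen = min(pixels, key=lambda p: abs(p[2] - mean))
    return chosen, mean


def block_sample(pixels, population, area):
    """Build one cokriging sample for a block.

    Blocks with zero population are skipped (returns None), as in the text.
    The primary value is the square root of the block's average density.
    """
    if population == 0:
        return None
    (x, y, frac), _ = representative_pixel(pixels)
    density = population / area          # unit depends on how area is measured
    return {"x": x, "y": y, "impervious": frac, "sqrt_density": density ** 0.5}


# Hypothetical block: four 30 m pixels, 36 residents, area expressed in pixels.
BLOCK_PIXELS = [(15, 15, 0.2), (45, 15, 0.5), (15, 45, 0.8), (45, 45, 0.6)]
sample = block_sample(BLOCK_PIXELS, population=36, area=4)
print(sample)
```

The square root is applied at this stage so that the variograms are estimated on the variance-stabilized variable; estimates are squared afterwards to return to density units.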

    In this study, the primary variable u is the square root of population density, and

    the secondary variable v is impervious surface fraction. Experimental variograms


    and cross-variograms were calculated using Eqs. (3) and (4). Gstat software was utilized to fit these variograms to theoretical functions (Pebesma & Wesseling, 1998). The weighted least squares method and visualization were applied in modeling the experimental variograms (Cressie, 1985). Directional variograms were also computed and no obvious anisotropies were found. Therefore, the variograms were assumed to be isotropic and were fitted using an exponential model of the following form:

    Fig. 8. Histogram of (a) population density and (b) square root of population density at sampling points.

    It shows that population density may be described by a Poisson distribution, while the square root

    transformation is a reasonable approximation of a normal distribution.

    Fig. 9. Histogram of impervious surface fraction at sampling points.


    $$\gamma(h) = \begin{cases} C_0 + C_1\{1 - e^{-h/r}\} & \text{for } h > 0 \\ 0 & h = 0 \end{cases} \tag{14}$$

    Here $C_0$ is the nugget representing unexplained variance and $r$ defines the spatial scale of the variation. In practice, the sill is $C_0 + 0.95C_1$ at the point of $3r$. In this

    study, the parameters were calculated for the variograms of the square root of pop-

    ulation density and impervious surface fraction, and also for their cross-variogram

    (see Table 2 and Fig. 10).
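Two properties of the fitted model can be checked numerically from the Table 2 coefficients: that the exponential structure reaches about 95% of $C_1$ at a lag of $3r$, and that the three fitted functions form a valid coregionalization. The Python sketch below uses one common sufficient condition for a two-variable linear model of coregionalization (same range for all three functions, and for each of $C_0$ and $C_1$ the squared cross coefficient not exceeding the product of the two direct coefficients). Whether the authors applied exactly this test is not stated in the text, so treat it as an assumption; the function names are illustrative.

```python
import math

# (C0, C1, r) from Table 2.
TABLE_2 = {
    "uu": (0.196, 0.176, 1000.0),   # square root of population density
    "vv": (0.007, 0.0089, 1000.0),  # impervious surface fraction
    "uv": (0.012, 0.030, 1000.0),   # cross-variogram
}


def exponential(h, c0, c1, r):
    """Exponential variogram model of Eq. (14)."""
    return 0.0 if h == 0 else c0 + c1 * (1.0 - math.exp(-h / r))


def coregionalization_ok(params):
    """Check a two-variable linear model of coregionalization.

    Requires a common range and, for the nugget (index 0) and the structured
    part (index 1), a positive semi-definite 2x2 coefficient matrix.
    """
    if len({p[2] for p in params.values()}) != 1:
        return False
    for k in (0, 1):
        b_uu, b_vv, b_uv = params["uu"][k], params["vv"][k], params["uv"][k]
        if b_uu < 0 or b_vv < 0 or b_uv ** 2 > b_uu * b_vv:
            return False
    return True


c0, c1, r = TABLE_2["uu"]
share = (exponential(3 * r, c0, c1, r) - c0) / c1
print(f"share of C1 reached at h = 3r: {share:.4f}")
print("valid coregionalization:", coregionalization_ok(TABLE_2))
```

With the Table 2 values the check passes for both the nugget and the structured component, which is consistent with the requirement that the cokriging variance be non-negative.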

    After obtaining the variograms of impervious surface fraction, square root of

    population density, and their cross-variogram, a block cokriging was performed to

    interpolate population density (see Fig. 11) using Gstat software embedded in

    Idrisi (Harmon, 2002). Fig. 11 shows a clear geographical pattern of population

    distribution in the study region. In particular, few people live in the CBD except

    Table 2
    Coefficients of the theoretical variogram and cross-variogram functions

                                              C0       C1       r
    Population density                        0.196    0.176    1000
    Impervious surface                        0.007    0.0089   1000
    Population density–impervious surface     0.012    0.030    1000

    Fig. 10. Variograms of (a) square root of population density, (b) residential impervious surface fraction,

    and (c) the cross-variogram between square root of population density and impervious surface fraction.

    Exponential functions with r = 1000 are chosen to model these variograms.


    group-quarter populations. High-density household-based populations are adjacent

    to the CBD in the southern and northwestern portions of the study region. More-

    over, low-density household-based populations reside relatively far away from the

    CBD (in the eastern and southern portions).

    5. Accuracy assessment

    Using the cokriging variance approach defined in Eq. (11) for the square root of

    population density, the mean cokriging variance is 23.5% (minimum of 21.3% and

    maximum of 50.3%). Fig. 12 shows the distribution of cokriging variance in the

    study area. In particular, cokriging variance is high along the study area boundary because few samples are used in estimating population density in this portion of

    the region.

    It is possible to examine population count estimation accuracies at each census

    zonal level using the root mean square error (ERMS) and coefficient of variation

    (V) to evaluate the absolute and relative error as follows:

    $$E_{RMS} = \left[ \frac{1}{n} \sum_{i=1}^{n} (P_i - \hat{P}_i)^2 \right]^{1/2} \tag{15}$$

    $$V = \frac{1}{P} \sum_{i=1}^{n} |P_i - \hat{P}_i| \tag{16}$$
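The two error measures are straightforward to compute once observed and estimated counts are paired by zone. The Python sketch below is illustrative only: the zone counts are made up, and the normalizer $P$ in Eq. (16) is taken to be the total observed population of the study area, which is an assumption made here because the definition of $P$ does not appear in this part of the transcript.

```python
import math


def rmse(observed, estimated):
    """Root mean square error of Eq. (15) over n zones."""
    n = len(observed)
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(observed, estimated)) / n)


def relative_error(observed, estimated):
    """Relative error of Eq. (16).

    ASSUMPTION: P is the total observed population, i.e. the sum over zones.
    """
    total = sum(observed)
    return sum(abs(p - q) for p, q in zip(observed, estimated)) / total


# Hypothetical observed and estimated population counts for three zones.
OBSERVED = [100, 200, 300]
ESTIMATED = [110, 190, 330]

print(f"E_RMS = {rmse(OBSERVED, ESTIMATED):.2f} persons")
print(f"V = {relative_error(OBSERVED, ESTIMATED):.1%}")
```

Because positive and negative errors cancel when zones are merged, a measure of this form generally decreases as the zones get larger, which matches the pattern of results reported for blocks, block groups, tracts and the whole study area.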

    Fig. 11. Estimated population density using developed cokriging method. The height indicates the value

    of population density for each TM pixel. The average population density is 4.28, with a maximum of 52,

    and a minimum of 0.


    residential areas within each census block. Applied to our study area, the model is as

    follows (see Table 4):

    $$\hat{P}^A_i = 6.5798\, I^L_i + 9.4650\, I^H_i \tag{19}$$

    where $I^L_i$ and $I^H_i$ are the fractions of impervious surface in low and high density residential areas in a census block and $\hat{P}^A_i$ is the expected population density in a census block using this alternative regression approach. In both regression models, the area of each census block was chosen as a weighting factor to reduce the effects of zone size.

    Moreover, the intercepts in these regression models are not included because they are

    not statistically significant (further its meaning in population estimation is not clear).

    The explanatory variables are statistically significant (p 6 0.0001), which shows the

    strong correlation between population density and the chosen explanatory variables

    (see Tables 3 and 4).
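    A regression of this form (two explanatory variables, no intercept, block area as weight) can be sketched as follows. This is an illustrative reconstruction under stated assumptions: the weights are assumed to enter as ordinary weighted least squares, the function names and the sample inputs are hypothetical, and only the two coefficients are taken from Eq. (19).

```python
def fit_weighted_no_intercept(low, high, density, area):
    """Weighted least squares for density = b_low * low + b_high * high.

    Solves the 2x2 weighted normal equations directly (no intercept term).
    Assumption: each block's area is used as its weight.
    """
    s11 = sum(w * a * a for a, w in zip(low, area))
    s12 = sum(w * a * b for a, b, w in zip(low, high, area))
    s22 = sum(w * b * b for b, w in zip(high, area))
    t1 = sum(w * a * y for a, y, w in zip(low, density, area))
    t2 = sum(w * b * y for b, y, w in zip(high, density, area))
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("explanatory variables are collinear")
    b_low = (s22 * t1 - s12 * t2) / det
    b_high = (s11 * t2 - s12 * t1) / det
    return b_low, b_high


def predict_density(low, high, b_low=6.5798, b_high=9.4650):
    """Eq. (19) with the coefficients reported in Table 4."""
    return b_low * low + b_high * high


# Hypothetical impervious fractions and block areas (not study data).
low_frac = [0.1, 0.3, 0.0, 0.2]
high_frac = [0.0, 0.2, 0.5, 0.4]
block_area = [1.0, 2.5, 0.8, 1.7]
observed = [predict_density(a, b) for a, b in zip(low_frac, high_frac)]

# Fitting noise-free data generated from Eq. (19) recovers its coefficients.
print(fit_weighted_no_intercept(low_frac, high_frac, observed, block_area))
```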

    Comparative results (see Table 5) show that the cokriging method is the most
    accurate. In particular, the coefficient of variation is relatively low at the census
    block level (34.7%), low at the block group and tract levels (15.2% and 10.2%
    respectively), and near zero for the entire study area (0.3%). The estimation accuracies
    of the two regression models are reported in Table 5 as well. Neither regression model
    performs as well as the cokriging method in terms of estimation accuracy. As an
    example, the coefficients of variation for the census tract level in the regression
    models are 22.9% and 21.0% respectively, substantially higher than the variation
    obtained using cokriging (10.2%). Comparing the two regression models, regression
    with impervious surface fraction is slightly better than with land use classes (e.g.
    21.0% vs. 22.9% estimation error at the census tract level). This result is consistent
    with the literature showing that impervious surface fraction performs better than land
    use/cover in urban analysis (Ji & Jensen, 1999).

    Table 4
    Coefficients of the regression model with residential impervious surface fraction as explanatory variables

    Coefficients   Value    Std. error   t value   Pr(>|t|)
    I^L            6.5798   0.4335       15.1793   0.0000
    I^H            9.4650   0.1687       56.1212   0.0000

    Table 5
    Absolute and relative estimation errors of the cokriging and regression models

    Zones               Average      Cokriging        Regression with    Regression with
                        population                    land cover         impervious surface
                                     ERMS    V        ERMS    V          ERMS    V
    Block (2445)        40.99        45.3    34.7%    47.9    48.8%      45.5    46.6%
    Block group (125)   801.74       215.0   15.2%    325.6   27.8%      290.7   25.2%
    Tract (36)          2825.84      411.0   10.2%    967.6   22.9%      846.0   21.0%
    Total study area    100,200              0.3%             1.0%               2.6%


    6. Population density adjustment

    The cokriging approach gives unbiased estimates for the square root of population
    density with minimum variance. However, the population count estimation errors
    evaluated at the census block level are still somewhat large (34.7%). As discussed
    in previous studies (Langford & Unwin, 1994; Fisher & Langford, 1995; Martin,
    1996), interpolation methods should preserve population counts in each reporting
    zone. One option is adding a volume-preserving constraint to the cokriging model.
    However, this will make the model more complex since it has a quadratic objective
    function and a quadratic regional constraint. In fact, it is not clear that the resulting
    model can be solved, exactly or heuristically. An alternative option is to rescale the
    population estimates on every pixel to satisfy this zonal constraint:

    \[ P_{ij} = \hat{P}_{ij} \, \frac{P_i}{\hat{P}_i} \tag{20} \]

    Here $P_{ij}$ is the rescaled population estimate of pixel j in census block i,
    $\hat{P}_{ij}$ is the population estimate obtained through cokriging, and $P_i$ and
    $\hat{P}_i$ are the population counts of block i (census count and cokriging estimate,
    respectively). This rescaled population density (see Fig. 13) generally maintains the
    estimates obtained using cokriging, but emphasizes local variation as well. For example,
    the cokriging method tends to underestimate population counts in multi-story and
    high-rise buildings (the middle portion of Fig. 11). In contrast, the rescaling approach
    adjusts these inaccuracies and obtains more accurate population density estimates.
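    The rescaling in Eq. (20) amounts to multiplying every pixel estimate by the ratio of the census count to the summed estimate of its block. The sketch below illustrates this under stated assumptions: the function name, the data layout (parallel lists plus a dictionary of census counts), and the sample values are hypothetical, and the handling of blocks whose estimates sum to zero is an assumption, since the text does not address that case.

```python
def rescale_to_census(pixel_estimates, pixel_block, census_counts):
    """Apply Eq. (20): P_ij = P_hat_ij * P_i / P_hat_i for every pixel.

    pixel_estimates -- cokriging population estimate per pixel
    pixel_block     -- block identifier of each pixel (same order)
    census_counts   -- dict mapping block identifier to census count
    """
    # Sum the pixel estimates within each block to obtain P_hat_i.
    block_estimate = {}
    for est, block in zip(pixel_estimates, pixel_block):
        block_estimate[block] = block_estimate.get(block, 0.0) + est

    rescaled = []
    for est, block in zip(pixel_estimates, pixel_block):
        total = block_estimate[block]
        if total > 0:
            rescaled.append(est * census_counts[block] / total)
        else:
            # Assumption: the ratio is undefined here, so the pixel is left at 0.
            rescaled.append(0.0)
    return rescaled


# Hypothetical example: five pixels in two census blocks.
estimates = [2.0, 3.0, 5.0, 1.0, 1.0]
blocks = [0, 0, 0, 1, 1]
census = {0: 20.0, 1: 1.0}

print(rescale_to_census(estimates, blocks, census))
```

    Within each block the relative pattern of the cokriging estimates is unchanged; only their level shifts so that the block total equals the census count.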

    Fig. 13. Adjusted population density that preserves zonal population counts. The height indicates the

    value of population density for each TM pixel. The average population density is 4.40, with a maximum of

    143, and a minimum of 0.


    7. Conclusion

    In this paper a cokriging method was developed for interpolating residential
    population density using census count data and impervious surface fraction. The results
    are clearly better than regression-based interpolation approaches. In particular, the
    relative population estimation error for the entire study area is 0.3%, which is better
    than the results obtained using regression methods (1.0%–2.6% estimation error).
    Moreover, the estimation errors at the census block group and tract levels (15.2%
    and 10.2% respectively) are about 10 percentage points lower than those calculated
    using regression models (about 25–27% and 21–23% respectively). At the census block
    level, the estimation error is about 13–15 percentage points lower than those reported
    for the regression models (see Table 5). These results demonstrate that cokriging applied
    to residential impervious surface fraction is a superior alternative to traditional
    regression-based interpolation approaches using land use and land cover data.

    One reason cokriging performs well is that it addresses spatial autocorrelation and
    cross-autocorrelation associated with the distribution of people in urban areas.
    Instead of ignoring spatial dependence, it models the spatial autocorrelation of
    population and impervious surface fraction through variograms, and applies them in
    population interpolation. Moreover, unlike other interpolation methods, it provides
    estimation variance (see Eq. (11) and Fig. 12) at the TM pixel level (30 by 30 meters).
    This estimation variance is an important tool for assessing population estimation
    error, without aggregating to census reporting zones.

    Another interesting aspect of this work is that residential impervious surface
    fraction was found to be an effective replacement for land use and land cover data
    typically used in modeling population density. This makes sense intuitively given that
    impervious surface fraction is closely related to housing development, and thus
    population density. Moreover, the cross-variogram (see Fig. 10c and Table 2) clearly
    shows that population density and impervious surface fraction are co-regionalized
    variables, with only 25% variance unexplained. Also, regression analyses show that
    the regression model with impervious surface fraction consistently performs better
    than the other utilizing land use classes.

    A final point is that the obtained population estimates are essential for urban
    planning applications. As an example, in sustainability studies, residential population
    density is a primary indicator of automobile-dependent regions (Harris & Longley,
    2000). In addition, the estimates of population density may be utilized in
    transportation analyses. The traffic analysis zone (TAZ) is typically used as a basic
    unit in traffic demand estimation and trip generation. However, there are significant
    problems with traditional TAZ definitions as well as difficulties with associated travel
    distance calculation (Daganzo, 1980; Miller, 1999). Detailed population information
    may be helpful in redefining TAZs in order to achieve more homogeneous population
    densities and socio-economic characteristics, thus potentially eliminating the
    modifiable areal unit problem in a range of transportation analysis approaches.

    While the developed approach is a considerable improvement for estimating population
    density at a fine scale, there are potential improvements that may be worth exploring.
    One improvement would be satisfying the volume-preserving constraint during the
    interpolation process, requiring that interpolated population counts in every census
    zone be equal to observed counts. In this study, we satisfied this constraint by
    rescaling population density in every pixel after interpolation. Although the
    population counts in every census zone are maintained, this adjustment may introduce
    bias and increase estimation variance. More sophisticated models might increase
    population density estimation accuracy and maintain population counts in every census
    zone simultaneously.

    References

    Aboufirassi, M., & Marino, M. A. (1984). Cokriging of aquifer transmissivities from field measurements of transmissivity and specific capacity. Mathematical Geology, 16(1), 19–35.
    Atkinson, P. M., Webster, R., & Curran, P. J. (1992). Cokriging with ground-based radiometry. Remote Sensing of Environment, 41, 45–60.
    Atkinson, P. M., Webster, R., & Curran, P. J. (1994). Cokriging with airborne MSS imagery. Remote Sensing of Environment, 50, 335–345.
    Bailey, T., & Gatrell, A. C. (1995). Chapter 7: The analysis of area data. Interactive Spatial Data Analysis, Longman Group Limited.
    Beguin, H., Thomas, I., & Vandenbussche, D. (1992). Weight variation with a set of demand points, and location–allocation issues: A case study of public libraries. Environment and Planning A, 24, 1769–1779.
    Bracken, I. (1993). An extensive surface model database for population related information: Concept and application. Environment and Planning B, 20, 13–27.
    Chen, K. (2002). An approach to linking remotely sensed data and areal census data. International Journal of Remote Sensing, 23, 37–48.
    Cressie, N. (1985). Fitting variogram models by weighted least squares. Mathematical Geology, 17, 563–586.
    Cressie, N. (1993). Statistics for spatial data (revised edition). New York: Wiley.
    Curran, P. J. (1988). The semivariogram in remote sensing: An introduction. Remote Sensing of Environment, 24, 493–507.
    Curran, P. J. (2001). Remote sensing: Using the spatial domain. Environmental and Ecological Statistics, 8, 331–344.
    Curran, P. J., & Atkinson, P. M. (1998). Geostatistics and remote sensing. Progress in Physical Geography, 22(1), 61–78.
    Daganzo, C. F. (1980). Network representation, continuum approximations and a solution to the spatial aggregation problem of traffic assignment. Transportation Research, 14B, 229–239.
    ERDAS Imagine (1997). ERDAS Imagine tour guides (4th ed.). Atlanta, Georgia: ERDAS, Inc.
    Fisher, P. F., & Langford, M. (1995). Modeling the errors in areal interpolation between zonal systems by Monte Carlo simulation. Environment and Planning A, 27, 211–224.
    Fotheringham, A. S., & Wong, D. W. S. (1991). The modifiable areal unit problem in multivariate statistical analysis. Environment and Planning A, 23, 1025–1034.
    Franklin County Auditor (2002). Franklin county auditor's interactive geographic information system. http://209.51.193.83/search.html.
    Goodchild, M. F., Anselin, L., & Deichmann, U. (1993). A framework for the areal interpolation of socioeconomic data. Environment and Planning A, 25, 383–397.
    Green, A. A., Berman, M., Switzer, P., & Craig, M. D. (1988). A transformation for ordering multispectral data in terms of image quality with implications for noise removal. IEEE Transactions on Geoscience and Remote Sensing, 26, 65–74.
    Griffith, D. A. (1993). Spatial regression analysis on the PC: Spatial statistics using SAS. Washington, DC: Association of American Geographers.


    Griffith, D. A., & Can, A. (1996). Spatial statistical/econometric version of simple urban population density models. In S. L. Arlinghaus & D. A. Griffith (Eds.), Practical handbook of spatial statistics. CRC Press.
    Harmon, D. (2002). Quick Take Reviews: Idrisi32 Release 2. GEOWorld, March, pp. 50–51.
    Harris, R. J., & Longley, P. A. (2000). New data and approaches for urban analysis: Modeling residential densities. Transactions in GIS, 4(3), 217–234.
    Harvey, J. T. (2002). Estimating census district populations from satellite imagery: Some approaches and limitations. International Journal of Remote Sensing, 23(10), 2071–2095.
    Jensen, J. R. (1983). Biophysical remote sensing. Annals of the Association of American Geographers, 73, 111–132.
    Ji, M., & Jensen, J. R. (1999). Effectiveness of subpixel analysis in detecting and quantifying urban imperviousness from Landsat Thematic Mapper imagery. Geocarto International, 14(4), 31–39.
    Journel, A. G., & Huijbregts, C. J. (1978). Mining geostatistics. New York: Academic Press.
    Lam, N. S. (1983). Spatial interpolation methods: A review. American Cartographer, 10(2), 129–149.
    Langford, M., Maguire, D. J., & Unwin, D. J. (1991). The areal interpolation problem: Estimating population using remote sensing in a GIS framework. In I. Masser & M. Blakemore (Eds.), Handling geographical information: Methodology and potential applications (pp. 55–77). Harlow, Essex: Longman.
    Langford, M., & Unwin, D. J. (1994). Generating and mapping population density surfaces within a geographical information system. Cartographic Journal, 31, 21–26.
    Lee, J. B., Woodyatt, A. S., & Berman, M. (1990). Enhancement of high spectral resolution remote sensing data by a noise-adjusted principal components transformation. IEEE Transactions on Geoscience and Remote Sensing, 28, 295–304.
    Lo, C. P. (1995). Automated population and dwelling unit estimation from high-resolution satellite images: A GIS approach. International Journal of Remote Sensing, 16(1), 17–34.
    Longley, P., & Clarke, G. (1995). GIS for business and service planning. Cambridge: GeoInformation International.
    Martin, D. (1989). Mapping population data from zone centroid locations. Transactions of the Institute of British Geographers, 14, 90–97.
    Martin, D. (1996). An assessment of surface and zonal models of population. International Journal of Geographical Information Systems, 10(8), 973–989.
    Martin, D., & Williams, H. C. W. L. (1992). Market-area analysis and accessibility to primary health-care centers. Environment and Planning A, 24, 1009–1019.
    McBratney, A. B., & Webster, R. (1983). Optimal interpolation and isarithmic mapping of soil properties: 5. Co-regionalization and multiple sampling strategy. Journal of Soil Science, 34(1), 137–162.
    McBratney, A. B., & Webster, R. (1986). Choosing functions for semi-variograms of soil properties and fitting them to sampling estimates. Journal of Soil Science, 37, 617–639.
    Mesev, V. (1998). The use of census data in urban image classification. Photogrammetric Engineering and Remote Sensing, 64, 431–438.
    Mid-Ohio Regional Planning Commission (MORPC) (2002). GIS technology. http://www.morpcsoft.org/GIS/gis.htm.
    Miller, H. J. (1999). Potential contributions of spatial analysis to geographical information systems for transportation (GIS-T). Geographical Analysis, 31(4), 373–399.
    Moon, Z. K., & Farmer, F. L. (2001). Population density surface: A new approach to an old problem. Society and Natural Resources, 14, 39–49.
    Multi-Resolution Land Characteristics Consortium (MRLC) (2002). National land cover data (NLCD). http://www.epa.gov/mrlc/nlcd.html.
    Myers, D. E. (1982). Matrix formulation of co-kriging. Mathematical Geology, 14(3), 250–257.
    Ohio Geographically Referenced Information Program (OGRIP) (1999). Digital orthophoto quarter-quadrangles.
    Okabe, A., & Sadahiro, Y. (1997). Variation in count data transferred from a set of irregular zones to a set of regular zones through the point-in-polygon method. International Journal of Geographical Information Science, 11(1), 93–106.


    Oliver, M., Webster, R., & Gerrard, J. (1989a). Geostatistics in physical geography, Part I: Theory. Transactions of the Institute of British Geographers, 14, 259–269.
    Oliver, M., Webster, R., & Gerrard, J. (1989b). Geostatistics in physical geography, Part II: Applications. Transactions of the Institute of British Geographers, 14, 270–286.
    Openshaw, S. (1977). Optimal zoning systems for spatial interaction models. Environment and Planning A, 9, 169–184.
    Pebesma, E. J., & Wesseling, C. G. (1998). Gstat: A program for geostatistical modeling, prediction and simulation. Computers and Geosciences, 24(1), 17–31.
    Phinn, S., Stanford, M., Scarth, P., Murray, A. T., & Shyy, T. (2002). Monitoring the composition and form of urban environments based on the vegetation–impervious surface–soil (VIS) model by sub-pixel analysis techniques. International Journal of Remote Sensing, 23, 4131–4153.
    Plane, D. A., & Rogerson, P. A. (1994). The geographical analysis of population with applications to business and planning. New York: Wiley.
    Rashed, T., Weeks, J. R., Gadalla, M. S., & Hill, A. G. (2001). Revealing the anatomy of cities through spectral mixture analysis of multispectral satellite imagery: A case study of the Greater Cairo region, Egypt. Geocarto International, 16(4), 5–15.
    Ridd, M. K. (1995). Exploring a VIS (vegetation–impervious surface–soil) model for urban ecosystem analysis through remote sensing: Comparative anatomy for cities. International Journal of Remote Sensing, 16, 2165–2185.
    Sadahiro, Y. (1999). Accuracy of areal interpolation: A comparison of alternative methods. Journal of Geographical Systems, 1, 323–346.
    Tobler, W. (1999). Linear pycnophylactic reallocation: Comment on a paper by D. Martin. International Journal of Geographical Information Science, 13(1), 85–90.
    United States Census Bureau (2002). United States Census 2000. http://www.census.gov/main/www/cen2000.html.
    Vauclin, M., Vieira, S. R., Vachaud, G., & Nielsen, D. R. (1983). The use of cokriging with limited field soil observations. Soil Science Society of America Journal, 47(2), 175–184.
    Webster, R. (1985). Quantitative spatial analysis of soil in the field. Advances in Soil Science, 3, 1–70.
    Webster, R., & Burgess, T. M. (1980). Optimal interpolation and isarithmic mapping of soil properties: III. Changing drift and universal kriging. Journal of Soil Science, 31, 505–524.
    Woodcock, C. E., Strahler, A. H., & Jupp, D. L. B. (1988). The use of variograms in remote sensing: I. Scene models and simulated images. Remote Sensing of Environment, 25, 323–348.
    Wu, C., & Murray, A. T. (2003). Estimating impervious surface distribution by spectral mixture analysis. Remote Sensing of Environment, 84, 493–505.
