A Spatial Disaggregation Model for Maximising the Application of Long-Term Forecasting Land
Use Transport Models Based on Zonal Data
Youngsoo, An1, Valentina, Nacar2, Seungil, Lee3
Abstract
This paper presents the process and an empirical analysis of a new spatial disaggregation model.
Such a model is needed to maximise the application of long-term forecasting land-use transport
models based on zonal data. This study disaggregates the predicted zonal results to cells. In
doing so, however, it makes use of cell data aggregated from building units in the base year.
The disaggregation process is organised around two questions: “Which cell will be reconstructed
before the target year?” and “How much will the cell be reconstructed?” This paper presents the
results of an empirical analysis for both questions. First, we calculated the probability that a
cell will be reconstructed using a binary logistic regression model. As a result, the model
correctly classified up to 80.4% of the cells using the rank of the accessibility value and the
rank of the density value of each cell. Second, we estimated the expected reconstructed floor
space in a cell using its location utility. The 𝑅² value, which represents the explanatory power
of the regression model, was approximately 69.5%, which can be regarded as high. The results of
this study are expected to contribute to the development of more detailed disaggregation models.
1 The University of Seoul (First author, [email protected] )
2 David Simmons Consultancy ([email protected] )
3 The University of Seoul ([email protected] )
1 Background and Goal
Within the framework of urban ecological theory, a city is regarded as an object that
continues to evolve (Heeyun, H., 2002). For a long time, many researchers have attempted
to find patterns in the changes of a city and then to make forecasts using these patterns.
Such attempts are a natural part of pursuing sustainable growth in a city. Early research
in this area concentrated on indices such as the total population, number of households,
employees, and jobs that represent a city, region, or country based on macro-spatial units.
Since then, with increasing demand for forecasts based on smaller spatial units, various
urban simulation models have been developed using macro or meso spatial units, i.e. zones.
In particular, models were developed in some countries in
Europe, with the most representative models including MEPLAN (Marcial Echenique and
Partners Ltd in the UK, Echenique et al., 1990; Echenique, 1994), IRPUD (Wegener, 1998,
1982), ITLUP (Integrated Transportation and Land Use Package, Putman, 1991, 1989),
TRANUS (de la Barra, 1989), and DELTA (Simmonds, 1999), which usually consider the
interaction between land use and transport. Since the 2000s, studies on urban
simulation models have focused more on micro-spatial units.
Today there are several microsimulation models of urban land use and transport under
development in North America: the California Urban Futures (CUF) Model at the
University of California at Berkeley (Landis and Zhang 1998a, 1998b), the Integrated
Land Use, Transport and Environment (ILUTE) model at Canadian universities (Miller
2001), the Urban Simulation (UrbanSim) model at the University of Washington, Seattle
(Waddell 2000), and the 'second-generation' model of the Transport and Land Use
Model Integration Program (TLUMIP) of the State of Oregon, USA. There are no efforts
of comparable size in Europe. There are a few national projects, such as the Learning-
Based Transportation Oriented Simulations System (ALBATROSS) of Dutch universities
(Arentze and Timmermans 2000) or the Integrated Land-Use Modelling and
Transportation System Simulation (ILUMASS) in Germany (Moeckel, et al., 2003). A
microsimulation model can increase the demand for high-quality spatial data (Wagner,
P. and Wegener, M., 2007). Considering the nature of social phenomena with too many
(known & unknown) complex factors is the first problem in simulating these systems.
Therefore, although much academic attention has been given to the subject, there have
been very few applications (Bazghandi, et al., 2012).
Using spatial units with a higher resolution for simulating a city makes it possible to
find more specific and detailed spatial changes in the future. In addition, it greatly
expands the range of uses. On the other hand, such a model may be more complicated
and require the validation of many parameters. In addition, it is difficult to construct a
database based on extremely small spatial units. These difficulties become more
important in long-term forecasting for a city than in short-term forecasting, because
the number of variables to validate increases.
This study focuses on how changes in a city can be estimated effectively and how the
results can be used efficiently. To this end, it proposes using a zone-based model for the
long-term forecasting of a city and then refining the results with a microsimulation model
that disaggregates the zonal data. In particular, this study attempts to develop a
microsimulation model for disaggregating zonal data. We expect that combining the
zone-based model with the microsimulation model will make long-term forecasting both
effective and efficient.
2 Literature Review
In this section, the literature on spatial disaggregation methods and their applications in
urban simulation models is reviewed in turn. For the spatial disaggregation methods,
we mainly reviewed spatial interpolation, because these methods are based on areal
interpolation techniques, which can be classified according to various
criteria such as their underlying assumptions or their use of ancillary data (2015? Wu?). We
then reviewed applications of the spatial disaggregation method in urban models, namely
PROPOLIS, ILUMASS and SOLUTIONS. Considerations and suggestions for model
estimation and selection are discussed at the end.
2.1 Spatial Disaggregation Method
Spatial disaggregation methods were initially studied mainly by geographers because of
the limitations of data aggregated by zone or region, such as population and employment
numbers. Tobler (1979) noted that aggregate data can indicate how population density, a
continuous quantity, varies over a particular portion of the earth. The usual assumption
is that the density within any individual reporting region is constant and, given a lack of
information to the contrary, this is implicitly treated as an optimal viewpoint. To overcome
this assumption, Tobler (1979) suggested using the pycnophylactic interpolation
method for isopleth mapping. This method assumes the existence of a smooth density
function, which takes into account the effect of adjacent source zones (Lam, 1983). In this
method, polygon data, such as population by region, are rasterized into cells
and then smoothed (see Figure 1).
Figure 1. Process of smoothing (data polygons, rasterized and smoothed) (Source:
Tobler, 1979)
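The rasterize-then-smooth idea can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not Tobler's original algorithm: the function name `pycnophylactic`, the 4-neighbour averaging, the multiplicative rescaling per zone, and the toy grid are all choices made here for illustration.

```python
def pycnophylactic(zone_grid, zone_totals, iterations=50):
    """Smooth a rasterized zonal count while preserving each zone's total."""
    rows, cols = len(zone_grid), len(zone_grid[0])
    # Count the cells of each zone and start from a uniform density per zone.
    cell_counts = {}
    for row in zone_grid:
        for z in row:
            cell_counts[z] = cell_counts.get(z, 0) + 1
    grid = [[zone_totals[z] / cell_counts[z] for z in row] for row in zone_grid]

    for _ in range(iterations):
        # Smoothing step: mean of the cell and its existing 4-neighbours.
        smoothed = [[0.0] * cols for _ in range(rows)]
        for r in range(rows):
            for c in range(cols):
                vals = [grid[r][c]]
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        vals.append(grid[rr][cc])
                smoothed[r][c] = sum(vals) / len(vals)
        # Volume-preserving step: rescale each zone back to its known total.
        sums = {z: 0.0 for z in cell_counts}
        for r in range(rows):
            for c in range(cols):
                sums[zone_grid[r][c]] += smoothed[r][c]
        for r in range(rows):
            for c in range(cols):
                z = zone_grid[r][c]
                factor = zone_totals[z] / sums[z] if sums[z] > 0 else 0.0
                smoothed[r][c] *= factor
        grid = smoothed
    return grid


# Toy example (hypothetical data): two zones of four cells each.
zone_grid = [
    ["A", "A", "B", "B"],
    ["A", "A", "B", "B"],
]
zone_totals = {"A": 400.0, "B": 40.0}
surface = pycnophylactic(zone_grid, zone_totals)
total_a = sum(surface[r][c] for r in range(2) for c in range(4) if zone_grid[r][c] == "A")
total_b = sum(surface[r][c] for r in range(2) for c in range(4) if zone_grid[r][c] == "B")
```

In this toy case the cells of zone A that border the sparser zone B end up with lower values than the cells farther away, while each zone still sums to its original total.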
Since then, the limitations of aggregated data have been studied continuously by many
researchers; the issue is known in the literature as the modifiable areal unit problem (MAUP)
(Fotheringham & Wong 1991; Fotheringham & Rogerson 1993; Dennis & Wu 1996; Moon
& Farmer 2001). In addition, spatial disaggregation methods have been studied as a way to
address the MAUP; they can be divided into five types according to the method used, as follows.
Table 1 A comparison of different spatial disaggregation techniques in terms of their
assumptions, methods and data demand. (Source: Li et al., 2007)
Technique | Method | Assumption | Control surface (ancillary data) | Complexity (1-5)
Simple Area Weighting | Cartographic | Homogeneous source zones | None | 5
Regression Model | Statistical | Source zone composed of land classes with global uniform density | Discrete or continuous | 3
Binary Dasymetric Mapping | Cartographic | Source zone composed of populated and unpopulated areas | Discrete (binary) | 2
Three-Class Dasymetric Mapping | Cartographic | Homogeneity at different land class (at each source zone) | Discrete | 1-2
EM Algorithm | Statistical | Source zone composed of land classes with global uniform density that conserve aggregate value | Discrete or continuous | 1-2
The simple area weighting method assumes homogeneity of the distribution within a
region. This assumption is far removed from the spatial distributions expected in the real
world. In some cases, simple cartographic processing methods, such as overlay,
are used to disaggregate the source zones. Other, more advanced techniques embrace
the more realistic expectation that source zones are heterogeneous but with an
unknown structure (Li et al., 2007).
Different approaches have been proposed based upon the assumptions made about
the spatial structure imposed on the source zones that resulted from the overlaid spatial
data (Li et al., 2007).
Regression models (Langford et al. 1991; Yuan et al. 1997) assume that the ancillary
land use classes define areas of globally uniform density. That is, each land class has a
uniform areal density of the parameter of interest over the whole of the
area, but this density is unknown. Using a combination of the aggregate source values and the
ancillary data with unknown densities, it is possible to develop regression equations
to resolve this relationship numerically (Li et al., 2007).
A drawback of this approach is that the global densities it computes allow for small
errors between the estimated and the actual source-zone values. The property that resolved
densities preserve the volume of the aggregate data value is called the
pycnophylactic property (Tobler 1979; Goodchild et al. 1993).
Hence there is another statistical technique for estimating the globally uniform density
for each land class while satisfying the pycnophylactic property - the EM algorithm
(Flowerdew & Green 1991; Flowerdew & Green 1992; Gregory & Paul 2005). However,
the assumption of uniform area density for each land class might be problematic when
dealing with many areas over a large region where relationships between population
and land class are not spatially uniform. Langford (2006) argued that global fitted
density can be estimated at local level by dasymetric mapping which allows for some
global variability in density for each land class.
A simple example is binary dasymetric mapping (Eicher & Brewer 2001) which takes a
binary land classification to control the population allocation. It assumes a non-zero
density in the populated areas within each source zone and a zero density elsewhere.
Hence varying assumptions can be made about the density in a functional way. A
further refinement to this is three-class dasymetric mapping (Mennis 2003), which
incorporates a functional relationship with area densities so that densities are uniform
within a source zone even though they may vary across the larger region.
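To make the difference between the homogeneity assumption and a binary control surface concrete, the following sketch contrasts simple area weighting with binary dasymetric mapping for a single source zone. The function names, the four-cell zone, and the zone total of 1,000 are hypothetical values chosen for illustration.

```python
def area_weighting(zone_total, cell_areas):
    """Simple area weighting: spread the total in proportion to cell area."""
    total_area = sum(cell_areas)
    return [zone_total * a / total_area for a in cell_areas]


def binary_dasymetric(zone_total, cell_areas, populated):
    """Binary dasymetric mapping: only cells flagged as populated get a share."""
    populated_area = sum(a for a, p in zip(cell_areas, populated) if p)
    if populated_area == 0:
        raise ValueError("zone has no populated cells")
    return [
        zone_total * a / populated_area if p else 0.0
        for a, p in zip(cell_areas, populated)
    ]


# Hypothetical zone of four 50 m x 50 m cells, two of them unpopulated.
cell_areas = [2500.0, 2500.0, 2500.0, 2500.0]
populated = [True, False, True, False]
uniform = area_weighting(1000.0, cell_areas)
masked = binary_dasymetric(1000.0, cell_areas, populated)
```

Both allocations preserve the zone total; they differ only in where the total is placed within the zone.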
Overall, the density assumptions of different spatial disaggregation techniques can be
illustrated by Figure 1, where the vertical bars represent density for each land use class
and the parallel bar represents the density of the source zones. By comparison, the
assumption of homogeneity used by three-class dasymetric mapping is the most relaxed
and is closest to the complexity of the real world.
The three-class dasymetric mapping is theoretically more appropriate to accommodate
the spatial heterogeneity of a large geographical area. Langford (2006) evaluated spatial
disaggregation techniques using UK Census data for the county of Leicestershire. The
results show that the three-class dasymetric method largely outperforms other spatial
disaggregation techniques, apart from the comparatively simpler binary dasymetric
method.
One possible reason for this inconclusive result is that the more complex three-class
dasymetric technique is more sensitive to land classification errors. On the other
hand, as Fisher and Langford (1995) pointed out, the significance of comparative results
is always limited by the simplicity of the spatial structure of the study area, and a more
conclusive result could be obtained experimentally by broadening the study area to
include more spatially heterogeneous densities.
2.2 Cases of Application in Urban Model
In this section, two urban models that include a spatial disaggregation method are
reviewed. The PROPOLIS project is reviewed first, followed by the ILUMASS project.
Figure 2 shows how PROPOLIS and ILUMASS differ in their use of the spatial
disaggregation method.
Figure 2. Cases of using the spatial disaggregation method in the urban model
based on zonal data (Source: Wegener, 2010)
Row (A) indicates urban models in which the spatial disaggregation method is
not used, and rows (B) and (C) indicate urban models that use the spatial disaggregation
method in different ways. In row (B), the urban model first produces long-term
forecasts using zonal input data, and the spatial disaggregation method is then
used to extend the usefulness of the results, for example to assess environmental impacts.
PROPOLIS belongs to this case. Lastly, in row (C), the zonal data are spatially
disaggregated before long-term forecasting, and the forecasting is then carried out
by a microsimulation using the disaggregated cell data. The methods of the two
models, PROPOLIS and ILUMASS, are described below.
PROPOLIS
The major objective of the PROPOLIS (Planning and Research of Policies for Land Use
and Transport for Increasing Urban Sustainability) project is to research, develop, and
test integrated land use and transport policy assessment tools and methodologies. The
project also defines sustainable urban strategies and demonstrates their long-term
effects (Spiekermann, 2003). To calculate the PROPOLIS indicators, a spatial
disaggregation module called the raster module was developed. The
raster module maintains the zonal organisation of the land use transport models and
adds a disaggregated raster-based representation of space to support some of the
specific environmental and social impact submodels. Because the raster module is
based on the output of aggregate urban models, several steps must be taken to go
from a polygon and vector representation of zones and networks, through small-scale
modelling of environmental and social impacts, to a re-aggregation of indicators for
assessing sustainability.
There are two main sources of input for the raster module. On the one hand, there is
a spatial database that depicts zone boundaries and land use categories as polygons,
and vectors are used to code the network. On the other hand, there are the policy-
dependent forecasts implemented by the land use transport models for the location of
households according to socio-economic group, employment, and zonal floor space
and traffic flow on the links of the network. This information is then converted to raster
cells. The main assumption concerning the disaggregation of activity locations is that
population and employment are not equally distributed over the territory of a zone,
but that there are differentiations in density. The assumption is that intra-zonal
differentiation is reflected by weights assigned to the raster cells based on typical
densities of land use categories (e.g. Bosserhof, 2000). These weights are converted to
probabilities by dividing them by the zone’s total weights. This gives a probability
distribution of households in a zone. Cumulating the weights over the cells of a zone
provides a range of numbers that can be associated with each cell. Using a random
number generator for each household, a cell is selected as the household's location.
This allocation of households takes into account the different weighting schemes for
the three socio-economic groups. The disaggregation of employment follows the same
procedure but with different weights (Spiekermann and Wegener, 1998).
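The weight-based allocation described above can be sketched as follows. This is an illustrative reading of the procedure rather than the PROPOLIS code: the land use categories, the weights, and the function name `allocate_households` are hypothetical.

```python
import bisect
import itertools
import random


def allocate_households(n_households, cell_land_use, weights, rng):
    """Assign each household to a cell with probability weight / total weight."""
    cell_weights = [weights[lu] for lu in cell_land_use]
    # Cumulating the weights gives each cell its own range of numbers.
    cumulative = list(itertools.accumulate(cell_weights))
    total = cumulative[-1]
    counts = [0] * len(cell_land_use)
    for _ in range(n_households):
        draw = rng.random() * total
        # The cell whose range contains the draw becomes the location.
        idx = min(bisect.bisect_right(cumulative, draw), len(counts) - 1)
        counts[idx] += 1
    return counts


# Hypothetical zone with three cells and assumed land use weights.
land_use_of_cells = ["residential", "park", "mixed"]
assumed_weights = {"residential": 3.0, "park": 0.0, "mixed": 1.0}
counts = allocate_households(1000, land_use_of_cells, assumed_weights, random.Random(42))
```

A separate weight table per socio-economic group, or for employment, would be passed in the same way.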
ILUMASS
The ILUMASS project aims to develop, test, and apply a new type of integrated urban
land use/transport/environment (LTE) planning model. Urban LTE models simulate the
interaction between urban land use development, transport demand, traffic, and
environment. The distribution of land use in the urban region, such as residences,
workplaces, shops, and leisure facilities, creates a demand for spatial interaction, such
as work, shopping, or leisure trips. These trips occur as road, rail, bicycle, or walking
trips over the transport network in the region, and they have environmental impacts.
The land use component of the ILUMASS model is based on the land use parts of the
mostly aggregate land use transport model developed at IRPUD (Wegener, 1999).
However, the ILUMASS model is microscopic, i.e. all land use changes and traffic flow
are modelled by microsimulation. The micro database contains a listing of residential
buildings and floor space details of non-residential buildings. Features associated with
each dwelling include building type, size, quality, tenure, and price, and every dwelling
has a raster cell as a micro location. The non-residential floor space is also distinguished
by industrial, retail, office, and public use. Raster cells are used as addresses for the
microsimulation, and for the disaggregation of zonal activities to raster cells, GIS-based
techniques are used.
To disaggregate spatially aggregated data within a zone, the land use distribution within
that zone is taken into account, i.e. it is assumed that there are areas of different density
in that zone. As a result, the spatial disaggregation of zonal data consists of three steps:
the generation of a raster representation of land use, the assignment of probabilities
to land use categories, and the allocation of the data to raster cells. Figure 3 illustrates
the three steps for a simple example (Spiekermann and Wegener 1999, 2000).
- First, land use data and zone borders, in vector-based GIS usually stored as polygons,
are converted to a raster representation by using a point-in-polygon algorithm for the
centroids of the raster cells. As a result, each cell has two attributes, the land use
category and the zone number of its centroid. These cells represent the addresses for
the disaggregation of zonal activity data.
- For each activity to be disaggregated, weights are assigned to each land use category,
and all cells are attributed with the weights of their land use category. Dividing the
weight of a cell by the total weight of all the cells of the zone gives the probability that
the cell is the address of just one element of the zonal activity. Cumulating the weights
over the cells of a zone then yields the range of numbers associated with each cell.
- Using a random number generator for each element of the zonal activity, one cell is
selected as its address. The results are individual addresses for all activities with a raster
representation of the distribution of each activity within the zone.
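The first of the three steps, converting polygons to raster cells by testing cell centroids, can be illustrated with a small ray-casting sketch. The zone geometry, the grid origin, and the function names are assumptions made for this example; a GIS package would normally perform this step.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize(polygons, x_min, y_min, n_cols, n_rows, cell_size):
    """Label each cell with the polygon that contains its centroid."""
    grid = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            cx = x_min + (c + 0.5) * cell_size
            cy = y_min + (r + 0.5) * cell_size
            label = None
            for name, poly in polygons.items():
                if point_in_polygon(cx, cy, poly):
                    label = name
                    break
            row.append(label)
        grid.append(row)
    return grid


# Two hypothetical square zones, rasterized with 50 m cells.
zones = {
    "zone_1": [(0, 0), (100, 0), (100, 100), (0, 100)],
    "zone_2": [(100, 0), (200, 0), (200, 100), (100, 100)],
}
zone_grid = rasterize(zones, x_min=0, y_min=0, n_cols=4, n_rows=2, cell_size=50)
```

Running the same function on the land use polygons would give each cell its second attribute, the land use category.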
Figure 3 Disaggregation of zonal data to raster data (Spiekermann and Wegener
2000: 48)
This study is similar to the studies above in that it uses zone-based results, here from
the Seoul model, together with spatial disaggregation; however, its methodology is
different. This study also uses a microsimulation method, but not for long-term
forecasting; instead, the method is used for the spatial disaggregation of the estimated
zonal data. This study therefore proposes using a zone-based macro model for long-term
forecasting together with a microsimulation model that spatially disaggregates the
zonal results so that they can be used efficiently.
3 Implementation
3.1 Conceptual Diagram
Figure 4 shows the main conceptual diagram of this study. Because the Seoul model of
long-term forecasting is based on zones, the model needs zonal data (A). Usually, zonal
data such as population, employment, and jobs come from the National Statistical
Office, but some of the zonal data, such as the average land price and the floor space
area for each land use type (based on a building or a parcel), come from GIS spatial
data (A’), which are aggregated by zone (①). Using the zonal data as input
(②), the Seoul model produces long-term forecasts from the base year and
calculates the estimated data (B) for each zone in the target year (③). Because zonal
data are of limited use, they need to be disaggregated to cells (④).
At this point, the microsimulation model disaggregates the estimated zonal data (B’)
to cells while reflecting the building data from the base year (⑤).
Figure 4 Conceptual diagram
This concept, which is the main distinguishing feature of this study, is expected to produce
more realistic values when disaggregating the predicted zonal results of the Seoul
model, because it reflects the base-year building data (⑤). This study therefore focuses
on using the microsimulation model to disaggregate by cell (④+⑤). Because the focus is
on the disaggregation methodology, we also assumed that the results of the Seoul model
for the zones in the target year were acceptable. Therefore, a detailed description of
the Seoul model is omitted.
3.2 Study area and analysis spatial unit
The Seoul model used for the Seoul Metropolitan Area consists of a total of 579 zones,
which include 522 small zones (Seoul city) and 57 large zones (Incheon city and Keong-gi
province) (see the top-left map (①) in Fig. 5). We selected four small
zones in Seoul city (see the bottom-right map (③) in Fig. 5) and applied
the disaggregation method of this study to them. In addition, we considered only retail
and business land use; residential land use and other uses, such as education and parks,
were not considered. The disaggregation method in this study
does not disaggregate all of the predicted zonal data to cells at one time, but
rather first selects some of the zones as targets and then applies the
disaggregation method to each of them.
Figure 5 Study area and analysis spatial unit
In addition, we divided the four zones into 1,015 cells of 50 m × 50 m.
In terms of spatial units, previous research (PROPOLIS, ILUMASS) used 100 m ×
100 m cells to disaggregate all of the zones; this study used smaller
50 m × 50 m cells for the disaggregation method because only
four zones were considered.
The four zones selected as the study area included a metro station in the middle (Shillim,
refer to ③ in Fig. 5) and two other stations (Shindaebang and Bongcheon) outside the
zones. These areas are already developed, and many retail and office facilities are
located in the catchment area of the Shillim metro station. Recently, there has been a
significant increase in the total number of passengers at the Shillim station. Because
the zones have been under strong pressure to redevelop, there is a need for long-term
management in the future. Long-term forecasts for these zones can therefore
be of use for their sustainable management.
3.3 Model process
The Seoul model can provide zonal data for the four zones, including the total floor-space
values for each land use type for the base year (2010) and the target year (2030). By
comparing the total amounts of retail and office floor space in the base and target years,
we obtain the total difference in floor space (TDFs) for the four zones
between the base and target years. An increase in TDFs indicates that some cells
in the four zones are redeveloped, while the other cells keep
the same total floor space as before. However, this distinction cannot be captured by
the allocation methods used in previous studies, in which the zonal data are simply
disaggregated to cell units. Therefore, this study uses an iterative (rotation) process that
selects a cell, redevelops it using the functions described below, and repeats until a
stopping condition is satisfied.
In addition, the increase in TDFs values prompted two main research questions: ‘Which cell
will be redeveloped before the target year?’ and ‘How much will the cell be redeveloped?’
In terms of the first research question, because the cell to be redeveloped will have a
relatively higher probability value compared to the other cells, we used a probability
function to select a cell that is likely to be redeveloped. The second research question is
related to how much the new total floor space will increase when the cell is redeveloped.
As a result, we used a utility function to calculate the new total floor space for the cell
because it is related to the location utility. The process diagram created for this step is
shown in Figure 6.
Figure 6 Process of disaggregation model
The detailed explanation is as follows. First, reconstruction may already have started
in the area before the rotation process begins. In 2008, redevelopment had already
started in three places. The total floor space that had already been planned was
excluded from the total amount in the rotation process.
Apply existing plan
The first step is to check for sites already under reconstruction and for fixed plans, and
to apply them to the reconstruction between the base year and the target year.
Calculate probability for each cell
The probability function is used to calculate, for each cell, the probability that the
cell will be reconstructed. The probability p(𝑥𝑖) of reconstruction in a
cell is given by Equation 1.
$$ p(y_i = 1 \mid x_i) = f(\beta_0 + \beta_1 x_i), \qquad f(A) = \frac{e^{A}}{1 + e^{A}} $$ Equation 1
where 𝑦𝑖 denotes redevelopment in cell i, and 𝑦𝑖 = 1 means that the cell (𝑥𝑖) was
redeveloped.
The logistic regression model is used to obtain the probability that a cell
will be redeveloped. Two independent variables are used: the rank of accessibility and
the rank of density. The rank of accessibility (RankAcc) reflects the potential users of
the cell. Accessibility is calculated by multiplying the total number of passengers
at the nearest metro station by the estimated number of users of the total floor space
of the cell, and then dividing by the distance between the nearest metro station and
the cell.
$$ A = \beta_0 + \beta_1 \mathit{RankDensity}_i + \beta_2 \mathit{RankAcc}_i $$ Equation 2
where 𝑅𝑎𝑛𝑘𝐷𝑒𝑛𝑠𝑖𝑡𝑦𝑖 denotes the rank of the density value of cell i (𝑥𝑖), and 𝑅𝑎𝑛𝑘𝐴𝑐𝑐𝑖 denotes
the rank of the accessibility value of cell i (𝑥𝑖).
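Equations 1 and 2 can be evaluated directly, as in the following sketch. The coefficient values are hypothetical placeholders chosen only to make the example run; they are not the estimates reported in this paper.

```python
import math


def logistic(a: float) -> float:
    """f(A) = e^A / (1 + e^A), evaluated in a numerically stable way."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


def redevelopment_probability(rank_density, rank_acc, b0, b1, b2):
    """Equation 2 inside Equation 1: p = f(b0 + b1*RankDensity + b2*RankAcc)."""
    return logistic(b0 + b1 * rank_density + b2 * rank_acc)


# Hypothetical coefficients, for illustration only.
B0, B1, B2 = 1.0, -0.004, -0.006
p_high = redevelopment_probability(rank_density=10, rank_acc=5, b0=B0, b1=B1, b2=B2)
p_low = redevelopment_probability(rank_density=400, rank_acc=450, b0=B0, b1=B1, b2=B2)
```

With negative coefficients, as assumed here, cells with small rank values receive a higher probability than cells with large rank values.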
The ranking value is calculated from the accessibility value of each cell to allow a relative
comparison between cells; it is used as an independent variable. The accessibility value is
based on a gravity function, and it is calculated as follows:
$$ \mathit{Acc}_i = \frac{\mathit{TNP}_k \cdot \mathit{TFU}_i}{\mathit{PND}_{i \cdot k}} $$ Equation 3
Where,
𝐴𝑐𝑐𝑖 = Accessibility of cell i.
𝑇𝑁𝑃𝑘 = Total number of passengers at metro station k, where k is the metro station
nearest to cell i (calculated from the transport model of the Seoul model).
𝑇𝐹𝑈𝑖 = Total number of floor space users in cell i
= Total floor space area in cell i / possession area per person
(calculated from the land use model of the Seoul model).
𝑃𝑁𝐷𝑖∙𝑘 = Pedestrian network distance from cell i to the nearest metro station k.
The rank of density (RankDen) reflects how much of the cell has already
been developed. The density value is calculated by dividing the total existing
floor space by the area of the cell (2,500 m² = 50 m × 50 m). We converted the
density value to a ranking value for the same reason as for RankAcc.
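The accessibility, density, and rank calculations can be sketched as below. The input numbers are hypothetical, and the ranking direction (rank 1 for the largest value) is an assumption made here, since the text does not state how ties or direction are handled.

```python
def accessibility(tnp_k, floor_space_i, area_per_person, pnd_ik):
    """Equation 3: Acc_i = TNP_k * TFU_i / PND_ik, with TFU_i = floor space / area per person."""
    tfu_i = floor_space_i / area_per_person
    return tnp_k * tfu_i / pnd_ik


def density(floor_space_i, cell_area=2500.0):
    """Existing floor space divided by the cell area (50 m x 50 m = 2,500 m2)."""
    return floor_space_i / cell_area


def rank_descending(values):
    """Assumed convention: rank 1 goes to the largest value."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks


# Hypothetical cell: 10,000 passengers, 5,000 m2 of floor space,
# 25 m2 per person, 400 m from the nearest station.
acc = accessibility(tnp_k=10000, floor_space_i=5000.0, area_per_person=25.0, pnd_ik=400.0)
den = density(5000.0)
ranks = rank_descending([acc, 100.0, 900.0])
```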
Select a cell
We select a cell randomly based on the estimated probability value of each cell. The
cell is selected randomly because a high probability does not mean that the cell will
certainly be redeveloped, only that redevelopment is likely. At this step, if
the age of a building (establishment year: 2016) in the cell is less than 10 years, the
cell is not selected. A minimum building age of 10 years is required for
selection, based on the results of an empirical analysis.
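The selection step can be read as a weighted random draw restricted to eligible cells, as in this sketch. The function name, the example probabilities and ages, and the treatment of a building aged exactly 10 years as eligible are assumptions made for illustration.

```python
import random


def select_cell(probabilities, building_ages, rng, min_age=10):
    """Draw one eligible cell index, weighted by redevelopment probability."""
    eligible = [i for i, age in enumerate(building_ages) if age >= min_age]
    weights = [probabilities[i] for i in eligible]
    if not eligible or sum(weights) <= 0:
        return None  # nothing can be selected
    return rng.choices(eligible, weights=weights, k=1)[0]


# Hypothetical cells: cell 0 has the highest probability but a 3-year-old building.
probabilities = [0.9, 0.5, 0.1]
building_ages = [3, 20, 15]
rng = random.Random(0)
picks = {select_cell(probabilities, building_ages, rng) for _ in range(200)}
none_eligible = select_cell([0.5, 0.5], [1, 2], rng)
```

Cell 0 is never drawn in this example, even though its probability is the highest, because its building is too new.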
Calculate expected reconstructed floor space of the cell
At the next step, we calculate the expected total floor space when the selected cell is
redeveloped. The expected reconstructed floor space is calculated using a multiple
regression model based on the location utility of a cell, with the location factors shown
in Equation 4.
$$ \mathit{ReConstructFlr}_i = \beta_0 + \beta_1 \mathit{DistMetro}_{i \cdot k} + \beta_2 \mathit{DistRoad}_{i \cdot m} + \beta_3 \mathit{Density}_i $$ Equation 4
Where,
𝑅𝑒𝐶𝑜𝑛𝑠𝑡𝑟𝑢𝑐𝑡𝐹𝑙𝑟𝑖 = The expected reconstructed floor space in cell i
𝐷𝑖𝑠𝑡𝑀𝑒𝑡𝑟𝑜𝑖∙𝑘 = Network distance from cell i to the nearest metro station k
𝐷𝑖𝑠𝑡𝑅𝑜𝑎𝑑𝑖∙𝑚 = Distance from cell i to the nearest main road m
𝐷𝑒𝑛𝑠𝑖𝑡𝑦𝑖 = Density of cell i
There are three independent variables in this equation: the network distance
from the cell to the nearest metro station (DistMetro), the distance from the cell to the
nearest main road (DistRoad), and the density of the cell (Density). In addition, we apply a
random variation in the range of ±10% in the calculation of the new total floor space.
At this step, the age of the buildings in the redeveloped cell is reset to zero
years to avoid reselecting the cell.
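Equation 4 and the ±10% variation can be sketched as follows. The coefficients are hypothetical, and applying the variation as a uniform multiplicative factor is an assumption, since the text does not specify the distribution or how it enters the calculation.

```python
import random


def expected_floor_space(dist_metro, dist_road, density, coeffs):
    """Equation 4: b0 + b1*DistMetro + b2*DistRoad + b3*Density."""
    b0, b1, b2, b3 = coeffs
    return b0 + b1 * dist_metro + b2 * dist_road + b3 * density


def with_random_variation(value, rng, spread=0.10):
    """Assumed form of the +/-10% variation: a uniform multiplicative factor."""
    return value * (1.0 + rng.uniform(-spread, spread))


# Hypothetical coefficients (b0, b1, b2, b3), for illustration only.
coeffs = (200.0, -0.5, -0.2, 1500.0)
base = expected_floor_space(dist_metro=300.0, dist_road=100.0, density=2.0, coeffs=coeffs)
noisy = with_random_variation(base, random.Random(1))
```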
Conditional statement
Lastly, the accumulated total increased floor space (TIFs) resulting from redevelopment
is compared to TDFs. If TIFs is less than TDFs, the process returns to the second
step (calculating the probability for each cell) and repeats. TIFs and TDFs are defined
in Equations 5 and 6.
$$ \mathit{TIFs} = \sum_{i=0}^{n} \left( \mathit{ReConstructFlr}_i - \mathit{ExistFlr}_i \right) $$ Equation 5
TIFs: Total increased floor space by the disaggregation model
𝑅𝑒𝐶𝑜𝑛𝑠𝑡𝑟𝑢𝑐𝑡𝐹𝑙𝑟𝑖: Total amount of reconstructed floor space in cell i
𝐸𝑥𝑖𝑠𝑡𝐹𝑙𝑟𝑖: Total amount of existing floor space in cell i
i (0, ⋯, n): Cells selected and reconstructed by the model
$$ \mathit{TDFs} = \mathit{TFs}^{zones}_{target\ year} - \mathit{TFs}^{zones}_{base\ year} $$ Equation 6
TDFs: Total difference in floor space from base year to target year in the zones
𝑇𝐹𝑠 (zones, base year): Total floor space of the zones in the base year
𝑇𝐹𝑠 (zones, target year): Total floor space of the zones in the target year
If the condition is satisfied, the simulation ends. To recap briefly, the simulation
model iterates to distribute the total increase in floor space in the zones among
the cells. Two statistical models are used: a binary logistic model
to calculate the redevelopment probability of each cell, and a multiple regression model
to estimate the expected reconstructed floor space.
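The overall loop can be summarised in a short sketch that accumulates TIFs until it reaches TDFs. The two callables stand in for the probability-based selection and the floor space regression; the deterministic stand-ins used in the example (first eligible cell, doubling of floor space) are hypothetical and serve only to make the control flow easy to follow.

```python
def run_disaggregation(exist_flr, ages, tdfs, select_cell, new_floor_space, max_iter=10000):
    """Redevelop cells one at a time until TIFs reaches TDFs."""
    flr = list(exist_flr)
    ages = list(ages)
    tifs = 0.0
    rebuilt = []
    for _ in range(max_iter):
        if tifs >= tdfs:          # conditional statement: stop once TIFs >= TDFs
            break
        i = select_cell(flr, ages)
        if i is None:             # no eligible cell is left
            break
        new = new_floor_space(i, flr)
        tifs += new - flr[i]      # Equation 5: reconstructed minus existing
        flr[i] = new
        ages[i] = 0               # reset the age to avoid reselecting the cell
        rebuilt.append(i)
    return flr, tifs, rebuilt


def first_old_cell(flr, ages):
    """Stand-in selector: the first cell whose building is at least 10 years old."""
    return next((i for i, age in enumerate(ages) if age >= 10), None)


def doubled(i, flr):
    """Stand-in for the regression: the cell's floor space simply doubles."""
    return flr[i] * 2.0


cells, tifs, rebuilt = run_disaggregation(
    exist_flr=[100.0, 200.0, 300.0], ages=[30, 5, 20], tdfs=350.0,
    select_cell=first_old_cell, new_floor_space=doubled,
)
```

In this toy run two cells are redeveloped before the accumulated increase exceeds the zonal difference, and the cell with a 5-year-old building is left unchanged.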
4 Coefficient Estimation and Validation
4.1 Introduction and base data
In this section, we use an empirical analysis to estimate the parameters of the model
constructed in the previous section. The base year is 2008, and the verification year is
2016. Verification of the model is divided into two parts: selecting a cell and calculating
the reconstructed floor space.
In addition, we constructed the basic data for the empirical analysis as follows. First,
we needed the total floor space of each cell for retail and business land use
in 2008. We extracted the buildings whose main land use was retail or business
from the entire building GIS data set, and then aggregated them into 50 m ×
50 m cells (Fig. 7).
Figure 7 Process for calculating base data
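The aggregation of building floor space into cells can be sketched as below. The record layout (x, y, land use, floor space), the use of one representative point per building, and the example buildings are assumptions for illustration; the actual GIS processing may assign buildings to cells differently.

```python
from collections import defaultdict


def aggregate_to_cells(buildings, x_min, y_min, cell_size=50.0, uses=("retail", "business")):
    """Sum floor space per (row, col) cell for buildings with the selected main use."""
    totals = defaultdict(float)
    for x, y, use, floor_space in buildings:
        if use not in uses:
            continue  # e.g. residential buildings are not considered
        col = int((x - x_min) // cell_size)
        row = int((y - y_min) // cell_size)
        totals[(row, col)] += floor_space
    return dict(totals)


# Hypothetical buildings: (x, y, main land use, floor space in m2).
buildings = [
    (10.0, 10.0, "retail", 100.0),
    (40.0, 20.0, "business", 50.0),
    (60.0, 10.0, "residential", 500.0),
    (120.0, 70.0, "retail", 30.0),
]
cell_floor_space = aggregate_to_cells(buildings, x_min=0.0, y_min=0.0)
```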
Some cells needed to be excluded because they would not change, such as
foothills, streams, or main roads. We also excluded cells with only residential floor space
because we did not consider changes of land use type, such as from
residential to retail or business. As a result, 459 cells were analysed,
as shown in Fig. 8.
Figure 8 Exclusion of some cells and final studied cells.
Fig. 9 shows the redeveloped buildings and their aggregation by cell. We collected
data on buildings redeveloped from 2008 to 2016; there were 107 such
buildings in total. When these data are aggregated by cell, 83 cells contain
at least one redeveloped building. However, construction on two of these
buildings had already started in 2008. Therefore, we treated
these two buildings as existing plans in the first step of the model. We performed the
empirical analysis and verified the disaggregation model using these basic data
aggregated by cell.
Figure 9 Redeveloped buildings and aggregate on each cell
4.2 Estimating parameters of the probability function
This study used a binary logistic model, which is one of the probabilistic choice models.
Redeveloping an existing building is a discrete decision made by a developer,
landowner, or planner, and such decisions are made under uncertainty. When a
decision is discrete and uncertain, the most popular method is a probabilistic choice
model based on random utility theory. The probability that a cell is redeveloped is
given by Equation 1.
Of all the cells, 81 were redeveloped from 2008 to 2016. We coded these cells as
𝑦𝑖 = 1 in the dependent variable and fitted the binary logistic model. We used two
independent variables, RankAcc and RankDen, as previously stated. RankAcc
represents the rank value of cell accessibility; the accessibility and its rank value are
shown in the top maps of Fig. 10. In the same way, RankDen represents the rank value
of cell density; the density and its rank value are shown in the bottom maps of Fig. 10.
Figure 10 Calculated RankAcc and RankDen based on cell units
Table 2 lists the results of the binary logistic regression model. Both RankAcc and
RankDen are statistically significant. RankAcc has a negative parameter, which means
that a cell ranked higher in accessibility (that is, with a smaller rank value) has a higher
probability of being reconstructed. RankDen has a positive parameter, which means
that a cell ranked higher in density (a smaller rank value) has a lower probability.
Classification accuracy is important in a binary logistic regression model; here it was
80.4%, which we consider acceptable.
Table 2 Results of binary logistic analysis
Explanatory variables      β         S.E.     Wald      Sig.     Exp(β)
(constant)                -0.830     0.228    13.216    0.000    0.436
RankAcc                   -0.010     0.002    18.535    0.000    0.990
RankDen                    0.007     0.002     9.599    0.002    1.007

Goodness of fit for the model
χ² (Sig.)                  26.238 (0.000)
Classification accuracy    80.4%
-2 Log-likelihood          428.097
Cox and Snell R²           0.056
Nagelkerke R²              0.088
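As a quick check on how the Exp(β) column relates to the β column, the short Python sketch below recomputes the odds ratios from the reported coefficients. It relies only on the standard relationship that the odds ratio is the exponential of the logit coefficient; the variable names are chosen for illustration.

```python
import math

# Coefficients (beta) as reported in Table 2.
coefficients = {"(constant)": -0.830, "RankAcc": -0.010, "RankDen": 0.007}

# The odds ratio is exp(beta): the factor by which the odds of reconstruction
# change when the variable increases by one unit.
odds_ratios = {name: round(math.exp(beta), 3) for name, beta in coefficients.items()}

for name, ratio in odds_ratios.items():
    print(f"{name}: exp(beta) = {ratio:.3f}")
```

The recomputed values agree with the Exp(β) column: each one-step increase in the RankAcc value multiplies the odds of reconstruction by about 0.990, and each one-step increase in the RankDen value multiplies them by about 1.007.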
Substituting the parameters in Table 2 into Equation 1 gives Equation 7.
$$\mathrm{Prob}(y_i = 1 \mid x_i) = \frac{\exp(-0.830 - 0.010\,RankAcc_i + 0.007\,RankDen_i)}{1 + \exp(-0.830 - 0.010\,RankAcc_i + 0.007\,RankDen_i)} \qquad \text{(Equation 7)}$$
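Equation 7 can be evaluated directly. The following Python sketch uses the coefficients reported for the binary logistic model; the function name `reconstruction_probability` and the rank values in the example are hypothetical and are not taken from the study data.

```python
import math

# Coefficients of the binary logistic model as reported in Table 2.
B0, B_ACC, B_DEN = -0.830, -0.010, 0.007

def reconstruction_probability(rank_acc, rank_den):
    """Probability that a cell is reconstructed, following Equation 7."""
    z = B0 + B_ACC * rank_acc + B_DEN * rank_den
    # exp(z) / (1 + exp(z)) written in the equivalent form 1 / (1 + exp(-z)).
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical cells: the same density rank, different accessibility ranks.
p_small_rank_acc = reconstruction_probability(rank_acc=10, rank_den=100)
p_large_rank_acc = reconstruction_probability(rank_acc=100, rank_den=100)
print(f"{p_small_rank_acc:.3f} vs {p_large_rank_acc:.3f}")
```

With the negative RankAcc coefficient, the cell with the smaller accessibility rank value receives the larger probability, which matches the interpretation given for Table 2.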
We calculated the probability for each cell using Equation 7. The left side of Fig. 11
shows the cells ordered by calculated probability (blue line), along with the cells that
were actually reconstructed (orange vertical lines). This graph shows that a cell with a
high probability value will not necessarily be reconstructed; it is simply more likely to
be. The right side of Fig. 11 makes the same point: the higher the probability range,
the higher the average frequency of reconstruction. Therefore, we selected cells by
random sampling based on the probability value of each cell.
Figure 11 Comparison of estimated probability and actual reconstructed cells (left),
along with average frequency for each range of probability (right).
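The random sampling described above can be sketched as follows. The paper does not specify the sampling scheme, so this example assumes one independent draw per cell, in which a cell is selected when a uniform random number falls below its probability; the function name `sample_reconstructed` and the probabilities are hypothetical.

```python
import random

def sample_reconstructed(probabilities, rng):
    """Return the cells selected in one random scenario.

    probabilities -- dict {cell: probability of reconstruction}
    rng           -- a random.Random instance, so runs can be reproduced
    """
    return [cell for cell, p in probabilities.items() if rng.random() < p]

# Hypothetical probabilities for three cells.
probabilities = {"c1": 0.45, "c2": 0.20, "c3": 0.05}
rng = random.Random(2016)

# Repeating the draw many times shows that a high-probability cell is selected
# more often, but not in every scenario.
scenarios = [sample_reconstructed(probabilities, rng) for _ in range(1000)]
share_c1 = sum("c1" in s for s in scenarios) / len(scenarios)
share_c3 = sum("c3" in s for s in scenarios) / len(scenarios)
```

Over many scenarios, the share of runs in which a cell is selected approaches its probability, which is the behaviour shown on the right side of Fig. 11.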
4.3 Estimating parameters of expected reconstructed floor space
In this section, we verify the calculation of the expected reconstructed floor space in a
cell. The floor space is calculated with a multiple regression model of the location
utility, as given in Equation 4.
Regarding the independent variables, the calculation processes for the DistMetro and
Density variables were presented in Section 3. Fig. 12 presents the process for the
DistRoad variable. On the left side of Fig. 12, the bold red lines represent the main
roads, and the distance from each cell to the nearest main road is shown on the right
side of Fig. 12.
Figure 12 Calculated distances from main roads to each cell
The natural logs of the two distance variables (DistMetro and DistRoad) were taken
before the regression model was estimated. The dependent variable was the total
actual reconstructed floor space for the 81 cells. The results of the analysis are listed in
Table 3. The R² value, which represents the explanatory power of the regression model,
was approximately 69.5%, which is high. DistMetro and DistRoad had negative
coefficients, which means that the reconstructed floor space of a cell increases as the
cell gets closer to the nearest metro station and main road, although the DistRoad
coefficient is not statistically significant (Sig. = 0.212). In addition, a cell with a high
Density value had a large reconstructed floor space.
Table 3 Results of multiple regression analysis
Explanatory     Unstandardized coefficients    Standardized        t         Sig.     Collinearity statistics
variables       β           Std. Error         coefficients β*                        Tolerance    VIF
(constant)      10.874      0.847                                  12.832    0.000
Ln_DistMetro    -0.605      0.135              -0.326              -4.492    0.000    0.747        1.338
Ln_DistRoad     -0.040      0.032              -0.081              -1.258    0.212    0.960        1.042
Density          0.773      0.089               0.627               8.701    0.000    0.759        1.318
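The collinearity columns of Table 3 are related by the standard definition VIF = 1 / Tolerance. The short Python check below recomputes the VIF from the reported tolerance values; small differences in the third decimal place are expected because the tolerances are themselves rounded. The variable names are chosen for illustration.

```python
# Tolerance and VIF values as reported in Table 3.
tolerance = {"Ln_DistMetro": 0.747, "Ln_DistRoad": 0.960, "Density": 0.759}
reported_vif = {"Ln_DistMetro": 1.338, "Ln_DistRoad": 1.042, "Density": 1.318}

for name, tol in tolerance.items():
    vif = 1.0 / tol  # variance inflation factor from its definition
    print(f"{name}: 1/tolerance = {vif:.3f}, reported VIF = {reported_vif[name]:.3f}")
```

All three values are close to 1, which suggests that multicollinearity among the explanatory variables is limited.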
Substituting the parameters listed in Table 3 into Equation 4 gives Equation 8.
$$ReConstructFlr_i = 10.874 - 0.605\,\ln(DistMetro_{i\cdot k}) - 0.040\,\ln(DistRoad_{i\cdot m}) + 0.773\,Density_i \qquad \text{(Equation 8)}$$
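The fitted regression can be evaluated with a few lines of Python. This sketch uses the coefficients reported in Table 3, including the positive Density coefficient, and takes natural logs of the two distances as stated in the text. The units of the inputs and of the result are not given in this section, so the function name `expected_reconstructed_floor` and the input values are purely illustrative.

```python
import math

def expected_reconstructed_floor(dist_metro, dist_road, density):
    """Evaluate the fitted regression with the coefficients of Table 3.

    dist_metro, dist_road -- distances to the nearest metro station and main
                             road (must be positive; units are an assumption)
    density               -- density value of the cell
    """
    return (10.874
            - 0.605 * math.log(dist_metro)
            - 0.040 * math.log(dist_road)
            + 0.773 * density)

# Hypothetical cells: same road distance and density, different metro distance.
near_metro = expected_reconstructed_floor(dist_metro=100.0, dist_road=50.0, density=2.0)
far_metro = expected_reconstructed_floor(dist_metro=400.0, dist_road=50.0, density=2.0)
```

With these coefficients, the cell closer to the metro station receives the larger estimate, and a higher density value increases the estimate, in line with the interpretation of Table 3.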
Using this equation, we estimated the floor space of each cell after reconstruction.
Fig. 13 shows the estimated and actual reconstructed floor space for each cell. On the
left side of Fig. 13, the patterns of the two graphs are very similar, except for cell
number 824. In addition, when the estimated and actual values are compared on the
right side of Fig. 13, the bold red dotted line lies between the two lines marking ±10%,
which means that the agreement between the estimated and actual values is
acceptable.
Figure 13 Comparison of estimated floor space and actual floor space of
reconstructed cells.
In the case of cell 824, we checked the actual data to determine why the difference was
so large. In 2008 there was a hospital in cell 797 and a restaurant in cell 824, but the
hospital was expanded into cell 824 in 2011. Therefore, the characteristics of the
hospital affected cell 824 (refer to Fig. 14).
Figure 14 Before (left) and after reconstruction in cells 824 and 797.
5 Conclusion
In this paper, we proposed an improved disaggregation model to enhance urban
models based on zonal data. Previous studies usually disaggregated zonal data to cell
units using a spatial interpolation method or an allocation method with weight values.
These methods have many potential advantages for visualizing the data predicted by
the models, but they are limited by the difference between their results and reality.
This paper presented a disaggregation process to overcome these limitations. Starting
from cell data aggregated from building data in the base year, the process repeatedly
selects a cell according to its probability of being reconstructed and calculates the
expected floor space after reconstruction using a utility for the cell's location, until the
increase in floor space in the zones is distributed. We performed an empirical analysis
to verify the disaggregation process using actual data collected until 2016.
In the empirical analysis, we first calculated the probability of each cell being
reconstructed using a binary logistic regression model. We used two independent
variables (RankAcc and RankDen), and the classification accuracy for whether or not a
cell will be reconstructed was 80.4%. The probability was high for cells ranked high in
accessibility and low in density. Next, we estimated the new floor space of a
reconstructed cell using a multiple regression model as the utility function. Three
independent variables were used: DistMetro, DistRoad, and Density. The R² value,
which represents the explanatory power of the regression model, was approximately
69.5%, which is high. A comparison of the estimated and actual reconstructed floor
space showed similar patterns, except for a few cells.
The following conclusions were reached in this study. First, the zonal data predicted by
various urban simulation models based on zonal spatial units can be disaggregated
based on building data in the base year. This is very beneficial not only for the
visualization of the zones but also for reducing the difference compared to the actual
data from the base year. In particular, this study focused on retail and business land
use and the catchment areas of metro stations, which makes it useful for the
management of a business district, analysis of long-term changes in commercial power,
or opening of a store. Second, this study used random functions in two steps, when
selecting a cell and estimating the floor space, based on the uncertainty theory. We
believed that a future spatial structure cannot be presented using a single scenario
because there are many unpredictable conditions. Therefore, we presented various
future scenarios within an acceptable statistical range. In this sense, this study has
meaning in the application of the disaggregation process and can be a step forward
from previous studies.
As a final remark, we would like to point out a weakness of this research. The dataset
covers only retail and business land use. A separate disaggregation model is needed
for residential floor space because retail and business facilities are not separated from
residential facilities. If a disaggregation model for residential floor space is developed
and integrated in a future study, the model is expected to be more useful.
References
Bazghandi, A., (2012). Techniques, Advantages and Problems of Agent Based Modeling
for Traffic Simulation, International Journal of Computer Science Issues, 9_1(3): 115-119.
Fotheringham, A.S., Wegener, M., Eds., (2000): Spatial Models and GIS: New Potential
and New Models. GISDATA 7. London: Taylor & Francis, 45-61.
Goldner, W., (1971). The Lowry Model Heritage, Journal of the American Institute of
Planners, 37( ): 100-110.
Heonsoo, P. and Kyuyoung, C., (2008). An Empirical Analysis of Land Use Changes in
Daegu Metropolitan City by Using Probabilistic Choice Model, Korea Spatial Planning
Review, 58(): 137-150.
Heeyun, H., (2002). Urban Ecology and Urban Spatial Structure, Boseonggak Publishers,
Seoul, Republic of Korea.
Lam, N. S., (1983). Spatial Interpolation Methods: A Review, The American Cartographer,
10(2): 129-149.
Moeckel, R., Spiekermann, K., Schürmann, C. and Wegener, M., (2003). Microsimulation
of Land Use, International Journal of Urban Sciences, 7(1): 14-31.
Oryani, K., (1997). Review of Land Use Models: Theory and Application, Transportation
Research Board, pp. 80-91.
Pratesi, M., (2015). Spatial Disaggregation and Small-Area Estimation Methods
for Agricultural Surveys: Solutions and Perspectives, Technical Report Series.
PROPOLIS: Planning and Research of Policies for Land Use and Transport for Increasing
Urban Sustainability, (2004). DG Research, Brussels, Belgium.
Putman, S. H., (1983). Integrated Urban Models, Policy Analysis of Transportation and
Land Use, Pion Limited, London.
Putman, S. H., (1991). Integrated Urban Models 2: New Research and Application of
Optimization and Dynamics, Pion Limited, London.
Sungsil, H. and Munjung, K., (2004). Acceptance Probability Model Using the Logistic
Regression Model, Journal of the Korean Data Analysis Society, 6(4): 1153-1161.
Tobler, W. R., (1979). Smooth Pycnophylactic Interpolation for Geographical Regions,
Journal of the American Statistical Association, 74(367): 519-530.
Wagner, P. and Wegener, M., (2007). Urban Land Use, Transport and Environment Models:
Experiences with an Integrated Microscopic Approach, disP-The Planning Review, 170(3):
45-56.
Wegener, M., (1994). Operational Urban Models: State of the Art, Journal of the American
Planning Association, 60(1): 17-29.
Youngsoo, A., Seongman, J. and Seungil, L., (2016). An Empirical Study on the
Relationship between Pedestrian Network Distance and Building Density in the Area of
Urban Rail Station, Journal of Korea Planning Association, 51(2): 179-192.
Youngsoo, A., Yeonggyeong, K. and Seungil, L., (2014). A Study on the Impact of Soft
Location Factors in the Relocation of Service and Manufacturing Firms, International
Journal of Urban Sciences, 18(3): 327-339.
Youngsoo, A., Seongman, J. and Seungil, L., (2012). A Study on the Distribution Pattern
of Commercial Facilities around a Subway Station Using GIS Network Analysis, Journal
of Korea Planning Association, 47(1): 199-213.