Session Number: Parallel Session 2A
Time: Monday, August 23, PM
Paper Prepared for the 31st General Conference of
The International Association for Research in Income and Wealth
St. Gallen, Switzerland, August 22-28, 2010
The Decomposition of a House Price Index into Land and Structures Components: A Hedonic Regression Approach
W. Erwin Diewert, Jan de Haan and Rens Hendriks
For additional information please contact:
Name: W. Erwin Diewert
Affiliation: University of British Columbia
Email Address: [email protected]
This paper is posted on the following website: http://www.iariw.org
Our goal in this paper is to use readily available multiple listing data on sales of
residential properties and to somehow decompose the sales price of each property into a
land component and a structures component. We will use the data pertaining to the sales
of detached houses in a small Dutch city for 10 quarters, starting in January 1998.
In section 2, we will consider a very simple hedonic regression model where we use
information on only three characteristics of the property: the lot size, the size of the
structure and the (approximate) age of the structure. We run a separate hedonic regression
for each quarter, which leads to estimated prices for land and structures for each quarter.
These estimated characteristics prices can then be converted into land and structures prices covering
the 10 quarters of data in our sample. We postulate that the value of a residential property
is the sum of two components: the value of the land which the structure sits on plus the
value of the residential structure. Thus our approach to the valuation of a residential
property is essentially a crude cost of production approach.
In section 3, we generalize the model explained in section 2 to allow for the observed fact
that the per unit area price of a property tends to decline as the size of the lot increases (at
least for large lots). We use a simple linear spline model with 2 break points. Again, a
separate hedonic regression is run for each period and the results of these separate
regressions are linked together to provide separate land and structures price indexes
(along with an overall price index that combines these two components).
The models described in sections 2 and 3 were not very successful. The problem is the
variability in the data and this volatility leads to a tendency for the regression models to
fit the outliers, leading to volatile estimates for the price of land and structures. Thus in
section 4, we note that since the median price of the houses sold in each quarter never
declined, it is likely that the underlying separate land and structures prices also did not
decline over our sample period. Thus we imposed this monotonicity restriction on our
nonlinear regression model by using squared coefficients and nonlinear regression
techniques in one big regression using all 10 quarters of data. We obtained reasonable
estimates for the land and structures components using this technique.
Buoyed by the success of our quarterly model, we implemented the model using monthly
data instead of quarterly data in section 5. This is more challenging since we had only 30
to 60 observations for each month. However, the monthly model also worked reasonably
well and when we aggregated the monthly results into quarterly results, we obtained
quarterly results which were similar to the results obtained in section 4.
In section 6, we decided to compare our quarterly results with a more traditional hedonic
regression model for residential properties. In this more traditional approach, the log of
the property price is regressed on either the logs of the main characteristics of the
property (the land area and the floor space area) or on the levels of the main
characteristics, with dummy variables to represent quarter to quarter price change. We
found that the log-log regression fit the data much better than the log-levels regression
and the overall index of prices generated by the log-log regression was quite close to our
overall index of prices generated by the cost of production model explained in section 4.
However, when we used the log-log model to generate separate price index series for
land and for structures, the results did not seem to be credible.
Section 7 concludes with an agenda for further research on this topic.
2. Model 1: A Very Simple Model
Hedonic regression models are frequently used to obtain constant quality price indexes
for owner occupied housing.[2] Although there are many variants of the technique, the
basic model regresses the logarithm of the sale price of the property on the price
determining characteristics of the property, with a time dummy variable added for each
period in the regression (except the base period). Once the estimation has been
completed, these time dummy coefficients can be exponentiated and turned into an
index.[3]
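The time dummy method just described can be sketched in a few lines. This is only an illustration on made-up, noise-free data: the function names, the choice of a single characteristic (log floor area) and the sample values are assumptions, not taken from the paper, and the normal equations are solved by hand so that the sketch depends only on the standard library.

```python
import math

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

def time_dummy_index(sales, n_periods):
    """sales: (period, price, floor_area) tuples with period in 1..n_periods.

    Regresses ln(price) on a constant, ln(floor_area) and one dummy per
    period after the base period, then exponentiates the dummy coefficients.
    """
    rows, y = [], []
    for period, price, area in sales:
        dummies = [1.0 if period == t else 0.0 for t in range(2, n_periods + 1)]
        rows.append([1.0, math.log(area)] + dummies)
        y.append(math.log(price))
    coefficients = ols(rows, y)
    return [1.0] + [math.exp(c) for c in coefficients[2:]]

# Hypothetical data: prices follow 100 * area^0.8 times a period price level.
levels = [1.0, 1.05, 1.10]
sales = [(t, 100.0 * area ** 0.8 * levels[t - 1], area)
         for t in (1, 2, 3) for area in (90.0, 120.0, 150.0)]
index = time_dummy_index(sales, 3)
```

Because the synthetic prices contain no noise, the exponentiated dummy coefficients in `index` reproduce the assumed price levels; with real sales data they would be estimates.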
Since hedonic regression methods assume that information on the characteristics of the
properties sold is available, the data can be stratified and a separate regression can be run
for each important class of property. Thus hedonic regression methods can be used to
produce a family of constant quality price indexes for various types of property.[4]
A real estate property has two important price determining characteristics:[5]
• The land area of the property and
• The livable floor space area of the structure.
[2] See, for example, Crone, Nakamura and Voith (2000) (2009), Diewert, Nakamura and Nakamura (2009),
Gouriéroux and Laferrère (2009), Hill, Melser and Syed (2009) and Li, Prud'homme and Yu (2006).
[3] An alternative approach to the time dummy hedonic method is to estimate separate hedonic regressions
for both of the periods compared; i.e., for the base and current period. See Haan (2008) (2009) and Diewert,
Heravi and Silver (2009) for discussions and comparisons between these alternative approaches.
[4] This property of the hedonic regression method also applies to stratification methods. The main difference
between the two methods is that continuous variables can appear in hedonic regressions (like the area of the
structure and the area of the lot) whereas stratification methods can only work with discrete ranges for
the independent variables in the regression. Typically, hedonic regressions are more parsimonious; i.e.,
they require fewer parameters to explain the data than stratification methods.
[5] A third important characteristic is the location of the property; i.e., how far is the property from shopping
centers, places of employment, hospitals and good schools; does the property have a view; is the property
subject to noise or particulate pollution and so on. The presence or lack of these amenities will affect the
For some purposes, it would be very useful to decompose the overall price of a property
into additive components that reflected the value of the land that the structure sits on and
the value of the structure. The purpose of the present paper is to determine whether a
hedonic regression technique could provide such a decomposition.
Diewert (2007) suggested some possible hedonic regression models that might lead to
additive decompositions of an overall property price into land and structures
components.[6] We will now outline his suggested model (with a few modifications).
If we momentarily think like a property developer who is planning to build a structure on
a particular property, the total cost of the property after the structure is completed will be
equal to the floor space area of the structure, say S square meters, times the building cost
per square meter, say $\beta$, plus the cost of the land, which will be equal to the cost per
square meter, say $\alpha$, times the area of the land site, L. Now think of a sample of
properties of the same general type, which have prices $v_n^t$ in period t[7] and structure areas
$S_n^t$ and land areas $L_n^t$ for n = 1,...,N(t), and these prices are equal to costs of the above
type plus error terms $\eta_n^t$ which we assume have means 0. This leads to the following
hedonic regression model for period t where $\alpha^t$ and $\beta^t$ are the parameters to be estimated
in the regression:[8]
(1) $v_n^t = \alpha^t L_n^t + \beta^t S_n^t + \eta_n^t$ ; n = 1,...,N(t); t = 1,...,T.
Note that the two characteristics in our simple model are the quantities of land $L_n^t$ and the
quantities of structure $S_n^t$ associated with the sale of property n in period t and the two
constant quality prices in period t are the price of a square meter of land $\alpha^t$ and the price
of a square meter of structure floor space $\beta^t$. Finally, note that separate linear regressions
of the form (1) can be run for each period t in our sample.
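For a single period, model (1) is a two-regressor linear regression without an intercept, so the least squares estimates have a closed form. The sketch below is illustrative only: the function name `fit_model_1` and the four sample properties are hypothetical, and the sample values are generated without error terms so that the assumed prices are recovered exactly.

```python
def fit_model_1(land, structure, value):
    """Least squares for v = alpha * L + beta * S + error in one period.

    Solves the 2x2 normal equations by Cramer's rule (no intercept term).
    """
    s_ll = sum(l * l for l in land)
    s_ls = sum(l * s for l, s in zip(land, structure))
    s_ss = sum(s * s for s in structure)
    s_lv = sum(l * v for l, v in zip(land, value))
    s_sv = sum(s * v for s, v in zip(structure, value))
    det = s_ll * s_ss - s_ls ** 2
    if det == 0:
        raise ValueError("land and structure areas are perfectly collinear")
    alpha = (s_lv * s_ss - s_sv * s_ls) / det
    beta = (s_sv * s_ll - s_lv * s_ls) / det
    return alpha, beta

# Hypothetical sample for one quarter, in the paper's units:
# areas in 100 square meters, values in 10,000 Euros.
land = [1.5, 2.0, 2.5, 4.0]
structure = [1.0, 1.2, 1.1, 1.6]
value = [1.5 * l + 5.0 * s for l, s in zip(land, structure)]  # no error term

alpha_hat, beta_hat = fit_model_1(land, structure, value)
```

Running the same function once per period gives one pair of land and structure prices for each period.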
price of land in the neighbourhood and thus it is important to stratify the sample in order to control for
these neighbourhood effects. In our example, the Dutch town of “A” is small enough and homogeneous
enough so that these neighbourhood effects can be neglected.
[6] Two other recent studies that followed up on Diewert's suggested approach are by Koev and Santos Silva
(2008) and Statistics Portugal (2009).
[7] Note that we have labeled these property prices as $v_n^t$ to emphasize that these are values of the property
and we need to decompose these values into two price and two quantity components, where the
components are land and structures.
[8] In order to obtain homoskedastic errors, it would be preferable to assume multiplicative errors in equation
(1) since it is more likely that expensive properties have relatively large absolute errors compared to very
inexpensive properties. However, following Koev and Santos Silva (2008), we think that it is preferable to
work with the additive specification (1) since we are attempting to decompose the aggregate value of
housing (in the sample of properties that sold during the period) into additive structures and land
components and the additive error specification will facilitate this decomposition.
The hedonic regression model defined by (1) is the simplest possible one but it is a bit too
simple since it neglects the fact that older structures will be worth less than newer
structures due to the depreciation of the structure. Thus suppose that, in addition to
information on the selling price of property n at time period t, $v_n^t$, the land area of the
property $L_n^t$ and the structure area $S_n^t$, we also have information on the age of the
structure at time t, say $A_n^t$. Then if we assume a straight line depreciation model, a more
realistic hedonic regression model than that defined by (1) above is the following model:
(2) $v_n^t = \alpha^t L_n^t + \beta^t (1 - \delta^t A_n^t) S_n^t + \eta_n^t$ ; n = 1,...,N(t); t = 1,...,T
where the parameter $\delta^t$ reflects the depreciation rate as the structure ages one additional
period. Thus if the age of the structure is measured in years, we would expect $\delta^t$ to be
between 1 and 2%.[9] Note that (2) is now a nonlinear regression model whereas (1) was a
simple linear regression model. Both models (1) and (2) can be run period by period; it is
not necessary to run one big regression covering all time periods in the data sample. The
period t price of land will be the estimated coefficient for the parameter $\alpha^t$ and the price of a
unit of a newly built structure for period t will be the estimate for $\beta^t$. The period t quantity
of land for property n is $L_n^t$ and the period t quantity of structure for property n,
expressed in equivalent units of a new structure, is $(1 - \delta^t A_n^t) S_n^t$ where $S_n^t$ is the floor
space area of property n in period t.
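One way to see what model (2) estimates is to note that, for a single period, it can be rewritten as a regression that is linear in the parameters: $v = \alpha L + \beta S + \gamma (A \cdot S)$ with $\gamma = -\beta\delta$, so that $\delta = -\gamma/\beta$ whenever $\beta \neq 0$. The sketch below uses this substitution instead of a nonlinear regression package; it is not the procedure used in the paper (which ran nonlinear regressions in Shazam), and the function names and sample data are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_model_2(land, structure, age, value):
    """Estimate alpha, beta and delta in v = alpha*L + beta*(1 - delta*A)*S.

    Regresses v on L, S and A*S; the coefficient on A*S equals -beta*delta.
    """
    rows = [[l, s, a * s] for l, s, a in zip(land, structure, age)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * v for r, v in zip(rows, value)) for i in range(3)]
    alpha, beta, gamma = solve_linear_system(xtx, xty)
    return alpha, beta, -gamma / beta

# Hypothetical properties: areas in 100 square meters, age in decades.
land = [1.5, 2.0, 2.5, 4.0, 3.0]
structure = [1.0, 1.2, 1.1, 1.6, 1.3]
age = [0.0, 1.0, 2.0, 3.0, 1.0]
value = [1.5 * l + 6.0 * (1.0 - 0.12 * a) * s
         for l, s, a in zip(land, structure, age)]  # no error term

alpha_hat, beta_hat, delta_hat = fit_model_2(land, structure, age, value)
```

The substitution only works when each period is estimated separately with its own depreciation rate; once restrictions are imposed across periods, a genuinely nonlinear estimator is needed.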
We implemented the above model (2) using real estate sales data on the sales of detached
houses for a small city (population is around 60,000) in the Netherlands, City A, for 10
quarters, starting in January 1998 (so our T = 10). The data that we used can be described
as follows:
• $v_n^t$ is the selling price of property n in quarter t in units of 10,000 Euros where t = 1,...,10;
• $L_n^t$ is the area of the plot for the sale of property n in quarter t in units of 100 meters squared;[10]
• $S_n^t$ is the living space area of the structure for the sale of property n in quarter t in units of 100 meters squared;
• $A_n^t$ is the (approximate) age (in decades) of the structure on property n in period t.[11]
There were 1404 observations in our 10 quarters of data on sales of detached houses in
City A. The sample means for the data were as follows: v = 11.198, L = 2.5822, S =
1.2618 and A = 1.1859. Thus the houses in the sample sold at an average price of 111,980
Euros, the average plot size was 258.2 meters squared, the average living space in the
structure was 126.2 meters squared and the average age was approximately 11.9 years.
The sample median price was 95,918 Euros.
[9] This estimate of depreciation will be an underestimate of “true” structure depreciation because it will not
account for major renovations or additions to the structure.
[10] We chose units of measurement in order to scale the data to be small in magnitude in order to facilitate
estimation with the nonlinear regression package used, which was Shazam.
[11] The original data were coded as follows: if the structure was built 1960-1970, the observation was
assigned the dummy variable BP = 5; 1971-1980, BP = 6; 1981-1990, BP = 7; 1991-2000, BP = 8. Our Age
variable A was set equal to 8 - BP. Thus for a recently built structure n in quarter t, $A_n^t$ = 0.
The results of our 10 nonlinear regressions of the type defined by (2) above are
summarized in Table 1 below. The Adjusted Structures Quantity in quarter t, $AS^t$, is
equal to the sum, over the properties n sold in that quarter, of the structure areas adjusted into new structure
units: $AS^t = \sum_n (1 - \delta^t A_n^t) S_n^t$.
Table 1: Estimated Land Prices $\alpha^t$, Structure Prices $\beta^t$, Decade Depreciation Rates
$\delta^t$, Land Quantities $L^t$ and Adjusted Structures Quantities $AS^t$
Quarter   $\alpha^t$   $\beta^t$   $\delta^t$   $L^t$    $AS^t$
1         1.52015      5.13045     0.10761      380.1    177.5
2         1.40470      6.33087     0.15918      426.9    166.4
3         1.83006      5.13292     0.13410      248.6    111.2
4         1.71757      5.56902     0.14427      285.2    122.0
5         0.70942      8.23225     0.12613      390.2    158.4
6         0.26174      9.94447     0.09959      419.4    168.7
7         2.12605      6.27949     0.13258      368.9    136.5
8         1.71496      7.29677     0.13092      347.3    136.2
9         1.47354      7.86387     0.10507      356.7    156.4
10        2.68556      6.21736     0.18591      402.1    161.6
It can be seen that the decade depreciation rates $\delta^t$ are in the 10 to 18% range, which is not
unreasonable, but the volatility in these rates is not consistent with our a priori expectation
of a stable rate. Unfortunately, our estimated land and structures prices are not at all
reasonable: the price of land sinks to a very low level in quarter 6 while the price of
structures peaks in this quarter. Thus it appears that either our model is incorrect or that
our sample is too small and we are fitting the errors to some extent.
It is of some interest to compare the above land and structures prices with the mean and
median prices for houses in the sample for each quarter. These prices were normalized to
equal 1 in quarter 1 and are listed as PMean and PMedian in Table 2 below. The land and
structures prices in Table 1, $\alpha^t$ and $\beta^t$, were also normalized to equal 1 in quarter 1 and
are listed as PL and PS in Table 2. Finally, we used the price data in Table 1, $\alpha^t$ and $\beta^t$,
along with the corresponding quantity data, $L^t$ and $AS^t$, in Table 1 in order to calculate a
“constant quality” chained Fisher house price index, which is listed as PF in Table 2.
Table 2: Quarterly Mean, Median and Predicted Fisher Housing Prices and the
Price of Land and Structures
Quarter   PMean     PMedian   PF        PL        PS
1         1.00000   1.00000   1.00000   1.00000   1.00000
2         1.11935   1.07727   1.10689   0.92406   1.23398
3         1.07982   1.11666   1.08649   1.20387   1.00048
4         1.13171   1.13636   1.10735   1.12987   1.08548
5         1.20659   1.24242   1.13521   0.46668   1.60459
6         1.31463   1.32424   1.20389   0.17218   1.93832
7         1.36667   1.33333   1.33644   1.39858   1.22397
8         1.43257   1.43939   1.32944   1.12816   1.42225
9         1.41027   1.44242   1.32764   0.96934   1.53278
10        1.45493   1.51515   1.47253   1.76665   1.21185
Note that the median price increases in each quarter while the mean price drops (slightly)
in quarters 3 and 9. It can be seen that the overall Fisher housing price index PF is
roughly equal to the mean and median price indexes but again, the separate price series
for housing land PL and for housing structures PS are not realistic.
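The construction of PL, PS and PF from Table 1 can be sketched as follows. The price and quantity figures are copied from Table 1; the function names are illustrative, and the chaining shown here is the standard textbook form of a chained Fisher index, which we assume is what was applied. For quarter 2 the sketch gives a value of roughly 1.107, in line with the PF entry in Table 2.

```python
import math

# (alpha^t, beta^t, L^t, AS^t) for quarters 1 to 10, copied from Table 1.
TABLE_1 = [
    (1.52015, 5.13045, 380.1, 177.5),
    (1.40470, 6.33087, 426.9, 166.4),
    (1.83006, 5.13292, 248.6, 111.2),
    (1.71757, 5.56902, 285.2, 122.0),
    (0.70942, 8.23225, 390.2, 158.4),
    (0.26174, 9.94447, 419.4, 168.7),
    (2.12605, 6.27949, 368.9, 136.5),
    (1.71496, 7.29677, 347.3, 136.2),
    (1.47354, 7.86387, 356.7, 156.4),
    (2.68556, 6.21736, 402.1, 161.6),
]

def inner(prices, quantities):
    """Inner product of a price vector and a quantity vector."""
    return sum(p * q for p, q in zip(prices, quantities))

def fisher_link(p0, q0, p1, q1):
    """Fisher index between two adjacent periods:
    the geometric mean of the Laspeyres and Paasche indexes."""
    laspeyres = inner(p1, q0) / inner(p0, q0)
    paasche = inner(p1, q1) / inner(p0, q1)
    return math.sqrt(laspeyres * paasche)

def chained_fisher(rows):
    """Cumulate period-to-period Fisher links, starting from 1 in quarter 1."""
    index = [1.0]
    for prev, cur in zip(rows, rows[1:]):
        link = fisher_link(prev[:2], prev[2:], cur[:2], cur[2:])
        index.append(index[-1] * link)
    return index

PF = chained_fisher(TABLE_1)
PL = [row[0] / TABLE_1[0][0] for row in TABLE_1]  # land price, quarter 1 = 1
PS = [row[1] / TABLE_1[0][1] for row in TABLE_1]  # structure price, quarter 1 = 1
```

Small differences from the published PF column in later quarters can arise because the quantities in Table 1 are rounded to one decimal place.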
The series in Table 2 are graphed in Chart 1 below.
Chart 1: Quarterly Mean, Median and Predicted Fisher Housing Prices and the
Price of Land and Structures Using Model 1
It can be seen that while the overall predicted Fisher house price index is not too far
removed from the median and mean house price indexes, the separate land and structures
components of the overall index are not at all sensible.
One possible problem with our highly simplified house price model is that our model
makes no allowance for the fact that larger sized plots tend to sell for an average price
that is below the price for medium and smaller sized plots. Thus in the following section,
we will generalize the model (2) to take into account this empirical regularity.
3. Model 2: The Use of Linear Splines on Lot Size
We broke up our 1404 observations into 3 groups of property sales:
• Sales involving lot sizes less than or equal to 200 meters squared (Group S);
• Sales involving lot sizes between 200 and 400 meters squared (Group M) and
• Sales involving lot sizes greater than 400 meters squared (Group L).
For an observation n in period t that was associated with a small lot size, our regression
model was essentially the same as in (2) above; i.e., the following estimating equation
was used:
(3) $v_n^t = \alpha_S^t L_n^t + \beta^t (1 - \delta^t A_n^t) S_n^t + \eta_n^t$ ; t = 1,...,T; n belongs to Group S
where the unknown parameters to be estimated are $\alpha_S^t$, $\beta^t$ and $\delta^t$. For an observation n in
period t that was associated with a medium lot size, the following estimating equation
was used:[12]
(4) $v_n^t = \alpha_S^t (2) + \alpha_M^t (L_n^t - 2) + \beta^t (1 - \delta^t A_n^t) S_n^t + \eta_n^t$ ; t = 1,...,T; n belongs to Group M
where we have now added a fourth parameter to be estimated, $\alpha_M^t$. Finally, for an
observation n in period t that was associated with a large lot size, the following
estimating equation was used:
(5) $v_n^t = \alpha_S^t (2) + \alpha_M^t (4 - 2) + \alpha_L^t (L_n^t - 4) + \beta^t (1 - \delta^t A_n^t) S_n^t + \eta_n^t$ ; t = 1,...,T; n belongs to Group L
where we have now added a fifth parameter to be estimated, $\alpha_L^t$. Thus for small lots, the
value of an extra marginal addition of land in quarter t is $\alpha_S^t$, for medium lots, the value
of an extra marginal addition of land in quarter t is $\alpha_M^t$ and for large lots, the value of an
extra marginal addition of land in quarter t is $\alpha_L^t$. These pricing schedules are joined
together so that the total cost of a lot changes with the size of the lot in a
continuous fashion.[13] The above model can readily be put into a nonlinear regression
format for each period using dummy variables to indicate whether an observation is in
Group S, M or L.
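The piecewise linear land schedule in equations (3)-(5) can be written as a single function of lot size. The sketch below is illustrative: the function names and the parameter values in the example are hypothetical, and lot sizes are in units of 100 square meters, so the break points are 2 and 4.

```python
def land_value(lot_size, alpha_s, alpha_m, alpha_l, breaks=(2.0, 4.0)):
    """Imputed land value under the linear spline of equations (3)-(5).

    The three segments share their endpoints, so the total value is a
    continuous function of lot size even though the marginal price jumps.
    """
    first, second = breaks
    if lot_size <= first:
        return alpha_s * lot_size
    if lot_size <= second:
        return alpha_s * first + alpha_m * (lot_size - first)
    return (alpha_s * first + alpha_m * (second - first)
            + alpha_l * (lot_size - second))

def predicted_value(lot_size, floor_area, age, alpha_s, alpha_m, alpha_l,
                    beta, delta):
    """Predicted property value: spline land value plus depreciated structure."""
    structure = beta * (1.0 - delta * age) * floor_area
    return land_value(lot_size, alpha_s, alpha_m, alpha_l) + structure

# Hypothetical marginal land prices for one quarter.
prices = dict(alpha_s=1.0, alpha_m=3.0, alpha_l=0.5)
at_first_break = land_value(2.0, **prices)    # 1.0 * 2
at_second_break = land_value(4.0, **prices)   # 2 + 3.0 * 2
large_lot = land_value(5.0, **prices)         # 8 + 0.5 * 1
```

Evaluating the function just below and just above a break point gives nearly identical values, which is the continuity property described in the text.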
The results of our 10 nonlinear regressions of the type defined by (3)-(5) above are
summarized in Table 3 below.
[12] Recall that we are measuring land in 100's of square meters instead of in square meters.
[13] Thus if we graphed the total cost C of a lot as a function of the plot size L in period t, the resulting cost
curve would be made up of three linear segments whose endpoints are joined. The first line segment starts
at the origin and has the slope $\alpha_S^t$, the second segment starts at L = 2 and runs to L = 4 and has the slope
$\alpha_M^t$ and the final segment starts at L = 4 and has the slope $\alpha_L^t$.
Table 3: Marginal Land Prices for Small, Medium and Large Lots, the Price of
Structures $\beta^t$ and Decade Depreciation Rates $\delta^t$
Quarter   $\alpha_S^t$   $\alpha_M^t$   $\alpha_L^t$   $\beta^t$   $\delta^t$
1         0.31648        3.30552        0.87617        6.17826     0.06981
2         0.79113        2.96475        0.78643        6.44827     0.13999
3         1.77147        2.57100        1.27783        4.96547     0.12411
4         0.49927        3.48688        1.02879        6.61768     0.09022
5         0.59573        3.01473        0.44064        7.39286     0.13002
6         0.08365        3.81462        -0.2504        8.38993     0.09269
7         1.09346        4.12335        1.26155        6.84204     0.09168
8         2.44028        3.06473        1.29751        5.71713     0.14456
9         2.00417        3.88380        0.88777        6.38234     0.14204
10        3.04236        3.33855        2.30271        5.49038     0.20080
Obviously, the estimated prices are not sensible; in particular, it is not likely that the cost
of an extra unit of land for a large plot could be negative in quarter 6!
Looking at the median price of a house over the 10 quarters in our sample, it was noted
earlier that the median price never fell over the sample period. This fact suggests that we
should impose this condition on all of our prices; i.e., we should set up a nonlinear
regression where the marginal prices of land never fall from quarter to quarter and where
the price of a square meter of a new structure also never falls. We will do this in the
following section and we will also impose a single depreciation rate over our sample
period, rather than allowing the depreciation rate to fluctuate from quarter to quarter.
4. Model 3: The Use of Monotonicity Restrictions on the Price of Land and
Structures
For the model to be described in this section, the data for all 10 quarters were run in one
big nonlinear regression. The equations that describe the model in quarter 1 are the same
as equations (3), (4) and (5) in the previous section except that the quarter one
depreciation rate parameter, $\delta^1$, is replaced by the parameter $\delta$, which will be used in all
subsequent quarters. For the remaining quarters, equations (3), (4) and (5) can still be
used except that the parameters $\alpha_S^t$, $\alpha_M^t$, $\alpha_L^t$ and $\beta^t$ are set equal to their quarter 1
counterparts plus a sum of squared parameters where one squared parameter is added
each period; i.e., $\alpha_S^t$, $\alpha_M^t$, $\alpha_L^t$ and $\beta^t$ are reparameterized as follows:
(6) $\alpha_S^t = \alpha_S^1 + (\alpha_{S2})^2 + \dots + (\alpha_{St})^2$ ; t = 2,3,...,T;
(7) $\alpha_M^t = \alpha_M^1 + (\alpha_{M2})^2 + \dots + (\alpha_{Mt})^2$ ; t = 2,3,...,T;
(8) $\alpha_L^t = \alpha_L^1 + (\alpha_{L2})^2 + \dots + (\alpha_{Lt})^2$ ; t = 2,3,...,T;
(9) $\beta^t = \beta^1 + (\beta_2)^2 + \dots + (\beta_t)^2$ ; t = 2,3,...,T;
(10) $\delta^t = \delta$ ; t = 2,3,...,T.
Thus our new parameters are $\alpha_{S2},\dots,\alpha_{ST}$; $\alpha_{M2},\dots,\alpha_{MT}$; $\alpha_{L2},\dots,\alpha_{LT}$ and $\beta_2,\dots,\beta_T$, and their squares
enter equations (6)-(9). It can be seen that this reparameterization will prevent the
marginal price of each type of land from falling and it will also impose monotonicity on
the price of structures.
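The effect of the reparameterization in (6)-(9) is easy to check numerically: whatever real values the new parameters take, each price path is the quarter 1 price plus a running sum of squares, and so can never fall. A minimal sketch, with a hypothetical function name and made-up parameter values:

```python
from itertools import accumulate

def monotone_path(base_price, raw_parameters):
    """Price path p^1, p^1 + r_2^2, p^1 + r_2^2 + r_3^2, ... as in (6)-(9).

    raw_parameters holds the unrestricted parameters for periods 2, 3, ...;
    only their squares enter the path, so the path is non-decreasing.
    """
    return list(accumulate([base_price] + [r * r for r in raw_parameters]))

# Made-up example: a zero parameter leaves the price unchanged for a period,
# and the sign of a parameter does not matter.
path = monotone_path(1.0, [0.0, -0.5, 0.3])
never_falls = all(later >= earlier for earlier, later in zip(path, path[1:]))
```

An estimator can then search over the raw parameters without any explicit inequality constraints; a raw parameter estimated at zero corresponds to a quarter in which the price is held constant.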
The results of the above reparameterized model were as follows: the quarter 1 estimated
parameters were $\alpha_S^1$ = 0.56040 (0.24451), $\alpha_M^1$ = 3.4684 (0.11304), $\alpha_L^1$ = 0.33729
(0.04310), $\beta^1$ = 6.2987 (0.39094) and $\delta$ = 0.11512 (0.006664) (standard errors in
brackets), with an $R^2$ of .8439. Thus the overall decade depreciation rate was a very
reasonable 11.5% and the other parameters seemed to be reasonable in magnitude as well.
The only mild surprise was the fact that, at the beginning of the sample period, the
marginal valuation of land for small plots was 0.5604 while the marginal valuation for
medium plots was 3.4684, which was over 6 times as big. Thus small plots of land
suffered a discount in price per meter squared as compared to medium plots of land, at
least at the beginning of the sample period.[14] Of the 36 squared parameters that pertain to
quarters 2 to 10, 23 were set equal to 0 by the nonlinear regression and only 13 were
nonzero, with only 8 of these nonzero parameters having t statistics greater than 2. The
quarter by quarter values of the parameters $\alpha_S^t$, $\alpha_M^t$, $\alpha_L^t$ and $\beta^t$ defined by (6)-(9) are
reported in Table 4 below.
Table 4: Marginal Prices of Land for Small, Medium and Large Plots and New
Construction Prices by Quarter
Quarter   $\alpha_S^t$   $\alpha_M^t$   $\alpha_L^t$   $\beta^t$
1         0.56040        3.46843        0.33729        6.29869
2         0.56040        3.46843        0.33729        6.42984
3         0.69803        3.46843        0.33729        6.42984
4         0.69803        3.46843        0.33729        6.72520
5         0.75139        3.46843        0.33729        6.80488
6         1.16953        3.46843        0.33729        6.80488
7         1.45453        3.62075        1.10353        6.80488
8         1.52233        3.62075        1.10353        6.80488
9         1.67159        3.62075        1.10353        6.80488
10        1.80029        3.62075        1.85418        6.80488
The above results look reasonable. The imputed price of new construction, $\beta^t$, was
approximately equal to 6.3 to 6.8 over the sample period (this translates into a price of
630 to 680 Euros per meter squared of structure floor space).[15] The imputed value of land
for a small lot grew from 56 Euros per meter squared in the first quarter of 1998 to 180
Euros per meter squared in the second quarter of 2000. The imputed marginal value of
land[16] for a lot size in the range of 200 to 400 meters squared grew very slowly from 347
Euros per meter squared to 362 Euros per meter squared over the same period. Finally,
the imputed marginal value of land[17] for a lot size greater than 400 meters squared grew
very rapidly from 34 Euros per meter squared to 185 Euros per meter squared over the
sample period.
[14] This may not be a “genuine” effect; it is likely that the quality of construction is lower on small plots as
compared to the quality of construction on medium and larger plots and since we are not taking this possibility into account
in our model, the lower average quality of structures on small plots may show up as a lower price of land
for small plots. We note also that by the end of the sample period, the difference in price was greatly
reduced.
[15] Thus the imputed structures value of a new house with a floor space area of 125 meters squared would be
approximately 79,000 to 85,000 Euros.
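The conversion from model units to Euros per square meter follows from the units of measurement: values are in units of 10,000 Euros and areas in units of 100 square meters, so a model price of 1 corresponds to 100 Euros per square meter. The short sketch below applies this to entries from Table 4; the function and constant names are illustrative.

```python
EUROS_PER_VALUE_UNIT = 10_000   # property values are in units of 10,000 Euros
SQM_PER_AREA_UNIT = 100         # land and floor areas are in units of 100 m^2

def to_euros_per_sqm(model_price):
    """Convert a price per model area unit into Euros per square meter."""
    return model_price * EUROS_PER_VALUE_UNIT / SQM_PER_AREA_UNIT

def new_structure_value_euros(beta, floor_area_sqm):
    """Imputed value in Euros of a new structure with the given floor area."""
    return beta * (floor_area_sqm / SQM_PER_AREA_UNIT) * EUROS_PER_VALUE_UNIT

# Quarter 1 and quarter 10 entries from Table 4.
small_lot_start = to_euros_per_sqm(0.56040)    # about 56 Euros per m^2
small_lot_end = to_euros_per_sqm(1.80029)      # about 180 Euros per m^2
large_lot_start = to_euros_per_sqm(0.33729)    # about 34 Euros per m^2
large_lot_end = to_euros_per_sqm(1.85418)      # about 185 Euros per m^2
house_start = new_structure_value_euros(6.29869, 125)
house_end = new_structure_value_euros(6.80488, 125)
```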
It is possible to work out the total imputed value of structures transacted in each quarter,
$V_S^t$, and divide this quarterly value by the total quantity of structures (converted into
equivalent new structure units), $Q_S^t$, in order to obtain an average price of structures, $P_S^t$.
Similarly, we can add up all of the imputed values for small, medium and large plot sizes
for each quarter t, say $V_{LS}^t$, $V_{LM}^t$ and $V_{LL}^t$, and then add up the total quantity of land
transacted in each of the three classes of property, say $Q_{LS}^t$, $Q_{LM}^t$ and $Q_{LL}^t$. Finally, we
can form quarterly unit value prices for each of the three classes of property, $P_{LS}^t$, $P_{LM}^t$
and $P_{LL}^t$, by dividing each value series by the corresponding quantity series. The resulting
price and quantity series are listed in Table 5 below.
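The unit value calculation can be sketched for a single quarter as follows. This is one reading of the procedure, under stated assumptions: the three properties are hypothetical, the parameter values are the quarter 1 estimates reported above, and the dictionary keys (`LS`, `LM`, `LL`, `S`) are labels chosen here to mirror the subscripts in the text.

```python
def land_value(lot, alpha_s, alpha_m, alpha_l):
    """Imputed land value from the spline in equations (3)-(5)."""
    if lot <= 2.0:
        return alpha_s * lot
    if lot <= 4.0:
        return alpha_s * 2.0 + alpha_m * (lot - 2.0)
    return alpha_s * 2.0 + alpha_m * 2.0 + alpha_l * (lot - 4.0)

def unit_values(properties, alpha_s, alpha_m, alpha_l, beta, delta):
    """Unit value prices P = V / Q for each land class and for structures.

    properties: (lot_size, floor_area, age_in_decades) tuples for one quarter,
    with areas in units of 100 square meters.
    """
    totals = {"LS": [0.0, 0.0], "LM": [0.0, 0.0], "LL": [0.0, 0.0],
              "S": [0.0, 0.0]}  # each entry is [value, quantity]
    for lot, floor, age in properties:
        group = "LS" if lot <= 2.0 else "LM" if lot <= 4.0 else "LL"
        totals[group][0] += land_value(lot, alpha_s, alpha_m, alpha_l)
        totals[group][1] += lot
        new_units = (1.0 - delta * age) * floor  # equivalent new structure units
        totals["S"][0] += beta * new_units
        totals["S"][1] += new_units
    return {name: value / quantity
            for name, (value, quantity) in totals.items() if quantity > 0}

# Hypothetical sales in one quarter, valued with the quarter 1 estimates.
sample = [(1.5, 1.0, 0.0), (3.0, 1.2, 1.0), (5.0, 1.6, 2.0)]
prices = unit_values(sample, alpha_s=0.56040, alpha_m=3.46843,
                     alpha_l=0.33729, beta=6.29869, delta=0.11512)
```

Under this reading the unit value of structures equals $\beta^t$ and the unit value for small plots equals $\alpha_S^t$, while the unit values for medium and large plots are averages over the spline segments and so depend on the mix of lot sizes sold.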
Table 5: Average Prices for New Structures, Small, Medium and Large Plots and
Total Quantities Transacted per Quarter of Structures and the Three Types of Plot