Delineating Metropolitan Housing Submarkets with Fuzzy Clustering Methods Julie Sungsoon Hwang Department of Geography, University of Washington Jean-Claude Thill Department of Geography, State University of New York at Buffalo November 10, 2005 North American Meetings of Regional Science Association International
Delineating Metropolitan Housing Submarkets with Fuzzy Clustering Methods
Julie Sungsoon Hwang, Department of Geography, University of Washington
Jean-Claude Thill, Department of Geography, State University of New York at Buffalo
November 10, 2005, North American Meetings of Regional Science Association International
Outline
• Research objectives
• Methodology: specification
• Methodology: illustration
• Evaluating the performance of fuzzy clustering
• Conclusions
Research objectives
• Demonstrate the use of the fuzzy c-means (FCM) algorithm for delineating housing submarkets
  – Comparison to K-means
• Discuss empirical characteristics of FCM applied to given applications, in particular the choice of parameters
  – Cluster validity index
Challenges
• Are the boundaries of clusters crisp?
[Figure: two scatter plots in (X1, X2) space, one for the housing market in metropolitan area q and one for the housing market in metropolitan area p, each showing Cluster A, Cluster B, and Cluster C]
Methodology: specification
• Our task is to group census tracts into homogeneous housing submarkets within a metropolitan area
• Using the fuzzy c-means algorithm
• In order to examine whether fuzzy set-based clustering can do a better job
• Implemented in 85 metropolitan areas
• Most of the data sets are public (e.g. 2000 Census)
• The whole procedure is automated in GIS
Methodology: flow chart
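For each metropolitan area:
• Input: Census Tract Layer for the metro (spatial scales shown: National, Regional, Local, …), with n census tracts
• Candidate variables: table of n tracts (rows 1, 2, 3, …, n) by m variables (x1, x2, x3, …, xm)
• Stepwise regression (k ≤ m) selects the significant variables: table of n tracts (rows 1, 2, 3, …, n) by k variables (y1, y2, …, yk)
• Cluster analysis (c ≤ n) on the significant variables
  – K-means produces the Hard Cluster Layer, with crisp memberships:
    #    U1    U2    …    Uc
    1    1     0     …    0
    2    0     1     …    0
    …    0     1     …    0
    n    0     0     …    1
  – Fuzzy c-means produces the Fuzzy Cluster Layers (1, 2, …, c), with graded memberships:
    #    U1    U2    …    Uc
    1    0.85  0.05  …    0.10
    2    0.12  0.80  …    0.05
    …    0.02  0.74  …    0.12
    n    0.40  0.03  …    0.50
Notation:
k: # selected variables
c: # submarkets
Uj: membership to cluster j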
Explanatory variables for house price

Var_Name   Variable Definition               Data      Year   Spatial Unit

Socioeconomic/demographic Characteristics of Residents
pcincome   per capita income                 Census    2000   Census Tract
college    % college degree                  Census    2000   Census Tract
managep    % management workers              Census    2000   Census Tract
prodp      % production workers              Census    2000   Census Tract
famcpchl   % families with children          Census    2000   Census Tract
nfmalone   % nonfamily living alone          Census    2000   Census Tract
black_p    % black                           Census    2000   Census Tract
nhwht_p    % non-Hispanic white              Census    2000   Census Tract
nativebr   % native born                     Census    2000   Census Tract

Structural Characteristics of Housing Units
medroom    median number of rooms            Census    2000   Census Tract
hudetp     % detached housing units          Census    2000   Census Tract
yrhublt    median year structure built       Census    2000   Census Tract

Locational Characteristics (Amenities) of Neighborhoods
ptratio    pupil to teacher ratio            NCES*     2002   School District
schexp     school expenditure per student    NCES      2002   School District
vrlcrime   violent crime rate                FBI**     2003   Designated Place
prpcrime   property crime rate               FBI       2003   Designated Place
jobacm     job accessibility (Hansen 1959)   CTPP***   2000   Census Tract

*NCES: National Center for Education Statistics; **FBI annual report “Crime in the U.S. 2003”; ***CTPP: Census Transportation Planning Package
Dependent variable: median home value of owner-occupied housing units
Study set: 85 metropolitan areas
[Maps: metropolitan areas (CMSA, MSA) with state boundaries, shown without and with the study set highlighted. Source: TIGER/Line 1999]
What is fuzzy c-means (FCM)?
• Clustering method that minimizes the following objective function:
$$ J_m(U, v) = \sum_{k=1}^{n} \sum_{i=1}^{c} (u_{ik})^m \, \| x_k - v_i \|_A^2 $$
  – x_k: vector of data point k, 1 ≤ k ≤ n
  – v_i: center of cluster i, 1 ≤ i ≤ c
  – u_ik: membership degree of data point k with cluster i; in [0,1]
  – m: fuzziness amount associated with assigning data point k to cluster i, 1 ≤ m < ∞
• Updates cluster means v_i and membership degrees u_ik until the algorithm converges:
$$ v_i = \frac{\sum_{k=1}^{n} (u_{ik})^m x_k}{\sum_{k=1}^{n} (u_{ik})^m} \qquad \text{(III-3a)} $$
$$ u_{ik} = \left[ \sum_{j=1}^{c} \left( \frac{\| x_k - v_i \|_A}{\| x_k - v_j \|_A} \right)^{2/(m-1)} \right]^{-1} \qquad \text{(III-3b)} $$
Source: Bezdek 1981
[Figure: scatter plot of data points x_k and cluster centers v_i in (x1, x2) space]
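The two update rules (III-3a) and (III-3b) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' GIS implementation: it assumes the A-norm is the ordinary Euclidean norm (A = identity), uses random initial memberships, and the names `fcm`, `sq_dist`, `tol`, `max_iter`, and `seed` are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance (assumes the A-norm with A = identity)."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def fcm(X, c, m=2.0, tol=1e-5, max_iter=300, seed=0):
    """Fuzzy c-means sketch. Returns (U, V); U[i][k] is the membership
    of data point k in cluster i, V[i] is the center of cluster i."""
    rng = random.Random(seed)
    n, dim = len(X), len(X[0])
    # Random initial memberships, normalized so each point's memberships sum to 1.
    U = [[rng.random() + 1e-9 for _ in range(n)] for _ in range(c)]
    for k in range(n):
        s = sum(U[i][k] for i in range(c))
        for i in range(c):
            U[i][k] /= s
    V = []
    for _ in range(max_iter):
        # (III-3a): each center is the mean of the data weighted by u_ik^m.
        V = []
        for i in range(c):
            w = [U[i][k] ** m for k in range(n)]
            total = sum(w)
            V.append([sum(w[k] * X[k][d] for k in range(n)) / total
                      for d in range(dim)])
        # (III-3b): memberships from relative distances to all centers.
        U_new = [[0.0] * n for _ in range(c)]
        for k in range(n):
            d2 = [sq_dist(X[k], V[i]) for i in range(c)]
            zeros = [i for i in range(c) if d2[i] == 0.0]
            if zeros:
                # Point coincides with a center: split membership among them.
                for i in zeros:
                    U_new[i][k] = 1.0 / len(zeros)
                continue
            for i in range(c):
                # (d_i / d_j)^(2/(m-1)) equals (d_i^2 / d_j^2)^(1/(m-1)).
                U_new[i][k] = 1.0 / sum(
                    (d2[i] / d2[j]) ** (1.0 / (m - 1.0)) for j in range(c))
        # Stop when the largest membership change falls below the threshold.
        change = max(abs(U_new[i][k] - U[i][k])
                     for i in range(c) for k in range(n))
        U = U_new
        if change <= tol:
            break
    return U, V
```

Calling `fcm(X, c=3, m=1.4)` on a list of tract feature vectors would return graded memberships; taking the largest membership per tract gives a hard assignment comparable to a K-means result.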
FCM: missing elements
• Optimal number of clusters c*
• Optimal fuzziness amount m*
[Diagram: the parameters m and c feed into FCM]
Extended fuzzy c-means algorithm
• Step 1: Initialize the parameters related to fuzzy partitioning: c = 2 (2 ≤ c ≤ cmax), m = 1 (1 ≤ m ≤ mmax), where c is an integer and m is a real number; fix minc, the increment of m (0 < minc ≤ 0.1); fix the cut-off threshold L; choose a validity index v
• Step 2: Given c and m, initialize U(0) so that it is a fuzzy partition matrix. Then, at step l, l = 0, 1, 2, …:
• Step 3: Calculate the c fuzzy cluster centers {vi(l)} with (III-3a) and U(l)
• Step 4: Update U(l+1) using (III-3b) and {vi(l)}
• Step 5: Compare U(l) to U(l+1) in a convenient matrix norm; if || U(l+1) – U(l) || ≤ L, go to Step 6; otherwise return to Step 3
• Step 6: Compute the validity index for the given c and m
• Step 7: If c < cmax, then increase c ← c + 1 and go to Step 3; otherwise go to Step 8
• Step 8: If m < mmax, then increase m ← m + minc and go to Step 3; otherwise go to Step 9
• Step 9: Obtain the optimal validity index, the optimal number of clusters c*, and the optimal fuzziness exponent m*; the optimal fuzzy partition U is obtained given c* and m*
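The outer loop of Steps 6 to 9 amounts to a grid search over c and m. The sketch below shows only that loop. The clustering routine `fit` and the index function `validity` are passed in as hypothetical callables, and `m_min`, `m_inc`, and `minimize` are illustrative parameter names. Two assumptions are made: c restarts at 2 for every value of m, and m starts slightly above 1, because the exponent 2/(m−1) in (III-3b) is undefined at m = 1 exactly.

```python
def search_c_and_m(X, fit, validity, c_max, m_max,
                   m_inc=0.1, m_min=1.1, minimize=True):
    """Sweep the (c, m) grid and return (best_index, c_star, m_star, U_star).

    fit(X, c, m) must return (U, V); validity(X, U, V, m) must return a number.
    Set minimize=False for indices where larger values are better.
    """
    best = None
    # Integer step count avoids accumulating floating-point error in m.
    steps = int(round((m_max - m_min) / m_inc))
    for s in range(steps + 1):
        m = round(m_min + s * m_inc, 10)
        for c in range(2, c_max + 1):
            U, V = fit(X, c, m)              # Steps 2 to 5: run FCM to convergence
            score = validity(X, U, V, m)     # Step 6: validity index for (c, m)
            if best is None:
                improved = True
            elif minimize:
                improved = score < best[0]
            else:
                improved = score > best[0]
            if improved:
                best = (score, c, m, U)      # Step 9: keep the best partition so far
    return best
```

The `minimize` flag matters because the indices differ in direction: the Xie-Beni index and partition entropy are better when smaller, while the partition coefficient is better when larger.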
Cluster validity indices
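Partition coefficient:
$$ PC(U) = \frac{1}{n} \sum_{i=1}^{c} \sum_{k=1}^{n} (u_{ik})^2 $$
Partition entropy:
$$ PE(U) = -\frac{1}{n} \sum_{i=1}^{c} \sum_{k=1}^{n} u_{ik} \log_2 (u_{ik}) $$
Xie-Beni index:
$$ U_{XB} = \frac{\sum_{k=1}^{n} \sum_{i=1}^{c} (u_{ik})^2 \, \| x_k - v_i \|_A^2}{n \, \min_{i \neq j} \| v_i - v_j \|^2} $$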
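SVi index:
$$ S_{VI} = \frac{\sum_{i=1}^{c} \left( \sum_{k=1}^{n} (u_{ik})^m \, \| x_k - v_i \|_A^2 \Big/ \sum_{k=1}^{n} u_{ik} \right)}{\sum_{i=1}^{c} \sum_{j=1, j \neq i}^{c} (\mu_{ij})^w \, \| z_j - z_i \|_A^2} $$
$$ \mu_{ij} = \left[ \sum_{l=1, l \neq j}^{c+1} \left( \frac{\| z_j - z_i \|_A}{\| z_j - z_l \|_A} \right)^{2/(w-1)} \right]^{-1}, \quad 1 \le i \le c+1, \; 1 \le j \le c+1, \; j \neq i $$
$$ [z_1, z_2, \ldots, z_c, z_{c+1}]^T = [v_1, v_2, \ldots, v_c, \bar{x}]^T $$
where w is set to 2 in this study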
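As a rough illustration of how the first three indices are computed from a membership matrix, here is a small Python sketch. It assumes Euclidean distances and memberships stored as `U[i][k]` (cluster i, data point k). The function names are hypothetical, and the logarithm base is left as a parameter (default 2, an assumption).

```python
import math


def partition_coefficient(U):
    """PC: mean of squared memberships; equals 1 for a crisp partition."""
    n = len(U[0])
    return sum(u * u for row in U for u in row) / n


def partition_entropy(U, base=2.0):
    """PE: mean membership entropy; equals 0 for a crisp partition."""
    n = len(U[0])
    # Terms with u = 0 contribute nothing (u * log u tends to 0).
    return -sum(u * math.log(u, base) for row in U for u in row if u > 0.0) / n


def xie_beni(X, U, V):
    """Xie-Beni index: compactness divided by n times the smallest
    squared distance between two cluster centers (smaller is better)."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    n, c = len(X), len(V)
    compactness = sum(U[i][k] ** 2 * sq_dist(X[k], V[i])
                      for i in range(c) for k in range(n))
    separation = min(sq_dist(V[i], V[j])
                     for i in range(c) for j in range(i + 1, c))
    return compactness / (n * separation)
```

For a partition where every point belongs equally to two clusters, `partition_coefficient` gives 0.5 and `partition_entropy` gives 1 with base 2, the fuzziest possible values for c = 2.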
Determining c* and m*
• Selected validity indices are calibrated over the study set
• Xie-Beni index is recommended as a validity index
• Average m* is 1.38
[Figure: index value versus number of clusters c (2 to 15) for UXB, PC, PE, and SVI/100]
[Figure: index value versus fuzziness amount m (1.0 to 1.9) for UXB and SVI/100]
[Figure: histogram of m* for FCM]
Methodology: illustration
Median home value of Buffalo, NY
Dimensionality of Buffalo housing market
Predictor                        Coefficient   Standard Error   t-statistic   p-value
Constant                         -1455768      164417           -8.85         0.000
Per capita income                2.3667        0.2791           8.48          0.000
% college degree                 88221         11346            7.78          0.000
% family: couple with children   65735         18775            3.50          0.001
% detached housing unit          -31260        5527             -5.66         0.000
Housing age (year)               692.88        80.26            8.63          0.000
% non-Hispanic white             11186         3914             2.86          0.005
% native born status             130039        31111            4.18          0.000
Job accessibility                -0.05266      0.02227          -2.36         0.019

Hedonic regression equation of median home value in Buffalo, NY
Adjusted R² = 84.3%
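As a quick consistency check on the table, each t-statistic should equal the coefficient divided by its standard error. The short Python sketch below recomputes that ratio from the reported values; the variable names are illustrative only.

```python
# (predictor, coefficient, standard error, reported t-statistic), copied from the table.
rows = [
    ("Constant", -1455768, 164417, -8.85),
    ("Per capita income", 2.3667, 0.2791, 8.48),
    ("% college degree", 88221, 11346, 7.78),
    ("% family: couple with children", 65735, 18775, 3.50),
    ("% detached housing unit", -31260, 5527, -5.66),
    ("Housing age (year)", 692.88, 80.26, 8.63),
    ("% non-hispanic white", 11186, 3914, 2.86),
    ("% native born status", 130039, 31111, 4.18),
    ("Job accessibility", -0.05266, 0.02227, -2.36),
]

# t = coefficient / standard error, to be compared with the reported value.
recomputed = [(name, coef / se, reported) for name, coef, se, reported in rows]

for name, t, reported in recomputed:
    print(f"{name:32s} t = {t:7.3f}  (reported {reported:6.2f})")
```

Every recomputed ratio agrees with the reported t-statistic to two decimal places.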
Optimal number of housing submarkets c*, Optimal fuzziness amount m*, Buffalo, NY