Visualising large spatial databases and Building bespoke geodemographics Muhammad Adnan University College London
Dec 08, 2014
About Me
• 2007 – 2009
  • Worldnames (http://worldnames.publicprofiler.org)
  • Onomap (http://www.onomap.org)
• Nov. 2009 – Oct. 2011 (A KTP between UCL and Local Futures Group)
• LFG is a research and strategy consultancy
• The aim of the KTP was to devise a better visualisation of the data
Data
• A database of 1,600 indicators drawn from around 130 data sources
• Data sources cover social, economic, and environmental change in the UK
• The data is held at 8 spatial levels
  • Region, Sub region, District 2009, NUTS 3, District (pre 2009), Ward, LSOA, OA
Visualisation of the data
• A ‘total place maps’ solution using different technologies (Video)
Base Layer Data
On the fly rendering of tiles
Programming in C# and ASP.NET
Data retrieval from database
Building Bespoke Geodemographics
Geodemographics
• “Analysis of people by where they live” or “locality marketing” (Sleight, 1993:3)
Figure labels: Home, Address, Person, Area
How is a classification created?
Data – Census + Other
Experian: Mosaic
• Census data: 54%
• Non-Census data: 46%
CACI: Acorn
• Census data: 30%
• Non-Census data: 70%
ONS Output Area Classification
• Census data: 100%
How is a classification created?
Segmentations are created by cluster analysis
Area V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 ...
Area1
Area2
Area3
Area4
Area5
Area6
Area7
Area8
...
Inputs…
How is a classification created?
Figure: areas plotted on Variable 1 and Variable 2, grouped into Cluster 1, Cluster 2 and Cluster 3
Cluster Analysis
K-means is used for clustering
How is a classification created?
Output of Cluster Analysis
Area Cluster
Area1 1
Area2 1
Area3 2
Area4 1
Area5 3
Area6 3
Area7 3
Area8 2
...
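The route from the input table (areas by variables) to the output table (area to cluster) can be sketched as below. This is a minimal illustration only: the two-variable values for Area1 to Area8 are hypothetical, and the deterministic "farthest first" seeding is an assumption made so the example is repeatable, not something taken from the slides.

```python
# Minimal k-means (Lloyd's algorithm) in pure Python.
# All data values below are hypothetical, for illustration only.

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_seeds(rows, k):
    """Deterministic seeding (an assumption, not from the slides):
    start at the first row, then repeatedly add the row that is
    farthest from the seeds chosen so far."""
    seeds = [list(rows[0])]
    while len(seeds) < k:
        farthest = max(
            rows, key=lambda r: min(squared_distance(r, s) for s in seeds)
        )
        seeds.append(list(farthest))
    return seeds


def kmeans(rows, k, max_iter=100):
    """Return (labels, centroids) after alternating assignment and update."""
    centroids = farthest_first_seeds(rows, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each area joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(row, centroids[c]))
            for row in rows
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


areas = ["Area1", "Area2", "Area3", "Area4", "Area5", "Area6", "Area7", "Area8"]
rows = [
    [1.0, 1.2], [1.1, 0.9], [5.0, 5.2], [0.9, 1.0],
    [9.0, 1.0], [9.2, 1.1], [8.8, 0.8], [5.1, 4.9],
]

labels, centroids = kmeans(rows, k=3)
for name, label in zip(areas, labels):
    print(name, label + 1)  # cluster numbers are arbitrary identifiers
```

The cluster numbers carry no meaning in themselves; only the grouping of areas matters, so two runs that number the same groups differently describe the same classification.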
Research Issues
• Optimisation of clustering algorithms
  • K-means
  • PAM (Partitioning Around Medoids)
• Open Tools?
  • OACoder
  • GeodemCreator
• Bespoke local area classifications
  • UK’s open data initiative
  • ONS Neighbourhood Statistics API
  • UK’s police API
  • Barclays cycle hire API
Optimisation of Clustering Algorithms (K-Means)
K-means optimisation
Figure: RSQ (y-axis, 0.46 to 0.54) by k-means run (x-axis)
$$V = \sum_{x=1}^{n} \sum_{y=1}^{n} (z_x - \mu_y)^2$$
K-means (100 runs of k-means on OAC data set for k=4)
Run k-means multiple times (10,000 times) (Singleton & Longley, 2009)
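Because k-means depends on its random starting centroids, repeated runs can settle on different solutions, which is why the best of many runs is kept. The sketch below illustrates the idea with 20 runs rather than 10,000. It assumes that RSQ means one minus the ratio of the within-cluster sum of squares to the total sum of squares; the data values and function names are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_once(rows, k, rng, max_iter=100):
    """One k-means run from randomly sampled starting centroids."""
    centroids = [list(r) for r in rng.sample(rows, k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: squared_distance(row, centroids[c]))
            for row in rows
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def rsq(rows, labels, centroids):
    """Assumed definition: 1 - within-cluster SS / total SS."""
    grand_mean = [sum(col) / len(rows) for col in zip(*rows)]
    total_ss = sum(squared_distance(r, grand_mean) for r in rows)
    within_ss = sum(
        squared_distance(r, centroids[lab]) for r, lab in zip(rows, labels)
    )
    return 1.0 - within_ss / total_ss


def best_of_runs(rows, k, runs, seed=0):
    """Run k-means `runs` times and keep the solution with the highest RSQ."""
    rng = random.Random(seed)
    scores, best = [], None
    for _ in range(runs):
        labels, centroids = kmeans_once(rows, k, rng)
        score = rsq(rows, labels, centroids)
        scores.append(score)
        if best is None or score > best[0]:
            best = (score, labels)
    return best, scores


# Hypothetical standardised values for eight areas and two variables.
rows = [
    [1.0, 1.2], [1.1, 0.9], [5.0, 5.2], [0.9, 1.0],
    [9.0, 1.0], [9.2, 1.1], [8.8, 0.8], [5.1, 4.9],
]

(best_rsq, best_labels), all_rsq = best_of_runs(rows, k=3, runs=20)
print("best RSQ:", round(best_rsq, 4))
print("worst RSQ:", round(min(all_rsq), 4))
```

The spread between the best and worst RSQ across runs is the quantity the figure on this slide plots; more runs make it more likely that the retained solution is close to the best achievable one.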
CUDA & GPUs (Graphics Processing Units)
• Nvidia graphics cards have GPUs (Graphics Processing Units)
  • Can be used for parallel processing
  • Nvidia GeForce GT 420M (96 CUDA cores)
  • Latest Tesla graphics cards have 1000 cores
• CUDA (Compute Unified Device Architecture)
  • Parallel computing architecture
  • C and C++ can be used for programming
• A parallel implementation of k-means (Adnan & Longley, 2011)
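The part of k-means that parallelises most naturally is the assignment step, since each area's nearest centroid can be found independently of every other area. The sketch below shows that decomposition only. It is not the CUDA implementation cited above: it uses Python threads as a stand-in for GPU threads, and it will not run faster than the serial version in standard CPython. All names and values are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def nearest_centroid(row, centroids):
    """Index of the centroid closest to `row` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda c: sum((x - m) ** 2 for x, m in zip(row, centroids[c])),
    )


def assign_chunk(chunk, centroids):
    """Assignment step for one independent slice of the data."""
    return [nearest_centroid(row, centroids) for row in chunk]


def parallel_assign(rows, centroids, workers=4):
    """Split the rows into chunks and assign each chunk in its own worker."""
    size = max(1, -(-len(rows) // workers))  # ceiling division
    chunks = [rows[i:i + size] for i in range(0, len(rows), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(assign_chunk, chunks, [centroids] * len(chunks)))
    # Chunks come back in order, so labels line up with the input rows.
    return [label for part in parts for label in part]


# Hypothetical data: eight areas, two variables, three fixed centroids.
rows = [
    [1.0, 1.2], [1.1, 0.9], [5.0, 5.2], [0.9, 1.0],
    [9.0, 1.0], [9.2, 1.1], [8.8, 0.8], [5.1, 4.9],
]
centroids = [[1.0, 1.0], [5.0, 5.0], [9.0, 1.0]]

serial_labels = assign_chunk(rows, centroids)
parallel_labels = parallel_assign(rows, centroids, workers=3)
print(parallel_labels == serial_labels)
```

On a GPU the same idea is usually taken to its limit, with one thread per area rather than one per chunk; the centroid update step then needs a reduction across threads, which this sketch leaves out.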
K-means vs Parallel K-means
Could be useful for building geodemographics quickly in online environments
Open Tools for Geodemographics
Open Tools - OACoder
• Developed with Alex Singleton
• Assigns UK postcodes to their corresponding OAC groups
• Download from http://areaclassification.org.uk/
Open Tools – ‘GeodemCreator’
• Allows users to create their local area Geodemographic Classifications
• Provides data available in the public domain (but users can use ancillary data sources)
Will be available to download from http://publicprofiler.org
Spatially Weighted Geodemographics
Spatially Weighted Geodemographics
• Geodemographic classifications do not account for spatial weights in the results
• A spatially weighted Geodemographic classification introduces spatial weights in addition to the socio-economic characteristics
• Tobler’s first law of geography
  • “Everything is related to everything else, but near things are more related than distant things”
Spatially weighted Geodemographics
Step - 1: Construct a Neighbours Graph
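The slides do not say how neighbours are defined, so the sketch below makes an assumption: each area is linked to its k nearest areas by centroid distance, and links are made symmetric. Shared-boundary (contiguity) neighbours would be an equally plausible choice. The coordinates and function names are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_graph(points, k):
    """Symmetric k-nearest-neighbour graph as {area index: set of neighbours}.

    Assumption: neighbours are defined by centroid distance, not by
    shared boundaries.
    """
    n = len(points)
    neighbours = {i: set() for i in range(n)}
    for i in range(n):
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: squared_distance(points[i], points[j]),
        )
        for j in others[:k]:
            neighbours[i].add(j)
            neighbours[j].add(i)  # keep the graph symmetric
    return neighbours


# Hypothetical area centroids (x, y).
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (10.0, 0.0)]
graph = knn_graph(points, k=1)
for area, linked in sorted(graph.items()):
    print(area, sorted(linked))
```

Making the links symmetric means an isolated area (the last point here) still appears in the neighbour set of the area it attaches to, so no area is left without a neighbour.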
Spatially weighted Geodemographics
Step - 2: Apply Moran’s I to the data set
• It is a measure of spatial autocorrelation
• Values of spatial autocorrelation range from -1 to 1
• A negative value represents negative spatial autocorrelation
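As a rough illustration, global Moran's I can be computed from a neighbours graph as I = (n / S0) × Σi Σj w_ij (x_i − x̄)(x_j − x̄) / Σi (x_i − x̄)², where S0 is the sum of all weights. The sketch below assumes binary weights (1 for neighbours, 0 otherwise) and a hypothetical chain of four areas; the slides do not specify the weighting scheme.

```python
def morans_i(values, neighbours):
    """Global Moran's I with binary weights (1 if j is a neighbour of i).

    `neighbours` maps each area index to the set of its neighbours.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(len(js) for js in neighbours.values())  # sum of all weights
    cross = sum(dev[i] * dev[j] for i, js in neighbours.items() for j in js)
    denom = sum(d * d for d in dev)
    return (n / s0) * (cross / denom)


# Hypothetical chain of four areas: 0 - 1 - 2 - 3.
chain = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}

smooth = morans_i([1.0, 2.0, 3.0, 4.0], chain)       # similar values adjacent
alternating = morans_i([1.0, 4.0, 1.0, 4.0], chain)  # dissimilar values adjacent

print(round(smooth, 3))       # 0.333
print(round(alternating, 3))  # -1.0
```

The first series rises steadily along the chain, so neighbouring areas resemble each other and the statistic is positive; the second alternates between low and high, giving the strongest negative value.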
Spatially weighted Geodemographics
Area V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 VM
Area1
Area2
Area3
Area4
Area5
Area6
Area7
Area8
...
Moran’s I Result
Step - 3: Apply K-means
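The table above adds one Moran's I column (VM) with a value per area. A single global Moran's I cannot fill such a column, so the sketch below assumes a local variant: for each area, its standardised value multiplied by the mean standardised value of its neighbours. Which input variable the statistic is computed on is also an assumption; here it is the first one. The augmented rows are then what k-means would be applied to in Step 3.

```python
import math


def local_morans_i(values, neighbours):
    """Local Moran's I per area, using row-standardised binary weights.

    Assumption: the VM column holds a local statistic of this kind.
    """
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    z = [(v - mean) / sd for v in values]
    result = []
    for i in range(n):
        js = neighbours[i]
        # Spatial lag: average standardised value of the neighbours.
        lag = sum(z[j] for j in js) / len(js) if js else 0.0
        result.append(z[i] * lag)
    return result


# Hypothetical chain of four areas (0 - 1 - 2 - 3) with two variables each.
chain = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
rows = [[1.0, 0.2], [2.0, 0.1], [3.0, 0.4], [4.0, 0.3]]

vm = local_morans_i([r[0] for r in rows], chain)

# Append VM as an extra column; k-means then runs on `augmented`.
augmented = [r + [m] for r, m in zip(rows, vm)]
for row in augmented:
    print([round(v, 3) for v in row])
```

Because VM enters the clustering as one more variable, its scale relative to the other columns determines how strongly spatial structure influences the result, so the columns would normally be standardised to a common scale first.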
Spatially weighted Geodemographics
Result
Conclusion and future work
• Open methods and tools for building geodemographics are important
• Testing of the spatially weighted geodemographics technique
  • On lower spatial levels
• I will be working on Paul Longley’s new research grant on “Uncertainty of Identity”
  • How could people’s behaviours in the real world be mapped to their behaviours in the virtual world?
• Could marketing strategies be devised for targeting online social networks and communities?
A quick illustration
http://worldnames.publicprofiler.org
• We have a record of 100,000 ‘IP Address’ entries for the last 6 months
A quick illustration
http://quova.com
An API to convert “IP addresses” to their corresponding latitude / longitude values
Any questions?
References
Adnan, M., Longley, P.A., Singleton, A.D., Brunsdon, C. (2010). Towards Real-Time Geodemographics: Clustering Algorithm Performance for Large Multidimensional Spatial Databases. Transactions in GIS, 14(3), 283–297.
Hall, J.D., Hart, J.C. (2004). GPU acceleration of iterative clustering. In: ACM Workshop on General-Purpose Computing on Graphics Processors, p. C-6.
Harris, R., Sleight, P., Webber, R. (2005). Geodemographics, GIS and Neighbourhood Targeting. Wiley, London.
Reynolds, A.P., Richards, G., Rayward-Smith, V.J. (2004). The Application of K-Medoids and PAM to the Clustering of Rules. Lecture Notes in Computer Science, 3177, 173–178.
Singleton, A.D., Longley, P.A. (2008). Creating open source geodemographic classifications for Higher Education applications. Papers in Regional Science, 88(3), 643–666.
Takizawa, H., Kobayashi, H. (2006). Hierarchical parallel processing of large scale data clustering on a PC cluster with GPU co-processing. Journal of Supercomputing, 36(3), 219–234.
Vickers, D.W., Rees, P.H. (2007). Creating the National Statistics 2001 Output Area Classification. Journal of the Royal Statistical Society, Series A, 170(2), 379–403.