High Performance GeoServer Clusters
Derek Kern - Ubisense, Inc
1
What this talk is about
• I want to walk you through the reasoning and process involved in scaling and clustering GeoServer / GeoWebCache
• Why would you begin scaling and clustering?
• While scaling and clustering, what do you need to consider?
2
What this talk is *not* about
• This talk is *not* about tuning an individual GeoServer instance
• This is an important and complicated topic
• However, it has already been covered quite well at numerous previous FOSS4G gatherings
  – Google for the talk entitled: GeoServer on steroids
• We will discuss GeoServer parameters only insofar as they are needed for scaling and clustering
3
We use GeoServer a lot
4
We use GeoServer a lot
5
Why scaling and clustering?
• So, what events would initiate the scaling / clustering process?
  – Poor application performance
  – GeoServer machine resources being exhausted
  – Onboarding new users
  – Onboarding new feature layers / layer groups
  – Onboarding new spatial applications
  – Onboarding new spatial data sources
6
Why scaling and clustering?
• So, what events would initiate the scaling / clustering process? (cont’d)
  – Changing the scales at which layers are being rendered
  – Others
• At one customer site, we are using GeoServer / GeoWebCache, nightly, to construct SQLite tilestores that are distributed to offline users
7
Why scaling and clustering?
• These events are relevant to performance insofar as they relate to the following factors affecting performance:
  – Number of users, i.e. tile requests (duh)
  – Hardware capacity (GeoServer and/or database)
  – Network capacity
  – Database structure
  – Feature density
8
Zoom in - Database structure
• The structure of tables, tablespaces, etc. can affect the rate at which data can be queried and rendered onto tiles
• Example #1: If a feature table is large enough, then a missing spatial index can dramatically slow the rendering process
• Example #2: In PostgreSQL, a table needing vacuuming might be enough to slow the rendering process
9
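As a hedged sketch of Example #1 and Example #2 above (the `parcels` table and `geom` column are hypothetical, not from the talk), the script below only builds and prints the PostgreSQL/PostGIS statements so they can be inspected before being run through psql:

```shell
#!/bin/sh
# Hypothetical feature table and geometry column; adjust to your schema.
TABLE="parcels"
GEOM_COL="geom"

# Example #1: a GiST spatial index lets bounding-box queries avoid full scans
SQL_INDEX="CREATE INDEX ${TABLE}_${GEOM_COL}_idx ON ${TABLE} USING GIST (${GEOM_COL});"

# Example #2: reclaim dead rows and refresh planner statistics
SQL_VACUUM="VACUUM ANALYZE ${TABLE};"

echo "${SQL_INDEX}"
echo "${SQL_VACUUM}"
```

When ready, the output can be piped into a database, e.g. `sh make_sql.sh | psql gisdb` (database name hypothetical).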
Zoom in - Feature density
• The density of features per tile can greatly affect performance
• This offers a strong incentive to be very careful when choosing the scales at which to display features
• I’ve witnessed poor choices bring an entire GeoServer cluster to its knees
10
Arch #1 - The Starting Point
11
This is the portrait of simplicity
Arch #1 - The Starting Point
• This is the starting point for many geographically-enabled web applications
• There is a single, generic application server (Django, Ruby on Rails, etc)
• There is a single database server (PostgreSQL, MySQL, Oracle, etc)
• There is a single GeoServer and it is using its bundled GeoWebCache for caching
12
Quick Note - Scaling In or Out?
• GeoServer can obviously be scaled across machines
• However, it can also be scaled within a machine, i.e. multiple GeoServer instances can run on different ports on a single machine
• Let’s call the former “scaling out” and the latter “scaling in”
• Most of this talk is structured around scaling out, but is equally applicable to scaling in
• “GeoServer on steroids” has some content on scaling in
13
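As a hedged sketch of “scaling in” (all paths and ports here are hypothetical): each on-machine instance typically gets its own Tomcat base directory with its own connector port. The script below only stages scratch instance directories so it is safe to run anywhere; a real setup also needs conf/, webapps/, and distinct shutdown/AJP ports.

```shell
#!/bin/sh
# Two hypothetical GeoServer/Tomcat instances on one machine, one per port
BASE_ROOT=$(mktemp -d)
for PORT in 8080 8081
do
    INSTANCE="${BASE_ROOT}/geoserver-${PORT}"
    mkdir -p "${INSTANCE}/conf"
    # Record the HTTP connector port this instance would listen on
    echo "port=${PORT}" > "${INSTANCE}/conf/instance.properties"
done
ls "${BASE_ROOT}"
```

A front-end balancer (covered later in the talk) can then spread traffic across the per-port instances exactly as it would across machines.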
Arch #2 - Obvious next step
14
Arch #2 - Obvious next step
• We’ve simply added another GeoServer instance
• This architecture, theoretically, has double the capacity of #1
• It is also a very easy step to make
• However, it has problems
  – How is traffic to be balanced between the two servers?
    • Traffic management must be dealt with, somehow, by the application server
15
Arch #2 - Obvious next step
• However, it has problems (cont’d)
  – Configuration data is not shared so configuration changes must be made twice
    • *** Assuming the instances are serving the same layers
  – Tiles are being cached twice
    • Duplication of effort
    • Managing expired tiles is now doubly difficult
• Aside: Handling expired tiles
  – GeoRSS
  – Bulk layer cache clearing ★
  – Targeted layer cache clearing ★
16
★ Careful: GeoServer disk quota processing can cause problems when tiles are cleared by means of the OS, i.e. ‘rm’. It should be disabled when clearing using ‘rm’
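Given the caveat above about clearing tiles with ‘rm’, one alternative is GeoWebCache’s masstruncate REST endpoint, which drops a layer’s cached tiles while keeping disk quota bookkeeping consistent. A minimal sketch with hypothetical host, credentials, and layer name; it only builds and echoes the request so it runs without a live GeoWebCache:

```shell
#!/bin/sh
# Hypothetical endpoint, credentials, and layer; adjust to your deployment.
GWC_URL="http://localhost:8080/geowebcache"
LAYER="topp:states"

# Request body for the masstruncate endpoint: drop every cached tile for LAYER
BODY="<truncateLayer><layerName>${LAYER}</layerName></truncateLayer>"

# The request we would send; echoed rather than executed in this sketch
CMD="curl -u admin:secret -X POST -H 'Content-Type: text/xml' -d '${BODY}' ${GWC_URL}/rest/masstruncate"
echo "${CMD}"
```

Because GeoWebCache itself performs the truncation, its disk quota accounting sees the deletions, unlike an OS-level ‘rm’.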
Aside - Tile lifecycle spectrum
High Entropy Content

  Real-time          Near real-time     Near daily             Daily
  Δ: x<4m            Δ: 4m≤x<3h         Δ: 3h≤x<1d             Δ: 1d≤x<3d
  Outage status      Device status      Veggie mgmt            Customers
  Scada status       Trouble calls      Construction status    As built
  Vehicle position
17
Aside - Tile lifecycle spectrum
Low Entropy Content

  Weekly             Monthly            Yearly                    Never
  Δ: 3d≤x<10d        Δ: 10d≤x<3mo       Δ: 3mo≤x<2y               Δ: N/A
  Legacy as built    Roads              Rail                      State boundaries
  Customers          Parcels            City/county boundaries    Water features
18
Aside - Tile lifecycle spectrum
• Most applications have content (being rendered onto tiles) that falls all over the lifecycle spectrum
• The appropriate GeoServer / GeoWebCache architecture will ultimately be driven by factors that include:
  – The lifecycle of the tiles being served
  – The amount of data being served
  – The amount of time needed to render tiles
  – The number of users requesting GeoServer / GeoWebCache tiles
19
Arch #2 - Obvious next step
• Example
  – If we balance the load by hits, then one GeoServer would serve the ‘NHealth’ layer and the other GeoServer would serve all other layers
  – Given the considerations already covered, would this be an equitable balance?
  – The answer: Not necessarily
20
Layer Name           Hit%   Refresh Cycle
NHealth              50%    Daily
Status of Accounts   18%    15 mins
Devices              8%     Daily
Actives              8%     1 hour
Cables               6%     30 mins
Problem Accounts     4%     Daily
Transmit             3%     Daily
Tickets              1%     20 mins
Outage Nodes         1%     10 mins
Region               1%     Daily
Facility             1%     Daily
Total                100%
Arch #2 - Obvious next step
• Example (cont’d)
  – The ‘Account Status’ layer is refreshed every 15 minutes so, depending upon how many tiles are expired during each cycle, the tile cache might be less effective
  – In order to strike an equitable balance, the statistic we want is:
    • Total hits * Average tile output time
    • This statistic is, essentially, total tile output time (TTOT)
  – As it turns out, balancing layer TTOT is difficult if architecture is not considered
21
Layer Name           Hit%   Refresh Cycle
NHealth              50%    Daily
Account Status       18%    15 mins
Devices              8%     Daily
Actives              8%     1 hour
Cables               6%     30 mins
Problem Accounts     4%     Daily
Transmit             3%     Daily
Tickets              1%     20 mins
Outage Nodes         1%     10 mins
Region               1%     Daily
Facility             1%     Daily
Total                100%
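To make the TTOT statistic concrete, here is a sketch for three of the layers above; the hit counts are the table’s percentages scaled to 1,000 requests, and the average tile output times (in ms) are invented for illustration:

```shell
#!/bin/sh
# Columns: layer, hits (per 1000 requests), hypothetical avg tile output ms
# TTOT = total hits * average tile output time
TTOT_REPORT=$(awk '{ printf "%s %d\n", $1, $2 * $3 }' <<EOF
NHealth 500 40
AccountStatus 180 250
Devices 80 120
EOF
)
echo "${TTOT_REPORT}"
```

With these made-up render times, NHealth takes half the hits yet its TTOT (20000) falls below AccountStatus’s (45000), so a split based on hit% alone would not balance actual tile output time.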
Arch #3 - A little better
22
Arch #3 - A little better
• We’ve added a load balancer to mediate traffic between the GeoServer instances
• The application server will point clients to the load balancer when tiles are needed
• While the theoretical capacity hasn’t changed, this architecture is better able to exploit that capacity
23
Arch #3 - A little better
• The load balancer can be hardware- or software-based
  – Examples
    • mod_proxy_balancer
    • NGINX
    • BIG-IP
    • Barracuda
• This architecture still has problems
  – Configuration data is still not shared so configuration changes must be made twice
    • *** Assuming the instances are serving the same layers
24
Arch #3 - A little better
• This architecture still has problems (cont’d)
  – Tiles are still being cached twice
    • There is still a duplication of effort
    • Managing expired tiles is still doubly difficult
25
Arch #4 - Almost there
26
Arch #4 - Almost there
• We’ve made a minor change in the storage of configuration data
• Configuration data is now being stored in one location and shared amongst GeoServers via NFS
• One of the GeoServer instances should be designated as the writer. Configuration changes will be handled by the writer. All other GeoServer instances will be readers
27
Arch #4 - Almost there
• Rather than NFS*, configuration data can also be shared using rsync
• The web administration interface should be disabled for the reader instances
  – Add -DGEOSERVER_CONSOLE_DISABLED=true to the Tomcat startup command line
  – From WEB-INF/lib, delete files matching gs-web*.jar
28
* For those who know Linux well, we have chosen to mount the configuration data using autofs, not /etc/fstab.
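The two reader-instance steps above can be sketched as follows; CATALINA_HOME here points at a scratch directory so the sketch is safe to run anywhere, and the jar removal is shown as a comment rather than executed:

```shell
#!/bin/sh
# Stage the console-disabling flag in a scratch Tomcat layout (paths hypothetical)
CATALINA_HOME=$(mktemp -d)
mkdir -p "${CATALINA_HOME}/bin"

# Step 1: pass the flag to the JVM via Tomcat's setenv.sh
cat > "${CATALINA_HOME}/bin/setenv.sh" <<'EOF'
CATALINA_OPTS="${CATALINA_OPTS} -DGEOSERVER_CONSOLE_DISABLED=true"
EOF

# Step 2 (on a real install): remove the web admin jars, e.g.
#   find ${CATALINA_HOME}/webapps/geoserver/WEB-INF/lib -name 'gs-web*.jar' -delete

echo "setenv.sh staged in ${CATALINA_HOME}/bin"
```

Using setenv.sh keeps the flag out of the main startup script, so Tomcat upgrades do not clobber it.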
Arch #4 - Almost there
• Again, this architecture has problems
  – Tiles are still being cached twice
    • There is still a duplication of effort
    • Managing expired tiles is still doubly difficult
  – Configuration data is now shared. However, each time the configuration data is changed by the writer, each GeoServer instance must be instructed to re-read its configuration data
    • Luckily, this problem is solvable
29
Arch #4 - Almost there
• Instructing GeoServers to reread their configuration data
30
#!/bin/bash

# Get the GeoServer hostname prefix
GSRVR_HNAME_PREFIX=$(hostname | rev | cut -c 3- | rev)

# Loop over possible GeoServer hostnames
for i in {1..20}
do
    # Assemble the next possible GeoServer hostname
    GSRVR_HNAME="${GSRVR_HNAME_PREFIX}$(printf %02d ${i})"

    # See if the machine exists on the network
    PING_TEST=$(ping -c 2 -W 2 ${GSRVR_HNAME} &> /dev/null ; echo $?)
    if [ "${PING_TEST}" -eq 0 ]
    then
        # The server exists, so send the reload command
        curl -u admin:geoserver -X POST -d "reload_configuration=1" "http://${GSRVR_HNAME}:8080/geoserver/rest/reload"
        echo "Reloaded configuration on ${GSRVR_HNAME}"
    fi
done
Arch #5 - Cooking with grease
31
Arch #5 - Cooking with grease
• We’ve put a GeoWebCache instance in front of the GeoServer instances. It is now responsible for caching tiles. GeoServers are now just tile generators
• We have a single, unified cache
• GeoWebCache uses the load balancer to determine which GeoServer instance will generate the tile that it needs
• This architecture is now poised to exploit the maximum amount of tiling capacity from the GeoServer instances
32
Arch #5 - Cooking with grease
• This architecture has two minor problems
  – GeoWebCache has its own configuration data that must be maintained. Furthermore, this configuration data is dependent upon the configuration of the GeoServer instances. Again, it looks like we are back in the position of having to make configuration changes twice
  – GeoWebCache caches must be cleared for GeoServer layers whose configuration data has changed
  – Both of these problems are solvable
33
Arch #5 - Cooking with grease
• Re-writing GeoWebCache config from GeoServer config
34
#!/bin/bash

# Set the start tag
NEW_LAYERS_XML="  <layers>\n"

for a_layer_xml in $(find ${GEOSERVER_DATA_DIR} -name '*.xml' -exec grep -l -i -E "<layer>|<layerGroup>" {} \;)
do
    # Get the layer name from the file
    layer_name=$(sed -n '/name/{s/.*<name>//;s/<\/name.*//;p;}' ${a_layer_xml})
    if [ "${layer_name}" = "" ]
    then
        continue
    fi

    # Get the workspace name from the path
    workspace_name=$(echo ${a_layer_xml} | grep -Po '(?<=(workspaces/))\w+(?=(/))')
    if [ "${workspace_name}" != "" ]
    then
        layer_name="${workspace_name}:${layer_name}"
    fi
Arch #5 - Cooking with grease
35
    # Add the layer def
    NEW_LAYERS_XML="${NEW_LAYERS_XML}  <wmsLayer>\n"
    NEW_LAYERS_XML="${NEW_LAYERS_XML}    <name>${layer_name}</name>\n"
    NEW_LAYERS_XML="${NEW_LAYERS_XML}    <wmsUrl><string>http://$(hostname)/geoserver/wms</string></wmsUrl>\n"
    NEW_LAYERS_XML="${NEW_LAYERS_XML}  </wmsLayer>\n"
done

# Set the end tag
NEW_LAYERS_XML="${NEW_LAYERS_XML}  </layers>\n"
# Put the newly generated layer definitions into the GeoWebCache configuration
# Use ElementTree to write the new layers XML to the GeoWebCache configuration
echo "Writing layers taken from GeoServer configuration to the GeoWebCache configuration"
${PYTHONHOME}/bin/python << EOF
import xml.etree.ElementTree as ET

# Read in the current GeoWebCache configuration
cgeowebcache = ET.parse( "${GEOWEBCACHE_CACHE_DIR}/geowebcache.xml" )

# Get the namespace from the GeoWebCache doc
namespace = cgeowebcache.getroot().tag.split( '}' )[0].strip( '{' )

# Register the namespace
ET.register_namespace( "", namespace )
Arch #5 - Cooking with grease
36
# Build an element to contain the new layers XML
newlayers = ET.fromstring( "${NEW_LAYERS_XML}" )

# Get the old layers so they can be removed
oldlayers = cgeowebcache.find( "{" + namespace + "}layers" )

# Remove the old layers
cgeowebcache.getroot().remove( oldlayers )

# Add the new layers
cgeowebcache.getroot().append( newlayers )

# Write out the new GeoWebCache XML
cgeowebcache.write( "${GEOWEBCACHE_CACHE_DIR}/geowebcache.xml", encoding="utf-8", xml_declaration=True )
EOF

# Finally, tell GeoWebCache to reread its layers
echo "Forcing GeoWebCache to reread its configuration"
curl -s -u geowebcache:secured -d "reload_configuration=1" http://localhost:8080/geowebcache/rest/reload > /dev/null
Arch #5 - Cooking with grease
37
# Clear the caches for any layers whose definitions have changed in the last 2 days
for a_layer_xml in $(find ${GEOSERVER_DATA_DIR} -name '*.xml' -mtime -2 -exec grep -l -i -E "<layer>|<layerGroup>" {} \;)
do
    # Get the layer name from the file
    layer_cache_directory_name=$(sed -n '/name/{s/.*<name>//;s/<\/name.*//;p;}' ${a_layer_xml})
    if [ "${layer_cache_directory_name}" = "" ]
    then
        continue
    fi

    # Get the workspace name from the path
    workspace_name=$(echo ${a_layer_xml} | grep -Po '(?<=(workspaces/))\w+(?=(/))')
    if [ "${workspace_name}" != "" ]
    then
        layer_cache_directory_name="${workspace_name}_${layer_cache_directory_name}"
    fi

    # Now, clear the cache associated with the layer
    echo "Clearing cache directory ${GEOWEBCACHE_CACHE_DIR}/${layer_cache_directory_name}"
    rm -rf ${GEOWEBCACHE_CACHE_DIR}/${layer_cache_directory_name}
done
Other thoughts on caching
• Block size
  – File system block size on GeoWebCache server(s) can be very important
  – The default block size for Red Hat ext4 is 4K
  – Very often, raster tiles can be less than 4K in size. Sometimes less than 2K
  – If the file system block size is too large, then GeoWebCache can prematurely exhaust its disk space
  – Note, however, setting block size too small can adversely affect performance
  – This is clearly a balancing act
38
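A back-of-the-envelope sketch of the block-size effect (the tile count and average tile size below are illustrative, not measurements): every tile occupies a whole number of file system blocks, so the tail of each tile’s last block is wasted.

```shell
#!/bin/sh
# One million cached tiles, averaging 2K each (raster tiles are often under 4K)
TILES=1000000
AVG_TILE_BYTES=2048

for BLOCK in 1024 4096
do
    # Blocks per tile, rounded up, times block size, times tile count
    USAGE=$(( ( (AVG_TILE_BYTES + BLOCK - 1) / BLOCK ) * BLOCK * TILES ))
    echo "block=${BLOCK} bytes_on_disk=${USAGE}"
done
```

With 2K tiles, 4K blocks consume twice the disk of 1K blocks (about 4.1 GB vs 2.0 GB here), which is why an oversized block size can prematurely exhaust a disk quota.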
Scaling and clustering obstacles
• The capacity of the database server will circumscribe the capacity of the cluster
• Poor configuration
• Poor usage (e.g. reading dense layers at high scales)
• Network capacity
39
A little benchmarking
• I did some very simple benchmarking in order to give you some idea of scaling
• I had access to three machines for benchmarking
  – (1) My desktop
    • Linux Mint 17 Qiana (Ubuntu-based)
    • AMD FX-8150 eight-core processor, 3.7 GHz
    • 16 GB RAM
  – (2) Outdated laptop
    • CentOS 6.7 (Red Hat-based)
    • Intel i7-2760QM, 2.4 GHz
    • 8 GB RAM
40
A little benchmarking
• Machines for benchmarking (cont’d)
  – (3) Really ancient laptop
    • CentOS 6.7 (Red Hat-based)
    • Intel i5-2530M, 2.5 GHz
    • 8 GB RAM
• GeoWebCache wasn’t used as part of the benchmark. The benchmark is meant to measure the amount of processing power being added
41
A little benchmarking
• Benchmark configurations
  – 1 GeoServer
    • Machine (1) running: PostgreSQL 9.5, Apache 2.4
    • Machine (2) running: 1 GeoServer/Tomcat container
  – 2 GeoServers
    • Machine (1) running: PostgreSQL 9.5, Apache 2.4
    • Machine (2) running: 1 GeoServer/Tomcat container
    • Machine (3) running: 1 GeoServer/Tomcat container
  – 3 GeoServers
    • Machine (1) running: PostgreSQL 9.5, Apache 2.4
    • Machine (2) running: 2 GeoServer/Tomcat containers
    • Machine (3) running: 1 GeoServer/Tomcat container
42
A little benchmarking
• Benchmark configurations (cont’d)
  – 4 GeoServers
    • Machine (1) running: PostgreSQL 9.5, Apache 2.4
    • Machine (2) running: 2 GeoServer/Tomcat containers
    • Machine (3) running: 2 GeoServer/Tomcat containers
  – 5 GeoServers
    • Machine (1) running: PostgreSQL 9.5, Apache 2.4
    • Machine (2) running: 3 GeoServer/Tomcat containers
    • Machine (3) running: 2 GeoServer/Tomcat containers
43
A little benchmarking - results
• The performance jump from 1 to 2 GeoServers is substantial
• The performance jump from 2 to 3 GeoServers is less substantial. This is likely due to hardware limitations
• The performance slumps from 4 to 5 GeoServers. At this point, we’ve probably overloaded the hardware. Remember, at this point, the outdated laptop has 3 GeoServer containers
44
45
?
FIND OUT MORE
Derek Kern
Principal Architect
Email: [email protected]
www.ubisense.net
Thank you!
46