Scalable architecture
By Adam Brodziak, Global Sports Media b.v.
Jan 15, 2015
Abstract
An overview of modern web application architecture: from hardware infrastructure, through PHP/SQL code, to HTML/CSS markup distribution. All of this spiced up with caching, load balancing and a CDN.
Who is this guy?
- Lead developer at Global Sports Media
- GSM collects and processes sports data
- GSM owns the soccerway.com portal
- Linux user
- Interested in frameworks and design patterns
- Semantic Web enthusiast
- Football (soccer) fan
Topics
- The Challenge
- Infrastructure
- Code
- Cache
- CDN
Topic: The Challenge
Raw numbers
- 7 million visits / month
- 52 million pageviews / month
- 1 billion requests / month
- 6 TB of traffic / month
- 300k users at peak time
- Quite a few clients using the same hardware
Not so much, but...
- 700 leagues
- Livescores
- Game events
- Match statistics
- Rankings
- Editorials
Traffic growth
The Challenge
- Loads of data to process: scores, events, stats
- In real time (livescores)
- Growing number of visitors: 13K hits/sec at peak time
- 10 servers to run it all
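For scale, the monthly figures from the Raw numbers slide can be turned into per-second averages. This assumes a 30-day month, 1 TB = 10^12 bytes, that "requests" and "hits" count the same thing, and, for the per-server line, that peak load is spread evenly over the 10 servers:

```latex
\begin{aligned}
\text{average request rate} &= \frac{10^{9}}{30 \times 86\,400\ \text{s}} = \frac{10^{9}}{2\,592\,000\ \text{s}} \approx 386\ \text{requests/s} \\
\text{peak-to-average ratio} &\approx \frac{13\,000}{386} \approx 34 \\
\text{peak load per server} &= \frac{13\,000\ \text{hits/s}}{10} = 1\,300\ \text{hits/s} \\
\text{average bandwidth} &= \frac{6 \times 10^{12}\ \text{B}}{2\,592\,000\ \text{s}} \approx 2.31\ \text{MB/s} \approx 18.5\ \text{Mbit/s}
\end{aligned}
```

Under these assumptions the peak runs at roughly 34 times the monthly average, and it is the peak that the infrastructure has to be sized for.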
Topic: Infrastructure
It starts with one
Load balancing
Load balancing caveats
- Don't rely on the local filesystem: temporary files, sessions, logs
- Don't assume a single, exclusive cache: APC, Zend Cache
- Use distributed session storage: memcache, database
- Encapsulate all of the above
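A minimal sketch of the "encapsulate" advice, assuming PHP 7.4+: session access goes through a small interface, so the backend (memcache, database) can be swapped without touching application code. `SessionStore` and `ArraySessionStore` are hypothetical names, and the array backend only simulates a shared store inside one process.

```php
<?php
// Hypothetical interface: application code depends on this, never on session
// files in the local filesystem, so any node behind the load balancer can
// handle any request of a given user.
interface SessionStore
{
    public function read(string $sessionId): array;
    public function write(string $sessionId, array $data): void;
    public function destroy(string $sessionId): void;
}

// In-memory stand-in for a shared backend. A production version would keep
// the same JSON payloads in memcache or a database table instead of an array.
final class ArraySessionStore implements SessionStore
{
    /** @var array<string, string> JSON-encoded session data keyed by session id */
    private array $rows = [];

    public function read(string $sessionId): array
    {
        if (!isset($this->rows[$sessionId])) {
            return [];
        }
        return json_decode($this->rows[$sessionId], true);
    }

    public function write(string $sessionId, array $data): void
    {
        $this->rows[$sessionId] = json_encode($data);
    }

    public function destroy(string $sessionId): void
    {
        unset($this->rows[$sessionId]);
    }
}

// Two web nodes that both hold a reference to the same shared store.
$shared = new ArraySessionStore();
$nodeA = $shared;
$nodeB = $shared;

$nodeA->write('abc123', ['user_id' => 42, 'logged' => true]); // login handled by node A
$session = $nodeB->read('abc123');                             // next request hits node B
```

To keep using `$_SESSION`, the same idea can be plugged into PHP itself through `session_set_save_handler()` with a `SessionHandlerInterface` implementation.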
Separate database server
DB replication
Replication caveats
- Writes go only to the master
- Reads go to the slaves
- Data consistency
- Replication lag
Don't do this:
$master->query('UPDATE session SET logged = 1'); // write goes to the master
$slave->query('SELECT logged FROM session');     // slave may still return the old value (replication lag)
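One common way around the read-after-write problem is to route queries automatically and pin reads to the master once the current request has written something. A minimal sketch, assuming PHP 7.4+; `Connection`, `RecordingConnection` and `ReplicationRouter` are hypothetical names, and the recording connections only log SQL instead of talking to MySQL.

```php
<?php
// Hypothetical minimal connection interface; a real one would wrap PDO or mysqli.
interface Connection
{
    public function query(string $sql);
}

// Test double: records statements instead of executing them.
final class RecordingConnection implements Connection
{
    public array $log = [];

    public function query(string $sql)
    {
        $this->log[] = $sql;
        return true;
    }
}

// Writes go to the master, reads to a slave. Once the current request has
// written anything, its later reads also go to the master, so it never reads
// its own changes from a slave that may still be lagging behind.
final class ReplicationRouter
{
    private Connection $master;
    private Connection $slave;
    private bool $hasWritten = false;

    public function __construct(Connection $master, Connection $slave)
    {
        $this->master = $master;
        $this->slave = $slave;
    }

    public function query(string $sql)
    {
        // Naive classification: only SELECT/SHOW statements count as reads.
        $isRead = preg_match('/^\s*(SELECT|SHOW)\b/i', $sql) === 1;
        if (!$isRead) {
            $this->hasWritten = true;
            return $this->master->query($sql);
        }
        $target = $this->hasWritten ? $this->master : $this->slave;
        return $target->query($sql);
    }
}

$master = new RecordingConnection();
$slave = new RecordingConnection();
$db = new ReplicationRouter($master, $slave);

$db->query('SELECT name FROM league');       // nothing written yet: slave
$db->query('UPDATE session SET logged = 1'); // write: master
$db->query('SELECT logged FROM session');    // read after write: master
```

Pinning reads to the master for the rest of the request trades a little extra master load for consistency. Statements such as `SELECT ... FOR UPDATE` would also need the master; the regular expression above deliberately ignores such cases.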
The whole picture
Topic: Code
PHP is slow!
Yes, but it does not matter!
- Database access is slower
- Cache over the network is slower
- Disk access is slower
- HTTP requests are slower
- Web service calls are slower
Discover bottlenecks before blaming PHP.
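A crude but useful first step in finding bottlenecks is to time the suspects directly. A sketch using PHP's `hrtime()` (PHP 7.3+; the typed property needs 7.4). The `Stopwatch` class and its labels are hypothetical; dedicated profilers such as Xdebug or XHProf give far more detail.

```php
<?php
// Minimal wall-clock profiler: wrap suspected hot spots (queries, cache and
// web service calls, plain PHP code) and see where the time actually goes.
final class Stopwatch
{
    /** @var array<string, int> accumulated nanoseconds per label */
    private array $totals = [];

    public function measure(string $label, callable $fn)
    {
        $start = hrtime(true);
        try {
            return $fn();
        } finally {
            $elapsed = hrtime(true) - $start;
            $this->totals[$label] = ($this->totals[$label] ?? 0) + $elapsed;
        }
    }

    /** @return array<string, float> milliseconds per label, slowest first */
    public function report(): array
    {
        $ms = array_map(function (int $ns): float {
            return $ns / 1e6;
        }, $this->totals);
        arsort($ms);
        return $ms;
    }
}

$sw = new Stopwatch();
// Hypothetical stand-ins: a 20 ms remote call versus a small in-memory loop.
$sw->measure('webservice', function () {
    usleep(20000);
});
$sum = $sw->measure('php_loop', function () {
    return array_sum(range(1, 1000));
});
print_r($sw->report());
```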
It's about architecture
- Run heavy tasks in the background: cron, Gearman
- Pregenerate stuff
- Move some code to SQL: calculations in queries, stored procedures, triggers
- C/C++ or Java for heavy computation; use PHP to glue it together
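A sketch of moving heavy work out of the request, assuming PHP 7.4+: the web request only records that a league table is stale, and a worker started by cron (or running as a Gearman worker) does the recalculation later. The job names, handlers and the in-process `JobQueue` are hypothetical; a real setup needs a queue that both processes can reach, such as a Gearman job server or a database table.

```php
<?php
final class JobQueue
{
    /** @var array<string, array> pending jobs keyed by "name:arg", so duplicates coalesce */
    private array $pending = [];

    public function enqueue(string $name, int $arg): void
    {
        $this->pending["$name:$arg"] = ['name' => $name, 'arg' => $arg];
    }

    /** Removes and returns all pending jobs in insertion order. */
    public function drain(): array
    {
        $jobs = array_values($this->pending);
        $this->pending = [];
        return $jobs;
    }
}

// Web request: a goal was scored, so the league table is stale. Instead of
// recomputing it inline, just queue the work and respond immediately.
function onGoalScored(JobQueue $queue, int $leagueId): void
{
    $queue->enqueue('recalculate_table', $leagueId);
}

// Worker entry point, e.g. invoked every minute from cron. Each (hypothetical)
// handler does the heavy computation and stores the pregenerated result.
function runWorker(JobQueue $queue, array $handlers): int
{
    $done = 0;
    foreach ($queue->drain() as $job) {
        $handlers[$job['name']]($job['arg']);
        $done++;
    }
    return $done;
}

$queue = new JobQueue();
onGoalScored($queue, 8);
onGoalScored($queue, 8);  // second goal in the same league before the worker ran
onGoalScored($queue, 12);

$recalculated = [];
$processed = runWorker($queue, [
    'recalculate_table' => function (int $leagueId) use (&$recalculated): void {
        $recalculated[] = $leagueId;
    },
]);
```

Keying pending jobs by name and argument means a burst of events in one league triggers a single recalculation per worker run.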
PHP Frameworks
...and hundreds of others. Which one to choose?
Framework? Think again!
- Raw performance matters
- Support for master-slave replication
- Multiple layers of cache
- Works with accelerators (HipHop!)
- Beware of bottlenecks, e.g. a slow core part of the framework
- Designed to scale
Topic: Cache
Cache is everywhere
- CPU: L1, L2
- Disk buffer
- Linux filesystem
- MySQL
- PHP (APC)
- Smarty
- HTTP proxy
- Browser cache
Where to cache?
Memory is cheap
- Pregenerate stuff
- Store results in memory: APC, memcached
- Keep app config in memory: APC with apc.stat=0
- Increase RAM for MySQL: disk is the new tape
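"Store results in memory" usually takes the cache-aside form sketched below, assuming PHP 7.4+. `MemoryCache` and the league-table example are hypothetical; with APCu or the Memcached extension the array would be replaced by calls such as `apcu_fetch()`/`apcu_store()` or `Memcached::get()`/`Memcached::set()`, which also accept an expiry time.

```php
<?php
// Cache-aside: look the key up first and run the expensive computation only
// on a miss or after the entry expired. The array stands in for APC or
// memcached; the clock is injectable so expiry can be demonstrated.
final class MemoryCache
{
    /** @var array<string, array{0: mixed, 1: int}> value and expiry timestamp */
    private array $items = [];
    /** @var callable(): int */
    private $clock;

    public function __construct(?callable $clock = null)
    {
        $this->clock = $clock ?? 'time';
    }

    public function remember(string $key, int $ttlSeconds, callable $compute)
    {
        $now = ($this->clock)();
        if (isset($this->items[$key]) && $this->items[$key][1] > $now) {
            return $this->items[$key][0];          // hit: still fresh
        }
        $value = $compute();                       // miss or expired: recompute
        $this->items[$key] = [$value, $now + $ttlSeconds];
        return $value;
    }
}

$now = 1000;
$cache = new MemoryCache(function () use (&$now): int {
    return $now;
});
$calls = 0;
// Hypothetical expensive query, e.g. building a league table.
$buildTable = function () use (&$calls): array {
    $calls++;
    return ['Team A' => 3, 'Team B' => 1];
};

$cache->remember('table:8', 60, $buildTable);          // miss: computed
$cache->remember('table:8', 60, $buildTable);          // hit: served from memory
$now += 61;
$table = $cache->remember('table:8', 60, $buildTable); // expired: computed again
```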
Memcached to the rescue!
- Dead simple key-value store
- Distributed storage pool
- Automatic invalidation after X seconds
- No garbage collection invoked
- Stores arrays, objects, simple values
- Easy integration
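Memcached servers do not talk to each other; the distributed pool comes from the client choosing a server for each key. A sketch of consistent hashing (PHP 7.4+), one common way of making that choice so that losing a server only remaps the keys it held. The server names are hypothetical and `crc32()` is used for brevity; the PHP Memcached extension offers built-in consistent distribution by setting `Memcached::OPT_DISTRIBUTION` to `Memcached::DISTRIBUTION_CONSISTENT`.

```php
<?php
// Each server is placed on a ring at several points ("virtual nodes"); a key
// goes to the first server point at or after the key's own hash.
final class HashRing
{
    private const REPLICAS = 64;
    /** @var array<int, string> ring position => server */
    private array $ring = [];
    /** @var int[] sorted ring positions */
    private array $positions = [];

    public function __construct(array $servers)
    {
        foreach ($servers as $server) {
            for ($i = 0; $i < self::REPLICAS; $i++) {
                $this->ring[crc32("$server#$i")] = $server;
            }
        }
        $this->positions = array_keys($this->ring);
        sort($this->positions);
    }

    public function serverFor(string $key): string
    {
        $hash = crc32($key);
        // Binary search for the first position >= hash; wrap around past the end.
        $lo = 0;
        $hi = count($this->positions);
        while ($lo < $hi) {
            $mid = intdiv($lo + $hi, 2);
            if ($this->positions[$mid] < $hash) {
                $lo = $mid + 1;
            } else {
                $hi = $mid;
            }
        }
        $pos = $this->positions[$lo === count($this->positions) ? 0 : $lo];
        return $this->ring[$pos];
    }
}

$before = new HashRing(['cache1:11211', 'cache2:11211', 'cache3:11211']);
$after = new HashRing(['cache1:11211', 'cache2:11211']); // cache3 went down
$moved = 0;
for ($i = 0; $i < 1000; $i++) {
    $key = "match:$i";
    if ($before->serverFor($key) !== $after->serverFor($key)) {
        $moved++;
    }
}
echo "$moved of 1000 keys moved\n";
```

With plain `hash % serverCount` most keys would change servers when one node disappears; on the ring only the keys that lived on `cache3` move.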
Topic: CDN
Reverse proxy
- First line of cache
- Returns cached content while the resource is still up to date
- Works at the HTTP level
- Can be integrated into existing infrastructure
- Can do load balancing
- In-memory cache storage
- Squid, Nginx, Varnish
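A reverse proxy can only tell whether a resource is up to date if the origin says so. A sketch of the origin side, assuming PHP 7.4+: the response carries a freshness lifetime (`Cache-Control`) and a validator (`ETag`), and a conditional request with a matching `If-None-Match` gets a 304 with no body. The function name and the MD5-based ETag are assumptions.

```php
<?php
// Returns [status, headers, body]. While max-age has not elapsed a cache may
// serve its copy without asking; afterwards it can revalidate cheaply.
function conditionalResponse(string $body, ?string $ifNoneMatch, int $maxAge): array
{
    $etag = '"' . md5($body) . '"';
    $headers = [
        'ETag' => $etag,
        'Cache-Control' => 'public, max-age=' . $maxAge,
    ];
    if ($ifNoneMatch !== null) {
        // If-None-Match may list several tags; weak (W/"...") tags are not handled here.
        $tags = array_map('trim', explode(',', $ifNoneMatch));
        if (in_array($etag, $tags, true) || in_array('*', $tags, true)) {
            return [304, $headers, '']; // Not Modified: the cached copy is still valid
        }
    }
    return [200, $headers, $body];
}

// First request: full response. Revalidation with the same ETag: 304, empty body.
[$status, $headers] = conditionalResponse('<p>Score: 1-0</p>', null, 30);
$revalidation = conditionalResponse('<p>Score: 1-0</p>', $headers['ETag'], 30);
```

In a real script the status and headers would be sent with `http_response_code()` and `header()`.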
Content Delivery Network
- Worldwide network of servers
- Automatic load balancing
- Fast access (low ping time)
- Data redundancy for free
- Ideal for static resources, but not only
- A must-have for worldwide websites
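Serving static resources from a CDN often comes down to generating asset URLs that point at the CDN host instead of the origin. A minimal sketch; `cdnUrl()`, the host `static.example.com` and the version parameter are assumptions, the version being one common way to let edges cache files for a long time while still picking up new releases.

```php
<?php
// Templates call cdnUrl() instead of hard-coding paths, so browsers fetch
// static files from the CDN edge. Changing $version changes every URL.
function cdnUrl(string $path, string $cdnHost, string $version): string
{
    return 'https://' . $cdnHost . '/' . ltrim($path, '/') . '?v=' . rawurlencode($version);
}

echo cdnUrl('/css/main.css', 'static.example.com', '2015-01-15'), "\n";
// https://static.example.com/css/main.css?v=2015-01-15
```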
CDN as a reverse proxy
- HTTP request / response chain
- Embraces the REST architecture
- Requests are distributed
- Reduces latency
- Lowers traffic volume
- Increases availability
- e.g. Akamai Edge Suite
CDN at soccerway.com
- All content is served via the CDN: images, CSS, JS, generated HTML, JSON for Ajax
- 90% of traffic goes via the CDN
- Origin requests only from Europe
- Site stays online even if servers are down
- Can't live without it ;)
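Serving generated HTML and Ajax JSON through the CDN, not just static files, means the origin has to tell the edge how long each kind of response may be cached. A sketch of such a policy; the function name and all durations are hypothetical, not taken from the talk. `s-maxage` applies to shared caches such as CDN edges, `max-age` to browsers.

```php
<?php
// Maps a Content-Type to a Cache-Control value by prefix match.
function cacheControlFor(string $contentType): string
{
    $policies = [
        'image/'                 => 'public, max-age=2592000',        // 30 days
        'text/css'               => 'public, max-age=2592000',
        'application/javascript' => 'public, max-age=2592000',
        'text/html'              => 'public, max-age=0, s-maxage=60', // edge keeps HTML 1 min
        'application/json'       => 'public, max-age=0, s-maxage=5',  // near-live scores
    ];
    foreach ($policies as $prefix => $value) {
        if (strncmp($contentType, $prefix, strlen($prefix)) === 0) {
            return $value;
        }
    }
    return 'private, no-store'; // unknown types are not cached at all
}

echo cacheControlFor('text/html; charset=utf-8'), "\n";
```

In a page script the value would be sent with `header('Cache-Control: ' . cacheControlFor('text/html'));`.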
Thank you for listening
Questions?
Interested?
Contact me:
- [email protected]
- www.goldenline.pl/adam-brodziak
- www.linkedin.com/in/adambrodziak
We're hiring!
- Web developers
- Football / sport fans