Performance: Not an Afterthought DrupalSouth 2015
Jul 14, 2015
Nick Santamaria
• Senior Developer at Technocrat
• Acquia Certified Backend Developer
• drupal.org http://drupal.org/user/87915
• twitter @nicksanta
• Github nicksantamaria
Presentation Outline
• Introduction to performance & scalability
• Common problems
• Strategies for success
• Infrastructure design and considerations
• Debugging performance and scalability issues
• Q&A and discussion
Performance & Scalability
Performance: The speed with which a single request can be executed.
Scalability: The ability of a request to maintain its performance under increasing load.
What is Performance?
Back-end Performance Components
• PHP
  • Amount of code being executed (i.e. number of modules)
  • Efficiency of code
• Database
  • Schema design
  • Query execution time
What is Performance?
Back-end Performance Components
• API Requests
  • PHP will wait until the request returns a result or times out
• Caching
  • Drupal database
  • Memcached / Redis / MongoDB
  • Varnish
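Because PHP blocks until a remote API answers or times out, caching the response for a short period is one common way to keep that wait off most requests. The sketch below is a minimal in-memory illustration in Python, not a Drupal implementation: `TTLCache`, `slow_api_call` and the `exchange_rate` key are hypothetical names, and the dictionary stands in for a backend such as Memcached or Redis.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Tiny in-memory cache; a stand-in for Memcached or Redis in this sketch."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get_or_fetch(self, key: str, fetch: Callable[[], Any]) -> Any:
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: the slow call is skipped
        value = fetch()  # cache miss or expired entry: pay for the call once
        self._store[key] = (now + self.ttl_seconds, value)
        return value


calls = {"count": 0}


def slow_api_call() -> dict:
    """Hypothetical remote request; here it only counts invocations."""
    calls["count"] += 1
    return {"rate": 1.25}


cache = TTLCache(ttl_seconds=60)
first = cache.get_or_fetch("exchange_rate", slow_api_call)
second = cache.get_or_fetch("exchange_rate", slow_api_call)
print(calls["count"])  # the second lookup did not call the API again
```

The TTL is the trade-off to tune: a longer value saves more remote calls but serves staler data.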
What is Performance?
Front-end Performance Components
• Network Overhead
  • Local vs offshore datacenters
  • Number of requests
• Payload Size
  • Image optimisation
  • CSS / JS minification
  • Markup size & compression
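To get a feel for how much compression can shrink markup, the following Python sketch gzips a block of repetitive HTML and compares sizes. The markup is made up for illustration, and real pages will compress by different amounts.

```python
import gzip

# Made-up, repetitive markup; list-heavy pages often look like this.
items = "".join(f'<li class="item">Item {i}</li>' for i in range(200))
html = f"<ul>{items}</ul>".encode("utf-8")

compressed = gzip.compress(html)
ratio = len(compressed) / len(html)
print(f"{len(html)} bytes raw, {len(compressed)} bytes gzipped ({ratio:.0%} of original)")
```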
What is Performance?
Front-end Performance Components
• JavaScript
  • Number of scripts being included
  • Synchronous vs asynchronous execution
  • Code efficiency
What is Scalability?
“Why is scalability so hard? Because scalability cannot be an after-thought.” - Werner Vogels, Amazon CTO
What is Scalability?
A system is said to be scalable if adding resources results in proportionally increased performance.
Nine women cannot make a baby in one month.
Will doubling your site’s server resources double the traffic it can handle?
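One rough way to reason about that question is Amdahl's law, borrowed here as an analogy rather than taken from the talk: if only part of the work benefits from extra resources, the overall gain is capped by the part that does not. The function name and the fractions below are illustrative assumptions.

```python
def speedup(scalable_fraction: float, resource_multiplier: float) -> float:
    """Amdahl's law: overall speedup when only part of the work scales."""
    fixed_fraction = 1.0 - scalable_fraction
    return 1.0 / (fixed_fraction + scalable_fraction / resource_multiplier)


# Doubling resources for workloads where 100%, 90% and 50% of the work scales.
for fraction in (1.0, 0.9, 0.5):
    print(f"{fraction:.0%} scalable -> {speedup(fraction, 2):.2f}x")
```

Only the fully scalable case reaches 2x; anything held up by a shared bottleneck, such as a single database, falls short.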
What is Scalability?
Scalability Components
• Caching
  • Block cache
  • Page cache
  • Reverse proxy cache
  • Opcode caching
• Infrastructure
  • Web server load balancing
  • Database clustering
  • Caching backends - Redis, Memcached, etc.
Common Problems
Too many modules - AKA “Open Buffet Syndrome”
Real life example
• 365 enabled modules
  • 24 core modules
  • 51 custom modules
  • 72 exported features
• 750 files loaded on every request.
• 10 - 20% of PHP execution time was loading files, even with APC.
• CPU cycles wasted - 25,000+ calls to module_implements() per request.
Common Problems
Anonymous users with sessions
Seems innocent, but this one line has consequences.
• Pages with product/* paths are NEVER cached.
• Anonymous users who visit this page bypass page cache on all subsequent pages.
• … AND those visitors write to the database on every subsequent page view.
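The knock-on effect of giving an anonymous visitor a session can be modelled in a few lines. This Python sketch is a simplified simulation, not Drupal code: it assumes the page cache is skipped whenever a session cookie is present, and that session cookie names start with "SESS" (or "SSESS" over HTTPS). The cookie name and paths are hypothetical.

```python
from typing import Dict, Tuple

# Assumption: Drupal 7 session cookie names start with "SESS" ("SSESS" over HTTPS).
SESSION_PREFIXES = ("SESS", "SSESS")


def has_session(cookies: Dict[str, str]) -> bool:
    return any(name.startswith(SESSION_PREFIXES) for name in cookies)


def visit(path: str, cookies: Dict[str, str]) -> Tuple[str, Dict[str, str]]:
    """Return (page cache status, cookies the browser holds afterwards)."""
    if path.startswith("/product/"):
        # Stand-in for the "one line": writing to the session gives the
        # anonymous visitor a session cookie (hypothetical cookie name).
        return "MISS", dict(cookies, SESSabc123="session-id")
    if has_session(cookies):
        return "MISS", cookies  # a session cookie means the page cache is skipped
    return "HIT", cookies


cookies: Dict[str, str] = {}
before, cookies = visit("/about", cookies)
product, cookies = visit("/product/42", cookies)
after, cookies = visit("/about", cookies)
print(before, product, after)
```

The same /about page is served from cache before the product visit and rebuilt after it, which is the behaviour the bullets above describe.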
Common Problems
Complicated entity & field architecture
• Slows down form submission, rendering, views, and more.
Strategies for Success
Complicated entity & field architecture
• How many INSERT queries per save?
  • node
  • node_revision
  • field_collection_item
  • field_collection_item_revision
  • field_data_field_collection_b
  • field_revision_field_collection_b
  • field_data_field_taxonomy_ref
  • field_revision_field_taxonomy_ref
  • field_data_field_collection_c
  • field_revision_field_collection_c
  • field_data_field_text
  • field_revision_field_text
  • file_managed
  • field_data_field_media
  • field_revision_field_media
Real world field collection implementations are FAR more complicated than this example!
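The table list above follows a pattern that can be estimated before building anything: each revisioned entity writes a base and a revision table, and each field writes a data and a revision table. The Python sketch below reproduces the example as a lower bound. The helper name is hypothetical, and multi-value fields add a row per value on top of this.

```python
from typing import Iterable, List


def tables_for_save(entity_tables: Iterable[str], field_names: Iterable[str],
                    extra_tables: Iterable[str] = ()) -> List[str]:
    """List the tables written when saving, assuming revisions are enabled."""
    tables: List[str] = []
    for base in entity_tables:
        tables += [base, f"{base}_revision"]
    for field in field_names:
        tables += [f"field_data_{field}", f"field_revision_{field}"]
    tables += list(extra_tables)
    return tables


tables = tables_for_save(
    entity_tables=["node", "field_collection_item"],
    field_names=["field_collection_b", "field_taxonomy_ref",
                 "field_collection_c", "field_text", "field_media"],
    extra_tables=["file_managed"],
)
print(len(tables), "tables written per save, at minimum")
```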
Common Problems
Others
• Never use the views_php module - create custom Views handlers and plugins instead.
• Complex faceted search using the Drupal database - use Solr instead.
• dblog module enabled on production - use syslog instead.
• Carefully consider use of modules with node access functionality - they disable block caching.
Strategies for Success
On-Demand Cache Purging
• Planning
  • Divide the site into page “types”.
  • For each type, build a list of events which would require a page to be cleared from cache.
• Considerations
  • No relative dates, i.e. “time ago”.
  • Some page types may be more suited to periodic caching.
• Create a spidering script to warm the caches!
• Extend to other caches using CacheTags - drupal.org/project/cachetags
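The planning step for on-demand purging (page types, each with a list of events that invalidate it) maps naturally onto a lookup table. The Python sketch below shows one possible shape for that plan; the page types and event names are hypothetical, and a real site would derive them from its own content model.

```python
from typing import Dict, List

# Hypothetical page types and the events that should clear them from cache.
EVENTS_BY_PAGE_TYPE: Dict[str, List[str]] = {
    "front_page": ["article_published", "menu_changed"],
    "article_page": ["article_updated", "comment_published", "menu_changed"],
    "article_listing": ["article_published", "article_updated"],
}


def page_types_to_purge(event: str) -> List[str]:
    """Return the page types whose cached copies an event invalidates."""
    return sorted(
        page_type
        for page_type, events in EVENTS_BY_PAGE_TYPE.items()
        if event in events
    )


to_purge = page_types_to_purge("article_published")
print(to_purge)
```

Writing the plan down as data makes gaps visible: a page type with an empty event list is a candidate for periodic caching instead.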
Strategies for Success
Authcache (2.x branch)
• Replaces Drupal’s default page caching, allowing you to cache authenticated pages.
• Huge scalability improvements for sites with a large proportion of authenticated visitors.
• But also much, much more.
  • Personalisation - authcache_p13n
  • Form token magic - authcache_form
  • Store page cache in Varnish - authcache_varnish
  • Integrates with Cache Expiration
Strategies for Success
Authcache
• Planning
  • Define which page types are cacheable.
  • Design how you will segment your visitors (from a cache perspective).
  • Identify all personalised information which must be displayed.
• Considerations
  • Forms can be tricky - ensure you test thoroughly.
  • Ensure your analytics / marketing / tracking services are compatible.
• See Commerce Kickstart for a great out-of-the-box implementation.
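Segmenting visitors "from a cache perspective" usually means deciding which visitors may share one cached copy of a page. The Python sketch below illustrates the idea by keying the cache on the visitor's set of roles plus the path. This is an assumed segmentation scheme for illustration, not a reproduction of how Authcache builds its keys, and the role names are hypothetical.

```python
import hashlib
from typing import Iterable


def cache_key(path: str, roles: Iterable[str]) -> str:
    """Build a page cache key shared by all visitors with the same roles."""
    segment = ",".join(sorted(set(roles)))  # role order must not matter
    return hashlib.sha256(f"{segment}|{path}".encode("utf-8")).hexdigest()


alice = cache_key("/dashboard", ["authenticated", "editor"])
bob = cache_key("/dashboard", ["editor", "authenticated"])
carol = cache_key("/dashboard", ["authenticated"])
print(alice == bob, alice == carol)
```

The fewer segments there are, the higher the hit rate; anything that differs per user within a segment has to be loaded separately as personalised content.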
Strategies for Success
Consuming Feeds & Web Services
• Regularly importing data into Drupal can be resource intensive.
• Feeds, Migrate, custom PHP, etc. all share the same fundamental problems:
  • Fetching large datasets, which hog I/O, memory, and CPU cycles.
  • Lots of slow INSERT and UPDATE operations on the database.
  • New data will not display immediately unless caches are cleared.
• The solution? Move to the front end!
Strategies for Success
Consuming Feeds & Web Services
• PaRSS - drupal.org/project/parss
  • Integrates simple jQuery RSS parser with link fields.
• AngularJS - angularjs.org
  • Very powerful front-end MVC framework.
  • Usual implementation may not be suitable for this problem.
• Angular Blocks - drupal.org/node/2445795
  • Allows other modules to expose AngularJS apps as blocks!
  • Used successfully on a recent intranet project, with some pages having 6 Angular apps on a single page.
Strategies for Success
Load Testing
• Make it part of your development process.
• Don't leave it to the last minute or post-launch.
• Tools
  • Apache JMeter
    • github.com/jacobSingh/Drupal-Performance-Testing-Suite
  • Blazemeter - blazemeter.com
  • Blitz - blitz.io
  • Web Page Test - webpagetest.org
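The tools above do far more, but the core of a load test is small: fire requests at a fixed concurrency and record how long each takes. The Python sketch below shows that loop with a simulated request so it runs without a network. `run_load` and `fake_request` are hypothetical names, and a real test would replace `fake_request` with an HTTP call against a staging site.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import median
from typing import Callable, List


def run_load(request: Callable[[], None], total_requests: int, concurrency: int) -> List[float]:
    """Run the request callable concurrently and return per-request latencies in seconds."""
    def timed(_: int) -> float:
        start = time.perf_counter()
        request()
        return time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(timed, range(total_requests)))


def fake_request() -> None:
    """Simulated page load; stands in for a real HTTP request."""
    time.sleep(0.01)


latencies = run_load(fake_request, total_requests=20, concurrency=5)
summary = {
    "requests": len(latencies),
    "median_s": median(latencies),
    "max_s": max(latencies),
}
print(summary)
```

Running the same script against each build makes regressions show up as a change in the median or maximum, which fits the advice to test throughout development.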
Strategies for Success
Queues
• Use queues when dealing with:
  • Batch processing large datasets.
  • Performing complex calculations.
  • Sequential processing of tasks.
• Modules / Tools
  • Advanced Queue - drupal.org/project/advancedqueue
  • Advanced Queue Runner - github.com/nvahalik/advancedqueue-runner
  • Drupal Core Queues - system.queue.inc
Strategies for Success
Queues
• Improves reliability.
• If not using queues
  • There is no guarantee the process will be completed.
  • If the process fails, there is no easy way to repeat it.
• If using queues
  • Each item is executed at least once.
  • If the process fails, the queue remains intact.
  • System load is stabilised because processing of complex or heavy operations is delayed.
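The "at least once" and "queue remains intact" properties come from removing an item only after its work has succeeded. The Python sketch below demonstrates that with an in-memory deque. It is a simplified illustration rather than Drupal's queue API, and `run_queue` and `flaky_worker` are hypothetical names.

```python
from collections import deque
from typing import Callable, Deque, Dict, List


def run_queue(queue: Deque[str], worker: Callable[[str], None]) -> int:
    """Process items in order; an item is removed only after the worker succeeds."""
    processed = 0
    while queue:
        item = queue[0]   # look at the next item without removing it
        worker(item)      # may raise; the item is then still in the queue
        queue.popleft()   # delete only after success
        processed += 1
    return processed


attempts: Dict[str, int] = {}
done: List[str] = []


def flaky_worker(item: str) -> None:
    """Hypothetical task that fails the first time it sees item 'b'."""
    attempts[item] = attempts.get(item, 0) + 1
    if item == "b" and attempts[item] == 1:
        raise RuntimeError("temporary failure")
    done.append(item)


queue: Deque[str] = deque(["a", "b", "c"])
try:
    run_queue(queue, flaky_worker)
except RuntimeError:
    pass  # the failed item and everything after it are still queued

remaining_after_failure = list(queue)
run_queue(queue, flaky_worker)  # a later run picks up where the first stopped
print(remaining_after_failure, done, attempts)
```

Because a failed item can run again, workers should be written so that repeating them is harmless.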
Strategies for Success
Optimised Front-end
• Image Sprites
  • Minimises the number of HTTP requests.
• CSS
  • Think about what your Sass / Less becomes once compiled.
    • How complex and specific do the selectors become?
  • Consider architecting your CSS for conditional inclusion.
    • Does the site have “sections”?
  • CSS rendering is a blocking process.
Strategies for Success
Optimised Front-end
• Asynchronous JavaScript - drupal.org/project/async_js
  • Defers JavaScript execution.
  • Can improve responsiveness of “sluggish” JS-heavy sites.
• Advanced Aggregation - drupal.org/project/advagg
  • Use CDN version of jQuery.
  • On-demand generation of aggregated assets.
Strategies for Success
Other Recommendations
• Elysia Cron - drupal.org/project/elysia_cron
  • Configure scheduling and frequency of specific cron tasks.
  • Run heavy cron tasks during low traffic periods.
• Entity Cache - drupal.org/project/entitycache
  • Stores complete entity objects in your caching backend.
  • Enable appropriate dependent modules such as commerce_entitycache, bean_entitycache, etc.
• Apache Solr for search
  • drupal.org/project/search_api_solr
  • drupal.org/project/apachesolr
Infrastructure
Caching Backends
• Memcached - drupal.org/project/memcache
  • Battle tested.
  • Widely deployed.
  • Volatile storage - not suitable for persistent data.
• Redis - drupal.org/project/redis
  • Less “mature” than Memcached.
  • 1:1 featureset with Memcached.
  • Benchmarks slightly better than Memcached.
  • Commits data to disk by default, can be used for persistent data.
  • Use PHP extension - github.com/phpredis/phpredis (not Predis class)
Infrastructure
Caching Backends
• I recommend Redis
• Store sessions in Redis rather than the database
  • Session Proxy - drupal.org/project/session_proxy
• Form cache can go straight into Redis - no more need for this line:
  $conf['cache_class_cache_form'] = 'DrupalDatabaseCache';
Infrastructure
Simplest Approach
• Single server with all components
  • PHP
  • Web Server (Apache)
  • Database (MySQL)
  • Varnish (... sometimes)
[Diagram: Instance #1 running Varnish, Apache, PHP and MySQL]
Infrastructure
Scaling Vertically
• Increase instance size.
• Change instance types:
  • CPU optimised
  • Memory optimised
  • I/O optimised
• Will hit a ceiling eventually.
“We’re going to need a bigger box”
Infrastructure
Splitting the box
Break up stack components onto separate servers.
[Diagram: Varnish, Apache, PHP and MySQL split across Instance #1 and Instance #2]
Infrastructure
Horizontally Scalable Infrastructure
• Overcomes CPU ceiling issues.
• Considerations
  • Load balanced web servers
  • Database clustering
  • Shared / clustered file systems
• Autoscaling - the holy grail.
[Diagram: Load Balancer, Varnish, four Apache + PHP web servers, MySQL, Redis]
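Load balancing across identical web servers can be as simple as handing each request to the next server in turn. The Python sketch below shows round-robin distribution, one of several strategies and not necessarily the one used in the talk's infrastructure. The server names are hypothetical.

```python
from collections import Counter
from itertools import cycle

# Hypothetical pool of identical Apache + PHP web servers.
web_servers = ["web1", "web2", "web3", "web4"]
next_server = cycle(web_servers)

# Assign eight incoming requests to servers in rotation.
assignments = [next(next_server) for _ in range(8)]
load = Counter(assignments)
print(assignments)
print(dict(load))
```

This only works if any server can handle any request, which is why sessions, files and caches have to live in shared services rather than on an individual web server.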
Debugging Performance and Scalability Issues
Tools
• New Relic APM, browser & server monitoring
• MySQL slow query log
  • Add the following lines to my.cnf and restart MySQL:
    • log_slow_queries=/var/log/mysql/slow-query.log
    • long_query_time=20
  • On MySQL 5.6 and later, log_slow_queries has been removed; use slow_query_log=1 and slow_query_log_file=/var/log/mysql/slow-query.log instead.
• XHProf - PHP profiler
  • Great slides for getting set up here - http://msonnabaum.github.io/xhprof-presentation/
• Browser Developer Tools
  • JavaScript profiler
  • Network Monitor
Debugging Performance and Scalability Issues
General Tips• Look beyond the symptoms to find the underlying cause.
• Change one thing at a time.
• Measure, change, measure.
• Sometimes you just have to throw more RAM at the problem.