NGINX High Availability and Monitoring
Introduced by Andrew Alexeev
Presented by Owen Garrett
Nginx, Inc.
Jul 16, 2015
About this webinar
No one likes a broken website. Learn about some of the techniques that NGINX
users employ to ensure that server failures are detected and worked around, so that
you too can build large-scale, highly-available web services.
The cost of downtime
The causes of downtime
“Through 2015, 80% of outages impacting mission-critical services will be caused by people and process issues, and more than 50% of those outages will be caused by change/configuration/release integration and hand-off issues.”
Configuration Management for Virtual and Cloud Infrastructures
Ronni J. Colville and George Spafford, Gartner
Hardware failures, disasters
People and Process
INTRODUCING NGINX…
What is NGINX?
Internet → NGINX (HTTP traffic)
Web Server: Serve content from disk
Application Server: FastCGI, uWSGI, Passenger…
Proxy: Caching, Load Balancing…
Application Acceleration
SSL and SPDY termination
Performance Monitoring
High Availability
Advanced Features:
Bandwidth Management
Content-based Routing
Request Manipulation
Response Rewriting
Authentication
Video Delivery
Mail Proxy
GeoLocation
NGINX Accelerates
143,000,000 websites
22% of the top 1 million websites
37% of the top 1,000 websites
NGINX and NGINX Plus
NGINX F/OSS
nginx.org
3rd party modules
Large community of >100 modules
NGINX Plus
Advanced load balancing features
Ease-of-management
Commercial support
IMPROVING AVAILABILITY WITH NGINX
Quick review of load balancing
server {
    listen 80;
    location / {
        proxy_pass http://backend;
    }
}
upstream backend {
    server webserver1:80;
    server webserver2:80;
    server webserver3:80;
    server webserver4:80;
}
Three NGINX Techniques for High Availability
1. NGINX: Basic Error Checks
2. NGINX Plus: Advanced Health Checks
3. Live software upgrades
1. Basic Error Checks
• Monitor transactions as they happen
– Retry transactions that ‘fail’ where possible
– Mark failed servers as dead
Basic Error Checks
server {
    listen 80;
    location / {
        proxy_pass http://backend;
        proxy_next_upstream error timeout; # also: http_503, ..., off
    }
}
upstream backend {
    server webserver1:80 max_fails=1 fail_timeout=10s;
    server webserver2:80 max_fails=1 fail_timeout=10s;
    server webserver3:80 max_fails=1 fail_timeout=10s;
    server webserver4:80 max_fails=1 fail_timeout=10s;
}
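The comment on proxy_next_upstream hints at further retry conditions. As a rough sketch only, the configuration below adds a few 5xx conditions, a connect timeout, and a cap on retries. It assumes NGINX 1.7.5 or later (for proxy_next_upstream_tries); all timeout and threshold values are illustrative assumptions, not recommendations from the webinar.

```nginx
upstream backend {
    # Assumed values: after 3 failed attempts within 30s, treat the server
    # as unavailable for the next 30s.
    server webserver1:80 max_fails=3 fail_timeout=30s;
    server webserver2:80 max_fails=3 fail_timeout=30s;
}

server {
    listen 80;
    location / {
        proxy_pass http://backend;
        # Give up quickly on a server that does not accept the connection.
        proxy_connect_timeout 2s;
        # Retry on connection errors, timeouts, and selected 5xx responses.
        proxy_next_upstream error timeout http_502 http_503 http_504;
        # Cap the number of tries per request (0 would mean no limit).
        proxy_next_upstream_tries 2;
    }
}
```

The conditions listed in proxy_next_upstream also determine what counts as a failed attempt for max_fails, so adding http_503 here makes 503 responses count against a server.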
More sophisticated retries
server {
    listen 80;
    location / {
        # On error/timeout, try the upstream group one more time
        error_page 502 504 = @fallback;
        proxy_pass http://backend;
        proxy_next_upstream off;
    }
    location @fallback {
        proxy_pass http://backend;
        proxy_next_upstream off;
    }
}
2. Advanced Health Checks
• “Synthetic Transactions”
– Probes server health
– Complex, custom tests are possible
– Available in NGINX Plus
Advanced Health Checks
server {
    listen 80;
    location / {
        proxy_pass http://backend;
        health_check;
    }
}
upstream backend {
    zone backend 64k;
    server webserver1:80;
    server webserver2:80;
    server webserver3:80;
    server webserver4:80;
}
health_check parameters:
interval = period between checks
fails = failure count before dead
passes = pass count before alive
uri = custom URI
Default: 5 seconds, 1 fail, 1 pass, uri = /
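To make the parameters concrete, here is a minimal sketch that overrides each default. The values and the /healthz URI are hypothetical, chosen only for illustration; health_check requires NGINX Plus and an upstream group with a shared memory zone.

```nginx
upstream backend {
    zone backend 64k;          # shared memory zone, required for health checks
    server webserver1:80;
    server webserver2:80;
}

server {
    listen 80;
    location / {
        proxy_pass http://backend;
        # Probe every 10s. Three consecutive failures mark a server unhealthy;
        # two consecutive passes bring it back. /healthz is a hypothetical URI.
        health_check interval=10 fails=3 passes=2 uri=/healthz;
    }
}
```

Raising fails and passes above 1 trades detection speed for stability: a single slow response no longer removes a server, but a genuinely dead server takes longer to be taken out of rotation.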
Advanced usage
server {
    listen 80;
    location / {
        proxy_pass http://backend;
        health_check uri=/test.php match=statusok;
        proxy_set_header Host www.foo.com;
    }
}
match statusok {
    # Used for /test.php health check
    status 200;
    header Content-Type = text/html;
    body ~ "Server[0-9]+ is alive";
}
Health checks inherit all parameters from the location block.
match blocks define the success criteria for a health check.
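A match block can also use status ranges and negated conditions. The sketch below is one possible variant; the block name server_ok and the "maintenance mode" string are assumptions for illustration. The match block belongs in the http context, alongside the server block, and the upstream group named backend is assumed to be defined with a zone as in the earlier example.

```nginx
match server_ok {
    # Accept any 2xx or 3xx response...
    status 200-399;
    # ...as long as the body does not mention maintenance mode.
    body !~ "maintenance mode";
}

server {
    listen 80;
    location / {
        proxy_pass http://backend;
        health_check match=server_ok;
    }
}
```

All conditions in a match block must hold for the check to pass, so each added line makes the test stricter.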
Edge cases – variables in configuration
server {
    location / {
        proxy_pass http://backend;
        health_check;
        proxy_set_header Host $host;
    }
}
This may not work as expected.
Remember – the health_check tests run in the context of the enclosing location.
This is the common alternative:
server {
    location /internal-check {
        internal;
        proxy_pass http://backend;
        health_check;
        proxy_set_header Host www.foo.com;
    }
}
• Use a custom URI for the location.
• Tag the location as internal.
• Set headers manually.
• Useful for authentication.
Examples of using health checks
• Verify that pages don’t contain errors
• Run internal tests (e.g. test.php => DB connect)
• Managed removal of servers
  $ touch $DOCROOT/isactive.txt
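The managed-removal idea can be sketched as a health check that probes for the marker file. This assumes each backend serves static files from $DOCROOT, so that the file's presence yields a 200 and its absence a 404; the match block name isactive is hypothetical, and the upstream group named backend is assumed to be defined with a zone as in the earlier slides.

```nginx
match isactive {
    # The server stays in rotation only while the marker file is served.
    status 200;
}

server {
    listen 80;
    location / {
        proxy_pass http://backend;
        # Probe the marker file created with: touch $DOCROOT/isactive.txt
        health_check uri=/isactive.txt match=isactive;
    }
}
```

Under these assumptions, deleting isactive.txt on a backend makes its next health check fail, so NGINX Plus stops sending it new requests without any configuration change; re-creating the file brings the server back.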
Advantages of ‘Health Checks’
• Run tests asynchronously (find errors faster)
• Custom tests (not related to ‘real’ traffic)
• More flexibility to specify success/error
MORE NGINX PLUS FEATURES…
Slow start
• When a server recovers from failed basic error checks or advanced health checks, ramp its traffic up gradually:
upstream backends {
    zone backends 64k;
    server webserver1 slow_start=30s;
}
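A slightly fuller sketch of slow start, combined with a health check. It uses two servers because NGINX ignores slow_start (along with max_fails and fail_timeout) when an upstream group contains only a single server; the second server name is an assumption added for illustration.

```nginx
upstream backends {
    zone backends 64k;
    # After recovery, each server's share of traffic ramps up over 30s.
    server webserver1:80 slow_start=30s;
    server webserver2:80 slow_start=30s;
}

server {
    listen 80;
    location / {
        proxy_pass http://backends;
        health_check;   # NGINX Plus: decides when a server counts as recovered
    }
}
```

Note that slow_start cannot be combined with the hash or ip_hash load-balancing methods.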
NGINX Plus status monitoring
http://demo.nginx.com/ (web) and http://demo.nginx.com/status (JSON)
Total data and connections
Current data and connections
Split per ‘server zone’
Cache statistics
Upstream statistics: traffic, health and error status
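As a hedged sketch of how such a status endpoint might be enabled with the NGINX Plus status module available at the time of this webinar: the port, the zone name frontend, and the access rules below are all assumptions. Later NGINX Plus releases replaced the status directive with an API module, so check the documentation for your version.

```nginx
upstream backend {
    zone backend 64k;          # shared memory zone, needed for upstream statistics
    server webserver1:80;
    server webserver2:80;
}

server {
    listen 80;
    status_zone frontend;      # hypothetical name; stats are split per 'server zone'
    location / {
        proxy_pass http://backend;
    }
}

server {
    listen 8080;               # assumed port for the status endpoint
    location = /status {
        status;                # serves the JSON status data
        allow 127.0.0.1;       # restrict access; adjust for your network
        deny all;
    }
}
```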
3. Live software upgrades
• Upgrade your NGINX binary on-the-fly
– No downtime
– No dropped connections
No downtime – ever!
• Reload configuration with SIGHUP
  # nginx -s reload
• Re-exec binary with copy-and-signal
  http://nginx.org/en/docs/control.html#upgrade
NGINX parent process → NGINX workers
In summary...
NGINX F/OSS: Basic error checks and retry logic; on-the-fly upgrades
NGINX Plus: Advanced health checks + slow start; extended status monitoring
Compared to other load balancers and ADCs, NGINX Plus is uniquely well-suited to a devops-driven environment.
Closing thoughts
• 37% of the busiest websites use NGINX
  – In most situations, it’s a drop-in extension
• Check out the blogs on nginx.com
• Future webinars: nginx.com/webinars
Try NGINX F/OSS (nginx.org) or NGINX Plus (nginx.com)