NGINX High Availability and Monitoring

Introduced by Andrew Alexeev

Presented by Owen Garrett

Nginx, Inc.

About this webinar

No one likes a broken website. Learn about some of the techniques that NGINX users employ to ensure that server failures are detected and worked around, so that you too can build large-scale, highly-available web services.

The cost of downtime

The causes of downtime

“Through 2015, 80% of outages impacting mission-critical services will be caused by people and process issues, and more than 50% of those outages will be caused by change/configuration/release integration and hand-off issues.”

Configuration Management for Virtual and Cloud Infrastructures

Ronni J. Colville and George Spafford, Gartner

Hardware failures, disasters

People and Process

INTRODUCING NGINX…

What is NGINX?

Internet

HTTP traffic

Web Server: Serve content from disk

Application Server: FastCGI, uWSGI, Passenger…

Proxy: Caching, Load Balancing…

Application Acceleration

SSL and SPDY termination

Performance Monitoring

High Availability

Advanced Features: Bandwidth Management

Content-based Routing

Request Manipulation

Response Rewriting

Authentication

Video Delivery

Mail Proxy

GeoLocation

NGINX Accelerates

143,000,000 websites

22% of the top 1 million websites

37% of the top 1,000 websites

NGINX and NGINX Plus

NGINX F/OSS

nginx.org

3rd party modules

Large community of >100 modules

NGINX Plus

• Advanced load balancing features
• Ease-of-management
• Commercial support

IMPROVING AVAILABILITY WITH NGINX

Quick review of load balancing

server {
    listen 80;

    location / {
        proxy_pass http://backend;
    }
}

upstream backend {
    server webserver1:80;
}
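The review above lists a single upstream server. As a rough sketch of how a group with several members might look, assuming hypothetical hosts webserver2 and webserver3 (not named in the slides) and an illustrative weight, placed inside the http context:

```nginx
# Sketch only: webserver2 and webserver3 are hypothetical hosts.
upstream backend {
    server webserver1:80;
    server webserver2:80 weight=2;  # receives roughly twice the share of requests
    server webserver3:80;
}

server {
    listen 80;

    location / {
        # Requests are spread across the group (weighted round-robin by default)
        proxy_pass http://backend;
    }
}
```

With more than one server in the group, the error checks and health checks described next have somewhere else to send traffic when a server fails.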

Three NGINX Techniques for High Availability

1. NGINX: Basic Error Checks
2. NGINX Plus: Advanced Health Checks
3. Live software upgrades

1. Basic Error Checks

• Monitor transactions as they happen

– Retry transactions that ‘fail’ where possible

– Mark failed servers as dead

Basic Error Checks

server {
    listen 80;

    location / {
        proxy_pass http://backend;
        proxy_next_upstream error timeout;  # other options: http_503, ..., off
    }
}

upstream backend {
    server webserver1:80 max_fails=1 fail_timeout=10s;
}
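As a slightly fuller sketch of the same passive checks, the fragment below retries on more conditions and keeps a spare server in reserve. The hosts webserver2 and spare1 and all of the numeric values are assumptions chosen for illustration, not recommendations from the webinar:

```nginx
# Sketch only: webserver2, spare1, and the thresholds are hypothetical.
upstream backend {
    # After 3 failed attempts within 30s, skip the server for 30s
    server webserver1:80 max_fails=3 fail_timeout=30s;
    server webserver2:80 max_fails=3 fail_timeout=30s;
    # Used only when the primary servers are unavailable
    server spare1:80 backup;
}

server {
    listen 80;

    location / {
        proxy_pass http://backend;
        # Give up on an unresponsive server quickly
        proxy_connect_timeout 2s;
        # Retry on connection errors, timeouts, and 502/503 responses
        proxy_next_upstream error timeout http_502 http_503;
    }
}
```

Note that fail_timeout plays two roles: it is the window in which failures are counted and also the period for which the server is then considered unavailable.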

More sophisticated retries

server {
    listen 80;

    location / {
        # On error/timeout, try the upstream group one more time
        error_page 502 504 = @fallback;
        proxy_pass http://backend;
        proxy_next_upstream off;
    }

    location @fallback {
        proxy_pass http://backend;
        proxy_next_upstream off;
    }
}

2. Advanced Health Checks

• “Synthetic Transactions”

– Probes server health

– Complex, custom tests are possible

– Available in NGINX Plus

Advanced Health Checks

server {
    listen 80;

    location / {
        proxy_pass http://backend;
        health_check;
    }
}

upstream backend {
    zone backend 64k;
    server webserver1:80;
}

health_check parameters:
• interval = period between checks
• fails = failure count before dead
• passes = pass count before alive
• uri = custom URI

Default: 5 seconds, 1 fail, 1 pass, uri = /
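To make the parameters concrete, here is a minimal sketch that overrides each default. The /healthz endpoint and the chosen values are assumptions for illustration, and the health_check directive requires NGINX Plus:

```nginx
# Sketch only: /healthz is a hypothetical endpoint; requires NGINX Plus.
upstream backend {
    zone backend 64k;          # shared memory zone, needed for health checks
    server webserver1:80;
}

server {
    listen 80;

    location / {
        proxy_pass http://backend;
        # Probe every 10 seconds; 3 failures mark the server dead,
        # 2 consecutive passes bring it back
        health_check interval=10 fails=3 passes=2 uri=/healthz;
    }
}
```

Requiring more than one failure or pass trades detection speed for stability, which may suit servers that occasionally answer slowly.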

Advanced usage

server {
    listen 80;

    location / {
        proxy_pass http://backend;
        health_check uri=/test.php match=statusok;
        proxy_set_header Host www.foo.com;
    }
}

match statusok {
    # Used for /test.php health check
    status 200;
    header Content-Type = text/html;
    body ~ "Server[0-9]+ is alive";
}

Health checks inherit all parameters from the location block.

match blocks define the success criteria for a health check.

Edge cases – variables in configuration

server {
    location / {
        proxy_pass http://backend;
        health_check;
        proxy_set_header Host $host;
    }
}

This may not work as expected. Remember: the health_check tests run in the context of the enclosing location, but they are not triggered by a client request, so a request-derived variable such as $host may not hold a useful value.

server {
    location /internal-check {
        internal;
        proxy_pass http://backend;
        health_check;
        proxy_set_header Host www.foo.com;
    }
}

This is the common alternative:
• Use a custom URI for the location.
• Tag the location as internal.
• Set headers manually.
• Useful for authentication.

Examples of using health checks

• Verify that pages don’t contain errors
• Run internal tests (e.g. test.php => DB connect)
• Managed removal of servers
    $ touch $DOCROOT/isactive.txt
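The managed-removal idea can be sketched as a health check that probes for the marker file. The match name isactive is an assumption, and the sketch presumes the backend upstream group (with its zone) is defined as in the earlier examples:

```nginx
# Sketch only: the match name "isactive" is hypothetical; requires NGINX Plus.
match isactive {
    status 200;   # marker file present: keep the server in rotation
}

server {
    listen 80;

    location / {
        proxy_pass http://backend;
        # Each server is probed for the marker file in its document root
        health_check uri=/isactive.txt match=isactive;
    }
}
```

Under this arrangement, deleting isactive.txt on a server should make its check fail (typically with a 404), so NGINX Plus stops sending it new requests; touching the file again brings it back once the check passes.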

Advantages of ‘Health Checks’

• Run tests asynchronously (find errors faster)

• Custom tests (not related to ‘real’ traffic)

• More flexibility to specify success/error

MORE NGINX PLUS FEATURES…

Slow start

• When a server recovers (as detected by basic error checks or advanced health checks), slow start returns it to service gradually:

upstream backends {
    zone backends 64k;
    server webserver1 slow_start=30s;
}

NGINX Plus status monitoring

http://demo.nginx.com/ (web) and http://demo.nginx.com/status (JSON)

• Total data and connections
• Current data and conns.
• Split per ‘server zone’
• Cache statistics
• Upstream statistics: traffic, health and error status
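A minimal sketch of how the status output might be enabled, assuming the status module that shipped with NGINX Plus releases of this era; later releases expose similar data through a different API module, so check the documentation for your version. The zone name and listen address are illustrative assumptions:

```nginx
# Sketch only: zone name and listen address are hypothetical.
server {
    listen 80;
    status_zone frontend;      # collect statistics for this 'server zone'

    location / {
        proxy_pass http://backend;
    }
}

server {
    listen 127.0.0.1:8080;     # keep status data off the public interface

    location /status {
        status;                # JSON status output
    }
}
```

Upstream statistics are reported for groups that declare a shared memory zone, as in the health check examples.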

3. Live software upgrades

• Upgrade your NGINX binary on-the-fly

– No downtime

– No dropped connections

No downtime – ever!

• Reload configuration with SIGHUP
    # nginx -s reload
• Re-exec binary with copy-and-signal
    http://nginx.org/en/docs/control.html#upgrade

[Diagram: NGINX parent process and NGINX workers]

In summary...

NGINX F/OSS:
• Basic error checks and retry logic
• On-the-fly upgrades

NGINX Plus:
• Advanced health checks + slow start
• Extended status monitoring

Compared to other load balancers and ADCs, NGINX Plus is uniquely well-suited to a devops-driven environment.

Closing thoughts

• 37% of the busiest websites use NGINX
    – In most situations, it’s a drop-in extension

• Check out the blogs on nginx.com

• Future webinars: nginx.com/webinars

Try NGINX F/OSS (nginx.org) or NGINX Plus (nginx.com)
