Nginx conference 2015

Post on 17-Feb-2017


Transcript

Move Over IBM WebSeal and F5 BigIP, Here Comes NGINX

09/23/2015

#nginx #nginxconf2

Advisory IT Specialist at ING Bank N.V.

Bart Warmerdam

Who is ING globally

3

Who is ING in the Netherlands

4

• Bank with diverse software and hardware landscape
• Cost driven IT
• Traditional software development: design, build, test, implement
• Software strategy: buy before build
• Middleware strategy: buy
• Hardware strategy: appliance

History up to 2.5 years ago within ING

5

• Bank with diverse software and hardware landscape
• IT and Time-to-Market are important
• 60 scrum teams internally working on software
• Software strategy: build before buy (a lot of the time)
• Middleware strategy: buy, but…
• Hardware strategy: standard scalable stacks

From 2.5 years ago up to now

6

Complex IT landscape

Task: simplify IT

Add missing functionality

7

• Internet facing reverse proxies (IBM TAM WebSeal)
  Authenticating proxy
  Content caching and compression
  Cookie jar functionality

• Multiple layers of load balancers (F5 BigIP)
  Over data centers
  Over nodes in different network zones

For all internet facing domains of domestic banking Netherlands

Infrastructure to replace

8

• Investigate open source software: NGINX or Apache vs IBM WebSeal / F5
• Perform a proof of concept with NGINX for Authentication and Event Publishing
• Write a report for the deciding architects, which concluded after the proof of concept:

Replace IBM TAM WebSeal with NGINX using custom modules
Integrate the layers of F5 BigIPs with NGINX

The result: “GO!” Now we are more in control than ever.

The Plan to Simplify

9

Starting with

10

[Diagram: current setup. Tier 1 (dmz): F5 load balancer → IBM WebSeal, with External Authentication Interface and Policy Mgr LDAP. Tier 2: F5 load balancers → Applications. Inter Connectivity Cloud between the DCs.]

Working towards

11

[Diagram: target setup. Tier 1 (dmz): F5 load balancer → NGINX, with External Authentication Interface. Tier 2: NGINX load balancing → Applications. Inter Connectivity Cloud between the DCs.]

Control in…

12-20

(Build-up slides; one bullet is added per slide, along the axes Functionality, Time-to-Market, Operational Monitoring and Control)

• Integrate Authentication and Event Messaging module from PoC
• Add missing cookie jar functionality
• Add load balancing persistency over data centers
• Add dynamic service discovery so teams can self-service end points
• Integrate existing (Java) Continuous Delivery Pipeline
• Monitor system resource usage and errors to Graphite
• Add Grafana dashboards and Mobile alerts for team dashboards
• Monitor and report upstream errors to Tivoli Omnibus (MCR)
• Make performance data and reports available to all scrum teams

• First step: Integrate into the Continuous Delivery Pipeline
  From Git to production

• Second step: Add additional functionality to NGINX

• Future roadmap of the NGINX authenticating proxy environment

Roll-out planning

21

• Using standard open source tools like: Git, Jenkins, Maven, Nexus, Docker, Valgrind, Python

• And closed source tools like: Nolio (deployments), Fortify (static source code analysis)

First step: integrate into the continuous delivery pipeline

22

23

GIT repository

24

Commits on “develop” trigger a build in Jenkins, using an Apache Maven build profile

25

Which builds the project modules

26

By packaging all of our own modules, adding the nginx.org source from our Nexus repository and 3rd-party source modules from our Nexus repository, as a tar.gz file

27

And add the RedHat .spec file

28

To start a Docker build in a CentOS image, which results in an RPM

29

If all Python tests succeed on the binary

30

If all integration test scripts ran successfully and all product acceptance scripts ran successfully

31

And all module tests succeed as well

32

Using a Python test framework to easily create test cases for the binary and modules

33

The RPMs and test results are uploaded to a Nexus repository, together with Nolio deployment scripts, after which Jenkins triggers an automatic Nolio deployment in LCM

34

Each commit in “develop” also starts a Jenkins job that triggers the Valgrind tests on all modules and emails the results on failures

35

Each commit in “develop” also queues a nightly Jenkins job that starts a Fortify scan for static source code analysis on all of our own modules, the NGINX code and all 3rd-party modules used

36

Releases on “master” trigger a build in Jenkins, using the Apache Maven release profile, where versioned artifacts are uploaded to Nexus

37

Configuration releases on “master” trigger a build in Jenkins, where the correct nginx.conf and site information are created

38

And SQL is used to create a list of URL endpoints and their module directives

39

Using a Maven plugin to create the correct configuration files

40

Using Docker to build an RPM and test all generated configurations

41

So it can be automatically deployed in Nolio in LCM by Jenkins

• LCM DEV + TST environment for internal team tests

• DEV + TST for integration tests for all other teams

• ACC for pre-production tests
  Daily load tests using LoadRunner & performance reports using Python, LaTeX and gnuplot
  Weekly resilience tests
  Unplanned Simian Army tests
  “perf” runs for NGINX profiling (if a change requires it)
  Penetration and security tests

• Multiple PRD environments in different data centers
  Replaced all IBM WebSeal reverse proxies with NGINX
  Starting to replace all F5 BigIP internal load balancers with the NGINX load balancing module

The result…

42

• Using “perf”, we analyzed the binary under a load of ~500 URI/sec

Optimizing the result

43

Number 1, 3, 8, 11 is GZIP compression
Number 2 is memset => hard to pinpoint since generic use
Number 4 is the network driver => cannot change
Number 5 is cookie header parsing, triggered by our code
Number 6 is the OS
Number 7 is Kafka CRC32 code
Number 9 is memcpy => hard to pinpoint since generic use
Number 10 is caused by the audit system => cannot change
Number 20 is the first of our own methods listed

• GZIP is expensive on the CPU, use optimized libraries when possible

• Use static linking when replacing the patched library cannot be done on the target machine

• Two patches are available, from Intel and Cloudflare (compared at compression level 5)

Source: https://www.snellman.net/blog/archive/2014-08-04-comparison-of-intel-and-cloudflare-zlib-patches.html

Include optimized libraries

44

• Some libraries are not available on the target machine (Kafka, MaxMind, Protobuf)

• Some libraries are too old on target machine (PCRE3 – for JIT)

• CPU optimized versions are added in the Docker image and statically linked

Patching libraries for performance

45

• Our five most important home-made modules

Cookie jar module – store Set-Cookie operations in the reverse proxy
WebSeal module – authentication module based on the Extended Authentication Interface (EAI)
Kafka module – send Event Messages from the proxy layer to other systems
Load balancing module – rule-based upstream use, allows dynamic service discovery
Monitoring module – monitor application use and system resource usage

Second step: Add additional functionality to NGINX

46

• Uses two levels of RB Trees to store state

• Highly configurable

• Use timers for automatic expiration and cleanup

• Use shared memory to share state between workers

Cookie jar module

47
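The cookie-jar semantics can be illustrated with a small Python sketch. The real module uses two levels of RB trees in shared memory with timer-driven cleanup; here a nested dict stands in, and the TTL value is made up.

```python
import time

class CookieJar:
    """Two-level map: session id -> {cookie name -> (value, expiry)}.
    Set-Cookie values from upstreams are held in the proxy instead of
    being passed to the client; the jar rebuilds the Cookie header
    for upstream requests."""
    def __init__(self, ttl=1800):
        self.ttl = ttl
        self.sessions = {}  # level 1: session id

    def store(self, sid, name, value, now=None):
        # Called when an upstream response carries a Set-Cookie header.
        now = time.time() if now is None else now
        self.sessions.setdefault(sid, {})[name] = (value, now + self.ttl)

    def cookie_header(self, sid, now=None):
        # Rebuild the Cookie header for an upstream request, expiring
        # stale entries as we go (the real module uses timers for this).
        now = time.time() if now is None else now
        jar = self.sessions.get(sid, {})
        live = {n: (v, exp) for n, (v, exp) in jar.items() if exp > now}
        self.sessions[sid] = live
        return "; ".join(f"{n}={v}" for n, (v, _) in sorted(live.items()))

jar = CookieJar(ttl=60)
jar.store("s1", "JSESSIONID", "abc", now=0)
jar.store("s1", "theme", "dark", now=0)
print(jar.cookie_header("s1", now=10))   # JSESSIONID=abc; theme=dark
print(jar.cookie_header("s1", now=120))  # empty: both cookies expired
```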

• Uses an RB tree to store session state

• Allows access based on different policies (fine- or coarse-grained)

• Use timers for automatic expiration and cleanup

• Use shared memory to share state between workers

• Implement the EAI interface to allow gradual migration

WebSeal module

48

• Publish Events for monitoring and error analysis

• Highly configurable using a separate json config file

• Fast and asynchronous to avoid processing overhead

Event Publishing (Kafka) module

49
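The fast-and-asynchronous design can be sketched as a queue drained by a background thread, so the request path never blocks on I/O. This is an illustrative stand-in, not the module's code: a Python list plays the role of the Kafka producer.

```python
import json
import queue
import threading

class EventPublisher:
    """Asynchronous event publishing sketch: enqueue on the request
    path, drain in the background. A list stands in for Kafka."""
    def __init__(self, sink):
        self.q = queue.Queue()
        self.sink = sink
        threading.Thread(target=self._drain, daemon=True).start()

    def publish(self, event):
        # Request path: serialize, enqueue, return immediately.
        self.q.put(json.dumps(event))

    def _drain(self):
        while True:
            msg = self.q.get()
            self.sink.append(msg)  # stand-in for producing to Kafka
            self.q.task_done()

    def flush(self):
        self.q.join()  # wait until the background thread has caught up

sink = []
pub = EventPublisher(sink)
pub.publish({"type": "auth", "result": "ok"})
pub.publish({"type": "upstream_error", "status": 502})
pub.flush()
print(len(sink))  # 2
```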

• Use specific upstream servers based on rules (e.g. confidence test)

• Allow static load balancing over data centers for stateful applications

• Allow TCP connection re-use, using pools

• Integration with monitoring module to allow monitoring via MCR

Load balancing module

50
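Rule-based selection with static cross-data-center persistency can be illustrated as hashing a session id first to a data center and then to a node. Everything below is a made-up example (addresses, the confidence-test rule, the hash choice), not the module's actual logic.

```python
import hashlib

# Hypothetical upstream inventory, two data centers.
UPSTREAMS = {
    "dc1": ["10.0.1.10:8080", "10.0.1.11:8080"],
    "dc2": ["10.0.2.10:8080", "10.0.2.11:8080"],
}
CANARY = "10.0.9.1:8080"  # hypothetical confidence-test node

def pick_upstream(session_id, confidence_test=False):
    """Rule first (confidence test -> canary node), then static DC
    persistency: the same session always hashes to the same data
    center and node, so stateful applications keep hitting one side."""
    if confidence_test:
        return CANARY
    digest = hashlib.sha256(session_id.encode()).digest()
    dcs = sorted(UPSTREAMS)
    nodes = UPSTREAMS[dcs[digest[0] % len(dcs)]]
    return nodes[digest[1] % len(nodes)]

print(pick_upstream("sess-42") == pick_upstream("sess-42"))  # True: sticky
```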

• Read variables from other modules to monitor

• Create and expose variables with system resources to monitor

• Use UDP or TCP to transfer monitor data to Graphite

• Integration with Tivoli Omnibus to allow monitoring via MCR

Monitoring module

51
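Shipping metrics to Graphite needs nothing more than its plaintext protocol ("name value timestamp") over UDP. A minimal Python sketch of that transfer; the metric names are invented, not the module's:

```python
import socket
import time

def send_to_graphite(metrics, host="127.0.0.1", port=2003):
    """Format metrics in Graphite's plaintext protocol and fire them
    over UDP: no connection, no blocking, so monitoring never stalls
    the proxy workers."""
    now = int(time.time())
    payload = "".join(f"{name} {value} {now}\n"
                      for name, value in metrics.items())
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.sendto(payload.encode(), (host, port))
    sock.close()
    return payload

# Example: the kind of variables the module exposes (names invented).
send_to_graphite({
    "nginx.dc1.requests_per_sec": 512,
    "nginx.dc1.upstream_errors": 3,
})
```

TCP works too (Graphite's line receiver accepts both); UDP keeps the fire-and-forget property.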

Monitoring example

52

• Add WAF modules

• Fully implement dynamic service discovery to dynamically add/remove URIs and upstream servers

• Implement cross datacenter persistency for cookie jar

Future roadmap of the NGINX authenticating proxy environment

53

• Remove manual work in development and testing ASAP

• NGINX has a lot of configuration optimization possibilities
  TCP socket/TCP options, caching, connection re-use, JIT, threads, upstream zone, buffer settings, timeouts

• In own modules
  Use shared memory for session state (if needed), RB trees, thread pools, timers and the event queue
  Use atomic reference counters over shared mutex locks if possible
  Use variables to pass data between modules

• In NGINX modules
  Compression on content is CPU expensive!
  Cookie lookups in modules are potentially CPU expensive
  CRC32 is potentially CPU expensive
  If using symmetric crypto, use types supported by the CPU (AES-NI), like AES GCM/CTR

Lessons learned so far…

54
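As an illustration of the configuration knobs listed above, a hedged nginx.conf fragment; every value here is an example, not one of ING's settings:

```nginx
# Illustrative fragment only; values are examples.
pcre_jit on;                        # regex JIT (needs a JIT-capable libpcre)

http {
    gzip            on;
    gzip_comp_level 5;              # the level used in the zlib patch comparison

    keepalive_timeout  65s;         # client-side connection re-use
    keepalive_requests 1000;

    upstream backend {
        zone backend 64k;           # shared-memory upstream zone
        server 10.0.1.10:8080;      # example address
        keepalive 32;               # TCP connection pool to upstreams
    }

    server {
        listen 8080 backlog=4096;   # TCP socket option
        proxy_buffers      8 16k;   # buffer settings
        proxy_read_timeout 30s;     # timeouts
        location / {
            proxy_http_version 1.1;
            proxy_set_header Connection "";   # needed for upstream keepalive
            proxy_pass http://backend;
        }
    }
}
```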

• Older stacks require more work to fully use all configurations
  Recompiled a new GCC C compiler for strong stack protector and CPU optimization options
  Recompiled libz, statically linking the latest version with the Intel performance patches added
  Recompiled libpcre, statically linking the latest version for JIT, with CPU optimization flags
  Recompiled other libs which are not present in RHEL, with CPU optimization flags

• Make monitoring highly configurable per site and fine-tune over time

• Use good monitoring dashboards
  The combination of Graphite and Grafana works very well
  Test which log data in error.log is required for good root-cause analysis if an error occurs

• Take enough time to test
  Performance tests under stress load with tools like “perf” give a lot of insight
  Invest enough time in resilience tests and in determining what key data is needed to monitor your system
  All code that involves shared memory, locks, timers and configuration reloads takes more time to get right

Lessons learned so far…

55

And… NGINX is very fast, very efficiently coded and extremely fun to program for!

Lessons learned so far…

56

Questions??

E-mail: bart.warmerdam@ing.nl

And...

57

The opinions expressed in this publication are based on information gathered by ING and on sources that ING deems reliable. This data has been processed with care in our analyses. Neither ING nor employees of the bank can be held liable for any inaccuracies in this publication. No rights can be derived from the information given. ING accepts no liability whatsoever for the content of the publication or for information offered on or via the sites. Author rights and data protection rights apply to this publication. Nothing in this publication may be reproduced, distributed or published without explicit mention of ING as the source of this information. The user of this information is obliged to abide by ING's instructions relating to the use of this information. Dutch law applies.

www.ing.com

Disclaimer

58
