How Ansible helps Backbase Ansible Benelux meetup Pavel Chunyayev Amsterdam, 27-5-2015
Aug 12, 2015
Who am I
• Come from Ukraine
• 11 years in IT
• Worked in Ukraine, Estonia and the Netherlands
• Continuous Delivery architect at Levi9 IT Services
• Last 6 months - Automation architect at Backbase
Backbase CXP
Backbase Customer Experience Platform
• Core services
• Content services
• Publication services
• 3 environments – Editorial, Staging, Live
Different configuration options
• Java version
• Application Server
• RDBMS
• HTTP/HTTPS
• Internal configuration options
• Optional application features
A lot of things are already automated
• There are servers for released CXP versions
  • With different configurations
  • They can be started/stopped when needed
• The newest version of the application needs to be deployed
  • In most cases manually
• For some configurations, deployment requires repackaging the application
  • Automated through Maven
• There is a sandbox environment with the nightly build
  • Deployed automatically
  • Far from production setup
Handcrafted servers
• Hard to maintain
• Very time/cost sensitive
• Setup is not easily reproducible
• May be buggy
• It should take less time to rebuild a server from scratch than to log in and fix/update it.
Solution
Solution diagram
Why Ansible
• Python powered
• No master, agentless
• Free, open source
• Plenty of modules (batteries included)
• Great EC2 support
• Windows support (kind of)
• Parallel, but controllable execution
• Quite simple for developers to understand
Why REST service
• Create infrastructure easily
• Just send JSON formatted configuration
• Service will analyze it and trigger Ansible run
• Service is the single point of contact for any infrastructure requests
• Can be integrated into any CI, script, application or other service
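A minimal sketch of how such a service might turn a JSON request into an Ansible run. The required keys (`stack`, `owner_id`), the playbook name `provision.yml` and the function name are assumptions for illustration and do not come from the talk; only the `ansible-playbook` command and its `--extra-vars` option, which accepts a JSON string, are real.

```python
import json


def build_ansible_command(request_body, playbook="provision.yml"):
    """Turn a JSON request body into an ansible-playbook argument list.

    The playbook name and the required keys are hypothetical.
    """
    config = json.loads(request_body)
    missing = [key for key in ("stack", "owner_id") if key not in config]
    if missing:
        raise ValueError("missing required keys: " + ", ".join(missing))
    # --extra-vars accepts a JSON string, so the parsed config is passed through.
    return [
        "ansible-playbook",
        playbook,
        "--extra-vars",
        json.dumps(config, sort_keys=True),
    ]
```

The returned list could be handed to `subprocess.run` by whatever component triggers the run.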
Why UI
• Everyone needs to create an environment from time to time
• Opening a ticket and then waiting is not an option
• In most situations environments are required for a short period of time.
• Self-service
Demo
• Directory structure
• Flow of work
• Decision tree
Ansible features we are using
• Handlers
• Variables
• Jinja2 templates
• Facts
• Conditions
• Playbook includes
• Inventory (fake) – hostgroups!
• Roles :(
ec2
    ec2:
      key_name: '{{ keypair }}'
      group_id: '{{ security_group }}'
      instance_type: '{{ instance }}'
      image: '{{ image }}'
      region: '{{ region }}'
      vpc_subnet_id: '{{ subnet }}'
      user_data: "{{ item }}"
      instance_profile_name: 'access-to-s3'
      instance_tags:
        origin: "{{ origin }}"
        environment_name: "{{ environment_name }}"
        stack_id: "{{ stack_id }}"
        owner_id: "{{ owner_id }}"
        role: "{{ item }}"
        timestamp: "{{ timestamp }}"
    with_items: server_roles
    register: ec2
Facts
    - name: Set the facts and hostnames
      hosts: all_hosts
      connection: ssh
      gather_facts: True
      max_fail_percentage: 0
      tasks:
        - name: Gather EC2 facts
          ec2_facts:
        - name: Set environment fact
          set_fact: this_environment="{{ ansible_ec2_user_data }}"
        - name: Set hostnames
          hostname: name="{{ environment_name }}-{{ this_environment }}"
route53
    route53:
      command: create
      zone: backbase.dev
      private_zone: yes
      overwrite: yes
      record: "{{ environment_name }}-{{ item.0 }}.backbase.dev"
      type: A
      ttl: 10
      value: "{{ item.1.instances[0].private_ip }}"
    with_together:
      - server_roles
      - ec2.results
    register: r53_result
    until: r53_result|success
    retries: 20
    delay: "{{ 10 |random }}"
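The `with_together` loop in the route53 task walks the role list and the registered EC2 results in step, so `item.0` is a role and `item.1` is the matching result. A rough Python equivalent of that pairing, assuming both lists have the same length, is sketched here. The function name and the sample data in the usage note are hypothetical.

```python
def dns_records(environment_name, server_roles, ec2_results, zone="backbase.dev"):
    """Pair each role with its EC2 result and build one A record per pair."""
    records = []
    for role, result in zip(server_roles, ec2_results):
        records.append({
            "record": f"{environment_name}-{role}.{zone}",
            "type": "A",
            # Mirrors item.1.instances[0].private_ip in the task
            "value": result["instances"][0]["private_ip"],
        })
    return records
```

For example, roles `["editorial", "live"]` with two results would yield records named `<environment_name>-editorial.backbase.dev` and `<environment_name>-live.backbase.dev`.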
Jinja2 templates
    {% block portal_db %}{% endblock %}
    {% if http_or_https == 'http' %}
      {% set port = http_port %}
    {% else %}
      {% set port = https_port %}
    {% endif %}

    {% if this_environment == "editorial" %}
    foundation.environment.editorial=true
    {% else %}
    foundation.environment.editorial=false
    {% endif %}
    foundation.content.proxy.destination={{ http_or_https }}://{{ environment_name }}-{{ this_environment }}.backbase.dev:{{ port }}/contentservices
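To make the template's branching easier to follow, the same decisions can be written as a small plain-Python function. This is only an illustration of the logic, not how Ansible renders templates. The function name is hypothetical, and the empty `portal_db` block is left out.

```python
def render_settings(this_environment, environment_name, http_or_https,
                    http_port, https_port):
    """Mirror the template: pick the port by scheme, flag the editorial environment."""
    port = http_port if http_or_https == "http" else https_port
    editorial = "true" if this_environment == "editorial" else "false"
    destination = (
        f"{http_or_https}://{environment_name}-{this_environment}"
        f".backbase.dev:{port}/contentservices"
    )
    return "\n".join([
        f"foundation.environment.editorial={editorial}",
        f"foundation.content.proxy.destination={destination}",
    ])
```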
wait_for
    - name: Start WSLC
      shell: /opt/IBM/Websphere/INIT.websphere start {{ this_environment }}
    - name: Wait for WSLC to start
      wait_for:
        path: "/opt/IBM/Websphere/usr/servers/{{ this_environment }}/logs/console.log"
        search_regex: 'The server {{ this_environment }} is ready to run a smarter planet.'
        timeout: 30
    - name: Run the trigger
      shell: /opt/install/app_start_trigger.sh &> /opt/install/app_start_trigger.log; sleep 2
    - name: Wait for all apps to start
      wait_for:
        path: "/opt/IBM/Websphere/usr/servers/{{ this_environment }}/logs/messages.log"
        search_regex: 'SRVE0242I: \[portalserver\] \[/portalserver\] \[/WEB-INF/index\.jsp\]: Initialization successful\.'
        timeout: 600
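The idea behind these `wait_for` tasks is to poll a log file until a regular expression matches or a timeout expires. A minimal Python sketch of that idea follows. It is not the module's actual implementation, and the function name and the `poll_interval` parameter are assumptions.

```python
import re
import time


def wait_for_pattern(path, pattern, timeout, poll_interval=1.0):
    """Poll a file until a regex matches its contents or the timeout expires."""
    deadline = time.monotonic() + timeout
    regex = re.compile(pattern)
    while True:
        try:
            with open(path, encoding="utf-8", errors="replace") as handle:
                if regex.search(handle.read()):
                    return True
        except FileNotFoundError:
            pass  # the log may not exist yet while the server is starting
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
```

Rereading the whole file on every poll keeps the sketch short. For large logs, remembering the last read position would be cheaper.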
Recovering from failure
    - name: Download CXP
      shell: s3cmd get s3://s3_bucket_here/Backbase_Portal_5.6.0-{{ version }}.zip /opt/install/portal-package-5.6.0-{{ version }}.zip --force 2>&1 | tee /opt/install/direct_loader.log
      register: cxp_download_sleeper
      until: cxp_download_sleeper.stdout.find("saved as") != -1
      retries: 10
      delay: "{{ 10 | random }}"
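The `until` / `retries` / `delay` combination in this task amounts to a retry loop with a randomized pause. The pattern can be sketched in Python as follows. The function name and its parameters are hypothetical, and drawing a fresh delay before every retry is an assumption about the intent rather than a statement of how Ansible evaluates `delay`.

```python
import random
import time


def retry_until(action, is_success, retries=10, max_delay=10, sleep=time.sleep):
    """Run action until is_success(result) is true or the retries are used up."""
    result = None
    for attempt in range(retries):
        result = action()
        if is_success(result):
            return result
        if attempt < retries - 1:
            # Like "{{ 10 | random }}": an integer from 0 up to max_delay - 1
            sleep(random.randrange(max_delay))
    raise RuntimeError(f"no success after {retries} attempts; last result: {result!r}")
```

A randomized pause spreads out retries when many hosts download from the same bucket at once. The `sleep` parameter is injectable so the loop can be exercised without waiting.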
API
• /api/stacks - GET - List stacks available for provisioning
• /api/stacks/stack_name - GET - List the stack configuration
• /api/environments - GET - List all currently provisioned environments
• /api/stacks/stack_name - POST - Provision specified stack
• /api/environments/environment_id - DELETE - Destroy environment with specified id
• /api/environments/all - DELETE - Destroy all environments
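The endpoint list above can be read as a small routing table. A sketch in Python, with hypothetical handler names and no assumption about the framework the real service uses:

```python
import re

# (HTTP method, path pattern, handler name); handler names are hypothetical.
ROUTES = [
    ("GET", r"^/api/stacks$", "list_stacks"),
    ("GET", r"^/api/stacks/(?P<stack_name>[^/]+)$", "show_stack"),
    ("POST", r"^/api/stacks/(?P<stack_name>[^/]+)$", "provision_stack"),
    ("GET", r"^/api/environments$", "list_environments"),
    # "all" must be matched before the generic environment_id pattern.
    ("DELETE", r"^/api/environments/all$", "destroy_all_environments"),
    ("DELETE", r"^/api/environments/(?P<environment_id>[^/]+)$", "destroy_environment"),
]


def resolve(method, path):
    """Return (handler name, path parameters), or (None, {}) if nothing matches."""
    for route_method, pattern, handler in ROUTES:
        match = re.match(pattern, path)
        if route_method == method and match:
            return handler, match.groupdict()
    return None, {}
```

Because `/api/environments/all` and `/api/environments/environment_id` share a shape, the order of the two DELETE routes matters in this sketch.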
Infrastructure life cycle
• Create
  • Check if the user is valid
  • Parse the requested configuration
  • Generate unique environment name
  • Trigger Ansible run
  • Return environment name
• Destroy
  • Check if requested environment exists
  • Check if the user can destroy this environment
  • Delete environment and clean everything up (DNS, ELB, etc.)
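The create and destroy steps above can be sketched as a small in-memory service. Everything here is illustrative: the class and method names are hypothetical, the `stack` key is an assumed configuration field, the naming scheme is invented, and the rule that only the owner may destroy an environment is an assumption about what "can destroy" means.

```python
import uuid


class EnvironmentService:
    """In-memory sketch of the create/destroy life cycle."""

    def __init__(self, valid_users, run_ansible):
        self.valid_users = set(valid_users)
        self.run_ansible = run_ansible  # callable(environment_name, config)
        self.environments = {}  # environment name -> owner

    def create(self, user, config):
        if user not in self.valid_users:
            raise PermissionError(f"unknown user: {user}")
        if "stack" not in config:
            raise ValueError("configuration must name a stack")
        # Unique name: stack name plus a short random suffix (assumed scheme)
        name = f"{config['stack']}-{uuid.uuid4().hex[:8]}"
        self.run_ansible(name, config)
        self.environments[name] = user
        return name

    def destroy(self, user, name):
        if name not in self.environments:
            raise KeyError(name)
        if self.environments[name] != user:
            raise PermissionError(f"{user} does not own {name}")
        # A real service would also clean up DNS, ELB, etc. at this point.
        del self.environments[name]
```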
REST Service demo
• Create a set of instances
• Destroy them
Current UI :)
Ansible testing
• No way to test a playbook without applying it
• Currently there’s a quick sanity test suite
• We test every commit against a selected number of stacks
Demo
• Stash
• Feature branches
• Ansible testing pipeline
Results
• 14-40 minutes to provision and fully configure environments
• Automated testing went from 1 stack to 10 stacks (~25 soon)
• We continuously improve to make the process robust
Continuous Delivery without Production
Goals for Continuous Delivery
• Create a repeatable and robust process
• Treat all configurations identically
  • Some are more important of course
• Provide feedback as soon as possible
  • For now – in the morning
• Provide feedback for feature branches
• Release tested artifacts more frequently
  • For now – every iteration
Stages for Continuous Delivery
• Components are built
  • Unit and integration tests
• Main application is built and packaged
  • Published to Artifactory and S3
• Testing pipeline is triggered
  • Environments are created
  • Sanity tests, API tests, Functional tests, etc. are run
  • Notification is sent in case of any test failure
  • Environments are disposed
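The testing-pipeline stage (create an environment, run the suites, notify on failure, dispose) can be sketched as follows. The function and parameter names are hypothetical, and the stacks are processed one after another only to keep the sketch short.

```python
def run_testing_pipeline(stacks, provision, test_suites, dispose, notify):
    """Provision each stack, run every test suite, notify on failure, always dispose."""
    results = {}
    for stack in stacks:
        environment = provision(stack)
        try:
            failures = [
                name for name, suite in test_suites.items()
                if not suite(environment)
            ]
        finally:
            # Dispose even if a suite raises, so no environment is left behind.
            dispose(environment)
        if failures:
            notify(stack, failures)
        results[stack] = not failures
    return results
```

Putting the dispose step in `finally` reflects the slide's point that environments are always thrown away after the run.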
Continuous Delivery diagram
[Diagram: Build components → Package → for each of Stack 1 to Stack 10: Provision → API Tests → E2E Tests → … → Dispose]
More pipelines…
• Performance tests pipeline
• Feature branches
• Security tests
• Earlier versions of the applications
• Bugfix releases
• More applications
Achievements
• Huge quality improvements
  • Numerous bugs were found in the ‘rare’ stacks
  • Regressions are found during the night
• Minimum cycle time is 1h 30m, maximum – 2h 30m
• Dozens of environments are created every day
• Repeatable process allows us to identify instabilities in tests and configurations
Closing thoughts
Zero downtime deployments
Future
• Asynchronous provisioning
• Ansible roles?
• Optimization (time)
• Pre-baked images
• Docker containers
• Plugins to help our specific needs
Ansible v2
• Blocks
  • begin
  • rescue
  • always
• Execution strategy – linear vs free (or anything else)
• Includes are evaluated at execution time (with_*)
• Better variable management
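The begin / rescue / always structure listed under Blocks reads much like try / except / finally in Python. The analogy can be sketched as follows. This is an illustration of the control flow only, with hypothetical names, and it says nothing about how Ansible implements blocks.

```python
def run_block(tasks, rescue, always):
    """Run tasks in order; on the first failure run rescue; run always in every case."""
    log = []
    try:
        for task in tasks:
            log.append(task())
    except Exception as error:
        # Like a rescue section: runs only when a task failed
        log.append(rescue(error))
    finally:
        # Like an always section: runs whether or not a task failed
        log.append(always())
    return log
```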
Key takeaways
• Use Ansible – it’s a great tool :)
• Think about immutable infrastructure
• Create repeatable and reliable process for releasing software
• Build quality in
• Improve continuously
@PavelChunyayev
Any questions?