R.I.Pienaar Puppet Camp Ghent Managing Puppet using MCollective
Managing Puppet using MCollective

May 10, 2015

Puppet Labs

R.I. Pienaar's talk "Managing Puppet using MCollective" at Puppet Camp Ghent 2013 and Puppet Camp New York 2013.
Transcript
Page 1: Managing Puppet using MCollective

R.I.Pienaar

Puppet Camp Ghent

Managing Puppet using MCollective

Page 2: Managing Puppet using MCollective

R.I.Pienaar | [email protected] | http://devco.net | @ripienaar

Who am I?

• Puppet user since 0.22.x

• Architect of MCollective

• Author of Extlookup and Hiera

• Developer at Puppet Labs London

• Blog at http://devco.net

• Tweets at @ripienaar

• Volcane on IRC

Page 3: Managing Puppet using MCollective

The Problem?

• Puppet needs management just like other software

• Enabling, disabling, ad-hoc runs, custom environments, etc.

• The Puppet Master is a finite resource that needs protection

• Orchestrated deploys

Page 4: Managing Puppet using MCollective

Available on yum.puppetlabs.com and apt.puppetlabs.com

http://srt.ly/mcpuppet

package { ['mcollective-puppet-agent', 'mcollective-puppet-client']:
  ensure => present,
}

MCollective Puppet Agent

Page 5: Managing Puppet using MCollective

Obtaining The Agent Status

Page 6: Managing Puppet using MCollective

Obtaining Statuses

$ mco puppet status

* [ ============================================================> ] 11 / 11

node8.example.net: Currently stopped; last completed run 14 minutes 16 seconds ago ....

Summary of Applying:

false = 11

Summary of Daemon Running:

stopped = 11

Summary of Enabled:

enabled = 10
disabled = 1

Summary of Idling:

false = 11

Finished processing 11 / 11 hosts in 72.05 ms

Per node status

Estate-wide summary
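Each estate-wide summary is a tally of one field across every node's reply. A minimal Python sketch of that aggregation, assuming hypothetical per-node records whose keys (`applying`, `daemon_running`, `enabled`, `idling`) merely mirror the printed headings; they are not the agent's real reply schema:

```python
from collections import Counter

# Hypothetical per-node replies for 11 nodes; node9 is the disabled one.
# The keys only mirror the headings printed by `mco puppet status`.
statuses = [
    {
        "node": f"node{i}.example.net",
        "applying": False,
        "daemon_running": "stopped",
        "enabled": "disabled" if i == 9 else "enabled",
        "idling": False,
    }
    for i in range(11)
]


def summarize(records, field):
    """Count how many nodes report each distinct value of `field`."""
    return Counter(record[field] for record in records)


def format_summary(title, counts):
    """Render a tally as 'Summary of <title>: value = count ...'."""
    # Counter keeps first-seen order, so values print in discovery order.
    pairs = " ".join(f"{str(value).lower()} = {n}" for value, n in counts.items())
    return f"Summary of {title}: {pairs}"


for title, field in [("Applying", "applying"), ("Daemon Running", "daemon_running"),
                     ("Enabled", "enabled"), ("Idling", "idling")]:
    print(format_summary(title, summarize(statuses, field)))
```

The per-node lines and the summaries come from the same replies; the summary simply groups them, which is why it scales to large estates where the per-node list would be unreadable.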

Page 7: Managing Puppet using MCollective

$ mco puppet count

Total Puppet nodes: 11

Nodes currently enabled: 10
Nodes currently disabled: 1

Nodes currently doing puppet runs: 5
Nodes currently stopped: 6

Nodes with daemons started: 10
Nodes without daemons started: 1
Daemons started but idling: 6

Obtaining Statuses

Page 8: Managing Puppet using MCollective

$ mco rpc puppet last_run_summary

* [ ============================================================> ] 28 / 28

. . .

Summary of Config Retrieval Time:

Average: 20.13

Summary of Total Resources:

Average: 435

Summary of Total Time:

Average: 39.33

Finished processing 28 / 28 hosts in 311.23 ms

Obtaining Statuses

Page 9: Managing Puppet using MCollective

Running Puppet

Page 10: Managing Puppet using MCollective

$ mco puppet runonce

* [ ============================================================> ] 11 / 11

node9.example.net Request Aborted Puppet is disabled: 'machine under maintenance'

Finished processing 11 / 11 hosts in 2593.85 ms

$ mco puppet count

Total Puppet nodes: 11

Nodes currently enabled: 10
Nodes currently disabled: 1

Nodes currently doing puppet runs: 2
Nodes currently stopped: 9

Nodes with daemons started: 10
Nodes without daemons started: 1
Daemons started but idling: 8

Doing Basic Runs

Puppet 3 disable message

Run with default configured splay and splaylimit

Page 11: Managing Puppet using MCollective

Run with no splay, still subject to enable/disable

$ mco puppet runonce -f

* [ ============================================================> ] 11 / 11

node9.example.net Request Aborted Puppet is disabled: 'machine under maintenance'

Finished processing 11 / 11 hosts in 2661.99 ms

Doing Basic Runs

Page 12: Managing Puppet using MCollective

Force splay and set a custom splay limit

$ mco puppet runonce --splay --splaylimit 120

* [ ============================================================> ] 11 / 11

node9.example.net Request Aborted Puppet is disabled: 'machine under maintenance'

Finished processing 11 / 11 hosts in 2661.99 ms

Doing Basic Runs
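Splay spreads runs out over time so the Puppet Master is not hit by every node at once. The sketch below assumes the usual splay semantics: a random delay between 0 and `splaylimit` seconds when splay is on, and no delay when it is off (as with `runonce -f`). `splay_delay` is a hypothetical helper for illustration, not part of the agent.

```python
import random


def splay_delay(splay, splaylimit, rng):
    """Seconds a node waits before starting its run (hypothetical helper).

    Assumed semantics: with splay on, a uniformly random whole number of
    seconds in [0, splaylimit]; with splay off (runonce -f), no wait."""
    if not splay:
        return 0
    return rng.randint(0, splaylimit)


rng = random.Random(42)  # fixed seed so the example is repeatable
delays = {f"node{i}.example.net": splay_delay(True, 120, rng) for i in range(11)}
for node, delay in sorted(delays.items(), key=lambda item: item[1]):
    print(f"{node} starts after {delay}s")
print("forced (no splay):", splay_delay(False, 120, rng), "s")
```

With `--splaylimit 120` every node starts within two minutes of the request, so the load on the master is spread over that window instead of arriving in one spike.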

Page 13: Managing Puppet using MCollective

Selects 2 tags in a specific Puppet Environment

$ mco puppet runonce --tag webserver --tag syslog --environment development

* [ ============================================================> ] 11 / 11

node9.example.net Request Aborted Puppet is disabled: 'machine under maintenance'

Finished processing 11 / 11 hosts in 2661.99 ms

Tags and Environment

Page 14: Managing Puppet using MCollective

Does a noop run; gathers reports and audit information

$ mco puppet runonce --noop

* [ ============================================================> ] 11 / 11

node9.example.net Request Aborted Puppet is disabled: 'machine under maintenance'

Finished processing 11 / 11 hosts in 2661.99 ms

Doing noop Runs

Page 15: Managing Puppet using MCollective

When puppet.conf has noop=true, do an actual run on demand

$ mco puppet runonce --tag webserver --no-noop

* [ ============================================================> ] 11 / 11

node9.example.net Request Aborted Puppet is disabled: 'machine under maintenance'

Finished processing 11 / 11 hosts in 2661.99 ms

Doing no-noop Runs

Page 16: Managing Puppet using MCollective

Does a single run against a different Puppet Master

$ mco puppet runonce --server secops.example.net:8134 --tag compliance

* [ ============================================================> ] 11 / 11

node9.example.net Request Aborted Puppet is disabled: 'machine under maintenance'

Finished processing 11 / 11 hosts in 2661.99 ms

Choosing a Master

Page 17: Managing Puppet using MCollective

Preventing Puppet Runs

Page 18: Managing Puppet using MCollective

The Big Red Button

Disables Puppet; does not change the reason on nodes that are already disabled

$ mco puppet disable "we f'd up, stop the train!"

* [ ============================================================> ] 11 / 11

node9.example.net Request Aborted Could not disable Puppet: Already disabled

Summary of Enabled:

disabled = 11

Finished processing 11 / 11 hosts in 90.06 ms
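The point of this slide is that disabling never overwrites an existing reason, and that a disabled node refuses runs. A toy Python model of that behaviour; `NodeLock` is a hypothetical class, and the reply strings are copied from the outputs on these slides rather than from the agent's code:

```python
class NodeLock:
    """Toy model of one node's Puppet enable/disable state (hypothetical)."""

    def __init__(self, disable_message=None):
        self.disable_message = disable_message  # None means enabled

    def disable(self, message):
        # Never overwrite an existing reason: the original one is kept.
        if self.disable_message is not None:
            return "Request Aborted Could not disable Puppet: Already disabled"
        self.disable_message = message
        return "disabled"

    def runonce(self):
        if self.disable_message is not None:
            return f"Request Aborted Puppet is disabled: '{self.disable_message}'"
        return "Started a background Puppet run"


nodes = {f"node{i}.example.net": NodeLock() for i in range(11)}
nodes["node9.example.net"].disable("machine under maintenance")

# The Big Red Button: disable everything with one emergency message.
replies = {name: lock.disable("we f'd up, stop the train!") for name, lock in nodes.items()}
for name, reply in replies.items():
    if reply != "disabled":
        print(name, reply)
print("disabled =", sum(lock.disable_message is not None for lock in nodes.values()))
```

Keeping node9's original reason is what makes the selective re-enable on the next slide safe: the emergency message identifies exactly the nodes this action disabled.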

Page 19: Managing Puppet using MCollective

The Big Green Button

Enables all disabled Puppet nodes

$ mco puppet enable -S 'puppet().disable_message=/stop the train/'

* [ ============================================================> ] 10 / 10

Summary of Enabled:

enabled = 10

Finished processing 10 / 10 hosts in 90.06 ms
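The `-S` filter matches the disable message against a regular expression, so only the 10 nodes disabled by the Big Red Button are re-enabled and node9 stays in maintenance. A Python sketch of that selection, assuming an unanchored regex search; `select` and the message data are hypothetical:

```python
import re

# Hypothetical disable messages after the Big Red Button: ten nodes carry the
# emergency message, node9 keeps its earlier maintenance reason.
disable_messages = {f"node{i}.example.net": "we f'd up, stop the train!" for i in range(11)}
disable_messages["node9.example.net"] = "machine under maintenance"


def select(messages, pattern):
    """Disabled nodes whose message matches `pattern` anywhere (unanchored
    search), an assumed reading of puppet().disable_message=/.../."""
    regex = re.compile(pattern)
    return sorted(node for node, msg in messages.items()
                  if msg is not None and regex.search(msg))


targets = select(disable_messages, r"stop the train")
for node in targets:
    disable_messages[node] = None  # enable: clear the lock and its reason
print("enabled =", len(targets))
```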

Page 20: Managing Puppet using MCollective

Operating On Groups Of Hosts

Page 21: Managing Puppet using MCollective

Selective Runs

Run using a filter: all web servers with fact cluster=a

$ mco puppet runonce -W "cluster=a roles::webserver"

* [ ============================================================> ] 5 / 5

Finished processing 5 / 5 hosts in 90.06 ms

cluster=a is a Facter fact; roles::webserver is a Puppet class
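A `-W` filter mixes fact terms (`key=value`) and class names, and a node is selected only when every term matches. A simplified Python sketch of that matching, assuming exact-equality facts only; the inventory and both helper functions are hypothetical:

```python
# Hypothetical inventory of facts and classes per node.
inventory = {
    **{f"web{i}.example.net": {"facts": {"cluster": "a"}, "classes": {"roles::webserver"}}
       for i in range(1, 6)},
    "web6.example.net": {"facts": {"cluster": "b"}, "classes": {"roles::webserver"}},
    "db1.example.net": {"facts": {"cluster": "a"}, "classes": {"roles::database"}},
}


def parse_with_filter(expr):
    """Split a -W style string into fact and class terms. Simplified:
    only exact key=value facts and bare class names are handled."""
    facts, classes = {}, set()
    for term in expr.split():
        if "=" in term:
            key, value = term.split("=", 1)
            facts[key] = value
        else:
            classes.add(term)
    return facts, classes


def matches(node, facts, classes):
    # All terms must hold: the fact and class filters are combined with AND.
    return (all(node["facts"].get(k) == v for k, v in facts.items())
            and classes <= node["classes"])


facts, classes = parse_with_filter("cluster=a roles::webserver")
selected = sorted(name for name, node in inventory.items() if matches(node, facts, classes))
print(selected)
```

Here web6 is excluded by the fact term and db1 by the class term, leaving the five nodes that satisfy both.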

Page 22: Managing Puppet using MCollective

Selective Runs

Run using a filter: nodes where we manage /srv/www

$ mco puppet runonce -S "resource('File[/srv/www]').managed=true"

* [ ============================================================> ] 5 / 5

Finished processing 5 / 5 hosts in 90.06 ms

Any Puppet resource

Page 23: Managing Puppet using MCollective

Selective Runs

Run using a filter: nodes whose most recent run had config_version xyz and more than 5 failed resources

$ mco puppet runonce -S "resource().failed_resources>5 and resource().config_version=xyz"

* [ ============================================================> ] 5 / 5

Finished processing 5 / 5 hosts in 90.06 ms
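A compound `-S` filter is a set of comparisons on last-run data joined by `and`. The Python sketch below parses and evaluates only that small subset (`and`, with `>`, `<` and `=`); it is an illustration of the idea, not MCollective's actual filter grammar, and the last-run data is hypothetical:

```python
import operator
import re

OPS = {">": operator.gt, "<": operator.lt, "=": operator.eq}
# One clause looks like: resource().<field><op><value>
CLAUSE = re.compile(r"resource\(\)\.(\w+)\s*([<>=])\s*(\S+)")


def parse(expr):
    """Parse 'clause and clause ...' into (field, op, raw_value) triples.
    Simplified subset: only 'and', and only >, < and = comparisons."""
    clauses = []
    for part in expr.split(" and "):
        m = CLAUSE.fullmatch(part.strip())
        if m is None:
            raise ValueError(f"unsupported clause: {part!r}")
        field, op, raw = m.groups()
        clauses.append((field, OPS[op], raw))
    return clauses


def evaluate(clauses, summary):
    """True when every clause holds for one node's last-run summary."""
    for field, op, raw in clauses:
        actual = summary[field]
        # Compare numerically when the stored value is a number, else as text.
        expected = float(raw) if isinstance(actual, (int, float)) else raw
        if not op(actual, expected):
            return False
    return True


# Hypothetical last-run summaries keyed by the fields the slide's filter uses.
last_runs = {
    "node1.example.net": {"failed_resources": 7, "config_version": "xyz"},
    "node2.example.net": {"failed_resources": 2, "config_version": "xyz"},
    "node3.example.net": {"failed_resources": 9, "config_version": "abc"},
    "node4.example.net": {"failed_resources": 6, "config_version": "xyz"},
}

clauses = parse("resource().failed_resources>5 and resource().config_version=xyz")
selected = sorted(n for n, s in last_runs.items() if evaluate(clauses, s))
print(selected)
```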

Page 24: Managing Puppet using MCollective

Runs all nodes with a maximum concurrency

$ mco puppet runall 7
2013-01-19 20:58:59: Running all nodes with a concurrency of 7
2013-01-19 20:58:59: Discovering enabled Puppet nodes to manage
2013-01-19 20:59:02: Found 11 enabled nodes
2013-01-19 20:59:06: node3.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:07: node1.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:09: node4.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:10: node6.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:12: node0.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:13: node5.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:17: Currently 7 nodes applying the catalog; waiting for less than 7
2013-01-19 20:59:21: Currently 7 nodes applying the catalog; waiting for less than 7
2013-01-19 20:59:25: node9.example.net schedule status: Puppet is currently applying a catalog, cannot run now
2013-01-19 20:59:29: node8.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:33: Currently 7 nodes applying the catalog; waiting for less than 7
2013-01-19 20:59:38: node2.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:41: Currently 7 nodes applying the catalog; waiting for less than 7
2013-01-19 20:59:46: middleware.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:50: Currently 7 nodes applying the catalog; waiting for less than 7
2013-01-19 20:59:55: node7.example.net schedule status: Started a background Puppet run

Roll Out A Change Quickly

Page 25: Managing Puppet using MCollective

Does not attempt to manage disabled nodes

2013-01-19 20:58:59: Running all nodes with a concurrency of 7
2013-01-19 20:58:59: Discovering enabled Puppet nodes to manage
2013-01-19 20:59:02: Found 11 enabled nodes

Roll Out A Change Quickly

Page 26: Managing Puppet using MCollective

Starts the first 6 quickly, but takes into account that administrators are doing 1 other run at the same time

2013-01-19 20:59:02: Found 11 enabled nodes
2013-01-19 20:59:06: node3.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:07: node1.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:09: node4.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:10: node6.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:12: node0.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:13: node5.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:17: Currently 7 nodes applying the catalog; waiting for less than 7

Roll Out A Change Quickly

Page 27: Managing Puppet using MCollective

node9 was already being run by an administrator or by its normal schedule, so it was skipped and runall moved on to the next node

2013-01-19 20:59:17: Currently 7 nodes applying the catalog; waiting for less than 7
2013-01-19 20:59:21: Currently 7 nodes applying the catalog; waiting for less than 7
2013-01-19 20:59:25: node9.example.net schedule status: Puppet is currently applying a catalog, cannot run now
2013-01-19 20:59:29: node8.example.net schedule status: Started a background Puppet run

Roll Out A Change Quickly

Page 28: Managing Puppet using MCollective

Regularly checks the concurrency and starts more nodes as soon as possible.

Average node run time 34.39s, total time 55 seconds

2013-01-19 20:59:29: node8.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:33: Currently 7 nodes applying the catalog; waiting for less than 7
2013-01-19 20:59:38: node2.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:41: Currently 7 nodes applying the catalog; waiting for less than 7
2013-01-19 20:59:46: middleware.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:50: Currently 7 nodes applying the catalog; waiting for less than 7
2013-01-19 20:59:55: node7.example.net schedule status: Started a background Puppet run

Roll Out A Change Quickly
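The runall slides describe the scheduling loop only through its behaviour: count the nodes applying a catalog (including runs started by others), start another node when the count is below the limit, and skip a node that is already applying. The toy simulation below models that behaviour in discrete ticks under stated assumptions: a fixed `run_time` per run, one scheduling decision per tick, and skipped nodes are not retried. `runall` here is a hypothetical Python function, not the agent's implementation.

```python
from collections import deque


def runall(nodes, concurrency, already_running, run_time):
    """Discrete-time sketch of a concurrency-limited rolling Puppet run.

    nodes:           node names to run, in order (assumed discovery order)
    concurrency:     maximum number of nodes applying a catalog at once
    already_running: {node: ticks_left} for runs started outside this tool
    run_time:        ticks each run takes once started (assumed constant)
    Returns a log of (tick, node, status) tuples."""
    if concurrency < 1 or run_time < 1:
        raise ValueError("concurrency and run_time must be positive")
    running = dict(already_running)
    queue = deque(nodes)
    log = []
    tick = 0
    while queue:
        if len(running) < concurrency:
            node = queue.popleft()
            if node in running:
                # Someone else is already running this node: skip it, don't wait.
                log.append((tick, node, "skipped"))
            else:
                running[node] = run_time
                log.append((tick, node, "started"))
        else:
            log.append((tick, None, "waiting"))
        # Advance one tick: every active run progresses; finished runs drop out.
        tick += 1
        running = {n: left - 1 for n, left in running.items() if left > 1}
    return log


order = ["node3", "node1", "node4", "node6", "node0", "node5",
         "node9", "node8", "node2", "middleware", "node7"]
nodes = [f"{name}.example.net" for name in order]
# Made-up timings: node9 is already mid-run when runall starts.
log = runall(nodes, concurrency=7,
             already_running={"node9.example.net": 20}, run_time=10)
for tick, node, status in log:
    print(tick, node or "-", status)
```

With these made-up timings the loop starts six nodes, waits while seven catalogs are applying, skips node9 and carries on with the rest, the same pattern as the log above, though the number of waits differs.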

Page 29: Managing Puppet using MCollective

Does runonce in batches of 5, sleeping 5 minutes per batch. Press ^C after any batch to stop.

15-minute total run time.

$ mco puppet runonce --batch 5 --batch-sleep 300

* [ ============================================================> ] 11 / 11

Finished processing 11 / 11 hosts in 903686.29 ms

Roll Out A Change Slowly

Wait 5 minutes
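The 15-minute figure follows from simple arithmetic: 11 hosts in batches of 5 gives 3 batches. Assuming a 300-second sleep follows every batch, including the last (an assumption that happens to fit the reported 903686.29 ms), the sleeps alone account for 900 seconds. A small Python check; `batches` and `estimated_seconds` are hypothetical helpers:

```python
import math


def batches(hosts, size):
    """Split hosts into consecutive batches of at most `size`."""
    return [hosts[i:i + size] for i in range(0, len(hosts), size)]


def estimated_seconds(n_hosts, size, sleep):
    """Sleep time alone, assuming a sleep follows every batch (the last too)."""
    return math.ceil(n_hosts / size) * sleep


hosts = [f"node{i}.example.net" for i in range(11)]
plan = batches(hosts, 5)
print([len(b) for b in plan])  # batch sizes
print(estimated_seconds(len(hosts), 5, 300) / 60, "minutes")
```

Under that assumption, the remaining few seconds of the reported time would be the cost of dispatching the runs themselves.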

Page 30: Managing Puppet using MCollective

Advanced Status And Performance Metrics

Page 31: Managing Puppet using MCollective

Distribution of various metrics.

$ mco puppet summary

Summary statistics for 28 nodes:

Total resources: ▂▇▂▁▁▃▁▂▂▂▄▁▂▁▁▁▁▁▂▁ min: 332.0 max: 695.0
Out Of Sync resources: ▇▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ min: 0.0 max: 2.0
Failed resources: ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ min: 0.0 max: 0.0
Changed resources: ▇▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ min: 0.0 max: 2.0
Config Retrieval time (seconds): ▆▇▅▄▁▃▃▁▁▁▃▁▁▄▂▁▁▁▁▁ min: 2.7 max: 57.1
Total run-time (seconds): ▇▃▄▄▄▃▂▂▂▂▃▂▁▁▁▁▁▂▁▁ min: 7.0 max: 125.1
Time since last run (seconds): ▇▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂ min: 10.0 max: 89.0k

Performance Analysis
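Each line above is a compact histogram: the range from min to max is split into equal-width bins and each bin is drawn as a block character. The Python sketch below is one plausible way to compute such a line, assuming 20 bins scaled to the tallest bin; it is not necessarily how `mco puppet summary` draws its bars, and the sample times are hypothetical:

```python
BARS = "▁▂▃▄▅▆▇█"


def sparkline(values, buckets=20):
    """Histogram of `values` over equal-width bins from min to max, each bin
    drawn as a block character scaled to the tallest bin."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / buckets or 1  # all-equal values: avoid dividing by 0
    counts = [0] * buckets
    for v in values:
        index = min(int((v - lo) / width), buckets - 1)  # max goes in last bin
        counts[index] += 1
    peak = max(counts)
    line = "".join(BARS[round(c / peak * (len(BARS) - 1))] for c in counts)
    return line, lo, hi


# Hypothetical config retrieval times in seconds, one per node.
times = [2.7, 3.1, 4.0, 5.2, 8.8, 9.5, 14.0, 21.3, 30.2, 57.1]
line, lo, hi = sparkline(times)
print(f"Config Retrieval time (seconds): {line} min: {lo} max: {hi}")
```

Reading such a line left to right shows where most nodes cluster and whether a long tail exists, which is exactly what the next slides investigate.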

Page 32: Managing Puppet using MCollective

Distribution of various metrics.

Config Retrieval time (seconds): ▆▇▅▄▁▃▃▁▁▁▃▁▁▄▂▁▁▁▁▁ min: 2.7 max: 57.1
Total run-time (seconds): ▇▃▄▄▄▃▂▂▂▂▃▂▁▁▁▁▁▂▁▁ min: 7.0 max: 125.1

Performance Analysis

Page 33: Managing Puppet using MCollective

Distribution of config retrieval time.

$ mco plot resource config_retrieval_time

[Plot: "Information about Puppet managed resources", number of nodes (0 to 8) against Config Retrieval Time (0 to 60 seconds)]

Performance Analysis

Slow machines

Page 34: Managing Puppet using MCollective

Find machines with config_retrieval_time over 30 seconds: all the dev servers.

$ mco find -S "resource().config_retrieval_time > 30"
dev3.example.net
dev4.example.net
dev7.example.net
dev6.example.net
dev8.example.net
dev9.example.net
dev10.example.net

Performance Analysis

Page 35: Managing Puppet using MCollective

Maintenance Windows and Access Control

Page 36: Managing Puppet using MCollective

Only cert=manager can enable and disable the Puppet agent, indicating maintenance periods

policy default deny
allow    cert=manager     enable disable    *                          *
allow    cert=sysadmin    runonce status    *                          *
allow    cert=developer   *                 environment=development    *

Puppet State As ACL

Page 37: Managing Puppet using MCollective

Puppet State As ACL

policy default deny
allow    cert=manager     stop start    *                          *
allow    cert=noc         stop start    puppet().enabled=false
allow    cert=developer   *             environment=development    *

NOC can start and stop services only during a maintenance window, that is, while Puppet is disabled on the node.

Manager user can always override maintenance windows.
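The idea here is that live Puppet state becomes an authorization input: the same NOC request is denied while Puppet is enabled and allowed once a manager has disabled it. A Python sketch of evaluating the slide's policy, assuming rules are checked top to bottom with the first match deciding and the default applying otherwise; `RULES`, `authorize` and the node dictionaries are hypothetical, and the separate fact and class columns are folded into one condition for brevity:

```python
# Rules from the slide's policy, as (effect, caller, actions, condition).
# Simplification: fact and class columns are folded into one condition.
RULES = [
    ("allow", "cert=manager", {"stop", "start"}, "*"),
    ("allow", "cert=noc", {"stop", "start"}, "puppet().enabled=false"),
    ("allow", "cert=developer", "*", "environment=development"),
]
DEFAULT = "deny"


def condition_holds(condition, node):
    """Check one rule condition against a node's facts and live Puppet state."""
    if condition == "*":
        return True
    key, value = condition.split("=", 1)
    if key == "puppet().enabled":
        # Live agent state, not a static fact: disabled Puppet = maintenance.
        return str(node["puppet_enabled"]).lower() == value
    return node["facts"].get(key) == value


def authorize(caller, action, node):
    """Assumed semantics: first matching rule decides, else the default."""
    for effect, rule_caller, actions, condition in RULES:
        if rule_caller != caller:
            continue
        if actions != "*" and action not in actions:
            continue
        if condition_holds(condition, node):
            return effect == "allow"
    return DEFAULT == "allow"


web1 = {"puppet_enabled": True, "facts": {"environment": "production"}}
print("noc stop, normal operation:", authorize("cert=noc", "stop", web1))
web1["puppet_enabled"] = False  # maintenance window opened by a manager
print("noc stop, maintenance window:", authorize("cert=noc", "stop", web1))
```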

Page 38: Managing Puppet using MCollective

What is MCollective?

• Ruby framework for writing Orchestration systems

• Provides Authentication, Authorization and Auditing

• No direct communication between client and nodes

Page 39: Managing Puppet using MCollective

Questions?

twitter: @ripienaar

email: [email protected]

blog: www.devco.net

github: ripienaar

freenode: Volcane

Questions?