Managing Puppet using MCollective

Post on 10-May-2015

DESCRIPTION

R.I. Pienaar's talk "Managing Puppet using MCollective", given at Puppet Camp Ghent 2013 and Puppet Camp New York 2013.

Transcript

R.I.Pienaar

Puppet Camp Ghent

Managing Puppet using MCollective

R.I.Pienaar | rip@devco.net | http://devco.net | @ripienaar

Who am I?

• Puppet user since 0.22.x

• Architect of MCollective

• Author of Extlookup and Hiera

• Developer at Puppet Labs London

• Blog at http://devco.net

• Tweets at @ripienaar

• Volcane on IRC

The Problem?

• Puppet needs management just like other software

• Enabling, disabling, ad-hoc runs, custom environments etc

• The Puppet Master is a finite resource that needs protection

• Orchestrated deploys

Available on yum.puppetlabs.com and apt.puppetlabs.com

http://srt.ly/mcpuppet

package { ["mcollective-puppet-agent", "mcollective-puppet-client"]: ensure => present }

MCollective Puppet Agent

Obtaining The Agent Status

Obtaining Statuses

$ mco puppet status

* [ ============================================================> ] 11 / 11

node8.example.net: Currently stopped; last completed run 14 minutes 16 seconds ago ....

Summary of Applying:

false = 11

Summary of Daemon Running:

stopped = 11

Summary of Enabled:

enabled = 10
disabled = 1

Summary of Idling:

false = 11

Finished processing 11 / 11 hosts in 72.05 ms

Per node status

Estate wide summary

$ mco puppet count

Total Puppet nodes: 11

Nodes currently enabled: 10
Nodes currently disabled: 1

Nodes currently doing puppet runs: 5
Nodes currently stopped: 6

Nodes with daemons started: 10
Nodes without daemons started: 1
Daemons started but idling: 6

Obtaining Statuses

$ mco rpc puppet last_run_summary

* [ ============================================================> ] 28 / 28

. . .

Summary of Config Retrieval Time:

Average: 20.13

Summary of Total Resources:

Average: 435

Summary of Total Time:

Average: 39.33

Finished processing 28 / 28 hosts in 311.23 ms

Obtaining Statuses

Running Puppet

$ mco puppet runonce

* [ ============================================================> ] 11 / 11

node9.example.net Request Aborted Puppet is disabled: 'machine under maintenance'

Finished processing 11 / 11 hosts in 2593.85 ms

$ mco puppet count

Total Puppet nodes: 11

Nodes currently enabled: 10
Nodes currently disabled: 1

Nodes currently doing puppet runs: 2
Nodes currently stopped: 9

Nodes with daemons started: 10
Nodes without daemons started: 1
Daemons started but idling: 8

Doing Basic Runs

Puppet 3 disable message

Run with default configured splay and splaylimit

Run with no splay, still subject to enable/disable

$ mco puppet runonce -f

* [ ============================================================> ] 11 / 11

node9.example.net Request Aborted Puppet is disabled: 'machine under maintenance'

Finished processing 11 / 11 hosts in 2661.99 ms

Doing Basic Runs

Force splay and set a custom splay limit

$ mco puppet runonce --splay --splaylimit 120

* [ ============================================================> ] 11 / 11

node9.example.net Request Aborted Puppet is disabled: 'machine under maintenance'

Finished processing 11 / 11 hosts in 2661.99 ms

Doing Basic Runs

Selects 2 tags in a specific Puppet Environment

$ mco puppet runonce --tag webserver --tag syslog --environment development

* [ ============================================================> ] 11 / 11

node9.example.net Request Aborted Puppet is disabled: 'machine under maintenance'

Finished processing 11 / 11 hosts in 2661.99 ms

Tags and Environment

Do a noop run, gathers reports and audit information

$ mco puppet runonce --noop

* [ ============================================================> ] 11 / 11

node9.example.net Request Aborted Puppet is disabled: 'machine under maintenance'

Finished processing 11 / 11 hosts in 2661.99 ms

Doing noop Runs

When puppet.conf has noop=true, do an actual run on demand

$ mco puppet runonce --tag webserver --no-noop

* [ ============================================================> ] 11 / 11

node9.example.net Request Aborted Puppet is disabled: 'machine under maintenance'

Finished processing 11 / 11 hosts in 2661.99 ms

Doing no-noop Runs

Does a single run against a different Puppet Master

$ mco puppet runonce --server secops.example.net:8134 --tag compliance

* [ ============================================================> ] 11 / 11

node9.example.net Request Aborted Puppet is disabled: 'machine under maintenance'

Finished processing 11 / 11 hosts in 2661.99 ms

Choosing a Master
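
The runonce variants above differ only in their flags, so a wrapper script can assemble them from options. The following is a minimal sketch, not part of the MCollective tooling: the function name `runonce_command` and its parameters are hypothetical, and it only emits flags that appear on the slides above.

```python
from typing import List, Optional, Sequence


def runonce_command(
    tags: Sequence[str] = (),
    environment: Optional[str] = None,
    server: Optional[str] = None,
    noop: Optional[bool] = None,
    force: bool = False,
    splaylimit: Optional[int] = None,
) -> List[str]:
    """Build an argument list for `mco puppet runonce` (hypothetical helper).

    Only flags shown on the slides are used: -f, --splay, --splaylimit,
    --tag, --environment, --server, --noop and --no-noop.
    """
    if force and splaylimit is not None:
        # The slides describe -f as "run with no splay", so combining it
        # with a forced splay would be contradictory.
        raise ValueError("force and splaylimit are mutually exclusive")

    cmd = ["mco", "puppet", "runonce"]
    if force:
        cmd.append("-f")
    if splaylimit is not None:
        cmd += ["--splay", "--splaylimit", str(splaylimit)]
    for tag in tags:
        cmd += ["--tag", tag]
    if environment:
        cmd += ["--environment", environment]
    if server:
        cmd += ["--server", server]
    if noop is True:
        cmd.append("--noop")
    elif noop is False:
        cmd.append("--no-noop")
    return cmd
```

The returned list can be handed to `subprocess.run(cmd, check=True)` on a machine where the `mco` client is installed and configured.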

Preventing Puppet Runs

The Big Red Button

Disables Puppet; does not change the reason on nodes that are already disabled

$ mco puppet disable "we f'd up, stop the train!"

* [ ============================================================> ] 11 / 11

node9.example.net Request Aborted Could not disable Puppet: Already disabled

Summary of Enabled:

disabled = 11

Finished processing 11 / 11 hosts in 90.06 ms

The Big Green Button

Enables the nodes whose disable message matches /stop the train/; node9, disabled earlier for maintenance, stays disabled

$ mco puppet enable -S 'puppet().disable_message=/stop the train/'

* [ ============================================================> ] 10 / 10

Summary of Enabled:

enabled = 10

Finished processing 10 / 10 hosts in 90.06 ms
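
The interplay between the two buttons can be modelled in a few lines. This is a minimal sketch of the behaviour the slides describe, not the agent's code: the `Estate` dictionary and both function names are hypothetical, and the filter is reduced to a regular expression match on the stored message.

```python
import re
from typing import Dict, List, Optional

# Hypothetical in-memory model: node name -> disable message, or None when enabled.
Estate = Dict[str, Optional[str]]


def disable(estate: Estate, message: str) -> Dict[str, str]:
    """Disable every enabled node and return the nodes whose request aborted.

    Nodes that are already disabled keep their original reason.
    """
    aborted = {}
    for node, current in estate.items():
        if current is not None:
            aborted[node] = "Could not disable Puppet: Already disabled"
        else:
            estate[node] = message
    return aborted


def enable_matching(estate: Estate, pattern: str) -> List[str]:
    """Enable only the nodes whose disable message matches the pattern."""
    regex = re.compile(pattern)
    enabled = []
    for node, current in estate.items():
        if current is not None and regex.search(current):
            estate[node] = None
            enabled.append(node)
    return enabled
```

Because the maintenance message on node9 is never overwritten, the later enable with a message filter leaves that node disabled.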

Operating On Groups Of Hosts

Selective Runs

Run using a filter: all web servers with fact cluster=a

$ mco puppet runonce -W "cluster=a roles::webserver"

* [ ============================================================> ] 5 / 5

Finished processing 5 / 5 hosts in 90.06 ms

cluster=a: Facter fact. roles::webserver: Puppet class.

Selective Runs

Run using a filter: nodes where we manage /srv/www

$ mco puppet runonce -S "resource('File[/srv/www]').managed=true"

* [ ============================================================> ] 5 / 5

Finished processing 5 / 5 hosts in 90.06 ms

Any Puppet resource

Selective Runs

Run using a filter: nodes whose most recent run had config_version xyz and more than 5 resource failures

$ mco puppet runonce -S "resource().failed_resources>5 and resource().config_version=xyz"

* [ ============================================================> ] 5 / 5

Finished processing 5 / 5 hosts in 90.06 ms

Runs all nodes with a maximum concurrency

$ mco puppet runall 7
2013-01-19 20:58:59: Running all nodes with a concurrency of 7
2013-01-19 20:58:59: Discovering enabled Puppet nodes to manage
2013-01-19 20:59:02: Found 11 enabled nodes
2013-01-19 20:59:06: node3.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:07: node1.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:09: node4.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:10: node6.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:12: node0.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:13: node5.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:17: Currently 7 nodes applying the catalog; waiting for less than 7
2013-01-19 20:59:21: Currently 7 nodes applying the catalog; waiting for less than 7
2013-01-19 20:59:25: node9.example.net schedule status: Puppet is currently applying a catalog, cannot run now
2013-01-19 20:59:29: node8.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:33: Currently 7 nodes applying the catalog; waiting for less than 7
2013-01-19 20:59:38: node2.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:41: Currently 7 nodes applying the catalog; waiting for less than 7
2013-01-19 20:59:46: middleware.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:50: Currently 7 nodes applying the catalog; waiting for less than 7
2013-01-19 20:59:55: node7.example.net schedule status: Started a background Puppet run

Roll Out A Change Quickly

Does not attempt to manage disabled nodes

2013-01-19 20:58:59: Running all nodes with a concurrency of 7
2013-01-19 20:58:59: Discovering enabled Puppet nodes to manage
2013-01-19 20:59:02: Found 11 enabled nodes

Roll Out A Change Quickly

Starts the first 6 nodes quickly, but takes into account that an administrator is doing 1 other run at the same time

2013-01-19 20:59:02: Found 11 enabled nodes
2013-01-19 20:59:06: node3.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:07: node1.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:09: node4.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:10: node6.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:12: node0.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:13: node5.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:17: Currently 7 nodes applying the catalog; waiting for less than 7

Roll Out A Change Quickly

node9 was already being run by an administrator or by its normal schedule, so it was skipped and the next node was tried

2013-01-19 20:59:17: Currently 7 nodes applying the catalog; waiting for less than 7
2013-01-19 20:59:21: Currently 7 nodes applying the catalog; waiting for less than 7
2013-01-19 20:59:25: node9.example.net schedule status: Puppet is currently applying a catalog, cannot run now
2013-01-19 20:59:29: node8.example.net schedule status: Started a background Puppet run

Roll Out A Change Quickly

Regularly checks the concurrency and starts more nodes as soon as possible.

Average node run time 34.39 s, total time 55 seconds

2013-01-19 20:59:29: node8.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:33: Currently 7 nodes applying the catalog; waiting for less than 7
2013-01-19 20:59:38: node2.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:41: Currently 7 nodes applying the catalog; waiting for less than 7
2013-01-19 20:59:46: middleware.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:50: Currently 7 nodes applying the catalog; waiting for less than 7
2013-01-19 20:59:55: node7.example.net schedule status: Started a background Puppet run

Roll Out A Change Quickly
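
The scheduling behaviour visible in the runall log can be sketched as a simple loop. This is an illustration under stated assumptions, not the actual `runall` implementation: `run_all` and its three callbacks (`count_applying`, `start_run`, `wait`) are hypothetical names standing in for the status query, the run request and the pause between checks.

```python
from typing import Callable, Iterable, List


def run_all(
    nodes: Iterable[str],
    concurrency: int,
    count_applying: Callable[[], int],
    start_run: Callable[[str], bool],
    wait: Callable[[], None],
) -> List[str]:
    """Start a run on every node while keeping the estate under a concurrency limit.

    count_applying() returns how many nodes are applying a catalog right now,
    including runs started by administrators or the normal schedule.
    start_run(node) returns False when the node is already busy.
    """
    log = []
    for node in nodes:
        # Hold back until fewer than `concurrency` nodes are applying.
        applying = count_applying()
        while applying >= concurrency:
            log.append(
                f"Currently {applying} nodes applying the catalog; "
                f"waiting for less than {concurrency}"
            )
            wait()
            applying = count_applying()

        if start_run(node):
            log.append(f"{node} schedule status: Started a background Puppet run")
        else:
            # Busy nodes are skipped rather than retried, as with node9 in the log.
            log.append(
                f"{node} schedule status: "
                "Puppet is currently applying a catalog, cannot run now"
            )
    return log
```

Counting every applying node, rather than only the runs this loop started, is what lets the limit account for the administrator's extra run in the example.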

Does runonce in batches of 5, with a 5 minute sleep per batch. Press ^C after any batch to stop.

15 minute total run time.

$ mco puppet runonce --batch 5 --batch-sleep 300

* [ ============================================================> ] 11 / 11

Finished processing 11 / 11 hosts in 903686.29 ms

Roll Out A Change Slowly

Wait 5 minutes
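
The 15 minute figure follows from the batch arithmetic. The sketch below is only a back-of-the-envelope calculation with hypothetical function names; the `sleep_after_last` default is an assumption inferred from the reported 903686.29 ms, which is close to three sleeps of 300 seconds for 11 nodes in batches of 5.

```python
import math


def batch_count(nodes: int, batch_size: int) -> int:
    """Number of batches needed to cover all nodes."""
    return math.ceil(nodes / batch_size)


def total_sleep_seconds(
    nodes: int,
    batch_size: int,
    batch_sleep: float,
    sleep_after_last: bool = True,  # assumption, inferred from the slide's timing
) -> float:
    """Time spent sleeping between batches, ignoring the requests themselves."""
    batches = batch_count(nodes, batch_size)
    sleeps = batches if sleep_after_last else batches - 1
    return max(sleeps, 0) * batch_sleep
```

For the example on the slide, `batch_count(11, 5)` gives 3 batches and `total_sleep_seconds(11, 5, 300)` gives 900 seconds, leaving a few seconds for the requests.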

Advanced Status And Performance Metrics

Distribution of various metrics.

$ mco puppet summary

Summary statistics for 28 nodes:

Total resources: ▂▇▂▁▁▃▁▂▂▂▄▁▂▁▁▁▁▁▂▁ min: 332.0 max: 695.0
Out Of Sync resources: ▇▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ min: 0.0 max: 2.0
Failed resources: ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ min: 0.0 max: 0.0
Changed resources: ▇▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ min: 0.0 max: 2.0
Config Retrieval time (seconds): ▆▇▅▄▁▃▃▁▁▁▃▁▁▄▂▁▁▁▁▁ min: 2.7 max: 57.1
Total run-time (seconds): ▇▃▄▄▄▃▂▂▂▂▃▂▁▁▁▁▁▂▁▁ min: 7.0 max: 125.1
Time since last run (seconds): ▇▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂ min: 10.0 max: 89.0k

Performance Analysis
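
Each bar line in the summary reads as a histogram of one metric across the nodes, drawn left to right from min to max. The following is a minimal sketch of how such a line could be produced; it is not the plugin's code, and the function names, the 20 bucket default and the scaling rule are assumptions chosen to resemble the output above.

```python
from typing import List, Sequence

BARS = "▁▂▃▄▅▆▇"


def histogram(values: Sequence[float], buckets: int = 20) -> List[int]:
    """Count how many values fall into each of `buckets` equal-width ranges."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / buckets or 1.0  # avoid a zero width when all values are equal
    counts = [0] * buckets
    for value in values:
        index = min(int((value - lo) / width), buckets - 1)
        counts[index] += 1
    return counts


def sparkline(counts: Sequence[int]) -> str:
    """Render bucket counts as one bar character per bucket, scaled to the tallest."""
    peak = max(counts) or 1
    top = len(BARS) - 1
    return "".join(BARS[round(count * top / peak)] for count in counts)
```

Read this way, a tall bar on the left with a small bump on the far right, as in "Time since last run", means most nodes ran recently and a few have not run for a long time.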

Distribution of various metrics.

Config Retrieval time (seconds): ▆▇▅▄▁▃▃▁▁▁▃▁▁▄▂▁▁▁▁▁ min: 2.7 max: 57.1
Total run-time (seconds): ▇▃▄▄▄▃▂▂▂▂▃▂▁▁▁▁▁▂▁▁ min: 7.0 max: 125.1

Performance Analysis

Distribution of config retrieval time.

$ mco plot resource config_retrieval_time

[ASCII plot "Information about Puppet managed resources": number of nodes (y axis, 0 to 8) against Config Retrieval Time (x axis, 0 to 60)]

Performance Analysis

Slow machines

Find machines with config_retrieval_time over 30 seconds: all the dev servers.

$ mco find -S "resource().config_retrieval_time > 30"
dev3.example.net
dev4.example.net
dev7.example.net
dev6.example.net
dev8.example.net
dev9.example.net
dev10.example.net

Performance Analysis

Maintenance Windows and Access Control

Only cert=manager can enable and disable the Puppet Agent, indicating maintenance periods

policy default deny
allow    cert=manager      enable disable    *                          *
allow    cert=sysadmin     runonce status    *                          *
allow    cert=developer    *                 environment=development    *

Puppet State As ACL

Puppet State As ACL

policy default deny
allow    cert=manager      stop start    *                          *
allow    cert=noc          stop start    puppet().enabled=false
allow    cert=developer    *             environment=development    *

NOC can start and stop services only during a maintenance window.

Manager user can always override maintenance windows.
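
How such a policy grants or denies a request can be sketched with a small evaluator. This is a simplified illustration, not the authorization plugin's code: it assumes tab-separated fields and first-match-wins ordering, ignores the class column, and treats `puppet().enabled` as a plain key in a hypothetical `state` dictionary instead of querying the agent.

```python
from typing import Dict, List, NamedTuple, Tuple


class Rule(NamedTuple):
    verdict: str   # "allow" or "deny"
    caller: str    # e.g. "cert=manager", or "*"
    actions: str   # space-separated action names, or "*"
    facts: str     # space-separated key=value tests, or "*"


def parse_policy(text: str) -> Tuple[str, List[Rule]]:
    """Parse tab-separated policy lines into a default verdict and ordered rules."""
    default = "deny"
    rules: List[Rule] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("policy default"):
            default = line.split()[-1]
            continue
        verdict, caller, actions, facts = line.split("\t")[:4]
        rules.append(Rule(verdict, caller, actions, facts))
    return default, rules


def is_allowed(default: str, rules: List[Rule], caller: str, action: str,
               state: Dict[str, str]) -> bool:
    """Return the verdict of the first matching rule (assumed ordering)."""
    for rule in rules:
        if rule.caller not in ("*", caller):
            continue
        if rule.actions != "*" and action not in rule.actions.split():
            continue
        if rule.facts != "*":
            tests = (test.split("=", 1) for test in rule.facts.split())
            if not all(state.get(key) == value for key, value in tests):
                continue
        return rule.verdict == "allow"
    return default == "allow"


# The second policy from the slides, written with explicit tab separators.
POLICY = (
    "policy default deny\n"
    "allow\tcert=manager\tstop start\t*\t*\n"
    "allow\tcert=noc\tstop start\tpuppet().enabled=false\n"
    "allow\tcert=developer\t*\tenvironment=development\t*\n"
)
```

With this model, a request from `cert=noc` to stop a service is allowed only while the node's state reports `puppet().enabled` as `false`, which is the maintenance window the slide describes.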

What is MCollective?

• Ruby framework for writing Orchestration systems

• Provides Authentication, Authorization and Auditing

• No direct communication between client and nodes

Questions?

twitter: @ripienaar

email: rip@puppetlabs.com

blog: www.devco.net

github: ripienaar

freenode: Volcane
