Best Practices: Demonstrating Value with BSA - BMC Software · BMC Server Automation (BladeLogic) v8.2 Best Practices Demonstrating Value with BSA (BladeLogic) Sean Berry Lead, Customer

© Copyright 1/10/2013 BMC Software, Inc

Argentina: 0800 444 6440
Australia: 1 800 612 415
Austria: 0800 295 780
Bahamas: 1 800 389 0491
Belgium: 0 800 75 636
Brazil: 0800 891 0266
Bulgaria: 00 800 115 1141
Chile: 123 0020 6707
China, Northern Region: 10 800 714 1509
China, Southern Region: 10 800 140 1376
Colombia: 01 800 518 1171
Czech Republic: 800 700 715
Denmark: 80 883 277
Dominican Republic: 1 888 752 0002
France: 0 800 914 176
Germany: 0 800 183 0299
Greece: 00 800 161 2205 6440
Hong Kong: 800 968 066
Hungary: 06 800 112 82
India: 000 800 1007 613
Indonesia: 001 803 017 6440
Ireland: 1 800 947 415
Israel: 1 80 925 6440
Italy: 800 789 377
Japan: 00348 0040 1009
Latvia: 8000 3523
Lithuania: 8 800 3 09 64
Luxembourg: 800 2 3214
Malaysia: 1 800 814 723
Mexico: 001 800 514 6440
Monaco: 800 39 593
Netherlands: 0 800 022 1465
New Zealand: 0 800 451 520
Norway: 800 138 41
Panama: 00 800 226 6440
Peru: 0800 54 129
Philippines: 1 800 111 010 55
Poland: 00 800 112 41 42
Portugal: 800 827 538
Russian Federation: 810 800 2915 1012
Singapore: 800 101 2320
Slovenia: 0 800 80439
South Africa: 0 800 982 304
South Korea, Korea, Republic Of: 003 0813 2344
Spain: 900 937 665
Sweden: 02 079 3266
Switzerland: 0 800 894 821
Taiwan: 00 801 127 186
Thailand: 001 800 156 205 2068
Trinidad and Tobago: 1 800 205 6440
United Kingdom: 0 808 101 7156
Uruguay: 0004 019 0348
Venezuela: 0 800 100 8540

INTERNATIONAL TOLL FREE: Participant Code: 704371

Best Practices: Demonstrating Value with BSA


Housekeeping

Please ask questions in the “Q&A” section, not in Chat:
- Many “Q&A” questions can be addressed during the session by our experts, while Chat is not seen by the Presenter until the very end of the session

BSA BP Webinar Series:
- https://communities.bmc.com/communities/docs/DOC-21692

BMC Server Automation (BladeLogic) v8.2

Best Practices
Demonstrating Value with BSA (BladeLogic)
Sean Berry
Lead, Customer Engineering Operations


Disclaimers

First Level Training
Best Practice vs. How To
Covers Most Common Tasks
Does not address every scenario
Assumes prior knowledge of BSA components and terms


Agenda

Language, Terms and Concepts
- Dollars and Hours
- Objects and Scripts
- Reporting & Metrics

Application
- Easy Value Realization / Packaging Knowledge
- Fully Realized Use Cases (CLC, OIC, FSP)
- Reliable, Repeatable

Where to Start
Questions & Feedback


Why does value matter to me?

What value does automation bring to the organization?
How is it going to make my job easier?
How is it going to make me look better to my boss?
How is it going to make me and my team more marketable (inside and outside the company)?
Ideally, your resume shouldn’t only list your job descriptions; it should show what you accomplished, and what you will be able to accomplish in the future.
Dollar values and metrics on your resume mean more to a company than a list of tasks such as “I installed an agent”.


Goals

Be able to:
Talk about your server automation environment in dollars and cents: how much money does good reporting or compliance save your company every day/week/month/year?
Identify the major use cases in your BSA environment, and how they add value:
- faster provisioning
- faster reaction to issues
- faster mean time to repair (MTTR)
- lower cost of management
- faster customer response

Identify the next use cases you want your group to take on, and start building a business case for rolling them out
Speak to the costs of automation, and where it makes sense (macros vs. AI)
Speak to the percentage of project (revenue-impacting) vs. maintenance (overhead) work


“It doesn’t need to be pretty or shiny, it just needs to get the job done.”
What does an outage cost your company in dollars per hour?
- Do you have a check for everything that’s ever caused an outage in your environment? Is it built into your build policy? You have a build policy, right?


Different kinds of value

Getting Value From BladeLogic
- What goes into a server and why does it matter?
- How are data centers built? How do we organize around them? How do servers end up there? What’s a datacenter and why put them there and not under our desks?

- Value comes both as capital-V Value, measured by the CTO, and small-v value, measured by whether you spend the rest of the week cranking on something or get it wrapped up tonight before you go home.

- BSA, in the hands of someone who knows how to use it (either through training or experience), is a force multiplier. We estimated at one customer that a skilled BSA user can be 3-5x more productive than an equivalent UNIX or Windows sysadmin. Being able to take on more tasks in a given window of time (more “project” work vs. maintenance work) adds value.


Introduction

Artifacts in the “Best Practices” franchise
- BSA Best Practices Webinar Series: https://communities.bmc.com/communities/docs/DOC-21692

- BSA 8.2 base documentation: https://docs.bmc.com/docs/display/bsa82/Home

- Deployment Architecture: https://docs.bmc.com/docs/display/bsa82/Deployment+architecture

- Sizing and Scalability: https://docs.bmc.com/docs/display/bsa82/Sizing+and+scalability+factors

- Disaster Recovery and High Availability: https://docs.bmc.com/docs/display/bsa82/High+availability+and+disaster+recovery

- Large Scale Installations: https://docs.bmc.com/docs/display/bsa82/Large-scale+installations

- BSA Database Cleanup Best Practice White Paper (internal) https://docs.bmc.com/docs/display/NP/BSA+Database+Cleanup

- Agent Cleanup blcli “Delete cleanup*” spaces

Dollars and Hours


Jobs in Dollars and Hours

What does an FTE or contractor cost per hour in your org?
- Base salary + Fully loaded: w/ benefits/overhead/cubicle/workstation/VPN/travel
  - $60k salary -> $30/hr base cost = ~$60-75/hr “loaded”
  - 40*52 – vacation = 2000 working hours (w/o overtime)

For a given script execution, audit, compliance run, or software deploy:
- How long would it have taken for an individual to execute this task by hand?
  - Including staging time
  - Including identifying the correct servers
  - Including verifying availability
  - Could a level one or level two resource have done this task?
- How long does it take to run the job once?
- How long does it take to schedule the job once?

Vs:
- How much upkeep is required to maintain the job going forward?
  - Including updating smartgroups (should be marginal or zero)
- 3x manual for setup, then marginal costs
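The arithmetic on this slide can be sketched in a few lines of Python. The $60k salary, the 2000 working hours, and the "3x manual for setup" rule of thumb come from the slide. The load multiplier (2.0 to 2.5, inferred from the $60-75/hr figure), the reading of "3x" as three times one manual run, and all function names are assumptions made for illustration.

```python
import math

WORKING_HOURS_PER_YEAR = 2000  # 40 * 52 minus vacation, as on the slide


def base_hourly_rate(salary):
    """Base cost per hour before benefits and overhead."""
    return salary / WORKING_HOURS_PER_YEAR


def loaded_hourly_rate(salary, load_multiplier):
    """Fully loaded cost per hour; the multiplier is an assumption."""
    return base_hourly_rate(salary) * load_multiplier


def manual_cost(hours_per_run, runs, hourly_rate):
    """Cost of doing the task by hand for the given number of runs."""
    return hours_per_run * runs * hourly_rate


def automated_cost(hours_per_run, runs, hourly_rate,
                   setup_factor=3.0, upkeep_hours_per_run=0.0):
    """One-time setup (assumed 3x one manual run) plus per-run upkeep."""
    setup = setup_factor * hours_per_run * hourly_rate
    upkeep = upkeep_hours_per_run * runs * hourly_rate
    return setup + upkeep


def break_even_runs(hours_per_run, setup_factor=3.0, upkeep_hours_per_run=0.0):
    """Smallest run count at which automation costs no more than manual work."""
    saved_per_run = hours_per_run - upkeep_hours_per_run
    if saved_per_run <= 0:
        return None  # automation never pays back under these inputs
    return math.ceil(setup_factor * hours_per_run / saved_per_run)
```

For a hypothetical 4-hour manual task at $60/hr loaded, ten manual runs cost $2,400 while the automated equivalent costs $720, and automation breaks even on the third run.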


Job Security vs. Project Work

Most organizations:
- 80% Maintenance / Keep The Lights On
- 20% Project Work (new initiatives, things that bring in revenue)
- Maintenance -> overhead: first place to cut costs
- CIO/CTO: “How can I get more of my projects done this year?”

Easy to see “Job Security” in the maintenance, but once automation becomes standard…
Outsourcing vs. Automation:
- Common to see 10 offshore resources executing patching on 10-15 servers each, manually
- One engineer can commonly execute automated patching against several hundred machines: more automated, fewer human errors.
80% of downtime caused by human error: reduce exposure

Objects and Scripts


Objects & Scripts

What’s a script?
- A series of commands, sometimes including error-checking or conditional flows, to accomplish a specific goal.
Common scripting languages include various shells (Bourne, Korn, C, etc.), DOS/Command, Visual Basic (vbs), PowerShell, Perl, Expect
Many scripts start their lives as “pipe lines”: several commands piped together to find a specific item of information or answer a specific question
Scripts are a great tool in the hands of a skilled user, but can sometimes be more difficult to effectively delegate to L1/L2 users
- Power tools: don’t always have safeguards
- Effective testing
- Required options: passing blank arguments or no arguments into scripts that do “rm” type actions
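The "required options" hazard above is the kind of safeguard a delegated script needs before it does anything destructive. A minimal sketch, assuming a cleanup script that should only ever delete subdirectories of one known root; the function names and the single-root restriction are illustrative choices, not part of BSA.

```python
import os
import shutil


def validate_target(arg, allowed_root):
    """Return the absolute path to delete, or raise ValueError if unsafe."""
    # A blank or missing argument must never fall through to a delete.
    if arg is None or not arg.strip():
        raise ValueError("refusing to run: target argument is blank")
    root = os.path.realpath(allowed_root)
    target = os.path.realpath(os.path.join(root, arg.strip()))
    # Reject the root itself and anything that resolves outside of it.
    if target == root or os.path.commonpath([root, target]) != root:
        raise ValueError("refusing to run: %r is not below %s" % (arg, root))
    return target


def clean_directory(arg, allowed_root):
    """Delete one validated subdirectory and report what was removed."""
    target = validate_target(arg, allowed_root)
    shutil.rmtree(target)
    return target
```

Failing loudly on a blank argument costs one extra line for the script author and removes the most common way an "rm" type script surprises an L1/L2 operator.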


Objects & Scripts

What’re Objects?
- The set of “nouns” in BladeLogic, like files, directories, configuration entries, registry keys, software packages (both platform-specific and platform-agnostic), service definitions, virtual guest packages, against which the “verbs” like Audit, Snapshot, Package, Deploy, and Rollback/Undo can be used.

What’s the difference?
- One-off configuration audits, rather than retrieving and parsing config files (or parsing in-place on remote servers), become a matter of identifying the desired configuration, and a fast audit, with clear color-coded callouts of which config is correct, incorrect, or missing.
- No more automation required around “ssh”, transport is taken care of.
- A “canned” software package and deploy job can be created by a domain expert working with a BSA expert to correctly install/upgrade an agent in an hour or two of effort. Afterwards, this process (package + job) can be delegated to L1/L2 users, included in the new server provisioning process, and used as a remediation action by the build compliance process.


Objects (continued)

What’s the difference? (continued)
- The intelligence about how to talk to different operating systems, parse configuration files, and deploy/rollback software is already either built or templated in. You get to start two steps ahead. (process development gets cheaper)

- Since the Objects and Jobs are supported by someone else, you’re not stuck supporting your scripts forever, unable to get promoted because you’re “too critical” to take on new responsibilities.


Would You Like To Know More?

“Scripty” post in the Optimize IT Blog: https://communities.bmc.com/communities/community/bsm_initiatives/optimize_it/blog/2011/01/14/scripty
Automation in Cooking: https://communities.bmc.com/communities/community/bsm_initiatives/optimize_it/blog/2012/02/24/everything-i-know-about-automation-i-learned-from-my-sous-vide-supreme

Reporting & Metrics


Reporting on Jobs in Dollars and Hours

For a given script execution, audit, compliance run, or software deploy:
- How long would it have taken for an individual to execute this task by hand?
  - Including staging time
  - Including identifying the correct server
  - Including verifying availability
- How long does it take to run the job once?
- How long does it take to schedule the job once?
- How much upkeep is required to maintain the job going forward?
  - Including updating smartgroups (should be 0)

How often were you running that task?
- Were you only running it occasionally because the overhead of the process was too high to run more often?

How often does that job run now?
- Biannual or quarterly compliance audits vs. weekly or even daily visibility into compliance
- Cost of being out of compliance
- Cost of getting back to a compliant state


Reporting: Inputs for Presentations & Models

At least one large financial institution uses the output from BSA, combined with some custom reports and a couple of good spreadsheets, to demonstrate value delivered with BSA
$10MM++ project
Quarterly Business Reviews / Cost Justifications
Headcount Justification
Metrics are Meaningful & Powerful:
- Hard to argue with facts & numbers
- Easier to argue with interpretation of facts

Conservative estimates always help, better to aim low
Don’t try to do –everything- in Reporting
Don’t be discouraged if you do have to do some post-processing


Server Automation Lifecycle: Report

Executive Perspective
- Business analytics
- Key Performance Indicators

Decision Support
- Operations reporting
- Continuous improvement

State of Compliance
- Self-certification reports
- Full template for each standard


Server Automation: Comprehensive Visibility

Pre-defined Standard Reports
- Audit results
- Trends

Self-Certification Compliance Reports
- PCI
- HIPAA
- ITIL v3

User Definable Reports
- Ad-hoc queries
- Customize formats, branding and calculations

Dashboard Summary Reports
- Value framework ROI metrics
- Validates ROI goals of business case are being achieved


Reporting in Dollars and Hours

BDSSA provides OOTB reports that can help report in terms of dollars and hours, but you may still end up needing to either create a custom report or do some post-processing in Excel
- There’s still value in being able to generate the underlying stats
- Use what’s available out of the box or with small amounts of work to help support your business case
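As an alternative to spreadsheet post-processing, a small script can turn exported job statistics into hours and dollars. This is a sketch only: the CSV layout (job_name, runs, manual_minutes_per_run) is a hypothetical export format, not a BDSSA report schema, and the manual-minutes column is an estimate you would supply yourself.

```python
import csv
import io


def summarize_jobs(csv_text, loaded_hourly_rate):
    """Return (job_name, manual_equivalent_hours, dollars) per exported row."""
    summary = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Hours a person would have spent doing every run by hand.
        hours = int(row["runs"]) * float(row["manual_minutes_per_run"]) / 60.0
        summary.append((row["job_name"], hours, hours * loaded_hourly_rate))
    return summary


def total_dollars(summary):
    """Sum the dollar column across all jobs."""
    return sum(dollars for _name, _hours, dollars in summary)
```

Keeping the manual-minutes estimate conservative makes the resulting totals easier to defend in a business review.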

Fully Realized Use Cases (CLC, OIC, FSP)


Fully Realized Use Cases

These use cases assume a fully operational BSA environment. Some require integrations with a Change or Incident system.
The road to implementing these use cases has many steps, and requires:
- Functional process
- Buy-in from all impacted groups
- Working integrations & supported software versions
- A healthy infrastructure environment
- Trained and effective staff
- Ongoing support

Closed Loop Compliance
Operator Initiated Change
Full Stack Provisioning


Closed Loop Compliance

Large Insurance Company’s closed loop compliance story.
Compliance Initiatives:
- Regulatory requirement: demonstrate server hardening / compliance to a security policy or face a $2MM fine. Could just as easily have been a reaction to bad press at their company or another.
- Industry/Public/Customer Pressure: (PCI)
Requirements:
- Demonstrate 100% compliance to hardening policy w/ exceptions
- Without tripling headcount
- Create an incident ticket for every finding
- Change tickets
- Exceptions
- Reporting



Practical solution:
- Customized hardening policies from Out of the Box (value: didn't have to start from scratch)
- With workflows (available pre-built these days)
- Creates incidents when alerts are generated, and executes the remediation process. (value: many manual steps, now runs quickly)
- Compliance jobs run on a daily or weekly basis
- Results are inspected right in the BSA console, and exceptions are logged from the same console.
- Headcount vs. workflows built once and maintained -> lower cost.



Weekly/daily lights-out audits vs. manual or semi-automated quarterly audits
Previously cost-prohibitive
The "invisible" cost: configuration drift between audits and inertia
- Fear of change/risk
- More regular audits: easier enforcement






Operator Initiated Change

Operator Initiated Change:
- a change is selected or defined by the operator
- linked into Change Management
- when approved (and maintenance window reached), the approved Change executes
Value:
- Effective Change Process
- Less time spent in Change meetings
- much better change visibility and documentation
- lower total risk
- (morale?)


Full Stack Provisioning (Day 1)

Initial Build Process: More than bare metal provisioning / template deploy
Many solutions: Most value comes after the “bare metal”: configuration, hardening, agent stack, middleware provisioning and configuration, install of 3rd party apps, Content deploy (J2EE, .NET, web, app)
- Most of the cost of provisioning: different build technologies support staffing TIME

Many participants and steps:
- each contributor has a hand-off and an SLA.
- If the SLA for each step is 3 days: 3 days * 10 steps = 30 business days = 6 WEEKS


Server Automation Lifecycle – Deploy / Provision

Full Stack Provisioning layers:
- Operating System (OS)
- OS Configuration
- Applications
- App Configuration
- Data

Stages:
- Rack & Stack: Setup Hardware
- OS Provisioning: Install Operating System
- Application Provisioning: Simple and Complex Applications
- Server Hardening: Apply Security Policies and Patch

Required Capabilities
- Bare-Metal Provisioning
- Virtualization Template Deployment
- Windows Image-Based Provisioning

Required Capabilities
- Environment-Aware Packaging
- Model-Based Configuration Management
- Granular, Surgical Configuration Control

Required Capabilities
- Exception-Handling allows for flexibility
- Roll-Back reduces the risk of changes


Server Lifecycle


Full Stack Provisioning Value Requirements

Functional Bare Metal and/or Virtual Guest Provisioning Environment & Team
- Provisioning
- Virtualization (on all platforms: VMware, Hyper-V, Solaris Zones, IBM LPARs, etc.)

Functional Packaging and Promotion
- BLPackager
- Software Packages (incl. Custom Software Packages)
- NSH Scripts & Jobs

Functional Compliance & Hardening
- Every system should leave the “Server Factory” fully secured & compliant with:
  - Security (CIS, DISA, custom)
  - Regulatory (PCI, HIPAA, GLB, SOX, custom)
  - Build Policies (OS platform, Middleware Platform, Data Center-specific)

Functional Patching & Hardening
- Every system should leave the “Server Factory” fully patched to the current policy (no “big leaps” to get patched to standard)

Functional Inventory/Snapshot

Packaging L3 Know-how for L1/L2 Users


L3 know-how

Talk track
Skilled admins & subject matter experts (SMEs) usually have the privileges to maintain any component of a server or application; however, agent maintenance & other common tasks are not necessarily a good use of their time.
Agent install/upgrade & other common tasks can be easily packaged by SMEs
L1/L2 can then execute these tasks whenever needed, as many times as required.


Current Inventory vs. Spreadsheets

Most inventories
- Static spreadsheets, “stale once emailed”
- Compiled quarterly (or worse)
- Hard to correct/feedback

BladeLogic Customer example
- Automated inventory survey -> report
- Massive power outage
- Used current inventory spreadsheet to build a “restart” plan

Value
- “Date Updated” indicates last contact, currency of data
- Current inventory increases confidence in decisions
- BSA seen as “source of truth” for the data center
- Inventory information used in Smart groups to quickly answer questions like:
  - “How many Windows 2008 Servers do we have in Production?”
  - “How many RHEL 5 in the San Jose data center?”

I Found Something Wrong (ad-hoc & build audit)


Ad-hoc Audits

What does an outage cost your company in dollars per hour?
Insurance Company – acquired resources
- Small set of servers, not built by our process
- Remote Data Center: out of sight, out of mind
- DNS, service accounts not set up correctly: when there’s an incident only a couple of people know how to get into these systems
- Response time, service level is poor -> service perception is poor: low value

Datacenter move
- Chicago data center: moving between facilities.
- Significant pre-planning executed, some “invisible” assumptions.
- When Chicago DNS server went offline, so did customer e-servicing
  - “Put it back!” -> delayed move for hours
  - Service unavailable or underperforming for 5 hours
  - Isolated to misconfigured resolv.conf: several sysadmins had looked at that configuration: only caught through scripted comparison.
- Basic build audits could have caught or prevented
- Thousands of dollars of lost revenue


Configuration Compliance in Banking Use Case

Large financial institution near NYC, casual conversation discovered:
Contractor assigned on a 90-day project to verify & reconcile /etc/resolv.conf entries
Contractor probably billed at least $60/hr: 90 days * 8 hrs * $60 = at least a $43K problem
Problem phases:
- What do I have? (Discovery / Inventory)
- Which is correct? (Manual/human interaction & Audit)
- Identify incorrect servers (Snapshot & Audit or live-live Audit)
- Package Changes (from Audit results)
- Change approval (usually an external process)
- Deploy Changes (execute Deploy)
- Rollback in event of issues

Simple audit of /etc/resolv.conf using existing server smartgroups
- < 1 hr “door-to-door”
- Existing “intrinsic” standards become obvious

How many places in this process can we cut out cost? Do you want to spend 90 days chasing one fairly basic set of configurations?
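To make the "scripted comparison" concrete, here is a rough sketch of the audit logic for nameserver entries. It is not how BSA performs an audit; it assumes the file contents have already been collected into a dictionary keyed by hostname (with None for a host where the file was absent), and the function names are made up for the example.

```python
def parse_nameservers(resolv_conf_text):
    """Return the nameserver addresses in the order they appear."""
    servers = []
    for raw in resolv_conf_text.splitlines():
        # Drop comments, which start with '#' or ';'.
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "nameserver":
            servers.append(parts[1])
    return servers


def audit_resolv_conf(collected, expected_servers):
    """Classify each host as 'correct', 'incorrect', or 'missing'."""
    results = {}
    for host, text in collected.items():
        if text is None:
            results[host] = "missing"
        elif parse_nameservers(text) == expected_servers:
            results[host] = "correct"
        else:
            results[host] = "incorrect"
    return results
```

Order matters in this comparison because resolvers try nameservers in the order listed, so a host with the right servers in the wrong order is still flagged.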


Build Audits

“One true build policy”:
- Single OS -> at least a secure and “standard” build

Many servers in a data center -> at least a few common traits per group
Most orgs have –some- kind of build standard
- scribbled notes on a sheet passed around between admins
- Under-utilized word doc
- Configurations built into bare metal provisioning system (kickstart/jumpstart/etc.)
Most non-automated build standards aren’t complete, and are rarely updated.


Build Standards

Drift: Standards change over time, “July 2011 build”
6-12 different builds over three years (times the number of different kinds of builds)
Vs. standard RHEL 5 build that changes over time
- Evaluate all servers to that standard regularly

Builds break down into major components: a given set of vertically aligned components is sometimes called a “stack”.
- “SQL Server 2008” stack might be
  - built on Windows 2008 R2,
  - on virtual or on a standard make and model of hardware (HP DL380 G??),
  - have a standard set of agents appropriate for a database server, etc.


Build Standards

The build standard consists of the:
- hardware (virtual or physical)
- operating system
- OS configurations & hardening
- agent stack
- middleware or applications
- middleware/application configurations
- Middleware/application content (web content, J2EE/.NET apps, etc.)
- Governing policies
  - Patching
  - Security/Regulatory
  - Build standard


Build Standards

These can all be different policies, which only need to apply to the specific servers they’re relevant to. Even a single policy with a few rules can deliver value, and is a great place to start.
Once built, the next time a configuration either causes a problem, or someone remarks on a misconfiguration, create a rule for it.


Change Tracking

This is common any time we want to know when something has changed, but once it's changed, we want to use that as the new standard.
Not to be confused with a build audit, where any deviation from standard requires remediation.
Sometimes called a "rolling" audit: this gives visibility into authorized and unauthorized change, and can be used to either verify configuration change, or identify unauthorized change.
Auditing the entire machine (some 100,000 configurations) will generate mostly noise. Filter down to known, managed configuration items.
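The rolling-audit idea can be illustrated with a short sketch: compare the current state to the previous snapshot for a filtered set of managed items, report the differences, then promote the current state to be the new baseline. The dictionary-based snapshot format and the function names are assumptions for the example, not BSA's snapshot model.

```python
def diff_managed_items(baseline, current, managed_keys):
    """Return {key: (old, new)} for managed items whose value changed."""
    changes = {}
    for key in sorted(managed_keys):
        old = baseline.get(key)  # None means the item was absent
        new = current.get(key)
        if old != new:
            changes[key] = (old, new)
    return changes


def rolling_audit(baseline, current, managed_keys):
    """Report changes, then return the current state as the next baseline."""
    changes = diff_managed_items(baseline, current, managed_keys)
    # Only managed items are carried forward; everything else is noise.
    next_baseline = {k: current[k] for k in managed_keys if k in current}
    return changes, next_baseline
```

A build audit would differ in one step: the baseline would stay fixed and every reported change would be a candidate for remediation rather than becoming the new standard.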

Reliable, Repeatable


Typical Non-RSCD Agent Deployment

A basic deployment can consist of something as simple as dropping a tarball on a system, extracting it, and running a command.
However, most deployments worth automating rarely stay so simple.
Now we need to be able to pass a hostname, or a directory to install in, or create a user account for the agent to run under.
Test whether directory is present, writeable, correct perms
Do the right thing if user account is already present.
Handle error conditions.
Need to be able to train our users to be able to understand the results of this deploy process.
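The pre-flight checks described above might look like the following sketch for a UNIX target. It only checks and reports; it does not extract or install anything. The function name, the result layout, and the injectable lookup parameter are illustrative assumptions, and the messages are written so an L1/L2 user can read the outcome without knowing the script.

```python
import os
import pwd


def preflight(install_dir, user, lookup=pwd.getpwnam):
    """Check the install directory and service account before deploying."""
    messages = []
    dir_ok = True
    if not os.path.isdir(install_dir):
        dir_ok = False
        messages.append("FAIL: install directory is missing: %s" % install_dir)
    elif not os.access(install_dir, os.W_OK | os.X_OK):
        dir_ok = False
        messages.append("FAIL: install directory is not writeable: %s" % install_dir)
    else:
        messages.append("OK: install directory is present and writeable")

    try:
        lookup(user)  # raises KeyError when the account does not exist
        user_exists = True
        messages.append("OK: account %s already present, skip creation" % user)
    except KeyError:
        user_exists = False
        messages.append("TODO: account %s not found, create it before install" % user)

    return {"ok": dir_ok, "user_exists": user_exists, "messages": messages}
```

Treating a missing account as a to-do rather than a failure lets the same check serve both first-time installs and upgrades.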


Easy Value Demonstrations

Directory/file sync: scheduled, logged, auditable
Embed non-NSH script
Anything that consolidates information: remote inventory, cmd or file pickup
Config file audit: resolv.conf, ntp.conf, backup agent config
- Easy to add new config file, new grammar

Basic software deploy: build once, use many times
- Easy for L1/L2 to use via Execution Tasks
- Easy to use in Provisioning
- Audit/Compliance: Use for remediation

Build Compliance
- Start with basic hardening: sshd/PermitRootLogin
- Required agents & versions
- Services running/disabled

Any semi-manual or tedious task executed weekly
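As an illustration of the first hardening rule mentioned above, the sketch below checks a collected sshd_config for an explicit "PermitRootLogin no". It is a simplified stand-in for a compliance rule, not BSA rule syntax. It relies on sshd using the first value it finds for a keyword and treating keywords case-insensitively; it stops at the first Match block and ignores the "keyword=value" form, and the function names are hypothetical.

```python
def permit_root_login(sshd_config_text):
    """Return the global PermitRootLogin value, or None if it is not set."""
    for raw in sshd_config_text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        parts = line.split(None, 1)
        keyword = parts[0].lower()
        if keyword == "match":
            break  # conditional blocks are out of scope for this sketch
        if keyword == "permitrootlogin" and len(parts) == 2:
            return parts[1].strip()
    return None


def root_login_compliant(sshd_config_text):
    """Rule: root login must be explicitly disabled in the global section."""
    return permit_root_login(sshd_config_text) == "no"
```

Requiring an explicit setting avoids depending on the default, which can differ between OpenSSH versions.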

Where to Start


Additional Resources & Information

BSA Best Practices Webinar Series: https://communities.bmc.com/communities/docs/DOC-21692

Online Documentation
- BSA Deployment Architecture Best Practices: http://docs.bmc.com/docs/display/public/bsa82/Deployment+architecture
- Product Documentation: http://docs.bmc.com/docs/display/public/bsa82/Home

BMC Communities (public forum)
- BMC website: documents, discussions, whitepapers, additional information
- https://communities.bmc.com/communities/community/bmcdn/bmc_service_automation/server_configuration_automation_bladelogic

What to do when you inherit a BSA installation, including “How to” videos: https://communities.bmc.com/communities/community/bsm_initiatives/optimize_it/blog/2012/06/15/taking-the-reins-server-automation


Howto Videos

Initial Install – Database Setup: On BMCdocs YouTube at http://www.youtube.com/watch?v=91FEUDVD6sE
Initial Install – File Server and App Server Installs: On Communities YouTube at http://www.youtube.com/watch?v=m7Y3SY23kuQ
Initial Install – Console GUI and Appserver Config: On Communities YouTube at http://www.youtube.com/watch?v=uwqlj60Lvo0
Compliance Content Install: On BMCdocs YouTube at http://www.youtube.com/watch?v=bXdaogDsCNc
Compliance Quick Audit: On BMCdocs YouTube at http://www.youtube.com/watch?v=i8BLi4WAWEY
BSA 8.2 Patching - Setting Up a Windows Patch Catalog: On Communities YouTube at http://www.youtube.com/watch?v=nfpFpOuub9k
Windows Patch Analysis: On Communities YouTube at http://www.youtube.com/watch?v=ODWhC01uEaQ
Patching in Short Maintenance Windows with BMC BladeLogic Server Automation: On Communities YouTube at http://www.youtube.com/watch?v=o6Lfzbb3JZg
Basic Software Packaging: http://www.youtube.com/watch?feature=player_embedded&v=dtOWTTFqsaY
SOCKS Proxies: https://communities.bmc.com/communities/community/bmcdn/bmc_service_automation/server_configuration_automation_bladelogic/blog/2012/11/30/how-to-use-socks-proxies-with-bsa-to-deal-with-firewalls-and-overlapping-ip-ranges

Questions and Feedback

Change and Build Audits


Change and Build Audit Use Cases Tracking

Change Tracking is the most basic form of Build Compliance. It says that something, once configured according to a standard, shouldn't change without authorization.
A typical configuration might be a local account deployed on servers, or DNS Server entries (on UNIX, this is typically in /etc/resolv.conf). There are several more advanced ways to do this (including a really beautiful demonstration of the uses of the Property Dictionary), but the basic use case is easy to set up, and easy to show initial value.
