BSA Deployment Best Practice Part 1 - BMC Software · 2020-05-04 · BMC Server Automation (BladeLogic) v8.2 Best Practices Deployment and Configuration Session 1 ... Deploy core

© Copyright 11/4/2012 BMC Software, Inc


BSA Deployment Best Practice Part 1

BMC Server Automation (BladeLogic) v8.2

Best Practices
Deployment and Configuration Session 1
Sean Berry
Lead, Customer Engineering Operations


Overview

First Level Training
- Basic Deployment Knowledge

Best Practice vs. How To

Covers Core BSA Components

Does not address every scenario

Assumes prior knowledge of BSA components and terms


Agenda

Phased Deployment
Core Components
Scaling BladeLogic
Geographically-Distributed Installations
Redundancy / Fault Tolerance
Security Best Practices
Configuration Guidance
Questions & Feedback


Introduction

Artifacts in the “Best Practices” franchise
- BSA 8.2 base documentation: https://docs.bmc.com/docs/display/bsa82/Home
- Deployment Architecture: https://docs.bmc.com/docs/display/bsa82/Deployment+architecture
- Sizing and Scalability: https://docs.bmc.com/docs/display/bsa82/Sizing+and+scalability+factors
- Disaster Recovery and High Availability: https://docs.bmc.com/docs/display/bsa82/High+availability+and+disaster+recovery
- Large Scale Installations: https://docs.bmc.com/docs/display/bsa82/Large-scale+installations
- App Server sizing spreadsheet (internal)

Read the book, or just see the movie?
- Definitely read the book(s)


Phased Deployment – Customer First

BSM & Automation involves many moving parts and dependencies
- People
- Process
- Products

Understand the environment first
- Business
- Technical

Design and document initial plan first
- Deployment depends on Implementation Architecture
- Implementation Architecture depends on Scale
- This session is focused on Deployment Architecture

Deployment will be completed in phases.
- Don’t try to do everything at once (“Boil the Ocean”)
- A plan is required
- It’s important to consider dependencies
- Plan for what the product & your environment are capable of today.


Phased Deployment – Major Considerations

Drivers
- Business
  - Specific project goals: compliance deadline, tool replacement, labor savings
  - Other Projects
  - Time Lines
- Technical
  - What is installed vs. not installed
    – AO? Provisioning? Patching? # of Platforms?
    – ITSM, CMDB?
  - Environment Readiness
    – Hardware
    – Defined policies
    – Access (PXE, DHCP, firewalls, SOCKS, etc.)
    – Choose initial use cases with highest value, greatest potential for success, clearest requirements
- People
  - Vacations, other assignments, etc.
  - Training (Use cases, general BSA foundational knowledge, etc.)
  - Team priorities (where are the burning needs)
  - Concurrence


Configuration Automation Solution Summary

[Diagram: capabilities arranged along a MATURITY axis, labeled Discovery, Configuration Policy Management, Application Release Management, Service Provisioning & Re-purposing, Software Distribution, Infrastructure Provisioning, Patch Management, Change Management, and Compliance (Build, Security, Regulatory)]

Improve Service Quality – Execute complex changes correctly
Reduce Risk – Comprehensive rollback for changes; Define and enforce administrative roles
Reduce Cost – Single change process across all platforms; Automate configuration & performance mgmt processes


Phased Deployment – Recommended Order

Phase 1 – Implementation Architecture
- Determine and document implementation architecture
- Depends on Scalability
- Must be done first for every project big or small

Phase 2 – Deploy core infrastructure
- Database
- Application Servers & File Server[s]
- Client Software (RCP Console GUI, NSH, etc.)
- RSCD Agents
- Reporting
- Repeaters (as applicable)
- Order and content will vary a bit depending on the project

Phase 3 – Deploy initial use cases
- Inventory/Discovery
- Basic Compliance measurement (Patch, Build, Regulatory/Security)


Phased Deployment – Recommended Order (cont’d)

Phase 4 – Configure Initial Reporting
- Show value & metrics back to the business early on

Phase 5 – Configure Software Deployment
- Build one or two basic software deployment packages first (start with the easy ones)
- Parameterize the packages as necessary next
- Get comfortable setting options third

Phase 6 – Setup Patching & Provisioning
- Identify supporting infrastructure & start change requests
- Define build inputs & policies
- Patch: Start with the most generous policy in analyze-only mode

Phase 7 – Setup Script Execution
- Roll out existing scripted actions, refactor Inventory & Configuration change / Configuration validation where possible

Phase 8 – Setup Closed Loop Change or Operator-Initiated Change, CMDB sync

Phase 9 – Identify & Measure KPIs
- Patch compliance %, number of servers provisioned per week, number of remediations per month, number of server touches, etc.


Phased Deployment – Last Key Points

Always keep the Business in mind

Solution Acceptance / Perception
- Work to show and prove value early
  - Identify basic compliance up front
  - Inventory/Asset Reporting
  - If it’s not reportable, it’s difficult to demonstrate value

Major Mistakes to Avoid
- Incomplete Asset Reporting
  - Inaccurate / unknown state
  - Perceived solution failure
- Incomplete Use Case Deployment / Training
  - Reports look good, but users are not comfortable with basic tasks
  - Perceived solution failure

Agents must be deployed comprehensively, early (90+%)

Don’t do compliance in reporting only: better to define policy than dump bulk info


Terminology

Target
- A managed server or OS instance running the RSCD agent

Application Server
- An instance of the software that does work in the form of Jobs, communicating with the RSCD agents

File Server
- One or more systems, running the RSCD agent, that make available one shared storage space for payloads and scripts used by the Application Servers

Repeater
- One or more systems located in remote data centers for efficient one-to-many replication of deployment payloads

Console
- An instance of the graphical client, used to interact with the Application Servers and managed Targets

Network Shell
- The command-line tool used to interact with servers


Terminology (cont’d)

Server Smart Group
- A way of grouping servers together by common properties: OS, location, environment

Job
- An automation task that interacts with one or more servers, usually uses Server Smart Groups

Depot Object
- A package, file, patch, configuration item, or software object not associated with any specific server


Basic Architecture – Consoles & Administration

Core Components
- BSA RCP Console (“the GUI”)
- Network Shell
- BLCLI
- Web Services Interface

Optional Components (deploy/use when necessary)
- BDSSA (Reporting) Web Interface
- PXE/TFTP Servers, Provisioning Datastores
- Repeaters
- Unified Agent Installer
- Atrium CMDB Integration Engine
- AO Approvals (Operator Initiated Change)
- Active Directory Synchronization
- PATROL Agent Installers
- Virtualization, AO, ADDM Integrations
- Advanced File Server & Advanced Repeater

Core Components


BMC Server Automation (BladeLogic): Logical Architecture

[Diagram: three tiers labeled Console, Mid Tier, and Nodes]


Application Server Types

[Diagram: an “ALL” type server = JOB + CONFIG + NSH Proxy]


BMC Server Automation (BladeLogic): App Server Types

[Diagram: three tiers labeled Console, Mid Tier, and Nodes]


App Server Types

“ALL” type servers include the function of all three of: JOB, CONFIG, and NSH Proxy servers

JOB servers:
- heavy lifting
- run Jobs, compliance calculations
- Key players in the bulk of use cases.
- Very good at using all available resources for fast Job execution.

CONFIG servers:
- receive incoming user connections from the GUI (RCP Console), and Command Line Interface (CLI or blcli).
- When run only as a CONFIG or CONFIG/NSH Proxy server, better UI performance than when shared with JOB.


BMC Server Automation (BladeLogic): Client Access Without NSH Proxy

[Diagram: three tiers labeled Console, Mid Tier, and Nodes]


BMC Server Automation (BladeLogic): Client Access With NSH Proxy

[Diagram: three tiers labeled Console, Mid Tier, and Nodes]


App Server Types (cont’d)

“ALL” type servers include the function of all three of: JOB, CONFIG, and NSH Proxy servers

NSH Proxy servers provide:
- connectivity to managed targets (solves some firewall challenges)
- authentication for NSH
- centralized audit point (solves some audit requirements)
- an extra layer of security protecting NSH agents


BMC Server Automation (BladeLogic): Depot

[Diagram: three tiers labeled Console, Mid Tier, and Nodes]


Depot Information

File Server – can be anything that will run the RSCD agent and has sufficient storage space (local or NAS/SAN). File server can be on NAS only with Linux/UNIX.

Database – SQL Server or Oracle: this is where all metadata, configurations, change tracking data, Jobs, and anything else that’s not an installable or executable are stored.

Given a good copy/backup of the File Server and the Database, disaster recovery is fairly straightforward. Without one or the other, recovery can be very difficult.


BMC Server Automation (BladeLogic): Repeaters for Remote Networks

[Diagram: three tiers labeled Console, Mid Tier, and Nodes]


BMC Server Automation (BladeLogic): SOCKS Proxy for Restricted Networks

[Diagram: three tiers labeled Console, Mid Tier, and Nodes]


Network Problem Solving: Proxies & Repeaters

SOCKS Proxy
- When agents are behind a firewall and app server is outside
- One port through firewall, many from SOCKS to managed targets

Repeaters, Advanced Repeaters
- Can save significant bandwidth over a WAN
  - App server to repeater (once), then Repeater to target(s)
- For “indirect” deploy jobs
- Advanced Repeater uses Marimba technology to reduce WAN traffic even more

Repeaters & SOCKS Proxy can be combined together
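The bandwidth saving from a repeater can be sketched with simple arithmetic. The Python below is illustrative only: the function name, the 200 MB payload, and the 50 targets are assumptions, and it ignores protocol overhead and any extra reduction an Advanced Repeater might provide.

```python
def wan_traffic_mb(payload_mb: float, remote_targets: int, use_repeater: bool) -> float:
    """Approximate payload traffic crossing the WAN for one deploy.

    Assumes a direct deploy sends one copy per remote target, while an
    indirect deploy sends one copy to the repeater, which then fans out locally.
    """
    if remote_targets <= 0:
        return 0.0
    return payload_mb if use_repeater else payload_mb * remote_targets


# Hypothetical example: a 200 MB package going to 50 servers in a remote data center.
direct = wan_traffic_mb(200, 50, use_repeater=False)
indirect = wan_traffic_mb(200, 50, use_repeater=True)
print(f"direct: {direct} MB over the WAN, via repeater: {indirect} MB")
```

The more targets sit behind one WAN link, the larger the gap between the two figures.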

Scaling BladeLogic


Scaling Up, Scaling Out

First things first: Make sure your database can handle the load
- regular cleanup (weekly & historicals daily if necessary)
- monitoring of growth, I/O, CPU usage, etc.
- Sufficient db connections to support # of WIT, etc.

Vertically scale your application servers
- increase job server work-item threads and JVM heap size
- exploit available CPU and memory (typical machine size these days)

Horizontally scale your application servers
- add job servers as needed
- survey spreadsheet based tool for job server estimation
- add config servers as needed, with load balancers
  - load balancer can be avoided if user population naturally partitions

Job and config servers could be hosted on the same physical host
- Rule of thumb: two CPU cores per app server instance
- Ensure sufficient physical memory

Virtualized App Servers – one per OS image (8-12GB RAM, 2 vCPU, 4GB heap recommended)
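As a rough illustration of the "two CPU cores per app server instance" rule of thumb, the sketch below estimates how many instances a physical host could carry. The function name is hypothetical, and the 8 GB per-instance memory figure is an assumption borrowed from the low end of the 8-12GB recommendation above.

```python
def max_app_server_instances(cpu_cores: int, ram_gb: float,
                             ram_per_instance_gb: float = 8.0) -> int:
    """Estimate co-hosted app server instances on one physical host.

    Takes the smaller of the CPU limit (two cores per instance) and the
    memory limit (ram_per_instance_gb per instance, an assumed value).
    """
    by_cpu = cpu_cores // 2
    by_ram = int(ram_gb // ram_per_instance_gb)
    return max(0, min(by_cpu, by_ram))


print(max_app_server_instances(cpu_cores=8, ram_gb=32))   # CPU and memory both allow 4
print(max_app_server_instances(cpu_cores=16, ram_gb=24))  # memory is the limiting factor
```

Whichever resource runs out first sets the limit, which is why the slide pairs the CPU rule with "ensure sufficient physical memory".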


Job Server capacity

Number of work item threads per job server is configurable
- Too few threads: jobs could take too long to complete
- Too many threads: JVM could run out of memory

“Lightweight” Work Items can be served out of a separate thread pool
- Some work items (e.g., in deploy jobs) consume very few app server resources, and can be handled with greater parallelism
- Default configuration of zero LWI threads will use normal WITs for LWIs

Asynchronous execution for remote work
- When the work really happens on the target, it’s not necessary to hold up a WIT just to wait for a response
- Used by NSH Script Job (type 3), Patch analysis, Deploy, and SCAP compliance

Configure and Schedule your jobs to distribute load
- Avoid overlaps where possible
- Split targets across multiple jobs if necessary (~4000 servers/job)
- Be careful with “unlimited” job parallelism
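Splitting a large target list into per-job batches of roughly 4000 servers can be sketched as below. This is plain Python over a list of host names, not a BSA API; the function name and the host names are made up for illustration.

```python
def split_targets(targets: list[str], max_per_job: int = 4000) -> list[list[str]]:
    """Split a target list into consecutive batches of at most max_per_job servers."""
    if max_per_job < 1:
        raise ValueError("max_per_job must be at least 1")
    return [targets[i:i + max_per_job] for i in range(0, len(targets), max_per_job)]


# Hypothetical inventory of 9000 servers.
all_servers = [f"host{n:05d}" for n in range(9000)]
batches = split_targets(all_servers)
print([len(batch) for batch in batches])
```

Each batch would then back its own job, and the jobs can be scheduled so that they do not overlap.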


BMC Server Automation (BladeLogic): Scaling the File Server: Single File Server

[Diagram: three tiers labeled Console, Mid Tier, and Nodes]


BMC Server Automation (BladeLogic): Scaling the File Server: Virtualized File Server

[Diagram: three tiers labeled Console, Mid Tier, and Nodes]


File Server Scaling

The app server transfers data from the file server over NSH
- Not as efficient over the network as NFS or CIFS

The file server’s storage is likely to be SAN or NAS already in most environments

Some performance benefit can be realized by mounting the same share on each app server
- Each app server sees the file server as “localhost”
- File storage has to be mounted to the same mount point on each app server
- NSH communication happens over loopback (app server to local agent)
- Data travels over the wire using NFS/CIFS (agent to filer)

The NAS can be clustered/load balanced
- Allows the usual benefits: redundancy, performance
- Usually already have expertise here

If standalone file server, ensure sufficient file handles (>16k) and I/O performance


BMC Server Automation (BladeLogic): Typical Small Infrastructure (~1000 servers)

[Diagram: three tiers labeled Console, Mid Tier, and Nodes]


BMC Server Automation (BladeLogic): Typical Larger Infrastructure

[Diagram: three tiers labeled Console, Mid Tier, and Nodes]

Geographically-Distributed Installations


Geographically-Distributed Installations

App Servers, Database, and GUI Clients should remain close together
- High data volume (sensitive to bandwidth)
- High packet volume (sensitive to latency)
- Database links in particular are sensitive to packet loss

Install Citrix Presentation Server to support remote users
- Or Remote Desktop to provide responsive access to remote users

Install BladeLogic standard Repeaters in each remote data center

Install BladeLogic Advanced Repeaters in each remote data center where bandwidth must be constrained (sensitive network links)

Install provisioning infrastructure in each remote data center
- PXE doesn’t perform reliably across many real-world WANs
- Keep OS images as local to the provisioning target as is reasonable

Install SOCKS Proxies in each remote data center where minimal firewall configurations are required or overlapping IP addresses exist

Redundancy / Fault Tolerance


High Availability

“High Availability” here means that any one failure isn’t catastrophic (no single point of failure)

Database: Clustered database (e.g., SQL Server clustering or Oracle RAC) for highly available data access

File Server: Clustered NAS/SAN server & virtualized fileserver

Multiple Job servers on separate machines
- Job servers can handle their own failover
- Work items in flight on a failed job server will fail

Multiple Config servers and NSH Proxy servers on separate machines, with highly-available load balancer.
- Load balancer handles failover of config servers


BMC Server Automation (BladeLogic): Disaster Recovery

[Diagram: three tiers labeled Console, Mid Tier, and Nodes, with Database Replication and Filesystem Replication links]


Disaster Recovery

Disaster recovery is not “high availability”; it is the recovery of the entire environment after a major catastrophe

Database: Replicate BladeLogic database (e.g., using Oracle DataGuard / GoldenGate / other replication technologies per company standard)

Stand-by infrastructure (job servers, config servers, GUI clients) ready to go at DR site

Failover needs to be “rapid,” not “instantaneous.”
- Failover is manually initiated.

Worst case: with a good copy of the database and file server, the installation can be stood back up by installing a new appserver. So make sure your backups of both are good, and please test them regularly. This is a great way to do upgrade testing in a lab environment.

Security Best Practices


Security Best Practices: Infrastructure

Use App Server Certificates
- By default, the installation process generates self-signed certificates for app servers.
- Bad guys can self-sign certificates, too.
- Recommend investing the effort in using a proper Certificate Authority, even if internal.

Use Client-Side Certificates
- Each management console should also have a certificate
- App servers should be configured to require client-side certificates.

Use an NSH Proxy
- Adds an extra layer of authentication to NSH communication

Don’t allow log-ins on the hosting servers
- Part of protecting the app server infrastructure is to limit access to the underlying servers.

Treat the BLAdmin and RBACAdmin user accounts like ‘root’
- Each user should have their own account, and switch roles when necessary to invoke elevated privilege.


Security Best Practices: Agent Management

Use ‘exports’ file on each agent to restrict access
- Allow connections only from job servers and NSH Proxies
- Prevents rogue appservers or NSH clients from making direct connections

Use “ACL Push” jobs to manage ‘users’ file on each agent
- Establish “ACL Push” jobs for all agents (except file servers!)
- Schedule ACL Push jobs for regular execution (e.g., weekly)
- PUSH_ACL_NO_USERS_FLAG property: leave set

Add BLAdmins:* to ‘users.local’ file on each agent
- Allows access by BLAdmins role as a backup, in the event of an issue.
- (map as a local administrator)

Setup NSH on an existing shared-access UNIX host
- It’s not uncommon to install NSH (client) on an existing shared UNIX host: this is an easy way to configure NSH where many users can access the shell without involving VPNs

Configuration Guidance


App Server Configuration

Java Memory
- 32-bit processes have to walk a fine line
- 64-bit processes have more address space, but also use more memory
- Specific values for each supported environment

Work Item Threads
- Max performance usually comes from max number of work item threads that doesn’t run the JVM out of memory
- Diminishing returns for increasing WITs in job servers
- Recommendation: 50 WITs (32-bit), 100 WITs (64-bit)

Database Connections
- Two ‘job’ database connections per work item thread (MaxJobExecutionConnections)


Configuration Guidance - App Server Parameters

Hosting Environment    Java Heap Size    Physical Memory    Work Item Threads
32-bit Windows         1GB               4GB                50
32-bit Linux           1.5GB             4GB                50
32-bit Solaris         2GB               4GB                50
64-bit (any)           4-6GB             8-12GB             100

Note that you can get twice as many WorkItemThreads (WITs) per appserver instance on 64-bit OSes with a 50% heap:physical memory ratio, but you will need enough memory to support it. Scaling is not necessarily linear, because 64-bit objects require more memory than 32-bit objects. Start with a 4GB heap and leave room to grow if the server is very active.
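The parameter table and the "two job database connections per work item thread" guidance can be combined into a small lookup, sketched below. The dictionary layout and function name are illustrative assumptions; only the numeric values come from the slides, and the result is a starting point, not a tuned configuration.

```python
# Values transcribed from the parameter table; ranges are (min, max) in GB.
SIZING = {
    "32-bit Windows": {"heap_gb": (1.0, 1.0), "physical_gb": (4, 4), "wits": 50},
    "32-bit Linux": {"heap_gb": (1.5, 1.5), "physical_gb": (4, 4), "wits": 50},
    "32-bit Solaris": {"heap_gb": (2.0, 2.0), "physical_gb": (4, 4), "wits": 50},
    "64-bit (any)": {"heap_gb": (4.0, 6.0), "physical_gb": (8, 12), "wits": 100},
}


def recommend(environment: str) -> dict:
    """Return the table row plus the job DB connections implied by 2 per WIT."""
    row = dict(SIZING[environment])  # copy so the table itself is not modified
    row["job_db_connections"] = 2 * row["wits"]
    return row


print(recommend("64-bit (any)"))
```

The derived connection count is the figure the database has to be able to supply for each job server instance.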


Work-Item Capacity / Time Constraints

Task/Maintenance Windows:
- Performance and capacity are often relevant to a specific task or maintenance window: whether it’s an 11PM-7AM Saturday night maintenance window or a 2-hour change window on a week-night, tasks are often time-constrained

Total performance capacity
- Capacity of the environment: number of work items available vs. a given task window
  - An environment with 2 app servers, each with 50 WorkItemThreads (WITs), has 100 total WorkItemThreads: 2 APP x 50 MaxWIT = 100 WIT total
  - In a given 60 minute window: 100 WIT x 60 minutes = 6000 WIT-minutes (WIT-M)

So a task that takes 3 minutes and uses one WIT per server may be able to run across as many as 2000 servers in about an hour at this capacity
- 6000 WIT-M / 3 min/svr/WIT = 2000 servers

In practice, there is also time required for setting up a given Job and closing it down, but this is a useful shorthand.
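The WIT-minute shorthand can be written as a couple of small functions. This is a sketch of the arithmetic above only: the function names are made up, and it ignores Job setup and teardown time just as the shorthand does.

```python
import math


def wit_minutes(app_servers: int, wits_per_server: int, window_minutes: int) -> int:
    """Total work-item-thread minutes available in a maintenance window."""
    return app_servers * wits_per_server * window_minutes


def max_servers(app_servers: int, wits_per_server: int,
                window_minutes: int, minutes_per_server: float) -> int:
    """Servers that fit in the window if each one holds a WIT for minutes_per_server."""
    return int(wit_minutes(app_servers, wits_per_server, window_minutes) // minutes_per_server)


def wits_needed(servers: int, minutes_per_server: float, window_minutes: int) -> int:
    """Inverse question: total WITs needed to cover `servers` within the window."""
    return math.ceil(servers * minutes_per_server / window_minutes)


print(wit_minutes(2, 50, 60))     # 6000 WIT-M, as in the worked example
print(max_servers(2, 50, 60, 3))  # 2000 servers
print(wits_needed(2000, 3, 60))   # 100 WITs
```

The inverse form is useful when the window and server count are fixed and the open question is how many job servers to stand up.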


Work-Item-Thread-Minutes / Time Constraints (cont’d)

Some long-running single-server Jobs (like Provisioning) are constrained by MaxJobs, and may run for a few minutes: here total capacity may be less (typically 20 MaxJobs / appserver)

Some types of NSH Script Jobs run in parallel (type 1 & 3), while others are single-threaded (type 2 & 4)

JOB_TIMEOUT and JOB_PART_TIMEOUT properties ensure that tasks complete or exit within time constraints, and that they don’t wait for non-responsive hosts or threads.

Make sure to test & become familiar with job performance before using in critical maintenance windows
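For Jobs limited by MaxJobs rather than by work item threads, a similar back-of-the-envelope estimate is possible. The sketch below assumes, purely for illustration, that single-server Jobs run back to back in full rounds with no setup overhead; the function name and the 15-minute duration are hypothetical.

```python
def max_single_server_jobs(app_servers: int, window_minutes: int,
                           minutes_per_job: int, max_jobs_per_server: int = 20) -> int:
    """Estimate how many MaxJobs-bound Jobs fit in a window.

    Concurrent slots = app servers x MaxJobs; each slot completes
    window_minutes // minutes_per_job Jobs in sequence (assumed model).
    """
    slots = app_servers * max_jobs_per_server
    rounds = window_minutes // minutes_per_job
    return slots * rounds


# Hypothetical: 2 app servers, 60-minute window, 15-minute provisioning Jobs.
print(max_single_server_jobs(2, 60, 15))
```

Compared with the 2000-server WIT-based figure for the same two app servers, this shows why MaxJobs-bound work yields lower total capacity.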


Config Servers

Driven by User feedback
- UI performance vs. Job

Rule of Thumb: 1 Config Server for every 50 concurrent users
- Users are unpredictable

Config Servers also serve BLCLIs
- called from native shell scripts / NSH scripts
- concurrent BLCLI scripts also count as users

Scalability driven by physical resource utilization
- CPU and Memory (both Physical and Java)

Incoming Users typically distributed by Load Balancers
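The 1-per-50 rule of thumb, with concurrent BLCLI scripts counted as users, can be sketched as follows. The function name and the example numbers are assumptions, and real sizing should follow observed CPU and memory utilization.

```python
import math


def config_servers_needed(gui_users: int, blcli_scripts: int,
                          users_per_server: int = 50) -> int:
    """Config servers suggested by the 1-per-50-concurrent-users rule of thumb.

    Concurrent BLCLI scripts are counted as users; at least one server is returned.
    """
    concurrent = gui_users + blcli_scripts
    return max(1, math.ceil(concurrent / users_per_server))


# Hypothetical load: 120 console users plus 40 concurrent BLCLI scripts.
print(config_servers_needed(120, 40))
```

Because users are unpredictable, rounding up and leaving headroom is the safer reading of the rule.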


Agent Health

Agent Health is critical to successful job runs because the appserver is generous when trying to talk to a slow remote agent.
- JOB_PART_TIMEOUT

Agent Health Survey:
- Servers go up and down regularly
- Run the “Update Server Properties” Job regularly, and before a critical job; it updates the AGENT_STATUS property:
  – “Agent is Alive” for hosts that are up, vs.
  – “Agent is Unavailable” for hosts that are down.
- Use AGENT_STATUS in Server Smart Groups to include only available hosts in Jobs
  - Can’t deploy to a host that’s not up

Recovery:
- Re-run Update Server Properties Job more often against a server group that only includes “down” servers
- Use a Server Smart Group to identify hosts that have been out of contact > 30 days
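The grouping logic that these Server Smart Groups express can be mimicked in a few lines of Python. This is not the BSA API: the record layout, the `last_contact` field, and the host names are hypothetical, and only the AGENT_STATUS values come from the slide.

```python
from datetime import datetime, timedelta

ALIVE = "Agent is Alive"

# Hypothetical inventory records; in BSA this data lives in server properties.
servers = [
    {"name": "web01", "AGENT_STATUS": "Agent is Alive", "last_contact": datetime(2012, 11, 1)},
    {"name": "db01", "AGENT_STATUS": "Agent is Unavailable", "last_contact": datetime(2012, 10, 28)},
    {"name": "old01", "AGENT_STATUS": "Agent is Unavailable", "last_contact": datetime(2012, 8, 15)},
]


def available_targets(records):
    """Hosts that are safe to include in a Job."""
    return [r["name"] for r in records if r["AGENT_STATUS"] == ALIVE]


def down_targets(records):
    """Hosts to re-check more frequently with Update Server Properties."""
    return [r["name"] for r in records if r["AGENT_STATUS"] != ALIVE]


def out_of_contact(records, now, max_age_days=30):
    """Hosts whose last contact is older than max_age_days."""
    cutoff = now - timedelta(days=max_age_days)
    return [r["name"] for r in records if r["last_contact"] < cutoff]


now = datetime(2012, 11, 4)
print(available_targets(servers))
print(down_targets(servers))
print(out_of_contact(servers, now))
```

The three lists correspond to the three groups described above: targets for Jobs, the "down" group for frequent re-checks, and long-absent hosts for follow-up.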



Additional Resources & Information

Online Documentation
- BSA Deployment Architecture Best Practices: http://docs.bmc.com/docs/display/public/bsa82/Deployment+architecture
- Product Documentation: http://docs.bmc.com/docs/display/public/bsa82/Home

BMC Communities (public forum)
- BMC website, documents, discussions, whitepapers, additional information
- https://communities.bmc.com/communities/community/bmcdn/bmc_service_automation/server_configuration_automation_bladelogic

What to do when you inherit a BSA installation, including “How to” videos: https://communities.bmc.com/communities/community/bsm_initiatives/optimize_it/blog/2012/06/15/taking-the-reins-server-automation


Howto Videos

Initial Install – Database Setup: On BMCdocs YouTube at http://www.youtube.com/watch?v=91FEUDVD6sE

Initial Install – File Server and App Server Installs: On Communities YouTube at http://www.youtube.com/watch?v=m7Y3SY23kuQ

Initial Install – Console GUI and Appserver Config: On Communities YouTube at http://www.youtube.com/watch?v=uwqlj60Lvo0

Compliance Content Install: On BMCdocs YouTube at http://www.youtube.com/watch?v=bXdaogDsCNc

Compliance Quick Audit: On BMCdocs YouTube at http://www.youtube.com/watch?v=i8BLi4WAWEY

BSA 8.2 Patching - Setting Up a Windows Patch Catalog: On Communities YouTube at http://www.youtube.com/watch?v=nfpFpOuub9k

Windows Patch Analysis: On Communities YouTube at http://www.youtube.com/watch?v=ODWhC01uEaQ

Patching in Short Maintenance Windows with BMC BladeLogic Server Automation: On Communities YouTube at http://www.youtube.com/watch?v=o6Lfzbb3JZg

Here's another video I made about basic packaging of a Windows MSI. Basic Software Packaging: http://www.youtube.com/watch?feature=player_embedded&v=dtOWTTFqsaY
