Linux/UNIX Centralization at Brookhaven National Lab
Lisa Soto
BNL – Information Technology Division
May 31, 2009
Overview
Background and Philosophy
Pilot to Program
Centrify
Cfengine
Architecture and Controls
Technical Issues
Cultural/Political Issues
Lessons Learned
What’s Next
Background
Starting State
• DOE requires that all computers meet baseline security configuration requirements and have asset management capabilities (e.g. SMS) to report to DOE that baseline controls are in place, systems are patched, etc.
• BNL uses the CIS benchmarks for baseline configurations on Linux/UNIX computers
• Unable to apply baseline security requirements on decentrally managed or unmanaged computers in an efficient and automated manner
• Not all computers have a clear assignment of a responsible systems administrator (either departmental or ITD) to manage the system.
• Policies for computer security requirements and responsibilities for systems administrators have not been clearly defined
• Process for requesting an exception from policy with mitigating controls has not been defined
• This all leads to an inconsistent security posture that cannot pass security audits
What do we need to do?
• Apply centrally managed security controls to enforce the baseline security for all Linux/UNIX computers on the BNL internal network
• Define the process to request a variance from the centrally managed security controls where there is an operational requirement that cannot be met with the security controls in place
• Define policies and responsibilities for Linux/UNIX system administrators, and make sure all computers are being managed by someone who can carry out these responsibilities
Philosophy
Centralization is the tool, not the goal
“GSS” Systems only
• Exempted clusters, control systems, Information Systems
• Providing common controls
Taking a hands-on approach to calm fears and work through methodically, resolving pent-up demand for IT services
Each department/division/group has unique needs to be met for successful deployment
Documentation for all benchmarks as they apply to BNL, code from Cfengine, and other technical documentation is freely available to the users
Cfengine
http://www.cfengine.org/
Automated suite of programs for configuring and maintaining Unix-like computers
Autonomic maintenance system
In use elsewhere since 1993 on a broad installation base
Actively developed and researched
PNL offered support to bring to production very quickly
Needed to rapidly establish a baseline
Centrify
Centralize User Authentication via Active Directory (AD)
• Centrify DirectControl is a commercial product that authenticates Linux/UNIX users against AD
• Faster time to deployment over in-house development
• Zone architecture well-suited to our needs
• Did not require extension of the AD schema
Vendor very responsive to our needs
• Added support for Scientific Linux and RHEL on PPC at our request
Architecture
[Figure: architecture diagram. Components shown: Managed Computer, ORDO, Cfengine Policy Server, mirror.bnl.gov, RedHat Satellite, Central Authentication Server (AD), Centrify DirectControl. Labeled functions: Produce Compliance Audits, Enforce Policy, Distribute Patches.]
Centrally Managed Controls
Computers must be registered with the Network Access Registration system (http://register.bnl.gov)
Users must authenticate to the BNL Active Directory Domain
• Account management in the BNL Domain is centrally managed
Configuration policies are centrally deployed via Cfengine
Linux/UNIX patches are deployed from ITD-maintained repositories (RH Satellite for RedHat, local mirrors for other Linux and UNIX)
Computers permit access to the network vulnerability scanners and are subject to continuous (critical) and quarterly network vulnerability scanning
ORDO is installed and operational
Cfengine is installed and operational
All non-Linux operating systems are fully supported by the vendor for security patches
Linux operating systems are standardized
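The controls above lend themselves to a simple per-host compliance check. The following is only a sketch: the inventory field names (`registered`, `ad_authentication`, and so on) and both function names are hypothetical, and the slides do not describe how ORDO or the Registration DB actually expose this data.

```python
# Hypothetical inventory keys mapped to the controls listed on this slide.
# None of these field names come from ORDO or the Registration DB.
REQUIRED_CONTROLS = {
    "registered": "registered with the Network Access Registration system",
    "ad_authentication": "users authenticate to the BNL Active Directory Domain",
    "cfengine": "Cfengine installed and operational",
    "ordo": "ORDO installed and operational",
    "patch_source": "patches come from an ITD-maintained repository",
    "scanner_access": "network vulnerability scanners permitted",
}


def missing_controls(host_record):
    """Return descriptions of the controls a host record does not satisfy."""
    return [
        description
        for key, description in REQUIRED_CONTROLS.items()
        if not host_record.get(key, False)
    ]


def compliance_report(inventory):
    """Map each non-compliant host name to its list of missing controls."""
    report = {}
    for host, record in inventory.items():
        gaps = missing_controls(record)
        if gaps:
            report[host] = gaps
    return report
```

A host that satisfies every control is simply absent from the report, so an empty report means the inventory is fully compliant under these assumed fields.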
Installation Checklist
Identify machines and their OS using ORDO and Registration DB
Check Machine Registration
Identify system administrators (or owners/users) for each machine
Contact administrators and give them the opportunity to file for variances
Check for supported operating system
• RedHat Enterprise Linux (RHEL) versions 3 and later
• Debian Linux oldstable (etch), stable (lenny), and testing (squeeze)
• Ubuntu versions 6.06 LTS, 8.04 LTS and 8.10
• Scientific Linux versions 3 and later
• Fedora (non-preferred) 9 or 10
• Solaris 8, 9, or 10
• Irix 6.5.22 or higher
ORDO installed
Install cfagent and subscribe to ITD cfengine policy server
• Cfengine will distribute policies
Install Centrify for AD Authentication
• Rationalize UIDs as needed
• Add User accounts as needed
Subscribe to RHEL or mirror.bnl.gov for updates
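The "Check for supported operating system" step can be sketched as a lookup against the list above. This is an illustration only: the short OS names (`rhel`, `scientific`, `irix`, ...) and the function names are assumptions, and version strings are assumed to be purely numeric and dot-separated.

```python
def parse_version(text):
    """Turn '6.5.22' into (6, 5, 22) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def is_supported(os_name, version):
    """Check an OS name and version against the supported list on this slide.

    The short OS names used here are hypothetical labels, not values
    reported by ORDO or the Registration DB.
    """
    name = os_name.lower()
    if name in ("rhel", "scientific"):
        return parse_version(version) >= (3,)       # versions 3 and later
    if name == "debian":
        return version.lower() in {"etch", "lenny", "squeeze"}
    if name == "ubuntu":
        return version in {"6.06", "8.04", "8.10"}
    if name == "fedora":
        return version in {"9", "10"}               # non-preferred
    if name == "solaris":
        return version in {"8", "9", "10"}
    if name == "irix":
        return parse_version(version) >= (6, 5, 22)  # 6.5.22 or higher
    return False
```

Comparing tuples of integers rather than strings matters for Irix: as strings, "6.5.3" sorts after "6.5.22", but as tuples (6, 5, 3) is correctly older than (6, 5, 22).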
Configuration Items Requiring a Variance
Root access for ITD
Banners
SSH Configuration
File Permissions for passwd, shadow, group files
No open X11 connections
Lock accounts with null passwords
Disable insecure protocol daemons
• telnet
• rsh/shell
• rlogin/login
• rexec/exec
Screensaver activates after 15 minutes*
Log forwarding*
Most other controls are “as needed” and controlled by the local administrators
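Two of the items above, null-password accounts and file permissions, are easy to audit with a short script. The sketch below is not the Cfengine policy used at BNL. The maximum modes in `ASSUMED_MAX_MODES` are placeholder values chosen for illustration, since the slides do not state the required permissions, and the function names are hypothetical.

```python
import stat

# Placeholder maximum modes; the actual baseline values are not given here.
ASSUMED_MAX_MODES = {
    "/etc/passwd": 0o644,
    "/etc/shadow": 0o400,
    "/etc/group": 0o644,
}


def null_password_accounts(shadow_text):
    """Return account names whose password field in shadow-format text is empty."""
    names = []
    for line in shadow_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        # Field 0 is the login name, field 1 the password hash.
        if len(fields) >= 2 and fields[1] == "":
            names.append(fields[0])
    return names


def extra_permission_bits(actual_mode, max_mode):
    """Return the permission bits set in actual_mode beyond max_mode (0 if none)."""
    return stat.S_IMODE(actual_mode) & ~max_mode
```

On a live system `actual_mode` would come from `os.stat(path).st_mode`, and a non-zero result from `extra_permission_bits` means the file is more permissive than the assumed baseline.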
Variance from Centrally Managed Security Controls
It may not be possible to apply every centrally managed security control to every Linux/UNIX computer because of operational requirements.
A variance from the standard controls must be documented and approved. This is done using a variance form for requesting an exemption from policy, e.g. computer cannot display banners at every login
Variances will require mitigating action to be taken to compensate for the exempted security control
Variances originate within Cybersecurity group and are approved through an escalation chain
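The variance rules above (a documented request, a required mitigating action, approval through an escalation chain) can be modeled as a small record type. This is a sketch under stated assumptions: the approver names in `ESCALATION_CHAIN` are hypothetical because the slides do not name the chain, and the class is not the actual BNL variance form.

```python
from dataclasses import dataclass, field

# Hypothetical escalation chain; the slides do not name the approvers.
ESCALATION_CHAIN = ("cybersecurity", "itd_management", "laboratory_management")


@dataclass
class VarianceRequest:
    host: str
    control: str            # the exempted control, e.g. "login banner"
    justification: str      # the operational requirement
    mitigations: list = field(default_factory=list)
    approvals: list = field(default_factory=list)

    @property
    def approved(self):
        """True once every step of the chain has signed off, in order."""
        return tuple(self.approvals) == ESCALATION_CHAIN

    def approve(self, approver):
        """Record the next approval, enforcing mitigation and chain order."""
        if not self.mitigations:
            raise ValueError("a variance needs at least one mitigating action")
        if self.approved:
            raise ValueError("variance is already fully approved")
        expected = ESCALATION_CHAIN[len(self.approvals)]
        if approver != expected:
            raise ValueError(f"next approver must be {expected}")
        self.approvals.append(approver)
```

Refusing approval until a mitigation is recorded mirrors the requirement that every variance carries a compensating action.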
Linux/UNIX Sys Admin & Users Responsibilities
In addition to the routine tasks included in system administration (software installation and maintenance, backups, performance tuning, etc.) Linux/UNIX system administrators will:
Cooperate with ITD staff to deploy Cfengine, central authentication, and patch management
Remediate all identified vulnerabilities that centrally managed security controls do not address
Request any needed variance from BNL standards based on operational requirements and ensure that mitigations are applied and remain in place
Respond to cyber security incidents and alert CSIRT
Pilot to Program
Presented at CSAC meetings for several months to track progress of beta and phase 1 machines
Formally wrapped up phase 1 before engaging in further phases
Plotted out phased approach to ramp up number of systems per phase and left largest areas for last
Technical Issues Summary
Technical issues with Centrify, Cfengine, the BNL computing environment, and other areas have been compiled and tracked in a comprehensive spreadsheet
Nearly all issues are either resolved or a workaround is in place
No technical issues that have arisen to date can be classified as “showstoppers” for wider deployment
Cultural/Political Issues Summary
Pushback from the phase 1 departments was as expected
• Open dialogue seems to have calmed many fears
Success of the program depends on the good will of the local system administrators and users
• Anyone with physical access to the machine can easily remove Centrify, cfengine, ordo, etc. or take other action to circumvent security mechanisms
• These are configuration management tools, not security tools
• Development for compliance reporting tools is needed and will be worked on in upcoming phases
Cultural/Political (cont’d)
System administrators are not well defined
• The role of the local administrators is not well defined nor is the roster of individuals
  - This makes interfacing with local users especially challenging if no one can take responsibility for the system
• Difficult to identify undermanaged computing resources and leads to poor cyber security profile
  - Users unaware of how to create new accounts will share passwords and use old accounts
  - Routine patching does not occur
  - Broken computers have no one to take responsibility to repair/replace them
  - Problems become a surprise issue for ITD staff as urgent support requests or when working on centralization for department
Cultural/Political (cont’d)
Pent-up demand exists within departments
• Working methodically through the departments is releasing a lot of pent-up demand for IT services
  - Examples
    - Hardware RMA for 3 systems in Biology
      - These systems failed shortly after they were received. They went unrepaired due to lack of a local system administrator. By then the warranty on parts had long expired, so the system owner had to pay for replacement parts
    - One system showed up as a single machine. It is actually a very poorly maintained cluster environment that suffers from frequent hardware crashes
Cultural/Political (cont’d)
Inheriting Unmanaged environment
• Analyzing problems and understanding systems and how they are used takes time
  - Need to understand problems and strategize for scalable and sustainable solutions
  - Generating significant amount of project-based work to correct or improve long-term problems
  - Generating new responsibilities and additional workload for ITD system administrators in both Unix and Windows environments
• Goal for centralization is to leave systems in better condition than when they were found and maintain state over time, not just install software and leave
Cultural/Political (cont’d)
Significant scheduling conflicts with local users and administrators
• Examples:
  - Can’t work on Physics/C-AD machines while RHIC is running
  - Computing resources in use to do science in support of grant proposals make scheduling system upgrades difficult
    - 1 instance of this delayed completion of project by ~3 weeks
  - More significant as more machines are targeted
Lessons Learned
No technology issues have been “showstoppers”
Cultural/Political conflict is far more of a problem
Cultural/Political issues can be overcome through open communication and a willingness to work with the users
These are configuration management tools, not security tools
Pent-up demand for IT services is prevalent in scientific departments
Slow, methodical progress wins more often than trying to strong-arm deployment
What’s Next
Establish formal change control process to address move/add/change within the centralization code
• Mitigated by vetting changes through Linux Working Group and Cybersecurity Policy Working Group
Roles and Responsibilities (R2A2) of System Administrators is in the process of being revised
• SBMS Subject Area will be updated
System Administrator training is still being developed
Additional reporting components are still under development
Transitioning into maintenance phase by end of calendar year