Ceph in a security critical OpenStack cloud Danny Al-Gaaf (Deutsche Telekom) Deutsche OpenStack Tage 2015 - Frankfurt
Transcript
Page 1: DOST: Ceph in a security critical OpenStack cloud

Ceph in a security critical OpenStack cloud

Danny Al-Gaaf (Deutsche Telekom)
Deutsche OpenStack Tage 2015 - Frankfurt

Page 2

Overview

● Ceph and OpenStack
● Secure NFV cloud at DT
● Attack surface
● Proactive countermeasures
  ○ Setup
  ○ Vulnerability prevention
  ○ Breach mitigation
● Reactive countermeasures
  ○ 0-days, CVEs
  ○ Security support SLA and lifecycle
● Conclusions

Page 3

Ceph and OpenStack

Page 4

Ceph Architecture


Page 5

Ceph and OpenStack


Page 6

Secure NFV Cloud @ DT

Page 7

NFV Cloud @ Deutsche Telekom

● Datacenter design
  ○ BDCs
    ■ few, but classic DCs
    ■ high SLAs for infrastructure and services
    ■ for private/customer data and services
  ○ FDCs
    ■ small, but many
    ■ close to the customer
    ■ lower SLAs, can fail at any time
    ■ NFVs:
      ● spread over many FDCs
      ● failures are handled by the services, not the infrastructure
● Run telco core services @ OpenStack/KVM/Ceph

Page 8

Fundamentals - The CIA Triad

CONFIDENTIALITY: Protecting sensitive data against unauthorized access

INTEGRITY: Maintaining consistency, accuracy, and trustworthiness of data

AVAILABILITY: Protecting systems against disruption of services and loss of access to information

Page 9

High Security Requirements

● Multiple security placement zones (PZ)
  ○ e.g. EHD, DMZ, MZ, SEC, Management
  ○ TelcoWG “Security Segregation” use case
● Separation between PZs required for:
  ○ compute
  ○ networks
  ○ storage
● Protect against many attack vectors
● Enforced and reviewed by security department

Page 10

Solutions for storage separation

● Physical separation
  ○ Large number of clusters (>100)
  ○ Large hardware demand (compute and storage)
  ○ High maintenance effort
  ○ Less flexibility
● RADOS pool separation
  ○ Much more flexible
  ○ Efficient use of hardware
● Question:
  ○ Can we get the same security as with physical separation?

Page 11

Separation through Placement Zones

● One RADOS pool for each security zone
  ○ Limit access using Ceph capabilities
● OpenStack AZs as PZs
  ○ Cinder
    ■ Configure one backend/volume type per pool (with its own key)
    ■ Need to map between AZs and volume types via policy
  ○ Glance
    ■ Lacks separation between control and compute/storage layers
    ■ Separate read-only vs. management endpoints
  ○ Manila
    ■ Currently not planned for production use with CephFS
    ■ May use RBD via NFS
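Mapping one Cinder backend/volume type per pool could look roughly like this; section names, pool names, and keyring users are hypothetical examples, not from the talk:

```ini
# cinder.conf (sketch): one RBD backend per placement-zone pool,
# each with its own CephX user/key
[DEFAULT]
enabled_backends = ceph-pz-dmz, ceph-pz-mz

[ceph-pz-dmz]
volume_driver = cinder.volume.drivers.rbd.RBDDriver
rbd_pool = volumes-dmz
rbd_user = cinder-dmz
volume_backend_name = ceph-pz-dmz

[ceph-pz-mz]
volume_driver = cinder.volume.drivers.rbd.RBDDriver
rbd_pool = volumes-mz
rbd_user = cinder-mz
volume_backend_name = ceph-pz-mz
```

Each `rbd_user` would hold a CephX key restricted to its own pool, so a compromised backend cannot read volumes of another zone.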

Page 12

Attack Surface

Page 13

RadosGW attack surface

● S3/Swift
  ○ Network access to the gateway only
  ○ No direct consumer access to other Ceph daemons
● Single API attack surface

Page 14

RBD librbd attack surface

● Protection from the hypervisor block layer
  ○ transparent for the guest
  ○ No network access or CephX keys needed at guest level
● Issue:
  ○ the hypervisor is software and therefore not 100% secure…
    ■ breakouts are no mythical creature
    ■ e.g., Virtunoid, SYSENTER, Venom!

Page 15

RBD.ko attack surface

● RBD kernel module
  ○ e.g. used with Xen or on bare metal
  ○ Requires direct access to the Ceph public network
  ○ Requires CephX keys/secret at guest level
● Issue:
  ○ no separation between cluster and guest

Page 16

CephFS attack surface

● Pure CephFS tears a big hole in hypervisor separation
  ○ Requires direct access to the Ceph public network
  ○ Requires CephX keys/secret at guest level
  ○ Complete file system visible to the guest
    ■ Separation currently only via POSIX user/group

Page 17

Host attack surface

● If KVM is compromised, the attacker …
  ○ has access to neighboring VMs
  ○ has access to local Ceph keys
  ○ has access to the Ceph public network and Ceph daemons
● Firewalls, deep packet inspection (DPI), …
  ○ partly impractical due to the protocols used
  ○ implications for performance and cost
● Bottom line: Ceph daemons must resist attack
  ○ C/C++ is harder to secure than e.g. Python
  ○ Homogeneous: if one daemon is vulnerable, all in the cluster are!

Page 18

Network attack surface

● Sessions are authenticated
  ○ An attacker cannot impersonate clients or servers
  ○ An attacker cannot mount man-in-the-middle attacks
● Client/cluster sessions are not encrypted
  ○ A sniffer can recover any data read or written

Page 19

Denial of Service

● Attacks against:
  ○ Ceph cluster:
    ■ Submit many / large / expensive IOs
    ■ Open many connections
    ■ Use flaws to crash Ceph daemons
    ■ Identify non-obvious but expensive features of the client/OSD interface
  ○ Ceph cluster hosts:
    ■ Crash complete cluster hosts, e.g. through flaws in the kernel network layer
  ○ VMs on the same host:
    ■ Saturate the network bandwidth of the host

Page 20

Proactive Countermeasures

Page 21

Deployment and Setup

● Network
  ○ Always use separate cluster and public networks
  ○ Always separate your control nodes from other networks
  ○ Don’t expose the cluster to the open internet
  ○ Encrypt inter-datacenter traffic
● Avoid hyper-converged infrastructure
  ○ Don’t mix
    ■ compute and storage resources - isolate them!
    ■ OpenStack and Ceph control nodes
  ○ Scale resources independently
  ○ Risk mitigation if daemons are compromised or DoS’d
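The cluster/public network split above is configured in ceph.conf; the subnets here are hypothetical examples:

```ini
# ceph.conf (sketch): keep client-facing and replication traffic apart
[global]
# client-facing traffic (compute nodes, RadosGW, monitors, ...)
public network = 10.10.0.0/24
# OSD replication and recovery traffic only, never reachable by guests
cluster network = 10.20.0.0/24
```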

Page 22

Deploying RadosGW

● Big and easy target through HTTP(S) protocol
● Small appliance per tenant with
  ○ Separate network
  ○ SSL-terminating proxy forwarding requests to radosgw
  ○ WAF (mod_security) to filter
  ○ Placed in secure/managed zone
  ○ different type of webserver than RadosGW
● Don’t share buckets/users between tenants
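A hedged sketch of such a per-tenant appliance using Apache (a different webserver than the gateway itself) with SSL termination and mod_security filtering; hostnames, ports, and paths are hypothetical examples:

```apache
<VirtualHost *:443>
    ServerName s3.tenant-a.example.com
    SSLEngine on
    SSLCertificateFile    /etc/ssl/certs/tenant-a.crt
    SSLCertificateKeyFile /etc/ssl/private/tenant-a.key

    # mod_security WAF in front of the gateway
    SecRuleEngine On
    IncludeOptional /etc/modsecurity/rules/*.conf

    # forward filtered requests to the internal radosgw endpoint
    ProxyPass        / http://radosgw.internal:7480/
    ProxyPassReverse / http://radosgw.internal:7480/
</VirtualHost>
```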

Page 23

Ceph security: CephX

● Monitors are trusted key servers
  ○ Store copies of all entity keys
  ○ Each key has an associated “capability”
    ■ Plaintext description of what the key user is allowed to do
● What you get
  ○ Mutual authentication of client + server
  ○ Extensible authorization w/ “capabilities”
  ○ Protection from man-in-the-middle attacks and TCP session hijacking
● What you don’t get
  ○ Secrecy (encryption over the wire)

Page 24

Ceph security: CephX take-aways

● Monitors must be secured
  ○ Protect the key database
● Key management is important
  ○ Separate key for each Cinder backend/AZ
  ○ Restrict the capabilities associated with each key
  ○ Limit administrators’ power
    ■ use ‘allow profile admin’ and ‘allow profile readonly’
    ■ restrict role-definer or ‘allow *’ keys
  ○ Careful key distribution (Ceph and OpenStack nodes)
● To do:
  ○ Thorough CephX code review by security experts
  ○ Audit OpenStack deployment tools’ key distribution
  ○ Improve security documentation
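Keys restricted in this way are created with `ceph auth get-or-create`; the entity and pool names here are hypothetical examples (these commands need a live cluster):

```shell
# Per-backend Cinder key limited to one pool (the usual RBD client pattern)
ceph auth get-or-create client.cinder-dmz \
    mon 'allow r' \
    osd 'allow class-read object_prefix rbd_children, allow rwx pool=volumes-dmz'

# Operator key using the admin profile instead of a blanket 'allow *'
ceph auth get-or-create client.ops mon 'allow profile admin'
```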

Page 25

Preventing Breaches - Defects

● Static Code Analysis (SCA)
  ○ Buffer overflows and other code flaws
  ○ Regular Coverity scans
    ■ 996 fixed, 284 dismissed, 420 outstanding
    ■ defect density 0.97
  ○ cppcheck
  ○ LLVM: clang/scan-build
● Runtime analysis
  ○ valgrind memcheck
● Plan
  ○ Reduce backlog of low-priority issues (e.g., issues in test code)
  ○ Automated reporting of new SCA issues on pull requests
  ○ Improve code reviewers’ awareness of security defects
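A hedged sketch of how the analyzers above are typically invoked on a source tree (paths and option sets are illustrative, not the Ceph project's actual CI configuration):

```shell
# cppcheck over the source tree, collecting findings in a report file
cppcheck --enable=warning,performance,portability src/ 2> cppcheck.log

# clang static analyzer wrapping a normal build
scan-build -o sca-results make -j4
```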

Page 26

Preventing Breaches - Hardening

● Pen-testing
  ○ human attempt to subvert security, generally guided by code review
● Fuzz testing
  ○ computer attempt to subvert or crash, by feeding garbage input
● Harden build
  ○ -fpie -fpic
  ○ -fstack-protector-strong
  ○ -Wl,-z,relro,-z,now
  ○ -D_FORTIFY_SOURCE=2 -O2 (?)
  ○ Check for performance regressions!

Page 27

Mitigating Breaches

● Run non-root daemons (WIP: PR #4456)
  ○ Prevent escalating privileges to get root
  ○ Run as ‘ceph’ user and group
  ○ Pending for Infernalis
● MAC
  ○ SELinux / AppArmor
  ○ Profiles for daemons and tools planned for Infernalis
● Run (some) daemons in VMs or containers
  ○ Monitor and RGW - less resource intensive
  ○ MDS - maybe
  ○ OSD - prefers direct access to hardware
● Separate MON admin network

Page 28

Encryption: Data at Rest

● Encryption at application vs. cluster level
● Some deployment tools support dm-crypt
  ○ Encrypt raw block device (OSD and journal)
  ○ Allows disks to be safely discarded if the key remains secret
● Key management is still very simple
  ○ Encryption key stored on disk via LUKS
  ○ LUKS key stored in /etc/ceph/keys
● Plan
  ○ Petera, a new key escrow project from Red Hat
    ■ https://github.com/npmccallum/petera
  ○ Alternative: simple key management via the monitor (CDS blueprint)
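With the ceph-disk tooling of that era, an encrypted OSD could be prepared roughly as follows (the device name is a hypothetical example; requires root and a real disk):

```shell
# Create an OSD whose data and journal partitions are dm-crypt/LUKS
# encrypted; the generated keys are stored locally on the host as
# described above
ceph-disk prepare --dmcrypt /dev/sdb
```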

Page 29

Encryption: On Wire

● Goal
  ○ Protect data from someone listening in on the network
  ○ Protect administrator sessions configuring client keys
● Plan
  ○ Generate per-session keys based on existing tickets
  ○ Selectively encrypt monitor administrator sessions
  ○ Alternative: make use of IPsec (performance and management implications)

Page 30

Denial of Service attacks

● Limit load from clients
  ○ Use qemu IO throttling features - set a safe upper bound
● To do:
  ○ Limit max open sockets per OSD
  ○ Limit max open sockets per source IP
    ■ handle in Ceph or in the network layer?
  ○ Throttle operations per-session or per-client (vs. just globally)?
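The qemu throttling mentioned above can be applied per disk via libvirt; the domain name, device, and limits here are hypothetical examples, not recommendations from the talk:

```shell
# Cap a guest disk at 500 IOPS and 100 MB/s total
virsh blkdeviotune guest01 vda \
    --total-iops-sec 500 \
    --total-bytes-sec 104857600
```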

Page 31

CephFS

● No standard virtualization layer (unlike block)
  ○ Filesystem passthrough (9p/virtfs) to host
  ○ Proxy through a gateway (NFS?)
  ○ Allow direct access from tenant VM (most insecure)
● Granularity of access control is harder
  ○ No simple mapping to RADOS objects
● Work in progress
  ○ root_squash (Infernalis blueprint)
  ○ Restrict mount to subtree
  ○ Restrict mount to user

Page 32

Reactive Countermeasures

Page 33

Reactive Security Process

● Community
  ○ Single point of contact: [email protected]
    ■ Core development team
    ■ Red Hat, SUSE, Canonical security teams
  ○ Security-related fixes are prioritized and backported
  ○ Releases may be accelerated on an ad hoc basis
  ○ Security advisories to [email protected]
● Red Hat Ceph
  ○ Strict SLA on issues raised with the Red Hat security team
  ○ Escalation process to Ceph developers
  ○ Red Hat security team drives the CVE process
  ○ Hot fixes distributed via Red Hat’s CDN

Page 34

Detecting and Preventing Breaches

● Brute force attacks
  ○ Good logging of any failed authentication
  ○ Monitoring is easy via existing tools, e.g. Nagios
● To do:
  ○ Automatic blacklisting of IPs/clients after n failed attempts at the Ceph level (Jewel blueprint)
● Unauthorized injection of keys
  ○ Monitor the audit log
    ■ trigger alerts for auth events -> monitoring
  ○ Periodic comparison with a signed backup of the auth database?
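The blacklisting to-do could be prototyped outside Ceph by scanning authentication logs; a minimal sketch, assuming an illustrative log format (not Ceph's actual monitor log format) and a hypothetical failure threshold:

```python
from collections import Counter

def find_bruteforce_ips(log_lines, max_failures=5):
    """Return source addresses with more than max_failures failed auths.

    Assumes one illustrative 'auth failure from <addr>' entry per line;
    a real tool would have to parse the actual monitor log format.
    """
    failures = Counter()
    for line in log_lines:
        if "auth failure from" in line:
            addr = line.rsplit("auth failure from", 1)[1].strip().split()[0]
            failures[addr] += 1
    return {addr for addr, n in failures.items() if n > max_failures}

logs = ["2015-06-23 10:00:01 auth failure from 192.0.2.7:0/1"] * 6
print(find_bruteforce_ips(logs))  # {'192.0.2.7:0/1'}
```

The resulting set could then feed a firewall rule or a Ceph-level blacklist once the Jewel blueprint lands.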

Page 35

Conclusions

Page 36

Summary

● Reactive processes are in place
  ○ [email protected], CVEs, downstream product updates, etc.
● Proactive measures in progress
  ○ Code quality improving (SCA, etc.)
  ○ Unprivileged daemons
  ○ MAC (SELinux, AppArmor)
  ○ Encryption
● Progress in defining security best practices
  ○ Document best practices for security
● Ongoing process

Page 37

Get involved !

● Ceph
  ○ https://ceph.com/community/contribute/
  ○ [email protected]
  ○ IRC (OFTC):
    ■ #ceph
    ■ #ceph-devel
  ○ Ceph Developer Summit
● OpenStack
  ○ Telco Working Group
    ■ #openstack-nfv
  ○ Cinder, Glance, Manila, ...

Page 38

Danny Al-Gaaf
Senior Cloud Technologist

[email protected]
IRC: dalgaaf
linkedin.com/in/dalgaaf

THANK YOU!