July 2018
What’s New in Red Hat OpenShift Origin 3.10
OpenShift Commons Briefing
OCP 3.10 - The Efficient Cluster
● Resource Management
  ○ Descheduler (tech preview), CPU Manager, Ephemeral Storage, HugePages
● Resilience
  ○ Node Problem Detector, HA egress pods with DNS
● Workload Diversity
  ○ Device Manager, Windows Containers (dev preview)
● Installation Automation
  ○ TLS node bootstrapping, static pods
● Security
  ○ Etcd cipher coverage, shared PID namespace options, more secure router
What’s New for 3.10:
● Remove etcd from the Automation Broker and move to using CRDs
  ○ Broker now uses CRDs instead of a local etcd instance
● Make ServiceInstance details available to the playbook
  ○ Exposes the details at runtime of who provisioned a service to the provision and deprovision playbooks
  ○ Such as OpenShift cluster DNS suffix, username, namespace, ServiceInstance ID
● Enhanced error messages: when a provision request fails, the error is preserved and displayed to the end user in the web console
  ○ Allows an APB to return custom error messages that get surfaced by the service catalog if a provisioning operation fails
  ○ Eases troubleshooting and improves the customer experience
Feature(s): OpenShift Automation (Ansible) Broker
Self-Service / UX
New AWS Services:
Kinesis Data Streams
Key Management Service (KMS)
Lex
Polly
Rekognition
Translate (requires Preview registration)
SageMaker*
Additional RDS engines:
Aurora*, MariaDB, & PostgreSQL
AWS Service Broker
* Coming soon!
Feature(s): Improved search within catalog
Description: Show “top 5” results
How it Works:
● Weighting is given based on where the match is found
● Factors include: title, description, tagging
Self-Service / UX
Feature(s): User chooses route for application
Description: Provide a better way to surface an application’s routes
How it Works:
● Indication that there are multiple routes
● Annotate the route that you’d like to be primary
Self-Service / UX
console.alpha.openshift.io/overview-app-route: ‘true’
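A minimal sketch of applying that annotation to a route (the route name and host below are illustrative, not from the original deck):

```yaml
# Mark this route as the primary route shown on the app's overview card
apiVersion: v1
kind: Route
metadata:
  name: frontend                     # illustrative name
  annotations:
    console.alpha.openshift.io/overview-app-route: 'true'
spec:
  host: frontend.apps.example.com    # illustrative host
  to:
    kind: Service
    name: frontend
```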
Feature(s): Create generic secrets
Description: Allow users a way to create opaque secrets
How it Works:
● Users could already create secrets; this expands support to opaque (generic) secrets
● Behaves like creating config maps
Self-Service / UX
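Since it behaves like creating config maps, an equivalent opaque Secret can be sketched as follows (name and keys are illustrative):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: example-generic-secret   # illustrative name
type: Opaque
stringData:                      # stringData avoids hand-encoding base64
  api-key: s3cr3t                # illustrative key/value
```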
Feature(s): Service Catalog CLI
Description: Provision, bind services from command line
How it Works:
● Full set of commands to list, describe, provision/deprovision and bind/unbind
● Based on a contribution from Azure
● Separate CLI shipped as part of an RPM
Self-Service / UX
$ svcat provision postgresql-instance \
    --class rh-postgresql-apb --plan dev \
    --params-json '{"postgresql_database":"admin","postgresql_password":"admin","postgresql_user":"admin","postgresql_version":"9.6"}' \
    -n szh-project
  Name:       postgresql-instance
  Namespace:  szh-project
  Status:
  Class:      rh-postgresql-apb
  Plan:       dev

Parameters:
  postgresql_database: admin
  postgresql_password: admin
  postgresql_user: admin
  postgresql_version: "9.6"
Miscellaneous Service Catalog
● Rename bind credential secret keys
● Improvements in the reconciliation process, optimizations, and removal of false “failed to provision” messages
● Flexible secret management (add, remove, change)
Self-Service / UX
DevExp / Builds
Feature(s): Jenkins items
● Sync removal of build jobs - this allows for cleanup of old/stale jobs
● Jenkins updated to 2.107.3-1.1
● Updated Jenkins build agent (slave) images
  ○ Node.js 8
  ○ Maven 3.5
Dev Tools - Local Dev
CDK 3.4:
● OpenShift Container Platform v3.9.14
● Image caching is enabled by default
● Hyper-V users can assign a static IP to CDK
● Hostfolder mount using SSHFS (Tech Preview)
● Uses overlay as the default storage driver

Minishift 1.21 / CDK 3.5: 17-JUL-2018
● Native hypervisor (Hyper-V/xhyve/KVM) or VirtualBox
● Run CDK against an existing RHEL 7 host
● SSHFS is the default technology for hostfolder share
● Local DNS server to reduce dependency on nip.io
● Users will be able to use OCP 3.10
Operator SDK
Feature(s): Dev tools to build Kubernetes applications
Description: Help customers/ISVs build and publish Kubernetes applications that run like cloud services, anywhere OpenShift runs
How it Works:
● Includes all scaffolding code
● Only need to build the logic specific to the app
● Tool to publish and use on multiple clusters
● Supports Helm charts, Ansible playbooks, or Go code
Embed unique operational knowledge
Package and install on OCP clusters
Feature(s): Kubernetes Upstream Red Hat Blog and Commons Webinar
Description: OpenShift 3.10 brings enhancements in how efficiently you can leverage the resources available from the nodes across the cluster. From ephemeral storage, CPU, memory pages, IP addresses, and other resources available to the cluster, OpenShift 3.10 more efficiently brings nodes into the cluster and exposes their resources to application services.
Container Orchestration
Red Hat Contributing Projects:
● API Aggregation
● CronJobs stabilizing
● Control API access from nodes
● PSP stabilizing
● Configurable pod resolv.conf
● Kubelet ComponentConfig API
● Mount namespace propagation
● PV handling with deleted pods and orphaned binds
● Ephemeral Storage Handling
● CRD subresources and categories
● Container Storage Interface
● Kubectl extension handling
Feature(s): HugePages, CPU Manager, Device Manager
Description: We spoke about Device Manager here. CPU Manager Policy allows you to tell Kubernetes that your workload requires affinity to a CPU core. Maybe your workload needs CPU cache affinity and can’t tolerate being bounced between CPU cores on the node by normal Linux fair-share scheduling. HugePages allows you to request that your workload consume a specific amount of HugePages.
Performance Pods
How it Works: CPU manager is a flag on the kubelet that has the option of none or static. Static will cause guaranteed QoS pod access to exclusive CPU cores on the node. HugePages is a flag you set to true on the master and kubelet. The nodes will then be able to tell if any HugePages are available and workloads can request them via the pod definition.
CPU Manager Policy
# cat /etc/origin/node/node-config.yaml
...
kubeletArguments:
  ...
  feature-gates:
  - CPUManager=true
  cpu-manager-policy:
  - static
  cpu-manager-reconcile-period:
  - 5s
  kube-reserved:
  - cpu=500m
Result:
# oc exec pod-name -- cat /sys/fs/cgroup/cpuset/cpuset.cpus
2
# oc exec pod-name -- grep ^Cpus_allowed_list /proc/1/status
Cpus_allowed_list: 2
HugePages
# cat /etc/origin/node/node-config.yaml
...
kubeletArguments:
  ...
  feature-gates:
  - HugePages=true
Pod spec:
resources:
  requests:
    cpu: 1
    memory: 256Mi
  limits:
    cpu: 1
    memory: 256Mi
# cat /etc/origin/master/master-config.yaml
...
kubernetesMasterConfig:
  apiServerArguments:
    ...
    feature-gates:
    - HugePages=true
Pod spec:
resources:
  limits:
    hugepages-2Mi: 100Mi
Both the page size in the resource name (e.g. hugepages-2Mi) and the amount are configurable.
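Putting the snippets together, a hedged sketch of a complete pod consuming huge pages (pod name and image are illustrative, not from the deck):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hugepages-demo                       # illustrative name
spec:
  containers:
  - name: app
    image: registry.example.com/app:latest   # illustrative image
    resources:
      limits:
        hugepages-2Mi: 100Mi   # 50 pages of 2Mi each
        memory: 256Mi
        cpu: 1
```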
Feature(s): Node Problem Detector
Description: Daemon that runs on each node as a DaemonSet. The daemon tries to make the cluster aware of node-level faults that should make the node unschedulable.
Node
How it Works: When you start the node problem detector you tell it a port to broadcast the issues it finds over. The detector allows you to load sub-daemons to do the data collection. There are three as of today. Issues found by a problem daemon are classified either as a “NodeCondition”, which stops node scheduling, or as an “Event”, which is only informative.
TechPreview
Problem Daemons:
● Kernel Monitor: monitors kernel log via journald and reports problems according to regex patterns
● AbrtAdaptor: monitors the node for kernel problems and application crashes from journald
● CustomPluginMonitor: allows you to test for any condition via a script that exits 0 or 1 depending on whether your condition is met.
Feature(s): Protection of Local Ephemeral Storage
Description: Control the usage of local ephemeral storage feature on the nodes in order to prevent users from exhausting all node local storage (logs, empty dirs, copy on write layer) with their pods and abusing other pods that happen to be on the same node.
Node
How it Works: After turning on LocalStorageCapacityIsolation, submitted pods use the limit and request fields. Violations will result in an evicted pod.
Request: the ephemeral storage asked for when scheduling a container to a node; the requested ephemeral storage is then fenced off on the chosen node for the use of the container.
Limit: provides a hard limit on the ephemeral storage that can be allocated across all the processes in a container.
TechPreview
1. Master: /etc/origin/master/master-config.yaml
kubernetesMasterConfig:
  apiServerArguments:
    feature-gates:
    - LocalStorageCapacityIsolation=true
  controllerArguments:
    feature-gates:
    - LocalStorageCapacityIsolation=true

2. Node: /etc/origin/node/node-config.yaml
kubeletArguments:
  feature-gates:
  - LocalStorageCapacityIsolation=true
3. Launch pods with the following in their deploymentConfig
resources:
  requests:
    ephemeral-storage: 500Mi
  limits:
    ephemeral-storage: 1Gi
Feature(s): Descheduler
Description: Because a scheduler’s view of a cluster is from a single point in time, the overall cluster balance may become skewed by taints and tolerations, evictions, affinities, and other lifecycle events such as node maintenance or new node additions. As a result, some nodes can become under- or over-utilized.
Node
How it Works: The descheduler is a job, running in a pod in the kube-system project, that finds pods based on its policy and evicts them in order to give them back to the scheduler for replacement on the cluster. It does not target static pods, pods with high QoS, daemonSet pods, or pods with local storage.
TechPreview Available Policies:
● RemoveDuplicates: if this policy is set, the descheduler looks for pods that are part of the same replicaSet or deployment but happen to have been placed on the same node. It evicts the duplicates in the hope the scheduler will place them on a different node.
● LowNodeUtilization: finds nodes that are under the CPU, MEM, and number-of-pods thresholds you have set, then evicts pods from other nodes in the hope the scheduler places them on these underutilized nodes. There is also a setting to only trigger this if you have more than X underutilized nodes.
● RemovePodsViolatingInterPodAntiAffinity and RemovePodsViolatingNodeAffinity: re-evaluate pods that might have been forced to break their affinity rules and evict them for another chance to be placed on nodes that comply with their affinity or anti-affinity.
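The policies above are selected in a descheduler policy file. A sketch in the upstream v1alpha1 policy format (all threshold numbers are illustrative, not recommendations from the deck):

```yaml
apiVersion: descheduler/v1alpha1
kind: DeschedulerPolicy
strategies:
  RemoveDuplicates:
    enabled: true
  LowNodeUtilization:
    enabled: true
    params:
      nodeResourceUtilizationThresholds:
        thresholds:        # nodes below ALL of these count as underutilized
          cpu: 20
          memory: 20
          pods: 20
        targetThresholds:  # pods are evicted from nodes above ANY of these
          cpu: 50
          memory: 50
          pods: 50
        numberOfNodes: 3   # only act if at least 3 nodes are underutilized
```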
Feature(s): Windows Containers
Description: Be able to run Windows containers on Windows Server 1709, 1803, and 2019 within an OpenShift cluster.
Node
How it Works: Joint partnership between Microsoft and Red Hat. Microsoft will distribute and support, through our joint co-located support process, the kubelet, configuration/installation, and networking components that need to be installed on Windows. Red Hat will support the interaction of those components with the OpenShift cluster.
Customers and partners can sign up for the developer preview program here. The program will start within the next 7 days. It has been delayed due to technical difficulties.
DevPreview
Providing in the developer preview:
1.) Powershell script to satisfy container prerequisites on Windows Server
2.) Installation process that allows you to install on one to many nodes without deploying an overlay network
3.) Ansible playbooks to deploy and configure an experimental OVN network on the OpenShift cluster
4.) Ansible playbooks to deploy and configure an experimental OVN network from CloudBase on Windows Server. And to then connect that Windows node to the OpenShift cluster
Features in the first drop:
1.) kubelet and prerequisites (docker, networking plugins, etc.)
2.) Join Windows node to OpenShift cluster
3.) Allow Windows access to certain projects (nodeSelector or taints & tolerations)
4.) Work with templates in the Service Catalog
5.) Attach static storage to the container
6.) Scale the Windows container up and down
7.) DNS-resolvable URL for service to route object
8.) East/west network connectivity to Linux pods
9.) Delete Windows container
Video of it WORKING!!!
Feature(s): Expose registry metrics with OpenShift auth
Description: Registry metrics endpoint now protected by built-in OpenShift auth
How it works:
● Registry provides an endpoint for Prometheus metrics
● Route must be enabled
● Users with the appropriate role can access metrics using their OpenShift credentials
● An admin-defined shared secret can still be used to access the metrics as well
Registry
Feature(s): Run control plane as static pod
Description: Migrate control plane to static pods to leverage self-management of cluster components and minimize direct host management
How it Works:
● In 3.10 and newer, control plane components (etcd, API, and controller manager) will now move to running as static pods
● Goal is to reduce node-level configuration in preparation for automated cluster configuration on immutable infrastructure
● Unified control plane deployment methods across Atomic Host and RHEL; everything runs atop the kubelet
● The standard upgrade process will migrate existing clusters automatically
Installation
Feature(s): Bootstrapped Node Configuration
Description: Node configuration is now managed via API objects and synchronized to nodes
How it Works:
● In 3.10 and newer, all members of the [nodes] inventory group must be assigned an openshift_node_group_name (the value is used to select the configmap that configures each node)
● By default, five configmaps are created: node-config-master, node-config-infra, node-config-compute, node-config-master-infra, & node-config-all-in-one
  ○ The last two place a node into multiple roles
● Note: configmaps are the authoritative definition of node labels; the old openshift_node_labels value is effectively ignored
● If you want to deviate from the default configuration, you must define the entire openshift_node_group dictionary in your inventory. When using an INI-based inventory it must be translated into a Python dictionary.
● The upgrade process will now block until the required configmaps exist in the openshift-node namespace
  ○ Either accept the defaults or define openshift_node_groups to meet your needs, then run playbooks/openshift-master/openshift_node_group.yml to create the configmaps
  ○ Review the configmaps carefully to ensure that all desired configuration items are set, then restart the upgrade
● Changes to these configmaps propagate to all nodes within 5 minutes, overwriting /etc/origin/node/node-config.yaml
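A hedged sketch of defining openshift_node_groups in a YAML inventory (the label and the kubelet edit below are illustrative examples, not required values):

```yaml
openshift_node_groups:
- name: node-config-compute
  labels:
  - 'node-role.kubernetes.io/compute=true'
  edits:
  - key: kubeletArguments.max-pods   # illustrative node-config edit
    value:
    - '250'
```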
Installation
Image Reference: https://medium.com/@toddrosner/kubernetes-tls-bootstrapping-cf203776abc7
Feature(s): HA Setup For Egress Pods
Description: In the first z-stream release of 3.10, egress pods can have HA failover across secondary cluster nodes in the event the primary node goes down.
How it works: Namespaces are now allowed to have multiple egress IPs specified, hosted on different nodes, so that if the primary node fails the egress IP switches from its primary to secondary egress IP being hosted on another node. When the original IP eventually comes back, then nodes will switch back to using the original egress IP. The switchover currently takes ≤7 seconds for a node to notice that an egress node has gone down (potentially configurable in a later version).
Networking
[Diagram: Namespace A runs pods on Node 1 (Egress IP 1) and Node 2 (Egress IP 2); the external service whitelists IP1 and IP2]
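A sketch of the objects involved: the project’s NetNamespace lists both egress IPs (first is primary), and each IP is hosted by a different node’s HostSubnet. All names, IPs, and subnets below are illustrative:

```yaml
apiVersion: network.openshift.io/v1
kind: NetNamespace
metadata:
  name: myproject          # illustrative project
netname: myproject
egressIPs:
- 10.0.0.100               # primary, hosted on node1
- 10.0.0.101               # secondary, hosted on node2
---
apiVersion: network.openshift.io/v1
kind: HostSubnet
metadata:
  name: node1              # illustrative node
host: node1
hostIP: 10.0.0.11          # illustrative
subnet: 10.128.0.0/23      # illustrative
egressIPs:
- 10.0.0.100
```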
Feature(s): Allow DNS names for egress routers
Description: The egress router can now refer to an external service, with a potentially unstable IP address, by its hostname.
How it works: The OpenShift egress router runs a service that redirects egress pod traffic to one or more specified remote servers, using a pre-defined source IP address that can be whitelisted on the remote server. Its EGRESS_DESTINATION can now specify the remote server by FQDN.
Networking
[Diagram: pods reach the egress service at INTERNAL-IP:8080, which forwards to the egress router pod using source IP1 on the node; the external service whitelists IP1]
...
- name: EGRESS_DESTINATION
  value: |
    80 tcp my.example.com
    8080 tcp 5.6.7.8 80
    8443 tcp your.example.com 443
    13.14.15.16
...
Feature(s): Document and test a supported way of expanding the serviceNetwork
Description: Provide a supported way of growing the service network address range in a multi-node environment to a larger address space.
For example:
serviceNetworkCIDR: 172.30.0.0/24
Note: This DOES NOT cover migration to a different range, JUST the increase of an existing range.
Networking
1. Update the master-config.yaml to change the serviceNetworkCIDR to 172.30.0.0/16
2. Delete the default clusternetwork object on the master: # oc delete clusternetwork default
3. Restart the master API service and the controller service
4. Update the ansible inventory file to match the change in (1) and redeploy the cluster
5. Evacuate the nodes one by one, restarting the iptables and atomic-openshift-node services on each
How it works:
172.30.0.0/16
Feature(s) : Specify whitelist cipher suite for etcd
Security
Description: Users now have the ability to optionally whitelist cipher suites for use with etcd in order to meet security policies.
How it Works:
● Configure etcd to add the --cipher-suites flag with the desired cipher suites
● Restart etcd, apiserver, controllers, etc.
● The TLS handshake fails when a client hello is presented with invalid cipher suites
● If empty, Go auto-populates the list
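A sketch, assuming the env-file form used by the etcd RPM (the specific suite list below is illustrative; it must match what your clients support):

```
# /etc/etcd/etcd.conf
ETCD_CIPHER_SUITES="TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384"
```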
Feature(s) : Control Sharing the PID namespace between containers
Security
Description: Use this feature to configure cooperating containers in a pod, such as a log handler sidecar container, or to troubleshoot container images that don’t include debugging utilities like a shell.
How it Works:
● The feature gate PodShareProcessNamespace is set to false by default
● Set 'feature-gates=PodShareProcessNamespace=true' in apiserver, controllers, and kubelet
● Restart apiserver, controller, and node services
● Create a pod with spec "shareProcessNamespace: true"
● oc create -f <pod spec file>

Caveats: when the PID namespace is shared between containers
● Sidecar containers are not isolated
● Environment variables are now visible to all other processes
● Any "kill all" semantics used within the process are now broken
● Exec processes from other containers will now show up
TechPreview
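A hedged sketch of such a pod spec (names and images are illustrative): an app container plus a debug sidecar sharing one PID namespace, so the sidecar can see and signal the app’s processes.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: share-pid-demo                        # illustrative name
spec:
  shareProcessNamespace: true
  containers:
  - name: app
    image: registry.example.com/app:latest    # illustrative image
  - name: debug
    image: registry.example.com/tools:latest  # illustrative image with a shell
    command: ['sleep', 'infinity']
```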
Feature(s) : Router Service Account no longer needs access to secrets
Security
Description: The router service account no longer needs permission to read all secrets. This improves security, as previously, if the router were compromised it could then read all of the most sensitive data in the cluster.
How it Works:
● When you create an ingress object, a corresponding route object is created.
● If an ingress object is modified, a changed secret should take effect soon after
● If an ingress object is deleted, a route that was created for it will be deleted
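The ingress-to-route flow above can be sketched with a minimal Ingress (host, service, and secret names are illustrative); a corresponding route is created from it, and edits to the referenced TLS secret take effect shortly after:

```yaml
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: example               # illustrative name
spec:
  tls:
  - hosts:
    - app.example.com
    secretName: example-tls   # changes here propagate to the route
  rules:
  - host: app.example.com
    http:
      paths:
      - path: /
        backend:
          serviceName: app    # illustrative service
          servicePort: 8080
```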
[Diagram: external and internal traffic passes through the router pod to the service’s pods]
Feature(s): Container Storage Interface (CSI)
Description: Introduce CSI sub-system as tech preview in 3.10
• External Attacher
• External Provisioner
• Driver registrar
• CSI Drivers shipped: None (use external/upstream)
Storage
How it Works
• Create a new project where the CSI components will run and a new service account that will run the components
• Create the Deployment with the external CSI attacher and provisioner and DaemonSet with the CSI driver
• Create a StorageClass for the new storage entity • Create a PVC with the new StorageClass
• See: https://github.com/openshift/openshift-docs/blob/master/install_config/persistent_storage/persistent_storage_csi.adoc
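The StorageClass/PVC steps can be sketched as follows (the provisioner name must match whatever external CSI driver you deployed; all names below are illustrative):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: csi-example-sc
provisioner: csi-driver.example.com   # illustrative; must match the CSI driver
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: csi-example-pvc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: csi-example-sc
```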
DevPreview
Feature(s): New Storage Provisioners
Description: New Storage Provisioners (external provisioners) added as Tech Preview with 3.10
• CephFS
Storage
How it Works
• Use OpenShift Ansible installer openshift_provisioners role
• Set the provisioner to be installed and started as true
<After the provisioner install and startup is completed>
• Create a Storage Class for the storage entity
• Create a pod with a PVC/claim with the Storage Class
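A sketch of those last two steps for CephFS, assuming the upstream external-storage cephfs provisioner’s parameter names (monitor address and secret names are illustrative):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: cephfs
provisioner: ceph.com/cephfs
parameters:
  monitors: 192.168.1.11:6789            # illustrative monitor
  adminId: admin
  adminSecretName: ceph-admin-secret     # illustrative secret
  adminSecretNamespace: cephfs-provisioner
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cephfs-claim
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
  storageClassName: cephfs
```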
TechPreview
● Atomic Host deprecation notice, as Red Hat CoreOS will be the future immutable host option.
○ Atomic supported in 3.10 & 3.11
Storage
● Virtual data optimizer (VDO) for dm-level dedupe and compression.
● OverlayFS by default for new installs (overlay2)
  ○ Ensure ftype=1 for 7.3 and earlier
● Devicemapper continues to be supported and available for edge cases around POSIX
● LVM snapshots integrated with boot loader (boom)
RHEL 7.5 Highlights
OpenShift Container Platform 3.10 is supported on RHEL 7.4, 7.5 and Atomic Host 7.5+
Containers / Atomic
● Docker 1.13
● Docker-latest deprecation
● RPM-OSTree package overrides
Security
● Unprivileged mount namespaces
● KASLR fully supported and enabled by default
● Ansible remediation for OpenSCAP
● Improved SELinux labeling for cgroups (cgroup_seclabel)
CRI-O v1.10
Feature(s): CRI-O v1.10
Description: CRI-O is an OCI compliant implementation of the Kubernetes Container Runtime Interface. By design it provides only the runtime capabilities needed by the kubelet. CRI-O is designed to be part of Kubernetes and evolve in lock-step with the platform.
CRI-O brings:
● A minimal and secure architecture
● Excellent scale and performance
● Ability to run any OCI / Docker image
● Familiar operational tooling and commands
Improvements include:
● crictl CLI for debugging and troubleshooting
● Podman for image tagging & management
● Installer integration & fresh-install-time decision: openshift_use_crio=True
● Not available for existing cluster upgrades
[Diagram: the kubelet drives CRI-O, which handles storage, images, runc, and CNI networking]
Questions