July 2018
What’s New in Red Hat OpenShift Origin 3.10
OpenShift Commons Briefing
OCP 3.10 - The Efficient Cluster
● Resource Management
  ○ Descheduler (tech preview), CPU Manager, Ephemeral Storage, HugePages
● Resilience
  ○ Node Problem Detector, HA egress pods with DNS
● Workload Diversity
  ○ Device Manager, Windows Containers (dev preview)
● Installation Automation
  ○ TLS node bootstrapping, static pods
● Security
  ○ Etcd cipher coverage, shared PID namespace options, more secure router
What’s New for 3.10:
● Remove etcd from the Automation Broker and move to using CRDs
  ○ Broker now uses CRDs instead of a local etcd instance
● Make ServiceInstance details available to the playbook
  ○ Exposes the details at runtime of who provisioned a service to the provision and deprovision playbooks
  ○ Such as OpenShift cluster DNS suffix, username, namespace, ServiceInstance ID
● Enhanced error messages: when a provision request fails, the error is preserved and displayed to the end user in the web console
  ○ Allows an APB to return custom error messages that get surfaced by the service catalog if a provisioning operation fails
  ○ Eases troubleshooting and improves the customer experience
Feature(s): OpenShift Automation (Ansible) Broker
Self-Service / UX
New AWS Services:
Kinesis Data Streams
Key Management Service (KMS)
Lex
Polly
Rekognition
Translate (requires Preview registration)
SageMaker*
Additional RDS engines:
Aurora*, MariaDB, & PostgreSQL
AWS Service Broker
* Coming soon!
Feature(s): Improved search within catalog
Description: Show “top 5” results
How it Works:
● Weighting is given based on where the match is found
● Factors include: title, description, tagging
Self-Service / UX
Feature(s): User chooses route for application
Description: Provide a better way to surface an application’s routes
How it Works:
● Indication that there are multiple routes
● Annotate the route that you’d like to be primary
Self-Service / UX
console.alpha.openshift.io/overview-app-route: ‘true’
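A minimal sketch of applying that annotation to a route (the route name and host below are illustrative, not from the original deck):

```yaml
# Mark this route as the primary route shown on the app's overview card
apiVersion: v1
kind: Route
metadata:
  name: frontend                     # illustrative name
  annotations:
    console.alpha.openshift.io/overview-app-route: 'true'
spec:
  host: frontend.apps.example.com    # illustrative host
  to:
    kind: Service
    name: frontend
```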
Feature(s): Create generic secrets
Description: Allow users a way to create opaque secrets
How it Works:
● Users could already create secrets; this expands support to opaque (generic) secrets
● Behaves like creating config maps
Self-Service / UX
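Since it behaves like creating config maps, an equivalent opaque Secret can be sketched as follows (name and keys are illustrative):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: example-generic-secret   # illustrative name
type: Opaque
stringData:                      # stringData avoids hand-encoding base64
  api-key: s3cr3t                # illustrative key/value
```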
Feature(s): Service Catalog CLI
Description: Provision, bind services from command line
How it Works:
● Full set of commands to list, describe, provision/deprovision and bind/unbind
● Based on a contribution from Azure
● Separate CLI shipped as part of an RPM
Self-Service / UX
$ svcat provision postgresql-instance \
    --class rh-postgresql-apb --plan dev \
    --params-json '{"postgresql_database":"admin","postgresql_password":"admin","postgresql_user":"admin","postgresql_version":"9.6"}' \
    -n szh-project
  Name:       postgresql-instance
  Namespace:  szh-project
  Status:
  Class:      rh-postgresql-apb
  Plan:       dev

Parameters:
  postgresql_database: admin
  postgresql_password: admin
  postgresql_user: admin
  postgresql_version: "9.6"
Miscellaneous Service Catalog
● Rename bind credential secret keys
● Improvements in the reconciliation process, optimizations, and removal of false “failed to provision” messages
● Flexible secret management (add, remove, change)
Self-Service / UX
DevExp / Builds
Feature(s): Jenkins items
● Sync removal of build jobs - this allows for cleanup of old/stale jobs
● Jenkins updated to 2.107.3-1.1
● Updated Jenkins build agent (slave) images
  ○ Node.js 8
  ○ Maven 3.5
Dev Tools - Local Dev
CDK 3.4:
● OpenShift Container Platform v3.9.14
● Image caching is enabled by default
● Hyper-V users can assign a static IP to CDK
● Hostfolder mount using SSHFS (Tech Preview)
● Uses overlay as the default storage driver

Minishift 1.21 / CDK 3.5: 17-JUL-2018
● Native hypervisor (Hyper-V/xhyve/KVM) or VirtualBox
● Run CDK against an existing RHEL 7 host
● SSHFS is the default technology for hostfolder share
● Local DNS server to reduce dependency on nip.io
● Users will be able to use OCP 3.10
Operator SDK
Feature(s): Dev tools to build Kubernetes applications
Description: Help customers/ISVs build and publish Kubernetes applications that run like cloud services, anywhere OpenShift runs
How it Works:
● Includes all scaffolding code
● Only need to build the logic specific to the app
● Tool to publish and use on multiple clusters
● Supports Helm charts, Ansible playbooks, or Go code
Embed unique operational knowledge
Package and install on OCP clusters
Feature(s): Kubernetes Upstream Red Hat Blog and Commons Webinar
Description: OpenShift 3.10 brings enhancements in how efficiently you can leverage the resources available from the nodes across the cluster. From ephemeral storage, CPU, memory pages, IP addresses, and other resources available to the cluster, OpenShift 3.10 more efficiently brings nodes into the cluster and exposes their resources to application services.
Container Orchestration
Red Hat Contributing Projects:
● API Aggregation
● CronJobs stabilizing
● Control API access from nodes
● PSP stabilizing
● Configurable pod resolv.conf
● Kubelet ComponentConfig API
● Mount namespace propagation
● PV handling with deleted pods and orphaned binds
● Ephemeral Storage Handling
● CRD subresources and categories
● Container Storage Interface
● Kubectl extension handling
Feature(s): HugePages, CPU Manager, Device Manager
Description: We spoke about Device Manager here. CPU Manager Policy allows you to tell Kubernetes that your workload requires affinity to a CPU core. Maybe your workload needs CPU cache affinity and can’t tolerate being bounced between CPU cores on the node by normal Linux fair-share scheduling. HugePages allows you to request that your workload consume a specific amount of HugePages.
Performance Pods
How it Works: CPU manager is a flag on the kubelet that has the option of none or static. Static will cause guaranteed QoS pod access to exclusive CPU cores on the node. HugePages is a flag you set to true on the master and kubelet. The nodes will then be able to tell if any HugePages are available and workloads can request them via the pod definition.
CPU Manager Policy
# cat /etc/origin/node/node-config.yaml
...
kubeletArguments:
  ...
  feature-gates:
  - CPUManager=true
  cpu-manager-policy:
  - static
  cpu-manager-reconcile-period:
  - 5s
  kube-reserved:
  - cpu=500m
Result:
# oc exec pod-name -- cat /sys/fs/cgroup/cpuset/cpuset.cpus
2
# oc exec pod-name -- grep ^Cpus_allowed_list /proc/1/status
Cpus_allowed_list: 2
HugePages
# cat /etc/origin/node/node-config.yaml
...
kubeletArguments:
  ...
  feature-gates:
  - HugePages=true
Pod spec:
resources:
  requests:
    cpu: 1
    memory: 256Mi
  limits:
    cpu: 1
    memory: 256Mi
# cat /etc/origin/master/master-config.yaml
...
kubernetesMasterConfig:
  apiServerArguments:
    ...
    feature-gates:
    - HugePages=true
Pod spec:
resources:
  limits:
    hugepages-2Mi: 100Mi
Both the page size in the resource name (e.g. hugepages-2Mi) and the amount are configurable.
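Putting the snippets together, a hedged sketch of a complete pod consuming huge pages (pod name and image are illustrative, not from the deck):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hugepages-demo                       # illustrative name
spec:
  containers:
  - name: app
    image: registry.example.com/app:latest   # illustrative image
    resources:
      limits:
        hugepages-2Mi: 100Mi   # 50 pages of 2Mi each
        memory: 256Mi
        cpu: 1
```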
Feature(s): Node Problem Detector
Description: Daemon that runs on each node as a DaemonSet. The daemon tries to make the cluster aware of node-level faults that should make the node unschedulable.
Node
How it Works: When you start the node problem detector you tell it a port to broadcast the issues it finds over. The detector allows you to load sub-daemons to do the data collection. There are three as of today. Issues found by a problem daemon are classified either as a “NodeCondition”, which stops node scheduling, or as an “Event”, which is only informative.
TechPreview
Problem Daemons:
● Kernel Monitor: monitors kernel log via journald and reports problems according to regex patterns
● AbrtAdaptor: monitors the node for kernel problems and application crashes from journald
● CustomPluginMonitor: allows you to test for any condition via a script that exits 0 or 1 depending on whether your condition is met.
Feature(s): Protection of Local Ephemeral Storage
Description: Control the usage of local ephemeral storage feature on the nodes in order to prevent users from exhausting all node local storage (logs, empty dirs, copy on write layer) with their pods and abusing other pods that happen to be on the same node.
Node
How it Works: After turning on LocalStorageCapacityIsolation, submitted pods use the limit and request fields. Violations will result in an evicted pod.
Request: the ephemeral storage asked for when scheduling a container to a node; the requested ephemeral storage is then fenced off on the chosen node for the use of the container.
Limit: provides a hard limit on the ephemeral storage that can be allocated across all the processes in a container.
TechPreview
1. Master: /etc/origin/master/master-config.yaml
kubernetesMasterConfig:
  apiServerArguments:
    feature-gates:
    - LocalStorageCapacityIsolation=true
  controllerArguments:
    feature-gates:
    - LocalStorageCapacityIsolation=true

2. Node: /etc/origin/node/node-config.yaml
kubeletArguments:
  feature-gates:
  - LocalStorageCapacityIsolation=true
3. Launch pods with the following in their deploymentConfig
resources:
  requests:
    ephemeral-storage: 500Mi
  limits:
    ephemeral-storage: 1Gi
Feature(s): Descheduler
Description: Because a scheduler’s view of a cluster is from a single point in time, the overall cluster balance may become skewed by taints and tolerations, evictions, affinities, and other lifecycle events such as node maintenance or new node additions. As a result, some nodes can become under- or over-utilized.
Node
How it Works: The descheduler is a job, running in a pod in the kube-system project, that finds pods based on its policy and evicts them in order to give them back to the scheduler for replacement on the cluster. It does not target static pods, pods with high QoS, daemonSet pods, or pods with local storage.
TechPreview Available Policies:
● RemoveDuplicates: if this policy is set, the descheduler looks for pods that are part of the same replicaSet or deployment but happen to have been placed on the same node. It evicts the duplicates in the hope the scheduler will place them on a different node.
● LowNodeUtilization: finds nodes that are under the CPU, MEM, and number-of-pods thresholds you have set, then evicts pods from other nodes in the hope the scheduler places them on these underutilized nodes. There is also a setting to only trigger this if you have more than X underutilized nodes.
● RemovePodsViolatingInterPodAntiAffinity and RemovePodsViolatingNodeAffinity: re-evaluate pods that might have been forced to break their affinity rules and evict them for another chance to be placed on nodes that comply with their affinity or anti-affinity.
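The policies above are selected in a descheduler policy file. A sketch in the upstream v1alpha1 policy format (all threshold numbers are illustrative, not recommendations from the deck):

```yaml
apiVersion: descheduler/v1alpha1
kind: DeschedulerPolicy
strategies:
  RemoveDuplicates:
    enabled: true
  LowNodeUtilization:
    enabled: true
    params:
      nodeResourceUtilizationThresholds:
        thresholds:        # nodes below ALL of these count as underutilized
          cpu: 20
          memory: 20
          pods: 20
        targetThresholds:  # pods are evicted from nodes above ANY of these
          cpu: 50
          memory: 50
          pods: 50
        numberOfNodes: 3   # only act if at least 3 nodes are underutilized
```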
Feature(s): Windows Containers
Description: Be able to run Windows containers on Windows Server 1709, 1803, and 2019 within an OpenShift cluster.
Node
How it Works: Joint partnership between Microsoft and Red Hat. Microsoft will distribute and support, through our joint co-located support process, the kubelet, configuration/installation, and networking components that need to be installed on Windows. Red Hat will support the interaction of those components with the OpenShift cluster.
Customers and partners can sign up for the developer preview program here. The program will start within the next 7 days. It has been delayed due to technical difficulties.
DevPreview
Providing in the developer preview:
1.) Powershell script to satisfy container prerequisites on Windows Server
2.) Installation process that allows you to install on one to many nodes without deploying an overlay network
3.) Ansible playbooks to deploy and configure an experimental OVN network on the OpenShift cluster
4.) Ansible playbooks to deploy and configure an experimental OVN network from CloudBase on Windows Server. And to then connect that Windows node to the OpenShift cluster
Features in the first drop:
1.) kubelet and prerequisites (docker, networking plugins, etc.)
2.) Join Windows node to OpenShift cluster
3.) Allow Windows access to certain projects (nodeSelector or taints & tolerations)
4.) Work with templates in the Service Catalog
5.) Attach static storage to the container
6.) Scale the Windows container up and down
7.) DNS-resolvable URL for service to route object
8.) East/west network connectivity to Linux pods
9.) Delete Windows container
Video of it WORKING!!!
Feature(s): Expose registry metrics with OpenShift auth
Description: Registry metrics endpoint now protected by built-in OpenShift auth
How it works:
● Registry provides an endpoint for Prometheus metrics
● Route must be enabled
● Users with the appropriate role can access metrics using their OpenShift credentials
● An admin-defined shared secret can still be used to access the metrics as well
Registry
Feature(s): Run control plane as static pod
Description: Migrate control plane to static pods to leverage self-management of cluster components and minimize direct host management
How it Works:
● In 3.10 and newer, control plane components (etcd, API, and controller manager) will now move to running as static pods
● Goal is to reduce node-level configuration in preparation for automated cluster configuration on immutable infrastructure
● Unified control plane deployment methods across Atomic Host and RHEL; everything runs atop the kubelet
● The standard upgrade process will migrate existing clusters automatically
Installation
Feature(s): Bootstrapped Node Configuration
Description: Node configuration is now managed via API objects and synchronized to nodes
How it Works:
● In 3.10 and newer, all members of the [nodes] inventory group must be assigned an openshift_node_group_name (the value is used to select the configmap that configures each node)
● By default, five configmaps are created: node-config-master, node-config-infra, node-config-compute, node-config-master-infra, & node-config-all-in-one
  ○ The last two place a node into multiple roles
● Note: configmaps are the authoritative definition of node labels; the old openshift_node_labels value is effectively ignored
● If you want to deviate from the default configuration, you must define the entire openshift_node_group dictionary in your inventory. When using an INI-based inventory it must be translated into a Python dictionary.
● The upgrade process will now block until the required configmaps exist in the openshift-node namespace
  ○ Either accept the defaults or define openshift_node_groups to meet your needs, then run playbooks/openshift-master/openshift_node_group.yml to create the configmaps
  ○ Review the configmaps carefully to ensure that all desired configuration items are set, then restart the upgrade
● Changes to these configmaps propagate to all nodes within 5 minutes, overwriting /etc/origin/node/node-config.yaml
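A hedged sketch of defining openshift_node_groups in a YAML inventory (the label and the kubelet edit below are illustrative examples, not required values):

```yaml
openshift_node_groups:
- name: node-config-compute
  labels:
  - 'node-role.kubernetes.io/compute=true'
  edits:
  - key: kubeletArguments.max-pods   # illustrative node-config edit
    value:
    - '250'
```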
Installation
Image Reference: https://medium.com/@toddrosner/kubernetes-tls-bootstrapping-cf203776abc7
Feature(s): HA Setup For Egress Pods
Description: In the first z-stream release of 3.10, egress pods can have HA failover across secondary cluster nodes in the event the primary node goes down.
How it works: Namespaces are now allowed to have multiple egress IPs specified, hosted on different nodes, so that if the primary node fails the egress IP switches from its primary to secondary egress IP being hosted on another node. When the original IP eventually comes back, then nodes will switch back to using the original egress IP. The switchover currently takes ≤7 seconds for a node to notice that an egress node has gone down (potentially configurable in a later version).
Networking
[Diagram: Namespace A runs pods on Node 1 (Egress IP 1) and Node 2 (Egress IP 2); the external service whitelists IP1 and IP2]
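A sketch of the objects involved: the project’s NetNamespace lists both egress IPs (first is primary), and each IP is hosted by a different node’s HostSubnet. All names, IPs, and subnets below are illustrative:

```yaml
apiVersion: network.openshift.io/v1
kind: NetNamespace
metadata:
  name: myproject          # illustrative project
netname: myproject
egressIPs:
- 10.0.0.100               # primary, hosted on node1
- 10.0.0.101               # secondary, hosted on node2
---
apiVersion: network.openshift.io/v1
kind: HostSubnet
metadata:
  name: node1              # illustrative node
host: node1
hostIP: 10.0.0.11          # illustrative
subnet: 10.128.0.0/23      # illustrative
egressIPs:
- 10.0.0.100
```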
Feature(s): Allow DNS names for egress routers
Description: The egress router can now refer to an external service, with a potentially unstable IP address, by its hostname.
How it works: The OpenShift egress router runs a service that redirects egress pod traffic to one or more specified remote servers, using a pre-defined source IP address that can be whitelisted on the remote server. Its EGRESS_DESTINATION can now specify the remote server by FQDN.
Networking
[Diagram: pods reach the egress service at INTERNAL-IP:8080, which forwards to the egress router pod using source IP1 on the node; the external service whitelists IP1]
...
- name: EGRESS_DESTINATION
  value: |
    80 tcp my.example.com
    8080 tcp 5.6.7.8 80
    8443 tcp your.example.com 443
    13.14.15.16
...
Feature(s): Document and test a supported way of expanding the serviceNetwork
Description: Provide a supported way of growing the service network address range in a multi-node environment to a larger address space.
For example:
serviceNetworkCIDR: 172.30.0.0/24
Note: This DOES NOT cover migration to a different range, JUST the increase of an existing range.
Networking
1. Update the master-config.yaml to change the serviceNetworkCIDR to 172.30.0.0/16
2. Delete the default clusternetwork object on the master: # oc delete clusternetwork default
3. Restart the master API service and the controller service
4. Update the ansible inventory file to match the change in (1) and redeploy the cluster
5. Evacuate the nodes one by one, restarting the iptables and atomic-openshift-node services on each
How it works:
172.30.0.0/16
Feature(s) : Specify whitelist cipher suite for etcd
Security
Description: Users now have the ability to optionally whitelist cipher suites for use with etcd in order to meet security policies.
How it Works:
● Configure etcd to add the --cipher-suites flag with the desired cipher suites
● Restart etcd, apiserver, controllers, etc.
● The TLS handshake fails when a client hello is presented with invalid cipher suites
● If empty, Go auto-populates the list
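A sketch, assuming the env-file form used by the etcd RPM (the specific suite list below is illustrative; it must match what your clients support):

```
# /etc/etcd/etcd.conf
ETCD_CIPHER_SUITES="TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384"
```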
Feature(s) : Control Sharing the PID namespace between containers
Security
Description: Use this feature to configure cooperating containers in a pod, such as a log handler sidecar container, or to troubleshoot container images that don’t include debugging utilities like a shell.
How it Works:
● The feature gate PodShareProcessNamespace is set to false by default
● Set 'feature-gates=PodShareProcessNamespace=true' in apiserver, controllers, and kubelet
● Restart apiserver, controller, and node services
● Create a pod with spec "shareProcessNamespace: true"
● oc create -f <pod spec file>

Caveats: when the PID namespace is shared between containers
● Sidecar containers are not isolated
● Environment variables are now visible to all other processes
● Any "kill all" semantics used within the process are now broken
● Exec processes from other containers will now show up
TechPreview
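A hedged sketch of such a pod spec (names and images are illustrative): an app container plus a debug sidecar sharing one PID namespace, so the sidecar can see and signal the app’s processes.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: share-pid-demo                        # illustrative name
spec:
  shareProcessNamespace: true
  containers:
  - name: app
    image: registry.example.com/app:latest    # illustrative image
  - name: debug
    image: registry.example.com/tools:latest  # illustrative image with a shell
    command: ['sleep', 'infinity']
```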
Feature(s) : Router Service Account no longer needs access to secrets
Security
Description: The router service account no longer needs permission to read all secrets. This improves security, as previously, if the router were compromised it could then read all of the most sensitive data in the cluster.
How it Works:
● When you create an ingress object, a corresponding route object is created.
● If an ingress object is modified, a changed secret should take effect soon after
● If an ingress object is deleted, a route that was created for it will be deleted
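The ingress-to-route flow above can be sketched with a minimal Ingress (host, service, and secret names are illustrative); a corresponding route is created from it, and edits to the referenced TLS secret take effect shortly after:

```yaml
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: example               # illustrative name
spec:
  tls:
  - hosts:
    - app.example.com
    secretName: example-tls   # changes here propagate to the route
  rules:
  - host: app.example.com
    http:
      paths:
      - path: /
        backend:
          serviceName: app    # illustrative service
          servicePort: 8080
```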
[Diagram: external and internal traffic passes through the router pod to the service’s pods]
Feature(s): Container Storage Interface (CSI)
Description: Introduce CSI sub-system as tech preview in 3.10
• External Attacher
• External Provisioner
• Driver registrar
• CSI Drivers shipped: None (use external/upstream)
Storage
How it Works
• Create a new project where the CSI components will run and a new service account that will run the components
• Create the Deployment with the external CSI attacher and provisioner and DaemonSet with the CSI driver
• Create a StorageClass for the new storage entity • Create a PVC with the new StorageClass
• See: https://github.com/openshift/openshift-docs/blob/master/install_config/persistent_storage/persistent_storage_csi.adoc
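The StorageClass/PVC steps can be sketched as follows (the provisioner name must match whatever external CSI driver you deployed; all names below are illustrative):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: csi-example-sc
provisioner: csi-driver.example.com   # illustrative; must match the CSI driver
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: csi-example-pvc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: csi-example-sc
```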
DevPreview
Feature(s): New Storage Provisioners
Description: New Storage Provisioners (external provisioners) added as Tech Preview with 3.10
• CephFS
Storage
How it Works
• Use OpenShift Ansible installer openshift_provisioners role
• Set the provisioner to be installed and started as true
<After the provisioner install and startup is completed>
• Create a Storage Class for the storage entity
• Create a pod with a PVC/claim with the Storage Class
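A sketch of those last two steps for CephFS, assuming the upstream external-storage cephfs provisioner’s parameter names (monitor address and secret names are illustrative):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: cephfs
provisioner: ceph.com/cephfs
parameters:
  monitors: 192.168.1.11:6789            # illustrative monitor
  adminId: admin
  adminSecretName: ceph-admin-secret     # illustrative secret
  adminSecretNamespace: cephfs-provisioner
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cephfs-claim
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
  storageClassName: cephfs
```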
TechPreview
● Atomic Host deprecation notice, as Red Hat CoreOS will be the future immutable host option.
○ Atomic supported in 3.10 & 3.11
Storage
● Virtual data optimizer (VDO) for dm-level dedupe and compression.
● OverlayFS by default for new installs (overlay2)
  ○ Ensure ftype=1 for 7.3 and earlier
● Devicemapper continues to be supported and available for edge cases around POSIX
● LVM snapshots integrated with boot loader (boom)
RHEL 7.5 Highlights
OpenShift Container Platform 3.10 is supported on RHEL 7.4, 7.5 and Atomic Host 7.5+
Containers / Atomic
● Docker 1.13
● Docker-latest deprecation
● RPM-OSTree package overrides
Security
● Unprivileged mount namespaces
● KASLR fully supported and enabled by default
● Ansible remediation for OpenSCAP
● Improved SELinux labeling for cgroups (cgroup_seclabel)
CRI-O v1.10
Feature(s): CRI-O v1.10
Description: CRI-O is an OCI compliant implementation of the Kubernetes Container Runtime Interface. By design it provides only the runtime capabilities needed by the kubelet. CRI-O is designed to be part of Kubernetes and evolve in lock-step with the platform.
CRI-O brings:
● A minimal and secure architecture
● Excellent scale and performance
● Ability to run any OCI / Docker image
● Familiar operational tooling and commands
Improvements include:
● crictl CLI for debugging and troubleshooting
● Podman for image tagging & management
● Installer integration & fresh-install-time decision: openshift_use_crio=True
● Not available for existing cluster upgrades
[Diagram: the kubelet drives CRI-O, which handles storage, images, runc, and CNI networking]
Questions