FIRST LOOK AT AUTOMATIC TCP/IP CONNECTION FAILOVER
Marcus Pullen, Technical Consultant
19 November 2019
TCP/IPV6
[Figure: NonStop system with CPU 0, CPU 1 through CPU N, each running its own TCP/IP stack (labeled SN1, SN2), connected to I/O Adapter Lan1 and I/O Adapter Lan2.]
CIP
[Figure: NonStop system with CPU 0, CPU 1 through CPU N, each labeled TCP/IP and connected to CLIM1 and CLIM2; each CLIM runs a Linux kernel.]
*CIP = Cluster I/O Protocols
AUTOMATIC TCP/IP CONNECTION FAILOVER (ATCF): INTRODUCTION
• Automatically fails over TCP/IP connections from the primary CLIM to the backup CLIM if the primary CLIM fails
• Failover is transparent to the applications using the TCP/IP connections
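[Figure: where the TCP state is held, in three panels. TCP/IPv6 (G16SE): a CPU with a primary adapter and a backup adapter. CIP before ATCF: a CPU with a primary CLIM and a backup CLIM, with the backup side marked "?". CIP with ATCF enabled: TCP state shown at the CPU, the primary CLIM and the backup CLIM.]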
CONFIDENTIAL
FAILOVER IN TCP/IPV6, CIP, AND ATCF
• The TCP/IPv6 stack is in the NSK host:
  – The TCP state is unaffected if an adapter fails.
• The CIP stack is in the CLIM:
  – The TCP state and buffered data are lost if a CLIM fails.
• ATCF keeps a copy in the host or remote stack:
  – The TCP state is copied to the host.
  – Acknowledgements are delayed, so a copy of the buffered data is saved in the originator.
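The delayed-acknowledgement idea can be illustrated with a toy model in Python. This is a sketch under simplifying assumptions, not HPE's implementation. It models one inbound stream and uses whole segments instead of byte sequence numbers. Retransmission happens immediately rather than on a timer, and the copy of the TCP state to the host is not modeled. The function name `simulate` is hypothetical.

With `delayed_ack=False` (CIP before ATCF), the CLIM acknowledges data on receipt, so data still buffered in a failed CLIM is lost. With `delayed_ack=True`, the remote stack still holds the unacknowledged data and retransmits it to the backup CLIM.

```python
def simulate(delayed_ack: bool) -> list[bytes]:
    """Toy model (hypothetical): one inbound stream across a primary CLIM failure.

    delayed_ack=False: the CLIM ACKs as soon as it receives data (CIP before ATCF).
    delayed_ack=True:  the ACK waits until the host holds a copy (the ATCF idea).
    """
    remote_unacked: list[bytes] = []  # remote stack keeps data until it is ACKed
    host_copy: list[bytes] = []       # data that has reached the NSK host
    primary: list[bytes] = []         # primary CLIM receive buffer
    backup: list[bytes] = []          # backup CLIM receive buffer

    def receive(clim: list[bytes], segment: bytes) -> None:
        clim.append(segment)
        if not delayed_ack:
            del remote_unacked[:1]    # immediate ACK: remote discards its copy

    def copy_to_host(clim: list[bytes]) -> None:
        host_copy.extend(clim)
        if delayed_ack:
            del remote_unacked[:len(clim)]  # ACK only what the host now holds
        clim.clear()

    # Two segments arrive and are copied to the host.
    for segment in (b"a", b"b"):
        remote_unacked.append(segment)
        receive(primary, segment)
    copy_to_host(primary)

    # A third segment is still buffered in the primary CLIM when it fails.
    remote_unacked.append(b"c")
    receive(primary, b"c")
    primary.clear()  # CLIM failure: its buffered data is lost

    # The remote stack retransmits whatever was never acknowledged to the backup CLIM.
    for segment in list(remote_unacked):
        receive(backup, segment)
    copy_to_host(backup)
    return host_copy


print(simulate(delayed_ack=False))  # [b'a', b'b']  (segment c is lost)
print(simulate(delayed_ack=True))   # [b'a', b'b', b'c']
```

The cost of this approach is visible even in the toy: the remote stack must keep data longer, which is one reason ATCF needs more buffering.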
[Figure: failover sequences. TCP/IPv6: primary and backup G4SA with the NSK TCP/IPv6 stack. CIP: primary and backup CLIM running Linux TCP with NSK CIP. ATCF: primary and backup CLIM running Linux TCP with NSK CIP. Each exchanges numbered segments with a remote stack.]
EVOLUTION OF NONSTOP TCP/IP FAILOVER
Capability | TCP/IPv6 (G16SE) | CIP before ATCF | CIP with ATCF
Failover of IP addresses, routes, tunnels on CLIM failure | yes | yes | yes
Failover of UDP and listening sockets on CLIM failure | yes | yes | yes
Failover of connected TCP sockets on CLIM failure | yes | no | yes, configurable
Failover of connected SCTP sockets on CLIM failure | n/a | no | no
Transfer of IPsec and iptables config on CLIM failure | n/a | no | no
Automatic restore when primary CLIM recovers | yes | no | yes, configurable
Failover of sockets on NSK CPU failure | no | no | no
CONFIGURATION
AUTOMATIC TCP/IP CONNECTION FAILOVER
• For regular interfaces, ATCF and automatic restore are enabled per interface, using an extension of the climconfig failover -add command:
climconfig failover -add <interface name>
-dest <CLIM name>.<interface name>
[-type unc|atcp] [-restore man|auto]
[-restorewait <secs>]
• Loopback interfaces are configured using a new attribute of the SCF ADD or ALTER PROVIDER commands:
ADD PROVIDER $ZZCIP.<provider-name>
   [ [ , TYPE IPDATA ]
     [ , SHARE-PORTS <num-ports> ]
     [ , FAMILY { INET | DUAL } ]
     [ , LOFAILOVER { UNC | ATCP } ]
   | , TYPE MAINTENANCE , CLIM <clim-name> , IPADDRESS <ip-addr> ]
   [ , TPNAME <tp-name> ]
   [ , HOSTNAME <hostname> ]
   [ , HOSTID <hostid> ]
   [ , BRECVPORT ( <port> [ , <port> [ ... ] ] ) ]
   [ , TCP-LISTEN-QUE-MIN <queue-size> ]
• Failover types:
  – unc = unconnected sockets (same as original CIP)
  – atcp = all TCP sockets (ATCF enabled)
• Restore types:
  – man = manual (same as original CIP)
  – auto = automatic restore
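The Python helpers below sketch how the options fit together. They are hypothetical and only assemble command strings from the syntax shown above, checking the documented option values. They do not run climconfig or SCF. The interface name `eth2`, CLIM name `N1002582` and provider name `ATCFPROV` are made-up examples.

```python
FAILOVER_TYPES = {"unc", "atcp"}    # values of -type shown above
RESTORE_TYPES = {"man", "auto"}     # values of -restore shown above
LOFAILOVER_TYPES = {"UNC", "ATCP"}  # values of the SCF LOFAILOVER attribute


def climconfig_failover_add(interface, dest_clim, dest_interface,
                            ftype=None, restore=None, restorewait=None):
    """Build a 'climconfig failover -add' command line (hypothetical helper)."""
    if ftype is not None and ftype not in FAILOVER_TYPES:
        raise ValueError(f"-type must be one of {sorted(FAILOVER_TYPES)}")
    if restore is not None and restore not in RESTORE_TYPES:
        raise ValueError(f"-restore must be one of {sorted(RESTORE_TYPES)}")
    if restorewait is not None and (not isinstance(restorewait, int) or restorewait < 0):
        raise ValueError("-restorewait must be a non-negative number of seconds")
    parts = ["climconfig", "failover", "-add", interface,
             "-dest", f"{dest_clim}.{dest_interface}"]
    if ftype is not None:
        parts += ["-type", ftype]
    if restore is not None:
        parts += ["-restore", restore]
    if restorewait is not None:
        parts += ["-restorewait", str(restorewait)]
    return " ".join(parts)


def scf_add_provider(provider_name, lofailover):
    """Build an SCF ADD PROVIDER command for an IPDATA provider (hypothetical helper)."""
    if lofailover not in LOFAILOVER_TYPES:
        raise ValueError(f"LOFAILOVER must be one of {sorted(LOFAILOVER_TYPES)}")
    return f"ADD PROVIDER $ZZCIP.{provider_name}, TYPE IPDATA, LOFAILOVER {lofailover}"


# Example values only: interface, CLIM and provider names are made up.
print(climconfig_failover_add("eth2", "N1002582", "eth2",
                              ftype="atcp", restore="auto", restorewait=30))
print(scf_add_provider("ATCFPROV", "ATCP"))
```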
64-BIT QIO
HOST MEMORY USAGE AND QIO64
• Both incoming and outgoing data on connection-failover (ATCF) enabled connections require more host processor memory.
• The QIO limit of 512 MB was not sufficient.
• QIO64 adds 64-bit segments that raise its maximum memory to 1/8 of the processor's physical memory (1/16 by default):
  – Minimum size (32 GB physical memory): 4 GB (2 GB by default)
  – Current maximum size (256 GB physical memory): 32 GB
• CIP uses the 64-bit segments for all data buffering, whether ATCF is enabled or not.
• QIO has new configuration parameters:
  – Seg64MaxSize sets the maximum size of the 64-bit segments.
  – Pool64InitSize sets the starting size of the 64-bit pool.
• There are also new status displays and EMS messages.
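The sizing rule above is simple arithmetic. This minimal Python sketch assumes the 1/8 and 1/16 ratios apply exactly. It reproduces the quoted figures: 4 GB maximum and 2 GB default for 32 GB of physical memory, and 32 GB maximum for 256 GB. It also compares them with the old 512 MB QIO limit. `qio64_limits_gb` is a hypothetical name, not a QIO parameter.

```python
OLD_QIO_LIMIT_GB = 0.5  # the 512 MB QIO limit


def qio64_limits_gb(physical_memory_gb: float) -> tuple[float, float]:
    """(maximum, default) QIO64 size: 1/8 and 1/16 of the processor's physical memory."""
    return physical_memory_gb / 8, physical_memory_gb / 16


for physical in (32, 256):  # smallest and current largest configurations quoted above
    maximum, default = qio64_limits_gb(physical)
    print(f"{physical} GB physical: max {maximum:g} GB, default {default:g} GB "
          f"({maximum / OLD_QIO_LIMIT_GB:g}x the old limit)")
```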
HARDWARE/SOFTWARE REQUIREMENTS
AUTOMATIC TCP/IP CONNECTION FAILOVER: HARDWARE REQUIREMENTS
Hardware | ATCF supported | ATCF not supported
NonStop system | HPE Integrity NonStop X NS7 systems; HPE Integrity NonStop X NS3 systems | HPE Virtualized NonStop systems; HPE NonStop X NS2 systems; HPE Integrity NonStop i systems
CLIM type | NonStop X IP CLIM; NonStop X Telco CLIM | NonStop X Storage CLIM; Virtualized CLIMs (including IP and Telco vCLIM)
CLIM generation | NonStop X Gen10 CLIM; NonStop X Gen9 CLIM | NonStop X Gen8 CLIM; NonStop i CLIMs (all generations)
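The support matrix above can be encoded as a quick pre-check. This is a hypothetical Python helper that only transcribes the table, using shortened names such as `NS7` and `Gen10`. It does not query a real system. It also does not cover the interface-level limitations listed under ATCF CURRENT LIMITATIONS, such as VLAN interfaces.

```python
# Support matrix transcribed from the hardware requirements table (names shortened).
SUPPORTED_SYSTEMS = {"NS7", "NS3"}              # HPE Integrity NonStop X NS7 / NS3
SUPPORTED_CLIM_TYPES = {"IP", "Telco"}          # NonStop X IP and Telco CLIMs
SUPPORTED_CLIM_GENERATIONS = {"Gen9", "Gen10"}  # NonStop X Gen9 and Gen10 CLIMs


def atcf_hardware_supported(system: str, clim_type: str, generation: str,
                            virtualized: bool = False) -> bool:
    """True if the system/CLIM combination is in the 'ATCF supported' column."""
    if virtualized:  # virtualized NonStop systems and vCLIMs are not supported
        return False
    return (system in SUPPORTED_SYSTEMS
            and clim_type in SUPPORTED_CLIM_TYPES
            and generation in SUPPORTED_CLIM_GENERATIONS)


print(atcf_hardware_supported("NS7", "IP", "Gen10"))      # True
print(atcf_hardware_supported("NS3", "Storage", "Gen9"))  # False: Storage CLIM
print(atcf_hardware_supported("NS7", "IP", "Gen8"))       # False: Gen8 CLIM
```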
AUTOMATIC TCP/IP CONNECTION FAILOVER: SOFTWARE REQUIREMENTS
• Release RVU is L19.08.
• Supported on L19.03 with the following SPRs:
  • CIP
    – T0690L02^BBO
    – T0693L02^BAS
    – T0694L02^BBO
    – T0695L02^BBO
    – T0696L02^BBO
    – T0853L03^DBO
      – T0691L03^DBO
      – T0692L03^DBO
  • QIO
    – T1034L01^AAC
    – T8375L02^ACH
    – T8671L02^ABH
    – T8672L02^AAF
  • NCSL
    – T0940L02^ABO
INSTALLATION
• There is no change to the installation steps for NonStop CIP products.
• There is no change to the CLIM software update procedure.
• There is no change to the CLIM re-image procedure.
• An existing CIP configuration from before connection failover can be migrated to connection failover without any issues.
PERFORMANCE, LIMITATIONS AND FUTURE PLANS
ATCF PERFORMANCE CONSIDERATIONS
• ATCF has some additional overhead compared to the TCP/IPv6 implementation.
• Application throughput with ATCF enabled could be lower by up to 15%, depending on your application.
• NonStop CPU cost with ATCF enabled is about 10%-15% higher, depending on the type of traffic.
• The maximum number of IPsec connections on an IP CLIM with ATCF enabled is 1,500, versus 6,000 when ATCF is disabled.
• Failover time:
  – A small number (<500) of connections fails over faster than with TCP/IPv6.
  – A large number (>1,000) of connections can be a few seconds slower, but never so slow that the connections time out.
Note: The above statements are based on tests conducted on a 10GbE network
YOUR EXPERIENCE MAY VARY FROM THE ABOVE!
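The hypothetical Python helper below turns the quoted figures into rough planning numbers, starting from a baseline measured without ATCF. It applies three adjustments: throughput reduced by up to 15%, CPU cost raised by 10% to 15%, and the IPsec connection limit per IP CLIM. It treats CPU busy as proportional to CPU cost, which is an assumption. This is only arithmetic on the 10GbE test results above, not a performance model.

```python
# Figures quoted above (tests on a 10GbE network; your experience may vary).
THROUGHPUT_LOSS_MAX = 0.15                 # throughput up to 15% lower
CPU_INCREASE_RANGE = (0.10, 0.15)          # NonStop CPU cost about 10%-15% higher
IPSEC_LIMIT = {True: 1_500, False: 6_000}  # IPsec connections per IP CLIM (ATCF on/off)


def atcf_estimate(throughput: float, cpu_cost: float) -> tuple[float, tuple[float, float]]:
    """Worst-case throughput and CPU-cost range with ATCF, from a non-ATCF baseline."""
    low_cpu = cpu_cost * (1 + CPU_INCREASE_RANGE[0])
    high_cpu = cpu_cost * (1 + CPU_INCREASE_RANGE[1])
    return throughput * (1 - THROUGHPUT_LOSS_MAX), (low_cpu, high_cpu)


def ipsec_fits(connections: int, atcf_enabled: bool) -> bool:
    """True if the IPsec connection count stays within the per-IP-CLIM limit."""
    return connections <= IPSEC_LIMIT[atcf_enabled]


# Example baseline (made up): 1,000 Mbit/s at 40% CPU busy, 2,000 IPsec connections.
tput, (cpu_lo, cpu_hi) = atcf_estimate(1000, 40)
print(f"throughput >= {tput:.0f} Mbit/s, CPU {cpu_lo:.0f}%-{cpu_hi:.0f}%")
print(ipsec_fits(2_000, atcf_enabled=True), ipsec_fits(2_000, atcf_enabled=False))
```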
ATCF CURRENT LIMITATIONS
• Cannot be enabled on HPE Virtualized NonStop (vNS) and Converged Virtualized NonStop (NS2) systems
• Cannot be enabled on VLAN and VXLAN interfaces
• Does not support IPv6 sockets
Corner-case limitations:
• New connections that are not yet fully established may not fail over.
• Incoming data on an incoming connection is limited to 4,000 bytes until the NonStop server accepts the connection.
• Connections in the late stages of closing during failover/restore may be reset instead of going through a clean close.
• Out-of-band urgent data state is not fully saved during failover.
ATCF FUTURES - SUBJECT TO CHANGE!
• Support for Virtualized NonStop and NS2 systems
• Support for VLAN and VXLAN interfaces
• Improved throughput and host CPU cost
• IPv6 support
• Out-of-band urgent data failover
• Complete handling of corner-case closing states
• Failover of incomplete connections
• Failover of all unaccepted connections
NonStop Firmware Update Procedures: Best Practices
Marcus Pullen, Technical Consultant
19 November 2019
AGENDA
• System overviews
• Documentation
• Where to find firmware
• How to update firmware
• x86 side-channel vulnerabilities mitigation
• Flow and expected times
SYSTEM OVERVIEW NS7/NS3 (CONVERGED NONSTOP)
• Where does firmware play a role?
  • Blades / CPUs
    – BIOS
    – iLO
  • Onboard Administrator
  • CLIM components
    – incl. disk enclosures
  • Ethernet switches
  • System interconnects
  • NonStop System Console*
  • Power*
  • Alarm panel*
* not covered here
SYSTEM OVERVIEW NS2 (VIRTUALIZED CONVERGED NONSTOP)
• Where does firmware play a role?
  • Compute node
    – BIOS
    – iLO
    – NICs
    – SAS
  • System interconnects
    – FlexFabric switches
  • Maintenance switch
  • NonStop System Console*
* not covered here
FIRMWARE DOCUMENTATION
• NonStop Technical Library
• Firmware Matrices Guide
• L-Series Software Installation and Upgrade Guide
• CIP Configuration and Management Manual
• HPE OSM Service Connection (L-series) online help (OLH) and User's Guide
• NonStop System Console Installer Guide
• NonStop System Console Security Policy and Best Practices
FIRMWARE MATRICES GUIDE
WHERE TO FIND FIRMWARE
Firmware needs to be managed properly to ensure that compatible versions are installed. Using wrong or unsupported firmware can result in a wide range of system problems, from intermittent faults to partial or complete system outages.
• Firmware on the SUT (Site Update Tape)
  – For everything that is managed by OSM:
    – CLIM components (HBAs/NICs/iLO/BIOS)
    – Disk enclosures
• Firmware on the NSC (NonStop System Console) System Update DVD
  – Onboard Administrator
  – Blade iLO/BIOS
  – System interconnect
  – Maintenance switch
  – UPS
  – Alarm panel
  – NSC (iLO/BIOS)
FIRMWARE ON SUT (SITE UPDATE TAPE)
• OSM Service Connection provides:
  – Firmware version information
  – Firmware name, files and file location
  – Compare State attributes, which compare the current version against the default or available version
HPE CONFIDENTIAL AS PRESENTED AT THE NONSTOP TECHNICAL BOOTCAMP
FIRMWARE ON NSC (NONSTOP SYSTEM CONSOLE) SYSTEM UPDATE DVD
HOW TO UPDATE FIRMWARE
JOT DOWN THE VARIOUS USER IDS AND PASSWORDS
• The different components all have different user IDs and passwords.
• The following are some defaults:
Component | User ID | Password
NSC Windows Server 2003 | Administrator | <none>
NSC Windows Server 2008 | Administrator | Win2008NSC!
NSC Windows Server 2012 | Administrator | Win2012NSC!
NSC Windows Server 2016 | Administrator | Win2016NSC!
CLIM Linux | root | hpnonstop
Blade Onboard Administrator | Admin | hpnonstop
Blade processor iLO | Admin | hpnonstop
iLO of a CLIM | Admin | hpnonstop
HPE 6125G Blade Switch | admin | admin
IB Switch | admin | admin
Aruba Maintenance Switch | admin | admin
SIDE-CH. VULNERABILITIES MITIGATION FLOW / EXPERIENCES / TIME EXPECTATIONS
X86 SIDE-CH. VULNERABILITIES MITIGATION
• HPE vulnerability web page: https://www.hpe.com/us/en/services/security-vulnerability.html
  – Here you'll find the most current information on this topic.
  – From there, links lead to the different areas, e.g. to the systems affected by Meltdown/Spectre and to information about the available solutions.
• In addition, HPE NonStop provides HotStuff messages with detailed information for each product area:
  – HS03387E x86 Side-Ch. Vulnerabilities Mitigation on NonStop CPUs (dated 27 September 2019)
  – HS03372G x86 Side-Ch. Vulnerabilities Mitigation on CLIM (dated 12 June 2019)
  – HS03369F x86 Side-Channel Vulnerabilities Mitigation On NSC (dated 14 February 2019)
  – HS03371I x86 Side-Ch. Vulnerabilities Mitigation On VTR/VTC (dated 26 September 2019)
X86 SIDE-CH. VULNERABILITIES MITIGATION: ON NONSTOP CPUS
Abbr | Common name | Fix location
V1 | Spectre Variant 1 | NonStop OS
V1.1 | Spectre Variant 1.1 | NonStop OS
V2 | Spectre Variant 2 | NonStop OS and NonStop CPU BIOS
M | Meltdown | NonStop OS
V3a | Spectre Variant 3a | NonStop CPU BIOS
V4 | Spectre Variant 4 | NonStop OS and NonStop CPU BIOS
L1TF | L1TF: OS/SMM | NonStop CPU BIOS
Mic | Microarchitectural | NonStop OS and NonStop CPU BIOS
Vulnerability | NonStop Operating System | NonStop CPU BIOS
Spectre Variant 1 | L19.03.00 | not applicable
Spectre Variant 1.1 | L19.08.00 | not applicable
Spectre Variant 2 | L19.03.00 | 1.46_10-02-2018 (27 Nov 2018)*
Spectre Variant 3 (Meltdown) | L19.03.00 | not applicable
Spectre Variant 3a | not applicable | 1.46_10-02-2018 (27 Nov 2018)*
Spectre Variant 4 | L19.03.00 | 1.46_10-02-2018 (27 Nov 2018)*
L1TF: OS/SMM | not applicable | 1.46_10-02-2018 (27 Nov 2018)
Microarchitectural | L19.08.00 | 2.04_04-18-2019 (2 May 2019)
* based on NonStop NS7X3 and NS3X3
UPDATE FLOW
# | Activity | Time | Status
1 | NonStop Software Essentials (receive SW, planning, build/apply, ZPHIRNM preview) | 4-6 hours | Online
2 | Update NonStop System Console | 30 min | Online
3 | Update Onboard Administrator | 15 min | Online
4 | Update blade switches | 15 min | Online
5 | Update maintenance switches | 15 min | Online
6 | CLIM Management Tool: SW transfer | 20 min (all) | Online
7 | OSM Prepare Down CLIM Firmware Action | 10 min | Online
8 | Blade iLO & blade BIOS update (close to planned downtime) | 10 min (each) | Online
# | Activity | Time | Status
9 | Stop application, stop NonStop subsystems, DSM/SCM ZPHIRNM | ~30 min | Offline
10 | CLIM Management Tool: update CLIMs* | 30-60 min | Offline
11 | HSS/HCA Firmware Management Tool updates | 20 min | Offline
12 | Coldload, start NonStop subsystems, start applications | ~30 min | Offline
13 | Check firmware with OSM Service Connection | 10 min | Online
14 | Update System Interconnect switches | 25 min (each) | Online
* Lab measurements on NonStop X using Gen8 and Gen9 CLIMs (CLIMs updated in parallel):
Task | Average time with 100 Mbps maintenance LAN switch | Average time with 1 Gbps maintenance LAN switch
Software and firmware update on network CLIMs (16 CLIMs) | 55 minutes | 27 minutes
Software and firmware update on storage CLIMs (10 CLIMs) | 51 minutes | 34 minutes
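The offline portion of the flow determines the planned downtime. This Python sketch sums the minimum and maximum times of the offline steps 9 to 12 from the update-flow tables, taking the "~30 Min" entries as 30 minutes. The result is 110 to 140 minutes. The `Step` type and `offline_window` function are hypothetical. The per-unit online steps (8 and 14) are left out.

```python
from typing import NamedTuple


class Step(NamedTuple):
    number: int
    activity: str
    min_minutes: int
    max_minutes: int
    offline: bool


# Steps 9-13 from the update-flow tables ("~30 Min" taken as 30 minutes).
UPDATE_STEPS = [
    Step(9, "Stop application, stop NonStop subsystems, DSM/SCM ZPHIRNM", 30, 30, True),
    Step(10, "CLIM Management Tool: update CLIMs", 30, 60, True),
    Step(11, "HSS/HCA Firmware Management Tool updates", 20, 20, True),
    Step(12, "Coldload, start NonStop subsystems, start applications", 30, 30, True),
    Step(13, "Check firmware with OSM Service Connection", 10, 10, False),
]


def offline_window(steps: list[Step]) -> tuple[int, int]:
    """Return the (minimum, maximum) minutes spent in offline steps."""
    offline = [s for s in steps if s.offline]
    return sum(s.min_minutes for s in offline), sum(s.max_minutes for s in offline)


low, high = offline_window(UPDATE_STEPS)
print(f"Planned offline window: {low}-{high} minutes")  # Planned offline window: 110-140 minutes
```

Per the lab measurements, the CLIM step (10) depends strongly on the maintenance LAN speed, so the upper bound is worth checking against your own switch.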
THANK YOU!
For more information, send an email to [email protected]