SAP HANA Disaster Recovery with SUSE High Availability Extension Cleber Paiva de Souza / Gabriel Cavalcante {cleber,gabriel}@ssys.com.br S-SYS Systems and Solutions

Dec 28, 2016

Page 1: SAP HANA Disaster Recovery with SUSE High Availability

SAP HANA Disaster Recovery with SUSE High Availability Extension

Cleber Paiva de Souza / Gabriel Cavalcante

{cleber,gabriel}@ssys.com.br

S-SYS Systems and Solutions

Page 2: SAP HANA Disaster Recovery with SUSE High Availability

2

S-SYS and SUSE

• S-SYS was officially founded in January 2014.

• SUSE partner since the beginning.

• Formed by professionals with experience in SUSE products, Linux in general, training and software development.

• Works together with SUSE engineers in pre-sales and project delivery.

Page 3: SAP HANA Disaster Recovery with SUSE High Availability

3

Fujitsu Brazil Case: SAP HANA Appliances

• Fujitsu offers PRIMERGY RX600 S6 and PRIMEQUEST machines with SAP HANA.

• SAP HANA HA studies took place at the Fujitsu Platform Solution Center (PSC).

• Integration between the S-SYS and Fujitsu teams.

• Knowledge transfer allowed Fujitsu to deliver SAP HANA integrated with SUSE High Availability.

Page 4: SAP HANA Disaster Recovery with SUSE High Availability

4

Fujitsu RX600 hardware specification

• Up to 4 Intel® Xeon® E7 family processors

• Expandable to 1 TB of DDR3 RAM with memory mirroring support

• Robust I/O design with 10 PCI Express slots

• 8 hard drive bays, supporting up to 8 TB of local storage

• Integrated Remote Management Controller (iRMC) providing advanced management features

• 4U rack server form factor

Page 5: SAP HANA Disaster Recovery with SUSE High Availability

5

Fujitsu RX600 actual hardware configuration

• 4 × Intel® Xeon® E7-4870 at 2.4 GHz (4 sockets × 10 cores = 40 physical cores; with 2 threads per core, 80 logical CPUs)

• 1 TB of RAM

• 2 × Fusion-IO PCI cards (1.2 GB in RAID 1)

• 8 × 900 GB 10,000 RPM SAS disks in RAID 5

• 6 × 10 GbE network interfaces

• 6 × 1 GbE network interfaces (4 onboard and 2 PCI)
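Multiplying sockets, cores and threads gives the number of logical CPUs, not physical cores. The sketch below shows that arithmetic using the figures from this slide. It assumes Hyper-Threading is enabled with 2 threads per core, as the slide implies.

```python
def physical_cores(sockets: int, cores_per_socket: int) -> int:
    """Physical cores across all sockets."""
    return sockets * cores_per_socket


def logical_cpus(sockets: int, cores_per_socket: int, threads_per_core: int) -> int:
    """Logical CPUs (hardware threads) visible to the operating system."""
    return physical_cores(sockets, cores_per_socket) * threads_per_core


# RX600 test system: 4 x Xeon E7-4870, 10 cores per socket, 2 threads per core
print(physical_cores(4, 10))   # 40
print(logical_cpus(4, 10, 2))  # 80
```

On a running node, `nproc` should report the logical CPU count, not the physical core count.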

Page 6: SAP HANA Disaster Recovery with SUSE High Availability

6

Fujitsu PRIMEQUEST Hardware Specification

Highlights of the new generation

• 8-socket server with up to 4 independent hardware partitions and flexible I/O, based on the latest Intel® Xeon® E7-x800 v3

• Maximum performance from the new Intel Haswell-EX processor generation, with up to 18 cores

• Increased memory capacity and performance with 192 DDR4 DIMM slots at 1866 MHz

• System self-repair with flexible I/O and reserved SB functionality

• 12 Gbps RAID controller with 1 GB or 2 GB cache

• All parts redundant and/or hot-swappable

• Enhanced ServerView management

• Improved enterprise RAS feature set

Product facts

• Up to 8 × Intel® Xeon® E7-x800 v3 (Haswell-EX)

• Up to 12 TB RAM (using 192 × 64 GB DIMMs)

• Up to 24 × 2.5" HDDs/SSDs

• Up to 16 internal PCIe slots

• Additional 48 hot-plug PCIe slots in 4 external PCI boxes

• Up to 8 × 10 GbE internal with 4 IOUF
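The 12 TB maximum follows from the DIMM count and the DIMM size. The sketch below reproduces the calculation. It counts 1 TB as 1024 GB, which is an assumption; vendors sometimes round differently.

```python
def max_memory_gb(dimm_slots: int, dimm_size_gb: int) -> int:
    """Maximum installable memory in GB when every slot holds the same DIMM size."""
    return dimm_slots * dimm_size_gb


total_gb = max_memory_gb(192, 64)
print(total_gb)         # 12288
print(total_gb / 1024)  # 12.0
```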

Page 7: SAP HANA Disaster Recovery with SUSE High Availability

7

HANA in SLES HAE Cluster: HANA Single Box – System Replication / Scale-up

Page 8: SAP HANA Disaster Recovery with SUSE High Availability

8

Considerations

• We are Linux experts, not SAP experts.

• SLES for SAP = SLES 11 + High Availability Extension + SAP support + SAPHanaSR.

• All tests were done on SLES for SAP 11 SP3.

• Two-node clusters only (scale-up / single-box replication).

• AUTOMATED_REGISTER="false"

• By default, SAP HANA instances are not started at boot; the cluster takes care of the services.

• Synchronous system replication.

• SAP HANA SPS 08, revision 85.

Page 9: SAP HANA Disaster Recovery with SUSE High Availability

9

Definitions

Parameter         Value
Cluster node 1    hana01
Cluster node 2    hana02
SID               HDB
Instance number   00
User key          slehaloc

User        Password
hdbadm      P@ssword1
sapadm      P@ssword1
SYSTEM      P@ssword1
slehasync   Password1 (stored under user key slehaloc)

Page 10: SAP HANA Disaster Recovery with SUSE High Availability

10

Problem

• The guide "SAP HANA System Replication on SLES for SAP Applications" from June 2014 did not provide a configuration for IPMI as a STONITH resource.

• The SAPHanaSR Hawk template does not provide an IPMI configuration.

Page 11: SAP HANA Disaster Recovery with SUSE High Availability

Setup SAP HANA

Page 12: SAP HANA Disaster Recovery with SUSE High Availability

12

Procedures for setup

1) Install SLES for SAP

2) Configure network interfaces

3) Configure NTP and timezone

4) Setup disk layout

5) Check hostnames and IP addresses

6) Install SAP HANA database

7) Setup HANA

8) Configure SLES HA Extension

9) Testing takeover

10) Stress test

Page 13: SAP HANA Disaster Recovery with SUSE High Availability

1) Install SLES for SAP

Page 14: SAP HANA Disaster Recovery with SUSE High Availability

14

Install SLES for SAP

• Install SUSE as usual:

– Select the pattern SAP HANA Server Base.

– SLES for SAP installs only minimal network services.

• Register at SUSE Customer Center (SCC) and apply updates to avoid well-known problems and bugs.

– Update size: ~500 MB.

• SAPHanaSR is available only on SLES for SAP:

– Provides the SAPHanaTopology and SAPHana resource agents.

– Provides Hawk wizard templates.

Page 15: SAP HANA Disaster Recovery with SUSE High Availability

2) Configure network interfaces

Page 16: SAP HANA Disaster Recovery with SUSE High Availability

16

Configure network interfaces

• Define how your network communication will work.

• Define interfaces for user access, heartbeat, data replication, SAP remote support, STONITH, IPMI, etc.

Page 17: SAP HANA Disaster Recovery with SUSE High Availability

17

Network throughput and redundancy

• Use bonding for aggregation or redundancy (802.3ad, balance-rr, active-backup, etc.).

• Make use of 10 GbE network interfaces and 56 Gbit/s InfiniBand.

• High availability requires redundant paths through network switches, Fibre Channel switches, InfiniBand switches, etc.

• Monitor your environment.
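For bonded interfaces, the Linux bonding driver reports its state in /proc/net/bonding/<bond>. The sketch below parses that text and lists slave interfaces whose MII status is not "up". The sample input is an abbreviated, hypothetical bond0 with made-up interface names. Real files contain more fields, and the parser ignores them.

```python
def parse_bonding(text: str) -> dict:
    """Parse /proc/net/bonding/<bond> text into mode, bond MII status and per-slave MII status."""
    info = {"mode": None, "status": None, "slaves": {}}
    current_slave = None
    for raw in text.splitlines():
        key, sep, value = raw.partition(":")  # split only at the first colon
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Bonding Mode":
            info["mode"] = value
        elif key == "Slave Interface":
            current_slave = value
        elif key == "MII Status":
            # Before the first "Slave Interface" line, MII Status describes the bond itself
            if current_slave is None:
                info["status"] = value
            else:
                info["slaves"][current_slave] = value
    return info


def degraded_slaves(info: dict) -> list:
    """Slave interfaces whose link is not up."""
    return sorted(name for name, status in info["slaves"].items() if status != "up")


SAMPLE = """\
Bonding Mode: fault-tolerance (active-backup)
MII Status: up

Slave Interface: eth0
MII Status: up

Slave Interface: eth1
MII Status: down
"""

info = parse_bonding(SAMPLE)
print(info["mode"])           # fault-tolerance (active-backup)
print(degraded_slaves(info))  # ['eth1']
```

On a node, the same functions could be applied to the contents of /proc/net/bonding/bond0 and wired into whatever monitoring is already in place. Note that the bond itself still reports "up" while one slave is down, so checking only the bond status would hide the lost redundancy.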

Page 18: SAP HANA Disaster Recovery with SUSE High Availability

3) Configure NTP and timezone

Page 19: SAP HANA Disaster Recovery with SUSE High Availability

19

Configure NTP and timezone

• All nodes must be time-synchronized.

• The cluster can fail if clocks are skewed.

• Use the same timezone on all nodes, or SAP may misbehave.

• Otherwise, tracing events across the nodes' logs becomes hard.
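One way to check clock sync on each node is `ntpq -p`. In its output, the peer marked `*` is the one ntpd synchronizes to, and the offset column is in milliseconds. The sketch below parses that table. The sample output and the 100 ms threshold are assumptions for illustration, not values from the SAP or SUSE guides.

```python
def parse_ntpq_peers(text: str) -> list:
    """Return (tally, remote, offset_ms) tuples from `ntpq -p` output."""
    peers = []
    in_table = False
    for line in text.splitlines():
        if line.startswith("="):  # separator line below the column header
            in_table = True
            continue
        if not in_table or not line.strip():
            continue
        # First character is the tally code; the rest is:
        # remote refid st t when poll reach delay offset jitter
        tally, fields = line[0], line[1:].split()
        if len(fields) < 10:
            continue
        peers.append((tally, fields[0], float(fields[8])))
    return peers


def is_in_sync(peers: list, max_offset_ms: float) -> bool:
    """True if ntpd has selected a peer ('*') and its offset is within the limit."""
    for tally, _remote, offset in peers:
        if tally == "*":
            return abs(offset) <= max_offset_ms
    return False


SAMPLE = """\
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*192.0.2.10      .GPS.            1 u   35   64  377    0.512    0.083   0.021
+192.0.2.11      192.0.2.10       2 u   12   64  377    0.610   -0.120   0.033
"""

peers = parse_ntpq_peers(SAMPLE)
print(is_in_sync(peers, max_offset_ms=100.0))  # True
```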

Page 20: SAP HANA Disaster Recovery with SUSE High Availability

4) Setup disk layout

Page 21: SAP HANA Disaster Recovery with SUSE High Availability

21

Setup disk layout

Page 22: SAP HANA Disaster Recovery with SUSE High Availability

22

Data throughput and redundancy

• Put /hana/log on Fusion-IO for performance:

– 7200 RPM SATA = ~100 IOPS

– 15000 RPM SAS = ~200 IOPS

– SSD = ~20,000 IOPS

– Fusion-IO = ~140,000 IOPS

• Put /hana/data and /hana/shared on a RAID layout (RAID 1, 5, 6, etc.).
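To put the IOPS figures in perspective, the sketch below estimates how many devices of each type would be needed to match one Fusion-IO card. It assumes perfectly linear scaling and ignores RAID write penalties and caching, so treat the results as rough orders of magnitude only.

```python
import math

# Approximate per-device IOPS figures quoted on the slide
IOPS = {
    "SATA 7200 RPM": 100,
    "SAS 15000 RPM": 200,
    "SSD": 20_000,
    "Fusion-IO": 140_000,
}


def devices_needed(target_iops: int, per_device_iops: int) -> int:
    """Devices required to reach target_iops, assuming linear scaling and no RAID penalty."""
    return math.ceil(target_iops / per_device_iops)


target = IOPS["Fusion-IO"]
for name, iops in IOPS.items():
    print(f"{name}: {devices_needed(target, iops)}")
# SATA 7200 RPM: 1400, SAS 15000 RPM: 700, SSD: 7, Fusion-IO: 1
```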

Page 23: SAP HANA Disaster Recovery with SUSE High Availability

5) Check hostnames and IP addresses

Page 24: SAP HANA Disaster Recovery with SUSE High Availability

24

Check hostnames and IP addresses

• The hostname should be defined before starting the SAP HANA installation.

– SAP HANA stores this information in the sapstart service profiles.

– Changing the hostname after installation requires changes in files such as /usr/sap/<SID>/HDB<instance_number>/<hostname>/sapprofile.ini.

• Check /etc/hosts consistency.

– All nodes must know the IP-to-hostname mapping of every other node.

• Assign a virtual IP and hostname to the master node in the cluster.
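The /etc/hosts consistency check can be scripted by collecting each node's file and comparing the entries. The sketch below does the comparison only. The addresses 10.30.1.11 and 10.30.1.12 and the example.local domain are hypothetical, and collecting the files from the nodes (for example over ssh) is left out. It also assumes that the first matching line wins, which is how we understand the glibc files backend behaves.

```python
def parse_hosts(text: str) -> dict:
    """Map each hostname/alias in /etc/hosts-style text to its IP (first match wins)."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            mapping.setdefault(name, ip)
    return mapping


def check_hosts(hosts_files: dict, required: list) -> list:
    """Report names missing on a node or resolving to different IPs across nodes."""
    parsed = {node: parse_hosts(text) for node, text in hosts_files.items()}
    problems = []
    for name in required:
        ips = {}
        for node, mapping in parsed.items():
            if name in mapping:
                ips[node] = mapping[name]
            else:
                problems.append(f"{node}: {name} missing")
        if len(set(ips.values())) > 1:
            problems.append(f"{name} resolves differently: {ips}")
    return problems


GOOD = """\
127.0.0.1   localhost
10.30.1.11  hana01.example.local hana01
10.30.1.12  hana02.example.local hana02
"""
MISSING_HANA01 = """\
127.0.0.1   localhost
10.30.1.12  hana02.example.local hana02
"""

print(check_hosts({"hana01": GOOD, "hana02": GOOD}, ["hana01", "hana02"]))
# []
print(check_hosts({"hana01": GOOD, "hana02": MISSING_HANA01}, ["hana01", "hana02"]))
# ['hana02: hana01 missing']
```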

Page 25: SAP HANA Disaster Recovery with SUSE High Availability

6) Install SAP HANA Database

Page 26: SAP HANA Disaster Recovery with SUSE High Availability

26

Install SAP HANA Database

Page 27: SAP HANA Disaster Recovery with SUSE High Availability

27

Install SAP HANA Database

Page 28: SAP HANA Disaster Recovery with SUSE High Availability

28

Install SAP HANA Database

Page 29: SAP HANA Disaster Recovery with SUSE High Availability

29

Install SAP HANA Database

Page 30: SAP HANA Disaster Recovery with SUSE High Availability

30

Install SAP HANA Database

Page 31: SAP HANA Disaster Recovery with SUSE High Availability

31

Install SAP HANA Database

Page 32: SAP HANA Disaster Recovery with SUSE High Availability

32

Install SAP HANA Database

Page 33: SAP HANA Disaster Recovery with SUSE High Availability

33

Install SAP HANA Database

Page 34: SAP HANA Disaster Recovery with SUSE High Availability

34

Install SAP HANA Database

Page 35: SAP HANA Disaster Recovery with SUSE High Availability

35

Install SAP HANA Database

Page 36: SAP HANA Disaster Recovery with SUSE High Availability

36

Install SAP HANA Database

Page 37: SAP HANA Disaster Recovery with SUSE High Availability

37

Install SAP HANA Database

Page 38: SAP HANA Disaster Recovery with SUSE High Availability

7) Setup HANA

Page 39: SAP HANA Disaster Recovery with SUSE High Availability

39

Setup HANA (part I)

• Create a user for data synchronization on all nodes:

# export PATH="$PATH:/usr/sap/HDB/HDB00/exe"

# hdbsql -u system -i 00 'CREATE USER slehasync PASSWORD Password1'

# hdbsql -u system -i 00 'GRANT DATA ADMIN TO slehasync'

# hdbsql -u system -i 00 'ALTER USER slehasync DISABLE PASSWORD LIFETIME'

• Store the user's credentials in the secure user store on all nodes:

# hdbuserstore SET slehaloc localhost:30015 slehasync Password1
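The port 30015 in the hdbuserstore command follows the SAP HANA port pattern 3<instance number><service suffix>. For instance 00 this gives 30015 for SQL access to the index server and 30001 for the name server; the name server port appears in the sr_register output later. The helper below assumes a single-container system, as used here.

```python
def hana_port(instance_number: str, service_suffix: int) -> int:
    """Port 3<NN><SS>: NN = two-digit instance number, SS = two-digit service suffix."""
    nn = int(instance_number)
    if not 0 <= nn <= 99:
        raise ValueError("instance number must be between 00 and 99")
    if not 0 <= service_suffix <= 99:
        raise ValueError("service suffix must be between 00 and 99")
    return 30000 + nn * 100 + service_suffix


SQL_PORT_SUFFIX = 15        # SQL access to the index server (single-container system)
NAMESERVER_PORT_SUFFIX = 1  # name server

print(hana_port("00", SQL_PORT_SUFFIX))         # 30015
print(hana_port("00", NAMESERVER_PORT_SUFFIX))  # 30001
```

For example, an instance number of 10 would move the user store entry to port 31015.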

Page 40: SAP HANA Disaster Recovery with SUSE High Availability

40

Setup HANA (part II)

• Verify the user store key on all nodes:

# hdbuserstore list

DATA FILE : /root/.hdb/hana01/SSFS_HDB.DAT

KEY SLEHALOC

ENV : localhost:30015

USER: slehasync

• Verify that queries work without a password prompt on all nodes:

# hdbsql -U slehaloc "select * from dummy"

DUMMY

"X"

1 row selected (overall time 2733 usec; server time 115 usec)
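To automate the "on all nodes" check, the output of `hdbuserstore list` can be parsed. The sketch below assumes the output format shown on this slide. Key names are reported in upper case, so the lookup normalizes case.

```python
def parse_userstore_list(text: str) -> dict:
    """Map each KEY in `hdbuserstore list` output to its fields (env, user)."""
    keys = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("KEY "):
            current = line[4:].strip().upper()
            keys[current] = {}
        elif current is not None and ":" in line:
            field, _, value = line.partition(":")  # split only at the first colon
            keys[current][field.strip().lower()] = value.strip()
    return keys


def key_matches(keys: dict, key: str, env: str, user: str) -> bool:
    """True if the stored key points at the expected host:port and database user."""
    entry = keys.get(key.upper(), {})
    return entry.get("env") == env and entry.get("user") == user


SAMPLE = """\
DATA FILE : /root/.hdb/hana01/SSFS_HDB.DAT

KEY SLEHALOC
  ENV : localhost:30015
  USER: slehasync
"""

keys = parse_userstore_list(SAMPLE)
print(key_matches(keys, "slehaloc", "localhost:30015", "slehasync"))  # True
```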

Page 41: SAP HANA Disaster Recovery with SUSE High Availability

41

Setup HANA (part III)

• Define the primary node as user hdbadm:

hana01:/usr/sap/HDB/HDB00> hdbnsutil -sr_enable --name=SITE001

checking for active nameserver ...

nameserver is active, proceeding ...

successfully enabled system as system replication source site

done.

• Verify the node state as user hdbadm:

hana01:/usr/sap/HDB/HDB00> hdbnsutil -sr_state

checking for active or inactive nameserver ...

System Replication State

~~~~~~~~~~~~~~~~~~~~~~~~

mode: primary

site id: 1

site name: SITE001

Host Mappings:

~~~~~~~~~~~~~~

done.

Page 42: SAP HANA Disaster Recovery with SUSE High Availability

42

Setup HANA (part IV)

• Take the first data backup (required before a secondary can be registered):

hana01:~ # hdbsql -u system -i 00 "BACKUP DATA USING FILE ('backup')"

Password:

0 rows affected (overall time 46.986124 sec; server time 46.984819 sec)

• Verify the replication status (no secondary is registered yet, so no rows are returned):

hana01:~ # hdbsql -U slehaloc 'select distinct REPLICATION_STATUS from SYS.M_SERVICE_REPLICATION'

REPLICATION_STATUS

0 rows selected (overall time 1701 usec; server time 401 usec)
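The REPLICATION_STATUS query above returns one row per distinct status. The sketch below turns that result into a single verdict. The status names it uses (ACTIVE, SYNCING, INITIALIZING, UNKNOWN, ERROR) are the values we expect M_SERVICE_REPLICATION to report, so verify them against the SAP HANA documentation for your revision.

```python
def summarize_replication(statuses: list) -> str:
    """Summarize distinct REPLICATION_STATUS values from SYS.M_SERVICE_REPLICATION."""
    distinct = set(statuses)
    if not distinct:
        return "no secondary registered"  # e.g. before sr_register, as above
    if "ERROR" in distinct:
        return "error"
    if distinct == {"ACTIVE"}:
        return "in sync"
    return "not yet in sync"  # e.g. INITIALIZING, SYNCING or UNKNOWN


print(summarize_replication([]))                     # no secondary registered
print(summarize_replication(["ACTIVE"]))             # in sync
print(summarize_replication(["ACTIVE", "SYNCING"]))  # not yet in sync
```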

Page 43: SAP HANA Disaster Recovery with SUSE High Availability

43

Setup HANA (part V)

• Register the secondary node as user hdbadm (SAP HANA on hana02 must be stopped):

hana02:/usr/sap/HDB/HDB00> hdbnsutil -sr_register --remoteHost=hana01 --remoteInstance=00 --mode=sync --name=SITE002

adding site ...

checking for inactive nameserver ...

nameserver hana02:30001 not responding.

collecting information ...

updating local ini files ...

done.

• Check the secondary node status as user hdbadm:

hana02:/usr/sap/HDB/HDB00> hdbnsutil -sr_state

checking for active or inactive nameserver ...

System Replication State

~~~~~~~~~~~~~~~~~~~~~~~~

mode: sync

site id: 2

site name: SITE002

active primary site: 1

Page 44: SAP HANA Disaster Recovery with SUSE High Availability

44

Setup HANA (part VI)

• Check the primary node status as user hdbadm:

hana01:/usr/sap/HDB/HDB00> hdbnsutil -sr_state

checking for active or inactive nameserver ...

System Replication State

~~~~~~~~~~~~~~~~~~~~~~~~

mode: primary

site id: 1

site name: SITE001

Host Mappings:

~~~~~~~~~~~~~~

hana01 -> [SITE001] hana01

hana01 -> [SITE002] hana02

done.
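When scripting checks, for example before and after a takeover test, the `hdbnsutil -sr_state` output can be parsed into a dictionary. The sketch below assumes the SPS 08 output layout from these slides. Other revisions may print additional fields, which this parser ignores.

```python
import re

MAPPING_RE = re.compile(r"^(\S+)\s*->\s*\[([^\]]+)\]\s*(\S+)$")
FIELD_RE = re.compile(r"^(mode|site id|site name|active primary site):\s*(.+)$")


def parse_sr_state(text: str) -> dict:
    """Extract mode, site data and host mappings from `hdbnsutil -sr_state` output."""
    state = {"mappings": []}
    for raw in text.splitlines():
        line = raw.strip()
        mapping = MAPPING_RE.match(line)
        if mapping:
            # (local host, site name, host serving that site)
            state["mappings"].append(mapping.groups())
            continue
        field = FIELD_RE.match(line)
        if field:
            state[field.group(1)] = field.group(2).strip()
    return state


SAMPLE = """\
System Replication State
~~~~~~~~~~~~~~~~~~~~~~~~
mode: primary
site id: 1
site name: SITE001
Host Mappings:
~~~~~~~~~~~~~~
hana01 -> [SITE001] hana01
hana01 -> [SITE002] hana02
done.
"""

state = parse_sr_state(SAMPLE)
print(state["mode"], state["site name"])  # primary SITE001
print(state["mappings"])  # [('hana01', 'SITE001', 'hana01'), ('hana01', 'SITE002', 'hana02')]
```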

Page 45: SAP HANA Disaster Recovery with SUSE High Availability

8) Configure SLES HA Extension

Page 46: SAP HANA Disaster Recovery with SUSE High Availability

46

Configure SLES HA Extension

• Install the pattern “High Availability”.

• Install the package SAPHanaSR.

• Run sleha-init on the first node.

• Change /etc/corosync/corosync.conf if necessary:

– udp (multicast) vs. udpu (unicast).

– Enable a redundant ring and rrp_mode.

– Enable secure authentication (secauth).

• Run sleha-join on the second node.

• Keep STONITH disabled during configuration.
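For the corosync.conf changes above, the totem section for a unicast (udpu) setup with a redundant ring looks roughly like the text generated below. This is a sketch of the corosync 1.x syntax that we assume applies to SLES 11 SP3, with member entries inside each interface block. The network and member addresses are hypothetical. Compare the generated text against the corosync.conf written by sleha-init before using it. Also note that secauth: on requires the same /etc/corosync/authkey on all nodes.

```python
def totem_section(rings: list, secauth: bool = True, rrp_mode: str = "passive") -> str:
    """Render a corosync 1.x totem section using unicast (udpu).

    rings: list of (bindnetaddr, [member IP addresses]) tuples; ring 0 first.
    """
    if rrp_mode not in ("none", "active", "passive"):
        raise ValueError("rrp_mode must be 'none', 'active' or 'passive'")
    lines = [
        "totem {",
        "\tversion: 2",
        f"\tsecauth: {'on' if secauth else 'off'}",
        "\ttransport: udpu",
        # The redundant ring protocol only makes sense with more than one ring
        f"\trrp_mode: {rrp_mode if len(rings) > 1 else 'none'}",
    ]
    for ringnumber, (bindnetaddr, members) in enumerate(rings):
        lines += [
            "\tinterface {",
            f"\t\tringnumber: {ringnumber}",
            f"\t\tbindnetaddr: {bindnetaddr}",
            # corosync also uses mcastport - 1, so keep ring ports 2 apart
            f"\t\tmcastport: {5405 + 2 * ringnumber}",
        ]
        for addr in members:
            lines += ["\t\tmember {", f"\t\t\tmemberaddr: {addr}", "\t\t}"]
        lines.append("\t}")
    lines.append("}")
    return "\n".join(lines)


# Hypothetical heartbeat networks: ring 0 on 10.30.2.0/24, ring 1 on 10.30.3.0/24
text = totem_section([
    ("10.30.2.0", ["10.30.2.11", "10.30.2.12"]),
    ("10.30.3.0", ["10.30.3.11", "10.30.3.12"]),
])
print(text)
```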

Page 47: SAP HANA Disaster Recovery with SUSE High Availability

47

HA Configuration

• Default / global properties

property $id="cib-bootstrap-options" \

no-quorum-policy="ignore" \

stonith-action="poweroff"

rsc_defaults $id="rsc-options" \

resource-stickiness="1000" \

migration-threshold=3 \

failure-timeout=60

op_defaults $id="op-options" \

timeout="600"

Page 48: SAP HANA Disaster Recovery with SUSE High Availability

48

HA Configuration

• SAPHanaTopology:

primitive rsc_SAPHanaTopology_HDB_HDB00 ocf:suse:SAPHanaTopology \
params SID="HDB" InstanceNumber="00" \
op monitor interval="10" timeout="600" \
op start interval="0" timeout="600" \
op stop interval="0" timeout="300"

clone cln_SAPHanaTopology_HDB_HDB00 rsc_SAPHanaTopology_HDB_HDB00 \
meta is-managed="true" clone-node-max="1" interleave="true"

Page 49: SAP HANA Disaster Recovery with SUSE High Availability

49

HA Configuration

• SAPHana:

primitive rsc_SAPHana_HDB_HDB00 ocf:suse:SAPHana \

params SID="HDB" InstanceNumber="00" PREFER_SITE_TAKEOVER="yes" AUTOMATED_REGISTER="true" DUPLICATE_PRIMARY_TIMEOUT="7200" \

op start interval="0" timeout="3600" \

op stop interval="0" timeout="3600" \

op promote interval="0" timeout="3600" \

op monitor interval="60" role="Master" timeout="700" \

op monitor interval="61" role="Slave" timeout="700" \

meta target-role="Started"

ms msl_SAPHana_HDB_HDB00 rsc_SAPHana_HDB_HDB00 \

meta clone-max="2" clone-node-max="1" interleave="true"

order ord_SAPHana_HDB_HDB00 2000: cln_SAPHanaTopology_HDB_HDB00 msl_SAPHana_HDB_HDB00

Page 50: SAP HANA Disaster Recovery with SUSE High Availability

50

HA Configuration

• Virtual IP:

primitive rsc_ip_HDB_HDB00 ocf:heartbeat:IPaddr2 \

params ip="10.30.1.1" iflabel="0" \

op start interval="0" timeout="20" \

op stop interval="0" timeout="20" \

op monitor interval="10" timeout="20"

colocation col_saphana_ip_HDB_HDB00 2000: rsc_ip_HDB_HDB00:Started msl_SAPHana_HDB_HDB00:Master

Page 51: SAP HANA Disaster Recovery with SUSE High Availability

51

HA Configuration

• STONITH IPMI:

primitive stonith_ipmi_hana01 stonith:external/ipmi \
params hostname="hana01" ipaddr="172.16.1.1" userid="admin" passwd="admin" \
op monitor enabled="true" interval="300" start-delay="5" timeout="20"

location stonith_ipmi_hana01_not_on_hana01 stonith_ipmi_hana01 -inf: hana01

primitive stonith_ipmi_hana02 stonith:external/ipmi \
params hostname="hana02" ipaddr="172.16.1.2" userid="admin" passwd="admin" \
op monitor enabled="true" interval="300" start-delay="5" timeout="20"

location stonith_ipmi_hana02_not_on_hana02 stonith_ipmi_hana02 -inf: hana02
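The IPMI STONITH blocks differ only in node name and IPMI address, so they can be generated per node. The sketch below prints crm shell configuration in the same form as above. The node-to-IPMI mapping and the admin/admin credentials are the slide's example values and must be replaced with real ones.

```python
def stonith_ipmi(node: str, ipmi_addr: str, userid: str, passwd: str) -> str:
    """crm configuration for one IPMI STONITH resource that never runs on the node it fences."""
    rsc = f"stonith_ipmi_{node}"
    return "\n".join([
        f"primitive {rsc} stonith:external/ipmi \\",
        f'params hostname="{node}" ipaddr="{ipmi_addr}" userid="{userid}" passwd="{passwd}" \\',
        'op monitor enabled="true" interval="300" start-delay="5" timeout="20"',
        # -inf keeps the fencing device for a node off that same node
        f"location {rsc}_not_on_{node} {rsc} -inf: {node}",
    ])


IPMI_ADDRESSES = {"hana01": "172.16.1.1", "hana02": "172.16.1.2"}

config = "\n\n".join(
    stonith_ipmi(node, addr, "admin", "admin") for node, addr in IPMI_ADDRESSES.items()
)
print(config)
```

The output could be saved to a file and loaded with `crm configure load update <file>`. Review it with `crm configure show` before enabling STONITH.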

Page 52: SAP HANA Disaster Recovery with SUSE High Availability

52

Hawk template

• We created a custom Hawk template that includes IPMI as STONITH. It is available at http://www.ssys.com.br/susecon/tut20056/hawk-template.tar.gz.

Page 53: SAP HANA Disaster Recovery with SUSE High Availability

53

Hawk template

Page 54: SAP HANA Disaster Recovery with SUSE High Availability

54

Hawk template

Page 55: SAP HANA Disaster Recovery with SUSE High Availability

55

Hawk template

Page 56: SAP HANA Disaster Recovery with SUSE High Availability

9) Testing takeover

Page 57: SAP HANA Disaster Recovery with SUSE High Availability

57

Manual takeover

• Make the secondary the new primary, as user hdbadm:

hana02:/usr/sap/HDB/HDB00> hdbnsutil -sr_takeover

checking local nameserver ...

done.

• Verify the new state as user hdbadm:

hana02:/usr/sap/HDB/HDB00> hdbnsutil -sr_state

checking for active or inactive nameserver ...

System Replication State

~~~~~~~~~~~~~~~~~~~~~~~~

mode: primary

site id: 2

site name: SITE002

Host Mappings:

~~~~~~~~~~~~~~

hana02 -> [SITE001] hana01

hana02 -> [SITE002] hana02

done.

Page 58: SAP HANA Disaster Recovery with SUSE High Availability

58

Cluster takeover

• Set AUTOMATED_REGISTER="true".

• Pay attention to STONITH: prefer shutdown (power off) over reboot.

• Pay attention to timeouts (start, stop, migration, etc.).
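Many takeover and stress-test failures come down to timeouts. It can therefore help to list every operation timeout in the crm configuration and compare it against a floor you have chosen. In the sketch below, the minimum values are hypothetical: they reuse the SAPHana operation timeouts from the configuration above and are not a SUSE or SAP recommendation. The sample configuration is a modified copy with the start timeout lowered to 600 so that the check has something to report.

```python
import re

TIMEOUT_RE = re.compile(r'\btimeout="?(\d+)')


def op_timeouts(config: str) -> list:
    """Return (primitive, operation, timeout_seconds) for each `op` line with a timeout."""
    result = []
    primitive = None
    for raw in config.splitlines():
        line = raw.strip().rstrip("\\").strip()  # drop crm line continuations
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0] == "primitive" and len(tokens) > 1:
            primitive = tokens[1]
        elif tokens[0] == "op" and len(tokens) > 1 and primitive:
            match = TIMEOUT_RE.search(line)
            if match:
                result.append((primitive, tokens[1], int(match.group(1))))
    return result


def too_low(ops: list, minimums: dict) -> list:
    """Operations whose timeout is below the chosen minimum for that operation type."""
    return [op for op in ops if op[2] < minimums.get(op[1], 0)]


# Hypothetical floors in seconds
MINIMUMS = {"start": 3600, "stop": 3600, "promote": 3600, "monitor": 700}

CONFIG = """\
primitive rsc_SAPHana_HDB_HDB00 ocf:suse:SAPHana \\
params SID="HDB" InstanceNumber="00" \\
op start interval="0" timeout="600" \\
op stop interval="0" timeout="3600" \\
op monitor interval="60" role="Master" timeout="700"
"""

print(too_low(op_timeouts(CONFIG), MINIMUMS))
# [('rsc_SAPHana_HDB_HDB00', 'start', 600)]
```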

Page 59: SAP HANA Disaster Recovery with SUSE High Availability

10) Stress test

Page 60: SAP HANA Disaster Recovery with SUSE High Availability

60

Stress test

• Stress testing detects problems that only show up under load.

• Most of the time, they are caused by timeouts that are too low.

• HanaStress (https://github.com/Centiq/HanaStress):

hanastress.py -v --host localhost -i 00 -u SYSTEM -p P@ssword1 -g anarchy --tables 100 --rows 100000 --threads 10

(This creates 100 tables with 100,000 rows each, using 10 threads.)

Page 61: SAP HANA Disaster Recovery with SUSE High Availability

61

Cleanup after stress test

• Remove data volume fragmentation and reclaim space:

– ALTER SYSTEM RECLAIM DATAVOLUME 120 DEFRAGMENT

– ALTER SYSTEM RECLAIM LOG

• Force a savepoint, writing changed in-memory data to the data volumes:

– ALTER SYSTEM SAVEPOINT

Page 62: SAP HANA Disaster Recovery with SUSE High Availability

62

References

• https://www.suse.com/docrep/documents/wvhlogf37z/sap_hana_system_replication_on_sles_for_sap_applications.pdf

• http://scn.sap.com/docs/DOC-60318

• http://scn.sap.com/docs/DOC-60374

• http://scn.sap.com/docs/DOC-60368

Page 63: SAP HANA Disaster Recovery with SUSE High Availability

63

Thank you.

Going further

www.ssys.com.br

Page 64: SAP HANA Disaster Recovery with SUSE High Availability
Page 65: SAP HANA Disaster Recovery with SUSE High Availability

65

+49 911 740 53 0 (Worldwide)

www.suse.com

Corporate Headquarters

Maxfeldstrasse 5

90409 Nuremberg

Germany

Join us on: www.opensuse.org
