Page 1: XS Boston 2008 Fault Tolerance

Kemari: Virtual Machine Synchronization for Fault Tolerance using DomT

Yoshi Tamura NTT Cyber Space Labs.

[email protected]

2008/6/24

Page 2: XS Boston 2008 Fault Tolerance

Outline

- Our goal
- Design
- Architecture overview
- Implementation
- Evaluation
- Conclusion


(A brush-up of the Xen Summit 2007 presentation)

Page 3: XS Boston 2008 Fault Tolerance

What is Kemari?

- Kemari is a football game in which players keep a ball in the air


Don’t drop the ball!

蹴鞠 (Kemari)


Page 4: XS Boston 2008 Fault Tolerance


Our goal

Don’t drop the ball! Don’t drop the VMs!

Keep running transparently

(Diagram: Kemari virtual machine synchronization keeps the VM running across a hardware failure.)

Page 5: XS Boston 2008 Fault Tolerance


What needs to be done?

- Virtual Machine Synchronization
  - Primary VM and Secondary VM must be identical
- Detection of failure
- Failover mechanism

(Diagram: primary and secondary nodes, each a stack of Apps / Guest OS / VMM / Hardware, connected by a network and a shared SAN; VM Synchronization runs between the nodes, and a hardware failure on the primary triggers Failover to the secondary.)

Extension of existing techniques

Page 6: XS Boston 2008 Fault Tolerance


How to synchronize VMs?

- Need to make the overhead of sync smaller (a rough relation is sketched after this slide)
  - Make the sync time shorter
  - Sync VMs less often

(Diagram: timeline of the Primary and Secondary VMs; 1. pause the Primary and sync with the Secondary, 2. resume the Primary after the sync; t_sync marks the length of a sync pause and t_interval the running time between syncs.)

- Sync VMs before sending or receiving events
  - Events: storage, network, console

Only transfer updated data

Secondary must be able to continue transparently
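A rough reading of the timeline above (my own framing; the slide itself only names t_sync and t_interval): if each sync pauses the Primary for t_sync and the VM then runs for t_interval until the next sync, the fraction of time lost to synchronization is approximately

\[ \mathrm{overhead} \approx \frac{t_{\mathrm{sync}}}{t_{\mathrm{sync}} + t_{\mathrm{interval}}} \]

so making the sync time shorter and syncing less often both attack this ratio, which is what the two sub-bullets above aim for.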

Page 7: XS Boston 2008 Fault Tolerance

What happens if synced on specific intervals?

(Sequence diagram: Primary VM, Primary VMM, Secondary VMM, Secondary VM, and shared storage. Vi denotes the VM's state, Sj the storage's state.)

1. Sync VM (the Primary's state Vi is captured)
2. Update Secondary (the Secondary now holds Vi)
3. Sync completed
4. Read request
5. Reply (the Primary advances to Vi+1)
6. Write request (the storage advances from Sj to Sj+1)
7. Reply (the Primary advances to Vi+2)
8. Failover to the Secondary VM after detecting a HW failure
9. Read request from the Secondary, which is still at Vi while the storage is already at Sj+1

- The state between the VM and the storage isn't consistent
- The Secondary VM won't be able to continue transparently

Page 8: XS Boston 2008 Fault Tolerance


Sync on events from VM to storage

(Sequence diagram: Primary VM, Primary VMM, Secondary VMM, Secondary VM, and shared storage. Vi denotes the VM's state, Sj the storage's state.)

1. Read / Write request (the Primary, at Vi, issues a request toward storage at Sj)
2. Sync VM and event
3. Update Secondary (the Secondary now also holds Vi)
4. Sync completed
5. Resume Read / Write (the storage advances to Sj+1)
6. Reply (the Primary advances to Vi+1)

Resume point: just before operating on the storage.

- The Secondary will redo the same operation as the Primary
- The Secondary will receive the same reply as the Primary

(A minimal code sketch of this flow follows.)
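To make the control flow concrete, here is a minimal sketch of the sync-on-event idea in C. All names and helpers are hypothetical stand-ins, not the actual Kemari or Xen code; the real logic lives in the hypervisor and tools.

```c
#include <stdbool.h>
#include <stdio.h>

struct vm { const char *name; };

/* Stub stand-ins for the hypervisor facilities this sketch assumes. */
static void pause_vm(struct vm *v)  { printf("pause %s\n", v->name); }
static void resume_vm(struct vm *v) { printf("resume %s\n", v->name); }
static bool sync_to_secondary(struct vm *v)   /* dirty pages + vcpu + event */
{ printf("sync %s -> secondary\n", v->name); return true; }
static void deliver_event(struct vm *v, int port)
{ (void)v; printf("deliver request on port %d\n", port); }

/* Hook run just before an outbound storage event (a read/write request)
 * from DomT is handed to the Dom0 back-end. */
static void on_outbound_storage_event(struct vm *v, int port)
{
    pause_vm(v);                 /* steps 1-2: freeze the Primary, sync VM and event */
    if (sync_to_secondary(v))    /* steps 3-4: Secondary updated, sync completed     */
        deliver_event(v, port);  /* step 5: resume the read/write toward storage     */
    resume_vm(v);                /* step 6: the reply then flows back as usual       */
}

int main(void)
{
    struct vm domt = { "DomT" };
    on_outbound_storage_event(&domt, 8);   /* port number is arbitrary */
    return 0;
}
```

The key property is that the Secondary is updated before the request reaches the storage, so after a failover it simply redoes the same operation and receives the same reply.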

Page 9: XS Boston 2008 Fault Tolerance

Demo…


Hardware failure

- A guest VM is running on Kemari
- The guest runs a VNC server, which is accessed via a VNC client
- xclock is launched from the client
- See what happens to the clock when the primary physical server is shut down from the HP iLO2 management console


Page 10: XS Boston 2008 Fault Tolerance

Architecture overview

(Diagram: primary and secondary nodes, each running Xen on hardware with Dom0 and DomT and the back-end/front-end split drivers; Kemari components in Xen and Dom0 sync DomT from the primary to the secondary over the network, and both nodes share a SAN.)

- The core of the synchronization mechanism resides in the hypervisor to synchronize virtual machines efficiently
- LOC ≅ 3000 (hypervisor: 1000, Dom0 + tools: 2000)

Page 11: XS Boston 2008 Fault Tolerance

What is DomT?

- Para-virtualized domain which uses shadow page tables (auto-translated mode)
- No need to translate the page tables when transferring (see the sketch after this slide)
- The DomT patch set for xen-3.0.4 was written by Michael A. Fetterman from the University of Cambridge

(Diagram: in an ordinary Domain U, the guest page tables map directly to mfns; in Domain T, the guest page tables hold pfns and the hypervisor's shadow page tables translate pfn to mfn.)
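A toy illustration of why auto-translated mode matters for the transfer (simplified types of my own, not Xen source): DomT's page tables store pfns, which mean the same thing on both hosts, while the mfn mapping lives only in the hypervisor's shadow page tables and is rebuilt locally on the secondary.

```c
#include <stdint.h>
#include <stdio.h>

typedef uint64_t pfn_t;   /* guest pseudo-physical frame number (host-independent) */
typedef uint64_t mfn_t;   /* machine frame number (differs between hosts)          */

/* toy physical-to-machine table standing in for this host's mapping */
static const mfn_t p2m[4] = { 0x9000, 0x9004, 0x90a2, 0x90a3 };

/* what the shadow page-table code does conceptually on this host */
static mfn_t shadow_translate(pfn_t pfn) { return p2m[pfn]; }

int main(void)
{
    pfn_t pte = 2;   /* value as stored in DomT's own page table */
    /* The same pte value can be copied verbatim to the secondary; only the
     * shadow translation below is host-specific. */
    printf("pfn %llu maps to mfn 0x%llx on this host\n",
           (unsigned long long)pte, (unsigned long long)shadow_translate(pte));
    return 0;
}
```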


Page 12: XS Boston 2008 Fault Tolerance


Implementation of Kemari

- Event Channel tapping
- Transferring DomT
- Restoring para-virtualized devices

Page 13: XS Boston 2008 Fault Tolerance


Event Channel tapping

- Simple, but the key component of Kemari
- Monitors IN, OUT, or both directions
- A registered function is called on specific events (a hypothetical interface is sketched after this slide)
- Dynamically attachable
- May be useful for measurements

(Diagram: the event channel between DomT's front-end and Dom0's back-end passes through ECS_TAP, which calls into the Kemari module; the numbered arrows 1-5 trace an event from the front-end through the tap to the back-end.)
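A hypothetical sketch of what such a tap interface could look like; the names (ecs_tap_register, ECS_TAP_OUT, ...) and the tiny dispatcher are my assumptions for illustration, not the actual Kemari patch.

```c
#include <stdio.h>

enum ecs_tap_dir { ECS_TAP_IN = 1, ECS_TAP_OUT = 2, ECS_TAP_BOTH = 3 };

/* callback invoked when a tapped event fires on the given port */
typedef void (*ecs_tap_fn)(int port, enum ecs_tap_dir dir, void *data);

struct ecs_tap {
    int port;
    enum ecs_tap_dir dir;
    ecs_tap_fn fn;
    void *data;
};

/* toy registry standing in for the hypervisor's tap table */
static struct ecs_tap taps[8];
static int ntaps;

/* dynamically attach a tap: the callback runs on matching events */
static int ecs_tap_register(int port, enum ecs_tap_dir dir,
                            ecs_tap_fn fn, void *data)
{
    if (ntaps >= (int)(sizeof(taps) / sizeof(taps[0])))
        return -1;
    taps[ntaps] = (struct ecs_tap){ port, dir, fn, data };
    return ntaps++;
}

/* called by the (toy) event-channel code whenever an event crosses the tap */
static void ecs_tap_dispatch(int port, enum ecs_tap_dir dir)
{
    for (int i = 0; i < ntaps; i++)
        if (taps[i].port == port && (taps[i].dir & dir))
            taps[i].fn(port, dir, taps[i].data);
}

static void kemari_sync_hook(int port, enum ecs_tap_dir dir, void *data)
{
    (void)data;
    printf("sync before delivering %s event on port %d\n",
           dir == ECS_TAP_OUT ? "outbound" : "inbound", port);
}

int main(void)
{
    ecs_tap_register(8, ECS_TAP_OUT, kemari_sync_hook, NULL);
    ecs_tap_dispatch(8, ECS_TAP_OUT);   /* front-end raised a request */
    return 0;
}
```

A hook like this can also be attached temporarily just to count or time events, which would be one way to use the tap "for measurements".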

Page 14: XS Boston 2008 Fault Tolerance


Transferring DomT

1. Pause DomT and lock the grant tables; no need to suspend!
   - Grant tables are mapped at the last 4 pages of the DomT region
2. Extract the dirtied pfns from the bitmap, copy the pfns and the vcpu to the shared buffer, and notify the tools via an event channel (sketched in code after this slide)
3. Map the dirtied pages and transfer the pages and vcpu to the secondary
4. The secondary prepares temporary buffers so it can roll back if a failure is detected during the transfer

(Diagram: on the primary, the VMM extracts dirty pfns from the log-dirty bitmap of the DomT region (which also holds the grant table) and copies them, together with the VCPU state, into a buffer shared between the VMM and the tools; the tools map and transfer them to a temporary buffer on the secondary. Steps 1-2 run in the VMM, steps 3-4 in the tools.)
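A minimal sketch of step 2 above, with toy sizes and hypothetical names (the real code uses Xen's own log-dirty bitmap and shared-buffer structures): walk the bitmap, record each dirtied pfn and the vcpu state in the buffer shared with the tools, then notify the tools over an event channel.

```c
#include <stdint.h>
#include <string.h>
#include <stdio.h>

#define NR_PFNS 64                       /* toy domain size in pages */
#define BITS    (sizeof(unsigned long) * 8)

struct vcpu_state { uint64_t rip, rsp; /* ... */ };

struct shared_buf {                      /* buffer visible to VMM and tools */
    uint32_t nr_dirty;
    uint64_t dirty_pfns[NR_PFNS];
    struct vcpu_state vcpu;
};

static unsigned long log_dirty[NR_PFNS / BITS + 1];

static void notify_tools(void) { puts("event channel -> tools"); }

static void kemari_collect_dirty(struct shared_buf *buf,
                                 const struct vcpu_state *vcpu)
{
    buf->nr_dirty = 0;
    for (uint64_t pfn = 0; pfn < NR_PFNS; pfn++) {
        if (log_dirty[pfn / BITS] & (1UL << (pfn % BITS)))
            buf->dirty_pfns[buf->nr_dirty++] = pfn;   /* record dirtied pfn */
    }
    memset(log_dirty, 0, sizeof(log_dirty));          /* start a new epoch  */
    buf->vcpu = *vcpu;                                /* snapshot the vcpu  */
    notify_tools();                  /* tools map and send the listed pages */
}

int main(void)
{
    struct shared_buf buf;
    struct vcpu_state vcpu = { .rip = 0xffffffff81000000ULL, .rsp = 0 };
    log_dirty[0] = 0x5;                               /* pfns 0 and 2 dirty */
    kemari_collect_dirty(&buf, &vcpu);
    printf("%u dirty pages queued\n", buf.nr_dirty);
    return 0;
}
```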

Page 15: XS Boston 2008 Fault Tolerance


Restoring para-virtualized devices

1. The Device Channel is stored in the DomT region
2. Attach the Back-end to the Device Channel using the BACK_RING_ATTACH macro (a sketch of such a macro follows this slide)
3. Adjust the producer and consumer indexes of the Back-end appropriately

(Diagram: DomT's front-end and Dom0's back-end share the Device Channel ring, each side with its request/response producer and consumer indexes.)
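BACK_RING_ATTACH here refers to the Kemari work, so I cannot quote its definition; the following is a sketch of what such a macro plausibly does next to the standard BACK_RING_INIT pattern from Xen's io/ring.h. The structures are simplified stand-ins and the ATTACH body is my assumption: instead of zeroing the back-end's private indexes, it picks them up from the counters already in the shared ring, and step 3 above then adjusts them so outstanding requests are redone.

```c
#include <stdint.h>

struct sring {                   /* simplified stand-in for a shared ring page */
    uint32_t req_prod, req_event;
    uint32_t rsp_prod, rsp_event;
    /* ... ring entries follow in the real layout ... */
};

struct back_ring {               /* back-end's private view of the ring */
    uint32_t rsp_prod_pvt;       /* responses this back-end has produced */
    uint32_t req_cons;           /* requests this back-end has consumed  */
    uint32_t nr_ents;
    struct sring *sring;
};

/* fresh connection: both private indexes start at zero */
#define BACK_RING_INIT(r, s, n) do {        \
    (r)->rsp_prod_pvt = 0;                  \
    (r)->req_cons = 0;                      \
    (r)->nr_ents = (n);                     \
    (r)->sring = (s);                       \
} while (0)

/* re-attach to a live ring found in the restored DomT region (sketch):
 * resume from the shared counters instead of resetting them */
#define BACK_RING_ATTACH(r, s, n) do {      \
    (r)->rsp_prod_pvt = (s)->rsp_prod;      \
    (r)->req_cons = (s)->rsp_prod;          \
    (r)->nr_ents = (n);                     \
    (r)->sring = (s);                       \
} while (0)

int main(void)
{
    struct sring shared = { .req_prod = 7, .rsp_prod = 5 };
    struct back_ring back;

    BACK_RING_ATTACH(&back, &shared, 32);
    /* back.req_cons == 5: requests 5 and 6 are still outstanding, so the
     * re-attached back-end will (re)process them, matching the "Secondary
     * will redo the same operation" behaviour described earlier. */
    return (int)(shared.req_prod - back.req_cons);   /* 2 requests in flight */
}
```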

Page 16: XS Boston 2008 Fault Tolerance


Evaluation

- Evaluation items
  - Performance of the Primary VM (network and file I/O) using netperf and iozone
- Test machines
  - Hardware spec
    - CPU: Intel Xeon 3 GHz × 2
    - Memory: 4 GB
    - Network: Gigabit Ethernet, InfiniBand
    - SAN: FC disk array
  - VM spec
    - VMM: Xen 3.0.4 with DomT support
    - Guest OS: Debian Etch
    - Memory: 512 MB

Page 17: XS Boston 2008 Fault Tolerance


Performance of Primary VM

- InfiniBand boosted the performance of Network and Buffered + fsync, both of which dirty many pages
- All benchmarks continued transparently when the primary server was shut down from the HP iLO 2 management console

(Charts: throughput of the O_SYNC write test [MB/sec], the network test [Mb/sec], and the Buffered + fsync test [MB/sec], comparing plain DomT, Kemari over Ethernet, and Kemari over InfiniBand.)

Page 18: XS Boston 2008 Fault Tolerance

Conclusion

- Kemari is a virtual machine synchronization mechanism to achieve fault tolerance
- Don't drop the ball! Don't drop the VMs!
- Implemented Kemari using Xen and DomT
  - Thanks to Michael from the University of Cambridge
- Demonstrated that Kemari achieves acceptable performance


Page 19: XS Boston 2008 Fault Tolerance

Future work

- Demonstrate the range of applications Kemari can run transparently
- Improve the performance of I/O-intensive applications that send large numbers of events
- Host HVM domains with PV drivers
- Host multiple domains simultaneously
- Implement functions needed for practical use, such as HW failure detection and a failover mechanism
