Ocean Observatories Initiative

OOI Cyberinfrastructure Life Cycle Objectives Review, January 8-9, 2013

R3 Life Cycle Objective Review for Common Execution Infrastructure (CEI) Subsystem

Kate Keahey, David LaBissoniere, Patrick Armstrong, Pierre Riteau

Subsystem Purpose

• Allow OOI applications and system to
  – Provide Highly Available (HA) services
  – Scale to demand
• Enact OOI deployment policies in elastic environment
• Provide a deployment foundation for OOI CI


Overview

• CEI Overview

• R3 Scope

• Cloud Provider Options

• Risks

• Elaboration Plan


Resources for HA and Scaling

EPU Management: monitor and regulate set properties based on system-specific and application-specific metrics

– Cloud resources are available on-demand, but any particular resource may fail at any time
– Applications/processes can absorb new resources
– Applications/processes can tolerate failures

[Diagram label: EPU]
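As a rough illustration of "monitor and regulate", the sketch below plans one pass of a control loop that replaces failed instances and resizes an EPU from a queue-length metric. The `EpuState` class, the `plan_actions` function, and the sizing policy (`jobs_per_instance`, `min_instances`, `max_instances`) are assumptions made for this example, not the CEI implementation.

```python
from dataclasses import dataclass, field


@dataclass
class EpuState:
    """Hypothetical snapshot of one EPU: instance ids by health, plus a load metric."""
    healthy: set = field(default_factory=set)
    failed: set = field(default_factory=set)
    queue_length: int = 0


def plan_actions(state, min_instances=1, max_instances=10, jobs_per_instance=5):
    """Return (instances_to_launch, ids_to_terminate) for one control-loop pass.

    Illustrative policy: size the EPU so that each instance serves at most
    jobs_per_instance queued jobs, clamped to [min_instances, max_instances].
    """
    # Ceiling division: how many instances the current queue calls for.
    wanted = -(-state.queue_length // jobs_per_instance)
    target = max(min_instances, min(max_instances, wanted))

    # Failed instances are always cleaned up; they do not count as capacity.
    terminate = sorted(state.failed)

    surplus = len(state.healthy) - target
    if surplus > 0:
        # Scale down: retire the surplus healthy instances.
        terminate += sorted(state.healthy)[:surplus]
        launch = 0
    else:
        # Scale up, or replace capacity lost to failures.
        launch = -surplus
    return launch, terminate
```

With two healthy instances, one failed instance, and 20 queued jobs, this policy asks for two new instances and terminates the failed one. The slide's two assumptions make that safe: the application can absorb new resources, and it can tolerate the failure in the meantime.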


Elastic Processing Unit (EPU) Management

[Diagram labels: EPU Management; Decision Engine; Provisioner; DTRS; CB; IaaS; create instance; EE ioncore 1.3; EE ioncore 1.2; EE matlab 6.1; context-agent; ou-agent. Legend: AMQP, Other.]


Making the EPU HA

[Diagram labels: Bootstrap EPU; Dedicated DE; Provisioner/DTRS; IaaS; create instance; cloudinit.d; EPU Worker (repeated); ou-agent (repeated). Legend: AMQP, Other.]


Managing Processes: Creating a Process I

[Diagram labels: Process Definition Registry; Process Dispatcher; Process Instance Registry; EE type A instance; ee-agent; Decision Engine. Arrow labels: request to activate process X; lookup; launch; enter. Legend: AMQP, Other.]


Managing Processes: Creating a Process II

[Diagram labels: Process Definition Registry; Process Dispatcher; Process Instance Registry; EE type A instance; ee-agent; Decision Engine; EPU Management; Provisioner/DTRS; IaaS. Arrow labels: request to activate process X; lookup; launch; enter; request instance; create instance. Legend: AMQP, Other.]
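One possible reading of the two "Creating a Process" diagrams is sketched below. A request to activate process X triggers a lookup of its definition, then a launch on an execution engine of the matching type, then an entry in the instance registry. If no engine has room, a new instance is requested first. The class, its method names, the slot-counting model, and the order of steps are assumptions made for illustration, not the CEI code.

```python
import itertools


class ProcessDispatcher:
    """Toy stand-in for the Process Dispatcher; every name here is illustrative."""

    def __init__(self, definitions, request_instance, slots_per_engine=4):
        self.definitions = definitions            # stands in for the Process Definition Registry
        self.instances = {}                       # stands in for the Process Instance Registry
        self.engines = {}                         # engine id -> (ee_type, free slots)
        self.request_instance = request_instance  # stands in for EPU Management
        self.slots_per_engine = slots_per_engine
        self._ids = itertools.count(1)

    def _find_engine(self, ee_type):
        """Return the id of an engine of this type with a free slot, or None."""
        for engine_id, (kind, free) in self.engines.items():
            if kind == ee_type and free > 0:
                return engine_id
        return None

    def activate(self, name):
        """Handle a 'request to activate process X' and return the new process id."""
        ee_type = self.definitions[name]["ee_type"]        # "lookup"
        engine_id = self._find_engine(ee_type)
        if engine_id is None:
            # "request instance": EPU Management is assumed to return the id
            # of a newly created execution engine of the requested type.
            engine_id = self.request_instance(ee_type)
            self.engines[engine_id] = (ee_type, self.slots_per_engine)
        kind, free = self.engines[engine_id]
        self.engines[engine_id] = (kind, free - 1)         # "launch" (via the ee-agent)
        process_id = f"{name}-{next(self._ids)}"
        self.instances[process_id] = engine_id             # "enter"
        return process_id
```

The first diagram corresponds to the path where `_find_engine` succeeds. The second adds the branch through EPU Management, the Provisioner/DTRS, and the IaaS, which this sketch collapses into the single `request_instance` callback.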


Managing Processes: Inside an Execution Engine

[Diagram labels: EE type A instance; CC instance (repeated); context-agent; ee-agent; ou-agent; supervisord (repeated); Kepler; process (adapter) 1; datastream subscription result; Process Dispatcher; EPU Management; Package Server. Operation markers on links: C, M, CMR, CMK, CMKO. Legend: AMQP, Other.]

Operations: C – create, M – monitor, R – restart, K – kill, O – I/O


Adventures in Availability

• Time to repair (TTR)
  – Diagnosis
  – Time to scale (TTS)
    • PENDING (request)
    • STARTED (deployment)
    • RUNNING (contextualization)

$$A = \frac{\mathrm{MTBF}}{\mathrm{MTBF} + \mathrm{MTTR}}$$

where MTBF is the mean time between failures and MTTR is the mean time to repair.

[Figure: TTS, preliminary results for 2,000 VMs provisioned on AWS EC2]
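To make the availability formula concrete, the sketch below computes A from MTBF and MTTR. Following the slide's breakdown, it treats time to repair as diagnosis time plus time to scale. The function names and all numeric inputs are illustrative assumptions, not measurements from the review.

```python
HOURS_PER_YEAR = 8760  # non-leap year, used only for the downtime estimate


def time_to_repair(diagnosis_hours, time_to_scale_hours):
    """Repair time modeled as diagnosis plus time to scale (an assumed decomposition)."""
    return diagnosis_hours + time_to_scale_hours


def availability(mtbf_hours, mttr_hours):
    """A = MTBF / (MTBF + MTTR)."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def expected_downtime_hours(a, period_hours=HOURS_PER_YEAR):
    """Expected unavailable time over a period at availability a."""
    return (1.0 - a) * period_hours


# Hypothetical inputs: a failure every 1,000 hours on average,
# 0.5 hours to diagnose, and 0.5 hours for a replacement VM to reach RUNNING.
mttr = time_to_repair(0.5, 0.5)
a = availability(1000, mttr)
print(f"availability: {a:.6f}")
print(f"expected downtime per year: {expected_downtime_hours(a):.2f} h")
```

With these made-up numbers, A is roughly 0.999. Because MTBF is largely outside the system's control on cloud resources, shortening diagnosis and time to scale is the lever that raises A.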


CEI R3 Proposed Scope

• Robustness
  – Upgrade mechanisms, maintainability code refactor, more unit tests, scale and stress testing, documentation, packaging, support, etc.
• Integration
  – Component interaction update, tight inter-component integration
• New features
  – Process and resource management
    • Process activation and validation
    • New execution site registration
  – Integration with National Infrastructure
    • Framework for integration of academic cloud providers such as XSEDE and OSG
  – Support for a new cloud provider
  – SLA management


Cloud Provider Options

• Windows Azure
  – Initially PaaS, now offers IaaS with Windows & Linux
  – Pros: 8 regions in North America, Europe, and Asia
  – Cons: no libcloud support, still in preview mode (no SLA yet)
• Rackspace
  – Pros: based on OpenStack, libcloud support
  – Cons: only 3 regions with 2 in the USA
• Google Compute Engine
  – Pros: targets high performance/throughput clusters, advertises 50% more CPU per $ compared to EC2
  – Cons: still in limited preview, no libcloud support, few regions for now


Risks

• Scope
  – Mitigation: scope prioritization with the architects
• Handoff process


R3 Elaboration 1

• Theme: focus on support and high-risk elements
• Support activities (includes R2 Transition and R2.1 features)
• Assist with Kepler integration
• Design and prototype process package download and installation scheme (with COI)
• Initial prototype of Chef server integration (upgrades)
• Integration: eliminate resource registry mirroring in Process Dispatcher


R3 Elaboration 2

• Theme: continue to support existing deployments and fix issues while emphasizing integration.

• Continued support activities

• Assist with Kepler Integration

• Integration: pyon capabilities in EPUM, DTRS, and Provisioner

• New execution site registration


CEI R3 Team

• CEI Developer: Patrick Armstrong, University of Chicago (location: Victoria, Canada)
• CEI Developer: Pierre Riteau, University of Chicago (location: Oxford, England) (part-time)
• CEI Senior Developer: David LaBissoniere, University of Chicago (location: Chicago, IL)
• CEI Designer: Kate Keahey, Argonne National Lab, University of Chicago


Questions?

