The Open Cloud FPGA Testbed Supporting Experiments on Emerging Datacenter Configurations* Martin Herbordt Miriam Leeser * Funded by the National Science Foundation through the Computer Community Research Infrastructure CCRI Grand Program
The Open Cloud FPGA Testbed · 2020-05-17 · MOC: Massachusetts Open Cloud • Funded by Commonwealth, Industry partners and universities • Thousands users, many thousands of users

Jul 24, 2020

The Open Cloud FPGA Testbed – Supporting Experiments on Emerging Datacenter Configurations*

Martin Herbordt, Miriam Leeser

* Funded by the National Science Foundation through the Computer Community Research Infrastructure (CCRI) Grand Program

Motivation & Overview

Motivation (1/3) – Millions of FPGAs in the Cloud for provider use (e.g., Microsoft Catapult)

Provider system uses

• SDN

• Instrumentation and Metering

Provider internal applications

• Compression

• Encryption

Provider external applications

• Security and Privacy

• Machine Learning

• Other big-data analytics

Motivation (2/3) – FPGAs Everywhere in the Datacenter: Various Observations

Motivation (3/3) – Potential of Millions of FPGAs in Datacenters for HPC (e.g., Intel COPA)

The Open Cloud Testbed

• Funded by National Science Foundation CCRI Grand Program

– Computer Community Research Infrastructure

• Building on Existing Infrastructure:

– MGHPCC: Massachusetts Green High Performance Computing Center

– MOC: Massachusetts Open Cloud

– OpenCloudLab

• What’s new:

– FPGAs for the user community

• Collaboration among UMass Amherst, Boston University, and Northeastern University

OCFT Context & Ecosystem

Core Team

MGHPCC: Massachusetts Green High Performance Computing Center

Mass Open Cloud

MOC: Massachusetts Open Cloud

• Funded by Commonwealth, Industry partners and universities

• Thousands of users, many thousands of users of services

• New Harvard/BU research IT plan to create a production service:

– consistent infrastructure, operations team, research facilitators, buy-in model

• Connection to NSF NESE (20+PB), NSF NE Cyberteam, Harvard Dataverse

• Sustainability through:

– integration with research IT and support for end-users

– industry support for cloud: interoperability lab, exposing new innovation, visibility into usage

– extensive experience upstreaming with large industry-driven open source communities

• Support smaller institutions: new MTC proposal & NE Cyberteam

• Used by regional “friends and family” CISE researchers: cybersecurity (MACS), systems, data science …

What is Massachusetts Open Cloud (MOC)?

MOC supports

– real users

– access to real data sets

– can provide traces of real usage

– can allow services to be exposed to end-users (e.g., TTP)

– has access to production services at scale (e.g., NESE)

– infrastructure and services provided by industry partners

• Scientific infrastructure for cloud research

• Three clusters (Utah, Wisconsin, and Clemson), which offer 15,000 cores

– Each cluster has a different focus: storage and networking (using hardware from Cisco, Seagate, and HP), high-memory computing (Dell), and energy-efficient computing (HP).

• Designed specifically for reproducible research

• Hard isolation to create many parallel “slices”

What is CloudLab?

Open CloudLab Concept

Research "in" the MOC

[Figure: diagram with labels "logs/usage data", "Cloud Users", "NESE", "Cloud Researchers", "MOC production cloud", "ESI", "NERC"]

The Open Cloud FPGA Testbed

The Open Cloud FPGA Testbed - OCFT

Tag line: An MOC-style Catapult testbed and so much more

• Enhanced with programmable hardware (FPGA) capabilities not present in other facilities available to researchers today

Current FPGAs in the Datacenter: why do we need the OCFT?

• Microsoft Catapult

– No user access

• AWS F1 instances (and Baidu, Chameleon, TACC, etc.)

– Available to users as accelerators, but interactions are restricted

• Various FPGA-centric clusters (BU, Paderborn, Riken, TACC, Tsukuba)

– Very difficult to bring on line, even for a single institution

– Even more difficult to maintain

– HPC-specific rather than general datacenter

OCFT for FPGAs in the Datacenter

Why OCFT will work

Funding for FPGA-specific system management and customer service

• FTE FPGA engineer

Integration into existing cloud ecosystem

Broader community will be pitching in

• Industry partners, advisory board, beta users

Sample Projects

How OCFT will be used – Sample Projects

• Hardware operating system (on the FPGAs)

– Drivers, multitenancy, handling “pass-through” system communication

• Development environment

– Enable access and programming by system and application developers

• System applications

– Compression, security, privacy-preserving computation

• User applications – in the node to across the datacenter

– Middleware offload – MPI

– Application-aware I/O support through lossy compression

– Massively parallel applications – large-scale physical simulations

– Distributed machine learning

Applications: Privacy

• Garbled Circuits in an FPGA cloud. Supported by NSF SaTC grant:

– Massively Scalable Secure Computation Infrastructure Using FPGAs

– In collaboration with Stratis Ioannidis, Northeastern University

• Provide privacy guarantees:

– The analyst learns only […] and nothing else.

• Good match for FPGAs

• Currently targeting AWS F1 instances

Yao’s Garbled Circuits: Protocol Overview

[Figure: protocol diagram. Parties: Garbler, Evaluator, Input Owners. Labels: Garble, Transmit, Evaluate, Proxy Oblivious Transfer, Private Inputs, Keys, Garbled Circuit. Phases: Phase I, Phase II, Phase III.]
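
The garble and evaluate steps named in the protocol overview can be illustrated with a single AND gate. The following is a minimal textbook-style sketch in Python, not the project's FPGA implementation. The function names (`garble_and_gate`, `evaluate_gate`), the 16-byte label size, the zero-byte tag, and the use of SHA-256 as the row cipher are all assumptions made for illustration. Oblivious transfer of the input labels is omitted entirely.

```python
import hashlib
import secrets

LABEL_BYTES = 16                 # assumed label size, chosen for illustration
TAG = b"\x00" * 16               # zero tag lets the evaluator spot the right row


def _pad(label_a: bytes, label_b: bytes) -> bytes:
    """Derive a 32-byte one-time pad from the two input wire labels."""
    return hashlib.sha256(label_a + label_b).digest()


def _xor(x: bytes, y: bytes) -> bytes:
    return bytes(p ^ q for p, q in zip(x, y))


def garble_and_gate():
    """Garbler side: pick random labels and build the encrypted truth table."""
    # labels[wire][bit] is the random label that stands for `bit` on `wire`
    labels = {
        wire: (secrets.token_bytes(LABEL_BYTES), secrets.token_bytes(LABEL_BYTES))
        for wire in ("a", "b", "out")
    }
    table = []
    for bit_a in (0, 1):
        for bit_b in (0, 1):
            out_label = labels["out"][bit_a & bit_b]
            pad = _pad(labels["a"][bit_a], labels["b"][bit_b])
            table.append(_xor(pad, out_label + TAG))
    # Shuffle so that row position reveals nothing about the input bits
    secrets.SystemRandom().shuffle(table)
    return labels, table


def evaluate_gate(table, label_a: bytes, label_b: bytes) -> bytes:
    """Evaluator side: holds one label per input wire, recovers one output label."""
    pad = _pad(label_a, label_b)
    for row in table:
        plain = _xor(pad, row)
        if plain[LABEL_BYTES:] == TAG:   # only the matching row decrypts cleanly
            return plain[:LABEL_BYTES]
    raise ValueError("no row decrypted; labels do not belong to this gate")


if __name__ == "__main__":
    labels, table = garble_and_gate()
    out = evaluate_gate(table, labels["a"][1], labels["b"][1])
    print(out == labels["out"][1])
```

The evaluator sees only random-looking labels, so it learns which output label it obtained but not the input bits behind it. Mapping the final output label back to a bit is a separate step that the garbler controls.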

Survey Results

Email us if you would like to participate

OCFT – Survey Results

• Alpha cohort – Herbordt & Leeser research groups, Red Hat

• Beta cohort – Established FPGA/Cloud/HPC research groups. Survey is for Beta cohort.

• Gamma cohort – broader community with certain attributes, particularly the experience to be able to use this rather than other infrastructure.

Initial list of potential users by affiliation

Universities Replies

• Boston University 2

• Brown

• BYU

• Cornell

• CMU x

• MIT x

• NCSU x

• Northeastern

• Penn

• Stevens

• Tufts x

• U. Arkansas x

• U. Alabama x

• UCSD

• U. Florida x

• U. Miami of Ohio x

• U. Massachusetts x

Universities, cont. Replies

• UNCC

• U. Pittsburgh

• U. Tennessee

• Worcester Polytechnic

• Wash. U. St. Louis x

• W. Michigan

• Yale

National Labs Replies

• Argonne x

• Lawrence Berkeley

• Pacific Northwest x

Industry Replies

• AlgoLogic

• Atomic Rules x

• Comma Corp x

• Gray Research LLC

• Red Hat x

Beta user configuration priority

We can’t support everything … What should be the priorities?

Configuration                                     First Choice   Total
FE1: Catapult2-like – Bump-in-the-Wire            10             10
FE2: Programmable NIC                             2              2
FE3: FPGA is the node                             0              0
BE1: Bare-metal back-end processor                1              2
BE2: Tightly coupled back-end processor (CCIX)    2              4
BE3: Cluster of directly connected FPGAs          2              5

Beta user project types

What will you use the OCFT for?

Project Type                  First Choice   Total
Cloud and Operating System    6              6
Middleware                    2              5
FPGA systems                  3              4
FPGA tools                    3              6
Provider applications         1              3
Tenant applications           2              3
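
The two survey tallies (configuration priority and project type) can be cross-checked and ranked with a few lines of Python. This is only an illustrative sketch: the numbers are copied from the slides, while the tuple layout, the helper names `first_choice_total` and `rank`, and the tie-breaking rule (first-choice votes, then total votes) are assumptions rather than anything the survey specifies.

```python
# (label, first_choice_votes, total_votes), copied from the two survey slides
CONFIG_PRIORITY = [
    ("FE1: Catapult2-like, bump-in-the-wire", 10, 10),
    ("FE2: Programmable NIC", 2, 2),
    ("FE3: FPGA is the node", 0, 0),
    ("BE1: Bare-metal back-end processor", 1, 2),
    ("BE2: Tightly coupled back-end processor (CCIX)", 2, 4),
    ("BE3: Cluster of directly connected FPGAs", 2, 5),
]

PROJECT_TYPES = [
    ("Cloud and Operating System", 6, 6),
    ("Middleware", 2, 5),
    ("FPGA systems", 3, 4),
    ("FPGA tools", 3, 6),
    ("Provider applications", 1, 3),
    ("Tenant applications", 2, 3),
]


def first_choice_total(rows):
    """Sum the first-choice column; one first choice per respondent is assumed."""
    return sum(first for _, first, _ in rows)


def rank(rows):
    """Order rows by first-choice votes, breaking ties by total votes (assumed rule)."""
    return sorted(rows, key=lambda row: (row[1], row[2]), reverse=True)


if __name__ == "__main__":
    for name, rows in (("Configurations", CONFIG_PRIORITY),
                       ("Project types", PROJECT_TYPES)):
        print(name, "first-choice votes:", first_choice_total(rows))
        for label, first, total in rank(rows):
            print(f"  {label}: first={first}, total={total}")
```

In both tables the first-choice column sums to 17, which agrees with the 17 replies reported on the Miscellaneous slide.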

Miscellaneous

Enthusiasm for OCFT (17 replies)

13/17 gave, as part of their answer, some variation of “very interested”

4/17 gave practical responses about what they would do with OCFT

Tools preference (17 replies)

Intel – 11, Xilinx – 12, Generic – 1

Both or would switch – 12/17

HBM? (17 replies)

Yes = 8, No = 1, “Nice but” = 2, No reply re HBM = 6

What board? (17 replies)

No reply = 2, No preference = 7, Xilinx = 2, Intel = 4, Both = 2

FPGA Options

Xilinx Alveo Cards for data centers: https://www.xilinx.com/products/boards-and-kits/alveo/u280.html#specifications

Intel D5005: https://www.intel.com/content/www/us/en/programmable/products/boards_and_kits/dev-kits/altera/intel-fpga-pac-d5005/overview.html

Advantages and Disadvantages

• Xilinx Alveo U280

– Advantage: High Bandwidth Memory (HBM)

– Disadvantage: only 2 QSFP28 connections

– Programming: Xilinx Vitis tool

• Intel D5005

– Advantage: 4 QSFP28 connections

– Disadvantage: no HBM

– Programming: Intel oneAPI

Responsive to User Community

We want to build what you, our users, want

Discussion, comments, questions, …