Cloud Computing: What a Project Manager Needs to Know Dr. Patrick D. Allen, PMP [email protected]

Cloud Computing: What a Project Manager Needs to Know

Feb 25, 2016


Transcript
Page 1: Cloud Computing:   What a Project Manager Needs to Know

Cloud Computing: What a Project Manager Needs to Know

Dr. Patrick D. Allen, [email protected]

Page 2: Cloud Computing:   What a Project Manager Needs to Know

Purpose

Provide Project Managers with the very basics of the two primary types of Clouds and Cloud Computing, and the questions they should ask when Clouds and their project intersect.

Page 3: Cloud Computing:   What a Project Manager Needs to Know

Overview

“Computing as a Service” Clouds
  Questions PMs should ask
“Data-Focused” Clouds
  Relational Databases vs Clouds
  Map-Reduce and Accumulo examples
  Questions PMs should ask
General Cloud questions PMs should ask
The importance of risk assessments

Page 4: Cloud Computing:   What a Project Manager Needs to Know

What’s a Cloud?
Two primary definitions of Clouds presented today:
1. Compute-power as a Service (Utility Cloud; VMs)
   Infrastructure as a Service, Platform as a Service, or Software as a Service
2. A Data-focused Cloud that also runs on VMs
   E.g., Hadoop Distributed File System (HDFS) and data processing
A third emerging type is a “Data Storage” Cloud
PMs need to make sure everyone understands which type is being discussed
  If people think they are discussing different types, confusion results and expectations will not be met

Page 5: Cloud Computing:   What a Project Manager Needs to Know

First Type: Computing as a Service
Instead of using your own computers, you use a Third-Party’s computers at another location (e.g., AWS’s EC2)
Usually all the same hardware, with a variety of Virtual Machine (VM) configurations to meet customer needs
When hardware dies, it is seamlessly replaced
All hardware, infrastructure and physical security headaches are the responsibility of the Third Party
You’re responsible for secure comms to and from the data stores and the security on the machines you use
You only pay for what you use (memory, computing power or number of virtual machines used)
Great for surge-type activities, such as the census that’s run every ten years, or new venture start-ups
Virtual private clouds are available for better security
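The pay-for-what-you-use point can be made concrete with a little arithmetic. The sketch below is illustrative only: the rates, the 720-hour month, and the two workload profiles are hypothetical numbers chosen for this example, not any provider's actual pricing.

```python
# All rates below are hypothetical, for illustration only.
VM_RATE_CENTS_PER_HOUR = 10          # assumed cost of one VM for one hour
STORAGE_RATE_CENTS_PER_GB_MONTH = 3  # assumed cost of storing one gigabyte for a month
HOURS_PER_MONTH = 720                # 30 days x 24 hours


def monthly_cost_cents(vm_count, stored_gb, hours=HOURS_PER_MONTH):
    """Pay-per-use cost for one month: VM-hours plus storage, in cents."""
    compute = vm_count * hours * VM_RATE_CENTS_PER_HOUR
    storage = stored_gb * STORAGE_RATE_CENTS_PER_GB_MONTH
    return compute + storage


def yearly_cost_cents(profile):
    """Sum the monthly cost over a list of (vm_count, stored_gb) months."""
    return sum(monthly_cost_cents(vms, gb) for vms, gb in profile)


# Surge workload: one busy month at 100 VMs, eleven quiet months at 2 VMs.
surge_profile = [(100, 5000)] + [(2, 500)] * 11
# Steady workload: capacity sized for the peak all year round.
steady_profile = [(100, 5000)] * 12

print("surge  :", yearly_cost_cents(surge_profile) / 100, "dollars per year")
print("steady :", yearly_cost_cents(steady_profile) / 100, "dollars per year")
```

Under these assumed rates the surge profile costs roughly a tenth of the peak-sized steady profile, which is the reason surge-type activities are singled out above.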

Page 6: Cloud Computing:   What a Project Manager Needs to Know

First Type: Questions PMs Should Ask – 1
What’s the cost of data storage (cents per gigabyte)?
What’s the cost for the number of VMs used?
How secure or private is my data when I store it on a third-party platform?
  What security or privacy guarantees are provided?
  Will the PII be adequately protected?
  Can I test Cloud security before I put real data there?
Would a Cloud be useful for my Continuity of Operations (COOP) plans?
  It depends. Do your employees already regularly perform remote operations like teleworking?
  Do you have a re-routing plan to get them to the Cloud?
Am I starting a new business with limited investment?

Page 7: Cloud Computing:   What a Project Manager Needs to Know

First Type: Questions PMs Should Ask – 2
Can you store classified data on a cloud?
  If a properly secured government-accredited private cloud: Maybe
  If you are planning to use a Third-Party service: Maybe
    As a minimum, use a virtual private cloud (e.g., AWS VPC)
    And located entirely in the U.S. (not distributed world wide)
    Probably need to limit access to selected personnel at the service provider site (like no foreign access in US Gov Cloud)
    US-Gov-only Cloud important for data under export control
    Need your security department’s approval, which includes your plan and vetting the provider
    Probably need to do penetration testing before use, like “side channel attack” prevention
    Not sure if this is yet being used for more than unclassified but sensitive data
  For either case, always get a cyber security expert to prepare a risk assessment, and for classified data, a proper accreditation

Page 8: Cloud Computing:   What a Project Manager Needs to Know

2nd Type: Data-Focused Cloud – Definitions
Huge Data: Petabytes or larger amounts of data
HDFS is the Hadoop Distributed File System (more on this later)
Relational Database: Think rows and columns, densely populated (like a spreadsheet)
Structured non-relational databases: Cloud-based structured data technologies like Accumulo and HBase running on HDFS
  Can be densely or sparsely populated
  Tend to use flexible labels of length three to six (more later)
  Many different types of data that may have some overlapping elements, but not the same across all types of data
  If put into rows and columns it would be a huge table only sparsely populated

Page 9: Cloud Computing:   What a Project Manager Needs to Know

Relational Database Example

Name             Address        Age  Height
John Smith       Washington DC  35   5’10”
Jane Doe         Baltimore      29   5’8”
Fred Flintstone  Rockville      55   4’10”
Tony D. Tiger    Battle Creek   67   6’2”
Elmer Fudd       DeForest       60   4’6”
Peter Parker     New York       28   5’5”
Bruce Wayne      Gotham         36   6’1”
Roger Rabbit     Fantasyland    41   4’0”
Peter Rabbit     Rural Address  118  1’1”
White Rabbit     Wonderland     135  1’11”

Find the Names of those of Age >25 but <60, and > 5’ tall
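In a relational database the question on this slide is a single query. Here is a minimal sketch using Python's built-in sqlite3 module with the slide's ten rows. The table and column names are assumptions made for this example, and heights are converted to inches so that "taller than 5 feet" becomes a plain numeric comparison.

```python
import sqlite3

# Rows from the slide; heights converted to inches (5'10" -> 70) for easy comparison.
PEOPLE = [
    ("John Smith", "Washington DC", 35, 70),
    ("Jane Doe", "Baltimore", 29, 68),
    ("Fred Flintstone", "Rockville", 55, 58),
    ("Tony D. Tiger", "Battle Creek", 67, 74),
    ("Elmer Fudd", "DeForest", 60, 54),
    ("Peter Parker", "New York", 28, 65),
    ("Bruce Wayne", "Gotham", 36, 73),
    ("Roger Rabbit", "Fantasyland", 41, 48),
    ("Peter Rabbit", "Rural Address", 118, 13),
    ("White Rabbit", "Wonderland", 135, 23),
]

# An in-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (name TEXT, address TEXT, age INTEGER, height_in INTEGER)"
)
conn.executemany("INSERT INTO people VALUES (?, ?, ?, ?)", PEOPLE)

# Age > 25 but < 60, and taller than 5 feet (60 inches).
rows = conn.execute(
    "SELECT name FROM people "
    "WHERE age > 25 AND age < 60 AND height_in > 60 "
    "ORDER BY name"
).fetchall()
names = [row[0] for row in rows]
print(names)
```

Because every row has every column filled in, the filter needs no special handling for missing values. That is the property the sparse example on the next slide lacks.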

Page 10: Cloud Computing:   What a Project Manager Needs to Know

Sparse Data Example

[Figure: a sparsely populated table drawing on four data sets: Medical Records, Drivers Licenses, Facebook, Dating Service. Entries shown: names John Smith, Jane Doe, Peter Parker, Bruce Wayne; cities Washington DC, Baltimore, New York, Gotham; ages 35, 29, 28, 36; heights 5’10”, 5’8”, 5’5”, 6’1”. Which data set holds which entry is not recoverable from this transcript.]

Find the Names of those of Age >25 but <60, and > 5’ tall from multiple data sets
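When the fields for one person are scattered across several data sets, the same question first requires stitching the partial records together. The sketch below is only an illustration: the assignment of fields to the four data sets is invented for this example (the slide's layout does not survive in the transcript), and heights are again expressed in inches.

```python
# Hypothetical split of the slide's values across the four data sets.
medical_records = {
    "John Smith": {"age": 35, "height_in": 70},
}
drivers_licenses = {
    "Jane Doe": {"city": "Baltimore", "age": 29, "height_in": 68},
    "Peter Parker": {"city": "New York", "height_in": 65},
}
facebook = {
    "Peter Parker": {"age": 28},
    "Bruce Wayne": {"city": "Gotham"},
}
dating_service = {
    "Bruce Wayne": {"age": 36, "height_in": 73},
    "John Smith": {"city": "Washington DC"},
}


def merge(*data_sets):
    """Combine partial records per person; each person keeps only the fields seen."""
    merged = {}
    for data_set in data_sets:
        for name, fields in data_set.items():
            merged.setdefault(name, {}).update(fields)
    return merged


def matches(record):
    """Age > 25 but < 60 and taller than 5 feet; missing fields never match."""
    age = record.get("age")
    height = record.get("height_in")
    if age is None or height is None:
        return False
    return 25 < age < 60 and height > 60


people = merge(medical_records, drivers_licenses, facebook, dating_service)
names = sorted(name for name, record in people.items() if matches(record))
print(names)
```

Each person is stored as a small dictionary holding only the fields that exist, rather than as a row in one huge, mostly empty table.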

Page 11: Cloud Computing:   What a Project Manager Needs to Know

Accumulo Data Example

ID   Col. Family  Col. Qualifier  Time        Security  Value
001  Personal     Name            31 Apr ‘12  PII       John Smith
001  Personal     Age             31 Apr ‘12  PII       35
001  Personal     Height          31 Apr ‘12  PII       5’ 10”
001  Address      City            31 Apr ‘12  PII       Wash DC
001  Address      Street          31 Apr ‘12  PII       K Street
001  Address      Number          31 Apr ‘12  PII       810
002  Personal     Name            31 Apr ‘12  PII       Peter Parker
002  Personal     Age             31 Apr ‘12  PII       28
002  Personal     Height          31 Apr ‘12  PII       5’ 5”
002  Address      City            31 Apr ‘12  PII       New York
002  Address      Street          31 Apr ‘12  PII       72nd Street
002  Address      Number          31 Apr ‘12  PII       145

Find the Names of those of Age >25 but <60, and > 5’ tall
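To show how a query works against this kind of layout, here is a rough sketch that models each entry as a plain Python tuple. It does not use the Accumulo client API. The Entry type, the scan and query functions, and the simple "label must be in the caller's authorizations" check are all simplifications invented for this example; real Accumulo visibility labels are boolean expressions.

```python
from collections import defaultdict
from typing import NamedTuple


class Entry(NamedTuple):
    row_id: str
    family: str
    qualifier: str
    timestamp: str
    visibility: str
    value: str


T = "31 Apr '12"  # timestamp copied from the slide as written
ENTRIES = [
    Entry("001", "Personal", "Name", T, "PII", "John Smith"),
    Entry("001", "Personal", "Age", T, "PII", "35"),
    Entry("001", "Personal", "Height", T, "PII", "5' 10\""),
    Entry("001", "Address", "City", T, "PII", "Wash DC"),
    Entry("002", "Personal", "Name", T, "PII", "Peter Parker"),
    Entry("002", "Personal", "Age", T, "PII", "28"),
    Entry("002", "Personal", "Height", T, "PII", "5' 5\""),
    Entry("002", "Address", "City", T, "PII", "New York"),
]


def height_inches(text):
    """Convert a value such as 5' 10" to inches."""
    feet, inches = text.replace('"', "").split("'")
    return int(feet) * 12 + int(inches)


def scan(entries, authorizations):
    """Group visible entries by row ID; hide any entry the caller may not see."""
    rows = defaultdict(dict)
    for e in entries:
        if e.visibility in authorizations:
            rows[e.row_id][(e.family, e.qualifier)] = e.value
    return rows


def query(entries, authorizations):
    """Names of people with age > 25 but < 60 and height over 5 feet."""
    names = []
    for _, cells in sorted(scan(entries, authorizations).items()):
        try:
            name = cells[("Personal", "Name")]
            age = int(cells[("Personal", "Age")])
            height = height_inches(cells[("Personal", "Height")])
        except KeyError:
            continue  # sparse row: a needed cell is missing
        if 25 < age < 60 and height > 60:
            names.append(name)
    return names


print(query(ENTRIES, {"PII"}))
print(query(ENTRIES, set()))
```

The second call passes no authorizations, so every entry labeled PII is hidden and the result is empty. This is the point of carrying a security label on each individual value.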

Page 12: Cloud Computing:   What a Project Manager Needs to Know

2nd Type: Data-Focused Cloud
Also runs on a VM farm, but uses a “Hadoop” or “Sector” file management system (Hadoop is most widely used)
What does a Hadoop Distributed File System (HDFS) do for you?
  Lets you store huge amounts of non-relational data
  Automatically parallelizes the computations
  Automatically sorts results of the “map” step
  Handles all of the overhead associated with storing, locating and processing your data
  Allows Map-Reduce programs and Direct Access Table-based searches to be run using Hadoop
  Can find relationships not easily visible in unstructured data and/or large amounts of data

Page 13: Cloud Computing:   What a Project Manager Needs to Know

2nd Type: Map-Reduce Program Example
Find the number of people per household in census data

[Figure: Map-Reduce data flow]
Input: Distributed databases of household (HH) census data
Map: Count members of each HH
  HH001, 3
  HH002, 6
  HH003, 4
  HH004, 3
Hadoop auto-sorts the Map output
Reduce: Add the number of HHs with N members, N = 1 to 25
  1, 3.5 M
  2, 9.6 M
  3, 6.8 M
  4, 5.3 M
Figure labels: “Key = HH Size, Value = #” and “Key = #, Value = Total”
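The flow in this figure can be imitated on one machine in a few lines. The sketch below is a toy, not Hadoop code: the household member lists are made up (only the sizes 3, 6, 4, 3 come from the slide), the function names are invented here, and an ordinary in-memory sort stands in for Hadoop's automatic sort between the two steps.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical household records; only the sizes match the slide.
HOUSEHOLDS = {
    "HH001": ["Ann", "Bob", "Cal"],
    "HH002": ["Dee", "Eli", "Fay", "Gus", "Hal", "Ida"],
    "HH003": ["Jan", "Kai", "Lee", "Max"],
    "HH004": ["Ned", "Oda", "Pat"],
}


def map_household(hh_id, members):
    """Map step: count the members of one household, emit (size, 1)."""
    yield (len(members), 1)


def reduce_size(size, counts):
    """Reduce step: add up how many households have this size."""
    return (size, sum(counts))


def run(households):
    # Map every household independently (this is the part a cluster runs in parallel).
    intermediate = [
        pair
        for hh_id, members in households.items()
        for pair in map_household(hh_id, members)
    ]
    # Stand-in for Hadoop's automatic sort: bring equal keys together.
    intermediate.sort(key=itemgetter(0))
    # Reduce each group of equal keys.
    return [
        reduce_size(size, [count for _, count in group])
        for size, group in groupby(intermediate, key=itemgetter(0))
    ]


result = run(HOUSEHOLDS)
print(result)
```

The result pairs each household size with the number of households of that size, which is the shape of the "1, 3.5 M / 2, 9.6 M" output in the figure.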

Page 14: Cloud Computing:   What a Project Manager Needs to Know

2nd Type: Map-Reduce Pros and Cons
Map-Reduce programs are a good fit when:
  You have huge data sets
  Your data can't be managed in a relational database
  You are not sure what types of queries you will want to run
  You want to summarize the results of independent processes that can be applied to data in parallel
Map-Reduce programs are not a good fit when:
  You can answer your questions with an existing relational database in a reasonable amount of time (why bother with the overhead of a Cloud?)
  Your data can fit within a relational database AND the queries you plan to run are fairly well-defined; THEN you probably don’t need the overhead of a Cloud

Page 15: Cloud Computing:   What a Project Manager Needs to Know

2nd Type: Questions PMs Should Ask – 1
Do I even need to use a Cloud?
  If you have well-structured, reasonable amounts of data, stick with a relational database UNLESS you just want the compute power on demand (1st Type of Cloud presented)
  If it is required by external authorities (like a customer), yes
Do I have a lot of "surge" events, where I only need to store and process large amounts of data periodically?
  Then using a cloud makes sense
Do I need to know how to write a Map-Reduce program or an Accumulo Table to use a Cloud?
  No, you can use pre-defined programs, OR you need someone who knows how to write new ones for you
Do I need to know how to design a Map-Reduce program?
  No, but it helps so you can ask for realistic output from the Cloud and really leverage the Cloud to solve your data problems

Page 16: Cloud Computing:   What a Project Manager Needs to Know

2nd Type: Questions PMs Should Ask – 2
Do I have access to an existing Cloud I could use?
  If it meets your requirements, third-party Clouds work
  Make sure of the “fine print” on the guarantees, and whether the recourse of the guarantee is sufficient to match the cost of the failure to guarantee
  Have a security expert do a risk assessment before committing
Do I need to build my own instead?
  If you have security, privacy or proprietary needs not met by an existing Cloud, might want to build your own
  Consider the ongoing maintenance costs (may be primary rationale for moving to a Cloud)
  More automation reduces Cloud maintenance costs

Page 17: Cloud Computing:   What a Project Manager Needs to Know

General Cloud Questions for PMs
Where is the Cloud located? Can it be restricted to U.S.?
Who gets access to it?
How are the communications to/from the cloud secured?
How does it ingest its data?
How does it store its data? How do they secure your data at rest?
How does it delete its data? Can you test that it’s gone?
Does it keep your data separate from other people's data?
  Do you need/want a virtual private cloud instead?
How often is the hardware upgraded?
How many versions of VMs can you choose from?
Has a security expert performed a risk assessment?

Page 18: Cloud Computing:   What a Project Manager Needs to Know

Summary Observations
Cloud computing is here to stay
Many more projects in the future will encounter Clouds in some way that will impact the project
  Need to be aware of the strengths and limitations of Clouds and whether they are appropriate for your project
  You may not have a choice whether or not to use a Cloud
This briefing listed some of the basic questions you should ask as appropriate to your project
  Hopefully some of the mystery (and hype) of the Cloud has been dispelled by this talk
It is useful to be able to design a Map-Reduce program so your expectations of the output are realistic
Always do a cyber risk assessment on a Cloud you plan to use

Page 19: Cloud Computing:   What a Project Manager Needs to Know

Contact Info
Dr. Patrick D. Allen
Johns Hopkins University Applied Physics Lab
11100 Johns Hopkins Road
MS 21-N246
Laurel, MD 20723-6099
443-778-9915 v
443-778-3838
[email protected]

Page 20: Cloud Computing:   What a Project Manager Needs to Know

New in This Edition (14 Aug 2012)

FedRAMP is a new standardized approach to security assessment, authorization and security monitoring for cloud-based products and services

FedRAMP is mandatory for federal agency cloud deployments and service models at the low and moderate risk impact levels

Ref: Gloria Larkin, “Cybersecurity and FedRAMP: A Mandatory Combination,” The Business Monthly, Aug 2012

Page 21: Cloud Computing:   What a Project Manager Needs to Know

Back-up: Terminology Relationship

                        APACHE                                 GOOGLE
File System             Hadoop Distributed File System (HDFS)  Google File System (GFS)
Map Reduce Environment  Hadoop (Map Reduce)                    Map Reduce
Structured Data         HDFS, Accumulo                         Big Table

Page 22: Cloud Computing:   What a Project Manager Needs to Know

Back-up: Sample Map Reduce Program

Map algorithm

Map (key: sourceURL, value: text) {
    for each (targetURL in text)
        EmitIntermediate (targetURL, sourceURL);
}

Reduce algorithm

Reduce (key: targetURL, values: sourceURLs) {
    sourceList[] = null;
    for each (u in sourceURLs)
        add u to sourceList[];
    Emit (targetURL, sourceList[]);
}
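For readers who want to run the idea, here is a small single-machine sketch of the same Map and Reduce steps in Python. It is not Hadoop code. The example.com pages, the regular expression used to pull target URLs out of the text, and the run function are assumptions made for this illustration; the link pattern (page 1 points to a and b, page 2 to a and c, page 3 to b, c and d) mirrors the example on the next slide, with page 3 standing in for the last document.

```python
import re
from itertools import groupby
from operator import itemgetter

# Hypothetical pages: source URL -> page text containing links.
PAGES = {
    "http://example.com/1": "links to http://example.com/a and http://example.com/b",
    "http://example.com/2": "links to http://example.com/a and http://example.com/c",
    "http://example.com/3": "http://example.com/b http://example.com/c http://example.com/d",
}

URL_PATTERN = re.compile(r"https?://\S+")  # crude link extractor, enough for this sketch


def map_page(source_url, text):
    """Map: for each target URL found in the text, emit (targetURL, sourceURL)."""
    for target_url in URL_PATTERN.findall(text):
        yield (target_url, source_url)


def reduce_target(target_url, source_urls):
    """Reduce: collect every source that points at this target."""
    return (target_url, sorted(source_urls))


def run(pages):
    intermediate = [
        pair for source, text in pages.items() for pair in map_page(source, text)
    ]
    intermediate.sort()  # stand-in for the framework's sort by key
    return dict(
        reduce_target(target, [source for _, source in group])
        for target, group in groupby(intermediate, key=itemgetter(0))
    )


index = run(PAGES)
for target, sources in index.items():
    print(target, "<-", sources)
```

Each page is mapped without reference to any other page, which is why the Map step can be spread across as many machines as there are documents.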

Page 23: Cloud Computing:   What a Project Manager Needs to Know

Back-up: Map Reduce Example 2

For each URL, find all the pages that point to it.

[Figure: Map-Reduce data flow for the link example]
Map: Doc 1, Doc 2, ... Doc 10^9 each feed a "Find targets for source N" task, which emits targetURL – sourceURL pairs:
  targetURL a – URL1
  targetURL b – URL1
  targetURL a – URL2
  targetURL c – URL2
  targetURL b – URL10^9
  targetURL c – URL10^9
  targetURL d – URL10^9
Sort: the pairs are ordered by targetURL:
  targetURL a – URL1
  targetURL a – URL2
  targetURL b – URL1
  targetURL b – URL10^9
  targetURL c – URL2
  targetURL c – URL10^9
  targetURL d – URL10^9
Reduce: "Create list for targetURL a", "b", "c" and "d" tasks produce the sorted targetURL – sourceURL list