Research Issues in Cooperative Computing Douglas Thain http://www.cse.nd.edu/~ccl
Page 1: Research Issues in Cooperative Computing Douglas Thain ccl.

Research Issues in Cooperative Computing

Douglas Thain

http://www.cse.nd.edu/~ccl

Page 2

Sharing is Hard!

• Despite decades of research in distributed systems and operating systems, sharing computing resources is still very difficult.

• Problems get worse as scale increases:
  – Office
  – Server Room
  – Distributed System
  – Computational Grid

Page 3

Designers Go To Extremes:

Peer to Peer

Central Control

Cooperative Computing

Page 4

How Do We Share Data?

Central Storage Archive (NFS, UDC, StorageTank)

P2P File Sharing (WWW, Napster)

Page 5

Things I Can’t Do Today

• Let members of my project team store and retrieve documents from this disk in my office.
  – (Where my boss defines “project team”.)

• I must have 1 TB of space for one whole week, but it must be stored by someone I know.
  – (Where I give a list of trusted people.)

• Allow a visitor in my office to use my machine.
  – (But I want her workspace isolated from mine.)

• This bioinformatics repository can be written by my grad students, read by all ND faculty, and read by anyone approved by the NSF.
  – (Where each list comes from a different source.)
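The last wish above, an access list whose member lists each come from a different source, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `AclEntry` and `is_allowed`, the stub membership functions, and the user names are all hypothetical, and a real system would query a separate authority for each list rather than return a fixed set.

```python
from dataclasses import dataclass
from typing import Callable, List, Set

# A group source is any callable returning the current set of members.
# Assumption: in a real deployment each source would contact a different
# authority; here they are fixed stubs for illustration only.
GroupSource = Callable[[], Set[str]]


@dataclass
class AclEntry:
    source: GroupSource  # where the membership list comes from
    rights: Set[str]     # subset of {"read", "write"}


def is_allowed(acl: List[AclEntry], user: str, right: str) -> bool:
    """Return True if any entry grants `right` to `user`."""
    return any(right in entry.rights and user in entry.source() for entry in acl)


# Hypothetical membership lists, each maintained by a different party.
def my_grad_students() -> Set[str]:
    return {"alice", "bob"}


def nd_faculty() -> Set[str]:
    return {"carol"}


def nsf_approved() -> Set[str]:
    return {"dave"}


# The repository policy from the slide: written by grad students,
# read by ND faculty, read by anyone approved by the NSF.
repository_acl = [
    AclEntry(my_grad_students, {"write"}),
    AclEntry(nd_faculty, {"read"}),
    AclEntry(nsf_approved, {"read"}),
]

print(is_allowed(repository_acl, "alice", "write"))  # grad student writing
print(is_allowed(repository_acl, "carol", "write"))  # faculty may only read
```

Because each entry holds a callable rather than a copied list, a change made by the owner of one list takes effect at the next check without editing the repository's policy.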

Page 6

What is Cooperative Computing?

• CC means putting owners in charge.
  – I control who uses my resources.
  – Need tools for expressing trust.

• CC means respect for social structures.
  – Trust is rarely symmetric.
  – Hierarchy and centralization can be important.
  – Motivation is usually external to the system.

• CC means ease of use.
  – Resource owners need simple and effective tools.
  – Resource users need to be insulated from failures.

Page 7

Every User Should be a Super-User

[Diagram labels: three identical groups, each listing Consumption, Allocation, Accounting, Quality of Service, Security, and Debugging; one group labeled Super-User listing Allocation, Accounting, Quality of Service, Security, and Debugging.]

Page 8

Vision of Cooperative Storage

• Make it easy to deploy systems that:
  – Allow sharing of storage space.
  – Respect existing human structures.
  – Provide reasonable space/perf promises.
  – Work easily and transparently without root.
  – Make the non-ideal properties manageable:
    • Limited allocation. (select, renew, migrate)
    • Unreliable networks. (useful fallback modes)
    • Changing configuration. (auto. discovery/config)

Page 9

[Diagram labels]

• Components: storage server, basic filesystem, access control server, storage catalog.
• Resource Policy: “Members of the CSE dept can borrow 200 GB for one week.”
• “Where can I find 100 GB for 24 hours?”
• “Make reservation and access data”
• “Is this a member of the CSE dept?”
• “status updates”
• “Evict user!”
• “Who is here?”
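The exchange on this slide (find space, check group membership against the resource policy, make a reservation, evict when time is up) can be sketched as follows. This is a minimal illustration, not the lab's implementation: `Policy`, `Lease`, `StorageServer`, and the `is_member` callback standing in for the access control server are hypothetical names, the limit is assumed to apply per request, and time is passed in explicitly as seconds so the example stays deterministic.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

GB = 10**9
HOUR = 3600
WEEK = 7 * 24 * HOUR


@dataclass(frozen=True)
class Policy:
    group: str        # who may borrow space
    max_bytes: int    # assumed to be a per-request limit
    max_seconds: int  # longest reservation allowed


@dataclass
class Lease:
    user: str
    size: int
    expires_at: float


class StorageServer:
    def __init__(self, capacity: int, policy: Policy,
                 is_member: Callable[[str, str], bool]):
        self.capacity = capacity
        self.policy = policy
        self.is_member = is_member  # stand-in for the access control server
        self.leases: List[Lease] = []

    def evict_expired(self, now: float) -> None:
        """Drop every lease whose time has run out ("Evict user!")."""
        self.leases = [l for l in self.leases if l.expires_at > now]

    def free_bytes(self, now: float) -> int:
        self.evict_expired(now)
        return self.capacity - sum(l.size for l in self.leases)

    def reserve(self, user: str, size: int, seconds: int,
                now: float) -> Tuple[Optional[Lease], str]:
        """Return (lease, reason); lease is None when the request is denied."""
        if not self.is_member(user, self.policy.group):
            return None, f"{user} is not a member of {self.policy.group}"
        if size > self.policy.max_bytes:
            return None, "request exceeds the space limit in the policy"
        if seconds > self.policy.max_seconds:
            return None, "request exceeds the time limit in the policy"
        if size > self.free_bytes(now):
            return None, "not enough free space"
        lease = Lease(user, size, now + seconds)
        self.leases.append(lease)
        return lease, "granted"


# "Members of the CSE dept can borrow 200 GB for one week."
server = StorageServer(
    capacity=250 * GB,
    policy=Policy(group="cse", max_bytes=200 * GB, max_seconds=WEEK),
    is_member=lambda user, group: group == "cse" and user in {"alice"},
)

# "Where can I find 100 GB for 24 hours?"
lease, reason = server.reserve("alice", 100 * GB, 24 * HOUR, now=0)
print(reason)
```

Returning a reason alongside every decision is a deliberate choice in this sketch: a denied user learns which rule applied instead of receiving a bare failure.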

Page 10

Cooperative Storage Pool

[Diagram: six disks and six storage servers form a pool used by a dist. file system, a backup system, and dist. computation.]

Page 11

Cooperative Computing is useful in the office…

but it is badly needed on the Grid!

Page 12

On the Grid

[Diagram labels: a Work Queue holding eight jobs; three gatekeepers; three groups of CPUs managed by the PBS batch system, the Condor Batch System, and the Maui Scheduler.]

Page 13

Grid Computing Experience

Ian Foster, et al. (102 authors)
The Grid2003 Production Grid: Principles and Practice
IEEE HPDC 2004

The Grid2003 Project has deployed a multi-virtual organization, application-driven grid laboratory that has sustained for several months the production-level services required by…

ATLAS, CMS, SDSS, LIGO…

Page 14

Grid Computing Experience

The good news:
  – 27 sites with 2800 CPUs.
  – 40985 CPU-days provided over 6 months.
  – 10 applications with 1300 simultaneous jobs.

The bad news:
  – 40-70 percent utilization.
  – 30 percent of jobs would fail.
  – 90 percent of failures were local problems.

The lessons:
  – Most site failures were due to disk space.
  – Debugging most problems was impossible.

Page 15

Coop Computing and the Grid

• The Grid is a boundary case of CC.
  – Large scale, high performance.
  – Allocate resources to partially trusted visitors.
  – Everyone wants to exhaust resources.

• Can CC scale from the office to the grid?
  – If it is easy for one person to deploy in an office… then it will be usable enough to work on the grid.

Page 16

More Cooperative Computing

• Nested Principals & Authentication
  – Simple question: How to allow a visitor?

• Distributed Access Control
  – Can we find something more usable than PKI?

• Storage Abstractions
  – Can we do better than files/directories?

• Data-Intensive Grid Computing
  – How do I use storage and CPU together?

• Distributed Debugging
  – Consider it a distributed query problem.

Page 17

Cooperative Computing Credo:

Make computer structures

model social structures...

Not the other way around!

Page 18

For more information…

The Cooperative Computing Lab

http://www.cse.nd.edu/~ccl

Prof. Douglas Thain

[email protected]

Page 19
Page 20

Two Related Problems

• Users don’t have direct control.
  – I need 50 GB of storage for one week.
  – Allow my collaborators to use my space.
  (Usually considered administrative tasks.)

• Users don’t have direct information.
  – Why was I denied this allocation?
  – What series of steps was used to run my job?
  (Usually considered implementation details.)

Page 21

The Current Situation

[Diagram labels]

• storage server (five shown), with the labels: simple ACL; hostname, kerberos, GSI; filesystem.
• catalog server, with the label: status updates.
• Three clients, each built on libchirp:
  – direct calls: open, close, read, write
  – chirp tool: GET, PUT
  – parrot: % cp, % emacs, % vi

Page 22

Distributed Debugging

[Diagram labels: a debugger and a job; a workload manager, batch system, auth gateway, kerberos, license manager, three storage servers, six cpus, and an archival host; eight log files spread among the components.]

Page 23

Distributed Debugging

• Big challenges!
  – Language issues: storing and combining logs.
  – Ordering: How to reassemble events?
  – Completeness: Gaps, losses, detail.
  – Systems: Distributed data collection.

• But, could be a big win:
  – “A crashes whenever X gets its creds from Y.”
  – “Please try again: I have turned up the detail on host B.”
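The ordering question above, how to reassemble events from many log files, can be sketched as a merge of per-host logs followed by a query on one job. This is a simplified illustration: the `Event` record, the host names, and the messages are hypothetical, and the sketch assumes every host stamps events with comparable clocks, which is exactly the property the slide flags as hard to guarantee.

```python
import heapq
from typing import Iterable, Iterator, List, NamedTuple


class Event(NamedTuple):
    timestamp: float  # assumption: clocks on all hosts are comparable
    host: str
    job_id: str
    message: str


def merge_logs(*logs: Iterable[Event]) -> Iterator[Event]:
    """Merge per-host logs, each already sorted by time, into one stream."""
    return heapq.merge(*logs, key=lambda event: event.timestamp)


def job_history(job_id: str, *logs: Iterable[Event]) -> List[Event]:
    """Treat debugging as a query: every event for one job, in time order."""
    return [event for event in merge_logs(*logs) if event.job_id == job_id]


# Hypothetical log contents from three different services.
batch_log = [
    Event(1.0, "batch", "j1", "submitted"),
    Event(4.0, "batch", "j1", "failed"),
]
auth_log = [
    Event(2.0, "auth", "j1", "credentials issued"),
    Event(2.5, "auth", "j2", "credentials issued"),
]
storage_log = [
    Event(3.0, "storage", "j1", "write denied: disk full"),
]

history = job_history("j1", batch_log, auth_log, storage_log)
for event in history:
    print(event.timestamp, event.host, event.message)
```

`heapq.merge` reads its inputs lazily, so the same shape works when the logs are streams too large to hold in memory, provided each one is already ordered.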

Page 24

Grid Computing

- The Vision: Make large-scale computing resources as reliable and as simple as the electric power grid or the water utility.

- The Reality: Tie together existing computing clusters and archival storage around the country into systems that are (almost) usable by experts.

Page 25

• Storage Allocation
  – Give me 50 GB for 24 hours
  – Technical Problem: Building Allocation

• Distributed Debugging
  – Correlation
  – Hypothesis Proposal
  – Reasoning
  – System Building
  – Adaptation

Page 26

[Diagram: disks and CPUs, an auth server, and secure I/O, annotated with the following statements]

• “If I can backup to you, you can backup to me.”
• “CSE grads can compute here, but only when I’m not.”
• “I need ten more CPUs in order to finish my paper by Friday!”
• “May I use your CPUs?”
• “Is this person a CSE grad?”
• “My friends in Italy need to access this data.”
• “I’m not root!”
• “PBs of workstation storage! Can I use this as a cache?”

Page 27

Cooperative Computing Credo

• Put users in charge of their resources.
  – Share resources as they see fit.
  – Expose information for debugging.

• Mode of operation:
  – Make tools that are foolproof enough for casual use by one or two people in the office.
  – If they really are foolproof, then they will also be suitable for deployment in large scale systems such as computational grids.