National Center for Supercomputing Applications The Way Things Go e-Science is a complex activity Scientific knowledge is comprehensible only in the context of those activities Adopt the Rube Goldberg view Rube Goldberg

Dec 17, 2015


Zoe Holt
Transcript
Page 1:

National Center for Supercomputing Applications

The Way Things Go

e-Science is a complex activity

Scientific knowledge is comprehensible only in the context of those activities

Adopt the Rube Goldberg view

Rube Goldberg

Page 2:

Grand challenge: systems-scale science

Observation and modeling of multiple systems at multiple scales

Linking data and tools from different disciplines to get a valid global result!

“... modeling complex systems will be a major research challenge for the 21st century” - National Science Foundation

Page 3:

Building current practices up isn't working

Heterogeneous tools, data formats

Little global coordination of research

Little funding for sustained stewardship of tools and data

M.C. Escher, “Tower of Babel” (1928)

Page 4:

Proposed solutions aren't working

e-Journals – not machine-interpretable

Collaboration tools – scientists just use email like everyone else

Portals and digital libraries – typically: centralized, domain-specific

The Grid – can orchestrate complex processing jobs, but that's not science

Page 5:

Only networks work at scale

Single researcher: ad hoc data management, single-user apps

Community: community tools, resources, control

Global: no global practice, tools, control

(Figure labels: Desktop, Workgroup, Network)

Page 6:

How do we get there?

e-Science means managing process and data

Current approaches favor one or the other

Information is getting lost

(Figure labels: model, refine, observe, predict, data, critical interface)

Page 7:

Trends: process data

(Figure: technologies placed along a process axis and a data axis)

Process axis: Batch, Interactive, Workflow

Data axis: Data, Metadata, Semantics

* mainframes
* digital libraries
* portals
* ontologies
* provenance
* desktop apps
* formats
* e-notebooks
* the grid
* rules

Page 8:

Key technologies

Semantic web: data/metadata

Provides means of merging descriptive information even if it only partially agrees (e.g., comes from two different communities)

Workflow: process

Describes complex procedures independently of how they are executed

Provenance: process + data/metadata

Links workflow, data, and any ancillary descriptive information (e.g., attribution)

Page 9:

Semantics: data to knowledge

Data: streams, arrays, swaths, etc. (a.k.a. files)

Information: collections, tags, attributes, etc. (a.k.a. metadata)

Knowledge: ontologies, rules, models, etc. (a.k.a. semantics)

Data -> Information: aggregation, annotation

Information -> Knowledge: learning, inference

Concrete -> Abstract

(cf. Reagan Moore)

Page 10:

Semantic web: RDF triple

Declarative: asserts a fact

Subject and object URIs identify arbitrary entities (things, people, concepts, events)

Predicate identifies the relationship between them

(Figure: subject -> predicate -> object)
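
As a rough illustration of the triple structure described on this slide, here is a minimal Python sketch. It uses only the standard library rather than an RDF toolkit, and the `example.org` URIs, the `Triple` class, and the `describe` helper are all assumptions made up for the example.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF-style statement: subject --predicate--> object."""
    subject: str    # URI of the entity the fact is about
    predicate: str  # URI naming the relationship
    obj: str        # URI of the related entity


# Hypothetical namespace, used only for illustration.
EX = "http://example.org/"

# A single declarative fact: "fido has breed beagle".
fact = Triple(EX + "fido", EX + "hasBreed", EX + "beagle")


def describe(t: Triple) -> str:
    """Render a triple in an N-Triples-like one-line syntax."""
    return f"<{t.subject}> <{t.predicate}> <{t.obj}> ."


print(describe(fact))
```

The triple only asserts a fact; nothing in it says how or where the fact is stored or processed.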

Page 11:

Triples form an open network

Subject nodes aren't “owned” by any single agent or container

Any actor can add arcs to the implicit, total, world graph

Any two graphs can be joined

(Figure: example graph with a hasBreed arc)
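
One way to picture "any two graphs can be joined" is to treat each graph as a set of triples, so that joining is plain set union. The sketch below assumes that representation; the two graphs, the `hasOwner` predicate, and the `example.org` URIs are hypothetical.

```python
# Two graphs published independently by different actors.
# All URIs are hypothetical and exist only for this example.
EX = "http://example.org/"

kennel_graph = {
    (EX + "fido", EX + "hasBreed", EX + "beagle"),
}
vet_graph = {
    (EX + "fido", EX + "hasOwner", EX + "alice"),
    (EX + "fido", EX + "hasBreed", EX + "beagle"),  # same fact, stated again
}

# Joining graphs is set union: shared facts collapse, new arcs are added.
world = kennel_graph | vet_graph


def arcs_from(graph, subject):
    """Return sorted (predicate, object) pairs leaving a subject node."""
    return sorted((p, o) for s, p, o in graph if s == subject)


for predicate, obj in arcs_from(world, EX + "fido"):
    print(predicate, "->", obj)
```

Neither publisher owns the `fido` node: both add arcs to it, and the merged graph carries the arcs from each.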

Page 12:

Non satis scire (to know is not enough)

Semantic web “layer cake”

Where do we manage process? User interface? Applications?

“Semantic Grid” (D. DeRoure, C. Goble)

(source: World Wide Web Consortium)

Page 13:

Workflow: process description

Describe complex operations as networks of simpler operations

Abstract operation execution from description

Can be shared (but may not be portable)

(Taverna)

(Kepler)

Page 14:

Anatomy of a workflow

Declarative: says what to do

Modules identify arbitrary procedures

Arcs identify flow of control and/or data (data flow is usually implicit)

(Figure labels: “Module”, control flow, execution model (usually implicit))
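
The anatomy above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the module names and their bodies are invented, and the execution model shown (sequential, in dependency order) is just one possible choice, not how Taverna, Kepler, or D2K run workflows.

```python
from graphlib import TopologicalSorter

# Modules: arbitrary procedures (hypothetical examples).
# Each receives the results of its upstream modules as arguments.
modules = {
    "observe": lambda: [3.0, 4.0, 5.0],
    "clean": lambda xs: [x for x in xs if x > 3.5],
    "mean": lambda xs: sum(xs) / len(xs),
}

# Arcs: each module maps to the modules it depends on.
# This is the declarative description; it does not say how to run anything.
arcs = {"observe": [], "clean": ["observe"], "mean": ["clean"]}


def run(modules, arcs):
    """One possible execution model: run modules sequentially in dependency order."""
    results = {}
    for name in TopologicalSorter(arcs).static_order():
        inputs = [results[dep] for dep in arcs[name]]
        results[name] = modules[name](*inputs)
    return results


results = run(modules, arcs)
print(results["mean"])
```

Because `modules` and `arcs` say nothing about scheduling, the same description could be handed to a different `run` function, for example one that executes independent modules in parallel.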

Page 15:

Workflow systems

Modules representing units of computation

Language for specifying WF (modules, control flow)

Engine for executing WF

D2K (source: NCSA)

Page 16:

Work vs. workflow systems

Scientists are not WF modules

Science work also involves:

social organization, incl. funding

field and “wet lab” manual work

discourse: review, validation

(source: CNRS/UCSD)

Page 17:

Provenance: what happened

Answers critical questions

What led to this result?

When and how were observations made, conclusions reached?

Is a causal network of events

Page 18:

Complementary incomplete notions of provenance

Artifact-centric (e.g., digital libraries)

“lineage” = events in lifecycle of artifact, e.g., custody

IRs focus on curation events (not antecedent processes)

Process-centric (e.g., workflow)

computational events (e.g., service invocations)

control flow

artifacts are either not mentioned or opaque (tool-specific)

Page 19:

Provenance Challenges 1 & 2

IPAW 2006, HPDC 2007

20 teams, 1 workflow, 9 queries

major players

Interoperability?

lots of manual work required

call for standards

(source: gridprovenance.org)

Page 20:

Artifact + process provenance = “open provenance”

Can describe any process, not just WF execution (e.g., science!)

Allows alternate accounts by different observers

Rules for inferring transitive causal relationships

(source: Luc Moreau et al)
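
To make "inferring transitive causal relationships" concrete, the following sketch infers every indirect antecedent of an artifact from a set of direct derivation arcs. It is a simplified illustration, not the inference rules as defined by the open provenance work; the artifact names and the `all_antecedents` helper are hypothetical.

```python
# Direct "was derived from" arcs: artifact -> artifacts it was derived from.
# Artifact names are hypothetical.
derived_from = {
    "figure.png": {"results.csv"},
    "results.csv": {"raw_data.dat", "model_params.txt"},
    "raw_data.dat": set(),
    "model_params.txt": set(),
}


def all_antecedents(artifact, arcs):
    """Infer every artifact that directly or indirectly led to `artifact`."""
    seen = set()
    stack = [artifact]
    while stack:
        for parent in arcs.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


print(sorted(all_antecedents("figure.png", derived_from)))
```

Only the one-step arcs are asserted; the multi-step relationship between `figure.png` and `raw_data.dat` is inferred, which is how a question like "what led to this result?" can be answered from partial accounts.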

Page 21:

Open Provenance Model

3 node types – artifact, process, agent

5 arc types – used, generated, triggered, derived, controlled – and inference rules

Generic – extensibility via annotation

Choice of granularity and focus (e.g., artifact or process-centric)

(source: Luc Moreau et al)
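
The node and arc types listed above could be modeled roughly as follows. This is a sketch, not the OPM toolkit's API: the class names are invented, the arc names follow the slide's short forms, and the (effect, cause) typing of each arc reflects my reading of the OPM specification, so treat it as an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class NodeType(Enum):
    ARTIFACT = "artifact"
    PROCESS = "process"
    AGENT = "agent"


# Arc type -> (type of the effect node, type of the cause node).
# Assumed typing: a process used an artifact, an artifact was generated by a
# process, a process was triggered by a process, an artifact was derived from
# an artifact, and a process was controlled by an agent.
ARC_TYPES = {
    "used": (NodeType.PROCESS, NodeType.ARTIFACT),
    "generated": (NodeType.ARTIFACT, NodeType.PROCESS),
    "triggered": (NodeType.PROCESS, NodeType.PROCESS),
    "derived": (NodeType.ARTIFACT, NodeType.ARTIFACT),
    "controlled": (NodeType.PROCESS, NodeType.AGENT),
}


@dataclass(frozen=True)
class Node:
    name: str
    type: NodeType


@dataclass(frozen=True)
class Arc:
    kind: str
    effect: Node
    cause: Node

    def __post_init__(self):
        # Reject arcs whose endpoints have the wrong node types.
        expected = ARC_TYPES[self.kind]
        if (self.effect.type, self.cause.type) != expected:
            raise ValueError(
                f"'{self.kind}' cannot link {self.effect.type.value} "
                f"to {self.cause.type.value}"
            )


# Hypothetical example nodes.
raw = Node("raw data", NodeType.ARTIFACT)
sim = Node("simulation run", NodeType.PROCESS)
alice = Node("alice", NodeType.AGENT)

graph = [Arc("used", sim, raw), Arc("controlled", sim, alice)]
```

Keeping the vocabulary this small is what makes the model generic; domain-specific detail would be attached as annotations rather than as new node or arc types.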

Page 22:

NCSA Provenance Infrastructure

(Architecture diagram labels: desktop, portal, etc.; OPM toolkit; Open Provenance Model; Tupelo Semantic Content Repository; Context; Store)

(Layer functions: visualization, interaction; tracking, modeling, presentation; abstraction, inference, storage)

Page 23:

Tupelo: semantic content

Abstracts content from storage implementations (e.g., Sesame, Mulgara)

Provides location-independent addressing of content and metadata

Supports transparent mirroring, caching, failover, etc.

(tupeloproject.org)
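
The ideas on this slide (a storage-neutral interface, addressing by identifier rather than location, and transparent mirroring with failover) can be sketched as below. None of this is Tupelo's actual API: the `Context` interface, both implementations, and the `urn:example:` identifier are hypothetical and only illustrate the design.

```python
from abc import ABC, abstractmethod


class Context(ABC):
    """Hypothetical storage-neutral interface; not Tupelo's actual API."""

    @abstractmethod
    def put(self, uri: str, content: bytes) -> None: ...

    @abstractmethod
    def get(self, uri: str) -> bytes: ...


class MemoryContext(Context):
    """Stand-in for one concrete store; `online` simulates an outage."""

    def __init__(self):
        self._blobs = {}
        self.online = True

    def put(self, uri, content):
        if not self.online:
            raise ConnectionError("store offline")
        self._blobs[uri] = content

    def get(self, uri):
        if not self.online:
            raise ConnectionError("store offline")
        return self._blobs[uri]


class MirrorContext(Context):
    """Writes to every backing context; reads from the first one that answers."""

    def __init__(self, *contexts):
        self._contexts = contexts

    def put(self, uri, content):
        for context in self._contexts:
            context.put(uri, content)

    def get(self, uri):
        for context in self._contexts:
            try:
                return context.get(uri)
            except (ConnectionError, KeyError):
                continue  # fail over to the next mirror
        raise KeyError(uri)


primary, backup = MemoryContext(), MemoryContext()
repo = MirrorContext(primary, backup)
repo.put("urn:example:dataset-1", b"observations")

primary.online = False  # simulate losing the primary store
print(repo.get("urn:example:dataset-1"))
```

The caller addresses content only by identifier and never learns which store answered, which is what lets mirroring, caching, and failover stay transparent.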

Page 24:

CyberIntegrator: workflow by example

Records what users do as provenance

source, intermediate, and final artifacts

steps and parameters

Can re-enact interaction as a workflow
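
"Workflow by example" can be illustrated with a small recorder that logs each interactive step and its parameters, then replays them on new input. This is a sketch of the idea only: the `Recorder` class and the `scale` and `threshold` steps are hypothetical, not CyberIntegrator's implementation, and it assumes the user's steps form a single linear chain.

```python
class Recorder:
    """Hypothetical recorder; not CyberIntegrator's actual API."""

    def __init__(self):
        self.steps = []  # (function, parameters) in the order the user ran them

    def apply(self, func, data, **params):
        """Run one interactive step and remember it as provenance."""
        self.steps.append((func, params))
        return func(data, **params)

    def reenact(self, source):
        """Replay the recorded steps, in order, on a new source artifact."""
        data = source
        for func, params in self.steps:
            data = func(data, **params)
        return data


# Hypothetical analysis steps a user might run by hand.
def scale(values, factor):
    return [v * factor for v in values]


def threshold(values, minimum):
    return [v for v in values if v >= minimum]


rec = Recorder()
intermediate = rec.apply(scale, [1, 2, 3], factor=10)   # source -> intermediate
final = rec.apply(threshold, intermediate, minimum=20)  # intermediate -> final

# The recorded interaction now behaves like a reusable workflow.
print(rec.reenact([1, 4]))
```

The user never wrote a workflow explicitly; the recorded steps and parameters are the workflow.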

Page 25:

MAEviz: analysis/viz app, workflow “behind the scenes”

GIS app. platform

Earthquake hazard analysis plug-in

Data catalog: built environment, fragility/hazard models

Driven by workflow -> provenance

Page 26:

CyberCollaboratory: collaboration + provenance

User interaction with tools generates events

Events are captured using the OPM and published to Tupelo

Non-portal apps can browse / use provenance

Page 27:

Summary

“The way things go” is critical to e-Science at scale

Provenance is an open causal network

New infrastructure supports provenance

Page 28:

Resources / acknowledgements

Grid Provenance Challenge: http://twiki.gridprovenance.org/

NCSA technologies

Tupelo: http://tupeloproject.org/

CyberIntegrator: http://isda.ncsa.uiuc.edu/

MAEviz: http://maeviz.cee.uiuc.edu/

CyberCollaboratory: http://ecid.ncsa.uiuc.edu/cybercollab/

Acknowledgements: Jim Myers, Luc Moreau, Juliana Freire, Patrick Paulson, Simon Miles, Bob McGrath, and more ...