Meandre: Semantic-Driven Data-Intensive Flows in the Clouds. Xavier Llora, Bernie Acs, Loretta Auvil, Boris Capitanu, Michael Welge, David Goldberg. National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign. {xllora, acs1, lauvil, capitanu, mwelge, deg}@illinois.edu. The SEASR project and its Meandre infrastructure are sponsored by The Andrew W. Mellon Foundation.
SEASR eScience 2008

Nov 29, 2014

Technology

Loretta Auvil

Presentation of Meandre: Semantic-Driven Data-Intensive Flows in the Clouds at eScience 2008 by Bernie Acs

Data-intensive flow computing allows efficient processing of large volumes of data that would otherwise be unapproachable. This paper introduces a new semantic-driven data-intensive flow infrastructure which: (1) provides a robust and transparent scalable solution from a laptop to large-scale clusters, (2) creates a unified solution for batch and interactive tasks in high-performance computing environments, and (3) encourages reusing and sharing components. Banking on virtualization and cloud computing techniques, the Meandre infrastructure is able to create and dispose of Meandre clusters on demand, transparently to the final user. This paper also presents a prototype of such a clustered infrastructure and some results obtained using it.
Transcript
Page 1: SEASR eScience 2008

Meandre: Semantic-Driven Data-Intensive Flows in the Clouds

Xavier Llora, Bernie Acs, Loretta Auvil, Boris Capitanu, Michael Welge, David Goldberg

National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign

{xllora, acs1, lauvil, capitanu, mwelge, deg}@illinois.edu

The SEASR project and its Meandre infrastructure are sponsored by The Andrew W. Mellon Foundation

Page 2: SEASR eScience 2008

SEASR: Software Environment for the Advancement of Scholarly Research

•  Funded by the Andrew W. Mellon Foundation to answer the humanities community’s call for a research and development environment capable of powering leading edge digital humanities initiatives.

•  Fosters collaboration through empowering scholars to share data and research processes with an infrastructure and framework designed to support reusable, repeatable, and scalable services and processes.

•  Designed to enable developers to rapidly design, build, and share software applications that support research and collaboration using modular components that can be assembled to create reusable data-flows.

•  Project web site: http://seasr.org

SEASR: The Project

Page 3: SEASR eScience 2008

SEASR: The High-Altitude Picture

Page 4: SEASR eScience 2008

SEASR: @ Work – DISCUS

Page 5: SEASR eScience 2008

SEASR: @ Work – NEMA

Page 6: SEASR eScience 2008

SEASR: @ Work – NESTER

Page 7: SEASR eScience 2008

SEASR: @ Work – MONK

Page 8: SEASR eScience 2008

SEASR: @ Work – Evolution Highway

Page 9: SEASR eScience 2008

SEASR: A Quick Overview

•  Addresses:

–  Challenges of transforming information into knowledge

–  Construction of software bridges that migrate unstructured and semi-structured data into structured data and/or metadata to enable analysis and accessibility

•  Aims:

–  Make digital collections more useful and flexible

–  Provide access to analytic processes and visualizations

–  Enable easy mash-up with other web-based services (SOA)

Page 10: SEASR eScience 2008

SEASR: Knowledge Discovery…

Predictable process

The Process

•  Selection

•  Preparation

•  Transform

•  Processing

•  Interpret

Page 11: SEASR eScience 2008

SEASR: Knowledge Discovery…

Domains

•  Literature

•  History

•  Music

•  Art

•  Science

Predictable process across domains.

Page 12: SEASR eScience 2008

SEASR: Knowledge Discovery…

Predictable process across domains and digital collections.

Collection Types

•  Text

•  Multimedia

•  Data

Page 13: SEASR eScience 2008

SEASR: Design Goals

•  Transparency

–  From a single laptop to a HPC cluster

–  Not bound to a particular computation fabric

–  Allow heterogeneous development

•  Intuitive programming paradigm

–  Modular, reusable components and flows

–  Foster Collaboration and Sharing

•  Open Source

•  Service-Oriented Architecture (SOA)

Page 14: SEASR eScience 2008

Meandre: Infrastructure

•  SEASR/Meandre Infrastructure:

–  Dataflow execution paradigm

–  Semantic-web driven

–  Web Oriented

–  Supports publishing services

–  Modular components

–  Encapsulation and execution mechanism

–  Promotes reuse, sharing, and collaboration

Page 15: SEASR eScience 2008

Meandre: Data Driven Execution

•  Execution Paradigms

–  Conventional programs perform computational tasks by executing a sequence of instructions.

–  Data driven execution revolves around the idea of applying transformation operations to a flow or stream of data when it is available.

•  Dataflow Approach

–  A component may have zero to many inputs

–  A component may have zero to many outputs

–  A component performs a logical operation when data is available

Page 16: SEASR eScience 2008

Meandre: Dataflow Example

•  Dataflow Addition Example

–  Logical Operation ‘+’

–  Requires two inputs

–  Produces one output

•  When two inputs are available

–  The logical operation can be performed

–  The sum is output

•  When output is produced

–  Reset internal values

–  Wait for two new input values to become available

[Figure: an adder component with inputs Value1 and Value2 and output Sum]
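The firing rule of the addition example can be sketched in a few lines of Python. This is only an illustration of the idea, not Meandre code: the class name `AdderComponent`, the port names `value1` and `value2`, and the `push` method are all hypothetical, and a value pushed twice to the same port simply overwrites the pending one.

```python
class AdderComponent:
    """Hypothetical dataflow component: fires only when both inputs are present."""

    def __init__(self):
        self._inputs = {}   # port name -> pending value
        self.outputs = []   # sums emitted so far

    def push(self, port, value):
        """Deliver a value to an input port; fire if both ports hold data."""
        if port not in ("value1", "value2"):
            raise ValueError(f"unknown port: {port}")
        self._inputs[port] = value
        if len(self._inputs) == 2:
            self._fire()

    def _fire(self):
        # Perform the logical operation '+', emit the sum, then reset.
        self.outputs.append(self._inputs["value1"] + self._inputs["value2"])
        self._inputs.clear()


adder = AdderComponent()
adder.push("value1", 2)    # only one input so far: nothing fires
adder.push("value2", 3)    # both inputs available: the sum is emitted
adder.push("value1", 10)   # waits again for a matching value2
print(adder.outputs)
```

Nothing in the sketch schedules the component explicitly; the arrival of data is what triggers the computation, which is the point of the data-driven paradigm.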

Page 17: SEASR eScience 2008

Meandre: The Dataflow Component

•  Data dictates component execution semantics

[Figure: a component with its inputs and outputs, made up of a descriptor in RDF of its behavior and the component implementation]

Page 18: SEASR eScience 2008

Meandre: Component Metadata

•  Describes a component

•  Separates:

–  Component semantics (black box)

–  Component implementation

•  Provides a unified framework:

–  Basic building blocks or units (components)

–  Complex tasks (flows)

–  Standardized metadata

Page 19: SEASR eScience 2008

Meandre: Semantic Web Concepts

•  Relies on the Resource Description Framework (RDF), which uses a simple notation to express graph relations, usually written as XML, and provides a set of conventions and a common means to exchange information

•  Provides a common framework to share and reuse data across application, enterprise, and community boundaries

•  Focuses on common formats for integration and combination of data drawn from diverse sources

•  Pays special attention to the language used for recording how the data relates to real world objects

•  Allows navigation to sets of data resources that are semantically connected.

Page 20: SEASR eScience 2008

Meandre: Metadata Ontologies

•  Meandre's metadata relies on three ontologies:

–  The RDF ontology serves as a base for defining Meandre descriptors

–  The Dublin Core Elements ontology provides basic publishing and descriptive capabilities in the description of Meandre descriptors

–  The Meandre ontology describes a set of relationships that model valid components, as understood by the Meandre execution engine architecture
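As a rough illustration of how terms from the three vocabularies combine in one descriptor, the sketch below models RDF statements as plain (subject, predicate, object) tuples. The values are taken from the "Limited iterations" descriptor shown in this deck; the tuple representation and the helper `objects` are assumptions made for the example and are not how Meandre itself stores metadata.

```python
# Namespace prefixes as they appear in the deck's component descriptor.
MEANDRE = "http://www.meandre.org/ontology/"
DC = "http://purl.org/dc/elements/1.1/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

COMPONENT = "http://dita.ncsa.uiuc.edu/meandre/e2k/components/limited-iterations"

# Each statement is a (subject, predicate, object) triple.
triples = [
    (COMPONENT, MEANDRE + "name", "Limited iterations"),
    (COMPONENT, RDF + "type", MEANDRE + "executable_component"),
    (COMPONENT, DC + "creator", "Xavier Llora"),
    (COMPONENT, DC + "format", "java/class"),
]


def objects(subject, predicate):
    """Return every object stated for the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


print(objects(COMPONENT, DC + "creator"))
```

The RDF term types the component, Dublin Core terms carry the publishing information, and Meandre terms carry what the engine needs.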

Page 21: SEASR eScience 2008

Existing Standards

Meandre: Components in RDF

@prefix meandre: <http://www.meandre.org/ontology/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix : <#> .

<http://dita.ncsa.uiuc.edu/meandre/e2k/components/limited-iterations>
    meandre:name "Limited iterations"^^xsd:string ;
    rdf:type meandre:executable_component ;
    dc:creator "Xavier Llora"^^xsd:string ;
    dc:date "2007-11-17T00:32:35"^^xsd:date ;
    dc:description "Allows only a limited number of iterations"^^xsd:string ;
    dc:format "java/class"^^xsd:string ;
    dc:rights "University of Illinois/NCSA Open Source License"^^xsd:string ;
    meandre:execution_context
        <http://norma.ncsa.uiuc.edu/public-dav/Meandre/demos/E2K/V1/resources/colt.jar> ,
        <http://norma.ncsa.uiuc.edu/public-dav/Meandre/demos/E2K/V1/resources/gacore.jar> ,
        <http://dita.ncsa.uiuc.edu/meandre/e2k/components/limited-iterations/implementation/> ,
        <http://norma.ncsa.uiuc.edu/public-dav/Meandre/demos/E2K/V1/resources/gacore-meandre.jar> ,
        <http://norma.ncsa.uiuc.edu/public-dav/Meandre/demos/E2K/V1/

Page 22: SEASR eScience 2008

Meandre: Components Types

•  Components are the basic building blocks of any computational task.

•  There are two kinds of Meandre components:

–  Executable components

•  Perform computational tasks that require no human interactions during runtime

•  Processes are initialized during flow startup and are fired in accordance with the policies defined for them.

–  Control components

•  Used to pause dataflow during user interaction cycles

•  The WebUI may be an HTML form, an applet, or another user interface

Page 23: SEASR eScience 2008

Meandre: Component Assemblies

•  Defined by connecting outputs from one component to the inputs of another.

–  Cyclical connections are supported

–  Components may have

•  Zero to many inputs

•  Zero to many outputs

•  Properties that control runtime behavior

•  Described using RDF

–  Enables storage, reuse, and sharing, as with components

–  Allows discovery and dynamic execution
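A component assembly of this kind can be pictured as a small data structure: components with named ports and properties, plus a list of port-to-port connections. The Python sketch below is illustrative only; the classes `Component` and `Flow`, the component names, and the port names are hypothetical and do not correspond to the Meandre API.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    name: str
    inputs: tuple = ()
    outputs: tuple = ()
    properties: dict = field(default_factory=dict)  # control runtime behavior


@dataclass
class Flow:
    components: dict = field(default_factory=dict)
    connections: list = field(default_factory=list)

    def add(self, component):
        self.components[component.name] = component

    def connect(self, source, out_port, target, in_port):
        """Link an output port of one component to an input port of another."""
        if out_port not in self.components[source].outputs:
            raise ValueError(f"{source} has no output port {out_port!r}")
        if in_port not in self.components[target].inputs:
            raise ValueError(f"{target} has no input port {in_port!r}")
        self.connections.append((source, out_port, target, in_port))


flow = Flow()
flow.add(Component("read", outputs=("text",)))
flow.add(Component("refine", inputs=("text", "feedback"), outputs=("text", "feedback")))
flow.add(Component("show", inputs=("text",)))
flow.connect("read", "text", "refine", "text")
flow.connect("refine", "text", "show", "text")
flow.connect("refine", "feedback", "refine", "feedback")  # a cyclical connection
print(len(flow.connections))
```

Because the assembly is plain data, it can be serialized, stored, and shared in the same way as a component descriptor.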

Page 24: SEASR eScience 2008

Meandre: Flow (Complex Tasks)

•  A flow is a collection of connected components

[Figure: a flow of connected components (Read, Merge, Do, Show, Get) under dataflow execution]

Page 25: SEASR eScience 2008

Meandre: Create, Publish, & Share

•  “Components” and “Flows” have RDF descriptors

–  Easily shared, fostering sharing and reuse

–  Allow machines to read and interpret

–  Independent of the implementations

–  Combine different implementations and platforms

–  Components: Java, Python, Lisp, Web Services

–  Execution: On a Laptop or a High Performance Cluster

•  A “Location” is an RDF descriptor of one to many components, one to many flows, and their implementations

Page 26: SEASR eScience 2008

Meandre: Repository & Locations

•  Each location represents a set of components and flows

•  Users can

–  Combine different locations together

–  Create components

–  Assemble flows

–  Share components and flows

•  Repositories Help

–  Administer complex environments

–  Organize components and flows

Page 27: SEASR eScience 2008

Meandre: Metadata Properties

•  Components and Flows share properties such as component name, creator, creation date, description, tags, and rights.

•  Component-specific metadata describes the component's behavior: its location, type of implementation, firing policy, runnable, format, resource location, and execution context

•  Flow-specific metadata describes the directed graph of components: component instances, connectors, connector instance data port source, connector instance data port target, connector instance source, connector instance target, and instance name

Page 28: SEASR eScience 2008

Meandre: Programming Paradigm

•  The programming paradigm creates complex tasks by linking together specialized components. Meandre's publishing mechanism allows components developed by third parties to be assembled into a new flow.

•  There are two ways to develop flows:

–  Meandre’s Workbench visual programming tool

–  Meandre’s ZigZag scripting language

Page 29: SEASR eScience 2008

Meandre: Workbench Existing Flow

[Screenshot: the Workbench, with Flows, Components, and Locations panels]

Page 30: SEASR eScience 2008

Meandre: Workbench Create Flow

Page 31: SEASR eScience 2008

Meandre: Workbench Create Flow

Drag & Drop Selected Component into workspace

Page 32: SEASR eScience 2008

Meandre: Workbench Create Flow

Properties for Selected Component Exposed

Page 33: SEASR eScience 2008

Meandre: Workbench Create Flow

Description for Selected Component Exposed

Page 34: SEASR eScience 2008

Meandre: Workbench Create Flow

Drag & Drop Another Component into workspace

Page 35: SEASR eScience 2008

Clicking the first port to connect highlights it with a color change (red)

Meandre: Workbench Create Flow

Connect Output of First Component to Input of Second

Page 36: SEASR eScience 2008

Meandre: Workbench Create Flow

Connect Output of First Component to Input of Second

Clicking the port to connect to displays a line as a visual indicator

Page 37: SEASR eScience 2008

Meandre: Workbench Create Flow

Repeat Drag & Drop to Complete the Assembly

Page 38: SEASR eScience 2008

Meandre: ZigZag Script Language

•  ZigZag is a simple language for describing data-intensive flows

–  Modeled on Python for simplicity.

–  ZigZag is a declarative language for expressing the directed graphs that describe flows.

•  Command-line tools compile and execute ZigZag files.

–  A compiler is provided to transform a ZigZag program (.zz) into a Meandre archive unit (.mau).

–  MAUs can then be executed by a Meandre engine.

Page 39: SEASR eScience 2008

Meandre: ZigZag Script Language

•  As an example, consider the flow diagram

–  The flow pushes two strings that get concatenated and printed to the console

Page 40: SEASR eScience 2008

Meandre: ZigZag Script Language

•  ZigZag code that represents the example flow:

#
# Imports the three required components and creates the component aliases
#
import <http://localhost:1714/public/services/demo_repository.rdf>
alias <http://test.org/component/push_string> as PUSH
alias <http://test.org/component/concatenate-strings> as CONCAT
alias <http://test.org/component/print-object> as PRINT
#
# Creates four instances for the flow
#
push_hello, push_world, concat, print = PUSH(), PUSH(), CONCAT(), PRINT()
#
# Sets up the properties of the instances
#
push_hello.message, push_world.message = "Hello ", "world!"
#
# Describes the data-intensive flow
#
@phres, @pwres = push_hello(), push_world()
@cres = concat( string_one: phres.string; string_two: pwres.string )
print( object: cres.concatenated_string )
#

Repository Location

Defines the logical repository location where the components in this flow can be found, similar to defining a location in the Workbench, which would then display the components available there.

Page 41: SEASR eScience 2008

Meandre: ZigZag Script Language

•  ZigZag code that represents the example flow (alias declarations):

alias <http://test.org/component/push_string> as PUSH
alias <http://test.org/component/concatenate-strings> as CONCAT
alias <http://test.org/component/print-object> as PRINT

Alias

Assigns a logical name to each component, making subsequent program calls easier to read and write.

Page 42: SEASR eScience 2008

Meandre: ZigZag Script Language

•  ZigZag code that represents the example flow (instance creation):

push_hello, push_world, concat, print = PUSH(), PUSH(), CONCAT(), PRINT()

Implementation Instances

Creates instances of the components using the alias references, similar to dragging components onto the Workbench canvas.

Page 43: SEASR eScience 2008

Meandre: ZigZag Script Language

•  ZigZag code that represents the example flow (property values):

push_hello.message, push_world.message = "Hello ", "world!"

Set the Property Values

Defines the property values for components, similar to filling in values in the Workbench's properties panel.

Page 44: SEASR eScience 2008

Meandre: ZigZag Script Language

•  ZigZag code that represents the example flow (connections):

@phres, @pwres = push_hello(), push_world()
@cres = concat( string_one: phres.string; string_two: pwres.string )
print( object: cres.concatenated_string )

Describe Connections

Defines the connections or relationships between the components in this flow, similar to drawing connection lines on the Workbench canvas.
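For readers more familiar with Python, the same wiring can be mimicked with ordinary functions. This is a loose analogy under stated assumptions, not a translation that Meandre would execute: the functions `push`, `concat`, and `print_object` and the `console` list are hypothetical stand-ins for the PUSH, CONCAT, and PRINT components, and plain function calls replace the engine's data-driven firing.

```python
def push(message):
    """Stand-in for the PUSH component: emits its 'message' property."""
    return {"string": message}


def concat(string_one, string_two):
    """Stand-in for CONCAT: joins its two inputs."""
    return {"concatenated_string": string_one + string_two}


def print_object(obj, sink):
    """Stand-in for PRINT: writes to the console and records the value."""
    print(obj)
    sink.append(obj)


console = []
phres, pwres = push("Hello "), push("world!")
cres = concat(string_one=phres["string"], string_two=pwres["string"])
print_object(cres["concatenated_string"], console)
```

The named outputs (`string`, `concatenated_string`) play the role of the ports that the ZigZag script connects.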

Page 45: SEASR eScience 2008

Meandre: ZigZag Script Language

•  Automatic Parallelization

–  Multiple instances of a component can be run in parallel to boost throughput.

–  A specialized operator is available in ZigZag to cause multiple instances of a given component to be used.

•  Consider the simple flow example shown in the diagram

•  The dataflow declaration would look like:

#
# Describes the data-intensive flow
#
@pu = push()
@pt = pass( string:pu.string )
print( object:pt.string )

Page 46: SEASR eScience 2008

Meandre: ZigZag Script Language

•  Automatic Parallelization

–  Adding the operator [+AUTO] to the middle component

–  [+AUTO] tells the ZigZag compiler to parallelize the “pass” component instance by the number of cores available on the system.

–  The operator may also be written [+N], where N is the number of instances to use, for example [+10].

# Describes the data-intensive flow
#
@pu = push()
@pt = pass( string:pu.string ) [+AUTO]
print( object:pt.string )

Page 47: SEASR eScience 2008

Meandre: ZigZag Script Language

•  Automatic Parallelization

–  Adding the operator [+4] would result in a directed graph

# Describes the data-intensive flow
#
@pu = push()
@pt = pass( string:pu.string ) [+4]
print( object:pt.string )

Page 48: SEASR eScience 2008

Meandre: ZigZag Script Language

•  Automatic Parallelization

–  ZigZag has created 4 parallel instances of the component.

•  It has also introduced a mapper instance that is in charge of distributing the incoming data to each of the parallel instances.

•  This is called unordered parallelization, since data may arrive at the print component instance out of the original order in which they were generated by the push component instance.

Page 49: SEASR eScience 2008

Meandre: ZigZag Script Language

•  Automatic Parallelization

–  The operator [+AUTO] can be told to maintain data order with “!”

–  [+AUTO!] tells the ZigZag compiler to parallelize the “pass” component instance by the number of cores available on the system and to maintain the order of the data passing through.

# Describes the data-intensive flow
#
@pu = push()
@pt = pass( string:pu.string ) [+AUTO!]
print( object:pt.string )
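The operator variants can be summarized by a small parser that maps each form to a number of instances and an ordering flag. This is a hypothetical sketch and not the ZigZag compiler's logic: the function name `parse_parallel_operator` is invented, `os.cpu_count()` is used as a stand-in for "number of cores available", and the combined form `[+N!]` is an assumption, since the slides only show "!" together with AUTO.

```python
import os
import re


def parse_parallel_operator(op):
    """Return (instances, ordered) for '[+AUTO]', '[+N]', '[+AUTO!]' or '[+N!]'."""
    match = re.fullmatch(r"\[\+(AUTO|\d+)(!?)\]", op)
    if match is None:
        raise ValueError(f"not a parallelization operator: {op!r}")
    degree, bang = match.groups()
    # AUTO resolves to the number of cores reported by the system.
    instances = (os.cpu_count() or 1) if degree == "AUTO" else int(degree)
    return instances, bang == "!"


print(parse_parallel_operator("[+4]"))
```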

Page 50: SEASR eScience 2008

Meandre: ZigZag Script Language

•  Automatic Parallelization

–  ZigZag has created 4 parallel instances of the component.

•  It has also introduced a mapper instance that is in charge of distributing the incoming data to each of the parallel instances.

•  It has also introduced a reducer instance that is in charge of collecting the data from the parallel instances in the original order.
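The difference between unordered and ordered parallelization can be illustrated with a thread pool. The sketch below is an analogy rather than the Meandre implementation: `pass_component` and `run_parallel` are hypothetical names, the random delay is artificial, and sequence numbers are just one possible way for a reducer to restore the original order.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def pass_component(item):
    """Stand-in for the parallelized 'pass' instance; the delay is artificial."""
    time.sleep(random.uniform(0.0, 0.01))
    return item


def run_parallel(items, instances=4, ordered=False):
    with ThreadPoolExecutor(max_workers=instances) as pool:
        # Mapper: tag each item with a sequence number and hand it to a worker.
        futures = {pool.submit(pass_component, item): seq
                   for seq, item in enumerate(items)}
        # Results arrive in completion order, so possibly out of order.
        results = [(futures[f], f.result()) for f in as_completed(futures)]
    if ordered:
        # Reducer: restore the original order using the sequence numbers.
        results.sort(key=lambda pair: pair[0])
    return [value for _, value in results]


lines = [f"line {i}" for i in range(20)]
print(run_parallel(lines, ordered=True) == lines)
```

Without `ordered=True` the same items come back, but their order depends on which worker finishes first.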

Page 51: SEASR eScience 2008

Meandre: Flows to MAU

•  Flows can be executed using their RDF descriptors

•  Flows can be compiled into MAU

•  MAU is:

–  Self-contained representation

–  Ready for execution

–  Portable

–  The base of flow execution in grid environments

Page 52: SEASR eScience 2008

Meandre: The Architecture

•  The design of the Meandre architecture follows three directives:

–  provide a robust and transparent scalable solution from a laptop to large-scale clusters

–  create a unified solution for batch and interactive tasks

–  encourage reusing and sharing components

•  To meet these goals, the architecture relies on four stacked layers and builds on top of service-oriented architectures (SOA)

Page 53: SEASR eScience 2008

Meandre: Basic Single Server

Page 54: SEASR eScience 2008

Meandre MDX: Cloud Computing

•  Servers can be

–  instantiated on demand

–  disposed when done or on demand

•  A cluster is formed by at least one server

•  The Meandre Distributed Exchange (MDX)

–  Orchestrates operational integrity by managing cluster configuration and membership using a shared database resource.
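One way to picture membership managed through a shared database is a single table that servers write to when they are instantiated and delete from when they are disposed. The sketch below is an assumption-laden illustration, not the MDX schema or protocol: an in-memory SQLite database stands in for the shared database, and the table name `servers`, the host names, and the helper functions are all hypothetical.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the shared database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE servers (host TEXT, port INTEGER, PRIMARY KEY (host, port))")


def register(host, port):
    """A server joins the cluster when it is instantiated."""
    db.execute("INSERT OR IGNORE INTO servers VALUES (?, ?)", (host, port))
    db.commit()


def dispose(host, port):
    """A server leaves the cluster when it is disposed."""
    db.execute("DELETE FROM servers WHERE host = ? AND port = ?", (host, port))
    db.commit()


def members():
    """Every server sees the same membership list through the shared table."""
    return db.execute("SELECT host, port FROM servers ORDER BY host, port").fetchall()


register("node-a", 1714)
register("node-b", 1714)
dispose("node-a", 1714)
print(members())
```

With at least one row in the table there is a cluster, which matches the statement that a cluster is formed by at least one server.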

Page 55: SEASR eScience 2008

Meandre MDX: The Picture

[Figure: MDX Backbone]

Page 56: SEASR eScience 2008

Meandre MDX: The Architecture

•  Virtualization infrastructure

–  Provides uniform access to the underlying execution environment. It relies on virtualization of machines and the usage of Java for hardware abstraction.

•  IO standardization

–  A unified layer provides access to shared data stores, distributed file-system, specialized metadata stores, and access to other service-oriented architecture gateways.

•  Data-intensive flow infrastructure

–  Provides the basic Meandre execution engine for data-intensive flows, component repositories and discovery mechanisms, extensible plugins, and web user interfaces (webUIs).

•  Interaction layer

–  Can provide self-contained applications via webUIs, create plugins for third-party services, interact with the embedding application that relies on the Meandre engine, or provide services to the cloud.

Page 57: SEASR eScience 2008

Meandre MDX: The Experiment

•  Experimental Prototype

–  Designed and built to validate viability of MDX cluster

–  Using VMware Server 2.0 on three identical hosts with

•  Windows Server 2003

•  Two quad-core 2.8 GHz Xeon processors

•  1600 MHz front-side bus

•  32 GB of RAM

•  4 TB of RAID 5 disk

Page 58: SEASR eScience 2008

Meandre MDX: The Experiment

•  Experimental Prototype

–  Eight virtual machine instances were created on each of two hosts with

•  32-bit Ubuntu 8.04 Linux

•  3 GB of RAM dedicated to each instance

•  1 physical processor core assigned to each VM

•  Each VM instance equipped to run a Meandre MDX server using Sun's Java 1.5 JVM

–  The third physical host supported 2 virtual machine instances with

•  32-bit Ubuntu 8.04 Linux

•  3 GB of RAM dedicated to each instance

•  1 physical processor core assigned to each VM

•  A highly available MySQL database and an HTTP load-balancing facility

Page 59: SEASR eScience 2008

Meandre MDX: The Experiment

•  We conducted three different experiments

–  All three were based on the same flow shown earlier in the ZigZag example with a single change to make the single line of text into 250,000 lines of text for each iteration of the flow.

–  The first test was designed to test the scalability of a single Meandre server.

–  Concurrent flows running on a standalone engine on a log/log scale; each iteration of the flow pushed 250,000 lines of text

Page 60: SEASR eScience 2008

Meandre MDX: The Experiment

•  We conducted three different experiments

–  All three were based on the same flow shown earlier in the ZigZag example with a single change to make the single line of text into 250,000 lines of text for each iteration of the flow.

–  The second experiment was run against a virtual Meandre cluster consisting of 16 Meandre servers.

–  Concurrent flows running on a standalone engine on a log/log scale; each iteration of the flow pushed 1 line of text

Page 61: SEASR eScience 2008

Meandre MDX: The Experiment

•  We conducted three different experiments

–  All three were based on the same flow shown earlier in the ZigZag example with a single change to make the single line of text into 250,000 lines of text for each iteration of the flow.

–  The third experiment was run against a virtual Meandre cluster consisting of 16 Meandre servers.

–  Concurrent flows running on a standalone engine on a log/log scale; each iteration of the flow pushed 250,000 lines of text

Page 62: SEASR eScience 2008

Meandre MDX: The Experiment

•  We conducted three different experiments

–  The first test clearly shows

•  The average time per flow increased linearly with the number of concurrent flows

–  The next experiments clearly show

•  Cluster throughput grows linearly with the number of Meandre servers available

Page 63: SEASR eScience 2008

Upcoming Events

•  SEASR 2009 workshop

–  The workshop is organized to provide expanded opportunities for learning, knowledge sharing, and support and is intended to provide sufficient introduction and support so that teams can implement a study using SEASR.

–  The workshop is intended for institutional teams of scholars from the Humanities.

–  The workshop will include communication and work from a team’s home campus as well as face-to-face meeting on the University of Illinois campus.

Page 64: SEASR eScience 2008

Meandre: Semantic-Driven Data-Intensive Flows in the Clouds

Xavier Llora, Bernie Acs, Loretta Auvil, Boris Capitanu, Michael Welge, David Goldberg

National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign

{xllora, acs1, lauvil, capitanu, mwelge, deg}@illinois.edu