Integrating HPC Resources, Services, and Cyberinfrastructure to Develop Science Applications Using Web Application Frameworks

M. P. Thomas (1,2), C. Cheng (2), S. More (2), and H. Shah (2)

(1) Department of Computer Science, San Diego State University, San Diego, CA, USA
(2) Computational Sciences Research Center, San Diego State University, San Diego, CA, USA

Abstract – The Cyberinfrastructure Web Application Framework (CyberWeb) simplifies access to the heterogeneous computational environments required by high-performance computing applications. CyberWeb has three core components: a Pylons Web 2.0 framework, including XML, JavaScript, AJAX, Google APIs, social networks, and security; a database and Web interface for configuring installations, applications, users, and remote services; and a job distribution service framework for task execution and management. The CyberWeb design philosophy includes a “plug-n-play” mode, in which applications dynamically discover modifications to the system and automatically reload new components, and a “non-invasive” philosophy: no software is required to be installed on remote resources; instead, CyberWeb interfaces to existing software and services. CyberWeb supports basic job functions (accounts, authentication, execution, task history); operates in heterogeneous environments ranging from remote large-scale systems (XSEDE/TeraGrid) to local systems; and hosts applications built on top of the core framework (ocean modeling, thermochemistry, education). In this paper we present the CyberWeb architecture, highlighting the database and JODIS architectures, and demonstrate its usefulness with application examples.

1. Introduction

Advances in the technologies and languages used for parallel and distributed computing have often presented the science researcher with the challenge of migrating a model or application to increasingly complex systems. Many science applications require tremendously large compute, data, and archival resources and high-speed networks in order to manage results, or the model is legacy code that is stable but no longer ports to the high-end systems that are available. Cyberinfrastructure (CI) integrates hardware and software for computing, data management and information retrieval, and visualization and analysis, using interoperable software/middleware and services based on Web and Internet technologies. The NSF's Cyberinfrastructure Framework for 21st Century Science and Engineering (CF21) sets an ambitious goal: the next generation of cyberinfrastructure software must seamlessly couple high-end and low-end CI resources, networks, and services, with users and applications using these commodity Internet and Web technologies [1]. The US DOE held a workshop in 2008 addressing the grand challenges and limitations that exist today in the field of high-resolution climate and Earth modeling systems [2]. The outcomes of these efforts have helped to define the requirements for the next generation of HPC applications: new or updated models are needed that can take advantage of these cyberinfrastructure-based environments; the NSF and DOE need to construct and support this cyberinfrastructure over the long run; and new tools and libraries are needed to facilitate the development of these applications.

There are many efforts to develop common tools that can be reused at all layers of the science gateway architecture, including services to resources (data, compute, visualization, network), middleware, and user interfaces such as portals and science gateways. Science gateways (and computational environments) are terms used to characterize the systems and tools that facilitate the utilization of CI by science applications; gateways typically involve a user interface [3][4]. This research is part of this effort: the software developed is contributed to the suite of tools developed by the NSF-funded Open Grid Computing Environments (OGCE) project [5], which focuses on the development of gateways and tools. In this paper we describe advances made to the SDSU Cyberinfrastructure Web Application Framework (CyberWeb, [6]), which is designed to simplify the development of advanced computational environments (CEs) used by high-performance computing (HPC) applications and science gateways. CyberWeb improves on standard CI toolkit functions (job execution, account management, task history, GSI authentication, etc.) by hosting all applications as Web services, portal Web pages, or Web 2.0 gadgets. CyberWeb is being used to develop applications that need to operate across large-scale grids such as the XSEDE/TeraGrid, local university clusters, and commercial or public systems.

In this paper, an overview of the CyberWeb architecture is presented in Section 2, highlighting the database and JODIS architectures. Section 2.5 presents installation and deployment experiences, and Section 3 presents application examples.

2. CyberWeb Architecture

The Cyberinfrastructure Web Application Framework (CyberWeb) architecture is shown in Figure 1. It reflects the standard 3-tier architecture found in systems that connect clients (human, computer) to remote resources via middleware. Front-end clients can be applications, services, or humans (browsers, desktop apps, command-line interfaces). The backend tier (cyberinfrastructure) includes local services, Web services, other applications, and remote grid, computing, and data services. CyberWeb uses the Pylons Web Application Framework [7] for its core Web 2.0 services.

2.1 Pylons Web Application Framework

Pylons is a component-based, lightweight Web 2.0 application framework that uses only Web 2.0 technologies, including WSGI (Web Server Gateway Interface), relational databases, XML, JavaScript, AJAX, Google Gadgets, social networks, and security. Any number of components and libraries developed by other projects are part of the system, and new ones are easily integrated.

Pylons uses the model-view-controller (MVC) request-response architecture (Figure 1). The Model contains the data used by applications; often the model refers to database tables. The View reads the data from the model and displays it to the user. The Controller manages the logic of the application, activates views to display data to the user, or parses information from the user and stores it in the models. CyberWeb application developers work with all three components. The MVC approach decouples the services layer from the logic of the code behind it, allowing the same application to be hosted as different service types (Web service, Web page, XML request, iGoogle gadget).

Pylons supports dynamic routing via Routes (a Python version of Rails routing), which is used for mapping URLs to controllers/actions and for generating URLs. Routes makes it easy to create “pretty” and concise RESTful (REpresentational State Transfer) URIs (uniform resource identifiers). Template packages are used to build dynamic Web pages from a variety of sources: HTML files, XML generators, scripts (Perl, PHP), or templates.
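
As a minimal sketch of what such a route map looks like in a Pylons project (the jobs URL pattern and controller names below are illustrative, not part of the CyberWeb codebase):

    # config/routing.py -- illustrative Routes map for a Pylons application
    from routes import Mapper

    def make_map():
        map = Mapper()
        # RESTful-style URL: GET /jobs/42 -> controller 'jobs', action 'show', id='42'
        map.connect('/jobs/{id}', controller='jobs', action='show')
        # Default Pylons convention: /{controller}/{action}/{id}
        map.connect('/{controller}/{action}/{id}')
        return map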

2.1.1 Key Features

Pylons has several features that are beneficial to science application and gateway developers. A key philosophy of the Pylons project is that external components can be plugged into the framework with only the minimal code base necessary, which facilitates customization. Although the features of Pylons are as rich as the libraries and modules that are available to run with it, a few key features of importance to this research are listed here.

Routing and “beautiful URLs:” This feature makes it easy to create “pretty” and concise RESTful (REpresentational State Transfer) URIs (uniform resource identifiers). With RESTful URIs, the information being transferred is stateless and independent of resource details.

SQLAlchemy database: Pylons supports many databases, including SQLAlchemy, which is a Python SQL toolkit and Object Relational Mapper (ORM). ORMs map the database structure to objects (a brief sketch follows this list).

Interfaces to multiple template packages: Templates are used to build Web pages in a dynamic manner. Pylons allows any type of HTML file, XML generator, or template module to be used. This allows application developers flexibility in choosing how they want to build their application.

Dynamic update and interactive debugger: An important feature of Pylons is the “reload” feature: it supports dynamic class loading for all MVC layers; modifications to the code are recompiled and reloaded into the server while keeping the server live.

Web 2.0 and Open Social interfaces: Pylons' RESTful Web services allow Pylons components to be published as services, widgets and gadgets, and desktop applications. Pylons interfaces to other toolkits, including the Google Application Engine and OpenSocial gadgets, and PyWebKitGtk (an API for developers to program WebKit/Gtk).

In addition, Pylons provides AJAX/JavaScript support for libraries such as the Yahoo! UI Library (YUI). YUI, CSS, AJAX, and JavaScript are used by CyberWeb for nearly all demo Web pages.
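
As a hedged illustration of the ORM mapping mentioned above (modern SQLAlchemy 1.4+ syntax; the Resource table and its columns are simplified stand-ins, not the actual CyberWeb schema, which is summarized in Table 1):

    # Illustrative SQLAlchemy ORM mapping: a table becomes a Python class
    from sqlalchemy import Column, Integer, String, create_engine
    from sqlalchemy.orm import declarative_base, sessionmaker

    Base = declarative_base()

    class Resource(Base):
        __tablename__ = 'resources'
        id = Column(Integer, primary_key=True)
        name = Column(String)       # e.g. 'anthill'
        address = Column(String)    # DNS name or IP address

    engine = create_engine('sqlite:///:memory:')  # SQLite, as used by CyberWeb
    Base.metadata.create_all(engine)
    session = sessionmaker(bind=engine)()
    session.add(Resource(name='anthill', address='anthill.sdsu.edu'))
    print(session.query(Resource).filter_by(name='anthill').one().address)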

Figure 1. CyberWeb architecture, showing the client, middleware, and backend resource cloud. The middleware layer includes the Pylons framework, the CyberWeb database, and JODIS services [6].

2.2 Model and Database Services

At the core of CyberWeb is its database system, where most of its configuration data is stored (users, hosts, authentication, jobs, job history, etc.). In addition to using this database to store session and user information, a key design goal was to develop a database architecture that can be easily modified, populated dynamically, and then used to configure and build virtual organizations (VOs). Furthermore, the system must operate in a “plug-n-play” mode: once modifications are committed, the new information must be readily available to the relevant components of the system. This includes adding compute and archival resources, defining and naming services, and configuring these services on a resource. Once set up, all internal services use this database to discover active resources, the operational status of services, the ACL of a CWuser for that service or resource, and relevant user preferences.

2.2.1 Schema Design

The schema design was based on an evaluation of schemas used by multiple projects (TeraGrid [8]; Open Grid Forum [9]; and W3C/IETF standards). The requirement that the schema be simple, and useful to non-database experts developing small applications, drove the design approach: simplify and reduce the number of tables and elements to a minimal set. The assumption is that an application developer can expand the initial database to meet project requirements. The perspective is that of a high-performance computing environment: a resource is typically a host computer (cluster, archival); services run on these resources (SSH, FTP); and users have access to them via authentication such as username/password or GSI certificates.

The database is initialized using a JSON input file, and can be populated dynamically. The database design has four key components: (1) SQLAlchemy, a Python SQL toolkit and Object Relational Mapper (ORM) that interfaces to multiple RDBs; (2) SQLite, a lightweight, easy-to-install database whose data is stored in memory within the server (note that CyberWeb has also used MySQL, which is desirable for larger databases); (3) the Model, which is coded into a Python module and loaded into the Pylons server at startup; and (4) the data, which is initialized using a flat text file written in JavaScript Object Notation (JSON) format and easily edited to seed the database.
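
A seed file of this kind might look like the following sketch (the table and field names are illustrative; the actual model categories are listed in Table 1):

    {
      "resources": [
        {"name": "anthill", "address": "anthill.sdsu.edu"}
      ],
      "services": [
        {"name": "ssh", "type": "authentication", "resource": "anthill"}
      ],
      "users": [
        {"username": "cwdemo", "groups": ["developer"]}
      ]
    }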

Because the model is defined in an ASCII file, the developer can easily add new tables to, or modify existing tables in, the initialization database, or make modifications at run time. All of the example entries shown in Table 1 can be modified, or new ones added, via the Database Admin Interface (CW-DAI), described below. However, caution should be used to ensure that core services are not adversely affected; a good practice is to define all core components and services in the initialization database.

If the codebase is loaded into a developer environment, the file can be viewed easily with JSON markup tags; if using an SVN repository, then an SVN viewer (e.g., Trac) allows for easy viewing of the model. This is a key advantage for developers who are not database experts. As part of the codebase, there is a demo portal which contains examples of how to query the model (database) for common tasks.
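
As a hypothetical sketch of such a query, written against an assumed module layout (the Session, Resource, and Service names mirror the tables in Table 1, but this is not the published CyberWeb code):

    # Hypothetical model query: find resources hosting an active SSH service
    from cyberweb.model import Session, Resource, Service  # assumed layout

    def active_ssh_resources():
        session = Session()
        return (session.query(Resource)
                       .join(Service)
                       .filter(Service.name == 'ssh',
                               Service.status == 'active')
                       .all())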

2.2.2 Database Admin Interface

The database can be managed via the SQLite (or MySQL) command-line interface. CyberWeb also has a comprehensive Web-based Database Admin Interface (DAI) that is used to dynamically add, delete, modify, configure, and test resources, the services that will run on them, and a users/groups access control list (ACL). The DAI module is designed as a rich user interface, built with jQuery, Ajax, and JavaScript, that guides the user through operations such as adding, editing, or deleting any entity. Figure 2 captures a use case of the workflow used to configure and activate accounts for CyberWeb users (CWusers) on a remote compute resource.

CyberWeb securely manages accounts (HTTPS is used, sensitive data is encrypted and stored outside Web space, and no passwords are stored), maps CWuser accounts to accounts on remote resources, and tracks preferred authentication schemes (e.g. PKI/SSH, GSI/SSH). All users have PKI server-side credentials; GSI credentials can be uploaded or obtained from a MyProxy server. For PKI authentication, CyberWeb configures a password-less SSH service for the CyberWeb user. To activate the resource for the CWuser, a secure test is performed using an SSH command. The CWuser must have an account on the remote resource.

Table 1. List of CyberWeb model categories and table names.

- Group (tables: Users, Groups): CyberWeb user accounts (chosen by the user); Groups are used for authentication and authorization (e.g. admin, developer, application).
- Accounts (table: Account): Accounts on remote resources; owned by a CyberWeb user account.
- Protocol (table: Protocol): HTTP/HTTPS, TCP/IP, SSH, GSISSH.
- Services (tables: ServiceType, Name, Service): Used to define service types (authentication, queue, application, archival) and names of specific services (SSH is of type authentication); a Service (e.g. an SSH Service) has a type and name, and is installed onto a Resource.
- Queue Systems (tables: QueueType, Name, Info, QueueService): Used to define queuing service types (batch, condor, grid) and names of specific services (LSF, PBS, Torque, SGE); a QueueService (e.g. an SGE Service) has a type, name, and info, and is installed onto a Resource.
- Resources (table: Resource): Defines the compute, archival, and network resources used (primarily) for remote job execution. A Resource has a DN or IP address, and is used (typically) to host Services.
- Job (table: Job): CyberWeb tracks internal Jobs and Tasks, maintaining job history. This is independent of the job IDs used on remote resources.
- Message (tables: MessageTypes, Message): Used for communication among CyberWeb users and applications; message types include news, events, and job notification; a Message has a type.

2.3 Job Distribution Service (JODIS)

The job distribution Web service framework (JODIS) allows CyberWeb applications to distribute jobs across several campus compute clusters, each running a different resource manager and each controlled by a different system administrator. Its main duty is to distribute application workloads across heterogeneous computing systems by abstracting the middleware and resource management systems. The JODIS Web service application framework, based on the master/worker design pattern and built from common commodity software, allows us to bridge these systems transparently to the developer and user. It has been tightly coupled with the SDSU Cyberinfrastructure Web Application Toolkit (CyberWeb) framework, making it a full-scale Web application [10].

2.3.1 JODIS APIs

Job management: The job management API is the one component of JODIS that does not ease development by abstracting a layer in the application stack; instead, this API tracks and manages jobs run with the system. It aggregates every aspect of a job: the kind of job, how many tasks it has, when it was run, and its run time. We believe all of these aspects are useful for the administrator of the portal and will ultimately be used to measure how well an application is running.

Job queue: This API allows the developer to interface to any queuing system. The current system includes Condor, PBS, and Sun Grid Engine (SGE); there are many more in use at SDSU and elsewhere, each with varying syntax and command-line parameters. The JODIS job queuing API abstracts these, allowing the developer to make one call to a system's queuing application regardless of what that may be. This allows a developer to quickly move to a new system or handle system configuration changes.
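
A minimal sketch of such an abstraction layer, in the spirit of the JODIS job queue API (the class, method, and command templates here are hypothetical):

    # Illustrative queue-abstraction sketch: one submit() call, many schedulers
    class QueueAdapter(object):
        """Maps a generic submit call onto a scheduler-specific command."""
        submit_cmds = {'pbs': 'qsub {script}',
                       'sge': 'qsub {script}',
                       'condor': 'condor_submit {script}'}

        def __init__(self, queue_type):
            self.queue_type = queue_type

        def submit(self, script):
            # The developer makes one call regardless of the underlying system.
            return self.submit_cmds[self.queue_type].format(script=script)

    print(QueueAdapter('condor').submit('flame3d.sub'))  # condor_submit flame3d.sub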

File transfer: JODIS also provides a file transfer API. The API mimics a secure copy command, allowing a developer to move and copy files from one machine to the next regardless of the protocol used to connect. A big use case for this functionality is easy on-the-fly deployment of an application and transfer of data as needed by the application. A second common use is allowing users to access their data results and move them back to another machine, possibly for visualization, archiving, or sharing with colleagues. The file transfer view discussed in Section 3.1 makes extensive use of the JODIS file transfer API.
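
Since CyberWeb already depends on Paramiko (Section 2.5), a hedged sketch of an scp-like transfer might look as follows (the function name, host, and paths are placeholders):

    # Sketch of an scp-like copy over SSH using Paramiko
    import paramiko

    def copy_to(host, user, keyfile, local_path, remote_path):
        client = paramiko.SSHClient()
        client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
        client.connect(host, username=user, key_filename=keyfile)  # password-less PKI
        sftp = client.open_sftp()
        sftp.put(local_path, remote_path)  # behaves like 'scp local host:remote'
        sftp.close()
        client.close()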

Raw system calls: JODIS allows the developer to run raw commands on the remote resources as if he or she were at the command prompt. This gives the developer the ultimate flexibility when developing applications using CyberWeb and JODIS: a user can call the raw method and pass in the command to run. Unfortunately, this is done synchronously; the user must wait for the command to return before an HTTP response is sent back.
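
The synchronous behavior noted above can be seen in a sketch like the following (again using Paramiko; the blocking read is what forces the HTTP response to wait):

    # Sketch of a synchronous raw command over SSH
    import paramiko

    def raw(host, user, keyfile, command):
        client = paramiko.SSHClient()
        client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
        client.connect(host, username=user, key_filename=keyfile)
        stdin, stdout, stderr = client.exec_command(command)
        output = stdout.read()  # blocks until the remote command finishes
        client.close()
        return output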

2.3.2 JODIS Services

JODIS consists of multiple services offered through a Web 2.0 server environment. Providing each component as a service allows JODIS to scale horizontally. In the Web application environment, these services work together to provide an end-to-end job dispatching service simultaneously to multiple clients. The architecture of the JODIS system can be found in Figure 1. The Web server environment (based on CyberWeb, see below) provides users with methods of communicating with JODIS, using either a Python client API or a Web service to access the Job Service. The Web service allows a wide variety of applications to interact with JODIS regardless of location, device, or programming language. The Job Service provides a majority of the user-accessible function calls and manages a user's jobs regardless of the resource. The Resource Service works on the back end as a singleton to manage the various connections between JODIS and the compute resources being used. The ability of JODIS to gather usage information and use this data to predict job runtimes and to select where to run a job provides a useful approach to running MTC jobs.

Job Services: Clients primarily interact with the JODIS Job Service API, which is responsible for integrating all the services that make up JODIS. This service is used for job submission and monitoring. It wraps the resource management systems and abstracts the complexities involved with job submission, such as job syntax and the tracking and management of all jobs across the different resources. CyberWeb interfaces to batch queue systems (LSF, PBS, SGE) and schedulers (Condor, SGE). JODIS uses a job runtime “guesstimation” to forecast job runtimes, and a distribution policy to dynamically choose which compute resource to use for each job.

Resource Service: The Resource Service provides one essential function to JODIS: controlling middleware. It offers communication to the compute resources and client targets via Secure Shell (SSH) or GSI-enabled SSH. This service leverages the CyberWeb database for fast and flexible storage of resource metadata, user account information, and access control data. It can also be used to stage files on the various compute resources and target machines.

Figure 2. Admin flowchart for configuring user accounts on a resource.

Client Services: JODIS hosts a general client Web service for authorized job submission. Clients interact with JODIS directly using the Python API or, more popularly, through the RESTful Web service interface. JODIS also hosts a WSDL that allows users to find the service as well as keep up to date on the latest API. Developers can extend the client service for specific applications with the use of the Job Builder client interface. The client service can hook into the JODIS Job Service to provide functions such as pre-processing, post-processing, and more. The client service is responsible for clustering the tasks for each resource, which are then passed as a collection of jobs to the JODIS Job Service and distributed. An example of the JODIS job cycle can be found in Figure 3, and an example of the client service is described below in Section 3.4.
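
As a hypothetical illustration of the client-side view this provides (the jodis.client module, its URL, and its method names are assumptions made for the sketch, not the published API):

    # Hypothetical JODIS Python-client usage
    from jodis.client import JobService  # assumed module

    svc = JobService('https://cyberweb.example.edu/jodis')  # placeholder URL
    job = svc.submit(resource='anthill', queue='sge',
                     script='flame3d.sub', tasks=128)
    print(svc.status(job.id))  # poll the Job Service for the job state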

2.4 Authentication

CyberWeb security works at two levels: requests and responses processed by the Pylons Web server, and transactions performed on behalf of a CWuser on a remote resource or service using PKI or GSI authentication. To authenticate users, CyberWeb uses the Pylons AuthKit module [7], which is a complete authentication and authorization framework for WSGI applications, written specifically to provide Pylons with a flexible framework for managing these tasks. This module queries the user table in the CyberWeb database. Once the user is authenticated, the user's ID and group information are stored with the user session data. Based on the ACL for an application, CyberWeb automatically requires the user to re-authenticate their session after a 5-minute period of idle time.

Decorators have been built using AuthKit to allow developers to permission applications based on the user or the user's group. These decorators wrap access to the decorated method and redirect the user to the login page or a “permission denied” page, depending on whether the user is not authenticated or lacks permission. The advantage of using these decorators is that the developer has fine-grained control over access and can control user interaction based on the application. For example, a user might be able to create a workload to be run, but must have permission before the user can submit this workload.
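
A self-contained sketch of a decorator of this kind is shown below; CyberWeb builds its decorators on AuthKit, so the names, session layout, and return values here are purely illustrative:

    # Illustrative group-permission decorator (not the AuthKit implementation)
    from functools import wraps

    def require_group(group):
        """Send unauthenticated users to login; reject users outside `group`."""
        def decorator(action):
            @wraps(action)
            def wrapper(self, *args, **kwargs):
                user = self.session.get('user')
                if user is None:
                    return 'redirect: /login'          # not authenticated
                if group not in user.get('groups', []):
                    return 'error: permission denied'  # authenticated, no rights
                return action(self, *args, **kwargs)
            return wrapper
        return decorator

    class WorkloadController(object):
        def __init__(self, session):
            self.session = session

        @require_group('developer')
        def submit(self, workload_id):
            return 'submitted %s' % workload_id

    c = WorkloadController({'user': {'id': 'cwdemo', 'groups': ['developer']}})
    print(c.submit('wl-1'))  # -> submitted wl-1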

For remote transactions, two methods are currently supported: Public Key Infrastructure (PKI) and the Grid Security Infrastructure (GSI) developed by the Globus project [11]. PKI is used for SSH (secure shell) transactions; users must set up password-less access in order to use a resource (see Section 2.2.2). A credential associated with the server is created for all CWuser accounts. If the user has an account on a resource, then CyberWeb will put a copy of the credential into the appropriate file on the remote machine. For grid-based transactions, CyberWeb uses the MyProxy API [11] to obtain a user proxy certificate, which is stored out of Web space, and gsissh to connect to remote resources for job execution.

2.5 Installation and Deployment

CyberWeb applications can be run from most modern operating systems: it has been tested on Linux, Windows, and Mac OS X. CyberWeb is written using the Pylons Web framework and minimally requires two Python libraries: (a) Pylons, the Web framework at the core of CyberWeb, and (b) Paramiko, a Python SSH library. After installing the prerequisites, download the CyberWeb source code: it can be run from any directory on the machine. An installation challenge is where the CyberWeb server is located; network access to resources behind firewalls is a common issue. For example, in order to access the clusters at San Diego State University, servers must be on the campus network.

A key aspect of the OGCE project is its use of Maven [12] to install the entire framework and all software dependencies automatically for the client. In Python, this is done using the Easy Install package. Easy Install automatically downloads, builds, installs, and manages Python packages; it installs the tar or jar equivalent, called a Python egg. The Easy Install software comes with commands (configured by Pylons) that allow you to bundle your application into an egg for distribution. Earlier experiments with extending the Paste installation egg to include all software and versions needed for a complete CyberWeb package were successful, and this will be done for future releases, so that a system administrator will be able to fire up CyberWeb out of the box.

Figure 3. Diagram of a typical JODIS job cycle for job submission from a CHEQS client.

Figure 4. Log-log plot of the runtime vs. number of nodes as a function of the number of tasks on anthill.sdsu.edu [15].

Settings for the CyberWeb installation can be found in development.ini or production.ini, depending on the environment. In many aspects, these two files mirror each other; the main difference between the two should be the debug variable. The production.ini file should have debug set to false, which prevents the stack trace from being displayed when an uncaught error is encountered.
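
A hedged sketch of the relevant fragment of such a file (the section layout follows common Pylons/Paste conventions; the egg name and database URL are illustrative):

    # production.ini (fragment)
    [DEFAULT]
    debug = false
    # debug = false suppresses interactive stack traces in production

    [app:main]
    use = egg:CyberWeb
    # assumed egg name; %(here)s expands to the top-level directory
    sqlalchemy.url = sqlite:///%(here)s/cyberweb.db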

Straight out of the box, the development.ini file directs CyberWeb to create a folder in the top-level directory of CyberWeb to store user data files. The top-level directory is referred to by the variable “%(here)s”. This CyberWeb data directory stores users' public/private keys at the top level, with user directories and data in a directory below. The top-level CyberWeb data directory is not accessible via CyberWeb or Paster, which is the same level of security that the Linux operating system uses to store these keys.

New Pylons projects are built using Paste, a Web development and application installation tool similar to Ant or Make. It creates Python middleware modules (router, config, and mapper template files), a simple Python Web server that can process WSGI requests, and the directory structure needed for a full site (to be populated by the developer). A single command installs the application template (all the codebase needed, including the directory structure, basic files, and data to support a new portal):

    paster create --template=pylons pyGateSite

The portal server is started using the following command, which launches the welcome page:

    paster serve --reload development.ini

Alternatively, you can install all software into a “virtual environment,” which is essentially an isolated working directory with all libraries contained within it. This allows developers to work independently of the host operating system, keep stable version control, and locally manage development tasks. It also facilitates distribution and deployment of a project, and will be useful for providing a reusable toolkit or for deploying a Pylons application to a cloud computing resource.

3. Application Examples

CyberWeb has been used to develop Web services, social gadgets, and portals in the biology, geospatial mapping, and ocean application areas. The examples below demonstrate client-server use of Web services, portals, and visualization Web pages.

3.1 CyberWeb Demo Portal

As part of our development of CyberWeb, a user portal has been developed that demonstrates how to build CI-enabled applications and is used for testing services. Figure 5 is a composite image of the UCOAM portal (see Section 3.2), and many of its components are contained in the demo portal. The portal system is intended to be used as a startup application for new projects. The portal's capabilities include: secure login with group access control to selected pages; user and account management with the MyCyberWeb customization section; MyProxy credential management; job tracking and history; data management using the JODIS file transfer widget; and interactive Web pages for job execution and file management that are also part of a planned unit test system for checking resources and services. The portal also hosts the Database Admin pages. Through the MyCyberWeb interface, a user can update preferences, send and receive messages, track jobs, and configure authentication on remote resources.

3.2 Unified Curvilinear Ocean and Atmospheric Model (UCOAM)

The UCOAM model differs significantly from the traditional approach, where the use of Cartesian coordinates forces the model to simulate terrain as a series of steps. UCOAM utilizes a full three-dimensional curvilinear transformation, which has been shown to have greater accuracy than similar models and to achieve results more efficiently [13][14]. CyberWeb is being used to develop a computational environment for the UCOAM project with the following features: a community access portal for expert and non-expert users (see Figure 5); running and managing jobs, and interacting with long-running jobs; managing input and output files; quick visualization of results; and publishing of computational Web services to be used by other systems, such as larger climate models [10].

3.3 Data Viewer Tool

The Data Visualization tool is a custom application that is being developed for the UCOAM application. It provides portal users with the ability to run a suite of post-processing tasks, including analysis of parallel performance and of model simulations. The Data Viewer utilizes several CyberWeb components (database, JODIS) and emerging technologies including Python, Pylons, AJAX, JavaScript, jQuery, gnuplot, and the gnuplot.py library. The tool allows users to select a job through the data browser and then run the Job Analyzer, which presents a job summary and a variety of plot options. These plotting options support different plots for viewing timing/performance, or contours of velocity, temperature, or pressure changes (see Figure 5). The plotting options are stored dynamically and are extensible. Once the user selects a particular plot, an Ajax call is sent to the server, which uses JODIS to send commands to the archival host to create the image remotely using gnuplot scripts. The image file is then returned to the client browser in the form of data bytes for display.

3.4 CyberCHEQS

This example highlights the use of JODIS and CyberWeb: a simple job distribution Web service (JODIS) runs many-task computing (MTC) [15] jobs for the CyberCHEQS thermochemistry application [16]. The tasks were run in a heterogeneous environment (Figure 4), using TeraGrid and SDSU machines simultaneously, on hundreds of nodes, moving data and results between remote resources. Using CyberWeb, the resolution of Flame3D was increased from 10^3 to more than 10^6 control volumes, with a significant reduction in run times (by a factor of over 40 for a large test case of 128 processors and 10^7 tasks).

Figure 5. The UCOAM Simulation Portal. This composite image shows file management, user/account management, simulation job submission and history, and visualization.

4. Conclusions and Future Work

CyberWeb applications have been developed that demonstrate its usefulness for client-server access to Web services, portals, and visualization Web pages. CyberWeb is able to operate in a “plug-n-play” mode: applications dynamically discover modifications to the database, check the service status, and, when available, use that service or resource. It has a “non-invasive” philosophy: no software is required to be installed on remote resources; rather, interfaces using existing software and services are created. CyberWeb has been used to develop applications in ocean modeling, thermochemistry, and education, and it has been used to access large-scale (XSEDE/TeraGrid) and local compute and archival systems.

Future plans include enhancing or developing: visualization tools; interactive job management; full third-party file transfer; integration of cloud computing resources; an interface for modifying the initialization JSON data for the database; additional authentication systems such as OAuth or Kerberos (both are used in the scientific cyberinfrastructure community); and additional queuing systems. Plans are underway for the software to be bundled into a Python egg, and the software will be added to the NSF-funded Open Grid Computing Environments (OGCE) project [5], which develops gateways and tools.

5. Acknowledgements

This work was supported in part by the National Science Foundation (Grants #0753283, #0721656), the Department of Energy (DOE #DE-GC02-02ER25516), and with resources available through an NSF-funded XSEDE (TeraGrid) allocation (TG-CCR110014) and the San Diego State University Computational Sciences Research Center.

6. References

[1] NSF Vision: Cyberinfrastructure Framework for 21st Century Science and Engineering (CF21). Available at: http://www.nsf.gov/pubs/2010/nsf10015/nsf10015.pdf
[2] Washington, W. Challenges in Climate Change Science and the Role of Computing at the Extreme Scale. Proc. of the Workshop on Climate Science, Nov. 2008, Washington, D.C.
[3] Wilkins-Diehr, N. Special Issue: Science Gateways—Common Community Interfaces to Grid Resources: Editorials. Concurrency and Computation: Practice & Experience 19, 6 (Apr. 2007), 743-749.
[4] Alameda, J., et al. The Open Grid Computing Environments collaboration: portlets and services for science gateways. Concurrency and Computation: Practice & Experience, Volume 19, Issue 6, p. 1078, 2007.
[5] NSF NMI Open Grid Computing Environments (OGCE) project. Website last accessed on 1-Jan-06 at http://www.ogce.org.
[6] Thomas, M. P., Cheng, C. Development of Web Application Frameworks for Cyberinfrastructure. CSRC Technical Report, 2010.
[7] Pylons Web Application Framework website. Available: http://pylonshq.com/
[8] The TeraGrid MDS4 Schemas project page: http://dms.teragrid.org/mediawiki/index.php?title=MDS4_Schemas
[9] Open Grid Forum GLUE 2.0 XML Schema project page: https://forge.ogf.org/sf/projects/glue-wg
[10] Thomas, M. P., Castillo, J. E. Development of a Computational Environment for the General Curvilinear Ocean Model. 2009 J. Phys.: Conf. Ser. 180.
[11] The Globus Project. Available: http://www.globus.org
[12] Maven Software Project. Available: http://maven.apache.org/
[13] Abouali, M., Castillo, J. E. “Unified Curvilinear Ocean Atmosphere Model (UCOAM): A Vertical Velocity Case Study.” Journal of Mathematical and Computer Modeling (accepted March 17, 2011), DOI: 10.1016/j.mcm.2011.03.023.
[14] Thomas, M. P., Castillo, J. E. “Parallelization of the 3D Unified Curvilinear Coastal Ocean Model: Initial Results.” Accepted for publication, International Conference on Computational Science and Its Applications (ICCSA '12), 2012.
[15] Thomas, M. P., Cheng, C., Edwards, R. A., Paolini, C. P. Improving the Performance of Thermochemical Computations Using Many-Task Computing Methods. CSRC Technical Report, 2010.
[16] Paolini, C. P. and Bhattacharjee, S. A Web Service Infrastructure for Distributed Chemical Equilibrium Computation. Proceedings of the 6th International Conference on Computational Heat and Mass Transfer (ICCHMT), May 18–21, 2009, Guangzhou, China.