“CC*Data: National Cyberinfrastructure for Scientific Data Analysis at Scale (SciDAS)” NSF CC* [Award #1659300]
Claris Castillo, Fan Jiang, Wenzhao Zhang, Paul Ruth, Mert Cevik, Michael Stealey, Hong Yi, Gokkul Sudan, Terrell Russell, Jason Coposky, Ray Idaszak (RENCI); Alex Feltus, Melissa Smith, Ben Shealy, William Poehlman, Nicholas Mills (Clemson University); Stephen Ficklin, Tyler Biggs, Josh Burns (WSU); Blaine Lee (NCBI); Anthony Castronova (CUAHSI); Pabitra Dash (Virginia Tech); Jeff Horsburgh, David Tarboton (Utah State University); Mats Rynge (USC)
F. Alex Feltus, Ph.D.
Clemson Dept. of Genetics & Biochemistry (Associate Professor); Allele Systems LLC (CEO); Internet2 Board of Trustees (Member)
[email protected]
OSG All Hands Meeting: 20 March 2018 @ 4pm
Transcript
Soon, many pharmacies, subways, hospitals, research labs, public health facilities, police stations, etc. will have a DNA sequencer generating Exabytes of data in aggregate each week.
qPCR (a 20th-century technology) is about to be replaced by 21st-century high-throughput DNA sequencing.
Distributed petascale/exascale systems will never be turn-key, so our focus is on the “informaticist”.
SciDAS is Building a Scalable Ecosystem that Works for Researchers
Giga-/Tera-/Peta-scale
END USERS
www.iowaturfgrass.org
“If you build it they will come”
“They will help you build it while using it.”
Geneticist
Storage Engineer
Software Engineer
SciDAS embeds active high-scale data users in the cyberinfrastructure engineering stack.
SciDAS Ecosystem: CI, clouds and community platforms
End User Facing
Cloud/infrastructure/compute
Networks
Storage infrastructure
+100 sites +1500 users
CLI
SciDAS Breakdown
Leverage network capabilities to enable efficient data movement
Infrastructure-agnostic abstraction layer for compute
Programmable and policy-able storage-agnostic abstraction layer for data
User interface that is consumable and builds on reproducible artifacts
+100 sites +1500 users
CLI
SciApps: Towards Reproducible Science
• Scientific applications will be available in the form of SciApps “virtual appliances” (CC-ADAMANT, [works15])
• Concept borrowed from the ‘virtual appliance’, i.e., a virtual machine image
• A SciApp is configured with the application software needed to reproduce an experiment with the highest fidelity possible
• A SciApp may consist of multiple containers spanning a virtual network across multiple clouds and CI facilities
• Parameterizable templates will be provided with sane defaults to meet the needs of the scientists
[works15] Enabling Workflow Repeatability with Virtualization Support, Fan Jiang et al., Workshop on Workflows of Large-Scale Science, Supercomputing Conference (SC15), Austin, Texas, 2015.
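The “parameterizable templates with sane defaults” idea can be sketched as a simple merge of user overrides onto a default launch specification. This is a hypothetical illustration; the field names and image name below are invented, not the actual SciDAS schema.

```python
# Hypothetical sketch of a parameterizable SciApp template: user-supplied
# parameters are merged over sane defaults before the appliance is launched.
# All field names and values here are illustrative, not the real SciDAS schema.

DEFAULTS = {
    "image": "scidas/sciapp:latest",     # container image (hypothetical name)
    "cpus": 4,
    "memory_gb": 8,
    "network": "single-site",            # or "multi-cloud" virtual network
}

def render_sciapp(overrides=None):
    """Return a launch spec: sane defaults overlaid with user overrides."""
    spec = dict(DEFAULTS)
    spec.update(overrides or {})
    return spec

# A scientist overrides only what their experiment needs:
spec = render_sciapp({"cpus": 64, "network": "multi-cloud"})
print(spec["cpus"], spec["memory_gb"])  # 64 8
```

The design point is that a scientist never writes a full deployment description; they touch only the parameters their experiment needs.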
Jupyter SciApp
• Jupyter notebooks are a critical tool for enabling reproducible science
• A Jupyter notebook provides a workspace that presents data and code in a cohesive environment
• Mesos was initially developed to target single-owner cluster environments
• RENCI extended Mesos to support geo-distributed environments
• RENCI developed a meta-orchestrator to which independent Mesos clusters subscribe, and which handles meta-scheduling on behalf of frameworks
• Mesos extensions: (2500 + 200 lines of code)
• Meta-orchestrator (600 lines of code)
• Mesos as a resource discovery layer
(Wenzhao Zhang)
[Diagram: a Requester contacts the Geo-Mesos meta-orchestrator, which performs resource discovery across independent Mesos clusters (each running Chronos and Marathon). Step 1: identify resources available among registered Mesos clusters.]
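The resource-discovery role of the meta-orchestrator can be illustrated as follows. This is a minimal sketch, not the actual RENCI extensions: the class and attribute names are invented, and real clusters would advertise resources through Mesos offers rather than fixed fields.

```python
# Minimal sketch (hypothetical names, not the RENCI code) of a
# meta-orchestrator's resource-discovery step: independent Mesos clusters
# subscribe, and the orchestrator finds clusters that can satisfy a request.

class MesosCluster:
    """Stand-in for a registered Mesos cluster reporting free resources."""
    def __init__(self, name, free_cpus, free_mem_gb):
        self.name = name
        self.free_cpus = free_cpus
        self.free_mem_gb = free_mem_gb

class MetaOrchestrator:
    def __init__(self):
        self.clusters = []

    def subscribe(self, cluster):
        """Independent clusters subscribe to the meta-orchestrator."""
        self.clusters.append(cluster)

    def discover(self, cpus, mem_gb):
        """Step 1: identify registered clusters that can host the request."""
        return [c for c in self.clusters
                if c.free_cpus >= cpus and c.free_mem_gb >= mem_gb]

orch = MetaOrchestrator()
orch.subscribe(MesosCluster("cluster-1", free_cpus=16, free_mem_gb=64))
orch.subscribe(MesosCluster("cluster-2", free_cpus=128, free_mem_gb=512))
candidates = orch.discover(cpus=32, mem_gb=128)
print([c.name for c in candidates])  # ['cluster-2']
```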
iRODS: Integrated Rule Oriented Data System
• Virtualization: system metadata encodes rich information
• Rule engine programmed with rules to enact policies
• Data federation
• iRODS provides a unified namespace over SciDAS storage infrastructure across Clemson, RENCI, and WSU
• iRODS enables the policy-driven management critical to data-sharing collaborations in SciDAS
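The rule-engine idea can be illustrated abstractly: rules are attached to data-grid events and fire automatically when those events occur. This toy Python sketch is not the iRODS rule language; only the event name `acPostProcForPut` (iRODS's post-ingest policy enforcement point) is real, and the resource names are invented.

```python
# Toy illustration (not actual iRODS rule language) of the policy idea:
# a rule engine fires registered rules on data-grid events, e.g. enforcing
# replication when an object is ingested. Resource names are hypothetical.

class RuleEngine:
    def __init__(self):
        self.rules = {}          # event name -> list of rule functions

    def on(self, event, rule):
        self.rules.setdefault(event, []).append(rule)

    def fire(self, event, obj):
        for rule in self.rules.get(event, []):
            rule(obj)

def replicate_to_second_site(obj):
    """Policy: every ingested object gets a replica at a second site."""
    obj["replicas"].append("wsu-resource")

engine = RuleEngine()
# acPostProcForPut is a real iRODS policy enforcement point (fires after put).
engine.on("acPostProcForPut", replicate_to_second_site)

data_object = {"path": "/scidasZone/home/data.fastq",
               "replicas": ["clemson-resource"]}
engine.fire("acPostProcForPut", data_object)
print(data_object["replicas"])  # ['clemson-resource', 'wsu-resource']
```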
WAN
SciDAS: Wide-area iRODS deployment
SciDAS provided a very significant first for the iRODS community
[Diagram: a traditional iRODS deployment has one iCAT and its resource nodes within a single iRODS Zone; the desired SciDAS deployment spans sites, with iRODS federation as a workaround. An iRODS Zone is an administrative unit with full localized governance.]
iRODS over the Wide Area Network (WAN)
• The iRODS team connected iRODS to a MariaDB Galera Cluster to provide a multi-master, distributed iRODS catalog.
[Diagram: SciDAS Zone backed by a MariaDB Galera cluster]
“Distributing the iRODS Catalog: a way forward”, M. Stealey et al., iRODS User Group Meeting (UGM), Netherlands, 2017.
(Michael Stealey, Jason Coposky, Terrell Russell)
SciDAS Breakdown
Leverage network capabilities to enable efficient data movement
Infrastructure-agnostic abstraction layer for compute
Programmable and policy-able storage-agnostic abstraction layer for data
User interface that is consumable and builds on reproducible artifacts
+100 sites +1500 users
CLI
Network-aware Data and Compute Management
• Layer-2 connectivity between NCBI and the iRODS data grid via stitch-ports through dynamic networks provided by ExoGENI (CC-ADAMANT)
• PerfSONAR network deeply integrated with middleware management
• Deployment of a PerfSONAR network across compute and storage nodes to drive intelligent placement of compute and data
• Deploying PerfSONAR infrastructure across commercial clouds comes with financial challenges
• Development of a Network-Optimizer algorithm (as-a-service): bridges PerfSONAR, compute, and storage networks; identifies optimal placement of compute or data based on network monitoring information
• Development of an iRODS shim service: maps the logical path of a data object to the hostname of the iRODS resource node hosting it
• Procurement of FIONA nodes for NRP integration: Clemson done, WSU done, RENCI in progress
SciDAS Data and Network Infrastructure: Network-Aware Compute Placement
[Diagram: storage/FIONA nodes (1 PB, 2 PB, 1 PB) connected by the network; a cost-aware optimizer calls the iRODS shim (aaS) and PerfSONAR shim (aaS) APIs.]
1. Find the host with the best network connectivity for transferring a data object among a set of candidate hosts. Params: (_logicalpath_, {_hostA_, _hostB_, …})
2. Get the _hostname_ of the iRODS resource node hosting _logicalpath_.
3. Get the _networkconnectivity_ (bandwidth) between two hosts (_hostA_, _hostB_).
The PerfSONAR shim service maintains a mapping of _hostname_ to PerfSONAR node.
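The three-step placement flow can be sketched end to end. This is a minimal illustration under stated assumptions: the hostnames, paths, and bandwidth figures are invented, and the two dictionaries stand in for calls to the iRODS shim and PerfSONAR shim APIs.

```python
# Minimal sketch (hypothetical names and toy data) of the placement flow:
# resolve a data object's host via an iRODS-shim lookup, then use
# PerfSONAR-style bandwidth measurements to pick the best candidate host.

# iRODS shim stand-in: logical path -> hostname of the hosting resource node.
IRODS_SHIM = {"/scidasZone/home/rnaseq/sample1.fastq": "storage.clemson.example"}

# PerfSONAR shim stand-in: measured bandwidth (Gbps) between host pairs.
BANDWIDTH = {
    ("storage.clemson.example", "compute.renci.example"): 9.2,
    ("storage.clemson.example", "compute.wsu.example"): 3.1,
}

def network_optimizer(logical_path, candidate_hosts):
    """Steps 1-3: find the candidate with the best connectivity to the data."""
    source = IRODS_SHIM[logical_path]                      # step 2
    def bw(host):                                          # step 3
        return BANDWIDTH.get((source, host), 0.0)
    return max(candidate_hosts, key=bw)                    # step 1

best = network_optimizer("/scidasZone/home/rnaseq/sample1.fastq",
                         ["compute.renci.example", "compute.wsu.example"])
print(best)  # compute.renci.example
```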
SciDAS Current Directions
• Need more democratized compute for scale-up
• We are working with OSG to flock out of SciDAS onto OSG; experimenting with XSEDE allocations; FUTURE: SLATE? NRP-ML? More commercial cloud
• Building Gene Co-expression Networks (GCNs) with the HTCondor (OSG-KINC) SciApp: yeast unit test constructed many times; lung tumor/normal GCN almost complete; 100 TB (raw) Arabidopsis experiment underway; more species in the queue
• Building additional SciApps: SLURM/Nextflow Dockerized “Tuxedo Suite” genomics applications; FUTURE: distributed visualization and deep learning apps we have developed
• iRODS data grid production: build iRODS data retention policies (qualitative, then quantitative); leveraging 1000 species’ indexed genomes with metadata; 100 TB Arabidopsis raw-data input experiment; data movement optimization (NCBI, WSU, RENCI, Clemson, StashCache?)
• Solve authentication issues: CILogon
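The core computation behind GCN construction can be illustrated with toy data: correlate every pair of gene expression profiles and keep the pairs above a similarity threshold. This is a pure-Python sketch of the general technique, not KINC itself, and the gene names and values are invented.

```python
# Illustrative sketch (toy data, not KINC) of the core idea behind gene
# co-expression network (GCN) construction: compute pairwise Pearson
# correlation of expression profiles and keep edges above a threshold.
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def build_gcn(expression, threshold=0.9):
    """Return edges (gene pairs) whose |correlation| meets the threshold."""
    return [(g1, g2) for g1, g2 in combinations(expression, 2)
            if abs(pearson(expression[g1], expression[g2])) >= threshold]

# Toy expression matrix: gene -> expression level across 4 samples.
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.1, 3.9, 6.0, 8.2],   # tracks geneA closely
    "geneC": [5.0, 1.0, 4.0, 2.0],   # uncorrelated
}
print(build_gcn(expr))  # [('geneA', 'geneB')]
```

At tera-/peta-scale the all-pairs computation is what HTCondor distributes: each job handles a block of gene pairs.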
Bio Application
Biology community needs addressed by SciDAS
• Containerized Scientific Workflows: Gene Expression Matrix (GEM) construction, Gene Co-expression Network (GCN) construction, and Gene Network Visualization (GNV). These have broad applicability.
• Automated Resource Wrangling: these workflows need to map onto heterogeneous, user-authorized cloud resources to run at tera-/peta-scale, which will be the normal range for many researchers. We are stress-testing using public RNA-seq data from hundreds of organisms.
• Computational Experimental Design and User Intervention: a UI is needed to predict the wall time of high-scale experiments, monitor jobs, and allow user control of job resets and prioritization.
• Collaborative Data Organization: WAN collaborative storage with standardized data retention policies (iRODS).
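The three-stage workflow named above (GEM, then GCN, then GNV) can be sketched as a chain where each stage consumes the previous stage's artifact. The stage internals here are deliberately trivial stand-ins (the real stages are containerized applications); gene names, values, and the distance rule are invented for illustration.

```python
# Hypothetical sketch of chaining the three workflow stages (GEM -> GCN -> GNV);
# stage bodies are toy stand-ins, the point is the artifact hand-off.

def build_gem(raw_rnaseq_runs):
    """GEM: quantify each run into a gene-by-sample expression matrix."""
    genes = sorted({g for run in raw_rnaseq_runs for g in run})
    return {g: [run.get(g, 0.0) for run in raw_rnaseq_runs] for g in genes}

def build_gcn(gem, threshold=5.0):
    """GCN: connect genes with similar profiles (toy distance rule)."""
    names = list(gem)
    return [(a, b) for i, a in enumerate(names) for b in names[i + 1:]
            if sum(abs(x - y) for x, y in zip(gem[a], gem[b])) < threshold]

def render_gnv(gcn):
    """GNV: emit a trivial edge-list description for visualization."""
    return "\n".join(f"{a} -- {b}" for a, b in gcn)

# Two toy RNA-seq runs: gene -> expression level.
runs = [{"geneA": 1.0, "geneB": 1.2}, {"geneA": 4.0, "geneB": 4.1}]
print(render_gnv(build_gcn(build_gem(runs))))  # geneA -- geneB
```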
Petascale Bio Driver: What are the ancestral Gene Interaction Networks for all Organisms?