Part III: Computational Workflows in Wings/Pegasus
AAAI-08 Tutorial on Computational Workflows for Large-Scale Artificial Intelligence Research
Yolanda Gil, USC Information Sciences Institute
AAAI-08 Tutorial, July 13, 2008

Our Approach
Express analysis as distributed workflows
• Data analysis as distributed application
User-centric workflow refinement process
• Start with high-level problem description, add layers of detail, map to distributed execution environment
Knowledge-rich descriptions of workflows -- OWL/RDF
• Descriptions of input data and data products (aka “metadata”)
• Models of components in terms of I/O data and their function
Automation of resource allocation and optimization
• Efficient scheduling algorithms for workflow graphs
• Optimization techniques of broad applicability
Build on distributed computing research -- GRID
• Designed, by definition, to be robust, secure, flexible

The Wings/Pegasus Workflow System
[Gil et al 07; Deelman et al 03; Deelman et al 05; Kim et al 08; Gil et al forthcoming]
WINGS: Knowledge-based workflow environment (www.isi.edu/ikcap/wings)
• Ontology-based reasoning on workflows and data (W3C’s OWL)
• Workflow library of useful analyses
• Proactive assistance + automation
• Execution-independent workflows
Pegasus: Automated workflow refinement and execution (pegasus.isi.edu)
• Optimize for performance, cost, reliability
• Assign execution resources
• Manage execution through DAGMan
Grid services (condor.uwisc.edu, www.globus.org)
• Daily operational use in many domains
• Secure and controlled sharing of distributed services, computing, data
• Scalable service-oriented architecture
• Commercial quality, open source

Wings: Workflow Instance Generation and Selection
[Figure: WINGS architecture. Workflow creation, workflow selection, and data selection draw on workflow libraries, data repositories, application components, and ontologies (OWL). WINGS produces a workflow template and then a workflow instance; Pegasus turns the instance into an executable workflow that runs on DAGMan/Grid.]
Knowledge sources shown in the figure:
• Ontologies (OWL): domain terms, component types, workflow products
• Workflow libraries: workflow templates specify complex analysis sequences; workflow instances specify data
• Data repositories: preexisting data collections; workflow execution results
• Component specification: specifies data requirements; specifies execution requirements
Users shown in the figure: student, seasoned NL researcher, algorithm developer
Example requests:
• “Show me workflows that classify datasets”
• “Run this workflow with the weather1980 data set”
• “Validate this workflow based on the component specs”
• “Here is a new classification algorithm, has a parameter for smoothing, is compiled for MPI”

[Figure: software stack of the workflow system. The legend distinguishes the workflow system from National Middleware Infrastructure (NMI) software.]
Wings: workflow validation, data/component selection, metadata generation, workflow generation
Pegasus: site selection, replica selection, workflow optimization, workflow submission
National Middleware Infrastructure (NMI) software:
• Globus RLS: replica management
• GRAM: remote submission
• GridFTP: data transfer
• Condor DAGMan: execution engine
• Condor-G: job manager
• Nagios: monitoring probes
All software is open source

Workflow Structure
We take to heart the separation of “programming” from “analysis” activities
– Components are designed by programmers and can be complex (and need testing, debugging, loops should terminate, etc.)
– Workflows are composed by non-programmers and should have simple structure: the focus is on selecting application components and data
Therefore, our workflow structure is very streamlined
• Only iterations handled are parallel data processing pipelines
• Only conditionals handled are data-driven component selections
• Standard workflow languages offer much more complex constructs
Workflow structure designed to:
• Be accessible to users
• Facilitate automation and failure recovery

Core Workflow Concepts
[Figure: example workflow graph with components C1, C2, C3 linked by files F1 to F6.]
Workflow consists of
• Components: software to be executed
• Links: data flow among components
Directed Acyclic Graphs (DAGs)
• Facilitate automation, esp. execution monitoring and repair
Data always handled through files
Special handling of some control constructs and loops (more on this later)
• Choices of components
• Iterations over data sets
Layered workflow refinement process
• Select application components -> select data -> select execution resources
• Each layer adds more information to the same basic workflow structure
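
The concepts above can be illustrated with a minimal sketch. It assumes a hypothetical in-memory representation (the `Workflow` class and its fields are illustrative, not the Wings data model), and the wiring of files F1 to F6 between C1, C2, and C3 is invented for the example because the figure's exact links are not recoverable. Components are nodes, files are links, and a topological sort gives a valid execution order and rejects cyclic graphs.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field


@dataclass
class Workflow:
    """Hypothetical in-memory workflow: components linked by files."""
    # component name -> (input files, output files)
    components: dict = field(default_factory=dict)

    def add(self, name, inputs, outputs):
        self.components[name] = (list(inputs), list(outputs))

    def execution_order(self):
        """Return components in dependency order; raise if the graph has a cycle."""
        producer = {f: c for c, (_, outs) in self.components.items() for f in outs}
        children = defaultdict(set)
        indegree = {c: 0 for c in self.components}
        for c, (ins, _) in self.components.items():
            for f in ins:
                p = producer.get(f)  # files with no producer are workflow inputs
                if p is not None and c not in children[p]:
                    children[p].add(c)
                    indegree[c] += 1
        ready = deque(sorted(c for c, d in indegree.items() if d == 0))
        order = []
        while ready:
            c = ready.popleft()
            order.append(c)
            for child in sorted(children[c]):
                indegree[child] -= 1
                if indegree[child] == 0:
                    ready.append(child)
        if len(order) != len(self.components):
            raise ValueError("workflow is not a DAG")
        return order


# Illustrative wiring only (an assumption, not taken from the slide).
wf = Workflow()
wf.add("C1", inputs=["F1"], outputs=["F2", "F3"])
wf.add("C2", inputs=["F2"], outputs=["F4"])
wf.add("C3", inputs=["F3", "F4"], outputs=["F5", "F6"])
print(wf.execution_order())
```

Because data is always handled through files, the dependency graph can be derived from file names alone, which is what makes monitoring and repair of a failed step straightforward in this kind of structure.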

Workflow Abstraction Layers
We use several layers of description of workflows

WINGS: Workflow Representation

[Figure: two linked ontologies from the seismic hazard domain; ". . ." marks elisions in the figure.]
Domain Ontology
• Hazard levels: Hazard-Level, Hazard-Level-with-SA, Hazard-Level-with-PGA, Hazard-Level-with-PGV, . . .
• Hazard levels with SA: Hazard-Level-with-SA-Median, Hazard-Level-with-SA-Std-Dev, Hazard-Level-with-SA-Prob-Exc
• Hazard-Level-with-Median, Hazard-Level-with-Std-Dev, . . .
• F2-Hazard-Level, F2-SA-Median-wrt-VS30, . . .
• Parameters: Parameter, Fault-Type, Basin-Depth, Distance, . . .
• IMR-Input-Parameter, Field-2000-Input-Parameter
• IMR, IMT, probability-function
Ontology of Components
• Compute-Hazard-Level-given-IMR-input-parameters
• Compute-Hazard-Level-with-SA-given-IMR-input-parameters
• Compute-Hazard-Level-with-PGA-given-IMR-input-parameters
• Compute-Hazard-Level-with-PGV-given-IMR-input-parameters
• Compute-Hazard-Level-with-SA-Median-given-IMR-input-parameters
• Compute-Hazard-Level-with-SA-Std-Dev-given-IMR-input-parameters
• Compute-Hazard-Level-with-SA-Prob-Exc-given-IMR-input-parameters
• Compute-F2-Hazard-Level-given-Field-2000-input-parameters
• Compute-F2-SA-Median-given-Field-2000-input-parameters
• Compute-F2-SA-Median-wrt-Distance-JB-given-Fault-Type-&-Basin-Depth-&-…
• Compute-F2-SA-MEDIAN-wrt-VS30-given-Fault-Type-&-Basin-Depth-&-…
• F2-operation-SA-Median-Distance-JB, F2-operation-SA-Median-VS30
• . . .

WINGS: Representing Components
Inputs and outputs through files
• Files are typed
Each input is uniquely identified by a file descriptor (~ parameterID)
Ordered lists of file descriptors for both I and O
Any input or output can be defined as a file collection
• Same file type
• Unspecified cardinality
• Ordered
[Figure: component C-one with file descriptors D1, D2, D3; component C-many with file collection descriptor DC11 and file descriptors D12, D13.]

Data Descriptions
Metadata of different kinds can be organized in ontology
Files represented as instances and classified in ontology according to their metadata
File collections also represented as instances and defined as ordered sets of file instances
A file Skolem is created for each class as a representative instance (more on this later)
Similarly, a file collection Skolem is created for each class
[Figure: application-specific metadata ontologies (content metadata, format metadata). File instances Kim-Homepage and Gil-Homepage; file collection IKCAP-pages containing Gil-Homepage, Kim-Homepage, …; class labels EHS-T and EHCS-T.]

A Component in a Workflow Template
Nodes correspond to individual application components
Links include file descriptors for origin and destination and a file Skolem
[Figure: component C-one (file descriptors D1, D2, D3) placed in a template with component C67 (file descriptors D6, D7). Nodes N1, N2, N3 are connected by links L1 to L4, which carry file Skolems FS-A, FS-B, FS-C, FS-D.]
Notation: “S” marks a Skolem

File Collections in a Workflow Template
Links that include file descriptors that are collections refer to file collection Skolems
Using the same file Skolem ID or file collection Skolem ID in different links indicates identity
[Figure: node N1 holds component C-many (descriptors DC11, D12, D13). Links L1, L2, L3 carry file collection Skolem FCS-A and file Skolems FS-B, FS-C.]

Iteration Over File Collections in a Workflow Template
Iteration over sets compactly represented with single nodes that contain component collections
Will be expanded to as many jobs as files are specified for the executable workflow
Links capture formation of file collections as input
[Figure: component-collection node NC1 (C-one, descriptors D1, D2, D3) receives file collection Skolems FCS-G and FCS-K over links L1 and L2. Its output collection FCS-Z travels over L3 to node N2 (C-many, descriptor D12), whose output FS-Y leaves over L4. Expanded, C-one runs on (G1, K1), (G2, K2), …, (G88, K88) and produces Z1, Z2, …, Z88, which C-many consumes to produce Y1.]

Iteration With a Constant in a Workflow Template
Nodes that represent component collections can take the same file from the same link when the link contains a file Skolem instead of a file collection Skolem
[Figure: same layout as the previous slide, except that the second input of NC1 is the file Skolem FS-K1 rather than a file collection Skolem. Expanded, C-one runs on (G1, K1), (G2, K1), …, (G88, K1) and produces Z1, Z2, …, Z88, which C-many consumes to produce Y1.]
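
The two iteration slides can be sketched as one expansion routine. This is a minimal illustration, not Wings code: the function name, the dictionary layout, and the `Z<i>` output naming are assumptions. A list stands for a file collection and is iterated element by element; a plain string stands for a single file that every expanded job reuses.

```python
def expand_component_collection(component, inputs):
    """Expand one component-collection node into individual jobs.

    `inputs` maps a file descriptor to either a list (a file collection,
    iterated element by element) or a single string (a constant file that
    every job reuses). All collections must have the same length.
    """
    lengths = {len(v) for v in inputs.values() if isinstance(v, list)}
    if len(lengths) != 1:
        raise ValueError("need at least one collection, all of equal length")
    n = lengths.pop()
    jobs = []
    for i in range(n):
        # Pick the i-th element of each collection; pass constants through.
        bound = {d: (v[i] if isinstance(v, list) else v) for d, v in inputs.items()}
        jobs.append({"component": component, "inputs": bound, "output": f"Z{i + 1}"})
    return jobs


G = [f"G{i}" for i in range(1, 89)]
K = [f"K{i}" for i in range(1, 89)]

pairwise = expand_component_collection("C-one", {"D1": G, "D2": K})
constant = expand_component_collection("C-one", {"D1": G, "D2": "K1"})

print(len(pairwise), pairwise[1]["inputs"])
print(len(constant), constant[87]["inputs"])
```

Keeping the template node compact and expanding only when the collection sizes are known is what lets one template cover data sets of any cardinality.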

Constraints on Workflow Templates
Constraints on number of elements in different collections
Constraints on files/collections of different workflow components
[Figure: constraint graph for CybershakeTemplate. Via hasLink the template has, among others, InputLink_SiteNameFile_to_BoxNameCheck, InputLink_RuptureVars_to_SeisgmogramGen, and InputLink_SGTCollforRup_to_SeismogramGen. Via hasFile these links refer to SiteNameFile, to the rupture variation Skolems (F-RV, C-RuptVars, CC-RuptureVariations), and to the SGT Skolems (F-SGT, C-SGT-forRups, CC-SGTs). hasSiteName ties SiteNameFile and SGTs to SiteName, hasN_Items ties collections to N_Rups, and isSameAs marks values that must be identical.]

Workflow Instances
Input data selected from the file library by querying for files of the type of file Skolems
Logical names created for new data products with metadata based on file Skolems
Compact Workflow Instance = WT + bindings
Easy to understand, and easily transformed into an expanded WI and a DAX for Pegasus
[Figure: workflow template with nodes N1 to N4 (components C-one, C67, C-plenty) and links L1 to L7 carrying Skolems FS-A, FS-B, F-C, FCS-D, FS-E, FS-F, FS-G. Bindings attach existing data (File85, File28, FileColl54) and new data products (F34254-05-06-08, F34255-05-06-08, F34256-05-06-08, F34257-05-06-08) to the Skolems.]
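
The idea "compact workflow instance = template + bindings" can be sketched as follows. Everything here is a hypothetical stand-in: the `FileSkolem` class, the file types (`Corpus`, `Rules`, `WordCount`), the choice of the first matching file, and the logical-name pattern are assumptions for illustration, not the Wings implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileSkolem:
    skolem_id: str   # e.g. "FS-A"
    file_type: str   # class of file the Skolem stands for


def bind_template(skolems, inputs, library, run_tag):
    """Return {skolem_id: file name} for one compact workflow instance.

    Skolems listed in `inputs` are bound to the first library file of the
    matching type; every other Skolem gets a fresh logical name.
    """
    bindings = {}
    for s in skolems:
        if s.skolem_id in inputs:
            matches = [name for name, ftype in library.items() if ftype == s.file_type]
            if not matches:
                raise LookupError(f"no file of type {s.file_type} for {s.skolem_id}")
            bindings[s.skolem_id] = sorted(matches)[0]
        else:
            # New data product: derive a logical name from the Skolem's type.
            bindings[s.skolem_id] = f"{s.file_type}-{s.skolem_id}-{run_tag}"
    return bindings


skolems = [
    FileSkolem("FS-A", "Corpus"),
    FileSkolem("FS-B", "Rules"),
    FileSkolem("FS-E", "WordCount"),
]
library = {"File85": "Corpus", "File28": "Rules"}
bindings = bind_template(skolems, {"FS-A", "FS-B"}, library, "05-06-08")
print(bindings)
```

Since the template itself is untouched, the same template can be paired with a different bindings table to run the analysis on other data.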

AUTOMATED WORKFLOW INSTANCE GENERATION IN WINGS

Workflow Instance Expressions
• Compact expression for efficient search and matching
• Expanded expression when further details are needed
[Figure: a workflow over Corpus and Kernel_Rules with steps Split, Filter_Rules, Prune_Rules, Binarize, Generate_Rule_Map, and Compile, and data products XRS_Rules, BRF_Rules, and Lexicon_Dictionary; several steps are marked 1…n. The instance binds the inputs to WSJ-2001 and KR-09-05.]

Expanded Workflow Instance
<rdf:RDF ..(xmlns definitions)....>
  <wflns:WorkflowInstance rdf:ID="WFT0b">
    <wflns:hasDescription rdf:datatype="http://www.w3.org/2001/XMLSchema#string">
      Count the number of unique words in a file
    </wflns:hasDescription>
    <wflns:hasNode rdf:resource="#N1"/>
    <wflns:hasNode rdf:resource="#N2"/>
    <wflns:hasLink rdf:resource="#L12"/>
    <wflns:hasLink rdf:resource="#L01"/>
    <wflns:hasLink rdf:resource="#L2Output"/>
  </wflns:WorkflowInstance>
  <wflns:InOutLink rdf:ID="L12">
    <wflns:hasOriginFileDescription rdf:resource="http://www.isi.edu/ikcap/wings/domains/linguistics/componentLibrary.owl#remDupesOutputFile"/>
    <wflns:hasFile rdf:resource="http://www.isi.edu/ikcap/wings/domains/linguistics/fileLibrary.owl#F12_WFT0b_1117161532484"/>
    <wflns:hasDestinationFileDescription rdf:resource="http://www.isi.edu/ikcap/wings/domains/linguistics/componentLibrary.owl#CountWordsInputFile"/>
    <wflns:hasDestinationNode>
      <wflns:Node rdf:ID="N2">
        <wflns:hasComponent rdf:resource="http://www.isi.edu/ikcap/wings/domains/linguistics/componentLibrary.owl#countWordsV1"/>
      </wflns:Node>
    </wflns:hasDestinationNode>
    <wflns:hasOriginNode>
      <wflns:Node rdf:ID="N1">
        <wflns:hasComponent rdf:resource="http://www.isi.edu/ikcap/wings/domains/linguistics/componentLibrary.owl#removeDuplicatesV1"/>
      </wflns:Node>
    </wflns:hasOriginNode>
  </wflns:InOutLink>
  <wflns:InputLink rdf:ID="L01">
    <wflns:hasDestinationNode rdf:resource="#N1"/>
    <wflns:hasFile rdf:resource="http://www.isi.edu/ikcap/wings/domains/linguistics/fileLibrary.owl#test_txt_WFT0b_1117161532484"/>
    <wflns:hasDestinationFileDescription rdf:resource="http://www.isi.edu/ikcap/wings/domains/linguistics/componentLibrary.owl#remDupesInputFile"/>
  </wflns:InputLink>
  <wflns:OutputLink rdf:ID="L2Output">
    <wflns:hasOriginFileDescription rdf:resource="http://www.isi.edu/ikcap/wings/domains/linguistics/componentLibrary.owl#CountWordsOutputFile"/>
    <wflns:hasOriginNode rdf:resource="#N2"/>
    <wflns:hasFile rdf:resource="http://www.isi.edu/ikcap/wings/domains/linguistics/fileLibrary.owl#F2Output_WFT0b_1117161532484"/>
  </wflns:OutputLink>
</rdf:RDF>

Workflow Instance: “dax” for Pegasus
<?xml version="1.0" encoding="UTF-8"?>
<!-- generated: 2004-08-18T10:53:01-05:00 -->
<adag xmlns="http://www.griphyn.org/chimera/DAX"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.griphyn.org/chimera/DAX http://www.griphyn.org/chimera/dax-1.8.xsd"
      version="1.7" count="1" index="0" name="WorkFlow0b">
  <!-- part 1: list of all files used (may be empty) -->
  <filename file="vahi.f.a" link="input"/>
  <filename file="vahi.f.b1" link="inout"/>
  <filename file="vahi.f.b2" link="output"/>
  <!-- part 2: definition of all jobs (at least one) -->
  <job id="ID000001" namespace="vds" name="removeDups" version="1.0" level="3" dv-namespace="vds" dv-name="top" dv-version="1.0">
    <argument>-a top -T60 -i <filename file="vahi.f.a"/> -o <filename file="vahi.f.b1"/> </argument>
    <uses file="vahi.f.a" link="input" dontRegister="false" dontTransfer="false"/>
    <uses file="vahi.f.b1" link="output" dontRegister="true" dontTransfer="true" temporaryHint="true"/>
  </job>
  <job id="ID000002" namespace="vds" name="countWords" version="1.0" level="2" dv-namespace="vds" dv-name="left" dv-version="1.0">
    <argument>-a left -T60 -i <filename file="vahi.f.b1"/> -o <filename file="vahi.f.b2"/> -p 0.5</argument>
    <uses file="vahi.f.b1" link="input" dontRegister="false" dontTransfer="false" temporaryHint="true"/>
    <uses file="vahi.f.b2" link="output" dontRegister="true" dontTransfer="true" temporaryHint="true"/>
  </job>
  <!-- part 3: list of control-flow dependencies (empty for single jobs) -->
  <child ref="ID000002">
    <parent ref="ID000001"/>
  </child>
</adag>
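
To make the three parts of the DAX concrete, here is a rough sketch of how such a document could be assembled with Python's standard `xml.etree.ElementTree`. It is deliberately simplified and is an assumption-laden illustration: the `build_dax` helper and its input layout are hypothetical, and the output omits namespaces, `<argument>` elements, and the register/transfer flags, so it is not validated against the DAX schema and is not what Wings or Pegasus emit.

```python
import xml.etree.ElementTree as ET


def build_dax(name, jobs, dependencies):
    """Build a simplified DAX-like document.

    jobs: list of dicts with keys id, name, inputs, outputs.
    dependencies: list of (parent_id, child_id) pairs.
    """
    adag = ET.Element("adag", {"name": name, "version": "1.7"})
    # Part 2 of the DAX: one <job> per component invocation.
    for job in jobs:
        j = ET.SubElement(adag, "job", {"id": job["id"], "name": job["name"]})
        for f in job["inputs"]:
            ET.SubElement(j, "uses", {"file": f, "link": "input"})
        for f in job["outputs"]:
            ET.SubElement(j, "uses", {"file": f, "link": "output"})
    # Part 3 of the DAX: control-flow dependencies, grouped per child.
    parents_of = {}
    for parent, child in dependencies:
        parents_of.setdefault(child, []).append(parent)
    for child, parents in parents_of.items():
        c = ET.SubElement(adag, "child", {"ref": child})
        for parent in parents:
            ET.SubElement(c, "parent", {"ref": parent})
    return adag


jobs = [
    {"id": "ID000001", "name": "removeDups",
     "inputs": ["vahi.f.a"], "outputs": ["vahi.f.b1"]},
    {"id": "ID000002", "name": "countWords",
     "inputs": ["vahi.f.b1"], "outputs": ["vahi.f.b2"]},
]
dax = build_dax("WorkFlow0b", jobs, [("ID000001", "ID000002")])
print(ET.tostring(dax, encoding="unicode"))
```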

AUTOMATED METADATA GENERATION IN WINGS

Metadata Reasoning for file name generation and workflow validation
Filename Generation
• Explicit representation of metadata in ontology (e.g. source id, rupture id)
• Propagate metadata attributes for all data products when creating workflow instance
• Names for intermediate files are created automatically from the metadata
Workflow Validation
• Explicit representation of metadata constraints (examples are shown below)
  – Constraints on individual files and collections
  – Constraints on component inputs and outputs
  – Constraints among components in a workflow
• Check constraints while generating workflow instantiations

Propagation of metadata for filename generation: an example
[Figure: component SeismogramGen_Li with inputs RVM, Rupture_variation, and SGT, and output Seismogram.]
RVM:
• 127_6.rvm (source_id: 127, rupture_id: 6)
Rupture_variation:
• 127_6.txt.variation-s0000-h0000 (source_id: 127, rupture_id: 6, slip_realization_#: 0, hypo_center_#: 1)
• 127_6.txt.variation-s0000-h0001 (source_id: 127, rupture_id: 6, slip_realization_#: 0, hypo_center_#: 1)
• …
SGT:
• FD_SGT/PAS_1/A/SGT161 (site_name: PAS, tensor_direction: 1, time_period: A, xyz_volumn_id: 161)
• …
Seismogram:
• Seismogram_PAS_127_6.grm (site_name: PAS, source_id: 127, rupture_id: 6)
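
The propagation in this example can be sketched in a few lines. This is an illustration under stated assumptions: the function names and dictionary keys are hypothetical, and the sketch assumes the `-s` and `-h` suffixes are the zero-padded slip realization and hypocenter numbers, which is how the file names on the slide read. Wings derives names from metadata held in its ontology; plain dictionaries stand in for that here.

```python
def rupture_variation_name(meta):
    """File name for a rupture variation, built from its metadata."""
    return "{source_id}_{rupture_id}.txt.variation-s{slip:04d}-h{hypo:04d}".format(
        source_id=meta["source_id"],
        rupture_id=meta["rupture_id"],
        slip=meta["slip_realization"],
        hypo=meta["hypo_center"],
    )


def seismogram_metadata(variation_meta, sgt_meta):
    """Propagate metadata from the inputs to the seismogram output."""
    return {
        "site_name": sgt_meta["site_name"],          # taken from the SGT input
        "source_id": variation_meta["source_id"],    # taken from the variation
        "rupture_id": variation_meta["rupture_id"],
    }


def seismogram_name(meta):
    """File name for a seismogram, built from its propagated metadata."""
    return "Seismogram_{site_name}_{source_id}_{rupture_id}.grm".format(**meta)


variation = {"source_id": 127, "rupture_id": 6, "slip_realization": 0, "hypo_center": 1}
sgt = {"site_name": "PAS", "tensor_direction": 1, "time_period": "A"}

print(rupture_variation_name(variation))
print(seismogram_name(seismogram_metadata(variation, sgt)))
```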

AUTOMATIC WORKFLOW GENERATION IN WINGS

Automatic Template-Based Workflow Generation Algorithm
Workflow request = Workflow Template + Seed Constraints
WR0: Workflow Template
[Figure: the workflow template of request WR0.]
WR0: Seed Constraints
  dataVariable5 data:contains data:Muti-party-communication
  dataVariable0 data:creator 5048
  dataVariable1 data:creator 5048
Generation pipeline (each step consumes the result of the previous one):
  unified well-formed request
  -> Seed workflow from request -> seeded workflows
  -> Find input data requirements -> binding-ready workflows
  -> Data source selection -> bound workflows
  -> Parameter selection -> configured workflows
  -> Workflow instantiation -> workflow instances
  -> Workflow grounding -> ground workflows
  -> Workflow ranking -> top-k workflows
  -> Workflow mapping -> executable workflows

Step 1: Workflow Template is Seeded
[Figure: the workflow generation pipeline from the algorithm overview, shown alongside the seeded template.]

Step 2: Backward Sweep
[Figure: the workflow generation pipeline from the algorithm overview, shown alongside the template.]

Step 3: Select Data Sources
[Figure: the workflow generation pipeline from the algorithm overview, shown alongside the template with data labels E-07 and S-NY.]

Step 4: Forward Sweep
[Figure: the workflow generation pipeline from the algorithm overview, shown alongside the template with data labels E-07 and S-NY.]

Step 5: Workflow Instantiation
[Figure: the workflow generation pipeline from the algorithm overview, shown alongside the workflow with data labels E-07, S-NY, Result-PartA, and Result-PartB.]

Step 6: Workflow Grounding
[Figure: the ground workflow, with data labels E-07, S-NY, Result-PartA, and Result-PartB, shown alongside the generation pipeline. Jobs are connected by parent dependencies; one job is shown as:]
<job id = “j42” name=“Neuman-BC”> <argument> -i E-07 17.5 -o ES-07….

Step 7: Workflow Ranking
W1: estimated exec time 3 hrs
W2: estimated exec time 20 hrs
W3: estimated exec time 3 d
W4: estimated exec time 5 hrs
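
As a small illustration of the ranking step, the sketch below selects the top-k candidates by estimated execution time. It assumes, for the example only, that shorter estimated time ranks higher; the slide shows the estimates but does not spell out the ranking criterion. The `top_k` helper is hypothetical.

```python
import heapq

# Estimated execution times from the slide, converted to hours (3 d = 72 h).
estimates_hours = {"W1": 3, "W2": 20, "W3": 72, "W4": 5}


def top_k(estimates, k):
    """Return the k workflow IDs with the smallest estimated execution time."""
    return heapq.nsmallest(k, estimates, key=estimates.get)


print(top_k(estimates_hours, 2))
```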

Step 8: Workflow Mapping
Ground workflow: 15 compute nodes devoid of resource assignment
Executable workflow: mapped to 3 sites
• 13 data stage-in nodes
• 11 compute nodes (1-2 & 5-6 reduced based on available intermediate data)
• 8 inter-site data transfers
• 14 data stage-out nodes to long-term storage
• 14 data registration nodes (data cataloging)
[Figure: the ground workflow graph with numbered compute nodes next to the executable workflow graph after mapping.]

Why Do We Automate All This? So You Don’t Have To

Request  # Binding-ready  # Bound     # Configured  # Calls  # Calls  # Calls  Workflow
ID       workflow         workflow    workflow      to (a)   to (b)   to (c)   generation
         candidates       candidates  candidates                               time
R1       6                8           8             1        6        8        5 s
R2       6                8           8             7        6        16       4 s
R3       6                24          24            7        6        48       7 s
R4       6                24          24            13       6        72       8 s
R5       18               64          48            7        18       128      22 s
R6       18               288         216           7        18       576      81 s
R7       18               16          12            7        18       32       10 s
R8       6                0           0             1        6        0        1 s

(a) c:find-DODs-given-output-requirements
(b) d:find-data-objects
(c) c:predict-DODs-given-input-requirements
Candidate columns: workflow candidates generated + considered (many are eliminated)
Calls to d: are queries about data; calls to c: are queries about tools

WINGS DEMO

Editing a Seed & Template, Generating a DAX
Workflow seed = Workflow Template + Seed Constraints
WR0: Workflow Template
[Figure: the workflow template of request WR0.]
WR0: Seed Constraints
  dataVariable5 data:contains data:Muti-party-communication
  dataVariable0 data:creator 5048
  dataVariable1 data:creator 5048
[Figure: the workflow generation pipeline from the algorithm overview; on this slide the output of finding input data requirements is labeled candidate workflows and the ground workflows are labeled DAXes.]

SCEC WORKFLOWS IN WINGS

Seismic Hazard Analysis in Southern California Earthquake Center (SCEC) [Slide from T. Jordan]
InSAR Image of the Hector Mine Earthquake
• A satellite-generated Interferometric Synthetic Aperture Radar (InSAR) image of the 1999 Hector Mine earthquake
• Shows the displacement field in the direction of radar imaging
• Each fringe (e.g., from red to red) corresponds to a few centimeters of displacement
[Figure: inputs to the Seismic Hazard Model: seismicity, paleoseismology, local site effects, geologic structure, faults, stress transfer, crustal motion, crustal deformation, seismic velocity structure, rupture dynamics.]

Reusable High-Level Workflow Templates
• Intensional descriptions of data sets
• Intensional descriptions of parallel computations
• Querying results of other data creation subworkflows
• Rich metadata descriptions for all data products

Workflows for Seismic Hazard Analysis [Gil et al 06; Kim et al 06; Gil et al 07]
Input data: a site and an earthquake forecast model
• Thousands of possible fault ruptures and rupture variations, each a file, unevenly distributed
• ~110,000 rupture variations to be simulated for a given site
High-level template combines 11 application codes
8048 application nodes in the workflow instance generated by Wings
24,135 nodes in the executable workflow generated by Pegasus, including:
• Data stage-in jobs, data stage-out jobs, data registration jobs
Executed in USC HPCC cluster (1820 nodes w/ dual processors), but only < 144 available
• Including MPI jobs, each runs on hundreds of processors for 25-33 hours
• Runtime was 1.9 CPU years
Provenance records kept throughout the generation and execution process for 100,000 workflow data products
DAX automatically generated from WINGS
14,639 jobs for 4,626 ruptures with 106,124 rupture variations for USC site

Summary: Creating Workflows with WINGS
Separates analysis spec from data
• Workflow template as reusable well-defined acceptable analysis process
• Workflow instance binds template to data for particular analyses
Ensures that the data complies with the component specifications and their constraints within the workflow
Represents data collections (nominal or otherwise) within the workflow specification
Automatically generates descriptions and metadata for new data products to be created by the workflow execution
Compact workflow instance is user-friendly and reusable
• Separates data provenance (workflow instance) and pedigree (workflow template)
Expands workflow instance into DAX for Pegasus, which creates the executable workflow

Key Benefits
Efficient and correct creation of new workflows
• By retrieving a template and filling in the data
Framework ensures adherence to methodology
• Represents as templates widely-accepted analysis methodologies
• Supports repeatability of experiments/analyses
• Enables controlled variations
Ensures better quality of data analysis results
• Attaches provenance and pedigree information

Ongoing and Future Work
Interactive assistance in creating valid workflow templates
• Based on CAT (Composition Analysis Tool) [Kim et al 05]
More sophisticated models of components
Automatic completion of workflow’s data conversion and formatting steps through AI planning techniques
Tracking new versions of components, invalidate data and workflows from old versions
Workflow template libraries
• Indexing, retrieval
Managing collections of workflows as part of an overall analysis activity
• E.g., parameter sweeping, variants of analysis

BACKUP SLIDES

Extension 1: Handle Collections of Collections
A set of ruptures, each with a set of variations
Each variation in a separate file
• For rupture 127_6 (source ID 127, rupture ID 6), there are 8 variations
• For rupture 20_0 (source ID 20, rupture ID 0), there are 1352 variations
[Figure: a flat list of variation files (127_6.txt.variation-s0000-h0000, 127_6.txt.variation-s0000-h0001, 127_6.txt.variation-s0001-h0000, 127_6.txt.variation-s0001-h0001, …, 20_0.txt.variation-s0000-h0000, …, 150_11.txt.variation-s0000-h0000, …) regrouped per rupture (127_6, 20_0, 150_11, …), each group paired with an SGT.]
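
The regrouping of a flat file list into a collection of collections can be sketched as below. The regular expression mirrors the rupture variation names shown on the slide; the `group_by_rupture` helper and the dictionary-of-lists result are illustrative assumptions rather than the Wings representation.

```python
import re
from collections import defaultdict

# Pattern follows the rupture variation names shown on the slide.
VARIATION_RE = re.compile(r"^(\d+)_(\d+)\.txt\.variation-s(\d{4})-h(\d{4})$")


def group_by_rupture(filenames):
    """Turn a flat file list into {(source_id, rupture_id): [files...]}."""
    groups = defaultdict(list)
    for name in filenames:
        m = VARIATION_RE.match(name)
        if m is None:
            raise ValueError(f"not a rupture variation file: {name}")
        groups[(int(m.group(1)), int(m.group(2)))].append(name)
    return {key: sorted(files) for key, files in groups.items()}


files = [
    "127_6.txt.variation-s0000-h0000",
    "127_6.txt.variation-s0000-h0001",
    "127_6.txt.variation-s0001-h0000",
    "127_6.txt.variation-s0001-h0001",
    "20_0.txt.variation-s0000-h0000",
    "150_11.txt.variation-s0000-h0000",
]
groups = group_by_rupture(files)
for rupture, members in groups.items():
    print(rupture, len(members))
```

Each inner list corresponds to one variation collection, and the outer dictionary plays the role of the collection of collections, so a per-rupture computation can iterate over the outer level while a per-variation computation iterates over the inner one.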

Extending Wings to Handle Collections of Collections
[Figure: type hierarchy and instances. Classes shown: File, Variation File, File Collection, Variation File Collection, and Collection of Collections, related by has-type links. Instance Ruptures-PAS is a collection of collections whose members include Vars_127_6 (127_6.txt.variation-s000-h000, 127_6.txt.variation-s000-h001, …) and Vars_127_7 (127_7.txt.variation-s000-h000, 127_7.txt.variation-s000-h001, …), each paired with an SGT.]
Wings Coll/Coll
[Figure: workflow template with the nodes SeismogramGen_Li (NC1) and PeakValCalc_Okaya (NC2), links L1 to L4, and nested collections of rupture variation (RupVar) and SGT files as inputs. The template expands into one SeisGen_Li / PeakValCalc chain per rupture, producing Seismograms_PAS_127_6.grm and PeakVals_allPAS_127_6.bsa, Seismograms_PAS_127_7.grm and PeakVals_allPAS_127_7.bsa, and Seismograms_PAS_151_11.grm and PeakVals_allPAS_151_11.bsa.]
Constraints (in OWL ontology)
Constraints on files
• Metadata attributes: data types and default values
  e.g. simulation_out_timesamples of SeisParamValsFile should be an integer, and its default value is 1801
• File name format with respect to metadata attributes
  e.g. rupture variation file: 127_6.txt.variation-s0002-h0000
  Format: <source_id>_<rupture_id>.txt.variation-s[4 digit slip_realization#]-h[4 digit hypo center #]
Constraints on collections and collections of collections
• Type of each element
• Relations between metadata of a collection and metadata of individual items
  e.g. each rupture variation has the same source/rupture IDs as the rupture variation collection
Component-level constraints on metadata attributes of input/output files or collections
• Deriving metadata of output files from metadata of input files
  e.g. the output of PeakValCalc_Okaya (SA output file) should have the same site name as the seismogram file
Template-level constraints on metadata attributes of files or collections
• Input/output files of different components can have the same metadata
  e.g. the RVM collection input for SeismogramGen_Li should have the same site name as the CollOfCollection rupture variations input
• Checking number of items in collections
  e.g. the number of RVM files and the number of rupture variation collections should be equal
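Two of the constraints listed on this slide can be illustrated in plain Python: the collection-level rule that every rupture variation carries the same source/rupture IDs as its collection, and the template-level rule that the number of RVM files equals the number of rupture variation collections. In Wings these constraints are expressed in OWL; the classes and function names below are hypothetical stand-ins chosen for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class RuptureVariationFile:
    source_id: int
    rupture_id: int
    slip_realization: int
    hypo_center: int


@dataclass
class RuptureVariationCollection:
    source_id: int
    rupture_id: int
    items: list = field(default_factory=list)


def collection_violations(collection):
    """Return the items whose source/rupture IDs differ from the collection's."""
    expected = (collection.source_id, collection.rupture_id)
    return [
        item for item in collection.items
        if (item.source_id, item.rupture_id) != expected
    ]


def counts_match(rvm_files, variation_collections):
    """Template-level check: one RVM file per rupture variation collection."""
    return len(rvm_files) == len(variation_collections)


good = RuptureVariationCollection(127, 6, [
    RuptureVariationFile(127, 6, 0, 0),
    RuptureVariationFile(127, 6, 0, 1),
])
# A variation of rupture 20_0 placed in the 127_6 collection violates the rule.
bad = RuptureVariationCollection(127, 6, [RuptureVariationFile(20, 0, 0, 0)])
```

Here `collection_violations(good)` is empty, while `collection_violations(bad)` returns the misplaced file.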
Constraints on Files
[Figure: OWL diagram of constraints on files. Domain-independent definitions: File, Metadata (Int, 4DigitInt), and FileNameFormat (a list of Metadata or StringConstant), with the roles hasMetadata, hasNameFormat, hasDefaultVal (xsd:int), and usedAs. SCEC-dependent definitions: RuptureVarFile with the roles hasSourceID, hasRuptureID, hasSlipRealization, and hasHypoCenter. Skolem instances: RupVar-SK with metadata SourceID1, RuptureID1, SlipRealz1, and HypoCent1, a hasDefaultValue of 0, and the file name format RupVar_FileNameFormat1, which includes the string constants "_" and ".txt.variation". Callouts: constraints on default values; constraints on file names. Legend: classes, instances, roles.]
Constraints on Collections
[Figure: OWL diagram of constraints on collections. Classes: File (hasType: hasFileType), Collection (hasType: hasCollectionType), CollOfCollection, RuptureVarFile, RuptureVarsForForRupture, and RuptureVariations. Metadata roles: hasSiteName (Metadata:String), hasSourceID and hasRuptureID (Metadata:Int). Skolem instances: RupVar-SK, C-RuptVars-SK, and CC-RuptureVariations-SK, with the metadata values SiteName1, SourceID1, and RuptureID1. Callouts: constraints on collection element types; metadata constraints on collections and their elements.]
Constraints on Components
[Figure: OWL diagram of constraints on components. Class ComponentType with the roles hasInputs and hasOutputs (FileOrCollection), and the component class SeismogramGen. Skolem instances: SeismogramGen_Li with SeismogramGenLi_Inputs and SeismogramGenLi_Outputs; the file and collection instances RVM1, Seismogram1, S-RV1, S-RuptVarsForRup1, SGT1, C-SGT1, …; and the metadata values RVM_SourceID1 (hasSourceID), RVM_RuptureID1 (hasRuptureID), and SGTsSiteName1 (hasSiteName). Callouts: constraints on the types of input and output files and collections; metadata constraints on input and output files.]
Workflow Templates: a set of nodes and links
[Figure: OWL diagram of a workflow template. Classes and roles: Template hasNode Node and hasLink Link (Input, Output, InOut, LinkMapping); Node hasComponent (ComponentType or ComponentCollection); Link hasFile (File or Collection), hasDestinationNode, hasOriginNode, hasDestinationFileDesc, hasOriginFileDesc, …. Skolem instances: CybershakeTemplate1 with the node Node_SeismogramGen_Collection (hasComponent ComponentCollection_SeismogramGen, hasComponentType SeismogramGen_Li) and the links InputLink_RuptureVars_to_SeisgmogramGen (hasFile F-RV1, C-RuptVars1, CC-RuptureVariations1; hasDestinationFileDesc S-RV1, S-RuptVarsForRup1) and InputOutLink_Seismogram_from_SeismGen_to_PeakValCalc.]
Constraints on Templates
[Figure: OWL diagram of constraints on templates. Skolem instances: CybershakeTemplate1 with the links InputLink_SiteNameFile_to_BoxNameCheck (SiteNameFile1), InputLink_RuptureVars_to_SeisgmogramGen (hasFile F-RV1, C-RuptVars1, CC-RuptureVariations1), and InputLink_SGTCollforRup_to_SeismogramGen (hasFile F-SGT1, C-SGT-forRups1, CC-SGTs1), …. The hasSiteName values SGTsSiteName1 and SiteName1 are related by isSameAs, and two hasN_Items roles point to N_Rups. Callouts: constraints on number of elements in different collections; metadata constraints on files/collections of different components.]
Example OWL definitions
Filename format for rupture variation files
Definitions for metadata propagation (SynthSGT)
Constraints on files/collections of different components
Extension 3: Creating many workflow instantiations
[Figure: independent workflow instances, one SeisGen_Li / PeakValCalc chain per rupture (producing e.g. Seismograms_PAS_127_6.grm and PeakVals_allPAS_127_6.bsa), together with repeated BNC and GenMD steps]
• 4262 independent instances for each rupture, >100,000 variations for a site
• Memory bottleneck: handling many files in the file library, e.g. rupture variations
Creating many workflow instantiations (on-going work)
Independent instances are generated separately
• Instantiations for different ruptures are generated separately
On-demand creation of files and collections in the file library
• If files or collections are not used in metadata reasoning, we don't need to create file library objects for them (e.g. rupture variations); only an ID is generated for them
Currently Wings needs 5-6 hrs to generate DAXes for 4626 ruptures with 106,124 variations
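The on-demand idea on this slide, handing out only an ID for files that never take part in metadata reasoning, can be sketched as a lazily populated registry. The class `FileLibrary`, its methods, and the example metadata are hypothetical and only illustrate the pattern; they do not reflect the actual Wings file library.

```python
import itertools


class FileLibrary:
    """Hypothetical file library that builds metadata objects only on demand."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._id_by_name = {}
        self._objects = {}  # full objects, only for files used in reasoning

    def register(self, name):
        """Cheap path: hand out an ID without building a file object."""
        if name not in self._id_by_name:
            self._id_by_name[name] = next(self._ids)
        return self._id_by_name[name]

    def get_object(self, name, metadata_factory):
        """Expensive path: build and cache the full object when it is needed."""
        file_id = self.register(name)
        if file_id not in self._objects:
            self._objects[file_id] = {
                "id": file_id,
                "name": name,
                "metadata": metadata_factory(name),
            }
        return self._objects[file_id]

    def object_count(self):
        return len(self._objects)


library = FileLibrary()
# Rupture variations are only referenced, so they get IDs and nothing else.
ids = [library.register(f"127_6.txt.variation-s0000-h{h:04d}") for h in range(8)]
# A file that takes part in metadata reasoning gets a full object
# (the metadata value here is illustrative).
sgt = library.get_object("SGT161", lambda name: {"site_name": "USC"})
```

After this run the library holds nine IDs but only one full object, which is the memory saving the slide is after.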
Extension 4: Interleaving execution with workflow generation
Extensions in the WF template representations
• System links: a link from a component that generates results needed in template instantiation
  E.g. BoxNameCheck generates a file that contains SGT file names
Template navigation algorithm: while navigating links, identify partial workflows that can be executed based on system links and steps that are already executed
Wings and Pegasus interaction
• On-going work: client/server style interaction, e.g. use secure shell
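One way to read the navigation idea is as a topological walk that stops at system links whose origin has not yet been executed. The sketch below is an interpretation under that assumption, not the Wings algorithm itself; the function name, the tuple encoding of links, and the choice of a system link from BoxNameCheck to SeismogramGen_Li are all illustrative.

```python
from graphlib import TopologicalSorter


def next_partial_workflow(links, executed):
    """Return the steps that can be instantiated and executed now.

    links: iterable of (origin, destination, is_system_link) tuples.
    executed: set of steps whose results already exist.
    """
    predecessors = {}
    for origin, destination, is_system in links:
        predecessors.setdefault(origin, [])
        predecessors.setdefault(destination, []).append((origin, is_system))

    order = TopologicalSorter(
        {node: [origin for origin, _ in preds]
         for node, preds in predecessors.items()}
    ).static_order()

    runnable = []
    for node in order:
        if node in executed:
            continue
        ready = True
        for origin, is_system in predecessors[node]:
            if is_system:
                # A system link needs the origin's actual results.
                ready = ready and origin in executed
            else:
                # A regular link only needs the origin to be scheduled.
                ready = ready and (origin in executed or origin in runnable)
        if ready:
            runnable.append(node)
    return runnable


links = [
    ("BoxNameCheck", "SeismogramGen_Li", True),        # assumed system link
    ("SeismogramGen_Li", "PeakValCalc_Okaya", False),  # regular data link
]
first = next_partial_workflow(links, executed=set())
second = next_partial_workflow(links, executed={"BoxNameCheck"})
```

In this example the first partial workflow contains only BoxNameCheck; once it has run, the remaining two steps can be generated together.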
Partial DAX generation: Workflow Navigation Algorithm
System link
Template navigation
Used for Partial DAX generation
Summary: Current System
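[Figure: architecture of the current system. The User interacts with WINGS, whose modules are Template Selection, Template Validator, CAT, Template Instantiator, Metadata reasoner, and DAX generator, with Pegasus as the execution side. Through an Ontology API and a file & metadata API (Jena, MCS), the modules access the OWL ontologies (Wings File Ont, Wings Component Ont, Domain component Ont, Domain File Ont), the Template Library, the File Library (e.g. CC-Rup-Vars, C-Rup-Vars-for-Rup, F-RV1, …), and the metadata constraints. State shown for instantiation: current wf instance, logical files used, bindings, new file objects and metadata created.]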
Ongoing Work
Approaches for handling many thousands of files
• Use of MCS for storing logical file names and metadata
• Use of more efficient OWL reasoners (e.g. Sesame can handle 100 million triples)
Client/server style interactions with Pegasus
Mappings in a Workflow Template
Link mappings specify the order of inputs to a node that accepts a collection
[Figure: example workflow template with nodes (e.g. N2, N3, NC1), links (L1 to L4), and file collections (e.g. DC9, DC11, FCS-G, FCS-K, FCS-Z, FCS-T, FS-Y), annotated with the link mappings C-one, C-plenty, and C-spl and the input order markers #1 and #2]
Component Ontology in OWL
<owl:Class rdf:ID="ComponentType"/>
<owl:FunctionalProperty rdf:ID="hasInputs">
  <rdf:type rdf:resource="http://www.w3.org/2002/07/owl#ObjectProperty"/>
  <rdfs:domain rdf:resource="#ComponentType"/>
  <rdfs:range rdf:resource="#FileAndPrefixList"/>
</owl:FunctionalProperty>
<owl:ObjectProperty rdf:ID="hasOutputs">
  <rdf:type rdf:resource="http://www.w3.org/2002/07/owl#FunctionalProperty"/>
  <rdfs:domain rdf:resource="#ComponentType"/>
  <rdfs:range rdf:resource="#FileAndPrefixList"/>
</owl:ObjectProperty>
<owl:FunctionalProperty rdf:ID="hasFile">
  <rdfs:range>
    <owl:Class>
      <owl:unionOf rdf:parseType="Collection">
        <rdf:Description rdf:about="http://www.isi.edu/ikcap/wings/fileOntology.owl#File"/>
        <rdf:Description rdf:about="http://www.isi.edu/ikcap/wings/fileOntology.owl#FileCollection"/>
      </owl:unionOf>
    </owl:Class>
  </rdfs:range>
  <rdfs:domain rdf:resource="#FileAndPrefix"/>
  <rdf:type rdf:resource="http://www.w3.org/2002/07/owl#ObjectProperty"/>
</owl:FunctionalProperty>
<owl:DatatypeProperty rdf:ID="hasPrefix">
  <rdfs:domain rdf:resource="#FileAndPrefix"/>
  <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#string"/>
  <rdf:type rdf:resource="http://www.w3.org/2002/07/owl#FunctionalProperty"/>
</owl:DatatypeProperty>
<owl:FunctionalProperty rdf:ID="hasVersion">
  <rdfs:domain rdf:resource="#ComponentType"/>
  <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#string"/>
  <rdf:type rdf:resource="http://www.w3.org/2002/07/owl#DatatypeProperty"/>
</owl:FunctionalProperty>
<owl:FunctionalProperty rdf:ID="hasExecutionRequirements">
  <rdfs:domain rdf:resource="#ComponentType"/>
  <rdfs:range rdf:resource="http://www.isi.edu/ikcap/wings/executionRequirements.owl#ExecutionRequirements"/>
  <rdf:type rdf:resource="http://www.w3.org/2002/07/owl#ObjectProperty"/>
</owl:FunctionalProperty>
<owl:DatatypeProperty rdf:ID="hasExecutablePath">
  <rdfs:domain rdf:resource="#ComponentType"/>
  <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#string"/>
  <rdf:type rdf:resource="http://www.w3.org/2002/07/owl#FunctionalProperty"/>
</owl:DatatypeProperty>
<owl:FunctionalProperty rdf:ID="hasNamespace">
  <rdf:type rdf:resource="http://www.w3.org/2002/07/owl#DatatypeProperty"/>
  <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#string"/>
  <rdfs:domain rdf:resource="#ComponentType"/>
</owl:FunctionalProperty>
<owl:FunctionalProperty rdf:ID="hasTranslationArgument">
  <rdfs:domain rdf:resource="#ComponentType"/>
  <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#string"/>
  <rdf:type rdf:resource="http://www.w3.org/2002/07/owl#DatatypeProperty"/>
</owl:FunctionalProperty>
<owl:Class rdf:ID="ComponentCollection"/>
<owl:ObjectProperty rdf:ID="hasComponentType">
  <rdf:type rdf:resource="http://www.w3.org/2002/07/owl#FunctionalProperty"/>
  <rdfs:range rdf:resource="#ComponentType"/>
  <rdfs:domain rdf:resource="#ComponentCollection"/>
</owl:ObjectProperty>
A Component Description from the Library
<owl:Class rdf:ID="RemoveCommonWords">
  <rdfs:subClassOf rdf:resource="http://www.isi.edu/ikcap/wings/componentOntology.owl#ComponentType"/>
</owl:Class>
<RemoveCommonWords rdf:ID="removeCommonWordsV1">
  <clns:hasInputs>
    <clns:FileAndPrefixList rdf:ID="componentLibrary_RDFResource_5">
      <rdf:rest rdf:resource="#componentLibrary_RDFResource_6"/>
      <rdf:first rdf:resource="#removeCommonWordsInput1"/>
    </clns:FileAndPrefixList>
  </clns:hasInputs>
  <clns:hasExecutionRequirements rdf:resource="#countWordsExecutionReq"/>
  <clns:hasNamespace rdf:datatype="http://www.w3.org/2001/XMLSchema#string">vds</clns:hasNamespace>
  <clns:hasOutputs>
    <clns:FileAndPrefixList rdf:ID="componentLibrary_RDFResource_7">
      <rdf:rest rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#nil"/>
      <rdf:first rdf:resource="#removeCommonWordsOutput"/>
    </clns:FileAndPrefixList>
  </clns:hasOutputs>
  <clns:hasVersion rdf:datatype="http://www.w3.org/2001/XMLSchema#string">1</clns:hasVersion>
  <clns:hasExecutablePath rdf:datatype="http://www.w3.org/2001/XMLSchema#string">/nfs/isd/varunr/wings/removeCommonWords</clns:hasExecutablePath>
</RemoveCommonWords>
<clns:FileAndPrefix rdf:ID="removeCommonWordsOutput">
  <clns:hasFile rdf:resource="#removeCommonWordsOutputFile"/>
  <clns:hasPrefix rdf:datatype="http://www.w3.org/2001/XMLSchema#string">-o</clns:hasPrefix>
</clns:FileAndPrefix>
<lingflns:EnglishFile rdf:ID="removeCommonWordsOutputFile"/>
<clns:FileAndPrefixList rdf:ID="componentLibrary_Individual_34">
  <rdf:rest>
    <clns:FileAndPrefixList rdf:ID="componentLibrary_Individual_37">
      <rdf:rest rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#nil"/>
      <rdf:first>
        <clns:FileAndPrefix rdf:ID="removeCommonWordsInput2">
          <clns:hasFile>
            <lingflns:EnglishFile rdf:ID="removeCommonWordsInputFile"/>
          </clns:hasFile>
        </clns:FileAndPrefix>
      </rdf:first>
    </clns:FileAndPrefixList>
  </rdf:rest>
  <rdf:first>
    <clns:FileAndPrefix rdf:ID="removeCommonWordsInput1">
      <clns:hasPrefix rdf:datatype="http://www.w3.org/2001/XMLSchema#string">-i</clns:hasPrefix>
      <clns:hasFile>
        <lingflns:EnglishFile rdf:ID="CommonWordsFile"/>
      </clns:hasFile>
    </clns:FileAndPrefix>
  </rdf:first>
</clns:FileAndPrefixList>
<clns:FileAndPrefixList rdf:ID="componentLibrary_Individual_40">
  <rdf:first rdf:resource="#removeCommonWordsOutput"/>
  <rdf:rest rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#nil"/>
</clns:FileAndPrefixList>
<clns:FileAndPrefixList rdf:ID="componentLibrary_RDFResource_6">
  <rdf:first rdf:resource="#removeCommonWordsInput2"/>
  <rdf:rest rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#nil"/>
</clns:FileAndPrefixList>
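The input and output lists in the component description above are RDF linked lists: each FileAndPrefixList node has an rdf:first member and an rdf:rest pointer that ends in rdf:nil. The Python sketch below transcribes those lists into dictionaries (with shortened IDs) and walks them to assemble an argument list. How Wings or Pegasus actually builds the invocation is not stated in the slides, so the `arguments` function, the `bindings` map, and the concrete file names in it are assumptions for illustration.

```python
NIL = "rdf:nil"

# Transcribed from the component description (IDs shortened).
lists = {
    "RDFResource_5": {"first": "removeCommonWordsInput1", "rest": "RDFResource_6"},
    "RDFResource_6": {"first": "removeCommonWordsInput2", "rest": NIL},
    "RDFResource_7": {"first": "removeCommonWordsOutput", "rest": NIL},
}
file_and_prefix = {
    "removeCommonWordsInput1": {"prefix": "-i", "file": "CommonWordsFile"},
    "removeCommonWordsInput2": {"prefix": None, "file": "removeCommonWordsInputFile"},
    "removeCommonWordsOutput": {"prefix": "-o", "file": "removeCommonWordsOutputFile"},
}


def walk(list_id):
    """Yield list members in order by following first/rest links."""
    while list_id != NIL:
        node = lists[list_id]
        yield node["first"]
        list_id = node["rest"]


def arguments(list_id, bindings):
    """Turn a FileAndPrefixList into command-line arguments (assumed scheme)."""
    args = []
    for member in walk(list_id):
        entry = file_and_prefix[member]
        if entry["prefix"]:
            args.append(entry["prefix"])
        args.append(bindings[entry["file"]])
    return args


# Hypothetical bindings from file descriptions to concrete file names.
bindings = {
    "CommonWordsFile": "common.txt",
    "removeCommonWordsInputFile": "in.txt",
    "removeCommonWordsOutputFile": "out.txt",
}
command = (
    ["/nfs/isd/varunr/wings/removeCommonWords"]
    + arguments("RDFResource_5", bindings)
    + arguments("RDFResource_7", bindings)
)
```

The list order matters here: following rdf:rest from RDFResource_5 visits removeCommonWordsInput1 before removeCommonWordsInput2, so the prefixed `-i` argument comes first.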
Formats for Filenames (examples)
SGT file: e.g. FD_SGT/USC_1/A/SGT161
Format: FD_SGT/<site_id>_[1-2]/[A-L]/SGT[3-digit-alphanumeric]
- site_name: e.g. USC
- tensor direction [1-2]: 1 (EW), 2 (NS)
- time_period [A-L]: A (0-15 seconds), B (15-30 seconds), etc.
- 3-digit-alphanumeric: xyz volume id
Rupture variation file: e.g. 127_6.txt.variation-s0002-h0000
Format: <source_id>_<rupture_id>.txt.variation-s[4 digit slip_realization#]-h[4 digit hypo center #]
- source_id: e.g. 127
- rupture_id: e.g. 6
- 4 digit slip_realization#: 2
- 4 digit hypo center #: 0
SA output file: e.g. PeakVals_allLADT_127_6.bsa
Format: PeakVals_all<site_id>_<source_id>_<rupture_id>.bsa
Seismogram file: e.g. Seismogram_LADT_127_6.grm
Format: Seismogram_<Site>_<source_id>_<rupture_id>.grm
SRL file: e.g. USC-sorted_by_rupture_variations.srl
Format: <site_id>-sorted_by_rupture_variations.srl
additional metadata:
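The formats on this slide map naturally onto regular expressions with named groups, so that metadata can be read out of a file name and a file name can be produced from metadata. The sketch below covers three of the formats and assumes, following the example 127_6.txt.variation-s0002-h0000, exactly four digits after both "-s" and "-h". The pattern and function names are chosen for this example only.

```python
import re

RUPTURE_VARIATION = re.compile(
    r"^(?P<source_id>\d+)_(?P<rupture_id>\d+)\.txt\.variation"
    r"-s(?P<slip_realization>\d{4})-h(?P<hypo_center>\d{4})$"
)
SGT = re.compile(
    r"^FD_SGT/(?P<site_id>[^/_]+)_(?P<tensor_direction>[12])"
    r"/(?P<time_period>[A-L])/SGT(?P<volume_id>[0-9A-Za-z]{3})$"
)
SA_OUTPUT = re.compile(
    r"^PeakVals_all(?P<site_id>[^_]+)_(?P<source_id>\d+)_(?P<rupture_id>\d+)\.bsa$"
)


def parse_rupture_variation(name):
    """Extract integer metadata from a rupture variation file name."""
    match = RUPTURE_VARIATION.match(name)
    if match is None:
        raise ValueError(f"not a rupture variation file name: {name}")
    return {key: int(value) for key, value in match.groupdict().items()}


def format_rupture_variation(source_id, rupture_id, slip_realization, hypo_center):
    """Build a rupture variation file name from its metadata."""
    return (
        f"{source_id}_{rupture_id}.txt.variation"
        f"-s{slip_realization:04d}-h{hypo_center:04d}"
    )


metadata = parse_rupture_variation("127_6.txt.variation-s0002-h0000")
round_trip = format_rupture_variation(**metadata)
sgt = SGT.match("FD_SGT/USC_1/A/SGT161").groupdict()
sa = SA_OUTPUT.match("PeakVals_allLADT_127_6.bsa").groupdict()
```

Parsing and formatting are inverses for well-formed names, which is the property a file name constraint tied to metadata needs.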
All Data Products Have Rich Metadata
<flns:File rdf:about="http://www.isi.edu/ikcap/wings/domains/NLP/fileLibrary.owl#kernelRules_RulePruningWorkflow1_1118895460046">
<flns:usedAs rdf:resource="http://www.isi.edu/ikcap/wings/domains/NLP/componentLibrary.owl#KernelRulesFile"/>
<wflns:createdBy rdf:resource="http://www.isi.edu/ikcap/wings/domains/NLP/workflows/RulePruningInstance1.owl#"/>
<wflns:usedBy rdf:resource="http://www.isi.edu/ikcap/wings/domains/NLP/workflows/RulePruningInstance2.owl#"/>
</flns:File>
<nlpflns:TextFile rdf:about="http://www.isi.edu/ikcap/wings/domains/NLP/fileLibrary.owl#TextFileCollection_RulePruningWorkflow1_1118895460046_item_1"/>
<flns:FileCollection rdf:about="http://www.isi.edu/ikcap/wings/domains/NLP/fileLibrary.owl#RulePruningWorkflow1_1119042891296_FilteredRulesCollection">
<flns:usedAs rdf:resource="http://www.isi.edu/ikcap/wings/domains/NLP/componentLibrary.owl#FilterRulesOutputFile"/>
<flns:usedAs rdf:resource="http://www.isi.edu/ikcap/wings/domains/NLP/componentLibrary.owl#PruneRulesInputFile"/>
<wflns:createdBy rdf:resource="http://www.isi.edu/ikcap/wings/domains/NLP/workflows/RulePruningInstance2.owl#"/>
<flns:hasFiles rdf:parseType="Collection">
<flns:File rdf:about="http://www.isi.edu/ikcap/wings/domains/NLP/fileLibrary.owl#RulePruningWorkflow1_1119042891296_FilteredRulesCollection_item_0"/>
<flns:File rdf:about="http://www.isi.edu/ikcap/wings/domains/NLP/fileLibrary.owl#RulePruningWorkflow1_1119042891296_FilteredRulesCollection_item_1"/>
</flns:hasFiles>
</flns:FileCollection>