Towards Service Oriented Geoscience: SEE Grid and APAC Grid
Dr Robert Woodcock, Executive Manager, e-Science
www.csiro.au
Jan 12, 2016
2
Outline
• Industry drivers
• Inefficiencies in “geoscience” modelling workflow
• The Solid Earth and Environment Grid
• The APAC (Geoscience) Grid
• Putting it all together: pmd*CRC Modelling Workflow for Industry problems
• Results and what might the future hold?
3
Australian National Research Priorities
Frontier Technologies for Building and Transforming Australian Industries:
Stimulating the growth of world-class Australian industries using innovative technologies developed from cutting-edge research
Priority Goal 4: Smart information use
Improved data management for existing and new business applications and creative applications for digital technologies
ICT applications are providing huge opportunities to deliver new systems, products, business solutions, and to make more efficient use of infrastructure
The ability of organisations to operate virtually and collaborate across huge distances in Australia and internationally hinges on our capabilities in this area
4
Key points from case studies and support letters
• Show the diversity of use cases for the same data type throughout the mining value chain
• Show a strong business case for interoperability for management of your data in the external world
• Show an even stronger business case for interoperability for internal data management
• Show why standards need to be developed by groups working together as part of a community
• Highlight the emerging issue of responsibility for data quality becoming a legislative matter
5
Key Driver: Input to the Minerals Exploration Action Agenda – July 2003
Industry input:
• highlighted problems in gaining access to pre-competitive geoscience information
• described existing information as commonly incomplete and fragmented across eight government agencies, each with its own information management systems and structures
• noted that the disparate systems lead to inefficiencies causing higher costs, reduced effectiveness and increased risk incurred by the industry and its service providers
Source: http://www.industry.gov.au/assets/documents/itrinternet/minerals_aa_finalreport_July2003.pdf
6
What is the role of:
• Competency contrasts?
• Permeability?
• Pore fluid pressure & flow fields?
Modelling Workflow
Define the geological problem
Build the model
Run the model
View and Interpret Results
Iterate to achieve Understanding
Report and feed into knowledge base
…Must be repeatable, robust and timely
[Figure: block model of dilation, showing the impact of Fault set “A” dip variation. Panel labels: very weak, weak, mod. strong, strong; tensile failure marked.]
7
Inefficiencies in the Workflow
• Information is scattered across:
Organisations – company, geological survey, etc.
Resources – different hardware and software platforms
Geography – geological surveys in each state and territory (region) in Australia
Cost of data integration is high, in some situations exceeding all other costs
• Computational resources:
Different architectures suit different numerical codes better
Are often available but outside your organisation's direct control
Are set up in different ways
Cost of adapting an investigator's specific toolkit to use multiple sites is often prohibitive
Can these issues be removed?
The Solid Earth and Environment Grid: Obtaining information…
9
The SEE Grid Community
Working together (loosely) to develop a toolkit for interoperability for the Solid Earth and Environmental Sciences
Together… because our information and services need to be shared more easily to achieve our goals
Loosely… because ultimately we are separated by political and economic boundaries
Toolkit… because our World is dynamic and we need tools that can be reconfigured and chained together quickly to answer our questions
…in this context we must reduce the barriers to becoming a part of the community
10
Pre-competitive geoscience data - The trouble is…
[Diagram labels: Client; Data Structures; Proprietary Software; Versions of Software]
Slide courtesy of Stuart Girvan
11
Our aim…
[Diagram labels: Client; XML; GML/XMML]
Slide courtesy of Stuart Girvan
12
CLIENT APPLICATIONS: WebMap Composer; GA Reports Application
Common Interface Binding – GML/XMML
DATA ACCESS SERVICES (translation to standards here): PIRSA Web Feature Service (WFS); DOIR Web Feature Service (WFS); GA Web Feature Service (WFS); Geoserver (Open Source)
DATA SOURCES (little or no change required here): PIRSA Geochemistry Feature Data Source; DOIR Geochemistry Feature Data Source; GA Geochemistry Feature Data Source; PostGIS (Open Source); Oracle; PostGIS (Open Source)
13
CLIENTS: WebMap Composer; GA Reports Application; ?FracSIS; pmd*CRC Model Tools
Common Interface Binding – GML/XMML
DATA SERVICES: PIRSA WFS; DOIR WFS; GA WFS; NTGS WFS; MRT WFS; NRM WFS; NSW DPI WFS; VIC DPI WFS
DATA SOURCES
The Solid Earth and Environment Grid: Information - Implementation and Examples
15
Common Interface Binding - Details
Two parts
1. Service interface standard – how you communicate with the service, sending requests and receiving results
2. Information standards – how information is encoded in a community agreed form
We use and develop Open Geospatial Consortium standards, the Exploration and Mining Mark-up Language (XMML) and its successor, GeoSciML
16
Open Geospatial Consortium Web Feature Service (WFS)
An application (web based or desktop) talks to the Web Feature Service over the http protocol; the service sits in front of a data source and its config files.
• Get Capabilities: request in XML/KVP, response in XML
• Describe Feature Type: request in XML/KVP, response as a GML Schema
• Get Feature: request in XML/KVP, response in GML
Response in Geography Mark-up Language (GML)
- Or more usefully, a GML Application Schema
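The three WFS operations on this slide can be sketched as key-value-pair (KVP) requests. This is a minimal illustration, not any agency's actual service: the endpoint `https://example.org/geoserver/wfs` and the feature type name `app:Borehole` are hypothetical, and the parameter names follow WFS 1.1.0 (other versions differ, for example in the feature-count parameter).

```python
from urllib.parse import urlencode

# Hypothetical endpoint; substitute the address of a real WFS.
BASE_URL = "https://example.org/geoserver/wfs"


def wfs_url(request, **params):
    """Build a KVP-encoded WFS request URL (WFS 1.1.0 parameter names)."""
    query = {"service": "WFS", "version": "1.1.0", "request": request}
    query.update(params)
    return BASE_URL + "?" + urlencode(query)


# 1. Ask the service what it offers (response: XML capabilities document).
capabilities_url = wfs_url("GetCapabilities")

# 2. Ask for the schema of one feature type (response: GML schema).
#    "app:Borehole" is an illustrative name, not a real published type.
describe_url = wfs_url("DescribeFeatureType", typeName="app:Borehole")

# 3. Fetch features inside a bounding box (response: GML).
#    Axis order of the bbox depends on the service's reference system.
features_url = wfs_url(
    "GetFeature",
    typeName="app:Borehole",
    bbox="129.0,-38.0,141.0,-26.0",
    maxFeatures=50,
)

for url in (capabilities_url, describe_url, features_url):
    print(url)
```

Because every provider answers the same three requests, a client written this way only needs a different base URL to talk to another agency's service.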
17
Features – Geoscience Community (XMML & GeoSciML)
• Borehole: collar location, shape, collar diameter, length, operator, logs, related observations, …
• Fault: shape, surface trace, displacement, age, …
• Ore-body: commodity, deposit type, host formation, shape, resource estimate, …
• Observation: location, subject/specimen/station, property/theme, method, operator, date/time, result (+ type/reference system/scale/classification), …
• Basin?: formations, shape – time dependent, resource estimate, …
18
Data source to community schemas
Community schemas provide the common or shared model
All data providers have their own local data model
All data providers must map data from local source (database) to community schema, irrespective of technology implementation
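A minimal sketch of the mapping step, assuming a provider whose local database rows use their own column names. Every name here (`HOLE_ID`, `OPER`, `collarDiameter`, and so on) is hypothetical and is not taken from the XMML or GeoSciML schemas; the point is only that the translation lives in one table at the service boundary, leaving the local data model untouched.

```python
import xml.etree.ElementTree as ET

# Hypothetical local column names mapped to hypothetical community
# element names. A real mapping would target the agreed community schema.
FIELD_MAP = {
    "HOLE_ID": "name",
    "OPER": "operator",
    "DEPTH_M": "length",
    "DIAM_MM": "collarDiameter",
}


def to_community_xml(row):
    """Translate one local database row into a community-schema element."""
    borehole = ET.Element("Borehole")
    for local_name, community_name in FIELD_MAP.items():
        if local_name in row:
            ET.SubElement(borehole, community_name).text = str(row[local_name])
    return borehole


# Illustrative row; INTERNAL_FLAG has no community equivalent and is dropped.
local_row = {
    "HOLE_ID": "DDH-001",
    "OPER": "Example Drilling",
    "DEPTH_M": 350.5,
    "DIAM_MM": 96,
    "INTERNAL_FLAG": "x",
}

xml_text = ET.tostring(to_community_xml(local_row), encoding="unicode")
print(xml_text)
```

A second provider with different column names would supply only a different `FIELD_MAP`; clients see the same element names from both.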
19
20
Why XML?
Extensibility
Self describing
Ability to be (remotely) validated against schema
XML Schema provides “loose tolerances”
All software languages have tools to deal with XML
But…
Problematic for large data sets…
though nobody said you can’t use binary as well (even over WFS)
Community agreement is what matters
21
How would you use an interoperable service?
A user makes a request and gets back GML based data which can be…
• rendered into a map layer AND queried by a user, or
• formatted into a report, or
• read and used by any enabled application
Slides courtesy Stuart Girvan – Geoscience Australia
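As a rough illustration of the "formatted into a report" use, the sketch below parses a small GML-style feature collection and turns it into table rows. The response text, the `app` namespace, the `Sample` feature and all values are invented for the example; only the `wfs` and `gml` namespace URIs are the standard ones for WFS 1.x and GML 2/3.1.

```python
import xml.etree.ElementTree as ET

NS = {
    "wfs": "http://www.opengis.net/wfs",
    "gml": "http://www.opengis.net/gml",
    "app": "http://example.org/app",  # hypothetical application schema
}

# Invented sample response; a real one would come from a GetFeature request.
SAMPLE_RESPONSE = """<wfs:FeatureCollection
    xmlns:wfs="http://www.opengis.net/wfs"
    xmlns:gml="http://www.opengis.net/gml"
    xmlns:app="http://example.org/app">
  <gml:featureMember>
    <app:Sample>
      <app:source>GA</app:source>
      <app:element>Au</app:element>
      <app:value>0.42</app:value>
    </app:Sample>
  </gml:featureMember>
  <gml:featureMember>
    <app:Sample>
      <app:source>PIRSA</app:source>
      <app:element>Cu</app:element>
      <app:value>15.0</app:value>
    </app:Sample>
  </gml:featureMember>
</wfs:FeatureCollection>"""


def report_rows(gml_text):
    """Extract (source, element, value) rows from a feature collection."""
    root = ET.fromstring(gml_text)
    rows = []
    for sample in root.findall("gml:featureMember/app:Sample", NS):
        rows.append((
            sample.findtext("app:source", namespaces=NS),
            sample.findtext("app:element", namespaces=NS),
            float(sample.findtext("app:value", namespaces=NS)),
        ))
    return rows


rows = report_rows(SAMPLE_RESPONSE)
for source, element, value in sorted(rows):
    print(f"{source:<8}{element:<4}{value:>8.2f}")
```

The same parsed rows could equally feed a map layer or a desktop application, which is the sense in which one response serves several clients.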
22
Web Map Interface (courtesy of Social Change Online)
Bounding Box
Known Layers
23
Tabular Reports by Source (courtesy of Geoscience Australia)
24
Desktop Visualisation (courtesy of Fractal Technologies)
High Performance Computing in Exploration and Mining
26
Why use simulation and modelling?
• Mineral exploration has considerable uncertainty
• We use simulation and modelling to analyse an ensemble of possible geological structures and histories that could have produced the observations seen today
• The result is reduced uncertainty and some quantification of risk
This same approach applies to many fields (hazards, environment, …), which is why we formed the SEE Grid community
27
Our toolkit…
Our toolkit contains a variety of codes (usually more than one of each type) for
Mechanics
Chemistry
Transport
Thermal
Fluid flow
Some of these can be coupled together: Reactive Transport – Chemistry+Transport+Thermal+Fluid flow
Some scenarios only require a subset…
It becomes very computationally intensive when using many…
AND we run many scenarios
Grid Computing provides a solution
Darcy flow and Streamlines
28
Community Agreed Service Interfaces and Information Models
Client Applications: Drill Core Analysis Workflow; Mantle Convection Modelling Workflow; Tsunami Workflow; Reactive Transport Workflow
Gateway Services: APAC Web Feature Service (WFS); Industry Web Feature Service (WFS); Geological Survey Web Feature Service (WFS)
Facilities: APAC Data and Compute Grid; Industry Data and Knowledge Grid; Government Geological Surveys Data and Knowledge Grid
29
Grid Technology Layers
e-Science and e-Geoscience Layer: Application Portals; Visualisation, 3-D models; Data and Knowledge Portals
Community-specific Knowledge Environments and Networks for Research and Education
Customised for discipline- and project-specific applications, e.g. 3D models, Geophysics, Thermodynamics, Fluids, Geochronology
Data and Information
Middleware Architecture
Infrastructure and Base Computing Technologies: Networks, Communications; High performance computing; High Volume Storage
[Diagram labels: APAC Grid; pmd*CRC; SEE Grid]
30
The Grid Application… Service Interactions
User and Client: Login; Job Monitor; Run Simulation; Edit Problem Description; Archive Search; Local Repository
Community Infrastructure: Resource Registry; Authentication; Job Management Service; Data Management Service
Information: Geology S.A; Geology W.A; Geochem N.S.W; Geochem W.A
Computation: Escript Service; FastfloRT Service; HPC Repository
Workflow...
Physical Resource; Physical Resource
31
Traditional Mechanical Modelling Workflow
• “Powerful” desktop computes several models at a time
• Limitations are in the order of ~2 models per week
• Models (mesh + data files) are individually and laboriously constructed
• The manual process is error prone
• Results are manually visualised one at a time
• Screenshots are manually taken and made into “movies”
• Very little, if any, standardised data archiving is done. This results in potential confusion or loss of the originating conditions of the experiments, making it unrepeatable in the long term
Slide courtesy of Robert Cheung and Warren Potma
32
New Refined Workflow
• Parameterised template or wizard driven model geometry/mesh creation
• Boundary condition & model properties parameter sweep utilities
- automatically create a “family” of model and data files based on varying a set of parameters
• Inversion algorithms
- determine input parameters of future iterations automatically based on the user ranking of previous results
• Automated generation of visualisations
• Automated movie generation
• Automated archiving
3D Time varying volume visualisation
Parameterised Geometry Creation
Multi-site data storage via Storage Resource Broker
Slide courtesy of Robert Cheung and Warren Potma
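The parameter sweep idea, creating a "family" of models by varying a set of parameters, can be sketched with a Cartesian product over the parameter values. This is an illustrative stand-in, not the pmd*CRC utilities themselves: the template fields and the parameter names (`fault_dip_deg`, `permeability_m2`) and their values are assumptions chosen to echo the fault-dip example earlier in the talk.

```python
import itertools
import json

# Hypothetical base model description shared by every family member.
TEMPLATE = {"mesh": "fault_block_template", "steps": 1000}

# Hypothetical parameters to vary and the values to try for each.
SWEEP = {
    "fault_dip_deg": [30, 45, 60],
    "permeability_m2": [1e-15, 1e-13],
}


def model_family(template, sweep):
    """Yield one model description per combination of sweep values."""
    names = list(sweep)
    combinations = itertools.product(*(sweep[name] for name in names))
    for index, values in enumerate(combinations):
        model = dict(template)            # copy so the template is not mutated
        model.update(zip(names, values))  # apply this combination
        model["model_id"] = f"model_{index:03d}"
        yield model


family = list(model_family(TEMPLATE, SWEEP))
print(len(family), "models generated")
print(json.dumps(family[0]))
```

Each generated description would then be written out as the model and data files for one run; adding a value to any list in `SWEEP` grows the family without any manual model construction.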
33
Results to Date
For one Investigator, on one investigation:
• 500 models in 4 months (100x more!)
• Inversion/parameter sweep algorithms – semi-automated model creation; faster, fewer errors
• Automated post-processing/visualisation – all views × all timescales × all models await the investigator automatically
• Automated archiving – metadata searchable, more accurate store of experimental conditions, delivered to your store!
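To make "metadata searchable" archiving concrete, here is a small sketch that stores one metadata record per run and searches the records by parameter value. It writes JSON files to a local temporary directory purely for illustration; the talk's actual storage is via Storage Resource Broker, which is not modelled here, and the record fields and file names are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def archive_run(archive_dir, model_id, parameters, result_files):
    """Write one metadata record describing a completed model run."""
    record = {
        "model_id": model_id,
        "parameters": parameters,      # the experimental conditions
        "result_files": result_files,  # where the outputs were stored
    }
    path = Path(archive_dir) / f"{model_id}.json"
    path.write_text(json.dumps(record, indent=2))
    return path


def search(archive_dir, **criteria):
    """Return ids of archived runs whose parameters match all criteria."""
    hits = []
    for path in sorted(Path(archive_dir).glob("*.json")):
        record = json.loads(path.read_text())
        if all(record["parameters"].get(k) == v for k, v in criteria.items()):
            hits.append(record["model_id"])
    return hits


with tempfile.TemporaryDirectory() as archive:
    archive_run(archive, "model_000", {"fault_dip_deg": 30}, ["model_000.vtk"])
    archive_run(archive, "model_001", {"fault_dip_deg": 45}, ["model_001.vtk"])
    matches = search(archive, fault_dip_deg=45)

print(matches)
```

Recording the parameters alongside the results at the moment a run finishes is what keeps the originating conditions from being lost, which was the weakness of the traditional workflow.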
34
Results
Major inefficiencies have been removed by:
• Integrating the pmd*CRC geoscience modelling workflow
with the:
• Solid Earth and Environment Grid, and
• APAC (Geoscience) Grid
Industry response to the approach is supportive, as evidenced by SEE Grid Roadshow survey results and pmd*CRC applications
Thank You
Name Dr Robert Woodcock
Title Executive Manager, e-Science
Phone +61 8 6436 8780
Email [email protected]
Web www.csiro.au
www.seegrid.csiro.au
Contact CSIRO
Phone 1300 363 400
+61 3 9545 2176
Email [email protected]
Web www.csiro.au