A flexible and generic web-service for the delivery of geophysical data
Experiences from 2 years of Intrepid’s JetStream System in Australia
EuroSeismic Meeting, Paris, November, 2004
Presented by Philip McInerney
Dec 15, 2015
Topics – Web Data-Delivery
• Introduction
• Example - Geoscience Australia
• JetStream Architecture / Design
  • System built on standard protocols of the web
  • Design elements: Catalog and Geospatial Intelligence
• JetStream Implementation
  • Consultation, Customisation, Integration
• The Australian Experience
• Future Vision
  • Distributed Data Management and Delivery
Introduction &
Web Data-Delivery Example
Geoscience Australia’s GADDS
Intrepid’s Web Data-Delivery System
• Intrepid Geophysics develops and maintains the Intrepid Geophysical Data Management and Processing Software
• In the last two years we have used …
  • Intrepid’s data management and processing
  • the standard protocols of the world-wide-web
  • OpenDAP standards
  as the basis for developing a web data-delivery system … called JetStream
Geoscience Australia - GADDS
• April, 2003 – web data-delivery pilot
  – Intrepid’s JetStream system is installed and successfully delivers a small sample of survey datasets + continental-scale grids
  – The study is rapidly expanded with the goal of delivering all of GA’s survey datasets
• November, 2003 – GADDS launched
  – The Geophysical Archive Data Delivery System (GADDS) is formally launched
  – GA’s 50 GB archive of magnetic and gravity data is freely available across the web (across the globe!) … with minimal administrative overhead
Geoscience Australia - GADDS
• June, 2004 – Expansion
  – Upgraded to deliver 256-channel radiometric line datasets, and multi-band grids (K, U, Th & Total Count)
• November, 2004 – Vic, Qld data added
  – Datasets from state government surveys were added to the pool of data being served
  – Data from most states will be included by mid-2005
GADDS Example – screenshot walkthrough (images not reproduced)
1. Define Area of Interest
2. Select DataType and Theme
3. Search Results ...
4. Metadata Review
5. Dataset Selection
6. Submit Request
7. Email: Data Ready … Download
8. Dataset Download
GADDS Example - Summary
• We used a standard browser
• We viewed standard web-pages
• We made some simple choices
  • Area of interest
  • Type of data: Vector or Grid
  • Theme: Magnetics, Gravity, Radiometrics, …
• We chose to download one survey dataset
  • Selected fields of the dataset
  • Chose the Datum, Projection, and file format
• We received an email when data were ready
• We used a standard web ‘download to file’
JetStream Architecture/Design
1. Schematic Diagram
2. Design: Catalog
3. Design: Geospatial Intelligence
JetStream Client-Server Architecture
[Schematic diagram; labels as shown:]
• Web Client (Explorer, Netscape)
• TCP/IP, HTTP
• Firewall
• Web Server (Apache)
• JetStream Server: Tomcat (or equivalent), JetStream Processes
• JetCat Catalog
• OpenDAP Drivers
• Intrepid Processes
• Data Administrator
• Binary Datasets
JetStream Server Architecture
[Schematic diagram; labels as shown:]
• Client: interface to a web-browser
• Web Server / JetStream Server: Apache Tomcat (or equivalent; Servlets Container)
  – Client queries
  – Get additional information
  – Queue (Process) Management
    » Dataset extract & process
    » Zipping the requested data
    » Email ‘data-ready’ service
  – Download management
• JetCat Catalog
• Data Administrator (Acquire_Catalog, …)
• OpenDAP Drivers
• Intrepid Processes
• Binary Datasets
In a distributed system the data and processes would be located on ‘remote’ servers
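The queue-management steps named on this slide (extract, zip, email ‘data-ready’) can be sketched roughly as follows. This is only an illustration, not JetStream’s implementation: the function names, the sender address and the download URL are all hypothetical, and the message is built but never sent.

```python
import io
import zipfile
from email.message import EmailMessage


def zip_requested_data(files):
    """Pack extracted files (name -> bytes) into one in-memory zip archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name in sorted(files):
            archive.writestr(name, files[name])
    return buffer.getvalue()


def data_ready_message(client_email, download_url):
    """Build (but do not send) the 'data-ready' notification."""
    message = EmailMessage()
    message["From"] = "data-delivery@example.org"  # hypothetical sender
    message["To"] = client_email
    message["Subject"] = "Your data request is ready"
    message.set_content(
        "Your requested data are ready for download:\n" + download_url + "\n"
    )
    return message


def process_request(client_email, extracted_files, download_url):
    """One queue job: zip the already-extracted files, then prepare the email."""
    archive = zip_requested_data(extracted_files)
    message = data_ready_message(client_email, download_url)
    return archive, message
```

In a real deployment the archive would be written to a download area and the message handed to a mail server; both are left out here to keep the sketch self-contained.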
Design: Catalog
[JetStream Server Architecture schematic repeated, highlighting the JetCat Catalog and the Data Administrator (Acquire_Catalog, …)]
Design: Catalog
• The catalog …
  • is at the heart of the JetStream System; all interactive client-queries interrogate the catalog rather than the binary datasets
  • is a very simple data structure …
    – a flat table; can be ASCII, Access, Oracle table, …
    – one record per dataset
    – a small number of essential fields …
      » Lat/Long limits of the dataset
      » URL address of the dataset
      » Data-type and ‘theme’
    – additional user-defined fields can contain any other metadata that might be pertinent to the application
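A flat catalog of this kind, and a client query against it, might look like the sketch below. The record layout follows the essential fields listed on the slide; the class name, field names, URLs and coordinates are invented for illustration and are not JetCat’s actual schema. The overlap test assumes extents that do not cross the 180° meridian.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogRecord:
    """One record per dataset: URL, data-type, theme and lat/long limits."""
    url: str
    data_type: str          # e.g. "vector" or "grid"
    theme: str              # e.g. "magnetics", "gravity"
    west: float
    south: float
    east: float
    north: float
    extra: dict = field(default_factory=dict)  # user-defined metadata


def overlaps(record, west, south, east, north):
    """True if the dataset extent intersects the area of interest."""
    return (record.west <= east and record.east >= west
            and record.south <= north and record.north >= south)


def search_catalog(catalog, west, south, east, north,
                   data_type=None, theme=None):
    """Answer a client query from the catalog alone, without opening data."""
    hits = []
    for record in catalog:
        if data_type is not None and record.data_type != data_type:
            continue
        if theme is not None and record.theme != theme:
            continue
        if overlaps(record, west, south, east, north):
            hits.append(record)
    return hits


# Hypothetical catalog contents
CATALOG = [
    CatalogRecord("https://example.org/data/survey_a", "vector", "magnetics",
                  138.0, -32.0, 140.0, -30.0),
    CatalogRecord("https://example.org/data/grid_b", "grid", "gravity",
                  112.0, -44.0, 154.0, -10.0),
    CatalogRecord("https://example.org/data/survey_c", "vector", "magnetics",
                  145.0, -38.0, 147.0, -36.0, {"year": 2003}),
]
```

Because the query touches only these small records, the binary datasets are never opened during interactive searching, which is the point the slide makes.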
Design: Catalog
• Maintaining the catalog is the essential data-administrative task
• The automated ‘Acquire_Catalog’ administrative tool ‘harvests’ metadata from the data-files
• an intelligent ‘Data Manager’ tool facilitates additional manual maintenance of the catalog
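The idea of ‘harvesting’ catalog metadata from a data-file can be illustrated with a small sketch. It is not the Acquire_Catalog tool: it assumes a simple comma-separated ASCII file with a header row, and the column names `longitude` and `latitude` are hypothetical.

```python
import csv
import io


def harvest_extents(ascii_text, lon_field="longitude", lat_field="latitude"):
    """Scan an ASCII point file and return the metadata a catalog record needs."""
    reader = csv.DictReader(io.StringIO(ascii_text))
    lons, lats = [], []
    for row in reader:
        lons.append(float(row[lon_field]))
        lats.append(float(row[lat_field]))
    if not lons:
        raise ValueError("dataset contains no records")
    return {
        "west": min(lons),
        "east": max(lons),
        "south": min(lats),
        "north": max(lats),
        "fields": list(reader.fieldnames),  # available dataset fields
        "records": len(lons),
    }


# Hypothetical three-point survey file
SAMPLE = (
    "longitude,latitude,mag\n"
    "138.5,-31.0,51234.0\n"
    "139.25,-30.5,51240.5\n"
    "138.75,-31.75,51229.0\n"
)
```

Calling `harvest_extents(SAMPLE)` yields the lat/long limits and field names that would populate one catalog record; binary industry formats would need format-specific readers in place of the CSV parser.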
Design: Geospatial Intelligence
[JetStream Server Architecture schematic repeated, highlighting the Intrepid Processes]
Design: Geospatial Intelligence
• JetStream is geospatially intelligent about a wide variety of industry file types; this means that …
• it can interpret such files to determine a dataset’s location and extent
• it can ‘look into’ such files … and intelligently extract subsets of the contained data; subsets may be spatial subsets, or some subset of the fields of the dataset
• This geospatial intelligence is achieved through ‘Intrepid Processes’ – the functions of the Intrepid Geophysical Data Management and Processing System
Design: Geospatial Intelligence
• Geospatial intelligence is used …
  – to maintain the catalog …
    • the ‘Acquire_Catalog’ administrative tool intelligently ‘harvests’ metadata from the data-files themselves
    • the Data Manager tool also uses intelligent analysis of data-files to assist manual administrative tasks.
  – to present ‘on-the-fly’ previews of the data to the end-user client (e.g. thumbprint image displays)
  – to extract subsets of data – either spatial or by selected dataset fields – in order to deliver to the client only that subset of data requested; in effect, reducing download time by excluding data not required by the client
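Subset extraction, both spatial and by selected fields, can be sketched on an in-memory point dataset as below. This only illustrates the principle; the real work is done by the Intrepid Processes on binary datasets, and the field names used here are hypothetical.

```python
def extract_subset(rows, west, south, east, north, fields,
                   lon_field="longitude", lat_field="latitude"):
    """Keep only points inside the window, and only the requested fields."""
    subset = []
    for row in rows:
        inside = (west <= row[lon_field] <= east
                  and south <= row[lat_field] <= north)
        if inside:
            subset.append({name: row[name] for name in fields})
    return subset


# Hypothetical line data: position, magnetic reading, flight-line number
ROWS = [
    {"longitude": 138.5, "latitude": -31.0, "mag": 51234.0, "line": 10},
    {"longitude": 139.25, "latitude": -30.5, "mag": 51240.5, "line": 10},
    {"longitude": 141.0, "latitude": -29.0, "mag": 51100.0, "line": 11},
]
```

For example, `extract_subset(ROWS, 138.0, -32.0, 140.0, -30.0, ["longitude", "latitude", "mag"])` returns the first two points without the `line` field, so the client downloads only what was asked for.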
Geospatial Intelligence – File Types
File-type            Back-Office Preview       Web-based Delivery
Vector
  Intrepid DB        Y                         Y
  Geosoft GDB        Y                         Y
  Oracle             Y (… and other RDB’s)     Y
  ESRI Shape files   Y                         Y
  SEG-Y (seismic)    Soon                      Soon
Grid Files
  ERMapper           Y                         Y
  Geosoft            Y                         Y
  netCDF             Y                         Y
Image Files
  GeoTiffs           Y                         Y
  Jpeg (with .jgw)   Y                         Y
  Tiff (with .tfw)   Y                         Y
  ECW + Algorithms   Soon                      Soon
Geospatial Intelligence – File Types
• Although JetStream can treat many file-types intelligently – and extract subsets of data from such files - it is also possible to configure the system to deliver any file
• simply add a file to the catalog … assigning the essential fields of ‘dataset extents’ … and JetStream can deliver that file across the web
• Nominate specific file-extensions to be ‘associated’ files; e.g. a “.doc” or “.pdf” might be ‘report’ files associated with a survey dataset; the files can be ‘associated’ by using the same base-file-name; the data file, and its associated report file, will be web-delivered together
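Matching ‘associated’ files by base-file-name could work along the lines of this sketch. The extension list mirrors the slide’s “.doc” / “.pdf” example; the function name and the file paths are hypothetical, and how JetStream itself is configured is not shown in the source.

```python
from pathlib import PurePosixPath

# Extensions nominated as 'associated' files (from the slide's example)
ASSOCIATED_EXTENSIONS = {".doc", ".pdf"}


def delivery_bundle(data_file, available_files,
                    associated_exts=ASSOCIATED_EXTENSIONS):
    """Return the data file plus any associated files sharing its base name."""
    data = PurePosixPath(data_file)
    bundle = [data_file]
    for name in available_files:
        candidate = PurePosixPath(name)
        same_base = (candidate.parent == data.parent
                     and candidate.stem == data.stem)
        if same_base and candidate.suffix.lower() in associated_exts:
            bundle.append(name)
    return bundle
```

With files `surveys/p123.gdb`, `surveys/p123.pdf` and `surveys/p124.pdf` available, a request for `surveys/p123.gdb` would be delivered together with `surveys/p123.pdf` only.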
JetStream Implementation
Consultation, Customisation, Integration
‘Off-the-Shelf’ Solution ? Yes, but …
• JetStream is an ‘off-the-shelf’ web data-delivery solution … which must be integrated into a corporation’s business
• In our experience, implementation requires …
  – Consultation
  – Integration
    • with existing data-management systems
    • into existing web-interface systems
  – Customisation of the system
    • JetStream is flexible, with many options,
    • Balance this with the value of keeping web-pages simple!
Integration with Legacy Systems
• Consultation needs to identify opportunities to use existing data management systems
• JetStream can use sources of systematic metadata in various ways …
  – JetStream’s Catalog can simply be an existing database table … in Access, Oracle, ASCII, …
  – an existing database table can be used to initially populate a Catalog
  – if a legacy system is maintained … then the Catalog can be regularly refreshed from that system’s database tables
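Refreshing a catalog from a legacy system’s table can be sketched with any SQL database. The example below uses Python’s built-in `sqlite3` purely as a stand-in for Access or Oracle; both table layouts and all column names are assumptions made for illustration.

```python
import sqlite3


def demo_connection():
    """Create an in-memory database with a hypothetical legacy table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE legacy_surveys (
            file_url TEXT, kind TEXT, theme TEXT,
            min_lon REAL, min_lat REAL, max_lon REAL, max_lat REAL);
        CREATE TABLE catalog (
            url TEXT PRIMARY KEY, data_type TEXT, theme TEXT,
            west REAL, south REAL, east REAL, north REAL);
        INSERT INTO legacy_surveys VALUES
            ('https://example.org/data/survey_a', 'vector', 'magnetics',
             138.0, -32.0, 140.0, -30.0),
            ('https://example.org/data/grid_b', 'grid', 'gravity',
             112.0, -44.0, 154.0, -10.0);
    """)
    return conn


def refresh_catalog(conn):
    """Rebuild the catalog from the legacy table; return the record count."""
    with conn:  # commit on success, roll back on error
        conn.execute("DELETE FROM catalog")
        conn.execute("""
            INSERT INTO catalog
                (url, data_type, theme, west, south, east, north)
            SELECT file_url, kind, theme, min_lon, min_lat, max_lon, max_lat
            FROM legacy_surveys
        """)
    return conn.execute("SELECT COUNT(*) FROM catalog").fetchone()[0]
```

Running `refresh_catalog` on a schedule keeps the catalog in step with the legacy system, which remains the place where the metadata are maintained.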
Integration into Existing Web-Pages
• Again – consultation should explore opportunities to build upon any existing investment in an organisation’s web-systems
• For example, PIRSA had developed an Arc-IMS site for map-composition and map-based querying of spatial databases; JetStream was integrated with that system within days of delivery
Integration – PIRSA Example
[Screenshot annotations:]
• Existing Arc-IMS investment in web-pages
• JetStream is added with a single TAB panel
Customisation – Corporate Image
• Despite obvious differences – the GA and PIRSA sites have almost identical JetStream functionality
• JetStream (an ‘off-the-shelf’ solution) can be implemented behind a customised web-page environment … customised to corporate needs … to maintain a corporate image, or integrate with an existing system, etc.
Customisation – Client Service
• Despite being an ‘off-the-shelf’ solution, JetStream is very flexible – and there is considerable scope to tailor the web-page interface to clients’ needs …
• the GA interface, for example, provides access to metadata to assist the client’s selection
• the PIRSA site provides much less metadata
• For a seismic data service, one would want to see seismic line locations at the time of defining an ‘area of interest’ …
Customisation – Client Service
Display seismic line location to assist definition of ‘area of interest’
Customisation – Client Service
Preview SEG-Y image … then request to download the SEG-Y data file
Australian Experience
Client Perspective
Data Provider Perspective
GADDS – Customer Reaction
• Well received by the customer base
  – Clients have expressed satisfaction with the access via universally available web-browsers
  – The simple series of web-pages provides sufficient metadata to facilitate effective dataset selection
  – Clients have found that the dataset delivery mechanism – notification by email, with a URL link – is effective and practical
GADDS – Benefits for GA
• Assists GA in their charter to ‘make data freely available’ …
  – Use of the universally available ‘web-browser’, and delivery via standard web-download protocols, ensures ease-of-access for all. (No proprietary software needed by the client).
  – Clients construct their own queries to find ‘what data are available ?’
GADDS – Benefits for GA
• Reduced data administration overheads
  – The JetStream system facilitates dataset management, with features to assist the maintenance of the system catalogue
  – Significant reduction in clerical staff. Tasks such as answering client queries, taking orders, extracting datasets from archives, arranging delivery … are now automated
PIRSA Experience
• JetStream has improved our efficiencies
• More time is available to add value to the products available
• More and more stakeholders access our potential field data via JetStream.
• Data are also being better managed centrally via JetStream.
Domenic Calandro, Manager, Geoscience Datasets
PIRSA Experience – Client View
• JetStream is being very well received by our stakeholders, particularly international users, and users with broadband internet access
• JetStream definitely contributed to our "number 1 status in the world" for delivery of pre-competitive data (Report of the independent Fraser Institute)
Domenic Calandro, Manager, Geoscience Datasets
Fraser Institute Survey: 100% of respondents considered the South Australian geoscience databases to encourage exploration investment
Future Vision
Distributed Systems
Distributed Data Management
• We see the JetStream system as having a ‘data management’ function in addition to web-data-delivery
• In the context of data-management it is essential to think in terms of distributed systems
• Today many organisations operate on a ‘distributed’ basis – with authority and responsibility distributed to regions. It is frequently impractical to centralise the management of data in such organisations
Distributed Data Management
• The main advantage of distributed management of data comes from the ‘divide and conquer’ principle …
  • the ‘problem’ remains small !
  • the regional office has a greater interest in the management of their data
• At the same time, however, clients in other parts of the organisation may want to know ‘what data are available’ … and request a copy of those data … so …
Data Delivery in a Distributed World
• The goal …
  manage data locally
  access data globally
JetStream in a Distributed World
The data, Intrepid Processes and the Catalog can be distributed
[Schematic diagram; labels: Web Server / JetStream Server (Apache Tomcat or equivalent, JetStream Processes); JetCat Catalog; OpenDAP Drivers; Intrepid Processes; Data Administrator; Binary Datasets]
JetStream in a Distributed World
The data, Intrepid Processes and the Catalog can be distributed
[Schematic diagram; labels: Web Server / JetStream Server (Apache Tomcat or equivalent, JetStream Processes); OpenDAP Drivers; Data Administrator; three remote nodes, each labelled Catalog, Intrepid, Binary Datasets]
JetStream in a Distributed World
• Advantages
  • Local management of datasets is efficient
  • No administrative overhead of centralisation
  • Can access data globally – only accessing it when I want it
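‘Manage data locally, access data globally’ amounts to sending one query to several regional catalogs and merging the answers. The sketch below shows that pattern with in-memory catalogs standing in for remote servers; the node names, record fields and URLs are hypothetical, and a real system would query each node over the network (the slides name OpenDAP drivers for this role).

```python
def search_node(catalog, west, south, east, north):
    """Query one locally managed catalog for datasets overlapping the area."""
    return [
        record for record in catalog
        if record["west"] <= east and record["east"] >= west
        and record["south"] <= north and record["north"] >= south
    ]


def federated_search(nodes, west, south, east, north):
    """Send the same query to every node and merge the tagged results."""
    results = []
    for node_name in sorted(nodes):
        for record in search_node(nodes[node_name], west, south, east, north):
            results.append({"node": node_name, **record})
    return results


# Hypothetical regional nodes, each maintaining its own catalog
NODES = {
    "region_a": [
        {"url": "https://a.example.org/survey_1",
         "west": 138.0, "south": -32.0, "east": 140.0, "north": -30.0},
    ],
    "region_b": [
        {"url": "https://b.example.org/survey_2",
         "west": 139.5, "south": -31.0, "east": 142.0, "north": -28.0},
        {"url": "https://b.example.org/survey_3",
         "west": 150.0, "south": -28.0, "east": 152.0, "north": -26.0},
    ],
}
```

Each result carries the name of the node that holds the data, so a later download request can be routed back to the office that manages that dataset.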
Extending the Vision
• The ultimate goal is to link from one distributed network to other networks …
• e.g. “I don’t need to manage those data; the European Union is managing it for me … and I’ll go back and get it when I want it”
Extending the Vision
• A geologist of Global Petroleum Inc. constructs a query to locate seismic data in the North Sea – and queries the corporate net
• The query delivers metadata information and download options back to the desktop – from multiple distributed data repositories
• The query construct is forwarded to a special port of the EuroSeismic network
[Diagram labels: Global Petroleum Inc.; EuroSeismic Network]
Summary
• JetStream uses the standard protocols of the web (not re-inventing wheels)
• For geo-spatial data – we can be intelligent about it; we know its extent, we can extract subsets out from it, we have tools to assist data management (SEG-Y ? Soon)
• We integrate with existing data management, with existing web-systems; we don’t replace, but build-on existing investment
• We like the KISS principle … Keep It Simple, Simon!
• Web-delivery is providing client satisfaction, and benefits to the data-provider – improved client service, improved data management, scope to value-add, reduced costs
• We believe that distributed data management is practical, and are confident that JetStream can deliver global access with such distributed systems
Acknowledgements
• Thank you for the opportunity to make this presentation to you today
• Thanks also to …
  • Geoscience Australia
    – http://www.geoscience.gov.au/gadds
  • Primary Industry & Resources, South Australia
    – http://www.pir.sa.gov.au/pages/minerals/sarig/sarig.htm