Jan 17, 2016
17 March 2008 Standards for Interoperable Grids 1
Data Management
Standards for Interoperable Grids: Experience from NextGRID and OMII-Europe
Clive Davenhall
National e-Science Centre, University of Edinburgh
Data Management: Overview
Manipulation and management of data, typically including:
Processing: job execution, BES, JSDL
Transfer
Storage
Access
OGSA Standards
There are a number of OGSA data management standards:
DMI: data transfer.
ByteIO: data access (file-like), data transfer.
WS-DAI: data access (database-like).
Can be used individually or in concert with other OGSA standards.
OGSA-DMI
DMI: Data Management Interface.
Not yet a specification; still a draft: currently receiving public comments, completion is imminent.
A standard mechanism for moving data between locations: from a source of data, to a sink (or destination) of data.
OGSA-DMI Architecture
A standard structure or interface that various resources can use to interoperate.
Supports a variety of protocols for the actual data transfer: GridFTP, file access, OGSA-ByteIO, SRB.
Supports ‘third party’ transfers: a superintending process initiates a transfer from a remote source to a remote sink.
Only concerned with moving bytes from the source to the sink: not concerned with the semantics or structure of the data, though future versions might be.
Port Types
DMI: a mechanism for scheduling and managing data transfers. Provides two port types. Uses the factory pattern.
DTF (Data Transfer Factory): the client invokes a DTF to create a DTI.
DTI (Data Transfer Instance): a service created to perform a specific transfer.
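The factory relationship between the two port types can be sketched in a few lines of Python. This is only an illustration of the pattern, not the DMI interface itself: the class names, the `create_transfer` method and the example URLs are all hypothetical.

```python
import itertools


class DataTransferInstance:
    """Hypothetical stand-in for a DTI: one object per transfer."""

    def __init__(self, transfer_id, source, sink):
        self.transfer_id = transfer_id
        self.source = source
        self.sink = sink


class DataTransferFactory:
    """Hypothetical stand-in for a DTF: its only job is to create DTIs."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self.instances = {}

    def create_transfer(self, source, sink):
        # Each request yields a fresh instance dedicated to one transfer.
        dti = DataTransferInstance(next(self._next_id), source, sink)
        self.instances[dti.transfer_id] = dti
        return dti


factory = DataTransferFactory()
first = factory.create_transfer("gridftp://source.example/a", "gridftp://sink.example/a")
second = factory.create_transfer("gridftp://source.example/b", "gridftp://sink.example/b")
```

The client only ever talks to the factory to start a transfer; everything specific to one transfer lives in the instance it gets back.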
DTI Operations
A DTI (Data Transfer Instance) will support the following operations:
Start
Activate
Stop
Resume
Suspend
GetState
GetInstanceAttributeDocument
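One way to picture how such operations interact is as a small state machine. The sketch below is an assumption made for illustration only: the state names and the allowed transitions are invented here and are not taken from the DMI draft. It also models only Start, Suspend, Resume, Stop and GetState, since the slide does not say what Activate or GetInstanceAttributeDocument do.

```python
class TransferStateError(Exception):
    """Raised when an operation is not allowed in the current state."""


# Hypothetical transition table: (current state, operation) -> next state.
TRANSITIONS = {
    ("Created", "Start"): "Transferring",
    ("Transferring", "Suspend"): "Suspended",
    ("Suspended", "Resume"): "Transferring",
    ("Transferring", "Stop"): "Stopped",
    ("Suspended", "Stop"): "Stopped",
}


class TransferLifecycle:
    def __init__(self):
        self._state = "Created"

    def get_state(self):
        return self._state

    def apply(self, operation):
        key = (self._state, operation)
        if key not in TRANSITIONS:
            raise TransferStateError(f"{operation} not allowed in state {self._state}")
        self._state = TRANSITIONS[key]
        return self._state


transfer = TransferLifecycle()
history = [transfer.apply(op) for op in ("Start", "Suspend", "Resume", "Stop")]
```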
Sources and Sinks
Source: emits an ordered sequence of bytes.
Sink: receives an ordered sequence of bytes.
For a resource to act as a source or sink in a DMI transfer it must:
Provide suitable services to send or receive data.
Furnish a list of protocols that it can use.
Information about how data are to be sent or received is encapsulated in a DEPR (Data Endpoint Reference).
DEPR
DEPR: Data Endpoint Reference.
Encapsulates all the information about:
How data in a source are to be accessed.
How data sent to a sink are to be received.
Includes all the transport protocols supported by a source or sink.
Contains endpoint references to access the data.
In future versions these endpoint references will use WS-Addressing.
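Because each DEPR lists the protocols its endpoint supports, a transfer can only go ahead over a protocol that the source and the sink have in common. The following sketch shows that selection step under stated assumptions: the `DataEndpoint` class, its fields and the preference-order rule are hypothetical and are not the DEPR schema.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class DataEndpoint:
    """Hypothetical, much-reduced picture of what a DEPR carries."""
    address: str
    protocols: Tuple[str, ...]  # assumed to be in order of preference


def choose_protocol(source: DataEndpoint, sink: DataEndpoint) -> Optional[str]:
    """Return the first protocol of the source that the sink also supports."""
    for protocol in source.protocols:
        if protocol in sink.protocols:
            return protocol
    return None


source = DataEndpoint("https://source.example/data", ("GridFTP", "OGSA-ByteIO"))
sink = DataEndpoint("https://sink.example/data", ("SRB", "OGSA-ByteIO"))
isolated = DataEndpoint("https://other.example/data", ("SRB",))

common = choose_protocol(source, sink)
none_in_common = choose_protocol(source, isolated)
```

Here the only shared protocol is OGSA-ByteIO, so that is the one selected; with no overlap the function returns `None` and the transfer could not proceed.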
NextGRID Recommendations
Resources should be modelled as WS-resources.
Transfers must be implemented as ‘Logical Data Transfers’ (the most flexible of several options available).
Prescribes a mechanism to query the protocols available to a source or sink.
OGSA-ByteIO must be one of the protocols available to both the source and sink.
OGSA Data Management Standards
DMI: data transfer.
ByteIO: data access (file-like), data transfer.
WS-DAI: data access (database-like).
OGSA ByteIO
POSIX-like access to remote resources.
The remote resource can be any source of data: files, sensors, live-data streams, etc…
Aims to provide access transparency.
Mapping to Web Services
Core OGSA ByteIO Specification: independent of any basic profile.
ByteIO OGSA WSRF Basic Profile Rendering: mapping to the WSRF Basic Profile.
Currently WSRF is the only mapping.
Others are anticipated.
ByteIO Access Methods
Two access methods, implemented as port types. Each is optional.
RandomByteIO: direct random access to a portion of a data resource. The portion to access is specified as an offset from the start of the resource.
StreamableByteIO: streamed access to a data resource. Each access is relative to the previous access.
RandomByteIO
read(startOffset: unsignedLong, bytesPerBlock: unsignedInt, numBlocks: unsignedInt, stride: long): byte[]
write(startOffset: unsignedLong, bytesPerBlock: unsignedInt, stride: long, data: byte[]): void
append(data: byte[]): void
truncAppend(offset: unsignedLong, data: byte[]): void
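The block and stride parameters of `read` are easiest to see on a local byte string. The sketch below assumes that `stride` is the distance in bytes between the starts of consecutive blocks; that reading is not stated on the slide and should be checked against the specification. The function `strided_read` is hypothetical and works on an in-memory `bytes` object, not on a remote resource.

```python
def strided_read(resource: bytes, start_offset: int, bytes_per_block: int,
                 num_blocks: int, stride: int) -> bytes:
    """Gather num_blocks blocks of bytes_per_block bytes each.

    Block i is assumed to start at start_offset + i * stride.
    """
    out = bytearray()
    for i in range(num_blocks):
        begin = start_offset + i * stride
        if begin < 0:
            raise ValueError("block would start before the resource")
        out += resource[begin:begin + bytes_per_block]
    return bytes(out)


data = bytes(range(20))
# Two blocks of three bytes, starting at offsets 2 and 7.
picked = strided_read(data, start_offset=2, bytes_per_block=3, num_blocks=2, stride=5)
# A stride equal to the block size gives one contiguous read.
contiguous = strided_read(data, start_offset=0, bytes_per_block=4, num_blocks=2, stride=4)
```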
RandomByteIO read as XML
<rbyteio:read>
  <rbyteio:start-offset>xsd:unsignedLong</rbyteio:start-offset>
  <rbyteio:bytes-per-block>xsd:unsignedInt</rbyteio:bytes-per-block>
  <rbyteio:num-blocks>xsd:unsignedInt</rbyteio:num-blocks>
  <rbyteio:stride>xsd:long</rbyteio:stride>
  <rbyteio:transfer-information transfer-mechanism="xsd:anyURI">
    byteio:transfer-information-type
  </rbyteio:transfer-information>
</rbyteio:read>
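A request body of this shape could be assembled with Python's standard `xml.etree.ElementTree` module, roughly as below. The element and attribute names follow the template above; the namespace URI and the transfer-mechanism URI are placeholders invented for this sketch, and the real values must be taken from the ByteIO specification.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace, NOT the real ByteIO namespace URI.
RBYTEIO_NS = "urn:example:rbyteio"


def build_read_request(start_offset, bytes_per_block, num_blocks, stride,
                       transfer_mechanism):
    """Build an rbyteio:read element with the four numeric parameters."""
    ET.register_namespace("rbyteio", RBYTEIO_NS)

    def qname(local):
        return f"{{{RBYTEIO_NS}}}{local}"

    root = ET.Element(qname("read"))
    for local, value in (("start-offset", start_offset),
                         ("bytes-per-block", bytes_per_block),
                         ("num-blocks", num_blocks),
                         ("stride", stride)):
        ET.SubElement(root, qname(local)).text = str(value)
    # Left empty for a read; the attribute names the transfer mechanism.
    ET.SubElement(root, qname("transfer-information"),
                  {"transfer-mechanism": transfer_mechanism})
    return root


request = build_read_request(2, 3, 2, 5, "urn:example:transfer-simple")
xml_text = ET.tostring(request, encoding="unicode")
```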
StreamableByteIO
seekRead(offset: long, seekOrigin: URI, bytesToRead: unsignedInt): byte[]
seekWrite(offset: long, seekOrigin: URI, data: byte[]): void
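The difference from RandomByteIO is that a stream keeps a current position, so each call is interpreted relative to a seek origin. A minimal in-memory sketch follows, under these assumptions: three origins (beginning, current, end) identified by placeholder URIs invented here, and a position that advances past whatever was read or written. The class `StreamSketch` is hypothetical.

```python
# Placeholder seek-origin URIs, not the values defined by the specification.
SEEK_BEGINNING = "urn:example:seek-origin:beginning"
SEEK_CURRENT = "urn:example:seek-origin:current"
SEEK_END = "urn:example:seek-origin:end"


class StreamSketch:
    def __init__(self, data: bytes):
        self._data = bytearray(data)
        self._position = 0

    def _seek(self, offset, seek_origin):
        base = {SEEK_BEGINNING: 0,
                SEEK_CURRENT: self._position,
                SEEK_END: len(self._data)}[seek_origin]
        if base + offset < 0:
            raise ValueError("cannot seek before the start of the resource")
        self._position = base + offset

    def seek_read(self, offset, seek_origin, bytes_to_read):
        self._seek(offset, seek_origin)
        chunk = bytes(self._data[self._position:self._position + bytes_to_read])
        self._position += len(chunk)
        return chunk

    def seek_write(self, offset, seek_origin, data):
        self._seek(offset, seek_origin)
        if self._position > len(self._data):
            # Pad with zero bytes when writing beyond the current end.
            self._data.extend(b"\x00" * (self._position - len(self._data)))
        end = self._position + len(data)
        self._data[self._position:end] = data
        self._position = end


stream = StreamSketch(b"abcdefghij")
a = stream.seek_read(0, SEEK_BEGINNING, 3)   # reads from the start
b = stream.seek_read(2, SEEK_CURRENT, 2)     # skips two bytes, then reads
c = stream.seek_read(-2, SEEK_END, 5)        # reads the final two bytes
stream.seek_write(0, SEEK_BEGINNING, b"XY")
d = stream.seek_read(0, SEEK_BEGINNING, 4)
```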
NextGRID Recommendations
Must conform to the WSRF rendering.
Must support RandomByteIO.
Restrictions on naming.
OGSA Data Management Standards
DMI: data transfer.
ByteIO: data access (file-like), data transfer.
WS-DAI: data access (database-like).
OGSA WS-DAI
WS-DAI: Web Service Data Access and Integration.
Access to remote data resources.
Modelled on access to databases of various sorts.
WS-DAI Data Resource Models
The core WS-DAI specification: independent of data model. Implemented as a model-dependent realisation.
WS-DAIR: modelled on access to relational databases. Queries in SQL.
WS-DAIX: modelled on access to XML databases. Queries in XPath, XQuery and XUpdate.
It is anticipated that additional realisations will be developed, e.g. RDF, object databases…
Properties
A WS-DAI resource has a number of properties which a client can interrogate to determine the resource’s characteristics:
DataResourceAbstractName
ParentDataResource
DataResourceManagement
DatasetMap
ConfigurationMap
LanguageMap
DataResourceDescription
Readable
Writeable
ConcurrentAccess
TransactionInitiation
TransactionIsolation
ChildSensitiveToParent
Data Resources
Externally managed resources: data stored using a pre-existing DBMS which has its own existence apart from WS-DAI. WS-DAI gives access to this resource.
Service managed resources: no independent existence. WS-DAI exists to manage the resource. For example, the results of a previous query could be made available as a service-managed resource.
Direct and Indirect Access
Patterns for obtaining the results of queries to a resource.
Direct Access: the results are simply returned in response to the query.
Indirect Access: effectively implements the ‘factory pattern’. The results are not returned in the response to the query. Rather, they are made available as a data resource in their own right.
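The two patterns can be contrasted with a small local sketch that uses an in-memory SQLite database in place of a remote relational resource. Everything here is an assumption made for illustration: the class `RelationalResourceSketch`, its method names, the `jobs` table and the `result-N` naming are hypothetical and are not WS-DAIR operations.

```python
import itertools
import sqlite3


class RelationalResourceSketch:
    def __init__(self):
        self._db = sqlite3.connect(":memory:")
        self._db.execute("CREATE TABLE jobs (id INTEGER, status TEXT)")
        self._db.executemany("INSERT INTO jobs VALUES (?, ?)",
                             [(1, "done"), (2, "running"), (3, "done")])
        self._derived = {}               # results held as resources of their own
        self._names = itertools.count(1)

    def query_direct(self, sql, params=()):
        """Direct access: the rows come back in the response itself."""
        return self._db.execute(sql, params).fetchall()

    def query_indirect(self, sql, params=()):
        """Indirect access: the response only names a new derived resource."""
        name = f"result-{next(self._names)}"
        self._derived[name] = self._db.execute(sql, params).fetchall()
        return name

    def fetch(self, name):
        return self._derived[name]


resource = RelationalResourceSketch()
sql = "SELECT id FROM jobs WHERE status = ? ORDER BY id"
rows = resource.query_direct(sql, ("done",))
handle = resource.query_indirect(sql, ("done",))
later_rows = resource.fetch(handle)
```

With indirect access the caller can pass the handle to another party or come back for the rows later, which is what makes the result a resource in its own right.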
NextGRID Recommendations
WS-DAI access is optional for NextGRID.
Resources should be modelled as WS-resources.
Restrictions on naming.