The Language Archive – Max Planck Institute for Psycholinguistics Nijmegen, The Netherlands A CMD Core Model for CLARIN Web Services Menzo Windhouwer, Daan Broeder, Dieter van Uytvanck

A CMD Core Model for CLARIN Web Services

Jan 16, 2015


Presentation at the Metadata2012 workshop at LREC12 in Istanbul, Turkey, May 22, 2012
Transcript
Page 1: A CMD Core Model for CLARIN Web Services

The Language Archive – Max Planck Institute for Psycholinguistics

Nijmegen, The Netherlands

A CMD Core Model for CLARIN Web Services

Menzo Windhouwer, Daan Broeder, Dieter van Uytvanck

Page 2: A CMD Core Model for CLARIN Web Services

LREC Metadata 2012 workshop 2

CLARIN vision

Our vision is that the resources for processing language, the data to be processed as well as appropriate guidance, advice and training be made available and can be accessed over a distributed network from the user's desktop. CLARIN proposes to make this vision a reality: the user will have access to guidance and advice through distributed knowledge centres, and via a single sign-on the user will have access to repositories of data with standardized descriptions, processing tools ready to operate on standardized data, and all of this will be available on the internet using a service oriented architecture based on secure grid technologies.

http://www.clarin.eu/external/index.php?page=about-clarin&sub=0

22 May 2012

Page 3: A CMD Core Model for CLARIN Web Services

Outline

• Web Service architectures
• Component Metadata Infrastructure (CMDI)
• National CLARIN initiatives
• CMD core model for Web Services
• Usage of the core model
• Future work and conclusions

Page 4: A CMD Core Model for CLARIN Web Services

Service

service: tokenize
input: text=‘Welcome to Istanbul’, separator=whitespace
output: tokens=[‘Welcome’, ‘to’, ‘Istanbul’]
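The service on this slide can be pictured as an ordinary function. The following is only a minimal sketch: the function name and the handling of the separator parameter are assumptions for illustration, not part of any CLARIN service.

```python
def tokenize(text: str, separator: str = "whitespace") -> list[str]:
    """Split text into tokens; only the 'whitespace' separator is sketched."""
    if separator != "whitespace":
        # Other separators are outside the scope of this illustration.
        raise ValueError(f"unsupported separator: {separator}")
    return text.split()


tokens = tokenize("Welcome to Istanbul")
print(tokens)
```

The remaining slides are about how such a function is exposed over HTTP and how its input and output are described.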

Page 5: A CMD Core Model for CLARIN Web Services

Web Service

web server: http://www.example.com/
service: tokenize
input: text=‘Welcome to Istanbul’, separator=whitespace
output: tokens=[‘Welcome’, ‘to’, ‘Istanbul’]

How to invoke a service and pass the parameters? Different Service Oriented Architectures.

Page 6: A CMD Core Model for CLARIN Web Services

SOA: RESTful Resource Orientation

web server: http://www.example.com/
service: tokenize
input: text=‘Welcome to Istanbul’, separator=whitespace
output: tokens=[‘Welcome’, ‘to’, ‘Istanbul’]

• Resource oriented instead of service oriented
• A URL tells what resource to operate on
• Uses HTTP verbs (PUT, GET, POST, DELETE) to tell what to do

1. create a resource
PUT http://www.example.com/text
Content-type: text/plain

Welcome to Istanbul

response
201
Location: http://www.example.com/text/123

2. request tokens resource
GET http://www.example.com/text/123/tokens/whitespace
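The two steps can be sketched with Python's standard library. The requests are only constructed, not sent, because www.example.com is a placeholder host; the helper names are assumptions made for this illustration.

```python
from urllib.request import Request

BASE = "http://www.example.com"  # placeholder host taken from the slide


def create_text_request(text: str) -> Request:
    """Step 1: PUT the plain text to create a text resource."""
    return Request(
        f"{BASE}/text",
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain"},
        method="PUT",
    )


def tokens_request(text_location: str, separator: str = "whitespace") -> Request:
    """Step 2: GET the tokens sub-resource of the created text resource."""
    return Request(f"{text_location}/tokens/{separator}", method="GET")


put = create_text_request("Welcome to Istanbul")
# In a real exchange the Location header of the 201 response supplies this URL.
get = tokens_request(f"{BASE}/text/123")
print(put.get_method(), put.full_url)
print(get.get_method(), get.full_url)
```

Passing either request to urllib.request.urlopen would perform the call against a server that actually offers these resources.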

Page 7: A CMD Core Model for CLARIN Web Services

SOA: Remote Procedure Call

web server: http://www.example.com/
service: tokenize
input: text=‘Welcome to Istanbul’, separator=whitespace
output: tokens=[‘Welcome’, ‘to’, ‘Istanbul’]

• XML-RPC and SOAP are HTTP-oriented RPC standards
• A URL may function as an endpoint for several operations
• Uses a standard envelope format to tell what to do

POST http://www.example.com/services
Content-type: text/xml

<methodCall>
  <methodName>tokenize</methodName>
  <params>
    <param><value><string>Welcome to Istanbul</string></value></param>
    <param><value><string>whitespace</string></value></param>
  </params>
</methodCall>
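An envelope of this shape does not have to be written by hand. As a small sketch, Python's standard xmlrpc.client module can serialize the call and parse it back; nothing is sent over the network here.

```python
import xmlrpc.client

# Serialize a call to the 'tokenize' method with two positional string parameters.
payload = xmlrpc.client.dumps(
    ("Welcome to Istanbul", "whitespace"),
    methodname="tokenize",
)
print(payload)

# Parsing the envelope again recovers the parameters and the method name.
params, method = xmlrpc.client.loads(payload)
```

Against a real endpoint, xmlrpc.client.ServerProxy would build and POST the same envelope.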

Page 8: A CMD Core Model for CLARIN Web Services

SOA: REST-RPC hybrid

web server: http://www.example.com/
service: tokenize
input: text=‘Welcome to Istanbul’, separator=whitespace
output: tokens=[‘Welcome’, ‘to’, ‘Istanbul’]

• Mixes REST and RPC
• Can be more service than resource oriented
• URL indicates what operation to perform on which data

GET http://www.example.com/tokenize?text=Welcome+to+Istanbul&separator=whitespace
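In the hybrid style the whole invocation is encoded in the URL. A minimal sketch of building that URL with the standard library follows; the helper name is an assumption made for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def tokenize_url(text: str, separator: str = "whitespace") -> str:
    """Encode the operation name in the path and the parameters in the query."""
    query = urlencode({"text": text, "separator": separator})
    return f"http://www.example.com/tokenize?{query}"


url = tokenize_url("Welcome to Istanbul")
print(url)

# The server side can recover the parameters from the query string.
print(parse_qs(urlsplit(url).query))
```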

Page 9: A CMD Core Model for CLARIN Web Services

Interface Description Language

• RPC architectures tend to have an IDL, which describes which operations are available at the endpoint
  – SOAP: Web Services Description Language
    • WSDL (2)
• For REST and REST-RPC hybrids an IDL is controversial
  – Once you have the resource URL you can ‘just’ follow the links, e.g., like a web crawler does with HTML pages
  – However, REST(-RPC) allows too much freedom for a machine to infer how to retrieve a resource or invoke a service
    • RFC 6570: URI Template
    • WADL (old W3C submission by Sun)
    • WSDL 2
    • ...
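To give an impression of what a URI Template adds, here is a rough sketch of the simplest form of expansion in RFC 6570 (Level 1, plain {name} expressions). The template string is a hypothetical description of the tokens resource from the REST slide, and operators, prefixes and list values from the RFC are deliberately not handled.

```python
import re
from urllib.parse import quote


def expand_level1(template: str, variables: dict) -> str:
    """Expand {name} expressions (Level 1 simple string expansion only)."""

    def substitute(match: re.Match) -> str:
        value = variables.get(match.group(1))
        # Undefined variables expand to the empty string; everything outside
        # the unreserved character set is percent-encoded.
        return "" if value is None else quote(str(value), safe="")

    return re.sub(r"\{([A-Za-z0-9_]+)\}", substitute, template)


# Hypothetical template for the tokens resource.
TEMPLATE = "http://www.example.com/text/{id}/tokens/{separator}"
print(expand_level1(TEMPLATE, {"id": 123, "separator": "whitespace"}))
```

With such a template a client knows which variables to fill in without having to crawl links first.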

Page 10: A CMD Core Model for CLARIN Web Services

Profile matching

• To place Web Services in a chain or a workflow a user can be supported by profile matching
  – which service can operate on the input the user currently has available
• The IDL describes the technical needs to invoke a service, but profile matching needs more semantic information
  – it’s not useful to invoke the tokenizer on a project name, although it is a string of characters
  – next to a technical description also a semantic description is needed

Page 11: A CMD Core Model for CLARIN Web Services

National CLARIN initiatives - Spain

• IULA at UPF (continuation in PANACEA)
• architecture: RPC (SOAP)
• IDL: WSDL
• semantic description:
  – SoapLab 2 and myGrid inspired
  – a CMD profile has been created

http://www.panacea-lr.eu/

Page 12: A CMD Core Model for CLARIN Web Services

National CLARIN initiatives - Germany

• WebLicht (D-SPIN continuation in CLARIN-D)
• architecture: REST-RPC
• IDL: none as there is a known pattern, i.e., POST TCF documents
• semantic description:
  – WebLicht used a proprietary service description
  – WebLicht 2.0 uses a core model compliant CMD profile

http://clarin-d.net/index.php/en/language-resources/weblicht-en

Page 13: A CMD Core Model for CLARIN Web Services

National CLARIN initiatives – Netherlands and Flanders

• TTNWW project
• architecture: any
• IDL: when available and supported by the framework (Taverna)
• semantic description:
  – a CMD profile has been created
  – a core model compliant CMD profile has been created

http://www.clarin.nl/group/76#TTNWW

Page 14: A CMD Core Model for CLARIN Web Services

A CMD core model for CLARIN Web Services

• Several CMD profiles have been created by the national initiatives
  – large overlap due to common area:
    • service
    • input and output specifications
  – differences due to design choices:
    • multiple operations per description
    • separate technical description (IDL) or none at all
    • handling of embedded parameters
    • ...
• The CMD core model aims to align these profiles, but also allows extensions to accommodate differences in design choices

Page 15: A CMD Core Model for CLARIN Web Services

Additional aims for the core model

• The core model should preferably provide enough information to
  – do (basic) profile matching
  – invoke a service
• This should allow (national) CLARIN web service chaining and workflow engines to potentially use all CLARIN web services

Page 16: A CMD Core Model for CLARIN Web Services

UML model for Web Services

Service
+Name : string [1]
+Description : string [0..1]
+ServiceDescriptionLocation : URI [1]
+Operations : Operation [1..*] {unique}

Operation
+Name : string [1]
+Description : string [0..1]
+Input : AbstractParameter [0..*]
+Output : AbstractParameter [1..*]

AbstractParameter (abstract)
+Name : string [1]
+Description : string [0..1]
+MIMEType : MIME [0..1]
+DataType : datatype [0..1]
+SemanticType : string [0..1]
+DataCategory : URI [0..1]

Parameter (subclass of AbstractParameter)
+Values : ParameterValue [0..*] {unique}
+isConfigurationParameter : boolean [0..1] = false

ParameterGroup (subclass of AbstractParameter)
+Parameters : Parameter [1..*] {unique}

ParameterValue
+Value : string [1]
+Description : string [0..1]
+DataCategory : URI [0..1]

Notes in the diagram:
• The Operation Name should be resolvable in the Service Description.
• When the Parameter (Group) has a direct correspondent in the Service Description the Name should be resolvable there.
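As an illustration only, the classes of the UML model can be mirrored in Python dataclasses. The attribute names follow the diagram; the lower-case spelling, the example service, its description URL and the choice to mark the separator as a configuration parameter are assumptions, and the cardinalities appear only as comments rather than being enforced.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ParameterValue:
    value: str
    description: Optional[str] = None
    data_category: Optional[str] = None  # URI


@dataclass
class AbstractParameter:
    name: str
    description: Optional[str] = None
    mime_type: Optional[str] = None
    data_type: Optional[str] = None
    semantic_type: Optional[str] = None
    data_category: Optional[str] = None  # URI


@dataclass
class Parameter(AbstractParameter):
    values: List[ParameterValue] = field(default_factory=list)  # 0..*
    is_configuration_parameter: bool = False


@dataclass
class ParameterGroup(AbstractParameter):
    parameters: List[Parameter] = field(default_factory=list)  # 1..* in the model


@dataclass
class Operation:
    name: str
    description: Optional[str] = None
    input: List[AbstractParameter] = field(default_factory=list)   # 0..*
    output: List[AbstractParameter] = field(default_factory=list)  # 1..* in the model


@dataclass
class Service:
    name: str
    service_description_location: str  # URI of the technical description
    description: Optional[str] = None
    operations: List[Operation] = field(default_factory=list)  # 1..* in the model


# Hypothetical description of the tokenize example from the earlier slides.
tokenizer = Service(
    name="tokenize",
    service_description_location="http://www.example.com/tokenize.wadl",  # made up
    operations=[Operation(
        name="tokenize",
        input=[
            Parameter(name="text", mime_type="text/plain"),
            Parameter(name="separator", is_configuration_parameter=True,
                      values=[ParameterValue("whitespace")]),
        ],
        output=[Parameter(name="tokens")],
    )],
)
print(tokenizer.operations[0].input[1].values[0].value)
```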

Page 17: A CMD Core Model for CLARIN Web Services

Salient points

• A (technical) service description is mandatory, but the model doesn’t prescribe an IDL
• The location/endpoint of the service is part of the technical description, i.e., only the PID/URL of the service description is part of the semantic description
• A description can cover multiple operations
• Parameters might be (deeply) embedded in a technical input document, e.g., the token or lemma layer inside a TCF document; this is covered by parameter groups
• Names of operations, parameters and/or groups in the semantic description should be resolvable in the technical description, so after profile matching it is known how parameters should technically be passed on during invocation
• Supports parameter (profile) matching on various semantic levels: MIME type, data type, data category, semantic type

Page 18: A CMD Core Model for CLARIN Web Services

Parameter matching

• MIME type reveals the media type: text/plain
• Data type is generally an XML Schema data type: ID
• Data category is generally an ISOcat data category PID: /project id/ (DC-2535)
• Semantic type is generally a service specific type: clam.project.adelheid
• The tokenize service could specify text/plain as its input MIME type, but an Adelheid project name, the output of the Adelheid create project service, would still not be proper input
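The difference between matching on the MIME type alone and matching on all four levels can be sketched as follows. The matching rule (every level the input specifies must be met by the offered output) and the semantic type given to the tokenizer input are assumptions made for this illustration, not rules laid down by the core model.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

LEVELS = ("mime_type", "data_type", "data_category", "semantic_type")


@dataclass(frozen=True)
class ParamType:
    mime_type: Optional[str] = None
    data_type: Optional[str] = None
    data_category: Optional[str] = None
    semantic_type: Optional[str] = None


def matches(offered: ParamType, wanted: ParamType,
            levels: Tuple[str, ...] = LEVELS) -> bool:
    """True if 'offered' satisfies every considered level that 'wanted' specifies."""
    for level in levels:
        required = getattr(wanted, level)
        if required is not None and getattr(offered, level) != required:
            return False
    return True


# Output of the Adelheid create project service, using the values from the slide.
project_name = ParamType("text/plain", "ID", "DC-2535", "clam.project.adelheid")
# Input of the tokenizer; the semantic type is a made-up label.
tokenizer_input = ParamType(mime_type="text/plain", semantic_type="example.running-text")

print(matches(project_name, tokenizer_input, levels=("mime_type",)))
print(matches(project_name, tokenizer_input))
```

On the MIME level alone the project name looks like acceptable input; once the semantic type is taken into account it is rejected.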

Page 19: A CMD Core Model for CLARIN Web Services

From UML to CMD

• Transformation to deal with inheritance:
  – each non-abstract class becomes a component
  – each atomic attribute becomes an element, but
  – each referential attribute becomes a component with the referred class as a child component, except
    – when this class is abstract all non-abstract subclasses become child components
  – copy cardinality constraints where possible

http://catalog.clarin.eu/ds/ComponentRegistry?item=clarin.eu:cr1:p_1311927752335
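The transformation rules can be tried out on a small fragment of the model. This is a simplified sketch under several assumptions: the data structures are invented for illustration, only direct subclasses are considered, and the UML cardinality is simply recorded on the wrapping component instead of being redistributed as in the actual CMD profile.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class UmlClass:
    name: str
    abstract: bool = False
    parent: str = ""  # name of the superclass, if any
    # attribute name -> (type name, cardinality)
    attributes: Dict[str, Tuple[str, str]] = field(default_factory=dict)


def all_attributes(model: Dict[str, UmlClass], cls: UmlClass) -> Dict[str, Tuple[str, str]]:
    """Own attributes plus inherited ones, i.e. inheritance is flattened."""
    inherited = all_attributes(model, model[cls.parent]) if cls.parent else {}
    return {**inherited, **cls.attributes}


def concrete_subclasses(model: Dict[str, UmlClass], name: str) -> List[UmlClass]:
    return [c for c in model.values() if c.parent == name and not c.abstract]


def to_component(model: Dict[str, UmlClass], cls: UmlClass) -> dict:
    component = {"component": cls.name, "elements": {}, "children": []}
    for attr, (type_name, card) in all_attributes(model, cls).items():
        if type_name not in model:
            # Atomic attribute: becomes an element.
            component["elements"][attr] = (type_name, card)
            continue
        # Referential attribute: becomes a component wrapping the referred
        # class, or all its non-abstract subclasses when that class is abstract.
        target = model[type_name]
        targets = concrete_subclasses(model, type_name) if target.abstract else [target]
        component["children"].append({
            "component": attr,
            "cardinality": card,
            "children": [to_component(model, t) for t in targets],
        })
    return component


MODEL = {c.name: c for c in [
    UmlClass("Operation", attributes={"Name": ("string", "1"),
                                      "Input": ("AbstractParameter", "0..*")}),
    UmlClass("AbstractParameter", abstract=True, attributes={"Name": ("string", "1")}),
    UmlClass("Parameter", parent="AbstractParameter",
             attributes={"isConfigurationParameter": ("boolean", "0..1")}),
    UmlClass("ParameterGroup", parent="AbstractParameter",
             attributes={"Parameters": ("Parameter", "1..*")}),
]}

operation = to_component(MODEL, MODEL["Operation"])
print([child["component"] for child in operation["children"][0]["children"]])
```

The abstract AbstractParameter class itself never shows up as a component; its attributes reappear as elements of Parameter and ParameterGroup.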

Page 20: A CMD Core Model for CLARIN Web Services

Service
• Name: string
• Description: string?
• ServiceDescriptionLocation: resource

  Operations
    Operation
    • Name: string
    • Description: string?

      Input / Output
        ParameterGroup
        • Name: string
        • Description: string?
        • MIMEType: string?
        • DataType: string?
        • DataCategory: anyURI?
        • SemanticType: string?

          Parameters
            Parameter (as below)

        Parameter
        • Name: string
        • Description: string?
        • MIMEType: string?
        • DataType: string?
        • DataCategory: anyURI?
        • SemanticType: string?
        • isConfigurationParameter: boolean?

          Values
            ParameterValue
            • Value: string
            • Description: string?
            • DataCategory: anyURI?

[Diagram: component tree; the nested components carry the occurrence markers +, ? and *.]

Page 21: A CMD Core Model for CLARIN Web Services

Usage of the CMD core model

• The core model is only a starting point, i.e., it provides enough information for basic profile matching and technical invocation
• It is a template that forms a basis for the CMD profiles of specific web service registries, i.e., the model can be extended
• However, instantiations should also maintain compliance with the core model
  – cardinalities should be within the boundaries of the core model, e.g., mandatory elements cannot become optional
  – closed value domains cannot be extended, but open value domains can be turned into closed ones
  – data category references should not be changed as this would imply different semantics
• CLARIN-NL ToolService profile is a compliant extension: http://catalog.clarin.eu/ds/ComponentRegistry?item=clarin.eu:cr1:p_1311927752306
• Validate compliance of instances to the core model: http://www.isocat.org/clarin/ws/cmd-core/
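The cardinality rule can be made concrete with a small check. This is not the validator behind the URL above, just a sketch that assumes cardinalities are written as on the UML slide ('1', '0..1', '1..*').

```python
from typing import Optional, Tuple


def parse_cardinality(text: str) -> Tuple[int, Optional[int]]:
    """Turn '1', '0..1' or '1..*' into (min, max); max is None when unbounded."""
    low, _, high = text.partition("..")
    if not high:
        high = low  # a single number means min == max
    return int(low), (None if high == "*" else int(high))


def within(core: str, extended: str) -> bool:
    """True if the extended cardinality stays inside the core model's bounds."""
    core_min, core_max = parse_cardinality(core)
    ext_min, ext_max = parse_cardinality(extended)
    if ext_min < core_min:
        return False  # e.g. a mandatory element made optional
    if core_max is None:
        return True  # the core model sets no upper bound
    return ext_max is not None and ext_max <= core_max


print(within("1", "0..1"))     # mandatory made optional
print(within("0..1", "1"))     # optional made mandatory
print(within("1..*", "1..3"))  # unbounded list restricted
```

Tightening a cardinality keeps an extension compliant, while loosening it does not.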

Page 22: A CMD Core Model for CLARIN Web Services

Current status and future work

• Current state:
  – There are some compliant profiles:
    • CLARIN-NL ToolService profile
      – but not in use by TTNWW yet
    • WebLicht 2.0 profile
      – in use, but still missing technical description (WADL)
  – The core model is successful if there is a workflow/chaining engine invoking web services which were originally targeted at another engine or none at all
• Future work:
  – Complex chains of web services are captured in (mini) workflows
    • can this core model also describe the mini workflows?
  – Alignment with or reuse by other initiatives
    • e.g., META-SHARE meta model is also based on components and ISOcat, and contains a section on Tools and Services
  – Identify common extensions and incorporate them into the core
    • e.g., default values, cardinalities, asynchronicity

Page 23: A CMD Core Model for CLARIN Web Services

Thanks for your attention!

Please visit: http://www.isocat.org/clarin/ws/cmd-core/