The Language Archive – Max Planck Institute for Psycholinguistics
Nijmegen, The Netherlands
A CMD Core Model for CLARIN Web Services
Menzo Windhouwer, Daan Broeder, Dieter van Uytvanck
Jan 16, 2015
LREC Metadata 2012 workshop
CLARIN vision
Our vision is that the resources for processing language, the data to be processed as well as appropriate guidance, advice and training be made available and can be accessed over a distributed network from the user's desktop. CLARIN proposes to make this vision a reality: the user will have access to guidance and advice through distributed knowledge centres, and via a single sign-on the user will have access to repositories of data with standardized descriptions, processing tools ready to operate on standardized data, and all of this will be available on the internet using a service oriented architecture based on secure grid technologies.
http://www.clarin.eu/external/index.php?page=about-clarin&sub=0
22 May 2012
Outline
• Web Service architectures
• Component Metadata Infrastructure (CMDI)
• National CLARIN initiatives
• CMD core model for Web Services
• Usage of the core model
• Future work and conclusions
Service
service: tokenize
input: text=‘Welcome to Istanbul’, separator=whitespace
output: tokens=[‘Welcome’, ‘to’, ‘Istanbul’]
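Stripped of any web machinery, the service on this slide is just a function from a text and a separator to a list of tokens. A minimal sketch, assuming that whitespace is the only separator value supported; the function name and signature are illustrative, not taken from a CLARIN service:

```python
from typing import List


def tokenize(text: str, separator: str = "whitespace") -> List[str]:
    """Split text into tokens; only the 'whitespace' separator is sketched."""
    if separator != "whitespace":
        raise ValueError(f"unsupported separator: {separator!r}")
    return text.split()


tokens = tokenize("Welcome to Istanbul")
print(tokens)
```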
Web Service
web server: http://www.example.com/
service: tokenize
input: text=‘Welcome to Istanbul’, separator=whitespace
output: tokens=[‘Welcome’, ‘to’, ‘Istanbul’]
How is a service invoked, and how are the parameters passed?
The answer differs between Service Oriented Architectures.
SOA: RESTful Resource Orientation
• Resource oriented instead of service oriented
• A URL tells what resource to operate on
• Uses HTTP verbs (PUT, GET, POST, DELETE) to tell what to do

1. create a resource
   PUT http://www.example.com/text
   Content-type: text/plain

   Welcome to Istanbul

   response
   201
   Location: http://www.example.com/text/123
2. request tokens resource
   GET http://www.example.com/text/123/tokens/whitespace
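The two steps above can be sketched without a network by modelling the server as an in-memory object. This is only an illustration of the resource-oriented flow: the class and method names are hypothetical, and the identifier counter starts at 123 purely to mirror the slide.

```python
from typing import Dict, List, Tuple

BASE = "http://www.example.com"  # example host from the slide


class TextResources:
    """In-memory stand-in for the server side of the two REST steps."""

    def __init__(self) -> None:
        self._texts: Dict[str, str] = {}
        self._next_id = 123  # hypothetical start value, mirrors the slide

    def put(self, url: str, body: str) -> Tuple[int, Dict[str, str]]:
        """Step 1: create a text resource and report where it lives."""
        if url != f"{BASE}/text":
            return 404, {}
        resource_id = str(self._next_id)
        self._next_id += 1
        self._texts[resource_id] = body
        return 201, {"Location": f"{BASE}/text/{resource_id}"}

    def get(self, url: str) -> Tuple[int, List[str]]:
        """Step 2: derive the tokens resource from a stored text."""
        if not url.startswith(BASE):
            return 404, []
        parts = url[len(BASE):].strip("/").split("/")
        # Expected shape: text/<id>/tokens/whitespace
        if (len(parts) == 4 and parts[0] == "text" and parts[2] == "tokens"
                and parts[3] == "whitespace" and parts[1] in self._texts):
            return 200, self._texts[parts[1]].split()
        return 404, []


server = TextResources()
status, headers = server.put(f"{BASE}/text", "Welcome to Istanbul")
tokens_status, tokens = server.get(headers["Location"] + "/tokens/whitespace")
print(status, headers["Location"], tokens)
```

Note that the client never names an operation: it only addresses resources, and the tokens are themselves a resource derived from the stored text.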
SOA: Remote Procedure Call
• XML-RPC and SOAP are HTTP-oriented RPC standards
• A URL may function as an endpoint for several operations
• Uses a standard envelope format to tell what to do

POST http://www.example.com/services
Content-type: text/xml

<methodCall>
  <methodName>tokenize</methodName>
  <params>
    <param><value><string>Welcome to Istanbul</string></value></param>
    <param><value><string>whitespace</string></value></param>
  </params>
</methodCall>
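The envelope above does not have to be written by hand. As a small sketch, Python's standard xmlrpc.client module can build and decode it; nothing is sent over the network here, since the example.com endpoint is only the placeholder used on the slide.

```python
import xmlrpc.client

# Build the XML-RPC request body for the tokenize call from the slide.
request_body = xmlrpc.client.dumps(
    ("Welcome to Istanbul", "whitespace"), methodname="tokenize"
)
print(request_body)

# What a server would do first: decode the envelope back into a call.
params, method_name = xmlrpc.client.loads(request_body)
print(method_name, params)
```

The operation name travels inside the envelope, which is why a single endpoint URL can serve several operations.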
SOA: REST-RPC hybrid
• Mixes REST and RPC
• Can be more service than resource oriented
• URL indicates what operation to perform on which data

GET http://www.example.com/tokenize?text=Welcome+to+Istanbul&separator=whitespace
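In the hybrid style both the operation and its arguments live in the URL. A short sketch with the standard urllib.parse module shows how such a URL can be built on the client side and taken apart again on the server side; the endpoint is the placeholder from the slide and no request is actually made.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

endpoint = "http://www.example.com/tokenize"  # example URL from the slide
query = urlencode({"text": "Welcome to Istanbul", "separator": "whitespace"})
url = f"{endpoint}?{query}"
print(url)

# Server side: recover the operation name and its parameters from the URL.
parts = urlsplit(url)
operation = parts.path.lstrip("/")
arguments = {name: values[0] for name, values in parse_qs(parts.query).items()}
print(operation, arguments)
```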
Interface Description Language
• RPC architectures tend to have an IDL, which describes which operations are available at the endpoint
  – SOAP: Web Services Description Language
    • WSDL (2)
• For REST and REST-RPC hybrids an IDL is controversial
  – Once you have the resource URL you can ‘just’ follow the links, e.g., like a web crawler does with HTML pages
  – However, REST(-RPC) allows too much freedom for a machine to infer how to retrieve a resource or invoke a service
    • RFC 6570: URI Template
    • WADL (old W3C submission by Sun)
    • WSDL 2
    • ...
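To give an impression of what a URI Template contributes, the sketch below expands the simplest form of RFC 6570 expressions (Level 1, plain {name} substitution). It is not a full implementation of the RFC, and the template string is a hypothetical description of the tokens resource from the REST example.

```python
import re
from urllib.parse import quote


def expand(template: str, variables: dict) -> str:
    """Expand {name} expressions (RFC 6570 Level 1 only)."""

    def substitute(match):
        value = variables.get(match.group(1))
        if value is None:
            return ""  # undefined variables expand to nothing
        # Percent-encode everything outside the unreserved character set.
        return quote(str(value), safe="")

    return re.sub(r"\{([A-Za-z0-9_]+)\}", substitute, template)


# Hypothetical template for the tokens resource of the REST example.
template = "http://www.example.com/text/{id}/tokens/{separator}"
print(expand(template, {"id": 123, "separator": "whitespace"}))
```

With such a template a client no longer has to guess the URL structure, which is the gap an IDL is meant to fill for REST(-RPC) services.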
Profile matching
• To place Web Services in a chain or a workflow, a user can be supported by profile matching
  – which service can operate on the input the user currently has available?
• The IDL describes the technical needs to invoke a service, but profile matching needs more semantic information
  – it’s not useful to invoke the tokenizer on a project name, although it is a string of characters
  – next to a technical description also a semantic description is needed
National CLARIN initiatives - Spain
• IULA at UPF (continuation in PANACEA)
• architecture: RPC (SOAP)
• IDL: WSDL
• semantic description:
  – SoapLab 2 and myGrid inspired
  – a CMD profile has been created
http://www.panacea-lr.eu/
National CLARIN initiatives - Germany
• WebLicht (D-SPIN continuation in CLARIN-D)
• architecture: REST-RPC
• IDL: none as there is a known pattern, i.e., POST TCF documents
• semantic description:
  – WebLicht used a proprietary service description
  – WebLicht 2.0 uses a core model compliant CMD profile
http://clarin-d.net/index.php/en/language-resources/weblicht-en
National CLARIN initiatives – Netherlands and Flanders
• TTNWW project
• architecture: any
• IDL: when available and supported by the framework (Taverna)
• semantic description:
  – a CMD profile has been created
  – a core model compliant CMD profile has been created
http://www.clarin.nl/group/76#TTNWW
A CMD core model for CLARIN Web Services
• Several CMD profiles have been created by the national initiatives
  – large overlap due to common area:
    • service
    • input and output specifications
  – differences due to design choices:
    • multiple operations per description
    • separate technical description (IDL) or none at all
    • handling of embedded parameters
    • ...
• The CMD core model aims to align these profiles, but also allows extensions to accommodate differences in design choices
Additional aims for the core model
• The core model should preferably provide enough information to
  – do (basic) profile matching
  – invoke a service
• This should allow (national) CLARIN web service chaining and workflow engines to potentially use all CLARIN web services
UML model for Web Services
Service
  +Name : string [1]
  +Description : string [0..1]
  +ServiceDescriptionLocation : URI [1]
  +Operations : Operation [1..*] {unique}

Operation
  +Name : string [1]
  +Description : string [0..1]
  +Input : AbstractParameter [0..*]
  +Output : AbstractParameter [1..*]

AbstractParameter (abstract)
  +Name : string [1]
  +Description : string [0..1]
  +MIMEType : MIME [0..1]
  +DataType : datatype [0..1]
  +SemanticType : string [0..1]
  +DataCategory : URI [0..1]

Parameter (subclass of AbstractParameter)
  +Values : ParameterValue [0..*] {unique}
  +isConfigurationParameter : boolean [0..1] = false

ParameterGroup (subclass of AbstractParameter)
  +Parameters : Parameter [1..*] {unique}

ParameterValue
  +Value : string [1]
  +Description : string [0..1]
  +DataCategory : URI [0..1]

Associations:
  Service 1 to 1..* Operation (+Operations)
  Operation 1 to 0..* AbstractParameter (+Input)
  Operation 1 to 1..* AbstractParameter (+Output)
  Parameter 1 to 0..* ParameterValue (+Values)
  ParameterGroup 1 to 1..* Parameter (+Parameters)

Notes:
  The Operation Name should be resolvable in the Service Description.
  When the Parameter (Group) has a direct correspondent in the Service Description the Name should be resolvable there.
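The UML model can be read as a handful of plain data classes. The sketch below is one possible Python rendering, not an official binding: attribute names are converted to snake_case, URIs are kept as strings, AbstractParameter is not enforced to be abstract, and only the two mandatory list cardinalities (Operations and Output) are checked. The tokenizer instance and its service description URL are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ParameterValue:
    value: str
    description: Optional[str] = None
    data_category: Optional[str] = None  # URI


@dataclass
class AbstractParameter:
    name: str
    description: Optional[str] = None
    mime_type: Optional[str] = None
    data_type: Optional[str] = None
    semantic_type: Optional[str] = None
    data_category: Optional[str] = None  # URI


@dataclass
class Parameter(AbstractParameter):
    values: List[ParameterValue] = field(default_factory=list)  # [0..*]
    is_configuration_parameter: bool = False


@dataclass
class ParameterGroup(AbstractParameter):
    parameters: List[Parameter] = field(default_factory=list)  # [1..*]


@dataclass
class Operation:
    name: str
    output: List[AbstractParameter]  # [1..*]
    input: List[AbstractParameter] = field(default_factory=list)  # [0..*]
    description: Optional[str] = None

    def __post_init__(self) -> None:
        if not self.output:
            raise ValueError("an Operation needs at least one Output")


@dataclass
class Service:
    name: str
    service_description_location: str  # URI of the technical description
    operations: List[Operation]  # [1..*]
    description: Optional[str] = None

    def __post_init__(self) -> None:
        if not self.operations:
            raise ValueError("a Service needs at least one Operation")


# Hypothetical description of the tokenizer used on the earlier slides.
tokenize_op = Operation(
    name="tokenize",
    input=[Parameter("text", mime_type="text/plain"),
           Parameter("separator", values=[ParameterValue("whitespace")])],
    output=[Parameter("tokens")],
)
service = Service(
    name="Tokenizer",
    service_description_location="http://www.example.com/service-description",
    operations=[tokenize_op],
)
print(service.operations[0].name, [p.name for p in tokenize_op.input])
```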
Salient points
• A (technical) service description is mandatory, but the model doesn’t prescribe an IDL
• The location/endpoint of the service is part of the technical description, i.e., only the PID/URL of the service description is part of the semantic description
• A description can cover multiple operations
• Parameters might be (deeply) embedded in a technical input document, e.g., the token or lemma layer inside a TCF document; this is covered by parameter groups
• Names of operations, parameters and/or groups in the semantic description should be resolvable in the technical description, so after profile matching it is known how parameters should technically be passed on during invocation
• Supports parameter (profile) matching on various semantic levels: MIME type, data type, data category, semantic type
Parameter matching
• MIME type reveals the media type: text/plain
• Data type is generally an XML Schema data type: ID
• Data category is generally an ISOcat data category PID: /project id/ (DC-2535)
• Semantic type is generally a service specific type: clam.project.adelheid

• The tokenize service could specify text/plain as its input MIME type, but an Adelheid project name, i.e., the output of the Adelheid create project service, would still not be proper input
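The Adelheid example can be made concrete with a small matching function over the four levels. This is a sketch under assumptions: the rule "every level the consumer declares must be equal on the producer side" is one plausible matching policy, not one prescribed by the core model, and the data category given to the tokenizer input is a made-up placeholder.

```python
from dataclasses import dataclass
from typing import Optional

LEVELS = ("mime_type", "data_type", "data_category", "semantic_type")


@dataclass
class Param:
    name: str
    mime_type: Optional[str] = None
    data_type: Optional[str] = None
    data_category: Optional[str] = None
    semantic_type: Optional[str] = None


def matches(produced: Param, expected: Param) -> bool:
    """True when every level the consumer declares is met by the producer.

    A level the producer leaves undeclared counts as a mismatch.
    """
    for level in LEVELS:
        wanted = getattr(expected, level)
        if wanted is not None and getattr(produced, level) != wanted:
            return False
    return True


# Output of the Adelheid create project service, as described on the slide.
project_name = Param("project", mime_type="text/plain", data_type="ID",
                     data_category="DC-2535",
                     semantic_type="clam.project.adelheid")

# Tokenizer input declared on the MIME level only.
mime_only_input = Param("text", mime_type="text/plain")
# Same input with an additional, hypothetical data category.
stricter_input = Param("text", mime_type="text/plain",
                       data_category="example:running-text")

print(matches(project_name, mime_only_input))
print(matches(project_name, stricter_input))
```

Matching on MIME type alone accepts the project name as tokenizer input; once the tokenizer also declares a semantic level, the mismatch becomes visible.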
From UML to CMD
• Transformation to deal with inheritance:
  – each non-abstract class becomes a component
  – each atomic attribute becomes an element, but
  – each referential attribute becomes a component with the referred class as a child component, except
  – when this class is abstract all non-abstract subclasses become child components
  – copy cardinality constraints where possible
http://catalog.clarin.eu/ds/ComponentRegistry?item=clarin.eu:cr1:p_1311927752335
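The transformation rules can be sketched as a small recursive function over a reduced excerpt of the UML model. The dictionary layout, the function names and the "uml_cardinality" key are assumptions made for this illustration; in particular, how a copied cardinality is distributed over a wrapper component and its children is left open here.

```python
from typing import Dict, List

ATOMIC = {"string", "URI", "boolean", "MIME", "datatype"}

# (attribute, type, cardinality) per class: a reduced excerpt of the UML model.
CLASSES: Dict[str, dict] = {
    "Operation": {"abstract": False, "parent": None, "attrs": [
        ("Name", "string", "1"),
        ("Input", "AbstractParameter", "0..*"),
        ("Output", "AbstractParameter", "1..*")]},
    "AbstractParameter": {"abstract": True, "parent": None, "attrs": [
        ("Name", "string", "1"), ("MIMEType", "MIME", "0..1")]},
    "Parameter": {"abstract": False, "parent": "AbstractParameter", "attrs": [
        ("isConfigurationParameter", "boolean", "0..1")]},
    "ParameterGroup": {"abstract": False, "parent": "AbstractParameter", "attrs": [
        ("Parameters", "Parameter", "1..*")]},
}


def all_attrs(name: str) -> list:
    """Inherited attributes followed by own ones (flattens inheritance)."""
    cls = CLASSES[name]
    inherited = all_attrs(cls["parent"]) if cls["parent"] else []
    return inherited + cls["attrs"]


def concrete_targets(name: str) -> List[str]:
    """The class itself, or its non-abstract subclasses when it is abstract."""
    if not CLASSES[name]["abstract"]:
        return [name]
    return [n for n, c in CLASSES.items()
            if c["parent"] == name and not c["abstract"]]


def to_component(name: str) -> dict:
    """Turn a non-abstract class into a component description."""
    component = {"component": name, "elements": [], "components": []}
    for attr, type_name, card in all_attrs(name):
        if type_name in ATOMIC:
            # Atomic attribute: becomes an element, cardinality copied.
            component["elements"].append((attr, type_name, card))
        else:
            # Referential attribute: wrapper component named after the
            # attribute, with the referred (concrete) classes as children.
            component["components"].append({
                "component": attr,
                "uml_cardinality": card,
                "elements": [],
                "components": [to_component(t)
                               for t in concrete_targets(type_name)],
            })
    return component


def outline(component: dict, depth: int = 0) -> None:
    """Print the component tree with its elements."""
    print("  " * depth + component["component"])
    for attr, type_name, card in component["elements"]:
        print("  " * (depth + 1) + f"- {attr}: {type_name} [{card}]")
    for child in component["components"]:
        outline(child, depth + 1)


operation = to_component("Operation")
outline(operation)
```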
Service
• Name: string
• Description: string?
• ServiceDescriptionLocation: resource
  Operations
    Operation
    • Name: string
    • Description: string?
      Input, Output (each holds ParameterGroup and Parameter components)
        ParameterGroup
        • Name: string
        • Description: string?
        • MIMEType: string?
        • DataType: string?
        • DataCategory: anyURI?
        • SemanticType: string?
          Parameters
            Parameter
        Parameter
        • Name: string
        • Description: string?
        • MIMEType: string?
        • DataType: string?
        • DataCategory: anyURI?
        • SemanticType: string?
        • isConfigurationParameter: boolean?
          Values
            ParameterValue
            • Value: string
            • Description: string?
            • DataCategory: anyURI?
Cardinality markers on the component links: ? (optional), * (zero or more), + (one or more)
Usage of the CMD core model
• The core model is only a starting point, i.e., provides enough information for basic profile matching and technical invocation
• It is a template to form a basis for the CMD profile of specific web service registries, i.e., the model can be extended
• However, instantiations should also maintain compliance with the core model
– cardinalities should be within the boundaries of the core model, e.g., mandatory elements cannot become optional
– closed value domains cannot be extended, but open value domains can be turned into closed ones
– data category references should not be changed as this would imply different semantics
• CLARIN-NL ToolService profile is a compliant extension: http://catalog.clarin.eu/ds/ComponentRegistry?item=clarin.eu:cr1:p_1311927752306
• Validate compliance of instances to the core model: http://www.isocat.org/clarin/ws/cmd-core/
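The three compliance rules above lend themselves to a mechanical check per element. The sketch below is not the validator behind the URL above; it is a hypothetical illustration in which the ElementSpec structure and the violation messages are invented, and in which narrowing a closed value domain to a subset is assumed to be allowed.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class ElementSpec:
    min_occurs: int
    max_occurs: Optional[int]                # None means unbounded
    values: Optional[FrozenSet[str]] = None  # None means open value domain
    data_category: Optional[str] = None


def violations(core: ElementSpec, ext: ElementSpec) -> List[str]:
    """List the ways in which an extension element breaks the core model."""
    problems = []
    if ext.min_occurs < core.min_occurs:
        problems.append("minimum cardinality lowered")
    if core.max_occurs is not None and (
            ext.max_occurs is None or ext.max_occurs > core.max_occurs):
        problems.append("maximum cardinality raised")
    if core.values is not None and (
            ext.values is None or not ext.values <= core.values):
        problems.append("closed value domain extended")
    if ext.data_category != core.data_category:
        problems.append("data category reference changed")
    return problems


core_name = ElementSpec(1, 1)          # Name: string [1]
core_description = ElementSpec(0, 1)   # Description: string [0..1]

# Making an optional element mandatory stays within the core boundaries.
print(violations(core_description, ElementSpec(1, 1)))
# Making a mandatory element optional does not.
print(violations(core_name, ElementSpec(0, 1)))
# Turning an open value domain into a closed one is allowed.
print(violations(core_description, ElementSpec(0, 1, frozenset({"a", "b"}))))
```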
Current status and future work
• Current state:
  – There are some compliant profiles:
    • CLARIN-NL ToolService profile
      – but not in use by TTNWW yet
    • WebLicht 2.0 profile
      – in use, but still missing technical description (WADL)
  – The core model is successful if there is a workflow/chaining engine invoking web services which were originally targeted at another engine or none at all
• Future work:
  – Complex chains of web services are captured in (mini) workflows
    • can this core model also describe the mini workflows?
  – Alignment with or reuse by other initiatives
    • e.g., META-SHARE meta model is also based on components and ISOcat, and contains a section on Tools and Services
  – Identify common extensions and incorporate them into the core
    • e.g., default values, cardinalities, asynchronicity
Thanks for your attention!
Please visit: http://www.isocat.org/clarin/ws/cmd-core/