Interoperability Standards and Specifications Report
December 11, 2016
Deliverable Code: D5.2
Version: 1.3
Dissemination level: Public
First version of the interoperability standards and specification report that guides interoperability considerations within and beyond the OpenMinTeD project.
H2020-EINFRA-2014-2015 / H2020-EINFRA-2014-2 Topic: EINFRA-1-2014 Managing, preserving and computing with big research data Research & Innovation action Grant Agreement 654021
2. Table of Tables
Table 1 - Requirements in status "draft"
Table 2 - Requirements in status "final"
Table 3 - Requirements in status "deprecated"
Table 4 - Assessed products and consulted sources
Table 5 - WG 1 summary of actions to improve compliance
Table 6 - WG 2 summary of actions to improve compliance
Table 7 - WG 3 summary of actions to improve compliance
Table 8 - WG 4 summary of actions to improve compliance
Table 9 - Compatibility Matrix (draft version 1.0): Contents
Table 10 - Compatibility Matrix (draft version 1.0): Software
Table 11 - Compatibility Matrix (draft version 1.0): Terms of Service
Table 12 - Compatibility Matrix (draft version 2.0): Contents
Table 13 - Compatibility Matrix (draft version 2.0): Software
Table 14 - Compatibility Matrix (draft version 2.0): Terms of Service
Disclaimer
This document contains description of the OpenMinTeD project findings, work and products. Certain parts of it might be under partner Intellectual Property Right (IPR) rules so, prior to using its content please contact the consortium head for approval.
In case you believe that this document harms in any way IPR held by you as a person or as a representative of an entity, please do notify us immediately.
The authors of this document have taken any available measure in order for its content to be accurate, consistent and lawful. However, neither the project consortium as a whole nor the individual partners that implicitly or explicitly participated in the creation and publication of this document hold any sort of responsibility that might occur as a result of using its content.
This publication has been produced with the assistance of the European Union. The content of this publication is the sole responsibility of the OpenMinTeD consortium and can in no way be taken to reflect the views of the European Union.
The European Union is established in accordance with the Treaty on European Union (Maastricht). There are currently 28 Member States of the Union. It is based on the European Communities and the member states cooperation in the fields of Common Foreign and Security Policy and Justice and Home Affairs. The five main institutions of the European Union are the European Parliament, the Council of Ministers, the European Commission, the Court of Justice and the Court of Auditors. (http://europa.eu.int/)
Publishable Summary
The goal of the Interoperability Standards and Specifications report is to assess and improve interoperability between relevant products from the TDM and NLP domains, in particular those involved and associated with the OpenMinTeD project. The process underlying the document is designed to closely involve internal and external stakeholders in the definition of requirements necessary to achieve better interoperability, with the aim also of committing these stakeholders to actually perform the necessary adjustments to their respective systems.
This document is the first in a series of three. It will be updated in M20 (D5.3) and M26 (D5.4). This report focusses on presenting a high-level overview of the progress achieved within the reporting period and on actions planned for the next period. The actual work documents released during the reporting period are provided as attachments to this deliverable.
1.1 Methodology
In the milestone document MS5 “Working groups external experts list and work methodology”, we outlined the methodology for building the interoperability specification. Since MS5 is presently only available on the project-internal wiki, we repeat the key concepts here:
3. Prototype – Several efforts are being undertaken to assess the feasibility and effort of implementing the proposed requirements and to provide further insight for their refinement. These efforts are listed in the summary reports (Section 2) of the respective WGs in the subsection “Progress in current period”.
4. Evaluation – In order to evaluate the proposed requirements, we identified relevant products (e.g. TDM frameworks of the involved project partners) and assessed their compliance with our requirements. Based on the evaluation, we generated a set of actions designed to improve compliance with the interoperability requirements. These actions are meant to serve as a roadmap for the next reporting period. More details on the compliance assessment are provided in Section 4.3.
5. Specification – The requirements specification is a living document, continually updated as requirements are created, refined, and deprecated. Section 4.1 provides additional information about the requirement lifecycle.
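The evaluation step can be pictured as a simple mapping from assessment results to follow-up actions. The sketch below is only an illustration under stated assumptions: the product names, the true/false compliance values and the wording of the generated actions are hypothetical and are not taken from the actual compliance assessment.

```python
# Hypothetical assessment results: product -> {requirement ID: compliant?}.
# Product names and values are invented for illustration only.
ASSESSMENTS = {
    "framework-x": {1: True, 2: False, 3: True},
    "framework-y": {1: False, 2: False, 3: True},
}


def plan_actions(assessments):
    """Return one follow-up action per (product, requirement) that is not met."""
    actions = []
    for product in sorted(assessments):
        for req_id, compliant in sorted(assessments[product].items()):
            if not compliant:
                actions.append(
                    f"{product}: improve compliance with requirement {req_id}"
                )
    return actions


for action in plan_actions(ASSESSMENTS):
    print(action)
```

A real assessment would likely need more than a yes/no value per requirement (for example partial compliance or "not applicable"); the boolean is used here only to keep the sketch short.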
The process was designed to ensure the participation of all stakeholders. It also takes care to closely involve those stakeholders who may later need to adjust their products in order to become compliant with our requirements.
1.2 Working groups
Four working groups (WG) consisting of project members and external experts are contributing to the OpenMinTeD Interoperability Standards and Specification series of deliverables. These WGs are:
2. Summary Reports
In this section, we provide short summary reports for each of the interoperability working groups covering the following aspects:
• Mission statement – short updated summary of the working group’s mission statement
• Mode of operation – each of the working groups opted for slightly different modes of
2.1.1 Mission statement
The focus of WG1 lies on the metadata required for describing resources targeted by the OpenMinTeD project in order to ensure their discoverability and achieve interoperability between them. TDM involves a wide range of resource types: the resources to be mined (scholarly publications in the project), the text mining/language processing software per se and ancillary knowledge resources used for its operation (e.g. annotation schemas, linguistic tagsets, ontological resources used for annotating the resources to be mined, annotated textual corpora).
To describe these resources a core set of metadata elements can be used to capture their common properties (e.g. administrative information, such as contact details and identification data), while various sets of elements encode the particular properties they display (e.g. size and format for content resources vs. input specifications for components). Since processing activities involve the interaction of these resources, a subset of the resources' properties need to be described with the same vocabulary (e.g. the language of the contents of a publication and the language a tool or service can process, or the domain of a thesaurus and the domain of a publication). The definition and harmonisation of these metadata elements is the main objective of WG1.
This endeavour is further hampered by the fact that these resources are the object of work for experts coming from different disciplines, with different theoretical backgrounds and conceptualisation of their work, often using different terms for the same or similar concepts. The clarification of these concepts and their semantic mapping as a means to establish a "common" vocabulary poses a challenge for WG1.
Interoperability for WG1 is, therefore, sought at two levels:
• across resource types - i.e. ensuring that the same metadata elements are used to describe their intersecting features.
2.1.2 Mode of operation
WG1 brings together experts from the different communities involved in the project, combining expertise in various fields: publishers, aggregators of scholarly publications, infrastructure specialists, developers of language processing and/or text mining services, experts in the creation and/or representation of language and knowledge resources, metadata specialists, legal experts etc.
The group holds regular teleconference calls where internal and external experts are invited; depending on the topic of the discussion, the attendance varies with a nucleus of the experts always present and a further set joining when the discussion relates to their particular expertise. Representatives from the other OpenMinTeD WPs (e.g. on use cases) also attend the meetings when in line with their interests. Extra-regular meetings dedicated to specific issues (e.g. metadata schemas of publications) have also been held. In addition, close collaboration is also sought with the other three working groups to ensure that their requirements as regards metadata encoding are properly met; attendance of their teleconference calls and working documents provides the appropriate input.
The group has created an inventory of metadata schemas and related efforts (taking into account the Deliverable D5.1 – Interoperability Landscaping Report) that present more interest for the WG1 objectives - cf. Section 8.2. The discussions of the group have focused on the contents of these schemas and on the interoperability requirements that were extracted from WP5.2 scenarios (cf. Section 4). Finally, a document in the form of a working report is collectively drafted.
2.1.3 Progress in current period
WG1 has produced the following:
• a set of 21 requirements for interoperability of the metadata descriptions between the various resource types. 8 additional requirements were generated but left for reconsideration for the next reporting period;
• a selection of resources, that will be directly involved in the project given that they belong to the partners or identified by them as standard for our purposes, has been assessed for compliance to the requirements:
o OpenAIRE, CORE and Frontiers schemas for describing publications;
o TheSOZ, AGROVOC, JATS, OLIA and LAPPS Grid as knowledge resources;
o a set of standard licences (e.g. CC, FOSS licences);
o AlvisNLP, Argo, DKPro Core and ILSP software components;
• an inventory of metadata schemas, vocabularies and ontologies used for the description of these resource types (cf. Section 8.2)
• discussions taking as a base the main metadata schemas used by the consortium partners, i.e. META-SHARE for corpora, knowledge resources and software components, and OpenAIRE and CORE for publications; these, together with the WG1 interoperability requirements, the description of components performed by WG4 and the requirements of WG2, have been used as the basis for the Reference Metadata Schema of OpenMinTeD;
in a harmonized way all the resource types of OpenMinTeD and caters for their satellite entities; the schema is currently under review by the members of all WGs and planned to be used for the registry first version.
• Publication (jointly with WG3): Legal Interoperability Issues in the Framework of the OpenMinTeD Project: A Methodological Overview (for details on the publication and a link, see Section 2.5)
2.1.4 Tasks planned for next period
WG1 will continue to work along the action lines that it has already initiated:
• finalise the interoperability requirements: a set of requirements has already been identified but the discussion showed that they are not yet mature enough to be formulated with unanimous consent; we expect also that analysis of other sources (e.g. WP4 requirements, use of the schema in the registry) will generate more requirements;
• update the inventory of schemas and vocabularies, as required;
• continue the compliance assessments and recommend ways of improving metadata
2.2.1 Mission statement
Working Group 2 targets the interoperability of knowledge resources. Knowledge is specific information that is relevant for the linguistic and conceptual interpretation of text and the content exchange between TDM modules. This information is either exploited or produced by TDM modules and tools. The definition encompasses a variety of resource subtypes:
• Language resources such as annotated text corpora;
• Ancillary resources for conceptual/linguistic interpretation within the TDM workflow such as lexicons, term banks, ontologies, thesauri and dictionaries.
• Processing resources that produce knowledge such as text mining tools/services like part of
WG2 aims to:
• Tackle semantic interoperability issues when integrating knowledge resources (linguistic, terminological and ontological resources) with TDM workflows. These issues arise a) when the same domain concept is defined in different ways in knowledge resources, b) when the concept is modelled differently within a TDM component and a related knowledge resource. One example would be two ontologies on chemical compounds specifying these compounds at different levels of detail and using different controlled vocabularies. Another example would be that the part-of-speech tag necessary to disambiguate a word during a dictionary lookup uses a different set of tags in the dictionary than the one produced by an automatic part-of-speech tagger.
• Define a specification for the representation of knowledge resource types such as lexicons, terminological sources, thesauri, ontologies, annotated corpora and tool outputs.
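The part-of-speech example above can be made concrete with a small sketch. It assumes a hypothetical tagger tag set, a hypothetical dictionary tag set and a shared reference vocabulary that both are mapped onto; none of the tag names, mappings or dictionary entries below come from a real resource.

```python
# Hypothetical mappings from two incompatible tag sets onto a shared vocabulary.
TAGGER_TO_COMMON = {"NN": "NOUN", "NNS": "NOUN", "VB": "VERB", "VBD": "VERB", "JJ": "ADJ"}
DICTIONARY_TO_COMMON = {"n": "NOUN", "v": "VERB", "adj": "ADJ"}

# Hypothetical dictionary, keyed by (lemma, dictionary tag).
DICTIONARY = {
    ("record", "n"): "a stored account of something",
    ("record", "v"): "to store an account of something",
}


def lookup(word, tagger_tag):
    """Return the dictionary glosses whose tag matches the tagger's tag.

    The two tag sets are never compared directly; both are first mapped
    onto the shared vocabulary.
    """
    common = TAGGER_TO_COMMON.get(tagger_tag)
    if common is None:
        return []  # unknown tagger tag: no safe way to disambiguate
    return [
        gloss
        for (lemma, dict_tag), gloss in DICTIONARY.items()
        if lemma == word and DICTIONARY_TO_COMMON.get(dict_tag) == common
    ]


print(lookup("record", "VB"))
```

Without the shared vocabulary, the tagger output "VB" and the dictionary tag "v" could not be matched, and the lookup would have to return both senses of the word.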
The focus of this group is on ensuring the discoverability, interoperability and consistency of linguistic, terminological and ontological content at the granular representation level of individual knowledge elements. This knowledge is either contained within resources or produced by language processing and text mining tools. Its interoperability will foster common understanding, data sharing and reuse.
For this purpose, the group will establish a network of (de facto) standard reference vocabularies for the representation and linking of information elements required for interoperable text consumption and processing.
2.2.2 Mode of operation
The group consists of a number of internal experts, a subset of whom attends regular monthly teleconferences. The work that is discussed and distributed over the participating partners is done by means of the collaborative drafting of a document whose structure reflects that of the present report and its future iterations. We also use collaboratively maintained spreadsheets for the collection of knowledge resource schemas, linking information, and requirement formulation. External experts are mostly consulted on a personal basis, and targeted for their particular expertise.
2.2.3 Progress in current period
WG2 has worked on the interoperability of Knowledge Resources (KRs): resources containing, producing or representing knowledge. A first set of schemas has been selected, which can be divided up into several subtypes:
Interoperability between these schemas is in the process of being defined in terms of a number of linking relations based on OWL/RDF and SKOS relations.
Once completed, the linked vocabularies form a reference network of linguistic, terminological and ontological conceptual elements that forms the core vocabulary for TDM information exchange.
Requirements for KR operationalisation within OpenMinTeD have been formulated. Eight of the KRs listed above have been checked with respect to their compliance with the requirements identified so far (cf. Section 5.2).
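As an illustration of what such linking relations might look like, the sketch below stores links as plain subject-predicate-object triples using the SKOS mapping properties `skos:exactMatch` and `skos:closeMatch`, both of which SKOS defines as symmetric. The concept URIs under `example.org` are hypothetical, and the actual OpenMinTeD linking relations may differ.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
EXACT_MATCH = SKOS + "exactMatch"
CLOSE_MATCH = SKOS + "closeMatch"

# SKOS declares both of these mapping properties symmetric.
SYMMETRIC = {EXACT_MATCH, CLOSE_MATCH}

# Hypothetical links between concepts of three invented vocabularies.
LINKS = [
    ("http://example.org/vocabA/Noun", EXACT_MATCH, "http://example.org/vocabB/N"),
    ("http://example.org/vocabB/N", CLOSE_MATCH, "http://example.org/vocabC/Substantive"),
]


def mapped_to(concept, triples):
    """Return (relation, other concept) pairs directly linked to `concept`.

    Symmetric properties are followed in both directions. Chains of links
    are deliberately not followed, since closeMatch is not transitive.
    """
    found = set()
    for subj, pred, obj in triples:
        if pred not in SYMMETRIC:
            continue
        if subj == concept:
            found.add((pred, obj))
        elif obj == concept:
            found.add((pred, subj))
    return sorted(found)


for relation, other in mapped_to("http://example.org/vocabB/N", LINKS):
    print(relation, other)
```

In practice an RDF library and a triple store would replace the list of tuples; the plain-Python form is used here only to keep the example self-contained.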
2.2.4 Tasks planned for next period
• Selection of a standard for annotation interoperability at the level of end-to-end system output
• The further selection of particular candidate standards for the provision of a core set of data
2.2.5 State of operations
The group operates efficiently and produces expected outputs according to the plan. The selection of use case driven resource schemas was performed on the basis of the WG2 member expertise. Alignment with other work packages, specifically WP4, has been established.
2.3 WG3 – IPR and licensing
2.3.1 Mission statement
The goal of WG3 "IPR and licensing" is to study and identify copyright and related rights (e.g. sui generis database right) restrictions and exceptions to the use and reuse of sources (both textual sources and text-mining services) in TDM activities. On this basis the WG will also identify contractual tools and schemes (e.g. licences) that can best serve the needs of TDM services.
In particular, it will examine which exceptions are currently available (e.g. the newly implemented TDM exception in the UK), which are upcoming and whether the current/proposed solutions embrace all the needs of the scientific and academic sector (e.g. is the non-commercial limitation necessary?).
The working group also focuses on the issue of legal compatibility and interoperability of licenses, aiming to determine whether multiple licenses that apply to different components can be deemed compatible and legally interoperable, particularly when there is the need to assess if the result (combination of components under different licensing terms) can be redistributed or not.
Additionally, open licensing models for both the scientific related textual sources and the text-mining services will be explored and evaluated, by means of specific tools such as graphical representations of licenses compatibility (to be identified as compatibility matrix) and workflows that will guide the end users to choose the best applicable license and determine what licensing restrictions or rights statements limitations, if any, apply to specific uses.
2.3.2 Mode of operation
WG3, while focusing on legal interoperability issues, brings together experts from a variety of fields. These include: legal studies, publishers, technical experts (computer scientists, metadata experts, etc.), policy makers, academics, representatives of different communities, groups and initiatives internationally active in the field.
WG3 regularly organises conference calls with external experts (once a month), internal experts (once a month) and dedicated conference calls with WG1 plus selected experts to discuss the specific issue of licence/right statements and metadata representation (again once a month), for a total of three conference calls a month. Agenda items, minutes and summaries of all conference calls are maintained on the dedicated website.
WG3 maintains an updated list of working documents which include an inventory of licences and terms of use submitted by all the consortium members, which form the basis for another document dedicated to the compatibility of the licences and terms of use. A detailed bibliography of scholarly publications and policy documents is also maintained. Additionally, a glossary collecting the most recurring legal concepts with a brief explanation is likewise available.
2.3.3 Progress in current period
The group has two main goals: favouring licence compatibility and clarifying the legal landscape in the field of TDM. All this, with a view to the needs of TDM researchers, which implies the need to develop documents and tools that can be readily used by laymen.
The following items summarise the accomplishments achieved so far:
• Licence-related interoperability requirements
• Licence Compatibility matrix (see Section 8.3) - a schematic representation that represents (a) the type of data (contents, software and terms of services) and (b) the type of licences and/or rights statements, to determine whether or not there is compatibility between resources under respective licences. This matrix aims at facilitating the choice for users for the best licence to use and share/distribute resources and particular TDM workflow results which may have been generated from multiple sources under heterogeneous licences. When multiple licences could be applied, it also displays in brief what could be the legal implications of choosing one or the other licence.
• Legal metadata and rights statement (this document is still work in progress and not included with the present deliverable) - a document drafted in collaboration with WG1 that illustrates the roadmap to address the inference of legal metadata elements and rights statements for the purpose of TDM activities. This roadmap is articulated in the following actions: (a) identifying applicable rights statements and building an OpenMinTeD inventory; (b) categorising these findings and comparing them with similar inventories (e.g. Europeana, OpenAIRE and CORE, but also CLARIN and META-SHARE); (c) identifying a common vocabulary while also (d) contemplating their machine-readability.
• Licences <-> Rights Statements (this document is still work in progress and not included with the present deliverable) - a synthetic representation of licences and rights statements’ conditions to help users understand what a given license or a set of licenses allow them to do – including what they are required to do to properly perform their activities under those licensing terms – and what limitations or restrictions may apply to the use they wish to make of the resource.
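A compatibility matrix of this kind can be thought of as a lookup table keyed by data type and licence pair. The following sketch is purely illustrative: the licence identifiers are placeholders, the true/false entries are invented and carry no legal meaning, and treating compatibility as symmetric (and every licence as compatible with itself) is a simplifying assumption that the real matrix in Section 8.3 need not share.

```python
from itertools import combinations

# Placeholder matrix: data type -> {unordered licence pair: compatible?}.
# Licence names and values are invented and have no legal significance.
MATRIX = {
    "contents": {
        frozenset({"LICENCE-A", "LICENCE-B"}): True,
        frozenset({"LICENCE-A", "LICENCE-C"}): False,
        frozenset({"LICENCE-B", "LICENCE-C"}): True,
    },
}


def compatible(kind, first, second):
    """Return True/False from the matrix, or None if the pair is not listed."""
    if first == second:
        return True  # simplifying assumption: a licence is compatible with itself
    return MATRIX.get(kind, {}).get(frozenset({first, second}))


def all_compatible(kind, licences):
    """Check every pair of licences attached to the inputs of a workflow result.

    Returns False if any pair is incompatible, None if any pair is unknown
    (and none is incompatible), and True otherwise.
    """
    results = [
        compatible(kind, a, b)
        for a, b in combinations(sorted(set(licences)), 2)
    ]
    if any(result is False for result in results):
        return False
    if any(result is None for result in results):
        return None
    return True


print(all_compatible("contents", ["LICENCE-A", "LICENCE-B"]))
```

Returning `None` for unlisted pairs keeps "not yet assessed" distinct from "incompatible", which matters when a workflow result combines sources under heterogeneous licences.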
2.3.4 Tasks planned for next period
The main question to be addressed together with the other WG regards the “granularity” of the representation of legal information. In other words, whether the legal rules and the connected metadata should be represented at the licence level, or deconstructed further at the level of right statements. There is an ongoing discussion with internal and external experts (including the representatives of international projects active in this field) about the desirability and feasibility of a “rights statement” implementation.
On the basis of the outcome of this analysis, WG3 will implement the connected licence or rights statement compatibility table both at the “horizontal” level as already in draft version in the listed documents, as well as at the “multi-layer” level explained in the cited papers.
The extended abstract about TDM exception will be developed into a full paper. The glossary will be accordingly expanded.
Many of the developed resources (case scenarios, bibliography, glossary, etc.) will form the basis for additional training material as requested by other WP (e.g. FAQs).
2.3.5 State of operations
The work of WG3 is on schedule. The discussion about “granularity” is proving to be much more complex than originally thought, but this has not caused major delays. Should the discussion not reach an acceptable solution within reasonable time, a risk reduction plan has already been considered. The originally intended licence compatibility tools will be developed in parallel to the discussion regarding rights statements. Given the modularity of the compatibility table and the multi-layer approach, an eventual implementation of a rights statement compatibility table within the existing licence compatibility can be easily achieved in legal, technical and scientific terms.
2.4 WG4 – Annotation and workflows
2.4.1 Mission statement
This working group studies interoperability aspects of text annotation and workflows. It includes supported input/output formats, annotation encoding models, workflow architectures, service access modes, type system alignment and others. An interface between workflow management systems and components is a key interoperability issue, as it includes the problems of how their functionality is packaged, what metadata are included and how they are interpreted by a system, but is also related to what type of information is processed and how it is represented, serialised as input/output files or transmitted.
2.4.2 Mode of operation
The group activities are based on the expertise of its members, representing institutions developing some of the leading text mining frameworks: University of Manchester (ARGO, U-Compare), University of Sheffield (GATE, AnnoMarket), University of Darmstadt (DKPro Core), French National Institute for Agricultural Research (Alvis) and Athena Research and Innovation Center (ILSP). The group meets at regular conference calls every two weeks and if necessary consults external experts, representing other major TM centres. A cycle of technical presentations on workflow systems has also been initiated, starting with a description and discussion on distributed execution in Argo.
2.4.3 Progress in current period
So far the group has produced the following resources:
• An alignment of 6 type systems used in existing platforms (Alvis, Argo, DKPro Core, GATE, ILSP, LAPPS Grid). The alignment maps equivalent types and features, which shows concepts and approaches that are consistent or overlapping. On the other hand, it also helps to identify differences and see whether they come from different focus (e.g. concentrating on biomedical concepts missing from other systems) or different conventions of data representation.
• An initial directory of 556 components currently available in libraries of the considered workflow systems, including their short description, automatically assigned categories, parameters and machine-readable descriptors in META-SHARE format. The directory is created through an automatic process that aggregates metadata from multiple sources. The aggregation process is work in progress and continually being improved.
• Initial work on a prototype solution that allows building workflows including components coming from different platforms (initially DKPro Core (UIMA) and GATE, also looking into Alvis, Argo, ILSP, LAPPS Grid) in the form of the domain-specific programming language “OpenMinTeD Script”. This prototype serves as a sandbox to investigate interoperability issues in terms of component lifecycle, deployment, and data transformation. In particular, it allows us to generate discussions and insights on these topics independently of the OpenMinTeD Workflow service which will be delivered later in the project. In fact, we expect that lessons learned from OpenMinTeD Script will have an impact on the design of the OpenMinTeD Workflow service – potentially parts of OpenMinTeD Script can even evolve to be integrated into the workflow service, e.g. the data transformation functionality.
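The idea behind a type system alignment can be sketched as a table in which each row groups the names that different platforms use for the same concept. The platform identifiers and type names below are hypothetical placeholders and do not reproduce the actual alignment of the six type systems.

```python
# Each row lists the type name used by each platform for one shared concept.
# Platform identifiers and type names are invented for illustration.
ALIGNMENT = [
    {"platform_a": "Token", "platform_b": "token", "platform_c": "Tok"},
    {"platform_a": "Sentence", "platform_b": "sentence"},
    {"platform_c": "Gene"},  # concept present in only one platform
]


def translate(type_name, source, target):
    """Return the target platform's equivalent of a source type, or None."""
    for row in ALIGNMENT:
        if row.get(source) == type_name:
            return row.get(target)
    return None


def unaligned(platform):
    """List types of `platform` that have no counterpart on any other platform."""
    return sorted(row[platform] for row in ALIGNMENT if set(row) == {platform})


print(translate("Token", "platform_a", "platform_c"))
print(unaligned("platform_c"))
```

Rows with a single entry correspond to the differences mentioned above, such as concepts one platform concentrates on that are missing from the others; a fuller model would also align the features of each type, not just the type names.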
• Creating concrete requirements: so far all of the created requirements are ‘abstract’, i.e. they describe some desired functionality (e.g. components should be described by machine-readable metadata), but without technical details (e.g. a format of the metadata). For each abstract requirement, at least one concrete counterpart should be created in the next period. The process of creating the concrete requirements will also inform the interoperability guideline deliverables (D5.5 and D5.6).
2.4.5 State of operations
The group operates efficiently and produces expected outputs according to the plan.
2.5 Publications
This section lists peer-reviewed publications relevant to this deliverable from project partners within the reporting period. All the publications are available online as open access under CC-BY-NC licence.
• P. Labropoulou and S. Piperidis and T. Margoni, 2016. Legal Interoperability Issues in the Framework of the OpenMinTeD Project: a Methodological Overview. In Proceedings of the Workshop on Cross-Platform Text Mining and Natural Language Processing Interoperability (INTEROP 2016) at LREC 2016, p. 60-63, Portorož, Slovenia, DOI 10.5281/zenodo.182497
• T. Margoni and G. Dore, 2016. Why We Need a Text and Data Mining Exception (but it is not enough) (Extended Abstract). In Proceedings of the Workshop on Cross-Platform Text Mining and Natural Language Processing Interoperability (INTEROP 2016) at LREC 2016, p. 57-59, Portorož, Slovenia
• W. Peters, 2016. Tackling Resource Interoperability: Principles, Strategies and Models. In Proceedings of the Workshop on Cross-Platform Text Mining and Natural Language Processing Interoperability (INTEROP 2016) at LREC 2016, p. 34-37, Portorož, Slovenia
• M. Ba and R. Bossy, 2016. Interoperability of corpus processing work-flow engines: the case of AlvisNLP/ML in OpenMinTeD. In Proceedings of the Workshop on Cross-Platform Text Mining and Natural Language Processing Interoperability (INTEROP 2016) at LREC 2016, p. 15-18, Portorož, Slovenia
• P. Knoth and N. Pontika, 2016. Aggregating Research Papers from Publishers' Systems to Support Text and Data Mining: Deliberate Lack of Interoperability or Not?. In Proceedings of the Workshop on Cross-Platform Text Mining and Natural Language Processing Interoperability (INTEROP 2016) at LREC 2016, p. 1-4, Portorož, Slovenia, DOI 10.5281/zenodo.194788
• R. Eckart de Castilho, 2016. Interoperability = f(community, division of labour). In Proceedings of the Workshop on Cross-Platform Text Mining and Natural Language Processing Interoperability (INTEROP 2016) at LREC 2016, p. 24-28, Portorož, Slovenia, DOI 10.5281/zenodo.161848
Christian Chiarcos | Goethe-Universität Frankfurt am Main, Germany | X
Christopher Cieri | LDC, USA | X
Daan Broeder | MPI for Psycholinguistics, Netherlands | X
Diane Peters | Creative Commons HQ | X
Dominique Estival | Western Sydney University, Australia | X X
Enrique Alonso | Consejo de Estado | X
Eric Nyberg | Carnegie Mellon University, USA | X
Federico Morando | Nexa Center for Internet & Society, Italy | X
Geoffrey Bilder | Crossref | X X
Giulia Ajmone Marsan | The Organisation for Economic Co-operation and Development (OECD) | X
Gwen Franck | Creative Commons, EIFL | X
Ineke Schuurman | CCL, University of Leuven | X
Jin-Dong Kim | Database Center for Life Science, Research Organisation of Information and Systems | X
Jochen Schirrwagen | Universität Bielefeld, Germany | X
John McCrae | National University of Ireland, Galway, Ireland | X
Keith Suderman | Vassar College, USA (LAPPS Grid) | X X X
Kristofer Erickson | CREATe | X
Lars Bjørnshauge | SPARC Europe | X
Liam Earney | JISC, UK | X
Lukasz Bolikowski | University of Warsaw, Poland | X X
Maarten van Gompel | Radboud University Nijmegen, NL | X
Maarten Zeinstra | Kennisland, NL | X
Marc Verhagen | Brandeis University, USA (LAPPS Grid) | X
Mark Perry | University of New England, Australia | X
Maurizio Borghi | Bournemouth University, UK | X
Menzo Windhouwer | MPI for Psycholinguistics, Netherlands | X
Nancy Ide | Vassar College, USA (LAPPS Grid) | X X
Paul Keller | Kennisland, NL | X
Paul Uhlir | National Academy of Sciences | X
Pawel Kamocki | Institut für Deutsche Sprache, Germany | X
Peter Suber | Berkman Klein Centre, Harvard University | X
Piek Vossen | VU University Amsterdam, Netherlands | X
Prodromos Tsiavos | The Media Institute | X
Rafal Rak | UberResearch, UK | X
Steve Cassidy | Macquarie University Sydney, Australia | X X
Thilo Götz | IBM, Germany | X
3. Scenarios
In preparation of generating interoperability requirements (Section 4), the WGs prepared a set of 17 scenarios. These scenarios highlighted particular aspects of interoperability from the perspective of the respective WG. They were identified and described by the participants from the respective working groups through introspection and subsequently described and refined in a collaborative process involving external experts, cross-WG communication, as well as communication with WP4. In particular, the first OpenMinTeD Interoperability Workshop held on Nov 12, 2015 in The Hague, NL revolved around the interoperability scenarios and focussed on deriving a first seed set of interoperability requirements from them which was later elaborated.
WG1
• Scenario 1 – Discover resources of various types at various locations
• Scenario 2 – SME running research analytics for funders within the European Research Area
• Scenario 3 – A content provider using text mining tools to enrich their content
• Scenario 4 – Provide comprehensive statistical metadata for resources
• Scenario 5 – Domain specific researcher using a text mining tool or service to promote their
4. Requirements
This section outlines the structure of requirements and presents the requirements generated so far. The section provides an overview of the requirement’s structure (Section 4.1) and a high-level overview of the actual requirements (Section 4.2).
4.1 Requirement Structure ID-EveryrequirementhasanID.Westartcountingfrom1andeverynewrequirementincrementstheIDby1.TheIDisencodedintherequirementfilename,e.g.1.adoc.Concreteness - TheOpenMinTeD infrastructureaims tobeopen, sustainable,andable tocopewithchangeinthecommunityandintechnology.Assuch,itneedstobeabletosupportmultiplepopulartechnologies and standards. As popularity is changing over time and as new standards andtechnologies are evolving, OpenMinTeD will have to evolve as well. As supporting too manytechnologiesand standards inparallel is alsounsustainable.Thus, the supportedbyOpenMinTeDatany time will be limited to a few. However, third-parties that would like to develop and maintainexternalmodulesforOpenMinTeDtosupportadditionaltechnologiesandstandardsarewelcomeandthesethird-partiescanrefertotheinteroperabilityrequirementstoestimatethefeasibilityofcreatingsuchanexternalmodule.Thedistinctionbetweenabstractandconcreteinteroperabilityrequirementsthatwemakehereallowsustoanswertwoquestions:
• How difficult is it for a new technology or standard to be incorporated into OpenMinTeD?
• How difficult is it to integrate new components based on already supported technologies and standards into OpenMinTeD?
Abstract requirements are agnostic to concrete technologies and standards and help assess compliance with them; they help answer the first question. Concrete requirements refer to specific implementation details and help answer the second question.
Requirement concreteness values
• Abstract - the requirement specifies a need, but does not go into details how this need must be fulfilled. The requirement may provide examples of techniques or implementations that fulfil the requirement, but does not mandate their use.
• Concrete - the requirement specifies a need and prescribes the use of specific techniques, standards, implementations, etc.
they do not affect the compliance status of any product, component, format, etc. that has already been evaluated against the requirement specification. If a change would trigger a change in any compliance status, instead of changing an existing requirement, a new requirement must be created under a new ID and compliance must be evaluated against this new requirement specification in the next iteration. The previous requirement must be moved to deprecated status.
• Deprecated - the requirement is no longer to be used for compliance assessment. The requirement specification must not be changed. Exceptions are amendments adding pointers to potential new versions of the requirement and providing a rationale for the deprecation.
Category - The category of a requirement is used to anchor it in the document structure of the interoperability specification. A requirement may be associated with multiple categories.
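The requirement structure described above can be illustrated with a small data model. The following is only a sketch under stated assumptions: the class and field names, the `supersede` helper, and the choice to start a superseding requirement in "draft" status are illustrative and are not taken from the interoperability specification. The value sets (abstract/concrete, draft/final/deprecated, mandatory/recommended/optional) and the `<ID>.adoc` filename convention come from this report.

```python
from dataclasses import dataclass, replace
from enum import Enum


class Concreteness(Enum):
    ABSTRACT = "abstract"
    CONCRETE = "concrete"


class Status(Enum):
    DRAFT = "draft"
    FINAL = "final"
    DEPRECATED = "deprecated"


class Strength(Enum):
    MANDATORY = "mandatory"
    RECOMMENDED = "recommended"
    OPTIONAL = "optional"


@dataclass(frozen=True)
class Requirement:
    """Hypothetical record mirroring the fields described in Section 4.1."""
    id: int
    title: str
    concreteness: Concreteness
    status: Status
    strength: Strength
    categories: tuple = ()   # a requirement may belong to several categories
    note: str = ""           # e.g. rationale for a deprecation

    @property
    def filename(self) -> str:
        # The ID is encoded in the requirement filename, e.g. "1.adoc".
        return f"{self.id}.adoc"


def supersede(old: Requirement, new_id: int, new_title: str):
    """Replace a final requirement instead of editing it in place.

    Returns the deprecated old requirement (with a pointer to its successor)
    and the new requirement under a new ID. Starting the new requirement as
    a draft is an assumption of this sketch.
    """
    if old.status is not Status.FINAL:
        raise ValueError("only final requirements need to be superseded")
    deprecated = replace(old, status=Status.DEPRECATED,
                         note=f"Superseded by requirement {new_id}")
    successor = replace(old, id=new_id, title=new_title,
                        status=Status.DRAFT, note="")
    return deprecated, successor
```

A frozen dataclass is used so that a requirement cannot be modified accidentally once created; every change goes through an explicit copy, which matches the rule that a final specification is not edited when the edit would alter a compliance status.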
4.2 Requirements Overview
This section provides a high-level overview of the interoperability requirements that have been generated during the reporting period. A total of 72 requirements have been generated by the WGs, many of which are applicable across the WGs (WG1: 21, WG2: 17, WG3: 23, WG4: 33). These can be broken down by status:
Here, we provide only a tabular overview of the requirements generated so far. Each of these requirements has a more detailed description which can be found in the interoperability specification document¹ that is also attached to the present deliverable.
The generation of requirements happens per WG. It is possible that very similar requirements are being generated in multiple WGs. When this happened, we kept one of the requirements and deprecated the others, merging compliance assessment into the remaining requirement if necessary – this is an ongoing process and continues as more requirements are added and as existing requirements become better understood. Several requirements that were generated by the WGs were later considered to be functional requirements for one of the OpenMinTeD services (e.g. the registry service or the workflow services) rather than interoperability requirements. These have also been marked as deprecated and scheduled for inclusion in the functional specification document D4.3.
Most of the requirements are recommendations (41), a core set of requirements is mandatory (6), and a few are optional (6).
We provide in this document only the requirement overview with their short titles. The full requirement specification is provided as an attachment to this document and is also publicly hosted on our GitHub repository. Browsing the requirements hosted on GitHub is the preferred method as it is a highly cross-referenced hypertext.
ID | Requirement | Concreteness | Strength | WGs
10 | Components should specify the types of the annotations that they input and output | abstract | mandatory | WG4, WG2
11 | Components should declare whether they can be scaled within a workflow | abstract | mandatory | WG4
13 | Citation information for component | abstract | recommended | WG1, WG4
14 | Components must maintain Licence information | abstract | mandatory | WG4
18 | Workflows should be described using an uniform language | abstract | recommended | WG4
51 | Licence should be attached | abstract | recommended | WG3
53 | Licensor must be entitled to grant licence | abstract | recommended | WG3
54 | Licensees should remain with a copy of the licence | abstract | recommended | WG3
55 | Standard licences should be used | abstract | recommended | WG3
56 | Licence should be machine readable | abstract | recommended | WG3
57 | Licence should be understandable by non-lawyers | abstract | recommended | WG3
58 | TDM must be explicitly allowed | abstract | recommended | WG3
59 | Right for (temporary) reproduction must be
62 | World-wide and irrevocable licence grant | abstract | recommended | WG3
63 | Licence must qualify for Open Access rights | abstract | recommended | WG3
64 | Licence must qualify for Open Access uses | abstract | recommended | WG3
65 | Licence must qualify for Open Access must not restrict use in any way | abstract | recommended | WG3
66 | Licence must qualify for Open Access may include attribution requirements | abstract | recommended | WG3
Table 2 - Requirements in status "final"
ID | Requirement | Concreteness | Strength | WGs
1 | Components should be described by machine-readable metadata | abstract | mandatory | WG4
2 | Component metadata should be embedded into the component source code | abstract | recommended | WG4
3 | Component metadata is separable from the component
5. Compliance
In the previous section, we discussed the requirements for interoperability that WGs in OpenMinTeD have identified so far. But unless relevant products are compliant with these, the requirements are ineffective. In this section, we analyse the compliance with the requirements so far. This provides us with a basis for determining how to effectively improve compliance and thus interoperability between the relevant products as well as with the OpenMinTeD infrastructure.
When a requirement is changed, compliance assessments may have to be updated as well. Thus, compliance assessments should only be made on requirements that have been marked as “final”, i.e. whose description must no longer be changed. However, in preparation of the present deliverable, we have also performed compliance assessments for those requirements which are still in “draft” status. Those assessments will have to be updated when the requirements are promoted to the “final” status.
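The rule about assessments made against draft requirements can be sketched as a small bookkeeping check. This is an illustration only: the `Assessment` record, its field names and the `needs_update` function are hypothetical, and the compliance values in the usage note are made up for the example rather than taken from the actual assessment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assessment:
    """Hypothetical record of one compliance assessment."""
    product: str
    requirement_id: int
    status_when_assessed: str   # "draft" or "final" at assessment time
    compliant: bool


def needs_update(assessments, current_status):
    """Return assessments made against a draft that is now final.

    `current_status` maps requirement IDs to their present status.
    Assessments made against a final requirement stay valid, because a
    final specification may no longer change in a way that affects
    compliance.
    """
    return [
        a for a in assessments
        if a.status_when_assessed == "draft"
        and current_status.get(a.requirement_id) == "final"
    ]
```

For instance, given `Assessment("Argo", 10, "draft", False)` and `Assessment("Argo", 1, "final", True)`, promoting requirement 10 to "final" would flag only the first record for re-evaluation.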
5.2 Compliance assessments
In this section, we list the products taken into account for the compliance assessment. For every interoperability requirement, there are relevant classes of products:
• Resources that have been developed by the consortium partners and where the creation of metadata is the responsibility of the respective partners (Frontiers, Alvis, Argo/U-Compare, DKPro Core, ILSP)
• Resources that are already used in TDM processes and/or are being examined for use in OpenMinTeD and for whose metadata descriptions the consortium partners are, therefore, not directly responsible (TheSOZ, AGROVOC, JATS, OLIA, LAPPS Grid, licences)
• Resources that are being collected from the original data providers who also supply the metadata descriptions (CORE, OpenAIRE).
6. Actions
Based on the compliance assessment, each WG has identified actions that need to be performed in order to improve the compliance of relevant products with the OpenMinTeD interoperability requirements. These actions shall guide the work of the WGs in the next reporting period(s), will provide input to T5.4 “Alignment of service and content provider systems” and T5.5 “Data interoperability toolkit for repositories, publishers’ systems and OpenMinTeD subsystems” and shall also be taken into account for the implementation of OpenMinTeD services (WP6).
Most of the requirements generated so far (69 out of 72) are “abstract”, i.e. endorsed ways to be compliant with these requirements through the use of specific standards have not yet been specified. Nevertheless, various relevant products are already compliant with these abstract requirements, although potentially in very different ways.
A major focus across all WGs for the next reporting period will be to add suitable “concrete” requirements explicating the specific standards and mechanisms endorsed and supported by OpenMinTeD. Where no suitable standards and mechanisms exist, the WGs will - in collaboration with WP6 (Implementation) - propose to make use of respective mechanisms pioneered and implemented by OpenMinTeD and include their respective specifications in future versions of this deliverable.
A second measure for ensuring the applicability, practicability, and completeness of the interoperability requirements going forward is the continued development of interoperability prototypes. These prototypes are also meant to be carried over into the actual implementation of OpenMinTeD.
6.1 WG 1
With one exception, the results of the assessments were rather satisfactory where the metadata descriptions fall under the responsibility of the consortium members and the requirement applied to the specific resource type.
Strategic actions – actions that need to be undertaken at a higher level to resolve these issues include:
• Promoting and supporting the creation and enrichment of formal metadata descriptions
• Standardising, where possible, the metadata elements and values and recommending best practices for filling them in.
Immediate actions – the immediate actions that can and should be taken in the OpenMinTeD framework, to ensure interoperability at least for the project’s purposes:
• Conversion of the existing metadata descriptions to the reference metadata schema and enrichment thereof with the lacking metadata elements; this will also help in spotting other interoperability issues.
• Adding/improving formal metadata descriptions for JATS, LAPPS and Alvis; for Alvis, these can be provided in the next phase, given that the developer is part of the consortium; for JATS and LAPPS, these will need to be provided by other partners.
• Lack of standardised vocabulary to encode the information, even though the elements are considered important and may already be present in other forms (e.g. as free text documentation); for instance, for REQ-30, REQ-31 (quality metrics) and REQ-32 (version), it is important to arrive at a consensus on the encoding practices before adding it to the metadata description in a harmonised way.
• Absence of the information in the metadata descriptions, despite the existence of the appropriate elements; such cases are, for instance, REQ-39 (format for content resources) and REQ-41 (language for content resources); this category includes both technical and administrative information, and the reasons behind this non-compliance can be that the information is usually optional and regarded by the providers as redundant.
Prototypes – the component overview² represents a prototype for the aggregation and transformation of existing component metadata descriptions from different sources (GATE, UIMA, Alvis, Maven, etc.) into a common scheme. Its development has provided insights that have been integrated into the development of the first version of the OpenMinTeD Metadata Schema. Parts of its functionality, in particular functionality related to the harvesting of metadata, are also now being transferred into the OpenMinTeD registry. WG1 will accompany the evolution of this prototype as it is being integrated into the registry and as its harvesting functionalities are expanded, and will update the interoperability specification as new requirements come up.
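The idea behind such an aggregation prototype, mapping source-specific metadata fields onto one common scheme and spotting elements that are still lacking, can be sketched as follows. All field names, the mapping table and the list of required fields are assumptions made for illustration; they do not reproduce the actual descriptor formats of GATE or UIMA, nor the OpenMinTeD Metadata Schema.

```python
# Hypothetical mapping from source-specific field names to common field names.
FIELD_MAPS = {
    "uima": {"name": "name", "description": "description", "version": "version"},
    "gate": {"name": "name", "comment": "description"},
}

# Hypothetical set of fields the common scheme expects.
REQUIRED_FIELDS = ("name", "description", "version")


def to_common(source: str, record: dict) -> dict:
    """Translate one source-specific record into the common scheme."""
    mapping = FIELD_MAPS[source]
    common = {target: record[field]
              for field, target in mapping.items() if field in record}
    common["source"] = source   # keep provenance of the description
    return common


def missing_fields(common: dict) -> list:
    """List required common fields that the converted record lacks."""
    return [field for field in REQUIRED_FIELDS if field not in common]
```

Converting a record such as `{"name": "Tokenizer", "comment": "Splits text"}` from the hypothetical "gate" source yields a common record whose `missing_fields` result is `["version"]`, which is the kind of gap that the conversion to a reference schema is expected to surface.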
¹ https://openminted.github.io/releases/interop-spec/1.0.0/openminted-interoperability-spec/#REQ-4
² https://openminted.github.io/releases/interop-spec/1.0.0/components/
6.2 WG 2
In general, the requirements were deemed essential and relevant. All KRs were deemed compliant with REQ-67, REQ-71 and REQ-72. For this reason, we consider making these requirements mandatory in the next iteration of the specification.
Requirements REQ-68 and REQ-69 check the level at which KRs already make use, resource-internally, of:
• Strategical, i.e. widely used and interconnected (de facto) standard vocabularies for linguistic/terminological/ontological metadata
  o Ontolex
  o OLIA Reference Model
  o CLARIN Concept Registry
  o Schema.org
  o Penn Treebank
  o Universal Dependency
• Representative set of use case driven schemas
  o TheSOZ (social sciences)
  o JATS (structure of scholarly articles)
  o Agrovoc (agriculture)
  o BioLexicon (life sciences)
This set will be extended where necessary in order to ensure full coverage for the interoperability of TDM KR elements required in OpenMinTeD.
REQ-70 was considered as potentially a general WG4 requirement, and was therefore not included in the compliance check.
Using the schema alignments created in this period, a prototype will be set up during the next reporting period to investigate the application of the alignments in practical workflows. To this end, we will involve in particular those OpenMinTeD partners and external experts that work extensively with knowledge resources. In this way, we expect to generate further abstract and concrete interoperability requirements for OpenMinTeD. This prototype can be implemented in conjunction with another prototype originating from WG4: OpenMinTeD Script.
Table 6 – WG 2 summary of actions to improve compliance
6.3 WG 3
Due to the nature of the WG, the possible immediate actions are limited. The primary relevant products for this WG are licences and terms of use that are created by third parties. Hence, actions with external influence are generally of a strategic nature. Other actions are coordinated with other WGs, mainly WG1, regarding the inclusion of licence metadata with resources being submitted to and accessed through OpenMinTeD.
Strategic actions – All resources ingested by OpenMinTeD or produced as the result of a TDM process must carry a licence. The licence has to be expressed in both legal and metadata terms. An additional layer of information regarding the main rights and obligations should also be added (e.g. commons deed).
Licences should comply with the product requirements (licences) identified by WG3. In particular, all licences (inbound; outbound) should be chosen among standard licences with clear compatibility standards. Ad-hoc licences are deprecated. A major issue here is connected with the fact that typically terms of use for services are not standardised. This is an aspect that will be addressed.
An assessment of the legal status of a resource (other than licence or absence of licences) is dependent on applicable legislation. Ad-hoc analysis in this case seems unavoidable.
Immediate actions – We continue discussing the “rights statement” implementation with internal and external experts and feed the output of this discussion into the implementation of the connected licence or rights statement compatibility table.
Prototypes – For the next reporting period, we plan the implementation of a licence selector prototype turning the compatibility matrix into a user-oriented application. Additionally, we shall investigate the way in which certain permissions or obligations are (not) granted or imposed in particular licence texts. To this end, we will conduct an experiment in which experts with legal training will analyse licence texts and annotate phrases or sentences in the licence texts with their legal implications.
Table 7 – WG 3 summary of actions to improve compliance
6.4 WG 4
Immediate actions – Based on the compliance assessment, we identify three areas that require immediate action in the next reporting period:
1. Core requirements – necessary for workflow execution, e.g. regarding component input/output definition, metadata, dependencies specification. These are fairly well supported across the products and only a few improvements are necessary to achieve full compliance:
• Making component metadata available both from its source (REQ-2) and separately (REQ-3).
• Creating a unique identifier for each component (REQ-6).
• Enabling using workflows as components (REQ-24).
• Enforcing specifying input/output annotation types for components (REQ-10).
2. Additional requirements – where all (or nearly all) of the products are non-compliant, thus significant changes are necessary to satisfy them:
• Non-technical information in component metadata, including citable publications (REQ-13),
• Letting users decide on how a workflow is deployed, which may be necessary for legal reasons, e.g. by making sure that a workflow engine does not see the processed data (REQ-20) or by downloading components for local use (REQ-28).
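To make the role of declared input/output annotation types (REQ-10) and unique component identifiers (REQ-6) more tangible, the following sketch checks whether each component in a linear workflow finds the annotation types it needs. The `Component` record, the `unmet_inputs` function and the annotation type names in the usage note are hypothetical; no OpenMinTeD, UIMA or GATE API is implied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """Hypothetical component description."""
    id: str                          # unique identifier (cf. REQ-6)
    inputs: frozenset = frozenset()  # annotation types consumed (cf. REQ-10)
    outputs: frozenset = frozenset() # annotation types produced (cf. REQ-10)


def unmet_inputs(workflow, initially_available=frozenset()):
    """Walk a linear workflow and report inputs nobody has produced yet.

    Returns a list of (component id, sorted missing types) pairs; an empty
    list means every declared input is satisfied by an earlier component
    or by the initially available annotations.
    """
    available = set(initially_available)
    problems = []
    for component in workflow:
        missing = component.inputs - available
        if missing:
            problems.append((component.id, sorted(missing)))
        available |= component.outputs
    return problems
```

With a tokenizer that outputs `"Token"` followed by a tagger that requires `"Token"`, the check returns an empty list; reversing the two components reports the tagger's missing `"Token"` input. A check of this kind is only possible when components actually declare their types, which is why this requirement is listed among the core ones.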
Prototypes – During the present reporting period, WG4 has created OpenMinTeD Script as a prototype to investigate interoperability issues in cross-platform workflows (i.e. workflows involving components from UIMA and from GATE). This was a necessary step in order to deepen the discussion around interoperability in workflows towards the generation of concrete interoperability requirements. It also can serve as a temporary research substitute for the OpenMinTeD workflow service, which is to be delivered later in the project. In fact, we expect that parts of OpenMinTeD Script can be transferred into the design and implementation of the OpenMinTeD workflow service. As such, we shall continue evolving this prototype during the next reporting period, in particular incorporating more platforms (e.g. web-services from ILSP, from UNIMAN, or from LAPPS Grid). This also entails intensified investigation into the data transformation processes necessary to bridge the technical and semantic gaps between the different platforms.
Table 8 – WG 4 summary of actions to improve compliance
Action | Product | Related requirements
Continue evolving the OpenMinTeD Script prototype with a focus on the integration of additional platforms and on data transformation
Add functions to export all the configuration parameters of a workflow as a single file (considered difficult because of architectural limitations). | Alvis, Argo, ILSP | 22
Add changes to execution environment and user interface that would enable running workflows as components | Argo | 24
Revise implementations and metadata for existing components to make sure they specify input/output types | Alvis, Argo | 10
Expand the component metadata schemata to include all desired additional fields. That seems to be fairly easily achievable in all products, but it requires a lot of effort to fill in that information (esp. licences) for all existing components. | all | 8, 13, 14
Unify handling of resources and models. | all | 16, 26
Prepare the execution model in a way that guarantees that a user may choose where the processing happens, which is important because of legal restrictions. Achieving this is considered possible but difficult, as it requires major changes in the systems. | Alvis, Argo, GATE, ILSP | 28
Write documentation for undocumented components. | Alvis, Argo, ILSP | 12
Offer classes for component authors to extend, so that they will handle failures properly | Argo | 27
Define exactly what kind of type system and environmental information is necessary, how it’s going to be used, and how to encode it. | Alvis, Argo, DKPro Core | 5, 9
Define (or choose) a workflow representation language to be used before we can assess ability to comply with this | all | 18
7. List of attachments
• Detailed Interoperability Specification v1
  o https://openminted.github.io/releases/interop-spec/1.0.0/openminted-interoperability-spec/
8.1 OpenMinTeD Component Classification (Draft)
This section provides an overview of the draft categorisation system for components. This is an excerpt from a work-in-progress document. The actual document also includes information on how to map these categories to other categorisation systems, e.g. the META-SHARE vocabulary.
yes no no no no no MARC21 provides a complete but complex description of bibliographic metadata using code numbers to describe data. MARC21 presents some inconveniences, such as its high complexity and its inability to be easily read by humans.
yes no no no no no FaBiO allows for the semantic description of a variety of bibliographic objects, such as research articles, journal articles, and journal volumes, to clearly separate each part of the publishing process, the people involved in the publication process, and the various versions of
Title | Full title | Type | Description | Publications | Lexica, ontologies, etc. | Corpora | S/w tools | Web services | Workflows | Comment
to go for machine readable licences
ODRL¹ | Open Digital Rights Ontology | licensing | ontology for representing legal rights | N/A | N/A | N/A | N/A | N/A | N/A | complementary to legal metadata, if WG3 decides to go for machine readable licences
LRE Map² | LRE Map | language resources | user-provided descriptions of language resources | no | yes | yes | yes | no | no | general, free values; user filled in
CCR³ | CLARIN Concept Registry | metadata (external & linguistic) | registry for metadata elements and values | no | yes | yes | yes | yes | yes | follow-up of ISOcat; good for checking, but not all elements and values are validated; difficult to select those that are needed for external metadata
8.3.2.1 License Calculators/Selectors
Among this first group of tools, the following examples have been considered: LICENTIA; ELRA Licence Wizard; LINDAT Open Licence Selector; CLARIN Licence category calculator; RDF Representation of licences; OSS WATCH Licence differentiator.
LICENTIA (http://licentia.inria.fr/) is a suite of services that support users in finding a suitable licence for their data. It is a licence calculator/selector based on ODRL representations of commonly used licences and can be used in three modes: users select conditions of use (obligations, permissions and prohibitions) and compatible licences are shown; users select a licence and see whether it is compatible with certain conditions of use; users select a licence, view it with a visualisation tool and export an RDF representation.
The tool appears easy to use if users know their preferences in terms of permissions, obligations and prohibitions. However, the partition into permissions, obligations and prohibitions may be tricky if not supported by a clear legal definition. Besides, it does not allow multiple choices and it does not provide a broader illustration of licences.
ELRA Licence Wizard (http://wizard.elra.info/index.php) is a web configurator that enables users to choose among a number of legal features and consequently obtain a suitable licence for distributing content, adjusted to their selection. It covers 24 licences (ELRA, Creative Commons and META-SHARE) which are classified according to nine criteria (e.g. use type, whether it requires electronic signatures etc.).
The tool guides users to make their choices with the help of explanatory text. It also allows the user to state multiple preferences. It provides a broader picture of available licences based on users' selection.
The language used to guide users' choices is, however, not always clear (the explanations under the question mark that aim to explain the criteria are not always clear) and it does not always have a corresponding legal meaning (see, for instance, the distinction between implicit and explicit). Moreover, it does not provide a graphical illustration that could help users to better visualize the licences' rules.
Similarly, LINDAT Open Licence Selector (http://ufal.github.io/public-license-selector/) asks users a number of questions (based on licensing conditions, again) and concludes with a set of licences that match their answers and which they can use for their resource.
The tool is quite appealing in terms of interface, also allowing a free search, and provides a summary of potentially applicable licences. Nevertheless, it does not allow multiple choices, nor does it really guide users (especially non-professional users) in making their choice.
Another similar tool is the CLARIN Licence category calculator (http://www.helsinki.fi/finclarin/calculator/ClarinLicenseCategory.html), which suggests a number of labels for conditions of use (aka “Laundry Tags”). It helps users to classify the licence that they would like to use for a certain resource according to the CLARIN licence categories. If the user has not chosen a licence, it also provides a link to a "ready-made" legal text conformant with the licensing conditions the user has selected. If the conditions do not require user identification, it provides a link to the LINDAT Open Licence Selector.
The tool guides the users by providing a number of conditions of use identified by labels. However, it seems to be confined to CLARIN, as it implies certain specifications that are narrowed to CLARIN categories. In addition, the explanation of each condition is too brief and more examples could be added.
Finally, OSS WATCH Licence differentiator (http://oss-watch.ac.uk/apps/licdiff/) aims at helping users to understand their preferences in relation to free and open source software licences.
It guides users, specifying in detail the content of their choices. It also makes explicit that users must fully read and understand their chosen licence (by stating that “it is no substitute for reading the licences themselves [and] the classifications of licence type that enable this tool to work are by necessity somewhat reductive, and therefore output of this tool cannot and must not be thought of as legal advice”), which is often not the case for users who do not have legal training.
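The selection logic shared by these tools, matching a user's conditions of use against the permissions, obligations and prohibitions attached to each licence, can be sketched in a few lines. This is a simplified illustration, not the algorithm of any of the tools above: the `Licence` record, both functions, and the licences "Licence-A" and "Licence-B" with their conditions are hypothetical and carry no legal meaning.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Licence:
    """Hypothetical licence description in terms of conditions of use."""
    name: str
    permissions: frozenset = frozenset()
    obligations: frozenset = frozenset()
    prohibitions: frozenset = frozenset()


def satisfies(licence, needed_permissions, unacceptable_obligations=frozenset()):
    """Check one licence against the user's conditions of use."""
    return (needed_permissions <= licence.permissions
            and not (needed_permissions & licence.prohibitions)
            and not (unacceptable_obligations & licence.obligations))


def select_licences(licences, needed_permissions,
                    unacceptable_obligations=frozenset()):
    """Return the names of all licences matching the user's conditions."""
    return [licence.name for licence in licences
            if satisfies(licence, needed_permissions, unacceptable_obligations)]


# Purely illustrative catalogue; names and conditions are made up.
CATALOGUE = [
    Licence("Licence-A", frozenset({"reproduce", "distribute"}),
            frozenset({"attribute"})),
    Licence("Licence-B", frozenset({"reproduce"}),
            prohibitions=frozenset({"distribute"})),
]
```

`select_licences` corresponds to the mode in which users state conditions and receive matching licences, while `satisfies` corresponds to the mode in which a chosen licence is checked against given conditions. Whether a real licence grants or imposes a given condition remains a legal question that such a table can only approximate.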
8.3.2.2 License Descriptors
Among the license descriptors, the dataset provided by RDF Representation of licences¹ contains 126 licenses that are expressed as RDF, while licenses can also be accessed directly.
The tool is a representation of licences with visualisation of basic licensing conditions: a set of commonly used licences with their RDF representation (ODRL & CC-REL). While the dataset contains many licenses, it does not provide guidance to users.
Likewise, RDF License dataset² also includes licences represented in RDF.
Similar to the previous tool, it makes use of the MS-rights vocabulary³. It contains the same list of licenses as the previously mentioned dataset, but it adds the value of including a keyword-based summary. At the same time, conditions of use could be expanded.
8.3.2.3 Comparative Tables and Graphics
The first example considered is the META-SHARE Table⁴, which appears in the D6.1.1 META-SHARE Report related to the use of language resources (LRs) and language technologies (LTs) within the framework of META-SHARE.
The tool aims to cover as many elements as possible and condense them into a concise graphical representation. Limited to ELRA, LDC (& NIST) and CC licenses, it nevertheless provides some unclear information (e.g. with ref. to "Remark" or "Implicit/Explicit"); it is therefore not always straightforward to follow and risks confusing users to some extent.
A similar tool is the ORACLE Table⁵, which is intended to compare the major attributes of the most popular Free and Open Source Software licenses.
The chart compares and graphically represents a number of licenses, with the aim of visually comparing the main features of the most popular free and open source software licenses. However, it appears too condensed, and its own compiler recognises the related limits, acknowledging the difficulty of fully understanding the differences among licenses.
Another comparative graphic tool is the GNU Table¹, which deliberately aims at also covering the New Compatible Licenses.
In addition to the GNU licenses list, the graphic illustrates some licensing rules in relation to new compatible licenses.
The chart aims at clarifying the compatibility of a number of free software licenses with GPL and now also GPLv3. Although it offers a quite clear and schematic picture of the relations among licenses, the scope of the chart is inevitably too narrow.
To conclude, WG3 has considered a list of other graphical representations, such as TLDRLegal² and GitHub Tool³, but also, although of a different nature and scope, tools like the European Public Domain Calculator⁴ and Public Domain Sherpa⁵, the US copyright term calculator.
Regarding these specific tools, during the previous conference calls with WG3 internal and external experts, it emerged that they can be considered neither precisely applicable to the scenarios considered by OpenMinTeD nor directly applicable to the WG3 Compatibility Matrix. They still offer a basis for comparison, however, and can therefore be considered as examples to look at, especially in terms of the methodology behind them and their graphical interface, when developing a more OpenMinTeD-tailored tool.
8.3.3 The OpenMinTeD Compatibility Matrix (CM)
As argued by Labropoulou, Piperidis and Margoni in the framework of the LREC 2016 Workshop on Cross-Platform Text Mining and Natural Language Processing Interoperability:
“In the field of TDM it is important to properly address the licence compatibility issue by employing a “multi-layer licence approach”. The starting point is of course to focus on just one “layer”, e.g. content licences or software licences or terms of use, and try to resolve compatibility issues “within” the same type of licences. This means to verify the compatibility of the same kind of licences in order to determine whether two or more content licences can be combined, or two or more software licences can be combined. A multi-layer approach applies the same compatibility principle across the 3 categories identified (content licences, tools or software licences, and service agreements). In this way, it will be possible to develop an interoperability model or matrix that is not limited to content, tools or services individually considered, but that, by taking a holistic approach, is able to offer a more complete analysis of the licence compatibility issues faced by TDM researchers. In other words, this formulation, instead of taking a theoretical legal
approach, puts at its centre the needs and the skills of TDM researchers, who usually are not legally trained”.¹
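The multi-layer approach described in the quotation can be illustrated as a pairwise lookup that treats content licences, software licences and terms of use uniformly. The sketch below is an assumption-laden illustration: the data structure, the function names, the three-valued verdict and all licence names are hypothetical, and the compatibility values are invented for the example rather than taken from the OpenMinTeD Compatibility Matrix.

```python
from itertools import combinations


def make_matrix(entries):
    """Build a symmetric lookup from {(item_a, item_b): bool} entries.

    Each item is a (layer, licence name) pair, so that pairs within one
    layer and pairs across layers are handled in the same way.
    """
    return {frozenset(pair): value for pair, value in entries.items()}


def verdict(matrix, items):
    """Assess a combination of licensed items pair by pair.

    Returns "incompatible" if any pair is known to be incompatible,
    "unknown" if some pair has not been assessed, else "compatible".
    """
    results = [matrix.get(frozenset(pair)) for pair in combinations(items, 2)]
    if any(result is False for result in results):
        return "incompatible"
    if any(result is None for result in results):
        return "unknown"
    return "compatible"


# Hypothetical items and assessments, for illustration only.
CONTENT_A = ("content", "Licence-A")
SOFTWARE_B = ("software", "Licence-B")
TERMS_C = ("terms of use", "Terms-C")

MATRIX = make_matrix({
    (CONTENT_A, SOFTWARE_B): True,
    (CONTENT_A, TERMS_C): False,
})
```

Returning "unknown" for unassessed pairs, instead of assuming compatibility, reflects the observation that some combinations need case-by-case legal analysis; a researcher without legal training then sees at a glance which combinations still require expert input.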
A first draft of the CMs for (1) contents, (2) software, and (3) terms of use could look at first like the following. It is important to note at this point that the third column (“Are they compatible?”) refers to the possibility to combine the subject matter of columns 1 and 2 in a way that under copyright law they form a so-called “derivative work”². When the combination of two works does not lead to the creation of a derivative work there should be no restriction to the possibility to combine them. Nevertheless, there are cases where the difference is not clear cut and the specific terminology employed by the licences can become decisive. There are instances, however, where two licences interpret their respective terms in different ways. When this is the case, this will be noted in the 4th column.
Note: Tables 9, 10, and 11 include only some of the licenses to be considered. These tables are now being substituted with two-axis graphical representations (Tables 12, 13, and 14). The present tables are still in a draft version. Updated versions are to be included with D5.3.
Table 9 - Compatibility Matrix (draft version 1.0): Contents