CLARIN-D tutorial Nijmegen, 7 September 2011 Speech & Language Data Repository (SLDR) Bernard Bel
Page 1: Speech & Language Data Repository (SLDR)

CLARIN-D tutorial Nijmegen, 7 September 2011

Speech & Language Data Repository (SLDR)

Bernard Bel

Page 2: Speech & Language Data Repository (SLDR)

Table of contents

• A demo to start with!
• Background
• Item ‘packaging’
• Our OAIS model
• Interoperability
• Processing queries on SLDR
• Access rights management
• Following up resource usage
• Work in progress

Page 3: Speech & Language Data Repository (SLDR)

A demo


http://sldr.org/sldr000714/toc

We need to modify the private/public status of some files on this item:

Page 4: Speech & Language Data Repository (SLDR)

Background


Page 5: Speech & Language Data Repository (SLDR)

• Laboratoire parole et langage (LPL), a speech research laboratory of the French Centre National de la Recherche Scientifique (CNRS), is in charge of an archive submission site named Speech & Language Data Repository (SLDR, pronounce ‘splandar’): http://sldr.org

• The aim of SLDR is to preserve data eligible for speech/language research and facilitate non-commercial access to it.

• Resource pooling relies on an interoperable system (OAIS, Open Archival Information System) currently involving two major computing centres (CINES and CC-IN2P3) in a joint project initiated by TGE-Adonis.

CINES: Centre Informatique National de l’Enseignement Supérieur, Montpellier, France
CC-IN2P3: Centre de Calcul de l’Institut National de Physique Nucléaire et de Physique des Particules, Lyon, France
TGE-Adonis: Très Grand Équipement Adonis, Paris, France

Page 6: Speech & Language Data Repository (SLDR)

A broad range of disciplines…

from experimental linguistics…

Page 7: Speech & Language Data Repository (SLDR)

… to field linguistics

Page 8: Speech & Language Data Repository (SLDR)

… including didactics, processing of written forms, understanding of speech information, and the assessment and rehabilitation of voice, speech and language disorders, etc.

Page 9: Speech & Language Data Repository (SLDR)

History

• June 2005: a project proposal for the Council of the Fédération Institut de Linguistique Française
• September 2005: a call for projects from the Direction de l'Information Scientifique (CNRS) on « Centres de ressources numériques » (digital resource centres)
• Bringing together two projects for a Centre de ressources pour la description de l’oral (CRDO), submitted by LACITO and LPL: http://sldr.org/docs/admin/lettreDeMissionCRDO.pdf

• September 2006: creating the steering committee of CRDO (Lyon): http://sldr.org/wiki/ComiteDePilotage

• 2006-2007 (contract with the Ministry of Research, LPL funding): developing the architecture of SLDR (3 engineers)

• The pilot project of TGE-Adonis/SLDR/CRDO-Paris/CINES/IN2P3:
  - framing the project with the Direction des archives de France (July 2008);
  - setting up the pilot project (November 2008);
  - a report on the pilot project in March 2009 (Claude Huc): http://sldr.org/docs/admin/ArchivageMutualise-document-synthese-v1.3.pdf
  - evaluation for the steering committee of TGE-Adonis by Y. Marcoux, EBSI, Montréal (June 2009): http://sldr.org/docs/admin/Marcoux-resume-operation.pdf
  - an agreement is signed between CNRS, CINES and the Archives de France to set up the legal framework of long-term preservation (June 2010);
  - long-term preservation is initiated by CRDO-Paris (22 June 2010) and CRDO-Aix (16 July 2010);
  - January 2011: CRDO-Aix in production with 66 GB / 40,952 files, in test with 186 GB / 7,108 files.

• August 2011: The CRDO label is abandoned. CRDO-Aix is renamed Speech & Language Data Repository (SLDR, http://sldr.org)

Detail: http://sldr.org/wiki/CrdoHistorique

Page 10: Speech & Language Data Repository (SLDR)

Item ‘packaging’


Page 11: Speech & Language Data Repository (SLDR)

Sharing/preserving what and why?

• No specific structure is imposed on SLDR items. We deal with generic items.

• The following item types are defined in metadata:
  - Primary data (corpora): all signals associated with oral production (audio/video/articulatory measurements) and documents created or collected during an experiment or a field enquiry;
  - Secondary data (resources): material derived from primary data (transcription, translation, annotation, analysis…) and associated resources: lexica, grammars, frequency tables, knowledge bases etc.;
  - Tools: software, and descriptions of the hardware/equipment used for data analysis and annotation, e.g. morpho-syntactic taggers, syntactic analyzers, translation assistance, prosodic analysis/modelling;
  - Collections of items of all types (including collections).

Page 12: Speech & Language Data Repository (SLDR)

Items of SLDR include speech/singing corpora, their annotations, lexica and other knowledge bases as well as tools associated with data processing. Corpora may comprise audio/video recordings and measurements of physiological activity.

(Screenshot of the list of items, with status labels such as ‘embargo -> 2047’, ‘public’, ‘non-commercial licence’ and ‘in progress’.)

Page 13: Speech & Language Data Repository (SLDR)

(Screenshot of an item page, annotated: persistent URL + OAI + ARK (Archival Resource Key); ‘Display source’; several access options.)

Page 14: Speech & Language Data Repository (SLDR)

Our OAIS model

Page 15: Speech & Language Data Repository (SLDR)

The OAIS model in the TGE-Adonis pilot project

• The Open Archival Information System (OAIS) reference model is standardized as ISO 14721.

• For the pilot project we took advantage of prior experience in the long-term preservation of data from astrophysics.

L’archivage numérique à long terme : les débuts de la maturité ? F. Banat-Berger, L. Duplouy, C. Huc. Direction des Archives de France, 2009.

• The actual implementation took into account specific features of oral resources, notably:
  - the diversity of file formats: sound/video and all signals associated with speech/singing;
  - corpora and their annotations are subject to editorial modifications after being stored and shared;
  - a multilingual approach to descriptive metadata, international scripts, transliteration/annotation standards (IPA etc.).

Page 16: Speech & Language Data Repository (SLDR)

Preserving on the (very) long term: why and how?

Why should we preserve documents?

• Digital medium-term or long-term archiving is not merely a reliable backup.
• Aim 1: preserving data.
• Aim 2: making it accessible and eligible for reuse after an unspecified period of time. This is the challenge of long-term preservation (archivage pérenne).
• Long-term digital preservation is not the ultimate step of data storage before it becomes untraceable or lost!
• Three major issues: 1) preserving a document, 2) making it accessible, 3) preserving its intelligibility.

(Source: CINES)

How shall we proceed?

• These issues concern the very long term, i.e. more than 30 years.
• For this reason, data should be handed over to an institutional archive rather than to a consortium of computer centres.

Page 17: Speech & Language Data Repository (SLDR)

Implications of long-term preservation

• Preserve the intelligibility of a document:
  - The archive site accepts a restricted number of persistent formats whose specifications are freely accessible.
  - The archive site is committed to migrating formats once they have become obsolete. This is the job of the archive curator, not the producer’s!

• Preserve the meaning of its content:
  1. Descriptive metadata;
  2. Archival metadata;
  3. A formal description of the conventions used in the archive: PPDI = Project Preservation Description Information. http://sldr.org/ppdi

Page 18: Speech & Language Data Repository (SLDR)

The OAIS model in the TGE-Adonis pilot project

The pilot project involved 3 actors:

• Submission sites: CRDO-Aix (SLDR) and CRDO-Paris
• The archive site: Centre informatique national de l’Enseignement supérieur (CINES, Montpellier)
• The dissemination site: Centre de calcul de l’Institut national de physique nucléaire et de physique des particules (CC-IN2P3, Lyon)

Page 19: Speech & Language Data Repository (SLDR)

A functional diagram of the OAIS setup (source: projet pilote pour la mutualisation de l’archivage pérenne des données orales, i.e. pilot project for the pooled long-term preservation of oral data). You may skip this slide as simple explanations follow!

• Producers
• CRDO: organizes data collection, formatting and metadata; Submission Information Packages (SIP)
• CINES: transfer management and SIP verification; creating AIPs (Archival Information Packages) and storing them; transferring DIPs to CC-IN2P3; receipts, warnings and archival certificates
• CC-IN2P3: assessing transfers; structuring items for distribution; retrieving Dublin Core metadata for general cataloguing; items distributed to public users through the generic infrastructure
• Dissemination Information Packages (DIP); CRDO ‘domain’ application: graphic interface, handling of OLAC metadata, query tools… for scientific users
• TGE ADONIS: management and funding

Page 20: Speech & Language Data Repository (SLDR)

The life of an item on SLDR

1. Items submitted to SLDR are protected by regular backup procedures (current data);

2. After proper packaging, each item is transferred to the test platform of the archive site (CINES);

3. After assessing the submission information package (SIP), CINES forwards a dissemination information package (DIP) to CC-IN2P3;

4. Several versions of the same item may be submitted to take into account editorial changes during the phase of medium-term preservation;

5. Once the item has become stable, it is transferred to the production platform of the archive site and assigned a persistent Archival Resource Key (ARK) for its long-term preservation. A DIP is again transferred from CINES to CC-IN2P3 for its distribution;

6. Submitting new versions is still possible, but it should be justified since all versions are preserved in the long-term archive.

7. Nonetheless it remains possible to modify metadata, descriptive files and access rights without submitting a new version.
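The life cycle above can be modelled as a small state machine. The sketch below is a minimal illustration under stated assumptions: the `Item` class, the `Stage` names and the placeholder ARK are hypothetical and do not describe SLDR's actual implementation. It only encodes three rules from the list: a first submission goes to the CINES test platform, a stable item moves to production and receives an ARK, and metadata or access-rights updates never create a new version.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class Stage(Enum):
    """Hypothetical names for where the latest version of an item lives."""
    CURRENT = "SLDR only (regular backups)"
    TEST = "CINES test platform (medium term)"
    PRODUCTION = "CINES production platform (long term, ARK assigned)"


@dataclass
class Item:
    identifier: str
    stage: Stage = Stage.CURRENT
    versions: List[str] = field(default_factory=list)
    ark: Optional[str] = None
    metadata: Dict[str, str] = field(default_factory=dict)

    def submit_version(self, sip_name: str) -> int:
        """Steps 2-4 and 6: every submitted SIP becomes a new, preserved version."""
        if self.stage is Stage.CURRENT:
            self.stage = Stage.TEST
        self.versions.append(sip_name)  # versions are only ever appended
        return len(self.versions)

    def stabilise(self, ark: str) -> None:
        """Step 5: a stable item moves to production and receives an ARK."""
        if not self.versions:
            raise ValueError("no version has been submitted yet")
        self.stage = Stage.PRODUCTION
        self.ark = ark

    def update_metadata(self, **changes: str) -> None:
        """Step 7: metadata and access rights change without a new version."""
        self.metadata.update(changes)


item = Item("sldr000714")
item.submit_version("sip-v1")
item.submit_version("sip-v2")
item.stabilise("ark:/99999/fk4example")  # placeholder ARK, not a real one
item.update_metadata(access="public")
print(len(item.versions), item.stage.name)  # 2 PRODUCTION
```

Keeping versions in an append-only list mirrors the point of step 6: once an item is in the long-term archive, nothing is overwritten, so new versions should be motivated.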


Page 21: Speech & Language Data Repository (SLDR)

The OAIS solution worked out by SLDR

A multi-tier architecture

(Diagram: a lab producer and an individual producer submit items to SLDR (submission site); items are transferred to the archive site, CINES (Montpellier), and the dissemination site, CC-IN2P3 (Lyon), each holding versions 1 and 2 of an item. One annotation reads ‘(No storage in medium-term preservation)’.)

Page 22: Speech & Language Data Repository (SLDR)

The OAIS solution worked out by SLDR

User queries interpreted by SLDR may be forwarded to the lab producer or to the dissemination site.

(Diagram: SLDR (submission site), lab producer, archive site CINES (Montpellier) and dissemination site CC-IN2P3 (Lyon), with versions 1 and 2 of an item.)

Page 23: Speech & Language Data Repository (SLDR)

The OAIS solution worked out by SLDR

Users may receive data/metadata from SLDR, lab producers and/or the dissemination site (open access). Note that the archive site is not involved in this process.

(Diagram: SLDR (submission site), lab producers, archive site CINES (Montpellier) and dissemination site CC-IN2P3 (Lyon), with versions 1 and 2 of an item.)

Page 24: Speech & Language Data Repository (SLDR)

The OAIS solution worked out by SLDR

Portals harvest metadata found in repositories available at different levels of the system.

(Diagram: a user, a portal, and metadata harvesting across SLDR (submission site), lab producers, the archive site CINES (Montpellier) and the dissemination site CC-IN2P3 (Lyon), with versions 1 and 2 of an item.)

Page 25: Speech & Language Data Repository (SLDR)

Item structure on SLDR

(Screenshot of item sldr000759: item structure and data previewing.)

Page 26: Speech & Language Data Repository (SLDR)

Creating the submission information package

(Diagram: an item stored at SLDR is packaged into a Submission Information Package (SIP).)

Page 27: Speech & Language Data Repository (SLDR)

Submitting an item to the archive site: the submission information package (SIP)


(See: http://sldr.org/wiki/Packaging-en)

Page 28: Speech & Language Data Repository (SLDR)

Dealing with the SIP at CINES and forwarding a DIP to CC-IN2P3

(Diagram: CINES (archive site) holds the long-term storage (AIP) and forwards a Dissemination Information Package (DIP) to CC-IN2P3 (dissemination site). Software components shown: Fedora Commons, iRODS, Arcsys.)

Page 29: Speech & Language Data Repository (SLDR)

Accessing data on SLDR

(Diagram: user, SLDR and CC-IN2P3 (dissemination), open access)

1. Selection
2. SLDR-to-CC-IN2P3 query
3. Downloading under the SLDR licence

Controlled downloading from CC-IN2P3 is achieved via a ‘channelling’ of data through SLDR.
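The ‘channelling’ idea can be sketched as follows, assuming that SLDR takes the access decision and the dissemination site merely serves bytes. `channel_download`, its parameters, the download log and the `fake_fetch` stand-in are all hypothetical; a real deployment would stream over HTTP from CC-IN2P3 rather than call a local function.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

# Hypothetical download log (user, item), kept to follow up resource usage.
DownloadLog = List[Tuple[str, str]]


def channel_download(
    user: str,
    item_id: str,
    licence_signed: bool,
    open_access: bool,
    fetch: Callable[[str], Iterable[bytes]],
    log: DownloadLog,
) -> Iterator[bytes]:
    """Check access on the SLDR side, then relay the bytes served by the dissemination site."""
    if not (open_access or licence_signed):
        raise PermissionError(f"{user} must sign the SLDR licence to download {item_id}")
    log.append((user, item_id))  # record the download before relaying data
    return iter(fetch(item_id))


def fake_fetch(item_id: str) -> Iterable[bytes]:
    # Stand-in for the request to CC-IN2P3: returns dummy chunks.
    return [b"chunk-1;", b"chunk-2"]


log: DownloadLog = []
data = b"".join(channel_download("alice", "sldr000014", True, False, fake_fetch, log))
```

Because the check runs before any byte is fetched, the dissemination site never needs to know about licences or users; it only has to trust requests coming from SLDR.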


Page 30: Speech & Language Data Repository (SLDR)

Deleting data from the submission site

Data may be deleted from SLDR because it is distributed in its entirety from CC-IN2P3.

(Diagram: user, SLDR and CC-IN2P3 (dissemination), with open access and controlled access.)

Page 31: Speech & Language Data Repository (SLDR)

Restoring data on the submission site


When necessary for preparing a new version of an item, its entire data set may be retrieved from the distribution site.

Page 32: Speech & Language Data Repository (SLDR)

Retrieving an item from CC-IN2P3 to SLDR


Page 33: Speech & Language Data Repository (SLDR)

Retrieving an item from CC-IN2P3 to SLDR

Datastreams stored at CC-IN2P3 are downloaded to SLDR and the whole structure of the item is restored along with original file names and dates of modification.
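Restoring original file names and modification dates can be sketched with the Python standard library. The manifest format (relative path, POSIX timestamp, content) and the `restore_item` function are assumptions for illustration; the slide does not say how SLDR stores or transfers this information.

```python
import os
from pathlib import Path
from typing import Iterable, List, Tuple

# Hypothetical manifest entry: (path relative to the item root, POSIX mtime, content).
Entry = Tuple[str, float, bytes]


def restore_item(entries: Iterable[Entry], target: Path) -> List[Path]:
    """Rebuild an item's folder tree, keeping original file names and modification dates."""
    restored = []
    for rel_path, mtime, content in entries:
        dest = target / rel_path
        dest.parent.mkdir(parents=True, exist_ok=True)  # recreate sub-folders
        dest.write_bytes(content)
        os.utime(dest, (mtime, mtime))  # set access and modification times
        restored.append(dest)
    return restored
```

Setting the timestamps after writing matters: writing the file would otherwise stamp it with the restore time instead of the original modification date.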

Page 34: Speech & Language Data Repository (SLDR)


Archival metadata: the ‘sip.xml’ file

This file contains archival metadata: a brief description of the item, relations to other items, versioning, storage instructions and specifications of all files submitted for long-term preservation (in the DEPOT folder)
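To make this structure concrete, here is a sketch that reads a simplified, hypothetical sip.xml with `xml.etree.ElementTree`. The element and attribute names (`version`, `storage`, `file`, `path`, `format`, `size`) are invented for the example and do not reproduce the actual SLDR/CINES schema; the real packaging rules are documented at http://sldr.org/wiki/Packaging-en.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Hypothetical, simplified sip.xml: element and attribute names are illustrative
# only and do NOT reproduce the actual schema used by SLDR or CINES.
SAMPLE_SIP = """<sip>
  <description>Field recordings with transcriptions</description>
  <version>2</version>
  <storage duration="10000" unit="years"/>
  <files>
    <file path="DEPOT/rec01.wav" format="WAVE" size="1048576"/>
    <file path="DEPOT/rec01.txt" format="TXT" size="2048"/>
  </files>
</sip>"""


def read_sip(sip_xml: str) -> Tuple[str, List[Tuple[str, str, int]]]:
    """Return the version number and (path, format, size) for every listed file."""
    root = ET.fromstring(sip_xml)
    version = root.findtext("version", default="")
    files = [
        (f.get("path", ""), f.get("format", ""), int(f.get("size", "0")))
        for f in root.iter("file")
    ]
    return version, files
```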

Page 35: Speech & Language Data Repository (SLDR)

Archival metadata: the ‘sip.xml’ file

(Screenshot of a sip.xml file, annotated: ‘10,000 years!’)

Page 36: Speech & Language Data Repository (SLDR)


File description in the SIP

Page 37: Speech & Language Data Repository (SLDR)

Interoperability

Page 38: Speech & Language Data Repository (SLDR)

Dublin Core metadata via OAI-PMH (follow this link)

(Screenshot annotations: multilingual keywords; multilingual description.)

Both oai_dc and olac metadata formats are produced.
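Harvesters retrieve these records through standard OAI-PMH requests. The sketch below builds a `GetRecord` request and extracts multilingual Dublin Core values from an `oai_dc` response. The base URL `https://example.org/oai` is a placeholder (the slides do not give SLDR's endpoint) and the sample response is made up; the identifier is the one used in the OLAC example of the next slide. Requesting `metadataPrefix=olac` instead would ask for the OLAC format.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple
from urllib.parse import urlencode

NS = {
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def get_record_url(base_url: str, identifier: str, prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH GetRecord request (prefix 'oai_dc' or 'olac')."""
    query = {"verb": "GetRecord", "identifier": identifier, "metadataPrefix": prefix}
    return f"{base_url}?{urlencode(query)}"


def dc_values(response_xml: str, element: str) -> List[Tuple[Optional[str], Optional[str]]]:
    """Return (language, text) pairs for one Dublin Core element, e.g. 'subject'."""
    root = ET.fromstring(response_xml)
    return [(e.get(XML_LANG), e.text) for e in root.iterfind(f".//oai_dc:dc/dc:{element}", NS)]


# Made-up response fragment with multilingual keywords.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <GetRecord><record><metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:subject xml:lang="en">phonetics</dc:subject>
      <dc:subject xml:lang="fr">phonétique</dc:subject>
    </oai_dc:dc>
  </metadata></record></GetRecord>
</OAI-PMH>"""
```

Note that `urlencode` percent-encodes the colons of the OAI identifier, which is what OAI-PMH expects for reserved characters in request arguments.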

Page 39: Speech & Language Data Repository (SLDR)

Descriptive metadata: http://www.language-archives.org/item/oai:sldr.org:sldr000745

OLAC: Open Language Archives Community

Descriptive metadata should contain all details relevant to reusing a linguistic resource. This is a simple example registered in the OLAC archive. A more elaborate example is found at: oai:sldr.org:sldr000764

Page 40: Speech & Language Data Repository (SLDR)

Processing queries on SLDR

Page 41: Speech & Language Data Repository (SLDR)

SLDR persistent links and open access

SLDR currently meets the requirements for creating persistent links to open-access or controlled-access items and the files they contain. Below are examples.

• Link http://sldr.org/sldr000014 displays item sldr000014.
• Link http://sldr.org/sldr000014/get/olac retrieves the OLAC descriptive metadata for item sldr000014.
• Link http://sldr.org/sldr000027/source launches a ‘source’ query on the server of the lab that produced sldr000027.
• Link http://sldr.org/sldr000723/download starts downloading the latest version of sldr000723 (under a Creative Commons licence), irrespective of its distribution site.
• Link http://sldr.org/sldr000014/download starts downloading the latest version of sldr000014 (under the CRDO licence), irrespective of its distribution site.
• Link http://sldr.org/sldr000036_v3/download starts downloading version 3 of item sldr000036, irrespective of its distribution site.
• Link http://sldr.org/sldr000014/toc displays a detailed table of contents from which no file can be downloaded, because the entire item is in restricted access (under the CRDO licence).

• Link http://sldr.org/sldr000525/toc displays a detailed table of contents from which a few open-access files can be downloaded.

• Link http://sldr.org/sldr000525/map displays a map of files from which a few open-access files can be downloaded.
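The link patterns above are regular enough to be generated programmatically. `sldr_link` is a hypothetical helper that only reproduces the patterns shown in the list (plus the ‘get/stream’ pattern used for persistent file links); combining a version suffix with actions other than `download` is an assumption not confirmed by the examples.

```python
import re
from typing import Optional

BASE_URL = "http://sldr.org"
ITEM_ID = re.compile(r"sldr\d{6}")
ACTIONS = {"toc", "map", "source", "download", "get/olac"}


def sldr_link(item_id: str, action: Optional[str] = None, version: Optional[int] = None) -> str:
    """Build an SLDR link following the URL patterns listed above (hypothetical helper)."""
    if not ITEM_ID.fullmatch(item_id):
        raise ValueError(f"not an SLDR identifier: {item_id!r}")
    if action is not None and action not in ACTIONS and not action.startswith("get/stream/"):
        raise ValueError(f"unknown action: {action!r}")
    path = item_id if version is None else f"{item_id}_v{version}"
    return f"{BASE_URL}/{path}" if action is None else f"{BASE_URL}/{path}/{action}"
```

Since these URLs contain only the SLDR identifier, they stay valid when the item moves between storage sites, unlike links that embed a dissemination-service identifier.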

Page 42: Speech & Language Data Repository (SLDR)

SLDR (non-)persistent links for open access

This URL: http://fedora.tge-adonis.fr:8090/fedora/get/CRDO-Aix:126690/DEPOT_525.pdf depends on the dissemination service and on the current version of the item, because it refers to identifier ‘126690’ and index ‘525’.

(Screenshot: http://sldr.org/sldr000033/toc)

Page 43: Speech & Language Data Repository (SLDR)

SLDR persistent links for open access

This URL: http://sldr.org/sldr000525/get/stream/CG5_22k-tc.txt is independent of the dissemination service, of the current version of the item, and of whether it is stored in a medium-term or long-term archive.

(To this effect, file [72] was packed in a ‘stream’ directory.)

(Screenshot: http://sldr.org/sldr000525/toc)

Page 44: Speech & Language Data Repository (SLDR)

SLDR persistent links for open access

http://sldr.org/sldr000525/get/stream/CG5_22k-tc.txt

This URL will remain accessible after a new version is uploaded to the archive.

What if its access right needs to be modified?

Solution: access rights to any document should be modified by a simple metadata update, i.e. without creating a new version of the item concerned.

Page 45: Speech & Language Data Repository (SLDR)

Access rights management

SLDR data is submitted to an institutional archive (CINES) for its long-term preservation. Access rights should therefore comply with the French Code du patrimoine (Heritage Code) with respect to public archives.

Page 46: Speech & Language Data Repository (SLDR)

Access options (more options to come!)

Action | Owner of item | Privileged user (team) | Privileged individual | Member of authorised group | … | Anybody
Open-access files in table of contents | yes | yes | yes | yes | yes | yes
Any file in current version | yes | yes | yes | yes | … | no
Any file in previous versions | yes | yes | no | no | … | no
Any file at the source (next version) | yes | yes | no | no | … | no
File in a SECRET folder | yes | yes | no | no | … | no
Updating descriptive files and metadata | yes | yes | no | no | … | no
Editing/viewing confidential metadata | yes | no | no | no | no | no
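In this table, each row grants access to a contiguous run of user categories starting from the owner, so the matrix can be encoded as one minimum role per action. This is a sketch: the role and action names are hypothetical, and the unnamed ‘…’ column is left out.

```python
from enum import IntEnum


class Role(IntEnum):
    """User categories from the table, ordered from least to most privileged."""
    ANYBODY = 0
    GROUP_MEMBER = 1           # member of an authorised group
    PRIVILEGED_INDIVIDUAL = 2
    PRIVILEGED_TEAM = 3        # privileged user (team)
    OWNER = 4                  # owner of the item


# Least privileged role allowed to perform each action (hypothetical action names).
MIN_ROLE = {
    "view_open_access_toc": Role.ANYBODY,
    "get_current_version_file": Role.GROUP_MEMBER,
    "get_previous_version_file": Role.PRIVILEGED_TEAM,
    "get_source_file": Role.PRIVILEGED_TEAM,
    "get_secret_folder_file": Role.PRIVILEGED_TEAM,
    "update_descriptive_metadata": Role.PRIVILEGED_TEAM,
    "edit_confidential_metadata": Role.OWNER,
}


def is_allowed(role: Role, action: str) -> bool:
    """True if this role meets the minimum required for the action."""
    return role >= MIN_ROLE[action]
```

If a future access option broke this ordering (e.g. a right granted to a group member but not to a privileged individual), a per-role set of allowed actions would be needed instead of a single threshold.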

Page 47: Speech & Language Data Repository (SLDR)

Code du patrimoine (the Heritage Code) in France

Excerpts from the Code du patrimoine, law of 15 July 2008 (the Heritage Code):

L211-1: Archives cover all documents, regardless of their date, place of storage, form and physical medium, produced or received by any person or entity and by any public or private department or agency in the course of their activity.

L211-2: The preservation of archives is organized in the public interest, both for the needs of administration and of establishing the rights of individuals or legal entities, public or private, and for documenting historical research.

L211-4: Public archives are: (a) Documents produced by the activity of the State, local governments, public institutions and other legal persons under public or private law in charge of a public service, as part of their public service remit. (...)

L213-1: Public archives are in open access unless subject to restrictions under Article L. 213-2.

L213-2: Notwithstanding the provisions of Article L. 213-1 (...) public archives are automatically granted open access after a delay of ... Among derogations (code AR048): 50 years, for documents whose disclosure undermines the protection of privacy, which contain an assessment or value judgment about a named or easily identifiable person, or which reveal the behavior of a person in circumstances that might cause him/her prejudice.

L213-5: Any administration holding public or private archives is required to give reasons for refusing a request for access to archival documents.

Page 48: Speech & Language Data Repository (SLDR)

Access rights management on SLDR

(Screenshot of access-rights settings, with callouts (1) to (4) referred to below.)

• Access to items disseminated by SLDR must comply with the French Code du patrimoine, which classifies every scientific production as a public archive.
• By default, a public archive shall be in open access (Article L213-1, Act of 15 July 2008). However, derogation clauses restricting access apply in the cases listed in Art. L213-2.
• Any denial of access shall be explicitly justified (Art. L213-5) (1).
• In the latter case, right holders may sign a free consent form to share the document before the end of the restriction period (2).
• Permissions may be granted for a limited period of time (3) under special conditions (4).
• Different access rights may be specified for single files in a given item.
• Permission to download a file is granted by the distribution site on the basis of conditions declared by producers, as shown above.

Page 49: Speech & Language Data Repository (SLDR)

Article L213-2, Code du patrimoine, Act of 15 July 2008, modified by Ordonnance No. 2009-483 of 29 April 2009, art. 13

Notwithstanding the provisions of Article L. 213-1:

I. Public archives are automatically granted open access after a delay of:

1. Twenty-five years from the date of the document or of the most recent document included in the file:
a) For documents whose disclosure violates the confidentiality of the deliberations of the Government and of the executive authorities, the conduct of foreign relations, currency and public credit, business and industrial secrecy, investigations of tax and customs offenses by the competent departments (AR039), or statistical secrecy, except when the data were collected through questionnaires relating to the private facts and behavior mentioned in 4 and 5 (AR041);
b) For documents mentioned in 1 of I of Article 6 of Law No. 78-753 of 17 July 1978, with the exception of documents produced under a contract for services performed on behalf of one or more specific persons when those documents, because of their content, fall within the scope of points 3 or 4 of the present article (AR040, AR042, AR053);

2. Twenty-five years from the date of death of the person concerned, for documents whose disclosure violates medical confidentiality (AR043). If the date of death is unknown, the delay is one hundred and twenty years from the date of birth of the person concerned (AR061);

3. Fifty years from the date of the document or of the most recent document included in the file, for documents whose disclosure violates the secrecy of national defense, the fundamental interests of the State in the conduct of foreign affairs, public safety, the security of persons (AR049) or the protection of privacy (AR048), except for documents mentioned in 4 and 5. The same delay applies to documents containing an assessment or value judgment about a named or readily identifiable individual, or revealing the behavior of a person in circumstances that might cause him/her prejudice (AR048). The same delay applies to documents relating to the construction, equipment and operation of structures, buildings or parts of buildings used for the detention of persons or usually receiving detainees (AR050). This period is counted from the end of the allocation of these structures, buildings or parts of buildings to such uses;

Page 50: Speech & Language Data Repository (SLDR)

4. Seventy-five years from the date of the document or of the most recent document in the file, or twenty-five years from the date of death of the person concerned if the latter period is shorter:
a) For documents whose disclosure violates statistical secrecy when the data were collected through questionnaires relating to facts and behavior of private life (AR041, AR052);
b) For documents relating to investigations conducted by the staff of the Judicial Police (AR046, AR056);
c) For documents relating to cases brought before the courts, subject to special provisions relating to judgments and to the enforcement of judgments (AR047, AR051, AR057);
d) For the minutes and registers of public or ministerial officers (AR045, AR055);
e) For civil status registers of births and marriages, from the date of their closing (AR044, AR054);

5. One hundred years from the date of the document or of the most recent document in the file (AR058), or twenty-five years from the date of death of the person concerned if the latter period is longer, for documents referred to in point 4 that concern persons under 18 (AR041, AR046, AR047, AR045). The same delays apply to documents covered, or having been covered, by national defense secrecy whose disclosure is likely to endanger the safety of named or readily identifiable persons (AR059). The same applies to documents relating to investigations conducted by police services and to cases brought before the courts (AR058), subject to special provisions relating to judgments and to the enforcement of judgments, whose disclosure affects the intimacy of people’s sexual life (AR060).

II. Access is prohibited to public archives whose disclosure would disseminate information allowing the design, manufacture, use or location of nuclear, biological, chemical or other weapons with direct or indirect destructive effects of a similar level (AR062).
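The delays above amount to simple date arithmetic. The sketch below covers only a few codes whose delay is counted from a single reference date (AR039, AR043, AR048, AR049, AR061); the `DELAYS` table and the function names are hypothetical, and cases with alternative deadlines (points 4 and 5) or special starting points (AR050) are deliberately left out.

```python
from datetime import date
from typing import Dict, Optional, Tuple

# (delay in years, reference date) for a few codes quoted in L213-2 above.
DELAYS: Dict[str, Tuple[int, str]] = {
    "AR039": (25, "document"),  # tax and customs investigations
    "AR043": (25, "death"),     # medical confidentiality
    "AR061": (120, "birth"),    # medical confidentiality, date of death unknown
    "AR048": (50, "document"),  # protection of privacy
    "AR049": (50, "document"),  # national defense, public safety, security of persons
}


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years; 29 February falls back to 28 February if needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def open_access_date(code: Optional[str], dates: Dict[str, date]) -> date:
    """Date from which a public archive becomes openly accessible."""
    if code is None:
        return dates["document"]  # L213-1: open access by default
    years, reference = DELAYS[code]
    return add_years(dates[reference], years)
```

For instance, a recording dated 23 April 2011 and restricted under AR048 would become openly accessible on 23 April 2061.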

Page 51: Speech & Language Data Repository (SLDR)

(With p.o.) Victorine Dumas, 23 April 2011. Photograph: Médéric Gasquet-Cyrus

Page 52: Speech & Language Data Repository (SLDR)

Informed consent and complementary user’s licence

(Diagram: informant and user.)

Page 53: Speech & Language Data Repository (SLDR)


Specifying private/public status in the package

Some folders bear special names indicating a particular way of sharing their contents.

Setting up access rights attributes is explained in full detail on this page: http://sldr.org/wiki/accessRightsSettings_en

Page 54: Speech & Language Data Repository (SLDR)

Following up resource usage

Page 55: Speech & Language Data Repository (SLDR)

SLDR user licence

(Screenshots: signing the CRDO user licence; downloading.)

Page 56: Speech & Language Data Repository (SLDR)

Notification of downloading (under the SLDR licence)

Notifications are posted in the language chosen by the user. They remind him/her of the nature of the downloaded items, their reference publications and the terms of the SLDR licence.

Page 57: Speech & Language Data Repository (SLDR)

Following up resource usage

1) Users’ community

This page displays the downloads of an item and the academic profiles of its users.

Access to this list is restricted to persons who have produced items on SLDR or who have downloaded this particular item.

Page 58: Speech & Language Data Repository (SLDR)

Following up resource usage

2) Publications

Page 59: Speech & Language Data Repository (SLDR)

Following up resource usage

3) Teams and research projects

Three projects (follow this link):
• Project 1: Corpus of Interactional Data (2003)
• Project 2: OTIM (Outils de Traitement de l'Information Multimodale) (ANR 2009)
• Project 3: La production des sons en parole conversationnelle (2007)

Page 60: Speech & Language Data Repository (SLDR)

Following up resource usage

Following up persons, productions, teams and projects associated with SLDR resources is highly relevant to:
• facilitating collaborative work beyond institutional barriers (international projects etc.);
• highlighting the utility of oral data for the research community, the diversity of their uses and, consequently, the benefit of sharing them on a non-commercial basis.

Page 61: Speech & Language Data Repository (SLDR)

Work in progress

Page 62: Speech & Language Data Repository (SLDR)

Work in progress at SLDR

• SLDR team at LPL (part time): 3 administrators; technical coordination: 2 research engineers; scientific coordination: 2 research scientists.
• In the framework of the ORTOLANG project (Open Resources and Tools for Language), a consortium of French linguistic research labs is planning to set up a network of CLARIN centres relying on SLDR and CNRTL for the preservation and sharing of oral and text resources respectively. This project will notably require:
  - implementing persistent identifiers based on EPIC (European Persistent Identifier Consortium);
  - enriching descriptive metadata formats to facilitate interoperability;
  - implementing Shibboleth or OpenSSO cross-site authentication.
• Complementary distribution of linguistic resources in collaboration with ELRA and the Linguistic Data Consortium (LDC), e.g. sldr000034 and sldr000770.
• We are eager to work with research projects likely to prompt the implementation of new features!

Development journal: http://sldr.org/wiki/Developpement/Journal