Management of Metadata in Linguistic Fieldwork: Experience from the ACLA Project

Baden Hughes (1), David Penton (1), Steven Bird (1), Catherine Bow (1), Gillian Wigglesworth (1), Patrick McConvell (2) and Jane Simpson (3)

(1) University of Melbourne, (2) AIATSIS, (3) University of Sydney

Jan 16, 2015


Baden Hughes

Paper at LREC2004 (May 2004, Lisbon)


Overview

- Introduction
- Requirements
- Data Model
- Implementation
  - Data Entry
  - Reports, Queries and Searches
  - Exports
  - Synchronisation
  - Administration
- Conclusion


Introduction

- A metadata creation and management tool for a multiple fieldworker, longitudinal, child language acquisition research project
- Addressing the need for principled metadata creation as well as best practice data creation
- Challenging deployment scenario which is typical of numerous field-oriented linguistic research and language data collection projects


Requirements

- Data Management
  - Metadata for complex multimodal data
  - Relational data for participants
  - Delineation between participant roles
  - Not just collection, but reports and queries
- Research Methodology
  - Integration with tool of choice for analysis
  - Two-stage enquiry process: metadata, then data
  - Extensible controlled vocabularies
  - User-defined fields (particularly lists)
- Technology
  - Full support for data entry and enquiry in both online and offline modes
  - Metadata collection with maximum utility to the project without precluding other renderings, e.g. as an OLAC or IMDI catalogue
  - Easy to install and use on multiple platforms


Data Model

- Tools for modelling
  - DBDesigner (open source, XML based, multi-platform)
- Challenges for modelling
  - Multiple interlinked media, sessions, and transcripts
  - Differentiating between participants and focus children in multiple contexts
  - Incomplete personal data, e.g. no DOB
  - Non-linear progression through educational system
  - Multiple types of anthropological relations
  - Non-standardised linguistic classification and nomenclature
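
Some of the modelling challenges above can be illustrated with a small relational sketch. This is not the ACLA schema: every table and column name below is hypothetical, and SQLite (through Python's standard `sqlite3` module) stands in for the project's MySQL so that the example is self-contained. It shows a nullable date of birth, a role stored per session rather than per person (so a focus child in one session can appear in another role elsewhere), and sessions linked to media and transcripts.

```python
import sqlite3

# Hypothetical schema, for illustration only (not the ACLA data model).
SCHEMA = """
CREATE TABLE participant (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    date_of_birth TEXT            -- nullable: DOB may be unknown
);
CREATE TABLE session (
    id INTEGER PRIMARY KEY,
    recorded_on TEXT NOT NULL
);
-- The role belongs to the (session, participant) pair, not to the person.
CREATE TABLE session_participant (
    session_id INTEGER NOT NULL REFERENCES session(id),
    participant_id INTEGER NOT NULL REFERENCES participant(id),
    role TEXT NOT NULL,           -- e.g. 'focus_child', 'sibling'
    PRIMARY KEY (session_id, participant_id)
);
CREATE TABLE media (
    id INTEGER PRIMARY KEY,
    session_id INTEGER NOT NULL REFERENCES session(id),
    label TEXT NOT NULL
);
CREATE TABLE transcript (
    id INTEGER PRIMARY KEY,
    media_id INTEGER NOT NULL REFERENCES media(id),
    file_name TEXT NOT NULL
);
"""


def build_demo_db():
    """Create an in-memory database with one participant in two sessions."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO participant (id, name, date_of_birth) VALUES (1, 'Child A', NULL)"
    )
    conn.executemany(
        "INSERT INTO session (id, recorded_on) VALUES (?, ?)",
        [(1, "2004-01-10"), (2, "2004-02-14")],
    )
    conn.executemany(
        "INSERT INTO session_participant (session_id, participant_id, role) VALUES (?, ?, ?)",
        [(1, 1, "focus_child"), (2, 1, "sibling")],
    )
    conn.commit()
    return conn


def roles_of(conn, participant_id):
    """Return (session_id, role) pairs for one participant, ordered by session."""
    rows = conn.execute(
        "SELECT session_id, role FROM session_participant "
        "WHERE participant_id = ? ORDER BY session_id",
        (participant_id,),
    )
    return rows.fetchall()
```

Keeping the role on the linking table is one way to separate "participant" from "focus child" without duplicating personal records; other designs are possible.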


Implementation

- Architecture
  - (fully independent) networked client-server
  - single line of code difference between client and server installation
  - Underlying requirement to provide full functionality in both online and offline environments
- Technology Platform
  - PHP, PEAR scripting language
  - MySQL database engine
  - Apache HTTP server
  - fundamentally open source, cross-platform


Data Entry

- Forms-based data entry
  - Participant Form
  - Session Form
- A feature of both these forms is the “build your own list” form interface, which allows the end user to construct a list of parameters and then apply instances of these parameters within the parent form
  - educational progress
  - session-media-transcript


Reports, Queries and Searches

- Simple Reports
  - for frequently used two-dimensional queries
    - e.g. participants by fieldworker
    - e.g. participants by gender
- Advanced Reports
  - design your own query interface
- Full Text Query
  - Boolean support
  - full database index query
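
A simple report of the kind listed above ("participants by fieldworker", "participants by gender") amounts to a grouped count over one column. The sketch below is an assumption about how such a report could be expressed, not the tool's actual query: the `participant` table, its columns, and the function names are hypothetical, and SQLite stands in for MySQL.

```python
import sqlite3

# Columns a report may group by. Identifiers cannot be bound as SQL
# parameters, so the column name is checked against this whitelist.
REPORT_DIMENSIONS = {"fieldworker", "gender"}


def participants_by(conn, column):
    """Count participants per distinct value of the given column."""
    if column not in REPORT_DIMENSIONS:
        raise ValueError(f"unsupported report dimension: {column}")
    sql = (
        f"SELECT {column}, COUNT(*) FROM participant "
        f"GROUP BY {column} ORDER BY {column}"
    )
    return conn.execute(sql).fetchall()


def demo_connection():
    """In-memory database with three made-up participants."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE participant (name TEXT, gender TEXT, fieldworker TEXT)"
    )
    conn.executemany(
        "INSERT INTO participant VALUES (?, ?, ?)",
        [("P1", "F", "FW1"), ("P2", "M", "FW1"), ("P3", "F", "FW2")],
    )
    return conn
```

With the demo data, `participants_by(demo_connection(), "gender")` groups the three rows into one count per gender value.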


Exports

- Generate headers for CLAN, e.g. @participants
- Generate Physical Media Labels, e.g. FM025.A.DV, FM025.A.MD
- Generate File Names for Transcriptions, e.g. DEV00012004049.trn
- XML-based database dump
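
Two of the exports above can be sketched as plain string formatting. The helper names are hypothetical, and the breakdown of a media label into prefix, zero-padded number, part letter and media type is only a guess from the example FM025.A.DV; the slides do not define it. The `@Participants:` line follows the general shape of a CHAT header as read by CLAN (speaker code followed by role), without claiming to match the tool's exact output. The transcription file name format is not described in the slides, so it is not reconstructed here.

```python
def clan_participants_header(participants):
    """Build a CHAT-style @Participants line from (code, role) pairs.

    Example input: [("CHI", "Target_Child"), ("MOT", "Mother")]
    """
    entries = ", ".join(f"{code} {role}" for code, role in participants)
    return f"@Participants:\t{entries}"


def media_label(prefix, number, part, media_type):
    """Build a physical media label such as FM025.A.DV.

    The field meanings are assumptions made for this sketch:
    prefix + three-digit number, then a part letter, then a media type code.
    """
    return f"{prefix}{number:03d}.{part}.{media_type}"
```

Generating labels from the database rather than typing them by hand keeps physical media, files and metadata records consistent with one another.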


Synchronisation

- Client -> Server
  1. SQL query identifies all changed data since last sync
  2. Export and serialise as XML
  3. Compress, checksum
  4. Transfer over HTTP
  5. Checksum, uncompress
  6. Serialise XML to SQL
  7. Import SQL into database
- Server -> Client is this process in reverse
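
The client-to-server steps above can be sketched end to end with the Python standard library. Everything specific here is an assumption: the slides do not name the compression or checksum algorithms (gzip and SHA-256 are used below), the XML element names are invented, the change detection is done over an in-memory list rather than by SQL, and the HTTP transfer is left out, so the packed bytes are simply handed from `pack` to `unpack`.

```python
import gzip
import hashlib
import xml.etree.ElementTree as ET


def select_changed(rows, last_sync):
    """Stand-in for the SQL query: keep rows modified after the last sync.

    Timestamps are ISO-8601 strings, which compare correctly as text.
    """
    return [row for row in rows if row["modified"] > last_sync]


def pack(rows):
    """Serialise rows as XML, compress, and compute a checksum."""
    root = ET.Element("changes")
    for row in rows:
        rec = ET.SubElement(root, "record", table=row["table"], id=str(row["id"]))
        for field, value in row["fields"].items():
            ET.SubElement(rec, "field", name=field).text = value
    payload = gzip.compress(ET.tostring(root, encoding="utf-8"))
    return payload, hashlib.sha256(payload).hexdigest()


def unpack(payload, checksum):
    """Verify the checksum, uncompress, and turn XML back into SQL statements.

    Returns (sql, parameters) pairs. Table and column names are taken from
    the XML as-is, so a real importer would need to validate them first.
    """
    if hashlib.sha256(payload).hexdigest() != checksum:
        raise ValueError("checksum mismatch: payload corrupted in transfer")
    root = ET.fromstring(gzip.decompress(payload))
    statements = []
    for rec in root.findall("record"):
        fields = {f.get("name"): f.text for f in rec.findall("field")}
        columns = ", ".join(["id", *fields])
        marks = ", ".join("?" for _ in range(len(fields) + 1))
        sql = f"REPLACE INTO {rec.get('table')} ({columns}) VALUES ({marks})"
        statements.append((sql, [int(rec.get("id")), *fields.values()]))
    return statements
```

Checking the checksum before decompression means a damaged transfer is rejected before any SQL is generated, which matters when the link from the field is unreliable.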


Administration

User-facilitated editing of:

- System data
  - Synchronisation – server settings
- Extensible controlled vocabularies
  - Languages – linked to Ethnologue and AIATSIS codes
  - Locations – geographical metadata
  - Activities/tasks – both locally and globally defined
- User administration
  - Access (personal metadata)
  - Roles (fieldworker, administrator …)
- Project administration
  - Fieldworker activity


Conclusion

- Feature of note is complete online and offline operation
- Research methodology is indicative of many field linguistics projects
- Available for other interested parties to build on and extend
  - http://www.cs.mu.oz.au/research/lt/projects/acla-db


Acknowledgements

The research reported here is supported by the Australian Research Council Discovery Project Grant DP0343189.