Data Management David Nathan & Peter Austin & Robert Munro
Jan 12, 2016
This section
1. Data management
2. Properties of data
3. Relational data model
4. XML
5. Example
Workflows - description vs documentation
Description:
  something happened
  something inscribed: you applied knowledge, made decisions (NOT OF INTEREST!)
  representations, lists, summaries, analyses: cleaned up, selected, analysed
  archived, presented, published
Documentation:
  something happened
  recording: you applied knowledge, techniques (FOCUS OF INTEREST!)
  representations, eg transcription, annotation: made decisions, applied linguistic knowledge
  archived & ... ??
recapitulates
Choosing values/priorities
Standards & compliance
Adeptness with tools
Modelling of phenomena, architecture of data
Dissemination/publishing
Preserving
Ethics, responsibility, protocol
Range, comprehensiveness
Intellectual rigour
Which are priorities? Which are dispensable?
Data should be:
  explicit
  consistent
  robust
  meaningful
  conventional
  adaptable, convertible, machine readable etc
  useful!
“Portability”
Bird and Simons 2003:
language documentation data needs to have integrity, flexibility, longevity
“Portability”
complete
explicit
documented
preservable
transferable
accessible
adaptable
not technology-specific
(also appropriate, accurate, useful etc!!)
Data management
the way data is structured is itself information, and that information may be complex
properly structured data allows:
  usage including manipulation, conversion, derivation
  preservation
  machine readability
Data management systems
a data management system is a system you design for storing data and metadata:
  information about content and structures
  relationship between units of information
it is not necessarily tied to any particular software, or even a computer
Naive management using filenames
a (too) simple management system:
  information about a recording is captured in the filenames:
    1st_int_john_5Aug.wav
    market_conv_mj.wav
    ….
what does ‘int’ mean? what information about the recording is missing?
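The weakness of filename-based management can be sketched in Python. Splitting the first filename above yields tokens but no field names, so a program (or a later reader) cannot tell what each token means. The explicit record that follows is a hypothetical alternative: its field names and its readings of the tokens (for example 'int' as 'interview') are assumptions made for illustration only and cannot be confirmed from the filename.

```python
from pathlib import Path

filename = "1st_int_john_5Aug.wav"

# Split the name (without extension) on underscores.
# This is all the structure the filename offers.
tokens = Path(filename).stem.split("_")
print(tokens)

# Hypothetical explicit record. Field names and expanded values are
# assumptions for illustration; the filename alone cannot confirm them.
record = {
    "file": filename,
    "session_order": 1,             # assumed reading of '1st'
    "event_type": "interview",      # assumed reading of 'int'
    "speaker": "john",              # assumed: a speaker, not the recordist
    "date": "5 Aug, year unknown",  # the filename gives no year
}

# Information the filename never carried at all.
wanted = ("year", "location", "language", "recordist")
missing = [field for field in wanted if field not in record]
print(missing)
```

Even this small record makes its assumptions visible, which the filename does not.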
Data modelling
World/universe
Domain
Relevant entities, properties, relationships
We also need formal ways to represent these
Data modelling
data modelling is the process of designing your data management system:
  what information do you need to record?
  what are the units of information?
  what are their properties (attributes)?
  what are the relationships between the units of information?
  how is the information etc likely to change in the future?
  how can all this be represented?
Data management
two well-known formats for structured data:
  relational database
  eXtensible Markup Language (XML)
these are methods, not software or hardware
any system for well-structured data could be OK, but generally:
  a smaller community of users means fewer tools and less support ... so errors are more likely
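As a rough illustration of the two formats, the sketch below stores the same small record once as a row in a relational table and once as an XML element, using only Python's standard library. The record, the table name `recording` and the field names `title` and `speaker` are hypothetical, chosen only to show the two shapes side by side.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical record used for both representations.
title, speaker = "market conversation", "MJ"

# Relational form: one row in a table, one column per property.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE recording (title TEXT, speaker TEXT)")
conn.execute("INSERT INTO recording VALUES (?, ?)", (title, speaker))
row = conn.execute("SELECT title, speaker FROM recording").fetchone()

# XML form: nested elements, each labelled with its own name.
rec = ET.Element("recording")
ET.SubElement(rec, "title").text = title
ET.SubElement(rec, "speaker").text = speaker
xml_text = ET.tostring(rec, encoding="unicode")

print(row)
print(xml_text)
```

In both cases the structure is explicit and machine readable, independent of the particular program used to create it.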
Databases
Note that database has 3 senses:
  a body of related information
  type of software (eg Oracle, Access, Filemaker)
  a model for the domain of information (ie. formulation of entities and relationships)
Relational format
Uses tables
Table rows represent entities in a domain
Table columns represent properties/attributes of entities
Each cell represents one atomic unit of data
The order of rows and columns has no significance
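These properties can be tried out with Python's built-in `sqlite3` module. The sketch below is one possible illustration, not a prescribed design: the table `country` and its columns `name` and `capital` are assumptions introduced for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Columns are the attributes of the entity; each row is one entity;
# each cell holds a single atomic value.
conn.execute("CREATE TABLE country (name TEXT, capital TEXT)")
rows = [("France", "Paris"), ("Japan", "Tokyo"), ("Kenya", "Nairobi")]
conn.executemany("INSERT INTO country VALUES (?, ?)", rows)

# Row order carries no meaning: request an order explicitly when needed.
by_name = conn.execute(
    "SELECT name, capital FROM country ORDER BY name"
).fetchall()

# Column order carries no meaning either: columns are addressed by name.
swapped = conn.execute(
    "SELECT capital, name FROM country ORDER BY name"
).fetchall()

print(by_name)
print(swapped)
```

Both queries return the same information; only the presentation chosen by the query differs.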
Representing a relational design
simplest example:
  TABLE NAME (field name)
Representing a relational design
less trivial entity:
  TABLE NAME (field 1, field 2)
Representing a relational design
less trivial domain:
  CONTINENT (name)
  COUNTRY (name)
  CONTINENT to COUNTRY = one to many
Non-trivial domains
non-trivial domains have many-to-many relationships
  AUTHOR (name, .....)
  SUBJECT (name, .....)
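A common way to implement a many-to-many relationship is a third, linking table that holds one row per pair. The sketch below assumes that approach; the table `author_subject`, the `id` columns and the sample values are hypothetical and are not taken from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE subject (id INTEGER PRIMARY KEY, name TEXT);
    -- Linking table (an assumption of this sketch):
    -- one row per author/subject pair.
    CREATE TABLE author_subject (
        author_id  INTEGER REFERENCES author(id),
        subject_id INTEGER REFERENCES subject(id),
        PRIMARY KEY (author_id, subject_id)
    );
    INSERT INTO author  VALUES (1, 'Author A'), (2, 'Author B');
    INSERT INTO subject VALUES (1, 'phonology'), (2, 'syntax');
    INSERT INTO author_subject VALUES (1, 1), (1, 2), (2, 2);
""")

# One author has many subjects, and one subject has many authors.
pairs = conn.execute("""
    SELECT author.name, subject.name
    FROM author_subject
    JOIN author  ON author.id  = author_subject.author_id
    JOIN subject ON subject.id = author_subject.subject_id
    ORDER BY author.name, subject.name
""").fetchall()
print(pairs)
```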
From model to implementation
implementing table relationships
  CONTINENT (id, name)
  COUNTRY (id, name, continent_id)
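The `id` and `continent_id` fields above can be sketched as a primary key and a foreign key in SQLite. This is a minimal illustration under assumptions: the column types, the sample rows and the use of `PRAGMA foreign_keys` (which SQLite needs before it enforces foreign keys) are choices made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE continent (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE country (
        id INTEGER PRIMARY KEY,
        name TEXT,
        continent_id INTEGER REFERENCES continent(id)
    );
    INSERT INTO continent VALUES (1, 'Africa'), (2, 'Asia');
    INSERT INTO country VALUES
        (1, 'Kenya', 1), (2, 'Ghana', 1), (3, 'Japan', 2);
""")

# One continent, many countries: follow continent_id back to continent.id.
african = [name for (name,) in conn.execute(
    "SELECT country.name FROM country "
    "JOIN continent ON continent.id = country.continent_id "
    "WHERE continent.name = ? ORDER BY country.name", ("Africa",))]
print(african)

# A country that points at a non-existent continent is rejected.
try:
    conn.execute("INSERT INTO country VALUES (4, 'Atlantis', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)
```

Each country stores only the `id` of its continent, so a continent's name is recorded once and can be corrected in one place.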
Designing a database
Determine the domain, entities and relationships
Experiment with scenarios
Any non-trivial model will evolve as it is thought out and tested
Normalisation is the process of refining models
Practical example
Create a database model for some audio metadata
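One possible starting point for this exercise is sketched below in SQLite. It is not the intended answer: every table name, column name and sample value is a hypothetical choice, and a real model would evolve as it is tested against scenarios.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE speaker (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE recording (
        id INTEGER PRIMARY KEY,
        filename TEXT NOT NULL UNIQUE,
        recorded_on TEXT,   -- full ISO 8601 date, including the year
        event_type TEXT     -- spelled out in full, not abbreviated
    );
    -- A recording can have several speakers,
    -- and a speaker can appear in several recordings.
    CREATE TABLE recording_speaker (
        recording_id INTEGER REFERENCES recording(id),
        speaker_id   INTEGER REFERENCES speaker(id),
        PRIMARY KEY (recording_id, speaker_id)
    );
""")

# Placeholder data, invented for the sketch.
conn.execute("INSERT INTO speaker VALUES (1, 'Speaker One'), (2, 'Speaker Two')")
conn.execute(
    "INSERT INTO recording VALUES (1, 'rec0001.wav', '2000-01-01', 'conversation')"
)
conn.executemany("INSERT INTO recording_speaker VALUES (?, ?)", [(1, 1), (1, 2)])

# Scenario: who speaks in a given recording?
speakers = [name for (name,) in conn.execute(
    "SELECT speaker.name FROM recording_speaker "
    "JOIN speaker ON speaker.id = recording_speaker.speaker_id "
    "JOIN recording ON recording.id = recording_speaker.recording_id "
    "WHERE recording.filename = ? ORDER BY speaker.name", ("rec0001.wav",))]
print(speakers)
```

Unlike the filename scheme, the filename here is only an identifier; the information about the recording lives in named fields.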
What does all this achieve?
conceptual/intellectual validity
scalable, searchable, modular
machine readable
in fact, portable:
  complete
  explicit
  documented
  preservable
  transferable
  accessible
  adaptable
  not technology-specific
Stop here!