Dec 31, 2015
D-square (D-kwadraat)
Digital Databases and Tools for Dutch Dialect Dictionaries
Jos Swanenberg, Folkert de Vriend & Roeland van Hout
Topics
• Historical background
• Overview of project phases
• Conversion procedures
• New encoding for data
• End user access to the data
Macro structure WBD & WLD
Volumes
1. Agricultural terminology
2. Other technical or craft terminologies
3. Common vocabulary
Micro structure WBD & WLD
Constituents:
- Lexical meaning (title, description of the concept)
- Lexical form (‘dutchified’ entry)
- Phonetic form
- Sources
- Geographical code (+ map)
History of automation
1960-1980 Filing cards
1985-1995 Word processor, Genoveva
1995-2007 Databases + word processor
2002 Online database WBD
2003-2007 D-square
[Diagram: data flow in the WBD/WLD project, from analog to digital and from digital to analog. Actors: Editors / Management; Users.
Sources: questionnaires (Nijmegen and Leuven); questionnaires (chiefly Meertens); filing cards.
Raw data: FileM Pro.
Edited data: Vol. I+II in MacWrite; (parts of) Vol. I+II in MS-Word; Vol. III in MS-Word; Vol. III in FileM Pro; edited data in XML; enriched data in XML.
Output: Online DB WBD (Polderland); SGV on CD (Polderland); website WBD/WLD with tools for searching and cartography; specialized print editions (dialect atlas or local dictionary).]
Overview phases D-square
1. Conversion to a new format
2. End user access to data
3. Enrichment of data
4. Data management
Reasoning behind new encoding
• XML, not relational database
• Tailored to WBD and WLD
• Flexible enough to be used for other dialect dictionaries
• Based on a standard: LMF (Lexical Markup Framework, ISO TC 37/SC 4)
Example XML-encoding

<LEXICON dialect="Brabants">
  <ENTRY>
    <META>
      …
    </META>
    <CONCEPT lang="dutch" ontol_id="492">Meikever</CONCEPT>
    <DATA>
      <VARIANT type="heteronym">Bakkertje
        <VARIANT type="lexical">bakkerke
          <VARIANT type="raw" import="diplomatic">bakkərkə
            <LOCATION source1="N83">K 178</LOCATION>
          </VARIANT>
        </VARIANT>
      </VARIANT>
    </DATA>
  </ENTRY>
  …
</LEXICON>
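
To make the nesting in the example concrete, here is a minimal Python sketch that reads such an entry and flattens it into one record per attestation. It uses only the standard-library `xml.etree.ElementTree`; the helper names `own_text` and `flatten_entry` are hypothetical, the elided `<META>` content is left empty, and the three-level heteronym / lexical / raw nesting is assumed to hold for every entry, which the slides do not confirm.

```python
import xml.etree.ElementTree as ET

# Sample entry modelled on the slide; the elided <META> content is left empty.
SAMPLE = """
<LEXICON dialect="Brabants">
  <ENTRY>
    <META/>
    <CONCEPT lang="dutch" ontol_id="492">Meikever</CONCEPT>
    <DATA>
      <VARIANT type="heteronym">Bakkertje
        <VARIANT type="lexical">bakkerke
          <VARIANT type="raw" import="diplomatic">bakkərkə
            <LOCATION source1="N83">K 178</LOCATION>
          </VARIANT>
        </VARIANT>
      </VARIANT>
    </DATA>
  </ENTRY>
</LEXICON>
"""


def own_text(element):
    """Return the text directly inside an element, before any child element."""
    return (element.text or "").strip()


def flatten_entry(entry):
    """Yield one dict per LOCATION, carrying the enclosing variant levels."""
    concept = own_text(entry.find("CONCEPT"))
    # Assumed nesting: heteronym > lexical > raw > LOCATION.
    for heteronym in entry.findall("DATA/VARIANT"):
        for lexical in heteronym.findall("VARIANT"):
            for raw in lexical.findall("VARIANT"):
                for location in raw.findall("LOCATION"):
                    yield {
                        "concept": concept,
                        "heteronym": own_text(heteronym),
                        "lexical": own_text(lexical),
                        "raw": own_text(raw),
                        "source": location.get("source1"),
                        "location": own_text(location),
                    }


root = ET.fromstring(SAMPLE)
rows = [row for entry in root.iter("ENTRY") for row in flatten_entry(entry)]
for row in rows:
    print(row)
```

A flat record like this is one plausible starting point for search or mapping, since each row pairs a form at every level with its source and location code.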
Small scale survey
- Tools: Search engine, Cartographic tool, Format conversions.
- Enrichment: POS, morphemes (syllables)
- Links to other resources: Other dictionaries, questionnaires, FAND, MAND.
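
As an illustration of what a cartographic tool and a format conversion might need, the following Python sketch groups attestations by lexical variant and writes them out as CSV. It is not the project's tool: the function names and the CSV column names are assumptions, and apart from `bakkerke` at `K 178`, the attestations are placeholders invented for the example.

```python
import csv
import io
from collections import defaultdict

# (lexical variant, location code) pairs.
ATTESTATIONS = [
    ("bakkerke", "K 178"),   # from the example entry in the slides
    ("bakkerke", "X 001"),   # placeholder, not real data
    ("variant_b", "X 002"),  # placeholder, not real data
    ("bakkerke", "K 178"),   # duplicate, e.g. the same form from a second source
]


def locations_by_variant(attestations):
    """Map each lexical variant to its sorted, de-duplicated location codes."""
    grouped = defaultdict(set)
    for variant, location in attestations:
        grouped[variant].add(location)
    return {variant: sorted(codes) for variant, codes in grouped.items()}


def to_csv(grouped):
    """Serialize the grouping as CSV, one row per (variant, location) pair."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["lexical_variant", "location_code"])
    for variant in sorted(grouped):
        for location in grouped[variant]:
            writer.writerow([variant, location])
    return buffer.getvalue()


grouped = locations_by_variant(ATTESTATIONS)
print(to_csv(grouped))
```

Plotting would additionally require coordinates for each location code, which depends on the base maps available.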
Difficulties to overcome
• Search engine
  - Getting from question to query (coaching needed). Is SmartMatch (fuzzy matching) helpful in this regard?
  - Speed of XML searching
• Cartography
  - Availability of base maps
• Links to other resources
  - Differences in interpretation
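
The question about fuzzy matching for the search engine can be illustrated with a small Python sketch. It does not use SmartMatch; it substitutes the standard-library `difflib.get_close_matches` as a stand-in, and the function name `suggest`, the cutoff of 0.6, and the headword list (taken from the example entry) are assumptions made for illustration.

```python
import difflib

# Headwords taken from the example entry in the slides.
HEADWORDS = ["Bakkertje", "bakkerke", "Meikever"]


def suggest(query, headwords, limit=3, cutoff=0.6):
    """Return up to `limit` headwords that resemble the query, best match first."""
    # Compare case-insensitively, but return the headword as originally spelled.
    lowered = {}
    for word in headwords:
        lowered.setdefault(word.lower(), word)
    hits = difflib.get_close_matches(
        query.lower(), list(lowered), n=limit, cutoff=cutoff
    )
    return [lowered[hit] for hit in hits]


# A user who misspells the dialect form still gets candidate headwords.
print(suggest("bakerke", HEADWORDS))
```

Character-based similarity of this kind helps with spelling variation, but it does not address the harder step of getting from a user's question to a well-formed query.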