Top Banner
Dr. Kristin Bakken, NO 2014 Oddrun Grønvik, NO 2014 Dr. Daniel Ridings, DOK Sept. 7th 2004
34

Dr. Kristin Bakken, NO 2014 Oddrun Grønvik, NO 2014 Dr. Daniel Ridings, DOK Sept. 7th 2004.

Dec 31, 2015

Download

Documents

Welcome message from author
This document is posted to help you gain knowledge. Please leave a comment to let me know what you think about it! Share it to your friends and learn new things together.
Transcript
Page 1: Dr. Kristin Bakken, NO 2014 Oddrun Grønvik, NO 2014 Dr. Daniel Ridings, DOK Sept. 7th 2004.

Dr. Kristin Bakken, NO 2014Oddrun Grønvik, NO 2014Dr. Daniel Ridings, DOK

Sept. 7th 2004

Page 2: Dr. Kristin Bakken, NO 2014 Oddrun Grønvik, NO 2014 Dr. Daniel Ridings, DOK Sept. 7th 2004.

An old dictionary with new tools Norsk Ordbok (Norwegian Dictionary)

An old national dictionary with ambitious scope Initiated in 1930 A dictionary archive of some 3,2 mill. slips A combined literary and dialect dictionary Complex entry structure

Literary and dialect quotations with references Formal variants with geographical references Information on pronunciation, etymology, inflection and

historical standardization

Page 3: Dr. Kristin Bakken, NO 2014 Oddrun Grønvik, NO 2014 Dr. Daniel Ridings, DOK Sept. 7th 2004.

New project organization in 2002 Fresh fundings directly from the government on

top of The University of Oslo share in the project Political acceptance of the national value of the dictionary

Conditions To complete the dictionary in 12 volumes by 2014 To develop and exploit computer-based tools in the

process Strengthened management

Page 4: Dr. Kristin Bakken, NO 2014 Oddrun Grønvik, NO 2014 Dr. Daniel Ridings, DOK Sept. 7th 2004.

Our starting point 4 volumes already published – demands of continuity

vs. need for reform The University of Oslo wanted a general format for all

their digitalized academic archives Unit for digital documentation (DOK) was in charge of all

digitalization Our slip archive and other sources had been digitalized

and turned into a database by DOK during the 1990s An index (the Meta-dictionary) had been made to

access the database.

Page 5: Dr. Kristin Bakken, NO 2014 Oddrun Grønvik, NO 2014 Dr. Daniel Ridings, DOK Sept. 7th 2004.

ChallengesTo make the dictionary writing more time-efficient

- the analysis and synthesis of data- the writing of dictionary entries

To ease the process of training many new editors in a short timeA simpler and faster production phaseTo improve the quality of the dictionary

Page 6: Dr. Kristin Bakken, NO 2014 Oddrun Grønvik, NO 2014 Dr. Daniel Ridings, DOK Sept. 7th 2004.

Therefore: To build an integrated digital platform on which to

edit Norsk Ordbok To make a dictionary writing system on top of the

Meta-dictionary database, so that the sources are directly accessible through the DWS

To link a corpus to the DWS and to the Meta-dictionary

A database and work-station solution

Page 7: Dr. Kristin Bakken, NO 2014 Oddrun Grønvik, NO 2014 Dr. Daniel Ridings, DOK Sept. 7th 2004.

Oddrun Grønvik: Resources and editor

NO 2014 gives specifications and coope-rates closely in application development with DOK (Unit for Digital Documentation, HF, UiO). Partners at DOK are:

Meta Dictionary: Dr. Christian-Emil Ore Dictionary database and editor: Lars

Jørgen Tvedt Corpus: Dr. Daniel Ridings

Page 8: Dr. Kristin Bakken, NO 2014 Oddrun Grønvik, NO 2014 Dr. Daniel Ridings, DOK Sept. 7th 2004.

Organisation

Editor and database for Norsk Ordbok

Meta Dictionary(Language Archives)

NO 2014 in ”show print” or proof sheet

NO 2014 Corpus

Slip archiveRaw manuscript (1940)Other dictionaries(standard, special,dialect) larger texts

new materials

Page 9: Dr. Kristin Bakken, NO 2014 Oddrun Grønvik, NO 2014 Dr. Daniel Ridings, DOK Sept. 7th 2004.

Meta Dictionary

Page 10: Dr. Kristin Bakken, NO 2014 Oddrun Grønvik, NO 2014 Dr. Daniel Ridings, DOK Sept. 7th 2004.

MO normalisation window

Page 11: Dr. Kristin Bakken, NO 2014 Oddrun Grønvik, NO 2014 Dr. Daniel Ridings, DOK Sept. 7th 2004.

Article in Meta Dictionary with facsimile

Page 12: Dr. Kristin Bakken, NO 2014 Oddrun Grønvik, NO 2014 Dr. Daniel Ridings, DOK Sept. 7th 2004.

Special features of Norsk Ordbok – editor requirements

small group very large and complex entries (function words, central vocabulary)

extensive and complex indication of sources extensive and complex linking system in database

Result:

maximum format for Editor and Style manual way beyond needs for most of the entries

Page 13: Dr. Kristin Bakken, NO 2014 Oddrun Grønvik, NO 2014 Dr. Daniel Ridings, DOK Sept. 7th 2004.

The NO 2014 editor Basic requirements Establish ”best practice” from vol 1-4 in new style manual Speed and capacity for large number simultaneous

operations Must show entry structure (”tree”), entry forms, print version Clear organisation of tree and entry forms one to one correspondence between tree and entry forms

(speed, equivalence) and to style manual finely graded tree (icons for all types of elements i.e. various

source types)

Page 14: Dr. Kristin Bakken, NO 2014 Oddrun Grønvik, NO 2014 Dr. Daniel Ridings, DOK Sept. 7th 2004.

Menues and databases accessible from editorHeadword (from the Meta Dictionary)part of speech, status, morphologySpecial symbols and characters (not IPA)linguistic sources from before 1900Bibliography (source list)Geographical location and hierarchyLanguages (etymology form)Other entries (cross references)Usage markers

Page 15: Dr. Kristin Bakken, NO 2014 Oddrun Grønvik, NO 2014 Dr. Daniel Ridings, DOK Sept. 7th 2004.

Entry form headword and grammar

Page 16: Dr. Kristin Bakken, NO 2014 Oddrun Grønvik, NO 2014 Dr. Daniel Ridings, DOK Sept. 7th 2004.

Entry form variant information

Page 17: Dr. Kristin Bakken, NO 2014 Oddrun Grønvik, NO 2014 Dr. Daniel Ridings, DOK Sept. 7th 2004.

Entry form for ”unit of meaning”

Page 18: Dr. Kristin Bakken, NO 2014 Oddrun Grønvik, NO 2014 Dr. Daniel Ridings, DOK Sept. 7th 2004.

NO 2014 editor search window

Page 19: Dr. Kristin Bakken, NO 2014 Oddrun Grønvik, NO 2014 Dr. Daniel Ridings, DOK Sept. 7th 2004.

show entry

Page 20: Dr. Kristin Bakken, NO 2014 Oddrun Grønvik, NO 2014 Dr. Daniel Ridings, DOK Sept. 7th 2004.

Entries in proof sheet style

Page 21: Dr. Kristin Bakken, NO 2014 Oddrun Grønvik, NO 2014 Dr. Daniel Ridings, DOK Sept. 7th 2004.

Daniel Ridings: Corpus Designed to meet the needs of the

dictionary project Supplement existing paper and electronic slips Concentrate on modern language usage

Works as an independent application or as a module within the Meta-dictionary

Page 22: Dr. Kristin Bakken, NO 2014 Oddrun Grønvik, NO 2014 Dr. Daniel Ridings, DOK Sept. 7th 2004.

Technical details SGML / TEI

Limited subset of importance for dictionary work Every element, from the top document element

to the lowest word element has its own ID 25634 documents, 19143247 words (Sept. 1)

Database access (Oracle) WWW PC application

Page 23: Dr. Kristin Bakken, NO 2014 Oddrun Grønvik, NO 2014 Dr. Daniel Ridings, DOK Sept. 7th 2004.
Page 24: Dr. Kristin Bakken, NO 2014 Oddrun Grønvik, NO 2014 Dr. Daniel Ridings, DOK Sept. 7th 2004.
Page 25: Dr. Kristin Bakken, NO 2014 Oddrun Grønvik, NO 2014 Dr. Daniel Ridings, DOK Sept. 7th 2004.
Page 26: Dr. Kristin Bakken, NO 2014 Oddrun Grønvik, NO 2014 Dr. Daniel Ridings, DOK Sept. 7th 2004.
Page 27: Dr. Kristin Bakken, NO 2014 Oddrun Grønvik, NO 2014 Dr. Daniel Ridings, DOK Sept. 7th 2004.
Page 28: Dr. Kristin Bakken, NO 2014 Oddrun Grønvik, NO 2014 Dr. Daniel Ridings, DOK Sept. 7th 2004.
Page 29: Dr. Kristin Bakken, NO 2014 Oddrun Grønvik, NO 2014 Dr. Daniel Ridings, DOK Sept. 7th 2004.

Challenges met: Time efficiency Faster analysis: indexed computerized sources, the corpus The writing of dictionary entries: preset menus, structured

guide through the entry (… but maximum forms for simple entries)

Training advantages: preset menus, structure given Production phase: The database gives all information needed

to produce proof sheets Quality: The DWS secures higher consistency, the corpus

has enriched our empirical basis, no typing errors in everything that is preset, i.e. punctuation, abbreviations and typography

Page 30: Dr. Kristin Bakken, NO 2014 Oddrun Grønvik, NO 2014 Dr. Daniel Ridings, DOK Sept. 7th 2004.

Standard questions Specific needs?

small group very large and complex entries (function words, central vocabulary)

extensive and complex indication of sources extensive and complex linking system in

database How does the current DWS meet the

needs? Too early to tell but it is looking good.

Page 31: Dr. Kristin Bakken, NO 2014 Oddrun Grønvik, NO 2014 Dr. Daniel Ridings, DOK Sept. 7th 2004.

Standard questions 2 Corpus

Developed specifically for the project WWW access and integration with the Meta-

Dictionary and our DWS

What kind of software do you use? PC-based software developed locally

Page 32: Dr. Kristin Bakken, NO 2014 Oddrun Grønvik, NO 2014 Dr. Daniel Ridings, DOK Sept. 7th 2004.

Standard questions 3 Management tools

Not yet developed, but specifications have been given

Is off-line work possible Laptops at home is possible but anyone wishing

to do so also has internet at home, so they are not off-line. All software will work from home.

Limitation is network speed, which in Norway is not really an issue.

Page 33: Dr. Kristin Bakken, NO 2014 Oddrun Grønvik, NO 2014 Dr. Daniel Ridings, DOK Sept. 7th 2004.

Standard questions 4 Granularity of work

It is possible to divide up the categories (definitions, pronunciation, etc) among various lexicographers. Oracle is a multiuser database.

NLP context The dictionary has not been used in NLP projects.

Morphology and syntax are not usually expressed with the degree of formality that NLP requires. We are involved with a project for machine translation and can compare the needs.

Page 34: Dr. Kristin Bakken, NO 2014 Oddrun Grønvik, NO 2014 Dr. Daniel Ridings, DOK Sept. 7th 2004.

http://no2014.uio.no