Page 1
Romanian Online Dialect Atlas
© 2003 Embleton, Uritescu, Wheeler
1
Romanian Online Dialect Atlas
An exploration into the management of high volumes of complex knowledge in the social sciences and humanities.
Sheila M. Embleton
Dorin Uritescu
Eric S. Wheeler
Romanian Online Dialect Atlas
Sheila M. Embleton
  Department of Languages, Literatures and Linguistics, York University
Dorin Uritescu
  Co-editor of source atlas: Noul Atlas lingvistic român. Crisana.
  Department of French, Glendon College, York University
Eric S. Wheeler
  ITEC program, York University
  Managing partner, Wheeler and Young Inc.
Romanian Online Dialect Atlas
Supported (2003-2006) by a grant from:
Social Sciences and Humanities Research Council (Canada)
Agenda
The problem of high-volume, complex data in social sciences and humanities.
Predecessor projects: English, Finnish dialect data
Use of Multidimensional Scaling (MDS) to consolidate data
Interactive, media-rich presentation
Problem
In social sciences/humanities, data is often characterized by:
• high volume
• multiple variables or dimensions
• no a priori model
Dialectology provides a good exemplar
Dialectology
• Explain the variations in linguistic usage across geography
• Simple example: “church” vs. “kirk” (< OE cirice)
• More realistic problem:
    169 features in 313 locations (SED)
    213 features in 400+ locations (Finnish)
Dialect atlases
• Record the details in maps
• Many maps needed to make an atlas
• Recovery of individual facts is possible but...
• Global understanding of the situation is lost in the volume of details
English
Survey of English Dialects (SED)
• 169 features at 313 locations
Computer Developed Linguistic Atlas of England
• Applied MDS to already computerized data
English: results
• 2-D map of dialect locations
• No geographic information used
• Close correspondence to geography (as expected)
• Highlighted further problems of handling and understanding high volumes of data
English Dialect Map
• Northern counties at top
• Mid and southern counties below
• Somerset, Devon (South-west) are out of place (in East)
• Star-bursts, colours, dotted lines all help interpret map data
Finnish
Kettunen (1940) The Dialect Atlas of Finland
• 213 maps x 530 locations
• Up to 16 features per map
• Typically 1-3 features per location
• ~120,000 data items
Project: data computerization (largely done)
Stage II: application of MDS (not yet done)
Map 1 (parts)
• Special software to facilitate accurate data entry
Ambiguity
?
Resolution
• Make editorial decision: “X, not Y”
• Mark as “AMBIGUOUS”: “X or Y”
• Get more input: “X (says expert)”
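The three resolution options can be recorded alongside each data item, so that later analysis knows how a reading was settled. What follows is only a minimal sketch in Python: the `Reading` record and `Resolution` enum are hypothetical names chosen for illustration, not part of the project's data-entry software.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class Resolution(Enum):
    """How an unclear map reading was settled (hypothetical labels)."""
    EDITORIAL = "editorial decision"
    AMBIGUOUS = "ambiguous"
    EXPERT = "expert input"


@dataclass(frozen=True)
class Reading:
    """One data item: the candidate values and how the choice was made."""
    candidates: Tuple[str, ...]
    resolution: Resolution
    chosen: Optional[str] = None  # stays None when marked AMBIGUOUS

    def describe(self) -> str:
        # Ambiguous items keep every candidate instead of forcing a choice.
        if self.resolution is Resolution.AMBIGUOUS:
            return " or ".join(self.candidates)
        if self.chosen is None:
            raise ValueError("a resolved reading needs a chosen value")
        if self.resolution is Resolution.EXPERT:
            return f"{self.chosen} (says expert)"
        rejected = [c for c in self.candidates if c != self.chosen]
        return f"{self.chosen}, not {', '.join(rejected)}"


editorial = Reading(("X", "Y"), Resolution.EDITORIAL, chosen="X")
ambiguous = Reading(("X", "Y"), Resolution.AMBIGUOUS)
expert = Reading(("X", "Y"), Resolution.EXPERT, chosen="X")

for reading in (editorial, ambiguous, expert):
    print(reading.describe())
```

Keeping the rejected candidates in the record, rather than storing only the chosen value, means an editorial decision can be revisited later without going back to the printed map.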
Lesson
In transforming data from one medium to another, even well-structured data will have unexpected pitfalls:
• Design data transformation carefully
• Prototype your system; find the problems early
• Plan to work iteratively
Romanian Online Dialect Atlas: Crisana
Apply innovative contemporary methods in dialect geography to an online set of Romanian dialect data.
Romanian language
• Key to understanding the evolution of all Romance languages
• Early branch, distinct from French-Spanish-Italian line
• Exemplar of non-hierarchical dialect variation and linguistic continua
• Transition areas contain mixtures of dialect features and specific features
RODA: Part 1
Create online version of The New Romanian Linguistic Atlas. Crisana (Stan & Uritescu 1996)
• Available on internet and CD
• Default interpretations
• Interactive interface to data
    custom select data for a map
• Add audio clips to illustrate data
RODA Prototype 1
RODA: Part 2
Allow plug-in applications and other analyses of data, e.g.
Apply Multidimensional Scaling to dialect data
• Statistical technique
• Consolidate large amounts of data
• Complement to traditional analyses of small amounts of data
Multidimensional Scaling
Multidimensional Scaling
• Statistical technique (Torgerson 1952)
• Used in sociology, psychology, marketing
• Reveals the scales along which data varies; gives a data-space
• Uses distances [(dis)similarities] among responses of subjects
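One simple way to obtain such distances from dialect data is to count the features on which two locations differ. The sketch below assumes that measure (a Hamming distance). The slides do not say which measure the project actually uses, and the location names and responses are invented for illustration.

```python
def feature_distance(a, b):
    """Number of features on which two locations give different responses."""
    if len(a) != len(b):
        raise ValueError("both locations must cover the same features")
    return sum(1 for x, y in zip(a, b) if x != y)


def distance_matrix(responses):
    """Return location names and the full symmetric matrix of distances."""
    names = list(responses)
    matrix = [
        [feature_distance(responses[p], responses[q]) for q in names]
        for p in names
    ]
    return names, matrix


# Hypothetical responses: one list per location, one entry per feature.
responses = {
    "North":    ["kirk",   "bairn", "aye"],
    "Midlands": ["church", "child", "aye"],
    "South":    ["church", "child", "yes"],
}

names, matrix = distance_matrix(responses)
for name, row in zip(names, matrix):
    print(f"{name:<10}", row)
```

With real atlas data each list would hold one response per feature (169 for the SED data), and the resulting matrix is the input that MDS works from.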
MDS
      A    B    C    D    E
A     0
B    25    0
C    10   11    0
D    12   17   11    0
E    24   10   15   26    0

Axioms of metric
• d(X,X) = 0
• d(X,Y) = d(Y,X)
• d(X,Y) > 0 if X ≠ Y
• d(X,Y) ≤ d(X,C) + d(C,Y) for all points C
Matrix reflects these rules
MDS
n+1 points generate an n-dimensional space
MDS can reduce that high-dimensional space to 2 (or 3) dimensions
Result: complex data can be viewed as a “map”
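The reduction step can be sketched with classical (Torgerson-style) MDS, which double-centers the squared distances and keeps the leading eigenvectors. This is an illustrative sketch assuming NumPy is available. It is not necessarily the procedure used in the projects described here, and the five points are hypothetical.

```python
import numpy as np


def classical_mds(dist, k=2):
    """Embed points in k dimensions from a symmetric distance matrix."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-centering turns squared distances into inner products.
    b = -0.5 * centering @ (d ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(b)         # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]        # indices of the k largest
    top = np.clip(eigvals[order], 0.0, None)     # guard against tiny negatives
    return eigvecs[:, order] * np.sqrt(top)


def pairwise_distances(points):
    """Euclidean distance between every pair of rows."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


# Hypothetical "locations" that really do lie on a plane.
true_points = np.array(
    [[0.0, 0.0], [3.0, 0.0], [3.0, 4.0], [0.0, 4.0], [1.0, 2.0]]
)
dist = pairwise_distances(true_points)

coords = classical_mds(dist, k=2)
print(coords.shape)
# The recovered map reproduces the input distances, although it may be
# rotated or reflected relative to the original coordinates.
print(np.allclose(pairwise_distances(coords), dist))
```

Because only distances go in, the orientation of the resulting map is arbitrary. Any resemblance to geography has to come from the data itself.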
MDS
• Can use MDS to consolidate data
• English: 312 dimensions reduced to 2
• All 169 features included (and taken in relevant subsets)
• Finnish, Romanian provide large data sets that can be treated the same way
Interactive, media-rich presentation
Objectives
• Make data accessible, useful to a wide research audience
Methods
• Interactive selection of data
• Constructive presentation of data
• Addition of audio and other media
Online is much more than a book!
Framework and Applications
Online atlas provides a framework for accessing and presenting data
Other applications can work within the framework to transform or process the data, such as:
• MDS data consolidation
• Tools to analyze dialect variants of phonemes (proposed)
• Others
Summary
Humanities and Social Sciences deal with large, complex data sets
Explore methods to access, process, present this kind of data
Solutions include:
• MDS-type processing
• Online, interactive, rich presentation
Example: Romanian Online Dialect Atlas
References
Embleton, Sheila M. and Eric S. Wheeler (2000). Computerized Dialect Atlas of Finnish: Dealing with Ambiguity. J. of Quantitative Linguistics 7.3, pp. 227-231.
Embleton, Sheila M. and Eric S. Wheeler (1997a). Multidimensional Scaling and the SED Data. In: Wolfgang Viereck and Heinrich Ramisch, The Computer Developed Linguistic Atlas of England 2. Tuebingen: Max Niemeyer Verlag.
Embleton, Sheila M. and Eric S. Wheeler (1997b). Finnish Dialect Atlas for Quantitative Studies. J. of Quantitative Linguistics 4.1-3, pp. 99-102.
Schiffman, Susan S., M. Lance Reynolds, Forrest W. Young (1981). Introduction to Multidimensional Scaling. Theory, Methods, and Applications. New York: Academic Press. 411pp.
Torgerson, W. S. (1952). Multidimensional scaling: 1. theory and method. Psychometrika 17, pp. 401-419.
Stan, Ionel & Uritescu, Dorin (1996). Noul Atlas lingvistic român. Crisana. Vol. I. Bucharest: Romanian Academy Press. (2003. Vol. II. Bucharest: Romanian Academy Press)
Uritescu, Dorin (1983). “Asupra repartiţiei dialectale a graiurilor dacoromâne. Graiul din Oaş” / “On the Dialect Structure of Daco-Romanian. The Dialect of Oaş”. In: Materiale si cercetari dialectale II. Cluj-Napoca: The University of Cluj-Napoca, pp. 231-246.
Uritescu, Dorin (1984a). “Subdialectul crisean.” In: V. Rusu (ed.), Tratat de dialectologie româneasca. Craiova: Scrisul românesc, pp. 284-320, 916-930.
Uritescu, Dorin (1984b). “Graiul din Tara Oasului.” In: V. Rusu (ed.), Tratat de dialectologie româneasca. Craiova: Scrisul românesc, pp. 390-399, 964-967.
Wheeler, Eric S. (2002). Zipf's Law and Why It Works Everywhere. Glottometrica 4, pp. 45-48.
Wheeler, Eric S. (2003). Multidimensional Scaling to Visualize Text Separation. Glottometrica 6, forthcoming.
Wheeler, Eric S. (nd). Multidimensional scaling. Chapter in: Reinhard Koehler (ed.), Handbook in Quantitative Linguistics, forthcoming.