Romanian Online Dialect Atlas: An exploration into the management of high volumes of complex knowledge in the social sciences and humanities. Sheila M. Embleton, Dorin Uritescu, Eric S. Wheeler. © 2003 Embleton, Uritescu, Wheeler.

Mar 31, 2015


Aidan Lockett
Transcript
Page 1

Romanian Online Dialect Atlas

An exploration into the management of high volumes of complex knowledge in the social sciences and humanities.

Sheila M. Embleton
Dorin Uritescu
Eric S. Wheeler

© 2003 Embleton, Uritescu, Wheeler

Page 2

Romanian Online Dialect Atlas

Sheila M. Embleton
• Department of Languages, Literatures and Linguistics, York University

Dorin Uritescu
• Co-editor of the source atlas: Noul Atlas lingvistic român. Crisana.
• Department of French, Glendon College, York University

Eric S. Wheeler
• ITEC program, York University
• Managing partner, Wheeler and Young Inc.

Page 3

Romanian Online Dialect Atlas

Supported (2003-2006) by a grant from:

Social Sciences and Humanities Research Council (Canada)

Page 4

Agenda

• The problem of high-volume, complex data in the social sciences and humanities
• Predecessor projects: English and Finnish dialect data
• Use of Multidimensional Scaling (MDS) to consolidate data
• Interactive, media-rich presentation

Page 5

Problem

In the social sciences and humanities, data is often characterized by:
• high volume
• multiple variables or dimensions
• no a priori model

Dialectology provides a good exemplar.

Page 6

Dialectology

• Explain the variations in linguistic usage across geography
• Simple example: “church” vs. “kirk” (< OE cirice)
• More realistic problem:
  • 169 features in 313 locations (SED)
  • 213 features in 400+ locations (Finnish)

Page 7

Dialect atlases

• Record the details in maps
• Many maps needed to make an atlas
• Recovery of individual facts is possible, but...
• Global understanding of the situation is lost in the volume of details

Page 8

English

• Survey of English Dialects (SED): 169 features at 313 locations
• Computer Developed Linguistic Atlas of England
• Applied MDS to already computerized data

Page 9

English: results

• 2-D map of dialect locations
• No geographic information used
• Close correspondence to geography (as expected)
• Highlighted further problems of handling and understanding high volumes of data

Page 10

English Dialect Map

• Northern counties at top
• Mid and southern counties below
• Somerset, Devon (South-west) are out of place (in East)
• Star-bursts, colours, dotted lines all help interpret map data

Page 11

Finnish

Page 12

Kettunen (1940), The Dialect Atlas of Finland
• 213 maps x 530 locations
• Up to 16 features per map
• Typically 1-3 features per location
• ~120,000 data items

Project: data computerization (largely done)

Stage II: application of MDS (not yet done)

Page 13

Map 1 (parts)

Page 14

• Special software to facilitate accurate data entry

Page 15

Ambiguity

?

Page 16

Resolution

• Make editorial decision: “X, not Y”
• Mark as “AMBIGUOUS”: “X or Y”
• Get more input: “X (says expert)”
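One way to picture the three resolution options is as a small record that keeps every candidate reading together with how the ambiguity was settled. This is only a sketch: the names `Resolution` and `Reading` are hypothetical and are not taken from the project's data-entry software.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class Resolution(Enum):
    """How an ambiguous atlas entry was settled (hypothetical categories)."""
    EDITORIAL = "editorial decision"
    AMBIGUOUS = "marked ambiguous"
    EXPERT = "expert input"


@dataclass(frozen=True)
class Reading:
    """Candidate readings of one data item; the first candidate is the preferred one."""
    candidates: Tuple[str, ...]
    resolution: Resolution

    def label(self) -> str:
        preferred = self.candidates[0]
        if self.resolution is Resolution.AMBIGUOUS:
            # Keep every candidate visible instead of choosing one.
            return " or ".join(self.candidates)
        if self.resolution is Resolution.EXPERT:
            return f"{preferred} (says expert)"
        # Editorial decision: state the choice and what was rejected.
        rejected = ", ".join(self.candidates[1:])
        return f"{preferred}, not {rejected}"


print(Reading(("X", "Y"), Resolution.EDITORIAL).label())
print(Reading(("X", "Y"), Resolution.AMBIGUOUS).label())
print(Reading(("X", "Y"), Resolution.EXPERT).label())
```

Keeping the rejected candidates in the record means an editorial decision can be revisited later without going back to the printed map.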

Page 17

Lesson

In transforming data from one medium to another, even well-structured data will have unexpected pitfalls:
• Design the data transformation carefully
• Prototype your system; find the problems early
• Plan to work iteratively

Page 18

Romanian Online Dialect Atlas: Crisana

Apply innovative contemporary methods in dialect geography to an online set of Romanian dialect data.

Page 19

Romanian language

• Key to understanding the evolution of all Romance languages
• Early branch, distinct from the French-Spanish-Italian line
• Exemplar of non-hierarchical dialect variation and linguistic continua
• Transition areas contain mixtures of dialect features and specific features

Page 20

RODA: Part 1

Create online version of The New Romanian Linguistic Atlas. Crisana (Stan & Uritescu 1996)
• Available on internet and CD
• Default interpretations
• Interactive interface to data: custom select data for a map
• Add audio clips to illustrate data

Page 21

RODA Prototype 1

Page 22

RODA: Part 2

Allow plug-in applications and other analyses of data, e.g. apply Multidimensional Scaling to dialect data:
• Statistical technique
• Consolidate large amounts of data
• Complement to traditional analyses of small amounts of data

Page 23

Multidimensional Scaling

Page 24

Multidimensional Scaling

• Statistical technique (Torgerson 1952)
• Used in sociology, psychology, marketing
• Reveals the scales along which data varies; gives a data-space
• Uses distances [(dis)similarities] among responses of subjects
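To make "distances among responses" concrete, the sketch below turns per-location responses into a dissimilarity matrix by counting the features on which two locations differ (a Hamming distance). The slides do not say which dissimilarity measure the authors used, so the measure is an assumption, and the location names and the second and third features are hypothetical; only the church/kirk contrast comes from the talk.

```python
# Hypothetical responses: one recorded variant per feature at each location.
FEATURES = ["church/kirk", "feature 2", "feature 3"]
RESPONSES = {
    "Loc1": ["kirk", "a", "x"],
    "Loc2": ["church", "b", "x"],
    "Loc3": ["church", "b", "y"],
}


def hamming(a, b):
    """Number of features on which two locations record different variants."""
    if len(a) != len(b):
        raise ValueError("locations must be compared on the same features")
    return sum(1 for x, y in zip(a, b) if x != y)


def distance_matrix(responses):
    """Return the location names and the full symmetric dissimilarity matrix."""
    names = list(responses)
    matrix = [[hamming(responses[p], responses[q]) for q in names] for p in names]
    return names, matrix


NAMES, MATRIX = distance_matrix(RESPONSES)
for name, row in zip(NAMES, MATRIX):
    print(name, row)
```

A matrix of this kind, with one row and one column per location, is the input that MDS works from.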

Page 25

MDS

[Figure: distance diagram; only scattered numeric labels survive in the transcript]

     A    B    C    D    E
A    0
B   25    0
C   10   11    0
D   12   17   11    0
E   24   10   15   26    0

Axioms of a metric:
• d(X,X) = 0
• d(X,Y) = d(Y,X)
• d(X,Y) > 0 if X ≠ Y
• d(X,Y) ≤ d(X,C) + d(C,Y) for all points C

Matrix reflects these rules
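The four axioms can be checked mechanically. The sketch below does so for the lower-triangular matrix on this slide, with the values copied as they appear in this transcript; the function names are hypothetical. As transcribed, the matrix satisfies the first three axioms, but one triple fails the triangle inequality: d(A,B) = 25 exceeds d(A,C) + d(C,B) = 10 + 11 = 21. That may reflect a transcription slip or a deliberately loose illustration, and the original slide is not available here to settle it.

```python
from itertools import combinations

LABELS = ["A", "B", "C", "D", "E"]
# Lower triangle as it appears in the transcript of the slide.
LOWER = [
    [0],
    [25, 0],
    [10, 11, 0],
    [12, 17, 11, 0],
    [24, 10, 15, 26, 0],
]


def full_matrix(lower):
    """Mirror a lower-triangular listing into a full square matrix."""
    n = len(lower)
    d = [[0] * n for _ in range(n)]
    for i, row in enumerate(lower):
        for j, value in enumerate(row):
            d[i][j] = value
            d[j][i] = value
    return d


def basic_axioms(d):
    """Check d(X,X) = 0, symmetry, and d(X,Y) > 0 for X != Y."""
    n = len(d)
    zero_diagonal = all(d[i][i] == 0 for i in range(n))
    # Symmetry holds by construction here; it is checked for completeness.
    symmetric = all(d[i][j] == d[j][i] for i in range(n) for j in range(n))
    positive = all(d[i][j] > 0 for i in range(n) for j in range(n) if i != j)
    return zero_diagonal, symmetric, positive


def triangle_violations(d, labels):
    """List (X, Y, C) triples where d(X,Y) > d(X,C) + d(C,Y)."""
    found = []
    for i, j in combinations(range(len(d)), 2):
        for k in range(len(d)):
            if k not in (i, j) and d[i][j] > d[i][k] + d[k][j]:
                found.append((labels[i], labels[j], labels[k]))
    return found


D = full_matrix(LOWER)
print(basic_axioms(D))
print(triangle_violations(D, LABELS))
```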

Page 26

MDS

• n+1 points generate an n-dimensional space
• MDS can reduce that high-dimensional space to 2 (or 3) dimensions
• Result: complex data can be viewed as a “map”
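The reduction to a 2-dimensional "map" can be sketched with classical (Torgerson) MDS: square the distances, double-centre them, and use the two leading eigenvectors as coordinates. The slides cite Torgerson (1952) but do not say which MDS variant or software the authors ran, so this is an illustration under that assumption, not their implementation. It uses plain Python with a simple power iteration, and the five test points are hypothetical.

```python
import math


def double_center(dist):
    """B = -1/2 * J * D^2 * J for a symmetric distance matrix."""
    n = len(dist)
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    # dist is symmetric, so column means equal row means.
    return [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
             for j in range(n)] for i in range(n)]


def mat_vec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def top_eigenpairs(matrix, k, iterations=500):
    """Power iteration with deflation; assumes the leading eigenvalues are positive."""
    n = len(matrix)
    m = [row[:] for row in matrix]
    pairs = []
    for c in range(k):
        v = [math.sin(i + 1 + c) for i in range(n)]  # fixed, non-trivial start vector
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iterations):
            w = mat_vec(m, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        eigenvalue = sum(a * b for a, b in zip(v, mat_vec(m, v)))
        pairs.append((eigenvalue, v))
        # Deflate: remove the component just found before looking for the next one.
        for i in range(n):
            for j in range(n):
                m[i][j] -= eigenvalue * v[i] * v[j]
    return pairs


def classical_mds(dist, dims=2):
    """Return one coordinate list of length dims per point."""
    coords = [[0.0] * dims for _ in dist]
    pairs = top_eigenpairs(double_center(dist), dims)
    for axis, (eigenvalue, vector) in enumerate(pairs):
        scale = math.sqrt(max(eigenvalue, 0.0))
        for i, component in enumerate(vector):
            coords[i][axis] = scale * component
    return coords


# Hypothetical points in a plane; only their pairwise distances are passed to MDS.
POINTS = [(0, 0), (6, 1), (2, 4), (9, 3), (5, -2)]
DIST = [[math.hypot(p[0] - q[0], p[1] - q[1]) for q in POINTS] for p in POINTS]
LAYOUT = classical_mds(DIST)
for point in LAYOUT:
    print([round(x, 3) for x in point])
```

Because the test points really do lie in a plane, the recovered layout reproduces their distances up to rotation and reflection. With real dialect data the 2-D layout is an approximation of a much higher-dimensional configuration.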

Page 27

MDS

• Can use MDS to consolidate data
• English: 312 dimensions reduced to 2
• All 169 features included (and taken in relevant subsets)
• Finnish and Romanian provide large data sets that can be treated the same way

Page 28

Interactive, media-rich presentation

Objectives
• Make data accessible, useful to a wide research audience

Methods
• Interactive selection of data
• Constructive presentation of data
• Addition of audio and other media

Online is much more than a book!

Page 29

Framework and Applications

• Online atlas provides a framework for accessing and presenting data
• Other applications can work within the framework to transform or process the data, such as:
  • MDS data consolidation
  • Tools to analyze dialect variants of phonemes (proposed)
  • Others

Page 30

Summary

• Humanities and Social Sciences deal with large, complex data sets
• Explore methods to access, process, present this kind of data
• Solutions include:
  • MDS type processing
  • Online, interactive, rich presentation
• Example: Romanian Online Dialect Atlas

Page 31

References

Embleton, Sheila M. and Eric S. Wheeler (2000). Computerized Dialect Atlas of Finnish: Dealing with Ambiguity. J. of Quantitative Linguistics 7.3, pp. 227-231.

Embleton, Sheila M. and Eric S. Wheeler (1997a). Multidimensional Scaling and the SED Data. In: Wolfgang Viereck and Heinrich Ramisch, The Computer Developed Linguistic Atlas of England 2. Tuebingen: Max Niemeyer Verlag.

Embleton, Sheila M. and Eric S. Wheeler (1997b). Finnish Dialect Atlas for Quantitative Studies. J. of Quantitative Linguistics 4.1-3, pp. 99-102.

Schiffman, Susan S., M. Lance Reynolds and Forrest W. Young (1981). Introduction to Multidimensional Scaling. Theory, Methods, and Applications. New York: Academic Press. 411 pp.

Torgerson, W. S. (1952). Multidimensional scaling: 1. theory and method. Psychometrika 17, pp. 401-419.

Stan, Ionel and Dorin Uritescu (1996). Noul Atlas lingvistic român. Crisana. Vol. I. Bucharest: Romanian Academy Press. (2003. Vol. II. Bucharest: Romanian Academy Press.)

Uritescu, Dorin (1983). “Asupra repartiţiei dialectale a graiurilor dacoromâne. Graiul din Oaş” / “On the Dialect Structure of Daco-Romanian. The Dialect of Oaş”. In: Materiale si cercetari dialectale II. Cluj-Napoca: The University of Cluj-Napoca, pp. 231-246.

Uritescu, Dorin (1984a). “Subdialectul crisean.” In: V. Rusu (ed.), Tratat de dialectologie româneasca. Craiova: Scrisul românesc, pp. 284-320, 916-930.

Uritescu, Dorin (1984b). “Graiul din Tara Oasului.” In: V. Rusu (ed.), Tratat de dialectologie româneasca. Craiova: Scrisul românesc, pp. 390-399, 964-967.

Wheeler, Eric S. (2002). Zipf's Law and Why It Works Everywhere. Glottometrica 4, pp. 45-48.

Wheeler, Eric S. (2003). Multidimensional Scaling to Visualize Text Separation. Glottometrica 6, forthcoming.

Wheeler, Eric S. (n.d.). Multidimensional scaling. Chapter in: Reinhard Koehler (ed.), Handbook in Quantitative Linguistics, forthcoming.