Self-Organization of the Sound Inventories: An Explanation based on Complex Networks

Dec 17, 2015

April Ray
Page 1: Self-Organization of the Sound Inventories: An Explanation based on Complex Networks.

Self-Organization of the Sound Inventories: An Explanation based on Complex Networks

Page 2: Self-Organization of the Sound Inventories: An Explanation based on Complex Networks.

Overview of the Talk

• Motivation

• Approach & Objective

• Principle of Occurrence in Consonant Inventories

• Principle of Co-Occurrence in Consonant Inventories

• Findings

• Conclusions and Future Work

Page 3: Self-Organization of the Sound Inventories: An Explanation based on Complex Networks.

Sabda Brahma: Sound is Eternity

sabda-brahma su-durbodham pranendriya-mano-mayam ananta-param gambhiram durvigahyam samudra-vat

– Sound is eternal and also very difficult to comprehend. It manifests within the life air, the senses, and the mind. It is unlimited and unfathomable, just like the ocean.

Page 4: Self-Organization of the Sound Inventories: An Explanation based on Complex Networks.

• Several living organisms can produce sound

– They emit sound signals to communicate

– These signals are mapped to certain symbols (meanings) in the brain

– E.g., mating calls, danger alarms

Signals and Symbols

Page 5: Self-Organization of the Sound Inventories: An Explanation based on Complex Networks.

Human Communication

• Human beings also produce sound signals

• Unlike other organisms, they can concatenate these sounds to produce new messages – Language

• Language is one of the primary causes and effects of human intelligence

Page 6: Self-Organization of the Sound Inventories: An Explanation based on Complex Networks.

Human Speech Sounds

• Human speech sounds are called phonemes – the smallest units of a language

• Phonemes are characterized by certain distinctive features such as

I. Place of articulation

II. Manner of articulation

III. Phonation

[Figure: Mermelstein’s Model]

Page 7: Self-Organization of the Sound Inventories: An Explanation based on Complex Networks.

Types of Phonemes

• Vowels: /a/, /i/, /u/

• Consonants: /p/, /t/, /k/

• Diphthongs: /ai/

Page 8: Self-Organization of the Sound Inventories: An Explanation based on Complex Networks.

Choice of Phonemes

• How does a language choose a set of phonemes to build its sound inventory?

• Is the process arbitrary?

• Certainly Not!

• What are the forces affecting this choice?

Page 9: Self-Organization of the Sound Inventories: An Explanation based on Complex Networks.

Forces of Choice

[Figure: a speaker transmits the sound /a/ to a listener / learner]

• Speaker: desires “ease of articulation”

• Listener / Learner: desires “perceptual contrast” / “ease of learnability”

A Linguistic System – How does it look?

The forces shaping the choice are opposing, hence there has to be a non-trivial solution

Page 10: Self-Organization of the Sound Inventories: An Explanation based on Complex Networks.

Vowels: A (Partially) Solved Mystery

• Languages choose vowels based on maximal perceptual contrast.

• For instance, if a language has three vowels, then in more than 95% of the cases they are /a/, /i/, and /u/.

[Figure: the vowels /a/, /i/, and /u/ at the corners of a triangle, each pair labeled “Maximally Distinct”]

Page 11: Self-Organization of the Sound Inventories: An Explanation based on Complex Networks.

Consonants: A puzzle

• Research: from 1929 to date

• No single satisfactory explanation of the organization of the consonant inventories

– The set of features that characterize consonants is much larger than that of vowels

– No single force is sufficient to explain this organization

– Rather a complex interplay of forces goes on in shaping these inventories

[Figure: Jigsaw]

Page 12: Self-Organization of the Sound Inventories: An Explanation based on Complex Networks.

The Approach & Objective

• We adopt a Complex Network Approach to attack the problem of consonant inventories

• We try to figure out the principle of the distribution of the occurrence of consonants over languages

• We also attempt to figure out the co-occurrence patterns (if any) that are found across the consonant inventories

Page 13: Self-Organization of the Sound Inventories: An Explanation based on Complex Networks.

Principle of Occurrence

• PlaNet – The “Phoneme-Language Network”

– A bipartite network N=(VL,VC,E)

– VL : Nodes representing languages of the world

– VC : Nodes representing consonants

– E : Set of edges which run between VL and VC

• There is an edge e ∈ E between two nodes vl ∈ VL and vc ∈ VC if the consonant c occurs in the language l.

[Figure: The Structure of PlaNet. Language nodes (L1, L2, L3, L4) on one side are connected by edges to consonant nodes (/m/, /ŋ/, /p/, /d/, /s/, /θ/) on the other.]
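The bipartite definition of PlaNet can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `build_planet` and the toy inventories are hypothetical and are not UPSID data.

```python
from collections import defaultdict

def build_planet(inventories):
    """Build the edge set E and the node degrees of a bipartite
    language-consonant network from {language: [consonants]}."""
    edges = set()
    language_degree = {}
    consonant_degree = defaultdict(int)
    for language, consonants in inventories.items():
        unique = set(consonants)          # a consonant occurs at most once per language
        language_degree[language] = len(unique)
        for c in unique:
            edges.add((language, c))      # edge between v_l and v_c
            consonant_degree[c] += 1
    return edges, language_degree, dict(consonant_degree)

# Hypothetical toy inventories, for illustration only.
toy = {
    "L1": ["/m/", "/p/", "/s/"],
    "L2": ["/m/", "/p/", "/d/"],
    "L3": ["/m/", "/ŋ/"],
    "L4": ["/m/", "/s/", "/θ/"],
}
edges, lang_deg, cons_deg = build_planet(toy)
```

The degree of a language node is its inventory size, and the degree of a consonant node is the number of languages in which it occurs.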

Page 14: Self-Organization of the Sound Inventories: An Explanation based on Complex Networks.

Construction of PlaNet

• Data Source: UCLA Phonological Segment Inventory Database (UPSID)

• Number of nodes in VL is 317

• Number of nodes in VC is 541

• Number of edges in E is 7022

Page 15: Self-Organization of the Sound Inventories: An Explanation based on Complex Networks.

Degree Distribution

• Degree of a node is defined as the number of edges connected to the node.

• Degree Distribution (DD) is the fraction of nodes, pk, having degree equal to k.

• The Cumulative Degree Distribution (CDD) is the fraction of nodes, Pk, having degree greater than or equal to k.
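As a small sketch of these two definitions, the following Python computes pk and Pk from a list of node degrees. It assumes the usual reading of the cumulative distribution (degree greater than or equal to k); the function names and the sample degrees are hypothetical.

```python
from collections import Counter

def degree_distribution(degrees):
    """p_k: fraction of nodes whose degree equals k."""
    n = len(degrees)
    counts = Counter(degrees)
    return {k: counts[k] / n for k in sorted(counts)}

def cumulative_degree_distribution(degrees):
    """P_k: fraction of nodes whose degree is at least k (assumed reading)."""
    n = len(degrees)
    return {k: sum(1 for d in degrees if d >= k) / n
            for k in sorted(set(degrees))}

degrees = [1, 1, 2, 3]                       # hypothetical degrees
p = degree_distribution(degrees)             # {1: 0.5, 2: 0.25, 3: 0.25}
P = cumulative_degree_distribution(degrees)  # {1: 1.0, 2: 0.5, 3: 0.25}
```

The cumulative form is smoother than pk on sparse data, which is why it is often preferred for plots on log-log axes.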

Page 16: Self-Organization of the Sound Inventories: An Explanation based on Complex Networks.

Degree Distribution of PlaNet

[Figure, left panel: degree distribution of the language nodes. x-axis: language inventory size (degree k); y-axis: pk]

pk = beta(k) with α = 7.06 and β = 47.64

$$ p_k = \frac{\Gamma(54.7)}{\Gamma(7.06)\,\Gamma(47.64)}\, k^{6.06} (1-k)^{46.64} $$

kmin = 5, kmax = 173, kavg = 21

[Figure, right panel: cumulative degree distribution of the consonant nodes on log-log axes, showing an exponential cut-off. x-axis: degree of a consonant, k; y-axis: Pk]

$$ P_k = k^{-0.71} $$

• DD of the language nodes follows a β-distribution

• DD of the consonant nodes follows a power law with an exponential cut-off

• The distribution of consonants over languages follows a power law

Page 17: Self-Organization of the Sound Inventories: An Explanation based on Complex Networks.

Preferential Attachment: The Key to Power Law

• Power law distributions observed in

– Social Networks

– Biological Networks

– Internet Graphs

– Citation Networks

• These distributions emerge due to preferential attachment

[Figure: “Rich” and “Richer”, illustrated with $ symbols]

Page 18: Self-Organization of the Sound Inventories: An Explanation based on Complex Networks.

Synthesis of PlaNet

Given: VL = {L1, L2, ..., L317} sorted in ascending order of their degrees, and 541 unlabeled nodes in VC.

Step 0: All nodes in VC have degree 0.

Step t+1:

Choose a language node Lj (in order) with cardinality kj (inventory size)

for c running from 1 to kj do

Connect Lj preferentially with a consonant node Ci ∈ VC, to which it is not already connected, with a probability

$$ \Pr(C_i) = \frac{d_i^{\alpha} + \varepsilon}{\sum_{x \in V^*} \left( d_x^{\alpha} + \varepsilon \right)} $$

where di = degree of node Ci at step t, V* = subset of VC not connected to Lj at step t, and ε is the smoothing parameter.
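The synthesis procedure above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the seed argument, and the use of Python's `random.choices` are assumptions, while the defaults α = 1.44 and ε = 0.5 are the values reported on the simulation slide. Degrees are updated immediately after each pick, which does not change the probabilities for the current language because already-connected consonants are excluded from V*.

```python
import random

def synthesize_planet(inventory_sizes, n_consonants, alpha=1.44, eps=0.5, seed=0):
    """Grow a synthetic language-consonant network by preferential attachment."""
    rng = random.Random(seed)
    degree = [0] * n_consonants              # Step 0: every consonant has degree 0
    inventories = []
    for k in sorted(inventory_sizes):        # languages in ascending order of degree
        chosen = set()
        for _ in range(k):
            # V*: consonants not yet connected to the current language
            candidates = [i for i in range(n_consonants) if i not in chosen]
            # Unnormalised Pr(C_i): d_i^alpha + eps
            weights = [degree[i] ** alpha + eps for i in candidates]
            pick = rng.choices(candidates, weights=weights, k=1)[0]
            chosen.add(pick)
            degree[pick] += 1
        inventories.append(chosen)
    return inventories, degree

# Hypothetical small run: three languages drawing from six consonant nodes.
inventories, degree = synthesize_planet([3, 2, 4], n_consonants=6)
```

Each inventory size must not exceed the number of consonant nodes, since a language cannot connect to the same consonant twice.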

Page 19: Self-Organization of the Sound Inventories: An Explanation based on Complex Networks.

The Preferential Mechanism of Synthesis

[Figure: two snapshots of the network with language nodes L1, L2, L3, and L4, one after step 3 and one after step 4]

Page 20: Self-Organization of the Sound Inventories: An Explanation based on Complex Networks.

Simulation Result

The parameters α and ε are 1.44 and 0.5 respectively.

The results are averaged over 100 runs

[Figure: log-log plot of Pk against degree (k) for PlaNet, PlaNetsyn, and PlaNetrand]

Page 21: Self-Organization of the Sound Inventories: An Explanation based on Complex Networks.

Principle of Co-occurrence

• Consonants tend to co-occur in groups or communities

• These groups tend to be organized around a few distinctive features (based on: manner of articulation, place of articulation & phonation) – Principle of feature economy

[Figure: “If a language has in its inventory ... then it will also tend to have ...”, illustrated with the plosives below]

            voiced    voiceless
bilabial    /b/       /p/
dental      /d/       /t/

Page 22: Self-Organization of the Sound Inventories: An Explanation based on Complex Networks.

How to Capture these Co-occurrences?

• PhoNet – The “Phoneme-Phoneme Network”

– A weighted network N=(VC,E)

– VC : Nodes representing consonants

– E : Set of edges which run between the nodes in VC

• There is an edge e ∈ E between two nodes vc1, vc2 ∈ VC if the consonants c1 and c2 co-occur in a language. The number of languages in which c1 and c2 co-occur defines the edge-weight of e. The number of languages in which c1 occurs defines the node-weight of vc1.

[Figure: a fragment of PhoNet with the nodes /kw/, /k′/, /k/, and /d′/, annotated with node-weights and edge-weights]
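The node-weights and edge-weights of PhoNet follow directly from the language inventories. The sketch below is a hedged illustration: `build_phonet` and the toy inventories are hypothetical names and data, not taken from UPSID.

```python
from collections import Counter
from itertools import combinations

def build_phonet(inventories):
    """Return (node_weight, edge_weight) for a consonant co-occurrence network.

    node_weight[c]: number of languages in which consonant c occurs.
    edge_weight[{c1, c2}]: number of languages in which c1 and c2 co-occur.
    """
    node_weight = Counter()
    edge_weight = Counter()
    for consonants in inventories:
        unique = sorted(set(consonants))
        node_weight.update(unique)
        edge_weight.update(frozenset(pair) for pair in combinations(unique, 2))
    return node_weight, edge_weight

# Hypothetical toy inventories, for illustration only.
toy = [["k", "kw", "t"], ["k", "t"], ["k", "kw"]]
node_weight, edge_weight = build_phonet(toy)
```

Edges are keyed by `frozenset` so that the pair (c1, c2) and the pair (c2, c1) refer to the same undirected edge.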

Page 23: Self-Organization of the Sound Inventories: An Explanation based on Complex Networks.

Construction of PhoNet

• Data Source : UPSID

• Number of nodes in VC is 541

• Number of edges is 34012

[Figure: PhoNet]

Page 24: Self-Organization of the Sound Inventories: An Explanation based on Complex Networks.

Community Structures in PhoNet

• Radicchi et al. algorithm (for unweighted networks)

– Counts the number of triangles that an edge is a part of. Inter-community edges have a low count, so they are removed.

• Modification for a weighted network like PhoNet

– Look for triangles where the weights on the edges are comparable.

– If they are comparable, the group of consonants co-occurs strongly; otherwise it does not.

– Measure strength S for each edge (u,v) in PhoNet, where S is

$$ S = \frac{w_{uv}}{\sqrt{\sum_{i \in V_C - \{u,v\}} (w_{ui} - w_{vi})^2}} \quad \text{if } \sqrt{\sum_{i \in V_C - \{u,v\}} (w_{ui} - w_{vi})^2} > 0, \text{ else } S = \infty $$

– Remove edges with S less than a threshold η
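The edge-strength computation and the thresholding step can be sketched as follows. Two points are assumptions made for this illustration rather than statements from the slides: a missing edge is treated as having weight 0, and communities are read off as the connected components that remain after the weak edges are removed. All function names and the toy weights are hypothetical.

```python
import math

def edge_strengths(weights, nodes):
    """Strength S of every edge; weights maps frozenset({u, v}) -> weight."""
    def w(a, b):
        return weights.get(frozenset((a, b)), 0)   # assumption: absent edge = 0
    strengths = {}
    for edge, w_uv in weights.items():
        u, v = tuple(edge)
        denom = math.sqrt(sum((w(u, i) - w(v, i)) ** 2
                              for i in nodes if i not in edge))
        strengths[edge] = w_uv / denom if denom > 0 else math.inf
    return strengths

def communities(weights, nodes, eta):
    """Drop edges with S < eta and return the remaining connected components."""
    adjacency = {n: set() for n in nodes}
    for edge, s in edge_strengths(weights, nodes).items():
        if s >= eta:
            u, v = tuple(edge)
            adjacency[u].add(v)
            adjacency[v].add(u)
    seen, result = set(), []
    for start in nodes:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:                                # depth-first traversal
            x = stack.pop()
            if x not in component:
                component.add(x)
                stack.extend(adjacency[x] - component)
        seen |= component
        result.append(component)
    return result

# Hypothetical toy network: a tight triangle a-b-c and a weak link c-d.
nodes = ["a", "b", "c", "d"]
weights = {frozenset("ab"): 10, frozenset("ac"): 10,
           frozenset("bc"): 10, frozenset("cd"): 1}
found = communities(weights, nodes, eta=1.0)
```

In the toy network the weak c-d edge receives a strength well below 1 and is removed, so d is separated from the triangle.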

Page 25: Self-Organization of the Sound Inventories: An Explanation based on Complex Networks.

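[Figure: an example network with six nodes (1 to 6) shown in three stages: with its edge weights, with the edge strengths S computed from those weights, and after thresholding the strengths (η > 1)]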

Community Formation

For different values of η we get different sets of communities

Page 26: Self-Organization of the Sound Inventories: An Explanation based on Complex Networks.

Consonant Societies!

[Figure: consonant communities obtained at η = 1.25, η = 0.72, η = 0.60, and η = 0.35]

Page 27: Self-Organization of the Sound Inventories: An Explanation based on Complex Networks.

Evaluation of the Communities: Occurrence Ratio

• Hypothesis: The communities obtained from the algorithm should be found frequently in UPSID

• We define occurrence ratio to capture the “intensity” of occurrence,

$$ O_L = \frac{M}{N - (R_{top} - 1)} $$

– N is the number of consonants in C (ranked in ascending order of frequency of occurrence), M is the number of consonants of C that occur in a language L, and Rtop is the rank of the highest ranking consonant in L that is also present in C

– If a high-frequency consonant is present in L, the low-frequency one need not be present; but if a lower-frequency one is already present, then the higher-frequency one is expected to be present as well

Page 28: Self-Organization of the Sound Inventories: An Explanation based on Complex Networks.

Computing Occurrence Ratio: An Example

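[Figure: a community C consisting of /k/, /kh/, and /kw/, whose members carry the ranks R = 1, 2, 3, compared against three languages L1, L2, and L3; X marks a member of C that is absent from a language]

• L1: M=3, N=3, Rtop=1, so OL = 3/3 = 1

• L2: M=2, N=3, Rtop=2, so OL = 2/2 = 1

• L3: M=2, N=3, Rtop=1, so OL = 2/3 ≈ 0.67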
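The three cases in the example can be reproduced with a short function. This is a sketch under assumptions: the community is passed as a list ordered by rank (rank 1 first), the placeholder labels `c1`, `c2`, `c3` stand in for the ranked members, and a language containing no member of C is given a ratio of 0, which the slides do not specify.

```python
def occurrence_ratio(ranked_community, language_inventory):
    """O_L = M / (N - (R_top - 1)) for one community and one language."""
    n = len(ranked_community)
    present_ranks = [rank
                     for rank, c in enumerate(ranked_community, start=1)
                     if c in language_inventory]
    if not present_ranks:
        return 0.0                      # assumption: undefined case mapped to 0
    m = len(present_ranks)              # M: members of C found in the language
    r_top = min(present_ranks)          # R_top: best rank found in the language
    return m / (n - (r_top - 1))

community = ["c1", "c2", "c3"]          # hypothetical labels for ranks 1, 2, 3
o_l1 = occurrence_ratio(community, {"c1", "c2", "c3"})   # M=3, R_top=1
o_l2 = occurrence_ratio(community, {"c2", "c3"})         # M=2, R_top=2
o_l3 = occurrence_ratio(community, {"c1", "c2"})         # M=2, R_top=1
```

A language is therefore not penalized for lacking the members ranked above its own top-ranked member, only for gaps below it.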

Page 29: Self-Organization of the Sound Inventories: An Explanation based on Complex Networks.

Average Occurrence Ratio

• A given community has an occurrence ratio in each language L in UPSID

• We average this ratio over all L as,

$$ O_{av} = \frac{\sum_{L \in \mathrm{UPSID}} O_L}{L_{occur}} $$

where Loccur is the number of languages where at least one of the members of C has occurred
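A sketch of the averaging step follows, assuming the average is the sum of the per-language ratios divided by Loccur. Languages that contain no member of C contribute nothing to the sum and are not counted in Loccur. The function name, the placeholder labels, and the toy inventories are hypothetical.

```python
def average_occurrence_ratio(ranked_community, inventories):
    """O_av: sum of O_L over all languages, divided by L_occur."""
    n = len(ranked_community)
    total, l_occur = 0.0, 0
    for inventory in inventories:
        ranks = [rank
                 for rank, c in enumerate(ranked_community, start=1)
                 if c in inventory]
        if not ranks:
            continue                    # no member of C occurs in this language
        l_occur += 1
        total += len(ranks) / (n - (min(ranks) - 1))   # O_L for this language
    return total / l_occur if l_occur else 0.0         # assumption: 0 if C never occurs

community = ["c1", "c2", "c3"]          # hypothetical labels for ranks 1, 2, 3
toy_inventories = [{"c1", "c2", "c3"}, {"c2", "c3"}, {"c1", "c2"}, {"x"}]
o_av = average_occurrence_ratio(community, toy_inventories)
```

With these toy inventories the three languages that contain members of C have ratios 1, 1, and 2/3, so the average is 8/9.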

Page 30: Self-Organization of the Sound Inventories: An Explanation based on Complex Networks.

Results of the Evaluation

Consonants show patterns of co-occurrence in 80% or more of the world’s languages

[Figure annotation: for η > 0.3, Oav > 0.8]

Page 31: Self-Organization of the Sound Inventories: An Explanation based on Complex Networks.

The Binding Force of the Communities: Feature Economy

• Feature Entropy: The idea is borrowed from information theory

• For a community C of size N, let there be pf consonants for which a particular feature f is present and qf other consonants for which f is absent – the probability that a consonant chosen from C has f is pf /N and that it does not have f is qf /N or (1 - pf /N)

• Feature entropy can therefore be defined as

$$ F_E = \sum_{f \in F} \left( -\frac{p_f}{N} \log \frac{p_f}{N} - \frac{q_f}{N} \log \frac{q_f}{N} \right) $$

where F is the set of all features present in the consonants in C

• Essentially the number of bits needed to transmit the entire information about C through a channel.
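Feature entropy can be computed directly from this definition. The sketch below makes two assumptions that the slides do not state explicitly: logarithms are taken to base 2 (in line with the reading in bits), and a term with zero probability contributes 0. The function name and the feature labels are hypothetical.

```python
import math

def feature_entropy(community_features):
    """F_E for a community given as {consonant: set of features it has}."""
    n = len(community_features)
    all_features = set().union(*community_features.values())   # F
    total = 0.0
    for f in all_features:
        p_f = sum(1 for feats in community_features.values() if f in feats)
        q_f = n - p_f
        for count in (p_f, q_f):
            if count > 0:               # convention: 0 * log 0 = 0
                fraction = count / n
                total -= fraction * math.log2(fraction)
    return total

# Hypothetical feature assignments for four plosives.
plosives = {
    "/p/": {"voiceless", "bilabial", "plosive"},
    "/b/": {"voiced", "bilabial", "plosive"},
    "/t/": {"voiceless", "dental", "plosive"},
    "/d/": {"voiced", "dental", "plosive"},
}
fe = feature_entropy(plosives)
```

In this toy community the shared feature "plosive" costs nothing, while each of the four features that split the community in half costs one bit, giving 4 bits in total.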

Page 32: Self-Organization of the Sound Inventories: An Explanation based on Complex Networks.

Computing Feature Entropy

Lower FE -> C1 economizes on the number of features

Higher FE -> C2 does not economize on the number of features

Page 33: Self-Organization of the Sound Inventories: An Explanation based on Complex Networks.

If the Inventories had Evolved by Chance!

• Construction of PhoNetrand

– For each consonant c let the frequency of occurrence in UPSID be denoted by fc.

– Let there be 317 bins each corresponding to a language in UPSID.

– fc bins are then chosen uniformly at random and the consonant c is packed into these bins without repetition.

– Thus the consonant inventories of the 317 languages corresponding to the bins are generated.

– PhoNetrand can be constructed from these new consonant inventories in the same way as PhoNet.

• Cluster PhoNetrand by the method proposed earlier
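The random baseline described above can be sketched as follows. Each consonant keeps its frequency fc but is assigned to fc distinct language bins chosen uniformly at random. The function name, the seed argument, and the toy frequencies are hypothetical; the default of 317 bins comes from the slides.

```python
import random

def random_inventories(frequency, n_languages=317, seed=0):
    """Generate random consonant inventories that preserve each frequency f_c."""
    rng = random.Random(seed)
    inventories = [set() for _ in range(n_languages)]   # one bin per language
    for consonant, f_c in frequency.items():
        # Choose f_c distinct bins uniformly at random (no repetition).
        for b in rng.sample(range(n_languages), f_c):
            inventories[b].add(consonant)
    return inventories

# Hypothetical frequencies, for illustration only.
toy_frequency = {"k": 5, "p": 3, "t": 1}
bins = random_inventories(toy_frequency, n_languages=6)
```

Because only the per-consonant frequencies are preserved, any co-occurrence structure in the resulting inventories arises purely by chance.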

Page 34: Self-Organization of the Sound Inventories: An Explanation based on Complex Networks.

Comparison between PhoNet and PhoNetrand

[Figure: average feature entropy (y-axis) against community size (x-axis) for PhoNet and PhoNetrand]

The curve shows the average feature entropy of the communities of a particular size versus the community size

Page 35: Self-Organization of the Sound Inventories: An Explanation based on Complex Networks.

Our Findings

• The distribution of the occurrence of consonants over languages follows a power-law behavior;

• A preferential attachment-based model can reproduce this distribution of occurrence to a very close approximation (mean error ~0.01);

• The patterns of co-occurrence of the consonants, reflected through communities in PhoNet, are observed in 80% or more of the world's languages;

• Such patterns of co-occurrence would not have emerged if the consonant inventories had evolved just by chance;

Page 36: Self-Organization of the Sound Inventories: An Explanation based on Complex Networks.

The Epilogue

• How to explain preferential attachment?

– Perhaps it is due to the linguistic heterogeneity involved in the process of language change (at the microscopic level)

– Consonants belonging to languages that are prevalent among the speakers in one generation have a higher (and higher) chance of getting transmitted to the speakers of the subsequent generations

– The above heterogeneity manifests as preferential attachment at the mesoscopic level

• What is the cause of the origin of feature economy?

– Perhaps it is the outcome of the interplay of functional forces, such as perceptual contrast and ease of learnability, that is reflected as feature economy

[Figure: Indo-European family of languages]

Page 37: Self-Organization of the Sound Inventories: An Explanation based on Complex Networks.

Thank you!