PITCH PERCEPTION OF COMPLEX TONES …

Order no.: 160 2000    Year 2000

THESIS

submitted to

UNIVERSITE CLAUDE BERNARD - LYON 1

for the degree of

DOCTORAT, Discipline: Acoustics

(decree of 30 March 1992)

Presented and publicly defended on 27 September 2000

by Mr. Nicolas GRIMAULT

PITCH PERCEPTION OF HARMONIC COMPLEX TONES: A STUDY OF THE UNDERLYING MECHANISMS AND THEIR RELATION TO AUDITORY SCENE ANALYSIS.

Thesis supervisor: Dr. Christophe MICHEYL

JURY:
Dr. Georges CANEVET (DR. CNRS), reviewer
Dr. Robert P. CARLYON (MRC Senior Scientist)
Prof. Lionel COLLET
Dr. Laurent DEMANY (DR. CNRS), reviewer
Dr. Christophe MICHEYL (CR. CNRS)
Dr. Richard RAGOT (CR CNRS)
Prof. Michel SUNYACH

ACKNOWLEDGEMENTS

To Dr. Christophe Micheyl, who supervised all of my work. His scientific drive gave me the passion a researcher needs, and he taught me psychoacoustics with great patience and tact.

To Professor Lionel Collet, who placed his full trust in me as soon as I approached him. He not only welcomed me into his laboratory but also supported me throughout my doctorate.

To Dr. Laurent Demany, who kindly agreed to review my work and whose criticisms allowed me to deepen my thinking and improve the overall quality of my thesis.

To Dr. Georges Canévet, who also reviewed my work but who, above all, through the quality of his teaching, is responsible for my turning towards psychoacoustics.

To Dr. Robert Carlyon who, more than a member of the jury, was a valuable collaborator throughout my doctorate and from whose very great experience I benefited.

To Professor Michel Sunyach and Dr. Richard Ragot, who agreed to sit on my jury and who thereby contributed to the completion of this document.

To all the members of Entendre, and in particular to Messrs. Lombard, Bouroukhoff, Leblanc, Arthaud and Garnier who, by funding me, made my doctorate possible. Many members of Entendre were true collaborators and thus contributed to this work not only financially but also scientifically.

To Drs. Sid Bacon and Jungmee Lee, who introduced me to psychoacoustics during my DEA and who enabled me to start out as a researcher in this field.

To Professors Magnan, Chais and Cazals, who showed me in Marseille the clinical aspects that psychoacoustics can sometimes take on.

To all the members of the laboratory and of Pavillon U, and in particular to Annick, Collette, Michel, Vincent, John, Evelyne, Sylvianne, Berger Vachon and Annie who, in a thousand and one ways, enabled me, day after day, to carry out my doctorate in the best conditions.

To all the students of the laboratory, Vincent, the two Stéphanies, Arnaud, Marie, Caroline, Nathalie, Sonia, the Stéphanes and all the others, who shared with me the laboratory, the meals and the breaks during these three years of working together.

To Nadège, Mathieu and Bénédicte, with whom, over these three years, my relations became particularly friendly.

To my family, my parents, and in particular my father, who enabled me to pursue such long studies and who are unquestionably at the origin of my vocation as a researcher.

To my wife, Christine, who has always supported me and who gladly accepts the constraints of mobility inherent in an academic research career, turning this constraint into an adventure.

Grimault 1

TABLE OF CONTENTS

FOREWORD: On psychoacoustics.
INTRODUCTION
LITERATURE REVIEW
1-The auditory system as a spectrum analyser.
1-1-Cochlear tonotopy.
1-2-The phase-locking phenomenon.
1-3-The concept of critical band.
2-Pitch coding of harmonic complex tones.
2-1-Spectral coding.
2-1-1-Goldstein's model.
2-1-2-Terhardt's model (Terhardt, 1972, 1978).
2-1-3-Conclusions.
2-2-"Non-spectral" coding.
2-3-Pitch coding by autocorrelation.
2-3-1-The notion of autocorrelation.
2-3-2-A physiological autocorrelation?
2-3-3-"Autocorrelation" models.
2-3-3-1-The model of Meddis & Hewitt (1991a,b).
2-3-3-2-Comparison of this model with psychoacoustic data.
2-3-3-2-1-The pitch of complex tones composed of harmonics in sine or alternating phase.
2-3-3-2-2-Pitch discrimination of harmonic complex tones.


2-3-3-2-3-Pitch of a complex tone with one harmonic shifted in frequency.
2-3-3-2-4-Critique of the autocorrelation model of Meddis & Hewitt (1991).
3-Auditory scene analysis.
3-1-Schema-based analysis.
3-2-Primitive auditory scene analysis.
3-2-1-Analysis of simultaneous sources.
3-2-1-1-Temporal correlation (principle of common fate).
3-2-1-2-Progression of transformation, continuity and slowness.
3-2-2-Analysis of sequential sources: "streaming".
3-2-2-1-General framework.
3-2-2-2-The particular influence of virtual pitch.
3-2-2-3-Models of sequential grouping.
3-2-2-4-From sequential organisation to pitch discrimination.
4-Summary, objectives of this work and introduction to my own studies.
5-An exploration method based on selective learning.
5-1-Introduction.
5-2-Learning-induced neural plasticity of the internal auditory system.

EXPERIMENTAL WORK

Chapter 1: Study of the mechanisms encoding the pitch of harmonic complex tones resolved or unresolved by the peripheral auditory system.
Article 1: Evidence for two pitch encoding mechanisms using a selective auditory training paradigm. N. Grimault, C. Micheyl, R. P. Carlyon and L. Collet.


Article 2: Perceptual learning in pure-tone frequency discrimination and amplitude-modulation rate discrimination, and generalization to fundamental frequency discrimination. N. Grimault, C. Micheyl, R. P. Carlyon, S. P. Bacon and L. Collet.

Chapter 2: Implication and importance of efficient pitch coding for auditory scene analysis.
Article 3: Influence of peripheral resolvability on the perceptual segregation of harmonic complex tones differing in fundamental frequency. N. Grimault, C. Micheyl, R. P. Carlyon, P. Arthaud and L. Collet.
Article 4: Perceptual auditory stream segregation of sequences of complex sounds in subjects with normal and impaired hearing. N. Grimault, C. Micheyl, R. P. Carlyon, P. Arthaud and L. Collet.
Article 5: Further evidence for the resetting of the pitch analysis system by abrupt temporal transitions between successive tones. N. Grimault, C. Micheyl, R. P. Carlyon and L. Collet.

GENERAL SUMMARY AND CONCLUSIONS.
1-The presumed mechanisms of pitch encoding.
2-Is auditory scene analysis conditioned by the mechanisms of pitch perception?
3-Conclusions.

GENERAL BIBLIOGRAPHY

APPENDICES
A1: Model for computing peripheral excitation patterns.
A1-1-Presentation of the model.
A1-2-Application of the model.
A1-3-Discussion of the model.
A1-4-Results and contribution of the model to the discussion of study 5.

SUMMARY IN ENGLISH

INDEX


FOREWORD

On psychoacoustics.

Psychoacoustics is a field that is often poorly known, and very often entirely unknown, to the general public. It therefore seems important to give here a brief overview of this discipline, and in particular of its objectives and working tools. The whole discipline starts from the idea that there must be universal rules governing the sensations evoked by auditory stimulation. The aim is then to discover these rules, with the twofold goal of predicting the sensation that a given stimulus will evoke and of better understanding the mechanisms by which this sensation is built.

The psychoacoustician therefore always tries to relate the physical quantities of the stimulus (intensity of the sound pressure, frequency, fundamental frequency...) to the sensations evoked (loudness (sonie), tone height (tonie), pitch (hauteur)...).

[Figure 1: block diagram. Stimulation (Pa(t), F, F0, ...) → Auditory system → Sensation (loudness, tone height, pitch, timbre, ...).]

Fig 1: The black box of psychoacoustics. An input stimulation with various parameters (sound pressure as a function of time (Pa(t)), frequency (F), fundamental frequency (F0)...) gives rise to an auditory sensation with various characteristics of loudness, tone height, pitch and timbre. The task of psychoacoustics is to make explicit the black box above.

Knowledge of these rules is extremely useful for making efficient any communication that uses the auditory pathways and involves an electrical, mechanical or electronic device. Such devices can be as varied as a telephone receiver, a hearing aid, a cochlear implant or a musical instrument.

Moreover, it is sometimes possible to deduce from these rules the neural mechanisms that underlie the perceptual processes. This discipline therefore has both practical and theoretical scope, and can provide valuable information to the neurosciences by helping to identify how neural systems operate.

The exploration methods most classically used are those of experimental psychology. Schematically, to relate the variation of a physical parameter to the variation of the sensation it produces, we can have a listener hear several stimulus conditions obtained by varying the physical value of the parameter, and then ask him or her either to judge the sound by placing it on a sensation scale or to compare it with a reference sound. We thus obtain the elements needed to determine the influence of the parameter on the auditory sensation evoked.

This psychoacoustics thesis has a twofold objective. The first is to clarify the mechanisms that give rise to the characteristic sensation of pitch when a harmonic complex tone is presented (i.e. a sound composed of several pure tones whose frequencies are all multiples of a single frequency, called the fundamental frequency). The second is to obtain more information on the contribution of pitch to the mechanisms of auditory perceptual organisation.


INTRODUCTION


Complex acoustic waves whose spectrum consists of harmonics (a line spectrum) generally evoke an auditory sensation of pitch known as fundamental or virtual pitch. This phenomenon, which has been known for more than a century and has been the subject of many experimental studies over recent decades, still raises questions today. In particular, the mechanisms by which the central auditory system "computes" this virtual pitch from the information available at the output of the peripheral auditory system remain a matter of lively debate in psychoacoustics and auditory physiology. A question that has been especially pressing in recent years is whether there is a single such mechanism or several. Indeed, while a body of mathematical and physiological modelling work suggests that a single mechanism can determine the virtual pitch of all periodic complex tones, whether they contain high- or low-rank harmonics and whether their fundamental frequency is low or high, some psychoacoustic results suggest, on the contrary, that distinct mechanisms must exist to accommodate the functional constraints of the auditory periphery and, more precisely, of cochlear frequency resolution. Schematically, two cases would have to be distinguished according to whether or not the harmonics are spaced widely enough to excite distinct peripheral auditory filters. In the first case, the harmonics are said to be "resolved" by the auditory system; in the second, they are said to be "unresolved". Each of these cases would correspond to a different underlying mechanism for encoding the sensation of virtual pitch.

This question of whether the mechanisms encoding virtual pitch are single or multiple, depending on the frequency resolution of the peripheral auditory system, formed the backdrop of my doctoral research and is consequently the main theme of this thesis. In a first group of experimental studies, I tried to shed new light on this question with a relatively original approach, based on the study of the transfer of perceptual learning in fundamental-pitch discrimination between stimulus conditions differing in the degree of frequency resolution of the harmonics of the complex tones. I hypothesised that if the mechanisms underlying the encoding of the virtual pitch of resolved and unresolved harmonics did have different neurophysiological substrates, then the benefits of prolonged selective training of one of these mechanisms with stimuli composed exclusively of resolved harmonics should transfer little, if at all, to test conditions involving unresolved harmonics (and vice versa), because the neural units engaged during training and during testing would be different. Having tested this hypothesis in a first longitudinal study, whose results (which I invite the reader to discover in detail in the article devoted to them) are broadly consistent with the dual hypothesis, I wished to take the question further by trying to determine the nature of the mechanisms encoding the virtual pitch of resolved and unresolved harmonics.

Some data in the literature, described in the theoretical part of the thesis, suggest that the first stage of the mechanism used to determine the virtual pitch of a group of resolved harmonics, which consists in determining the pitch of each individual component of the sound, is similar to the mechanism used to encode the pitch of a single frequency component. One can therefore hypothesise that fundamental-frequency discrimination of resolved harmonics benefits from training in pure-tone frequency discrimination. On the other hand, some studies suggest that encoding the virtual pitch of a group of unresolved harmonics involves a relatively precise determination of the rate of the envelope fluctuations at the output of the peripheral auditory filters (in which the interaction of several harmonics produces activity fluctuating at a rate equal to the fundamental frequency). One can therefore hypothesise that fundamental-frequency discrimination of unresolved harmonics benefits from training in amplitude-modulation rate discrimination. Here again, I invite the reader to discover, in the second part of the thesis, the results of this second study on the existence and nature of the mechanisms underlying pitch discrimination of resolved and unresolved harmonics.

The other major question that inspired my doctoral research concerns the influence of frequency resolution on the perceptual organisation of sequences of complex tones on the basis of their virtual pitch. This question, related to the previous one, was prompted by the results of an earlier study by Micheyl and Carlyon (1998), which suggest that listeners find it harder (or even impossible) to exploit fundamental-frequency differences between successive complex tones in order to separate them into different perceptual "streams" when the harmonics of these tones are unresolved. In other words, insufficient peripheral frequency resolution could defeat the mechanisms of perceptual organisation that operate in the sequential domain. This hypothesis seemed to me an interesting extension of my other work: although a better understanding of the influence of frequency resolution on the mechanisms of complex-tone pitch perception is fascinating from a theoretical point of view, its practical implications remain relatively abstract or indirect. In contrast, if it turns out that frequency resolution partly determines the ability to organise perceptually sequences of complex sounds (which, schematically, is what music and speech are), this could have important consequences for understanding the difficulties that people with partial hearing loss of cochlear origin experience in complex auditory scenes. Indeed, various studies in the literature indicate that peripheral frequency resolution is almost systematically reduced by cochlear damage.

I therefore carried out two studies in this area. A first study, in normal-hearing listeners, aimed to test the extent to which the frequency resolution of the harmonics influences the ability to form auditory streams from ABA sequences of complex tones differing in fundamental frequency. A second study, involving normal-hearing and hearing-impaired listeners, aimed to complement the first by testing whether stream-segregation performance for sequences of harmonic complex tones based on fundamental-frequency differences is indeed poorer in the latter than in the former. The reader will find in the second part of the thesis the two articles devoted to these studies and their results, together with a third and final study that I carried out to characterise the influence of frequency resolution on the perceptual organisation of sequences of complex tones. The results of this last study fit naturally at the end of this thesis in that, on the one hand, they complement the results obtained previously and, on the other hand, they suggest a number of directions for future studies within this broad issue of the influence of frequency resolution on the perception of sound sequences.

To close this introduction, and before getting to the heart of the matter, I should point out that I have tried to gather in the first part of this thesis the main elements of the literature which, I hope, will be useful to readers who are not specialists in the areas of psychoacoustics covered by the experimental studies presented in the second part, namely, essentially: virtual pitch perception, the rules of auditory organisation, and auditory perceptual learning. Far from claiming to cover these vast questions exhaustively, this first part aims rather to bring out, from the many earlier publications devoted to them, the results that inspired my working hypotheses, and thereby to specify the general context of my doctoral research.

LITERATURE REVIEW

1-The auditory system as a spectrum analyser.

All the signals used during my doctorate are harmonic complex tones. It has long been known that our auditory system, on receiving such a sound, composed of several pure tones, is able, under certain constraints, to analyse it. Thus, when two pure tones whose frequencies are sufficiently far apart are presented simultaneously, we can isolate each component and so hear both pure tones (Plomp, 1964; Green, 1964). Our auditory system therefore behaves as a spectrum analyser.

1-1-Cochlear tonotopy

It seems plausible that this spectral analysis of the signals we perceive relies on the tonotopic properties of the cochlea.

Recall first that, when a pure tone excites the basilar membrane, the frequency of the tone is in one-to-one correspondence with the position of the maximum of the envelope of the basilar-membrane vibration (figure 2). The correspondence between the position of this maximum and the frequency of the incoming sound was measured by Dolmazon in 1978 (cited in Canévet, 1995).
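The one-to-one mapping between frequency and position can be illustrated with a short sketch. It assumes Greenwood's (1990) frequency-position function with its usual constants for the human cochlea, which is not the Dolmazon measurement cited above; the function names are hypothetical and chosen only for this example.

```python
import math

# Greenwood (1990) constants for the human cochlea (an assumption of this
# sketch; the thesis cites Dolmazon's measurement, which is not reproduced here).
A, ALPHA, K = 165.4, 2.1, 0.88

def place_to_frequency(x):
    """Characteristic frequency (Hz) at relative position x (0 = apex, 1 = base)."""
    return A * (10 ** (ALPHA * x) - K)

def frequency_to_place(f):
    """Inverse map: relative position (0 = apex, 1 = base) of the maximum for f in Hz."""
    return math.log10(f / A + K) / ALPHA

# Each frequency maps to one position and back again: the map is a bijection.
for f in (125, 500, 2000, 8000):
    x = frequency_to_place(f)
    print(f"{f:5d} Hz -> x = {x:.3f} -> {place_to_frequency(x):.1f} Hz")
```

Because the function is strictly increasing, low frequencies map to positions near the apex and high frequencies to positions near the base.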

Figure 2: This figure, taken from Moore (1989), is based on the work of von Békésy (1947). It shows schematically the instantaneous displacement of the basilar membrane at two successive instants. The dashed line is the envelope of the membrane displacement. The position of the maximum of this envelope depends on the frequency of the incoming sound.

The nerve fibres connected at the position of this maximum are thus representative of the frequency of the pure tone. This cochlear tonotopy is then preserved all along the auditory pathways.

In addition, the nerve fibres connected at this point are especially responsive to sounds of this frequency. Each neuron thus responds preferentially to sounds of a particular frequency. Bell-shaped curves can therefore be drawn that describe the response of each neuron as a function of frequency. These are classically called the tuning curves of the neurons.

This frequency-coding mechanism is thought to be used particularly when the sounds presented are short and/or of high frequency.

1-2-The phase-locking phenomenon.

In contrast, in the presence of pure tones of relatively low frequency (<4 kHz), neurons discharge preferentially at the pressure peaks of the exciting wave. The existence of a neuronal refractory period (a duration of about one millisecond following a discharge, during which no further discharge is possible) makes the synchronisation incomplete, but the frequency of a sound can nevertheless be deduced from the timing of the fibres' discharges. Indeed, the intervals between successive discharges all correspond to an integer number of periods. This type of frequency coding may well be dominant for relatively long sounds and, above all, at frequencies below 4-5 kHz (Rose et al., 1968; Moore, 1973). Above 5 kHz, neurons can no longer follow the rate of the exciting sound.

However, this cue appears difficult to exploit when a complex signal is presented.
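As a minimal sketch of how a frequency could be read from phase-locked discharges, the example below builds an idealised spike train (no timing jitter, one hypothetical fibre, an arbitrarily chosen list of firing cycles) and recovers the tone frequency from the shortest interval between spikes. All names and values are assumptions made for illustration, not a physiological model.

```python
# Idealised phase-locked spike train: no jitter, spikes only at pressure peaks.
FREQ_HZ = 500.0                # frequency of the pure tone (assumed example value)
PERIOD_S = 1.0 / FREQ_HZ       # 2 ms, longer than the ~1 ms refractory period

# Cycles on which the hypothetical fibre fires; it skips many cycles.
FIRING_CYCLES = [0, 1, 3, 6, 7, 10, 14, 15, 19]
spike_times = [k * PERIOD_S for k in FIRING_CYCLES]

def interspike_intervals(times):
    """Intervals between successive spikes, in seconds."""
    return [t2 - t1 for t1, t2 in zip(times, times[1:])]

def estimate_frequency(times):
    """Estimate the tone frequency as the inverse of the shortest interval.

    Valid only if the fibre fires on two consecutive cycles at least once.
    """
    return 1.0 / min(interspike_intervals(times))

intervals = interspike_intervals(spike_times)
# Every interval is a whole number of periods of the tone.
cycles_per_interval = [round(i / PERIOD_S) for i in intervals]
print("intervals in periods:", cycles_per_interval)
print("estimated frequency (Hz):", round(estimate_frequency(spike_times), 3))
```

With real fibres the discharge times are jittered, so an estimate would more plausibly rely on a histogram of intervals pooled over many fibres than on a single shortest interval.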

1-3-The concept of critical band

The original concept of the critical band (Fletcher, 1940) comes from the following experimental observation: in an experiment on the detection of a pure tone embedded in a broadband noise (white noise, for example), only a narrow spectral band centred on the pure tone interferes with or masks its perception. The width of this "masking" band defines the critical band at the frequency of the tone.

This band thus gives the whole range of frequencies of sounds liable to interfere with the pure tone if they are presented simultaneously with it.

The operation of the cochlea could thus be modelled as a juxtaposition of critical bands (36 of these bands cover the frequency range from 26 Hz to 10781 Hz). These critical bands can be regarded as the -3 dB passbands of band-pass filters. The cochlea can then be modelled as a bank of auditory filters. Passing through this filter bank would allow the frequency analysis of any complex signal exciting the cochlea.

The exploration of these auditory filters and the computation of their various spectral and temporal characteristics (their bandwidth as a function of centre frequency, their impulse response in the time domain...) have been the subject of a great many studies, which have given rise to numerous models of the peripheral auditory system (Glasberg & Moore 1990; Irino & Patterson, 1997).

Fig 3: Simulation of the temporal outputs of 10 auditory filters centred at frequencies from 250 Hz to 6 kHz. These filters were stimulated by the first 20 harmonics of 500 Hz (i.e. 20 pure tones of frequencies n × 500 Hz, n ∈ [1,20]). The spectrum of this stimulus is shown vertically, on the right. This simulation was carried out with auditory filters of the "gammachirp" type defined by Irino & Patterson (1997).

During this thesis I developed a model of my own for computing excitation patterns at the output of the auditory periphery (i.e. the temporal waveform at the output of each auditory filter), which uses the impulse responses of the auditory filters as described by Irino & Patterson. Unlike the excitation-pattern model of Glasberg & Moore, in which the passage from the time domain to the spectral domain is performed by a fast Fourier transform and has no physiological basis, this model convolves the signals directly with the cochlear filter defined in the time domain. The temporal integration window is therefore closer to physiological reality and depends on the centre frequency of the filter used.

Figure 3 thus shows that a model of the peripheral auditory system comprising a set of auditory filters allows the input signal to be analysed. The time-domain responses (the excitation patterns) are shown at the output of 10 auditory filters. The filter centred on 250 Hz, for example, is not excited because this frequency is absent from the signal, whereas the filter centred on the first harmonic of the stimulus (500 Hz) is excited. It can also be seen that the analysing power of a filter bank is limited by the width of the filters. This is especially true at high frequencies, because filter bandwidth increases with centre frequency. Thus, the filter centred on 4750 Hz is wide enough to be excited by several harmonics of 500 Hz (Figure 3).

In the introduction I mentioned two different cases of complex tones ("resolved" and "unresolved" tones), which will be defined rigorously below. It can nevertheless be stated at this point that the "resolvability" of a complex tone depends on the number of harmonics per auditory filter.
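The idea of counting harmonics per auditory filter can be sketched numerically. The example below is a rough approximation under stated assumptions: it uses the equivalent rectangular bandwidth (ERB) formula of Glasberg & Moore (1990) as the filter width and treats each filter as a rectangular band, which is a simplification of the gammachirp filters used for Fig 3. The function names are hypothetical.

```python
def erb_hz(centre_hz):
    """Equivalent rectangular bandwidth (Hz) of the auditory filter (Glasberg & Moore, 1990)."""
    return 24.7 * (4.37 * centre_hz / 1000.0 + 1.0)

def harmonics_in_filter(centre_hz, f0_hz, n_harmonics=20):
    """Ranks of the harmonics of f0_hz lying within one ERB centred on centre_hz."""
    half_width = erb_hz(centre_hz) / 2.0
    return [n for n in range(1, n_harmonics + 1)
            if abs(n * f0_hz - centre_hz) <= half_width]

F0 = 500.0  # fundamental of the Fig 3 stimulus (harmonics 1 to 20)
for centre in (250.0, 500.0, 1000.0, 4750.0):
    ranks = harmonics_in_filter(centre, F0)
    print(f"filter at {centre:6.0f} Hz, ERB = {erb_hz(centre):6.1f} Hz, harmonics: {ranks}")
```

Under this approximation the 250 Hz filter contains no harmonic, the 500 Hz and 1000 Hz filters contain a single harmonic each, and the 4750 Hz filter contains two adjacent harmonics, in line with the description of Fig 3.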

2-Pitch coding of harmonic complex tones.

First of all, we need to define precisely what is meant by "harmonic complex tone", which will often be shortened to "complex tone" in this work. In general, a complex tone is any sound that is not a pure tone and whose spectrum is therefore not limited to a single line. A harmonic complex tone, for its part, is composed of a set of pure tones whose frequencies are all multiples of a single frequency called the fundamental frequency. Thus, there always exists a fundamental frequency F0 such that the spectrum S of a harmonic complex tone can be decomposed mathematically as follows:

$$S = \sum_{k \in \Psi} k \cdot F_0, \qquad \Psi \subset [1; +\infty[$$

where Ψ is the set of the ranks of the harmonics present in the spectrum.

Such a sound can thus be represented by its spectrum, as in figure 4.

[Figure 4: line spectrum. Horizontal axis: frequency; lines of rank n = 1 to n = 9 at F0, 2·F0, ..., 9·F0.]

Fig 4: Schematic spectrum of a harmonic complex tone of fundamental frequency F0, with Ψ = [1,9].

The pitch of a sound like this one will be approximately equal to its fundamental frequency F0 when all the harmonics of the complex tone are in phase. Clearly, such a sound does not necessarily contain the harmonic of frequency F0 (this is the case as soon as 1 ∉ Ψ). For this reason, the pitch evoked by a harmonic complex tone is often called the "virtual pitch".
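A short numerical sketch may help to see why a pitch at F0 can arise without any component at F0. It synthesises a complex tone with Ψ = {3, 4, 5} and checks that the waveform still repeats every 1/F0 although the spectrum has no energy at F0. The values of F0, the sampling rate and the function names are assumptions chosen for this example.

```python
import math

F0 = 200.0                 # fundamental frequency in Hz (assumed example value)
RANKS = [3, 4, 5]          # the set Psi of harmonic ranks; rank 1 is absent
SAMPLE_RATE = 16000        # samples per second (assumed); 80 samples per period of F0

def harmonic_complex(t):
    """Sum of equal-amplitude, sine-phase harmonics of F0 at time t (seconds)."""
    return sum(math.sin(2.0 * math.pi * k * F0 * t) for k in RANKS)

def component_amplitude(signal, freq_hz):
    """Amplitude of the spectral component at freq_hz (single-bin Fourier sum)."""
    n = len(signal)
    re = sum(s * math.cos(2.0 * math.pi * freq_hz * i / SAMPLE_RATE)
             for i, s in enumerate(signal))
    im = sum(s * math.sin(2.0 * math.pi * freq_hz * i / SAMPLE_RATE)
             for i, s in enumerate(signal))
    return 2.0 * math.hypot(re, im) / n

n_samples = 1600           # 0.1 s, i.e. exactly 20 periods of F0
signal = [harmonic_complex(i / SAMPLE_RATE) for i in range(n_samples)]
period = SAMPLE_RATE // int(F0)   # 80 samples

# The waveform repeats every 1/F0 even though there is no component at F0.
max_diff = max(abs(signal[i] - signal[i + period]) for i in range(n_samples - period))
print("largest difference after a shift of 1/F0:", max_diff)
print("amplitude at F0:", round(component_amplitude(signal, F0), 6))
print("amplitude at 3*F0:", round(component_amplitude(signal, 3 * F0), 6))
```

The periodicity of the waveform at 1/F0 is a property of the stimulus only; how the auditory system extracts a pitch from it is the subject of the models reviewed in this chapter.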

Whether or not a pitch is perceived in the absence of the harmonic of rank 1 gave rise to a historic debate involving scientists as illustrious as Ohm (1843) and Helmholtz (1863, 1877). Ohm, relying on Fourier's theorem, hypothesised that the fundamental frequency must be present for a pitch to be heard. This hypothesis, which lacked experimental proof, was shown experimentally to be incorrect by Seebeck (1841, 1843). Seebeck's scientific standing at the time was not, however, sufficient to prevail once Helmholtz came out in support of Ohm's work. It was not until the work of Schouten in 1940 that Seebeck was vindicated and his results confirmed.

It is useful at this point to define some vocabulary associated with the notions of complex tone and critical band. The spectrum of a complex tone consists of a set of harmonics equally spaced in frequency. When the cochlea is excited by such a sound, two cases are possible. These two configurations are shown in figure 5.

[Figure 5: two schematic views of the cochlea (base to apex) with the bank of auditory filters. Top: complex tone resolved by the peripheral auditory system. Bottom: complex tone unresolved by the peripheral auditory system.]

Fig 5: Representation of the two possible configurations (resolved and unresolved) when a harmonic complex tone passes through the bank of auditory filters. Top: every harmonic is isolated in a separate filter; the tone is then resolved. Bottom: several harmonics interfere within the filters; the tone is unresolved.

In this figure, which shows the cochlea schematically, auditory filters have been laid out along with a complex tone. The width of the filters clearly varies with their position along the cochlea: they are wide at the base (coding of high frequencies) and narrow at the apex (coding of low frequencies). Depending on the fundamental frequency of the complex tone and the rank of its harmonics, several harmonics may interfere within the same filters or, on the contrary, each be isolated in its own filter. We can now define the notion of "resolvability" mentioned in the introduction: in the first case, the complex tone is said to be unresolved by the peripheral auditory system, and in the second case, resolved. This vocabulary is extremely important and will be used continually in the rest of this text, since the main objective of this thesis is to study the perceptual differences determined by the resolvability of the signals.

In this chapter we give a non-exhaustive review of the various models that have been proposed in the literature to explain the mechanisms by which the auditory system "computes" this virtual pitch. These models fall into two broad classes: spectral models and temporal models. The physiological reality of these models is still a matter of scientific controversy among the teams working on the subject. The conclusions reached by the studies carried out during my thesis (see the articles of chapter 1) are as follows: at least two mechanisms may be used by the auditory system to code harmonic complex tones, one probably spectral and the other temporal. But let us not get ahead of ourselves; we first present the various models proposed to date in the literature.

2-1-Spectral coding

These candidate pitch-coding mechanisms, also called "place" coding mechanisms because they rely on cochlear tonotopy, were historically the first to be developed.

They fall into two broad classes of models:

2-1-1-Goldstein's model.

This model was first developed by Goldstein in 1973. It was then taken up and revised in many studies until the late 1980s (Beerends & Houtsma, 1986; Faulkner, 1985; Gerson & Goldstein, 1978; Scheffer, 1983; Srulovicz & Goldstein, 1983). It can be broken down into two main stages. First, from the information available at the output of the auditory periphery, the frequencies of the individual harmonics making up the signal are isolated and measured. The model takes into account the error that may be made at this stage: for each harmonic, the measured frequency follows a Gaussian distribution centred on the frequency of that harmonic. The measurement of each harmonic's frequency therefore carries a "Gaussian" uncertainty (the standard deviation of the distribution).

Once this decomposition is complete, a central mechanism would find the virtual pitch of the complex tone by systematically comparing the measured spectrum with a set of spectra (by minimizing a mathematical quantity).

A brief description of Goldstein's mathematical model (Goldstein, 1973) is given below to make this paragraph more precise.

Fig 6: In Goldstein's model, the frequency of each harmonic is estimated with a precision given by the standard deviation $\sigma_F$ of a Gaussian distribution centred on the frequency of the harmonic.

As figure 6 illustrates, the frequency of each harmonic is estimated with a precision given by the standard deviation $\sigma_F$ of a Gaussian distribution centred on the frequency of the harmonic.

Note in passing that this computation requires each harmonic to be isolated from the other components of the complex tone. The harmonics of the complex tone must therefore be resolved by the auditory system. Yet this hypothesis had been strongly challenged by Plomp as early as 1964.

Once the N harmonics ($x_k$, $k \in [1,N]$) contained in the signal have been determined, minimizing the quantity $\varepsilon^2$ yields the fundamental frequency $F_0$ as well as the rank $\tilde{n}$ of the first harmonic present.

$$\varepsilon^2 = \sum_{k=1}^{N} \frac{\left[x_k - (\tilde{n}+k-1)\,F_0\right]^2}{\sigma_k^2}$$

Moreover, for a given $\tilde{n}$, the value of $F_0$ can be obtained directly from a second formula taken from Goldstein's work.

$$F_0 = \frac{\sum_{k=1}^{N} (\tilde{n}+k-1)\,x_k/\sigma_k^2}{\sum_{k=1}^{N} (\tilde{n}+k-1)^2/\sigma_k^2}$$

Note, however, a limitation of this model: it assumes that all the harmonics present are consecutive.
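The least-squares fit described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `goldstein_fit`, the bound `max_rank`, the exhaustive search over candidate first ranks and the measured frequencies are all hypothetical choices made for the example, not Goldstein's own procedure. For each candidate rank of the first harmonic, $F_0$ is computed in closed form and the candidate with the smallest $\varepsilon^2$ is kept.

```python
import numpy as np

def goldstein_fit(x, sigma, max_rank=20):
    """Return (F0, first_rank, eps2) minimising eps^2 over candidate first ranks.

    x      -- measured frequencies of N consecutive harmonics (Hz)
    sigma  -- standard deviation of each frequency measurement (Hz)
    """
    x = np.asarray(x, dtype=float)
    w = 1.0 / np.asarray(sigma, dtype=float) ** 2   # weights 1 / sigma_k^2
    offsets = np.arange(len(x))                     # k - 1 for k = 1..N
    best = None
    for first_rank in range(1, max_rank + 1):       # hypothetical search range
        ranks = first_rank + offsets                # n + k - 1
        f0 = np.sum(w * ranks * x) / np.sum(w * ranks ** 2)
        eps2 = np.sum(w * (x - ranks * f0) ** 2)
        if best is None or eps2 < best[2]:
            best = (f0, first_rank, eps2)
    return best

# Hypothetical measurements: harmonics 3, 4 and 5 of 200 Hz, slightly mis-estimated.
measured = [601.0, 799.0, 1002.0]
f0_hat, n_hat, eps2 = goldstein_fit(measured, sigma=[2.0, 2.0, 2.0])
print(f0_hat, n_hat, eps2)
```

With these made-up inputs the best fit is obtained for a first rank of 3 and an $F_0$ close to 200 Hz, which corresponds to a missing fundamental. The sketch inherits the limitation noted above: it only considers consecutive harmonics.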

Moreover, phase plays no role in a computation of this kind. This has been contradicted by a large number of studies (Bilsen, 1973; Buunen et al., 1974; Lundeen & Small, 1984; McKeown & Darwin, 1991), and most notably by Shackleton & Carlyon (1994) and Carlyon & Shackleton (1994), who show that, under certain conditions, changing the phase of the odd harmonics alone (by adding a constant $\pi/2$) can double the perceived pitch.

It is worth noting, however, that phase seems to influence the perceived pitch only when the harmonics making up the stimulus are not resolved by the peripheral auditory system (Moore & Glasberg, 1989; Bilsen, 1973). We have already seen above that resolved harmonics are required to compute the frequency of each component. This model therefore appears to be usable only for coding the pitch of harmonics resolved by the peripheral auditory system. This limitation does not exclude it from the set of models that the auditory system might use. It would, however, then take at least two distinct models: this one for resolved harmonics and a second one for unresolved harmonics.

2-1-2-Terhardt's model (Terhardt, 1972a,b; 1978)

There are other candidates for coding the pitch derived from spectrally isolated harmonics, in other words harmonics resolved by the auditory periphery. The main rival of Goldstein's model was developed by Terhardt. He assumes that learning during childhood allows us to associate, by memorization, each tonal pitch, that is, each pure tone of frequency F, with a set of potential virtual pitches ($F/k$ with $k \in [1,+\infty[$). When a set of harmonics is presented, each one then evokes a set of possible fundamental frequencies. The one retained is the one shared by all.
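The coincidence principle just described can be illustrated with a short sketch. It is a toy version under explicit assumptions, not Terhardt's algorithm: the function name `subharmonic_pitch`, the limit `max_div` on the divisor k, the relative tolerance `tol`, the lower bound `f_min` and the rule that the highest candidate wins when several are shared by all components are hypothetical choices made for the example.

```python
import numpy as np

def subharmonic_pitch(components, max_div=12, tol=0.02, f_min=50.0):
    """Return the candidate F/k supported by the most components, and its vote count."""
    components = np.asarray(components, dtype=float)
    divisors = np.arange(1, max_div + 1)
    # Each row holds the candidate fundamentals F/k evoked by one component.
    sub = components[:, None] / divisors[None, :]
    candidates = sub.ravel()
    candidates = candidates[candidates >= f_min]
    best_f, best_votes = None, -1
    for f in np.sort(candidates)[::-1]:          # highest candidates first
        # A component votes for f if one of its F/k lies within tol (relative) of f.
        votes = int(np.any(np.abs(sub - f) <= tol * f, axis=1).sum())
        if votes > best_votes:                   # strict: ties keep the higher f
            best_f, best_votes = float(f), votes
    return best_f, best_votes

# Harmonics 3, 4 and 5 of 200 Hz (the fundamental itself is absent).
pitch, votes = subharmonic_pitch([600.0, 800.0, 1000.0])
print(pitch, votes)
```

For these three components the only candidates shared by all are 200 Hz and 100 Hz. The tie-breaking rule assumed here retains 200 Hz.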

This learning hypothesis stems from the realistic idea that harmonic complex tones, which are extremely common in our natural environment and contribute largely to the signals used in spoken communication (vowels are harmonic complex tones), may, with training, come to be perceived as a sound entity in their own right rather than as an aggregate of pure tones of different frequencies. The existence of such training is supported by several studies. For example, some work has shown that infants acquire this ability from the 6th or 7th month (Bundy et al., 1982; Montgomery & Clarkson, 1997). In addition, Hall & Peters (1984) and Peters & Hall (1984) showed that the pitch evoked by an inharmonic complex tone could be influenced by prolonged association of that tone with a harmonic complex tone.

2-1-3-Conclusions

This section has presented the two main spectral models of pitch perception. Along the way, criticisms based on experimental results were raised. These models fail to explain all of the experimental data and have therefore been targeted by some authors (Hartmann & Doty, 1996; Martens, 1983). Some recent work nevertheless lends renewed credit to spectral models (Brunstrom & Roberts, 1998; Lin & Hartmann, 1998). These authors assume that pitch is computed by comparing the spectrum of the complex tone with a set of templates (a collection of spectra) that have previously been associated with a pitch. Other authors even provide potential physiological bases for pitch extraction (Fishmann et al., 1998). According to them, virtual pitch could be computed using the tonotopic properties of the primary auditory cortex.

The proponents of these models nevertheless acknowledge that certain phenomena (in particular the effect of phase) are difficult to explain. It therefore seems to me wiser to consider the possible coexistence of several pitch-extraction mechanisms, activated according to the stimulation conditions (the resolvability of the complex signal) and giving rise to a single sensation: virtual pitch.

2-2-"Non-spectral" coding

The idea of non-spectral coding was introduced by Schouten in 1940. His theory is the following: at the output of the peripheral filtering stage, several harmonics often interact within certain filters (when the harmonics are unresolved). Now, the overall period of several interacting harmonics is exactly the inverse of the fundamental frequency. According to Schouten, the fundamental frequency is therefore extracted from the excitation pattern resulting from the interaction of at least two harmonics. This type of coding is called "non-spectral" because it does not exploit cochlear tonotopy. However, the theory was quickly called into question. Two essential results weaken Schouten's hypothesis: first, it was verified as early as 1973 that a set of harmonics that are all resolved can give rise to a sensation of virtual pitch (Bilsen, 1973). Second, it was even shown that resolved harmonics are the "strongest", or the most important, for accurate pitch coding (Moore et al., 1985; Ritsma, 1967).

The theory was therefore set aside. It was later revived, however, for two main reasons:

1-First, Plomp (1964) showed that a complex tone made up of two harmonics close enough to fall within the same critical band could still have a pitch. He concluded that pitch computation must be based on periodicity rather than on frequency (Plomp, 1967).

2-Second, Burns and Viemeister (1976) showed that amplitude-modulated white noise could give rise to a sensation of pitch. Since the long-term spectrum of such a signal is flat, any spectral model was bound to fail.

Thus, more recently, a set of very successful models building on Schouten's initial idea has been developed. These models are the subject of the section below.

2-3-Pitch coding by autocorrelation.

The models of this class (Meddis & Hewitt, 1991a,b; Bilsen & Ritsma, 1970; Brown & Puckette, 1989; Slaney & Lyon, 1990; de Cheveigné, 1993, 1998) all derive from the one devised by Schouten, in that they perform a global analysis of the temporal excitation patterns at the output of the auditory periphery. However, the outputs of all the critical bands are used to derive the pitch, even those that contain only a single harmonic.

2-3-1-The notion of autocorrelation

This section begins with a brief reminder of the mathematical notion of autocorrelation.

Let s be a sampled signal containing N samples; the cross-correlation $\Phi$ of this signal s with itself is its autocorrelation, written $\Phi_{ss}$, and is equal to:

$$\Phi_{ss}(k) = \frac{1}{N-k} \sum_{n=0}^{N-k} s(n)\,s(n+k)$$

This autocorrelation is maximal at 0 and, for periodic signals, it has maxima that recur with the same period as the signal, that is, at lags that are multiples of the signal's period. The figure below illustrates the computation of the autocorrelation coefficient at a particular lag of k samples.
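The definition above translates directly into code. The following sketch is illustrative only: the sampling rate, the test signal (three harmonics of 200 Hz) and the lag range searched are arbitrary choices made for the example. With zero-based arrays of N samples, the sum runs over the N-k overlapping samples.

```python
import numpy as np

def autocorrelation(s, max_lag):
    """Phi_ss(k) = 1/(N-k) * sum_n s(n) s(n+k), for k = 0..max_lag."""
    s = np.asarray(s, dtype=float)
    n = len(s)
    # s[:n-k] and s[k:] are the N-k overlapping samples of s and of s shifted by k.
    return np.array([np.dot(s[: n - k], s[k:]) / (n - k) for k in range(max_lag + 1)])

fs = 8000                                   # sampling rate in Hz (arbitrary)
t = np.arange(800) / fs                     # 100 ms of signal
# Harmonics 1 to 3 of 200 Hz: the period is 5 ms, i.e. 40 samples at 8 kHz.
s = sum(np.sin(2 * np.pi * 200 * h * t) for h in (1, 2, 3))

phi = autocorrelation(s, max_lag=60)
phi = phi / phi[0]                          # normalise by Phi_ss(0)
best_lag = 20 + int(np.argmax(phi[20:]))    # skip the main peak around lag 0
print(best_lag, fs / best_lag)
```

The first maximum away from lag 0 falls at a lag of one period, so the period of the signal can be read off the autocorrelation function.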

Fig 7: Computing an autocorrelation. The signal s at the top of the figure is multiplied by the same signal s shifted in time by k samples, which gives the signal at the bottom. The mean value of this product signal between k and N (shown in the figure by the dotted horizontal line) gives the autocorrelation coefficient at lag k ($\Phi_{ss}(k)$). This coefficient is often normalized by $\Phi_{ss}(0)$.

2-3-2-A physiological autocorrelation?

Let us now look at the attempts made in the literature to find a physiological basis for correlation-based processing of an auditory signal.

Two main authors have developed models based on a correlation computation, with two different aims. An excellent summary of this work and of its implications can be found in de Cheveigné (1999).

Jeffress (1948) developed a model of the spatial localization of sound sources based on a cross-correlation between the stimuli received at the listener's right and left ears. The bases and physiological sites of this cross-correlation have been determined anatomically, in particular in the cat (e.g. Schwartz, 1992; Smith et al., 1993) and the owl (e.g. Konishi et al., 1988; Irvine, 1992). Without going into detail, two pathways, one ipsilateral and the other contralateral, originating from the spherical cells of the cochlear nuclei, are thought to converge on the medial superior olive. The authors cited above showed that the contralateral pathway is arranged so as to introduce a gradient of delays, whereas the delay of the ipsilateral pathway is fixed.

Licklider (1956) pioneered the idea, widely taken up in the later literature, that the sensation of pitch arises from an autocorrelation computation that determines the temporal period of a periodic auditory signal (a harmonic complex tone, for example). So far there has been no anatomical confirmation of the physiological reality of this model, except perhaps the study by Casseday & Covey (1995), which shows that the structure of the ventral nucleus of the lateral lemniscus in the bat is suited to implementing a "neural autocorrelation". However, although this hypothesis still has little neurophysiological support, few alternatives seem more plausible. Moreover, the model accounts for enough psychoacoustic and neurophysiological data (e.g. Meddis & Hewitt, 1991a,b; Cariani & Delgutte, 1996) to be favoured by a large number of researchers.

2-3-3-"Autocorrelation" models.

Many authors have contributed to the development of pitch-coding models based on autocorrelation (Schouten, 1940; Meddis & Hewitt, 1991a,b; Bilsen & Ritsma, 1970; Brown & Puckette, 1989; Slaney & Lyon, 1990; Meddis & O'Mard, 1997). The most advanced, in my opinion, and undoubtedly the most cited in this field is that of Meddis & Hewitt (1991), improved and validated by later work (Meddis & O'Mard, 1997). I have therefore chosen to explain in detail the successive stages of this model and then, drawing on the literature, to compare it with existing psychoacoustic data.

2-3-3-1 The Meddis & Hewitt model (1991a,b).

The Meddis & Hewitt model can be broken down into five main stages, illustrated in figure 8:

Fig 8: Figure taken from Meddis & Hewitt (1991a) showing the successive stages of the pitch-coding model they developed.

• Simulation of the transfer function of the outer and middle ear (Fig 8: Stages 1,2).

A simple band-pass filter was used. Its difference equation is:

$$y_i = 0.8878\,x_i - 0.8878\,x_{i-2} - 0.2243\,y_{i-1} + 0.7757\,y_{i-2}$$

• Filtering through a filter bank simulating the peripheral filtering of the cochlea (Fig 8: Stage 3).

128 auditory filters of the "Gammatone" type (Patterson et al., 1988) were regularly spaced between 80 Hz and 8 kHz. This stage therefore yields 128 excitation patterns, relatively close to those given by the formula of Moore & Glasberg (1987) and Glasberg & Moore (1990). These patterns are computed by convolving the input signal with the impulse response gt(t) given by the gammatone function below.

$$g_t(t) = a\,t^{\,n-1}\,e^{-2\pi b\,\mathrm{ERB}(f_c)\,t}\,\cos(2\pi f_c t + \phi)$$

The ERB function gives the width of a critical band at a given centre frequency $f_c$ (Moore, 1986; Glasberg & Moore, 1990; Greenwood, 1961). With $f_c$ and ERB both in Hz, it is equal to:

$$\mathrm{ERB}(f_c) = 24.7 + 0.108\,f_c$$

a, b, n, $f_c$ and $\phi$ are the parameters of the function (Patterson, 1976; Patterson et al., 1995).

• Simulation of the neural transduction mechanisms (Fig 8: Stages 4,5).

This stage of the model converts the function describing the motion of the basilar membrane into a probabilistic function describing the firing rate in the post-synaptic auditory nerve. This conversion is described in detail in Meddis (1986, 1988).

• Autocorrelation in each filter to detect the periodicity of each excitation pattern (ACF) (Fig 8: Stage 6).

The autocorrelation function proposed by Licklider (1951, 1956, 1959, 1962) is then applied to the 128 bands to determine the periodicity of each of them. One can then plot an "autocorrelogram" such as the one in figure 9. An autocorrelogram is a representation of a set of autocorrelation functions. The model parameter that distinguishes these functions is, equivalently, the index of the band or its centre frequency.

Fig 9: This figure, taken from Meddis & O'Mard (1997), gives four examples of autocorrelograms. All the simulations were run with harmonic complex tones with a fundamental frequency of 100 Hz.

The four panels nevertheless represent four different stimulation conditions.

-Panels a and b compare the results obtained with two different band-pass filters called LOW and HIGH (with cutoff frequencies of 125-625 Hz and 1375-1875 Hz, respectively). All harmonics are in sine phase here (if $\varphi_n$ is the phase of the harmonic of rank n, then $\forall n \in [1;+\infty[$, $\varphi_n = 0$).

-Panels c and d likewise show the results obtained in the two regions LOW and HIGH, but with components in alternating phase (if $\varphi_n$ is the phase of the harmonic of rank n, then $\forall n \in [1;+\infty[$, $\varphi_{2n} = 0$ and $\varphi_{2n+1} = \pi/2$).

The ordinate of these graphs gives the centre frequency of the auditory filter used in the model. Each panel shows 60 autocorrelation functions of 60 excitation patterns computed at 60 different centre frequencies. The summary autocorrelation function (SACF), which is the sum of all these autocorrelation functions, is shown below each panel.

• Comparison of these periodicities to extract the common periodicity corresponding to the pitch (SACF) (Fig 8: Stages 7,8).

Once all the autocorrelation functions have been computed, Meddis & O'Mard (1997) propose to add them together to obtain the SACF. Since each autocorrelation function gives the periodicity or periodicities at the output of one filter, the sum of these functions gives the periodicity or periodicities that are common, or at least most common, to the whole set of auditory filters.

Still according to these authors, the most common periodicity corresponds to the period of the perceived pitch. If this periodicity is not unique, there is ambiguity and two pitches may be perceived by the listener.

Figure 9 shows four example computations. A common periodicity of 10 ms (100 Hz) is easily derived from the SACF in examples a, b and c. In example d, however, there is ambiguity: two periods emerge from the model, 5 ms (200 Hz) and 10 ms (100 Hz). According to the model, the perceived pitch can therefore take either of these two values.

It is interesting to note that Meddis & O'Mard (1997) use the same model to explain the discriminability of two pitches. According to them, it suffices to measure the squared Euclidean distance (D²) between the SACFs computed for the two sounds. The discriminability of two complex tones is then proportional to D².
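The chain of stages listed above can be sketched, in a heavily simplified form, as follows. This is not the implementation of Meddis & Hewitt, and every simplification is an assumption made for the example: the outer and middle ear filter is omitted, only 12 gammatone channels are used instead of 128, the hair-cell stage is replaced by a plain half-wave rectification, the gammatone order n = 4 and the factor b = 1.019 are conventional values that are not given in the text, each filter is normalised to unit gain at its centre frequency, and all function names are hypothetical.

```python
import numpy as np

FS = 16000  # sampling rate in Hz (illustrative choice)

def erb(fc):
    """Critical bandwidth from the text: ERB(fc) = 24.7 + 0.108 fc (Hz)."""
    return 24.7 + 0.108 * fc

def gammatone_ir(fc, dur=0.05, n=4, b=1.019, a=1.0, phi=0.0):
    """Gammatone impulse response; n and b are assumed, conventional values."""
    t = np.arange(int(dur * FS)) / FS
    g = a * t ** (n - 1) * np.exp(-2 * np.pi * b * erb(fc) * t) * np.cos(2 * np.pi * fc * t + phi)
    gain = np.abs(np.sum(g * np.exp(-2j * np.pi * fc * t)))  # response at fc
    return g / gain                                          # unit gain at fc (assumption)

def harmonic_complex(f0, ranks, dur=0.2):
    """Sum of equal-amplitude harmonics in sine phase."""
    t = np.arange(int(dur * FS)) / FS
    return sum(np.sin(2 * np.pi * f0 * r * t) for r in ranks)

def acf(x, max_lag):
    """Autocorrelation 1/(N-k) * sum_n x(n) x(n+k) for k = 0..max_lag."""
    n = len(x)
    return np.array([np.dot(x[: n - k], x[k:]) / (n - k) for k in range(max_lag + 1)])

def sacf(signal, centre_freqs, max_lag, skip=0.05):
    """Filter, half-wave rectify and autocorrelate each channel, then sum the ACFs."""
    start = int(skip * FS)                    # drop the onset transient of the filters
    total = np.zeros(max_lag + 1)
    for fc in centre_freqs:
        channel = np.convolve(signal, gammatone_ir(fc))[: len(signal)]
        rate = np.maximum(channel, 0.0)       # crude stand-in for the hair-cell stage
        total += acf(rate[start:], max_lag)
    return total

centre_freqs = np.geomspace(150.0, 1000.0, 12)   # far fewer channels than the model's 128
max_lag = 200                                     # 12.5 ms
s_a = sacf(harmonic_complex(100.0, range(2, 7)), centre_freqs, max_lag)
s_b = sacf(harmonic_complex(102.0, range(2, 7)), centre_freqs, max_lag)

min_lag = 40                                      # 2.5 ms, to skip the peak at lag 0
pitch_lag = min_lag + int(np.argmax(s_a[min_lag:]))
pitch_hz = FS / pitch_lag
d2 = float(np.sum((s_a - s_b) ** 2))              # squared Euclidean distance D^2
print(pitch_lag, pitch_hz, d2)
```

For harmonics 2 to 6 of 100 Hz, the largest SACF peak away from lag 0 should fall at or next to 160 samples (10 ms), even though the fundamental itself is absent from the stimulus. The last line computes the D² measure between the SACFs of two tones whose F0s differ by 2%.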

2-3-3-2- Comparison of the model with psychoacoustic data.

Let us now compare this model with the many psychoacoustic data in the literature. This comparison, which raises the question of the model's validity, is still a matter of lively controversy among the teams working on the subject (see the note by Carlyon (1998) in reply to Meddis & O'Mard (1997)). One aim of this thesis is to help clarify this debate.

Each of the following sections deals with a particular psychoacoustic phenomenon observed experimentally and compares it with the model above.

2-3-3-2-1- The pitch of complex tones made up of harmonics in sine or alternating phase.

It was noted earlier that, under certain conditions, the phase relations between the harmonics of a complex tone can alter its pitch.

Thus, Bilsen (1973) showed that phase did not influence the pitch of a complex tone A made up of all its harmonics, and that this pitch was identical to that of a tone B made up of only two harmonics of rank below 8. If, on the other hand, the rank of the two harmonics making up tone B is above 8, phase does affect the pitch of that tone.

The most thorough work on the influence of phase is, in my opinion, that of Shackleton & Carlyon (1994). By way of example, this work is described in detail below.

Fig 10: Meddis & O'Mard (1997). Comparison of two complex tones made up of the first 10 harmonics of 100 Hz, shown in the time domain. All the harmonics of the upper tone are in sine phase (if $\varphi_n$ is the phase of the harmonic of rank n, then $\forall n \in [1;+\infty[$, $\varphi_n = 0$), and the harmonics of the lower tone are in alternating phase ($\forall n \in [1;+\infty[$, $\varphi_{2n} = 0$ and $\varphi_{2n+1} = \pi/2$).

It is striking, and clearly visible in figure 10, that switching to alternating phase halves the period of the temporal envelope of a harmonic complex tone (the peaks of the waveform occur twice as often). Moreover, Shackleton & Carlyon (1994) showed that this property still holds after passage through a cochlear filter, provided that the centre frequency of the filter is high enough relative to the fundamental frequency of the incoming sound for several harmonics to interact within the filter (panels b & d in figure 11). If, on the contrary, the filter is narrow enough for a single harmonic to be isolated in it, the property no longer holds (panels a & c in figure 11).
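The halving of the envelope period can be checked numerically. The sketch below is an illustration under stated assumptions: the cochlear filter is replaced by simply keeping harmonics 20 to 30 of 100 Hz (a crude stand-in for a high-frequency region where several harmonics interact), the envelope is taken as the modulus of the analytic signal built directly from the components, and the function names are hypothetical. It compares the envelope with a copy of itself shifted by half the period of the fundamental.

```python
import numpy as np

def envelope(ranks, phases, f0, fs):
    """Envelope over one period of F0 of a sum of sin(2*pi*f0*r*t + phase)."""
    t = np.arange(int(round(fs / f0))) / fs
    # sin(x + p) is the real part of exp(j * (x + p - pi/2)).
    z = sum(np.exp(1j * (2 * np.pi * f0 * r * t + p - np.pi / 2))
            for r, p in zip(ranks, phases))
    return np.abs(z)

def circular_corr(e, lag):
    """Correlation coefficient between e and e circularly shifted by lag samples."""
    e = e - e.mean()
    return float(np.dot(e, np.roll(e, lag)) / np.dot(e, e))

f0, fs = 100.0, 40000                       # one period of F0 = 400 samples
ranks = list(range(20, 31))                 # high-rank harmonics only (assumption)
sine_ph = [0.0] * len(ranks)                # sine phase: all phases equal to 0
alt_ph = [0.0 if r % 2 == 0 else np.pi / 2 for r in ranks]   # alternating phase

half_period = int(round(fs / f0)) // 2      # 200 samples = 5 ms
corr_sine = circular_corr(envelope(ranks, sine_ph, f0, fs), half_period)
corr_alt = circular_corr(envelope(ranks, alt_ph, f0, fs), half_period)
print(corr_sine, corr_alt)
```

In alternating phase the envelope is essentially unchanged by a shift of half a period, so its correlation with the shifted copy is close to 1. In sine phase it is not, so that correlation is much lower.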

Fig 11: Shackleton & Carlyon (1994). Output of two cochlear filters centred on 250 Hz (a & c) and 4600 Hz (b & d) in response to a complex tone of fundamental frequency 1250 Hz with its harmonics in sine phase (a & b) or in alternating phase (c & d).

This observation led Shackleton & Carlyon to ask what pitch would be evoked by the sounds described above. Their results are shown in figures 12 and 13.

Fig 12: Shackleton & Carlyon (1994). Ratio between the fundamental frequencies of two complex tones A and B judged by the listener to have the same pitch. Tones A and B are filtered in three different frequency regions, called LOW, MID and HIGH, with cutoff frequencies of 125-625 Hz, 1375-1875 Hz and 3900-5400 Hz, respectively. The region and the fundamental frequency are indicated in each panel.

Fig 13: Shackleton & Carlyon (1994). Here the listener is asked to indicate whether a complex tone of fundamental frequency F0 (given on the abscissa) in alternating phase sounds more like a sine-phase complex tone of fundamental frequency F0 or like the same tone with fundamental frequency 2F0. The ordinate of these graphs gives the difference between the percentages of these two possible responses. The three panels correspond to different filtering conditions (LOW, MID and HIGH).

This remarkable experiment thus reveals that, depending on the fundamental frequency and on the filtering region, the pitch of an alternating-phase complex tone of fundamental frequency F0 can be either F0 or 2F0.

The authors estimated that this pitch depends on the average number of harmonics per critical band. If the complex tone is such that fewer than 2 harmonics are present, on average, per auditory filter, its pitch will be F0. If, on the other hand, there are on average more than 3.25 harmonics per filter, its pitch will be 2F0. They then introduced a nomenclature that has been widely adopted since (Carlyon, 1998; Micheyl & Carlyon, 1998; Plack & Carlyon, 1995; Carlyon, 1996a,b). Alternating-phase tones whose pitch is F0 (fewer than 2 harmonics/filter) are called "resolved" and those whose pitch is 2F0 (more than 3.25 harmonics/filter) are called "unresolved". This definition, based on the number of harmonics per filter, leaves room (between 2 and 3.25) for tones that are neither fully resolved nor fully unresolved. The pitch of these tones is ambiguous, sometimes F0 and sometimes 2F0. This explains the scores close to 0 in panel 2 of figure 13.

Let us now look at the predictions that the model of Meddis & O'Mard (1997) allows us to make.

Fig 14: Meddis & O'Mard (1997). This figure gives examples of SACFs computed for different complex tones filtered in three distinct frequency regions (LOW, MID and HIGH), corresponding to the three columns of the figure. Each row represents one stimulation condition: Row 1: F0=150 Hz, sine phase. Row 2: F0=150 Hz, alternating phase. Row 3: F0=300 Hz, sine phase.

For clarity, let us number the panels of figure 14 from left to right and from top to bottom. Panel 1 then corresponds to a complex tone with an F0 of 150 Hz, filtered in the LOW region, with its harmonics in sine phase. Panel 2 corresponds to the same tone filtered in the MID region. Panels 1, 4, 7 and 8 then correspond to resolved conditions and panels 3 and 6 to unresolved conditions. The figure shows very clearly that the model predicts a pitch of 150 Hz for the first 5 panels and a pitch of 300 Hz for panels 7, 8 and 9. Remarkably, panel 6, which corresponds to an unresolved complex tone of F0=150 Hz with harmonics in alternating phase, predicts a pitch of 300 Hz, or at least a strong pitch ambiguity between F0 and 2F0.

Here the model therefore meets the requirements of the psychoacoustic data. This point is not disputed by any author, and even the research teams most critical of Meddis's model acknowledge that it behaves well in this experimental situation (Carlyon, 1998).

2-3-3-2-2- Pitch discrimination of harmonic complex tones.

The second point, which I turn to now, is on the contrary still strongly debated. It is nevertheless particularly important, since three of the five studies in this document consist precisely in measuring F0 discrimination thresholds.

First of all, it is essential to be convinced that the discrimination of two complex tones depends on the difference between their two virtual pitches, and is independent of the discrimination of their individual harmonics. Moore & Glasberg (1990) demonstrated this result and we shall take it for granted from now on.

The aim of this section is to explain the data of Shackleton & Carlyon (1994), which show better pitch discrimination thresholds for resolved complex tones than for unresolved ones. To perceive a pitch difference between two successive complex tones, a smaller difference in fundamental frequency (FDL) is thus needed for resolved tones (of the order of 1% of F0) than for unresolved ones (of the order of 3% of F0). These data are summarized in the figure below, taken from Carlyon (1998).

Figure 15: Carlyon (1998). Pitch discrimination thresholds for complex tones with a nominal fundamental frequency of 88 or 250 Hz, filtered in the LOW, MID and HIGH regions. For ease of comparison, thresholds are expressed as a percentage of the nominal fundamental frequency. The 88-LOW, 250-LOW and 250-MID conditions are resolved, and the 88-MID, 88-HIGH and 250-HIGH conditions are unresolved.

In 1997, Meddis & O'Mard extended the original model of Meddis & Hewitt (1991) with a "difference" function that provides a measure of the discriminability of two harmonic complex tones. It simply consists in examining the difference between the SACFs computed for each of the two tones. The larger this difference, the more easily the tones are discriminated (recognized as different) by the listener. Figure 16 gives a concrete example of this procedure.

Figure 16: Meddis & O'Mard (1997). SACFs (left column) and "difference" functions (right column) for complex signals with F0s of 100 and 102 Hz, filtered in the LOW (top row) and HIGH (bottom row) regions.

In this figure, the bottom condition corresponds to tones that are unresolved, and the top condition to tones that are resolved, by the peripheral auditory system. According to the authors of that article, this can be recognized in particular from the shape of the SACFs, which are sharp in the resolved condition (top) and much less so in the unresolved condition (bottom). Likewise, the resulting difference functions are much larger in the resolved condition (top right) than in the unresolved one (bottom right). The authors conclude that the poor discriminability of two unresolved tones is directly attributable to "difference" functions that are blurred and small.

For these authors, the elevation of discrimination thresholds in the unresolved condition is therefore due to the shape of the SACFs representing the two tones.

This hypothesis has nevertheless been sharply criticized by Carlyon (1998), who suggests that the shape of the SACF of a signal is directly linked to the region in which the signal is filtered and not to its resolvability. The discriminability of a signal, by contrast, is linked to its resolvability per se and not to the envelope of its spectrum (Shackleton & Carlyon, 1994; Carlyon & Shackleton, 1994).

Figure 17: Carlyon (1998). SACFs of complex tones filtered in three spectral regions, LOW (125-625 Hz), MID (1375-1875 Hz) and HIGH (3900-5400 Hz), with F0s set to 88 Hz and 250 Hz.

The figure above thus provides a counterexample to the idea of Meddis & O'Mard. SACFs 1, 2 and 4 correspond to resolved conditions, whereas SACFs 3, 5 and 6 represent unresolved tones. It is easy to see that the difference in their shape can explain poorer discriminability for tones 5 and 6 (unresolved) than for tones 1 and 2 (resolved). However, this difference cannot explain a difference in discriminability between tones 3 (unresolved) and 4 (resolved), since their SACFs have the same overall shape. Yet the poorer discriminability of tone 3 relative to tone 4 has been demonstrated experimentally (see fig 15).

2-3-3-2-3- Pitch of a complex tone with one harmonic shifted in frequency.

Figure 18: Meddis & O'Mard (1997). Pitches perceived by the listeners in the experiment of Darwin et al. (1994), or predicted by the model (with different time constants), when the 4th harmonic of a complex tone of fundamental frequency 150 Hz is shifted in frequency.

The last psychoacoustic phenomenon that the model can predict is a shift in the pitch perceived by a listener (Roberts & Brunstrom, 1998; Darwin et al., 1994) as the frequency of one of the harmonics of a complex tone is progressively shifted. The rank of this harmonic should preferably be low. The pitch is then "pulled" in the direction of the shift up to a maximum, beyond which it returns to its initial value.

To my knowledge, neither this result nor the model's ability to predict it has been called into question. Finally, note that this result, which already leads us towards auditory scene analysis, will be taken up again from that new point of view in the second part of this literature review.

The Meddis model can also explain a whole range of results that we shall not discuss in detail, since they lie outside our field of investigation. For example, among others, the model can explain the sensation of pitch evoked by the seamless repetition of a segment of white noise (Meddis & Hewitt, 1991a; Bilsen & Ritsma, 1970; Wiegrebe et al., 1998; Yost, 1996), or that evoked by amplitude-modulated noise (Meddis & Hewitt, 1991a).

2-3-3-2-4- Critique du modèle autocorrélatif de Meddis &

Hewitt (1991).

The aim here is not to question the power of this model. It nevertheless seems worthwhile to
set out the main criticisms levelled at it in the literature.

First of all, it must be understood that this model excludes any other. Its authors intend it to
explain every phenomenon related to pitch perception. It may be improved to take new
experimental data into account, but one should never need to substitute another model for it
(for example, one of the spectral models presented earlier) to address a particular problem.
The title of Meddis and O'Mard's 1997 article, "A unitary model of pitch perception", makes
this explicit: the model is meant to be universal. The main opponents of the model object
precisely to this unitary claim. Only a few authors doubt that an autocorrelation process could
be used by the auditory system to extract the pitch of a sound (Kaernbach & Demany, 1998).
The others think that this type of model could well be used under certain constraints, but that
other strategies must sometimes come into play. They therefore believe that we have several
pitch-coding strategies which nonetheless give rise to a unified sensation. Carlyon &
Shackleton (1994) obtained data suggesting the existence of several strategies. This crucial
question is at the heart of studies 1 and 2 of this document. Both studies provide evidence in
favour of the dual hypothesis of Carlyon & Shackleton (1994).

The criticisms and remarks on this model that I now review fall into three categories. First, I
present the arguments in favour of several models being available to the auditory system for
extracting the pitch of a harmonic complex tone. Second, I give a very brief review of the
neurophysiological evidence available to refute or, on the contrary, confirm this theory.
Finally, I report the intrinsic criticisms the model has faced.

1-Argument in favour of the existence of several pitch-coding strategies.


Figure 19: Carlyon & Shackleton (1994). Ability of subjects, expressed as d', to compare the
pitch of two harmonic complex tones. In the left panel, the F0s of the two tones differ by 3.5%;
in the right panel, by 7.1%. The larger d', the better the subjects detected the pitch
difference. The letter pairs R-R indicate that both tones to be compared are resolved; U-U
indicates that both are unresolved. Results for mixed resolvability conditions (R-U) are also
shown. The abscissa gives the spectral region of each tone of the pair to be compared.

The figure above, taken from their article, shows that subjects have particular difficulty
comparing the pitch of two complex tones when one is resolved and the other unresolved.
Comparing the pitch of two tones that are both resolved or both unresolved is comparatively
easier. The authors explain these data by suggesting the existence of two distinct pitch-coding
strategies, one specialised for resolved complex tones and the other for unresolved complex
tones.
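For readers unfamiliar with the sensitivity index d' used in Figure 19, the following minimal sketch shows the textbook yes/no definition, d' = z(hit rate) - z(false-alarm rate). It is given only as a reminder of what the index measures; it is not necessarily the exact computation used by Carlyon & Shackleton (1994), whose task design may call for a different conversion. The function name and the example rates are hypothetical.

```python
from statistics import NormalDist

def d_prime(hit_rate, false_alarm_rate):
    """Yes/no sensitivity index: difference of the z-transformed rates.
    Both rates must lie strictly between 0 and 1."""
    z = NormalDist().inv_cdf          # inverse of the standard normal CDF
    return z(hit_rate) - z(false_alarm_rate)

# Hypothetical rates, chosen only to illustrate the scale of the index.
chance_level = d_prime(0.5, 0.5)       # no sensitivity
good_listener = d_prime(0.8413, 0.1587)
print(chance_level, good_listener)
```

A d' of 0 corresponds to chance performance; larger values indicate that the pitch difference was detected more reliably.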


2-Physiological data.

At the physiological level, the debate is no further advanced. Evans (1978) explains that the
information carried by the auditory nerve can be used for coding pitch either by place or by
periodicity. More recent work (Langner, 1997; Langner et al., 1997) exploring the physiology
of the auditory cortex (A1) shows an orthogonal representation of pitch and frequency (i.e. of
periodotopy and tonotopy) in A1. This leaves room for any hypothesis about the extraction
process. Finally, only a few studies, such as that of Langner & Schreiner (1988) or those of
Schulze & Langner (1997a,b), tend to favour a temporal model.

To add to the confusion, Steinschneider et al. (1998) provide physiological arguments for the
presence of two pitch-encoding mechanisms in A1, depending on the resolvability of the
incoming sound.

The most informative work in this area is probably that of Cariani and Delgutte (Cariani &
Delgutte, 1996a,b). These authors recorded, in the cat auditory nerve, the spike trains evoked
by various stimuli that give rise to a pitch sensation in humans. From these recordings, they
computed the distribution of intervals between two consecutive spikes (first-order intervals)
and that of intervals between both consecutive and non-consecutive spikes (all-order
intervals). The latter distribution corresponds, schematically, to an autocorrelation of the
spike train. They concluded that pitch and its properties could largely be predicted from the
distribution of all-order interspike intervals. The distribution of first-order intervals depends
on intensity and is therefore less suitable for a satisfactory prediction of pitch. These results
thus provide strong physiological evidence in favour of an autocorrelation model and run
counter to the suggestions of some proponents of a spectral model of pitch coding. The latter,
such as Srulovicz & Goldstein (1983), assume the existence, at a central level, of an internal
spectrum from which pitch is extracted. According to them, this spectrum originates in the
filtering of the spike trains of each auditory fibre. Within this theory, the distribution of
first-order intervals should therefore be predominant for pitch extraction.
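The difference between the two interval distributions can be made concrete with a minimal sketch. It assumes spike (or click) times given as integers in milliseconds; the function names and the example trains are hypothetical and are not data from Cariani & Delgutte. First-order intervals are taken between consecutive events only, whereas all-order intervals are taken between every pair of events up to a maximum lag, which is what relates the second histogram to an autocorrelation of the train.

```python
from collections import Counter

def first_order_intervals(spike_times):
    """Histogram of intervals between consecutive spikes only."""
    times = sorted(spike_times)
    return Counter(later - earlier for earlier, later in zip(times, times[1:]))

def all_order_intervals(spike_times, max_interval):
    """Histogram of intervals between all spike pairs, up to max_interval."""
    times = sorted(spike_times)
    hist = Counter()
    for i, t_i in enumerate(times):
        for t_j in times[i + 1:]:
            if t_j - t_i > max_interval:
                break                      # times are sorted: later pairs are longer
            hist[t_j - t_i] += 1
    return hist

regular_ms = [0, 5, 10, 15, 20]            # hypothetical perfectly periodic train
irregular_ms = [0, 10, 15, 20]             # same train with one spike missing

print(first_order_intervals(regular_ms), all_order_intervals(regular_ms, 20))
print(first_order_intervals(irregular_ms), all_order_intervals(irregular_ms, 20))
```

With the missing spike, the 10 ms interval appears once in the first-order histogram but twice in the all-order histogram, which illustrates how the two statistics can weight the same periodicity differently.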

However, these results have very recently been called into question by an elegant
psychoacoustic protocol. This study and its implications are detailed in the following
paragraph.

3-Questioning the physiological reality of a "mathematical" autocorrelation; a first step
toward a hybrid correlation function more representative of how the auditory system works.

Very recently, Kaernbach & Demany (1998) showed that a classical autocorrelation
computation fails to account for the pitch evoked by stimuli consisting of click trains.

In this experiment, and according to the model of Meddis & Hewitt (1991), when the auditory
system is excited by a click train, the same click train is "recovered" in the auditory nerve.
Moreover, such a click train evokes a weak pitch sensation. One can therefore test directly
whether the pitch salience computed by autocorrelating the stimulus matches the pitch salience
reported by the subject. It turns out that a simple autocorrelation (finding the most common
interclick interval of any order) predicts the results poorly. The autocorrelation computation
would have to be modified so that the detected periodicities correspond only to adjacent
amplitude peaks (finding the most common first-order interclick interval).

So far, the question of whether a single pitch-extraction model (correlative?) operates in all
situations, or whether at least two models exist (one spectral and the other autocorrelative?),
has received no fully satisfactory answer from either psychoacousticians or
neurophysiologists.

This is why we undertook two studies (studies 1 and 2) that aim to provide elements of an
answer to this question using the tools of psychoacoustics.

3-Auditory scene analysis.

The second major part of this work, comprising studies 3, 4 and 5, aims to show the importance
of efficient pitch coding for particular auditory tasks that are both everyday and
indispensable. I focused in particular on the vast field of auditory scene analysis.

What is scene analysis? At every moment, a multitude of sounds and noises reach our ears.
Some are informative while others are not and hinder our perception. In the most common case,
the task is to extract from a sound mixture a particular signal, for example the voice of the
person we are talking to. The other signals are then heard as background noise. A mixture of
several sounds is called an "auditory scene". Our auditory system continuously analyses this
scene to extract its components. The mechanisms involved are complex and numerous, but in
this part we try to give an overview of some of them and to show to what extent the pitch of a
sound can be an important factor in isolating it from a background.

This analysis requires almost no effort. It causes fatigue only in exceptional situations (such
as a "cocktail party") and mostly for people with impaired hearing. Yet it is extremely
complicated. As Bregman (1990) puts it, the task is comparable in difficulty to determining how
many boats are on a lake, of what types and heading in which directions, by observing only the
ripples reaching the shore. The boats stand for the sound sources, which can be of several
kinds (a voice, car noise, music...) and can move (like the car), and the ripples at the shore
stand for the vibrations of the eardrum.

Note that this vast field of study is intrinsically linked to the study of pitch perception,
since this sensation, as we saw in the first part, is evoked by a set of pure tones grouped into
a single auditory stream. Hartmann (1988) thus goes so far as to say that the integration or
segregation of sounds is one of the many aspects of pitch perception.

In the following paragraphs we give a brief overview of the mechanisms that allow us to
analyse sound mixtures, in order to make studies 3, 4 and 5 easier to follow.

3-1-Schema-based analysis


These mechanisms can be grouped into two broad classes. The first class concerns high-level
mechanisms (cortical and subcortical).

It is remarkable that we can memorise, mainly during childhood, groups of sounds with which
we associate a particular meaning. When one of these sounds or groups of sounds reaches us,
we recognise it "globally" as a distinct sound entity, even if it is partially masked by other
sources. Thus each of us knows and easily recognises his or her own first name, and we manage
to isolate and perceive it in the middle of considerable hubbub. This mechanism is called
schema-based analysis (Bregman, 1990). By schema, the authors mean the spectral or temporal
characteristics of a sound that allow the listener to recognise it. This mechanism is automatic
and does not necessarily require the listener's attention. However, the recognition of some
schemas, less familiar or at least less frequent, can be greatly facilitated by attentional
processes conditioned by the semantic context or simply by the situation facing the listener.

These high-level processes were not investigated further during my doctorate. Without
omitting them entirely, I therefore do not go into the detail of these mechanisms.

3-2-Primitive auditory scene analysis.

An important question, however, is how the young child was initially able to build up a
"collection" of schemas. One must then postulate that other mechanisms, probably more
primitive and earlier, also make it possible to analyse auditory scenes. These low-level
mechanisms could then allow the child to build the schemas he or she needs. There is little
doubt that these mechanisms are used daily, prior to any analysis and any recognition.

These processes are called primitive auditory scene analysis (Bregman, 1990).

[Figure 20: sonagram of "GROUPE" + sonagram of "ENTENDRE" = sonagram of "GROUPE + ENTENDRE".]
Fig 20: Top, sonagrams of the words "groupe" (left) and "Entendre" (right); bottom, sonagram
of the mixture "groupe"+"Entendre".

To draw a parallel with vision, Figure 20 shows that, without prior individual knowledge of
each of the visual entities corresponding to a word ("groupe" and "Entendre"), it seems
difficult to separate them when they are presented simultaneously.

We are speaking here of isolating, separating or even segregating different auditory sources.
However, good scene analysis does not only separate simultaneous events; it also groups
together auditory events that belong together. For example, each of the footsteps of someone
walking away from you is not heard in isolation but grouped with the others into a single
source. You can then tell, from the decreasing intensity of this source, that the person is
moving away. Similarly, each word spoken by the person you are talking to belongs to the same
sound source. If this were not the case, when several talkers speak simultaneously you would
have difficulty reconstructing each talker's sentences from the words spoken by all.

Two types of analysis are therefore necessary. The analysis of simultaneous sources is covered
in the first part of this account, and the analysis of sequential sources in the second. The
first of these analyses separates auditory events into what we shall henceforth call auditory
streams. The second groups into the same auditory stream non-simultaneous events that
nevertheless have the same origin.

3-2-1-Analysis of simultaneous sources

The basic principle of primitive auditory scene analysis is simple: sounds that share physical
characteristics are associated with one another. This principle gives rise to similarity rules
that weigh more or less heavily in the decision to group or separate two sounds. A
non-exhaustive list of these principles is given below to clarify the point. By recalling the
rules of auditory grouping, this section allows the fifth study to be read in the new light of
scene analysis.

3-2-1-1-Temporal correlation (principle of common fate)

This rule stems from two observations. First, it is unlikely that two unrelated sounds start
and stop simultaneously. An "old-plus-new" strategy can therefore be effective for detecting
the arrival of a second source in the soundscape and thus separating it from the initial
source.

[Figure 21: two panels showing the masker and the signal as amplitude versus time; labels in
the figure: 20 dB, 60 dB, 30 ms.]
Fig 21: Rasch (1978). This figure illustrates the grouping of simultaneous sounds. A large
release from masking (of the order of 40 dB) results from a 30 ms asynchrony between the
masker and the signal. The phenomenon is explained by the signal and the masker being
assigned to two distinct streams.

In 1978, Rasch carried out one of the experiments that best demonstrates the influence of this
phenomenon. The experiment consists in measuring and comparing the amount of masking
produced by a masker on a signal when the two stimuli are synchronous and when they are
offset by 30 ms. One preliminary point is needed to follow his reasoning. Rasch makes the
fundamental assumption that a large part of the release from masking induced by the
asynchrony is due to the auditory system assigning the masker and the signal to two distinct
auditory streams. The 40 dB release from masking that he observes can therefore, according to
him, be largely attributed to this segregation.


The second observation is that the temporal fluctuations of two distinct sound sources are
unlikely to be correlated. Thus, sounds that are amplitude-modulated in a coordinated way
(comodulated) tend to be grouped within the same source or auditory stream. This is the
phenomenon that psychoacousticians call "comodulation masking release" (CMR), illustrated
in the figure below.

[Figure 22: schematic spectrogram, frequency versus time.]
Fig 22: The masking power of an amplitude-modulated spectral band of noise centred on the
signal (a pure tone) is reduced by adding two other spectral bands that are comodulated with
the first and spectrally remote from it.

The fact that adding energy to an existing masker can produce a release from masking
considerably surprised the psychoacoustics community, and many authors then investigated the
phenomenon (e.g. Hall et al., 1984; Hicks & Bacon, 1995; Bacon et al., 1997).


Ultimately, this is a further argument for the theory of scene analysis. Organisation into
auditory sources makes it possible to group the three noise bands perceptually and thus to
extract the pure tone as a distinct source, which produces a release from masking. It is not
impossible that this type of mechanism (Bregman et al., 1985), which consists in tracking
amplitude modulations, could be used to separate two competing voices.

Similarly, coherent frequency modulation can allow sound objects to be grouped together
(McAdams, 1989), even though this grouping cue seems to rank low in the hierarchy.

Note here that the temporal coherence of the harmonics making up a complex tone could well be
a strong factor of integration.

3-2-1-2- Gradualness of change, continuity and slowness.

A simple continuity rule postulates that the transformations a source undergoes must be slow
and, above all, continuous. All the physical quantities characterising the auditory stream
(frequency, intensity, pitch...) may vary, but if the variation is too abrupt, the auditory
system will conclude that a new stream has been superimposed on the first.

This rule can therefore be applied to each dimension of the sound signal.

Let us first consider the frequency aspect of the question.

When the spectrum of a sound suddenly becomes more complex (cf. Figure 23) while retaining
its initial components, one continues to hear the initial sound plus a new one. For example, a
pure tone can be heard to continue through a temporary burst of white noise in an experimental
situation such as the one shown in Figure 23.

[Figure 23: top panel, intensity (arb) versus time (arb); bottom panel, intensity (arb) versus
time (arb) and frequency (arb).]
Fig 23: Two representations of a signal consisting of a pure tone followed by a white noise,
itself followed by a pure tone identical to the first. The temporal waveform of the signal is
shown at the top. At the bottom, a three-dimensional representation (time, frequency and
intensity) is given. All units are arbitrary.

As for frequency, this also holds for the intensity of the sound. Warren demonstrated this
phenomenon in 1982 and named it homophonic continuity.


[Figure 24: three panels showing intensity (I) versus time (T); the top sound is shown as the
sum of the two bottom sounds.]
Fig 24: The experimental protocol of Warren (1982). When the subject is presented with the
sound shown at the top of the figure (i.e. a sound whose intensity rises abruptly for a short
time), he or she seems to perceive the two sounds shown at the bottom (i.e. a continuous sound
on the one hand and a brief burst of sound on the other).

When the intensity of a sound suddenly rises very abruptly, the listener has the sensation of
a continuous sound of constant intensity to which a second sound has been added.

Finally, the perceived location of sounds must also be continuous to avoid a source being
split into two streams.

Bregman (1991) demonstrates this last point using the protocol in Figure 25.


[Figure 25: three cases, each with two panels labelled D and G (right and left), showing
intensity (I) versus time (T).]
Fig 25: Illustration of the protocol devised by Bregman (1991). In case 1, the subject
perceives a single sound in front. In case 2, a single sound is perceived moving to the right;
in case 3, one sound is perceived in front and a second on the right.

The direction from which sounds arrive nevertheless seems to be a weak cue for grouping or,
conversely, for separating sources. Most often, the sounds that reach us have undergone
multiple reflections, and the initial phase relationships are preserved little or not at all.

In the same vein, note in anticipation that abrupt temporal transitions seem, in study 5, to
favour the discrimination of complex tones embedded in background noise.

Before moving on to sequential analysis, recall that these continuity rules are supplemented
by all those concerning similarity in frequency, timbre and, above all, pitch.


For example, it is remarkable to observe how the pitch of a harmonic complex tone varies as
the frequency of one of its components is increased. The pitch rises progressively until it
reaches a maximum (cf. Fig. 18) and then returns to its initial value (Darwin et al., 1994).
One may suppose that there is a limiting shift beyond which this component is no longer
integrated into the same stream as the others, which are all harmonically related and give
rise to a coherent pitch sensation.

Note also the studies by Bregman & Ahad (1994) and Bregman et al. (1994), which conclude
that pitch-integration mechanisms can be triggered and reset by very abrupt signal rise and
fall times. Once again we see the close link between grouping mechanisms (integration of
harmonics into a single stream) and pitch-integration mechanisms. These two studies are in
fact the starting point of the fifth study of this document.

3-2-2-Analysis of sequential sources: "streaming".

Experiments 3 and 4 of my thesis examine the mechanisms that allow good analysis of auditory
scenes made up of signals that do not coincide in time. This field of investigation is called
the analysis of sequential sources, commonly referred to as "streaming". The term indicates
that non-simultaneous sounds can be perceptually grouped into a single auditory stream.

3-2-2-1-General framework.


The same rules of similarity and continuity of the physical parameters of the signals are
crucial in determining whether sounds that are not simultaneous belong to the same stream.
For example, the particular timbre of a flute allows us to attribute a series of notes to that
instrument without confusion, even if other instruments interfere. Many experiments
illustrate how different parameters are used to group the elements of a sound sequence into a
single stream.

Most of the illustrations in this chapter use a time-frequency representation. An example of
such a representation of a sequence is given below.

[Figure 26: time-frequency diagram; abscissa: time, with marks t1 to t6; ordinate: frequency,
with marks F1, F2 and F3; sounds labelled A, B and C.]
Fig 26: This figure shows a sequence of three sounds A, B and C, denoted A-B-C. Sound A is a
pure tone of frequency F1; it starts at t1 and ends at t2. Sound B is a complex tone made up
of three pure tones of frequencies F1, F2 and F3; it starts at t3 and ends at t4. Sound C is a
pure tone whose frequency varies linearly between F1 and F3 from t5 to t6.


Continuity and gradual change are required for a sequence not to be split into several
auditory streams. For example, the intensity of the footsteps of a person walking away from us
decreases slowly and continuously.

The experiment of Bregman & Dannenbring (1973) illustrates this continuity rule.

[Figure 27: two time-frequency diagrams (frequency versus time), left and right.]
Fig 27: The two streams perceived in the left panel merge into a single stream when a
frequency transition is introduced (right panel).

This experiment first shows that a listener presented with a sequence A-B-A-B-A..., made up
of the repetition of two pure tones A and B whose frequencies are sufficiently far apart,
splits the sequence into two streams, A-A-A... on one side and B-B-B... on the other. By
contrast, when rising and falling frequency "ramps", C(up) and C(down), are introduced
between each A-B pair as shown on the right of Figure 27 (the sequence becomes
A C(up) B C(down) A C(up) B C(down)...), the listener perceives only a single stream whose
frequency varies between that of A and that of B.

Experiments of this kind led van Noorden (1975) and many other authors (Bregman &
Campbell, 1971; Bregman, 1978b) to examine more closely which parameters govern the
cohesion, or on the contrary the splitting into a stream of As and a stream of Bs, of a sound
sequence A-B-A-... presented to a listener in a loop. These authors measured fission or
coherence thresholds for such sequences in experiments grouped under the generic term
"streaming".

Van Noorden's (1975) original streaming experiment shows that a sequence made up of two pure
tones A and B of frequencies Fa and Fb (cf. Figure 28) is more easily split into two streams
(A-A-A... and B-B-B...) by the listener when:

1-Fa differs more from Fb (ΔF = |Fb - Fa| is large).

2-Δt is small.

[Figure 28, titled "Streaming": time-frequency diagram (frequency versus time) of tone A at
Fa, tone B at Fb and tone A again, followed by a silence; ΔF marks the frequency separation
and Δt the time interval between tones.]
Figure 28: The classical streaming configuration (van Noorden, 1975). Sequences of pure tones
ABA-ABA-... are presented. If the frequency of A is very close to that of B, the tones are
grouped into a single source. If, on the contrary, Fa is far from Fb, A and B are separated.
Likewise, if the time interval Δt is very small and Fa differs from Fb, there is an abrupt
change in frequency and A and B are separated according to the continuity principle. If, on
the other hand, Δt is large, the change in frequency is less abrupt and the grouping of A and
B tolerates a larger frequency separation.

This is explained by the laws of continuity. If Fa and Fb are close, tones A and B are fairly
similar (in frequency) and can therefore be integrated into the same stream. However, if Δt is
very small, the change (from Fa to Fb) is very abrupt and prevents integration.
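The two parameters of this paradigm can be made explicit with a small sketch that builds the event list of an ABA- sequence. This is an illustration only, under stated assumptions: the time interval between tones is treated here as a silent gap `dt_ms` following each tone, the fourth slot of each triplet is silent, and the function names, tone duration and frequencies are hypothetical rather than taken from van Noorden (1975). The frequency separation is also expressed in semitones, a common convention that is an addition of this sketch.

```python
import math

def aba_sequence(fa_hz, fb_hz, tone_ms, dt_ms, n_triplets):
    """Return (label, onset_ms, frequency_hz) events for n_triplets of A-B-A-silence."""
    slot_ms = tone_ms + dt_ms              # onset-to-onset interval (assumed definition)
    events, onset = [], 0.0
    for _ in range(n_triplets):
        for label, freq in (("A", fa_hz), ("B", fb_hz), ("A", fa_hz)):
            events.append((label, onset, freq))
            onset += slot_ms
        onset += slot_ms                   # silent slot closing the triplet
    return events

def separation_semitones(fa_hz, fb_hz):
    """Frequency separation between A and B, in semitones."""
    return abs(12.0 * math.log2(fb_hz / fa_hz))

# Hypothetical parameter values, chosen only for illustration.
sequence = aba_sequence(fa_hz=500.0, fb_hz=1000.0, tone_ms=100.0, dt_ms=20.0, n_triplets=2)
delta_f = separation_semitones(500.0, 1000.0)
print(sequence, delta_f)
```

Increasing the frequency separation or shortening `dt_ms` in such a sequence corresponds to the conditions under which the text above predicts segregation into two streams.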

Just as they can be integrated because of their proximity in frequency, tones A and B can be
integrated into a single stream because of their proximity in timbre or virtual pitch
(Bregman et al., 1990). These last two proximity parameters even seem to be powerful vectors
of integration or, on the contrary, of segregation.

Finally, it is worth stressing that there is necessarily a natural hierarchy among these
vectors and that while one of them may favour integration, another may favour segregation. It
seems likely that the stream organisation favoured by the auditory system is the one that
satisfies the greatest number of rules or constraints.

We know, for example, that direction of arrival is a weak factor of grouping or segregation,
probably because of the many reflections and attenuations that sounds often undergo before
reaching us.

By contrast, timbre and pitch appear to be two factors (two similarities) that rank high in
the hierarchy. The laws governing vibrating bodies very often produce close harmonic
relations among the components of each sound source. The sounds produced by musical
instruments, and above all the voice, are examples.

We are also certain that pitch is an important cue that allows us to separate competing
vowels. As early as 1957, Broadbent & Ladefoged showed that two vowels are easier to separate
the further apart their fundamental frequencies are.

3-2-2-2-The particular influence of virtual pitch.

It has been shown that virtual pitch can be an important factor of grouping or segregation
(Bregman, 1990; Bregman & Levitan, 1983; Bregman & Pinker, 1978; Hartmann, 1988; Vliegen &
Oxenham, 1999). Hartmann, in an article reviewing the main results in the literature
(Hartmann, 1988), even fully equates pitch perception with segregation and integration. He
considers that the mechanisms of integration and segregation are an integral part of the
mechanisms underlying pitch. He goes further still, saying that these mechanisms are the
mechanisms of pitch perception.

Two simple studies described below demonstrate, at any rate, our ability (1) to separate or
(2) to group sounds on the basis of their respective pitch.

1-Segregation on the basis of pitch:

A study by Bregman & Levitan (1983), unfortunately unpublished but cited in Bregman (1990),
shows that sources can be separated on the basis of pitch differences and compares the
strength of these pitch cues with that of spectral cues for effective segregation.


[Figure 29: ordinate: pitch (F0); abscissa: timbre; both axes with ticks at 0, 0.28, 0.56,
0.83 and 1.10.]
Fig 29: Bregman & Levitan (1983). Dominance of spectral cues (timbre) over virtual pitch in
a segregation task. Circles indicate a dominance of pitch and crosses a dominance of timbre.
Differences in timbre and in fundamental frequency (F0) are given in octaves.

The signals used in this study are harmonic complex tones with a nominal fundamental
frequency F0 = 128 Hz, filtered by triangular band-pass filters with a nominal centre
frequency Fc = 1 kHz. The pitch of these signals can be varied by increasing F0, and the
timbre varied independently by changing Fc. This is a streaming experiment, and the listener
is therefore instructed to separate into distinct streams the sounds A and B presented as a
sequence A-B-A-... The influence of the difference between the filter centre frequencies of A
and B (|Fc(A) - Fc(B)|) and that of the difference in F0 (|F0(A) - F0(B)|) are studied
orthogonally. The results of this study are shown in the figure above.
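The independent control of pitch (through F0) and timbre (through Fc) can be sketched as follows. The text specifies only "triangular band-pass filters" with nominal values F0 = 128 Hz and Fc = 1 kHz; everything else here is an assumption made for illustration: the triangle is defined on a log-frequency axis, its half-width is set to one octave, and the function names and the number of harmonics considered are hypothetical.

```python
import math

def triangular_gain(freq_hz, fc_hz, half_width_oct=1.0):
    """Linear gain of a triangular filter on a log2-frequency axis (assumed shape)."""
    distance_oct = abs(math.log2(freq_hz / fc_hz))
    return max(0.0, 1.0 - distance_oct / half_width_oct)

def filtered_complex(f0_hz, fc_hz, half_width_oct=1.0, n_harmonics=40):
    """Return (rank, frequency_hz, gain) for the harmonics passed by the filter."""
    components = []
    for rank in range(1, n_harmonics + 1):
        gain = triangular_gain(rank * f0_hz, fc_hz, half_width_oct)
        if gain > 0.0:
            components.append((rank, rank * f0_hz, gain))
    return components

tone_nominal = filtered_complex(128.0, 1000.0)      # nominal F0 and Fc from the text
tone_brighter = filtered_complex(128.0, 2000.0)     # same F0, higher Fc: timbre change only
print([rank for rank, _, _ in tone_nominal])
```

Changing `fc_hz` moves the spectral envelope while leaving the harmonic spacing, and hence F0, unchanged; changing `f0_hz` does the opposite.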


2-Grouping on the basis of pitch:

The first part of this work discussed at length the sensation evoked by a group of pure tones
that are harmonically related. They are all perceptually grouped together and give rise to a
pitch.

Il est remarquable cependant que des sons purs peuvent, sous certaines conditions, être

perceptivement groupés même s'ils ne sont pas présentés simultanément. Bregman & Pinker

(1978) ont ainsi montré que la réitération d'une séquence A-B-... ou A est un son pur et ou B

est un son complexe constitué de deux composantes B1 et B2, permettait de grouper A et B1

dans un même flux auditif si la fréquence de ces deux composantes était suffisamment proche.

Ces études montrent toutes la puissance du groupement par similarité ou proximité de hauteur.

On peut regretter cependant que toutes utilisent uniquement des signaux bien résolus par le

système auditif périphérique. A la vue du chapitre précédent ainsi qu'à la vue des résultats des

études 1 et 2, tenter de reproduire ces études avec des sons complexes dont les conditions de

résolvabilité varient semble intéressant. Les études 3 et 4 apportent des éléments à ce sujet.

3-2-2-3-Models of sequential grouping.

The main modeling work on the rules of auditory grouping in humans was carried out by Beauvois & Meddis (1996) and by McCabe & Denham (1997). The approaches used in these two models differ slightly, but the general principle is the same: identify the dominant auditory channel (the one containing the most energy), then increase the contrast between this channel and the others (Beauvois & Meddis, 1996) by means of a feedback loop. These models reproduce many psychoacoustic results. For instance, they account for the relative influence of frequency separation and presentation rate on the organization into auditory streams of a sequence made up of two repeating pure tones A and B. They also account for other results, such as the gradual build-up of auditory streams (Anstis & Saida, 1985; Bregman, 1978a) and the difference between fission thresholds (the limit below which it is impossible to segregate two sounds) and coherence thresholds (the limit above which it is impossible to group the sounds into a single auditory stream) (van Noorden, 1975).

3-2-2-4-From sequential organization to pitch discrimination.

To conclude this review, the few studies that I outline in this section highlight the very strong link that appears to exist between pitch perception and the perceptual organization of auditory scenes. These studies inspired the last of the experiments presented here. We saw in the preceding sections that a sufficient pitch difference between complex tones presented as a sequence allows them to be segregated perceptually by grouping them into distinct auditory streams. In a pitch discrimination experiment (determining the smallest difference in fundamental frequency needed to perceive a difference in pitch), pairs of complex tones are presented successively to a subject who must, on each presentation, decide, for example, which tone is higher. Studies have shown that F0 discrimination scores can be degraded in the presence of temporal fringes, i.e. complex tones presented just before and just after each of the target tones to be discriminated (Carlyon, 1996a, b; Micheyl & Carlyon, 1998; Gockel et al., 1999). These results provide indirect information on pitch-encoding mechanisms, and in particular on the possible existence of a temporal window over which pitch-related information is integrated. Notably, this work suggests an "over-integration" phenomenon when the temporal integration window used to determine pitch contains portions of both the fringe and the target tone. However, these authors also discuss these results in terms of auditory scene organization, and some even show how important it can be. Gockel et al. (1999) show that the deterioration of discrimination thresholds is reduced when the fringes and the tones to be discriminated are perceptually well segregated.

Organization into auditory sources could thus constitute the first stage of a realistic model of pitch perception. Conversely, we have seen that pitch is a powerful tool for perceptual organization. The close link that therefore appears to exist between these two mechanisms explains this doctoral work and gives it its coherence.

4-Summary, objectives of this work and introduction to my own studies.

In the first part, we reviewed the main models proposed in the literature to explain how the auditory system performs the very specific task of encoding the virtual pitch of harmonic complex tones. We saw that, historically, two broad classes of models have been proposed, one relying specifically on the tonotopic properties of the cochlea and the other operating on the excitation patterns (in the time domain) at the output of cochlear filtering. Each class of model has the advantage of explaining some psychoacoustic data but fails to explain them all. Although models of the second type (especially "autocorrelation" models) manage to explain most experimental results, some results seem explicable only by assuming the coexistence of several models, each suited to the greater or lesser resolvability of the signals. In other words, different mechanisms would be used to extract pitch depending on whether the signal is resolved or unresolved by the peripheral auditory system. These two mechanisms would nevertheless give rise to the same pitch sensation.

In the second part, we reviewed the main rules of auditory scene analysis and the particular importance of effective pitch encoding for the proper analysis of soundscapes that include harmonic complex tones.

The work presented in the second part of this document provides new evidence on this subject.

Five studies are presented, grouped into two chapters. The first chapter concerns pitch-encoding mechanisms and itself consists of two studies. The first provides evidence for the existence of at least two mechanisms underlying pitch encoding. The second attempts to characterize them and offers some leads on this question.

The second chapter, in light of the results of the first, provides initial evidence on how the effectiveness of pitch encoding affects auditory scene analysis. The first study shows that "streaming" performance varies with the resolvability of the signals used. The second study highlights the specific difficulties that people with hearing loss have in this task. This result is explained by the reduced peripheral frequency resolution caused by cochlear damage. Finally, the last study relates the resolvability of the signals to the subjects' ability to segregate them as a function of rise/fall times.

All these studies use the classical tools and methods of psychoacoustics. They consist mainly of measurements of discrimination thresholds (the smallest difference the subject can perceive) between signals with different fundamental frequencies, and of F0-based segregation thresholds (the F0 difference marking the boundary between the perception of one stream and of two streams) for the "streaming" experiments. All the methods used in the protocols are detailed in the articles, and I have chosen not to repeat them here in order to make this document less tedious to read.

However, it seemed necessary to go into detail on one of the approaches used in my doctoral research to study the mechanisms underlying the perception of fundamental pitch. This approach, which is based on the transfer of perceptual learning, will probably be less familiar to some readers.

5-An exploration method based on selective learning.

5-1-Introduction.

Each of the first two studies presented below uses selective transfer of learning as a tool to demonstrate two things:

1-First, that a single auditory task (pitch encoding) can be performed by different mechanisms.

2-Second, that the mechanisms used to perform different auditory tasks may have features in common.

This approach, widely used in vision (Karni & Sagi, 1990; Karni & Sagi, 1991; Karni & Sagi, 1994; Polat & Sagi, 1994; Shiu & Pashler, 1992), rests entirely on the idea that a perceptual mechanism can be trained selectively. The assumption is that if, after specific training of the mechanism involved in a task A, the subject improves in a task A', then these two tasks share at least partly similar mechanisms. If, on the other hand, the subject has not improved in a task B, then A and B do not share the same underlying neural mechanisms.

In the field of hearing, only a few studies using this experimental procedure have been published. In order to demonstrate tonotopic coding of pure-tone frequency above 5 kHz and phase-locking coding below that frequency (cf. 1-1 and 1-2), Demany (1985) trained four groups of subjects to discriminate pure tones at 200, 360, 2500 and 6000 Hz respectively. The results (Fig. 30) show that the three groups trained below 5 kHz (200, 360 or 2500 Hz) all improve in a discrimination task with pure tones at a nominal frequency of 200 Hz. In contrast, the performance of subjects trained at 6000 Hz (>5 kHz) is poorer. In other words, transfer of learning between pure tones above 5 kHz and pure tones below 5 kHz is reduced. He concludes that the frequency of a pure tone is encoded by two different mechanisms depending on whether the frequency is below (phase locking) or above (tonotopy) 5 kHz. He would thus have selectively trained the first of these mechanisms.

Fig 30: Demany, 1985. Frequency discrimination performance at 200 Hz for each group of subjects. The lines connect each subject's threshold before training to the threshold after training. Overall, the first three groups (trained in frequency discrimination at 200, 360 and 2500 Hz) improve, whereas many subjects trained at 6 kHz do not.

A second study, by Schulze & Scheich (1999), concludes that the mechanism encoding amplitude modulation at low frequencies differs from that at high frequencies. This conclusion is based on the learning curves observed for these two tasks: different learning curves suggest different neural mechanisms.

This methodology also has the advantage of providing a great deal of information on our ability to improve in various auditory tasks.

These abilities most likely vary specifically with the task and, above all, with the mechanism that is selectively trained. This can therefore provide information of prime importance on the underlying neural mechanisms. One can thus try to find out at what level of the auditory system a phenomenon occurs. More simply, one can try to determine whether the phenomenon is rather peripheral or rather central. This is what Maubaret et al. (1999) did, by showing that subjects trained to discriminate pure tones in the right ear also improved in the left ear. They conclude that the mechanism underlying pure-tone discrimination lies above the cochlear nuclei.

5-2-Learning-induced neural plasticity of the internal auditory system.

To analyze properly the results of a study whose methodology rests entirely on comparing thresholds before and after perceptual learning, one needs to be familiar with the few data available on learning-induced plasticity of the auditory system.

Studies of this kind, which often yield fascinating information, have been very few, probably because they are difficult and slow to carry out. They nevertheless have two advantages:

1-They make it possible to study neurophysiological mechanisms.

2-They can sometimes offer solutions to pathologies by enabling the development of auditory rehabilitation techniques (Tallal, 1996).

Robinson & Summerfield (1996), in their review article covering all plasticity phenomena, introduce several definitions concerning learning that I think it important to set out. According to them, there are three types of learning:

1-Procedural learning refers to the subjects' rapid initial progress due to habituation to the task, to the stimuli, and so on, or simply due to their initial inexperience with psychoacoustic tests.

2-Stimulus learning. It is probably this form of learning that, when it takes place, brings about a cortical reorganization allowing a better representation of the stimulus. It is thought to develop much more slowly than procedural learning.

3-Task learning. This form of learning, which in my opinion can be merged with procedural learning, concerns how accustomed the subject is to a given type of test.

These definitions, and the idea of distinguishing several types of learning, stem largely from the work of Recanzone et al. (1993), who show in monkeys that frequency discrimination thresholds begin with an abrupt phase of improvement (procedural learning), followed by a longer and slower component (stimulus learning). This work gave rise to the idea of a very simple model of learning curves as the sum of two exponentials, one with a short time constant (modeling procedural learning) and the other with a long time constant (modeling stimulus learning). This model is illustrated in Figure 31.

Fig 31: Hypothetical example of a learning curve (solid line), the sum of two other learning curves. The first (dotted line) represents the gain due to the procedural component of learning; it shows a rapid gain and then quickly reaches a plateau. The second (long dashes) represents the stimulus-learning component; unlike the first, this curve decreases slowly and continuously.
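The two-exponential description of learning curves can be illustrated with a minimal Python sketch. The gains, time constants and asymptote below are arbitrary illustrative values (assumptions, not values taken from Recanzone et al. or from this thesis), and the function names are hypothetical; only the functional form, a fast plus a slow decaying exponential, comes from the text.

```python
import math

# Hypothetical parameters, chosen only to make the two components visible.
FAST_GAIN, FAST_TAU = 4.0, 1.5    # procedural learning: large, quickly exhausted
SLOW_GAIN, SLOW_TAU = 3.0, 15.0   # stimulus learning: slow, continuous decline
ASYMPTOTE = 1.0                   # threshold reached after unlimited training


def procedural_component(session):
    """Part of the threshold that procedural learning removes quickly."""
    return FAST_GAIN * math.exp(-session / FAST_TAU)


def stimulus_component(session):
    """Part of the threshold that stimulus learning removes slowly."""
    return SLOW_GAIN * math.exp(-session / SLOW_TAU)


def threshold(session):
    """Modelled threshold (arbitrary units) after `session` training sessions."""
    return ASYMPTOTE + procedural_component(session) + stimulus_component(session)


# Thresholds before training (session 0) and after each of 12 sessions.
curve = [threshold(s) for s in range(13)]
```

With these assumed values, the procedural term has almost vanished after a dozen sessions while the stimulus term is still decreasing, which is the qualitative pattern the figure describes.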

Finally, it does not seem obvious to me that the improvement of psychoacoustic thresholds during training reflects cerebral plasticity, that is, a reorganization of the auditory cortex. To be sure of this, one would have to be convinced that stimulus learning has taken place. For frequency discrimination, however, this has recently been shown by Menning et al. (2000). These authors show a change in a pre-attentive electrophysiological variable over the course of training in pure-tone discrimination. Such a phenomenon is difficult to explain in terms of purely procedural learning.

To conclude on this subject, other forms of plasticity can of course arise outside training. First, there is developmental plasticity, a vast field of study into which I will not venture here. Second, plasticity of the auditory system can be induced by sudden deafness (Bilecen et al., 2000) or, conversely, by a return to the world of sound through a hearing aid or a cochlear implant. This latter form of plasticity has been reviewed by Palmer et al. (1998) and Philibert et al. (2000).

EXPERIMENTAL WORK

Chapter 1: Study of the pitch-encoding mechanisms for harmonic complex tones resolved or unresolved by the peripheral auditory system.

Article 1: Evidence for two pitch encoding mechanisms using a selective auditory

training paradigm

Nicolas Grimault, Christophe Micheyl, Robert P. Carlyon and Lionel Collet.

SUMMARY:

The neural mechanisms underlying pitch perception are crucial for hearing and have been studied since the beginning of the twentieth century. One of the questions at the heart of the current debate is whether two different mechanisms can encode the same pitch percept depending on whether the harmonics of the incoming complex tone are resolved by the auditory system (i.e. the harmonics are well separated by the cochlear filter bank) or, on the contrary, unresolved (several harmonics interact within the same auditory filters). The aim of this study is to contribute to answering this question, if not to settle it, using a clever transfer-of-learning paradigm devised by neurophysiologists to reveal the neural mechanisms of human visual perception.

To this end, we tested whether learning to discriminate the pitch of complex tones made up of resolved (resp. unresolved) harmonics transferred to the discrimination of unresolved (resp. resolved) harmonics. The results show better performance in untrained resolved (resp. unresolved) conditions for subjects who trained in resolved (resp. unresolved) conditions. These results provide evidence for the coexistence of different neural mechanisms underlying pitch perception, one specific to the encoding of resolved harmonics and the other to the encoding of unresolved harmonics.

Evidence for two pitch encoding mechanisms using a selective auditory training

paradigm

Nicolas Grimault a),b), Christophe Micheyl a), Robert P. Carlyon c) and Lionel Collet a)

a)UPRESA CNRS 5020

Laboratoire « Neurosciences et Systèmes Sensoriels »

Pavillon U. Hôpital E. Herriot.

69437 Lyon Cedex 03 France.

33 (0)4.72.11.05.03

[email protected]

b) Entendre GIPA2 Pontchartrain. France

c)MRC Cognition and Brain Sciences Unit.

Cambridge. England.

Abstract

The neural mechanisms underlying the perception of pitch, a sensory attribute of

paramount importance in hearing, have been a matter of debate for over a century. A question

currently at the heart of the debate is whether the pitch of all harmonic complex tones can be

determined by the auditory system using a single mechanism, or whether different neural

mechanisms are involved, depending on the stimulation conditions. This question was

investigated here by testing for transfer of learning in pitch discrimination between different

stimulus conditions. The results indicate the existence of two distinct underlying mechanisms

for complex pitch perception.

Introduction

Harmonic complex sounds, such as musical tones and vowels, generally elicit a strong

pitch sensation which corresponds approximately to their fundamental frequency (F0). This

pitch plays a role of paramount importance in hearing: it conveys melody in music, prosody in

speech, and it plays an essential part in the perceptual analysis of complex auditory scenes

(Hartman, 1988). The neural mechanisms underlying pitch perception have been debated for

over a century (Von Helmholtz, 1863; Schouten, Ritsma & Cardozo, 1962). A question that

currently occupies the centre of this debate is whether a single neural mechanism can account

for the perception of the pitch of all harmonic tones (Cariani & Delgutte, 1996a, 1996b;

Meddis & Hewitt, 1991a, 1991b; Meddis & O’Mard, 1997), or whether different mechanisms

are involved depending on the stimulation conditions (Carlyon & Shackleton, 1994; Ragot &

Crottaz, 1998; Shackleton & Carlyon, 1994; Steinschneider, Reser, Fishman, Schroeder &

Arezzo, 1998). This question is inspired from the fact that the cochlea acts like a bank of

parallel bandpass filters and has a finite frequency resolving power, which decreases with

increasing frequency. Thus, when the harmonics are widely spaced -as is the case at high F0s-,

and/or the frequencies of the harmonics are low, the frequency components of the sound are

« resolved » by the peripheral auditory system i.e. they fall in different peripheral auditory

filters and are conveyed by independent peripheral auditory channels (Fig. 1). In this situation,

no single peripheral auditory channel contains unambiguous information about the

F0 of the sound, so the central auditory system must combine the outputs of different auditory

channels to derive the pitch. In contrast, when the F0 is low, the frequency of the harmonics is

high, or both, the components of the sound are « unresolved » by the auditory periphery i.e.,

several of them fall within the passband of a single auditory filter and are mingled by the

corresponding auditory channel. In this situation, the auditory system can retrieve the pitch by

taking advantage of the fact that the auditory filter outputs fluctuate at a rate equal to the F0

(Fig. 1).

Figure 1. Simulated peripheral auditory filter outputs of different center frequencies to a

harmonic complex tone. The stimulus consisted of a 500-Hz fundamental frequency and its

harmonics up to the 20th. The stimulus spectrum is represented vertically on the right, with

spectral components shown as horizontal bars. Auditory filter responses were computed in the

time domain using a « gammachirp » impulse response (Irino & Patterson, 1997). The time

span on this graph equals two periods of the fundamental frequency - i.e. 4 ms -. It can be

seen that while the lower harmonics - i.e. 500 and 1000 Hz - excite distinct auditory filters, at

higher frequencies, several harmonics interact within the auditory-filter passbands. Thus, at

high frequencies, the auditory filter outputs repeat over time at a rate corresponding to the

stimulus fundamental frequency.
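As a rough numerical illustration of the resolved/unresolved distinction for the stimulus of Figure 1, the sketch below compares the harmonic spacing (the F0) with the auditory-filter bandwidth at several harmonic frequencies. It assumes the Glasberg and Moore (1990) equivalent rectangular bandwidth (ERB) approximation, which is not the gammachirp filter used to compute the figure, and the helper names are hypothetical.

```python
def erb_hz(frequency_hz):
    """ERB of the auditory filter centred on frequency_hz.

    Assumption: Glasberg & Moore (1990) approximation for normal-hearing
    listeners at moderate levels.
    """
    return 24.7 * (4.37 * frequency_hz / 1000.0 + 1.0)


def harmonics_per_erb(f0_hz, harmonic_rank):
    """Approximate number of harmonics falling within one ERB at a harmonic."""
    return erb_hz(harmonic_rank * f0_hz) / f0_hz


F0_HZ = 500.0  # fundamental frequency of the Figure 1 stimulus
for rank in (1, 2, 10, 20):
    density = harmonics_per_erb(F0_HZ, rank)
    print(f"harmonic {rank:2d} ({rank * F0_HZ:7.0f} Hz): "
          f"{density:.2f} harmonics per ERB")
```

Under this approximation the density rises from about 0.16 harmonics per ERB at 500 Hz to about 2.2 at 10 kHz. Values well below one mean that neighbouring harmonics fall in different filters, whereas values above one mean that several harmonics share a filter and interact.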

On the other hand, it has been proposed that both resolved and unresolved harmonics

could be embraced by a single pitch mechanism (Cariani & Delgutte, 1996a, 1996b; Meddis

& Hewitt, 1991a, 1991b; Meddis & O’Mard, 1997). Schematically, the proposed mechanism

amounts to the computation by the central nervous system of a summed autocorrelogram,

which is obtained by pooling the autocorrelation functions of neural activity within the

different peripheral auditory channels. The resulting autocorrelogram exhibits peaks, the

largest of which corresponds in most cases to the perceived pitch of the sound. This

mechanism can account for the pitch of many stimuli, both when the harmonics are resolved

and when they are unresolved.
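The pooling idea can be sketched in a few lines of Python. This is a toy illustration under strong assumptions, not the model of Meddis and colleagues: the peripheral channels are replaced by fixed groups of harmonics, neural transduction by half-wave rectification, and the sampling rate, duration and lag range are arbitrary choices.

```python
import math

SAMPLE_RATE = 20000          # Hz (assumed)
F0 = 250.0                   # Hz, so one period lasts exactly 80 samples
N_SAMPLES = 800              # 40 ms of signal

# Toy stand-in for peripheral channels: each channel passes a fixed set of
# harmonic ranks. Two channels hold one resolved harmonic each; the third
# holds three unresolved harmonics that interact.
CHANNEL_HARMONICS = [(1,), (2,), (8, 9, 10)]


def channel_output(ranks):
    """Half-wave rectified sum of the harmonics passed by one channel."""
    out = []
    for n in range(N_SAMPLES):
        t = n / SAMPLE_RATE
        value = sum(math.sin(2.0 * math.pi * r * F0 * t) for r in ranks)
        out.append(max(value, 0.0))   # crude stand-in for neural transduction
    return out


def autocorrelation(signal, lag):
    """Unnormalised autocorrelation of `signal` at a lag given in samples."""
    return sum(signal[n] * signal[n + lag] for n in range(len(signal) - lag))


def summed_autocorrelogram(channels, lags):
    """Pool the autocorrelation functions of all channels, lag by lag."""
    return {lag: sum(autocorrelation(ch, lag) for ch in channels) for lag in lags}


channels = [channel_output(ranks) for ranks in CHANNEL_HARMONICS]
lags = range(20, 201)                      # 1 ms to 10 ms
pooled = summed_autocorrelogram(channels, lags)
best_lag = max(pooled, key=pooled.get)     # lag of the largest pooled peak
estimated_pitch_hz = SAMPLE_RATE / best_lag
```

In this toy case the largest pooled peak falls at the lag of one F0 period, whether a channel carries a single resolved harmonic or several interacting ones, which is the property that lets a single mechanism cover both situations.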

In the present study, the existence of a single or of two different mechanisms for pitch

perception was addressed using a transfer-of-learning approach. The reasoning behind this

approach is that if the neural mechanism underlying pitch perception for resolved harmonics is

distinct from that used for unresolved harmonics, listeners trained in pitch-discrimination with

exclusively resolved harmonics should exhibit little or no performance improvement with

unresolved harmonics, and vice versa. Transfer of perceptual learning has widely been used as

a tool for investigating the locus of neural processes underlying different tasks in the visual

modality (Ahissar & Hochstein, 1993; Karni & Sagi, 1990, 1991, 1993; Polat & Sagi, 1994;

Shiu & Pashler, 1992). In the auditory modality, this approach has been used much more

sparsely (Demany, 1985; Wright, Buonomano, Mahncke & Merzenich, 1997).

From a more general point of view, very little data are available to date on perceptual

auditory learning. While most of the numerous psychoacoustical studies published so far have

involved trained subjects, very few articles provide explicit data on the perceptual learning

that accompanies this training. Regarding frequency discrimination learning, the most recent

data in humans are those of Demany (1985), which indicate large improvements in

frequency discrimination thresholds within about 2½ hours of training, and significant transfer

of learning across a very wide range of frequencies below 6 kHz. In owl

monkeys, other data on the time course and transferability of learning in frequency

discrimination have been provided by a study of Recanzone, Schreiner & Merzenich (1997).

Both of these studies involved pure tones. Although complex tones are much more common

in our auditory environment, no data are yet available in the literature on the time course and

transferability of pitch discrimination learning for complex tones. Consequently, besides

testing further for the existence of two pitch-encoding mechanisms, a secondary objective of

the present study was to provide data on perceptual learning with complex tones.

Material & Methods

Subjects

Twelve listeners took part in this experiment. The subjects ranged in age from 19

to 28 years (mean=23.83, SD=3.16). They all had normal hearing in both ears, i.e., absolute

pure tone thresholds at or below 15 dB HL at octave frequencies from 250 to 8000 Hz

(American National Standards Institute, 1969). None had prior experience in psychoacoustic

tasks. They all were paid an hourly wage for their participation. All completed the experiment.

Stimuli

The stimuli consisted of harmonic complex tones having a duration of 200 ms,

including 50-ms cosine ramps. They were generated digitally in the time domain by adding the

successive harmonics of a given F0 in sine (0°) phase. The harmonics were then bandpass-

filtered digitally using a filter with a flat top and 48 dB/octave slopes. As many harmonics as

necessary to fill in the passband at 48 dB of the filter were included; harmonics to which an

attenuation larger than 48 dB would have had to be applied were omitted. Three different

filtering regions were used: a LOW region with lower and upper corner frequencies of 125

and 625 Hz, a MID region (1375-1875 Hz), and a HIGH region (3900-5400 Hz). Previous

studies (8-10, 22-26) have shown that in the MID frequency region, successive harmonics of

the 250-Hz nominal F0 occupy different peripheral auditory filters and are thus well resolved

by the auditory system, whereas harmonics of the 88-Hz F0 are largely unresolved. The

stimuli with F0s of 88 and 250 Hz filtered in the LOW region were resolved by the

peripheral auditory system, whereas those filtered in the HIGH region were unresolved

(because auditory filters are broader at high than at low frequencies). We thus used six different

complexes (2 F0s x 3 regions) covering the two resolvability conditions.
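The stimulus construction described above can be sketched as follows. This is a simplified illustration, not the authors' code: instead of filtering the summed waveform, it attenuates each sine-phase harmonic according to an idealised flat-top response with straight 48 dB/octave skirts, and it assumes raised-cosine ramps. All function names are hypothetical; the sampling rate is the one given in the Material section.

```python
import math

SAMPLE_RATE = 44100        # Hz
DURATION_S = 0.200         # total duration, ramps included
RAMP_S = 0.050             # onset and offset ramps
SLOPE_DB_PER_OCTAVE = 48.0
MAX_ATTENUATION_DB = 48.0  # harmonics attenuated by more than this are omitted


def attenuation_db(freq_hz, low_corner_hz, high_corner_hz):
    """Attenuation of an idealised flat-top filter with 48 dB/octave skirts."""
    if freq_hz < low_corner_hz:
        return SLOPE_DB_PER_OCTAVE * math.log2(low_corner_hz / freq_hz)
    if freq_hz > high_corner_hz:
        return SLOPE_DB_PER_OCTAVE * math.log2(freq_hz / high_corner_hz)
    return 0.0


def included_harmonics(f0_hz, low_corner_hz, high_corner_hz):
    """(rank, attenuation in dB) for every harmonic attenuated by at most 48 dB."""
    pairs = []
    for rank in range(1, int(2.0 * high_corner_hz / f0_hz) + 2):
        att = attenuation_db(rank * f0_hz, low_corner_hz, high_corner_hz)
        if att <= MAX_ATTENUATION_DB:
            pairs.append((rank, att))
    return pairs


def complex_tone(f0_hz, low_corner_hz, high_corner_hz):
    """Sine-phase harmonic complex, shaped by the filter and gated by ramps."""
    n_samples = round(DURATION_S * SAMPLE_RATE)
    n_ramp = round(RAMP_S * SAMPLE_RATE)
    components = [(rank, 10.0 ** (-att / 20.0))
                  for rank, att in included_harmonics(f0_hz, low_corner_hz,
                                                      high_corner_hz)]
    samples = []
    for n in range(n_samples):
        t = n / SAMPLE_RATE
        value = sum(amp * math.sin(2.0 * math.pi * rank * f0_hz * t)
                    for rank, amp in components)
        edge = min(n, n_samples - 1 - n)        # distance to the nearest end
        gain = 1.0 if edge >= n_ramp else 0.5 * (1.0 - math.cos(math.pi * edge / n_ramp))
        samples.append(gain * value)
    return samples


# MID region (1375-1875 Hz) with the 250-Hz nominal F0.
mid_250 = complex_tone(250.0, 1375.0, 1875.0)
```

Under these assumptions the MID-region complex contains harmonics 3 to 15 for the 250-Hz F0 and harmonics 8 to 42 for the 88-Hz F0, which shows how many more components crowd into the same region at the lower F0.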

The levels of the stimuli in the standard and signal intervals were set to 40 dB above the

absolute threshold of a harmonic complex filtered in the same frequency region and having

the same F0 as the standard. This level is referred to as 40 dB SL in the remainder of the

article. A pink-noise background with a 3 dB/octave slope was presented continuously

throughout all measurements. The aim of this noise background was to prevent the perception

by the listeners of combination tones generated by the ear, which might have introduced a bias

in some of the test conditions. The noise was digitally generated and pre-recorded on CD. Its

level was adjusted individually for each subject to 20 dB above its absolute detection threshold.

Experimental Design

All subjects first took part in a preliminary test session during which they could

familiarise themselves with the test procedure and stimuli. In this preliminary session, three

threshold estimates were collected in each of the six stimulus conditions, in random order.

Subjects were then divided into three groups of four listeners each. In two of these

groups, listeners were trained on pitch discrimination of a complex having a fundamental

frequency (F0) of either 88 or 250 Hz. Subjects from the third (control) group received no

specific training in F0 discrimination. In all groups, stimuli were presented in the right ear for

two listeners and in the left ear for the two others. In the two trained groups, the complex was

filtered in the same frequency region (1375-1875 Hz), so that any subsequent difference in

performance cannot be attributed to a difference in the trained frequency region. Training

lasted two hours a day, three days per week, for four consecutive weeks, with trial-by-trial

visual feedback. During these 2-hour training sessions, subjects had to complete thirty F0 difference limen (DLF0)

measurements in a single condition.

For each of the four subjects comprising each experimental group, five threshold

estimates were obtained in each of the six experimental conditions on the week before

training, the week after training, as well as five weeks after the end of training. Each threshold

estimate was obtained using a three-interval, two-alternative, forced-choice procedure without

visual feedback; a two-down one-up adaptive rule tracked the 70.7% correct point on the

psychometric function (Levitt, 1971). Differences in F0 (∆F0s) between the standard and

signal stimuli were increased or decreased by a factor of 2 until the fourth turnpoint and by √2

thereafter. The procedure stopped after 16 turnpoints on the psychophysical staircase. The

threshold was estimated as the geometric mean of the ∆F0s over the last twelve turnpoints,

expressed as a percentage of the nominal F0.
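The adaptive rule described above can be sketched as follows. This is a minimal illustration rather than the software used in the experiment: each three-interval trial is reduced to a boolean "correct" response, the starting ∆F0 of 16% is an assumed value, and the deterministic `ideal_listener` is a hypothetical stand-in used only to make the run reproducible (a real listener responds probabilistically, which is what makes the rule converge on the 70.7% point).

```python
import math


def run_staircase(respond, start_delta=16.0, n_turnpoints=16, n_averaged=12):
    """Two-down one-up track; returns the geometric mean of the last turnpoints.

    `respond(delta)` must return True for a correct response at that delta.
    """
    delta = start_delta
    correct_in_a_row = 0
    direction = 0                     # -1 descending, +1 ascending, 0 at start
    turnpoints = []
    while len(turnpoints) < n_turnpoints:
        if respond(delta):
            correct_in_a_row += 1
            if correct_in_a_row < 2:
                continue              # two correct responses needed to step down
            move = -1
        else:
            move = +1                 # a single error is enough to step up
        correct_in_a_row = 0
        if direction != 0 and move != direction:
            turnpoints.append(delta)  # the track reverses at this delta
        direction = move
        # Factor of 2 until the fourth turnpoint, sqrt(2) thereafter.
        factor = 2.0 if len(turnpoints) < 4 else math.sqrt(2.0)
        delta = delta * factor if move > 0 else delta / factor
    last = turnpoints[-n_averaged:]
    return math.exp(sum(math.log(d) for d in last) / len(last))


def ideal_listener(delta, true_threshold=1.2):
    """Hypothetical listener: always correct at or above 1.2%, wrong below."""
    return delta >= true_threshold


estimate = run_staircase(ideal_listener)
```

With this deterministic listener the track settles into alternating between values just below and just above the assumed 1.2% limit, and the geometric mean of the last twelve turnpoints lands between them.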

Material

The harmonic complex tone signals were generated digitally in the time domain on a

PC150 computer and output using a 16-bit digital-to-analog converter (TDT DA1) at a

sampling rate of 44.1 kHz. The pink-noise masker was generated digitally, recorded on an

audio compact disc (CD) and played out continuously throughout the measurements using the

CD-Rom drive (Goldstar CRD8322B) of another Pentium computer. The signals and noise

were individually low-pass filtered at 15 kHz (TDT FT6-2, attenuation more than 60 dB at

1.15 times the corner frequency) and attenuated (TDT PA4). Finally, they were summed (TDT

SM3) and led to the right or left earpiece of a Sennheiser HD465 headphone mounted in a

25125 cushion via a headphone preamplifier (TDT HBC). Stimuli were monitored on all

sessions using an HP3561A signal analyzer. Subjects were comfortably seated in a sound-treated booth.

Results

Figure 2. Relative changes in pitch discrimination thresholds in the three experimental

groups between the pre-test and the first post-test. Fig. 2a: Mean relative changes in pitch

discrimination thresholds in the untrained resolved and unresolved conditions only (i.e. 250-LOW

and 88-LOW for resolved, and 88-HIGH and 250-HIGH for unresolved). Stars indicate

statistical significance. Fig. 2b: Relative changes in thresholds in each of the six resolvability

conditions. The nominal F0 (88 or 250 Hz) and frequency region (LOW, MID, or HIGH) used

in each condition are indicated below the bottom abscissa. Conditions are sorted by order of

decreasing harmonic resolvability, from left to right. According to the criterion defined in

earlier studies (Carlyon, 1996a, 1996b; Carlyon & Shackleton, 1994; Grimault, Micheyl,

Carlyon, Arthaud & Collet, 2000; Micheyl & Carlyon, 1998; Plack & Carlyon, 1995; Ragot

& Crottaz, 1998; Shackleton & Carlyon, 1994), the three rightmost conditions correspond to

unresolved harmonics, and the three leftmost to resolved harmonics (the separation is

marked by a vertical dashed line). The condition that is used in the text as an example is

indicated by an arrow. Open (resp. filled) squares represent data from the group trained with

resolved (resp. unresolved) harmonics. Filled circles represent data from the control group.

Each data point was computed as the ratio of group (geometric) mean thresholds before and

after training. Error bars represent the geometrical standard errors around the mean ratios.
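
The quantity plotted in Fig. 2 can be illustrated with a short sketch. The ratio of geometric mean thresholds follows the caption. The definition of the geometric standard error used here, exp(SD of the log ratios / √n), is an assumption, since the text does not spell it out, and the function names are hypothetical.

```python
import math


def geometric_mean(values):
    """Geometric mean, computed as the exponential of the mean logarithm."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def improvement_ratio(pre_thresholds, post_thresholds):
    """Ratio of group geometric-mean thresholds, before over after training.

    Values above 1 mean that thresholds decreased, i.e. performance improved.
    """
    return geometric_mean(pre_thresholds) / geometric_mean(post_thresholds)


def geometric_standard_error(ratios):
    """Multiplicative error factor exp(SD(log ratios) / sqrt(n)).

    Assumed definition: the caption only names the statistic.
    """
    logs = [math.log(r) for r in ratios]
    mean = sum(logs) / len(logs)
    variance = sum((x - mean) ** 2 for x in logs) / (len(logs) - 1)
    return math.exp(math.sqrt(variance / len(logs)))
```

With hypothetical thresholds of 4, 2 and 1% before training and 2, 1 and 0.5% after training, improvement_ratio returns a value of about 2, which would plot above the unity line.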

Fig. 2a shows the mean improvement for untrained conditions, between the pre- and

first post-training sessions. Performance is averaged across the two resolved and across the

two unresolved complexes. A contrast analysis on the log-transformed thresholds revealed

that subjects trained in the resolved condition showed larger improvements in the other

resolved conditions than subjects trained with unresolved harmonics (F(2,6)=8.45, p<0.05),

and vice versa (F(2,6)=27.59, p<0.001). The improvement for each individual condition

(including the two "trained" conditions) is shown separately in Fig. 2b, together with those for

a control, untrained group of subjects. In addition to the main findings shown in Fig. 2a, the

fact that most points lie above a value of unity shows that there is some general transfer of

learning to all combinations of F0 and frequency region. It can also be seen that the effect of

resolvability is larger than that of F0, in that listeners trained with a resolved 250-Hz complex

show a smaller improvement on an unresolved 250 Hz complex than do those trained with an

unresolved 88 Hz complex (comparison indicated by arrow on Fig. 2b).

Figure 3. Relative pitch discrimination thresholds in the preliminary, pre-training, training, and post-training

sessions in the two experimental groups. Each data point represents the geometric

mean of relative pitch discrimination thresholds (i.e. difference in F0 at threshold between the

signal and standard stimuli divided by nominal F0) obtained on a given session across

subjects in each group. The solid (resp. dotted) lines represent the learning curves in the

group trained with resolved (resp. unresolved) harmonics. The curves were modelled as a sum

of two decaying exponentials (Recanzone et al., 1993; Robinson & Summerfield, 1996) and

fitted to the data using a least-squares error algorithm. The two time constants and R² derived

from fitting are indicated next to each curve.
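
A least-squares fit of a sum of two decaying exponentials can be sketched as follows. This is an illustration under stated assumptions, not the algorithm used for Fig. 3: the model has no additive constant, time is counted in sessions, the two time constants are searched over a user-supplied grid, and for each candidate pair the two amplitudes are obtained from the 2x2 normal equations. All function names are hypothetical.

```python
import math


def double_exponential(t, a1, tau1, a2, tau2):
    """Sum of two decaying exponentials with time constants tau1 and tau2."""
    return a1 * math.exp(-t / tau1) + a2 * math.exp(-t / tau2)


def fit_amplitudes(sessions, thresholds, tau1, tau2):
    """For fixed time constants, solve the normal equations for a1 and a2."""
    b1 = [math.exp(-t / tau1) for t in sessions]
    b2 = [math.exp(-t / tau2) for t in sessions]
    s11 = sum(x * x for x in b1)
    s22 = sum(x * x for x in b2)
    s12 = sum(x * y for x, y in zip(b1, b2))
    r1 = sum(x * y for x, y in zip(b1, thresholds))
    r2 = sum(x * y for x, y in zip(b2, thresholds))
    det = s11 * s22 - s12 * s12
    a1 = (r1 * s22 - r2 * s12) / det
    a2 = (r2 * s11 - r1 * s12) / det
    return a1, a2


def fit_double_exponential(sessions, thresholds, tau_grid):
    """Grid search over pairs tau1 < tau2; returns (sse, a1, tau1, a2, tau2)."""
    best = None
    for i, tau1 in enumerate(tau_grid):
        for tau2 in tau_grid[i + 1:]:
            a1, a2 = fit_amplitudes(sessions, thresholds, tau1, tau2)
            sse = sum((double_exponential(t, a1, tau1, a2, tau2) - y) ** 2
                      for t, y in zip(sessions, thresholds))
            if best is None or sse < best[0]:
                best = (sse, a1, tau1, a2, tau2)
    return best
```

In such a fit, a very large second time constant describes a learning curve that is essentially flat after the initial fast component, while a moderate one describes a protracted improvement.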

Further evidence for a qualitative difference between the mechanisms underlying pitch

perception for resolved and unresolved harmonics comes from the comparison of the time

course of learning in the two cases (Fig. 3). Theoretical learning curves consisting of the sum

of two exponentials with different time constants were fitted to the data. While in the group

trained with resolved harmonics a protracted improvement in performance was obtained, the

other group showed an abrupt initial improvement followed by a plateau. A repeated-measures

analysis of variance on the log-transformed thresholds obtained during the training sessions

revealed a significant overall improvement in performance over the twelve training sessions

(F(11,66)=1.91, p<0.001) and a trend for this improvement to differ between the two training

groups (F(11,66)=4.25, p=0.054). Furthermore, when subjects were re-tested 5 weeks after

training (Fig. 4), it was found that, while in the group trained with unresolved harmonics, the

mean threshold of the 6 conditions tended to increase between the first and the second post-

training tests (F(1,6)=5.25, p=0.062), in the group trained with resolved harmonics, no such

tendency for learning loss was observed.

Figure 4. Relative changes in thresholds in each of the six resolvability conditions between

the first and the second post-test. As in Fig. 2b, the nominal F0 and frequency region used in each

condition are indicated on the abscissa. Open (resp. filled) squares represent data from the group

trained with resolved (resp. unresolved) harmonics. Filled circles represent data from the

control group. Each data point was computed as the ratio of group (geometric) mean

thresholds. Error bars represent the geometrical standard errors around the mean ratios.

Discussion

With respect to the main objective of the present study, namely testing for the

existence of two F0-encoding mechanisms, the main result consists in the fact that transfer of

learning in F0 discrimination was larger between complex sounds of the same resolvability

status (i.e. both resolved or both unresolved) than between sounds of a different resolvability

status. This finding provides a strong argument for the hypothesis that different mechanisms

underlie the encoding of F0 depending on whether the harmonics are resolved or unresolved

by the peripheral auditory system.

The fact that some transfer of learning was observed across all conditions may be

explained in terms of procedural learning (Robinson & Summerfield, 1996). The subjects had

no prior experience in psychoacoustical tests, and it is possible that they had not completely

familiarised themselves with the test procedure by the end of the preliminary session. Because such

procedural learning is unlikely to depend on the specific characteristics of the stimuli used, it

probably explains the overall increase in performance observed in all stimulus conditions

between the pre- and post-training sessions. Furthermore, procedural learning must have

occurred similarly in the two trained groups, irrespective of the stimulus condition used for the

training. Thus, procedural learning cannot explain differences in performance improvements

between resolved and unresolved conditions, or between subjects trained in a resolved

condition and those trained in an unresolved condition. To explain these differences, some

form of stimulus-specific learning must be involved. The notion that two different forms of

learning have taken place is suggested by the fact that the learning curves show a fast increase

in performance occurring within the first training week, followed by a slower improvement, at

least in the group trained with resolved harmonics. These two components of the learning

curve, also indicated in previous reports on frequency discrimination learning (19,21), are

commonly thought to correspond, respectively, to procedural and stimulus learning. The

observation, in this study, that the first time constants of the learning curves of the two groups

are very similar (1.51 vs. 1.43) agrees with the notion that they correspond to procedural

learning, which should in effect not be different in the two groups.

The fact that the time constants corresponding to the later components of the learning curves

differed widely - with almost no further performance improvement occurring in the group

trained with unresolved harmonics - is consistent with the notion that stimulus learning

differed as a function of the stimulus-condition used for training.

One potential problem with a straightforward interpretation according to which the

effects of procedural and stimulus learning are independent and additive, relates to the

observation that thresholds tended to rise in all conditions between the first and the second

post-training sessions in the group trained with unresolved harmonics, but not in the other

group. This observation cannot be explained by a loss of procedural learning since, under the

above-mentioned interpretation, the loss should have been the same in both groups. Nor can it

be explained by a loss of stimulus learning, since thresholds rose precisely in the

group which, under the above-mentioned interpretation, showed no stimulus learning (the

time constant of the second term of the theoretical learning function being very large).

This straightforward interpretation of the results in terms of dissociated F0-encoding

mechanisms for resolved and unresolved harmonics may have to be qualified based on the

trend - clearly apparent in the graphs - that while listeners trained with resolved harmonics

showed larger improvements with resolved than with unresolved harmonics, those trained

with unresolved harmonics improved almost equally in all conditions. One possible

explanation for this behavior is that training with unresolved harmonics has biased the

subjects’ auditory system toward using a mechanism that is normally used preferentially for

unresolved harmonics, but that may also apply to resolved ones. This would in particular be

the case if the F0-encoding mechanism for resolved harmonics relied on spectral cues present

in the peripheral pattern of excitation produced by the stimuli, whereas the F0-encoding

mechanism for unresolved harmonics relied on the temporal information at the output of the

peripheral auditory system. Indeed, above a given degree of unresolvability of the stimuli,

spectral cues become absolutely undetectable, and the former mechanism can no longer work.

In contrast, because temporal information as to the F0 is present in auditory-nerve discharges

for both resolved and unresolved harmonics (Meddis & O’Mard, 1997; Cariani & Delgutte,

1996a, 1996b), a temporally-based mechanism for F0-extraction could theoretically work in

both conditions.

In conclusion, besides providing data on learning in pitch discrimination with complex

tones, the present study provides evidence based on a transfer-of-learning paradigm used

formerly in vision, that two different mechanisms underlie the perception of pitch

depending on whether the harmonics are resolved or unresolved at the auditory periphery. The

strongest argument in favor of this dual pitch mechanism hypothesis is that the transfer of

learning in pitch discrimination from one harmonic complex sound to another is stronger

when they are both resolved or both unresolved than when they differ in resolvability. This

effect cannot be accounted for by existing models of pitch perception in which a unitary

mechanism processes the pitch of all complex tones. A consistent interpretation of these

results is that in spite of the apparent unity of the sensory attribute of complex sounds known

as the virtual pitch, due to constraints imposed on the central nervous system by the peripheral

auditory organ, two separate neural mechanisms underlie the pitch of resolved and of

unresolved stimuli.

References

Ahissar, M. & Hochstein, S. (1993). Attentional control of early perceptual learning.

Proceedings of the National Academy of Sciences USA. 90, 5718-5722.

American National Standard Institute. (1969). Specification for audiometers. (ANSI S3.6-

1969), New-York: ANSI.

Cariani, P.A. & Delgutte, B. (1996a). Neural correlates of the pitch of complex tones. I. Pitch

and pitch salience. Journal of Neurophysiology. 76, 1698-1716.

Cariani, P.A. & Delgutte, B. (1996b). Neural correlates of the pitch of complex tones. II. Pitch

shift, pitch ambiguity, phase invariance, pitch circularity, rate pitch, and the dominance

region for pitch. Journal of Neurophysiology. 76, 1717-1734.

Carlyon, R.P. (1996a). Encoding the fundamental frequency of a complex tone in the presence

of a spectrally overlapping masker. Journal of the Acoustical Society of America. 99,

517-524.

Carlyon, R.P. (1996b). Masker asynchrony impairs the fundamental-frequency discrimination

of unresolved harmonics. Journal of the Acoustical Society of America. 99, 525-533.

Carlyon, R.P. & Shackleton, T.M. (1994). Comparing the fundamental frequencies of resolved

and unresolved harmonics: Evidence for two pitch mechanisms? Journal of the

Acoustical Society of America. 95, 3541-3554.

Demany, L. (1985). Perceptual learning in frequency discrimination. Journal of the Acoustical

Society of America. 78, 1118-1120.

Grimault, N., Micheyl, C., Carlyon, R.P., Arthaud, P. & Collet, L. (2000). Influence of

peripheral resolvability on the perceptual segregation of harmonic complex tones

differing in fundamental frequency: results from normal-hearing and hearing-impaired

subjects. Submitted.

Hartmann, W.M. (1988). Pitch perception and the segregation and integration of auditory

entities. In G.M. Edelman, W.E. Gall & W.M. Cowan (eds), Auditory function, (pp.

623-645) New York: Wiley.

Von Helmholtz, H.L.F. (1863). Die Lehre von den Tonempfindungen als physiologische

Grundlage für die Theorie der Musik. Braunschweig, Germany: F. Vieweg & Sohn.

Irino, T. & Patterson, R.D. (1997). A time domain, level-dependent auditory filter: The

gammachirp. Journal of the Acoustical Society of America. 101, 412-419.

Karni, A. & Sagi, D. (1990). Texture discrimination learning is specific for spatial location

and background orientation. Investigative Ophthalmology and Visual Science.

(Suppl.), 31, 562.

Karni, A. & Sagi, D. (1991). Where practice makes perfect in texture discrimination:

Evidence for primary visual cortex plasticity. Proceedings of the National Academy of

Sciences USA. 88, 4966-4970.

Karni, A. & Sagi, D. (1993). The time course of learning a visual skill. Nature. 365, 250-252.

Levitt, H. (1971). Transformed up-down methods in psychoacoustics. Journal of the

Acoustical Society of America. 49, 467-477.

Meddis, R. & Hewitt, M. (1991). Virtual pitch and phase sensitivity of a computer model of

the auditory periphery: I. pitch identification. Journal of the Acoustical Society of

America. 89, 2866-2882.

Meddis, R. & Hewitt, M. (1991). Virtual pitch and phase sensitivity of a computer model of

the auditory periphery: II. Phase sensitivity. Journal of the Acoustical Society of

America. 89, 2883-2894.

Meddis, R. & O’Mard, L. J. (1997). A unitary model of pitch perception. Journal of the

Acoustical Society of America. 102, 1811-1820.

Micheyl, C. & Carlyon, R.P. (1998). Effect of temporal fringes on fundamental-frequency

discrimination. Journal of the Acoustical Society of America. 104, 3006-3018.

Plack, C.J. & Carlyon, R.P. (1995). Differences in frequency modulation detection and

fundamental frequency discrimination between complex tones consisting of resolved

and unresolved harmonics. Journal of the Acoustical Society of America. 98, 1355-

1364.

Polat, U. & Sagi, D. (1994). Spatial interactions in human vision: from near to far via

experience dependent cascades of connections. Proceedings of the National Academy of Sciences USA. 91,

1206-1209.

Ragot, R. & Crottaz, S. (1998). A dual mechanism for sound pitch perception: new evidence

from brain electrophysiology. Neuroreport 9, 3123-3127.

Recanzone, G.H., Schreiner, C.E. & Merzenich, M.M. (1993). Plasticity in the frequency

representation of primary auditory cortex following discrimination training in adult owl

monkey. Journal of Neuroscience. 13, 87-103.

Robinson, K. & Summerfield, A.Q. (1996). Adult auditory learning and training. Ear and

Hearing. 17, 51S-65S.

Schouten, J.F., Ritsma, R.J. & Cardozo, B.L. (1962). Pitch of the residue. Journal of the

Acoustical Society of America. 34, 1418-1424.

Shackleton, T.M. & Carlyon, R.P. (1994). The role of resolved and unresolved harmonics in

pitch perception and frequency modulation discrimination. Journal of the Acoustical

Society of America. 95, 3529-3540.

Shiu, L.P. & Pashler, H. (1992). Improvement in line orientation discrimination is retinally

local but dependent on cognitive set. Perception and Psychophysics. 52, 582-588.

Steinschneider, M., Reser, D.H., Fishman, Y.I., Schroeder, C.E. & Arezzo, J.C. (1998). Click

train encoding in primary auditory cortex of the awake monkey: evidence for two

mechanisms subserving pitch perception. Journal of the Acoustical Society of America.

104, 2935-2955.

Wright, B.A., Buonomano, D.V., Mahncke, H.W. & Merzenich, M.M. (1997). Learning and

generalization of auditory temporal-interval discrimination in humans. Journal of

Neuroscience. 17, 3956-3963.

Author notes

This study received the approval of the ethics committee (CCPPRB Léon Bérard

N°DGS 980626). It was supported by a research grant from Entendre GIPA2 and by the

Centre National de la Recherche Scientifique (CNRS). We thank Dr. Laurent Demany and

Pr. John D. Durrant for helpful suggestions on an earlier version of the manuscript. Jean-

Christophe Béra is gratefully acknowledged for his help with headphone calibration.

Article 2: Perceptual learning in pure-tone frequency discrimination and amplitude-

modulation rate discrimination, and generalization to fundamental frequency

discrimination.

Nicolas Grimault, Christophe Micheyl, Robert P. Carlyon, Sid P. Bacon and Lionel Collet

SUMMARY:

In the previous study, we provided evidence suggesting the existence of at least two

distinct neural mechanisms for encoding the pitch of harmonic complex sounds, depending on

whether the components of these sounds are resolved or unresolved by the peripheral auditory

system. The aim of this second study was to characterize these two neural mechanisms in

part. We used the same transfer-of-learning experimental paradigm as in the previous study.

Chiefly, we found that subjects trained to discriminate pure tones improved their

discrimination performance more when the complex sounds to be discriminated were resolved

than when they were unresolved. This result suggests the validity of a spectral or

spectro-temporal model when the harmonics are resolved by the peripheral auditory system.

In addition, the improved ability of subjects to discriminate between noises differing in

modulation frequency did not appear to selectively favor the encoding of unresolved harmonics.

Perceptual learning in pure-tone frequency discrimination and amplitude-modulation

rate discrimination, and generalization to fundamental frequency discrimination

Nicolas Grimault a),b), Christophe Micheyl a), Robert P. Carlyon c),

Sid P. Bacon d) and Lionel Collet a)

a) UPRESA CNRS 5020 Laboratoire "Neurosciences and Systèmes Sensoriels", Hôpital E.

Herriot - Pavillon U, 69437 Lyon Cedex 03, France

b) ENTENDRE Audioprothesists Group GIPA2, Pontchartrain, France.

c) MRC- Cognition and Brain Sciences Unit. 15, Chaucer Rd. Cambridge, CB2-2EF,

England.

d) Psychoacoustics Laboratory, Department of Speech and Hearing Science, Arizona State

University, Tempe, Arizona 85287-1908.

INTRODUCTION

Many sounds in our auditory environment consist of harmonic complex sounds, which

contain spectral components whose frequencies are all integer multiples of a low, fundamental

frequency (F0). These sounds generally elicit a strong pitch sensation which is determined by

their F0: the higher the F0, the higher the pitch. This pitch is known as the fundamental pitch,

or virtual pitch, because it may be perceived even in the absence of the physical component

corresponding to the F0. Virtual pitch plays a role of paramount importance in hearing.

Variations in pitch over time convey melody in music and prosody in speech. Furthermore, the

encoding of the F0 of complex sounds has been shown to be tied to the perceptual analysis of

complex sounds and, specifically, the separation of concurrent sounds (Hartmann, 1988;

Bregman, 1990; Bregman et al., 1990).

The mechanisms underlying virtual pitch perception have been a matter of debate for

over a century. A central question that is currently disputed is whether the auditory system

"computes" the virtual pitch of all harmonic complex sounds using a single mechanism, or

whether different underlying mechanisms are needed in order to accommodate the limited

frequency resolving power of the auditory periphery. The peripheral auditory system is

traditionally modeled as a bank of parallel, bandpass filters whose outputs are conveyed to the

central nervous system by independent channels. The auditory-filter bandwidths increase with

center frequency. When the F0 of the complex is high and/or the harmonics are low in

frequency, the frequency separation between consecutive harmonics is large relative to

auditory filter bandwidths so that each peripheral channel conveys a single harmonic; the

harmonics are then said to be "resolved" by the peripheral auditory system. In this case,

although no single peripheral auditory channel conveys unambiguous information about the

F0 of the whole sound, the central auditory system may combine information across channels

to determine the virtual pitch (Thurlow, 1963; Whitfield, 1967, 1970; Walliser, 1968, 1969a-

c; Terhardt, 1972a,b; Goldstein, 1973). In contrast, when the F0 is relatively low and/or the

frequencies of the harmonics are high, several components fall within the passband of the

same peripheral auditory filter and interfere within the same peripheral channels. In this case,

the central auditory system may nevertheless compute virtual pitch by taking advantage of the fact

that as soon as several harmonics fall within the bandwidth of the same auditory filter, that

auditory filter output fluctuates in amplitude at a rate equal to the F0 (Schouten, 1940, 1970).

Based on such considerations, it has been proposed that the central auditory system in fact

uses two different mechanisms in order to encode the pitch of resolved and of unresolved

harmonics. On the other hand, some authors maintain that the virtual pitch can be encoded by

means of a single mechanism, independently of whether the harmonics making up the sounds

are resolved or not by the auditory periphery (Meddis and O'Mard, 1997).
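
The notion of resolvability described above can be given a rough numerical illustration by comparing the harmonic spacing (equal to the F0) with the auditory-filter bandwidth at a given center frequency. The sketch below assumes the widely used equivalent rectangular bandwidth (ERB) approximation of Glasberg and Moore (1990), which is not cited in the text; the resolvability criterion used in the studies cited here may be defined differently. The F0s and center frequency are example values chosen for illustration.

```python
def erb_hz(center_frequency_hz):
    """Equivalent rectangular bandwidth of the auditory filter, in Hz.

    Assumed approximation (Glasberg and Moore, 1990):
    ERB = 24.7 * (4.37 * f / 1000 + 1), with f in Hz.
    """
    return 24.7 * (4.37 * center_frequency_hz / 1000.0 + 1.0)


def harmonics_per_erb(f0_hz, center_frequency_hz):
    """Average number of harmonics of f0_hz falling within one ERB."""
    return erb_hz(center_frequency_hz) / f0_hz


if __name__ == "__main__":
    for f0 in (88.0, 250.0):
        n = harmonics_per_erb(f0, 1605.0)
        print(f"F0 = {f0:.0f} Hz: {n:.2f} harmonics per ERB at 1605 Hz")
```

Under this assumption, fewer than one harmonic of a 250-Hz F0 falls within one ERB around 1605 Hz, whereas more than two harmonics of an 88-Hz F0 do, which is in line with the distinction between resolved and unresolved harmonics drawn in the text.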

In a recent study, this question of the existence of one versus two mechanisms for pitch

perception was addressed using a transfer-of-learning approach (Grimault et al., submitted

article). The reasoning behind this approach was that, if the mechanism underlying the

perception of pitch for resolved harmonics is distinct from that underlying pitch perception for

unresolved harmonics, training in pitch discrimination with exclusively unresolved harmonics

should lead to little or no improvement in pitch discrimination with resolved harmonics, and

vice versa. The results of this study provided two main arguments for the "dual pitch

mechanisms" hypothesis. Firstly, they demonstrated that specific training in F0 discrimination

using resolved harmonics later resulted in significantly larger improvements with other

complexes also made of resolved harmonics than with complexes made of unresolved

harmonics, and vice versa. Secondly, the learning curve representing the changes in thresholds

as a function of time during training was found to have a different time constant for resolved

than for unresolved harmonics. Overall, these results argue for the hypothesis that pitch

perception involves different mechanisms for resolved and for unresolved harmonics.

The current study aimed to further investigate this question of the underlying

mechanisms of virtual pitch perception by trying to gather information on the nature of the

mechanism responsible for the encoding of the pitch of resolved harmonics and that involved

in the processing of unresolved harmonics. Otherwise stated, while the previous study by

Grimault et al. (1999) addressed the question “are the mechanisms underlying the F0-

encoding of resolved and unresolved harmonics different?”, the present study addresses the

question “what are these mechanisms?”. Several candidate mechanisms have been put

forward in the literature to account for the perception of the pitch of complex tones consisting

of either resolved or unresolved harmonics. It is interesting to remark that with the possible

exception of the model proposed by Meddis and colleagues (Meddis and Hewitt, 1991;

Meddis and O'Mard, 1997; see: Carlyon, 1998 for a criticism), models of pitch perception

which can account fairly successfully for the perception of the pitch of resolved harmonics

generally fail for unresolved harmonics, and vice versa. Basically, models operating on

resolved harmonics include two stages: a first stage consisting of the estimation of the

frequencies of the individual components of the complex, and a second stage consisting of a

pattern recognizer, which estimates the pitch of the complex from the estimated frequencies of

the individual components (Thurlow, 1963; Whitfield, 1967, 1970; Walliser, 1968, 1969a-c;

Terhardt, 1972a,b; Goldstein, 1973). On the other hand, models operating on unresolved

harmonics all involve the measurement of time intervals between successive periods in the

output waveform from auditory filters within the bandwidth of which several harmonics

interact. More precisely, Schouten (1940, 1970) proposed that the pitch of unresolved

harmonics - which he called the "residue pitch" - is determined by the most prominent time

interval between those peaks in the fine structure of the waveform which are close to adjacent

envelope maxima. In support of this view, he reported results showing that the pitch evoked

by a series of three harmonics having the same frequency spacing - and thus the same rate of

envelope fluctuations - could be altered by shifting all the components up or down in

frequency. Later on, however, Burns and Viemeister (1976, 1981) showed that stimuli which

contained no fine structure cues - i.e. noise - could produce a virtual pitch sensation when

sinusoidally amplitude modulated at a rate within the pitch range, thereby

indicating that the rate of envelope fluctuations is a potential cue for pitch perception.

In the present study, we sought to test the hypotheses, inspired by the above-mentioned

data, that (1) the pitch perception of resolved harmonics involves the encoding of

the frequencies of the individual components, and that (2) the pitch perception of unresolved

harmonics is based on the estimation of the rate of fluctuations in the stimulus envelope (or,

equivalently, of the time intervals between successive envelope maxima). Both hypotheses

were tested using a transfer-of-learning approach. We reasoned, firstly, that if the encoding of

the F0 of resolved harmonics depends in part on the encoding of the frequencies of the

individual harmonics, then training listeners in pure-tone frequency discrimination (FD)

should lead to an improvement, not only in pure-tone frequency discrimination performance,

but also in F0-discrimination (F0D) performance for complexes consisting of resolved

harmonics. Furthermore, in order to test whether the improvement in F0D - if any - caused by

training in FD was mediated by an increase in the accuracy of the encoding of the individual

frequencies of the harmonics, or of the overall pitch itself (in other words, whether the

improvement was taking place at the first or second stage of the two-stage pattern-recognition

models of pitch perception described above), we trained some listeners in FD with pure tones

whose frequency fell within the range of the harmonics and other listeners with pure tones

whose frequency fell in the range of the pitch of the complex tones which were used to test

their F0D performances. The observation of learning-transfer between FD and F0D in the

former listeners would suggest that some improvement had taken place at the level of the first

pitch-encoding stage. The observation of learning-transfer between FD and F0D in the latter

listeners would suggest that some improvement had taken place at the level of the second

pitch-encoding stage.

Secondly, we reasoned that if the perception of the pitch of unresolved harmonics was

based on the estimation of the rate of envelope fluctuations in the auditory filter outputs,

training subjects in the discrimination of the rate of envelope fluctuations should improve

their ability to discriminate the F0s of successive harmonic complexes made of unresolved

harmonics. An important prerequisite for this hypothesis to be testable is, of course, that

amplitude-modulation rate discrimination (AMRD) performance improves significantly with

practice. Although there are very few data in the literature on perceptual learning in this type

of temporal task, recent results from Schulze et al. (1999) and Fitzgerald and Wright (2000)

suggest that this is the case.

MATERIAL AND METHODS

Subjects

Fifteen listeners took part in this experiment. They ranged in age between 20 and 27

years (mean=22.13, SD=1.68). They all had normal hearing in both ears, i.e., absolute pure-tone

thresholds at or below 15 dB HL at octave frequencies from 250 to 8000 Hz (American

National Standard Institute, 1969). None had prior experience in psychoacoustic tasks. All

were paid an hourly wage for their participation. All completed the experiment.

Stimuli

The stimuli consisted of pure tones, amplitude-modulated noise bands, and harmonic

complex tones. All stimuli had an overall duration of 200 ms and were shaped with 20-ms

cosine ramps, except for the harmonic complexes which had 50-ms ramps for consistency

with a previous study (Grimault et al., submitted article).

The pure tones had nominal frequencies of 88, 250, and 1605 Hz. The former two

frequencies were chosen to correspond to the nominal F0s of the harmonic complexes. The

latter was chosen to correspond to the geometric center frequency of the mid-frequency region

in which the harmonic complexes and modulated noise bands were filtered (see below). The

nominal SPLs of the pure tones were 85 dB at 88 Hz, 80 dB at 250 Hz, and 75 dB at 1605 Hz,

which corresponded to approximately 80 phons. The complex tones and modulated noise

bands had an overall level of 55 dB SPL. All stimuli were presented in a continuous pink (3

dB/octave slope) noise background with an overall level of 57 dB SPL. This noise background

was intended to prevent the listeners from perceiving combination tones generated by the

ear, which might have obscured the interpretation of the results. All stimuli were generated

digitally using a 44.1-kHz sampling frequency and a 16-bit coding range. They were then

saved on the computer hard disk, or, for the pink noise, on a compact disk.

The noise bands were obtained by digitally filtering white noise using a filter with a

flat-top and 48 dB/octave slopes. The maximum attenuation of the digital filter was set to 48

dB. Three different filtering regions were used: a LOW region with lower and upper corner

frequencies of 125 and 625 Hz, a MID region (1375-1875 Hz), and a HIGH region (3900-

5400 Hz). These noise bands were then fully (100%) amplitude-modulated using a nominal

modulation rate of either 88 or 250 Hz.
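
The modulation step can be sketched as follows. This is an illustration only: the modulator is assumed to be sinusoidal and the ramps raised-cosine, neither of which is specified in the text, and the band-pass filtering of the noise carrier that precedes modulation in the study is omitted for brevity. The function name am_noise and its parameters are hypothetical; the duration (200 ms), ramp duration (20 ms), sampling rate (44.1 kHz) and modulation depth (100%) are taken from the text.

```python
import math
import random


def am_noise(fm_hz, duration_s=0.2, ramp_s=0.02, fs=44100, depth=1.0, seed=0):
    """Gaussian noise, amplitude-modulated at fm_hz and gated on and off.

    Simplification: the carrier is unfiltered white noise, whereas the study
    filtered the noise into the LOW, MID or HIGH region before modulation.
    """
    rng = random.Random(seed)
    n = int(round(duration_s * fs))
    n_ramp = int(round(ramp_s * fs))
    samples = []
    for i in range(n):
        t = i / fs
        carrier = rng.gauss(0.0, 1.0)
        # depth = 1.0 gives full (100%) modulation; sinusoidal shape assumed.
        modulator = 1.0 + depth * math.sin(2.0 * math.pi * fm_hz * t)
        # Raised-cosine onset and offset ramps (assumed ramp shape).
        if i < n_ramp:
            gate = 0.5 * (1.0 - math.cos(math.pi * i / n_ramp))
        elif i >= n - n_ramp:
            gate = 0.5 * (1.0 - math.cos(math.pi * (n - 1 - i) / n_ramp))
        else:
            gate = 1.0
        samples.append(gate * modulator * carrier)
    return samples
```

For example, am_noise(88.0) and am_noise(250.0) would give carriers modulated at the two nominal rates used in the study.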

The harmonic complexes were generated by adding the successive harmonics of a

given F0 in sine (0°) phase. The harmonics were then bandpass-filtered digitally using a filter

with a flat top and 48 dB/octave slopes. As many harmonics as necessary to fill in the

passband at 48 dB of the filter were included; harmonics to which an attenuation larger than

48 dB would have had to be applied were omitted. The same LOW, MID and HIGH filtering

regions as for the noise bands were used. Previous studies have shown that in the MID frequency

region, successive harmonics of the 250-Hz nominal F0 occupy different peripheral auditory

filters and are thus well resolved by the auditory system, whereas harmonics of the 88-Hz F0

are largely unresolved. In the LOW region, the harmonics of both F0s (88 and 250 Hz) were

resolved by the peripheral auditory system, whereas in the HIGH region the harmonics of both

F0s were unresolved (because auditory filters are broader at high than at low

frequencies).
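The construction of the harmonic complexes described above can be sketched as follows. This is a minimal illustration, not the code used in the study: it assumes an idealised filter whose attenuation is 0 dB between the corner frequencies and grows by exactly 48 dB per octave outside them, and the function names (`attenuation_db`, `harmonic_components`, `synthesize`) are hypothetical.

```python
import math


def attenuation_db(freq_hz, f_low_hz, f_high_hz, slope_db_per_oct=48.0):
    """Attenuation (dB) of an idealised flat-top band-pass filter."""
    if freq_hz < f_low_hz:
        return slope_db_per_oct * math.log2(f_low_hz / freq_hz)
    if freq_hz > f_high_hz:
        return slope_db_per_oct * math.log2(freq_hz / f_high_hz)
    return 0.0


def harmonic_components(f0_hz, f_low_hz, f_high_hz, max_att_db=48.0):
    """Return (harmonic number, frequency, linear gain) for each retained harmonic."""
    components = []
    n = 1
    while True:
        freq = n * f0_hz
        att = attenuation_db(freq, f_low_hz, f_high_hz)
        if att <= max_att_db:
            components.append((n, freq, 10.0 ** (-att / 20.0)))
        elif freq > f_high_hz:
            break  # beyond the upper skirt: all higher harmonics are omitted too
        n += 1
    return components


def synthesize(f0_hz, f_low_hz, f_high_hz, duration_s=0.05, fs_hz=44100):
    """Sum the retained harmonics in sine (0 degree) phase."""
    components = harmonic_components(f0_hz, f_low_hz, f_high_hz)
    n_samples = int(round(duration_s * fs_hz))
    return [
        sum(gain * math.sin(2.0 * math.pi * freq * i / fs_hz)
            for _, freq, gain in components)
        for i in range(n_samples)
    ]
```

Under these assumptions, the MID region retains harmonics 3 to 15 of a 250-Hz F0 and harmonics 8 to 42 of an 88-Hz F0; only the 1500- and 1750-Hz components of the 250-Hz complex fall in the flat passband.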

Experimental Design

All subjects first took part in a preliminary test session during which they could

familiarize themselves with the test procedure and stimuli. In this preliminary session, two

threshold estimates were collected in each of the 15 stimulus conditions, in random order.

Subjects were then divided into five groups of three listeners each. In three of these

groups, listeners were trained in FD with pure tones of either 88, 250, or 1605 Hz. In the two

other groups, the listeners were trained on AMRD in the MID frequency region with

nominal modulation rates of either 88 or 250 Hz. Training lasted two hours a day, three days

per week, for four weeks, with trial-by-trial visual feedback. During these 2-hour training

sessions, subjects had to complete forty-five threshold measurements in a single condition.

For each of the three subjects comprising each experimental group, six threshold

estimates were obtained in each of the fifteen experimental conditions on the week before

training, the week in the middle of the training period, as well as the week after the last

training session. Each threshold estimate was obtained using a three-interval, two-alternative,

forced-choice procedure without visual feedback; a two-down one-up adaptive rule tracked

the 70.7% correct point on the psychometric function (Levitt, 1971). Differences in frequency

(∆F), fundamental frequency (∆F0) or modulation rate (∆Fm) between the standard and signal

stimuli were increased or decreased by a factor of 2 until the fourth turnpoint and by √2

thereafter. The procedure stopped after 16 turnpoints on the psychophysical staircase. The

threshold was estimated as the geometric mean of the last twelve turnpoints, expressed as a

percentage of the nominal frequency, F0 or modulation rate.
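The adaptive rule described above can be illustrated with a small simulation. A two-down one-up track is in equilibrium when the probability of two consecutive correct responses equals 0.5, i.e. when p = sqrt(0.5), about 0.707, which is the point targeted here. The sketch below is only an illustration under stated assumptions: the simulated listener follows a hypothetical Weibull psychometric function (`prob_correct`), the starting difference of 20% is arbitrary, and the exact trial at which the step factor switches from 2 to √2 is one possible reading of the text.

```python
import math
import random


def prob_correct(delta, threshold, slope=2.0):
    """Hypothetical Weibull psychometric function for a two-alternative task.

    Chance level is 0.5; the function is scaled so that
    prob_correct(threshold) equals sqrt(0.5), about 0.707.
    """
    target = math.sqrt(0.5)
    scale = threshold / (-math.log(2.0 * (1.0 - target))) ** (1.0 / slope)
    return 1.0 - 0.5 * math.exp(-((delta / scale) ** slope))


def run_staircase(threshold, start_delta=20.0, n_turnpoints=16, n_averaged=12, rng=None):
    """Simulate one two-down one-up track and return the threshold estimate."""
    rng = rng or random.Random()
    delta = start_delta
    turnpoints = []
    correct_in_a_row = 0
    last_move = 0  # +1 after an increase, -1 after a decrease, 0 before any move
    while len(turnpoints) < n_turnpoints:
        correct = rng.random() < prob_correct(delta, threshold)
        if correct:
            correct_in_a_row += 1
            if correct_in_a_row < 2:
                continue  # wait for a second consecutive correct response
            move = -1
        else:
            move = +1
        correct_in_a_row = 0
        if last_move != 0 and move != last_move:
            turnpoints.append(delta)  # change of direction: record a turnpoint
        # Factor 2 until the fourth turnpoint, sqrt(2) thereafter.
        step = 2.0 if len(turnpoints) < 4 else math.sqrt(2.0)
        delta = delta * step if move > 0 else delta / step
        last_move = move
    last = turnpoints[-n_averaged:]
    # Geometric mean of the last turnpoints.
    return math.exp(sum(math.log(x) for x in last) / len(last))
```

Calling `run_staircase(1.0, rng=random.Random(1))` returns one simulated estimate for a listener whose 70.7%-correct point is at 1%; averaging many such runs shows how closely the track converges on that point.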

Apparatus

The stimuli were output using a 16-bit digital-to-analog converter (TDT DA1) at a

sampling rate of 44.1 kHz. The pink-noise masker was played out using the CD-Rom drive of

a host computer. The signals and noise background were independently low-pass filtered at 15

kHz (TDT FT6-2, attenuation more than 60 dB at 1.15 times the corner frequency) and

attenuated (TDT PA4). Finally, they were summed (TDT SM3) and led to the right or left

earpiece of a Sennheiser HD465 headphone mounted in a 25125 cushion via a headphone

preamplifier (TDT HBC). Stimulus characteristics were controlled using an HP3561A signal

analyzer. Subjects were comfortably seated in a sound-treated booth.

RESULTS

Figure 1. Pre-training discrimination thresholds. The upper left panel shows the relative

frequency discrimination limens (expressed as DLF/F in percent) as a function of the nominal

test frequency (F). The upper right panel shows the relative amplitude-modulation rate

discrimination limens (expressed as DLFm/Fm in percent) in the different frequency

region and nominal modulation rate conditions. The lower left panel shows the relative

fundamental-frequency discrimination limens (expressed as DLF0/F0 in percent) in the

different frequency region and nominal fundamental frequency conditions, sorted in order of

decreasing resolvability, as estimated using a resolvability index detailed in a previous paper

(Grimault et al., 2000). The error bars represent the standard deviation around the geometric

means across subjects.

Figure 1 represents the relative DLFs, DLFms, and DLF0s measured on the pre-

training session. The DLFs, which are shown in the upper left panel, were found to decrease

significantly with increasing test frequency [F(2,22)=253.38, p<0.001], being around 3.15%

on average at 88 Hz, 0.75% at 250 Hz, and 0.32% at 1605 Hz. The DLFms, which are

represented in the upper right panel, were found to vary little across frequency regions and

modulation rates. On some occasions (i.e. on some runs and in some conditions), some

subjects failed to perform the AMRD task correctly, even with the very large initial difference

in AM rate (80%) used in the adaptive procedure. The DLF0s are shown in the lower left

panel. The different test conditions are shown on the abscissa, in decreasing order of

resolvability of the harmonics. According to Shackleton and Carlyon's (1994) definition of

resolvability, the three leftmost data points correspond to resolved conditions while the three

rightmost correspond to unresolved conditions. DLF0s were found to vary significantly across

both frequency regions [F(2,8)=136.18, p<0.001] and F0s [F(1,4)=366.47, p<0.001].

Furthermore, an interaction was observed between these two factors [F(2,8)=20.36, p<0.001].

During this pre-training session, DLFs were found to decrease significantly across runs,

following a linear trend [F(1,11)=10.23, p<0.01]; no such significant effect was noted for

DLFms and DLF0s.
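The resolved/unresolved classification referred to above can be illustrated with a rough calculation. The sketch below is not the resolvability index of Grimault et al. (2000). It assumes the equivalent rectangular bandwidth (ERB) formula of Glasberg and Moore (1990), takes the 10-dB bandwidth of the auditory filter to be about 1.8 ERB, evaluates the filter at the geometric center of each region, and uses limits of fewer than 2 harmonics per filter for "resolved" and more than 3.25 for "unresolved". These constants are our recollection of Shackleton and Carlyon's (1994) criterion and should be checked against the original paper; all function names are hypothetical.

```python
import math

# Corner frequencies (Hz) of the three filtering regions, as given in the text.
REGIONS_HZ = {"LOW": (125.0, 625.0), "MID": (1375.0, 1875.0), "HIGH": (3900.0, 5400.0)}


def erb_hz(fc_hz):
    """Equivalent rectangular bandwidth (Glasberg and Moore, 1990), in Hz."""
    return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)


def harmonics_per_filter(f0_hz, f_low_hz, f_high_hz, bw_factor=1.8):
    """Approximate number of harmonics inside the 10-dB bandwidth of the
    auditory filter centered on the region (bw_factor is an assumption)."""
    fc = math.sqrt(f_low_hz * f_high_hz)
    return bw_factor * erb_hz(fc) / f0_hz


def classify(n_harmonics, resolved_below=2.0, unresolved_above=3.25):
    """Label a condition from the number of harmonics per filter (assumed limits)."""
    if n_harmonics < resolved_below:
        return "resolved"
    if n_harmonics > unresolved_above:
        return "unresolved"
    return "intermediate"


if __name__ == "__main__":
    for region, (f_low, f_high) in REGIONS_HZ.items():
        for f0 in (88.0, 250.0):
            n = harmonics_per_filter(f0, f_low, f_high)
            print(f"{region} {f0:.0f} Hz: {n:.2f} harmonics per filter -> {classify(n)}")
```

With these assumed constants, the LOW 88 Hz, LOW 250 Hz and MID 250 Hz conditions come out as resolved and the MID 88 Hz, HIGH 88 Hz and HIGH 250 Hz conditions as unresolved, which matches the three-versus-three split described in the text.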

Figure 2. Mean pure-tone frequency discrimination and amplitude-modulation discrimination

thresholds measured in the different training groups as a function of block number. The

unconnected data points represent data from (pre-training, intermediate, and post-training)

test sessions and correspond to geometric means over 6 threshold estimates. The connected

points represent data from training sessions and correspond to the geometric means of 30

consecutive threshold estimates. The empty symbols correspond to frequency discrimination

limens. The filled symbols correspond to amplitude-modulation rate discrimination limens.

The error bars represent the standard deviations around the geometric means across subjects.

Figure 2 represents the evolution of perceptual thresholds across the different test

sessions. These thresholds are expressed as percentages of the nominal test frequency (pure-

tone frequency for DLFs or modulation frequency for DLFms). The unconnected data points

correspond to thresholds measured during the pre-training, intermediate, and post-training test

sessions. These results were analyzed using a repeated-measures analysis of variance

(RMANOVA) with the log-transformed relative thresholds as dependent variable and the

training condition and block number as factors. Both factors had a significant main effect

[F(4,10)=71.04, p<0.001 for training condition and F(17,170)=4.149, p<0.001 for block

number], and they interacted significantly [F(68,170)=1.79, p=0.001]. Subsequent ANOVAs

performed on the data corresponding to a given training task (FD or AMRD) revealed

significant effects of nominal frequency [F(1,4)=43.45, p<0.005] and block number

[F(17,68)=7.02, p<0.001], and an interaction [F(17,68)=5.36, p<0.001] for AMRD; for FD,

significant differences were observed across conditions [F(2,6)=18.48, p<0.005], but not

across blocks [F(17,102)=1.34, p=0.18], and no interaction between the two factors was

obtained either [F(34,102)=1.18, p=0.25]. In order to get further insight into these results, one-

way ANOVAs with the block number as factor were performed on the data from the five

different training conditions independently. The only condition in which a significant main

effect of block number was obtained was AMRD at 88 Hz [F(17,34)=7.93, p<0.001]. The

block number factor failed to produce a statistically significant effect in the FD 1605 Hz

condition [F(17,34)=1.72, p=0.087]. In all other conditions, the block number factor did not

produce even a trend. When looking at the individual data of the different subjects however,

thresholds showed a significant linear decrease with increasing block number in three subjects

out of three in the AMRD 88 Hz condition, and two subjects out of three in all other

conditions. In the FD 88 Hz condition, one subject showed a significant linear increase in

thresholds with increasing block number.

Figure 3. Variations in discrimination thresholds during the first learning period. These

variations were computed by dividing the thresholds measured on the pre-training test session

by the thresholds measured on the intermediate test session in the same test condition.

Therefore, threshold improvements are indicated by values larger than 1. The empty symbols

represent the data from all subjects trained in frequency discrimination averaged together.

The filled symbols represent the data from all subjects trained in amplitude modulation rate

discrimination averaged together. The panels are arranged in the same way as in Figure 1,

with frequency-discrimination data shown in the upper left panel, amplitude modulation rate

discrimination data shown in the upper right panel, and fundamental-frequency

discrimination data shown in the lower left panel. The smaller, lower right panel represents

the average variation in threshold for the resolved and unresolved harmonics conditions from

the fundamental-frequency discrimination tests. The error bars represent the standard error

around the geometric means across subjects.

Figure 3 represents the variations in thresholds between the pre-training and the

intermediate test sessions. Data points located above 1 on the ordinate correspond to a

decrease in threshold (i.e. an increase in performance) and vice versa. The empty symbols

represent data from all subjects trained in FD (i.e. with data from subjects trained at 88, 250,

and 1605 Hz pooled together). The filled symbols correspond to data from all

subjects trained in AMRD (at 88 and 250 Hz pooled together). The upper left panel shows the

results of FD tests. DLFs were found to decrease significantly between the two test sessions

[F(2,13)=15.2, p<0.01] in both subjects trained in FD and subjects trained in AMRD. No

significant difference in FD improvement was observed between the two groups

[F(1,13)=1.55, p=0.23]. The improvement in FD was approximately the same at the three test

frequencies [F(1,13)=1.47, p=0.25]. The upper right panel shows the results of AMRD tests.

DLFms were found to be significantly lower on the intermediate than on the pre-training

session for both subjects trained in FD [F(1,4)=10.83, p<0.05] and subjects trained in AMRD

[F(1,3)=13.88, p<0.05]. Subjects trained in AMRD improved more than those trained in FD

[F(1,7)=11.37, p<0.05]. The lower left panel shows the results from F0-discrimination tests.

DLF0s were found to decrease significantly between the two test sessions [F(1,14)=49.43,

p<0.001] in both subjects trained in FD and subjects trained in AMRD. A significant

interaction was observed between the frequency region and the training group factors

[F(2,26)=3.55, p<0.05]. A trend was observed for DLF0s to improve more in resolved than in

unresolved conditions for subjects trained in FD [F(1,8)=4.63, p=0.064]. This is better

illustrated by the smaller panel in the lower right corner of the figure. For subjects trained in

AMRD, neither the frequency region [F(2,10)=3.05, p=0.09] nor the F0 [F(1,5)=0.01, p=0.91]

were found to exert a statistically significant influence on the improvement in DLF0s.
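The improvement measure plotted in these figures (pre-training threshold divided by the later threshold, averaged geometrically across subjects) can be sketched as follows. The threshold values in the example are made up for illustration only and are not data from the study; the function names are hypothetical.

```python
import math


def improvement_ratios(before, after):
    """Per-subject ratio of the earlier to the later threshold (values > 1 mean improvement)."""
    return [b / a for b, a in zip(before, after)]


def geometric_mean(values):
    """Geometric mean, computed as the exponential of the mean log value."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical relative thresholds (% of the nominal value) for three listeners.
pre_training = [3.0, 4.0, 2.4]
intermediate = [1.5, 2.0, 2.4]

ratios = improvement_ratios(pre_training, intermediate)  # two listeners improve, one does not
mean_ratio = geometric_mean(ratios)
```

Working on ratios and geometric means is consistent with the log-transformed thresholds used in the analyses of variance: halving a threshold counts the same whether it goes from 4% to 2% or from 0.4% to 0.2%.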

Figure 4. Variations in discrimination thresholds during the second learning period. These

variations were computed by dividing the thresholds measured on the intermediate test

session by the thresholds measured on the post-training test session in the same test condition.

Other characteristics of the figure are identical to those used in Figure 3.

Figure 4 shows the variations in perceptual thresholds between the intermediate and

the first post-training test sessions. As indicated by the fact that most data points lay close to

or below the dotted horizontal line, little improvement occurred between these two test

sessions and performance even decreased in some cases. For DLFs and DLFms, no

statistically significant variation in threshold was noted, and no significant difference was

observed either between the groups or across test frequencies. The variations in DLF0s

were not found to be significantly different for subjects trained in FD and subjects trained in

AMRD [F(1,13)=0.46, p=0.51]. They were found to differ significantly across frequency

regions for subjects trained in AMRD [F(2,10)=4.82, p<0.05], being largest in the MID region

and smallest in the HIGH region.

Figure 5. Overall variations in discrimination thresholds. These variations were computed by

dividing the thresholds measured on the pre-training test session by the thresholds measured

on the post-training test session in the same test condition. Other characteristics of the figure

are identical to those used in Figures 3 and 4.

The overall improvements in perceptual thresholds measured between the pre- and the

post-training test sessions are shown in Figure 5. For subjects trained in FD, the

improvements in DLFs at the three test frequencies were not found to differ significantly from

each other, but nearly did [F(2,16)=3.34, p=0.06]. For subjects trained in AMRD, no such

trend was observed. Regarding DLFms, although the tendency for subjects

trained in AMRD to improve more than those trained in FD was not significant [F(1,7)=3.88,

p=0.09], the data points corresponding to the former almost always lay above those

corresponding to the latter. Regarding DLF0s, the improvements were overall found to differ

significantly across frequency regions [F(2,26)=4.16, p<0.05]. A trend for an interaction

between the frequency region, the F0, and the subject group (one group trained in AMRD and

one trained in FD) was noted [F(2,26)=2.83, p=0.08]. A contrast analysis performed on the

data of the two groups pooled together further revealed that the improvement in DLF0s was

overall larger in resolved than in unresolved conditions [F(1,14)=9.55, p<0.01]. Considering

the data from the subjects trained in FD independently, the improvement in DLF0s was found to

vary across frequency regions [F(2,16)=4.50, p<0.05], and this variation was itself found to

vary across F0s [F(2,16)=4.12, p<0.05]. Contrast analysis further revealed a significant

difference between the improvements in DLF0s measured in resolved and in unresolved

conditions [F(1,8)=8.59, p<0.05]. This result is illustrated in the small lower right panel.

Figure 6. Variations in pure-tone frequency discrimination thresholds between the pre-

training and intermediate test sessions in the different training groups. The training condition

is indicated within each panel, at the top (FD: frequency discrimination; AMRD: amplitude

modulation rate discrimination). Data for the three subject groups trained in frequency

discrimination are shown on the upper row. Data for the two subject groups trained in

amplitude-modulation rate discrimination are shown on the lower row. The nominal test

frequencies are indicated on the abscissa.

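Figure 7. Variations in amplitude modulation rate discrimination thresholds between the pre-

training and intermediate test sessions in the different training groups. Data for the three

subject groups trained in frequency discrimination are shown on the upper row. Data for the

two subject groups trained in amplitude-modulation rate discrimination are shown on the lower

row. The frequency region and nominal modulation rate conditions are indicated on the

abscissa. The error bars represent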

the standard errors around the geometric means across subjects.

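Figure 8. Variations in fundamental frequency discrimination thresholds between the pre-

training and intermediate test sessions in the different training groups. Data for the three

subject groups trained in frequency discrimination are shown on the upper row. Data for the

two subject groups trained in amplitude-modulation rate discrimination are shown on the lower

row. The frequency region and nominal fundamental frequencies are indicated on the

abscissa. The smaller inset panels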

represent the average threshold variations in the resolved and unresolved harmonics

conditions. The error bars represent the standard errors around the geometric means across

subjects.

Figure 6 shows the variations in FD thresholds between the pre-training and

intermediate test sessions for the different training groups. No significant difference was

observed between the results of the different training groups [F(4,10)=1.00, p=0.45].

Furthermore, the variations at the different test frequencies were not found to differ

significantly in any of the training groups. The variations in DLFms between the same two test

sessions are shown for the different training groups in Figure 7. Basically, no consistent

difference in improvement was observed across training groups and test conditions. Figure 8

represents the variations in DLF0s obtained between the pre-training and intermediate test

sessions in the different training groups. It is interesting to note that in subjects trained in

AMRD at a nominal modulation rate of 88 Hz, the largest improvement in DLF0s was

obtained in the 88-Hz F0 condition in the HIGH region, while in subjects trained in AMRD at

a nominal modulation rate of 250 Hz, the largest improvement in DLF0s was obtained in the

250-Hz F0 condition in the same frequency region. These effects, however, proved not to be

statistically significant using post-hoc t tests. Regarding the results of subjects trained in FD, a

significant interaction between the frequency region and F0 factors was observed for subjects

trained in FD at 1605 Hz [F(2,4)=9.93, p<0.05] and almost obtained for subjects trained at

250 Hz [F(2,4)=6.87, p=0.051]; no such trend was observed in subjects trained in FD at 88

Hz. Finally, as indicated by the inset panels, while larger improvements in DLF0s were

generally obtained in unresolved than in resolved conditions in subjects trained in AMRD, the

converse trend was observed in subjects trained in FD (at 250 and 1605 Hz, but not at 88 Hz).

Figure 9. Variations in pure-tone frequency discrimination thresholds between the

intermediate and post-training test sessions in the different training groups. See the legend of

Figure 6 for details.

Figure 10. Variations in amplitude modulation rate discrimination thresholds between the

intermediate and post-training test sessions in the different training groups. See the legend of

Figure 7 for details.

Figure 11. Variations in fundamental frequency discrimination thresholds between the

intermediate and post-training test sessions in the different training groups. See the legend of

Figure 8 for details.

Figures 9, 10, and 11 represent the variations in DLFs, DLFms, and DLF0s which

occurred between the intermediate and post-training sessions in the different training groups.

These variations remained slight. The only noteworthy point is that for DLF0s, marked

decreases in performance were observed in those conditions in which a marked improvement

had been observed between the pre-training and intermediate test sessions: namely, at 88 Hz

in the HIGH region for subjects trained in AMRD at 88 Hz, and at 250 Hz in the HIGH region

for subjects trained in AMRD at 250 Hz.

DISCUSSION

Comparison of thresholds in the different tasks with literature data

Before considering how the performances in the different tasks were altered by

training, it is worth determining whether the pre-training thresholds measured in this study

are in the range of thresholds measured in earlier studies. Moore (1973) reported DLFs of

about 0.25% at 250 Hz, and about 0.125% at 1000-2000 Hz in trained subjects. The DLFs

measured before training in the present study were substantially larger (namely, around 0.8%

at 250 Hz and 0.3% at 1605 Hz). After training, in the subjects trained in the corresponding

conditions, the DLFs were around 0.25% at 250 Hz and around 0.15% at 1605 Hz, close to the

values reported by Moore (1973). Comparison of the DLFms obtained in this study with

results from the literature is made difficult by the lack of published data on AMRD with stimuli

having the same characteristics as those used here. Using

broadband noise carriers, Formby (1985) obtained DLFms of about 5% for a modulation rate

of 80 Hz, and around 10% at 200 Hz. The DLFms measured before training in the present

study at 88 and 250 Hz were generally larger than this, lying between about 10 and 20%.

This difference may be due to training and/or to the narrower bandwidth of the stimuli used

here. After training, in the subjects trained in AMRD with a modulation rate of 88 Hz, the

mean threshold for this modulation rate was around 5%, and in subjects trained with a

modulation rate of 250 Hz, the mean threshold at 250 Hz was around 10%. With carriers

consisting of 1100-Hz wide bandpass noises around geometric center frequencies of about

900, 2200, and 3400 Hz, Hanna (1992) obtained DLFms of about 3-5 Hz (i.e. around 5%) for

modulation rates of 66 and 100 Hz; for a modulation rate of 224 Hz, the DLFm was around

30-100 Hz (i.e. around 13-40%). Using pure tone carriers with frequencies between

500 and 4000 Hz, Lee (1994) found DLFms between about 1 and 2 Hz (i.e. around 1.25-

2.5%) for a nominal sinusoidal-AM rate of 80 Hz, and between about 1.5 and 8 Hz (i.e.

about 0.5-2.5%) for nominal rates of 160 and 320 Hz. This finding of substantially lower

thresholds for tonal carriers than for noise carriers may be explained by the presence of

marked spectral cues for the former.

Using stimuli similar to those used here, Shackleton and Carlyon (1994) reported

DLF0s of about 1% for resolved harmonics and around 3% for unresolved harmonics,

irrespective of the frequency region (LOW, MID, and HIGH) and of the nominal F0 (88.4 or

250 Hz) used. Before training, we obtained DLF0s between about 0.4 and 0.9% for resolved

harmonics, and between about 2 and 7% for unresolved harmonics. After training in FD or

AMRD, DLF0s dropped to about 0.41% for resolved harmonics, and 3.39% for unresolved

harmonics.

Learning in FD

Considering all carrier frequencies and the whole training period, performance in

FD failed to improve significantly. The lack of improvement following the first training

session was in particular apparent for the 88-Hz nominal carrier frequency condition, for

which an almost flat learning curve was observed. However, this apparent lack of learning in

FD must be tempered by the fact that in all training conditions, performance improved

significantly in two subjects out of three. The apparent overall lack of learning at 88 Hz is in

fact due to one subject showing a significant decrease in performance over time, which

offset the improvement in the two other subjects. Similarly, in the 250 and 1605 Hz

conditions, the average learning effect is attenuated by one listener having a flat curve. Based

on these observations, we may conclude in agreement with earlier publications (Demany,

1985), that, at least in certain subjects, performance in FD improves significantly with practice

over the course of several days and even weeks, depending on the intensiveness of the

training. The learning effects however seem to be largely variable across subjects, with some

listeners failing to improve significantly even when given the opportunity of prolonged,

repeated practice in the task. The factors underlying this intersubject variability in perceptual auditory

learning, and the reasons for the failure of certain subjects to benefit from practice in a given

task and with given stimuli, remain unclear.

Learning in AMRD

Performance in AMRD generally improved over time. This is, to our knowledge, the

first published demonstration that performance in AMRD significantly improves with

practice. The improvement proved to be significant overall in the 88-Hz nominal rate

condition, but not in the 250 Hz condition. However, in this latter condition, two subjects out

of three showed a significant improvement; the lack of significant learning effect in this

condition may thus be attributed to one listener who failed to improve in the task. At 88 Hz, all

three listeners improved significantly in the task. Whether the more marked improvement

observed at 88 than at 250 Hz reflects inter-subject variability or a genuine difference in

learning potential across modulation rates is a difficult question to settle based on the limited

number of subjects involved in the present study. This question should be addressed in future

studies using larger subject samples.

The finding that performance in AMRD can improve significantly over time with

practice is generally consistent with results from other studies in both animals (Schulze et al.,

1998) and humans (Fitzgerald and Wright, 2000). In the latter study, AMRD learning was

found to be specific to modulation rate. Such specificity is not reflected in the present

results. Possible explanations for such a difference include intersubject variability, differences

in the duration and intensiveness of the training (1 hour per day for only six days in the

Fitzgerald and Wright study versus six hours per week for several weeks here), and

differences in the physical stimulus parameters (nominal modulation rates, characteristics of

the carrier, …). The comparison is, however, severely limited by the absence of detailed

information on the methods and individual results in Fitzgerald and Wright (2000).

Transfer of learning between FD and F0D

Since none of the subjects were trained in F0D, any change in DLF0 between the

different test sessions is likely to result from training in either FD or AMRD. Although one

cannot completely exclude the possibility that part of the changes in DLF0s across the

different test sessions resulted from the practice provided by the test sessions themselves,

this possibility is made unlikely by the fact that, in an earlier study, we found no significant

change in DLF0s across test sessions separated by exactly the same number of weeks as here,

in control subjects who did not participate in training sessions. In contrast, the results obtained

in this study revealed a significant overall improvement in DLF0s, which suggests that the

subjects benefited from training in other tasks than F0D.

Specifically, for the subjects trained in FD, the improvement in DLF0 proved to differ

significantly across frequency regions and F0s. On the whole, it was significantly larger in the

three conditions which involved resolved harmonics than in the three conditions which

involved unresolved harmonics. A first possible interpretation of this observation is that F0D

improves generally less for unresolved than for resolved harmonics. This interpretation is

however made unlikely by the fact that, as will be discussed later on, no such difference was

observed for subjects trained in AMRD. A second possible interpretation is that FD training

benefited F0D more for resolved than for unresolved harmonics. This interpretation is

consistent with the general hypothesis that pitch perception is subserved by different

mechanisms for resolved and unresolved harmonics (Shackleton and Carlyon, 1994; Carlyon

and Shackleton, 1994; Plack and Carlyon, 1995; Carlyon, 1998; Grimault et al., submitted

article), and with the particular hypothesis made in the Introduction, that the mechanisms used for

the F0 discrimination of resolved harmonics share a common basis with those involved in the

frequency discrimination of pure tones. This can be further interpreted in different ways. A

first possibility is that the encoding of pure-tone frequency and of the F0 of resolved

harmonics both rely on spectral cues, which are absent, or much less salient, when the

harmonics are unresolved. This possibility appears however unlikely in the light of previous

results (Moore, 1973), which all indicate that spectral models are unable to account for pure-

tone DLFs at low to medium frequencies (below about 4-5 kHz). A second possibility is that,

as suggested by several authors (e.g. Terhardt, 1974; Goldstein, 1973), the encoding of the F0

of resolved harmonics involves as a first stage the encoding of the frequency of each

harmonic, based on fine temporal structure information; training in FD with isolated

components would then improve this part of the F0 computation process, thereby contributing

to improve F0D performance.

In order to try and gain further insight into the transfer of learning between FD and

F0D, it is worth considering the differences in DLF0 improvement between subjects trained in

FD with a nominal frequency corresponding to the F0 and those trained with a frequency

which fell within the frequency region of the harmonics. The finding of a significant

interaction between the F0 and region factors in the two groups of subjects that were trained in

FD at 250 and 1605 Hz confirms the visual observation of larger improvements in DLF0s for

resolved than for unresolved harmonics in these two groups. It is interesting to note that the

groups in which this superiority of learning effects for resolved harmonics was obtained are

those trained in FD using a carrier frequency falling within the frequency range of the

harmonics that made up the complexes (namely, the LOW region for 250 Hz, and the MID

region for 1605 Hz). In contrast, at 88 Hz, the amount of transfer appeared to be almost

identical, independently of the frequency region, nominal F0, and resolvability status of the

harmonics. Based on current knowledge, we cannot offer a clear interpretation for this whole

pattern of results.

Transfer of learning between AMRD and F0D

The data obtained regarding the effect of training in AMRD on performance in F0D

failed to reveal any statistically significant pattern of learning transfer. It is nevertheless

noteworthy that while the data points representing the amount of improvement in DLF0

between the pre-training and intermediate test sessions were generally higher for R than for U

harmonics for subjects trained in FD, the converse pattern was observed for subjects trained in

AMRD. This result, however, could be related to the observed trend for improvements in

DLF0s to be larger in the HIGH region. Although the harmonics were always unresolved in

this region, if learning transfer really depended on resolvability, then larger improvements

should have also been observed in the MID 88 Hz condition. In view of this, our initial

hypothesis according to which training in AMRD benefits F0D more when the harmonics

are unresolved than when they are resolved is not supported and, consequently, no further

support can be provided for the general hypothesis that the F0 encoding of unresolved

harmonics shares more common underlying mechanisms with AMRD than the F0 encoding of

resolved harmonics. Another possible interpretation, which is consistent with the results of a

previous study on learning transfer in F0D (Grimault et al., submitted article), is that specific

training in AMRD, like training in F0D with unresolved harmonics, biased the subjects

toward using a mechanism that is normally used preferentially for unresolved harmonics, but

that may also apply to resolved harmonics.

SUMMARY AND CONCLUSION

The main findings of this study can be summarized as follows:

- In agreement with earlier reports, subjects were found to improve significantly in FD.

Similar amounts of threshold improvement were obtained at all test frequencies, irrespective

of the training frequency. This agrees with the results of a previous study in which learning in

FD was found not to be frequency-specific.

- Subjects can improve significantly with practice in AMRD over the course of several days

and weeks. The learning transfers widely across nominal AM rates and frequency regions.

- Subjects trained in FD at 250 and 1605 Hz showed significantly larger improvements in F0D

when the complex tones were composed of R than when they were composed of U harmonics.

This result is consistent with the hypothesis that F0 encoding is subserved by different

mechanisms depending on the resolvability of the harmonics. However, no clear explanation

can be provided for the lack of differential benefit in DLF0 depending on resolvability for

subjects trained in FD at 88 Hz.

- Training in AMRD did not result in larger improvements in DLF0s for unresolved than for

resolved harmonics. This is contrary to the hypothesis that F0 encoding for unresolved

harmonics is specifically subserved by a process comparable to that involved in AMRD.

REFERENCES

ANSI (1969). ANSI S3.6-1969, Specifications for audiometers. (American National Standards

Institute, New York).

Bregman A.S., Liao C., Levitan R. (1990) Auditory grouping based on fundamental frequency

and formant peak frequency. Can. J. Psychol. 44: 400-13.

Bregman A.S. (1990) Auditory Scene Analysis: The Perceptual Organization of Sound, MIT,

Cambridge, MA.

Burns E.M., Viemeister N.F. (1976) Nonspectral pitch. J. Acoust. Soc. Am. 60, 863-869.

Burns E.M., Viemeister N.F. (1981) Played-again SAM: Further observations on the pitch of

amplitude-modulated noise. J. Acoust. Soc. Am. 70, 1655-1660.

Carlyon, R.P., (1998). The effect of the resolvability on the encoding of fundamental

frequency by the auditory system.

Carlyon, R.P., and Shackleton, T.M. (1994). Comparing the fundamental frequencies of

resolved and unresolved harmonics: Evidence for two pitch mechanisms? J. Acoust.

Soc. Am. 95, 3541-3554.

Demany, L., (1985). Perceptual learning in frequency discrimination. J. Acoust. Soc. Am. 78,

1118-1120.

Fitzgerald M.B., Wright B.A. (2000) Specificity of learning for the discrimination of

sinusoidal amplitude-modulation rate. J. Acoust. Soc. Am. 107, 2916.

Fletcher H. (1940) Auditory Patterns. Rev. Mod. Phys. 12: 47-65.

Goldstein, J.L. (1973) An optimum processor theory for the central formation of the pitch of

complex tones. J. Acoust. Soc. Am., 54, 1496-1516.

Grimault N., Micheyl C., Carlyon R.P., Arthaud P., Collet L. (2000) Influence of peripheral

resolvability on the perceptual segregation of harmonic complex tones differing in

fundamental frequency. Accepted in J. Acoust. Soc. Am.

Grimault N., Micheyl C., Carlyon R.P., Collet L. Evidence for two pitch encoding

mechanisms using a selective auditory training paradigm. Submitted article.

Hanna T. E. (1992) Discrimination and identification of modulation rate using a noise carrier.

J. Acoust. Soc. Am., 91, 2122-2128.

Hartmann W.M. (1988) Pitch perception and the segregation and integration of auditory

entities. In Auditory function. (eds Edelman, G.M., Gall, W.E. and Cowan, W.M.)

623-645 (Wiley, New York).

Levitt, H. (1971). Transformed up-down methods in psychoacoustics. J. Acoust. Soc. Am. 49,

467-477.

Meddis R., O’Mard L. J. (1997) A unitary model of pitch perception. J. Acoust. Soc. Am. 102,

1811-1820.

Meddis, R. and Hewitt, M. (1991). Virtual pitch and phase sensitivity of a computer model of

the auditory periphery: I. pitch identification. J. Acoust. Soc. Am. 89, 2866-2882.

Meddis, R. and Hewitt, M. (1991). Virtual pitch and phase sensitivity of a computer model of

the auditory periphery: II. Phase sensitivity. J. Acoust. Soc. Am. 89, 2883-2894.

Moore, B.C.J. (1973)

Plack C.J., Carlyon R.P. (1995) Differences in frequency modulation detection and

fundamental frequency discrimination between complex tones consisting of

resolved and unresolved harmonics. J. Acoust. Soc. Am. 98, 1355-1364.

Schouten, J.F. (1940) The residue and the mechanism of hearing. Proc. K. Ned. Akad. Wet.,

43, 991-999.

Schouten, J.F. (1970) The residue revisited. In Frequency Analysis and periodicity perception

in hearing (ed. R. Plomp and G.F. Smoorenburg), Sijthoff, Leiden.

Schulze H., Scheich H., Langner G. (1998) Periodicity coding in the auditory cortex: what can

we learn from learning experiments?

Shackleton, T.M., and Carlyon, R.P. (1994). The role of resolved and unresolved harmonics in

pitch perception and frequency modulation discrimination. J. Acoust. Soc. Am. 95,

3529-3540.

Terhardt, E. (1972a) Zur Tonhöhenwahrnehmung von Klängen. I. Psychoakustische

Grundlagen. Acustica, 26, 173-186.

Terhardt, E. (1972b) Zur Tonhöhenwahrnehmung von Klängen. II. Ein Funktionsschema.

Acustica, 26, 187-199.

Terhardt, E. (1974) Pitch, consonance and harmony. J. Acoust. Soc. Am., 55, 1061-1069.

Thurlow, W.R. (1963) Perception of low auditory pitch: a multicue mediation theory. Psychol.

Rev., 70, 515-519.

Walliser, K. (1968) Zusammenwirken von Hüllkurvenperiode und Tonheit bei der Bildung der

Periodentonhöhe, Doctoral dissertation. Technische Hochschule, München.

Walliser, K. (1969a) Zusammenhänge zwischen dem Schallreiz und der Periodentonhöhe.

Acustica, 21, 319-328.

Walliser, K. (1969b) Zur Unterschiedsschwelle der Periodentonhöhe. Acustica, 21, 329-336.

Walliser, K. (1969c) Über ein Funktionsschema für die Bildung der Periodentonhöhe aus dem

Schallreiz. Kybernetik, 6, 65-72.

Whitfield, I.C. (1967) The auditory pathway, Arnold, London.

Whitfield, I.C. (1970) Central nervous processing in relation to spatiotemporal discrimination

of auditory patterns. In Frequency Analysis and periodicity perception in hearing (ed.

R. Plomp and G.F. Smoorenburg), Sijthoff, Leiden.

Chapter 2: Implications and importance of efficient pitch coding for auditory scene analysis.

Article 3: Influence of peripheral resolvability on the perceptual segregation of

harmonic complex tones differing in fundamental frequency.

Nicolas Grimault, Christophe Micheyl, Robert P. Carlyon, Patrick Arthaud and Lionel Collet

SUMMARY:

Studies 1 and 2 established that the resolvability of harmonic complex tones determines which neural mechanisms are used to encode pitch. The two experiments presented here examine the influence of resolvability on the perceptual organization of sound sequences made up of harmonic complex tones differing in fundamental frequency. Using a constant-stimuli method, we determined the fission thresholds of A-B-A... sequences as a function of the difference between the fundamental frequencies of A and B.

In the first experiment, these measurements were made with complex tones having nominal fundamental frequencies of 88 Hz and 250 Hz, filtered into three frequency regions: LOW (125-625 Hz), MID (1375-1875 Hz), and HIGH (3900-5400 Hz). These parameters yield different resolvability conditions independently of the fundamental frequency or the filtering region.

Subjects were able to segregate A from B in the HIGH region, where all harmonics are unresolved. However, the thresholds measured in this condition were poorer than those measured in the LOW and MID regions.

The second experiment indicates that the subjects' ability to segregate A from B in the HIGH region is not due to the use of possible distortion products.

Influence of peripheral resolvability on the perceptual segregation of harmonic complex tones differing in fundamental frequency

Nicolas Grimault
UMR CNRS 5020 Laboratoire "Neurosciences and Systèmes Sensoriels," Hôpital E. Herriot-Pavillon U, 69437 Lyon Cedex 03, France and ENTENDRE Audioprothesists Group GIPA2, Pontchartrain, France

Christophe Micheyl
UMR CNRS 5020 Laboratoire "Neurosciences and Systèmes Sensoriels," Hôpital E. Herriot-Pavillon U, 69437 Lyon Cedex 03, France

Robert P. Carlyon
MRC-Cognition and Brain Sciences Unit, 15 Chaucer Road, Cambridge CB2 2EF, England

Patrick Arthaud
ENTENDRE Audioprothesists Group GIPA2, Pontchartrain, France

Lionel Collet
UMR CNRS 5020 Laboratoire "Neurosciences and Systèmes Sensoriels," Hôpital E. Herriot-Pavillon U, 69437 Lyon Cedex 03, France

(Received 9 April 1999; revised 20 October 1999; accepted 31 March 2000)

Two experiments investigated the influence of resolvability on the perceptual organization of sequential harmonic complexes differing in fundamental frequency (F0). Using a constant-stimuli method, streaming scores for ABA-... sequences of harmonic complexes were measured as a function of the F0 difference between the A and B tones. In the first experiment, streaming scores were measured for harmonic complexes having two different nominal F0s (88 and 250 Hz) and filtered in three frequency regions (a LOW, a MID, and a HIGH region with corner frequencies of 125-625 Hz, 1375-1875 Hz, and 3900-5400 Hz, respectively). Some streaming was observed in the HIGH region (in which the harmonics were always unresolved) but streaming scores remained generally lower than in the LOW and MID regions. The second experiment verified that the streaming observed in the HIGH region was not due to the use of distortion products. Overall, the results indicated that although streaming can occur in the absence of spectral cues, the degree of resolvability of the harmonics has a significant influence. © 2000 Acoustical Society of America. [S0001-4966(00)02807-1]

PACS numbers: 43.66.Ba, 43.66.Fe, 43.66.Hg, 43.66.Mk [SPB]

INTRODUCTION

An important phenomenon in the perceptual organization of sound sequences consists of stream segregation. This refers to the fact that, under certain conditions, sound sequences can give rise to the perception of two or more auditory streams (Miller and Heise, 1950; Bregman and Campbell, 1971; van Noorden, 1975; Anstis and Saida, 1985). It can be experienced each time one listens to music and follows a given instrument among the orchestral background. In laboratory conditions, it is traditionally investigated using simplified stimuli consisting of a repeating sequence of "A" and "B" tones (e.g., van Noorden, 1975); when the stimulus repetition rate is rapid enough, or the frequency separation between the "A" and "B" tones large enough, the sequence breaks down into two perceptual streams. The minimum frequency separation between "A" and "B" tones for which two streams can be heard when the listener is trying to attend to one or the other subset of elements has been dubbed the "fission" boundary (van Noorden, 1975).

To date, the mechanisms underlying this phenomenon remain largely unknown. While certain authors have suggested that streaming is a central phenomenon (Bregman, 1990), others have proposed that it is determined to a large extent by the functioning of peripheral mechanisms (Beauvois and Meddis, 1996). One question, in particular, concerns the role of peripheral auditory filtering in streaming. Hartmann and Johnson (1991) have proposed that beyond differences in the physical characteristics of the sounds, streaming is determined by parallel bandpass filtering, i.e., "channeling" of incoming sounds by the auditory periphery. Basically, sounds falling in different auditory channels are easily segregated, while sounds occupying successively the same auditory filters are less likely to be allocated to different auditory streams. This view is supported by the results of early experiments. Computer models based on this "channeling" principle can account successfully for a variety of experimental data on streaming (Beauvois and Meddis, 1996; McCabe and Denham, 1997). On the other hand, however, some experimental results demonstrate that signal features not related to channeling can affect stream segregation. For example, it has been shown that differences in temporal envelope between sounds having the same frequency content can promote streaming (Iverson, 1995) and that the segregation boundary can be shifted by temporal envelope factors (Singh and Bregman, 1997). Therefore, at present, the extent to which streaming depends on peripheral filtering remains unclear.

J. Acoust. Soc. Am. 108 (1), July 2000, p. 263; 0001-4966/2000/108(1)/263/9/$17.00 © 2000 Acoustical Society of America

The question of the influence of peripheral frequency resolution on streaming has been addressed recently by Rose and Moore (1997). Using repeating ABA sequences, these authors measured the fission boundary in normal-hearing and hearing-impaired subjects. Based on the notion that streaming depends on peripheral frequency selectivity (Hartmann and Johnson, 1991; Beauvois and Meddis, 1996) and that cochlear hearing impairment is associated with reduced frequency selectivity, one prediction was that the fission boundary would be larger in hearing-impaired than in normal-hearing subjects. The results in normal-hearing listeners indicated that the fission thresholds at different center frequencies were independent of the frequency difference between the A and B tones when expressed in terms of ERBs, a common measure of auditory-filter bandwidth; this argues for the hypothesis that streaming depends on frequency selectivity. However, the results in hearing-impaired subjects revealed a much less clear pattern, which did not allow the hypothesis to be confirmed.

One problem with the use of pure tones to study the role of peripheral frequency resolution on streaming comes from the fact that, for such tones, changes in frequency are strongly correlated with changes in pitch; consequently, these two factors cannot be disentangled. Complex tones, on the contrary, can vary by their fundamental frequency (F0, which largely determines virtual pitch) and/or their spectral locus, corresponding to the region in which the harmonics are filtered. Early experiments by van Noorden (1975) indicated that F0 played no significant role in streaming, in contrast to the spectral locus of the harmonics. In particular, it was shown that alternating complex tones that had the same F0 but that were composed of different sets of harmonics gave rise to two perceptual streams, one having a tinnier or brighter quality than the other. However, as pointed out by Bregman (1990), this experiment did not give F0 a "fair chance" as a potential factor of stream segregation given the known large influence of spectral differences. Later experiments concerned with the respective influence of F0 and spectral locus on streaming questioned this conclusion and suggested that these two factors both had a significant influence on streaming (Singh, 1987; Bregman et al., 1990; Singh and Bregman, 1997). For example, Bregman and Levitan (cited in Bregman, 1990) and Bregman et al. (1990) found an effect of F0 on streaming in a study which measured streaming as a function of differences in F0 and peak position for harmonic complexes with a formantlike spectral envelope. However, as in the studies by Singh and by Bregman, they used resolved complexes, and so differences in F0 would have covaried with differences in the excitation patterns of the complexes.

This question of the influence of resolvability on the streaming of complex tones has recently become the object of increased interest. Very recently, Vliegen and Oxenham (1999) reported effects of F0 on streaming using complex tones consisting entirely of unresolved harmonics. They concluded that streaming can be mediated by F0 differences in the absence of excitation-pattern cues, and, indeed, reported that streaming was not reduced relative to a condition with resolved harmonics. This absence of an effect of resolvability is somewhat surprising because, as they pointed out, the virtual pitch percept produced by unresolved harmonics is considerably weaker than that obtained with resolved harmonics (Houtsma and Smurzynski, 1990; Shackleton and Carlyon, 1994). In a more recent study, Vliegen et al. (1999) showed that streaming induced by gross spectral differences, which were produced by filtering the harmonics in different frequency regions, was more potent than streaming induced by F0 differences in the absence of spectral cues. They suggested that the difference between these results and those obtained by Vliegen and Oxenham (1999) might be due to the fact that in that earlier study, stream segregation was advantageous (i.e., leading to better performance), whereas in the Vliegen et al. (1999) study it was detrimental. Unfortunately, the latter study did not include a condition in which the harmonics of the A and B complex tones were resolved and filtered in the same frequency region; therefore, the proposed explanation for the differences between the outcomes of the two studies may have been confounded by differences in the cues available to the listeners to perform the tasks (i.e., local spectral cues in the former study versus global spectrum or timbre cues in the latter).

Indirect evidence for the fact that harmonic resolvability influences streaming even when stream segregation is advantageous for the listeners has been provided in a study by Micheyl and Carlyon (1998), and recently confirmed by Gockel et al. (1999). These authors have shown that the F0 discrimination of target complex tones can be substantially impaired by preceding and following complex tones having a slightly different F0, and that this temporal interference effect is significantly larger when all complexes are made of unresolved harmonics than when they contain resolved harmonics. They paralleled this finding to the informal observation that in the unresolved condition, the listeners could not stream apart the target from the interfering complexes, whereas they could in the resolved conditions.

The present study investigated further the effect of resolvability on auditory stream segregation using a task and instructions which encouraged the use of a neutral criterion by the listeners, namely, whether the sequences sounded more like one or two streams. Stream segregation of complex tones was measured as a function both of F0 and of the frequency region into which the tones were filtered. The interaction between these two factors determined the extent to which the components in each complex were resolved by the peripheral auditory system in a way which has been measured in some detail (Shackleton and Carlyon, 1994), thereby allowing us to examine the effects of resolvability per se, independently of either frequency region or F0.


I. GENERAL METHODS

A. Procedure

Stream segregation was measured using a constant-stimuli procedure. Following a paradigm devised by van Noorden (1975), subjects were presented with repeating ABA tone sequences, where "A" and "B" represent tones of either the same or a different frequency. Subjects were instructed to indicate whether, at the end of the 4-s sequence, they heard either a single auditory stream with a galloping rhythm or two independent streams. Subjects indicated their response by pressing "1" or "2" on a computer keyboard. The program did not accept responses until completion of the whole sequence, and waited for the response before presenting the next sequence. Bregman (1978) has shown that streaming is a cumulative process, i.e., that it takes time for the listener to decide that there are two independent streams. He estimated the time constant of the process to be around 4 s. Over longer durations, spontaneous reversals in the percept have been shown to occur (Anstis and Saida, 1985). Accordingly, the stimulus duration was chosen in this study so that streaming was nearing its maximum at the end of the stimulus sequence, just as subjects had to indicate their response.

Overall, five or six different frequency separations between the A and B tones were presented, including a no-difference condition (control condition for false-alarm rate). These different stimulus conditions were presented ten times each, in random order. Tests began with a demonstration wherein the subjects could hear examples of sequences leading unambiguously to a single-stream or to a two-stream percept.
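The constant-stimuli schedule described above can be sketched as follows. This is only an illustration under stated assumptions, not the software used in the study: the function names, the particular list of separations, and the simulated listener are hypothetical. Only the ten repetitions per condition, the random order, and the "1"/"2" response coding come from the text.

```python
import random
from collections import defaultdict

def build_schedule(separations, repetitions=10, seed=None):
    """Return a randomly ordered trial list with each separation repeated equally often."""
    rng = random.Random(seed)
    trials = [sep for sep in separations for _ in range(repetitions)]
    rng.shuffle(trials)
    return trials

def streaming_scores(trials, responses):
    """Percent of 'two streams' responses (coded 2) for each separation."""
    counts = defaultdict(lambda: [0, 0])  # separation -> [n_two_streams, n_trials]
    for sep, resp in zip(trials, responses):
        counts[sep][0] += (resp == 2)
        counts[sep][1] += 1
    return {sep: 100.0 * two / n for sep, (two, n) in counts.items()}

# Hypothetical F0 separations in octaves; 0.0 stands for the no-difference control.
SEPARATIONS = [0.0, 0.5, 1.0, 1.5, 2.0]
schedule = build_schedule(SEPARATIONS, repetitions=10, seed=1)

# Placeholder listener (an assumption): reports two streams from one octave upward.
fake_responses = [2 if sep >= 1.0 else 1 for sep in schedule]
scores = streaming_scores(schedule, fake_responses)
```

With real data, the score obtained in the no-difference condition would serve as the estimate of the false-alarm rate.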

B. Material

Two testing systems were used. With the first, Tucker-Davis-Technologies-based system, signals were generated digitally in the time domain and output through a 16-bit digital-to-analog converter (TDT DA1) at a sampling rate of 44.1 kHz. A pink-noise background was generated digitally, recorded on CD, and played out continuously throughout the experiment (Sony CDP-XE300). The signals and background noise were low-pass filtered (TDT FT6-2, attenuation more than 60 dB at 1.15 times the corner frequency) at 15 kHz. They were then led to two separate programmable attenuators (TDT PA4). The outputs of the attenuators were summed (TDT SM3) and led to a Sennheiser HD465 headphone via a headphone buffer (TDT HBC). The subject was comfortably seated in a sound booth.

The second system consisted of an Interacoustics AC30 audiometer. The same sound files as used with the other testing system were used. The masker was produced using the same prerecorded CD, played from the computer CD-ROM drive. Signals were output via a 16-bit digital-to-analog converter. The masker and signals were then attenuated and added using the AC30 audiometer before being led to one earpiece of Sennheiser HD465 headphones. Signal characteristics at the output of the two test systems were monitored using an HP3561A signal analyzer.


II. EXPERIMENT 1

A. Rationale

The aim of experiment 1 was to test systematically the influence of resolvability on streaming elicited by F0 differences. To that end, differences in F0 were varied independently of differences in spectral regions. Three different frequency regions, defined by Shackleton and Carlyon (1994) and used in several subsequent studies (Carlyon and Shackleton, 1994; Carlyon, 1996a, b; Micheyl and Carlyon, 1998), were used here. A prediction inspired by the results of Micheyl and Carlyon (1998) was that streaming should decrease with decreasing resolvability.

B. Subjects

Seven subjects took part in the experiment. They ranged in age between 22 and 29 years (mean = 25.7, s.d. = 2.7). They all had normal hearing, i.e., absolute pure-tone thresholds at or below 15 dB HL at octave frequencies from 250 to 8000 Hz (ANSI, 1969). Four subjects were tested with the F0 of signal A set to 88 Hz; for the other three subjects, the F0 was set to 250 Hz.

C. Stimuli

The stimuli consisted of 4-s sequences of harmonic complex tones. Each sequence was formed by the repetition of three 100-ms complex tones (A-B-A) occurring immediately after each other. The three-tone sequences were separated by a 100-ms silent interval. The tones were gated with 20-ms raised-cosine ramps. The F0 of signal A was fixed at 88 or 250 Hz, whereas that of signal B varied between 88 and 352 Hz in half-octave steps. The signals were bandpass filtered digitally. The digital filter had a flat top and 48 dB/oct slopes. Depending on the condition, the filter lower and upper corner frequencies were set to 125 and 625 Hz, 1375 and 1875 Hz, or 3900 and 5400 Hz. These values correspond to the LOW, MID, and HIGH frequency regions of a previous study by Shackleton and Carlyon (1994). They showed that complexes with an F0 of 88 Hz were resolved in the LOW region and unresolved in the MID and HIGH regions, whereas those with F0s of 250 Hz were resolved in the LOW and MID regions and unresolved in the HIGH region. (Resolvability was defined as the number of harmonics falling within the 10-dB-down bandwidth of an auditory filter in the center of each region; this was lower than two for the resolved complexes and higher than 3.25 for the unresolved complexes. In addition, manipulating the phase of the unresolved but not of the resolved complexes could influence pitch.) The signal level was set to 40 dB above the threshold in quiet measured using a sequence composed of signals filtered in the MID region, with F0s of 88 and 250 Hz for A and B, respectively. For convenience, this level will be referred to as 40 dB SL in the following (see footnote 1). All signals were presented in a pink-noise background. The level of this noise was set 10 dB above its absolute detection threshold, which was measured beforehand in each subject.
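The stimulus parameters and the resolvability criterion quoted above can be checked with a short sketch. Several ingredients are assumptions that do not come from the text and are flagged in the comments: the auditory-filter bandwidth uses the ERB formula of Glasberg and Moore (1990), the 10-dB-down bandwidth is taken as 1.8 times the ERB, and the center of each region is taken as the arithmetic mean of its corner frequencies. All function names are hypothetical.

```python
import math

# Corner frequencies (Hz) of the three filtering regions, as given in the text.
REGIONS = {"LOW": (125.0, 625.0), "MID": (1375.0, 1875.0), "HIGH": (3900.0, 5400.0)}

# One A-B-A triplet (3 x 100 ms) plus the 100-ms silence lasts 400 ms.
TRIPLETS_PER_SEQUENCE = 4000 // (3 * 100 + 100)

def half_octave_series(f0_min=88.0, f0_max=352.0):
    """F0 values from f0_min to f0_max in half-octave steps."""
    n_steps = round(2 * math.log2(f0_max / f0_min))
    return [f0_min * 2 ** (k / 2) for k in range(n_steps + 1)]

def harmonics_in_passband(f0, region):
    """Harmonic numbers whose frequencies lie between the corner frequencies."""
    lo, hi = REGIONS[region]
    return [n for n in range(1, int(hi // f0) + 1) if lo <= n * f0 <= hi]

def erb_hz(fc_hz):
    """ERB of the auditory filter (ASSUMPTION: Glasberg and Moore, 1990 formula)."""
    return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)

# ASSUMPTION: conversion from ERB to 10-dB-down bandwidth; not stated in the text.
BW10_PER_ERB = 1.8

def harmonics_per_filter(f0, region):
    """Number of harmonics inside the 10-dB-down bandwidth at the region center."""
    lo, hi = REGIONS[region]
    center = (lo + hi) / 2.0  # ASSUMPTION: arithmetic center of the passband
    return BW10_PER_ERB * erb_hz(center) / f0

def classify(f0, region):
    """Apply the criterion quoted in the text: below 2 resolved, above 3.25 unresolved."""
    n = harmonics_per_filter(f0, region)
    if n < 2.0:
        return "resolved"
    if n > 3.25:
        return "unresolved"
    return "intermediate"

for f0 in (88.0, 250.0):
    for region in REGIONS:
        print(f0, region, harmonics_in_passband(f0, region), classify(f0, region))
```

Under these assumptions the classification agrees with the pattern reported in the text: 88-Hz complexes come out resolved only in the LOW region, and 250-Hz complexes come out resolved in the LOW and MID regions.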


D. Results

The results of experiment 1 obtained when the F0 of the A tones was 88 and 250 Hz are shown in the left- and right-hand panels of Fig. 1, respectively. These results indicate at first sight that although differences in F0 are an important factor for streaming, there are other sources of variation. In particular, overall higher percents of segregation (corresponding to larger percentages of "two streams" responses) were observed in the LOW region than in the MID region, and in the MID region than in the HIGH region. Also, the way in which streaming scores varied as a function of the F0 separation between tones A and B appeared to be different across regions. In order to assess the significance of these observations, two-way repeated-measures ANOVAs were performed separately on the data obtained at each nominal F0.

The results revealed that at F0A = 88 Hz there was, in addition to a significant effect of the F0 separation [F(4,12) = 94.55, p < 0.001], a significant effect of the frequency region in which the stimuli were filtered [F(2,6) = 9.26, p < 0.05]. There was no significant interaction between these two factors [F(8,24) = 1.46, p = 0.22]. At F0A = 250 Hz, a significant effect of the frequency region [F(2,4) = 6.98, p < 0.05] and F0 separation [F(4,8) = 14.24, p = 0.001] was observed. In contrast to the results for the F0A = 88 Hz condition, a significant interaction between the frequency region and F0 separation [F(8,16) = 4.21, p = 0.007] was obtained.
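The degrees of freedom of these F ratios can be checked against the design. The sketch below assumes a standard two-way, fully within-subject ANOVA in which each effect is tested against its own effect-by-subject interaction; the text does not state how the error terms were formed, and the function name is hypothetical. Five F0-separation levels are assumed because that is what the reported degrees of freedom imply.

```python
def rm_anova_df(n_subjects, levels_a, levels_b):
    """Return (effect df, error df) for each term of a two-way within-subject ANOVA."""
    df_a = levels_a - 1
    df_b = levels_b - 1
    df_ab = df_a * df_b
    df_subj = n_subjects - 1
    return {
        "A": (df_a, df_a * df_subj),
        "B": (df_b, df_b * df_subj),
        "AxB": (df_ab, df_ab * df_subj),
    }

# A = frequency region (3 levels), B = F0 separation (5 levels, assumed).
df_88 = rm_anova_df(n_subjects=4, levels_a=3, levels_b=5)    # four subjects at 88 Hz
df_250 = rm_anova_df(n_subjects=3, levels_a=3, levels_b=5)   # three subjects at 250 Hz
print(df_88)
print(df_250)
```

The values returned match the reported F(2,6), F(4,12), and F(8,24) for the four subjects tested at 88 Hz, and F(2,4), F(4,8), and F(8,16) for the three subjects tested at 250 Hz.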

FIG. 1. Streaming scores as a function of F0 separation in the LOW, MID, and HIGH regions. Left-hand panel: data obtained with F0A = 88 Hz. Right-hand panel: data obtained with F0A = 250 Hz. The horizontal scale shows the distance in octaves between the F0s of A and B; negative values correspond to cases where the F0 of A was below that of B. The vertical scale represents streaming scores expressed in percent of "two stream" responses; the larger the score, the better the streaming performance. The parameter was the filtering region. Filled circles and continuous lines correspond to data in the LOW region. Squares and dashed lines correspond to data in the MID region. Circles and dotted lines correspond to data in the HIGH region. The error bars show the standard error of the mean across subjects.

In order to investigate the existence of quantitative relationships between the degree of resolvability of the stimuli and the streaming scores in the different conditions, we computed a "combined resolvability index" (CRI). This index, the mathematical details of which are given in the Appendix, depends on the interaction between auditory filter bandwidth (which covaries with the frequency region) and the F0s of the A and B sounds. It varies between 0 (fully unresolved) and 1 (fully resolved). Table I indicates the CRI and percent of "two stream" judgments for different combinations of F0s between the A and B tones, for cases where the F0 difference is constant and equal to half an octave. Note that both the CRI and segregation rates are greatest at high F0s and in low-frequency regions. A strong correlation was found between these two variables (r = 0.95, p < 0.005, N = 6), which does not appear to be due to either F0 or frequency region alone. For example, the CRI and segregation scores are both higher in the third than in the fourth row of Table I even though the stimuli are all filtered into the MID region; conversely, both scores are higher in the first than in the fifth row, even though the F0s of the stimuli are the same. This general pattern of results is consistent with the idea that resolvability, rather than F0 or frequency region per se, has an effect on streaming by F0 differences. Table II shows the CRI and percents of segregation for F0 separations of half an octave below and above 250 Hz in the different frequency regions. Here again, a strong correlation was obtained (r = 0.93, p < 0.01, N = 6).
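The two correlations can be recomputed from the CRI and percent columns of Tables I and II. The sketch below assumes an ordinary Pearson product-moment coefficient, which the text does not name explicitly; the function and variable names are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# (CRI, percent of "two stream" responses), in the row order of each table.
TABLE_I = [(0.8953, 86.67), (0.6426, 40.0), (0.3527, 40.0),
           (0.0155, 15.0), (0.0007, 10.0), (0.0, 2.5)]
TABLE_II = [(0.9458, 86.67), (0.8953, 86.67), (0.5912, 70.0),
            (0.3527, 40.0), (0.0265, 43.33), (0.0007, 10.0)]

r_table_1 = pearson_r(*zip(*TABLE_I))
r_table_2 = pearson_r(*zip(*TABLE_II))
print(round(r_table_1, 2), round(r_table_2, 2))  # 0.95 0.93
```

With only six points per table, these coefficients are sensitive to any single condition, which is worth keeping in mind when reading the reported p values.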

E. Discussion

The results of this experiment are in agreement with those of previous studies indicating that differences in F0 can be used to stream harmonic complexes (Singh, 1987; Bregman et al., 1990; Singh and Bregman, 1997). In particular, the present finding that streaming can occur even when spectral cues are not available to the listeners (as in the HIGH frequency region) supports the conclusion of Vliegen and Oxenham (1999). However, our results differ from theirs in showing that streaming is enhanced when the components of each complex are resolvable by the peripheral auditory system.

Some other indirect evidence for an effect of resolvabity on streaming is provided by the results of two rece

e-The-to

ess

TABLE I. CRI and experimental percent of segregation forF0 separationsof 20.5 or 10.5 octave in the three different frequency regions testedexperiment 1.

F0A F0B

Frequencyregion CRI Percent

250 176 LOW 0.8953 86.6788 125 LOW 0.6426 40

250 176 MID 0.3527 4088 125 MID 0.0155 15

250 176 HIGH 0.0007 1088 125 HIGH 0 2.5

TABLE II. CRI and experimental percent of segregation forF0 separationsof 60.5 octave around 250 Hz in the three different frequency regions tein experiment 1.

F0A F0B

Frequencyregion CRI Percent

250 352 LOW 0.9458 86.67250 176 LOW 0.8953 86.67250 352 MID 0.5912 70250 176 MID 0.3527 40250 352 HIGH 0.0265 43.33250 176 HIGH 0.0007 10


e--p

err

dicte

thetto

orspaviGth

he

neaur

na

ent

leon-arinbt

s-eg

to

bodi

0

u

dun-aveolv-

d-he

areuesheyrs,

ta-buts

canedhat-

o aicaloryomey toThiseri-

of

ualionom-out

ub-ear;di-

uesisentgh-sultspo-fre-

esveler-and

s at

ntalondplexit at

studies~Micheyl and Carlyon, 1998; Gockelet al., 1999!,which revealed that, in the LOW and MID frequency rgions, theF0 discrimination of a harmonic complex is impaired by preceding and succeeding complexes, i.e., temral ‘‘fringes,’’ having a similarF0, but not by fringes havinga widely differentF0. In contrast, in the HIGH region, wherall complexes were unresolved, interference effects occueven between fringes and targets differing widely inF0. In-formal observations made during the course of these stuindicated that the conditions in which interference effeoccurred corresponded to those in which the fringe-targfringe sequences could not be split into two streams;was, in particular, the case when the fringes and targets wfiltered in the same frequency region, were presented tosame ear, and had a similarF0. Thus, it was proposed thathe F0 of the target could not be encoded independentlythat of the fringes when it formed part of the same auditstream. Consequently, the finding that interference effectF0 discrimination occurred even for large target-fringe serations in the HIGH region was interpreted as indirect edence for the fact that streaming was less easy in this HIregion, unresolved condition. The present results supportinterpretation.

A possible reason for the different outcomes of the present study and that of Vliegen and Oxenham (1999), which indicated no significant influence of resolvability on stream segregation, may come from the instructions given. Vliegen and Oxenham's listeners were told to "try to hear out tone B separately from tone A," whereas our procedure encouraged a more "neutral" criterion (whether the sequence sounded more like one or two streams at the end). The task of trying to hear two streams is different from that of trying to hold on to a coherent percept (van Noorden, 1975; Bregman, 1990), which the neutral criterion used here may have encouraged the listeners to do. Also, the frequency separation at the temporal coherence boundary, where the listener is trying to hold on to the percept of a single stream, has been shown to be highly sensitive to the tone repetition rate (van Noorden, 1975). In fact, it has been suggested that the temporal coherence and the fission boundaries reflect different phenomena, the former indicating the point above which the auditory system is forced to segregation by automatic primitive processes, while the second indicates the limit of the attention-based component of streaming (Bregman, 1990). Consequently, it is conceivable that stimulus-related factors, like repetition rate and resolvability, have a larger influence on streaming when listeners are not trying to hear out two streams. However, there is no evidence at present for the existence of an interaction between factors like repetition rate and resolvability.

Another possible reason for the apparent discrepancy between the results of Vliegen and Oxenham and those obtained here is that, in the present study and the preceding ones by Micheyl and Carlyon (1998) and Gockel et al. (1999), a larger F0 range was used (88 to 250 Hz) than in the study of Vliegen and Oxenham (100 to 189 Hz). We computed that while the minimum CRI in all studies is 0 (corresponding to a fully unresolved condition), the maximum CRI is 0.31 in Vliegen and Oxenham's study versus 0.94 in ours (given that 1 corresponds to a fully resolved condition). Thus, the use of more extreme resolved and unresolved harmonic conditions in the present study may have promoted the emergence of significant influences of resolvability.

Three interpretations can be invoked to explain the finding that although harmonic resolvability influences the streaming of complex tones differing in F0, streaming based on F0 differences can occur even when the harmonics are unresolved. According to a first interpretation, spectral cues are not absolutely necessary for streaming to occur, but they contribute to the phenomenon, together with other factors, namely, F0 differences. According to a second interpretation, streaming does not depend directly on spectral cues but on virtual pitch per se; the fact that streaming scores are larger for resolved than for unresolved harmonics can then be explained by the fact that the virtual pitch derived from resolved harmonics is generally more robust than that derived from unresolved harmonics (Houtsma and Smurzynski, 1990; Shackleton and Carlyon, 1994). These two interpretations are considered further in Sec. IV. According to a third interpretation, although the components in the physical stimulus could not be resolved by the peripheral auditory system, distortion products were generated by the ear; some of these combination tones were low enough in frequency to be resolved and may thus have provided spectral cues. This third interpretation was further tested in a second experiment, described in the next section.

III. EXPERIMENT 2

A. Rationale

The results of experiment 1 indicate that sequences of sounds differing in F0 can still be split into different streams by the auditory system, even when the individual components of the sounds fall in the same frequency region and are unresolved. Nevertheless, although the physical components of the sounds were unresolved, one may not rule out the possibility that distortion products corresponding to subharmonics of these components were generated by the ear; these combination tones, falling in a region where the auditory filters were narrower, may have provided spectral cues as to the F0 differences between the A and B stimuli. This would, in particular, be the case if an internal component corresponding to the fundamental frequency of the high-frequency complex was generated by the ear. Recent results suggest that amplitude-modulated high-frequency components can give rise to a strong combination tone at the frequency of the modulation (Wiegrebe and Patterson, 1999). Earlier data in the literature indicate that combination tones produced by two-tone complexes are audible when the level of the primaries is between about 40 and 70 dB SL on average; however, there are large variations between subjects and some subjects can apparently detect combination tones at primary levels as low as 20 dB SL (Plomp, 1965). Similarly, combination tones corresponding to the missing fundamental of complexes composed of all harmonics between the second and the tenth can be detected when the level of the complex is on average 57 dB SL, but some subjects could detect it at about 30 dB SL (Plomp, 1965). On the basis of these data, it cannot be completely excluded that some listeners can hear combination tones when presented with a 40 dB SL harmonic complex, as was the case in experiment 1.

In their recent article, Vliegen and Oxenham (1999) estimated that a pink-noise background with a spectrum level of 25 dB at 1 kHz ensured that the distortion products elicited by their harmonic complexes were masked. The overall level of their complexes being fixed at 70 dB SPL, the SPL per component in the passband varied between around 52 and 61, depending on the F0 and frequency region tested. Thus, at 1 kHz, the component level was between 27 and 36 dB above the level of the noise. In experiment 1 of the present study, the overall level of the stimuli in the MID region was estimated to be around 52 dB SPL and the SPL per component in the passband varied between about 44 and 49. The estimated spectrum level of the noise at 1 kHz was 9.7 dB.¹ Thus at this frequency, the component level was between 35 and 39 dB above the noise level, and it cannot be concluded that distortion products were inaudible in that experiment.

Consequently, we performed a second experiment in which we first reduced the signal level by 10 dB, thereby making the signal-to-noise ratio 10 dB smaller than in experiment 1, and similar to that used by Vliegen and Oxenham (1999). Then, keeping this new signal-to-noise ratio, we ran a second condition in which we increased both the signal and noise levels by 20 dB, which were then comparable to those used in Vliegen and Oxenham (1999).

B. Subjects

Four subjects with normal hearing (thresholds less than 15 dB HL at conventional audiometric frequencies between 250 and 8000 Hz), who had all taken part in experiment 1, participated in experiment 2. They were aged between ... and 29 years.

C. Stimuli

The stimuli were the same as those used in experiment 1 (open circles) in the HIGH frequency region condition with F0A = 88 Hz, except for a change in level. Whereas in experiment 1 the signal and pink-noise background levels were 40 and 10 dB SL, respectively, in this experiment they were set either to 30 and 10 dB SL, or to 50 and 30 dB SL, respectively.

D. Results

The streaming scores obtained at the two test levels are shown in Fig. 2, along with the results from experiment 1 (HIGH region, F0A = 88 Hz). The data in these three conditions (the two conditions of experiment 2 plus that of experiment 1) were analyzed using a two-way repeated-measures ANOVA. As in the previous experiment, a strong effect of F0 separation on streaming was observed [F(4,12) = 73.32, p < 0.001], but no statistically significant difference was found between the three conditions tested [F(2,6) = 1.61, p = 0.28]. No significant interaction between condition and F0 separation was noted either [F(8,24) = 0.64, p = 0.74].



E. Discussion

The fact that the streaming scores for the HIGH frequency complex were not significantly reduced by a 10-dB decrease in signal-to-noise ratio, even when the signal level was raised to 50 dB SL, argues against the hypothesis that distortion products are necessary for the streaming of unresolved, high-frequency harmonics. This outcome is in broad agreement with the recent findings of Vliegen and Oxenham (1999). The agreement between their results and ours on this point is further supported by the fact that in the second condition of our experiment 2, the signal levels and signal-to-noise ratios were comparable to those used in their experiment 1a and, yet, streaming scores were not significantly different from those measured at the lower levels used in our experiments 1 and 2. In more general terms, the finding that a 20-dB increase in signal level with the same signal-to-noise ratio had no significant effect on streaming suggests that the signal level, independently of the signal-to-noise ratio, is not an important factor in the streaming of harmonic complexes, at least over the 30 to 50 dB SL range.

IV. SUMMARY AND CONCLUSION

Experiment 1 compared streaming in different resolvability conditions in the same, normal-hearing subjects. Streaming scores were found to decrease overall with increasing frequency region, being in some instances significantly larger in the LOW and MID than in the HIGH frequency region. Furthermore, streaming scores appeared to be significantly correlated with a computed resolvability index taking into account the combined resolvability of the A and B tones forming the test sequences. However, the results of this experiment and those of experiment 2 also indicated that completely unresolved harmonic complexes could still give rise to two perceptual auditory streams, even in conditions where subjects were unlikely to use combination tones. These results confirm the recent demonstration by Vliegen and Oxenham (1999) that streaming of complex tones differing in F0 can occur on the sole basis of temporal cues. However, they differ from the results of these authors in showing that the degree of resolvability of the harmonics has a significant influence on streaming. This outcome is consistent with other recent results which suggest that streaming is substantially weaker for unresolved than for resolved harmonics (Micheyl and Carlyon, 1998; Gockel et al., 1999). The present results further indicate that an effect of resolvability on stream segregation can be observed even if the task and instructions encourage the use of a neutral criterion by listeners. Therefore, the explanation proposed by Vliegen et al. (1999) to explain the difference between their conclusions and those reached by Vliegen and Oxenham (1999) may not be valid.

FIG. 2. Streaming scores as a function of F0 difference in three different level conditions in the HIGH region. The abscissa shows the distance in octaves between the F0s of A and B. The ordinate represents the percentage of segregation; the larger the score, the better the streaming performance. The parameter was the presentation level of the signal and noise. Filled circles and continuous line correspond to a 30 dB SL signal level and 10 dB SL masker level. Squares and dashed line correspond to a 50 dB SL signal level and 30 dB SL masker level (same signal-to-noise ratio). Circles and dotted line correspond to data from the first experiment (40 dB SL signal level and 10 dB SL masker level), replotted for comparison. The error bars show the standard error of the mean across subjects.

Overall, the results of the different experiments presented here suggest that although resolvability of the harmonics is not absolutely necessary for streaming, it significantly contributes to it. This contribution may be mediated either by spectral cues, which are associated with resolved harmonics, or by pitch strength, which is known to be larger for resolved than for unresolved harmonics. Future experiments might try to disentangle the respective influence of these two factors by manipulating pitch strength independently of spectral cues. However, because of the strong relationship that exists between these factors, this aim may well prove difficult to achieve.

ACKNOWLEDGMENTS

This research was supported by the French National Center for Scientific Research (CNRS) and by the ENTENDRE hearing-aid dispensers group. The authors are grateful to Sid Bacon, Brian Roberts, and an anonymous reviewer for very helpful comments on earlier versions of the manuscript. Jean-Christophe Béra is gratefully acknowledged for his help with calibration.

APPENDIX

1. Apparatus interchangeability check

For practical reasons, not all subjects could be tested using the same apparatus; two testing systems had to be used. The preliminary experiment described below was performed in order to check that the streaming scores measured using these two systems were not different. To do this, we tested the same four subjects in the same conditions on the two systems. Furthermore, in order to investigate within-subject variability, the stimuli were presented 30 times for each F0 combination. The F0 of the A sound was maintained constant at 88 Hz; the F0 of B was varied between 88 and 352 Hz. The signal level was 40 dB SL and the noise level was 10 dB SL. The stimuli were filtered in the MID frequency region (1375–1875 Hz).

Figure A1 shows the mean streaming scores of the four subjects on the two testing systems. The results obtained on the two testing systems are largely similar. A two-way repeated-measures ANOVA indicated a significant effect of the F0 difference [F(4,12) = 169.08, p < 0.001] on streaming but no difference between the two systems [F(1,3) = 0.03, p = 0.87]. The results of this experiment also revealed that the streaming percentages estimated on the basis of 30 presentations were very close to those estimated on the basis of only ten presentations; using a Mann-Whitney pairwise comparison statistical test, the two were found not to be significantly different. In view of this small within-subject variability, we chose to restrict the number of presentations of each stimulus to ten in the actual experiments.

FIG. A1. Mean streaming scores on the two testing systems. Filled circles show the results obtained with the AC30-based system. Empty circles show results obtained using the Tucker-Davis-Technologies-based system.

TABLE AI. CRI for different F0 separations (F0A = 88 Hz) in the LOW (a), MID (b), and HIGH (c) regions. "y" and "n" indicate that harmonics were or were not resolved according to Shackleton and Carlyon's (1994) definition.

F0A = 88 Hz    (a) LOW region    (b) MID region    (c) HIGH region
F0B (Hz)       CRI      y/n      CRI      y/n      CRI      y/n
 62            0.4097   y        0.0002   n        0        n
 88            0.4097   y        0.0002   n        0        n
125            0.6426   y        0.0155   ?        0        n
176            0.8      y        0.1221   ?        0        n
250            0.8953   y        0.3527   y        0.0007   n
352            0.9457   y        0.5912   y        0.0265   ?

2. The combined resolvability index

This index was obtained by computing the average number of harmonics falling in the 10-dB bandwidth of the auditory filters whose center frequencies fall within the corner frequencies of the considered frequency region (LOW, MID, HIGH). The resulting number was then transformed through a Gaussian function so that it was bounded between 0 (fully unresolved) and 1 (fully resolved). The formula used to compute the resolvability index is given below:

$$\mathrm{RI} = \exp\left\{-\frac{1}{2}\left(\frac{\sum_{f=f_l}^{f_u}\left[1.8\cdot\mathrm{ERB}(f)/F0\right]}{f_u-f_l}\right)^{2}\right\},$$

where $f_u$ and $f_l$ correspond to the upper and lower corner frequencies, respectively, of the considered frequency region, F0 corresponds to the F0 of the complex, and ERB(f) is the equivalent rectangular bandwidth at the center frequency f, as defined in Glasberg and Moore (1990).
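The resolvability index can be sketched numerically as follows. This is only an illustrative sketch, not the authors' code: the function names are hypothetical, the ERB expression is the standard Glasberg and Moore (1990) formula, and both the factor 1.8 (relating the 10-dB bandwidth to the ERB) and the summation in 1-Hz steps are assumptions, chosen because they give values close to the MID-region entries of Table AI.

```python
import math


def erb_hz(f_hz):
    """Equivalent rectangular bandwidth in Hz (Glasberg and Moore, 1990)."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)


def resolvability_index(f0, f_low, f_high, bw_factor=1.8):
    """Resolvability index of a complex with fundamental f0 (Hz),
    bandpass filtered between f_low and f_high (integer Hz).

    Assumptions: the sum runs over center frequencies in 1-Hz steps,
    and the 10-dB bandwidth is taken as bw_factor * ERB.
    """
    total = sum(bw_factor * erb_hz(f) / f0 for f in range(f_low, f_high + 1))
    mean_harmonics = total / (f_high - f_low)
    # Gaussian transform: 1 = fully resolved, 0 = fully unresolved
    return math.exp(-mean_harmonics ** 2 / 2.0)


if __name__ == "__main__":
    # MID region corner frequencies given in the appendix: 1375-1875 Hz
    for f0 in (88, 125, 176, 250, 352):
        print(f0, round(resolvability_index(f0, 1375, 1875), 4))
```

Under these assumptions, the printed values agree with the MID-region column of Table AI to within about 0.001, which suggests, without proving, that the sketch follows the published computation.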

TABLE AII. CRI for different F0 separations (F0A = 250 Hz) in the LOW (a), MID (b), and HIGH (c) regions. "y" and "n" indicate that harmonics were or were not resolved according to Shackleton and Carlyon's (1994) definition.

F0A = 250 Hz   (a) LOW region    (b) MID region    (c) HIGH region
F0B (Hz)       CRI      y/n      CRI      y/n      CRI      y/n
 62            0.8953   y        0.3527   y        0.0007   n
 88            0.8953   y        0.3527   y        0.0007   n
125            0.8953   y        0.3527   y        0.0007   n
176            0.8953   y        0.3527   y        0.0007   n
250            0.8953   y        0.3527   y        0.0007   n
352            0.9458   y        0.5912   y        0.0265   ?

A complex was considered to be resolved if its resolvability index was greater than 0.135 and unresolved if its resolvability index was smaller than 0.005; these two values correspond respectively to mean numbers of harmonics falling in the auditory-filter bandwidth of 2 and 3.25 (Shackleton and Carlyon, 1994). Furthermore, because sequences comprising A and B tones having different F0s were used in this study, we computed a combined resolvability index (CRI). This index was computed as the maximum of the resolvability indices of the A and B complexes comprising the sequence. The combined resolvability index of a sequence A-B-A is then given by

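$$\mathrm{CRI} = \max\left[\exp\left\{-\frac{1}{2}\left(\frac{\sum_{f=f_l}^{f_u}\left[1.8\cdot\mathrm{ERB}(f)/F0_A\right]}{f_u-f_l}\right)^{2}\right\};\ \exp\left\{-\frac{1}{2}\left(\frac{\sum_{f=f_l}^{f_u}\left[1.8\cdot\mathrm{ERB}(f)/F0_B\right]}{f_u-f_l}\right)^{2}\right\}\right],$$

where all symbols are the same as in the previous formula.

Tables AI and AII show the CRI in each of the conditions tested in this study.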
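The combined index and the resolved/unresolved criteria can be sketched in the same way. Again, this is a hedged illustration rather than the authors' implementation: the function names are hypothetical, the factor 1.8 and the 1-Hz summation step are assumptions, and the "?" label for intermediate values is simply taken from the notation used in Tables AI and AII.

```python
import math


def erb_hz(f_hz):
    """Equivalent rectangular bandwidth in Hz (Glasberg and Moore, 1990)."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)


def resolvability_index(f0, f_low, f_high, bw_factor=1.8):
    """Resolvability index; 1-Hz steps and bw_factor are assumptions."""
    total = sum(bw_factor * erb_hz(f) / f0 for f in range(f_low, f_high + 1))
    return math.exp(-(total / (f_high - f_low)) ** 2 / 2.0)


def combined_resolvability_index(f0_a, f0_b, f_low, f_high):
    """CRI of an A-B-A sequence: the larger of the two single-tone indices."""
    return max(resolvability_index(f0_a, f_low, f_high),
               resolvability_index(f0_b, f_low, f_high))


def classify(index, resolved_above=0.135, unresolved_below=0.005):
    """Return 'y' (resolved), 'n' (unresolved), or '?' (in between)."""
    if index > resolved_above:
        return "y"
    if index < unresolved_below:
        return "n"
    return "?"


if __name__ == "__main__":
    # The two criteria are the Gaussian transforms of 2 and 3.25 harmonics
    print(round(math.exp(-2 ** 2 / 2), 3), round(math.exp(-3.25 ** 2 / 2), 3))
    # MID region (1375-1875 Hz), F0A = 88 Hz
    for f0_b in (62, 88, 125, 176, 250, 352):
        cri = combined_resolvability_index(88, f0_b, 1375, 1875)
        print(f0_b, round(cri, 4), classify(cri))
```

Because the CRI is a maximum, a sequence counts as resolved as soon as one of its two tones is resolved, which is why every row with F0B below 352 Hz in Table AII carries the value of the 250-Hz A tone.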

¹All stimulus levels used in this study were specified in terms of SLs rather than SPLs. Nevertheless, some information regarding the SPLs used in the study could be obtained a posteriori. The Sennheiser HD465 headphones used in the study were calibrated using a Zwislocki coupler in combination with a 0.5-in. BK1433 condenser microphone and a BK2610 preamplifier feeding an HP35665A signal analyzer. Based on the measured absolute thresholds of one of the normal-hearing listeners who had taken part in the experiment, the level of the 40-dB SL signal was estimated to be approximately 52 dB SPL. The spectrum level of the 10-dB SL pink noise background was measured to be about 41 dB below the level per component of the harmonic at 1500 Hz in this listener.

ANSI (1969). ANSI S3.6-1969, Specifications for Audiometers (American National Standards Institute, New York).

Anstis, S., and Saida, S. (1985). "Adaptation to auditory streaming of frequency-modulated tones," Percept. Psychophys. 11, 257–271.

Beauvois, M. W., and Meddis, R. (1996). "Computer simulation of auditory stream segregation in alternating-tone sequences," J. Acoust. Soc. Am. 99, 2270–2280.

Bregman, A. S. (1978). "Auditory streaming is cumulative," J. Exp. Psychol. 4, 380–387.

Bregman, A. S. (1990). Auditory Scene Analysis: The Perceptual Organization of Sound (MIT, Cambridge, MA).

Bregman, A. S., and Campbell, J. (1971). "Primary auditory stream segregation and the perception of order in rapid sequences of tones," J. Exp. Psychol. 89, 244–249.

Bregman, A. S., Liao, C., and Levitan, R. (1990). "Auditory grouping based on fundamental frequency and formant peak frequency," Can. J. Psychol. 44, 400–413.

Carlyon, R. P. (1996a). "Encoding the fundamental frequency of a complex tone in the presence of a spectrally overlapping masker," J. Acoust. Soc. Am. 99, 517–524.

Carlyon, R. P. (1996b). "Masker asynchrony impairs the fundamental-frequency discrimination of unresolved harmonics," J. Acoust. Soc. Am. 99, 525–533.

Carlyon, R. P., and Shackleton, T. M. (1994). "Comparing the fundamental frequencies of resolved and unresolved harmonics: Evidence for two pitch mechanisms?," J. Acoust. Soc. Am. 95, 3541–3554.

Glasberg, B. R., and Moore, B. C. J. (1990). "Derivation of auditory filter shapes from notched-noise data," Hear. Res. 47, 103–138.

Gockel, H., Carlyon, R. P., and Micheyl, C. (1999). "Context dependence of fundamental frequency discrimination: Lateralized temporal fringes," J. Acoust. Soc. Am. 106, 3553–3563.

Hartmann, W. M., and Johnson, D. (1991). "Stream segregation and peripheral channeling," Mus. Perc. 9, 155–184.

Houtsma, A. J. M., and Smurzynski, J. (1990). "Pitch identification and discrimination for complex tones with many harmonics," J. Acoust. Soc. Am. 87, 304–310.

Iverson, P. (1995). "Auditory stream segregation by musical timbre: Effects of static and dynamic acoustic attributes," J. Exp. Psychol. 21, 751–763.



McCabe, S. L., and Denham, M. J. (1997). "A model of auditory streaming," J. Acoust. Soc. Am. 101, 1611–1621.

Micheyl, C., and Carlyon, R. P. (1998). "Effect of temporal fringes on fundamental-frequency discrimination," J. Acoust. Soc. Am. 104, 3006–3018.

Miller, G. A., and Heise, G. A. (1950). "The trill threshold," J. Acoust. Soc. Am. 22, 637–638.

Plomp, R. (1965). "Detectability threshold for combination tones," J. Acoust. Soc. Am. 37, 1110–1123.

Rose, M. M., and Moore, B. C. J. (1997). "Perceptual grouping of tone sequences by normally hearing and hearing-impaired listeners," J. Acoust. Soc. Am. 102, 1768–1778.

Shackleton, T. M., and Carlyon, R. P. (1994). "The role of resolved and unresolved harmonics in pitch perception and frequency modulation discrimination," J. Acoust. Soc. Am. 95, 3529–3540.



Singh, P. G. (1987). "Perceptual organization of complex-tones sequences: A tradeoff between pitch and timbre?" J. Acoust. Soc. Am. 82, 886–899.

Singh, P. G., and Bregman, A. (1997). "The influence of different timbre attributes on the perceptual segregation of complex-tone sequences," J. Acoust. Soc. Am. 102, 1943–1952.

van Noorden, L. P. A. S. (1975). "Temporal coherence in the perception of tone sequences," unpublished doctoral dissertation, Technische Hogeschool Eindhoven, Eindhoven, The Netherlands.

Vliegen, J., and Oxenham, A. J. (1999). "Sequential stream segregation in the absence of spectral cues," J. Acoust. Soc. Am. 105, 339–346.

Vliegen, J., Moore, B. C. J., and Oxenham, A. J. (1999). "The role of spectral and periodicity cues in auditory stream segregation, measured using a temporal discrimination task," J. Acoust. Soc. Am. 106, 938–945.

Wiegrebe, L., and Patterson, R. D. (1999). "Quantifying the distortion products generated by amplitude-modulated noise," J. Acoust. Soc. Am. 106, 2709–2718.


Grimault 153

Article 4: Perceptual auditory stream segregation of sequences of complex sounds in

subjects with normal and impaired hearing

Nicolas Grimault, Christophe Micheyl, Robert P. Carlyon, Patrick Arthaud and Lionel Collet

SUMMARY:

This experiment quantifies the detrimental influence of age and hearing loss on our ability to segregate auditory streams. The same procedure as in the previous experiment is used to measure the ability of young normal-hearing subjects (group 1), elderly hearing-impaired subjects (group 2), and subjects who are elderly only (group 3) to organize an A-B-A... sequence of complex sounds into two distinct streams on the basis of a difference in fundamental frequency between A and B. Given that age and hearing loss degrade the resolvability of the stimuli, this study, which follows on from study 3, attempts to characterize objectively the specific difficulties that elderly people, hearing-impaired or not, have in organizing auditory scenes. When the fundamental frequency of the signals is low enough to remove all spectral cues for the subjects of all three experimental groups, all subjects show similar fission thresholds. In contrast, in conditions in which the stimuli are resolved for some subjects (group 1) and unresolved for the others (groups 2 and 3), segregation thresholds are significantly better for the former. These results suggest that a loss of resolvability reduces our ability to organize an A-B-A... sequence into two auditory streams, A-... and B-.... They thus help to explain the "cocktail party" phenomenon.

Perceptual auditory stream segregation of sequences of complex sounds in

subjects with normal and impaired hearing

Nicolas Grimault a),b), Christophe Micheyl a), Robert P. Carlyon c),

Patrick Arthaud b) and Lionel Collet a)

a)UMR CNRS 5020 Laboratoire "Neurosciences & Systèmes Sensoriels"

Hôpital E. Herriot - Pavillon U, 69437 Lyon Cedex 03, France

b) ENTENDRE GIPA2, Pontchartrain, France.

c) MRC- Cognition and Brain Sciences Unit. 15, Chaucer Rd. Cambridge, CB2-2EF, England.

Running title: Auditory streaming and hearing loss.

ABSTRACT

The influence of hearing loss and aging on auditory stream segregation was investigated by

comparing the perceptual organization of repeating ABA- sequences of harmonic complex

tones as a function of the difference in fundamental frequency (F0) between the A and B tones

in young normal-hearing subjects and in elderly subjects having either impaired or normal

hearing for their age. In conditions in which the F0s of the A and B complexes were so low

that the harmonics could not be individually resolved by the peripheral auditory system even in

the young normal-hearing subjects, those subjects showed similar stream segregation

performance to the elderly hearing-impaired subjects. In contrast, when the F0s of the tones

were high enough for the harmonics to be largely resolved at the auditory periphery in normal-

hearing subjects, but presumably unresolved in the elderly subjects, the former showed

significantly more stream segregation than the latter. These results, which cannot be

consistently explained in terms of age differences, suggest that auditory stream segregation is

adversely affected by reduced peripheral frequency selectivity of elderly individuals. This

finding has implications for the understanding of the listening difficulties experienced by

elderly individuals in cocktail-party situations.

KEY WORDS

Hearing impairment, aging, peripheral frequency resolution, auditory stream segregation,

fundamental frequency, complex tones.

INTRODUCTION

Hearing-impaired and elderly people generally experience listening difficulties in

environments in which several sound sources interfere, a typical example being the

famous "cocktail party" situation (Cherry, 1953). In particular, they have a hard time

following the voice of a given speaker in the presence of other speakers or background sounds.

The fact that these difficulties are not systematically alleviated by the use of hearing aids, the

main function of which is to amplify external signals to a comfortable level, indicates that they

cannot be explained simply by reduced audibility (Plomp, 1978; Cox & Alexander, 1991). It

has been proposed that the hearing-in-noise difficulties of individuals with cochlear damage

could be explained by reductions in the frequency-resolving power of the cochlea (Florentine

et al., 1980; Glasberg & Moore, 1986; Tyler et al., 1982). This can be understood by

considering that the cochlea acts like a bank of parallel bandpass filters or channels which

partitions the spectrum of incoming sound into several frequency bands. When spectral

components fall in the bandwidth of the same peripheral auditory filter, they strongly interact

with each other. For example, the detection of a tone in noise is made difficult by those

components of the noise whose frequencies fall in the same peripheral auditory-filter

bandwidth (Fletcher, 1940). Consequently, the wider the auditory filters, the more likely it is

that the signal to which the listener wishes to attend will be masked by remote frequency

components emanating from other sources, which the listener wishes to ignore.

While the difficulty in hearing out simultaneous signals may constitute an important

aspect of the reduced ability of hearing-impaired individuals to attend to some sounds in the

presence of other sounds, it may well not be the only one. Another possible factor is that the

widening of auditory filters due to cochlear damage also causes a reduction in the ability to

tease apart sound events which occur sequentially. It is known that under certain conditions,

successive sounds give rise to separate perceptual auditory "streams" (Bregman, 1990;

Bregman & Campbell, 1971; Bregman et al., 1990). This has been well demonstrated by early

experiments in which listeners were asked to describe how they perceived repeating

sequences of ABA tones, where A and B represent tones of different frequencies (van

Noorden, 1975). It has been shown that when the frequency separation between the A and B

tones is small and the tempo is slow, listeners perceive a single melodic stream including both

A and B tones and resembling a gallop. However, when the frequency separation between the

two tones is increased beyond a certain limit, known as the fission threshold, the sequence

gives rise to two separate perceptual streams, one formed of the A tones and the second

formed by the B tones.

A possible explanation for this phenomenon is that, beyond a certain frequency

separation, the two tones successively excite well-separated peripheral auditory filters, and that

stimuli which are conveyed by different auditory channels can be assigned by the central

auditory system to separate streams, while stimuli occupying the same or overlapping

peripheral channels tend to be assigned to the same stream (Beauvois & Meddis, 1996;

McCabe & Denham, 1997; Hartmann & Johnson, 1991). If this peripheral channeling view of streaming is correct,

then one may predict that the broadening of auditory filters in hearing-impaired subjects

should lead to an increase of the fission threshold, i.e. the frequency separation necessary for

the sound sequence to give rise to two separate streams. This prediction was recently tested

in a study by Rose and Moore (1997). It was supported by the finding that, for normally

hearing subjects, fission thresholds increased with increasing overall frequency in a manner

consistent with the broadening of auditory filters. However, the results failed to show a

systematic decrease in auditory stream segregation in hearing-impaired subjects, as predicted

based on the hypothesis of a relationship between peripheral frequency selectivity and

streaming.

Further indications that peripheral resolvability may not be the primary factor

influencing streaming were obtained in a recent study by Vliegen and Oxenham (1999) in

which streaming was measured using sequences of harmonic complex tones differing in their

fundamental frequency (F0). The results of this study revealed that the percentage of "two

stream" responses increased with the F0 separation between the A and B tones in a very

similar way, irrespective of whether the complexes contained harmonics which were resolved

or unresolved by the peripheral auditory system; schematically, the harmonics are considered

to be resolved when they fall in different peripheral auditory filters, and to be unresolved

otherwise. The authors interpreted this observation as indicating that peripheral spectral cues

are not necessary for streaming. In a later study, however, Vliegen et al. (1999) showed that

spectral cues were dominant in inducing involuntary stream segregation. They explained the

difference between the results of this and the previous study by the fact that with the tasks used

by Vliegen and Oxenham (1999) streaming was advantageous and encouraged, whereas with

those used by Vliegen et al. (1999), stream segregation led to worse performance. More recent

results in normal-hearing subjects in fact suggest that streaming is adversely affected by

reduced resolvability even when the task involves a neutral criterion, i.e. when streaming

is neither advantageous nor detrimental (Grimault et al., 2000). One interpretation of these

results is that there are local differences between the excitation patterns of the A and B tones

when they contain resolved harmonics, but that these differences are reduced or absent when

the harmonics are unresolved. Alternatively, it may be that stream segregation is influenced by

the perceived difference in pitch between the A and B tones, and that, although differences in

the fundamental frequency (F0) of unresolved harmonics do lead to changes in perceived

pitch, these changes are less discriminable than when the harmonics are resolved (Houtsma

and Smurzynski, 1990; Shackleton and Carlyon, 1994).
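The schematic notion of resolvability used above (harmonics are resolved when they fall in different peripheral auditory filters) can be illustrated with a rough calculation. The sketch below is an illustration under stated assumptions, not a method taken from the study: it uses the Glasberg and Moore (1990) ERB formula for normal hearing, the function names are hypothetical, and the `broadening` factor of 1.5 and the 1625-Hz center frequency are arbitrary example values standing in for widened filters.

```python
def erb_hz(center_hz, broadening=1.0):
    """ERB in Hz at center_hz (Glasberg and Moore, 1990), optionally
    widened by a hypothetical broadening factor (1.0 = normal hearing)."""
    return broadening * 24.7 * (4.37 * center_hz / 1000.0 + 1.0)


def harmonics_per_filter(center_hz, f0_hz, broadening=1.0):
    """Approximate number of harmonics of f0_hz falling within one ERB."""
    return erb_hz(center_hz, broadening) / f0_hz


if __name__ == "__main__":
    center = 1625  # Hz, an arbitrary example center frequency
    for f0 in (88, 250):
        for broadening in (1.0, 1.5):
            n = harmonics_per_filter(center, f0, broadening)
            print(f"F0={f0} Hz, broadening={broadening}: {n:.2f} harmonics per ERB")
```

With these example values, a 250-Hz F0 gives fewer than one harmonic per ERB for normal filters but more than one when the filters are widened by half, whereas an 88-Hz F0 gives more than two harmonics per ERB in both cases.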

Therefore, although the question of the influence of peripheral frequency selectivity on

streaming is not completely settled, there appears to be on the whole increasing support for the

hypothesis that although streaming can occur based solely on temporal cues, spectral

resolution has a significant influence. If this is the case, then on the basis of data in the

literature indicating that frequency resolution decreases with hearing loss (Hoekstra & Ritsma,

1977; Florentine et al., 1980; Tyler et al., 1982; Moore, 1985) and aging (Patterson et al.,

1982; Sommers & Humes, 1993; Sommers & Gehr, 1998), one may predict that the stream

segregation of sequences of complex tones should be reduced in hearing-impaired and aged

individuals.

The present study aimed to test this prediction. Specifically, we measured stream

segregation using repeating ABA- sequences of harmonic complex tones as a function of the

F0 difference between the A and B tones, in young normal-hearing subjects and elderly

individuals with or without hearing loss in addition to that caused by aging. Two

conditions were tested: in the first, the F0s of the A and B tones were so low that, in the

frequency region in which the complexes were filtered, the harmonics were not resolved at the

auditory periphery, even in the young normal-hearing subjects; in the second condition, the

F0s of the stimuli were large enough for the harmonics to be well resolved by the peripheral

auditory system in the young normal-hearing subjects, but not in the elderly subjects, whether

or not they had hearing loss in addition to that caused by aging. Under the hypothesis that

streaming was determined by the peripheral resolvability of the harmonics, it was predicted

that in the first condition in which the harmonics were not fully resolved in any of the subject

groups tested, no difference in stream segregation should be obtained between the groups; in

the second condition, stream segregation should be larger in the young normal-hearing

subject group than in the elderly listeners. One might also expect stream segregation to be

particularly reduced in listeners having an additional hearing loss, over and above that

naturally caused by aging.

MATERIAL AND METHODS

A. Subjects

Overall, 28 subjects took part in this experiment. They were divided into three groups.

A first group was composed of 7 normal-hearing subjects - i.e. pure-tone thresholds <= 10 dB

HL at octave frequencies between 250 and 8000 Hz - and no history of otologic disease. A

second group comprised 13 hearing-impaired subjects (aged between 61 and 86 years, mean

age= 70.9 years, SD= 8.68) with mild to moderate sensorineural hearing loss of various

etiologies - i.e. average loss at 500, 1000, and 2000 Hz was between 31.66 and 71.66 dB

(mean = 48.18, SD= 11.19) and average air-bone gap <= 5 dB at the same frequencies. A third

group comprised 5 elderly subjects (aged between 65 and 76 years, mean age = 68.8 years,

SD=4.2) having normal hearing for their age - i.e. their absolute pure-tone thresholds between

250 and 8000 Hz were within 10 dB of the reference data given in Davis (1995); average

hearing loss at 500, 1000, 2000 Hz was between 29.66 and 35.33 dB HL (mean=31.46,

SD=2.24). Auditory filter bandwidths in this latter group are approximately 50% wider than in

young listeners with normal hearing (Patterson et al., 1982). For convenience, these groups

will be referred to as "YNH", "EHI+", and "EHI" respectively.

B. Procedure

Subjects were presented with repeating ABA- sequences, where "A" and "B" represent

harmonic complex tones of either the same or a different fundamental frequency. Subjects

were instructed to report whether, at the end of the 4-s sequence, they heard either a single auditory

stream with a galloping rhythm or two independent streams. Subjects indicated their response

by pressing "1" or "2" on a computer keyboard. The program did not accept responses until

completion of the whole sequence and waited for the response before presenting the next

sequence. Overall, five or six different frequency separations between the A and B tones were

presented, including a no-difference condition (control condition for false-alarm rate). These

different stimulus conditions were presented 10 times each, in random order. Tests began with

a demonstration wherein the subjects could hear examples of sequences leading

unambiguously to a single-stream or to a two-stream percept.

Results were analyzed in the framework of signal detection theory (Green & Swets,
1966; Snodgrass & Corwin, 1988). The number of "two-streams" responses given by the
subject when the F0s of the A and B sounds were the same was used to compute a
"false-alarm" rate, from which "streaming scores" were estimated. These scores correspond
to the classical d' discrimination index and are uncontaminated by criterion differences
between the subject groups.
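As a rough illustration of how such a streaming score could be derived from response counts, the sketch below computes d' as the difference between the z-transformed "two-streams" rate at a given F0 separation and the z-transformed false-alarm rate from the no-difference condition. The function name and arguments are hypothetical. The (x + 0.5) / (N + 1) correction, which keeps the rates away from 0 and 1, follows the recommendation of Snodgrass & Corwin (1988); whether it was applied in this study is an assumption, not something the text states.

```python
from statistics import NormalDist


def streaming_score(two_stream_count, false_alarm_count, n_trials=10):
    """Return a d'-like streaming score from raw response counts.

    two_stream_count: "two-streams" responses at a given F0 separation.
    false_alarm_count: "two-streams" responses in the no-difference condition.
    n_trials: presentations per condition (10 in the procedure described above).
    """
    # Corrected rates (assumed correction, see lead-in): never exactly 0 or 1.
    hit_rate = (two_stream_count + 0.5) / (n_trials + 1)
    fa_rate = (false_alarm_count + 0.5) / (n_trials + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


if __name__ == "__main__":
    # A listener who reports two streams on 8 of 10 trials, with 1 false alarm.
    print(round(streaming_score(8, 1), 2))
```

With 10 trials per condition, the corrected score is bounded, so a listener who always hears two streams at a given separation and never does so in the control condition obtains a finite d' rather than an infinite one.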

C. Material

Signals were output via a 16-bit digital-to-analog converter. The masker and signals

were then attenuated and added using an AC30 audiometer before being sent to one earpiece of

Sennheiser HD465 headphones. Signals were monitored using an HP3561A signal analyzer.

D. Stimuli

The stimuli consisted of 4-s long sequences of harmonic complex tones. Each sequence

was formed by the repetition of three 100-ms complex tones (A-B-A) occurring immediately

after each other. The three-tone sequences were separated by a 100-ms silent interval. The

tones were gated with 20-ms raised-cosine ramps. The F0 of signal A (F0A) was fixed at 88 or

250 Hz, whereas that of signal B varied between 88 and 352 Hz in half-octave steps. Normal-

hearing and impaired-hearing subjects were tested with F0A set to 88 or 250 Hz; elderly

normal-hearing subjects could only be tested with F0A set to 250 Hz. The harmonic

complexes were generated by summing sinusoids with a starting phase of 0° in the time domain. All

harmonics falling within a 1375-1875 Hz passband had a constant amplitude; below and above

these corner frequencies, the amplitude decreased by 48 dB/octave. The 1375-1875 Hz

passband corresponds to the so-called "MID" frequency region used in several previous studies

on the influence of resolvability on pitch perception (Shackleton & Carlyon, 1994; Micheyl &

Carlyon, 1998; Gockel et al., 1999). It has been demonstrated that in this region, harmonic

complexes with an F0 of 88 Hz are unresolved while harmonic complexes with an F0 of 250

Hz are resolved. In this context, resolvability is strictly defined as the number of harmonics

falling within the 10-dB-down bandwidth of an auditory filter in the center of the region; the

harmonics are considered to be resolved when this number is lower than 2, and unresolved

when it is higher than 3.25. Depending on the condition tested, the signal level was set to 30,

40, or 50 dB above the threshold in quiet for a stimulus sequence composed of A and B

complexes having F0s of 88 and 250 Hz, respectively. For convenience, these levels will be

referred to as 30, 40, and 50 dB sensation level (SL) in the following. A pink-noise

background was generated digitally, recorded on CD, and played out continuously throughout

the experiment. The level of the pink background noise was set 10 dB above its absolute

detection threshold, which was measured beforehand in each subject.
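The stimulus parameters given above can be restated as a few lines of arithmetic. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, no sampling rate is assumed (so no waveform is synthesized), and the spectral envelope is taken to be flat in the 1375-1875 Hz passband with straight 48 dB/octave skirts on a log-frequency axis.

```python
import math

PASSBAND_HZ = (1375.0, 1875.0)   # "MID" region corner frequencies
SLOPE_DB_PER_OCTAVE = 48.0       # attenuation outside the passband


def harmonic_gain_db(freq_hz):
    """Relative level (dB) of a component at freq_hz under the MID envelope."""
    lo, hi = PASSBAND_HZ
    if freq_hz < lo:
        return -SLOPE_DB_PER_OCTAVE * math.log2(lo / freq_hz)
    if freq_hz > hi:
        return -SLOPE_DB_PER_OCTAVE * math.log2(freq_hz / hi)
    return 0.0


def harmonics_in_passband(f0_hz):
    """Frequencies of the harmonics of f0_hz that fall inside the passband."""
    lo, hi = PASSBAND_HZ
    first = math.ceil(lo / f0_hz)
    last = math.floor(hi / f0_hz)
    return [n * f0_hz for n in range(first, last + 1)]


def half_octave_f0s(f0_start_hz, n_values):
    """F0 values spaced in half-octave steps, starting at f0_start_hz."""
    return [f0_start_hz * 2 ** (0.5 * i) for i in range(n_values)]


def raised_cosine_gain(t_ms, ramp_ms=20.0):
    """Onset gain (0 to 1) at t_ms after tone onset for a raised-cosine ramp."""
    if t_ms >= ramp_ms:
        return 1.0
    return 0.5 * (1.0 - math.cos(math.pi * t_ms / ramp_ms))


def triplets_per_sequence(sequence_ms=4000, tone_ms=100, gap_ms=100):
    """Number of ABA- cycles (three tones plus one silent gap) in a sequence."""
    return sequence_ms // (3 * tone_ms + gap_ms)


if __name__ == "__main__":
    print([round(f, 1) for f in half_octave_f0s(88, 5)])
    print(harmonics_in_passband(88), harmonics_in_passband(250))
    print(triplets_per_sequence())
```

Under these assumptions, five half-octave steps starting at 88 Hz end exactly at 352 Hz (the fourth value is about 249 Hz), and a 4-s sequence holds ten ABA- cycles of 400 ms each.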

RESULTS

FIG. 1: Streaming scores as a function of F0 difference in the MID region in normal-hearing

(YNH), hearing-impaired (EHI+) and elderly normal-hearing (EHI) listeners. Abscissas indicate

the F0 difference, in octaves, between the A and B sounds. Ordinates represent the streaming

scores, expressed as d'. The black filled circles show the results in normal-hearing subjects,

the dotted circles the results in elderly subjects and the empty circles the results in hearing-

impaired subjects. The different panels correspond to different F0As and different levels. The

three upper panels correspond to conditions with F0A = 88 Hz ; the three lower panels to

conditions with F0A = 250 Hz. Signal level increases between 30 dB SL and 50 dB SL from left

to right.

Figure 1 shows d' scores obtained in the different conditions and groups. In the conditions

in which F0A was 88 Hz (upper-row panels), no substantial differences were noted between the

scores obtained by the YNH and EHI+ subjects - with the exception of one point at 50 dB SL.

This was confirmed by the results of a three-way ANOVA (sound level × F0 difference × subject

group), which indicated no significant difference between the two subject groups. No effect of the

presentation level (SL) was observed either. Streaming was found to be influenced only by F0

separation (F(3,21)=37.24, p<0.0001). In the conditions in which F0A was 250 Hz (lower-row

panels), streaming scores were systematically greater in YNH than in EHI+ and EHI subjects. The

results of a three-way ANOVA revealed a significant difference between groups (F(2,9)=5.09,

p<0.05), and a significant main effect of both stimulus level (F(2,18)=5.86, p<0.05) and F0

separation (F(3,27)=13.17, p<0.0001). A significant interaction between subject group and

stimulus level was found (F(4,18)=4.09, p<0.05). Furthermore, when the EHI+ and EHI subjects

were pooled together as a single group they showed significantly less streaming than the young

normal listeners, as evidenced by a main effect of group (F(1,10)=5.85, p<0.05).

Some differences are apparent between the two elderly groups in the 250-Hz data shown in

the bottom row of Fig. 1. In particular, it seems that at 30 and 50 dB SL, the EHI+ listeners show

more streaming than the EHI listeners. This trend was not statistically significant. However,

because it is opposite to that predicted by our original hypothesis of a monotonic effect of

frequency resolution on streaming, one possible reason for it will be discussed briefly in the

following section.

DISCUSSION

The results of Shackleton and Carlyon (1994) and of Carlyon and Shackleton (1994)

showed that the F0s of resolved complexes were more discriminable than those of unresolved

complexes, and provided evidence for a qualitatively different form of processing for these

two types of sound. They defined a complex as being resolved when, on average, fewer than

two harmonics interacted within the 10-dB-down bandwidth of an auditory filter having a

center frequency in the middle of the passband of that complex. According to that definition,

the complexes (filtered in the 1375-1875 Hz region) in the F0A=88 Hz condition here were not

resolved by the auditory periphery even in the young normal-hearing listeners; the 10-dB

bandwidth of an auditory filter centered on 1606 Hz in a young normal hearing listener is

about 364 Hz (Glasberg and Moore, 1990). Otherwise stated, in this condition in which the

F0s of the stimuli were so low, the young normal-hearing subjects were in a similar situation

as the elderly hearing-impaired subjects with respect to resolvability: the harmonics were not

individually resolved by the auditory periphery, irrespective of the presence or absence of

hearing loss and of age. This may explain the absence of difference between the streaming

scores of the two subject groups in this condition. Given the fact that the subjects in the two

groups differed by age as well as by hearing loss, the absence of difference between their

streaming scores suggests that, when the complexes are unresolved for all subjects, hearing

loss and age have no significant effect on stream segregation. The lack of influence of age on

stream segregation is consistent with the results of an earlier study (Alain et al., 1996).

In the F0A=250 Hz condition, streaming was generally larger in the YNH than in the

EHI+ and EHI listeners. Data from the literature (Patterson et al., 1982) indicate that the

average 10-dB-down auditory-filter bandwidth at 1500 Hz (i.e. the middle of the MID

frequency range used in the present and previous studies), which is approximately 364 Hz in

young normal-hearing listeners, is around 540 Hz in listeners aged around 70 (i.e. close to the

average age of the subjects in the EHI group used here). In hearing-impaired individuals,

the bandwidth varies between about 1.5 and 4 times that measured in normal-hearing subjects

(Moore, 1985), leading to an estimated bandwidth of between 540 and 1456 Hz. Therefore, in

the F0A=250 Hz condition, while the harmonics were resolved in the young normal-hearing listeners, they

were presumably unresolved in the other two groups of listeners tested. Accordingly, the

differences in streaming scores observed between the young, normal-hearing subjects and the

other subjects in this F0A=250 Hz condition may be related to differences in resolvability. The

finding of larger streaming scores in young, normal-hearing subjects than in elderly and

hearing-impaired listeners is consistent with the hypothesis that streaming is promoted by

differences in the spectral patterns of excitation elicited in the peripheral auditory system by

successive sounds (Hartmann & Johnson, 1991; Vliegen et al., 1999; Grimault et al., 2000),

and with the fact that these differences are generally larger when the frequency components are

individually resolved by the peripheral auditory system than when they are not. It is also

consistent with the "pitch strength" hypothesis, whereby differences in frequency selectivity

affect the resolvability of the complexes, which then affects their pitch strength and finally

their streaming scores.
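The resolvability argument used in this Discussion can be checked with simple arithmetic. The sketch below assumes that the average number of harmonics falling within the 10-dB-down bandwidth of an auditory filter is approximately the bandwidth divided by the F0 (harmonics being spaced F0 apart); this approximation and the function names are ours, not the authors'. The criteria (fewer than 2 for resolved, more than 3.25 for unresolved) and the bandwidth estimates (364, 540 and 1456 Hz) are those quoted in the text.

```python
def harmonics_per_filter(bandwidth_hz, f0_hz):
    """Approximate number of harmonics within one auditory-filter bandwidth."""
    return bandwidth_hz / f0_hz


def classify(n_harmonics, resolved_below=2.0, unresolved_above=3.25):
    """Label a complex using the two criteria quoted in the text."""
    if n_harmonics < resolved_below:
        return "resolved"
    if n_harmonics > unresolved_above:
        return "unresolved"
    return "intermediate"


if __name__ == "__main__":
    cases = [
        ("young normal-hearing, F0 = 88 Hz", 364.0, 88.0),
        ("young normal-hearing, F0 = 250 Hz", 364.0, 250.0),
        ("listeners aged about 70, F0 = 250 Hz", 540.0, 250.0),
        ("hearing-impaired upper estimate, F0 = 250 Hz", 1456.0, 250.0),
    ]
    for label, bandwidth, f0 in cases:
        n = harmonics_per_filter(bandwidth, f0)
        print(f"{label}: {n:.2f} harmonics -> {classify(n)}")
```

With these figures, the 88-Hz complexes come out as unresolved and the 250-Hz complexes as resolved for a 364-Hz bandwidth. For the 540-Hz bandwidth the count at 250 Hz is about 2.2, which no longer meets the "resolved" criterion but lies between the two criteria, so the sketch labels it "intermediate".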

Naturally, because the YNH subjects differed from the EHI+ and EHI subjects not only

by peripheral frequency resolution but also by age, any difference we observe between them

might have been due to this age difference rather than to effects mediated by the peripheral

auditory system. For example, recent evidence shows that stream segregation involves central

mechanisms, and that the rate at which streaming builds up is much faster when subjects are

attending to the sequences than when they are performing a competing task (Carlyon et al.,

2000). It is possible that our elderly subjects were attending to the tones less consistently than

our young listeners, and this would have reduced the amount of stream segregation at the end

of the sequences. However, it is hard to see why this would have occurred at F0A=250 Hz but

not at F0A=88 Hz. Furthermore, earlier results in the literature have indicated no systematic

effect of age on streaming performance measured using pure tone sequences; only the speed of

responses was shown to be different between young and elderly listeners (Alain et al., 1996).

Finally, it is noteworthy that the streaming scores measured in the F0A=250 Hz

condition were in general larger in the EHI+ group than in the EHI listeners. Although not

statistically significant, this observation is worth discussing. If streaming were determined

purely by the local differences in excitation pattern produced by the A and B tones, we would

expect segregation to be either the same in the two groups or greater in the EHI group.

However, if, as discussed above, streaming is determined by the difference in pitch strength,

the results may be consistent with the frequency resolution of the three groups. For the young

listeners, the complexes were resolved, and F0 discrimination would be expected to be good.

For the EHI group the complexes are "just" unresolved, with about three harmonics

interacting within the 10-dB auditory filter bandwidth. For the EHI+ group the harmonics are

still unresolved, but the filter bandwidths are even broader and more harmonics interact.

Houtsma and Smurzynski (1988) have shown that increasing the number of unresolved

harmonics increases the accuracy with which their F0 is encoded; as more harmonics interact,

the temporal envelope at the output of an auditory filter becomes more sharply defined. A

similar effect may have been produced by the increased filter bandwidths of our EHI+ group.

It is important to note that both this interpretation and the one in terms of local differences in

the excitation patterns of the A and B tones rely on the notion that differences in spectral

resolution between young and elderly listeners can produce differences in stream segregation.

Another possible interpretation which may tentatively be offered for the observation of

larger streaming scores in EHI+ than in EHI subjects is that beyond a certain degree of hearing

loss, the detrimental effect of auditory-filter widening on streaming is over-compensated by a

facilitating influence of loudness recruitment. In the presence of loudness recruitment, changes

in the physical intensity of sounds lead to larger changes in the perceived intensity. Thus,

loudness recruitment may have contributed to emphasize local spectral differences between the

A and B complexes in EHI+ listeners. Given that the stimuli were presented at equal SLs in all

subjects, and considering a simple model whereby the loudness in logarithmic sone units

increases linearly between the absolute threshold and 100 dB SPL, it can be roughly estimated

that the loudness of the stimuli was in fact on average about 15 times as large in the EHI+ listeners as in the

YNH listeners, as against only about 4 times as large in the EHI listeners (see Appendix for

details). Although loudness recruitment and the loss of frequency selectivity, which are

generally associated with cochlear damage, presumably both originate in the same

underlying mechanism, namely outer hair cell damage (see Moore, 1995), it is possible

that their non-linear effects do not exactly counteract each other at all times. If above a certain

degree of loss, the enhancing effect of loudness recruitment on spectral cues becomes larger

than the smearing effect of the reduction in frequency selectivity, stream segregation may have

been facilitated in the EHI+ listeners, who had on average larger hearing loss, as compared to

the EHI subjects. It is noteworthy that this interpretation is consistent with the general

hypothesis that differences between the spectral excitation patterns elicited in the peripheral

auditory system by the A and B tones promote stream segregation, which is equally consistent

with the notion that peripheral frequency selectivity plays an important role in streaming.
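The loudness ratios quoted in this paragraph can be recovered from the estimates tabulated in the Appendix (Table A1). The short sketch below copies those tabulated values and averages, over the three sensation levels, the ratio of each elderly group's estimated loudness to that of the young normal-hearing group. Taking the YNH group as the reference is our reading of the table, and the variable and function names are hypothetical.

```python
# Estimated loudness in sones at 30, 40 and 50 dB SL, copied from Table A1.
TABLE_A1 = {
    "YNH": [0.50, 1.00, 2.00],
    "EHI+": [3.36, 12.74, 48.33],
    "EHI": [1.24, 3.40, 9.30],
}


def mean_loudness_ratio(group, reference="YNH"):
    """Mean across levels of the loudness ratio group / reference."""
    ratios = [g / r for g, r in zip(TABLE_A1[group], TABLE_A1[reference])]
    return sum(ratios) / len(ratios)


if __name__ == "__main__":
    for group in ("EHI+", "EHI"):
        print(group, round(mean_loudness_ratio(group), 1))
```

The averages come out at roughly 14.5 for the EHI+ group and 3.5 for the EHI group, in line with the approximate factors of 15 and 4 mentioned above.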

CONCLUSION

On the whole, the present results indicate that perceptual auditory stream segregation is

reduced in elderly hearing-impaired subjects, as compared to young normal-hearing subjects, in

a way that cannot be accounted for by age alone, but is generally consistent with the

detrimental effect of age and hearing-impairment on peripheral frequency resolution.

A potentially important implication of the present results is that reduced perceptual

separation of sequential sounds may lead to increased perceptual interference between these

sounds. In particular, recent results suggest that the encoding of the F0 of a complex tone can

be largely impaired by the presence of preceding or following tones when all tones are

allocated to the same perceptual stream by the auditory system; in contrast, when the target and

interferer tones are streamed apart based on differences in F0, timbre, or perceived location,

the target F0 can be encoded almost as accurately as if the interferer tones were absent

(Micheyl & Carlyon, 1999; Gockel et al., 1999). From a more general point of view, the

present results open the way to interpretations of the listening difficulties experienced by

hearing-impaired individuals in "cocktail party" situations in terms of deficits in auditory scene

analysis mechanisms.

ACKNOWLEDGMENTS

This research was supported by the French National Center for Scientific Research

(CNRS) and by the ENTENDRE hearing-aid dispensers group. Stéphane Garnier, Philippe

Cancel and Gilles Leblanc are thanked for their help in conducting the experiments in hearing-

impaired and old normal-hearing listeners.

REFERENCES

Alain C, Ogawa KH, Woods DL. Aging and the segregation of auditory stimulus sequences. J

Geront B Psychol Sci Soc Sci 1996; 51: 91-93.

Beauvois MW, Meddis R. Computer simulation of auditory stream segregation in alternating-tone

sequences. J Acoust Soc Am 1996; 99: 2270-80.

Bregman AS. Auditory Scene Analysis: The Perceptual Organization of Sound. Cambridge, MA:

MIT Press, 1990.

Bregman AS, Campbell J. Primary auditory stream segregation and the perception of order in

rapid sequences of tones. J Exp Psychol 1971; 89: 244-49.

Bregman AS, Liao C, Levitan R. Auditory grouping based on fundamental frequency and formant

peak frequency. Can J Psychol 1990; 44: 400-13.

Carlyon RP, Cusack R, Foxton JM, Robertson IH. Effects of attention and unilateral neglect on

auditory stream segregation. J Exp Psychol: Hum Perc Perf 2000; submitted.

Carlyon RP, Shackleton TM. Comparing the fundamental frequencies of resolved and unresolved

harmonics: evidence for two pitch mechanisms? J Acoust Soc Am 1994; 95: 3541-54.

Cherry EC. Some experiments on the recognition of speech with one or two ears. J Acoust Soc

Am 1953; 25: 975-79.

Cox RM, Alexander GC. Hearing aid benefit in everyday environments. Ear Hear 1991; 12: 127-

39.

Davis A. Hearing in Adults. London: Whurr Publishers, 1995.

Fletcher H. Auditory Patterns. Rev Mod Phys 1940; 12: 47-65.

Florentine M, Buus S, Scharf B, Zwicker E. Frequency selectivity in normally-hearing and

hearing-impaired observers. J Speech Hear Res 1980; 23: 646-69.

Glasberg BR, Moore BCJ. Auditory filter shapes in subjects with unilateral and bilateral cochlear

impairments. J Acoust Soc Am 1986; 79: 1020-33.

Glasberg BR, Moore BCJ. Derivation of auditory filter shapes from notched-noise data. Hear Res

1990; 47: 103-38.

Gockel H, Carlyon RP, Micheyl C. Context dependence of fundamental frequency

discrimination: Lateralized temporal fringes. J Acoust Soc Am 1999; 106: 3553-63.

Green DM, Swets JA. Signal detection theory and psychophysics. New York: Wiley, 1966.

Grimault N, Micheyl C, Carlyon RP, Artaud P, Collet L. Influence of peripheral resolvability on

the perceptual segregation of harmonic complex tones differing in fundamental frequency.

Accepted for publication in: J Acoust Soc Am 2000.

Hartmann WM, Johnson D. Stream segregation and peripheral channeling. Mus Perc 1991; 9:

155-84.

Hoekstra A, Ritsma RJ. Perceptive hearing loss and frequency selectivity. in: Psychophysics

and Physiology of Hearing, ed. EF Evans and JP Wilson, New York: Academic Press,

1977.

Houtsma AJM, Smurzynski J. JF Schouten revisited: Pitch of complex tones having many

high-order harmonics. J Acoust Soc Am 1988; 87: 304-10.

McCabe SL, Denham MJ. A model of auditory streaming. J Acoust Soc Am 1997; 101: 1611-21.

Micheyl C, Carlyon RP. Effect of temporal fringes on fundamental-frequency discrimination. J

Acoust Soc Am 1998;104: 3006-18.

Moore BCJ. Perceptual consequences of cochlear damage. Oxford: Oxford University Press, 1995.

Moore BCJ. Frequency selectivity and temporal resolution in normal and hearing-impaired

listeners. Brit J Audiol 1985; 19: 189-201.

Nejime Y, Moore BCJ. Simulation of the effect of threshold elevation and loudness recruitment

combined with reduced frequency selectivity on the intelligibility of speech in noise. J

Acoust Soc Am 1997; 102:603-15.

Patterson RD, Nimmo-Smith I, Weber DL, Milroy R. The deterioration of hearing with age:

Frequency selectivity, the critical ratio, the audiogram, and speech threshold. J Acoust

Soc Am 1982; 72: 1788-803.

Plomp R. Auditory handicap of hearing impairment and the limited benefit of hearing aids. J

Acoust Soc Am 1978; 63: 533-49.

Rose MM, Moore BCJ. Perceptual grouping of tone sequences by normally-hearing and hearing-

impaired listeners. J Acoust Soc Am 1997; 102: 1768-78.

Shackleton TM, Carlyon RP. The role of resolved and unresolved harmonics in pitch perception

and frequency modulation discrimination. J Acoust Soc Am 1994; 95: 3529-40.

Snodgrass JG, Corwin J. Pragmatics of measuring recognition memory: Applications to dementia

and amnesia. J Exp Psychol: Gen 1988; 117: 34-50.

Sommers MS, Gehr SE. Auditory suppression and frequency selectivity in older and younger

adults. J Acoust Soc Am 1998; 103: 1067-74.

Sommers MS, Humes LE. Auditory filter shapes in normal-hearing, noise-masked normal and

elderly listeners. J Acoust Soc Am 1993; 93: 2903-14.

Stevens SS. On the psychoacoustical law. Psychol Rev 1957; 64:153-81.

Tyler RS, Wood EJ, Fernandes M. Frequency resolution and hearing loss. Brit J Audiol 1982;

16: 45-63.

Van Noorden LPAS. Temporal coherence in the perception of tone sequences. Unpublished

Doctoral Dissertation, Technische Hogeschool Eindhoven, Eindhoven, The Netherlands,

1975.

Vliegen J, Oxenham AJ. Sequential stream segregation in the absence of spectral cues. J

Acoust Soc Am 1999; 105: 339-46.

Vliegen J, Moore BCJ, Oxenham AJ. The role of spectral and periodicity cues in auditory

stream segregation, measured using a temporal discrimination task. J Acoust Soc Am

1999; 106: 938-45.

APPENDIX

Estimation of differences in loudness between the YNH, EHI+, and EHI subjects.

The loudness L of a sound having a SPL of I dB is given by:

$$L_{sone} = k_{nh} \cdot 10^{\alpha_{nh} I / 10} \qquad (1)$$

In young normal-hearing subjects, αnh=0.3 (Stevens, 1957) and, given that by definition, the

loudness of a 1 kHz pure tone at 40 dB SPL is 1 sone, knh=0.063095734546.

Let us consider a subject whose hearing threshold is elevated by an amount T dB as compared to

that of a normal-hearing subject. Under the assumption that the loudness is the same in the two

subjects at threshold and at 100 dB SPL (Nejime & Moore, 1997), we have:

$$\alpha_{ih} = \alpha_{nh} \cdot \frac{10}{10 - T/10} \qquad (3)$$

and:

$$k_{ih} = \frac{k_{nh}}{10^{\,\alpha_{nh} \cdot \frac{10}{10 - T/10} \cdot \frac{T}{10}}} \qquad (4)$$

Using these equations with T set to the average hearing threshold over 500, 1000, and 2000 Hz in

the YNH, EHI+, and EHI subjects, we estimated the loudness in sones of tones at 30, 40, and 50

dB SL; the resulting values are indicated in the following table:

Group     30 dB SL   40 dB SL   50 dB SL
YNH       0.50       1.00       2.00
EHI+      3.36       12.74      48.33
EHI       1.24       3.40       9.30

Table A1: Estimated loudness (in sones) of tones at 30, 40, and 50 dB SL for the YNH, EHI+, and

EHI subjects.
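The loudness model of this Appendix can be sketched in a few lines. The code below is a tentative reading of equations (1), (3) and (4), not the authors' implementation: it assumes that the normal-hearing threshold corresponds to 0 dB SPL, that T is the group-average hearing loss quoted in the Subjects section (48.18 dB for EHI+, 31.46 dB for EHI), and that a level of x dB SL corresponds to T + x dB SPL. All names are hypothetical.

```python
ALPHA_NH = 0.3                          # loudness exponent (Stevens, 1957)
K_NH = 10 ** (-ALPHA_NH * 40 / 10)      # gives 1 sone at 40 dB SPL


def impaired_parameters(t_db):
    """Exponent and constant for a threshold elevation of t_db (eqs. 3 and 4)."""
    alpha_ih = ALPHA_NH * 10 / (10 - t_db / 10)
    k_ih = K_NH / 10 ** (alpha_ih * t_db / 10)
    return alpha_ih, k_ih


def loudness_sones(level_db_spl, t_db=0.0):
    """Loudness in sones at level_db_spl for a threshold elevation t_db (eq. 1)."""
    alpha, k = impaired_parameters(t_db)
    return k * 10 ** (alpha * level_db_spl / 10)


def loudness_at_sl(sl_db, t_db=0.0):
    """Loudness at sl_db above threshold (assumed to be t_db dB SPL)."""
    return loudness_sones(t_db + sl_db, t_db)


if __name__ == "__main__":
    for label, t in (("YNH", 0.0), ("EHI+", 48.18), ("EHI", 31.46)):
        print(label, [round(loudness_at_sl(sl, t), 2) for sl in (30, 40, 50)])
```

Under these assumptions the YNH row of Table A1 is reproduced (0.50, 1.00 and 2.00 sones), and loudness is equal across groups both at threshold and at 100 dB SPL, as the model requires. The values for the elderly groups come out close to, but a few percent above, the tabulated ones (for instance about 3.4 rather than 3.36 sones for EHI+ at 30 dB SL, and about 1.30 rather than 1.24 sones for EHI); the source of this small difference cannot be determined from the text.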

Article 5: Further evidence for the resetting of the pitch analysis system by abrupt

temporal transitions between successive tones

Nicolas Grimault, Christophe Micheyl, Robert P. Carlyon and Lionel Collet

SUMMARY:

In this study, we measured fundamental-frequency discrimination thresholds
(discrimination between the pitches of two harmonic complex tones) under
various experimental conditions. The nominal fundamental frequency of the complex
tones to be discriminated was either 62 or 352 Hz, and the tones were filtered into three
different spectral regions, labeled LOW (125-625 Hz), MID (1375-1875 Hz) and HIGH
(3900-5400 Hz). These parameters make it possible to study the influence of
fundamental frequency, of spectral region, and of harmonic resolvability on the
discriminability of the sounds. The originality of this study lies in the presence of a temporal
fringe, i.e. a complex tone (same F0 and same region) immediately preceding the first of the
two complex tones to be discriminated.

We thus showed that the presence of the fringe (proactive, or forward, masking) could reduce
the subjects' discrimination performance in the complete absence of retroactive, or backward,
masking (no fringe in the inter-stimulus interval). Moreover, the longer the transition between
the fringe and the first complex, the larger the masking effect.

This result is consistent with the original idea of Bregman (Bregman et al., 1994a,b), who
suggested that the pitch-encoding mechanisms are reset all the more effectively as the
temporal transitions between signals are abrupt. In addition, the particular influence of the
spectral region used to filter the signals seems to indicate an important role of spectral
splatter, generated at the output of peripheral filtering by abrupt transitions, in the
mechanisms that reset pitch encoding. Finally, it should be emphasized that effective
resetting of these mechanisms allows efficient segregation of the different elements
of a sequence of harmonic complex tones. The results of this experiment
therefore provide some additional theoretical insight into the primary mechanisms
of auditory scene analysis.

Further evidence for the resetting of the pitch analysis system by abrupt

temporal transitions between successive tones

Nicolas Grimault a),b), Christophe Micheyl a), Robert P. Carlyon c) and Lionel Collet a)

a)UMR CNRS 5020 Laboratoire « Neurosciences & Systèmes Sensoriels », Hôpital E.

Herriot - Pavillon U, 69437 Lyon Cedex 03, France

b) ENTENDRE Audioprothesists Group GIPA2, Pontchartrain, France.

c) MRC Cognition and Brain Sciences Unit, 15 Chaucer Rd., Cambridge CB2 2EF,

England.

Received:

PACS Numbers:

Running title: F0 integration and temporal transitions

Introduction

The results of several previous studies have demonstrated that the auditory processing

of the fundamental frequency (F0) of complex tones can be dramatically impaired by the

presence of temporally-adjacent sounds. A seminal study by Carlyon (1996a) showed that the

ability of subjects to discriminate the F0s of two successive complex tones, as reflected by F0

difference limens (DLF0s), was impaired in the presence of temporal "fringes", i.e. other

complex tones which immediately preceded and followed the target tones. This initial finding

was investigated further in a subsequent study by Micheyl and Carlyon (1998), the results of

which made it possible to determine how the effect depended on the F0 difference between the fringes

and the targets, as well as on the peripheral resolvability of the harmonics. Basically, it was

found that when the targets and fringes were made up of resolved harmonics, a large

difference between their F0s abolished the temporal interference effect; when the harmonics

making up the stimuli were unresolved, in contrast, even fringes whose F0s differed widely

from those of the targets had a detrimental influence. This result was replicated by a later

study (Gockel et al., 1999).

In all these previous studies which have demonstrated a detrimental influence of

temporally-adjacent sounds on the perceptual processing of F0 information, temporal fringes

were presented in both observation intervals. Consequently, two classes of interpretations can

be proposed to account for the observed results. Firstly, it is possible that, as proposed by

Carlyon (1996a), the detrimental effect of the fringes is due to the fact that, in order to

estimate the pitch of sounds, the auditory system integrates F0 information over a relatively

long time window so that F0 information from the target complex is contaminated by F0

information from the surrounding fringes. An alternative interpretation is that the fringes

impair, not the F0 encoding process, but rather the F0 comparison process. Previous studies

have shown that when extraneous sounds are introduced in the temporal interval which

separates two target sounds, the comparison between some perceptual attributes of the two

sounds (pitch, timbre, phonetic identity...) can be impaired (Deutsch, 1972; Semal and

Demany, 1991a,b; Semal et al., 1996). A common explanation for this observation is that the

memory trace of the initial target sound is degraded by the following interferers before it can

be compared to the second target. In the previous studies by Carlyon (1996a), Micheyl and

Carlyon (1998) and Gockel et al. (1999), there was always at least one interfering sound (a

fringe) between the target complexes which the subject had to compare. The main aim of the

present study was to test whether detrimental effects on F0 discrimination can still be

produced by a temporally-adjacent fringe which does not occur between the two target tones.

Otherwise stated, a substantial methodological difference between the present study and

previous studies in which temporal interference effects in F0 discrimination were tested lies

in the fact that no interfering tone was present between the two target complexes. In such a

situation, the observation of significantly larger DLF0s in the presence of the fringe would

constitute a strong argument for an interpretation in terms of F0 over-integration.

One difficulty with the F0 over-integration hypothesis is that, in the

light of data in the literature, it appears unlikely that the auditory system continues to integrate

F0 information in spite of detecting the end of one sound and the beginning of another. Recent data

by White and Plack (1998) indicate that the strategy used by the auditory system to build an

estimate of the F0 of complex sounds over time is not as simple as the blind integration of

information within a fixed time window. Furthermore, earlier studies by Bregman and

colleagues (Bregman et al., 1994a, b) have shown that the pitch-analysis system can be reset

by abrupt signal onsets. In order to gain further information regarding the potential role of

onset abruptness on the fringe effect, in the second experiment of the present study, DLF0s

were measured for signals with onset times varying between 2.5 and 40 ms.

Finally, another aim assigned to this experiment was to gather information on the

possible influence of resolvability on the effect of the forward fringe. The previous studies by

Micheyl and Carlyon (1998) and Gockel et al. (1999) have demonstrated that resolvability has

an influence on the effect of the fringes in the sense that when the harmonics are resolved,

wide differences in F0 between the fringes and the targets abolish the detrimental effect of

the fringe, whereas when the harmonics are unresolved, even fringes differing widely from the

targets in F0 have a detrimental influence. In order to test whether resolvability has a

similar influence in the absence of fringes between the two target tones, fringes and targets

having F0s around either 62 or 352 Hz were tested so that, in the frequency region used -

which is defined as the "MID" region in earlier publications (e.g. Shackleton & Carlyon,

1994; Carlyon & Shackleton, 1994; Micheyl et al., 1998; Gockel et al., 1999) -, the harmonics

were either fully unresolved or fully resolved.

Experiment 1

Rationale

The primary goal of Experiment I was to test the hypothesis that the F0 encoding of a

target harmonic complex is impaired more markedly by another, preceding complex if the

transition between the two tones is soft than if it is abrupt because abrupt transitions reset the

pitch-analysis system whereas soft transitions promote integration of the two tones into a

single perceptual entity. In order to test this prediction, we measured DLF0s for target

complexes, successively in the absence and in the presence of a preceding 200-ms fringe, and

using either abrupt (2.5 ms) or slow (40 ms) ramp durations. Since we were furthermore

interested in gathering some information regarding the generalizability of the results as well as

the potential influence of harmonic resolvability, these measures were conducted at two

widely different F0s in a frequency region chosen so that the harmonics of the complexes

would be fully resolved in one case and fully unresolved in the other.

Subjects

Six normal-hearing listeners took part in this experiment. The subjects ranged in age

from 24 to 31 years. They all had normal hearing in both ears, i.e., absolute pure-tone

thresholds at or below 20 dB HL at octave frequencies from 250 to 8000 Hz (ANSI, 1989).

All subjects had prior experience in two-interval, forced choice procedures and in pitch

discrimination tasks.

Procedure

Difference limens for F0 (DLF0s) were measured using a two-interval, two-alternative

forced-choice (2I-2AFC) procedure in conjunction with a two-down, one-up adaptive tracking

rule estimating the 70.7%-correct point on the psychometric function (Levitt, 1971).

The subjects' task was to indicate which of two successive harmonic complexes had a higher

pitch. Depending on the condition being tested, the two target complexes were or were not

preceded by another complex - hereafter referred to as a temporal "fringe" -, the F0 of which

was equal to the nominal F0 around which the actual F0s of the two target complexes were

geometrically centered. The difference in the actual F0s of the two target tones, which was set

to 40% of the nominal F0 at the beginning of a run, was divided by a factor of two after two

consecutive correct responses and multiplied by the same factor after any incorrect response

until the fourth turnpoint; thereafter, a factor of √2 was used. The procedure stopped after 16

turnpoints were obtained. The DLF0 was computed as the geometric mean of the last 12

turnpoints. Six such threshold estimates were obtained in each of the eight conditions (two

nominal F0s, two ramp durations, with and without fringe).
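The adaptive rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the software used in the study: the simulated listener, its psychometric function, and the names `run_track`, `listener_dl` and `two_down_one_up_target` are hypothetical. The step factors (2, then √2 after the fourth turnpoint), the 40% starting value, the 16 turnpoints and the geometric mean of the last 12 come from the text. The 70.7% figure follows from the fact that the track is stationary when the probability of two consecutive correct responses equals 0.5.

```python
import math
import random


def two_down_one_up_target():
    """Proportion correct at which a two-down, one-up track converges."""
    # The track is stationary when P(two correct in a row) = 0.5, i.e. p**2 = 0.5.
    return math.sqrt(0.5)


def run_track(listener_dl, start=40.0, n_turnpoints=16, n_averaged=12, seed=0):
    """Simulate one 2I-2AFC track; delta is the F0 difference in % of the nominal F0."""
    rng = random.Random(seed)
    delta = start            # 40% of the nominal F0 at the beginning of a run
    factor = 2.0             # step factor until the fourth turnpoint
    run_of_correct = 0
    last_move = 0            # -1 = last change was a decrease, +1 = an increase
    turnpoints = []
    while len(turnpoints) < n_turnpoints:
        # Hypothetical listener: P(correct) grows from 0.5 (chance) toward 1.
        p_correct = 1.0 - 0.5 * math.exp(-delta / listener_dl)
        if rng.random() < p_correct:
            run_of_correct += 1
            move = -1 if run_of_correct == 2 else 0
        else:
            move = +1
        if move == 0:
            continue         # one correct response only: no change yet
        run_of_correct = 0
        if last_move != 0 and move != last_move:
            turnpoints.append(delta)          # direction reversal
            if len(turnpoints) == 4:
                factor = math.sqrt(2.0)       # smaller steps thereafter
        last_move = move
        delta = delta / factor if move < 0 else delta * factor
    last = turnpoints[-n_averaged:]
    estimate = math.exp(sum(math.log(x) for x in last) / len(last))
    return estimate, turnpoints


if __name__ == "__main__":
    dl, reversals = run_track(listener_dl=2.0)
    print(f"target proportion correct: {two_down_one_up_target():.3f}")
    print(f"estimated DLF0: {dl:.2f} % of nominal F0 ({len(reversals)} turnpoints)")
```

Averaging the turnpoints on a logarithmic scale matches the multiplicative step rule, which is presumably why the geometric rather than the arithmetic mean was used.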

Stimuli.

The stimuli consisted of harmonic complex tones having a duration of 200 ms

for all maskers (fringes) and 100 ms for all signals (targets), including onset and offset cosine ramps with a duration

of either 2.5 or 40 ms each. They were generated digitally in the time domain by adding the

successive harmonics of a given F0 in sine (0°) phase. Nominal F0s of 62 and 352 Hz were

used. The harmonics were then bandpass-filtered digitally using a filter with lower and upper

corner frequencies of 1375 and 1875 Hz, a flat top, and 48 dB/octave slopes. As many

harmonics as necessary to fill the passband of the filter down to its 48-dB attenuation points were included; harmonics

to which an attenuation larger than 48 dB would have been applied were omitted.
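As a rough illustration of the stimulus description above, the following Python sketch synthesizes one such complex tone. It rests on several assumptions that are not stated in the text: the filter is approximated by applying a gain to each harmonic (flat between the corner frequencies, 48 dB/octave outside), the ramps are taken to be raised-cosine, and the function names are hypothetical. Level calibration and the pink-noise background are left out.

```python
import math

SAMPLE_RATE = 44100                        # Hz, as stated in the Apparatus section
LOW_CORNER, HIGH_CORNER = 1375.0, 1875.0   # Hz, MID region corner frequencies
SLOPE_DB_PER_OCTAVE = 48.0
MAX_ATTENUATION_DB = 48.0                  # harmonics attenuated more are omitted


def attenuation_db(freq):
    """Attenuation of the idealized bandpass filter at a given frequency."""
    if freq < LOW_CORNER:
        return SLOPE_DB_PER_OCTAVE * math.log2(LOW_CORNER / freq)
    if freq > HIGH_CORNER:
        return SLOPE_DB_PER_OCTAVE * math.log2(freq / HIGH_CORNER)
    return 0.0


def harmonic_numbers(f0):
    """Harmonic ranks whose attenuation does not exceed 48 dB."""
    top = int(4 * HIGH_CORNER / f0) + 1    # search well above the upper cutoff
    return [n for n in range(1, top + 1)
            if attenuation_db(n * f0) <= MAX_ATTENUATION_DB]


def ramp_gain(t, duration, ramp):
    """Raised-cosine onset and offset envelope (assumed ramp shape)."""
    if t < ramp:
        return 0.5 * (1.0 - math.cos(math.pi * t / ramp))
    if t > duration - ramp:
        return 0.5 * (1.0 - math.cos(math.pi * (duration - t) / ramp))
    return 1.0


def complex_tone(f0, duration=0.100, ramp=0.0025):
    """Sum of sine-phase harmonics, filtered and gated; returns a list of samples."""
    partials = [(n * f0, 10.0 ** (-attenuation_db(n * f0) / 20.0))
                for n in harmonic_numbers(f0)]
    n_samples = round(duration * SAMPLE_RATE)
    samples = []
    for i in range(n_samples):
        t = i / SAMPLE_RATE
        value = sum(gain * math.sin(2.0 * math.pi * freq * t)
                    for freq, gain in partials)
        samples.append(ramp_gain(t, duration, ramp) * value)
    return samples


if __name__ == "__main__":
    for f0 in (62.0, 352.0):
        ranks = harmonic_numbers(f0)
        print(f"F0 = {f0:g} Hz: harmonics {ranks[0]} to {ranks[-1]}")
```

Under these assumptions, harmonics 12 to 60 are retained at the 62-Hz nominal F0 and harmonics 2 to 10 at 352 Hz.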

The complex tones had an overall level of 55 dB SPL. All stimuli were presented in a

continuous pink (3 dB/octave slope) noise background with an overall level of 57 dB SPL.

This noise background was intended to prevent the listeners from perceiving combination

tones generated by the ear, which might have complicated the interpretation of the results.

Apparatus

A Tucker-Davis-Technologies-based system was used. Signals were generated

digitally in the time domain and output through a 16-bit digital-to-analog converter (TDT

DA1) at a sampling rate of 44.1 kHz. A pink-noise background was generated digitally,

recorded on CD, and played out continuously throughout the experiment (Sony CDP-XE300).

The signals and background noise were low-pass filtered at 15 kHz (TDT FT6-2; attenuation

more than 60 dB at 1.15 times the corner frequency). They were then led to two

separate programmable attenuators (TDT PA4). The outputs of the attenuators were

summed (TDT SM3) and led to the right or left earpiece of a Sennheiser HD465 headphone

via a headphone buffer (TDT HBC). The subject was comfortably seated in a sound booth.

Signal characteristics at the output of the two test systems were controlled using an

HP3561A signal analyzer.

Results

Figure 1. DLF0s measured in the absence (squares) and in the presence (circles) of the

forward fringe for nominal F0s of 62 Hz (empty symbols) and 352 Hz (filled symbols), and

ramp durations of 2.5 ms and 40 ms. The error bars represent the (geometric) standard errors

around the (geometric) mean DLF0s across listeners.

Figure 1 shows the mean DLF0s measured in the presence and in the absence of the

fringe in the 6 subjects who took part in this experiment. The results were analyzed using a

three-way repeated-measures ANOVA with the log-transformed relative DLF0s (i.e.

DLF0/F0) as dependent variable. Overall, DLF0s proved to be significantly larger at the 62-

Hz than at the 352-Hz nominal F0 [F(1,5)=123.37, p<0.001].
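The geometric means and geometric standard errors mentioned in the figure captions amount to ordinary means and standard errors computed on log-transformed DLF0s. A minimal sketch follows, assuming natural logarithms and the sample (n - 1) variance; the function name is hypothetical and no data from the study are reproduced.

```python
import math


def geometric_mean_and_se(values):
    """Return the geometric mean and the geometric standard error (a factor >= 1)."""
    logs = [math.log(v) for v in values]
    n = len(logs)
    mean_log = sum(logs) / n
    var_log = sum((x - mean_log) ** 2 for x in logs) / (n - 1)
    se_log = math.sqrt(var_log / n)
    # Back-transform: the error bar spans gmean / gse to gmean * gse.
    return math.exp(mean_log), math.exp(se_log)


if __name__ == "__main__":
    # Purely illustrative thresholds (in % of the nominal F0), not measured data.
    gmean, gse = geometric_mean_and_se([1.0, 4.0])
    print(f"geometric mean = {gmean:.2f}, error bar from "
          f"{gmean / gse:.2f} to {gmean * gse:.2f}")
```

Because the error is a multiplicative factor, the resulting bars are symmetric on a logarithmic axis.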

Figure 2. Variations in DLF0s caused by a forward fringe for nominal F0s of 62 Hz (black

bars) and 352 Hz (gray bars), and ramp durations of 2.5 ms and 40 ms. Each panel

corresponds to a listener. The error bars represent the (geometric) standard errors around

the (geometric) mean ratios computed as DLF0 without mask / DLF0 with mask.

Figure 2 represents the proportional changes in DLF0s induced by the fringe in each of

the 6 listeners. In most listeners and most conditions, the DLF0s measured in the presence of

the fringe lay above those measured in its absence. On average, DLF0s increased by about 39

% in the presence of the fringe. This effect proved to be statistically significant overall

[F(1,5)=8.20, p<0.05]. ANOVAs performed independently on the thresholds measured at the

two nominal F0s revealed a significant effect of the fringe at 62 Hz [F(1,5)=6.72, p<0.05]; at

352 Hz, the effect failed to reach the statistical significance threshold [F(1,5)=5.6, p=0.06].

Altogether, the ramp duration factor had no significant effect [F(1,5)=2.71, p=0.16].

This lack of effect was observed both for the data obtained in the absence of the fringe

[F(1,5)=1.09, p=0.34] and for the data obtained in the presence of the fringe [F(1,5)=2.53,

p=0.17]. Nevertheless, two-way repeated-measures ANOVAs (masker × onset) performed

independently on the data at 62 Hz and at 352 Hz revealed a significant influence of ramp

duration in the former F0 condition [F(1,5)=16.96, p<0.01] but not in the latter [F(1,5)=0.13,

p=0.74].

Although no interaction between the "fringe" and "ramp duration" factors was obtained

overall [F(1,5)=0.43, p=0.54], an ANOVA performed on the results obtained using the 40-ms

ramp duration alone revealed a significant effect of the fringe [F(1,5)=7.85, p<0.05]; for the

2.5 ms ramp duration, the effect just failed to reach the statistical significance threshold

[F(1,5)=6.08, p=0.057].

Discussion

The DLF0s obtained in this experiment are in the range of those measured in previous

studies at similar nominal F0s (Carlyon & Shackleton, 1994; Shackleton & Carlyon, 1994).

The finding of larger DLF0s at 62 than at 352 Hz is consistent with these earlier data as well

as with the notion that unresolved harmonics are associated with larger DLF0s than resolved

harmonics (Carlyon & Shackleton, 1994; Shackleton & Carlyon, 1994).

The main finding of this experiment is that DLF0s were

significantly increased by the presentation of a single fringe before the first target complex.

Whereas in previous studies in which fringes were presented in both observation intervals

(Carlyon, 1996a; Micheyl & Carlyon, 1998; Gockel et al., 1999), the observed F0

discrimination impairments could be partly explained by the disruption of the memory trace of

the target tones (Deutsch, 1972; Semal and Demany, 1991a,b; Semal et al., 1996) or of the

ongoing processing of these tones in short-term auditory memory (e.g. backward masking;

Massaro, 1975), the present results allow this type of interpretation to be ruled out. Naturally,

one cannot deny that interferences in short-term auditory memory may have contributed to the

F0-discrimination impairments observed in these earlier studies. This possibility is left open

by the fact that the fringe effects evidenced here (around 39 % on average) were substantially

smaller than those reported earlier: Micheyl and Carlyon (1998) and Gockel et al. (1999)

have reported increases of approximately 200% in conditions in which, like here, the target

complexes and the fringes had neighboring F0s. However, it should be noted that even this

difference may be explained by other phenomena than memory interferences, as will be

further discussed below.

The results of the present experiment indicate that the F0-encoding process itself is

affected by temporally contiguous sounds. One possibility suggested by Carlyon (1996a) is

that the auditory system includes some of the information regarding the F0 of the fringe into

its estimate of the target F0; this is the “F0 over-integration” hypothesis. According to this

hypothesis, the fact that the fringe effects observed here were smaller than those obtained in

the previous studies by Micheyl & Carlyon (1998) and Gockel et al. (1999) could be due to

the fact that only one of the two target complexes was corrupted (since the fringe was present

in only one observation interval, whereas it was present in both observation intervals in earlier

studies), and that it was corrupted less (since only one fringe was presented, whereas earlier

studies used both a backward and a forward fringe). Results from Carlyon (1996a) indeed

suggest that the forward and the backward fringes each have an effect.

However, this “F0 over-integration” interpretation needs to be qualified based on the

fact that when the data obtained in the two ramp-duration conditions were analyzed

separately, DLF0s were found to be significantly larger in the presence of the fringe in the 40-

ms ramp condition only. This observation is hard to explain in terms of a simple over-

integration mechanism whereby F0 information falling within a long, fixed-duration temporal

window is combined by the auditory system. Indeed, the shorter the transition between the

fringe and the target, the higher the likelihood that the window contains a large amount of the

fringe together with the target. Therefore, over-integration effects should have been larger in

the 2.5-ms ramp condition, which corresponds to the shortest transition time between the fringe

and the target, than in the 40-ms ramp condition. The present finding, which shows a

significant fringe effect only in the 40-ms ramp condition, suggests that temporal F0

over-integration, if any, between temporally-contiguous sounds is prevented by abrupt transitions.

This interpretation is consistent with the hypothesis, which was originally inspired by results

from Bregman and colleagues (1994a,b), that the pitch-analysis system is reset by abrupt

onsets. From that point of view, slow transitions between temporally contiguous sounds, even

if they are associated with large temporal gaps between the peak amplitudes of the sounds (i.e.

80 ms with the 40-ms ramps used here), appear to promote fusion into a single perceptual

entity.

Another interesting aspect of the present results relates to the finding of significantly

larger DLF0s for the 40-ms than for the 2.5-ms ramp duration in the 62-Hz F0 condition. A

possible explanation for this finding is suggested by previous data by Plack and Carlyon

(1995) showing that DLFs decrease significantly with increasing tone duration up to about

200-300 ms. In the present experiment, the overall duration of the tones remained fixed at 100

ms. However, the duration of the plateau varied from 95 ms for the 2.5-ms ramps down to

only 20 ms with the 40-ms ramps. The fact that ramp duration had a significant influence on

DLF0s only in the 62-Hz F0 condition may further be related to the finding by Plack and

Carlyon (1995) that the effect of duration on DLF0s is more marked for unresolved than for

resolved harmonics.
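The plateau durations and the peak-to-peak gap quoted in this discussion follow from simple arithmetic on the ramp durations. The short sketch below makes that arithmetic explicit; it assumes that each 100-ms tone carries one onset and one offset ramp of equal duration, and that the fringe and the target are strictly contiguous (no silent interval). The function names are hypothetical.

```python
def plateau_ms(total_ms, ramp_ms):
    """Steady-state duration left once the onset and offset ramps are removed."""
    return total_ms - 2 * ramp_ms


def peak_to_peak_gap_ms(ramp_ms):
    """Time between the end of the fringe plateau and the start of the target plateau."""
    # Offset ramp of the fringe followed immediately by the onset ramp of the target.
    return 2 * ramp_ms


if __name__ == "__main__":
    for ramp in (2.5, 40):
        print(f"ramp = {ramp} ms: plateau = {plateau_ms(100, ramp)} ms, "
              f"gap between plateaus = {peak_to_peak_gap_ms(ramp)} ms")
```

With a 100-ms target, this gives plateaus of 95 ms and 20 ms for the 2.5-ms and 40-ms ramps, and a gap of 80 ms between plateaus for the 40-ms ramps, in agreement with the values given in the text.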

Experiment 2

Rationale

The results of Experiment I have shown that abrupt transitions between temporally

contiguous signals help to prevent temporal interference in F0 perception. Basically,

two mechanisms could explain this effect. Firstly, it could be that the temporal integration of

F0 information is reset when auditory neurons detect abrupt variations in stimulus

amplitude. Neurons in the cochlear nucleus and at several other stages of the central auditory

system which are specifically sensitive to stimulus onsets and/or offsets could trigger this

resetting. Alternatively, in the spectral domain, abrupt onsets cause a temporary broadening of

the spectrum, known as spectral splatter, which may increase the pool of peripheral auditory

neurons activated by the stimulus. The main purpose of this second experiment was to try and

tease apart these two types of effects in order to determine the factors underlying the influence

of onset duration observed in Experiment I. The approach that was used to this aim relies on

the notion that, by virtue of Parseval's time-frequency reciprocity theorem, while the

bandwidth of peripheral auditory filters increases with center frequency, the response time of

these filters decreases. Therefore, the effect of spectral splatter should decrease with

increasing frequency, as the changes in the spectral slopes of the stimulus which it produces

become smaller in comparison to auditory-filter slopes. In other words, at high frequencies,

the slopes of peripheral auditory filters are shallower than the spectral slopes of the stimulus,

irrespective of ramp duration. Therefore, spectral splatter is not reflected in peripheral

excitation patterns. In contrast, temporal envelope slopes should have a larger effect on

peripheral excitation patterns at higher frequencies, where the time constants of peripheral-

auditory filter impulse responses are short. Based on this reasoning, we measured and

compared in this second experiment the influence of ramp duration across three different

frequency regions ranging from LOW to HIGH. The predictions can be summarized as

follows: If plateau duration is the factor, the effect should vary with resolvability and be larger

for unresolved than for resolved harmonics. If the duration of the gap between the fringe and

the target is the factor, the effect should increase from the LOW to the HIGH region. If

spectral splatter is the factor, the effect should decrease from the LOW to the HIGH region.
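The reciprocity between auditory-filter bandwidth and response time invoked in this rationale can be illustrated numerically. The sketch below is not taken from the study: it assumes the equivalent rectangular bandwidth (ERB) formula of Glasberg and Moore (1990), takes the geometric center of each frequency region as a representative frequency, and uses the reciprocal of the ERB as a crude order-of-magnitude proxy for the filter response time. The region edges are those given in the Stimuli section of Experiment 2.

```python
import math

# Frequency regions (Hz) as given in the Stimuli section of Experiment 2.
REGIONS_HZ = {"LOW": (125.0, 625.0), "MID": (1375.0, 1875.0), "HIGH": (3900.0, 5400.0)}


def erb_hz(center_hz):
    """Equivalent rectangular bandwidth (Glasberg & Moore, 1990 formula; an assumption here)."""
    return 24.7 * (4.37 * center_hz / 1000.0 + 1.0)


def region_summary():
    """Return (name, center in Hz, ERB in Hz, rough response time in ms) per region."""
    rows = []
    for name, (low, high) in REGIONS_HZ.items():
        center = math.sqrt(low * high)       # geometric center of the region
        bandwidth = erb_hz(center)
        rows.append((name, center, bandwidth, 1000.0 / bandwidth))
    return rows


if __name__ == "__main__":
    for name, center, bandwidth, time_ms in region_summary():
        print(f"{name:4s} center = {center:7.1f} Hz, ERB = {bandwidth:6.1f} Hz, "
              f"1/ERB = {time_ms:5.2f} ms")
```

The bandwidth grows and the time scale shrinks from the LOW to the HIGH region, which is the trend on which the predictions above rely.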

Subjects

Six normal-hearing listeners, none of whom had taken part in Experiment I, took part

in Experiment II. The subjects ranged in age between 19 and 27 years. They all had binaural

normal hearing, i.e., absolute pure-tone thresholds at or below 20 dB HL at octave frequencies

from 250 to 8000 Hz (ANSI, 1989). Only one of these subjects (i.e. the first author) had prior

experience in psychoacoustic tasks. The others were paid an hourly wage for their

participation.

Stimuli

The stimuli used in this second experiment had the same general characteristics as

those used in Experiment I and were generated using the same apparatus. In addition to the 2.5

and 40 ms ramp durations used previously, ramp durations of 5, 10, and 20 ms were used in

this second experiment. In addition to the MID (1375-1875 Hz) frequency region, LOW

(125-625 Hz) and HIGH (3900-5400 Hz) frequency regions were tested. The forward

fringe was always present.

Procedure

DLF0s were measured using exactly the same 2I-2AFC procedure as used in

Experiment I. The listeners took part in six 2-hour sessions, for three weeks (2 sessions per

week). On odd-numbered sessions, the subjects were tested in the MID region only, using

ramp durations of 2.5, 5, 10, 20, and 40 ms, and nominal F0s of 62 or 352 Hz. On even-

numbered sessions, the subjects were tested in the LOW, MID and HIGH regions, using ramp

durations of 2.5 and 40 ms, and nominal F0s of 62 or 352 Hz. The DLF0s shown in Figure 3

were computed as the geometric mean of the DLF0s measured during the last eight 2-hour

sessions out of a total of twelve sessions.

Results

Figure 3. DLF0s measured in the presence of the forward fringe. Right-hand panel: DLF0s

obtained in the MID frequency region using nominal F0s of 62 and 352 Hz and ramp

durations varying between 2.5 and 40 ms. Left-hand panel: DLF0s obtained in the LOW,

MID, and HIGH frequency regions, using nominal F0s of 62 and 352 Hz, and ramp durations

of 2.5 and 40 ms. The error bars represent the (geometric) standard deviations around the

(geometric) mean DLF0s expressed as percentages of the nominal F0.

The results of Experiment II are represented in figure 3. Data regarding the detailed

effect of ramp duration on DLF0s in the MID region are shown in the right-hand panel. These

data were analyzed using a two-way, repeated-measures ANOVA (ramp duration × F0).

DLF0s were found to be significantly larger in the 62-Hz than in the 352-Hz F0 condition

[F(1,5)=225.23, p<0.001]. They were also found to vary significantly across ramp durations

[F(4,20)=8.25, p<0.001], being larger for the 40-ms than for all shorter ramp durations in the

F0=62 Hz condition (Bonferroni adjusted p<0.05). No interaction was observed between the

F0 and ramp duration factors [F(4,20)=1.36, p=0.28].

The left-hand panel of figure 3 shows the data obtained in the different frequency

regions using the two extreme ramp durations. These data were analyzed using a three-way,

repeated-measures ANOVA (ramp duration × F0 × region). DLF0s were found to vary

significantly across frequency regions [F(2,10)=35.25, p<0.001], being significantly smaller in

the LOW than in the MID region for both ramp durations using the 62-Hz F0, and in the MID

than in the HIGH region for both ramp durations and F0s. The nominal F0 also had a

significant effect [F(1,5)=39.62, p=0.001]: the DLF0s were overall larger in the 62-Hz than in

the 352-Hz nominal F0 condition. DLF0s varied across ramp durations [F(1,5)=116.00,

p<0.001], being larger in the 40-ms than in the 2.5-ms duration condition. The influence of

ramp duration was found to vary significantly across regions [F(2,10)=18.79, p<0.001], but

not to depend on F0 [F(1,5)=1.20, p=0.32]. No significant interaction was noted between the

F0 and frequency region factors [F(2,10)=0.33, p=0.72]. The third-order interaction between

these different factors failed to reach the statistical significance threshold [F(2,10)=3.41,

p=0.074]. In order to test whether the influence of ramp duration was different for resolved

and unresolved harmonics, the data of the three resolved and the three unresolved conditions

were grouped together and compared; the results failed to show any significant interaction

between the “ramp duration” and “resolvability” factors [F(1,5)=0.19, p=0.68].

Discussion

The results of this second experiment extend those of Experiment I and further show

that ramp duration has a significant influence on the DLF0s measured in the presence of a

forward fringe (see Footnote). No evidence was found in the pattern of results for a systematic difference in

this effect between resolved and unresolved harmonics. This argues against the hypothesis that

the origin of the fringe effect observed here is to be found in the temporal F0-integration

effects reported by Plack and Carlyon (1995). Indeed, these authors demonstrated

improvements in DLF0s with increasing stimulus duration to be markedly larger for

unresolved than for resolved harmonics. This finding was confirmed in a more recent study by

White and Plack (1998), which further suggested a longer integration time for unresolved than

for resolved harmonics. If the fringe effects found in the present study had been due to

listeners "over-integrating" F0 information from the fringe when extracting the F0 of the

target, larger detrimental effects should have been obtained with unresolved than with

resolved harmonics. The results of this second experiment suggest that the fringe effect does

not arise from the over-integration of F0 information.

The effect of ramp duration was found to decrease with increasing frequency region. If

the effect of ramp duration had been related to the abruptness of the transitions between the

fringe and target in the temporal domain, it should have been smaller in the LOW region,

being smoothed due to the long decay time of auditory filter impulse responses. Rather, the

present finding of decreasing fringe effects with increasing frequency region is consistent with

the hypothesis that the effect of ramp duration on DLF0s is related to spectral splatter. Indeed,

as mentioned in the Rationale of this second experiment, the effect of spectral splatter is likely

to be larger in a frequency region where auditory-filter slopes are steep than in a frequency

region where these slopes are shallow.

Summary and conclusions

Overall, the results of this study confirm and extend those of previous studies showing that

discrimination limens for F0 (DLF0s) can be impaired by temporally adjacent complexes

(Carlyon, 1996a; Micheyl & Carlyon, 1998; Gockel et al., 1999). The present results

demonstrate that a temporal interference effect can be observed even in conditions in which

the F0-comparison process in short-term auditory memory is not impaired, which suggests

that the F0-encoding process itself is altered. The alteration is larger for relatively long (40-ms

ramps) than for abrupt (2.5-ms ramps) transitions between the interfering and target tones.

This is consistent with the hypothesis that abrupt transitions between successive tones

help to reset the mechanism which is responsible for pitch analysis. The fact that the

effect is not related to the F0, or to the resolvability of the harmonics, but decreases with

increases in the frequency region of the harmonics suggests that the resetting is mediated by

the influence of spectral splatter on the patterns of activity in the peripheral auditory system;

this influence presumably decreases with the decreasing slopes of peripheral auditory filters

toward high frequencies.

Acknowledgments

This work was supported by the French National Center for Scientific Research (CNRS) and

by a doctoral research grant allocated to the first author by the Entendre hearing-aid

dispensers group.

References

ANSI (1989). ANSI S3.6-1989, NTIS (American National Standards Institute, New York).

Bregman A.S., Ahad P. (1994). "Resetting the pitch analysis system: 2. Role of sudden onsets

and offsets in the perception of individual components in a cluster of overlapping

tones," J. Acoust. Soc. Am. 96, 2694-2703.

Bregman A.S., Ahad P., Kim J., Melnerich L. (1994). "Resetting the pitch analysis system: 1.

Effects of rise times of tones in noise backgrounds or of harmonics in a complex tone,"

Percept. Psychophys 56, 155-162.

Carlyon R.P. (1996a). "Encoding the fundamental frequency of a complex tone in the

presence of a spectrally overlapping masker," J. Acoust. Soc. Am. 99, 517-524.

Carlyon, R.P., and Shackleton, T.M. (1994). "Comparing the fundamental frequencies of

resolved and unresolved harmonics: Evidence for two pitch mechanisms?," J. Acoust.

Soc. Am. 95, 3541-3554.

Semal C., Demany L. (1991a). "Dissociation of pitch from timbre in auditory short-term

memory," J. Acoust. Soc. Am. 89, 2404-2410.

Semal C., Demany L. (1991b). "Further evidence for an autonomous processing of pitch in

auditory short-term memory," J. Acoust. Soc. Am. 93, 1315-1322.

Semal C., Demany L., Ueda K., Hallé P. (1996). "Speech versus nonspeech in pitch memory,"

J. Acoust. Soc. Am. 100, 1132-1140.

Deutsch D. (1972). "Mapping of interactions in the pitch memory trace," Science 175, 1020-

1022.

Gockel, H., Carlyon, R.P. and Micheyl, C. (1999). "Context dependence of fundamental

frequency discrimination: Lateralized temporal fringes," J. Acoust. Soc. Am. 106, 3553-

3563.

Levitt, H. (1971). "Transformed up-down methods in psychoacoustics," J. Acoust. Soc. Am.

49, 467-477.

Massaro, D.W. (1975). "Backward recognition masking," J. Acoust. Soc. Am. 58, 1059-1065.

Micheyl, C., and Carlyon, R.P. (1998). "Effect of temporal fringes on fundamental-frequency

discrimination," J. Acoust. Soc. Am. 104, 3006-3018.

Plack C.J., Carlyon R.P. (1995). "Differences in frequency modulation detection and

fundamental frequency discrimination between complex tones consisting of resolved

and unresolved harmonics," J. Acoust. Soc. Am. 98, 1355-1364.

Shackleton, T.M., and Carlyon, R.P. (1994). "The role of resolved and unresolved harmonics

in pitch perception and frequency modulation discrimination," J. Acoust. Soc. Am. 95,

3529-3540.

White L.J., Plack C.J. (1998). "Temporal processing of the pitch of complex tones," J. Acoust.

Soc. Am. 103, 2051-2063.

Footnote

This finding contrasts at first sight with the absence of significant ramp duration effects in

Experiment I. A possible cause for this apparent discrepancy between the results of the two

experiments comes from the fact that the listeners from Experiment II had more extensive

practice in the psychophysical task, procedure, and with the stimuli, than those from

Experiment I. Due to the more limited number of conditions and of test sessions in

Experiment I, it is possible that the listeners who took part in this experiment did not have the

opportunity to learn to take advantage of abrupt ramps. One argument for this interpretation

comes from the observation that, as illustrated by Figure 4, the improvement in DLF0s over

time was in general larger in the 2.5-ms ramp duration conditions than in the 40-ms ramp

duration conditions. One possible explanation for this additional learning is that in the former

condition, listeners improved not only in their ability to discriminate the two target F0s, but

also in their ability to tease apart the first target from the preceding fringe. Further study is

required on this point.

Figure 4. Variations in DLF0s between the first and the last four measurements obtained in

Experiment II for different frequency regions, nominal F0s, and ramp durations of 2.5 (black

bars) and 40 ms (dashed bars). The error bars represent the standard error around the mean

ratios.

GENERAL SUMMARY AND CONCLUSIONS.

1-The presumed mechanisms of pitch encoding:

A brief overview of the different theories concerning the neural mechanisms underlying the
encoding of the pitch of harmonic complex sounds was presented in the introduction of this
document. This overview showed that many points of disagreement persist between the research
teams working on this topic. Fundamentally, some authors maintain that the mechanism used to
encode pitch is similar in all cases (Meddis & Hewitt, 1991a,b; Meddis & O'Mard, 1998).
Others argue for the coexistence of several mechanisms giving rise to the same sensation
(Carlyon, 1998; Shackleton & Carlyon, 1994; Carlyon & Shackleton, 1994): one would be
specific to complex sounds whose frequency components are resolved by the peripheral auditory
system, and the other would be specific to complex sounds whose frequency components are
unresolved. The purpose of this section is to recall the results of the studies presented in
the first chapter and to position them within this debate.

The first study reported here provides new evidence in favor of the use of two distinct
mechanisms for encoding the pitch of complex sounds whose components are resolved and that of
complex sounds whose frequency components are unresolved. It appears, in fact, that one of
these mechanisms is favored by the auditory system when the harmonics are resolved by the
peripheral auditory system, whereas the other is dominant when the harmonics are not resolved.

To demonstrate this result, we used in this study an experimental paradigm that has proved
its worth in the field of vision (Karni & Sagi, 1994; Polat & Sagi, 1994; Ahissar &
Hochstein, 1993; Karni, 1996) but whose use in audition has remained marginal (cf.
introduction). This approach is based on the hypothesis that it is possible to train a neural
mechanism selectively. By training subjects to perform a task, one postulates that one
specifically trains the mechanism or mechanisms used to carry out that particular task. The
transfer of the benefits of such training to other auditory tasks then suggests that these
tasks share one or more underlying mechanisms.

Moreover, the general shape of a curve describing the evolution of a subject's performance
during sensory training reveals the progressive optimization of the different mechanisms used
in processing the trained task. The shape of this curve thus provides indications about the
underlying mechanisms involved. In the introduction, we distinguished procedural training
(corresponding, in fact, to the training of a set of mechanisms) from stimulus training
(specific to the stimuli used) (Robinson & Summerfield, 1996). One may assume that the
mechanisms underlying procedural learning are identical for resolved and unresolved harmonics.
By contrast, the mechanisms involved in the specific processing of resolved or unresolved
harmonics could well be different.

The first study confirms each of these predictions. First, subjects trained to discriminate
the pitch of tones composed of resolved harmonics (group 1) became, overall, better than the
other subjects (group 2) with other tones of the same type, more so than with tones composed
of unresolved harmonics. Conversely, those trained with unresolved harmonics (group 2) became
better than the other subjects (group 1) at processing unresolved complex tones.

Second, we examined closely the learning curves for each of the two groups of subjects
described above and, for each curve, separated and quantified (by means of a time constant)
the gain due to "procedural" learning and the gain due to "stimulus" learning. It then appears
that, whichever condition was trained (resolved or unresolved), the procedural share of the
gain is comparable. In contrast, the share due to training a stimulus-specific mechanism
differs between the groups. According to the results of this study, the mechanisms encoding
the pitch of resolved and unresolved complex tones would therefore be distinct.
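
As an illustration of how a learning curve can be split into two gains, each with its own time constant, here is a minimal sketch. It assumes (this is not stated in the text above) that thresholds fall as the sum of two decaying exponentials, a fast one standing for procedural learning and a slower one standing for stimulus learning. The function name and every parameter value are hypothetical.

```python
import math

def learning_curve(session, asymptote, gain_procedural, tau_procedural,
                   gain_stimulus, tau_stimulus):
    """Threshold after `session` sessions under an assumed two-exponential model."""
    # Assumed fast component: gain attributed to procedural learning.
    procedural = gain_procedural * math.exp(-session / tau_procedural)
    # Assumed slow component: gain attributed to stimulus-specific learning.
    stimulus = gain_stimulus * math.exp(-session / tau_stimulus)
    return asymptote + procedural + stimulus

# Hypothetical parameters (arbitrary threshold units, time constants in sessions).
params = dict(asymptote=1.0, gain_procedural=2.0, tau_procedural=1.5,
              gain_stimulus=1.0, tau_stimulus=6.0)
thresholds = [learning_curve(s, **params) for s in range(12)]
```

Under such a model, two groups could show the same procedural gain and time constant while differing in the stimulus-specific component, which is the pattern described above.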

The second study goes further in analyzing the mechanisms that underlie pitch perception. It
sets out to characterize the nature of the two dissociated pitch-encoding mechanisms whose
existence is suggested by the results of the first study and by earlier work. Using the same
transfer-of-learning paradigm, we tested in this second study how close the mechanism coding
the pitch of resolved complex tones is to the one coding the pitch of a pure tone. Indeed, if
one assumes that the auditory system uses a spectral or spectro-temporal model (Goldstein,
1973; Terhardt, 1972) to encode the pitch of a group of harmonics that are all resolved, coding
the frequency of each component must constitute its first stage. Improving this stage through
intensive training should then improve pitch coding.

In parallel, assuming that pitch coding for harmonics that are all unresolved relies on the
periodic fluctuations of the temporal envelope at the output of the peripheral auditory
filters, specific training in discriminating envelope modulation rates should in turn improve
F0 discrimination performance with unresolved harmonics.
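
The link between unresolved harmonics and envelope periodicity can be illustrated with a small sketch. It assumes that three adjacent high-rank harmonics pass through the same auditory filter unchanged (no filter is modeled), and it takes the envelope as the magnitude of their analytic signal. The sampling rate, F0, harmonic ranks and function names are illustrative choices, not values from the studies.

```python
import cmath
import math

FS = 20000            # sampling rate in Hz (illustrative)
F0 = 100.0            # fundamental frequency in Hz (illustrative)
RANKS = (20, 21, 22)  # adjacent harmonics assumed to share one auditory filter

def envelope(n_samples):
    """Magnitude of the analytic signal of the summed harmonics."""
    env = []
    for n in range(n_samples):
        t = n / FS
        z = sum(cmath.exp(2j * math.pi * k * F0 * t) for k in RANKS)
        env.append(abs(z))
    return env

def envelope_period(env, min_lag, max_lag, window):
    """Lag (in samples) that maximizes the autocorrelation of the centered envelope."""
    mean = sum(env) / len(env)
    centered = [e - mean for e in env]

    def acf(lag):
        return sum(centered[n] * centered[n + lag] for n in range(window))

    return max(range(min_lag, max_lag + 1), key=acf)

env = envelope(900)
lag = envelope_period(env, min_lag=50, max_lag=300, window=600)
estimated_f0 = FS / lag  # envelope repetition rate in Hz
```

In this sketch the envelope repeats once per period of F0, so a mechanism sensitive to envelope modulation rate would have access to F0 even though no individual harmonic is resolved.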

The second study answers the first of these points in the affirmative. Overall, subjects
trained on frequency discrimination tasks improved their pitch-encoding performance more in
the resolved conditions than in the unresolved conditions. On the other hand, although some
aspects of the results are consistent with the second hypothesis, we did not observe
significantly greater transfer between modulation-rate discrimination and discrimination of
unresolved complex tones.

It is nevertheless important to note that these results cannot be explained by postulating a
unitary mechanism for pitch perception. In this respect, they confirm those of the first
experiment and seem to argue in favor of a spectral or spectro-temporal model for the coding
of resolved harmonics by the auditory system.

To conclude these first two studies, it seems that at least two types of mechanism are
involved in the perceptual coding of the pitch of harmonic complex tones. One of them would be
specialized in coding the components resolved by the auditory system. We have also seen that
this mechanism appears to be spectral or spectro-temporal. Conversely, the second mechanism,
which operates when the harmonics are unresolved, would instead be non-spectral. Only a few
non-significant indications lead us to suppose that the coding of envelope fluctuations may be
one of the stages of this mechanism.

2-Is auditory scene analysis conditioned by the mechanisms of pitch perception?

In the first part, we discussed, from a theoretical standpoint, the mechanisms that may give
rise to the sensation of pitch. We also saw in the introduction how important pitch is for
auditory scene analysis. If several mechanisms, depending on the resolvability of the signals,
lead to a unified sensation of pitch, it seems legitimate to ask whether the mechanism used to
encode pitch interacts with the subject's performance in scene analysis. This question is
addressed in the three articles of the second chapter of the thesis.

First, we showed that, whatever the resolvability of the signals, subjects were able to
organize a sequence of harmonic complex tones by grouping together tones with similar virtual
pitch. This finding confirms recent results in the literature (Vliegen & Oxenham, 1999)
indicating that spectral cues are not indispensable for assigning tones of different pitch to
distinct auditory streams. However, the results of our study revealed that organization
performance was in fact reduced when the harmonics were unresolved. The weaker pitch salience
of unresolved harmonics (Houtsma & Smurzynski, 1990; Shackleton & Carlyon, 1994), a phenomenon
that may be due to different underlying mechanisms, can explain this result. Furthermore, as
we showed in study 4, this result can explain the particular difficulties that elderly people
with hearing loss have in organizing soundscapes. These people have reduced peripheral
frequency selectivity (Patterson et al., 1982), so their peripheral auditory system separates
the frequency components of sounds less well. One may suppose that, most of the time, they
have at their disposal only a single pitch-encoding mechanism (the one that preferentially
uses unresolved harmonics) and that their ability to separate competing sources on the basis
of fundamental frequency is impaired.
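
To make the effect of reduced frequency selectivity concrete, the sketch below counts how many harmonics fall within one equivalent rectangular bandwidth (ERB), using the ERB formula of Glasberg and Moore (1990). Two things are assumptions made only for illustration: the rule that a harmonic counts as resolved when fewer than one harmonic falls within an ERB, and the modeling of hearing loss as a uniform broadening factor of 2. Neither is taken from the studies discussed here, and the function names are hypothetical.

```python
def erb_hz(freq_hz, broadening=1.0):
    """ERB in Hz (Glasberg & Moore, 1990), scaled by an assumed broadening factor."""
    return broadening * 24.7 * (4.37 * freq_hz / 1000.0 + 1.0)

def harmonics_per_erb(rank, f0_hz, broadening=1.0):
    """Number of harmonics of f0_hz that fit in one ERB centered on harmonic `rank`."""
    return erb_hz(rank * f0_hz, broadening) / f0_hz

def highest_resolved_rank(f0_hz, broadening=1.0, max_density=1.0, max_rank=40):
    """Highest rank meeting the illustrative resolvability criterion (0 if none)."""
    resolved = [r for r in range(1, max_rank + 1)
                if harmonics_per_erb(r, f0_hz, broadening) < max_density]
    return max(resolved) if resolved else 0

normal = highest_resolved_rank(100.0)                    # normal filter widths
impaired = highest_resolved_rank(100.0, broadening=2.0)  # filters twice as wide
```

With these illustrative settings, doubling the filter bandwidth sharply reduces the number of harmonics that meet the criterion, which is consistent with the idea that listeners with broadened filters must rely mostly on unresolved harmonics.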

In retrospect, the results of these two studies confirm the hypotheses put forward by the
authors of two recent studies (Micheyl & Carlyon, 1998; Gockel et al., 1999). These authors
first observed that pitch discrimination between two harmonic complex tones was degraded when
temporal fringes (other complex tones) were added before and after the target tones. They then
noted that the interference caused by the fringes occurred when the subject was unable to
organize the fringes and the targets into two distinct auditory streams. We suggested in the
introduction that the mechanisms of auditory scene analysis might be a prerequisite for the
pitch-encoding mechanisms, or even the very first stage of a model of pitch coding.

The work carried out in the last study (article 5) of the second chapter falls within this
framework. This study measures our ability to discriminate between two complex tones in the
presence of a complex temporal fringe immediately preceding the two target complex tones to be
discriminated. Different resolvability conditions and, above all, different ramp durations
(that is, faster or slower rise and fall times of the fringe and the target) were used. The
results of this study suggest a correlation between perceptual segregation of the fringe and
the target and accurate pitch discrimination. The subjects' ability to analyze the auditory
scene presented to them would thus vary with the rise and fall times of the signals: the
shorter these times, the better the segregation. The continuity rule detailed in the
introduction to this work (paragraph 3-2-1-2) could explain why segregation is facilitated by
abrupt fringe/target transitions. As we discussed in that article, it therefore seems likely
that the abrupt onsets of the signals used can both promote segregation and reset the
pitch-encoding mechanism (Bregman, 1994a, b). This resetting appears to depend on the spectral
region into which the signals were filtered. This result suggests that the resetting is
triggered by the spectral broadening ("spectral splatter") induced by the shortest rise/fall
times.
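
The relation between ramp duration and spectral splatter can be checked numerically. The sketch below gates a pure tone with raised-cosine ramps and sums the energy found in a band well away from the tone frequency. The tone frequency, duration, ramp durations (2 ms and 20 ms), probe band and function names are all illustrative assumptions, not the stimulus parameters of the study.

```python
import cmath
import math

def ramped_tone(freq_hz, dur_s, ramp_s, fs):
    """Sinusoid gated on and off with raised-cosine ramps of duration ramp_s."""
    n_samples = int(round(dur_s * fs))
    samples = []
    for n in range(n_samples):
        t = n / fs
        if t < ramp_s:
            gain = 0.5 * (1.0 - math.cos(math.pi * t / ramp_s))
        elif t > dur_s - ramp_s:
            gain = 0.5 * (1.0 - math.cos(math.pi * (dur_s - t) / ramp_s))
        else:
            gain = 1.0
        samples.append(gain * math.sin(2.0 * math.pi * freq_hz * t))
    return samples

def band_energy(samples, fs, f_lo, f_hi, step_hz=10.0):
    """Sum of squared DFT magnitudes at probe frequencies from f_lo to f_hi."""
    energy = 0.0
    f = f_lo
    while f <= f_hi:
        acc = sum(x * cmath.exp(-2j * math.pi * f * n / fs)
                  for n, x in enumerate(samples))
        energy += abs(acc) ** 2
        f += step_hz
    return energy

FS = 8000  # sampling rate in Hz (illustrative)
abrupt = ramped_tone(1000.0, 0.1, 0.002, FS)   # 2-ms ramps
gradual = ramped_tone(1000.0, 0.1, 0.020, FS)  # 20-ms ramps
splatter_abrupt = band_energy(abrupt, FS, 1400.0, 1600.0)
splatter_gradual = band_energy(gradual, FS, 1400.0, 1600.0)
```

Comparing `splatter_abrupt` with `splatter_gradual` shows how much more energy the short ramps spread into a region 400 to 600 Hz above the tone, which is the kind of spectral broadening invoked above.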

3-Conclusions:

To conclude this work, let us recall its main results. Two initial studies, using transfer of
learning, provided arguments in favor of the hypothesis that two different neural mechanisms
can be brought into play to code a common sensation of pitch. One of these mechanisms seems to
share processes with the one used for the perception of pure-tone pitch, since partial
transfer of learning occurs between frequency discrimination and F0 discrimination when the
harmonics are resolved.

Given that pitch is a powerful tool for auditory scene analysis, the three subsequent studies
explore in detail grouping by pitch proximity. We showed that although spectral cues are not
indispensable to the grouping mechanisms, they nevertheless facilitate their operation. For
this reason, hearing-impaired listeners have specific difficulties in analyzing auditory
scenes. Finally, the last study measures the influence of rise and fall times in a pitch
discrimination experiment between complex tones preceded by a temporal fringe. We discussed
this result by suggesting that abrupt transitions, by broadening the spectrum of the stimuli,
may reset the pitch-encoding mechanisms and thus improve discrimination performance by
promoting segregation of the complex tones.

GENERAL BIBLIOGRAPHY

Ahissar, M. and Hochstein, S. (1993). Attentional control of early perceptual learning, Proc.

Natl. Acad. Sci. USA 90, 5718-5722.

Alain C., Ogawa K.H., Woods D.L. (1996). Aging and the segregation of auditory stimulus

sequences, J Geront B Psychol Sci Soc Sci 51: 91-93.

American National Standards Institute. (1969). Specification for audiometers (ANSI S3.6-1969),

New York: ANSI.

American National Standards Institute. (1989). ANSI S3.6-1989, NTIS (American National

Standards Institute, New York).

Anstis, S., and Saida, S. (1985). Adaptation to auditory streaming of frequency-modulated

tones, Percept. Psychophys. 11, 257-271.

Bacon, S.P., Grimault, N. and Lee, J. (1997). Spectral integration and the detection of

tones in modulated and unmodulated noise, J. Acoust. Soc. Am. 102, 3160.

Beauvois, M.W., and Meddis, R. (1996). Computer simulation of auditory stream segregation

in alternating-tone sequences, J. Acoust. Soc. Am. 99, 2270-2280.

Beerends J.G., Houtsma A.J.M. (1986). Pitch identification of simultaneous dichotic two-tone

complexes, J. Acoust. Soc. Am. 80, 1048-1056.

Békésy, G. Von (1947). The variation of phase along the basilar membrane with sinusoidal

vibrations, J. Acoust. Soc. Am. 19, 452-460.

Bilecen, D., Seifritz, E., Radü, E.W., Schmid, N., Wetzel, S., Probst, R., Scheffler, K. (2000).

Cortical reorganization after acute unilateral hearing loss traced by fMRI, Neurology, 54,

765-767.

Bilsen F.A. (1973). On the influence of the number and phase of harmonics on the

perceptibility of the pitch of complex signals, Acustica 28, 60-65.

Bilsen F.A., Ritsma R.J. (1970). Some parameters influencing the perceptibility of pitch, J.

Acoust. Soc. Am. 47, 469-475.

Bregman, A.S., and Campbell, J. (1971). Primary auditory stream segregation and the

perception of order in rapid sequences of tones, J. Exp. Psychol. 89, 244-249.

Bregman, A.S., and Dannenbring, G. (1973). The effect of continuity on auditory stream

segregation, Perc. Psychophys. 36, 308-312.

Bregman A.S. (1978). Auditory streaming: Competition among alternative organizations,

Perception and Psychophysics 23, 391-398.

Bregman A.S., Pinker S. (1978). Auditory streaming and the building of timbre, Canad. J.

Psychol. 32, 19-31.

Bregman, A.S. (1978). Auditory streaming is cumulative, J. Exp. Psychol.: Human Percept.

Perform. 4, 380-387.

Bregman, A.S. and Levitan, R. (1983). Stream segregation based on fundamental frequency

and spectral peak. I: Effects of Shaping by filters, Unpublished manuscript, Psychology

Department, McGill University.

Bregman A.S., Abramson J., Doehring P. (1985). Spectral integration based on common

amplitude modulation, Perception and Psychophysics 37, 483-493.

Bregman A.S., Levitan R., Liao C. (1990). Fusion of auditory components: Effects of the

frequency of amplitude modulation, Perception and Psychophysics 47, 68-73.

Bregman, A.S., Liao, C., and Levitan, R. (1990). Auditory grouping based on fundamental

frequency and formant peak frequency, Can. J. Psychol. 44, 400-413.

Bregman, A.S. (1990). Auditory Scene Analysis: The perceptual Organization of Sound (MIT,

Cambridge, MA).

Bregman, A.S. (1991). Using quick glimpses to decompose mixtures, in Music, Language,

Speech and Brain (eds J. Sundberg, L. Nord and R. Carlson), London, Macmillan, pp. 244-249.

Bregman A.S., Ahad P. (1994). Resetting the pitch analysis system: 2. Role of sudden onsets

and offsets in the perception of individual components in a cluster of overlapping tones,

J. Acoust. Soc. Am. 96, 2694-2703.

Bregman A.S., Ahad P., Kim J., Melnerich L. (1994). Resetting the pitch analysis system: 1.

Effects of rise times of tones in noise backgrounds or of harmonics in a complex tone,

Percept. Psychophys 56, 155-162.

Broadbent, D.E. and Ladefoged, P. (1957). On the fusion of sounds reaching different sense

organs, J. Acoust. Soc. Am. 29, 708-710.

Brown J.C., Puckette M.S. (1989). Calculation of a narrowed autocorrelation function, J.

Acoust. Soc. Am. 85, 1595-1601.

Brunstrom J.M., Roberts B. (1998). Profiling the perceptual suppression of partials in periodic

complex tones: Further evidence for a harmonic template, J. Acoust. Soc. Am. 104, 3511-

3519.

Bundy, R.S., Colombo, J. and Singer, J. (1982). Pitch perception in young infants, Develop.

Psychol. 18, 10.

Burns E.M., Viemeister N.F. (1976). Nonspectral pitch, J. Acoust. Soc. Am. 60, 863-869.

Burns E.M., Viemeister N.F. (1981) Played-again SAM: Further observations on the pitch of

amplitude-modulated noise. J. Acoust. Soc. Am. 70, 1655-1660.

Buunen T.J.F., Festen J.M., Bilsen F.A., Van den Brink G. (1974). Phase effects in a three-

component signal, J. Acoust. Soc. Am. 55, 297-303.

Canévet, G. (1995). Eléments de Psychoacoustique. Unpublished document, Université

Aix-Marseille II.

Cariani P.A. and B. Delgutte (1996a). Neural correlates of the pitch of complex tones. I. Pitch

and pitch salience, J. Neurophysiol. 76, 1698-1716.

Cariani P.A. and B. Delgutte (1996b). Neural correlates of the pitch of complex tones. II.

Pitch shift, pitch ambiguity, phase invariance, pitch circularity, rate pitch, and the

dominance region for pitch, J. Neurophysiol. 76, 1717-1734.

Carlyon R.P. (1996a). Encoding the fundamental frequency of a complex tone in the presence

of a spectrally overlapping masker, J. Acoust. Soc. Am. 99, 517-524.

Carlyon R.P., Cusack R., Foxton J.M., Robertson I.H. (2000). Effects of attention and

unilateral neglect on auditory stream segregation. J Exp Psychol: Hum Perc Perf,

submitted.

Carlyon, R.P. (1996b). Masker asynchrony impairs the fundamental-frequency discrimination

of unresolved harmonics, J. Acoust. Soc. Am. 99, 525-533.

Carlyon, R.P. (1998). Comments on "A unitary model of pitch perception" [J. Acoust. Soc.

Am. 102, 1811-1820 (1997).], J. Acoust. Soc. Am., 104, 1118-1121.

Carlyon, R.P., (1998). The effect of the resolvability on the encoding of fundamental

frequency by the auditory system.

Carlyon, R.P., and Shackleton, T.M. (1994). Comparing the fundamental frequencies of

resolved and unresolved harmonics: Evidence for two pitch mechanisms?, J. Acoust.

Soc. Am. 95, 3541-3554.

Casseday, J.H., Covey E. (1995). Mechanisms for analysis of auditory temporal patterns in the

brainstem of echolocating bats, In Neural representation of temporal patterns Ed. By E.

Covey, H.L. Hawkins and R.F. Port, New York, Plenum, 25-51.

Cherry E.C. (1953) Some experiments on the recognition of speech with one or two ears, J

Acoust Soc Am 25, 975-79.

Cox R.M., Alexander G.C. (1991). Hearing aid benefit in everyday environments. Ear Hear.

12, 127-39.

Darwin C.J., Ciocca V., Sandell G.J. (1994). Effects of frequency and amplitude modulation

on the pitch of a complex tone with a mistuned harmonic, J. Acoust. Soc. Am. 95, 2631-

2636.

Davis A. (1995). Hearing in Adults, London: Whurr Publishers.

de Cheveigné A. (1993). Separation of concurrent harmonic sounds: Fundamental frequency

estimation and a time-domain cancellation model of auditory processing, J. Acoust. Soc.

Am. 93, 3271-3290.

De Cheveigné A. (1997). Harmonic fusion and pitch shifts of mistuned partials, J. Acoust.

Soc. Am. 102, 1083-1087.

de Cheveigné, A. (1998). Cancellation model of pitch perception, J. Acoust. Soc. Am. 103,

1261-1271.

de Cheveigné, A. (1999). Modèles de traitement auditif dans le domaine temps, unpublished

HDR thesis.

Demany, L., (1985). Perceptual learning in frequency discrimination, J. Acoust. Soc. Am. 78,

1118-1120.

Deutsch D. (1972). Mapping of interactions in the pitch memory trace, Science 175, 1020-

1022.

Evans E.F. (1978). Place and time coding of frequency in the peripheral auditory system:

Some physiological pros and cons, Audiology 17, 369-420.

Faulkner A. (1985). Pitch discrimination of harmonic complex signals: Residue pitch or

multiple component discriminations?, J. Acoust. Soc. Am. 78, 1993-2005.

Fishman Y.I., Reser D.H., Arezzo J.C., Steinschneider M. (1998). Pitch vs. Spectral

encoding of harmonic complex tones in primary auditory cortex of the awake monkey,

Brain Res. 786, 18-30.

Fitzgerald M.B., Wright B.A. (2000) Specificity of learning for the discrimination of

sinusoidal amplitude-modulation rate. J. Acoust. Soc. Am. 107, 2916.

Fletcher, H. (1940). Auditory patterns, Rev. Mod. Phys. 12, 47.

Florentine M., Buus S., Scharf B. and Zwicker E. (1980). Frequency selectivity in

normally-hearing and hearing-impaired observers, J. Speech Hear. Res. 23, 646-69.

Gerson, A. and Goldstein, J.L. (1978). Evidence for a general template in central optimal

processing for pitch of complex tones, J. Acoust. Soc. Am. 63, 498.

Glasberg B.R., Moore B.C.J. (1986). Auditory filter shapes in subjects with unilateral and

bilateral cochlear impairments, J. Acoust. Soc. Am. 79, 1020-1033.

Glasberg, B.R. and Moore, B.C.J. (1990). Derivation of auditory filter shapes from

notched-noise data, Hearing Research 47, 103-138.

Gockel, H., Carlyon, R.P. and Micheyl, C. (1999). Context dependence of fundamental

frequency discrimination: Lateralized temporal fringes, J. Acoust. Soc. Am. 106, 3553-

3563.

Goldstein J.L. (1973). An optimum processor theory for the central formation of the pitch of

complex tones, J. Acoust. Soc. Am. 54, 1496-1516.

Green D.M. (1964). Detection of multiple component signals in noise, In Signal detection and

recognition by human observers. Ed. by J.A. Swets (J. Wiley & Sons Inc, New York,

London, Sydney).

Green D.M. and Swets J.A. (1966). Signal detection theory and psychophysics, New York:

Wiley.

Greenwood, D.D. (1961). Critical bandwidth and the frequency coordinates of the basilar

membrane, J. Acoust. Soc. Am. 33, 1344-1356.

Grimault N., Micheyl C., Carlyon R.P., Collet L. Evidence for two pitch encoding

mechanisms using a selective auditory training paradigm, submitted.

Grimault, N., Micheyl, C., Carlyon, R.P., Artaud, P. and Collet, L. (2000). Influence of

peripheral resolvability on the perceptual segregation of harmonic complex tones

differing in fundamental frequency, J. Acoust. Soc. Am. 108, 263-271.

Hanna T. E. (1992) Discrimination and identification of modulation rate using a noise carrier,

J. Acoust. Soc. Am. 91, 2122-2128.

Hall, J.W. and Peters, R.W. (1982). Change in the pitch of a complex tone following its

association with a second complex tone, J. Acoust. Soc. Am. 71, 142.

Hall, J.W., Haggard, M.P. and Fernandes, M.A. (1984). Detection in noise by spectro-

temporal pattern analysis, J. Acoust. Soc. Am. 76, 50-56.

Hartmann W.M., Doty S.L. (1995). On the pitches of the components of a complex tone, J.

Acoust. Soc. Am. 99, 567-578.

Hartmann, W.M. (1988). Pitch perception and the segregation and integration of auditory

entities, In Auditory function - Neurological bases of hearing. Edited by G.M. Edelman,

W.E. Gall and W.M. Cowan, New York, Wiley, 623-645.

Hartmann, W.M., and Johnson D. (1991). Stream segregation and peripheral channeling, Mus.

Perc. 9, 155-184.

Helmholtz, H.L.F. Von (1863). Die Lehre von den Tonempfindungen als physiologische

Grundlage für die Theorie der Musik, 1st edn, F. Vieweg, Braunschweig.

Helmholtz, H.L.F. Von (1877). On the sensation of tone, (English translation A.J.Ellis, 1954).

New York, Dover.

Hicks, M.L. and Bacon, S.P. (1995). Some factors influencing comodulation masking release

and across-channel masking, J. Acoust. Soc. Am. 98, 2504-2514.

Hoekstra A. and Ritsma R.J. (1977). Perceptive hearing loss and frequency selectivity, in:

Psychophysics and Physiology of Hearing, ed. EF Evans and JP Wilson, New York:

Academic Press.

Houtsma A.J.M. and Smurzynski J. (1988). JF Schouten revisited: Pitch of complex tones

having many high-order harmonics, J. Acoust. Soc. Am. 87, 304-310.

Houtsma, A.J.M., and Smurzynski, J. (1990). Pitch identification and discrimination for

complex tones with many harmonics, J. Acoust. Soc. Am. 87, 304-310.

Irino T., Patterson R.D. (1997). A time domain, level-dependent auditory filter: The

gammachirp, J. Acoust. Soc. Am. 101, 412-419.

Irvine, D.R.F. (1992). Physiology of the auditory brainstem, In The mammalian auditory

pathway: neurophysiology, Edited by A.N. Popper and R.R. Fay, New York,

Springer-Verlag, 153-231.

Iverson P. (1995). Auditory stream segregation by musical timbre: Effects of static and

dynamic acoustic attributes, J. Exp. Psychol.: Hum. Perc. Perf. 21, 751-763.

Jeffress, L.A. (1948). A place theory of sound localization, J. Comp. Physiol. Psychol. 41, 35-

39.

Kaernbach C., Demany L. (1998). Psychophysical evidence against the autocorrelation theory

of auditory temporal processing, J. Acoust. Soc. Am. 104, 2298-2306.

Kaltenbach J.A., Czaja J.M. and Kaplan C.R. (1992). Changes in the tonotopic map of the

dorsal cochlear nucleus following induction of cochlear lesion by exposure to intense

sound, Hearing Res. 59, 213-223.

Karni, A. and Sagi, D. (1990). Texture learning is specific for spatial location and background

orientation, Invest. Ophthalmol. Vis. Sci. (Suppl.). 31, 562.

Karni, A. and Sagi, D. (1991). Where practice makes perfect in texture discrimination:

Evidence for primary visual cortex plasticity, Proc. Natl. Acad. Sci. USA 88, 4966-4970.

Karni, A. and Sagi, D. (1993). The time course of learning a visual skill, Nature 365, 250-252.

Konishi, M., Takahashi, T.T., Wagner, H., Sullivan, W.E. and Carr, C.E. (1988).

Neurophysiological and anatomical substrates of sound localization in the owl, In

Auditory function - neurobiological bases of hearing. Edited by G.M. Edelman, W.E.

Gall and W.M. Cowan, New York, Wiley, 721-745.

Langner G. (1997). Neural processing and representation of periodicity pitch, Acta Otolar.

532, 68-76.

Langner G., Sams M., Heil P., Schulze H. (1997). Frequency and periodicity are represented

in orthogonal maps in the human auditory cortex: evidence from

magnetoencephalography, J. Comp. Physiol. A 181, 665-676.

Langner, G. and Schreiner, C.E. (1988). Periodicity coding in the inferior colliculus of the cat.

I. Neuronal mechanisms, J. Neurophysiol. 60, 1799-1822.

Levitt, H. (1971). Transformed up-down methods in psychoacoustics, J. Acoust. Soc. Am. 49,

467-477.

Licklider, J.C.R. (1951). A duplex theory of pitch perception, Experientia 7, 128-134.

Licklider, J.C.R. (1956). Auditory frequency analysis, In Information theory Edited by C.

Cherry, London, Butterworth, 253-268.

Licklider, J.C.R. (1959). Three auditory theories, In Psychology, a study of a science. Edited

by S. Koch, New York, McGraw-Hill, I, 41-144.

Licklider, J.C.R. (1962). Periodicity pitch and related auditory process models, International

Audiology 1, 11-36.

Lin J.Y., Hartmann W.M. (1998). The pitch of a mistuned harmonic: Evidence for a template

model, J. Acoust. Soc. Am. 103, 2608-2617.

Lundeen, C. and Small, A.M. (1984). The influence of temporal cues on the strength of

periodicity pitches, J. Acoust. Soc. Am. 75, 1578.

Martens J.P. (1983). Comment on Algorithm for extraction of pitch and pitch salience from

complex tonal signals [J. Acoust. Soc. Am. 71, 679-688 (1982).], J. Acoust. Soc. Am. 75,

626-628.

Massaro D.W. (1975). Backward recognition masking, J. Acoust. Soc. Am. 58, 1059-1065.

Maubaret C., Demany, L., Semal, C. (1999). Sélectivité de l’apprentissage auditif de

discrimination chez l’homme, unpublished DEA dissertation, Université Bordeaux II.

McAdams S. (1989). Segregation of concurrent sounds. I: Effects of frequency modulation

coherence, J. Acoust. Soc. Am. 86, 2148-2159.

McCabe S.L., and Denham, M.J. (1997). A model of auditory streaming, J. Acoust. Soc. Am.

101, 1611-1621.

McKeown, J.D., Darwin C.J. (1991). Effects of phase changes in low-numbered harmonics on

the internal representation of complex sounds, The quarterly journal of experimental

psychology 43A, 401-421.

Meddis R., O’Mard L. J. (1997). A unitary model of pitch perception, J. Acoust. Soc. Am. 102,

1811-1820.

Meddis, R. (1986). Simulation of mechanical to neural transduction in the auditory receptor, J.

Acoust. Soc. Am. 79, 702-711.

Meddis, R. (1988). Simulation of mechanical to neural transduction: Further studies, J.

Acoust. Soc. Am. 83, 1056-1063.

Meddis, R. and Hewitt, M. (1991a). Virtual pitch and phase sensitivity of a computer model

of the auditory periphery: I. pitch identification, J. Acoust. Soc. Am. 89, 2866-2882.

Meddis, R. and Hewitt, M. (1991b). Virtual pitch and phase sensitivity of a computer model

of the auditory periphery: II. Phase sensitivity, J. Acoust. Soc. Am. 89, 2883-2894.

Menning, H., Roberts, L.E. and Pantev, C. (2000). Plastic changes in the auditory cortex

induced by intensive frequency discrimination training, Neuroreport 11, 817-822.

Micheyl C., and Carlyon, R.P. (1998). Effect of temporal fringes on fundamental-frequency

discrimination, J. Acoust. Soc. Am. 104, 3006-3018.

Miller, G.A., and Heise, G.A. (1950). The trill threshold, J. Acoust. Soc. Am. 22, 637-638.

Montgomery C.R., Clarkson M.G. (1997). Infant’s pitch perception: Masking by low- and

high-frequency noises, J. Acoust. Soc. Am. 102, 3665-3672.

Moore B.C.J. (1973). Frequency difference limens for short-duration tones, J. Acoust. Soc.

Am. 54, 610.

Moore B.C.J. (1985). Frequency selectivity and temporal resolution in normal and hearing-

impaired listeners, Brit. J. Audiol. 19, 189-201.

Moore B.C.J. (1989). An introduction to the psychology of hearing, Academic Press.

Moore B.C.J. (1995). Perceptual consequences of cochlear damage, Oxford: University Press.

Moore B.C.J., Glasberg B.R. (1987). Formulae describing frequency selectivity as a function

of frequency and level, and their use in calculating excitation patterns, Hear. Res. 28,

209-225.

Moore B.C.J., Glasberg B.R. (1989). Difference limens for phase in normal and hearing-

impaired subjects, J. Acoust. Soc. Am. 86, 1351-1365.

Moore B.C.J., Glasberg B.R. (1990). Frequency discrimination of complex tones with

overlapping and non-overlapping harmonics, J. Acoust. Soc. Am. 87, 2163-2177.

Moore B.C.J., Glasberg B.R., Peters R.W. (1985). Relative dominance of individual partials

in determining the pitch of complex tones, J. Acoust. Soc. Am. 77, 1853-1860.

Moore, B.C.J. (1986). Parallels between frequency selectivity measured psychophysically and

in cochlear mechanics, Scand. Audiol. Suppl. 25, 139-152.

Nejime Y. and Moore B.C.J. (1997). Simulation of the effect of threshold elevation and

loudness recruitment combined with reduced frequency selectivity on the

intelligibility of speech in noise, J. Acoust. Soc. Am. 102, 603-615.

Ohm, G.S. (1843). Über die Definition des Tones, nebst daran geknüpfter Theorie der Sirene und

ähnlicher tonbildender Vorrichtungen, Ann. Physik. 59, 513.

Palmer C., Nelson T. and Lindley IV G. A. (1998). The functionally and physiologically

plastic adult auditory system, J. Acoust. Soc. Am. 103, 1705-1721.

Patterson R.D., Nimmo-Smith I., Weber D.L. and Milroy R. (1982). The deterioration of

hearing with age: Frequency selectivity, the critical ratio, the audiogram, and speech

threshold, J. Acoust. Soc. Am. 72, 1788-1803.

Patterson, R.D. (1976). Auditory filter shapes derived with noise stimuli, J. Acoust. Soc. Am.

59, 640-654.

Patterson, R.D., Allerhand, M. and Giguère, C. (1995). Time-domain modelling of peripheral

auditory processing: a modular architecture and a software platform, J. Acoust. Soc. Am.

98, 1890-1894.

Patterson, R.D., Nimmo-Smith, I., Holdsworth, J. and Rice, P. (1988). Spiral vos final report,

Part A: The auditory filterbank, Cambridge Electronic Design, Contract Rep. (Apu

2341).

Peters, R.W. and Hall, J.W. (1984). Generalization and maintenance of pitch-change effects, J.

Acoust. Soc. Am. 76, S76.

Philibert, B., Collet, L. and Veuillet, E. (2000). Unpublished literature review.

Plack C.J., Carlyon R.P. (1995). Differences in frequency modulation detection and

fundamental frequency discrimination between complex tones consisting of resolved

and unresolved harmonics, J. Acoust. Soc. Am. 98, 1355-1364.

Plomp R. (1964). The ear as a frequency analyzer, J. Acoust. Soc. Am. 36, 1628-1636.

Plomp R. (1967). Pitch of complex tones, J. Acoust. Soc. Am. 41, 1526-1533.

Plomp R. (1978). Auditory handicap of hearing impairment and the limited benefit of hearing

aids, J. Acoust. Soc. Am. 63, 533-549.

Plomp, R. (1965). Detectability threshold for combination tones, J. Acoust. Soc. Am. 37,

1110-1123.

Polat, U. and Sagi, D. (1994). Spatial interactions in human vision: from near to far via

experience dependent cascades of connections, Proc. Natl. Acad. Sci. USA 91, 1206-

1209.

Ragot, R. and Crottaz, S. (1998). A dual mechanism for sound pitch perception: new evidence

from brain electrophysiology, Neuroreport 9, 3123-3127.

Rasch, R.A. (1978). The perception of simultaneous notes such as in polyphonic music,

Acustica 40, 21-33.

Recanzone G.H., Schreiner C.E., Merzenich, M.M. (1993). Plasticity in the frequency

representation of primary auditory cortex following discrimination training in adult owl

monkeys, J. Neurosci. 13, 87-103.

Ritsma R.J. (1967). Frequencies dominant in the perception of the pitch of complex sounds,

J. Acoust. Soc. Am. 42, 191-198.

Roberts B., Brunstrom J.M. (1998). Perceptual segregation and pitch shifts of mistuned

components in harmonic complexes and in regular inharmonic complexes, J. Acoust.

Soc. Am. 104, 2326-2338.

Robinson D.W., Dadson R.S. (1956) A redetermination of the equal-loudness relations for

pure tones, Br. J. Appl. Phys. 7, 166-181.

Robinson K., Summerfield A.Q. (1996). Adult auditory learning and training, Ear and

Hearing 17, 51S-65S.

Rose, J.E., Brugge, J.F., Anderson, D.J. and Hind, J.E. (1968). Patterns of activity in single

auditory nerve fibers of the squirrel monkey, In A.V.S. de Reuck & J. Knight (Eds.):

Hearing mechanisms in vertebrates. London: Churchill, 144.

Rose, M.M., and Moore, B.C.J. (1997). Perceptual grouping of tone sequences by normally-

hearing and hearing-impaired listeners, J. Acoust. Soc. Am. 102, 1768-1778.

Scheffers M.T.M. (1983). Simulation of auditory analysis of pitch: An elaboration on the

DWS pitch meter, J. Acoust. Soc. Am. 74, 1716-1725.

Schouten, J.F. (1940). The residue and the mechanism of hearing, Proc. K. Ned. Akad. Wet.

43, 991-999.

Schouten, J.F. (1970) The residue revisited. In Frequency Analysis and periodicity perception

in hearing (ed. R. Plomp and G.F. Smoorenburg), Sijthoff, Leiden.

Schouten, J.F., Ritsma, R.J. and Cardozo, B.L. (1962). Pitch of the residue, J. Acoust. Soc.

Am. 34, 1418-1424.

Schulze H., Langner G. (1997a). Periodicity coding in the primary auditory cortex of the

Mongolian gerbil (Meriones unguiculatus): two different coding strategies for pitch and

rhythm?, J. Comp. Physiol. 181, 651-663.

Schulze H., Langner G. (1997b). Representation of periodicity pitch in the primary auditory

cortex of the Mongolian gerbil, Acta Otolaryngol. 532, 89-95.

Schulze H., Scheich H. (1999). Discrimination learning of amplitude modulated tones in

Mongolian gerbils, Neuroscience Letters 261, 13-16.

Schulze H., Scheich H., Langner G. (1998) Periodicity coding in the auditory cortex: what can

we learn from learning experiments?

Schwartz, I.R. (1992). The superior olivary complex and lateral lemniscal nuclei, In The

mammalian auditory pathway: neuroanatomy. Edited by D.B. Webster, A.N. Popper and

R.R. Fay, New York, Springer-Verlag, 117-167.

Seebeck, A. (1841). Beobachtungen über einige Bedingungen der Entstehung von Tönen, Ann.

Phys. Chem. 53, 417.

Seebeck, A. (1843). Über die Sirene, Ann. Phys. Chem. 60, 449.

Semal C., Demany L. (1991a). Dissociation of pitch from timbre in auditory short-term

memory, J. Acoust. Soc. Am. 89, 2404-2410.

Semal C., Demany L. (1991b). Further evidence for an autonomous processing of pitch in

auditory short-term memory, J. Acoust. Soc. Am. 93, 1315-1322.

Semal C., Demany L., Ueda K., Hallé P. (1996). Speech versus nonspeech in pitch memory, J.

Acoust. Soc. Am. 100, 1132-1140.

Shackleton, T.M., and Carlyon, R.P. (1994). The role of resolved and unresolved harmonics in

pitch perception and frequency modulation discrimination, J. Acoust. Soc. Am. 95, 3529-3540.

Shiu, L.P. and Pashler, H. (1992). Improvement in line orientation discrimination is retinally

local but dependent on cognitive set, Percept. Psychophys. 52, 582-588.

Singh, P.G. (1987). Perceptual organization of complex-tones sequences: a tradeoff between

pitch and timbre?, J. Acoust. Soc. Am. 82, 886-899.

Singh, P.G., and Bregman, A. (1997). The influence of different timbre attributes on the

perceptual segregation of complex-tone sequences, J. Acoust. Soc. Am. 102, 1943-1952.

Slaney M., Lyon R.F. (1990). A perceptual pitch detector, Proc. 1990 IEEE Int. Conf.

Acoustics, Speech and Signal Processing (ICASSP), Albuquerque, NM, 357-360.

Smith, P.H., Joris, P.X. and Yin, T.C.T. (1993). Projections of physiologically characterized

spherical bushy cell axons from the cochlear nucleus of the cat: evidence for delay lines to

the medial superior olive, J. Comp. Neurol. 331, 245-260.

Snodgrass J.G. and Corwin J. (1988). Pragmatics of measuring recognition memory:

Applications to dementia and amnesia, J. Exp. Psychol.: Gen. 117, 34-50.

Sommers M.S. and Gehr S.E. (1998). Auditory suppression and frequency selectivity in

older and younger adults, J. Acoust. Soc. Am. 103, 1067-1074.

Sommers M.S. and Humes L.E. (1993). Auditory filter shapes in normal-hearing, noise-

masked normal and elderly listeners. J. Acoust. Soc. Am. 93, 2903-2914.

Srulovicz P., Goldstein J.L. (1983). A central spectrum model: a synthesis of auditory-nerve

timing and place cues in monaural communication of frequency spectrum, J. Acoust. Soc.

Am. 73, 1266-1276.

Steinschneider M., Reser D.H., Fishman Y.I., Schroeder C.E., Arezzo J.C. (1998). Click train

encoding in primary auditory cortex of the awake monkey: evidence for two mechanisms

subserving pitch perception, J. Acoust. Soc. Am. 104, 2935-2955.

Stevens S.S. (1957). On the psychoacoustical law, Psychol. Rev. 64, 153-181.

Tallal P., Miller S.L., Bedi G., Byma G., Wang X., Nagarajan S.S., Schreiner C., Jenkin

W.M., Merzenich M.M. (1996). Language comprehension in language-learning impaired

children improved with acoustically modified speech, Science 271, 81-84.

Terhardt, E. (1972a). Zur Tonhöhenwahrnehmung von Klängen. I. Psychoakustische

Grundlagen, Acustica 26, 173.

Terhardt, E. (1972b). Zur Tonhöhenwahrnehmung von Klängen. II. Ein Funktionsschema,

Acustica 26, 187.

Terhardt, E. (1974) Pitch, consonance and harmony. J. Acoust. Soc. Am., 55, 1061-1069.

Terhardt, E. (1978). Psychoacoustic evaluation of musical sounds, Percept. Psychophys. 23,

483.

Thurlow, W.R. (1963) Perception of low auditory pitch: a multicue mediation theory. Psychol.

Rev., 70, 515-519.

Tyler R.S., Wood E.J. and Fernandes M. (1982). Frequency resolution and hearing loss,

Brit. J. Audiol. 16, 45-63.

Van Noorden L.P.A.S. (1975). Temporal coherence in the perception of tone sequences,

Unpublished Doctoral Dissertation, Technische Hogeschool Eindhoven, Eindhoven,

The Netherlands.

Vliegen J., Moore B.C.J. and Oxenham, A.J. (1999). The role of spectral and periodicity cues

in auditory stream segregation, measured using a temporal discrimination task, J. Acoust.

Soc. Am. 106, 938-945.

Vliegen J., Oxenham A.J. (1999). Sequential stream segregation in the absence of spectral

cues, J. Acoust. Soc. Am. 105, 339-346.

Walliser, K. (1968) Zusammenwirken von Hüllkurvenperiode und Tonheit bei der Bildung der

Periodentonhöhe, Doctoral dissertation. Technische Hochschule, München.

Walliser, K. (1969a) Zusammenhänge zwischen dem Schallreiz und der Periodentonhöhe.

Acustica, 21, 319-328.

Walliser, K. (1969b) Zur Unterschiedsschwelle der Periodentonhöhe. Acustica, 21, 329-336.

Walliser, K. (1969c) Über ein Funktionsschema für die Bildung der Periodentonhöhe aus dem

Schallreiz. Kybernetik, 6, 65-72.

Warren, R.M. (1982). Auditory perception: A new synthesis. New York: Pergamon.

White L.J., Plack C.J. (1998). Temporal processing of the pitch of complex tones, J. Acoust.

Soc. Am. 103, 2051-2063.

Whitfield, I.C. (1967) The auditory pathway, Arnold, London.

Whitfield, I.C. (1970) Central nervous processing in relation to spatiotemporal discrimination

of auditory patterns. In Frequency Analysis and periodicity perception in hearing (ed. R.

Plomp and G.F. Smoorenburg), Sijthoff, Leiden.

Wiegrebe L, Patterson R.D., Demany L., Carlyon R.P. (1998). Temporal dynamics of pitch

strength in regular interval noises, J. Acoust. Soc. Am. 104, 2307-2313.

Wright, B.A., Buonomano, D.V., Mahncke, H.W. and Merzenich, M.M. (1997). Learning and

generalization of auditory temporal-interval discrimination in humans, J. Neurosci. 17,

3956-3963.

Yost, W.A. (1996). Pitch strength of iterated rippled noise when the pitch is ambiguous, J.

Acoust. Soc. Am. 101, 1644-1648.

APPENDICES

A1: Model for computing peripheral excitation patterns.

A1-1-Presentation of the model.

During this doctorate, I developed a quick model for computing the temporal outputs
(the excitation patterns) of the auditory filters. The most widely used classical model is
that of Glasberg & Moore (1990). In my view, however, that model has an intrinsic problem,
particularly when it comes to measuring the effect of rapid temporal fluctuations on the
filter outputs: it is a spectral model. It first takes the spectral representation of the
incoming signal, using a standard fast Fourier transform (FFT), in order to compute the
spectrum at the output of each auditory filter. It then returns to the time domain through
an inverse Fourier transform (IFFT).

The model I developed, which is presented in this appendix, is strictly temporal. It uses
the time-domain definition of the auditory filters given by Irino & Patterson (1997): the
gammachirps.

These authors give the impulse response of the gammachirp:

$$g_c(t) = a\,t^{\,n-1}\exp\bigl(-2\pi b\,\mathrm{ERB}(f_r)\,t\bigr)\cos\bigl(2\pi f_r t + c\ln t + \phi\bigr) \qquad (t>0) \qquad (1)$$

where a, b, c, n, f_r and φ are the parameters of the model (Irino & Patterson, 1997) and
ERB is the function defined by Glasberg & Moore (1990), recalled in the first chapter
(with f_r in Hz):

$$\mathrm{ERB}(f_r) = 24.7 + 0.108\,f_r \qquad (2)$$

I developed this model in order to quantify the spectral splatter induced by the rise and
fall times (the ramps) of the harmonic complex tones used in study 5 (between 2.5 ms and
40 ms).

Let us go through the successive computation stages of the model.

1-First of all, note that this model computes the RMS energy of the incoming signal in
32 auditory bands, numbered 2 to 33 according to the formula of Glasberg & Moore (1990):

$$N_{\mathrm{ERB}} = 21.4\,\log_{10}\bigl(4.37\,F_{\mathrm{kHz}} + 1\bigr) \qquad (3)$$

The centre frequencies (F_kHz) of these bands are therefore distributed between 55 Hz and
7743 Hz, and their width is computed with the ERB function (eq. 2).

Thus, knowing the incoming signal and its overall level in dB SPL, the energy per band of
this signal is extracted by a simple Fourier transform (FFT) followed by an inverse Fourier
transform (IFFT).

2-The energy per band is then corrected to take into account the variation of auditory
thresholds with frequency. This correction uses the experimental data of Robinson &
Dadson (1956), which are reported in Glasberg & Moore (1990) in the MAF ("Minimum
Audible Field") table. A piecewise-linear interpolation of this curve on a logarithmic
frequency axis gives the correction to apply at each frequency. This interpolation and the
experimental points of Robinson & Dadson (1956) are shown in the figure below.

Fig A1: The circles show the auditory thresholds (MAF) measured by Robinson & Dadson
(1956). The line shows the continuous function used to approximate the threshold at an
arbitrary frequency.

3-Once the energy per band is known, equation 1 can be used to compute the 32 impulse
responses corresponding to the 32 centre frequencies Fc. These impulse responses are
level-dependent, which is why the energy per band computed in step 1 and corrected in
step 2 had to be known beforehand.

In passing, for each impulse response (i.e. for each Fc), a normalisation coefficient is
computed so that a pure tone at Fc loses no energy when passing through the auditory
filter that is also centred on Fc.

4-Convolving the incoming signal with the 32 impulse responses, each scaled by its own
coefficient, then yields 32 excitation patterns at the output of the 32 auditory filters.
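
The four steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used for the thesis: the gammachirp parameter values (`a`, `b`, `c`, `n`, `phi`) are placeholders, the level dependence of the impulse responses is left out, the band energies of step 1 are not computed, and the MAF data points of step 2 must be supplied by the caller. The normalisation is read here as "unit gain for a steady pure tone at Fc", which is one possible interpretation. All function names are hypothetical.

```python
import math

def erb(f_hz):
    """Equivalent rectangular bandwidth in Hz (eq. 2)."""
    return 24.7 + 0.108 * f_hz

def centre_frequency(erb_number):
    """Step 1: invert eq. 3 to get the centre frequency (Hz) of a band."""
    return 1000.0 * (10.0 ** (erb_number / 21.4) - 1.0) / 4.37

def interp_log_frequency(points, f_hz):
    """Step 2: piecewise-linear interpolation of (Hz, dB) points on a log-frequency axis."""
    pts = sorted(points)
    if f_hz <= pts[0][0]:
        return pts[0][1]
    for (f0, y0), (f1, y1) in zip(pts, pts[1:]):
        if f_hz <= f1:
            w = math.log10(f_hz / f0) / math.log10(f1 / f0)
            return y0 + w * (y1 - y0)
    return pts[-1][1]

def gammachirp(fr, fs, duration, a=1.0, b=1.019, c=-1.0, n=4, phi=0.0):
    """Step 3: sampled impulse response of eq. 1 (placeholder parameters, fixed level)."""
    h = [0.0]  # eq. 1 is defined for t > 0 only
    for k in range(1, int(round(duration * fs))):
        t = k / fs
        envelope = a * t ** (n - 1) * math.exp(-2.0 * math.pi * b * erb(fr) * t)
        h.append(envelope * math.cos(2.0 * math.pi * fr * t + c * math.log(t) + phi))
    return h

def gain_at(h, f_hz, fs):
    """Magnitude of the frequency response of h at f_hz."""
    re = sum(v * math.cos(2.0 * math.pi * f_hz * k / fs) for k, v in enumerate(h))
    im = sum(v * math.sin(2.0 * math.pi * f_hz * k / fs) for k, v in enumerate(h))
    return math.hypot(re, im)

def normalised(h, fc, fs):
    """Step 3: scale h so that a steady pure tone at fc passes with unit gain."""
    g = gain_at(h, fc, fs)
    return [v / g for v in h]

def convolve(x, h):
    """Step 4: direct convolution, truncated to the length of x."""
    return [sum(h[j] * x[i - j] for j in range(min(i + 1, len(h))))
            for i in range(len(x))]

def rms(x):
    """Root-mean-square value of a sequence."""
    return math.sqrt(sum(v * v for v in x) / len(x))

fs = 16000.0
centres = [centre_frequency(k) for k in range(2, 34)]   # 32 bands, numbered 2 to 33
fc = centres[14]                                        # band number 16, about 1 kHz
h = normalised(gammachirp(fc, fs, 0.03), fc, fs)
tone = [math.sin(2.0 * math.pi * fc * i / fs) for i in range(1600)]
out = convolve(tone, h)
# Lowest and highest centre frequencies, then input and output RMS in the steady state.
print(round(centres[0]), round(centres[-1]), rms(tone[800:]), rms(out[800:]))
```

The direct convolution is slow but dependency-free; an FFT-based convolution would be the usual choice for signals of realistic length.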

A1-2-Application of the model.

We applied this model to 12 of the complex signals used in study 5. The parameters were as
follows:

Fundamental frequency (F0): 62 Hz or 352 Hz.

Bandpass filtering in the LOW (125-625 Hz), MID (1375-1875 Hz) or HIGH (3900-5400 Hz)
region.

Overall level: between 55 dB SPL (some signals were calibrated at 55 dB SPL) and 54.41 dB
SPL (the energy of the others was derived numerically).

Signal duration: the portion of each signal taken into account is equal to its rise time
(2.5 ms or 40 ms). The total duration of the signal is 200 ms.

The model therefore compares the response of the filters (the RMS energy per filter)
during the rise time (2.5 or 40 ms) across regions and F0s. The energy of each excitation
pattern in each of the conditions above is plotted in the figure below. The energy response
of the 32 filters is normalised (maximum equal to 1) for each signal, in order to take into
account the energy differences due to the different durations and thus to allow comparison
across signals.

Fig A2: RMS energy per band in response to complex tones with fundamental frequencies of
62 Hz (top) or 352 Hz (bottom). The tones are filtered into three distinct regions: LOW
(left), MID (middle) and HIGH (right). In each panel, the solid line is the (normalised)
response to a tone with a 2.5-ms rise time, the dotted line the response to a tone with a
40-ms rise time, and the dashed line the difference between the two.
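
The normalisation and the difference curve described above can be written compactly. The sketch below is an assumption-laden illustration: the function names are hypothetical, the sign convention (fast onset minus slow onset) is not stated in the text, and the example numbers are made up. Summing the difference curve over the bands gives one number per signal, which appears to be the kind of quantity reported in Table A1.

```python
def normalise(band_energy):
    """Scale a list of per-band RMS energies so that its maximum equals 1."""
    peak = max(band_energy)
    return [e / peak for e in band_energy]

def difference_curve(fast_onset, slow_onset):
    """Band-by-band difference between the two normalised responses (fast minus slow)."""
    fast, slow = normalise(fast_onset), normalise(slow_onset)
    return [f - s for f, s in zip(fast, slow)]

def integrated_difference(fast_onset, slow_onset):
    """Sum of the difference curve over all bands."""
    return sum(difference_curve(fast_onset, slow_onset))

# Made-up energies for four bands, only to show the call; these are not thesis data.
fast = [0.2, 1.0, 0.6, 0.3]
slow = [0.0, 2.0, 0.4, 0.0]
print(difference_curve(fast, slow), integrated_difference(fast, slow))
```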

A1-3-Discussion of the model.

This model has the advantage of using, in the time domain, the impulse responses that
simulate the auditory filters. It thus gives a realistic picture of the splatter (the
over-activation of many auditory filters) caused by the abrupt onset (2.5 ms) of a sound.

One problem nevertheless appears unavoidable. Computing the impulse responses (that is,
computing the auditory filters) requires prior knowledge of the energy present in each
filter. But computing that energy itself requires prior knowledge of the filters. We are
therefore caught in a vicious circle with no entirely satisfactory solution.

To work around this problem, we chose here to compute the energy in rectangular critical
bands (ERB) and then to equate this energy with that contained in the corresponding
gammachirp. This procedure introduces some approximation, which leaves the model
incomplete and explains why it was not included in article 5.
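
The workaround described above, measuring the energy in a rectangular band one ERB wide, can be sketched as follows. This is an illustration only: it uses a plain DFT over the bins of interest rather than an FFT, to stay dependency-free, and the function names and test signal are hypothetical.

```python
import cmath
import math

def erb(f_hz):
    """Equivalent rectangular bandwidth in Hz (eq. 2)."""
    return 24.7 + 0.108 * f_hz

def band_rms(x, fs, fc):
    """RMS of the part of x lying in a rectangular band of width ERB(fc) centred on fc."""
    n = len(x)
    lo, hi = fc - erb(fc) / 2.0, fc + erb(fc) / 2.0
    power = 0.0
    for k in range(1, (n + 1) // 2):              # positive-frequency bins, DC and Nyquist excluded
        if lo <= k * fs / n <= hi:
            xk = sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(x))
            power += 2.0 * abs(xk) ** 2 / n ** 2  # Parseval, counting the mirror bin as well
    return math.sqrt(power)

fs = 8000.0
tone = [math.sin(2.0 * math.pi * 1000.0 * i / fs) for i in range(400)]
# RMS inside the band centred on the tone, then inside a band one octave higher.
print(band_rms(tone, fs, 1000.0), band_rms(tone, fs, 2000.0))
```

Because the band edges are sharp, energy just outside the band is ignored entirely, whereas a gammachirp has sloping skirts; this is the approximation the text refers to.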

A1-4-Results and contribution of the model to the discussion of study 5.

Despite the reservation above, the curves in figure A2 are worth examining closely.

Region    Mean integrated difference    Standard deviation
LOW       1.48                          (0.19)
MID       1.36                          (1.16)
HIGH      0.14                          (0.65)

Table A1: Sum over the 32 bands of the spectral cues in the LOW, MID and HIGH regions. The
standard deviation is given in the third column.

First, overall, tones with fast rise times tend to excite more auditory filters. The
difference induced by onset times of 2.5 ms and 40 ms (the spectral cues) is largest in the
LOW region (Table A1). It decreases very slightly in the MID region and is almost absent in
the HIGH region (Table A1). This agrees with the points discussed in study 5.

Let us assume that the physical splatter of a sound is independent of its frequency
(figure A3). The spectrum of a tone of frequency F with a slow rise time approaches a Dirac
at F (δ_F). In contrast, the spectrum of a tone of frequency F with a very short rise time
is broadened (presence of splatter) and can therefore be represented schematically by a
band of width L centred on F (L depending mainly on the rise time).

[Figure A3: two rows of panels, F = 100 Hz (top) and F = 10 kHz (bottom). Left: Amplitude (arb) against Time (s). Right: Amplitude (arb) against Frequency (Hz), with the main-lobe width marked Δ.]

Fig A3: Time-domain (left) and spectral (right) representations of two 20-ms pure tones
with frequencies of 100 Hz (top) and 10 kHz (bottom). The spectral representation was
obtained by Fourier transform (FFT). The width Δ of the main lobe of the spectrum is
roughly the same for both signals, whatever their frequency (100 Hz or 10 kHz).
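
The observation in Fig A3 matches what a simple gating argument predicts. As a rough sketch, assume that the tone is switched on and off abruptly (a rectangular gate of duration T, which is an assumption about the stimulus and not a detail given here) and neglect the mirror-image component at negative frequency, which is only approximately valid for the 100-Hz tone. Near f = F the magnitude spectrum is then

$$|X(f)| \approx \frac{T}{2}\,\left|\frac{\sin\bigl(\pi (f-F)\,T\bigr)}{\pi (f-F)\,T}\right|, \qquad \Delta = \frac{2}{T}$$

The first zeros lie at f = F ± 1/T, so the main-lobe width Δ depends only on the duration and not on F. For T = 20 ms this gives Δ = 100 Hz at both 100 Hz and 10 kHz. Smoother ramps would widen the main lobe and lower the side lobes, but the width would still be set by the gating and not by the tone frequency.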

If the filters are broad (HIGH region), going from δ_F to L does not constitute a major
change, since δ_F already excited several filters. If, on the other hand, the filters are
narrow (LOW region), going from δ_F, which excited a single filter, to L, which excites
several, is quite noticeable. This argument is sketched in figure A4.

[Figure A4: columns LOW and HIGH; rows for rise times of 40 ms and 2.5 ms. Each panel shows the auditory filters and the spectrum of the signal.]

Fig A4: This figure shows schematically 4 possible configurations:

1-Top left, a low-frequency pure tone (LOW region) with a 40-ms rise time excites a single
auditory filter.

2-Top right, the same tone at high frequency (HIGH region) excites 3 filters.

3-Bottom (rise time 2.5 ms), whatever the region stimulated (LOW or HIGH), the
low-frequency and the high-frequency tone both excite three auditory filters.

In the LOW region, shortening the rise time from 40 ms to 2.5 ms therefore stimulates 2
additional filters. This is not the case in the HIGH region.
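
The argument of Fig A4 can be made roughly quantitative with equation 3, by counting how many ERBs a band of width L covers in each region. The sketch below is only illustrative: the value of L is an assumption (about the reciprocal of the 2.5-ms rise time), the region centres are simply the midpoints of the passbands given in A1-2, and the function names are hypothetical.

```python
import math

def erb_number(f_hz):
    """ERB number of a frequency given in Hz (eq. 3, which takes kHz)."""
    return 21.4 * math.log10(4.37 * f_hz / 1000.0 + 1.0)

def erbs_spanned(centre_hz, width_hz):
    """Number of ERBs covered by a band of the given width centred on centre_hz."""
    low = max(centre_hz - width_hz / 2.0, 0.0)
    return erb_number(centre_hz + width_hz / 2.0) - erb_number(low)

splatter_width_hz = 400.0   # assumed value of L, roughly 1 / 2.5 ms
for region, centre_hz in [("LOW", 375.0), ("MID", 1625.0), ("HIGH", 4650.0)]:
    print(region, round(erbs_spanned(centre_hz, splatter_width_hz), 2))
```

With any fixed L, the count falls from LOW to HIGH because the filters widen with frequency, which is the trend the figure illustrates.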

It is therefore most likely this phenomenon that figure A2 reveals. Note, however, that
this effect does not appear to be correlated with the resolvability of the signals but
rather with the filtering region (i.e. with the width of the auditory filters stimulated).
Indeed, the 62-MID signal (F0=62 Hz, MID region) provides more spectral cues than the
62-HIGH signal (F0=62 Hz, HIGH region), although both are unresolved.

THE PITCH OF HARMONIC COMPLEX TONES: STUDY OF ENCODING MECHANISMS AND CONNECTION WITH AUDITORY SCENE ANALYSIS.

Summary:

In the first, introductory part of the thesis, the principal results and models from the literature on virtual pitch encoding are presented, together with the main rules of primitive auditory scene analysis. The connection between pitch analysis and auditory scene analysis is underlined. The last part of the introduction deals with auditory learning, which has been used as a method for revealing similarities between neural processes.

Five studies follow this introduction. Using a transfer-of-learning paradigm, the first and second studies argue for the existence of two different pitch encoding processes, depending on the resolvability of the harmonics. The selective transfer of learning between pure-tone discrimination and the discrimination of resolved harmonic complex tones suggests that the pitch of resolved harmonics could be encoded by a spectral or a spectro-temporal process.

The last three studies investigate auditory scene organisation based on pitch proximity. The first shows that although streaming can occur in the absence of spectral cues, the degree of resolvability of the harmonics has a significant influence. The second gives a first explanation of the streaming difficulties experienced by elderly hearing-impaired individuals: their reduced peripheral frequency selectivity prevents them from using spectral cues in the same way as young and healthy subjects. The last study further investigates the influence of temporal transitions on pitch analysis mechanisms and auditory stream segregation. Overall, its results confirm and extend those of previous studies showing that fundamental frequency discrimination limens can be impaired by temporally adjacent complexes. The results are consistent with the hypothesis that abrupt transitions between successive tones, which generate spectral splatter, help to reset the mechanism responsible for pitch analysis and promote segregation. In conclusion, a general discussion brings the five experiments together. A simulation of peripheral auditory processing is described in the appendix.

Key-words: Psychoacoustics, pitch, auditory scene analysis, streaming, frequency selectivity, hearing impairment.


PERCEPTION DE LA HAUTEUR DES SONS COMPLEXES HARMONIQUES: ETUDE DES MECANISMES SOUS-JACENTS ET RELATION AVEC L'ANALYSE DE SCENES AUDITIVES.

Abstract (translated from French):

In a first, introductory part, I presented, non-exhaustively, the main hypotheses and the main results from the literature on the mechanisms that encode the pitch sensation evoked by a harmonic complex tone. In the same part, I briefly set out the main rules and mechanisms of auditory grouping that allow us to organise into distinct sound sources the sound mixture that reaches the ear at every moment. I then highlighted the interconnection between these two major areas of psychoacoustics. At the end of the introduction, a brief account of auditory perceptual learning clarifies an essential point of method that was used in two of the studies presented in this document.

Five studies are included in this manuscript. The first two, using a transfer-of-learning paradigm, provided arguments in favour of the hypothesis that two different neural mechanisms can be brought into play to encode a common sensation of pitch. One of these mechanisms appears to share processes with the one used for the perception of pure-tone pitch, since a partial transfer of learning occurs between the pure-tone discrimination task and the harmonic complex tone discrimination task when the harmonics are resolved by the peripheral auditory system. The second mechanism could, for its part, use temporal envelope fluctuations to extract pitch. However, this second hypothesis was only very partially confirmed by the results.

A literature review showed, in the introduction, that pitch is a powerful tool for auditory scene analysis; the following three studies explore grouping by pitch proximity in detail. We showed that although spectral cues are not indispensable to the grouping mechanisms, they nevertheless make these mechanisms easier to engage. Another study showed that hearing-impaired listeners have, for this reason, specific difficulties in analysing auditory scenes. Finally, the last study measures the influence of rise and fall times in a pitch discrimination experiment between complex tones preceded by a temporal fringe. Abrupt transitions, by broadening the spectrum of the stimuli, would probably make it possible to reset the pitch encoding mechanisms and thus improve discrimination performance by promoting the segregation of the complex tones.

To conclude this work, a general discussion of the results summarises and links the various experimental studies. Finally, a model of peripheral auditory perception is presented in the appendix.

Discipline: Acoustics

Key-words: Psychoacoustics, pitch, scene analysis, auditory grouping, frequency selectivity, hearing-impaired listeners, perceptual mechanisms, discrimination, auditory streams.

Laboratory: UMR CNRS 5020, "Neurosciences et Système Sensoriels", Pavillon U, Hôpital Edouard Herriot, 3 Place d'Arsonval, 69003 LYON, Cedex 03
