Musicological and Technological Exploration of Truths and Myths in Carnatic Music, the Raagam in Particular

Thesis submitted in partial fulfillment of the requirements for the degree of Masters in Computer Science

by Koduri Gopala Krishna, 200502005
[email protected]

Cognitive Science Lab
International Institute of Information Technology
Hyderabad - 500 032, INDIA

December 2010
Acknowledgments

An individual is moulded by the experiences in life. Few of us are lucky enough to have valuable experiences in a conducive environment to grow in. I'm deeply indebted to almost every person I have encountered in the last few years of my stay at IIIT-H and outside. I'll go in chronological order, for it will help me better recollect most of them.
My journey at IIIT-H commenced with meeting two similarly natured guys of coastal Andhra descent, Bharat Ram (Ambati) from Vijayawada and Vijay Bharat (Yaram) from Guntur. Without the endless fun episodes of Yaram and the mission-critical tutorials of Ambati, I would not have found the place that interesting and hospitable. In the course of my stay, I discovered a wide variety of creatures in the jungle; I could fill pages with their names.
Mesmerized by the lectures of Prof. Jawahar in the first year, the three of us stayed back on campus in the summer holidays of 2006, with the sole aim of securing our future at the prestigious Center for Visual Information and Technology, under his guidance. Obviously, in the beginning I did not have any objective of my own; I pursued what was lucrative in popular opinion. Prof. Jawahar was, is and will be one of the most adored professors of IIIT-H. I spent two years in that lab, where I interacted with several people. I never saw Rasagna frowning or complaining, however the day went. He shared his experiences with a very open heart. He is also the person with whom I can relate most in my own case. Pramod Nair was always there whenever I needed some guidance. I would call him and say, "Anna, chinna salaha kavali.." (Bro, I need some advice), and then it would go on. I partnered with Ravindra, who thinks very analytically, in several of my course projects. He was a very good partner. He resembled Buddha, as he never got emotional, to whatever extent I freaked out.
Above all, it is Prof. Jawahar to whom I'm indebted. Though I did not take up any serious computer vision work while I was in the lab, I was involved in two projects - one related to the work on font encodings, and the other on document image retrieval. The work on font encodings later helped me serve my first love - Open Source and Ethnocomputing. Today, I still devote a significant part of my time towards it. There are several valuable experiences I have gained along the way - leading a small team for the development of an Indic Firefox plugin (Padma), working as an intern at a company, identifying the core problems with Ethnocomputing in India and so on. A very, very special thanks to him.
In late 2008 and early 2009, I got interested in cognitive science. I conveyed the same to Prof. Jawahar, who said that I should pursue whatever my interest is. It is then that I met Prof. Bipin, who warmly welcomed me into his lab. At first, I did not have a special interest in any particular topic in cognitive science. Without the freedom Prof. Bipin usually leaves his students with, it would have been a really difficult situation for me. I kept jumping from topic to topic; I was intrigued by the cognition of language, the role of images in comprehension, narrative structures etc. But finally, I zeroed in on music cognition and music information retrieval. By late 2009, I started working seriously on it. The topic I have chosen is to explore the musicological literature of India, see what has been done technologically, and address an interesting issue. The interdisciplinary nature of the topic made it difficult to move ahead at the normal pace a typical masters student would. I'm immensely indebted to Prof. Bipin for his patience and especially the opportunities he has provided me to learn from.
Prof. Bipin was very kind in leaving me the freedom to take decisions that would help me. The internship opportunity he provided with Prof. Christophe of IRIT - ENSEEIHT, France during late 2009 has deeply impacted my thoughts. It helped me discover Prof. Christophe's radically different views of human perception, of music in particular. I'm fascinated by the non-statistical approach to addressing problems in information retrieval. Prof. Christophe is a very kind and friendly person. Without him, my stay in France, my first stay outside India, would have been a nightmare.
I'm very grateful to Prof. Preeti Rao, DAP lab, IIT-B for guiding me in building the raaga recognition system. The three months in summer 2010 that I spent in her lab have been very fruitful in getting several insights into audio processing. A special thanks to Sankalp Gulati and other DAP lab members for making my stay at IIT-B a peaceful and interesting one. I'm also grateful to Dr. Suvarnalatha Rao for replying patiently to my queries on Indian classical music. I thank Prof. Navjyoti, Pranav Kumar Vasishta, Kavita Vemuri, Sai Gollapudi and Violin Vasudevan for providing me valuable contacts and resources on Indian classical music. I thank Anupama, Abhilash, Ambati, Divya and Siva for reviewing the drafts written for conferences. I'm also greatly thankful to Prof. Xavier and Joan, of the Music Technology Group at UPF in Barcelona, for reviewing parts of this thesis and providing critical and wonderful feedback.
I feel very lucky to be in the company of my friends at IIIT-H and outside. The experiences we shared are very influential. There are commendable outcomes from the discussions over societal issues - http://team-samvedana.org and http://techsetu.com. There is also a drastic change in my nature and the way I socialize. My gratitude is inexpressible.
With a weak economic status and a rural background, the firm determination of my parents in providing me a decent schooling is the sole reason why I'm doing what I do today - that which I like and enjoy. No words can possibly express what I owe them.
Abstract
The classical music traditions of the Indian subcontinent, Hindustani and Carnatic, offer an excellent ground on which to test the limitations of the current music information research approaches. At the same time, their study can shed light on how to solve new and complex music modeling problems. Both traditions have very distinct characteristics, especially compared with western ones: they have developed their own instruments, musical forms, performance practices, social uses and contexts. In this thesis, we focus on the Carnatic music tradition of south India, especially on its melodic characteristics.
Raaga is the spine of Indian classical music. It is the single most crucial element of the melodic framework on which the music of the subcontinent thrives. Naturally, automatic raaga recognition is an important step in computational musicology as far as Indian music is concerned. It has several applications, like indexing Indian music, automatic note transcription, comparing, classifying and recommending tunes, and teaching, to mention a few. Simply put, it is the first logical step in the process of creating computational methods for Indian classical music. In this thesis, we investigate the properties of a raaga and the natural process by which people identify a raaga. We survey past raaga recognition techniques, correlating them with human techniques, in both the Hindustani and Carnatic music systems. We identify the main drawbacks and propose minor, but multiple, improvements to the state-of-the-art raaga recognition technique.
Music is said to evoke emotions. After the advent of advanced signal processing techniques and easily accessible computational resources, scientists and engineers have been trying to understand the nature of music in this very context. One of the several aspects of Indian music which interests us here is the traditional association of emotions with raagas. Besides ancient scriptures like the Natyasastra, recent articles by several scholars also associate raagas with emotions. A part of our work is dedicated to the investigation of the origin of this association. We discuss the term rasa, often mistaken as emotion. We also report the results of a survey conducted to study the aforementioned raaga-emotion association.
We also overview the other theoretical aspects that are relevant for music information research and discuss the scarce computational approaches developed so far. We put emphasis on the limitations of the current methodologies and present some open issues that have not yet been addressed and that we believe are important to work on.
3.1 Raaga paintings of Vasanta Ragini (left) and Hindola (right) raagas

3.2 Average of the ratings collected per rasa, across all users and tunes, for each raaga. X-axis denotes rasa index and Y-axis denotes the average value of the ratings.

3.3 (a) Average rating obtained per rasa for a tune in Nadanamakriya. (b) Standard deviation of the ratings for the same tune. X-axis denotes the rasa indices. Y-axis denotes the normalized values.

3.4 Ratings given by six users for a sample track in each raaga. X-axis denotes rasa indices. Y-axis denotes the rating quantifiers.

3.5 Histograms of ratings obtained per rasa, for the six raagas. X-axis has the four rating quantifiers - None at all, A Little, Somewhat and Very. Y-axis denotes the number of ratings obtained for each quantifier.

4.1 Screenshot from the melodic pitch extraction system of [37] showing the detected pitch superimposed on the signal spectrogram. The axis on the right indicates pitch value (Hz).
In the last decade, music information research has played a vital role in commercial music recommendation services in the western music industry. Examples include last.fm1, pandora2 etc. However, such services for Indian music do not exist to date. The few scarce web music services, like raaga.com3, work merely with textual metadata. Even the currently available web and desktop-based services for syncing or getting metadata from the web are very western-centric. The metadata of a typical Indian film song is much richer than what is allowed in Musicbrainz [52]. For instance, the involvement of various artists in creating an Indian film song cannot be completely accounted for using the schema used by, say, Musicbrainz. This is because the involvement of various kinds of artists differs between, say, western pop and Indian films, and it is difficult to reflect these differences using such a schema without compromise. And the majority of the content-based music recommendation algorithms found in open source media players [57] are not at all suitable for Indian music - classical, film or otherwise.
In an attempt to build a web-based service with metadata syncing and content-based music recommendation for Indian film music, we surveyed the related music information research and musicological literature. Methods proposed in several publications were based on western concepts which were not sufficient or relevant in the context of Indian film music. For instance, the concept of genre has no meaning as far as most film music is concerned. Classification of film songs requires a radically different formulation, which, to our knowledge, has not been attempted. However, we found a few publications interesting [24, 8]. They attempt to classify Indian classical music based on raagam and taalam [49]. We also encountered an interesting theoretical mood-based music classification in Indian classical music, which we found to be a scarcely researched topic.
In this thesis, we focus on Carnatic music and survey a few relevant computational models researched in the past. We propose a few enhancements to the state-of-the-art in raaga recognition. Further, we also investigate the raaga-emotion association with a behavioral study.
1.2 Why Carnatic music?
Among the myriad of world music traditions, Indian classical music has a few unique properties, as we'll see in a detailed manner in the following chapter. There are two classical music traditions in India. Owing to the popularity of Pandit Ravi Shankar4 and The Beatles5 in the west, Hindustani, the north Indian classical music, is often mistaken as the classical music tradition throughout India. The four south Indian states which form a large part of India - Andhra Pradesh, Karnataka, Kerala and Tamilnadu - and parts of Maharashtra and Orissa have a distinct classical music tradition called Carnatic music.
Ever since India saw invaders, the evolution of the classical music tradition in India took two different paths. In the north, where it was greatly influenced by Sufi traditions, it is called Hindustani. In south India, where it was less influenced, it is called Carnatic music. The Carnatic tradition has adapted something from outside only if it proved to uphold the innate characteristics of the tradition. The violin stands as a living testimonial to this fact. The ability of the instrument to imitate the human voice is very crucial for its use in Carnatic music, which is full of gamakas, the curvy movements between notes. Though there are several commonalities between Carnatic and Hindustani, the differences are notable and very significant. In chapter 3, we outline a few such differences between the two traditions.
Therefore, to be specific about what we are working with, we have chosen Carnatic music6. Moreover, the two classical music traditions of India have extensive musicological literature, and a few existing computational attempts that can help us in our investigation. Film music, however, has neither the extensive literature nor any existing computational models. Since none of us are musicians by profession, we chose classical music, for we can seek guidance from the available literature during our investigation.
1.3 Goals of our work
To our knowledge, this is the first thesis on the computational and theoretical aspects of Carnatic music, presenting a thorough overview of the current state-of-the-art, and discussing several open issues that are computationally relevant in this realm. Though there are several musicological works in the past, there has not been much discussion with an emphasis on building computational models, barring a few exceptions
4 http://ravishankar.org
5 http://en.wikipedia.org/wiki/The_Beatles
6 It is also a natural choice since we are natives of the state of Andhra Pradesh.
[51] [53]7. As is the case with any area of research which is mostly untouched, it has been tough to choose a narrow topic to work on. We chose these two broad aspects of Carnatic music to investigate in depth.
• Carnatic raaga recognition.
• The raaga-rasa relationship.
1.4 Contributions from the thesis
This thesis is intended to open doors to a new type of music for the scientific community to work on, rather than to propose solutions with a significant leap over the state-of-the-art. But we do report our investigations in the two broad aspects which we have just listed. The following are the outcomes of the thesis:
1. A critical analysis of the raaga-rasa relationship and a survey to test the hypothetical association between them. Almost every scholarly article available on Indian music treats the term rasa as though it is identical to the term emotion. In the course of our work, we used the term in the same sense. But our investigation has yielded an insight which is very different from the current understanding of the term. In this work, we report the results of a survey conducted to analyse the relationship between raaga and emotion, and discuss the term rasa.
2. A survey of the state-of-the-art raaga recognition techniques, identifying the drawbacks and future directions. We present a general overview of the plausible approaches to raaga recognition, and discuss various systems with respect to their contributions and drawbacks.
3. A raaga recognition system. To know a raaga, it is often said, there is no other way except to listen and feel it. This well-defined, yet abstract entity drew our attention to building a model to identify the raaga of a given musical piece. In this work, we discuss the previous work, propose a few enhancements to a Hindustani raaga recognition model to suit the requirements of Carnatic music, and discuss the results.
Apart from these major contributions, the thesis also includes the following minor contributions.
1. Discussed various concepts of Carnatic music, like rasa, 22 srutis and microtonal intervals, in the light of the knowledge shared by recent investigations of the scientific community. We also present several other open problems that are computationally relevant.
2. Built a ground-truth dataset of 10 raagas with 170 tunes. This dataset is drawn from real stage concerts and audio CDs. As far as we know, it is also by far the most diverse Carnatic raaga dataset reported.
7 This thesis work, which I discovered only recently, was done almost in parallel at IIT-Madras on other aspects of Carnatic music.
1.5 Organization of the content
Chapter 2: This chapter introduces the melodic, rhythmic and structural aspects of Carnatic music, coupled with critical reviews of the past computational work. We focus on the drawbacks and propose a few enhancements. Further, we discuss the advantages of computationally modeling various musical aspects of Carnatic music.
Chapter 3: We discuss the term rasa from a historical perspective, and present our criticism of its usage in today's Indian classical music context. A thorough analysis of a survey on raaga and emotion is also presented, and its implications for mood-based music recommendation systems are discussed.
Chapter 4: In this chapter, we identify a few drawbacks of the current raaga recognition systems and present our method, which attempts to overcome them. We also present a new Carnatic raaga ground-truth dataset which can help researchers in their future efforts.
Chapter 5: We conclude the thesis presenting a few open problems to the community, and also the possible future directions of this work.
Chapter 2
Computational approaches to Indian classical music
Though all music traditions share a few characteristics, each one can be recognized by some very particular features that need to be identified and preserved. The information technologies used for music processing have typically targeted western music traditions, and current research is emphasizing this bias even more. However, to develop technologies that can deal with the richness of our world's music, we need to study and exploit the unique aspects of other musical cultures. By looking at the problems emerging from various musical cultures we will not only help those specific cultures but will also open up our computational methodologies, making them much more versatile. In turn, we will help preserve the diversity of our world's culture.
The classical music traditions of the Indian subcontinent, Hindustani and Carnatic, offer an excellent ground on which to test the limitations of the current music information research approaches. At the same time, their study can shed light on how to solve new and complex music modeling problems. Both traditions have very distinct characteristics, especially compared with western ones: they have developed their own instruments, musical forms, performance practices, social uses and contexts. Like we said, in this thesis we focus on the Carnatic music tradition of south India, especially on its melodic characteristics.
The computational study of Carnatic music offers a number of problems that require new research approaches. Its instruments emphasize sonic characteristics that are quite distinct and not well understood. The concepts of Raaga and Taala are completely different from the western concepts used to describe melody and rhythm. Its music scores serve a different purpose than those of western music. The tight musical and sonic coupling between the singing voice, the other melodic instruments and the percussion accompaniment within a piece requires going beyond the modular approaches commonly used in music information research (MIR). The tight communication established in concerts between performers and audience offers great opportunities to study issues of social cognition. The music's devotional aim is fundamental to understanding it. The study of the lyrics of the songs is also essential to understand the rhythmic, melodic and timbral aspects of Carnatic music.
This chapter focuses on the melodic (Sec 2.1) and rhythmic (Sec 2.2) aspects of Carnatic music, overviewing the theoretical aspects that are relevant for MIR and discussing the scarce computational approaches that have been presented. We put emphasis on the limitations of the current methodologies and present some open issues that have not yet been addressed and that we believe are important to work on.
2.1 Computational approaches to melody
In Carnatic music, the melody is carried mainly by the vocalist. The voice always plays the central role; however, sometimes instruments like the violin or veena take its place, usually imitating its manner of articulation. The most fundamental melodic concept in Indian classical music is raaga. Matanga is the first known person to define what a raaga is [45]: "In the opinion of the wise, that particularity of notes and melodic movements, or that distinction of melodic sound by which one is delighted, is raaga". Therefore, the raaga is neither a tune nor a scale [32]. It is a set of rules which can together be called a melodic framework. The notion that a raaga is not just a sequence of notes is important in understanding it, and for developing a computational representation. A raaga evolves over time, i.e. no raaga was understood the way it is today. A given raaga can nonetheless be described by a set of properties: a set of notes (swaras), their progressions (arohana/avarohana), the way they are intoned using various movements (gamakas), characteristic phrases, and the relative position, strength and duration of notes (types of swaras). In order to identify raagas computationally, swara intonation, scale, note progressions and characteristic phrases are used (Sec 2.1.2 and 2.1.3). Other unexploited properties of a raaga include gamakas and the various roles the swaras play (Sec 2.1.4).
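The set of descriptive properties listed above lends itself to a simple computational representation. The following is an illustrative sketch only; the field names are our own assumptions, not a standard schema, though the arohana/avarohana of Hamsadwani shown is its commonly cited pentatonic scale:

```python
from dataclasses import dataclass, field

@dataclass
class Raaga:
    """A raaga described by the properties listed above (illustrative)."""
    name: str
    arohana: list                                  # ascending swara progression
    avarohana: list                                # descending swara progression
    gamakas: dict = field(default_factory=dict)    # swara -> typical ornament
    phrases: list = field(default_factory=list)    # characteristic swara phrases

# Hamsadwani, a pentatonic raaga; Sa' denotes the upper-octave tonic
hamsadwani = Raaga(
    name="Hamsadwani",
    arohana=["Sa", "Ri2", "Ga3", "Pa", "Ni3", "Sa'"],
    avarohana=["Sa'", "Ni3", "Pa", "Ga3", "Ri2", "Sa"],
    phrases=[["Ga3", "Pa", "Ni3", "Sa'"]],
)
print(len(hamsadwani.arohana))  # 6
```

Such a structure captures the static rules, but not the evolving, performative aspects of a raaga discussed above.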
2.1.1 How do people identify a raaga
Though there are no rules of thumb for identifying a raaga, there are usually two procedures by which people get to know the raaga from a composition. Which one applies normally depends on whether the person is a trained musician or a rasika, a non-trained but knowledgeable listener. People who do not have much knowledge of raagas cannot identify them unless they memorize the compositions and their raagas.
2.1.1.1 Non-trained person or the rasika’s way
In a nutshell, the procedure followed by a rasika typically involves correlating two tunes based on how similar they sound. Years of listening to tunes composed in various raagas gives a listener enough exposure. A new tune is juxtaposed with the known ones and is classified depending on how similar it sounds to a previous tune. This similarity can arise from a number of factors - the rules for transitions between notes imposed by the arohana and avarohana, characteristic phrases, the usage patterns of a few notes, and gamakas.
This method depends a lot on the cognitive abilities of a person. Without enough previous exposure, it is not feasible for a person to attempt identifying a raaga. There is a noteworthy observation in this method: though people cannot express in a concrete manner what a raaga is, they are still able to identify it. This very fact hints at a possible classifier that can be trained with enough data for each raaga.
2.1.1.2 The trained musician’s way
A musician tries to find the characteristic phrases of the raaga. These are called pakads in Hindustani music and swara sancharas in Carnatic music. If the musician finds these phrases in the tune being played, the raaga is immediately identified. But at times these phrases might not be found, or are too vague. In this case, the musicians play the tune on an instrument (imaginary or otherwise) and identify the swaras being used. They observe the gamakas used on these swaras, the locations of various notes within the music phrases and the transitions between swaras. They use these clues to arrive at a raaga.
This method seems to use almost all the characteristics a raaga has, and it looks more programmatic in its structure and implementation. If the current music technology can derive low-level features from which such clues can be identified, the same procedure could be implemented computationally with almost perfect results. These two methods, corresponding to the trained musician and the non-trained listener, are both important to understand for implementing a raaga recognition system, or for modeling the raaga in a broad sense.
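The first step of the musician's procedure, matching characteristic phrases, can be sketched directly, assuming the tune has already been transcribed into a swara sequence. The phrase book below is hypothetical data, and the contiguous-run match is deliberately naive (real phrases are obscured by gamakas and repetitions):

```python
def contains_phrase(swaras, phrase):
    """True if the characteristic phrase occurs as a contiguous run
    in the transcribed swara sequence (a deliberately naive match)."""
    n, m = len(swaras), len(phrase)
    return any(swaras[i:i + m] == phrase for i in range(n - m + 1))

def identify_by_phrases(swaras, phrase_book):
    """phrase_book: raaga name -> list of characteristic phrases
    (hypothetical data). Returns the first raaga whose phrase occurs."""
    for raaga, phrases in phrase_book.items():
        if any(contains_phrase(swaras, p) for p in phrases):
            return raaga
    return None

book = {"Hamsadwani": [["Ga", "Pa", "Ni", "Sa'"]]}
tune = ["Sa", "Ri", "Ga", "Pa", "Ni", "Sa'", "Ni", "Pa"]
print(identify_by_phrases(tune, book))  # Hamsadwani
```

When no phrase matches, the fallback described above (examining gamakas, note positions and transitions) would take over.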
2.1.2 Swaras and Srutis
In Indian music, swaras are the seven notes of the scale, denoted by Sa, Ri, Ga, Ma, Pa, Da and Ni1 [43]. Except for the tonic and the fifth, all the other swaras have two variations each, which accounts for 12 notes in an octave, called swarasthanas. There are three kinds of scales that one generally encounters in Carnatic and Hindustani music theory: a 12-note scale, a 16-note scale and the scale which claims 22 srutis2. The 16-note scale is the same as the 12-note scale except that 4 of the 12 notes have two names each, in order to be backward compatible with an older nomenclature. See Table 2.1. The tuning itself, whether it is just intonation or equal temperament, is an issue of debate3 [22]. Since Indian classical music is an orally transmitted tradition, perception plays a vital role. For instance, tuning seldom involves an external tool. And even the tambura, which is used as a drone, has a very unstable frequency. Hence the analysis of empirical data coupled with perceptual studies is important.
A few musicians and scholars claim that there are more srutis in practice than those explained above. Though many of them argue the total number to be 22, even that is debated [18]. A more important question to ask is whether they are used in current practice at all. Some musicologists say that they are no longer used [35]. It is also said that they are wrongly attributed to Bharata, who used sruti to mean "the interval between two notes such that the difference between them is perceptible". Krishnaswamy [23] argues that the microtonal intervals observed in Carnatic music are perceptual phenomena caused by the gamakas, i.e. that these microtonal intervals are what a few scholars and musicians claim as the 22 srutis. However, we believe that these claims need to be verified with perceptual and behavioral studies. In general, more empirical, quantitative and large-scale evidence on the tuning of Carnatic music needs to be gathered. From our encounters with most musicians, we can only conclude that they are unaware of the usage of 22 srutis in practice. The few musicians who claim they are used are not ready to demonstrate them in a raaga. Table 2.2 shows the 22 sruti values derived by Sambamurthy [41].

1 This notation is analogous to e.g. Do, Re, Mi, Fa, So, La and Ti.
2 Sruti is the least perceptible interval as defined in the Natyasastra [36].
3 http://cnx.org/content/m12459/1.11

Table 2.1: The scales used in Indian classical music

Swaram                          | Notation | Western | Sthanam | Ratio
Sadjamam                        | Sa       | C       | 1       | 1
Suddha Rishabam (Komal)         | Ri1      | C#      | 2       | 16/15
Chathusruthi Rishabam (Tivra)   | Ri2      | D       | 3       | 9/8
Shatsruthi Rishabam             | Ri3      | D#/Eb   | 4       | 6/5
Suddha Gandharam                | Ga1      | D       | 3       | 9/8
Sadharana Gandharam (Komal)     | Ga2      | D#/Eb   | 4       | 6/5
Anthara Gandharam (Tivra)       | Ga3      | E       | 5       | 5/4
Suddha Madhyamam (Komal)        | Ma1      | F       | 6       | 4/3
Prati Madhyamam (Tivra)         | Ma2      | F#/Gb   | 7       | 64/45
Panchamam                       | Pa       | G       | 8       | 3/2
Suddha Dhaivatham (Komal)       | Da1      | G#/Ab   | 9       | 8/5
Chathusruthi Dhaivatham (Tivra) | Da2      | A       | 10      | 5/3
Shatsruthi Dhaivatham           | Da3      | A#/Bb   | 11      | 16/9
Suddha Nishadam                 | Ni1      | A       | 10      | 5/3
Kaisiki Nishadam (Komal)        | Ni2      | A#/Bb   | 11      | 16/9
Kakali Nishadam (Tivra)         | Ni3      | B       | 12      | 15/8
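The interval ratios in Tables 2.1 and 2.2 can be compared against equal temperament by converting them to cents, 1200 * log2(ratio). A small sketch of that conversion (the swara labels follow Table 2.1):

```python
import math

def ratio_to_cents(ratio):
    """Convert a just-intonation frequency ratio to cents."""
    return 1200 * math.log2(ratio)

# A few ratios from Table 2.1 and their nearest equal-tempered values
for name, ratio in [("Sa", 1.0), ("Ri2", 9 / 8), ("Ga3", 5 / 4), ("Pa", 3 / 2)]:
    cents = ratio_to_cents(ratio)
    nearest_et = round(cents / 100) * 100
    print(f"{name}: {cents:.1f} cents (nearest ET step: {nearest_et})")
```

For example, the just fifth 3/2 comes to about 702 cents, only 2 cents above the equal-tempered fifth, while 5/4 (about 386 cents) deviates from its equal-tempered neighbour by roughly 14 cents, which is why the just-versus-tempered debate matters for measured pitch data.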
It is a well-accepted notion that a note (swarasthana) is a region rather than a point [13, 43]. Thus, a fixed tuning for each note is not as important as it is in, say, western classical music. In addition, Sa can be any frequency; it depends on the comfort of the singer or the choice of the instrument player. A given note is intoned in different ways for each raaga. Even if two raagas have the same scale, the intonations of the notes vary significantly. Belle et al. [4] have used this clue to differentiate raagas that share the same scale. They evaluated their system on 10 audio excerpts accounting for 2 distinct scale groups (two raagas each). They showed that the use of swara intonation features improved the accuracies achieved with pitch-class distributions [8]. This clearly indicates that intonation differences are significant for understanding and modeling raagas computationally. Levy [29] analyses the intonation in Hindustani raaga performances and notes that it is highly variable, and that it does not seem to agree with any standard tuning system. Subramanian [51] reports much the same for Carnatic music. These studies call
Table 2.2: The values of 22 srutis derived by [41]

Name of Sruti        | Notation | Ratio   | Interval | Freq (Hz) | Interval (cents) | Equi-temp ratio
Shadja               | sa       | 1       |          | 240       | 0                |
Ekasruti Rishabha    | ra, r1   | 256/243 | 1.0534   | 252.8     | 90               | 1.05946
Dvisruti Rishabha    | ri, r2   | 16/15   | 1.0125   | 256       | 112              |
Trisruti Rishabha    | ru, r3   | 10/9    | 1.0416   | 266.6     | 182              |
Chatussruti Rishabha | re, r4   | 9/8     | 1.0125   | 270       | 204              | 1.1224
Suddha Gandhara or Komal Sadharana Gandhara |
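The claim that a swarasthana is a region rather than a point, and the fine sruti values above, can both be examined empirically by folding a pitch track into one octave at a resolution much finer than a semitone. The following is a rough sketch under our own assumptions (pitch already extracted in cents above the tonic, simulated data), not the method of any study cited here:

```python
import numpy as np

def fine_pitch_histogram(pitches_cents, resolution=10):
    """Fold pitch values (cents above the tonic) into one octave at a fine
    resolution, so each swara shows up as a region rather than one bin."""
    folded = np.asarray(pitches_cents) % 1200
    bins = 1200 // resolution
    hist, _ = np.histogram(folded, bins=bins, range=(0, 1200), density=True)
    return hist

# Simulated pitch track: a note intoned as a broad region around 386 cents
track = np.random.default_rng(0).normal(386, 25, size=2000)
hist = fine_pitch_histogram(track)
print(hist.argmax() * 10)  # peak bin near 386 cents
```

On real recordings, the width and skew of each such region, rather than a single peak position, would be the intonation feature of interest.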
We have chosen six raagas based on their popularity with the help of a music trainer. They are
Ananda Bhairavi, Atana, Hamsadwani, Kedaragowla, Kalyani and Nadanamakriya. Each of these raa-
gas, by their properties, is believed to evoke a peculiar emotion as shown in Table 3.3. In each of
these raagas, five tunes played on violin, of approximately 1 minute duration are selected. These are
excerpts from kruti renditions. In a similar survey with Hindustani raagas, Chordia [9] used excerpts
from alapana section, which, in our view, has few drawbacks. In alapana, the artist improvises within
the constraints of the raaga. Most listeners in our unreported pilot survey conducted with six subjects
with excerpts of alapana from two tracks each of a raaga, have only reported whether they enjoyed it or
not. They said they did not feel any emotion. This observation is congruent with remarks of Samba-
murthy [41], who says the so called art music leaves listeners in an ecstasy called sangitananda, which
is bliss and does not necessarily evoke any particular emotion. Listeners appreciate this very process
which is highly dependent on the artist’s skill, but one might not essentially feel any emotion. However,
choosing the stimuli from the alapana section does spare us the effects of other variables such as accompaniment and tempo. Tempo is an important aspect which can affect perception so much that a raaga typically used for melancholic tunes can, at a faster tempo, be made to sound ferocious. For this survey, for each raaga, we selected tunes that have more or less a common tempo (note that this concerns tempo, not taalam). The tempo varied only slightly across the tunes of the different raagas. The problems that might arise due to differences in accompaniment are also taken care of, since the only accompaniment in the selected tunes, the mrudangam2, where present, is mild.
2Mrudangam is a barrel-shaped percussion instrument with the two ends of the barrel covered with skin. It is a tuned instrument.
Figure 3.2: Average of the ratings collected per rasa, across all users and tunes, for each raaga. X-axis denotes rasa index and Y-axis denotes the average value of the ratings.
To reduce participant fatigue, we divided the 30 selected tunes into two sets of 15 tunes each. Each set had at least two tunes from each raaga. We set up a web portal where each participant receives one set and records subjective responses. Each cluster of words from Table 1 can be rated as None at all, A Little, Somewhat or Very, based on how well that cluster expresses the participant's emotions. For example, if listeners feel the tune is very romantic, they select Very for the cluster Srungara. Participants can rate multiple clusters for the same tune. They were also asked to describe in words how they felt after listening to the tune.
3.2.6.2 Participants
A total of 750 responses were recorded from 48 people with a median age of 22. The majority of them are undergraduate or graduate students; 88% are male and 12% female. They described their familiarity with the Indian classical music tradition, either Hindustani or Carnatic, as None (35%) or Moderate (65%).
3.2.6.3 Results and observations
Figure 3.2 shows the normalized averages of subjective responses recorded by participants for all the tracks in each raaga. We have quantized the verbal responses — None at all, A Little, Somewhat and Very as 0, 1, 2 and 3 — to arrive at this plot. One can immediately notice the similarity between the plots of Ananda Bhairavi and Kalyani, and between those of Atana and Hamsadwani. The plots of Kedaragowla and Nadanamakriya are unique in that they do not show much similarity with those of the other raagas. However, Ananda Bhairavi differs from Kalyani when the relative heights of the peaks within each plot are considered. To cross-check whether the peaks obtained for the rasas in each raaga correspond to consistent user ratings, the standard deviation of the ratings given for each rasa across all tracks of each raaga has been calculated.
Figure 3.3: (a) Average rating obtained per rasa for a tune in Nadanamakriya. (b) Standard deviation of the ratings for the same tune. X-axis denotes the rasa indices. Y-axis denotes the normalized values.
Figure 3.4: Ratings given by six users for a sample track in each raaga. X-axis denotes rasa indices. Y-axis denotes the rating quantifiers.
However, we realized that common measures like the standard deviation cannot be relied upon for this analysis, since the actual subjective responses were obtained as verbal terms which were only later quantified numerically for analysis. The standard deviation of these values does not reflect the underlying agreement. Consider an example: after listening to a particular tune, if some users rated a rasa Somewhat and others rated it Very, the deviation in the values for that rasa grows large compared to the case where all users gave the same rating, even though the listeners largely agree that the rasa is strongly present. Figure 3.3 shows one such example of the average rating and standard deviation over all rasas for a tune in raaga Nadanamakriya. Hence, the standard deviation of this ad-hoc numerical quantification is ruled out as a measure of consistency in the ratings.
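The pitfall above can be seen in a small numeric sketch (an illustration in Python with hypothetical ratings; the verbal-to-number mapping mirrors the quantization used for the plot):

```python
from collections import Counter
import statistics

# Illustrative mapping mirroring the survey's quantization of verbal ratings.
SCALE = {"None at all": 0, "A Little": 1, "Somewhat": 2, "Very": 3}

def summarize(ratings):
    """Return (mean, population stdev, histogram) for a list of verbal ratings."""
    values = [SCALE[r] for r in ratings]
    return statistics.mean(values), statistics.pstdev(values), Counter(ratings)

# Case 1: users split between the two highest ratings -- they agree the rasa
# is strongly felt, yet the numeric deviation is inflated.
mean1, sd1, hist1 = summarize(["Somewhat", "Very", "Somewhat", "Very"])

# Case 2: unanimous "Very" -- zero deviation.
mean2, sd2, hist2 = summarize(["Very"] * 4)

print(sd1, sd2)  # 0.5 vs 0.0, though both cases reflect strong agreement
```

Both cases indicate strong presence of the rasa, yet the first yields a deviation of 0.5 purely because of the arbitrary numeric mapping, which is why we fall back on histograms of the raw verbal categories instead.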
With that in mind, we resorted to a naive but straightforward method to check whether the rasa peaks observed for each raaga justify attributing that rasa to the raaga. Figure 3.5 shows the histograms of the values obtained for each rasa across all raagas. For instance, consider the raaga Nadanamakriya: a consistently large number of low ratings have been recorded against rasa clusters other than the third cluster, sympathy. Observing those histograms and correlating them with the mean plot of Nadanamakriya shown in Figure 3.2, we conclude that Nadanamakriya primarily induces sympathy. Figure 3.4 shows the responses of several users for one track in each raaga. Considering Nadanamakriya again, a consistent trend is observed in such plots for this raaga, which reaffirms that it primarily induces Karuna rasa.
This is not the case with all the raagas, though. For instance, the ratings for raaga Kedaragowla have not been very consistent across listeners, as can be observed from the plots for the other raagas in Figure 3.4. Kalyani is an interesting raaga with many jiva-swaras, the most stressed notes. The results reaffirm this, showing that it arouses an array of emotions depending on the frequencies stressed in the composition. Though not as evident as for Nadanamakriya, the other raagas more or less show a convergence in ratings, and the ratings are consistent across users. However, from Figure 3.5 and Figure 3.4, it can be observed that the emotional responses for any given raaga do not favour any single rasa. Almost every participant rated multiple clusters and described in words feeling emotions belonging to multiple rasas while listening to a tune, as is evident in the plots. What interests us is the converging pattern in those ratings for a given raaga. The observed rasas of a few raagas are not consistent with the rasas traditionally attributed to them, but there is certainly an overlap. Though this observation keeps us from making a final statement on the raaga-rasa association, from the converging pattern between the ratings for a raaga and their variance across raagas, we can say that the raaga certainly encapsulates the melodic patterns responsible for eliciting specific emotions.
3.2.6.4 Implications for music recommendation systems
We have investigated the possibility of building a novel emotion-based recommendation system specific to Indian culture. The results of this survey have several direct and indirect implications which can increase the effectiveness of content-based music recommendation systems in general. The fact that the raaga holds the properties of a tune responsible for the perception of melody and the evocation of emotion can be a key to building recommendation systems that complement and contrast with Western approaches.
3.2.7 Conclusions
We have reported a behavioural study that empirically tested the hypothesis that each raaga in Carnatic music evokes peculiar emotions characteristic of that raaga. Still, a few questions remain to be answered. The participants in the study were Indians, so the influence of raaga-based music on listeners from other cultures is yet to be seen; for now, we can only speculate based on an analysis reported by Balkwill et al. [3]. In an attempt to avoid bias, we refrained from deliberately hand-picking raagas and tunes. The tunes selected were therefore not very distinct from each other, which resulted in a dataset with less polarity in the emotional content of the tunes, as can be seen from Figure 3.2.
With added data of new raagas and tunes that bring more polarity to the dataset, a study must be conducted with participants from other ethnic groups around the world to observe whether cultural factors play a dominant role in the emotional response to Carnatic music. As we have observed great similarity between the response plots of a few raagas, it will be interesting to see whether common note-transition patterns occur in melodies constructed using those raagas. Further, these patterns can be analysed to see whether they contribute dominantly to evoking peculiar emotions. It will also be interesting to see the extent to which incorporating the results of this study and our future work into an existing statistical music recommendation program improves mood-based recommendation, and to identify the features responsible for the perception of various emotions. Later, it can be verified whether the same features hold in the perception of other genres of music.
The most important aspect of taking this work further is to carefully incorporate the knowledge of the previous section on rasa. Without it, this study can only be considered one that investigates the relationship between raaga and emotion. Even if the term rasa is properly interpreted, it is almost impossible to use user responses to measure it empirically. So, as we said in the beginning, the definition of rasa does not make it a good emotion model in the case of music.
Figure 3.5: Histograms of ratings obtained per rasa, for the six raagas. X-axis has the four rating quantifiers - None at all, A Little, Somewhat and Very. Y-axis denotes the number of ratings obtained for each quantifier.
Chapter 4
Raaga Recognition
Raaga is the spine of Indian classical music. It is the single most crucial element of the melodic framework on which the music of the subcontinent thrives. Naturally, automatic raaga recognition is an important step in computational musicology as far as Indian music is concerned. It has several applications, such as indexing Indian music, automatic note transcription, comparing, classifying and recommending tunes, and teaching, to mention a few. Simply put, it is the first logical step in creating computational methods for Indian classical music. In this chapter, we identify the main drawbacks of previous raaga recognition techniques and propose multiple minor improvements to the state-of-the-art raaga recognition technique. We then discuss the results obtained with our raaga recognition system incorporating those improvements.
4.1 Introduction
Geekie [15] briefly summarizes the importance of raaga recognition for Indian music and its applications in music information retrieval in general. Raaga recognition is primarily approached as determining the scale used in composing a tune. However, the raaga contains more information, which is lost if it is treated with Western methods such as scale identification. This information plays a central role in the perception of Indian classical music.
Though our work primarily concerns Carnatic music, most of the discussion applies to Hindustani music as well, unless mentioned otherwise. In chapter 2, we surveyed the computational approaches to melody. In the following section, we identify a few problems that we address with our raaga recognition system.
4.2 Problems that need to be addressed
4.2.1 Gamakas and pitch extraction for Carnatic music
An appropriate pitch extraction module is one that can accurately represent the gamakas. This has not been a severe problem for classification systems that did not depend on the gamakas of a note. With such a pitch extraction system in place, gamakas can be used as an additional feature to improve the accuracy of existing systems. Gamakas assume a major role when the number of raaga classes in the dataset is high.
4.2.2 Skipping tonic detection
The manually performed identification of the tonic (the base frequency of the instrument/singer) needs to be eliminated if possible. Since tonic identification itself involves some amount of error, it can adversely impact the performance of a raaga recognition system. Neither the Carnatic nor the Hindustani system adheres to an absolute tonic frequency, so it makes sense to build a system that can ignore the absolute location of the tonic.
4.2.3 Resolution of pitch-classes
Though 12 bins for pitch-class profiles look ideal to the Western eye, we hypothesize that a more continuous model can capture more information relevant to Indian classical music. Dividing an octave into n bins, where n > 12, can help us model the distribution at a finer resolution. Gamakas (the microtonal variations) play a vital role in the perception of Indian music, as several accomplished artists have confirmed. The transitions involved in a gamaka and the notes through which its trajectory passes are two factors that need to be captured. We hypothesize that this information can be obtained, at least partially, using a higher number of bins for the first-order pitch distribution.
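The finer binning can be sketched as follows (a minimal Python illustration with a hypothetical cents-valued pitch contour; the bin count n is a free parameter here, not a value fixed by our system):

```python
import numpy as np

def pitch_class_profile(cents, n_bins=36):
    """Fold a pitch contour (in cents relative to some reference) into one
    octave and histogram it into n_bins equal pitch classes.
    Returns a distribution normalized to sum to 1."""
    folded = np.mod(cents, 1200.0)                    # one octave = 1200 cents
    bins = np.floor(folded / (1200.0 / n_bins)).astype(int)
    profile = np.bincount(bins, minlength=n_bins).astype(float)
    return profile / profile.sum()

# Hypothetical contour: a note held near 0 cents with slight oscillation;
# -15 and 1195 both fold into the topmost bin of the octave.
contour = np.array([0.0, 10.0, -15.0, 5.0, 1195.0])
p = pitch_class_profile(contour, n_bins=36)
```

With n_bins = 12 this reduces to the conventional semitone profile; larger n retains sub-semitone detail of the kind gamakas produce.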
4.2.4 A comprehensive dataset
The datasets previously used for testing have several problems. In Tansen, and in the work by Sridhar and Geeta, the datasets had as few as 2 or 3 raagas. The dataset used by Chordia has all the data played on a single instrument by a single artist. The test datasets were constrained to some extent by the requirement of monophonic audio (an unaccompanied melodic instrument) for reliable pitch detection.
In the present work, we investigate raaga recognition performance on a more comprehensive dataset with more raaga classes, each with a significant number of tunes across different artists and compositions. This should give us better insight into the raaga identification problem.
With these issues about raaga recognition in mind, we have implemented a system which addresses some of the challenges described. The following sections introduce our method and present a detailed analysis and discussion of the results.
4.3 Our method
As mentioned earlier, we propose to address some of the issues described in the previous section.
We have taken a diverse set of tunes to include in the dataset. The use of amply available recorded
music necessitates a pitch detection method that can robustly track the melody line in the presence of
polyphony. The obtained sequence of pitch values converted to cents scale (100 cents = 1 semitone)
constitutes the pitch contour. The pitch contour may be used as such to obtain a pitch-class distribution.
On the other hand, given the heavy presence of ornamentation in Indian music, it may help to use identi-
fied stable note segments before computing the pitch-class distribution. We investigate both approaches.
Finally, a similarity measure, that is insensitive to the location of the tonic note, is used to determine the
best matched raaga to a given tune based on available labeled data. Each of the aforementioned steps is
detailed next.
4.3.1 Pitch extraction
Pitch detection is carried out at 10 ms intervals throughout the sampled audio file using a predom-
inant pitch detection algorithm designed to be robust to pitched accompaniment [37]. The pitch de-
tector tracks the predominant melodic voice in polyphonic audio accurately enough to preserve fast
pitch modulations. This is achieved by the combination of harmonic pattern matching with dynamic
programming based smoothing. Analysis parameter settings suitable to the pitch range and type of
polyphony are available via a graphical user interface thus facilitating highly accurate pitch tracking
with minimal manual intervention across a wide variety of audio material. Figure 4.1 shows the output
pitch track superimposed on the signal spectrogram for a short segment of Carnatic vocal music where
the instrumental accompaniment comprised violin and mridangam (percussion instrument with tonal
characteristics). While the violin usually follows the melodic line, it plays held notes in this particular
segment. Low amounts of reverberation were audible as well. We observe that the detected pitch track
faithfully captures the vocal melody unperturbed by interference from the accompanying instruments.
Figure 4.1: Screenshot from the melodic pitch extraction system of [37] showing the detected pitch superimposed on the signal spectrogram. The axis on the right indicates pitch value (Hz).
4.3.2 Finding the tuning offset
The pitch values obtained at 10 ms intervals are converted to the cents scale assuming an equi-tempered tuning scale anchored at 220 Hz. All pitch values are folded into a single octave. The maximum of a finely-binned histogram of the deviations of the cents values from the notes of the equi-tempered 12-note grid gives the underlying tuning offset of the audio with respect to 220 Hz. This offset is applied to the pitch values to normalize the continuous pitch contour to standard 220 Hz tuning by a simple vertical shift, without any quantization to the note grid at this point.
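The offset estimation can be sketched as follows (a simplified Python illustration under the stated 220 Hz reference; the 1-cent bin width and the synthetic pitch track are illustrative choices, not values from our implementation):

```python
import numpy as np

def tuning_offset(freqs_hz, ref_hz=220.0, bin_cents=1):
    """Estimate the tuning offset (in cents) of a pitch track relative to an
    equi-tempered grid anchored at ref_hz: histogram the deviation of each
    pitch value from its nearest semitone and return the modal deviation."""
    cents = 1200.0 * np.log2(np.asarray(freqs_hz) / ref_hz)
    # Deviation from the nearest note of the 12-note grid, in (-50, 50] cents.
    dev = np.mod(cents + 50.0, 100.0) - 50.0
    hist, edges = np.histogram(dev, bins=np.arange(-50, 51, bin_cents))
    k = np.argmax(hist)
    return 0.5 * (edges[k] + edges[k + 1])   # center of the modal bin

# Hypothetical track tuned ~20 cents sharp of 220 Hz equal temperament.
track = 220.0 * 2 ** ((np.array([0, 200, 400, 700, 0, 200]) + 20) / 1200.0)
offset = tuning_offset(track)
```

Subtracting the estimated offset from the cents contour then realigns the track to the 220 Hz grid without quantizing individual pitch values.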
4.3.3 Note segmentation
As we observe in Figure 4.1, the pitch contour is continuous and marked by glides and oscillations
connecting more stable pitch regions. The stable note regions too are marked by low pitch modulations.
As described in Sec. 2, melodic ornamentation in Indian classical music is very diverse and elaborate.
For our investigation of pitch class profiles confined to stable notes, we need to detect relatively stable
note regions within the continuously varying pitch contour. The local slope of the pitch contour can be
used to differentiate stable note regions from connecting glides and ornamentation.
At each time instant, the pitch value is compared with its two neighbors (i.e., the values 10 ms before and after it) to find the local slope in each direction. If either local slope lies below a threshold of 15 semitones per second, the current instant is considered to belong to a stable note region. This condition is summarized by Eq. 4.1.
(|F(i-1) - F(i)| < θ)  ||  (|F(i+1) - F(i)| < θ)    (4.1)

where F(i) is the pitch value at time index i and θ is the slope threshold. To put the selected threshold in perspective, a large vibrato (spanning a one-semitone pitch range) at a 6 Hz pitch modulation frequency has a maximum slope of about 15 semitones per second. All instants where the slope does not meet this constraint are considered to belong to the ornamentation.
Finally, the pitch values in the segmented stable note regions are quantized to the nearest note value on the 220 Hz equi-tempered scale. This step smooths out minor fluctuations within intended steady notes. Figure 4 shows a continuous pitch contour with the corresponding segmented and labeled note sequence superimposed. We note that several passing notes are detected, which on closer examination are found to last for durations of 30 ms or more.
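The condition of Eq. 4.1 can be sketched as follows (a simplified Python illustration; the contour values are hypothetical, and the threshold follows the 15 semitones per second figure above):

```python
import numpy as np

HOP_S = 0.01                  # pitch values every 10 ms
SLOPE_THRESH = 15.0           # semitones per second, the Eq. 4.1 threshold

def stable_note_mask(cents):
    """Mark time instants belonging to stable note regions: an instant is
    stable if the local slope toward either neighbor is below the threshold."""
    semis = np.asarray(cents) / 100.0          # cents -> semitones
    slope = np.abs(np.diff(semis)) / HOP_S     # slope between neighbors, st/s
    mask = np.zeros(len(semis), dtype=bool)
    # slope[i-1] is the slope to the left neighbor of instant i, slope[i]
    # the slope to its right neighbor; either one below threshold suffices.
    mask[1:] |= slope < SLOPE_THRESH
    mask[:-1] |= slope < SLOPE_THRESH
    return mask

# Hypothetical contour (cents): a held note, a fast glide, another held note.
contour = np.array([0, 2, 1, 3, 120, 350, 500, 498, 501, 500], dtype=float)
mask = stable_note_mask(contour)
```

In this sketch the two instants inside the glide are excluded while both held notes, including the boundary instants adjoining the glide, are retained.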
Table 4.2: Performance of weighted-k-NN classification with various pitch-class profiles
4.4.2 Classification experiment
A k-NN classification framework is adopted where several values of k are tried. In a leave-one-out
cross-validation experiment, each individual tune is considered a test tune in turn while all the remaining
constitute the training data. The k nearest neighbors of the test tune in terms of the selected distance
measure are considered to estimate the raaga label of the test tune. The distance measure used is the
symmetric KL distance presented in the previous section. Since there are in all a minimum of 9 tunes
per raaga, we consider values of k=1, 3, 5 and 7. Since the number of classes is high (10 raagas), it is
more appropriate to consider a weighted-distance k-NN classification rather than simple voting to find
the majority class. Weighted k-NN classification is described by the equations below. The chosen class
is C*,
C* = arg max_c Σ_i w_i δ(c, f_i(x))    (4.4)

where c is the class label (the raaga identity in our case), f_i(x) is the class label of the ith neighbor of x, and δ(c, f_i(x)) is the indicator function that equals 1 if f_i(x) = c and 0 otherwise. The weights are given by

w_i = 1 / d(x, y)    (4.5)

where d(x, y) is the symmetric KL distance between two pitch-class profiles x and y, y here being the pitch-class profile of the ith neighbor of x.
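Eqs. 4.4 and 4.5 can be sketched as follows (an illustrative Python sketch; the toy profiles and raaga labels are hypothetical, and the epsilon smoothing is our own addition to keep the KL distance finite for profiles with empty bins):

```python
import numpy as np

def sym_kl(p, q, eps=1e-12):
    """Symmetric KL distance between two normalized pitch-class profiles."""
    p = np.asarray(p) + eps
    q = np.asarray(q) + eps
    return float(np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))

def weighted_knn(test_profile, train_profiles, train_labels, k=3):
    """Weighted k-NN: each of the k nearest neighbors votes for its class
    with weight 1/distance; the highest-scoring class wins (Eqs. 4.4-4.5)."""
    d = np.array([sym_kl(test_profile, t) for t in train_profiles])
    nearest = np.argsort(d)[:k]
    scores = {}
    for i in nearest:
        scores[train_labels[i]] = scores.get(train_labels[i], 0.0) + 1.0 / d[i]
    return max(scores, key=scores.get)

# Toy 3-bin profiles standing in for pitch-class distributions of two raagas.
train = [np.array([0.7, 0.2, 0.1]), np.array([0.6, 0.3, 0.1]),
         np.array([0.1, 0.2, 0.7]), np.array([0.1, 0.3, 0.6])]
labels = ["raagaA", "raagaA", "raagaB", "raagaB"]
pred = weighted_knn(np.array([0.65, 0.25, 0.10]), train, labels, k=3)
```

The distance weighting matters when the k neighbors split across classes: a single far-away neighbor contributes much less to the score than two close ones.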
The results in terms of percentage accuracy in raaga identification, obtained on the test dataset,
appear in Table 4.2. Two important points emerge from the comparison of accuracies across the different
types of pitch-class profiles. For all values of k, except k=1, in the k-NN classification, we see that P2
(the note-segmented, duration-weighted pitch-class profile) yields the highest accuracies. This implies that note durations play an important role in determining the relative prominence of notes in a particular raaga realization. This is consistent with the fact that long sustained notes like dirgha swaras play a greater role in characterizing a raaga than other functional notes which occur briefly at the beginning, at the end or in transitions. The benefit of note segmentation is seen in the slightly superior performance of P2 over P3 (12 bin): P2 does not consider instants that lie outside detected stable note regions. The second important point emerging from Table 4.2 is the decreasing classification accuracy with increasing bin resolution. Although the reverse might be expected, given the widely held view that the specific intonation of notes within micro-intervals is a feature peculiar to a raaga, a more carefully designed, possibly unequal, division of the octave may be needed to observe this.
The overall best accuracy of 76.5%, much higher than chance for the 10-way classification task, indicates the effectiveness of the pitch-class profile as a feature vector for raaga identification. It is encouraging that a simple first-order pitch distribution provides considerable information about the underlying raaga, although complete validation of this aspect can be achieved only by testing with a much larger number of raaga classes on a larger dataset. Including the ornamentation regions in the pitch-class distribution did not help. As mentioned before, gamakas play an important role in characterizing the raaga, as evidenced by performance as well as listening practice. However, for gamakas to be effectively exploited in automatic identification, it is necessary to represent their temporal characteristics, such as the actual pitch variation with time. A first-order distribution, which discards all time-sequence information, is quite inadequate for the task.
4.5 Conclusions
A brief but comprehensive introduction to the raaga and its properties is presented. Previous raaga
recognition techniques are surveyed with a focus on their approach and contributions. Key aspects that
need to be addressed are outlined, and a method which deals with a few of them is discussed. Apart from these contributions, we have also highlighted details such as the composition of the testing dataset, and provided insights into the post-processing steps involved in the pitch extraction procedure for Carnatic music. To the best of our knowledge, this is the first work that uses polyphonic audio recordings in the raaga recognition task.
The transitions in gamakas are discarded, or at least not fully utilized, in the method described. A higher number of bins in the pitch distribution proved not to be particularly useful. Future raaga recognition techniques can take into account the other properties of a raaga, the most important of which are the characteristic phrases and gamakas; these suggest that temporal properties may be usefully exploited in future work. An automatic pitch-transcription system as accurate as the semi-automatic polyphonic pitch-extraction system used in our work is also necessary to scale the work to a large number of raagas.
Chapter 5
Conclusions
Very little research has been carried out on Indian music, and even less on the specific characteristics that make it so special. The few existing computational approaches to melody, discussed in chapter 2, have focused mainly on raaga recognition. Given the number of raagas commonly performed and their unique properties, the data used in the literature is not representative. Indeed, the high accuracies reported might be due to the limited number of raagas used and the overall size of the datasets. Moreover, important properties of the raagas, like their specific use of gamakas, have not been exploited, and issues beyond recognition have not been approached. As more representative datasets are gathered, the features used so far will not be sufficient to discriminate the raaga classes. Features such as pitch-class profiles and pitch-class dyad distributions capture only partial information about the raagas; the other roles of the notes are not evident and need to be exploited. Symbolic scores can also be used for building more complex models, especially to model the characteristic melodic movements of particular raagas. It should be noted that raaga recognition is only a starting point in modeling a raaga, and thus a lot remains to be done.
At the level of musical instruments, practically nothing has been done. Physical modeling of their many non-linear behaviours is quite complex, and the lack of instrument standardization does not help. Some research has been done on modeling the tabla and sitar [20], and there have been a few attempts at developing sound synthesis systems [50]. In order to obtain credible synthesized sounds, as well as to describe performance practice, the modeling of gamakas is a bottleneck.
The variability in performances of the same song is quite large, especially due to the importance
of improvisation. The same composition sung by two artists can be different in many musical and
expressive facets. These differences may challenge the version identification methods developed for
western commercial music. In addition to the compositional forms, there are many improvisatory forms that are performed with well-defined structural criteria [18]. Nothing has been done on these topics.
Throughout this thesis, we have mentioned a number of characteristics of Carnatic music that deserve to be studied. Given that this music tradition is so different from the ones used to develop the current methodologies, there is a need to also deal with some more fundamental issues. We need to study how the musical concepts and terms in Indian music are understood, specifying proper ontologies with which to frame our work. Also, the cultural and community aspects of the music are so important that without studying them we will not be able to develop proper musical models. In summary, to approach the computational modeling of Carnatic music while doing justice to its richness, it is fundamental to take a cultural approach, and thus to take into account musicological and contextual information.
To conclude, in this thesis, we have made the following contributions.
1. Strong theoretical arguments are presented to show that the term rasa cannot be used in the context of music, and with the help of a behavioural study, raagas in Carnatic music are shown to evoke feelings to a certain extent.
2. A survey of the state of the art in raaga recognition is presented, identifying the problems to be addressed.
3. Based on an existing raaga recognition system for Hindustani music, a system with several improvements is built for Carnatic music and tested on comprehensive real-world data.
4. Ground-truth data drawn from real stage concerts and CD recordings is contributed, making it the most diverse and extensive dataset for Carnatic raagas to date.
5. A brief discussion of a few standing debates, such as the 22 srutis, concluding with the steps necessary to resolve such debates.
6. A discussion of several open problems in Carnatic music to be explored computationally.
5.1 Impact of this work and the future directions
Possible applications of our work include music recommendation systems based on mood and raaga, learning aids that let students visualize feedback from their practice sessions, automatically digitizing and archiving the huge amount of music data with correct metadata, analysing various artistic styles, etc.
The data used to test raaga recognition systems so far is very small compared to the hundreds of raagas in Carnatic music. The first future step for this raaga recognition technique is to consider much larger and more diverse data of, say, 100 raagas. This step is not as obvious as it sounds. In Carnatic music, the same kruti is often sung by different artists on several occasions; a new kruti is not a common phenomenon, and the dataset becomes biased if we include two versions of the same kruti. To handle this, we need to grab those sections of a rendition which are not pre-composed. Alapana and Swarakalpana are two such sections, and one possible step would be to extract such sections programmatically from a huge pool of renditions. As the data grows, there will arise a need to exploit the so-far-unexploited properties of the raaga.
Cognitive and behavioural aspects of Carnatic music need to be studied in a systematic manner.
These studies will shed light on various questions, such as the perception of gamakas, the extent to which variations of a swara are normally allowed, and the raaga-emotion association.
5.2 A few guidelines for future students/researchers
This thesis has been a huge learning activity for us. It spanned a number of domains: musicology, cognition, music performance, signal processing, machine learning and pattern recognition. During the course of this work, we have had several experiences which helped us get an overview of scientific research in Indian classical music. We would like to draw the reader's attention, especially that of those working in this field, to a few points.
The first point concerns the flexibility of an art form. Music, being a highly celebrated art form, is a practically endless creative domain. It is very natural to observe deviations from the written rules. In particular, Indian classical music, being an oral tradition, depends heavily on what people perceive and transfer to the next generation. Students learn by listening to the guru sing and perform, and the only feedback is agreement with the guru. But this does not mean that the rules can be broken at will. Good examples of this are the gamakas and the swaras: artists take the liberty of changing them a little to sound good depending on the context. This is very different from the Western scenario, where music is played by reading the notation!
The second point we would like to stress concerns the scale used. Some artists say that it is the same as the Western equi-tempered scale; others disagree, saying it is just-intonated. Though we spent some time trying to obtain the correct answer, it appears the question is not that important, since a swarasthanam is not a fixed point anyway but a region. Moreover, tuning is often based on perceptual judgment rather than objective tuning instruments. For mathematical simplicity, we have used the equi-tempered scale in our raaga recognition system, though we do observe a slight perceptual difference between the two tuning systems.
The third point, something which bothered us throughout, is this: does one need to be a musician to do research related to Indian music? There is no definitive answer. It is always better to have hands-on experience with whatever one works on, but a musician can be as good or as bad as a non-musician at explaining the science behind it. Not being a musician should not keep one from approaching the domain as a researcher. In this case, a lot of listening and reading of the musicological literature helps, as it did for us.
Appendix A
Basic Acoustics
Consider any object. Everything we observe about it has an explanation: studying its physical aspects lets us know how it behaves in response to various actions. Sound is one such behaviour of objects in response to actions, natural or our own. Of course, not all sounds qualify as music, but the physical properties we study about an object are the same whether the sound it produces is music or noise.
Pluck a string and observe the wave pattern generated. It looks like the complex wave pattern in Figure A.1. Our vocal cords are no different, except that they are made of muscle.
Figure A.1: Wave pattern generated by a plucked string
A.1 Demonstration of various physical properties
Like any other object, sound has certain physical properties. Though we are intuitively very sensitive to them, we do not always realize it. To appreciate this fact better, do the following experiments and see how the sound changes compared to the default case.
1. Stretch the string, keeping the length between the two fixed points the same.
2. Vary the length, keeping the tension in the string the same, i.e., hold it at different points to vary the length without stretching or slackening it.
3. Pluck it with force.
4. Change the string. If it is rubber, now consider a brass/other-material string.
See if your observations concur with the following; they probably should.
1. The sound is sharper, like your younger sibling’s shrill cry.
2. The sound is flatter compared to the original one.
3. It is louder.
4. It is a different sound altogether, though its sharpness or flatness may or may not be the same as the original's.
Now let us look at the physical properties involved in these observations. Any sound is caused when a source vibrates and the vibrations are carried to our eardrum, which vibrates in sympathy. As the source vibrates, alternating high- and low-pressure regions are created next to it. The particles of the medium, typically air, are thus set in motion and in turn disturb their adjacent particles. In this way the whole region around the source is set vibrating. Observe the waves shown in Figure A.2.
Figure A.2: Condensation and rarefaction represented as a sine wave
A.2 Sine waves
The wave in Figure A.2 can be represented as a series of crests and troughs, as shown. In mathematical terminology, it is called a sine wave, and it has certain physical properties. It is periodic, i.e., it is nothing but a pattern repeated over and over. The number of such repetitions passing through a point in a second is called the frequency (f). Each such repetition is called a cycle. The time it takes for one cycle to pass through a point is called the time period (T). The distance between corresponding points on two adjacent cycles is called the wavelength (λ); points at the extreme tips of two adjacent cycles are an example of such corresponding points. The height of the wave is called the amplitude (A). See Figure A.2.
The frequency is the factor we perceive when we find one tone sharper or flatter than another: the higher the frequency, the sharper the sound. The time period and the wavelength are inversely proportional to the frequency. The amplitude is perceived as volume or loudness: the greater the amplitude, the louder the sound. See Figure A.3 for examples.
Figure A.3: Examples of sine waves with high and low frequencies
Normally, when we speak or sing, or when an instrument is played, these factors keep changing, and we perceive the result collectively as noise, speech or music.
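The relations described in this section can be sketched in code. The following Python snippet is our own illustration (the function name, sample rate and the assumed speed of sound are not from this thesis): frequency and amplitude parameterize a sampled sine wave, and T and λ are computed as the inverses described above.

```python
import math

def sine_wave(freq_hz, amp, duration_s, sample_rate=8000):
    """Samples of amp * sin(2*pi*f*t): frequency sets sharpness, amplitude loudness."""
    n = round(duration_s * sample_rate)
    return [amp * math.sin(2 * math.pi * freq_hz * t / sample_rate) for t in range(n)]

v = 343.0           # assumed speed of sound in air (m/s), for illustration only
f = 440.0           # frequency in Hz
T = 1 / f           # time period, inversely proportional to frequency
wavelength = v / f  # wavelength: lambda = v / f, also inversely proportional

samples = sine_wave(f, 0.5, 0.01)  # a 10 ms tone at moderate amplitude
```

Doubling `f` halves both `T` and `wavelength`, while `amp` scales the wave's height without changing its period.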
A.3 Harmonics
But one more question is left to be answered: why do two sounds produced by different materials sound different even when all these factors are made equal? Let us experiment again. Set a string vibrating and observe it. It may be difficult to see with the naked eye, but the vibration does not look like a single perfect sine wave; as said earlier, it is a complex wave pattern. When a string is plucked, there are various modes in which it can vibrate. See Figure A.4. These are called harmonics; the wave whose wavelength is λ1 is called the 1st harmonic or the fundamental.
Figure A.4: Possible harmonics in a given string
Now let us look at Figure A.1 again. When we pluck a string, it vibrates not exactly like a simple sine wave but in a more complex way. This complexity arises from the mixture of the other harmonics with the fundamental. Given that the two ends of the vibrating string are fixed, the length and other physical properties of the string have a crucial effect on the properties of the wave generated. A string of length L can only carry waves of wavelength 2L, 2L/2, 2L/3, 2L/4, etc. The frequencies corresponding to the fundamental and the second, third and fourth harmonics are given in Table A.1, where v is the velocity with which the wave travels.
Frequency = velocity / wavelength, i.e., f = v/λ    (A.1)
Table A.1: Fundamental and its harmonics
Fundamental or first harmonic      v/2L    f1
Second harmonic or first overtone  2v/2L   2f1
Third harmonic or second overtone  3v/2L   3f1
Fourth harmonic or third overtone  4v/2L   4f1
nth harmonic or (n-1)th overtone   nv/2L   nf1
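The pattern in Table A.1 is easy to compute. The Python sketch below is our own illustration; the wave speed and string length are hypothetical values chosen only to make the n·f1 relationship visible.

```python
def harmonic_frequencies(v, L, n_harmonics=4):
    """f_n = n * v / (2L): the nth harmonic is n times the fundamental f1."""
    f1 = v / (2 * L)  # fundamental of a string of length L, wave speed v
    return [n * f1 for n in range(1, n_harmonics + 1)]

# Hypothetical values: wave speed 220 m/s on a 0.5 m string gives f1 = 220 Hz.
freqs = harmonic_frequencies(v=220.0, L=0.5)
# freqs is [220.0, 440.0, 660.0, 880.0]
```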
A.4 Timbre
Because the force between molecules varies with the material, the nature of wave propagation, and in turn the harmonic pattern, is affected. Thus the nature of these harmonics varies from material to material. It is this property, called timbre, that distinguishes sound waves of different origins (e.g., rubber and steel). It is also one of the important reasons we are able to discriminate between two people speaking.
Let us now look at the frequency measures in use.
A.5 Frequency measures
Frequency is measured in hertz and in cents. Hertz is a linear-scale measure, whereas cents measure the interval between two frequency values on a logarithmic scale. The interval in cents is calculated using the following formula.
Value in Cents (C) = log(a/b) × k    (A.2)

Here a and b are the two frequencies, log is to base 10, and k is the constant 1200/log(2); equivalently, C = 1200 × log2(a/b).
For calculation purposes, when one has to deal with more than one octave, which usually is the case, it is advisable to use cents. Consider two octaves: one from 10 to 20 Hz, another from 1000 to 2000 Hz. The absolute differences between the fifth note and the tonic are 10 × 3/2 − 10 = 5 Hz and 1000 × 3/2 − 1000 = 500 Hz respectively. But measured in cents, the two intervals are the same.
D1 = (1200/log(2)) × log(15/10) = 701.95 cents

D2 = (1200/log(2)) × log(1500/1000) = 701.95 cents
The cents measure also enables us to find the number of octaves between two frequency values, since each octave spans an interval of 1200 cents. For example, the number of octaves between 12 Hz and 16400 Hz is

Number of octaves = ((1200/log(2)) × log(16400/12)) / 1200 = log(16400/12) / log(2) = 10.41 octaves.
This knowledge should be sufficient to understand the chapters in this thesis. We recommend that the reader refer to relevant books [30] for other concepts as and when required. We will now introduce the scales used in Carnatic music and then understand the raaga and its properties.
A.6 Tuning systems
A.6.1 Equal-temperament
This is the western standard, preferred mostly for its mathematical simplicity. In this scale, all 12 notes are spaced by equal frequency ratios: the ratio between the 2nd and 1st notes is the same as the ratio between the 3rd and 2nd. We can derive the value of this ratio. Let xi be the ith frequency value.
k = x2/x1 = x3/x2 = x4/x3 = ... = x12/x11 = (2 × x1)/x12

x2 = k × x1, x3 = k × x2, and so on up to x12 = k × x11, and 2 × x1 = k × x12.

So, x12 = k^11 × x1.

Hence k × x12 = k^12 × x1, and therefore 2 × x1 = k^12 × x1, which gives

k = 2^(1/12)

So, in the equi-tempered scale, each subsequent note is obtained by multiplying the current note by k, where k = 2^(1/12), the twelfth root of 2.
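The derivation of k can be checked numerically. In this minimal Python sketch the tonic frequency is an arbitrary assumption; twelve successive multiplications by k land exactly on the octave.

```python
# Each semitone step multiplies the frequency by k = 2**(1/12),
# so twelve steps span exactly one octave (a doubling of frequency).
k = 2 ** (1 / 12)

tonic = 240.0  # hypothetical tonic frequency in Hz (an assumption)
scale = [tonic * k ** i for i in range(13)]  # the 12 notes plus the octave
# scale[12] equals 2 * tonic, and every adjacent pair has the same ratio k.
```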
A.6.2 Just-intonation
On the other hand, just-intonation tuning system uses ratios of small integers as intervals between
two notes of the scale. There are a number of ways to tune using small integer ratios. But the key essence
of this tuning method is to use these ratios instead of geometric progression to find the intervals.
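For illustration, one can compare just intervals with their equal-tempered counterparts in cents. The specific 5-limit ratios below are one common textbook choice, assumed here only for the sketch and not taken from this thesis.

```python
import math

# One common 5-limit set of just-intonation ratios for the 12 notes.
# These particular values are an illustrative assumption, not the only choice.
just_ratios = [1, 16/15, 9/8, 6/5, 5/4, 4/3, 45/32, 3/2, 8/5, 5/3, 16/9, 15/8]

def to_cents(r):
    """Interval of ratio r above the tonic, in cents."""
    return 1200 * math.log2(r)

# Deviation of each just interval from equal temperament,
# where the ith equal-tempered note sits at exactly 100*i cents.
deviation = [to_cents(r) - 100 * i for i, r in enumerate(just_ratios)]
# e.g. the just fifth (3/2) is ~2 cents sharper than the equal-tempered fifth.
```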
Bibliography
[1] P. S. R. Apparao. Natyasastramu. Natyamala Publications, 1959.
[2] L. L. Balkwill and W. F. Thompson. A cross-cultural investigation of the perception of emotion in music:
Psychophysical and cultural cues. Music Perception, 17(1):43–64, 1999.
[3] L. L. Balkwill, W. F. Thompson, and R. Matsunaga. Recognition of emotion in Japanese, Western, and Hindustani music by Japanese listeners. Japanese Psychological Research, 46(4):337–349, 2004.
[4] S. Belle, R. Joshi, and P. Rao. Raga Identification by using Swara Intonation. Journal of ITC Sangeet
Research Academy, 23, 2009.
[5] V. N. Bhatkande. Hindusthani Sangeet Paddhati. Sangeet Karyalaya, 1934.
[6] P. Chordia. Segmentation and recognition of tabla strokes. In Proc. of ISMIR, pages 107–114, 2005.
[7] P. Chordia, A. Albin, A. Sastry, and T. Mallikarjuna. Multiple viewpoints modeling of tabla sequences. In Proc. of ISMIR, pages 381–386, 2010.
[8] P. Chordia and A. Rae. Raag recognition using pitch-class and pitch-class dyad distributions. In Proc. of
ISMIR, pages 431–436, 2007.
[9] P. Chordia and A. Rae. Understanding emotion in raag: An empirical study of listener responses. Computer
Music Modeling and Retrieval, pages 110–124, 2008.
[10] P. Chordia and A. Rae. Tabla Gyan: An Artificial Tabla Improviser. In International Conference on Computational Creativity, 2010.
[11] M. Clayton. Time in Indian Music: Rhythm, Metre and Form in North Indian Rag Performance. Oxford University Press, 2000.
[12] D. Das and M. Choudhury. Finite State Models for Generation of Hindustani Classical Music. In Proceedings of International Symposium on Frontiers of Research in Speech and Music, 2005.
[13] A. Datta, R. Sengupta, N. Dey, and D. Nag. Experimental Analysis of Shrutis from Performances in Hindustani Music. Scientific Research Department, ITC Sangeet Research Academy, 2006.
[14] P. Ekman. An argument for basic emotions. Cognition & Emotion, 6(3):169–200, 1992.
[15] G. Geekie. Carnatic ragas as music information retrieval entities. In Proc. of ISMIR, pages 257–258, 2002.
[16] O. Gillet and G. Richard. Automatic labelling of tabla signals. In Proc. of ISMIR, 2003.
[17] E. Hanslick, G. Cohen, and M. Weitz. The beautiful in music. Liberal Arts Press, New York, 1957.
[18] S. R. Janakiraman. Essentials of Musicology in South Indian Music. The Indian Music Publishing House,
2008.
[19] P. Juslin. Music and Emotion: Theory and Research. Oxford University Press, November 2001.
[20] A. Kapur, P. Davidson, P. Cook, W. Schloss, and P. Driessen. Preservation and extension of traditional techniques: Digitizing North Indian performance. Journal of New Music Research, 34(3):227–236, 2005.
[21] G. K. Koduri and B. Indurkhya. A Behavioral Study of Emotions in South Indian Classical Music and its
Implications in Music Recommendation Systems. In SAPMIA, ACM Multimedia, pages 55–60, 2010.
[22] A. Krishnaswamy. On the twelve basic intervals in South Indian classical music. October 2003.
[23] A. Krishnaswamy. Inflexions and Microtonality in South Indian Classical Music. In Frontiers of Research
on Speech and Music, 2004.
[24] A. Krishnaswamy. Melodic atoms for transcribing Carnatic music. In Proc. of ISMIR, pages 345–348, 2004.
[25] A. Krishnaswamy. Multi-Dimensional Musical Atoms in South Indian Classical Music. In Proc. of the
International Conference of Music Perception & Cognition, 2004.
[26] C. L. Krumhansl. Reasoning about naming systems. Canadian Journal of Experimental Psychology,
51(4):336–353, 1997.
[27] S. K. Langer. Philosophy in a new key: a study in the symbolism of reason, rite, and art. Mentor Books, New York, USA, 1959.
[28] K. Lee. Automatic chord recognition from audio using enhanced pitch class profile. In Proc. of the International Computer Music Conference, 2006.
[29] M. Levy and N. A. Jairazbhoy. Intonation in North Indian Music: A Select Comparison of Theories with
Contemporary Practice. Aditya Prakashan, New Delhi, 1982.
[30] G. Loy. Musimathics: the mathematical foundations of music. Vol. II. With a foreword by John Chowning.
Cambridge, MA: MIT Press, 2007.
[31] G. Pandey, C. Mishra, and P. Ipe. Tansen: A system for automatic raga identification. In Proc. of Indian
International Conference on Artificial Intelligence, pages 1350–1363, 2003.
[32] H. S. Powers. The Background of the South Indian Raaga-System. PhD thesis, Princeton University, 1959.
[33] Pratyush. Analysis and Classification of Ornaments in North Indian (Hindustani) Classical Music. Master's thesis, Universitat Pompeu Fabra, 2010.
[34] C. V. Raman. The Indian musical drums. Proceedings Mathematical Sciences, 1(3):179–188, 1934.
[35] N. Ramanathan. Shrutis according to ancient texts. Journal of the Indian Musicological Society, 12(3):31–37, 1981.
[36] A. Rangacharya. The Natyasastra. Munshiram Manoharlal Publishers, 2010.
[37] V. Rao and P. Rao. Vocal melody extraction in the presence of pitched accompaniment in polyphonic music.
Audio, Speech, and Language Processing, IEEE Transactions on, 18(8):2145–2154, 2010.
[38] E. Rosch. Principles of Categorization. John Wiley & Sons Inc, 1978.
[39] J. A. Russell. A circumplex model of affect. Journal of Personality and Social Psychology, 39:1161–1178,
1980.
[40] H. Sahasrabuddhe and R. Upadhye. On the computational model of raag music of india. In Workshop on AI
and Music: European Conference on AI, 1992.
[41] P. Sambamoorthy. South Indian Music. The Indian Music Publishing House, 1998.
[42] L. A. Schmidt and L. J. Trainor. Frontal brain electrical activity (EEG) distinguishes valence and intensity of