Analysis of Motifs in Carnatic Music: A Computational Perspective
A THESIS
submitted by
SHREY DUTTA
for the award of the degree
of
MASTER OF SCIENCE (by Research)
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
INDIAN INSTITUTE OF TECHNOLOGY, MADRAS.
October 2015
THESIS CERTIFICATE
This is to certify that the thesis entitled Analysis of Motifs in Carnatic Music:
A Computational Perspective, submitted by Shrey Dutta, to the Indian Institute
of Technology, Madras, for the award of the degree of Master of Science (by
Research), is a bona fide record of the research work carried out by him under my
supervision. The contents of this thesis, in full or in parts, have not been submitted
to any other Institute or University for the award of any degree or diploma.
Dr. Hema A. Murthy
Research Guide
Professor
Dept. of Computer Science and Engineering
IIT-Madras, 600 036
Place: Chennai
Date:
ACKNOWLEDGEMENTS
I joined IIT Madras with the intention of mastering the techniques used in machine
learning. There is so much data available in digital form and I used to think that
machine learning techniques help in making sense of this data just as the human
brain makes sense of the raw data received from different senses. As I started
gaining a deeper understanding of machine learning techniques, I realized that these
techniques are not mature enough to mimic the human brain and thus should not be
used blindly. I understood that the data needs to be represented in a sensible form
which depends on the task under consideration. These techniques are designed to
use this representation in achieving the desired task. After understanding this, I
was able to use the existing techniques efficiently as well as design new techniques
when required. This level of understanding was not possible without the immense
knowledge and experience shared by my adviser, Prof. Hema A. Murthy, through
endless captivating discussions.
I would like to express my sincere gratitude to her for the excellent guidance and
patience, and for providing me with an excellent atmosphere for doing research. She
helped me to develop my background in signal processing and machine learning
and to experience the practical issues beyond the textbooks. She has not only
helped in improving my perspective towards research but also towards life.
I would like to thank my collaborators Vignesh Ishwar, Krishnaraj Sekhar
and Ashwin Bellur. The completion of this thesis would not have been possible
without their contribution. They helped me in building datasets, carrying out the
experiments, analyzing results and in writing research papers.
I am grateful to the members of my General Test Committee, Prof. C. Chandra
Sekhar and Prof. C. S. Ramalingam, for their suggestions and criticisms with
respect to the presentation of my work. I am also grateful for being a part of the
CompMusic project. It was a great learning experience working with the members
of this consortium.
I would like to thank my music teachers Prof. M.V.N. Murthy and Niveditha
Bharath. Prof. M.V.N. Murthy patiently taught me to play the Saraswati Veena
in his unique and excellent style. He always encouraged me to explore music
beyond what he taught in class, which certainly nurtured my creativity. Niveditha
Bharath taught me to sing Carnatic music. She is an excellent and very friendly
teacher. Her classes were full of fun
and excitement. Learning music from these wonderful teachers also helped me to
better understand the work with respect to this thesis.
I would like to thank Aashish, Anusha, Asha, Jom, Karthik, Manish, Padma,
Praveen, Raghav, Rajeev, Sarala, Saranya, Sridharan, Srikanth and other members
of Donlab for their help and unconditional support over the years. It would have
been a lonely lab without them. I am also grateful to Alastair, Ajay and Sankalp
from MTG Barcelona for always clearing my doubts and helping in my research. I
would also like to acknowledge the help of Kaustuv from IIT Bombay. He always
found time to answer my questions regarding Hindustani music.
I am also obliged to the European Research Council for funding the research
under the European Union's Seventh Framework Programme, as part of the
CompMusic project (ERC grant agreement 267583).
I would like to thank all my friends at IIT Madras without whom the life at IIT
campus would have been dry and boring. If not for them, I would have finished
my thesis much earlier. They have always been a source of refreshment during
stressful times.
I would like to thank my parents, who have made many sacrifices so that I could
get a good education and a good life. They have always tolerated my stubborn
and rebellious nature which I am constantly trying to change. I wish to make them
proud one day.
Lastly, I would like to thank my loving brother Anubhav for always being
an anchor in my life. It was he who took on the responsibility of financially
supporting our family at an early age and motivated me to pursue any path I wished
to choose. I will always be grateful to him and I wish him all the happiness in life.
4.8 D2: Percentage of motifs preserved after filtering . . . 54
5.1 Details of the database used. Durations are given in approximate hours (h), minutes (m) or seconds (s). . . . 58
5.2 EER (%) for different algorithms using different normalizations on different datasets. . . . 67
5.3 Number of claims correctly verified by hard-LCSS only, by soft-LCSS only, by both and by neither of them for D1 and D2 using T-norm . . . 69
LIST OF FIGURES
2.1 Comparing Pitch Histogram of Raga ‘Sankarabharanam’ with its Hindustani and Western classical counterparts. . . . 7
4.2 Slopes of the linear trend of stationary points help in reducing the false alarms. The last three phrases are false alarms. . . . 47
5.1 An example of a common segment set between two sequences representing the real data . . . 60
5.2 DET curves comparing the LCSS algorithm with different algorithms using different score normalizations . . . 68
5.3 Showing the effect of T-norm on the score distribution . . . 70
ABBREVIATIONS
DTW Dynamic Time Warping
UE-DTW Unconstrained Endpoint - Dynamic Time Warping
LCS Longest Common Subsequence
RLCS Rough Longest Common Subsequence
RCS Rough Common Subsequence
WAR Width Across Reference
WAQ Width Across Query
RWAR Rough Width Across Reference
RWAQ Rough Width Across Query
HMM Hidden Markov Model
LSF Least Squares Fit
LCSS Longest Common Segment Set
Z-Norm Zero Normalization
T-Norm Test Normalization
EER Equal Error Rate
VAD Voice Activity Detection
NOTATIONS
f            Frequency value in hertz
d_{r_i,q_j}  Distance between the reference's i-th value and the query's j-th value
T_d          Threshold on the distance d_{r_i,q_j}
c_{i,j}      Cost of RLCS till the reference's i-th value and the query's j-th value
w^r_{i,j}    WAR till the reference's i-th value and the query's j-th value
w^q_{i,j}    WAQ till the reference's i-th value and the query's j-th value
β            A weight on density
ρ            Matching rate
c^a_{i,j}    Actual length of RLCS till the reference's i-th value and the query's j-th value
rw^r_{i,j}   RWAR till the reference's i-th value and the query's j-th value
rw^q_{i,j}   RWAQ till the reference's i-th value and the query's j-th value
st           A semitone in cents
δ_{S_XY}     Density of RCS S_XY
l^w_{S_XY}   Actual length of RCS S_XY
g_X          Gaps in sequence X
g_Y          Gaps in sequence Y
τ_sim        Threshold on the similarity score
µ^X_{S_XY}   Slope of the linear trend of stationary points in sequence X
σ^X_{S_XY}   Standard deviation of the linear trend's slope in sequence X
λ_{S_XY}     Similarity in the linear trend of stationary points in sequences X and Y
γ            The number of gaps between two hard segments
η            Penalty issued for each gap
µ^claim_I    Imposter mean for the claim
σ^claim_I    Imposter standard deviation for the claim
CHAPTER 1
Introduction
1.1 Overview of the thesis
In Carnatic music, a raga is a collective expression of melodies which consists of:
1. A set of svaras (ornamented notes) ordered in a well defined manner.
2. Phrases (aesthetic threads of ornamented notes) as established by performances through the ages, as rendered in well-known compositions.
While there are some ragas, in particular, for which the first condition suffices,
in general both these conditions are necessary and are used in practice. The
phrases that collectively give a raga its identity are called melodic motifs. The
melodic motifs are unique to a raga. Therefore, in any rendition of the raga, either
compositional or improvisational, these motifs are rendered in order to establish
the raga’s identity. Different renditions of a motif may differ slightly from each
other, but they are enough to confuse a time-series matching algorithm. The goal
of the thesis is to design algorithmic techniques to automatically find these motifs,
their different renditions, and then use the regions replete with these motifs to
perform raga verification.
The initial part of the thesis is dedicated towards finding different renditions of
melodic motifs in an improvisational form of raga called the alapana. This problem
is known as motif spotting. A melodic motif, preselected by a musician, is used
as a query and its different renditions are spotted using a matching algorithm.
Following this work, inspired by how trained listeners identify ragas, automatic
discovery of motifs is attempted using certain segments of compositions which are
supposed to be rich in motifs. Similar phrases are extracted from a number of such
segments of the compositions in a particular raga. Not all similar phrases need be
melodic motifs; some of them could also appear in other ragas, thus violating the
uniqueness property of the motifs. Therefore, these non-motif phrases are filtered
out if they are found in composition lines of other ragas. Using this approach,
various motifs are discovered for 14 ragas, thus confirming that these segments
are replete with motifs. Therefore, using these segments of compositions, raga
verification is performed. In raga verification, a melody (a single phrase or an
aesthetic concatenation of many such phrases) along with a raga claim is supplied
to the system. The system confirms or rejects the claim. Raga verification is
performed by comparing the snippet of audio supplied with various composition
lines of the claimed raga. The obtained score is matched against the scores obtained
with composition lines of confusing ragas using score-normalization techniques.
Two algorithms for time-series matching are proposed in this work. One is
a modification of the existing algorithm, Rough Longest Common Subsequence
(RLCS). Another proposed algorithm, Longest Common Segment Set (LCSS), is
completely novel and uses the set of matched segments, along with the gaps between them, to give a holistic score.
This algorithm comes in two forms: hard and soft. Hard-LCSS treats individual
matched segments separately, irrespective of their lengths and distribution,
whereas soft-LCSS can join two or more segments based on their lengths and
distribution in order to compute a holistic score. Using the proposed algorithms,
an error rate of ∼ 12% is obtained for raga verification on a database consisting of
17 ragas.
1.2 Contribution of the thesis
The following are the main contributions of the thesis.
1. A measure based on the stationary points of the pitch contour is introduced that reduces the number of false alarms.
2. Modifications to an existing time-series matching algorithm, known as Rough Longest Common Subsequence, are proposed that reduce the number of false alarms and result in better localization.
3. A new time-series matching algorithm, known as Longest Common Segment Set, is proposed which performs better for the task of raga verification.
4. Approaches are proposed to discover melodic motifs automatically from the composition lines and to find their different renditions.
5. A system is designed to perform raga verification which is scalable to any number of ragas.
1.3 Organization of the thesis
The organization of the thesis is as follows: In Chapter 2, a brief background on
Carnatic music is given, which is required for a better understanding of the work.
Some of the related works on motif spotting, motif discovery and raga verification
are also discussed in this chapter.
Chapter 3 is dedicated towards describing the approach proposed in this thesis
to find different renditions of motifs. This chapter describes the quantization
of a pitch contour into stationary points, which preserves most of the raga
information. This chapter also describes the modifications made to an existing time-series
matching algorithm.
Chapter 4 describes the proposed approach for automatically discovering the
melodic motifs from the composition lines of the ragas. A measure is defined based
on the stationary points which reduces the false alarms.
Chapter 5 is dedicated towards explaining the raga verification system. Auto-
matic extraction of composition lines from a given composition is discussed. This
chapter also describes the concept of cohorts for a raga. A new time-series
matching algorithm, named Longest Common Segment Set, is also proposed in this
chapter.
Finally, Chapter 6 summarizes the work and discusses the possible future work.
CHAPTER 2
Literature Survey
Carnatic music is an art music (often also referred to as classical music) tradition
commonly associated with four states of Southern India: Andhra Pradesh, Karnataka,
Kerala and Tamil Nadu, and also some parts of Maharashtra. It is one of the two
main sub-genres of Indian classical music. The other sub-genre is Hindustani Music
which is mainly practiced in North India and also some parts of South India.
A Carnatic music concert is an ensemble of the main performer (usually a
vocalist), an accompanist (usually a violinist, occasionally a vainika or flautist) and
percussionists (a single mridangam vidwan (main percussionist), or an ensemble
of percussionists). If the main percussionist is right handed, s/he sits to the right
of the main artist and the violinist sits to the left. The positions are exchanged when
the mridangam vidwan is left handed. All the performers sit on the stage cross
legged without any support.
The first musical sound of a concert is always of a tambura, a drone instrument
which provides the tonic for the entire concert. The tambura (tanpura) is a string
instrument that has four strings tuned to three pitches: P-(S)-(S)-S. ‘S’ is the first
pitch of an octave whereas ‘P’ is 1.5 times the pitch of ‘S’, which makes ‘P’ the
seventh pitch of the octave. The two ‘(S)’s, being twice the pitch of ‘S’, represent the
first pitch of the second octave. When these four strings are played continuously
in a conventional manner, the perceived sound, rich in harmonics, provides the
harmonic base for the performance.
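As a simple illustration of these ratios, the string frequencies can be computed from a chosen tonic. The tonic value used below is an arbitrary assumption, purely for illustration:

```python
# Tambura string frequencies relative to the tonic 'S', following the
# ratios in the text: P = 1.5 x S (the fifth), (S) = 2 x S (upper octave).
def tambura_frequencies(tonic_hz):
    # Strings in playing order P-(S)-(S)-S.
    ratios = [("P", 1.5), ("(S)", 2.0), ("(S)", 2.0), ("S", 1.0)]
    return [(name, ratio * tonic_hz) for name, ratio in ratios]

# An assumed tonic of 146.8 Hz (roughly D3), for illustration only.
for name, freq in tambura_frequencies(146.8):
    print(name, round(freq, 1))
```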
Table 2.1: Svaras and their respective ratios to the base pitch ‘S’.
using a geometric approach for discovering repeated patterns that are musically
interesting to the listener. In [6, 7], Collins et al. introduced improvements to
Meredith’s Structure Induction Algorithms. There has also been some significant
work on detecting melodic motifs in Hindustani music by Joe Cheri Ross et al.
[39]. In this approach, the melody is converted to a sequence of symbols and a
variant of dynamic programming is used to discover the motif.
As mentioned before, the typical motifs can be used for raga classification, but
when the number of ragas increases, the scalability of this approach becomes an
issue. In Chapter 5, inspired by how the listener tries to identify raga during a
concert, an attempt is made to mimic the same. During a concert, the performer
usually begins by establishing the identity of the raga. When the musician is
establishing the identity by rendering the raga, the listener narrows down the
search space from hundreds of ragas to a small likely subset of ragas. By further
listening to the musician, the listener identifies the peculiarities, matches them with
the shortlisted ragas and finally identifies the raga. First, to mimic the reduction of
search space, a raga recording is presented with a claim. The claim is the raga that
a listener has associated with the recording. For every raga, a set of cohorts is identified
by a musician. Cohorts are ragas that have similar phrases and can be confused
with the given raga. The cohort raga list is used to reduce the search space. The
task that remains is to determine whether the claimed raga is correct. This is done by
using a novel matching algorithm known as Longest Common Segment Set (LCSS)
along with score normalization.
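The overall cohort-based decision can be sketched as a generic test-normalization (T-norm) check. The function below is an illustrative assumption about the shape of such a system, not the exact implementation described in this thesis; the scores and threshold are invented for the example:

```python
import statistics

def tnorm_verify(claim_score, cohort_scores, threshold):
    """Accept a raga claim if its match score, normalized against the
    distribution of scores obtained with the claimed raga's cohorts,
    exceeds a threshold (a generic T-norm sketch)."""
    mu = statistics.mean(cohort_scores)
    sigma = statistics.stdev(cohort_scores)
    return (claim_score - mu) / sigma >= threshold

# Illustrative scores: the claim scores well above its cohorts, so it is accepted.
print(tnorm_verify(0.90, [0.40, 0.50, 0.45, 0.55], 2.0))
```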
There is no parallel in Western classical music to raga verification. The closest
that one can associate with it is cover song detection [14, 33, 43], where the objective is
to identify the same song rendered by different musicians. However, as discussed
in Chapter 2, two different renditions of the same motif may not be identical.
Several attempts have been made earlier to identify ragas [5, 11, 12, 18, 20, 25, 29, 47].
Most of these efforts have used small repertoires or have focused on ragas for which
ordering is not important. In [47], the audio is transcribed to a sequence of notes
and string matching techniques are used to perform raga identification. In [5],
pitch-class and pitch-dyads distributions are used for identifying ragas. Bigrams
on pitch are obtained using a twelve semitone scale. In [35], the authors assume that
an automatic note transcription system for the audio is available. The transcribed
notes are then subjected to HMM based raga analysis. In [25, 46], a template based
on the arohana and avarohana is used to determine the identity of the raga. The
frequency of the svaras in Carnatic music is seldom fixed. Further, as indicated
in [48] and [49], the improvisations in extempore enunciation of ragas can vary
across musicians and schools. This behaviour is accounted for in [23, 24, 29] by
decreasing the binwidth for computing melodic histograms. In [29], steady note
transcription along with n-gram models is used to perform raga identification. In
[11] chroma features are used in an HMM framework to perform scale independent
raga identification, while in [12] hierarchical random forest classifier is used to
match svara histograms. The svaras are obtained using the Western transcription
system. These experiments are performed on 4 to 8 different ragas of Hindustani
music. In [18], an attempt is made to perform raga identification using
semi-continuous Gaussian mixture models. This will work only for ragas with a
linear ordering of svaras.
Recent research indicates that a raga is characterised best by a time-frequency
trajectory rather than a sequence of quantised pitches [20, 38, 39, 45]. In [38, 39],
the sama of the tala (emphasised by the bol of tabla) is used to segment a piece. The
repeating pattern in a bandish in Hindustani Khayal music is located using the
sama information. In [20, 38], motif identification is performed for Carnatic music.
Motifs for a set of five ragas are defined and marked carefully by a musician.
Motif identification is performed using hidden Markov model (HMM) trained
for each motif. Similar to [39], motif spotting in an alapana in Carnatic music is
performed in Chapter 3. In [45], a number of different similarity measures for
matching melodic motifs of Indian music were evaluated. It was shown that the
intra pattern type variance of the melodic motifs is higher for Carnatic music in
comparison with that of Hindustani music. It was also shown that the similarity
obtained is very sensitive to the measure used. All these efforts are ultimately
aimed at obtaining typical signatures of ragas. It is shown in Chapter 3 that there
can be many signatures for a given raga. To alleviate this problem, in Chapter 4
an attempt was made to obtain as many signatures as possible for a raga by comparing lines
of compositions. Here again, it was observed that the typical motif detection was
very sensitive to the distance measure chosen. Using typical motifs/signatures for
raga identification is not scalable, when the number of ragas under consideration
increases. In raga verification, as the task of identifying a raga narrows to a small
number of candidate ragas, the approach is scalable to any number of new ragas.
CHAPTER 3
Motif Spotting
3.1 Introduction
A raga in Carnatic music can be characterised by a set of distinctive motifs. Dis-
tinctive motifs can be characterised by the trajectory of inflected svaras over time.
These motifs are of utmost aesthetic importance to the raga. Carnatic music is
a genre abundant with compositions. These compositions are replete with many
distinctive motifs. These motifs are used as building blocks for extempore improvi-
sational pieces in Carnatic music. These motifs can also be used for distinguishing
between two ragas, and also for archival and learning purposes. The objective of
the work presented in this chapter is to spot the location of the distinctive motifs in
an extempore enunciation of a raga called the alapana. In Carnatic music, the motifs
are laden with gamakas [27]. In addition, the motifs are similar across musicians
but not necessarily identical. The duration of the motifs can also vary quite signif-
icantly although the rhythm may be preserved. The query motif in general is very
short in duration compared to that of the test music segment. Several factors need
to be considered when dealing with this problem namely: selection of features,
time complexity, tolerance to noise, tolerance to speed variation, allowing partial
matches or rough matches rather than exact matches, timbre, etc. [17, 30, 50].
In this chapter, pitch is used as the main feature for the task of motif spot-
ting. Substantial research exists on analysing different aspects of Carnatic music
computationally, using pitch as a feature. In [28], gamakas are characterized and
analysed using pitch contours. In [21], tuning of Indian classical music is studied
using pitch histograms. In [48], the motifs are extensively studied in the raga Thodi
using pitch histograms and pitch contours. All of the above prove the relevance
and importance of pitch as a feature for computational analysis of Carnatic music.
There are a number of dynamic programming techniques, namely the Dynamic
Time Warping (DTW), the Longest Common Subsequence (LCS) and their variants,
which are used for similar music matching tasks. DTW takes care of the speed
variations due to warping but forces the match from end-to-end of both the query
and the test sequences. Even unconstrained endpoint DTW will align an entire
query with a part of the test sequence [16]. In motif-spotting, there can be instances
where one can expect that most of the query is roughly matched with a part of
the test sequence. Although LCS does not force the match between query and test
to be end-to-end, it does not give importance to local similarity. Rough Longest
Common Subsequence (RLCS) addresses the issue of local similarity where some
leeway is given for partial query matches [30]. Other than partial query matches,
when the characteristic motif, for example, “Sa Ni Da Pa Da” is rendered as “Sa
Ri Ni Da Pa Da”, RLCS gives a good match since it gives the longest matched
subsequence.
3.2 Stationary Points
The task therefore is to attempt automatic spotting of a motif that is queried. The
motif is queried against a set of alapanas of a particular raga to obtain locations of
the occurrences of the motif. The task is non-trivial since no particular rhythm
is maintained in an alapana, nor is it accompanied by a percussion instrument.
Figure 3.1: A Phrase with Stationary Points
Figure 2.6 shows repetitive occurrences of motifs in a piece of music. An enlarged
view of the motif is also shown. Since the alapana is much longer than the motif,
searching for a motif in an alapana is like searching for a needle in a haystack. After
an analysis of the pitch contours and discussions with professional musicians, it
was conjectured that the pitch contour can be quantized at stationary points. The
conjecture was confirmed as explained in Chapter 2. Figure 3.1 shows an example
phrase of the raga Kamboji with the stationary points highlighted.
Musically, the stationary points are a measure of the extent to which a particular
svara is intoned. In Carnatic music since svaras are rendered with gamakas, there is
a difference between the notation and the actual rendition of the phrase. However,
there is a one-to-one correspondence between the stationary point frequencies and
what is actually rendered by the musician (Figure 3.1). Figure 3.2 shows the pitch
histogram and the stationary point histogram of an alapana of the raga Kamboji.
The similarity between the two histograms supports our conjecture that the
stationary points are important.
Figure 3.2: The Pitch and Stationary Point Histograms of the raga Kamboji
3.2.1 Method of obtaining Stationary Points
Carnatic music is a heterophonic musical form. In a Carnatic music vocal concert,
a minimum of two accompanying instruments play simultaneously along with
the lead artist. These are the violin and the mridangam (a percussion instrument
in Carnatic music). Carnatic music is performed at a fixed tonic [2] to which all
instruments are tuned. The tonic is chosen by the lead artist and is maintained
throughout the performance by an instrument called the Tambura as discussed in
Chapter 2. The simultaneous performance of many instruments in addition to the
voice renders pitch extraction of the predominant voice a tough task. This leads
to octave errors and other erroneous pitch values. For this task it is necessary that
pitch be continuous. After experimenting with various pitch algorithms, it was
observed that the Melodia-Pitch Extraction algorithm [41] produced the fewest
errors. This was verified after re-synthesis using the pitch contours. In case of
an octave error or any other such pitch related anomaly, the algorithm replaces
the erroneous pitch values with zeros. The stationary points are obtained by
processing the pitch contour extracted from the waveform. The pitch extracted
is converted to the cent scale using (3.1) to normalise with respect to the tonic of
different musicians.

Figure 3.3: Original and Cubic Interpolated pitch contours
centFrequency = 1200 · log2(f / tonic)    (3.1)
Least Squares Fit (LSF) [37] was used to compute the slope of the pitch extracted.
The zero crossings of the slope correspond to the stationary points (Figure 3.1). A
Cubic Hermite interpolation [15] was then performed with the initial estimation of
stationary points to get a continuous curve (Figure 3.3). The stationary points are
then again estimated from this continuous curve.
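The pipeline just described can be sketched in pure Python. The window size, the slope estimator and the zero-crossing test below are illustrative assumptions; the cubic Hermite re-interpolation step is omitted, and the contour is assumed fully voiced:

```python
import math

def to_cents(pitch_hz, tonic_hz):
    """Convert a pitch contour (Hz) to the cent scale relative to the
    tonic, as in (3.1). Assumes all frames are voiced (pitch > 0)."""
    return [1200.0 * math.log2(f / tonic_hz) for f in pitch_hz]

def lsf_slope(y):
    """Least-squares-fit slope of y against its sample index."""
    n = len(y)
    x_mean = (n - 1) / 2.0
    y_mean = sum(y) / n
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(y))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den

def stationary_points(cents, window=5):
    """Indices where the local LSF slope changes sign, i.e. candidate
    stationary points of the pitch contour."""
    half = window // 2
    slopes = [lsf_slope(cents[i - half:i + half + 1])
              for i in range(half, len(cents) - half)]
    points = []
    for k in range(1, len(slopes)):
        if slopes[k - 1] > 0 >= slopes[k] or slopes[k - 1] < 0 <= slopes[k]:
            points.append(k + half)  # index back into the original contour
    return points
```

On a smooth contour, the detected indices land at the peaks and valleys, matching the musical notion of where a svara is intoned.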
3.3 Rough Longest Common Subsequence Algorithm
Rough Longest Common Subsequence (RLCS), a variant of Longest Common
Subsequence (LCS), performs an approximate match between a reference sequence
and a query sequence while retaining the local similarity [30]. It introduces three
major changes to LCS: a rough match, width-across-reference (WAR) and
width-across-query (WAQ) for local similarity, and a score matrix.
3.3.1 Rough match
In the recurrence function of LCS, the cost function is incremented by 1 when there
is an exact match. In RLCS, when the distance between a reference point, r_i, and
a query point, q_j, is less than a threshold, T_d, they are said to be roughly matched,
r_i ≈ q_j, i.e. d(r_i, q_j) < T_d → r_i ≈ q_j, where d(r_i, q_j) is the distance between r_i and q_j.
The cost is incremented by a number, δ, between 0 and 1 instead of 1, based on
how good the match is as shown in (3.2).
δ_{i,j} = 1 − d_{r_i,q_j} / T_d    (3.2)
The cost is estimated using the following recurrence:
c_{i,j} =
  0                            ; i·j = 0
  c_{i−1,j−1} + δ_{i,j}        ; r_i ≈ q_j
  max(c_{i−1,j}, c_{i,j−1})    ; r_i !≈ q_j
(3.3)
In LCS, the cost gives the length of the Longest Common Subsequence. In RLCS,
the cost is not incremented by 1, yet it is interpreted as the length of the Rough
Longest Common Subsequence. Later, it is argued that this quantity is actually a
rough length of the RLCS rather than its actual length.
3.3.2 WAR and WAQ for local similarity
To retain the local similarity, width-across-reference, WAR, and width-across-
query, WAQ, are used. WAR and WAQ represent the length of the shortest
substring of the reference and the query respectively, containing the LCS. These
measures represent the density of LCS in the reference and the query. Small values
of WAR and WAQ indicate a dense distribution of LCS. WAR is incremented by 1
if there is a rough match or jump along the reference. Likewise for the WAQ. WAR
and WAQ are computed using the following recurrences:
w^r_{i,j} =
  0                     ; i·j = 0
  w^r_{i−1,j−1} + 1     ; r_i ≈ q_j
  w^r_{i−1,j} + 1       ; r_i !≈ q_j, c_{i−1,j} ≥ c_{i,j−1}
  w^r_{i,j−1}           ; r_i !≈ q_j, c_{i−1,j} < c_{i,j−1}
(3.4)

w^q_{i,j} =
  0                     ; i·j = 0
  w^q_{i−1,j−1} + 1     ; r_i ≈ q_j
  w^q_{i−1,j}           ; r_i !≈ q_j, c_{i−1,j} ≥ c_{i,j−1}
  w^q_{i,j−1} + 1       ; r_i !≈ q_j, c_{i−1,j} < c_{i,j−1}
(3.5)
In (3.4) and (3.5), some of the cases and conditions are dropped from [30] for
the sake of clarity.
3.3.3 Score matrix
WAR, WAQ and cost are used to compute the score of a common subsequence in
the following way:
Score_{i,j} =
  (β · c_{i,j}/w^r_{i,j} + (1 − β) · c_{i,j}/w^q_{i,j}) · c_{i,j}/n   ; c_{i,j} ≥ ρn
  0                                                                   ; otherwise
(3.6)
In (3.6), a large value of c_{i,j}/w^r_{i,j} suggests that the density of the RLCS is high in the
reference. Similarly, a large value of c_{i,j}/w^q_{i,j} is indicative of a higher density of the RLCS in
the query. β weighs between these two ratios. A large value of c_{i,j}/n indicates that a
large part of the query has been matched, where n is the length of the query. ρ is
the matching rate that represents how long the RLCS should be with
respect to the query.
The algorithm to compute these values using Dynamic Programming is pre-
sented in [30].
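A compact sketch of this dynamic program, combining (3.2) through (3.6), is given below. The absolute-difference distance, the parameter values, and returning only the best score over all cells (rather than the full matrices of [30]) are illustrative assumptions:

```python
def rlcs(ref, query, t_d=0.5, beta=0.5, rho=0.3):
    """Sketch of the RLCS dynamic program: cost (3.3), WAR (3.4),
    WAQ (3.5) and the score matrix (3.6). Parameters are illustrative."""
    m, n = len(ref), len(query)
    # (m+1) x (n+1) DP tables; row/column 0 is the empty-prefix boundary.
    c  = [[0.0] * (n + 1) for _ in range(m + 1)]
    wr = [[0]   * (n + 1) for _ in range(m + 1)]
    wq = [[0]   * (n + 1) for _ in range(m + 1)]
    best = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d = abs(ref[i - 1] - query[j - 1])
            if d < t_d:                       # rough match, delta as in (3.2)
                delta = 1.0 - d / t_d
                c[i][j]  = c[i - 1][j - 1] + delta
                wr[i][j] = wr[i - 1][j - 1] + 1
                wq[i][j] = wq[i - 1][j - 1] + 1
            elif c[i - 1][j] >= c[i][j - 1]:  # skip a reference value
                c[i][j]  = c[i - 1][j]
                wr[i][j] = wr[i - 1][j] + 1
                wq[i][j] = wq[i - 1][j]
            else:                             # skip a query value
                c[i][j]  = c[i][j - 1]
                wr[i][j] = wr[i][j - 1]
                wq[i][j] = wq[i][j - 1] + 1
            if c[i][j] >= rho * n and wr[i][j] > 0 and wq[i][j] > 0:
                score = (beta * c[i][j] / wr[i][j]
                         + (1 - beta) * c[i][j] / wq[i][j]) * c[i][j] / n
                best = max(best, score)
    return best
```

An identical reference and query yield the maximum score of 1, while sequences with no rough matches score 0.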
3.4 Modified-Rough Longest Common Subsequence
In this section, the modifications made to the existing RLCS algorithm and the
rationale behind them are discussed.
3.4.1 Rough and actual length of RLCS
In [30], c_{i,j} is defined as the length of the RLCS. But it actually represents a rough
length of the RLCS because it is incremented by δ_{i,j} when there is a rough match. The
resulting value of c_{i,j} need not be an integer. Therefore, it cannot be the actual
length of any sequence. The actual length of RLCS is defined by the following
recurrence:
c^a_{i,j} =
  0                               ; i·j = 0
  c^a_{i−1,j−1} + 1               ; r_i ≈ q_j
  max(c^a_{i−1,j}, c^a_{i,j−1})   ; r_i !≈ q_j
(3.7)
In (3.7), the cost is incremented by 1 on a rough match. In (3.6), while computing the score,
half of the importance is given to the ratio of the rough length of the RLCS to the query
length. Instead of just considering how good the rough length of the RLCS is with
respect to the query length, it is conjectured that it is also important to consider
how good the rough length of the RLCS is with respect to the actual length of the
RLCS. The term

(c_{i,j} + c_{i,j}) / (c^a_{i,j} + n)

gives equal importance to both the ratios. This term is similar to the F1 score, where
precision and recall are given equal importance.
3.4.2 RWAR and RWAQ
WAR and WAQ represent the width of the shortest substring that contains the
RLCS. As discussed in the previous subsection, c_{i,j} represents the rough length,
which is shorter than the actual length of the RLCS. Therefore, it is not clear
whether c_{i,j}/w^r_{i,j} really represents the density of the RLCS in the reference. This term
also penalizes based on the degree of match, while a penalty has already been
accounted for in the term c_{i,j}/n. Therefore, a rough width across the reference and the query
is required that represents the rough width of the shortest substring containing
the RLCS. On a rough match, cost is incremented by a δi, j. At the same time,
when a rough match is obtained, the WAR and WAQ are also incremented by δi, j
resulting in Rough WAR (RWAR) and Rough WAQ (RWAQ), respectively. When
there is no match, RWAR and RWAQ are incremented by 1 whereas the cost is not
incremented. Therefore, RWAR and RWAQ account for the density of the RLCS
in the reference and query better. RWAR and RWAQ can be computed by the
following recurrences:
rwr_{i,j} =
\begin{cases}
0 & ;\; i \cdot j = 0 \\
rwr_{i-1,j-1} + \delta_{i,j} & ;\; r_i \approx q_j \\
rwr_{i-1,j} + 1 & ;\; r_i \not\approx q_j,\; c_{i-1,j} \ge c_{i,j-1} \\
rwr_{i,j-1} & ;\; r_i \not\approx q_j,\; c_{i-1,j} < c_{i,j-1}
\end{cases}
\qquad (3.8)

rwq_{i,j} =
\begin{cases}
0 & ;\; i \cdot j = 0 \\
rwq_{i-1,j-1} + \delta_{i,j} & ;\; r_i \approx q_j \\
rwq_{i-1,j} & ;\; r_i \not\approx q_j,\; c_{i-1,j} \ge c_{i,j-1} \\
rwq_{i,j-1} + 1 & ;\; r_i \not\approx q_j,\; c_{i-1,j} < c_{i,j-1}
\end{cases}
\qquad (3.9)
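The recurrences (3.8) and (3.9) can be computed alongside the rough length c of [30]. In this sketch, δ is modeled as a function returning a match weight in (0, 1] on a rough match and 0 otherwise; that interface, and the toy exact-match δ in the example, are assumptions of the sketch.

```python
def rough_widths(r, q, delta):
    """RWAR and RWAQ per Eqs. (3.8)-(3.9), computed with the rough length c:
    all three grow by delta on a rough match; on a mismatch only the width
    of the side being skipped grows by 1."""
    m, n = len(r), len(q)
    c   = [[0.0] * (n + 1) for _ in range(m + 1)]  # rough length of the RLCS
    rwr = [[0.0] * (n + 1) for _ in range(m + 1)]  # rough width across reference
    rwq = [[0.0] * (n + 1) for _ in range(m + 1)]  # rough width across query
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d = delta(r[i - 1], q[j - 1])
            if d > 0:                               # rough match
                c[i][j]   = c[i - 1][j - 1] + d
                rwr[i][j] = rwr[i - 1][j - 1] + d
                rwq[i][j] = rwq[i - 1][j - 1] + d
            elif c[i - 1][j] >= c[i][j - 1]:        # skip a reference symbol
                c[i][j]   = c[i - 1][j]
                rwr[i][j] = rwr[i - 1][j] + 1
                rwq[i][j] = rwq[i - 1][j]
            else:                                   # skip a query symbol
                c[i][j]   = c[i][j - 1]
                rwr[i][j] = rwr[i][j - 1]
                rwq[i][j] = rwq[i][j - 1] + 1
    return c, rwr, rwq
```

For example, matching the reference "abxc" against the query "abc" with an exact-match δ gives a rough length of 3, an RWAR of 4 (the skipped "x" widens the reference side) and an RWAQ of 3.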
29
3.4.3 Matched rate on the query sequence
In (3.6), ρ is an empirical parameter that is set based on the required match rate on the entire query sequence. The score is updated with a non-zero value only if the rough length of the RLCS is greater than ρ × n. It is not clear how to set the value of ρ or what it means for the rough length to be greater than a fraction of the query length. Instead, it would be better to update the score with a non-zero value only if the actual length is greater than ρ × n. This makes the interpretation clear and makes it easier to set the value of ρ.
The score update of the modified-RLCS is given by the following equation:
Score_{i,j} =
\begin{cases}
\left( \beta \dfrac{c_{i,j}}{rwr_{i,j}} + (1-\beta) \dfrac{c_{i,j}}{rwq_{i,j}} \right) \cdot \dfrac{2\, c_{i,j}}{ca_{i,j} + n} & ;\; ca_{i,j} \ge \rho n \\[4pt]
0 & ;\; \text{otherwise}
\end{cases}
\qquad (3.10)
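Equation (3.10) reads as a per-cell score update. A minimal sketch follows; the defaults for β and ρ are assumptions (the experiments in Section 3.7.2 set ρ to zero for comparison).

```python
def modified_score(c, ca, rwr, rwq, n, beta=0.5, rho=0.5):
    """Score update of the modified-RLCS, Eq. (3.10), for one DP cell.
    c: rough length, ca: actual length, rwr/rwq: rough widths across the
    reference and query, n: query length."""
    if ca < rho * n or rwr == 0 or rwq == 0:   # actual length too short
        return 0.0
    density = beta * (c / rwr) + (1 - beta) * (c / rwq)
    f1_like = 2 * c / (ca + n)   # harmonic-mean blend of c/ca and c/n
    return density * f1_like
```

The second factor rewards matches whose rough length is close both to the actual length and to the query length, in the spirit of the F1 score.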
3.5 A Two-Pass Dynamic Programming Search
In Section 3.2 it was illustrated that the sequence of stationary points is crucial for a motif. Therefore, RLCS is used to query for the stationary points of the given motif in the alapana.
Music matching using LCS methods for Western music is performed on symbolic music data [19]. The musical notes in this context are the symbols. However, in the context of Carnatic music, there is no consistent one-to-one correspondence between the notation and the sung melody. Although stationary points are used in this work instead of a symbolic notation, one must keep in mind that stationary points are not symbols but continuous pitch values. In order to match such pitch values, a rough match instead of an exact match is required. A variant of the LCS known as the Rough Longest Common Subsequence [30] allows such a rough match.
In this work, a two-pass RLCS matching is performed. In the first pass, the stationary points of the reference sequence and the query sequence are matched to obtain the candidate motif regions. However, given two consecutive stationary points, the pitch contour between them can be significantly different for different phrases. This leads to many false alarms. A second pass of RLCS is then performed on the regions obtained from the first pass to filter out the false alarms from the true motifs.
3.5.1 First Pass: Determining Candidate Motif Regions using RLCS

The RLCS algorithm used in this work is described in this section. The alapana is first windowed and then processed with the RLCS algorithm. The window size chosen for this task is 1.5 times the length of the motif queried for. The matrices obtained from the RLCS are then processed as follows:

• From the cells of the score matrix with values greater than a threshold, Td, sequences are obtained by tracing the direction matrix backwards.

• Any duplicate sequences acquired in this process are discarded, preserving unique sequences of length greater than ρ times the length of the reference. These are then added to a sequence buffer.

• This process is repeated for every window. The window is shifted by a hop of one stationary point.

• The sequences obtained thus are grouped.

• Each group, taken from the first element of its first member to the last element of its last member, represents a potential motif region.
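The grouping step above can be sketched as interval merging over the matched index ranges. Treating sequences that overlap in the alapana as members of the same group is an assumption of this sketch, since the exact grouping criterion is not spelled out in this section.

```python
def group_regions(sequences):
    """Merge matched sequences, given as (start, end) index pairs over the
    alapana, into candidate motif regions. Each region spans from the first
    element of its first member to the last element of its last member."""
    regions = []
    for start, end in sorted(sequences):
        if regions and start <= regions[-1][1]:   # overlaps the open region
            regions[-1] = (regions[-1][0], max(regions[-1][1], end))
        else:                                     # starts a new region
            regions.append((start, end))
    return regions
```

For example, sequences spanning indices (0, 5) and (3, 8) would be grouped into a single candidate region (0, 8).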
3.5.2 Second Pass: Determining Motifs from the Groups
In the first pass, a matching of only the stationary points is performed. As mentioned above, even though the stationary points are matched, it is not necessary that the trajectory between them matches. This leads to a large number of false alarms. Now that the search space is reduced, the RLCS is performed between the
entire pitch contour of the potential motif region obtained in the first pass and the
motif queried. The entire pitch contour is used in order to account for the trajectory
information contained in the phrases. The threshold Td used for the first pass is
tightened in this iteration for better precision while matching the entire feature
vector. In this iteration, the cell of the score matrix having the maximum value is
chosen and the sequence is traced back using the direction matrix from this cell.
This sequence is hypothesized to be the motif. The database and experimentation
are detailed in the following sections.
3.6 Dataset
Table 3.1 gives the details of the dataset of alapanas used in this work. As mentioned above, this task is performed on alapanas. The motifs are categorized into two types based on their durations: short motifs and long motifs. The details of these motifs are given in Table 3.2 and Table 3.3. The average duration is obtained from the labeled ground truth. The long motifs are inspired by the "raga test" conducted by Rama Verma. Most people across the globe were able to unambiguously determine the identity of ragas using these motifs. An attempt was made to use the motifs from Rama Verma's raga test directly. As the recordings are rather noisy,
3.7.2 Comparison between RLCS and Modified-RLCS using longer motifs

Motif spotting is performed using RLCS and modified-RLCS on the dataset of alapanas using longer motifs as queries. Td is set to 0.45 in both methods, so that pitch values that are approximately one semitone apart are considered a rough match. ρ is set to zero, because the best value of ρ could differ between the two methods, which would make the comparison difficult.
First, Voice Activity Detection (VAD) is performed on the alapanas to obtain the voiced parts. This approximately segments the alapana into phrases. These voiced regions are used instead of the entire alapana. In the first pass, stationary points of the query motif and the test alapana are used, and the motif regions, or groups, are retrieved along with their scores. Each group corresponds either to a motif or to a false alarm. A true group consists of one or more true positives. The score distributions of the true positive groups and false alarm groups for both algorithms are shown in Figure 3.4. The mean of the false-alarm scores is subtracted from each score value, so that the mean of the false alarms' distribution becomes zero for both algorithms. This enables a better comparison between the RLCS and modified-RLCS algorithms. The overlap between the score distributions of the true positive groups and the false alarm groups is smaller for the modified-RLCS algorithm than for the RLCS algorithm.
Motifs are sparsely present in an alapana. Our purpose is to retrieve as many
motifs as possible. Spotting all or most of the motifs is more crucial than removal
of all false alarms. Therefore, a higher penalty is given for missing a motif than for a false alarm group. The score threshold is selected from the minimum detection cost function for both algorithms. The sequences whose scores are above the
Figure 3.4: a) True positive groups' and false alarm groups' score distribution for RLCS. b) True positive groups' and false alarm groups' score distribution for modified-RLCS.
score threshold are preserved. The details of the comparison after the first pass are
shown in Table 3.8. Modified-RLCS has shown a clear improvement over RLCS in
terms of false alarms and average duration of true positives and false alarms.
Table 3.8: Long Motifs: Retrieved regions after the first pass
Raga identification by machine is a difficult task in Carnatic music. This is primarily because a raga is not defined just by the solfege but by svaras (ornamented notes) [27]. The melodic histograms obtained for Carnatic music are more or less continuous owing to the gamaka-laden svaras of the raga [44]. As discussed in Chapter 2, the svaras in Carnatic music are not quantifiable, but for notational purposes an octave is divided into 12 semitones: S, R1, R2(G1), R3(G2), G3, M1, M2, P, D1, D2(N1), D3(N2) and N3. Each raga is characterised by at least four or five svaras. Arohana and avarohana correspond to an ordering of svaras in the ascent and descent of the raga, respectively. Ragas with a linear ordering of svaras are referred to as linear ragas, such as the Mohonam raga (S R2 G3 P D2 S). Similarly, non-linear ragas have a non-linear ordering, such as the Ananda Bhairavi raga (S G2 R2 G2 M1 P D2 P S).
A further complication arises owing to the fact that although the svaras in different
ragas may be identical, the ordering can be different. Even if the ordering is the
same, in one raga the approach to the svara can be different, for example, Thodi and
Dhanyasi [27].
In this chapter, this problem is addressed in a different way. The objective
is to mimic a listener in a Carnatic music concert. There are at least 100 ragas
that are actively performed today. Most listeners identify ragas by referring to the
compositions with similar motivic patterns that they might have heard before. In
raga verification, a raga's name (the claim) and an audio clip are supplied. The machine has to verify whether the clip belongs to the given raga or not.
This task therefore requires the definition of cohorts for a raga. As discussed
in Chapter 4, cohorts of a given raga are the ragas which have similar movements
while at the same time have subtle differences, for example, darbar and nayaki. In
darbar raga, G2 is repeated twice in avarohana. The first is more or less flat and short,
while the second repetition is inflected. The G2 in nayaki is characterised by a very
typical gamaka. In order to verify whether a given audio clip belongs to a claimed
raga, the similarity is measured with respect to the claimed raga and compared with
its cohorts using a novel algorithm called Longest Common Segment Set (LCSS).
LCSS scores are then normalized using Z and T norms [1, 34].
The rest of the chapter is organised as follows. Section 5.2 describes the dataset
used in the study. Section 5.3 describes the LCSS algorithm and its relevance for
raga verification. As the task is raga verification, score normalisation is crucial.
Different score normalisation techniques are discussed in Section 5.4. The experimental results are presented in Section 5.5 and discussed in Section 5.6. The main conclusions drawn from the key results in this work are discussed in Section 5.7.
5.2 Dataset used
Table 5.1 gives the details of the dataset used in this work. This dataset is obtained from the Charsur Arts Foundation1. The dataset consists of 254 vocal and instrumental live recordings spread across 30 ragas, including both target ragas and their cohorts. For every new raga that needs to be verified, templates for the raga and
1http://www.charsurartsfoundation.org
Table 5.1: Details of the database used. Durations are given in approximate hours (h), minutes (m) or seconds (s).

                                       Vocal              Instruments                    Total
                                   Male   Female   Violin  Veena  Saxophone  Flute
Number of Ragas                     25      27        8      3        2        2      30 (distinct)
Number of Artists                   53      37        8      3        1        3      105
Number of Recordings               134      97       14      4        2        3      254
Total Duration of Recordings       30 h    22 h      3 h   31 m     10 m     58 m     57 h
Number of Pallavi Lines            655     475       69     20       10       15      1244
Average Duration of Pallavi Lines  11 s     8 s     10 s    6 s      6 s      8 s     8 s (avg.)
Total Duration of Pallavi Lines     2 h     1 h     11 m    2 m     55 s      2 m     3 h
its cohorts are required.
5.2.1 Extraction of pallavi lines
A composition in Carnatic music is composed of three parts, namely, pallavi, anupallavi and charanam. It is believed that the first phrase of the pallavi line of a composition contains the important movements in a raga. A basic sketch is initiated in the pallavi line and developed further in the anupallavi and charanam [42]; the pallavi line therefore contains the gist of the raga. The algorithm described in [42] is used for extracting pallavi lines from compositions. Details of the extracted pallavi lines are given in Table 5.1. Experiments are performed on template and test recordings selected from these pallavi lines, as discussed in greater detail in Section 5.5.
5.2.2 Selection of cohorts
Wherever possible, 4-5 ragas are chosen as cohorts of every raga. The cohorts of every raga were defined by a professional musician. Professionals are very careful about this, as they need to ensure that during improvisation they do not accidentally sketch the cohort. Interestingly, as indicated by the musicians, cohorts need not be symmetric. A raga A can be similar in movement to a raga B, but raga
B need not share the same commonality with raga A. The identity of raga B
may depend on phrases similar to raga A with some additional movement. For
example, to identify the raga Hindolam, the phrase G2 M1 D1 N2 S is adequate,
while Jayantashree raga requires the phrase G2 M1 D1 N2 S N2 D1 P M1 G2 S.
5.3 Longest Common Segment Set Algorithm
In raga verification, matching needs to be performed between two audio clips. The
number of similar portions could be more than one and spread across the entire
clip. Therefore, there is a need for a matching approach that can find these similar
portions without issuing large penalties for gaps in between them. In this section, a
novel algorithm called Longest Common Segment Set is described which attempts
to do the same.
Let X = ⟨x1, · · · , xm; xi ∈ R; i = 1 · · · m⟩ be a sequence of m symbols and Y = ⟨y1, · · · , yn; yj ∈ R; j = 1 · · · n⟩ be a sequence of n symbols, where xi and yj are the tonic-normalized pitch values in cents. The similarity between two pitch values, xi and yj, is calculated using (4.11), defined in Chapter 4.
A common subsequence S_{XY} of sequences X and Y is defined as

S_{XY} = \left\langle (x_{i_1}, y_{j_1}), \cdots, (x_{i_p}, y_{j_p}) \right\rangle
\quad \text{such that} \quad
\begin{cases}
1 \le i_1 < \cdots < i_p \le m \\
1 \le j_1 < \cdots < j_p \le n \\
\operatorname{sim}(x_{i_k}, y_{j_k}) \ge \tau_{sim}, \;\; k = 1, \cdots, p
\end{cases}
\qquad (5.1)
Figure 5.1: An example of a common segment set between two sequences representing real data
where τsim is a threshold which decides the membership of the symbol pair (xik , y jk)
in a subsequence SXY. The value of τsim is decided empirically based on the domain
of the problem, as discussed in Section 5.5. An example of a common subsequence is shown in red in Figure 5.1.
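The membership conditions of (5.1) can be checked directly. In the sketch below, the similarity function is a stand-in (linear decay with the distance in cents), not the thesis's Eq. (4.11), which is not reproduced in this excerpt.

```python
def is_common_subsequence(X, Y, pairs, sim, tau_sim):
    """Check Eq. (5.1): index pairs strictly increasing in both sequences,
    and every paired symbol at least tau_sim similar."""
    ordered = all(i1 < i2 and j1 < j2
                  for (i1, j1), (i2, j2) in zip(pairs, pairs[1:]))
    similar = all(sim(X[i], Y[j]) >= tau_sim for i, j in pairs)
    return ordered and similar

# Stand-in similarity: 1 at identical pitch, decaying linearly with the
# distance in cents (an assumption, not the thesis's Eq. 4.11).
sim = lambda a, b: max(0.0, 1.0 - abs(a - b) / 1200.0)
```

A pair list that crosses (i.e. goes backwards in either sequence) fails the ordering condition regardless of similarity.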
5.3.1 Common segments
Contiguous runs of symbol pairs in a common subsequence are referred to as segments. Two different types of segments are defined, namely, hard and soft segments.
A hard segment is a group of common subsequence symbols with no gaps in between, as shown in green in Figure 5.1. A hard segment starting with the symbol pair (x_i, y_j) must then be of the form

H^{l}_{X_i Y_j} = \left\langle (x_i, y_j), (x_{i+1}, y_{j+1}), \cdots, (x_{i+l}, y_{j+l}) \right\rangle
\quad
\begin{cases}
1 \le i < i+1 < \cdots < i+l \le m \\
1 \le j < j+1 < \cdots < j+l \le n
\end{cases}
\qquad (5.2)

where l + 1 represents the length of the hard segment. The score of the kth hard segment H^{l}_{X_{i_k} Y_{j_k}} is defined as

hc\left(H^{l}_{X_{i_k} Y_{j_k}}\right) = \sum_{d=0}^{l} \operatorname{sim}\left(x_{i_k+d},\, y_{j_k+d}\right)
\qquad (5.3)
A soft segment is a group of common subsequence symbols in which gaps are permitted with a penalty. A soft segment therefore consists of one or more hard segments (shown in blue in Figure 5.1). The gaps between the hard segments decide the penalty assigned. Thus, the score of the kth soft segment S_{X_{i_k} Y_{j_k}}, consisting of r hard segments, is defined as

sc\left(S_{X_{i_k} Y_{j_k}}\right) = \sum_{s=1}^{r} hc\left(H^{l_s}_{X_{i_s} Y_{j_s}}\right) - \gamma \eta
\qquad (5.4)

where γ is the total number of gaps between the r hard segments and η is the penalty for each gap. The number of hard segments to be included in a soft segment is decided by the running score of the soft segment. The running score of a soft segment increases during a hard segment and decreases during a gap due to the penalties, as shown in gray-scale in Figure 5.1. During a gap, if the running score
decreases below a threshold τrc (i.e., becomes almost white in Figure 5.1), then that gap is discarded and all the hard segments encountered before it are grouped into one soft segment.
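The running-score mechanism can be sketched as a single pass over the alignment. Representing the alignment as per-step similarity contributions (with None for a gap step) is a simplification of the full DP in Algorithm 1, made here only to isolate the segment-closing rule.

```python
def soft_segments(contribs, eta, tau_rc):
    """Close soft segments with the running-score rule: the score grows by
    the similarity inside hard segments and drops by eta per gap step;
    falling below tau_rc closes the current soft segment.
    contribs: per-step similarity (> 0) inside a hard segment, None for a gap.
    Returns the list of soft-segment scores, as in Eq. (5.4)."""
    scores, running, seg_score = [], 0.0, 0.0
    for c in contribs:
        if c is not None:               # inside a hard segment
            running += c
            seg_score = running         # score as of the last matched pair
        else:                           # a gap step: pay the penalty
            running -= eta
            if running < tau_rc:        # gap too long: close the segment
                if seg_score > 0:
                    scores.append(seg_score)
                running, seg_score = 0.0, 0.0
    if seg_score > 0:
        scores.append(seg_score)
    return scores
```

A short gap that keeps the running score above τrc is bridged (its η penalties are charged to the segment score, matching the γη term of (5.4)), while a long gap closes the segment at its last hard-segment symbol.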
5.3.2 Common segment set
All segments together constitute a segment set. The score of a segment set (ss) is defined as

score\left(ss_{XY}\right) = \frac{\sum_{k=1}^{p} c\left(Z_{X_{i_k} Y_{j_k}}\right)^2}{\min(m, n)^2}
\qquad (5.5)
where p is the number of segments, c refers to the score computed in either (5.3) or (5.4), and Z refers to a segment (hard or soft). This equation gives preference to longer segments. For example, in case 1 there are 10 segments, each of length 2, and in case 2 there are 4 segments, each of length 5. In both cases the total length of the segments is 20, but under (5.5), case 1 is scored as 0.1 and case 2 as 0.25 when the denominator is taken to be 20². Longer matched segments can be considered a phrase or an essential part of one, whereas shorter matched segments generally indicate noise. Therefore, shorter segments are penalized more heavily.
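Equation (5.5) and the worked example can be reproduced in a few lines (with segment scores standing in for lengths, as when every matched pair has similarity 1):

```python
def segment_set_score(segment_scores, m, n):
    """Eq. (5.5): the sum of squared segment scores over min(m, n)^2,
    which favours a few long segments over many short ones."""
    return sum(c * c for c in segment_scores) / min(m, n) ** 2

# The example from the text, with the denominator taken as 20^2:
case1 = segment_set_score([2] * 10, 20, 20)   # ten segments of length 2 -> 0.1
case2 = segment_set_score([5] * 4, 20, 20)    # four segments of length 5 -> 0.25
```

Squaring before summing is what makes the two cases differ despite equal total length.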
5.3.3 Longest Common Segment Set
The Longest Common Segment Set (LCSS) is the segment set with the maximum score value, as defined in (5.6):

lcss_{XY} = \operatorname*{argmax}_{ss_{XY}} \; score\left(ss_{XY}\right)
\qquad (5.6)

Therefore, the LCSS can be obtained by maximizing the score in (5.5) using dynamic programming.
Dynamic Programming algorithm to find Longest Common Segment Set
The algorithm to find the optimum soft segment set is given in Algorithm 1. Optimum hard segment sets are found similarly. In the algorithm, tables c and s are used for storing the running score and the score of the common segment sets, respectively. Table a is used for storing the partial scores from s. Table d is maintained for backtracking the path of the LCSS. The arrows represent the subpath to take while backtracking (up, left or diagonal). The input sequences to the LCSS function are appended with symbols φx and φy whose similarity with any symbol is 0. This is mainly required to compute the last row and column of the score table. Line 8 in Algorithm 1 updates the running score with a value based on the similarity, whereas line 9 updates the score using the previous diagonal entry. When symbols are dissimilar, a gap is found. Lines 12 and 19 are used to penalize the running score. If it is the end of a segment, then lines 14 and 21 update the score as per (5.5). Line 26 updates table a with the score value of the current segment set when the beginning of a new segment is encountered. When a gap is encountered, line 28 updates table a to −1. To find the Longest Common Segment Set, backtracking is performed to obtain the path in table d that has the maximum score as given by table s. The boundaries of soft segments can be found using the cost values while tracing the path.
Algorithm 1 Algorithm for Soft-Longest Common Segment Set
Data:
  c - table of size (m + 2) × (n + 2) for storing the running score
  s - table of size (m + 2) × (n + 2) for storing the score
  d - table of size (m + 2) × (n + 2) for path tracking
  a - table of size (m + 2) × (n + 2) for storing partial scores
 2: Initialize the 1st row and column of c, s, d and a to 0
 3: p ← min(m, n)
 4: for i ← 1 to m + 1 do
 5:   for j ← 1 to n + 1 do
 6:     if sim(xi, yj) > τsim then
 7:       di,j ← "↖"
 8:       ci,j ← ci−1,j−1 + (sim(xi, yj) − τsim) / (1 − τsim)
 9:       si,j ← si−1,j−1
10:     else if ci−1,j > ci,j−1 then
11:       di,j ← "↑"
12:       ci,j ← max(ci−1,j − ρ, 0)
13:       if di−1,j = "↖" then
testing. This partitioning of the dataset is done in two ways, referred to as D1 and D2. In D1, the variations of a pallavi line may fall into both the template and test sets, though this is not guaranteed. Variations of a pallavi line differ from the pallavi line due to improvisations. In D2, these variations either all belong to the template set or all belong to the test set, but are strictly not present in both. The values of the thresholds τsim and τrc are empirically chosen as 0.45 and 0.5, respectively. The penalty η, issued for gaps in segments, is empirically chosen as 0.5.
5.5.2 Results
Table 5.2 and Figure 5.2 show the comparison of LCSS with DTW and RLCS using different normalizations. The Equal Error Rate (EER) refers to the point where the false alarm rate and the miss rate are equal. For T-norm, the best 20 cohort scores were used for normalization. LCSS (soft) with T-norm performs best for D1 around the EER point, and for high miss rates and low false alarms, whereas it performs worse than LCSS (hard) for low miss rates and high false alarms. This behavior appears to be reversed for D2. The magnitude around the EER is much greater for D2. This
Figure 5.2: DET curves comparing the LCSS algorithm with other algorithms using different score normalizations. a) DET curves for dataset D1; b) DET curves for dataset D2.
is because none of the variations of the pallavi lines in the test set are present in the templates. RLCS performs worse than the other algorithms for D2. The curves also show no improvement for Z-norm compared to the baseline with no normalization. This can happen due to the way the normalization parameters are estimated for Z-norm. For example, some of the templates, which may not be similar to the test, can be similar to some of the cohorts' templates, resulting in a higher mean. This would not happen with T-norm, where the test itself is scored against the cohorts' templates.
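The two normalizations, as used in speaker verification [1, 34], can be sketched generically; the exact parameter-estimation protocol of this work (e.g. selecting the best 20 cohort scores for T-norm) is not reproduced here.

```python
import statistics

def z_norm(score, template_cohort_scores):
    """Z-norm: normalize with statistics of the claimed raga's templates
    scored against cohort data, estimated offline (before the test clip
    is seen)."""
    mu = statistics.mean(template_cohort_scores)
    sd = statistics.stdev(template_cohort_scores)
    return (score - mu) / sd

def t_norm(score, test_cohort_scores):
    """T-norm: normalize with statistics of the test clip itself scored
    against the cohort templates, computed at test time."""
    mu = statistics.mean(test_cohort_scores)
    sd = statistics.stdev(test_cohort_scores)
    return (score - mu) / sd
```

The asymmetry discussed above follows from where the statistics come from: Z-norm statistics never see the test clip, so a template that happens to resemble the cohorts inflates the mean for every test.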
5.6 Discussion
In this section, we discuss how LCSS (hard) and LCSS (soft) can be combined
to achieve better performance. We also verify that T-norm reduces the overlap
between true and imposter scores.
Table 5.3: Number of claims correctly verified by hard-LCSS only, by soft-LCSS only, by both and by neither of them for D1 and D2 using T-norm

Dataset  Claim type  Hard-only  Soft-only  Both   Neither
D1       True            23         55      289      77
D1       False           46         78     1745      54
D2       True            47         23      155     220
D2       False           99         75     1585     168
5.6.1 Combining hard-LCSS and soft-LCSS
Instead of selecting a threshold, we assume that a true claim is correctly verified when its score is greater than all the cohort scores. Similarly, a false claim is correctly verified when its score is less than at least one of the cohort scores. Table 5.3 shows the number of claims correctly verified only by hard-LCSS, only by soft-LCSS, by both, and by neither of them. It is clear that there is an overlap between the correctly verified claims of hard-LCSS and soft-LCSS. Nonetheless, the number of claims distinctly verified by each is also significant. Therefore, a combination of these two algorithms could result in better performance.
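The threshold-free decision rule above can be stated in a few lines:

```python
def claim_correctly_verified(claim_is_true, claim_score, cohort_scores):
    """Decision rule of this section: a true claim is correctly verified if
    its score beats every cohort score; a false claim is correctly verified
    if at least one cohort score beats it."""
    if claim_is_true:
        return all(claim_score > c for c in cohort_scores)
    return any(claim_score < c for c in cohort_scores)
```

Note the asymmetry: a single strong cohort score is enough to reject a false claim, but a true claim must dominate all of its cohorts.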
5.6.2 Reduction of overlap in score distribution by T-norm
Figure 5.3 shows the effect of T-norm on the distribution of hard-LCSS scores. It is
clearly seen that the overlap, between the true and imposter score distributions, is
reduced significantly. For visualization purposes, the true score distributions are
scaled to zero mean and unit variance and corresponding imposter score distribu-
tions are scaled appropriately.
Figure 5.3: The effect of T-norm on the distribution of LCSS (hard) scores (left: without normalization; right: with T-norm).
5.6.3 Scalability of raga verification
The verification of a raga depends only on the number of its cohort ragas, which is usually 4 or 5. Since it does not depend on all the ragas in the dataset, as raga identification does, any number of ragas can be added to the dataset.
5.7 Summary
In this chapter, we have presented a different approach to raga analysis in Carnatic music. Instead of raga identification, raga verification is performed. A set of cohorts is defined for every raga. An audio clip is presented along with a claimed raga identity. The claim is verified by comparing the clip with the templates of the claimed raga and its cohorts using a novel approach, the Longest Common Segment Set (LCSS). A set of 17 ragas and their cohorts, constituting 30 ragas in total, is tested using appropriate score normalization techniques. An equal error rate of about 12% is achieved. This approach is scalable to any number of ragas, as only a given raga and its cohorts need to be added to the system.
CHAPTER 6
Conclusion
Typical motifs of a raga are used to establish its identity in all improvisational and
compositional forms. Along with raga identity, typical motifs can also be used to
index a recording for archival purposes. Further, indexed motifs can also be used
to explore and to analyze the melodic phrases connecting them, which could be
useful for both listeners and learners of Carnatic music.
The objective of this thesis was to develop algorithmic techniques for automatic
extraction of typical motifs and for performing raga verification using the regions
replete with typical motifs. Some of the salient points presented in this thesis are
as follows:
6.1 Salient Points
• It was shown using pitch histograms that the notes in Carnatic music have a greater pitch range compared to Hindustani music and Western classical music. This renders the symbolic representation of Carnatic music a non-trivial task and poses significant challenges in the analysis of Carnatic music.

• The stationary points of the pitch contour were shown to preserve the essential raga information, although the exact melodic information was lost. For the task of finding different renditions of typical motifs, these stationary points were used to reduce the search space. A measure based on the slope of the linear trend in stationary points, along with its standard deviation, is used to reduce the false alarms.

• An algorithm was proposed for time-series matching which is a modification of an existing algorithm known as the Rough Longest Common Subsequence. This algorithm can match shorter sequences that are common between two longer sequences. However, the score was penalized with respect to the length of the longer sequences. Therefore, matched shorter sequences can get low scores, suggesting that the match is poor even when it is good.

• The second algorithm proposed for time-series matching, known as the Longest Common Segment Set, was novel. It can also match shorter sequences that are common between two longer sequences, but the score is not penalized with respect to the length of the longer sequences. Therefore, it was more effective in the extraction of the common shorter sequences.

• Typical motifs of approximately four seconds' duration were found to be more relevant for raga identity. Shorter motifs had less context and resulted in a great deal of false alarms.

• Typical motifs were found to be prevalent in the pallavi lines of compositions. Therefore, these pallavi lines were used in the task of raga verification.

• In raga verification, cohort ragas (usually four or five) were used for normalizing the score instead of all the ragas in the dataset. Therefore, the proposed raga verification system was found to be scalable to any number of ragas. For a new raga to be added into the system, only the templates of the new raga and its cohorts were required, without altering the existing system.
6.2 Criticism of the work
In this section, we discuss the shortcomings of the approaches proposed in this thesis.

• The proposed algorithms for time-series matching require that the ordering of the common shorter sequences be the same in both the longer sequences. If the ordering is different, not all the common shorter sequences are matched.

• The algorithms also fail to match sequences if they are in different octaves.

• The performance of the algorithms is also sensitive to pitch errors. This problem is dealt with to some extent by smoothing the pitch contours when the pitch errors are not significantly large.

• Typical motifs are retrieved only if they repeat across the composition lines. Therefore, this approach relies on a large number of composition lines.

• Raga verification also needs a large number of composition lines (templates) so that most of the typical motifs are represented.
6.3 Future work
Given the drawbacks of the proposed approaches listed in the previous section, the following improvements can be made:

• For time-series matching, when the ordering of the common shorter sequences is different, no single alignment can align all of them. In such situations, different alignments can be inspected to extract all the common shorter sequences irrespective of their order.

• For matching sequences that belong to different octaves, one of the two sequences can be shifted to different octaves and the matching can be performed with all the shifted sequences.

• Instead of using pitch to represent the melody, a transformation of the frequency spectrum can be used that reduces other noise and preserves the melody. This will help in improving the performance of the algorithms, which are sensitive to pitch errors.
LIST OF PAPERS BASED ON THESIS
1. Shrey Dutta, Krishnaraj Sekhar PV and Hema A. Murthy. Raga Verification in Carnatic Music using Longest Common Segment Set. In Proceedings of the 16th International Society for Music Information Retrieval Conference, 2015.

2. Shrey Dutta and Hema A. Murthy. Discovering Typical Motifs of a Raga from One-Liners of Songs in Carnatic Music. In Proceedings of the 15th International Society for Music Information Retrieval Conference, pages 397–402, 2014.

3. Shrey Dutta and Hema A. Murthy. A Modified Rough Longest Common Subsequence Algorithm for Motif Spotting in an Alapana of Carnatic Music. In 20th National Conference on Communications (NCC), pages 1–6, 2014.

4. Vignesh Ishwar, Shrey Dutta, Ashwin Bellur and Hema A. Murthy. Motif Spotting in an Alapana in Carnatic Music. In Proceedings of the 14th International Society for Music Information Retrieval Conference, pages 499–504, 2013.
REFERENCES
[1] Roland Auckenthaler, Michael Carey, and Harvey Lloyd-Thomas. Score normalization for text-independent speaker verification systems. Digital Signal Processing, 10:42–54, 2000.
[2] Ashwin Bellur and Hema A Murthy. A cepstrum based approach for identifying tonic pitch in Indian classical music. In National Conference on Communications, pages 1–5, 2013.
[3] Yueguo Chen, Mario A. Nascimento, Beng Chin Ooi, and Anthony K. H. Tung. SpADe: On shape-based pattern detection in streaming time series. In International Conference on Data Engineering, pages 786–795, 2007.
[4] Bill Chiu, Eamonn Keogh, and Stefano Lonardi. Probabilistic discovery of time series motifs. In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 493–498, 2003.
[5] P Chordia and A Rae. Raag recognition using pitch-class and pitch-class dyad distributions. In Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), pages 431–436, 2007.
[6] Tom Collins, Andreas Arzt, Sebastian Flossmann, and Gerhard Widmer. SIARCT-CFP: Improving precision and the discovery of inexact musical patterns in point-set representations. In International Society for Music Information Retrieval, pages 549–554, 2013.
[7] Tom Collins, Jeremy Thurlow, Robin Laney, Alistair Willis, and Paul H. Garthwaite. A comparative evaluation of algorithms for discovering translational patterns in baroque keyboard works. In International Society for Music Information Retrieval, pages 3–8, 2010.
[8] Darrell Conklin. Discovery of distinctive patterns in music. Intelligent Data Analysis, pages 547–554, 2010.
[9] Darrell Conklin. Distinctive patterns in the first movement of Brahms' String Quartet in C minor. Journal of Mathematics and Music, 4(2):85–92, 2010.
[10] Jonathan D. Cryer and Kung-Sik Chan. Time Series Analysis: with Applications in R. Springer, 2008.
[11] Pranay Dighe, Parul Agarwal, Harish Karnick, Siddartha Thota, and Bhiksha Raj. Scale independent raga identification using chromagram patterns and swara based features. In IEEE International Conference on Multimedia and Expo Workshops, San Jose, CA, USA, pages 1–4, 2013.
[12] Pranay Dighe, Harish Karnick, and Bhiksha Raj. Swara histogram based structural analysis and identification of Indian classical ragas. In Proceedings of the 14th International Society for Music Information Retrieval Conference (ISMIR), Curitiba, Brazil, pages 35–40, 2013.
[13] Subbarama Dikshitulu. Sangita Sampradaya Pradarsini. The Music Academy Madras, Vol. 2, 2011.
[14] D.P.W. Ellis and G.E. Poliner. Identifying 'cover songs' with chroma features and dynamic programming beat tracking. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, volume 4, pages 1429–1432, 2007.
[15] F. N. Fritsch and R. E. Carlson. Monotone piecewise cubic interpolation. SIAM Journal on Numerical Analysis, 17(2), 1980.
[16] Toni Giorgino. Computing and visualizing dynamic time warping alignments in R: The dtw package. Journal of Statistical Software, 31(7):1–24, 2009.
[17] AnYuan Guo and Hava Siegelmann. Time-warped longest common subsequence algorithm for music retrieval. In Proceedings of the 5th International Conference on Music Information Retrieval (ISMIR), 2004. http://works.bepress.com/hava_siegelmann/13.
[18] H G Ranjani, S Arthi, and T V Sreenivas. Shadja, swara identification and raga verification in alapana using stochastic models. In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), pages 29–32, 2011.
[19] I S H Suyoto, A L Uitdenbogerd, and F Scholer. Searching musical audio using symbolic queries. IEEE Transactions on Audio, Speech, and Language Processing, 16(2):372–381, 2008.
[20] Vignesh Ishwar, Ashwin Bellur, and Hema A Murthy. Motivic analysis and its relevance to raga identification in Carnatic music. In Workshop on Computer Music, Istanbul, Turkey, July 2012. http://compmusic.upf.edu/publications.
[21] J Serra, G K Koduri, M Miron, and X Serra. Tuning of sung Indian classical music. In Proceedings of the 12th International Society for Music Information Retrieval Conference (ISMIR), pages 157–162, 2011.
[22] Berit Janssen, W. Bas de Haas, Anja Volk, and Peter van Kranenburg. Discovering repeated patterns in music: state of knowledge, challenges, perspectives. International Symposium on Computer Music Modeling and Retrieval (CMMR), pages 225–240, 2013.
[23] Gopala Krishna Koduri, Sankalp Gulati, and Preeti Rao. A survey of raaga recognition techniques and improvements to the state-of-the-art. Sound and Music Computing, 2011.
[24] Gopala Krishna Koduri, Sankalp Gulati, Preeti Rao, and Xavier Serra. Raga recognition based on pitch distribution methods. Journal of New Music Research, 41(4):337–350, 2012.
[25] A.S. Krishna, P.V. Rajkumar, K.P. Saishankar, and M. John. Identification of Carnatic raagas using hidden Markov models. In IEEE 9th International Symposium on Applied Machine Intelligence and Informatics (SAMI), pages 107–110, January 2011.
[26] T. M. Krishna. A Southern Music: The Karnatic Story, chapter 5. HarperCollins, India, 2013.
[27] T M Krishna and Vignesh Ishwar. Carnatic music: svara, gamaka, motif and raga identity. In Workshop on Computer Music, Istanbul, Turkey, July 2012. http://compmusic.upf.edu/publications.
[28] A Krishnaswamy. Application of pitch tracking to South Indian classical music. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 557–560, 2003.
[29] V. Kumar, H. Pandya, and C.V. Jawahar. Identifying ragas in Indian music. In 22nd International Conference on Pattern Recognition (ICPR), pages 767–772, 2014.
[30] Hwei-Jen Lin, Hung-Hsuan Wu, and Chun-Wei Wang. Music matching based on rough longest common subsequence. Journal of Information Science and Engineering, pages 95–110, 2011.
[31] Lie Lu, Muyuan Wang, and Hong-Jiang Zhang. Repeating pattern discovery and structure analysis from acoustic music data. In Proceedings of the 6th ACM SIGMM International Workshop on Multimedia Information Retrieval, pages 275–282, 2004.
[32] David Meredith, Kjell Lemstrom, and Geraint A. Wiggins. Algorithms for discovering repeated patterns in multidimensional representations of polyphonic music. Journal of New Music Research, pages 321–345, 2002.
[33] Meinard Muller, Frank Kurth, and Michael Clausen. Audio matching via chroma-based statistical features. In Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), pages 288–295, 2005.
[34] Jiri Navratil and David Klusacek. On linear DETs. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pages 229–232, 2007.
[35] Gaurav Pandey, Chaitanya Mishra, and Paul Ipe. Tansen: A system for automatic raga identification. In Indian International Conference on Artificial Intelligence, pages 1350–1363, 2003.
[36] Pranav Patel, Eamonn Keogh, Jessica Lin, and Stefano Lonardi. Mining motifs in massive time series databases. In Proceedings of the IEEE International Conference on Data Mining (ICDM), pages 370–377, 2002.
[37] William H. Press, Saul A. Teukolsky, William T. Vetterling, and Brian P. Flannery. Numerical Recipes in C: The Art of Scientific Computing. Second Edition. Cambridge University Press, 1992.
[38] P. Rao, J. Ch. Ross, K. K. Ganguli, V. Pandit, V. Ishwar, A. Bellur, and H. A. Murthy. Melodic motivic analysis of Indian music. Journal of New Music Research, 43(1):115–131, 2014.
[39] Joe Cheri Ross, Vinutha T. P., and Preeti Rao. Detecting melodic motifs from audio for Hindustani classical music. In Proceedings of the 13th International Society for Music Information Retrieval Conference (ISMIR), pages 193–198, 2012.
[40] Joe Cheri Ross and Preeti Rao. Detection of raga-characteristic phrases from Hindustani classical music audio. Workshop on Computer Music, 2012. http://compmusic.upf.edu/publications.
[41] Justin Salamon and Emilia Gomez. Melody extraction from polyphonic music signals using pitch contour characteristics. IEEE Transactions on Audio, Speech, and Language Processing, 20(6):1759–1770, August 2012.
[42] Sridharan Sankaran, Krishnaraj P V, and Hema A Murthy. Automatic segmentation of composition in Carnatic music using time-frequency CFCC templates. In Proceedings of the 11th International Symposium on Computer Music Multidisciplinary Research (CMMR), 2015.
[43] J. Serra, E. Gomez, P. Herrera, and X. Serra. Chroma binary similarity and local alignment applied to cover song identification. IEEE Transactions on Audio, Speech, and Language Processing, 16(6):1138–1151, August 2008.
[44] Joan Serra, Gopala K. Koduri, Marius Miron, and Xavier Serra. Assessing the tuning of sung Indian classical music. In Proceedings of the 12th International Society for Music Information Retrieval Conference (ISMIR), Miami, Florida, USA, pages 157–162, 2011.
[45] Sankalp Gulati, Joan Serra, and Xavier Serra. An evaluation of methodologies for melodic similarity in audio recordings of Indian art music. In Proceedings of the 40th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 678–682, April 2015.
[46] Surendra Shetty. Raga mining of Indian music by extracting arohana-avarohana pattern. International Journal of Recent Trends in Engineering, 1(1), 2009.
[47] Rajeswari Sridhar and T V Geetha. Raga identification of Carnatic music for music information retrieval. International Journal of Recent Trends in Engineering, 1(1):1–4, 2009.
[48] M Subramanian. Carnatic ragam Thodi: pitch analysis of notes and gamakams. Journal of the Sangeet Natak Akademi, XLI(1):3–28, 2007.
[49] D Swathi. Analysis of Carnatic music: A signal processing perspective. M.Tech. Thesis, IIT Madras, 2009.
[50] Alexandra L. Uitdenbogerd and Justin Zobel. Manipulation of music for melody matching. In Proceedings of the Sixth ACM International Conference on Multimedia (MULTIMEDIA '98), pages 235–240, 1998.