Acknowledgement: Yvonne Lee, Paul Chan, Dong Minghui International Symposium on Next-Generation Artificial Intelligence 3 March 2016, Tokyo Personalized Singing Synthesis - Science Meeting Arts Haizhou Li Institute for Infocomm Research, A*STAR Singapore
Acknowledgement:
Yvonne Lee, Paul Chan, Dong Minghui
International Symposium on Next-Generation Artificial Intelligence
3 March 2016, Tokyo
Personalized Singing Synthesis - Science Meeting Arts
Haizhou Li
Institute for Infocomm Research, A*STAR
Singapore
1. Speech
2. Singing
3. Music
4. Speech to Singing
Agenda: Personalized Singing Synthesis
• Speech is the most natural way of human communication.
• Singing augments regular speech through the use of both tonality and rhythm.
• Music is an artistic way of expression, with instrumental sounds and vocals.
• Singing is emotional musical vocalization. It is an emotional expression of feelings which has the power to alter the mood of both the singer and the listener.
• Singing requires expert knowledge and operations. Vocalists acquire singing skills by extensive training.
• By singing, we
– Reduce stress and improve mood
– Lower blood pressure
– Help improve sleep
– Reduce perceived pain
– Motivate and empower …
Singing – a part of our life
• Almost all singers develop vibrato during voice training. Vibrato is a periodic modulation of the pitch frequency. It relaxes the demand for pitch accuracy, and singers can use it artistically for expressive purposes [Sundberg87].
[Sundberg87] J. Sundberg, The Science of the Singing Voice, Northern Illinois University Press, 1987.
Singing Theory – Vibrato
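As a rough illustration of vibrato as a periodic modulation of the pitch frequency, the sketch below generates an F0 contour with a sinusoidal deviation around a note. The function name `vibrato_f0`, the 5.5 Hz rate, the 50-cent depth, and the 100 frames-per-second resolution are illustrative assumptions, not values from the talk.

```python
import math

def vibrato_f0(base_hz, rate_hz=5.5, depth_cents=50.0, duration_s=1.0, frame_rate=100):
    """Return one F0 value (Hz) per frame for a note sung with sinusoidal vibrato.

    rate_hz and depth_cents are assumed, illustrative settings.
    """
    n_frames = int(round(duration_s * frame_rate))
    contour = []
    for i in range(n_frames):
        t = i / frame_rate
        # Deviation from the note's pitch in cents (100 cents = 1 semitone).
        cents = depth_cents * math.sin(2.0 * math.pi * rate_hz * t)
        # Convert the deviation in cents into a frequency ratio.
        contour.append(base_hz * 2.0 ** (cents / 1200.0))
    return contour

contour = vibrato_f0(440.0)  # A4 held for one second
```

Because the deviation is expressed in cents, the same depth setting sounds equally wide on low and high notes.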
• By varying the vocal tract to achieve different voice spectra, singers may produce different voice timbres, styles, and achieve efficient sound transmission
[Wolfe09] J. Wolfe, M. Garnier, and J. Smith, Voice Acoustics: An Introduction to the Science of Speech and Singing, 2009.
[Sundberg77] J. Sundberg, The Acoustics of the Singing Voice, Scientific American, 1977.
• Singing formant
By adjusting the larynx and the vocal tract near the glottis, singers produce a singing formant, which lets the voice project farther [Wolfe09]
Singing Theory – Singing Formant (Tenor)
La Lore Loo Ler Lee
low pitch
high pitch
high pitch, order changed
[Joliveau04] E. Joliveau, J. Smith, and J. Wolfe, “Tuning of vocal tract resonance by sopranos”, Nature, vol. 427, pp. 116, Jan. 2004.
[Wolfe09] J. Wolfe, M. Garnier, and J. Smith, Voice Acoustics: An Introduction to the Science of Speech and Singing, 2009.
[Wolfe12] J. Wolfe, Sopranos: Resonance Tuning and Vowel Changes, 2012.
[Wolfe12] [Joliveau04]
[Wolfe09]
Singing Theory – Resonance Tuning (Soprano)
Music
• A piece of music is described by a sequence of notes, showing the pitch and the relative durations
• In Western music, 12 notes of fixed frequencies are used per octave. Adjacent notes differ by a semitone (a frequency ratio of $\sqrt[12]{2} \approx 1.0595$). An octave spans 12 semitones.
Note durations (each shown on the slide as a pitched note and as a rest note): whole, half, quarter, eighth, sixteenth, thirty-second, sixty-fourth
name        C4   C4#  D4   D4#  E4   F4   F4#  G4   G4#  A4   A4#  B4
freq. (Hz)  262  277  294  311  330  349  370  392  415  440  466  494
Music Theory
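The frequency table follows from the semitone ratio. As a small sketch, assuming A4 = 440 Hz as the reference (the value listed in the table), each note is obtained by multiplying or dividing by the twelfth root of 2 once per semitone step. The names `NOTE_NAMES` and `note_frequency` are illustrative.

```python
# Note names in the order used by the table (one octave, C4 to B4).
NOTE_NAMES = ["C4", "C4#", "D4", "D4#", "E4", "F4",
              "F4#", "G4", "G4#", "A4", "A4#", "B4"]

SEMITONE = 2.0 ** (1.0 / 12.0)   # frequency ratio between adjacent notes
A4_HZ = 440.0                    # reference pitch, as listed in the table
A4_INDEX = NOTE_NAMES.index("A4")

def note_frequency(name):
    """Equal-tempered frequency in Hz of a note in the C4..B4 octave."""
    steps = NOTE_NAMES.index(name) - A4_INDEX  # signed semitone distance from A4
    return A4_HZ * SEMITONE ** steps

# Rounded to the nearest Hz, this reproduces the table row by row.
table = {name: round(note_frequency(name)) for name in NOTE_NAMES}
```

Twelve steps of this ratio multiply to exactly 2, which is why an octave doubles the frequency.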
Speech to Singing Synthesis
Are these possible? – Projecting vocal techniques in singing
– Personalizing a voice [Kenmochi10]
– Automating the accompaniment
[Kenmochi10] H. Kenmochi, “VOCALOID and Hatsune Miku phenomenon in Japan,” in Proc. Intersinging, pp. 1-4, 2010.
Singing Synthesis
[Diagram: Singing, with the components Prosody, Timbre, Content, and Synchronization]
[Diagram: Analysis and Synthesis. Speech or singing is analyzed into content, prosody (F0), and timbre; synthesis produces synthesized singing with music accompaniment]
• Input: speech or singing
• Output: time-aligned vocal and music
• Changing prosody and timbre from speech to singing vocal
Siu Wa Lee, Ling Cen, Haizhou Li, Yaozhu Paul Chan, Minghui Dong, Method and system for template-based personalized singing synthesis, US Patent: 20150025892 A1
Personalized Singing Synthesis
Example of Speech to Singing Synthesis
[Figure: pitch-versus-time panels for Speech, Singing, Syncopated Singing (horizontal nudge, time warping), Transposed Singing (shift to a higher pitch; shift to a lower pitch), and Harmonized Singing (sum of melody and accompaniment vocals)]
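The transposition and syncopation operations named in the example can be sketched on a pitch contour. The code below assumes a contour is stored as a list of (time in seconds, F0 in Hz) pairs; that representation and the function names `transpose` and `nudge` are assumptions made for illustration, not the method used in the talk.

```python
def transpose(contour, semitones):
    """Shift every F0 value by a signed number of semitones (vertical shift)."""
    ratio = 2.0 ** (semitones / 12.0)
    return [(t, f0 * ratio) for t, f0 in contour]

def nudge(contour, offset_s):
    """Shift every time stamp by offset_s seconds (horizontal nudge)."""
    return [(t + offset_s, f0) for t, f0 in contour]

# A three-note melody (C4, D4, E4), one note every half second.
melody = [(0.0, 262.0), (0.5, 294.0), (1.0, 330.0)]

higher = transpose(melody, 12)   # one octave up
lower = transpose(melody, -12)   # one octave down
late = nudge(melody, 0.25)       # every note starts a quarter second later
```

Transposition multiplies frequencies rather than adding a constant, so the musical intervals between notes are preserved.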
Personalized Singing Synthesis – voice alignment
[Figure: spectrograms (time in s, frequency in Hz from 0 to 8000) of the template singing voice, the speaking voice under conversion, and the converted singing voice, with pitch-versus-time sketches for Singing and Speech]
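One common way to align a shorter spoken utterance to a longer sung template is dynamic time warping (DTW). The sketch below is a minimal DTW over one-dimensional feature sequences; the slides do not say which alignment algorithm is used, so DTW, the absolute-difference distance, and the name `dtw_path` are all assumptions for illustration.

```python
def dtw_path(a, b):
    """Align sequences a and b; return (total cost, list of (i, j) index pairs)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # local distance between two frames
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b frame repeats
                                 cost[i][j - 1],      # b advances, a frame repeats
                                 cost[i - 1][j - 1])  # both advance
    # Backtrack from the end to recover the warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [(cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1)]
        _, i, j = min(candidates)
    path.reverse()
    return cost[n][m], path

# Toy features: each spoken frame must stretch over two sung frames.
speech = [1.0, 2.0, 3.0]
singing = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0]
total, path = dtw_path(speech, singing)
```

In practice the frames would be spectral feature vectors rather than single numbers, with a vector distance in place of the absolute difference.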
• Generalized Pitch Modeling
– To learn and generate these fluctuations note-by-note [Lee12]
• Various types of fluctuation are implicitly modeled using the same representation
[Lee12] S. W. Lee, S. T. Ang, M. Dong, and H. Li, “Generalized F0 modeling with absolute and relative pitch features for singing voice synthesis,” in Proc. ICASSP, pp. 429-432, 2012.
Generalized Pitch Modeling
Context Independent Method
Pitch Modeling
• Context Independent Method
– A fixed formula to assign pitch fluctuation patterns
• However, pitch fluctuations are context dependent, e.g. overshoot¹ in high-frequency notes differs from that in low-frequency notes, and vibrato is present more often in long notes.
1. Overshoot is a deflection exceeding the target note frequency after a note change.
2. Lee S.W.Y., Dong M.H., "Singing voice synthesis: Singer-dependent vibrato modeling and coherent processing of spectral envelope", INTERSPEECH 2011
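To make the idea of a fixed formula concrete, the sketch below produces overshoot with an underdamped second-order step response applied in the semitone domain. This particular formula, the damping of 0.5, the 8 Hz natural frequency, and the name `note_transition` are assumptions chosen for illustration; the slides do not give the formula actually used.

```python
import math

def note_transition(prev_hz, target_hz, duration_s=0.4, frame_rate=200,
                    damping=0.5, natural_hz=8.0):
    """F0 contour (Hz) moving from prev_hz to target_hz with overshoot.

    damping and natural_hz are fixed, assumed constants: the same shape is
    applied to every note change, regardless of context.
    """
    omega = 2.0 * math.pi * natural_hz
    root = math.sqrt(1.0 - damping ** 2)
    omega_d = omega * root                              # damped oscillation frequency
    interval = 12.0 * math.log2(target_hz / prev_hz)    # note change in semitones
    contour = []
    for i in range(int(round(duration_s * frame_rate))):
        t = i / frame_rate
        decay = math.exp(-damping * omega * t)
        # Unit step response: starts at 0, exceeds 1 briefly, settles at 1.
        step = 1.0 - decay * (math.cos(omega_d * t)
                              + (damping / root) * math.sin(omega_d * t))
        contour.append(prev_hz * 2.0 ** (interval * step / 12.0))
    return contour

up = note_transition(262.0, 294.0)   # C4 to D4
```

Because the constants never change, the relative overshoot is identical for low and high notes, which is the limitation of a context-independent rule noted above.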