An Investigation of the Relation Between Grapheme Embeddings and Pronunciation for Tacotron-based Systems

Authors: Antoine Perquin, Erica Cooper, Junichi Yamagishi

Submitted to Interspeech 2021.

Our models were trained on the SIWIS dataset.

Samples used in the listening test

Systems compared: Natural, WaveRNN from natural, Character-based Tacotron, Phoneme-based Tacotron
Sentence types:
Book sentence
Parliament sentence
Semantically unpredictable sentence

Character embedding swapping

For each pair of sentences below, the contextual character embedding sequences of both sentences are extracted. The embedding of the character between [] in the first sentence is then replaced by that of the character between {} in the second sentence, and synthesis resumes from the modified sequence.
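
The swap described above can be illustrated with a minimal sketch. It assumes that each sentence yields one embedding vector per character, stored as an array of shape (number of characters, embedding dimension). The helper names `strip_marker`, `swap_embedding`, and `fake_encoder` are hypothetical, and `fake_encoder` returns random vectors as a stand-in for the real model, so this is not the authors' code and it produces no audio.

```python
import numpy as np


def strip_marker(marked, open_ch, close_ch):
    """Return (plain_text, index) for a sentence with one marked character."""
    start = marked.index(open_ch)
    if marked[start + 2] != close_ch:
        raise ValueError("exactly one character must be marked")
    plain = marked[:start] + marked[start + 1] + marked[start + 3:]
    return plain, start


def swap_embedding(target_seq, donor_seq, target_idx, donor_idx):
    """Copy target_seq and overwrite one row with a row taken from donor_seq."""
    swapped = target_seq.copy()
    swapped[target_idx] = donor_seq[donor_idx]
    return swapped


def fake_encoder(text, dim=4, seed=0):
    """Hypothetical stand-in for the encoder: one random vector per character."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((len(text), dim))


# One of the pairs listed on this page.
target_text, target_idx = strip_marker("Il a laissé une [b]arque.", "[", "]")
donor_text, donor_idx = strip_marker("Une {m}achine à laver.", "{", "}")

target_emb = fake_encoder(target_text, seed=1)
donor_emb = fake_encoder(donor_text, seed=2)

# Only the row for "b" changes; every other embedding is left untouched.
swapped = swap_embedding(target_emb, donor_emb, target_idx, donor_idx)
```

In a real system, `swapped` would be passed to the rest of the synthesis pipeline in place of the original embedding sequence. Copying before assignment keeps the original sequence available for comparison.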

"Il a laissé une [b]arque." "Il a laissé une {m}arque." Result of the swapping
"Il a laissé une [b]arque." "Une {m}achine à laver." Result of the swapping
"Il a laissé une [b]arque." "La {m}aison bleue." Result of the swapping
"Il a laissé une [b]arque." "Le {m}onstre terrifiant." Result of the swapping