Speech samples for "Preliminary study on using vector quantization latent spaces for TTS/VC systems with consistent performance"

Authors: Hieu-Thi Luong, Junichi Yamagishi

The paper has been accepted to SSW11; a preprint is available at https://arxiv.org/abs/2106.13479.

Related work: NAUTILUS: a Versatile Voice Cloning System

Due to license restrictions, we cannot publish the speech samples used in the paper's experiments on the Internet. Instead, we use four speakers from the JVS corpus, each with about one hundred utterances, as target speakers to demonstrate the voice cloning tasks.

OG: the original NAUTILUS system.

VQ: the new NAUTILUS-VQ system with vector quantization components.

1st sample OG/TTSu VQ/TTSu OG/VCAu VQ/VCAu
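Input 古い瑠璃色の縁取りのルーペは、プレミアがついて高額で競り落とされた。 (English gloss: "The old loupe with a lapis-blue rim fetched a premium and was auctioned off at a high price.") ► Play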
J00000109_common_0029.wav
jvs001 (M) ► Play ► Play ► Play ► Play ► Play
jvs012 (M) ► Play ► Play ► Play ► Play ► Play
jvs004 (F) ► Play ► Play ► Play ► Play ► Play
jvs039 (F) ► Play ► Play ► Play ► Play ► Play


2nd sample OG/TTSu VQ/TTSu OG/VCAu VQ/VCAu
Input 少し力が入りすぎている人が、各グループに数人見受けられました。 (English gloss: "A few people in each group appeared to be straining a little too hard.") ► Play
J00000114_common_0009.wav
jvs001 (M) ► Play ► Play ► Play ► Play ► Play
jvs012 (M) ► Play ► Play ► Play ► Play ► Play
jvs004 (F) ► Play ► Play ► Play ► Play ► Play
jvs039 (F) ► Play ► Play ► Play ► Play ► Play


Acknowledgement

We are grateful to Mr. Nobuyuki Nishizawa for his helpful comments and suggestions.
