Speech samples for "Preliminary study on using vector quantization latent spaces for TTS/VC systems with consistent performance"

Authors: Hieu-Thi Luong, Junichi Yamagishi

The paper has been accepted to SSW11; a pre-print version is available at https://arxiv.org/abs/2106.13479.

Due to license restrictions, we cannot publish on the Internet the speech samples used for the experiments in the paper. Instead, we use four speakers from the JVS corpus, each with about one hundred utterances, as target speakers to demonstrate the voice cloning tasks.

OG: the original NAUTILUS system.

VQ: the new NAUTILUS-VQ system, with vector quantization components.

1st sample		OG/TTS_u	VQ/TTS_u	OG/VCA_u	VQ/VCA_u
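Input		古い瑠璃色の縁取りのルーペは、プレミアがついて高額で競り落とされた。 (English: "The old magnifying glass with a lapis-lazuli-colored rim fetched a premium and was sold at auction for a high price.")		► Play J00000109_common_0029.wav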
jvs001 (M)	► Play	► Play	► Play	► Play	► Play
jvs012 (M)	► Play	► Play	► Play	► Play	► Play

jvs004 (F)	► Play	► Play	► Play	► Play	► Play
jvs039 (F)	► Play	► Play	► Play	► Play	► Play

2nd sample		OG/TTS_u	VQ/TTS_u	OG/VCA_u	VQ/VCA_u
Input		少し力が入りすぎている人が、各グループに数人見受けられました。 (English: "In each group, a few people appeared to be straining a little too hard.")		► Play J00000114_common_0009.wav
jvs001 (M)	► Play	► Play	► Play	► Play	► Play
jvs012 (M)	► Play	► Play	► Play	► Play	► Play

jvs004 (F)	► Play	► Play	► Play	► Play	► Play
jvs039 (F)	► Play	► Play	► Play	► Play	► Play

Acknowledgement

We are grateful to Mr. Nobuyuki Nishizawa for his helpful comments and suggestions.