Multi-speaker Tacotron: d-vectors

The d-vectors are computed with the implementation from https://github.com/CorentinJ/Real-Time-Voice-Cloning

Cosine Similarities

The cosine similarities below are computed on x-vectors extracted from the synthesized speech.
System     Seen    Dev     Test
LDE-3      0.842   0.492   0.549
d-vector   0.838   0.466   0.551
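
For readers unfamiliar with the metric, here is a minimal sketch of how a cosine similarity score between speaker embeddings could be computed and averaged. It is not the evaluation code behind the numbers above: the page does not say which reference embedding each synthesized x-vector is compared against or how scores are aggregated, so the pairing with a reference embedding of the same speaker and the plain mean are assumptions, as are the function names and the toy vectors.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def mean_cosine_similarity(pairs):
    """Average similarity over (reference, synthesized) embedding pairs.

    Assumption: each pair holds two embeddings of the same speaker.
    """
    scores = [cosine_similarity(ref, syn) for ref, syn in pairs]
    return sum(scores) / len(scores)


# Toy 3-dimensional vectors standing in for real x-vectors
# (made up for illustration, not taken from the experiments).
toy_pairs = [
    ([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]),  # identical direction
    ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]),  # orthogonal
]
print(mean_cosine_similarity(toy_pairs))
```

A score near 1 means the two embeddings point in nearly the same direction, which is why higher values in the table indicate synthesized speech that is closer to the target speaker in x-vector space.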

Audio Samples

Speakers:
Seen speakers (training): p225, p234, p245, p334
Unseen speakers (dev): p360, p304, p343, p264
Unseen speakers (test): p363, p252, p339, p351

Systems:
nat
copy synth
x-vector
LDE-3
d-vector