Cyclic-noise-NSF (VCTK samples)

Messages

Paper link:

Wang, X. & Yamagishi, J. Using Cyclic Noise as the Source Signal for Neural Source-Filter-Based Speech Waveform Model. in Proc. Interspeech 1992–1996 (2020). doi:10.21437/Interspeech.2020-1018
BibTeX:
@inproceedings{wang2020cyclic,
address = {ISCA},
author = {Wang, Xin and Yamagishi, Junichi},
booktitle = {Proc. Interspeech},
doi = {10.21437/Interspeech.2020-1018},
pages = {1992--1996},
publisher = {ISCA},
title = {{Using Cyclic Noise as the Source Signal for Neural Source-Filter-Based Speech Waveform Model}},
url = {http://www.isca-speech.org/archive/Interspeech{\_}2020/abstracts/1018.html},
year = {2020}
}
This page lists samples from models trained on the VCTK database.

The samples are copy-synthesis waveforms, i.e., waveforms generated from natural acoustic features. They were evaluated in the listening test.

Code is available. You need both the CURRENNT toolkit and the accompanying scripts; this subfolder in the script repository was made for this project.

A new implementation based on PyTorch is also available.

Slides for the Interspeech 2020 presentation can be found on this page, or you can download the PPT or PDF directly.

Audio samples

Models were trained speaker-independently on VCTK v0.9, using 87 training speakers, 200 utterances per speaker, and 16 kHz waveforms.

You can find pre-trained models in the Git repository above.

It may take a few minutes to load all the speech samples. You can also download all samples from this Dropbox link.
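If you download the samples, a quick sanity check is to confirm that a file uses the 16 kHz sampling rate mentioned above. The following is a minimal sketch using only Python's standard `wave` module. It assumes the downloaded files are uncompressed PCM WAV; the helper names `describe_wav` and `has_expected_rate` are hypothetical and not part of the project's scripts.

```python
import wave

# Sampling rate stated on this page for the training waveforms.
EXPECTED_RATE_HZ = 16000


def describe_wav(path):
    """Return (sample_rate_hz, n_channels, duration_s) for a PCM WAV file."""
    with wave.open(str(path), "rb") as wav_file:
        rate = wav_file.getframerate()
        channels = wav_file.getnchannels()
        duration = wav_file.getnframes() / rate
    return rate, channels, duration


def has_expected_rate(path, expected_rate_hz=EXPECTED_RATE_HZ):
    """Check whether the file's sampling rate matches the expected one."""
    rate, _, _ = describe_wav(path)
    return rate == expected_rate_hz
```

For example, `has_expected_rate("p229_290.wav")` would check one downloaded sample, assuming it was saved under that name in the current directory.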

Seen speakers

Test set samples for seen speakers (i.e., speakers who provided training data).

	Natural	WaveNet	Sin (hn-sinc-NSF sine-source)	Cno\({}_{\beta_2}\) (hn-sinc-NSF cyclic-noise-source)

p229_290.wav					p229_290.wav
p243_368.wav					p243_368.wav
p250_305.wav					p250_305.wav
p268_264.wav					p268_264.wav
p285_241.wav					p285_241.wav
p306_293.wav					p306_293.wav
p323_422.wav					p323_422.wav
p240_268.wav					p240_268.wav
p244_409.wav					p244_409.wav
p255_375.wav					p255_375.wav
p270_310.wav					p270_310.wav
p297_328.wav					p297_328.wav
p311_330.wav					p311_330.wav
p347_331.wav					p347_331.wav
p241_325.wav					p241_325.wav
p247_355.wav					p247_355.wav
p267_273.wav					p267_273.wav
p276_363.wav					p276_363.wav
p305_410.wav					p305_410.wav
p314_320.wav					p314_320.wav

Unseen speakers

Test set samples for unseen speakers (i.e., speakers who did not provide any training data).

	Natural	WaveNet	Sin (hn-sinc-NSF sine-source)	Cno\({}_{\beta_2}\) (hn-sinc-NSF cyclic-noise-source)

p251_341.wav					p251_341.wav
p253_390.wav					p253_390.wav
p254_384.wav					p254_384.wav
p257_329.wav					p257_329.wav
p258_231.wav					p258_231.wav
p262_356.wav					p262_356.wav
p265_319.wav					p265_319.wav
p272_247.wav					p272_247.wav
p279_362.wav					p279_362.wav
p293_282.wav					p293_282.wav
p303_317.wav					p303_317.wav
p307_336.wav					p307_336.wav
p310_359.wav					p310_359.wav
p329_256.wav					p329_256.wav
p330_145.wav					p330_145.wav
p335_239.wav					p335_239.wav
p336_382.wav					p336_382.wav
p345_079.wav					p345_079.wav
p364_116.wav					p364_116.wav
p374_193.wav					p374_193.wav
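
The sample file names in both tables follow the VCTK convention of a speaker ID and an utterance number joined by an underscore. When organizing downloaded samples, that convention can be used to group files by speaker and to confirm that the seen and unseen sets do not share speakers. The sketch below is illustrative only: the regular expression, the helper names, and the short example lists (taken from the first rows of each table) are assumptions, not part of the project's code.

```python
import re

# Assumed pattern: "p" + speaker digits, "_", utterance digits, ".wav".
SAMPLE_NAME = re.compile(r"^(p\d+)_(\d+)\.wav$")


def split_sample_name(file_name):
    """Split a name like 'p229_290.wav' into (speaker_id, utterance_id)."""
    match = SAMPLE_NAME.match(file_name)
    if match is None:
        raise ValueError(f"unexpected sample name: {file_name!r}")
    return match.group(1), match.group(2)


def speakers_of(file_names):
    """Collect the set of speaker IDs that appear in a list of file names."""
    return {split_sample_name(name)[0] for name in file_names}


# First three rows of each table on this page.
seen_examples = ["p229_290.wav", "p243_368.wav", "p250_305.wav"]
unseen_examples = ["p251_341.wav", "p253_390.wav", "p254_384.wav"]

# Speakers present in both lists; expected to be empty.
overlap = speakers_of(seen_examples) & speakers_of(unseen_examples)
```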