Cyclic-noise-NSF (VCTK samples)
Messages
Paper link:
Wang, X. & Yamagishi, J. Using Cyclic Noise as the Source Signal for Neural Source-Filter-Based Speech Waveform Model. In Proc. Interspeech, pp. 1992–1996 (2020). doi:10.21437/Interspeech.2020-1018
BibTeX:
@inproceedings{wang2020cyclic,
  address = {ISCA},
  author = {Wang, Xin and Yamagishi, Junichi},
  booktitle = {Proc. Interspeech},
  doi = {10.21437/Interspeech.2020-1018},
  pages = {1992--1996},
  publisher = {ISCA},
  title = {{Using Cyclic Noise as the Source Signal for Neural Source-Filter-Based Speech Waveform Model}},
  url = {http://www.isca-speech.org/archive/Interspeech{\_}2020/abstracts/1018.html},
  year = {2020}
}
This page lists copy-synthesis waveform samples on the VCTK database, i.e., waveforms generated from natural acoustic features. These samples were evaluated in the listening test.
Code is available. You need both the CURRENNT toolkit and the scripts; a subfolder in the script repository is dedicated to this project.
A new implementation based on PyTorch is also available.
Slides for the Interspeech 2020 presentation can be found on this page, or you can download the PPT or PDF directly.
Audio samples
Models were trained speaker-independently on VCTK v0.9, using 87 training speakers, 200 utterances per speaker, and 16 kHz waveforms.
You can find pre-trained models in the git repository above.
It may take a few minutes to load all the speech samples. You can also download all samples from this Dropbox link.
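Once the samples are downloaded, a quick sanity check is to confirm that each file matches the 16 kHz setup described above. The following is a minimal sketch using only Python's standard library. The helper names `parse_sample_name`, `wav_sample_rate`, and `find_mismatched_files` are hypothetical, the folder layout of the download is an assumption, and so is the reading of file names as `<speaker>_<utterance>.wav`, which is inferred from the file names listed on this page. It also assumes the files are PCM-encoded WAV, which is what the `wave` module can read.

```python
import wave
from pathlib import Path

EXPECTED_RATE_HZ = 16_000  # sampling rate stated on this page


def parse_sample_name(filename):
    """Split a name such as 'p229_290.wav' into (speaker_id, utterance_id).

    Assumption: the part before the first underscore is the speaker ID.
    """
    stem = Path(filename).stem
    speaker_id, utterance_id = stem.split("_", 1)
    return speaker_id, utterance_id


def wav_sample_rate(path):
    """Return the sampling rate (Hz) stored in a PCM WAV file header."""
    with wave.open(str(path), "rb") as wav_file:
        return wav_file.getframerate()


def find_mismatched_files(sample_dir):
    """List WAV files under sample_dir whose rate differs from 16 kHz."""
    return [
        path
        for path in sorted(Path(sample_dir).rglob("*.wav"))
        if wav_sample_rate(path) != EXPECTED_RATE_HZ
    ]
```

For example, `find_mismatched_files("samples")` returns an empty list when every WAV file under a (hypothetical) `samples` folder is sampled at 16 kHz.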
Seen speakers
Test set samples for seen speakers (i.e., speakers who provided training data).
File | Natural | WaveNet | Sin (hn-sinc-NSF sine-source) | Cno\({}_{\beta_2}\) (hn-sinc-NSF cyclic-noise-source)
---|---|---|---|---
p229_290.wav | | | |
p243_368.wav | | | |
p250_305.wav | | | |
p268_264.wav | | | |
p285_241.wav | | | |
p306_293.wav | | | |
p323_422.wav | | | |
p240_268.wav | | | |
p244_409.wav | | | |
p255_375.wav | | | |
p270_310.wav | | | |
p297_328.wav | | | |
p311_330.wav | | | |
p347_331.wav | | | |
p241_325.wav | | | |
p247_355.wav | | | |
p267_273.wav | | | |
p276_363.wav | | | |
p305_410.wav | | | |
p314_320.wav | | | |
Unseen speakers
Test set samples for unseen speakers (i.e., speakers who did not provide any training data).
File | Natural | WaveNet | Sin (hn-sinc-NSF sine-source) | Cno\({}_{\beta_2}\) (hn-sinc-NSF cyclic-noise-source)
---|---|---|---|---
p251_341.wav | | | |
p253_390.wav | | | |
p254_384.wav | | | |
p257_329.wav | | | |
p258_231.wav | | | |
p262_356.wav | | | |
p265_319.wav | | | |
p272_247.wav | | | |
p279_362.wav | | | |
p293_282.wav | | | |
p303_317.wav | | | |
p307_336.wav | | | |
p310_359.wav | | | |
p329_256.wav | | | |
p330_145.wav | | | |
p335_239.wav | | | |
p336_382.wav | | | |
p345_079.wav | | | |
p364_116.wav | | | |
p374_193.wav | | | |
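The seen/unseen split can be checked from the file names alone. The sketch below assumes that the part of each file name before the underscore is the speaker ID (an inference from the names listed on this page, not something the page states), and the helper `speaker_ids` is a hypothetical name. It verifies that no speaker appears in both lists.

```python
# File names copied from the two tables on this page.
SEEN_FILES = [
    "p229_290.wav", "p243_368.wav", "p250_305.wav", "p268_264.wav",
    "p285_241.wav", "p306_293.wav", "p323_422.wav", "p240_268.wav",
    "p244_409.wav", "p255_375.wav", "p270_310.wav", "p297_328.wav",
    "p311_330.wav", "p347_331.wav", "p241_325.wav", "p247_355.wav",
    "p267_273.wav", "p276_363.wav", "p305_410.wav", "p314_320.wav",
]
UNSEEN_FILES = [
    "p251_341.wav", "p253_390.wav", "p254_384.wav", "p257_329.wav",
    "p258_231.wav", "p262_356.wav", "p265_319.wav", "p272_247.wav",
    "p279_362.wav", "p293_282.wav", "p303_317.wav", "p307_336.wav",
    "p310_359.wav", "p329_256.wav", "p330_145.wav", "p335_239.wav",
    "p336_382.wav", "p345_079.wav", "p364_116.wav", "p374_193.wav",
]


def speaker_ids(filenames):
    """Collect speaker IDs, assuming names look like '<speaker>_<utt>.wav'."""
    return {name.split("_", 1)[0] for name in filenames}


seen_speakers = speaker_ids(SEEN_FILES)
unseen_speakers = speaker_ids(UNSEEN_FILES)
overlap = seen_speakers & unseen_speakers

# One test utterance per speaker in each table, and no shared speakers.
print(len(seen_speakers), len(unseen_speakers), sorted(overlap))
# prints: 20 20 []
```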