Cyclic-noise-NSF (VCTK samples)

Messages

  • Paper link:

    Wang, X. & Yamagishi, J. Using Cyclic Noise as the Source Signal for Neural Source-Filter-Based Speech Waveform Model. In Proc. Interspeech, pp. 1992–1996 (2020). doi:10.21437/Interspeech.2020-1018

  • BibTeX:

    @inproceedings{wang2020cyclic,
    address = {ISCA},
    author = {Wang, Xin and Yamagishi, Junichi},
    booktitle = {Proc. Interspeech},
    doi = {10.21437/Interspeech.2020-1018},
    pages = {1992--1996},
    publisher = {ISCA},
    title = {{Using Cyclic Noise as the Source Signal for Neural Source-Filter-Based Speech Waveform Model}},
    url = {http://www.isca-speech.org/archive/Interspeech{\_}2020/abstracts/1018.html},
    year = {2020}
    }
    
  • This page lists samples from the VCTK database.

  • This page lists copy-synthesis waveform samples, i.e., waveforms generated from natural acoustic features. These samples were evaluated in the listening test.

  • Code is available. You need both the CURRENNT toolkit and the scripts. This subfolder in the script repository was made for this project.

  • A new implementation based on PyTorch is also available.

  • Slides for the Interspeech 2020 presentation can be found on this page, or you can download them directly as PPT or PDF.


Audio samples

Models were trained speaker-independently on VCTK v0.9, using 87 training speakers, 200 utterances per speaker, and 16 kHz waveforms.
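
As a rough sense of scale, the training configuration above implies 87 × 200 training utterances in total. The following is a minimal Python sketch of that arithmetic only; the constant and function names are hypothetical and are not taken from the released scripts.

```python
# Figures taken from the training description on this page.
# All names below are hypothetical, chosen for illustration only.
NUM_TRAIN_SPEAKERS = 87
UTTERANCES_PER_SPEAKER = 200
SAMPLING_RATE_HZ = 16_000


def total_training_utterances(num_speakers: int = NUM_TRAIN_SPEAKERS,
                              utts_per_speaker: int = UTTERANCES_PER_SPEAKER) -> int:
    """Return the total number of training utterances across all speakers."""
    return num_speakers * utts_per_speaker


def samples_in(duration_s: float, rate_hz: int = SAMPLING_RATE_HZ) -> int:
    """Return how many waveform samples a clip of the given duration contains."""
    return int(round(duration_s * rate_hz))


if __name__ == "__main__":
    print(total_training_utterances())  # 17400
    print(samples_in(1.0))              # 16000
```

The second helper shows only that one second of 16 kHz audio corresponds to 16,000 waveform samples; the actual utterance durations are not stated on this page.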

You can find pre-trained models in the Git repository above.

It may take a few minutes to load all the speech samples. You can also download all samples from this Dropbox link.

Seen speakers

Test set samples for seen speakers (i.e., speakers who provided training data).

Natural | WaveNet | Sin (hn-sinc-NSF sine-source) | Cno\({}_{\beta_2}\) (hn-sinc-NSF cyclic-noise-source)
p229_290.wav p229_290.wav
p243_368.wav p243_368.wav
p250_305.wav p250_305.wav
p268_264.wav p268_264.wav
p285_241.wav p285_241.wav
p306_293.wav p306_293.wav
p323_422.wav p323_422.wav
p240_268.wav p240_268.wav
p244_409.wav p244_409.wav
p255_375.wav p255_375.wav
p270_310.wav p270_310.wav
p297_328.wav p297_328.wav
p311_330.wav p311_330.wav
p347_331.wav p347_331.wav
p241_325.wav p241_325.wav
p247_355.wav p247_355.wav
p267_273.wav p267_273.wav
p276_363.wav p276_363.wav
p305_410.wav p305_410.wav
p314_320.wav p314_320.wav

Unseen speakers

Test set samples for unseen speakers (i.e., speakers who did not provide any training data).

Natural | WaveNet | Sin (hn-sinc-NSF sine-source) | Cno\({}_{\beta_2}\) (hn-sinc-NSF cyclic-noise-source)
p251_341.wav p251_341.wav
p253_390.wav p253_390.wav
p254_384.wav p254_384.wav
p257_329.wav p257_329.wav
p258_231.wav p258_231.wav
p262_356.wav p262_356.wav
p265_319.wav p265_319.wav
p272_247.wav p272_247.wav
p279_362.wav p279_362.wav
p293_282.wav p293_282.wav
p303_317.wav p303_317.wav
p307_336.wav p307_336.wav
p310_359.wav p310_359.wav
p329_256.wav p329_256.wav
p330_145.wav p330_145.wav
p335_239.wav p335_239.wav
p336_382.wav p336_382.wav
p345_079.wav p345_079.wav
p364_116.wav p364_116.wav
p374_193.wav p374_193.wav