.. samples-xin documentation master file, created by
   sphinx-quickstart on Sun Apr 25 22:58:24 2021.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

.. _label-nsf-v2:

Hn-NSF and s-NSF
****************

Messages
--------

* Paper: Wang, X., Takaki, S. & Yamagishi, J. Neural Source-Filter Waveform Models for Statistical Parametric Speech Synthesis. IEEE/ACM Trans. Audio, Speech, Lang. Process. 28, 402–415 (2020), `DOI:10.1109/TASLP.2019.2956145 `__

* BibTeX::

    @article{wangNSFall,
      author = {Wang, Xin and Takaki, Shinji and Yamagishi, Junichi},
      doi = {10.1109/TASLP.2019.2956145},
      journal = {IEEE/ACM Transactions on Audio, Speech, and Language Processing},
      pages = {402--415},
      title = {{Neural Source-Filter Waveform Models for Statistical Parametric Speech Synthesis}},
      url = {https://ieeexplore.ieee.org/document/8915761/},
      volume = {28},
      year = {2020}
    }

* Experiments were based on the ATR-Ximera F009 voice (Japanese, commercial database).

* Code is available. You need both the `CURRENNT toolkit `_ and `scripts `_. `This subfolder `_ in the script repository is made for this project.

* A new implementation based on PyTorch is also `available `_.

* Note that:

  * Copy-synthesis refers to waveform generation given natural acoustic features.
  * Text-to-speech refers to waveform generation given acoustic features predicted from the text input.

|

Samples (main test)
-------------------

Natural waveform samples cannot be released online due to license restrictions.

.. raw:: html

   Utterance: _NIKKEIR_03132_T01

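
============================ ======= ====== ===== =====
Mel-spec. + F0 (15-hr data)  WaveNet hn-NSF b-NSF s-NSF
============================ ======= ====== ===== =====
Copy-synthesis:
Text-to-speech:
============================ ======= ====== ===== =====

============================ ======= ====== ===== =====
MGC coef. + F0 (15-hr data)  WaveNet hn-NSF b-NSF s-NSF
============================ ======= ====== ===== =====
Copy-synthesis:
Text-to-speech:
============================ ======= ====== ===== =====

============================ ======= ====== ===== =====
Mel-spec. + F0 (1.6-hr data) WaveNet hn-NSF b-NSF s-NSF
============================ ======= ====== ===== =====
Copy-synthesis:
Text-to-speech:
============================ ======= ====== ===== =====

============================ ======= ====== ===== =====
MGC coef. + F0 (1.6-hr data) WaveNet hn-NSF b-NSF s-NSF
============================ ======= ====== ===== =====
Copy-synthesis:
Text-to-speech:
============================ ======= ====== ===== =====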

Utterance: _NIKKEIR_00257_T01
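
============================ ======= ====== ===== =====
Mel-spec. + F0 (15-hr data)  WaveNet hn-NSF b-NSF s-NSF
============================ ======= ====== ===== =====
Copy-synthesis:
Text-to-speech:
============================ ======= ====== ===== =====

============================ ======= ====== ===== =====
MGC coef. + F0 (15-hr data)  WaveNet hn-NSF b-NSF s-NSF
============================ ======= ====== ===== =====
Copy-synthesis:
Text-to-speech:
============================ ======= ====== ===== =====

============================ ======= ====== ===== =====
Mel-spec. + F0 (1.6-hr data) WaveNet hn-NSF b-NSF s-NSF
============================ ======= ====== ===== =====
Copy-synthesis:
Text-to-speech:
============================ ======= ====== ===== =====

============================ ======= ====== ===== =====
MGC coef. + F0 (1.6-hr data) WaveNet hn-NSF b-NSF s-NSF
============================ ======= ====== ===== =====
Copy-synthesis:
Text-to-speech:
============================ ======= ====== ===== =====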



Utterance: _BTEC_00312_T01
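
============================ ======= ====== ===== =====
Mel-spec. + F0 (15-hr data)  WaveNet hn-NSF b-NSF s-NSF
============================ ======= ====== ===== =====
Copy-synthesis:
Text-to-speech:
============================ ======= ====== ===== =====

============================ ======= ====== ===== =====
MGC coef. + F0 (15-hr data)  WaveNet hn-NSF b-NSF s-NSF
============================ ======= ====== ===== =====
Copy-synthesis:
Text-to-speech:
============================ ======= ====== ===== =====

============================ ======= ====== ===== =====
Mel-spec. + F0 (1.6-hr data) WaveNet hn-NSF b-NSF s-NSF
============================ ======= ====== ===== =====
Copy-synthesis:
Text-to-speech:
============================ ======= ====== ===== =====

============================ ======= ====== ===== =====
MGC coef. + F0 (1.6-hr data) WaveNet hn-NSF b-NSF s-NSF
============================ ======= ====== ===== =====
Copy-synthesis:
Text-to-speech:
============================ ======= ====== ===== =====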



Utterance: _AOZORAR_09534_T01
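
============================ ======= ====== ===== =====
Mel-spec. + F0 (15-hr data)  WaveNet hn-NSF b-NSF s-NSF
============================ ======= ====== ===== =====
Copy-synthesis:
Text-to-speech:
============================ ======= ====== ===== =====

============================ ======= ====== ===== =====
MGC coef. + F0 (15-hr data)  WaveNet hn-NSF b-NSF s-NSF
============================ ======= ====== ===== =====
Copy-synthesis:
Text-to-speech:
============================ ======= ====== ===== =====

============================ ======= ====== ===== =====
Mel-spec. + F0 (1.6-hr data) WaveNet hn-NSF b-NSF s-NSF
============================ ======= ====== ===== =====
Copy-synthesis:
Text-to-speech:
============================ ======= ====== ===== =====

============================ ======= ====== ===== =====
MGC coef. + F0 (1.6-hr data) WaveNet hn-NSF b-NSF s-NSF
============================ ======= ====== ===== =====
Copy-synthesis:
Text-to-speech:
============================ ======= ====== ===== =====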



Utterance: _AOZORAR_03372_T01
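
============================ ======= ====== ===== =====
Mel-spec. + F0 (15-hr data)  WaveNet hn-NSF b-NSF s-NSF
============================ ======= ====== ===== =====
Copy-synthesis:
Text-to-speech:
============================ ======= ====== ===== =====

============================ ======= ====== ===== =====
MGC coef. + F0 (15-hr data)  WaveNet hn-NSF b-NSF s-NSF
============================ ======= ====== ===== =====
Copy-synthesis:
Text-to-speech:
============================ ======= ====== ===== =====

============================ ======= ====== ===== =====
Mel-spec. + F0 (1.6-hr data) WaveNet hn-NSF b-NSF s-NSF
============================ ======= ====== ===== =====
Copy-synthesis:
Text-to-speech:
============================ ======= ====== ===== =====

============================ ======= ====== ===== =====
MGC coef. + F0 (1.6-hr data) WaveNet hn-NSF b-NSF s-NSF
============================ ======= ====== ===== =====
Copy-synthesis:
Text-to-speech:
============================ ======= ====== ===== =====
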

|

Samples (ablation test)
-----------------------

Ablation test on b-NSF (16 kHz; please check the notation in the paper)

.. raw:: html

   Utterance: _AOZORAR_03372_T01

b-NSF (trained on 5-hr. MGC+F0)
Copy-synthesis
L1, L2, L3
Copy-synthesis
S1, S2
Copy-synthesis
N1, N2
Copy-synthesis



Utterance: _NIKKEIR_03132_T01
b-NSF (trained on 5-hr. MGC+F0)
Copy-synthesis
L1, L2, L3
Copy-synthesis
S1, S2
Copy-synthesis
N1, N2
Copy-synthesis



Utterance: _NIKKEIR_00257_T01
b-NSF (trained on 5-hr. MGC+F0)
Copy-synthesis
L1, L2, L3
Copy-synthesis
S1, S2
Copy-synthesis
N1, N2
Copy-synthesis



Utterance: _AOZORAR_09534_T01
b-NSF (trained on 5-hr. MGC+F0)
Copy-synthesis
L1, L2, L3
Copy-synthesis
S1, S2
Copy-synthesis
N1, N2
Copy-synthesis

.. toctree::
   :hidden:
   :maxdepth: 1