Hn-sinc-NSF
Messages
Paper:
Wang, X. & Yamagishi, J. Neural Harmonic-plus-Noise Waveform Model with Trainable Maximum Voice Frequency for Text-to-Speech Synthesis. in Proc. SSW 1–6 (ISCA, 2019). doi:10.21437/SSW.2019-1
BibTeX:
@inproceedings{Wang2019,
  address = {ISCA},
  author = {Wang, Xin and Yamagishi, Junichi},
  booktitle = {Proc. SSW},
  doi = {10.21437/SSW.2019-1},
  pages = {1--6},
  publisher = {ISCA},
  title = {{Neural Harmonic-plus-Noise Waveform Model with Trainable Maximum Voice Frequency for Text-to-Speech Synthesis}},
  url = {http://www.isca-speech.org/archive/SSW{\_}2019/abstracts/SSW10{\_}O{\_}1-1.html},
  year = {2019}
}
Experiments were based on ATR-Ximera F009 voice (Japanese, commercial database).
Code is available. You need both the CURRENNT toolkit and the scripts; a subfolder in the script repository is dedicated to this project.
A new implementation based on PyTorch is also available.
Slides for the SSW 2019 presentation can be found on this page. You may also download the PDF directly.
Note that:
Copy-synthesis refers to waveform generation given natural acoustic features.
Text-to-speech refers to waveform generation given acoustic features predicted from the text input.
Audio samples
Natural waveform samples cannot be released online because of license restrictions.
Utterance: _AOZORAR_09534_T01
Mel-spec. + F0 (15-hr data) | WaveNet | hn-NSF | h-sinc1-NSF | h-sinc2-NSF | h-sinc3-NSF |
---|---|---|---|---|---|
Copy-synthesis: | | | | | |
Text-to-speech: | | | | | |
Utterance: _AOZORAR_03372_T01
Mel-spec. + F0 (15-hr data) | WaveNet | hn-NSF | h-sinc1-NSF | h-sinc2-NSF | h-sinc3-NSF |
---|---|---|---|---|---|
Copy-synthesis: | | | | | |
Text-to-speech: | | | | | |
Utterance: _NIKKEIR_03132_T01
Mel-spec. + F0 (15-hr data) | WaveNet | hn-NSF | h-sinc1-NSF | h-sinc2-NSF | h-sinc3-NSF |
---|---|---|---|---|---|
Copy-synthesis: | | | | | |
Text-to-speech: | | | | | |
Utterance: _NIKKEIR_00257_T01
Mel-spec. + F0 (15-hr data) | WaveNet | hn-NSF | h-sinc1-NSF | h-sinc2-NSF | h-sinc3-NSF |
---|---|---|---|---|---|
Copy-synthesis: | | | | | |
Text-to-speech: | | | | | |
Utterance: _BTEC_00312_T01
Mel-spec. + F0 (15-hr data) | WaveNet | hn-NSF | h-sinc1-NSF | h-sinc2-NSF | h-sinc3-NSF |
---|---|---|---|---|---|
Copy-synthesis: | | | | | |
Text-to-speech: | | | | | |