Hn-sinc-NSF

Messages

Paper:

Wang, X. & Yamagishi, J. Neural Harmonic-plus-Noise Waveform Model with Trainable Maximum Voice Frequency for Text-to-Speech Synthesis. in Proc. SSW 1–6 (ISCA, 2019). doi:10.21437/SSW.2019-1
BibTeX:
@inproceedings{Wang2019,
address = {ISCA},
author = {Wang, Xin and Yamagishi, Junichi},
booktitle = {Proc. SSW},
doi = {10.21437/SSW.2019-1},
pages = {1--6},
publisher = {ISCA},
title = {{Neural Harmonic-plus-Noise Waveform Model with Trainable Maximum Voice Frequency for Text-to-Speech Synthesis}},
url = {http://www.isca-speech.org/archive/SSW{\_}2019/abstracts/SSW10{\_}O{\_}1-1.html},
year = {2019}
}
Experiments were based on the ATR-Ximera F009 voice (Japanese, commercial database).

Code is available. You need both the CURRENNT toolkit and the scripts. The script repository has a subfolder dedicated to this project.

A new implementation based on PyTorch is also available.

Slides for the SSW 2019 presentation can be found on this page. You may also download the PDF directly.

Note that:

Copy-synthesis refers to waveform generation given natural acoustic features.

Text-to-speech refers to waveform generation given acoustic features predicted from the text input.