hn-NSF and s-NSF
Messages
Paper:
Wang, X., Takaki, S. & Yamagishi, J. Neural Source-Filter Waveform Models for Statistical Parametric Speech Synthesis. IEEE/ACM Trans. Audio, Speech, Lang. Process. 28, 402–415 (2020), DOI:10.1109/TASLP.2019.2956145
BibTeX:
@article{wangNSFall, author = {Wang, Xin and Takaki, Shinji and Yamagishi, Junichi}, doi = {10.1109/TASLP.2019.2956145}, journal = {IEEE/ACM Transactions on Audio, Speech, and Language Processing}, pages = {402--415}, title = {{Neural Source-Filter Waveform Models for Statistical Parametric Speech Synthesis}}, url = {https://ieeexplore.ieee.org/document/8915761/}, volume = {28}, year = {2020} }

Experiments were based on the ATR-Ximera F009 voice (Japanese, commercial database).
Code is available. Both the CURRENNT toolkit and the scripts are needed; a dedicated subfolder of the script repository was made for this project.
A new implementation based on PyTorch is also available.
Note that:
- Copy-synthesis refers to waveform generation given natural acoustic features, i.e., features extracted from recorded speech.
- Text-to-speech refers to waveform generation given acoustic features predicted from the text input.
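To make the distinction concrete, here is a minimal Python sketch under stated assumptions: both settings feed the same vocoder, and only the origin of the acoustic features differs. The names `toy_sine_vocoder`, `copy_synthesis`, and `text_to_speech`, the 16 kHz sampling rate with 80 samples per frame (a 5 ms frame shift), and the 0.1 amplitude are hypothetical choices for illustration. The toy vocoder only renders F0 as a sine wave, loosely echoing the sine-based excitation that NSF models derive from F0; it is not WaveNet or any NSF model, and it ignores the spectral features (Mel-spectrogram or MGC coefficients) that the real models also take as input.

```python
import math
from typing import Callable, List

# Hypothetical frame-level F0 track in Hz; 0.0 marks an unvoiced frame.
# Real NSF models also take spectral features; they are omitted here.
F0Track = List[float]
Vocoder = Callable[[F0Track], List[float]]


def toy_sine_vocoder(f0: F0Track, sample_rate: int = 16000,
                     samples_per_frame: int = 80,
                     amplitude: float = 0.1) -> List[float]:
    """Toy stand-in for a neural vocoder (assumption, not an NSF model).

    Voiced samples follow a sine whose phase advances by
    2*pi*f0/sample_rate per sample; unvoiced samples are 0.0
    (a real source module would use noise instead).
    """
    wave: List[float] = []
    phase = 0.0
    for frame_f0 in f0:
        # Repeat each frame value to go from frame rate to sample rate.
        for _ in range(samples_per_frame):
            if frame_f0 > 0.0:
                phase += 2.0 * math.pi * frame_f0 / sample_rate
                wave.append(amplitude * math.sin(phase))
            else:
                wave.append(0.0)
    return wave


def copy_synthesis(natural_f0: F0Track, vocoder: Vocoder) -> List[float]:
    # Features come from a natural recording (extraction not shown).
    return vocoder(natural_f0)


def text_to_speech(text: str, acoustic_model: Callable[[str], F0Track],
                   vocoder: Vocoder) -> List[float]:
    # Features are predicted from text by a (hypothetical) acoustic model.
    return vocoder(acoustic_model(text))
```

For example, `copy_synthesis([0.0, 120.0, 120.0], toy_sine_vocoder)` returns 240 samples: 80 zeros for the unvoiced first frame, followed by a 120 Hz sine. Passing a placeholder acoustic model such as `lambda text: [0.0, 110.0, 115.0]` to `text_to_speech` exercises the text-to-speech path with the same vocoder, so any difference in the output comes only from the features.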
Samples (main test)
Natural waveform samples cannot be released online due to licensing restrictions.
Utterance: _NIKKEIR_03132_T01
Mel-spec. + F0 (15-hr data) | WaveNet | hn-NSF | b-NSF | s-NSF |
---|---|---|---|---|
Copy-synthesis: | ||||
Text-to-speech: |
MGC coef. + F0 (15-hr data) | WaveNet | hn-NSF | b-NSF | s-NSF |
---|---|---|---|---|
Copy-synthesis: | ||||
Text-to-speech: |
Mel-spec. + F0 (1.6-hr data) | WaveNet | hn-NSF | b-NSF | s-NSF |
---|---|---|---|---|
Copy-synthesis: | ||||
Text-to-speech: |
MGC coef. + F0 (1.6-hr data) | WaveNet | hn-NSF | b-NSF | s-NSF |
---|---|---|---|---|
Copy-synthesis: | ||||
Text-to-speech: |
Utterance: _NIKKEIR_00257_T01
Mel-spec. + F0 (15-hr data) | WaveNet | hn-NSF | b-NSF | s-NSF |
---|---|---|---|---|
Copy-synthesis: | ||||
Text-to-speech: |
MGC coef. + F0 (15-hr data) | WaveNet | hn-NSF | b-NSF | s-NSF |
---|---|---|---|---|
Copy-synthesis: | ||||
Text-to-speech: |
Mel-spec. + F0 (1.6-hr data) | WaveNet | hn-NSF | b-NSF | s-NSF |
---|---|---|---|---|
Copy-synthesis: | ||||
Text-to-speech: |
MGC coef. + F0 (1.6-hr data) | WaveNet | hn-NSF | b-NSF | s-NSF |
---|---|---|---|---|
Copy-synthesis: | ||||
Text-to-speech: |
Utterance: _BTEC_00312_T01
Mel-spec. + F0 (15-hr data) | WaveNet | hn-NSF | b-NSF | s-NSF |
---|---|---|---|---|
Copy-synthesis: | ||||
Text-to-speech: |
MGC coef. + F0 (15-hr data) | WaveNet | hn-NSF | b-NSF | s-NSF |
---|---|---|---|---|
Copy-synthesis: | ||||
Text-to-speech: |
Mel-spec. + F0 (1.6-hr data) | WaveNet | hn-NSF | b-NSF | s-NSF |
---|---|---|---|---|
Copy-synthesis: | ||||
Text-to-speech: |
MGC coef. + F0 (1.6-hr data) | WaveNet | hn-NSF | b-NSF | s-NSF |
---|---|---|---|---|
Copy-synthesis: | ||||
Text-to-speech: |
Utterance: _AOZORAR_09534_T01
Mel-spec. + F0 (15-hr data) | WaveNet | hn-NSF | b-NSF | s-NSF |
---|---|---|---|---|
Copy-synthesis: | ||||
Text-to-speech: |
MGC coef. + F0 (15-hr data) | WaveNet | hn-NSF | b-NSF | s-NSF |
---|---|---|---|---|
Copy-synthesis: | ||||
Text-to-speech: |
Mel-spec. + F0 (1.6-hr data) | WaveNet | hn-NSF | b-NSF | s-NSF |
---|---|---|---|---|
Copy-synthesis: | ||||
Text-to-speech: |
MGC coef. + F0 (1.6-hr data) | WaveNet | hn-NSF | b-NSF | s-NSF |
---|---|---|---|---|
Copy-synthesis: | ||||
Text-to-speech: |
Utterance: _AOZORAR_03372_T01
Mel-spec. + F0 (15-hr data) | WaveNet | hn-NSF | b-NSF | s-NSF |
---|---|---|---|---|
Copy-synthesis: | ||||
Text-to-speech: |
MGC coef. + F0 (15-hr data) | WaveNet | hn-NSF | b-NSF | s-NSF |
---|---|---|---|---|
Copy-synthesis: | ||||
Text-to-speech: |
Mel-spec. + F0 (1.6-hr data) | WaveNet | hn-NSF | b-NSF | s-NSF |
---|---|---|---|---|
Copy-synthesis: | ||||
Text-to-speech: |
MGC coef. + F0 (1.6-hr data) | WaveNet | hn-NSF | b-NSF | s-NSF |
---|---|---|---|---|
Copy-synthesis: | ||||
Text-to-speech: |
Samples (ablation test)
Ablation test on b-NSF (16 kHz; see the paper for the notation of the variants L1, L2, L3, S1, S2, N1, and N2)
Utterance: _AOZORAR_03372_T01
b-NSF (trained on 5-hr. MGC+F0) | |||
---|---|---|---|
Copy-synthesis | |||
L1 | L2 | L3 | |
Copy-synthesis | |||
S1 | S2 | ||
Copy-synthesis | |||
N1 | N2 | ||
Copy-synthesis |
Utterance: _NIKKEIR_03132_T01
b-NSF (trained on 5-hr. MGC+F0) | |||
---|---|---|---|
Copy-synthesis | |||
L1 | L2 | L3 | |
Copy-synthesis | |||
S1 | S2 | ||
Copy-synthesis | |||
N1 | N2 | ||
Copy-synthesis |
Utterance: _NIKKEIR_00257_T01
b-NSF (trained on 5-hr. MGC+F0) | |||
---|---|---|---|
Copy-synthesis | |||
L1 | L2 | L3 | |
Copy-synthesis | |||
S1 | S2 | ||
Copy-synthesis | |||
N1 | N2 | ||
Copy-synthesis |
Utterance: _AOZORAR_09534_T01
b-NSF (trained on 5-hr. MGC+F0) | |||
---|---|---|---|
Copy-synthesis | |||
L1 | L2 | L3 | |
Copy-synthesis | |||
S1 | S2 | ||
Copy-synthesis | |||
N1 | N2 | ||
Copy-synthesis |