Home page of neural source-filter waveform models
Authors: Xin Wang, Shinji Takaki, Junichi Yamagishi
This is the home page for our recent work on neural source-filter (NSF) models.
If you have any comments or questions, please send an email to wangxin ~a~t~ nii ~dot~ ac ~dot~ jp.
Using cyclic noise as source signals for NSF-based speech waveform modeling
We explored different types of source signals for speech modeling.
Cyclic noise works best for both female and male speakers (from the CMU-ARCTIC databases).
- Date: Sep 2020
- Submitted to Interspeech 2020
- Webpage: nsf-v4.html hosts the paper, samples, and models
Applying hn-sinc-NSF to music signal generation
We can train NSF models to generate waveforms for brass, string, and woodwind instruments.
- Date: May 2020
- Publication: to appear in ICASSP 2020
- Webpage: neural-music.html hosts the paper and samples
Harmonic-plus-noise NSF model with trainable Maximum Voiced Frequency (hn-sinc-NSF)
This new model enhances hn-NSF with a trainable maximum voiced frequency and sinc-based high-pass and low-pass FIR filters.
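To illustrate what sinc-based high/low-pass FIR filters look like, here is a minimal NumPy sketch of a windowed-sinc low-pass filter and its complementary high-pass filter, both sharing one cutoff that plays the role of the maximum voiced frequency. It is not the authors' implementation. The function names, the Hamming window, the tap count, and the choice to low-pass the harmonic branch and high-pass the noise branch are all assumptions made for illustration, and the cutoff is a fixed argument here rather than a value learned during training.

```python
import numpy as np


def sinc_lowpass(cutoff_hz, sample_rate, num_taps=31):
    """Windowed-sinc low-pass FIR coefficients (hypothetical helper)."""
    if num_taps % 2 == 0:
        raise ValueError("num_taps must be odd so the filter has a center tap")
    if not 0.0 < cutoff_hz < sample_rate / 2:
        raise ValueError("cutoff_hz must lie between 0 and the Nyquist frequency")
    # Tap indices centered on zero: -(N-1)/2, ..., 0, ..., (N-1)/2
    n = np.arange(num_taps) - (num_taps - 1) / 2
    fc = cutoff_hz / sample_rate  # cutoff in cycles per sample
    # Ideal low-pass impulse response; np.sinc(x) is sin(pi*x) / (pi*x)
    h = 2.0 * fc * np.sinc(2.0 * fc * n)
    h *= np.hamming(num_taps)  # taper the truncated response (assumed window)
    return h / h.sum()  # normalize to unit gain at 0 Hz


def sinc_highpass(cutoff_hz, sample_rate, num_taps=31):
    """Complementary high-pass filter obtained by spectral inversion."""
    h = -sinc_lowpass(cutoff_hz, sample_rate, num_taps)
    h[(num_taps - 1) // 2] += 1.0  # unit impulse minus the low-pass filter
    return h


def merge_harmonic_noise(harmonic, noise, mvf_hz, sample_rate, num_taps=31):
    """Keep the harmonic branch below mvf_hz and the noise branch above it."""
    lp = sinc_lowpass(mvf_hz, sample_rate, num_taps)
    hp = sinc_highpass(mvf_hz, sample_rate, num_taps)
    return (np.convolve(harmonic, lp, mode="same")
            + np.convolve(noise, hp, mode="same"))
```

Because the two filters sum to a unit impulse, passing the same signal through both branches and adding the results returns that signal unchanged. Raising or lowering `mvf_hz` shifts the boundary between the harmonic and noise bands.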
Simplified NSF (s-NSF) and 1st Harmonic-plus-noise NSF model (hn-NSF)
This work introduces two new NSF models:
- s-NSF has simplified neural filter blocks
- hn-NSF combines harmonic-plus-noise modeling with s-NSF
s-NSF and hn-NSF are faster than b-NSF, and hn-NSF outperformed both s-NSF and b-NSF.
The network structures, which are not fully described in the ICASSP 2019 paper, are explained in detail.
Baseline NSF model (b-NSF)
The first NSF model. It generates high-quality synthetic speech waveforms much faster than WaveNet.