Home page of neural source-filter waveform models

Authors: Xin Wang, Shinji Takaki, Junichi Yamagishi

This is the home page for our recent work on neural source-filter (NSF) models.
If you have any comments or questions, please send an email to wangxin ~a~t~ nii ~dot~ ac ~dot~ jp.

PyTorch project for NSF models

Here is the PyTorch re-implementation of the NSF models:
https://github.com/nii-yamagishilab/project-NN-Pytorch-scripts
Pre-trained CMU-arctic models are included.

Using cyclic noise as source signals for NSF-based speech waveform modeling

We explored different types of source signals for speech modeling.
Cyclic noise worked best for both female and male speakers (from the CMU-ARCTIC databases). In addition to CMU-ARCTIC, we also trained models on the VCTK corpus, which contains more speakers.
Although the VCTK models were not evaluated in the listening test of this paper, samples are uploaded here.
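To make the idea of a cyclic-noise source concrete, here is a minimal NumPy sketch: in voiced regions, a fresh noise segment with a decaying envelope is emitted once per pitch cycle, and plain Gaussian noise is used elsewhere. This is an illustrative simplification, not the exact parameterization used in the paper; the function name and the decay rate `beta` are our own choices.

```python
import numpy as np

def cyclic_noise(f0, sr=16000, beta=0.05):
    """Illustrative cyclic-noise source signal.
    f0: per-sample fundamental frequency in Hz (0 means unvoiced)."""
    out = np.zeros(len(f0))
    t = 0
    while t < len(f0):
        if f0[t] > 0:
            period = int(sr / f0[t])            # samples in one pitch cycle
            n = min(period, len(f0) - t)
            seg = np.random.randn(n)            # fresh noise each cycle
            env = np.exp(-beta * np.arange(n))  # decaying envelope
            out[t:t + n] = seg * env
            t += n
        else:
            out[t] = np.random.randn()          # plain noise when unvoiced
            t += 1
    return out
```

Because each pitch cycle repeats the same envelope shape but with new noise, the signal has a clear periodicity in its energy contour without the strong harmonic structure of a sine-based source.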

Applying hn-sinc-NSF to music signal generation

We can train NSF models to generate waveforms for brass, string, and woodwind instruments.

Harmonic-plus-noise NSF model with trainable Maximum Voiced Frequency (hn-sinc-NSF)

This new model enhances hn-NSF with a trainable maximum voiced frequency and sinc-based high-pass/low-pass FIR filters.
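For reference, a windowed-sinc low-pass FIR filter of the kind used to split the harmonic and noise bands can be sketched as below, with the cutoff standing in for a predicted maximum voiced frequency. This is textbook DSP, not the model's exact trainable formulation; the tap count and Hamming window are illustrative assumptions.

```python
import numpy as np

def sinc_lowpass(cutoff_hz, sr=16000, taps=65):
    """Windowed-sinc low-pass FIR impulse response (illustrative)."""
    fc = cutoff_hz / sr                    # normalized cutoff, cycles/sample
    n = np.arange(taps) - (taps - 1) / 2   # symmetric tap indices
    h = 2 * fc * np.sinc(2 * fc * n)       # ideal low-pass impulse response
    h *= np.hamming(taps)                  # window to reduce ripple
    return h / h.sum()                     # normalize DC gain to 1
```

The matching high-pass filter is the spectral complement (a unit impulse at the center tap minus the low-pass response), so the two bands sum back to the original signal.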

Simplified NSF (s-NSF) and first harmonic-plus-noise NSF model (hn-NSF)

This work introduces two new NSF models:
  1. s-NSF has simplified neural filter blocks
  2. hn-NSF combines harmonic-plus-noise modeling with s-NSF
Both s-NSF and hn-NSF are faster than b-NSF, and hn-NSF outperformed both s-NSF and b-NSF.
Network structures, which are not fully described in the ICASSP 2019 paper, are explained in detail.
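The harmonic branch of these models is driven by a sine-based source computed from F0. A minimal sketch of such a source, assuming a sum of harmonics in voiced frames plus low-level Gaussian noise, is shown below; the harmonic count and noise level are illustrative, not the paper's settings.

```python
import numpy as np

def sine_source(f0, sr=16000, num_harmonics=8, noise_std=0.003):
    """Illustrative harmonic source: sines at multiples of F0 in voiced
    regions, plus Gaussian noise (a simplification of the NSF source)."""
    phase = 2 * np.pi * np.cumsum(f0 / sr)   # instantaneous phase from F0
    voiced = (f0 > 0).astype(float)
    sig = sum(np.sin((k + 1) * phase) for k in range(num_harmonics))
    sig = sig / num_harmonics                # keep amplitude bounded
    return voiced * sig + noise_std * np.random.randn(len(f0))
```

In hn-NSF this harmonic excitation feeds the low-frequency branch while a noise excitation feeds the high-frequency branch, and the two filtered outputs are summed.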

Baseline NSF model (b-NSF)

The first NSF model. It generates high-quality speech waveforms at a much faster speed than WaveNet.