.. samples-xin documentation master file, created by
   sphinx-quickstart on Sun Apr 25 22:58:24 2021.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.
   
Home page of NSF model
**********************
*By Xin Wang, Shinji Takaki, Junichi Yamagishi*


.. image:: fig/fig_timeline.png
    :scale: 70%
    :alt: No image found

This is the home site of the neural source-filter (NSF) waveform models.

This site hosts the sample pages of all the NSF models illustrated in the figure.
The latest work comes first. If you have any questions or suggestions, please send an email to wangxin ~a~t~ nii ~dot~ ac ~dot~ jp.

Links:
    * Sample pages of other projects: `github page <https://nii-yamagishilab.github.io/samples-xin/>`__
    
    * Home page of `Xin Wang on github <http://tonywangx.github.io/>`__

|

PyTorch project
===============

Here is the PyTorch re-implementation of the NSF models:

    https://github.com/nii-yamagishilab/project-NN-Pytorch-scripts

Notes:

    * This repository includes cyclic-noise-NSF, hn-sinc-NSF, and hn-NSF;
   
    * A demo script and models pre-trained on the CMU_ARCTIC database are included;

    * Tutorials (Jupyter notebooks) are available in this `tutorial <https://github.com/nii-yamagishilab/project-NN-Pytorch-scripts/tree/master/tutorials>`_ sub-directory;

    * Our NSF papers used the old CURRENNT implementation, not this PyTorch one (see `scripts <https://github.com/nii-yamagishilab/project-CURRENNT-scripts>`_ and `code <https://github.com/nii-yamagishilab/project-CURRENNT-public/>`_).
   
Comments and suggestions are welcome!
  
| 

Cyclic-noise-NSF
================

**Using cyclic noise as source signals for NSF-based speech waveform modeling**

Audio sample pages:

    Samples on CMU_ARCTIC database -> :ref:`label-nsf-v4` 

    Samples on VCTK database -> :ref:`label-nsf-v4_vctk`

Paper link:

    Wang, X. & Yamagishi, J. Using Cyclic Noise as the Source Signal for Neural Source-Filter-Based Speech Waveform Model. in Proc. Interspeech 1992–1996 (ISCA, 2020). `doi:10.21437/Interspeech.2020-1018 <http://dx.doi.org/10.21437/Interspeech.2020-1018>`__


BibTex::
    
    @inproceedings{wang2020cyclic,
    address = {ISCA},
    author = {Wang, Xin and Yamagishi, Junichi},
    booktitle = {Proc. Interspeech},
    doi = {10.21437/Interspeech.2020-1018},
    pages = {1992--1996},
    publisher = {ISCA},
    title = {{Using Cyclic Noise as the Source Signal for Neural Source-Filter-Based Speech Waveform Model}},
    url = {http://www.isca-speech.org/archive/Interspeech{\_}2020/abstracts/1018.html},
    year = {2020}
    }

|

Hn-sinc-NSF for music
=====================
**Transferring neural speech waveform synthesizers to musical instrument sounds generation**

This work applies NSF models to the generation of musical instrument audio. *A single model* can be trained to generate different types of instruments: brass, string, and woodwind.

Audio sample page:

    Audio samples on URMP database -> :ref:`label-neural-music`

Paper:

    Zhao, Y., Wang, X., Juvela, L. & Yamagishi, J. Transferring neural speech waveform synthesizers to musical instrument sounds generation. in Proc. ICASSP 6269–6273 (IEEE, 2020). `doi:10.1109/ICASSP40776.2020.9053047 <https://doi.org/10.1109/ICASSP40776.2020.9053047>`__

BibTex::

    @inproceedings{Zhao2020,
    author = {Zhao, Yi and Wang, Xin and Juvela, Lauri and Yamagishi, Junichi},
    booktitle = {Proc. ICASSP},
    doi = {10.1109/ICASSP40776.2020.9053047},
    pages = {6269--6273},
    title = {{Transferring neural speech waveform synthesizers to musical instrument sounds generation}},
    url = {https://ieeexplore.ieee.org/document/9053047/},
    year = {2020}
    }

|

Hn-sinc-NSF
===========
**Neural Harmonic-plus-Noise Waveform Model with Trainable Maximum Voice Frequency for Text-to-Speech Synthesis**

This new model enhances hn-NSF with a trainable maximum voiced frequency (MVF) and sinc-based high/low-pass FIR filters.
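
As a rough illustration of what a sinc-based low-pass/high-pass FIR pair looks like, the sketch below builds windowed-sinc filters with NumPy. It is not the code used in the paper or in the repository: the function names, the Hamming window, the number of taps, and the fixed 4 kHz cutoff are all assumptions chosen for the example. In the model the cutoff (the MVF) is trainable, whereas here it is simply a constant.

.. code-block:: python

    import numpy as np

    def sinc_lowpass(cutoff, num_taps=31):
        """Windowed-sinc low-pass FIR filter (hypothetical helper).

        cutoff is normalized to the Nyquist frequency, i.e. in (0, 1).
        """
        if num_taps % 2 == 0:
            raise ValueError("num_taps must be odd")
        # Time index centered on the middle tap
        n = np.arange(num_taps) - (num_taps - 1) / 2
        # Ideal low-pass impulse response, truncated to num_taps
        h = cutoff * np.sinc(cutoff * n)
        # Taper with a Hamming window (an assumption of this sketch)
        h *= np.hamming(num_taps)
        # Normalize to unit gain at 0 Hz
        return h / h.sum()

    def sinc_highpass(cutoff, num_taps=31):
        """High-pass FIR obtained by spectral inversion of the low-pass."""
        h = -sinc_lowpass(cutoff, num_taps)
        h[(num_taps - 1) // 2] += 1.0
        return h

    # Illustrative values only: 16 kHz audio with the cutoff fixed at 4 kHz
    sampling_rate = 16000
    mvf_hz = 4000.0
    cutoff = mvf_hz / (sampling_rate / 2)
    lp = sinc_lowpass(cutoff)   # could be applied to a harmonic component
    hp = sinc_highpass(cutoff)  # could be applied to a noise component

Because the high-pass filter is the spectral inverse of the low-pass one, the two sum to a unit impulse, so the pair splits a signal into two complementary bands around the cutoff.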

Audio sample page:

    Audio samples on ATR Ximera F009 voice -> :ref:`label-nsf-v3` 

Paper:

    Wang, X. & Yamagishi, J. Neural Harmonic-plus-Noise Waveform Model with Trainable Maximum Voice Frequency for Text-to-Speech Synthesis. in Proc. SSW 1–6 (ISCA, 2019). `doi:10.21437/SSW.2019-1 <http://dx.doi.org/10.21437/SSW.2019-1>`__

BibTex::

    @inproceedings{Wang2019,
    address = {ISCA},
    author = {Wang, Xin and Yamagishi, Junichi},
    booktitle = {Proc. SSW},
    doi = {10.21437/SSW.2019-1},
    pages = {1--6},
    publisher = {ISCA},
    title = {{Neural Harmonic-plus-Noise Waveform Model with Trainable Maximum Voice Frequency for Text-to-Speech Synthesis}},
    url = {http://www.isca-speech.org/archive/SSW{\_}2019/abstracts/SSW10{\_}O{\_}1-1.html},
    year = {2019}
    }

|

Hn-NSF and s-NSF
================
**Neural Source-Filter Waveform Models for Statistical Parametric Speech Synthesis**

This work introduces two new NSF models: s-NSF with simplified neural filter blocks and hn-NSF with a harmonic-plus-noise structure. It also explains details of NSF that could not fit into the ICASSP paper.

Audio sample page:
    
    Audio samples on ATR Ximera F009 voice -> :ref:`label-nsf-v2`

Paper:

    Wang, X., Takaki, S. & Yamagishi, J. Neural Source-Filter Waveform Models for Statistical Parametric Speech Synthesis. IEEE/ACM Trans. Audio, Speech, Lang. Process. 28, 402–415 (2020), `DOI:10.1109/TASLP.2019.2956145 <https://doi.org/10.1109/TASLP.2019.2956145>`__

BibTex::

    @article{wangNSFall,
    author = {Wang, Xin and Takaki, Shinji and Yamagishi, Junichi},
    doi = {10.1109/TASLP.2019.2956145},
    journal = {IEEE/ACM Transactions on Audio, Speech, and Language Processing},
    pages = {402--415},
    title = {{Neural Source-Filter Waveform Models for Statistical Parametric Speech Synthesis}},
    url = {https://ieeexplore.ieee.org/document/8915761/},
    volume = {28},
    year = {2020}
    }

|

Baseline NSF
============
**Neural source-filter-based waveform model for statistical parametric speech synthesis**

This is the first NSF model, a by-product of an unsuccessful attempt to reproduce Parallel WaveNet. Hence, the model uses dilated CNN blocks similar to those of WaveNet.

Audio sample page:
    
    Audio samples on ATR Ximera F009 and CMU_ARCTIC SLT -> :ref:`label-nsf-v1`

Paper:

    Wang, X., Takaki, S. & Yamagishi, J. Neural source-filter-based waveform model for statistical parametric speech synthesis. in Proc. ICASSP 5916–5920 (2019). `DOI:10.1109/ICASSP.2019.8682298 <https://doi.org/10.1109/ICASSP.2019.8682298>`__

BibTex::

    @inproceedings{wang2018neural,
    author = {Wang, Xin and Takaki, Shinji and Yamagishi, Junichi},
    booktitle = {Proc. ICASSP},
    doi = {10.1109/ICASSP.2019.8682298},
    pages = {5916--5920},
    publisher = {IEEE},
    title = {{Neural Source-filter-based Waveform Model for Statistical Parametric Speech Synthesis}},
    url = {https://ieeexplore.ieee.org/document/8682298/},
    year = {2019}
    }


|

Acknowledgement
===============
This project could not have been done without the help of reviewers, readers, colleagues, and friends:

    * The original `CURRENNT toolkit <http://jmlr.org/papers/v16/weninger15a.html>`__ is the work of Felix Weninger and colleagues, a fantastic toolkit using CUDA/THRUST. It is available on `Sourceforge <https://sourceforge.net/projects/currennt/>`_;

    * `WORLD vocoder <https://github.com/mmorise/World>`__ is the work of Dr. Morise;

    * `CMU_ARCTIC database <http://festvox.org/cmu_arctic>`__ is provided by Language Technologies Institute, Carnegie Mellon University;

    * `URMP database <http://www2.ece.rochester.edu/projects/air/projects/URMP.html>`__ is provided by AIR lab, University of Rochester;

This work can be improved. Your feedback is welcome!


|

Note
====
Audio samples on this website, other than the natural ones, are distributed under a `Creative Commons Attribution Non-Commercial NoDerivatives 4.0 (CC BY-NC-ND 4.0) license <https://creativecommons.org/licenses/by-nc-nd/4.0/>`__.

.. raw:: html
     
     <a rel="license" href="http://creativecommons.org/licenses/by-nc-nd/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-nd/4.0/80x15.png" /></a>

.. toctree::
   :hidden:
   :maxdepth: 1
	      
   Home page of NSF <self>
   Cyc-noise-NSF (cmu) <nsf-v4>
   Cyc-noise-NSF (vctk) <nsf-v4_vctk>
   Music hn-sinc-NSF <neural-music>
   Hn-sinc-NSF <nsf-v3>
   Hn-NSF and s-NSF <nsf-v2>
   Baseline b-NSF <nsf-v1>