Can Knowledge of End-to-End Text-to-Speech Models Improve Neural MIDI-to-Audio Synthesis Systems?

Authors: Xuan Shi, Erica Cooper, Xin Wang, Junichi Yamagishi, Shrikanth Narayanan

Submitted to ICASSP 2023.

Notes:
  • Natural audio samples are from the MAESTRO Dataset V2.0.0. The MAESTRO dataset is made available by Google LLC under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) license. Please cite the following paper if you use the MAESTRO dataset:

    Curtis Hawthorne, Andriy Stasyuk, Adam Roberts, Ian Simon, Cheng-Zhi Anna Huang, Sander Dieleman, Erich Elsen, Jesse Engel, and Douglas Eck. "Enabling Factorized Piano Music Modeling and Generation with the MAESTRO Dataset." In International Conference on Learning Representations, 2019.

  • We used the open-source software Fluidsynth and the commercial software Pianoteq as reference systems. Fluidsynth is distributed under the GNU Lesser General Public License (LGPL).
  • Audio samples on this website other than those from MAESTRO are distributed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) license.

    List of systems


    Audio Samples

    Natural
    Fluidsynth
    Pianoteq
    abs-mfbf-nsfs
    taco-mfbf-nsfs
    abs-mfb-nsfs
    abs-mfb-nsf
    abs-mfb-nsfg
    abs-mfb-hfg
    taco-mfb-nsfs
    taco-mfb-nsf
    taco-mfb-nsfg
    taco-mfb-hfg
    trans-mfb-nsfs
    trans-mfb-nsf
    trans-mfb-nsfg
    trans-mfb-hfg
    joint-nsf
    joint-nsfg
    joint-hfg