Yamagishi Laboratory, National Institute of Informatics, Japan


Associate members

Selected publications and their samples/codes:
"The ASVspoof 2019 database"
Xin Wang, Junichi Yamagishi, Massimiliano Todisco, Hector Delgado, Andreas Nautsch, Nicholas Evans, Md Sahidullah, Ville Vestman, Tomi Kinnunen, Kong Aik Lee, Lauri Juvela, Paavo Alku, Yu-Huai Peng, Hsin-Te Hwang, Yu Tsao, Hsin-Min Wang, Sebastien Le Maguer, Markus Becker, Fergus Henderson, Rob Clark, Yu Zhang, Quan Wang, Ye Jia, Kai Onuma, Koji Mushika, Takashi Kaneda, Yuan Jiang, Li-Juan Liu, Yi-Chiao Wu, Wen-Chin Huang, Tomoki Toda, Kou Tanaka, Hirokazu Kameoka, Ingmar Steiner, Driss Matrouf, Jean-Francois Bonastre, Avashna Govender, Srikanth Ronanki, Jing-Xuan Zhang, Zhen-Hua Ling
Nov. 2019, Submitted to Computer Speech and Language
Preprint, project page

"Modeling of Rakugo Speech and Its Various Speaking Styles: Toward Speech Synthesis That Entertains Audiences"
Shuhei Kato, Yusuke Yasuda, Xin Wang, Erica Cooper, Shinji Takaki, Junichi Yamagishi
Nov. 2019, Submitted to IEEE Access
Preprint, Samples

"Use of a Capsule Network to Detect Fake Images and Videos"
Huy H. Nguyen, Junichi Yamagishi, Isao Echizen
Oct. 2019, Submitted to IEEE Journal of Selected Topics in Signal Processing, Special Issue on Data Driven Media Authentication and Forensics
Preprint, code

"Zero-Shot Multi-Speaker Text-To-Speech with State-of-the-art Neural Speaker Embeddings"
Erica Cooper, Cheng-I Lai, Yusuke Yasuda, Fuming Fang, Xin Wang, Nanxin Chen, Junichi Yamagishi
Oct. 2019, Submitted to ICASSP 2020
Preprint, samples, codes (speaker encoder)

"Transferring neural speech waveform synthesizers to musical instrument sounds generation"
Yi Zhao, Xin Wang, Lauri Juvela, Junichi Yamagishi
Oct. 2019, Submitted to ICASSP 2020
Preprint, samples

"Effect of choice of probability distribution, randomness, and search methods for alignment modeling in sequence-to-sequence text-to-speech synthesis using hard alignment"
Yusuke Yasuda, Xin Wang, Junichi Yamagishi
Oct. 2019, Submitted to ICASSP 2020
Preprint, samples

"Security of Facial Forensics Models Against Adversarial Attacks"
Rong Huang, Fuming Fang, Huy H. Nguyen, Junichi Yamagishi, Isao Echizen
Oct. 2019, Submitted to ICASSP 2020
Preprint, samples

"A Method for Identifying Origin of Digital Images Using a Convolution Neural Network"
Rong Huang, Fuming Fang, Huy H. Nguyen, Junichi Yamagishi, Isao Echizen
Oct. 2019, Submitted to ICASSP 2020
Preprint, samples

"Bootstrapping non-parallel voice conversion from speaker-adaptive text-to-speech"
Hieu-Thi Luong, Junichi Yamagishi
Sept. 2019, Accepted for the IEEE ASRU 2019
Preprint, samples

"Initial investigation of an encoder-decoder end-to-end TTS framework using marginalization of monotonic hard latent alignments"
Yusuke Yasuda, Xin Wang, Junichi Yamagishi
September 2019, the 10th ISCA Speech Synthesis Workshop (SSW10)
Preprint, samples

"Rakugo speech synthesis using segment-to-segment neural transduction and style tokens — toward speech synthesis for entertaining audiences"
Shuhei Kato, Yusuke Yasuda, Xin Wang, Erica Cooper, Shinji Takaki, Junichi Yamagishi
September 2019, the 10th ISCA Speech Synthesis Workshop (SSW10)
PDF, samples

"Neural Harmonic-plus-Noise Waveform Model with Trainable Maximum Voice Frequency for Text-to-Speech Synthesis"
Xin Wang, Junichi Yamagishi
August 2019, the 10th ISCA Speech Synthesis Workshop (SSW10)
Preprint, samples and codes

"Generating Sentiment-Preserving Fake Online Reviews Using Neural Language Models and Their Human- and Machine-based Detection"
David Ifeoluwa Adelani, Haotian Mai, Fuming Fang, Huy H. Nguyen, Junichi Yamagishi, Isao Echizen
July 2019, ArXiv
Preprint

"A Unified Speaker Adaptation Method for Speech Synthesis using Transcribed and Untranscribed Speech with Backpropagation"
Hieu-Thi Luong, Junichi Yamagishi
June 2019, ArXiv
Preprint, samples

"Multi-task Learning For Detecting and Segmenting Manipulated Facial Images and Videos"
Huy H. Nguyen, Fuming Fang, Junichi Yamagishi, Isao Echizen
June 2019, BTAS 2019
Preprint, Demo video, Codes

"Speaker Anonymization Using X-vector and Neural Waveform Models"
Fuming Fang, Xin Wang, Junichi Yamagishi, Isao Echizen, Massimiliano Todisco, Nicholas Evans, Jean-Francois Bonastre
May 2019, the 10th ISCA Speech Synthesis Workshop (SSW10)
Preprint, samples

"Neural source-filter waveform models for statistical parametric speech synthesis"
Xin Wang, Shinji Takaki, Junichi Yamagishi
April 2019, Accepted for IEEE/ACM Transactions on Audio, Speech, and Language Processing
Preprint, samples and codes

"ASVspoof 2019: Future Horizons in Spoofed and Fake Audio Detection"
Massimiliano Todisco, Xin Wang, Ville Vestman, Md Sahidullah, Hector Delgado, Andreas Nautsch, Junichi Yamagishi, Nicholas Evans, Tomi Kinnunen, Kong Aik Lee
April 2019, Interspeech 2019, Graz, Austria
Preprint, challenge website, database

"GELP: GAN-Excited Linear Prediction for Speech Synthesis from Mel-spectrogram"
Lauri Juvela, Bajibabu Bollepalli, Junichi Yamagishi, Paavo Alku
April 2019, Interspeech 2019, Graz, Austria
Preprint, samples

"Joint training framework for text-to-speech and voice conversion using multi-source Tacotron and WaveNet"
Mingyang Zhang, Xin Wang, Fuming Fang, Haizhou Li, Junichi Yamagishi
April 2019, Interspeech 2019, Graz, Austria
Preprint, samples

"MOSNet: Deep Learning based Objective Assessment for Voice Conversion"
Chen-Chou Lo, Szu-Wei Fu, Wen-Chin Huang, Xin Wang, Junichi Yamagishi, Yu Tsao, Hsin-Min Wang
April 2019, Interspeech 2019, Graz, Austria
Preprint, Codes

"Does the Lombard Effect Improve Emotional Communication in Noise? - Analysis of Emotional Speech Acted in Noise -"
Yi Zhao, Atsushi Ando, Shinji Takaki, Junichi Yamagishi, Satoshi Kobashikawa
April 2019, Interspeech 2019, Graz, Austria
Preprint, samples

"Training Multi-Speaker Neural Text-to-Speech Systems using Speaker-Imbalanced Speech Corpora"
Hieu-Thi Luong, Xin Wang, Junichi Yamagishi, Nobuyuki Nishizawa
April 2019, Interspeech 2019, Graz, Austria
Preprint, samples

"Spatio-temporal generative adversarial network for gait anonymization"
Ngoc-Dung T. Tieu, Huy H. Nguyen, Hoang-Quoc Nguyen-Son, Junichi Yamagishi, Isao Echizen
March 2019, Journal of Information Security and Applications
Preprint

"Introduction to Voice Presentation Attack Detection and Recent Advances"
Md Sahidullah, Hector Delgado, Massimiliano Todisco, Tomi Kinnunen, Nicholas Evans, Junichi Yamagishi, Kong-Aik Lee
Jan. 2019, a book chapter in Handbook of Biometric Anti-Spoofing: Presentation Attack Detection (Second Edition)
Preprint

"Neural source-filter-based waveform model for statistical parametric speech synthesis"
Xin Wang, Shinji Takaki, Junichi Yamagishi
Oct. 2018, ICASSP 2019
Preprint, samples and codes

"STFT spectral loss for training a neural speech waveform model"
Shinji Takaki, Toru Nakashika, Xin Wang, Junichi Yamagishi
Oct. 2018, ICASSP 2019
Preprint, samples, codes

"Capsule-Forensics: Using Capsule Networks to Detect Forged Images and Videos"
Huy H. Nguyen, Junichi Yamagishi, Isao Echizen
Oct. 2018, ICASSP 2019
Preprint, Demo video, Codes

"Attentive Filtering Networks for Audio Replay Attack Detection"
Cheng-I Lai, Alberto Abad, Korin Richmond, Junichi Yamagishi, Najim Dehak, Simon King
Oct. 2018, ICASSP 2019
Preprint, Codes

"Investigation of enhanced Tacotron text-to-speech synthesis systems with self-attention for pitch accent language"
Yusuke Yasuda, Xin Wang, Shinji Takaki, Junichi Yamagishi
Oct. 2018, ICASSP 2019
Preprint, Codes (Tacotron with self attention), Codes (Tacotron2)

"Audiovisual speaker conversion: jointly and simultaneously transforming facial expression and acoustic characteristics"
Fuming Fang, Xin Wang, Junichi Yamagishi, Isao Echizen
Oct. 2018, ICASSP 2019
Preprint, samples

"Waveform generation for text-to-speech synthesis using pitch-synchronous multi-scale generative adversarial networks"
Lauri Juvela, Bajibabu Bollepalli, Junichi Yamagishi, Paavo Alku
Oct. 2018, ICASSP 2019
Preprint, samples, codes

"Transforming acoustic characteristics to deceive playback spoofing countermeasures of speaker verification systems"
Fuming Fang, Junichi Yamagishi, Isao Echizen, Md Sahidullah, Tomi Kinnunen
Sept. 2018, WIFS 2018
Paper, Codes

"MesoNet: a Compact Facial Video Forgery Detection Network"
Darius Afchar, Vincent Nozick, Junichi Yamagishi, Isao Echizen
Sept. 2018, WIFS 2018
Paper, Demo video, Codes

PhD Thesis "Fundamental Frequency Modeling for Neural-Network-Based Statistical Parametric Speech Synthesis"
Xin Wang
Sept. 2018, Sokendai University
PhD thesis, Codes

"Multimodal Speech Synthesis Architecture for Unsupervised Speaker Adaptation"
Hieu-Thi Luong, Junichi Yamagishi
Sept. 2018, Interspeech 2018
Paper, Speech samples

"Wasserstein GAN and Waveform Loss-based Acoustic Model Training for Multi-speaker Text-to-Speech Synthesis Systems Using a WaveNet Vocoder"
Yi Zhao, Shinji Takaki, Hieu-Thi Luong, Junichi Yamagishi, Daisuke Saito, Nobuaki Minematsu
July 2018, IEEE Access
Paper, Speech samples

See full publications at
Google Scholar, ResearchGate, Researchmap, or Edinburgh Research Explorer.

Selected tutorials:
"Tutorial on end-to-end text-to-speech synthesis"
Xin Wang, Yusuke Yasuda
Part 1 – Neural waveform modeling (slides) (video)
Part 2 – Tacotron and related end-to-end systems (slides) (video)

See other codes developed by our group at
GitHub

Call for Papers
September 30, 2019, Computer Speech and Language Special Issue on Advances in Automatic Speaker Verification Anti-spoofing
July 1, 2019, Special session at ASRU 2019 - 2019 IEEE Automatic Speech Recognition and Understanding Workshop: ASVspoof 2019: Analysing Operational Settings
May 17, 2019, SSW10 - The 10th ISCA Speech Synthesis Workshop
March 29, 2019, Special session at Interspeech 2019: The 2019 Automatic Speaker Verification Spoofing and Countermeasures Challenge: ASVspoof Challenge
November 15, 2018, CSL Special issue on Speaker and language characterization and recognition: voice modeling, conversion, synthesis and ethical aspects

Call for Participants
ASVspoof 2019 Challenge: Future horizons in spoofed/fake audio detection

New databases
Nov 13, 2019, CSTR VCTK Corpus (version 0.92)
June 4, 2019, ASVspoof 2019: The 3rd Automatic Speaker Verification Spoofing and Countermeasures Challenge database
March 6, 2019, Alba speech corpus (Scottish female speaker, four speaking styles)

Past members of Yamagishi laboratory