Yamagishi Laboratory, National Institute of Informatics, Japan

researchmap
Associate members researchmap

Selected publications and their samples/codes:
"Noise Tokens: Learning Neural Noise Templates for Environment-Aware Speech Enhancement"
Haoyu Li, Junichi Yamagishi
April 2020, Submitted to Interspeech 2020
Preprint, samples

"Using Cyclic Noise as the Source Signal for Neural Source-Filter-based Speech Waveform Model"
Xin Wang, Junichi Yamagishi
April 2020, Submitted to Interspeech 2020
Preprint, code and samples

"iMetricGAN: Intelligibility Enhancement for Speech-in-Noise using Generative Adversarial Network-based Metric Learning"
Haoyu Li, Szu-Wei Fu, Yu Tsao, Junichi Yamagishi
April 2020, Submitted to Interspeech 2020
Preprint, code and pretrained model, samples

"An initial investigation on optimizing tandem speaker verification and countermeasure systems using reinforcement learning"
Anssi Kanervisto, Ville Hautamäki, Tomi Kinnunen, Junichi Yamagishi
Speaker Odyssey 2020
Preprint, code

"ASVspoof 2019: a large-scale public database of synthetic, converted and replayed speech"
Xin Wang, Junichi Yamagishi, Massimiliano Todisco, Hector Delgado, Andreas Nautsch, Nicholas Evans, Md Sahidullah, Ville Vestman, Tomi Kinnunen, Kong Aik Lee, Lauri Juvela, Paavo Alku, Yu-Huai Peng, Hsin-Te Hwang, Yu Tsao, Hsin-Min Wang, Sebastien Le Maguer, Markus Becker, Fergus Henderson, Rob Clark, Yu Zhang, Quan Wang, Ye Jia, Kai Onuma, Koji Mushika, Takashi Kaneda, Yuan Jiang, Li-Juan Liu, Yi-Chiao Wu, Wen-Chin Huang, Tomoki Toda, Kou Tanaka, Hirokazu Kameoka, Ingmar Steiner, Driss Matrouf, Jean-Francois Bonastre, Avashna Govender, Srikanth Ronanki, Jing-Xuan Zhang, Zhen-Hua Ling
Nov. 2019, Submitted to Computer Speech and Language
Preprint, project page, samples

"Generating Sentiment-Preserving Fake Online Reviews Using Neural Language Models and Their Human- and Machine-based Detection"
David Ifeoluwa Adelani, Haotian Mai, Fuming Fang, Huy H. Nguyen, Junichi Yamagishi, Isao Echizen
AINA 2020
Preprint

"Zero-Shot Multi-Speaker Text-To-Speech with State-of-the-art Neural Speaker Embeddings"
Erica Cooper, Cheng-I Lai, Yusuke Yasuda, Fuming Fang, Xin Wang, Nanxin Chen, Junichi Yamagishi
ICASSP 2020
Preprint, samples, codes (speaker encoder), codes (tacotron)

"Transferring neural speech waveform synthesizers to musical instrument sounds generation"
Yi Zhao, Xin Wang, Lauri Juvela, Junichi Yamagishi
ICASSP 2020
Preprint, samples

"Effect of choice of probability distribution, randomness, and search methods for alignment modeling in sequence-to-sequence text-to-speech synthesis using hard alignment"
Yusuke Yasuda, Xin Wang, Junichi Yamagishi
ICASSP 2020
Preprint, samples

"Bootstrapping non-parallel voice conversion from speaker-adaptive text-to-speech"
Hieu-Thi Luong, Junichi Yamagishi
IEEE ASRU 2019
Preprint, samples

"Multi-task Learning For Detecting and Segmenting Manipulated Facial Images and Videos"
Huy H. Nguyen, Fuming Fang, Junichi Yamagishi, Isao Echizen
BTAS 2019
Preprint, Demo video, Codes

"Initial investigation of an encoder-decoder end-to-end TTS framework using marginalization of monotonic hard latent alignments"
Yusuke Yasuda, Xin Wang, Junichi Yamagishi
10th ISCA Speech Synthesis Workshop (SSW10)
Preprint, samples

"Rakugo speech synthesis using segment-to-segment neural transduction and style tokens — toward speech synthesis for entertaining audiences"
Shuhei Kato, Yusuke Yasuda, Xin Wang, Erica Cooper, Shinji Takaki, Junichi Yamagishi
10th ISCA Speech Synthesis Workshop (SSW10)
Preprint, samples

"Neural Harmonic-plus-Noise Waveform Model with Trainable Maximum Voice Frequency for Text-to-Speech Synthesis"
Xin Wang, Junichi Yamagishi
10th ISCA Speech Synthesis Workshop (SSW10)
Preprint, samples and codes

"Speaker Anonymization Using X-vector and Neural Waveform Models"
Fuming Fang, Xin Wang, Junichi Yamagishi, Isao Echizen, Massimiliano Todisco, Nicholas Evans, Jean-Francois Bonastre
10th ISCA Speech Synthesis Workshop (SSW10)
Preprint, samples

"Neural source-filter waveform models for statistical parametric speech synthesis"
Xin Wang, Shinji Takaki, Junichi Yamagishi
IEEE/ACM Transactions on Audio, Speech, and Language Processing
Preprint, samples and codes

"ASVspoof 2019: Future Horizons in Spoofed and Fake Audio Detection"
Massimiliano Todisco, Xin Wang, Ville Vestman, Md Sahidullah, Hector Delgado, Andreas Nautsch, Junichi Yamagishi, Nicholas Evans, Tomi Kinnunen, Kong Aik Lee
Interspeech 2019
Preprint, challenge website, database

"GELP: GAN-Excited Liner Prediction for Speech Synthesis from Mel-spectrogram"
Lauri Juvela, Bajibabu Bollepalli, Junichi Yamagishi, Paavo Alku
Interspeech 2019
Preprint, samples

"Joint training framework for text-to-speech and voice conversion using multi-source Tacotron and WaveNet"
Mingyang Zhang, Xin Wang, Fuming Fang, Haizhou Li, Junichi Yamagishi
Interspeech 2019
Preprint, samples

"MOSNet: Deep Learning based Objective Assessment for Voice Conversion"
Chen-Chou Lo, Szu-Wei Fu, Wen-Chin Huang, Xin Wang, Junichi Yamagishi, Yu Tsao, Hsin-Min Wang
Interspeech 2019
Preprint, Codes

"Does the Lombard Effect Improve Emotional Communication in Noise? - Analysis of Emotional Speech Acted in Noise -"
Yi Zhao, Atsushi Ando, Shinji Takaki, Junichi Yamagishi, Satoshi Kobashikawa
Interspeech 2019
Preprint, samples

"Training Multi-Speaker Neural Text-to-Speech Systems using Speaker-Imbalanced Speech Corpora"
Hieu-Thi Luong, Xin Wang, Junichi Yamagishi, Nobuyuki Nishizawa
Interspeech 2019
Preprint, samples

"Spatio-temporal generative adversarial network for gait anonymization"
Ngoc-Dung T. Tieu, Huy H. Nguyen, Hoang-Quoc Nguyen-Son, Junichi Yamagishi, Isao Echizen
Journal of Information Security and Applications
Preprint

"Introduction to Voice Presentation Attack Detection and Recent Advances"
Md Sahidullah, Hector Delgado, Massimiliano Todisco, Tomi Kinnunen, Nicholas Evans, Junichi Yamagishi, Kong-Aik Lee
Book chapter in Handbook of Biometric Anti-Spoofing: Presentation Attack Detection (Second Edition)
Preprint

"Neural source-filter-based waveform model for statistical parametric speech synthesis"
Xin Wang, Shinji Takaki, Junichi Yamagishi
ICASSP 2019
Preprint, samples and codes

"STFT spectral loss for training a neural speech waveform model"
Shinji Takaki, Toru Nakashika, Xin Wang, Junichi Yamagishi
ICASSP 2019
Preprint, samples, codes

"Capsule-Forensics: Using Capsule Networks to Detect Forged Images and Videos"
Huy H. Nguyen, Junichi Yamagishi, Isao Echizen
ICASSP 2019
Preprint, Demo video, Codes

"Attentive Filtering Networks for Audio Replay Attack Detection"
Cheng-I Lai, Alberto Abad, Korin Richmond, Junichi Yamagishi, Najim Dehak, Simon King
ICASSP 2019
Preprint, Codes

"Investigation of enhanced Tacotron text-to-speech synthesis systems with self-attention for pitch accent language"
Yusuke Yasuda, Xin Wang, Shinji Takaki, Junichi Yamagishi
ICASSP 2019
Preprint, Codes (Tacotron with self attention), Codes (Tacotron2)

"Audiovisual speaker conversion: jointly and simultaneously transforming facial expression and acoustic characteristics"
Fuming Fang, Xin Wang, Junichi Yamagishi, Isao Echizen
ICASSP 2019
Preprint, samples

"Waveform generation for text-to-speech synthesis using pitch-synchronous multi-scale generative adversarial networks"
Lauri Juvela, Bajibabu Bollepalli, Junichi Yamagishi, Paavo Alku
ICASSP 2019
Preprint, samples, codes

See full publications at
Google Scholar, ResearchGate, Researchmap, or Edinburgh Research Explorer.

Selected tutorials:
"Tutorial on end-to-end text-to-speech synthesis"
Xin Wang, Yusuke Yasuda
Part 1 – Neural waveform modeling (slides) (video)
Part 2 – Tacotron and related end-to-end systems (slides) (video)

See other codes developed by our group at
GitHub

Call for Papers
September 30, 2019, Computer Speech and Language special issue on Advances in Automatic Speaker Verification Anti-spoofing
July 1, 2019, Special session at ASRU 2019 - 2019 IEEE Automatic Speech Recognition and Understanding Workshop: ASVspoof 2019: Analysing Operational Settings
May 17, 2019, SSW10 - The 10th ISCA Speech Synthesis Workshop
March 29, 2019, Special session at Interspeech 2019: The 2019 Automatic Speaker Verification Spoofing and Countermeasures Challenge: ASVspoof Challenge
November 15, 2018, CSL Special issue on Speaker and language characterization and recognition: voice modeling, conversion, synthesis and ethical aspects

Call for Participants
Call for Participants: VoicePrivacy Challenge 2020
Call for Participants: Voice Conversion Challenge 2020
Call for Participants: ASVspoof 2019 CHALLENGE: Future horizons in spoofed/fake audio detection

New databases
Nov 13, 2019, CSTR VCTK Corpus (version 0.92)
June 4, 2019, ASVspoof 2019: The 3rd Automatic Speaker Verification Spoofing and Countermeasures Challenge database
March 6, 2019, Alba speech corpus (Scottish female speaker, four speaking styles)

Past members
researchmap