Xin Wang

Project Assistant Professor

National Institute of Informatics

Latest

A Multi-Level Attention Model for Evidence-Based Fact Checking
A Comparative Study on Recent Neural Spoofing Countermeasures for Synthetic Speech Detection
An Initial Investigation for Detecting Partially Spoofed Audio
ASVspoof 2019: Spoofing Countermeasures for the Detection of Synthesized, Converted and Replayed Speech
ASVspoof 2021: accelerating progress in spoofed and deepfake speech detection
How Similar or Different is Rakugo Speech Synthesizer to Professional Performers?
Multi-Task Learning in Utterance-Level and Segmental-Level Spoof Detection
Text-to-Speech Synthesis Techniques for MIDI-to-Audio Synthesis
Visualizing Classifier Adjacency Relations: A Case Study in Speaker Verification and Voice Anti-Spoofing
ASVspoof 2019: A large-scale public database of synthesized, converted and replayed speech
Design Choices for X-Vector Based Speaker Anonymization
Introducing the VoicePrivacy Initiative
Modeling of Rakugo Speech and Its Limitations: Toward Speech Synthesis That Entertains Audiences
Neural Source-Filter Waveform Models for Statistical Parametric Speech Synthesis
Tandem Assessment of Spoofing Countermeasures and Automatic Speaker Verification: Fundamentals
Transferring Neural Speech Waveform Synthesizers to Musical Instrument Sounds Generation
Using Cyclic Noise as the Source Signal for Neural Source-Filter-Based Speech Waveform Model
Zero-Shot Multi-Speaker Text-To-Speech with State-Of-The-Art Neural Speaker Embeddings
ASVspoof 2019: Future Horizons in Spoofed and Fake Audio Detection
Investigation of Enhanced Tacotron Text-to-speech Synthesis Systems with Self-attention for Pitch Accent Language
Joint Training Framework for Text-to-Speech and Voice Conversion Using Multi-Source Tacotron and WaveNet
MOSNet: Deep Learning based Objective Assessment for Voice Conversion
Neural Harmonic-plus-Noise Waveform Model with Trainable Maximum Voice Frequency for Text-to-Speech Synthesis
Neural Source-filter-based Waveform Model for Statistical Parametric Speech Synthesis
Rakugo speech synthesis using segment-to-segment neural transduction and style tokens: toward speech synthesis for entertaining audiences
Speaker Anonymization Using X-vector and Neural Waveform Models