multilingual-SSL-SAS-samples

Audio samples from the paper "Mitigating Language Mismatch in SSL-Based Speaker Anonymization"

Authors: Zhe Zhang, Wen-Chin Huang, Xin Wang, Xiaoxiao Miao, and Junichi Yamagishi

Presented at Interspeech 2025

Abstract

Speaker anonymization aims to protect speaker identity while preserving content information and the intelligibility of speech. However, most speaker anonymization systems (SASs) are developed and evaluated using only English, resulting in degraded utility for other languages. This paper investigates language mismatch in SASs for Japanese and Mandarin speech. First, we fine-tune a self-supervised learning (SSL)-based content encoder with Japanese speech to verify effective language adaptation. Then, we propose fine-tuning a multilingual SSL model with Japanese speech and evaluating the SAS in Japanese and Mandarin. Downstream experiments show that fine-tuning an English-only SSL model with the target language enhances intelligibility while maintaining privacy and that multilingual SSL further extends SASs’ utility across different languages. These findings highlight the importance of language adaptation and multilingual pre-training of SSLs for robust multilingual speaker anonymization.

Source code is released in the GitHub repository.


This page provides audio samples from our speaker anonymization experiments. Samples are provided in two languages: Japanese (JVS) and Mandarin (AISHELL-3).

For each utterance, we first list the original recording and the output of VPC Baseline B2 (McAdams).

Then, the table below shows the results for the three SSL-based methods: HU-EN, HU-JA, and mHU-JA.

The samples for each method are grouped by column: Resynthesis, or one of the two speaker anonymizers, Selection and OHNN.


Japanese (JVS) Samples

Utterance: jvs002_nonparallel_UT-PARAPHRASE-sent212-phrase1.wav

Original:

VPC Baseline B2 (McAdams):

Method Group Resynthesis Selection OHNN
HU-EN
HU-JA
mHU-JA

Utterance: jvs028_parallel_VOICEACTRESS100_059.wav

Original:

VPC Baseline B2 (McAdams):

Method Group Resynthesis Selection OHNN
HU-EN
HU-JA
mHU-JA

Utterance: jvs061_nonparallel_BASIC5000_1263.wav

Original:

VPC Baseline B2 (McAdams):

Method Group Resynthesis Selection OHNN
HU-EN
HU-JA
mHU-JA

Mandarin (AISHELL-3) Samples

Utterance: SSB08220393.wav

Original:

VPC Baseline B2 (McAdams):

Method Group Resynthesis Selection OHNN
HU-EN
HU-JA
mHU-JA

Utterance: SSB12390116.wav

Original:

VPC Baseline B2 (McAdams):

Method Group Resynthesis Selection OHNN
HU-EN
HU-JA
mHU-JA

Utterance: SSB18720153.wav

Original:

VPC Baseline B2 (McAdams):

Method Group Resynthesis Selection OHNN
HU-EN
HU-JA
mHU-JA

References

VoicePrivacy Challenge 2024

Speaker anonymisation using the McAdams coefficient

Language-Independent Speaker Anonymization Approach using Self-Supervised Pre-Trained Models

Speaker anonymization using orthogonal Householder neural network