Abstract of "Language-independent Speaker Anonymization Approach using Self-supervised Pre-trained Models":
Speaker anonymization aims to protect the privacy of speakers while preserving the linguistic content of their speech. The current mainstream neural-network speaker anonymization system is complicated, comprising an F0 extractor, a speaker encoder, an automatic speech recognition acoustic model (ASR AM), a speech synthesis acoustic model (SS AM), and a speech waveform generation model.
Moreover, the ASR AM is language-dependent: trained on English data, it is hard to adapt to other languages.
In this paper, we propose a simpler method for language-independent speaker anonymization based on self-supervised learning (SSL). It requires no explicit language-dependent model and can easily be applied to other languages.
Extensive experiments are conducted on the VoicePrivacy Challenge (VPC) 2020 datasets in English and the AISHELL-3 dataset in Mandarin to demonstrate the effectiveness of the proposed SSL-based language-independent speaker anonymization method.
Abstract of "Analyzing Language-Independent Speaker Anonymization Framework under Unseen Conditions":
In our previous work, we proposed a language-independent speaker anonymization system based on self-supervised learning models. Although the system can anonymize speech data of any language, the anonymization was imperfect, and the speech content of the anonymized speech was distorted. This limitation is more severe when the input speech is from a domain unseen in the training data.
This study analyzed the bottleneck of the anonymization system under unseen conditions. It was found that the domain mismatch (e.g., in language and channel) between the training and test data affected the neural waveform vocoder and the anonymized speaker vectors, which limited the performance of the whole system. Increasing the diversity of the vocoder's training data was found to be helpful in reducing its implicit language and channel dependency. Furthermore, a simple correlation-alignment-based domain adaptation strategy was found to be highly effective in alleviating the mismatch in the anonymized speaker vectors.
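The correlation-alignment-based domain adaptation mentioned above can be illustrated with a minimal NumPy sketch of standard CORAL, which re-colors source-domain speaker vectors so that their second-order statistics match the target domain. This is a generic illustration of the technique, not the paper's exact recipe; function names and parameters here are our own.

```python
import numpy as np

def sym_matrix_power(C, p):
    """Fractional power of a symmetric positive (semi-)definite matrix
    via eigendecomposition, clipping tiny eigenvalues for stability."""
    w, V = np.linalg.eigh(C)
    w = np.clip(w, 1e-12, None)
    return (V * w**p) @ V.T

def coral_align(source, target, eps=1e-5):
    """CORAL domain adaptation: whiten source features with their own
    covariance, then re-color them with the target covariance, so that
    cov(aligned) ~= cov(target)."""
    d = source.shape[1]
    Cs = np.cov(source, rowvar=False) + eps * np.eye(d)  # regularized source cov
    Ct = np.cov(target, rowvar=False) + eps * np.eye(d)  # regularized target cov
    return source @ sym_matrix_power(Cs, -0.5) @ sym_matrix_power(Ct, 0.5)
```

In the anonymization setting, `source` would hold anonymized speaker vectors from the mismatched domain and `target` a set of vectors from the training-data domain.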
Abstract of "Language-independent speaker anonymization using orthogonal Householder neural network":
Speaker anonymization aims to conceal a speaker's identity while preserving content information in speech.
Current mainstream neural-network speaker anonymization systems disentangle speech into prosody-related, content, and speaker representations.
The speaker representation is then anonymized by a selection-based speaker anonymizer that uses a mean vector over a set of randomly selected speaker vectors from an external pool of English speakers.
However, the resulting anonymized vectors suffer from severe privacy leakage against powerful attackers, reduced speaker diversity, and language mismatch when anonymizing speakers of unseen languages.
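The selection-based anonymizer described above can be sketched in a few lines of NumPy: average a randomly selected subset of speaker vectors from an external pool. This is a simplified illustration (the actual VPC baseline additionally pre-selects far-away vectors by a distance measure); the function name and parameters are our own.

```python
import numpy as np

def selection_based_anonymize(pool, n_select=10, rng=None):
    """Simplified selection-based anonymizer: return the length-normalized
    mean of n_select randomly chosen speaker vectors from an external pool."""
    if rng is None:
        rng = np.random.default_rng()
    idx = rng.choice(len(pool), size=n_select, replace=False)
    anon = pool[idx].mean(axis=0)
    return anon / np.linalg.norm(anon)
```

Because every anonymized vector is an average over pool members, different source speakers can be mapped to similar pseudo-speakers, which is one source of the reduced speaker diversity noted above.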
To generate diverse, language-neutral speaker vectors, this paper proposes an anonymizer based on an orthogonal Householder neural network (OHNN).
Specifically, the OHNN acts like a rotation to transform the original speaker vectors into anonymized speaker vectors, which are constrained to follow the distribution over the original speaker vector space.
A basic classification loss is introduced to ensure that anonymized speaker vectors from different speakers have unique speaker identities.
To further protect speaker identities, an improved classification loss and similarity loss are used to push original-anonymized sample pairs away from each other.
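The rotation-like behavior of the OHNN can be illustrated with the basic building block it is named after: a Householder reflection, which is orthogonal and therefore preserves vector norms, and an even number of which composes to a rotation. The sketch below shows only this linear-algebra core, not the trained network or its losses; function names are our own.

```python
import numpy as np

def householder_matrix(v):
    """Householder reflection H = I - 2 vv^T / ||v||^2 (orthogonal, det = -1)."""
    v = v / np.linalg.norm(v)
    return np.eye(len(v)) - 2.0 * np.outer(v, v)

def householder_chain(x, vectors):
    """Apply a chain of Householder reflections to a speaker vector x.
    An even number of reflections composes to a rotation (det = +1),
    so the transformed vector keeps the norm of the original."""
    for v in vectors:
        x = householder_matrix(v) @ x
    return x
```

Because each reflection is orthogonal, the anonymized vector stays on the same hypersphere as the original, consistent with the constraint that anonymized vectors follow the distribution over the original speaker vector space.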
Experiments on VoicePrivacy Challenge datasets in English and the AISHELL-3 dataset in Mandarin demonstrate the proposed anonymizer's effectiveness.