Abstract of "Language-independent Speaker Anonymization Approach using Self-supervised Pre-trained Models":
Speaker anonymization aims to protect the privacy of speakers while preserving the linguistic content of their speech. The current mainstream neural-network speaker anonymization system is complicated, comprising an F0 extractor, a speaker encoder, an automatic speech recognition acoustic model (ASR AM), a speech synthesis acoustic model (SS AM), and a speech waveform generation model.
Moreover, the ASR AM is language-dependent: trained on English data, it is hard to adapt to other languages.
In this paper, we propose a simpler method for language-independent speaker anonymization based on self-supervised learning (SSL). It requires no explicit language-dependent model and can easily be applied to other languages.
Extensive experiments are conducted on the VoicePrivacy Challenge (VPC) 2020 datasets in English and the AISHELL-3 dataset in Mandarin to demonstrate the effectiveness of the proposed SSL-based language-independent speaker anonymization method.
Abstract of "Analyzing Language-Independent Speaker Anonymization Framework under Unseen Conditions":
In our previous work, we proposed a language-independent speaker anonymization system based on self-supervised learning models. Although the system can anonymize speech data of any language, the anonymization was imperfect, and the speech content of the anonymized speech was distorted. This limitation is more severe when the input speech is from a domain unseen in the training data.
This study analyzed the bottleneck of the anonymization system under unseen conditions. It was found that the domain mismatch (e.g., in language and channel) between the training and test data affected the neural waveform vocoder and the anonymized speaker vectors, which limited the performance of the whole system. Increasing the diversity of the vocoder's training data was found to be helpful in reducing its implicit language and channel dependency. Furthermore, a simple correlation-alignment-based domain adaptation strategy was found to be highly effective in alleviating the mismatch in the anonymized speaker vectors.
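The correlation-alignment-based domain adaptation mentioned above can be illustrated with a minimal NumPy sketch of standard CORAL, which re-colors source-domain speaker vectors so that their second-order statistics match the target domain. This is a generic illustration of the technique, not the paper's exact recipe; function names and parameters here are our own.

```python
import numpy as np

def sym_matrix_power(C, p):
    """Fractional power of a symmetric positive (semi-)definite matrix
    via eigendecomposition, clipping tiny eigenvalues for stability."""
    w, V = np.linalg.eigh(C)
    w = np.clip(w, 1e-12, None)
    return (V * w**p) @ V.T

def coral_align(source, target, eps=1e-5):
    """CORAL domain adaptation: whiten source features with their own
    covariance, then re-color them with the target covariance, so that
    cov(aligned) ~= cov(target)."""
    d = source.shape[1]
    Cs = np.cov(source, rowvar=False) + eps * np.eye(d)  # regularized source cov
    Ct = np.cov(target, rowvar=False) + eps * np.eye(d)  # regularized target cov
    return source @ sym_matrix_power(Cs, -0.5) @ sym_matrix_power(Ct, 0.5)
```

In the anonymization setting, `source` would hold anonymized speaker vectors from the mismatched domain and `target` a set of vectors from the training-data domain.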
Abstract of "Language-independent speaker anonymization using orthogonal Householder neural network":
Speaker anonymization aims to conceal a speaker's identity while preserving content information in speech.
Current mainstream neural-network speaker anonymization systems disentangle speech into prosody-related, content, and speaker representations.
The speaker representation is then anonymized by a selection-based speaker anonymizer that uses a mean vector over a set of randomly selected speaker vectors from an external pool of English speakers.
However, the resulting anonymized vectors suffer from severe privacy leakage against powerful attackers, reduced speaker diversity, and language mismatch when anonymizing speakers of unseen languages.
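The selection-based anonymizer described above can be sketched in a few lines of NumPy: average a randomly selected subset of speaker vectors from an external pool. This is a simplified illustration (the actual VPC baseline additionally pre-selects far-away vectors by a distance measure); the function name and parameters are our own.

```python
import numpy as np

def selection_based_anonymize(pool, n_select=10, rng=None):
    """Simplified selection-based anonymizer: return the length-normalized
    mean of n_select randomly chosen speaker vectors from an external pool."""
    if rng is None:
        rng = np.random.default_rng()
    idx = rng.choice(len(pool), size=n_select, replace=False)
    anon = pool[idx].mean(axis=0)
    return anon / np.linalg.norm(anon)
```

Because every anonymized vector is an average over pool members, different source speakers can be mapped to similar pseudo-speakers, which is one source of the reduced speaker diversity noted above.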
To generate diverse, language-neutral speaker vectors, this paper proposes an anonymizer based on an orthogonal Householder neural network (OHNN).
Specifically, the OHNN acts like a rotation to transform the original speaker vectors into anonymized speaker vectors, which are constrained to follow the distribution over the original speaker vector space.
A basic classification loss is introduced to ensure that anonymized speaker vectors from different speakers have unique speaker identities.
To further protect speaker identities, an improved classification loss and similarity loss are used to push original-anonymized sample pairs away from each other.
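The rotation-like behavior of the OHNN can be illustrated with the basic building block it is named after: a Householder reflection, which is orthogonal and therefore preserves vector norms, and an even number of which composes to a rotation. The sketch below shows only this linear-algebra core, not the trained network or its losses; function names are our own.

```python
import numpy as np

def householder_matrix(v):
    """Householder reflection H = I - 2 vv^T / ||v||^2 (orthogonal, det = -1)."""
    v = v / np.linalg.norm(v)
    return np.eye(len(v)) - 2.0 * np.outer(v, v)

def householder_chain(x, vectors):
    """Apply a chain of Householder reflections to a speaker vector x.
    An even number of reflections composes to a rotation (det = +1),
    so the transformed vector keeps the norm of the original."""
    for v in vectors:
        x = householder_matrix(v) @ x
    return x
```

Because each reflection is orthogonal, the anonymized vector stays on the same hypersphere as the original, consistent with the constraint that anonymized vectors follow the distribution over the original speaker vector space.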
Experiments on VoicePrivacy Challenge datasets in English and the AISHELL-3 dataset in Mandarin demonstrate the proposed anonymizer's effectiveness.