Towards a unified assessment framework of speech pseudonymisation

Abstract

Anonymisation and pseudonymisation are two similar concepts used in privacy preservation for speech data. With no established definitions for these tasks, nor standard approaches to assessment, this paper provides definitions and presents two complementary assessment frameworks. The first is based on voice similarity matrices, which provide both an immediate visualisation of privacy protection performance at the speaker level and two objective measures: de-identification and voice distinctiveness preservation. The approach readily highlights imbalances in system performance at the speaker level. The second, referred to as the zero evidence biometric recognition assessment (ZEBRA) framework, is based on information theory and measures the amount of private information disclosed in speech data. The paper also presents an extension to the original ZEBRA framework, which aims to reflect the robustness of the privacy safeguard when a privacy adversary adapts to the protected speech. We demonstrate the application of both frameworks by assessing the pseudonymisation performance of the two VoicePrivacy 2020 challenge baseline solutions and a third solution. The two frameworks were designed independently of each other: the ZEBRA framework is fully consistent with Bayesian decision theory, whereas the other framework focuses on speaker-wise visualisations of system performance. Thus, while the metrics derived from them bear similarities, they expose differences in safeguard behaviour. The assessment of pseudonymisation remains challenging and merits greater attention in the future.
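The abstract names two speaker-level measures derived from voice similarity matrices without defining them. The following is a minimal sketch under stated assumptions, not the paper's implementation: each matrix entry is assumed to be the logistic sigmoid of the mean log-likelihood ratio (LLR) between two speakers' utterances, with the LLRs coming from some external speaker verification system; "diagonal dominance" is assumed to be the absolute gap between the mean diagonal and mean off-diagonal entries; de-identification (DeID) compares the original-vs-pseudonymised matrix M_op against the original-vs-original matrix M_oo; and voice distinctiveness gain (G_VD, in dB) compares the pseudonymised-vs-pseudonymised matrix M_pp against M_oo. All function names here are hypothetical.

```python
import numpy as np


def similarity_matrix(llr: np.ndarray, spk_a: np.ndarray, spk_b: np.ndarray,
                      n_speakers: int, same_set: bool = False) -> np.ndarray:
    """Speaker-level voice similarity matrix from utterance-level LLR scores.

    llr[k, l] is the LLR comparing utterance k of set A with utterance l of set B;
    spk_a / spk_b give the speaker index (0..n_speakers-1) of each utterance.
    Assumption: each entry is the sigmoid of the mean LLR over utterance pairs.
    """
    m = np.zeros((n_speakers, n_speakers))
    for i in range(n_speakers):
        for j in range(n_speakers):
            # Select the LLRs between speaker i's utterances (A) and speaker j's (B).
            block = llr[np.ix_(spk_a == i, spk_b == j)]
            if same_set and i == j:
                # When A and B are the same utterance list, drop the trivial
                # comparisons of an utterance with itself (needs >= 2 utterances).
                block = block[~np.eye(block.shape[0], dtype=bool)]
            m[i, j] = 1.0 / (1.0 + np.exp(-block.mean()))
    return m


def diagonal_dominance(m: np.ndarray) -> float:
    """Absolute difference between mean diagonal and mean off-diagonal entries."""
    n = m.shape[0]
    diag_mean = np.trace(m) / n
    off_mean = (m.sum() - np.trace(m)) / (n * n - n)
    return float(abs(diag_mean - off_mean))


def de_identification(m_oo: np.ndarray, m_op: np.ndarray) -> float:
    """Assumed DeID: 1 - D_diag(M_op) / D_diag(M_oo); 1 means full de-identification."""
    return 1.0 - diagonal_dominance(m_op) / diagonal_dominance(m_oo)


def voice_distinctiveness_gain(m_oo: np.ndarray, m_pp: np.ndarray) -> float:
    """Assumed G_VD in dB: 10 * log10(D_diag(M_pp) / D_diag(M_oo))."""
    return 10.0 * np.log10(diagonal_dominance(m_pp) / diagonal_dominance(m_oo))


# Toy matrices: original voices are well separated, while original-vs-pseudonymised
# comparisons carry no speaker information at all.
m_oo = np.array([[0.9, 0.1], [0.1, 0.9]])
m_op = np.array([[0.5, 0.5], [0.5, 0.5]])
print(de_identification(m_oo, m_op))           # 1.0
print(voice_distinctiveness_gain(m_oo, m_oo))  # 0.0
```

Under these assumed definitions, a DeID near 1 means a speaker's original and pseudonymised voices are no more similar than those of different speakers, while a negative G_VD means the pseudonymised voices are harder to tell apart from one another than the original voices were.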

Publication
Computer Speech & Language
Paul-Gauthier Noé
Doctoral Student
Andreas Nautsch
Postdoctoral Researcher
Jose Patino
Postdoctoral Researcher
Driss Matrouf
Professor