Yamagishi Lab PhD Defenses

We have two PhD defenses scheduled for January 30th, 2024!

Time: 2024-01-30 14:00 ~ 18:00 (JST)

Location: NII 1903 / online

Registration: Please register here if you are interested!


14:00 - 15:00 - Chang Zeng (open to work)

  • Title: Spoofing-aware Speaker Verification System Robust Against Domain and Channel Mismatches
  • Abstract:

Automatic speaker verification (ASV) has shown immense potential across domains like security, forensic analysis, and human-computer interaction. However, real-world deployment necessitates ASV systems that are robust to diverse conditions involving channel variability, spoofing attacks, and domain mismatch between training and test environments. This thesis makes significant contributions towards constructing reliable ASV systems under such multifaceted scenarios.

We first propose an attention-based back-end model that handles channel mismatch through a pair-wise learning paradigm and by explicitly modeling relationships between multiple enrollment utterances using self-attention. This provides superior performance over PLDA back-ends, reducing the equal error rate (EER) from 12.52% to 10.12% on the CNCeleb 1&2 datasets. Next, we enhance spoofing resilience by fusing ASV and countermeasure modules through score-level integration and joint optimization. Our spoofing-aware approach reduces SASV-EER from 22.91% to 1.19% on ASVspoof 2019 LA while preserving ASV accuracy. We then apply meta-learning to induce domain generalization, lowering EER by 26.7% on unseen genres in anti-spoofing tasks. Finally, we integrate these techniques into a unified framework that incorporates the pair-wise learning paradigm and spoofing attack simulation into the meta-learning paradigm, concurrently addressing channel mismatch, spoofing attacks, and domain mismatch in an end-to-end manner. The experimental results show significantly lower EERs than baseline systems trained with the supervised learning paradigm, confirming the promise of our integrated solution.
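For readers unfamiliar with the metric, the EER figures quoted above refer to the operating point at which the false acceptance rate and the false rejection rate are equal. The following is a minimal sketch of how such a value can be estimated from trial scores. It is not the evaluation code used in the thesis; the function name `compute_eer`, the score convention (higher means "more likely the target speaker"), and the toy scores are assumptions made for illustration.

```python
import numpy as np


def compute_eer(target_scores, nontarget_scores):
    """Estimate the equal error rate from two sets of trial scores.

    Assumes that a higher score means "more likely a target trial".
    Returns (eer, threshold), where eer is a fraction in [0, 1].
    """
    target_scores = np.asarray(target_scores, dtype=float)
    nontarget_scores = np.asarray(nontarget_scores, dtype=float)

    # Every observed score is a candidate decision threshold.
    thresholds = np.sort(np.concatenate([target_scores, nontarget_scores]))

    # False rejection rate: target trials scored below the threshold.
    frr = np.array([(target_scores < t).mean() for t in thresholds])
    # False acceptance rate: non-target trials scored at or above the threshold.
    far = np.array([(nontarget_scores >= t).mean() for t in thresholds])

    # Pick the threshold where the two error rates are closest and
    # report their average as the EER estimate.
    idx = int(np.argmin(np.abs(far - frr)))
    return (far[idx] + frr[idx]) / 2.0, thresholds[idx]


# Toy scores, invented purely for illustration.
eer, threshold = compute_eer(
    target_scores=[0.9, 0.8, 0.7, 0.4],
    nontarget_scores=[0.6, 0.3, 0.2, 0.1],
)
print(f"EER = {eer:.2%} at threshold {threshold}")
```

Evaluation toolkits typically interpolate between thresholds rather than picking the nearest one, so values computed this way may differ slightly from officially reported numbers.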

15:00 - 16:00 - Closed meeting between examiners

16:00 - 17:00 - Lin Zhang (open to work)

  • Title: “Whether, When, What”: Detection, Localization, and Diarization of Partially Spoofed Audio
  • Abstract:

Biometric systems are vulnerable to various manipulations and spoofing attacks, such as text-to-speech synthesis, voice conversion, replay, tampering, and adversarial attacks. However, previous research has scarcely explored scenarios where synthetic speech segments are embedded within bona fide speech utterances. We dub this new spoofing scenario “Partial Spoof” (PS). This thesis delves into this newly defined, threatening PS scenario. The primary objectives of this thesis are to define the PS scenario, construct a database with benchmark models, and analyze the performance of these benchmark models in the PS scenario. As one of the pioneering studies on the PS scenario, we designed three specific tasks, each associated with a question: (1) Spoof Detection: Whether the utterance is spoofed? This task aligns with the common task in the spoofing community of distinguishing whether an utterance is spoofed or bona fide. (2) Spoof Localization: When do spoofs happen? This task aims to determine the location of spoofed segments within utterances. (3) Spoof Diarization: What attacks when? This task not only locates the spoofed segments but also identifies the specific spoofing technique used in each.
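To make the difference between the three tasks concrete, the sketch below derives the output each task would ideally produce from one sequence of frame-level labels. It is an illustration only, not code from the thesis or its repository: the 20 ms frame length, the label names (`"bonafide"`, `"tts"`, `"vc"`), and the function names are all hypothetical.

```python
from itertools import groupby

FRAME_MS = 20  # hypothetical frame length in milliseconds, for illustration only


def detect(frame_labels):
    """Whether: is any part of the utterance spoofed?"""
    return any(label != "bonafide" for label in frame_labels)


def localize(frame_labels):
    """When: (start_ms, end_ms) of each contiguous spoofed region."""
    regions, frame = [], 0
    for is_spoof, run in groupby(frame_labels, key=lambda l: l != "bonafide"):
        length = len(list(run))
        if is_spoof:
            regions.append((frame * FRAME_MS, (frame + length) * FRAME_MS))
        frame += length
    return regions


def diarize(frame_labels):
    """What attacks when: (start_ms, end_ms, attack) for each run of one attack."""
    segments, frame = [], 0
    for label, run in groupby(frame_labels):
        length = len(list(run))
        if label != "bonafide":
            segments.append((frame * FRAME_MS, (frame + length) * FRAME_MS, label))
        frame += length
    return segments


# A toy utterance: bona fide speech with two different spoofed inserts.
labels = ["bonafide"] * 3 + ["tts"] * 2 + ["vc"] * 2 + ["bonafide"]

print(detect(labels))    # utterance-level decision
print(localize(labels))  # spoofed time regions, attack type ignored
print(diarize(labels))   # spoofed time regions, split by attack type
```

In this toy case, localization reports one spoofed region covering both inserts, while diarization splits the same region into a `"tts"` segment followed by a `"vc"` segment.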

To explore the aforementioned tasks presented by the PS scenario, we developed various countermeasures (CMs), from conventional models enhanced with advanced strategies to cutting-edge models using self-supervised learning. In summary, this thesis establishes a series of benchmarks for PS scenario research, representing a remarkable contribution to the speech anti-spoofing community. It serves as a foundation for further investigation into the PS scenario, and is the first study to release comprehensive PS-related resources, including a database with detailed timestamp annotations, code, and models. All related resources mentioned in this PhD thesis are available at https://github.com/nii-yamagishilab/PartialSpoof.

17:00 - 18:00 - Closed meeting between examiners