Audio samples for "Multi-Metric Optimization using Generative Adversarial Networks for Near-End Speech Intelligibility Enhancement"


Authors: Haoyu Li, Junichi Yamagishi


Be careful with the volume when playing samples.


Sample 1:   Noise: Cafeteria;   SNR = -1 dB;   Talker: Female;   Text: "Be sure to set the lamp firmly in the hole"

Systems | Speech | Speech-in-Noise at Weak Reverb | Speech-in-Noise at Medium Reverb | Speech-in-Noise at Severe Reverb
Unmodified
SSDRC[1]
iMetricGAN[2]
Proposed (S+H+E)
Proposed (All)


Sample 2:   Noise: Cafeteria;   SNR = -5 dB;   Talker: Male;   Text: "We don't like to admit our small faults"

Systems | Speech | Speech-in-Noise at Weak Reverb | Speech-in-Noise at Medium Reverb | Speech-in-Noise at Severe Reverb
Unmodified
SSDRC[1]
iMetricGAN[2]
Proposed (S+H+E)
Proposed (All)


Sample 3:   Noise: Cafeteria;   SNR = -9 dB;   Talker: Female;   Text: "The stitch will serve but needs to be shortened"

Systems | Speech | Speech-in-Noise at Weak Reverb | Speech-in-Noise at Medium Reverb | Speech-in-Noise at Severe Reverb
Unmodified
SSDRC[1]
iMetricGAN[2]
Proposed (S+H+E)
Proposed (All)


Sample 4:   Noise: Airport Announcement;   SNR = -5 dB;   Talker: Male;   Text: "The bombs left most of the town in ruins"

Systems | Speech | Speech-in-Noise at Weak Reverb | Speech-in-Noise at Medium Reverb | Speech-in-Noise at Severe Reverb
Unmodified
SSDRC[1]
iMetricGAN[2]
Proposed (S+H+E)
Proposed (All)


Sample 5:   Noise: Airport Announcement;   SNR = -9 dB;   Talker: Female;   Text: "Will you please answer that phone"

Systems | Speech | Speech-in-Noise at Weak Reverb | Speech-in-Noise at Medium Reverb | Speech-in-Noise at Severe Reverb
Unmodified
SSDRC[1]
iMetricGAN[2]
Proposed (S+H+E)
Proposed (All)


Sample 6:   Noise: Airport Announcement;   SNR = -13 dB;   Talker: Male;   Text: "A pink shell was found on the sandy beach"

Systems | Speech | Speech-in-Noise at Weak Reverb | Speech-in-Noise at Medium Reverb | Speech-in-Noise at Severe Reverb
Unmodified
SSDRC[1]
iMetricGAN[2]
Proposed (S+H+E)
Proposed (All)


References

[1]. T.-C. Zorila, V. Kandia, and Y. Stylianou, “Speech-in-noise intelligibility improvement based on spectral shaping and dynamic range compression,” in Proc. Interspeech, 2012, pp. 635–638.

[2]. H. Li, S.-W. Fu, Y. Tsao, and J. Yamagishi, “iMetricGAN: Intelligibility enhancement for speech-in-noise using generative adversarial network-based metric learning,” in Proc. Interspeech, 2020, pp. 1336–1340.

Acknowledgement

Speech materials of Harvard sentences were provided by: