Joint Noise Reduction and Listening Enhancement for Full-End Speech Enhancement

Audio samples for "Joint Noise Reduction and Listening Enhancement for Full-End Speech Enhancement"

Authors: Haoyu Li, Yun Liu, Junichi Yamagishi

Preprint paper

In real-life applications, noise is present not only in the far-end speaker's environment but also in the near-end listener's environment, which severely degrades speech quality and intelligibility.
We propose a DNN-based joint framework that integrates noise reduction (NR) with listening enhancement (LE).

In the following audio samples, the far-end (speaker-side) noise is cafeteria noise and the near-end (listener-side) noise is an airport announcement.
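Each sample below lists a far-end and a near-end SNR. The page does not describe how the mixtures were constructed, so the following is only a minimal sketch. It assumes the conventional utterance-level definition SNR = 10 log10(P_speech / P_noise), where P is the mean squared amplitude. Under that assumption, it shows how a noise signal is scaled to reach a target SNR before being added to speech. The function names `power`, `snr_db` and `mix_at_snr` are hypothetical illustrations and are not part of the authors' code.

```python
import math
import random


def power(x):
    """Mean power (average squared amplitude) of a signal."""
    return sum(v * v for v in x) / len(x)


def snr_db(speech, noise):
    """Assumed utterance-level SNR in dB: 10 * log10(P_speech / P_noise)."""
    return 10.0 * math.log10(power(speech) / power(noise))


def mix_at_snr(speech, noise, target_snr_db):
    """Hypothetical helper: scale `noise` so speech + scaled noise has the target SNR.

    Assumes equal-length signals and noise that is not all zeros.
    Returns (mixture, scaled_noise).
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    # Required noise power is P_s / 10^(SNR/10); the amplitude gain is the
    # square root of that power divided by the current noise power.
    target_noise_power = power(speech) / 10.0 ** (target_snr_db / 10.0)
    gain = math.sqrt(target_noise_power / power(noise))
    scaled_noise = [gain * v for v in noise]
    mixture = [s + n for s, n in zip(speech, scaled_noise)]
    return mixture, scaled_noise


if __name__ == "__main__":
    random.seed(0)
    fs = 16000  # assumed sampling rate, for illustration only
    speech = [math.sin(2 * math.pi * 440 * t / fs) for t in range(fs)]  # stand-in "speech"
    noise = [random.gauss(0.0, 0.3) for _ in range(fs)]  # stand-in noise
    for target in (6.0, -1.0, -5.0):
        _, scaled = mix_at_snr(speech, noise, target)
        print(f"target {target:+.1f} dB -> measured {snr_db(speech, scaled):+.2f} dB")
```

With this definition, a negative target such as the -5 dB near-end condition simply means the scaled noise carries more power than the speech.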

Sample 1: Far-end SNR = 6 dB; Near-end SNR = -1 dB; Talker: Female; Text: "Our troops are set to strike heavy blows"

Systems	Speech	Speech under near-end noise
Noisy
Noisy+NR
Noisy+LE
DSPPipe
NeuralPipe
Joint
Joint+NT

Sample 2: Far-end SNR = 10 dB; Near-end SNR = -5 dB; Talker: Male; Text: "Take the match and strike it against your shoe"

Systems	Speech	Speech under near-end noise
Noisy
Noisy+NR
Noisy+LE
DSPPipe
NeuralPipe
Joint
Joint+NT

Sample 3: Far-end SNR = 14 dB; Near-end SNR = -5 dB; Talker: Female; Text: "The corner store was robbed last night"

Systems	Speech	Speech under near-end noise
Noisy
Noisy+NR
Noisy+LE
DSPPipe
NeuralPipe
Joint
Joint+NT

Acknowledgement

The Harvard sentence speech materials were provided by:

C. Valentini-Botinhao, C. Mayo, and M. Cooke, “Hurricane natural speech corpus - higher quality version,” 2019. Available: https://doi.org/10.7488/ds/2482
P. Demonte, “HARVARD speech corpus - audio recording 2019,” 2019. Available: https://doi.org/10.17866/rd.salford.c.4437578.v1