Speech samples for "STFT spectral loss for training a neural speech waveform model"

Authors: Shinji Takaki, Toru Nakashika, Xin Wang, Junichi Yamagishi
Paper, Code

1: Author of the danger trail, Philip Steels, etc.

	World	WaveNet	E^(wav)	α=0	α=1	α=v

NAT:

AbS:
TTS:

2: To my surprise he began to show actual enthusiasm in my favor.

	World	WaveNet	E^(wav)	α=0	α=1	α=v

NAT:

AbS:
TTS:

3: In a flash Philip followed its direction.

	World	WaveNet	E^(wav)	α=0	α=1	α=v

NAT:

AbS:
TTS:

4: Much, replied Jeanne, as tersely.

	World	WaveNet	E^(wav)	α=0	α=1	α=v

NAT:

AbS:
TTS:

5: I suppose you picked that lingo up among the Indians.

	World	WaveNet	E^(wav)	α=0	α=1	α=v

NAT:

AbS:
TTS:

Acknowledgement
WORLD: https://github.com/mmorise/World
These synthetic speech samples were constructed using the CMU Arctic database. The CMU_ARCTIC databases were constructed at the Language Technologies Institute at Carnegie Mellon University. See http://festvox.org/cmu_arctic/ for more details.