Audiovisual speaker conversion: jointly and simultaneously transforming facial expression and acoustic characteristics

Authors: Fuming Fang, Xin Wang, Junichi Yamagishi, Isao Echizen

Spectrogram samples

The horizontal axis is time and the vertical axis is frequency, ranging from 0 to 8000 Hz. In most cases, the proposed method appears to predict better spectrograms than the baseline.

Neutral


Happiness


Strong happiness


Sadness


Strong sadness


Anger


Strong anger
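
For readers who want to produce a plot with the same axes as the samples above (time horizontally, frequency from 0 to 8000 Hz vertically), here is a minimal sketch. It is not the authors' feature extraction. It assumes a 16 kHz sampling rate, which is inferred only from the 8000 Hz upper limit (the Nyquist frequency), and the frame length, hop size, FFT size, and the function name `log_magnitude_spectrogram` are illustrative choices. A synthetic 1 kHz tone stands in for real speech.

```python
import numpy as np

def log_magnitude_spectrogram(signal, sample_rate=16000, frame_len=400, hop=160, n_fft=512):
    """Hann-windowed STFT log-magnitude spectrogram (all parameters are assumptions)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping frames and apply the window to each one.
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # Magnitude of the one-sided FFT; each frame is zero-padded to n_fft samples.
    magnitude = np.abs(np.fft.rfft(frames, n=n_fft, axis=1))
    log_spec = 20.0 * np.log10(magnitude + 1e-10)  # dB, with a floor to avoid log(0)
    # Frequency axis runs from 0 Hz up to sample_rate / 2.
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    # Time axis: centre of each frame, in seconds.
    times = (np.arange(n_frames) * hop + frame_len / 2) / sample_rate
    # Transpose so rows are frequency (vertical) and columns are time (horizontal).
    return times, freqs, log_spec.T

# Synthetic stand-in for speech: one second of a 1 kHz sine at the assumed 16 kHz rate.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000.0 * t)

times, freqs, log_spec = log_magnitude_spectrogram(tone, sample_rate)
```

The returned `log_spec` array can be passed to any image plotting routine with `times` and `freqs` as the axis coordinates. With a different sampling rate the upper frequency limit changes accordingly, so the 0 to 8000 Hz range holds only under the 16 kHz assumption.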