Disentangling the Prosody and Semantic Information with Pre-trained Model for In-Context Learning based Zero-Shot Voice Conversion

Publication
Accepted to the IEEE Spoken Language Technology Workshop (SLT) 2024