Speaker-Text Retrieval via Contrastive Learning