Challenges

We also provided opportunities for researchers outside the project to present their research results in an academic setting.

The Voice Conversion Challenge 2020

Voice conversion is a machine learning technique that converts the voice of one speaker into that of another. For a fair comparison and a deeper scientific understanding, the different methods must be trained on a common training database, convert speech to a common target speaker, and be compared using a unified evaluation method. We therefore co-organized the Voice Conversion Challenge 2020 and constructed a new, free database for performing and comparing voice conversion within and across languages. A number of organizations registered to participate, and 33 overseas organizations among them actually built voice conversion systems. An international workshop for this challenge was co-organized with the Blizzard Challenge, a similar challenge for speech synthesis, as a half-day online workshop in November 2020 at which these organizations presented their research results. The workshop was attended by 281 people, a much better response than expected.

The results of a large-scale subjective comparison experiment showed no statistically significant difference between the converted speech of the top seven systems and the natural speech of the target speaker. This indicates that voice conversion technology has advanced to the point where the characteristics of the target speaker can be precisely reproduced using various methods.

http://www.vc-challenge.org

ASVspoof challenge 2019

Although deep generative techniques for reproducing individuality in speech are expected to bring new value to entertainment and other fields, they also pose security problems for speaker recognition systems if misused. Because the liveness detection models that protect against such misuse are themselves built by supervised machine learning from large amounts of data, a database for liveness detection is required. We therefore first created a large-scale database containing both natural and synthesized speech in cooperation with many organizations, including Google (U.S.), iFlytek (China), and NTT (Japan). The database was distributed to 154 organizations, 50 of which actually built liveness detection models. It has been downloaded a total of 200,000 times to date. Special sessions were organized at both Interspeech 2019 and ASRU 2019 so that these organizations could present their research results at academic conferences. In addition, a special issue of the international journal Computer Speech & Language was organized. Analysis of the results shows that the top teams among the 50 participants achieved very good results, confirming that accurate liveness detection is possible even if speech synthesis advances greatly, as long as the training database is properly constructed.

https://www.asvspoof.org

Voice Privacy Initiative

To drive the field and accelerate research, we ran an international challenge on voice privacy protection, similar to the Voice Conversion Challenge and the ASVspoof challenge. The “Voice Privacy Initiative” was implemented in 2020. We defined the speech database, the evaluation set, and the evaluation procedure to be used for comparing speaker anonymization methods, and more than a dozen universities, companies, and research organizations proposed speaker anonymization methods within our framework. We then conducted a mutual evaluation of the proposed methods. This intercomparison confirmed that all of the methods can significantly reduce the possibility of personal identification, but that they also reduce the usefulness of the anonymized voice for downstream tasks; no single best method exists yet. Nevertheless, this is a very significant finding because it clarified the points that need to be improved in the future.

We organized special sessions at the international conferences Interspeech 2020 and Speaker Odyssey 2020 to allow these organizations to present their research results. We have also organized a special issue on speech privacy in the international journal Computer Speech & Language.

https://www.voiceprivacychallenge.org