Open Speech and Language Resources

Phone: 425 247 4129
(Daniel Povey)


Identifier: SLR101

Summary: Pronunciation scoring dataset, labeled independently by five human experts

Category: Speech

License: Apache License v.2.0

Download: speechocean762.tar.gz [520M] (the data)   Mirrors: [US]

About this resource:

This corpus aims to provide a free, public dataset for the pronunciation scoring task. It consists of 5000 English sentences. All speakers are non-native, and their mother tongue is Mandarin. Half of the speakers are children and the other half are adults. Speaker age and gender information is provided. The scores were assigned by five experts. To avoid subjective bias, each expert scored independently using the same metric. For more details, please read the `` in the corpus.
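
As a minimal sketch, assuming the archive speechocean762.tar.gz has already been downloaded to the current directory, the data can be unpacked with Python's standard tarfile module. The helper name `extract_corpus` and the destination directory `speechocean762` are illustrative choices, not part of the corpus, and no assumption is made here about the directory layout inside the archive.

```python
import tarfile
from pathlib import Path


def extract_corpus(archive_path, dest_dir):
    """Unpack a .tar.gz archive into dest_dir and return its top-level entry names.

    Hypothetical helper for illustration; it is not shipped with the corpus.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    root = dest.resolve()
    with tarfile.open(archive_path, "r:gz") as tar:
        members = tar.getmembers()
        # Refuse any member that would be written outside dest_dir.
        for member in members:
            target = (dest / member.name).resolve()
            if target != root and root not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        tar.extractall(dest)
    # Collect the first path component of every member, skipping "." entries.
    top_level = set()
    for member in members:
        parts = Path(member.name).parts
        if parts:
            top_level.add(parts[0])
    return sorted(top_level)


# Assumed local file name, taken from the Download line above.
archive = Path("speechocean762.tar.gz")
if archive.exists():
    print(extract_corpus(archive, "speechocean762"))
```

The function returns the top-level names found in the archive, which is a quick way to locate the documentation file mentioned above before working with the scores.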