Open Speech and Language Resources

Phone: 425 247 4129
(Daniel Povey)

Speechocean 10 Hours Chinese Mandarin Speech Recognition Corpus

Identifier: SLR90

Summary: Free 10.33 Hours Chinese Mandarin Speech Recognition Corpus Provided by Speechocean

Category: Speech

License: Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)


About this resource:

This dataset has been retracted at the request of Speechocean.

  • The Chinese Mandarin speech recognition corpus is provided by Speechocean.
  • The corpus contains 10.33 hours of speech, recorded simultaneously over 4 different microphones.
  • The corpus was recorded by 20 speakers (10 male and 10 female) in a quiet office. Each speaker recorded around 120 utterances per channel.
  • Transcription files are included.
  • The sentence transcription accuracy is higher than 98%.
  • The corpus is free to use for academic purposes.
  • This corpus is a subset of a larger corpus (159 hours). Please contact us if you are interested.


About Speechocean
Speechocean is devoted to providing specialized engineering data products and services to enterprises and scientific research institutions across the whole AI industry chain. Our business covers domains such as speech recognition, speech synthesis, computer vision, lexicons, and natural language processing, and we provide related services for data design, collection, transcription, and annotation.