openslr.org

Open Speech and Language Resources

Primewords Chinese Corpus Set 1

Identifier: SLR47

Summary: A Chinese Mandarin corpus released by Shanghai Primewords Co. Ltd. (www.primewords.cn), containing 100 hours of speech data.

Category: Speech

License: Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)

Downloads (use a mirror closer to you):
primewords_md_2018_set1.tar.gz [9.0G] (speech data and transcripts) Mirrors: [US] [EU] [CN]

About this resource:

This free Chinese Mandarin speech corpus set is released by Shanghai Primewords Information Technology Co., Ltd.

The corpus was recorded on smartphones by 296 native Chinese speakers. The transcription accuracy is greater than 98% at a 95% confidence level. It is free for academic use.

The mapping between transcripts and utterances is provided in JSON format.
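The JSON mapping can be read with Python's standard `json` module. The sketch below is only an illustration: it assumes, hypothetically, that the JSON file is a list of objects, each with a `file` key (the utterance's audio file name) and a `text` key (its transcript). The actual file name and field names inside the archive should be checked before relying on it.

```python
import json
from pathlib import Path
from typing import Dict, Union


def parse_transcript_mapping(raw: str) -> Dict[str, str]:
    """Build {audio file name: transcript} from a JSON string.

    ASSUMPTION (hypothetical schema): the JSON is a list of objects,
    each with a "file" key and a "text" key. Verify against the real data.
    """
    entries = json.loads(raw)
    mapping: Dict[str, str] = {}
    for entry in entries:
        name = entry["file"]
        # Refuse silently overwriting a transcript if a file name repeats.
        if name in mapping:
            raise ValueError(f"duplicate utterance file: {name}")
        mapping[name] = entry["text"]
    return mapping


def load_transcript_mapping(path: Union[str, Path]) -> Dict[str, str]:
    """Read the JSON mapping file from disk (UTF-8) and parse it."""
    return parse_transcript_mapping(Path(path).read_text(encoding="utf-8"))


if __name__ == "__main__":
    # Hypothetical sample in the assumed schema, not taken from the corpus.
    sample = '[{"file": "utt_0001.wav", "text": "你好"}]'
    print(parse_transcript_mapping(sample))
```

Keying the result by file name makes it straightforward to pair each transcript with its audio file after extracting the archive.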

You can cite the data using the following BibTeX entry:

    @misc{primewords_201801,
    title={Primewords Chinese Corpus Set 1},
    author={Primewords Information Technology Co., Ltd.},
    year={2018},
    note={\url{https://www.primewords.cn}}
    }

Contact: Yinghui Liu, yinghui_liu@primewords.cn

External URLs: https://www.primewords.cn