Summary: Catalan speech corpus generated from Catalan Parliamentary sessions
License: CC Attribution 4.0 (CC BY 4.0)
Downloads (use a mirror closer to you):
parlament_v1.0_clean.tar.gz [7.7G] (90 hours of "clean" speech and transcripts) Mirrors: [US]
parlament_v1.0_other.tar.gz [19G] (230 hours of "other" speech and transcripts) Mirrors: [US]
About this resource:
Preparation of this corpus was supported by the Department of Culture of the Catalan autonomous government.
The audio files are 16-bit PCM, mono, little-endian, with a sample rate of 16 kHz. As of release version 1.0, the corpus is divided into 90 hours of "clean" and 230 hours of "other" quality segments.
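To check that a file matches the audio format described above, a minimal Python sketch follows. It assumes the audio is stored in WAV containers, which this page does not state; the function names `matches_corpus_format` and `make_silent_wav` are hypothetical helpers written for this illustration and are not part of the corpus tooling.

```python
import io
import wave

# Format stated for the corpus: 16 kHz, mono, 16-bit PCM.
EXPECTED_RATE = 16000
EXPECTED_CHANNELS = 1
EXPECTED_SAMPLE_WIDTH = 2  # bytes per sample, i.e. 16 bits


def matches_corpus_format(wav_source):
    """Return True if the WAV file (path or file object) is 16 kHz mono 16-bit."""
    with wave.open(wav_source, "rb") as wav:
        return (
            wav.getframerate() == EXPECTED_RATE
            and wav.getnchannels() == EXPECTED_CHANNELS
            and wav.getsampwidth() == EXPECTED_SAMPLE_WIDTH
        )


def make_silent_wav(rate, channels=1, sample_width=2, seconds=0.1):
    """Build a short in-memory WAV of silence, used only to exercise the check."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(rate)
        n_frames = int(rate * seconds)
        wav.writeframes(b"\x00" * n_frames * channels * sample_width)
    buf.seek(0)
    return buf


if __name__ == "__main__":
    print(matches_corpus_format(make_silent_wav(16000)))
    print(matches_corpus_format(make_silent_wav(8000)))
```

In practice, `matches_corpus_format` would be called with the path of a file extracted from one of the archives. WAV data is little-endian by definition, so byte order needs no separate check under the WAV assumption.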
Contact: firstname.lastname@example.org
collectivat.cat/asr: the official ParlamentParla corpus webpage, with other resources and updates
http://laklak.eu/share/parlament_v1.0_clean.tar.gz (clean data)
http://laklak.eu/share/parlament_v1.0_other.tar.gz (other data)