openslr.org

Open Speech and Language Resources

Silbo Gomero Speech Corpus

Identifier: SLR137

Summary: Corpus of the Silbo Gomero whistled language, based on 49 minutes of recordings created by 4 whistlers.

Category: Speech

License: Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)

Downloads (use a mirror closer to you):
README.txt [3.0K] ( Readme file ) Mirrors: [EU] [EU] [CN]
words.zip ( Single-word clips with transcripts ) Mirrors: [EU] [EU] [CN]
fragments.zip ( Short fragments with transcripts ) Mirrors: [EU] [EU] [CN]
sentences.zip ( Whole sentences with transcripts ) Mirrors: [EU] [EU] [CN]

About this resource:

This is a corpus of Silbo Gomero, a whistled form of Spanish used on the island of La Gomera. It was created from 49 minutes of raw recordings of read speech, produced by 4 fluent whistlers. The recordings were originally made for teaching the language to children native to the island.

The corpus consists of 3 parts, each made from the same data but edited in a different way; a separate transcription file is provided for each part.

'words.zip' contains clips of single, separate words. Some clips may contain more than one word, in cases where the words could not be separated.
'sentences.zip' contains clips of entire sentences. Some parts of the recordings are not represented here; for example, one recording contained a poem, which could not be separated into sentences.
'fragments.zip' contains clips of short fragments of speech (about 6.5 words long on average); these fragments were made by splitting the recordings wherever longer pauses between words occurred.
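
Since the internal layout of the three archives is not described here, a first step after downloading may be to check what each one contains. The following is a minimal sketch, assuming the archives have already been saved to the working directory; the helper name summarize_archive is hypothetical and not part of the corpus, and no particular audio or transcript file format is assumed.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def summarize_archive(zip_path):
    """Count the files in a zip archive, grouped by lowercase extension.

    Hypothetical helper for a first look at an archive; it makes no
    assumption about how the corpus files are named or organised.
    """
    counts = Counter()
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # skip directory entries, count files only
            suffix = PurePosixPath(info.filename).suffix.lower()
            counts[suffix or "(no extension)"] += 1
    return dict(counts)


if __name__ == "__main__":
    # Assumption: the archives were downloaded into the current directory.
    for name in ("words.zip", "fragments.zip", "sentences.zip"):
        try:
            print(name, summarize_archive(name))
        except FileNotFoundError:
            print(name, "not found; download it from one of the mirrors first")
```

The per-extension counts should make it easy to spot which file in each archive is the transcription file mentioned above; README.txt remains the authoritative description of the actual layout.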

This corpus was created by Agata Jakubiak, a student at the University of Warsaw, from data provided by Francisco Javier Correa of the Silbo Gomero Teaching Project (Proyecto de Enseñanza de Silbo Gomero), as part of research into Automatic Speech Recognition of whistled speech.

You can cite the data using the following BibTeX entry:

@inproceedings{jakubiak23_interspeech,
  author={Agata Jakubiak},
  title={{Whistle-to-text: Automatic recognition of the Silbo Gomero whistled language}},
  year=2023,
  booktitle={Proc. INTERSPEECH 2023},
  pages={3402--3406},
  doi={10.21437/Interspeech.2023-989}
}