Data-driven design of a sentence list for an articulatory speech corpus
FADIGA, Luciano
2013
Abstract
Articulatory data offers promising developments in our understanding of speech production and advances in speech technologies. However, it is more expensive and difficult to obtain than audio data, which means data collection must be carefully planned. This paper presents a method for designing an articulatory speech corpus comparable to the widely used TIMIT corpus, for languages other than English, using Italian as a case study. This data-driven method searches freely available online text corpora for a set of sentences that provide broad phonetic coverage, while still being small enough to be read in a single session, which is important given the often invasive nature of articulatory data collection. Sentences are first phonemically transcribed and scored based on negative log-likelihood of triphones, with sentences that have many rare triphones scoring higher. The search algorithm then finds sentences that have high scores, but also contain the most frequent triphones. Experiments show that the distribution of triphones in the automatically selected sentences is similar to that found in hand-constructed sentence sets for English, such as TIMIT, and that the selected set offers broader phonetic coverage than randomly selected sets of sentences.
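The scoring and selection steps described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact scoring formula or search procedure, so everything here is an assumption for illustration: the score is taken to be the plain sum of triphone negative log-likelihoods (which favours longer sentences), and the search is a simple greedy pass that first covers the `n_frequent` most frequent triphones and then ranks by score. The function names (`triphones`, `triphone_nll`, `score`, `select`) and parameters are hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def triphones(phonemes):
    """Return the overlapping phoneme triples of one transcribed sentence."""
    return [tuple(phonemes[i:i + 3]) for i in range(len(phonemes) - 2)]


def triphone_nll(corpus):
    """Return (-log relative frequency per triphone, raw triphone counts)."""
    counts = Counter(t for sentence in corpus for t in triphones(sentence))
    total = sum(counts.values())
    nll = {t: -math.log(c / total) for t, c in counts.items()}
    return nll, counts


def score(sentence, nll):
    """Assumed score: sum of triphone NLLs, so rare triphones raise it."""
    return sum(nll[t] for t in triphones(sentence))


def select(corpus, n_sentences, n_frequent):
    """Greedy sketch: cover the most frequent triphones, then prefer high scores."""
    nll, counts = triphone_nll(corpus)
    needed = {t for t, _ in counts.most_common(n_frequent)}
    remaining = list(range(len(corpus)))
    chosen = []
    while remaining and len(chosen) < n_sentences:
        # Rank by number of still-missing frequent triphones, then by score.
        best = max(
            remaining,
            key=lambda i: (
                len(needed & set(triphones(corpus[i]))),
                score(corpus[i], nll),
            ),
        )
        chosen.append(best)
        remaining.remove(best)
        needed -= set(triphones(corpus[best]))
    return [corpus[i] for i in chosen]


# Toy input: letters stand in for phonemes of an already transcribed corpus.
toy_corpus = [list("kasa"), list("kasa"), list("kasa"), list("zuppa")]
selected = select(toy_corpus, n_sentences=2, n_frequent=1)
```

In this toy corpus the triphones of "zuppa" occur only once, so that sentence receives a higher score than "kasa", while the coverage term still ensures that a sentence carrying the most frequent triphone is picked first. A real application would replace the letter sequences with the output of a phonemic transcriber for the target language.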