Persian Speech Corpus

The Persian Speech Corpus is a Modern Persian speech corpus for speech synthesis. The corpus contains phonetic and orthographic transcriptions of about 2.5 hours of Persian speech, aligned with the recorded audio at the phoneme level and annotated with word boundaries.[1] Earlier spoken corpora of Persian include FARSDAT, which consists of read-aloud speech from newspaper texts by 100 Persian speakers, and the Telephone FARsi Spoken language DATabase (TFARSDAT), which comprises seven hours of read and spontaneous speech produced by 60 native speakers of Persian from ten regions of Iran.[2]

The Persian Speech Corpus was built using the methodology laid out in Nawar Halabi's doctoral project on Modern Standard Arabic at the University of Southampton. The work was funded by MicroLinkPC, which owns an exclusive license to commercialise the corpus. The corpus is nonetheless available for non-commercial use through its website, where it is distributed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

The corpus was built for speech synthesis and has been used to build HMM-based voices in Persian. It can also be used to automatically align other speech corpora with their phonetic transcripts, and could form part of a larger corpus for training speech recognition systems.[1]

Contents

The corpus is downloadable from its website, and contains the following:

  • 396 .wav files containing spoken utterances
  • 396 .lab files containing text utterances
  • 396 .TextGrid files containing the phoneme labels, with time stamps of the boundaries where these occur in the .wav files
  • phonetic-transcript.txt, in which every line has the form "[wav_filename]" "[Phoneme Sequence]"
  • orthographic-transcript.txt, in which every line has the form "[wav_filename]" "[Orthographic Transcript]"
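
The two transcript files share the same line layout, so one small parser can read both. The following Python sketch is illustrative only and is not part of the corpus distribution. It assumes that each line holds exactly two double-quoted fields separated by whitespace, as described above, and that the files are UTF-8 encoded. The function names `parse_transcript` and `load_transcript` are hypothetical.

```python
import re
from pathlib import Path

# Assumed line shape (taken from the description above, not verified
# against the actual files): "<wav_filename>" "<transcript>"
LINE_RE = re.compile(r'^\s*"([^"]*)"\s+"([^"]*)"\s*$')


def parse_transcript(text: str) -> dict[str, str]:
    """Map each wav filename to its transcript string."""
    entries: dict[str, str] = {}
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        match = LINE_RE.match(line)
        if match is None:
            # Fail loudly rather than silently skipping malformed lines.
            raise ValueError(f"line {line_no}: unexpected format: {line!r}")
        wav_name, transcript = match.groups()
        entries[wav_name] = transcript
    return entries


def load_transcript(path: Path) -> dict[str, str]:
    """Read a transcript file from disk (UTF-8 encoding is an assumption)."""
    return parse_transcript(path.read_text(encoding="utf-8"))
```

Calling `load_transcript(Path("phonetic-transcript.txt"))` and `load_transcript(Path("orthographic-transcript.txt"))` would then give two dictionaries keyed by the same wav filenames, which makes it easy to check that every utterance has both a phonetic and an orthographic transcript.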

References

  1. Halabi, Nawar (2016). Modern Standard Arabic Phonetics for Speech Synthesis (PhD thesis). University of Southampton, School of Electronics and Computer Science.
  2. Bijankhan, Mahmood; Sheykhzadegan, Javad; Bahrani, Mohammad; Ghayoomi, Masood (2011). "Lessons from building a Persian written corpus: Peykare". Language Resources and Evaluation 45 (2): 143–164.