PCVC Speech Dataset

The PCVC (Persian Consonant Vowel Combination) Speech Dataset is a Modern Persian speech corpus for speech recognition and speaker recognition. It contains recordings of consonant-vowel combinations in Modern Persian, produced by different speakers. Each recording contains exactly one consonant followed by one vowel, so the data is effectively labeled at the phoneme level. The dataset covers 23 Persian consonants and 6 vowels, and the recordings span every possible consonant-vowel combination (23 × 6 = 138 recordings per speaker). All recordings have a sample rate of 48,000 Hz, that is, 48,000 amplitude samples per second of audio. Each recording starts with the consonant, continues with the vowel, and ends with silence; on average, about 0.5 seconds of each recording is speech and the rest is silence.^[1]^[2] All recordings are denoised with an "Adaptive noise reduction" algorithm.^[3] Compared with the Farsdat speech dataset^[4] and the Persian speech corpus^[5], PCVC is easier to use because it is distributed as .mat data files.^[6] It is also organized more strictly around phoneme-level separation, and all of its samples are denoised.
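The counts above can be checked with a short Python sketch. It is an illustration only: the labels `C01`..`C23` and `V1`..`V6` and the helper `seconds_to_samples` are hypothetical placeholders, not the names used in the dataset files. Only the figures 23, 6, and 48,000 come from the description above.

```python
from itertools import product

SAMPLE_RATE_HZ = 48_000  # sample rate stated for the dataset
NUM_CONSONANTS = 23      # consonant count stated for the dataset
NUM_VOWELS = 6           # vowel count stated for the dataset

# Placeholder labels (assumption): the real files use their own naming.
consonants = [f"C{i:02d}" for i in range(1, NUM_CONSONANTS + 1)]
vowels = [f"V{i}" for i in range(1, NUM_VOWELS + 1)]

# Each recording is one consonant followed by one vowel,
# so one speaker contributes one recording per (consonant, vowel) pair.
cv_pairs = list(product(consonants, vowels))


def seconds_to_samples(seconds: float, rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Return how many amplitude values cover `seconds` of audio at `rate_hz`."""
    return round(seconds * rate_hz)


if __name__ == "__main__":
    print(len(cv_pairs))            # recordings per speaker
    print(seconds_to_samples(0.5))  # approximate speech samples per recording
```

At 48,000 Hz, the roughly 0.5 seconds of speech in a recording corresponds to about 24,000 amplitude values, with the remainder of the array being silence. The .mat files themselves can usually be read in Python with `scipy.io.loadmat`; the variable names stored inside the files are not given here, so they are left out of this sketch.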

References

[1] Malekzadeh, Saber; Gholizadeh, Mohammad Hossein; Razavi, Seyyed Naser (2018). "Persian phonemes recognition using PPNet". Journal of Signal Processing Systems. doi:10.13140/RG.2.2.34836.96647.
[2] Malekzadeh, Saber; Gholizadeh, Mohammad Hossein; Razavi, Seyyed Naser (2018). "Persian Vowel recognition with MFCC and ANN on PCVC speech dataset". arXiv preprint arXiv:1812.06953.
[3] "PCVC Kaggle page". https://www.kaggle.com/sabermalek/pcvcspeech/home.
[4] Bijankhan, M.; Sheikhzadegan, J.; Roohani, M. R.; Samareh, Y.; Lucas, C.; Tebyani, M. (1994). "FARSDAT-The Speech Database of Farsi Spoken Language". The Proceedings of the Australian Conference on Speech Science and Technology. Vol. 2, pp. 826–831.
[5] Halabi, Nawar (2016). Modern Standard Persian Phonetics for Speech Synthesis. University of Southampton, School of Electronics and Computer Science.
[6] "Access and change variables directly in MAT-files, without loading into memory". https://uk.mathworks.com/help/matlab/ref/matfile.html.

External links

Original source: https://en.wikipedia.org/wiki/PCVC_Speech_Dataset
