|Title:||A statistical model with hierarchical structure for predicting prosody in a Mandarin text-to-speech system||Author:||Yu, M.S.
|Keywords:||speech synthesis;Mandarin;text-to-speech (TTS) system;prosody;information synthesizer||Project:||Journal of the Chinese Institute of Engineers||Journal/Report no.:||Journal of the Chinese Institute of Engineers, Volume 28, Issue 3, Page(s) 385-399.||Abstract:||
In this paper we propose a statistical prosody model with a hierarchical structure for Mandarin text-to-speech (TTS) systems. Our model has four levels: the syllable level, the word level, the breath group (prosodic phrase) level, and the utterance level. Here "hierarchy" means that each lower level is a subset of the level above it. Prosodic information is first obtained at each level, and the per-level results are then combined to produce the predicted prosody. The advantages of our model are as follows: (1) Our model can relieve the data sparsity problem. Since there are only a few parameters in each level, the training corpus need not be very large. (2) It is easy to verify the appropriateness of the output values of each level. (3) Our model has low prediction error. The experimental results show that the predicted prosodic values closely match their original values. (4) Our prosody generator can predict all prosodic information, namely syllable duration, pause length, energy, and pitch contours.
|Appears in Collections:||Department of Computer Science and Engineering|