Please use this identifier to cite or link to this item: http://hdl.handle.net/11455/37817
Title: A statistical model with hierarchical structure for predicting prosody in a Mandarin text-to-speech system
Authors: Yu, M.S.
余明興
Pan, N.H.
Keywords: speech synthesis
Mandarin
text-to-speech (TTS) system
prosody
information synthesizer
Journal/Report no.: Journal of the Chinese Institute of Engineers, Volume 28, Issue 3, Page(s) 385-399.
Abstract: In this paper we propose a statistical prosody model with a hierarchical structure for Mandarin text-to-speech (TTS) systems. The model has four levels: the syllable level, the word level, the breath group (prosodic phrase) level, and the utterance level. Here "hierarchy" means that each lower level is a subset of the level above it. Prosodic information is first estimated at each level, and the per-level results are then combined to obtain the predicted prosody. The advantages of our model are as follows: (1) It relieves the data sparsity problem. Since each level has only a few parameters, the training corpus need not be very large. (2) It is easy to verify the appropriateness of the output values at each level. (3) It has low prediction error. The experimental results show that the predicted prosodic values match the original values very well. (4) The prosody generator can predict all of the prosodic information, namely syllable duration, pause length, energy, and pitch contours.
URI: http://hdl.handle.net/11455/37817
ISSN: 0253-3839
Article link: http://dx.doi.org/10.1080/02533839.2005.9671006
Appears in Collections: Department of Computer Science and Engineering (資訊科學與工程學系所)

Files in this item:

For the full text, please go to Airiti Library (華藝線上圖書館).