Title: A statistical model with hierarchical structure for predicting prosody in a Mandarin text-to-speech system
Authors: Yu, M.S.
Pan, N.H.
Keywords: speech synthesis; Mandarin; text-to-speech (TTS) system; prosody; information synthesizer
Project: Journal of the Chinese Institute of Engineers
Journal/Report no.: Journal of the Chinese Institute of Engineers, Volume 28, Issue 3, Page(s) 385-399.
Abstract: In this paper we propose a statistical prosody model with a hierarchical structure for Mandarin text-to-speech (TTS) systems. The model has four levels: the syllable level, the word level, the breath group (prosodic phrase) level, and the utterance level. Here "hierarchy" means that each lower-level unit is contained in a unit of the next higher level. Prosodic information is first obtained at each level, and the per-level results are then combined to give the predicted prosody. The advantages of our model are as follows: (1) It relieves the data sparsity problem. Since there are only a few parameters at each level, the training corpus need not be very large. (2) It is easy to verify the appropriateness of the output values of each level. (3) It has low prediction error. The experimental results show that the predicted prosodic values closely match the original values. (4) Our prosody generator can predict all prosodic information, namely syllable duration, pause length, energy, and pitch contours.
ISSN: 0253-3839
DOI: 10.1080/02533839.2005.9671006
Appears in Collections: Department of Computer Science and Engineering