Please use this identifier to cite or link to this item: `http://hdl.handle.net/11455/18370`
Title: A Telephone Number Text-to-Speech System With Speaker Adaptation
Author: 鄭元傑 (Zheng, Yuan-Jie)
Keywords: Text-to-Speech; TTS; Prosody; Duration; Energy; Speaker Adaptation; Telephone Number
Publisher: Department of Applied Mathematics
Abstract: In this thesis, we develop a Mandarin telephone number text-to-speech system with speaker adaptation. Working from continuous speech recordings that we made ourselves, we analyze the relevant parameters with a hierarchical statistical method in order to predict prosody. The parameters considered are the digits before and after the target digit, segment information, and the number of syllables; from these we predict duration, volume, and pause. For the duration model, the average error is 24 ms in the inside test and 45 ms in the outside test. For the volume model, the average error is 1.83 dB in the inside test and 2.22 dB in the outside test. We also experiment with speaker adaptation, a topic rarely discussed in speech synthesis so far: the aim is to predict the prosody of a speaker for whom only a small amount of training data has been recorded by drawing on the prosody of another speaker. With 5 telephone-number sentences (52 syllables) of training data, the average errors are 44 ms in duration and 2.26 dB in volume; with 10 sentences (107 syllables), 35 ms and 2.03 dB; and with 20 sentences (253 syllables), 29 ms and 1.99 dB.
URI: http://hdl.handle.net/11455/18370
Appears in Collections: Department of Applied Mathematics