Please use this identifier to cite or link to this item: http://hdl.handle.net/11455/19872
Title: Mandarin Speech Synthesis based on Spectrum Reform (基於頻譜改造之中文語音合成)
Author: Lin, Shang-Yi (林尚毅)
Keywords: speech synthesis
spectrum reform (spectrum adjustment)
liaison
Publisher: Institute of Networking and Multimedia (資訊網路多媒體研究所)
Abstract: This thesis investigates synthesis methods for a speech synthesis system. We apply prosody adjustment to recordings of a real human voice, aiming for synthesized speech that faithfully reproduces the original voice. The work focuses on two parts of prosody adjustment: the adjustment of pitch, duration, and volume, and the reconstruction of liaison.

For pitch, duration, and volume adjustment, we build on the widely used Pitch Synchronous Overlap and Add (PSOLA) framework and modify the discrete Fourier transform (DFT) spectrum and the discrete cosine transform (DCT) spectrum, with the goal of producing better-sounding speech than PSOLA alone. When adjusting pitch with the DFT, we change the frequencies of the sinusoids through sinusoidal reconstruction, which introduces less distortion than an ordinary inverse DFT. We also use the DCT, which is less common in speech synthesis, to increase spectral resolution and obtain better sound quality.

For liaison reconstruction, we again use the PSOLA framework, together with the well-known Linear Predictive Coding (LPC) to model the oral cavity. We generate transition spectra for the liaison segment and then synthesize the liaison waveform from those spectra, so that a synthesizer built on isolated syllables can also produce the liaison segments that occur in natural human speech.

Finally, using syllables as synthesis units together with prosody parameters extracted from human utterances, we generate 107 synthesized sentences. These sentences are used in two experiments, on intelligibility and on naturalness. Intelligibility is scored by recall, and naturalness is rated by Mean Opinion Score (MOS). These two measures are used to assess the performance of the methods.
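The abstract mentions changing pitch by rebuilding the sinusoids of a DFT spectrum at new frequencies instead of applying an inverse DFT directly. The following is only a minimal sketch of that general idea, not the thesis's implementation: it assumes the input frame is exactly one pitch period, and the function names `dft` and `resynthesize_period` are hypothetical. Note that this naive version also shifts the spectral envelope along with the harmonics; how the thesis handles the envelope is not stated in this record.

```python
import cmath
import math


def dft(frame):
    """Naive O(N^2) discrete Fourier transform of a real-valued frame."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def resynthesize_period(frame, new_length):
    """Rebuild one pitch period at a new length from its DFT sinusoids.

    Assumption: `frame` holds exactly one pitch period, so bin k is a
    sinusoid with k cycles per period. Its amplitude and phase are reused
    with k cycles per `new_length` samples, which scales the fundamental
    frequency by len(frame) / new_length.
    """
    n = len(frame)
    spectrum = dft(frame)
    # Keep only harmonics strictly below the Nyquist limit of both lengths.
    max_harmonic = (min(n, new_length) - 1) // 2
    out = []
    for t in range(new_length):
        value = spectrum[0].real / n  # DC term
        for k in range(1, max_harmonic + 1):
            amplitude = 2.0 * abs(spectrum[k]) / n
            phase = cmath.phase(spectrum[k])
            value += amplitude * math.cos(2 * math.pi * k * t / new_length + phase)
        out.append(value)
    return out


# Usage: shorten a 100-sample period to 80 samples (pitch raised by 100/80).
period = [math.cos(2 * math.pi * t / 100 + 0.3) for t in range(100)]
shorter = resynthesize_period(period, 80)
```

In a PSOLA-style system, periods rebuilt this way would then be overlap-added at the new pitch marks; that step is omitted here.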
URI: http://hdl.handle.net/11455/19872
Other identifiers: U0005-1508201222423300
Article link: http://www.airitilibrary.com/Publication/alDetailedMesh1?DocID=U0005-1508201222423300
Appears in Collections: Institute of Networking and Multimedia (資訊網路與多媒體研究所)

Files in This Item:

For the full text, please visit Airiti Library (華藝線上圖書館).


