Please use this identifier to cite or link to this item: http://hdl.handle.net/11455/19907
Title: 中文語音合成系統中基週調整和連音處理之研究
A Study on Pitch Adjustment and Coarticulation Processing for Mandarin Speech Synthesis Systems
Author: 劉榮現
Liu, Jung-Hsien
Keywords: speech synthesis
Yuang-Chin Chiang
pitch adjustment
coarticulation
linear predictive coding
pitch mark
Hung-Yan Gu
Publisher: Department of Computer Science and Engineering
Abstract: This thesis investigates pitch adjustment and coarticulation processing in waveform-concatenation synthesis for Mandarin text-to-speech (TTS) systems. The synthesis is based on linear predictive coding (LPC), a well-known speech synthesis technique. We first locate pitch marks on the speech waveform with a dynamically adjusted method whose precision is 98.97%, use those marks to obtain a more pitch-synchronous LPC residual, and then locate pitch marks on the residual with a phase sequence method that we devised. To test the approach, we defined a three-segment-per-syllable duration representation that can express the final (vowel) duration, the initial (consonant) duration, the coarticulation duration, and the pause duration. We also designed a method that automatically separates initials from finals, and we lengthen or shorten durations with a simple waveform-domain method. For pitch adjustment, we used dynamic time warping (DTW) to observe how a residual pitch period of a lower-pitched sound corresponds to a residual pitch period of a higher-pitched sound, and we ultimately adopted a linear mapping to adjust the residual period length. For coarticulation processing, the reflection coefficients and the residual are blended across the transition, with squared-cosine and squared-sine values as the mixing ratios. Finally, using 107 balanced sentences recorded by the thesis advisor as target sentences and 21,862 reference sounds as synthesis units, we applied these methods to synthesize the prosody of the target sentences from the reference sounds, and we compared the synthesized sentences with the targets and evaluated the performance.
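The two adjustment steps named in the abstract can be illustrated with a small sketch. It is not the thesis implementation: the function names (`crossfade_weights`, `blend_frames`, `stretch_period`), the frame layout, and the use of plain linear interpolation for the "linear mapping" are all assumptions made for illustration. The sketch only shows that squared-cosine and squared-sine weights always sum to one, and that one residual period can be mapped linearly onto a new length.

```python
import math


def crossfade_weights(n):
    """Return (fade_out, fade_in) weight lists of length n.

    fade_out follows cos^2 and fade_in follows sin^2 while the angle
    sweeps from 0 to pi/2, so each pair of weights sums to 1.
    """
    if n < 2:
        raise ValueError("need at least two transition frames")
    fade_out, fade_in = [], []
    for i in range(n):
        theta = (math.pi / 2) * i / (n - 1)
        fade_out.append(math.cos(theta) ** 2)
        fade_in.append(math.sin(theta) ** 2)
    return fade_out, fade_in


def blend_frames(left, right):
    """Blend two equal-length sequences of parameter vectors frame by frame.

    Hypothetical layout: each frame is a list of reflection coefficients
    (or residual samples) of the same size in both sequences.
    """
    if len(left) != len(right):
        raise ValueError("both sides need the same number of frames")
    w_out, w_in = crossfade_weights(len(left))
    return [
        [a * x + b * y for x, y in zip(lf, rf)]
        for a, b, lf, rf in zip(w_out, w_in, left, right)
    ]


def stretch_period(period, new_len):
    """Map one residual period onto new_len samples by linear interpolation.

    Assumption: "linear mapping" is read here as a uniform time scaling
    of a single pitch period.
    """
    if len(period) < 2 or new_len < 2:
        raise ValueError("need at least two samples on each side")
    scale = (len(period) - 1) / (new_len - 1)
    out = []
    for k in range(new_len):
        pos = k * scale
        i = min(int(pos), len(period) - 2)  # keep i + 1 inside the list
        frac = pos - i
        out.append((1 - frac) * period[i] + frac * period[i + 1])
    return out
```

Because the two weights are non-negative and sum to one, each blended reflection coefficient is a convex combination of the two inputs. If both inputs lie strictly between -1 and 1, the blended value does too, which is the usual stability condition for an LPC lattice filter. Shortening a period with `stretch_period` raises the pitch and lengthening it lowers the pitch.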
URI: http://hdl.handle.net/11455/19907
Appears in Collections: Department of Computer Science and Engineering

Files in This Item:

For the full text, please visit Airiti Library (華藝線上圖書館).



Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.