Please use this identifier to cite or link to this item: http://hdl.handle.net/11455/7912
Title: Very Low Bit Rate Speech Compression Using Closed-Loop Speech Transformation Methods (極低位元率語音壓縮方法之研究-使用閉迴路語音轉換法)
Author: Wang, Jian-Wen (王建文)
Keywords: speech synthesis; LP (linear prediction) analysis; PSOLA (pitch-synchronous overlap-add)
Publisher: Department of Electrical Engineering (電機工程學系所)
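Abstract:
Conventional speech coding techniques based on linear prediction analysis are constrained by the need to encode the prediction error signal, which makes it difficult to reach a bit rate below 1 kbps while keeping the decoded speech natural-sounding. In this thesis, we attempt to incorporate the prosody modification techniques used in speech synthesis systems to reduce the storage requirement of the linear prediction error and thereby raise the overall compression ratio. We adopt the TD-PSOLA algorithm to reconstruct the linear prediction error signal by synthesis. Because TD-PSOLA adjusts the duration of the synthesized signal according to a single linear relationship, it cannot assign different duration-modification parameters to different segments of an utterance. As a result, a prediction error signal synthesized by applying TD-PSOLA directly deviates substantially from the actual signal, which degrades the quality of the reconstructed speech. To address this problem, we propose a closed-loop search method that improves the TD-PSOLA output, so that both the reconstructed prediction error and the quality of the decoded speech are greatly improved.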
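This record does not describe the search procedure itself, so the following is only a generic illustration of closed-loop (analysis-by-synthesis) parameter selection for one segment: each candidate duration is synthesized and compared against the actual signal, and the candidate with the smallest error is kept. The function names, the candidate set, and the squared-error criterion are assumptions, and `stretch` is a plain linear-interpolation stand-in, not TD-PSOLA.

```python
import numpy as np

def stretch(template, out_len):
    """Hypothetical stand-in synthesizer: resample `template` to `out_len`
    samples by linear interpolation. This is NOT TD-PSOLA."""
    positions = np.linspace(0.0, len(template) - 1, num=out_len)
    return np.interp(positions, np.arange(len(template)), template)

def closed_loop_search(target, template, candidate_lengths):
    """Try every candidate length, synthesize, and keep the one whose output
    is closest to `target` (assumed criterion: squared error)."""
    best_len, best_err = None, np.inf
    for length in candidate_lengths:
        synth = stretch(template, length)
        n = min(length, len(target))
        # Error on the overlapping part, plus the energy of whatever
        # sticks out on either side when the lengths differ.
        err = (np.sum((synth[:n] - target[:n]) ** 2)
               + np.sum(target[n:] ** 2)
               + np.sum(synth[n:] ** 2))
        if err < best_err:
            best_len, best_err = length, err
    return best_len, best_err
```

Running such a search separately on each segment of an utterance is one way to obtain a different duration parameter per segment, which is the limitation of a single linear duration mapping that the abstract points out.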

Conventional linear-prediction-based speech coding algorithms suffer from the requirement of encoding the prediction error signal, which makes it difficult to bring the overall bit rate below 1 kbps.
In this thesis, we propose a new method for compressing the linear prediction error signal. The main idea is to incorporate the prosody modification technique commonly used in speech synthesis systems: by regenerating the prediction error signal with the TD-PSOLA algorithm for speech prosody modification, the storage requirement of the prediction error signal can be reduced.
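For readers unfamiliar with the prediction error signal the abstract refers to, here is a minimal sketch of textbook LP analysis (autocorrelation method with the Levinson-Durbin recursion). It is not taken from the thesis; the function names and the choice of prediction order are assumptions made for illustration.

```python
import numpy as np

def lpc_coefficients(frame, order):
    """Return a (with a[0] == 1) such that e[n] = sum_k a[k] * x[n - k]."""
    n = len(frame)
    # Autocorrelation at lags 0..order.
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # all-zero frame: nothing to predict
            break
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err  # reflection coefficient
        a_prev = a.copy()
        a[1:i] = a_prev[1:i] + k * a_prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a

def lp_residual(x, a):
    """Prediction error: pass x through the inverse (FIR) filter A(z)."""
    return np.convolve(x, a)[: len(x)]

def lp_synthesize(e, a):
    """Rebuild x from the residual with the all-pole filter 1 / A(z)."""
    order = len(a) - 1
    x = np.zeros(len(e))
    for n in range(len(e)):
        acc = e[n]
        for k in range(1, min(order, n) + 1):
            acc -= a[k] * x[n - k]
        x[n] = acc
    return x
```

If the residual is kept exactly, `lp_synthesize(lp_residual(x, a), a)` returns `x` up to rounding. The coding cost therefore lies in representing the residual, which is the part the abstract proposes to regenerate by synthesis rather than store.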
URI: http://hdl.handle.net/11455/7912
Other Identifiers: U0005-0908200817471500
Appears in Collections: Department of Electrical Engineering (電機工程學系所)

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.