Please use this identifier to cite or link to this item: http://hdl.handle.net/11455/18762
Title: Investigation of the Speech Recognition for Mandarin Word by Methods of K-Nearest Neighbors and Random Projection (探討K最近鄰居法及隨機投影法對中文單音辨識之影響)
Author: Lin, Yi-Haui (林逸輝)
Keywords: k-nearest neighbor method; Mel-frequency cepstrum coefficient; random projection
Publisher: Institute of Statistics (統計學研究所)
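References:
[1] Bogert, B. P., Healy, M. J. R. and Tukey, J. W., (1963), "The quefrency alanysis of time series for echoes: cepstrum, pseudo-autocovariance, cross-cepstrum, and saphe cracking", Proceedings of the Symposium on Time Series Analysis (M. Rosenblatt, Ed.), Chapter 15, pp. 209-243. New York: Wiley.
[2] Bilginer Gulmezoglu, M., Dzhafarov, V., Keskin, M. and Barkana, A., (1999), "A novel approach to isolated word recognition", IEEE Trans. on Speech and Audio Processing, Vol. 7, no. 6, pp. 620-628.
[3] Cover, T. and Hart, P., (1967), "Nearest neighbor pattern classification", IEEE Trans. on Information Theory, Vol. 13, no. 1, pp. 21-27.
[4] Childers, D. G., Skinner, D. P. and Kemerait, R. C., (1977), "The cepstrum: A guide to processing", Proceedings of the IEEE, Vol. 65, no. 10, pp. 1428-1443.
[5] Gulmezoglu, M. B., Dzhafarov, V. and Barkana, A., (2001), "The common vector approach and its relation to principal component analysis", IEEE Trans. on Speech and Audio Processing, Vol. 9, no. 6, pp. 655-662.
[6] Keskin, M., Gulmezoglu, M. B., Parlaktuna, O. and Barkana, A., (1996), "Isolated word recognition by extracting personal differences", in Proc. 6th Int. Conf. Signal Processing Applications and Technology, Boston, MA, pp. 1989-1992.
[7] Rabiner, L. R. and Sambur, M. R., (1975), "An algorithm for determining the endpoints of isolated utterances", The Bell System Technical Journal, Vol. 54, no. 2, pp. 297-315.
[8] Takiguchi, T., Yoshii, M., Ariki, Y. and Bilmes, J., (2012, March), "Acoustic model transformations based on random projections", International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, pp. 1933-1936.
[9] 王小川, (2004), "語音訊號處理" [Speech Signal Processing]. Taipei: 全華.
[10] 王國榮, (2000), "Visual Basic 6.0 實戰講座" [Visual Basic 6.0 Hands-On Lectures]. Taipei: 旗標.
[11] 吳明哲 and 黃世陽, (1998), "Visual Basic 6.0 中文版學習範本" [Visual Basic 6.0 Chinese Edition Study Guide]. Taipei: 松崗.
[12] 陳宛余 and 李宗寶, (2011), "探討梅爾頻率倒頻譜係數之特徵擷取對國語子音之影響" [Investigating the effect of Mel-frequency cepstrum coefficient feature extraction on Mandarin consonants]. Master's thesis, Institute of Statistics, National Chung Hsing University, Taichung.
[13] 陳佳妤 and 李宗寶, (2011), "探討梅爾頻率倒頻譜係數之特徵擷取對國語母音之影響" [Investigating the effect of Mel-frequency cepstrum coefficient feature extraction on Mandarin vowels]. Master's thesis, Institute of Statistics, National Chung Hsing University, Taichung.
[14] 楊鎮光, (2002), "Visual Basic 與語音辨識" [Visual Basic and Speech Recognition]. Taipei: 松崗.
[15] 張國清 and 李宗寶, (2005), "用K-means之動態時間軸校正法於國語數字之語音辨識" [Speech recognition of Mandarin digits using K-means dynamic time warping]. Master's thesis, Institute of Applied Mathematics, National Chung Hsing University, Taichung.
[16] 鍾靖爵 and 李宗寶, (2011), "利用共同向量法以及最佳梅爾頻率倒頻譜之特徵辨識特定語者之中文單音" [Speaker-dependent recognition of Mandarin monosyllables using the common vector approach and optimal Mel-frequency cepstrum features]. Master's thesis, Institute of Statistics, National Chung Hsing University, Taichung.
[17] 羅璟義 and 李宗寶, (2009), "利用權重式共同向量法於中字彙之特定語者中文單音辨識" [Speaker-dependent recognition of Mandarin monosyllables for a medium-sized vocabulary using the weighted common vector approach]. Master's thesis, Institute of Applied Mathematics, National Chung Hsing University, Taichung.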
Abstract:
This thesis examines speaker-dependent recognition rates for 1,391 Mandarin monosyllables and their vowels. The recognition process has three main stages. First, the recorded speech data are preprocessed by endpoint detection, framing, pre-emphasis, and windowing. Second, features are extracted as Mel-frequency cepstrum coefficients. Finally, a speech model is built with the k-nearest neighbor method and used for matching. The k-nearest neighbor method is simple and fast to compute, and besides speech recognition it can also be applied to image recognition. The method is also compared with random projection under different experimental factors, such as the number of training samples and the consonant and vowel weights, in order to select the best-performing combination. The speech database was recorded by thirteen speakers. With the feature dimension fixed at 39, 512 sampling points, two sets of training samples, and consonant and vowel weights both set to 0.5, the k-nearest neighbor method reached an average monosyllable recognition rate of 82.7% and a pure-vowel recognition rate of 88.9%, while random projection reached 82.2% and 89.7%, respectively. The two methods therefore differ little in recognition performance.
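The k-nearest neighbor matching and the random projection step named in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the Euclidean distance, the majority vote, and the Gaussian projection matrix are all assumptions chosen for illustration, and the consonant and vowel weighting and the feature extraction itself are not modeled. Feature vectors are assumed to be already extracted, for example as 39-dimensional vectors.

```python
import math
import random
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(train, query, k=1):
    """Return the majority label among the k training vectors nearest to query.

    train is a list of (feature_vector, label) pairs (hypothetical layout).
    """
    nearest = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def random_projection_matrix(in_dim, out_dim, seed=0):
    """Gaussian random matrix of shape out_dim x in_dim, scaled by 1/sqrt(out_dim).

    Assumption: a plain Gaussian projection; the thesis may use a different scheme.
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(out_dim)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(in_dim)]
            for _ in range(out_dim)]


def project(matrix, vec):
    """Map a feature vector to the lower-dimensional space (matrix times vec)."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


# Toy data: two training samples for each of two hypothetical syllables.
train = [([0.0] * 39, "ba"), ([0.1] * 39, "ba"),
         ([5.0] * 39, "ma"), ([5.1] * 39, "ma")]
query = [4.9] * 39

# Classify in the original 39-dimensional space.
label_full = knn_classify(train, query, k=1)

# Classify again after projecting everything to 10 dimensions.
R = random_projection_matrix(39, 10, seed=0)
train_low = [(project(R, vec), label) for vec, label in train]
label_low = knn_classify(train_low, project(R, query), k=1)
```

On this toy data both `label_full` and `label_low` come out as `"ma"`, which mirrors the abstract's observation that the two approaches perform similarly, although a toy example says nothing about the actual recognition rates.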
URI: http://hdl.handle.net/11455/18762
Other Identifiers: U0005-1807201313172400
Appears in Collections: Institute of Statistics (統計學研究所)

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.