Please use this identifier to cite or link to this item: http://hdl.handle.net/11455/4851
Title: 混合型特徵值擷取之語音辨識系統
A mixed-type feature extraction system for speech recognition
Author: 尤譯鋒
Yu, Yi-Fung
Keywords: feature; Gamma; mixed-type; cepstral mean and variance normalization
Publisher: Graduate Institute of Communication Engineering
Abstract: 
Three speech feature extraction methods are considered: Linear Predictive Coding (LPC), Mel-Frequency Cepstral Coefficients (MFCC), and GammaTone Cepstral Coefficients (GTCC). LPC removes the formant structure from the speech signal and estimates the amplitude and frequency of the remaining signal. MFCC transforms the speech signal into the frequency domain and maps the frequencies onto the mel scale. GTCC likewise transforms the signal into the frequency domain; the overlap between channels is computed from the maximum and minimum frequencies of the band and the number of channels, that overlap is used to derive each channel's center frequency, and gammatone-shaped filters with different weights are placed over the spectrum. Based on the characteristics of each method, this thesis proposes a mixed-type feature extraction algorithm that assigns the features different weights in fixed proportions and integrates them in the feature extraction stage, followed by cepstral mean and variance normalization. Finally, a Hidden Markov Model (HMM) computes a likelihood that is compared with a preset threshold to decide whether the utterance belongs to the claimed speaker.
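The weighted combination described above can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the function name and the weight triple are my assumptions, since the thesis determines the proportions experimentally.

```python
def mix_features(lpc, mfcc, gtcc, weights=(0.2, 0.5, 0.3)):
    """Weighted concatenation of per-frame LPC, MFCC, and GTCC vectors.

    The default weight triple is purely illustrative; in practice the
    proportions would be tuned experimentally for the task.
    """
    w_lpc, w_mfcc, w_gtcc = weights
    return ([w_lpc * x for x in lpc]
            + [w_mfcc * x for x in mfcc]
            + [w_gtcc * x for x in gtcc])
```

The mixed vector then feeds the normalization and HMM scoring stages in place of any single feature stream.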

Compared with any single feature extraction algorithm, the mixed-type algorithm improves the average verification rate on the text "1234" by 14.2% for male and 9.34% for female speakers, and on the text "ABCD" by 17.73% for male and 15.87% for female speakers.

The mixed-type feature extraction algorithm uses three kinds of feature extraction methods: LPC, MFCC, and GTCC. LPC removes the formants and estimates the amplitude and frequency of the speech signal. MFCC transfers the signal from the frequency domain onto the mel-frequency domain; the triangular mel filters in the filter bank are placed along the frequency axis and given different weights. GTCC uses the maximum frequency, the minimum frequency, and the number of channels to calculate the overlap level, which is then used to find each channel's center frequency; each gammatone distribution is given a different weight to cover the spectrum.
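As a concrete illustration of placing filter centers between a minimum and maximum frequency for a given channel count, here is a minimal sketch using the standard mel mapping 2595·log₁₀(1 + f/700) with 50%-overlapping triangular filters; the function names are mine, and the thesis's exact placement (especially for the gammatone bank) may differ.

```python
import math

def hz_to_mel(f):
    """Standard mel-scale mapping."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_center_frequencies(f_min, f_max, n_channels):
    """Centers (Hz) of n_channels triangular filters evenly spaced on the
    mel scale between f_min and f_max.  With this spacing each filter's
    band edges coincide with its neighbors' centers (50% overlap)."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    edges = [mel_to_hz(lo + i * (hi - lo) / (n_channels + 1))
             for i in range(n_channels + 2)]
    return edges[1:-1]  # edges[i-1] and edges[i+1] bound filter i
```

For example, 26 channels over 0-8000 Hz yield 26 strictly increasing centers, packed densely at low frequencies and sparsely at high frequencies, mirroring the perceptual warping the abstract describes.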

In this thesis, a mixed-type feature extraction algorithm based on these three feature characteristics is proposed. Each feature extraction method is given a different weight. The cepstral mean and variance normalization technique is also applied to the mixed-type features. A Hidden Markov Model (HMM) is then used to compute the likelihood, which is compared against a threshold to accept or reject the claimed speaker.

The experiments show that the mixed-type feature extraction algorithm performs better than any single feature extraction method used alone. In the text "1234" experiment, the verification rate increases by 14.2% for male speakers and by 9.34% for female speakers. In the text "ABCD" experiment, it increases by 17.73% for male and by 15.87% for female speakers.
URI: http://hdl.handle.net/11455/4851
Other identifier: U0005-0108201111221600
Appears in Collections: Graduate Institute of Communication Engineering



Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.