Combination of Recurrent Fuzzy Network and Dynamic Time Warping for Mandarin Phrase Recognition
|Keywords:||Mandarin Phrase Recognition||Publisher:||Department of Electrical Engineering||Abstract:||
This thesis proposes Mandarin phrase recognition by applying Dynamic Time Warping (DTW) to the prediction errors of Singleton-type Recurrent Neural Fuzzy Networks (SRNFN); the method is called DTW-SRNFN. The recurrent structure of an SRNFN makes it well suited to processing temporal speech patterns. Every word in a Mandarin phrase is monosyllabic, and the SRNFNs are trained at the word level: n SRNFNs model n words, where each SRNFN receives the current frame feature and predicts the next frame feature of the word it models. The prediction error of each SRNFN serves as the recognition criterion. To recognize among m phrases, the prediction error of every trained SRNFN is computed for every frame of the input, which yields an error matrix. For each candidate phrase, DTW then searches the error matrix for the optimal path that maps each input frame to the best-matched SRNFN (word) of that phrase. The accumulated error of each phrase is computed along its optimal path, and the phrase with the minimum accumulated error is the recognition result. To verify the performance of DTW-SRNFN, experiments are conducted on the recognition of 30 Mandarin phrases comprising 57 words. In addition, SRNFNs are trained with noisy features to recognize speech corrupted by different types of noise. DTW-SRNFN is compared with Hidden Markov Models (HMM), and the results show that it achieves higher recognition rates than HMM in both clean and noisy environments. Finally, a PC-based system is set up that implements DTW-SRNFN in real time together with a wavelet-based robust speech detection method.
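The DTW step described in the abstract can be sketched as follows. This is a minimal illustration, not the thesis implementation: it assumes the error matrix has already been computed (entry `[k][t]` is the prediction error of word model `k` on frame `t`), that the path moves left to right through the words of a phrase, and that each word covers at least one frame. The abstract does not state the path constraints, and the names `phrase_error` and `recognize` are hypothetical.

```python
def phrase_error(error_matrix, phrase):
    """Minimum accumulated prediction error for one candidate phrase.

    error_matrix[k][t]: prediction error of word model k on input frame t
                        (assumed to be precomputed by the trained predictors).
    phrase: list of word-model indices, in spoken order.
    Assumption: every frame is assigned to exactly one word, words are
    visited in order, and each word covers at least one frame.
    """
    n_frames = len(error_matrix[0])
    n_words = len(phrase)
    if n_frames < n_words:
        return float("inf")  # not enough frames to cover every word

    # acc[j]: best accumulated error so far with the current frame on word j
    acc = [float("inf")] * n_words
    acc[0] = error_matrix[phrase[0]][0]

    for t in range(1, n_frames):
        new_acc = [float("inf")] * n_words
        for j in range(n_words):
            stay = acc[j]                                   # remain in word j
            advance = acc[j - 1] if j > 0 else float("inf")  # enter word j
            best = min(stay, advance)
            if best != float("inf"):
                new_acc[j] = best + error_matrix[phrase[j]][t]
        acc = new_acc

    return acc[-1]  # the path must end in the last word of the phrase


def recognize(error_matrix, phrases):
    """Return (index of the phrase with minimum accumulated error, all scores)."""
    scores = [phrase_error(error_matrix, p) for p in phrases]
    best = min(range(len(phrases)), key=scores.__getitem__)
    return best, scores


# Toy example: 3 word models, 4 frames, made-up error values.
toy_errors = [
    [0.1, 0.2, 0.9, 0.8],  # model 0 fits the first two frames
    [0.9, 0.8, 0.1, 0.2],  # model 1 fits the last two frames
    [0.5, 0.5, 0.5, 0.5],  # model 2 fits nothing particularly well
]
toy_phrases = [[0, 1], [1, 0], [2, 2]]
best_phrase, phrase_scores = recognize(toy_errors, toy_phrases)
```

In the toy example the phrase `[0, 1]` wins because its optimal path assigns the first two frames to model 0 and the last two to model 1. Keeping only one column of accumulated errors at a time is enough here because only the final score is needed; recovering the frame-to-word segmentation would require storing back-pointers as well.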
|Appears in Collections:||Department of Electrical Engineering|