Please use this identifier to cite or link to this item: http://hdl.handle.net/11455/6447
Title: Combination of Recurrent Fuzzy Filter and Hidden Markov Model for Noisy Birdsong and Human Speech Recognition
Author: 陳泰茂 (Chen, Tai-Mao)
Keywords: Birdsong Recognition
Human Speech Detection and Recognition
Wavelet Transform
Publisher: Department of Electrical Engineering
Abstract: This thesis proposes a Recurrent Fuzzy Network Filtered Hidden Markov Model (RFNF-HMM) for birdsong and human speech recognition in variable noise-level environments. Because birdsong and human speech are generated in a similar way, the same recognition approach is applied to both.

For birdsong recognition, an energy parameter is used to segment the significant portion of a birdsong sequence. The linear predictive coding (LPC) coefficients of each frame in the segmented birdsong are then extracted as feature vectors and fed as inputs to HMM recognizers. Birdsong recorded in real outdoor environments is usually corrupted by non-stationary noise or other interference, which degrades recognition performance. To handle this problem, an RFNF-HMM recognizer is proposed in which each HMM is connected to an RFNF that filters noise in the feature domain. Experiments are performed on recognizing ten bird species in Taiwan from their songs.

For human speech recognition in variable noise-level environments, Wavelet Energy (WE) and Zero Crossing Rate (ZCR) are proposed as detection parameters for word boundary detection. Three kinds of Intelligent Learning Networks (ILN) are used as detectors and their performance is compared: the recurrent self-organizing neural fuzzy inference network (RSONFIN), a Fuzzy System learned through the combination of Fuzzy Clustering and Support Vector Machine (FS-FCSVM), and a Gaussian-kernel SVM. After word detection, the cepstral coefficients of each frame are used as features and RFNF-HMMs are used as recognizers. In experiments, word detection performance is evaluated by both Receiver Operating Characteristic (ROC) curves and recognition rates. Experimental results show that the proposed detection parameters are robust and effective compared with the Refined Time-Frequency (RTF) parameters, and that good recognition rates are achieved for human speech corrupted by different types of unknown noise in variable noise-level environments.
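The frame-level front end named in the abstract (energy-based segmentation, zero-crossing rate, wavelet-band energy, and per-frame LPC coefficients) can be sketched as below. This is a minimal illustration under stated assumptions, not the thesis's implementation: the frame size, hop, energy threshold ratio, and LPC order are hypothetical choices, the exact definition of the thesis's Wavelet Energy (WE) parameter is not given in this record (per-band energy of a Haar DWT is an assumed stand-in), and the RFNF filter, the HMM recognizers, and the ILN detectors are not shown.

```python
import math
from typing import List, Tuple


def frames(signal: List[float], size: int, hop: int) -> List[List[float]]:
    """Split a signal into overlapping frames; a ragged tail is dropped."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, hop)]


def short_time_energy(frame: List[float]) -> float:
    """Sum of squared samples in one frame."""
    return sum(x * x for x in frame)


def zero_crossing_rate(frame: List[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ (0 counts as positive)."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def haar_subband_energies(frame: List[float], levels: int) -> List[float]:
    """Assumed WE stand-in: detail-band energy at each Haar DWT level plus the
    final approximation-band energy. Frame length must be divisible by 2**levels."""
    s = 1.0 / math.sqrt(2.0)  # orthonormal Haar scaling, so total energy is preserved
    approx = list(frame)
    energies = []
    for _ in range(levels):
        half = len(approx) // 2
        detail = [(approx[2 * i] - approx[2 * i + 1]) * s for i in range(half)]
        approx = [(approx[2 * i] + approx[2 * i + 1]) * s for i in range(half)]
        energies.append(sum(d * d for d in detail))
    energies.append(sum(a * a for a in approx))
    return energies


def segment_by_energy(energies: List[float], ratio: float) -> Tuple[int, int]:
    """(first, last) frame indices whose energy exceeds ratio * max energy,
    or (-1, -1) if no frame does. The fixed ratio is a hypothetical rule."""
    threshold = ratio * max(energies)
    active = [i for i, e in enumerate(energies) if e > threshold]
    return (active[0], active[-1]) if active else (-1, -1)


def lpc(frame: List[float], order: int) -> List[float]:
    """LPC coefficients a[1..order] by autocorrelation + Levinson-Durbin,
    with the predictor x[n] ~ sum_k a[k] * x[n - k]."""
    n = len(frame)
    r = [sum(frame[i] * frame[i + k] for i in range(n - k)) for k in range(order + 1)]
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # silent or perfectly predicted frame: stop early
            break
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k  # prediction error shrinks by (1 - k^2)
    return a[1:]


if __name__ == "__main__":
    # Hypothetical test signal: silence, a 1 kHz tone sampled at 8 kHz, silence.
    fs = 8000
    tone = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(1600)]
    signal = [0.0] * 1600 + tone + [0.0] * 1600
    fr = frames(signal, size=256, hop=128)
    start, end = segment_by_energy([short_time_energy(f) for f in fr], ratio=0.1)
    print("active frames:", start, "to", end)
    print("ZCR of first active frame:", zero_crossing_rate(fr[start]))
    print("Haar band energies:", haar_subband_energies(fr[start], levels=3))
    print("LPC (order 2) of a middle frame:", lpc(fr[(start + end) // 2], order=2))
```

Running the script prints the active frame range of the synthetic signal and the features of selected frames. In the pipeline the abstract describes, per-frame features of the segmented portion (LPC for birdsong, cepstral coefficients for speech) would be passed on to the RFNF-HMM recognizers, which this sketch does not attempt to reproduce.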
URI: http://hdl.handle.net/11455/6447
Other Identifiers: U0005-1707200614581600
Article Link: http://www.airitilibrary.com/Publication/alDetailedMesh1?DocID=U0005-1707200614581600
Appears in Collections: Department of Electrical Engineering

Files in This Item:

The full text is available through the Airiti Library.