Combination of Recurrent Fuzzy Filter and Hidden Markov Model for Noisy Birdsong and Human Speech Recognition
|Keywords:||Birdsong Recognition; Human Speech Detection and Recognition; Wavelet Transform||Publisher:||Department of Electrical Engineering||Abstract:||
This thesis proposes a Recurrent Fuzzy Network Filtered Hidden Markov Model (RFNF-HMM) for recognizing birdsong and human speech in variable noise-level environments. Because birdsong and human speech are produced in similar ways, a similar recognition approach is applied to both. For birdsong recognition, an energy parameter is used to segment the significant portion of a birdsong sequence. The linear predictive coding (LPC) coefficients of each frame in the segmented birdsong are then extracted and used as feature vectors, which are fed to HMM recognizers. Birdsong recorded in real outdoor environments is usually corrupted by non-stationary noise or other interference, which degrades recognition performance. To handle this problem, an RFNF-HMM recognizer is proposed in which each HMM is connected to an RFNF that filters noise in the feature domain. Experiments are performed on recognizing ten bird species in Taiwan from their songs. For human speech recognition in variable noise-level environments, Wavelet Energy (WE) and Zero Crossing Rate (ZCR) are proposed as parameters for word boundary detection. Three kinds of Intelligent Learning Networks (ILN) are used as detectors and their performance is compared: RSONFIN, a fuzzy system learned through a combination of fuzzy clustering and a support vector machine (FS-FCSVM), and a Gaussian-kernel SVM. After word detection, cepstral coefficients are used as features and RFNF-HMMs are used as recognizers. Word detection performance is evaluated by both Receiver Operating Characteristic (ROC) curves and recognition rates. Experimental results show that the proposed detection parameters are robust and effective compared with the Refined Time-Frequency (RTF) parameters, and good recognition rates are achieved for human speech corrupted by different types of unknown noise in variable noise-level environments.
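The front-end steps named in the abstract (energy-based segmentation, zero crossing rate, and per-frame LPC features) can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the frame size and hop, and the relative energy threshold `ratio` are all hypothetical choices made for illustration, and the LPC routine is the textbook autocorrelation method with the Levinson-Durbin recursion. The wavelet energy parameter and the RFNF-HMM stage are not covered.

```python
import math

def frames(signal, size, hop):
    """Split a sequence into frames; a trailing partial frame is dropped."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, hop)]

def short_time_energy(frame):
    """Sum of squared samples in one frame."""
    return sum(x * x for x in frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def active_frames(signal, size, hop, ratio=0.1):
    """Keep frames whose energy exceeds ratio * max energy.
    The relative threshold is an assumption, not taken from the thesis."""
    fr = frames(signal, size, hop)
    energies = [short_time_energy(f) for f in fr]
    threshold = ratio * max(energies)
    return [f for f, e in zip(fr, energies) if e > threshold]

def lpc(frame, order):
    """LPC coefficients via autocorrelation and Levinson-Durbin.
    Returns [1, a1, ..., ap] so that e[n] = x[n] + a1*x[n-1] + ... + ap*x[n-p]."""
    n = len(frame)
    r = [sum(frame[i] * frame[i + lag] for i in range(n - lag))
         for lag in range(order + 1)]
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0:  # silent or degenerate frame: stop early
            break
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= (1.0 - k * k)
    return a

# Usage: silence, a short tone, then silence; only the tone frames survive.
tone = [math.sin(2 * math.pi * i / 10) for i in range(100)]
signal = [0.0] * 100 + tone + [0.0] * 100
features = [lpc(f, order=4) for f in active_frames(signal, size=50, hop=50)]
```

Each entry of `features` is one LPC vector per retained frame, which is the kind of per-frame feature sequence an HMM recognizer would consume.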
|Appears in Collections:||Department of Electrical Engineering|
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.