Noisy Speech Detection and Musical Instruments Recognition by Wavelet Transform Analysis with Soft Computing Approach
|Abstract:||This thesis applies wavelet features and soft-computing-based classifiers to two problems: speech detection and musical instrument recognition. For speech detection, it uses Haar wavelet energy and entropy (HWEE) as detection features. The Haar wavelet energy (HWE) is taken from the robust energy band that shows the most significant difference between speech and nonspeech segments across noise levels. Similarly, the wavelet energy entropy (WEE) is computed from the two wavelet energy bands whose entropy best separates speech from nonspeech. The HWEE features are fed as inputs to a recurrent self-evolving interval type-2 fuzzy neural network (RSEIT2FNN) for classification. The RSEIT2FNN is chosen because it uses type-2 fuzzy sets, which are more robust to noise than type-1 fuzzy sets, and because its recurrent structure helps retain the context information of neighboring frames around a test frame. The HWEE-based RSEIT2FNN detector was applied to speech detection in different noisy environments at different noise levels.
For musical instrument recognition, the most critical step is feature extraction, especially the extraction of features that discriminate between instruments. To this end, this thesis incorporates HWE and wavelet packet decomposition energy features into traditional audio features to improve the recognition rate. For classifier design, it proposes a divide-and-conquer support vector machine (SVM) classification technique that efficiently captures the context information in audio signals with high-dimensional features. Experimental results on the recognition of ten musical instruments show the effectiveness of the proposed recognition approach.
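The Haar wavelet energy and entropy features named in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not say which band is the robust one, which two bands feed the entropy, how deep the decomposition is, or how long a frame is. The function names, the `levels` parameter, the band indices, and the use of Shannon entropy over normalized band energies are all assumptions made for illustration.

```python
import math


def haar_step(x):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    if len(x) % 2:
        x = x[:-1]  # drop a trailing sample so the samples pair up
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail


def haar_band_energies(frame, levels=3):
    """Energy of each detail band (finest first), then of the final approximation.

    `levels` is a hypothetical choice; the frame needs at least 2**levels samples.
    """
    energies = []
    approx = list(frame)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        energies.append(sum(d * d for d in detail))
    energies.append(sum(a * a for a in approx))
    return energies


def band_entropy(energies, bands):
    """Shannon entropy (in bits) of the normalized energies of the selected bands.

    Which bands to select is an assumption here, not taken from the thesis.
    """
    selected = [energies[b] for b in bands]
    total = sum(selected)
    if total == 0.0:
        return 0.0
    probs = [e / total for e in selected if e > 0.0]
    return -sum(p * math.log2(p) for p in probs)


# Example: a rapidly alternating frame puts all of its energy in the finest band.
frame = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
energies = haar_band_energies(frame, levels=3)
entropy = band_entropy(energies, bands=(0, 1))  # hypothetical band pair
```

In a detector of the kind the abstract describes, one band energy and one two-band entropy would be computed per frame and passed to the classifier. When the energy of the selected pair is concentrated in one band, the entropy is near 0 bits; when it is split evenly, the entropy is 1 bit.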
|Appears in Collections:||Department of Electrical Engineering (電機工程學系所)|