Title: Noisy Speech Detection and Musical Instruments Recognition by Wavelet Transform Analysis with Soft Computing Approach
Author: Tu, Chiu-Chuan
Keywords: recurrent self-evolving interval type-2 fuzzy neural network; RSEIT2FNN; Haar wavelet energy; wavelet entropy; support vector machine; wavelet packet decomposition energy; HWE; HWEE; SVM; WPD
Publisher: Department of Electrical Engineering

This thesis applies wavelet features and soft-computing-based classifiers to speech detection and musical instrument recognition. For speech detection, the Haar wavelet energy and entropy (HWEE) are used as detection features. The Haar wavelet energy (HWE) is computed from the robust energy band that shows the most significant difference between speech and nonspeech segments at different noise levels. Similarly, the wavelet energy entropy (WEE) is computed by selecting the two wavelet energy bands whose entropy shows the most significant speech/nonspeech difference. The HWEE features are fed as inputs to a recurrent self-evolving interval type-2 fuzzy neural network (RSEIT2FNN) for classification. The RSEIT2FNN is used because it employs type-2 fuzzy sets, which are more robust to noise than type-1 fuzzy sets, and its recurrent structure helps retain contextual information from the frames neighboring a test frame. The HWEE-based RSEIT2FNN detector was applied to speech detection in various noisy environments at different noise levels.
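As a rough illustration of the feature computation described above, the following Python sketch derives per-band Haar wavelet energies for a frame and an entropy over two selected bands. The decomposition depth, frame length, and choice of bands here are illustrative assumptions, not the thesis's exact settings:

```python
import math

def haar_step(x):
    """One level of the Haar DWT: returns (approximation, detail) coefficients."""
    a = [(x[2 * i] + x[2 * i + 1]) / math.sqrt(2) for i in range(len(x) // 2)]
    d = [(x[2 * i] - x[2 * i + 1]) / math.sqrt(2) for i in range(len(x) // 2)]
    return a, d

def haar_band_energies(frame, levels=3):
    """Multilevel Haar decomposition of one frame.

    Returns the energies of the detail bands D1..D_levels followed by the
    final approximation band. Frame length must be divisible by 2**levels.
    """
    energies = []
    a = list(frame)
    for _ in range(levels):
        a, d = haar_step(a)
        energies.append(sum(c * c for c in d))
    energies.append(sum(c * c for c in a))
    return energies

def wavelet_energy_entropy(e1, e2):
    """Entropy over two selected band energies (band choice is hypothetical)."""
    total = e1 + e2
    if total == 0.0:
        return 0.0
    h = 0.0
    for e in (e1, e2):
        p = e / total
        if p > 0.0:
            h -= p * math.log(p)
    return h
```

Because the Haar transform is orthonormal, the band energies sum to the frame's energy, so a speech/nonspeech detector can compare how that energy concentrates across bands rather than its absolute level.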
The most critical step in musical instrument recognition is the extraction of instrument features, especially discriminative ones. To this end, this thesis proposes incorporating HWE and wavelet packet decomposition (WPD) energy features into traditional audio features to improve the recognition rate. For classifier design, this thesis proposes a divide-and-conquer support vector machine (SVM) classification technique. This technique efficiently captures contextual information in high-dimensional audio features to improve recognition. Experimental results on the recognition of ten musical instruments show the effectiveness of the proposed approach.
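The WPD energy features mentioned above differ from the plain DWT in that both the approximation and detail branches are split at every level, giving a uniform tiling of the frequency axis. A minimal Python sketch, using the Haar filter pair and an assumed tree depth (the thesis's actual wavelet and depth may differ):

```python
import math

def haar_pair(x):
    """One Haar split of a node into (approximation, detail) children."""
    a = [(x[2 * i] + x[2 * i + 1]) / math.sqrt(2) for i in range(len(x) // 2)]
    d = [(x[2 * i] - x[2 * i + 1]) / math.sqrt(2) for i in range(len(x) // 2)]
    return a, d

def wpd_energies(frame, depth=2):
    """Full wavelet packet tree: split every node at every level.

    Returns the energies of all 2**depth leaf subbands, low to high
    frequency in natural tree order. Frame length must be divisible
    by 2**depth.
    """
    nodes = [list(frame)]
    for _ in range(depth):
        next_nodes = []
        for node in nodes:
            a, d = haar_pair(node)
            next_nodes.extend([a, d])
        nodes = next_nodes
    return [sum(c * c for c in node) for node in nodes]
```

The resulting 2**depth-dimensional energy vector per frame can be concatenated with traditional audio features before being passed to the SVM classifiers.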
Other Identifiers: U0005-2412201214104500
Appears in Collections: Department of Electrical Engineering
