Title: 以智慧型學習網路執行變動噪音環境下的語音偵測
Speech Detection in Variable Noise-Level Environment by Intelligent Learning Networks
Author: 鄭君楠
Cheng, Chun-Nan
Keywords: speech segmentation; VAD; fuzzy neural network; detector; intelligent; zero crossing rate
Publisher: Department of Electrical Engineering
Abstract: This thesis proposes intelligent-learning-network detectors, based on Low-frequency-band Wavelet Energy (LWE) and Zero Crossing Rate (ZCR) features, for speech detection in variable noise-level environments. The LWE is derived from the wavelet transform and reduces the influence of noise on the speech signal. With the addition of ZCR, speech can be detected robustly in variable-noise environments using only two features. For the detector, three types of intelligent learning networks are used: the Recurrent Self-Organizing Neural Fuzzy Inference Network (RSONFIN), the Support Vector Machine (SVM), and the newly proposed SVM-aided Self-Organizing Fuzzy Network (SVM-SOFN). In SVM-SOFN, the antecedent part is constructed by on-line fuzzy clustering and transforms the input features into linearly separable features. The consequent-part parameters are fuzzy singletons, learned by an SVM with a linear kernel. Detection performance is measured with the receiver operating characteristic (ROC) and related figures. In addition, the most suitable detection threshold for the network output is determined from the ROC curve, so that suitable speech intervals are detected and passed to the recognizer. The experiments use speaker-dependent speech recordings to which different types of noise are added at various Signal-to-Noise Ratio (SNR) levels. The results show that, with LWE and ZCR features, all three networks perform well. Compared with the refined time-frequency feature, the proposed LWE also achieves better performance. Among the three networks, RSONFIN has a small number of network parameters, SVM is easy to train, and SVM-SOFN combines both advantages: the network is small and easy to train.
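The two features named in the abstract can be illustrated with a minimal sketch. The abstract does not specify the wavelet, the number of decomposition levels, or the frame size, so everything below is an assumption made for illustration: the Haar wavelet, two decomposition levels, 256-sample frames with a 128-sample hop, and the function names themselves are hypothetical and are not taken from the thesis. The sketch computes, for each frame, the energy of the low-frequency (approximation) wavelet band and the zero crossing rate.

```python
import math


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def haar_approximation(signal, levels):
    """Apply the Haar low-pass step `levels` times, keeping only the
    approximation (low-frequency) coefficients.

    Assumption: Haar is used purely for illustration; the thesis does
    not state which wavelet it uses.
    """
    approx = list(signal)
    for _ in range(levels):
        if len(approx) < 2:
            break
        # Scaled pairwise sums; an odd trailing sample is dropped.
        approx = [
            (approx[i] + approx[i + 1]) / math.sqrt(2)
            for i in range(0, len(approx) - 1, 2)
        ]
    return approx


def low_band_wavelet_energy(frame, levels=2):
    """Energy of the low-frequency band: sum of squared approximation
    coefficients (an assumed stand-in for the thesis's LWE)."""
    return sum(c * c for c in haar_approximation(frame, levels))


def frame_features(samples, frame_len=256, hop=128, levels=2):
    """Return one (LWE, ZCR) pair per overlapping frame."""
    features = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        features.append(
            (low_band_wavelet_energy(frame, levels),
             zero_crossing_rate(frame))
        )
    return features
```

Under these assumptions, a rapidly alternating frame such as `[1, -1, 1, -1]` has a high ZCR but no low-band energy, while a slowly varying frame keeps most of its energy in the low band. A detector such as the networks described above would take each (LWE, ZCR) pair as its input.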
Appears in Collections: Department of Electrical Engineering
