Title: 利用RBM預訓練之DBN類神經法於高混合度之中文母音辨識
Applying the Method of Deep Belief Network Pre-trained by Restricted Boltzmann Machines on High Confused Mandarin Vowel Recognition
Author: 陳映岑
Ying-Tsen Chen
Keywords: restricted Boltzmann machine; deep belief network; Mel-frequency cepstral coefficients; multilayer perceptron; RBM; DBN; MFCC; MLP
This thesis applies a deep belief network (DBN) pre-trained with restricted Boltzmann machines (RBMs) to the recognition of highly confusable Mandarin vowels such as <ㄢ, ㄤ>, <ㄛ, ㄨㄛ>, and <ㄥ, ㄣ>. We first record speech from 20 speakers and apply a series of pre-processing steps: digital sampling, endpoint detection, framing, and windowing. We then extract Mel-frequency cepstral coefficients (MFCC) from the speech signal and use them as the model input. Unlike a multilayer perceptron (MLP), whose initial weights and biases are set randomly, the DBN uses RBMs to pre-train these parameters and obtain a better starting point. The pre-trained values then serve as the initial weights and biases of an MLP, which is fine-tuned by gradient descent. Because pre-training supplies better initial parameters, the model converges faster during fine-tuning than an MLP with random initialization, and its recognition results are also better. The data consist of vowels: each sample has 25 frames, each frame has 39 feature parameters, and the model is an RBM-pre-trained DBN with one or two hidden layers. Compared with a randomly initialized MLP, this method raises the recognition rate by at least 0.67% and at most 9.61%. On average, the RBM-pre-trained DBN achieves a recognition rate 4.59% higher than the MLP.
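The pre-training step described in the abstract can be illustrated with a minimal sketch of greedy layer-wise RBM training using one-step contrastive divergence (CD-1). This is not the thesis's implementation: the abstract does not give the unit types, learning rate, number of epochs, or hidden-layer sizes, so the sketch assumes Bernoulli units, features rescaled to [0, 1], frames concatenated into a single 25 x 39 = 975-dimensional input, and hypothetical names (`RBM`, `pretrain_dbn`) and hyperparameters. It uses only the Python standard library to stay self-contained.

```python
import math
import random

N_FRAMES, N_FEATS = 25, 39       # from the abstract: 25 frames x 39 features
N_VISIBLE = N_FRAMES * N_FEATS   # 975 inputs, assuming frames are concatenated


def sigmoid(x):
    # Clamp the argument so math.exp cannot overflow.
    return 1.0 / (1.0 + math.exp(-max(-60.0, min(60.0, x))))


class RBM:
    """Bernoulli-Bernoulli RBM trained with CD-1 (an assumed unit type)."""

    def __init__(self, n_visible, n_hidden, rng):
        self.rng = rng
        # Small random weights; biases start at zero.
        self.W = [[rng.gauss(0.0, 0.01) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b_vis = [0.0] * n_visible
        self.b_hid = [0.0] * n_hidden

    def hidden_probs(self, v):
        return [sigmoid(self.b_hid[j] + sum(v[i] * self.W[i][j]
                                            for i in range(len(v))))
                for j in range(len(self.b_hid))]

    def visible_probs(self, h):
        return [sigmoid(self.b_vis[i] + sum(hj * w for hj, w in zip(h, row)))
                for i, row in enumerate(self.W)]

    def cd1_update(self, v0, lr):
        # Positive phase: hidden probabilities given the data vector.
        h0 = self.hidden_probs(v0)
        h0_sample = [1.0 if self.rng.random() < p else 0.0 for p in h0]
        # Negative phase: one Gibbs step (reconstruct, then re-infer hidden).
        v1 = self.visible_probs(h0_sample)
        h1 = self.hidden_probs(v1)
        # Update with the difference between data and reconstruction statistics.
        for i, row in enumerate(self.W):
            for j in range(len(row)):
                row[j] += lr * (v0[i] * h0[j] - v1[i] * h1[j])
            self.b_vis[i] += lr * (v0[i] - v1[i])
        for j in range(len(self.b_hid)):
            self.b_hid[j] += lr * (h0[j] - h1[j])


def pretrain_dbn(data, hidden_sizes, epochs=2, lr=0.05, seed=0):
    """Greedy layer-wise pre-training; returns one (W, b_hid) pair per layer."""
    rng = random.Random(seed)
    layers, layer_input = [], data
    for n_hidden in hidden_sizes:
        rbm = RBM(len(layer_input[0]), n_hidden, rng)
        for _ in range(epochs):
            for v in layer_input:
                rbm.cd1_update(v, lr)
        layers.append((rbm.W, rbm.b_hid))
        # The hidden activations become the training data for the next RBM.
        layer_input = [rbm.hidden_probs(v) for v in layer_input]
    return layers


if __name__ == "__main__":
    demo_rng = random.Random(1)
    # Toy stand-in for MFCC vectors rescaled to [0, 1]; not real speech data.
    toy = [[demo_rng.random() for _ in range(N_VISIBLE)] for _ in range(4)]
    init = pretrain_dbn(toy, hidden_sizes=[16, 8])
    print([(len(W), len(W[0])) for W, _ in init])
```

The returned `(W, b_hid)` pairs would serve as the initial weights and biases of the MLP's hidden layers, replacing random initialization before gradient-descent fine-tuning with an added output layer for the vowel classes.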
Rights: Authorization granted for the electronic full-text browsing/printing service; publicly available from 2021-08-20.
Appears in Collections: Institute of Statistics (統計學研究所)

Files in This Item:
File: nchu-107-7105018007-1.pdf
Size: 1.67 MB
Format: Adobe PDF (this file is only available in the university internal network)

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.