DC Field | Value | Language
dc.contributor | Chung-Bow Lee | en_US
dc.contributor.author | Ying-Tsen Chen | en_US
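dc.description.abstract | This thesis applies a deep belief network (DBN) pre-trained with restricted Boltzmann machines (RBMs) to the recognition of highly confusable Mandarin vowels such as <ㄢ, ㄤ>, <ㄛ, ㄨㄛ>, and <ㄥ, ㄣ>. We first record speech data from 20 speakers and apply a series of pre-processing steps to the recordings: digital sampling, endpoint detection, frame segmentation, and windowing. We then compute Mel-frequency cepstral coefficients (MFCC) as the features of the speech signal and use them as the model input. Unlike a multilayer perceptron (MLP), whose initial weights are assigned at random, the DBN pre-trains the weights with RBMs to estimate a better set of weights, which serve as the initial parameters of the MLP and are then fine-tuned by gradient descent. Because pre-training gives the DBN better initial parameters, the MLP fine-tuning stage converges faster than an ordinary MLP with random initial values, and the recognition results are also better. The data are vowels: each sample has 25 frames, each frame has 39 feature parameters, and the model is a DBN with one or two hidden layers pre-trained by RBMs. Compared with an MLP using random weights, the method raises the recognition rate by at least 0.67% and at most 9.61%. On average, the RBM-pre-trained DBN achieves a recognition rate 4.59% higher than the MLP. | zh_TW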
dc.description.abstract | This thesis uses a deep belief network (DBN) pre-trained by restricted Boltzmann machines (RBMs) to recognize highly confusable Mandarin vowels such as <ㄢ, ㄤ>, <ㄛ, ㄨㄛ>, and <ㄥ, ㄣ>. First, we record the speech data of 20 speakers and perform a series of pre-processing steps: digital sampling, endpoint detection, frame segmentation, and windowing. We then take Mel-frequency cepstral coefficients (MFCC) as the features of the speech data and use these features as the input to train the model. Unlike a multilayer perceptron (MLP), which uses random initial weights and biases, the DBN uses RBMs to pre-train the parameters and obtain a better starting point. After pre-training, these parameters are taken as the initial weights and biases of the MLP and are then fine-tuned by gradient descent. Because the DBN obtains better initial parameters through pre-training, the model converges faster during MLP fine-tuning than a conventional MLP, and the recognition result is also better. This research uses vowel data: each vowel has 25 frames, each frame has 39 features, and the model is a DBN with one or two hidden layers pre-trained by RBMs. The recognition rate of this method is at least 0.67% and at most 9.61% higher than that of the MLP. On average, the DBN pre-trained by RBMs has a 4.59% higher recognition rate than the MLP. | en_US
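dc.description.tableofcontents | Abstract (Chinese); Abstract (English); Table of Contents; List of Figures; List of Tables; Chapter 1 Introduction; 1.1 Research Motivation and Objectives; 1.2 Related Work; 1.3 Introduction to Speech Recognition; 1.3.1 What Is Speech Recognition; 1.3.2 Characteristics of Speech; 1.3.3 Applications of Speech Recognition; 1.4 Thesis Organization; Chapter 2 Speech Signal Pre-processing and Feature Extraction; 2.1 Speech Signal Pre-processing; 2.1.1 Digital Sampling; 2.1.2 Normalization; 2.1.3 Endpoint Detection; 2.1.4 Frame Segmentation; 2.1.5 Pre-emphasis; 2.1.6 Windowing; 2.2 Feature Extraction; 2.2.1 Mel-Frequency Cepstral Coefficients; 2.2.2 Discrete Fourier Transform; 2.2.3 Triangular Filters; 2.2.4 Frequency Range; 2.2.5 Log Energy; 2.2.6 Discrete Cosine Transform; Chapter 3 Methodology; 3.1 Overview; 3.2 Artificial Neural Networks; 3.2.1 Perceptron; 3.2.2 Activation Functions; 3.2.3 Loss Functions; 3.3 Restricted Boltzmann Machine; 3.4 Deep Belief Network; 3.5 Additional Methods; 3.5.1 Splitting Training and Test Sets; 3.5.2 Random Mini-batch Input; 3.5.3 Early Stopping; 3.6 Speech Recognition System; Chapter 4 Experimental Procedure and Results; 4.1 Software Used; 4.2 Experimental Procedure; 4.3 Experimental Parameter Settings; 4.4 Data Sources; 4.5 Experimental Results; Chapter 5 Conclusion; 5.1 Summary; 5.2 Improvements and Suggestions; References | zh_TW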
dc.title | Applying the Method of Deep Belief Network Pre-trained by Restricted Boltzmann Machines on High Confused Mandarin Vowel Recognition | en_US
dc.type | thesis and dissertation | en_US
item.fulltext | with fulltext | -
item.openairetype | thesis and dissertation | -
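
The abstract describes pre-training with an RBM before gradient-descent fine-tuning. The following is a minimal sketch of one common way to pre-train a single RBM layer, using one-step contrastive divergence (CD-1) in plain Python. It is not the thesis's implementation: the names `RBM`, `cd1_update`, and `pretrain`, the learning rate, the epoch count, and the toy data are all hypothetical. It also assumes binary visible units for simplicity, whereas real-valued MFCC inputs would usually call for a different visible-unit type (for example Gaussian units), and it assumes the 25 frames of 39 features are flattened into one input vector.

```python
import math
import random

# From the abstract: 25 frames per vowel, 39 MFCC features per frame.
# Flattening them into a single input vector is an assumption of this sketch.
N_FRAMES, N_FEATURES = 25, 39
INPUT_DIM = N_FRAMES * N_FEATURES  # 975


def sigmoid(x):
    """Logistic function, clamped to avoid overflow in math.exp."""
    x = max(-60.0, min(60.0, x))
    return 1.0 / (1.0 + math.exp(-x))


class RBM:
    """Hypothetical binary-binary RBM trained with CD-1."""

    def __init__(self, n_visible, n_hidden, rng):
        self.rng = rng
        # Small random weights; zero visible (b) and hidden (c) biases.
        self.W = [[rng.gauss(0.0, 0.01) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b = [0.0] * n_visible
        self.c = [0.0] * n_hidden

    def hidden_probs(self, v):
        return [sigmoid(self.c[j] + sum(v[i] * self.W[i][j] for i in range(len(v))))
                for j in range(len(self.c))]

    def visible_probs(self, h):
        return [sigmoid(self.b[i] + sum(h[j] * self.W[i][j] for j in range(len(h))))
                for i in range(len(self.b))]

    def cd1_update(self, v0, lr):
        """One CD-1 step on a single sample; returns squared reconstruction error."""
        h0 = self.hidden_probs(v0)                       # positive phase
        h0_sample = [1.0 if self.rng.random() < p else 0.0 for p in h0]
        v1 = self.visible_probs(h0_sample)               # reconstruction
        h1 = self.hidden_probs(v1)                       # negative phase
        for i in range(len(self.b)):
            for j in range(len(self.c)):
                self.W[i][j] += lr * (v0[i] * h0[j] - v1[i] * h1[j])
            self.b[i] += lr * (v0[i] - v1[i])
        for j in range(len(self.c)):
            self.c[j] += lr * (h0[j] - h1[j])
        return sum((a - r) ** 2 for a, r in zip(v0, v1))


def pretrain(data, n_hidden, epochs=20, lr=0.1, seed=0):
    """Train one RBM layer; returns the RBM and the mean error per epoch."""
    rng = random.Random(seed)
    rbm = RBM(len(data[0]), n_hidden, rng)
    errors = []
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)  # present samples in random order each epoch
        errors.append(sum(rbm.cd1_update(data[k], lr) for k in order) / len(data))
    return rbm, errors


# Toy stand-in for real feature vectors: two easily separable binary patterns.
toy = [[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]] * 4
rbm, errors = pretrain(toy, n_hidden=3, epochs=200, lr=0.1, seed=0)
print(round(errors[0], 3), round(errors[-1], 3))
```

In a DBN setup of the kind the abstract outlines, the learned `W` and `c` would initialize the first hidden layer of the MLP in place of random values, after which all layers are fine-tuned by gradient descent on the labeled vowels. For a two-hidden-layer model, a second RBM would be trained in the same way on the hidden activations of the first.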
Appears in Collections: Institute of Statistics (統計學研究所)
Files in This Item:
File | Size | Format
nchu-107-7105018007-1.pdf | 1.67 MB | Adobe PDF
This file is only available in the university internal network.
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.