The Mandarin Monosyllable Recognition by Using the Method of Convolutional Neural Network
|Keywords:||Multilayer perceptron (MLP); Mel-frequency cepstral coefficients (MFCC); Convolutional neural network (CNN); Speech recognition; Machine learning||References:||
蘇木春 and 張孝德. 2003. Machine Learning: Neural Networks, Fuzzy Systems, and Genetic Algorithms (機器學習：類神經網路、模糊系統以及基因演算法則). Revised 2nd ed. 全華.
蘇奕銘 and 李宗寶. 2016. Applying MLP, RBF, and DNN Neural Network Methods to Mandarin Vowel Recognition (應用MLP、RBF及DNN類神經網路方法於中文母音辨識). Master's thesis, Institute of Statistics, National Chung Hsing University, Taichung.
Alex Graves, Abdel-rahman Mohamed and Geoffrey Hinton, 'Speech recognition with deep recurrent neural networks,' in 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, IEEE, 2013.
Abdul Ahad, Ahsan Fayyaz and Tariq Mehmood, 'Speech recognition using multilayer perceptron,' in Students Conference, 2002. ISCON '02. Proceedings, vol. 1, pp. 103-109, IEEE, 2002.
Alex Krizhevsky, Ilya Sutskever and Geoffrey E. Hinton, 'ImageNet classification with deep convolutional neural networks,' in NIPS, 2012.
Diederik P. Kingma and Jimmy Lei Ba, 'Adam: A method for stochastic optimization,' in ICLR, 2015.
K. Simonyan and A. Zisserman, 'Very deep convolutional networks for large-scale image recognition,' in ICLR, 2015.
Y. LeCun, L. Bottou, G. B. Orr and K.-R. Müller, 'Efficient BackProp,' in G. B. Orr and K.-R. Müller (Eds.), Neural Networks: Tricks of the Trade, Springer, 1998.
Neelima Rajput and S.K. Verma, 'Back propagation feed forward neural network approach for speech recognition,' in 2014 3rd International Conference on Reliability, Infocom Technologies and Optimization (ICRITO 2014) - (Trends and Future Directions), pp. 1-6, IEEE, 2014.
Ossama Abdel-Hamid, Abdel-rahman Mohamed, Hui Jiang, Li Deng, Gerald Penn and Dong Yu, 'Convolutional neural networks for speech recognition,' in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 22, no. 10, October 2014.
Yanmin Qian, Mengxiao Bi, Tian Tan and Kai Yu, 'Very deep convolutional neural networks for noise robust speech recognition,' in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 24, no. 12, December 2016.
Xu Tian, Jun Zhang, Zejun Ma, Yi He, Juan Wei, Peihao Wu, Wenchang Situ, Shuai Li and Yang Zhang, 'Deep LSTM for large vocabulary continuous speech recognition,' arXiv:1703.07090, 2017.||Abstract:||
This thesis investigates the recognition of Mandarin monosyllables using a convolutional neural network (CNN). A total of 1391 monosyllables recorded by 20 different speakers were put through a series of preprocessing steps, including digital sampling, frame segmentation, and windowing, and Mel-frequency cepstral coefficients (MFCC) were then extracted as the input features of the model. The method applies convolution, pooling, and batch normalization to extract further features from the original ones, and finally feeds the result into a multilayer perceptron (MLP) for classification. Besides classifying all 1391 monosyllables directly, other classification schemes were also tried, such as a model that classifies the vowel first and then the consonant, and one that additionally classifies the tone of the vowel. These three main models achieved recognition rates of 82.89%, 82.76%, and 80.46%, respectively. Finally, unweighted voting across the models gave the best recognition rate of 84.05%.
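The unweighted voting step mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the function name `unweighted_vote`, the example syllable labels, and the tie-breaking rule (fall back to the earliest-listed model when no label has a majority) are all assumptions made for illustration, since the abstract does not say how ties are resolved.

```python
from collections import Counter


def unweighted_vote(predictions):
    """Combine per-model label predictions by simple majority vote.

    predictions: list of label lists, one list per model, all equally long.
    Every model gets exactly one vote per item (no weighting).
    Tie-breaking is an assumption: among the labels with the highest
    count, the one proposed by the earliest-listed model wins.
    """
    n_items = len(predictions[0])
    if any(len(p) != n_items for p in predictions):
        raise ValueError("all models must predict the same number of items")

    voted = []
    for labels in zip(*predictions):  # labels for one item, in model order
        counts = Counter(labels)
        top = max(counts.values())
        # First label in model order that reaches the top count.
        winner = next(label for label in labels if counts[label] == top)
        voted.append(winner)
    return voted


# Hypothetical outputs of three models on three utterances.
model_a = ["ma1", "ba4", "shi4"]
model_b = ["ma1", "pa4", "si4"]
model_c = ["na1", "pa4", "shi2"]

combined = unweighted_vote([model_a, model_b, model_c])
print(combined)
```

In this example the first two utterances are decided by a two-to-one majority, while the third has three different votes and falls back to the first model's label under the assumed tie-breaking rule.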
|Appears in Collections:||Institute of Statistics (統計學研究所)|