Title: Recognition of Speaker Dependent Mandarin Monosyllables (Chinese title: 語者相關之國語單音辨識)
Author: Liu, Bo-Yang (劉柏揚)
Keywords: Speech Recognition; Mandarin speech recognition; LPCC
Publisher: Department of Computer Science and Engineering
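Speech recognition systems often require a large amount of computation and system resources, which makes voice-based text input difficult on devices with limited computing power and resources, such as mobile phones. This study exploits how Mandarin syllables are pronounced and a characteristic of Mandarin monosyllables, namely that the initial consonant is short and relatively unstable while the vowel makes up most of the sound and is relatively stable, to propose a Mandarin monosyllable recognition system that is fast to train, simple, and reasonably accurate. We first group the 16 finals and 3 medials of the Mandarin phonetic symbols (Zhuyin) into 36 vowel classes, and then classify the consonants (the Zhuyin initials) according to the vowel class. In the speaker-dependent setting, we locate representative frames for the vowel and the consonant, extract feature vectors from these frames using Linear Predictive Cepstral Coefficients (LPCC), and complete the recognition by identifying the vowel first and the consonant second. Mandarin syllables fall into four pronunciation types: open-mouth (開口呼), even-teeth (齊齒呼), closed-mouth (合口呼), and round-mouth (撮口呼). In the first tone, these four types yield 293 consonant-plus-vowel monosyllable combinations. Experiments show that the proposed method reaches a recognition accuracy of 86.73%, and 91.57% when the top three candidates are considered. Although the accuracy still leaves room for improvement, the overall computation and system resources required are very modest.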
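The LPCC feature extraction mentioned in the abstract above can be sketched as follows. This is a minimal, hypothetical sketch, not the thesis's implementation: the abstract does not give the frame length, window, LPC order, or number of cepstral coefficients, so the Hamming window and the order of 12 used here are assumptions. It uses the textbook autocorrelation method (Levinson-Durbin recursion) followed by the standard LPC-to-cepstrum recursion, as presented for example in Rabiner and Juang's *Fundamentals of Speech Recognition*.

```python
import math


def hamming(n: int) -> list[float]:
    """Hamming window of length n (n >= 2)."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def autocorrelation(x: list[float], max_lag: int) -> list[float]:
    """Short-time autocorrelation r[0..max_lag] of one windowed frame."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r: list[float], order: int) -> tuple[list[float], float]:
    """Levinson-Durbin recursion for predictor coefficients a_1..a_p.

    Convention: s[n] is predicted as sum_{k=1..p} a_k * s[n-k].
    Returns (a, residual prediction error energy).
    """
    a = [0.0] * (order + 1)          # a[0] unused; a[k] holds a_k
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:               # degenerate frame: stop the recursion early
            break
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err                # reflection (PARCOR) coefficient
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:], err


def lpc_to_cepstrum(a: list[float], n_ceps: int) -> list[float]:
    """LPC-to-cepstrum recursion, returning c_1..c_n_ceps (gain term c_0 omitted).

    c_m = a_m + sum_{k=1..m-1} (k/m) c_k a_{m-k}, with a_j = 0 for j > p.
    """
    p = len(a)
    c = [0.0] * (n_ceps + 1)         # c[0] unused here
    for m in range(1, n_ceps + 1):
        acc = a[m - 1] if m <= p else 0.0
        for k in range(max(1, m - p), m):   # only terms with m - k <= p are nonzero
            acc += (k / m) * c[k] * a[m - k - 1]
        c[m] = acc
    return c[1:]


def lpcc(frame: list[float], order: int = 12, n_ceps: int = 12) -> list[float]:
    """LPCC feature vector of one frame (order=12 and n_ceps=12 are assumed values)."""
    if len(frame) < 2 or len(frame) <= order:
        raise ValueError("frame must be longer than the LPC order")
    windowed = [s * w for s, w in zip(frame, hamming(len(frame)))]
    r = autocorrelation(windowed, order)
    if r[0] == 0.0:                  # silent frame: no spectral envelope to model
        return [0.0] * n_ceps
    a, _ = levinson_durbin(r, order)
    return lpc_to_cepstrum(a, n_ceps)
```

In the scheme described above, one such vector would be computed for the representative vowel frame and one for the representative consonant frame of each utterance. The abstract does not say how those frames are selected, so that step is not sketched here.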

Automatic speech recognition (ASR) usually requires a large amount of computation and system resources, so it is difficult to apply ASR on devices with low computing capability, such as mobile phones. We exploit the pronunciation and a characteristic of Mandarin monosyllables (a consonant sound is short and unstable, while a vowel sound is long and stable) to propose a fast, simple ASR method for classifying speaker-dependent Mandarin monosyllables. We classify the vowels of the Mandarin phonetic symbols into 36 classes. The classifier first assigns the vowel of a test monosyllable to one of the 36 classes and then recognizes the consonant of that monosyllable. Specifically, we first locate highly representative frames of the vowel and the consonant and extract Linear Predictive Cepstral Coefficient (LPCC) feature vectors from them. We then use these feature vectors to recognize the vowel and, based on the recognized vowel class, recognize the consonant. On 293 Mandarin monosyllables, our method achieves a recognition rate of 86.73%, rising to 91.57% with the top 3 candidates. Although the recognition rate leaves room for improvement, the total computation and system resources required are low.
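The two-stage decision (vowel class first, then consonant within that class) and the top-3 candidate list can be illustrated with a nearest-template classifier. This is a hypothetical sketch only: the abstract does not say how feature vectors are compared, so the Euclidean distance (`math.dist`), the single template vector per vowel class, the `SyllableTemplate` structure, and the rule of filling the candidate list from the next-nearest vowel class are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class SyllableTemplate:
    """Training template for one monosyllable (hypothetical structure)."""
    name: str                    # label of the monosyllable, e.g. a pinyin string
    vowel_class: int             # which of the vowel classes it belongs to
    consonant_feat: list[float]  # LPCC vector of its representative consonant frame


def recognize(vowel_feat: list[float],
              consonant_feat: list[float],
              vowel_templates: dict[int, list[float]],
              syllables: list[SyllableTemplate],
              top_n: int = 3) -> list[str]:
    """Two-stage recognition: rank vowel classes first, then consonants within a class.

    Returns up to top_n candidate labels, best first.
    """
    # Stage 1: order vowel classes by Euclidean distance to the test vowel feature.
    ranked_classes = sorted(
        vowel_templates,
        key=lambda cls: math.dist(vowel_feat, vowel_templates[cls]),
    )
    candidates: list[str] = []
    for cls in ranked_classes:
        # Stage 2: within this vowel class, order syllables by consonant distance.
        members = [s for s in syllables if s.vowel_class == cls]
        members.sort(key=lambda s: math.dist(consonant_feat, s.consonant_feat))
        candidates.extend(s.name for s in members)
        # Move on to the next-nearest vowel class only if this one
        # supplied fewer than top_n candidates (assumed fallback rule).
        if len(candidates) >= top_n:
            break
    return candidates[:top_n]


if __name__ == "__main__":
    # Toy 2-D features for illustration only; real LPCC vectors are longer.
    vowels = {0: [0.0, 0.0], 1: [5.0, 5.0]}
    table = [
        SyllableTemplate("ba", 0, [1.0, 0.0]),
        SyllableTemplate("pa", 0, [0.0, 1.0]),
        SyllableTemplate("bi", 1, [1.0, 0.0]),
        SyllableTemplate("pi", 1, [0.0, 1.0]),
    ]
    print(recognize([4.8, 5.1], [0.9, 0.1], vowels, table))  # prints ['bi', 'pi', 'ba']
```

The appeal of this ordering is that the consonant, the short and less stable part of the syllable, is only compared against syllables sharing the recognized vowel class. With 293 syllables spread over 36 classes that is about 8 comparisons on average instead of 293.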
Other identifier: U0005-1008200823375300
Appears in Collections: Department of Computer Science and Engineering

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.