Title: Statistical Speech Classification on Mandarin Monosyllables (國語單音統計辨認法)
Author: 洪玉璇
Keywords: Speech Classification; Speech Recognition
Publisher: Department of Applied Mathematics
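The main purpose of this thesis is to study speaker-independent speech recognition. We represent the features of each utterance by the cepstral parameters derived from the linear regression coefficients of linear predictive coding, and compress them into matrices of equal size. For each syllable, we compute the mean matrix and the variance matrix of the feature matrices over all speakers and use them to build a standard speech database. A Bayes classifier is then used to match a single test utterance, one that is not contained in the database, against the database, and the recognition results are observed.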
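The feature step described above (linear predictive coding, conversion to cepstra, compression to a fixed-size matrix) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the LPC order, the frame length, or the compression scheme, so the order of 4, the 80-sample frames, and the averaging over equal segments are all assumptions, and the function names are hypothetical. Pre-emphasis and windowing are omitted for brevity.

```python
import math

def autocorr(frame, order):
    """Autocorrelation of one frame for lags 0..order."""
    n = len(frame)
    return [sum(frame[t] * frame[t + lag] for t in range(n - lag))
            for lag in range(order + 1)]

def lpc(frame, order):
    """Levinson-Durbin recursion; returns a[1..p] with s[n] ~ sum_k a[k]*s[n-k]."""
    r = autocorr(frame, order)
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # degenerate frame: stop early, remaining coefficients stay 0
            break
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:]

def lpc_to_cepstra(a):
    """Standard recursion c_n = a_n + sum_{k<n} (k/n) * c_k * a_{n-k}."""
    p = len(a)
    c = [0.0] * p
    for n in range(1, p + 1):
        acc = a[n - 1]
        for k in range(1, n):
            acc += (k / n) * c[k - 1] * a[n - k - 1]
        c[n - 1] = acc
    return c

def compress(cepstra_frames, n_rows):
    """Assumed scheme: average consecutive frames within n_rows equal segments."""
    total = len(cepstra_frames)
    dim = len(cepstra_frames[0])
    matrix = []
    for seg in range(n_rows):
        start = seg * total // n_rows
        end = max((seg + 1) * total // n_rows, start + 1)
        chunk = cepstra_frames[start:end]
        matrix.append([sum(f[d] for f in chunk) / len(chunk) for d in range(dim)])
    return matrix

# Toy signal standing in for a recorded syllable (not real speech data).
signal = [math.exp(-0.005 * t) * (math.sin(0.3 * t) + 0.5 * math.sin(1.1 * t))
          for t in range(400)]
frames = [signal[i:i + 80] for i in range(0, 321, 40)]  # 9 overlapping frames
cepstra = [lpc_to_cepstra(lpc(f, 4)) for f in frames]
features = compress(cepstra, 3)  # every utterance maps to a 3 x 4 matrix
print(len(features), len(features[0]))
```

Because every utterance is reduced to a matrix of the same shape regardless of its duration, matrices from different speakers can be averaged element by element.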


With the rapid development of computers, daily life is becoming ever more closely tied to them. To create a high-quality, human-centered living environment, the study of speech recognition is becoming increasingly important. The main purpose of this thesis is to study speaker-independent speech recognition. We use a sequence of linear predictive coding (LPC) cepstra vectors to represent each Mandarin syllable and compress the sequence into a feature matrix. Finally, a simplified Bayes decision rule is used to classify Mandarin syllables. The computations for feature extraction and classification are fast and precise.
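The classification step can be sketched as follows. The abstract does not spell out the simplified rule, so this sketch assumes that each syllable class is summarized by the element-wise mean and variance of its feature matrices, that the matrix entries are treated as independent Gaussians, and that all classes have equal prior probability. The function names, the syllable labels, and the toy data are hypothetical.

```python
import math

def fit_class(samples):
    """Element-wise mean and variance over equally sized feature matrices."""
    n = len(samples)
    rows, cols = len(samples[0]), len(samples[0][0])
    mean = [[sum(s[i][j] for s in samples) / n for j in range(cols)]
            for i in range(rows)]
    # A small floor keeps the variance strictly positive.
    var = [[max(sum((s[i][j] - mean[i][j]) ** 2 for s in samples) / n, 1e-6)
            for j in range(cols)] for i in range(rows)]
    return mean, var

def neg_log_likelihood(x, mean, var):
    """Negative Gaussian log-likelihood, dropping terms shared by all classes."""
    total = 0.0
    for x_row, m_row, v_row in zip(x, mean, var):
        for xv, m, v in zip(x_row, m_row, v_row):
            total += 0.5 * math.log(v) + (xv - m) ** 2 / (2.0 * v)
    return total

def classify(x, database):
    """With equal priors, pick the class with the smallest negative log-likelihood."""
    return min(database, key=lambda label: neg_log_likelihood(x, *database[label]))

# Toy training data: two syllable classes, two speakers each, 2 x 2 feature matrices.
train = {
    "ma": [[[1.0, 2.0], [3.0, 4.0]], [[1.2, 1.8], [2.9, 4.1]]],
    "ba": [[[-1.0, 0.0], [0.5, 1.0]], [[-0.8, 0.2], [0.4, 1.1]]],
}
database = {label: fit_class(samples) for label, samples in train.items()}

# A test utterance from a speaker who is not in the database.
test_utterance = [[1.1, 1.9], [3.0, 4.0]]
print(classify(test_utterance, database))  # prints "ma"
```

Under the independence assumption the decision needs only one pass over the matrix entries per class, which is consistent with the claim that classification is fast.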

Key Words: speaker-independent speech recognition, linear predictive coding, cepstra, Bayes decision rule
Other Identifiers: U0005-2706200611111500
Appears in Collections: Department of Applied Mathematics


