Please use this identifier to cite or link to this item: http://hdl.handle.net/11455/19687
Title: Implementation and Improvement of Toneless Input Method for Taiwanese (台語無聲調輸入法的實作及改良)
Author: Liou, Jau-Fu (劉昭甫)
Keywords: Taiwanese input method
phoneme-to-character
maximum matching
First-Syllable input
Publisher: Department of Computer Science and Engineering
Abstract: This thesis implements a Taiwanese Pinyin input method designed for convenient typing. Users do not enter tones, which avoids the problem that a syllable's pronunciation after tone sandhi differs from its original tone. The input method is compatible with four Taiwanese romanization schemes: Tongyong Pinyin, Taiwan Pinyin (TP), the Taiwanese Romanization System, and Peh-oe-ji. Novice users who are unfamiliar with Taiwanese can also mix several schemes in the same input. Beyond the basic input mode, we add three modes to increase input efficiency: First-Syllable input mode, English input mode, and abbreviation input mode. In First-Syllable input, the user types only the first letter of each syllable of a word to obtain that word, so only one phonetic symbol is needed per character (syllable), which is very efficient compared with current methods. In English input mode and abbreviation input mode, the user types an English word or an English abbreviation to obtain the Chinese word; these modes help when the user does not know the Taiwanese Pinyin of a word but knows its English translation or abbreviation. Because users may not accept some Taiwanese words, the input method also lets them choose Mandarin words from Taiwanese Pinyin input, and it provides a mechanism for annotating Mandarin words with Taiwanese Pinyin. To deal with the lack of Taiwanese corpus data, we apply word frequencies from a Chinese corpus in place of Taiwanese word frequencies in phoneme-to-character conversion. Using an algorithm based on three-word maximum matching combined with unigram probability, accuracy with mixed Taiwanese romanization schemes improves from 45% to 60.3%.
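The abstract names two mechanisms that a short example may make more concrete: three-word maximum matching with unigram probabilities, and First-Syllable lookup. The Python sketch below is only an illustration under stated assumptions and is not the thesis's implementation. The toy lexicon, its counts, the toneless spellings, and the function names (`convert`, `by_initials`) are all hypothetical. It assumes the input is already split into toneless syllables, and it ranks each chunk of up to three words first by total syllable length and then by summed log unigram probability.

```python
import math

# Hypothetical toy lexicon: toneless syllables -> [(word, unigram count)].
# Spellings and counts are made up for illustration only.
LEXICON = {
    ("tai",): [("台", 50)],
    ("gi",): [("語", 40)],
    ("tai", "gi"): [("台語", 30)],
    ("su",): [("輸", 20)],
    ("jip",): [("入", 25)],
    ("su", "jip"): [("輸入", 15)],
    ("huat",): [("法", 35)],
    ("su", "jip", "huat"): [("輸入法", 10)],
}
MAX_WORD_LEN = 3  # longest lexicon entry, in syllables
TOTAL = sum(c for entries in LEXICON.values() for _, c in entries)


def candidates(syllables, start):
    """Yield (length, word, count) for lexicon entries beginning at start."""
    for n in range(1, MAX_WORD_LEN + 1):
        key = tuple(syllables[start:start + n])
        if len(key) < n:  # ran past the end of the input
            break
        for word, count in LEXICON.get(key, []):
            yield n, word, count


def chunks(syllables, start, depth=3):
    """Yield every chunk of up to `depth` consecutive words from start."""
    cands = list(candidates(syllables, start)) if depth > 0 else []
    if not cands:
        yield []
        return
    for cand in cands:
        for rest in chunks(syllables, start + cand[0], depth - 1):
            yield [cand] + rest


def score(chunk):
    """Rank by total length first, then by summed log unigram probability."""
    total_len = sum(n for n, _, _ in chunk)
    log_prob = sum(math.log(c / TOTAL) for _, _, c in chunk)
    return (total_len, log_prob)


def convert(syllables):
    """Greedy conversion: commit the first word of the best three-word chunk."""
    out, i = [], 0
    while i < len(syllables):
        best = max(chunks(syllables, i), key=score)
        if not best:  # unknown syllable: pass it through unchanged
            out.append(syllables[i])
            i += 1
            continue
        n, word, _ = best[0]
        out.append(word)
        i += n
    return out


def by_initials(letters):
    """First-Syllable lookup: match the first letter of each syllable."""
    hits = [
        (count, word)
        for key, entries in LEXICON.items()
        if "".join(s[0] for s in key) == letters
        for word, count in entries
    ]
    return [word for _, word in sorted(hits, reverse=True)]


print(convert(["tai", "gi", "su", "jip", "huat"]))
print(by_initials("sjh"))
```

With this toy data, several chunks cover all five syllables, and the unigram score breaks the tie in favor of the segmentation with fewer, longer words. A real system would also need to map the four romanization schemes onto a common syllable inventory, which this sketch leaves out.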
URI: http://hdl.handle.net/11455/19687
Other identifiers: U0005-1508201014302000
Article link: http://www.airitilibrary.com/Publication/alDetailedMesh1?DocID=U0005-1508201014302000
Appears in Collections: Department of Computer Science and Engineering

Files in this item:

For the full text, please visit Airiti Library (華藝線上圖書館).


