Title: Statistical Behavior analysis of smoothing methods for language models of Mandarin data sets
Authors: Yu, M.S.
Huang, F.L.
Tsai, P.Y.
Keywords: language models;smoothing methods;statistical behaviors;cross entropy;natural language processing;sparse data;probabilities
Project: Lecture Notes in Computer Science
Journal/Report no.: Lecture Notes in Computer Science, Volume 4182, Page(s) 172-186.
In this paper, we discuss the statistical behavior and entropies of three smoothing methods, two well-known and one proposed, applied to three language models on Mandarin data sets. Because of data sparseness, smoothing methods are employed to estimate the probability of each event (both seen and unseen) in a language model. We propose a set of properties for analyzing the statistical behavior of the three smoothing methods. Our proposed smoothing method complies with all of the properties. We implement three language models on Mandarin data sets and then discuss their entropies. In general, the entropies of the proposed smoothing method for the three models are lower than those of the other two methods.
ISSN: 0302-9743
Appears in Collections: Department of Computer Science and Engineering
