Please use this identifier to cite or link to this item: http://hdl.handle.net/11455/5416
Title: Methods for imputation of missing values in air quality data sets (original Chinese title: 探討空氣品質資料庫中遺失值補值的方法)
Author: Hung, Chi-Lan (洪啟嵐)
Keywords: missing values; imputation; Inverse Square Distance Weighting; Kriging
Publisher: Department of Environmental Engineering
Abstract:
Taiwan's Environmental Protection Administration (EPA) divided Taiwan into seven air quality zones in 1994. The EPA's goals were to study air quality trends, understand local pollution levels and how pollutants are transported, and formulate reasonable control and improvement strategies. The zones were drawn according to local pollution characteristics, terrain, and meteorological conditions. Since then the EPA has set up more than 70 air quality monitoring stations. These stations monitor ambient air quality around the clock, so a large historical air quality database is now available for researchers to mine for useful information. However, about 10% of the pollutant data collected by the monitoring network are lost during transmission alone, and other factors cause further losses. Missing values in the database degrade the quality of any subsequent analysis, so handling them properly is essential. This study imputes the missing values in air quality data with two methods, Inverse Square Distance Weighting and Kriging, and then analyzes and compares the results. Monte Carlo cross-validation is then used to determine which method recovers missing values more accurately.
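The abstract names the two estimators but does not give their exact formulation. The following is a minimal sketch of how each fills in one missing station reading from its neighbours, under several assumptions. Station coordinates are taken to be planar, and inverse distance weighting uses power 2 (the "inverse square" case). Ordinary kriging uses an exponential semivariogram whose nugget, sill, and range values are purely illustrative; in practice they would be fitted to each pollutant's empirical variogram. The function names and the station data below are hypothetical and are not taken from the thesis.

```python
import numpy as np


def idw_impute(xy_known, z_known, xy_target, power=2.0):
    """Inverse distance weighting; power=2 gives inverse *square* distance weights.

    xy_known  : (n, 2) coordinates of stations with valid readings
    z_known   : (n,) readings at those stations
    xy_target : (2,) coordinates of the station whose reading is missing
    """
    d = np.linalg.norm(xy_known - xy_target, axis=1)
    if np.any(d == 0):  # target coincides with a station that has a reading
        return float(z_known[d == 0][0])
    w = 1.0 / d**power
    return float(np.sum(w * z_known) / np.sum(w))


def exponential_variogram(h, nugget=0.0, sill=1.0, range_=20.0):
    """Hypothetical exponential semivariogram; parameters are illustrative only."""
    h = np.asarray(h, dtype=float)
    g = nugget + (sill - nugget) * (1.0 - np.exp(-3.0 * h / range_))
    return np.where(h == 0, 0.0, g)  # gamma(0) = 0 by definition


def ordinary_kriging_impute(xy_known, z_known, xy_target, variogram=exponential_variogram):
    """Ordinary kriging estimate: the weights solve
    [Gamma 1; 1^T 0] [lambda; mu] = [gamma_0; 1], which forces them to sum to 1."""
    n = len(z_known)
    # Semivariances between every pair of known stations
    d_kk = np.linalg.norm(xy_known[:, None, :] - xy_known[None, :, :], axis=2)
    a = np.ones((n + 1, n + 1))
    a[:n, :n] = variogram(d_kk)
    a[n, n] = 0.0
    # Semivariances between each known station and the target location
    b = np.ones(n + 1)
    b[:n] = variogram(np.linalg.norm(xy_known - xy_target, axis=1))
    sol = np.linalg.solve(a, b)
    weights = sol[:n]  # sol[n] is the Lagrange multiplier
    return float(weights @ z_known)


# Four hypothetical stations (km) with hypothetical PM10 readings;
# the reading at the centre location is treated as missing.
stations = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
pm10 = np.array([40.0, 50.0, 60.0, 70.0])
target = np.array([5.0, 5.0])
print(idw_impute(stations, pm10, target))
print(ordinary_kriging_impute(stations, pm10, target))
```

Here the target is equidistant from all four stations, so both estimates reduce to the plain mean (about 55). The two methods diverge once stations are unevenly spaced. Kriging accounts for redundancy among clustered stations through the station-to-station semivariance matrix, while IDW looks only at each station's distance to the target.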
The simulation experiments show that, when imputing missing values in the air quality database, Kriging produces a smaller relative error than Inverse Square Distance Weighting. Monte Carlo cross-validation was run over the five pollutants studied: PM10, CO, O3, SO2, and NOx. Across these pollutants, Inverse Square Distance Weighting has a mean relative error of 25% and Kriging has a mean relative error of 19%. This indicates that Kriging is the more suitable imputation method.
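The abstract does not spell out the cross-validation protocol. The following is a minimal sketch of one way to obtain a mean relative error by Monte Carlo cross-validation, under assumptions of our own. Each trial hides a random 10% of the valid readings (a hypothetical choice) and re-estimates them from the remaining stations. Each estimate is scored by |estimate − observed| / |observed|. The names `monte_carlo_cv`, `n_trials`, and `frac_hidden` are hypothetical, and the station data are synthetic rather than the EPA data used in the thesis.

```python
import numpy as np


def idw_impute(xy_known, z_known, xy_target, power=2.0):
    """Inverse square distance weighting (power=2) estimate at one location."""
    d = np.linalg.norm(xy_known - xy_target, axis=1)
    if np.any(d == 0):
        return float(z_known[d == 0][0])
    w = 1.0 / d**power
    return float(np.sum(w * z_known) / np.sum(w))


def monte_carlo_cv(xy, z, impute, n_trials=200, frac_hidden=0.1, seed=0):
    """Monte Carlo cross-validation: in each trial, hide a random subset of the
    valid readings, re-estimate each hidden value from the remaining stations,
    and return the mean relative error |estimate - observed| / |observed|."""
    rng = np.random.default_rng(seed)
    n = len(z)
    n_hide = max(1, int(round(frac_hidden * n)))
    rel_errors = []
    for _ in range(n_trials):
        hidden = rng.choice(n, size=n_hide, replace=False)
        keep = np.setdiff1d(np.arange(n), hidden)
        for i in hidden:
            if z[i] == 0:  # relative error is undefined for a zero reading
                continue
            est = impute(xy[keep], z[keep], xy[i])
            rel_errors.append(abs(est - z[i]) / abs(z[i]))
    return float(np.mean(rel_errors))


# Synthetic example (not the thesis data): 30 hypothetical stations scattered
# over a 50 km x 50 km area, with a smooth east-west gradient plus noise.
rng = np.random.default_rng(1)
xy = rng.uniform(0.0, 50.0, size=(30, 2))
z = 40.0 + 0.5 * xy[:, 0] + rng.normal(0.0, 3.0, size=30)
mre = monte_carlo_cv(xy, z, idw_impute)
print(f"IDW mean relative error: {mre:.1%}")
```

The `impute` argument accepts any function with the signature `impute(xy_known, z_known, xy_target)`, such as a kriging estimator. Because the random number generator only draws the hidden sets, reusing the same `seed` compares different imputers on identical random splits. Running the procedure separately for each pollutant and averaging the results would give a pooled figure of the kind reported in the abstract.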
URI: http://hdl.handle.net/11455/5416
Other identifiers: U0005-1508200821492700
Appears in Collections: Department of Environmental Engineering

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.