Please use this identifier to cite or link to this item: http://hdl.handle.net/11455/51616
Title: Mining Frequent Patterns Effectively from Concept-Drifting Data Streams Using a Count Approximation Based Method
Original title: 於概念漂移的動態資料串流環境下有效挖掘頻繁樣式的估算式探勘方法
Author: 賈坤芳
Keywords: Information Science--Software; Technology Development; Data Mining; Data Stream; Data-stream Mining; Frequent Pattern; Count Approximation Based Method; Data Concept; Concept Drift
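Abstract: 
Data mining is the technique of extracting valuable knowledge from the large volumes of data stored in databases. In many practical applications in recent years, data is no longer stored statically in a database but is transmitted continuously and dynamically, a processing model known as a “data stream”. Data streams are widely used in real-world applications, and streaming data may conceal valuable information or patterns, but discovering them is not easy. Because data streams have unstable characteristics such as non-constant transmission rates and surges in data volume at peak times, mining data streams is considerably harder than mining databases. In addition, the data characteristics or data distribution in a stream are usually dynamic and change over time, a phenomenon called “concept drift”. Concept drift degrades both the runtime performance and the result quality of a mining system and cannot be ignored, yet few studies in data-stream frequent pattern mining have examined it or attempted to solve it. For dynamic data-stream environments in which concept drift may occur, this project plans to study and propose a frequent pattern mining method that addresses the concept-drift problem. We will define the data concept and build a data-concept model based on the correlation between itemset supports, and design a count approximation based mining methodology on top of this model. By building a mapping function for the correlation between itemset supports, the data concept of a data stream can be recognized and represented. The approximation-based mining method records a portion of the itemsets in the stream as synopsis information; at mining time it uses the synopsis, through the mapping function, to compute the unrecorded itemsets and find the frequent patterns. To handle concept drift, the method includes a technique that uses approximation accuracy to detect whether the current data concept has changed. When concept drift is detected, the mapping function is updated by re-learning or incremental adjustment so that the new data concept is recognized, and the concept-drift problem is thereby resolved. Academically, this project will define and solve the concept-drift problem in frequent pattern mining over data streams and present a new research direction. In terms of applications, for industries with a strong need for data-stream mining, such as retail and finance, the project can develop a data-stream frequent pattern mining system that handles concept drift in practice with high efficiency and good quality, bringing these industries substantial benefit.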

Data mining is the process of finding interesting knowledge in large volumes of data stored in databases. Recently, the knowledge discovery community has focused on a new model of data processing in which data arrives as continuous streams, often referred to as “data streams”. Data streams have found wide application in recent years, for example in transactional records and web-flow or click-stream records. These streaming data may hide information that is valuable but not easy to find. The inherent features of data streams, such as variable transmission rates and peak volumes, impose many constraints. As a result, mining data streams is much more difficult than mining databases. In addition, the characteristics or distribution of the data in a stream usually change dynamically over time, a phenomenon called “concept drift”. Concept drift is a practical problem that degrades the performance and the result quality of a mining system. However, it is rarely considered in current research on frequent pattern mining in data streams.

In this proposal, we propose a method that handles concept drift while mining frequent patterns in dynamic data streams. Our study covers data-concept definition, concept modeling, concept representation, and the design of a count approximation based mining methodology. The mining method records a portion of the itemsets in the data stream as synopsis information. To accomplish the mining task, it approximates the counts of the unrecorded itemsets from the synopsis and then selects the frequent ones. By modeling the correlation between the frequencies of different itemsets as a mapping function, the mining method can capture and represent the data concept hidden in the data stream. Furthermore, a concept-drift detection technique for the mining method is feasible. When concept drift is detected in the stream data, the mining method learns the new concept by either re-learning or incremental adjustment, and the concept-drift problem is handled accordingly. The study addresses important issues in handling concept drift when mining frequent patterns in data streams, which is a new research topic. For real-life applications with a strong demand for discovering frequent patterns in dynamic data streams, a mining system built on the proposed method should be of substantial help.
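
The abstract does not specify the mapping function or the drift test, so the following Python sketch only illustrates the general shape of a count approximation approach. Everything in it is an assumption made for illustration: the synopsis holds single-item counts only, the mapping is a single learned ratio (pair count ≈ ratio × the smaller of the two item counts), drift is flagged when the mean relative estimation error on a batch exceeds a hypothetical `error_threshold`, and the response to drift is full re-learning rather than incremental adjustment. None of the function names come from the proposal.

```python
from collections import Counter
from itertools import combinations


def count_itemsets(batch, size):
    """Count how many transactions in the batch contain each itemset of the given size."""
    counts = Counter()
    for transaction in batch:
        for itemset in combinations(sorted(set(transaction)), size):
            counts[itemset] += 1
    return counts


def learn_ratio(batch):
    """Learn a one-parameter mapping (assumed): pair count ~= ratio * min(item counts)."""
    singles = count_itemsets(batch, 1)
    pairs = count_itemsets(batch, 2)
    ratios = [
        count / min(singles[(a,)], singles[(b,)])
        for (a, b), count in pairs.items()
    ]
    return sum(ratios) / len(ratios) if ratios else 0.0


def estimate_pair(singles, ratio, a, b):
    """Approximate the count of the unrecorded pair {a, b} from the synopsis."""
    return ratio * min(singles[(a,)], singles[(b,)])


def frequent_pairs(singles, ratio, min_count):
    """Select the pairs whose estimated count reaches the minimum count."""
    items = sorted(item for (item,) in singles)
    return [
        (a, b)
        for a, b in combinations(items, 2)
        if estimate_pair(singles, ratio, a, b) >= min_count
    ]


def mean_relative_error(batch, ratio):
    """Compare estimated and exact pair counts on a batch (the accuracy check)."""
    singles = count_itemsets(batch, 1)
    pairs = count_itemsets(batch, 2)
    if not pairs:
        return 0.0
    errors = [
        abs(estimate_pair(singles, ratio, a, b) - count) / count
        for (a, b), count in pairs.items()
    ]
    return sum(errors) / len(errors)


def process_stream(batches, error_threshold=0.5):
    """Yield (ratio, drift_detected) per batch, re-learning the mapping on drift."""
    ratio = None
    for batch in batches:
        if ratio is None:
            # First batch: learn the initial mapping, nothing to compare against yet.
            ratio = learn_ratio(batch)
            yield ratio, False
            continue
        drift = mean_relative_error(batch, ratio) > error_threshold
        if drift:
            ratio = learn_ratio(batch)
        yield ratio, drift
```

In this sketch the accuracy check computes exact pair counts for every batch, which a real streaming system would presumably replace with a sampled or otherwise cheaper check. It is kept exhaustive here only so that the detection step is easy to follow.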
URI: http://hdl.handle.net/11455/51616
Other identifiers: NSC100-2221-E005-088
Appears in Collections: Department of Computer Science and Engineering (資訊科學與工程學系所)
