Please use this identifier to cite or link to this item: http://hdl.handle.net/11455/19989
Title: A Study on Building a Big Data Analysis and Processing Model Based on the Hadoop Framework
Design and Implementation of a Big Data Processing and Analysis Framework on the Hadoop Ecosystem
Author: 簡玠忠
Chien, Chien-Chung
Keywords: Hadoop
MapReduce
Cloud Computing
Distributed Computing
Big Data
Massive Data Processing
Publisher: Department of Computer Science and Engineering (資訊科學與工程學系所)
References:
[1] http://www.idc.com/getdoc.jsp?containerId=234294
[2] http://en.wikipedia.org/wiki/Cloud_computing
[3] http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145.pdf
[4] http://upload.wikimedia.org/wikipedia/commons/3/3c/Cloud_computing_layers.png
[5] http://upload.wikimedia.org/wikipedia/commons/thumb/8/87/Cloud_computing_types.svg/395px-Cloud_computing_types.svg.png
[6] http://www.moneydj.com/kmdj/wiki/wikiviewer.aspx?keyid=b2a16b54-77ee-4a1d-8feb-a3d0366e55c8#ixzz2B2h351NH
[7] http://en.wikipedia.org/wiki/Big_data
[8] http://hadoop.apache.org/
[9] White, Tom, "Hadoop: The Definitive Guide", O'Reilly Media, 10 May 2012
[10] http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/zh-TW//archive/mapreduce-osdi04.pdf
[11] http://hadoop.apache.org/core/docs/r0.16.4/hdfs_design.html
[12] http://hbase.apache.org/book.html#datamodel
[13] http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html
[14] http://www.hadoop.tw/hadoop/2010/04/
[15] http://www.splunk.com
[16] http://bigtop.apache.org/
Abstract: In recent years, digital data has grown explosively. An IDC study indicates that the world's data doubles every two years, a rate that outpaces Moore's Law. With this growth and the spread of cloud computing, the global volume of digital data is expected to reach 35 ZB by 2020, with one third of it stored and processed in the cloud. In the digital world ahead, this massive data holds many potential business opportunities for individuals and enterprises, but analyzing it faces technical bottlenecks: most of the data exists in unstructured form across different systems and cannot be analyzed with a database or traditional methods. New ways of capturing, searching, mining, and analyzing data will therefore be central to future big data processing. To give enterprises a platform for analyzing the latent value of such massive data, the purpose of this thesis is to construct a big data processing workflow. In this thesis, based on the Hadoop Ecosystem concept, we integrate HBase, Pig, and related components, study the concepts and applications of each component, and build a log analysis architecture that provides enterprises, and other organizations with big data analysis needs, a platform for fast processing and analysis of massive data.
Research conducted by IDC indicates that the amount of information worldwide doubles every two years, a rate that outpaces Moore's Law. With the growth of digital information and the spread of cloud computing, the amount of digital data is predicted to reach 35 ZB by 2020, and one third of it will be stored and processed in the cloud. Consequently, large amounts of digital data will present business opportunities for corporations and individuals. However, analyzing such massive data pushes against the limits of current technology, because most of the data is unstructured and scattered across different systems, making it hard to analyze with a database or other conventional tools. New ways to retrieve, search, discover, and analyze massive data will therefore be the key challenges of data processing. The main purpose of this research is to build a big data processing platform on a private cloud environment that enables efficient and prompt analysis of the potential value of massive data. Based on the Hadoop Ecosystem, we integrate HBase, Pig, and other related tools, study the purpose and usage of each, and construct a log-data analysis framework that provides enterprises and organizations with a platform for high-speed processing and analysis of massive data.
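The log analysis described in the abstract follows the MapReduce pattern. As a minimal sketch, the mapper and reducer below count web-access-log lines by HTTP status code; the log format and field positions are assumptions for illustration, and in practice the same two functions would run as separate scripts under Hadoop Streaming, with the framework performing the sort between phases.

```python
# A minimal sketch of MapReduce-style log analysis, in the shape used by
# Hadoop Streaming. The space-separated access-log format (with the HTTP
# status code in field 9) is an assumption for illustration.
from itertools import groupby
from operator import itemgetter

def mapper(lines):
    """Map phase: emit a (status_code, 1) pair for each log line."""
    for line in lines:
        fields = line.split()
        if len(fields) > 8:
            yield fields[8], 1  # field 9 holds the HTTP status code

def reducer(pairs):
    """Reduce phase: sum the counts for each status code.

    Hadoop sorts mapper output by key before the reduce phase; the
    sorted() call here simulates that shuffle/sort step locally.
    """
    for key, group in groupby(sorted(pairs), key=itemgetter(0)):
        yield key, sum(count for _, count in group)

if __name__ == "__main__":
    sample = [
        '1.2.3.4 - - [01/Feb/2013:10:00:00 +0800] "GET / HTTP/1.1" 200 1024',
        '1.2.3.5 - - [01/Feb/2013:10:00:01 +0800] "GET /a HTTP/1.1" 404 512',
        '1.2.3.4 - - [01/Feb/2013:10:00:02 +0800] "GET /b HTTP/1.1" 200 2048',
    ]
    print(dict(reducer(mapper(sample))))  # counts per status code
```

Keeping the map and reduce steps as plain generator functions makes the same logic testable locally before it is submitted to the cluster.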
URI: http://hdl.handle.net/11455/19989
Other identifiers: U0005-0102201315141200
Article link: http://www.airitilibrary.com/Publication/alDetailedMesh1?DocID=U0005-0102201315141200
Appears in Collections: Department of Computer Science and Engineering (資訊科學與工程學系所)

Files in this item:

To obtain the full text, please visit the Airiti Library (華藝線上圖書館).
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.