Please use this identifier to cite or link to this item: http://hdl.handle.net/11455/8168
Title: 移動估計器電路架構與可重組化計算引擎之設計與實現
VLSI Architecture Design and Implementation of Motion Estimator and Reconfigurable Computing Engine for General Purpose Applications
Author: 陳聯霏
Chen, Lien-Fei
Keywords: Motion Estimator; Motion Estimation Algorithm; Full-Search Block-Matching Algorithm; Fast Algorithm; Reconfigurable Computing System
Publisher: Department of Electrical Engineering
Abstract:
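As demands for high performance and high-resolution video quality keep growing, memory bandwidth has increasingly drawn the attention of video system designers. This thesis proposes two new motion estimator architectures, both of which are memory efficient. The first is a high data-reuse, high-throughput architecture for the full-search block-matching algorithm. Its data reuse exploits the overlap between the search areas of vertically adjacent blocks to reduce memory bandwidth. Built on one-dimensional processing element arrays and a data-interlacing shift-register array, the proposed architecture reuses the overlapping data not only between adjacent blocks in the same row but also between vertically adjacent blocks, which reduces the number of external memory accesses and the overall pin count without using on-chip memory. The circuit has been implemented with standard cells in the TSMC 0.25 µm 1P5M process. Chip implementation results show that the architecture operates at 100 MHz with a power consumption of 153 mW. The chip area is 1.77 × 1.77 mm².
The second proposed architecture addresses the memory bandwidth problem and also tries to accommodate various kinds of motion estimation algorithms. On-chip memory and a memory interleaving organization give this architecture better memory bandwidth, and these two techniques are also the foundation of the circuit's flexibility across motion estimation algorithms. On this foundation, a barrel shifter that replaces the conventional reconfigurable interconnection network, an adder tree, and a parameterized memory address generator readily meet the flexibility requirement. This flexible, memory-efficient, reconfigurable architecture can offer low computational complexity, low power consumption, high quality, and high performance, depending on application requirements. It supports not only the full-search block-matching algorithm but also fast algorithms such as the three-step search block-matching algorithm and the diamond search algorithm.
In addition to the two ASIC architectures for motion estimation described above, this thesis also proposes a reconfigurable computing engine for general-purpose digital signal processing applications. Its core component is an array of General-Purpose Processing Clusters (GPPCs), which acts as an MIMD model and gives high flexibility when implementing various algorithms and applications. Each GPPC is an SIMD model so that it can efficiently process applications with a high degree of data parallelism. A GPPC can execute 32-bit operations and can also execute 4-way 8-bit or 2-way 16-bit operations simultaneously. For efficient network connectivity, the Inter GPPC Row Reconfigurable Network is proposed to provide flexible data communication without degrading performance through wire delay in the internal interconnect.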

As demand grows for high-performance, high-resolution video, memory bandwidth has come to the attention of video system designers. All the motion estimation architectures proposed in this thesis are memory efficient. The first is a high data-reuse, high-throughput architecture with multiple-slice processing for the Full-Search Block-Matching Algorithm (FSBMA). It reuses the candidate-block pixels shared by adjacent current-block slices to decrease the memory bandwidth. Based on one-dimensional processing element (PE) arrays and a data-interlacing shift-register array, the proposed architecture reuses data not only in the overlapped region of adjacent candidate blocks on the same slice but also in the overlapped region of vertically adjacent candidate-block slices. This reduces external memory accesses and saves pins without using on-chip memory modules. The architecture has been implemented with a standard-cell methodology in TSMC 0.25 µm 1P5M technology. The chip implementation results show that it operates at 100 MHz with a power consumption of about 153 mW. The chip size is 1.77 × 1.77 mm².
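A minimal software sketch of full-search block matching may help make the data-reuse argument concrete. It is not the thesis's hardware design: it assumes the sum of absolute differences (SAD) as the matching criterion, and the function names (`sad`, `full_search`, `reference_pixels_per_block`) and parameters (`n` for block size, `p` for search range) are hypothetical. The last function gives idealized steady-state counts of reference pixels fetched per block, ignoring frame borders; they are not measurements of the proposed chip.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n current block at (bx, by)
    and the candidate block displaced by (dx, dy) in the reference frame."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, n, p):
    """Test every displacement in [-p, p] x [-p, p] and return (cost, dx, dy)
    for the candidate with the smallest SAD (first one found on ties)."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            top, left = by + dy, bx + dx
            # Skip candidates that fall outside the reference frame.
            if top < 0 or left < 0 or top + n > height or left + n > width:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best


def reference_pixels_per_block(n, p):
    """Idealized reference pixels fetched per current block (assumed model)."""
    window = n + 2 * p  # side length of the search window
    return {
        "no_reuse": window * window,      # reload the whole window each time
        "row_reuse": n * window,          # only the n new columns are loaded
        "row_and_column_reuse": n * n,    # vertical overlap is reused as well
    }
```

Under this model, a 16 × 16 block with a search range of ±8 needs 1024 reference pixels per block with no reuse, 512 when the overlap between horizontally adjacent search windows is reused, and 256 when the vertical overlap is reused too.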
For the second proposed architecture, we consider not only the memory bandwidth issue but also compatibility with many kinds of motion estimation algorithms. On-chip memory and a memory interleaving organization yield better memory bandwidth, and these two techniques are the foundation of the architecture's flexibility. On this foundation, the barrel shifter, the adder tree, and the parameterized address generator readily provide the required flexibility. Depending on the application and the chosen trade-off, the flexible, reconfigurable, memory-efficient architecture can offer low computational complexity, low power consumption, high quality, and high performance. It supports not only FSBMA but also fast algorithms such as the 3-step hierarchical search block-matching algorithm, the diamond search block-matching algorithm (DS), the block-based gradient descent search algorithm (BBDS), and the 4-step search block-matching algorithm (4SS).
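How interleaved memory, a barrel shifter, and an adder tree can work together is sketched below in software. This is an illustration under assumptions, not the thesis's circuit: it assumes low-order interleaving (pixel column `x` stored in bank `x % banks`), and all names (`interleave_row`, `read_aligned`, `adder_tree`, `row_sad`) are hypothetical. Reading one word from every bank returns consecutive pixels in bank order, a rotation restores pixel order, and a pairwise tree sums the absolute differences.

```python
def interleave_row(row, banks):
    """Store pixel x of a row in bank (x % banks) at address (x // banks)."""
    mem = [[] for _ in range(banks)]
    for x, value in enumerate(row):
        mem[x % banks].append(value)
    return mem


def read_aligned(mem, start):
    """Fetch len(mem) consecutive pixels beginning at column `start`,
    one from each bank, then rotate them into pixel order."""
    banks = len(mem)
    raw = []
    for b in range(banks):
        # Address generation: the column in bank b that lies in the range
        # [start, start + banks).
        x = start + ((b - start) % banks)
        raw.append(mem[b][x // banks])
    # The fetched data is in bank order; a left rotation by (start % banks),
    # the job of a barrel shifter, puts the pixel at `start` first.
    k = start % banks
    return raw[k:] + raw[:k]


def adder_tree(values):
    """Sum values pairwise, level by level, as a hardware adder tree would."""
    level = list(values)
    if not level:
        return 0
    while len(level) > 1:
        if len(level) % 2:
            level.append(0)  # pad odd levels with a zero operand
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
    return level[0]


def row_sad(mem, cur_pixels, start):
    """SAD between one row of current pixels and the reference pixels that
    begin at column `start` in the interleaved memory."""
    ref_pixels = read_aligned(mem, start)
    return adder_tree(abs(c - r) for c, r in zip(cur_pixels, ref_pixels))
```

Because the start column is just a parameter, the same datapath can serve any candidate position that a search pattern asks for, which is one plausible reading of why these blocks support several algorithms.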
In addition to the ASIC architectures, a reconfigurable computing engine for general-purpose applications is also proposed in this thesis. The kernel component of the Reconfigurable Computing (RC) Engine is the GPPC (General-Purpose Processing Cluster) array, which is built from GPPCs and acts as an MIMD model to achieve high flexibility in mapping applications and algorithms onto the RC Engine. Each GPPC is an SIMD model that executes data-parallel applications efficiently. A GPPC can execute 32-bit operations and can also perform 4-way 8-bit operations or 2-way 16-bit operations simultaneously. For efficient connectivity, the Inter GPPC Row Reconfigurable Network is proposed to provide flexible communication without letting interconnect wire delay degrade performance.
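The 4-way 8-bit and 2-way 16-bit modes can be illustrated with a small sub-word addition sketch. It assumes that each lane wraps around on overflow and that carries never cross lane boundaries. The abstract does not say whether the GPPC wraps or saturates, so this behavior and the function name `simd_add` are assumptions.

```python
def simd_add(a, b, lane_bits):
    """Add two 32-bit words lane by lane.

    lane_bits selects the mode: 8 gives four lanes, 16 gives two lanes,
    and 32 gives one ordinary 32-bit addition. Each lane wraps on overflow
    (assumed behavior), so no carry leaks into the neighboring lane.
    """
    if lane_bits not in (8, 16, 32):
        raise ValueError("lane_bits must be 8, 16, or 32")
    mask = (1 << lane_bits) - 1
    result = 0
    for shift in range(0, 32, lane_bits):
        lane_sum = ((a >> shift) & mask) + ((b >> shift) & mask)
        result |= (lane_sum & mask) << shift  # drop the carry out of the lane
    return result
```

For example, adding `0x00000001` to `0x0001FFFF` gives `0x00020000` in 32-bit mode, `0x00010000` in 2-way 16-bit mode, and `0x0001FF00` in 4-way 8-bit mode, because the carry stops at each lane boundary.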
URI: http://hdl.handle.net/11455/8168
Appears in Collections: Department of Electrical Engineering (電機工程學系所)
