Please use this identifier to cite or link to this item: http://hdl.handle.net/11455/8277
Title: 移動估算演算法研究與及其電路架構設計與實現
Analysis and Architecture Design of Efficient Motion Estimations
Author: 黃聖瑜
Huang, Sheng-Yu
Keywords: H.264; VLSI; Video Compression; Full Search Motion Estimation; Fast Motion Estimation; Parallel Algorithm
Publisher: Department of Electrical Engineering
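Abstract:
Motion estimation is the most important part of a video coding system; within the encoder it requires the most computation and memory access. H.264/AVC is currently the latest international video coding standard, and compared with MPEG-4, H.263, and MPEG-2 it reduces the amount of data by 37%, 48%, and 64%, respectively. This thesis first introduces the most important motion estimation algorithms and architectures of the past twenty years (1981-2006), using simple examples.
In the second part, we propose a motion estimator for high-performance, high-resolution video. The architecture uses the full-search block-matching algorithm and is a scalable, pipelined two-dimensional motion estimator. Scalability and pipelining raise the performance of the architecture and let it meet the required processing capability. The architecture also processes frame boundaries continuously, with no idle time, and adopts Level C+ data reuse to reduce external memory bandwidth. It is implemented with standard cells in the TSMC 0.18 µm 1P6M process. Chip implementation results show that it is the fastest full-search architecture to date; it operates at 100 MHz, consumes 364.06 mW, and occupies a chip area of 3.24 × 3.24 mm².
In the third part, we propose a fast algorithm based on pixel subsampling that avoids becoming trapped in local optima. The fast algorithm maintains video quality comparable to full search while requiring only 7.5% of the computation of the full-search block-matching algorithm. Applied to a motion estimation hardware architecture, it achieves low power consumption. To save hardware area, the design uses a 4 × 4 array of processing elements, an adder tree, and parallel comparison units, all of which are reused across the two steps. On the memory side, two techniques, a separated memory structure and an embedded memory configuration, store pixels efficiently in memory and achieve Level C data reuse. The design is implemented with standard cells in the TSMC 0.18 µm 1P6M process. Chip implementation results show that its combined area-saving and speed performance is fifteen times that of full-search architectures. The maximum operating frequency is 52.4 MHz, which is sufficient for SDTV (720 × 480) frames; power consumption is 43.38 mW and chip area is 2.3 × 1.7 mm².
Finally, we propose a fast algorithm based on two-stage progressive refinement and apply it to an integer motion estimation architecture for H.264/AVC. In the first stage, global elimination and subsampling are used to reduce computation; in the second stage, a full-search block-matching algorithm is run over a fixed region to prevent excessive image distortion. The fast algorithm maintains video quality comparable to full search while requiring only 5% of the computation of variable-block-size full-search block matching. To meet the target H.264 encoder specification, parallel processing is used: 4 × 4 pixel adders extract approximate features, a parallel tree of absolute-difference adders performs the matching, and four parallel comparison units each search for their own promising candidate blocks. On the memory side, the same two techniques, a separated memory structure and an embedded memory configuration, achieve Level C data reuse. The design is implemented with standard cells in the TSMC 0.18 µm 1P6M process. Chip implementation results show that its combined area-saving and speed performance is twenty times that of full-search architectures. The maximum operating frequency is 62.5 MHz, which is sufficient for HD720p (1280 × 720) frames; power consumption is 183.0 mW and chip area is 3.20 × 3.58 mm².
In short, our contributions to motion estimation are threefold. The scalable, pipelined two-dimensional motion estimator architecture provides high performance and low external memory bandwidth. The pixel-subsampling motion estimator architecture provides low power consumption and small hardware area with minimal loss of picture quality. The two-stage progressive H.264/AVC integer motion estimation architecture is currently the highest-performance H.264/AVC integer motion estimation architecture; it supports HD720p frames, has low power consumption and small hardware area, and delivers the best picture quality. We sincerely hope that the ideas we propose will bring progress to digital video.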

Motion estimation is the most important part of a video coding system and demands the most computation and memory access in the encoder. H.264/AVC is the latest international video coding standard; compared with MPEG-4, H.263, and MPEG-2, it saves 37%, 48%, and 64% of the bitrate, respectively. The first part of this thesis introduces the main motion estimation algorithms and architectures of the past two decades (1981-2006).
Second, we propose a motion estimator for high-performance, high-resolution video. The architecture is a scalable, pipelined two-dimensional motion estimation processor for the full-search block-matching algorithm (FSBMA). Because it is scalable and pipelined, it can be scaled up or down to meet the performance requirement. The proposed 2-D motion estimator performs block matching on consecutive frames without any processing element (PE) idle time at frame boundaries, and it reduces external memory bandwidth through Level C+ data reuse. The architecture has been implemented with a standard-cell methodology in TSMC 0.18 µm 1P6M technology. Chip implementation results show that it achieves high performance for FSBMA: it operates at 100 MHz, consumes about 364.06 mW, and occupies a chip area of 3.24 × 3.24 mm².
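As an illustration of what a full-search block-matching step computes, the following is a minimal software sketch, not the thesis's hardware design. The function names `sad` and `full_search`, the list-of-rows frame layout, and the policy of skipping candidates that fall outside the reference frame are all assumptions made for this example.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` whose
    top-left corner is (bx, by) and the block of `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, n, search_range):
    """Test every displacement in [-search_range, search_range] in both
    directions and return (best_mv, best_sad). Candidates that would leave
    the reference frame are skipped (an assumption of this sketch)."""
    height, width = len(ref), len(ref[0])
    best_mv, best_sad = None, None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            x0, y0 = bx + dx, by + dy
            if x0 < 0 or y0 < 0 or x0 + n > width or y0 + n > height:
                continue  # candidate block lies partly outside the frame
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best_sad is None or cost < best_sad:
                best_mv, best_sad = (dx, dy), cost
    return best_mv, best_sad
```

Each block requires (2 × search_range + 1)² SAD evaluations of n² pixels each, which is why full search dominates encoder computation and why the data-reuse level of the reference-pixel memory matters.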
Third, we propose a fast algorithm based on pixel subsampling that avoids being trapped in local minima. While preserving quality comparable to FSBMA, its computational complexity is about 7.5% of that of FSBMA, so the power consumption of the proposed motion estimator is low. The estimator is composed of a 4 × 4 PE array, a parallel sum-of-absolute-differences (SAD) tree, and a parallel comparator tree. The hardware cost is low because the datapath is reused in both steps of the algorithm. To reduce system memory bandwidth, a memory interleaving organization and a local memory configuration are proposed to arrange the current and reference pixels and achieve the Level C data reuse scheme. The architecture has been implemented with a standard-cell methodology in TSMC 0.18 µm 1P6M technology. It can process SDTV (720 × 480) pictures at 30 frames per second at 52 MHz. Chip implementation results show that it is 15 times more area-speed efficient than full-search architectures. It can operate at up to 52.4 MHz, consumes about 43.38 mW, and has a chip size of 2.3 × 1.7 mm².
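The abstract does not describe the sampling pattern or the two steps of the proposed algorithm, so the sketch below only shows the general idea of pixel subsampling: evaluating the SAD on a regular subset of pixels. The helper names `subsampled_sad` and `pixels_used` and the 2:1 decimation in each dimension are illustrative assumptions, not the thesis's method.

```python
def subsampled_sad(cur_block, ref_block, step=2):
    """SAD evaluated only on every `step`-th pixel in each dimension.
    With step=2, one pixel in four contributes to the sum."""
    total = 0
    for y in range(0, len(cur_block), step):
        for x in range(0, len(cur_block[0]), step):
            total += abs(cur_block[y][x] - ref_block[y][x])
    return total


def pixels_used(n, step=2):
    """Number of pixel differences evaluated for an n x n block."""
    per_dimension = len(range(0, n, step))
    return per_dimension * per_dimension
```

For a 16 × 16 block, `pixels_used(16, 2)` is 64 of 256 pixels, so this generic decimation alone cuts the per-candidate cost to 25%; reaching the 7.5% figure reported above would require further reductions that the abstract does not detail.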
Finally, we propose a fast motion estimation algorithm based on a coarse-to-fine technique and apply it to the integer motion estimator of an H.264 encoder. In the first stage, global elimination and downsampling reduce the computational complexity. In the second stage, a local full search on the pixels around the selected candidates yields the 41 motion vectors (MVs). While preserving quality comparable to full search (FS), the algorithm's complexity is about 5% of that of variable-block-size (VBS) full search. To meet the H.264 encoder specification, we adopt parallel processing. The corresponding coarse-to-fine architecture is composed of a pixel-sum array to extract coarse features, a parallel SAD tree to perform the matching, and a four-bank parallel comparator tree to find each potential candidate. To reduce system memory bandwidth, the memory interleaving organization and local memory configuration are again used to arrange the current and reference pixels and achieve the Level C data reuse scheme. The architecture has been implemented with a standard-cell methodology in TSMC 0.18 µm 1P6M technology. It can process HD720p (1280 × 720) pictures at 30 frames per second at 59.6 MHz. Chip implementation results show that it is 20 times more efficient than VBS full-search architectures in terms of area-speed product. It can operate at up to 62.5 MHz, consumes about 183.0 mW, and has a chip size of 3.20 × 3.58 mm².
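The figure of 41 MVs follows from the H.264/AVC partition sizes of a 16 × 16 macroblock: 16 blocks of 4 × 4, 8 each of 8 × 4 and 4 × 8, 4 of 8 × 8, 2 each of 16 × 8 and 8 × 16, and 1 of 16 × 16. A common way to serve all of them for one candidate position is to compute the sixteen 4 × 4 SADs once and sum them into the larger partitions. The sketch below shows that summation in software; the names `PARTITION_SHAPES` and `variable_block_sads` are hypothetical, and the code is not a description of the thesis's SAD tree.

```python
# (width, height) of each partition shape, in units of 4x4 sub-blocks.
PARTITION_SHAPES = [(1, 1), (2, 1), (1, 2), (2, 2), (4, 2), (2, 4), (4, 4)]


def variable_block_sads(sad4x4):
    """`sad4x4` is a 4 x 4 grid holding the SAD of each 4x4 sub-block of a
    macroblock for one candidate displacement. Returns a dict that maps
    (width_px, height_px) to the list of SADs of all partitions of that
    shape, in raster order."""
    result = {}
    for w, h in PARTITION_SHAPES:
        sads = []
        for top in range(0, 4, h):
            for left in range(0, 4, w):
                sads.append(sum(sad4x4[top + r][left + c]
                                for r in range(h) for c in range(w)))
        result[(4 * w, 4 * h)] = sads
    return result
```

Summing the lengths of all lists in the returned dict gives 41, one SAD per motion vector that a variable-block-size search must track for each macroblock.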
In conclusion, the thesis makes three main contributions. First, the scalable, pipelined two-dimensional motion estimator offers high performance and low external memory bandwidth. Second, the motion estimator based on the fast pixel-subsampling algorithm reduces computational complexity and hardware cost with minimal loss of video quality. Third, the motion estimator based on the coarse-to-fine fast algorithm achieves the highest performance among integer motion estimation architectures and supports the HD720p video format in H.264/AVC applications; it has low power consumption and low hardware cost, and its video quality is almost the same as that of full search. We sincerely hope that our research results will contribute to progress in video technology.
URI: http://hdl.handle.net/11455/8277
Other identifiers: U0005-2108200818501200
Appears in Collections: Department of Electrical Engineering

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.