Please use this identifier to cite or link to this item: http://hdl.handle.net/11455/8917
Title: 應用於HDTV1080p之H.264/AVC高級規範具功率感知幀間預測演算法研究及其電路架構設計與實現
Algorithm and Architecture Design of HDTV1080p Power-Aware Inter Prediction Processor in H.264/AVC High Profile
Author: 劉自清
Liu, Tzu-Chin
Keywords: inter prediction; power-aware; high profile; H.264/AVC
Publisher: Department of Electrical Engineering
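Abstract:
This thesis first examines the four-stage macroblock (MB) pipelined encoding system of H.264/AVC and addresses the data hazard problem it causes by proposing a new system architecture, the hybrid two-stage MB pipeline. Its main difference is that integer motion estimation and fractional motion estimation, originally two independent pipeline stages, are combined in the same pipeline stage. To keep the pipeline stages balanced, several fast inter prediction algorithms are proposed to reduce the amount of computation and the cycle count.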
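At the algorithm level, a fast two-stage coarse-to-fine inter motion estimation algorithm is proposed to replace the full-search block-matching algorithm. It uses candidate block subsampling and proceeds in two stages, from a coarse search to a fine search, to speed up the search. For fractional motion estimation, a multi-mode pre-decision technique and a motion-vector-based fast fractional motion estimation algorithm are proposed to reduce the computation required. Together, these three algorithms reduce the cycle count and balance the pipeline stages.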
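The two-stage coarse-to-fine idea described above can be sketched as follows. This is a minimal illustration, not the thesis's algorithm: the block size, search range, subsampling step, and the names `sad` and `coarse_to_fine_search` are all assumptions made for the example.

```python
def sad(cur, ref, bx, by, mvx, mvy, n):
    """Sum of absolute differences between the n x n block of `cur` at (bx, by)
    and the block of `ref` displaced by the motion vector (mvx, mvy)."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + mvy + y][bx + mvx + x])
        for y in range(n)
        for x in range(n)
    )


def coarse_to_fine_search(cur, ref, bx, by, n=4, search_range=4, step=2):
    """Illustrative two-stage search (assumed parameters, not from the thesis):
    subsampled candidates first, then a +/-1 refinement around the winner."""
    h, w = len(ref), len(ref[0])

    def valid(mvx, mvy):
        # The displaced block must lie entirely inside the reference frame.
        return (0 <= bx + mvx and bx + mvx + n <= w
                and 0 <= by + mvy and by + mvy + n <= h)

    # Stage 1 (coarse): evaluate only every `step`-th candidate position.
    coarse = [
        (sad(cur, ref, bx, by, mx, my, n), mx, my)
        for my in range(-search_range, search_range + 1, step)
        for mx in range(-search_range, search_range + 1, step)
        if valid(mx, my)
    ]
    _, cx, cy = min(coarse)

    # Stage 2 (fine): evaluate the coarse winner and its eight neighbours.
    fine = [
        (sad(cur, ref, bx, by, cx + dx, cy + dy, n), cx + dx, cy + dy)
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
        if valid(cx + dx, cy + dy)
    ]
    best_sad, mvx, mvy = min(fine)
    return (mvx, mvy), best_sad
```

With these assumed parameters the sketch evaluates 25 coarse and 9 fine candidates instead of the 81 positions a full search would visit in the same window, at the cost of possibly missing the true minimum when the coarse winner is not adjacent to it.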
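At the hardware level, because the hybrid two-stage MB pipeline combines the integer and fractional motion estimation computations in the same stage, a single processing unit can be designed to perform both. Compared with the four-stage MB pipeline, which needs separate processing units for integer and fractional motion estimation, this single combined unit greatly reduces hardware cost.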
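For power consumption, a model-based motion estimation algorithm with an early termination mechanism is proposed. Before inter prediction is performed, it decides whether to terminate the entire inter prediction operation early, and it provides four power-saving modes to reduce power consumption.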
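A mode-dependent early-termination test of the kind described above might look like the sketch below. It is only an assumption-laden illustration: the thesis derives its thresholds from a statistical model of the transform coefficients, whereas here a plain threshold on a SAD estimate stands in for that model, and the threshold values and function names are hypothetical.

```python
# Hypothetical SAD thresholds for four power modes (higher mode = more
# aggressive termination). Illustrative numbers only, not from the thesis.
MODE_THRESHOLDS = (0, 256, 512, 1024)


def terminate_early(predicted_sad, power_mode):
    """Return True if inter prediction for this macroblock can be skipped."""
    return predicted_sad < MODE_THRESHOLDS[power_mode]


def count_searched(predicted_sads, power_mode):
    """Count the macroblocks that still need the full motion search."""
    return sum(1 for s in predicted_sads if not terminate_early(s, power_mode))
```

Raising the power mode raises the threshold, so more macroblocks skip the search and less switching activity is spent on motion estimation, in exchange for coding efficiency.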
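The chip was implemented in the UMC Faraday 0.09 um 1P9M standard cell process. It reaches a maximum frequency of 205 MHz, which meets the HDTV1080p real-time requirement, with a maximum power consumption of 202 mW and a minimum of only 23 mW.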
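To give a feel for what the real-time requirement means at 205 MHz, the following sketch computes a per-macroblock cycle budget. The frame rate of 30 frames/s is an assumption (the abstract does not state one), as is padding the frame height to a multiple of 16 lines; the function name is hypothetical.

```python
def cycles_per_macroblock(clock_hz, width, height, fps, mb_size=16):
    """Clock cycles available per macroblock for real-time encoding.
    Frame dimensions are rounded up to whole macroblocks."""
    mbs_per_row = -(-width // mb_size)    # ceiling division
    mb_rows = -(-height // mb_size)       # 1080 lines round up to 68 rows
    mbs_per_second = mbs_per_row * mb_rows * fps
    return clock_hz // mbs_per_second


# Assumed operating point: 1920x1080 at 30 frames/s, 205 MHz clock.
budget = cycles_per_macroblock(205_000_000, 1920, 1080, 30)
```

Under these assumptions a frame holds 120 x 68 = 8160 macroblocks, which leaves roughly 837 cycles per macroblock for the whole inter prediction stage.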

In this thesis, a power-aware inter prediction processor for H.264/AVC high profile HDTV1080p applications is proposed. To eliminate the data hazard of the classic four-stage MB pipeline, we propose a hybrid two-stage macroblock (MB) pipeline scheme. Typically, a dual-engine architecture is adopted to serve integer motion estimation (IME) and fractional motion estimation (FME) in two MB pipeline stages. In this thesis, an inter prediction processor with a single parallel motion estimation engine and a reconfigurable datapath is proposed. The IME and the FME are performed sequentially on the single engine in the same MB pipeline stage, which avoids the data hazard that results from the two-stage scheme. To reduce the overall computational complexity, four hardware-friendly algorithms are proposed and coupled with the proposed inter prediction processor to support HD1080p video applications. The first combines a two-stage coarse-to-fine fast algorithm with a corresponding 64-processing-element (PE) array architecture to perform variable block size motion estimation (VBSME) in parallel. The second is an inter mode pre-decision algorithm, realized in the IME engine, which selects several better MB partitions early and substantially decreases the computational cycle count of the FME. Third, a one-pass fast FME is adopted to further decrease the computational cycles. The fourth fast algorithm, a model-based early termination scheme, terminates H.264/AVC inter prediction early based on statistical modeling of the transform coefficients. In light of the theoretical model, we develop a power-aware adaptive mechanism with multiple thresholds derived from the statistical model to terminate the ME operation early.
The proposed inter prediction processor consists of the current MB memory module, the search area memory module, the pixel sum array (PSA), the 6-tap finite impulse response (FIR) filter array, the cluster array with 64 PEs, the SAD tree, and the SATD calculation unit. The implementation results show that the proposed inter prediction processor has 431K logic gates with a core size of 2.6 × 2.6 mm². Moreover, in the UMC 0.09 um 1P9M CMOS process, it consumes only about 25~202 mW for H.264/AVC HP HDTV1080p applications at 205 MHz.
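For readers unfamiliar with the 6-tap FIR filter mentioned above, the sketch below shows the half-sample luma interpolation defined by the H.264/AVC standard, with taps (1, -5, 20, 20, -5, 1), rounding, and clipping to 8 bits. It illustrates the arithmetic of a single filter output only; the function name is hypothetical and nothing here reflects how the thesis's filter array is organized.

```python
def half_pel(samples):
    """Half-sample luma value from six neighbouring integer samples,
    using the H.264/AVC taps (1, -5, 20, 20, -5, 1) with rounding and
    clipping to the 8-bit range."""
    a, b, c, d, e, f = samples
    acc = a - 5 * b + 20 * c + 20 * d - 5 * e + f
    return min(255, max(0, (acc + 16) >> 5))


flat = half_pel([100, 100, 100, 100, 100, 100])  # flat region stays at 100
```

The taps sum to 32, so a flat region passes through unchanged, while sharp edges can overshoot and are clipped back into range.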
URI: http://hdl.handle.net/11455/8917
Other Identifiers: U0005-2308201011364800
Appears in Collections: Department of Electrical Engineering

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.