Please use this identifier to cite or link to this item: http://hdl.handle.net/11455/8558
Title: 應用於多視訊標準之二維轉換架構與適用於H.264/AVC高級規範之幀內畫面編碼器電路架構設計與實現
Design and Implementation of 2-D Transform Architecture for Multi-Standard Video Applications and Intra Frame Coder for H.264/AVC High Profile
Author: 康弘鈞
Kang, Hung-Chun
Keywords: Intra Frame Coder for H.264 High Profile; H.264 Intra Prediction Generator for High Profile; Multi-Standard Video Applications; 2-D Transform
Publisher: Department of Electrical Engineering
Abstract:
First, we adopt the new distributed arithmetic (NEDA) algorithm to implement our 2-D multi-transform architecture. Because NEDA requires neither multipliers nor ROM, the circuit can be built from simple shifters and adders. Using this algorithm, we propose an efficient two-dimensional (2-D) transform architecture whose single unified kernel supports the traditional 8x8 DCT, the H.264/AVC 8x8 and 4x4 integer transforms, and the 4x4 Hadamard transform for multi-standard video coding applications. For the H.264/AVC standard, we also propose a high-throughput direct 2-D multiple-transform architecture based on the same algorithm, in which a 4x4 block is processed in two cycles or in one cycle.
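To illustrate what "shifters and adders only" means for one of the transforms named above, the following is a minimal sketch of the well-known H.264/AVC 4x4 forward core transform computed with a butterfly. It is not the NEDA architecture of the thesis: the function names (`forward_1d`, `forward_4x4`, `reference_4x4`) are hypothetical, and the scaling and quantization that follow the core transform in a real encoder are omitted.

```python
# H.264/AVC 4x4 forward core transform matrix Cf (scaling is folded into quantization).
CF = [[1,  1,  1,  1],
      [2,  1, -1, -2],
      [1, -1, -1,  1],
      [1, -2,  2, -1]]

def forward_1d(a, b, c, d):
    """4-point core transform using only additions, subtractions and 1-bit shifts."""
    s0, s1 = a + d, b + c          # butterfly sums
    d0, d1 = a - d, b - c          # butterfly differences
    return (s0 + s1,               # row [1,  1,  1,  1]
            (d0 << 1) + d1,        # row [2,  1, -1, -2]
            s0 - s1,               # row [1, -1, -1,  1]
            d0 - (d1 << 1))        # row [1, -2,  2, -1]

def forward_4x4(block):
    """2-D transform Y = Cf * X * Cf^T as a row pass followed by a column pass."""
    rows = [forward_1d(*r) for r in block]          # X * Cf^T
    cols = [forward_1d(*c) for c in zip(*rows)]     # transform each column
    return [list(r) for r in zip(*cols)]            # transpose back to row-major

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def reference_4x4(block):
    """Direct matrix form, used only to check the shift-add version."""
    cf_t = [list(r) for r in zip(*CF)]
    return matmul(matmul(CF, block), cf_t)

if __name__ == "__main__":
    x = [[5, 11, 8, 10], [9, 8, 4, 12], [1, 10, 11, 4], [19, 6, 15, 7]]
    print(forward_4x4(x) == reference_4x4(x))
```

The row-column form shown here needs two passes per block; a direct 2-D architecture such as the one the abstract describes would instead compute all outputs without the intermediate transposition.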
In this thesis, we also propose an H.264 high-profile intra frame coder architecture with eight parallel processing elements that achieves HD1080P at 30 frames per second. The proposed unified-kernel multi-transform architecture produces the integer transform and Hadamard transform results simultaneously, which reduces the cycle count while maintaining image quality. For the coding schedule, we propose an interleaved I8MB/I16MB/I4MB schedule that substantially increases hardware utilization. In the intra prediction unit, a seed-based computation method generates the prediction for a 4×4 block in two cycles and for an 8x8 block in eight cycles. In the coding architecture, we adopt a memory interleaving (ping-pong mode) technique that separates the prediction phase from the coding phase, which is what makes real-time coding of HD1080P frames possible. Implementation results show that the proposed architecture operates at 207.4 MHz to compress HD1080P@30fps video, with a chip size of 1.26×1.26 mm². Mode decision uses the sum of absolute transformed differences (SATD) based on the Hadamard transform, and the plane mode of I16MB prediction is supported. When applied to HD720P (1280×720) at 30 frames per second, the architecture needs to run at only 92 MHz. Consequently, the proposed architecture can be used in a variety of video-related electronic products.
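Two quantities in this paragraph can be made concrete with a short sketch: the Hadamard-based SATD cost used for mode decision, and the per-macroblock cycle budget implied by the quoted clock rates. Both functions below are illustrative assumptions rather than the thesis design: `satd_4x4` is the unnormalized textbook form (some encoders halve the result), and `cycles_per_macroblock` assumes 16x16 macroblocks with the frame height padded up to a multiple of 16, so that 1080 lines are coded as 68 macroblock rows.

```python
import math

def hadamard_1d(a, b, c, d):
    """4-point Hadamard transform using additions and subtractions only."""
    s0, s1, d0, d1 = a + d, b + c, a - d, b - c
    return (s0 + s1, d0 + d1, s0 - s1, d0 - d1)

def satd_4x4(orig, pred):
    """Sum of absolute Hadamard-transformed differences for one 4x4 block."""
    diff = [[o - p for o, p in zip(ro, rp)] for ro, rp in zip(orig, pred)]
    rows = [hadamard_1d(*r) for r in diff]          # row pass
    cols = [hadamard_1d(*c) for c in zip(*rows)]    # column pass
    return sum(abs(v) for col in cols for v in col)

def cycles_per_macroblock(clock_hz, width, height, fps):
    """Average clock cycles available per 16x16 macroblock at a given frame rate."""
    mbs_per_frame = math.ceil(width / 16) * math.ceil(height / 16)
    return clock_hz / (mbs_per_frame * fps)

if __name__ == "__main__":
    flat = [[10] * 4 for _ in range(4)]
    pred = [[8] * 4 for _ in range(4)]
    print(satd_4x4(flat, pred))                                   # only the DC term is nonzero
    print(round(cycles_per_macroblock(207.4e6, 1920, 1080, 30)))  # HD1080P budget
    print(round(cycles_per_macroblock(92e6, 1280, 720, 30)))      # HD720P budget
```

Under these assumptions both operating points work out to roughly 850 cycles per macroblock, which is consistent with the two clock frequencies describing the same per-macroblock schedule at different resolutions.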
URI: http://hdl.handle.net/11455/8558
Other identifiers: U0005-1808200918464400
Appears in Collections: Department of Electrical Engineering

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.