Cost-Effective Hardware Sharing Architecture Design of Fast 4x4 and 8x8 Integer Transforms for Multi-Standard Video Codecs
|Abstract:||This thesis aims to reduce chip hardware cost by applying hardware sharing to the integer discrete cosine transforms used in multiple video coding standards. The proposed hardware sharing architecture supports the 4x4 and 8x8 transforms of the H.264/AVC, AVS, VC-1, MPEG-1/2/4, HEVC, and VP8 standards, and is implemented in three versions: shared forward transforms, shared inverse transforms, and shared forward and inverse transforms. First, each transform is expressed in matrix form, and each transform matrix is decomposed into sparse matrices, which reduces computational complexity, chip area, and computation time. The sparse matrices of the different standards are then combined and decomposed further with the proposed matrix decomposition algorithm, so that common sub-matrices can be identified and shared, which simplifies the hardware further.
The complete two-dimensional transform consists of two 1-D shared forward and inverse transforms and one transpose memory, and requires 57.3K gates in total. Compared with a direct implementation without any sharing, the proposed 1-D 4x4 and 8x8 shared forward and inverse transform design reduces additions by 83.5% and shift operations by 60.8%. In the VLSI implementation, an operating frequency of 110.8 MHz satisfies the Full HD (1920x1080@60Hz) specification, and the maximum operating frequency reaches 200 MHz. The gate counts of the 1-D forward, 1-D inverse, and 1-D shared forward and inverse transform designs are 16815, 17438, and 22212, respectively.|
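The two ideas the abstract relies on, replacing a dense transform matrix with a sparse add-and-shift factorization and building the 2-D transform from two 1-D passes around a transpose, can be illustrated with a minimal Python sketch. It is not the thesis design: it covers only the widely published H.264/AVC 4x4 forward core transform, it does not model the cross-standard sharing of common sub-matrices, and the names `CF`, `forward_1d`, `forward_2d`, `matmul`, and `transpose` are hypothetical helpers introduced here for illustration.

```python
# Illustrative sketch only; not the architecture proposed in the thesis.
# CF is the well-known H.264/AVC 4x4 forward core transform matrix.
CF = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]


def matmul(a, b):
    """Plain integer matrix product, used as the direct (unshared) reference."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    """Software stand-in for the transpose memory between the two 1-D passes."""
    return [list(col) for col in zip(*m)]


def forward_1d(x):
    """4-point butterfly equal to CF @ x: 8 additions/subtractions, 2 shifts."""
    s0, s1 = x[0] + x[3], x[1] + x[2]   # first sparse stage: sums
    d0, d1 = x[0] - x[3], x[1] - x[2]   # first sparse stage: differences
    return [s0 + s1,            # row 0 of CF
            (d0 << 1) + d1,     # row 1 of CF: 2*d0 + d1
            s0 - s1,            # row 2 of CF
            d0 - (d1 << 1)]     # row 3 of CF: d0 - 2*d1


def forward_2d(block):
    """Row pass, transpose, second row pass, transpose back: CF @ X @ CF^T."""
    stage1 = [forward_1d(row) for row in block]              # X @ CF^T
    stage2 = [forward_1d(row) for row in transpose(stage1)]  # (CF @ X @ CF^T)^T
    return transpose(stage2)


if __name__ == "__main__":
    x = [[5, 11, 8, 10], [9, 8, 4, 12], [1, 10, 11, 4], [19, 6, 15, 7]]
    direct = matmul(matmul(CF, x), transpose(CF))
    print(forward_2d(x) == direct)
```

For this one transform, the direct 4-point matrix product needs 16 multiplications and 12 additions, while the butterfly needs 8 additions and 2 shifts. The reductions quoted in the abstract come from the thesis's own shared 4x4 and 8x8 design across all listed standards, not from this sketch.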
|Appears in Collections:||Department of Electrical Engineering|