Please use this identifier to cite or link to this item: http://hdl.handle.net/11455/6501
Title: Low-Cost Hardware-Sharing Design and Chip Implementation of Fast Multiple Integer Transforms for Multi-Standard Video Codec
Author: Hsu, Shun-Ji (許順吉)
Keywords: Integer Transform; Hardware Sharing Architecture; Fast Algorithm
Publisher: Department of Electrical Engineering
Abstract:
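In this thesis, with the aims of reducing computational complexity and saving hardware cost, fast algorithms are proposed for the 1-D inverse integer transform, the 1-D forward integer transform, and a shared inverse/forward integer transform architecture covering the H.264/AVC, AVS, VC-1, and MPEG-1/2/4 video standards. The proposed fast algorithms are built on row/column permutation and decomposition of the matrices, together with computations that add compensation (offset) matrices, so that each transform matrix is factored into a product of sparse matrices. Besides reducing the computational complexity, this simplifies the data flow and makes the hardware easier to pipeline. The decompositions also extract as many identical matrices as possible, so that hardware circuits can be shared among the different video standards. From a hardware standpoint, multipliers occupy a large area and have long delays, so multiplications by fixed coefficients are implemented with additions and shifts.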
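As an illustration of factoring a transform matrix into sparse stages, the Python sketch below uses the H.264/AVC 4x4 forward core transform, a transform from one of the standards the thesis covers. It is not the thesis's multi-standard factorization, which the abstract does not give; it only shows, for this well-known matrix, how a dense matrix-vector product splits into two sparse add/subtract stages in which the coefficient 2 becomes a left shift. The names `direct_1d` and `butterfly_1d` are hypothetical.

```python
# Sketch: H.264/AVC 4x4 forward core transform computed two ways.
# Direct form: dense 4x4 matrix-vector product (16 multiplications, 12 additions).
# Factored form: two sparse add/subtract stages; the coefficient 2 is a left shift.

C_H264 = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def direct_1d(matrix, x):
    """Dense matrix-vector product, as a multiplier-based design would compute it."""
    return [sum(c * v for c, v in zip(row, x)) for row in matrix]


def butterfly_1d(x):
    """Same transform factored into sparse stages: adds, subtracts, and shifts only."""
    x0, x1, x2, x3 = x
    # Stage 1: sparse matrix with entries in {+1, -1} (4 add/sub).
    a = x0 + x3
    b = x1 + x2
    c = x1 - x2
    d = x0 - x3
    # Stage 2: second sparse matrix; multiplying by 2 becomes "<< 1" (4 add/sub).
    return [a + b, (d << 1) + c, a - b, d - (c << 1)]


if __name__ == "__main__":
    x = [5, -3, 7, 2]
    print(direct_1d(C_H264, x), butterfly_1d(x))  # [11, -4, 3, 23] [11, -4, 3, 23]
```

Per 4-point vector, the factored form needs 8 additions or subtractions and 2 shifts instead of 16 multiplications and 12 additions, and its two stages are the kind of simplified data flow the abstract says eases pipelining.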
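For the 2-D designs, a two-stage row-column scheme is used, building the 2-D multi-mode integer transform hardware from the 1-D shared architecture. This requires a register array that implements the transpose memory. In the first stage, the 1-D integer transform is applied to each column of the input matrix, and the transformed data are stored in the transpose memory so that columns become rows. In the second stage, the 1-D integer transform is applied to each row of the matrix read out of the transpose memory, which completes the 2-D multi-mode integer transform.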
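The two-stage row-column procedure can be sketched as a behavioral Python model (not RTL). The transpose memory is modeled as a 4x4 list of lists, and the H.264/AVC 4x4 forward core transform stands in for the shared 1-D datapath, whose actual structure the abstract does not give. `transform_1d`, `transform_2d`, and `reference_2d` are hypothetical names; the reference computes Y = C X C^T directly so the two-stage result can be checked against it.

```python
# Behavioral sketch of a two-stage row-column 2-D transform with a transpose buffer.
# Assumption: the H.264/AVC 4x4 forward core transform stands in for the
# thesis's shared 1-D multi-standard datapath.

N = 4

C_H264 = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def transform_1d(x):
    """1-D H.264 forward core transform using only adds, subtracts, and shifts."""
    x0, x1, x2, x3 = x
    a, b, c, d = x0 + x3, x1 + x2, x1 - x2, x0 - x3
    return [a + b, (d << 1) + c, a - b, d - (c << 1)]


def transform_2d(block):
    """Stage 1 on columns, transpose buffer, stage 2 on rows."""
    # Hypothetical transpose memory: an N x N register array.
    transpose_mem = [[0] * N for _ in range(N)]

    # Stage 1: transform each input column and write the result column-wise.
    for j in range(N):
        column = [block[i][j] for i in range(N)]
        for i, value in enumerate(transform_1d(column)):
            transpose_mem[i][j] = value

    # Stage 2: read the buffer row-wise (the transposed order) and transform each row.
    return [transform_1d(transpose_mem[i]) for i in range(N)]


def reference_2d(block):
    """Direct Y = C X C^T, used only to check the two-stage result."""
    cx = [[sum(C_H264[i][k] * block[k][j] for k in range(N)) for j in range(N)]
          for i in range(N)]
    return [[sum(cx[i][k] * C_H264[j][k] for k in range(N)) for j in range(N)]
            for i in range(N)]


if __name__ == "__main__":
    flat = [[1] * N for _ in range(N)]
    print(transform_2d(flat))  # [[16, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
```

The buffer is written column by column and read row by row, so the change of data order between the two passes needs no arithmetic at all.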
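For the VLSI implementation, the synthesis results show that the proposed 1-D inverse, forward, and inverse/forward hardware-sharing architectures require 35427, 18672, and 51696 gates, respectively. Compared with individual transform designs that do not consider sharing, this saves about 45%, 51%, and 49% of the gate count, respectively, which confirms that the proposed sharing architectures reduce hardware cost. For the 2-D designs, the inverse, forward, and inverse/forward hardware-sharing architectures require 95188, 62591, and 132655 gates, respectively. The maximum operating frequency reaches 125 MHz, giving a throughput of 1000 M pixels per second.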
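Two quantities follow from the figures above by simple arithmetic; neither is stated in the abstract. The throughput figure implies an average of eight pixels per clock cycle, and the rounded saving percentage gives only a rough size for the non-shared 1-D inverse design (the symbols G and s are introduced here for illustration):

```latex
\frac{1000 \times 10^{6}\ \text{pixels/s}}{125 \times 10^{6}\ \text{cycles/s}} = 8\ \text{pixels/cycle}
\qquad
G_{\text{separate}} \approx \frac{G_{\text{shared}}}{1 - s} = \frac{35427}{1 - 0.45} \approx 6.4 \times 10^{4}\ \text{gates}
```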

In this thesis, fast algorithms for multiple integer transforms, together with their hardware-sharing designs, are proposed for H.264/AVC, AVS, VC-1, and MPEG-1/2/4 using three matrix operations: row/column permutations, decompositions into sparse matrices, and matrix offset computations. Through these factorizations and shift-and-add computations, the proposed 1-D hardware-sharing transform scheme requires no multiplications, which saves hardware area. In addition, decomposing the original transform matrices into products of sparse matrices reduces the computational complexity.
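To make the matrix-offset idea concrete, the Python sketch below uses a decomposition constructed for this illustration, not one taken from the thesis. The VC-1 4-point integer transform matrix can be written as T_VC1 = D·C_H264 + O, with D = diag(17, 10, 17, 10) and a sparse offset matrix O whose only nonzero entries are ±2. A VC-1 path can then reuse the H.264/AVC 4x4 butterfly, scale its outputs by constants built from shifts and adds (17v = 16v + v, 10v = 8v + 2v), and add the small offset terms, with no multipliers. All function names are hypothetical.

```python
# Sketch of offset-matrix sharing between two 4-point integer transforms.
# Decomposition constructed for illustration (an assumption, not from the thesis):
#   T_VC1 = D * C_H264 + O
#   D = diag(17, 10, 17, 10)
#   O = [[0, 0, 0, 0], [2, 0, 0, -2], [0, 0, 0, 0], [0, -2, 2, 0]]

T_VC1 = [
    [17, 17, 17, 17],
    [22, 10, -10, -22],
    [17, -17, -17, 17],
    [10, -22, 22, -10],
]


def stage1(x):
    """Shared first sparse stage: add/subtract only."""
    x0, x1, x2, x3 = x
    return x0 + x3, x1 + x2, x1 - x2, x0 - x3


def h264_forward_1d(x):
    """H.264/AVC 4x4 forward core transform; the factor 2 is a left shift."""
    a, b, c, d = stage1(x)
    return [a + b, (d << 1) + c, a - b, d - (c << 1)]


def times17(v):
    return (v << 4) + v  # 17v = 16v + v


def times10(v):
    return (v << 3) + (v << 1)  # 10v = 8v + 2v


def vc1_forward_1d(x):
    """Applies T_VC1 to x via the H.264 path plus a sparse offset."""
    # In hardware, c and d would be the same stage-1 signals the H.264 path uses;
    # they are recomputed here only to keep the functions independent.
    _, _, c, d = stage1(x)
    y = h264_forward_1d(x)
    return [
        times17(y[0]),
        times10(y[1]) + (d << 1),  # offset row 1: 2*x0 - 2*x3 = 2d
        times17(y[2]),
        times10(y[3]) - (c << 1),  # offset row 3: -2*x1 + 2*x2 = -2c
    ]


def direct_1d(matrix, x):
    """Dense reference product with multiplications, for checking only."""
    return [sum(k * v for k, v in zip(row, x)) for row in matrix]


if __name__ == "__main__":
    x = [5, -3, 7, 2]
    print(vc1_forward_1d(x), direct_1d(T_VC1, x))  # [187, -34, 51, 250] twice
```

The design choice this shows is that the standard-specific part shrinks to constant scaling and a few extra additions, while the butterfly itself is shared.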
To implement the 2-D transforms in hardware, a two-stage row-column scheme is applied, using the proposed 1-D hardware-sharing architecture and a transpose memory built from a single register array. In the first stage, the columns of the input data are 1-D transformed and the results are written into the transpose memory. In the second stage, the rows of the first-stage output, read from the transpose memory, are 1-D transformed consecutively.
For multi-standard video codecs, the proposed 1-D hardware-sharing inverse, forward, and inverse/forward transform designs reduce the gate count by 45%, 51%, and 49%, respectively, compared with individual, separate realizations. The proposed 2-D hardware-sharing inverse, forward, and inverse/forward transform designs require 95188, 62591, and 132655 gates, respectively, and operate at up to 125 MHz. According to the synthesis results, the throughput reaches 1000 M pixels/s at 125 MHz.
URI: http://hdl.handle.net/11455/6501
Other identifiers: U0005-1508201119151400
Appears in Collections: Department of Electrical Engineering

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.