Please use this identifier to cite or link to this item: http://hdl.handle.net/11455/7414
Title: Design and Implementation of Low Power 2-D Transform Architecture with Unique Kernel for Multi-Standard Video Coding Applications (適用於多視訊編碼標準之低功率具單一核心的二維轉換架構設計與實現)
Author: Huang, Chong-Yu (黃重裕)
Keywords: discrete cosine transform (DCT); integer transform; VLSI; multi-standard; MPEG-1/2/4; H.264/AVC
Publisher: Department of Electrical Engineering (電機工程學系所)
Abstract:
In recent years, digital signal processing has had a significant impact on portable electronic devices, and low power consumption has become the primary concern in circuit design for mobile devices. The discrete cosine transform (DCT) is widely used in image and video coding standards. A transform circuit must therefore not only consume little power but also support multiple video coding standards. However, no existing circuit design meets both requirements across the standards in wide use today. A low-power DCT architecture for multi-standard video coding is thus a topic worth studying.
In this thesis, we adopt the new distributed arithmetic algorithm (NEDA) to implement our architecture. NEDA requires neither multipliers nor ROM, so the circuit can be built from simple shifters and adders. Based on NEDA, we propose an efficient 2-D transform architecture in which a single kernel computes the traditional 8x8 DCT as well as the 8x8 and 4x4 integer transforms of H.264/AVC, supporting multi-standard video coding applications. Furthermore, we use an adder tree to overcome the low throughput normally caused by the distributed arithmetic (DA) algorithm. The resulting throughput is 400M pixels/s, and the architecture can process HD 720p, 1080p, and digital cinema video in real time at clock frequencies of 6 MHz, 12 MHz, and 48 MHz, respectively. To reduce power consumption, we find an effective way to reduce the number of additions; for the traditional 8x8 DCT alone, the number of additions drops by 95.8%. According to the implementation results, the proposed architecture consumes 38.7 mW at a 50 MHz clock. The architecture thus combines high throughput with low cost to achieve low power. Using the same approach, we also propose an IDCT architecture for multi-standard video coding applications. From the viewpoint of VLSI realization, the proposed architectures are simple, modular, and regular.
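The multiplier-free idea can be illustrated with a short Python sketch. It is not the thesis's circuit: it only shows how an inner product with fixed coefficients reduces to additions and shifts when the coefficients are split into bit planes. The names `quantize` and `shift_add_dot`, the 12-bit precision, the sign-magnitude handling, and the sample inputs are all assumptions made for illustration.

```python
import math

FRAC_BITS = 12  # assumed fixed-point precision, not taken from the thesis


def quantize(c, frac_bits=FRAC_BITS):
    """Round a real coefficient to a fixed-point integer."""
    return round(c * (1 << frac_bits))


def shift_add_dot(coeffs_q, xs):
    """Inner product of fixed integer coefficients with integer inputs.

    Only additions, subtractions, and shifts are used: for each bit
    position j, the inputs whose coefficient has bit j set are summed
    (adders only), and the partial sum is then shifted left by j.
    """
    width = max(abs(c) for c in coeffs_q).bit_length()
    acc = 0
    for j in range(width):
        partial = 0
        for c, x in zip(coeffs_q, xs):
            if (abs(c) >> j) & 1:
                partial += x if c > 0 else -x  # sign-magnitude, for simplicity
        acc += partial << j
    return acc


# One basis row of an 8-point DCT (frequency index k = 1), quantized.
coeffs = [quantize(0.5 * math.cos((2 * n + 1) * math.pi / 16)) for n in range(8)]
samples = [52, 55, 61, 66, 70, 61, 64, 73]  # hypothetical pixel values
result = shift_add_dot(coeffs, samples)
```

The result matches an ordinary multiply-accumulate over the same quantized coefficients, which is the property a shift-and-add datapath relies on.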
For the H.264/AVC standard, we also propose a high-throughput direct 2-D multiple-transform architecture based on the same algorithm. It supports four transforms: the 4x4 forward integer transform, the Hadamard transform, the inverse integer transform, and the inverse Hadamard transform. According to the synthesis results, the architecture runs at a 100 MHz clock and reaches a throughput of 800M pixels/s.
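As an illustration of the first of these four transforms, the sketch below computes the H.264/AVC 4x4 forward core transform Y = Cf · X · Cf^T with additions and one-bit shifts only. It uses a row-column decomposition for readability, which is an assumption of this example and not the direct 2-D method of the thesis. The function names and the sample block are hypothetical.

```python
# H.264/AVC 4x4 forward core transform matrix (scaling is folded into
# quantization by the standard and is not shown here).
CF = [[1, 1, 1, 1],
      [2, 1, -1, -2],
      [1, -1, -1, 1],
      [1, -2, 2, -1]]


def fwd4(x0, x1, x2, x3):
    """1-D 4-point forward integer transform as a butterfly (adds and shifts)."""
    a0, a1 = x0 + x3, x1 + x2
    a2, a3 = x1 - x2, x0 - x3
    return [a0 + a1, (a3 << 1) + a2, a0 - a1, a3 - (a2 << 1)]


def fwd4x4(block):
    """2-D transform: transform every row, then every column."""
    rows = [fwd4(*r) for r in block]        # X * Cf^T
    cols = [fwd4(*c) for c in zip(*rows)]   # Cf * (X * Cf^T), stored transposed
    return [list(r) for r in zip(*cols)]    # transpose back


def matmul(a, b):
    """Plain 4x4 integer matrix product, used only as a reference."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


block = [[5, 11, 8, 10],
         [9, 8, 4, 12],
         [1, 10, 11, 4],
         [19, 6, 15, 7]]  # hypothetical residual block
coeffs = fwd4x4(block)
```

The butterfly needs eight additions and two shifts per 4-point transform, which is why this transform maps naturally onto an adder-only datapath.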
URI: http://hdl.handle.net/11455/7414
Other identifiers: U0005-0708200710400600
Appears in Collections: Department of Electrical Engineering (電機工程學系所)

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.