Please use this identifier to cite or link to this item: http://hdl.handle.net/11455/8451
標題: 應用於視訊與通訊信號處理的低複雜度離散轉換及其架構設計
Low-Complexity Discrete Transforms and Their Architecture Designs for Video and Communication Signal Processings
作者: 蘇國安
Su, Guo-An
關鍵字: integer transform;整數轉換;algorithm;hardware sharing architecture;演算法;硬體共享架構
出版社: 電機工程學系所
引用: [1]T. Wiegand and G. Sullivan, “Draft ITU-T Recommendation and Final Draft International Standard of Joint Video Specification,” (ITU-T rec. H.264/ISO/IEC 14496-10 AVC, presented at Joint Video Team (JVC) of ISO/IEC MPEG and ITU-T VCEG), 2003. [2]W. Gao, C. Reader, F. Wu, Y. He, and L. Yu, Hanqing Lu, Shiqiang Yang, Tiejun Huang, and Xingde Pan, “AVS – The Chinese Next-Generation Video Coding Standard,” National Association of Broadcasters (NAB) Conference 2004. [3]Iain E. G. Richardson, H.264 and MPEG-4 Video Compression-Video Coding for Next-generation Multimedia, John Wiley & Sons, 2003. [4]H. S. Malvar, A. Hallapuro, M. Karczewicz, and L. Kerofsky, “Low Complexity Transform and Quantization in H.264/AVC,” IEEE Transactions on Circuits Systems for Video Technology, vol. 13, no. 7, pp. 598-603, July 2003. [5]M. Wien, “Variable Block-size Transforms for H.264/AVC,“ IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 604-613, July 2003. [6]D. Marpe, T. Wiegand, and S. Gordon, “H.264/MPEG4-AVC Fidelity Range Extensions : Tools, Profiles, Performance, and Application Areas,” IEEE International Conference on Image Processing, pp. 593-596, Sept. 2005. [7]S. Gordon, D. Marple, and T. Wiegand, “Simplified Use of 8×8 Transforms-updated Proposal and Results,” JVT-K028, 11th Meeting, Munich, Germany, March 2004. [8]C. P. Fan and Y. L. Lin, “Implementations of Low-Cost Hardware Sharing Architectures for Fast 8×8 and 4×4 Integer Transforms in H.264/AVC,” IEICE Transactions on Fundamentals, vol. E90-A, no. 2, Feb. 2007. [9]A. Madisetti and Willson, A. N. , Jr. , ”A 100 MHz 2-D 8×8 DCT/IDCT processor for HDTV applications,” IEEE Transactions on Circuits and Systems for Video Technology, pp. 158-165, vol. 5, issue 2, April 1995. [10]L. Z. Liu, Q. Lin, M. T. Rong, and J. Li, “A 2-D Forward/Inverse Integer Transform Processor of H.264 Based on Highly-parallel Architecture,” 4th IEEE International Workshop on System-on-Chip for Real-Time Applications, pp. 158-161, July 2004. [11]W. Hwangbo, J. Kim, and C. M. Kyung, “A High-Performance 2-D Inverse Transform Architecture for the H.264/AVC Decoder,” IEEE International Symposium on Circuits and Systems, pp. 1613-1616, May 2007. [12]B. Shi, W. Zheng, D. Li, and M. Zhang, “Fast Algorithm and Architecture Design for H.264/AVC Multiple Transforms,” IEEE International Conference on Multimedia and Expo, pp. 2086-2089, July 2007. [13]Z. Y. Cheng, C. H. Chen, B. D. Liu, and J. F. Yang, “High Throughput 2-D Transform Architectures for H.264 Advanced Video Coders,” IEEE Asia-Pacific Conference on Circuits and Systems, vol. 2, pp. 1141-1144, Dec. 2004. [14] K. H. Chen, J. I. Guo, and J. S. Wang, “A High-Performance Direct 2-D Transform Coding IP Design for MPEG-4 AVC/H.264,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 16, no. 4, pp. 472-483, April 2006. [15] M . D. Michael, Advanced Digital Design with the Verilog HDL, Prentice Hall, 2003. [16] R. A. Horn and C. R. Johnson, Topics in Matrix Analysis, Cambridge Univ. Press, New York, pp. 239-267, 1991. [17]S. Srinivasan, et al., “Windows Media Video 9: Overview and Applications,” Signal Processing: Image Communication, vol. 19, issue 9, pp. 851-875, October 2004. [18]S. Srinivasan and S. Regunathan, “Computationally Efficient Transforms for Video Coding,” IEEE International Conference on Image Processing, vol. 2, pp. 11-14, September 2005. [19]S. Srinivasan, et al., “Fast Video Codec Transform Implementations,” Patent Application Publication (US 20050256916), United States, Nov. 17, 2005. [20]S. Srinivasan and S. L. Regunathan, ”An overview of VC-1,” Visual Communications and Image Processing, Proc. of SPIE, vol. 5960, pp. 720-728, Jul. 2005. [21]“Standard for Television: VC-1 Compressed Video Bit-stream Format and Decoding Process,” SMPTE 421M-2006. [22]Trac D. Tran, “The BinDCT: Fast Multiplierless Approximation of the DCT,” IEEE Signal Processing Letter, vol. 7, issue 6, pp. 141-144, June 2000. [23]Philip P. Dang, Paul M. Chau, Truong Q. Nguyen, and Trac D. Tran, “BinDCT and Its Efficient VLSI Architectures for Real-Time Embedded Applications,” Journal of Imaging Science and Technology, vol. 49, no. 2, pp. 124-137, March/April 2005. [24] David A. Patterson and John L. Hennessy, Computer Organization and Design : The Hardware and Software Interface, Third Edition, Morgan Kaufmann Publishers, 2005. [25] S. Lee and K. Cho, “Design of Transform and Quantization Circuit for Multi-Standard Integrated Video Decoder,” IEEE Workshop on Signal Processing Systems, pp. 181-186, Oct. 2007. [26] S. Lee and K. Cho, “Circuit Implementation for Transform and Quantization Operations of H.264/MPEG-4/VC-1 Video Decoder,” International Conference on Design & Technology of Integrated Systems in Nanoscale Era, pp. 102-107, Sep. 2007. [27] G. A. Su and C. P. Fan, “Cost Effective Hardware Sharing Architecture for Fast 1-D 8×8 Forward and Inverse Integer Transforms of H.264/AVC High Profile,” IEEE Asia-Pacific Conference on Circuits and Systems, Dec. 2008. [28] Alan V. Oppenheim and Ronald W. Schafer, Discrete Time Signal Processing, Prentice Hall, Englewood Cliffs, NJ, 1989. [29] Jar-Ferr Yang and Chih-Peng Fan, “Compact Recursive Structures for Discrete Cosine Transform,” IEEE Transactions on Circuits and Systems-II: Analog and Digital Signal Processing, vol. 47, no. 4, April 2000. [30] J. D. Markel, “FFT Pruning,” IEEE Transactions on Audio Electro-acoust., vol. AU-19, no. 4, pp. 305-311, Dec. 1971. [31] K. Nagai, “Pruning the Decimation-in-Time FFT Algorithm with Frequency Shift,” IEEE Transactions on Acoust. Speech Signal Processing, vol. ASSP-34, no. 4, pp. 1008-1100, Aug. 1986. [32] D. P. Skinner, “Pruning the Decimation-in-Time FFT Algorithm,” IEEE Transactions on Acoust. Speech Signal Processing, vol. ASSP-24, no. 2, pp. 193-194, Apr. 1976. [33] S. Bouguezel, M. Omair Ahmad, and M. N.S. Swamy, “Efficient Pruning Algorithms for the DFT Computation for a Subset of Output Samples,” International Symposium on Circuits and Systems, vol. 4, pp. IV-97- IV-100, May 2003. [34] H. V. Sorensen and C. S. Burms, “Efficient Computation of the DFT with Only a Subset of Input or Output Points,” IEEE Transactions on Signal Processing, vol. 41, no. 3, pp. 1184-1200, March 1993. [35] Daisuke Takahashi, “An Extended Split-Radix FFT Algorithm,” IEEE Signal Processing Letters, vol. 8, no. 5, pp. 145-147, May 2001. [36] Saad Bouguezel, M. Omair Ahmad, and M. N. Swamy, “A New Radix-2/8 FFT Algorithm for Length- DFTs,” IEEE Transactions on Circuits and Systems-I: Regular Papers, vol. 51, no. 9, pp. 1723-1732, Sept. 2004. [37] S. C. Chen and K. L. Ho, “Split Vector-Radix Fast Fourier Transform,” IEEE Transactions on Signal Processing, vol. 40, no. 8, pp. 2029- 2039, August 1992. [38] Wen-Chang Yeh and Chein-Wei Jen, “High-Speed and Low-Power Split-Radix FFT,” IEEE Transactions on Signal Processing, vol. 51, no. 3, pp. 864-874, March 2003. [39] Jung-Yeol Oh and Myoung-Seob Lim, “A Radix-24 SDF Pipeline FFT Processor for OFDM Modulation,” The First IEEE VTS APWCS (Asia Pacific Wireless Communications Symposium), January 2004. [40] Lihong Jia, Yonghong Gao, Jouni Isoaho, and Hannu Tenhunen, “A New VLSI-Oriented FFT Algorithm and Implementation,” IEEE ASIC Conference, pp. 337-341, 1998. [41] Saad Bouguezel, M. Omair Ahmad, and M.N.S. Swamy, “An Efficient Split-Radix FFT Algorithm,” International Symposium on Circuits and Systems, vol. 4, pp. 65-68, May 2003. [42] E. H. Wold and A. M. Despain, “Pipeline and Parallel-Pipeline FFT Processors for VLSI Implementation”, IEEE Transactions on Comput., vol. 33, No. 5, pp. 414-426, May 1984. [43] Alvin M. Despain, “Fourier Transform Computer Using CORDIC Iterations,” IEEE Transactions on Comput., vol. 23, no. 10, pp. 993-1001, Oct. 1974. [44] IEEE Std. 802.16-2004, Part 16: Air Interface for Fixed Broadband Wireless Access Systems, 2004. [45] ETSI EN 300 744 (v1.2.1):Digital Video Broadcasting (DVB); Framing Structure, Channel Coding and Modulation for Digital Terrestrial Television, 1999. [46] IEEE 802.11, IEEE Standard for Wireless LAN Medium Access Control and Physical Layer Specifications, 1999. [47] T1E1.4/98-007R4: Standards Project for Interfaces Relating to Carrier to Customer Connection of Asymmetrical Digital Subscriber Line (ADSL) Equipment. 1998. [48] ETSI TS 101 270-2 (v1.1.1): Transmission and Multiplexing(TM); Access Transmission Systems on Metallic Access Cables; Very High Speed Digital Subscriber Line (VDSL); Part 2: Transceiver specification, 2001. [49] Jia, L., Gao, Y., and Tenhunen, H., “Efficient VLSI Implementation of Radix-8 FFT Algorithm,” IEEE Pacific Rim Conference on Communications, Computers and Signal Processing, pp. 468-471, 1999. [50] He, S. and Torkelson, M., “A New Approach to Pipeline FFT Processor,” The 10th International Parallel Processing Symposium, pp. 766-770, 1996. [51] He, S. and Torkelson, M., “Designing Pipeline FFT Processor for OFDM (de)Modulation,” International Symposium on Signals, Systems, and Electronics, pp. 257-262, 1998. [52] Son, B. S., Jo, B. G., Sunwoo, M. H. and Kim, Y. S., “A High-Speed FFT Processor for OFDM Systems,” IEEE International Symposium on Circuits and Systems, vol. 3, pp. 281-284, 2002. [53] Fan, C. P., Lee M. S. and Su G. A., “A Low Multiplier and Multiplication Costs 256-Point FFT Implementation with Simplified Radix-24 SDF Architecture,” IEEE Asia-Pacific Conference on Circuits and Systems, 2006. [54] Lee, H. and Shin, M., “A high-speed low-complexity two-parallel radix-24 FFT/IFFT processor for UWB applications,” IEEE Asia Solid-State Circuits Conference, pp. 284-287, 2007. [55] Lin, Y. W., Liu, H. Y., and Lee, C. Y., “A 1-GS/s FFT/IFFT Processor for UWB Applications,” IEEE Journal of Solid-Stale Circuit, vol. 40, no. 8, pp. 1726-1735, 2005. [56] Oh, J. Y. and Lim, M. S., “Fast Fourier Transform Processor Based on Low-Power and Area-Efficient Algorithm,” IEEE Asia-Pacific Conference on Advanced System Integrated Circuits, pp. 198-201, 2004. [57] Y. Z. Zhang and Y. F. Yao, “Fast Implement of Recursive DFTs,” IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 2, pp. 1071-1074, May 1989. [58] Hun-Chen Chen, Jiun-In Guo, and Chien-Wei Jen, “A New Group Distributed Arithmetic Design for the One Dimensional Discrete Fourier Transform,” IEEE International Symposium on Circuits and Systems, vol. 1, pp. 26-29, May 2002. [59] H. Guo, G. A. Sitton, and C. S. Burrus, ”The Quick Discrete Fourier Transform,” IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 3, pp. 19-22, April 1994. [60] Richard Hartley and Kenneth Welles II, “Recursive Computation of the Fourier Transform,” IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 3, pp. 1792-1795, May 1990. [61] Daniel Pak-Kong Lun and Wan-Chi Siu, “An Enhanced Model for Software Realization of Recursive Prime Radix Discrete Fourier Transform,” IEEE International Symposium on Circuits and Systems, vol. 3, pp. 2369-2372, May 1990. [62] K. J. R. Liu, C. T. Chiu, K. Kolagotla, and J. F. JaJa, “Optimal Unified Architectures for the Real-Time Computation of Time-Recursive Discrete Sinusoidal Transforms,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 4, no. 2, pp. 168-180, April 1994. [63] Jar-Ferr Yang and Fu-Kun Chen, “Recursive Discrete Fourier Transform with Unified IIR Filter Structures,” Elsevier Science B.V., Signal Processing, vol. 82, issue 1, pp. 31-41, January 2002. [64] I. Niven, H. S. Zuckerman, and H. L. Montgomery, An Introduction to the Theory of Number, New York, Wiley, 1991. [65] L. P. Chau and W. C. Siu, “Recursive Algorithm for the Discrete Cosine Transform with General Lengths,” IEE Electronics Letters, vol. 30, issue 3, pp. 197-198, Feb. 1994. [66] Zhongde Wang and Jullien G.A, ”Recursive Algorithm for the Discrete Cosine Transform with Regular Structure,” IEEE Transactions on Circuits and Systems, vol. 2, pp. 1097-1100, Aug. 1993. [67] Aburdene, M. F. and Jianqing Zheng, “Computation of Discrete Cosine Transform Using Clenshaw''s Recurrence Formula,” IEEE Transactions on Signal Processing Letters, vol. 2, issue 8, pp. 155-156, Aug. 1995. [68] Lap-Pui Chau and Wan-Chi Siu, “Efficient Formulation for the Realization of Discrete Cosine Transform Using Recursive Structure,” IEEE International Conference on Speech, and Signal Processing, vol. 3, pp. 19-22, April 1994. [69] Bitmead R., “On Recursive Discrete Fourier Transformation,” IEEE Transactions on Speech, and Signal Processing, vol. 30, issue 2, pp. 319-322, April 1982. [70] Macias J. A. R. and Exposito A. G.., “Recursive Formulation of Short-Time Discrete Trigonometric Transforms,” IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, vol. 45, issue 4, pp. 525-527, April 1998. [71] Lap-Pui Chau and Wan-Chi Siu, “Direct Formulation for the Realization of Discrete Cosine Transform Using Recursive Structure,” IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, vol. 42, issue 1, pp. 50-52, Jan. 1995. [72] Simon Haykin and Barry Van Veen, Signals and Systems, WILEY. [73] Chih-Peng Fan and Guo-An Su, “Novel Recursive Discrete Fourier Transform with Compact Architecture,” IEEE APCCAS 2004, Taiwan, pp. 1081-1084, 2004. [74] Che-Hong Chen, Bin-Da Liu, ”Recursive Architectures for Realizing Modified Discrete Cosine Transform and Its Inverse,” IEEE Transactions on Circuits and Systems-II, vol. 50, no. 1, pp. 38-45, January 2003. [75] Vladimir Nikolajevic, “Computation of Forward and Inverse MDCT Using Clenshaw’s Recurrence Formula,” IEEE Transactions on Signal Processing, vol. 51, no. 5, pp. 1439-1444, May 2003. [76] Vladimir Nikolajevic, “New Recursive Algorithms for the Forward and Inverse MDCT,” IEEE Workshop on Signal Processing Systems, pp. 51-57, Sept. 2001. [77] Zhan-Yuan Cheng, Che-Hong Chen, Bin-Da Liu and Jar-Ferr Yang, ”Unified Selectable Fixed-Coefficient Recursive Structures for Computing DCT, IMDCT, and Sub-band Synthesis Filtering,” International Symposium on Circuits and Systems, vol. 3, pp. 557-560, May 2004. [78] 陳建志, “正交分頻多工系統中使用最大可能估測法之頻率與時序同步問題之研究,” 國立臺灣大學電機工程學研究所碩士論文, 中華民國九十二年七月。
摘要: 
In this thesis, we present two topics, where one is the algorithms and the implementations of the integer transforms for the H.264/AVC, VC-1, and AVS applications, and the other is the algorithms and the implementations of the pruning FFT, the radix-24 FFT, and the recursive DFT for the communication applications.
In the algorithms and the implementations of the integer transforms, the fast algorithms and their hardware sharing architecture between the integer transforms in the different video standards are proposed to reduce the hardware costs. For the development of the integer transforms in the H.264/AVC video standard, the fast algorithms and the hardware sharing architectures of the 1-D forward/inverse integer transforms are proposed to achieve the low-cost hardware circuits when the hardware circuits of the 1-D forward and inverses of H.264/AVC need to be implemented in the same chip simultaneously. By the cell based design flow, the gate counts of the hardware sharing design for the 1-D 4×4 and 8×8 forward/inverse integer transforms are smaller than those of the individual 1-D 4×4 and 8×8 forward/inverse integer transforms without the hardware share.
For the development of the integer transforms in the VC-1 video standard, the proposed fast algorithms for the 1-D 8×8 inverse integer transform of VC-1 are developed based on the matrix symmetric property and matrix decompositions. The numbers of the additions and the shift operations of the proposed fast 1-D inverse integer transforms are smaller than those of the previous fast method. For the hardware sharing design of the fast 1-D 4×4 and 8×8 forward/inverse integer transforms, the common hardware modules are shared to reduce the total hardware costs. Thus, the hardware costs of the proposed 1-D and 2-D hardware sharing design are smaller than those of the individual and separate designs without shares.
For the hardware sharing design of H.264/AVC and AVS, the hardware sharing design for the 1-D 2×2, 4×4, and 8×8 inverse transforms of H.264/AVC and the 1-D 8×8 inverse transform of AVS is proposed with the low hardware cost, especially for the multiple decoding applications in China. By sharing the hardware, the proposed 1-D hardware sharing architecture is realized by adding the offset computations, and it is implemented with the pipelined architecture. Thus, the hardware costs of the proposed sharing architecture are smaller than those of the individual and separate designs.
In the algorithms and implementations of Discrete Fourier Transform (DFT), the algorithms and architectures of the pruning FFT, the radix-24 FFT, and the recursive DFT are proposed. The proposed pruning FFT algorithm, which is developed by the grouped scheme, is applied to compute DFT with the power-of-two partial transform length. The group-based pruning FFT algorithm applies the scheme of the grouped frequency indices to accelerate DFT computations. The proposed pruning FFT algorithm has fewer complex multiplications than the other pruning FFT algorithms when the number of the partial transformed outputs is equal to or larger than the 1/16 total FFT transform length.
Next, the efficient and low-cost 256-point FFT architecture and implementation are proposed. Based on the radix-16 FFT algorithm, the proposed 256-point FFT processor utilizes the simplified cascaded radix-24 single-path delay feedback (SDF) structure. Thus, the control circuit of the proposed simplified radix-24 FFT SDF architecture is simpler than that of the direct radix-16 FFT SDF structure. In the hardware verification, the throughput of our FFT design processes up to 35.5M samples/sec with a Xilinx Virtex2 1500 FPGA chip, and it processes up to 51.5M samples/sec with the UMC 0.18μm standard cell library.
Finally, the efficient recursive algorithm for the DFT computations with the power-of-two transform length is proposed. The benefit of proposed recursive structure is the reduction of the loop numbers, and the signal to quantization noise ratio (SQNR) of the proposed recursive DFT is greater than that of the conventional Goertzel's algorithm. The proposed recursive DFT is also modified by the selected coefficients to reduce the round-off error and increase SQNR.
URI: http://hdl.handle.net/11455/8451
其他識別: U0005-0907200901294800
Appears in Collections:電機工程學系所

Show full item record
 
TAIR Related Article

Google ScholarTM

Check


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.