Please use this identifier to cite or link to this item:
標題: 應用於多輸入多輸出正交分頻多工系統預編碼與訊號偵測之快速矩陣分解法設計與晶片實現
Designs and Chip Implementations of Fast Matrix Decomposition Schemes for Precoding and Signal Detection in MIMO OFDM Systems
作者: 陳韋達
Chen, Wei-Da
關鍵字: 複數QR分解
Complex-valued QR Decompostion
Signal Detection
Complex-valued Singular Values Decomposition
Geometric Mean Decomposition
出版社: 電機工程學系所
引用: [1] 802.15.3c-2009, IEEE Standard for Information Technology—Telecommunications and Information Exchange Between Systems—Local and Metropolitan Area Networks—Specific Requirements. Part 15.3: Wireless Medium Access Control (MAC) and Physical Layer (PHY) Specifications for High Rate Wireless Personal Area Networks (WPANs) Amendment 2: Millimeter-Wave-Based Alternative Physical Layer Extension. [2] E. Perahia, “IEEE 802.11n development: History, process, and technology,” IEEE Comm., vol. 46, no. 7, Jul. 2008. [3] V. Tarokh, H. Jafarkhani, and A. R. Calderbank, “Space-time block codes from orthogonal designs, “ IEEE Trans. Inf. Theory, vol. 45, pp.1456 -1467, 1999 [4] S. Alamouti, “A simple transmit diversity technique for wireless communications,” IEEE J. Sel. Areas Commun., vol. 16, no. 8, pp. 1451–1458, Oct. 1998. [5] A. van Zelst, R. van Nee, and G. A. Awater, “Space division multiplexing (SDM) for OFDM systems,” in Proc. IEEE Veh. Technol. Conf., May 2000, pp. 1070–1074. [6] R. van Nee, A. van Zelst and G. Awater, ” Maximum likelihood decoding in a space division multiplexing system,” in Proc. Veh. Technol. Conf. , May 2000,pp.15-18. [7] A. Burg, et al., ” VLSI implementation of MIMO detection using the sphere decoding algorithm,” IEEE Journal of Solid State Circuits, Vol40, no.7, July 2005 [8] B. M. Hochwald and S. Ten Brink, “Achieving near-capacity on a multiple-antenna channel,” IEEE Trans. Commun., vol. 51, no. 3, pp. 389–399, 2003. [9] Z. Guo and P. Nilsson, ”Algorithm and implementation of the K-best sphere decoding for MIMO detection,” IEEE J. Sel. Areas Commun., vol. 24, no. 3, pp. 491-503, Mar. 2006. [10] M. Shabany and P. Glenn Gulak, “A 675 Mbps, 4x4 64-QAM K-Best MIMO Detector in 0.13 [11] G.D. Golden, C.J. Foschini, R.A.Valenzula and P.W. Wolniansky, “Detection algorithm and initial laboratory result using V-BLAST space-time communication architecture,” Electronics letters, vol.35, no.1, pp.14-16, Jan1999. [12] X. Li and X. Cao, ”Low complexity signal detection algorithm for MIMO-OFDM systems,” IEE Electronics Letters, vol. 41, no. 2, pp. 83-85, Jan. 2005. [13] R. Bohnke, D. Wubbenm, V. Kuhn and K.-D. Kammeyer, “Reduced complexity MMSE detection for BLAST architectures,” in Proc. IEEE GLOBECOM ''03, Dec. 2003, vol. 4, pp. 2258-2262. [14] A. Burg, et al., ”Algorithm and VLSI architecture for linear MMSE detection in MIMO-OFDM systems,” in Proc. IEEE Int. Symp. Circuits and Syst., May 2006, pp. 4102–4105. [15] T.F. Coleman and C.F. Van Loan, Handbook for Matrix Computations. Philadelphia :SIAM, 1988. [16] C.F.T. Tang, K.J.R. Liu, and S.A. Tretter, ”On systolic arrays for recursive complex Householder transformations with applications to array processing,” in Proc. IEEE Acoustics, Speech, and Signal Processing, Toronto, Canada, Apr. 1991, pp. 1033-1036. [17] K.-L. Chung and W.-M. Yan, ”The complex Householder transform,” IEEE Trans. Signal Process., vol.45, no. 9, pp. 2374-2376, Sept. 1997. [18] Z. Liu, K. Dickson, and J. V. McCanny, “Application-specific instruction set processor for SoC implementation of modern signal processing algorithms,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 52, no. 4, pp. 755- 765, Apr. 2005. [19] K. H. Lin, R. C. Chang, C. L. Huang, F. C. Chen and S. C. Lin, “Implementation of QR decomposition for MIMO-OFDM detection systems,” in Proc. IEEE Int. Electronics, Circuits and Syst., Malta, Sept. 2008, pp.57-60. [20] Jack E. Volder, ” The CORDIC Trigonometric Computing Technique, ” IRE Trans. Electron. Comput., vol. EC-8, no.3 , pp. 330-334, Sept. 1959. [21] D. Patel, M. Shabany, P. G. Gulak, “A Low-Complexity High-Speed QR Decomposition implementation for MIMO Receivers,” in Proc. IEEE Int. Symp. Circuits Syst., Taipei, Taiwan, May 2009, pp.33-36. [22] Kuang-Hao Lin, R.C.-H. Chang, Hsin-Lei Lin, and Ching-Fen Wu, “Analysis and Architecture Design of a Downlink M-Modification MC-CDMA System Using the Tomlinson–Harashima Precoding Technique,” IEEE Trans. Veh. Technol., vol. 57, no. 3, pp. 1387-1397, May 2008. [23] C.K. Singh, S.H. Prasad, and P.T. Balsara, ”VLSI Architecture for Matrix Inversion using Modified Gram-Schmidt based QR Decomposition,” in Proc. Int. VLSI Design Conf., Jan. 2007, pp. 836-841. [24] Y. Hu, “CORDIC-based VLSI Architectures for Digital Signal Processing,” IEEE Sig. Proc. Mag., vol.9, no.3, pp. 16-35,July 1992. [25] P. Luethi, et al., “VLSI Implementation of a High-Speed Iterative Sorted MMSE QR Decomposition,” in Proc. IEEE Int. Symp. Circuits Syst., New Orleans, USA, May 2007, pp. 1421-1424. [26] G. E. Forsythe and P. Henrici, “The cyclic Jacobi method for computing the principal values of a complex matrix,” Trans. Amer. Math. Soc., vol. 94, pp.1-23 ,1960. [27] G. Golub and W. Kahan, “Calculating the singular values and pseudo-inverse of a matrix,” J. Soc. Indust. Appl. Math.: Ser. B, Numer. Anal., vol. 2 , pp. 205-224, Jan. 1965. [28] J. Demmel and W. Kahan, “Accurate singular values of bidiagonal matrices,” SIAM J. Sci. Stat. Comput., vol.15, num.5, pp. 873-912, Sept.1990. [29] Y. Jiang, J. Li and W.W. Hager, ”Joint transceiver design for MIMO communications using geometric mean decomposition,” IEEE Trans. Signal Process, vol 53, no. 10 , pp. 3791-3803, Oct. 2005. [30] Y. Jiang, J. Li and W.W. Hager, ”Uniform channel decomposition for MIMO communications,” IEEE Trans. Signal Process, vol. 53, no. 11, pp. 4283-4294, Oct. 2005. [31] I. E. Telatar, “Capacity of multi-antenna Gaussian channels,” Eur. Trans. Telecommun., vol. 10, no. 6, pp. 585–595, 1999. [32] A. Maltsev, V. Pestretsov, R. Maslennikov, and A. Khoryaev, “Triangular systolic array with reduced latency for QR-decomposition of complex matrices,” in Proc. IEEE Int. Symp. Circuits Syst., Kos Island, Greece, May 2006, pp. 385-388. [33] F. Sobhanmanesh and S. Nooshabadi, ” Parametric minimum hardware QR-factoriser architecture for V-BLAST detection,” IEE Proc. Circuits, Devices and Sys., vol. 153, no. 5, pp. 433-441, Oct. 2006. [34] Yin-Tsung Hwang and Wei-Da Chen, “ A Low Complexity Complex QR Factorization Design for Signal Detection in MIMO OFDM Systems,” in Proc. IEEE ISCAS, Seattle, USA, May 2008, pp.932-935. [35] A. Bjorck, ”Solving linear least squares problems by Gram-Schmidt orthogonalization,” BIT, vol. 7, pp. 1-21, 1967. [36] S. Y. Kung, VLSI Array Processors. Englewood Cliffs, NJ: Prentice-Hall, 1988. [37] R., Hamill, J. V. McCanny and R. L. Walke, “Online CORDIC Algorithm and VLSI Architecture for Implementing QR-Array Processors, “ IEEE Trans. Signal Process., vol. 48, no.2, pp. 592-598, Feb. 2000. [38] C.S. Wu and A.Y. Wu, “Modified Vector Rotational CORDIC (MVR-CORDIC) Algorithm and Architecture,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 48, no. 6, pp. 529-532, Jun. 2001. [39] K. Maharatna, et al., “Modified virtually scaling-free adaptive CORDIC rotator algorithm and architecture,” IEEE Trans. Circuits Syst. Video Technol., vol. 15, no. 11, pp. 1463-1474, Nov. 2005. [40] P. Salmela, A. Burian, H. Sorokin and J. Takala, “Complex-valued QR decomposition implementation for MIMO receivers,” in Proc. IEEE Acoustics, Speech, and Signal Processing, Las Vegas, Nevada, USA, Apr. 2008, pp.1433-1436. [41] C. Studer, P. Blosch, P. Friedli and A. Burg, “Matrix Decomposition Architecture for MIMO Systems: Design and Implementation Trade-offs,” in Proc. IEEE Asilomar Conf. on Signals, Systems, and Computers, Pacific Grove, CA, US, Nov. 2007, pp. 1986-1990. [42] I. LaRoche and S. Roy, “An efficient regular matrix inversion circuit architecture for MIMO processing,” in Proc. IEEE Int. Symp. Circuits and Systs., Sept. 2006, pp. 4819–4819. [43] M. Myllyla, et al., “Complexity Analysis of MMSE Detector Architectures for MIMO OFDM Systems,” in Proc. Asilomar Conference on Signals, Systems and Computers, Pacific Grove, Oct. 2005, pp.75-81. [44] H. Kim, et al., ”An efficient FPGA based MIMO-MMSE Detector,” in Proc. of EU-. SIPCO, Sept. 2007, pp. 1131–1135. [45] L. Boher, R. Rabineau and M.Helard, “An Efficient MMSE Equalizer Implementation for 4×4 MIMO-OFDM Systems in Frequency Selective Fast Varying Channels,” in Proc. IEEE PIMRC, Sept. 2007, pp. 3-7. [46] L. Boher, R. Rabineau and M. Helard, “FPGA Implementation of an Iterative Receiver for MIMO-OFDM Systems,” IEEE J. Sel. Areas Commun., vol.26, no.6, pp.857-866, Aug. 2000. [47] J.-K. Zhang, A. Kavcic and K. M.Wong, “Equal-diagonal QR decomposition and its application to precoder design for successive-cancellation detection,” IEEE Trans. Inf. Theory, vol. 51, no. 1, pp. 154 - 172, Jan. 2005. [48] J. L. Tzeng, C. J. Huang, Y. H. Yuan and H. P. Ma, “A high performance low complexity joint transceiver for closed-loop MIMO applications,” in Proc. Asia and South Pacific on Design Automation Conference , Taipei, Jan. 2010, pp. 381-382. [49] W. C. Kan and G.E Sobelman, “MIMO Transceiver Design Based on a Modified Geometric Mean Decomposition,” in Proc. IEEE Int. Symp. Circuits Syst., New Orleans, LA, USA, May 2007, pp. 677-680. [50] R. P. Brent, F. T. Luk and C. F. Van Loan, “Computation of the Singular Value Decomposition Using Mesh-Connected Processors,” J. VLSI Comput. Syst., vol. 1, no.3, pp. 242-270, Mar. 1982. [51] W. Yue, K. Cunningham, P. Nagvajara, and J. Johnson, “Singular value decomposition hardware for MIMO: State of the art and custom design,” in Proc IEEE ReConFig, Dec. 2010, pp. 400–405. [52] C. Z. Zhan et al.,”High convergence speed low computation complexity SVD algorithm for MIMO-OFDM systems,” in Proc. IEEE Int. Symp. VLSI Design, Automation, and Test, Hsinchu, Taiwan, Apr. 2009, pp. 195-198. [53] A. Horn,”On the eigenvalues of a matrix with prescribed singular values,” Proc. Amer. Math. Soc., vol. 5, pp. 4-7, 1954. [54] G. Golub and C. van Loan, “Matrix computations,” 3rd ed. London: The Johns Hopkins University Press, Baltimore, 1996. [55] W.W. Hager. (2003, Dec. 3). Geometric Mean Decomposition [Online]. Available: [56] R.P. Brent and FT. Luk, “A systolic architecture for almost linear time solution of the symmetric eigenvalue problem,” Technical Report TR-CS-82-525, Dept. of Computer Science, Cornell University, Aug.1982. [57] C. Zhan, Y.-L. Chen and A.-Y. Wu, “Iterative Superlinear-Convergence SVD Beamforming Algorithm and VLSI Architecture for MIMO-OFDM Systems,” IEEE Trans. Signal Process, vol. 60, no. 6, pp. 3264-3277, Jun. 2012. [58] D. Markovic, B. Nikolic, and R. W. Brodersen, “Power and area minimization for multidimensional signal processing,” IEEE J. Solid State Circuits, vol. 42, no. 4, pp. 922–934, Apr. 2007. [59] Y.-L. Chen, C.-Z. Zhan, T.-J. Jheng and A.-Y. Wu, “Reconfigurable Adaptive Singular Value Decomposition Engine Design for High-Throughput MIMO-OFDM Systems,” IEEE Trans. VLSI Syst., vol. 21, no. 4, pp. 747–760, Apr. 2013. [60] C. Senning , C. Studer , P. Luethi and W. Fichtner, “Hardware-efficient steering matrix computation architecture for MIMO communication systems, ” Proc. IEEE Int. Symp. Circuits Syst., 2008, pp.304 -307. [61] D. Cescato and H. Bölcskei, “QR decomposition of Laurent polynomial matrices sampled on the unit circle,” IEEE Trans. Inf. Theory, vol. 56, no. 9, pp. 4754–4761, Sep. 2010.
摘要: 在多輸入多輸出的系統,為了基頻訊號處理,經常要承受大量的矩陣運算,這些運算將會成為即時系統實現之重大阻礙。預設碼與訊號偵測是兩個最高運算密集度的模組。在這一篇論文當中,我們著手研究許多不同類型的矩陣分解,這一些經常使用在多輸入多輸出之訊號處理。在第二章,我們首先回顧多輸入多輸出預設碼與訊號偵測分解方式之應用。特別選擇QR分解和幾何平均分解法,由於它們可以分別應用在基於多輸入多輸出QR-BALST訊號偵測以及預編碼上。 在QR分解部分,有兩個版本的設計分別呈現在第三與第四章。第一個是高吞吐量、完全平行的複數QR分解設計且只使用實數吉文氏旋轉(Givens rotation)。低運算複雜度與各種不同分解的計算方式也一併呈現。藉由精心設計硬體排程,一個4 × 4 複數QR分解只需要花費8個時脈週期。遵循電子電機工程師學會(IEEE) 802.11n的標準,發展出2 × 2和4 × 4之QR分解晶片設計。在TSMC 0.18um製程下,所實現的結果,顯示出這兩個設計都有每秒計算15百萬複數QR分解之能力。第二個複數QR分解是第一版本QR分解具有最小均方誤差(MMSE)加強之特色。藉由使用額外訊號處理之摺疊(folding)技巧,使得所建議的設計在計算一個4 × 4複數最小均方誤差QR分解只需要花費四個時脈週期且已發展硬體在TSMC 0.18um製程下和兩種不同FPGA平台(Xilinx and Altera)裡。 在幾何平均分解的部分,將開發兩個有效率的計算方式,分別在第五與第六章。有別於傳統基於幾何平均分解演算法的奇異值分解。這兩個版本都是使用雙對角線矩陣視為計算幾何平均分解之前處理,而取代原來的奇異值分解。它們具有低複雜度和不需要置換運算之特色且預算碼與訊號偵測之間的硬體可以共用。第一個幾何平均分解運算的方式,是採用漸進的方式,每次的運算都是從一個2×2 sub-matrix開始,最後可得到幾何平均分解的結果;而第二個方式是採用分治法(divide-and-conquer)計算策略。運算複雜度分析顯示出,相較於傳統的方式,計算效率至少優於30%。在第七章,將處理硬體實現。建議的GMD演算法將會映射到完全平行化以及高管線化的架構上,且此架構運算一個4×4複數矩陣的GMD只需要四個時脈週期。它也具支援兩個運算模組之整合架構,另如:針對訊號偵測的QR分解和預設碼的幾何平均分解。在TSMC 90nm CMOS製程下的晶片實現,最大時脈頻率可達到170MHz且此設計每秒可以計算42.5M個幾何平均分解或是QR分解的運算。最後在第八章,將敘述結論與這篇論文的未來應用。
Multiple Input Multiple Output (MIMO) systems often impose tremendous computing overheads in the form of matrix operations to the base band signal processing. This becomes a formidable barrier in real time system implementation. In particular, precoding and signal detection are the two most computation-intensive modules. In this dissertation, we start with an investigation on various matrix decomposition schemes commonly used in MIMO signal processing. The applications of these decomposition schemes on MIMO signal detection and precoding are first reviewed in chapter 2. In particular, QR decomposition and geometric mean decomposition are chosen specifically for the applications in QR-blast based MIMO signal detection and MIMO signal pre-coding, respectively. In the QR decomposition part, two versions of the design are presented in chapter 3 and chapter 4, respectively. The first one indicates a high throughput, fully parallel Complex-valued QR Decomposition (CQRD) design using real-valued Givens rotations only. The simplicity in computing complexity against various decomposition schemes is shown. Via a carefully plotted scheduling, one CQRD computation can be completed in 8 clock cycles. Sized 2 � 2 and 4 � 4 chip designs largely following the IEEE 802.11n standard are developed. The implementation results in TSMC 0.18 um process technology show that both designs are capable of performing 15M CQRDs per second. The second CQRF design features a minimum mean square error (MMSE) enhancement of the first one. By applying an additional DSP folding technique, the design takes only four clock cycles to perform a 4x4 complex-valued MMSE-QR decomposition. The ASIC fabrication in a TSMC 0.18µm process technology and the FPGA implementations in two types of FPGA devices (Xilinx and Altera) are developed. In the GMD part, two versions of the efficient computing scheme are developed in chapter 5 and chapter 6. Unlike conventional SVD based GMD algorithms, both schemes use matrix bi-diagonalization rather than SVD as the pre-processing step. They also feature lower computing complexities, permutation-free operations, and hardware sharing between the pre-coding and the signal detection modules. The first version of the GMD computing scheme adopts a progressive approach and obtains the GMD result incrementally starting from a 2�2 sub-matrix. The second version of the GMD scheme adopts a divide-and-conquer computing strategy. Computing complexity analyses indicate at least 30% more computing efficiency than other SVD based GMD computing schemes. In chapter 7, the hardware implementation is addressed. The scheme is mapped to a fully parallel and deeply pipelined architecture where one GMD computation of a 4�4 complex-valued matrix can be accomplished every 4 clock cycles. It also features a joint design supporting two computing modes, i.e. QRD for signal decoding and GMD for precoding. Chip implementation in TSMC 90nm CMOS technology shows that, with a maximum clock frequency up to 170MHz, the design can perform 42.5M GMD or QRD computations per second. Finally, in chapter 8, the conclusion and the future work of this dissertation are drawn.
其他識別: U0005-3006201316182200
Appears in Collections:電機工程學系所



Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.