Title: 序列相關資料的主成分分析法-以德基水庫水質監測數據為例
Principal Components Analysis of Serially Correlated Data: The Monitored Water Data of Techi Reservoir as an Example
Author: 何玫儀
Ho, Mei-I
Keywords: principal components analysis; linear mixed model; restricted maximum likelihood
Publisher: Department of Agronomy
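Environmental data often include measurements of more than one variable at the same location. Among multivariate statistical methods, principal components analysis (PCA) is the most commonly used technique for reducing the dimension of data: it uses a small number of linear combinations of the variables to explain most of the variation in the original data. PCA should be applied to observations that are not significantly correlated with one another. However, when data are measured repeatedly at the same location, the results of PCA may be affected.
This study addresses that problem. First, through simulation, a linear mixed model is used to fit the data, and datasets are simulated whose repeated observations follow four combinations of correlation structures: CS (compound symmetric) with CS, CS with independent, AR(1) (first-order autoregressive) with AR(1), and AR(1) with independent. The bias ratios of the variance components of the simulated data are then computed under four factors: the combination of correlation structures, the covariance, the number of observations, and the degree of correlation among the repeated observations. The effect of changes in these four factors on the bias ratios is examined. In addition, taking the variance-covariance matrix corrected by restricted maximum likelihood (REML), which has the property of unbiasedness, as the standard, the study examines how changes in the same four factors affect the eigenvalues and eigenvectors.
Finally, using the water quality monitoring data of Techi Reservoir as an example, the study describes in detail the test for correlation among repeated observations, the correction of the variance-covariance matrix, and a comparison of the analysis results with and without the correction. The corrected PCA results are interpreted graphically, providing a worked procedure for the principal components analysis of multivariate serially correlated data.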

Environmental datasets often contain more than one response variable for each experimental unit. When an investigation involves a large number of observed variables, it is usually useful to simplify the analysis by considering a smaller number of linear combinations of the original response variables. Principal components analysis (PCA) is perhaps the best-known dimension-reduction tool in multivariate analysis. PCA assumes that the observations are serially uncorrelated. However, when data are measured several times at the same location, they will generally be serially correlated, and the results of a standard PCA then become questionable.
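As a minimal sketch of the dimension-reduction step described above, the following Python/NumPy code obtains principal components from the eigen-decomposition of a sample covariance matrix. The data, the covariance values, and the function name `pca_from_covariance` are illustrative assumptions and are not taken from the thesis. The simulated observations are independent, which is the setting in which ordinary PCA is meant to be applied.

```python
import numpy as np

def pca_from_covariance(cov):
    """Eigenvalues (largest first) and matching eigenvectors of a symmetric matrix."""
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]           # reorder so the largest comes first
    return eigvals[order], eigvecs[:, order]

# Hypothetical data: 200 independent observations of 3 correlated variables.
rng = np.random.default_rng(0)
true_cov = np.array([[4.0, 2.0, 0.0],
                     [2.0, 3.0, 0.0],
                     [0.0, 0.0, 1.0]])
X = rng.multivariate_normal(mean=np.zeros(3), cov=true_cov, size=200)

S = np.cov(X, rowvar=False)                     # sample variance-covariance matrix
eigvals, eigvecs = pca_from_covariance(S)
explained = eigvals / eigvals.sum()             # share of total variance per component
scores = (X - X.mean(axis=0)) @ eigvecs         # principal component scores

print("eigenvalues:", eigvals)
print("proportion of variance explained:", explained)
```

Each column of `eigvecs` holds the weights of one linear combination of the original variables, and the corresponding eigenvalue is the variance of that combination.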
To address this problem, we first used a linear mixed model to fit simulated serially correlated data. The simulated datasets had four combinations of correlation structures: compound symmetric (CS) with CS, CS with independent, first-order autoregressive (AR(1)) with AR(1), and AR(1) with independent. We calculated the biases of the variance components under four factors: the combination of correlation structures, the covariance, the number of observations, and the degree of correlation among repeated observations. Second, taking the variance-covariance matrix corrected by restricted maximum likelihood (REML) as the standard, we investigated how changes in these four factors affect the eigenvalues and eigenvectors.
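The two correlation structures named above, and the reason serial correlation biases a naive variance estimate, can be illustrated with a short Python/NumPy sketch. This is a deliberately simplified single-variable illustration, not the linear mixed model simulation used in the thesis. It assumes unit variance and a common mean, and all function names and parameter values (`n = 10`, `rho = 0.5`) are hypothetical. Under those assumptions the usual sample variance has expectation sigma^2 * (1 - (sum of off-diagonal correlations) / (n(n-1))), which reduces to sigma^2 * (1 - rho) for the CS structure.

```python
import numpy as np

def cs_corr(n, rho):
    """Compound symmetric correlation matrix: every pair has correlation rho."""
    return (1.0 - rho) * np.eye(n) + rho * np.ones((n, n))

def ar1_corr(n, rho):
    """AR(1) correlation matrix: correlation rho**|i - j| between times i and j."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def expected_variance_ratio(R):
    """E[s^2] / sigma^2 for the usual sample variance under correlation matrix R."""
    n = R.shape[0]
    off_diag_sum = R.sum() - np.trace(R)
    return 1.0 - off_diag_sum / (n * (n - 1))

def simulated_variance_ratio(R, reps=20000, seed=0):
    """Monte Carlo average of s^2 over `reps` series with unit variance."""
    rng = np.random.default_rng(seed)
    x = rng.multivariate_normal(np.zeros(R.shape[0]), R, size=reps)
    return float(x.var(axis=1, ddof=1).mean())

n, rho = 10, 0.5                                # assumed values for illustration
for name, R in [("CS", cs_corr(n, rho)), ("AR(1)", ar1_corr(n, rho))]:
    print(name,
          "theoretical ratio:", expected_variance_ratio(R),
          "simulated ratio:", simulated_variance_ratio(R))
```

A ratio below 1 means the naive estimate understates the true variance, which is the kind of bias that a correction of the variance-covariance matrix is meant to remove.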
Finally, we used the water quality monitoring data of Techi Reservoir as an example to describe in detail the test for correlation among repeated observations, the correction of the variance-covariance matrix, and a comparison of the analysis results with and without the correction. This example provides a detailed procedure for the PCA of multivariate serially correlated data.
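The abstract does not say which test was used to check for correlation among repeated observations. As one simple possibility, offered only as an assumption, the sketch below computes the lag-1 sample autocorrelation of a single series and compares it with the approximate 95% band of ±1.96/√n expected for uncorrelated data. The function names and the simulated AR(1) series are hypothetical, and no reservoir data are used.

```python
import numpy as np

def lag1_autocorr(x):
    """Lag-1 sample autocorrelation of a one-dimensional series."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return float((d[:-1] * d[1:]).sum() / (d * d).sum())

def exceeds_white_noise_band(x, z=1.96):
    """True if |r1| lies outside the approximate band z / sqrt(n) for uncorrelated data."""
    return bool(abs(lag1_autocorr(x)) > z / np.sqrt(len(x)))

# Hypothetical AR(1) series with coefficient 0.8 (not started from its stationary law).
rng = np.random.default_rng(1)
phi, n = 0.8, 500
series = np.empty(n)
series[0] = rng.normal()
for t in range(1, n):
    series[t] = phi * series[t - 1] + rng.normal()

r1 = lag1_autocorr(series)
flag = exceeds_white_noise_band(series)
print("lag-1 autocorrelation:", r1, "outside band:", flag)
```

A series flagged this way would be a candidate for the kind of variance-covariance correction described above before PCA is applied.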
Appears in Collections: Department of Agronomy
