PCA is easy to understand: in essence, it chooses a new set of orthogonal basis vectors and maps data from an n-dimensional space into an m-dimensional space, with m < n. A simple example follows, reducing 2-dimensional data to a 1-dimensional representation.
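As a quick illustration of that projection step, here is a minimal sketch with made-up data (separate from the worked example below); the matrix X and the target dimension m are assumptions for illustration only:

% Minimal sketch of the PCA projection step (made-up data, not part of the example below)
X = randn(100, 5);                         % 100 samples in an n = 5 dimensional space
m = 2;                                     % target dimension, m < n
[coeff, score] = pca(X);                   % columns of coeff form the new orthogonal basis
Xreduced = score(:, 1:m);                  % the m-dimensional representation
% Equivalently: center the data and project it onto the first m basis vectors
Xcentered = X - repmat(mean(X, 1), size(X, 1), 1);
Xreduced2 = Xcentered * coeff(:, 1:m);     % identical to score(:, 1:m)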
Background: We have a set of data points that represent a curve in the plane. To reduce the amount of data we store, can we describe the curve with fewer points, for example with a line?
First Method: fit the curve by least squares, using the model y = ax + b.
Second Method: find a single eigenvector (principal direction) that represents the curve by using PCA.
pd_analysis.csv (first column: pos, second column: disparity)
920 -3.7764
1520 -0.4437
2120 3.3307
2720 8.0182
3320 11.8022
clc; clear
% Input: read the sample data and plot the raw curve
data = csvread('pd_analysis.csv');
pos = (data(1:end,1))';          % first column: pos
disparity = (data(1:end,2))';    % second column: disparity
curve = [disparity; pos];        % row 1: disparity (x axis), row 2: pos (y axis)
plot(curve(1,:), curve(2,:), 'b.');
First Method:
[p, S] = polyfit(curve(1,:), curve(2,:), 1);   % fit the degree-1 model y = ax + b
x = min(curve(1,:)) : 1 : max(curve(1,:));
y = polyval(p, x, S);                          % evaluate the fitted line
hold on
plot(x, y, 'r--', 'LineWidth', 2)
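The least-squares fit minimizes the vertical (y-direction) residuals. A minimal sketch of how one might measure that error, assuming p and curve from the code above are still in the workspace:

% Vertical residuals of the least-squares line (sketch; reuses p and curve from above)
yhat = polyval(p, curve(1,:));                 % predicted pos at each disparity value
lsResiduals = curve(2,:) - yhat;               % vertical distances to the fitted line
lsError = sqrt(mean(lsResiduals.^2))           % root-mean-square error (no semicolon: display it)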
Second Method:
% PCA: fit the line with the first principal component
X = curve';                                     % n-by-2 matrix, one point per row
[coeff, score, roots] = pca(X);                 % coeff: principal directions, roots: eigenvalues
meanX = mean(X, 1);
[n, ~] = size(X);
Xfit = repmat(meanX, n, 1) + score(:,1)*coeff(:,1)';   % points projected onto the first PC line
dirVect = coeff(:,1);                           % direction vector of the fitted line
t = [min(score(:,1))-.2, max(score(:,1))+.2];
endpts = [meanX + t(1)*dirVect'; meanX + t(2)*dirVect'];
K = dirVect(2)/dirVect(1)                       % slope of the line (no semicolon: display K)
plot(endpts(:,1), endpts(:,2), 'k-');
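Unlike the least-squares fit, the first principal component minimizes the perpendicular (orthogonal) distances from the points to the line, and the 1-D scores are all that need to be stored to reconstruct the points lying on that line. A minimal sketch of both ideas, assuming X, Xfit, meanX, coeff, score, and n from the PCA code above are still in the workspace:

% Orthogonal residuals and 1-D storage (sketch; reuses variables from the PCA code above)
residuals = X - Xfit;                           % perpendicular offsets to the fitted line
pcaError = sqrt(mean(sum(residuals.^2, 2)))     % root-mean-square orthogonal distance
% To compress the curve, keep only the 1-D scores (plus meanX and coeff(:,1)):
Xrec = repmat(meanX, n, 1) + score(:,1)*coeff(:,1)';   % reconstructs the same points as Xfit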