This article is a study note on "14 Lectures on Visual SLAM" (视觉SLAM十四讲): implementing the Gauss-Newton method from scratch with the Eigen library.
Hand-Writing the Gauss-Newton Method (Implemented with Eigen)
1. Introduction to the Gauss-Newton Method
A simple least-squares problem can be stated as follows:
$$\begin{aligned} \underset{x}{\min}\,F(x)=\frac{1}{2}\|f(x)\|^2 \end{aligned}$$
The Gauss-Newton method is one of the simplest optimization algorithms, and we use it here to solve this least-squares problem. Its idea is to apply a first-order Taylor expansion to $f(x)$. (Note that the expansion is of $f(x)$ itself, not of the objective function $F(x)$.)
First, recall the Taylor expansion:
$$\begin{aligned} f(x)=\frac{f(x_0)}{0!}+\frac{f'(x_0)}{1!}(x-x_0)+\frac{f''(x_0)}{2!}(x-x_0)^{2}+\cdots+\frac{f^{(n)}(x_0)}{n!}(x-x_0)^{n}+R_n(x) \end{aligned}$$
The first-order Taylor expansion of $f(x)$ is:
$$\begin{aligned} f(x+\Delta x)\approx f(x)+J(x)^{T}\Delta x \end{aligned}$$
where $J(x)$ is the derivative of $f(x)$ with respect to $x$, an $n\times 1$ column vector. Our goal is to find an increment $\Delta x$ such that $\|f(x+\Delta x)\|^2$ is minimized. To find $\Delta x$, we need to solve a linear least-squares problem:
$$\begin{aligned} \Delta x^{*}=\arg\,\underset{\Delta x}{\min}\,\frac{1}{2}\|f(x)+J(x)^{T}\Delta x\|^{2} \end{aligned}$$
By the first-order optimality condition, we differentiate this objective with respect to $\Delta x$ and set the derivative to zero. To do so, first expand the squared term:
$$\begin{aligned} \frac{1}{2}\|f(x)+J(x)^{T}\Delta x\|^{2}&=\frac{1}{2}\left(f(x)+J(x)^{T}\Delta x\right)^{T}\left(f(x)+J(x)^{T}\Delta x\right)\\ &=\frac{1}{2}\left(\|f(x)\|^{2}_{2}+2f(x)J(x)^{T}\Delta x+\Delta x^{T}J(x)J(x)^{T}\Delta x\right) \end{aligned}$$
Taking the derivative of the above with respect to $\Delta x$ and setting it to zero:
$$\begin{aligned} J(x)f(x)+J(x)J(x)^{T}\Delta x=0 \end{aligned}$$
This yields the following system of equations:
$$\begin{aligned} J(x)J(x)^{T}\Delta x=-J(x)f(x) \end{aligned}$$
Let $H(x)=J(x)J(x)^{T}$ and $g(x)=-J(x)f(x)$; the system can then be written as:
$$\begin{aligned} H(x)\Delta x=g(x) \end{aligned}$$
This is a linear equation in the variable $\Delta x$, and solving it is the core of the whole optimization problem. If the equation can be solved, the Gauss-Newton algorithm proceeds as follows:
Steps of the Gauss-Newton method:
1. Choose an initial value $x_0$;
2. At the $k$-th iteration, compute the current Jacobian $J(x_k)$ and the error $f(x_k)$;
3. Solve the increment equation $H\Delta x_k=g$;
4. If $\Delta x_k$ is sufficiently small, stop; otherwise set $x_{k+1}=x_k+\Delta x_k$ and return to step 2.
2. Implementing the Gauss-Newton Method with Eigen
2.1 Constructing the Nonlinear Function
Consider a curve satisfying the following equation:
$$\begin{aligned} y=e^{ax^2+bx+c}+w \end{aligned}$$
where $a,b,c$ are the curve parameters and $w$ is Gaussian noise satisfying $w\sim N(0,\sigma^2)$. Suppose we have $N$ observed data points $(x_i,y_i)$; the curve parameters are estimated from these observations by solving the least-squares problem:
$$\begin{aligned} \underset{a,b,c}{\min}\,\frac{1}{2}\sum_{i=1}^{N}\left\|y_i-e^{ax_i^2+bx_i+c}\right\|^2 \end{aligned}$$
Define the error term as:
$$\begin{aligned} e_i=y_i-e^{ax_i^2+bx_i+c} \end{aligned}$$
The derivative of each error term with respect to the state variables is then:
$$\begin{aligned} \frac{\partial e_i}{\partial a}&=-x_i^2e^{ax_i^2+bx_i+c}\\ \frac{\partial e_i}{\partial b}&=-x_ie^{ax_i^2+bx_i+c}\\ \frac{\partial e_i}{\partial c}&=-e^{ax_i^2+bx_i+c} \end{aligned}$$
Hence $J_i=\left[\frac{\partial e_i}{\partial a},\frac{\partial e_i}{\partial b},\frac{\partial e_i}{\partial c}\right]^{T}$, and the increment equation is:
$$\begin{aligned} \left(\sum_{i=1}^{100}J_iJ_i^{T}\right)\Delta x=\sum_{i=1}^{100}-J_ie_i \end{aligned}$$
2.2 Eigen Implementation
#include <iostream>
#include <vector>
#include <cmath>
#include <chrono>
#include <opencv2/opencv.hpp>
#include <Eigen/Dense>
using namespace std;
using namespace Eigen;
int main()
{
    /* Part 1: generate the observations (xi, yi) */
    double ar = 1.0, br = 2.0, cr = 1.0;  // ground-truth parameter values
    double ae = 2.0, be = -1.0, ce = 5.0; // initial parameter estimates
    int N = 100;                          // number of data points
    double w_sigma = 1.0;                 // sigma of the Gaussian noise
    cv::RNG rng;                          // OpenCV random number generator
    vector<double> x_data, y_data;        // observations
    for (int i = 0; i < N; i++)
    {
        double x = i / 100.0;
        x_data.push_back(x);
        y_data.push_back(exp(ar*x*x + br*x + cr) + rng.gaussian(w_sigma*w_sigma)); // add Gaussian noise
    }
    /* Part 2: run the Gauss-Newton iterations */
    int iterations = 100;
    double cost = 0, lastCost = 0; // cost of the current and previous iterations
    // start timing
    chrono::steady_clock::time_point t1 = chrono::steady_clock::now();
    // iterate at most `iterations` times
    for (int iter = 0; iter < iterations; iter++)
    {
        Matrix3d H = Matrix3d::Zero(); // H = J * J^T
        Vector3d b = Vector3d::Zero(); // b = -J * f
        cost = 0;
        // accumulate the residual of every observation, i.e. the f in F = 1/2*||f||^2
        for (int i = 0; i < N; i++)
        {
            double xi = x_data[i], yi = y_data[i]; // the i-th data point
            double error = yi - exp(ae*xi*xi + be*xi + ce);
            Vector3d J; // Jacobian of the error term
            J[0] = -xi*xi*exp(ae*xi*xi + be*xi + ce); // de/da
            J[1] = -xi*exp(ae*xi*xi + be*xi + ce);    // de/db
            J[2] = -exp(ae*xi*xi + be*xi + ce);       // de/dc
            H += J*J.transpose();
            b += -error*J;
            cost += error * error;
        }
        // solve the linear system H*dx = b
        Vector3d dx = H.ldlt().solve(b);
        if (isnan(dx[0]))
        {
            std::cout << "result is nan!" << std::endl;
            break;
        }
        // stop if this iteration did not decrease the objective
        if (iter > 0 && cost >= lastCost)
        {
            std::cout << "cost: " << cost << ">= lastCost:" << lastCost << ",break." << std::endl;
            break;
        }
        // update the parameters
        ae += dx[0];
        be += dx[1];
        ce += dx[2];
        // record the cost of this iteration
        lastCost = cost;
        // print the iteration result
        std::cout << "iteration:" << iter+1 << "\ttotal cost:" << cost << "\t\tupdate:" << dx.transpose() << "\t\testimated params:" << ae << "," << be << "," << ce << endl;
    }
    chrono::steady_clock::time_point t2 = chrono::steady_clock::now();
    chrono::duration<double> time_used = chrono::duration_cast<chrono::duration<double>>(t2 - t1);
    std::cout << "solve time cost=" << time_used.count() << "seconds." << endl;
    std::cout << "estimated abc=" << ae << "," << be << "," << ce << endl;
    return 0;
}
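For reference, one way to build the program above, assuming Eigen's headers live under /usr/include/eigen3, OpenCV 4 is discoverable via pkg-config, and the source file is named gauss_newton.cpp (all three are assumptions about your setup):

```shell
# Paths, package names, and the file name are assumptions; adjust to your installation.
g++ -std=c++11 -O2 gauss_newton.cpp -o gauss_newton \
    -I /usr/include/eigen3 $(pkg-config --cflags --libs opencv4)
./gauss_newton
```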
2.3 Output
The final estimates of a, b, c are 0.890912, 2.1719, 0.943629, obtained in about 9 ms.
iteration:1 total cost:3.19575e+06 update:0.0455771 0.078164 -0.985329 estimated params:2.04558,-0.921836,4.01467
iteration:2 total cost:376785 update: 0.065762 0.224972 -0.962521 estimated params:2.11134,-0.696864,3.05215
iteration:3 total cost:35673.6 update:-0.0670241 0.617616 -0.907497 estimated params:2.04432,-0.0792484,2.14465
iteration:4 total cost:2195.01 update:-0.522767 1.19192 -0.756452 estimated params:1.52155,1.11267,1.3882
iteration:5 total cost:174.853 update:-0.537502 0.909933 -0.386395 estimated params:0.984045,2.0226,1.00181
iteration:6 total cost:102.78 update:-0.0919666 0.147331 -0.0573675 estimated params:0.892079,2.16994,0.944438
iteration:7 total cost:101.937 update:-0.00117081 0.00196749 -0.00081055 estimated params:0.890908,2.1719,0.943628
iteration:8 total cost:101.937 update: 3.4312e-06 -4.28555e-06 1.08348e-06 estimated params:0.890912,2.1719,0.943629
iteration:9 total cost:101.937 update:-2.01204e-08 2.68928e-08 -7.86602e-09 estimated params:0.890912,2.1719,0.943629
cost: 101.937>= lastCost:101.937,break.
solve time cost=0.00926409seconds.
estimated abc=0.890912,2.1719,0.943629
3. References
3.1 Books
"14 Lectures on Visual SLAM" (视觉SLAM十四讲) – Gao Xiang