using LinearAlgebra
using Optim
Solve $$ \min_x f(x) = \frac{1}{2} x^T A x + b^T x + c $$ where $A \succ 0$. Since $\nabla f(x) = Ax + b$, setting $\nabla f(x) = 0$ shows that this problem is equivalent to solving the linear system $Ax = -b$.
f = x -> 0.5*dot(x,A*x)+dot(b,x)   # quadratic objective; A and b are defined further below
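As a quick sanity check (a minimal sketch on a small, purely illustrative example using the Optim package imported above; the names Aex, bex, fex are hypothetical), we can verify numerically that the minimizer of $f$ coincides with the solution of $Ax = -b$:
Aex = [4.0 1.0; 1.0 3.0]                 # a small symmetric positive definite matrix
bex = [1.0, 2.0]
fex = x -> 0.5*dot(x, Aex*x) + dot(bex, x)
res = optimize(fex, zeros(2), BFGS())    # gradient approximated by finite differences
Optim.minimizer(res), Aex \ (-bex)       # the two results should (approximately) agree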
Adapted from https://www.rose-hulman.edu/~bryan/lottamath/congrad.pdf
Let $$ A = \begin{pmatrix} 3 & 1 & 0 \\ 1 & 2 & 2 \\ 0 & 2 & 4 \end{pmatrix}. $$ Consider the function to minimize $$ f(x) = \frac{1}{2} x^TAx, $$ and assume that we have already computed \begin{align*} d_0 &= (1, 0, 0),\\ d_1 &= (1, -3, 0),\\ d_2 &= (-2, 6, -5). \end{align*}
Check that $d_0$, $d_1$ and $d_2$ are $A$-conjugate.
A = [ 3.0 1 0 ; 1 2 2 ; 0 2 4]
d0 = [ 1.0 0 0 ]'
d1 = [ 1.0 -3.0 0.0 ]'
d2 = [ -2.0 6.0 -5.0]'
println("$(dot(d0, A*d1)) $(dot(d0, A*d2)) $(dot(d1, A*d2))")
det(A), eigen(A)
Take initial guess $x_0 = (1, 2, 3)$. Compute $x_1$, $x_2$ and $x_3$ using the conjugate gradient algorithm. Is $x_3$ optimal?
x0 = [1 2 3.0]'
-A*x0
f = x -> 0.5*dot(x, A*x)   # f(x) = ½ xᵀAx; b = 0 in this example
We have to compute the stepsizes $\alpha_k$, $k = 0, 1, 2$, by solving $$ \min_{\alpha} f(x_k + \alpha d_k). $$
In order to obtain $\alpha_0$, we have to minimize \begin{align*} f(x_0 + \alpha d_0) &= \frac{1}{2} \left(\begin{pmatrix} 1 & 2 & 3\end{pmatrix} + \alpha \begin{pmatrix} 1 & 0 & 0\end{pmatrix} \right) \begin{pmatrix} 3 & 1 & 0 \\ 1 & 2 & 2 \\ 0 & 2 & 4 \end{pmatrix} \left(\begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} + \alpha \begin{pmatrix} 1 \\0 \\0 \end{pmatrix} \right) \\ & = \frac{1}{2}\begin{pmatrix} 1 + \alpha & 2 & 3 \end{pmatrix} \begin{pmatrix} 3 & 1 & 0 \\ 1 & 2 & 2 \\ 0 & 2 & 4 \end{pmatrix} \begin{pmatrix} 1 + \alpha \\ 2 \\ 3 \end{pmatrix}\\ & = \frac{1}{2}\begin{pmatrix} 1 + \alpha & 2 & 3 \end{pmatrix} \begin{pmatrix} 5+3\alpha \\ 11+\alpha \\ 16 \end{pmatrix}\\ & = \frac{1}{2} ((1 + \alpha)(5+3\alpha) + 22+2\alpha + 48 ) \\ & = \frac{1}{2} ( 3\alpha^2 + 8\alpha + 5 + 70 + 2\alpha ) \\ & = \frac{3}{2}\alpha^2 + 5\alpha+\frac{75}{2} \end{align*} with respect to $\alpha$.
We can obtain it by finding the zero of the derivative with respect to $\alpha$, that is $$ \frac{d}{d\alpha} f(x+\alpha d) = 0, $$
or, equivalently, $$ d^T \nabla f(x + \alpha d) = 0. $$
Therefore, we must have $$ \frac{d}{d\alpha}\left( \frac{3}{2}\alpha^2 + 5\alpha + \frac{75}{2} \right) = 3\alpha + 5 = 0. $$
Thus $$ \alpha_{0} = -\frac{5}{3} $$ $$ x_1 = x_0 - \frac{5}{3} d_0 = \begin{pmatrix} -\frac{2}{3} \\ 2 \\ 3 \end{pmatrix} $$
We can also directly compute $\alpha_0$ as $$ \alpha_0 = - \frac{d_0^T\nabla f(x_0)}{d_0^TAd_0} $$
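Indeed, for a quadratic with Hessian $A$ we have $\nabla f(x_0 + \alpha d_0) = \nabla f(x_0) + \alpha A d_0$, so the exact linesearch condition $d_0^T \nabla f(x_0 + \alpha d_0) = 0$ gives $$ d_0^T \nabla f(x_0) + \alpha\, d_0^T A d_0 = 0 \quad \Longleftrightarrow \quad \alpha_0 = -\frac{d_0^T\nabla f(x_0)}{d_0^TAd_0}. $$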
x0 = [1 ; 2 ; 3.0]
∇f = A*x0
d0 = [1 ; 0 ; 0]
α0 = -dot(d0,∇f)/dot(d0,A*d0)
x1 = x0+α0*d0
A linesearch from $x_1$ in direction $d_1$ requires us to minimize $$ f(x_1 + \alpha d_1) = \frac{15}{2}\alpha^2 - 28\alpha + \frac{100}{3} $$ which occurs at $$ \alpha_1 = \frac{28}{15}, $$ yielding $$ x_2 = x_1 + \frac{28}{15}d_1 = \begin{pmatrix} \frac{6}{5} \\ \frac{-18}{5} \\ 3 \end{pmatrix}. $$
α1 = -dot(d1,A*x1)/dot(d1,A*d1)
28/15
x2 = x1+α1*d1
The final linesearch from $x_2$ in direction $d_2$ requires us to minimize $$ f(x_2 + \alpha d_2) = 20 \alpha^2 - 24\alpha + \frac{36}{5} $$ which occurs at $$ \alpha_2 = \frac{3}{5}, $$ yielding $$ x_3 = x_2 + \frac{3}{5}d_2 = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}, $$ which is of course correct.
Similarly, we can compute the new point as
α2 = -dot(d2,A*x2)/dot(d2,A*d2)
x3 = x2+α2*d2
A first version of the conjugate gradient algorithm follows.
function cg_quadratic(A::Matrix, b::Vector, x0::Vector, trace::Bool = false)
    n = length(x0)
    x = x0
    g = b+A*x               # gradient of f, i.e. residual of the system Ax = -b
    d = -g                  # first direction: steepest descent
    if (trace)
        iter = [ x ]
        iterg = [ norm(g) ]
    end
    for k = 1:n-1
        Ad = A*d
        normd = dot(d,Ad)
        α = -dot(d,g)/normd # exact linesearch along d
        x += α*d
        g = b+A*x           # gradient at the new iterate
        if (trace)
            iter = [ iter; [x] ]
            iterg = [ iterg; norm(g) ]
        end
        β = dot(g,Ad)/normd # conjugacy coefficient
        d = -g+β*d
    end
    # n-th (last) iteration
    normd = dot(d,A*d)
    α = -dot(d,g)/normd
    x += α*d
    if (trace)
        g = b+A*x           # g must be equal to 0
        iter = [ iter; [x] ]
        iterg = [ iterg; norm(g) ]
        return x, iter, iterg
    end
    return x
end
Consider the simple example
A = [2 1; 1 2]
b = [1, 0]
A\(-b)
Solve $$ \min_{x} f(x) = \frac{1}{2}x^TAx+b^Tx+c. $$
Since $c$ is a constant, this is equivalent to solving $$ \min_{x} \frac{1}{2}x^TAx+b^Tx. $$
cg_quadratic(A, b, [0, 0], true)
What if $A$ is not positive definite?
A = [ 1 2 ; 2 1]
A\(-b)
cg_quadratic(A, b, [0, 0], true)
cg_quadratic(A, b, [1, 1], true)
f([1/3,-2/3])
f([0,0])
The conjugate gradient method finds the solution of the linear system, and this does correspond to a first-order critical point of the function; however, since $A$ is indefinite, this critical point is a saddle point rather than a minimizer.
∇f = x -> A*x+b
x = [1.0/3; -2.0/3]
∇f(x)
x = [1; 1]
∇f(x)
step = x -> x - α*∇f(x)   # gradient step with stepsize α
α = 10
dot(step(x),A*step(x))
λ, u = eigen(A)
u
x = u[:,1]
α = 10
f = x -> 0.5*dot(x,A*x)+dot(b,x)
f(step(x))
α = 1000
f(step(x))
f(x)
x = [1/3.0; -2/3]
f(x)
cg_quadratic(A, b, x, true)
We will need to incorporate a test on $\nabla f(x_k)$!
A more complex example
n = 500;
m = 600;
A = randn(n,m);
A = A * A'; # A is now a positive semi-definite matrix
A = A+I # A is positive definite
b = zeros(n)
for i = 1:n
b[i] = randn()
end
x0 = zeros(n)
b1 = A\(-b)
b2, iter, iterg = cg_quadratic(A, b, x0, true);
norm(b1-b2)
iterg
It works, but do we need to perform all 500 iterations? We could be satisfied as soon as we are close enough to the solution. We can measure the residual of the linear system $$ r = b+Ax, $$ which is nothing else than the gradient of the objective function of the quadratic optimization problem.
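As a quick check (a small sketch reusing the variables b2 and iterg computed above), the norm of this residual at the computed solution should match the last recorded gradient norm, and both should be close to zero:
r = b + A*b2          # residual (= gradient) at the solution returned by cg_quadratic
norm(r), iterg[end]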
iter
We incorporate a convergence test in the function.
function cg_quadratic_tol(A::Matrix, b::Vector, x0::Vector, trace::Bool = false, tol = 1e-8)
    n = length(x0)
    x = x0
    if (trace)
        iter = [ x ]
    end
    g = b+A*x
    d = -g
    k = 0
    tol2 = tol*tol              # compare squared norms to avoid square roots
    β = 0.0
    while ((dot(g,g) > tol2) && (k <= n))
        Ad = A*d
        normd = dot(d,Ad)
        α = dot(g,g)/normd
        # α = -dot(d,g)/normd   # equivalent expression in exact arithmetic
        x += α*d
        if (trace)
            iter = [ iter; [x] ]
        end
        g = b+A*x
        β = dot(g,Ad)/normd
        d = -g+β*d
        k += 1
    end
    if (trace)
        iter = [ iter; [x] ]
        return x, iter, k
    end
    return x, k
end
x, iter, k = cg_quadratic_tol(A, b, x0, true)
The number of iterations is
k
Are we close to the solution?
norm(b1-x)
size(A)
which is much less than the problem dimension
A basic implementation of a preconditioned conjugate gradient algorithm follows, where $M$ is the inverse of the preconditioner to apply.
function pcg_quadratic_tol(A::Matrix, b::Vector, x0::Vector, M::Matrix,
                           trace::Bool = false, tol = 1e-8)
    n = length(x0)
    x = x0
    if (trace)
        iter = [ x ]
    end
    g = b+A*x
    v = M*g                     # preconditioned gradient
    d = -v
    k = 0
    tol2 = tol*tol
    β = 0.0
    gv = dot(g,v)
    while ((gv > tol2) && (k <= n))
        # while ((dot(g,g) > tol2) && (k <= n))
        Ad = A*d
        normd = dot(d,Ad)
        α = gv/normd
        x += α*d
        if (trace)
            iter = [ iter; [x] ]
        end
        g += α*Ad               # recursive gradient update
        v = M*g
        gvold = gv
        gv = dot(g,v)
        β = gv/gvold
        d = -v+β*d
        k += 1
    end
    if (trace)
        iter = [ iter; [x] ]
        return x, iter, k
    end
    return x, k
end
Let's check first that when there is no preconditioning, we obtain the same iterates. Set
M = zeros(n,n)+I
x, iter, k = pcg_quadratic_tol(A, b, x0, M, true)
k, norm(x-b1)
We can compute the eigenvalues and condition number of $A$.
eigen(A)
cond(A)
Let us try a simple preconditioner using the inverse of the diagonal of the matrix $A$.
D = 1 ./diag(A)
M = Diagonal(D)
Unfortunately, in this case, it does not help, as the condition number does not improve.
B = M*A
cond(B)
Consider another situation, where $A$ is diagonal.
n = 1000;
A = zeros(n,n);
for i = 1:n
A[i,i] = 10*rand()
end
b = zeros(n)
for i = 1:n
b[i] = rand()
end
x0 = zeros(n)
cond(A)
The solution we are looking for is
A\(-b)
Without preconditioning, we have the following sequence of iterates.
M = zeros(n,n)+I
x, iter, k = pcg_quadratic_tol(A, b, x0, M, true)
This is equivalent to the unpreconditioned version.
x, iter, k = cg_quadratic_tol(A, b, x0, true)
However, since $A$ is diagonal, an obvious diagonal preconditioner is $A^{-1}$ itself.
M = zeros(n,n)
for i = 1:n
M[i,i] = 1/A[i,i]
end
The condition number of the preconditioned matrix is of course equal to 1.
cond(M*A)
The theory then predicts that we converge in one iteration with the preconditioned conjugate gradient.
x, iter, k = pcg_quadratic_tol(A, b, x0, M, true)
Consider now another example.
A = zeros(n,n)+3*I
for i = 1:n-1
A[i,i+1] = 1.4
A[i+1,i] = 1.4
end
A
eigen(A)
A\(-b)
x, iter, k = cg_quadratic_tol(A, b, x0, true)
M = zeros(n,n)
for i = 1:n
M[i,i] = 1/A[i,i]
end
cond(A)
cond(M*A)
x, iter, k = pcg_quadratic_tol(A, b, x0, M, true)
There is no advantage: since the diagonal of $A$ is constant (equal to 3), the diagonal preconditioner is just a multiple of the identity and leaves the condition number unchanged. Using the exact inverse $M = A^{-1}$ does give immediate convergence, but computing it is as expensive as solving the system itself.
M = A^(-1)
x, iter, k = pcg_quadratic_tol(A, b, x0, M, true)
Consider now the following example.
n = 1000
A = zeros(n,n)+Diagonal([2+i*i for i=1:n])
for i = 1:n-1
A[i,i+1] = 1
A[i+1,i] = 1
end
A[n,1] = 1
A[1,n] = 1
cond(A)
κ = cond(A)
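The classical convergence bound for the conjugate gradient involves the factor $\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}$: the error in the $A$-norm is reduced at least by this factor at each iteration, where $\kappa$ is the condition number of $A$.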
(sqrt(κ)-1)/(sqrt(κ)+1)
A
A^(-1)
M = zeros(n,n)+I
x, iter, k = pcg_quadratic_tol(A, b, x0, M, true)
M = zeros(n,n)
for i = 1:n
M[i,i] = 1/A[i,i]
end
cond(A*M), cond(A)
M
A*M
x, iter, k = pcg_quadratic_tol(A, b, x0, M, true)