Assuming $B_k$ is a summetric positive definite matrix, the BFGS update is defined as $$ B_{k+1} = B_k - \frac{B_ks_ks_k^T B_k}{s_k^T B_k s_k} + \frac{y_ky_k^T}{s_k^Ty_k} $$ where $y_k = \nabla f(x_{k+1}) - \nabla f(x_k)$ and $s_k = x_{k+1} - x_k$. Its implementation in Julia is direct.
function BFGSUpdate(B, y, s)
Bs = B*s
return B - (Bs*Bs')/dot(s, Bs) + (y*y')/dot(s,y)
end
function BFGSUpdate!(B, y, s)
Bs = B*s
B[:,:] = B - (Bs*Bs')/dot(s, Bs) + (y*y')/dot(s,y)
end
using LinearAlgebra
n = 3
y = [ 1.0 2 3]'
s = [ 0.5 0.5 0.5 ]'
BFGSUpdate(I, y, s)
B = zeros(n,n)+I
BFGSUpdate(B, y, s)
B
BFGSUpdate!(B, y, s)
B
BFGSUpdate!(B, y, s)
It is however often more convenient to work with the inverse. A technical derivation gives
The corresponding Julia implementation follows.
function InvBFGSUpdate_naive(invB::Matrix, y::Vector, s::Vector)
ys = dot(y, s)
A = I-(s*y')/ys
return A*invB*A' + (s*s')/ys
end
This implementation is however inefficient as it involves a temporary matrix. We can avoid by reorganizing the terms, recognizing that $B_{k}^{-1}$ is symmetric, and that $y_{k}^{T}B_k^{-1}y_k$ and $s_k^Ty_k$ are scalar. This leads to $$ B_{k+1}^{-1} = B_{k}^{-1} + \frac{(s_k^Ty_k+y_k^TB_{k}^{-1}y_k)s_ks_k^T}{(s_k^Ty_k)^2} - \frac{B_k^{-1}y_ks_k^T + s_ky_k^TB_k^{-1}}{s_k^Ty_k} $$ The corresponding Julia implementation is
function InvBFGSUpdate(invB::Matrix, y::Vector, s::Vector)
ys = dot(y, s)
invBy = invB*y
return invB+1.0/ys*((ys+dot(y, invBy))/ys*(s*s') - invBy*s' - s*invBy')
end
using LinearAlgebra
n = 2
B = invB = zeros(n,n)+I
n, m = size(B)
n
s = [-1.75,-0.75]
y = [-8.5,-5.0]
Bp = BFGSUpdate(B, y, s)
inv(Bp)
invBp = InvBFGSUpdate_naive(invB, y, s)
invBp = InvBFGSUpdate(invB, y, s)
using BenchmarkTools
@btime InvBFGSUpdate_naive(invB, y, s)
@btime InvBFGSUpdate(invB, y, s)
n = 1000
B = invB = zeros(n,n)+I
n, m = size(B)
s = rand(n)
y = 10 * rand(n)
A1 = InvBFGSUpdate_naive(invB, y, s)
@btime InvBFGSUpdate_naive(invB, y, s)
A2 = InvBFGSUpdate(invB, y, s)
@btime InvBFGSUpdate(invB, y, s)
A1-A2
norm(A1-A2)
Adapted from https://en.wikipedia.org/wiki/Symmetric_rank-one
where $y_k = \nabla f(x_{\rm cand}) - \nabla f(x_k)$ and $s_k = x_{\rm cand} - x_k$.
Here $B_k$ is not necessarily positive definite.
The corresponding update to the approximate inverse-Hessian $H_{k}=B_{k}^{-1}$ is $$ H_{k+1}=H_{k}+{\frac{(s_{k}-H_{k}y_{k})(s_{k}-H_{k}y_{k})^{T}}{(s_{k}-H_{k}y_{k})^{T}y_{k}}}. $$ The SR1 formula has been rediscovered a number of times. A drawback is that the denominator can vanish. Some authors have suggested that the update be applied only if $$ |s_{k}^{T}(y_{k}-B_{k}s_{k})|\geq r\|s_{k}\|\|y_{k}-B_{k}s_{k}\|, $$ where $r \in (0,1)$ is a small number, e.g. $10^{-8}$.