Interior Points methods

IPOpt

In [1]:
using JuMP
using Ipopt
using LinearAlgebra
In [ ]:
"""
Problem: Solve the Rosenbrock function
"""

# For more information see here:
# https://jump.readthedocs.org/en/latest/installation.html#getting-solvers

# Information about Ipopt options here:
# http://www.coin-or.org/Ipopt/documentation/node39.html

m = Model(with_optimizer(Ipopt.Optimizer))
# setsolver(m, IpoptSolver(tol = 1e-6, max_iter = 200, output_file = "results.txt"))

@variable(m, x1)
@variable(m, x2)

@NLobjective(m, Min, 100(x2-x1^2)^2 + (1-x1)^2)

set_start_value(x1, -1.2)
set_start_value(x2, 1)

optimize!(m)

println("got ", objective_value(m), " at [", value(x1), ", ", value(x2), "]")
In [ ]:
print(m)
In [ ]:
m = Model(with_optimizer(Ipopt.Optimizer))

@variable(m, x[i=1:2])

@NLobjective(m, Min, 100(x[2]-x[1]^2)^2 + (1-x[1])^2)

set_start_value(x[1], -1.2)
set_start_value(x[2], 1)

println(m)

optimize!(m)

println("got ", objective_value(m), " at ", [value(x[1]), value(x[2])])
In [ ]:
include("mle.jl")
In [ ]:
n = 100

m = Model(with_optimizer(Ipopt.Optimizer))

@variable(m, x[i=1:n], start=(i/2))
@NLexpression(m, myexpr[i=1:n], sin(x[i]))
@NLconstraint(m, myconstr[i=1:n], myexpr[i] <= 0.5)
@NLobjective(m, Min, sum(x[i]^2-1 for i = 1:n))
In [ ]:
m
In [ ]:
optimize!(m)
In [ ]:
objective_value(m)
In [ ]:
sol = [value(x[i]) for i =1:100]
In [ ]:
sin.(sol)
In [ ]:
n = 100

m = Model(with_optimizer(Ipopt.Optimizer))

@variable(m, x[i=1:n], start=-1.0)
@NLexpression(m, myexpr[i=1:n], sin(x[i]))
@NLconstraint(m, myconstr[i=1:n], myexpr[i] <= 0.5)
@NLobjective(m, Min, sum(x[i]^2-1 for i = 1:n))
In [ ]:
optimize!(m)
In [ ]:
sol = [value(x[i]) for i =1:100]
In [ ]:
objective_value(m)
In [ ]:
sin.(sol)

Interior Points Methods for Linear Programming

Consider the program \begin{align*} \min_x\ & c^Tx \\ \mbox{s.t. } & Ax = b \\ & x \geq 0 \end{align*}

The Lagrangian is $$ L(x,\lambda,\mu) = c^Tx + \lambda^T(Ax - b) + \mu^T(-x) $$ and the duality bound is given by $$ \min_{x \in S} \max_{\mu \geq 0, \lambda} L(x,\lambda, \mu) $$ The KKT system becomes \begin{align*} c + A^T\lambda - \mu &= 0 \\ Ax &= b \\ x &\geq 0 \\ \mu_i x_i &= 0,\ i = 1,\ldots,n \\ \mu &\geq 0 \end{align*} Considering $\mu$ as a surplus vector, the first and fifth conditions can be rewritten as $$ c + A^T\lambda \geq 0 $$

At $(x^*, \lambda^*, \mu^*)$, $$ L(x^*, \lambda^*, \mu^*) = c^Tx^* + (\lambda^*)^T(Ax^*-b)-(\mu^*)^Tx^* = c^Tx^* $$

Moreover, the first KKT condition implies that at the primal-dual solution $(x^*, \lambda^*, \mu^*)$, the Lagrangian becomes $$ L(x^*, \lambda^*, \mu^*) = -b^T\lambda^* $$ as \begin{align*} L(x^*, \lambda^*, \mu^*) = c^Tx^* + (\lambda^*)^T(Ax^*-b)-(\mu^*)^Tx^* & = (c^T+(\lambda^*)^TA-(\mu^*)^T)x^*-(\lambda^*)^Tb \\ & = (x^*)^T(c+A^T\lambda^*-\mu^*)-b^T\lambda^* \end{align*}

Along with the duality bound, this means that we have to solve the problem \begin{align*} \max_{\lambda} \ & -b^T\lambda \\ \mbox{s.t. } & A^T\lambda \geq -c \end{align*} Setting $y = -\lambda$, the program can be rewritten as \begin{align*} \max_{y}\ & b^Ty \\ \mbox{s.t. } & A^Ty \leq c \end{align*} We retrieve the well-known dual problem is linear programming.

If we apply the logarithmic barrier to the primal linear program, we can rewrite it as \begin{align*} \min\ & c^Tx - \mu \sum_{i = 1}^n \ln x_i \\ \mbox{s.t. } & Ax = b, \end{align*} and the barrier Lagrangian is $$ L(x,\lambda) = c^Tx + \lambda^T(Ax - b) - \mu \sum_{i = 1}^n \ln x_i $$ and the KKT conditions are now \begin{align*} c + A^T\lambda - \mu X^{-1}e &= 0 \\ Ax &= b \end{align*} Set $s = \mu X^{-1}e$. We can rewrite the conditions as \begin{align*} c + A^T\lambda - s &= 0 \\ Ax &= b \\ s &= \mu X^{-1}e \end{align*} or \begin{align*} c + A^T\lambda - s &= 0 \\ Ax &= b \\ s_i x_i &= \mu \end{align*} The logarithmic barrier is equivalent to perturb the complementarity condition of the KKT system.

As before, set $y = -\lambda$. Solving the KKT system is equivalent to look for a solution of the linear system \begin{align*} A^Ty+s-c &= 0\\ Ax-b &= 0 \\ SXe-\mu e &= 0 \end{align*} The corresponding Newton equation is then $$ \begin{pmatrix} 0 & A^T & I \\ A & 0 & 0 \\ S & 0 & X \end{pmatrix} \begin{pmatrix} \Delta x \ \Delta y \ \Delta s

\end{pmatrix}

- \begin{pmatrix} A^Ty+s-c \\ Ax-b \\ SXe-\mu e \end{pmatrix} $$

Example: the farmer problem

Adapted from https://www.sonoma.edu/users/w/wilsonst/Courses/Math_131/lp/Farm.html

A farmer has 10 acres to plant in wheat and rye. He has to plant at least 7 acres. However, he has only CAD1200 to spend and each acre of wheat costs CAD200 to plant and each acre of rye costs CAD100 to plant. Moreover, the farmer has to get the planting done in 12 hours and it takes an hour to plant an acre of wheat and 2 hours to plant an acre of rye. If the profit is CAD500 per acre of wheat and CAD300 per acre of rye how many acres of each should be planted to maximize profits?

In [5]:
c = [-500.0 ; -300.0 ; 0.0 ; 0.0 ; 0.0; 0.0]
A = [1.0 1.0 1.0 0.0 0.0 0.0; 1.0 1.0 0.0 -1.0 0.0 0.0; 2.0 1.0 0.0 0.0 1.0 0.0; 1.0 2.0 0.0 0.0 0.0 1.0]
b = [10.0; 7.0; 12.0; 12]
x = [3.5; 4; 2.5; 0.5; 1; 0.5]
m, n = size(A)
μ = 10.0
y = zeros(4)
s = zeros(6)
for i = 1:6
    s[i] = μ/x[i]
end
w = [ x ; y; s ]
θ = 0.5
ρ = 0.9
Out[5]:
0.9
In [3]:
#using Gurobi
using Clp
In [6]:
model = Model(with_optimizer(Clp.Optimizer))
n = length(x)
m = length(b)

@variable(model, X[1:n] >= 0)
for i = 1:m
    @constraint(model, sum(A[i,j]*X[j] for j in 1:n) == b[i])
end
@objective(model, Min, sum(c[i]*X[i] for i in 1:n))

model
Out[6]:
$$ \begin{alignat*}{1}\min\quad & -500 X_{1} - 300 X_{2}\\ \text{Subject to} \quad & X_{1} \geq 0.0\\ & X_{2} \geq 0.0\\ & X_{3} \geq 0.0\\ & X_{4} \geq 0.0\\ & X_{5} \geq 0.0\\ & X_{6} \geq 0.0\\ & X_{1} + X_{2} + X_{3} = 10.0\\ & X_{1} + X_{2} - X_{4} = 7.0\\ & 2 X_{1} + X_{2} + X_{5} = 12.0\\ & X_{1} + 2 X_{2} + X_{6} = 12.0\\ \end{alignat*} $$
In [7]:
optimize!(model)
Coin0506I Presolve 4 (0) rows, 2 (-4) columns and 8 (-4) elements
Clp0006I 0  Obj -1516.8748 Primal inf 3.207812 (1) Dual inf 800 (2)
Clp0006I 2  Obj -3200
Clp0000I Optimal - objective value -3200
Coin0511I After Postsolve, objective -3200, infeasibilities - dual 0 (0), primal 0 (0)
Clp0032I Optimal objective -3200 - 2 iterations time 0.022, Presolve 0.01
In [8]:
objective_value(model)
Out[8]:
-3200.0
In [9]:
[value(X[i]) for i = 1:n]
Out[9]:
6-element Array{Float64,1}:
 4.0
 4.0
 2.0
 1.0
 0.0
 0.0
In [10]:
cost(c::Vector,x::Vector) = c'*x
Out[10]:
cost (generic function with 1 method)
In [11]:
cost(c,x)
Out[11]:
-2950.0
In [12]:
A*x-b
Out[12]:
4-element Array{Float64,1}:
 0.0
 0.0
 0.0
 0.0
In [13]:
s, x, s.*x
Out[13]:
([2.85714, 2.5, 4.0, 20.0, 10.0, 20.0], [3.5, 4.0, 2.5, 0.5, 1.0, 0.5], [10.0, 10.0, 10.0, 10.0, 10.0, 10.0])
In [14]:
m,n = size(A)
B = [ zeros(n,n) A' I ; A zeros(m,m) zeros(m,n) ; diagm(0 => x) zeros(n,m) diagm(0 => s)]
Out[14]:
16×16 Array{Float64,2}:
 0.0  0.0  0.0   0.0  0.0  0.0  1.0  …  1.0      0.0  0.0   0.0   0.0   0.0
 0.0  0.0  0.0   0.0  0.0  0.0  1.0     0.0      1.0  0.0   0.0   0.0   0.0
 0.0  0.0  0.0   0.0  0.0  0.0  1.0     0.0      0.0  1.0   0.0   0.0   0.0
 0.0  0.0  0.0   0.0  0.0  0.0  0.0     0.0      0.0  0.0   1.0   0.0   0.0
 0.0  0.0  0.0   0.0  0.0  0.0  0.0     0.0      0.0  0.0   0.0   1.0   0.0
 0.0  0.0  0.0   0.0  0.0  0.0  0.0  …  0.0      0.0  0.0   0.0   0.0   1.0
 1.0  1.0  1.0   0.0  0.0  0.0  0.0     0.0      0.0  0.0   0.0   0.0   0.0
 1.0  1.0  0.0  -1.0  0.0  0.0  0.0     0.0      0.0  0.0   0.0   0.0   0.0
 2.0  1.0  0.0   0.0  1.0  0.0  0.0     0.0      0.0  0.0   0.0   0.0   0.0
 1.0  2.0  0.0   0.0  0.0  1.0  0.0     0.0      0.0  0.0   0.0   0.0   0.0
 3.5  0.0  0.0   0.0  0.0  0.0  0.0  …  2.85714  0.0  0.0   0.0   0.0   0.0
 0.0  4.0  0.0   0.0  0.0  0.0  0.0     0.0      2.5  0.0   0.0   0.0   0.0
 0.0  0.0  2.5   0.0  0.0  0.0  0.0     0.0      0.0  4.0   0.0   0.0   0.0
 0.0  0.0  0.0   0.5  0.0  0.0  0.0     0.0      0.0  0.0  20.0   0.0   0.0
 0.0  0.0  0.0   0.0  1.0  0.0  0.0     0.0      0.0  0.0   0.0  10.0   0.0
 0.0  0.0  0.0   0.0  0.0  0.5  0.0  …  0.0      0.0  0.0   0.0   0.0  20.0

This is absolutely inefficient as the matrix is very sparse!

In [15]:
rhs = [ c-s-A'*y ; b-A*x ; [μ - x[i]*s[i] for i=1:n] ]
Out[15]:
16-element Array{Float64,1}:
 -502.85714285714283
 -302.5             
   -4.0             
  -20.0             
  -10.0             
  -20.0             
    0.0             
    0.0             
    0.0             
    0.0             
    0.0             
    0.0             
    0.0             
    0.0             
    0.0             
    0.0             
In [16]:
function d(c::Vector, A::Matrix, b::Vector, x::Vector, y::Vector, s::Vector, μ::Float64)
    m,n = size(A)
    B = [ zeros(n,n) A' I ; A zeros(m,m) zeros(m,n) ; diagm(0 => x) zeros(n,m) diagm(0 => s)]
    rhs = [ c-s-A'*y ; b-A*x ; [μ - x[i]*s[i] for i=1:n] ]
    Δ = B\rhs
    return Δ
end
Out[16]:
d (generic function with 1 method)
In [17]:
Δ = d(c,A,b,x,y,s,μ)
Out[17]:
16-element Array{Float64,1}:
  193.06735751295338  
   38.66913397483349  
 -231.73649148778676  
  231.73649148778688  
 -424.80384900074023  
 -270.4056254626204   
 -148.83530717986667  
   14.20658771280533  
  -52.48038490007405  
  -26.760140636565513 
 -236.5075129533679   
  -61.87061435973359  
  144.83530717986673  
   -5.79341228719467  
   42.48038490007402  
    6.7601406365655095
In [18]:
Δx = Δ[1:n]
Δy = Δ[n+1:n+m]
Δs = Δ[n+m+1:n+m+n]
A*(x+Δx)
Out[18]:
4-element Array{Float64,1}:
 10.000000000000107
  7.0              
 12.0              
 12.0              
In [19]:
x+Δx, cost(c,x+Δx)
Out[19]:
([196.567, 42.6691, -229.236, 232.236, -423.804, -269.906], -111084.41894892673)
In [22]:
function maxα(x::Vector, s::Vector, Δx::Vector, Δs::Vector)
    n = length(c)
    α = Inf
    for i=1:n
        if (Δx[i] < 0)
            α = min(α,-x[i]/Δx[i])
        end
        if (Δs[i] < 0)
            α = min(α,-s[i]/Δs[i])
        end
    end
    return α
end
Out[22]:
maxα (generic function with 1 method)
In [23]:
α = maxα(x,s,Δx,Δs)
Out[23]:
0.0018490739574836164
In [24]:
x+α*Δx
Out[24]:
6-element Array{Float64,1}:
 3.856995822817381  
 4.071502088591309  
 2.0715020885913096 
 0.9284979114086904 
 0.21450626577392862
 0.0                
In [25]:
cost(c,x+α*Δx)
Out[25]:
-3149.9485379860835
In [26]:
s+α*Δs
Out[26]:
6-element Array{Float64,1}:
  2.4198229741915656
  2.3855966582539048
  4.267811194630432 
 19.989287552214783 
 10.078549373422607 
 20.0125            
In [27]:
α *= θ
Out[27]:
0.0009245369787418082
In [28]:
x += α*Δx
y += α*Δy
s += α*Δs

x, y, s
Out[28]:
([3.6785, 4.03575, 2.28575, 0.714249, 0.607253, 0.25], [-0.137604, 0.0131345, -0.0485201, -0.0247407], [2.63848, 2.4428, 4.13391, 19.9946, 10.0393, 20.0063])
In [29]:
A*x
Out[29]:
4-element Array{Float64,1}:
 10.0
  7.0
 12.0
 12.0
In [30]:
cost(c,x)
Out[30]:
-3049.9742689930417
In [31]:
μ *= ρ
Out[31]:
9.0
In [32]:
while (μ > 1e-50)
    Δ = d(c,A,b,x,y,s,μ)
    Δx = Δ[1:n]
    Δy = Δ[n+1:n+m]
    Δs = Δ[n+m+1:n+m+n]
    α = θ*maxα(x,s,Δx,Δs)
    x += α*Δx
    y += α*Δy
    s += α*Δs
    μ *= ρ
end
In [33]:
x
Out[33]:
6-element Array{Float64,1}:
 3.8369541409781034 
 4.081522929510947  
 2.081522929510949  
 0.9184770704890519 
 0.24456878853284386
 2.21e-321          
In [34]:
cost(c,x)
Out[34]:
-3142.933949342336
In [ ]: