📖 Constrained optimization#

โฑ | words

Setting up a constrained optimization problem#

Let's start by recalling the definition of a general optimization problem

Definition

The general form of the optimization problem is

\[\begin{split} V(\theta) = \max_{x} f(x,\theta) \\ \text {subject to} \\ g_i(x,\theta) = 0, \; i\in\{1,\dots,I\}\\ h_j(x,\theta) \le 0, \; j\in\{1,\dots,J\} \end{split}\]

where:

  • \(f(x,\theta) \colon \mathbb{R}^N \times \mathbb{R}^K \to \mathbb{R}\) is an objective function

  • \(x \in \mathbb{R}^N\) are decision/choice variables

  • \(\theta \in \mathbb{R}^K\) are parameters

  • \(g_i(x,\theta) = 0, \; i\in\{1,\dots,I\}\) where \(g_i \colon \mathbb{R}^N \times \mathbb{R}^K \to \mathbb{R}\), are equality constraints

  • \(h_j(x,\theta) \le 0, \; j\in\{1,\dots,J\}\) where \(h_j \colon \mathbb{R}^N \times \mathbb{R}^K \to \mathbb{R}\), are inequality constraints

  • \(V(\theta) \colon \mathbb{R}^K \to \mathbb{R}\) is a value function

Today we focus on problems with constraints, i.e. \(I+J>0\)

The obvious classification that we follow

  • equality constrained problems, i.e. \(I>0\), \(J=0\)

  • inequality constrained problems, i.e. \(J>0\), which include equality constraints as a special case

Note

Note that every constrained optimization problem can equivalently be seen as an unconstrained one, if we define the domain of the objective function to coincide with the admissible set.

\[\begin{split} \begin{array}{c} \max_x f(x): \mathbb{R}^N \to \mathbb{R} \\ \text {subject to} \\ g_i(x) = 0, \; i\in\{1,\dots,I\}\\ h_j(x) \le 0, \; j\in\{1,\dots,J\} \end{array} \end{split}\]

is equivalent to

\[\begin{split} \begin{array}{c} \max_x f(x): A \to \mathbb{R} \\ A = \big\{x \in \mathbb{R}^N \colon g_i(x) = 0, \; i\in\{1,\dots,I\} \text{ and } h_j(x) \le 0, \; j\in\{1,\dots,J\}\big\} \end{array} \end{split}\]

Constrained Optimization = economics#

A major focus of econ: the optimal allocation of scarce resources

  • optimal = optimization

  • scarce = constrained

Standard constrained problems:

  • Maximize utility given budget

  • Maximize portfolio return given risk constraints

  • Minimize cost given output requirement

Example

Maximization of utility subject to budget constraint

\[ \max u(x_1, x_2) \; \text{ such that } \; p_1 x_1 + p_2 x_2 \leq m \]
  • \(p_i\) is the price of good \(i\), assumed \(> 0\)

  • \(m\) is the budget, assumed \(> 0\)

  • \(x_i \geq 0\) for \(i = 1,2\)

  • let \(u(x_1, x_2) = \alpha \log(x_1) + \beta \log(x_2)\), \(\alpha>0\), \(\beta>0\)


Fig. 81 Budget set when \(p_1=0.8\), \(p_2 = 1.2\), \(m=4\)#


Fig. 82 Log utility with \(\alpha=0.4\), \(\beta=0.5\)#


Fig. 83 Log utility with \(\alpha=0.4\), \(\beta=0.5\)#

We seek a bundle \((x_1^\star, x_2^\star)\) that maximizes \(u\) over the budget set

That is,

\[ \alpha \log(x_1^\star) + \beta \log(x_2^\star) \geq \alpha \log(x_1) + \beta \log(x_2) \]

for all \((x_1, x_2)\) satisfying \(x_i \geq 0\) for each \(i\) and

\[ p_1 x_1 + p_2 x_2 \leq m \]

Visually, here is the budget set and objective function:


Fig. 84 Utility max for \(p_1=1\), \(p_2 = 1.2\), \(m=4\), \(\alpha=0.4\), \(\beta=0.5\)#

First observation: \(u(0, x_2) = u(x_1, 0) = u(0, 0) = -\infty\)

  • Hence we need consider only strictly positive bundles

Second observation: \(u(x_1, x_2)\) is strictly increasing in both \(x_i\)

  • Never choose a point \((x_1, x_2)\) with \(p_1 x_1 + p_2 x_2 < m\)

  • Otherwise can increase \(u(x_1, x_2)\) by small change

Hence we can replace \(\leq\) with \(=\) in the constraint

\[ p_1 x_1 + p_2 x_2 \leq m \quad \text{becomes} \quad p_1 x_1 + p_2 x_2 = m \]

Implication: Just search along the budget line

Substitution Method#

Substitute constraint into objective function and treat the problem as unconstrained

This changes

\[ \max_{x_1, x_2} \left\{ \alpha \log(x_1) + \beta \log(x_2) \right\} \; \text{ such that } \; p_1 x_1 + p_2 x_2 = m \]

into

\[ \max_{x_1} \left\{ \alpha \log(x_1) + \beta \log((m - p_1x_1) / p_2) \right\} \]

Since all candidates satisfy \(x_1 > 0\) and \(x_2 > 0\), the domain is

\[ 0 < x_1 < \frac{m}{p_1} \]

Fig. 85 Utility max for \(p_1=1\), \(p_2 = 1.2\), \(m=4\), \(\alpha=0.4\), \(\beta=0.5\)#


Fig. 86 Utility max for \(p_1=1\), \(p_2 = 1.2\), \(m=4\), \(\alpha=0.4\), \(\beta=0.5\)#

First order condition for

\[ \max_{x_1} \left\{ \alpha \log(x_1) + \beta \log((m - p_1x_1) / p_2) \right\} \]

gives

\[ x_1^\star = \frac{\alpha}{\alpha + \beta} \cdot \frac{m}{p_1} \]

Exercise: Verify

Exercise: Check second order condition (strict concavity)

Substituting into \(p_1 x_1^\star + p_2 x_2^\star = m\) gives

\[ x_2^\star = \frac{\beta}{\beta + \alpha} \cdot \frac{m}{p_2} \]

Fig. 87 Maximizer for \(p_1=1\), \(p_2 = 1.2\), \(m=4\), \(\alpha=0.4\), \(\beta=0.5\)#
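
The closed form solution can also be verified numerically. Below is a minimal sketch (assuming SciPy and NumPy are available) that maximizes the substituted one-dimensional objective over \(0 < x_1 < m/p_1\) using the parameter values from the figures; the `1e-6` offsets are just a hypothetical way to keep the optimizer strictly inside the open interval.

```python
# Numerical check of x1* = alpha/(alpha+beta) * m/p1 for the substituted problem
import numpy as np
from scipy.optimize import minimize_scalar

alpha, beta = 0.4, 0.5
p1, p2, m = 1.0, 1.2, 4.0

def neg_utility_on_budget_line(x1):
    """Minus the utility along the budget line, where x2 = (m - p1*x1)/p2."""
    x2 = (m - p1 * x1) / p2
    return -(alpha * np.log(x1) + beta * np.log(x2))

res = minimize_scalar(neg_utility_on_budget_line,
                      bounds=(1e-6, m / p1 - 1e-6), method='bounded')
print(res.x, alpha / (alpha + beta) * m / p1)  # both approximately 1.7778
```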

Substitution method: algorithm#

How to solve

\[\begin{split} \max_{x_1, x_2} \; f(x_1, x_2) \\ \text{ such that } g(x_1, x_2) = 0 \end{split}\]

Steps:

  1. Write constraint as \(x_2 = h(x_1)\) for some function \(h\)

  2. Solve the univariate problem \(\max_{x_1} f(x_1, h(x_1))\) to get \(x_1^\star\)

  3. Plug \(x_1^\star\) into \(x_2 = h(x_1)\) to get \(x_2^\star\)

Limitations: substitution doesn't always work

Example

Suppose that \(g(x_1, x_2) = x_1^2 + x_2^2 - 1\)

Step 1 from the cookbook says

use \(g(x_1, x_2) = 0\) to write \(x_2\) as a function of \(x_1\)

But \(x_2\) has two possible values for each \(x_1 \in (-1, 1)\)

\[ x_2 = \pm \sqrt{1 - x_1^2} \]

In other words, \(x_2\) is not a well defined function of \(x_1\)

As we encounter more complicated constraints, such problems intensify

  • Constraint defines complex curve in \((x_1, x_2)\) space

  • Inequality constraints, etc.

We need a more general, systematic approach

Tangency conditions#

Optimization approach based on tangency of contours and constraints

Example

Consider again the problem

\[\begin{split} \max_{x_1, x_2} \left\{ \alpha \log(x_1) + \beta \log(x_2) \right\} \\ \text{ such that } p_1 x_1 + p_2 x_2 = m \end{split}\]

Turns out that the maximizer has the following property:

  • budget line is tangent to an indifference curve at maximizer


Fig. 88 Maximizer for \(p_1=1\), \(p_2 = 1.2\), \(m=4\), \(\alpha=0.4\), \(\beta=0.5\)#

In fact this is an instance of a general pattern

Notation: Let's call \((x_1, x_2)\) interior to the budget line if \(x_i > 0\) for \(i=1,2\) (not a "corner" solution, see below)

In general, any interior maximizer \((x_1^\star, x_2^\star)\) of differentiable utility function \(u\) has the property: budget line is tangent to a contour line at \((x_1^\star, x_2^\star)\)

Otherwise we can do better:


Fig. 89 When tangency fails we can do better#

Necessity of tangency often rules out a lot of points

We exploit this fact to build intuition and develop a more general method

Relative Slope Conditions#

Consider an equality constrained optimization problem where objective and constraint functions are differentiable:

\[\begin{split} f(x_1, x_2) \longrightarrow \max_{x_1, x_2} \\ \text{ such that } g(x_1, x_2) = 0 \end{split}\]

How to develop necessary conditions for optima via tangency?


Fig. 90 Contours for \(f\) and \(g\)#


Fig. 91 Contours for \(f\) and \(g\)#


Fig. 92 Tangency necessary for optimality#

How do we locate an optimal \((x_1, x_2)\) pair?

Diversion: derivative of an implicit function

To continue with the tangency approach we need to know how to differentiate an implicit function defined by a contour line of the utility function, \(f(x_1, x_2)=c\)

Let's approach this task in the following way

  • utility function is given by \(f \colon \mathbb{R}^2 \to \mathbb{R}\)

  • the contour line is given by \(f(x_1,x_2) = c\) which defines an implicit function \(x_1(x_2)\)

  • define the map \(g \colon \mathbb{R} \to \mathbb{R}^2\) given by \(g(x_2) = \big[x_1(x_2),x_2\big]\)

The total derivative of \(f \circ g\) can be derived using the chain rule

\[\begin{split} Dg(x_2) = \left( \begin{array}{c} \frac{d x_1}{d x_2} \\ 1 \end{array} \right) \\ Df(x) = \left( \frac{\partial f}{\partial x_1}(x), \frac{\partial f}{\partial x_2}(x) \right) = \big(f_1(x),f_2(x)\big) \\ D(f \circ g)(x_2) = Df\big(g(x_2)\big) \cdot Dg(x_2) = f_1\big(g(x_2)\big) \frac{d x_1}{d x_2} + f_2\big(g(x_2)\big) \end{split}\]

Differentiating both sides of the equation \(f(x_1,x_2) = c\) with respect to \(x_2\) gives \(D(f \circ g)(x_2) = 0\), thus leading to the final expression for the derivative of the implicit function

\[ \frac{d x_1}{d x_2}(x_2) = - \frac{f_2\big(x_1(x_2),x_2\big)}{f_1\big(x_1(x_2),x_2\big)} \]

Fig. 93 Slope of contour lines#
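
As a quick symbolic sanity check of this formula, here is a small sketch (assuming SymPy is available) that solves the contour equation of the log utility for \(x_1\) explicitly and compares its derivative with \(-f_2/f_1\).

```python
import sympy as sp

x1, x2, c, alpha, beta = sp.symbols('x1 x2 c alpha beta', positive=True)
f = alpha * sp.log(x1) + beta * sp.log(x2)

# contour f(x1, x2) = c solved for x1 as an explicit function of x2
x1_of_x2 = sp.solve(sp.Eq(f, c), x1)[0]

lhs = sp.diff(x1_of_x2, x2)                                   # d x1 / d x2 along the contour
rhs = (-sp.diff(f, x2) / sp.diff(f, x1)).subs(x1, x1_of_x2)   # -f2/f1 evaluated on the contour
print(sp.simplify(lhs - rhs))                                 # 0
```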

Let's choose \((x_1, x_2)\) to equalize the slopes

That is, choose \((x_1, x_2)\) to solve

\[ - \frac{f_1(x_1, x_2)} {f_2(x_1, x_2)} = - \frac{g_1(x_1, x_2)} {g_2(x_1, x_2)} \]

Equivalent:

\[ \frac{f_1(x_1, x_2)} {f_2(x_1, x_2)} = \frac{g_1(x_1, x_2)} {g_2(x_1, x_2)} \]

Also need to respect \(g(x_1, x_2) = 0\)


Fig. 94 Condition for tangency#

Tangency condition algorithm#

In summary, when \(f\) and \(g\) are both differentiable functions, to find candidates for optima in

\[ \max_{x_1, x_2} f(x_1, x_2) \]
\[ \text{ such that } \; g(x_1, x_2) = 0 \]
  1. (Impose slope tangency) Set

\[ \frac{f_1(x_1, x_2)}{f_2(x_1, x_2)} = \frac{g_1(x_1, x_2)}{g_2(x_1, x_2)} \]
  2. (Impose constraint) Set \(g(x_1, x_2) = 0\)

  3. Solve simultaneously for \((x_1, x_2)\) pairs satisfying these conditions

Example

Consider again

\[ \max_{x_1, x_2} \left\{ \alpha \log(x_1) + \beta \log(x_2) \right\} \]
\[ \text{ such that } \; p_1 x_1 + p_2 x_2 - m = 0 \]

Then

\[ \frac{f_1(x_1, x_2)}{f_2(x_1, x_2)} = \frac{g_1(x_1, x_2)}{g_2(x_1, x_2)} \quad \iff \quad \frac{\alpha}{\beta} \frac{x_2}{x_1} = \frac{p_1}{p_2} \]

Solving simultaneously with \(p_1 x_1 + p_2 x_2 = m\) gives

\[ x_1^\star = \frac{\alpha}{\alpha + \beta} \cdot \frac{m}{p_1} \quad \text{and} \quad x_2^\star = \frac{\beta}{\beta + \alpha} \cdot \frac{m}{p_2} \]

Same as before…

This approach can be applied to maximization as well as minimization problems
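
For completeness, here is a small SymPy sketch (SymPy assumed available) of the two tangency-condition steps applied to the log utility example: equalize the slopes and impose the budget constraint.

```python
import sympy as sp

x1, x2, alpha, beta, p1, p2, m = sp.symbols('x1 x2 alpha beta p1 p2 m', positive=True)
f = alpha * sp.log(x1) + beta * sp.log(x2)        # objective
g = p1 * x1 + p2 * x2 - m                         # constraint g(x1, x2) = 0

slope_tangency = sp.Eq(sp.diff(f, x1) / sp.diff(f, x2), sp.diff(g, x1) / sp.diff(g, x2))
constraint = sp.Eq(g, 0)

print(sp.solve([slope_tangency, constraint], [x1, x2], dict=True))
# x1 = alpha*m/((alpha+beta)*p1), x2 = beta*m/((alpha+beta)*p2)
```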

The method of Lagrange multipliers#

Most general method for equality constrained optimization

Lagrange theorem

Let \(f \colon \mathbb{R}^N \to \mathbb{R}\) and \(g \colon \mathbb{R}^N \to \mathbb{R}^K\) be continuously differentiable functions.

Let \(D = \{ x \colon g_i(x) = 0, i=1,\dots,K \} \subset \mathbb{R}^N\)

Suppose that

  • \(x^\star \in D\) is a local extremum of \(f\) on \(D\), and

  • the gradients of the constraint functions \(g_i\) are linearly independent at \(x^\star\) (equivalently, the rank of the Jacobian matrix \(Dg(x^\star)\) is equal to \(K\))

Then there exists a vector \(\lambda^\star \in \mathbb{R}^K\) such that

\[ Df(x^\star) - \lambda^\star \cdot Dg(x^\star) = Df(x^\star) - \sum_{i=1}^K \lambda_i^\star Dg_i(x^\star) = 0 \]

Note

  • \(\lambda^\star\) is called the vector of Lagrange multipliers

  • The assumption that the gradients of the constraint functions \(g_i\) are linearly independent at \(x^\star\) is called the constraint qualification assumption

  • Lagrange theorem provides necessary first order conditions for both maximum and minimum problems

Definition

The function \(\mathcal{L} \colon \mathbb{R}^{N+K} \to \mathbb{R}\) defined as

\[ \mathcal{L}(x,\lambda) = f(x) - \lambda \cdot g(x) = f(x) - \sum_{i=1}^K \lambda_i g_i(x) \]

is called a Lagrangian (function)

  • Different sources define the Lagrangian with minus or plus in front of the second term: mathematically the two definitions are equivalent, but the choice of the sign affects the interpretation of the Lagrange multipliers

  • I like to have a minus in the Lagrangian for consistency with the inequality constrained case where the sign is important

  • We come back to this in the next lecture

Fact

First order conditions for the Lagrangian coincide with the conditions in the Lagrange theorem, and together with the constraint qualification assumption they provide necessary first order conditions for the constrained optimization problem.

Example

\[\begin{split} \max_{x_1, x_2} \left\{ \alpha \log(x_1) + \beta \log(x_2) \right\} \\ \text{ such that } \; p_1 x_1 + p_2 x_2 - m = 0 \end{split}\]

Form the Lagrangian

\[ \mathcal{L}(x_1,x_2,\lambda) = \alpha \log(x_1) + \beta \log(x_2) - \lambda(p_1 x_1 + p_2 x_2 - m) \]

FOC

\[\begin{split} \frac{\partial \mathcal{L}}{\partial x_1} = 0 \implies \frac{\alpha}{x_1} - \lambda p_1 = 0\\ \frac{\partial \mathcal{L}}{\partial x_2} = 0 \implies \frac{\beta}{x_2} - \lambda p_2 = 0 \\ \frac{\partial \mathcal{L}}{\partial \lambda} = 0 \implies x_1 p_1 + x_2 p_2 -m = 0 \end{split}\]

From the first two equations we have \(x_1 = \frac{\alpha}{\lambda p_1}\) and \(x_2 = \frac{\beta}{\lambda p_2}\), and then from the third one \(\frac{\alpha}{\lambda} + \frac{\beta}{\lambda} = m\). This gives \(\lambda = \frac{\alpha + \beta}{m}\), leading to the same result

\[ x_1^\star = \frac{\alpha}{\alpha + \beta} \cdot \frac{m}{p_1} \quad \text{and} \quad x_2^\star = \frac{\beta}{\beta + \alpha} \cdot \frac{m}{p_2} \]
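
The derivatives above can also be delegated to a computer algebra system. A minimal SymPy sketch (SymPy assumed available):

```python
import sympy as sp

x1, x2, lam = sp.symbols('x1 x2 lambda', positive=True)
alpha, beta, p1, p2, m = sp.symbols('alpha beta p1 p2 m', positive=True)

L = alpha * sp.log(x1) + beta * sp.log(x2) - lam * (p1 * x1 + p2 * x2 - m)
foc = [sp.diff(L, v) for v in (x1, x2, lam)]      # stationarity in x1, x2 and lambda

sol = sp.solve(foc, [x1, x2, lam], dict=True)[0]
print(sol)  # x1 = alpha*m/((alpha+beta)*p1), x2 = beta*m/((alpha+beta)*p2), lambda = (alpha+beta)/m
```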

Lagrangian as tangency condition#

\[ \max_{x_1, x_2} f(x_1, x_2) \; \text{ such that } \; g(x_1, x_2) = 0 \]

Set

\[ \mathcal{L}(x_1, x_2, \lambda) := f(x_1, x_2) - \lambda g(x_1, x_2) \]

and solve the system of

\[ \frac{\partial \mathcal{L}}{\partial x_1} = 0, \quad \frac{\partial \mathcal{L}}{\partial x_2} = 0, \quad \frac{\partial \mathcal{L}}{\partial \lambda} = 0 \]

We have

\[ \frac{\partial \mathcal{L}}{\partial x_i}(x_1, x_2, \lambda) = 0 \iff f_i(x_1, x_2) = \lambda g_i(x_1, x_2), \; i=1,2 \]

Hence \(\frac{\partial \mathcal{L}}{\partial x_1} = \frac{\partial \mathcal{L}}{\partial x_2} = 0\) gives

\[ \frac{f_1(x_1, x_2)}{f_2(x_1, x_2)} = \frac{g_1(x_1, x_2)}{g_2(x_1, x_2)} \]

Finally

\[ \frac{\partial \mathcal{L}}{\partial \lambda}(x_1, x_2, \lambda) = 0 \iff g(x_1, x_2) = 0 \]

Hence the method leads us to the same conditions as the tangency condition method!

Second order conditions#

  • similar to the unconstrained case, but applied to the Lagrangian

  • have to do with local optima only

  • similarly, provide sufficient conditions in the strict case, but only necessary conditions in the weak case

Using the rules of differentiation from multivariate calculus we can express the gradient and the Hessian of the Lagrangian w.r.t. \(x\) as

\[\begin{split} \nabla_x \mathcal{L}(\lambda,x) = Df(x) - \lambda \cdot Dg(x)\\ H_x \mathcal{L}(\lambda,x) = Hf(x) - \lambda \odot Hg(x) \end{split}\]
  • \(\odot\) denotes multiplication of the vector \(\lambda\) and the Hessian \((K,N,N)\) tensor \(Hg(x)\) along the first dimension
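
Numerically, the contraction \(\lambda \odot Hg(x)\) is just a weighted sum of the constraint Hessians. A small sketch with NumPy (function name and inputs are hypothetical):

```python
import numpy as np

def hessian_x_of_lagrangian(Hf, Hg, lam):
    """H_x L = Hf(x) - sum_k lam[k] * Hg[k], with Hf (N,N), Hg (K,N,N), lam (K,)."""
    return Hf - np.einsum('k,kij->ij', lam, np.asarray(Hg))
```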

Fact (necessary SOC)

Let \(f \colon \mathbb{R}^N \to \mathbb{R}\) and \(g \colon \mathbb{R}^N \to \mathbb{R}^K\) be twice continuously differentiable functions (\(C^2\)) and let \(D = \{ x \colon g_i(x) = 0, i=1,\dots,K \} \subset \mathbb{R}^N\)

Suppose \(x^\star \in D\) is a local constrained optimizer of \(f\) on \(D\), the constraint qualification assumption holds at \(x^\star\), and \(\lambda^\star \in \mathbb{R}^K\) is the corresponding vector of Lagrange multipliers from the first order conditions

Then:

  • If \(f\) has a local maximum on \(D\) at \(x^\star\), then \(H_x \mathcal{L}(\lambda^\star,x^\star)\) defined above is negative semi-definite on the linear constraint set \(\{v: Dg(x^\star) v = 0\}\), that is

\[ \forall v \in \mathbb{R}^N \text{ such that } Dg(x^\star) v = 0 \colon \; v' H_x \mathcal{L}(\lambda^\star,x^\star) v \le 0 \]
  • If \(f\) has a local minimum on \(D\) at \(x^\star\), then \(H_x \mathcal{L}(\lambda^\star,x^\star)\) defined above is positive semi-definite on the linear constraint set \(\{v: Dg(x^\star) v = 0\}\), that is

\[ \forall v \in \mathbb{R}^N \text{ such that } Dg(x^\star) v = 0 \colon \; v' H_x \mathcal{L}(\lambda^\star,x^\star) v \ge 0 \]

Fact (sufficient SOC)

Let \(f \colon \mathbb{R}^N \to \mathbb{R}\) and \(g \colon \mathbb{R}^N \to \mathbb{R}^K\) be twice continuously differentiable functions (\(C^2\)) and let \(D = \{ x \colon g_i(x) = 0, i=1,\dots,K \} \subset \mathbb{R}^N\)

Suppose there exists \(x^\star \in D\) such that \(\mathrm{rank}(Dg(x^\star)) = K\) and \(Df(x^\star) - \lambda^\star \cdot Dg(x^\star) = 0\) for some \(\lambda^\star \in \mathbb{R}^K\) (the constraint qualification assumption holds at \(x^\star\) which satisfies the FOC)

Then:

  • If \(H_x \mathcal{L}(\lambda^\star,x^\star)\) defined above is negative definite on the linear constraint set \(\{v: Dg(x^\star) v = 0\}\), that is

\[ \forall v \ne 0 \text{ such that } Dg(x^\star) v = 0 \colon \; v' H_x \mathcal{L}(\lambda^\star,x^\star) v < 0, \]

then \(x^\star\) is a local constrained maximum of \(f\) on \(D\)

  • If \(H_x \mathcal{L}(\lambda^\star,x^\star)\) defined above is positive definite on the linear constraint set \(\{v: Dg(x^\star) v = 0\}\), that is

\[ \forall v \ne 0 \text{ such that } Dg(x^\star) v = 0 \colon \; v' H_x \mathcal{L}(\lambda^\star,x^\star) v > 0, \]

then \(x^\star\) is a local constrained minimum of \(f\) on \(D\)

Thus, the structure of both the necessary and the sufficient conditions is very similar to the unconstrained case.

The important difference is that the definiteness of the Hessian is checked on the linear constraint set \(\{v: Dg(x^\star) v = 0\}\) rather than on the whole \(\mathbb{R}^N\).

The linear constraint set represents the tangent hyperplane to the constraint set at \(x^\star\).

But how do we check for definiteness on a linear constraint set?

Bordered Hessian#

Let's start with an example where we just apply the unconstrained second order conditions to the Lagrangian function

Example

\[\begin{split} \max_{x, y} \left\{ \log(x) + 3 \log(y) \right\} \\ \text{ such that } \; x^2 + y^2 = 1, x >0, y>0 \end{split}\]

The Lagrangian is

\[ \mathcal{L}(x,y,\lambda) = \log(x) + 3 \log(y) - \lambda(x^2 + y^2 - 1) \]

The first order conditions are then

\[\begin{split} \begin{cases} \frac{\partial \mathcal{L}}{\partial x} = \frac{1}{x}-2\lambda x = 0 \\ \frac{\partial \mathcal{L}}{\partial y} = \frac{3}{y}-2\lambda y = 0 \\ \frac{\partial \mathcal{L}}{\partial \lambda} = - (x^2 + y^2 - 1) = 0 \end{cases} \iff \begin{cases} 2\lambda x^2 = 1 \\ 2\lambda y^2 = 3 \\ x^2 + y^2 = 1 \end{cases} \end{split}\]

Adding the first and second equation, we get \(2\lambda(x^2+y^2) = 4\), and therefore combining with the third, \(\lambda = 2\). From this, \(x^2 = 1/4\), \(x = 1/2\), and \(y^2 = 3/4\), \(y = \sqrt{3}/2\) (taking only the positive solutions as required).

The Hessian of the Lagrangian function is traditionally written with Lagrange multipliers as first variables, so we have

\[\begin{split} H \mathcal{L}(\lambda,x,y) = \begin{pmatrix} 0 & -2x & -2y \\ -2x & -\frac{1}{x^2}-2\lambda & 0 \\ -2y & 0 & -\frac{3}{y^2}-2\lambda \end{pmatrix} \end{split}\]

Evaluated at the stationary point \((1/2,\sqrt{3}/2)\) and \(\lambda=2\)

\[\begin{split} H \mathcal{L}\left(2,\frac{1}{2},\frac{\sqrt{3}}{2}\right) = \begin{pmatrix} 0 & -1 & -\sqrt{3} \\ -1 & -8 & 0 \\ -\sqrt{3} & 0 & -8 \end{pmatrix} \end{split}\]

The leading principal minors are \(0\), \(-1\) and \(32\), which does not fit any of the five patterns of the Sylvester criterion. In other words, naively applying the unconstrained second order conditions to the Lagrangian is the wrong approach here.

The problem can instead be solved by the substitution \(y = \sqrt{1-x^2}\), and it's not too hard to show that the resulting one-dimensional objective is concave, so the stationary point found above is indeed a maximizer.
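
A quick symbolic check of this concavity claim (a sketch assuming SymPy is available):

```python
import sympy as sp

x = sp.Symbol('x', positive=True)
objective = sp.log(x) + 3 * sp.log(sp.sqrt(1 - x**2))   # substituted one-dimensional objective
second_derivative = sp.simplify(sp.diff(objective, x, 2))
print(second_derivative)
print([second_derivative.subs(x, v) < 0 for v in (0.1, 0.5, 0.9)])  # negative on (0, 1)
```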

We need new theory!

  • Note that the upper left corner in the Hessian of the Lagrangian is always zero

Definition

The Hessian of the Lagrangian with respect to \((\lambda, x)\) is called a bordered Hessian, and is given by

\[\begin{split} H \mathcal{L}(\lambda,x) = \left( \begin{array}{ll} [0]_{K \times K} & [-Dg(x)]_{K \times N} \\ [-Dg(x)]'_{N \times K} & [Hf(x) - \lambda \odot Hg(x)]_{N \times N} \end{array} \right) \end{split}\]
  • Note that the bordered Hessian \(H \mathcal{L}(\lambda,x)\) is different from the Hessian of the Lagrangian w.r.t. \(x\) denoted above as \(H_x \mathcal{L}(\lambda,x)\)

  • \(H \mathcal{L}(\lambda,x)\) is \((N+K \times N+K)\) matrix

  • \(H_x \mathcal{L}(\lambda,x)\) is \((N \times N)\) matrix

The bordered Hessian can be derived by differentiating the gradient

\[ \nabla \mathcal{L}(\lambda,x) = \big(-g(x) \;, \; Df(x) - \lambda \cdot Dg(x)\big) \]

with respect to \((\lambda,x)\), using the standard approach from multivariate calculus

  • the subscripts in the definition above denote the shape of the blocks in the bordered Hessian

  • by convention the Lagrange multipliers are placed before the other variables.

  • however, some textbooks place the "border", i.e. the Jacobians of the constraints, in the last rows and columns, with the Hessian of the objective w.r.t. \(x\) in the upper left corner, with corresponding changes in the propositions below

  • the sign of the second term in the Lagrangian also varies between texts

Diversion: definiteness of quadratic form subject to linear constraint#

Let us review the theory using simpler case of quadratic forms

Definition

Let \(x \in \mathbb{R}^N\), \(A\) is \((N \times N)\) symmetric matrix and \(B\) is \((K \times N)\) matrix.

Quadratic form \(Q(x)\) is called positive definite subject to a linear constraint \(Bx=0\) if

\[ Q(x) = x^T A x >0 \text{ for all } x\ne 0 \text{ and } Bx=0 \]

Quadratic form \(Q(x)\) is called negative definite subject to a linear constraint \(Bx=0\) if

\[ Q(x) = x^T A x <0 \text{ for all } x\ne 0 \text{ and } Bx=0 \]

Positive and negative semi-definiteness subject to a linear constraint is obtained by replacing strict inequalities with non-strict ones.

  • recall that \(Bx=0\) is equivalent to saying that \(x\) is in the null space (kernel) of \(B\)

  • we assume that \(N>K\), so that the null space of \(B\) has non-zero dimension \(N-K\)

  • we also assume that \(\mathrm{rank}(B)=K\), so that the rows of \(B\) are linearly independent, that is, no constraint is redundant

Without loss of generality (due to reordering of the variables) we can decompose the \((K \times N)\) matrix \(B\) as

\[ B = \big(B_1,B_2 \big): B_1 \text{ is } (K \times K) \text{ and } \mathrm{rank}(B_1)=K, \]

\(B_2\) is then the \((K \times N-K)\) matrix formed by the remaining columns of \(B\).

The following derivation then allows us to express the null space of \(B\) through \((x_{K+1},\dots,x_N)\) treated as parameters

\[\begin{split} B_1 \begin{pmatrix} x_1 \\ \vdots \\ x_K \end{pmatrix} + B_2 \begin{pmatrix} x_{K+1} \\ \vdots \\ x_N \end{pmatrix} = 0 \iff \end{split}\]
\[\begin{split} B_1^{-1} B_1 \begin{pmatrix} x_1 \\ \vdots \\ x_K \end{pmatrix} + B_1^{-1} B_2 \begin{pmatrix} x_{K+1} \\ \vdots \\ x_N \end{pmatrix} = 0 \iff \end{split}\]
\[\begin{split} \begin{pmatrix} x_1 \\ \vdots \\ x_K \end{pmatrix} = - B_1^{-1} B_2 \begin{pmatrix} x_{K+1} \\ \vdots \\ x_N \end{pmatrix} = J \begin{pmatrix} x_{K+1} \\ \vdots \\ x_N \end{pmatrix} \end{split}\]

The newly denoted matrix \(J\) is \((K \times N-K)\). We can write the full vector \(x: Bx=0\) in terms of the truncated vector \(\hat{x}=(x_{K+1},\dots,x_{N})\) which plays the role of parameters. Using matrix \(J\) and the \((N-K \times N-K)\) identity matrix \(I_{N-K}\) we have

\[\begin{split} x = \begin{pmatrix} J \\ I_{N-K} \end{pmatrix} \begin{pmatrix} x_{K+1} \\ \vdots \\ x_N \end{pmatrix} = \begin{pmatrix} J \\ I_{N-K} \end{pmatrix} \hat{x} \end{split}\]

With this, the original quadratic form \(x^TAx\) subject to the linear constraint can be rewritten as

\[\begin{split} \left( \begin{pmatrix} J \\ I_{N-K} \end{pmatrix} \begin{pmatrix} x_{K+1} \\ \vdots \\ x_N \end{pmatrix} \right)^T A \begin{pmatrix} J \\ I_{N-K} \end{pmatrix} \begin{pmatrix} x_{K+1} \\ \vdots \\ x_N \end{pmatrix} = \hat{x}^{T} \begin{pmatrix} J \\ I_{N-K} \end{pmatrix} ^T A \begin{pmatrix} J \\ I_{N-K} \end{pmatrix} \hat{x} \end{split}\]

We have a new quadratic form which is a map \(\mathbb{R}^{N-K} \to \mathbb{R}\) and is defined by the \((N-K \times N-K)\) symmetric matrix

\[\begin{split} E = \begin{pmatrix} J \\ I_{N-K} \end{pmatrix} ^T A \begin{pmatrix} J \\ I_{N-K} \end{pmatrix} \end{split}\]

The original quadratic form \(x^T A x\) evaluated at \(x\) which satisfies \(Bx=0\) returns the same value as the new quadratic form \(\hat{x}^T E \hat{x}\) evaluated at the corresponding \(\hat{x}=(x_{K+1},\dots,x_N)\). Therefore we can study the definiteness of quadratic form \(E\) instead of definiteness of the original quadratic form subject to the linear constraint!
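
Here is a NumPy sketch of this reduction, useful as a quick numerical sanity check (the example matrices are made up, and the analytical route developed below avoids inverting \(B_1\)).

```python
import numpy as np

def restricted_form_matrix(A, B):
    """Build E with x'Ax = xhat'E xhat on {x: Bx=0}, assuming the first K columns of B are invertible."""
    K, N = B.shape
    J = -np.linalg.solve(B[:, :K], B[:, K:])     # J = -B1^{-1} B2, shape (K, N-K)
    Z = np.vstack([J, np.eye(N - K)])            # columns of Z span the null space of B
    return Z.T @ A @ Z, Z

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)); A = (A + A.T) / 2          # symmetric A, N=4
B = rng.standard_normal((2, 4))                             # K=2 constraints
E, Z = restricted_form_matrix(A, B)

xhat = rng.standard_normal(2)
x = Z @ xhat                                                 # a point in the null space of B
print(np.allclose(B @ x, 0), np.isclose(x @ A @ x, xhat @ E @ xhat))  # True True
```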

Direct computation of \(E\) requires the inversion of \(B_1\) which is inconvenient. To study the definiteness of \(E\) conveniently, consider a \((N+K \times N+K)\) bordered matrix

\[\begin{split} H= \begin{pmatrix} 0 & B_1 & B_2 \\ B_1^T & A_{11} & A_{12} \\ B_2^T & A_{21} & A_{22} \end{pmatrix} \end{split}\]

where \(A_{ij}\) are appropriately sized blocks of matrix \(A\). Namely \(A_{11}\) is \((K \times K)\), \(A_{22}\) is \((N-K \times N-K)\), and \(A_{12}\) and \(A_{21}\) are \((K \times N-K)\) and \((N-K \times K)\) respectively.

Bordered matrices are useful because most matrix arithmetic operations can be performed on them block-wise, as if the blocks were individual scalar elements. For example, with an additional (change-of-basis) block matrix

\[\begin{split} \begin{pmatrix} I_K & 0 & 0 \\ 0 & I_K & J \\ 0 & 0 & I_{N-K} \end{pmatrix} \end{split}\]

we have

\[\begin{split} \begin{pmatrix} I_K & 0 & 0 \\ 0 & I_K & 0 \\ 0 & J^T & I_{N-K} \end{pmatrix} \begin{pmatrix} 0 & B_1 & B_2 \\ B_1^T & A_{11} & A_{12} \\ B_2^T & A_{21} & A_{22} \end{pmatrix} \begin{pmatrix} I_K & 0 & 0 \\ 0 & I_K & J \\ 0 & 0 & I_{N-K} \end{pmatrix} = \end{split}\]
\[\begin{split} = \begin{pmatrix} 0 & B_1 & B_1 J + B_2 \\ B_1^T & A_{11} & A_{11}J+A_{12} \\ J^TB_1^T+B_2^T & J^TA_{11}+A_{21} & J^TA_{11}J+A_{21}J+J^T A_{12}+A_{22} \end{pmatrix} = \end{split}\]

Note that \(B_1 J + B_2 = B_1 (-B_1^{-1}B_2) + B_2 = 0\). Further, let \(C=A_{11}J+A_{12}\), and then \(C^{T}=J^TA_{11}^T+A_{12}^T=J^TA_{11}+A_{21}\) since \(A\) is symmetric

\[\begin{split} = \begin{pmatrix} 0 & B_1 & 0 \\ B_1^T & A_{11} & C \\ 0 & C^T & E \end{pmatrix} \end{split}\]

To verify that the lower right block is just \(E\), apply the decomposition of \(A\) into blocks to the definition of \(E\) above:

\[\begin{split} E = \begin{pmatrix} J \\ I_{N-K} \end{pmatrix} ^T \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} \begin{pmatrix} J \\ I_{N-K} \end{pmatrix} =\\= J^TA_{11}J+A_{21}J+J^T A_{12}+A_{22} \end{split}\]

Next, let's compute the determinant of the bordered matrix \(H\). First, note that the determinants of the (block triangular) change-of-basis matrices are equal to 1, the product of the elements on the diagonal:

\[\begin{split} \det \begin{pmatrix} I_K & 0 & 0 \\ 0 & I_K & 0 \\ 0 & J^T & I_{N-K} \end{pmatrix} = \det \begin{pmatrix} I_K & 0 & 0 \\ 0 & I_K & J \\ 0 & 0 & I_{N-K} \end{pmatrix} = 1 \end{split}\]

Therefore, by the product rule for determinants, we have

\[\begin{split} \det H= \det \begin{pmatrix} 0 & B_1 & 0 \\ B_1^T & A_{11} & C \\ 0 & C^T & E \end{pmatrix} \end{split}\]

Next we will use the following formula for square block matrices, where \(W_1\) and \(W_2\) are square:

\[\begin{split} \det \begin{pmatrix} W_1 & 0 \\ W_3 & W_2 \end{pmatrix} = \det (W_1) \det (W_2) \end{split}\]

We have

\[\begin{split} \det H= \det \begin{pmatrix} 0 & B_1 & 0 \\ B_1^T & A_{11} & C \\ 0 & C^T & E \end{pmatrix} = (-1)^K \det \begin{pmatrix} B_1 & 0 & 0 \\ A_{11} & B_1^T & C \\ C^T & 0 & E \end{pmatrix} = \end{split}\]

The \((-1)^K\) factor appears because we swapped the first and the second blocks of \(K\) columns: this amounts to \(K\) pairwise column swaps, and each swap of two columns (equivalently, multiplication by a permutation matrix with determinant \(-1\)) changes the sign of the determinant. Applying the block determinant formula twice, together with \(\det(B_1^T)=\det(B_1)\), we have

\[\begin{split} \det H = (-1)^K \det B_1 \det \begin{pmatrix} B_1^T & C \\ 0 & E \end{pmatrix} = (-1)^K (\det B_1)^2 \det E \end{split}\]

We conclude that the sign of \(\det E\) is given by the sign of the determinant of the bordered matrix \(H\), corrected by the factor \((-1)^K\) that depends on the number of linear constraints \(K\).

Importantly, the same derivation holds for each principal minor of \(E\)

Therefore, the definiteness of the quadratic form given by \(E\) can be checked by examining the determinants of the matrices obtained from the bordered \(H\) by removing certain columns and rows from among the last \(N-K\) columns and rows.

To examine the sign of the leading principal minor of order \(j\), the last \(N-K-j\) columns and rows have to be removed, for \(j=1,\dots,N-K\). To examine the signs of arbitrary principal minors of order \(j\), any \(N-K-j\) columns and rows (with the same indexes, i.e. corresponding to diagonal elements) have to be removed from among the last \(N-K\) columns and rows.

Let's compute the order-\(j\) leading principal minor of \(E\) through the elements of the bordered matrix \(H\). Denote the matrix composed of the first \(j\) columns and first \(j\) rows of \(E\) by \(E^j\):

\[\begin{split} E^j = \begin{pmatrix} J^j \\ I_j \end{pmatrix} ^T \begin{pmatrix} A_{11} & A^j_{12} \\ A^j_{21} & A^j_{22} \end{pmatrix} \begin{pmatrix} J^j \\ I_j \end{pmatrix} \end{split}\]

Here all the relevant matrices are truncated so that the dimension \(N-K\) is replaced by \(j\):

  • \(J^j\) is the \((K \times j)\) matrix formed of the first \(j\) columns of \(J\)

  • \(A^j_{22}\) is the first \(j\) rows and columns of the block \(A_{22}\), \((j \times j)\) matrix

  • \(A^j_{12}\) is the first \(j\) columns of the block \(A_{12}\), \((K \times j)\) matrix

  • \(A^j_{21}\) is the first \(j\) rows of the block \(A_{21}\), \((j \times K)\) matrix

  • \(I_j\) is \((j \times j)\) identity matrix

We have

\[ \det H^j = (-1)^K (\det B_1)^2 \det E^j \quad\Leftrightarrow \]
\[ \det E^j = \frac{(-1)^K}{(\det B_1)^2} \det H^j \]

where \(H^j\) is given by \(H\) with the last \((N-K-j)\) rows and columns removed:

\[\begin{split} H^j = \begin{pmatrix} 0 & B_1 & B^j_2 \\ B_1^T & A_{11} & A^j_{12} \\ (B^j_2)^T & A^j_{21} & A^j_{22} \end{pmatrix} \end{split}\]
  • where \(B^j_2\) is built in a similar manner to the other truncated matrices: it is formed by the first \(j\) columns of the matrix \(B_2\), a \((K \times j)\) matrix

In other words, the order-\(j\) leading principal minor of \(E\) is given by the determinant of the bordered matrix \(H\) with the last \((N-K-j)\) rows and columns dropped

Definiteness of quadratic form subject to linear constraint

In order to establish the definiteness of the quadratic form \(x^TAx\), \(x \in \mathbb{R}^N\), subject to a linear constraint \(Bx=0\), we need to check the signs of the \(N-K\) leading principal minors of the \((N-K \times N-K)\) matrix \(E\) which represents the equivalent lower dimensional quadratic form without the constraint. The signs of these leading principal minors correspond to \((-1)^K\) times the signs of the determinants of the corresponding bordered matrix \(H\), where a number of last rows and columns are dropped, according to the following table:

| Order \(j\) of leading principal minor in \(E\) | Include first rows/columns of \(H\) | Drop last rows/columns of \(H\) |
|---|---|---|
| 1 | \(2K+1\) | \(N-K-1\) |
| 2 | \(2K+2\) | \(N-K-2\) |
| \(\vdots\) | \(\vdots\) | \(\vdots\) |
| \(N-K-1\) | \(N+K-1\) | 1 |
| \(N-K\) | \(N+K\) | 0 |

Clearly, this table has \(N-K\) rows, and it is \(N-K\) determinants that have to be computed.

The definiteness of the quadratic form subject to the linear constraint is then given by the pattern of signs of these determinants according to Sylvester's criterion.

Note that computing determinants of \(H\) after dropping a number of last rows and columns is effectively computing the last leading principal minors of \(H\).

If any of the leading principal minors turns out to be zero, we can move on and check the patterns for semi-definiteness.

To establish semi-definiteness we have to do a bit more work, because not only the signs of the leading principal minors but the signs of all principal minors have to be inspected.

Semi-definiteness of quadratic form subject to linear constraint

In order to establish the semi-definiteness of the quadratic form \(x^TAx\), \(x \in \mathbb{R}^N\), subject to a linear constraint \(Bx=0\), we need to check the signs of the principal minors of the \((N-K \times N-K)\) matrix \(E\) which represents the equivalent lower dimensional quadratic form without the constraint. The signs of these principal minors correspond to \((-1)^K\) times the signs of the determinants of the corresponding bordered matrix \(H\), where a number of rows and columns are dropped from among the last \(N-K\) rows and columns (in all combinations), according to the following table:

| Order \(j\) of principal minor in \(E\) | Number of principal minors | Drop among last \(N-K\) rows/columns |
|---|---|---|
| 1 | \(N-K\) | \(N-K-1\) |
| 2 | \(C(N-K,2)\) | \(N-K-2\) |
| \(\vdots\) | \(\vdots\) | \(\vdots\) |
| \(N-K-1\) | \(N-K\) | 1 |
| \(N-K\) | 1 | 0 |

The table again has \(N-K\) rows, but each row corresponds to a different number of principal minors of a given order, namely for order \(j\) there are \((N-K)\)-choose-\(j\) principal minors given by the corresponding number of combinations \(C(N-K,j)\).

The total number of principal minors to be checked is \(2^{N-K}-1\), given by the sum of the binomial coefficients

The definiteness of the quadratic form subject to the linear constraint is then given by the pattern of signs of these determinants according to Sylvester's criterion.

In what follows we will refer to computing determinants of \(H\) after dropping a number of rows and columns among the last \(N-K\) rows and columns as computing the last principal minors of \(H\). The number of last principal minors corresponds to the number of lines in the table above, not to the number of determinants to be computed.

Application of Sylvester's criterion for positive definiteness and semi-definiteness is straightforward: all the determinants (after the \((-1)^K\) sign correction) have to be positive or non-negative, respectively.

However, let us examine the patterns of the alternating signs of the principal minors of different orders in Sylvester's criterion for negative definiteness and semi-definiteness. The sign of the determinants corresponding to the \(j=1\) order minors has to be \((-1)(-1)^K\), and then alternate according to the following sequences:

\[\begin{split} K = 1,3,5\dots \quad : \quad + - + - + \dots \\ K = 2,4,6\dots \quad : \quad - + - + - \dots \end{split}\]

The number of elements in these sequences that have to be checked is \(N-K\), so the sign changes \(N-K-1\) times, and the last sign is given by \((-1)(-1)^K(-1)^{N-K-1} = (-1)^N\). The highest order principal minor to check corresponds to the full determinant of the bordered matrix \(H\), and therefore it is more convenient to check the negative definiteness and semi-definiteness pattern by ensuring that:

  • the signs alternate across principal minors of consecutive orders \(j\)

  • the sign of the last principal minor, given by the determinant of the full bordered matrix \(H\), is \((-1)^N\)

Note that this version of the alternation rule applies for both leading principal minors and all principal minors, though in the latter case more determinants have to be checked to satisfy the sign alternation pattern.
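
The leading-minor version of this test is easy to automate. Below is a sketch (assuming NumPy) that takes the bordered matrix \(H\) together with \(N\) and \(K\) and classifies strict definiteness on the constraint set; semi-definiteness would additionally require checking all of the last principal minors, not only the leading ones.

```python
import numpy as np

def definiteness_on_constraint_set(H, N, K):
    """Check the last N-K leading principal minors of the bordered matrix H ((N+K) x (N+K))."""
    # determinants of H with the last j rows/columns dropped, j = N-K-1, ..., 1, 0
    dets = [np.linalg.det(H[:N + K - j, :N + K - j]) for j in range(N - K - 1, -1, -1)]
    signs = [int(np.sign(d)) for d in dets]
    if all(s == (-1) ** K for s in signs):
        return 'positive definite on {Bx=0}'
    alternate = all(signs[i] == -signs[i + 1] for i in range(len(signs) - 1))
    if alternate and 0 not in signs and signs[-1] == (-1) ** N:
        return 'negative definite on {Bx=0}'
    return 'test inconclusive (check the semi-definiteness patterns)'
```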

Example

Determine the definiteness of quadratic form \(x^TAx\) subject to the linear constraint \(Bx=0\) where

\[\begin{split} A = \begin{pmatrix} 3 & -2 & 3 \\ -2 & 5 & -1 \\ 3 & -1 & 6 \end{pmatrix} \quad B = (1,2,0) \end{split}\]

Form the bordered matrix \(H\)

\[\begin{split} H = \begin{pmatrix} 0 & 1 & 2 & 0 \\ 1 & 3 & -2 & 3 \\ 2 & -2 & 5 & -1 \\ 0 & 3 & -1 & 6 \end{pmatrix} \end{split}\]

We have \(N=3\) and \(K=1\), therefore we have to compute \(N-K=2\) determinants to check the signs of the leading principal minors of \(E\) (the lower dimensional unconstrained quadratic form). Note that all signs have to be flipped, since \((-1)^K = (-1)^1 = -1\).

The sign of the first order leading principal minor of \(E\) is

\[\begin{split} \mathrm{sign}(\det E^1) = \mathrm{sign}\left( (-1)^1 \det \begin{pmatrix} 0 & 1 & 2 \\ 1 & 3 & -2 \\ 2 & -2 & 5 \end{pmatrix} \right) = \mathrm{sign} \big( - ( -4 -4 -12 -5 ) \big) = +1 \end{split}\]

The sign of the second order leading principal minor of \(E\) is

\[\begin{split} \mathrm{sign}(\det E^2) = \mathrm{sign}\left( (-1)^1 \det \begin{pmatrix} 0 & 1 & 2 & 0 \\ 1 & 3 & -2 & 3 \\ 2 & -2 & 5 & -1 \\ 0 & 3 & -1 & 6 \end{pmatrix} \right) = \end{split}\]
\[\begin{split} = \mathrm{sign} \Big( -\left( -\det \begin{pmatrix} 1 & 2 & 0 \\ -2 & 5 & -1 \\ 3 & -1 & 6 \end{pmatrix} +2\det \begin{pmatrix} 1 & 2 & 0 \\ 3 & -2 & 3 \\ 3 & -1 & 6 \end{pmatrix} \right) \Big) = \end{split}\]
\[ = \mathrm{sign} \Big( 30 -6 +24 -1 - 2(-12 +18-36 +3) \Big) = \mathrm{sign} (47+54) = +1 \]

Hence, by Sylvester's criterion, the quadratic form \(x^TAx\) subject to the linear constraint \(Bx=0\) is positive definite.
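
The same two determinants can be verified numerically (a minimal check assuming NumPy):

```python
import numpy as np

H = np.array([[0,  1,  2,  0],
              [1,  3, -2,  3],
              [2, -2,  5, -1],
              [0,  3, -1,  6]], dtype=float)

print(np.linalg.det(H[:3, :3]))   # approx -25,  so det E^1 has sign (-1)^1 * (-1) = +1
print(np.linalg.det(H))           # approx -101, so det E^2 has sign (-1)^1 * (-1) = +1
```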

Back to second order conditions#

Recall that the second order conditions for equality constrained optimization problems are formulated in terms of the definiteness of the Hessian of the Lagrangian with respect to \(x\), \(H_x \mathcal{L}(\lambda,x)\), on the linear constraint set \(\{v: Dg(x^\star) v = 0\}\).

Therefore, all the theory on the definiteness of the quadratic form subject to the linear constraint can be applied to the bordered Hessian of the Lagrangian function \(H \mathcal{L}(\lambda,x)\).

Fact (definiteness of Hessian on linear constraint set)

For \(f \colon \mathbb{R}^N \to \mathbb{R}\) and \(g \colon \mathbb{R}^N \to \mathbb{R}^K\) twice continuously differentiable functions (\(C^2\)):

\(H_x \mathcal{L}(\lambda,x)\) is positive definite on a linear constraint set \(\{v: Dg(x^\star) v = 0\}\) \(\iff\) the last \(N-K\) leading principal minors of \(H \mathcal{L}(\lambda,x)\) are non-zero and have the same sign as \((-1)^K\).

\(H_x \mathcal{L}(\lambda,x)\) is negative definite on a linear constraint set \(\{v: Dg(x^\star) v = 0\}\) \(\iff\) the last \(N-K\) leading principal minors of \(H \mathcal{L}(\lambda,x)\) are non-zero and alternate in sign, with the sign of the full Hessian matrix equal to \((-1)^N\).

\(H_x \mathcal{L}(\lambda,x)\) is positive semi-definite on a linear constraint set \(\{v: Dg(x^\star) v = 0\}\) \(\iff\) the last \(N-K\) principal minors of \(H \mathcal{L}(\lambda,x)\) are zero or have the same sign as \((-1)^K\).

\(H_x \mathcal{L}(\lambda,x)\) is negative semi-definite on a linear constraint set \(\{v: Dg(x^\star) v = 0\}\) \(\iff\) the last \(N-K\) principal minors of \(H \mathcal{L}(\lambda,x)\) are zero or alternate in sign, with the sign of the full Hessian matrix equal to \((-1)^N\).

  • see above for the exact definition of the last principal minors and how to count them

Similarly to the unconstrained case, when none of the listed conditions for the bordered Hessian hold, it is indefinite and the point tested by the second order conditions is not a constrained optimizer.


We can now reformulate the second order conditions for equality constrained optimization problems in terms of the signs of the last principal minors of the bordered Hessian of the Lagrangian function.

Fact (necessary SOC)

Let \(f \colon \mathbb{R}^N \to \mathbb{R}\) and \(g \colon \mathbb{R}^N \to \mathbb{R}^K\) be twice continuously differentiable functions (\(C^2\)), and let \(D = \{ x \colon g_i(x) = 0, i=1,\dots,K \} \subset \mathbb{R}^N\)

Suppose \(x^\star \in D\) is a local constrained optimizer of \(f\) on \(D\), the constraint qualification assumption holds at \(x^\star\), and \(\lambda^\star \in \mathbb{R}^K\) is the corresponding vector of Lagrange multipliers from the first order conditions

Then:

  • If \(f\) has a local maximum on \(D\) at \(x^\star\), then the last \(N-K\) principal minors of the bordered Hessian \(H \mathcal{L}(\lambda^\star,x^\star)\) are zero or alternate in sign, with the sign of the determinant of the full Hessian matrix equal to \((-1)^N\)

  • If \(f\) has a local minimum on \(D\) at \(x^\star\), then the last \(N-K\) principal minors of the bordered Hessian \(H \mathcal{L}(\lambda^\star,x^\star)\) are zero or have the same sign as \((-1)^K\)

  • see above for the exact definition of the last principal minors and how to count them

Fact (sufficient SOC)

Let \(f \colon \mathbb{R}^N \to \mathbb{R}\) and \(g \colon \mathbb{R}^N \to \mathbb{R}^K\) be twice continuously differentiable functions (\(C^2\)) and let \(D = \{ x \colon g_i(x) = 0, i=1,\dots,K \} \subset \mathbb{R}^N\)

Suppose there exists \(x^\star \in D\) such that \(\mathrm{rank}(Dg(x^\star)) = K\) and \(Df(x^\star) - \lambda^\star \cdot Dg(x^\star) = 0\) for some \(\lambda^\star \in \mathbb{R}^K\) (i.e. the constraint qualification assumption holds at \((x^\star,\lambda^\star)\) which satisfies the FOC)

Then:

  • If the last \(N-K\) leading principal minors of the bordered Hessian \(H \mathcal{L}(\lambda^\star,x^\star)\) are non-zero and alternate in sign, with the sign of the determinant of the full Hessian matrix equal to \((-1)^N\), then \(x^\star\) is a local constrained maximum of \(f\) on \(D\)

  • If the last \(N-K\) leading principal minors of the bordered Hessian \(H \mathcal{L}(\lambda^\star,x^\star)\) are non-zero and have the same sign as \((-1)^K\), then \(x^\star\) is a local constrained minimum of \(f\) on \(D\)

Example

Let's check the second order conditions for the consumer choice problem.

\[ \mathcal{L}(x_1,x_2,\lambda) = \alpha \log(x_1) + \beta \log(x_2) - \lambda(p_1 x_1 + p_2 x_2 - m) \]
\[ \nabla_x \mathcal{L}(\lambda,x) = \left(\frac{\alpha}{x_1} - \lambda p_1,\; \frac{\beta}{x_2} - \lambda p_2\right) \]
\[\begin{split} H_x \mathcal{L}(\lambda,x) = \left( \begin{array}{cc} -\frac{\alpha}{x_1^2},& 0 \\ 0,& -\frac{\beta}{x_2^2} \end{array} \right) \end{split}\]
\[ \nabla \mathcal{L}(\lambda,x) = \left( m-x_1 p_1 - x_2 p_2, \; \frac{\alpha}{x_1} - \lambda p_1, \; \frac{\beta}{x_2} - \lambda p_2 \right) \]
\[\begin{split} H\mathcal{L}(\lambda,x) = \left( \begin{array}{ccc} 0,& -p_1,& -p_2 \\ -p_1,& -\frac{\alpha}{x_1^2},& 0 \\ -p_2,& 0,& -\frac{\beta}{x_2^2} \end{array} \right) \end{split}\]

Solving FOC we have as before

\[ x^\star = \left( \frac{\alpha}{\alpha + \beta} \frac{m}{p_1}, \; \frac{\beta}{\beta + \alpha} \frac{m}{p_2} \right) \text{ and } \lambda^\star = \frac{\alpha+\beta}{m} \]

To check the second order conditions compute the Hessian at \(x^\star\)

\[\begin{split} H\mathcal{L}(\lambda^\star,x^\star) = \left( \begin{array}{ccc} 0,& -p_1,& -p_2 \\ -p_1,& -\frac{(\alpha+\beta)^2p_1^2}{\alpha m^2},& 0 \\ -p_2,& 0,& -\frac{(\alpha+\beta)^2p_2^2}{\beta m^2} \end{array} \right) \end{split}\]

Because \(N=2\) and \(K=1\) we need to check \(N-K=1\) last leading principal minor, which is the determinant of the whole \(H\mathcal{L}(\lambda^\star,x^\star)\). Using the triangle rule for computing determinants of a \(3 \times 3\) matrix, we have

\[ \mathrm{det}\left(H\mathcal{L}(\lambda^\star,x^\star)\right)= \frac{(\alpha+\beta)^2 p_1^2 p_2^2}{\alpha m^2} + \frac{(\alpha+\beta)^2 p_1^2 p_2^2}{\beta m^2} = \frac{(\alpha+\beta)^3 p_1^2 p_2^2}{\alpha \beta m^2} >0 \]

We have that the determinant of the bordered Hessian has the same sign as \((-1)^N=(-1)^2\), and the condition for sign alternation is (trivially) satisfied for \(N-K = 1\) last leading principal minors.

Hence, \(H_x \mathcal{L}(\lambda^\star,x^\star)\) is negative definite on the constraint set, implying that \(x^\star\) is a local constrained maximum of \(f\) on \(D\) by the sufficient second order condition.
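
As a numerical cross-check, the bordered Hessian determinant can be evaluated at the parameter values used in the figures (assuming NumPy):

```python
import numpy as np

alpha, beta, p1, p2, m = 0.4, 0.5, 1.0, 1.2, 4.0
x1 = alpha / (alpha + beta) * m / p1
x2 = beta / (alpha + beta) * m / p2

H = np.array([[0.0,  -p1,             -p2            ],
              [-p1,  -alpha / x1**2,   0.0           ],
              [-p2,   0.0,            -beta / x2**2  ]])

# positive determinant = same sign as (-1)^N with N=2, so x* is a local constrained maximum
print(np.linalg.det(H))   # approx 0.328, matches (alpha+beta)**3 * p1**2 * p2**2 / (alpha*beta*m**2)
```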

Lagrangian method: algorithm#

  1. Write down the Lagrangian function \(\mathcal{L}(x,\lambda)\)

  2. Find all stationary points of \(\mathcal{L}(x,\lambda)\) with respect to \(x\) and \(\lambda\), i.e. solve the system of first order equations

  3. Derive the bordered Hessian \(H \mathcal{L}(x,\lambda)\) and compute its value at the stationary points

  4. Using second order conditions, check if the stationary points are local optima

  5. To find the global optima, compare the function values at all identified local optima

  • note that using convexity/concavity of the objective function is not as straightforward as in the unconstrained case: it requires additional assumptions on the convexity of the constraint set

When does Lagrange method fail?#

Example

\[\begin{split} f(x,y) = -y \to \max_{x,y} \\ \text {subject to} \\ g(x,y) = y^3 -x^2 = 0 \end{split}\]

Proceed according to the Lagrange recipe

\[ \mathcal{L}(x,y,\lambda) = -y -\lambda(y^3 -x^2) \]
\[ \nabla \mathcal{L}(x,y,\lambda) = \left(x^2-y^3,\; 2x\lambda,\; -3y^2\lambda-1 \right) \]

The FOC system of equations

\[\begin{split} \begin{cases} 2x\lambda =0 \\ 3y^2\lambda+1=0 \\ x^2-y^3 =0 \end{cases} \end{split}\]

does not have solutions! From the second equation \(\lambda \ne 0\), then from the first equation \(x=0\), and so \(y=0\) from the third, but then the second equation is not satisfied.

However, we can deduce that \((x^\star,y^\star)=(0,0)\) is a maximizer in the following way. Every point \((x,y)\) which satisfies the constraint has \(y^3 = x^2 \ge 0\), so \(y \ge 0\) and \(f(x,y) = -y \le 0\); thus the maximum attainable value is \(f(x,y)=0\), which is attained at \((0,0)\).

Why does Lagrange method fail?

\[\begin{split} \nabla g(x,y) = (2x,-3y^2)\\ \nabla g(x^\star,y^\star) = (0,0) \end{split}\]

The Lagrange method fails because the constraint qualification assumption is not satisfied at \((0,0)\): the rank of the Jacobian matrix of the constraints (a \(1 \times 2\) matrix in this case) is zero at this point, which is less than the number of constraints \(K=1\).
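
A short SymPy illustration of the failure (SymPy assumed available): the stationarity system of the Lagrangian has no solution, even though the constrained maximizer exists.

```python
import sympy as sp

x, y, lam = sp.symbols('x y lambda', real=True)
L = -y - lam * (y**3 - x**2)
foc = [sp.diff(L, v) for v in (x, y, lam)]
print(sp.solve(foc, [x, y, lam], dict=True))   # [] : the FOC system has no solution
```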

Lagrange method may not work when:

  • Constraint qualification assumption is not satisfied at the stationary point

  • FOC system of equations does not have solutions (thus no local optima)

  • Saddle points – second order conditions can help

  • Local optimizer is not a global one – no way to check without studying analytic properties of each problem

Conclusion: great tool, but blind application may lead to wrong results!

References and reading#

References
  • Simon & Blume: 16.2, whole of chapter 18, 19.3

  • Sundaram: chapters 5 and 6

Further reading and self-learning