🔬 Tutorial problems epsilon

\(\epsilon\).1

Formulation

Consider a function \(f : \mathbb{R}^N \ni x \mapsto x^{T}Bx \in \mathbb{R}\), where the \(N \times N\) matrix \(B\) is not symmetric.

Show that the same function can be represented as \(x^{T}Ax\) where \(A\) is symmetric.

Hint

Given a square matrix \(M\), you can use the identity \(M = \tfrac{1}{2}(M+M^T) + \tfrac{1}{2}(M-M^T)\), where the first component is symmetric and the second is antisymmetric.

Fact

If \(A\) and \(B\) are conformable for matrix multiplication, then

\[ (AB)^{T} = B^T A^T \]

Solution

In general, a square matrix can be decomposed into a symmetric and an antisymmetric part as follows:

\[ M=\frac{1}{2}(M+M^T)+\frac{1}{2}(M-M^T)=M_s+M_a \]

Note that the symmetric part is invariant under transposition, \(M^T_s=M_s\), while the antisymmetric part changes its sign under transposition, \(M^T_a=-M_a\).

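Let us examine the quadratic form of the purely antisymmetric part, \(q = x^T M_a x\). Since \(q\) is a scalar, it equals its own transpose:

\[ q=x^T M_a x=(x^T M_a x)^T=x^T M_a^T x=-x^T M_a x=-q \]

This implies that \(q=0\) has to hold for all \(x\).

Applying the decomposition to \(B\) gives \(x^T B x = x^T B_s x + x^T B_a x = x^T B_s x\), so \(A = \frac{1}{2}(B+B^T)\) solves the problem.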
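The result can be checked numerically. The following is a minimal sketch, assuming a randomly drawn \(4 \times 4\) matrix and plain Python lists; the helper names `quad_form`, `symmetric_part` and `antisymmetric_part` are illustrative and not part of the original problem.

```python
import random

def quad_form(M, x):
    """Return x^T M x for a square matrix M (list of rows) and a vector x."""
    n = len(x)
    return sum(x[i] * M[i][j] * x[j] for i in range(n) for j in range(n))

def symmetric_part(M):
    """Return (M + M^T) / 2."""
    n = len(M)
    return [[0.5 * (M[i][j] + M[j][i]) for j in range(n)] for i in range(n)]

def antisymmetric_part(M):
    """Return (M - M^T) / 2."""
    n = len(M)
    return [[0.5 * (M[i][j] - M[j][i]) for j in range(n)] for i in range(n)]

random.seed(0)  # fixed seed so the run is reproducible
N = 4           # illustrative dimension
B = [[random.uniform(-1, 1) for _ in range(N)] for _ in range(N)]
x = [random.uniform(-1, 1) for _ in range(N)]

A = symmetric_part(B)
B_a = antisymmetric_part(B)

# Both differences should be zero up to floating-point rounding.
print(abs(quad_form(B, x) - quad_form(A, x)))
print(abs(quad_form(B_a, x)))
```

A single random draw is only a sanity check; the algebraic argument above is what establishes the claim for every \(x\).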

\(\epsilon\).2

Formulation

Consider a function \(f : \mathbb{R}^N \ni {\bf x} \mapsto {\bf x}'A{\bf x} \in \mathbb{R}\), where \(N \times N\) matrix \(A\) is symmetric.

Using the product rule of multivariate calculus, derive the gradient and Hessian of \(f\). Make sure that all multiplied vectors and matrices are conformable.

Hint

You can assume that \({\bf x}\) is a column vector, and that any vector function of \({\bf x}\) is also a column vector.

Definition

Let \(A\) denote an open set in \(\mathbb{R}^N\), and let \(f \colon A \to \mathbb{R}\). Assume that \(f\) is twice differentiable at \(x \in A\).

The total derivative of the gradient \(\nabla f(x)\) of the function \(f\) at the point \(x\) is called the Hessian matrix of \(f\), denoted by \(Hf\) or \(\nabla^2 f\), and is given by the \(N \times N\) matrix

\[\begin{split} Hf(x) = \nabla^2 f(x) = \left( \begin{array}{ccc} \frac{\partial^2 f}{\partial x_1 \partial x_1}(x) & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_N}(x) \\ \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_N \partial x_1}(x) & \cdots & \frac{\partial^2 f}{\partial x_N \partial x_N}(x) \end{array} \right) \end{split}\]

Solution

A possible answer:

Represent the quadratic form as a dot product of two functions \(f(x) = x' A x = h(x) \cdot g(x)\), where \(h(x) = x\) and \(g(x) = A x\). Then \(Dh(x) = I\) (identity matrix) and \(Dg(x) = A\).

The last Jacobian can be easily derived by representing matrix multiplication as a linear combination of columns. Differentiating with respect to each element of \(x\) then yields a Jacobian composed of the columns of matrix \(A\), which is therefore equal to \(A\).

\[\begin{split} g(x) = A x = \left( \begin{array}{ccc} a_{11} & \cdots & a_{1N} \\ \vdots & \ddots & \vdots \\ a_{N1} & \cdots & a_{NN} \end{array} \right) \left( \begin{array}{c} x_1 \\ \vdots \\ x_N \end{array} \right) = \left( \begin{array}{c} a_{11} \\ \vdots \\ a_{N1} \end{array} \right) x_1 + \cdots + \left( \begin{array}{c} a_{1N} \\ \vdots \\ a_{NN} \end{array} \right) x_N \end{split}\]

Applying the dot product rule of differentiation we have

\[\begin{split} D(h \cdot g)(x) = [h(x)]^T Dg(x) + [g(x)]^T Dh(x) = \\ = x^T A + [Ax]^T I = x^T A + x^T A = 2 x^T A = 2 [A x]^T \end{split}\]

Both the step \([Ax]^T = x^T A^T = x^T A\) and the last transformation use the transpose of a product and the symmetry of \(A\).

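The derivative is the \(1 \times N\) matrix (row vector)

\[ Df(x) = 2 x^T A = 2 [A x]^T \]

Written as a column vector, the gradient is \(\nabla f(x) = [Df(x)]^T = 2Ax\). The Hessian is the total derivative of the gradient, so using \(Dg(x) = A\) once more we get \(Hf(x) = D(2Ax) = 2A\).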
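A finite-difference check can catch sign or transpose mistakes in such derivations. The sketch below assumes a particular symmetric \(3 \times 3\) matrix and test point chosen only for illustration, and compares central differences against the column-vector gradient \(2Ax\) and the Hessian \(2A\) (the derivative of \(2Ax\) with respect to \(x\)). All function names and the step size are assumptions of this example.

```python
def quadratic_form(A, x):
    """Return x^T A x."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))

def analytic_gradient(A, x):
    """Gradient 2 A x, stored as a list (column vector)."""
    n = len(x)
    return [2 * sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]

def numerical_gradient(A, x, step=1e-5):
    """Central-difference approximation of the gradient of x^T A x."""
    grad = []
    for k in range(len(x)):
        up, down = list(x), list(x)
        up[k] += step
        down[k] -= step
        grad.append((quadratic_form(A, up) - quadratic_form(A, down)) / (2 * step))
    return grad

def numerical_hessian(A, x, step=1e-5):
    """Central differences of the analytic gradient; entry [i][k] is d(grad_i)/d(x_k)."""
    n = len(x)
    hess = [[0.0] * n for _ in range(n)]
    for k in range(n):
        up, down = list(x), list(x)
        up[k] += step
        down[k] -= step
        g_up, g_down = analytic_gradient(A, up), analytic_gradient(A, down)
        for i in range(n):
            hess[i][k] = (g_up[i] - g_down[i]) / (2 * step)
    return hess

# Illustrative symmetric matrix and evaluation point.
A = [[2.0, 1.0, 0.0],
     [1.0, 3.0, -1.0],
     [0.0, -1.0, 4.0]]
x = [1.0, -2.0, 0.5]

grad_error = max(abs(a - b) for a, b in
                 zip(analytic_gradient(A, x), numerical_gradient(A, x)))
H = numerical_hessian(A, x)
hess_error = max(abs(H[i][j] - 2 * A[i][j]) for i in range(3) for j in range(3))

print(grad_error, hess_error)  # both should be small, limited by rounding
```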

\(\epsilon\).3

Formulation

In which direction should one move from a given point in order to increase the value of the function most rapidly:

\(\quad\) \(f(x,y) = 4x^2y\) from the point \((2,3)\)
\(\quad\) \(f(x,y) = y^2 e^{3x}\) from the point \((0,3)\)

Present your answer as a vector of length 1.

[Simon and Blume, 1994]: Exercises 14.18, 14.19

Hint

Review the definition of, and facts about, the gradient of a multivariate function.

Solution

The direction of most rapid ascent of the multivariate function is given by its gradient.

\[\begin{split} \begin{array}{l} \partial f(x,y)/\partial x = 8xy \\ \partial f(x,y)/\partial y = 4x^2 \\ \nabla f(x,y) = (8xy, 4x^2) \\ \nabla f(2,3) = (48, 16) \end{array} \end{split}\]

We need to make sure that the direction vector has length 1, so we divide by its norm

\[ \|(48, 16)\| = \sqrt{48^2+16^2} = \sqrt{2^{8}3^{2}+2^8} = 2^4 \sqrt{10} \]

The direction vector is then \((48, 16)/(16\sqrt{10}) = (3/\sqrt{10}, 1/\sqrt{10})\)

\[\begin{split} \begin{array}{l} \partial f(x,y)/\partial x = 3 y^2 e^{3x} \\ \partial f(x,y)/\partial y = 2y e^{3x} \\ \nabla f(x,y) = (3 y^2 e^{3x}, 2y e^{3x}) = e^{3x} (3 y^2, 2y) \\ \nabla f(0,3) = (27, 6) \end{array} \end{split}\]

Again, the direction vector has to have length 1, so we divide by its norm

\[ \|(27, 6)\| = \sqrt{27^2+6^2} = \sqrt{3^{6}+3^2 2^2} = 3 \sqrt{81+4} = 3\sqrt{85} \]

The direction vector is then \((27, 6)/(3\sqrt{85}) = (9/\sqrt{85}, 2/\sqrt{85})\)
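Both normalizations can be reproduced in a few lines. This is a minimal sketch using only the standard library; the gradients are the ones derived above, and the names `unit_direction`, `grad_a` and `grad_b` are hypothetical helpers introduced for this example.

```python
import math

def unit_direction(grad):
    """Scale a 2D gradient vector to length 1."""
    norm = math.hypot(grad[0], grad[1])
    return (grad[0] / norm, grad[1] / norm)

def grad_a(x, y):
    """Gradient of f(x, y) = 4 x^2 y."""
    return (8 * x * y, 4 * x ** 2)

def grad_b(x, y):
    """Gradient of f(x, y) = y^2 exp(3x)."""
    return (3 * y ** 2 * math.exp(3 * x), 2 * y * math.exp(3 * x))

dir_a = unit_direction(grad_a(2, 3))  # expected (3/sqrt(10), 1/sqrt(10))
dir_b = unit_direction(grad_b(0, 3))  # expected (9/sqrt(85), 2/sqrt(85))

print(dir_a, dir_b)
```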

\(\epsilon\).4

Formulation

A critical point of a multivariate function is a point at which all partial derivatives are zero.

Compute the critical points of the following functions:

\(\quad\) \(x^4+x^2-6xy + 3y^2\)
\(\quad\) \(x^2-6xy+2y^2+10x+2y-5\)
\(\quad\) \(xy^2+x^3y-xy\)
\(\quad\) \(3x^4+3x^2y-y^3\)
\(\quad\) \(x^2+6xy+y^2-3yz+4z^2-10x-5y-21z\)
\(\quad\) \((x^2+2y^2+3z^2) e^{-(x^2+y^2+z^2)}\)

[Simon and Blume, 1994]: Exercises 17.1, 17.2

Hint

Solution

Solution algorithm:

compute the partial derivatives
solve the system of equations obtained by setting all partial derivatives equal to zero
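As an illustration of these two steps, here is a worked sketch for the first function, \(f(x,y) = x^4+x^2-6xy+3y^2\):

\[\begin{split} \begin{array}{l} \partial f(x,y)/\partial x = 4x^3 + 2x - 6y = 0 \\ \partial f(x,y)/\partial y = -6x + 6y = 0 \; \Rightarrow \; y = x \\ 4x^3 + 2x - 6x = 4x(x-1)(x+1) = 0 \; \Rightarrow \; x \in \{0, 1, -1\} \end{array} \end{split}\]

Combining each root with \(y = x\) gives the three critical points \((0,0)\), \((1,1)\) and \((-1,-1)\). The remaining functions can be handled in the same way.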

Correct answers:

\((0,0), (1,1), (-1,-1)\)
\((13/7,16/7)\)
\((0,0), (0,1), (1,0), (-1,0), (1/\sqrt{5},2/5), (-1/\sqrt{5},2/5)\)
\((0,0), (1/2,-1/2), (-1/2,-1/2)\)
\((2,1,3)\)
\((0,0,0), (1,0,0), (-1,0,0), (0,1,0), (0,-1,0), (0,0,1), (0,0,-1)\)
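The listed points can be checked numerically by evaluating a finite-difference gradient at each of them. The sketch below assumes a central-difference step of \(10^{-5}\) and a helper named `gradient`, both chosen for this example. It confirms only that every listed point is critical; it does not show that the lists are complete.

```python
import math

def gradient(f, point, step=1e-5):
    """Central-difference gradient of f at the given point."""
    grad = []
    for k in range(len(point)):
        up, down = list(point), list(point)
        up[k] += step
        down[k] -= step
        grad.append((f(*up) - f(*down)) / (2 * step))
    return grad

s = 1 / math.sqrt(5)

# Each entry pairs a function with its claimed critical points.
problems = [
    (lambda x, y: x**4 + x**2 - 6*x*y + 3*y**2,
     [(0, 0), (1, 1), (-1, -1)]),
    (lambda x, y: x**2 - 6*x*y + 2*y**2 + 10*x + 2*y - 5,
     [(13/7, 16/7)]),
    (lambda x, y: x*y**2 + x**3*y - x*y,
     [(0, 0), (0, 1), (1, 0), (-1, 0), (s, 2/5), (-s, 2/5)]),
    (lambda x, y: 3*x**4 + 3*x**2*y - y**3,
     [(0, 0), (1/2, -1/2), (-1/2, -1/2)]),
    (lambda x, y, z: x**2 + 6*x*y + y**2 - 3*y*z + 4*z**2 - 10*x - 5*y - 21*z,
     [(2, 1, 3)]),
    (lambda x, y, z: (x**2 + 2*y**2 + 3*z**2) * math.exp(-(x**2 + y**2 + z**2)),
     [(0, 0, 0), (1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]),
]

# Largest absolute partial derivative over all listed points.
max_residual = max(abs(g)
                   for f, points in problems
                   for p in points
                   for g in gradient(f, p))

print(max_residual)  # should be close to zero
```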

\(\epsilon\).5

Formulation

Compute the directional derivative of \(f(x,y) = xy^2 + x^3y\) at the point \((4,-2)\) in the direction of the vector \((1/\sqrt{10},3/\sqrt{10})\).

Proceed in two different ways:

first, using the definition of the directional derivative, write down a function \(g \colon \mathbb{R} \to \mathbb{R}\) of \(h\) given as a slice of the original function \(f(x,y)\) through the given point in the given direction; then differentiate this function and compute the derivative at \(h=0\)
second, use the gradient formula; and verify that the same answer is obtained

[Simon and Blume, 1994]: Exercise 14.20

Hint

Follow the example in the lecture notes

Solution

Denote \((x_0,y_0) = (4,-2)\), and \(v = (1/\sqrt{10},3/\sqrt{10})\).

Note that \(\|v\| = \sqrt{(1/\sqrt{10})^2+(3/\sqrt{10})^2} = \sqrt{(1+3^2)/10} = 1\).

First approach

Let \(g(h)\) denote the univariate function of \(h\) obtained by slicing the function \(f(x,y)\) through the point \((x_0,y_0)\) in the direction of the vector \(v\):

\[ g(h) = f(x_0+hv_1,y_0+hv_2) = (x_0+hv_1)(y_0+hv_2)^2 + (x_0+hv_1)^3(y_0+hv_2) \]

We have

\[\begin{split} \begin{array}{l} \frac{d g(h)}{dh} = v_1(y_0+hv_2)^2 + 2v_2(x_0+hv_1)(y_0+hv_2) + 3v_1(x_0+hv_1)^2(y_0+hv_2)+v_2(x_0+hv_1)^3 \\ \frac{d g(0)}{dh} = v_1 y_0^2 + 2v_2 x_0 y_0 + 3v_1 x_0^2 y_0 + v_2 x_0^3 \end{array} \end{split}\]

Evaluating the last expression at the given \((x_0,y_0)\) and \(v\) we get

\[\begin{split} \begin{array}{l} \frac{d g(0)}{dh} = (1/\sqrt{10}) (-2)^2 + 2 (3/\sqrt{10})4(-2) + 3 (1/\sqrt{10}) 4^2 (-2) + (3/\sqrt{10}) 4^3 = \\ = (4-48-96+192)/\sqrt{10} = 52/\sqrt{10} \end{array} \end{split}\]

Second approach

Compute the partial derivatives and the gradient

\[\begin{split} \begin{array}{l} \frac{\partial f(x,y)}{\partial x} = y^2+3x^2y \\ \frac{\partial f(x,y)}{\partial y} = 2xy+x^3 \\ \nabla f(x,y) = \left( y^2+3x^2y, 2xy+x^3 \right) \\ D_v f(x_0,y_0) = \nabla f(x_0,y_0) \cdot v = (y_0^2+3x_0^2y_0, 2x_0y_0+x_0^3) \cdot (1/\sqrt{10},3/\sqrt{10}) = \\ = ((-2)^2+3\cdot 4^2(-2), 2\cdot 4(-2)+4^3) \cdot (1/\sqrt{10},3/\sqrt{10}) = \\ = (4-96, -16+64) \cdot (1/\sqrt{10},3/\sqrt{10}) = \\ = (-92,48) \cdot (1/\sqrt{10},3/\sqrt{10}) = -92/\sqrt{10} + 48\cdot 3/\sqrt{10} = 52/\sqrt{10} \end{array} \end{split}\]
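The agreement of the two approaches can also be confirmed numerically. In this minimal sketch the slice \(g(h)\) is differentiated by a central difference (the step size is an assumption of the example), and the result is compared with the gradient formula and with the closed-form value \(52/\sqrt{10}\).

```python
import math

def f(x, y):
    """The function f(x, y) = x y^2 + x^3 y."""
    return x * y ** 2 + x ** 3 * y

x0, y0 = 4.0, -2.0
v = (1 / math.sqrt(10), 3 / math.sqrt(10))

def g(h):
    """Slice of f through (x0, y0) in the direction v."""
    return f(x0 + h * v[0], y0 + h * v[1])

# First approach: derivative of the slice at h = 0, approximated numerically.
step = 1e-5
slice_derivative = (g(step) - g(-step)) / (2 * step)

# Second approach: dot product of the analytic gradient with v.
grad = (y0 ** 2 + 3 * x0 ** 2 * y0, 2 * x0 * y0 + x0 ** 3)
gradient_formula = grad[0] * v[0] + grad[1] * v[1]

exact = 52 / math.sqrt(10)
print(slice_derivative, gradient_formula, exact)
```

The first number is an approximation, so it matches the other two only up to the finite-difference error.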
