🔬 Tutorial problems gamma \gamma

🔬 Tutorial problems gamma \(\gamma\)#

Note

This problems are designed to help you practice the concepts covered in the lectures. Not all problems may be covered in the tutorial, those left out are for additional practice on your own.

\(\gamma\).1#

Consider an \((n \times n)\) Vandermonde matrix [this one can be named :)] of the form

\[\begin{split} V = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_n \\ x_1^2 & x_2^2 & \cdots & x_n^2 \\ \vdots & \vdots & \ddots & \vdots \\ x_1^{n-1} & x_2^{n-1} & \cdots & x_n^{n-1} \end{bmatrix} \end{split}\]

Show that the determinant of \(V\) is given by

\[ \det(V) = \Pi_{j<i \leqslant n}(x_i-x_j) \]

for the cases \(n=2\), \(n=3\) and \(n=4\)

Properties of the determinants help in finding an elegant solution.

Using the fact that the determinant is invariant to Gaussian elementary row operations we perform the following derivation:

\[\begin{split} \det \begin{pmatrix} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_n \\ x_1^2 & x_2^2 & \cdots & x_n^2 \\ \vdots & \vdots & \ddots & \vdots \\ x_1^{n-1} & x_2^{n-1} & \cdots & x_n^{n-1} \end{pmatrix} = \det \begin{pmatrix} 1 & 0 & \cdots & 0 \\ x_1 & x_2-x_1 & \cdots & x_n-x_1 \\ x_1^2 & x_2^2-x_1^2 & \cdots & x_n^2-x_1^2 \\ \vdots & \vdots & \ddots & \vdots \\ x_1^{n-1} & x_2^{n-1}-x_1^{n-1} & \cdots & x_n^{n-1}-x_1^{n-1} \end{pmatrix} = \end{split}\]

(expand along the first row)

\[\begin{split} = \det \begin{pmatrix} x_2-x_1 & x_3-x_1 & \cdots & x_n-x_1 \\ x_2^2-x_1^2 & x_3^2-x_1^2 & \cdots & x_n^2-x_1^2 \\ \vdots & \vdots & \ddots & \vdots \\ x_2^{n-1}-x_1^{n-1} & x_3^{n-1}-x_1^{n-1} & \cdots & x_n^{n-1}-x_1^{n-1} \end{pmatrix} = \end{split}\]

(take out common factor from each row)

\[\begin{split} = (x_2-x_1)(x_3-x_1)\dots(x_n-x_1) \det \begin{pmatrix} 1 & 1 & \cdots & 1 \\ \frac{x_2^2-x_1^2}{x_2-x_1} & \frac{x_3^2-x_1^2}{x_3-x_1} & \cdots & \frac{x_n^2-x_1^2}{x_n-x_1} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{x_2^{n-1}-x_1^{n-1}}{x_2-x_1} & \frac{x_3^{n-1}-x_1^{n-1}}{x_3-x_1} & \cdots & \frac{x_n^{n-1}-x_1^{n-1}}{x_n-x_1} \\ \end{pmatrix} \end{split}\]

It is not hard to show (by long division of polynomials and mathematical induction) that

\[ \frac{a^{k}-b^{k}}{a-b} =a^{k-1}+a^{k-2}b + \dots + ab^{k-2} + b^{k-1} =\sum_{i=1}^{k} a^{k-i}b^{i-1} \]

Continuing the derivation:

\[\begin{split} = \prod_{i=2}^n (x_i-x_1) \det \begin{pmatrix} 1 & \cdots & 1 \\ x_2+x_1 & \cdots & x_n+x_1 \\ \vdots & \ddots & \vdots \\ \sum_{i=1}^{n-1} x_2^{n-1-i}x_1^{i-1} & \cdots & \sum_{i=1}^{n-1} x_n^{n-1-i}x_1^{i-1} \end{pmatrix} = \end{split}\]

We can then repeat the same procedure:

  • subtract the first row from all other

  • notice regularities in the polynomials

  • expand the determinant along the first row

  • take out common factors from each row

It is then obvious how to prove the main formula by mathematical induction.

\(n=2\)

\[\begin{split} \det \begin{pmatrix} 1 & 1 \\ x & y \end{pmatrix} =y-x \end{split}\]

\(n=3\)

\[\begin{split} \det \begin{pmatrix} 1 & 1 & 1 \\ x & y & z \\ x^2 & y^2 & z^2 \end{pmatrix} = \det \begin{pmatrix} 1 & 0 & 0 \\ x & y-x & z-x \\ x^2 & y^2-x^2 & z^2-x^2 \end{pmatrix} = \end{split}\]
\[\begin{split} = \det \begin{pmatrix} y-x & z-x \\ y^2-x^2 & z^2-x^2 \end{pmatrix} = \end{split}\]
\[\begin{split} = (y-x)(z-x) \det \begin{pmatrix} 1 & 1 \\ x+y & x+z \end{pmatrix} = \end{split}\]
\[ = (y-x)(z-x)(x+z-x-y)=(y-x)(z-x)(z-y) \]

\(n=4\)

Follow the same scheme, using

  • \(x^2-y^2=(x-y)(x+y)\)

  • \(x^3-y^3=(x-y)(x^2+xy+y^2)\)

\(\gamma\).2#

Consider a function \(f : \mathbb{R}^N \ni x \mapsto x^{T}Bx \in \mathbb{R}\), where \(N \times N\) matrix \(B\) is square but not symmetric.

Show that the same function can be represented as \(x^{T}Ax\) where \(A\) is symmetric.

Given a square matrix \(M\), you can use the identity \(M = \tfrac{1}{2}(M+M') + \tfrac{1}{2}(M-M')\) where the first component is symmetric and the second is not symmetric.

If \(A\) and \(B\) are conformable for matrix multiplication, then \((AB)^{T} = B^T A^T\).

In general a matrix can be decomposed into a symmetric and an anti-symmetric part as follows:

\[ M=\frac{1}{2}(M+M^T)+\frac{1}{2}(M-M^T)=M_s+M_a \]

Note that the symmetric part is invariant to the transpose, \(M^T_s=M_s\) while the antisymmetric part changes its sign under transposition: \(M^T_a=-M_a\)

Let us examine the purely antisymmetric matrix

\[ q=x^T M_a x=(x^T M_a^T x)^T=-q \]

Which implies that \(q=0\) has to hold true for all \(x\)!

Then, it is clear that the \(A = \frac{1}{2}(B+B^T)\) solves the problem.

\(\gamma\).3#

In which direction should one move from a given point in order to increase the value of the function most rapidly:

  1. \(\quad\) \(f(x,y) = 4x^2y\) from the point \((2,3)\)

  2. \(\quad\) \(f(x,y) = y^2 e^{3x}\) from the point \((0,3)\)

Present your answer as a vector of length 1.

[Simon and Blume, 1994]: Exercises 14.18, 14.19

Review the definition and facts about the gradient of a multivariate functions.

The direction of most rapid ascent of the multivariate function is given by its gradient.

1.

\[\begin{split} \begin{array}{l} \partial f(x,y)/\partial x = 8xy \\ \partial f(x,y)/\partial y = 4x^2 \\ \nabla f(x,y) = (8xy, 4x^2) \\ \nabla f(2,3) = (48, 16) \end{array} \end{split}\]

We need to make sure that the direction vector had length 1, so we divide by its norm

\[ \|(48, 32)\| = \sqrt{48^2+16^2} = \sqrt{2^{8}3^{2}+2^8} = 2^4 sqrt{10} \]

The direction vector is then \((48, 16)/16\sqrt{10} = (3/\sqrt{10}, 1/\sqrt{10})\)

2.

\[\begin{split} \begin{array}{l} \partial f(x,y)/\partial x = 3 y^2 e^{3x} \\ \partial f(x,y)/\partial y = 2y e^{3x} \\ \nabla f(x,y) = (3 y^2 e^{3x}, 2y e^{3x}) = e^{3x} (3 y^2, 2y) \\ \nabla f(0,3) = (27, 6) \end{array} \end{split}\]

We need to make sure that the direction vector had length 1, so we divide by its norm

\[ \|(27, 6)\| = \sqrt{27^2+6^2} = \sqrt{3^{6}+3^2 2^2} = 3 \sqrt{81+4} \]

The direction vector is then \((27, 6)/3\sqrt{85} = (9/\sqrt{85}, 2/\sqrt{85})\)

\(\gamma\).4#

A critical point of a multivariate function is the point at which all partial derivatives are zero.

Compute the critical points of the following functions:

  1. \(\quad\) \(x^4+x^2-6xy + 3y^2\)

  2. \(\quad\) \(x^2-6xy+2y^2+10x+2y-5\)

  3. \(\quad\) \(xy^2+x^3y-xy\)

  4. \(\quad\) \(3x^4+3x^2y-y^3\)

  5. \(\quad\) \(x^2+6xy+y^2-3yz+4z^2-10x-5y-21z\)

  6. \(\quad\) \((x^2+2y^2+3z^2) e^{-(x^2+y^2+z^2)}\)

[Simon and Blume, 1994]: Exercises 17.1, 17.2

Solution algorithm:

  • compute partial derivatives

  • solve the system of equations formed by the partial derivatives equalized to zeros

Correct answers:

  1. \((0,0), (1,1), (-1,-1)\)

  2. \((13/7,16/7)\)

  3. \((0,0), (0,1), (1,0), (-1,0), (1/\sqrt{5},2/5), (-1/\sqrt{5},2/5)\)

  4. \((0,0), (1/2,-1/2), (-1/2,-1/2)\)

  5. \((2,1,3)\)

  6. \((0,0,0), (1,0,0), (-1,0,0), (0,1,0), (0,-1,0), (0,0,1), (0,0,-1)\)

\(\gamma\).5#

Consider a function \(f : \mathbb{R}^N \ni x \mapsto x'Ax \in \mathbb{R}\), where \(N \times N\) matrix \(A\) is symmetric.

Using the product rule of multivariate calculus, derive the gradient and Hessian of \(f\). Make sure that all multiplied vectors and matrices are conformable.

You can assume that \(x\) is a column vector, and that any vector function of \(x\) is also a column vector.

A possible answer:

Represent the quadratic form as a dot product of two functions \(f(x) = x' A x = h(x) \cdot g(x)\), where \(h(x) = x\) and \(g(x) = A x\). Then \(Dh(x) = I\) (identity matrix) and \(Dg(x) = A\).

The last Jacobian can be easity derived by representing matrix multiplication as a linear combination of columns. Differentiating with respect to each element of \(x\) then yields a Jacobian composed of columns of matrix \(A\), therefore equal to it.

\[\begin{split} g(x) = A x = \left( \begin{array}{ccc} a_{11} & \cdots & a_{1N} \\ \vdots & \ddots & \vdots \\ a_{N1} & \cdots & a_{NN} \end{array} \right) \left( \begin{array}{c} x_1 \\ \vdots \\ x_N \end{array} \right) = \left( \begin{array}{c} a_{11} \\ \vdots \\ a_{N1} \end{array} \right) x_1 + \cdots \left( \begin{array}{c} a_{1N} \\ \vdots \\ a_{NN} \end{array} \right) x_N \end{split}\]

Applying the dot product rule of differentiation we have

\[\begin{split} D(h \cdot g)(x) = [h(x)]^T Dg(x) + [g(x)]^T Dh(x) = \\ = x^T A + [Ax]^T I = x^T A + x^T A = 2 x^T A = 2 [A x]^T \end{split}\]

The last transformation is transpose of a product + utilizing symmetry of \(A\).

The final answer is the \(1 \times N\) matrix (row vector)

\[ Df(x) = f(x) = 2 x^T A = 2 [A x]^T \]

\(\gamma\).6#

Compute directional derivative of \(f(x,y) = xy^2 + x^3y\) at the point \((4,-2)\) in the direction of the vector \((1/\sqrt{10},3/\sqrt{10})\).

Proceed in two different ways:

  • first, using the definition of the directional derivative, write down a function \(g \colon \mathbb{R} \to \mathbb{R}\) of \(h\) given as a slice of the original function \(f(x,y)\) through the given point in the given direction; then differentiate this function and compute the derivative at \(h=0\)

  • second, use the gradient formula; and verify that the same answer is obtained

[Simon and Blume, 1994]: Exercise 14.20

Follow the example in the lecture notes.

Denote \((x_0,y_0) = (4,-2)\), and \(v = (1/\sqrt{10},3/\sqrt{10})\).

Note that \(\|v\| = \sqrt{(1/\sqrt{10})^2+(3/\sqrt{10})^2} = (1+3^2)/10 = 1\).

First approach

Let \(g(h)\) denote the univariate function of \(h\) obtained by slicing the function \(f(x,y)\) through the point \((x_0,y_0)\) in the direction of the vector \(v\):

\[ g(h) = f(x_0+hv_1,y_0+hv_2) = (x_0+hv_1)(y_0+hv_2)^2 + (x_0+hv_1)^3(y_0+hv_2) \]

We have

\[\begin{split} \begin{array}{l} \frac{d g(h)}{dh} = v_1(y_0+hv_2)^2 + 2v_2(x_0+hv_1)(y_0+hv_2) + 3v_1(x_0+hv_1)^2(y_0+hv_2)+v_2(x_0+hv_1)^3 \\ \frac{d g(0)}{dh} = v_1 y_0^2 + 2v_2 x_0 y_0 + 3v_1 x_0^2 y_0 + v_2 x_0^3 \end{array} \end{split}\]

Evaluating the last expression at the given \((x_0,y_0)\) and \(v\) we get

\[\begin{split} \begin{array}{l} \frac{d g(0)}{dh} = (1/\sqrt{10}) (-2)^2 + 2 (3/\sqrt{10})4(-2) + 3 (1/\sqrt{10}) 4^2 (-2) + (3/\sqrt{10}) 4^3 = \\ = (4-48-96+192)/\sqrt{10} = 52/\sqrt{10} \end{array} \end{split}\]

Second approach

Compute the partial derivatives and the gradient

\[\begin{split} \begin{array}{l} \frac{\partial f(x,y)}{\partial x} = y^2+3x^2y \\ \frac{\partial f(x,y)}{\partial y} = 2xy+x^3 \\ \nabla f(x,y) = \left( y^2+3x^2y, 2xy+x^3 \right) \\ D_v f(x_0,y_0) = \nabla f(x_0,y_0) \cdot v = (y_0^2+3x_0^2y_0, 2x_0y_0+x_0^3) \cdot (1/\sqrt{10},3/\sqrt{10}) = \\ = ((-2)^2+3\cdot 4^2(-2), 2\cdot 4(-2)+4^3) \cdot (1/\sqrt{10},3/\sqrt{10}) = \\ = (4-96, -16+64) \cdot (1/\sqrt{10},3/\sqrt{10}) = \\ = (-92,48) \cdot (1/\sqrt{10},3/\sqrt{10}) = -92/\sqrt{10} + 48\cdot 3/\sqrt{10} = 52/\sqrt{10} \end{array} \end{split}\]