🔬 Tutorial problems theta

\(\theta\).1

Determine the definiteness of the quadratic forms defined by the following matrices, using either Sylvester’s criterion or the eigenvalue criterion. For the asymmetric matrices, use their symmetric part \(\frac{1}{2}(A+A^{T})\) when constructing the quadratic form (see exercise \(\epsilon\).1)

\[\begin{split} A_1 = \begin{pmatrix} 5 & 0 & 1 \\ 1 & 1 & 0 \\ -7 & 1 & 0 \end{pmatrix} \end{split}\]
\[\begin{split} A_2 = \begin{pmatrix} 5 & -2 & 3 \\ 0 & 4 & 0 \\ 0 & -1 & 3 \end{pmatrix} \end{split}\]
\[\begin{split} A_3 = \begin{pmatrix} 1 & 0 & 12 \\ 2 & -5 & 0 \\ 1 & 0 & 2 \end{pmatrix} \end{split}\]
\[\begin{split} A_4 = \begin{pmatrix} 2 & 2 & 2 \\ 2 & 2 & 2 \\ 2 & 2 & 2 \end{pmatrix} \end{split}\]
\[\begin{split} A_5 = \begin{pmatrix} -4 & 2 & -6 \\ 2 & -1 & 3 \\ -6 & 3 & -9 \end{pmatrix} \end{split}\]

\(A_1\)

Matrix \(A_1\) is not symmetric, so we consider its symmetric part

\[\begin{split} \frac{1}{2}(A_1+A_1^{T}) = \begin{pmatrix} 5 & \tfrac{1}{2} & -3 \\ \tfrac{1}{2} & 1 & \tfrac{1}{2} \\ -3 & \tfrac{1}{2} & 0 \end{pmatrix} \end{split}\]

Applying Sylvester’s criterion, we first compute the leading principal minors:

\[ M_1 = \det(5) >0 \]
\[\begin{split} M_2 = \det \begin{pmatrix} 5 & \tfrac{1}{2} \\ \tfrac{1}{2} & 1 \end{pmatrix} = 5 - \tfrac{1}{4} >0 \end{split}\]
\[\begin{split} M_3 = \det \begin{pmatrix} 5 & \tfrac{1}{2} & -3 \\ \tfrac{1}{2} & 1 & \tfrac{1}{2} \\ -3 & \tfrac{1}{2} & 0 \end{pmatrix} = 0-\tfrac{3}{4}-\tfrac{3}{4}-9-\tfrac{5}{4} <0 \end{split}\]

This pattern fits neither the definite nor the semi-definite patterns of Sylvester’s criterion, therefore \(A_1\) is indefinite.

You can also verify numerically that the eigenvalues have mixed signs.
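For instance, a quick numerical check (a minimal sketch, assuming NumPy is available):

```python
import numpy as np

# Symmetric part of A_1
A1 = np.array([[5, 0, 1],
               [1, 1, 0],
               [-7, 1, 0]], dtype=float)
S1 = (A1 + A1.T) / 2

# eigvalsh is meant for symmetric matrices and returns real eigenvalues
eigs = np.linalg.eigvalsh(S1)
print(eigs)  # eigenvalues of both signs => the form is indefinite
```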

\(A_2\)

Matrix \(A_2\) is not symmetric, so we consider its symmetric part

\[\begin{split} \frac{1}{2}(A_2+A_2^{T}) = \begin{pmatrix} 5 & -1 & \tfrac{3}{2} \\ -1 & 4 & -\tfrac{1}{2} \\ \tfrac{3}{2} & -\tfrac{1}{2} & 3 \end{pmatrix} \end{split}\]

Applying Sylvester’s criterion, we first compute the leading principal minors:

\[ M_1 = \det(5) >0 \]
\[\begin{split} M_2 = \det \begin{pmatrix} 5 & -1 \\ -1 & 4 \end{pmatrix} = 20 - 1 >0 \end{split}\]
\[\begin{split} M_3 = \det \begin{pmatrix} 5 & -1 & \tfrac{3}{2} \\ -1 & 4 & -\tfrac{1}{2} \\ \tfrac{3}{2} & -\tfrac{1}{2} & 3 \end{pmatrix} = 60 + \tfrac{3}{4} + \tfrac{3}{4} - 9 - 3 - \tfrac{5}{4} >0 \end{split}\]

Hence, by Sylvester’s criterion, \(A_2\) is positive definite.

You can also verify numerically that all eigenvalues are strictly positive.

\(A_3\)

Matrix \(A_3\) is not symmetric, so we consider its symmetric part

\[\begin{split} \frac{1}{2}(A_3+A_3^{T}) = \begin{pmatrix} 1 & 1 & \tfrac{13}{2} \\ 1 & -5 & 0 \\ \tfrac{13}{2} & 0 & 2 \end{pmatrix} \end{split}\]

Applying Sylvester’s criterion, we first compute the leading principal minors:

\[ M_1 = \det(1) >0 \]
\[\begin{split} M_2 = \det \begin{pmatrix} 1 & 1 \\ 1 & -5 \end{pmatrix} = -5 - 1 < 0 \end{split}\]

This pattern fits neither the definite nor the semi-definite patterns of Sylvester’s criterion, therefore \(A_3\) is indefinite.

You can also verify numerically that the eigenvalues have mixed signs.

\(A_4\)

Matrix \(A_4\) is symmetric; let us assess its definiteness by computing the eigenvalues

\[\begin{split} \det \begin{pmatrix} 2-\lambda & 2 & 2 \\ 2 & 2-\lambda & 2 \\ 2 & 2 & 2-\lambda \end{pmatrix} = \det \begin{pmatrix} -\lambda & 2 & 2 \\ \lambda & 2-\lambda & 2 \\ 0 & 2 & 2-\lambda \end{pmatrix} = \end{split}\]

(subtracting second column from the first, see the section on the properties of the determinants)

\[ = -\lambda \big( (2-\lambda)^2 -4 \big) -\lambda(4 -2\lambda -4) = \]
\[ = -\lambda \big(\lambda^2 - 4\lambda + 4-4 -2\lambda \big) = -\lambda^2 (\lambda - 6) \]

Hence, the eigenvalues are \(0\) (with multiplicity two) and \(6\), therefore \(A_4\) is positive semi-definite.

Sylvester’s criterion agrees with this conclusion:

  • all principal minors of order 1 are positive: \(\det(2)>0\)

  • all other principal minors are zero

This is consistent with the positive semi-definite pattern of Sylvester’s criterion.
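A quick numerical cross-check of the eigenvalues (a sketch assuming NumPy):

```python
import numpy as np

A4 = np.full((3, 3), 2.0)      # the all-twos matrix A_4
eigs = np.linalg.eigvalsh(A4)  # returned in ascending order
print(np.round(eigs, 10))      # 0 (with multiplicity two) and 6
```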

\(A_5\)

Matrix \(A_5\) is symmetric; let us again assess its definiteness by computing the eigenvalues

\[\begin{split} \det \begin{pmatrix} -4-\lambda & 2 & -6 \\ 2 & -1-\lambda & 3 \\ -6 & 3 & -9-\lambda \end{pmatrix} = \end{split}\]
\[\begin{split} = -(\lambda+4)(\lambda+1)(\lambda+9) -36 -36 +\\ + 36(\lambda+1) + 4(\lambda+9) +9(\lambda+4) =\\ = -\lambda^3 -14\lambda^2 - 49\lambda - 36 -72 +\\ +36\lambda + 36 + 4\lambda + 36 + 9\lambda + 36 =\\ = -\lambda^3 -14\lambda^2 = -\lambda^2(\lambda + 14) \end{split}\]

The eigenvalues are \(0\) (with multiplicity two) and \(-14\), hence \(A_5\) is negative semi-definite.

Sylvester’s criterion again agrees with this conclusion:

  • all principal minors of order 1 are negative:

\[\begin{split} \det(-4)<0\\ \det(-1)<0\\ \det(-9)<0 \end{split}\]
  • all other principal minors are zero:

\[\begin{split} \det \begin{pmatrix} -4 & 2 \\ 2 & -1 \end{pmatrix} =0\\ \det \begin{pmatrix} -4 & -6 \\ -6 & -9 \end{pmatrix} =0\\ \det \begin{pmatrix} -1 & 3 \\ 3 & -9 \end{pmatrix} =0\\ \det \begin{pmatrix} -4 & 2 & -6 \\ 2 & -1 & 3 \\ -6 & 3 & -9 \end{pmatrix} =-36 \cdot 3 +36 \cdot 3 =0 \end{split}\]

This is consistent with the negative semi-definite pattern of Sylvester’s criterion.
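Numerically (a sketch assuming NumPy), the eigenvalues come out as computed by hand. Note that \(A_5 = -vv^{T}\) for \(v = (2,-1,3)^{T}\), so the only nonzero eigenvalue is \(-\|v\|^2 = -14\):

```python
import numpy as np

A5 = np.array([[-4, 2, -6],
               [2, -1, 3],
               [-6, 3, -9]], dtype=float)

v = np.array([2.0, -1.0, 3.0])
assert np.allclose(A5, -np.outer(v, v))  # A_5 is minus a rank-one matrix

eigs = np.linalg.eigvalsh(A5)
print(np.round(eigs, 10))  # -14 and 0 (with multiplicity two)
```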

\(\theta\).2

This exercise takes you on a tour of a binary logit model and its properties.

Consider a model in which a decision maker makes a choice between \(J=2\) alternatives, each of which has a scalar characteristic \(x_j \in \mathbb{R}\), \(j=1,2\). The econometrician observes data on these characteristics, the choice made by the decision maker \(y_i \in \{0,1\}\), and an attribute of the decision maker, \(z_i \in \mathbb{R}\). A positive value of \(y_i\) denotes that the first alternative was chosen. The data are indexed by \(i\) and contain \(N\) observations, i.e. \(i \in \{1,\dots,N\}\).

To rationalize the data the econometrician assumes that the utility of each alternative is given by a scalar product of a vector of parameters \(\beta \in \mathbb{R}^2\) and a vector function \(h \colon \mathbb{R}^2 \to \mathbb{R}^2\) of alternative and decision maker attributes. Let

\[\begin{split} h \colon \left( \begin{array}{c} x \\ z \end{array} \right) \mapsto \left( \begin{array}{l} x \\ xz \end{array} \right) \end{split}\]

In line with the random utility model, the econometrician also assumes that the utility of each alternative contains an additively separable random component with an appropriately centered type I extreme value distribution, so that the choice probabilities of the two alternatives are given by a vector function \(p \colon \mathbb{R}^2 \to (0,1)^2 \subset \mathbb{R}^2\)

\[\begin{split} p \colon \left( \begin{array}{c} u_1 \\ u_2 \end{array} \right) \mapsto \left( \begin{array}{c} \frac{\exp(u_1)}{\exp(u_1) + \exp(u_2)}\\ \frac{\exp(u_2)}{\exp(u_1) + \exp(u_2)} \end{array} \right) \end{split}\]

In order to estimate the vector of parameters of the model \(\beta\), the econometrician maximizes the likelihood of observing the data \(D = \big(\{x_j\}_{j \in \{1,2\}},\{z_i,y_i\}_{i \in \{1,\dots,N\}}\big)\). The log-likelihood function \(\log L \colon \mathbb{R}^{2+J+2N} \to \mathbb{R}\) is given by

\[ \log L(\beta,D) = \sum_{i=1}^N \ell_i(\beta,x_1,x_2,z_i,y_i), \]

where the individual log-likelihood contribution is given by a scalar product function \(\ell_i \colon \mathbb{R}^6 \to \mathbb{R}\)

\[\begin{split} \ell_i(\beta,x_1,x_2,z_i,y_i) = \left( \begin{array}{l} y_i \\ 1-y_i \end{array} \right) \cdot \log\left(p \left( \begin{array}{l} \beta \cdot h(x_1,z_i) \\ \beta \cdot h(x_2,z_i) \end{array} \right) \right) \end{split}\]
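The model pieces above can be sketched in a few lines of NumPy; all data values below (`beta`, `x1`, `x2`, `z`, `y`) are hypothetical, and the function names are mine:

```python
import numpy as np

def h(x, z):
    # feature map h: (x, z) -> (x, x z)
    return np.array([x, x * z])

def p(u):
    # logit choice probabilities (softmax), shifted by max(u) for stability
    e = np.exp(u - u.max())
    return e / e.sum()

def ell(beta, x1, x2, zi, yi):
    # individual log-likelihood contribution l_i
    probs = p(np.array([beta @ h(x1, zi), beta @ h(x2, zi)]))
    return yi * np.log(probs[0]) + (1 - yi) * np.log(probs[1])

def logL(beta, x1, x2, z, y):
    return sum(ell(beta, x1, x2, zi, yi) for zi, yi in zip(z, y))

beta = np.array([0.5, -0.2])    # hypothetical parameter vector
x1, x2 = 1.0, 2.0               # hypothetical alternative characteristics
z = np.array([0.3, 1.1, -0.4])  # hypothetical decision-maker attributes
y = np.array([1, 0, 1])         # hypothetical observed choices
print(logL(beta, x1, x2, z, y)) # a negative number (sum of log-probabilities)
```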

Assignments:

  1. Write down the optimization problem the econometrician is solving. Explain the meaning of each part.

    • What are the variables the econometrician has control over in the estimation exercise?

    • What variables should be treated as parameters of the optimization problem?

  2. Elaborate on whether the solution can be guaranteed to exist.

    • What theorem should be applied?

    • What conditions of the theorem are met?

    • What conditions of the theorem are not met?

  3. Derive the gradient and Hessian of the log-likelihood function. Make sure that all multiplied vectors and matrices are conformable.

  4. Derive conditions under which the Hessian of the log-likelihood function is negative definite.

  5. Derive conditions under which the likelihood function has a unique maximizer (and thus the logit model has a unique maximum likelihood estimator).

1.

The optimization problem is:

\[ \max \limits_{\beta \in \mathbb{R}^2} \sum\limits_{i=1}^{n} l_i(\beta, x_1, x_2, z_i, y_i). \]
  • We can control \(\beta = {[\beta_1, \beta_2]^{\prime}}\) (coefficients to be estimated).

  • We treat \(x_1, x_2, {(z_i, y_i)}_{i=1}^n\) as parameters (data).

2.

We can try to apply the Weierstrass (extreme value) theorem.

  • The objective function is continuous.

  • But the domain is not compact (\(\mathbb{R}^2\) is closed but unbounded), so the theorem does not apply directly and existence of a maximizer is not guaranteed.

3.

Let \(D_{\beta} l_i\) denote the Jacobian of \(l_i\) with respect to \(\beta\), and \(H_{\beta} l_i\) the Hessian of \(l_i\) with respect to \(\beta\).

Notice that \(l_i(\beta) = l_i\Big(p_i\big(u_i(\beta)\big)\Big)\); then by the chain rule:

\[ D_{\beta}l_i =D_{p_i}l_i ~D_{u_i}p_i ~D_{\beta}u_i. \]

We calculate the three terms on the r.h.s. one by one:

\[\begin{split} \begin{array}{ll} D_{p_i} l_i &= \left[\begin{array}{cc} y_i/p_{i1} & (1-y_i)/p_{i2}\\ \end{array}\right], \\ D_{u_i} p_i &= \left[\begin{array}{cc} p_{i1}p_{i2} & -p_{i1} p_{i2}\\ -p_{i1} p_{i2} & p_{i1}p_{i2}\\ \end{array}\right] = p_{i1} p_{i2} \left[\begin{array}{cc} 1 & -1\\ -1 & 1\\ \end{array}\right], \\ D_{\beta} u_i &= \left[\begin{array}{cc} x_1 & x_1 z_i\\ x_2 & x_2 z_i\\ \end{array}\right]. \end{array} \end{split}\]

Thus,

\[\begin{split} \begin{array}{ll} D_\beta l_i &= \left[\begin{array}{cc} y_i/p_{i1} & (1-y_i)/p_{i2}\\ \end{array}\right] p_{i1} p_{i2} \left[\begin{array}{cc} 1 & -1\\ -1 & 1\\ \end{array}\right] \left[\begin{array}{cc} x_1 & x_1 z_i\\ x_2 & x_2 z_i\\ \end{array}\right],\\ &= (x_1 -x_2) \cdot \left[\begin{array}{cc} y_i p_{i2} - (1-y_i)p_{i1} & \big(y_i p_{i2} - (1-y_i)p_{i1}\big) z_i\\ \end{array}\right]. \end{array} \end{split}\]

The Jacobian (gradient) of the MLE objective function is:

\[\begin{split} D_\beta l = \sum\limits_{i=1}^{n} D_\beta l_i = (x_1 -x_2) \cdot \sum\limits_{i=1}^{n} \left[\begin{array}{cc} y_i p_{i2} - (1-y_i)p_{i1} & \big(y_i p_{i2} - (1-y_i)p_{i1}\big) z_i\\ \end{array}\right] \end{split}\]

We set \(g_i(\beta) \equiv (D_\beta l_i)^{\prime}\) (a new symbol, to avoid a clash with the feature map \(h\)). Then \(g_i(\beta)= g_i\Big(p_i\big(u_i(\beta)\big)\Big)\). Thus, applying the chain rule again, the Hessian of \(l_i\) with respect to \(\beta\) is:

\[\begin{split} \begin{array}{ll} H_\beta l_i &= D_{\beta} g_i(\beta)\\ &= D_{p_i} g_i(p_i) ~ D_{u_i} p_i ~ D_{\beta} u_i\\ &= (x_1 - x_2) \left[\begin{array}{cc} -(1-y_i) & y_i \\ -(1-y_i)z_i & y_i z_i \\ \end{array}\right] p_{i1} p_{i2} \left[\begin{array}{cc} 1 & -1\\ -1 & 1\\ \end{array}\right] \left[\begin{array}{cc} x_1 & x_1 z_i\\ x_2 & x_2 z_i\\ \end{array}\right] \\ & = (x_1 - x_2)^2 p_{i1} p_{i2} \left[\begin{array}{cc} -1 & -z_i\\ -z_i & -z_i^2\\ \end{array}\right] \end{array} \end{split}\]

Thus, the Hessian of the MLE objective function is:

\[\begin{split} \begin{array}{ll} H_\beta l &= \sum\limits_{i=1}^{n} H_\beta l_i \\ &= (x_1 - x_2)^2 \sum\limits_{i=1}^{n} p_{i1} p_{i2} \left[\begin{array}{cc} -1 & -z_i\\ -z_i & -z_i^2\\ \end{array}\right]\\ & = (x_1 - x_2)^2 \left[\begin{array}{cc} - \sum\limits_{i=1}^{n} p_{i1} p_{i2} & - \sum\limits_{i=1}^{n} p_{i1} p_{i2} z_i \\ - \sum\limits_{i=1}^{n} p_{i1} p_{i2} z_i & - \sum\limits_{i=1}^{n} p_{i1} p_{i2} z_i^2 \\ \end{array}\right] \end{array} \end{split}\]
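The analytic Hessian can be checked against finite differences of the log-likelihood. This is a sketch with made-up data, assuming NumPy; the function names are mine:

```python
import numpy as np

def probs(beta, x1, x2, zi):
    # choice probabilities (p_i1, p_i2) for observation i
    u = np.array([beta[0] * x1 + beta[1] * x1 * zi,
                  beta[0] * x2 + beta[1] * x2 * zi])
    e = np.exp(u - u.max())
    return e / e.sum()

def loglik(beta, x1, x2, z, y):
    ll = 0.0
    for zi, yi in zip(z, y):
        p1, p2 = probs(beta, x1, x2, zi)
        ll += yi * np.log(p1) + (1 - yi) * np.log(p2)
    return ll

def hessian_analytic(beta, x1, x2, z):
    # H = -(x1 - x2)^2 * sum_i p_i1 p_i2 [[1, z_i], [z_i, z_i^2]]
    S = np.zeros((2, 2))
    for zi in z:
        p1, p2 = probs(beta, x1, x2, zi)
        S += p1 * p2 * np.array([[1.0, zi], [zi, zi ** 2]])
    return -(x1 - x2) ** 2 * S

def hessian_numeric(beta, x1, x2, z, y, eps=1e-4):
    # central second differences of the log-likelihood
    H = np.zeros((2, 2))
    for j in range(2):
        for k in range(2):
            ej, ek = np.eye(2)[j] * eps, np.eye(2)[k] * eps
            H[j, k] = (loglik(beta + ej + ek, x1, x2, z, y)
                       - loglik(beta + ej - ek, x1, x2, z, y)
                       - loglik(beta - ej + ek, x1, x2, z, y)
                       + loglik(beta - ej - ek, x1, x2, z, y)) / (4 * eps ** 2)
    return H

beta = np.array([0.3, -0.1])
x1, x2 = 1.0, 2.0
z = np.array([0.5, 1.5, -1.0])
y = np.array([1, 0, 1])
Ha = hessian_analytic(beta, x1, x2, z)
Hn = hessian_numeric(beta, x1, x2, z, y)
print(np.allclose(Ha, Hn, atol=1e-4))  # the two Hessians agree
print(np.linalg.eigvalsh(Ha))          # both eigenvalues negative here
```

Note that the analytic Hessian does not depend on \(y_i\), which the finite-difference check confirms.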

4.

Let us find the conditions under which the Hessian is negative definite. Notice that \((x_1 - x_2)^2 > 0\) if and only if \(x_1 \neq x_2\), and \(p_{i1} p_{i2} > 0\) for all \(i\). Thus, if \(x_1 \neq x_2\), the first leading principal minor \(-(x_1-x_2)^2 \sum_i p_{i1}p_{i2}\) is automatically negative, and we only need to check the condition \(\det(H_\beta l) > 0\). Notice that:

\[\begin{split} \begin{array}{ll} & \quad \quad \quad \det(H_\beta l) > 0\\ &\iff (\sum_i p_{i1}p_{i2})(\sum_i p_{i1} p_{i2} z_i^2) - (\sum_i p_{i1} p_{i2} z_i)^2 > 0 \\ & \iff \sum_{i, j} p_{i1}p_{i2}p_{j1}p_{j2}z_j^2 - \sum_{i, j} p_{i1}p_{i2}p_{j1}p_{j2}z_i z_j > 0\\ & \iff \sum_{i > j} p_{i1}p_{i2}p_{j1}p_{j2}(z_i^2 + z_j^2) - \sum_{i > j} p_{i1}p_{i2}p_{j1}p_{j2}(2 z_i z_j) > 0\\ & \iff \sum_{i > j} p_{i1}p_{i2}p_{j1}p_{j2}(z_i - z_j)^2 > 0\\ & \iff z_i \neq z_j \text{ for some } i, j. \end{array} \end{split}\]

Thus, we get a sufficient condition for the Hessian to be negative definite, and hence for a unique maximizer (if one exists):

  • \(x_1 \neq x_2\),

  • \(z_i \neq z_j\) for some \(i, j\).

This condition is also necessary. Indeed, if \(x_1 = x_2\), the two alternatives yield identical utilities, so \(\beta\) has no impact on the likelihood function; and if \(z_i = z_j\) for all \(i, j\), we cannot distinguish \(\beta_1\) from \(\beta_2\) (this is called strict multicollinearity in econometrics).

Thus, the logit model has a unique ML estimator if and only if \(x_1 \neq x_2\), and \(z_i \neq z_j\) for some \(i, j\).

The intuition is that if the model can be estimated (identifiable in econometrics jargon), the two alternatives cannot be the same (\(x_1 \neq x_2\)), and at least two people in the data set should have different characteristics (\(z_i \neq z_j\) for some \(i, j\)).

5.

If the Hessian \(H_{\beta} l\) is negative definite for all \(\beta \in \mathbb{R}^2\), the MLE objective is strictly concave. A strictly concave function has at most one maximizer, so the first order conditions \(D_{\beta} l = 0\) have at most one solution, and any solution is the unique maximizer of the log-likelihood function. (Existence of a solution requires a separate argument, since \(\mathbb{R}^2\) is not compact; see part 2.)
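As an illustration, a Newton iteration built from the gradient and Hessian derived above converges to the unique \(\hat\beta\). This is a sketch on made-up, non-degenerate data (\(x_1 \neq x_2\), varying \(z_i\), mixed outcomes), assuming NumPy:

```python
import numpy as np

# Newton's method from the analytic gradient and Hessian derived above.
# The data (x1, x2, z, y) are hypothetical; x1 != x2 and the z_i vary.
def p_first(beta, x1, x2, zi):
    # probability of the first alternative: logistic in u1 - u2
    d = (x1 - x2) * (beta[0] + beta[1] * zi)
    return 1.0 / (1.0 + np.exp(-d))

def grad_hess(beta, x1, x2, z, y):
    g, H = np.zeros(2), np.zeros((2, 2))
    for zi, yi in zip(z, y):
        pi1 = p_first(beta, x1, x2, zi)
        pi2 = 1.0 - pi1
        w = (x1 - x2) * np.array([1.0, zi])
        g += (yi * pi2 - (1 - yi) * pi1) * w  # gradient of l_i
        H -= pi1 * pi2 * np.outer(w, w)       # Hessian of l_i
    return g, H

x1, x2 = 1.0, 2.0
z = np.array([0.5, 1.5, -1.0, 0.2])
y = np.array([1, 0, 1, 0])

beta = np.zeros(2)
for _ in range(25):  # Newton steps: beta <- beta - H^{-1} g
    g, H = grad_hess(beta, x1, x2, z, y)
    beta = beta - np.linalg.solve(H, g)

g, H = grad_hess(beta, x1, x2, z, y)
print(beta, np.linalg.norm(g))  # gradient norm is ~0 at the optimum
```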