🔬 Tutorial problems theta
\(\theta\).1
Determine the definiteness of the quadratic forms defined by the following matrices, using either Sylvester’s criterion or the eigenvalue criterion. For asymmetric matrices, use the symmetric part \(\frac{1}{2}(A+A^{T})\) when constructing the quadratic form (see exercise \(\epsilon\).1).
\(A_1\)
Matrix \(A_1\) is not symmetric, so we consider its symmetric part:
To apply Sylvester’s criterion, we first compute the leading principal minors:
This sign pattern fits neither the definite nor the semi-definite patterns of Sylvester’s criterion, therefore \(A_1\) is indefinite.
You can also verify numerically that the eigenvalues have different signs.
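The numerical check mentioned above can be sketched as follows. This is a minimal illustration, not the tutorial's own code: the matrix `A` is a hypothetical asymmetric example (it is not \(A_1\)), and the helper names `symmetric_part`, `leading_principal_minors` and `classify_by_eigenvalues` are introduced here for convenience.

```python
import numpy as np

def symmetric_part(a):
    """Return (A + A^T) / 2, which defines the same quadratic form as A."""
    a = np.asarray(a, dtype=float)
    return 0.5 * (a + a.T)

def leading_principal_minors(s):
    """Determinants of the top-left k x k blocks, k = 1, ..., n."""
    n = s.shape[0]
    return [float(np.linalg.det(s[:k, :k])) for k in range(1, n + 1)]

def classify_by_eigenvalues(s, tol=1e-10):
    """Classify a symmetric matrix by the signs of its eigenvalues.

    The zero matrix is reported as positive semi-definite, although it is
    also negative semi-definite.
    """
    eig = np.linalg.eigvalsh(s)  # eigenvalues of a symmetric matrix, ascending
    if np.all(eig > tol):
        return "positive definite"
    if np.all(eig < -tol):
        return "negative definite"
    if np.all(eig >= -tol):
        return "positive semi-definite"
    if np.all(eig <= tol):
        return "negative semi-definite"
    return "indefinite"

# Hypothetical asymmetric matrix, chosen only for illustration
A = np.array([[1.0, 4.0],
              [0.0, 1.0]])
S = symmetric_part(A)                 # [[1, 2], [2, 1]]
minors = leading_principal_minors(S)  # first minor 1, second minor 1 - 4 = -3
label = classify_by_eigenvalues(S)    # eigenvalues are -1 and 3

print(minors, label)
```

A positive first minor followed by a negative second minor matches neither definite pattern, and the eigenvalue check agrees.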
\(A_2\)
Matrix \(A_2\) is not symmetric, so we consider its symmetric part:
To apply Sylvester’s criterion, we first compute the leading principal minors:
Hence, by Sylvester’s criterion, \(A_2\) is positive definite.
You can also verify numerically that all eigenvalues are strictly positive.
\(A_3\)
Matrix \(A_3\) is not symmetric, so we consider its symmetric part:
To apply Sylvester’s criterion, we first compute the leading principal minors:
This sign pattern fits neither the definite nor the semi-definite patterns of Sylvester’s criterion, therefore \(A_3\) is indefinite.
You can also verify numerically that the eigenvalues have different signs.
\(A_4\)
Matrix \(A_4\) is symmetric, so we assess its definiteness by computing the eigenvalues:
(subtracting the second column from the first; see the section on the properties of determinants)
Hence, the eigenvalues are \(\{0,6\}\), therefore \(A_4\) is positive semi-definite.
Sylvester’s criterion agrees with this conclusion:
all principal minors of order 1 are positive: \(\det(2)>0\)
all other principal minors are zero
This is consistent with the positive semi-definite pattern of Sylvester’s criterion.
\(A_5\)
Matrix \(A_5\) is symmetric, so we again assess its definiteness by computing the eigenvalues:
The eigenvalues are \(\{0,-14\}\), hence \(A_5\) is negative semi-definite.
Sylvester’s criterion again agrees with this conclusion:
all principal minors of order 1 are negative:
all other principal minors are zero:
This is consistent with the negative semi-definite pattern of Sylvester’s criterion.
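The semi-definite versions of Sylvester’s criterion need all principal minors, not only the leading ones. The sketch below enumerates them for small matrices. The matrices `P`, `N` and `Q` are hypothetical examples chosen so that their eigenvalues are easy to read off (they are not claimed to equal \(A_4\) or \(A_5\)), and the function names are introduced here for illustration.

```python
from itertools import combinations
import numpy as np

def principal_minors(s):
    """Map each order k to the list of all principal minors of that order."""
    n = s.shape[0]
    out = {}
    for k in range(1, n + 1):
        out[k] = [
            float(np.linalg.det(s[np.ix_(idx, idx)]))  # keep rows and columns idx
            for idx in combinations(range(n), k)
        ]
    return out

def is_positive_semidefinite(s, tol=1e-10):
    """All principal minors are non-negative."""
    return all(m >= -tol for ms in principal_minors(s).values() for m in ms)

def is_negative_semidefinite(s, tol=1e-10):
    """Principal minors of order k are zero or have the sign of (-1)^k."""
    return all(
        (-1) ** k * m >= -tol
        for k, ms in principal_minors(s).items()
        for m in ms
    )

# Hypothetical examples
P = np.full((3, 3), 2.0)    # eigenvalues 0, 0, 6
N = np.full((2, 2), -7.0)   # eigenvalues 0, -14
Q = np.diag([0.0, -1.0])    # leading principal minors are 0 and 0

print(is_positive_semidefinite(P), is_negative_semidefinite(N))
print(is_positive_semidefinite(Q), is_negative_semidefinite(Q))
```

The matrix `Q` shows why the leading minors alone are not enough: both of them are zero, yet `Q` is negative semi-definite and not positive semi-definite, which only the non-leading minor \(-1\) reveals.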
\(\theta\).2
This exercise takes you on a tour of a binary logit model and its properties.
Consider a model in which a decision maker chooses between \(J=2\) alternatives, each of which has a scalar characteristic \(x_j \in \mathbb{R}\), \(j=1,2\). The econometrician observes data on these characteristics, the choice made by the decision maker \(y_i \in \{0,1\}\), and an attribute of the decision maker, \(z_i \in \mathbb{R}\). A value of \(y_i = 1\) denotes that the first alternative was chosen. The data are indexed by \(i\) and contain \(N\) observations, i.e. \(i \in \{1,\dots,N\}\).
To rationalize the data, the econometrician assumes that the utility of each alternative is given by the scalar product of a vector of parameters \(\beta \in \mathbb{R}^2\) and a vector function \(h \colon \mathbb{R}^2 \to \mathbb{R}^2\) of the alternative and decision maker attributes. Let
In line with the random utility model, the econometrician also assumes that the utility of each alternative contains an additively separable random component with an appropriately centered type I extreme value distribution, so that the choice probabilities for the two alternatives are given by a vector function \(p \colon \mathbb{R}^2 \to (0,1)^2 \subset \mathbb{R}^2\)
In order to estimate the vector of parameters of the model \(\beta\), the econometrician maximizes the likelihood of observing the data \(D = \big(\{x_j\}_{j \in \{1,2\}},\{z_i,y_i\}_{i \in \{1,\dots,N\}}\big)\). The log-likelihood function \(logL \colon \mathbb{R}^{2+J+2N} \to \mathbb{R}\) is given by
where the individual log-likelihood contribution is given by a scalar product function \(\ell_i \colon \mathbb{R}^6 \to \mathbb{R}\)
Assignments:
Write down the optimization problem the econometrician is solving. Explain the meaning of each part.
What are the variables the econometrician has control over in the estimation exercise?
What variables should be treated as parameters of the optimization problem?
Elaborate on whether the solution can be guaranteed to exist.
What theorem should be applied?
What conditions of the theorem are met?
What conditions of the theorem are not met?
Derive the gradient and Hessian of the log-likelihood function. Make sure that all multiplied vectors and matrices are conformable.
Derive conditions under which the Hessian of the log-likelihood function is negative definite.
Derive conditions under which the likelihood function has a unique maximizer (and thus the logit model has a unique maximum likelihood estimator).
Question F.3
1.
The optimization problem is:
The econometrician controls \(\beta = {[\beta_1, \beta_2]^{\prime}}\) (the coefficients to be estimated).
We treat \(x_1, x_2, {(z_i, y_i)}_{i=1}^N\) as parameters of the optimization problem (the data).
2.
We can try to apply the Weierstrass extreme value theorem.
The objective function is continuous.
However, the domain is not compact (\(\mathbb{R}^2\) is closed but unbounded), so the theorem does not guarantee that a maximizer exists.
3.
Denote by \(D_{\beta} l_i\) the Jacobian of \(l_i\) with respect to \(\beta\), and by \(H_{\beta} l_i\) the Hessian of \(l_i\) with respect to \(\beta\).
Notice that \(l_i(\beta) = l_i\Big(p_i\big(u_i(\beta)\big)\Big)\), so by the chain rule:
We calculate the three terms on the r.h.s. one by one:
Thus,
The Jacobian (gradient) of the MLE objective function is:
We set \(h(\beta) \equiv (D_\beta l_i)^{\prime}\). Then \(h(\beta)= h\Big(p_i\big(u_i(\beta)\big)\Big)\). Thus, applying the chain rule again, the Hessian of \(l_i\) with respect to \(\beta\) is:
Thus, the Hessian of the MLE objective function is:
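The derivation can be checked numerically with a short sketch. It assumes, purely for illustration, the utility specification \(u_{ij} = \beta_1 x_j + \beta_2 x_j z_i\), i.e. \(h(x_j, z_i) = (x_j, x_j z_i)^{\prime}\), which is consistent with the factors \((x_1-x_2)^2\) and \(p_{i1}p_{i2}\) appearing in the discussion of the Hessian. Under this assumption the gradient is \(\sum_i (y_i - p_{i1})(x_1-x_2)(1, z_i)^{\prime}\) and the Hessian is \(-\sum_i p_{i1}p_{i2}(x_1-x_2)^2 (1, z_i)^{\prime}(1, z_i)\). The data are simulated, and all function names are hypothetical.

```python
import numpy as np

def choice_prob_1(beta, x, z):
    """P(alternative 1) per observation under the ASSUMED utility
    u_ij = beta[0] * x_j + beta[1] * x_j * z_i."""
    d = (x[0] - x[1]) * (beta[0] + beta[1] * z)  # utility difference u_i1 - u_i2
    return 1.0 / (1.0 + np.exp(-d))

def loglik(beta, x, z, y):
    p1 = choice_prob_1(beta, x, z)
    return float(np.sum(y * np.log(p1) + (1.0 - y) * np.log(1.0 - p1)))

def gradient(beta, x, z, y):
    p1 = choice_prob_1(beta, x, z)
    w = np.column_stack([np.ones_like(z), z])    # N x 2, rows (1, z_i)
    return (x[0] - x[1]) * (w.T @ (y - p1))      # vector of length 2

def hessian(beta, x, z, y):
    p1 = choice_prob_1(beta, x, z)
    w = np.column_stack([np.ones_like(z), z])
    weights = p1 * (1.0 - p1)                    # p_i1 * p_i2
    return -((x[0] - x[1]) ** 2) * ((w.T * weights) @ w)  # 2 x 2 matrix

# Simulated data, for illustration only
rng = np.random.default_rng(0)
z = rng.normal(size=50)
y = (rng.random(50) < 0.5).astype(float)
x = np.array([1.0, -0.5])
beta = np.array([0.3, -0.2])

# Central finite differences as an independent check of the formulas
eps = 1e-6
I2 = np.eye(2)
g_num = np.array([
    (loglik(beta + eps * I2[k], x, z, y) - loglik(beta - eps * I2[k], x, z, y)) / (2 * eps)
    for k in range(2)
])
H_num = np.column_stack([
    (gradient(beta + eps * I2[k], x, z, y) - gradient(beta - eps * I2[k], x, z, y)) / (2 * eps)
    for k in range(2)
])

g = gradient(beta, x, z, y)
H = hessian(beta, x, z, y)
eigs = np.linalg.eigvalsh(H)
print(np.allclose(g, g_num, atol=1e-5), np.allclose(H, H_num, atol=1e-5), eigs)
```

Note that under this specification the Hessian does not depend on \(y_i\), only on the choice probabilities and the attributes.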
4.
Let us find the conditions under which the Hessian is negative definite. Notice that \((x_1 - x_2)^2 > 0\) if and only if \(x_1 \neq x_2\), and \(p_{i1} p_{i2} > 0\) for all \(i\). Thus, if \(x_1 \neq x_2\), the first leading principal minor of the Hessian is negative, and by Sylvester’s criterion we only need to check the condition \(\det(H_\beta l) > 0\). Notice that:
Thus, we get a sufficient condition for the Hessian to be negative definite, and hence for a unique maximizer:
\(x_1 \neq x_2\),
\(z_i \neq z_j\) for some \(i, j\).
It is easy to show that this condition is also necessary: if \(x_1 = x_2\), \(\beta_1\) has no impact on the likelihood function, and if \(z_i = z_j\) for all \(i, j\), we cannot distinguish \(\beta_1\) from \(\beta_2\) (this is called strict multicollinearity in econometrics).
Thus, the logit model has a unique ML estimator if and only if \(x_1 \neq x_2\) and \(z_i \neq z_j\) for some \(i, j\).
The intuition is that for the model to be estimable (identifiable in econometrics jargon), the two alternatives cannot be the same (\(x_1 \neq x_2\)), and at least two people in the data set must have different characteristics (\(z_i \neq z_j\) for some \(i, j\)).
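The two identification conditions can be illustrated numerically. The sketch below again assumes the illustrative specification \(u_{ij} = \beta_1 x_j + \beta_2 x_j z_i\) (an assumption, as are the function name `logit_hessian` and the toy data) and shows that the determinant of the Hessian is positive when both conditions hold and vanishes when either fails.

```python
import numpy as np

def logit_hessian(beta, x, z):
    """Hessian of the log-likelihood under the ASSUMED utility
    u_ij = beta[0] * x_j + beta[1] * x_j * z_i (it does not depend on y)."""
    d = (x[0] - x[1]) * (beta[0] + beta[1] * z)
    p1 = 1.0 / (1.0 + np.exp(-d))
    w = np.column_stack([np.ones_like(z), z])  # rows (1, z_i)
    return -((x[0] - x[1]) ** 2) * ((w.T * (p1 * (1.0 - p1))) @ w)

beta = np.array([0.2, 0.1])

# Both conditions hold: x_1 != x_2 and the z_i vary
det_ok = np.linalg.det(
    logit_hessian(beta, np.array([1.0, 0.0]), np.array([0.0, 1.0, 2.0]))
)

# All decision makers share the same attribute: columns (1, z_i) are collinear
det_same_z = np.linalg.det(
    logit_hessian(beta, np.array([1.0, 0.0]), np.array([1.5, 1.5, 1.5]))
)

# Identical alternatives: the whole Hessian is the zero matrix
det_same_x = np.linalg.det(
    logit_hessian(beta, np.array([1.0, 1.0]), np.array([0.0, 1.0, 2.0]))
)

print(det_ok, det_same_z, det_same_x)
```

In the second and third cases the Hessian is singular, so it is only negative semi-definite and the log-likelihood is flat along some direction in \(\beta\).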
5.
If the Hessian \(H_{\beta} l\) is negative definite for all \(\beta \in \mathbb{R}^2\), then \(\det (H_{\beta} l)> 0\) always holds, so by the inverse function theorem the gradient map \(D_{\beta} l\) is locally invertible around any solution of the first order conditions \(D_{\beta} l = 0\).
Moreover, if the Hessian is negative definite, then the MLE objective is strictly concave, so it has at most one maximizer: whenever the first order conditions have a solution, that solution is the unique maximizer of the log-likelihood function.
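Because the objective is strictly concave under the conditions above, Newton's method is a natural way to solve the first order conditions. The following is a minimal sketch under the same illustrative assumption \(u_{ij} = \beta_1 x_j + \beta_2 x_j z_i\), with simulated data and hypothetical names (`beta_true`, `grad`, `hess`); it is not the estimation routine of any particular library.

```python
import numpy as np

def prob_1(beta, x, z):
    """P(alternative 1) under the ASSUMED utility u_ij = b1 * x_j + b2 * x_j * z_i."""
    return 1.0 / (1.0 + np.exp(-(x[0] - x[1]) * (beta[0] + beta[1] * z)))

def grad(beta, x, z, y):
    w = np.column_stack([np.ones_like(z), z])
    return (x[0] - x[1]) * (w.T @ (y - prob_1(beta, x, z)))

def hess(beta, x, z):
    p1 = prob_1(beta, x, z)
    w = np.column_stack([np.ones_like(z), z])
    return -((x[0] - x[1]) ** 2) * ((w.T * (p1 * (1.0 - p1))) @ w)

# Simulate choices from hypothetical true parameters
rng = np.random.default_rng(42)
n_obs = 500
x = np.array([1.0, 0.0])             # x_1 != x_2
z = rng.normal(size=n_obs)           # z_i vary across decision makers
beta_true = np.array([0.5, -1.0])
y = (rng.random(n_obs) < prob_1(beta_true, x, z)).astype(float)

# Newton iterations on the first order conditions D_beta l = 0
beta_hat = np.zeros(2)
for _ in range(50):
    g = grad(beta_hat, x, z, y)
    if np.linalg.norm(g) < 1e-10:
        break
    # Newton step: beta <- beta - H^{-1} g
    beta_hat = beta_hat - np.linalg.solve(hess(beta_hat, x, z), g)

g_final = grad(beta_hat, x, z, y)
eigs_final = np.linalg.eigvalsh(hess(beta_hat, x, z))
print(beta_hat, np.linalg.norm(g_final), eigs_final)
```

At the reported point the gradient is numerically zero and both Hessian eigenvalues are negative, so it is the unique maximizer for this simulated sample. If the simulated choices were perfectly separated by \(z_i\), the iterations would diverge instead, which is the case where no maximizer exists.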