Tutorial problems eta \(\eta\)#
Note
These problems are designed to help you practice the concepts covered in the lectures. Not all problems may be covered in the tutorial; those left out are for additional practice on your own.
\(\eta\).1#
Find the largest domain \(S \subset \mathbb{R}^2\) on which
\[f(x,y) = x^2 - y^2 - xy - x^3\]
is concave.
How about strictly concave?
It is useful to review the Hessian-based conditions for concavity and the conditions for definiteness of the Hessian of \(\mathbb{R}^2 \to \mathbb{R}\) functions.
Combining the fact on definiteness of the Hessian of bivariate functions from the lecture notes with the Hessian-based conditions for convexity of functions, we can formulate the following fact.
Fact: the sufficient conditions for concavity/convexity in 2D
Let \(z = f(x,y)\) be a twice continuously differentiable function defined for all \((x, y) \in \mathbb{R}^2\).
Then the following hold:
\(f \text{ is convex } \iff f''_{1,1} \ge 0, \; f''_{2,2} \ge 0 , \text{ and } f''_{1,1} f''_{2,2} - (f''_{1,2})^2 \ge 0\)
\(f \text{ is concave } \iff f''_{1,1} \le 0, \; f''_{2,2} \le 0 , \text{ and } f''_{1,1} f''_{2,2} - (f''_{1,2})^2 \ge 0\)
\(f''_{1,1} > 0 \text{ and } f''_{1,1} f''_{2,2} - (f''_{1,2})^2 > 0 \implies f \text{ is strictly convex}\)
\(f''_{1,1} < 0 \text{ and } f''_{1,1} f''_{2,2} - (f''_{1,2})^2 > 0 \implies f \text{ is strictly concave}\)
The proof is a simple combination of the known facts named above.
Now we have
\(f^{\prime}_{1}(x, y) = 2x - y -3x^2\)
\(f^{\prime}_{2}(x, y) = -2y - x\)
\(f^{\prime\prime}_{1, 1}(x, y) = 2 - 6x\)
\(f^{\prime\prime}_{2, 2}(x, y) = -2\)
\(f^{\prime\prime}_{1, 2}(x, y) = -1\)
Applying the fact above: \(f''_{2,2} = -2 \le 0\) always holds; \(f''_{1,1} = 2 - 6x \le 0\) holds if and only if \(x \ge \frac{1}{3}\); and \(f''_{1,1} f''_{2,2} - (f''_{1,2})^2 = (2-6x)(-2) - 1 = 12x - 5 \ge 0\) holds if and only if \(x \ge \frac{5}{12}\). Since \(\frac{5}{12} > \frac{1}{3}\), the determinant condition binds. Thus, \(S = \{(x, y) \in \mathbb{R}^2: x \geq \frac{5}{12}\}= [\frac{5}{12}, +\infty) \times \mathbb{R}\).
For strict concavity we replace the weak inequalities with strict ones (using the last part of the fact), and we get \(S = \{(x, y) \in \mathbb{R}^2: x > \frac{5}{12}\}= (\frac{5}{12}, +\infty) \times \mathbb{R}\).
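As a sanity check, here is a minimal sympy sketch (assuming the objective \(f(x,y) = x^2 - y^2 - xy - x^3\) reconstructed above) that recomputes the Hessian and solves the determinant condition:

```python
import sympy as sp

x, y = sp.symbols('x y', real=True)
f = x**2 - y**2 - x*y - x**3      # the reconstructed objective (assumption)

H = sp.hessian(f, (x, y))
print(H)                          # Matrix([[2 - 6*x, -1], [-1, -2]])
print(sp.solve(H.det() >= 0, x))  # (5/12 <= x) & (x < oo)
```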
\(\eta\).2#
Show that the function \(f(x) = - |x|\) from \(\mathbb{R}\) to \(\mathbb{R}\) is concave.
Because the function is not differentiable everywhere in its domain, it is easier to argue directly from the definition of concavity.
Pick any \(x, y \in \mathbb{R}\) and any \(\lambda \in [0, 1]\). By the triangle inequality, we have
\[|\lambda x + (1 - \lambda) y| \leq |\lambda x| + |(1 - \lambda) y| = \lambda |x| + (1 - \lambda) |y|\]
and hence
\[-|\lambda x + (1 - \lambda) y| \geq -\lambda |x| - (1 - \lambda) |y|\]
That is, \(f(\lambda x + (1 - \lambda) y) \geq \lambda f(x) + (1 - \lambda)f(y)\). Hence \(f\) is concave as claimed.
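For illustration only (the proof above is complete), a quick numerical spot-check of the concavity inequality at random points:

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: -np.abs(x)

x, y = rng.normal(size=(2, 1000))   # random pairs of points
lam = rng.uniform(size=1000)        # random weights in [0, 1]

# concavity: f(lam*x + (1-lam)*y) >= lam*f(x) + (1-lam)*f(y)
assert np.all(f(lam*x + (1-lam)*y) >= lam*f(x) + (1-lam)*f(y) - 1e-12)
```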
\(\eta\).3#
Consider the function \(f\) from \(\mathbb{R}\) to \(\mathbb{R}\) defined by
\[f(x) = (cx)^2, \quad \text{where } c \in \mathbb{R} \text{ is a given constant.}\]
Give a necessary and sufficient (if and only if) condition on \(c\) under which \(f\) has a unique minimizer.
The function \(f\) has a unique minimizer at \(x^* = 0\) if and only if \(c \ne 0\).
Here's one proof: if \(c \ne 0\), then the function is strictly convex. Moreover, it is stationary at \(x^* = 0\). Hence, by our facts on minimization under convexity, \(x^*\) is the unique minimizer. The condition is also necessary, because if \(c = 0\), then \(f\) is a constant function, which clearly does not have a unique minimizer.
Here's a second (more direct) proof that the correct condition is \(c \ne 0\). Suppose first that \(c \ne 0\) and pick any \(x \in \mathbb{R}\). We have
\[f(x) = (cx)^2 = c^2 x^2 \geq 0 = f(0)\]
This tells us that \(x^* = 0\) is a minimizer. Moreover,
\[x \ne 0 \implies f(x) = c^2 x^2 > 0 = f(0)\]
Hence \(x^* = 0\) is the unique minimizer.
Suppose next that \(x^* = 0\) is the unique minimizer. Then it must be that \(c \ne0\), for if \(c=0\) then \(f(x) = f(x^*)\) for every \(x \in \mathbb{R}\).
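A minimal numerical illustration of both cases, assuming the reconstructed form \(f(x) = (cx)^2\):

```python
import numpy as np

f = lambda x, c: (c * x)**2
xs = np.linspace(-1.0, 1.0, 5)

print(f(xs, 2.0))  # positive except at x = 0: unique minimizer at 0
print(f(xs, 0.0))  # identically zero: every x is a minimizer
```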
\(\eta\).4#
Let \(C\) be an \(N \times K\) matrix, let \(z \in \mathbb{R}\), and consider the function \(f\) from \(\mathbb{R}^K\) to \(\mathbb{R}\) defined by
\[f(x) = \| C x \|^2 + z\]
Show that \(f\) has a unique minimizer on \(\mathbb{R}^K\) if and only if \(C\) has linearly independent columns.
Obviously, you should draw intuition from the preceding question.
Also, what does linear independence of the columns of \(C\) say about the vector \(C x\) for different choices of \(x\)?
Suppose first that \(C\) has linearly independent columns. We claim that \(x = 0\) is the unique minimizer of \(f\) on \(\mathbb{R}^K\). To see this, observe that if \(x = 0\), then \(f(x) = z\). On the other hand, if \(x \ne 0\), then, by linear independence, \(Cx\) is not the origin, and hence \(\| Cx \| > 0\). Therefore
\[f(x) = \| Cx \|^2 + z > z = f(0)\]
Thus \(x = 0\) is the unique minimizer of \(f\) on \(\mathbb{R}^K\) as claimed.
Since this is an "if and only if" proof, we also need to show that when \(f\) has a unique minimizer on \(\mathbb{R}^K\), it must be that \(C\) has linearly independent columns. Suppose to the contrary that the columns of \(C\) are not linearly independent. We will show that multiple minimizers exist.
Since \(f(x) = \| C x \|^2 + z\) it is clear that \(f(x) \geq z\), and hence \(x = 0\) is one minimizer. (At this point, \(f\) evaluates to \(z\).) Since the columns of \(C\) are not linearly independent, there exists a nonzero vector \(y\) such that \(C y = 0\). At this vector we clearly have \(f(y) = z\). Hence \(y\) is another minimizer.
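To illustrate the second half of the argument numerically, here is a small numpy sketch; the particular matrix \(C\) (with its second column twice the first) and the value of \(z\) are hypothetical examples:

```python
import numpy as np

# C has linearly dependent columns: second column = 2 * first column
C = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])
z = 1.0
f = lambda x: np.linalg.norm(C @ x)**2 + z

y = np.array([2.0, -1.0])    # nonzero vector with C @ y = 0
print(f(np.zeros(2)), f(y))  # both equal z = 1.0: two distinct minimizers
```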
\(\eta\).5#
This exercise takes you on a tour of a binary logit model and its properties.
Consider a model in which a decision maker chooses between \(J=2\) alternatives, each of which has a scalar characteristic \(x_j \in \mathbb{R}\), \(j=1,2\). The econometrician observes data on these characteristics, the choice made by the decision maker \(y_i \in \{0,1\}\), and an attribute of the decision maker, \(z_i \in \mathbb{R}\). The value \(y_i = 1\) denotes that the first alternative was chosen. The data is indexed by \(i\) and has \(N\) observations, i.e. \(i \in \{1,\dots,N\}\).
To rationalize the data, the econometrician assumes that the utility of each alternative is given by the scalar product of a vector of parameters \(\beta \in \mathbb{R}^2\) and a vector function \(h \colon \mathbb{R}^2 \to \mathbb{R}^2\) of alternative and decision maker attributes. Let
\[u_{ij} = \beta' h(x_j, z_i), \quad h(x_j, z_i) = \big(x_j,\; x_j z_i\big)'\]
In line with the random utility model, the econometrician also assumes that the utility of each alternative contains an additively separable random component with an appropriately centered type I extreme value distribution, so that the choice probabilities for the two alternatives are given by a vector function \(p \colon \mathbb{R}^2 \to (0,1)\times(0,1) \subset \mathbb{R}^2\)
\[p(u_{i1}, u_{i2}) = \left( \frac{\exp(u_{i1})}{\exp(u_{i1}) + \exp(u_{i2})},\; \frac{\exp(u_{i2})}{\exp(u_{i1}) + \exp(u_{i2})} \right)\]
In order to estimate the vector of parameters of the model \(\beta\), the econometrician maximizes the likelihood of observing the data \(D = \big(\{x_j\}_{j \in \{1,2\}},\{z_i,y_i\}_{i \in \{1,\dots,N\}}\big)\). The log-likelihood function \(\log L \colon \mathbb{R}^{2+J+2N} \to \mathbb{R}\) is given by
\[\log L(\beta, D) = \sum_{i=1}^{N} \ell_i(\beta, x_1, x_2, z_i, y_i)\]
where the individual log-likelihood contribution is given by a scalar product function \(\ell_i \colon \mathbb{R}^6 \to \mathbb{R}\)
\[\ell_i(\beta, x_1, x_2, z_i, y_i) = y_i \log p_1(u_{i1}, u_{i2}) + (1 - y_i) \log p_2(u_{i1}, u_{i2})\]
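Before turning to the assignments, here is a minimal numpy sketch of the log-likelihood under the assumptions above (in particular, \(h(x_j, z_i) = (x_j, x_j z_i)'\) and logit choice probabilities); the function name and arguments are illustrative only:

```python
import numpy as np

def logL(beta, x, z, y):
    """Binary logit log-likelihood: a sketch under the assumed h(x, z) = (x, x*z)'."""
    index = beta[0] + beta[1] * z        # beta_1 + beta_2 * z_i, shape (N,)
    u1, u2 = x[0] * index, x[1] * index  # u_ij = beta' h(x_j, z_i)
    p1 = 1.0 / (1.0 + np.exp(u2 - u1))   # logit prob. of alternative 1
    return np.sum(y * np.log(p1) + (1.0 - y) * np.log(1.0 - p1))
```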
Assignments:
Write down the optimization problem the econometrician is solving. Explain the meaning of each part.
What are the variables the econometrician has control over in the estimation exercise?
What variables should be treated as parameters of the optimization problem?
Elaborate on whether the solution can be guaranteed to exist.
What theorem should be applied?
What conditions of the theorem are met?
What conditions of the theorem are not met?
Derive the gradient and Hessian of the log-likelihood function. Make sure that all multiplied vectors and matrices are conformable.
Derive conditions under which the Hessian of the log-likelihood function is negative definite.
Derive conditions under which the likelihood function has a unique maximizer (and thus the logit model has a unique maximum likelihood estimator).
1.
The optimization problem is:
\[\max_{\beta \in \mathbb{R}^2} \log L(\beta, D) = \max_{\beta \in \mathbb{R}^2} \sum_{i=1}^{N} \Big[ y_i \log p_1\big(u_i(\beta)\big) + (1 - y_i) \log p_2\big(u_i(\beta)\big) \Big]\]
We can control \(\beta = [\beta_1, \beta_2]^{\prime}\) (coefficients to be estimated).
We treat \(x_1, x_2, \{(z_i, y_i)\}_{i=1}^{N}\) as parameters (data).
2.
We can try to apply the Weierstrass theorem.
The objective function is continuous.
But the domain is not compact (\(\mathbb{R}^2\) is closed but unbounded).
3.
Denote by \(D_{\beta} \ell_i\) the Jacobian of \(\ell_i\) with respect to \(\beta\), and by \(H_{\beta} \ell_i\) the Hessian of \(\ell_i\) with respect to \(\beta\).
Notice that \(\ell_i(\beta) = \ell_i\Big(p_i\big(u_i(\beta)\big)\Big)\), so by the chain rule:
\[D_\beta \ell_i = D_p \ell_i \cdot D_u p \cdot D_\beta u\]
We calculate the three terms on the r.h.s. one by one:
\[D_p \ell_i = \left( \frac{y_i}{p_{i1}},\; \frac{1 - y_i}{p_{i2}} \right), \quad
D_u p = p_{i1} p_{i2} \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}, \quad
D_\beta u = \begin{pmatrix} x_1 & x_1 z_i \\ x_2 & x_2 z_i \end{pmatrix}\]
Multiplying the first two terms gives \((D_p \ell_i)(D_u p) = (y_i - p_{i1}) \, (1, -1)\). Thus,
\[D_\beta \ell_i = (y_i - p_{i1}) \, (1, -1) \, D_\beta u = (y_i - p_{i1})(x_1 - x_2)\,(1,\; z_i)\]
The Jacobian (gradient) of the MLE objective function is:
\[D_\beta \log L = \sum_{i=1}^{N} (y_i - p_{i1})(x_1 - x_2)\,(1,\; z_i)\]
We set \(g_i(\beta) \equiv (D_\beta \ell_i)^{\prime}\). Then \(g_i(\beta) = g_i\Big(p_i\big(u_i(\beta)\big)\Big)\). Thus, by applying the chain rule again, the Hessian of \(\ell_i\) with respect to \(\beta\) is:
\[H_\beta \ell_i = D_\beta g_i = -p_{i1} p_{i2} (x_1 - x_2)^2 \begin{pmatrix} 1 & z_i \\ z_i & z_i^2 \end{pmatrix}\]
Thus, the Hessian of the MLE objective function is:
\[H_\beta \log L = -(x_1 - x_2)^2 \sum_{i=1}^{N} p_{i1} p_{i2} \begin{pmatrix} 1 & z_i \\ z_i & z_i^2 \end{pmatrix}\]
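As a sanity check on the derivation, the gradient formula can be compared against finite differences on synthetic data; all names and values below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
N, x1, x2 = 20, 1.0, -0.5
z, y = rng.normal(size=N), rng.integers(0, 2, size=N).astype(float)
beta = np.array([0.3, -0.7])

def logL(b):
    # p_i1 depends only on u_i1 - u_i2 = (x1 - x2) * (b[0] + b[1] * z_i)
    p1 = 1.0 / (1.0 + np.exp(-(x1 - x2) * (b[0] + b[1] * z)))
    return np.sum(y * np.log(p1) + (1.0 - y) * np.log(1.0 - p1))

# analytic gradient: sum_i (y_i - p_i1)(x1 - x2)(1, z_i)'
p1 = 1.0 / (1.0 + np.exp(-(x1 - x2) * (beta[0] + beta[1] * z)))
grad = (x1 - x2) * np.array([np.sum(y - p1), np.sum((y - p1) * z)])

eps = 1e-6
grad_fd = np.array([(logL(beta + eps*e) - logL(beta - eps*e)) / (2*eps)
                    for e in np.eye(2)])
print(np.allclose(grad, grad_fd))  # True
```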
4.
Let us find the conditions under which the Hessian is negative definite. Notice that \((x_1 - x_2)^2 > 0\) if and only if \(x_1 \neq x_2\), and \(p_{i1} p_{i2} > 0\) for all \(i\). Thus, if \(x_1 \neq x_2\), we only need to check the condition \(\det(H_\beta \log L) > 0\). Notice that:
\[\det(H_\beta \log L) = (x_1 - x_2)^4 \left[ \Big(\sum_{i} a_i\Big) \Big(\sum_{i} a_i z_i^2\Big) - \Big(\sum_{i} a_i z_i\Big)^2 \right], \quad a_i \equiv p_{i1} p_{i2}\]
By the Cauchy–Schwarz inequality, the bracketed term is nonnegative, and it is strictly positive if and only if \(z_i \ne z_j\) for some \(i, j\).
Thus, we get a sufficient condition for the unique maximizer:
\(x_1 \neq x_2\),
\(z_i \neq z_j\) for some \(i, j\).
It's easy to show this condition is also necessary: if \(x_1 = x_2\), then \(\beta\) has no impact on the likelihood function, and if \(z_i = z_j\) for all \(i, j\), we cannot distinguish \(\beta_1\) from \(\beta_2\) (this is called strict multicollinearity in econometrics).
Thus, the logit model has a unique ML estimator if and only if \(x_1 \neq x_2\), and \(z_i \neq z_j\) for some \(i, j\).
The intuition is that for the model to be estimable (identifiable, in econometrics jargon), the two alternatives cannot be the same (\(x_1 \neq x_2\)), and at least two people in the data set must have different characteristics (\(z_i \neq z_j\) for some \(i, j\)).
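A direct numerical check of this conclusion: with synthetic data satisfying \(x_1 \ne x_2\) and varying \(z_i\), both eigenvalues of the Hessian come out negative (the data-generating values here are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
N, x1, x2 = 50, 1.0, -0.5
z = rng.normal(size=N)                      # z_i vary across i
beta = rng.normal(size=2)

p1 = 1.0 / (1.0 + np.exp(-(x1 - x2) * (beta[0] + beta[1] * z)))
a = p1 * (1.0 - p1)                         # a_i = p_i1 * p_i2 > 0

# H = -(x1-x2)^2 * sum_i a_i * [[1, z_i], [z_i, z_i^2]]
H = -(x1 - x2)**2 * np.array([[a.sum(),       (a * z).sum()],
                              [(a * z).sum(), (a * z**2).sum()]])
print(np.linalg.eigvalsh(H))                # both eigenvalues negative
```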
5.
We have shown necessary and sufficient conditions for the Hessian of the log-likelihood function to be negative definite for all \(\beta \in \mathbb{R}^2\).
Therefore if the maximum exists, it is unique due to the fact that the objective function is strictly concave on \(\mathbb{R}^2\).
The existence of the maximum is, however, not guaranteed, because the domain is not compact, as discussed in part 2. For the Weierstrass theorem to apply, we have to assume that the true parameter value \(\beta^*\) lies within some bounded subset of \(\mathbb{R}^2\). This is one of the standard regularity assumptions in statistics.