πŸ”¬ Tutorial problems eta \(\eta\)#

Note

These problems are designed to help you practice the concepts covered in the lectures. Not all problems may be covered in the tutorial; those left out are for additional practice on your own.

\(\eta\).1#

Find the largest domain \(S \subset \mathbb{R}^2\) on which

\[ f(x, y) = x^2 - y^2 - xy - x^3 \]

is concave.

How about strictly concave?

It is useful to review the Hessian-based conditions for concavity and the conditions for definiteness of the Hessian of functions \(\mathbb{R}^2 \to \mathbb{R}\).

Combining the fact on definiteness of the Hessian of bivariate functions from the lecture notes with the Hessian-based conditions for convexity/concavity of functions, we can formulate the following fact.

Fact: the sufficient conditions for concavity/convexity in 2D

Let \(z = f(x,y)\) be a twice continuously differentiable function defined for all \((x, y) \in \mathbb{R}^2\).

Then the following hold:

  • \(f \text{ is convex } \iff f''_{1,1} \ge 0, \; f''_{2,2} \ge 0 , \text{ and } f''_{1,1} f''_{2,2} - (f''_{1,2})^2 \ge 0\)

  • \(f \text{ is concave } \iff f''_{1,1} \le 0, \; f''_{2,2} \le 0 , \text{ and } f''_{1,1} f''_{2,2} - (f''_{1,2})^2 \ge 0\)

  • \(f''_{1,1} > 0 \text{ and } f''_{1,1} f''_{2,2} \implies f \text{ is strictly convex}\)

  • \(f''_{1,1} < 0 \text{ and } f''_{1,1} f''_{2,2} \implies f \text{ is strictly concave}\)

The proof is a simple combination of the known facts named above.

Now we have

\(f^{\prime}_{1}(x, y) = 2x - y -3x^2\)

\(f^{\prime}_{2}(x, y) = -2y - x\)

\(f^{\prime\prime}_{1, 1}(x, y) = 2 - 6x\)

\(f^{\prime\prime}_{2, 2}(x, y) = -2\)

\(f^{\prime\prime}_{1, 2}(x, y) = -1\)

\[ f(x, y) \text{ is concave } \iff \]
\[\begin{split} \begin{cases} f^{\prime\prime}_{1, 1}(x, y) = 2 - 6x \leq 0 \\ f^{\prime\prime}_{2, 2}(x, y) = -2 \leq 0 \\ f^{\prime\prime}_{1, 1}(x, y) f^{\prime\prime}_{2, 2}(x, y) - f^{\prime\prime}_{1, 2}(x, y)^2 = (2 - 6x)(-2) - (-1)^2 = 12x - 5 \geq 0 \\ \end{cases} \end{split}\]
\[\begin{split} \iff x \geq \frac{5}{12}\\ \end{split}\]

Thus, \(S = \{(x, y) \in \mathbb{R^2}: x \geq \frac{5}{12}\}= [\frac{5}{12}, +\infty) \times \mathbb{R}\).

For strict concavity we replace the relevant inequalities with strict inequalities and obtain \(S = \{(x, y) \in \mathbb{R^2}: x > \frac{5}{12}\}= (\frac{5}{12}, +\infty) \times \mathbb{R}\) (note that \(x > \frac{5}{12}\) already implies \(f^{\prime\prime}_{1,1} = 2 - 6x < 0\)).
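
As a cross-check (not part of the original solution), here is a minimal sympy sketch that recomputes the second-order partials and the determinant condition; the symbol names are illustrative.

```python
import sympy as sp

x, y = sp.symbols('x y', real=True)
f = x**2 - y**2 - x*y - x**3

fxx = sp.diff(f, x, 2)               # expect 2 - 6*x
fyy = sp.diff(f, y, 2)               # expect -2
fxy = sp.diff(f, x, y)               # expect -1
det = sp.expand(fxx * fyy - fxy**2)  # expect 12*x - 5

print(fxx, fyy, fxy, det)
# region where the determinant condition for concavity holds
print(sp.solve(det >= 0, x))         # x >= 5/12
```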

\(\eta\).2#

Show that the function \(f(x) = - |x|\) from \(\mathbb{R}\) to \(\mathbb{R}\) is concave.

Because the function is not differentiable everywhere in its domain, it is easier to work directly from the definition of concavity.

Pick any \(x, y \in \mathbb{R}\) and any \(\lambda \in [0, 1]\). By the triangle inequality, we have

\[ |\lambda x + (1 - \lambda) y| \leq |\lambda x| + |(1 - \lambda) y| \]

and hence

\[ - |\lambda x + (1 - \lambda) y| \geq - |\lambda x| - |(1 - \lambda) y| = - \lambda |x| - (1 - \lambda) |y| \]

That is, \(f(\lambda x + (1 - \lambda) y) \geq \lambda f(x) + (1 - \lambda)f(y)\). Hence \(f\) is concave as claimed.
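
A quick numerical sanity check of this inequality (a sketch on randomly drawn points and weights, not a substitute for the proof):

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: -np.abs(x)

x = rng.normal(size=100_000)
y = rng.normal(size=100_000)
lam = rng.uniform(size=100_000)

lhs = f(lam * x + (1 - lam) * y)        # f(lambda x + (1-lambda) y)
rhs = lam * f(x) + (1 - lam) * f(y)     # lambda f(x) + (1-lambda) f(y)
print(np.all(lhs >= rhs - 1e-12))       # expected: True
```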

\(\eta\).3#

Let \(c\) and \(z\) be real constants and consider the function \(f\) from \(\mathbb{R}\) to \(\mathbb{R}\) defined by

\[ f(x) = (c x)^2 + z \]

Give a necessary and sufficient (if and only if) condition on \(c\) under which \(f\) has a unique minimizer.

The function \(f\) has a unique minimizer at \(x^* = 0\) if and only if \(c \ne 0\).

Here’s one proof: If \(c \ne 0\) then the function is strictly convex. Moreover, it is stationary at \(x^* = 0\). Hence, by our facts on minimization under convexity, \(x^*\) is the unique minimizer. The condition is necessary and sufficient because if \(c = 0\), then \(f\) is a constant function, which clearly does not have a unique minimizer.

Here’s a second (more direct) proof that the correct condition is \(c \ne 0\). Suppose first that \(c \ne0\) and pick any \(x \in \mathbb{R}\). We have

\[ f(x) = (cx)^2 + z \geq z = f(0) \]

This tells us that \(x^* = 0\) is a minimizer. Moreover,

\[ f(x) = (cx)^2 + z > z = f(0) \quad \text{whenever} \quad x \ne x^* \]

Hence \(x^* = 0\) is the unique minimizer.

Suppose next that \(x^* = 0\) is the unique minimizer. Then it must be that \(c \ne0\), for if \(c=0\) then \(f(x) = f(x^*)\) for every \(x \in \mathbb{R}\).
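
A small illustrative sketch (with assumed values \(c \in \{2, 0\}\) and \(z = 1\)) contrasting the two cases:

```python
import numpy as np

def f(x, c, z=1.0):
    return (c * x)**2 + z

x_star = 0.0
others = np.array([-3.0, -0.5, 0.25, 2.0])
for c in (2.0, 0.0):
    strict = np.all(f(others, c) > f(x_star, c))
    print(f"c = {c}: f(x) > f(0) for all tested x != 0? {strict}")
# expected: True for c = 2.0 (unique minimizer at 0), False for c = 0.0
```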

\(\eta\).4#

Let \(C\) be an \(N \times K\) matrix, let \(z \in \mathbb{R}\) and consider the function \(f\) from \(\mathbb{R}^K\) to \(\mathbb{R}\) defined by

\[ f(x) = x' C' C x + z \]

Show that \(f\) has a unique minimizer on \(\mathbb{R}^K\) if and only if \(C\) has linearly independent columns.

Obviously, you should draw intuition from the preceding question.

Also, what does linear independence of the columns of \(C\) say about the vector \(C x\) for different choices of \(x\)?

Suppose first that \(C\) has linearly independent columns. We claim that \(x = 0\) is the unique minimizer of \(f\) on \(\mathbb{R}^K\). To see this observe that if \(x = 0\) then \(f(x) = z\). On the other hand, if \(x \ne 0\), then, by linear independence, \(Cx\) is not the origin, and hence \(\| Cx \| > 0\). Therefore

\[ f(x) = x' C' C x + z = (C x )' C x + z = \| C x \|^2 + z > z \]

Thus \(x = 0\) is the unique minimizer of \(f\) on \(\mathbb{R}^K\) as claimed.

Since this is an β€œif and only if” proof we also need to show that when \(f\) has a unique minimizer on \(\mathbb{R}^K\), it must be that \(C\) has linearly independent columns. Suppose to the contrary that the columns of \(C\) are not linearly independent. We will show that multiple minimizers exist.

Since \(f(x) = \| C x \|^2 + z\) it is clear that \(f(x) \geq z\), and hence \(x = 0\) is one minimizer. (At this point, \(f\) evaluates to \(z\).) Since the columns of \(C\) are not linearly independent, there exists a nonzero vector \(y\) such that \(C y = 0\). At this vector we clearly have \(f(y) = z\). Hence \(y\) is another minimizer.
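
The argument can be illustrated numerically; the matrices below are assumed examples, one with linearly independent and one with linearly dependent columns.

```python
import numpy as np

z = 1.0
C_indep = np.array([[1.0, 0.0],
                    [0.0, 1.0],
                    [1.0, 1.0]])   # linearly independent columns
C_dep   = np.array([[1.0, 2.0],
                    [2.0, 4.0],
                    [3.0, 6.0]])   # second column = 2 * first column

for C in (C_indep, C_dep):
    _, s, Vt = np.linalg.svd(C)
    null_vecs = Vt[np.isclose(s, 0.0)]          # basis of {x : C x = 0}
    print("rank:", np.linalg.matrix_rank(C), "null space dim:", len(null_vecs))
    for y in null_vecs:
        # f(y) = ||C y||^2 + z coincides with f(0) = z, so y is a second minimizer
        print("f(y) =", np.linalg.norm(C @ y)**2 + z, "= f(0) =", z)
```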

\(\eta\).5#

This exercise takes you on a tour of a binary logit model and its properties.

Consider a model in which a decision maker chooses between \(J=2\) alternatives, each of which has a scalar characteristic \(x_j \in \mathbb{R}\), \(j=1,2\). The econometrician observes data on these characteristics, the choice made by the decision maker \(y_i \in \{0,1\}\), and an attribute of the decision maker, \(z_i \in \mathbb{R}\). The value \(y_i = 1\) denotes that the first alternative was chosen. The data is indexed with \(i\) and has \(N\) observations, i.e. \(i \in \{1,\dots,N\}\).

To rationalize the data the econometrician assumes that the utility of each alternative is given by a scalar product of a vector of parameters \(\beta \in \mathbb{R}^2\) and a vector function \(h \colon \mathbb{R}^2 \to \mathbb{R}^2\) of alternative and decision maker attributes. Let

\[\begin{split} h \colon \left( \begin{array}{c} x \\ z \end{array} \right) \mapsto \left( \begin{array}{l} x \\ xz \end{array} \right) \end{split}\]

In line with the random utility model, the econometrician also assumes that the utility of each alternative contains an additively separable random component with an appropriately centered type I extreme value distribution, such that the choice probabilities for the two alternatives are given by a vector function \(p \colon \mathbb{R}^2 \to (0,1)\times(0,1) \subset \mathbb{R}^2\)

\[\begin{split} p \colon \left( \begin{array}{c} u_1 \\ u_2 \end{array} \right) \mapsto \left( \begin{array}{c} \frac{\exp(u_1)}{\exp(u_1) + \exp(u_2)}\\ \frac{\exp(u_2)}{\exp(u_1) + \exp(u_2)} \end{array} \right) \end{split}\]

In order to estimate the vector of parameters of the model \(\beta\), the econometrician maximizes the likelihood of observing the data \(D = \big(\{x_j\}_{j \in \{1,2\}},\{z_i,y_i\}_{i \in \{1,\dots,N\}}\big)\). The log-likelihood function \(logL \colon \mathbb{R}^{2+J+2N} \to \mathbb{R}\) is given by

\[ logL(\beta,D) = \sum_{i=1}^N \ell_i(\beta,x_1,x_2,z_i,y_i), \]

where the individual log-likelihood contribution is given by a scalar product function \(\ell_i \colon \mathbb{R}^6 \to \mathbb{R}\)

\[\begin{split} \ell_i(\beta,x_1,x_2,z_i,y_i) = \left( \begin{array}{l} y_i \\ 1-y_i \end{array} \right) \cdot \log\left(p \left( \begin{array}{l} \beta \cdot h(x_1,z_i) \\ \beta \cdot h(x_2,z_i) \end{array} \right) \right) \end{split}\]
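
For concreteness, here is a hedged sketch of how this log-likelihood could be coded and maximized numerically. The simulated data, the function name `loglike`, and all parameter values are illustrative assumptions, not part of the problem statement.

```python
import numpy as np
from scipy.optimize import minimize

def loglike(beta, x1, x2, z, y):
    """Binary logit log-likelihood with utilities u_j = beta1*x_j + beta2*x_j*z_i."""
    u1 = beta[0] * x1 + beta[1] * x1 * z
    u2 = beta[0] * x2 + beta[1] * x2 * z
    p1 = 1.0 / (1.0 + np.exp(u2 - u1))     # = exp(u1) / (exp(u1) + exp(u2))
    p1 = np.clip(p1, 1e-12, 1.0 - 1e-12)   # guard against log(0)
    return np.sum(y * np.log(p1) + (1 - y) * np.log(1.0 - p1))

# simulate a small illustrative data set (all values are assumptions)
rng = np.random.default_rng(42)
x1, x2 = 1.0, 0.0
beta_true = np.array([0.5, -1.0])
z = rng.normal(size=500)
p1_true = 1.0 / (1.0 + np.exp(-(beta_true[0] * (x1 - x2) + beta_true[1] * (x1 - x2) * z)))
y = (rng.uniform(size=500) < p1_true).astype(float)

# maximize the log-likelihood by minimizing its negative
res = minimize(lambda b: -loglike(b, x1, x2, z, y), x0=np.zeros(2))
print(res.x)   # should be roughly close to beta_true
```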

Assignments:

  1. Write down the optimization problem the econometrician is solving. Explain the meaning of each part.

    • What are the variables the econometrician has control over in the estimation exercise?

    • What variables should be treated as parameters of the optimization problem?

  2. Elaborate on whether the solution can be guaranteed to exist.

    • What theorem should be applied?

    • What conditions of the theorem are met?

    • What conditions of the theorem are not met?

  3. Derive the gradient and Hessian of the log-likelihood function. Make sure that all multiplied vectors and matrices are conformable.

  4. Derive conditions under which the Hessian of the log-likelihood function is negative definite.

  5. Derive conditions under which the likelihood function has a unique maximizer (and thus the logit model has a unique maximum likelihood estimator).

Solution

1.

The optimization problem is:

\[ \max \limits_{\beta \in \mathbb{R}^2} \sum\limits_{i=1}^{n} l_i(\beta, x_1, x_2, z_i, y_i). \]
  • We can control \(\beta = {[\beta_1, \beta_2]^{\prime}}\) (coefficients to be estimated).

  • We treat \(x_1, x_2, {(z_i, y_i)}_{i=1}^n\) as parameters (data).

2.

We can try to apply the Weierstrass extreme value theorem.

  • The objective function is continuous.

  • But the domain is not compact (\(\mathbb{R}^2\) is closed but unbounded).

3.

Denote by \(D_{\beta} l_i\) the Jacobian of \(l_i\) with respect to \(\beta\), and by \(H_{\beta} l_i\) the Hessian of \(l_i\) with respect to \(\beta\).

Notice that \(l_i(\beta) = l_i\Big(p_i\big(u_i(\beta)\big)\Big)\), so by the chain rule:

\[ D_{\beta}l_i =D_{p_i}l_i ~D_{u_i}p_i ~D_{\beta}u_i. \]

We calculate the three terms on the r.h.s. one by one:

\[\begin{split} \begin{array}{ll} D_{p_i} l_i &= \left[\begin{array}{cc} y_i/p_{i1} & (1-y_i)/p_{i2}\\ \end{array}\right], \\ D_{u_i} p_i &= \left[\begin{array}{cc} p_{i1}p_{i2} & -p_{i1} p_{i2}\\ -p_{i1} p_{i2} & p_{i1}p_{i2}\\ \end{array}\right] = p_{i1} p_{i2} \left[\begin{array}{cc} 1 & -1\\ -1 & 1\\ \end{array}\right], \\ D_{\beta} u_i &= \left[\begin{array}{cc} x_1 & x_1 z_i\\ x_2 & x_2 z_i\\ \end{array}\right]. \end{array} \end{split}\]

Thus,

\[\begin{split} \begin{array}{ll} D_\beta l_i &= \left[\begin{array}{cc} y_i/p_{i1} & (1-y_i)/p_{i2}\\ \end{array}\right] p_{i1} p_{i2} \left[\begin{array}{cc} 1 & -1\\ -1 & 1\\ \end{array}\right] \left[\begin{array}{cc} x_1 & x_1 z_i\\ x_2 & x_2 z_i\\ \end{array}\right],\\ &= (x_1 -x_2) \cdot \left[\begin{array}{cc} y_i p_{i2} - (1-y_i)p_{i1} & \big(y_i p_{i2} - (1-y_i)p_{i1}\big) z_i\\ \end{array}\right]. \end{array} \end{split}\]

The Jacobian (gradient) of the MLE objective function is:

\[\begin{split} D_\beta l = \sum\limits_{i=1}^{n} D_\beta l_i = (x_1 -x_2) \cdot \sum\limits_{i=1}^{n} \left[\begin{array}{cc} y_i p_{i2} - (1-y_i)p_{i1} & \big(y_i p_{i2} - (1-y_i)p_{i1}\big) z_i\\ \end{array}\right] \end{split}\]

We set \(g_i(\beta) \equiv (D_\beta l_i)^{\prime}\), writing \(g\) rather than \(h\) to avoid a clash with the attribute mapping defined above. Then \(g_i(\beta)= g_i\Big(p_i\big(u_i(\beta)\big)\Big)\), and by applying the chain rule again, the Hessian of \(l_i\) with respect to \(\beta\) is:

\[\begin{split} \begin{array}{ll} H_\beta l_i &= D_{\beta} g_i(\beta)\\ &= D_{p_i} g_i ~ D_{u_i} p_i ~ D_{\beta} u_i\\ &= (x_1 - x_2) \left[\begin{array}{cc} -(1-y_i) & y_i \\ -(1-y_i)z_i & y_i z_i \\ \end{array}\right] p_{i1} p_{i2} \left[\begin{array}{cc} 1 & -1\\ -1 & 1\\ \end{array}\right] \left[\begin{array}{cc} x_1 & x_1 z_i\\ x_2 & x_2 z_i\\ \end{array}\right] \\ & = (x_1 - x_2)^2 p_{i1} p_{i2} \left[\begin{array}{cc} -1 & -z_i\\ -z_i & -z_i^2\\ \end{array}\right] \end{array} \end{split}\]

Thus, the Hessian of the MLE objective function is:

\[\begin{split} \begin{array}{ll} H_\beta l &= \sum\limits_{i=1}^{n} H_\beta l_i \\ &= (x_1 - x_2)^2 \sum\limits_{i=1}^{n} p_{i1} p_{i2} \left[\begin{array}{cc} -1 & -z_i\\ -z_i & -z_i^2\\ \end{array}\right]\\ & = (x_1 - x_2)^2 \left[\begin{array}{cc} - \sum\limits_{i=1}^{n} p_{i1} p_{i2} & - \sum\limits_{i=1}^{n} p_{i1} p_{i2} z_i \\ - \sum\limits_{i=1}^{n} p_{i1} p_{i2} z_i & - \sum\limits_{i=1}^{n} p_{i1} p_{i2} z_i^2 \\ \end{array}\right] \end{array} \end{split}\]
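
The derived gradient and Hessian can be cross-checked against finite differences. The sketch below uses assumed data and an arbitrary evaluation point; it is an illustration, not part of the derivation.

```python
import numpy as np

def probs(beta, x1, x2, z):
    u1 = beta[0] * x1 + beta[1] * x1 * z
    u2 = beta[0] * x2 + beta[1] * x2 * z
    p1 = 1.0 / (1.0 + np.exp(u2 - u1))
    return p1, 1.0 - p1

def loglike(beta, x1, x2, z, y):
    p1, p2 = probs(beta, x1, x2, z)
    return np.sum(y * np.log(p1) + (1 - y) * np.log(p2))

def grad_hess(beta, x1, x2, z, y):
    """Analytical gradient and Hessian from the derivation above."""
    p1, p2 = probs(beta, x1, x2, z)
    s = y * p2 - (1 - y) * p1
    grad = (x1 - x2) * np.array([np.sum(s), np.sum(s * z)])
    w = p1 * p2
    hess = -(x1 - x2)**2 * np.array([[np.sum(w),     np.sum(w * z)],
                                     [np.sum(w * z), np.sum(w * z**2)]])
    return grad, hess

# compare with central finite differences at an arbitrary point (illustrative data)
rng = np.random.default_rng(0)
x1, x2 = 1.0, -0.5
z = rng.normal(size=200)
y = (rng.uniform(size=200) < 0.5).astype(float)
beta = np.array([0.3, -0.7])
g, H = grad_hess(beta, x1, x2, z, y)

eps = 1e-5
g_fd = np.array([(loglike(beta + eps * e, x1, x2, z, y)
                  - loglike(beta - eps * e, x1, x2, z, y)) / (2 * eps)
                 for e in np.eye(2)])
H_fd = np.array([(grad_hess(beta + eps * e, x1, x2, z, y)[0]
                  - grad_hess(beta - eps * e, x1, x2, z, y)[0]) / (2 * eps)
                 for e in np.eye(2)])
print(np.allclose(g, g_fd, atol=1e-4), np.allclose(H, H_fd, atol=1e-4))  # expected: True True
```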

4.

Let us find the conditions under which the Hessian is negative definite. Notice that \((x_1 - x_2)^2 > 0\) if and only if \(x_1 \neq x_2\), and \(p_{i1} p_{i2} > 0\) for all \(i\). Thus, if \(x_1 \neq x_2\), the \((1,1)\) entry of the Hessian is strictly negative, and we only need to check the condition \(det(H_\beta l) > 0\). Notice that:

\[\begin{split} \begin{array}{ll} & \quad \quad \quad det(H_\beta l) > 0\\ &\iff (\sum_i p_{i1}p_{i2})(\sum_i p_{i1} p_{i2} z_i^2) - (\sum_i p_{i1} p_{i2} z_i)^2 > 0. \\ & \iff \sum_{i, j} p_{i1}p_{i2}p_{j1}p_{j2}z_j^2 - \sum_{i, j} p_{i1}p_{i2}p_{j1}p_{j2}z_i z_j > 0\\ & \iff \sum_{i > j} p_{i1}p_{i2}p_{j1}p_{j2}(z_i^2 + z_j^2) - \sum_{i > j} p_{i1}p_{i2}p_{j1}p_{j2}(2 z_i z_j) > 0\\ & \iff \sum_{i > j} p_{i1}p_{i2}p_{j1}p_{j2}(z_i - z_j)^2 > 0.\\ & \iff z_i \neq z_j \text{ for some } i, j. \end{array} \end{split}\]

Thus, we get a sufficient condition for the Hessian to be negative definite (and hence, by strict concavity, for the maximizer to be unique whenever it exists):

  • \(x_1 \neq x_2\),

  • \(z_i \neq z_j\) for some \(i, j\).

It is easy to show that this condition is also necessary: if \(x_1 = x_2\), the two alternatives have identical utilities and \(\beta\) has no impact on the likelihood function, and if \(z_i = z_j\) for all \(i, j\), we cannot distinguish \(\beta_1\) from \(\beta_2\) (this is called strict multicollinearity in econometrics).

Thus, the logit model has a unique ML estimator (whenever a maximizer exists) if and only if \(x_1 \neq x_2\) and \(z_i \neq z_j\) for some \(i, j\).

The intuition is that if the model can be estimated (identifiable in econometrics jargon), the two alternatives cannot be the same (\(x_1 \neq x_2\)), and at least two people in the data set should have different characteristics (\(z_i \neq z_j\) for some \(i, j\)).
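
Both failure cases can be seen numerically. In the sketch below (illustrative data and values), identical alternatives make the log-likelihood flat in \(\beta\), and a constant \(z_i\) makes different \(\beta\) vectors observationally equivalent.

```python
import numpy as np

def loglike(beta, x1, x2, z, y):
    u1 = beta[0] * x1 + beta[1] * x1 * z
    u2 = beta[0] * x2 + beta[1] * x2 * z
    p1 = 1.0 / (1.0 + np.exp(u2 - u1))
    return np.sum(y * np.log(p1) + (1 - y) * np.log(1.0 - p1))

rng = np.random.default_rng(1)
z = rng.normal(size=100)
y = (rng.uniform(size=100) < 0.5).astype(float)

# identical alternatives (x1 = x2): the log-likelihood is flat in beta
print(loglike(np.array([0.0, 0.0]), 1.0, 1.0, z, y),
      loglike(np.array([3.0, -2.0]), 1.0, 1.0, z, y))

# identical decision makers (constant z): only beta1 + beta2*z matters,
# so different beta vectors on the same ridge give the same log-likelihood
z_const = np.full(100, 2.0)
print(loglike(np.array([1.0, 0.0]), 1.0, 0.0, z_const, y),
      loglike(np.array([0.0, 0.5]), 1.0, 0.0, z_const, y))
```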

5.

We have shown necessary and sufficient conditions for the Hessian of the log-likelihood function to be negative definite for all \(\beta \in \mathbb{R}^2\).

Therefore if the maximum exists, it is unique due to the fact that the objective function is strictly concave on \(\mathbb{R}^2\).

The existence of the maximum is, however, not guaranteed because the domain is not compact, as discussed in part 2. For the Weierstrass theorem to apply we have to assume that the true parameter value \(\beta^*\) lies in some closed and bounded subset of \(\mathbb{R}^2\). This is one of the standard regularity assumptions in statistics.