3  Exercises 3

Exercise 3.1 (Unbiased sample variance) Let \(X\) be a random variable with expectation \(m\) and variance \(\sigma^2\), and let \(\left(X_{1}, \ldots, X_{n}\right)\) be a sample of \(X\) of size \(n\). The empirical (uncorrected) variance of the sample is \(\tilde{S}^2=\frac{1}{n} \sum_{i=1}^{n} X_{i}^{2}-\overline{X}_{n}^{2}\), where \(\overline{X}_{n}\) is the empirical mean (sample mean) of \(X\) and \(\frac{1}{n} \sum_{i=1}^{n} X_{i}^{2}\) is the empirical mean of \(X^{2}\).

  1. Compute \(E\left(\overline{X}_{n}\right)\) and \(V\left(\overline{X}_{n}\right)\) and deduce \(E(\overline{X}_{n}^{2})\).
  2. Finally, show that \(E\left(\tilde{S}^2\right)=\frac{n-1}{n} V(X)\) and deduce an unbiased estimator of the variance (we call this estimator \(S_{n}^{2}\) the sample variance).
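The identity in question 2 can be checked numerically before it is proved. The sketch below is only an illustration under assumptions that are not part of the exercise: a normal distribution with \(m=1\) and \(\sigma=2\), a sample size \(n=5\), and 200,000 replications. The function name `mean_biased_variance` is hypothetical.

```python
import numpy as np

def mean_biased_variance(n, sigma, n_rep=200_000, seed=0):
    """Monte Carlo average of the empirical variance (1/n) * sum X_i^2 - mean^2."""
    rng = np.random.default_rng(seed)
    # Assumption: normal samples with mean 1; any law with variance sigma^2 would do.
    x = rng.normal(loc=1.0, scale=sigma, size=(n_rep, n))
    s_tilde2 = (x ** 2).mean(axis=1) - x.mean(axis=1) ** 2
    return s_tilde2.mean()

n, sigma = 5, 2.0
estimate = mean_biased_variance(n, sigma)
target = (n - 1) / n * sigma ** 2  # value stated in question 2
print(f"simulated E(S~^2) = {estimate:.3f}, (n-1)/n * sigma^2 = {target:.3f}")
```

A small \(n\) is chosen on purpose: the factor \(\frac{n-1}{n}\) is then far enough from 1 for the bias to be visible in the simulation.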

Exercise 3.2 (Moment Estimation) Consider the following sample:

\[1,0,2,1,1,0,1,0,0\]

  1. Calculate its empirical mean and variance.
  2. Assuming that the data in this sample are realizations of a variable of unknown distribution, give an unbiased estimate of the expectation and variance of this distribution.
  3. We choose to model the values of this sample by a Binomial distribution \(\mathcal{B}(2, p)\). Use the empirical mean to propose a point estimate for \(p\).
  4. Using the same model, use the variance to propose another estimate for \(p\).
  5. We choose to model the values in this sample by a Poisson distribution \(\mathcal{P}(\lambda)\), which has expectation \(\lambda\). What point estimate do you propose for \(\lambda\)?
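Questions 1 and 2 rely on the difference between dividing by \(n\) and dividing by \(n-1\). A minimal sketch of that computation, assuming NumPy is available (the variable names are illustrative only):

```python
import numpy as np

sample = np.array([1, 0, 2, 1, 1, 0, 1, 0, 0])
n = sample.size

mean = sample.mean()
var_biased = sample.var(ddof=0)    # empirical variance, divides by n
var_unbiased = sample.var(ddof=1)  # corrected variance, divides by n - 1

print(f"n = {n}, mean = {mean:.4f}")
print(f"variance (1/n) = {var_biased:.4f}, variance (1/(n-1)) = {var_unbiased:.4f}")
```

The two variances differ by the factor \(\frac{n}{n-1}\), so which one is plugged into the moment equations of questions 3 and 4 changes the resulting estimate of \(p\).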

Exercise 3.3 (MLE of Geometric distribution) Some birds fly after making a few jumps on the ground. It is assumed that the number \(X\) of jumps can be modeled by a Pascal (Geometric) distribution on \(\mathbb{N}^*\):

\[P(X=x)= p(1-p)^{x-1} \quad x \geq 1\]

For \(n=130\) birds of this type, we collected the following data:

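| Number of jumps \(x\) | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Occurrences | 48 | 31 | 20 | 9 | 6 | 5 | 4 | 2 | 1 | 1 | 2 | 1 |
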
  1. What is the Maximum Likelihood Estimator (MLE) of \(p\)?
  2. Calculate a value of this estimator using the collected data.
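Once a closed form for the MLE has been derived, it can be compared with a brute-force maximization of the log-likelihood. The sketch below is one possible check, not a required method: it evaluates the log-likelihood of the grouped data on a grid of values of \(p\) (the grid bounds and step are arbitrary choices) and keeps the maximizer.

```python
import numpy as np

jumps = np.arange(1, 13)
counts = np.array([48, 31, 20, 9, 6, 5, 4, 2, 1, 1, 2, 1])
n = counts.sum()

def log_likelihood(p):
    # log of prod_i p (1 - p)^(x_i - 1), with observations grouped by value
    return n * np.log(p) + np.sum(counts * (jumps - 1)) * np.log(1 - p)

# Assumption: a grid with step 1e-4 on (0, 1) is fine enough for a sanity check.
grid = np.linspace(0.001, 0.999, 9981)
p_hat_numeric = grid[np.argmax(log_likelihood(grid))]

print(f"n = {n}, sample mean = {np.sum(counts * jumps) / n:.4f}")
print(f"numerical maximizer of the likelihood: {p_hat_numeric:.4f}")
```

Printing the sample mean next to the maximizer makes it easy to see how the two are related.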

Exercise 3.4 (Estimators comparison) Let \(X\) be a random variable with Uniform distribution on an interval \([0, a]\) where \(a\) is an unknown parameter, and we have \(\left(X_{1}, \ldots, X_{n}\right)\) a sample of \(X\) of size \(n\). We note \(\overline{X}_{n}\) the empirical mean of \(X\).

  1. Let \(T_{n}=2 \overline{X}_{n}\) be the estimator of \(a\) obtained by the method of moments. Show that \(T_{n}\) is an unbiased estimator of \(a\) and calculate its mean squared error.
  2. Let \(T_{n}^{\prime}=\max \left(X_{1}, \ldots, X_{n}\right)\). Show that \(T_n^{\prime}\) is the maximum likelihood estimator of \(a\).
  3. Give the distribution function of \(T_{n}^{\prime}\). Deduce a density of \(T_{n}^{\prime}\), then its bias and its mean squared error.
  4. Let \(T_{n}^{\prime \prime}=\displaystyle \frac{n+1}{n} T_{n}^{\prime}\). Determine its bias and its mean squared error.
  5. For large values of \(n\), what is the best estimator of \(a\)?

Simulation

Let \(\left(X_{1}, \ldots, X_{n}\right)\) be a sample from a Uniform distribution on \([0, \theta]\), where \(\theta\) is an unknown parameter.

Consider the following consistent estimators of \(\theta\):

\[ \begin{aligned} T_{1}=& \frac{2}{n} \times\left(X_{1}+\ldots+X_{n}\right) \\ T_{2}=& \sqrt{\frac{3}{n} \times\left(X_{1}^{2}+\ldots+X_{n}^{2}\right)} \\ T_{3}=&\left(\frac{4}{n} \times\left(X_{1}^{3}+\ldots+X_{n}^{3}\right)\right)^{\frac{1}{3}} \\ T_{4}=&\left(\frac{3}{2 n} \times(\sqrt{X_{1}}+\ldots+\sqrt{X_{n}})\right)^{2} \\ T_{5}=&\left(\frac{1}{2 n} \times\left(\frac{1}{\sqrt{X_{1}}}+\ldots+\frac{1}{\sqrt{X_{n}}}\right)\right)^{-2} \\ T_{6}=& \exp (1) \times\left(X_{1} \times \ldots \times X_{n}\right)^{\frac{1}{n}} \\ T_{7}=& \max \left\{X_{1}, \ldots, X_{n}\right\} \\ T_{8}=& \frac{n+1}{n} \max \left\{X_{1}, \ldots, X_{n}\right\} \end{aligned} \]

  • \(T_{1}\) is the moment estimator of \(\theta\).
  • \(T_{7}\) is the MLE of \(\theta\), which is biased.
  • \(T_{8}\) is the MLE of \(\theta\) rescaled to be unbiased.

1. Choose a value of \(\theta\) and simulate 1000 samples of size 100 of the Uniform distribution on \([0, \theta]\). Compute for each of these samples the value taken by the 8 estimators.

We can then create a matrix with 1000 rows and 8 columns where the jth column contains the 1000 realizations of the estimator \(T_{j}\).
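One way to build this matrix is sketched below in Python with NumPy; the exercise does not prescribe a language. The value \(\theta=3\), the seed, and the function name `estimators` are assumptions made for the illustration.

```python
import numpy as np

def estimators(x):
    """Return T_1, ..., T_8 for each row of x (one sample per row)."""
    n = x.shape[1]
    x_max = x.max(axis=1)
    return np.column_stack([
        2 * x.mean(axis=1),                           # T_1
        np.sqrt(3 * (x ** 2).mean(axis=1)),           # T_2
        np.cbrt(4 * (x ** 3).mean(axis=1)),           # T_3
        (1.5 * np.sqrt(x).mean(axis=1)) ** 2,         # T_4
        (0.5 * (1 / np.sqrt(x)).mean(axis=1)) ** -2,  # T_5
        np.e * np.exp(np.log(x).mean(axis=1)),        # T_6: e times the geometric mean
        x_max,                                        # T_7
        (n + 1) / n * x_max,                          # T_8
    ])

theta = 3.0  # assumption: any positive value can be chosen
rng = np.random.default_rng(42)
# A draw exactly equal to 0 would break T_5 and T_6; it has negligible probability.
samples = rng.uniform(0, theta, size=(1000, 100))
T = estimators(samples)
print(T.shape)
```

Computing the geometric mean in \(T_{6}\) through logarithms avoids the underflow that a direct product of 100 values could cause.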

2. Calculate the empirical mean and the empirical variance of the 8 samples of size 1000 thus obtained. Deduce an estimate of the bias and the mean squared error (MSE) of each of the 8 estimators.

Recall that for an estimator \(T\), the bias is \(E[T]-\theta\) and the MSE is \(E\left[(T-\theta)^{2}\right]\).
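The two definitions are linked by the decomposition \(\mathrm{MSE}(T)=V(T)+\left(E[T]-\theta\right)^{2}\), which is how the empirical mean and variance lead to an MSE estimate. The sketch below applies it column by column; to stay short it uses only \(T_{1}\) and \(T_{7}\), with \(\theta=3\) as an assumed value and `bias_and_mse` as a hypothetical helper name.

```python
import numpy as np

def bias_and_mse(realizations, theta):
    """Estimate bias and MSE column by column from simulated estimator values."""
    mean = realizations.mean(axis=0)
    var = realizations.var(axis=0)  # empirical variance, divides by the number of rows
    bias = mean - theta
    mse = ((realizations - theta) ** 2).mean(axis=0)
    # With the 1/N variance, the decomposition holds exactly (up to rounding).
    assert np.allclose(mse, var + bias ** 2)
    return bias, mse

theta = 3.0  # assumption: same role as the value chosen in step 1
rng = np.random.default_rng(1)
x = rng.uniform(0, theta, size=(1000, 100))
demo = np.column_stack([2 * x.mean(axis=1), x.max(axis=1)])  # T_1 and T_7 only
bias, mse = bias_and_mse(demo, theta)
print("bias:", bias)
print("MSE: ", mse)
```

The same function can be applied unchanged to the full matrix with 1000 rows and 8 columns.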

3. Which estimators are the least biased, and which have the smallest MSE?

4. Draw the boxplots of the 8 estimators on a single graph, and mark the true value of the parameter in red on that graph. Looking at the boxplot of each estimator, which one would you prefer to use?
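A possible way to draw such a graph with Matplotlib is sketched below. It is an illustration under assumptions: the helper name `boxplot_estimators` is hypothetical, \(\theta=3\) is an arbitrary choice, and only \(T_{1}\), \(T_{7}\) and \(T_{8}\) are plotted to keep the example short.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np

def boxplot_estimators(realizations, labels, theta):
    """Draw one boxplot per column and a red horizontal line at the true value."""
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.boxplot(realizations)  # a 2D array gives one box per column
    ax.set_xticks(list(range(1, len(labels) + 1)))
    ax.set_xticklabels(labels)
    ax.axhline(theta, color="red", linestyle="--", label="true value")
    ax.legend()
    return fig, ax

theta = 3.0  # assumption
rng = np.random.default_rng(2)
x = rng.uniform(0, theta, size=(1000, 100))
x_max = x.max(axis=1)
demo = np.column_stack([2 * x.mean(axis=1), x_max, 101 / 100 * x_max])
fig, ax = boxplot_estimators(demo, ["T1", "T7", "T8"], theta)
```

Calling `fig.savefig(...)` with a file name of your choice, or `plt.show()` with an interactive backend, displays the result.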

Extra

Exercise 3.5 A company manufactures electrical devices, each consisting of two main components.
Let \(X\) be the random variable representing the number of defective components in a randomly chosen device.
The device is declared defective if at least one of its two components is defective.
We assume that the distribution of \(X\) belongs to a family depending on a parameter \(\theta\), defined as:

| Values of \(X\) | 0 | 1 | 2 |
|---|---|---|---|
| Probabilities | \(1-3\theta\) | \(2\theta\) | \(\theta\) |

  1. For which values of \(\theta\) does the table define a probability distribution?
  2. Compute \(E_\theta(X)\) and \(\mathrm{Var}_\theta(X)\).

Let \(X_1, \dots, X_n\) be a sample of \(X\).

  1. Write down the likelihood function of this model. (Hint: Let \(n_0, n_1, n_2\) be the counts of values 0, 1, 2 in the sample (\(n_0+n_1+n_2=n\)).)
  2. Find the maximum likelihood estimator (MLE) of this model.
      1. Is this estimator unbiased?
      2. Compute its mean squared error (MSE).
      3. Is this estimator consistent?
  3. Which estimator is obtained using the method of moments based on the first moment of \(X\)?
    Is this estimator unbiased? Compute its mean squared error.

For \(i = 1, \dots, n\), define \(Z_i\) by: \(Z_i = 1\) if \(X_i = 0\), and \(Z_i = 0\) otherwise.

  1. Write down the likelihood function of this new model.
  2. Find the MLE of this model.
      1. Is this estimator unbiased?
      2. Compute its mean squared error.
      3. Is this estimator consistent?
  3. Which estimates of \(\theta\) do the different estimators in this exercise give for the following data?

| Values of \(X\) | 0 | 1 | 2 |
|---|---|---|---|
| Frequencies | 49 | 7 | 4 |

  4. Which estimator would you prefer, and why?
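The estimate given by the first maximum likelihood estimator can be cross-checked by maximizing the log-likelihood of the three-valued model numerically. The sketch below is only a sanity check under an arbitrary choice of grid; it does not replace the closed-form derivation.

```python
import numpy as np

n0, n1, n2 = 49, 7, 4  # observed frequencies of the values 0, 1, 2
n = n0 + n1 + n2

def log_likelihood(theta):
    # log of (1 - 3 theta)^n0 * (2 theta)^n1 * theta^n2
    return n0 * np.log(1 - 3 * theta) + n1 * np.log(2 * theta) + n2 * np.log(theta)

# Assumption: a fine grid strictly inside (0, 1/3), where all three probabilities are positive.
grid = np.linspace(1e-4, 1 / 3 - 1e-4, 100_000)
theta_numeric = grid[np.argmax(log_likelihood(grid))]

print(f"n = {n}, numerical maximizer of the likelihood: {theta_numeric:.4f}")
```

Comparing this value with the estimates from the other estimators gives a concrete basis for the final question.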