3  Exercises 3

Exercise 3.1 (Unbiased sample variance) Let \(X\) be a random variable with expectation \(m\) and variance \(\sigma^2\), and let \(\left(X_{1}, \ldots, X_{n}\right)\) be a sample of \(X\). Its empirical (biased) variance is \(\tilde{S}^2=\frac{1}{n} \sum X_{i}^{2}-\overline{X}_{n}^{2}\), where \(\overline{X}_{n}\) is the empirical mean (sample mean) of \(X\) and \(\frac{1}{n} \sum X_{i}^{2}\) is the empirical mean of \(X^{2}\).

  1. Compute \(E\left(\overline{X}_{n}\right)\) and \(V\left(\overline{X}_{n}\right)\) and deduce \(E(\overline{X}_{n}^{2})\).
  2. Finally, show that \(E\left(\tilde{S}^2\right)=\frac{n-1}{n} V(X)\) and deduce an unbiased estimator of the variance (we call this estimator \(S_{n}^{2}\) the sample variance).
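Once the identity of question 2 is derived, a quick Monte-Carlo check can confirm it (a sketch only, not the requested proof; the sample size \(n=10\) and the \(\mathcal{N}(2, 3^2)\) model are arbitrary choices):

```python
import random
import statistics

# Draw many samples, average the biased variance S~^2 over the repetitions,
# and compare with (n-1)/n * sigma^2.
random.seed(0)
n, m, sigma, reps = 10, 2.0, 3.0, 20000

biased_vars = []
for _ in range(reps):
    xs = [random.gauss(m, sigma) for _ in range(n)]
    xbar = sum(xs) / n
    # S~^2 = (1/n) sum X_i^2 - Xbar^2
    biased_vars.append(sum(x * x for x in xs) / n - xbar ** 2)

mean_biased = statistics.fmean(biased_vars)
expected = (n - 1) / n * sigma ** 2   # = 8.1 here, while sigma^2 = 9
print(mean_biased, expected)          # the two values are close
```

Multiplying \(\tilde{S}^2\) by \(\frac{n}{n-1}\) then removes the bias, which is exactly the sample variance \(S_n^2\) asked for.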

Exercise 3.2 (Moment Estimation) Consider the following sample:

\[1,0,2,1,1,0,1,0,0\]

  1. Calculate its empirical mean and variance.
  2. Assuming that the data in this sample are realizations of a variable of unknown distribution, give an unbiased estimate of the expectation and variance of this distribution.
  3. We choose to model the values of this sample by a Binomial distribution \(\mathcal{B}(2, p)\). Use the empirical mean to propose a point estimate for \(p\).
  4. Using the same model, use the variance to propose another estimate for \(p\).
  5. We choose to model the values in this sample by a Poisson distribution \(\mathcal{P}(\lambda)\), which has expectation \(\lambda\). What point estimate do you propose for \(\lambda\)?
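The computations above can be checked numerically; this is a sketch (all variable names are mine), and for question 4 the use of the biased empirical variance and the choice of root of \(2p(1-p)=\tilde{S}^2\) are modeling choices:

```python
from statistics import fmean, pvariance, variance

data = [1, 0, 2, 1, 1, 0, 1, 0, 0]

emp_mean = fmean(data)         # empirical mean: 2/3
emp_var = pvariance(data)      # empirical (biased) variance: 4/9
unbiased_var = variance(data)  # unbiased estimate n/(n-1) * emp_var: 0.5

# Binomial B(2, p) model: E(X) = 2p, so the moment estimate from the mean is
p_hat_mean = emp_mean / 2      # 1/3
# Variance-based estimate: solve 2p(1-p) = emp_var, taking the root <= 1/2
p_hat_var = (1 - (1 - 2 * emp_var) ** 0.5) / 2   # also 1/3 here
# Poisson P(lambda) model: E(X) = lambda
lambda_hat = emp_mean          # 2/3
```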

Exercise 3.3 (MLE of Geometric distribution) Some birds fly after making a few jumps on the ground. It is assumed that the number \(X\) of jumps can be modeled by a Pascal (Geometric) distribution on \(\mathbb{N}^*\):

\[P(X=x)= p(1-p)^{x-1} \quad x \geq 1\]

For \(n=130\) birds of this type, we collected the following data:

| Number of jumps \(x\) | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Occurrence | 48 | 31 | 20 | 9 | 6 | 5 | 4 | 2 | 1 | 1 | 2 | 1 |

  1. What is the Maximum Likelihood Estimator (MLE) of \(p\)?
  2. Calculate a value of this estimator using the collected data.
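For question 2, once the MLE has been derived in question 1 (for the geometric distribution on \(\mathbb{N}^*\) the standard result is \(\hat{p} = 1/\overline{X}_n\)), the numerical value follows directly from the table; a sketch:

```python
# Frequency table from the exercise: 130 birds, jumps from 1 to 12.
jumps = list(range(1, 13))
counts = [48, 31, 20, 9, 6, 5, 4, 2, 1, 1, 2, 1]

n = sum(counts)                                          # 130 birds
total_jumps = sum(x * c for x, c in zip(jumps, counts))  # 363
x_bar = total_jumps / n                                  # about 2.79
p_hat = 1 / x_bar                                        # about 0.358
print(p_hat)
```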

Exercise 3.4 (Estimators comparison) Let \(X\) be a random variable with Uniform distribution on an interval \([0, a]\) where \(a\) is an unknown parameter, and let \(\left(X_{1}, \ldots, X_{n}\right)\) be a sample of \(X\) of size \(n\). We denote by \(\overline{X}_{n}\) the empirical mean of \(X\).

  1. Let \(T_{n}=2 \overline{X}_{n}\) be the method-of-moments estimator of \(a\). Show that \(T_{n}\) is an unbiased estimator of \(a\) and calculate its mean squared error.
  2. Let \(T_{n}^{\prime}=\max \left(X_{1}, \ldots, X_{n}\right)\). Show that \(T_n^{\prime}\) is the maximum likelihood estimator of \(a\).
  3. Give the distribution function of \(T_{n}^{\prime}\). Deduce a density of \(T_{n}^{\prime}\), then its bias and its mean squared error.
  4. Let \(T_{n}^{\prime \prime}=\displaystyle \frac{n+1}{n} T_{n}^{\prime}\). Determine its bias and its mean squared error.
  5. For large values of \(n\), what is the best estimator of \(a\)?
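A quick Monte-Carlo check of question 5 (a sketch; \(a = 1\), the sample size, and the number of repetitions are arbitrary choices): for large \(n\), the empirical MSEs should show \(T_n^{\prime\prime}\) beating \(T_n^{\prime}\), which in turn beats \(T_n\).

```python
import random

random.seed(1)
a, n, reps = 1.0, 100, 20000

mse_T = mse_Tp = mse_Tpp = 0.0
for _ in range(reps):
    xs = [random.uniform(0, a) for _ in range(n)]
    T = 2 * sum(xs) / n            # moment estimator
    Tp = max(xs)                   # MLE (biased)
    Tpp = (n + 1) / n * Tp         # debiased MLE
    mse_T += (T - a) ** 2
    mse_Tp += (Tp - a) ** 2
    mse_Tpp += (Tpp - a) ** 2
mse_T, mse_Tp, mse_Tpp = mse_T / reps, mse_Tp / reps, mse_Tpp / reps
print(mse_T, mse_Tp, mse_Tpp)
```

The empirical values line up with the theory: the MSE of \(T_n\) decays like \(1/n\), while those of \(T_n^{\prime}\) and \(T_n^{\prime\prime}\) decay like \(1/n^2\).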

Simulation

Let \(\left(X_{1}, \ldots, X_{n}\right)\) be a sample of a Uniform distribution on \([0, \theta]\) where \(\theta\) is an unknown parameter.

Consider the following consistent estimators of \(\theta\).

\[ \begin{aligned} T_{1} &= \frac{2}{n} \times\left(X_{1}+\ldots+X_{n}\right) \\ T_{2} &= \sqrt{\frac{3}{n} \times\left(X_{1}^{2}+\ldots+X_{n}^{2}\right)} \\ T_{3} &= \left(\frac{4}{n} \times\left(X_{1}^{3}+\ldots+X_{n}^{3}\right)\right)^{\frac{1}{3}} \\ T_{4} &= \left(\frac{3}{2 n} \times(\sqrt{X_{1}}+\ldots+\sqrt{X_{n}})\right)^{2} \\ T_{5} &= \left(\frac{1}{2 n} \times\left(\frac{1}{\sqrt{X_{1}}}+\ldots+\frac{1}{\sqrt{X_{n}}}\right)\right)^{-2} \\ T_{6} &= \exp (1) \times\left(X_{1} \times \ldots \times X_{n}\right)^{\frac{1}{n}} \\ T_{7} &= \max \left\{X_{1}, \ldots, X_{n}\right\} \\ T_{8} &= \frac{n+1}{n} \max \left\{X_{1}, \ldots, X_{n}\right\} \end{aligned} \]

  • \(T_1\) is the moment estimator of \(\theta\).
  • \(T_7\) is the biased MLE of \(\theta\).
  • \(T_8\) is the unbiased MLE of \(\theta\).

1. Choose a value of \(\theta\) and simulate 1000 samples of size 100 from the Uniform distribution on \([0, \theta]\). Compute, for each of these samples, the values taken by the 8 estimators.

We can then create a matrix with 1000 rows and 8 columns where the \(j\)th column contains the 1000 realizations of the estimator \(T_{j}\).

2. Calculate the empirical mean and the empirical variance of the 8 samples of size 1000 thus obtained. Deduce an estimate of the bias and the mean squared error of each of the 8 estimators.

Recall that for an estimator \(T\), the bias is \(E[T]-\theta\) and the MSE is \(E\left[(T-\theta)^{2}\right]\).
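A sketch of steps 2 and 3, estimating the bias and MSE from the matrix of realizations (again with the arbitrary choice \(\theta = 2\); only \(T_1\), \(T_7\) and \(T_8\) are shown to keep the sketch short):

```python
import numpy as np

rng = np.random.default_rng(0)
theta, n_samples, n = 2.0, 1000, 100
X = rng.uniform(0, theta, size=(n_samples, n))

estimates = np.column_stack([
    2 * X.mean(axis=1),            # T1
    X.max(axis=1),                 # T7
    (n + 1) / n * X.max(axis=1),   # T8
])

bias_hat = estimates.mean(axis=0) - theta          # estimates E[T] - theta
mse_hat = ((estimates - theta) ** 2).mean(axis=0)  # estimates E[(T - theta)^2]
print(bias_hat)   # T7 is clearly biased downward; T1 and T8 nearly unbiased
print(mse_hat)    # T7 and T8 have a much smaller MSE than T1
```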

3. Which estimators are the least biased? Which have the smallest MSE?

4. Show on the same graph the boxplots of the 8 estimators, and mark the true value of the parameter in red on that graph. Looking at each estimator, which one would you prefer to use?