3 Exercises
Exercise 3.1 (Unbiased sample variance) Let \(X\) be a random variable with expectation \(m\) and variance \(\sigma^2\), and let \(\left(X_{1}, \ldots, X_{n}\right)\) be a sample of \(X\). The empirical (biased) variance is \(\tilde{S}^2=\frac{1}{n} \sum_{i=1}^{n} X_{i}^{2}-\overline{X}_{n}^{2}\), where \(\overline{X}_{n}\) is the empirical mean (sample mean) of \(X\) and \(\frac{1}{n} \sum_{i=1}^{n} X_{i}^{2}\) is the empirical mean of \(X^{2}\).
- Compute \(E\left(\overline{X}_{n}\right)\) and \(V\left(\overline{X}_{n}\right)\) and deduce \(E(\overline{X}_{n}^{2})\).
- Finally, show that \(E\left(\tilde{S}^2\right)=\frac{n-1}{n} V(X)\) and deduce an unbiased estimator of the variance (this estimator, \(S_{n}^{2}\), is called the sample variance).
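As a quick numerical sanity check of this result (a sketch, not part of the exercise): Python's `statistics.pvariance` computes \(\tilde{S}^2\) (divisor \(n\)) and `statistics.variance` computes \(S_n^2\) (divisor \(n-1\)), so averaging each over many simulated samples should exhibit the \(\frac{n-1}{n}\) deflation of the biased estimator.

```python
import random
import statistics

random.seed(42)
n, reps = 5, 20000            # a small n makes the (n-1)/n bias clearly visible
sigma2 = 1.0                  # true variance of a standard normal

# Average each variance estimator over many samples of size n.
biased, unbiased = 0.0, 0.0
for _ in range(reps):
    x = [random.gauss(0, 1) for _ in range(n)]
    biased += statistics.pvariance(x)    # divides by n   -> expectation (n-1)/n * sigma2
    unbiased += statistics.variance(x)   # divides by n-1 -> expectation sigma2
biased /= reps
unbiased /= reps

print(f"E[biased]   ~ {biased:.3f}  (theory: {(n - 1) / n * sigma2:.3f})")
print(f"E[unbiased] ~ {unbiased:.3f}  (theory: {sigma2:.3f})")
```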
Exercise 3.2 (Moment Estimation) Consider the following sample:
\[1,0,2,1,1,0,1,0,0\]
- Calculate its empirical mean and variance.
- Assuming that the data in this sample are realizations of a variable of unknown distribution, give an unbiased estimate of the expectation and variance of this distribution.
- We choose to model the values of this sample by a Binomial distribution \(\mathcal{B}(2, p)\). Use the empirical mean to propose a point estimate for \(p\).
- Using the same model, use the variance to propose another estimate for \(p\).
- We choose to model the values in this sample by a Poisson distribution \(\mathcal{P}(\lambda)\), which has expectation \(\lambda\). What point estimate do you propose for \(\lambda\)?
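The moment computations above can be checked with a short script (a sketch; `statistics.pvariance` uses divisor \(n\) and `statistics.variance` uses divisor \(n-1\)):

```python
import math
import statistics

sample = [1, 0, 2, 1, 1, 0, 1, 0, 0]

mean = statistics.mean(sample)              # empirical mean
var_biased = statistics.pvariance(sample)   # empirical variance (divisor n)
var_unbiased = statistics.variance(sample)  # unbiased estimate (divisor n-1)

# Binomial B(2, p): E[X] = 2p, so the moment estimate from the mean is
p_from_mean = mean / 2
# Matching the variance 2p(1-p) instead, and taking the root with p <= 1/2:
p_from_var = (1 - math.sqrt(1 - 2 * var_biased)) / 2
# Poisson P(lambda): E[X] = lambda
lam = mean

print(mean, var_biased, var_unbiased, p_from_mean, p_from_var, lam)
```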
Exercise 3.3 (MLE of Geometric distribution) Some birds fly after making a few jumps on the ground. It is assumed that the number \(X\) of jumps can be modeled by a Pascal (Geometric) distribution on \(\mathbb{N}^*\):
\[P(X=x)= p(1-p)^{x-1} \quad x \geq 1\]
For \(n=130\) birds of this type, we collected the following data:
| Number of jumps \(x\) | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Occurrences | 48 | 31 | 20 | 9 | 6 | 5 | 4 | 2 | 1 | 1 | 2 | 1 |
- What is the Maximum Likelihood Estimator (MLE) of \(p\)?
- Calculate a value of this estimator using the collected data.
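For the Geometric distribution on \(\mathbb{N}^*\), the MLE works out to the reciprocal of the sample mean, \(\hat{p} = 1/\overline{x}\); as a numerical check (a sketch), the value on the collected data can be computed directly from the table:

```python
# Data from the table: number of jumps x -> observed count
counts = {1: 48, 2: 31, 3: 20, 4: 9, 5: 6, 6: 5,
          7: 4, 8: 2, 9: 1, 10: 1, 11: 2, 12: 1}

n = sum(counts.values())                          # total number of birds
total_jumps = sum(x * c for x, c in counts.items())
xbar = total_jumps / n                            # sample mean of X

# MLE of p for the Geometric distribution on N*: p_hat = 1 / xbar
p_hat = 1 / xbar
print(f"n = {n}, mean jumps = {xbar:.4f}, p_hat = {p_hat:.4f}")
```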
Exercise 3.4 (Estimators comparison) Let \(X\) be a random variable with Uniform distribution on an interval \([0, a]\) where \(a\) is an unknown parameter, and we have \(\left(X_{1}, \ldots, X_{n}\right)\) a sample of \(X\) of size \(n\). We note \(\overline{X}_{n}\) the empirical mean of \(X\).
- Let \(T_{n}=2 \overline{X}_{n}\) be the method-of-moments estimator of \(a\). Show that \(T_{n}\) is an unbiased estimator of \(a\) and calculate its mean squared error.
- Let \(T_{n}^{\prime}=\max \left(X_{1}, \ldots, X_{n}\right)\). Show that \(T_n^{\prime}\) is the maximum likelihood estimator of \(a\).
- Give the distribution function of \(T_{n}^{\prime}\). Deduce a density of \(T_{n}^{\prime}\), then its bias and its mean squared error.
- Let \(T_{n}^{\prime \prime}=\displaystyle \frac{n+1}{n} T_{n}^{\prime}\). Determine its bias and its mean squared error.
- For large values of \(n\), what is the best estimator of \(a\)?
Simulation
Let \(\left(X_{1}, \ldots, X_{n}\right)\) be a sample of a Uniform distribution on \([0, \theta]\) where \(\theta\) is an unknown parameter.
Consider the following convergent estimators of \(\theta\):
\[ \begin{aligned} T_{1}=& \frac{2}{n} \times\left(X_{1}+\ldots+X_{n}\right) \\ T_{2}=& \sqrt{\frac{3}{n} \times\left(X_{1}^{2}+\ldots+X_{n}^{2}\right)} \\ T_{3}=&\left(\frac{4}{n} \times\left(X_{1}^{3}+\ldots+X_{n}^{3}\right)\right)^{\frac{1}{3}} \\ T_{4}=&\left(\frac{3}{2 n} \times(\sqrt{X_{1}}+\ldots+\sqrt{X_{n}})\right)^{2} \\ T_{5}=&\left(\frac{1}{2 n} \times\left(\frac{1}{\sqrt{X_{1}}}+\ldots+\frac{1}{\sqrt{X_{n}}}\right)\right)^{-2} \\ T_{6}=& \exp (1) \times\left(X_{1} \times \ldots \times X_{n}\right)^{\frac{1}{n}} \\ T_{7}=& \max \left\{X_{1}, \ldots, X_{n}\right\} \\ T_{8}=& \frac{n+1}{n} \max \left\{X_{1}, \ldots, X_{n}\right\} \end{aligned} \]
1. Choose a value of \(\theta\) and simulate 1000 samples of size 100 from the Uniform distribution on \([0, \theta]\). For each sample, compute the values taken by the 8 estimators.
You can then store the results in a matrix with 1000 rows and 8 columns, where the \(j\)-th column contains the 1000 realizations of the estimator \(T_{j}\).
2. Calculate the empirical mean and the empirical variance of the 8 samples of size 1000 thus obtained. Deduce an estimate of the bias and the mean squared error of each of the 8 estimators.
3. Which estimators are the least biased? Which have the smallest MSE?
4. Show the boxplots of the 8 estimators on the same graph, together with the true value of the parameter in red. Considering each estimator: which one would you prefer to use?
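Steps 1 and 2 can be sketched in pure Python (standard library only; `theta = 2.0` is an arbitrary choice, and the boxplots of step 4 are left to your plotting library of choice):

```python
import math
import random

def estimators(x):
    """Compute the 8 estimators T1..T8 of theta on a sample x from U([0, theta])."""
    n = len(x)
    t1 = 2 * sum(x) / n                                          # 2 * mean
    t2 = math.sqrt(3 * sum(v * v for v in x) / n)                # from E[X^2] = theta^2 / 3
    t3 = (4 * sum(v ** 3 for v in x) / n) ** (1 / 3)             # from E[X^3] = theta^3 / 4
    t4 = (3 / (2 * n) * sum(math.sqrt(v) for v in x)) ** 2       # from E[sqrt(X)]
    t5 = (1 / (2 * n) * sum(1 / math.sqrt(v) for v in x)) ** -2  # from E[1/sqrt(X)]
    t6 = math.e * math.exp(sum(math.log(v) for v in x) / n)      # e * geometric mean
    t7 = max(x)                                                  # MLE
    t8 = (n + 1) / n * max(x)                                    # bias-corrected MLE
    return [t1, t2, t3, t4, t5, t6, t7, t8]

random.seed(0)
theta = 2.0
reps, n = 1000, 100

# 1000 rows, 8 columns: row i holds the 8 estimates on the i-th simulated sample
results = [estimators([random.uniform(0, theta) for _ in range(n)])
           for _ in range(reps)]

# Column-wise empirical bias and MSE estimates
for j in range(8):
    col = [row[j] for row in results]
    bias = sum(col) / reps - theta
    mse = sum((v - theta) ** 2 for v in col) / reps
    print(f"T{j + 1}: bias = {bias:+.4f}   MSE = {mse:.5f}")
```

With these settings, the bias-corrected maximum `T8` should stand out with a much smaller MSE than the moment-based estimators, while `T7` is systematically biased downward (the maximum of the sample never exceeds \(\theta\)).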