3 Exercises 3
Exercise 3.1 (Unbiased sample variance) Let \(X\) be a random variable with expectation \(m\) and variance \(\sigma^2\), and let \(\left(X_{1}, \ldots, X_{n}\right)\) be a sample of \(X\). Its empirical (biased) variance is \(\tilde{S}^2=\frac{1}{n} \sum X_{i}^{2}-\overline{X}_{n}^{2}\), where \(\overline{X}_{n}\) is the empirical mean (sample mean) of \(X\) and \(\frac{1}{n} \sum X_{i}^{2}\) is the empirical mean of \(X^{2}\).
- Compute \(E\left(\overline{X}_{n}\right)\) and \(V\left(\overline{X}_{n}\right)\) and deduce \(E(\overline{X}_{n}^{2})\).
- Finally, show that \(E\left(\tilde{S}^2\right)=\frac{n-1}{n} V(X)\) and deduce an unbiased estimator of the variance (we call this estimator \(S_{n}^{2}\) the sample variance).
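The factor \(\frac{n-1}{n}\) can also be observed empirically. Below is a minimal Python sketch of such a check; the normal distribution, the sample size and the number of replications are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 10, 100_000          # small n makes the (n-1)/n factor visible
sigma2 = 4.0                   # true variance of the chosen distribution

# Simulate many samples of size n from N(0, sigma2)
samples = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))

biased = samples.var(axis=1, ddof=0)    # tilde{S}^2 = (1/n) sum X_i^2 - Xbar^2
unbiased = samples.var(axis=1, ddof=1)  # S_n^2 = n/(n-1) * tilde{S}^2

print(biased.mean())    # close to (n-1)/n * sigma2 = 3.6
print(unbiased.mean())  # close to sigma2 = 4.0
```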
Exercise 3.2 (Moment Estimation) Consider the following sample:
\[1,0,2,1,1,0,1,0,0\]
- Calculate its empirical mean and variance.
- Assuming that the data in this sample are realizations of a variable of unknown distribution, give an unbiased estimate of the expectation and variance of this distribution.
- We choose to model the values of this sample by a Binomial distribution \(\mathcal{B}(2, p)\). Use the empirical mean to propose a point estimate for \(p\).
- Using the same model, use the variance to propose another estimate for \(p\).
- We choose to model the values in this sample by a Poisson distribution \(\mathcal{P}(\lambda)\), which has expectation \(\lambda\). What point estimate do you propose for \(\lambda\)?
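Hand computations for this exercise can be checked numerically. Below is a minimal Python sketch, assuming the standard moment relations \(E(X)=2p\) and \(V(X)=2p(1-p)\) for \(\mathcal{B}(2,p)\), and \(E(X)=\lambda\) for \(\mathcal{P}(\lambda)\).

```python
import numpy as np

x = np.array([1, 0, 2, 1, 1, 0, 1, 0, 0])

mean = x.mean()                 # empirical mean
var_emp = x.var(ddof=0)         # empirical (biased) variance
var_unbiased = x.var(ddof=1)    # unbiased variance estimate

# Binomial B(2, p): E(X) = 2p, so matching the first moment gives
p_from_mean = mean / 2

# Binomial B(2, p): V(X) = 2p(1-p); solve 2p(1-p) = empirical variance,
# i.e. find the roots of 2p^2 - 2p + var = 0
p_from_var = np.roots([2, -2, var_emp])

# Poisson P(lambda): E(X) = lambda
lam = mean

print(mean, var_emp, var_unbiased, p_from_mean, p_from_var, lam)
```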
Exercise 3.3 (MLE of Geometric distribution) Some birds fly after making a few jumps on the ground. It is assumed that the number \(X\) of jumps can be modeled by a Pascal (Geometric) distribution on \(\mathbb{N}^*\):
\[P(X=x)= p(1-p)^{x-1} \quad x \geq 1\]
For \(n=130\) birds of this type, we collected the following data:
Number of jumps \(x\) | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Occurrences | 48 | 31 | 20 | 9 | 6 | 5 | 4 | 2 | 1 | 1 | 2 | 1 |
- What is the Maximum Likelihood Estimator (MLE) of \(p\)?
- Calculate a value of this estimator using the collected data.
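Once the MLE of \(p\) has been derived analytically, its value on these data can be checked with a crude numerical maximization of the log-likelihood. A minimal Python sketch:

```python
import numpy as np

# Data from the table: number of jumps and number of birds
jumps = np.arange(1, 13)
counts = np.array([48, 31, 20, 9, 6, 5, 4, 2, 1, 1, 2, 1])   # total: 130 birds

# Log-likelihood of the model P(X = x) = p (1 - p)^(x - 1)
def log_likelihood(p):
    return np.sum(counts * (np.log(p) + (jumps - 1) * np.log(1 - p)))

# Crude grid search over p in (0, 1); compare the result with the value
# of the analytical MLE evaluated on the same data
grid = np.linspace(1e-4, 1 - 1e-4, 10_000)
p_hat = grid[np.argmax([log_likelihood(p) for p in grid])]
print(p_hat)
```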
Exercise 3.4 (Estimators comparison) Let \(X\) be a random variable with Uniform distribution on an interval \([0, a]\), where \(a\) is an unknown parameter, and let \(\left(X_{1}, \ldots, X_{n}\right)\) be a sample of \(X\) of size \(n\). We denote by \(\overline{X}_{n}\) the empirical mean of \(X\).
- Let \(T_{n}=2 \overline{X}_{n}\) be the method-of-moments estimator of \(a\). Show that \(T_{n}\) is an unbiased estimator of \(a\) and compute its mean squared error.
- Let \(T_{n}^{\prime}=\max \left(X_{1}, \ldots, X_{n}\right)\). Show that \(T_n^{\prime}\) is the maximum likelihood estimator of \(a\).
- Give the distribution function of \(T_{n}^{\prime}\). Deduce a density of \(T_{n}^{\prime}\), then its bias and its mean squared error.
- Let \(T_{n}^{\prime \prime}=\displaystyle \frac{n+1}{n} T_{n}^{\prime}\). Determine its bias and its mean squared error.
- For large values of \(n\), what is the best estimator of \(a\)?
Simulation
Let \(\left(X_{1}, \ldots, X_{n}\right)\) be a sample of a Uniform distribution on \([0, \theta]\), where \(\theta\) is an unknown parameter.
Consider the following consistent estimators of \(\theta\).
\[ \begin{aligned} T_{1}=& \frac{2}{n} \times\left(X_{1}+\ldots+X_{n}\right) \\ T_{2}=& \sqrt{\frac{3}{n} \times\left(X_{1}^{2}+\ldots+X_{n}^{2}\right)} \\ T_{3}=&\left(\frac{4}{n} \times\left(X_{1}^{3}+\ldots+X_{n}^{3}\right)\right)^{\frac{1}{3}} \\ T_{4}=&\left(\frac{3}{2 n} \times(\sqrt{X_{1}}+\ldots+\sqrt{X_{n}})\right)^{2} \\ T_{5}=&\left(\frac{1}{2 n} \times\left(\frac{1}{\sqrt{X_{1}}}+\ldots+\frac{1}{\sqrt{X_{n}}}\right)\right)^{-2} \\ T_{6}=& \exp (1) \times\left(X_{1} \times \ldots \times X_{n}\right)^{\frac{1}{n}} \\ T_{7}=& \max \left\{X_{1}, \ldots, X_{n}\right\} \\ T_{8}=& \frac{n+1}{n} \max \left\{X_{1}, \ldots, X_{n}\right\} \end{aligned} \]
1. Choose a value of \(\theta\) and simulate 1000 samples of size 100 of the Uniform distribution on \([0, \theta]\). Compute for each of these samples the value taken by the 8 estimators.
We can then create a matrix with 1000 rows and 8 columns where the jth column contains the 1000 realizations of the estimator \(T_{j}\).
2. Calculate the empirical mean and the empirical variance of the 8 samples of size 1000 thus obtained. Deduce an estimate of the bias and the mean squared error (MSE) of each of the 8 estimators.
3. Which estimators are the least biased, and which have the smallest MSE?
4. Draw the boxplots of the 8 estimators on a single graph, and show the true value of the parameter in red on the same graph. Looking at each estimator, which one would you prefer to use? (A sketch of the full simulation is given below.)
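A minimal Python sketch of the whole simulation, using NumPy and Matplotlib; the value \(\theta = 2\) and the random seed are arbitrary choices, so your numerical results may differ slightly.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
theta, n, reps = 2.0, 100, 1000

# reps samples of size n from the Uniform distribution on [0, theta]
X = rng.uniform(0.0, theta, size=(reps, n))

# One column per estimator: a (reps x 8) matrix of realizations
T = np.column_stack([
    2 * X.mean(axis=1),                                  # T1
    np.sqrt(3 * (X**2).mean(axis=1)),                    # T2
    (4 * (X**3).mean(axis=1)) ** (1 / 3),                # T3
    ((3 / 2) * np.sqrt(X).mean(axis=1)) ** 2,            # T4
    ((1 / 2) * (1 / np.sqrt(X)).mean(axis=1)) ** (-2),   # T5
    np.e * np.exp(np.log(X).mean(axis=1)),               # T6 (geometric mean)
    X.max(axis=1),                                       # T7
    (n + 1) / n * X.max(axis=1),                         # T8
])

# Estimated bias and MSE of each estimator
bias = T.mean(axis=0) - theta
mse = ((T - theta) ** 2).mean(axis=0)
print(np.round(bias, 4))
print(np.round(mse, 5))

# Boxplots of the 8 estimators, with the true value of theta in red
plt.boxplot(T)
plt.xticks(range(1, 9), [f"T{j}" for j in range(1, 9)])
plt.axhline(theta, color="red")
plt.show()
```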
Extra
Exercise 3.5 A company manufactures electrical devices, each consisting of two main components.
Let \(X\) be the random variable representing the number of defective components in a randomly chosen device.
The device is declared defective if at least one of its two components is defective.
We assume that the distribution of \(X\) belongs to a family depending on a parameter \(\theta\), defined as:
Values of \(X\) | 0 | 1 | 2 |
---|---|---|---|
Probabilities | \(1-3\theta\) | \(2\theta\) | \(\theta\) |
- For which values of \(\theta\) does the table define a probability distribution?
- Compute \(E_\theta(X)\) and \(\mathrm{Var}_\theta(X)\).
Let \(X_1, \dots, X_n\) be a sample of \(X\).
- Write down the likelihood function of this model. (Hint: Let \(n_0, n_1, n_2\) be the counts of values 0, 1, 2 in the sample (\(n_0+n_1+n_2=n\)).)
- Find the maximum likelihood estimator (MLE) of \(\theta\) in this model.
- Is this estimator unbiased?
- Compute its mean squared error (MSE).
- Is this estimator consistent?
- Which estimator is obtained using the method of moments based on the first moment of \(X\)?
Is this estimator unbiased? Compute its mean squared error.
For \(i = 1, \dots, n\), define \(Z_i\) by: \(Z_i = 1\) if \(X_i = 0\), and \(Z_i = 0\) otherwise.
- Write down the likelihood function of this new model.
- Find the MLE of \(\theta\) in this new model.
- Is this estimator unbiased?
- Compute its mean squared error.
- Is this estimator consistent?
- Which estimates of \(\theta\) are obtained from the different estimators in this exercise, given the following data:
Values of \(X\) | 0 | 1 | 2 |
---|---|---|---|
Frequencies | 49 | 7 | 4 |
- Which estimator would you prefer, and why?
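As in the previous exercises, the point estimates can be checked numerically once the estimators have been derived. Below is a minimal Python sketch that maximizes the log-likelihood on a grid and computes the method-of-moments estimate from the first moment \(E_\theta(X)=0\cdot(1-3\theta)+1\cdot 2\theta+2\cdot\theta=4\theta\) (read off the table).

```python
import numpy as np

# Observed counts of the values 0, 1, 2
n0, n1, n2 = 49, 7, 4
n = n0 + n1 + n2

# Log-likelihood of L(theta) = (1 - 3 theta)^n0 * (2 theta)^n1 * theta^n2
def log_likelihood(theta):
    return n0 * np.log(1 - 3 * theta) + n1 * np.log(2 * theta) + n2 * np.log(theta)

# Grid restricted to (0, 1/3) so that all three probabilities are positive;
# compare the maximizer with your analytical MLE
grid = np.linspace(1e-6, 1 / 3 - 1e-6, 100_000)
theta_mle = grid[np.argmax(log_likelihood(grid))]

# Method-of-moments estimate based on the first moment E_theta(X) = 4 theta
x_bar = (0 * n0 + 1 * n1 + 2 * n2) / n
theta_mm = x_bar / 4

print(theta_mle, theta_mm)
```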