Main References & Credits

Readings:

Dalal, Siddhartha R., Edward B. Fowlkes, and Bruce Hoadley. 1989. “Risk Analysis of the Space Shuttle: Pre-Challenger Prediction of Failure.” Journal of the American Statistical Association 84 (408): 945–57. https://doi.org/10.1080/01621459.1989.10478858.
Fisher, R. A. 1936. “The Use of Multiple Measurements in Taxonomic Problems.” Annals of Eugenics 7 (2): 179–88. https://doi.org/10.1111/j.1469-1809.1936.tb02137.x.
Presidential Commission on the Space Shuttle Challenger Accident. 1986. Report of the Presidential Commission on the Space Shuttle Challenger Accident (Vols. 1 & 2). Washington, DC. http://history.nasa.gov/rogersrep/genindex.htm.

  1. Also known as SSE: Sum of Squared Errors.↩︎

  2. They exist and are unique provided the \(X_i\) are not all equal. They can be obtained by solving \(\frac{\partial}{\partial \beta_0}\text{RSS}(\beta_0,\beta_1)=0\) and \(\frac{\partial}{\partial \beta_1}\text{RSS}(\beta_0,\beta_1)=0\).↩︎

  3. If \(\beta_1 = 0\), then \(COV(X,Y)=0\). Remember that zero covariance does not necessarily mean that \(X\) and \(Y\) are independent; it only means that there is no linear relationship between them. They may be independent, or they may be related in some other, nonlinear way.↩︎

  4. Recall that SSR is different from RSS (Residual Sum of Squares)↩︎

  5. Recall that SSE and RSS (for \((\hat \beta_0,\hat \beta_1)\)) are just different names for referring to the same quantity: \(\text{SSE}=\sum_{i=1}^n\left(Y_i-\hat Y_i\right)^2=\sum_{i=1}^n\left(Y_i-\hat \beta_0-\hat \beta_1X_i\right)^2=\mathrm{RSS}\left(\hat \beta_0,\hat \beta_1\right)\).↩︎

  6. The \(F_{n,m}\) distribution arises as the quotient of two independent random variables \(\chi^2_n\) and \(\chi^2_m\), each divided by its degrees of freedom: \(\frac{\chi^2_n/n}{\chi^2_m/m}\).↩︎

  7. It is important to be sure that \(\hat{\beta}\) actually minimises the RSS.↩︎

  8. Recall that ESS is the explained sum of squares, \(\text{ESS} = \text{TSS} - \text{RSS}\).↩︎

  9. This is more complex; it is included here only to clarify the anova output.↩︎

  10. Recall that \(R^2 = 1- \frac{\text{RSS}}{\text{TSS}}\)↩︎

  11. It is defined as \(R_{adj}^2 = 1- \frac{\text{RSS}/(n-p-1)}{\text{TSS}/(n-1)} = 1- \frac{\text{RSS}}{\text{TSS}}\times\frac{n-1}{n-p-1}\)↩︎

  12. In the formula, \(\log\) denotes the natural logarithm \(\ln\).↩︎

  13. Old Faithful is a hydrothermal geyser in Yellowstone National Park in the state of Wyoming, U.S.A., and is a popular tourist attraction. Its name stems from the supposed regularity of its eruptions. The data set comprises 272 observations, each of which represents a single eruption and contains two variables: the duration of the eruption and the time until the next eruption, both in minutes.↩︎

  14. Source: the famous MOOC Statistical Learning↩︎

  15. Source: Trevor Hastie’s website↩︎

  16. Source: Marvin Wright’s talk from Why R? 2019↩︎

  17. An Introduction to Recursive Partitioning Using the rpart Routines - Details of the rpart package.↩︎

  18. rpart.plot Package - Detailed manual on plotting with rpart using the rpart.plot package.↩︎

  19. For classification a suggestion is mtry = \(\sqrt{p}\).↩︎

  20. generalized boosted models package↩︎

  21. Source↩︎

  22. For classification, the suggested mtry for a random forest is \(\sqrt{p}\).↩︎

  23. Old Faithful is a hydrothermal geyser in Yellowstone National Park in the state of Wyoming, U.S.A., and is a popular tourist attraction. Its name stems from the supposed regularity of its eruptions. The data set comprises 272 observations, each of which represents a single eruption and contains two variables: the duration of the eruption and the time until the next eruption, both in minutes.↩︎

  24. Made by Joseph J. Allaire https://github.com/jjallaire↩︎
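
As a worked illustration of the formulas in notes 10 and 11, consider a hypothetical fit with \(n = 10\) observations, \(p = 2\) predictors, \(\text{RSS} = 20\) and \(\text{TSS} = 100\). These numbers are made up for the example and do not come from any data set used here; \(p\) is assumed to denote the number of predictors, excluding the intercept.

\[
R^2 = 1 - \frac{\text{RSS}}{\text{TSS}} = 1 - \frac{20}{100} = 0.8,
\qquad
R_{adj}^2 = 1 - \frac{\text{RSS}}{\text{TSS}}\times\frac{n-1}{n-p-1} = 1 - 0.2 \times \frac{9}{7} \approx 0.743.
\]

The adjusted value is smaller because the factor \(\frac{n-1}{n-p-1}\) is greater than 1 whenever \(p \geq 1\), so each additional predictor is penalised unless it reduces the RSS enough to compensate.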