Math 50 Fall 2017, Homework #5

NOTE: For your homework download and use the template (https://math.dartmouth.edu/~m50f17/HW5.Rmd)

Read the green comments in the rmd file to see where your answers should go.

Normal probability plots

You can use qqnorm and qqline functions to plot probability plots of residuals. The functions rstandard and rstudent calculate the standardized residuals and R-student residuals, respectively.

prop = read.table("https://math.dartmouth.edu/~m50f17/propellant.csv", header=T, sep=",")
age <- prop$Age
shearS <- prop$ShearS

fitted = lm(shearS ~ age)

stdRes = rstandard(fitted)
rStuRes = rstudent(fitted)

qqnorm(rStuRes, main="Normal Probability Plot (residuals on vertical axis)") 
qqline(rStuRes)

However, note that the book uses residuals on the x-axis instead of y-axis. In order to obtain that use the parameter datax as shown below. In the below graph x-axis denotes the R-student residuals and the y-axis is the theoretical quantiles ( in the book y-axis is probability instead of quantiles).

qqnorm(rStuRes, datax = TRUE ,
       main="Normal Probability Plot") 
qqline(rStuRes, datax = TRUE )

Residual vs predicted response plot

yHat <- predict(fitted)

plot (yHat, rStuRes)
abline(0,0)

After observing that the observation points 5 and 6 look like potential outliers, next we delete those points and compare the fitted model of the deleted data with the full data.

plot(age, shearS, xlim=c(0,30), ylim=c(1600,2700))
abline(fitted$coef, lwd = 2, col = "blue")

ageRem <- age[-6] 
ageRem <- ageRem[-5] 

shearSRem <- shearS[-6]
shearSRem <- shearSRem[-5]

fitted2 = lm(shearSRem ~ ageRem)
abline(fitted2$coef, lwd = 2, col = "red")

A note

There is a dedicated library :

MPV: Data Sets from Montgomery, Peck and Vining’s Book

in order to provide an easy way to load tables from the book. To install the library type :

install.packages(“MPV”)

Below is an example how to use this library. Check https://cran.r-project.org/web/packages/MPV/MPV.pdf for table names.

library(MPV)

## 
## Attaching package: 'MPV'

## The following object is masked from 'package:datasets':
## 
##     stackloss

data(table.b1)

y <- table.b1$y
x1 <- table.b1$x1
x3 <- table.b1$x3
x8 <- table.b1$x8

y.lm <- lm(y ~ x1 + x3 + x8)
summary(y.lm)

## 
## Call:
## lm(formula = y ~ x1 + x3 + x8)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.6835 -1.5609 -0.1448  1.7188  4.0386 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)   
## (Intercept) 12.136744   9.145612   1.327  0.19698   
## x1           0.001123   0.001938   0.580  0.56762   
## x3           0.159582   0.296318   0.539  0.59516   
## x8          -0.006496   0.002088  -3.112  0.00475 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.42 on 24 degrees of freedom
## Multiple R-squared:  0.5702, Adjusted R-squared:  0.5164 
## F-statistic: 10.61 on 3 and 24 DF,  p-value: 0.0001244

Question-1

Solve the parts (a), (b) and (c) of Problem 4.1. In addition answer the following.

Is it possible to perform lack of fit test using the steps (4.20) to (4.24) ?

Answer:

Question-2

Chapter 4, Problem 2 all parts.

Answer:

Question-3

Chapter 4, Problem 19 all parts. In addition answer the following.

Find the point with largest (in absolute value) r-student residual as a potential outlier. Repeat the regression analysis after deleting that point from the observation data. Construct the probability plot and residual vs predicted response plot. Calculate the differences (deleted vs full data) in fitted coefficients, \(MS_{res}\) and \(R^2\). Comment on the differences in the plots and the values. Do you think it is an influential point? Do they imply any improvement?