NOTE: For your homework download and use the template (https://math.dartmouth.edu/~m50f17/HW2.Rmd)

Read the green comments in the rmd file to see where your answers should go.


Question-1 (Sample)

Given a fixed confidence interval percentage (say 95%) at what value of x does CI on the mean response achieve its minimum width?

Write an R-chunk using the propellant data which computes the following.

  1. Fit a simple linear regression model relating shear strength to age.
  2. Plot scatter diagram.
  3. Plot two curves (in blue color) that traces upper and lower limits of 95% confidence interval on \(E(y|x_0)\)
  4. Plot two curves (in red color) that traces upper and lower limits of 95% prediction interval for \(y\)
  5. Print the 95% quantile of the corresponding t distribution

Answer:

The width of the interval is \[ 2 t_{\alpha/2,n-2} \sqrt{MS_{Res} ((1/n) + (x_0 - \bar{x})^2 / S_{xx} } \] and all terms inside the square root are positive. Therefore it is minimized when \(x_0=\bar{x}\).

# Computation part of the answer : 

prop<-read.table("https://math.dartmouth.edu/~m50f17/propellant.csv", header=T, sep=",")

shearS<-prop$ShearS 
age<-prop$Age

plot(age, shearS, xlab = "Propellant Age (weeks)", ylab = "Shear S. (psi)", main = "Rocket Propellant")
fitted <- lm(shearS ~ age)

ageList <- seq(0,25,0.5)
cList <- predict(fitted, list(age = ageList), int = "c", level = 0.95)
pList <- predict(fitted, list(age = ageList), int = "p", level = 0.95)

matlines(ageList, pList, lty='solid' ,  col = "red")
matlines(ageList, cList, lty = 'solid', col = "blue") 

# since n=20 we look at the t_18 distribution
wantedQuantile <- qt( 0.95, 18) ; 
cat("95% quantile is of t_18 is : ", wantedQuantile ) ; 
## 95% quantile is of t_18 is :  1.734064

Question-2

Plot the same graph as in Question-1 without using R function predict, but instead directly calculating the interval limits we discussed in class. In particular, what are the limits of 95% confidence interval on \(E(y|x_0)\)?

Answer:

# Computation part of the answer : 
Sxx<-sum((age-mean(age))^2)
Sxy<-sum((age-mean(age))*shearS)
beta1Hat<-Sxy/Sxx
beta0Hat<-mean(shearS)-Sxy/Sxx*mean(age)
plot(age,shearS)
abline(c(beta0Hat,beta1Hat), lwd=2, col='blue')

N <- length(age) ; 
dof <- N - 2 ; 
muHat <- beta0Hat + beta1Hat *  ageList ;
yHat <- fitted$fitted.values  ; 

SSres <- sum((shearS - yHat)^2);
MSres <- SSres / dof  ; 
xBar <- mean (age) 

tQuantile_975 <- qt( 1 - 0.05/2 , 18) ; 

upperCI <- muHat + tQuantile_975 * sqrt ( MSres * ( (1/N) +  (ageList-xBar)^2 / Sxx ) ) 
lowerCI <- muHat - tQuantile_975 * sqrt ( MSres * ( (1/N) +  (ageList-xBar)^2 / Sxx ) ) 

matlines(ageList, upperCI, lty = 'solid', col = "blue") 
matlines(ageList, lowerCI, lty = 'solid', col = "blue") 

yHatList <- beta0Hat + beta1Hat *  ageList ; 
upperPI <- yHatList + tQuantile_975 * sqrt ( MSres * ( 1 + (1/N) +  (ageList-xBar)^2 / Sxx ) ) 
lowerPI <- yHatList - tQuantile_975 * sqrt ( MSres * ( 1 + (1/N) +  (ageList-xBar)^2 / Sxx ) ) 

matlines(ageList, upperPI, lty = 'solid', col = "red") 
matlines(ageList, lowerPI, lty = 'solid', col = "red") 


Question-3

Load the propellant data and fit a simple linear regression model relating shear strength to age.

  1. Test the hypothesis \(\beta_1 = -30\) using confidence level 97.5%.
  2. Calculate the limits of 97.5% confidence interval for \(\beta_0\) and \(\beta_1\)
  3. Is there any relation between the answers you find in part (a) and (b) ?
  4. Calculate \(R^2\)

Answer:

# Computation part of the answer : 

seBeta1Hat = sqrt ( MSres / Sxx ) ; 
guess = -30 ; 
testStat = ( beta1Hat - guess ) / seBeta1Hat ; 

tQuantile_9875 <- qt( 1 - 0.025/2 , 18) ; 

cat ("Part (a)
 Absolute value of test statistic is " , abs(testStat)  , " Since it is greater than  t_quantile=" ,  tQuantile_9875 , "
 we REJECT the hypothesis \n\n")
## Part (a)
##  Absolute value of test statistic is  2.476056  Since it is greater than  t_quantile= 2.445006 
##  we REJECT the hypothesis
seBeta0Hat = sqrt ( MSres * ( (1/N) +  (xBar)^2 / Sxx ) )  ; 
cat ("Part (b)  
 Lower and upper bounds of 97.5% CI for beta0 are : " , beta0Hat - tQuantile_9875 * seBeta0Hat ," and ",  beta0Hat + tQuantile_9875 * seBeta0Hat ,  "
 Lower and upper bounds of 97.5% CI for beta1 are : " , beta1Hat - tQuantile_9875 * seBeta1Hat ," and ",  beta1Hat + tQuantile_9875 * seBeta1Hat,  "\n\n")
## Part (b)  
##  Lower and upper bounds of 97.5% CI for beta0 are :  2519.792  and  2735.852 
##  Lower and upper bounds of 97.5% CI for beta1 are :  -44.21747  and  -30.08971
cat ("Part (c) 
 Yes, t-test is equivalent to testing whether -30 is in the CI for beta1. We observe that -30 is out of the above confidence interval for beta1, thus hypothesis is rejected in part b \n\n")
## Part (c) 
##  Yes, t-test is equivalent to testing whether -30 is in the CI for beta1. We observe that -30 is out of the above confidence interval for beta1, thus hypothesis is rejected in part b
yBar <- mean (shearS)
SSt <- sum((shearS - yBar)^2);
SSr <- sum((yHat - yBar)^2);
cat ("Part (d) 
 R-square is : " ,  SSr / SSt )
## Part (d) 
##  R-square is :  0.9018414

Question-4

Load the propellant data. This time let us consider a relation between square of shear strength and propellant age.

  1. Fit a simple linear regression model relating square of shear strength to age. Plot scatter diagram and fitted line.
  2. Using analysis-of-variance test for significance of regression (using the formulas we discussed in class)
  3. Use t-test and check significance of regression (using the formulas we discussed in class)
  4. Does the regression analysis predict a linear relationship between square of shear strength and propellant age ?

Answer:

# Computation part of the answer : 

prop<-read.table("https://math.dartmouth.edu/~m50f17/propellant.csv", header=T, sep=",")
shearS<-prop$ShearS 
age<-prop$Age
shear_SQ<- (prop$ShearS)^2 
plot(age, shear_SQ, xlab = "Propellant Age (weeks)", ylab = "Shear S. Squared", main = "Rocket Propellant")
fitted_SQ <- lm(shear_SQ ~ age)
abline(fitted_SQ$coef, lwd=2, col='blue')

yHat_SQ <- fitted_SQ$fitted.values  ; 
beta1Hat_SQ <- fitted_SQ$coefficients[[2]]

SSres_SQ <- sum((shear_SQ - yHat_SQ)^2);
MSres_SQ <- SSres_SQ / dof  ; 
Sxx<-sum((age-mean(age))^2)

seBeta1Hat_SQ = sqrt ( MSres_SQ / Sxx ) ; 
testStat_SQ = ( beta1Hat_SQ - 0 ) / seBeta1Hat_SQ ; 
tQuantile_975 <- qt( 1 - 0.05/2 , 18) ; 


fQuantile95 <- qf(0.95,1,18) ; 
yBar_SQ <- mean(shear_SQ)
SSr_SQ <- sum((yBar_SQ - yHat_SQ)^2);
SSt_SQ <- sum((yBar_SQ - shear_SQ)^2);
MSres_SQ <- SSres_SQ / dof  ; 
MSr_SQ <- SSr_SQ / 1  ; 

fStat <- MSr_SQ / MSres_SQ ; 

  
cat ("Part (b)  
  F statistic is " , fStat  , " Since it is greater than  f_quantile=" ,  fQuantile95, " analysis of variance test REJECTS the hypothesis of significance of regression  \n\n")
## Part (b)  
##   F statistic is  165.7173  Since it is greater than  f_quantile= 4.413873  analysis of variance test REJECTS the hypothesis of significance of regression
cat ("Part (c)  
  Absolute value of test statistic is " , abs(testStat_SQ)  , " Since it is greater than  t_quantile=" ,  tQuantile_975,  " 
 we REJECT the hypothesis of significance of regression \n\n")
## Part (c)  
##   Absolute value of test statistic is  12.87312  Since it is greater than  t_quantile= 2.100922  
##  we REJECT the hypothesis of significance of regression
cat ("Part (d)  
 Yes, since rejection of beta1=0 hypothesis above suggests a linear relationship.")
## Part (d)  
##  Yes, since rejection of beta1=0 hypothesis above suggests a linear relationship.

Question-5

Once again using propellant data fit a simple linear regression model between shear strength and propellant age. Consider the steps that we used to obtain t-test for hypothesis \(\beta_1 = G_1\), and following similar steps in order to develop a test for \(\beta_1 > G_1\) instead. Then

  1. Test the hypothesis statement \[\beta_1 > -50\] with confidence level 99.9%.
  2. Find the smallest value \(G_1\) such that the above hypothesis statement is rejected.
  3. Similarly what is the smallest value \(G_0\) such that the statement “\(\beta_0 > G_0\) with probability 0.999” is rejected.

Answer:

First note that, with probability of 0.999 the value of \((\hat{\beta}_1 -\beta_1 )/se(\hat{\beta}_1)\) is smaller than \(q_{0.999}\) (\(0.999^{th}\) t-Quantile value). Thus, \(\beta_1\) is greater than \(\hat{\beta}_1 - q_{0.999} se(\hat{\beta}_1)\) with the same probability.

# Computation part of the answer : 
tQuantile_999 =  qt( 0.999 , 18) ;

cat ( "Using the note above,  beta1 is greater than (beta1Hat - seBeta1Hat * tQuantile_999 )  = ", 
      (beta1Hat - seBeta1Hat * tQuantile_999 ) ,"   with probability 0.999  \n\n" ) 
## Using the note above,  beta1 is greater than (beta1Hat - seBeta1Hat * tQuantile_999 )  =  -47.58467    with probability 0.999
wantedQuantile <- ( beta1Hat - (-50) ) / seBeta1Hat  ;
cat ("Part (a) 
  Test is not rejected since -50 is smaller than " ,  (beta1Hat - seBeta1Hat * tQuantile_999  ) , " 

  Alternatively, you can look at the p-value for the test :  
  p-value : " , pt(wantedQuantile, dof) , "  which is larger than 0.999 \n\n")
## Part (a) 
##   Test is not rejected since -50 is smaller than  -47.58467  
## 
##   Alternatively, you can look at the p-value for the test :  
##   p-value :  0.9998441   which is larger than 0.999
cat ("Part (b) 
 For any value greater than " ,  (beta1Hat - seBeta1Hat * tQuantile_999  ) , "  test is rejected \n\n" ) 
## Part (b) 
##  For any value greater than  -47.58467   test is rejected
cat ("Part (c) \n
 Similar to above Go=(beta0Hat - seBeta0Hat * tQuantile_999 ) 
 Go=", (beta0Hat - seBeta0Hat * tQuantile_999 ), "\n " )
## Part (c) 
## 
##  Similar to above Go=(beta0Hat - seBeta0Hat * tQuantile_999 ) 
##  Go= 2468.297 
##