# Here we use our age data to explore association and correlation.

# This vector (called A) contains the actual reported ages of our 10 subjects, in years.
A=c(35,44,48,42,23,44,54,22,54,26)

# This array contains the guesses made by our 9 in-class groups. The rows are
# the samples (the people) and the columns are the groups, so G[i,j] is
# person i's age as guessed by group j.
G=cbind(
c(36,30,37,32,25,31,62,23,43,24),
c(27,32,43,37,24,35,56,22,43,23),
c(29,38,35,36,22,30,55,24,43,22),
c(26,27,36,32,28,35,58,24,43,27),
c(29,25,41,22,19,33,58,24.5,38,22),
c(32,30,37,28,28,34,56,26,42,25),
c(35,40,55,36,30,35,60,27,41,22),
c(26,30,43,21,21,33,58,18,42,17),
c(35,28,41,33,29,32,60,27,43,24)
)

# First let us compare, with a scatter plot, the actual ages with the average
# of our groups' guesses, basically the guess we would make as a class. We let
# MG denote the mean guess for a given picture over all 9 of our groups.
MG=apply(G,1,mean)

# We look at the relationship between the actual age and the class consensus age.
plot(A,MG,main="Comparing Guessed Ages with Actual Ages",xlab="Actual Ages",ylab="Class's Consensus Estimated Age")

# Question 1: What is the shape of this comparison, and what does this shape tell us?

# Question 2: Correlation is a measure of linear association. How can we modify
# our data so as to straighten out our scatter plot?

# A possible answer is to assume that perceived age PA is some function of
# actual age, f(A). We'd like this function to look something like our scatter
# plot and be as simple as possible. Here we'll make it a quadratic and have it
# satisfy f(25)=25, f(60)=60, f(42)=30 in order to approximate our data.
# Solving the resulting linear equations for the coefficients gives us:
PA=(2/51)*A^2-(7/3)*A+1000/17
plot(PA,MG,main="Comparing Guessed Ages with Perceived Ages",xlab="Our Model of Perceived Ages",ylab="Class's Consensus Estimated Age")

# The shape of a scatter plot is scale independent. This means we can
# standardize the data without affecting the shape.
# Let us do this:
mPA=mean(PA)
sPA=sd(PA)
zPA=(PA-mPA)/sPA
mMG=mean(MG)
sMG=sd(MG)
zMG=(MG-mMG)/sMG
plot(zPA, zMG,main="The Standardized Version",xlab="Standardized Perceived Ages",ylab="Standardized Class's Consensus")

# Once we have a linear association we can measure its strength via the
# correlation coefficient.
r=cor(PA,MG)

# The correlation coefficient is also scale independent. For example, using the
# standardized variables we have
cor(zPA, zMG)

# Using the standardized variables we can easily compute r:
N=length(A)
r=sum(zPA*zMG)/(N-1)

# When working with the standardized variables the line of best fit has slope r.
# It is given in red in the following picture.
plot(zPA, zMG,main="The Standardized Version",xlab="Standardized Perceived Ages",ylab="Standardized Class's Consensus")
abline(lm(zPA ~ zMG),col="red")
abline(0,1)
abline(0,1/r,col="blue")

# Question 3: Interpret the three lines in this picture. In particular, the
# blue and red lines have "the same slope". What could I possibly mean by this?

# Question 4: Suppose you are given a perceived age of 40. What would be the
# best guess for the age that the class will guess this sample to be?

# We now explore the residuals of the standardized values.
Res=zMG-r*zPA
plot(zPA,Res,main="The Standardized Residuals",xlab="Standardized Perceived Ages",ylab="Residuals of the Standardized Values")

# Question 5: Do the residuals tell us anything new?

hist(Res,main="Histogram of Residuals of Standardized Values",xlab="Residuals of the Standardized Values")

# Question 6: Is this roughly a standard normal? Why or why not?

# Our next goal is to make a predictor of the guessed age from a given
# perceived age directly (without standardizing). To do this we need to
# rescale and shift our line.
b1=(sMG/sPA)*r
b0=mMG-b1*(mPA)
hatMG=b1*PA+b0
plot(PA,MG,main="Comparing Guessed Ages with Perceived Ages",xlab="Our Model of Perceived Ages",ylab="Class's Consensus Estimated Age")
points(PA,hatMG,col="red")

# Question 7: Suppose you are given a perceived age of 40. Using this
# unstandardized line, what would be the best guess for the age that the class
# will guess this sample to be?

# Question 8: lm(MG ~ PA) should be read as "please produce a linear model to
# predict the guesses MG from the perceived ages PA." Explain what it actually does.
lm(MG ~ PA)
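
# Appendix (a sketch, not part of the original exercise): one possible way to
# recover the quadratic coefficients used for PA earlier. We assume
# f(x) = a*x^2 + b*x + c and impose the three conditions stated above:
# f(25)=25, f(60)=60, f(42)=30. The names ages, targets, M, and coefs are
# introduced here for illustration only.
ages=c(25,60,42)
targets=c(25,60,30)

# Each row of M is (x^2, x, 1) for one condition, so M %*% c(a,b,c) = targets.
M=cbind(ages^2,ages,1)
coefs=solve(M,targets)
coefs

# These should agree with 2/51, -7/3, and 1000/17 up to rounding error.
all.equal(unname(coefs),c(2/51,-7/3,1000/17))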