# Here we attempt to rank our age-guess data by group. The ranks run from 1 to 9, with 1 being the best.

# Once again, this vector (called A) contains the actual reported ages of our 10 subjects in years.
A=c(35,44,48,42,23,44,54,22,54,26)

# This array contains the guesses made by our 9 in-class groups. The rows are the subjects
# and the columns are the groups, so G[i,j] is person i's age as guessed by group j.
G=cbind(
c(36,30,37,32,25,31,62,23,43,24),
c(27,32,43,37,24,35,56,22,43,23),
c(29,38,35,36,22,30,55,24,43,22),
c(26,27,36,32,28,35,58,24,43,27),
c(29,25,41,22,19,33,58,24.5,38,22),
c(32,30,37,28,28,34,56,26,42,25),
c(35,40,55,36,30,35,60,27,41,22),
c(26,30,43,21,21,33,58,18,42,17),
c(35,28,41,33,29,32,60,27,43,24)
)

# We could use the median error. G-A subtracts A from every column of G,
# so column j holds the errors made by group j.
MedErrors=apply(G-A,2,median)
# order() lists the group numbers from best to worst, so the first entry is the best group.
RankMedErrors=order(abs(MedErrors))
RankMedErrors
# Hence group 7 had the best ranking.

# We could also use the mean errors.
MeanErrors=apply(G-A,2,mean)
RankMeanErrors=order(abs(MeanErrors))
RankMeanErrors
# Notice there are some changes. We can see the whole picture of this ranking via

# We could also use the median absolute errors.
MedAbsErrors=apply(abs(G-A),2,median)
RankMedAbsErrors=order(MedAbsErrors)
RankMedAbsErrors

# We could also use the mean absolute errors.
MeanAbsErrors=apply(abs(G-A),2,mean)
RankMeanAbsErrors=order(MeanAbsErrors)
RankMeanAbsErrors

summary(abs(G-A))

# Print the four rankings together for comparison.
RankMedErrors
RankMeanErrors
RankMedAbsErrors
RankMeanAbsErrors

# Question 1: Which ranking do you think is best and why? What happens to ties?
# How could you make an even better ranking system?

# One answer is to control for the individual photographs by measuring each error
# in standard deviations from the mean error for that photo.
# (Important note) Once all confounding factors have been controlled for (and outliers
# have been explained), the errors are often approximately normal. (This is a part of
# the Entropy principle.) Let us control for the photos and see what happens.
# Row-wise (per-photo) mean and standard deviation of the errors.
m=apply((G-A),1,mean)
s=apply((G-A),1,sd)
# Standardize each error within its photo.
SdErrors=((G-A)-m)/s
hist(SdErrors,nclass=7,freq=FALSE)
# Overlay the standard normal density for comparison.
X=seq(-3,3,by=1/(2*3))
points(X,dnorm(X,0,1),col="blue",type="l")

# Question 2: Assuming a normal model, roughly what percent of the errors do we expect to have
# a standardized score greater than 2 in absolute value? How many were there in our data?
2*(1-pnorm(2,0,1))

# Question 3: Assuming a normal model, what standardized score has the property that roughly half
# the absolute values of the errors will be larger than this value? How many of our data points
# satisfy this condition?
# The positive cutoff is the 75th percentile; qnorm(.25,0,1) gives the same value with a minus sign.
qnorm(.75,0,1)

############
# The means (say) of the standardized errors themselves are likely to form a poor ranking system.
# Why, and how should we modify them?
# A possible answer: add the photo mean back, so that RankingErrors equals abs(G-A)/s.
RankingErrors=abs(SdErrors+m/s)
MeanStandErrors=apply(RankingErrors,2,mean)
RankMeanStandErrors=order(MeanStandErrors)
RankMeanStandErrors
summary(RankingErrors)

# Question 4: Compare RankMeanStandErrors and RankMeanAbsErrors. Are they measuring pretty much
# the same thing? Can you explain the differences by looking at the data?

####### We can also compare our rankings visually using a scatter plot.
# Applying order() to an ordering of the groups gives each group's rank (when there are no ties).
plot(
order(RankMeanAbsErrors),order(RankMeanStandErrors),main="Comparing Error Rankings",xlab="Using Mean Absolute Errors",ylab="Using Mean Absolute Rescaled Errors")
# What does it mean to be on the "diagonal"?
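
# A minimal sketch (not part of the original analysis) of how order() and rank() relate,
# using two small hypothetical error vectors named toy and tied. order() lists positions
# from smallest to largest, rank() gives the rank of each position, and order(order(x))
# recovers the ranks when there are no ties. This is why the scatter plot above applies
# order() to the ranking vectors.
toy=c(0.3,0.1,0.2)            # hypothetical mean absolute errors for 3 groups
order(toy)                    # 2 3 1: group 2 is best, then group 3, then group 1
rank(toy)                     # 3 1 2: group 1 is ranked 3rd, group 2 is ranked 1st
order(order(toy))             # 3 1 2: the same as rank(toy)

# With ties, order() quietly favours the earlier position, while rank() can report the tie.
tied=c(2,1,2)                 # hypothetical errors with groups 1 and 3 tied
order(order(tied))            # 2 1 3: the tie is broken by position
rank(tied)                    # 2.5 1.0 2.5: tied groups share the average rank
rank(tied,ties.method="min")  # 2 1 2: tied groups share the best available rank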