# Simulation day! Let's all set our own "seed" so that we get some
# different answers! We could use the system time (accuracy = seconds),
# or you could use your favorite 9-digit number.
Time=as.numeric(Sys.time())
set.seed(Time,kind="Marsaglia-Multicarry")
runif(1)

# 1. Exploring the normal. Here is a simulation of normal data.
mu=0
sigma=1
Trials=1000
Rn=rnorm(Trials,mu,sigma)

# Let's look at the data produced by our simulation.
Rn[1:min(10,Trials)]
hist(Rn,main=paste(Trials,"randomly chosen normal numbers"),xlab="Values",freq=FALSE)

# We can add information to a plot with "points". Here X is a sequence of
# points that are within 3.5 standard deviations of the mean and Y is the
# normal density evaluated at these points.
X=seq(mu-3.5*sigma,mu+3.5*sigma,by=(7*sigma)/100)
Y=dnorm(X,mu,sigma)
points(X,Y,col="blue",type="l")

# Another way to look at this data is to sort it and plot the percentiles.
Sn=sort(Rn)
Sn[1:min(10,Trials)]
Per=100*(1:Trials)/Trials
plot(Sn,Per,main=paste(Trials,"randomly chosen normal numbers"),xlab="Values",ylab="Percentile",type="l")

#################
# We can use the normal to simulate and explore associated data.
N=50
sX=rnorm(N,0,1)

# Recall that if X and Y are correlated then, in standard units,
# SD(Y)=sqrt(SD(rX)^2+SD(Res)^2), with SD(rX)=abs(r) and SD(Res)=sqrt(1-r^2).
# The residuals play the role of an error that is independent of X.
# Let us imagine this error is also normal and form a standardized
# Y variable as follows:
rho=.85
Error=rnorm(N,0,1)
sY=rho*sX+sqrt(1-rho^2)*Error
plot(sX,sY,main=paste("A Picture of Correlated Variables, rho =",rho),xlab="Standardized X",ylab="Standardized Y")
points(c(min(sX),max(sX)),c(rho*min(sX),rho*max(sX)),type="l",col="green")

# We can now fake, say, the weights in pounds of male children as compared
# with their fathers:
X=30*sX+165
Y=30*sY+165
plot(X,Y,main="Faked Weight",xlab="Father's Weight in pounds",ylab="Male Offspring's Weight in pounds")

###############
# Notice sY has mean 0 and standard deviation 1 and is a sum of normals.
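# As a quick check (a small addition to the script above): sY was built to
# be standardized with correlation rho to sX, so its sample mean, SD, and
# correlation with sX should come out near 0, 1, and rho = .85. They will
# only be approximate here, since N = 50 is small and the draws are random.
mean(sY)
sd(sY)
cor(sX,sY)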
# In chapter 16, you learned that the sum of normals is again normal.
# Let us explore this here by comparing Y to the appropriate normal curve.
hist(Y,main=paste(N,"offspring weights, a sum of normals"),xlab="Weights",freq=FALSE)
mu=165
sigma=30
X1=seq(mu-3.5*sigma,mu+3.5*sigma,by=(7*sigma)/100)
Y1=dnorm(X1,mu,sigma)
points(X1,Y1,col="blue",type="l")

###############
# Imagine we are given the above X and Y data, but are not told how they
# were produced. What would we do?...
r=cor(X,Y)
r
Lin=lm(Y~X)
Lin

# We can still plot our data. Since we know the true relationship between
# the variables, we can compare the "true" line of best fit (in green)
# with the regression line (in red). If they are different, why?
plot(X,Y,main=paste("A Picture of Correlated Data, r =",round(r,4)),xlab="X Data",ylab="Y Data")
points(c(min(X),max(X)),c(Lin[[1]][2]*min(X)+Lin[[1]][1],Lin[[1]][2]*max(X)+Lin[[1]][1]),type="l",col="red")
points(c(min(X),max(X)),c(rho*min(X)+(mu-rho*mu),rho*max(X)+(mu-rho*mu)),type="l",col="green")

###########
# Let us look at the residuals. We expect them to look like our error.
# In particular, they should be independent of X.
Res=Y-(Lin[[1]][2]*X+Lin[[1]][1])
plot(X,Res,main="A Graph of the Residuals in predicting Y from X")
points(c(min(X),max(X)),c(0,0),col="blue",type="l")

# Let us look at the residuals of the standardized data. If we drew a
# histogram, what normal should we compare it to?
zX=(X-mean(X))/sd(X)
zY=(Y-mean(Y))/sd(Y)
Linz=lm(zY~zX)
Resz=zY-(Linz[[1]][2]*zX+Linz[[1]][1])
hist(Resz,main="The Residuals",xlab="Standardized Residuals",freq=FALSE)
sigma=sqrt(1-r^2)
X1=seq(-3.5*sigma,3.5*sigma,by=(7*sigma)/100)
Y1=dnorm(X1,0,sigma)
points(X1,Y1,col="blue",type="l")

#########################
# 2. Someone offers you 25 dollars to let them play the following game
# against you: "Flip a fair coin until a heads shows up. If this takes N
# flips then you pay me 2^N dollars." Is this a reasonable offer?
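# Before simulating, it is worth computing the expected payout by hand.
# Since P(N = n) = (1/2)^n and the payout is 2^n dollars, every term of
# E[2^N] = sum over n of 2^n*(1/2)^n contributes exactly 1 dollar, so the
# expectation diverges. (This is the classic St. Petersburg game.)
# A partial sum over the first 30 possible flip counts illustrates this:
n=1:30
sum(2^n*(1/2)^n)
# The partial sum equals 30; each additional possible flip adds another dollar.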
# Here is a simulation of a geometric random variable: the number of
# flips until the first success.
Trials=10^4
N=Trials
SuccessRate=.5
p=SuccessRate
q=1-p
X=rnbinom(N,1,p)+1   # rnbinom counts failures before the first success, so add 1 to count flips
T=table(X)/N
T

# Is this really a geometric random variable?
V=as.numeric(names(T))   # the observed flip counts (the table may skip values in the tail)
plot(V,as.vector(T),cex=2)
points(V,p*q^(V-1),col="red")

# Here are the results of the game.
G=2^X
summary(G)
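# To judge the 25-dollar offer, compare it with the average simulated
# payout per game. Because the theoretical expectation is infinite, the
# sample mean is unstable: most games pay only a few dollars, but rare
# long runs of tails produce enormous payouts, so mean(G) tends to grow
# as Trials increases and varies a lot from run to run.
mean(G)          # average payout per game in this simulation
max(G)           # the rare huge payouts that drive the average up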