Friday, November 11, 2011

Propagation of error

     At the onset, this was strictly an excercise of my own curiosity and I didn't imagine writing this down in any form at all. As someone who has done some modelling work in the past, I'm embarrassed to say that I had never fully grasped how one can gauge the error of a model output without having to do some sort of Monte Carlo simulation whereby the model parameters are repeatedly randomized within a given confidence interval. Its relatively easy to imagine that a model containing many parameters, each with an associated error, will tend to propagate these errors throughout. Without getting to far over my head here, I will just say that there are defined methods for calculating the error of a variable if one knows the underlying error of the functions that define them (and I have tried out only a very simple one here!).
     In the example below, I have three main variables (x, y, and z) and two functions that define the relationships y~x and z~y. The question is, given these functions, what would be the error of a predicted z value given an initial x value? The most general rule seems to be:
     error(z~x)^2 = error(y~x)^2 + error(z~y)^2
However, correlated errors require additional terms (see Wikipedia: Propagation of uncertainty). The following example does just that by simulating correlated error terms using the MASS package's function mvrnorm().

example:

library(MASS) #package required for generating correlated random numbers
 
set.seed(1111) #set the seed of the random number generator
n=10000 #number of values to generate for the variable x
e1.sd <- 3 #the standard deviation of the error term in the relationship y~a+b*x+e1
e2.sd <- 10 #the standard deviation of the error term in the relationship z~c+d*y+e2
e.cor <- matrix(c(1,0.3,0.3,1),2) #the correlation matrix of the two error terms (i.e. in this example they are correlated at r=0.3)
e.cov <- e.cor*as.matrix(c(e1.sd,e2.sd))%*%t(as.matrix(c(e1.sd,e2.sd))) # the covariance matrix of the error terms
e <- mvrnorm(n,c(0,0),e.cov) #generate the error terms givin the defined correlation
e1 <- e[,1] #error term #1
e2 <- e[,2] #error term #2
#cor(e) #shows that we are very close to what we input into the mvrnorm function
#cov(e)
 
x <- runif(n, min=0, max=20) #randomly generated x values
#the first model defining y~x
a <- 10
b <- 2
y <- a + b*x + e1
 
#the second model defining z~y
c <- 20
d <- 12
z <- c + d*y + e2
 
#implied relationship defining z~x
z.plus.e <- c + d*a + d*b*x + d*e1 + e2 #The two error terms are "d*e1" and "e2"
 
#relationship without error (i.e. the z-values that would be calulated with the linear models) 
z.minus.e <- c + d*a + d*b*x
 
#Comparison of the derived z values as calculated by the models versus the actual z values
fit.z <- lm(z ~ z.minus.e - 1)
summary(fit.z)
 
plot(z ~ z.minus.e, pch=".", cex=4, col=rgb(0,0,1,0.2))
abline(0,1,col=rgb(0.5,0.5,0.5,0.5), lwd=10)
abline(fit.z,col=2, lwd=2)
 
#Comparison of modelled z error versus that calculated from the two introduced error terms
sd(resid(fit.z))
sqrt(sd(d*e1)^2 + sd(e2)^2 + 2*cov(d*e1,e2))
Created by Pretty R at inside-R.org

3 comments:

  1. Andrej-Nikolai SpiessNovember 12, 2011 at 5:52 PM

    Have you tried the 'propagate' function in my qpcR package. It does Monte Carlo (also using mvrnorm and covariance structire), classical first-order Taylor expansion and permutation with ties. If you try it, please make some comments on how to improve it.

    Greets,
    Andrej

    ReplyDelete
  2. Hi Marc. This is not a comment on your post. However, I have a question regarding how to generate correlated outcomes between groups. If I have 3 groups (Control, Treatment1 and Treatment2) and each has an outcome (Y), but the outcomes between groups are correlated by .80. I was wondering how you generate these data. Thanks!

    ReplyDelete
    Replies
    1. Hi - you would need to have a matrix (e.g. like e.cor), where the diagonal is 1 and all other values are 0.8. Since you don't mention covariance, then this correlation matrix could go directly into the mvrnorm function:
      library(MASS)
      n <- 10000
      e.cor <- matrix(0.8, 3, 3)
      diag(e.cor) <- 1
      e <- mvrnorm(n, c(0,0,0), e.cor)
      cor(e) # see results

      Delete