Monday, September 15, 2014

PCA / EOF for data with missing values - a comparison of accuracy

Not all Principal Component Analysis (PCA) (also called Empirical Orthogonal Function analysis, EOF) approaches are equal when it comes to dealing with a data field that contain missing values (i.e. "gappy"). The following post compares several methods by assessing the accuracy of the derived PCs to reconstruct the "true" data set, as was similarly conducted by Taylor et al. (2013).

The gappy EOF methods to be compared are:
  1. LSEOF - "Least-Squares Empirical Orthogonal Functions" - The traditional approach, which modifies the covariance matrix used for the EOF decomposition by the number of paired observations, and further scales the projected PCs by these same weightings (see Björnsson and Venegas 1997, von Storch and Zweiers 1999 for details).
  2. RSEOF - "Recursively Subtracted Empirical Orthogonal Functions" - This approach modifies the LSEOF approach by recursively solving for the leading EOF, whose reconstructed field is then subtracted from the original field. This recursive subtraction is done until a given stopping point (i.e. number of EOFs, % remaining variance, etc.) (see Taylor et al. 2013 for details)
  3. DINEOF - "Data Interpolating Empirical Orthogonal Functions" - This approach gradually solves for EOFs by means of an iterative algorothm to fit EOFS to a given number of non-missing value reference points (small percentage of observations) via RMSE minimization (see Beckers and Rixen 2003 for details).
I have introduced both the LSEOF [link] and DINEOF [link] methods in the past, but have never directly compared them for the blog. The purpose of this post is to make this comparison and to also introduce a more general EOF function that is capable of conducting RSEOF. All analyses can be reproduced following installation of the "sinkr" package: https://github.com/marchtaylor/sinkr

The basic problem comes down to the difficulties of decomposing a matrix that is not "positive-definite", i.e. the estimated covariance matrix from a gappy data set. DINEOF entirely avoids this issue by first interpolating the values to create a full data field, while LSEOF and RSEOF rely on decomposing this estimation. A known problem is that the trailing EOFs derived from such a matrix are amplified in their singular values, which can consequently amplify errors in field reconstructions when included. The RSEOF approach thus attempts to remedy these issues by recursively solving for only leading EOFs. In the following examples, I show the performance of the three approaches in terms of reconstructing the data field (including the "true" values).

Example 1 - Synthetic data set:
The first example uses a synthetic data set used by Beckers and Rixen (2003) in their introduction of the DINEOF approach. The accuracy of the reconstruction is dependent on the number of  EOFs used. In a non-gappy example, a perfect reconstruction should be possible using this full set of EOFs - In fact it only takes 9 EOFs when using the non-noisy true field, since it is a composite of 9 signals. In the case of the noisy, gappy data sets, reconstructions with trailing EOFs may increase errors. This can be seen in the figure at the top of the post showing RMSE vs the number of EOFs used in the reconstruction.

The figure shows the DINEOF approach to be the most accurate. The LSEOF approach has a clear RSME minimum with 4 EOFs, while the RSEOF approach was largely able to remedy the amplification of error when using trailing EOFs. The problem of error amplification is even more dramatic when viewed visually, as in the following where the full set of EOFs have been used:


It's clear that the LSEOF approach is only successful in reconstructing the non-gap values of the observed data, while values of gaps are washed out by the error in trailing EOFs. In fact, given the associated amplitude of the noise, there were only about 4 EOFs modes that really carry any information across the entire field. Again, DINEOF does a fine job in precisely estimating the EOF modes, even in cases where modes were quite similar in magnitude (e.g. EOFs 2 & 3). The LSEOF and RSEOF approaches create more of a mixture out of the modes 2 & 3 :


Example 2 - Sea Level Pressure:
In a more natural example - sea level pressure (SLP) is subjected to the three methods as well. This data set is widely available, which is why I use it here, but it technically doesn't contain gaps - I have added 50% gaps to the data field in a random fashion. High resolution daily chlorophyll or sea surface temperature remote sensing data would be a more obvious application for this analysis, but for the purposes of reproduction, I'll stick with SLP. In this case, EOF provides us with both the spatial and temporal patterns of the dominant modes of variability. PCs (i.e. the temporal signal) have been scaled in magnitude by removing the amplitude of the singular values.

There is very little difference in the leading 3 modes (the EOF modes of the "true" field is shown for comparison). The explained variance (%) of each EOF is given in the upper right corner of the spatial mode:



In fact, given the large area included in the analysis, there are many significant modes and differences in the error of the reconstruction only become obvious after about 5 EOFs. In the example by Taylor et al. (2013) for chlorophyll anomalies in the Galapagos archipelago, stark differences were immediately seen due to a predominance of a single EOF mode as influenced by ENSO variability.
For SLP anomalies the error of the reconstruction is not problematic until the higher trailing modes are used:

Interestingly the point of error minimum in the LSEOF approach is with 19 EOFs, which is nearly the same number that is given by a null-model permutation test to determine EOF significance (n = 18 EOFs) (similar to "Rule N" by Preisendorfer 1988):

I guess this final example may be a good illustration why the LSEOF has persisted in so many climate science references - The problems in EOF estimation may only become apparent in data sets where variability is restricted to a small number of modes, and many applications of EOF are typically applied to larger geographic regions.

The DINEOF method appears to be a good choice in the estimation of EOFs in gappy data. One is obviously not going to have a "true" data set by which to gauge the relative performance of approaches, but given the guidelines of selecting reference values, DINEOF should perform well under most cases. The RSEOF method also does a good job in correcting for the issues of LSEOF, and is also much less computationally intensive than DINEOF - In the SLP example, deriving 25 EOFs from the data field (dimensions = 1536 x 451) took 68 sec. with DINEOF vs 14 sec. with RSEOF (~20% of the time).

Again, the functions used for this analysis are available within the sinkr package - Installation of the sinkr package via devtools is outlined in the first lines of the example script below.



References:

Beckers, J.-M., Rixen, M., 2003. EOF Calculations and Data Filling from Incomplete Oceanographic Datasets. Journal of Atmospheric and Oceanic Technology 20.12: 1839-1856.  [link]

Björnsson, H. and Venegas, S.A., 1997. A manual for EOF and SVD analyses of climate data, McGill University, CCGCR Report No. 97-1, Montréal, Québec, 52pp. [link]

Preisendorfer, R. W., Principle Component Analysis in Meteorology and Oceanography, Elsevier Sci., New York, 1988.


von Storch, H, Zwiers, F.W. (1999). Statistical analysis in climate research. Cambridge University Press.



Code to reproduce:

#library(devtools)
#install_github("marchtaylor/sinkr")
 
library(sinkr)
 
###################
### 1st example ###
###################
 
### Make data
#color palette
pal <- colorRampPalette(c("blue", "cyan", "yellow", "red"))
 
#Generate data
m=50
n=100
frac.gaps <- 0.5 # the fraction of data with NaNs
N.S.ratio <- 0.1 # the Noise to Signal ratio for adding noise to data
 
x <- (seq(m)*2*pi)/m
t <- (seq(n)*2*pi)/n
 
#True field
Xt <- 
 outer(sin(x), sin(t)) + 
 outer(sin(2.1*x), sin(2.1*t)) + 
 outer(sin(3.1*x), sin(3.1*t)) +
 outer(tanh(x), cos(t)) + 
 outer(tanh(2*x), cos(2.1*t)) + 
 outer(tanh(4*x), cos(0.1*t)) + 
 outer(tanh(2.4*x), cos(1.1*t)) + 
 tanh(outer(x, t, FUN="+")) + 
 tanh(outer(x, 2*t, FUN="+")
)
 
Xt <- t(Xt)
image(Xt, col=pal(100))
 
 
#Noise field
set.seed(1)
RAND <- matrix(runif(length(Xt), min=-1, max=1), nrow=nrow(Xt), ncol=ncol(Xt))
R <- RAND * N.S.ratio * Xt
 
 
#True field + Noise field
Xp <- Xt + R
image(Xp, col=pal(100))
 
#Observed field with gaps
set.seed(1)
gaps <- sample(seq(length(Xp)), frac.gaps*length(Xp))
Xo <- replace(Xp, gaps, NaN)
image(Xo, col=pal(100))
 
 
### Interpolation, EOF, and Reconstruction ###
#Interpolation with DINEOF
set.seed(1)
din <- dineof(Xo)
Xa <- din$Xa 
image(Xa, col=pal(100))
 
# EOF
Et <- eof(Xp) # true
El <- eof(Xo) # obs, lseof
Er <- eof(Xo, recursive=TRUE) # obs, rseof
Ed <- eof(Xa) # obs, dineof + lseof
 
###Reconstruction
VALS <- which(Xo == 0)
Rt <- eofRecon(Et)
Rl <- eofRecon(El)
Rr <- eofRecon(Er)
Rd <- eofRecon(Ed)
 
###Plot
png(file="eof_interpolation.png", width=7, height=4.5, res=400, units="in", type="cairo")
op <- par(mfrow=c(2,3), mar=c(3,3,2,2), ps=10, bg="white")
image(Xt, col=pal(100))
mtext("True", side=3, line=0.5)
image(Xp, col=pal(100))
mtext("True + noise", side=3, line=0.5)
image(Xo, col=pal(100))
mtext("Observed", side=3, line=0.5)
image(Rl, col=pal(100))
mtext("LSEOF recon", side=3, line=0.5)
image(Rr, col=pal(100))
mtext("RSEOF recon", side=3, line=0.5)
image(Rd, col=pal(100))
mtext("DINEOF recon", side=3, line=0.5)
par(op)
dev.off()
 
 
 
png(file="eofs.png", width=6, height=6.5, res=400, units="in", type="cairo")
op <- par(no.readonly=TRUE)
layout(matrix(c(1:9,10,10,10), nrow=4, ncol=3, byrow=TRUE), widths=c(2,2,2), heights=c(2,2,2,0.5))
par(mar=c(3,3,2,1), ps=10, bg="white")
for(i in seq(9)){
El.sign <- sign(cor(Et$u[,i], El$u[,i]))
Er.sign <- sign(cor(Et$u[,i], Er$u[,i]))
Ed.sign <- sign(cor(Et$u[,i], Ed$u[,i]))
 
YLIM <- range(Et$u[,i], Ed$u[,i]*Ed.sign, El$u[,i]*El.sign, Er$u[,i]*Er.sign)
plot(x, Et$u[,i], ylim=YLIM, xlab="", ylab="", col=8)
lines(x, El$u[,i]*El.sign, col=2, lwd=1, lty=1)
lines(x, Er$u[,i]*Er.sign, col=3, lwd=1, lty=1)
lines(x, Ed$u[,i]*Ed.sign, col=4, lwd=1, lty=1)
mtext(paste("EOF", i), side=3, line=0.5)
}
par(mar=c(0,2,0,1), ps=14)
plot(1, t="n", axes=FALSE, bty="n")
legend("center", ncol=4, legend=c("LSEOF (True)", "LSEOF (Obs.)", "RSEOF (Obs.)",  "DINEOF (Obs.)"), pch=c(1,NA, NA, NA), lty=c(NA, 1,1,1), lwd=1, col=c(8,2:4), bty="n")
par(op)
dev.off()
 
 
 
 
###Error of recon
neof <- seq(15)
rmse.Et <- NaN * neof
rmse.El <- NaN * neof
rmse.Er <- NaN * neof
rmse.Ed <- NaN * neof
refField <- "Xp"
for (i in neof){
 Rt <- eofRecon(Et, pcs=seq(i))
 Rl <- eofRecon(El, pcs=seq(i))
 Rr <- eofRecon(Er, pcs=seq(i))
 Rd <- eofRecon(Ed, pcs=seq(i))
 rmse.Et[i] <- sqrt(mean((get(refField) - Rt)^2, na.rm=TRUE))
 rmse.El[i] <- sqrt(mean((get(refField) - Rl)^2, na.rm=TRUE)) 
 rmse.Er[i] <- sqrt(mean((get(refField) - Rr)^2, na.rm=TRUE))
 rmse.Ed[i] <- sqrt(mean((get(refField) - Rd)^2, na.rm=TRUE))
}
 
ylim <- range(c(rmse.Et, rmse.El, rmse.Er, rmse.Ed))
 
png(file="eof_recon_error.png", width=5, height=5, res=400, units="in", type="cairo")
op <- par(mar=c(5,5,2,1), ps=10)
plot(neof, rmse.Et, t="n", ylim=ylim, xlab="No. EOFs", ylab="RMSE")
grid()
lines(neof, rmse.Et, pch=21, t="o", col=1, bg=c(NA, 1)[(rmse.Et == min(rmse.Et))+1])
lines(neof, rmse.El, pch=21, t="o", col=2, bg=c(NA, 2)[(rmse.El == min(rmse.El))+1])
lines(neof, rmse.Er, pch=21, t="o", col=3, bg=c(NA, 3)[(rmse.Er == min(rmse.Er))+1])
lines(neof, rmse.Ed, pch=21, t="o", col=4, bg=c(NA, 4)[(rmse.Ed == min(rmse.Ed))+1])
legend("topleft", legend=c("LSEOF (obs.)", "RSEOF (obs.)", "DINEOF (obs.)", "Ref. (true+noise)"), col=c(2,3,4,1), lty=1, pch=1, bty="n")
mtext("Error of reconstruction", side=3, line=0.5)
par(op)
dev.off()
 
 
# Null model error comparison
Err <- eofNull(Xp, centered=TRUE, scaled=TRUE, nperm=99)
 
png(file="eof_significance.png", width=5, height=5, res=400, units="in", type="cairo")
op <- par(mar=c(5,5,2,1), ps=10)
ylim <- range(Err$Lambda.orig, Err$Lambda)
plot(Err$Lambda.orig, log="y", pch=16, ylim=ylim, xlab="EOF", ylab="Lambda")
Qs <- apply(Err$Lambda, 2,  quantile, probs=0.95)
lines(Qs,col=2, lty=2, lwd=2) # 95% error quantile
abline(v=Err$n.sig+0.5, lty=2, col=4, lwd=2)
mtext(paste("Significant EOFs =", Err$n.sig), side=3, line=0.5, col=4)
par(op)
dev.off()
 
 
#############################################
### 2nd example - sea Level pressure data ###
#############################################
library(ncdf)
library(akima)
library(maps)
library(sinkr)
 
nc <- open.ncdf("slp.mnmean.nc") # from http://www.esrl.noaa.gov/psd/gcos_wgsp/Gridded/data.hadslp2.html
nc
slp <- get.var.ncdf(nc, "slp")
lon <- get.var.ncdf(nc, "lon")
#lon[which(lon>180)] <- lon[which(lon>180)]-360
lat <- get.var.ncdf(nc, "lat")
ts <- get.var.ncdf(nc, "time")
ts <- as.Date(ts, origin="1800-01-01")
close(nc)
 
 
# re-structure into 2-D matrix
slp <- matrix(c(slp), nrow=dim(slp)[3], ncol=prod(dim(slp)[1:2]), byrow=TRUE)
 
# Make grid id's
grd <- expand.grid(lon=lon, lat=lat)
 
# Trim matrix
grd.incl <- lonLatFilter(grd$lon, grd$lat, 100, 310, -30, 30)
slp <- slp[,grd.incl]
grd <- grd[grd.incl,]
 
# Calc. monthly anomaly
slp.anom <- fieldAnomaly(y=slp, x=as.POSIXlt(ts), level="monthly")
 
# Make gappy dataset
set.seed(1)
slpg.anom <- replace(slp.anom, sample(length(slp.anom), 0.5*length(slp.anom)), NaN)
 
### EOFs (leading 25)
neofs <- 25
t1 <- Sys.time()
Et <- eof(slp.anom, recursive=FALSE, method="irlba", nu=neofs) # LSEOF of true field
Et.time <- Sys.time()-t1
 
t1 <- Sys.time()
El <- eof(slpg.anom, recursive=FALSE, method="irlba", nu=neofs) # LSEOF of gappy field
El.time <- Sys.time()-t1
 
t1 <- Sys.time()
Er <- eof(slpg.anom, recursive=TRUE, method="irlba", nu=neofs) # RSEOF of gappy field
Er.time <- Sys.time()-t1
 
t1 <- Sys.time()
set.seed(1)
din <- dineof(slpg.anom)
print(paste(din$n.eof, "EOFs", ";", "RMS = ", round(tail(din$RMS,1),3))) # "56 EOFs ; RMS =  0.289"
Ed <- eof(din$Xa, recursive=FALSE, method="irlba", nu=neofs) # DINEOF + LSEOF of gappy field
Ed.time <- Sys.time()-t1
 
 
Et.time;El.time;Er.time;Ed.time
#Time difference of 0.7268829 secs
#Time difference of 0.7765241 secs
#Time difference of 14.77254 secs
#Time difference of 1.143369 mins
 
#############
### Plots ###
#############
# Error of reconstruction
neof <- seq(25)
rmse.Et <- NaN * neof
rmse.El <- NaN * neof
rmse.Er <- NaN * neof
rmse.Ed <- NaN * neof
for (i in neof){
  Rt <- eofRecon(Et, pcs=seq(i))
  Rl <- eofRecon(El, pcs=seq(i))
  Rr <- eofRecon(Er, pcs=seq(i))
  Rd <- eofRecon(Ed, pcs=seq(i))
  rmse.Et[i] <- sqrt(mean((slp.anom - Rt)^2, na.rm=TRUE))
  rmse.El[i] <- sqrt(mean((slp.anom - Rl)^2, na.rm=TRUE)) 
  rmse.Er[i] <- sqrt(mean((slp.anom - Rr)^2, na.rm=TRUE))
  rmse.Ed[i] <- sqrt(mean((slp.anom - Rd)^2, na.rm=TRUE))
}
 
ylim <- range(c(rmse.Et, rmse.El, rmse.Er, rmse.Ed))
 
png(file="slp_eof_recon_error.png", width=5, height=5, res=400, units="in", type="cairo")
op <- par(mar=c(5,5,2,1), ps=10)
plot(neof, rmse.Et, log="", t="n", ylim=ylim, xlab="No. EOFs", ylab="RMSE")
grid()
lines(neof, rmse.Et, pch=21, t="o", col=1, bg=c(NA, 1)[(rmse.Et == min(rmse.Et))+1])
lines(neof, rmse.El, pch=21, t="o", col=2, bg=c(NA, 2)[(rmse.El == min(rmse.El))+1])
lines(neof, rmse.Er, pch=21, t="o", col=3, bg=c(NA, 3)[(rmse.Er == min(rmse.Er))+1])
lines(neof, rmse.Ed, pch=21, t="o", col=4, bg=c(NA, 4)[(rmse.Ed == min(rmse.Ed))+1])
legend("topright", legend=c("LSEOF (obs.)", "RSEOF (obs.)", "DINEOF (obs.)", "Ref. (true)"), col=c(2,3,4,1), lty=1, pch=1, bty="n")
mtext("Error of reconstruction", side=3, line=0.5)
par(op)
dev.off()
 
 
# Null model error comparison
Err <- eofNull(slp.anom, method="irlba", nu=neofs, nperm=10)
 
png(file="slp_eof_significance.png", width=5, height=5, res=400, units="in", type="cairo")
op <- par(mar=c(5,5,2,1), ps=10)
ylim <- range(Err$Lambda.orig, Err$Lambda)
plot(Err$Lambda.orig, log="y", pch=16, ylim=ylim, xlab="EOF", ylab="Lambda")
Qs <- apply(Err$Lambda, 2,  quantile, probs=0.95)
lines(Qs,col=2, lty=2, lwd=2) # 95% error quantile
abline(v=Err$n.sig+0.5, lty=2, col=4, lwd=2)
mtext(paste("Significant EOFs =", Err$n.sig), side=3, line=0.5, col=4)
par(op)
dev.off()
 
eof
# EOF maps
DAT <- c("Et", "El", "Er", "Ed")
TITLE1 <- c(
 "True SLP",
 "Gappy SLP", 
 "Gappy SLP",
 "Gappy SLP"
)
TITLE2 <- c(
 "(LSEOF)",
 "(LSEOF)",
 "(RSEOF)",
 "(DINEOF)"
)
 
Qs <- c()
for(k in seq(DAT)){
 DAT.tmp <- get(DAT[k])
 Asc <- DAT.tmp$A[,1:3] %*% expmat(diag(DAT.tmp$Lambda[1:3]), -0.5)
 Qs <- c(Qs, quantile(Asc, probs=c(0.01,0.99), na.rm=TRUE))
 rm(DAT.tmp)
}
YLIM <- range(Qs)
 
ZRAN <- range(c(get(DAT[1])$u[,1:3], get(DAT[2])$u[,1:3], get(DAT[3])$u[,1:3], get(DAT[4])$u[,1:3]))
ZLIM <- c(-max(abs(ZRAN)), max(abs(ZRAN)))
reso <- 100
 
PAL <- colorRampPalette(c(rgb(1,0.3,0.3), rgb(0.9,0.9,0.9), rgb(0.3,0.3,1)))
NCOL <- 15
 
lonlat.rat <- earthDist(min(grd$lon), min(grd$lat), max(grd$lon), min(grd$lat)) / earthDist(min(grd$lon), min(grd$lat), min(grd$lon), max(grd$lat))
map.width <- 2
map.height <- map.width / lonlat.rat 
ts.height <- 0.7
scale.height <- 1
 
RES <- 400
PS <- 10
CEX <- 1
OMI=c(0.3, 0.7, 0.3, 0.2)
WIDTHS = rep(map.width,3) 
HEIGHTS <- c(rep(c(map.height, ts.height), 4), scale.height)
L.LWD <- 0.5
WIDTH = sum(WIDTHS) + sum(OMI[c(2,4)])
HEIGHT = sum(HEIGHTS) + sum(OMI[c(1,3)])
WIDTH ; HEIGHT 
 
OPPSIGN <- c(7,8,9, 13,15, 20)
 
png(filename="slp_top3_eof_maps.png", width=WIDTH, height=HEIGHT, res=RES, units="in", type="cairo")
#pdf("eofx3_maps.pdf", width=WIDTH, height=HEIGHT)
#x11(width=WIDTH, height=HEIGHT)
op <- par(omi=OMI, ps=PS)
layout(matrix(c(1:24, 25,25,25), nrow=9, ncol=3, byrow=TRUE), widths = WIDTHS, heights = HEIGHTS, respect=TRUE)
 
par(cex=1)
plot.num <- 0
for(k in seq(DAT)){
 #k=1
 DAT.tmp <- get(DAT[k])
 #DAT2.tmp <- get(DAT2[k])
 TITLE1.tmp <- TITLE1[k]
 TITLE2.tmp <- TITLE2[k]
 
  # Spatial patterns
 par(mar=c(0,0.25,0.25,0))
 for(i in seq(3)){
  #i=1
  plot.num <- plot.num + 1
  u <- DAT.tmp$u[,i]
  if(plot.num %in% OPPSIGN){
   u <- u * -1
  }
  F <- interp(x=grd$lon, y=grd$lat, z=u, 
              xo=seq(min(grd$lon), max(grd$lon), length=reso),
              yo=seq(min(grd$lat), max(grd$lat), length=reso)
  )
  image(F, col=PAL(NCOL), zlim=ZLIM, xaxs="i", yaxs="i", xlab="", ylab="", axes=FALSE)
  map("world2", add=TRUE, col=1, lwd=0.5)
  USR <- par()$usr
  EXPLVAR <- DAT.tmp$Lambda / DAT.tmp$tot.var * 100 
    text(x=USR[1]+0.9*(USR[2]-USR[1]), y=USR[3]+0.9*(USR[4]-USR[3]), labels=paste(round(EXPLVAR[i]), "%", sep=""), font=2)
  box()
  if(k ==1){
   mtext(paste("EOF", i), side=3, line=0.5, font=2)
  }
  if(i == 1){
   mtext(TITLE1.tmp, side=2, line=2)
   mtext(TITLE2.tmp, side=2, line=1)
  }
 }
 
  # Temporal patterns
 par(mar=c(0,0.25,0,0))
 DAT.tmp$Asc <- DAT.tmp$A[,1:3] %*% expmat(diag(DAT.tmp$Lambda[1:3]), -0.5)
 for(i in seq(3)){
  #i=1
  TS <- as.POSIXct(ts)
  TS.SEQ <- as.POSIXct(seq.Date(as.Date("1800-01-01"), as.Date("2100-01-01"), by="10 years"))
  TS.SEQ2 <- as.POSIXct(seq.Date(as.Date("1800-01-01"), as.Date("2100-01-01"), by="30 years"))
  plot.num <- plot.num + 1
    Asc <- DAT.tmp$Asc[,i]
  if(plot.num %in% (OPPSIGN+3)){
    Asc <- Asc * -1
  }
  plot(TS, Asc, t="n", lwd=L.LWD, xaxs="i", xlab="", ylab="", xaxt="n", yaxt="n", ylim=YLIM)
  USR <- par()$usr
    rect(USR[1], USR[3], USR[2], USR[4], col=rgb(0.7,0.7,0.7))
    lines(TS, Asc, lwd=L.LWD)
    df.tmp <- data.frame(sig=Asc, year=as.POSIXlt(TS)$year+1900)
    agg <- aggregate(sig ~ year, data=df.tmp, FUN=mean)
  lines(as.POSIXct(paste(agg$year, "07-01", sep="-")), agg$sig, col=7, lwd=L.LWD*2)
  #lines(smooth.spline(x=TS, y=Asc, spar=0.3), col=7, lwd=L.LWD*2)
  abline(h=0, lty=3, col=rgb(0.5,0.5,0.5), lwd=L.LWD)
  abline(v=TS.SEQ, lty=3, col=rgb(0.5,0.5,0.5), lwd=L.LWD)
  if(k == length(DAT)){
   par(tcl=-0.25)
   axis.POSIXct(1, at=TS.SEQ, labels=NA, las=2)
   par(tcl=-0.5)
   axis.POSIXct(1, at=TS.SEQ2, las=2)
  }
  if(i == 1) axis(2, at=c(-2,0,2))
 }
 rm(DAT.tmp)
 rm(TITLE1.tmp)
 rm(TITLE2.tmp)
}
 
#plot scale
par(mar=c(1,3,3,3))
plot.num <- plot.num + 1
imageScale(1, ZLIM, col=PAL(NCOL), axis.pos=1)
box()
 
par(op)
 
dev.off()
Created by Pretty R at inside-R.org

4 comments:

  1. Hi, for some reason I get an inverse EOF1 for the three gappy SLPs compared to that of the true SLP (i.e. positive values on the left hand side, and negative on the right hand side). A similar situation for EOF 2 in the cases of the LSEOF and DINEOF. And again for EOF3 for LSEOF and RSEOF. I think this might be a bug on the code but I have been unable to find it. I say that this might be a bug because I have copied-pasted your code into my R session from this page and your post on R-bloggers and I get the results I mention above both of the times. Any advice?

    ReplyDelete
    Replies
    1. Hello - thanks for your comment.
      This is not a bug. Actually, the signs of the EOF patterns are arbitrary and, depending on your run / OS, you may get different signs than those presented in the example. The key is that the signs of the EOFs (u) and their coefficients (A) operate together to represent the original data (i.e. as in reconstruction). You will see in the code, that I flipped some of the signs around for the purpose of plotting/comparison by using the OPPSIGN vector - i.e. the signs of those maps / time series were flipped in the plot.
      Hope that helps

      Delete
  2. This makes sense. Many thanks for your very prompt response.

    ReplyDelete
  3. With {sinkr}, the line

    slp.anom <- fieldAnomaly(y=slp, x=as.POSIXlt(ts), level="monthly")

    should be now

    slp.anom <- fieldAnomaly(y=slp, x=as.POSIXlt(ts), level="monthly")


    Also, {ncdf} is deprecated. With {ncdf4}, "open.ncdf" becomes "nc_open", "get.var.ncdf", "ncvar_get" and "close.ncdf", "nc_close".

    ReplyDelete