Friday, November 8, 2013

Working with hdf files in R - Example: Pathfinder SST data



Following  a question that I posted on stackoverflow.com, I recieved the great advice to use the Bioconductor rhdf5 package to work with HDF5 files. The package is not located on CRAN, but can be sourced from the Bioconductor website:

source("http://bioconductor.org/biocLite.R")
biocLite("rhdf5")
Created by Pretty R at inside-R.org

As an example, I use the package to extract Pathfinder sea surface temperature (SST) data, available in netCDF-4 format (the features of netCDF-4 are a subset of the features of HDF5). This type of file is not readable by the netCDF package ncdf.The result is the above plot of a subarea from one of the daily data sets.

To reproduce the figure, you will need the image.scale and val2col functions found on this blog.

To reproduce example:

###required packages
library("rhdf5")
library(maps)
 
###required functions
source("image.scale.R") #http://menugget.blogspot.de/2011/08/adding-scale-to-image-plot.html
source("val2col.R") # http://menugget.blogspot.de/2011/09/converting-values-to-color-levels.html
 
 
###load data
#AVHRR Pathfinder Version 5.2 data (link: "http://www.nodc.noaa.gov/SatelliteData/pathfinder4km/")
fname <- "20120103133752-NODC-L3C_GHRSST-SSTskin-AVHRR_Pathfinder-PFV5.2_NOAA19_G_2012003_day-v02.0-fv01.0.nc" #(from Pathfinder ftp: "ftp://ftp.nodc.noaa.gov/pub/data.nodc/pathfinder/Version5.2/2012/")
 
#List the content of the HDF5 file.
tmp <- h5ls(fname)
tmp
 
lon <- h5read(fname, "lon")
lat <- h5read(fname, "lat")
sst <- h5read(fname, "sea_surface_temperature")
dim(sst)
 
#crop to desired area
lonRan <- c(-140, -60)
latRan <- c(20, 50)
lonKeep <- which(lon > lonRan[1] & lon < lonRan[2])
latKeep <- which(lat> latRan[1] & lat< latRan[2])
 
sst2 <- sst[lonKeep, latKeep, 1]
range(sst2)
sst2 <- replace(sst2, sst2 == -32768, NaN)
range(sst2, na.rm=TRUE)
sst2 <- (sst2 + 273.15) * 0.01 # change from deg Kelvin to deg Celsius and scale by 0.01 (p.45 http://data.nodc.noaa.gov/pathfinder/Version5.2/GDS_TechSpecs_v2.0.pdf)
range(sst2, na.rm=TRUE)
 
lon2 <- lon[lonKeep] # subset of lon values
lat2 <- rev(lat[latKeep]) # subset of lat values , and reverse for increasing values
sst2 <- sst2[,rev(seq(latKeep))] # reverse column order (lat) for increasing values
 
 
###plot
#figure template from "http://menugget.blogspot.de/2013/01/my-template-for-controlling-publication.html"
#Layout of plots
#1 1 3
#1 2 3
LO <- matrix(c(1,2), nrow=1, ncol=2, byrow=TRUE)
LO #double check layout
 
#Resolution, pointsize
RESO <- 400
PS <- 10
 
#Overall units in inches
WIDTHS <- c(5,1) #widths of each figure in layout (i.e. column widths)
HEIGHTS <- c(3.5)  #heights of each figure in layout (i.e. row heights)
 
#Outer margins and calculation of full dimensions
OMA <- c(0,0,1,0)  #Outer margins c(bottom, left, top, right)
HEIGHT <- sum(HEIGHTS) + OMA[1]*PS*1/72 + OMA[3]*PS*1/72
WIDTH <- sum(WIDTHS) + OMA[2]*PS*1/72 + OMA[4]*PS*1/72
 
#Double check full dimensions
WIDTH; HEIGHT 
 
 
#Start plot; open device - run from x11() down to observe behavior; run from pdf() down to produce .pdf
png("sst_usa.png", width=WIDTH, height=HEIGHT, units="in", res=RESO)
#pdf("sst_usa.pdf", width=WIDTH, height=HEIGHT)
#x11(width=WIDTH, height=HEIGHT)
 
par(oma=OMA, ps=PS) #settings before layout
layout(LO, heights=HEIGHTS, widths=WIDTHS)
#layout.show(max(LO)) # run to see layout; comment out to prevent plotting during .pdf
par(cex=1) # layout has the tendency change par()$cex, so this step is important for control
 
#plot 1
par(mar=c(2,2,1,1))
pal <- colorRampPalette(c("blue", "cyan", "yellow", "red"))
image(lon2, lat2, sst2, col=pal(100), xlab="", ylab="")
map("world", add=TRUE, fill=TRUE, col=8)
mtext("Pathfinder SST (2012-01-03)", side=3, line=0.5, font=2)
box()
 
#plot 2
par(mar=c(2,0,1,4))
image.scale(sst2, col=pal(100), xlab="", ylab="", axes=FALSE, horiz=FALSE)
axis(4)
mtext("deg [C°]", side=4, line=2.5)
box()
 
dev.off() # closes device
Created by Pretty R at inside-R.org


6 comments:

  1. Thanks to this post I've discovered HDF5 and it just transformed one of my pieces of research from impossible to possible.

    THANKS!!!

    ReplyDelete
  2. I seem to have a problem reproducing even the example. At the first hdf function call, the response is that 'HDF5: unable to open file
    Error in h5checktypeOrOpenLoc(file, readonly = TRUE) :
    Error in h5checktypeOrOpenLoc(). File '~/workspace/r_scratch//ncdf/data/20120103133752-NODC-L3C_GHRSST-SSTskin-AVHRR_Pathfinder-PFV5.2_NOAA19_G_2012003_day-v02.0-fv01.0.nc' is not a valid HDF5 file.'

    Any suggestion of where I should start looking for a solution?

    I'm using R 3.0.2 on Ubuntu 12.4.3; all packages are up to date.

    Thanks!

    ReplyDelete
    Replies
    1. Is it possible that the double backslash in your file path is the cause of the problem?

      Delete
  3. Thanks for your reply. I’ve seen the double backslash too late – mistake in copy/pasting the error message, not in my script. The problem was not that the file was not found, but that the file is not valid HDF5 file; and that problem remains.

    ReplyDelete
    Replies
    1. Hi Edward - Sorry I can't be of help. I had no trouble opening these netCDF-4 files with the package. Please post a comment if you figure out what the problem was.

      Delete
  4. Thanks for the post! I read an HDF5 data and try to convert that into raster. After conversion, I add raster extent using xmin(r), xmax(r), ymin(x), ymin(r) When I check the dimensions they are correct but the cell size in not. it shows that the resolution for x and y is different. Could you please show how to read the full data set without clipping and convert that into raster? Thanks!

    ReplyDelete