Monday, April 2, 2012

Working with Globcolour data



[UPDATE: new example can be found here. Functions have also been added to the sinkr package.]

The Globcolour project (http://www.globcolour.info/) provides relatively easy access to ocean color remote sensing data. Data is provided at http://hermes.acri.fr/ and the following parameters are available:
· Chlorophyll-a (CHL1 and CHL2)
· Fully normalised water leaving radiances at 412, 443, 490, 510, 531, 550-565, 620, 665-
670, 681 and 709 nm (Lxxx)
· Coloured dissolved and detrital organic materials absorption coefficient (CDM)
· Diffuse attenuation coefficient (Kd(490))
· Particulate back-scattering coefficient (bbp)
· Total Suspended Matter (TSM)
· Relative excess of radiance at 555 nm (EL555)
· Photosynthetic Available Radiation (PAR)
· Heated layer depth (ZHL)
· Secchi disk depth (ZSD)
· Primary production (PP)
· Aerosol optical thickness over water (T865)
· Cloud Fraction (CF)

Of particular interest to ecologists are the estimates of Chlorophyll a (chla) , which combines data from several satellites for better coverage - SeaWiFS (NASA), MODIS (NASA), MERIS (ESA).  Data is available at several temporal (daily, 8-days, and monthly averages) and spatial (4.63 km, 0.25°, and 1°) resolutions for the global domain. Several merged products are available: simple averaging (AV), weighted averaging (AVW), and Garver, Siegel, Maritorena Model (GSM) [for more information see the Product User Guide].

Due to the gappy nature of the data (e.g. due to land and clouds), many of the data products only provide values at grids where estimation was possible. For high resolution data, such as in 4.63 km resolution daily estimates, grids with values are often far fewer than the total number of ISIN grids (n=23,761,676) used by the product. This saves space in the files for download, but you may need to reconstruct the field (with NaN's included for grids without observations) for some analyses.

The following example shows how to retrieve Globcolour data and process it using R. Global data is available, but I have provided instructions for processing a smaller area of 4.63 km resolution chla data from the Galapagos archipelago. One can define lat/lon limits for the desired area on the http://hermes.acri.fr/ interface. An ftp address will be sent via Email as to the location of the data when finished.

Several steps are presented in the example below:
   In UNIX-type console:
1. Download and unzip .nc files (I use the "wget" command within a cygwin console on my PC)
  In R:
2. Open successive .nc files and extract the chla estimates with corresponding ISIN grid column and row identifiers. Due to the varying size of each file, the data are stored in lists of undefined size.
3. Unlist the variables into vectors and create a data.frame
4. Make a sparse matrix with available data
5. (optional) Convert to a matrix object for some analyses
6. Obtain ISIN grid data using function "isin.convert" found below (grid identification, lat/lon positions, polygons for mapping)
7. Calculate statistics and plot

I have had some issues with projection-type plots for small areas due to some troubles of the mapproject function in using small lat/lon limits. Therefore, the example just creates a simple plot using Cartesian coordinates (although the dimensions of the plot have been set to approximate the geographic distances of the area. You will need several other functions found within this blog to reproduce the example (val2colcolor.palette, image.scale, earth.dist, map.frame.R). Several packages are also required (ncdf, Matrix, maps, mapdata, PBSmapping).


the isin.grid function:
#This function is used to convert information regarding the
#ISIN grid information used by Globcolour as well as to construct
#associated polygons for use in mapping.
#The raw Globcolour .nc files come with column and row pointers
#as to the the grid's location. For 4.63 km resolution data, this
#translates to 4320 latitudinal rows with varying number of 
#associated longitudinal columns depending on the latitudinal
#circumference. 
#Input must be either a vector of grid numbers ["grd"] or a dataframe
#with column and row identifiers ["coord", e.g. columns in coord$col and
#rows in coord$row]
#When the argument "polygon=FALSE" (Default), the function will output
#a dataframe object containing grid information (gridnumber["output$grd"],
#column["output$col"], row["output$row"], longitude["output$lon"], and 
#latitude[output$lat])
#If the argument "polygon=TRUE", then the putput will be a list with
#polygon shapes in a dataframe(longitudinal coordinates of corners
#["[[i]]$x"], latitudinal coordinates of corners ["[[i]]$y"])
 
isin.convert <- function(grd=NULL, coord=NULL, polygons=FALSE){
 
###ISIN grid definition
 Re=6378.145
 #NUMROWS
 Nlat=4320
 
 dr=(pi*Re)/Nlat  #lat bin width
 dPHI=pi/Nlat
 
 PHI = -(pi/2)+(1:Nlat)*dPHI-(dPHI/2)
 
 p=2*pi*Re*cos(PHI)
 Nlon=round(p/dr, 0)
 dlon=p/Nlon
 dphi=(2*pi)/Nlon
 Ntot=sum(Nlon)
 lat=PHI*180/pi
 
 if(!missing(grd)){
  #calc. coord
  cumNlon <- c(0,cumsum(Nlon))
  g.row <- sapply(grd, fun <- function(x){max(which(cumNlon < x))})
  g.col <- grd - cumNlon[g.row]
  #calc. lonlat
  g.lat = lat[g.row]
  Nlon_row = Nlon[g.row]
  g.lon <- (360*(g.col-0.5)/Nlon_row) - 180
 }
 
 if(!missing(coord)){
  #calc. coord
  g.row <- coord$row
  g.col <- coord$col
  #calc. lonlat
  g.lat = lat[g.row]
  Nlon_row = Nlon[g.row]
  g.lon <- (360*(g.col-0.5)/Nlon_row) - 180
  #calc. grd
  cumNlon <- c(0,cumsum(Nlon))
  grd <- cumNlon[g.row] + g.col
 }
 
 if(polygons){
  Nlon_row = Nlon[g.row]
  g.width <- 360/Nlon_row
  g.height <- 180/Nlat
  tmp <- data.frame(g.lon=g.lon, g.lat=g.lat, g.width=g.width, g.height=g.height)
  polys <- vector(mode="list", length(grd))
  for(i in seq(polys)){
   xs <- c(tmp$g.lon[i]-tmp$g.width[i]/2, tmp$g.lon[i]-tmp$g.width[i]/2, tmp$g.lon[i]+tmp$g.width[i]/2, tmp$g.lon[i]+tmp$g.width[i]/2)
   ys <- c(tmp$g.lat[i]-tmp$g.height[i]/2, tmp$g.lat[i]+tmp$g.height[i]/2, tmp$g.lat[i]+tmp$g.height[i]/2, tmp$g.lat[i]-tmp$g.height[i]/2)
   polys[[i]] <- data.frame(x=xs, y=ys)
  }
  return(polys)
 }
 
 if(!polygons){
  data.frame(grd=grd, col=g.col, row=g.row, lon=g.lon, lat=g.lat) 
 }
 
}
Created by Pretty R at inside-R.org



to reproduce the example above:
###required libraries
library(ncdf)
library(Matrix)
library(maps)
library(mapdata)
library(PBSmapping)
 
#required functions (from "www.menugget.blogspot.com")
source("val2col.R")
source("isin.convert.R")
source("color.palette.R")
source("image.scale.R")
source("earth.dist.R")
source("map.frame.R")
 
 
###to get ordered products and unzip them
###get all 2003 average weighted value (AWV) files 
#wget -r -nd --ftp-user=hermes --ftp-password=hermes% 'ftp://hermes.acri.fr/922733645/L3b_2003*AWV*gz' 
###unzip .nc files
#gunzip *.gz
 
#define path of .nc files
#may need to adjust slash orientation depending on UNIX or Windows
#raw_path <- "C:/cygwin/home/ME/"
raw_path <- "C:\\cygwin\\home\\ME\\globcolour\\galapagos\\" 
 
#Define years to include
YEAR_min=1997
YEAR_max=2012
 
#Import data
grd <- list()
chl <- list()
date <- list()
iter=0
for(i in seq(YEAR_max-YEAR_min+1)){
 #tmp = system(paste("ls ", raw_path), intern=TRUE) #for UNIX
 tmp = unlist(strsplit(shell(paste("dir", raw_path), intern=TRUE), " ")) #for Windows
 
 YEAR <- c(YEAR_min:YEAR_max)[i]
 tmp <- tmp[grep(".nc", tmp, fixed=TRUE)]
 tmp <- tmp[grep(paste("L3b_", YEAR, sep=""), tmp)]
 #additional filters if needed
 #tmp <- tmp[grep("GSM", tmp)] #only GSM products
 #tmp <- tmp[grep("MO", tmp)] #only monthly products
 
 list_obs = data.frame(file_name=tmp, date=format(strptime(substr(tmp, 5, 12), "%Y%m%d")))
 for (j in 1:length(list_obs[,1])){  #loop_2: for each date j
  iter <- iter+1
  nc <- open.ncdf(paste(raw_path,as.character(list_obs$file_name[j]),sep=""))   # opens Globcolour data file and 
  idxrow = get.var.ncdf(nc, "row")+1
  idxcol = get.var.ncdf(nc, "col")+1
  val = get.var.ncdf(nc, "CHL1_mean")
  close.ncdf(nc)
  rm("nc")
  date[[iter]] <- as.character(rep(list_obs$date[j], length(val)))
  grd[[iter]] <- isin.convert(coord=data.frame(col=idxcol, row=idxrow))$grd
  chl[[iter]] <- val
  print(as.character(list_obs[j,2]))
 }
}
 
#unite data into dataframe
DB <- data.frame(
date=unlist(date),
grd=unlist(grd), 
chl=unlist(chl)
)
head(DB)
tail(DB)
dim(DB)
 
###Fast way of filling sparse data into a field
 
#create lookup tables for unique dates and grids
#lookup for dates
u.date <- unique(DB$date)
u.date <- u.date[order(u.date)]
date.lut <- data.frame(date=u.date, id=seq(u.date))
date.lut
DB$dateid <- date.lut$id[match(DB$date, date.lut$date)]
#lookup for grids
u.grd <- unique(DB$grd)
u.grd <- u.grd[order(u.grd)]
grd.lut <- data.frame(grd=u.grd, id=seq(u.grd))
grd.lut
DB$grdid <- grd.lut$id[match(DB$grd, grd.lut$grd)]
 
#Create sparse matrix
field <- sparseMatrix(i=DB$dateid, j=DB$grdid, x=DB$chl)
dim(field)
 
#Convert field object to class matrix and give appropriate row and column names
field <- as.matrix(field)
field[which(field==0)] <- NaN
rownames(field) <- as.character(date.lut$date)
colnames(field) <- grd.lut$grd
 
#grid info using isin.convert()
isin <- isin.convert(grd=as.numeric(colnames(field)))
head(isin)
polys <- isin.convert(grd=as.numeric(colnames(field)), polygons=TRUE)
 
 
###Explorations of data
#fraction of months with observation. Retain only those with greater than 20%
frac.obs <- colSums(!is.na(field))/dim(field)[1] #fraction of dates with observation value
grd.incl <- which(frac.obs > 0.20) # grids to include - fraction observed > 0.2
length(grd.incl)
 
#Monthly mean values
MON <- as.POSIXlt(rownames(field))$mon
mon.mean <- matrix(NaN, nrow=12, ncol=dim(field)[2])
rownames(mon.mean) <- 1:12; colnames(mon.mean) <- colnames(field)
for(i in 1:12){
 mon.mean[i,grd.incl] <- colMeans(field[which(MON == i-1),grd.incl], na.rm=TRUE)
}
 
#Yearly means
spatial.mean <- colMeans(mon.mean, na.rm=TRUE)
 
#Plot of map without projection
val <- spatial.mean
pal=color.palette(rev(c("red", "yellow", "cyan", "blue")), n.steps.between=c(2,5,10))
COLS <- val2col(val, col=pal(100))
 
YLIM <- c(min(isin$lat), max(isin$lat)) 
XLIM <- c(min(isin$lon), max(isin$lon))
 
x.dist <- earth.dist(XLIM[1], mean(YLIM), XLIM[2], mean(YLIM))
y.dist <- earth.dist(mean(XLIM), YLIM[1], mean(XLIM), YLIM[2]) 
heights=c(6)
widths=c(heights[1]*x.dist/y.dist, 1.5)
res=400
 
png(filename="spatial.mean.no.projection.png", width = sum(widths), height = sum(heights), units="in", res=res)
par(omi=c(0.1, 0.1, 0.1, 0.1), ps=12)
layout(matrix(c(1,2),nrow=1,ncol=2,byrow=TRUE), widths = widths, heights = heights, respect=TRUE)
layout.show(2)
 
par(mai=c(0.5, 0.5, 0.1, 0.1))
plot(isin$lon, isin$lat, t="n", xlim=XLIM, ylim=YLIM, xaxs="i", yaxs="i")
for(i in seq(polys)){
 polygon(polys[[i]], col=COLS[i], border=COLS[i], lwd=0.001)
}
map("worldHires", fill=TRUE, col="grey", add=TRUE)
abline(h=seq(-90,90,0.5), lty=3, col="grey")
abline(v=seq(-180,180,0.5), lty=3, col="grey")
map.frame(deg.ext=0.5)
 
#add scale
par(mai=c(0.5, 0.1, 0.1, 1))
image.scale(val, col=pal(100), horiz=FALSE, yaxt="n", ylab="", xlab="")
axis(4)
mtext(paste("yearly mean chl a [mg m^-3, ", YEAR_min, "-", YEAR_max, "]", sep=""), side=4, line=3)
box()
 
dev.off()
Created by Pretty R at inside-R.org



2 comments:

  1. This is fantastic. Thanks! I'm re-writing this in python and will report back once I have it all together. Just curious, what is it you are using the `dphi` for in the `isin.convert` function?

    ReplyDelete
  2. Glad you find it helpful. `dphi` is the "Latitudinal angular discretisation (radians)". You can find a full description of all of the ISIN grid definitions in the "Product User's Guide" .pdf , p.69(see link at the end of the second paragraph). Cheers

    ReplyDelete