Thursday, February 25, 2016

RSpectra::svds function now default method for sinkr::dineof

The summary blog post describing the RSpectra (formerly rARPACK) package made a convincing case for the improved decomposition speed of the "svds" function for partial SVD (Singular Value Decomposition) over several other R packages. Until now, the sinkr package has relied on irlba for it's dineof function (Data Interpolating Empirical Orthogonal Functions). Since the routine can be quite computationally intensive, I wanted to test the performance of svds as an alternative method.

In a simple example that performs SVD on a field of sea level pressure of the equatorial Pacific, svds outperforms irlba both in speed and correlation of singular vector (e.g. $u) to the output of the base svd function. One can see in the following graph, that trailing vectors break down in their correlation while svds maintains nearly perfect correlation. Interestingly, this artifact is removed by first centering the data field.


While the effect looks dramatic above, it should be noted that the trailing vectors usually carry only a small fraction of information, and thus contribute only marginally to errors in field reconstruction. Below is a figure showing the reconstruction error of svd, svds, and irlba with increasing levels of truncation.


Finally, both methods were compared in their performance within dineof. With the non-centered field both approaches arrive to a similar RMS, but svds converges with less iterations and EOFs than irlba. With the centered data both methods produce nearly identical results.


So, even though the differences are small, the rSpectra::svds method will now be the default method for both the eof and dineof functions within sinkr. For the moment, the irlba method is maintained for compatibility with previous versions.

Code to reproduce:

Thursday, February 18, 2016

Visualizing model predictions in 3d

Here is a brief exploration of the misc3d package, which has some nice functions that can be used in conjunction with rgl. I am especially pleased with the output of contour3d which I have used to plot GAM predictions in 3d.
The example is a simple dataset of x, y, and z, data that were used to calculate a 4th variable "value" with the equation:

 value = -0.01x3 + -0.2*y2 + -0.3*z2

 Fitting GAM model to this dataset resulted in the following spline terms.



Then, the fitted GAM model was used to predict values on a regular 3d grid for plotting with the rgl package. The following plot shows the original data, with value values colored (blue colors of the spectrum are low values, red colors are high values). Finally, the contour3d function is used to add the GAM predictions as colored contours.


I got some nice insight from the R code accompanying the book by Murcell (2011).

References:
Murrell, P., 2011. R graphics. CRC Press.


Example script:

Wednesday, February 17, 2016

Working with Globcolour data (Part 2)

Just a quick note to announce that the makeGlobcolourField and isin.convert functions have been added to the sinkr package. In addition, the makeGlobcolourField function now used the ncdf4 package to read the .nc files. Both functions are only set up to deal with the higher resolution 4 km data based on the ISIN grid ("L3b").
The following script is an example of extracting data for the Philippines, and produces a map of mean Chl1 values:






Example script: