This post describes one possible use for the GPGPU server that Dara suggested Azimuth might try out. Comments and improvements from anyone are welcome!

I'm aware that it's non-trivial to get good GPGPU performance :-) My background in the area is the last 2 years spent working on ARM's compiler for OpenCL. ARM's GPUs differ from an NVIDIA system (they share a unified memory with the CPU, whereas a desktop GPU has its own dedicated memory), but I have moderate experience getting code to run both correctly and efficiently on a GPU.

As for what to run, my thought is to try various kinds of regression on the El Nino dataset. You may have seen what I've written on the wiki at [[Bilinear regression]]. Having looked into how other people solve such problems, I think the best way to do the fitting is:

1. Alternate between fixing the left factors and solving for the right factors, and fixing the right factors and solving for the left factors. Because the model is linear in each set of factors when the other set is held fixed, each half-step is a "linear regression" problem, so the same code also handles plain linear regression simply by skipping the alternation. (A paper I've found appears to say that the alternating approach still converges.)

2. Since the sparsity-inducing priors I'd like to try include \$L_1\$ and \$L_{1/2}\$, whose penalties are not differentiable at zero, it looks like the best way to solve the individual linear regression problems is [[Co-ordinate descent]]. It needs more descent steps, but each step is drastically simpler. Co-ordinate descent also seems to perform quite well when parallelised.
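
The alternating scheme in step 1 could be sketched in Python/NumPy roughly as below. This is a minimal sketch under my own assumptions, not the wiki's formulation: it assumes a rank-1 bilinear model \$y_t \approx u^\top X_t v\$ where each predictor \$X_t\$ is an \$m \times n\$ matrix, it uses plain least squares (no prior) for each half-step, and `fit_bilinear_als` is a hypothetical name.

```python
import numpy as np


def fit_bilinear_als(X, y, n_iters=50, seed=0):
    """Fit the rank-1 bilinear model y[t] ~ u @ X[t] @ v by alternating least squares.

    Assumed shapes: X is (T, m, n), y is (T,). With v fixed the model is
    linear in u (features X[t] @ v); with u fixed it is linear in v
    (features u @ X[t]). Each half-step is therefore an ordinary
    least-squares solve. No prior/penalty is applied here.
    """
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(X.shape[2])  # random start for the right factor
    for _ in range(n_iters):
        # Fix v, solve the linear regression for the left factor u.
        A_u = X @ v  # shape (T, m)
        u, *_ = np.linalg.lstsq(A_u, y, rcond=None)
        # Fix u, solve the linear regression for the right factor v.
        A_v = np.einsum("i,tij->tj", u, X)  # shape (T, n)
        v, *_ = np.linalg.lstsq(A_v, y, rcond=None)
    # Note: (u, v) is only determined up to u -> c*u, v -> v/c.
    return u, v


# Usage on synthetic, noise-free data (illustrative only).
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 4, 3))
u_true = np.array([1.0, -2.0, 0.5, 0.0])
v_true = np.array([0.5, 1.0, -1.0])
y = np.einsum("i,tij,j->t", u_true, X, v_true)
u, v = fit_bilinear_als(X, y)
pred = np.einsum("i,tij,j->t", u, X, v)
print(np.linalg.norm(pred - y) / np.linalg.norm(y))  # relative training error
```

Because of the scaling ambiguity, it makes more sense to compare predictions than the factors themselves. Swapping the two `lstsq` calls for a penalised solver (such as the co-ordinate descent in step 2) would give the regularised bilinear variants, and dropping the loop leaves plain linear regression.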

Once such a program is written, I'd like to see what results it gives for predicting the El Nino 3.4 index (or some variant) using:

1. Linear regression, linear regression with \$L_2\$ prior, linear regression with \$L_1\$ prior, linear regression with \$L_{1/2}\$ prior.

2. Bilinear regression, bilinear regression with \$L_2\$ prior, bilinear regression with \$L_1\$ prior, bilinear regression with \$L_{1/2}\$ prior.
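
For concreteness, the objectives in the two lists above could be written as follows. This is my own assumed formulation rather than one taken from the wiki: a squared-error loss, a single regularisation weight \$\lambda\$, a rank-1 bilinear model, and the same penalty \$R\$ applied to both factors.

```latex
% Linear regression, with the penalty R chosen per variant
\hat{w} = \arg\min_{w} \; \frac{1}{2} \sum_{t} \left( y_t - x_t^\top w \right)^2 + \lambda \, R(w)

% Bilinear regression (rank 1), penalising both factors
(\hat{u}, \hat{v}) = \arg\min_{u, v} \; \frac{1}{2} \sum_{t} \left( y_t - u^\top X_t v \right)^2 + \lambda \left( R(u) + R(v) \right)

% Priors: none, L_2, L_1, L_{1/2}
R(w) \in \left\{ 0, \; \tfrac{1}{2} \|w\|_2^2, \; \|w\|_1, \; \sum_j |w_j|^{1/2} \right\}
```

The \$L_{1/2}\$ penalty is non-convex, so co-ordinate descent can only be expected to find a local minimum there; the bilinear objective is non-convex jointly in \$(u, v)\$ for every choice of \$R\$ anyway.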

It's quite possible this won't show anything better than middling prediction performance, but it might reveal something interesting. It would also give John some results he could use at NIPS as a "standard ML techniques" baseline against which to compare Ludescher et al.

Given the size of our data, I think this is only practical on either a very beefy multicore CPU or a GPGPU setup. (I can probably do correctness testing and debugging on tiny problems using pocl to run OpenCL on my laptop's CPU before moving to an actual GPU.)