
Possible GPGPU task: linear/bilinear regression on El Nino dataset

This post describes a possible use for the GPGPU server that Dara has suggested Azimuth might want to try. Any comments or improvements from anyone are welcome!

I'm aware that it's non-trivial to get good GPGPU performance :-) My background in the area is having spent the last 2 years working on ARM's compiler for OpenCL. While that's different from an NVIDIA system (ARM has a unified memory architecture rather than the on-chip memory of a desktop GPU), I've had a moderate amount of experience getting code running both correctly and performantly on a GPU.

As for what to run, my thought is to try various kinds of regression on the El Nino dataset. You'll have seen the writing I've done on the wiki on Bilinear regression. Having done some research on how other people solve these problems, my view on the best way to do the fitting is:

  1. Alternate between fixing the left factors and solving for the right factors, and fixing the right factors and solving for the left factors. This has the advantage that with one set fixed the problem becomes "linear regression", hence the same code can also be used for linear regression just by not doing the alternation. (A paper appears to say the alternating approach still converges.)

  2. Since the sparsity priors I'd like to try include $L_1$ and $L_{1/2}$, it looks like the best way of solving the individual linear regression problems is coordinate descent, which requires more descent steps but of a drastically simpler kind. It also looks like coordinate descent performs quite well when parallelised. (There's a sketch of the whole scheme just after this list.)

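To make the proposal concrete, here is a minimal sketch of the fitting scheme, in Python/NumPy rather than the eventual OpenCL. It assumes a rank-1 bilinear model $\hat{y}_k = a^\top X_k b$ for sample matrices $X_k$, which is my reading of the wiki page rather than settled notation; all function names and parameter defaults here are placeholders, not existing code:

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator used in the L1 coordinate update."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, alpha, n_iter=100):
    """Minimise (1/2n)||y - Xw||^2 + alpha*||w||_1 by cyclic coordinate descent."""
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n     # per-coordinate curvature (assumed nonzero)
    resid = y - X @ w                     # running residual
    for _ in range(n_iter):
        for j in range(p):
            # partial correlation with coordinate j added back in
            rho = (X[:, j] @ resid) / n + col_sq[j] * w[j]
            w_new = soft_threshold(rho, alpha) / col_sq[j]
            resid += X[:, j] * (w[j] - w_new)   # keep residual consistent
            w[j] = w_new
    return w

def bilinear_fit(X, y, alpha, n_outer=20):
    """Alternating fit of y_k ~ a^T X_k b, where X has shape (n, p, q).

    With b fixed the subproblem in a is ordinary sparse linear regression
    with design matrix (X_k b), and symmetrically for b.  Each subproblem
    is solved with the coordinate-descent Lasso above, so dropping the
    alternation gives plain L1-penalised linear regression.
    """
    n, p, q = X.shape
    a = np.ones(p) / p
    b = np.ones(q) / q
    for _ in range(n_outer):
        a = lasso_cd(np.einsum('kpq,q->kp', X, b), y, alpha)  # fix b, solve for a
        b = lasso_cd(np.einsum('kpq,p->kq', X, a), y, alpha)  # fix a, solve for b
    return a, b
```

The $L_2$ and $L_{1/2}$ variants would only change the single-coordinate update inside `lasso_cd`, and each coordinate update reduces to dot products over one column of the design matrix, which is the part I'd expect to map most naturally onto OpenCL work-items.
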
With such a program written I'd be inclined to see what results are obtained for predicting the El Nino 3.4 index (or some variant) using:

  1. Linear regression, linear regression with $L_2$ prior, linear regression with $L_1$ prior, linear regression with $L_{1/2}$ prior.

  2. Bilinear regression, bilinear regression with $L_2$ prior, bilinear regression with $L_1$ prior, bilinear regression with $L_{1/2}$ prior.

It's quite possible that this won't demonstrate anything beyond middling prediction performance, but it might reveal some interesting stuff. It would also give John some results he could use at NIPS as a baseline of "standard ML techniques" to compare Ludescher et al against.

For the size of data we've got I think it's only suitable either for a very beefy multicore CPU or a GPGPU setup. (I can probably do some correctness testing and debugging on tiny problems using pocl to run OpenCL on my laptop's CPU before having to run on an actual GPU.)

Comments

  • 1.

    Hello David

    These are fine, however I would like to ask you to make a specific application on a specific set of data. You can see how, since I arrived, I have tried to organize my coding.

    If possible, and please do not feel pressure, make a narrowly specific application of these algorithms to address a specific problem.

    There are 1000000000s of blogs and websites on machine learning, and yet I can count on the fingers of one hand those where I could download any useful code to solve actual problems.

    This might not be your cup of tea, but I suggest you consider it if you can.

    Dara

  • 2.

    David wrote:

    It’s also some results that John could use at NIPS as a baseline for “standard ML techniques” to compare Ludescher et al against.

    Yes, I support this project and I'll be glad to help!

    My recent post in [El Niño project - thinking about the next steps](http://forum.azimuthproject.org/discussion/1382/el-nino-project-thinking-about-the-next-steps/?Focus=12497#Comment_12497) was part of my plan to avoid complete embarrassment or nerve-racking suspense by making sure that without much new effort I can put together some reasonable talk at NIPS. But this sounds like a great extra initiative.

    Maybe I can use my (limited) skill in climate science to help you prepare a very specific coding challenge for Dara. First, a basic question. When you talk about

    predicting the El Nino 3.4 index (or some variant) using:

    • Linear regression, linear regression with $L_2$ prior, linear regression with $L_1$ prior, linear regression with $L_{1/2}$ prior.

    • Bilinear regression, bilinear regression with $L_2$ prior, bilinear regression with $L_1$ prior, bilinear regression with $L_{1/2}$ prior.

    are you talking about predicting the El Niño 3.4 index at some time given just previous values of this index, or - more demanding but potentially much more rewarding - given previous values of the temperature at a grid of points in the Pacific? The latter is the sort of data that Ludescher et al use, so it could help "benchmark" their program.

  • 3.

    Maybe I can use my (limited) skill in climate science to help you prepare a very specific coding challenge for Dara.

    YES!

  • 4.

    I have the same question as John.

    The practical definition of "prediction" can be interpreted in different ways. I interpret what John is saying as meaning that we can treat this in either of two ways -- or both, for that matter:

    1. Predict ENSO at the current time (or at historical times) based on correlated events from other measures, spatial or otherwise.

    2. Predict ENSO at future times based on some sort of physical model or heuristic model that we can glean from past history.

    We can use #1 to help with #2, but only if we have models for the other correlated measures. This is a subtle yet important detail. For example, we can find all sorts of correlations with rainfall, other indices such as NAO, AMO, etc., but unless we have models for those other measures, we can't use those to predict the future -- instead, we are simply chasing our tail and staying in the present.

    Does that make sense as a reframing of the question?

  • 5.

    So the idea is to find some real-valued quantity that's related to El Nino and try to predict it from the values of the time series in the Pacific Basin. I'm not sure whether the El Nino 3.4 index is the best thing to predict: it seems to be a 3-month average, whereas Ludescher et al's "is there an El Nino" criterion uses a value over 5 months, so maybe an average over 5 (or maybe 7) months would be more useful for comparing with Ludescher et al?
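
    If it helps to make that concrete, here is one way such a target could be formed, assuming a plain 1-D NumPy array of monthly Nino 3.4 anomalies; the array name and window defaults are just illustrative:

    ```python
    import numpy as np

    def running_mean(series, window=5):
        """Mean over each run of `window` consecutive monthly values;
        the output is shorter than the input by window - 1 samples."""
        kernel = np.ones(window) / window
        return np.convolve(series, kernel, mode='valid')

    # 5-month target for comparison with Ludescher et al's 5-month criterion,
    # or window=7 for the wider variant mentioned above:
    # target = running_mean(nino34_monthly, window=5)
    ```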

  • 6.

    Is it also possible to move away from linear to non-linear methods? It is clear that the data we are concerned with is not linear, nor can it be approximated linearly.

    Dara

  • 7.

    There are 1000000000s of blogs and websites on machine learning, and yet I can count on the fingers of one hand those where I could download any useful code to solve actual problems.

    Dara, I recommend looking at the [examples section](http://scikit-learn.org/dev/auto_examples/index.html) of the scikit-learn documentation. Also, if you or other people are interested in doing machine learning or scientific computing in Python, I highly recommend using the [Anaconda](http://docs.continuum.io/anaconda/) Python distribution. It gives you most of the commonly used quantitative Python libraries, like scikit-learn, pandas, IPython, SciPy, NumPy, ... (195 libraries), out of the box. Stop letting installing dependencies become the biggest part of a project :).

  • 8.

    Maybe I can use my (limited) skill in climate science to help you prepare a very specific coding challenge for Dara.

    I think formulating a good prediction problem is the most important issue for the project at this point.

    Daniel

  • 9.

    Hello Daniel

    We used SCIKIT for an entire year, and while it works for demos with small data (except for the Neural Networks code, which is incomplete), we discovered that for realistic applications we could not train on the sample data set.

    We did have a full installation with Anaconda, the account you were given had access accordingly. We installed everything!

    Let me give you some examples of why we dropped it.

    For SVM/SVR it provides access to a limited set of KERNELS, which do not produce accurate enough forecasts, so we wanted to add our own WAVELET KERNEL and it was a nightmare! Finally a world-class Python programmer gave up!

    For kNN we wanted to use our own metrics or other known metrics; again, the same story.

    No parallelization support, therefore for large datasets it is useless.

    NO SYMBOLIC COMPUTATIONS at all! This is a huge issue when one works on algorithms and needs to wrestle with the algebra.

    Neural networks are still not supported, which is a big problem.

    For professional use and serious research, the kind John requires, I do not recommend SCIKIT. If you want to teach classes for undergrads, it is maybe a good option.

    So we decided to write our own code and contribute to their source tree, but it looked like there are political issues amongst the organizers and certain developments are blocked, e.g. neural networks. We found their API interface cumbersome for quick contributions.

    Finally I switched to Mathematica and C.

    Sorry.

    I AM WILLING TO PROVIDE THE PYTHON ANACONDA INSTALLATIONS REQUIRED FOR SCIKIT, if a competent programmer could show some serious results, but personally I cannot contribute with SCIKIT.

    Dara

  • 10.

    Just to emphasize, I am not ANTI-scikit and have no axe to grind with its founders.

    Dara

  • 11.

    Daniel wrote:

    I think formulating a good prediction problem is the most important issue for the project at this point.

    Daniel is 100% correct on this.

  • 12.

    Hi Dara,

    I have had very good experiences with scikit. In my work I mainly use the linear model and tree ensemble subpackages of scikit. These can handle any dataset that can reasonably fit in memory (the same limitation as R and, I think, Mathematica). I have used it to train random forests on dense datasets with 250000 rows by 4000 columns, and linear models on sparse datasets with 250000 rows and 500000 columns. The linear models train in minutes or even seconds.

    I also use some of the unsupervised libraries like k-means and various matrix decompositions. Matrix decompositions take more time and cannot handle datasets like the above, but all the weather pattern decompositions I posted here were done with scikit.

    You can train SVMs with arbitrary kernels by using sklearn.kernel_approximation.Nystroem followed by a linear method like the SGDClassifier. For large datasets this is much more efficient than, say, svmlib. I have used this method to train Gaussian SVMs on datasets of the size quoted above. I am not aware of any other public libraries that can train nonlinear kernels on large data sets. SVMs with non-linear kernels are usually very inefficient, since the original algorithms for them use dual space methods which are quadratic in the number of rows. There are modern, faster approximate methods (like the Nystroem kernel above), but I am not aware of libraries that implement these. Fast primal space libraries like liblinear and vowpal-wabbit only handle linear classifiers; vowpal-wabbit has recently added a multilayer option, but that is not a kernel method.
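
    A minimal sketch of that combination might look like the following; the RBF kernel, gamma, component count and toy data here are just placeholders, not settings from my actual runs:

    ```python
    import numpy as np
    from sklearn.kernel_approximation import Nystroem
    from sklearn.linear_model import SGDClassifier
    from sklearn.pipeline import make_pipeline

    # toy stand-in data; replace with the real feature matrix and labels
    rng = np.random.RandomState(0)
    X = rng.randn(1000, 20)
    y = (X[:, 0] * X[:, 1] > 0).astype(int)

    # approximate an RBF kernel with a low-rank Nystroem feature map,
    # then train a linear classifier in the lifted feature space
    model = make_pipeline(
        Nystroem(kernel='rbf', gamma=0.1, n_components=100, random_state=0),
        SGDClassifier(loss='hinge', random_state=0),
    )
    model.fit(X, y)
    print(model.score(X, y))
    ```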

    Scikit does not scale to the google/twitter/facebook kind of big data, but it easily handles data of the size we have been looking at here. It has also scaled better than R on problems I have worked with.

    Daniel

  • 13.

    Hello Daniel

    I am sure these experiences are viable for those who know SCIKIT and how to use it. I seem to have failed.

    I can provide the servers with SCIKIT and IPython notebooks for serious programmers, but personally I cannot spend more time on SCIKIT. We could easily exchange data and run scripts between the platforms, so nothing is lost.

    With regards to weather patterns, the traditional time-series data we have are tiny, i.e. the ones we used for our examples over the past few months.

    I am moving on to volumetric satellite data e.g. GPM.

    Again I do not wish my comments to reflect on SCIKIT nor being critical about what you said.

    Dara

  • 14.

    John wrote:

    Are you talking about predicting the El Niño 3.4 index at some time given just previous values of this index, or - more demanding but potentially much more rewarding - given previous values of the temperature at a grid of points in the Pacific?

    Paul Pukite wrote:

    I interpret what John is saying as meaning that we can treat this in either of two ways – or both, for that matter:

    • Predict ENSO at the current time (or at historical times) based on correlated events from other measures, spatial or otherwise.

    • Predict ENSO at future times based on some sort of physical model or heuristic model that we can glean from past history.

    This is not the distinction I was highlighting. I was talking about

    1) predicting the El Niño 3.4 index at time $t$ given just values of the El Niño 3.4 index at times before $t - \Delta t$ for some $\Delta t$,

    versus

    2) predicting the El Niño 3.4 index at time $t$ given values of many other quantities at times before $t - \Delta t$ for some $\Delta t$.

    Ludescher et al do 2). Until December (at least) I want to focus on doing things similar to what they do. But I imagine it's quite standard to use machine learning techniques to do tasks like 1). In [my reply to Daniel Mahler](http://forum.azimuthproject.org/discussion/1382/el-nino-project-thinking-about-the-next-steps/?Focus=12521#Comment_12521), I proposed a version of 2) that I hope will be easy:

    3) predicting the El Niño 3.4 index at time $t$ given the "average link strength" at times between $t - \Delta_1 t$ and $t - \Delta_2 t$, for some numbers $\Delta_1 t$ and $\Delta_2 t$.

    This is a problem of predicting one time series given another, presumably much less data-intensive than predicting one time series given many others, like the temperatures at all the grid points in the figure from Ludescher et al (http://math.ucr.edu/home/baez/ecological/el_nino/ludescher_el_nino_cooperativity_1a.jpg).
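
    Here is a minimal sketch of how 3) could be set up as an ordinary regression problem, assuming monthly arrays `link_strength` and `nino34` that are already aligned in time; the lag range is purely illustrative:

    ```python
    import numpy as np

    def lagged_design(link_strength, nino34, lag_min=3, lag_max=9):
        """Features: average link strength at lags lag_min..lag_max months;
        target: Nino 3.4 index at the current month."""
        lags = range(lag_min, lag_max + 1)
        t0 = lag_max
        rows = [[link_strength[t - k] for k in lags]
                for t in range(t0, len(nino34))]
        return np.array(rows), np.array(nino34[t0:])

    # ordinary least squares as the simplest baseline; the penalised variants
    # from the main post would just swap in a different solver
    # X, y = lagged_design(link_strength, nino34)
    # coef, *_ = np.linalg.lstsq(np.c_[np.ones(len(X)), X], y, rcond=None)
    ```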

    I don't want to focus on physical models just yet. It's a very interesting challenge, and I know that's what you like to do. But I don't think I can do it before December! After December I will feel more free to tackle big ambitious projects.
