
As John asked about applying other well-known machine learning algorithms to the El Nino problem, I've been vaguely thinking about it. However (at least to my understanding), a lot of the techniques simply won't scale to the volume of data: either because my laptops are relatively low-powered and I don't have access to a cluster, or, more seriously, because a lot of the techniques solve problems by gradient descent, stochastic algorithms, etc., which tend either not to converge or to converge to very poor local minima in very high-dimensional spaces.

If I do come up with any good dimensionality reduction techniques I'll put them here, but I also ought to check whether this has already been looked at by others.

## Comments

If you've been offline for a bit, perhaps this goes some way:

Ian Ross, [Nonlinear dimensional reduction methods in climate data analysis (2008)](http://arxiv.org/abs/0901.0537), which compares the top 3 EOF PCA results against 20+ dynamic models for El Nino 3 prediction.

Glad you're back with more good stuff :).

> or more seriously because a lot of the techniques are solving problems by gradient descent/stochastic algorithms/etc which tend to either not converge or converge to very poor local minima in very high dimensional spaces.

Look at Differential Evolution and GPU parallelization. I am looking for 'the few, the proud' who are willing to brave some CUDA code for this algorithm, parallelized on a CUDA server (I have one available right now as we are talking).

To be more serious, even for 1-dimensional time series, e.g. stocks, the forecast algorithms have really not been worked out by the industries involved. There has been a lot of academic paper publishing claiming blockbuster results, and yet in actuality little code that works. Not that there is a shortage of all kinds of variations of these algorithms on the net as open source repositories, each yet another layer of programming fluff to hide the details of the algorithms... but an actual implementation that works and solves a real problem is rare.

Those who know how to do that are working for the DOD doing the drones...

Dara

Thanks for the very interesting link, Jim: the fact that PCA seems virtually as good as other techniques is interesting.

Dara: My past experience a couple of years ago was that lots of techniques just don't work as the dimensionality gets higher. I've spent the past couple of years trying to do meaningful work on an OpenCL compiler in spite of management. However, I'm not sure how realistic it is for me in the UK to work on your CUDA server.

In [this comment](http://forum.azimuthproject.org/discussion/1382/el-nino-project-thinking-about-the-next-steps/?Focus=11493#Comment_11493) John pointed to some [current work](http://www.cpc.ncep.noaa.gov/products/precip/CWlink/MJO/enso.shtml#discussion) on dynamical and statistical models for El Nino prediction. In particular, see the "Weekly ENSO Evolution, Status, and Prediction Presentation (PDF)" under "Expert Discussions/Assessments".

It would be good to find out more about these models and what data and data reduction they use.

> Dara: My past experience a couple of years ago was that lots of techniques just don't work as the dimensionality gets higher.

I like specificity in these matters:

1. SVR for higher dimensions is very fast, except that it needs the global min/max, which is CPU intensive; that step has been improved by Differential Evolution and GPU servers. So there is much hope for new development there, but the programmers needed to do such work are not readily available.
2. Neural Networks could be coded entirely as matrix operations, which would then perform fast on GPU servers!
3. Random Forest could be used for a good fast mow-down to get rid of most useless cases, like a super large IF THEN ELSE for pre-processing. I have little experience with it, but it could be promising.

> However, I'm not sure how realistic it is for me in the UK to work on your CUDA server.

All you need is a login, which I will provide, to a multi-CPU-core CUDA server (NVIDIA), plus SSH and SFTP; the rest could be done on your laptop while enjoying fish & chips in the UK :)

We should clearly specify the problem to be learned or modelled before worrying about implementation and optimization. The air.995 dataset is approximately 10k by 25k readings for the complete 66 years of daily temperatures, which is reasonably small by current standards. Efficient implementations of modern algorithms should be able to deal with it directly. It would be better to get a baseline without dimension reduction, so that we can tell if it is actually helping when we do apply it.

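
The "reasonably small" estimate above can be sanity-checked with a little arithmetic. This is only a back-of-envelope sketch: the 144 × 73 grid (about 10k points) is an assumption about how the data are gridded, not something stated in the thread, and the array is assumed to be stored densely with one value per grid point per day.

```python
# Back-of-envelope memory footprint of the full daily temperature array.
GRID_LON, GRID_LAT = 144, 73   # assumption: 2.5-degree global grid, ~10k points
YEARS = 66                     # length of record quoted in the thread
DAYS_PER_YEAR = 365.25


def dataset_footprint(bytes_per_value):
    """Return (grid points, days, size in GiB) for a dense points x days array."""
    points = GRID_LON * GRID_LAT
    days = int(YEARS * DAYS_PER_YEAR)
    gib = points * days * bytes_per_value / 2**30
    return points, days, gib


points, days, gib32 = dataset_footprint(4)   # single precision
_, _, gib64 = dataset_footprint(8)           # double precision
print(f"{points} points x {days} days")
print(f"float32: {gib32:.2f} GiB, float64: {gib64:.2f} GiB")
```

Under these assumptions the whole array is on the order of a gigabyte or two, so whether it fits in a laptop's memory depends mostly on how many working copies an algorithm needs.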

I'd be inclined to view it the other way: first try "off-the-shelf" algorithms on a dimensionality-reduced dataset and then, for those that seem to give promising results, take much longer running them on the original-dimensionality dataset to see whether the dimensionality reduction has reduced the accuracy of the results. However, I entirely agree that this isn't an important mainstream Azimuth thing; I was just trying to check whether I was looking at something that has already been investigated.

I'm thinking of trying some variety of L0-penalised sparse regression algorithms, but I'm not sure they're feasible on the unreduced dataset. But as I've mentioned previously, I'm not really going to commit to being part of the mainstream El Nino effort.

I have an interesting idea for experimenting with reducing data dimensionality.

This has to do with taking a long _simulated_ time series and discovering whether any patterns of cyclic or quasi-cyclic behavior emerge. The situation that sparked my attention was this article by Palmer and Slingo: <http://rsta.royalsocietypublishing.org/content/369/1956/4751.full>

They present a very long-running simulation of 2000 years' worth of NINO3 SST cycles with a 20-year low-pass filter:

![slingo](http://rsta.royalsocietypublishing.org/content/369/1956/4751/F12.medium.gif)

Note how well the cycles revert to the mean and show a consistent, constrained amplitude with very few, if any, extreme outliers. I would like to take this time series and apply a model or perhaps wavelet analysis to it and see what the reduced view would look like.

But I am not seeing how we can get the raw simulation data that feeds the chart above.
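
As a rough illustration of what a wavelet "reduced view" of a long series might look like, here is a minimal sketch using a hand-written Haar transform. The series is synthetic (a slow cycle plus a fast wiggle), standing in for the simulation output that is not available; the function names and the choice of the Haar wavelet are assumptions made for the example, not anything taken from the Palmer and Slingo paper.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar transform on an even-length list."""
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_decompose(signal, levels):
    """Return (coarse approximation, list of detail bands, finest first).

    The signal length must be divisible by 2**levels.
    """
    approx, details = list(signal), []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details


def energy(values):
    """Sum of squares, used to see how much signal each band carries."""
    return sum(v * v for v in values)


# Synthetic stand-in: a slow cycle (period 64) plus a fast wiggle (period 4).
n = 256
series = [math.sin(2 * math.pi * t / 64) + 0.3 * math.sin(2 * math.pi * t / 4)
          for t in range(n)]

coarse, details = haar_decompose(series, levels=3)
print("coarse length:", len(coarse))
print("fraction of energy kept in coarse view:", energy(coarse) / energy(series))
```

The coarse approximation is an 8-fold reduction of the series that keeps the slow cycle, while the detail bands hold the fast variation; a smoother wavelet family would likely be a better choice for real SST data.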

David Tweed wrote:

> the fact (that) pca seems virtually as good as other techniques is interesting.

Being able to detect signals using linear models is a seemingly significant finding I took from the thesis.

Julia Slingo and Tim Palmer, in the paper Peter cited above, wrote:

> However, these empirical correction methods are essentially linear and yet we know that the real system is highly nonlinear. As Turner et al. [16] have demonstrated, there is inherently much more predictive skill if improvements in model formulation could be made that reduce these biases, rather than correcting them after the fact.

This suggests that an important way forward is to improve the performance of non-linear dynamic models. I'm not sure where that leaves non-linear statistical analysis of the observations.

Hello Paul

I checked the references for the Slingo paper in #9:

[The predictability of a flow which possesses many scales of motion](http://onlinelibrary.wiley.com/doi/10.1111/j.2153-3490.1969.tb00444.x/pdf)

Please see equations (1-3) on p. 293; these are the flow equations that govern flux and currents (so to say). You find these all over physics papers; one particularly famous place where you find related equations is Maxwell's equations of electricity and magnetism (I stand corrected if this is a flawed view). An example for electromagnetic dynamical systems:

[Topology bounds energy of knots and links](http://rspa.royalsocietypublishing.org/content/464/2090/293.full.pdf)

Basically, on the Sun's corona there are chaotic flows of plasma, now captured on video, and the explanation based on CHAOS is now being replaced by a system of differential equations with braided or knotted solutions (knotted field flows). Each energy level of the system is matched against a particular knot; change the energy and the knot changes.

For an understanding of the braiding see:

[Topological Dissipation and the Small-Scale Fields in Turbulent Gases](http://articles.adsabs.harvard.edu/cgi-bin/nph-iarticle_query?bibcode=1972ApJ...174..499P&db_key=AST&page_ind=0&plate_select=NO&data_type=GIF&type=SCREEN_GIF&classic=YES)

The changes to the knots are chaotic, per that paradigm and verbiage, but I do not see that as chaos.

In the Slingo paper, the third panel (c) of Fig. 1 shows a chaotic orbit; I see that as a densely braided orbit of some kind.

So this is how I am trying to think:

The energy changes to the atmosphere and waters change the knotted or braided solutions of a system of equations modelling the convection flow of the surface and deep-sea rivers. The energy changes are caused by CO2 emission, or new forms of huge agriculture and livestock breeding, or whatever.

Then, with equation (1.1) of the Slingo paper being the model, I have to add:

1. This equation needs to be temporal, or localized in time, via coefficients that change with the passage of time.
2. Machine learning is used to predict the next set of coefficients; we take those forecasted coefficients and re-solve (1.1). That is how the computational model should work, as opposed to a general global solution for all times!
3. Removal of the periodic parts of the signal for 1-2 is to be done via wavelet decomposition, as opposed to human fiddling with the data.

At this point in time this is just blabbering, brainstorming.

Dara

Hello Paul

Could we take the actual data, make our own simulator, solve the differential equations (if possible) and apply the wavelets to remove trends?

What is there to prevent us?

D

Dara,

If you are looking for some other types of proxy data to represent ENSO, there is this one:

![equador](http://www.nature.com/nature/journal/v420/n6912/images/nature01194-f1.2.jpg)

This comes from <http://www.nature.com/nature/journal/v420/n6912/full/nature01194.html>, a study of sedimentation in Laguna Pallcacocha, southern Ecuador.

OK, it seems one of us knows a lot more about the data :) Why don't we do some experimental code in Mathematica (I could do most of the work), and you mix and match data and equations and show it to John, to see if new theoretical models could be thought of.

I do not understand these charts nor the equations; I need to code them and compute stuff around them to form a conceptual model in my head.

Dara

Back on the subject of dimensionality reduction, my thoughts are to try PCA as a baseline but also to try to find a reduced set of spatial points which have the "greatest range of cross correlation behaviours across timesteps", in order to bring down 207 points to something more like 30 points. I'm still looking at how to characterise the range of cross correlation, the basic idea being:

1. As Jim points out above, lots of machine learning is linear. However, there's a popular subarea that uses a non-linear per-point mapping to a new space and then learns a linear model there (pretty much anything with "kernel" in the name). Using cross correlations, like Ludescher _et al_'s manually built model, but with an automated learning procedure, is one way to get a weak non-linearity into any models.
2. On that basis, if two points are constantly correlated, anti-correlated or uncorrelated then heuristically they're unlikely to be useful for a model; it's points whose correlation varies that may carry information.

I'm pondering something like either the full range or an $x$ -- $(100-x)$ percentile range for robustness to a few outliers. But this is just noting down my thoughts as I go rather than an indication that this will turn out to be a worthwhile approach.
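
One possible reading of the "range of cross correlation" score is sketched below: compute the Pearson correlation of two series over sliding windows, then take the spread between the $x$ and $(100-x)$ percentiles of those values. The function names, the window length of 64 and the nearest-rank percentile are all assumptions made for illustration; this is not David's actual procedure.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def rolling_correlations(a, b, window, step=1):
    """Correlation of a and b over each sliding window of the given length."""
    return [pearson(a[s:s + window], b[s:s + window])
            for s in range(0, len(a) - window + 1, step)]


def percentile(sorted_vals, p):
    """Nearest-rank percentile of an already sorted list, p in [0, 100]."""
    return sorted_vals[round(p / 100 * (len(sorted_vals) - 1))]


def correlation_range(a, b, window, x=5):
    """Spread between the x and (100 - x) percentiles of the rolling correlation."""
    cors = sorted(rolling_correlations(a, b, window))
    return percentile(cors, 100 - x) - percentile(cors, x)


# Synthetic check: one partner is always correlated, one flips sign halfway.
a = [math.sin(t / 10) for t in range(512)]
steady = [2 * v + 1 for v in a]
flipping = [v if t < 256 else -v for t, v in enumerate(a)]

print("steady partner:  ", correlation_range(a, steady, window=64))
print("flipping partner:", correlation_range(a, flipping, window=64))
```

A pair whose correlation never changes scores near zero, while a pair whose correlation swings between regimes scores near 2, which matches the heuristic in point 2 above.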

I'm glad you guys are thinking about this stuff.

Here's one thing I keep thinking: a very naive thought about meteorology. The ocean has a lot more heat capacity than the air, and I think it's mainly the ocean that gets heated up by sunlight, not the air above it, since the air is almost transparent. So I'd guess that the dynamics of heat in the ocean is more important than the atmosphere. Maybe the air temperature readings are largely a "proxy" for the ocean surface temperature.

If something like this is true, it might be better to use ocean temperature data than surface air temperature data.

And this is (indirectly) related to the issue of dimensionality reduction.

The TAO/TRITON array has roughly 70 buoys in the tropical Pacific, each measuring temperatures at 10 depths. So it's a 3d array of about 700 points, instead of what Ludescher _et al_ are using, which is a 2d array of 27 × 69 = 1863 points, which they "subsample" to get 9 × 23 = 207 points.

Well, maybe it's not that relevant to David's worries in the short term! A lot of the same issues will show up. But it may be good to keep in mind that _maybe_ big gains can be made by focusing on the ocean instead of the air - or focusing on both.

David wrote:

> Back on the subject of dimensionality reduction, my thoughts are to try PCA as a baseline but also try to find a reduced set of spatial points which have the "greatest range of cross correlation behaviours across timesteps", in order to bring down 207 points to something more like 30 points.

Okay, so here's a question!

Ludescher _et al_ got those 207 points by starting with a rectangle of 1863 points and _averaging_ the temperatures over 3 × 3 squares: 207 = 1863/(3 × 3).

Given what you just said, would _you_ have done that to get temperatures at 207 points? Or would you have preferred to use the temperature _at one point_ chosen from each 3 × 3 square?

Presumably your answer to this question will be related to your choice of strategy for getting from 207 points down to 30 points!
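
The two options in the question can be compared on a toy grid. The sketch below is only illustrative: the 27 × 69 field is synthetic (a smooth ramp plus a cell-scale checkerboard standing in for small-scale noise), and `block_average` and `subsample` are hypothetical helper names, not code from Ludescher _et al_.

```python
ROWS, COLS, K = 27, 69, 3   # the 1863-point rectangle and 3 x 3 squares


def block_average(grid, k=K):
    """Average each non-overlapping k x k square into a single value."""
    return [[sum(grid[r + i][c + j] for i in range(k) for j in range(k)) / (k * k)
             for c in range(0, len(grid[0]), k)]
            for r in range(0, len(grid), k)]


def subsample(grid, k=K, offset=1):
    """Keep one point (by default the centre) of each k x k square."""
    return [row[offset::k] for row in grid[offset::k]]


# Smooth large-scale field plus a +/-1 checkerboard at the scale of one cell.
smooth = [[100 * r + c for c in range(COLS)] for r in range(ROWS)]
noisy = [[smooth[r][c] + (-1) ** (r + c) for c in range(COLS)] for r in range(ROWS)]

averaged = block_average(noisy)
picked = subsample(noisy)
truth = subsample(smooth)   # centre values of the smooth field


def max_error(reduced):
    """Largest deviation of a reduced grid from the smooth centre values."""
    return max(abs(reduced[i][j] - truth[i][j])
               for i in range(len(truth)) for j in range(len(truth[0])))


print("reduced shape:", len(averaged), "x", len(averaged[0]))
print("max error, averaging: ", max_error(averaged))
print("max error, one point: ", max_error(picked))
```

On this toy field averaging damps the cell-scale noise while picking one point keeps it at full amplitude; whether the small-scale variation in real temperature data is noise or signal is exactly the open question.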

> If something like this is true, it might be better to use ocean temperature data than surface air temperature data.

100%.

> The TAO/TRITON array has roughly 70 buoys in the tropical Pacific, each measuring temperatures at 10 depths. So it's a 3d array of about 700 points, instead of what Ludescher are using, which is a 2d array of 27 × 69 = 1863 points, which they "subsample" to get 9 × 23 = 207 points.

I would like to use this 3D volumetric data; I will try to find it. I suspect the new GPM satellites could scan temperature at depth.

> that maybe big gains can be made by focusing on the ocean instead of the air - or focusing on both.

I believe we would only know that by actually writing forecast code and measuring the accuracies. I cannot fathom any theoretical way.

Dara

Here's the TAO/TRITON data:

* [TAO/TRITON data delivery](http://www.pmel.noaa.gov/tao/data_deliv/deliv.html).

As usual it takes a while to figure out how to use this page!

> I believe we would only know that by actually writing forecast code and measuring the accuracies. Cannot fathom any theoretical way.

Actually if I were a meteorologist who specialized in El Niños I could probably guess the answer. I think Jim Stuttard's friend wrote something about this... But still, we should actually do the experiment!

Hi John, the wiki is smoking for me at the moment so I haven't been able to update any notes.

So the general dimension reduction is going to depend a bit on what you're going to do with it. My thought is that looking at correlation between different spatial and temporal anomalies is quite a promising "primitive", so I've been thinking about making a feature vector/matrix out of:

1. An ordered pair of spatial "locations", so that's 30 * 29 = 870 possibilities.
2. Temporal lag for the first location: 0 days, 32 days, 64 days, 128 days.
3. Temporal lag for the second location relative to the first: 0 days, 32 days, 64 days, 128 days.
4. Correlation window length: 128 days, 256 days.
5. Whether we're using the maximum or minimum correlation value. (Later this could be refined to, say, the 5% and 95% percentile values to reduce the effect of outliers.)

Points 2--5 are quite coarsely grained but give 4 * 4 * 2 * 2 = 64 choices. This gives an 870 * 64 matrix (or possibly flattened into a vector) of features at each time point, which is about the sort of size that my laptops can handle. If it proves in any way promising we can look at trying to do bigger, less coarse computations on bigger machines.

So that's what I'm thinking about trying to achieve. (Ctd)
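
The feature layout described in the list above can be enumerated directly, which also checks the 870 × 64 count. This is a sketch of the indexing only, with hypothetical names (`pairs`, `settings`, `feature_position`); it does not compute any correlations.

```python
from itertools import permutations, product

N_LOCATIONS = 30                  # reduced set of spatial "locations"
LAGS_FIRST = (0, 32, 64, 128)     # days, lag of the first location
LAGS_SECOND = (0, 32, 64, 128)    # days, lag of the second relative to the first
WINDOWS = (128, 256)              # days, correlation window length
STATS = ("max", "min")            # which extreme of the correlation to keep

# Rows: ordered pairs of distinct locations. Columns: one per parameter choice.
pairs = list(permutations(range(N_LOCATIONS), 2))
settings = list(product(LAGS_FIRST, LAGS_SECOND, WINDOWS, STATS))

row_of = {pair: i for i, pair in enumerate(pairs)}
col_of = {setting: j for j, setting in enumerate(settings)}


def feature_position(first, second, lag_first, lag_second, window, stat):
    """Return the (row, column) of one feature in the per-timestep matrix."""
    return row_of[(first, second)], col_of[(lag_first, lag_second, window, stat)]


print("matrix shape per time point:", (len(pairs), len(settings)))
print("flattened length:", len(pairs) * len(settings))
print("example position:", feature_position(3, 7, 32, 0, 256, "min"))
```

Flattened, that is a few tens of thousands of features per time point, so the per-timestep matrix itself is small; the cost lies in computing the windowed correlations that fill it.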

(Continuation.) In terms of how I'd actually perform the data reduction, I've been thinking and researching, and have basically come to the view that for a first experiment just doing 3x3 agglomeration of Ludescher et al's points should be quick and easy (particularly since I'm new to R). Once prototype calculations have been done, I'd be inclined to try to actively cluster the points based not necessarily upon spatial proximity but on a score based on the "degree of variation in their correlation values". In terms of multiclass clustering algorithms, two promising ones seem to be:

1. [Multiclass spectral clustering](http://en.wikipedia.org/wiki/Spectral_clustering).
2. [Sparse PCA](http://en.wikipedia.org/wiki/Sparse_PCA).

Unfortunately I haven't been able to find any downloadable code that will just do those operations on a dataset.

Surface air temperature and sea surface temperature are highly correlated over the ocean. For that reason, it probably doesn't matter too much which you use.


Nathan - thanks! But the TAO/TRITON array gives temperatures somewhat deeper, too, and I'd expect them to give new information, since the boundary between warmer and colder water moves in interesting ways as an El Niño starts.


David Tweed wrote:

> Hi John, the wiki is smoking for me at the moment so I haven't been able to update any notes.

Doesn't deleting all cookies containing the string "Azimuth" solve the problem? I just do that each time this bug hits; it works every time.

I didn't know about clearing the cookies until yesterday, when I tried it and it did work.


I thought you were mainly talking about ocean surface temperatures. Yes, if you get down into the mixed layer, that will likely be useful. Some examples of using just the depth of the thermocline (ignoring actual temperatures) as a predictor are [Newman et al. (2011)](http://link.springer.com/article/10.1007%2Fs00382-011-1034-0) and [Aiken et al. (2013)](http://onlinelibrary.wiley.com/doi/10.1002/grl.50264/abstract).