Thinking about it some more, I don't hold much hope that a credentialed climate scientist would be interested in the approach I am taking. I think it may be too barebones. By that I mean that I tend to work the physics down to the bone, so that all that is left is the first-order impacts.

Recently I have been running a machine learning algorithm on the ENSO SOI data set. The only objective constraint I applied was to have the learning algorithm minimize the difference between the left- and right-hand sides of the wave equation:

$ \frac{d^2 x(t)}{dt^2} = -k(t) x(t) + f(t) $

So it essentially tries to find symbolic representations of f(t) (and, for the Hill/Mathieu variant of the wave equation, a non-constant k(t)).
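As a minimal sketch of the objective (not the internal Eureqa implementation), the quantity being minimized can be written as the residual of the wave equation on a uniform time grid, using a central difference for the second derivative; the function name and synthetic check here are my own illustration:

```python
import numpy as np

def wave_residual(x, f, k, dt):
    """Residual of x'' + k(t)*x - f(t) on a uniform time grid.

    x, f, k are arrays sampled at spacing dt; a central difference
    approximates the second derivative (the two endpoints are dropped).
    """
    d2x = (x[2:] - 2 * x[1:-1] + x[:-2]) / dt**2
    return d2x + k[1:-1] * x[1:-1] - f[1:-1]

# Synthetic check: x = sin(w*t) with constant k = w^2 and f = 0
# satisfies x'' = -k*x exactly, so the residual should be ~0.
dt = 0.01
t = np.arange(0, 10, dt)
w = 2.0
x = np.sin(w * t)
k = np.full_like(t, w**2)
f = np.zeros_like(t)
res = wave_residual(x, f, k, dt)
```

A symbolic regression run is then a search over candidate expressions for f(t) (and k(t)) that drive this residual toward zero against the SOI data.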

Not unexpected, but certainly validating: the machine learning discovers solutions that substantiate the results I get from solving the DiffEq directly in Mathematica.

This is the raw SOI data:

![SOI Raw](http://imageshack.com/a/img673/4210/pIUv0i.gif)

This is an example solution that the machine learning finds, where *D* is the derivative operator (applied twice) and *sma* is a smoothing filter (applied as a triple window to remove ringing):
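For concreteness, a triple-window moving average can be sketched as below; this is my own reading of the *sma* notation, assuming a simple boxcar filter applied three times:

```python
import numpy as np

def sma(x, window):
    """Simple moving average via convolution ('same' keeps the length,
    at the cost of edge effects near the boundaries)."""
    kernel = np.ones(window) / window
    return np.convolve(x, kernel, mode="same")

def triple_sma(x, window):
    """Three passes of the moving average. Repeated boxcar passes
    suppress the sidelobe ringing that a single boxcar leaves behind."""
    for _ in range(3):
        x = sma(x, window)
    return x
```

In the interior of the series a constant input passes through unchanged, which is the sanity check for any smoothing filter.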

![Example Solution](http://imageshack.com/a/img673/2871/lxC5GR.gif)

The algorithm is Eureqa's symbolic regression. The second derivative of the data is plotted alongside the one generated by Eureqa (blue = data, red = model fit).

![Example Profile](http://imageshack.com/a/img538/622/uYpOrO.gif)

As I have found earlier, the f(t) function has a profile similar to the Quasi-Biennial Oscillation (QBO), which is known to have a relationship to ENSO as a wind-forcing mechanism. The SOI data from 1900 to 1980 was used as the training interval, and the symbolic expression was then extrapolated outside this interval:

![Training Interval](http://imageshack.com/a/img673/3219/IizGGc.gif)

QBO data is only available from 1953 onward, when radiosonde measurements became common. If we plot f(t) and QBO(t) together, one can see how they correlate for the lower-altitude 70 hPa QBO data. Remember that only the SOI data from 1900 to 1980 was used for training, yet f(t) correlates with the QBO _after_ 1980.

![Training Interval Fit](http://imageshack.com/a/img673/5668/5qAAXc.gif)

The fundamental period of the QBO is 2.33 years, and this lines up with the higher-altitude 20 hPa QBO data:

![Simple QBO fit](http://imageshack.com/a/img673/8396/MYxsTj.gif)

Another period found by the machine learning corresponds to the 6.47-year Chandler wobble (CW) cycle. Like the QBO, this was inferred previously and has its origin as a mechanical forcing due to changes in the angular momentum of the Earth.

So the next idea is to feed the machine learning these periods, QBO = 2.33 years and CW = 6.47 years, as well as the Hale sunspot cycle of 11 years, and see how well it can fit the data.
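Feeding in fixed periods can be sketched as an ordinary least-squares fit to a sum of sinusoids whose periods are held fixed while the amplitudes and phases stay free; this is a simplified stand-in for what the symbolic regression does, with function and variable names of my own invention:

```python
import numpy as np

# Candidate periods (years) supplied to the fit, as discussed above:
# QBO, Chandler wobble, and the Hale sunspot cycle.
PERIODS = [2.33, 6.47, 11.0]

def fit_fixed_periods(t, y, periods):
    """Least-squares fit of y(t) to a constant plus sinusoids with the
    given fixed periods. The a*cos + b*sin parameterization is linear,
    so amplitude and phase of each period come out of one lstsq call."""
    cols = [np.ones_like(t)]
    for P in periods:
        w = 2 * np.pi / P
        cols.append(np.cos(w * t))
        cols.append(np.sin(w * t))
    A = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef, A @ coef

# Synthetic check: a pure 2.33-year sinusoid should be recovered exactly.
t = np.linspace(1880, 2014, 1609)  # roughly monthly sampling
y = 1.5 * np.sin(2 * np.pi * t / 2.33)
coef, yhat = fit_fixed_periods(t, y, PERIODS)
```

With real SOI data the fit is of course far from exact, but the same basis-function idea applies.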

The twist is to use Clarke's estimated value of K = 2.2 rad/yr^2 for the ocean's wave resonance and group the x(t) terms on the LHS, assuming that k(t) = K is constant over time. (It likely is not, according to liquid-sloshing theory, but it serves as a simplifying approximation here.)

$ \frac{d^2 x(t)}{dt^2} + K x(t) = f(t) $

In other words, the LHS represents mainly data and the RHS the unknown forcing f(t):

LHS data ~ RHS forcing

The other factor we add is a conditional that lets the algorithm account for the Pacific climate shift that occurred around 1976-1977, so that we can use the entire 1880-2014 time series. Eureqa was also set to minimize the correlation-coefficient error, as this homes in on a solution most efficiently.
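The conditional can be sketched as a piecewise forcing that switches character at the shift; the exact transition date and the function names here are my own assumptions for illustration:

```python
import numpy as np

def forcing_with_shift(t, f_before, f_after, t_shift=1976.5):
    """Piecewise forcing that lets the fit change character at the
    ~1976-77 Pacific climate shift (t_shift = 1976.5 is an assumed
    midpoint, not a value taken from the fit)."""
    return np.where(t < t_shift, f_before(t), f_after(t))

# Toy usage: same sinusoid before the shift, doubled amplitude after.
t = np.array([1950.0, 1976.0, 1977.0, 2000.0])
f = forcing_with_shift(t,
                       lambda t: np.sin(t),
                       lambda t: 2.0 * np.sin(t))
```

Giving the regression a conditional like this lets one model span the full record instead of splitting it into two separate fits.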

![SOI Solution](http://imageshack.com/a/img540/3766/SUcFcK.gif)

The result I picked from the set of solutions is the highest complexity on the Pareto front. This is actually not as high a complexity as it looks, because the 1976 transition essentially doubles the solution-space complexity. The blue is the data and the red is the symbolic solution discovered. Look carefully and you can see how well the peaks are covered and how the 1976 transition reveals itself.

![SOI Profile](http://imageshack.com/a/img913/5517/EvH9gl.gif)

The correlation coefficient is illustrated by a model vs data scatter plot:

![SOI Scatter](http://imageshack.com/a/img913/4093/p2cLf3.gif)

The symbolic regression fit that Eureqa found is also presented in a parameterized form. Each of the factors has some meaning, which I will go through in a separate post. The behavior after 1976 is also very interesting, as it could be an effect of a change in TSI (total solar irradiance), a change in ocean characteristics, or perhaps a resonance in the wave equation or an instability manifested in the sloshing Hill/Mathieu variant. I see this in DiffEq solutions when k(t) modulates too strongly and at a specific frequency.

![SOI Model](http://imageshack.com/a/img673/295/gFsHEX.gif)

These are the statistics. The correlation coefficient of 0.76 is very good for such an erratic and noisy signal source.

![Model Stats](http://imageshack.com/a/img661/6949/cq18H1.gif)

One can do all this with a spreadsheet as well, by implementing a moving average in one cell formula and then a discretized second derivative computed from neighboring cells.
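The spreadsheet pipeline translates directly into a few lines of array code; this sketch assumes a 3-cell boxcar average followed by the standard central-difference stencil (x[i-1] - 2*x[i] + x[i+1]) / dt^2:

```python
import numpy as np

def smoothed_second_derivative(x, dt, window=3):
    """Spreadsheet-style pipeline: a moving average per 'cell',
    followed by a discretized second derivative formed from the
    neighboring cells. Endpoints are dropped by the stencil, and
    values near the edges are distorted by the convolution."""
    kernel = np.ones(window) / window
    xs = np.convolve(x, kernel, mode="same")
    return (xs[2:] - 2 * xs[1:-1] + xs[:-2]) / dt**2

# Sanity check: the second derivative of t^2 is 2 everywhere, and a
# boxcar average only shifts a quadratic by a constant, so interior
# values of the result should be ~2.
dt = 0.1
t = np.arange(0, 5, dt)
d2 = smoothed_second_derivative(t**2, dt)
```

The same stencil is what the *D*-applied-twice operator in the Eureqa solutions reduces to on discretely sampled data.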

The overall approach is very straightforward, as it maps well to the brute-force Mathematica DiffEq integration I have been favoring elsewhere. Moreover, it provides further substantiation for the simple first-order physics that I am applying.
