
Symbolic Regression machine learning and ENSO time series

I applied a symbolic regression machine learning algorithm to the SOI time series from 1880 to the present time.

The input was the target expression

$ \frac{d^2 soi(t)}{dt^2} = f(soi(t), t) $

and the algorithm came up with the following expansion for $f$:

$ \frac{d^2 soi(t)}{dt^2} = [ A + B \cos(\omega t) \cos(2\omega t) ] \, soi(t) + Q \, qbo(t) + K_1 \cos(\omega t) + K_2 \cos(2 \omega t) + W \cos(t) $

The value of $\omega$ corresponds to a period of approximately 20 years, the QBO term matches the observed 28-month period along with its frequency modulation (see the discussion earlier this week), and the last term matches the beat frequency of the Chandler wobble, with a ~6.4-year period.

What the symbolic regression learner (Eureqa) did was approximate the second derivative in terms of the LHS rigidity (the A and B terms) and the RHS forcing terms (Q, the K's, and W). It tried to simplify the expression by extracting as many common factors as possible -- which explains the prevalence of the $\omega$ terms.

This is not the complete solution, though, so the next step was to solve the equation numerically in Mathematica. This involved tweaking the parameters and initial conditions enough to maximize the correlation coefficient between the model and the SOI time-series data.
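
As a rough sketch of that workflow, the following Mathematica fragment shows the general shape of the calculation; the numeric parameter values, the initial conditions, and the stand-in qbo[t] forcing are illustrative placeholders, not the fitted values.

    (* Sketch: integrate the regressed DiffEq, then score it against the
       SOI data; every numeric value below is an illustrative placeholder *)
    omega = 2 Pi/20.;              (* ~20-year modulation *)
    qbo[t_] := Cos[2 Pi t/2.33];   (* stand-in ~28-month QBO forcing *)
    sol = First@NDSolve[
        {soi''[t] == (-2.5 + 0.5 Cos[omega t] Cos[2 omega t]) soi[t] +
            1.0 qbo[t] + 0.3 Cos[omega t] + 0.3 Cos[2 omega t] + 0.2 Cos[t],
         soi[1880] == 0, soi'[1880] == 0.5},
        soi, {t, 1880, 2013}];
    model = Table[soi[t] /. sol, {t, 1880, 2013, 1/12}];
    (* the parameters and initial conditions are then tweaked to maximize
       Correlation[model, soiData] against the monthly SOI data *)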

soi-loess

I highlighted the regions that appear the most problematic. The first yellow area, in 1982, corresponds to the significant El Nino of 1982/1983 and the volcanic eruption at El Chichon. The second highlighted area, starting around 1991, corresponds to the Pinatubo eruption. I have read that causal relationships between volcanic eruptions and ENSO extremes are tenuous at best [1], but they probably need to be considered to explain the deviations. In other words, volcanic eruptions could be considered disruptive links, to say the least.

[1] Frölicher, Thomas Lukas, Fortunat Joos, Christoph Cornelius Raible, and Jorge Louis Sarmiento. "Atmospheric CO2 Response to Volcanic Eruptions: The Role of ENSO, Season, and Variability." Global Biogeochemical Cycles 27, no. 1 (March 2013): 239–51. doi:10.1002/gbc.20028.


Comments

  • 1.

    This is the chart of the 3-term emulation that I am using for the QBO forcing factor in the above SOI model

    ![QBO3term](http://imageshack.com/a/img661/9379/p1sVbP.gif)

    Note how well it is synchronized to the data as it follows the subtle frequency modulation -- not perfect but maybe good enough to emulate the true forcing behavior.

    It is also important to realize that QBO measurements only go back to 1953, so I am using this emulation to back-extrapolate the forcing to the start of the SOI time series in 1880. In other words, we don't have any data for the QBO over the 73 years from 1880 to 1953, so this emulation is the best that we can do.
    Yet, remarkably, the correlation coefficient of the SOI model against the SOI time series is better over the early interval (0.67) than over the later interval (0.60), with an average of 0.635 over the entire span.
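
    For reference, that interval comparison amounts to a split correlation along these lines, where soiData and soiModel are hypothetical names for the data and model lists sampled monthly from 1880:

        (* Sketch: score the model before and after 1953, the start of
           direct QBO measurements *)
        split = (1953 - 1880)*12;    (* number of months up to 1953 *)
        ccEarly = Correlation[soiData[[;; split]], soiModel[[;; split]]];
        ccLate  = Correlation[soiData[[split + 1 ;;]], soiModel[[split + 1 ;;]]];
        ccFull  = Correlation[soiData, soiModel];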

    Getting there ...

  • 2.

    This is a concise example of how the symbolic regression of [Eureqa](http://www.nutonian.com/products/eureqa/) works.

    As a premise, consider that we want to identify a Mathieu function in some time series.

    ![MathieuC](http://imageshack.com/a/img745/5276/y8ydrY.gif)

    This function is the even solution of the following Mathieu differential equation

    $ f''(t) + [a-2q \cos(2\omega t)] f(t) = 0 $

    To see if Eureqa can find this solution, we compose a trial target expression from an unknown combination of mathematical factors. The analysis trick here is to leave the function expressed as the second derivative, to see if we can discover the remaining terms.

    ![MathieuProblem](http://imageshack.com/a/img540/7981/9xZZBa.gif)

    The Mathematica data file from the first chart -- the solution to this Mathieu DiffEq -- is entered into Eureqa:

    $ f''(t) + [2.8-5.6 \cos(2t)] f(t) = 0 $
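
    A data file of this kind can be generated directly from Mathematica's built-in Mathieu function; the sketch below assumes an arbitrary sampling range, step, and file name:

        (* Sketch: tabulate the even Mathieu function for a = 2.8, q = 2.8,
           so that a - 2 q Cos[2 t] = 2.8 - 5.6 Cos[2 t], along with its
           second derivative taken directly from the DiffEq itself *)
        f[t_] := MathieuC[2.8, 2.8, t];
        fpp[t_] := -(2.8 - 5.6 Cos[2 t]) f[t];   (* f'' via the Mathieu DiffEq *)
        Export["mathieu.csv", Table[{t, f[t], fpp[t]}, {t, 0., 50., 0.1}]];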

    For this example, it takes Eureqa a couple of seconds to home in on the result, with a negligible error. The solution to look for is the one that appears at the knee of the Pareto frontier in the lower-right panel below.

    ![MathieuSolution](http://imageshack.com/a/img537/1158/1YahMr.gif)

    The numbers it returns are close to the Mathieu-function inputs: 2.795 ~ 2.8, 5.584 ~ 5.6, and 2 = 2.

    That's all it takes -- input the time series and the machine learning does the rest.

    Unfortunately, real problems come with noise and confounding factors. Still, the bottom line is that this approach saves a lot of time and guards against accusations of over-fitting. What the Eureqa solver discovers has no underlying subjective biases; it only finds results based on generalized symbolic expressions. So when I apply it to an ENSO time series such as the SOI, and use it to reconstruct the differential equation solution shown in the top post, you have to take the results as more than a coincidence. In other words, I am not telling it to find the QBO factor of Comment #1 --- it does that all on its own. Pretty neat!

    BTW, this is what Eureqa claims according to the blurb on its web site:

    "Traditional machine learning techniques like neural networks and regression trees are capable tools for prediction, but become impractical when 'solving the problem' involves understanding how you arrive at the answer.

    Eureqa uses a breakthrough machine learning technique called Symbolic Regression to unravel the intrinsic relationships in data and explain them as simple math. Using Symbolic Regression, Eureqa can create incredibly accurate predictions that are easily explained and shared with others"

    I think the "understanding" part is valuable, and why I have shied away from neural networks and EOFs and other approaches that tend to obscure the underlying reveal symbolic patterns and expressions.

  • 3.
    edited October 2014

    Paul

    I do not mean to be a party pooper, but I get nothing close to your computations! The reason is that you are smoothing the data in your code with MeanFilter[].

    If you decompose the signal with wavelets, you get multiple trends with rather high amplitude, and it is not clear which one to use! If you use MeanFilter[], then the data is too smoothed, IMHO.

    If you do not decompose the data, I can predict the next time interval with OK accuracy, but not the next 6, which is what John needs.

    "I think the “understanding” part is valuable, and it is why I have shied away from neural networks and EOFs and other approaches that tend to obscure the underlying symbolic patterns and expressions."

    This is so because the Trend of the signal is periodic; therefore sin()/cos()-based formulas can approximate it to fair accuracy -- that is, if you decompose the signal and use the Trend. If you do not decompose the signal, I doubt you get the symbolic equations you are obtaining. I am curious whether you could?

    There are some aspects of planetary physics which are tied to quite simple equations, and the calendaring of events is easily computable with simple formulas; this is one of many examples:

    [lunar calendar formulation](http://www.hermetic.ch/cal_stud/lunarcal/luncal.htm)

    For Tides:

    [Some tides formulation](http://www.linz.govt.nz/sites/default/files/docs/hydro/tidal-info/tide-tables/mfth-between-hlw.pdf)

    But I suspect, from earlier comments by John, that some seemingly slight variations could cause a differential equation with a noisy damper/forcing to have drastically unstable solutions. If you smooth the signal, you get rid of the noisy damper/forcing, and you get ordinary second-order equations with sin() and cos().

    D

  • 4.

    I have made a lot of progress recently on understanding the behavior of ENSO, despite a few false starts early on. I guess one of the problems with machine learning is that you will likely never duplicate the results precisely -- the way these algorithms work, the search path depends on so many factors, and not many of them are actually controllable. Moreover, the people who developed Eureqa are constantly providing updates to their algorithm, so I doubt that a regression test is even worthwhile. The bottom line is that after putting the experiment aside many times over the past year, the last time I tried the Eureqa machine learning, the results finally clicked and everything apparently fell into place.

    The fact that the component the machine learning algorithm found as a forcing factor is an almost exact match to the behavior of the QBO makes the evidence overwhelming that this is the right path to take. It is important to realize that what the machine learning found matches the (1) phase, (2) period, and (3) intricate frequency modulation of the QBO waveform, as shown in Comment #1 above. And when we put that together with a solid differential equation solution, I am confident that I have a solution I can stand by.

    Do a Google search for the number of references that speculate that the QBO is a strong driver of ENSO, and we also have a scientific consensus to draw from. The difference here is that this is probably the first time that someone has actually fed the QBO forcing into a model of the ENSO response function (i.e., the Mathieu DiffEq) and produced something that matches the results over the past 130+ years this effectively. I am really starting to see some daylight here and can now concentrate all my efforts on tailoring the model and writing up a more formal white paper.

    Now on to your specific concerns on the filtering I am doing.

    First of all, the SOI measure is very noisy at the sub-annual level, in particular in comparison to the SST measures. I tend to low-pass filter the SOI to the point that it matches the SST in noise ripple level. I do not think there is anything wrong with that.

    Second, the high-pass filtering I am doing is critical to revealing the wave-equation-like solution -- having a strong DC level or low-frequency offset does no good when one is trying to find the true zero crossings that are the hallmark of solutions to a resonant second-order differential equation. Once I find the solution, having applied the filtered waveform to discriminate the results more precisely, I can always add the low-frequency multi-decadal factor back in.

    And the real question is: how could my filtering have biased what the machine learning found? Did I destroy the original waveform so much that it forced a hidden QBO waveform to appear spontaneously? That would be preposterous, to say the least.

    Thanks for the comments -- this is where the discussion is getting good. I will add more commentary as I digest your other concerns.

  • 5.

    Hello Paul

    Your work is astounding, and I do not wish you to think the computations you made are not useful -- they are breakthroughs in their own right. But they are global fitters, i.e. a single equation models/constrains the signal globally. So they work fine with smoothed periodic data, which is grand.

    All machine learning algorithms are not much more than non-linear adaptive approximators -- in continuous cases, the usual calculus approximators (remember the epsilon-delta exercises in first-year calc). They behave as if they are learning and adapting, but nonetheless they are simply numerical or symbolic fitting functions.

    What is terrific about them is that they are data agnostic, which cannot be said of most curve-fitting numerical methods, and they are adaptive, which means they are recomputed at each x or t of the input.

    This is what your code does: it applies MeanFilter[] on top of the data, which removes the irregularities and smooths it. Then you symbolically fit a function (or parametric functions), or you fit a differential equation; either way you have a global data-fit of the signal. Since the signal is periodic, due to the nature of weather patterns on this planet, this seems to work for fitting or forecasting the Trend part of the data with sin() and cos(); perhaps complex powers would be better to try.

    Consequently, what your code does, and does well, is fit the smooth Trend part of the signal.

    But it does not model the noise in the signal, which I understand might be what is driving the weather patterns we see; it is those lesser fluctuations that are causing the weather disasters we are seeing.

    So, using wavelets, I do similar Trend fitting to yours; however, it gives good results for 1-2 month forecasts, with no way to get the 3+ month forecasts that John was asking for.

    In effect, if the original equation was f(t) = F(t), where F is a non-smooth function, e.g. white noise, you solved f(t) = g(t), where g is a smoothed version of the noisy data. While that tells us a lot about the general shape of the data and future forecasts, it does not help with the anomalies that are encapsulated in the noise part of the data.

    I have asked another superb programmer to join us to investigate this, perhaps from your ancestral neck of the woods :) and we will try to express the above in actual code, diagrams, and equations.

    Your work is instrumental for us in understanding the shape of the data and the nature of the anomalies.

    Dara

  • 6.

    I usually use this example about riding a bike: f(t) = F(t) governs the dynamics of bike riding on two wheels, with f(t) modeling the mechanics of the bike and my body, while F(t) describes the surface of the road, its curvature, and all that. If the road is smooth, F(t) becomes a regular smooth function g(t), and then solutions of f(t) = g(t) are easy to model. This is me riding a bike on an asphalt road by Declan Spring :) This, I reckon, is your solution.

    Now add a bit of non-smoothness to F(t), e.g. a finite number of discontinuous derivatives. This happens when I ride the bike with my huskies leashed to it: they tug hard at the bike here and there, but in between, the bike and the dogs cruise smoothly. I can manage the tugs with jerky motions of the handlebars and my body and compensate; this is where machine learning algorithms work best.

    Now add a lot of noise to the right-hand side, so that F(t) is a non-smooth function -- e.g. riding on rubble or rocky roads. Some kinds of non-smoothness in F(t) cause unstable solutions, i.e. I wobble and go head first into the road :) In this case any algorithmic solution fails!

    Dara

  • 7.

    "But it does not model the noise in the signal, which I understand what might be causing the weather patterns we see, it is those lesser fluctuations that are causing the weather disasters we are seeing."

    I don't know about everyone else, but I am not attempting to solve weather. I am trying to solve climate change --- and this happens on a longer scale than weather.

    What does the wavelet analysis look like on the multi-decadal intervals?

    ![wavelet](http://imagizer.imageshack.us/a/img908/2922/Ivn2ox.gif)

    This model, which I optimized with respect to the last 20 years, seems to go in and out of agreement over the decades. Yet there may be a stronger and slightly different pattern emerging now.

    ![waveletRecent](http://imageshack.com/a/img909/1674/HKPSlt.gif)

    The possible confounding factor is whether CO2 is actually impacting the recent behavioral change.

    The invariant throughout this time span is the rather constant QBO forcing. What appears to be changing is the forcing and modulation at periods from 5 to 20 years.

  • 8.

    Great, so the scalograms show that towards the axis origin along the y-axis, i.e. the Trend (please check), the two signals are quite similar (around index 4-6).

    This is because, if you look at your code closely, you passed SOIT to ContinuousWaveletTransform[], and SOIT was made earlier from the filtered P via a call to MeanFilter[] with SOIL, which I assume is the original data.

    So what you are showing is that if you denoise SOIL, calling the result SOIT, then SOIT has a governing differential equation with sin() and cos() coefficients.

    This is precisely what I also get, not just from this data but from any other correlated data. So you should be able to forecast the next 1 or 2 values, or the up vs. down of the next two values.

    You won't be able to get the next 6 months!

    Dara

  • 9.

    BTW Paul, you might want to replace MeanFilter with DiscreteWaveletTransform, since MeanFilter is crude and does not preserve the peaks and valleys as well as wavelets do.
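
    A minimal sketch of that substitution -- the wavelet choice and refinement level are arbitrary, and soiRaw stands in for the raw monthly SOI list:

        (* Sketch: wavelet thresholding as a gentler alternative to MeanFilter *)
        soiRaw = RandomReal[{-3, 3}, 1600];   (* stand-in for the SOI data *)
        dwd = DiscreteWaveletTransform[soiRaw, DaubechiesWavelet[4], 6];
        denoised = InverseWaveletTransform[WaveletThreshold[dwd]];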

    D

  • 10.
    edited October 2014

    This is all that the mean filter is doing with respect to the wavelet transform:

    ![waveletFilter](http://imageshack.com/a/img674/5261/Pb6Ttn.gif)

    The top is untreated and the bottom has the mean filter applied.

    All I was trying to do was remove the low-frequency modulation.

    Perhaps you are getting confused by this code:

    MeanFilter[SOIL[[All,2]],0]
    

    That call actually doesn't do anything, because the radius parameter of 0 passes the signal through unmodified. On the other hand, the low-frequency mean filter with radius 135 removes the slight modulation of the signal at periods of 20 years and longer.
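
    To make the two calls concrete, here is a small sketch, with soiRaw standing in for SOIL[[All,2]]:

        (* Sketch: radius 0 is an identity; radius 135 is a 271-month
           (~22-year) running mean, so subtracting it acts as a high-pass *)
        soiRaw = RandomReal[{-3, 3}, 1600];   (* stand-in for SOIL[[All,2]] *)
        passThru  = MeanFilter[soiRaw, 0];    (* identical to soiRaw *)
        lowFreq   = MeanFilter[soiRaw, 135];  (* window of 2*135 + 1 months *)
        detrended = soiRaw - lowFreq;         (* multi-decadal drift removed *)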

    And the SOI signal has to be treated to remove the sub-annual fluctuations; otherwise the seasonal noise is too strong.

    This is what the main site says: http://climatedataguide.ucar.edu/climate-data/southern-oscillation-index-soi

        Key Strengths: Simple time series
        Key Limitations: Raw monthly values are noisy; must be smoothed

    The SST indices do not suffer from this as much, since the ocean has a built-in low-pass filter due to its large thermal mass. Atmospheric pressure, however, has a much lower thermal mass and so is subject to significant daily fluctuations, which, unless filtered, would make the output unusable.

    Perhaps I shouldn't be using the SOI, as the noise always seems to be troubling, even though, when it is filtered properly, it looks not much different from the NINO3 or NINO3.4 index.

  • 11.

    These wavelet scalograms are not the easiest diagrams to interpret.

    The following is the wavelet scalogram of a Mathieu function. The pattern looks vaguely periodic, but not really on close inspection. That is in keeping with the quasi-periodic solutions of the Mathieu differential equation.

    ![QBOScalagram](http://imageshack.com/a/img538/5320/tBJk5R.gif)

    On the other hand, here is the wavelet scalogram of the model for QBO, which is a frequency modulated sinusoid.

    ![MathieuScalogram](http://imageshack.com/a/img538/7039/gFIjNZ.gif)

    This is a bit easier to interpret as you can see the modulation of the fundamental QBO frequency.
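
    Scalograms of this kind take only a few lines to reproduce; in this sketch the frequency-modulated sinusoid is only a stand-in for the actual QBO emulation:

        (* Sketch: scalograms of a Mathieu function and of a frequency-
           modulated sinusoid resembling the QBO model *)
        mathieu = Table[MathieuC[2.8, 2.8, t], {t, 0., 100., 0.1}];
        qboLike = Table[Cos[2 Pi (t/2.33 + 0.2 Sin[2 Pi t/20.])], {t, 0., 100., 0.1}];
        WaveletScalogram[ContinuousWaveletTransform[mathieu]]
        WaveletScalogram[ContinuousWaveletTransform[qboLike]]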

    When you start putting these together, with the QBO as a forcing function to a Mathieu DiffEq, one can imagine that the scalogram can start looking quite complex.

    One thing the scalogram does quite well is point out significant changes in the character of a time series. Having done this for the entire SOI time series, one can see a qualitative difference from beginning to end, yet the QBO forcing seems to be constant throughout. But at modulations longer than 3 years, something is definitely perturbing the model: from 1880 to 1980 one set of parameters works better, and another set works better post-1980.

  • 12.

    "On the other hand, the low-frequency mean filter with radius 135 removes the slight modulation of the signal at periods of 20 years and longer."

    Paul, if you look closely at the higher frequencies in your scalogram comparison, you see that they are altered. MeanFilter[] theoretically alters only the lower frequencies, per the 135-sample window, but it alters all frequencies across the board due to its mathematical inefficiencies.

    For that matter, the removal of such frequencies ought to be done with FFT or wavelet transforms to get a surgical removal. Even those are approximate.

    The changes, IMHO, might impact the coefficients of the diff eq, which are quite sensitive to slight alterations.

    As a rule of thumb for me -- not necessarily instructions for others -- I do not trust the assumption that filters provide perfect filtering unless I actually carry out the error analysis and see which parts of the signal were removed.

    Why?

    The FFT assumes the sin and cos waves are infinite, since sin and cos have no asymptotic decay; for wavelets, the compact support (so to speak, as t -> infinity) is of finite length. Therefore the removal of frequencies is approximate for finite signals.

    There is no way to be sure how the removal of the frequencies impacts the final signal that supplies the coefficients to another equation.

    Generally, I have seen this part of data analysis overlooked by most authors, who just filter the data thinking, "Oh! it is just a little alteration to smooth things out" -- but the underlying differential equations are sensitive to slight alterations.

    You might just ignore what I say, but I am cautious, since I have seen many, many cases of such troubles with smoothing data.

    Dara

  • 13.

    Paul, I do not understand comment #11 in this discussion.

  • 14.

    "I usually use this example about riding a bike: f(t) = F(t) governs the dynamics of bike riding on two wheels, with f(t) modeling the mechanics of the bike and my body, while F(t) describes the surface of the road, its curvature, and all that. If the road is smooth, F(t) becomes a regular smooth function g(t), and then solutions of f(t) = g(t) are easy to model."

    This is basically my wheelhouse (so to speak), as I have been supporting simulation of vehicle dynamics -- both land and water -- for the last two years at the company I work for.

    What I did specifically was characterize roads and terrains that served as test tracks for vehicle design. These were typically diverse enough to feature all the disturbances a vehicle may experience, from cobblestones to off-road. The general idea is correct in that vehicle suspensions are designed to protect the vehicle and provide the occupants with some level of driving comfort.

    The random and stochastic terrains are usually not the problem, because of the way suspensions are designed -- they can absorb most of the power that the surface imparts. What is actually more problematic are surfaces that have some regularity; these include washboard roads and rumble strips. What these can do is set up resonances with the vehicle suspension.

    ![washboardRoad](http://www.quirkyscience.com/wp-content/uploads/2012/10/ayers-rock.jpg)

    More info in this white paper that I am working on: http://entroplet.com/ref/foundation/B-terrain_characterization.pdf

    Incidentally, this kind of modeling was the reason that I set up the http://ContextEarth.com blog. I wanted to put together a site that featured all sorts of environmental models that provided context for design and analysis. I am really sidetracked on climate stuff because that is interesting as well and fits in nicely with the basic theme.

    How road dynamics connects to ENSO is that I firmly believe the ocean is acting as a behavioral response function to forcing functions of external origin. It is certainly more complicated than anything having to do with simulating vehicles on models of terrain, yet the evidence suggests that there is some natural characteristic frequency corresponding to a period of ~4 years, which is interacting with the ~2-1/3 year period forcing of the QBO, along with some longer periods having something to do with, perhaps, lunisolar forcing.

    My entire premise is that what we are seeing with ENSO is not as random as one would believe based on what is conveyed in the ENSO literature. The response just looks random because the forcing function (QBO, etc.) is not perfectly periodic, and the response function can have non-linear characteristics, due to sloshing, that also lead to quasi-periodic responses. That is what comment #11 is demonstrating. I really think that the way to solve ENSO is to find a deterministic response somewhere amidst the complexity. If it is stochastic and random, we really have no chance of predicting it far down the road. If there is a deterministic model, then we have something to work with.

  • 15.

    "I am really sidetracked on climate stuff"

    My! Your sidetrack is my life's mission :)

    The B-terrain link is not valid.

  • 16.

    "I really think that the way to solve ENSO is to find a deterministic response somewhere amidst the complexity."

    If you look at the earlier ENSO CDFs I made, you can see the deterministic patterns once you use the wavelet decomposition on the entire volume of planetary gridded-temperature data. The problem I see is that 1D time series are inadequate for finding those patterns.

    I also see in those wavelet decompositions that the patterns run east-west and north-south; in other words, the poles are quite important in forcing the patterns. In other words, there are multiple Trends in each direction.

  • 17.

    Often what I will do is filter the data so that the search process avoids falling into a local minimum. Then, once it finds a good fit, I take out the filter, and it invariably ends up close to the same set of parameters.

    So I am not sure how much it matters that a bit of residual filtering remains.

  • 18.

    "Then, once it finds a good fit, I take out the filter, and it invariably ends up close to the same set of parameters."

    This is not happening with any of my stuff -- which does not mean you did not find a way. Filtered and unfiltered data in all my work have been quite distant in behaviour.

    "So I am not sure how much it matters that a bit of residual filtering remains."

    I played with your diff EQ in the CDF I sent you, and slight changes to one equation caused a drastically different solution.

    Let's put this discussion to bed for now; I will release many coded examples, and you can rerun the code and see what I mean.

    Dara

  • 19.

    Here is an example of a test-track power spectral density profile designed to shake up any vehicle:

    ![testTrackPSD](http://imageshack.com/a/img908/3346/9xelIW.gif)

    This was data that was generated in collaboration with some colleagues at CSIR in South Africa.

    The track was essentially a set of somewhat uniformly spaced railroad ties that a vehicle had to drive over. The model is a semi-Markov sequence of pulses.
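
    To illustrate what a semi-Markov pulse sequence means here -- pulses whose spacings are random but concentrated around the mean tie spacing -- consider this sketch, where the spacing distribution and pulse shape are invented for illustration:

        (* Sketch: "somewhat uniformly spaced" pulses; gaps are drawn from a
           narrow distribution with mean 1, unlike a pure Poisson process *)
        gaps = RandomVariate[GammaDistribution[25, 1/25.], 200];
        ties = Accumulate[gaps];                               (* pulse positions *)
        profile[x_] := Total[Exp[-(x - #)^2/0.005] & /@ ties]; (* narrow bumps *)
        ListLinePlot[Table[profile[x], {x, 0, 20, 0.02}]]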

    Try out the interactive page here http://entroplet.com/context_profile/navigate

  • 20.

    Couldn't resist this one. Here is another snapshot from the terrain server, showing a Google Earth shot of the test track to the left and a profile of the surface to the right. You can see the corrugations on the test track from the aerial shot.

    ![testTrack](http://imageshack.com/a/img909/5010/E62rX2.gif)

    The intent is that vehicle designers can load the data or a synthetic version of the data into a simulation and see how their finite-element model of vehicle suspension works on the test track before they try it for real.

    You used the example of a bike traversing a rough surface, so I thought I would reciprocate with my own examples :)

  • 21.

    Back to the theme of this comment thread.

    Here is a summary:

    A high fidelity model of the QBO time series is critical as a forcing function so as to get a good fit to the ENSO SOI time series.

    Yet if you lay the QBO on top of the SOI, you would not see a strong resemblance between the two time series.

    However, the combined response of the QBO with the characteristic frequency of the equatorial Pacific is what nails the behavior.

    This is a lock in terms of a causal model for ENSO, and the next step is to scour the literature to see if someone else has applied this approach.

  • 22.

    Impressive as always! Your vehicle project is grand!

    Will get back to you in a few days on some code

    D

  • 23.

    ![twoPhase](http://imageshack.com/a/img661/2946/TxPy4p.gif)

    A characteristic adjusts between 1975 and 1980. See the wavelet scalogram: top (SOI data) and bottom (SOI model).

  • 24.
    edited October 2014

    There appears to be a regime shift in the MEI starting in 1975 and lasting until about 2000.

    ![MEI 12 month MA 1950-2013](https://dl.dropboxusercontent.com/u/61621163/Images/mei-12mnth-1950-2013.png)

    If the following regime changes are real, then this poses considerable difficulties, especially for training-data construction. Presumably these regime changes would have to be modded out in some way?

    • Christopher Torrence and Peter J. Webster, Interdecadal changes in the ENSO-Monsoon system (1998), clim_12_103.2679_2690.pdf

      Wavelet analysis of the 2-7 year variance of the time series distinguishes the high ENSO-monsoon variance periods 1875-1920 and 1960-1990 from the low-variance period 1920-1960.

      The low 2-7 year variance period 1920-1960 shows a larger 1-year variance in the Nino3 index time series, which is negatively correlated with the interdecadal ENSO variance.

      The low-coherency, weaker ENSO-monsoon connection during 1920-1960 suggested considering Eurasian snow cover as a factor (TBD).

      These raise the possibility of a return to 1920-1960 conditions, with low ENSO-monsoon variability and reduced predictability.

    Is the PDO anticorrelated with the ENSO shifts of 1920-1950 (+20) and 1970?

  • 25.

    There was an event called the great Pacific climate shift of 1976-1977.

    Thanks for the ref Jim.

  • 26.

    William Gray was one of the first to make the connection between QBO and ENSO

    [1] W. M. Gray, J. D. Sheaffer, and J. A. Knaff, "Hypothesized mechanism for stratospheric QBO influence on ENSO variability," Geophysical Research Letters, vol. 19, no. 2, pp. 107–110, 1992.

    [2] W. M. Gray, J. D. Sheaffer, and J. A. Knaff, "Influence of the stratospheric QBO on ENSO variability," J. Meteor. Soc. Japan, vol. 70, pp. 975–995, 1992.

  • 27.

    Thanks for the links :).

  • 28.

    The effectiveness of applying the QBO as a forcing to fit the SOI data is opening up some other areas. As seen in the wavelet scalogram, the interval around 1980 appears to initiate a different regime.

    What the machine learning did was take the 133-year time span over which SOI records were available and tried to find a DiffEq solution that had a minimum error over the entire interval. Besides the QBO and what looks like a [Chandler wobble](http://trs-new.jpl.nasa.gov/dspace/bitstream/2014/18398/1/99-1877.pdf) forcing beat (~6.4 years), the other modulation it found was a 20-year period with a strong harmonic at 10 years. For the chart in [comment #24](http://forum.azimuthproject.org/discussion/1504/symbolic-regression-machine-learning-and-enso-time-series/?Focus=12923#Comment_12923), I essentially removed the 20-year period and replaced the 10-year harmonic with a 10.7-year period, with the transition point separating the two regimes right at 1981.

  • 29.

    Getting back to Jim's question of whether regime changes can impact predictability, no doubt that is the case. And the fact that all of the links contributing as factors to ENSO also need to be predictable in the face of regime changes, makes it even more challenging.

    Yet some of the factors, such as QBO and the Chandler wobble and perhaps the characteristic frequency of the ocean may be invariant to regime shifts. It may only be that a few of the factors change during a regime shift. This may make it a bit easier to model the dynamics of the transition, and perhaps to follow it as it is happening. In math, changing one variable while holding all the others constant is a tried-and-true approach and that is what I did above -- only the long-term 10-to-20 year modulation appeared to change.

    The other nagging question is how well we can correlate the model with the data. Ideally, one would assume that achieving a correlation coefficient of unity would be the ultimate goal. However, the data itself may not be a true reflection of the underlying mechanism. Consider again the correlation of the Darwin and Tahiti atmospheric pressure anomalies that constitute the SOI characteristic. Note below that the Tahiti-Darwin correlation coefficient is less than 0.6, while the model correlation coefficient is above 0.7.

    ![cc](http://imageshack.com/a/img903/763/QRP2Te.gif)

    This doesn't necessarily mean that the model is any better than the data. It's like asking which watch is correct when you have two that are giving different times. Unless you have a standard that establishes the ground truth, one can never tell which is right. And that is partially what is happening here. It may not make any sense to try to improve the correlation coefficient without having a clear target to shoot for.

    --

    The first paper by Gray gives some clues to mechanisms for how the QBO can generate wind shear and thus force the ENSO variations.

    ![gray](http://imageshack.com/a/img537/9145/HFLjAd.gif)

    One detail I have noticed with the model is that it tends to follow the high-pressure Darwin values better during El Nino conditions and the high-pressure Tahiti values better during La Nina conditions. For the latter, check the right side of the first chart above, where the strong La Nina oscillations are much more pronounced during the 2010 time frame.

    It is possible that a metric that combines the high-pressure dipole behavior is a better measure of what is happening, as the model would likely show a higher correlation coefficient. But I am not sure if that would detract from the narrative.

  • 30.

    I put together a post with some latest findings -- http://contextearth.com/2014/10/19/demodulation-and-the-soim/

    A page listing all the posts on the Southern Index Oscillation Model topic -- http://contextearth.com/soim-links/

    Hint, hint - My starting to organize the information usually means that I am seeing the light at the end of the tunnel.

  • 31.

    I said earlier:

    “This is a lock in terms of a causal model for ENSO and the next step is to scour the literature to see if someone else has applied this approach.”

    I scored by finding a paper by Clarke et al. [1] that applies a resonant wave equation to ENSO and identifies a characteristic frequency for the oscillation. This is a variation of a sloshing model that treats surface temperature T and a fixed isotherm depth D (representing the thermocline) as the representative force and velocity terms in a 2nd-order DiffEq.

    ![Clarke eqn](http://imageshack.com/a/img537/4247/sOwyr7.gif)

    The mutual relationship between T and D results in a forced resonant oscillator.

    $ \frac{dT}{dt} = \nu D $ , $ \frac{dD}{dt} = -\mu T $

    Based on the observed lag relationship between T and D, they are able to estimate a characteristic period of 4.25 years for the oscillations. They further suggest that wind stress is a forcing function.
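
    As a minimal sketch of that coupled system: the two equations combine into d²T/dt² = -νμT, so the characteristic period is 2π/√(νμ). The rate constants below are hypothetical, chosen only so the period lands near Clarke's 4.25 years; the fitted values in [1] come from thermocline data.

    ```python
    import numpy as np
    from scipy.integrate import solve_ivp

    nu, mu = 1.478, 1.478                    # hypothetical rates, 1/yr
    print(2 * np.pi / np.sqrt(nu * mu))      # ~4.25 year period

    # Unforced coupled system: dT/dt = nu*D, dD/dt = -mu*T
    def rhs(t, y):
        T, D = y
        return [nu * D, -mu * T]

    sol = solve_ivp(rhs, (0, 20), [1.0, 0.0], max_step=0.05)
    # sol.y[0] oscillates as cos(sqrt(nu*mu)*t): a 4.25-year resonance
    ```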

    This is the diagrammatic explanation for their formulation.

    ![Clarke diagram](http://imageshack.com/a/img908/8266/YC9ycJ.gif)

    This is very close to the model that I am using for the SOI behavior, replacing wind stress with the known behavior of the QBO. The one significant free parameter left in the model is the characteristic frequency -- and this just happens to match Clarke’s value. Look below at the blue reverse-highlighted text, where the characteristic period is 4.2565 years!

    ![SOIM](http://imageshack.com/a/img538/3349/cyCjbY.gif)

    The basic formulation I use for the SOIM is this Mathieu/Hill equation:

    $$ f''(t) + [\omega^2 + g(t)] f(t) = qbo(t) + tsi(t) + cw(t) + trend(t) $$

    The g(t) Mathieu factor is a scaled tsi(t) plus a slowly varying background. There are really no free parameters left, other than the approximations that I have applied to emulate the other measures as closed-form expressions. A minimal numerical sketch follows the list below.

    • f(t) == SOI simulation
    • qbo == Quasi-Biennial Oscillation (20 hPa measure) -- approximated as a 28-month cycle with a 3-term frequency jitter
    • tsi == Total Solar Irradiance anomaly -- piecewise approximation of 10-11 year solar cycle with break at 1981
    • cw == Chandler Wobble beat frequency -- ~6.4 years with some dispersion about this point
    • trend == slowly varying multi-decadal forcing, PDO perhaps?
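
    Here is that sketch: a minimal numerical integration of the Mathieu/Hill equation above, assuming placeholder amplitudes for the forcing stand-ins. Only the characteristic period (4.2565 years), the QBO angular frequency (2.696 rad/year), the ~6.46-year CW beat, and the 10.7-year solar period are taken from this discussion; everything else is illustrative.

    ```python
    import numpy as np
    from scipy.integrate import solve_ivp

    omega = 2 * np.pi / 4.2565          # characteristic frequency, rad/yr

    # Illustrative stand-ins for the forcing terms (amplitudes are guesses)
    def qbo(t):   return 0.5 * np.sin(2.696 * t)              # ~28 months
    def tsi(t):   return 0.2 * np.sin(2 * np.pi * t / 10.7)   # solar cycle
    def cw(t):    return 0.3 * np.sin(2 * np.pi * t / 6.46)   # CW beat
    def trend(t): return 0.05 * np.sin(2 * np.pi * t / 60.0)  # slow trend
    def g(t):     return 0.1 * tsi(t)    # scaled TSI as the Mathieu factor

    # f''(t) + [omega^2 + g(t)] f(t) = forcing(t), as a first-order system
    def rhs(t, y):
        f, fp = y
        forcing = qbo(t) + tsi(t) + cw(t) + trend(t)
        return [fp, forcing - (omega**2 + g(t)) * f]

    t = np.linspace(0, 133, 1600)        # years since 1880
    sol = solve_ivp(rhs, (t[0], t[-1]), [0.0, 0.0], t_eval=t)
    model = sol.y[0]                     # candidate SOI emulation
    ```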

    With the substantiation that this is the true characteristic period of ENSO -- which had previously been estimated at between 4 and 6 years -- the only criterion left is to assess how well the model matches the true ENSO signal. Right now the correlation coefficient is above 0.74, and that does not yet include any synchronization to the beginning of a calendar year, which is known to trigger the strongest El Nino signal.

    So the way to think of this model is as a behavior that is priming the pump for strong El Nino behavior -- that is, when a moderately strong negative peak appears in the model, some extra start-of-the-year boost comes along and pushes it to an El Nino extreme. The negative peaks at 1982 and 1998 appear in the model but may not be quite as strong as the data show. I say that with the knowledge that the modeled El Nino peak of 1982 is the strongest both in model and data :)

    [SOIM link page](http://contextearth.com/soim-links/)

    References

    [1] Clarke, Allan J., Stephen Van Gorder, and Giuseppe Colantuono. "Wind stress curl and ENSO discharge/recharge in the equatorial Pacific." Journal of physical oceanography 37.4 (2007): 1077-1091.

    These papers cite Clarke's original work and look interesting:

    [2] Clarke, Allan J. "El Niño Physics and El Niño Predictability." Annual review of marine science 6 (2014): 79-99.

    [3] Chambers, D. P. "ENSO-correlated fluctuations in ocean bottom pressure and wind-stress curl in the North Pacific." Ocean Science 7.5 (2011): 685-692.

    [4] Fedorov, Alexey V. "Ocean response to wind variations, warm water volume, and simple models of ENSO in the low-frequency approximation." Journal of Climate 23.14 (2010): 3855-3873.

  • 32.
    edited October 2014

    Congrats Paul,

    That looks like a pretty good 5-parameter multivariate fit.

    I don't get the quote from Clarke:

    The wind stress curl anomaly is a key aspect of the dynamics and has not been properly recognised in the standard discharge/recharge model.

    I haven't read the paper yet, but I thought the standard ENSO description started with easterly wind stress piling up the warm water pool, as per the diagrams. Is it some formulation of the curl which hasn't been recognized?

  • 33.

    Jim, I think their point is that there are quite a few hand-wavy "just-so" descriptive models of how the wind plays a part, but there are very few if any models that actually use this information in a calculation. In the context of what I am trying to do, I have yet to see a similar calculation applied. So I see your point and unless I charitably account for it that way, I am just as puzzled by the statement.

    As far as the wind curl and how it ties into the QBO, the thinking is that the QBO is part of the quasi-periodic wind shear that moves downward, and that it also pulls in air from the upper latitudes.

    This is a movie of the behavior:

    http://www.ugamp.nerc.ac.uk/hot/ajh/qboanim.movie

  • 34.

    Here is the wavelet scalogram that goes with the fit in comment #32

    ![waveletScalogram](http://imageshack.com/a/img904/714/jkgkWf.gif)

    Perhaps a correlation coefficient-like metric which compares two scalograms would be useful? I see that one idea is to do the correlation coefficient of wavelet coefficients, which sounds straightforward.
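
    As a concrete sketch of that idea, assuming the PyWavelets package and two equal-length series (synthetic here), one can take the continuous wavelet transform of each and correlate the flattened coefficient matrices:

    ```python
    import numpy as np
    import pywt

    rng = np.random.default_rng(0)
    series_a = rng.standard_normal(1600)                    # stand-in data
    series_b = series_a + 0.5 * rng.standard_normal(1600)   # stand-in model

    scales = np.arange(2, 128)                 # scales of interest
    coef_a, _ = pywt.cwt(series_a, scales, 'morl')
    coef_b, _ = pywt.cwt(series_b, scales, 'morl')

    # Correlate the two scalograms coefficient-by-coefficient
    r = np.corrcoef(coef_a.ravel(), coef_b.ravel())[0, 1]
    print(f"wavelet-coefficient correlation = {r:.2f}")
    ```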

    Comment Source:Here is the wavelet scalogram that goes with the fit in comment #32 ![waveletScalogram](http://imageshack.com/a/img904/714/jkgkWf.gif) Perhaps a correlation coefficient-like metric which compares two scalograms would be useful ? I see that one idea is to do the correlation coefficient of wavelet coefficients, which sounds straightforward.
  • 35.

    This sloshing ENSO model is beginning to come together, and I sense that a few select adjustments will help to further substantiate its worthiness.

    So far, the main forcing inputs are (1) the QBO (2) the TSI anomaly and (3) something that follows the Chandler wobble ~6.4-year beat frequency.

    Modeling the TSI appears to take the most finesse. The TSI has two rough cycles: the ~11-year Schwabe cycle and the ~22-year Hale cycle. These two are related, as the longer Hale cycle aligns with [polarity shifts](http://solarphysics.livingreviews.org/open?pubNo=lrsp-2008-3&page=articlesu4.html).

    In the chart below, one can see that the model I am using has both ~11 and ~22 year features. What is most interesting is that the model lags the actual TSI value by 8 years. This is most clear at the discontinuity around 1972, which, when shifted by 8 years, lines up with the 1980 regime transition.

    This 8-year lag may be a real effect as solar thermal energy takes time to diffuse downward where it can make an impact on the thermocline.

    ![SOIM-TSI](http://imageshack.com/a/img537/5346/ZJcgsK.gif)
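
    A sketch of how such a lag can be estimated: scan candidate lags and keep the one that maximizes the Pearson correlation. The array names in the usage comment are hypothetical; with monthly data, an 8-year lag is ~96 samples.

    ```python
    import numpy as np

    def best_lag(x, y, max_lag):
        """Lag of y relative to x (in samples) maximizing correlation;
        positive lag means y is a delayed copy of x."""
        best = (0, -np.inf)
        for lag in range(-max_lag, max_lag + 1):
            if lag >= 0:
                a, b = x[:len(x) - lag], y[lag:]
            else:
                a, b = x[-lag:], y[:len(y) + lag]
            r = np.corrcoef(a, b)[0, 1]
            if r > best[1]:
                best = (lag, r)
        return best

    # lag, r = best_lag(tsi_monthly, model_tsi_component, max_lag=150)
    ```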

    What makes the TSI hard to model is that the Hale cycle is only quasi-periodic, and there are certain years, particularly before ~1900, when the cycle is noticeably longer. That is why my model seems to weaken and disperse around those years. It is also clear that the most recent solar cycle is starting to lengthen. Leif Svalgaard is one researcher who follows this very closely:

    ![leif-euv](http://imageshack.com/a/img909/1193/n372KL.png)

    http://www.leif.org/research/Keynote-SCOSTEP-2014.pdf

    He calls this chart "an accurate depiction of true solar activity since 1840"

  • 36.

    The three forcing components to the ENSO sloshing model are approximately weighted as (1) QBO as the strongest, responsible for the finer structure, and (2) TSI and (3) Chandler wobble tied for a strong second.

    All three of these have been identified in the literature as somehow related to the ENSO cycle, but the identification of the Chandler wobble (CW) has been the most obscure. Only Richard Gross of JPL [1] has really made the connection with the motion of the ocean...

    The way I have modeled the wobble is not by using the ~433-day wobble period [2][3] directly, but by beating that number against the annual cycle of 365.25 days to create a beat period. The following chart is from [4].

    ![FT-miller](http://imageshack.com/a/img674/6648/rBeCqi.gif)

    The sloshing model frequency matches the CW beat closely. From the code in my chart #35:

    $$ 2\pi/0.9728 = 6.46 \text{ years} = \frac{1}{1 - 1/1.183} $$

    which is the formula for the beat period (with 1.183 years the Chandler wobble period and 1 year the annual cycle).
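
    In code form, assuming only the periods quoted above:

    ```python
    # Beat period of the ~433-day Chandler wobble against the 365.25-day
    # annual cycle: T_beat = 1 / |1/T1 - 1/T2|, with periods in years.
    T_cw = 433.0 / 365.25                    # ~1.1855 years
    print(1.0 / abs(1.0 - 1.0 / T_cw))       # ~6.39 yr; 1.183 yr gives 6.46
    ```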

    Taking this a step further, we can plot the changes in radial polar velocity from the JPL Kalman Orientation series and plot that against the model component used to fit the SOI data:

    ![cw-FIT](http://imageshack.com/a/img910/5530/tNkTpZ.gif)

    Note that the model syncs up with the data after 1930, but is 180 degrees out-of-phase before this point. Those are known as phase jumps of the CW in the literature, kind of mysterious behaviors, AFAICT.

    But overall, this is the evidence that I consider substantiating for both qualitative and quantitative attribution of one of the ENSO forcings. Physically, we know something like this has to occur -- the Chandler wobble describes changes in the angular frequency of the rotating earth, and the ocean has to respond to this factor. Intuitively, if you rotate your body while holding a bowl of soup, any change in rotational speed will cause the soup to slosh in compensation.

    What is happening here is really no different; it's just that these 3 factors (QBO, TSI, and CW) are each obscuring one another. Since we cannot do the experiment where we hold 2 of the 3 constant while letting the third vary freely, this kind of modeling is what we are left with.

    REFS

    [1] Gross, Richard S. “The Excitation of the Chandler Wobble.” Geophysical Research Letters 27, no. 15 (2000): 2329–32.

    [2] Vicente, Raimundo O, and Clark R Wilson. “On the Variability of the Chandler Frequency.” Journal of Geophysical Research: Solid Earth (1978–2012) 102, no. B9 (1997): 20439–45.

    [3] Zotov, Leonid. “Analysis of Chandler Wobble Excitation, Reconstructed from Observations of the Polar Motion of the Earth.” Accessed October 25, 2014. http://syrte.obspm.fr/jsr/journees2010/pdf/Zotov.pdf.

    [4] Miller, N, and Z Malkin. “Analysis of Polar Motion Variations from 170-Year Observation Series.” arXiv Preprint arXiv:1304.3985, 2013.

  • 37.

    I apply three ENSO forcing factors -- QBO, TSI anomaly and Chandler wobble -- that are critical to finding a good fit to the ENSO time series.

    The CW forcing aligns very well with measurements of the polar motion absolute velocity variations -- but only after the year ~1925.

    ![cw_qbo](http://imageshack.com/a/img661/2538/uWPodb.gif)

    This is essentially a plot of the rate of change in (x,y) as defined here:

    http://hpiers.obspm.fr/eop-pc/index.php?index=pm
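
    A sketch of that computation, assuming the (x, y) pole coordinates have been loaded as equal-length numpy arrays sampled every dt years (the names are placeholders):

    ```python
    import numpy as np

    def polar_velocity(x, y, dt):
        """Absolute velocity of the pole from an (x, y) polar-motion series."""
        vx, vy = np.gradient(x, dt), np.gradient(y, dt)
        return np.sqrt(vx**2 + vy**2)        # rate of change of (x, y)
    ```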

    Before 1925, I tried to invert the phase of the waveform but the fit became worse and not better. As a matter of fact, applying adjustments to keep the phase inverted actually improved the fit -- see the sharp cusp at 1910.

    As of 2012, scientists still refer to this phase reversal as "baffling" [1]. The amplitude of the wobble becomes very small here and so it is possible that it is a metastable point. Realistically, I can't really be sure if the measure of the CW is a side effect of some other forcing that does indeed have a more periodic component, and this phase reversal is a "skip" in the beat caused by a subtle forcing change starting around 1910 and extending past 1925.

    [1] Chao, Benjamin F, and Wei-Yung Chung. “Amplitude and Phase Variations of Earth’s Chandler Wobble under Continual Excitation.” Journal of Geodynamics 62 (2012): 35–39.

  • 38.
    nad
    edited October 2014

    As of 2012, scientists still refer to this phase reversal as “baffling” [1]. The amplitude of the wobble becomes very small here and so it is possible that it is a metastable point.

    This seems to be an 80-year periodic phenomenon, at least this is what this [article](http://arxiv.org/abs/1304.3985) suggests using data from Pulkovo Observatory (or an [article from 2009](http://arxiv.org/abs/0908.3732) on this issue).

  • 39.

    Paul

    If you treat the curve as a differentiable one, with the given data sets we could not get far in finding a snug fit using differential operators (unless we do severe overfitting). For the related QBO I tried to use the backpropagation NN, which is nothing but a particular minimization algorithm that reduces error by means of differential operators.

    The given data sets get stuck around 13% accuracy. I ran the same algorithm on stock prices and got quite a snug fit. Somehow the data for QBO is either missing lots of interim points or curves too fast.

    I believe the same is happening to the differential equation version as well. This is due to assuming and using differentiability.

    Dara

  • 40.

    Paul, I think a piecewise differential model is better: divide the data into piecewise differentiable sub-curves, let each have its own distinct equation, and then tie them together as a simultaneous system of equations.

    I am concluding that the atmospheric and oceanic structures are non-differentiable in large part, and require a different formulation than vanilla multivariate calculus.

    Dara

  • 41.
    edited October 2014

    Dara, I understand your concern perfectly. I tried doing the same thing of applying the QBO curves exactly as they exist. Unfortunately, as you have found out, they do not work very effectively in that way. The QBO data at 20 hPa --- which is in the stratosphere --- has to drop in altitude as a wind shear before it impacts the ocean surface. By that time, the jitter in the waveform will likely change. So the only invariant I strictly kept was the overall period, which was fixed solid at ~28 months, or 2.696 rad/year. The jitter was relaxed until it started to lock into a fit. The other point to remember is that the lag of the effect is also unknown --- a QBO forcing may not be immediate if it has to propagate downward into the thermocline. So the variability in the jitter is about right in terms of an RMS, but not exact to any concrete measure.
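
    For concreteness, a minimal sketch of this kind of frequency-jittered emulation: the carrier is pinned at 2.696 rad/year while the phase wanders slowly. The jitter amplitudes and frequencies shown are illustrative placeholders, not the fitted values.

    ```python
    import numpy as np

    W_QBO = 2.696                            # rad/yr, held fixed (~28 months)

    def qbo_emulation(t):
        # Slow sinusoidal phase perturbations emulate the frequency jitter
        jitter = (0.3 * np.sin(0.1 * t) + 0.2 * np.sin(0.4 * t)
                  + 0.1 * np.sin(0.7 * t))   # placeholder terms
        return np.sin(W_QBO * t + jitter)

    t = np.linspace(0, 133, 1600)            # years since 1880
    qbo = qbo_emulation(t)
    ```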

    The same thing is probably holding for the TSI, where it appears the lag is more likely several years.

    On the other hand, the Chandler wobble effect may be more immediate as it is a "mechanical" effect and directly coupled as an angular momentum change to the ocean volume.

    BTW, if I place the TSI and CW data into Eureqa as factors with my D(SOI,t,2) approach, the symbolic regression will use those terms and not sinusoids to fit the data. Eureqa would rather use a sin(2.696t) & cos(2.696t) combination for the QBO, though.

    As you can see, it is getting to the point that I have to start making these qualified assertions to be able to make sense of what is going on. It is entirely possible that something else is happening, but I can only draw on what other scientists have been offering as associations.

  • 42.

    I believe the computational methods are fine, both the machine learning and differential equation software solvers.

    I believe we are (a) using the data the wrong way, and (b) applying the mentioned techniques the wrong way as well.

    Dara

  • 43.

    This seems to be an 80-year periodic phenomenon, at least this is what this article suggests using data from Pulkovo Observatory (or an article from 2009 on this issue).

    Yes nad, that is the article that I referenced in comment #37. If the 80-year repeat sequence is correct, then the sequence is 1845, 1925, 2005. My data for the absolute velocity variation time-series ends in 2000, so I can't see the phase shift that they say occurs in 2005. http://www.technologyreview.com/view/415093/earths-chandler-wobble-changed-dramatically-in-2005/

    I was using the POLE99 data from JPL, but I should probably try to find the more recent POLE2012 data that Ratcliff&Gross refer to here: ftp://euler.jpl.nasa.gov/keof/combinations/SpaceCombPole_latest.pdf

  • 44.

    I believe we are (a) using the data the wrong way, and (b) applying the mentioned techniques the wrong way as well.

    Dara, if I am using the data in the wrong way, this is some deep rabbit hole that I have gone down!

    Review:

    • Characteristic frequency with a 4.25-year period, exactly as predicted by Allan Clarke, based on thermocline measurements

    • A forcing correlating exactly to a QBO period of 2.33 years (28 months), which aligns perfectly with the data since 1953.

    • A forcing correlating exactly to a Chandler wobble beat with a 6.46-year period.

    • And the last significant forcing I found appears to be very close to the TSI anomaly, which has a more complicated structure, consisting of a mix of the Schwabe cycle of 10-11 years and the Hale cycle of 20-22 years.

    If this is wrong and I am getting such good correlations, I wonder how much better it will get if I start doing it right!

    Seriously, the issue with solving the DiffEqs is that they are extremely brittle. If one places an extraneous delta impulse forcing somewhere in the time series, that can cause a large perturbation in the result. In terms of resonant structures, this is considered to have a high Q factor, and so has very little damping. Recall that you can always place a first-order damping factor, f'(t), in the DiffEq formulation -- I haven't because the sloshing formulation doesn't. They consider liquid volumes fairly inviscid in that regard.
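
    For illustration, a damping term would enter the formulation as below; gamma is an arbitrary illustrative value, and gamma = 0 recovers the inviscid, high-Q sloshing form used in the model.

    ```python
    import numpy as np
    from scipy.integrate import solve_ivp

    omega, gamma = 2 * np.pi / 4.25, 0.1     # gamma illustrative only

    # f''(t) + gamma*f'(t) + omega^2 * f(t) = forcing(t)
    def rhs(t, y):
        f, fp = y
        forcing = np.sin(2.696 * t)          # QBO-like test forcing
        return [fp, forcing - gamma * fp - omega**2 * f]

    sol = solve_ivp(rhs, (0, 100), [0.0, 0.0], max_step=0.1)
    # Larger gamma lowers the Q factor and damps impulse perturbations.
    ```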

  • 45.

    I think we are talking at cross purposes: you see the curves that match on the periods, while I am forecasting the next 6 months. Two different situations.

    OK Paul, could you then do the 6-month forecasts John was asking for, and run the code for say 9-10 years, backtest, and compute the error -- simply abs(actual - forecast)? This should be a one-liner in your already-working code.

    Not the error for the given period of time (the past values), but the error for the next 6 forecasts, f(t+1), f(t+2), ..., f(t+6), subtracted from the actual historical data.
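
    A sketch of that backtest, assuming a monthly actual array and a forecast_fn(t, h) placeholder that returns the model's h-month-ahead value from index t:

    ```python
    import numpy as np

    def backtest_6mo(actual, forecast_fn, start, end):
        """Mean absolute error of 1..6-month-ahead forecasts."""
        errs = [abs(actual[t + h] - forecast_fn(t, h))
                for t in range(start, end)
                for h in range(1, 7)]
        return np.mean(errs)
    ```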

    You could produce solutions with the same periods but which do not forecast well enough; I gave a bicycle example.

    Dara

  • 46.

    Dara, I don't think the next 6 months are meaningful because weather and seasonal effects creep in. Just consider if a typhoon happens to enter the Darwin area during a monthly reading. This will impact the error significantly, even though on a yearly averaged time-scale it will wash away.
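
    To see the wash-out numerically: a single anomalous monthly reading contributes only 1/12 of its magnitude to a 12-month moving average (a toy series, not SOI data):

    ```python
    import numpy as np
    import pandas as pd

    soi = pd.Series(np.zeros(240))           # 20 years of flat monthly data
    soi.iloc[120] = 150.0                    # one typhoon-scale spike
    smoothed = soi.rolling(12, center=True).mean()
    print(smoothed.max())                    # 12.5, i.e. spike / 12
    ```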

    Now I understand how we may not be on the same track. I am thinking years in advance, not a few months.

    Weather prediction and climate prediction are significantly different beasts. I do think that ENSO inhabits the gray area between these extremes, but the significance of the forcing factors that exist on time scales between 2 and 20+ years makes it closer to a climate prediction discipline.

    Yet, the fact that I have yet to include the beginning-of-the-year seasonal synchronization to the sloshing model means that there may be even more room for improvement :) I haven't figured out quite how to do this yet.

  • 47.

    Paul, I believe those periods you are entering into your equations are functions of another variable; a slight alteration to them causes a hugely different solution!

    So right off the bat, my first coding question to you is: why do you think these periods are such crisp constants?

    D

  • 48.

    When we do the wavelet scalograms we see those periods but we also see that they have different amplitudes at different times.

  • 49.

    I think that the fundamental QBO frequency is a crisp constant because it is derived from the properties of the upper atmosphere and possibly interacting with fixed seasonal and tidal periods. The substantiation for this is that we only have measurements of QBO back to 1953, yet the same waveform extrapolated backwards works very well from 1880 to 1953.

    I have no problem with slightly different results giving different answers. This is a resonant system after all. If I were building a resonant circuit out of inductors and capacitors, the exact amplitude and operating point would depend on precise estimates of these values along with any internal resistance, which would damp the amplitudes.

    The epiphany is that ENSO is likely not a stochastic system, but in fact a well-concealed resonant structure complicated by several forcing factors at play. This, along with the Mathieu-type modulation, makes it look chaotic, when in reality it is highly deterministic. The caveat is that climate shifts, such as the one that occurred from 1975 to 1980, are inexplicable, and these can change the outcome drastically.

    At some point we also have to realize that the SOI data is not exactly the true measure. Consider that the SOI is made up of two separate measurements from Tahiti and Darwin. These are supposedly out-of-phase dipoles, yet the correlation coefficient between these two is only -0.58 (and higher if filtered). If they were a perfect dipole pair, the correlation coefficient should be -1. But it isn't -1, and part of the reason for this is that there are fluctuations not related to the ENSO behavior.

    One strategy for eliminating this noise is to find a better pair of dipoles. The choice of Darwin and Tahiti is largely historical as these are long-running records. Perhaps a pair of stations exists that have a shorter historical time span but that show a better negative correlation.
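
    A sketch of that station-pair search, assuming equal-length pressure-anomaly series keyed by station name (the dictionary contents are placeholders): rank all pairs by correlation and take the most negative.

    ```python
    import itertools
    import numpy as np

    def best_dipole(stations):
        """Rank station pairs by correlation; an ideal dipole scores -1."""
        pairs = []
        for (a, sa), (b, sb) in itertools.combinations(stations.items(), 2):
            r = np.corrcoef(sa, sb)[0, 1]
            pairs.append((r, a, b))
        return sorted(pairs)                 # most negative first

    # e.g. best_dipole({"Darwin": darwin, "Tahiti": tahiti, ...})[0]
    ```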

  • 50.

    Paul, I am going to look at the new volumetric data from the GPM and TRMM satellite networks and the TOA satellites for top-of-the-atmosphere radiation.

    I do not believe the 1-D time series are going to tell us much other than the seasonal periodicities.

    That being said, we need to look for volumetric differential equations, e.g. the tensor stress equations.

    D
