I googled QBO while trying to understand what WebHubTel is doing and found this: Predictability of the quasi-biennial oscillation and its northern winter teleconnection on seasonal to decadal timescales, Scaife et al

They say

The skill of these QBO predictions remains high at very long lead times and is higher than other highly predictable modes of climate variability such as El Niño–Southern Oscillation [Fedorov et al., 2003; Barnston et al., 2012]. Correlation scores exceed 0.8 at prediction times up to 6 months. At 12 months lead time the correlations are still above 0.7, and positive correlations are maintained out to years ahead in both forecast systems (Figure 2), remaining significant out to 4 years ahead in the MiKlip forecasts [c.f. Pohlmann et al., 2013].

That's quite long but it's not very long. WebHubTel appears to be using a simple method to project the QBO back from 1953-2015 all the way to 1880. I cannot make sense of this.

## Comments

Good discussion topic. First of all, the "quasi" in quasi-biennial oscillation appears to be somewhat of a misnomer. The fundamental period of the QBO is locked in at around 2.33 years. This is 3 complete cycles every 7 years. So it is not quasi-periodic so much as "quasi-2", named by someone who thought that 2.33 was close enough to 2.

The fact that this fundamental period of 2.33 years is so strong allows me to back-propagate the values to times well before 1953, which is the date when measurements of the QBO were first undertaken.

I am not the first to notice how strong this 28 month period is: http://benthamopen.com/contents/pdf/TOASCJ/TOASCJ-4-53.pdf Somebody else has created a time series that allows one to back-extrapolate to any year before 1953, and what this amounts to is the 28 month harmonic plus other Fourier series components. I can't find the ref at the moment, but I calculate the Fourier series myself because it is easy enough to construct and check for good correlation. This is what Mathematica's FindFormula discovers for 3 Fourier components (the highlighted component is close to the Chandler wobble beat frequency):

![QBO](http://imageshack.com/a/img540/5063/QK1oWH.gif)

I add a few more components to get the correlation coefficient higher, and that is the key to getting a good fit to ENSO. Unless this "jitter" or "frequency modulation" is included, the fit really won't work. That is possibly one of the reasons it has escaped the notice of researchers over the years. (I built FM transmitters and radios as a teenager, so I know all about frequency modulation and knew that it could cause all the variation that was observed. I just didn't know how much better it would get the more I worked on it.)

I find it curious that the paper that Graham found claims that QBO isn't that predictable over the scale of much more than 3 years:

> "We demonstrate predictability of the QBO extending more than 3 years into the future, well beyond timescales normally associated with internal atmospheric processes."

What I notice is a difference in research done by climate scientists versus that done by geophysicists. Geophysicists aren't as afraid of making claims of determinism or periodicity, whereas climate scientists typically are. I think this has a basis in the likelihood that climate scientists and meteorologists are taught not to make such strong claims in their forecasts. Why they write papers like this that don't point out the strong periodicity in the historical record is something I just can't understand.

Graham, I assume that is the same point that you "cannot make sense of" ?


Thanks, WebHubTel. I agree it is odd they don't point out the strong periodicity in the historical record.

But I haven't seen you actually demonstrate your ability to extrapolate the QBO. It's obvious from your graph this is possible to some extent, but how much? If you feed half of the data (1953-1983 say) into FindFormula and then extrapolate to 2013, how accurately can you do this? I would want to do lots of tests like this before extrapolating into the unknown.

Your formula looks odd to me because all the cos terms are equal to 1 when Y=0. I can see that the data is at a peak near here, but you can't know this is exactly true, and it would be quite wrong for other time periods.


Graham, there is lots of evidence based on the fits to historical ENSO proxy records: http://contextearth.com/2014/08/26/soim-fit-to-unified-enso-proxy/

I don't use the Mathematica search results directly but optimize further by hand. The lack of phase terms is also seen in Eureqa, not a big deal.


Graham, After trying to digest your response some more, I don't think you understand exactly what I am trying to do. I am only using FindFormula to find an empirical and analytical fit to QBO, which then can be used to extrapolate to out-of-band numbers for the past where the measurements were never done (because radiosonde equipment had yet to be invented).

You ask whether I am checking the QBO to 2013. Yes, of course, this is the fit to 2013:

![qbo](http://imagizer.imageshack.us/a/img540/6099/gkXx4K.gif)

I use all available QBO data from 1953-2013 to estimate what it will be from 1880-1953. I don't throw away any useful results that will help with a fit. We _will never know_ what the actual QBO values are before 1953, because that is the stratosphere and no traces are left in any records.

> "Your formula looks odd to me because all the cos terms are equal to 1 when Y=0."

That's not quite my formula as I have more terms. It is well known that machine learning techniques such as Eureqa and Mathematica's FindFormula balance complexity against accuracy in their results. In this case, a result such as Cos(t) is generated because it is less complex than Cos(0.97*t + 0.45), even though the latter is more accurate. The same holds true for the other terms -- I refer to this as a "lever arm" effect, as small adjustments in the frequency can model phase adjustments. You really have to spend time working with these tools to understand their intricacies. Machine learning techniques are not a panacea, they are just a tool.

Yet what I show is just the tip of the iceberg as far as results are concerned. Get a license for Eureqa and you can let it automatically find solutions. Here is an experiment where it deduces the DiffEq result along with a Mathieu modulation.

![eureqa](http://imagizer.imageshack.us/a/img540/5181/F8eX2g.gif)

The reality is that we are way beyond the climatologists in terms of how we are looking at the data. They are stuck in the stone age of analysis, as far as I can figure. No one is looking at machine learning and signal processing apart from what we are doing here and what some geophysicists such as Astudillo [1] are finding. Like I said, there has to be some explanation why no one has found this relationship before -- curious because it has always been hidden in plain sight.

[1] H. Astudillo, R. Abarca-del-Rio, and F. Borotto, “Long-term non-linear predictability of ENSO events over the 20th century,” arXiv preprint arXiv:1506.04066, 2015.


> I am only using FindFormula to find an empirical and analytical fit to QBO, which then can be used to extrapolate to out-of-band numbers for the past

I understand that. What I object to is the word 'only'. You need to test the ability of your method to extrapolate before using it to extrapolate.

Like I said, there is no way to test for how well it will extrapolate, because actual QBO values prior to 1953 will never be known. And the values for the future won't be known for some time to come :)

I can do something like take parts of the known QBO time-series as a training interval and use it to evaluate other parts, but that is a trade-off and judgment call. The trade-off is that a smaller training interval will have less information content than a longer interval. IOW, TINSTAAFL

Here is another recent paper "The validity of long-term prediction of quasi-biennial oscillation (QBO) as a proof of the exact seasonal synchronization of the equatorial stratospheric QBO cycle"

http://www.sciencedirect.com/science/article/pii/S1364682615000255

The author claims that QBO cycles are either 24, 30, or 36 months long, that is, on 2, 2.5, or 3 year seasonal boundaries. If the average cycle length is 28 months or 2.333 years, a bit of discrete combinatorial math will show that a set of cycles portioned out as

5x24, 2x30, and 2x36 months will compose 9x2.333 years, which is a 21 year repeat period.

This gives the correct mean as well: (5x24 + 2x30 + 2x36)/(5+2+2) = 28

It might be worthwhile to test this out. Of course, one would want to use as long an interval as possible to order the sequence.

EDIT:

Another grouping to consider is (2+2+3) + (2+2.5+2.5) + (2+2+3)

This generates boundaries on 7 year cycles plus the 21 year cycle to maintain the 2.33 year average.
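The arithmetic in the two groupings above is easy to check mechanically. A minimal sketch, assuming nothing beyond the cycle lengths stated in this comment; the function and variable names are made up for illustration:

```python
from fractions import Fraction
from itertools import accumulate

def summarize(cycles_months):
    """Return (total length in months, exact mean cycle length in months)."""
    total = sum(cycles_months)
    return total, Fraction(total, len(cycles_months))

# First grouping: five 24-month, two 30-month and two 36-month cycles.
first = [24] * 5 + [30] * 2 + [36] * 2

# EDIT grouping, converted from years to months:
# (2+2+3) + (2+2.5+2.5) + (2+2+3)
second = [24, 24, 36, 24, 30, 30, 24, 24, 36]

total_first, mean_first = summarize(first)
total_second, mean_second = summarize(second)

# Running end-of-cycle positions for the EDIT ordering, in months.
boundaries = list(accumulate(second))

print(total_first / 12, float(mean_first))
print(total_second / 12, float(mean_second))
print([b for b in boundaries if b % 84 == 0])  # cycle ends that land on 7-year marks
```

Both groupings use the same multiset of cycle lengths, so they share the 252-month total and the 28-month mean; only the ordering differs, and the second ordering is the one that places cycle ends on the 7-year marks.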


> Like I said, there is no way to test for how well it will extrapolate...

Of course there is. It is standard practice in machine learning, see [cross-validation](https://en.wikipedia.org/wiki/Cross-validation_%28statistics%29).

> The author claims that QBO cycles are either 24, 30, or 36 months long...

Sounds rather like [what Nad said](https://forum.azimuthproject.org/discussion/comment/14533/#Comment_14533) a while ago.

> well the QBO oscillations have of course a little fuzzy frequency component, but I wouldn't call this "frequency modulation." And strictly speaking the QBO is even not really periodic, but only somewhat and the amplitude goes also quite wild. What I mean is, as said here that it looks as if the signal is mostly having a strict two year period (i.e. I mean by that f(t) comparatively small) but once in a while not - i.e. it gets out of sync. Moreover it looks as if it is "forced back" into the biannual rythmn. I could imagine that this "out of sync" (disturbations, breaks) and forced back behaviour in an oscillation is rather well distinguishable from a more random behaviour via a fourier transform but may be not and even if this would be so I don't know how the corresponding typical forms would look like.

OK, so I modeled that jitter.

![QBO](http://imagizer.imageshack.us/a/img540/6099/gkXx4K.gif)

This was partly done through Eureqa, which has cross-validation built-in. This is a screenshot.

![Eureqa](http://imageshack.com/a/img911/6116/uSMk8W.gif)

Have we gone a complete circle now?

Paul wrote:

> Like I said, there is no way to test for how well it will extrapolate...

Graham wrote:

> Of course there is. It is standard practice in machine learning, see cross-validation.

I agree with Graham: you _can_ get some information about how well a method of extrapolating the QBO works, using only the data we have. And if one wants to claim one knows how to extrapolate the QBO back to 1880, one needs whatever evidence one can get, even though we'll never know for sure.

On the other hand, one can merely claim one has a method of retrodicting the SOI back to 1880. In this approach, which differs only in the _claims one makes_, extrapolating the QBO back to 1880 is treated as just part of a bigger machine. The machine is justified solely by its ability to retrodict the SOI.

The difference is this:

In the latter approach, all the parameters used to extrapolate the QBO back to 1880 count as adjustable parameters in the SOI retrodiction. So, if we use something like the [Akaike information criterion](https://en.wikipedia.org/wiki/Akaike_information_criterion) to see how good the SOI retrodiction is, these parameters "cost us".

I hope I'm making myself clear here.
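To make "cost us" concrete, here is a minimal sketch of the usual least-squares form of the criterion, AIC = n ln(RSS/n) + 2k (up to an additive constant, assuming Gaussian errors). The residuals and the parameter counts below are invented purely for illustration; they are not taken from Paul's model:

```python
import numpy as np

def aic_least_squares(residuals, n_params):
    """AIC for a least-squares fit with Gaussian errors, up to a constant."""
    r = np.asarray(residuals, dtype=float)
    n = r.size
    rss = float(np.sum(r ** 2))
    return n * np.log(rss / n) + 2 * n_params

# Hypothetical residuals of an SOI retrodiction (synthetic noise, not real data).
rng = np.random.default_rng(0)
residuals = rng.standard_normal(1600)

# Hypothetical parameter counts: 4 for the SOI model alone, plus 12 more
# if the QBO extrapolation parameters are counted as adjustable too.
aic_soi_only = aic_least_squares(residuals, 4)
aic_with_qbo = aic_least_squares(residuals, 4 + 12)

print(aic_with_qbo - aic_soi_only)
```

With identical residuals, each extra parameter raises the score by 2, so counting the QBO parameters only pays off if they reduce the residual sum of squares by enough to offset that penalty.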

Maybe I put my point in too technical a way for easy comprehension... though of course everyone here is super-smart.

Here's the simplified version. The reason this piece of work is original is also the reason a lot of people will dismiss it. Paul is using some data from 1953-2015 and a simple model to retrodict a climate cycle back to 1880 - a climate cycle we don't have records for. Then he's using this and another simple model to retrodict a climate cycle we _do_ have records for. One can argue endlessly about the validity of the first step, but it's only justified by the results of the second step.

It's an interesting situation.

One thing Paul should do right now is use his model to predict the QBO and SOI for the next decade, and put his predictions someplace online where he can't change them.

(For example, I could put them on my website - but that'll work only for people who know that I'm not in cahoots with Paul. Or, he could put a paper on the arXiv. The paper can be changed, but the original version is time-stamped and will never go away.)


Since Graham evidently thought that my QBO model was not good enough, I let Eureqa generate a much better cross-validated time series from the data available. The correlation coefficient is now up to 0.88 from 0.77. This is essentially a frequency modulated waveform. The yellow regions are the areas of discrepancy. This heuristic formula is used to back-extrapolate before 1953.

![QBO](http://imageshack.com/a/img912/9076/xweSPF.gif)

The quickest way to check how well this will fit the DiffEq of the SOI model is to compare the LHS against the RHS in differential form

$ f''(t) + \omega_0^2 f(t) = \mathrm{Forcing}(t) = A \, \mathrm{qbo}(t) + B \, \mathrm{qbo}'(t) $

or

$lhs(t) = rhs(t)$

Check the agreement below. You will see that the majority of the peaks and valleys line up, even before 1953. And they align with an almost hyper-realistic precision. The ones that don't line up are what pull the correlation coefficient down. Remember that this is a differential form, so the noise is amplified!

![BabyQ](http://imageshack.com/a/img537/2792/YBJLBT.gif)

Below the comparison is a power spectrum of the residual error. There are some other periodicities in the signal that could be resolved, but otherwise it is pretty flat, closer to white noise.
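For anyone who wants to try the LHS/RHS comparison themselves, here is a minimal sketch of one way it could be set up. It is not the exact procedure behind the plots: the derivatives are plain finite differences, A and B are found by ordinary least squares, and the test signal is synthetic, built so that the differential equation holds exactly. The function name and the value of `omega0` are placeholders.

```python
import numpy as np

def compare_lhs_rhs(t, f, qbo, omega0):
    """Fit A, B in f'' + omega0^2 f = A*qbo + B*qbo' and return (A, B, corr)."""
    dt = t[1] - t[0]
    d2f = np.gradient(np.gradient(f, dt), dt)   # numerical f''(t)
    dqbo = np.gradient(qbo, dt)                 # numerical qbo'(t)
    lhs = d2f + omega0 ** 2 * f
    inner = slice(2, -2)                        # drop one-sided edge estimates
    X = np.column_stack([qbo[inner], dqbo[inner]])
    (A, B), *_ = np.linalg.lstsq(X, lhs[inner], rcond=None)
    rhs = A * qbo + B * dqbo
    corr = np.corrcoef(lhs[inner], rhs[inner])[0, 1]
    return A, B, corr

# Synthetic check: a pure 2.333-year forcing and the matching particular solution.
t = np.arange(0.0, 60.0, 0.01)                  # years
w = 2 * np.pi / 2.333                           # forcing angular frequency
omega0 = 2 * np.pi / 4.25                       # placeholder resonance, not a fitted value
A_true, B_true = 1.0, 0.5
qbo = np.cos(w * t)
f = (A_true * np.cos(w * t) - B_true * w * np.sin(w * t)) / (omega0 ** 2 - w ** 2)

A_fit, B_fit, corr = compare_lhs_rhs(t, f, qbo, omega0)
print(A_fit, B_fit, corr)
```

On real monthly data the double finite difference amplifies noise considerably, which is the caveat already made above about the differential form.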

Again I think that there is something very significant here and all I am trying to do is to get some initial buy-in.

John, thanks for asking me to do a prediction -- but all I am asking is for others to help substantiate this work and perhaps to improve it. Only then would I consider predictions.


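> Since Graham evidently thought that my QBO model was not good enough, I let Eureqa generate a much better cross-validated time series from the data available.

Oh dear. You have completely missed the point I have been trying to make. I don't know whether this is because you have no interest in doing what I think you need to do, or because you lack the background to understand. Your comment to John "Only then would I consider predictions" makes me think it's the former. Your mentioning that Eureqa uses cross-validation makes me think it's the latter.

This is my main point:

> I haven't seen you actually demonstrate your ability to extrapolate the QBO. It's obvious from your graph this is possible to some extent, but how much? If you feed half of the data (1953-1983 say) into FindFormula and then extrapolate to 2013, how accurately can you do this? I would want to do lots of tests like this before extrapolating into the unknown.

Cross-validation is just a more systematic way (or set of ways) of doing this. It involves dividing up the available data to make best use (we hope) of a small amount of data for some purpose. A very common purpose is to assess the ability of a method to extrapolate. The basic idea is already contained in the paragraph I just quoted. The fact that Eureqa uses cross-validation internally is irrelevant. The point is to test the model produced by Eureqa on data that Eureqa has never seen.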
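A minimal sketch of the kind of hold-out test meant here, with a plain harmonic least-squares fit standing in for FindFormula or Eureqa and a synthetic series standing in for the QBO record. The period list, the split year and the noise level are all assumptions chosen for illustration:

```python
import numpy as np

def harmonic_design(t, periods):
    """Design matrix: a constant plus a cos/sin pair for each period (years)."""
    cols = [np.ones_like(t)]
    for p in periods:
        w = 2 * np.pi / p
        cols += [np.cos(w * t), np.sin(w * t)]
    return np.column_stack(cols)

def holdout_correlation(t, y, periods, split_year):
    """Fit on t < split_year only, then score the extrapolation on the rest."""
    train = t < split_year
    coef, *_ = np.linalg.lstsq(harmonic_design(t[train], periods),
                               y[train], rcond=None)
    pred = harmonic_design(t[~train], periods) @ coef
    return np.corrcoef(pred, y[~train])[0, 1]

# Synthetic stand-in for the 1953-2013 record: one 2.33-year harmonic plus noise.
rng = np.random.default_rng(0)
t = np.arange(1953.0, 2013.0, 1 / 12)
y = np.cos(2 * np.pi * t / 2.33 + 0.4) + 0.3 * rng.standard_normal(t.size)

r_holdout = holdout_correlation(t, y, [2.33], 1983.0)
print(r_holdout)
```

The number that matters is the correlation on 1983-2013, which the fit never saw. Repeating this with different split years and different training lengths gives a picture of how quickly the extrapolation degrades.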


Graham, I apologize for implying that you thought that the model was not good enough. I now think that you want me to apply a stricter set of rules on how to discover new behaviors.

> "The point is to test the model produced by Eureqa on data that Eureqa has never seen."

OK, first of all that is what Eureqa "kind of" does. It splits the data into a training interval and a validation interval. It uses the training interval to establish an error criterion (min error, max corr coeff, etc) and then continually tests the result on the unfitted validation interval. The way the Eureqa algorithm works is that it will stop when improving the error criterion on the training interval no longer improves the score on the validation interval (see BTW note below @).

As a practical matter, I have personally observed that one gets pretty much the same results by using the entire interval for training, without bothering with the split data approach.

That is just a consequence of dealing with an oscillating yet stationary time series. The stationarity of the time series is what I think causes this. Say I take the first half of the time series for training and obtain a fit for the entire interval. Then I run the complement, where I train on the second half and validate on the first half. For this application the differences turn out to be small, because stationarity implies that the same Fourier components are present in both the first and second halves of the time series. One could then take the average of the two fits, and that usually converges to the fit of the entire time series.

For a non-stationary time series, all bets are off because you aren't even close to being assured that the same Fourier components are on both sides of the interval. That's where you can have problems with over-fitting.
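The split-half argument above can be illustrated on a synthetic stationary series. This is only a sketch: the two periods, the amplitudes and the noise level are invented, and the periods are held fixed rather than searched for, which is the easy case. It says nothing by itself about the real QBO record.

```python
import numpy as np

def fit_harmonics(t, y, periods):
    """Least-squares cos/sin amplitudes for a fixed list of periods (years)."""
    cols = []
    for p in periods:
        w = 2 * np.pi / p
        cols += [np.cos(w * t), np.sin(w * t)]
    coef, *_ = np.linalg.lstsq(np.column_stack(cols), y, rcond=None)
    return coef

# Synthetic stationary series: two fixed harmonics plus white noise.
rng = np.random.default_rng(1)
t = np.arange(0.0, 60.0, 1 / 12)
y = (np.cos(2 * np.pi * t / 2.33)
     + 0.5 * np.sin(2 * np.pi * t / 1.0)
     + 0.2 * rng.standard_normal(t.size))

periods = [2.33, 1.0]
half = t.size // 2
c_first = fit_harmonics(t[:half], y[:half], periods)    # train on first half
c_second = fit_harmonics(t[half:], y[half:], periods)   # train on second half
c_full = fit_harmonics(t, y, periods)                   # train on everything

# Largest difference between the averaged half-fits and the full fit.
gap = np.max(np.abs(0.5 * (c_first + c_second) - c_full))
print(c_full, gap)
```

If the frequencies themselves drift, or have to be discovered from each half separately, the two half-fits can disagree, and that is the situation a hold-out test is designed to expose.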

And you are right in that I have no interest in making predictions at the present time. I want to use all available information to determine what the underlying physical factors are. There are no rules when it comes to that and you can "cheat" as much as you want -- you will either lock in to the fundamental formula or you won't. When I am done, I don't think it will be important to even mention that I used Eureqa.

@ BTW, some would argue that Eureqa cheats by evaluating the result on the validation interval and using that criterion to continue the search. If it was completely blind, it wouldn't be allowed to do that. Yet, it obviously does that from what I have observed, and that is likely one of the ingredients in how it can uncover new behaviors. That's probably also why the tool is so popular -- it takes a pragmatic approach. It's not used for predictions as much as for trying to reveal the hidden internal behaviors of a time series.


This is what the running integral of the QBO time series looks like. I am pondering whether this has something to do with the disturbance I see in the model fit after 1980.

![QBOint](http://imageshack.com/a/img673/583/1CcmmU.gif)

An integral acts as a low-pass filter and so brings out the longer-wavelength components of the QBO.

The last post in the other long-running ENSO revisited thread shows how a phase reversal in the standing wave may be occurring around that point. It also looks like it occurs at a zero-crossing, which is the only plausible place for it, considering how much energy it would take to instantaneously flip the amplitude from one sign to the other.

https://forum.azimuthproject.org/discussion/comment/14746/#Comment_14746

![phase](http://imageshack.com/a/img673/7259/boMW5n.png)

I have to find some examples of a metastable phase reversal occurring in a standing wave.

By luck, I found this paper: Greenstadt, El W., et al. "A storm time, Pc 5 event observed in the outer magnetosphere by ISEE 1 and 2: Wave properties." Journal of Geophysical Research: Space Physics (1978–2012) 91.A12 (1986): 13398-13410.

I don't fully understand this, but they show a figure with a similar cumulative buildup and then a release as a phase reversal:

![greenstadt](http://imageshack.com/a/img540/1922/dWuPMN.gif)
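The low-pass remark can be checked with a small sketch. It assumes a made-up two-tone signal rather than the QBO data, and the frequencies are illustrative. Integrating $\cos(\omega t)$ gives $\sin(\omega t)/\omega$, so each component is scaled by roughly $1/\omega$ and the slower one dominates the running integral.

```python
import numpy as np

dt = 1 / 12                        # monthly sampling, in years
t = np.arange(0, 60, dt)
w_slow, w_fast = 0.5, 2.66         # rad/year, illustrative values
signal = np.cos(w_slow * t) + np.cos(w_fast * t)   # equal input amplitudes
running = np.cumsum(signal) * dt   # running integral (rectangle rule)

# Fit both frequencies plus a constant offset to the integrated series.
basis = np.column_stack([
    np.cos(w_slow * t), np.sin(w_slow * t),
    np.cos(w_fast * t), np.sin(w_fast * t),
    np.ones_like(t),
])
c, *_ = np.linalg.lstsq(basis, running, rcond=None)
amp_slow = np.hypot(c[0], c[1])
amp_fast = np.hypot(c[2], c[3])

print("slow amplitude:", amp_slow)
print("fast amplitude:", amp_fast)
print("ratio fast/slow:", amp_fast / amp_slow, "vs w_slow/w_fast:", w_slow / w_fast)
```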

Elsewhere, I made a comment that essentially states that the model I am using for ENSO is so obvious that others would have to first prove that it can't occur. The essential idea is that a volume of water has to slosh if it is subject to changes in angular momentum such as those caused by known periodic factors such as QBO, the Chandler Wobble, and diurnal lunar tides.

Take a look at the elegance of the two model fits below. The first is a differential expansion of the SOI model

$ LHS(t) \leftarrow f''(t) + \omega_0^2 f(t) = A \cdot qbo(t) + B \cdot cw(t) + C \cdot tide(t) \rightarrow RHS(t) $

where the LHS is the wave-equation-transformed ENSO data (observe the noise in the signal), and the RHS is an empirical model of the known forcing factors.

The second figure is the differential equation solution, where $f(t)$ is compared to the actual SOI over a 100 year time span.

![lhsrhs](http://imageshack.com/a/img537/774/YeVS2M.gif)

![soim](http://imageshack.com/a/img673/2777/maQYJn.gif)
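A minimal sketch of the LHS/RHS comparison, under loud assumptions: the three forcings are placeholder sinusoids rather than real QBO, Chandler wobble, or tidal series, $f(t)$ is synthetic, and the values of $\omega_0$, $A$, $B$ and $C$ are invented for the illustration. It shows only the mechanics: form $f'' + \omega_0^2 f$ by finite differences, then regress it on the forcing terms.

```python
import numpy as np

dt = 1 / 12                                    # monthly sampling, in years
t = np.arange(0, 100, dt)
omega0 = 1.5                                   # rad/year, illustrative

# Placeholder forcings: pure sinusoids standing in for qbo(t), cw(t), tide(t).
w_qbo, w_cw, w_tide = 2.66, 5.31, 2.31         # rad/year, assumed
forcings = np.column_stack([np.cos(w_qbo * t), np.cos(w_cw * t), np.cos(w_tide * t)])
true_abc = np.array([1.0, 0.5, 0.3])           # made-up A, B, C

# Synthetic f(t): exact steady-state solution of f'' + omega0^2 f = RHS.
gains = 1.0 / (omega0**2 - np.array([w_qbo, w_cw, w_tide]) ** 2)
f = forcings @ (true_abc * gains)

# LHS(t): second difference plus omega0^2 * f, on interior points only.
f_dd = (f[2:] - 2 * f[1:-1] + f[:-2]) / dt**2
lhs = f_dd + omega0**2 * f[1:-1]

# Regress LHS on the forcing terms to recover A, B, C.
abc_fit, *_ = np.linalg.lstsq(forcings[1:-1], lhs, rcond=None)
print("recovered A, B, C:", np.round(abc_fit, 3))
```

The recovered coefficients are slightly low because the second difference underestimates $f''$ at monthly resolution, more so for the faster terms. With real data the second derivative also amplifies noise, which matches the noisy LHS noted in the comment.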

This is what happens with the Earth's excess angular velocity (momentum) over time:

![aam](http://imageshack.com/a/img673/2352/9VxGW8.gif)

Note the strong hiccup around 1983, where it appears that the Earth's rotation slowed down temporarily.

That coincides with the point where the SOI model also loses coherence with the underlying data.

This is a prior analysis: http://www.earth.sinica.edu.tw/~bfchao/publication/eng/1995%20Wavelet%20Analysis%20Provides%20a%20New%20Tool%20for%20Studying%20Earth's%20Rotation.pdf Chao, Benjamin Fong, and Isao Naito. "Wavelet analysis provides a new tool for studying Earth's rotation." Eos, Transactions American Geophysical Union 76.16 (1995): 161-165.

![chao](http://imageshack.com/a/img537/369/TrpjEu.gif)

As they say, what is confusing is that the QBO has the opposite polarity when comparing the two events.

If I do a full model fit that extends beyond 1980 to the present, it appears that a perfect phase reversal occurs from 1979 to 1996, a span of 17 years -- the purple bar in the chart below. It is a phase reversal in the sense that a positive forcing has the opposite effect on the sign of the result.

![phase](http://imageshack.com/a/img537/4569/iJiaja.png)

I also added a level shift to the phase reversal. This mod is an experimental hack in lieu of trying to understand what is going on. The time series reverses phase again after 1996, with the QBO back in sync.

I said that the spectral contents of proxy measures of ENSO are similar to those of measured ENSO.

This is the power spectrum of ENSO, i.e. SOI:

![ENSO](http://imageshack.com/a/img909/8613/EJgoKL.png)

This is that of the Universal ENSO Proxy (UEP), which goes back to 1700. Note the matching positions of the peaks. Unfortunately the UEP data is very noisy and has only one point per year. That's why the x-axes don't match up.

![UEP](http://imageshack.com/a/img540/3121/PRq6Sk.png)

How can a phase reversal occur?

Recall that the formulation for the standing wave equation in temporal frequency space is

$ (-\omega^2+\omega_0^2)F(\omega) = Forcing(\omega) $

Note that the factor multiplying $F(\omega)$ changes sign at the resonant condition $\omega_0$, so the response to a given forcing has opposite signs on the two sides of the resonance. So what happens if a forcing is temporarily applied that is near the resonance condition, but with a frequency on the side of the peak that has the opposite sign of the prevailing standing wave phase? I will assert that this may be enough to force the output to change sign, and that this would most likely occur at a zero crossing, where the impact would be strongest.

I can easily test this out but may have to add a stronger damping term to make sure that the disturbance can die out.
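As a short worked step, assume a single sinusoidal forcing term $Forcing(t) = A\cos(\omega t)$, which is a simplification of the composite forcing used in this thread. Substituting $f(t) = F\cos(\omega t)$ into $f''(t) + \omega_0^2 f(t) = A\cos(\omega t)$ gives

$ f(t) = \frac{A}{\omega_0^2 - \omega^2}\cos(\omega t) $

so for $\omega < \omega_0$ the response has the same sign as the forcing, and for $\omega > \omega_0$ it has the opposite sign. The magnitude grows without bound as $\omega \rightarrow \omega_0$ in this undamped form, which is why a damping term would be needed in any numerical test.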

Here is a hypothesis.

There was a known event called the Pacific climate shift of 1976-1977. The idea is that, as a result of this climate shift, the characteristic frequency $\omega_0$ started increasing from a value of 1.5 rad/year to a value of 2.7 rad/year. This would cross the threshold at which the response to the QBO oscillation shifts from one sign to the other. There would be a significant resonance when this happened. And of course the El Niño (EN) of 1982-1983 was the biggest of the century up to that point. http://www.whoi.edu/science/B/people/kamaral/1982-1983ElNino.html

Plus consider the fact that a significant volcano erupted in 1982, which may also have been the spark for a temporary change in characteristic frequency, potentially due to changes in the density at the thermocline as less sunlight entered the water. This all happened within a very narrow interval, though, so it is somewhat questionable.

The disturbance eventually died out and ended in 1996, just in time for another phase reversal for the huge 1998 EN event, which then locked in to the original phase alignment that was there before 1980.

Note the observation by Chao in comment #16, where he mentions the polarity changes with respect to QBO for the two events.

This is perhaps a bit too tidy but it makes an interesting narrative.

Excited yet?
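The threshold crossing in this hypothesis can be illustrated with the undamped relation $(-\omega^2+\omega_0^2)F(\omega) = Forcing(\omega)$ from the previous comment. This is a rough sketch, not a model run: it assumes a pure sinusoidal forcing at the nominal 2.33-year QBO period, ignores damping, and the function name `response_gain` is just a label for this example.

```python
import math

def response_gain(omega, omega0):
    """Steady-state F/Forcing for the undamped relation (omega0^2 - omega^2) F = Forcing."""
    return 1.0 / (omega0**2 - omega**2)

omega_qbo = 2 * math.pi / 2.33                # rad/year, from the nominal 2.33-year QBO period
gain_before = response_gain(omega_qbo, 1.5)   # omega0 before the hypothesized shift
gain_after = response_gain(omega_qbo, 2.7)    # omega0 after the hypothesized shift

print(f"omega_qbo = {omega_qbo:.3f} rad/year")
print(f"gain at omega0 = 1.5: {gain_before:+.3f}")
print(f"gain at omega0 = 2.7: {gain_after:+.1f}")
```

The gain changes sign between the two values of $\omega_0$, and becomes large near 2.7 rad/year because that sits almost on top of the QBO frequency. That is the "significant resonance" in the narrative, at least in this idealized form.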

I am currently thinking that perhaps the ENSO and QBO are being forced by the same underlying factors, and that the difference we see between the two responses results from the characteristics of the medium that generates the response. The QBO, characterized by a low-density medium (thin air), is able to respond quickly and thus has a very high characteristic frequency. The ENSO, characterized by the sluggish response of water, must have a much lower characteristic frequency.

An interesting hypothesis is that the draconic lunar tide of duration 27.2122 days may be the common underlying significant driver. Unfortunately, the QBO and ENSO data are sampled at only a monthly rate, so we can’t do much to pull out the signal intact from our data ... Or can we?

What’s intriguing is that the driving force isn’t at this monthly level anyway, but is likely the result of a beat of the monthly tidal signal with the yearly signal. It is expected that strong tidal forces will interact with seasonal behavior in such a situation, and that we should be able to see the effects of the oscillating tidal signal where it constructively interferes during specific times of the year. For example, a strong tidal force during the hottest part of the year, or an interaction of the lunar signal with the solar tide (a precisely 6 month period), can pull out a constructively interfering signal.

To analyze the effect, we need to find the tidal frequency and un-alias the signal by multiples of $2\pi$.

So the draconic frequency of $2\pi/(27.212/365.25) = 84.33$ rads/year becomes 2.65 rads/year after removing $13 \times 2\pi$ worth of folded signal. This then has an apparent period of 2.368 years. Compare that to the nominal 2.33 year period of QBO, and the strong spectral component of ENSO at this frequency (see comment #18), and the figure below.

![spectra](http://imageshack.com/a/img673/3796/Ul6mlo.png)
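The folding arithmetic above can be written out as a few lines of code. The helper name `folded_frequency` is made up for this sketch, and it assumes the signal is effectively sampled once per year, so whole multiples of $2\pi$ rad/year drop out.

```python
import math

DAYS_PER_YEAR = 365.25

def folded_frequency(period_days):
    """Return (folded rad/year, number of 2*pi folds removed) for a period in days."""
    omega = 2 * math.pi * DAYS_PER_YEAR / period_days   # unfolded rad/year
    n_folds = math.floor(omega / (2 * math.pi))         # whole cycles per year
    return omega - n_folds * 2 * math.pi, n_folds

draconic_omega, draconic_folds = folded_frequency(27.212)
draconic_period = 2 * math.pi / draconic_omega          # apparent period in years

print(f"folds removed : {draconic_folds}")
print(f"folded freq   : {draconic_omega:.3f} rad/year")
print(f"apparent period: {draconic_period:.3f} years")
```

The same helper can be applied to any other period in days, for example the synodic month discussed further down.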

The period of 2.368 years is just to the left of the peaks -- it should be lined up at 0.22 on the x-axis. That is agonizingly close, but over the course of 100 years, the difference in propagated phase between 2.368 and 2.333 is about one-half of an average cycle. But then again, 2.368 could be the actual QBO period, since we only have ~60 years of data and so could easily be off by ¼ of a cycle. It could be that the 7-year syncing of 2.33 x 3 is enough to push it towards that value.

Next, consider the draconic period in its bi-monthly form against the Chandler wobble period of 432.5 days (for a constant seasonal signal [1]). This comes out to 2.3681/2 years when un-aliased, or 1.184 years = 432.47 days. Consider that the beat period of the Chandler wobble against the annual cycle is 1/(1-1/1.184) = 6.43 years, which shows up in the ENSO signal. Is this a coincidence or a related forcing?

The other strong QBO-like signal is shown in the figure below labeled “Folded Mean Lunation”.

![lunation](http://imageshack.com/a/img909/8613/EJgoKL.png)

The unaliased synodic lunar tide forcing comes out to be $2\pi/(29.531/365.25) = 77.71$ rads/year. After removing $12 \times 2\pi$ worth of aliased signal, this comes out to 2.314 rads/year or a period of 2.71 years, which is close to the secondary signal shown. What’s more, the synodic frequency is not steady over time and so will split into sidebands as the lunation period cycles between maximum and minimum values. This can explain why there are satellite peaks, as these will again get aliased and form a beat pattern with specific seasons.

And finally consider the fact that these two tidal signals -- draconic and synodic -- appear in both ENSO and QBO. I found the synodic signal in the QBO using Eureqa, where it actually pulled the high-frequency signal out of the data. Read this post from over a year ago, when I first observed the aliasing:

http://contextearth.com/2014/06/17/the-qbom/

![eureqa](http://imagizer.imageshack.us/a/img855/7435/femn.gif)

Eureqa picked out (with absolutely none of my help) the frequency components

77.72 rads/year → 29.53 days = synodic month

153 rads/year → 15 days = 1/2 synodic month

72.73 rads/year → aliased 2.355 years, close to the aliased period of 2.368 years for the draconic month

($72.73 - 12 \times 2\pi = -2.67$ rads/yr)

so the numbers actually work a bit better for QBO than for ENSO, but the bottom-line fact that they have this synodic and draconic lunar month commonality is striking.

As a caveat, these numbers really need to be spot on to make sense, because tidal frequencies are so well characterized. Anything that is off by a little bit will propagate as an error over the 100+ year range we are dealing with. They are tantalizingly close though.

[1] MASAKI, Yoshimitsu. "Expected Seasonal Excitations of Earth Rotation by Unmodeled Geophysical Fluids." Bulletin of the Geographical Survey Institute 54 (2007): 2.

This shows how tricky it is to determine the underlying periodicity in ENSO:

![100year](http://imageshack.com/a/img537/1244/0sAMdH.gif)

I did an initial count of the peaks and got 42 in a 100 year interval. This comes out to 2.381 years, which is close to 2.368 for the aliased draconic cycle. But then I noticed a prominent shoulder at the question mark and think that the count should be 43 peaks. That would put it at 2.326 years per cycle, which is closer to the nominal period of 2.33 of the QBO.

This NASA page shows the variability in the lunar months: http://eclipse.gsfc.nasa.gov/SEhelp/moonorbit.html

Synodic month variability

![syn](http://eclipse.gsfc.nasa.gov/SEhelp/image/Fig4-3b.png)

Draconic month variability

![dra](http://eclipse.gsfc.nasa.gov/SEhelp/image/Fig4-12b.png)

Note the bimodal shape. I think this is due to the PDF of a sine-function modulation being bimodal (i.e. the inverse function is an arcsine).

![arcsin](https://upload.wikimedia.org/wikipedia/commons/thumb/d/db/Arcsin_density.svg/350px-Arcsin_density.svg.png)

The bimodality may impact ENSO in the synodic month more than the draconic, if the fundamental period wants to lock into the aligned 2.3333 year groove of QBO. The satellites in comment #21 around 2.75 years correspond to approximately a +/-1 hour change in the synodic 29 day 12 hour 44 min cycle.
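The arcsine shape can be illustrated with a quick histogram. This sketch assumes an idealized, purely sinusoidal modulation with uniformly distributed phase, which is a simplification of the actual lunar-month variability. A sinusoid spends most of its time near its extremes, so the histogram of its values piles up at both ends.

```python
import numpy as np

phases = np.linspace(0.0, 2 * np.pi, 100_000, endpoint=False)  # uniform phase
values = np.sin(phases)                                        # sinusoidal modulation
counts, edges = np.histogram(values, bins=10, range=(-1.0, 1.0))
fractions = counts / counts.sum()

# Print the fraction of samples falling into each value bin.
for lo, hi, frac in zip(edges[:-1], edges[1:], fractions):
    print(f"[{lo:+.1f}, {hi:+.1f}): {frac:.3f}")
```

The outermost bins hold roughly three times as many samples as the central ones, which gives the two-humped profile.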

Have to thank Graham for putting up this topic of QBO predictability.

I was going through some old Eureqa machine learning experiments on QBO and once again realized the significance of what it found.

## Started with raw QBO data

![rawQBO](http://imageshack.com/a/img540/8111/JVyVPG.gif)

## Next targeted a solution with sinusoidal factors

maximizing correlation coefficient

![params](http://imageshack.com/a/img911/1995/eW8IQg.gif)

## Then let Eureqa crank away for 20 hours

![prog](http://imageshack.com/a/img905/707/l573Zu.gif)

## Picked a high complexity solution

(high complexity doesn't matter as the other solutions have similar components)

![soln](http://imageshack.com/a/img540/9429/S7WLD8.gif)

## The two strongest factors that Eureqa found

which were sinusoids with an obviously folded or aliased frequency

~~~~
strength  aliased freq (rad/yr)  period (years)  days         actual           % error
78        2.66341033             2.359075219     27.20894362  27.212=draconic   0.011233004
35        2.29753386             2.734751989     29.53743558  29.531=synodic   -0.021787874
~~~~

## The two factors have periods when unaliased that match the draconic and synodic lunar month, with errors 0.01% and 0.02% respectively

What are the chances of that?
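The arithmetic behind the table can be reproduced in a few lines. The fold counts 13 and 12 are the ones used earlier in this thread; the helper name and the 365.25-day year are choices made for this sketch. It only re-derives the "days" and "% error" columns from the aliased frequencies and is not a check on the Eureqa fit itself.

```python
import math

DAYS_PER_YEAR = 365.25
TWO_PI = 2 * math.pi

def unfolded_period_days(aliased_freq, n_folds):
    """Period in days after adding n_folds * 2*pi back onto an aliased rad/year frequency."""
    return DAYS_PER_YEAR * TWO_PI / (aliased_freq + n_folds * TWO_PI)

# (label, aliased freq in rad/year, folds restored, accepted lunar month in days)
rows = [
    ("draconic", 2.66341033, 13, 27.212),
    ("synodic", 2.29753386, 12, 29.531),
]

results = {}
for name, aliased_freq, n_folds, actual_days in rows:
    days = unfolded_period_days(aliased_freq, n_folds)
    pct_error = 100 * (actual_days - days) / actual_days
    results[name] = (days, pct_error)
    print(f"{name:9s} {days:.5f} days  error {pct_error:+.4f} %")
```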

> The most exciting phrase to hear in science, the one that heralds new discoveries, is not “Eureka” but “That’s funny...” Isaac Asimov

But before getting excited, I want to see some checks. If you feed Eureqa half the data, does it choose a similar model? And are you sure this is not some artefact related to the use of calendar months and years? Inferring periods of about a month from monthly data makes me suspicious.

Graham, I may do that some day, but alas my license to the Eureqa engine has expired. I can still look at old logs but I can't do any more experiments for the time being.

Let me alternatively propose a premise. Suppose it was 200 years ago and scientists knew about the concept of tides and had measured the draconic and synodic cycles to values of 27.212 days and 29.531 days, respectively. Say they also knew that a specific month of the year had some maximum interaction with the tidal oscillation. This interaction would dwarf the rest of the year in its impact, so that it was OK to just use that value as the yearly data point.

What would the resultant oscillation look like when plotted year-to-year?

One would be $\cos\left(\left(2\pi/(27.212/365.25) - 13 \times 2\pi\right)t + \mathrm{phase}\right)$ and the other $\cos\left(\left(2\pi/(29.531/365.25) - 12 \times 2\pi\right)t + \mathrm{phase}\right)$, where $t$ is time in years,

i.e. the tidal frequency folded by $N \times 2\pi$ (easy to check this graphically as well).
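The folding claim can also be checked numerically instead of graphically. The sketch below assumes exactly one sample per year at integer values of $t$, and an arbitrary phase chosen only for illustration. Under that assumption the unfolded and folded cosines give identical yearly values.

```python
import numpy as np

DAYS_PER_YEAR = 365.25
years = np.arange(0, 30)          # one sample per year, at the same calendar date
phase = 0.3                       # arbitrary phase, illustrative

omega_draconic = 2 * np.pi / (27.212 / DAYS_PER_YEAR)   # rad/year, unfolded
omega_folded = omega_draconic - 13 * 2 * np.pi          # rad/year, folded

full_rate = np.cos(omega_draconic * years + phase)      # yearly samples of the fast tide
folded = np.cos(omega_folded * years + phase)           # yearly samples of the slow alias
max_gap = np.max(np.abs(full_rate - folded))

print("largest difference between the two yearly series:", max_gap)
```

The agreement holds only at the yearly sample points. Between them the two curves differ completely, which is exactly why monthly or yearly data cannot distinguish them on their own.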

Now, consider that some scientists wanted to find evidence that this phenomenon was actually happening. With a lot of effort (before the age of computers) they deconstructed some known oscillating time-series data set to see if there were factors which matched these periods. If they had access to the QBO data, they would have been very happy and would have patted themselves on the back for finding the tidal influence.

Given that same premise, turn the clock to today and do the same experiment. Nothing has changed, yet the acceptance of the findings would get questioned. It doesn't matter how close the numbers are or whether they can be reproduced with a validation set, as you are suggesting. Somebody is going to say that the premise is invalid, i.e. tides have nothing to do with it, and that will be that (note caveat below [*]).

Yet the results of the QBO analysis in [#22](/discussion/comment/14843/#Comment_14843) should not be surprising. The Wikipedia entry says that

> "The precise nature of the waves responsible for this effect was heavily debated; in recent years, however, gravity waves have come to be seen as a major contributor and the QBO is now simulated in a growing number of climate models (Takahashi 1996, Scaife et al. 2000, Giorgetta et al. 2002)"

One form of a gravity wave is the lunar tide.

The other factor is that the medium has a property such that its maximal impact coincides with a specific calendar month.

And that is likely the same mechanism for ENSO. The two behaviors, QBO and ENSO, just happen to have the same underlying forcing mechanism. However, unless we can first get some buy-in to the physical premise, there may not be a heck of a lot more we can do to change minds.

That's what is fun about doing this. We are trying to topple the apple cart of the scientific establishment's current understanding. What's also sweet about this is that the main target is likely Richard Lindzen (yes, that Lindzen, notable AGW denier), who has the original theory of QBO, which apparently has nothing to do with lunar tides.

[*] The caveat is that there are papers that propose this effect. I wrote about it [here on my blog](http://contextearth.com/2014/08/15/change-of-tide-in-thought/) over a year ago.

See Li, G., Zong, H., & Zhang, Q. (2011). 27.3-day and average 13.6-day periodic oscillations in the Earth’s rotation rate and atmospheric pressure fields due to celestial gravitation forcing. Advances in Atmospheric Sciences, 28, 45-58.

Krahenbuhl, D. S., Pace, M. B., Cerveny, R. S., & Balling, R. C. (2011). Monthly lunar declination extremes' influence on tropospheric circulation patterns. Journal of Geophysical Research: Atmospheres (1984–2012), 116(D23).

Suppose there was jitter on a periodic waveform that tended to align the peaks toward 1/2 year dates. In terms of the frequency power spectrum, this jitter will not change the positions of the component peaks, but will impact their magnitude. As long as the jitter doesn't have long-range order, this is a commonly used approximation.

This paper [1] shows a "re-analyzed" Kelvin wave that shows jitter that also appears to align on 2, 2.5, and 3 year intervals, but staying near the 2.33 periodicity over the long term. You can see that with the 7 year intervals

[1]Y.-H. Kim and H.-Y. Chun, “Momentum forcing of the quasi-biennial oscillation by equatorial waves in recent reanalyses,” Atmospheric Chemistry and Physics, vol. 15, no. 12, pp. 6577–6587, 2015.

Note the comment #6 near the top of this thread, where I stated the groupings of (2+2+3) + (2+2.5+2.5) + (2+2+3), which do appear above! (not in that order though)

I think this supports the idea that the 2.33 year period is emerging as a result of the constructive interference of the Draconic tidal cycle with the yearly cycle. There could be as well interference with the 1/2 year cycle.

Yet since the aliased Draconic period is 2.368 years as opposed to 2.333 years, this alignment will gradually get out of sync after ~29 cycles. That is, if we believe that 2.368 is the actual forcing driver.

But if this yearly interference didn't exist, the lunar Draconic forcing would probably not have emerged. That's why the theory has some staying power, even in the light of these seasonal barriers.
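As a rough check on the numbers above, the folding of a lunar month against the yearly cycle can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the month lengths (27.212 and 29.531 days) are the ones quoted earlier in this thread, and "out of sync" is read here as the accumulated offset against the 7/3 year pattern reaching one full year.

```python
import math

DAYS_PER_YEAR = 365.25

def aliased_period_years(month_days):
    """Period (in years) seen when a lunar cycle is sampled once per year."""
    cycles_per_year = DAYS_PER_YEAR / month_days
    # Whole cycles per year are invisible to yearly sampling; keep the fraction.
    # This mirrors the "subtract N x 2*pi" folding described in the thread.
    folded = cycles_per_year - math.floor(cycles_per_year)
    return 1.0 / folded

def cycles_until_one_year_slip(period_a, period_b):
    """Number of cycles before two nearby periods drift apart by one year."""
    return 1.0 / abs(period_a - period_b)

draconic = aliased_period_years(27.212)   # about 2.37 years
synodic = aliased_period_years(29.531)    # about 2.71 years
slip = cycles_until_one_year_slip(draconic, 7.0 / 3.0)  # roughly 29 cycles

print(f"draconic alias {draconic:.3f} yr, synodic alias {synodic:.3f} yr")
print(f"one-year slip against 7/3 yr after about {slip:.0f} cycles")
```

Under these assumptions the folded Draconic period lands close to the 2.368 years quoted above, and the slip against a strict 3-cycles-in-7-years pattern takes on the order of 29 cycles.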


I am adding a web server app to do automated ENSO model fitting. Here is an example of a model that uses the QBO-folded tidal components, long-range tidal factors, and a Chandler wobble component.

The fitting interval is rather narrow, from 1995 to 2013. You can see that it is a very tight fit in this interval, yet when the model is back-extrapolated to 1880, the agreement over the validation interval from 1880 to 1980 is impressive. Most of the peaks and valleys line up, often with the right amplitude. The model flipped the sign of the broad El Niño valley centered at 1941, which is indicated by the up arrow. The interval from 1980 to 1995 is also out of phase.

What I am going to do is provide interactive options for users to adjust the model periods and fitting intervals. Hopefully they can use the model to find other interesting correlations.

The current parameter set is

1. Long term tidal periods of 9.03, 18.613, and 4.424 years
2. Chandler wobble period of 6.48 years
3. Folded tidal periods of 2.245, 2.091, 2.765 (Synodic) years averaging to the Draconic 2.368 years along with a strong 2.33 year period
4. A lunar spin-orbit coupling of 2.9 years

Surfers know the 9-year cycle well http://www.severn-bore.co.uk/bore_cycle.html

My premise is that the standing wave in the Pacific Ocean known as the Southern Oscillation is caused by sloshing of the thermocline and this sloshing is incredibly sensitive to any kind of angular momentum changes in the earth's rotation. And because of the large inertial mass of the ocean, the long-term periods have a greater observable impact than the short-term parameters.

And the same thing holds for the standing wave in the stratosphere known as the Quasi-Biennial Oscillation. But in this case, parameters 3 and 4 are more important, as the response time of the low-density stratosphere favors shorter periods due to its smaller inertial mass. That is also why the QBO appears less erratic, since the long term forcing factors won't introduce complex beat patterns.

I am running a multiple linear regression of several aliased tidal periods on the QBO from 1953 to the present, and using the same factors in a multiple linear regression on ENSO from 1880 to 1980. The agreement in the scaling of the common factors is striking.

ENSO 2nd derivative

| Term | Estimate | Std. Error | t value | Pr(>\|t\|) | Signif. |
|---|---|---|---|---|---|
| `msS[1:1200]` | 0.393387 | 0.076489 | 5.143 | 3.17e-07 | `***` |
| `msC[1:1200]` | 0.300668 | 0.076635 | 3.923 | 9.24e-05 | `***` |
| `s9S[1:1200]` | -0.155704 | 0.076556 | -2.034 | 0.042190 | `*` |
| `s9C[1:1200]` | -0.014996 | 0.076896 | -0.195 | 0.845411 | |
| `diS[1:1200]` | 0.010538 | 0.076490 | 0.138 | 0.890446 | |
| `diC[1:1200]` | 0.005264 | 0.077649 | 0.068 | 0.945965 | |
| `q4S[1:1200]` | 0.288473 | 0.076631 | 3.764 | 0.000175 | `***` |
| `q4C[1:1200]` | 0.737927 | 0.076501 | 9.646 | < 2e-16 | `***` |
| `qmS[1:1200]` | 0.267377 | 0.078113 | 3.423 | 0.000641 | `***` |
| `qmC[1:1200]` | -1.291226 | 0.078034 | -16.547 | < 2e-16 | `***` |
| `q1S[1:1200]` | 0.802413 | 0.076488 | 10.491 | < 2e-16 | `***` |
| `q1C[1:1200]` | -0.085617 | 0.076619 | -1.117 | 0.264033 | |
| `cwS[1:1200]` | 0.419867 | 0.076419 | 5.494 | 4.81e-08 | `***` |
| `cwC[1:1200]` | -0.008576 | 0.076778 | -0.112 | 0.911085 | |
| `sdS[1:1200]` | -0.476016 | 0.076555 | -6.218 | 6.99e-10 | `***` |
| `sdC[1:1200]` | 0.208593 | 0.076419 | 2.730 | 0.006436 | `**` |
| `qsS[1:1200]` | 0.808250 | 0.077621 | 10.413 | < 2e-16 | `***` |
| `qsC[1:1200]` | 0.220302 | 0.077570 | 2.840 | 0.004589 | `**` |
| `spS[1:1200]` | -0.754940 | 0.077362 | -9.759 | < 2e-16 | `***` |
| `spC[1:1200]` | 1.306538 | 0.077567 | 16.844 | < 2e-16 | `***` |
| `q2S[1:1200]` | 0.794521 | 0.076619 | 10.370 | < 2e-16 | `***` |
| `q2C[1:1200]` | 0.487892 | 0.076766 | 6.356 | 2.97e-10 | `***` |
| `q3S[1:1200]` | 0.831220 | 0.076436 | 10.875 | < 2e-16 | `***` |
| `q3C[1:1200]` | -0.547102 | 0.076755 | -7.128 | 1.78e-12 | `***` |
| `qbS[1:1200]` | -2.246223 | 0.077833 | -28.859 | < 2e-16 | `***` |
| `qbC[1:1200]` | 0.484379 | 0.078057 | 6.205 | 7.55e-10 | `***` |

QBO

| Term | Estimate | Std. Error | t value | Pr(>\|t\|) | Signif. |
|---|---|---|---|---|---|
| `msS[877:1596]` | -1.26202 | 0.22753 | -5.547 | 4.15e-08 | `***` |
| `msC[877:1596]` | -1.56597 | 0.22743 | -6.886 | 1.29e-11 | `***` |
| `s9S[877:1596]` | 0.22970 | 0.22723 | 1.011 | 0.312445 | |
| `s9C[877:1596]` | 0.42552 | 0.22646 | 1.879 | 0.060659 | `.` |
| `diS[877:1596]` | 0.19315 | 0.23125 | 0.835 | 0.403888 | |
| `diC[877:1596]` | -0.03392 | 0.22352 | -0.152 | 0.879440 | |
| `q4S[877:1596]` | 0.45803 | 0.22857 | 2.004 | 0.045470 | `*` |
| `q4C[877:1596]` | 0.16178 | 0.22860 | 0.708 | 0.479365 | |
| `qmS[877:1596]` | -0.82360 | 0.22496 | -3.661 | 0.000270 | `***` |
| `qmC[877:1596]` | -3.20132 | 0.22533 | -14.207 | < 2e-16 | `***` |
| `q1S[877:1596]` | -0.69555 | 0.22555 | -3.084 | 0.002125 | `**` |
| `q1C[877:1596]` | -0.98784 | 0.22509 | -4.389 | 1.32e-05 | `***` |
| `cwS[877:1596]` | 0.37967 | 0.22720 | 1.671 | 0.095156 | `.` |
| `cwC[877:1596]` | -0.11175 | 0.22602 | -0.494 | 0.621151 | |
| `sdS[877:1596]` | -0.03667 | 0.22830 | -0.161 | 0.872449 | |
| `sdC[877:1596]` | 1.46746 | 0.22857 | 6.420 | 2.53e-10 | `***` |
| `qsS[877:1596]` | 0.72815 | 0.22558 | 3.228 | 0.001306 | `**` |
| `qsC[877:1596]` | -3.11850 | 0.22413 | -13.914 | < 2e-16 | `***` |
| `spS[877:1596]` | 1.81587 | 0.22445 | 8.090 | 2.67e-15 | `***` |
| `spC[877:1596]` | 0.86197 | 0.22583 | 3.817 | 0.000147 | `***` |
| `q2S[877:1596]` | 0.07538 | 0.22677 | 0.332 | 0.739683 | |
| `q2C[877:1596]` | 1.14416 | 0.22582 | 5.067 | 5.20e-07 | `***` |
| `q3S[877:1596]` | -0.08424 | 0.22747 | -0.370 | 0.711238 | |
| `q3C[877:1596]` | 1.57475 | 0.22771 | 6.916 | 1.06e-11 | `***` |
| `qbS[877:1596]` | 3.58416 | 0.22578 | 15.875 | < 2e-16 | `***` |
| `qbC[877:1596]` | -6.14774 | 0.22424 | -27.416 | < 2e-16 | `***` |

This looks like many more factors than there actually are. In the multiple linear regression approach, for a particular sinusoidal factor, the Sin and Cos are considered independently so as to get the correct phase. So the character S or C is appended to the factors to distinguish between the Sin and Cos factor.

Anything with 3 asterisks (***) has a high statistical significance. If either a Sin or Cos has 3 asterisks, that is a strong contributing factor. (If both do, that only means it happens to be a linear combination)
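To make the Sin/Cos bookkeeping concrete, here is a minimal Python sketch. It fits a single sinusoid of known period by least squares and then combines the two coefficients into one amplitude and phase. The helper names and the synthetic test signal are assumptions made for illustration; this is not the R code that produced the tables above. The last lines apply the same conversion to the qbS/qbC estimates quoted above.

```python
import math

def fit_sin_cos(times, values, period):
    """Least-squares fit of values ~ a*sin(w*t) + b*cos(w*t) for a known period."""
    w = 2.0 * math.pi / period
    s = [math.sin(w * t) for t in times]
    c = [math.cos(w * t) for t in times]
    # Entries of the 2x2 normal equations for the two regressors.
    ss = sum(x * x for x in s)
    cc = sum(x * x for x in c)
    sc = sum(x * y for x, y in zip(s, c))
    sy = sum(x * y for x, y in zip(s, values))
    cy = sum(x * y for x, y in zip(c, values))
    det = ss * cc - sc * sc
    a = (sy * cc - cy * sc) / det
    b = (cy * ss - sy * sc) / det
    return a, b

def amplitude_phase(a, b):
    """Rewrite a*sin(w*t) + b*cos(w*t) as R*sin(w*t + phi)."""
    return math.hypot(a, b), math.atan2(b, a)

# Synthetic monthly series (hypothetical): amplitude 3, phase 0.7 rad, period 2.33 yr.
PERIOD = 2.33
times = [m / 12.0 for m in range(720)]
signal = [3.0 * math.sin(2.0 * math.pi / PERIOD * t + 0.7) for t in times]
a, b = fit_sin_cos(times, signal, PERIOD)
amp, phase = amplitude_phase(a, b)

# Same conversion applied to the qb rows of the two tables above.
enso_qb = amplitude_phase(-2.246223, 0.484379)
qbo_qb = amplitude_phase(3.58416, -6.14774)

print(f"recovered amplitude {amp:.3f}, phase {phase:.3f} rad")
print(f"qb amplitude: ENSO {enso_qb[0]:.3f}, QBO {qbo_qb[0]:.3f}")
```

The combined amplitude is what indicates how strongly a period contributes, which is why a single highly significant Sin or Cos term is enough to mark a factor as important.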

The factors that do not add to the fit have periods greater than 4 years, and so will contribute to the actual ENSO (not the second derivative). The QBO is not as sensitive to these long-period forcings, as the stratosphere has a much smaller inertia and so responds primarily to the faster tidal factors.

Both QBO and ENSO fits have correlation coefficients greater than 0.8, which is actually difficult to achieve with non-trending pseudo-oscillating data.

The QBO plot is truncated up to 1953 because data for QBO is not available until after this date.

I expect to hear some comments to the effect that "With four parameters I can fit an elephant", which is OK by me.

This may need to be changed to "With the same nine parameters I can fit an elephant and a hummingbird". Scientists have always thought that ENSO (the elephant) and QBO (the hummingbird) differed behaviorally. That is not so true as they appear to respond to a common set of tidal factors.


Couple of questions:

(1) Why use the second (I presume time) derivative of the ENSO? And that's related to ...

(2) What's the form of the design matrix used in the regression, sometimes also called the factors matrix? It would have rows corresponding to observations and columns corresponding to things like _q1s_ and _q1c_.

(3) Can I get a copy of the raw design matrix and responses used for the fit? It would be good to have the raw _q1_, _s9_, _sd_, etc. Are these normalized in some way so it makes sense to extract the trigonometric projections?

(4) Regarding "(3)", if these are lags, how's coincidence (zero lag) determined?

A quick look at the literature (mind, I'm no meteorologist or geophysicist or climatologist) indicates a _lot_ of work on connections between QBO and ENSO, e.g., http://www.nature.com/ngeo/journal/v7/n5/full/ngeo2138.html (I don't have a copy because I don't subscribe) and its figures [1](http://www.nature.com/ngeo/journal/v7/n5/fig_tab/ngeo2138_F2.html), then [2](http://staff.ucar.edu/browse/people/14577/OSGC-000-000-010-862), then [3](http://www-das.uwyo.edu/~geerts/cwx/notes/chap11/qbo_enso.html), then [4](http://www.eps.jhu.edu/~cig4/dynamicsseminar12507.pdf), then [5](http://eprints.ucm.es/25544/1/calvofernandez10libre.pdf), and the paper [6](http://rammb.cira.colostate.edu/resources/docs/gray%20et%20al%201992.pdf) from the famous team of Gray, Sheaffer, and Knaff from 1992. So, I wouldn't be surprised if there's a link. Is the link stronger than you suspected, or people suspect?

Jan, I use the second derivative because that exposes more directly the forcing factor, F(t)

$ f''(t) + \omega_0^2 f(t) = F(t) $

The factors matrix is flat, with these periods (in years) all contributing equal potential power, but the regression returns a weighting and phase.

- s9 is 9.08 -- metonic cycle
- di is 18.63 -- diurnal tide
- q4 is 4.06 -- close to aliased Anomalistic month 3.93
- qm is 2.245 -- strong sideband of main QBO period of 2.33 years
- q1 is 1.745 -- close to 1/(1-1/2.33) aliasing residue
- cw is 6.48 -- Chandler wobble period
- sd is 8.848/2 -- semidiurnal tide
- qs is 2.763 -- close to aliased Synodic cycle 2.71
- sp is 2.9 -- spin-orbit coupling cycle Moon and Earth
- q2 is 2.09 --
- q3 is 3.52 -- the third strongest tide after diurnal and semidiurnal
- qb is 2.329 -- close to aliased Draconic cycle
- ms is 1.93 -- MSm tide

I quickly made this chart just for you, so you can see the final weightings of the factors.

s9 and di have very low weightings for the second derivative but are much more important for the f(t).

The concept of lags may not mean anything at this point because there is no reference point to attach it to. Ideally, what I would want to have is a fully calculated luni-solar gravitational pull time series that comes from the JPL ephemeris calculations. What I am doing above is trying to create a first-order approximation to this time series, which will hopefully identify the primary factors.

Yes, I have seen many of the papers linking QBO and ENSO, but that's as far as it goes.

This is a nice recent summary (2014) of the major and minor tidal periods

R. D. Ray and S. Y. Erofeeva, “Long‐period tidal variations in the length of day,” Journal of Geophysical Research: Solid Earth, vol. 119, no. 2, pp. 1498–1509, 2014.

The machine learning found a clear 3.52 year period, which is very close to the listed 1305 day period - 1305/365.25 = 3.57 years.

The following paper considers this cycle as part of the "Lp group", four of the main long periodic tides (9.3 yr and 1305, 205 and 121 days):

S. Loyer, J. Hinderer, and J.-P. Boy, “Determination of the gravimetric factor at the Chandler period from Earth orientation data and superconducting gravimetry observations,” Geophysical Journal International, vol. 136, no. 1, pp. 1–7, 1999.

BTW, the paper that you reference is humorous in how they show correlation between ENSO and QBO.

In the upper panel, they really cherry-pick a five year interval where the two measures clearly correlate. Unfortunately over a longer interval, the correlation is *not at all* visibly apparent. Kind of wonder how that escaped peer review. Then again, one can always bring this up to show how we "outsiders" have to work extra hard to get acceptance. All the climate "insiders" have to do is show their affiliation and write a "just-so" story as a narrative. :)

Thank you for all this great information. It will take me quite some time to digest it all.

BTW, while you may have "found humorous" how Neu, et al "demonstrated" a correlation, I found the claim "The machine learning found a clear 3.52 year period, which is very close to the listed 1305 day period - 1305/365.25 = 3.57 years." both hilarious and a textbook case of worshipping at the altar of black box methods.

Cheerio!


This paper [1] has an interesting observation relating QBO and ENSO.

[1]B. F. Chao and I. Naito, “Wavelet analysis provides a new tool for studying Earth’s rotation,” Eos, Transactions American Geophysical Union, vol. 76, no. 16, pp. 161–165, 1995.

The phase relationship is obvious when the 2nd-derivative of ENSO (top) is compared to QBO (bottom). The first vertical red line in the figure connects the two. At the first turquoise region, you can see a phase shift in the waveform, after which the phase flips by 180 degrees with respect to the model reference. It is then in-phase with the QBO (second vertical red line) for approximately 15 years. Then at the second turquoise region, the phase shifts and then the data reverts to following the model reference, which is out of phase with the QBO.

The model reference was generated by fitting only data from 1880 to 1980 -- anything subsequent to 1980 is extrapolated as a continuation of the sinusoidal forcing functions.

The working explanation for this behavior is that the standing wave in the Pacific Ocean which manifests as ENSO is metastable with respect to its initial excursion. So when the standing wave crosses zero on its way to changing polarity, it transiently has no inertia and is therefore sensitive to a forcing that could reverse its intended direction. In this case, the hypothesis is that some sort of climate and/or geophysical shift occurred around 1980 to cause the reversal. This was perhaps not the lowest energy configuration, however, and it then reversed direction back to its original phase by 1996.


I'm curious: for the sloshing model introduced in the Introduction of "Sloshing Model for ENSO", has there been an attempt, perhaps written up separately because of lack of space in the primary paper, to reconcile the sloshing model with the flow dynamics and circulation of the stratified Pacific? The core principle of oceanography is stratification, due to salinity differences and differences in available potential energy, as well as limited compressibility of water in contrast to atmosphere. This stratification does not get disrupted without a commensurate input of energy, and this is believed (and calculated!) to be the reason for the long term overturning circulation in oceans. Accordingly, it seems that a sloshing model needs to rectify itself against those facts, and I'm hoping it's been done, just not written up.


Jan, the sloshing model is marginally different from the ENSO models that are of current interest and considered the standard models for ENSO. There is the Zebiak-Cane model and various delayed action oscillators, which are described on the Azimuth Project http://www.azimuthproject.org/azimuth/show/ENSO

But those IMO are complicated by the fact that they add nonlinear terms -- I think on the grounds that someone wanted to see chaotic behavior. What happens if you remove the nonlinear terms is that you end up with the basic 2nd-order differential equation called the wave equation, which is the one I have been using recently. The Mathieu sloshing equation is a slight perturbation on the wave equation, which verges toward chaotic behavior, but is a long way from the chaos of the Zebiak-Cane type models. I started with the Mathieu (see note ** below), and frankly probably wasted a lot of time because I thought that would be my entry in weaning off a fully chaotic model.

Allan Clarke at FSU is the biggest proponent of wave equation type models for ENSO. Read this paper [1] and you can see how the derivation of the wave equation occurs. It is spelled out in its full uncomplicated glory.

This formulation is very hard to argue against because it is a simple model. But if you ever see anyone use it in the literature with periodic forcing terms applied, I would like to know about it.

Thinking about the larger-scale ocean patterns is not on my radar as I am focused on this single standing wave mode.

[1] A. J. Clarke, S. Van Gorder, and G. Colantuono, “Wind stress curl and ENSO discharge/recharge in the equatorial Pacific,” Journal of physical oceanography, vol. 37, no. 4, pp. 1077–1091, 2007.

** I remember reading an aside in one article where the authors dismissed a Mathieu equation type of formulation because the geometry of a spherical earth does not allow the boundary conditions necessary for that kind of solution.

Is it possible that phase reversals in a standing wave can occur via this mechanism? Consider the spatio-temporal standing wave specified by

$ f(x,t) = \sin(\omega t) \sin(k x) $

The zeroed boundary conditions are met by setting the wave number $k$ to $2\pi/L$, which means that both $2\pi/L$ and $-2\pi/L$ are valid, and so the sign of $f(x,t)$ can conceivably change with a forcing perturbation localized in space.
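
A quick numerical check of the sign argument, as a minimal sketch: the length `L`, the frequency `omega`, the time `t`, and the sample points are arbitrary illustrative values, not parameters from any ENSO fit. It only confirms that replacing $k$ by $-k$ keeps the zero boundary values at $x=0$ and $x=L$ while flipping the sign of $f(x,t)$ everywhere. It says nothing about whether a localized forcing would actually trigger such a flip.

```python
import math

def standing_wave(x, t, k, omega):
    """f(x, t) = sin(omega * t) * sin(k * x)."""
    return math.sin(omega * t) * math.sin(k * x)

# Illustrative values only (assumptions, not fitted parameters).
L = 1.0
omega = 2.0 * math.pi / 2.33   # one cycle per 2.33 time units
k = 2.0 * math.pi / L

t = 0.4
xs = [i * L / 8 for i in range(9)]   # 9 points from x = 0 to x = L

f_pos = [standing_wave(x, t, k, omega) for x in xs]
f_neg = [standing_wave(x, t, -k, omega) for x in xs]

# Both wave numbers give zero at the two boundaries ...
boundary_ok = all(abs(f[0]) < 1e-12 and abs(f[-1]) < 1e-12 for f in (f_pos, f_neg))
# ... and the two solutions are mirror images in sign at every point.
sign_flipped = all(abs(a + b) < 1e-12 for a, b in zip(f_pos, f_neg))

print(boundary_ok, sign_flipped)
```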

So, again, what would you like me to look at? If, indeed, only the "single standing wave mode" is being modeled, what's the estimate of the size of the unexplained residual?

The questions aren't necessarily directed to anyone in particular. I am tossing out these ideas so I don't lose them.

I think discussions about residuals have to start with the inherent noise in the system. For example, the SOI is a measure of the dipole difference between Tahiti and Darwin, but Tahiti ≠ -Darwin, and so that tells us something about the noise in the system. I recall the highest correlation coefficient that I could achieve between NINO34 and SOI was about 0.86 -- and these are both considered indices of ENSO. So trying to explain a residual is confounded by uncertainties in the data and by variations in the weather.

I am not even sure if a correlation coefficient is the best error criterion. I know it is not squared error or absolute error, which do really poorly on time series profile shapes. I have been experimenting with a metric that detects same-sign excursions.

~~~
%%   excursion(+X, +Y, -R)
%
%    Excursion match of two arrays
%
excursion(X, Y, R) :-
    mean(X, XM),
    XOff is -XM,
    mean(Y, YM),
    YOff is -YM,
    DXM mapdot XOff .+ X,
    DYM mapdot YOff .+ Y,
    DXE mapdot sign ~> DXM,
    DYE mapdot sign ~> DYM,
    Num dot DXE*DYE,
    length(X, N),
    R is ( 1 + Num/N)/2.
~~~

I have thought that the only way to evaluate the model is to compare it to another model and use something like AIC or BIC to evaluate.

1. Establishing a plausible physical model
2. Determining the parameter values of the model based on known characteristics
3. Figuring out an information metric for the model parameters and result
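
For readers without the custom Prolog operators used above, here is a minimal Python sketch of the same idea. It follows the structure of the `excursion` predicate (remove each mean, take signs, average the product of signs, rescale to [0, 1]); the function names and the toy series are illustrative assumptions.

```python
def sign(v):
    """Return -1, 0, or +1 according to the sign of v."""
    return (v > 0) - (v < 0)

def excursion(x, y):
    """Same-sign excursion score in [0, 1] for two equal-length series.

    1.0 means the mean-removed series always share a sign,
    0.0 means they always have opposite signs.
    """
    if len(x) != len(y) or not x:
        raise ValueError("series must be non-empty and of equal length")
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Dot product of the sign sequences of the mean-removed series.
    num = sum(sign(a - x_mean) * sign(b - y_mean) for a, b in zip(x, y))
    return (1 + num / n) / 2

a = [1.0, 3.0, -2.0, -4.0, 2.0]          # toy series with zero mean
print(excursion(a, a))                   # identical series
print(excursion(a, [-v for v in a]))     # sign-reversed series
print(excursion(a, [10 * v for v in a])) # same shape, different amplitude
```

Because only signs enter the score, it is insensitive to amplitude, which is the property that distinguishes it from squared or absolute error.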

In general, comparing time series is problematic, both because it's hard to find a good divergence measure and because, once such a measure is in hand, it is in general itself correlated in time. Time series can differ in phase as well as in amplitude, even if they are standardized. The most comprehensive treatment of such comparisons comes, I believe, from the seminal work of Sugihara and May in 1990, described and applied thereafter in a series of papers by their students. It began with G. Sugihara, R. M. May, "Nonlinear forecasting as a way of distinguishing between chaos and measurement error in time series", _Nature_, 344(6268), 734-741, 1990, and http://dx.doi.org/10.1098/rsta.1994.0106, and ended up in things like http://dx.doi.org/10.1126/science.1227079, and http://dx.doi.org/10.1890/14-1479.1 (also http://www.esajournals.org/doi/pdf/10.1890/14-1479.1), http://dx.doi.org/10.1371/journal.pone.0018295, and even http://www.pnas.org/content/96/25/14210.full.pdf. There's also software available to do this, per https://cran.r-project.org/web/packages/multispatialCCM/multispatialCCM.pdf.

_Updated 14th September 2015_: Maher and Hernandez (2015), CauseMap: fast inference of causality from complex time series. PeerJ 3:e824; DOI [10.7717/peerj.824](http://dx.doi.org/10.7717/peerj.824). There is also a [pre-publication review of the same](https://publons.com/review/78979/). It has also come to my attention that [Corey Chivers](http://bayesianbiologist.com/), from whom I learn a lot, has an _R_ [version in the works](https://github.com/cjbayesian/rccm) and that there [is a Python version](https://pypi.python.org/pypi/pyccm).

Interactive prototype of ENSO model, BabyD style, i.e. DiffEq is not actually solved but approximated as a Fourier series expansion of main sinusoidal factors.

http://entroplet.com/context_enso/navigate

Click on "graph" to see results. This is a prototype which will be made more general so ask if you have questions or suggestions.

This is the kind of analysis one can do with the interactive graph linked to above:

![val](http://imageshack.com/a/img537/3528/M6XySw.png)

The training interval is set from 1930 to 1980 and the validation interval is 1880 to 1930.

This is a very aggressive set of parameters but the truth is in there somewhere.

I can imagine integrating this with the [CSALT model of AGW](http://contextearth.com/2013/10/26/csalt-model/) I developed a couple of years ago.

In http://entroplet.com/context_enso/navigate, what does "log_model" mean in the table? I would guess this is some kind of log-likelihood, but is there documentation of the specific definition? Also, do I understand correctly that the modeling involves assessing components using Fourier transforms to obtain spectral components? Are these windowed, and do the windows slide? Windowing is needed to suppress spectral leakage. Sliding windows are typically used when spectra are non-stationary. Is there a justification for believing the spectra are stationary? What is the number of taps used in the transform? In the figure just above, when "Validation" is mentioned, what does that mean, precisely? I assume that's a hold-out validation, but how many folds were used? One? More?

I'm just trying to understand what these figures could mean. Are they simply comparisons of various climate-derived measures and series against one another? Are there any structural elements in these models, e.g., physics of oceans?

Jan, Some of that is part of the CSALT model -- the reference to log_model is a factor concerning log(CO2), which is the AGW forcing parameter. Ignore that for now, but eventually it will get incorporated into a larger model.

I don't typically use Fourier transforms to root out the spectral components. Those were obtained (1) from using exploratory machine learning via Eureqa and (2) from the knowledge of main tidal periods, wobble periods, etc. I am avoiding tweaking these values too much because that may lead to overfitting.

The validation is that a set of frequencies is used on the training interval and then the constructed waveform is extended, i.e. extrapolated, across the validation interval.

Yes, the goal is to determine whether the time series is stationary, see my comment #13 above.

This model is first-order physics of the sloshing of the ocean, following Allan Clarke's derivation of the wave equation described in comment #35.

In the differential equation expansion mode, I compare the data on the left-hand side (LHS) against the forcing on the right-hand side (RHS):

$ \mathrm{LHS}(t) \leftarrow f''(t) + \omega_0^2 f(t) = B \cdot \mathrm{cw}(t) + C \cdot \mathrm{tide}(t) \rightarrow \mathrm{RHS}(t) $

where the LHS is the wave-equation-transformed ENSO data (observe the noise in the signal), and the RHS is an empirical model of the known forcing factors. What this interactive application does is determine which combination of forcing input parameters best matches the forced response. I imagine that many people have done this as part of researching ENSO, but I have yet to find anything in the peer-reviewed literature.

I will add the parameters to the input so those can be tweaked to see what impact it has on the fit.
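
To make the LHS transform concrete, here is a minimal sketch that evaluates $f''(t) + \omega_0^2 f(t)$ on an evenly sampled series with a central second difference. The natural period of 4 years, the 2.33-year test sinusoid, and the function name are assumptions chosen for illustration; with real data, `f` would be the smoothed ENSO index and the result would be compared against the fitted forcing terms.

```python
import math

def wave_equation_lhs(f, dt, omega0):
    """Approximate f''(t) + omega0**2 * f(t) at the interior samples.

    Uses the central second difference, so the result is two samples
    shorter than the input (the end points are dropped).
    """
    return [
        (f[i - 1] - 2.0 * f[i] + f[i + 1]) / dt**2 + omega0**2 * f[i]
        for i in range(1, len(f) - 1)
    ]

dt = 1.0 / 12.0                     # monthly sampling, in years
omega0 = 2.0 * math.pi / 4.0        # hypothetical natural period of 4 years
omega = 2.0 * math.pi / 2.33        # hypothetical response period of 2.33 years

t = [i * dt for i in range(240)]    # 20 years of monthly samples
f = [math.sin(omega * ti) for ti in t]

lhs = wave_equation_lhs(f, dt, omega0)
# For a pure sinusoid the exact answer is (omega0^2 - omega^2) * sin(omega t).
expected = [(omega0**2 - omega**2) * math.sin(omega * ti) for ti in t[1:-1]]
max_err = max(abs(a - b) for a, b in zip(lhs, expected))
print(max_err)   # small discretization error from the finite difference
```

On noise-free input the only discrepancy is the finite-difference error; on real data the second difference amplifies noise strongly, which is why smoothing before differentiating matters.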

The premise that I am working from is that the response follows from the solution to the wave equation for a largely unknown set of inputs.

It is hard to discern what the inputs may be, but remember that we have two separate sets of measures that we can evaluate against: (1) the ENSO standing wave and (2) the QBO standing wave.

That's like having two different electrical circuits to discriminate against given a common unknown forcing. The additional constraints will narrow the forcing solution possibilities.

And there may be more of these measures available to check against -- for example, tidal gauge data shows a strong indicator for an ENSO signature

http://contextearth.com/2014/09/21/an-enso-predictor-based-on-a-tide-gauge-data-model/

also here on Azimuth

https://forum.azimuthproject.org/discussion/comment/12573/#Comment_12573

That particular tidal gauge dataset is the first I tried since it appeared to be one of the longer time series available. There are lots more of these available.

My feeling is that this is a rich environment for analysis that curiously has not been tapped.

Thank you so much for the details in #43 and #44! I'm going to see if I can adapt the PODES work, per [Bayesian uncertainty quantification for differential equations](http://andrewgelman.com/2014/04/29/bayesian-uncertainty-quantification-differential-equations/) and [Bayesian Analysis of ODE's: solver optimal accuracy and Bayes factors](http://arxiv.org/abs/1311.2281), and perhaps [methods by Soetaert and Herman](https://cran.r-project.org/web/packages/ecolMod/index.html) and [from their book](http://www.springer.com/us/book/9781402086236).

Minor question: How do you calculate the second derivative of ENSO? Generally speaking it's good to do it by smoothing with some kernel first. This leads to things like "Hessian of Gaussian" kernels (here in 1D) which are convolved with the series to obtain the requisite second derivative.

Thanks for the question. I think doing the second derivative correctly is part of the secret recipe to this analysis. I'm sure many people have taken the 2nd derivative of the raw ENSO and just about gagged when they noticed how noisy it is. Yet, there is a REAL signal inside that noise.

Nick Stokes at the Moyhu blog showed how one can either filter before calculating the 2nd derivative or delay the filtering step and do it after. I do it before using something that approximates a Gaussian filter -- a triple filter with staggered window widths, say 7+5+3 months. I try to keep that as minimal as possible, just enough to remove the seasonal noise.

I have a "kind of proof" that this works -- with the forcing solution it approximately finds, I then apply the result and do the double integration using Mathematica. It ends up with something that closely approximates the smoothed ENSO. This is actually a pretty cool pre-conditioning step in solving a DiffEq.

As a bottom line, the assumption is that the real ENSO effects are on scales greater than 1 year, and that variations within a year are caused by weather and typhoon activity that is more-or-less random. The latter is all that we want to filter out. Yet, as a caveat, there may be something interesting at the monthly level that we can also eventually root out.
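
The filter-then-differentiate step can be sketched as follows. The 7+5+3 month window widths come from the description above; the use of plain centered running means, the central second difference, and the synthetic test series (a 4-year sinusoid plus Gaussian noise) are assumptions made for illustration, not a reproduction of the actual processing.

```python
import math
import random

def running_mean(series, width):
    """Centered moving average with an odd window width.

    Only positions where the full window fits are returned, so the
    output is (width - 1) samples shorter than the input.
    """
    if width < 1 or width % 2 == 0:
        raise ValueError("width must be a positive odd integer")
    return [
        sum(series[i:i + width]) / width
        for i in range(len(series) - width + 1)
    ]

def triple_filter(series, widths=(7, 5, 3)):
    """Cascade of running means; roughly approximates a Gaussian smoother."""
    out = list(series)
    for w in widths:
        out = running_mean(out, w)
    return out

def second_derivative(series, dt):
    """Central second difference of an evenly sampled series."""
    return [
        (series[i - 1] - 2.0 * series[i] + series[i + 1]) / dt**2
        for i in range(1, len(series) - 1)
    ]

# Synthetic monthly series: slow oscillation plus sub-annual noise.
rng = random.Random(0)
dt = 1.0 / 12.0
n = 600
noisy = [
    math.sin(2.0 * math.pi * i * dt / 4.0) + rng.gauss(0.0, 0.2)
    for i in range(n)
]

raw_d2 = second_derivative(noisy, dt)
smooth_d2 = second_derivative(triple_filter(noisy), dt)
raw_max = max(abs(v) for v in raw_d2)
smooth_max = max(abs(v) for v in smooth_d2)
print(raw_max, smooth_max)   # the unfiltered second derivative is far noisier
```

The cascade shortens the series by 12 samples in total (6 + 4 + 2), so the smoothed series has to be re-aligned with the original time axis before comparing it to any forcing.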

I did some searching and could again find very little on why the low-pass-filtered sea-level height anomaly (SLHA) shows an obvious exact 2-year periodic variation over spans of 50 years. It is not the quasi-biennial period of 2.33 years that ENSO shows.

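This is an analysis from a year ago of Sydney Harbor tide data:

https://forum.azimuthproject.org/discussion/comment/12570/#Comment_12570

![tide](http://imagizer.imageshack.us/a/img903/7049/yxkLF2.gif)

But if the 2-year period is removed from the SLHA via a crude notch filter, then the agreement with the ENSO SOI is evident:

![notch](http://imagizer.imageshack.us/a/img538/1953/itXH4n.gif)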

There are likely two modes in the SLHA, a biennial mode and an ENSO mode.

This is close to the problem described in:

Kim, Kwang-Yul, James J. O'Brien, and Albert I. Barcilon. "The principal physical modes of variability over the tropical Pacific." Earth Interactions 7.3 (2003): 1-32.

I will add this SLHA data to the entroplet server so I can let the MLR algorithm try to sort it out.
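
The comment does not say how the crude notch filter was built, so the following is a stand-in rather than the method actually used: it removes a sinusoid of known period by projecting the series onto sine and cosine terms at that period and subtracting the fit. The 24-month and 48-month components in the synthetic test series are hypothetical.

```python
import math

def remove_sinusoid(series, period):
    """Subtract the least-squares sinusoid of the given period (in samples).

    Assumes len(series) is a whole number of periods, so that the sine,
    cosine, and constant terms are mutually orthogonal and the fit
    reduces to simple projections.
    """
    n = len(series)
    if n % period != 0:
        raise ValueError("series length must be a multiple of the period")
    w = 2.0 * math.pi / period
    a = 2.0 / n * sum(y * math.sin(w * i) for i, y in enumerate(series))
    b = 2.0 / n * sum(y * math.cos(w * i) for i, y in enumerate(series))
    return [
        y - a * math.sin(w * i) - b * math.cos(w * i)
        for i, y in enumerate(series)
    ]

n = 240                                  # 20 years of monthly samples
biennial = [math.sin(2.0 * math.pi * i / 24 + 0.7) for i in range(n)]
slow = [0.5 * math.sin(2.0 * math.pi * i / 48) for i in range(n)]
mixed = [b + s for b, s in zip(biennial, slow)]

residual = remove_sinusoid(mixed, 24)    # strip the exact 2-year cycle
max_diff = max(abs(r - s) for r, s in zip(residual, slow))
print(max_diff)
```

In this idealized case the residual equals the slower component to rounding error. With real tide-gauge data the biennial amplitude and phase may drift, in which case a fixed-period subtraction like this would leave part of the cycle behind.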

Is there a description online for this "MLR algorithm"?

It is just multiple linear regression with sine and cosine terms. Everybody seems to reinvent this algorithm so I am not sure it has a name. Here is someone asking about it a couple of months ago:

http://stats.stackexchange.com/questions/163837/linear-regression-with-sine-cosine-elements

In R, it is the application of the "lm" function.

Is that enough for you to go on?
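
As a minimal sketch of what such a regression looks like, the code below fits an intercept plus a sine and cosine term for each of a fixed set of periods, trains on one interval, and extrapolates to another. The periods (2.33 and 4 years), the synthetic noise-free series, and all function names are assumptions for illustration; the intervals mirror the 1930-1980 training and 1880-1930 validation split mentioned earlier in the thread. In practice a library routine such as R's `lm` would replace the hand-written solver.

```python
import math

def design_row(t, periods):
    """Regressors at time t: intercept, then sin and cos for each period."""
    row = [1.0]
    for p in periods:
        w = 2.0 * math.pi / p
        row += [math.sin(w * t), math.cos(w * t)]
    return row

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_sinusoids(times, values, periods):
    """Ordinary least squares via the normal equations (X^T X) beta = X^T y."""
    rows = [design_row(t, periods) for t in times]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, values)) for i in range(k)]
    return solve(xtx, xty)

def predict(times, beta, periods):
    """Evaluate the fitted sinusoidal model at the given times."""
    return [sum(b * x for b, x in zip(beta, design_row(t, periods))) for t in times]

periods = [2.33, 4.0]   # hypothetical periods, in years

def truth(t):
    """Synthetic noise-free series built from the same two periods."""
    return (0.3 + 1.0 * math.sin(2.0 * math.pi * t / 2.33)
            + 0.5 * math.cos(2.0 * math.pi * t / 4.0))

train_t = [1930.0 + i / 12.0 for i in range(600)]   # 1930 to 1980, monthly
valid_t = [1880.0 + i / 12.0 for i in range(600)]   # 1880 to 1930, monthly

beta = fit_sinusoids(train_t, [truth(t) for t in train_t], periods)
valid_err = max(
    abs(p - truth(t)) for p, t in zip(predict(valid_t, beta, periods), valid_t)
)
print(beta, valid_err)
```

The extrapolation is essentially exact here only because the synthetic series is noise-free and stationary and the fitted periods match it. With real data, a good validation score is evidence for stationarity of the chosen frequencies, not a given.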
