Crunch time

135

Comments

  • 101.

    John, is it possible to send some stuff to Mr. Pollard, your grad student, so he could run it and show it to you? That way you are not bogged down looking at new stuff.
  • 102.

    I'll email you Blake's email address. Of course I have already given him plenty of work to do - he's supposed to be figuring out Susanne Still's software for optimal Markov prediction models and writing another paper on entropy increase for open Markov processes. But grad students thrive on being overworked.

  • 103.

    Thanks John, all I want him to do is show you some interactive code running; it will take only a few minutes of his time.
  • 104.
    edited November 2014

    13) The Azimuth Project has carefully examined a 2013 paper by Ludescher et al called [Very early warning of next El Niño](http://www.climatelinc.eu/fileadmin/UG_ADVANCED/Publications/BIU_-_Avi__Halvin__et_al-Very_early_warning_of_next_El_Nino.pdf), which seeks to use a climate network for El Niño prediction.

    They build their climate network using correlations of daily surface air temperature data between points in the El Niño basin and points outside this region, as shown here:

    ![](http://math.ucr.edu/home/baez/ecological/el_nino/ludescher_el_nino_cooperativity_1a.jpg)

    The red dots are the points in their version of the El Niño basin.
  • 105.

    A comment on the breaking of links by El Nino. An El Nino is like the gorilla in the room, or the bull in the china shop. It will break a lot of links simply because it overpowers everything in its path as it propagates outward.

    To my ears, what Tsonis found was obvious and fairly uninteresting. It's like reporting that an earthquake will break a lot of links with respect to the prior static conditions. Maybe I don't get it, especially when one can look at actual forcing factors such as the QBO and the [dipole (sloshing)](http://scholar.google.com/scholar?q=dipole+sloshing) mode that do correlate with El Nino. That's where the real physics is taking place, IMO.
  • 106.

    It's quite possible that everything in my talk will be obvious and fairly uninteresting; my talk is supposed to be about "climate networks" rather than physical models of the climate.

  • 107.
    edited November 2014

    I should admit that it wasn't a priori obvious to me that El Niño would cause spatially uncorrelated climate fluctuations, thus breaking climate links, rather than spatially correlated ones that make climate links. But admittedly, learning this amounts to learning roughly one bit of information.

    Anyway, once this talk is done I'll be a lot more open to exploring simplified physical models of climate, not just these "network theory" ideas. Please don't think I'm uninterested in them!

  • 108.
    edited November 2014

    (Next I will describe Ludescher's procedure. Since their procedure is rather baroque, I may omit some details in the actual talk, but let me include them here.)

    14) The main idea of Ludescher et al is to construct a climate network that is a weighted graph, and to say an El Niño will occur if the average weight of edges between points in the El Niño basin and points outside this basin exceeds a certain threshold.

    As in the other papers I mentioned, Ludescher et al let $T_i(t)$ be the surface air temperature at the $i$th grid point at time $t$ minus the average temperature at that location at that time of year, to eliminate the most obvious seasonal effects.
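
    To make the seasonal adjustment concrete, here is a minimal Python sketch, not the code Ludescher et al used. The name `seasonal_anomalies` is made up for illustration, and the sketch assumes a daily series that starts on the first day of a year, with leap days already dropped:

```python
from collections import defaultdict


def seasonal_anomalies(temps, days_per_year=365):
    """Subtract the mean for each calendar day from a daily series.

    temps: daily temperatures at one grid point, starting on day 0 of a year.
    Assumption of this sketch: leap days have been removed beforehand.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for t, value in enumerate(temps):
        day = t % days_per_year
        sums[day] += value
        counts[day] += 1
    # Average temperature at this location for each day of the year.
    climatology = {day: sums[day] / counts[day] for day in sums}
    return [value - climatology[t % days_per_year] for t, value in enumerate(temps)]
```

    Passing a smaller `days_per_year` is only useful for quick checks on toy data.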

    They consider a time-delayed covariance between temperatures at different grid points:

    $$ \langle T_i(t) T_j(t - \tau) \rangle - \langle T_i(t) \rangle \langle T_j(t - \tau) \rangle $$

    where $\tau$ is a time delay, and the angle brackets denote a running average over the last year, that is:

    $$ \langle f(t) \rangle = \frac{1}{365} \sum_{d = 0}^{364} f(t - d) $$

    where $t$ is the time in days.

    They normalize this to define a correlation $C_{i,j}^t(\tau)$.

    Next, for any pair of nodes $i$ and $j,$ and for each time $t,$ they determine the maximum, the mean and the standard deviation of $|C_{i,j}^t(\tau)|,$ as the delay $\tau$ ranges from -200 to 200 days.

    They define the link strength $S_{i,j}(t)$ as the difference between the maximum and the mean value of $|C_{i,j}^t(\tau)|$ divided by the standard deviation.
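
    Here is a rough Python sketch of the link strength for a single pair of grid points, following the steps above. Two things are assumptions of mine and not taken from this thread: the covariance is normalized by the standard deviations of the two windows, and a negative delay is handled by swapping which series is delayed. The function names are hypothetical.

```python
from statistics import mean, pstdev


def window(series, t, length):
    """Values series[t - length + 1 .. t]: the running-average window ending at t."""
    if t - length + 1 < 0:
        raise ValueError("not enough history before time t")
    return series[t - length + 1 : t + 1]


def delayed_correlation(ti, tj, t, tau, length=365):
    """Correlation of T_i(s) with T_j(s - tau) over the window ending at t (tau >= 0)."""
    a = window(ti, t, length)
    b = window(tj, t - tau, length)
    # Time-delayed covariance: <T_i T_j> - <T_i><T_j>.
    cov = mean(x * y for x, y in zip(a, b)) - mean(a) * mean(b)
    sa, sb = pstdev(a), pstdev(b)
    if sa == 0 or sb == 0:
        return 0.0
    # Assumed normalization: divide by the two window standard deviations.
    return cov / (sa * sb)


def link_strength(ti, tj, t, max_delay=200, length=365):
    """(max - mean) / standard deviation of |C(tau)| over all delays."""
    values = []
    for tau in range(-max_delay, max_delay + 1):
        if tau >= 0:
            c = delayed_correlation(ti, tj, t, tau, length)
        else:
            # Assumption: for a negative delay, series i is the delayed one.
            c = delayed_correlation(tj, ti, t, -tau, length)
        values.append(abs(c))
    spread = pstdev(values)
    if spread == 0:
        return 0.0
    return (max(values) - mean(values)) / spread
```

    With the default settings, `t` must be at least 564 so that every delayed window fits inside the data.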

    Finally, they let $S(t)$ be the average link strength, calculated by averaging $S_{i,j}(t)$ over all pairs $i,j$ where $i$ is a grid point inside their El Niño basin and $j$ is a grid point outside this basin, but still in their larger rectangle.

    Here is what they get:

    ![](http://math.ucr.edu/home/baez/ecological/el_nino/ludescher_el_nino_cooperativity_2a.jpg)

    The blue peaks are El Niños: episodes where the Niño 3.4 index is over 0.5°C for at least 5 months.

    The red line is their 'average link strength'. Whenever this exceeds a certain threshold $\Theta = 2.82,$ and the Niño 3.4 index is not already over 0.5°C, they predict an El Niño will start in the following calendar year.
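
    The two rules just stated can be written down in a few lines of Python. This is only a sketch of the rules as described here, with hypothetical function names; it assumes a monthly Niño 3.4 series and does not reproduce any further details of the paper.

```python
THRESHOLD = 2.82  # the value of Theta quoted above


def el_nino_months(nino34, level=0.5, min_run=5):
    """Flag months that belong to a run of at least min_run months above level."""
    flags = [False] * len(nino34)
    start = None
    # A sentinel below any threshold closes a run that reaches the end of the data.
    for k, value in enumerate(list(nino34) + [float("-inf")]):
        if value > level:
            if start is None:
                start = k
        else:
            if start is not None and k - start >= min_run:
                for m in range(start, k):
                    flags[m] = True
            start = None
    return flags


def alarm(avg_link_strength, nino34_now, theta=THRESHOLD, level=0.5):
    """True when the rule above predicts an El Niño for the following calendar year."""
    return avg_link_strength > theta and nino34_now <= level
```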

  • 109.

    John, I guess my point is that what Tsonis finds are negative results. The amount of uncorrelated factors is infinite and so it is not hard to uncover those.

    In my rather clumsy way, I am giving hints that the number one ENSO climate link is the oscillating dipole. When I first learned about this behavior, that one location goes up when the other location goes down and it does this over intervals lasting centuries, I said to myself, Get out, that is so cool. It is even cast in terms of network topology, i.e. [nodes and anti-nodes](http://www.physicsclassroom.com/class/waves/Lesson-4/Nodes-and-Anti-nodes).

    Perhaps you are getting to that but this is the most fundamental aspect of ENSO behavior, so that is why I am harping on it. This is the real fundamental climate link and it predated Tsonis' work. The dipole is the basic circuit=network and everything else builds off of that. It makes the topic much less mysterious as well.
  • 110.

    A word of caution: when correlation distance is used as the similarity metric for self-similarity forecasts with algorithms like k-NN, it gives larger errors than non-Euclidean metrics such as Manhattan or Chessboard distance. Looking at the results of the k-NN forecasts, I therefore suspect that, on the computational side of things, correlation in the traditional statistical sense might not be optimal for certain data.

    I will post the findings tonight.

    I suggest we think of many metrics for our forecasts and similarity algorithms and not limit ourselves to Euclidean or Correlation distance functions.
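
    For reference, here are the distance functions named in this thread, written as a plain Python sketch using their standard textbook definitions. The function names are mine, and this is not the Mathematica code used for the forecasts.

```python
import math


def manhattan(x, y):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


def chessboard(x, y):
    """Largest absolute coordinate difference (also called Chebyshev distance)."""
    return max(abs(a - b) for a, b in zip(x, y))


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def cosine_distance(x, y):
    """One minus the cosine of the angle between x and y (both must be nonzero)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / (norm_x * norm_y)


def correlation_distance(x, y):
    """Cosine distance after subtracting each vector's own mean."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    return cosine_distance([a - mx for a in x], [b - my for b in y])
```

    Note that correlation distance ignores both the level and the scale of the two vectors, while Manhattan and Chessboard distance are sensitive to both.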

  • 111.
    edited November 2014

    WebHubTel wrote:

    John, I guess my point is that what Tsonis finds are negative results. The amount of uncorrelated factors is infinite and so it is not hard to uncover those.

    I wouldn't say that. They've found a strong correlation between El Niños (i.e. the Niño 3.4 index) and the weakening of climate links (i.e., spatial correlations in the temperature field) worldwide. It's a way of talking about the global effect of El Niños that's a bit more subtle than, say, a correlation between El Niños and the temperature at some location. As you note, it's a way of measuring the "gorilla in the china shop" effect.

    In my rather clumsy way, I am giving hints that the number one ENSO climate link is the oscillating dipole. When I first learned about this behavior, that one location goes up when the other location goes down.

    Do you mean the Darwin--Tahiti air pressure anticorrelation? Of course it's incredibly important, and if my talk were mainly about El Niño I'd have to talk about that. Maybe I will anyway.

    By the way, when I googled "ENSO dipole" I mainly got links to the [Indian Ocean dipole](https://en.wikipedia.org/wiki/Indian_Ocean_Dipole), which is a similar phenomenon in the Indian Ocean. I think you've talked about how this interacts with the ENSO. It would be fun to examine that in detail sometime.
  • 112.

    I guess we are in agreement, but all those effects happen after the El Nino makes an appearance, so these causally correlated climate links don't work as well for prediction.

    Recall that the Tsonis paper was called "Topology and Predictability of El Niño and La Niña Networks", but then the authors admit that El Ninos are hard to predict. This Tsonis statement in particular is difficult to reconcile:

    "We conjecture that because of this, predictability of temperature should decrease during El Nino years. Here we verify that indeed during El Nino years predictability is lower compared to La Niña years".

    So the outcome is that they can say that temperatures became hard to predict after an El Nino arrived. I suppose no kidding, because you can no longer assume the temperature will remain roughly the same as the previous year, which is a higher probability outcome than the probability of an El Nino making an appearance, which of course will force the temperatures higher transiently.

    It's akin to saying that seismic readings are fairly predictable before an earthquake occurs (i.e. a low value), but then when the earthquake happens, the actual prediction is not so good because it could be anything on the scale.

    In retrospect, I wonder why he didn't call his paper "Topology and Un - Predictability of El Niño Networks". Perhaps because it wouldn't have been published?

    BTW, Tsonis is actually a favorite of the AGW skeptics because they use his work to claim that climate change is impossible to predict.

  • 113.

    For k-NN regression I did 4 forecasts.

    With no links, I got an improvement:

    [k-NN regression El Nino 3.4 Anomalies, no links](http://files.lossofgenerality.com/knn1_nino34_anom_IMPROVED.pdf)

    The least error was 0.149, with the Cosine Distance metric.

    The following are the interlaced forecasts, i.e. anom1, c1, anom2, c2, anom3, c3 ... where c could be any real number:

    [k-NN regression El Nino 3.4 Anomalies interlaced](http://files.lossofgenerality.com/knn2_nino34_anom_link.pdf)

    One is (Link, Anomalies): anomalies interlaced with the link data, which is Daniel's. Another is (1, Anomalies): anomalies interlaced with the constant 1. Another is (0, Anomalies): anomalies interlaced with the constant 0.

    I ran the k-NN backtesting on all 3, with a group of distance functions (not just Euclidean), and measured the error. The lowest error, 0.123, was obtained with interlaced (0, Anomalies) and the Chessboard Distance metric.

    What these results basically show: interlacing could improve the forecast accuracy, but the link strength does not produce noticeably better forecasts, and interlacing with constants gives better forecasts than interlacing with links.

    Note all the results are backtested several hundred times on actual historical measurements and what is reported above is the direct output without any modifications.
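
    For readers who want to see the moving parts, here is a small Python sketch of a k-NN forecast with optional interlacing and a walk-forward backtest. It is a guess at the general shape of the procedure, not the Mathematica code behind the PDFs above; the window length, the value of k, the error measure (mean absolute error) and all function names are assumptions.

```python
def chessboard(x, y):
    """Largest absolute coordinate difference."""
    return max(abs(a - b) for a, b in zip(x, y))


def interlace(values, fillers):
    """[v1, c1, v2, c2, ...] from two equally long lists."""
    out = []
    for v, c in zip(values, fillers):
        out.extend([v, c])
    return out


def knn_forecast(series, window, k, distance, fillers=None):
    """Predict the next value from the k past windows most similar to the latest one."""
    def features(start):
        vals = series[start:start + window]
        if fillers is None:
            return vals
        return interlace(vals, fillers[start:start + window])

    query = features(len(series) - window)
    candidates = []
    for start in range(len(series) - window):
        # Each past window is paired with the value that followed it.
        candidates.append((distance(features(start), query), series[start + window]))
    candidates.sort(key=lambda pair: pair[0])
    nearest = candidates[:k]
    return sum(target for _, target in nearest) / len(nearest)


def backtest(series, window, k, distance, fillers=None, first=None):
    """Mean absolute error of one-step forecasts that only use earlier data."""
    first = 2 * window if first is None else first
    errors = []
    for t in range(first, len(series)):
        past_fillers = None if fillers is None else fillers[:t]
        prediction = knn_forecast(series[:t], window, k, distance, past_fillers)
        errors.append(abs(prediction - series[t]))
    return sum(errors) / len(errors)
```

    One thing this sketch makes visible: for difference-based metrics such as Manhattan, Chessboard or Euclidean, a coordinate that holds the same constant in every feature vector contributes zero to the distance. So if interlacing a constant changes the error under such a metric, the change has to come from somewhere else in the setup, for example from how the windows are cut.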

  • 114.
    edited November 2014

    Paul wrote:

    all those effects happen after the El Nino makes an appearance, so these causally correlated climate links don’t work as well for prediction.

    I started the [IOD page](http://forum.azimuthproject.org/discussion/1529/the-indian-ocean-dipole-iod-and-enso/?Focus=13385#Comment_13385) because, unless I've had a memory lapse, Torrence et al. find the IOD dipole mode index (DMI) precedes El Nino.

    IOD index has a peak correlation coefficient of about 0.4 with the Darwin pressure index, i.e. the western pole of the Southern Oscillation, when the former leads the latter by one month. The correlation analysis with seasonally stratified data further confirms the lead role of the IOD. The IOD-Darwin relation has undergone interdecadal changes; in the last 50 years the correlation is highest during the most recent decade of 1990–99, and weakest during 1980–89.

    The correlation coefficient between the pressure index and the SST index time series is 0.65 (0.74 for June-November) when the latter leads the former by one month.

    Fig.3 shows cross correlation coefficients above 0.25 with 99% CL for:

    WP-DMI with a -4.5 month WP lag : 0.3

    Darwin-DMI with a -1 month Darwin lag : 0.4

    and coefficients below -0.25 with 99% CL for:

    IOSLP-POSLP with a -2 month IOSLP lag : -0.275

    SOI-DMI with a -2.5 month SOI lag : -0.4

    • WP : western Pacific
    • DMI : dipole mode index
    • IOSLP : Indian Ocean sea level pressure
    • POSLP : Pacific Ocean sea level pressure
    • SOI : southern oscillation index …

    Even if the DMI only precedes the SOI by 2.5 months with a 40% probability (I hope I've got this correct) then it may still be worth comparing your QBO-TSI-CW model with the DMI. The Indian Ocean link strengths in the Torrence et al. paper might be expected to correlate with SOI, MEI or BEST, and it would be interesting to compare 12 month predictions from IO link strength rather than Ludescher's Pacific basin. If there is no correlation then IOD push is not a significant mechanism for prediction and I'd want to look at what pulls in easterlies from Australia.

    A joint analysis of link strength over both oceans might also be worthwhile.

    I think Paul commented somewhere that he's analysing, not predicting, but I don't see why this equation can't be used to test hindcasts at various lags; perhaps I haven't understood correctly?
  • 115.

    Jim, Here is a machine learning comparison between SOI (BEST variation) and IOD.

    The first chart below shows SOI as a function of IOD, and the second shows IOD as a function of SOI. The base correlation coefficient is 0.462 with no delays (i.e. zero lag).

    ![soi as iod](http://imageshack.com/a/img673/7277/jZ9rS0.gif)

    With delays the IOD as a predictor of SOI works marginally better than SOI as a predictor of IOD, but it will need some interpretation to see what this means. These are both dipoles so no doubt that they share the same global forcing factors. The question is whether these lags are real or are just phase relationships that occur from occupying different positions on a standing wave. Remember that a standing wave is really a collective phenomenon, and where the peaks and valleys are is really dependent on where the measurements are taken.

    ![iod as soi](http://imageshack.com/a/img661/1959/UsMcFr.gif)

    The delays shown are in months, so that IOD either lags SOI by 1 month with a sign reversal, or as a stronger condition, SOI lags IOD by 17 months with a sign reversal.
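
    A plain lagged cross-correlation is a simple way to look at lags like these. The Python sketch below is not the machine learning fit used for the charts; the function names and the sign convention (a positive lag means the second series comes later) are my own choices.

```python
from statistics import mean, pstdev


def pearson(a, b):
    """Pearson correlation of two equally long sequences."""
    ma, mb = mean(a), mean(b)
    sa, sb = pstdev(a), pstdev(b)
    if sa == 0 or sb == 0:
        return 0.0
    return mean((x - ma) * (y - mb) for x, y in zip(a, b)) / (sa * sb)


def lagged_correlation(x, y, lag):
    """Correlation of x[t] with y[t + lag]; positive lag means y lags x."""
    if lag >= 0:
        return pearson(x[: len(x) - lag], y[lag:])
    return pearson(x[-lag:], y[: len(y) + lag])


def best_lag(x, y, max_lag):
    """Lag in [-max_lag, max_lag] with the largest absolute correlation."""
    return max(range(-max_lag, max_lag + 1),
               key=lambda lag: abs(lagged_correlation(x, y, lag)))
```

    A negative correlation at the best lag corresponds to a "sign reversal" in the sense used above. For a nearly periodic signal, a scan like this cannot by itself tell a real lag from a phase offset on a standing wave, which is the caveat raised earlier in this comment.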

    I am also running an implicit comparison between the two with the ML but this is grinding away slowly.

  • 116.

    Great stuff!

    I should have highlighted the WP-DMI lag of 4.5 months as the largest lag in the paper. Off-hand I've forgotten what WP means, maybe SLP?

  • 117.
    edited November 2014

    Hello John

    I ran the Random Forest again; this time the map is link-->anom:

    [Random Forest El Nino 3.4 Link-Anomalies](http://files.lossofgenerality.com/random_forrest_nino34.pdf)

    This is off-the-shelf Mathematica, and I am sure the code is correct per an earlier review by their tech support.

    Result: The addition of the link as the domain for the mapping to Anomalies results in larger errors, in other words link strength did not aid in the Random Forest forecast.

    I added the ANOVA-like analysis from Mathematica, where you can see this clearly.
  • 118.
    edited November 2014

    Dara wrote:

    [Random Forest El Nino 3.4 Link-Anomalies](http://files.lossofgenerality.com/random_forrest_nino34.pdf)

    I always have trouble understanding what you are doing, because you never include the one or two sentences in English that would explain it to me. I really wish you'd do this. Perhaps if I were better at Mathematica I could easily guess what you're doing here. I will try to guess:

    Are you using the Niño 3.4 index and link strengths up to month $n$ to predict the Niño 3.4 index at month $n+1$?

    A simple "yes" or "no" would help me a lot.

    Result: The addition of the link as the domain for the mapping to Anomalies results in larger errors, in other words link strength did not aid in the Random Forest forecast.

    That's interesting. But Ludescher et al don't claim that link strengths help in predicting the Niño 3.4 one month in advance. They (roughly) claim it helps predict the Niño 3.4 six months in advance. Is there a way to adjust your computation to test this hypothesis?
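
    To make the question concrete, here is one way the training data could be arranged so that the lead time becomes a parameter. This is a Python sketch with hypothetical names, not a description of what Dara's Mathematica code does; it assumes two monthly lists of equal length.

```python
def lead_time_pairs(index, link, lead, history=1):
    """Pair predictors known at month n with the index at month n + lead.

    index: monthly Nino 3.4 values; link: monthly average link strengths.
    history: how many months of each predictor, ending at month n, to include.
    """
    pairs = []
    for n in range(history - 1, len(index) - lead):
        recent = slice(n - history + 1, n + 1)
        features = index[recent] + link[recent]
        pairs.append((features, index[n + lead]))
    return pairs
```

    With `lead=1` this gives the one-month-ahead setup I guessed at above; with `lead=6` the same learner would be trained and scored on a six-month-ahead target.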

  • 119.
    edited November 2014

    WebHubTel wrote:

    I guess we are in agreement, but all those effects happen after the El Nino makes an appearance, so these causally correlated climate links don’t work as well for prediction.

    Right. I hope it's clear: my talk is not really about El Niño prediction. It's called "Networks in Climate Science". It's supposed to be a very easy intro to what people call "complex networks", and attempts to apply them to climate science. One of the topics will be an analysis of Ludescher's claim that climate networks can be used to predict El Niños. I'll be rather critical of this claim. But I'm not attempting to review other methods of El Niño prediction. If I add extra material, it will be more stuff about network theory.

    When I'm done with this talk, I'll probably change personalities and become interested in El Niño prediction. Then I'll have lots more questions for you, and maybe coax you to write some blog articles on your approach.

    Comment Source:WebHubTel wrote: > I guess we are in agreement, but all those effects happen after the El Nino makes an appearance, so these causally correlated climate links don’t work as well for prediction. Right. I hope it's clear: my talk is not really about El Ni&ntilde;o prediction. It's called "Networks in Climate Science". It's supposed to be a very easy intro to what people call "complex networks", and attempts to apply them to climate science. One of the topics will be an analysis of Ludescher's claim that climate networks can be used to predict El Ni&ntilde;os. I'll be rather critical of this claim. But I'm not attempting to review other methods of El Ni&ntilde;o prediction. If I add extra material, it will be more stuff about network theory. When I'm done with this talk, I'll probably change personalities and become interested in El Ni&ntilde;o prediction. Then I'll have lots more questions for you, and maybe coax you to write some blog articles on your approach.
  • 120.

    John

    I thought from the beginning that Link Strength addition to ML algorithms would not up the accuracy, at least for 2 we can see for real. I posted the Kolmogorov Theorem on non-increase of information by application of computational functions e.g. averaging.

    As Paul said, one has to find theoretically sound data obtained independently of the original input, e.g. upper-atmosphere radiation, the number of sunspots, sloshing data/theory, or solutions to some differential equation, i.e. anything that is not derived from the original data.

    Comment Source:John I thought from the beginning that Link Strength addition to ML algorithms would not up the accuracy, at least for 2 we can see for real. I posted the Kolmogorov Theorem on non-increase of information by application of computational functions e.g. averaging. As Paul said, one has to find a sound theoretical data obtained independent of the original input e.g. upper atmosphere radiation or number of solar spots or sloshing data/theory or solutions to some differential equation i.e. anything that is not obtained from the original data.
  • 121.
    edited November 2014

    Dara - could you please answer my question in item 118? It's in boldface.

    If you don't answer my questions about your work, your work can't help me. I'm sorry that I'm so stupid, but I need to hear answers to my questions, or I can't understand what you are doing. This is a constant problem.

    Comment Source:Dara - could you please answer my question in item 118? It's in boldface. If you don't answer my questions about your work, your work can't help me. I'm sorry that I'm so stupid, but I need to hear answers to my questions, or I can't understand what you are doing. This is a constant problem.
  • 122.

    15) Ludescher et al chose their threshold for El Niño prediction by training their algorithm on climate data from 1948 to 1980, and tested it on data from 1981 to 2013. They claim that with this threshold, their El Niño predictions were correct 76% of the time, and their predictions of no El Niño were correct in 86% of all cases.

    On this basis they claimed - when their paper was published in February 2014 - that there would be an El Niño by the end of 2014 with probability 3/4.

    As of 14 November 2014 there is not an El Niño.

    (This information is updated weekly, so I should give the latest news in my talk.)

    What should we make of this?

    Comment Source:15) Ludescher _et al_ chose their threshold for El Ni&ntilde;o prediction by training their algorithm on climate data from 1948 to 1980, and tested it on data from 1981 to 2013. They claim that with this threshold, their El Ni&ntilde;o predictions were correct 76% of the time, and their predictions of no El Ni&ntilde;o were correct in 86% of all cases. On this basis they claimed - when their paper was published in February 2014 - that there would be an El Ni&ntilde;o by the end of 2014 with probability 3/4. As of [14 November 2014](http://www.cpc.ncep.noaa.gov/products/analysis_monitoring/lanina/enso_evolution-status-fcsts-web.pdf) there is not an El Ni&ntilde;o. (This information is updated weekly, so I should give the latest news in my talk.) What should we make of this?
  • 123.
    edited November 2014

    16) Graham Jones of the Azimuth Project wrote code implementing Ludescher et al's algorithm, as best as we could understand it, and got results close to theirs, though not identical. The code is open-source; one goal of the Azimuth Project is to do science "in the open".

    More interesting than the small discrepancies between our calculation and theirs is the question of whether "average link strengths" between points in the El Niño basin and points outside are really helpful in predicting El Niños.

    Steve Wenner, a statistician helping the Azimuth Project, noted some ambiguities in Ludescher et al's El Niño prediction rules and disambiguated them in a number of ways. For each way he used Fisher's exact test to compute the $p$-value of the null hypothesis that Ludescher et al's El Niño prediction does not improve the odds of a successful prediction of an El Niño.

    The best he got (that is, the lowest $p$-value) was $0.03$. This is just a bit more significant than the conventional 1 in 20 chance that is the usual threshold for rejecting a null hypothesis.

    Comment Source:16) Graham Jones of the Azimuth Project [wrote code implementing Ludescher _et al's_ algorithm](http://johncarlosbaez.wordpress.com/2014/07/08/el-nino-project-part-4/), as best as we could understand it, and got results close to theirs, though not identical. The code is open-source; one goal of the Azimuth Project is to do science "in the open". More interesting than the small discrepancies between our calculation and theirs is the question of whether "average link strengths" between points in the El Ni&ntilde;o basin and points outside are really helpful in predicting El Ni&ntilde;os. Steve Wenner, a statistician helping the Azimuth Project, [noted some ambiguities in Ludescher _et al_'s El Ni&ntilde;o prediction rules](http://johncarlosbaez.wordpress.com/2014/07/23/el-nino-project-part-6/) and disambiguated them in a number of ways. For each way he used Fisher's exact test to compute the $p$-value of the null hypothesis that Ludescher _et al_'s El Ni&ntilde;o prediction does not improve the odds of a successful prediction of an El Ni&ntilde;o. The best he got (that is, the lowest $p$-value) was $0.03$. This is just a bit more significant than the conventional 1 in 20 chance that is the usual threshold for rejecting a null hypothesis.
  • 124.

    What should we make of this?

    Events with .2 or even .1 probability are not really unusual by definition, so being wrong in their .8 confidence prediction does not by itself refute their claims. However it would be interesting to analyze how they arrived at their accuracy numbers and derive confidence intervals around those numbers. While the failure of their prediction is not particularly significant, I find those claimed levels of accuracy higher than I would expect based on the correlation analyses.

    Comment Source:> What should we make of this? Events with .2 or even .1 probability are not really unusual by definition, so being wrong in their .8 confidence prediction does not by itself refute their claims. However it would be interesting to analyze how they arrived at their accuracy numbers and derive confidence intervals around those numbers. While the failure of their prediction is not particularly significant, I find those claimed levels of accuracy higher than I would expect based on the correlation analyses.
  • 125.

    Are you using the Niño 3.4 index and link strengths up to month n to predict the Niño 3.4 index at month n+1?

    Yes. Sorry I did not see that post.

    Comment Source:>Are you using the Niño 3.4 index and link strengths up to month n to predict the Niño 3.4 index at month n+1? Yes. Sorry I did not see that post.
  • 126.

    They (roughly) claim it helps predict the Niño 3.4 six months in advance. Is there a way to adjust your computation to test this hypothesis?

    Yes, I could adjust my code, but I am sure that if it does not work for n+1 it will also have trouble for n+6.

    If I do not reply tonight, it means I am on the flight from Dublin to Toronto; otherwise I will try to get it done Sunday night or Monday.

    D

    Comment Source:>They (roughly) claim it helps predict the Niño 3.4 six months in advance. Is there a way to adjust your computation to test this hypothesis? Yes, I could adjust my code, but I am sure that if it does not work for n+1 it will also have trouble for n+6. If I do not reply tonight, it means I am on the flight from Dublin to Toronto; otherwise I will try to get it done Sunday night or Monday. D
  • 127.

    As of 14 November 2014 there is not an El Niño.

    (This information is updated weekly, so I should give the latest news in my talk.)

    What should we make of this?

    The latest is that there will likely be an El Nino

    http://iri.columbia.edu/our-expertise/climate/forecasts/enso/current/?enso_tab=enso-iri_update

    "Published: November 20, 2014

    Recent and Current Conditions

    The SST anomaly in the NINO3.4 region has hovered near or just below the threshold of the level required for El Niño level, beginning in early November the weekly SST anomalies have started exceeded it. For October the average NINO3.4 SST anomaly was 0.49 C, indicative of borderline Niño conditions, and for Aug-Oct it was 0.38 C. The IRI’s definition of El Niño, like NOAA/Climate Prediction Center’s, requires that the SST anomaly in the Nino3.4 region (5S-5N; 170W-120W) exceed 0.5 C. Similarly, for La Niña, the anomaly must be -0.5 C or less.

    The probabilities derived from the models on the IRI/CPC plume describe, on average, El Niño conditions for the Nov-Jan 2014-15 season currently in progress, enduring well into northern spring season of 2015. The consensus of model predictions calls for a weak El Niño event, although a moderate event or no event are also possible.

    Comment Source:>As of 14 November 2014 there is not an El Niño. >(This information is updated weekly, so I should give the latest news in my talk.) >What should we make of this? The latest is that there will likely be an El Nino <http://iri.columbia.edu/our-expertise/climate/forecasts/enso/current/?enso_tab=enso-iri_update> > "Published: November 20, 2014 >Recent and Current Conditions >The SST anomaly in the NINO3.4 region has hovered near or just below the threshold of the level required for El Niño level, beginning in early November the weekly SST anomalies have started exceeded it. For October the average NINO3.4 SST anomaly was 0.49 C, indicative of borderline Niño conditions, and for Aug-Oct it was 0.38 C. The IRI’s definition of El Niño, like NOAA/Climate Prediction Center’s, requires that the SST anomaly in the Nino3.4 region (5S-5N; 170W-120W) exceed 0.5 C. Similarly, for La Niña, the anomaly must be -0.5 C or less. >The probabilities derived from the models on the IRI/CPC plume describe, on average, El Niño conditions for the Nov-Jan 2014-15 season currently in progress, enduring well into northern spring season of 2015. The consensus of model predictions calls for a weak El Niño event, although a moderate event or no event are also possible.
  • 128.

    John here it is, scroll to the bottom:

    Random Forest El Nino 3.4, Links 6-months lag

    As I told you, the results for the 6-month lag are similar to the results for the 1-month lag; no new accuracy is obtained.

    When I arrive at Toronto I will code the same in SVR and NN.

    Dara

    Comment Source:John here it is, scroll to the bottom: [Random Forest El Nino 3.4, Links 6-months lag](http://files.lossofgenerality.com/random_forrest_nino34.pdf) As I told you, the results for the 6-month lag are similar to the results for the 1-month lag; no new accuracy is obtained. When I arrive at Toronto I will code the same in SVR and NN. Dara
  • 129.

    John for some reason, I do not see some of your posts, until I do a full refresh on my browser! I am using FireFox. So if you do not see a response from me, I am not ignoring you, please repost. I need to see what is causing this...

    Comment Source:John for some reason, I do not see some of your posts, until I do a full refresh on my browser! I am using FireFox. So if you do not see a response from me, I am not ignoring you, please repost. I need to see what is causing this...
  • 130.

    Here is another bit of insight as to current conditions:

    Nino 4 is quite high. “That’s the Modoki flavor of the #ElNino,” says @climatesociety’s Barnston.

    yet

    IRI ‏@climatesociety Nov 20

    The most important strike against calling an #ElNiño now is lack of convection anomalies in central Pacific. #IRIforecast

    nino conditions

    nino4 region

    Notice how the sea-level is rising in the western Pacific as the water sloshing builds up.

    slr

    This is another way to look at it:

    slr3d

    Comment Source:Here is another bit of insight as to current conditions: > Nino 4 is quite high. “That’s the Modoki flavor of the #ElNino,” says @climatesociety’s Barnston. yet >IRI ‏@climatesociety Nov 20 >The most important strike against calling an #ElNiño now is lack of convection anomalies in central Pacific. #IRIforecast ![nino conditions](https://pbs.twimg.com/media/B25aCijCYAAhx8k.png) ![nino4 region](http://meteora.ucsd.edu/~pierce/elnino/nino-4.gif) Notice how the sea-level is rising in the western Pacific as the water sloshing builds up. ![slr](http://sealevel.colorado.edu/files/current/sl.jpg) This is another way to look at it: ![slr3d](http://imagizer.imageshack.us/a/img540/6605/Qei3sz.gif)
  • 131.

    WebHubTel wrote:

    The latest is that there will likely be an El Nino.

    Thanks!

    Will this arrive in time to save Ludescher et al? That's not a very important question, but it's interesting for this talk.

    They claimed an El Niño would arrive by the end of this calendar year. The usual definition of El Niño requires that the 3-month running mean of the Niño 3.4 index exceeds 0.5 °C for at least 5 months in a row. That can't happen this year.

    But luckily for Ludescher et al, they were using a nonstandard definition of El Niño! For them, all you need is for the monthly average Niño 3.4 index to exceed 0.5 °C.

    So, if the December average is above 0.5 °C, they can pat themselves on the back and claim success.
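The two definitions contrasted above can be put side by side in a rough sketch. It follows only the wording in this comment (a 3-month running mean above 0.5 °C for at least 5 months in a row, versus a single monthly average above 0.5 °C). The function names and the sample anomalies are made up for illustration, and the second rule is the simplified reading given here, not necessarily the exact rule in Ludescher et al's paper.

```python
def running_mean3(monthly):
    """3-month running means; the result has len(monthly) - 2 entries."""
    return [sum(monthly[i:i + 3]) / 3 for i in range(len(monthly) - 2)]


def standard_el_nino(monthly, threshold=0.5, run=5):
    """True if the 3-month running mean exceeds threshold `run` times in a row."""
    streak = 0
    for mean in running_mean3(monthly):
        streak = streak + 1 if mean > threshold else 0
        if streak >= run:
            return True
    return False


def monthly_rule(monthly, threshold=0.5):
    """True if any single monthly average exceeds the threshold."""
    return any(value > threshold for value in monthly)


# Hypothetical Niño 3.4 anomalies in °C, NOT observed data.
late_onset = [0.1, 0.2, 0.3, 0.4, 0.45, 0.6]   # only the last month is above 0.5
sustained = [0.8] * 7                           # seven warm months in a row

print(standard_el_nino(late_onset), monthly_rule(late_onset))
print(standard_el_nino(sustained), monthly_rule(sustained))
```

With the made-up `late_onset` series only the monthly rule fires, which is the situation described above: a warm December could count as a success under the looser rule while the usual definition cannot be met within the calendar year.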

    Comment Source:WebHubTel wrote: > The latest is that there will likely be an El Nino. Thanks! Will this arrive in time to save Ludescher _et al_? That's not a very important question, but it's interesting for this talk. They claimed an El Ni&ntilde;o would arrive by the end of this calendar year. The usual definition of El Ni&ntilde;o requires that the 3-month running mean of the Niño 3.4 index exceeds 0.5 °C for at least 5 months in a row. That can't happen this year. But luckily for Ludescher _et al_, they were using a nonstandard definition of El Ni&ntilde;o! For them, all you need is for the monthly average Niño 3.4 index to exceed 0.5 °C. So, if the December average is above 0.5 °C, they can pat themselves on the back and claim success.
  • 132.
    edited November 2014

    Daniel wrote:

    Events with .2 or even .1 probability are not really unusual by definition, so being wrong in their .8 confidence prediction does not by itself refute their claims.

    Right. I'm not really trying to read anything into the success or failure of their prediction of this one event: an El Niño in 2014. It is, nonetheless, the reason their paper made the news!

    However it would be interesting to analyze how they arrived at their accuracy numbers and derive confidence intervals around those numbers.

    Indeed! If anyone can do this before December 5th I'd be very happy.

    Right now our statistical consultants are Steve Wenner and Jan Galkowski. Steve wrote:

    Anyway, I did some formal hypothesis testing and estimation under all five scenarios. (That is, 5 interpretations of Ludescher et al's ambiguous rules.) The good news is that under most scenarios the prediction method gave better results than merely guessing. (But, I wonder how many things the authors tried before they settled on their final method? Also, did they do all their work on the learning series, and then only at the end check the validation series—or were they checking both as they went about their investigations?)

    The bad news is that the predictions varied with the method, and the methods were rather weak. For instance, in the training series there were 9 El Niño periods in 30 years; the authors’ rules (whatever they were, exactly) found five of the nine. At the same time, they had three false alarms in the 21 years that did not have an El Niño initiated.

    I used Fisher’s exact test to compute some p-values. Suppose (as our ‘null hypothesis’) that Ludescher et al’s method does not improve the odds of a successful prediction of an El Nino initiation. What’s the probability of that method getting at least as many predictions right just by chance? Answer: 0.032 – this is marginally more significant than the conventional 1 in 20 chance that is the usual threshold for rejecting a null hypothesis, but still not terribly convincing. This was, by the way, the most significant of the five p-values for the alternative rule sets applied to the learning series.

    I also computed the “relative risk” statistics for all scenarios; for instance, we are more than three times as likely to see an El Niño initiation if Ludescher et al predict one, than if they predict otherwise (the 90% confidence interval for that ratio is 1.2 to 9.7, with the point estimate 3.4). Here is a screen shot of some statistics for that case... (spreadsheet of all statistics here)

    Jan wrote this.

    Neither of them put bounds on "what fraction of the time Ludescher et al's predictions are right".
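For readers who want to check Steve's numbers, here is a minimal sketch of the one-sided Fisher exact test and the relative risk. It assumes the 2×2 table implied by his description of the training series: 9 El Niño years of which 5 were predicted, and 21 other years with 3 false alarms. That reading of the table, and the function names, are assumptions, not taken from his spreadsheet.

```python
from math import comb


def fisher_one_sided(hits, misses, false_alarms, correct_negatives):
    """P(at least `hits` hits) under the hypergeometric null with fixed margins."""
    n_events = hits + misses                    # years with an El Niño initiation
    n_non = false_alarms + correct_negatives    # years without one
    n_alarms = hits + false_alarms              # years in which one was predicted
    total = n_events + n_non
    upper = min(n_events, n_alarms)
    tail = sum(comb(n_events, k) * comb(n_non, n_alarms - k)
               for k in range(hits, upper + 1))
    return tail / comb(total, n_alarms)


def relative_risk(hits, misses, false_alarms, correct_negatives):
    """Event rate when an El Niño is predicted, divided by the rate when it is not."""
    rate_if_predicted = hits / (hits + false_alarms)
    rate_if_not = misses / (misses + correct_negatives)
    return rate_if_predicted / rate_if_not


# Assumed table: 5 hits, 4 misses, 3 false alarms, 18 correct "no El Niño" calls.
table = (5, 4, 3, 18)
p_value = fisher_one_sided(*table)
risk_ratio = relative_risk(*table)
print(round(p_value, 3), round(risk_ratio, 1))
```

Under this reading the p-value comes out near 0.032 and the relative risk near 3.4, in line with the figures quoted above. Neither number bounds the fraction of correct predictions, which is the open question.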

    Comment Source:Daniel wrote: > Events with .2 or even .1 probability are not really unusual by definition, so being wrong in their .8 confidence prediction does not by itself refute their claims. Right. I'm not really trying to read anything into the success or failure of their prediction of this one event: an El Ni&ntilde;o in 2014. It is, nonetheless, the reason their paper made the news! > However it would be interesting to analyze how they arrived at their accuracy numbers and derive confidence intervals around those numbers. Indeed! If anyone can do this before December 5th I'd be very happy. Right now our statistical consultants are Steve Wenner and Jan Galkowski. Steve wrote: > Anyway, I did some formal hypothesis testing and estimation under all five scenarios. **(That is, 5 interpretations of Ludescher _et al's_ ambiguous rules.)** The good news is that under most scenarios the prediction method gave better results than merely guessing. (But, I wonder how many things the authors tried before they settled on their final method? Also, did they do all their work on the learning series, and then only at the end check the validation series—or were they checking both as they went about their investigations?) > The bad news is that the predictions varied with the method, and the methods were rather weak. For instance, in the training series there were 9 El Niño periods in 30 years; the authors’ rules (whatever they were, exactly) found five of the nine. At the same time, they had three false alarms in the 21 years that did not have an El Niño initiated. > I used Fisher’s exact test to compute some p-values. Suppose (as our ‘null hypothesis’) that Ludescher et al’s method does not improve the odds of a successful prediction of an El Nino initiation. What’s the probability of that method getting at least as many predictions right just by chance? 
Answer: 0.032 – this is marginally more significant than the conventional 1 in 20 chance that is the usual threshold for rejecting a null hypothesis, but still not terribly convincing. This was, by the way, the most significant of the five p-values for the alternative rule sets applied to the learning series. > I also computed the “relative risk” statistics for all scenarios; for instance, we are more than three times as likely to see an El Niño initiation if Ludescher et al predict one, than if they predict otherwise (the 90% confidence interval for that ratio is 1.2 to 9.7, with the point estimate 3.4). Here is a screen shot of some statistics for that case... **(spreadsheet of all statistics [here](http://math.ucr.edu/home/baez/ecological/el_nino/wenner/ElNinoTemps.xlsx))** Jan wrote [this](http://johncarlosbaez.wordpress.com/2014/07/23/el-nino-project-part-6/#comment-54131). Neither of them put bounds on "what fraction of the time Ludescher _et al's_ predictions are right".
  • 133.
    edited November 2014

    Okay, so here are some things I'd like to add to the talk:

    1. Some of Daniel Mahler's work.

    2. Some of Dara Shayda's work.

    Let me start by talking about the first.

    On November 12 in the thread "Random forest: El Nino 3.4, (off-the-shelf)" Daniel wrote:

    I am rather strongly convinced that the link strength, at least as represented in the github file, is essentially unrelated to nino34. My main source of doubt is the possibility that there is an error in how I temporally align the signals.

    Then comes more discussion of the latter topic.

    But most of the interesting ideas appear in this thread here. Read my next comments.

    Comment Source:Okay, so here are some things I'd like to add to the talk: 1. Some of Daniel Mahler's work. 1. Some of Dara Shayda's work. Let me start by talking about the first. On November 12 in the thread "Random forest: El Nino 3.4, (off-the-shelf)" [Daniel wrote](http://forum.azimuthproject.org/discussion/1528/random-forest-el-nino-34-offtheshelf/?Focus=13405#Comment_13405): > I am rather strongly convinced that the link strength, at least as represented in the github file, is essentially unrelated to nino34. My main source of doubt is the possibility that there is an error in how I temporally align the signals. Then comes more discussion of the latter topic. But most of the interesting ideas appear in this thread here. Read my next comments.
  • 134.
    edited November 2014

    On November 15th in comment #39 on this thread here, Daniel Mahler wrote:

    I have figured out the main reason why I was getting only negligible correlation between link strength and nino34 anomaly, even though Graham was able to reproduce the Ludescher results. Matplotlib's xcorr and acorr functions do not subtract the means from the signals before doing the rolling dot products. This was fine for the nino34 anomaly since that is mean 0 by design, but link strength is all positive so it is badly affected by this.

    Once the mean is subtracted from the anomaly, the 0 time lag correlation remains negligible, but there is a small but noticeable peak in the xcorr plot at -4, corresponding to a correlation with the anomaly 4 months after the corresponding link strength. The correlation with 6 months later is almost the same, but it is still lower than the 6-month lag autocorrelation of the anomaly itself and significantly lower than the other models I have posted here.

    I have updated the notebook to subtract the mean from the anomaly prior to the analysis.

    Daniel: I believe that "Once the mean is subtracted from the anomaly" should be "Once the mean is subtracted from the link strength", and "I have updated the notebook to subtract the mean from the anomaly" should be "I have updated the notebook to subtract the mean from the link strength". Right?
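To see why the missing mean subtraction matters, here is a small sketch in plain Python. It mimics the normalization described above (a lagged dot product divided by the square root of the product of the two sums of squares, with no detrending). The function name and the toy series are invented for illustration; this is not Daniel's notebook code.

```python
from math import sqrt


def lag_product(x, y, lag, subtract_mean):
    """Normalized dot product of x[t] with y[t + lag], for lag >= 0."""
    if subtract_mean:
        mean_x = sum(x) / len(x)
        mean_y = sum(y) / len(y)
        x = [v - mean_x for v in x]
        y = [v - mean_y for v in y]
    numerator = sum(a * b for a, b in zip(x[:len(x) - lag], y[lag:]))
    denominator = sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return numerator / denominator


# Toy data, NOT real climate data: a zero-mean "anomaly" and an all-positive
# "link strength" that equals 10 plus the anomaly one step later.
anom = [1, -1] * 6
link = [9, 11] * 6

raw = lag_product(link, anom, lag=1, subtract_mean=False)
centered = lag_product(link, anom, lag=1, subtract_mean=True)
print(round(raw, 3), round(centered, 3))
```

In this toy case the link series determines the anomaly one step later exactly, yet the uncentered value is close to zero, because the large positive offset inflates the denominator and the offset term nearly cancels the numerator. After centering, the lag-1 value is close to 1.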

    Comment Source:On November 15th in comment [#39](http://forum.azimuthproject.org/discussion/1523/crunch-time/?Focus=13485#Comment_13485) on this thread here, Daniel Mahler wrote: > I have figured out the main reason why I was getting only negligible correlation between link strength and nino34 anomaly, even though Graham was able to reproduce the Ludescher results. Matplotlib's xcorr and acorr functions do not subtract the means from the signals before doing the rolling dot products. This was fine for the nino34 anomaly since that is mean 0 by design, but link strength is all positive so it is badly affected by this. > Once the mean is subtracted from the anomaly, the 0 time lag correlation remains negligible, but there is a small but noticeable peak in the xcorr plot at -4, corresponding to a correlation with the anomaly 4 months after the corresponding link strength. The correlation with 6 months later is almost the same, but it is still lower than the 6-month lag autocorrelation of the anomaly itself and significantly lower than the other models I have posted here. > I have updated the [notebook](https://www.googledrive.com/host/0B4cyIPgV_VxrX0lxSUxHU2VLN28/link-anom.html) to subtract the mean from the anomaly prior to the analysis. Daniel: I believe that "Once the mean is subtracted from the anomaly" should be "Once the mean is subtracted from the link strength", and "I have updated the [notebook](https://www.googledrive.com/host/0B4cyIPgV_VxrX0lxSUxHU2VLN28/link-anom.html) to subtract the mean from the anomaly" should be "I have updated the [notebook](https://www.googledrive.com/host/0B4cyIPgV_VxrX0lxSUxHU2VLN28/link-anom.html) to subtract the mean from the link strength". Right?
  • 135.
    edited November 2014

    And things got really interesting in comment #61 on this thread here, where Daniel wrote:


    The observations in #39 and #41 are more interesting than I initially realized. Since the current nino34 and current link strength are both correlated with nino34 in the future but not with each other, means that they carry independent information about future nino34. So while the link strength is a weak predictor it can be valuable because of its independence. This is borne out combining the current nino34 and link strength in a model:

    > summary(lm(d$ANOM[7:nrow(d)] ~ d$ANOM[1:(nrow(d)-6)] + d$link[1:(nrow(d)-6)]))
    
    Call:
    lm(formula = d$ANOM[7:nrow(d)] ~ d$ANOM[1:(nrow(d) - 6)] + d$link[1:(nrow(d) - 
        6)])
    
    Residuals:
         Min       1Q   Median       3Q      Max 
    -2.04005 -0.46827 -0.01113  0.46492  2.05495 
    
    Coefficients:
                            Estimate Std. Error t value Pr(>|t|)    
    (Intercept)              0.01309    0.02620   0.500    0.618    
    d$ANOM[1:(nrow(d) - 6)]  0.36617    0.03206  11.423  < 2e-16 ***
    d$link[1:(nrow(d) - 6)]  0.65109    0.08501   7.659 5.71e-14 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    
    Residual standard error: 0.7231 on 759 degrees of freedom
    Multiple R-squared:  0.222, Adjusted R-squared:   0.22 
    F-statistic: 108.3 on 2 and 759 DF,  p-value: < 2.2e-16
    

    This model has an $R^2$ of .22 compared to .16 and 0.09 from the individual signals. It is fairly close to the sum of the $R^2$ of the individual signal models minus the $R^2$ of the model between them (0.016). This suggests that the information in link strength is all new, not just degraded information of the current nino34. While $R^2=.22$ is still lower than some of the other models, it is starting to catch up, and it is kind of impressive for a linear model based on just 2 numbers.
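The near-additivity of the $R^2$ values can be checked against the standard formula for a linear model with two predictors, $R^2 = (r_{y1}^2 + r_{y2}^2 - 2 r_{y1} r_{y2} r_{12}) / (1 - r_{12}^2)$, where $r_{y1}$ and $r_{y2}$ are the correlations of each predictor with the response and $r_{12}$ is the correlation between the predictors. The sketch below is only a consistency check. It takes the single-signal $R^2$ values to be 0.16 and 0.09 (an assumption about the intended figures), and it assumes that both predictors correlate positively with future nino34 and that the sign of $r_{12}$ is unknown.

```python
from math import sqrt


def r2_two_predictors(r_y1, r_y2, r_12):
    """R^2 of an ordinary least squares fit on two predictors, from pairwise correlations."""
    return (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)


# Assumed inputs, read off the R^2 values quoted in the comment.
r_anom = sqrt(0.16)    # current nino34 vs nino34 six months later
r_link = sqrt(0.09)    # current link strength vs nino34 six months later
r_between = sqrt(0.016)  # magnitude of the correlation between the two predictors

uncorrelated = r2_two_predictors(r_anom, r_link, 0.0)
positive = r2_two_predictors(r_anom, r_link, r_between)
negative = r2_two_predictors(r_anom, r_link, -r_between)
print(round(uncorrelated, 3), round(positive, 3), round(negative, 3))
```

Exactly uncorrelated predictors would give the plain sum, 0.25. A small positive correlation between them gives about 0.223, close to the 0.222 reported by `lm`, while a small negative one would give about 0.285. So the reported fit is consistent with nearly independent predictors, under the stated assumptions.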

    Comment Source:And things got really interesting in comment [#61](http://forum.azimuthproject.org/discussion/1523/crunch-time/?Focus=13526#Comment_13526) on this thread here, where Daniel wrote: <hr/> The observations in [#39](http://forum.azimuthproject.org/discussion/1523/crunch-time/?Focus=13485#Comment_13485) and [#41](http://forum.azimuthproject.org/discussion/1523/crunch-time/?Focus=13487#Comment_13487) are more interesting than I initially realized. Since the current nino34 and current link strength are both correlated with nino34 in the future but not with each other, means that they carry independent information about future nino34. So while the link strength is a weak predictor it can be valuable because of its independence. This is borne out combining the current nino34 and link strength in a model: > summary(lm(d$ANOM[7:nrow(d)] ~ d$ANOM[1:(nrow(d)-6)] + d$link[1:(nrow(d)-6)])) Call: lm(formula = d$ANOM[7:nrow(d)] ~ d$ANOM[1:(nrow(d) - 6)] + d$link[1:(nrow(d) - 6)]) Residuals: Min 1Q Median 3Q Max -2.04005 -0.46827 -0.01113 0.46492 2.05495 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 0.01309 0.02620 0.500 0.618 d$ANOM[1:(nrow(d) - 6)] 0.36617 0.03206 11.423 < 2e-16 *** d$link[1:(nrow(d) - 6)] 0.65109 0.08501 7.659 5.71e-14 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 0.7231 on 759 degrees of freedom Multiple R-squared: 0.222, Adjusted R-squared: 0.22 F-statistic: 108.3 on 2 and 759 DF, p-value: < 2.2e-16 This model has an $R^2$ of .22 compared to .16 and 0.09 from the individual signals. It is fairly close to the sum of the $R^2$ of the individual signal models minus the $R^2$ of the model between them (0.016). This suggests that the information in link strength is all new, not just degraded information of the current nino34. While $R^2=.22$ is still lower than some of the other models, it is starting to catch up, and it is kind of impressive for a linear model based on just 2 numbers.
  • 136.
    edited November 2014

    Let me try to continue my talk draft, though I need help from Daniel now. This is the main point I'd like to focus on, if it's true:

    Since the current nino34 and current link strength are both correlated with nino34 in the future but not with each other, means that they carry independent information about future nino34. So while the link strength is a weak predictor it can be valuable because of its independence.

    I will start by assembling a few graphs taken from Daniel Mahler's notebook.

    Output 53 is a scatter plot showing the link strength versus the Niño 3.4 index using data from 1950 to 2013:

    Question for Daniel: are these normalized by subtracting off the mean and dividing by the standard deviation?

    My guess: the mean has been subtracted off the link strength; it's already subtracted off in the definition of Niño 3.4. These quantities have not been divided by their standard deviation.

    Question for Daniel: are these the Niño 3.4 and link strength at the same time?

    My guess: yes. This graph was produced by:

    f.plot(x="link", y="ANOM", kind="scatter", figsize=(9,9), alpha=0.3, marker="o")

    Comment Source:Let me try to continue my talk draft, though I need help from Daniel now. This is the main point I'd like to focus on, if it's true: > Since the current nino34 and current link strength are both correlated with nino34 in the future but not with each other, means that they carry independent information about future nino34. So while the link strength is a weak predictor it can be valuable because of its independence. I will start by assembling a few graphs taken from [Daniel Mahler's notebook](https://5619417f7fb3a489ed01c7f329cbd1e9b70a10d6-www.googledrive.com/host/0B4cyIPgV_VxrX0lxSUxHU2VLN28/link-anom.html). Output 53 is a scatter plot showing the link strength versus the Ni&ntilde;o 3.4 index using data from 1950 to 2013: <img src = "http://math.ucr.edu/home/baez/climate_networks/mahler_nino3.4_versus_link_strength_output53.png" alt = ""/> **Question for Daniel: are these normalized by subtracting off the mean and dividing by the standard deviation?** My guess: the mean has been subtracted off the link strength; it's already subtracted off in the definition of Ni&ntilde;o 3.4. These quantities have not been divided by their standard deviation. **Question for Daniel: are these the Ni&ntilde;o 3.4 and link strength at the same time?** My guess: yes. This graph was produced by: `f.plot(x="link", y="ANOM", kind="scatter", figsize=(9,9), alpha=0.3, marker="o")`
  • 137.
    edited November 2014

    Output 54 again shows the link strength versus the Niño 3.4 index using data from 1950 to 2013:

    Question for Daniel: what's the difference?

    Guess: this plots the Niño 3.4 index against the link strength 6 months earlier, or perhaps a year earlier. This graph was produced by:

    figure(figsize=(9,9)) scatter(df["link"].iloc[:-6], y=df["ANOM"].iloc[6:], alpha=0.3, marker="o")

    I don't know what .iloc[:-6] and .iloc[6:] mean, hence my confusion.

    Comment Source:Output 54 again shows the link strength versus the Niño 3.4 index using data from 1950 to 2013: <img src = "http://math.ucr.edu/home/baez/climate_networks/mahler_nino3.4_versus_link_strength_output54.png" alt = ""/> **Question for Daniel: what's the difference?** Guess: this plots the Ni&ntilde;o 3.4 index against the link strength 6 months earlier, or perhaps a year earlier. This graph was produced by: `figure(figsize=(9,9)) scatter(df["link"].iloc[:-6], y=df["ANOM"].iloc[6:], alpha=0.3, marker="o")` I don't know what `.iloc[:-6]` and `.iloc[6:]` mean, hence my confusion.
  • 138.

    Output 14 is another scatter plot. This one looks a bit better:

    Question for Daniel: what's the difference?

    Guess: this plots the Niño 3.4 index against itself one year later, or maybe 6 months later. This graph was produced by:

    figure(figsize=(9,9)) scatter(df.ANOM[:-6], y=df.ANOM[6:], alpha=0.3, marker="o")

    I don't know what :-6 and 6: mean, hence my confusion.

    Comment Source:Output 14 is another scatter plot. This one looks a bit better: <img src = "http://math.ucr.edu/home/baez/climate_networks/mahler_nino3.4_versus_link_strength_output14.png" alt = ""/> **Question for Daniel: what's the difference?** Guess: this plots the Ni&ntilde;o 3.4 index against itself one year later, or maybe 6 months later. This graph was produced by: `figure(figsize=(9,9)) scatter(df.ANOM[:-6], y=df.ANOM[6:], alpha=0.3, marker="o")` I don't know what `:-6` and `6:` mean, hence my confusion.
  • 139.

    I keep looking at the 2 images from the Donges paper of link strengths of La Nina and El Nino in comment #95.

    The strongest links during whatever single or average El Nino(s) appear to be in the Nino3.4 region, but the biggest change appears to be the complete disappearance of links in the Indo-China area. If it takes about 3 months for significant ocean volumes to traverse the Pacific from west to east, doesn't this suggest it might be worthwhile looking at the breaking of links in the Indo-China link "hotspot" as the warm water volume sloshes eastwards, as possibly useful for prediction?

    At least looking at correlations between 2 link "hotspots" and the rest of the Pacific rather than just the Nino3.4 area might provide more information than just the defining convention of an El Nino's existence?

  • 140.
    edited November 2014

    Hello John

    I plotted Daniel's plots.

    Number 136 is a scatter plot of (x, y) where x is the link strength and y is the anomaly.

    Number 138 is a scatter plot of (x, y) where x is the anomaly and y is the anomaly 6 months into the future.

    His In[54] is a scatter plot of (x, y) where x is the link strength and y is the anomaly 6 months into the future.

    I did his plots in Mathematica, so we match.

  • 141.

    Basically, looking at Daniel's plots, I can see that a linear fit will work best for the plot in 138, which is (x, y) where x is the anomaly and y is the anomaly 6 months into the future.

    In[54], which is (x, y) where x is the link strength and y is the anomaly 6 months into the future, shows a slight improvement, but not as much as 138.

  • 142.

    Thanks, Dara!

    (And thanks, Jim, but I'll need to think more about that.)

  • 143.

    Hi John,

    Your guesses were correct. Dara, thanks for confirming in the meantime.

    In more detail:

    #133

    I am rather strongly convinced that the link strength, at least as represented in the github file, is essentially unrelated to nino34. My main source of doubt is the possibility that there is an error how I temporally align the signals.

    That comment was due to the mean problem in the xcorr plot, caused by the link strength not being centered. It is still true that the link strength is largely uncorrelated with the anomaly at the same time, but there is a weak but interesting correlation with the future anomaly.

    #134

    Daniel: I believe that “Once the mean is subtracted from the anomaly” should be “Once the mean is subtracted from the link strength”, and “I have updated the notebook to subtract the mean from the anomaly” should be “I have updated the notebook to subtract the mean from the link strength”. Right?

    Correct, the mean needs to be subtracted from the link strength.
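
To illustrate why centering matters for a product-based cross-correlation, here is a minimal sketch on synthetic data. It assumes the "xcorr" plot sums raw products of the two series without removing their means; the series, the mean of 5.0, and the helper `normalized_product` are all made up for illustration and are not taken from the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 600

# Synthetic stand-in for an uncentered link strength: independent noise
# around a large positive mean (values are hypothetical, not real data).
link = 5.0 + rng.normal(0.0, 1.0, n)

def normalized_product(a, b):
    """Sum of products scaled by the vector norms, with no mean removal."""
    return float(np.dot(a, b) / np.sqrt(np.dot(a, a) * np.dot(b, b)))

# Uncentered: the large mean dominates every lagged product, so even
# independent noise looks strongly "correlated" with its lagged copy.
raw_value = normalized_product(link[:-6], link[6:])

# Centered: subtracting the mean first leaves only the fluctuations.
link_c = link - link.mean()
centered_value = normalized_product(link_c[:-6], link_c[6:])

print(raw_value, centered_value)
```

With these settings the uncentered value should come out close to 1 and the centered one close to 0, which is the kind of distortion that subtracting the mean from the link strength removes.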

    #136

    Question for Daniel: are these normalized by subtracting off the mean and dividing by the standard deviation? My guess: the mean has been subtracted off the link strength; it’s already subtracted off in the definition of Niño 3.4. These quantities have not been divided by their standard deviation.

    Correct.

    Question for Daniel: are these the Niño 3.4 and link strength at the same time? My guess: yes.

    Correct

    #137

    Output 54 again shows the link strength versus the Niño 3.4 index using data from 1950 to 2013:

    Yes, but the plot included in comment 137 is a repeat of Output 53. Output 54 does show some correlation.

    Question for Daniel: what’s the difference?

    Guess: this plots the Niño 3.4 index against the link strength 6 months earlier, or perhaps a year earlier.

    Correct (link vs the anomaly later), but the plot linked in the comment is not Output 54.

    figure(figsize=(9,9)) scatter(df["link"].iloc[:-6], y=df["ANOM"].iloc[6:], alpha=0.3, marker="o")

    I don’t know what .iloc[:-6] and .iloc[6:] mean, hence my confusion.

    NumPy, SciPy and pandas support slicing: x[5:-10] means "give me x without the first 5 and last 10 elements". The above code plots the link strength without the last 6 months against the anomaly without the first 6 months, i.e. the link strength against the anomaly 6 months later. df is a pandas DataFrame rather than a NumPy array, so the indexing works a little differently: x.iloc[a:b] is the same as x[a:b, :] if x were a NumPy array, i.e. it takes the rows from position a up to, but not including, b.

    Now that you bring it up, the iloc is actually unnecessary here, since it is being called on df["link"], which is one-dimensional, so figure(figsize=(9,9)) scatter(df["link"][:-6], y=df["ANOM"][6:], alpha=0.3, marker="o") would work as well.
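
The slicing described above can be checked on a toy example. This is only a sketch: the 12-row DataFrame and its values are invented, and only the column names "link" and "ANOM" come from the notebook code quoted in this thread.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly data; only the column names mirror the notebook.
df = pd.DataFrame({
    "link": np.arange(12, dtype=float),         # 0, 1, ..., 11
    "ANOM": np.arange(12, dtype=float) * 10.0,  # 0, 10, ..., 110
})

lag = 6

# Drop the last `lag` rows of the predictor and the first `lag` rows of the
# target. Converting to NumPy makes the pairing purely positional.
x = df["link"].iloc[:-lag].to_numpy()   # link in months 0..5
y = df["ANOM"].iloc[lag:].to_numpy()    # ANOM in months 6..11

# Each pair is (link in month i, anomaly in month i + lag).
pairs = list(zip(x, y))
print(pairs[0], pairs[-1])

# An equivalent view: shift the target up by `lag` rows; the last `lag`
# rows become NaN because there is no future value for them.
future = df["ANOM"].shift(-lag)
print(future.iloc[0], future.isna().sum())
```

The `.to_numpy()` calls are a deliberate choice here: arithmetic between two pandas Series aligns on index labels rather than position, so working with plain arrays keeps the 6-month offset explicit.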

    #138

    Output 14 is another scatter plot. This one looks a bit better.

    Question for Daniel: what’s the difference?

    Guess: this plots the Niño 3.4 index against itself one year later, or maybe 6 months later. This graph was produced by:

    Correct, the anomaly against itself 6 months later. Just to be clear, in all the time-delayed plots the y axis is the future value.

    figure(figsize=(9,9)) scatter(df.ANOM[:-6], y=df.ANOM[6:], alpha=0.3, marker="o")

    I don’t know what :-6 and 6: mean, hence my confusion.

    It is the same array slicing as in #137. Slicing notation also works in R.

  • 144.
    edited November 2014

    I have updated the notebook with a scatterplot of the combined link+anomaly based 6-month prediction vs the actual 6-month future anomaly, showing the benefit of the combination. I have also done a little housekeeping.

  • 145.

    Daniel

    IN[15] is great, shows the past 6 months give better forecast.

    IN[16] reg.fit: It is not clear what this does. df "ANOM","link" means you are using the anom and link distribution with link again; this of course produces a better fit, but you used the link twice.

  • 146.

    I just wanted to show some animations I made when starting to look at this data again.

    They all show the Nino3.4 Anomalies plotted against a lagged version of itself.

    This one shows how the correlation between the two falls off as you increase the lag, meaning the nice diagonal line turns into a big blob.

    slr3d

    The next three show the plots being drawn as we pass through the '97-'98 El Nino at lags of 1, 3, and 6.

    Lag of 1 month.

    slr3d

    Lag of 3 months.

    slr3d

    Lag of 6 months.

    slr3d

    In all of them you can see the anomaly index take a long walk outside of the typical range during the El Nino event. This makes some sense, as the definition of an El Nino is when the 3-month mean of this index is above 0.5 for 5 consecutive months.
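
The rule of thumb stated above can be sketched in a few lines. This is a simplified illustration of the definition as worded in this comment, not the official operational procedure; the function names and the sample anomalies are hypothetical.

```python
def running_mean(values, window=3):
    """Mean of each run of `window` consecutive values."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]

def el_nino_flags(anomalies, threshold=0.5, run_length=5, window=3):
    """Flag each 3-month mean that belongs to a run of at least
    `run_length` consecutive means above `threshold`."""
    means = running_mean(anomalies, window)
    above = [m > threshold for m in means]
    flags = [False] * len(means)
    start = None
    for i, is_above in enumerate(above + [False]):  # sentinel closes a final run
        if is_above and start is None:
            start = i
        elif not is_above and start is not None:
            if i - start >= run_length:
                for j in range(start, i):
                    flags[j] = True
            start = None
    return flags

# Made-up monthly anomalies, not real Nino 3.4 data.
sample = [0.1, 0.2, 0.6, 0.9, 1.2, 1.5, 1.4, 1.1, 0.8, 0.3, 0.0, -0.2]
flags = el_nino_flags(sample)
print(flags)
```

A run of warm 3-month means shorter than `run_length` is left unflagged, which is why brief excursions above the threshold do not count as an event under this rule.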

    Comment Source:I just wanted to show some animations I made when starting to look at this data again. They all show the Nino3.4 Anomalies plotted against a lagged version of itself. This one shows how the correlation between the two fall off as you increase the lag. Meaning the nice diagonal line turns into a big blob. ![slr3d](http://imageshack.com/a/img905/1144/OzgknG.gif) The next three show the plots being drawn as we pass through the '97-'98 El Nino at lags of 1, 3, and 6. Lag of 1 month. ![slr3d](http://imageshack.com/a/img661/8272/cCwDrV.gif) Lag of 3 months. ![slr3d](http://imageshack.com/a/img674/3406/dVOIzG.gif) Lag of 6 months. ![slr3d](http://imageshack.com/a/img537/9376/YEtztR.gif) In all of them you can see the Anomaly index take a long walk outside of the typical range during the El Nino event. This makes some sense as the definition of an El Nino is when the 3 month mean of this index is above .5 for 5 consecutive months.
  • 147.

    IN[15] is great, shows the past 6 months give better forecast.

    IN[16] reg.fit: It is not clear what this does. df "ANOM","link" means you are using the anom and link distribution with link again; this of course produces a better fit, but you used the link twice.

    Thanks Blake!! Good catch!! I was actually using link strength + anomaly to predict the link strength in 6 months. I thought there was something odd about the results. I have updated everything now. The combined prediction does not look anywhere nearly as good anymore, but it is still noticeably better than either by itself. "ANOM" is the anomaly and "link" is the link strength. Time shifting is done with numpy array slicing, as explained above.
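
A minimal sketch of the corrected setup, on synthetic data: the current anomaly and current link strength are used to predict the anomaly 6 months ahead. It uses `numpy.linalg.lstsq` rather than whatever regression object the notebook's `reg.fit` belongs to, and the generating coefficients (0.6, 0.3) and noise level are arbitrary assumptions chosen only so that both predictors carry some signal.

```python
import numpy as np

rng = np.random.default_rng(1)
n, lag = 768, 6

# Synthetic stand-ins for the two monthly series (not real data).
link = rng.normal(size=n)
noise = rng.normal(scale=0.5, size=n)
anom = np.zeros(n)
for t in range(n):
    if t < lag:
        anom[t] = noise[t]
    else:
        # The future anomaly depends on both values from `lag` months earlier.
        anom[t] = 0.6 * anom[t - lag] + 0.3 * link[t - lag] + noise[t]

def fit_mse(predictors, target):
    """Least-squares fit with an intercept; returns the mean squared residual."""
    X = np.column_stack([np.ones(len(target))] + predictors)
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return float(np.mean((target - X @ coef) ** 2))

anom_now, link_now = anom[:-lag], link[:-lag]
anom_future = anom[lag:]     # the intended target
# link[lag:] would be the accidental target described in the comment above.

mse_anom = fit_mse([anom_now], anom_future)
mse_link = fit_mse([link_now], anom_future)
mse_both = fit_mse([anom_now, link_now], anom_future)
print(mse_anom, mse_link, mse_both)
```

Note that on the data used for fitting, the combined model can never have a larger mean squared residual than either single-predictor model, so a fairer comparison would evaluate all three on held-out months.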

  • 148.
    edited November 2014

    I get MS = 0.09 for anom vs. link. This MS is way lower than the other numbers. Confusing? No, it means the linear fit model is not a good model, IMHO.

    but MS=0.61 for anom vs. link (time shifted by -6)

    MS=0.56 for anom vs. anom 6 month time shift.

    In other words the time shifted link produces more error. The anom time shift is better than the anom vs. link time shift -6.

  • 149.

    Great catch Blake!!

    I just wanted to note that Dara made the catch, not me!

  • 150.

    Blake I loved the animations! This is how I think of these computations, I see them in animations
