Crunch time

Comments

  • 51.

    Hello Graham

    Could you kindly, if you have time and no pressure, give me a simple csv or text file of your link computations that exactly matches the anomalies file John has in his github account? I left the link in my write-ups.

    I just cannot follow the discussions; I just want to incorporate it into the input vector of the ML algorithms I am doing and see if any improvement is obtained.

    My assumption is that it should be the same length as the 3.4 anomalies, otherwise I have no idea how to use it.

  • 52.

    Dara, John used my code to make a file of link strengths which Daniel has been using. It's every ten days, not every calendar month. Daniel is probably more help than me, since he's recently done some ten-day to month conversion, and I'm too busy with other things right now.

  • 53.

    Trade winds can act to pile up water on one side of the Pacific and the result is delayed as the imbalance propagates downward into the thermocline. This is very much a collective mode that has a continuous sense of lag within the greater volume. That is part of the reason that I transitioned to using the equations of hydrodynamics that model sloshing of liquid volumes, to at least keep it in line with a physical model of behavior.

    The analogy is as if I were looking at a standing wave at one end of a waveguide and expecting that end to tell me more about the dynamics than the other end of the waveguide. It is a standing wave mode, so every segment of the waveguide really contains identical information, with nothing really leading or lagging except in a continuous fashion. The collective is what really matters, and how the external forcing modifies the collective behavior.

    That's also why all the armchair prognosticating never seems to work. Someone points out that winds are picking up, indicating some pending change, but they don't know whether that is the result of a change already happening, or whether there are already changes occurring beneath the water with sufficient inertia to resist those changes temporarily.

    In terms of one's own intuition, consider how sloshing is one of the most difficult behaviors to control. Think about carrying a big bowl of water around and trying to prevent sloshing once it gets in motion. It takes great practice to know what counter-motions are needed to damp out the sloshing, and most of those are counter-intuitive.

  • 54.

    Jim wrote:

    The Kawale paper is on ResearchGate to which us non-academics don’t have access [...]

    As far as I know, anyone can sign up to join ResearchGate by going [here](http://www.researchgate.net/). I haven't joined, though.

  • 55.
    edited November 2014

    Dara wrote:

    Could you kindly, if you have time and no pressure, to give me a simple csv or text file of your link computations...

    All the link strengths Graham computed using Ludescher et al's algorithm are available [on my Github site](https://github.com/johncarlosbaez/el-nino). There are two files:

    • [average-link-strength.txt](https://github.com/johncarlosbaez/el-nino/blob/master/R/average-link-strength.txt). This file has the average link strength S as computed by ludescher.R at 10-day intervals, starting from day 730 and going until day 12040, where day 1 is the first of January 1948.

    • [average-link-strength-1948-2013.txt](https://github.com/johncarlosbaez/el-nino/blob/master/R/average-link-strength-1948-2013.txt). The second column in this file lists the average link strengths S as computed by ludescher.R at 10-day intervals, starting from day 730, and going until day 24090, where day 1 is the first of January 1948. The first column numbers these items from 1 to 2337.

    I will now copy these to the [Azimuth Project github site](https://github.com/azimuth-project). We should try to put data there.
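
    A minimal sketch of reading the longer file in R, assuming it is plain whitespace-separated text with no header (an assumption; check the actual file):

        # two columns: an index running 1..2337 and the average link strength S
        s <- read.table("average-link-strength-1948-2013.txt",
                        col.names = c("index", "S"))
        # sample k corresponds to day 730 + 10*(k - 1), with day 1 = 1948-01-01
        s$day  <- 730 + 10 * (s$index - 1)
        s$date <- as.Date("1948-01-01") + (s$day - 1)
        head(s)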

  • 56.

    Thanx John

    I have enough time before traveling Sunday to incorporate these links into the ML input vectors. I might ask a bunch of dumb questions... sorry

    Dara

  • 57.
    edited November 2014

    Daniel wrote:

    We then need to reconcile Graham's results with my analyses, since I am seeing no meaningful relationship between link strength and nino34.

    These aren't exactly Graham's results - these are the results of Ludescher et al, which I explained in [El Niño project (part 3)](http://johncarlosbaez.wordpress.com/2014/07/01/el-nino-project-part-3/). Graham redid their calculation of the link strengths, and began to analyze their work.

    For some more critical analysis of what Ludescher et al did, read Steve Wenner's article [El Niño project (part 6)](http://johncarlosbaez.wordpress.com/2014/07/23/el-nino-project-part-6/). They predict an El Niño in the next calendar year if the link strength exceeds 2.82. They claim about a 75% success rate. But as Steve pointed out, their results aren't extremely significant. I'm focusing on their paper not because I think it's good, but because I have to give a talk on climate networks in 2 weeks, and this is the most famous paper on climate networks.

    Was Graham using the link strength numbers from our github file?

    Yes! More precisely, Graham wrote the program which computed the numbers in this file!

  • 58.
    edited November 2014

    WebHubTel wrote:

    I understand how a dipole is detected, which is looking for correlations that approach -1, but not the link strength.

    I explained this in [El Niño project (part 3)](http://johncarlosbaez.wordpress.com/2014/07/01/el-nino-project-part-3/), if you're interested. As Graham points out, it may not be a well-motivated definition.

  • 59.
    edited November 2014

    Okay, let me try to start outlining my talk.

    0) I'm very flattered to be invited to speak here. I was probably invited because of my abstract mathematical work on networks and category theory. But when I got the invitation, instead of talking about something I understood, I thought I'd learn about something a bit more practical and talk about that. This was a bad idea. (Laughter, I hope.) But I'll try to make the best of it.

    I've been trying to learn climate science. There's a subject called "complex networks" where people do statistical analyses of large graphs like the World Wide Web or Facebook and draw conclusions from them. People are trying to apply these ideas to climate science. So that's what I'll talk about. I'll be reviewing a lot of other people's work, but also describing some work by a project I'm involved in, the Azimuth Project. (List of names.)

    1) You've probably heard about the "global warming pause". Is this a real thing? If so, is it due to "natural variability", heat going into the deep oceans, some combination of both, or something else?

    Here is a chart of global average air temperatures at sea level, put together by NASA's Goddard Institute for Space Studies:

    ![Global temperature anomalies, 1880-2013 (GISTEMP)](http://math.ucr.edu/home/baez/climate_networks/gistemp_1880-2013.jpg)

    (What are the green error bars, exactly?) You can see a lot of fluctuations, including a big dip after 1940 and a tiny dip after 2000. That tiny dip is the so-called "global warming pause". What causes these fluctuations? That's a big, complicated question.

  • 60.
    edited November 2014

    3) One cause of temperature fluctuations is a kind of cycle whose extremes are called El Niño and La Niña.

    ![ENSO and global temperature anomalies](http://math.ucr.edu/home/baez/climate_networks/ENSO_global_temperature_anomalies.png)

    (Graph from [NOAA](http://www.ncdc.noaa.gov/sotc/global/2012/13).)

  • 61.
    edited November 2014

    4) A lot of things happen during an El Niño. For example, in 1997 and 1998, a big El Niño, we saw all these events:

    ![Effects of the 1997-1998 El Niño](http://math.ucr.edu/home/baez/climate_networks/ElNinoMap1998.jpg)

    (Map from [Shrimp News International](https://www.shrimpnews.com/FreeReportsFolder/WeatherFolder/ElNino.html).)

    El Niño is part of an irregular cycle that happens every 3 to 7 years, called the El Niño Southern Oscillation or ENSO.

    During the hot phase, El Niño, hot water in the Pacific moves from near Asia toward California, and air pressures get higher near that part of the Pacific. During the cool phase, La Niña, the water near California cools down, and air pressures get higher near the coast of Asia.

  • 62.
    edited November 2014

    5) El Niños are important because they can cause billions of dollars of economic damage.

    They also seem to bring heat stored in the deeper waters of the Pacific into the atmosphere. So, one reason for the "global warming pause" may be that we haven't had a strong El Niño [since 1998](http://ggweather.com/enso/oni.htm). The global warming pause might end with the next El Niño. For a while it seemed we were due for a big one this fall, but that hasn't happened.

  • 63.

    6) The ENSO cycle is just one of many cycles involving teleconnections - strong correlations between weather at distant locations, typically thousands of kilometers. People have systematically looked for these teleconnections using [principal component analysis](https://en.wikipedia.org/wiki/Principal_component_analysis) of climate data, and also other techniques.

    The ENSO cycle shows up automatically when you do this kind of study. It stands out as the biggest source of climate variability on time scales greater than a year and less than a decade.

    (Flesh this out; include some pretty pictures and name the second biggest one.)

  • 64.

    We then need to reconcile Graham's results with my analyses, since I am seeing no meaningful relationship between link strength and nino34.

    These aren't exactly Graham's results - these are the results of Ludescher et al, which I explained in [El Niño project (part 3)](http://johncarlosbaez.wordpress.com/2014/07/01/el-nino-project-part-3/). Graham redid their calculation of the link strengths, and began to analyze their work.

    I am aware of that, but as I remember there was a lot of discussion as to what data and algorithms Ludescher et al actually used. The discrepancy between my results and theirs could have stemmed from differences/problems in the data. Knowing that Graham reproduced their result with the same data and known algorithms was very important, since it narrowed down the possible sources of the discrepancy, which made it easier to track down the problem (not subtracting out the mean from the link strength signal, [#39](http://forum.azimuthproject.org/discussion/1523/crunch-time/?Focus=13485#Comment_13485)).

  • 65.
    edited November 2014

    The observations in [#39](http://forum.azimuthproject.org/discussion/1523/crunch-time/?Focus=13485#Comment_13485) and [#41](http://forum.azimuthproject.org/discussion/1523/crunch-time/?Focus=13487#Comment_13487) are more interesting than I initially realized. Since the current nino34 and the current link strength are both correlated with future nino34 but not with each other, they carry independent information about future nino34. So while the link strength is a weak predictor, it can be valuable because of its independence. This is borne out by combining the current nino34 and link strength in a model:

    > summary(lm(d$ANOM[7:nrow(d)] ~ d$ANOM[1:(nrow(d)-6)] + d$link[1:(nrow(d)-6)]))
    
    Call:
    lm(formula = d$ANOM[7:nrow(d)] ~ d$ANOM[1:(nrow(d) - 6)] + d$link[1:(nrow(d) - 
        6)])
    
    Residuals:
         Min       1Q   Median       3Q      Max 
    -2.04005 -0.46827 -0.01113  0.46492  2.05495 
    
    Coefficients:
                            Estimate Std. Error t value Pr(>|t|)    
    (Intercept)              0.01309    0.02620   0.500    0.618    
    d$ANOM[1:(nrow(d) - 6)]  0.36617    0.03206  11.423  < 2e-16 ***
    d$link[1:(nrow(d) - 6)]  0.65109    0.08501   7.659 5.71e-14 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    
    Residual standard error: 0.7231 on 759 degrees of freedom
    Multiple R-squared:  0.222, Adjusted R-squared:   0.22 
    F-statistic: 108.3 on 2 and 759 DF,  p-value: < 2.2e-16
    

    This model has an $R^2$ of .22, compared to .16 and .09 from the individual signals. That is fairly close to the sum of the $R^2$ of the individual signal models minus the $R^2$ of the model between them (0.016). This suggests that the information in the link strength is all new, not just degraded information about the current nino34. While $R^2=.22$ is still lower than some of the other models, it is starting to catch up, and it is kind of impressive for a linear model based on just 2 numbers.
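
    For reference, the three comparison fits behind those $R^2$ figures presumably look something like this, assuming the same data frame d (with columns ANOM and link) used in the model above:

        # future nino34 on current nino34 alone
        summary(lm(d$ANOM[7:nrow(d)] ~ d$ANOM[1:(nrow(d) - 6)]))
        # future nino34 on current link strength alone
        summary(lm(d$ANOM[7:nrow(d)] ~ d$link[1:(nrow(d) - 6)]))
        # the model between the two predictors themselves
        summary(lm(d$ANOM[1:(nrow(d) - 6)] ~ d$link[1:(nrow(d) - 6)]))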

  • 66.
    edited November 2014

    Daniel wrote:

    Knowing that Graham reproduced their result with the same data and known algorithms was very important since it narrowed down the possible sources of the discrepancy which made it easier to track down the problem (not subtracting out the mean from the link strength signal

    Great! Your comment [66](http://forum.azimuthproject.org/discussion/1523/crunch-time/?Focus=13526#Comment_13526) looks really interesting!

    $R^2=.22$ is still lower then some the other models, it is starting to catch up, and it is kind of impressive for a linear model based on just 2 numbers.

    Do you remember where I can see what values of $R^2$ other models achieve?

    I remember there was a lot of discussion as to what data and algorithms Ludescher et al actually used.

    Summarizing, just so we're all on the same page:

    There was never much question about the data Ludescher et al used, since they said quite precisely what it was, and it's a well-known data set. The controversy over the algorithm they used centered on one particular issue, [described in detail here](http://www.azimuthproject.org/azimuth/show/Blog+-+El+Ni%C3%B1o+project+%28part+3%29#Nuances). Briefly, we don't believe they meant what they said in one of their formulas. Using a charitable interpretation of their formula, which we described, Graham was able to almost but not quite replicate their graph of link strengths. He came very close, and we don't know the reason for the small discrepancy.

  • 67.
    edited November 2014

    Continuing the draft of my talk:

    6) The ENSO cycle is just one of many cycles involving teleconnections - strong correlations between weather at distant locations, typically thousands of kilometers. People have systematically looked for these teleconnections using [principal component analysis](https://en.wikipedia.org/wiki/Principal_component_analysis) of climate data, and also other techniques.

    The ENSO cycle shows up automatically when you do this kind of study. It stands out as the biggest source of climate variability on time scales greater than a year and less than a decade. Some others include:

    (I won't say all the following stuff; it's just notes to myself)

    • The [Pacific-North America Index](http://www.ncdc.noaa.gov/teleconnections/pna.php): one of the most prominent modes of low-frequency variability in the Northern Hemisphere extratropics, appearing in all months except June and July. The PNA is seen in a quadrupole pattern of 500 millibar height anomalies, with anomalies of similar sign located south of the Aleutian Islands and over the southeastern United States. Anomalies with sign opposite to the Aleutian center are located in the vicinity of Hawaii, and over the intermountain region of North America (central Canada) during the winter and fall.

    • The [Pacific Decadal Oscillation](http://www.ncdc.noaa.gov/teleconnections/pdo/) is a longer-lived El Niño-like pattern of Pacific climate variability (Zhang et al. 1997). As seen with the better-known El Niño/Southern Oscillation (ENSO), extremes in the PDO pattern are marked by widespread variations in the Pacific Basin and the North American climate. In parallel with the ENSO phenomenon, the extreme phases of the PDO have been classified as being either warm or cool. When SSTs are anomalously cool in the interior North Pacific and warm along the Pacific Coast, and when sea level pressures are below average over the North Pacific, the PDO has a positive value. When the climate anomaly patterns are reversed, with warm SST anomalies in the interior and cool SST anomalies along the North American coast, or above average sea level pressures over the North Pacific, the PDO has a negative value.

    • The [North Atlantic Oscillation](http://www.ncdc.noaa.gov/teleconnections/nao.php) index is based on the surface sea-level pressure difference between the Subtropical (Azores) High and the Subpolar Low. The positive phase of the NAO reflects below-normal heights and pressure across the high latitudes of the North Atlantic and above-normal heights and pressure over the central North Atlantic, the eastern United States and western Europe. The negative phase reflects an opposite pattern of height and pressure anomalies over these regions. Both phases of the NAO are associated with basin-wide changes in the intensity and location of the North Atlantic jet stream and storm track, and in large-scale modulations of the normal patterns of zonal and meridional heat and moisture transport, which in turn results in changes in temperature and precipitation patterns often extending from eastern North America to western and central Europe. Strong positive phases of the NAO tend to be associated with above-normal temperatures in the eastern United States and across northern Europe and below-normal temperatures in Greenland and oftentimes across southern Europe and the Middle East. They are also associated with above-normal precipitation over northern Europe and Scandinavia and below-normal precipitation over southern and central Europe. Opposite patterns of temperature and precipitation anomalies are typically observed during strong negative phases of the NAO.

    • The [Arctic Oscillation](http://www.ncdc.noaa.gov/teleconnections/ao.php) is a large scale mode of climate variability, also referred to as the Northern Hemisphere annular mode. The AO is a climate pattern characterized by winds circulating counterclockwise around the Arctic at around 55°N latitude. When the AO is in its positive phase, a ring of strong winds circulating around the North Pole acts to confine colder air across polar regions. This belt of winds becomes weaker and more distorted in the negative phase of the AO, which allows an easier southward penetration of colder Arctic air masses and increased storminess into the mid-latitudes.

    (Include some pretty pictures and see if I can find work that detects these teleconnections starting from a pile of climate data. An old technique is principal component analysis, known in this field as empirical orthogonal functions. There's a paper by [Kutzbach](http://200.132.77.47:8080/aulas/acp/ACP_curso/Referencias/5Kutzbach_1967.pdf) on this stuff.)
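
    A minimal sketch of the EOF/PCA idea in R, assuming X is an anomaly matrix with rows = time steps and columns = grid points (a hypothetical name, not any particular data set):

        eof <- prcomp(X, center = TRUE, scale. = FALSE)
        eof$rotation[, 1:2]               # first two EOFs (spatial patterns)
        eof$x[, 1:2]                      # their principal-component time series
        summary(eof)$importance[2, 1:2]   # fraction of variance each explains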

  • 68.
    edited November 2014

    Thanx Daniel for the links.

    I interlaced the links to the anom numbers:

    link_1, anom_1, link_2, anom_2, ... <----- (John is that ok?)

    I then did the k-NN regression, and the error more than doubled. Still checking the code, so note this with caution, but clearly the self-similarity of the augmented link data decreases for all distance functions.

    I am sure the same will happen with NN and SVR, but that requires much more time to test.

  • 69.

    This is the cross-correlation and autocorrelation of the Tahiti and Darwin signals:

    ![xcorr](http://imageshack.com/a/img537/4074/jZCphA.gif)

    Note where the 4 upper arrows are. These are at 7-year intervals. The QBO oscillations are well characterized by a 28-month, or 2.33-year, period. This fits 3 QBO periods into a 7-year interval, giving alignment at an integral multiple of a QBO period and synchronization to a whole number of years.

    I did this to show how Daniel's work fits in with dipole cross-correlation and hotspot autocorrelation.
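
    A minimal sketch of this kind of calculation in R, assuming tahiti and darwin are monthly pressure-anomaly vectors of equal length (hypothetical names, not the exact data behind the plot above):

        ccf(tahiti, darwin, lag.max = 120)   # cross-correlation out to 10 years of lag
        acf(tahiti, lag.max = 120)           # autocorrelation of one of the signals
        # peaks near lags of 84 months (7 years) would correspond to the arrows noted above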

  • 70.

    I interlaced the links to the anom numbers:

    link_1, anom_1, link_2, anom_2, … <—– (John is that ok?)

    In order to compare apples to apples, for the no-link case I interlaced 0:

    0, anom_1, 0, anom_2, … <—– (John is that ok?)

    There was negligible difference across all metrics for the k-NN regression.

  • 71.
    edited November 2014

    Dara wrote:

    I interlaced the links to the anom numbers:

    link_1, anom_1, link_2, anom_2, … <—– (John is that ok?)

    If I understand you correctly, this sounds like a bad idea.

    The file [average-link-strength-1948-2013.txt](https://github.com/azimuth-project/el-nino/blob/master/average-link-strength-1948-2013.txt) gives link strengths at 10-day intervals, starting from day 730, and going until day 24090, where day 1 is the first of January 1948.

    The file [nino3.4-anoms.txt](https://github.com/johncarlosbaez/el-nino/blob/master/R/nino3.4-anoms.txt) gives the Nino 3.4 index in the column called ANOM at monthly intervals starting from January 1950.

    So, if you interlace these numbers, they will start very slightly out of synch and then get very badly out of synch as one goes forward in 10-day intervals while the other goes forward in monthly intervals.

  • 72.

    John

    Daniel sent a csv file which had the links matched to months, so I used that; otherwise interlacing and filling the gaps with 0s to make the input vector lengths constant would be a bad idea as well.

    So I need to figure out how to include the link numbers with the actual inputs.

    I can retest each ML algorithm then, perhaps 2 days of training and testing

    Dara

  • 73.

    This is the file Daniel kindly sent me:

    [anomaly csv](http://files.lossofgenerality.com/anomaly6m.csv)

    It seems the links are mapped to the monthly values.

  • 74.
    edited November 2014

    Dara wrote:

    Daniel sent a csv file, which had the links month matched, I used that, otherwise interlacing and filling the gaps with 0s to make the input vector lengths constant, will be a bad idea as well.

    Okay, that's good. Filling the gaps with 0s would certainly be a bad idea.

    I'm curious how Daniel made his file. He's a smart guy so I'm confident he did the right thing, but it's an annoying technical problem, combining monthly data with 10-day data. As a pure mathematician I've never had to face this kind of problem!

    Does anyone know an online program that can answer questions like this:

    What day is 10490 days after January 1st, 1948?

    I could use it to check Daniel's work. I could figure out the answer myself, of course, but I'm lazy!

    Anyone who knows R a bit better than me could modify Graham's program so that it computes the link strengths monthly. Even I, with my poor programming skills, might be able to do it. I think that would be the really nice way to solve this problem.
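
    A minimal sketch in R that answers the date question directly and does a rough monthly aggregation from the day offsets, assuming the file layout described earlier in this thread (second column = link strength at 10-day intervals from day 730, with day 1 = 1948-01-01):

        as.Date("1948-01-01") + 10490   # "1976-09-20"

        s     <- read.table("average-link-strength-1948-2013.txt")
        days  <- seq(730, by = 10, length.out = nrow(s))
        dates <- as.Date("1948-01-01") + (days - 1)
        # average the 10-day values within each calendar month
        monthly <- tapply(s[, 2], format(dates, "%Y-%m"), mean)
        head(monthly)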

  • 75.

    Anyone who knows R a bit better than me could modify Graham’s program so that it computes the link strengths monthly.

    I do not mind doing it; sadly, I program well in R... one of those unwanted skills, so you are not missing much ;)

    I will try to figure that out if you need me to, but let's do that after we try something with Daniel's file.

  • 76.

    John, if I used Daniel's file, then would the interlacing I proposed work?

    anom1, link1, anom2, link2, ...

  • 77.

    Continuing my talk draft:

    7) Recently people have begun to study teleconnections using ideas from complex network theory.

    What's that? In complex network theory, people often start with a weighted graph: that is, a set $N$ of nodes and for any pair of nodes $i, j \in N$, a weight $A_{i j}$, which can be any nonnegative real number.

    Why is this called a weighted graph? It's really just a matrix of nonnegative real numbers!

    The reason is that we can turn any weighted graph into a graph by drawing an edge from node $j$ to node $i$ whenever $A_{i j} > 0$. This is a directed graph, meaning that we should draw an arrow pointing from $j$ to $i$. We could have an edge from $i$ to $j$ but not vice versa! Note that we can also have an edge from a node to itself.

    Conversely, if we have any directed graph, we can turn it into a weighted graph by choosing the weight $A_{i j} = 1$ when there's an edge from $j$ to $i$, and $A_{i j} = 0$ otherwise.

    For example, we can make a weighted graph where the nodes are web pages and $A_{i j}$ is the number of links from the web page $j$ to the web page $i$.

    People in complex network theory like examples of this sort: large weighted graphs that describe connections between web pages, or people, or cities, or neurons, or other things. The goal, so far, is to compute numbers from weighted graphs in ways that describe interesting properties of these complex networks --- and then formulate and test hypotheses about the complex networks we see in real life.
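
    A tiny illustration of these definitions in R, using a toy weight matrix rather than climate data:

        # weighted graph on 3 nodes: A[i, j] = weight of the edge from node j to node i
        A <- matrix(c(0, 2, 0,
                      1, 0, 3,
                      0, 0, 0.5), nrow = 3, byrow = TRUE)
        edges <- which(A > 0, arr.ind = TRUE)   # the edges of the corresponding directed graph
        colnames(edges) <- c("to_i", "from_j")
        edges                                   # note the self-loop at node 3
        B <- (A > 0) * 1                        # a directed graph turned back into 0/1 weights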

  • 78.

    Does anyone know an online program that can answer questions like this: What day is 10490 days after January 1st, 1948?

    The Modified Julian Day is the best bet http://en.wikipedia.org/wiki/Julian_day

    This starts from Nov 17, 1858 and keeps a running count of the number of days elapsed. It is synchronized with UTC, Coordinated Universal Time (previously GMT). This only loses leap seconds, which you can get from GPS time.

    Here is a converter from UTC to MJD http://www.csgnetwork.com/julianmodifdateconv.html

    This is a great little online utility for keeping track of the different clock schemes http://www.leapsecond.com/java/gpsclock.htm

    The actual algorithm isn't that difficult and one can use these online programs to verify your code.
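
    A small sketch of the conversion in R, using the fact that 1970-01-01 is MJD 40587:

        mjd <- function(d) as.numeric(as.Date(d)) + 40587   # days since 1858-11-17
        mjd("1948-01-01")                                    # 32551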

  • 79.

    Hello John, on #78 in this thread: I could make you CDFs with all the math notation, and sample such graphs and their matrix reps and so on, to demo on the screen or offer as download candy to the listeners.

  • 80.
    edited November 2014

    7b) Here's a very simple example of what we can do with a weighted graph. For any node $i$, we can sum up the weights of edges going into $i$:

    $$ \sum_{j \in N} A_{i j} $$

    This is called the **degree** of the node $i$. For example, if lots of people have web pages with lots of links to yours, your webpage will have a high degree. If lots of people like you on Facebook, you will have a high degree.

    This is some measure of how "important" a node is.

    People have constructed climate networks where the nodes are locations on the Earth's surface, and the weight $A_{i j}$ measures how correlated the weather is at the $i$th and $j$th location. Then, the degree says how "important" a given location is for the Earth's climate - in some sense or other.
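
    A toy illustration of the degree in R, with the earlier convention that $A_{i j}$ is the weight of the edge from node $j$ to node $i$:

        A <- matrix(c(0, 2, 0,
                      1, 0, 3,
                      0, 0, 0), nrow = 3, byrow = TRUE)
        rowSums(A)   # degree of node i: sum over j of A[i, j], the total weight of edges into i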

  • 81.
    edited November 2014

    John said ([#75](http://forum.azimuthproject.org/discussion/1523/crunch-time/?Focus=13541#Comment_13541))

    I’m curious how Daniel made his file.

    I took a fairly rough and ready, but I think accurate enough, approach. The code is in the link strength analysis notebook. I described the algorithm in a fair amount of detail [here](http://forum.azimuthproject.org/discussion/1528/random-forest-el-nino-34-offtheshelf/?Focus=13445#Comment_13445).

    To avoid people having to hop around I will just reproduce it here:


    Since there are more 10-day cycles than months, the main idea is to group the 10-day periods into "months" and take the average of the values that fall into the same month. Since 10-day periods do not exactly line up with months, the grouping has to be somewhat approximate anyway.

    The logic is that there are 73 10-day periods in 2 non-leap years, or 24 months. This gives 24 30-day "months" with a 10-day period left over. So I take the 24 "months", match those with calendar months, and drop the remaining 10-day period. This means that my "months" drift in and out of phase with calendar months, but by no more than half a month, i.e. half the resulting sampling period.

    There is also the issue of leap years: both the above approximation and John's algorithm pretend leap years do not exist. This means there could be a cumulative drift, but I think that both algorithms ignore leap years in a way that happens to cancel out, since John's algorithm produces exactly 73 periods every 2 years whether there is a leap year or not, and my 30-day "months" then become 31-day months if they include Feb 29. So all that happens is the drifting in and out of phase between the 30-day months and calendar months, which resyncs every 2 years. The code that does this is below.

    Drop every 73rd period:

    linktmp=link[arange(len(link))%73 != 0]
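    # the fragments here assume a pylab-style notebook: arange comes from numpy,
    # and link is a pandas Series of the 10-day link strengths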
    

    group into 3s and average:

    ll= len(linktmp)
    print ll
    linkmnth= linktmp.groupby(arange(ll)//3).mean()
    linkmnth.reset_index()
    
    linkx=linkmnth
    

    only keep data for the intersection of the data sets:

    df=nino34.iloc[:len(linkx)]
    df.index=range(df.shape[0])
    print ll, len(linkx), df.shape
    df["link"]=linkmnth
    

    The code is implicitly assuming that the link strength data and the nino34 data start at the same time, the beginning of January 1950.

  • 82.

    I think Daniel's approach is sound, at least for this level of work we are doing. If John agrees, I will use his csv file; I already loaded and tested it with k-NN last night, could finish testing tonight and post, then move on to using it with SVR and NN.

  • 83.

    John, please let me know if using Daniel's version of the data is what you want, and if you agree with that data being interlaced with the anom values as I proposed. If so, let me know if interlacing with 0 makes sense as the test for no link data, in order to compare apples with apples.

  • 84.

    Hello John

    Waiting for your response on the interlacing idea based upon Daniel's csv.

    John, I called on a top Mathematica developer, the guy who developed the Wavelet package for Wolfram, to review my code for the stuff you are presenting. I will probably release some code to him tonight, and I will report back if he sees problems.

    I also asked him to develop other algorithms, possibly another decision-tree machine learning algorithm, to classify the samples and use for forecasting.

  • 85.
    edited November 2014

    Dara - sorry to keep you waiting... yes, I'm happy enough with Daniel's file of data, and how he created it. Feel free to go ahead and compute with it! Thanks!

    And thanks, Daniel, for re-explaining how you created that file. Nice.

    (There are days when I need to focus on teaching, then days when I get to focus on the Forum. Yesterday was a teaching day.)

    Comment Source:Dara - sorry to keep you waiting... yes, I'm happy enough with Daniel's file of data, and how he created it. Feel free to go ahead and compute with it! Thanks! And thanks, Daniel, for re-explaining how you created that file. Nice. (There are days when I need to focus on teaching, then days when I get to focus on the Forum. Yesterday was a teaching day.)
  • 86.

    Hello John

    Thanx, I will go ahead and apply it to all the algorithms I have already run. We could always change how the input data is merged later; after all, it is called software for a reason :)

    Comment Source:Hello John Thanx, I will go ahead and apply them to all the algorithms I already did, we could always change the merging of input data, after all it is called software for a reason :)
  • 87.
    edited November 2014

    Continuing with my talk draft:

    8) For example, in Complex networks in climate dynamics, Donges et al take surface air temperature data on a grid and compute the Pearson correlation between grid points.

    More precisely, let $T_i(t)$ be the temperature at the $i$th grid point at month $t$ after the average for that month in all years has been subtracted off, to eliminate some seasonal variations. The Pearson correlation is then

    $$ \Big\langle \left( T_i - \langle T_i \rangle \right)\left( T_j - \langle T_j \rangle \right) \Big\rangle $$ divided by

    $$ \sqrt{\Big{\langle} (T_i - \langle T_i \rangle)^2 \Big{\rangle}} \; \sqrt{\Big{\langle} (T_j - \langle T_j \rangle)^2 \Big{\rangle}} $$

    Comment Source:Continuing with my talk draft: 8) For example, in [Complex networks in climate dynamics](https://www.pik-potsdam.de/members/kurths/publikationen/2009/complex-networks.pdf), Donges _et al_ take surface air temperature data on a grid and compute the Pearson correlation between grid points. More precisely, let $T_i(t)$ be the temperature at the $i$th grid point at month $t$ after the average for that month in all years has been subtracted off, to eliminate some seasonal variations. The Pearson correlation is then $$ \Big\langle \left( T_i - \langle T_i \rangle \right)\left( T_j - \langle T_j \rangle \right) \Big\rangle $$ divided by $$ \sqrt{\Big{\langle} (T_i - \langle T_i \rangle)^2 \Big{\rangle}} \; \sqrt{\Big{\langle} (T_j - \langle T_j \rangle)^2 \Big{\rangle}} $$
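    As a concrete illustration of the quantity above (not code from the paper), here is a small Python sketch: it subtracts the per-calendar-month mean to form the anomalies $T_i$, then computes their Pearson correlation as the covariance divided by the two standard deviations. The toy temperature series are made up.

    ```python
    import numpy as np

    def monthly_anomalies(temps):
        """Subtract the mean for each calendar month (temps: one value per month)."""
        temps = np.asarray(temps, dtype=float)
        anom = temps.copy()
        for m in range(12):
            anom[m::12] -= temps[m::12].mean()
        return anom

    def pearson(x, y):
        """Pearson correlation: covariance normalized by the two standard deviations."""
        x = x - x.mean()
        y = y - y.mean()
        return (x * y).mean() / (x.std() * y.std())

    # Toy example: 30 years of monthly temperatures at two grid points.
    rng = np.random.default_rng(0)
    t = np.arange(360)
    T_i = 20 + 5 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 1, 360)
    T_j = 15 + 5 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 1, 360)

    print(pearson(monthly_anomalies(T_i), monthly_anomalies(T_j)))
    ```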
  • 88.

    John, the person who wrote the Wavelet library for Wolfram has agreed to review my code, so that I do not make any serious mistakes that might embarrass you!

    Also, I pulled some interim computations out of the algorithm; you could use them to explain how the algorithm works, since most people just cut and paste the Wikipedia descriptions of machine learning algorithms and do not fully understand them.

    I will try to have everything done by Sunday.

    Comment Source:John the person who wrote the Wavelet library for Wolfram agreed to review my code so I did not make any serious mistakes that might embarrass you! Also I took some interim computations out of the algorithm and you could use them to explain how the algorithm works, since most people cut and paste the Wiki of Machine Learning algorithms and do not have full understanding of them. I try everything done by Sunday.
  • 89.

    Thanks!!!!!

    Comment Source:Thanks!!!!!
  • 90.
    edited November 2014

    Continuing with my talk draft:

    8) For example, in Complex networks in climate dynamics, Donges et al take surface air temperature data on a grid and compute the Pearson correlation between grid points.

    More precisely, let $T_i(t)$ be the temperature at the $i$th grid point at month $t$ after the average for that month in all years has been subtracted off, to eliminate some seasonal variations. They compute the Pearson correlation of $T_i(t)$ and $T_j(t)$ for each pair of grid points $i, j$. The Pearson correlation is the simplest measure of linear correlation, normalized to range between -1 and 1.

    We could construct a weighted graph $A_{i j}$ this way, and it would be symmetric, or undirected:

    $$ A_{i j} = A_{j i} $$ However, Donges et al prefer to work with a graph rather than a weighted graph. So, they create a graph where there is an edge from $i$ to $j$ (and also from $j$ to $i$) when $|A_{i j}|$ exceeds a certain threshold, and no edge otherwise.

    They can adjust this threshold so that any desired fraction of pairs $i, j$ actually have an edge between them. They choose this fraction to be 0.5%.

    Then, they plot the degree of each node, weighted by the area of its grid square:

    A certain patch dominates the world! This is the El Niño basin. The Indian Ocean comes in second.

    (Some details, which I may not say:

    The Pearson correlation is the covariance

    $$ \Big\langle \left( T_i - \langle T_i \rangle \right) \left( T_j - \langle T_j \rangle \right) \Big\rangle$$ normalized by dividing by the standard deviation of $T_i$ and the standard deviation of $T_j$.

    The reddest shade of red in this picture shows nodes that are connected to 5% or more of the other nodes. That is, these nodes are connected to at least 10 times as many nodes as average.)

    Comment Source:Continuing with my talk draft: 8) For example, in [Complex networks in climate dynamics](https://www.pik-potsdam.de/members/kurths/publikationen/2009/complex-networks.pdf), Donges _et al_ take surface air temperature data on a grid and compute the Pearson correlation between grid points. More precisely, let $T_i(t)$ be the temperature at the $i$th grid point at month $t$ after the average for that month in all years has been subtracted off, to eliminate some seasonal variations. They compute the Pearson correlation of $T_i(t)$ and $T_j(t)$ for each pair of grid points $i, j$. The Pearson correlation is the simplest measure of linear correlation, normalized to range between -1 and 1. We could construct a weighted graph $A_{i j}$ this way, and it would be symmetric, or undirected: $$ A_{i j} = A_{j i} $$ However, Donges _et al_ prefer to work with a graph rather than a weighted graph. So, they create a graph where there is an edge from $i$ to $j$ (and also from $j$ to $i$) when $|A_{i j}|$ exceeds a certain threshold, and no edge otherwise. They can adjust this threshold so that any desired fraction of pairs $i, j$ actually have an edge between them. They choose this fraction to be 0.5%. Then, they plot the degree of each node, weighted by the area of its grid square: <img width = "500" src = "http://math.ucr.edu/home/baez/climate_networks/area_weighted_connectivity_pearson_correlation_sea_surface_air_temperature_donges.jpg" alt = ""/> A certain patch dominates the world! This is the **El Ni&ntilde;o basin**. The Indian Ocean comes in second. (Some details, which I may not say: The **[Pearson correlation](https://en.wikipedia.org/w/index.php?title=Pearson_product-moment_correlation_coefficient)** is the **covariance** $$ \Big\langle \left( T_i - \langle T_i \rangle \right) \left( T_j - \langle T_j \rangle \right) \Big\rangle$$ normalized by dividing by the standard deviation of $T_i$ and the standard deviation of $T_j$. The reddest shade of red in this picture shows nodes that are connected to 5% or more of the other nodes. That is, these nodes are connected to at least 10 times as many nodes as average.)
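    Here is a rough sketch, with random stand-in data, of the two steps just described: choose the threshold so that 0.5% of node pairs get an edge, then compute each node's area-weighted connectivity, with neighbours weighted by the cosine of their latitude (proportional to grid-cell area on a regular latitude-longitude grid). This only illustrates the recipe as I understand it, not the authors' actual code or data.

    ```python
    import numpy as np

    rng = np.random.default_rng(1)

    # Stand-ins: |Pearson correlation| between n grid points, and their latitudes in degrees.
    n = 200
    A = np.abs(np.corrcoef(rng.normal(size=(n, 120))))
    lat = rng.uniform(-90, 90, n)

    # Pick the threshold so that 0.5% of the node pairs get an edge.
    iu = np.triu_indices(n, k=1)
    threshold = np.quantile(A[iu], 1 - 0.005)
    adj = (A > threshold) & ~np.eye(n, dtype=bool)

    # Area-weighted connectivity: cos(latitude)-weighted fraction of nodes each node links to.
    w = np.cos(np.radians(lat))
    awc = (adj * w).sum(axis=1) / w.sum()

    print("edge density:", adj.sum() / (n * (n - 1)))
    print("max area-weighted connectivity:", awc.max())
    ```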
  • 91.
    edited November 2014

    9) The Pearson correlation detects linear correlations. A more flexible measure is mutual information: how many bits of information knowing the temperature at time $t$ at grid point $i$ tells you about the temperature at the same time at grid point $j$.

    Donges et al create a climate network this way as well, putting an edge between nodes if their mutual information exceeds a certain cutoff. They choose this cutoff so that 0.5% of node pairs have an edge between them, and get the following map:

    The resulting map is almost indistinguishable from the previous one, especially in the El Niño basin. So, this feature is not just an artifact of focusing on linear correlations.

    Comment Source:9) The Pearson correlation detects linear correlations. A more flexible measure is mutual information: how many bits of information knowing the temperature at time $t$ at grid point $i$ tells you about the temperature at the same time at grid point $j$. Donges _et al_ create a climate network this way as well, putting an edge between nodes if their mutual information exceeds a certain cutoff. They choose this cutoff so that 0.5% of node pairs have an edge between them, and get the following map: <img width = "500" src = "http://math.ucr.edu/home/baez/climate_networks/area_weighted_connectivity_mutual_information_sea_surface_air_temperature_donges.jpg" alt =""/> The result is almost indistinguishable in the El Ni&ntilde;o basin. So, this feature is not just an artifact of focusing on linear correlations.
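    For concreteness, here is one simple histogram-based estimate of the mutual information, in bits, between two series. Donges et al's estimator may differ in its binning and bias-correction details, so this is only an illustration of the quantity, not their method.

    ```python
    import numpy as np

    def mutual_information_bits(x, y, bins=16):
        """Histogram estimate of I(X;Y) in bits for two 1-D samples."""
        pxy, _, _ = np.histogram2d(x, y, bins=bins)
        pxy /= pxy.sum()                      # joint distribution
        px = pxy.sum(axis=1)                  # marginals
        py = pxy.sum(axis=0)
        nz = pxy > 0                          # avoid log(0)
        return np.sum(pxy[nz] * np.log2(pxy[nz] / (px[:, None] * py[None, :])[nz]))

    rng = np.random.default_rng(2)
    x = rng.normal(size=2000)
    y = 0.8 * x + 0.6 * rng.normal(size=2000)                 # correlated toy series
    print(mutual_information_bits(x, y))                      # clearly positive
    print(mutual_information_bits(x, rng.normal(size=2000)))  # near zero, up to estimator bias
    ```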
  • 92.

    John, if you have those heatmaps time-sliced, I could make you animations showing the time evolution of the weather system.

    Comment Source:John if you have those heatmaps time sliced, I could make you animators to show the time evolution of the weather system.
  • 93.
    edited November 2014

    Hello John

    This is the website of the review engineer:

    Mathematica for prediction algorithms

    His name is Anton Antonov; he used to work for Wolfram Research, and he is a great person. I am trying to get him more involved, since he likes the stuff we are doing here.

    So there are no hotshots: we review each other's code and try to find the bugs early on. In the eyes of the bugs, all programmers are equal brothers.

    Dara

    Comment Source:Hello John This is the website of the review engineer: [Mathematica for prediction algorithms](http://mathematicaforprediction.wordpress.com/) His name is Anton Antonov and used to work for Wolfram Research, he is a great person. I am trying to get him more involved he likes the stuff we are doing here. So there are no hotshots, we review each other's code and try to find the bugs early on. In the eyes of the bugs programmers are equal brothers. Dara
  • 94.
    edited November 2014

    Thanks, Dara! I don't have these heatmaps time sliced; I got them out of the paper by Donges et al.

    By the way, after I suffer through this talk in December I will probably want to keep reusing and improving this talk. For example, this spring I'm supposed to give a talk on Skype at York University... and I'll probably talk about climate networks.

    So, work we do now, which doesn't appear in my December talk, is not wasted! It may even turn into a paper or two someday.

    Comment Source:Thanks, Dara! I don't have these heatmaps time sliced; I got them out of the paper by Donges _et al_. By the way, after I suffer through this talk in December I will probably want to keep reusing and _improving_ this talk. For example, this spring I'm supposed to give a talk on Skype at York University... and I'll probably talk about climate networks. So, work we do now, which doesn't appear in my December talk, is not wasted! It may even turn into a paper or two someday.
  • 95.
    edited November 2014

    Continuing with my talk draft:

    10) We can also look at how climate networks change with time - in particular, how they are affected by El Niños. This is the subject of a 2008 paper by Tsonis and Swanson, Topology and predictability of El Niño and La Niña networks.

    They create a climate network in a way that's similar to the one I just described, except they:

    1. Use surface air temperature data from a different source: the National Center for Environmental Prediction.

    2. Use data from 1950 to 2005, but only data from November to March in each year, claiming that summertime introduces spurious links.

    3. Separately create climate networks for El Niño and La Niña time periods (defined as those where some "normalized ENSO index", with standard deviation 1, is either $\gt 1$ or $\le -1$).

    4. Create a link between grid points when their Pearson correlation has absolute value greater than $0.5$.

    They get this map for La Niña conditions:

    and this map for El Niño conditions:

    They conclude that "El Niño breaks climate links". That is, it generally reduces correlations between temperatures at different points worldwide.

    Comment Source:Continuing with my talk draft: 10) We can also look at how climate networks change with time - in particular, how they are affected by El Ni&ntilde;os. This is the subject of a 2008 paper by Tsonis and Swanson, [Topology and predictability of El Ni&ntilde;o and La Ni&ntilde;a networks](https://pantherfile.uwm.edu/aatsonis/www/publications/2008-06_Tsonis-AA_TopologyandPredictabilityofElNinoandLaNinaNetworks-2.pdf). They create a climate network in a way that's similar to the one I just described, except they: 1. Use surface air temperature data from a different source: the National Center for Environmental Prediction. 2. Use data from 1950 to 2005, but only data from November to March in each year, claiming that summertime introduces spurious links. 3. Separately create climate networks for El Ni&ntilde;o and La Ni&ntilde;a time periods (defined as those where some "normalized ENSO index", with standard deviation 1, is either $\gt 1$ or $\le -1$). 4. Create a link between grid points when their Pearson correlation has absolute value greater than $0.5$. They get this map for La Ni&ntilde;a conditions: <img width = "500" src = "http://math.ucr.edu/home/baez/climate_networks/climate_backbone_la_nina_tsonis.jpg" alt = ""/> and this map for El Ni&ntilde;o conditions: <img width = "500" src = "http://math.ucr.edu/home/baez/climate_networks/climate_backbone_el_nino_tsonis.jpg" alt = ""/> They conclude that "El Ni&ntilde;o breaks climate links". That is, it generally reduces correlations between temperatures at different points worldwide.
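    A rough sketch of steps 3 and 4, using random stand-in arrays rather than the actual NCEP data: split the months by a normalized ENSO index and build one |correlation| > 0.5 network from each subset.

    ```python
    import numpy as np

    rng = np.random.default_rng(3)

    # Stand-ins: monthly temperature anomalies at n_points grid points, plus an ENSO index
    # normalized to standard deviation 1 (the real inputs would come from the NCEP data).
    n_points, n_months = 100, 660
    T = rng.normal(size=(n_points, n_months))
    enso = rng.normal(size=n_months)
    enso /= enso.std()

    def network(T_subset, cutoff=0.5):
        """Adjacency matrix: edge wherever |Pearson correlation| exceeds the cutoff."""
        C = np.corrcoef(T_subset)
        return (np.abs(C) > cutoff) & ~np.eye(T_subset.shape[0], dtype=bool)

    adj_elnino = network(T[:, enso > 1])      # El Nino months
    adj_lanina = network(T[:, enso <= -1])    # La Nina months

    # In Tsonis and Swanson's data the El Nino network has markedly fewer edges.
    print(adj_elnino.sum() // 2, adj_lanina.sum() // 2)
    ```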
  • 96.

    11) It's interesting that while the El Niño basin is the region with the most climate links, the El Niño phenomenon breaks climate links. It's not a contradiction. It simply means that the El Niño basin has a lot of highly correlated temperatures at different locations... but when there's an El Niño, the temperature field of the whole planet becomes more "erratic", with less correlation at different points.

    Here's another figure from Tsonis and Swanson's paper, illustrating this:

    Comment Source:11) It's interesting that while the El Ni&ntilde;o basin is the region with the most climate links, the El Ni&ntilde;o phenomenon _breaks_ climate links. It's not a contradiction. It simply means that the El Ni&ntilde;o basin has a lot of highly correlated temperatures at different locations... but when there's an El Ni&ntilde;o, the temperature field of the whole planet becomes more "erratic", with less correlation at different points. Here's another figure from Tsonis and Swanson's paper, illustrating this: <img src = "http://math.ucr.edu/home/baez/climate_networks/degree_distribution_tsonis.jpg" alt = ""/>
  • 97.

    I will probably want to keep reusing and improving this talk.

    Hello John

    Let me make some CDFs for you for educational purposes that actually teach something using actual data. They would be something the students could download for free; they would enhance your presentation and, most importantly, have high educational value, especially for the machine learning approach and the related math.

    Comment Source:>I will probably want to keep reusing and improving this talk. Hello John Let me make some CDFs for you for educational purposes that actually teaches something using actual data, it would be something the students download, for free, and enhances your presentation and most importantly has high education value specially with machine learning approach and other related math.
  • 98.
    edited November 2014

    12) There's much more to say, but I'll focus on one tantalizing idea: that we might use climate networks to predict El Niños. This would be great, because El Niños have huge effects on agriculture, especially around the Pacific ocean.

    It's generally regarded as very hard to predict El Niños more than 6 months in advance. There is also a spring barrier - it's harder to predict El Niños through the spring of any year. It's controversial how much of the unpredictability in the ENSO cycle is due to chaos intrinsic to the Pacific ocean system, and how much is due to noise from outside the system. Both may be involved.

    There are many teams trying to predict El Niños, some using physical models of the Earth's climate, and others using machine learning techniques. There is a kind of competition going on, which you can see at a National Oceanic and Atmospheric Administration website.

    The most recent predictions give a sense of how hard this job is:

    The Niño 3.4 index is the mean temperature of a certain patch of the Pacific ocean, minus its average for that time of year. When its 3-month running average exceeds 0.5°C for 5 months, people say there's an El Niño.

    As you can see, it's hard to be sure whether there will be an El Niño early next year! However, the consensus forecast is yes, a weak El Niño. This is the best we can do right now: multi-model ensembles currently have better predictive skill than any single model.

    Comment Source:12) There's much more to say, but I'll focus on one tantalizing idea: that we might use climate networks to _predict_ El Ni&ntilde;os. This would be great, because El Ni&ntilde;os have huge effects on agriculture, especially around the Pacific ocean. It's generally regarded as very hard to predict El Ni&ntilde;os more than 6 months in advance. There is also a **spring barrier** - it's harder to predict El Ni&ntilde;os through the spring of any year. It's controversial how much of the unpredictability in the ENSO cycle is due to chaos intrinsic to the Pacific ocean system, and how much is due to noise from outside the system. Both may be involved. There are many teams trying to predict El Ni&ntilde;os, some using physical models of the Earth's climate, and others using machine learning techniques. There is a kind of competition going on, which you can see at a [National Oceanic and Atmospheric Administration website](http://www.pmel.noaa.gov/tao/elnino/forecasts.html). The most recent predictions give a sense of how hard this job is: <img src = "http://math.ucr.edu/home/baez/climate_networks/2014-10-16-Nino3.4-predictions.jpg" alt = ""/> The Ni&ntilde;o 3.4 index is the mean temperature of a certain patch of the Pacific ocean, minus its average for that time of year. When its 3-month running average exceeds 0.5&deg;C for 5 months, people say there's an **El Ni&ntilde;o**. As you can see, it's hard to be sure if there will be an El Ni&ntilde;o early next year! However, the consensus forecast is _yes, a weak El Ni&ntilde;o_. This is the best we can do, now. Right now multi-model ensembles have better predictive skill than any one model.
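    To make that definition concrete, here is a small sketch that flags El Niño months from a monthly Niño 3.4 anomaly series: take a 3-month running mean and look for runs of at least 5 consecutive months above 0.5 °C. The series here is synthetic, and the official bookkeeping (overlapping 3-month seasons, periodically updated climatology) differs in detail.

    ```python
    import numpy as np

    def el_nino_months(nino34_anom, thresh=0.5, run_length=5):
        """Boolean mask of months in an El Nino episode: 3-month running mean of the
        anomaly stays above `thresh` for at least `run_length` consecutive months."""
        x = np.asarray(nino34_anom, dtype=float)
        running = np.convolve(x, np.ones(3) / 3, mode="same")   # centred 3-month mean
        above = running > thresh
        mask = np.zeros_like(above)
        i = 0
        while i < len(above):
            if above[i]:
                j = i
                while j < len(above) and above[j]:
                    j += 1
                if j - i >= run_length:
                    mask[i:j] = True
                i = j
            else:
                i += 1
        return mask

    # Synthetic example: a single warm event in the middle of a 4-year anomaly record.
    anom = np.concatenate([np.zeros(20), np.linspace(0, 1.5, 8), np.full(6, 1.5),
                           np.linspace(1.5, 0, 8), np.zeros(6)])
    print(np.where(el_nino_months(anom))[0])    # indices of the flagged months
    ```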
  • 99.
    edited November 2014

    Dara wrote:

    Let me make some CDFs for you for educational purposes that actually teach something using actual data. They would be something the students could download for free; they would enhance your presentation and, most importantly, have high educational value, especially for the machine learning approach and the related math.

    That sounds great, though I don't know what a CDF is.

    (In probability theory, it's a "cumulative distribution function". I hate acronyms!)

    I won't be very interested in this until after my talk. But after that I will be. Let's talk about it more then!

    Comment Source:Dara wrote: > Let me make some CDFs for you for educational purposes that actually teaches something using actual data, it would be something the students download, for free, and enhances your presentation and most importantly has high education value specially with machine learning approach and other related math. That sounds great, though I don't know what a CDF is. (In probability theory, it's a "cumulative distribution function". I hate acronyms!) I won't be very interested in this until after my talk. But after that I will be. Let's talk about it more then!
  • 100.

    After your talk John.

    Comment Source:After your talk John.