
## Comments

Hello Graham,

Could you kindly, if you have time and there's no pressure, give me a simple csv or text file of your link computations that exactly matches the anomalies file John has in his github account? I left the link in my write-ups.

I just cannot follow the discussions; I just want to incorporate it into the input vector of the ML algorithms I am doing, to see if any improvement is obtained.

My assumption is that it should be the same length as the 3.4 anomalies, else I have no idea how to use it.

Dara, John used my code to make a file of link strengths which Daniel has been using. It's every ten days, not every calendar month. Daniel is probably more help than me, since he's recently done some ten-day to month conversion, and I'm too busy with other things right now.
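For anyone wanting to try the conversion themselves, here is a minimal Python sketch of one way a ten-day-to-monthly conversion could go: average all 10-day samples whose dates fall in the same calendar month, using the thread's day numbering (day 1 = January 1, 1948). The function name `monthly_means` and the example values are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta

def monthly_means(days, values):
    """Average 10-day samples into calendar-month buckets.

    `days` are day numbers with day 1 = January 1, 1948, the
    convention used for the link-strength files in this thread.
    """
    origin = date(1948, 1, 1)
    buckets = defaultdict(list)
    for day, value in zip(days, values):
        d = origin + timedelta(days=day - 1)
        buckets[(d.year, d.month)].append(value)
    return {month: sum(vs) / len(vs) for month, vs in sorted(buckets.items())}

# Hypothetical example: ten samples at 10-day spacing starting at day 730
days = list(range(730, 830, 10))
values = [float(i) for i in range(10)]
print(monthly_means(days, values))
```

Months containing only one or two 10-day samples (at the start and end of the record) will of course be noisier averages than full months.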

Trade winds can act to pile up water on one side of the Pacific, and the result is delayed as the imbalance propagates downward into the thermocline. This is very much a collective mode that has a continuous sense of lag within the greater volume. That is part of the reason that I transitioned to using the equations of hydrodynamics that model sloshing of liquid volumes, to at least keep it in line with a physical model of behavior.

The analogy is that of looking at a standing wave on one end of a waveguide and expecting that end to tell me more about the dynamics than the other end of the waveguide. It is a standing wave mode, so every segment of the waveguide really contains *identical* information, with nothing really leading or lagging except in a continuous fashion. The collective is what really matters, and how the external forcing modifies the collective behavior.

That's also why all the armchair prognosticating never seems to work. Someone points out that winds are picking up, indicating some pending change, but they don't know if that is the result of the change already happening or if there are already changes occurring underneath the waters that have sufficient inertia to resist those changes temporarily.

In terms of one's own intuition, consider how sloshing is one of the most difficult behaviors to control. Think about carrying a big bowl of water around and trying to prevent sloshing once it gets in motion. It takes great practice to know what counter-motions are needed to damp out the sloshing, and most of those are counter-intuitive.

Jim wrote:

> The Kawale paper is on ResearchGate to which us non-academics don't have access [...]

As far as I know, anyone can sign up to join ResearchGate by going [here](http://www.researchgate.net/). I haven't joined, though.

Dara wrote:

> Could you kindly, if you have time and no pressure, to give me a simple csv or text file of your link computations...

All the link strengths Graham computed using Ludescher _et al_'s algorithm are available [on my Github site](https://github.com/johncarlosbaez/el-nino). There are two files:

* [average-link-strength.txt](https://github.com/johncarlosbaez/el-nino/blob/master/R/average-link-strength.txt). This file has the average link strength S as computed by ludescher.R at 10-day intervals, starting from day 730 and going until day 12040, where day 1 is the first of January 1948.

* [average-link-strength-1948-2013.txt](https://github.com/johncarlosbaez/el-nino/blob/master/R/average-link-strength-1948-2013.txt). The second column in this file lists the average link strengths S as computed by ludescher.R at 10-day intervals, starting from day 730 and going until day 24090, where day 1 is the first of January 1948. The first column numbers these items from 1 to 2337.

I will now copy these to the [Azimuth Project github site](https://github.com/azimuth-project). We should try to put data there.
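A small Python sketch of how the day numbering in these files maps to calendar dates, assuming the layout described above (row i, 1-based, holds the link strength for day 730 + 10·(i − 1), with day 1 = January 1, 1948). The helper names `row_to_day` and `day_to_date` are made up for illustration.

```python
from datetime import date, timedelta

# Hypothetical helpers for the described file layout:
# row i (1-based) of average-link-strength-1948-2013.txt holds the
# link strength for day 730 + 10*(i - 1), where day 1 = 1948-01-01.

def row_to_day(index):
    return 730 + 10 * (index - 1)

def day_to_date(day):
    return date(1948, 1, 1) + timedelta(days=day - 1)

print(row_to_day(2337))            # 24090, the last day in the file
print(day_to_date(row_to_day(1)))  # calendar date of the first sample
```

This gives 2337 rows covering days 730 through 24090, consistent with the file description.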

Thanx John

I have enough time before traveling Sunday to incorporate these links into the ML input vectors. I might ask a bunch of dumb questions ... sorry

Dara

Daniel wrote:

> We then need to reconcile Graham's results with my analyses, since I am seeing no meaningful relationship between link strength and nino34.

These aren't exactly Graham's results - these are the results of Ludescher _et al_, which I explained in [El Niño project (part 3)](http://johncarlosbaez.wordpress.com/2014/07/01/el-nino-project-part-3/). Graham redid their calculation of the link strengths, and began to analyze their work.

For some more critical analysis of what Ludescher _et al_ did, read Steve Wenner's article [El Niño project (part 6)](http://johncarlosbaez.wordpress.com/2014/07/23/el-nino-project-part-6/). They predict an El Niño in the next calendar year if the link strength exceeds 2.82. They claim about a 75% success rate. But as Steve pointed out, their results aren't extremely significant. I'm focusing on their paper not because I think it's good, but because I have to give a talk on climate networks in 2 weeks, and this is the most famous paper on climate networks.

> Was Graham using the link strength numbers from our github file?

Yes! More precisely, Graham wrote the program which computed the numbers in this file!

WebHubTel wrote:

> I understand how a dipole is detected, which is looking for correlations that approach -1, but not the link strength.

I explained this in [El Niño project (part 3)](http://johncarlosbaez.wordpress.com/2014/07/01/el-nino-project-part-3/), if you're interested. As Graham points out, it may not be a well-motivated definition.

Okay, let me try to start outlining my talk.

0) I'm very flattered to be invited to speak here. I was probably invited because of my abstract mathematical work on networks and category theory. But when I got the invitation, instead of talking about something I understood, I thought I'd learn about something a bit more practical and talk about that. This was a bad idea. (Laughter, I hope.) But I'll try to make the best of it.

I've been trying to learn climate science. There's a subject called "complex networks" where people do statistical analyses of large graphs like the worldwide web or Facebook and draw conclusions from them. People are trying to apply these ideas to climate science. So that's what I'll talk about. I'll be reviewing a lot of other people's work, but also describing some work by a project I'm involved in, the Azimuth Project. (List of names.)

1) You've probably heard about the "global warming pause". Is this a real thing? If so, is it due to "natural variability", heat going into the deep oceans, some combination of both, or something else?

Here is a chart of global average air temperatures at sea level, put together by NASA's Goddard Institute for Space Studies:

<img src = "http://math.ucr.edu/home/baez/climate_networks/gistemp_1880-2013.jpg" alt = ""/>

(What are the green error bars, exactly?) You can see a lot of fluctuations, including a big dip after 1940 and a tiny dip after 2000. That tiny dip is the so-called "global warming pause". What causes these fluctuations? That's a big, complicated question.

3) One cause of temperature fluctuations is a kind of cycle whose extremes are called El Niño and La Niña.

<img width = "500" src = "http://math.ucr.edu/home/baez/climate_networks/ENSO_global_temperature_anomalies.png" alt = ""/>

(Graph from [NOAA](http://www.ncdc.noaa.gov/sotc/global/2012/13).)

4) A lot of things happen during an El Niño. For example, in 1997 and 1998, a big El Niño, we saw all these events:

<img src = "http://math.ucr.edu/home/baez/climate_networks/ElNinoMap1998.jpg" alt = ""/>

(Map from [Shrimp News International](https://www.shrimpnews.com/FreeReportsFolder/WeatherFolder/ElNino.html).)

El Niño is part of an irregular cycle that happens every 3 to 7 years, called the **El Niño Southern Oscillation** or **ENSO**. During the hot phase, **El Niño**, hot water in the Pacific moves from near Asia toward California, and air pressures get higher near that part of the Pacific. During the cool phase, **La Niña**, the water near California cools down, and air pressures get higher near the coast of Asia.

5) El Niños are important because they can cause billions of dollars of economic damage.

They also seem to bring heat stored in the deeper waters of the Pacific into the atmosphere. So, one reason for the "global warming pause" may be that we haven't had a strong El Niño [since 1998](http://ggweather.com/enso/oni.htm). The global warming pause might end with the next El Niño. For a while it seemed we were due for a big one this fall, but that hasn't happened.

6) The ENSO cycle is just one of many cycles involving **teleconnections** - strong correlations between weather at distant locations, typically thousands of kilometers apart. People have systematically looked for these teleconnections using [principal component analysis](https://en.wikipedia.org/wiki/Principal_component_analysis) of climate data, and also other techniques. The ENSO cycle shows up automatically when you do this kind of study. It stands out as the biggest source of climate variability on time scales greater than a year and less than a decade.

(Flesh this out; include some pretty pictures and name the second biggest one.)

> > We then need to reconcile Graham's results with my analyses, since I am seeing no meaningful relationship between link strength and nino34.
>
> These aren't exactly Graham's results - these are the results of Ludescher _et al_, which I explained in [El Niño project (part 3)](http://johncarlosbaez.wordpress.com/2014/07/01/el-nino-project-part-3/). Graham redid their calculation of the link strengths, and began to analyze their work.

I am aware of that, but as I remember there was a lot of discussion as to what data and algorithms Ludescher _et al_ actually used. The discrepancy between my results and theirs could have stemmed from differences/problems in the data. Knowing that Graham reproduced their result with the same data and known algorithms was very important, since it narrowed down the possible sources of the discrepancy, which made it easier to track down the problem (not subtracting out the mean from the link strength signal, [#39](http://forum.azimuthproject.org/discussion/1523/crunch-time/?Focus=13485#Comment_13485)).

The observations in [#39](http://forum.azimuthproject.org/discussion/1523/crunch-time/?Focus=13485#Comment_13485) and [#41](http://forum.azimuthproject.org/discussion/1523/crunch-time/?Focus=13487#Comment_13487) are more interesting than I initially realized. Since the current nino34 and current link strength are both correlated with future nino34 but not with each other, they carry independent information about future nino34. So while the link strength is a weak predictor, it can be valuable because of its independence. This is borne out by combining the current nino34 and link strength in a model:

```
> summary(lm(d$ANOM[7:nrow(d)] ~ d$ANOM[1:(nrow(d)-6)] + d$link[1:(nrow(d)-6)]))

Call:
lm(formula = d$ANOM[7:nrow(d)] ~ d$ANOM[1:(nrow(d) - 6)] + d$link[1:(nrow(d) - 6)])

Residuals:
     Min       1Q   Median       3Q      Max
-2.04005 -0.46827 -0.01113  0.46492  2.05495

Coefficients:
                        Estimate Std. Error t value Pr(>|t|)
(Intercept)              0.01309    0.02620   0.500    0.618
d$ANOM[1:(nrow(d) - 6)]  0.36617    0.03206  11.423  < 2e-16 ***
d$link[1:(nrow(d) - 6)]  0.65109    0.08501   7.659 5.71e-14 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.7231 on 759 degrees of freedom
Multiple R-squared:  0.222,    Adjusted R-squared:  0.22
F-statistic: 108.3 on 2 and 759 DF,  p-value: < 2.2e-16
```

This model has an $R^2$ of 0.22, compared to 0.16 and 0.09 from the individual signals. It is fairly close to the sum of the $R^2$ of the individual signal models minus the $R^2$ of the model between them (0.016). This suggests that the information in link strength is all new, not just degraded information from the current nino34. While $R^2 = 0.22$ is still lower than some of the other models, it is starting to catch up, and it is kind of impressive for a linear model based on just 2 numbers.
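For readers who don't use R, here is a rough Python equivalent of the two-predictor lagged model, fit by ordinary least squares on synthetic stand-in series (the real `ANOM` and `link` columns come from the data files discussed in this thread, so the numbers will not match Daniel's).

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for the monthly anomaly and link-strength columns;
# the real series come from the data files discussed in this thread.
n = 768
anom = rng.normal(size=n)
link = rng.normal(size=n)

lag = 6  # predict the anomaly six steps ahead, as in the R model

# Design matrix: intercept, lagged anomaly, lagged link strength
X = np.column_stack([np.ones(n - lag), anom[:-lag], link[:-lag]])
y = anom[lag:]

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
pred = X @ beta
r2 = 1.0 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)
print(beta, r2)
```

With random stand-in data the fitted $R^2$ is of course near zero; the point is only the shape of the calculation.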

Daniel wrote:

> Knowing that Graham reproduced their result with the same data and known algorithms was very important since it narrowed down the possible sources of the discrepancy which made it easier to track down the problem (not subtracting out the mean from the link strength signal)

Great! Your comment [66](http://forum.azimuthproject.org/discussion/1523/crunch-time/?Focus=13526#Comment_13526) looks really interesting!

> $R^2=.22$ is still lower than some of the other models, it is starting to catch up, and it is kind of impressive for a linear model based on just 2 numbers.

Do you remember where I can see what values of $R^2$ other models achieve?

> I remember there was a lot of discussion as to what data and algorithms Ludescher _et al_ actually used.

Summarizing, just so we're all on the same page:

There was never much question about the data Ludescher _et al_ used, since they specified it quite precisely, and it's a well-known data set. The controversy over the algorithm they used centered on one particular issue, [described in detail here](http://www.azimuthproject.org/azimuth/show/Blog+-+El+Ni%C3%B1o+project+%28part+3%29#Nuances). Briefly, we don't believe they meant what they said in one of their formulas. Using a charitable interpretation of their formula, which we described, Graham was able to _almost_ but _not quite_ replicate their graph of link strengths. He came _very_ close, and we don't know the reason for the small discrepancy.

Continuing the draft of my talk:

6) The ENSO cycle is just one of many cycles involving **teleconnections** - strong correlations between weather at distant locations, typically thousands of kilometers apart. People have systematically looked for these teleconnections using [principal component analysis](https://en.wikipedia.org/wiki/Principal_component_analysis) of climate data, and also other techniques. The ENSO cycle shows up automatically when you do this kind of study. It stands out as the biggest source of climate variability on time scales greater than a year and less than a decade. Some others include:

(I won't say all the following stuff; it's just notes to myself.)

* The [Pacific-North America Index](http://www.ncdc.noaa.gov/teleconnections/pna.php): one of the most prominent modes of low-frequency variability in the Northern Hemisphere extratropics, appearing in all months except June and July. The PNA is seen in a quadrupole pattern of 500 millibar height anomalies, with anomalies of similar sign located south of the Aleutian Islands and over the southeastern United States. Anomalies with sign opposite to the Aleutian center are located in the vicinity of Hawaii, and over the intermountain region of North America (central Canada) during the winter and fall.

* The [Pacific Decadal Oscillation](http://www.ncdc.noaa.gov/teleconnections/pdo/) is a longer-lived El Niño-like pattern of Pacific climate variability (Zhang et al. 1997). As seen with the better-known El Niño/Southern Oscillation (ENSO), extremes in the PDO pattern are marked by widespread variations in the Pacific Basin and the North American climate. In parallel with the ENSO phenomenon, the extreme phases of the PDO have been classified as being either warm or cool. When SSTs are anomalously cool in the interior North Pacific and warm along the Pacific Coast, and when sea level pressures are below average over the North Pacific, the PDO has a positive value. When the climate anomaly patterns are reversed, with warm SST anomalies in the interior and cool SST anomalies along the North American coast, or above-average sea level pressures over the North Pacific, the PDO has a negative value.

* The [North Atlantic Oscillation](http://www.ncdc.noaa.gov/teleconnections/nao.php) index is based on the surface sea-level pressure difference between the Subtropical (Azores) High and the Subpolar Low. The positive phase of the NAO reflects below-normal heights and pressure across the high latitudes of the North Atlantic and above-normal heights and pressure over the central North Atlantic, the eastern United States and western Europe. The negative phase reflects an opposite pattern of height and pressure anomalies over these regions. Both phases of the NAO are associated with basin-wide changes in the intensity and location of the North Atlantic jet stream and storm track, and in large-scale modulations of the normal patterns of zonal and meridional heat and moisture transport, which in turn result in changes in temperature and precipitation patterns often extending from eastern North America to western and central Europe. Strong positive phases of the NAO tend to be associated with above-normal temperatures in the eastern United States and across northern Europe, and below-normal temperatures in Greenland and oftentimes across southern Europe and the Middle East. They are also associated with above-normal precipitation over northern Europe and Scandinavia and below-normal precipitation over southern and central Europe. Opposite patterns of temperature and precipitation anomalies are typically observed during strong negative phases of the NAO.

* The [Arctic Oscillation](http://www.ncdc.noaa.gov/teleconnections/ao.php) is a large-scale mode of climate variability, also referred to as the Northern Hemisphere annular mode. The AO is a climate pattern characterized by winds circulating counterclockwise around the Arctic at around 55°N latitude. When the AO is in its positive phase, a ring of strong winds circulating around the North Pole acts to confine colder air across polar regions. This belt of winds becomes weaker and more distorted in the negative phase of the AO, which allows an easier southward penetration of colder Arctic air masses and increased storminess into the mid-latitudes.

(Include some pretty pictures and see if I can find work that detects these teleconnections starting from a pile of climate data. An old technique is principal component analysis, known in this field as **empirical orthogonal functions**. There's a paper by [Kutzbach](http://200.132.77.47:8080/aulas/acp/ACP_curso/Referencias/5Kutzbach_1967.pdf) on this stuff.)

Thanx Daniel for the links.

I interlaced the links to the anom numbers:

link_1, anom_1, link_2, anom_2, ... <----- (**John is that ok?**)

And then did the k-NN regression, and the error more than doubled.

**Still checking the code, so note this with caution**, but clearly the self-similarity of the augmented link data decreases for all distance functions. I am sure the same will happen with NN and SVR, but that requires much more time to test.
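For readers following along, here is a minimal numpy sketch of the k-NN regression step being discussed. This is a generic illustration, not Dara's Mathematica code; the function name `knn_predict` and the toy data are made up for the example:

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Predict y at x as the mean target of the k nearest training points."""
    dists = np.linalg.norm(X_train - x, axis=1)  # Euclidean distance function
    nearest = np.argsort(dists)[:k]
    return y_train[nearest].mean()

# Toy example: 1-D inputs, k = 3.
X = np.array([[0.0], [1.0], [2.0], [10.0]])
y = np.array([0.0, 1.0, 2.0, 10.0])
print(knn_predict(X, y, np.array([0.9])))  # 1.0 (mean of targets 1, 0, 2)
```

Swapping the distance function (`np.linalg.norm`) is how one would test the different metrics mentioned in this thread.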

This is the cross-correlation and autocorrelation of the Tahiti and Darwin signals:

![xcorr](http://imageshack.com/a/img537/4074/jZCphA.gif)

Note where the 4 upper arrows are. These are at 7-year intervals. The QBO oscillations are well characterized by a 28-month (2.33-year) period. This fits 3 QBO periods into a 7-year interval, giving alignment at an integral multiple of a QBO *and* synchronization to a whole number of years. I did this to show how Daniel's work fits in with dipole cross-correlation and hotspot autocorrelation.

I interlaced the links to the anom numbers:

link_1, anom_1, link_2, anom_2, … <—– (John is that ok?)

In order to compare apples to apples, for no link I interlaced 0:

0, anom_1, 0, anom_2, … <—– (John is that ok?)

Negligible difference between all metrics for k-NN regression.

Dara wrote:

> I interlaced the links to the anom numbers:
>
> link_1, anom_1, link_2, anom_2, … <—– (John is that ok?)

If I understand you correctly, this sounds like a bad idea.

The file [average-link-strength-1948-2013.txt](https://github.com/azimuth-project/el-nino/blob/master/average-link-strength-1948-2013.txt) gives link strengths at _10-day intervals_, starting from day 730 and going until day 24090, where day 1 is the first of January 1948. The file [nino3.4-anoms.txt](https://github.com/johncarlosbaez/el-nino/blob/master/R/nino3.4-anoms.txt) gives the Nino 3.4 index in the column called ANOM at _monthly intervals_, starting from January 1950.

So, if you interlace these numbers, they will start very slightly out of synch and then get very _badly_ out of synch as one goes forward in 10-day intervals while the other goes forward in monthly intervals.

John

Daniel sent a csv file which had the links month-matched; I used that. Otherwise interlacing and filling the gaps with 0s to make the input vector lengths constant would be a bad idea as well.

So I need to figure out how to include the link numbers with the actual inputs.

I can retest each ML algorithm then, perhaps 2 days of training and testing.

Dara

This is the file Daniel kindly sent me:

[anomaly csv](http://files.lossofgenerality.com/anomaly6m.csv)

It seems the links are mapped to the monthly values.

Dara wrote:

> Daniel sent a csv file which had the links month-matched; I used that. Otherwise interlacing and filling the gaps with 0s to make the input vector lengths constant would be a bad idea as well.

Okay, that's good. Filling the gaps with 0s would certainly be a bad idea.

I'm curious how Daniel made his file. He's a smart guy, so I'm confident he did the right thing, but it's an annoying technical problem, combining monthly data with 10-day data. As a pure mathematician I've never had to face this kind of problem!

Does anyone know an online program that can answer questions like this:

**What day is 10490 days after January 1st, 1948?**

I could use it to check Daniel's work. I could figure out the answer myself, of course, but I'm lazy!

Anyone who knows R a bit better than me could modify Graham's program so that it computes the link strengths _monthly_. Even I, with my poor programming skills, might be able to do it. I think that would be the really nice way to solve this problem.
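For what it's worth, this kind of date arithmetic is a one-liner in Python's standard library (a quick check, not an online program):

```python
from datetime import date, timedelta

# What day is 10490 days after January 1st, 1948?
d = date(1948, 1, 1) + timedelta(days=10490)
print(d)  # 1976-09-20
```

Note that if the convention is instead that 1948-01-01 itself counts as "day 1", then "day 10490" is one day earlier.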

> Anyone who knows R a bit better than me could modify Graham's program so that it computes the link strengths monthly.

I do not mind doing it; sadly I program well in R ... one of those unwanted skills, so you are not missing much ;)

I will try to figure that out if you need me to, but let's do that after we try something with Daniel's file.

John, if I used Daniel's file, then would the interlacing I proposed work?

anom1, link1, anom2, link2, ...
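The interlacing itself is straightforward once the two series have the same (monthly) length. A minimal numpy sketch, with made-up numbers standing in for the real anomaly and link-strength series:

```python
import numpy as np

anom = np.array([0.1, -0.2, 0.3])   # hypothetical monthly Nino 3.4 anomalies
link = np.array([5.0, 6.0, 7.0])    # hypothetical monthly link strengths

# anom1, link1, anom2, link2, ...
interlaced = np.column_stack([anom, link]).ravel()
```

`column_stack` pairs the two series row by row, and `ravel` flattens in row-major order, producing exactly the alternating pattern proposed above.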

Continuing my talk draft:

7) Recently people have begun to study teleconnections using ideas from complex network theory.

What's that? In complex network theory, people often start with a **weighted graph**: that is, a set $N$ of **nodes** and, for any pair of nodes $i, j \in N$, a **weight** $A_{i j}$, which can be any nonnegative real number.

Why is this called a weighted graph? It's really just a matrix of nonnegative real numbers!

The reason is that we can turn any weighted graph into a graph by drawing an edge from node $j$ to node $i$ whenever $A_{i j} > 0$. This is a **directed** graph, meaning that we should draw an arrow pointing from $j$ to $i$. We could have an edge from $i$ to $j$ but not vice versa! Note that we can also have an edge from a node to itself.

Conversely, if we have any directed graph, we can turn it into a weighted graph by choosing the weight $A_{i j} = 1$ when there's an edge from $j$ to $i$, and $A_{i j} = 0$ otherwise.

For example, we can make a weighted graph where the nodes are web pages and $A_{i j}$ is the number of links from the web page $j$ to the web page $i$.

People in complex network theory like examples of this sort: large weighted graphs that describe connections between web pages, or people, or cities, or neurons, or other things. The goal, so far, is to compute numbers from weighted graphs in ways that describe interesting properties of these complex networks --- and then formulate and test hypotheses about the complex networks we see in real life.
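The two translations above can be sketched in a few lines of Python (a toy matrix of my own, using the same convention: $A_{i j}$ is the weight of the edge from $j$ to $i$):

```python
import numpy as np

# A[i, j] = weight of the edge from node j to node i.
A = np.array([[0.0, 2.5, 0.0],
              [0.0, 0.0, 1.0],
              [4.0, 0.0, 0.5]])

# Weighted graph -> directed graph: an edge j -> i wherever A[i, j] > 0.
edges = [(j, i) for i in range(3) for j in range(3) if A[i, j] > 0]

# Directed graph -> weighted graph with 0/1 weights.
B = np.zeros_like(A)
for j, i in edges:
    B[i, j] = 1.0
```

Note the entry `A[2, 2] = 0.5` produces a self-loop at node 2, as the text allows.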

> Does anyone know an online program that can answer questions like this: What day is 10490 days after January 1st, 1948?

The Modified Julian Day is the best bet: <http://en.wikipedia.org/wiki/Julian_day>

This starts from Nov 17, 1858 and keeps a running count of the number of days elapsed. It is synchronized with UTC, Coordinated Universal Time (previously GMT). This only loses leap seconds, which you can get from GPS time.

Here is a converter from UTC to MJD: <http://www.csgnetwork.com/julianmodifdateconv.html>

This is a great little online utility for keeping track of the different clock schemes: <http://www.leapsecond.com/java/gpsclock.htm>

The actual algorithm isn't that difficult, and one can use these online programs to verify your code.
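For a date-only calculation, the MJD is just a day count from the epoch 1858-11-17 (MJD 0), so a sketch in Python is tiny, ignoring the sub-day and leap-second subtleties mentioned above:

```python
from datetime import date

MJD_EPOCH = date(1858, 11, 17)  # MJD 0

def mjd(d):
    """Modified Julian Day number of a calendar date (whole days)."""
    return (d - MJD_EPOCH).days

print(mjd(date(2000, 1, 1)))  # 51544
```

One can check this against the converters above: 2000-01-01 is MJD 51544.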

Hello John, regarding #78 in this thread: I could make you CDFs with all the math notation and sample such graphs and their matrix reps and so on, to demo on the screen or offer as download candy to the listeners.

7b) Here's a very simple example of what we can do with a weighted graph. For any node $i$, we can sum up the weights of the edges going into $i$:

$$ \sum_{j \in N} A_{i j} $$

This is called the **degree** of the node $i$. For example, if lots of people have web pages with lots of links to yours, your webpage will have a high degree. If lots of people like you on Facebook, _you_ will have a high degree.

This is some measure of how "important" a node is.

People have constructed climate networks where the nodes are locations on the Earth's surface, and the weight $A_{i j}$ measures how correlated the weather is at the $i$th and $j$th locations. Then the degree says how "important" a given location is for the Earth's climate - in some sense or other.
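In code, the degree is just a row sum of the weight matrix, keeping the convention from section 7 that $A_{i j}$ is the weight of the edge from $j$ into $i$ (toy numbers of my own):

```python
import numpy as np

A = np.array([[0.0, 1.0, 2.0],
              [0.0, 0.0, 0.0],
              [3.0, 0.0, 0.0]])  # A[i, j] = weight of the edge from j into i

degree = A.sum(axis=1)  # total weight flowing into each node
print(degree)  # [3. 0. 3.]
```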

John said ([#75](http://forum.azimuthproject.org/discussion/1523/crunch-time/?Focus=13541#Comment_13541)):

> I'm curious how Daniel made his file.

I took a fairly rough-and-ready, but I think accurate enough, approach. The code is in the link strength analysis notebook. I described the algorithm in a fair amount of detail [here](http://forum.azimuthproject.org/discussion/1528/random-forest-el-nino-34-offtheshelf/?Focus=13445#Comment_13445).

To avoid people having to hop around, I will just reproduce it here:

***

Since there are more 10-day cycles than months, the main idea is to group the 10-day periods into "months" and take the average of the values that fall into the same month; since 10-day periods do not exactly line up with months, the grouping has to be somewhat approximate anyway.

The logic is that there are 73 10-day periods in 2 non-leap years, or 24 months. This gives 24 30-day "months" with a 10-day period left over. So I take the 24 "months", match those with calendar months, and drop the remaining 10-day period. This means that my "months" drift in and out of phase with calendar months, but by no more than half a month, i.e. half the resulting sampling period.

There is also the issue of leap years: both the above approximation and John's algorithm pretend leap years do not exist. This means there could be a cumulative drift, but I think that both algorithms ignore leap years in a way that happens to cancel out, since John's algorithm is actually producing exactly 73 periods every 2 years whether there is a leap year or not, and my 30-day "months" then become 31-day months if they include Feb 29. So all that happens is the drifting in and out of phase between the 30-day months and calendar months, which resyncs every 2 years.

The code that does this is below.

Drop every 73rd period:

    linktmp = link[arange(len(link)) % 73 != 0]

Group into 3s and average:

    ll = len(linktmp)
    print ll
    linkmnth = linktmp.groupby(arange(ll)//3).mean()
    linkmnth.reset_index()
    linkx = linkmnth

Only keep data for the intersection of the data sets:

    df = nino34.iloc[:len(linkx)]
    df.index = range(df.shape[0])
    print ll, len(linkx), df.shape
    df["link"] = linkmnth

The code is implicitly assuming that the link strength data and the nino34 data start at the same time, the beginning of January 1950.
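As an aside, if the 10-day series is given a proper date index, the monthly averaging can also be sketched with pandas' `resample`. This is an illustrative alternative using fake data, not the notebook code above, and it groups by true calendar months rather than Daniel's 30-day "months":

```python
import numpy as np
import pandas as pd

# Fake 10-day link-strength series: 73 periods starting 1950-01-01.
rng = np.random.default_rng(0)
link = pd.Series(rng.random(73),
                 index=pd.date_range("1950-01-01", periods=73, freq="10D"))

# Average all 10-day samples whose start date falls in the same calendar month.
monthly = link.resample("MS").mean()
print(len(monthly))  # 24 months: Jan 1950 .. Dec 1951
```

The trade-off is the one Daniel describes: a 10-day sample that straddles a month boundary gets assigned entirely to the month containing its start date.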

I think Daniel's approach is sound, at least for the level of work we are doing. If John agrees, I will use his csv file; I already loaded and tested it with k-NN last night, could finish testing tonight and post, then move on to using it with SVR and NN.

John, please let me know if using Daniel's version of the data is what you want, and if you agree with that data being interlaced with anom values as I proposed. If so, let me know if interlacing with 0 makes sense as the test for no link data, in order to compare apples with apples.

Hello John

Waiting for your response on the interlacing idea based upon Daniel's csv.

John, I called on a top Mathematica developer, the guy who developed the Wavelet package for Wolfram, to review my code for the stuff you are presenting. I will probably release some code to him tonight, and I will report back if he sees problems.

I also asked him to develop other algorithms, possibly another decision-tree machine learning algorithm, to classify the samples and use for forecasting.

Dara - sorry to keep you waiting... yes, I'm happy enough with Daniel's file of data, and how he created it. Feel free to go ahead and compute with it! Thanks!

And thanks, Daniel, for re-explaining how you created that file. Nice.

(There are days when I need to focus on teaching, then days when I get to focus on the Forum. Yesterday was a teaching day.)


Hello John

Thanx, I will go ahead and apply them to all the algorithms I already did. We could always change the merging of the input data; after all, it is called software for a reason :)

Continuing with my talk draft:

8) For example, in [Complex networks in climate dynamics](https://www.pik-potsdam.de/members/kurths/publikationen/2009/complex-networks.pdf), Donges _et al_ take surface air temperature data on a grid and compute the Pearson correlation between grid points.

More precisely, let $T_i(t)$ be the temperature at the $i$th grid point at month $t$, after the average for that month in all years has been subtracted off, to eliminate some seasonal variations. The Pearson correlation is then

$$ \Big\langle \left( T_i - \langle T_i \rangle \right)\left( T_j - \langle T_j \rangle \right) \Big\rangle $$

divided by

$$ \sqrt{\Big\langle (T_i - \langle T_i \rangle)^2 \Big\rangle} \; \sqrt{\Big\langle (T_j - \langle T_j \rangle)^2 \Big\rangle} $$
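Spelled out with numpy, the quantity above is exactly what `np.corrcoef` computes (toy series of my own standing in for the temperature data):

```python
import numpy as np

rng = np.random.default_rng(1)
Ti = rng.normal(size=500)
Tj = 0.7 * Ti + rng.normal(size=500)  # correlated toy "temperatures"

# Covariance divided by the product of standard deviations:
num = np.mean((Ti - Ti.mean()) * (Tj - Tj.mean()))
den = np.sqrt(np.mean((Ti - Ti.mean()) ** 2)) * np.sqrt(np.mean((Tj - Tj.mean()) ** 2))
r = num / den

assert np.isclose(r, np.corrcoef(Ti, Tj)[0, 1])
```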

John, the person who wrote the Wavelet library for Wolfram agreed to review my code, so I do not make any serious mistakes that might embarrass you!

Also, I took some interim computations out of the algorithm, and you could use them to explain how the algorithm works, since most people cut and paste the Wiki of machine learning algorithms and do not have a full understanding of them.

I will try to have everything done by Sunday.

Thanks!!!!!


Continuing with my talk draft:

8) For example, in [Complex networks in climate dynamics](https://www.pik-potsdam.de/members/kurths/publikationen/2009/complex-networks.pdf), Donges _et al_ take surface air temperature data on a grid and compute the Pearson correlation between grid points.

More precisely, let $T_i(t)$ be the temperature at the $i$th grid point at month $t$, after the average for that month in all years has been subtracted off, to eliminate some seasonal variations. They compute the Pearson correlation of $T_i(t)$ and $T_j(t)$ for each pair of grid points $i, j$. The Pearson correlation is the simplest measure of linear correlation, normalized to range between -1 and 1.

We could construct a weighted graph $A_{i j}$ this way, and it would be symmetric, or undirected:

$$ A_{i j} = A_{j i} $$

However, Donges _et al_ prefer to work with a graph rather than a weighted graph. So, they create a graph where there is an edge from $i$ to $j$ (and also from $j$ to $i$) when $|A_{i j}|$ exceeds a certain threshold, and no edge otherwise. They can adjust this threshold so that any desired fraction of pairs $i, j$ actually have an edge between them. They choose this fraction to be 0.5%.

Then, they plot the degree of each node, weighted by the area of its grid square:

<img width = "500" src = "http://math.ucr.edu/home/baez/climate_networks/area_weighted_connectivity_pearson_correlation_sea_surface_air_temperature_donges.jpg" alt = ""/>

A certain patch dominates the world! This is the **El Niño basin**. The Indian Ocean comes in second.

(Some details, which I may not say:

The **[Pearson correlation](https://en.wikipedia.org/w/index.php?title=Pearson_product-moment_correlation_coefficient)** is the **covariance**

$$ \Big\langle \left( T_i - \langle T_i \rangle \right) \left( T_j - \langle T_j \rangle \right) \Big\rangle $$

normalized by dividing by the standard deviation of $T_i$ and the standard deviation of $T_j$.

The reddest shade of red in this picture shows nodes that are connected to 5% or more of the other nodes. That is, these nodes are connected to at least 10 times as many nodes as average.)
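The thresholding step can be sketched like this: pick the cutoff as a quantile of the off-diagonal correlations, so that the desired fraction of pairs (0.5%) is linked. Random toy data stands in for the temperature field here; this is my own illustration, not the paper's code:

```python
import numpy as np

rng = np.random.default_rng(2)
T = rng.normal(size=(50, 200))   # 50 toy grid points, 200 months each
A = np.corrcoef(T)               # pairwise Pearson correlations

# Threshold chosen so that ~0.5% of node pairs get an edge.
iu = np.triu_indices_from(A, k=1)
thresh = np.quantile(np.abs(A[iu]), 1 - 0.005)
edges = np.abs(A) > thresh
np.fill_diagonal(edges, False)   # no self-links
degree = edges.sum(axis=1)
```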

9) The Pearson correlation detects linear correlations. A more flexible measure is mutual information: how many bits of information knowing the temperature at time $t$ at grid point $i$ tells you about the temperature at the same time at grid point $j$.

Donges _et al_ create a climate network this way as well, putting an edge between nodes if their mutual information exceeds a certain cutoff. They choose this cutoff so that 0.5% of node pairs have an edge between them, and get the following map:

<img width = "500" src = "http://math.ucr.edu/home/baez/climate_networks/area_weighted_connectivity_mutual_information_sea_surface_air_temperature_donges.jpg" alt =""/>

The result is almost indistinguishable in the El Niño basin. So, this feature is not just an artifact of focusing on linear correlations.
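A crude histogram-based estimator shows the idea. This is only a sketch of the general technique; Donges et al describe their own estimator and binning choices in the paper:

```python
import numpy as np

def mutual_information(x, y, bins=16):
    """Crude plug-in estimate of I(X;Y) in bits from a 2-D histogram."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy /= pxy.sum()                      # joint distribution over bins
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x
    py = pxy.sum(axis=0, keepdims=True)   # marginal of y
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])))

rng = np.random.default_rng(3)
x = rng.normal(size=2000)
# A signal shares much more information with itself than with noise:
assert mutual_information(x, x) > mutual_information(x, rng.normal(size=2000))
```

Note that plug-in estimates like this are biased upward for independent signals, which is one reason careful binning matters in the literature.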

John, if you have those heatmaps time-sliced, I could make you animations to show the time evolution of the weather system.

Hello John

This is the website of the review engineer:

[Mathematica for prediction algorithms](http://mathematicaforprediction.wordpress.com/)

His name is Anton Antonov; he used to work for Wolfram Research, and he is a great person. I am trying to get him more involved; he likes the stuff we are doing here.

So there are no hotshots: we review each other's code and try to find the bugs early on. In the eyes of the bugs, programmers are equal brothers.

Dara

Thanks, Dara! I don't have these heatmaps time-sliced; I got them out of the paper by Donges _et al_.

By the way, after I suffer through this talk in December, I will probably want to keep reusing and _improving_ this talk. For example, this spring I'm supposed to give a talk on Skype at York University... and I'll probably talk about climate networks.

So, work we do now which doesn't appear in my December talk is not wasted! It may even turn into a paper or two someday.

Continuing with my talk draft:

10) We can also look at how climate networks change with time - in particular, how they are affected by El Niños. This is the subject of a 2008 paper by Tsonis and Swanson, [Topology and predictability of El Niño and La Niña networks](https://pantherfile.uwm.edu/aatsonis/www/publications/2008-06_Tsonis-AA_TopologyandPredictabilityofElNinoandLaNinaNetworks-2.pdf).

They create a climate network in a way that's similar to the one I just described, except they:

1. Use surface air temperature data from a different source: the National Center for Environmental Prediction.

2. Use data from 1950 to 2005, but only data from November to March in each year, claiming that summertime introduces spurious links.

3. Separately create climate networks for El Niño and La Niña time periods (defined as those where some "normalized ENSO index", with standard deviation 1, is either $\gt 1$ or $\le -1$).

4. Create a link between grid points when their Pearson correlation has absolute value greater than $0.5$.

They get this map for La Niña conditions:

<img width = "500" src = "http://math.ucr.edu/home/baez/climate_networks/climate_backbone_la_nina_tsonis.jpg" alt = ""/>

and this map for El Niño conditions:

<img width = "500" src = "http://math.ucr.edu/home/baez/climate_networks/climate_backbone_el_nino_tsonis.jpg" alt = ""/>

They conclude that "El Niño breaks climate links". That is, it generally reduces correlations between temperatures at different points worldwide.

11) It's interesting that while the El Niño basin is the region with the most climate links, the El Niño phenomenon _breaks_ climate links. It's not a contradiction. It simply means that the El Niño basin has a lot of highly correlated temperatures at different locations... but when there's an El Niño, the temperature field of the whole planet becomes more "erratic", with less correlation at different points.

Here's another figure from Tsonis and Swanson's paper, illustrating this:

<img src = "http://math.ucr.edu/home/baez/climate_networks/degree_distribution_tsonis.jpg" alt = ""/>

Hello John

Let me make some CDFs for you for educational purposes that actually teach something using actual data. It would be something the students download for free; it would enhance your presentation and, most importantly, have high educational value, especially with the machine learning approach and other related math.


12) There's much more to say, but I'll focus on one tantalizing idea: that we might use climate networks to _predict_ El Niños. This would be great, because El Niños have huge effects on agriculture, especially around the Pacific ocean.

It's generally regarded as very hard to predict El Niños more than 6 months in advance. There is also a **spring barrier**: it's harder to predict El Niños through the spring of any year. It's controversial how much of the unpredictability in the ENSO cycle is due to chaos intrinsic to the Pacific ocean system, and how much is due to noise from outside the system. Both may be involved.

There are many teams trying to predict El Niños, some using physical models of the Earth's climate, and others using machine learning techniques. There is a kind of competition going on, which you can see at a [National Oceanic and Atmospheric Administration website](http://www.pmel.noaa.gov/tao/elnino/forecasts.html).

The most recent predictions give a sense of how hard this job is:

<img src = "http://math.ucr.edu/home/baez/climate_networks/2014-10-16-Nino3.4-predictions.jpg" alt = ""/>

The Niño 3.4 index is the mean temperature of a certain patch of the Pacific ocean, minus its average for that time of year. When its 3-month running average exceeds 0.5°C for 5 months, people say there's an **El Niño**.

As you can see, it's hard to be sure if there will be an El Niño early next year! However, the consensus forecast is _yes, a weak El Niño_. This is the best we can do now. Right now multi-model ensembles have better predictive skill than any one model.
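The El Niño rule just stated (3-month running average of the Niño 3.4 anomaly above 0.5°C for 5 consecutive months) is mechanical enough to express in code. This is a hedged sketch with a hypothetical helper and made-up anomaly values, not NOAA's official index computation:

```python
import numpy as np

def is_el_nino(nino34_anomaly):
    """Return True if the 3-month running mean of the Niño 3.4 anomaly
    (in °C) exceeds 0.5 for at least 5 consecutive months."""
    a = np.asarray(nino34_anomaly, dtype=float)
    # 3-month running mean; 'valid' drops the incomplete windows at the ends.
    running = np.convolve(a, np.ones(3) / 3, mode="valid")
    above = running > 0.5
    # Look for a run of 5 consecutive months above the threshold.
    run = 0
    for flag in above:
        run = run + 1 if flag else 0
        if run >= 5:
            return True
    return False

# Illustrative (fabricated) anomaly sequences:
print(is_el_nino([0.6, 0.7, 0.8, 0.9, 1.0, 0.9, 0.8, 0.2]))   # warm episode: True
print(is_el_nino([0.1, 0.2, 0.3, 0.2, 0.1, 0.0, -0.1, 0.0]))  # neutral: False
```

A real implementation would also handle the seasonal climatology used to compute the anomaly in the first place; here the anomalies are taken as given.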

Dara wrote:

That sounds great, though I don't know what a CDF is.

(In probability theory, it's a "cumulative distribution function". I hate acronyms!)

I won't be very interested in this until after my talk. But after that I will be. Let's talk about it more then!


After your talk, John.
