
Experiments in El Niño analysis and prediction

This page contains a digest of all the relevant material from the thread on Azimuth initiatives.

EDIT: renamed the page, due to a trailing space which couldn't be removed from the old page name.

It contains the references, along with John's suggestion for how we can get active with the programming for it -- with possible applications to new research.

I'm using "Project -" to indicate an activity page, which can be used to organize and focus our efforts. Such a page shouldn't contain get bogged down with a lot of tutorial material -- only the material which specifically pertains to that project. That's why I created the separate Climate network model for tutorial information on climate networks, and their applications to El Niño signal processing. On this page we want to focus on the how to get things done.


Comments

  • 1.
    edited June 2014

    We had been using "Experiments in" rather than "Project - " for similar initiatives - see Experiments. I will at least put this new page in the category "Experiments".

    But I have something more interesting to say...

    Comment Source:We had been using "Experiments in" rather than "Project - " for similar initiatives - see [[Experiments]]. I will at least put this new page in the category "Experiments". But I have something more interesting to say...
  • 2.
    edited June 2014

    Here's my new idea: Azimuth Code Games.

    It's really just a different way to think about the Azimuth Code Project. We can play some "code games". We can pose challenges like this on the Azimuth Blog, and provide links to the necessary data sets:

    1. Can you detect correlations between detrended worldwide average temperature data and the Southern Oscillation Index? This is a way to ask if El Niño really affects global warming. Graham has already started playing this game on the blog, and I responded... but there's more to do here.

    2. Can you tell whether greenhouse gas concentrations are correlated to El Niños and/or detrended worldwide average temperature data? Nad has already been playing similar games here and on her blog... but her work hasn't been adequately publicized, and there's more to do.

    Then people can take the software written to solve these puzzles and stick it on github.

    Note that the above two games are just about analyzing a few time series: they're easier than full-fledged network analysis, where we have lots of time series. So, they're good places to start.
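
    As a very rough sketch of what game 1 might boil down to in R -- assuming yearly numeric vectors gtemp and soi of global mean temperature and Southern Oscillation Index values over the same years, which are not provided here, and taking "detrending" to mean just removing a linear trend:

    detrend <- function(x) residuals(lm(x ~ seq_along(x)))  # strip a linear trend
    cor(detrend(gtemp), detrend(soi))                        # correlation of the detrended series
    cor.test(detrend(gtemp), detrend(soi))                   # same, with a significance test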

    Part of the "game" aspect could be can you teach a mathematician like John Baez how to write a program in R that does this data analysis? I think some people might get into that. Maybe even me.

    Calling these small coding projects "games" - or at least treating them as fun rather than profoundly serious and demanding that everything be breathtakingly original - may also help lure non-Azimuthers into helping out. Lots of people like to program.

    Comment Source:Here's my new idea: **Azimuth Code Games**. It's really just a different way to think about the Azimuth Code Project. We can play some "code games". We can pose challenges like this on the Azimuth Blog, and provide links to the necessary data sets: 1. Can you detect correlations between detrended worldwide average temperature data and the Southern Oscillation Index? This is a way to ask if El Niño really affects global warming. Graham has already started playing this game [on the blog](http://johncarlosbaez.wordpress.com/2014/06/05/warming-slowdown-part-2/#comment-47811), and I responded... but there's more to do here. 1. Can you tell whether greenhouse gas concentrations are correlated to El Niños and/or detrended worldwide average temperature data? Nad has already been playing similar games [here](http://www.azimuthproject.org/azimuth/show/Does+global+warming+lag+or+lead+a+rise+in+greenhouse+gas+concentration%3F) and on her blog... but her work hasn't been adequately publicized, and there's more to do. Then people can take the software written to solve these puzzles and stick it on github. Note that the above two games are just about analyzing a few time series: they're easier than full-fledged network analysis, where we have _lots_ of time series. So, they're good places to start. Part of the "game" aspect could be _can you teach a mathematician like John Baez how to write a program in R that does this data analysis?_ I think some people might get into that. Maybe even me. Calling these small coding projects "games" - or at least treating them as fun rather than profoundly serious and demanding that everything be breathtakingly original - may also help lure non-Azimuthers into helping out. Lots of people like to program.
  • 3.
    nad
    edited June 2014

    Calling these small coding projects “games” - or at least treating them as fun rather than profoundly serious and demanding that everything be breathtakingly original - may also help lure non-Azimuthers into helping out. Lots of people like to program.

    I am not sure what you mean here. Calculating correlations and other statistics is a way of enhancing human sensors, just as a tomography may tell you more about the inside state of a sick belly than a bulge does, or a QNH more about the height of an aircraft than a look out of the window. One can of course let people make those interpretations of "tomographies" in a more organized, game-like setting, if this is what you meant? Actually I think there are some medical games like that for real tomographies. However, I am not really sure whether organizing such games is really interesting for Azimuth, since it seems to be a lot of work to prepare such a game. In a game you usually get feedback on whether you were right or wrong, and there are usually no other big consequences if you are wrong.

    Nad has already been playing similar games here and on her blog… but her work hasn’t been adequately publicized, and there’s more to do.

    I wouldn't call that a game, although I currently do indeed quite a bit of guesswork there. That is, I try to collect information there and talk about my suspicions, and I don't think that it is currently at a stage to be widely publicized - one would need to do a lot more work to back up the suspicions. I should however also point out that I usually do not publicize all of my suspicions, but only those that appear to me to be in some sense relevant. And if I am totally off, then this of course calls into question my ability to judge such things, which could have professional consequences. Just as if a surgeon were to play the above-mentioned games of interpreting tomographies in public - for the surgeon it would not really be a "game", or only to some extent.

    Comment Source:>Calling these small coding projects “games” - or at least treating them as fun rather than profoundly serious and demanding that everything be breathtakingly original - may also help lure non-Azimuthers into helping out. Lots of people like to program. I am not sure what you mean here. Calculating correlations and other statistics is a way of enhancing human sensors, like a tomography may tell you more about the inside state of a sick belly than a bulge or a <a href="http://en.wikipedia.org/wiki/QNH">QNH</a> more about the height of an aircraft than a look out of the window. One can of course let people make those interpretations of "tomographies" in a more organized game-like setting, if this is what you meant ? Actually I think there some medicine games like that for real tomographies. However I am not really sure wether organizing such games is really interesting for Azimuth since it seems to be a lot of work to prepare such a game. In a game you get usually a feedback of wether you were right or wrong and there are usually no other big consequences if you are wrong. > Nad has already been playing similar games here and on her blog… but her work hasn’t been adequately publicized, and there’s more to do. I wouldn't call that a game, although I currently do indeed quite some guess work there. That is I try to collect information there and talk about my suspicions and I don't think that it is currently in a stage to be widely publicized - one would need to do a lot more work to undermine the suspicions. I should however also point out that I usually do not publicise all of my suspicions, but only those who appear to me to be in some sense relevant. And if I am totally off than this of course questions my abilities to judge about those things, which could have professional consequences. Just like if a surgeon would do that above mentioned games of interpreting tomographies in public - for the surgeon it would not really be a "game" or only to some extend.
  • 4.

    I also am having trouble seeing it in terms of a game, but I like the spirit of your idea, in terms of finding small puzzles and challenges that people could work on and share -- a presentation with a lighter touch, that has a fun sound.

    But however you look at it, we ourselves do have serious work to do in order to formulate the problems, obtain the data, and write the presentations -- whether they be blogs, coding challenges, or data puzzles. So I would see, e.g., puzzles as being a form of presentation of the Azimuth Code Project.

    Comment Source:I also am having trouble seeing it in terms of a game, but I like the spirit of your idea, in terms of finding small puzzles and challenges that people could work on and share -- a presentation with a lighter touch, that has a fun sound. But however you look at it we ourselves do have serious work to do in order to formulate the problems, obtain the data, and write the presentations -- whether they be blogs, coding challenges, or data puzzles. So I would see it as e.g. puzzles being form of presentation of the Azimuth Code project.
  • 5.

    By the way, I myself am not frustrated or discouraged with the Azimuth Code Project. As I see it, what happened is that we underestimated what it would take for such a project to engage more people. It's not enough to just have a few of us be enthusiastic, and build some infrastructure. Some of these issues can be seen in business software development environments. It's not enough for management to set some general goals, buy some machines, and then hire some programmers. The programmers are not domain experts, and will flounder around trying to see how to bridge the huge language gap between the business logic and the programming languages. For us it's compounded by the fact that we're not even paying the programmers, so rather than flapping around in the room trying to figure out what to do, the volunteer programmers will just wander away.

    That is why there is a whole field of business analysis, whose importance is generally underestimated. They perform the critical translation process needed to activate the team as a whole.

    We ourselves -- or at least me, for now -- are beginning this "BA" process by writing blogs and wiki articles to translate from the language of the science into programmer-ese.

    Comment Source:By the way I myself am not frustrated or discouraged with the Azimuth Code project. As I see it what happened is that it was underestimated what it would take for for such a project to engage more people. It's not enough to just have a few of us be enthusiastic, and build some infrastructure. Some of these issues can be seen in business software development environment. It's not enough for management to set some general goals, buy some machines, and then hire some programmers. The programmers are not domain experts, and will flounder around trying to see how to bridge the huge language gap between that of the business logic and the programming languages. For us its compounded by the fact that we not even paying the programmers, so rather than the programmers flapping around in the room trying to figure out what to do, the volunteer programmers will just wander away. That is why there is a whole field of _business analysis_, whose importance is generally underestimated. They perform the critical translation process needed to activate the team as a whole. We ourselves -- or least me for now -- are beginning this "BA" process by writing blogs and wiki articles to translate from the science language to the programmer-ese.
  • 6.
    edited June 2014

    John, your El Niño inspiration was a great advance. It's something people care about, it's current, it involves programming, and it is approachable.

    Let's follow through with this. When we have our stuff a bit more together, we can think about how to present it.

    The lighter touch could appeal to some wider audiences. But I'm also confident that there are a lot of serious programmers who would be glad to contribute to our Azimuth Code Project (I have a few in mind already, alumni from college and grad school), once we do our science "business analysis" and have some written materials to engage them. I think the Wiki articles, e.g. on climate networks, will be very important, to clue people in to what this fascinating stuff is all about. We should get to the point where we clearly spell out the algorithms involved.

    Comment Source:John, your El Niño inspiration was a great advance. It's something people care about, it's current, it involves programming, and it is approachable. Let's follow through with this. When we have our stuff a bit more together, we can think about how to present it. The lighter touch could appeal to some wider audiences. But I'm also confident that there are alot of serious programmers who would be glad to contribute to our Azimuth Code Project (I have a few in mind already, alumni from college and grad school), once we do our science "business analysis" and have some written materials to engage them. I think the Wiki articles e.g. on climate networks will be very important, to clue people into what this fascinating stuff is all about. We should get to the point where we clearly spell out the algorithms involved.
  • 7.

    John wrote:

    We would first need to get ahold of daily temperature data for "14 grid points in the El Niño basin and 193 grid points outside this domain" from 1981 to 2014. That's 207 locations and 34 years. This data is supposedly available from the National Centers for Environmental Prediction and the National Center for Atmospheric Research Reanalysis I Project. The paper starts by taking these temperatures, computing the average temperature at each day of the year at each location, and subtracting this from the actual temperatures to obtain "temperature anomalies". In other words, we want a big array of numbers like this: the temperature on March 21st 1990 at some location, minus the average temperature on all March 21sts from 1981 to 2014 at that location. Then they process this array of numbers in various ways, which I can explain... They consider all pairs of locations, so at some point they are working with 207 × 207 × 365 × 34 numbers. Is that a lot of numbers these days?

    No, not a lot of numbers. It's on the order of 500 million, which times four bytes for a float is only about 2 gigabytes. Memory spaces these days are measured in gigabytes.
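
    A quick back-of-the-envelope check in R, taking the 207 × 207 × 365 × 34 figure quoted above at face value:

    n <- 207 * 207 * 365 * 34   # about 5.3e8 numbers
    n * 4 / 2^30                # roughly 2 gigabytes at four bytes per number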

    Can anyone here look into getting the data?

    Comment Source:John wrote: > We would first need to get ahold of daily temperature data for "14 grid points in the El Niño basin and 193 grid points outside this domain" from 1981 to 2014. That's 207 locations and 34 years. This data is supposedly available from the National Centers for Environmental Prediction and the National Center for Atmospheric Research Reanalysis I Project. The paper starts by taking these temperatures, computing the average temperature at each day of the year at each location, and subtracting this from the actual temperatures to obtain "temperature anomalies". In other words, we want a big array of numbers like this: the temperature on March 21st 1990 at some location, minus the average temperature on all March 21sts from 1981 to 2014 at that location. Then they process this array of numbers in various ways, which I can explain... They consider all _pairs_ of locations, so at some point they are working with 207 &times; 207 &times; 365 &times; 34 numbers. Is that a lot of numbers these days? No, not a lot of numbers. It's on the order of 50 million, which times four bytes for a float is only 200 megabytes. Memory spaces are measured in the gigabytes. Can anyone here look into getting the data?
  • 8.

    Dave wrote:

    That is why there is a whole field of business analysis, whose importance is generally underestimated. They perform the critical translation process needed to activate the team as a whole.

    +2.

    Comment Source:Dave wrote: > That is why there is a whole field of business analysis, whose importance is generally underestimated. They perform the critical translation process needed to activate the team as a whole. +2.
  • 9.
    edited June 2014

    Nad wrote:

    One can of course let people make those interpretations of “tomographies” in a more organized game-like setting, if this is what you meant?

    David wrote:

    I also am having trouble seeing it in terms of a game, but I like the spirit of your idea, in terms of finding small puzzles and challenges that people could work on and share – a presentation with a lighter touch, that has a fun sound.

    I don't mean a game with rules, so maybe the word "game" is more confusing than helpful... I just meant what David said: putting puzzles and programming challenges on the blog and treating them more as "fun" than "a serious research project", at least until we get up to speed.

    The lighter touch could appeal to some wider audiences.

    I think what I have in mind will appeal to smart people whose idea of fun is writing a little program to answer a question. My main job will be to help come up with well-defined questions, explain them, and point them to the necessary databases. Some of you other folks can focus a bit more on the programming aspects - either writing programs, or talking to people about them.

    Let’s follow through with this. When we have our stuff a bit more together, we can think about how to present it.

    I definitely want to follow through with it. But I don't want to wait too long; since it's supposed to be fun I don't mind diving in and starting, and letting it become organized as time goes by. I can put some simple programming challenges on the blog as soon as people have had time to discuss Global Warming (Part 2).

    By the way, I hope everyone here comments on that post, since Jan Galkowski put a lot of work into it. It's a bit intimidating, but we should still ask lots of questions.

    Also by the way, he's the kind of guy who would be really great for this El Niño project, and I'm hoping he'll join the fun.

    WebHubTel is also good at this stuff.

    If Greg Egan ever gets interested in this stuff, we'll be in even better shape. He's a great programmer, really smart, and he's helped me out on dozens of projects. But so far he's mainly interested in solving problems related to general relativity, quantum mechanics and pure math.

    Comment Source:Nad wrote: > One can of course let people make those interpretations of “tomographies” in a more organized game-like setting, if this is what you meant? David wrote: > I also am having trouble seeing it in terms of a game, but I like the spirit of your idea, in terms of finding small puzzles and challenges that people could work on and share – a presentation with a lighter touch, that has a fun sound. I don't mean a game with rules, so maybe the word "game" is more confusing than helpful... I just meant what David said: putting puzzles and programming challenges on the blog and treating them more as "fun" than "a serious research project", at least until we get up to speed. > The lighter touch could appeal to some wider audiences. I think what I have in mind will appeal to smart people whose idea of fun is writing a little program to answer a question. My main job will be to help come up with well-defined questions, explain them, and point them to the necessary databases. Some of you other folks can focus a bit more on the programming aspects - either writing programs, or talking to people about them. > Let’s follow through with this. When we have our stuff a bit more together, we can think about how to present it. I definitely want to follow through with it. But I don't want to wait too long; since it's supposed to be fun I don't mind diving in and starting, and letting it become organized as time goes by. I can put some simple programming challenges on the blog as soon as people have had time to discuss [Global Warming (Part 2)](http://johncarlosbaez.wordpress.com/2014/06/05/warming-slowdown-part-2/). By the way, I hope everyone here comments on that post, since [[Jan Galkowski]] put a lot of work into it. It's a bit intimidating, but we should still ask lots of questions. Also by the way, he's the kind of guy who would be really great for this El Ni&ntilde;o project, and I'm hoping he'll join the fun. WebHubTel is also good at this stuff. If Greg Egan ever gets interested in this stuff, we'll be in even better shape. He's a great programmer, really smart, and he's helped me out on dozens of projects. But so far he's mainly interested in solving problems related to general relativity, quantum mechanics and pure math.
  • 10.
    edited June 2014

    David wrote:

    Can anyone here look into getting the data?

    They seem to say it's here:

    • NCEP/NCAR Reanalysis 1: Surface (http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanalysis.surface.html).

    More precisely, there's a bunch of files here containing worldwide daily average temperatures on a 2.5 degree latitude × 2.5 degree longitude grid (144 × 73 grid points), from 1948 to 2010. If you go here the website will help you get data from within a chosen rectangle in a grid, for a chosen time interval.
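
    As a sketch of how one might fetch the yearly files from within R -- assuming they follow the air.sig995.<year>.nc naming that shows up below, and with the directory URL left as a placeholder to be copied from the page linked above:

    base.url <- "<directory URL from the NCEP/NCAR Reanalysis page above>"  # placeholder, not checked
    for (year in 1948:2010) {
      fname <- paste0("air.sig995.", year, ".nc")
      download.file(paste0(base.url, fname), fname, mode = "wb")  # mode "wb": these are binary files
    }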

    Comment Source:David wrote: > Can anyone here look into getting the data? They seem to say it's here: * [NCEP/NCAR Reanalysis 1: Surface](http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanalysis.surface.html). More precisely, there's a bunch of files [here](http://www.esrl.noaa.gov/psd/cgi-bin/db_search/DBSearch.pl?Dataset=NCEP+Reanalysis+Daily+Averages+Surface+Level&Variable=Air+Temperature&group=0&submit=Search) containing worldwide daily average temperatures on a 2.5 degree latitude &times; 2.5 degree longitude grid (144 &times; 73 grid points), from 1948 to 2010. If you go [here](http://www.esrl.noaa.gov/psd/cgi-bin/DataAccess.pl?DB_dataset=NCEP+Reanalysis+Daily+Averages+Surface+Level&DB_variable=Air+temperature&DB_statistic=Mean&DB_tid=41710&DB_did=33&DB_vid=668) the website will help you get data from within a chosen rectangle in a grid, for a chosen time interval.
  • 11.

    Great, thanks. I'm looking into these netCDF files.

    Comment Source:Great, thanks. I'm looking into these netCDF files.
  • 12.

    There is an R package for using netCDF files. I've never used it.

    Comment Source:There is an [R package for using netCDF files](http://cran.r-project.org/web/packages/RNetCDF/index.html). I've never used it.
  • 13.
    edited June 2014

    Now I have used it:

    library(RNetCDF)
    
    onc <- open.nc("air.sig995.2013.nc")  # one year of daily surface air temperatures
    rnc <- read.nc(onc)                   # read all the variables into a list
    rnc[[1]]                              # latitudes
    rnc[[2]]                              # longitudes
    rnc[[3]]                              # times, in hours
    rnc[[4]][1:5, 1:5, 1:5]               # a small corner of the temperature array
    close.nc(onc)
    

    It produces the output below: the first two components look like the latitude and longitude grid points, the third is a list of times, and the fourth is a 3D array of temperatures.

    > rnc[[1]]
     [1]  90.0  87.5  85.0  82.5  80.0  77.5  75.0  72.5  70.0  67.5  65.0  62.5  60.0  57.5  55.0  52.5  50.0  47.5  45.0  42.5
    [21]  40.0  37.5  35.0  32.5  30.0  27.5  25.0  22.5  20.0  17.5  15.0  12.5  10.0   7.5   5.0   2.5   0.0  -2.5  -5.0  -7.5
    [41] -10.0 -12.5 -15.0 -17.5 -20.0 -22.5 -25.0 -27.5 -30.0 -32.5 -35.0 -37.5 -40.0 -42.5 -45.0 -47.5 -50.0 -52.5 -55.0 -57.5
    [61] -60.0 -62.5 -65.0 -67.5 -70.0 -72.5 -75.0 -77.5 -80.0 -82.5 -85.0 -87.5 -90.0
    
    > rnc[[2]]
      [1]   0.0   2.5   5.0   7.5  10.0  12.5  15.0  17.5  20.0  22.5  25.0  27.5  30.0  32.5  35.0  37.5  40.0  42.5  45.0  47.5
     [21]  50.0  52.5  55.0  57.5  60.0  62.5  65.0  67.5  70.0  72.5  75.0  77.5  80.0  82.5  85.0  87.5  90.0  92.5  95.0  97.5
     [41] 100.0 102.5 105.0 107.5 110.0 112.5 115.0 117.5 120.0 122.5 125.0 127.5 130.0 132.5 135.0 137.5 140.0 142.5 145.0 147.5
     [61] 150.0 152.5 155.0 157.5 160.0 162.5 165.0 167.5 170.0 172.5 175.0 177.5 180.0 182.5 185.0 187.5 190.0 192.5 195.0 197.5
     [81] 200.0 202.5 205.0 207.5 210.0 212.5 215.0 217.5 220.0 222.5 225.0 227.5 230.0 232.5 235.0 237.5 240.0 242.5 245.0 247.5
    [101] 250.0 252.5 255.0 257.5 260.0 262.5 265.0 267.5 270.0 272.5 275.0 277.5 280.0 282.5 285.0 287.5 290.0 292.5 295.0 297.5
    [121] 300.0 302.5 305.0 307.5 310.0 312.5 315.0 317.5 320.0 322.5 325.0 327.5 330.0 332.5 335.0 337.5 340.0 342.5 345.0 347.5
    [141] 350.0 352.5 355.0 357.5
    
    > rnc[[3]]
      [1] 17636880 17636904 17636928 17636952 17636976 17637000 17637024 17637048 17637072 17637096 17637120 17637144 17637168
     [14] 17637192 17637216 17637240 17637264 17637288 17637312 17637336 17637360 17637384 17637408 17637432 17637456 17637480
     [27] 17637504 17637528 17637552 17637576 17637600 17637624 17637648 17637672 17637696 17637720 17637744 17637768 17637792
     [40] 17637816 17637840 17637864 17637888 17637912 17637936 17637960 17637984 17638008 17638032 17638056 17638080 17638104
     [53] 17638128 17638152 17638176 17638200 17638224 17638248 17638272 17638296 17638320 17638344 17638368 17638392 17638416
     [66] 17638440 17638464 17638488 17638512 17638536 17638560 17638584 17638608 17638632 17638656 17638680 17638704 17638728
     [79] 17638752 17638776 17638800 17638824 17638848 17638872 17638896 17638920 17638944 17638968 17638992 17639016 17639040
     [92] 17639064 17639088 17639112 17639136 17639160 17639184 17639208 17639232 17639256 17639280 17639304 17639328 17639352
    [105] 17639376 17639400 17639424 17639448 17639472 17639496 17639520 17639544 17639568 17639592 17639616 17639640 17639664
    [118] 17639688 17639712 17639736 17639760 17639784 17639808 17639832 17639856 17639880 17639904 17639928 17639952 17639976
    [131] 17640000 17640024 17640048 17640072 17640096 17640120 17640144 17640168 17640192 17640216 17640240 17640264 17640288
    [144] 17640312 17640336 17640360 17640384 17640408 17640432 17640456 17640480 17640504 17640528 17640552 17640576 17640600
    [157] 17640624 17640648 17640672 17640696 17640720 17640744 17640768 17640792 17640816 17640840 17640864 17640888 17640912
    [170] 17640936 17640960 17640984 17641008 17641032 17641056 17641080 17641104 17641128 17641152 17641176 17641200 17641224
    [183] 17641248 17641272 17641296 17641320 17641344 17641368 17641392 17641416 17641440 17641464 17641488 17641512 17641536
    [196] 17641560 17641584 17641608 17641632 17641656 17641680 17641704 17641728 17641752 17641776 17641800 17641824 17641848
    [209] 17641872 17641896 17641920 17641944 17641968 17641992 17642016 17642040 17642064 17642088 17642112 17642136 17642160
    [222] 17642184 17642208 17642232 17642256 17642280 17642304 17642328 17642352 17642376 17642400 17642424 17642448 17642472
    [235] 17642496 17642520 17642544 17642568 17642592 17642616 17642640 17642664 17642688 17642712 17642736 17642760 17642784
    [248] 17642808 17642832 17642856 17642880 17642904 17642928 17642952 17642976 17643000 17643024 17643048 17643072 17643096
    [261] 17643120 17643144 17643168 17643192 17643216 17643240 17643264 17643288 17643312 17643336 17643360 17643384 17643408
    [274] 17643432 17643456 17643480 17643504 17643528 17643552 17643576 17643600 17643624 17643648 17643672 17643696 17643720
    [287] 17643744 17643768 17643792 17643816 17643840 17643864 17643888 17643912 17643936 17643960 17643984 17644008 17644032
    [300] 17644056 17644080 17644104 17644128 17644152 17644176 17644200 17644224 17644248 17644272 17644296 17644320 17644344
    [313] 17644368 17644392 17644416 17644440 17644464 17644488 17644512 17644536 17644560 17644584 17644608 17644632 17644656
    [326] 17644680 17644704 17644728 17644752 17644776 17644800 17644824 17644848 17644872 17644896 17644920 17644944 17644968
    [339] 17644992 17645016 17645040 17645064 17645088 17645112 17645136 17645160 17645184 17645208 17645232 17645256 17645280
    [352] 17645304 17645328 17645352 17645376 17645400 17645424 17645448 17645472 17645496 17645520 17645544 17645568 17645592
    [365] 17645616
    
    > rnc[[4]][1:5, 1:5, 1:5]
    , , 1
    
           [,1]   [,2]   [,3]   [,4]   [,5]
    [1,] 258.78 259.30 264.10 269.18 272.38
    [2,] 258.78 259.55 264.47 269.57 272.35
    [3,] 258.78 259.82 264.82 269.78 272.12
    [4,] 258.78 260.00 265.05 269.88 271.80
    [5,] 258.78 260.18 265.22 269.90 271.45
    
    , , 2
    
          [,1]   [,2]   [,3]   [,4]   [,5]
    [1,] 252.4 254.57 262.77 269.05 269.90
    [2,] 252.4 254.22 262.50 269.02 269.82
    [3,] 252.4 253.95 262.25 269.07 269.80
    [4,] 252.4 253.60 262.03 269.12 269.85
    [5,] 252.4 253.32 261.82 269.18 269.90
    
    , , 3
    
           [,1]   [,2]   [,3]   [,4]   [,5]
    [1,] 248.07 248.17 258.52 268.20 268.70
    [2,] 248.07 248.07 258.65 268.60 268.82
    [3,] 248.07 247.95 258.72 269.00 269.05
    [4,] 248.07 247.85 258.78 269.40 269.32
    [5,] 248.07 247.72 258.80 269.77 269.65
    
    , , 4
    
         [,1]   [,2]   [,3]   [,4]   [,5]
    [1,]  245 245.65 259.00 267.77 266.80
    [2,]  245 245.62 258.97 268.03 267.35
    [3,]  245 245.57 258.95 268.35 267.90
    [4,]  245 245.52 258.92 268.67 268.47
    [5,]  245 245.45 258.87 268.97 268.92
    
    , , 5
    
          [,1]   [,2]   [,3]   [,4]   [,5]
    [1,] 248.7 256.92 265.40 266.25 266.72
    [2,] 248.7 256.70 265.52 267.17 267.62
    [3,] 248.7 256.45 265.60 268.00 268.32
    [4,] 248.7 256.22 265.68 268.65 268.80
    [5,] 248.7 255.97 265.65 269.22 269.07
    
    
    
    
    
    
    
    Comment Source:Now I have used it: ~~~~ library(RNetCDF) onc <- open.nc("air.sig995.2013.nc") rnc <- read.nc(onc) rnc[[1]] rnc[[2]] rnc[[3]] rnc[[4]][1:5, 1:5, 1:5] close.nc(onc) ~~~~ It does this: looks like grid points, times, then a 3D array. ~~~~ > rnc[[1]] [1] 90.0 87.5 85.0 82.5 80.0 77.5 75.0 72.5 70.0 67.5 65.0 62.5 60.0 57.5 55.0 52.5 50.0 47.5 45.0 42.5 [21] 40.0 37.5 35.0 32.5 30.0 27.5 25.0 22.5 20.0 17.5 15.0 12.5 10.0 7.5 5.0 2.5 0.0 -2.5 -5.0 -7.5 [41] -10.0 -12.5 -15.0 -17.5 -20.0 -22.5 -25.0 -27.5 -30.0 -32.5 -35.0 -37.5 -40.0 -42.5 -45.0 -47.5 -50.0 -52.5 -55.0 -57.5 [61] -60.0 -62.5 -65.0 -67.5 -70.0 -72.5 -75.0 -77.5 -80.0 -82.5 -85.0 -87.5 -90.0 > rnc[[2]] [1] 0.0 2.5 5.0 7.5 10.0 12.5 15.0 17.5 20.0 22.5 25.0 27.5 30.0 32.5 35.0 37.5 40.0 42.5 45.0 47.5 [21] 50.0 52.5 55.0 57.5 60.0 62.5 65.0 67.5 70.0 72.5 75.0 77.5 80.0 82.5 85.0 87.5 90.0 92.5 95.0 97.5 [41] 100.0 102.5 105.0 107.5 110.0 112.5 115.0 117.5 120.0 122.5 125.0 127.5 130.0 132.5 135.0 137.5 140.0 142.5 145.0 147.5 [61] 150.0 152.5 155.0 157.5 160.0 162.5 165.0 167.5 170.0 172.5 175.0 177.5 180.0 182.5 185.0 187.5 190.0 192.5 195.0 197.5 [81] 200.0 202.5 205.0 207.5 210.0 212.5 215.0 217.5 220.0 222.5 225.0 227.5 230.0 232.5 235.0 237.5 240.0 242.5 245.0 247.5 [101] 250.0 252.5 255.0 257.5 260.0 262.5 265.0 267.5 270.0 272.5 275.0 277.5 280.0 282.5 285.0 287.5 290.0 292.5 295.0 297.5 [121] 300.0 302.5 305.0 307.5 310.0 312.5 315.0 317.5 320.0 322.5 325.0 327.5 330.0 332.5 335.0 337.5 340.0 342.5 345.0 347.5 [141] 350.0 352.5 355.0 357.5 > rnc[[3]] [1] 17636880 17636904 17636928 17636952 17636976 17637000 17637024 17637048 17637072 17637096 17637120 17637144 17637168 [14] 17637192 17637216 17637240 17637264 17637288 17637312 17637336 17637360 17637384 17637408 17637432 17637456 17637480 [27] 17637504 17637528 17637552 17637576 17637600 17637624 17637648 17637672 17637696 17637720 17637744 17637768 17637792 [40] 17637816 17637840 17637864 17637888 17637912 17637936 17637960 17637984 17638008 17638032 17638056 17638080 17638104 [53] 17638128 17638152 17638176 17638200 17638224 17638248 17638272 17638296 17638320 17638344 17638368 17638392 17638416 [66] 17638440 17638464 17638488 17638512 17638536 17638560 17638584 17638608 17638632 17638656 17638680 17638704 17638728 [79] 17638752 17638776 17638800 17638824 17638848 17638872 17638896 17638920 17638944 17638968 17638992 17639016 17639040 [92] 17639064 17639088 17639112 17639136 17639160 17639184 17639208 17639232 17639256 17639280 17639304 17639328 17639352 [105] 17639376 17639400 17639424 17639448 17639472 17639496 17639520 17639544 17639568 17639592 17639616 17639640 17639664 [118] 17639688 17639712 17639736 17639760 17639784 17639808 17639832 17639856 17639880 17639904 17639928 17639952 17639976 [131] 17640000 17640024 17640048 17640072 17640096 17640120 17640144 17640168 17640192 17640216 17640240 17640264 17640288 [144] 17640312 17640336 17640360 17640384 17640408 17640432 17640456 17640480 17640504 17640528 17640552 17640576 17640600 [157] 17640624 17640648 17640672 17640696 17640720 17640744 17640768 17640792 17640816 17640840 17640864 17640888 17640912 [170] 17640936 17640960 17640984 17641008 17641032 17641056 17641080 17641104 17641128 17641152 17641176 17641200 17641224 [183] 17641248 17641272 17641296 17641320 17641344 17641368 17641392 17641416 17641440 17641464 17641488 17641512 17641536 [196] 17641560 17641584 17641608 17641632 17641656 17641680 17641704 17641728 17641752 17641776 17641800 17641824 17641848 [209] 17641872 17641896 17641920 
17641944 17641968 17641992 17642016 17642040 17642064 17642088 17642112 17642136 17642160 [222] 17642184 17642208 17642232 17642256 17642280 17642304 17642328 17642352 17642376 17642400 17642424 17642448 17642472 [235] 17642496 17642520 17642544 17642568 17642592 17642616 17642640 17642664 17642688 17642712 17642736 17642760 17642784 [248] 17642808 17642832 17642856 17642880 17642904 17642928 17642952 17642976 17643000 17643024 17643048 17643072 17643096 [261] 17643120 17643144 17643168 17643192 17643216 17643240 17643264 17643288 17643312 17643336 17643360 17643384 17643408 [274] 17643432 17643456 17643480 17643504 17643528 17643552 17643576 17643600 17643624 17643648 17643672 17643696 17643720 [287] 17643744 17643768 17643792 17643816 17643840 17643864 17643888 17643912 17643936 17643960 17643984 17644008 17644032 [300] 17644056 17644080 17644104 17644128 17644152 17644176 17644200 17644224 17644248 17644272 17644296 17644320 17644344 [313] 17644368 17644392 17644416 17644440 17644464 17644488 17644512 17644536 17644560 17644584 17644608 17644632 17644656 [326] 17644680 17644704 17644728 17644752 17644776 17644800 17644824 17644848 17644872 17644896 17644920 17644944 17644968 [339] 17644992 17645016 17645040 17645064 17645088 17645112 17645136 17645160 17645184 17645208 17645232 17645256 17645280 [352] 17645304 17645328 17645352 17645376 17645400 17645424 17645448 17645472 17645496 17645520 17645544 17645568 17645592 [365] 17645616 > rnc[[4]][1:5, 1:5, 1:5] , , 1 [,1] [,2] [,3] [,4] [,5] [1,] 258.78 259.30 264.10 269.18 272.38 [2,] 258.78 259.55 264.47 269.57 272.35 [3,] 258.78 259.82 264.82 269.78 272.12 [4,] 258.78 260.00 265.05 269.88 271.80 [5,] 258.78 260.18 265.22 269.90 271.45 , , 2 [,1] [,2] [,3] [,4] [,5] [1,] 252.4 254.57 262.77 269.05 269.90 [2,] 252.4 254.22 262.50 269.02 269.82 [3,] 252.4 253.95 262.25 269.07 269.80 [4,] 252.4 253.60 262.03 269.12 269.85 [5,] 252.4 253.32 261.82 269.18 269.90 , , 3 [,1] [,2] [,3] [,4] [,5] [1,] 248.07 248.17 258.52 268.20 268.70 [2,] 248.07 248.07 258.65 268.60 268.82 [3,] 248.07 247.95 258.72 269.00 269.05 [4,] 248.07 247.85 258.78 269.40 269.32 [5,] 248.07 247.72 258.80 269.77 269.65 , , 4 [,1] [,2] [,3] [,4] [,5] [1,] 245 245.65 259.00 267.77 266.80 [2,] 245 245.62 258.97 268.03 267.35 [3,] 245 245.57 258.95 268.35 267.90 [4,] 245 245.52 258.92 268.67 268.47 [5,] 245 245.45 258.87 268.97 268.92 , , 5 [,1] [,2] [,3] [,4] [,5] [1,] 248.7 256.92 265.40 266.25 266.72 [2,] 248.7 256.70 265.52 267.17 267.62 [3,] 248.7 256.45 265.60 268.00 268.32 [4,] 248.7 256.22 265.68 268.65 268.80 [5,] 248.7 255.97 265.65 269.22 269.07 ~~~~
  • 14.

    The Nino3.4 data is here.

    Comment Source:The Nino3.4 data is [here](http://www.esrl.noaa.gov/psd/gcos_wgsp/Timeseries/Nino34/).
  • 15.
    edited June 2014

    Graham wrote:

    The Nino3.4 data is here.

    I wondered what this is. It's the area averaged sea surface temperature (SST) in the region 5°S-5°N and 170°-120°W. These numbers are good to know: this is someone's idea of the region of the Pacific that gets hot during an El Niño.

    I guess these temperatures are also averaged over time somehow, or is it a reading-by-reading thing?

    Thanks, guys! A good first "game" would be to use R and this data to do anything mildly interesting.

    Comment Source:Graham wrote: > The Nino3.4 data is [here](http://www.esrl.noaa.gov/psd/gcos_wgsp/Timeseries/Nino34/). I wondered what this is. It's the area averaged sea surface temperature (SST) in the region 5&deg;S-5&deg;N and 170&deg;-120&deg;W. These numbers are good to know: this is someone's idea of region of the Pacific that gets hot during an El Ni&ntilde;o. I guess these temperatures are also averaged over time somehow, or is it a reading-by-reading thing? Thanks, guys! A good first "game" would be to use R and this data to do _anything mildly interesting_.
  • 16.
    edited June 2014

    The reason I went looking for Nino3.4 data is that it provides what the paper Improved El Niño forecasting by cooperativity detection (Josef Ludescher et al, 2013) is trying to predict, i.e. El Niño events. The data contains one number per month back to 1870. An El Niño event is declared when the NINO3.4 index stays above 0.5°C for 5 months. The index is defined as "the average of the sea-surface temperature (SST) anomalies at certain grid points in the Pacific (Fig. 1)".

    So one game is to make better predictions than they did.
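
    As a warm-up for that game, here is a minimal sketch of flagging the events themselves from the monthly index, using the definition quoted above; nino34 is assumed to be a plain numeric vector of monthly NINO3.4 values, and reading it in from the NOAA page linked in comment 14 is left out.

    el.nino.months <- function(nino34, threshold = 0.5, min.run = 5) {
      runs <- rle(nino34 > threshold)                   # consecutive stretches above/below the threshold
      runs$values <- runs$values & runs$lengths >= min.run
      inverse.rle(runs)                                 # TRUE for months belonging to a qualifying event
    }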

    Comment Source:The reason I went looking for Nino3.4 data is that in this paper: *Improved El Niño forecasting by cooperativity detection*, Josef Ludescher et al, 2013, it provides what they are trying to predict,ie, El Niño events. The data contains one number per month back to 1870. An El Niño event is declared when the NINO3.4 index stays above 0.5C for 5 months. The index is defined as "the average of the sea-surface temperature (SST) anomalies at certain grid points in the Pacific (Fig. 1)". So one game is to make better predictions than they did.
  • 17.

    On games, or work and play. In the climate network thread, John said

    I have the strong feeling that people in this subject - “climate network theory” - are just messing around trying to see what works.

    I have the same feeling. It seems appropriate to think of this as playing a game. I guess it turns from play into work when other people take what you've done seriously.

    Comment Source:On games, or work and play. In the climate network thread, [John said](http://forum.azimuthproject.org/discussion/1357/climate-networks/?Focus=10706#Comment_10706) > I have the strong feeling that people in this subject - “climate network theory” - are just messing around trying to see what works. I have the same feeling. It seems appropriate to think of this as playing a game. I guess it turns from play into work when *other people* take what you've done seriously.
  • 18.

    I have added an image to the wiki page which displays the temperatures over the Pacific for the year 1951, and the R code that makes this.

    Comment Source:I have added an image to the wiki page which displays the temperatures over the Pacific for the year 1951, and the R code that makes this.
  • 19.
    edited June 2014

    Great!

    The red line at bottom right looks like the coast of South America. Is that what it is?

    I want to start blogging about this stuff soon - about very simple aspects, like how to find the data online, etc.

    So one game is to make better predictions than they did.

    Yes, this is the big game I'm aiming for!

    As Nad points out, there is a danger of "overfitting" - there haven't been many El Niños during the time we've had such good climate data, so one could make a model that correctly predicts all of those, yet is fundamentally stupid and fails the next time around. I think using good concepts instead of just fiddling around tuning parameters is one way to avoid this.

    Comment Source:Great! <img src = "http://math.ucr.edu/home/baez/emoticons/thumbsup.gif" alt = ""/> The red line at bottom right looks like the coast of South America. Is that what it is? <img width = "500" src = "http://www.azimuthproject.org/azimuth/files/pacific1951.png" alt = ""/> I want to start blogging about this stuff soon - above very simple aspects, like how to find the data online, etc. > So one game is to make better predictions than they did. Yes, this is the big game I'm aiming for! As Nad points out, there is a danger of "overfitting" - there haven't been many El Ni&ntilde;os during the time we've had such good climate data, so one could make a model that correctly predicts all of those, yet is fundamentally stupid and fails the next time around. I think using good concepts instead of just fiddling around tuning parameters is one way to avoid this.
  • 20.

    The red line at bottom right looks like the coast of South America. Is that what it is?

    I hope so! Red is coolest, white is hottest, so that looks like the Andes.

    one could make a model that correctly predicts all of those, yet is fundamentally stupid and fails the next time around.

    The usual guard against that is to 'train' the method on some of the data (eg 1948-1980) and evaluate it on the rest. With so few El Niños, that is not great. Making continuous predictions of the NINO3.4 index might be preferable.

    Comment Source:> The red line at bottom right looks like the coast of South America. Is that what it is? I hope so! Red is coolest, white is hottest, so that looks like the Andes. > one could make a model that correctly predicts all of those, yet is fundamentally stupid and fails the next time around. The usual guard against that is to 'train' the method on some of the data (eg 1948-1980) and evaluate it on the rest. With so few El Niños, that is not great. Making continuous predictions of the NINO3.4 index might be preferable.
  • 21.

    Red for cool is not so cool from a cognitive ergonomic pov; bring on the blue I say :) PS I'm thoroughly enjoying programming this.

    Comment Source:Red for cool is not so cool from a cognitive ergonomic pov; bring on the blue I say :) PS I'm thoroughly enjoying programming this.
  • 22.

    Another image (pacific1955-1961.png on the wiki):

    Can you spot the El Niño years?

    Comment Source:Another image: <img width = "1200" src = "http://www.azimuthproject.org/azimuth/files/pacific1955-1961.png" alt = ""/> Can you spot the El Nino years?
  • 23.
    edited June 2014

    Thanks for the blue, Graham. I for one can easily be confused by heading west and going from 179°W to 179°E, and by La Niña/El Niño. Looks like 1958 and 1961 have the largest red blobs.

    Comment Source:Thanks for the blue Graham. I for one can easily be confused by heading west and going from 179W to 179 E and La Nina/El Nino. Looks like 1958 and 1961 have the largest red blobs.
  • 24.
    edited June 2014

    NOAA seem to use the convention of heading east from Greenwich. But I see John says El Nino is area 5°S-5°N and 170°-120°W. That's 190°E- 300°E. I think.

    And yes, I could draw a box, but I think subtracting the long-term means so differences show up better is the next thing to do.

    Comment Source:NOAA seem to use the convention of heading east from Greenwich. But I see John says El Nino is area 5°S-5°N and 170°-120°W. That's 190°E- 300°E. I think. And yes, I could draw a box, but I think subtracting the long-term means so differences show up better is the next thing to do.
  • 25.

    I guess it turns from play into work when other people take what you’ve done seriously.

    Mostly only if in addition it may directly affect those other people.

    Comment Source:>I guess it turns from play into work when other people take what you’ve done seriously. Mostly only if in addition it may directly affect those other people.
  • 26.
    edited June 2014

    Graham wrote:

    I see John says El Nino is area 5°S-5°N and 170°-120°W. That’s 190°E- 300°E. I think.

    Shouldn't the angle west and the angle east add up to 360°? If so, 170°-120°W is 190°-240°E.
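
    A one-line check of the conversion:

    west.to.east <- function(w) 360 - w   # degrees west of Greenwich to degrees east
    west.to.east(c(170, 120))             # 190 240, i.e. 170°-120°W is 190°-240°E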

    NOAA says the Nino3.4 index is the area averaged sea surface temperature (SST) in the region 5°S-5°N and 170°-120°W.

    Ludescher et al's paper on improved El Niño forecasting by cooperativity detection uses sea surface temperatures in this region. In their figure, the red dots are their "El Niño basin". It looks like about 75°-170°W.

    If you read the fine print and track back the references you'll see they got their definition of "El Niño basin" from these papers:

    • Yamasaki K, Gozolchiani A, Havlin S (2008) Climate networks around the globe are significantly affected by El Niño. Phys Rev Lett 100(22):228501.

    • Gozolchiani A, Havlin S, Yamasaki K (2011) Emergence of El Niño as an autonomous component in the climate network. Phys Rev Lett 107(14):148501.

    The first one we are familiar with. Both are published in Physical Review Letters, which is not where climate scientists would typically publish. So, I expect these guys are more like "network theorists" - this stuff is popular in physics now. This means we should absorb more of the traditional climate science literature.

    Comment Source:Graham wrote: > I see John says El Nino is area 5°S-5°N and 170°-120°W. That’s 190°E- 300°E. I think. Shouldn't the angle west and the angle east add up to 360°? If so, 170°-120°W is 190°-240°E. NOAA says the [Nino3.4 index](http://www.esrl.noaa.gov/psd/gcos_wgsp/Timeseries/Nino34/) is the area averaged sea surface temperature (SST) in the region 5&deg;S-5&deg;N and 170&deg;-120&deg;W. Ludescher _et al's_ paper on [improved El Ni&ntilde;o forecasting by cooperative detection](http://forum.azimuthproject.org/discussion/1360/paper-ludescher-et-al-improved-el-nino-forecasting-by-cooperativity-detection/#Item_35) uses sea surface temperatures in this region. The red dots are their "El Ni&ntilde;o basin". It looks like about 75&deg;-170&deg; W. <img src = "http://math.ucr.edu/home/baez/ecological/ludescher_el_nino_cooperativity_1.jpg" alt = ""/> If you read the fine print and track back the references you'll see they got their definition of "El Ni&ntilde;o basin" from these papers: * Yamasaki K, Gozolchiani A, Havlin S (2008) Climate networks around the globe are significantly affected by El Niño. Phys Rev Lett 100(22):228501. * Gozolchiani A, Havlin S, Yamasaki K (2011) Emergence of El Niño as an autonomous component in the climate network. Phys Rev Lett 107(14):148501. The first one we are familiar with. Both are published in _Physical Review Letters_, which is not where climate scientists would typically publish. So, I expect these guys are more like "network theorists" - this stuff is popular in physics now. This means we should absorb more of the traditional climate science literature.
  • 27.
    edited June 2014

    I find the discussion here (and in other threads) a bit sprawling, so I have updated

    • Project - Using climate networks for El Niño signal processing

    to include the key sources of data, useful graphics, etc.

    I propose renaming that page "Experiments in El Niño prediction":

    1. We have a large class of Experiments pages, where people work on projects but don't want their writeups to be taken as the gospel truth - these are on the wiki, but not part of the "Azimuth Library".

    2. I don't think "signal processing" is quite the best emphasis.

    3. We may stray from a strictly "climate network" viewpoint.

    Comment Source:I find the discussion here (and in other threads) a bit sprawling, so I have updated * [[Project - Using climate networks for El Niño signal processing]] to include the key sources of data, useful graphics, etc. I propose renaming that page "Experiments in El Niño prediction": 1. We have a large class of [[Experiments]] pages, where people work on projects but don't want their writeups to be taken as the gospel truth - these are on the wiki, but not part of the "Azimuth Library". 1. I don't think "signal processing" is quite the best emphasis. 1. We may stray from a strictly "climate network" viewpoint.
  • 28.
    edited June 2014

    I agree with most of what you say about the title, but have one consideration to raise here.

    Ludescher et al. is about El Niño prediction, and Yamasaki et al. is about El Niño detection in the present -- they provide a measure that they claim more cleanly indicates El Niño than other noisier measures.

    While I agree that prediction may be the most critical application, the broader topic which includes both prediction and present-time detection is the foundation. Since we're just starting out, I'd prefer not to exclude the present-time detection from our potential scope of experiments. As you said elsewhere, it would be a forward step if we can start computing anything that is interesting from this data.

    "Signal processing" was a hasty attempt to capture the union of present-oriented and future-oriented El Niño detection.

    Suggestions: How about "Experiments in El Niño detection," or "Experiments in El Niño detection and prediction"?

    Otherwise, "Experiments in El Niño prediction" would be fine.

    Comment Source:I agree with most of what you say about the title, but have one consideration to raise here. Ludescher _et al._ is about El Niño prediction, and Yamasaki _et al._ is about El Niño detection in the present -- they provide a measure that they claim more cleanly indicates El Niño than other noisier measures. While I agree that prediction may be the most critical application, the broader topic which includes both prediction and present-time detection is the foundation. Since we're just starting out, I'd prefer not to exclude the present-time detection from our potential scope of experiments. As you said elsewhere, it would be a forward step if we can start computing anything that is interesting from this data. "Signal processing" was a hasty attempt to capture the union of present-oriented and future-oriented El Niño detection. Suggestions: How about "Experiments in El Niño detection," or "Experiments in El Niño detection and prediction"? Otherwise, "Experiments in El Niño prediction" would be fine.
  • 29.
    edited June 2014

    Okay, I'll go with

    • Experiments in El Niño detection and prediction.

    Hmm, getting the "smell smoke?" bug while trying to rename this page...

    Comment Source:Okay, I'll go with * [[Experiments in El Niño detection and prediction]]. Hmm, getting the "smell smoke?" bug while trying to rename this page...
  • 30.
    edited June 2014

    Graham, thanks for the information about how to get started with R. I got started, and generated the first image using your sample code.

    For those who haven't looked into this data, here is how it is organized.

    There is one file per year, from 1948 to 2010. As stated on the wiki, each file contains worldwide daily average temperatures on a 2.5° latitude × 2.5° longitude grid (144 × 73 grid points).

    Each file is organized as a list of arrays:

    • A one-dimensional list of 73 latitude values, going from 90 to -90 in steps of -2.5

    • A one-dimensional list of 144 longitude values, going from 0 to 357.5 in steps of 2.5

    • A list of 365 day indicators. These are absolute numbers, expressed in units of hours, i.e. they jump by 24 from one array position to the next

    • A three-dimensional array of numbers, 73 by 144 by 365. The first three arrays give the interpretation of the axes of this three-dimensional array.

    The numbers themselves are two-byte "short" integers. Graham, do you know the units?

    Comment Source:Graham, thanks for the information about how to get started with R. I got started, and generated the first image using your sample code. For those who haven't looked into this data, here is how it is organized. There is one file per year, from 1948 to 2010. As stated on the wiki, each file contains worldwide daily average temperatures on a 2.5° latitude × 2.5° longitude grid (144 × 73 grid points). Each file is organized as a list of arrays: * A one dimensional list of 73 latitude values, going from 90 to -90 in steps of -0.25 * A one dimensional list of 144 longitude values, going from 0 to 357.5 in steps of 0.25 * A list of 365 day indicators. These are absolute numbers, expressed in units of hours, i.e. they jump by 24 from one array position to the next * A three-dimensional array of numbers, 73 by 144 by 365. The first three arrays give the interpretation of the axes of this three-dimensional array. The numbers themselves are two-byte "short" integers. Graham do you know the units?
  • 31.
    edited June 2014

    Can any of you R programmers (e.g. Graham :) help me by writing the following function? (I don't know R at all.) Thanks...

    First, I'll download all of the files into a directory.

    The function should take as arguments the start year and the end year. It should go through all of the indicated files, and assemble all the data into a single coherent structure.

    I suggest a four-dimensional array, organized by latitude, longitude, year and day. That's in line with the type of analyses conducted by the papers we are looking at.

    It would be good for us to do some test operations, working with the whole array in memory at once.

    Comment Source:Can any of you R programmers (e.g. Graham :) help me by writing the following function? (I don't know R at all.) Thanks... First, I'll download all of the files into a directory. The function should take as arguments the start year and the end year. It should go through all of the indicated files, and assemble all the data into a single coherent structure. I suggest a four-dimensional array, organized by latitude, longitude, year and day. That's in line with the type of analyses conducted by the papers we are looking at. It would be good for us to do some test operations, working with the whole array in memory at once.
  • 32.
    edited June 2014

    We are told that all of these files are over the same grid, so once we verify that, I see no need to make a fancy function that concatenates arrays which may have different axes.

    However, to be safe, it would be nice -- as a refinement -- if this function would verify that all the data is over the same grid. It could do that pretty simply: store the three arrays of axis values from the first file as reference information, and then, when reading each subsequent file, compare its axis arrays to the reference arrays. If any discrepancies are detected, the program could just print an error message and exit. This way we could sleep at night, knowing that if the program completed, all the data is aligned.
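
    Here, purely as a starting point, is a minimal sketch of the function requested in the previous comment, with the grid check suggested here. It assumes the files are named air.sig995.<year>.nc as in comment 13, that read.nc returns its components in the order shown there (latitudes, longitudes, times, temperatures), and that the third axis of the temperature array runs over days of the year; none of this has been checked against every file, and leap years are simply truncated to 365 days.

    library(RNetCDF)

    read.temperatures <- function(first.year, last.year, dir = ".") {
      result <- NULL
      ref.lat <- NULL
      ref.lon <- NULL
      for (year in first.year:last.year) {
        fname <- file.path(dir, paste0("air.sig995.", year, ".nc"))
        onc <- open.nc(fname)
        rnc <- read.nc(onc)
        close.nc(onc)
        lat <- rnc[[1]]
        lon <- rnc[[2]]
        air <- rnc[[4]][, , 1:365]                # truncate leap years so all years align
        if (is.null(result)) {
          ref.lat <- lat                          # remember the grid of the first file...
          ref.lon <- lon
          result <- array(NA, dim = c(dim(air), last.year - first.year + 1))
        } else if (!isTRUE(all.equal(lat, ref.lat)) || !isTRUE(all.equal(lon, ref.lon))) {
          stop("grid mismatch in ", fname)        # ...and refuse to mix different grids
        }
        result[, , , year - first.year + 1] <- air
      }
      result                                      # indexed [grid point, grid point, day, year]
    }

    # Example: temps <- read.temperatures(1950, 1959)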

    Comment Source:We are told that all of these files are over the same grid, so once we verify that, I see no need to make a fancy function that concatenates arrays which may have different axes. However, to be safe, it would be nice -- as a refinement -- if this function would verify that all data is over the same grid. It could do that pretty simply, by (1) storing the three arrays of axis values for the first file, as reference information, and then when reading subsequent files, comparing their axis arrays to the reference arrays. If any discrepancies are detected, the program could just print an error message and exit. This way we could sleep at night, knowing that since the program completed all the data is aligned.
  • 33.

    Here's the next R programming exercise: write a function that inputs the above array, and produces the four-dimensional array of seasonally adjusted temperatures that we spoke about elsewhere.
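
    A minimal sketch of that step, assuming a temps[grid point, grid point, day, year] array like the one built by the sketch in the previous comment: for every grid point and day of the year, subtract the mean over all years, leaving the "temperature anomalies" described in comment 7.

    seasonal.anomalies <- function(temps) {
      climatology <- apply(temps, c(1, 2, 3), mean)   # mean over the year axis
      sweep(temps, c(1, 2, 3), climatology)           # subtract it from every year
    }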

    Comment Source:Here's the next R programming exercise: write a function that inputs the above array, and produces the four-dimensional array of seasonal adjusted temperatures that we spoke about elsewhere.
  • 34.
    edited June 2014

    Let's start building up a library of functions and demos in our github repository.

    Comment Source:Let's start building up a library of functions and demos in our github repository.
  • 35.
    edited June 2014

    Graham you have the lead on all this R stuff, but if you want to take a rest let me know, and some of us can pick it up. For me there will be a learning curve, but that's okay!

    Comment Source:Graham you have the lead on all this R stuff, but if you want to take a rest let me know, and some of us can pick it up. For me there will be a learning curve, but that's okay!
  • 36.

    There is an incorrect link to the data.

    The link above, "Visualize NCEP Reanalysis Daily Averages Surface Level Data (Specify dimension values)", leads to:

        ERROR: Could not extract file information for Air temperature

    However, I've individually downloaded the air-sigxxxx.xxxx.nc files and netcef-displayyear.r, and edited the current working directory. What's the command-line R invocation or interpreter command to run it over the .nc files and produce text output?

    Comment Source:There is an incorrect link to the data. The link above [Visualize NCEP Reanalysis Daily Averages Surface Level Data (Specify dimension values)](http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanalysis.surface.html) ==> ERROR: Could not extract file information for Air temperature However I've individually downloaded the air-sigxxxx.xxxx.nc files and netcef-displayyear.r and edited the current working directory. What's the command line R invocation or interpreter command to run it over the .nc files and produce text output?
  • 37.
    edited June 2014

    David, I have been thinking about making it easier for others to join in too. It's great that you're prepared to learn R, but not everyone will want to, and I'd rather that people can get on with playing around in their favourite language. We perhaps have somewhat different aims. At the moment I am interested in looking at the data in various ways, thinking about it, trying to come up with some of those "good concepts" that John talked about. This means a lot of messing about, and a lot of messy code. I get the sense you want to do something more organised ("a library of functions") and perhaps that you want to implement the Ludescher algorithm exactly as they did.

    So I was thinking of using R to convert from the netCDF format into a very simple format that could easily be read as a single large file (about 11Mb a year for the area used by Ludescher). Four dimensional data is not great for that. I plan to go the other way and make two-dimensional data along the following lines.

    [Edited because I got N and S confused.]

                 S24E48              S24E49          S24E50           ...            S50E116
    Y1950P001 277.850002810359 279.800002766773 281.950002718717     ...        290.500002527609
    ...
    Y1951P365 277.220002824441 280.100002760068 281.350002718717     ...        290.970002695918 
    

    There is one row for each day, and 365 days in each year (leap days are omitted). In each row, you have temperatures in Kelvin for each grid point in a rectangle.

    S24E48 means 24 steps South from the North Pole and 48 steps East from Greenwich. Steps are 2.5 degrees. The points are in reading order, starting top left and going along the top row first.

    Y1950P001 means year 1950, period 1. You might want longer periods.

    You can convert it into a four-dimensional array if you want, either by parsing the row and column names or by arithmetic. Personally, I don't want four-dimensional arrays. One reason is that with 2D arrays there is only one way of getting the indices muddled (y,x instead of x,y); with four indices there are 23 wrong ways. Also note that we'll be dealing with pairs of points, so you could be dealing with 8 indices.

    The numbers themselves are two-byte “short” integers. Graham do you know the units?

    I don't know what they are in the original file, but they are in Kelvin once read in by the R function.

    Comment Source:David, I have been thinking about making it easier for others to join in too. It's great that you're prepared to learn R, but not everyone will want to, and I'd rather that people can get on with playing around in their favourite language. We perhaps have somewhat different aims. At the moment I am interested in looking at the data in various ways, thinking about it, trying to come up with some of those "good concepts" that John talked about. This means a lot of messing about, and lot of messy code. I get the sense you want to do something more organised ("a library of functions") and perhaps that you want to implement the Ludescher algorithm exactly as they did. So I was thinking of using R to convert from the netCDF format into a very simple format that could easily be read as a single large file (about 11Mb a year for the area used by Ludescher). Four dimensional data is not great for that. I plan to go the other way and make two-dimensional data along the following lines. [Edited because I got N and S confused.] ~~~~ S24E48 S24E49 S24E50 ... S50E116 Y1950P001 277.850002810359 279.800002766773 281.950002718717 ... 290.500002527609 ... Y1951P365 277.220002824441 280.100002760068 281.350002718717 ... 290.970002695918 ~~~~ There is one row for each day, and 365 days in each year (leap days are omitted). In each row, you have temperatures in Kelvin for each grid point in a rectangle. S24E48 means 24 steps South from the North Pole and 48 steps East from Greenwich. Steps are 2.5 degrees. The points are in reading order, starting top left and going along the top row first. Y1950P001 means year 1950, period 1. You might want longer periods. You can convert into a four dimensional array if you want, either by parsing the row and column names, or by arithmetic. Personally, I don't want four dimensional arrays. One reason is that with 2D ararys, there is only one way of getting the indices muddled, y,x not x,y. With four indices there are 23 wrong ways. Also note that we'll being dealing with pairs of points, so you could be dealing with 8 indices. > The numbers themselves are two-byte “short” integers. Graham do you know the units? I don't know what they are in the original file, but they are in Kelvin once read in by the R function.
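    For anyone who wants to read this flat format back into R, or just wants to see how the row and column names can be parsed, here is a small sketch. The file name is a placeholder, and the parsing assumes the Y<year>P<day> and S<step>E<step> patterns shown above.

    ~~~~
    # Sketch: read the whitespace-separated 2D format described above.
    # "temperatures.txt" is a placeholder file name.  Because the header row lists
    # only the column names, read.table takes the first column as row names.
    temps <- as.matrix(read.table("temperatures.txt", header = TRUE, check.names = FALSE))

    temps["Y1950P001", "S24E48"]     # one day at one grid point, in Kelvin

    # Parsing the names, if you prefer index arithmetic:
    years <- as.integer(substr(rownames(temps), 2, 5))    # Y1950P001 -> 1950
    days  <- as.integer(substr(rownames(temps), 7, 9))    # Y1950P001 -> 1
    lat.steps <- as.integer(sub("^S([0-9]+)E[0-9]+$", "\\1", colnames(temps)))  # steps south from the pole
    lon.steps <- as.integer(sub("^S[0-9]+E([0-9]+)$", "\\1", colnames(temps)))  # steps east from Greenwich
    ~~~~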
  • 38.

    What’s the command line R invocation or interpreter command to run it over the .nc files and produce text output?

    Something like this.

    "C:\Program Files\R\R-2.14.1\bin\R.exe" --slave --vanilla --args RP1 RP2 < Rscript.r

    The --slave and --vanilla flags are recommended for this sort of thing; I forget exactly what they do. The arguments following --args (RP1 and RP2) will be passed to the R script and can be accessed via the R command commandArgs(). To produce output you can use various R functions, e.g. cat(). Output can be directed to a file using sink().

    Comment Source:> What’s the command line R invocation or interpreter command to run it over the .nc files and produce text output? Something like this. `"C:\Program Files\R\R-2.14.1\bin\R.exe" --slave --vanilla --args RP1 RP2 < Rscript.r` The `--slave` and `--vanilla` flags are recommended for this sort of thing, I forget exactly what they do. The arguments following `--args` (RP1 and RP2) will be passed to the R script. They can be accessed via the R command `commandArgs()`. To produce output, you can use various R functions, see eg `cat()`. Output can be directed to a file using `sink()`.
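    A tiny sketch of the receiving end, i.e. what Rscript.r might contain to pick up those arguments and write text output; the file name output.txt is just a placeholder, and nothing here is specific to the netCDF work.

    ~~~~
    # Sketch of the receiving script (save as Rscript.r).
    args <- commandArgs(trailingOnly = TRUE)   # c("RP1", "RP2") in the example above
    sink("output.txt")                         # redirect printed output to a file
    cat("Arguments received:", args, "\n")
    sink()                                     # back to the console
    ~~~~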
  • 39.

    I have uploaded an R script for converting netCDF data. I haven't put it on github yet.

    The simplest way to run this is to edit it to your requirements (instructions are in the script), then start R, and then copy and paste the whole file into the R console.

    Comment Source:I have uploaded an [R script for converting netCDF data](http://www.azimuthproject.org/azimuth/files/netcdf-convertor.r). I haven't put it on github yet. The simplest way to run this is to edit it to your requirements (instructions are in the script), then start R, and then copy and paste the whole file into the R console.
  • 40.

    Thanks Graham.

    Comment Source:Thanks Graham.
  • 41.

    I am not mainly interested in reproducing Ludescher's paper; however,

    1. Our previous attempts to reproduce a much simpler piece of work found a mistake in that work, which we traced back to a typo in the code. (I'm talking about the Zaliapin-Ghil model of Snowball Earth, and I see now that we failed to document our discovery on the Azimuth Wiki - the details are lurking in the Forum.) I suspect there are lots of things to be learned from trying and failing to reproduce known papers.

    2. Once we get good at this kind of stuff, it should not be terribly hard to reproduce Ludescher's work.

    I think it's great that David Tanzer wants to build a library of code that will help us do a variety of tasks, including reproducing existing papers. But I hope this doesn't slow down Graham too much, since he's already making progress in exploring the data.

    I would like to propose an interesting yet not too hard programming challenge to focus our efforts a little. I'll do this very soon.

    Also, I want to blog about El Niño soon and try to lure more people into this project. Based on work we've already done, I think I can already write decent introductory articles on

    1. El Niño in general, and why the inception of the next one is a hugely important question,

    2. The papers by Ludescher et al and Yamasaki et al.

    These would help solidify our knowledge and get others interested. I just need a day or two when I'm not busy doing other things. This should happen after I get to Singapore on Sunday.

    Comment Source:I am not mainly interested in reproducing Ludescher's paper; however, 1. Our previous attempts to reproduce a much simpler piece of work found a mistake in that work, which we traced back to a typo in the code. (I'm talking about the [Zaliapin-Ghil model](http://www.azimuthproject.org/azimuth/show/Another+look+at+climate+sensitivity) of Snowball Earth, and I see now that we failed to document our discovery on the Azimuth Wiki - the details are lurking in the Forum.) I suspect there are lots of things to be learned from trying and _failing_ to reproduce known papers. 2. Once we get good at this kind of stuff, it should not be terribly hard to reproduce Ludescher's work. I think it's great that David Tanzer wants to build a library of code that will help us do a variety of tasks, including reproducing existing papers. But I hope this doesn't slow down Graham too much, since he's already making progress in exploring the data. I would like to propose an interesting yet not too hard programming challenge to focus our efforts a little. I'll do this very soon. Also, I want to blog about El Ni&ntilde;o soon and try to lure more people into this project. Based on work we've already done, I think I can already write decent introductory articles on 1. El Ni&ntilde;o in general, and why the inception of the next one is a hugely important question, 2. The papers by [Ludescher et al](http://forum.azimuthproject.org/discussion/1360/paper-ludescher-et-al-improved-el-nino-forecasting-by-cooperativity-detection/#Item_38) and [Yamasaki et al](http://forum.azimuthproject.org/discussion/1357/paper-yamasaki-et-al-climate-networks-around-the-globe-are-significantly-effected-by-el-nino/#Item_37). These would help solidify our knowledge and get others interested. I just need a day or two when I'm not busy doing other things. This should happen after I get to Singapore on Sunday.
  • 42.
    edited June 2014

    Here's a rough rough sketch of a programming challenge. I, or we, can make it more precise.

    Write a program that for any given time $t$ and delay $\tau$, graphs the time-delayed covariance

    $$ C_{i,j}^{t}(-\tau) = \langle T_i(t) T_j(t - \tau) \rangle - \langle T_i(t) \rangle \langle T_j(t - \tau) \rangle $$ as a function of the distance between site $i$ and site $j$. Here $T_i(t)$ is the seasonally adjusted temperature at time $t$ at site $i$, and $\langle \rangle$ is a suitably defined time average.

    There are many details to iron out here, including the worry raised by Nad, and I would like to do this.

    But first I hope we can agree that this is important. Ludescher et al claim that these cross-covariances get big shortly before the inception of an El Niño. They phrase their claim in the language of networks, but it's not clear that the network language is needed. So, let's take a look at $C_{i,j}^{t}(-\tau)$ directly.

    Why plot it as a function of distance between $i$ and $j$? This sort of covariance tends to be big if $i$ and $j$ are close and small if they're farther away. In a strongly correlated system it tends to decay more slowly with increasing distance.

    "Distance" is a bit annoying to compute, since we have grid points labelled by latitude and longitude. Luckily most sites in the El Niño basin are near the equator. So, we can restrict attention to sites on the equator and use "difference in longitude" as a substitute for distance! This makes the calculation easier.

    I'm talking about the red dots on the equator in Figure 1 of Ludescher's paper.

    Comment Source:Here's a rough _rough_ sketch of a programming challenge. I, or we, can make it more precise. Write a program that for any given time $t$ and delay $\tau$, graphs the time-delayed covariance $$ C_{i,j}^{t}(-\tau) = \langle T_i(t) T_j(t - \tau) \rangle - \langle T_i(t) \rangle \langle T_j(t - \tau) \rangle $$ as a function of the distance between site $i$ and site $j$. Here $T_i(t)$ is the seasonally adjusted temperature at time $t$ at site $i$, and $\langle \rangle$ is a suitably defined time average. There are many details to iron out here, including the worry raised by Nad, and I would like to do this. But first I hope we can agree that this is important. Luedescher _et al_ claim that these cross-covariances get big shortly before the inception of an El Ni&ntilde;o. They phrase their claim in the language of networks, but it's not clear that the network language is needed. So, let's take a look at $C_{i,j}^{t}(-\tau) $ directly. Why plot it as a function of distance between $i$ and $j$? This sort of covariance tends to be big if $i$ and $j$ are close and small if they're farther away. In a strongly correlated system it tends to decay more slowly with increasing distance. "Distance" is a bit annoying to compute, since we have grid points labelled by latitude and longitude. Luckily most sites in the El Ni&ntilde;o basin are near the equator. So, we can restrict attention to sites on the equator and use "difference in longitude" as a substitute for distance! This makes the calculation easier. <img src = "http://math.ucr.edu/home/baez/ecological/ludescher_el_nino_cooperativity_1.jpg" alt = ""/> I'm talking about the red dots on the equator here. This is Figure 1 from [Ludescher's paper](http://www.pnas.org/content/early/2013/06/26/1309353110.full.pdf+html).
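    Here is a rough sketch of how the challenge could be set up in R, under assumptions that are mine rather than the paper's: the seasonally adjusted temperatures for the equatorial sites are in a matrix anom with one row per day and one column per site, the columns are ordered by longitude in 2.5 degree steps, and the time average runs over a window of days ending at day t.

    ~~~~
    # Sketch only.  Assumptions: anom is a (days x sites) matrix of seasonally
    # adjusted temperatures for equatorial grid points ordered by longitude;
    # t indexes the last day of the averaging window; tau is the delay in days.
    delayed.cov <- function(anom, t, tau, window = 365) {
      idx <- (t - window + 1):t                 # requires t - window + 1 - tau >= 1
      A   <- anom[idx, , drop = FALSE]          # T_i(t) over the window
      B   <- anom[idx - tau, , drop = FALSE]    # T_j(t - tau) over the window
      # C[i,j] = <T_i(t) T_j(t-tau)> - <T_i(t)> <T_j(t-tau)>
      crossprod(A, B) / length(idx) - outer(colMeans(A), colMeans(B))
    }

    # Plot the covariances against "distance", i.e. difference in longitude.
    plot.cov.vs.longitude <- function(C, step = 2.5) {
      n    <- nrow(C)
      dist <- abs(outer(1:n, 1:n, "-")) * step
      plot(dist, C, pch = ".",
           xlab = "longitude difference (degrees)",
           ylab = "time-delayed covariance")
    }
    ~~~~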
  • 43.
    edited June 2014

    One of the details to be ironed out is: covariance or correlation?

    My preliminary look at some covariances shows they are much bigger over land than sea, for example. Although the area only includes some bits of land near the corners, it could distort the results. Ludescher et al use correlations. I don't like correlations; they seem much harder to relate to anything physical.

    I have also been wondering why they use a whole year for their covariance (and correlation). I guess there are seasonal factors, even when seasonally adjusted temperatures are used. For example, if winds blow more strongly in some direction in a particular season, that could 'link' two points more strongly then. But you could use a second seasonal adjustment to cater for that. It seems shorter periods would give more up-to-date results, which seems helpful for predictions.

    Comment Source:One of the details to be ironed out is: covariance or correlation? My preliminary look at some covariances shows they are much bigger over land than sea, for example. Although the area only includes some bits of land near the corners, it could distort the results. Ludescher et al use correlations. I don't *like* correlations, they seem much harder to relate to anything physical. I have also been wondering why they use a whole year for their covariance (and correlation)? I guess there are seasonal factors, even when seasonally adjusted temperatures are used. For example, if winds blow more strongly in some direction in a particular season, that could 'link' two points more strongly then. But you could use a second seasonal adjustment to cater for that. It seems shorter periods would give more up-to-date results, which seems helpful for precdictions.
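    A trivial sketch for experimenting with both choices and with the window length, for a single pair of sites. cov() and cor() are base R (they use the n-1 denominator, which hardly matters here), and the series x and y are assumed to be seasonally adjusted temperatures at two grid points.

    ~~~~
    # Compare covariance and correlation for two anomaly series x and y,
    # over a window of w days ending at day t.
    windowed.cov.cor <- function(x, y, t, w) {
      idx <- (t - w + 1):t
      c(covariance  = cov(x[idx], y[idx]),
        correlation = cor(x[idx], y[idx]))
    }

    # e.g. compare a 365-day window with a 90-day one:
    # windowed.cov.cor(x, y, t = 1000, w = 365)
    # windowed.cov.cor(x, y, t = 1000, w = 90)
    ~~~~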
  • 44.
    edited June 2014

    By "correlation" you mean covariance normalized so it lies between -1 and 1? If so, I agree with your distaste: I don't like this normalization for what we're doing here. In physics we'd never normalize in this way. (We talk about "2-point functions" or "Green's functions" rather than "covariances".)

    My preliminary look at some covariances shows they are much bigger over land than sea, for example.

    Cool! It's worthwhile learning things like this, and I bet there's a large literature on it that we haven't tapped into.

    I have also been wondering why they use a whole year for their covariance (and correlation)?

    I agree we shouldn't be wedded to that. It averages out some seasonal variations that might be distracting. But I'd rather see these seasonal variations before I decide to ignore them! These seasonal variations could be interesting. Nominally we're trying to predict the next El Niño, but it's possible that other discoveries we make en route could be just as important.

    Comment Source:By "correlation" you mean covariance normalized so it lies between -1 and 1? If so, I agree with your distaste: I don't like this normalization for what we're doing here. In physics we'd never normalize in this way. (We talk about "2-point functions" or "Green's functions" rather than "covariances".) > My preliminary look at some covariances shows they are much bigger over land than sea, for example. Cool! It's worthwhile learning things like this, and I bet there's a large literature on it that we haven't tapped into. > I have also been wondering why they use a whole year for their covariance (and correlation)? I agree we shouldn't be wedded to that. It averages out some seasonal variations that might be distracting. But I'd rather _see_ these seasonal variations before I decide to ignore them! These seasonal variations could be interesting. Nominally we're trying to predict the next El Ni&ntilde;o, but it's possible that other discoveries we make en route could be just as important.
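    For definiteness, the normalization under discussion divides the covariance defined earlier by the standard deviations of the two series over the same averaging window:

    $$ r_{i,j}^{t}(-\tau) = \frac{C_{i,j}^{t}(-\tau)}{\sigma_i(t)\,\sigma_j(t-\tau)} $$

    where $\sigma_i(t)$ and $\sigma_j(t-\tau)$ are the standard deviations of $T_i(t)$ and $T_j(t-\tau)$ over that window; this is what constrains the result to lie between -1 and 1.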
  • 45.

    Here's part of that "large literature" on covariances in the Earth's climate:

    Teleconnection pattern calculation procedures, http://www.cpc.ncep.noaa.gov/products/precip/CWlink/daily_ao_index/history/method.shtml

    Teleconnections are, roughly, ways in which the weather at distant locations can be strongly correlated. To identify teleconnections, people take a big fat covariance matrix and try to break it down into simpler chunks, which are the teleconnections. The technique here is called empirical orthogonal functions (EOF). According to Wikipedia:

    It is the same as performing a principal components analysis on the data, except that the EOF method finds both time series and spatial patterns.

    Has anyone here spent time doing this kind of data analysis?

    Using this kind of analysis one can see that El Niño and other climate cycles are "real things" in a fairly objective way.

    Comment Source:Here's part of that "large literature" on covariances in the Earth's climate: * [Teleconnection pattern calculation procedures](http://www.cpc.ncep.noaa.gov/products/precip/CWlink/daily_ao_index/history/method.shtml). **Teleconnections** are, roughly, ways in which the weather at distant locations can be strongly correlated. To identify teleconnections, people take a big fat covariance matrix and try to break it down into simpler chunks, which are the teleconnections. The technique here is called [empirical orthogonal functions](http://en.wikipedia.org/wiki/Empirical_orthogonal_functions) (EOF). According to Wikipedia: > It is the same as performing a [principal components analysis](http://en.wikipedia.org/wiki/Principal_components_analysis) on the data, except that the EOF method finds both time series and spatial patterns. Has anyone here spent time doing this kind of data analysis? Using this kind of analysis one can see that El Ni&ntilde;o and other climate cycles are "real things" in a fairly objective way.
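    For anyone who wants to try it: since EOF analysis amounts to a principal components analysis of the space-time data matrix, base R's prcomp is enough for a first experiment. A sketch, assuming the anomalies are in a (days x grid points) matrix anom as in the flat format above, and ignoring the latitude-dependent area weighting that careful EOF analyses usually apply.

    ~~~~
    # Sketch of a bare-bones EOF analysis via principal components.
    # anom: matrix with one row per day and one column per grid point.
    eof <- prcomp(anom, center = TRUE, scale. = FALSE)

    eof$rotation[, 1]   # first spatial pattern (the leading EOF)
    eof$x[, 1]          # its time series (the leading principal component)
    summary(eof)        # fraction of variance captured by each EOF
    ~~~~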
  • 46.
    edited June 2014

    Here is what's happening now (animated maps of Pacific sea surface temperatures and their anomalies, from NOAA's Climate Prediction Center): http://www.cpc.ncep.noaa.gov/products/analysis_monitoring/enso_update/sstanim.shtml

    Comment Source:Here is what's happening now: <a href = "http://www.cpc.ncep.noaa.gov/products/analysis_monitoring/enso_update/sstanim.shtml"> <img src = "http://www.cpc.ncep.noaa.gov/products/analysis_monitoring/enso_update/sstanim.gif" alt = ""/> </a> <a href = "http://www.cpc.ncep.noaa.gov/products/analysis_monitoring/enso_update/sstanim.shtml"> <img src = "http://www.cpc.ncep.noaa.gov/products/analysis_monitoring/enso_update/sstaanim.gif" alt = ""/> </a>
  • 47.
    nad
    edited June 2014

    Here is what's happening now:

    I would like to see how good a statistical trend analysis via correlations, as in Ludescher et al, is in comparison to a visual analysis by a human here, especially since the article claims that:

    the inclusion of the two nodes south of the Equator is not essential for our results

    whereas, as already said, it appears that the anomalies off the South American coast are rather significant. In the Ludescher et al article they are probably "averaged out" by the other values. On the other hand, it may be that these anomalies come too late for a long-term prediction (where it seems that the speed of anomaly change in the corresponding Niño 1+2 region may eventually also play quite a role).

    Comment Source:> Here is what's happening now: > <a href = "http://www.cpc.ncep.noaa.gov/products/analysis_monitoring/enso_update/sstanim.shtml"> <img src = "http://www.cpc.ncep.noaa.gov/products/analysis_monitoring/enso_update/sstanim.gif" alt = ""/> </a> > <a href = "http://www.cpc.ncep.noaa.gov/products/analysis_monitoring/enso_update/sstanim.shtml"> <img src = "http://www.cpc.ncep.noaa.gov/products/analysis_monitoring/enso_update/sstaanim.gif" alt = ""/> </a> I would like to see how good the statistical trend analysis like via correlations in Ludescher et al is in comparision to a visual analysis by a human here, especially since the article <a href="http://forum.azimuthproject.org/discussion/1358/experiments-in-el-nino-detection-and-prediction/?Focus=10863#Comment_10863">claims that:</a> >the inclusion of the two nodes south of the Equator is not essential for our results where as <a href="http://forum.azimuthproject.org/discussion/1360/paper-ludescher-et-al-improved-el-nino-forecasting-by-cooperativity-detection/?Focus=10834#Comment_10834">already said</a> it appears that the anomalies in front of the South American Coast are rather significant. In the Ludescher et al article they are probably "averaged out" by the other values. On the other hand it may be that these anomalies are too late for a long term prediction (where it seems that the speed of anomaly change in the corresponding 1+2 region may eventually also play <a href="https://www.ncdc.noaa.gov/teleconnections/enso/indicators/sst.php">quite a role</a>).
  • 48.

    By “correlation” you mean covariance normalized so it lies between -1 and 1?

    Yes. I get why you might call covariances “2-point functions” but I thought “Green’s functions” were something more sophisticated.

    A long time ago I did some work on texture in grey-scale images. It appears that the human visual system only uses second order statistics to discriminate textures. "Second order statistics" means any function of the joint distribution of pairs of pixels; this includes information about the relative locations of the pairs. I'd like to look at the temperature maps from that point of view.

    Comment Source:> By “correlation” you mean covariance normalized so it lies between -1 and 1? Yes. I get why you might call covariances “2-point functions” but I thought “Green’s functions” were something more sophisticated. A long time ago I did some work on texture in grey-scale images. It appears that the human visual system only uses second order statistics to discriminate textures. "Second order statistics" means any function of the joint distribution of pairs of pixels; this includes information about the relative locations of the pairs. I'd like to look at the temperature maps from that point of view.
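    To make "second order statistics" concrete: one classical example is the co-occurrence (joint) histogram of quantised values at pairs of pixels a fixed offset apart. A small sketch, written for any numeric matrix so it could be pointed at a temperature map just as well as a grey-scale image; the function name and the non-negative offsets are my own choices, not anything from the thread.

    ~~~~
    # Sketch: joint histogram of quantised values at pixel pairs offset by (dx, dy).
    # img: any numeric matrix (grey-scale image or temperature map); dx, dy >= 0.
    cooccurrence <- function(img, dx, dy, nlevels = 8) {
      q  <- matrix(cut(img, nlevels, labels = FALSE), nrow(img))   # quantise to nlevels bins
      nr <- nrow(q); nc <- ncol(q)
      a  <- q[1:(nr - dy), 1:(nc - dx)]          # first pixel of each pair
      b  <- q[(1 + dy):nr, (1 + dx):nc]          # pixel at offset (dx, dy)
      table(factor(a, 1:nlevels), factor(b, 1:nlevels))
    }
    ~~~~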
  • 49.

    It appears that the human visual system only uses second order statistics to discriminate textures. “Second order statistics” means any function of the joint distribution of pairs of pixels; this includes information about the relative locations of the pairs.

    I think that the human visual system may make use of more than just pairs of pixels, such as the relative positions of three or more points. And the human visual system is rather sensitive to fast changes: something is noticed even at almost 90 degrees from the central viewing axis if it moves fast enough, so humans may detect a leopard jumping at them from behind more quickly.

    Comment Source:>It appears that the human visual system only uses second order statistics to discriminate textures. “Second order statistics” means any function of the joint distribution of pairs of pixels; this includes information about the relative locations of the pairs. I think that the human visual system may make use of more than just pairs of pixels, like the relative position of three etc. points. And the human visual system is rather sensitive to fast changes. E.g. something is noticed if it is almost in 90 degrees from the central view axis if it moves fastly enough. So humans may detect from behind jumping leopards faster.
  • 50.

    I am talking about discriminating textures, not seeing leopards.

    Comment Source:I am talking about discriminating textures, not seeing leopards.