
Studying the dependence of covariance on distance

John wrote:

It seems quite plausible to me that correlations between different locations increase as we approach a widespread event like an El Niño. In statistical mechanics we think a lot about 2-point functions - covariances between the value of some field $F$ at one point $i$ and another point $j$:

$$ C_{i,j} = \langle F_i F_j \rangle - \langle F_i \rangle \langle F_j \rangle $$

2-point functions typically decay exponentially as the distance between the points $i$ and $j$ increases. However, as our system approaches a phase transition, e.g. as a solid approaches its melting point, its 2-point functions decay more slowly, and right at the phase transition they often show power-law decay.

In other words: when something dramatic is on the brink of happening, the system displays a lot of correlation between distant locations.

Does the start of an El Niño act in this way? That seems like a good question.

We should take this up.

Comments

  • 1.
    edited June 2014

    It would be nice to have a series of graphs over time, showing the dependence of average link strength on distance. This could be visualized as a surface plot, with time and link distance as the independent variables.

    Perhaps we could find a suitable metric for how rapidly the link strength falls off as distance increases. Then we could plot this metric as a function of time, and try to relate it to an El Niño index.
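
    For instance, if the fall-off were roughly exponential, the decay length from a log-linear fit could serve as such a metric. Here is a minimal R sketch under that assumption; the `distance` and `link_strength` vectors are hypothetical placeholders for one time slice of the data.

    # Hypothetical sketch: summarise how fast mean link strength falls off with
    # distance by fitting  strength ~ exp(-distance / L)  and reporting L.
    # 'distance' and 'link_strength' are placeholders for one time slice.
    distance      <- seq(2.5, 65, by = 2.5)
    link_strength <- exp(-distance / 12) + rnorm(length(distance), sd = 0.005)  # fake data

    ok  <- link_strength > 0                    # log() needs positive values
    fit <- lm(log(link_strength[ok]) ~ distance[ok])
    decay_length <- -1 / coef(fit)[2]           # candidate metric to plot against time
    decay_length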

  • 2.
    edited June 2014

    This computation raises some technical issues.

    • An algorithmic issue, arising from the rectangular nature of the sampling grid.

    • A performance issue, arising from the quadratic number of pairs of nodes which can be correlated.

  • 3.
    edited June 2014

    Suppose we fix a time t, and want to compute the mean link strengths for a range of distances D = d, 2d, 3d, ....

    I haven't yet come up with an algorithm that I am satisfied with. In the following messages, I will present various ideas, and then discuss their strengths and weaknesses.

  • 4.
    edited June 2014

    Algo 1.

    Choose N random line segments L of length D, compute the link strengths between the nodes at the endpoints of each L, and return the mean link strength as an estimate.

    Details. Let A be the rectangle of interest. Choose a random point p1 = (x1, y1) in A. Then choose a random angle $\theta$ between 0 and 360 degrees, and let p2 be the point at distance D from p1 in that direction. If p2 lies outside of A, discard it and try again.

    For large enough N, we should get a good sampling of the mean. We could experiment with increasing N, to verify that the results converge.

    Strength: solves the performance problem.

    Major problem: data is sampled on a square grid with spacing of 2.5 degrees, so the endpoints of L will not fall on the grid points.

    Minor problem: for larger values of D, the range of possible samples gets artificially smaller, with the endpoints being concentrated near the edges of A. But presumably link strengths will be low across such large distances, so this should have little practical effect.
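
    Here is a rough R sketch of Algo 1, just to pin down the sampling loop. The function `cov_between` is a hypothetical stand-in for whatever link-strength computation we settle on, and the grid-snapping issue is ignored for now.

    # Sketch of Algo 1: estimate the mean link strength at separation D by sampling
    # N random segments of length D inside the rectangle A = [xmin,xmax] x [ymin,ymax].
    # cov_between(p1, p2) is a hypothetical placeholder for the real computation.
    estimate_mean_strength <- function(D, N, xmin, xmax, ymin, ymax, cov_between) {
      total <- 0
      kept  <- 0
      while (kept < N) {
        p1    <- c(runif(1, xmin, xmax), runif(1, ymin, ymax))
        theta <- runif(1, 0, 2 * pi)
        p2    <- p1 + D * c(cos(theta), sin(theta))
        if (p2[1] < xmin || p2[1] > xmax || p2[2] < ymin || p2[2] > ymax) next  # p2 outside A: discard
        total <- total + cov_between(p1, p2)
        kept  <- kept + 1
      }
      total / N
    }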

  • 5.
    edited June 2014

    To rectify this as much as possible, we could consider various strategies for "snapping" the endpoints of L to the nearest grid point.

    Then for the distance D, we will end up with a sequence of line segments whose lengths are "centered" around D, and an associated measure of link strength.

    We can then combine these records into an estimate of mean link strength, not necessarily attributed exactly to D. For instance, we might take the mean link strength, and attribute this to the mean of the actual line lengths.

  • 6.

    We could also partition the ray of distances into intervals of some size B. Then pick random line segments L (not of any specific length), compute each segment's length, determine which interval J it falls into, and put the link strength record into a bucket of measurements for J. When enough measurements have accumulated for J, combine them to produce an estimate of mean link strength for J.

    Major problem: if the link strengths fall off exponentially, or even according to a power law, then most of the activity of the resulting function will take place within a small range of small distances, and so we can't afford to make the results coarser by taking ranges of distances.

  • 7.
    edited June 2014

    The issue becomes clearest when you consider a separation of one grid spacing.

    At this radius, you can only get measurements of covariance in each of the four directions north, south, east, west.

    Do we give up on making any kind of estimate for the covariance, at distance 1, in the 45 degree directions? It doesn't seem nice to do this, because the larger distances will cover a much more comprehensive range of angles.

    The best estimates we have to work with are the covariances at the 45 degree angles, at a distance of $\sqrt 2$.

    Some ideas:

    • Include these covariances, but give them a smaller weight when computing the mean

    • Include them, unweighted, but, using the idea I mentioned at the end of message 6, attribute the mean link strength not to distance 1 but to the mean of the actual segment lengths, e.g. 1, 1, $\sqrt 2$, 1, $\sqrt 2$, ..., that contribute to the mean (see the sketch below).
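
    A minimal sketch of that second idea, with `axis_covs` and `diag_covs` as placeholder vectors of sampled covariances along the axes (separation 1) and along the diagonals (separation $\sqrt 2$):

    # Combine nearest-neighbour covariances along the axes and the diagonals,
    # attributing the mean link strength to the mean of the actual separations.
    axis_covs <- c(0.38, 0.35, 0.40, 0.37)   # placeholder values
    diag_covs <- c(0.31, 0.29)               # placeholder values
    covs <- c(axis_covs, diag_covs)
    seps <- c(rep(1, length(axis_covs)), rep(sqrt(2), length(diag_covs)))
    c(mean_cov = mean(covs), attributed_separation = mean(seps))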

  • 8.
    edited June 2014

    Maybe the grouping of distances into larger intervals is the way to go. The sampling resolution is 2.5 degrees, which is at most 172.5 miles, on the equator.

    In the paper:

    • Y. Berezin, A. Gozolchiani, O. Guez and S. Havlin, [Stability of Climate Networks with Time](http://www.nature.com/srep/2012/120918/srep00666/full/srep00666.html), Scientific Reports, 2012,

    the authors describe a "short" distance link as 750 km, and a long distance link as 1500 km.

    That paper deals extensively with the relationship between distance and link strength, but I didn't see any mention of the issues caused by a discrete grid.

  • 9.

    That's all that I have for now. Let me know what assessments people have.

    I'm leaning towards breaking the distances into intervals. But I'm a bit uncomfortable about just averaging the results in the interval of smallest radius, since exponential decay could produce a big drop within that interval, and the ordinary mean could be inappropriate there.

    Probably we should pick the most plausible, do-able approach, and see what it gives us.

  • 10.
    edited June 2014

    By the way, Graham, is there any particular reason why you plotted the median covariance, rather than the mean, in your first demo program?

    The treatment I have just given would have to be rethought if we were after the median.

  • 11.

    Covariances can have a very skewed distribution, with a few very large values, which dominate the mean. I'd also note you can get negative covariances, which could "cancel" positive ones elsewhere in a mean.

  • 12.
    edited June 2014

    Not that this is decisive, but as a point of reference note that Yamasaki et al. take the absolute value of the covariances, when forming their definition of link strength.

    Here is their full definition of link strength, where $X(r,\ell,y,\tau)$ is the covariance between nodes $r$ and $\ell$, for year $y$ and time lag $\tau$.

    $$ Max(\ell,r,y) = Max \{ |X(r,\ell,y,\tau)| : \: \tau_{min} \leq \tau \leq \tau_{max} \} $$

    $$ Std(\ell,r,y) = Std \{ |X(r,\ell,y,\tau)| : \: \tau_{min} \leq \tau \leq \tau_{max} \} $$

    $$ W(\ell,r,y) = Max(\ell,r,y) \, / \, Std(\ell,r,y) $$

    $W(\ell,r,y)$ is called the correlation strength of the link between $\ell$ and $r$ during year $y$.

    The value of $\tau$ at which $ Max(\ell,r,y) $ is achieved is the time delay between points $\ell$ and $r$ during year $y$.

    I noted that you didn't like the normalization by the standard deviation, but that's another matter.

    Also we haven't even gotten to the issue of maximization over varying $\tau$, which introduces another dimension to the performance costs.
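
    For concreteness, here is a minimal R sketch of that definition. The vector `x_tau` is a hypothetical placeholder for the covariances $X(r,\ell,y,\tau)$ over the allowed lags.

    # Yamasaki-style correlation strength for one pair of nodes and one year:
    # the maximum of |X| over the lags, divided by the standard deviation of |X| over the lags.
    correlation_strength <- function(x_tau) {
      a <- abs(x_tau)
      max(a) / sd(a)
    }

    # The time delay is the lag at which the maximum is achieved.
    time_delay <- function(x_tau, taus) taus[which.max(abs(x_tau))]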

  • 13.

    I've been looking at Ludescher et al, not Yamasaki et al. Ludescher et al do two normalisations. "we divide the cross-covariances by the corresponding standard deviations (SD) of Ti and Tj, to obtain the cross-correlations." Then they do what you said. It was that first normalisation I said I didn't like. (I'm not saying I like the other either!)

  • 14.
    edited June 2014

    What do you think of their use of the absolute value? Does that in any way address your concerns about negative covariances, or is it just a band-aid?

  • 15.

    I wouldn't call it a band-aid, but what they call the correlation strength is very ad hoc. There are dozens of other ways you could measure the strength of a link and no reason to choose this one in particular.

  • 16.
    edited June 2014

    Graham, I introduced some tangents in the discussion above, which I'd like to snip out by restating the point that I meant to make.

    I'm concerned about the computational cost of computing all covariances at all distances, which is why I proposed the sampling approach to estimating the mean covariance for each of the distances.

    You said you preferred the median because the covariances can be very unevenly distributed, and also because of your concern about positive and negative covariances cancelling each other out.

    But using sampling to estimate the median looks like a substantial problem in itself.

    I was wondering about borrowing the idea from Yamasaki et al. of taking the mean of the absolute values of the covariances. That way we could still use sampling, but it would avoid the effect of negative and positive covariances cancelling each other out.

    This still wouldn't address the first issue you raised, about the uneven distribution of the covariances.

  • 17.
    edited June 2014

    I’m concerned about the computational cost of computing all covariances at all distances

    I have put some code on github for faster calculation of covariances. It takes advantage of the fact that for Ludescher-type sets of covariances, there is much overlap between the ranges. It needs more testing before serious use. It does ~1000 days (every ten days for 28 years) by 401 time-delays (tau values) between two points in about 1.5s.

    But using sampling to estimate the median looks like a substantial problem in itself.

    Why? You have to collect all the samples to find the median, whereas with a mean you can keep a running total, but that's not a substantial problem.

  • 18.
    edited June 2014

    Here is a quote that I came across on the [web](http://individual.utoronto.ca/pivovarova/teaching/HW_21_SOLN.pdf):

    The sample median is an unbiased estimator of the population median when the population is normal. However, for a general population it is not true that the sample median is an unbiased estimator of the population median. The sample mean is a biased estimator of the population median when the population is not symmetric.

    But I don't have enough experience with statistics, or with our data, to assess the significance of this concern, in our case.

    So for now I'll stay with the median, but we should keep this footnote in our minds.

  • 19.
    edited June 2014

    I worked out the algorithm that I described above, using R. It does give good performance, and the data looks like a smoothly decreasing function of node separation.

    I'm running it overnight, to generate a table for each of the years 1950-1979, with one column for each of the node separations D = 0, 1, ..., 27, in units of 2.5 degrees. It will show the median covariance, taken over a sample of 10,000 randomly chosen segments for each integer value of D. For each randomly chosen segment, I use the floor of its length, measured in Cartesian grid units, to determine which bucket it should go into. The median of each bucket gets shown in the corresponding column.

    The algorithm needs a modification, though: it is working in Cartesian grid units, which doesn't take into account the fact that the distance between two adjacent nodes in the grid is a function of latitude.
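
    In outline, the bucketing step looks like the sketch below, where `segment_lengths` (in Cartesian grid units) and `segment_covs` are placeholder vectors with one entry per randomly chosen segment; the latitude correction mentioned above is not included.

    # Bucket each sampled segment by the floor of its length in grid units,
    # then report the median covariance for each bucket.
    median_cov_by_separation <- function(segment_lengths, segment_covs, max_sep = 27) {
      bucket <- floor(segment_lengths)
      sapply(0:max_sep, function(d) median(segment_covs[bucket == d]))
    }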

  • 20.
    edited June 2014

    Here is the data. Covariances were computed over a full 365 day window.

    This is still not corrected for the dependence of node distance on latitude.

    1950, 0.376725, 0.288752, 0.184332, 0.117790, 0.082108, 0.055115, 0.038667, 0.027804, 0.017671, 0.011439, 0.007086, 0.004478, 0.001744, -0.001047, -0.003089, -0.003710, -0.003404, -0.004410, -0.003479, -0.003606, -0.004439, -0.003517, -0.002411, -0.002327, -0.000939, -0.000110, 0.003093

    1951, 0.497873, 0.378028, 0.267959, 0.206597, 0.164560, 0.121937, 0.084815, 0.065603, 0.045090, 0.035339, 0.029342, 0.023507, 0.020172, 0.019567, 0.015739, 0.013974, 0.010840, 0.010878, 0.011903, 0.014327, 0.013779, 0.019440, 0.018217, 0.019181, 0.019640, 0.018455, 0.015653

    1952, 0.384914, 0.298311, 0.193402, 0.136403, 0.096268, 0.063049, 0.046147, 0.035085, 0.026154, 0.019055, 0.014410, 0.008885, 0.007904, 0.005009, 0.004154, 0.003535, 0.002084, 0.001440, 0.001872, 0.001446, 0.002742, 0.003582, 0.005336, 0.005406, 0.003590, 0.004583, 0.004938

    1953, 0.404755, 0.299675, 0.193846, 0.131324, 0.087064, 0.053213, 0.037445, 0.028035, 0.020515, 0.015935, 0.012084, 0.010311, 0.008631, 0.008659, 0.009659, 0.009075, 0.009953, 0.010439, 0.012139, 0.012957, 0.012625, 0.010603, 0.009926, 0.005277, 0.004812, 0.001944, 0.001748

    1954, 0.399608, 0.324695, 0.232220, 0.179475, 0.143515, 0.108126, 0.081813, 0.062538, 0.051074, 0.040224, 0.033944, 0.028852, 0.025769, 0.023752, 0.021954, 0.019374, 0.017265, 0.017973, 0.018237, 0.018248, 0.018128, 0.017861, 0.019269, 0.022279, 0.023335, 0.026768, 0.033545

    1955, 0.423631, 0.307121, 0.223340, 0.170559, 0.132915, 0.089765, 0.061923, 0.042745, 0.029450, 0.019171, 0.012016, 0.007414, 0.003746, 0.002021, 0.000216, -0.000743, -0.002419, -0.003040, -0.003243, -0.003336, -0.003471, -0.002443, -0.001914, -0.000111, 0.000197, 0.001615, 0.004419

    1956, 0.398572, 0.297339, 0.211447, 0.146149, 0.101105, 0.063289, 0.044106, 0.032605, 0.022991, 0.016632, 0.012465, 0.011271, 0.009290, 0.010205, 0.009812, 0.008534, 0.009789, 0.009654, 0.011038, 0.009764, 0.009680, 0.008852, 0.007636, 0.007611, 0.008158, 0.008140, 0.008346

    1957, 0.543840, 0.401238, 0.300696, 0.229548, 0.170417, 0.117823, 0.085493, 0.058618, 0.038805, 0.026047, 0.018612, 0.011135, 0.011133, 0.005943, 0.006544, 0.007047, 0.007505, 0.009342, 0.007762, 0.008633, 0.008941, 0.012584, 0.012921, 0.013848, 0.013240, 0.010421, 0.016420

    1958, 0.401656, 0.320032, 0.211704, 0.163894, 0.121845, 0.083730, 0.055344, 0.036292, 0.022771, 0.014026, 0.007086, 0.002262, 0.000560, -0.001399, -0.003168, -0.004113, -0.003320, -0.003257, -0.002088, -0.001063, -0.000352, -0.000769, 0.001491, 0.001296, 0.000530, -0.000339, 0.001791

    1959, 0.406817, 0.317248, 0.203533, 0.152294, 0.114097, 0.082428, 0.059843, 0.044288, 0.032283, 0.022619, 0.017355, 0.012539, 0.010661, 0.009816, 0.010079, 0.009338, 0.010045, 0.008826, 0.009526, 0.009328, 0.006070, 0.007269, 0.002881, 0.003538, 0.002745, 0.002050, 0.004178

    1960, 0.404540, 0.293296, 0.182790, 0.126116, 0.081986, 0.053903, 0.038619, 0.029649, 0.022639, 0.018154, 0.014553, 0.011814, 0.009597, 0.007263, 0.006036, 0.005744, 0.005145, 0.005096, 0.005608, 0.005323, 0.004823, 0.005340, 0.004700, 0.003929, 0.004074, 0.003112, 0.000932

    1961, 0.436981, 0.318925, 0.227808, 0.170093, 0.131357, 0.096418, 0.074433, 0.059509, 0.050305, 0.043425, 0.038501, 0.032744, 0.030665, 0.026243, 0.025636, 0.024662, 0.024857, 0.023646, 0.023234, 0.023648, 0.024311, 0.023317, 0.023183, 0.022564, 0.019814, 0.019071, 0.021283

    1962, 0.396058, 0.299080, 0.187232, 0.129761, 0.090867, 0.063143, 0.045590, 0.035252, 0.027418, 0.020875, 0.016034, 0.013393, 0.010684, 0.008866, 0.008027, 0.008095, 0.007091, 0.006710, 0.006786, 0.005350, 0.005228, 0.005071, 0.004673, 0.003920, 0.001334, 0.000442, 0.001578

    1963, 0.479960, 0.365897, 0.249881, 0.193506, 0.152478, 0.110162, 0.073808, 0.050997, 0.035377, 0.023616, 0.018152, 0.011323, 0.009910, 0.004800, 0.002581, 0.000238, 0.001071, 0.002614, 0.002234, 0.002974, 0.004150, 0.004119, 0.004514, 0.005624, 0.004134, 0.003697, 0.004461

    1964, 0.413449, 0.332507, 0.259229, 0.205005, 0.165699, 0.114814, 0.076885, 0.055181, 0.041309, 0.031944, 0.025338, 0.018501, 0.016807, 0.013228, 0.013626, 0.012466, 0.014382, 0.015428, 0.016407, 0.016345, 0.016640, 0.018331, 0.016619, 0.020420, 0.016845, 0.020146, 0.026646

    1965, 0.530515, 0.432047, 0.337793, 0.268720, 0.213892, 0.145845, 0.097379, 0.068950, 0.047180, 0.036840, 0.029960, 0.021405, 0.019833, 0.013756, 0.008812, 0.006022, 0.005071, 0.004931, 0.005695, 0.008717, 0.011845, 0.013156, 0.013968, 0.014541, 0.010263, 0.009971, 0.011144

  • 21.
    edited June 2014

    1966, 0.431148, 0.331803, 0.238255, 0.176686, 0.141642, 0.089962, 0.060394, 0.039816, 0.022766, 0.011258, 0.004415, -0.004761, -0.006303, -0.007417, -0.009172, -0.008656, -0.007522, -0.005489, -0.002784, -0.001813, -0.000654, -0.001337, 0.000206, -0.001158, -0.001903, -0.001847, -0.004526

    1967, 0.417590, 0.315164, 0.213519, 0.146833, 0.105399, 0.067396, 0.045697, 0.032650, 0.020687, 0.013863, 0.008831, 0.004853, 0.004300, 0.001801, 0.001962, 0.001687, 0.000790, 0.001165, 0.001586, 0.002647, 0.002899, 0.002957, 0.005071, 0.005849, 0.005952, 0.004707, 0.006682

    1968, 0.512509, 0.404913, 0.289787, 0.219778, 0.171534, 0.126884, 0.096705, 0.076529, 0.059857, 0.048526, 0.042656, 0.036140, 0.032940, 0.028841, 0.026615, 0.025556, 0.024169, 0.024714, 0.026404, 0.027640, 0.027231, 0.029520, 0.029747, 0.029645, 0.028538, 0.029928, 0.025921

    1969, 0.415270, 0.313612, 0.212363, 0.153155, 0.114511, 0.080217, 0.059733, 0.045833, 0.031744, 0.024738, 0.018835, 0.014443, 0.011237, 0.008032, 0.007759, 0.007580, 0.007433, 0.005875, 0.005508, 0.004729, 0.003742, 0.002756, 0.002415, 0.001675, 0.003120, 0.002357, -0.002137

    1970, 0.497127, 0.412444, 0.294776, 0.226329, 0.166967, 0.116271, 0.087155, 0.059640, 0.044915, 0.033142, 0.027946, 0.020460, 0.016456, 0.012873, 0.011462, 0.011314, 0.009976, 0.009153, 0.012228, 0.012125, 0.012177, 0.011530, 0.011566, 0.008241, 0.006988, 0.006295, 0.008723

    1971, 0.424364, 0.332461, 0.224057, 0.157024, 0.110483, 0.072468, 0.053449, 0.038617, 0.028035, 0.019310, 0.013372, 0.007930, 0.004398, 0.000019, -0.001477, -0.001690, -0.002654, -0.002683, -0.001316, 0.000114, 0.000257, -0.001043, -0.001038, -0.000761, -0.002161, -0.001195, -0.002994

    1972, 0.691285, 0.591039, 0.476096, 0.361859, 0.275870, 0.162873, 0.107006, 0.071540, 0.051868, 0.036460, 0.027582, 0.017843, 0.013376, 0.008525, 0.006463, 0.004207, 0.005106, 0.004518, 0.006773, 0.005468, 0.005762, 0.005188, 0.004774, 0.003143, 0.000716, 0.000555, -0.000404

    1973, 0.719328, 0.602857, 0.442475, 0.339179, 0.267357, 0.185539, 0.128436, 0.085810, 0.059455, 0.040934, 0.030712, 0.021248, 0.014541, 0.010812, 0.007942, 0.009611, 0.005944, 0.005143, 0.005824, 0.002407, 0.005623, 0.006767, 0.005630, 0.005233, 0.003683, 0.006677, 0.010963

    1974, 0.395127, 0.309754, 0.225025, 0.179570, 0.143235, 0.102333, 0.076058, 0.049868, 0.037540, 0.027670, 0.018539, 0.013229, 0.010906, 0.007226, 0.006533, 0.003032, 0.005356, 0.004035, 0.007664, 0.007887, 0.007309, 0.007498, 0.009876, 0.011102, 0.010993, 0.008149, 0.013693

    1975, 0.411585, 0.314415, 0.233452, 0.183707, 0.149873, 0.108196, 0.077837, 0.058354, 0.045755, 0.034201, 0.027057, 0.019275, 0.017942, 0.014944, 0.011740, 0.008582, 0.008528, 0.007633, 0.004492, 0.005337, 0.004136, 0.005322, 0.006265, 0.004904, 0.004796, 0.006067, 0.007700

    1976, 0.701437, 0.611954, 0.486288, 0.387918, 0.311690, 0.232872, 0.153192, 0.118535, 0.088980, 0.075090, 0.060840, 0.052749, 0.054625, 0.044829, 0.041111, 0.035870, 0.032889, 0.031577, 0.032975, 0.033839, 0.038191, 0.037870, 0.039298, 0.041152, 0.041833, 0.041602, 0.061608

    1977, 0.487561, 0.364822, 0.239694, 0.175498, 0.128164, 0.089747, 0.063153, 0.045851, 0.029987, 0.019526, 0.010389, 0.005615, 0.000417, -0.002110, -0.005536, -0.006853, -0.007390, -0.006935, -0.005430, -0.004005, -0.000692, -0.002648, -0.001338, 0.001523, 0.001367, 0.002584, 0.009944

    1978, 0.485333, 0.372565, 0.262490, 0.192783, 0.146089, 0.105030, 0.075198, 0.055268, 0.039472, 0.028793, 0.020672, 0.014129, 0.013400, 0.009390, 0.009295, 0.005212, 0.005422, 0.003599, 0.005449, 0.004420, 0.004095, 0.004498, 0.003332, 0.001888, 0.000352, -0.003374, -0.006576

    1979, 0.487816, 0.352541, 0.232255, 0.178560, 0.136964, 0.094659, 0.067236, 0.047334, 0.033249, 0.022715, 0.014910, 0.008833, 0.008597, 0.006866, 0.005765, 0.006084, 0.005813, 0.005550, 0.005076, 0.007923, 0.007074, 0.008297, 0.009846, 0.010525, 0.010280, 0.011980, 0.008838

  • 22.

    Thanks! But year 1971 is missing 6 numbers!

  • 23.
    edited June 2014

    Hi, I just pasted those numbers in above.

  • 24.

    The top three graphs show the square roots of the absolute values of the covariances. The bottom three are similar, but the values are scaled so that the covariance for the smallest distance is always 1.

    ![Covariance vs. distance by year, 1950-1979](http://www.azimuthproject.org/azimuth/files/cov-dists-1950-1979.png)
  • 25.
    edited June 2014

    Great!

    I'll have to ponder these... in the 1950s, the purple year 1954 seems to have remarkably high covariances over long distances. What happened in 1954? In the 1960s, it seems 1968 and 1969 are the winners.

  • 26.

    David wrote:

    Suppose we fix a time t, and want to compute the mean link strengths for a range of distances D = d, 2d, 3d, ….

    What exactly is meant by "node separation"? How exactly did you compute this distance in "degrees", Graham? Is 68 the maximal distance?

  • 27.

    What exactly is meant by “node separation” ? How exactly did you compute this distance in “degrees” Graham?

    There were 27 values. I just multiplied by 2.5, which is the separation of the NOAA data. This was a guess.

    Is 68 the maximal distance?

    In the graphs, it is 27*2.5 = 67.5. I suspect I should have multiplied (0 to 26) by 2.5, not (1 to 27), so it probably should be 26*2.5 = 65.

    We'll have to ask David to be sure.

  • 28.
    edited June 2014

    Graham, that's a good visualization of the data, thanks.

    The first value was for a separation of zero degrees. So multiplication by 2.5 is correct.

    For each degree of separation D, the graph shows the square root of the absolute value of the median covariance -- using a sampling estimate of 10,000 random pairs in the neighborhood of each value for D.

    Should I rerun it for the whole period, up to 2010? Also, I can increase the number of samples, say to 20,000, to see what kind of impact that has.

    For purposes of peer-review, I'll post a pseudo-code description of the algorithm here, and also will post the code to Github. (Bugs are always possible, especially in the first cut of a program.)

  • 29.
    edited June 2014

    Here's the code I used for the graphs. David, feel free to incorporate this into your code on github.

    # Read the table of median covariances posted in the comments above:
    # 30 rows (years 1950-1979), 28 columns (the year plus 27 median covariances).
    x <- scan("C:/Users/Work/Desktop/dtanzerdists.txt", sep=',')
    
    xm <- matrix(x, nrow=30, ncol=28, byrow=TRUE)
    
    # Drop the year column, keeping only the covariances.
    covs <- xm[,-1]
    
    # Square roots of the absolute covariances, raw and normalised so that the
    # value at the smallest separation is 1 for every year.
    sqrtcovs <- matrix(0, ncol=ncol(covs), nrow=nrow(covs))
    sqrtnormcovs <- matrix(0, ncol=ncol(covs), nrow=nrow(covs))
    for (y in 1:30) {
      sqrtcovs[y,] <- sqrt(abs(covs[y,]))
    }
    for (y in 1:30) {
      sqrtnormcovs[y,] <- sqrt(abs(covs[y,] / covs[y,1] ))
    }
    
    # One colour per year within a decade.
    clrs <- c("#d4004fff", "#ff8546ff", "#c68500ff", "#f5d22dff", "#b3b800ff",
      "#8ff66aff", "#00b09cff", "#8ec1ffff", "#6917ddff", "#db7effff")
    
    # Six panels: one per decade for the raw values (top row), then one per
    # decade for the normalised values (bottom row).
    oldpar <- par(mfrow=c(2,3))
    for (i in 1:3) {
      dec <- c(1950,1960,1970)[i]
      matplot((1:27)*2.5, t(sqrtcovs[(i*10-9):(i*10),]), ylim=c(min(sqrtcovs),max(sqrtcovs)),
              main=paste0(dec, "s"), xlab="distance in degrees", 
              ylab="sqrt of abs(covariance)", type='l', col=clrs, lty=1, lwd=2)
      legend(x="topright", legend=paste(as.character(dec+(0:9))), col=clrs, lty=1, lwd=2)
    }
    for (i in 1:3) {
      dec <- c(1950,1960,1970)[i]
      matplot((1:27)*2.5, t(sqrtnormcovs[(i*10-9):(i*10),]), ylim=c(0,1),
              main=paste0(dec, "s"), xlab="distance in degrees", 
              ylab="sqrt of normalised abs(covariance)", type='l', col=clrs, lty=1, lwd=2)
      legend(x="topright", legend=paste(as.character(dec+(0:9))), col=clrs, lty=1, lwd=2)
    }
    par(oldpar)
    
  • 30.

    Should I rerun it for the whole period, up to 2010? Also, I can increase the number of samples, say to 20,000, to see what kind of impact that has.

    I think other things would be more interesting:

    • fix the distance calculation to take account of the roundness of the world - does that affect the results?

    • do east-west distances behave differently from north-south ones?

    • what happens if you use a larger area?

  • 31.

    Re: your first point, I'm noting a blog article on [great circle distance calculations in R](http://www.r-bloggers.com/great-circle-distance-calculations-in-r/), from which I'm planning to use the haversine formula.
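
    In case it's useful, here is a minimal haversine sketch of the kind described there (using a mean Earth radius of about 6371 km):

    # Great-circle distance between two (longitude, latitude) points given in degrees,
    # via the haversine formula; returns kilometres.
    haversine_km <- function(lon1, lat1, lon2, lat2, R = 6371) {
      to_rad <- pi / 180
      dlat <- (lat2 - lat1) * to_rad
      dlon <- (lon2 - lon1) * to_rad
      a <- sin(dlat / 2)^2 + cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
      2 * R * asin(pmin(1, sqrt(a)))
    }

    haversine_km(0, 0, 2.5, 0)   # about 278 km: one 2.5 degree grid step on the equator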

  • 32.
    What is the raw data you are all working with? How can I download it? If you are trying to find "important" correlations, glasso may be worth trying: http://statweb.stanford.edu/~tibs/glasso/
  • 33.

    Daniel, see this [comment](http://forum.azimuthproject.org/discussion/1366/r-programming-language/?Focus=11072#Comment_11072) and others in that thread.

  • 34.

    Thanks Graham. Where can I find out what all the file types in ftp://ftp.cdc.noaa.gov/Datasets/ncep.reanalysis.dailyavgs/surface/ are?

  • 35.

    I assume from your post in the R programming thread that you already know as much as I do about the file formats, namely: RNetCDF works.
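
    For the record, here is the minimal reading pattern with RNetCDF; the file name is just one example of the files in that directory, and the dimension order should be checked with `print.nc`.

    # Open one of the NCEP daily-average surface files and read the temperature
    # array together with its coordinate axes. The file name is an example.
    library(RNetCDF)
    nc  <- open.nc("air.sig995.1950.nc")
    print.nc(nc)                    # lists dimensions, variables and attributes
    air <- var.get.nc(nc, "air")    # expected to be indexed by (lon, lat, time)
    lon <- var.get.nc(nc, "lon")
    lat <- var.get.nc(nc, "lat")
    close.nc(nc)
    dim(air)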

  • 36.

    Yes, but my last question about file types was about the semantics. There are a lot of other files at ftp://ftp.cdc.noaa.gov/Datasets/ncep.reanalysis.dailyavgs/surface/ besides what we are working with here, so I was wondering if there was some key to what they all represent.

  • 37.
    edited June 2014

    Instead of looking around the ftp site, it could be easier to visit their website:

    • [NCEP/NCAR Reanalysis 1: Surface](http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanalysis.surface.html)

    This describes (and provides access to) lots of files, which I hope correspond to those here:

    • ftp://ftp.cdc.noaa.gov/Datasets/ncep.reanalysis.dailyavgs/surface/

    You can get even more stuff if you go up to

    • [NCEP/NCAR Reanalysis 1: Summary](http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanalysis.html)

    which provides access to 7 kinds of files:

    • Pressure level
    • Surface
    • Surface Fluxes
    • Other Fluxes
    • Tropopause
    • Derived Data
    • Spectral Coefficients
  • 38.
    edited June 2014

    NOAA seems very shy about the units they use. This [site](http://icdc.zmaw.de/ncep.html?&L=1) has some details.

  • 39.
    edited July 2014

    Here is the path to the code on GitHub which I used to generate the data for the median covariance as a function of node separation: [covariance-by-distance.R](https://raw.githubusercontent.com/azimuth-project/el-nino/master/R/tanzer/covariance-by-distance.R).

    I posted this data in messages 21 and 22, and Graham plotted it in message 25.

    The first part of the code consists of Graham's functions for reading the data, and performing the seasonal adjustment of the temperatures. The second part of the file contains the functions I wrote to compute the covariance by distance.

    It's my first R program, and it is highly unpolished. In particular, I'm just printing the .csv data to the console, and running the program through "tee output.txt". It took most of the night to generate the output.

    (Overall it seems more efficient for us to post code that generates data, even when it is not in the greatest form.)

  • 40.
    edited July 2014

    Yes, it's good to post code.

    I'm going to want to start writing a lot of short blog articles that show graphs and other visualizations, with links to the code that made them, and just enough explanation that people can understand what they're seeing. People here are starting to generate lots of visualizations, and I think blog articles would be a way to make it easier to find these. I think people would enjoy reading them, too.

    The trick will be to not make a big deal of it. A graph may or may not be very important; it can still be good to see and discuss.

    I will need to let these blog posts interweave with the more carefully polished "El Niño project (part $n$)" posts.
