John wrote:
It seems quite plausible to me that correlations between different locations increase as we approach a widespread event like an El Niño. In statistical mechanics we think a lot about 2-point functions - covariances between the value of some field $F$ at one point $i$ and another point $j$:
$$ C_{i,j} = \langle F_i F_j \rangle - \langle F_i \rangle \langle F_j \rangle $$

2-point functions typically decay exponentially as the distance between the points $i$ and $j$ increases. However, as our system approaches a phase transition, e.g. as a solid approaches its melting point, its 2-point functions decay more slowly, and right at the phase transition they often show power-law decay. In other words: when something dramatic is on the brink of happening, the system displays a lot of correlation between distant locations.
Does the start of an El Niño act in this way? That seems like a good question.
We should take this up.
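For concreteness, here is a minimal R sketch of estimating such a 2-point function from two sample series. The series below are made-up placeholders, not our actual data, just to show the computation:

~~~~
# estimate C_ij = <F_i F_j> - <F_i><F_j> from two sample time series
two_point <- function(Fi, Fj) {
  mean(Fi * Fj) - mean(Fi) * mean(Fj)
}

# toy example: two series sharing a common component
set.seed(1)
common <- rnorm(1000)
Fi <- common + rnorm(1000)
Fj <- 0.5 * common + rnorm(1000)
two_point(Fi, Fj)   # roughly 0.5 for this construction
~~~~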
Comments
It would be nice to have a series of graphs over time, showing the dependence of average link strength on distance. This could be visualized as a surface plot, with independent variable time and link distance.
Perhaps we could find a suitable metric for how rapidly the link strength falls off as distance increases. Then we could plot this metric as a function of time, and try to relate it to an El Niño index.
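To give an idea of what such a surface plot could look like, here is a sketch with fabricated numbers, just to show the plotting calls, not a real result:

~~~~
# hypothetical matrix of mean link strength: one row per time step, one column per distance
times <- 1:30                    # e.g. years
dists <- (1:27) * 2.5            # separations in degrees
ls_mat <- outer(times, dists, function(t, d) exp(-d / (5 + 0.1 * t)))  # fake data

persp(times, dists, ls_mat, theta = 40, phi = 25,
      xlab = "time", ylab = "distance (degrees)", zlab = "mean link strength")

# or as a heat map
filled.contour(times, dists, ls_mat, xlab = "time", ylab = "distance (degrees)")
~~~~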
This computation raises some technical issues.
* An algorithmic issue, arising from the rectangular nature of the sampling grid.
* A performance issue, arising from the quadratic number of pairs of nodes which can be correlated.
Suppose we fix a time t, and want to compute the mean link strengths for a range of distances D = d, 2d, 3d, ....
I haven't yet come up with an algorithm that I am satisfied with. In the following messages, I will present various ideas, and then discuss their strengths and weaknesses.
Algo 1.
Choose N random line segments L of length D, compute the link strengths between the nodes at the endpoints of L, and return the mean link strength as an estimate.
Details. Let A be the rectangle of interest. Choose a random point p1 = (x1,y1) in A. Then choose a random angle $\theta$ between 0 and 360 degrees, and let p2 be the point at distance D from p1 in this direction. If p2 lies outside of A, then chuck it and try again.
For large enough N, we should get a good estimate of the mean. We could experiment with increasing N, to verify that the results converge.
Strength: solves the performance problem.
Major problem: data is sampled on a square grid with spacing of 2.5 degrees, so the endpoints of L will not fall on the grid points.
Minor problem: for larger values of D, the range of possible samples gets artificially smaller, with the endpoints being concentrated near the edges of A. But presumably link strengths will be low across such large distances, so the problem is mostly washed out.
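A minimal R sketch of the sampling step (the rectangle bounds and N are arbitrary; the link-strength computation still has to be plugged in):

~~~~
# Algo 1, sampling step: N random segments of length D inside A = [xmin,xmax] x [ymin,ymax];
# segments whose second endpoint falls outside A are chucked and re-drawn
sample_segments <- function(N, D, xmin, xmax, ymin, ymax) {
  out <- matrix(NA_real_, nrow = N, ncol = 4,
                dimnames = list(NULL, c("x1", "y1", "x2", "y2")))
  n <- 0
  while (n < N) {
    x1 <- runif(1, xmin, xmax)
    y1 <- runif(1, ymin, ymax)
    theta <- runif(1, 0, 2 * pi)
    x2 <- x1 + D * cos(theta)
    y2 <- y1 + D * sin(theta)
    if (x2 >= xmin && x2 <= xmax && y2 >= ymin && y2 <= ymax) {
      n <- n + 1
      out[n, ] <- c(x1, y1, x2, y2)
    }
  }
  out
}
~~~~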
To rectify this, as much as possible, consider various strategies for "snapping" the endpoints of L to the nearest grid point.
Then for the distance D, we will end up with a sequence of line segments whose lengths are "centered" around D, and an associated measure of link strength.
We can then combine these records into an estimate of mean link strength, not necessarily attributed exactly to D. For instance, we might take the mean link strength, and attribute this to the mean of the actual line lengths.
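As a rough sketch of what the snapping and the "mean of actual lengths" bookkeeping might look like (the helper names here are mine, not from any existing code):

~~~~
# snap a coordinate to the nearest grid node; the NOAA grid spacing is 2.5 degrees
snap <- function(p, step = 2.5) round(p / step) * step

# given a matrix of snapped endpoints (columns x1, y1, x2, y2), attribute the
# mean link strength to the mean of the actual (post-snapping) segment lengths
effective_distance <- function(segs) {
  lens <- sqrt((segs[, "x2"] - segs[, "x1"])^2 + (segs[, "y2"] - segs[, "y1"])^2)
  mean(lens)
}
~~~~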
We could also partition the ray of distances into intervals of some size B. Then pick random line segments L (not of any specific length), check their length, determine which interval J it fits into, and put the link strength record into a bucket of measurements for J. When enough measurements have accumulated for J, combine them to produce an estimate of mean link strength for J.
Major problem: if the link strengths fall off exponentially, or even according to a power law, then most of the activity of the resulting function will take place within a small range of small distances, and so we can't afford to make the results coarser by taking ranges of distances.
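A minimal sketch of the bucketing idea, assuming we already have vectors of sampled pair distances and link strengths (whatever measure we settle on):

~~~~
# bucket sampled pairs by distance intervals of width B, then summarise each bucket
bucket_link_strengths <- function(pair_dist, pair_strength, B, dmax) {
  breaks <- seq(0, dmax, by = B)
  bucket <- findInterval(pair_dist, breaks)
  tapply(pair_strength, bucket, mean)   # or median, depending on the discussion below
}
~~~~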
The issue becomes most acutely clear when you consider a distance of one grid point.
At this radius, you can only get measurements of covariance in each of the four directions north, south, east, west.
Do we give up on making any kind of estimate for the covariance, at distance 1, in the 45 degree directions? It doesn't seem nice to do this, because the larger distances will cover a much more comprehensive range of angles.
The best estimates we have to work with are the covariances at the 45 degree angles, for a distance of $\sqrt 2$.
Some ideas:
* Include these covariances, but give them a smaller weight when computing the mean.
* Include them, unweighted, but, using the idea I mentioned at the end of message 6, attribute the mean link strength not to distance 1 but to the mean of the actual segment lengths contributing to it, e.g. 1, 1, $\sqrt 2$, 1, $\sqrt 2$, ....
Maybe the grouping of distances into larger intervals is the way to go. The sampling resolution is 2.5 degrees, which is at most about 172.5 miles (roughly 278 km), on the equator.
In the paper:

* Y. Berezin, A. Gozolchiani, O. Guez & S. Havlin, [Stability of Climate Networks with Time](http://www.nature.com/srep/2012/120918/srep00666/full/srep00666.html), 2012, Scientific Reports,

the authors describe a "short" distance link as 750 km, and a long distance link as 1500 km.
That paper deals extensively with the relationship between distance and link strength, but I didn't see any mention of the issues caused by a discrete grid.
That's all that I have for now. Let me know what assessments people have.
I'm leaning towards breaking the distances into intervals. Only I'm a bit uncomfortable about just averaging the results in the interval of smallest radius, since there could be a big drop-off taking place due to exponential decay, and the ordinary mean could be inappropriate there.
Probably we should pick the most plausible, do-able approach, and see what it gives us.
By the way, Graham, is there any particular reason why you plotted the median covariance, rather than the mean, in your first demo program?
The treatment I have just given would have to be rethought if we were after the median.
Covariances can have a very skewed distribution, with a few very large values, which dominate the mean. I'd also note you can get negative covariances, which could "cancel" positive ones elsewhere in a mean.
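A toy illustration of the point (the numbers are made up):

~~~~
# a few large covariances dominate the mean; negative ones can also cancel positive ones
covs <- c(rep(0.01, 95), 0.9, 1.1, 1.3, -0.8, -1.0)
mean(covs)     # pulled away from the bulk of the values
median(covs)   # stays at 0.01
~~~~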
Not that this is decisive, but as a point of reference note that Yamasaki et al. take the absolute value of the covariances, when forming their definition of link strength.
Here is their full definition of link strength, where $X(r,\ell,y,\tau)$ is the covariance between nodes $r$ and $\ell$, for year $y$ and time lag $\tau$:

> $$ Max(\ell,r,y) = Max \{ |X(r,\ell,y,\tau)| : \: \tau_{min} \leq \tau \leq \tau_{max} \} $$
> $$ Std(\ell,r,y) = Std \{ |X(r,\ell,y,\tau)| : \: \tau_{min} \leq \tau \leq \tau_{max} \} $$
> $$ W(\ell,r,y) = Max(\ell,r,y) \, / \, Std(\ell,r,y) $$
> $W(\ell,r,y)$ is called the _correlation strength_ of the link between $\ell$ and $r$ during year $y$.
> The value of $\tau$ at which $Max(\ell,r,y)$ is achieved is the _time delay_ between points $\ell$ and $r$ during year $y$.
I noted that you didn't like the normalization by the standard deviation, but that's another matter.
Also we haven't even gotten to the issue of maximization over varying $\tau$, which introduces another dimension to the performance costs.
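In code, assuming `X_tau` is the vector of covariances $X(r,\ell,y,\tau)$ over the lags $\tau_{min}, \dots, \tau_{max}$ for one pair of nodes and one year, their definition works out to something like:

~~~~
# Yamasaki-style correlation strength for one node pair and one year
correlation_strength <- function(X_tau) {
  a <- abs(X_tau)
  W <- max(a) / sd(a)        # Max / Std of |X| over the lags
  delay <- which.max(a)      # index (within the lag range) where the max is achieved
  list(W = W, delay_index = delay)
}
~~~~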
I've been looking at Ludescher et al, not Yamasaki et al. Ludescher et al do two normalisations. "We divide the cross-covariances by the corresponding standard deviations (SD) of Ti and Tj, to obtain the cross-correlations." Then they do what you said. It was that first normalisation I said I didn't like. (I'm not saying I like the other either!)
What do you think of their use of the absolute value? Does that in any way address your concerns about negative covariances -- or is it just a band-aid?
I wouldn't call it a band-aid, but what they call the correlation strength is very ad hoc. There are dozens of other ways you could measure the strength of a link and no reason to choose this one in particular.
Graham, I introduced some tangents in the discussion above, which I'd like to snip out by restating the point that I meant to make.
I'm concerned about the computational cost of computing all covariances at all distances, which is why I proposed the sampling approach to estimating the mean covariance for each of the distances.
You said you preferred the median because the covariances can be very unevenly distributed, and also because of your concern about positive and negative covariances cancelling each other out.
But using sampling to estimate the median looks like a substantial problem in itself.
I was wondering about borrowing the idea from Yamasaki et al. of taking the mean of the absolute values of the covariance. That way we could still use sampling, but it would avoid the effect of negative and positive covariances cancelling each other out.
This still wouldn't address the first issue you raised, about the uneven distribution of the covariances.
> I'm concerned about the computational cost of computing all covariances at all distances

I have put some code on github for faster calculation of covariances. It takes advantage of the fact that for Ludescher-type sets of covariances, there is much overlap between the ranges. It needs more testing before serious use. It does ~1000 days (every ten days for 28 years) by 401 time-delays (tau values) between two points in about 1.5s.
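Roughly, one way to exploit that overlap is with running sums, so that the covariance over every length-w window costs O(1) after an O(n) pass. A stripped-down sketch of that idea (not the actual github code):

~~~~
# covariance of x and y over every window of length w, via cumulative sums
sliding_cov <- function(x, y, w) {
  n <- length(x)
  Sx  <- cumsum(c(0, x))
  Sy  <- cumsum(c(0, y))
  Sxy <- cumsum(c(0, x * y))
  i <- 1:(n - w + 1)
  sumx  <- Sx[i + w]  - Sx[i]
  sumy  <- Sy[i + w]  - Sy[i]
  sumxy <- Sxy[i + w] - Sxy[i]
  sumxy / w - (sumx / w) * (sumy / w)   # E[xy] - E[x]E[y] for each window
}
~~~~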
> But using sampling to estimate the median looks like a substantial problem in itself.

Why? You have to collect all the samples to find the median, whereas with a mean you can keep a running total, but that's not a substantial problem.
Here is a quote that I came across on the [web](http://individual.utoronto.ca/pivovarova/teaching/HW_21_SOLN.pdf):

> The sample median is an unbiased estimator of the population median when the population is normal. However, for a general population it is not true that the sample median is an unbiased estimator of the population median. The sample mean is a biased estimator of the population median when the population is not symmetric.
But I don't have enough experience with statistics, or with our data, to assess the significance of this concern, in our case.
So for now I'll stay with the median, but we should keep this footnote in our minds.
I worked out the algorithm that I described above, using R. It does give good performance, and the data looks like a smoothly decreasing function of node separation.
I'm running it overnight, to generate a table for each of the years 1950-1979, which has one column for each of the node separations D = 0, 1, ..., 26, in units of 2.5 degrees. It will show the median covariance, taken over a sample size of 10,000 randomly chosen segments for each integer value of D. For each randomly chosen segment, I use the floor of its length, measured in Cartesian grid units, to determine which bucket it should go into. The median of each bucket gets shown in a corresponding column.
The algorithm needs a modification, though: it is working in Cartesian grid units, which doesn't take into account the fact that the distance between two adjacent nodes in the grid is a function of latitude.
Here is the data. Covariances were computed over a full 365 day window.
This is still not corrected for the dependence of node distance on latitude.
1950, 0.376725, 0.288752, 0.184332, 0.117790, 0.082108, 0.055115, 0.038667, 0.027804, 0.017671, 0.011439, 0.007086, 0.004478, 0.001744, -0.001047, -0.003089, -0.003710, -0.003404, -0.004410, -0.003479, -0.003606, -0.004439, -0.003517, -0.002411, -0.002327, -0.000939, -0.000110, 0.003093
1951, 0.497873, 0.378028, 0.267959, 0.206597, 0.164560, 0.121937, 0.084815, 0.065603, 0.045090, 0.035339, 0.029342, 0.023507, 0.020172, 0.019567, 0.015739, 0.013974, 0.010840, 0.010878, 0.011903, 0.014327, 0.013779, 0.019440, 0.018217, 0.019181, 0.019640, 0.018455, 0.015653
1952, 0.384914, 0.298311, 0.193402, 0.136403, 0.096268, 0.063049, 0.046147, 0.035085, 0.026154, 0.019055, 0.014410, 0.008885, 0.007904, 0.005009, 0.004154, 0.003535, 0.002084, 0.001440, 0.001872, 0.001446, 0.002742, 0.003582, 0.005336, 0.005406, 0.003590, 0.004583, 0.004938
1953, 0.404755, 0.299675, 0.193846, 0.131324, 0.087064, 0.053213, 0.037445, 0.028035, 0.020515, 0.015935, 0.012084, 0.010311, 0.008631, 0.008659, 0.009659, 0.009075, 0.009953, 0.010439, 0.012139, 0.012957, 0.012625, 0.010603, 0.009926, 0.005277, 0.004812, 0.001944, 0.001748
1954, 0.399608, 0.324695, 0.232220, 0.179475, 0.143515, 0.108126, 0.081813, 0.062538, 0.051074, 0.040224, 0.033944, 0.028852, 0.025769, 0.023752, 0.021954, 0.019374, 0.017265, 0.017973, 0.018237, 0.018248, 0.018128, 0.017861, 0.019269, 0.022279, 0.023335, 0.026768, 0.033545
1955, 0.423631, 0.307121, 0.223340, 0.170559, 0.132915, 0.089765, 0.061923, 0.042745, 0.029450, 0.019171, 0.012016, 0.007414, 0.003746, 0.002021, 0.000216, -0.000743, -0.002419, -0.003040, -0.003243, -0.003336, -0.003471, -0.002443, -0.001914, -0.000111, 0.000197, 0.001615, 0.004419
1956, 0.398572, 0.297339, 0.211447, 0.146149, 0.101105, 0.063289, 0.044106, 0.032605, 0.022991, 0.016632, 0.012465, 0.011271, 0.009290, 0.010205, 0.009812, 0.008534, 0.009789, 0.009654, 0.011038, 0.009764, 0.009680, 0.008852, 0.007636, 0.007611, 0.008158, 0.008140, 0.008346
1957, 0.543840, 0.401238, 0.300696, 0.229548, 0.170417, 0.117823, 0.085493, 0.058618, 0.038805, 0.026047, 0.018612, 0.011135, 0.011133, 0.005943, 0.006544, 0.007047, 0.007505, 0.009342, 0.007762, 0.008633, 0.008941, 0.012584, 0.012921, 0.013848, 0.013240, 0.010421, 0.016420
1958, 0.401656, 0.320032, 0.211704, 0.163894, 0.121845, 0.083730, 0.055344, 0.036292, 0.022771, 0.014026, 0.007086, 0.002262, 0.000560, -0.001399, -0.003168, -0.004113, -0.003320, -0.003257, -0.002088, -0.001063, -0.000352, -0.000769, 0.001491, 0.001296, 0.000530, -0.000339, 0.001791
1959, 0.406817, 0.317248, 0.203533, 0.152294, 0.114097, 0.082428, 0.059843, 0.044288, 0.032283, 0.022619, 0.017355, 0.012539, 0.010661, 0.009816, 0.010079, 0.009338, 0.010045, 0.008826, 0.009526, 0.009328, 0.006070, 0.007269, 0.002881, 0.003538, 0.002745, 0.002050, 0.004178
1960, 0.404540, 0.293296, 0.182790, 0.126116, 0.081986, 0.053903, 0.038619, 0.029649, 0.022639, 0.018154, 0.014553, 0.011814, 0.009597, 0.007263, 0.006036, 0.005744, 0.005145, 0.005096, 0.005608, 0.005323, 0.004823, 0.005340, 0.004700, 0.003929, 0.004074, 0.003112, 0.000932
1961, 0.436981, 0.318925, 0.227808, 0.170093, 0.131357, 0.096418, 0.074433, 0.059509, 0.050305, 0.043425, 0.038501, 0.032744, 0.030665, 0.026243, 0.025636, 0.024662, 0.024857, 0.023646, 0.023234, 0.023648, 0.024311, 0.023317, 0.023183, 0.022564, 0.019814, 0.019071, 0.021283
1962, 0.396058, 0.299080, 0.187232, 0.129761, 0.090867, 0.063143, 0.045590, 0.035252, 0.027418, 0.020875, 0.016034, 0.013393, 0.010684, 0.008866, 0.008027, 0.008095, 0.007091, 0.006710, 0.006786, 0.005350, 0.005228, 0.005071, 0.004673, 0.003920, 0.001334, 0.000442, 0.001578
1963, 0.479960, 0.365897, 0.249881, 0.193506, 0.152478, 0.110162, 0.073808, 0.050997, 0.035377, 0.023616, 0.018152, 0.011323, 0.009910, 0.004800, 0.002581, 0.000238, 0.001071, 0.002614, 0.002234, 0.002974, 0.004150, 0.004119, 0.004514, 0.005624, 0.004134, 0.003697, 0.004461
1964, 0.413449, 0.332507, 0.259229, 0.205005, 0.165699, 0.114814, 0.076885, 0.055181, 0.041309, 0.031944, 0.025338, 0.018501, 0.016807, 0.013228, 0.013626, 0.012466, 0.014382, 0.015428, 0.016407, 0.016345, 0.016640, 0.018331, 0.016619, 0.020420, 0.016845, 0.020146, 0.026646
1965, 0.530515, 0.432047, 0.337793, 0.268720, 0.213892, 0.145845, 0.097379, 0.068950, 0.047180, 0.036840, 0.029960, 0.021405, 0.019833, 0.013756, 0.008812, 0.006022, 0.005071, 0.004931, 0.005695, 0.008717, 0.011845, 0.013156, 0.013968, 0.014541, 0.010263, 0.009971, 0.011144
1966, 0.431148, 0.331803, 0.238255, 0.176686, 0.141642, 0.089962, 0.060394, 0.039816, 0.022766, 0.011258, 0.004415, -0.004761, -0.006303, -0.007417, -0.009172, -0.008656, -0.007522, -0.005489, -0.002784, -0.001813, -0.000654, -0.001337, 0.000206, -0.001158, -0.001903, -0.001847, -0.004526
1967, 0.417590, 0.315164, 0.213519, 0.146833, 0.105399, 0.067396, 0.045697, 0.032650, 0.020687, 0.013863, 0.008831, 0.004853, 0.004300, 0.001801, 0.001962, 0.001687, 0.000790, 0.001165, 0.001586, 0.002647, 0.002899, 0.002957, 0.005071, 0.005849, 0.005952, 0.004707, 0.006682
1968, 0.512509, 0.404913, 0.289787, 0.219778, 0.171534, 0.126884, 0.096705, 0.076529, 0.059857, 0.048526, 0.042656, 0.036140, 0.032940, 0.028841, 0.026615, 0.025556, 0.024169, 0.024714, 0.026404, 0.027640, 0.027231, 0.029520, 0.029747, 0.029645, 0.028538, 0.029928, 0.025921
1969, 0.415270, 0.313612, 0.212363, 0.153155, 0.114511, 0.080217, 0.059733, 0.045833, 0.031744, 0.024738, 0.018835, 0.014443, 0.011237, 0.008032, 0.007759, 0.007580, 0.007433, 0.005875, 0.005508, 0.004729, 0.003742, 0.002756, 0.002415, 0.001675, 0.003120, 0.002357, -0.002137
1970, 0.497127, 0.412444, 0.294776, 0.226329, 0.166967, 0.116271, 0.087155, 0.059640, 0.044915, 0.033142, 0.027946, 0.020460, 0.016456, 0.012873, 0.011462, 0.011314, 0.009976, 0.009153, 0.012228, 0.012125, 0.012177, 0.011530, 0.011566, 0.008241, 0.006988, 0.006295, 0.008723
1971, 0.424364, 0.332461, 0.224057, 0.157024, 0.110483, 0.072468, 0.053449, 0.038617, 0.028035, 0.019310, 0.013372, 0.007930, 0.004398, 0.000019, -0.001477, -0.001690, -0.002654, -0.002683, -0.001316, 0.000114, 0.000257, -0.001043, -0.001038, -0.000761, -0.002161, -0.001195, -0.002994
1972, 0.691285, 0.591039, 0.476096, 0.361859, 0.275870, 0.162873, 0.107006, 0.071540, 0.051868, 0.036460, 0.027582, 0.017843, 0.013376, 0.008525, 0.006463, 0.004207, 0.005106, 0.004518, 0.006773, 0.005468, 0.005762, 0.005188, 0.004774, 0.003143, 0.000716, 0.000555, -0.000404
1973, 0.719328, 0.602857, 0.442475, 0.339179, 0.267357, 0.185539, 0.128436, 0.085810, 0.059455, 0.040934, 0.030712, 0.021248, 0.014541, 0.010812, 0.007942, 0.009611, 0.005944, 0.005143, 0.005824, 0.002407, 0.005623, 0.006767, 0.005630, 0.005233, 0.003683, 0.006677, 0.010963
1974, 0.395127, 0.309754, 0.225025, 0.179570, 0.143235, 0.102333, 0.076058, 0.049868, 0.037540, 0.027670, 0.018539, 0.013229, 0.010906, 0.007226, 0.006533, 0.003032, 0.005356, 0.004035, 0.007664, 0.007887, 0.007309, 0.007498, 0.009876, 0.011102, 0.010993, 0.008149, 0.013693
1975, 0.411585, 0.314415, 0.233452, 0.183707, 0.149873, 0.108196, 0.077837, 0.058354, 0.045755, 0.034201, 0.027057, 0.019275, 0.017942, 0.014944, 0.011740, 0.008582, 0.008528, 0.007633, 0.004492, 0.005337, 0.004136, 0.005322, 0.006265, 0.004904, 0.004796, 0.006067, 0.007700
1976, 0.701437, 0.611954, 0.486288, 0.387918, 0.311690, 0.232872, 0.153192, 0.118535, 0.088980, 0.075090, 0.060840, 0.052749, 0.054625, 0.044829, 0.041111, 0.035870, 0.032889, 0.031577, 0.032975, 0.033839, 0.038191, 0.037870, 0.039298, 0.041152, 0.041833, 0.041602, 0.061608
1977, 0.487561, 0.364822, 0.239694, 0.175498, 0.128164, 0.089747, 0.063153, 0.045851, 0.029987, 0.019526, 0.010389, 0.005615, 0.000417, -0.002110, -0.005536, -0.006853, -0.007390, -0.006935, -0.005430, -0.004005, -0.000692, -0.002648, -0.001338, 0.001523, 0.001367, 0.002584, 0.009944
1978, 0.485333, 0.372565, 0.262490, 0.192783, 0.146089, 0.105030, 0.075198, 0.055268, 0.039472, 0.028793, 0.020672, 0.014129, 0.013400, 0.009390, 0.009295, 0.005212, 0.005422, 0.003599, 0.005449, 0.004420, 0.004095, 0.004498, 0.003332, 0.001888, 0.000352, -0.003374, -0.006576
1979, 0.487816, 0.352541, 0.232255, 0.178560, 0.136964, 0.094659, 0.067236, 0.047334, 0.033249, 0.022715, 0.014910, 0.008833, 0.008597, 0.006866, 0.005765, 0.006084, 0.005813, 0.005550, 0.005076, 0.007923, 0.007074, 0.008297, 0.009846, 0.010525, 0.010280, 0.011980, 0.008838
Thanks! But year 1971 is missing 6 numbers!
Hi, I just pasted those numbers in above.
The top three graphs show the square roots of the absolute values of the covariances. The bottom three are similar, but the values are scaled so that the covariance for the smallest distance is always 1.

<img width = "1200" src = "http://www.azimuthproject.org/azimuth/files/cov-dists-1950-1979.png" alt = ""/>
Great!
I'll have to ponder these... in the 1950's, the purple year 1954 seems to have remarkably high covariances over long distances. What happened in 1954? In the 1960s, it seems 1968 and 1969 are the winners.
David wrote:

> Suppose we fix a time t, and want to compute the mean link strengths for a range of distances D = d, 2d, 3d, ....

What exactly is meant by "node separation"? How exactly did you compute this distance in "degrees", Graham? Is 68 the maximal distance?
> What exactly is meant by "node separation"? How exactly did you compute this distance in "degrees", Graham?

There were 27 values. I just multiplied by 2.5, which is the separation of the NOAA data. This was a guess.

> Is 68 the maximal distance?

In the graphs, it is `27*2.5 = 67.5`. I suspect I should have multiplied (0 to 26) by 2.5, not (1 to 27), so it probably should be `26*2.5 = 65`. We'll have to ask David to be sure.
Graham, that's a good visualization of the data, thanks.
The first value was for a separation of zero degrees. So multiplication by 2.5 is correct.
For each degree of separation D, the graph shows the square root of the absolute value of the median covariance -- using a sampling estimate of 10,000 random pairs in the neighborhood of each value for D.
Should I rerun it for the whole period, up to 2010? Also, I can increase the number of samples, say to 20,000, to see what kind of impact that has.
For purposes of peer-review, I'll post a pseudo-code description of the algorithm here, and also will post the code to Github. (Bugs are always possible, especially in the first cut of a program.)
Here's the code I used for the graphs. David, feel free to incorporate this into your code on github.

~~~~
# read the table of median covariances (one row per year, first column = year)
x <- scan("C:/Users/Work/Desktop/dtanzerdists.txt", sep=',')
xm <- matrix(x, nrow=30, ncol=28, byrow=TRUE)
covs <- xm[,-1]

# square roots of absolute covariances, raw and normalised by the first column
sqrtcovs <- matrix(0, ncol=ncol(covs), nrow=nrow(covs))
sqrtnormcovs <- matrix(0, ncol=ncol(covs), nrow=nrow(covs))
for (y in 1:30) {
  sqrtcovs[y,] <- sqrt(abs(covs[y,]))
}
for (y in 1:30) {
  sqrtnormcovs[y,] <- sqrt(abs(covs[y,] / covs[y,1]))
}

clrs <- c("#d4004fff", "#ff8546ff", "#c68500ff", "#f5d22dff", "#b3b800ff",
          "#8ff66aff", "#00b09cff", "#8ec1ffff", "#6917ddff", "#db7effff")

# one panel per decade, ten years per panel
oldpar <- par(mfrow=(c(2,3)))
for (i in 1:3) {
  dec <- c(1950,1960,1970)[i]
  matplot((1:27)*2.5, t(sqrtcovs[(i*10-9):(i*10),]),
          ylim=c(min(sqrtcovs),max(sqrtcovs)), main=paste0(dec, "s"),
          xlab="distance in degrees", ylab="sqrt of abs(covariance)",
          type='l', col=clrs, lty=1, lwd=2)
  legend(x="topright", legend=paste(as.character(dec+(0:9))), col=clrs, lty=1, lwd=2)
}
for (i in 1:3) {
  dec <- c(1950,1960,1970)[i]
  matplot((1:27)*2.5, t(sqrtnormcovs[(i*10-9):(i*10),]),
          ylim=c(0,1), main=paste0(dec, "s"),
          xlab="distance in degrees", ylab="sqrt of normalised abs(covariance)",
          type='l', col=clrs, lty=1, lwd=2)
  legend(x="topright", legend=paste(as.character(dec+(0:9))), col=clrs, lty=1, lwd=2)
}
par(oldpar)
~~~~
> Should I rerun it for the whole period, up to 2010? Also, I can increase the number of samples, say to 20,000, to see what kind of impact that has.

I think other things would be more interesting:

* fix the distance calculation to take account of the roundness of the world - does that affect the results?
* do east-west distances behave differently from north-south ones?
* what happens if you use a larger area?
Re: your first point, I'm noting a blog article on [great circle distance calculations in R](http://www.r-bloggers.com/great-circle-distance-calculations-in-r/), from which I'm planning to use the haversine formula.
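For reference, the haversine great-circle distance comes out to just a few lines in R; the 6371 km Earth radius is of course an approximation:

~~~~
# great-circle distance in km between two points given in degrees (haversine formula)
haversine <- function(lat1, lon1, lat2, lon2, R = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 + cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * R * asin(sqrt(pmin(1, a)))
}

haversine(0, 0, 0, 2.5)   # about 278 km: 2.5 degrees of longitude on the equator
~~~~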
What is the raw data you are all working with? How can I download it? If you are trying to find "important" correlations, glasso may be worth trying: http://statweb.stanford.edu/~tibs/glasso/
Daniel, see this [comment](http://forum.azimuthproject.org/discussion/1366/r-programming-language/?Focus=11072#Comment_11072) and others in that thread.
Thanks Graham. Where can I find out what all the file types in ftp://ftp.cdc.noaa.gov/Datasets/ncep.reanalysis.dailyavgs/surface/ are?
I assume from your post in the R programming thread that you already know as much as I do about the file formats, namely: RNetCDF works.
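For anyone starting from scratch, a minimal RNetCDF read looks something like the following. The file name and the variable name "air" are what I'd expect for the daily-average surface air temperature files, but check them against the actual file with `print.nc`:

~~~~
library(RNetCDF)
nc  <- open.nc("air.sig995.1950.nc")   # one of the daily-average surface files
print.nc(nc)                           # lists the variables and their dimensions
lat <- var.get.nc(nc, "lat")
lon <- var.get.nc(nc, "lon")
air <- var.get.nc(nc, "air")           # typically dimensions lon x lat x time
close.nc(nc)
~~~~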
Yes, but my last question about file types was about the semantics. There are a lot of other files at ftp://ftp.cdc.noaa.gov/Datasets/ncep.reanalysis.dailyavgs/surface/ besides what we are working with here, so I was wondering if there was some key to what they all represent.
Instead of looking around the ftp site, it could be easier to visit their website:

* [NCEP/NCAR Reanalysis 1: Surface](http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanalysis.surface.html)

This describes (and provides access to) lots of files, which I hope correspond to those here:

* [ftp://ftp.cdc.noaa.gov/Datasets/ncep.reanalysis.dailyavgs/surface/](ftp://ftp.cdc.noaa.gov/Datasets/ncep.reanalysis.dailyavgs/surface/)

You can get even more stuff if you go up to

* [NCEP/NCAR Reanalysis 1: Summary](http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanalysis.html)

which provides access to 7 kinds of files:

* Pressure level
* Surface
* Surface Fluxes
* Other Fluxes
* Tropopause
* Derived Data
* Spectral Coefficients
NOAA seems very shy about the units they use. This [site](http://icdc.zmaw.de/ncep.html?&L=1) has some details.
Here is the path to the code on github which I used to generate the data for the median covariance as a function of node separation: [covariance-by-distance.R](https://raw.githubusercontent.com/azimuth-project/el-nino/master/R/tanzer/covariance-by-distance.R).
I posted this data in messages 21 and 22, and Graham plotted it in message 25.
The first part of the code consists of Graham's functions for reading the data, and performing the seasonal adjustment of the temperatures. The second part of the file contains the functions I wrote to compute the covariance by distance.
It's my first R program, and it is highly unpolished. In particular, I'm just printing the .csv data to the console, and running the program through "tee output.txt". It took most of the night to generate the output.
(Overall it seems more efficient for us to post code that generates data, even when it is not in the greatest form.)
Yes, it's good to post code.
I'm going to want to start writing a lot of short blog articles that show graphs and other visualizations, with links to the code that made them, and just enough explanation that people can understand what they're seeing. People here are starting to generate lots of visualizations, and I think blog articles would be a way to make it easier to find these. I think people would enjoy reading them, too.
The trick will be to not make a big deal of it. A graph may or may not be very important; it can still be good to see and discuss.
I will need to let these blog posts interweave with the more carefully polished "El Niño project (part $n$)" posts.