# Blog - how good is climate science temperature data?

I started editing the blog article about the missing temperature data at:

I would like the images from the randform blog article to be included here on Azimuth. John, is there a possibility to download them?

1.
edited October 2014

nad, Your analysis is not quantitative. It says, basically, gosh that looks sparse. The question is, relative to what?

An index is basically a spatial integral. In effect, you interpolate to form a continuous approximation, then integrate the approximation. So the question is whether the data is dense enough for interpolation.

Monthly temperatures are spatially correlated. You can see it in [these maps](http://www.moyhu.blogspot.com.au/p/blog-page_24.html). There is noise, but you are averaging. Hansen says the correlation length scale is something like 1200 km. So that is the sort of minimum density that you need.

I've looked at whether you can get a reasonable global estimate from just 60 land stations. Past posts and history are described [here](http://moyhu.blogspot.com.au/2014/01/just-60-global-stations-area-weighting.html). I think the answer is, it's not bad. Of course, 60 is extreme. The other thing to remember is that at least two thirds of the global temperature estimate is sea surface temperature. That has its own sampling issues, but different, and SST can include space surveys.
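
As a rough illustration of the "index as a spatial integral" idea in this comment, here is a minimal Python sketch. It assumes the anomalies have already been interpolated onto a regular latitude/longitude grid. The function name `area_weighted_mean` and the toy grid are hypothetical, and skipping empty cells is a simplification for illustration, not the procedure of GISS, CRU or any other group.

```python
import math


def area_weighted_mean(grid, lat_centers):
    """Approximate the global mean of a lat/lon grid of anomalies.

    grid[i][j] is the anomaly in the cell at latitude lat_centers[i]
    (degrees) and the j-th longitude; None marks a cell with no data.
    Each cell is weighted by cos(latitude), which is proportional to
    its area on a regular lat/lon grid.
    """
    total = 0.0
    weight_sum = 0.0
    for row, lat in zip(grid, lat_centers):
        w = math.cos(math.radians(lat))
        for value in row:
            if value is None:
                continue  # empty cells are skipped, not treated as zero
            total += w * value
            weight_sum += w
    if weight_sum == 0.0:
        raise ValueError("grid contains no data")
    return total / weight_sum


# Toy grid (made-up numbers): three latitude bands, two longitudes.
lat_centers = [-60.0, 0.0, 60.0]
grid = [
    [1.0, 1.0],
    [0.0, 0.0],
    [1.0, None],
]
mean_anomaly = area_weighted_mean(grid, lat_centers)
print(round(mean_anomaly, 4))
```

The high-latitude cells count for half as much as the equatorial ones here, which is why a plain unweighted average of the same numbers would come out larger.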

2.

Nick, in this [blog post](http://moyhu.blogspot.com.au/2014/01/just-60-global-stations-area-weighting.html) you wrote:

> Twice before I've written about using just 60 met stations to estimate the Earth's global temperature anomaly. The first time I simply chose 61 stations according to "quality" criteria (rural, 90 years records). The resulting average was quite close to HADCRUt land only and the GISS met stations index. This was somewhat surprising, because the stations were in effect not area-weighted, but sin(latitude) weighted.

Are you talking about the anomalies? Which data files exactly are you comparing with?

3.

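There is an article today on [Spiegel Online](http://www.spiegel.de/wissenschaft/weltall/satellitendaten-zeigen-methan-ausstoss-der-usa-a-996542.html) about methane leaks detected by spectral measurements from the [SCIAMACHY](http://de.wikipedia.org/wiki/SCIAMACHY) mission, as described in an [article](http://onlinelibrary.wiley.com/doi/10.1002/2014GL061503/abstract) which I can't read. It would be interesting to see whether a local temperature difference could be observed due to this leak.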

4.

nad, all my references to spatial averages will be to anomalies. The HADCRUT reference is CRUTEM 3. The GISS Met stations index is linked just a bit further down.

5.

> nad, all my references to spatial averages will be to anomalies. The HADCRUT reference is CRUTEM 3. The GISS Met stations index is linked just a bit further down.

Thanks. By the GISS data, do you mean [this one](http://data.giss.nasa.gov/gistemp/graphs_v3/Fig.A.txt)?

6.

nad, It may be the same, but I use [this one](http://data.giss.nasa.gov/gistemp/tabledata_v3/GLB.Ts.txt).

7.

> nad, It may be the same, but I use this one.

Aha, thanks.

So far I don't really understand why you claim that

> Monthly temperatures are spatially correlated. You can see it in these maps. There is noise, but you are averaging. Hansen says the correlation length scale is something like 1200 km. So that is the sort of minimum density that you need.

Even when I look at your visualization, I see once in a while little red blotches in a blue sea (probably each coming from a single station).

There may be temperature correlations; however, especially around geographical peculiarities they may possibly be rather small, as one can see from the Alps in the colors of our station data visualization.

8.

By the way, you wrote [here](http://moyhu.blogspot.com.au/2014/01/just-60-global-stations-area-weighting.html):

> Twice before I've written about using just 60 met stations to estimate the Earth's global temperature anomaly. The first time I simply chose 61 stations according to "quality" criteria (rural, 90 years records). The resulting average was quite close to HADCRUt land only and the GISS met stations index. This was somewhat surprising, because the stations were in effect not area-weighted, but sin(latitude) weighted.

Do I understand correctly that CRUTEM land-only and GISS are only similar under sin(latitude) weighting? I haven't really looked into the definitions of the CRUTEM or GISS anomalies, but is this mentioned somewhere? If yes, where?

9.
edited October 2014

nad, The GISS Met stations index is different from others. It is a continuation of Hansen's original index, which was intended to use land data (all they really had) to represent the entire earth. It had a complex hierarchical grid scheme. CRUTEM etc use land data to represent land only. The map gives a qualitative picture.

But really, there's a whole lot of quantitative stuff that you need to catch up on. Hansen has been writing about this for over 30 years. [Here](http://pubs.giss.nasa.gov/authors/jhansen.html) is a list of his papers, with links. [Here](http://pubs.giss.nasa.gov/docs/1987/1987_Hansen_Lebedeff_1.pdf) is a paper where he describes his met stations scheme, and the correlation that it is based on.

10.

In the [article](http://pubs.giss.nasa.gov/docs/1987/1987_Hansen_Lebedeff_1.pdf) he writes:

> The 1200-km limit is the distance at which the average correlation coefficient of temperature variations falls to 0.5 at middle and high latitudes and 0.33 at low latitudes

I didn't see a formula for the correlation in his article (though I might have overlooked something), and we saw that there seem to exist various notions of correlation measures in climate science. Maybe it is defined in another article. So let's call the particular correlation used by Hansen the Hansen correlation. The Hansen correlation is one particular measure, and it may always be that a particular measure is not enough, or is even inappropriate, to capture specific differences.
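
For orientation only: since no formula is quoted in this thread, the sketch below uses the ordinary Pearson correlation coefficient between two stations' anomaly series, together with the great-circle (haversine) distance between them. That this matches the definition in the Hansen and Lebedeff paper is an assumption, not something established here. The function names and the sample series are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def pearson(x, y):
    """Pearson correlation of two series, using only months present in both."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two overlapping values")
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("a constant series has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


# Made-up anomaly series for two hypothetical stations.
station_a = [0.1, -0.3, 0.5, None, 0.2]
station_b = [0.2, -0.1, 0.4, 0.0, 0.1]
r = pearson(station_a, station_b)
d = great_circle_km(52.5, 13.4, 48.1, 11.6)
print(f"r = {r:.2f} at {d:.0f} km")
```

Computing `r` for many station pairs and binning the results by `d` would give a correlation-versus-distance curve of the kind the quoted sentence summarizes.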

11.
edited October 2014

> The GISS Met stations index is different from others.

Nick, I cited the GISS page, where it is said that they now use NOAA data. Could you please check whether I got all the data you used right? In particular, I found the references on your page not so easy to find.

John, the post is ready now apart from the images, the English polish and possible additions by Nick. Can you please download the images to your WordPress account and NOT link to the randform ones? Apart from the quota limit aspect, I guess your WordPress blog is hosted on some US server, while our server is here in Berlin, so synchronicity is an argument.

12.
edited October 2014

> John, the post is ready now apart from the images, the English polish and possible additions by Nick.

I think this should go without saying, but just in case: I demand to see and be able to change the final version before publication.

13.

nad,

> "So which original data file did you use for the kmz’s? - could you give me the original data location for those data files"

Each data supplier gives at least an inventory file and a monthly data file. The inventory has one line per station giving lat/lon and various metadata. That's what I used for the KMZ and is what is in the ALL4inv.zip file in csv format.

> "Nick did you change something at the kmz files within the last two days and today?"

No. One merit of that site where I linked is that it shows the change times. The data is foldered - you might have switched folders.
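
To make the "one line per station giving lat/lon" description concrete, here is a minimal Python sketch that reads such an inventory from CSV text. The column names (`id`, `name`, `lat`, `lon`) and the sample rows are invented for illustration; the actual layout of the files in ALL4inv.zip is not given in this thread and may differ.

```python
import csv
import io

# Made-up inventory; the real column names and station IDs are assumptions.
SAMPLE_INVENTORY = """id,name,lat,lon
X001,STATION_A,68.5,102.4
X002,STATION_B,-33.9,151.2
X003,STATION_C,47.1,
"""


def read_inventory(text):
    """Parse inventory CSV text into a list of station dicts.

    Rows with a missing or out-of-range latitude/longitude are skipped.
    """
    stations = []
    for row in csv.DictReader(io.StringIO(text)):
        try:
            lat = float(row["lat"])
            lon = float(row["lon"])
        except (TypeError, ValueError):
            continue  # missing or non-numeric coordinate
        if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
            continue
        stations.append({"id": row["id"], "name": row["name"], "lat": lat, "lon": lon})
    return stations


stations = read_inventory(SAMPLE_INVENTORY)
for s in stations:
    print(s["id"], s["name"], s["lat"], s["lon"])
```

A list like this is all that is needed to place one marker per station on a map; the monthly data file is only required once temperatures are plotted.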

14.

> nad, “So which original data file did you use for the kmz’s? - could you give me the original data location for those data files” Each data supplier gives at least an inventory file and a monthly data file. The inventory has one line per station giving lat/lon and various metadata. That’s what I used for the KMZ and is what is on the ALL4inv.zip file in csv format.

Yes, and the files may be updated, which is why it would be good to have the exact name and file location of the respective original files. If you don't know them anymore, can you please check whether what I wrote about your visualization is right, in particular concerning the data sources? For example, I guessed that GSOD is probably the one from NOAA.

> The data is foldered - you might have switched folders.

No, I remember the yellow colour of BEST, but I had probably forgotten to tick off the BEST 1950s to 1980s.

15.

Just added something about GHCN v3. Please check, Nick.
16.

Dara just posted quite a few comments, so I am a little afraid that my announcement for this post will drown in the forum thread list. I am currently thinking about whether I should enhance the [multimodal capabilities of this forum](http://www.sfb673.org/projects/C5) by bumping this thread with, let's say, 7-hourly comments here.

17.

nad, You may be interested in [this](http://www.moyhu.blogspot.com.au/2012/12/universal-station-locator-and-history.html) way of treating data. It has a Google Maps interface in one pane. You can choose a region, and it will show the stations with flags in Google Maps. Choosing more than one region is possible, but it gets slow. You can choose different sets (GHCN etc), and select by start and end dates.

But then if you click on a flag, it shows a plot of the annual temperature data on the left pane. It's an active plot, and you can stretch or shrink the axes.

[Here](http://moyhu.blogspot.com.au/2011/12/google-maps-display-of-ghcn-stations.html) is a simpler version that just displays stations (Google Maps), but lets you choose in various ways. It's like KML, but more flexible.

18.

> nad, You may be interested in this way of treating data.

> Here is a simpler version that just displays stations

Nick, I don't feel like rewriting the blog article every second day because you dig out yet another visualization. Are there more to come? I had problems with both of the visualizations you mentioned above. Most unfortunate was that one couldn't set the time range: you can set it with the buttons, but it doesn't work, so I mentioned this in the blog post. For example, choose the station Essey (Siberia) in the first visualization (in GHCN) in the range 2000-2010 and look at the diagram. The station closed before 2000, but the marker still appears on the map. So one can't get an overview of which stations closed in recent times and which did not. It seemed to me that the same is happening with the second visualization. Moreover, I didn't really see that the different sets appear differently. In addition, there were other problems, especially with the diagram (sometimes not appearing, not erasable, etc.). The Google Earth vis works fine, but those two I think are not so good. I mean, not showing the time interval is unfortunately a false representation of data. This might happen by accident, but if one knows it is false, one has at least to mention the problem. I have now also added a sentence about Clive Best's vis.

John, what do you think?

Anyway, what is gadj? This set appeared in the first visualization mentioned and is now in the blog post, so far without a reference. Nick, please check whether I got your data files right and supply the missing information.
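
The behaviour asked for here, showing only stations that actually reported within a chosen time range, amounts to an interval-overlap test. The Python sketch below illustrates that test on invented records; the field names `first_year` and `last_year` and the stations themselves are hypothetical and are not taken from any of the datasets or from Nick's code.

```python
def stations_active_in(stations, start_year, end_year):
    """Return the stations whose record overlaps [start_year, end_year]."""
    return [
        s for s in stations
        if s["first_year"] <= end_year and s["last_year"] >= start_year
    ]


# Made-up example records; these are not real station histories.
stations = [
    {"name": "STATION_A", "first_year": 1950, "last_year": 1994},
    {"name": "STATION_B", "first_year": 1900, "last_year": 2013},
    {"name": "STATION_C", "first_year": 2005, "last_year": 2008},
]

visible = [s["name"] for s in stations_active_in(stations, 2000, 2010)]
print(visible)  # ['STATION_B', 'STATION_C']
```

A station whose last report predates the start of the range, like `STATION_A` here, is dropped, so a map drawn from `visible` would show the recent thinning of the network directly.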

19.
edited October 2014

nad,

> "Nick I don’t feel rewriting the blog article every second day because you dig out yet another visualization."

They are all on my blog. They are even arranged as a gallery [here](http://www.moyhu.blogspot.com.au/p/gallery-of-javascript-enhanced-graphics.html) (linked as a page on the right).

As said in the text, "Gadj means GHCN adjusted".
20.

> Dara just posted quite a bit of comments so I am a little afraid that my announcement for this post drowns in the forum thread list

Sorry Nad!

Azimuth needs GUI standards in order to avoid the sorts of issues Nad mentioned. D3 is wonderful and runs on almost any platform, and IPython notebooks use really advanced standard scientific graphics.

D
21.

Nick wrote:

> As said in the text, “Gadj means GHCN adjusted”.

Which GHCN? v2 or v3? What does adjusted mean?

> They are all on my blog.

Nick, sorry, I didn't follow your blog and I don't have the time to search for possible graphics. The overview shows only the technical methods used but doesn't explain what you showed and why you showed it, so one would need to follow all those links. Sorry, as I said, I have no time for this.

Unless you protest, I will assume from now on that you have mentioned all temperature visualizations which are in principle useful for detecting gaps in station data.

If you don't check whether I got your descriptions and data sets right, I will leave it at that and maybe indicate the insufficient check with a comment.

Dara wrote:

> Azimuth needs GUI standards in order to avoid these sorts of issues Nad mentioned.

Unfortunately it seems that there are not many resources for such extras.
22.

It's GHCN V3. They produce adjusted (QCA) and unadjusted (QCU) monthly averages. Adjusted mostly means homogenised. The stations are the same. The reason for the separate button here is that it shows also the plot of annual temperature, adjusted or not depending on that setting.

It's up to you what you choose to put in your article.
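
One simple way to see what "adjusted" changes for a single station is to subtract the unadjusted series from the adjusted one, month by month. The Python sketch below does this on made-up numbers. The variable names `qcu` and `qca` merely echo the labels used in this comment, the values are not real GHCN data, and representing missing months as `None` is an assumption of the sketch.

```python
def adjustment_series(unadjusted, adjusted):
    """Return adjusted minus unadjusted, month by month.

    Both inputs are equal-length lists of monthly mean temperatures in
    degrees Celsius; None marks a missing month in either series.
    """
    if len(unadjusted) != len(adjusted):
        raise ValueError("series must cover the same months")
    return [
        None if u is None or a is None else round(a - u, 2)
        for u, a in zip(unadjusted, adjusted)
    ]


# Made-up monthly means for one hypothetical station.
qcu = [1.20, 1.50, None, 2.10]
qca = [1.00, 1.50, 1.80, 2.40]
diffs = adjustment_series(qcu, qca)
print(diffs)
```

Months where the difference is zero were left untouched by the adjustment; a step change in `diffs` would point to a homogenisation break.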

23.

> It’s up to you what you choose to put in your article.

All right, I did this.

John - if I understood correctly, Nick doesn't want to add anything to the article, so apart from possible last-minute fixes and the English polish the blog post is done.

24.

> Unfortunately it seems that there are not much ressources for such extras.

Nad, without investment in serious GUI and visualization, this forum on atmospheric/climate systems will not produce anything of significance.

D

25.
edited November 2014

I tried editing the grammar, spelling and style of the article, and changed its title to the usual style for blog articles on the wiki:

* [Blog - how good is climate science temperature data?](http://www.azimuthproject.org/azimuth/show/Blog+-+how+good+is+climate+science+temperature+data%3F)

I tried to break up the long and winding Germanic sentences into the shorter sentences we use in English. I didn't succeed in chopping up this one yet:

> In the first visualization, which is more exhaustive GHCNv3 and an adjusted version of it (gadj) is shown, moreover a preliminary dataset from ISTI, BEST and CRUTEM 4 are shown, so this visualization would be pretty exhaustive also with respect to newer data, unfortunately however setting the time range didn’t work properly (at least when I tested it), the same holds for his second visualization of GHCN v3 data so one could here trace the deterioration of recent data only manually (like by clicking on individual stations).

I also didn't add the pictures yet: I need to put copies on my website, and link to those.
26.

I changed "Nick Stove" to "Nick Stokes" everywhere in the article, because I believe Nad is talking about Nick Stokes.

27.

> I changed “Nick Stove” to “Nick Stokes” everywhere in the article, because I believe Nad is talking about Nick Stokes

Yes of course. Gee, I didn't even notice this! I am very sorry. Nick - why didn't you correct me?!

> I tried to break up the long and winding Germanic sentences into the shorter sentences we use in English. I didn’t succeed in chopping up this one yet:

Thanks for reminding me. I usually try not to forget about chopping. This sentence seems to have escaped my attention. In some cases it is actually a kind of art form to make long sentences in German, but usually not in scientific texts. The difficult part is, in particular, to close them properly. By closing them properly I mean cases where, for example, you have a verb like HERUMGEHEN (walk around; literally, AROUNDWALK). If you use this verb then you usually have to split it, so a long sentence with that verb would be:

> Die Leute GINGEN, während sie sich prüfend in die Augen blickten, ungeduldig mit den Füßen scharrten und ihre Arme in Kampfbereitschaft hielten, wie bei einem Kreistanz, angespannt um sich HERUM.

That is, literally: The people WALKED, while peering into each other's eyes, impatiently shuffling their feet and holding their arms prepared for a fight, like in a round dance, pent-up AROUND themselves.

(No guarantee for the commas.) Actually, it seems to me that the above sentence would even be valid English in this case, apart maybe from the fact that you would probably place the "like in a round dance" at the end of the sentence. But you are the native speaker. Unfortunately, especially in speeches, there are people who start out making long German sentences and then forget to close them, which I find rather strenuous to listen to. In those cases I feel the urge to shout out in order to close the sentence. I was once in a talk which I had to leave because I couldn't stand it anymore. I don't want to rule out that once in a while this happens to me too, but I usually actively try not to do so.

The sentence is really hard to read, especially since I forgot a comma after "exhaustive". So I have now chopped up the sentence and repeated that I am speaking about Nick Stokes' visualizations. It is now:

> In Nick Stokes first visualization, which is more exhaustive than the second, the following datasets are shown: GHCNv3 and an adjusted version of it (gadj), a preliminary dataset from [ISTI](http://surfacetemperatures.blogspot.com.au/2012/12/databank-release-beta-2.html), BEST and CRUTEM 4. So his first visualization would be pretty exhaustive also with respect to newer data, unfortunately setting the time range didn't work properly (at least when I tested it), the same holds for his second visualization of GHCN v3 data so one could here trace the deterioration of recent data only manually (like by clicking on individual stations).

> I also didn’t add the pictures yet: I need to put copies on my website, and link to those.

Yes, thanks, it would be good if you could do that; I need to keep down the traffic and storage at our blog.

28.

I'll copy the pictures over and add them to the article sometime when I need a bit of relaxation.

I found the list of temperature data collections a bit confusing, because sometimes more than one collection was listed in a single bullet point. I also think many nonexpert readers will find the many acronyms for these databases a bit overwhelming. This would be a good opportunity to introduce these databases to Azimuth Blog readers, since I haven't talked about them before.

So, I've introduced more bullet points, and also given bullet points for all 4 temperature data collections used in the AR4 report, listed right at the start of the article.

Things I want to do:

1) I would like to provide a link to the CDC database. However, I haven't found one yet.

2) I would like to provide a link to the GSOD (Global Summary of the Day). However, I haven't found one that actually works! There seem to be a lot of broken links, e.g. [here](http://www.climate.gov/daily-observational-data-global-summary-day-gsod-%E2%80%93-gis-data-locator). Click on "view data" or "detailed information". Does it work for you? If it's impossible to find the GSOD data, we probably shouldn't include it in the list.

3) I would like to provide a link to the Global Historical Climatology Network. You gave a link to the Wikipedia article about it. Can we get a link to the actual database?

I also improved spelling and grammar a bit more, and changed "CRUTEM 3" and "CRUTEM 4" everywhere to "CRUTEM3" and "CRUTEM4" - since that's what people seem to actually say.
29.
edited November 2014

> I found the list of temperature data collections a bit confusing, because there were sometimes more than one collection listed in one bullet point. I also think many nonexpert readers will find the many acronyms for these databases a bit overwhelming.

The acronyms are indeed confusing, but last but not least, listing them and possibly commenting on them is part of this blog post. Thanks for the editing and the bullet points; yes, that looks clearer.

> 1) I would like to provide a link to the CDC database. However, I haven’t found one yet.

No wonder you haven't found one yet: it seems that "CDC" is the NCDC database. Either I made a typo when copying "CDC", or they changed it to NCDC, which I find likely (the file was last edited Nov. 14, 2014). Anyway, the current link to the dataset provides a QuickTime movie or something similar. I couldn't load it: after I got a notice that some additional QuickTime components needed to be downloaded, my browser seemed to be blocked, so I stopped loading that link before the browser crashed. I think there was no link or a broken link at the time of writing the blog post, otherwise I would have linked to it, but of course I could also have forgotten to do so. Meanwhile I sometimes take screenshots of webpages in order to be able to reconstruct what happened, but I didn't in this case.

(Side remark: in the case of my [social security petition](http://www.randform.org/blog/?p=4845) I even made a video documenting that my petition was not visible or reachable at the official site where it was supposed to appear. I did this in particular because the official letter claimed (in somewhat ambiguous wording) that the petition was displayed as required, which it wasn't. I found that disparity between claim and deed of such a major democratic institution rather disconcerting.)

The additional info in the text next to the AR4 listing of NCDC mentions [GHCNM v3](https://www.ncdc.noaa.gov/ghcnm/v3.php), which is probably the data source. So I have now changed CDC to NCDC, linked to the Wikipedia entry (see also below about Wikipedia), and written that GHCNMv3 is probably the source.

> 2) I would like to provide a link to the GSOD (Global Summary of the Day). However, I haven’t found one that actually works! There seem to be a lot of broken links, e.g. here. Click on “view data” or “detailed information”. Does it work for you? If it’s impossible to find the GSOD data, we probably shouldn’t include it in the list.

"View data" does display a map, but the interactivity only partially works. I included a short sentence about that visualization. "Detailed information" links to [this website](http://gis.ncdc.noaa.gov/geoportal/catalog/search/resource/details.jsp?id=gov.noaa.ncdc%3AC00516), where indeed e.g. the CSV data has no link. There is, however, a link to an FTP server, so in principle it seems one could get the data via FTP, but you might need a password for that. So it seems the data is there, but not easily available. I had actually asked Nick to help me with the link addresses, because it was through his visualization that I learned about GSOD, but he didn't feel like doing so. So, to conclude: I would like to keep the link to Wikipedia, because in some sense not linking directly implies that there is a problem with data retrieval. Moreover, I am not sure whether Nick's GSOD is that GSOD. There is another reason for linking to Wikipedia, which also relates to your next question:

> 3) I would like to provide a link to the Global Historical Climatology Network. You gave a link to the Wikipedia article about it. Can we get a link to the actual database?

Namely, in cases where I am not sure how stable links are, I prefer to link to Wikipedia, because a direct link in Wikipedia is rather likely to get updated if it changes (and I just checked again: there is still a direct link to GHCN in Wikipedia).

> I’ll copy the pictures over and add them to the article sometime when I need a bit of relaxation.

I know what you mean, but John, this is not real "relaxation". I am a bit concerned about your workload. But you know that already (see my comment on your machine learning/processing adventure, which I consider problematic). So I am almost at the point of saying OK, link to randform for the pictures, but since I really need to avoid too much traffic, I won't, and will instead let you "relax" with downloading images....
30.

> It seems the CDC is the NCDC database

I was a little surprised that the Center for Disease Control would keep its own climate data, but it actually made sense. Now I am a little disappointed :)

31.

I have a list of links [here](http://moyhu.blogspot.com.au/p/blog-page_12.html) which may help. It's intended for low-level links, usually connecting to a directory. GHCN is [here](ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/v3/). There is an unadjusted version (qcu) and an adjusted one (qca). It's 11.9 Mb in gz format. You need the inventory file too.

GSOD can be hard; I haven't dealt with it for a while. I think most people look to GHCN Daily for that sort of data. The directory is [here](ftp://ftp.ncdc.noaa.gov/pub/data/gsod/).

Sorry that I have got behind with this thread - I'll look back to catch up.
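For anyone who wants to look inside the GHCN monthly file mentioned above, here is a minimal Python sketch of reading one record. It is not taken from the thread: the fixed-width layout (11-character station ID, 4-digit year, 4-character element, then twelve 8-character blocks holding a 5-character value in hundredths of a degree Celsius plus three flag characters, with -9999 for missing) is an assumption based on the v3 README as I recall it, so please check it against the README in the directory. The function name `parse_ghcnm_line` and the sample record are made up for illustration.

```python
from typing import List, Optional, Tuple

MISSING = -9999  # assumed missing-value sentinel in GHCN-M v3


def parse_ghcnm_line(line: str) -> Tuple[str, int, str, List[Optional[float]]]:
    """Parse one fixed-width record (layout assumed, see lead-in)."""
    station_id = line[0:11]
    year = int(line[11:15])
    element = line[15:19]
    temps: List[Optional[float]] = []
    for month in range(12):
        # Each month occupies 8 characters: 5 for the value, 3 for flags.
        start = 19 + 8 * month
        raw = int(line[start:start + 5])
        # Values are assumed to be stored in hundredths of a degree Celsius.
        temps.append(None if raw == MISSING else raw / 100.0)
    return station_id, year, element, temps


# Synthetic record, invented for illustration (not real station data).
values = [-120, 35, 410, 980, 1450, 1820, 2010, 1950, 1500, 990, 430, -9999]
sample = "10160355000" + "1990" + "TAVG" + "".join(f"{v:5d}   " for v in values)

station, year, element, temps = parse_ghcnm_line(sample)
print(station, year, element)
print(temps)
```

The inventory file would be needed in addition to turn station IDs into latitudes and longitudes; that step is left out here.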
32.

"However, these posts seem to say little about the increasing amount of ‘missing data’."

I don't think there is an increasing amount of 'missing data'. It's a complaint that you find in posts like [this](http://wattsupwiththat.com/2010/03/08/on-the-march-of-the-thermometers/). But it is ill-informed. There are organised datasets, like GHCN and USHCN, that are widely used, and have a selection of stations that are regularly maintained and have long, reasonably high-quality records. One would always like more, but the fact is that there are enough.

The WUWT stories come from the fact that GHCN was originally a grant-funded study in the early 1990s. At that time the long process of transferring millions of handwritten/typed logs to digital form was reasonably complete, and they made a comprehensive collection. The project finished. Then, some time later, NOAA agreed to maintain it. But keeping up regular monthly production is a very different task from compiling a once-only record. So they continued a selected group of about 1800 stations, with as good a spatial distribution as could be managed. There is plenty more data in GSOD, GHCN Daily etc. But people use GHCN-M because it is well maintained, and has enough data for their needs. New datasets like BEST, ISTI etc. have much more if people want them. But they are updated intermittently.

I'm conscious of this because I maintain a monthly index, [TempLS](http://moyhu.blogspot.com.au/2014/10/a-catch-up-on-templs.html), for which I use GHCN unadjusted and ERSST for ocean. ERSST is on a 5° grid. So I typically have for each new month about 1800 land stations and 1800 SST grid cells. Land is already over-represented. The problem with land is unevenness, mainly in Africa and the Arctic. Reinstating the many hundreds of stations that were in the old GHCN in Turkey, say, would not help here.

That has been my disagreement with Nad's article from the beginning. It complains about data gaps but doesn't analyze how many stations are needed, or what the ill effects of reducing station numbers might be.
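As a rough back-of-envelope check on the "land is already over-represented" remark, the sampling densities can be compared in a few lines of Python. The station and cell counts are the ones quoted above; the ocean fraction of about 71% of the Earth's surface is an assumed round figure that is not stated in the thread, and the variable names are purely illustrative.

```python
# Number of cells in a global 5-degree latitude/longitude grid.
cells_total = (360 // 5) * (180 // 5)  # 72 * 36 = 2592

# Assumed surface fractions (not from the thread): roughly 71% ocean.
OCEAN_FRACTION = 0.71
LAND_FRACTION = 1.0 - OCEAN_FRACTION

land_stations = 1800  # monthly land stations quoted above
sst_cells = 1800      # monthly SST grid cells quoted above

# Share of grid cells that report SST. This is a count of equal-angle
# cells, not an area share, since cells shrink toward the poles.
sst_share = sst_cells / cells_total

# Samples per unit of surface fraction, land relative to ocean.
ratio = (land_stations / LAND_FRACTION) / (sst_cells / OCEAN_FRACTION)

print(cells_total)
print(round(sst_share, 3))
print(round(ratio, 2))
```

Under these assumptions land gets somewhat more than twice as many samples per unit area as the ocean, which says nothing about how evenly those land samples are spread.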
33.
edited November 2014

Nick, thanks for pointing out that blog post on WUWT; I have added that information to the blog article.

I also added a sentence saying that the blog article doesn't analyse the effects of missing data, but just tries to collect what is there or not there. It's already long enough.

I didn't have the impression that BEST shows much more data in the regions where a lot of temperature data is missing in CRUTEM and GHCN. ISTI seemed to have much more data, but I couldn't investigate the ISTI data properly (in your visualization). Moreover, it seems that the database is still under construction, and I think quality control is also an issue that needs to be mentioned and discussed. Most importantly, ISTI hasn't so far been used for the IPCC report (up to AR4), i.e. more or less for the current climate discussions. So I mentioned ISTI, but more or less left it out when assessing whether it would cover the gaps in the respective regions.

34.

I have added figures to this article (placed on my website, as Nad requested), and edited it a bit. I think it's ready! It's here:

* [Blog - exploring climate data (part 3)](http://www.azimuthproject.org/azimuth/show/Blog+-+exploring+climate+data+%28part+3%29)

I think this title is better than "how good is climate science temperature data?", because in fact this article doesn't really discuss how _good_ the data is: that would be a much bigger job. It fits naturally in our series about exploring climate data.