# Community detection by graph Voronoi diagrams

Hello John

This is a nice, short review of clustering algorithms. It is a more modern treatment and more relevant to your applications:

Community detection by graph Voronoi diagrams

Basically, we should be able to take the atmospheric volumetric data and cluster it in this way, turning it into time-varying clusters.
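For readers unfamiliar with the term, here is a minimal sketch of a generic graph Voronoi partition: every node is assigned to the seed node that is closest in shortest-path distance. This is not the method of the linked review, which is not reproduced in this thread; the function name `graph_voronoi`, the adjacency format, and the toy graph are all assumptions made for illustration.

```python
import heapq

def graph_voronoi(adjacency, seeds):
    """Assign every reachable node to the seed with the shortest path to it.

    adjacency: dict mapping node -> list of (neighbor, edge_length) pairs
    seeds: iterable of seed nodes (the generators of the Voronoi cells)
    Returns a dict mapping node -> owning seed.
    """
    owner = {}
    # Multi-source Dijkstra: all seeds start at distance zero.
    heap = [(0.0, seed, seed) for seed in seeds]
    heapq.heapify(heap)
    while heap:
        dist, node, seed = heapq.heappop(heap)
        if node in owner:
            continue  # already claimed by a seed that reached it first
        owner[node] = seed
        for neighbor, length in adjacency[node]:
            if neighbor not in owner:
                heapq.heappush(heap, (dist + length, neighbor, seed))
    return owner

# Hypothetical toy graph: a path a-b-c-d-e with unit edge lengths.
toy_graph = {
    "a": [("b", 1.0)],
    "b": [("a", 1.0), ("c", 1.0)],
    "c": [("b", 1.0), ("d", 1.0)],
    "d": [("c", 1.0), ("e", 1.0)],
    "e": [("d", 1.0)],
}
cells = graph_voronoi(toy_graph, seeds=["a", "e"])
```

Node `c` is equally far from both seeds; in this sketch the tie is broken by the ordering of the heap entries, which is an arbitrary choice.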

1.
edited December 2014

I was thinking: the nodes in the networks I have seen people build are designed and allocated a priori. If we could cluster the atmospheric or oceanic data, the nodes in such clusters would be dynamic. This is novel and needs investigation.

2.
edited December 2014

Thanks!

Here's a vaguely related issue. Weather data is recorded at weather stations at various locations, and people usually process this data and attempt to compute average temperatures, air pressures, etc. on a _grid_ in a _Mercator projection_, like this:

<img src = "http://math.ucr.edu/home/baez/ecological/el_nino/ludescher_el_nino_cooperativity_1.jpg" alt = ""/>

But this grid is artificial, and near the north and south poles the grid "rectangles" - not really rectangles - become very small. Maybe someday people will use a better system.

However, processing data from weather stations is a lot of work, and I don't want to do it!

What you are suggesting might be less work and more fun.
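To put a number on how small the grid cells get near the poles, here is a rough sketch that computes the area of a latitude-longitude cell on a sphere. The 2.5° resolution and the mean Earth radius are assumptions chosen for illustration, not values taken from the figure above.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant for this sketch

def cell_area_km2(lat_south_deg, lat_north_deg, dlon_deg):
    """Area of a latitude-longitude grid cell on a sphere."""
    phi_south = math.radians(lat_south_deg)
    phi_north = math.radians(lat_north_deg)
    dlon = math.radians(dlon_deg)
    # Exact area between two parallels and two meridians on a sphere.
    return EARTH_RADIUS_KM ** 2 * dlon * (math.sin(phi_north) - math.sin(phi_south))

# Compare 2.5 x 2.5 degree cells (a hypothetical resolution)
# at the equator and close to the north pole.
equator_cell = cell_area_km2(0.0, 2.5, 2.5)
polar_cell = cell_area_km2(85.0, 87.5, 2.5)
ratio = polar_cell / equator_cell
```

With these assumptions the polar cell covers less than a tenth of the area of the equatorial one, which is why unweighted averages over such a grid over-represent high latitudes.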
3.

Hello John

I believe ground-station data collection is near the end of its life, and the funding is out for the oceanic stations. Instead, the new satellite systems are doing this from space. This is a sample I processed from the GPM satellites:

[sample gpm](http://atmospherics.lossofgenerality.com/media/testbot/output/RectSubArray_112214_1904/image.jpg)

In this case there are no grids but rather warped lattices, a co-mixture of several satellites with different yet overlapping orbits.

In either case you need not do much to the data, whether it is the data you plotted in #3 or the data I posted here.

This is what you do: organize the data into tuples (long, lat, temp) (I used the same data set as in #3), then cluster and animate the clusters as t runs:

[Animated clusters of temperatures](https://www.youtube.com/watch?v=K32yn9iMz0c)

You get **regions**, each colored differently, and these make the new nodes for your network model. Please note that I used only a few clusters; in reality we will use a larger number of clusters, hence nodes.

You could use a different distance metric for the clustering.

What remains, and what I need to learn from you, is this: once you have these regions as nodes, how do you connect them to each other, and how do you assign numbers to these links?

Dara
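The recipe above (tuples of (long, lat, temp), clustered at each time step) can be sketched roughly as follows. The post does not say which clustering algorithm was used, so plain k-means with Euclidean distance is an assumption here, and the `snapshot` data are invented for illustration.

```python
import random

def kmeans(points, k, iterations=50, seed=0):
    """Plain k-means (Lloyd's algorithm) with squared Euclidean distance.

    points: list of equal-length tuples, e.g. (lon, lat, temp)
    Returns (labels, centers).
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep the center of an empty cluster
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return labels, centers

# Hypothetical snapshot at one time step: (lon, lat, temp in deg C).
snapshot = [
    (-150.0, 5.0, 28.0), (-148.0, 4.0, 27.5), (-152.0, 6.0, 28.4),
    (10.0, 70.0, -5.0), (12.0, 72.0, -6.5), (8.0, 71.0, -4.8),
]
labels, centers = kmeans(snapshot, k=2)
```

Running this once per time step and coloring points by label would give an animation of the kind linked above. Note that the tuple mixes degrees with temperatures, so in practice the columns would probably need rescaling, and swapping the distance expression is where a different metric would enter.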
4.

[Animated clusters of temperatures](https://www.youtube.com/watch?v=K32yn9iMz0c)

Towards the end of the animation you can see clusters that include the poles and Atlantic regions sandwiching the equator! This is an indication that the poles participate in the weather system even around the equator. This could not be seen with the naked eye and requires machine-learning clustering.

But oddest of all clusters is that of the yellowish coast of Chile i.e. it is always there sometimes as a narrow protrusion.
5.

John, <i>"But this grid is artificial, and near the north and south poles the grid “rectangles” - not really rectangles - become very small. Maybe someday people will use a better system."</i>

I generally use a triangular mesh joining stations, for display and now for analysis. Process described <a href="http://www.moyhu.blogspot.com.au/2014/10/new-ideas-on-templs-reporting-mesh.html">here</a>. More <a href="http://www.moyhu.blogspot.com.au/2014/10/a-catch-up-on-templs.html">here</a>. It amounts to weighting nodes by the area of their triangles. <a href="http://moyhu.blogspot.com.au/2014/01/just-60-global-stations-area-weighting.html">Here</a> is an example of Voronoi usage. I find for most purposes that connecting centroids rather than circumcentres is easier and about as good.
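As a rough illustration of weighting nodes by the area of their triangles, here is a planar sketch. It is not the code behind the linked posts: the real mesh lives on a sphere, and splitting each triangle's area equally among its three corners is an assumption made here. The station coordinates, mesh, and temperatures are invented.

```python
def triangle_area(p, q, r):
    """Area of a planar triangle from its vertex coordinates (shoelace formula)."""
    return abs((q[0] - p[0]) * (r[1] - p[1]) - (r[0] - p[0]) * (q[1] - p[1])) / 2.0

def node_weights(points, triangles):
    """Give each node one third of the area of every triangle it belongs to."""
    weights = [0.0] * len(points)
    for i, j, k in triangles:
        share = triangle_area(points[i], points[j], points[k]) / 3.0
        for node in (i, j, k):
            weights[node] += share
    return weights

def weighted_mean(values, weights):
    """Area-weighted average of the station values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical stations at the corners of a unit square, meshed into two triangles.
stations = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
mesh = [(0, 1, 2), (0, 2, 3)]
weights = node_weights(stations, mesh)
temps = [10.0, 12.0, 14.0, 12.0]
average = weighted_mean(temps, weights)
```

Stations 0 and 2 sit on the shared diagonal and belong to both triangles, so they receive twice the weight of stations 1 and 3.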
6.

Nick has shown Voronoi diagrams on the surface of the planet. The only thing I would add is that if you animate these diagrams in time, you can see the Voronoi cells change. This gives us a dynamically changing graph. My gut feeling is that these cells are best suited to local changes in climate systems.

I prefer the clustering approach, since it does not require the cells to be convex and so on, so the algorithm is more general. This approach, also animated in time, yields a dynamical system in which the nodes (the clusters) change, being different subsets of the surface of the planet.

In either case, the grid is replaced by some sort of clustering, which is a far more reasonable way to partition the data on the planet. More importantly, as I showed with k-NN regression, the non-Euclidean metrics gave the best forecasts, so they (the non-Euclidean metrics) should give the best clustering of the climate data as well.
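The post does not say which non-Euclidean metrics were tried. As one example of a non-Euclidean choice that suits data on a sphere, here is a sketch comparing great-circle (haversine) distance with naive Euclidean distance in degrees. The function names and the sample points are assumptions for illustration.

```python
import math

def great_circle_km(lon1, lat1, lon2, lat2, radius_km=6371.0):
    """Haversine distance between two (lon, lat) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to guard against rounding slightly above 1 for antipodal points.
    return 2 * radius_km * math.asin(math.sqrt(min(1.0, a)))

def euclidean_deg(lon1, lat1, lon2, lat2):
    """Naive Euclidean distance, treating (lon, lat) as flat coordinates in degrees."""
    return math.hypot(lon2 - lon1, lat2 - lat1)

# Two equatorial points on either side of the date line (hypothetical).
flat = euclidean_deg(179.0, 0.0, -179.0, 0.0)
curved = great_circle_km(179.0, 0.0, -179.0, 0.0)
```

The flat metric puts these points 358 degrees apart, while on the sphere they are only 2 degrees of arc apart, so a clustering based on the flat metric would never group them together.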

7.

Dara, in relation to comment #5,

I think you are using raw temperatures, not anomalies. Your clustering is therefore detecting things like **high latitudes are cooler than low latitudes** and **high altitudes are cooler than low altitudes** and **winters are cooler than summers** and perhaps **low latitudes have less variation than high latitudes**. All very true and important, but not much use for predicting climate. The clusters that emerge from anomalies would be more interesting.

> But oddest of all clusters is that of the yellowish coast of Chile i.e. it is always there sometimes as a narrow protrusion.

I reckon it's the Andes. High altitude, low latitude, near the ocean.
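For concreteness, anomalies in the sense used above can be sketched as raw values minus the long-term mean for the same calendar month at the same location. Using the whole record as the baseline period is an assumption here, and the sample `records` are invented.

```python
from collections import defaultdict

def monthly_anomalies(records):
    """Subtract each calendar month's mean from the raw values.

    records: list of (year, month, temperature) for one location
    Returns a list of (year, month, anomaly) in the same order.
    """
    by_month = defaultdict(list)
    for _, month, temp in records:
        by_month[month].append(temp)
    # Climatology: mean temperature for each calendar month over the whole record.
    climatology = {m: sum(v) / len(v) for m, v in by_month.items()}
    return [(y, m, t - climatology[m]) for y, m, t in records]

# Hypothetical record for one grid point: two Januaries and two Julys.
records = [(2000, 1, -4.0), (2000, 7, 18.0), (2001, 1, -2.0), (2001, 7, 19.0)]
anomalies = monthly_anomalies(records)
```

The seasonal cycle drops out: the large January-to-July difference disappears, and what remains is how each month departs from its own norm. Clustering on these values instead of raw temperatures would target the patterns described above.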
8.

Hello Graham

I used the Euclidean metric, but the clustering algorithm could have used a non-Euclidean metric instead, and the clusters would tell another tale.

Even with the Euclidean metric, the clusters are not as simple and straightforward as you describe; that is why, when used on images, they find all sorts of hidden structures.
