This is a review paper that explains why mean-squared variance behaves similarly to Shannon entropy, as applied in comment #5.

[The Energy of Data](https://doi.org/10.1146/annurev-statistics-060116-054026)

> "The energy of data is the value of a real function of distances between data
in metric spaces. The name energy derives from Newton’s gravitational potential energy, which is also a function of distances between physical objects.
One of the advantages of working with energy functions (energy statistics) is
that even if the data are complex objects, such as functions or graphs, we can
use their real-valued distances for inference. Other advantages are illustrated
and discussed in this review. Concrete examples include energy testing for
normality, energy clustering, and distance correlation."
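
To make the abstract's "real-valued distances for inference" concrete, here is a minimal sketch of the two-sample energy distance, \(\mathcal{E}_{n,m} = 2\,\overline{\lVert x_i - y_j\rVert} - \overline{\lVert x_i - x_k\rVert} - \overline{\lVert y_j - y_l\rVert}\), the standard statistic from the energy-statistics literature. The function name and the toy normal samples are my own illustrations, not from the paper:

```python
import numpy as np
from scipy.spatial.distance import cdist

def energy_distance(x, y):
    """Sample energy distance between two samples x and y.

    E_{n,m} = 2 * mean||x_i - y_j|| - mean||x_i - x_k|| - mean||y_j - y_l||,
    computed from pairwise Euclidean distances. It is non-negative, and the
    population version is zero exactly when the two distributions coincide.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim == 1:          # treat 1-D input as d = 1
        x = x[:, None]
    if y.ndim == 1:
        y = y[:, None]
    a = cdist(x, y).mean()   # mean cross-distance,  estimates E||X - Y||
    b = cdist(x, x).mean()   # mean within-distance, estimates E||X - X'||
    c = cdist(y, y).mean()   # mean within-distance, estimates E||Y - Y'||
    return 2.0 * a - b - c

# Same distribution vs. a shifted one:
rng = np.random.default_rng(0)
print(energy_distance(rng.normal(size=300), rng.normal(size=300)))           # near 0
print(energy_distance(rng.normal(size=300), rng.normal(loc=2.0, size=300)))  # clearly > 0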

![](https://imagizer.imageshack.com/img922/7094/4wVfI3.png)

>"The duality between powers of distances and their Fourier transforms is similar
to the duality between probability density functions of random variables and their characteristic functions (especially of normal distributions whose probability density functions have the same form as their characteristic functions). This duality was called a “beautiful theorem of probability theory (Sch¨ones Theorem der Wahrscheinlichkeitrechnung)” by Gauss (Fischer 2011, p. 46)."
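
This duality can be made explicit for the energy distance itself: by the Székely–Rizzo representation, the distance-based definition and a weighted L² distance between characteristic functions are two forms of the same quantity. The notation below is mine (X′, Y′ are iid copies of X, Y, and \(\hat f\) denotes a characteristic function):

```latex
\mathcal{E}(X,Y)
  = 2\,\mathbb{E}\lVert X-Y\rVert - \mathbb{E}\lVert X-X'\rVert - \mathbb{E}\lVert Y-Y'\rVert
  = \frac{1}{c_d}\int_{\mathbb{R}^d}
      \frac{\lvert \hat f_X(t) - \hat f_Y(t)\rvert^{2}}{\lVert t\rVert^{d+1}}\,dt,
\qquad
c_d = \frac{\pi^{(d+1)/2}}{\Gamma\!\left(\frac{d+1}{2}\right)}
```

The left side is a function of distances only; the right side is a weighted distance between Fourier transforms. The identity also shows \(\mathcal{E}(X,Y) \ge 0\), with equality precisely when X and Y are identically distributed, which is what makes the energy tests mentioned in the abstract possible.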