The argument I gave above is a bit awkward, mainly because I've primarily seen this approach used for finding a noisy version of one signal within another, in which case you can argue that the sums of squares of both signals are constant and of the same magnitude. I didn't want to edit the blog post while John's working on it, but I've tried to clear up the logic below so that, if it's OK, it can just be pasted in.

----------

Another way to understand correlation



David Tweed mentioned another approach from signal processing to understanding the quantity

$$ \langle x y \rangle = \frac{1}{n} \sum_{i = 1}^n x_i y_i $$

If we've got two lists of data $x$ and $y$ that we want to compare to see if they behave similarly, the first thing we ought to do is multiplicatively scale each one so they're of comparable magnitude. There are various ways to choose the scale, but a reasonable one is to ensure they have equal 'energy':

$$ \sum_{i=1}^n x_i^2 = \sum_{i=1}^n y_i^2 $$

(This can be achieved by dividing each list by its standard deviation, which is equivalent to what was done in the main derivation above.) Once we've done that, it's clear that looking at

$$\sum_{i=1}^n (x_i-y_i)^2$$

gives small values when the two lists match closely and progressively bigger values as they become less similar. Observe that

$$
\sum_{i=1}^n (x_i-y_i)^2 =\sum_{i=1}^n (x_i^2 - 2 x_i y_i + y_i^2)
$$
$$
=\sum_{i=1}^n x_i^2 - 2 \sum_{i=1}^n x_i y_i + \sum_{i=1}^n y_i^2
$$

Since we've scaled things so that $\sum_{i=1}^n x_i^2$ and $\sum_{i=1}^n y_i^2$ are constants, we can see that when $\sum_{i=1}^n x_i y_i$ becomes bigger,

$$\sum_{i=1}^n (x_i-y_i)^2$$

becomes smaller. So,

$$\sum_{i=1}^n x_i y_i$$

serves as a measure of how close the lists are, under these assumptions.
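
To tie this back to the quantity $\langle x y \rangle$ at the top (this step wasn't spelled out above; it assumes the means have already been subtracted, as in the main derivation, so that dividing by the standard deviations makes both sums of squares equal to $n$), the same expansion gives

$$ \sum_{i=1}^n (x_i-y_i)^2 = 2n - 2 \sum_{i=1}^n x_i y_i, \qquad \textrm{i.e.} \qquad \langle x y \rangle = \frac{1}{n} \sum_{i=1}^n x_i y_i = 1 - \frac{1}{2n} \sum_{i=1}^n (x_i-y_i)^2 $$

so the correlation is exactly $1$ minus half the mean squared difference between the two normalised lists.

If anyone wants to check this numerically, here's a minimal sketch (it uses NumPy, and the data and variable names are made up purely for illustration):

    # A quick numerical check of the identity above -- an illustrative
    # sketch, not part of the argument; assumes NumPy and made-up data.
    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=1000)
    y = 0.7 * x + 0.3 * rng.normal(size=1000)   # a noisy copy of x

    # Subtract the means and divide by the standard deviations, so that
    # sum(x**2) == sum(y**2) == n, as assumed above.
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    n = len(x)

    corr = np.mean(x * y)                        # <xy>
    dist = 1 - np.sum((x - y) ** 2) / (2 * n)    # 1 - (1/2n) * sum of squared differences
    print(corr, dist)                            # the two numbers agree (up to rounding)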