Here's the bigger context:

I'm trying to build an approximation to a complex process in terms of a discretised probabilistic transition over a relatively large space (say $256^4 = 2^{32}$ states). (In equations I won't bother using boldface for vectors, but assume all states are vectors.) The "exact" transformation is given by

$$p_t(x_i)=\sum_{x_j\in \Omega} p_{t-1}(x_j) p(x_i | x_j)$$

(where for current purposes let's say we only care about one timestep). By simple direct simulation I can calculate a good approximation to the scalar value $p(x_i \mid x_j)$ for a particular choice of $i$ and $j$. But there's no way the full $2^{32} \times 2^{32}$ transition matrix can be evaluated. Maybe simulating $2^{32}$ elements is possible in reasonable time, which is a large number but still a tiny fraction of the full transition matrix.
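For concreteness, here's a minimal sketch of what I mean by "simple direct simulation" for a single entry. The simulator `simulate_one_step` is a hypothetical stand-in for my actual process (the real thing is far more expensive per call), so treat this as illustration only:

```python
import numpy as np

def estimate_transition_prob(x_j, x_i, simulate_one_step, n_samples=100_000, tol=0):
    """Monte Carlo estimate of p(x_i | x_j) for one (i, j) pair.

    `simulate_one_step(x_j, rng)` is a placeholder for the real simulator:
    it takes a state vector x_j and returns one sampled successor state.
    `tol` allows exact-match (tol=0) or near-match counting, depending on
    whether the discretised states line up exactly.
    """
    rng = np.random.default_rng(0)
    hits = 0
    for _ in range(n_samples):
        x_next = simulate_one_step(x_j, rng)
        if np.all(np.abs(np.asarray(x_next) - np.asarray(x_i)) <= tol):
            hits += 1
    return hits / n_samples
```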

So one common approximation would be to assume that the transition function is reasonably "well-behaved" and to approximate it as a sum of Normals (i.e. Gaussians):

$$p(x_i \mid x_j) \sim \sum_k w_k \, N(\mu_k, \Sigma_k)$$

There's no deep reason behind this; it's just a choice that can represent any distribution in the limit of infinitely many Normals (I think, modulo some constraints to rule out really pathological cases), and it's easy to see how to estimate it just by keeping track of _sufficient statistics_: decide which Normal a given sample belongs to (ignore that complication for now) and compute the sample mean and sample covariance over all the calculated $\{p(x_i|x_j)\}$ that have been assigned to that Normal.

_In principle_ this is a well-understood, simple thing, but in practice it requires computing the covariance of a large dataset (let's say somewhere in the range of $2^{16}$ to $2^{20}$ sample values), and that's where floating point artifacts can accumulate (as terms are added to the sums) into large _relative accuracy_ errors as the number of samples grows. (To be clear: I'm not worried about being super-accurate; I'm worried about ending up with virtually no accuracy at all if I do the naive obvious thing. You can google floating point summation algorithms to see the issues that arise even in simple cases.)
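To make the sufficient-statistics part concrete, here's the kind of accumulator I have in mind: a Welford-style online update of the mean and covariance, which avoids the catastrophic cancellation you get from the naive $E[x^2] - E[x]^2$ approach. This is only a sketch under my assumptions (one accumulator per Normal, 4-dimensional state vectors, component assignment handled elsewhere):

```python
import numpy as np

class OnlineMeanCov:
    """Numerically stable running mean and covariance (Welford-style).

    Accumulates outer products of *centred* deviations rather than raw
    sums of x and x*x, which is the standard way to avoid catastrophic
    cancellation in the naive E[x^2] - E[x]^2 formula.  One instance
    per mixture component; samples are d-dimensional state vectors
    (here d = 4 for 256^4 states).
    """

    def __init__(self, dim):
        self.n = 0
        self.mean = np.zeros(dim)
        self.M2 = np.zeros((dim, dim))   # sum of outer products of deviations

    def update(self, x):
        x = np.asarray(x, dtype=np.float64)
        self.n += 1
        delta = x - self.mean            # deviation from the *old* mean
        self.mean += delta / self.n
        delta2 = x - self.mean           # deviation from the *new* mean
        self.M2 += np.outer(delta, delta2)

    def covariance(self, ddof=1):
        if self.n < 2:
            raise ValueError("need at least two samples")
        return self.M2 / (self.n - ddof)
```

The point of this structure is that the quantities being accumulated stay small relative to the running totals (deviations from the current mean, rather than raw sums), which is where the relative-accuracy loss in the naive approach comes from.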

Given the amount of effort that's going into the simulation stage, I'm trying to follow "best practice for floating point algorithms" to avoid blowing all that work on poor estimates of the normal distributions.
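For reference, the sort of thing I mean by "best practice" at the summation level is compensated (Kahan) summation; pairwise summation or Python's `math.fsum` would be alternatives. A minimal sketch:

```python
def kahan_sum(values):
    """Compensated (Kahan) summation: keeps a running correction term so
    that low-order bits lost when adding a small value to a large running
    total are carried forward instead of silently discarded."""
    total = 0.0
    c = 0.0                  # running compensation for lost low-order bits
    for v in values:
        y = v - c
        t = total + y
        c = (t - total) - y  # (t - total) recovers the high-order part of y
        total = t
    return total
```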

Any insights or ideas gratefully accepted.