> Finally, P(M = m) is independent of our choice of settings. So, we can use Bayes' rule to compute P(S = s|M = m) up to a constant factor. And since probabilities must sum to 1, we can figure out this constant.

The important thing about P(M = m) is that it is constant (at least for the duration of the analysis), because m is the "hard disc full of observed data", which does not change. And while the constant factor can be figured out in principle, it is rarely done in practice (it would take too long). A nice thing about MCMC is that it can explore an unnormalized density, and you can extract lots of useful information from the sampled values without needing to know the constant.
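To make the normalization point concrete, here is a minimal sketch with a discrete settings variable. All the numbers (the prior and the likelihoods) are made up for illustration; the point is only that Bayes' rule gives the posterior up to a constant, and summing to 1 recovers that constant:

```python
# Hypothetical discrete example: three candidate settings s with
# made-up prior P(S = s) and likelihood P(M = m | S = s) values.
prior = {"s1": 0.5, "s2": 0.3, "s3": 0.2}
likelihood = {"s1": 0.02, "s2": 0.10, "s3": 0.01}

# Bayes' rule up to a constant: posterior is proportional to
# likelihood * prior.
unnormalized = {s: likelihood[s] * prior[s] for s in prior}

# Because probabilities must sum to 1, the constant is just the sum
# of the unnormalized values -- this sum is P(M = m).
Z = sum(unnormalized.values())
posterior = {s: w / Z for s, w in unnormalized.items()}

print(posterior)
```

With continuous settings the sum becomes an integral that is usually intractable, which is exactly why the MCMC route of working with the unnormalized density is attractive.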




> The key to making this work is that at each step on the walk a proposed modification s' to the current settings s is generated randomly - but it may be rejected if it does not seem to improve the estimates. The essence of the rule is: The modification s to s' is randomly accepted with a probability equal to the ratio
> P(M = m|S = s')/P(M = m|S = s)
> Otherwise the walk stays at the current position. If the modification is better, so that the ratio is greater than 1, the new state is always accepted.


I suggest replacing this with:

At each step on the walk a proposed modification s' to the current settings s is generated randomly, and it is then either accepted or rejected. If P(M = m|S = s') is bigger than P(M = m|S = s), the modification is accepted and the walk steps to s'. Otherwise, the modification from s to s' is randomly accepted with probability equal to the ratio
P(M = m|S = s')/P(M = m|S = s).
If the proposed modification is rejected, the walk stays at the current position s. It is not obvious that this produces a walk with the right statistical properties, but it can be proved for many kinds of proposed modifications [Metropolis et al., 1953].
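The accept/reject rule in the suggested wording can be sketched in a few lines. This is a minimal illustration, not the paper's method: the `likelihood` function standing in for P(M = m|S = s), the Gaussian proposal, and the one-dimensional setting s are all made-up choices for the sketch:

```python
import math
import random

def likelihood(s):
    # Hypothetical stand-in for P(M = m | S = s): an unnormalized
    # bell curve peaked at s = 3 over a single real-valued setting.
    return math.exp(-0.5 * (s - 3.0) ** 2)

def metropolis_step(s, step_size=1.0):
    """One step of the walk: propose s', then accept or reject it."""
    s_prime = s + random.gauss(0.0, step_size)   # random proposed modification
    ratio = likelihood(s_prime) / likelihood(s)
    if ratio >= 1.0:
        return s_prime           # a better state is always accepted
    if random.random() < ratio:  # otherwise accept with probability = ratio
        return s_prime
    return s                     # rejected: stay at the current position

# Run the walk and use the samples without ever normalizing the density.
random.seed(0)
s = 0.0
samples = []
for _ in range(20000):
    s = metropolis_step(s)
    samples.append(s)

mean = sum(samples) / len(samples)
print(round(mean, 2))  # close to 3.0, the peak of the toy likelihood
```

Note that only the ratio of likelihoods ever appears, so the unknown normalizing constant cancels out, which is the property the earlier comment about P(M = m) relies on.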



Figure 4 is very nice!