The blog post says:

> A classic example is that you have some sort of device or process which can produce the numbers 1 to 6 with probabilities $p_1, \dots, p_6$, and the only thing you know about the device or process is that the mean is $m$. You can then use the principle of maximum entropy to choose the $p_i$. The result is of form $p_i = \alpha\beta^i$ and using $\sum_i p_i = 1$ and $\sum_i i\,p_i = m$ you can solve (numerically) for $\alpha$ and $\beta$.

> Why dangerous? Suppose $m = 1.01$. The result has $p_6 \approx 10^{-10}$.

I am not sure I see the problem. A die has to be extremely biased to have m=1.01, and that is exactly what the maximum entropy principle (MEP) produces.

What this seems to indicate is that one should make doubly sure that the true mean really is 1.01 before using it to estimate the distribution.

Hardcore Bayesians would probably say that the prior probability of m=1.01 is extremely low.

The method proposed in the article estimates the distribution as

(0.9954333, 0.002, 0.001, 0.0006667, 0.0005, 0.0004)

while the MEP estimate is

(0.990, 0.00980, 0.0000971, 9.61e-7, 9.51e-9, 9.42e-11)
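For anyone who wants to reproduce the MEP figures, here is a minimal sketch of the numerical solve described in the quote. It assumes the form p_i = αβ^i and finds β by bisection on log(β); the function name `maxent_die`, the bracket of ±30 and the iteration count are my own choices, not anything from the blog post.

```python
import math

def maxent_die(m, faces=6, iterations=200):
    """Return p_1..p_faces of the form alpha * beta**i whose mean is m."""
    if not 1 < m < faces:
        raise ValueError("mean must lie strictly between 1 and the number of faces")
    values = range(1, faces + 1)

    def mean_for(log_beta):
        beta = math.exp(log_beta)
        weights = [beta**i for i in values]
        return sum(i * w for i, w in zip(values, weights)) / sum(weights)

    # The mean increases monotonically with beta, so bisect on log(beta).
    # The bracket [-30, 30] is an assumption; it covers all but extreme means.
    lo, hi = -30.0, 30.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if mean_for(mid) < m:
            lo = mid
        else:
            hi = mid

    beta = math.exp((lo + hi) / 2)
    weights = [beta**i for i in values]
    total = sum(weights)  # alpha is 1 / total
    return [w / total for w in weights]

p = maxent_die(1.01)
print([f"{x:.3g}" for x in p])
```

Bisection is used only to keep the sketch dependency-free; any one-dimensional root finder should do the same job.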

The alternative estimate balances the probabilities of rolling 2 through 6 more evenly, but at the expense of roughly halving the total probability of getting any of them (from about 0.0099 to about 0.0046), since the probability of 1 goes from 0.990 to 0.995.

This means that the average "surprise" per roll (the entropy) is lower with the proposed alternative, since the risk when betting on 1 is halved (and of course because average surprise is exactly what MEP maximizes).

So the proposed alternative really does assume more, at least according to this criterion.
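The entropy claim can be checked directly. The sketch below takes the two distributions as quoted in this comment (the MEP figures are rounded, so they do not sum to exactly 1) and computes the Shannon entropy of each in nats; the helper names `mean` and `entropy` are just illustrative.

```python
import math

# Values copied from the comment above; the MEP figures are rounded.
alternative = [0.9954333, 0.002, 0.001, 0.0006667, 0.0005, 0.0004]
mep = [0.990, 0.00980, 0.0000971, 9.61e-7, 9.51e-9, 9.42e-11]

def mean(p):
    """Expected face value for probabilities listed for faces 1, 2, ..."""
    return sum(i * x for i, x in enumerate(p, start=1))

def entropy(p):
    """Shannon entropy in nats, i.e. the average surprise per roll."""
    return -sum(x * math.log(x) for x in p if x > 0)

for name, p in (("alternative", alternative), ("MEP", mep)):
    print(f"{name}: sum={sum(p):.4f} mean={mean(p):.4f} entropy={entropy(p):.4f} nats")
```

Both distributions have a mean of about 1.01, but the entropy comes out at about 0.056 nats for the MEP estimate against about 0.036 nats for the alternative.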
