# Experiments with Bayesian priors and maximum entropy

I added the page Experiments with Bayesian priors and maximum entropy, where I put my own answers (two infinite sets of them!) to the question I asked in the thread 'Understanding Occam's razor in modern science/statistics'.

1.

Interestingly, a statistician colleague recently forwarded me the Silver et al. manuscript on quantum entropy, along with some related tech reports by the same authors. He was interested in possibly resurrecting their unpublished work; a top statistics journal was interested in an early version years ago, but the authors never revised it.

I only skimmed it and didn't follow the details, but I gather that the mathematical properties of quantum entropy act as some kind of "regularization" (e.g. imposing smoothness constraints) on maximum entropy inference. I told him that to write something up, he'd probably have to compare it against modern methods (e.g., in image processing, spatial Bayesian models estimated with Gibbs sampling have emerged in subsequent years as competitors to MaxEnt). I'm not sure how much in vogue MaxEnt still is nowadays. I also mentioned I knew a blog where people discuss quantum entropy ...

2.

I only skimmed it too. I came across the Silver reference in Bayesian Methods by Leonard and Hsu, pp. 249-250. It says Silverman (not Silver) and Leonard have suggested regularizing a density $g$ by minimising things including terms like

$$\int \left[ \frac{\partial^2 g}{\partial x^2} \right]^2 d x$$

which is quantum-theory-ish.
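To get a feel for what a penalty term like this measures, here is a minimal sketch that approximates the integral numerically. The function name `roughness_penalty`, the grid size, and the choice of Gaussian test densities are my own assumptions for illustration, not anything taken from Leonard and Hsu or from Silverman.

```python
import math

def roughness_penalty(g, a, b, n=2001):
    """Approximate the integral of (g''(x))^2 over [a, b].

    Uses central second differences on a uniform grid and a simple
    Riemann sum over the interior grid points.
    """
    h = (b - a) / (n - 1)
    gs = [g(a + k * h) for k in range(n)]
    total = 0.0
    for k in range(1, n - 1):
        second = (gs[k - 1] - 2.0 * gs[k] + gs[k + 1]) / (h * h)
        total += second * second * h
    return total

def normal_pdf(x, sigma=1.0):
    """Density of a zero-mean normal with standard deviation sigma."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

# Hypothetical test densities: a wide bump and a narrow, sharper one.
smooth = roughness_penalty(lambda x: normal_pdf(x, 1.0), -8.0, 8.0)
wiggly = roughness_penalty(lambda x: normal_pdf(x, 0.5), -8.0, 8.0)
print(smooth, wiggly)
```

The narrower density has much larger curvature, so it gets a much larger penalty; adding such a term to an objective pushes the estimate of $g$ towards smoother shapes.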

3.

I guess one question is how much does quantum math bring to the problem? Does it suggest regularization terms that are different/better than what already exists? Smoothness penalty terms aren't new (nowadays, at least).

4.
edited June 2014

The blog post says:

> A classic example is that you have some sort of device or process which can produce the numbers 1 to 6 with probabilities $p_1, \dots, p_6$, and the only thing you know about the device or process is that the mean is $m$. You can then use the principle of maximum entropy to choose the $p_i$. The result is of form $p_i = \alpha \beta^i$ and using $\sum_i p_i = 1$ and $\sum_i i p_i = m$ you can solve (numerically) for $\alpha$ and $\beta$.
>
> Why dangerous? Suppose $m = 1.01$. The result has $p_6 \approx 10^{-10}$.

I am not sure I see the problem. A die has to be extremely biased to have m=1.01, and that is what the MEP produces. What this seems to indicate is that one would want to make doubly sure that the true mean really is 1.01 before using it to estimate the distribution. Hardcore Bayesians would probably say that the prior for m=1.01 is extremely low.
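For anyone who wants to check the numbers, here is a rough sketch of the numerical solve mentioned in the quote. It assumes the maximum entropy solution has the form $p_i = \alpha \beta^i$ and finds $\beta$ by bisection on $\log \beta$; the function name `maxent_die` and the bracketing interval are my own choices, not something from the wiki page.

```python
import math

def maxent_die(mean, faces=6):
    """Maximum-entropy distribution on 1..faces with the given mean.

    Assumes the solution has the form p_i = alpha * beta**i. The mean is
    an increasing function of beta, so beta is found by bisection on
    log(beta); alpha then follows from normalisation.
    """
    if not 1.0 < mean < faces:
        raise ValueError("mean must lie strictly between 1 and faces")
    values = range(1, faces + 1)

    def mean_for(log_beta):
        weights = [math.exp(log_beta * i) for i in values]
        return sum(i * w for i, w in zip(values, weights)) / sum(weights)

    lo, hi = -50.0, 50.0          # assumed bracket for log(beta)
    for _ in range(200):          # far more halvings than double precision needs
        mid = 0.5 * (lo + hi)
        if mean_for(mid) < mean:
            lo = mid
        else:
            hi = mid
    beta = math.exp(0.5 * (lo + hi))
    weights = [beta ** i for i in values]
    alpha = 1.0 / sum(weights)
    return [alpha * w for w in weights]

p = maxent_die(1.01)
for face, prob in enumerate(p, start=1):
    print(face, f"{prob:.3e}")
```

With a mean of 1.01 the probabilities fall off by a factor of about a hundred per face, which is where the tiny value for $p_6$ comes from.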

The method proposed in the article estimates the distribution as

(.9954333,.002,.001,.0006667,.0005,.0004)

while the MEP estimate is

{0.990, 0.00980, 0.0000971, 9.61×10^-7, 9.51×10^-9, 9.42×10^-11}

The alternative estimate balances the probabilities of the 2-6 rolls more, but at the expense of roughly halving the probability of getting any of them: the probability of 1 goes from .990 to .995, so the probability of not rolling a 1 drops from about .010 to about .0046. This means that the average "surprise" per roll is lower with the proposed alternative, since the risk when betting on 1 is halved (and of course because average surprise, i.e. entropy, is what the MEP maximizes). So the proposed alternative really does assume more, at least according to this criterion.
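As a quick check of the "average surprise" claim, here is a small sketch that computes the Shannon entropy of both estimates, using the rounded figures quoted in this comment. The variable names are mine, and since the figures are rounded I renormalise them first, so treat the output as approximate.

```python
import math

# Figures as quoted in the post (rounded, so they need not sum to exactly 1).
proposed = [0.9954333, 0.002, 0.001, 0.0006667, 0.0005, 0.0004]
mep = [0.990, 0.00980, 0.0000971, 9.61e-7, 9.51e-9, 9.42e-11]

def normalise(p):
    """Rescale so the probabilities sum to 1 (undoes rounding drift)."""
    total = sum(p)
    return [x / total for x in p]

def entropy(p):
    """Shannon entropy in nats: the average surprise -log(p_i) per roll."""
    return -sum(x * math.log(x) for x in p if x > 0.0)

proposed = normalise(proposed)
mep = normalise(mep)

h_proposed = entropy(proposed)
h_mep = entropy(mep)
not_one_proposed = 1.0 - proposed[0]
not_one_mep = 1.0 - mep[0]

print(f"entropy: proposed {h_proposed:.4f} nats, MEP {h_mep:.4f} nats")
print(f"P(not 1): proposed {not_one_proposed:.5f}, MEP {not_one_mep:.5f}")
```

The MEP estimate comes out with the higher entropy, as it must for a fixed mean, and the probability of rolling anything other than 1 is a bit less than half as large under the proposed alternative.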

5.

By the way, it's not a blog post - it's one of the many Experiments pages on the Azimuth Wiki, where individuals pursue little research projects. I'm eager for people to refine these into Blog articles in progress so we can post them.
