
David Tweed and I are almost done writing an article for the US undergraduate math magazine *Math Horizons*. What do you think about it?

The most frustrating part of writing this paper was trying to explain the Markov Chain Monte Carlo method without becoming overly technical. If you look at most of the articles in this magazine, you'll see they have a light and breezy tone. Ours is verging on too heavy. The version on the blog ran into this problem:

We start by saying we want to compute P(S|M), the probability that our model should have certain **S**ettings, given certain **M**easurements. Then we say we can't easily do that: we can only compute P(M|S).

Then Bayes' rule comes to our rescue by relating these two.

But then we say "Having glibly said that we can compute P(M|S), how do we actually do this?"

We describe an obvious way to do it.

But then we say this is inefficient.

Then we say Markov Chain Monte Carlo is better. We describe how that works, and we start by saying "compute the ratio P(S'|M)/P(S|M)"

At this point we've made a complete loop: we're back to acting like we can compute P(S|M)!

In a later draft we tried to quickly explain why we're not, in fact, being circular:

Note that this rule uses P(S|M) at each step, which we have to get from the P(M|S) value returned from our simulation by using Bayes' rule again. With some additional tricks, such as discarding the very beginning of the walk, this gives a set of samples which can be used to build an estimate of P(S|M).
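For readers who want to see how the apparent circularity dissolves, here is a minimal sketch of the random walk described above (not from the article; the settings and likelihood numbers are made up for illustration). The key point is that the ratio P(S'|M)/P(S|M) expands, via Bayes' rule, to P(M|S')P(S') / (P(M|S)P(S)): the unknown constant P(M) cancels, so only quantities we can actually compute ever appear.

```python
import random

# Hypothetical toy problem: 5 possible settings, and (assumed known)
# likelihoods P(M = m | S = s) for the single measurement m we observed.
likelihood = [0.05, 0.10, 0.40, 0.30, 0.15]   # P(m | s) for s = 0..4
prior      = [0.2] * 5                        # uniform prior P(s)

def posterior_ratio(s_new, s_old):
    # The Metropolis ratio P(S'|M)/P(S|M).  By Bayes' rule the unknown
    # normalizing constant P(M) cancels, so only P(M|S) and P(S) are needed.
    return (likelihood[s_new] * prior[s_new]) / (likelihood[s_old] * prior[s_old])

def mcmc(steps=200_000, burn_in=1_000, seed=0):
    rng = random.Random(seed)
    s = rng.randrange(5)
    counts = [0] * 5
    for t in range(steps):
        s_new = rng.randrange(5)              # propose a fresh setting
        if rng.random() < min(1.0, posterior_ratio(s_new, s)):
            s = s_new                         # accept the move
        if t >= burn_in:                      # discard the start of the walk
            counts[s] += 1
    total = sum(counts)
    return [c / total for c in counts]

estimate = mcmc()

# Exact posterior for comparison: normalize likelihood * prior.
z = sum(l * p for l, p in zip(likelihood, prior))
exact = [l * p / z for l, p in zip(likelihood, prior)]
```

Discarding the first `burn_in` steps is exactly the "additional trick" mentioned above: the early samples still remember the arbitrary starting point, so they are thrown away before tallying.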

Unfortunately, I think most readers will find this bewildering and disheartening.

So, in the new draft I've decided to gloss over all this stuff and give references for details.

## Comments

Very nice article! I only have two very small (and not very important) comments:

1) Why is there an "E" instead of "m" (or "M" or "m_t") on the vertical axis of Figure 2?

2) In the Introduction, both "northern Europe" and "Northern Europe" appear. I guess this is a typo, and only the latter version should appear.


I guess there is some confusion between what is easy to do conceptually and what is easy to do computationally.

Conceptually, it's hard to figure out how to even write down a formula for P(S|M), let alone evaluate it computationally, without first going through P(M|S) via Bayes's theorem. This is because we normally think of modeling problems in a "forward" way, through the data-generating process, which gives you P(M|S). Then the trick is to figure out how to invert what we know, P(M|S), to get what we don't know, P(S|M), which is where Bayes comes in.

But once you do go the Bayes route and conceptually know how to write down a formula for what you want, P(S|M), *computing* it isn't really harder than computing P(M|S); it just requires an extra calculation of the prior, P(S).
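This point can be made concrete with a tiny finite example (the settings and numbers here are hypothetical, not from the article): once P(M=m|S=s) and the prior P(S=s) are in hand for every s, getting P(S=s|M=m) is one application of Bayes' rule plus one normalization.

```python
# Hypothetical three-setting illustration: the forward probabilities
# P(M=m|S=s) and the prior P(S=s) are assumed known for the observed m.
settings = ["low", "medium", "high"]
prior = {"low": 0.5, "medium": 0.3, "high": 0.2}          # P(s)
likelihood = {"low": 0.1, "medium": 0.6, "high": 0.3}     # P(m | s), m fixed

# Bayes' rule: P(s|m) = P(m|s) P(s) / P(m),
# where P(m) = sum over s of P(m|s) P(s)  -- the one extra calculation.
evidence = sum(likelihood[s] * prior[s] for s in settings)
posterior = {s: likelihood[s] * prior[s] / evidence for s in settings}
```

The division by `evidence` is what turns "known up to a constant" into an actual probability distribution over the settings.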

> To keep things simple, suppose S is an element of some finite set.

Suggest: To keep things simple, suppose S can only take a finite number of values.

> The simplest way would be to run our model many, many times with the settings set at S = s and determine the fraction of times it predicts measurements consistent with m.

consistent means equal? close?

> One thing we can more easily do is repeatedly run our model with randomly chosen settings and see what measurements it predicts. By doing this, we can compute the probability that given setting values S = s, the model predicts measurements M = m.

This confuses me. It sounds like you are running a simulation for each s. Are you really? I would expect you to be running a program which calculates P(M=m|S=s) for fixed m and each s. And I guess P(M=m|S=s) is a product of things like $P(M_2=m_2|S=s,M_1=m_1)$ derived from eq 1.

> Then, since probabilities must sum to 1, we can figure out this constant.

I would either omit this, or explain why someone might want to know the constant.

> Figure 3: Estimates of the conditional probability P(S = s|M = m), where at each step we run the model at all possible settings, and we keep a running total of the fraction of times each measurement s occurs when the setting is m.

"setting s occurs when the measurement is m."?

And "run the model" makes me nervous again.

It might be better to introduce Markov Chain Monte Carlo for an arbitrary density $f(x)$. First say it's a clever algorithm for obtaining a set of samples from $f$. Then say it only requires calculating ratios like $f(x')/f(x)$, so it produces the same samples if $f$ is multiplied by a constant, and you don't even need to know what the constant is. Then say that's just what is needed to sample from the posterior, with 1/P(M=m) being the unknown constant.
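Graham's constant-cancellation point is easy to demonstrate in code: a random-walk Metropolis sampler driven only by ratios $f(x')/f(x)$ produces bit-for-bit identical chains for $f$ and any constant multiple of $f$. A sketch with an unnormalized Gaussian (the scale factor 1024 is a power of two, so the cancellation is exact even in floating point):

```python
import math
import random

def metropolis(f, steps=1000, seed=42):
    # Random-walk Metropolis for a 1-d density f (possibly unnormalized):
    # only the ratio f(x_new)/f(x) ever enters the accept/reject decision,
    # so any constant multiplying f cancels out.
    rng = random.Random(seed)
    x, samples = 0.0, []
    for _ in range(steps):
        x_new = x + rng.gauss(0.0, 1.0)           # propose a nearby point
        if rng.random() < min(1.0, f(x_new) / f(x)):
            x = x_new                              # accept the move
        samples.append(x)
    return samples

f = lambda x: math.exp(-x * x / 2)            # standard normal, up to a constant
g = lambda x: 1024.0 * math.exp(-x * x / 2)   # the same density, scaled by 1024

assert metropolis(f) == metropolis(g)         # identical chains: constant cancels
```

This is exactly why the sampler can target the posterior without ever knowing P(M=m).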


Thanks for all the comments and corrections! Here are just a few comments, mainly in response to Graham:

> Suggest: To keep things simple, suppose S can only take a finite number of values.

For normal people, definitely. For math majors (e.g., me), finite sets have a certain charm.

> consistent means equal? close?

I had written "equal" here, which is conceptually nice, but David said that's never how it works in practice, so he opted for "consistent", which is deliberately vague. "Close" might be better, though then people will wonder "how close?"

> It sounds like you are running a simulation for each s. Are you really?

In our toy example we actually do that to produce Figure 3.

> I would either omit this, or explain why someone might want to know the constant.

The idea was to quickly sketch how you could work out $P(S|M)$ using Bayes' rule, not just "up to a constant". I think I'll leave this in, since a certain class of math majors will become quite upset at being told you can only compute a number up to a constant factor.

> "setting s occurs when the measurement is m."?

Thanks for catching that typo! My fault.

Thanks also to Zoltan for catching some other typos.

The editor just said this draft was "outstanding", and I've got lots of other things to do, so I think I'll submit it without making more substantial attempts to improve it.

Zoltan wrote:

> Why is there an "E" instead of "m" (or "M" or "m_t") on the vertical axis of Figure 2?

I figured out the answer! In fact it says "m". However, the "m" is rotated 90 degrees!

John wrote:

> I figured out the answer! In fact it says "m". However, the "m" is rotated 90 degrees!

Ahh, yes :)! You are right. Sorry.

Don't be sorry - we need to fix that, because it really does look like an E.


This paper is officially **done**, except I'm still hoping David Tweed can rotate that "m". Thanks to everyone for their help! I guess this is the first Azimuth-authored paper!

Let me remind all of you that I'm eager for more collaborations like this, or other ways to get you all more involved in doing work related to environmental issues. The easiest thing to do is write blog articles or help improve the wiki, but there are also more substantial projects waiting for people to do them.
