
Last time we learned about *partitions* of a set: ways of chopping it into disjoint nonempty sets called "parts".

We also learned about *equivalence relations* on a set: relations that are reflexive, symmetric and transitive. And we learned that partitions give equivalence relations! If we have a partition of a set, we can decree that two elements of the set are "equivalent" if they are in the same part. For example, any two points in the red part of the picture are equivalent.

Even better, every equivalence relation comes from a unique partition. So, if you want to work with partitions, you can equally well work with equivalence relations. This makes it easy to define a poset of partitions, which is the key step toward doing *logic* with partitions.

Here's how it goes. For any set \(X\), \( \mathcal{E}(X)\) is the set of partitions of \(X\). For any partition \(P\) of \( X \), let \(\sim_P\) be the corresponding equivalence relation.

**Definition.** We say that a partition \(P\) of \(X\) is **finer** than a partition \(Q\) of \(X\) if

$$ x \sim_P y \textrm{ implies } x \sim_Q y $$
for all \(x,y \in X \). In this situation we write \(P \le Q\), and we also say \(Q\) is **coarser** than \(P\).
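To make the definition concrete, here is a minimal Python sketch that checks \(P \le Q\) straight from the definition, by testing every pair of elements. The names `same_part` and `is_finer`, and the choice to represent a partition as a list of sets, are assumptions made for this illustration; they are not from the lecture.

```python
from itertools import combinations

def same_part(partition, x, y):
    """Return True if x and y lie in the same part, i.e. x ~_P y."""
    return any(x in part and y in part for part in partition)

def is_finer(p, q, elements):
    """Check P <= Q from the definition: x ~_P y implies x ~_Q y.

    Only pairs of distinct elements are tested, since every
    equivalence relation is reflexive and symmetric.
    """
    return all(
        same_part(q, x, y)
        for x, y in combinations(elements, 2)
        if same_part(p, x, y)
    )

# Hypothetical example data: some partitions of a 3-element set.
X = {1, 2, 3}
discrete = [{1}, {2}, {3}]
p = [{1, 2}, {3}]
q = [{1}, {2, 3}]
blob = [{1, 2, 3}]

print(is_finer(discrete, p, X))             # discrete is finer than p
print(is_finer(p, blob, X))                 # p is finer than the one-part partition
print(is_finer(p, q, X), is_finer(q, p, X)) # p and q are incomparable
```

Pairwise checking is quadratic in the size of \(X\), which is fine for small examples like these.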

This makes it incredibly easy to get our hands on the poset of partitions!

**Proposition.** \((\mathcal{E}(X), \le)\) is a poset.

**Proof.** We just need to show that the relation \(\le\) is reflexive, transitive and antisymmetric. Since

$$ x \sim_P y \textrm{ implies } x \sim_P y $$ for all \(x,y \in X \), we have

$$ P \le P $$ so the relation \(\le\) is reflexive.

If

$$ x \sim_P y \textrm{ implies } x \sim_Q y $$ and

$$ x \sim_Q y \textrm{ implies } x \sim_R y $$ then clearly

$$ x \sim_P y \textrm{ implies } x \sim_R y $$ So,

$$ P \le Q \textrm{ and } Q \le R \textrm{ implies } P \le R $$ so the relation \(\le\) is transitive.

Finally, to show that \(\le\) is antisymmetric, assume

$$ P \le Q \textrm{ and } Q \le P . $$ Then by definition

$$ x \sim_P y \textrm{ implies } x \sim_Q y $$ and

$$ x \sim_Q y \textrm{ implies } x \sim_P y $$ or in other words,

$$ x \sim_P y \textrm{ if and only if } x \sim_Q y. $$ This means that the relations \(\sim_P\) and \(\sim_Q\) are equal, so \(P = Q \) as desired. \( \quad \blacksquare \)

That proof was as easy as slipping down a slide covered with grease! It's what I call a "follow your nose" proof: to create it, you just carefully write down what you need to show, unravel the definitions to figure out what that really means, and notice that it's true.

Here's another way to tell if one partition is finer than another:

**Puzzle 34.** Given two partitions \(P\) and \(Q\) of a set \(X\), show that \(P \le Q\) if and only if every part of \(P\) is contained in a part of \(Q\).

Using this, you can easily work out the poset of partitions of a 3-element set:

It looks simple. But that's deceptive! For a 4-element set we get this poset of partitions:

It's much more complicated than the poset of *subsets* of a 4-element set:

Indeed, there are many difficult questions about the poset of partitions that are easy for the poset of subsets.

For example: how many subsets does an \(n\)-element set have? Easy: \(2^n\). But how many partitions does an \(n\)-element set have? Not so easy: it's the \(n\)th **Bell number** \(B_n\). The Bell numbers go like this:

$$ 1, 1, 2, 5, 15, 52, 203, 877, 4140, 21147, 115975, 678570, \dots $$
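As a sanity check on these numbers, here is a small Python sketch that computes Bell numbers with the Bell triangle and compares them against a brute-force enumeration of partitions. The function names `bell_numbers` and `set_partitions` are made up for this illustration, and the Bell triangle is just one standard way to compute \(B_n\), not something the lecture relies on.

```python
def bell_numbers(count):
    """Return [B_0, ..., B_{count-1}] using the Bell triangle."""
    bells = []
    row = [1]
    for _ in range(count):
        bells.append(row[0])
        # Each new row starts with the last entry of the previous row;
        # every later entry adds the entry above and to the left.
        next_row = [row[-1]]
        for entry in row:
            next_row.append(next_row[-1] + entry)
        row = next_row
    return bells

def set_partitions(elements):
    """Yield every partition of a list, as a list of parts (each a list)."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for partial in set_partitions(rest):
        # Put `first` into each existing part in turn...
        for i in range(len(partial)):
            yield partial[:i] + [[first] + partial[i]] + partial[i + 1:]
        # ...or give it a part of its own.
        yield [[first]] + partial

print(bell_numbers(12))
print(len(list(set_partitions([1, 2, 3, 4]))))  # number of partitions of a 4-element set
```

The brute-force count grows as fast as the Bell numbers themselves, so it is only practical for small \(n\).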
Another example: the poset of subsets looks the same when you flip it upside down. That's because any subset \(S \subseteq X\) has a **complement**

$$ X - S = \{x \in X : \; x \notin S \} . $$
In the logic of subsets this gives **negation**: if a subset \(S \subseteq X\) corresponds to some proposition \(P\), its complement \(X - S\) corresponds to the proposition \(\neg P \), which is how logicians write "not \(P\) ".

But the poset of partitions does *not* look the same when you flip it upside down:

And indeed, there is no "not" in partition logic! It's very different from classical logic. To read more about it, go here:

Partition logic still has joins and meets, so I'll talk about those next time. We should also see how a function \(f : X \to Y \) gives a monotone function \(f^* : \mathcal{E}(Y) \to \mathcal{E}(X)\). And we should see if \(f^*\) has left or right adjoints!

This leads up to the punchline of Chapter 1: the idea of "generative effects". In a nutshell, partition logic has "generative effects" because the right adjoint of \(f^*\) doesn't preserve joins. This may sound scary and technical, but as we'll see, it captures a basic fact about reality.

## Comments

If you're okay with a little bit of linguistics and philosophy mixed with math, David Ellerman also gives a nice mental description of the operations of subsets and partitions: [Four Ways from Universal to Particular: How Chomsky's Language-Acquisition Faculty is Not Selectionist](https://arxiv.org/abs/1410.4501).

Cool!

On another note, the Bell numbers have a very nice description as coefficients of a power series:

$$ \sum_{n=0}^\infty \frac{B_n}{n!} x^n = e^{e^x-1} $$

The best proof uses category theory, namely Joyal's theory of [combinatorial species](https://en.wikipedia.org/wiki/Combinatorial_species), which are functors from the groupoid of finite sets to itself.

In order to prove that \(\left(\mathcal{E}(X), \le\right)\) is a poset (that is, a partially ordered set), shouldn't we have proved that the relation is also antisymmetric?

Dan - yes, you're right. I'll add that.


Since David Ellerman is stuck behind the approval process, he wanted me to point out that 1) there is a better introduction to partition logic on his website [here](http://www.ellerman.org/introduction-to-partition-logic/), and 2) there is indeed a 'negation' operation on a partition.

Thanks, Keith. I approved 28 people for the course last night but now I see there are 3 new people still needing approval - David must be one of those.


I added a new puzzle to the above post, which is very useful:

Here's another way to tell if one partition is finer than another:

**Puzzle 34.** Given two partitions \(P\) and \(Q\) of a set \(X\), show that \(P \le Q\) if and only if every part of \(P\) is contained in a part of \(Q\).

There are two opposite ways to present what is called the "lattice of partitions".

The older way, used e.g. in Birkhoff's Lattice Theory book and by Fong and Spivak, is to use the "coarser than" partial order on partitions, which is just the inclusion relation between the corresponding binary equivalence relations. In this partial order, the indiscrete partition (only one block or part), nicknamed "the blob," is on top since it is the largest equivalence relation, and the discrete partition (all singletons) is on the bottom. The new way, used in the [logic of partitions](http://www.ellerman.org/introduction-to-partition-logic/), is to use the opposite refinement partial ordering, which is the inclusion ordering on the complements of equivalence relations, called "apartness relations" or "partition relations". These partition relations are characterized as being anti-reflexive, symmetric, and anti-transitive. Then the blob is on the bottom and the discrete partition is on the top.

It is a matter of habit, taste, and the purpose at hand as to which ordering seems more natural. The two lattice operations of join and meet interchange when the ordering is reversed; joins become meets and vice-versa. For instance, the operation defined as the join of partitions in Fong & Spivak is the meet operation in partition logic. All the usual Boolean operations on subsets can be carried over to partitions, so that one can take a given formula in subset (i.e., 'propositional') logic and give the variables and operations either a subset interpretation or a partition interpretation. The 'key' to developing partition logic was to define the implication operation on partitions, and that then leads to the notion of negation in partition logic.

Given two partitions \(P\) and \(Q\), the implication \(Q \Rightarrow P\) is the partition that is just like \(P\) except that whenever a part or block \(S\) of \(P\) is contained in a part of \(Q\), then that block of \(P\) is 'discretized', i.e., is replaced by all the singletons of the elements of \(S\). Then, as in intuitionistic logic, the negation is the implication to the bottom or zero, which in this case is the blob. That blob-negation is trivial since the blob-negation of every partition except the blob is the blob, and the blob-negation of the blob is the top or discrete partition. Of much more interest is the set of all partitions that are implications to a fixed partition \(P\), the P-negated partitions, so \(Q \Rightarrow P\) is the P-negation of \(Q\). They form a Boolean algebra under the partition operations! Thus every partition \(P\) determines a Boolean subalgebra of the partition algebra. The P-negation of a P-negated partition just flips the parts \(S\) of \(P\) between the discretized version (like a mini-one) and the normal version (like a mini-zero).

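The implication described in this comment can be sketched in a few lines of Python. This is only an illustration of the verbal description above, not code from David Ellerman's work; the helper names `partition` and `implication` and the frozenset representation are assumptions.

```python
def partition(*blocks):
    """Build a partition as a set of frozensets (hypothetical helper)."""
    return {frozenset(block) for block in blocks}

def implication(q, p):
    """Sketch of Q => P: like P, except that every block of P contained
    in some block of Q is replaced by singletons ('discretized')."""
    result = set()
    for block in p:
        if any(block <= q_block for q_block in q):
            result.update(frozenset({x}) for x in block)
        else:
            result.add(block)
    return result

# Hypothetical example on the set {1, 2, 3, 4}.
P = partition({1, 2}, {3, 4})
Q = partition({1, 2, 3}, {4})
blob = partition({1, 2, 3, 4})

once = implication(Q, P)      # P-negation of Q
twice = implication(once, P)  # P-negation of a P-negated partition
print(sorted(map(sorted, once)))
print(sorted(map(sorted, twice)))
print(implication(Q, blob) == blob)  # blob-negation of a non-blob partition
```

In this example the block \(\{1,2\}\) of \(P\) is discretized in `once` and restored in `twice`, while \(\{3,4\}\) does the opposite, which matches the "flipping" behaviour described in the comment.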

Hi, David! For some reason you have to hit "refresh" to see the MathJax in your own comment right after writing it or editing it. Luckily, it'll look fine to everyone else even if you don't do this - and if you leave the page and come back later it'll look fine.


David wrote:

> There are two opposite ways to present what is called the "lattice of partitions".

Yes, this really bit me in the butt in [Lecture 10 - Chapter 1: The Logic of Partitions](https://forum.azimuthproject.org/discussion/1963/lecture-10-chapter-1-the-logic-of-partitions/p1). I couldn't make up my mind whether to think of "learning more" as moving from a coarser partition to a finer one - that is, learning more distinctions between things - or as moving from a finer partition to a coarser one - that is, learning more connections between things. I picked the latter, but then I realized this doesn't match a story I'm trying to tell later.

Obviously the best thing to do is explain _both_ viewpoints, and take this as an opportunity to expose the students to the joys of "opposite categories", or at least opposite posets. But I'd been wanting to put that off for a while. Now I've got to fix my exposition a bit.

It strikes me that treating finer partitions as "greater" than coarser ones corresponds nicely with topologies on sets, where the finest topology on \(X\) is the whole of \(\mathcal{P}X\) and therefore "top". It also fits with the common sense notion that "refined" things are "higher".


I haven’t yet read David Ellerman’s article, but I wonder if the non-crossing partitions have a special logical significance? Unlike regular partitions and like subsets, they are self-dual as a lattice.


@Anindya, Yes, but higher sometimes means "more abstract", that is, making fewer distinctions. In this interpretation, _coarser_ (more abstract) is greater than _finer_ (more concrete).

So I think both interpretations are valid and one has to choose the best for the case at hand.

@John #2, Another neat proof of that comes from the Stirling transform. Basically, applying the "stretching map" \( f \mapsto f(e^x - 1)\) is equivalent to multiplying the Taylor coefficients of \( f \) by a matrix consisting of Stirling numbers of the second kind. What's fascinating to me about this is that what seems to have nothing to do with combinatorics (the "stretching map") is actually governed by combinatorial data. I wrote more on this in the link below if you're curious:

https://drive.google.com/file/d/1vF3ChpsIKCuLFYGm1WawAUePxCst4gDH/view?usp=sharing

I like it when we can describe things that intuitively seem continuous in terms of something more combinatorial or algebraic. I imagine category theory is great at doing this.

John Baez wrote:

> I couldn't make up my mind whether to think of "learning more" as moving from a coarser partition to a finer one - that is, learning more distinctions between things - or as moving from a finer partition to a coarser one - that is, learning more connections between things. I picked the latter, but then I realized this doesn't match a story I'm trying to tell later.

I believe "learning more" amounts to moving to a finer partition.

I base my opinion on the model of knowledge proposed by the Nobel laureate Robert Aumann. Aumann proposed to model knowledge in game theory as partitions of information states. This is presented in his [Agreeing to Disagree (1976)](https://projecteuclid.org/euclid.aos/1176343654). Leonard Savage also proposed using partitions to model decisions under uncertainty in his book [The Foundations Of Statistics (1972)](https://books.google.com/books/about/The_Foundations_of_Statistics.html?id=zSv6dBWneMEC). The philosopher Jaakko Hintikka independently suggested using [*S5* modalities](https://en.wikipedia.org/wiki/S5_(modal_logic)), which reflect partitions on information space, in his text [*Knowledge and Belief* (1962)](https://philpapers.org/rec/HINKAB). Hintikka's *epistemic logic* has been embraced by various philosophers since his initial proposal. For instance, the logicians van Ditmarsch, van der Hoek and Kooi have a logic with a relation *knows more than*. This relation expresses that one agent's partition on information states is *finer* than another's. This research is presented in their paper [*Knowing More* (2009)](http://www.ijcai.org/Proceedings/09/Papers/162.pdf).

I will take a little artistic license with Aumann's presentation. If your partition on information states in a game is \(JB\) and mine is \(MD\), then \(JB \wedge MD\) would be the information state if we "put our heads together" and colluded. So *learning* can be modeled in this logic as moving to a finer partition.

On the other hand, \(JB \vee MD\) is our *common knowledge*. This reflects our *consensus* on groupings of possible information states. And, at the risk of being modest, I think it's safe to say that \(JB \leq MD\). So the things we both know are rather restricted by the contents of my near-empty head :(

I can attempt to make a concrete example if you like. Jan van Ditmarsch has some papers where he adapts some John Conway puzzles to reasoning over partitions of information states.
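Here is a rough Python sketch of the meet and join used in this comment, under the convention that \(P \le Q\) means \(P\) is finer than \(Q\). With that convention the meet is the common refinement and the join is the finest common coarsening. The function names, the frozenset representation and the example partitions `jb` and `md` are all made up for illustration.

```python
def partition(*blocks):
    """Build a partition as a set of frozensets (hypothetical helper)."""
    return {frozenset(block) for block in blocks}

def meet(p, q):
    """Common refinement: nonempty intersections of a part of P with a part of Q."""
    return {a & b for a in p for b in q if a & b}

def join(p, q):
    """Finest common coarsening: keep merging parts of P and Q that overlap."""
    merged = []  # pairwise disjoint sets built so far
    for block in list(p) + list(q):
        block = set(block)
        for m in [m for m in merged if m & block]:
            block |= m
            merged.remove(m)
        merged.append(block)
    return {frozenset(m) for m in merged}

# Two hypothetical partitions of the information states {1, ..., 5}.
jb = partition({1, 2, 3}, {4, 5})
md = partition({1, 2}, {3, 4}, {5})

print(sorted(map(sorted, meet(jb, md))))  # what the two agents know together
print(sorted(map(sorted, join(jb, md))))  # their common knowledge
```

In this example the join collapses everything into a single part, so the two agents have no nontrivial common knowledge, even though pooling their information separates all states except 1 and 2.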

Matthew wrote:

> I believe "learning more" amounts to moving to a finer partition.

You're telling a very nice consistent story here, which everyone here should ponder!

But just to muddy the situation, the opposite attitude also makes sense. If you're trying to learn to distinguish things, i.e. "learn that initially similar-looking things are actually different", then learning more amounts to moving to a finer partition. But if you're trying to learn to relate things, i.e., "learn that initially different-looking things are actually similar", then learning more amounts to moving to a coarser partition.

In my lecture here I gave an example of the latter: a detective comes to an island and meets 5 seemingly unrelated people, but gradually discovers that some of them are relatives.

However, in the Fong-Spivak conventions we say \(P \le Q\) if the partition \(P\) is finer than the partition \(Q\). If we then think of partitions as propositions and decree that \(P \le Q\) means \(P \implies Q\), as I'd been doing with the logic of subsets, it makes sense to say \(P\) "knows more" than \(Q\) in this case.

So, after writing my lecture here, I decided I should switch to an example where "learning more" amounts to moving to a finer partition. I will do that.

There are a number of arbitrary conventions here, which provide ample scope for left-right dyslexia. For example, it seems at first glance odd to say \(P \le Q\) means \(P\) "knows more" than \(Q\). Less is more!

Of course, the cause of all the flip-flopping is that every poset has an "opposite", in which \(\le\) is redefined to mean \(\ge\). Both the poset of partitions and its opposite are useful!


This reminds me of some Information Theory:

[Here](http://planetmath.org/entropyofapartition) we have a definition of Shannon entropy of a *partition* (more developed also [here](http://www.cambridge.org/9780521883894)). The book shows how the finer a partition is, the higher the resulting entropy.

And in slide 7 [here](http://math.ucr.edu/home/baez/networks_oxford/networks_entropy.pdf) Shannon entropy of a finite probability measure \(p\) is interpreted as

> How much information you learn, on average, when someone tells you an element \(x \in X\), if all you’d known was that it was randomly distributed according to \(p\).

So this would quantify how much more you learn by moving to finer partitions.

I take it that [this](https://math.stackexchange.com/questions/381986/prove-that-it-is-a-random-variable-iff-it-is-constant-on-each-partition) Math Stack Exchange question helps in viewing real functions on partition blocks as random variables. Problems may arise in infinite sample spaces though.

In the same [book](http://www.cambridge.org/9780521883894) as in #17, they give an interpretation of the entropy of the meet of partitions (in a probability space), and also give an independence notion, all resembling facts about joint distributions. They prove \(H(\sigma \wedge \tau) = H(\sigma) + H(\tau)\) for the Shannon entropy of the partition as above (for \(\sigma\) and \(\tau\) independent partitions). But while the meet of partitions is understood, they then ask:

> *1.4.6. (Research problem) There is no information or entropy interpretation of the join of two partitions. Find an interesting one.*

There should be a lot of interesting things to say about the information of partitions.

The equation \( H(\sigma \wedge \tau) = H(\sigma) + H(\tau)\) only holds for independent partitions, so I wouldn't say the information-theoretic meaning of the meet of partitions is "understood" based on just that.

I'd tackle the overall problem this way. You cited my [Oxford talk about information, entropy and Bayesian networks](http://math.ucr.edu/home/baez/networks_oxford/networks_entropy.pdf). There I describe the category \( \mathrm{FinProb} \), where the objects are finite sets equipped with probability distributions, and the morphisms are stochastic maps. In my paper with Fritz and Leinster we show how to associate an entropy to each morphism, and how to characterize entropy very naturally in these terms - the formula for entropy is not postulated, it's derived.

Here's one way that theory interacts with partitions. Any finite set \(X\) can be equipped with a uniform probability distribution \(u_X\). Any partition \(P\) of \(X\) gives rise to an onto function \(f : X \to Y\) where \(Y\) is the set of parts of \(P\), and then a stochastic map \(f : (X,u_X) \to (Y,p) \) for some uniquely determined probability distribution \( p \). \(p\) simply assigns to each part of the partition its measure. This stochastic map \(f\) has an entropy as given by my paper with Fritz and Leinster - but this is equal to [the entropy of the partition as defined on PlanetMath](http://planetmath.org/entropyofapartition)!
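To see the distribution \(p\) concretely, here is a minimal Python sketch that pushes the uniform distribution on \(X\) forward along a partition and computes the Shannon entropy \(-\sum_i p_i \log_2 p_i\) of the result. The function names are hypothetical, the base-2 logarithm is an assumption (the cited sources may use another base), and the sketch only computes the entropy of \(p\) itself; it does not implement the entropy of a morphism from the paper with Fritz and Leinster.

```python
import math

def pushforward(partition):
    """Push the uniform distribution on X forward to the set of parts:
    each part gets probability |part| / |X|."""
    total = sum(len(part) for part in partition)
    return [len(part) / total for part in partition]

def entropy(partition):
    """Shannon entropy, in bits, of the pushed-forward distribution."""
    return -sum(p * math.log2(p) for p in pushforward(partition))

# Hypothetical partitions of the 4-element set {1, 2, 3, 4}.
blob = [{1, 2, 3, 4}]
sigma = [{1, 2}, {3, 4}]
tau = [{1, 3}, {2, 4}]
finer = [{1}, {2}, {3, 4}]     # refines sigma
discrete = [{1}, {2}, {3}, {4}]  # common refinement of sigma and tau

for name, part in [("blob", blob), ("sigma", sigma), ("finer", finer), ("discrete", discrete)]:
    print(name, pushforward(part), entropy(part))
```

In this example the entropy increases along the chain from `blob` to `sigma` to `finer` to `discrete`, and since `sigma` and `tau` are independent under the uniform distribution, the entropy of their common refinement `discrete` is the sum of their entropies.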

Nice! One itch: in the slides, \(\text{FinStoch}\) has as objects just finite sets, while in \(\text{FinProb}\) the objects are pairs of finite sets and distributions (and maps are measure preserving functions). Where does \(f : (X,u_X) \to (Y,p) \) live?

It lives in \(\mathrm{FinProb}\), and that's the category where "entropy as information loss" naturally appears. I'll fix my comment above. Thanks!

Hi John, I'm still struggling with [#19](https://forum.azimuthproject.org/discussion/comment/17198/#Comment_17198), because as I read it, if \(f : (X,u_X) \to (Y,p) \) is in \(\mathrm{FinProb}\), then it is measure-preserving, and it would seem that this ruins our freedom to choose \(p\) at will.

Your [post](https://johncarlosbaez.wordpress.com/2011/06/02/a-characterization-of-entropy/), around "Suppose we have two finite sets...", says:

> Suppose we have two finite sets with probability measures, say \((X,p)\) and \((Y,q)\). Then we define a morphism \(f: (X,p) \to (Y,q)\) to be a measure-preserving function: in other words, one for which the probability \(q_j\) of any point in \(Y\) is the sum of the probabilities \(p_i\) of the points in \(X\) with \(f(i) = j\).

In our case \(Y\) is the set of labels of the blocks of the partition, the preimage of each label is the corresponding block as a subset of \(X\), and, since \(X\) is equipped with the uniform distribution, the sum of the probabilities of the elements of a block is simply proportional to the size of the block, so that would force the value of \(p\) at the label.

Jesus: Yes, this condition eliminates our freedom to choose \(p\) at will. But I wouldn't say it "ruins" anything. It's exactly what we want for this result.
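
The measure-preserving condition quoted above is easy to check mechanically. The following is a minimal sketch, with a hypothetical helper `is_measure_preserving` and a made-up six-element example: it sums the probabilities \(p_i\) over each fibre of \(f\) and compares the result with \(q\). With the uniform distribution on \(X\), only one distribution on the block labels passes the test.

```python
from fractions import Fraction


def is_measure_preserving(f, p, q):
    """Check that q[j] is the sum of p[i] over all i with f[i] == j, for every j."""
    pushed = {j: Fraction(0) for j in q}
    for i, p_i in p.items():
        if f[i] not in pushed:
            return False  # f sends a point outside the support listed in q
        pushed[f[i]] += p_i
    return pushed == q


# Hypothetical example: six points, blocks A, B, C of sizes 3, 2, 1.
f = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "C"}
u_X = {x: Fraction(1, 6) for x in f}

# The distribution forced by the block sizes.
forced = {"A": Fraction(1, 2), "B": Fraction(1, 3), "C": Fraction(1, 6)}
# Some other distribution on the labels, chosen "at will".
other = {"A": Fraction(1, 3), "B": Fraction(1, 3), "C": Fraction(1, 3)}

print(is_measure_preserving(f, u_X, forced))  # True
print(is_measure_preserving(f, u_X, other))   # False
```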

John: If you are doing it on purpose it doesn't "ruin" a thing; excuse me for not getting it. While searching for the formal concept analysis view of Galois theory I came across, at the same conference [CLA11](http://cla2011.loria.fr/), a talk by Dan Simovici defining partition entropy much as you describe, for his own purposes. But the PlanetMath [definition](http://planetmath.org/entropyofapartition) does allow the measure \(\mu\) to vary freely.
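
To see what letting the measure vary changes, here is a small sketch. It assumes the entropy of a partition with respect to a probability measure \(\mu\) is the usual \(H_\mu(P) = -\sum_{B \in P} \mu(B) \log_2 \mu(B)\); that formula, the helper name `partition_entropy`, and the two measures below are assumptions made for illustration, not taken from the PlanetMath page. The same partition gets different entropies under different measures, and the uniform measure is just one choice among many.

```python
from math import log2


def partition_entropy(partition, mu):
    """Entropy in bits of a partition (list of blocks) w.r.t. point masses mu."""
    total = 0.0
    for block in partition:
        mass = sum(mu[x] for x in block)  # mu(B) for this block
        if mass > 0:
            total -= mass * log2(mass)
    return total


# Hypothetical example: one partition of {0, 1, 2, 3}, two measures.
partition = [{0, 1}, {2, 3}]
uniform = {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25}
skewed = {0: 0.5, 1: 0.25, 2: 0.125, 3: 0.125}

h_uniform = partition_entropy(partition, uniform)  # blocks have mass 1/2 each
h_skewed = partition_entropy(partition, skewed)    # blocks have mass 3/4 and 1/4

print(h_uniform, h_skewed)
```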

@Keith post #1: Slogging along, two steps forward and one backward. There is so much juicy material in here that I keep discovering irresistible ideas I must follow up on. Thanks for this reference, Keith. I am now working through this paper. In my earlier days I took a stab at statistical mechanics. There is an idea I have had for some time: that there is a deeper way of thinking about "information" than we find from Shannon/Boltzmann.

My intuition is that what is missing is the notion of how information has a lot to do with context. So the launch codes for a nuclear missile have very little Shannon information. Analogously if you take the human genome and put it in a beaker of water what do you have? Answer: food. You need a human cell to interpret the "information" in the genome.

My intuition further tells me that databases serve to build context. Unfortunately my formal mathematical skills have proven inadequate to develop these intuitions into something more tangible.

Not sure where this is going but I am finding David's paper really intriguing, so thank you David for your insights. I am fascinated by dits. Very entertaining!
