It looks like you're new here. If you want to get involved, click one of these buttons!

- All Categories 2.3K
- Chat 499
- Study Groups 19
- Petri Nets 9
- Epidemiology 4
- Leaf Modeling 1
- Review Sections 9
- MIT 2020: Programming with Categories 51
- MIT 2020: Lectures 20
- MIT 2020: Exercises 25
- MIT 2019: Applied Category Theory 339
- MIT 2019: Lectures 79
- MIT 2019: Exercises 149
- MIT 2019: Chat 50
- UCR ACT Seminar 4
- General 69
- Azimuth Code Project 110
- Statistical methods 4
- Drafts 2
- Math Syntax Demos 15
- Wiki - Latest Changes 3
- Strategy 113
- Azimuth Project 1.1K
- - Spam 1
- News and Information 148
- Azimuth Blog 149
- - Conventions and Policies 21
- - Questions 43
- Azimuth Wiki 713

Options

This recent thesis is chock full of mathematical ideas. The style is engaging and makes the material very accessible.

- Tai-Danae Bradley, At the interface of algebra and statistics, PhD thesis, 2020.

This thesis takes inspiration from quantum physics to investigate mathematical structure that lies at the interface of algebra and statistics. The starting point is a passage from classical probability theory to quantum probability theory. The quantum version of a probability distribution is a density operator, the quantum version of marginalizing is an operation called the partial trace, and the quantum version of a marginal probability distribution is a reduced density operator. Every joint probability distribution on a finite set can be modeled as a rank one density operator. By applying the partial trace, we obtain reduced density operators whose diagonals recover classical marginal probabilities. In general, these reduced densities will have rank higher than one, and their eigenvalues and eigenvectors will contain extra information that encodes subsystem interactions governed by statistics. We decode this information, and show it is akin to conditional probability, and then investigate the extent to which the eigenvectors capture "concepts" inherent in the original joint distribution. The theory is then illustrated with an experiment that exploits these ideas. Turning to a more theoretical application, we also discuss a preliminary framework for modeling entailment and concept hierarchy in natural language, namely, by representing expressions in the language as densities. Finally, initial inspiration for this thesis comes from formal concept analysis, which finds many striking parallels with the linear algebra. The parallels are not coincidental, and a common blueprint is found in category theory. We close with an exposition on free (co)completions and how the free-forgetful adjunctions in which they arise strongly suggest that in certain categorical contexts, the "fixed points" of a morphism with its adjoint encode interesting information.

## Comments

Apart from the thesis contribution itself, it has the following merits:

`Apart from the thesis contribution itself, it has the following merits: * Wide sweep of topics and concepts * Not a high barrier to entry for understanding the math * Yet it leads to deeper mathematics, including category theory * Practical applications`

Op. cit., p. 11: Introduction to formal concept analysisSee also:

From Wikipedia:

`_Op. cit._, p. 11: Introduction to formal concept analysis See also: * [Formal concept analysis](https://en.wikipedia.org/wiki/Formal_concept_analysis), Wikipedia. * Simon Willerton, [Formal concept analysis](https://golem.ph.utexas.edu/category/2013/09/formal_concept_analysis.html), the n-Category Cafe. From Wikipedia: > Formal concept analysis (FCA) is a principled way of deriving a concept hierarchy or formal ontology from a collection of objects and their properties. Each concept in the hierarchy represents the objects sharing some set of properties; and each sub-concept in the hierarchy represents a subset of the objects (as well as a superset of the properties) in the concepts above it. The term was introduced by Rudolf Wille in 1981, and builds on the mathematical theory of lattices and ordered sets that was developed by Garrett Birkhoff and others in the 1930s. > > Formal concept analysis finds practical application in fields including data mining, text mining, machine learning, knowledge management, semantic web, software development, chemistry and biology.`

A

formal contextK is a relation between the objects O and attributes A. It indicates which attributes apply to which objects. It is represented by a bipartite graph.A formal concept C is a special kind of sub-relation of K, corresponding I believe to a maximal complete bipartite subgraph.

That is, it is a subset \(X\) of the objects along with a subset \(Y\) of the attributes, such that every object in \(X\) has all the attributes in \(Y\), and \(X\) is maximal in this regard.

Note this is not the standard formal definition, as presented in

op. cit., Wikipedia, etc., but I believe it is equivalent.`A **formal context** K is a relation between the objects O and attributes A. It indicates which attributes apply to which objects. It is represented by a bipartite graph. A formal concept C is a special kind of sub-relation of K, corresponding I believe to a maximal complete bipartite subgraph. That is, it is a subset \\(X\\) of the objects along with a subset \\(Y\\) of the attributes, such that every object in \\(X\\) has all the attributes in \\(Y\\), and \\(X\\) is maximal in this regard. Note this is not the standard formal definition, as presented in _op. cit._, Wikipedia, etc., but I believe it is equivalent.`

Op. cit., p.13Formal concepts are illustrated with a miniature example, with:

The formal context given is \(\lbrace (fruit,orange), (fruit,green), (vegetable,purple) \rbrace\).

Writing this using arrows for the tuples:

And reversing the direction to match language scanning:

`_Op. cit._, p.13 Formal concepts are illustrated with a miniature example, with: * Objects O = {fruit, vegetable} * Attributes A = {green, yellow, purple} The formal context given is \\(\lbrace (fruit,orange), (fruit,green), (vegetable,purple) \rbrace\\). Writing this using arrows for the tuples: * \\(fruit \rightarrow orange\\) * \\(fruit \rightarrow green\\) * \\(vegetable \rightarrow purple\\) And reversing the direction to match language scanning: * \\(orange \leftarrow fruit\\) * \\(green \leftarrow fruit\\) * \\(purple \leftarrow vegetable\\)`

This context could be derived by analyzing a small text, such as "Today I would like to get either an orange fruit, a green fruit, or a purple vegetable."

In a simple model, natural language processing tells us that an attribute/adjective X applies to an object/noun Y if the phrase X Y appears somewhere in the text.

There are two formal concepts contained in this context:

C1 = {(fruit,orange), (fruit,green)}

C2 = {(vegetable,purple)}

Each of these is a maximal complete bipartite subgraph.

`This context could be derived by analyzing a small text, such as "Today I would like to get either an orange fruit, a green fruit, or a purple vegetable." In a simple model, natural language processing tells us that an attribute/adjective X applies to an object/noun Y if the phrase X Y appears somewhere in the text. There are two formal concepts contained in this context: C1 = {(fruit,orange), (fruit,green)} C2 = {(vegetable,purple)} Each of these is a maximal complete bipartite subgraph.`

Op. cit., p.14`_Op. cit._, p.14 > The first formal concept is, perhaps, the concept of citrus fruits. The second formal concept is that of a purple vegetable. There are plenty of these: red cabbage, purple cauliflower, purple carrots, purple asparagus, endive, and eggplant—though that’s technically a fruit!—to name a few.`

Now that we have some examples to inform our intuition, let's look at the standard technical definition of a formal concept, which is based on what are called "derivation operators."

`Now that we have some examples to inform our intuition, let's look at the standard technical definition of a formal concept, which is based on what are called "derivation operators."`

Let X be all the objects, and Y all the attributes.

Then using standard notation, the powerset \(2^X\) consists of all subsets of \(X\), i.e., all sets of objects. And \(2^Y\) consists of all subsets of \(Y\), all sets of attributes.

Define functions \(f: 2^X \rightarrow 2^Y\) and \(g: 2^Y \rightarrow 2^X\), called derivation operators, as follows.

Let \(x \subseteq X\) be a subset of the objects.

Then define \(x' = f(x) \subseteq Y\) as the set of attributes which apply to every object in \(X\).

That defines the derivation operator \(f: 2^X \rightarrow 2^Y\).

`Let X be all the objects, and Y all the attributes. Then using standard notation, the powerset \\(2^X\\) consists of all subsets of \\(X\\), i.e., all sets of objects. And \\(2^Y\\) consists of all subsets of \\(Y\\), all sets of attributes. Define functions \\(f: 2^X \rightarrow 2^Y\\) and \\(g: 2^Y \rightarrow 2^X\\), called derivation operators, as follows. Let \\(x \subseteq X\\) be a subset of the objects. Then define \\(x' = f(x) \subseteq Y\\) as the set of attributes which apply to every object in \\(X\\). That defines the derivation operator \\(f: 2^X \rightarrow 2^Y\\).`

The derivation operator \(g: 2^Y \rightarrow 2^X\) is defined symmetrically.

Let \(y \subseteq Y\) be a subset of the attributes.

Then define \(y' = g(y) \subseteq X\) as the set of objects which have every attribute in \(Y\).

`The derivation operator \\(g: 2^Y \rightarrow 2^X\\) is defined symmetrically. Let \\(y \subseteq Y\\) be a subset of the attributes. Then define \\(y' = g(y) \subseteq X\\) as the set of objects which have every attribute in \\(Y\\).`

Then a

formal conceptis defined as a pair (x,y), where f(x) = y and g(y) = x.Or, using our other notation, (x,y) is a formal concept means that x' = y and y' = x.

`Then a **formal concept** is defined as a pair (x,y), where f(x) = y and g(y) = x. Or, using our other notation, (x,y) is a formal concept means that x' = y and y' = x.`

Exercise: prove (or refute) that this is equivalent to the definition I gave in comment #3:

`Exercise: prove (or refute) that this is equivalent to the definition I gave in comment #3: > That is, it is a subset X of the objects along with a subset Y of the attributes, such that every object in X has all the attributes in Y, and X is maximal in this regard.`

These derivation operators have some nice order-theoretic structure.

First, note the powersets \(2^X\) and \(2^Y\) are of course partially ordered by the inclusion relation.

Next, it is not hard to see that \(f: 2^X \rightarrow 2^Y\) and \(g: 2^Y \rightarrow 2^X\) are

order reversingfunctions.First consider \(f: 2^X \rightarrow 2^Y\).

What happens if you apply \(f\) to a subset \(x_1 \subseteq x \subseteq X\)?

Then, since the \(x_1\) is less than or equal to \(x\), the set of attributes which apply to every object in \(x_1\) must be greater than or equal to the set of attributes which apply to every object in \(x\). We weakened the condition by choosing a subset \(x_1\) of \(x\), which can only increase the set of common attributes.

So \(x_1 \subseteq x \implies f(x_1) \supseteq f(x)\), which says that \(f\) is order-reversing.

Similarly, decreasing a set of attributes \(y\) can only increase the set of objects \(g(y)\) which have all the attributes in \(y\).

So \(g\) is order-reversing too.

`These derivation operators have some nice order-theoretic structure. First, note the powersets \\(2^X\\) and \\(2^Y\\) are of course partially ordered by the inclusion relation. Next, it is not hard to see that \\(f: 2^X \rightarrow 2^Y\\) and \\(g: 2^Y \rightarrow 2^X\\) are _order reversing_ functions. First consider \\(f: 2^X \rightarrow 2^Y\\). What happens if you apply \\(f\\) to a subset \\(x_1 \subseteq x \subseteq X\\)? Then, since the \\(x_1\\) is less than or equal to \\(x\\), the set of attributes which apply to every object in \\(x_1\\) must be greater than or equal to the set of attributes which apply to every object in \\(x\\). We weakened the condition by choosing a subset \\(x_1\\) of \\(x\\), which can only increase the set of common attributes. So \\(x_1 \subseteq x \implies f(x_1) \supseteq f(x)\\), which says that \\(f\\) is order-reversing. Similarly, decreasing a set of attributes \\(y\\) can only increase the set of objects \\(g(y)\\) which have all the attributes in \\(y\\). So \\(g\\) is order-reversing too.`

But \(f: 2^X \rightarrow 2^Y\) and \(g: 2^Y \rightarrow 2^X\) are not just any pair of order-reversing functions, they have an additional property which classifies them as a

Galois connection:\[f(x) \supseteq y \iff x \subseteq g(y)\]

Using Willerton's language, these are equivalent because they both state that the objects in \(x\) have all the attributes in \(y\).

`But \\(f: 2^X \rightarrow 2^Y\\) and \\(g: 2^Y \rightarrow 2^X\\) are not just any pair of order-reversing functions, they have an additional property which classifies them as a **Galois connection**: \\[f(x) \supseteq y \iff x \subseteq g(y)\\] Using Willerton's language, these are equivalent because they both state that the objects in \\(x\\) have all the attributes in \\(y\\).`

Before moving on, let's revisit the definitions of \(f\) and \(g\). Earlier we said:

Now let's spell these out more formally.

`Before moving on, let's revisit the definitions of \\(f\\) and \\(g\\). Earlier we said: > define \\(f(x)\\) as the set of attributes which apply to every object in \\(x\\). > > define \\(g(y)\\) as the set of objects which have every attribute in \\(y\\). Now let's spell these out more formally.`

Recall the definition of a formal context:

In the example given,

`Recall the definition of a formal context: > A **formal context** K is a relation between the objects X and attributes Y. It indicates which attributes apply to which objects. It is represented by a bipartite graph. In the example given, * X = {fruit, vegetable} * Y = {green, yellow, purple} * K = \\(\lbrace (fruit,orange), (fruit,green), (vegetable,purple) \rbrace\\)`

That K is a relation between X and Y means it is a subset of the Cartesian product \(X \times Y\).

Let \(f_1: X \rightarrow 2^Y\) be the function sending an object \(o \in X\) to the set of attributes which apply to \(o\), i.e.,

\[f_1(o) = \lbrace a \in Y\ |\ (o,a) \in K \rbrace \]

I called it \(f_1\) because it operates on a single object.

`That K is a relation between X and Y means it is a subset of the Cartesian product \\(X \times Y\\). Let \\(f_1: X \rightarrow 2^Y\\) be the function sending an object \\(o \in X\\) to the set of attributes which apply to \\(o\\), i.e., \\[f_1(o) = \lbrace a \in Y\ |\ (o,a) \in K \rbrace \\] I called it \\(f_1\\) because it operates on a single object.`

Next, let's lift \(f_1\) to a function that operates on a set of objects.

The standard way is to take the union of \(f_1(o)\) as \(o\) ranges over \(x\), i.e.,

\[\bigcup_{o \in x} f_1(o)\]

However, for formal concept analysis we use a different lifting, which is based on the

intersectionof the sets.Accordingly, define the left derivation operator \(f: 2^X \rightarrow 2^Y\) by

\[f(x) = \bigcap_{o \in x} f_1(o)\]

which is equivalent to

`Next, let's lift \\(f_1\\) to a function that operates on a set of objects. The standard way is to take the union of \\(f_1(o)\\) as \\(o\\) ranges over \\(x\\), i.e., \\[\bigcup_{o \in x} f_1(o)\\] However, for formal concept analysis we use a different lifting, which is based on the _intersection_ of the sets. Accordingly, define the left derivation operator \\(f: 2^X \rightarrow 2^Y\\) by \\[f(x) = \bigcap_{o \in x} f_1(o)\\] which is equivalent to > let \\(f(x)\\) be the attributes which apply to every object in \\(x\\).`

The right derivation operator \(g: 2^Y \rightarrow 2^X\) is defined similarly.

First let \(g_1: Y \rightarrow 2^X\) be the function sending an attribute \(a \in Y\) to the set of objects which have that attribute, i.e.,

\[g_1(a) = \lbrace o \in X\ |\ (o,a) \in K \rbrace \]

Then define the right derivation operator \(g: 2^Y \rightarrow 2^X\) by

\[g(y) = \bigcap_{a \in y} g_1(a)\]

which is equivalent to

`The right derivation operator \\(g: 2^Y \rightarrow 2^X\\) is defined similarly. First let \\(g_1: Y \rightarrow 2^X\\) be the function sending an attribute \\(a \in Y\\) to the set of objects which have that attribute, i.e., \\[g_1(a) = \lbrace o \in X\ |\ (o,a) \in K \rbrace \\] Then define the right derivation operator \\(g: 2^Y \rightarrow 2^X\\) by \\[g(y) = \bigcap_{a \in y} g_1(a)\\] which is equivalent to > let \\(g(y)\\) be the objects which have every attribute in \\(y\\).`