Options

Commentary on: At the interface of algebra and statistics, by Tai-Danae Bradley

edited May 24 in Study Groups

This recent thesis is chock full of mathematical ideas. The style is engaging and makes the material very accessible.

This thesis takes inspiration from quantum physics to investigate mathematical structure that lies at the interface of algebra and statistics. The starting point is a passage from classical probability theory to quantum probability theory. The quantum version of a probability distribution is a density operator, the quantum version of marginalizing is an operation called the partial trace, and the quantum version of a marginal probability distribution is a reduced density operator. Every joint probability distribution on a finite set can be modeled as a rank one density operator. By applying the partial trace, we obtain reduced density operators whose diagonals recover classical marginal probabilities. In general, these reduced densities will have rank higher than one, and their eigenvalues and eigenvectors will contain extra information that encodes subsystem interactions governed by statistics. We decode this information, and show it is akin to conditional probability, and then investigate the extent to which the eigenvectors capture "concepts" inherent in the original joint distribution. The theory is then illustrated with an experiment that exploits these ideas. Turning to a more theoretical application, we also discuss a preliminary framework for modeling entailment and concept hierarchy in natural language, namely, by representing expressions in the language as densities. Finally, initial inspiration for this thesis comes from formal concept analysis, which finds many striking parallels with the linear algebra. The parallels are not coincidental, and a common blueprint is found in category theory. We close with an exposition on free (co)completions and how the free-forgetful adjunctions in which they arise strongly suggest that in certain categorical contexts, the "fixed points" of a morphism with its adjoint encode interesting information.

Comments

  • 1.
    edited May 23

    p.11: Introduction to formal concept analysis

    Comment Source:p.11: Introduction to formal concept analysis
  • 2.
    edited May 23

    Formal concept analysis (FCA) is a principled way of deriving a concept hierarchy or formal ontology from a collection of objects and their properties. Each concept in the hierarchy represents the objects sharing some set of properties; and each sub-concept in the hierarchy represents a subset of the objects (as well as a superset of the properties) in the concepts above it. The term was introduced by Rudolf Wille in 1981, and builds on the mathematical theory of lattices and ordered sets that was developed by Garrett Birkhoff and others in the 1930s.

    Formal concept analysis finds practical application in fields including data mining, text mining, machine learning, knowledge management, semantic web, software development, chemistry and biology.

    Comment Source:* [Formal concept analysis](https://en.wikipedia.org/wiki/Formal_concept_analysis), Wikipedia > Formal concept analysis (FCA) is a principled way of deriving a concept hierarchy or formal ontology from a collection of objects and their properties. Each concept in the hierarchy represents the objects sharing some set of properties; and each sub-concept in the hierarchy represents a subset of the objects (as well as a superset of the properties) in the concepts above it. The term was introduced by Rudolf Wille in 1981, and builds on the mathematical theory of lattices and ordered sets that was developed by Garrett Birkhoff and others in the 1930s. > > Formal concept analysis finds practical application in fields including data mining, text mining, machine learning, knowledge management, semantic web, software development, chemistry and biology. * Simon Willerton, [Formal concept analysis](https://golem.ph.utexas.edu/category/2013/09/formal_concept_analysis.html), the n-Category Cafe.
  • 3.
    edited May 23

    A formal context T is a relation between the objects O and attributes A. It indicates which attributes apply to which objects. It is represented by a bipartite graph.

    A formal concept C is a special kind of sub-relation of T, corresponding I believe to a maximal complete bipartite subgraph.

    That is, it is a subset \(X\) of the objects along with a subset \(Y\) of the attributes, such that every object in \(X\) has all the attributes in \(Y\), and \(X\) is maximal in this regard.

    Note this is not the standard formal definition, as presented in op. cit., Wikipedia, etc., but I believe it is equivalent.

    Comment Source:A **formal context** T is a relation between the objects O and attributes A. It indicates which attributes apply to which objects. It is represented by a bipartite graph. A formal concept C is a special kind of sub-relation of T, corresponding I believe to a maximal complete bipartite subgraph. That is, it is a subset \\(X\\) of the objects along with a subset \\(Y\\) of the attributes, such that every object in \\(X\\) has all the attributes in \\(Y\\), and \\(X\\) is maximal in this regard. Note this is not the standard formal definition, as presented in _op. cit._, Wikipedia, etc., but I believe it is equivalent.
  • 4.
    edited May 23

    Op. cit., p.13

    Formal concepts are illustrated with a toy example, where:

    • Objects O = {fruit, vegetable}
    • Attributes A = {green, yellow, purple}

    The formal context given is \(\lbrace (fruit,orange), (fruit,green), (vegetable,purple) \rbrace\).

    Writing this using arrows for the tuples:

    • \(fruit \rightarrow orange\)
    • \(fruit \rightarrow green\)
    • \(vegetable \rightarrow purple\)

    And reversing the direction to match language scanning:

    • \(orange \leftarrow fruit\)
    • \(green \leftarrow fruit\)
    • \(purple \leftarrow vegetable\)
    Comment Source:_Op. cit._, p.13 Formal concepts are illustrated with a toy example, where: * Objects O = {fruit, vegetable} * Attributes A = {green, yellow, purple} The formal context given is \\(\lbrace (fruit,orange), (fruit,green), (vegetable,purple) \rbrace\\). Writing this using arrows for the tuples: * \\(fruit \rightarrow orange\\) * \\(fruit \rightarrow green\\) * \\(vegetable \rightarrow purple\\) And reversing the direction to match language scanning: * \\(orange \leftarrow fruit\\) * \\(green \leftarrow fruit\\) * \\(purple \leftarrow vegetable\\)
  • 5.
    edited May 23

    This context could be derived by analyzing a small text, such as "Today I would like to get either an orange fruit, a green fruit, or a purple vegetable."

    In a simple model, natural language processing tells us that an attribute/adjective X applies to an object/noun Y if the phrase X Y appears somewhere in the text.

    There are two formal concepts contained in this context:

    C1 = {(fruit,orange), (fruit,green)}

    C2 = {(vegetable,purple)}

    Each of these is a maximal complete bipartite subgraph.

    Comment Source:This context could be derived by analyzing a small text, such as "Today I would like to get either an orange fruit, a green fruit, or a purple vegetable." In a simple model, natural language processing tells us that an attribute/adjective X applies to an object/noun Y if the phrase X Y appears somewhere in the text. There are two formal concepts contained in this context: C1 = {(fruit,orange), (fruit,green)} C2 = {(vegetable,purple)} Each of these is a maximal complete bipartite subgraph.
  • 6.
    edited May 23

    Op. cit., p.14

    The first formal concept is, perhaps, the concept of citrus fruits. The second formal concept is that of a purple vegetable. There are plenty of these: red cabbage, purple cauliflower, purple carrots, purple asparagus, endive, and eggplant—though that’s technically a fruit!—to name a few.

    Comment Source:_Op. cit._, p.14 > The first formal concept is, perhaps, the concept of citrus fruits. The second formal concept is that of a purple vegetable. There are plenty of these: red cabbage, purple cauliflower, purple carrots, purple asparagus, endive, and eggplant—though that’s technically a fruit!—to name a few.
  • 7.

    Now that we have some examples to inform our intuition, let's look at the standard technical definition of a formal concept, which is based on what are called "derivation operators."

    Comment Source:Now that we have some examples to inform our intuition, let's look at the standard technical definition of a formal concept, which is based on what are called "derivation operators."
  • 8.
    edited May 23

    Let X be all the objects, and Y all the attributes.

    Then using standard notation, the powerset \(2^X\) consists of all subsets of \(X\), i.e., all sets of objects. And \(2^Y\) consists of all subsets of \(Y\), all sets of attributes.

    Define functions \(f: 2^X \rightarrow 2^Y\) and \(g: 2^Y \rightarrow 2^X\), called derivation operators, as follows.

    Let \(x \subseteq X\) be a subset of the objects.

    Then define \(x' = f(x) \subseteq Y\) as the set of attributes which apply to every object in \(X\).

    That defines the derivation operator \(f: 2^X \rightarrow 2^Y\).

    Comment Source:Let X be all the objects, and Y all the attributes. Then using standard notation, the powerset \\(2^X\\) consists of all subsets of \\(X\\), i.e., all sets of objects. And \\(2^Y\\) consists of all subsets of \\(Y\\), all sets of attributes. Define functions \\(f: 2^X \rightarrow 2^Y\\) and \\(g: 2^Y \rightarrow 2^X\\), called derivation operators, as follows. Let \\(x \subseteq X\\) be a subset of the objects. Then define \\(x' = f(x) \subseteq Y\\) as the set of attributes which apply to every object in \\(X\\). That defines the derivation operator \\(f: 2^X \rightarrow 2^Y\\).
  • 9.
    edited May 23

    The derivation operator \(g: 2^Y \rightarrow 2^X\) is defined symmetrically.

    Let \(y \subseteq Y\) be a subset of the attributes.

    Then define \(y' = g(y) \subseteq X\) as the set of objects which have every attribute in \(Y\).

    Comment Source:The derivation operator \\(g: 2^Y \rightarrow 2^X\\) is defined symmetrically. Let \\(y \subseteq Y\\) be a subset of the attributes. Then define \\(y' = g(y) \subseteq X\\) as the set of objects which have every attribute in \\(Y\\).
  • 10.
    edited May 23

    Then a formal concept is defined as a pair (x,y), where f(x) = y and g(y) = x.

    Or, using our other notation, (x,y) is a formal concept means that x' = y and y' = x.

    Comment Source:Then a **formal concept** is defined as a pair (x,y), where f(x) = y and g(y) = x. Or, using our other notation, (x,y) is a formal concept means that x' = y and y' = x.
  • 11.
    edited May 23

    Exercise: prove (or refute) that this is equivalent to the definition I gave in comment #3:

    That is, it is a subset X of the objects along with a subset Y of the attributes, such that every object in X has all the attributes in Y, and X is maximal in this regard.

    Comment Source:Exercise: prove (or refute) that this is equivalent to the definition I gave in comment #3: > That is, it is a subset X of the objects along with a subset Y of the attributes, such that every object in X has all the attributes in Y, and X is maximal in this regard.
  • 12.
    edited May 23

    These derivation operators have some nice order-theoretic structure.

    First, note the powersets \(2^X\) and \(2^Y\) are of course partially ordered by the inclusion relation.

    Next, it is not hard to see that \(f: 2^X \rightarrow 2^Y\) and \(g: 2^Y \rightarrow 2^X\) are order reversing functions.

    First consider \(f: 2^X \rightarrow 2^Y\).

    What happens if you apply \(f\) to a subset \(x_1 \subseteq x \subseteq X\)?

    Then, since the \(x_1\) is less than or equal to \(x\), the set of attributes which apply to every object in \(x_1\) must be greater than or equal to the set of attributes which apply to every object in \(x\). We weakened the condition by choosing a subset \(x_1\) of \(x\), which can only increase the set of common attributes.

    So \(x_1 \subseteq x \implies f(x_1) \supseteq f(x)\), which says that \(f\) is order-reversing.

    Similarly, decreasing a set of attributes \(y\) can only increase the set of objects \(g(y)\) which have all the attributes in \(y\).

    So \(g\) is order-reversing too.

    Comment Source:These derivation operators have some nice order-theoretic structure. First, note the powersets \\(2^X\\) and \\(2^Y\\) are of course partially ordered by the inclusion relation. Next, it is not hard to see that \\(f: 2^X \rightarrow 2^Y\\) and \\(g: 2^Y \rightarrow 2^X\\) are _order reversing_ functions. First consider \\(f: 2^X \rightarrow 2^Y\\). What happens if you apply \\(f\\) to a subset \\(x_1 \subseteq x \subseteq X\\)? Then, since the \\(x_1\\) is less than or equal to \\(x\\), the set of attributes which apply to every object in \\(x_1\\) must be greater than or equal to the set of attributes which apply to every object in \\(x\\). We weakened the condition by choosing a subset \\(x_1\\) of \\(x\\), which can only increase the set of common attributes. So \\(x_1 \subseteq x \implies f(x_1) \supseteq f(x)\\), which says that \\(f\\) is order-reversing. Similarly, decreasing a set of attributes \\(y\\) can only increase the set of objects \\(g(y)\\) which have all the attributes in \\(y\\). So \\(g\\) is order-reversing too.
  • 13.
    edited May 23

    But \(f: 2^X \rightarrow 2^Y\) and \(g: 2^Y \rightarrow 2^X\) are not just any pair of order-reversing functions, they have an additional property which classifies them as a Galois connection:

    \[f(x) \supseteq y \iff x \subseteq g(y)\]

    Using Willerton's language, these are equivalent because they both state that the objects in \(x\) have all the attributes in \(y\).

    Comment Source:But \\(f: 2^X \rightarrow 2^Y\\) and \\(g: 2^Y \rightarrow 2^X\\) are not just any pair of order-reversing functions, they have an additional property which classifies them as a **Galois connection**: \\[f(x) \supseteq y \iff x \subseteq g(y)\\] Using Willerton's language, these are equivalent because they both state that the objects in \\(x\\) have all the attributes in \\(y\\).
  • 14.
    edited May 23

    Next, we can look at the composite functions:

    \[g \circ f: 2^X \rightarrow 2^X\] \[f \circ g: 2^Y \rightarrow 2^Y\]

    Comment Source:Next, we can look at the composite functions: \\[g \circ f: 2^X \rightarrow 2^X\\] \\[f \circ g: 2^Y \rightarrow 2^Y\\]
Sign In or Register to comment.