Options

Notes on "At the interface of algebra and statistics," Tai-Danae Bradley

edited February 2021 in Study Groups

This thesis takes inspiration from quantum physics to investigate mathematical structure that lies at the interface of algebra and statistics. The starting point is a passage from classical probability theory to quantum probability theory. The quantum version of a probability distribution is a density operator, the quantum version of marginalizing is an operation called the partial trace, and the quantum version of a marginal probability distribution is a reduced density operator. Every joint probability distribution on a finite set can be modeled as a rank one density operator. By applying the partial trace, we obtain reduced density operators whose diagonals recover classical marginal probabilities. In general, these reduced densities will have rank higher than one, and their eigenvalues and eigenvectors will contain extra information that encodes subsystem interactions governed by statistics. We decode this information, and show it is akin to conditional probability, and then investigate the extent to which the eigenvectors capture "concepts" inherent in the original joint distribution. The theory is then illustrated with an experiment that exploits these ideas. Turning to a more theoretical application, we also discuss a preliminary framework for modeling entailment and concept hierarchy in natural language, namely, by representing expressions in the language as densities. Finally, initial inspiration for this thesis comes from formal concept analysis, which finds many striking parallels with the linear algebra. The parallels are not coincidental, and a common blueprint is found in category theory. We close with an exposition on free (co)completions and how the free-forgetful adjunctions in which they arise strongly suggest that in certain categorical contexts, the "fixed points" of a morphism with its adjoint encode interesting information.

Comments

  • 1.
    edited February 2021

    p. 11: Introduction to formal concept analysis

    See also:

    From Wikipedia:

    Formal concept analysis (FCA) is a principled way of deriving a concept hierarchy or formal ontology from a collection of objects and their properties. Each concept in the hierarchy represents the objects sharing some set of properties; and each sub-concept in the hierarchy represents a subset of the objects (as well as a superset of the properties) in the concepts above it. The term was introduced by Rudolf Wille in 1981, and builds on the mathematical theory of lattices and ordered sets that was developed by Garrett Birkhoff and others in the 1930s.

    Formal concept analysis finds practical application in fields including data mining, text mining, machine learning, knowledge management, semantic web, software development, chemistry and biology.

    Comment Source:p. 11: Introduction to formal concept analysis See also: * [Formal concept analysis](https://en.wikipedia.org/wiki/Formal_concept_analysis), Wikipedia. * Simon Willerton, [Formal concept analysis](https://golem.ph.utexas.edu/category/2013/09/formal_concept_analysis.html), the n-Category Cafe. From Wikipedia: > Formal concept analysis (FCA) is a principled way of deriving a concept hierarchy or formal ontology from a collection of objects and their properties. Each concept in the hierarchy represents the objects sharing some set of properties; and each sub-concept in the hierarchy represents a subset of the objects (as well as a superset of the properties) in the concepts above it. The term was introduced by Rudolf Wille in 1981, and builds on the mathematical theory of lattices and ordered sets that was developed by Garrett Birkhoff and others in the 1930s. > > Formal concept analysis finds practical application in fields including data mining, text mining, machine learning, knowledge management, semantic web, software development, chemistry and biology.
  • 2.
    edited May 2020

    A formal context K is a relation between the objects O and attributes A. It indicates which attributes apply to which objects. It is represented by a bipartite graph.

    A formal concept C is a special kind of sub-relation of K, corresponding I believe to a maximal complete bipartite subgraph.

    That is, it is a subset \(X\) of the objects along with a subset \(Y\) of the attributes, such that every object in \(X\) has all the attributes in \(Y\), and \(X\) is maximal in this regard.

    Note this is not the standard formal definition, as presented in op. cit., Wikipedia, etc., but I believe it is equivalent.

    Comment Source:A **formal context** K is a relation between the objects O and attributes A. It indicates which attributes apply to which objects. It is represented by a bipartite graph. A formal concept C is a special kind of sub-relation of K, corresponding I believe to a maximal complete bipartite subgraph. That is, it is a subset \\(X\\) of the objects along with a subset \\(Y\\) of the attributes, such that every object in \\(X\\) has all the attributes in \\(Y\\), and \\(X\\) is maximal in this regard. Note this is not the standard formal definition, as presented in _op. cit._, Wikipedia, etc., but I believe it is equivalent.
  • 3.
    edited February 2021

    p.13

    Formal concepts are illustrated with a miniature example, with:

    • Objects O = {fruit, vegetable}
    • Attributes A = {green, yellow, purple}

    The formal context given is \(\lbrace (fruit,orange), (fruit,green), (vegetable,purple) \rbrace\).

    Writing this using arrows for the tuples:

    • \(fruit \rightarrow orange\)
    • \(fruit \rightarrow green\)
    • \(vegetable \rightarrow purple\)

    And reversing the direction to match language scanning:

    • \(orange \leftarrow fruit\)
    • \(green \leftarrow fruit\)
    • \(purple \leftarrow vegetable\)
    Comment Source:p.13 Formal concepts are illustrated with a miniature example, with: * Objects O = {fruit, vegetable} * Attributes A = {green, yellow, purple} The formal context given is \\(\lbrace (fruit,orange), (fruit,green), (vegetable,purple) \rbrace\\). Writing this using arrows for the tuples: * \\(fruit \rightarrow orange\\) * \\(fruit \rightarrow green\\) * \\(vegetable \rightarrow purple\\) And reversing the direction to match language scanning: * \\(orange \leftarrow fruit\\) * \\(green \leftarrow fruit\\) * \\(purple \leftarrow vegetable\\)
  • 4.
    edited May 2020

    This context could be derived by analyzing a small text, such as "Today I would like to get either an orange fruit, a green fruit, or a purple vegetable."

    In a simple model, natural language processing tells us that an attribute/adjective X applies to an object/noun Y if the phrase X Y appears somewhere in the text.

    There are two formal concepts contained in this context:

    C1 = {(fruit,orange), (fruit,green)}

    C2 = {(vegetable,purple)}

    Each of these is a maximal complete bipartite subgraph.

    Comment Source:This context could be derived by analyzing a small text, such as "Today I would like to get either an orange fruit, a green fruit, or a purple vegetable." In a simple model, natural language processing tells us that an attribute/adjective X applies to an object/noun Y if the phrase X Y appears somewhere in the text. There are two formal concepts contained in this context: C1 = {(fruit,orange), (fruit,green)} C2 = {(vegetable,purple)} Each of these is a maximal complete bipartite subgraph.
  • 5.
    edited February 2021

    p.14

    The first formal concept is, perhaps, the concept of citrus fruits. The second formal concept is that of a purple vegetable. There are plenty of these: red cabbage, purple cauliflower, purple carrots, purple asparagus, endive, and eggplant—though that’s technically a fruit!—to name a few.

    Comment Source:p.14 > The first formal concept is, perhaps, the concept of citrus fruits. The second formal concept is that of a purple vegetable. There are plenty of these: red cabbage, purple cauliflower, purple carrots, purple asparagus, endive, and eggplant—though that’s technically a fruit!—to name a few.
  • 6.

    Now that we have some examples to inform our intuition, let's look at the standard technical definition of a formal concept, which is based on what are called "derivation operators."

    Comment Source:Now that we have some examples to inform our intuition, let's look at the standard technical definition of a formal concept, which is based on what are called "derivation operators."
  • 7.
    edited May 2020

    Let X be all the objects, and Y all the attributes.

    Then using standard notation, the powerset \(2^X\) consists of all subsets of \(X\), i.e., all sets of objects. And \(2^Y\) consists of all subsets of \(Y\), all sets of attributes.

    Define functions \(f: 2^X \rightarrow 2^Y\) and \(g: 2^Y \rightarrow 2^X\), called derivation operators, as follows.

    Let \(x \subseteq X\) be a subset of the objects.

    Then define \(x' = f(x) \subseteq Y\) as the set of attributes which apply to every object in \(X\).

    That defines the derivation operator \(f: 2^X \rightarrow 2^Y\).

    Comment Source:Let X be all the objects, and Y all the attributes. Then using standard notation, the powerset \\(2^X\\) consists of all subsets of \\(X\\), i.e., all sets of objects. And \\(2^Y\\) consists of all subsets of \\(Y\\), all sets of attributes. Define functions \\(f: 2^X \rightarrow 2^Y\\) and \\(g: 2^Y \rightarrow 2^X\\), called derivation operators, as follows. Let \\(x \subseteq X\\) be a subset of the objects. Then define \\(x' = f(x) \subseteq Y\\) as the set of attributes which apply to every object in \\(X\\). That defines the derivation operator \\(f: 2^X \rightarrow 2^Y\\).
  • 8.
    edited May 2020

    The derivation operator \(g: 2^Y \rightarrow 2^X\) is defined symmetrically.

    Let \(y \subseteq Y\) be a subset of the attributes.

    Then define \(y' = g(y) \subseteq X\) as the set of objects which have every attribute in \(Y\).

    Comment Source:The derivation operator \\(g: 2^Y \rightarrow 2^X\\) is defined symmetrically. Let \\(y \subseteq Y\\) be a subset of the attributes. Then define \\(y' = g(y) \subseteq X\\) as the set of objects which have every attribute in \\(Y\\).
  • 9.
    edited May 2020

    Then a formal concept is defined as a pair (x,y), where f(x) = y and g(y) = x.

    Or, using our other notation, (x,y) is a formal concept means that x' = y and y' = x.

    Comment Source:Then a **formal concept** is defined as a pair (x,y), where f(x) = y and g(y) = x. Or, using our other notation, (x,y) is a formal concept means that x' = y and y' = x.
  • 10.
    edited May 2020

    Exercise: prove (or refute) that this is equivalent to the definition I gave in comment #3:

    That is, it is a subset X of the objects along with a subset Y of the attributes, such that every object in X has all the attributes in Y, and X is maximal in this regard.

    Comment Source:Exercise: prove (or refute) that this is equivalent to the definition I gave in comment #3: > That is, it is a subset X of the objects along with a subset Y of the attributes, such that every object in X has all the attributes in Y, and X is maximal in this regard.
  • 11.
    edited May 2020

    These derivation operators have some nice order-theoretic structure.

    First, note the powersets \(2^X\) and \(2^Y\) are of course partially ordered by the inclusion relation.

    Next, it is not hard to see that \(f: 2^X \rightarrow 2^Y\) and \(g: 2^Y \rightarrow 2^X\) are order reversing functions.

    First consider \(f: 2^X \rightarrow 2^Y\).

    What happens if you apply \(f\) to a subset \(x_1 \subseteq x \subseteq X\)?

    Then, since the \(x_1\) is less than or equal to \(x\), the set of attributes which apply to every object in \(x_1\) must be greater than or equal to the set of attributes which apply to every object in \(x\). We weakened the condition by choosing a subset \(x_1\) of \(x\), which can only increase the set of common attributes.

    So \(x_1 \subseteq x \implies f(x_1) \supseteq f(x)\), which says that \(f\) is order-reversing.

    Similarly, decreasing a set of attributes \(y\) can only increase the set of objects \(g(y)\) which have all the attributes in \(y\).

    So \(g\) is order-reversing too.

    Comment Source:These derivation operators have some nice order-theoretic structure. First, note the powersets \\(2^X\\) and \\(2^Y\\) are of course partially ordered by the inclusion relation. Next, it is not hard to see that \\(f: 2^X \rightarrow 2^Y\\) and \\(g: 2^Y \rightarrow 2^X\\) are _order reversing_ functions. First consider \\(f: 2^X \rightarrow 2^Y\\). What happens if you apply \\(f\\) to a subset \\(x_1 \subseteq x \subseteq X\\)? Then, since the \\(x_1\\) is less than or equal to \\(x\\), the set of attributes which apply to every object in \\(x_1\\) must be greater than or equal to the set of attributes which apply to every object in \\(x\\). We weakened the condition by choosing a subset \\(x_1\\) of \\(x\\), which can only increase the set of common attributes. So \\(x_1 \subseteq x \implies f(x_1) \supseteq f(x)\\), which says that \\(f\\) is order-reversing. Similarly, decreasing a set of attributes \\(y\\) can only increase the set of objects \\(g(y)\\) which have all the attributes in \\(y\\). So \\(g\\) is order-reversing too.
  • 12.
    edited May 2020

    But \(f: 2^X \rightarrow 2^Y\) and \(g: 2^Y \rightarrow 2^X\) are not just any pair of order-reversing functions, they have an additional property which classifies them as a Galois connection:

    \[f(x) \supseteq y \iff x \subseteq g(y)\]

    Using Willerton's language, these are equivalent because they both state that the objects in \(x\) have all the attributes in \(y\).

    Comment Source:But \\(f: 2^X \rightarrow 2^Y\\) and \\(g: 2^Y \rightarrow 2^X\\) are not just any pair of order-reversing functions, they have an additional property which classifies them as a **Galois connection**: \\[f(x) \supseteq y \iff x \subseteq g(y)\\] Using Willerton's language, these are equivalent because they both state that the objects in \\(x\\) have all the attributes in \\(y\\).
  • 13.
    edited May 2020

    Before moving on, let's revisit the definitions of \(f\) and \(g\). Earlier we said:

    define \(f(x)\) as the set of attributes which apply to every object in \(x\).

    define \(g(y)\) as the set of objects which have every attribute in \(y\).

    Now let's spell these out more formally.

    Comment Source:Before moving on, let's revisit the definitions of \\(f\\) and \\(g\\). Earlier we said: > define \\(f(x)\\) as the set of attributes which apply to every object in \\(x\\). > > define \\(g(y)\\) as the set of objects which have every attribute in \\(y\\). Now let's spell these out more formally.
  • 14.
    edited May 2020

    Recall the definition of a formal context:

    A formal context K is a relation between the objects X and attributes Y. It indicates which attributes apply to which objects. It is represented by a bipartite graph.

    In the example given,

    • X = {fruit, vegetable}
    • Y = {green, yellow, purple}
    • K = \(\lbrace (fruit,orange), (fruit,green), (vegetable,purple) \rbrace\)
    Comment Source:Recall the definition of a formal context: > A **formal context** K is a relation between the objects X and attributes Y. It indicates which attributes apply to which objects. It is represented by a bipartite graph. In the example given, * X = {fruit, vegetable} * Y = {green, yellow, purple} * K = \\(\lbrace (fruit,orange), (fruit,green), (vegetable,purple) \rbrace\\)
  • 15.
    edited May 2020

    That K is a relation between X and Y means it is a subset of the Cartesian product \(X \times Y\).

    Let \(f_1: X \rightarrow 2^Y\) be the function sending an object \(o \in X\) to the set of attributes which apply to \(o\), i.e.,

    \[f_1(o) = \lbrace a \in Y\ |\ (o,a) \in K \rbrace \]

    I called it \(f_1\) because it operates on a single object.

    Comment Source:That K is a relation between X and Y means it is a subset of the Cartesian product \\(X \times Y\\). Let \\(f_1: X \rightarrow 2^Y\\) be the function sending an object \\(o \in X\\) to the set of attributes which apply to \\(o\\), i.e., \\[f_1(o) = \lbrace a \in Y\ |\ (o,a) \in K \rbrace \\] I called it \\(f_1\\) because it operates on a single object.
  • 16.
    edited May 2020

    Next, let's lift \(f_1\) to a function that operates on a set of objects.

    The standard way is to take the union of \(f_1(o)\) as \(o\) ranges over \(x\), i.e.,

    \[\bigcup_{o \in x} f_1(o)\]

    However, for formal concept analysis we use a different lifting, which is based on the intersection of the sets.

    Accordingly, define the left derivation operator \(f: 2^X \rightarrow 2^Y\) by

    \[f(x) = \bigcap_{o \in x} f_1(o)\]

    which is equivalent to

    let \(f(x)\) be the attributes which apply to every object in \(x\).

    Comment Source:Next, let's lift \\(f_1\\) to a function that operates on a set of objects. The standard way is to take the union of \\(f_1(o)\\) as \\(o\\) ranges over \\(x\\), i.e., \\[\bigcup_{o \in x} f_1(o)\\] However, for formal concept analysis we use a different lifting, which is based on the _intersection_ of the sets. Accordingly, define the left derivation operator \\(f: 2^X \rightarrow 2^Y\\) by \\[f(x) = \bigcap_{o \in x} f_1(o)\\] which is equivalent to > let \\(f(x)\\) be the attributes which apply to every object in \\(x\\).
  • 17.
    edited May 2020

    The right derivation operator \(g: 2^Y \rightarrow 2^X\) is defined similarly.

    First let \(g_1: Y \rightarrow 2^X\) be the function sending an attribute \(a \in Y\) to the set of objects which have that attribute, i.e.,

    \[g_1(a) = \lbrace o \in X\ |\ (o,a) \in K \rbrace \]

    Then define the right derivation operator \(g: 2^Y \rightarrow 2^X\) by

    \[g(y) = \bigcap_{a \in y} g_1(a)\]

    which is equivalent to

    let \(g(y)\) be the objects which have every attribute in \(y\).

    Comment Source:The right derivation operator \\(g: 2^Y \rightarrow 2^X\\) is defined similarly. First let \\(g_1: Y \rightarrow 2^X\\) be the function sending an attribute \\(a \in Y\\) to the set of objects which have that attribute, i.e., \\[g_1(a) = \lbrace o \in X\ |\ (o,a) \in K \rbrace \\] Then define the right derivation operator \\(g: 2^Y \rightarrow 2^X\\) by \\[g(y) = \bigcap_{a \in y} g_1(a)\\] which is equivalent to > let \\(g(y)\\) be the objects which have every attribute in \\(y\\).
Sign In or Register to comment.