
Exercises and Puzzles 6 - Chapter 1


It's time for more exercises!

• Monotone functions between discrete posets
• Pullbacks of partitions, with a nice picture by Frederick Eisele
• Showing that there's a category of preorders
• Showing that skeletal dagger preorders are discrete
• Showing that a certain map from a poset of partitions to the Boolean poset \(\mathbb{B}\) is monotone

Next, some puzzles about adjoints. For any set \(X\) let \(P(X)\) be the power set of \(X\), namely the set of all subsets of \(X\). It's easy to see that the subset relation \(\subseteq\) makes \(P(X)\) into a poset. Suppose we have any function between sets

$$ f : X \to Y . $$

This gives a function

$$ f_{!} : P(X) \to P(Y) $$

sending each subset \(S \subseteq X\) to its image under \(f\):

$$f_{!}(S) = \{ y \in Y: \; y = f(x) \textrm{ for some } x \in S \} . $$

Puzzle 17. Show that \( f_{!} : P(X) \to P(Y) \) is a monotone function.

Puzzle 18. Does \( f_{!} \) always have a left adjoint? If so, describe it. If not, give an example where it doesn't, and some conditions under which it does have a left adjoint.

Puzzle 19. Does \(f_{!}\) always have a right adjoint? If so, describe it. If not, give an example where it doesn't, and some conditions under which it does have a right adjoint.

For answers, see the discussion on Lecture 6. One of the adjoints of \(f_{!}\) only exists when \(f\) obeys a certain condition!
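If you want to experiment, here's a small Python sketch (purely illustrative, with a made-up function \(f\)) that builds \(f_{!}\) and brute-force checks the monotonicity claim of Puzzle 17 on a tiny example; it doesn't give away Puzzles 18 and 19.

```python
from itertools import combinations

def powerset(xs):
    """All subsets of xs, as frozensets."""
    xs = list(xs)
    return [frozenset(c) for r in range(len(xs) + 1)
            for c in combinations(xs, r)]

# A small made-up function f : X -> Y, given as a dict.
X = {1, 2, 3}
f = {1: 'a', 2: 'a', 3: 'b'}

def f_shriek(S):
    """f_!(S) = { f(x) : x in S }, the image of S under f."""
    return frozenset(f[x] for x in S)

# Puzzle 17: if S is a subset of S', then f_!(S) is a subset of f_!(S').
assert all(f_shriek(S) <= f_shriek(Sp)
           for S in powerset(X) for Sp in powerset(X) if S <= Sp)
print("f_! is monotone on this example")
```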

To see all the exercises in Chapter 1, go here.

Comments

  • 1.

    Hey John,

    I have been playing around with these questions in a particular context.

    There is still something I am trying to figure out.

    Specifically, I am thinking of this puzzle by Allison Burgers in the comments on lecture 6.

    Take any linear map \(A : V \to W\). This can be lifted to a monotone map \(A_!\) on the complete lattices of vector subspaces of \(V\) and \(W\).

    I am trying to show

    $$ A^! = A^+_! $$ where \(A^+\) is the Moore-Penrose pseudo-inverse. This is used all the time in linear regression.


    First, the lattices of subspaces are complete, and \(A_!\) preserves arbitrary meets and joins; this is because \(A\) is a linear operator. This settles Puzzle 18 and Puzzle 19, and moreover lets us apply the adjoint functor theorem for posets.

    Hence \(A_!\) has a left adjoint and right adjoint.

    I will focus on \(A_!\)'s right adjoint, which is \(A^!\).

    To show this is the Moore-Penrose inverse, we need to show the following axioms are obeyed (as per Wikipedia):

    • \(A\circ A^+ \circ A = A\)
    • \(A^+\circ A\circ A^+ = A^+\)
    • \((A\circ A^+)^* = A\circ A^+\)
    • \((A^+\circ A)^* = A^+\circ A\)

    Here \(T^\ast\) denotes the Hermitian transpose of \(T\).
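    Before going further, here's a quick numpy sanity check of those four axioms on a random matrix (a numerical illustration only, using `np.linalg.pinv`; for a real matrix the Hermitian transpose is just the transpose):

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((5, 3))     # a random real matrix
    A_plus = np.linalg.pinv(A)          # Moore-Penrose pseudo-inverse

    # The four Penrose axioms (for real A, the Hermitian transpose is .T):
    assert np.allclose(A @ A_plus @ A, A)             # A A+ A = A
    assert np.allclose(A_plus @ A @ A_plus, A_plus)   # A+ A A+ = A+
    assert np.allclose((A @ A_plus).T, A @ A_plus)    # (A A+)* = A A+
    assert np.allclose((A_plus @ A).T, A_plus @ A)    # (A+ A)* = A+ A
    print("all four Penrose axioms hold numerically")
    ```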

    I can show a weakened version of the first two axioms, but I have no idea how to prove the second two axioms :(

    Proposition: \(A_! \circ A^! \circ A_! = A_!\)

    Proof.

    Let \(X \sqsubseteq V\) be a subspace of \(V\).

    First, observe that:

    $$ A_!(X) \sqsubseteq A_!(X) \tag{$\star$} $$

    Hence by right adjointness:

    $$ X \sqsubseteq (A^! \circ A_!) (X) $$

    Since \(A_!\) is monotone we have:

    $$ A_! (X) \sqsubseteq (A_! \circ A^! \circ A_!) (X) \tag{1} $$

    Next, using \((\star)\) and the fact that \(A^!\) is monotone, we have

    $$ (A^! \circ A_!)(X) \sqsubseteq (A^! \circ A_!) (X) $$

    Hence by right adjointness:

    $$ (A_! \circ A^! \circ A_!)(X) \sqsubseteq A_!(X) \tag{2} $$

    Together, (1) and (2) show that \((A_! \circ A^! \circ A_!) (X) = A_! (X)\) for all \(X\), which proves the proposition. \(\Box\)

    A rather similar proof gives \(A^! \circ A_! \circ A^! = A^!\).
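    If it helps, here is a little numpy sketch I'd use to sanity-check the proposition numerically (illustration only, on random data): subspaces are represented by matrices whose columns span them, \(A_!\) takes images, \(A^!\) is taken to be the preimage map (the right adjoint of the image map), and subspaces are compared by ranks.

    ```python
    import numpy as np

    rng = np.random.default_rng(1)

    def image(A, B):
        """A_!(X): the image under A of the subspace spanned by the columns of B."""
        return A @ B

    def preimage(A, C):
        """A^!(Y): the preimage {v : A v in span(C)}, i.e. the null space of (I - P_Y) A."""
        P = C @ np.linalg.pinv(C)              # orthogonal projector onto span(C)
        M = (np.eye(A.shape[0]) - P) @ A
        _, s, Vt = np.linalg.svd(M)            # null space of M via SVD
        rank = int(np.sum(s > 1e-10))
        return Vt[rank:].T                     # columns span the null space

    def same_subspace(B1, B2):
        """span(B1) == span(B2), compared via ranks."""
        r1, r2 = np.linalg.matrix_rank(B1), np.linalg.matrix_rank(B2)
        return r1 == r2 == np.linalg.matrix_rank(np.hstack([B1, B2]))

    A = rng.standard_normal((4, 5))            # a random linear map V -> W
    X = rng.standard_normal((5, 2))            # a random 2-dimensional subspace of V

    lhs = image(A, preimage(A, image(A, X)))   # (A_! . A^! . A_!)(X)
    rhs = image(A, X)                          # A_!(X)
    print(same_subspace(lhs, rhs))             # expect True
    ```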

    I suspect that in general, for any two adjoint functors \(F \dashv G\), we have \(F \circ G \circ F = F\) and \(G \circ F \circ G = G\).

    But this is as far as I've got with this. I've got the following problems:

    • I am struggling to show the identity \( A^{++} = A\) holds. In this context this means \(A^! \dashv A_! \dashv A^!\). This property usually only holds for inverses.

    • I am also not quite sure how to recover a linear map \(T\) where \(A^! = T_!\).

    • Finally, I have no idea how to show that \(A_! \circ A^!\) and \(A^! \circ A_!\) are Hermitian, as per the last two axioms.

    It is interesting to note that \(A \circ A^+\) is a projection, and hence a monad. In fact it is the orthogonal projection onto the image of \(A\).

  • 2.

    Matthew - I'm glad you've been working on Allison Burger's puzzles! I found them interesting but hadn't made much progress on them. I've contacted her... maybe she can help us out.

    I've never understood these pseudo-inverses; if I could understand them using adjoints that would make me very happy.

  • 3.

    > I've never understood these pseudo-inverses; if I could understand them using adjoints that would make me very happy.

    I've seen them crop up in a bunch of places in statistics and machine learning.

    It's a nice exercise to revisit their uses, so I've made some notes.

    Suppose I have a linear model represented by a linear map \(A\) and some measurements \(\vec{y}\). I want to find an input \(\vec{\beta}\) such that

    $$ A \vec{\beta} \approx \vec{y} $$

    Suppose that \(A\) can be represented by a matrix \(M \in \mathbb{R}^{m \times n}\) where \(m > n\). Then we can naturally think of \(A \vec{\beta} = \vec{y}\) as an overdetermined system of linear equations.

    Setting

    $$ \hat{\beta} = A^+ \vec{y} $$

    where \(A^+\) is the Moore-Penrose pseudo-inverse of \(A\) yields the Ordinary Least Squares (OLS) solution to \(A \vec{\beta} \approx \vec{y}\).

    That is to say:

    $$ \hat{\beta} = \underset{\vec{b}}{\mathrm{argmin}}\, \| \vec{y} - A \vec{b} \| $$

    If \(A^\ast A\) has full rank then:

    $$ A^+ = (A^\ast A)^{-1} A^\ast $$

    Otherwise you can:

    • Use QR decomposition
    • Use singular value decomposition; Joel Burke goes over this in his notes for his Caltech class.

    According to this stack exchange, singular value decomposition is slower but more numerically stable than QR decomposition.
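    Just to make the OLS connection concrete, here's a small numpy sketch (random made-up data) checking that \(A^+ \vec{y}\) agrees with `np.linalg.lstsq` and, in the full-rank case, with the normal-equations formula:

    ```python
    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.standard_normal((100, 3))       # overdetermined: m = 100 > n = 3
    y = rng.standard_normal(100)

    beta_pinv = np.linalg.pinv(A) @ y                     # hat{beta} = A^+ y
    beta_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)    # ordinary least squares
    beta_normal = np.linalg.solve(A.T @ A, A.T @ y)       # (A* A)^{-1} A* y

    assert np.allclose(beta_pinv, beta_lstsq)
    assert np.allclose(beta_pinv, beta_normal)
    print(beta_pinv)
    ```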

    I've also seen the pseudo-inverse kicking around elsewhere:

    • In statistics and convex optimization, \(A^+ y\) is an L2-norm minimizer. In machine learning this is important. Compared to minimizers of other norms, such as L1, these \(A^+ y\) are less sparse. Lots of exotic regression analyses like LASSO involve tampering with \(A^+ y\) somehow. In statistics, it is brazenly assumed that \(A^\ast A\) has full rank all the time.

    • Assuming Gaussian distributed error terms, \(A^+ y\) is a maximum likelihood estimate \(\hat{E}\, [\beta]\). This is a popular theorem in statistics classes. Here are Cosma Shalizi's notes from his machine learning course at CMU.

    • Finally, they are used extensively in Kalman filters. In particular, I've seen them in robotics and aerospace applications. Suppose I have an internal state \(X\) that I am uncertain of, some noisy sensor data \(Y_t\) indexed by time steps, and a linear transformation \(A_t\) relating \(X\) to \(Y\) at time \(t\). Kalman filters model the statistical uncertainty of the internal state as a time-step-dependent multivariate Gaussian. Concretely, this is reflected in a vector \(\hat{X}_t\) of maximum likelihood estimates and a covariance matrix \(P_t\) of uncertainties. As measurements come in, this model changes by Bayesian updating. In particular:

      $$ \hat{X}_{t+1} = \hat{E}\, [ X_t | Y_t ] $$

      These notes by Prof. Robert Liptser of Tel Aviv University do a marvelous job of relating Kalman filter updating to the Moore-Penrose pseudo-inverse matrix. In particular, we have the following recurrence for \(\hat{X}_t\) and \(P_t\):

      $$ \begin{align} \hat{X}_{t+1} & = \hat{X}_t + P_t A_t^\ast (A_t P_t A_t^\ast)^+ (Y_t - A_t \hat{X}_t) \\ P_{t+1} & = P_t - P_t A_t^\ast (A_t P_t A_t^\ast)^+ A_t P_t \end{align} $$
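    To make that update rule concrete, here's how one step of it could be written in numpy (all matrices below are made-up toy data, purely for illustration):

    ```python
    import numpy as np
    from numpy.linalg import pinv

    def kalman_update(x_hat, P, A_t, y_t):
        """One step of the recurrence above, written with the Moore-Penrose pseudo-inverse."""
        K = P @ A_t.T @ pinv(A_t @ P @ A_t.T)    # gain
        x_next = x_hat + K @ (y_t - A_t @ x_hat)
        P_next = P - K @ A_t @ P
        return x_next, P_next

    # toy example: 2-dimensional hidden state, scalar measurement
    x_hat, P = np.zeros(2), np.eye(2)
    A_t = np.array([[1.0, 0.5]])
    y_t = np.array([0.8])
    x_hat, P = kalman_update(x_hat, P, A_t, y_t)
    print(x_hat)
    print(P)
    ```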
  • 4.

    Alright, I am ready to reverse my previous position that \(A^+_!\) is the right/left adjoint of \(A_!\).

    In fact, I believe I have found a theorem which makes things a little clearer:

    Theorem. Let \(A : V \to W \) be a linear map.

    1. If \(A\) is nonsingular, then $$ A^{-1}_! \dashv A_! \dashv A^{-1}_! $$ between the lattices of subspaces.
    2. If \(A\) is singular, then no monotone map of the form \(R_!\), for \(R\) a linear map, can be either a left or right adjoint of \(A_!\).

    Proof. To prove part 1, observe that \(A\) being nonsingular implies it is invertible. We saw in Puzzle 11 that this means \(A^{-1}_!\) is both a left and a right adjoint of \(A_!\).

    Next, to prove part 2, assume that \(A\) is singular. Then the kernel of \(A\) is nontrivial: \(\mathrm{ker}(A) \neq \{0\}\). Now assume towards a contradiction that there is some linear map \(R\) such that \(A_! \dashv R_!\). By the characterization of Galois connections I mention in lecture 5, this implies \(X \sqsubseteq (R_! \circ A_!) (X)\) for all subspaces \(X \sqsubseteq V\). This means in particular that \(V \sqsubseteq (R_! \circ A_!) (V)\). Since \((R_! \circ A_!)(V)\) is a subspace of \(V\), this means \(V = (R_! \circ A_!) (V)\). However, we know that \(R_! \circ A_! = (R \circ A)_!\), so \(R \circ A : V \to V\) is surjective, and hence (by rank-nullity, assuming \(V\) is finite-dimensional) it must be that \(\mathrm{ker}(R \circ A) = \{0\}\). But we know \(\mathrm{ker}(A) \subseteq \mathrm{ker}(R \circ A)\) and \(\mathrm{ker}(A) \neq \{0\}\) by hypothesis, so \(\mathrm{ker}(R \circ A) \neq \{0\}\). But then \(\mathrm{ker}(R \circ A) \neq \{0\}\) and \(\mathrm{ker}(R \circ A) = \{0\}\) simultaneously. ⚡

    The proof that \(A\) cannot have a left adjoint if it is singular is similar. \(\Box\)
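    For a concrete instance of part 2, here's a quick numpy check (with a made-up singular matrix) that the unit inequality \(X \sqsubseteq (A^+_! \circ A_!)(X)\) already fails for \(X = \mathrm{ker}(A)\), so in particular \(A_! \dashv A^+_!\) cannot hold:

    ```python
    import numpy as np

    A = np.array([[1.0, 0.0],
                  [0.0, 0.0]])          # singular: ker(A) is the y-axis
    A_plus = np.linalg.pinv(A)

    v = np.array([0.0, 1.0])            # a nonzero vector spanning ker(A)
    print(A @ v)                        # A_!(ker A) = {0}
    print(A_plus @ (A @ v))             # (A^+_! . A_!)(ker A) = {0}, which does not contain v
    ```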

    I am now starting to suspect

    $$ A_! \dashv \mathrm{ker}(A) \vee A^+_! $$ But that might be wishful thinking.

  • 5.

    This stuff is really cool, Matthew! I've been too busy at Applied Category Theory 2018 to dig into it. I hope I get time to do so soon. And I wish Allison Burger would help out, since she seemed to know about this stuff.

  • 6.

    Hey John,

    This is a pretty tough puzzle for me!

    I was digging a little further into this and I am no longer so sure that \(A_! \dashv \mathrm{ker}(A) \vee A^+_!\).

    This is because \(\textrm{image}(A^+) = \textrm{image}(A^\ast)\) (see here for a discussion).
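    That identity is easy to sanity-check numerically; here's a small rank-based check on a random rank-deficient matrix (illustration only):

    ```python
    import numpy as np

    rng = np.random.default_rng(3)
    A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))   # a rank-2, 5x4 matrix
    A_plus = np.linalg.pinv(A)

    def same_column_space(B1, B2):
        r1, r2 = np.linalg.matrix_rank(B1), np.linalg.matrix_rank(B2)
        return r1 == r2 == np.linalg.matrix_rank(np.hstack([B1, B2]))

    print(same_column_space(A_plus, A.conj().T))   # image(A^+) == image(A^*): expect True
    ```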

    So I am pretty stumped on this, sadly.
