Howdy, Stranger!

It looks like you're new here. If you want to get involved, click one of these buttons!

Options

Exercises and Puzzles 6 - Chapter 1

edited May 2018

It's time for more exercises!

Monotone functions between discrete posets:

Pullbacks of partitions, with a nice picture by Frederick Eisele:

Showing that there's a category of preorders:

Showing that skeletal dagger preorders are discrete:

Showing that a certain map from a poset of partitions to the Boolean poset $$\mathbb{B}$$ is monotone:

Next, some puzzles about adjoints. For any set $$X$$ let $$P(X)$$ be the power set of set, namely the set of all subsets of $$X$$. It's easy to see that the subset relation $$\subseteq$$ makes $$P(X)$$ into a poset. Suppose we have any function between sets

$$f : X \to Y$$ This gives a function

$$f_{!} : P(X) \to P(Y)$$ sending each subset $$S \subseteq X$$ to its image under $$f$$:

$$f_{!}(S) = \{ y \in Y: \; y = f(x) \textrm{ for some } x \in S \} .$$ Puzzle 17. Show that $$f_{!} : P(X) \to P(Y)$$ is a monotone function.

Puzzle 18. Does $$f_{!}$$ always have a left adjoint? If so, describe it. If not, give an example where it doesn't, and some conditions under which it does have a left adjoint.

Puzzle 19. Does $$f_{!}$$ always have a right adjoint? If so, describe it. If not, give an example where it doesn't, and some conditions under which it does have a right adjoint.

For answers, see the discussion on Lecture 6. One of the adjoints of $$f_{!}$$ only exists when $$f$$ obeys a certain condition!

To see all the exercises in Chapter 1, go here.

• Options
1.
edited April 2018

Hey John,

I have been playing around with these questions in a particular context.

There is still something I am trying to figure out.

Specifically, I am thinking of this puzzle by Allison Burgers in the comments in lecture 6

Take any linear map $$A : V \to W$$. This can be lifted to a monotone map $$A_!$$ on the complete lattices of vector subspaces of $$V$$ and $$W$$.

I am trying to show

$$A^! = A^+_!$$ where $$A^+$$ is the Moore-Penrose pseudo-inverse. This is used all the time in linear regression.

First, $$A_!$$ has arbitrary meets and joins over lattices of subspaces. This is because $$A$$ is a linear operator. This suffices Puzzle 18 and Puzzle 19, and moreover suffices the adjoint functor theorem for posets.

Hence $$A_!$$ has a left adjoint and right adjoint.

I will focus on $$A_!$$'s right adjoint, which is $$A^!$$.

To show this is the Moore-Penrose inverse, we need to show the following axioms are obeyed (as per Wikipedia):

• $$A\circ A^+ \circ A = A$$
• $$A^+\circ A\circ A^+ = A^+$$
• $$(A\circ A^+)^* = A\circ A^+$$
• $$(A^+\circ A)^* = A^+\circ A$$

Here $$T^\ast$$ denotes the Hermitian transpose of $$T$$.

I can show a weakened version of the first two axioms, but I have no idea how to prove the second two axioms :(

Proposition: $$A_! \circ A^! \circ A_! = A_!$$

Proof.

Let $$X \sqsubseteq V$$ be a subspace of $$V$$ and $$Y \sqsubseteq W$$ be a subspace of $$W$$.

First, observe that:

$$A_!(X) \sqsubseteq A_!(X) \tag{\star}$$ Hence by right adjointness:

$$X \sqsubseteq (A^! \circ A_!) (X)$$ Since $$A_!$$ is monotone we have:

$$A_! (X) \sqsubseteq (A_! \circ A^! \circ A_!) (X) \tag{1}$$ Next, using $$(\star)$$ and $$A^!$$ monotone, we have

$$(A^! \circ A_!)(X) \sqsubseteq (A^! \circ A_!) (X)$$ Hence by right adjointness:

$$(A_! \circ A^! \circ A_!)(X) \sqsubseteq A_!(X) \tag{2}$$ (1) and (2) suffice to show, for all $$X$$, $$(A_! \circ A^! \circ A_!) (X) = A_! (X)$$. This suffices the proposition. $$\Box$$

A rather similar proof gives $$A^! \circ A_! \circ A^! = A^!$$.

I suspect that in general for any two adjoint functors $$F \vdash G$$, then $$F \circ G \circ F = F$$ and $$G \circ F \circ G = G$$.

But this is as far as I've got with this. I've got the following problems:

• I am struggling to show the identity $$A^{++} = A$$ holds. In this context this means $$A^! \dashv A_! \dashv A^!$$. This property usually only holds for inverses.

• I am also not quite sure how to recover a linear map $$T$$ where $$A^! = T_!$$.

• Finally, I have now idea how to show $$A_! \circ A^!$$ and $$A^! \circ A_!$$ are Hermite, as per the four axioms.

It is interesting to note that $$A \circ A^+$$ is a projection, and hence a monad. In fact it is the orthogonal projection of $$A$$.

Comment Source:Hey John, I have been playing around with these questions in a particular context. There is still something I am trying to figure out. Specifically, I am thinking of this puzzle by Allison Burgers in [the comments in lecture 6](https://forum.azimuthproject.org/discussion/comment/17149/#Comment_17149) Take any [linear map](https://en.wikipedia.org/wiki/Linear_map) \$$A : V \to W\$$. This can be lifted to a monotone map \$$A_!\$$ on the [complete lattices of vector subspaces](https://en.wikipedia.org/wiki/Linear_subspace#Lattice_of_subspaces) of \$$V\$$ and \$$W\$$. I am trying to show $$A^! = A^+_!$$ where \$$A^+\$$ is the [Moore-Penrose pseudo-inverse](https://en.wikipedia.org/wiki/Moore%E2%80%93Penrose_inverse). This is used all the time in [linear regression](https://en.wikipedia.org/wiki/Linear_regression). ------------------------------------------------ First, \$$A_!\$$ has arbitrary meets and joins over lattices of subspaces. This is because \$$A\$$ is a linear operator. This suffices **Puzzle 18** and **Puzzle 19**, and moreover suffices [*the adjoint functor theorem for posets*](https://forum.azimuthproject.org/discussion/2031/lecture-16-chapter-1-the-adjoint-functor-theorem-for-posets). Hence \$$A_!\$$ has a left adjoint and right adjoint. I will focus on \$$A_!\$$'s right adjoint, which is \$$A^!\$$. To show this is the Moore-Penrose inverse, we need to show the following axioms are obeyed (as per [Wikipedia](https://en.wikipedia.org/wiki/Moore%E2%80%93Penrose_inverse#Definition)): - \$$A\circ A^+ \circ A = A\$$ - \$$A^+\circ A\circ A^+ = A^+\$$ - \$$(A\circ A^+)^* = A\circ A^+\$$ - \$$(A^+\circ A)^* = A^+\circ A\$$ Here \$$T^\ast\$$ denotes the [Hermitian transpose](https://en.wikipedia.org/wiki/Hermitian_matrix) of \$$T\$$. **I can show a *weakened* version of the first two axioms, but I have no idea how to prove the second two axioms** :( *Proposition*: \$$A_! \circ A^! \circ A_! = A_!\$$ **Proof.** Let \$$X \sqsubseteq V\$$ be a subspace of \$$V\$$ and \$$Y \sqsubseteq W\$$ be a subspace of \$$W\$$. First, observe that: $$A_!(X) \sqsubseteq A_!(X) \tag{\star}$$ Hence by right adjointness: $$X \sqsubseteq (A^! \circ A_!) (X)$$ Since \$$A_!\$$ is monotone we have: $$A_! (X) \sqsubseteq (A_! \circ A^! \circ A_!) (X) \tag{1}$$ Next, using \$$(\star)\$$ and \$$A^!\$$ monotone, we have $$(A^! \circ A_!)(X) \sqsubseteq (A^! \circ A_!) (X)$$ Hence by right adjointness: $$(A_! \circ A^! \circ A_!)(X) \sqsubseteq A_!(X) \tag{2}$$ (1) and (2) suffice to show, for all \$$X\$$, \$$(A_! \circ A^! \circ A_!) (X) = A_! (X) \$$. This suffices the proposition. \$$\Box\$$ A rather similar proof gives \$$A^! \circ A_! \circ A^! = A^!\$$. I suspect that in general for *any* two adjoint functors \$$F \vdash G\$$, then \$$F \circ G \circ F = F\$$ and \$$G \circ F \circ G = G\$$. But this is as far as I've got with this. I've got the following problems: - I am struggling to show the identity [\$$A^{++} = A\$$](https://en.wikipedia.org/wiki/Moore%E2%80%93Penrose_inverse#Basic_properties) holds. In this context this means \$$A^! \dashv A_! \dashv A^!\$$. This property *usually* only holds for inverses. - I am also not quite sure how to recover a linear map \$$T\$$ where \$$A^! = T_!\$$. - Finally, I have now idea how to show \$$A_! \circ A^!\$$ and \$$A^! \circ A_!\$$ are Hermite, as per the four axioms. It is interesting to note that \$$A \circ A^+\$$ is a [projection](https://en.wikipedia.org/wiki/Projection_(linear_algebra)), and hence a monad. In fact it is the [orthogonal projection of \$$A\$$](http://www.math.lsa.umich.edu/~speyer/417/OrthoProj.pdf).
• Options
2.
edited April 2018

Matthew - I'm glad you've been working on Allison Burger's puzzles! I found them interesting but hadn't made much progress on them. I've contacted her... maybe she can help us out.

I've never understood these pseudo-inverses; if I could understand them using adjoints that would make me very happy.

Comment Source:Matthew - I'm glad you've been working on [Allison Burger's puzzles](https://forum.azimuthproject.org/discussion/comment/17149/#Comment_17149)! I found them interesting but hadn't made much progress on them. I've contacted her... maybe she can help us out. I've never understood these pseudo-inverses; if I could understand them using adjoints that would make me very happy.
• Options
3.
edited May 2018

I've never understood these pseudo-inverses; if I could understand them using adjoints that would make me very happy.

I've seem them crop up a bunch of places in statistics and machine learning.

It's a nice exercise to revisit its uses, so I've made some notes.

Suppose I have a linear model represented by a linear map $$A$$ and some measurements $$\vec{y}$$. I want to find an input $$\vec{\beta}$$ such that

$$A \vec{\beta} \approx \vec{y}$$ Suppose that $$A$$ can be represented by a matrix $$M \in \mathbb{R}^{m \times n}$$ where $$m > n$$. Then we can naturally think of $$A$$ as an overdetermined set of linear equations.

Setting

$$\hat{\beta} = A^+ y$$ Where $$A^+$$ is the Moore-Penrose pseudo inverse of $$A$$ yields the Ordinary Least Squares (OLS) solution to $$A \vec{\beta} \approx y$$.

That is to say:

$$\hat{\beta} = \underset{\vec{b}}{\mathrm{argmin}}\, || y - A \vec{b} ||$$ If $$A^\ast A$$ has full rank then:

$$A^+ = (A^\ast A)^{-1} A^\ast$$ Otherwise you can:

According to this stack exchange, this is slower but more numerically stable than QR decomposition.

I've also seen the pseudo-inverse kicking around elsewhere:

• In statistics and convex optimization, $$A^+ y$$ is an L2-norm minimizer. In machine learning this is important. Compared to minimizers of other norms, such as L1, these $$A^+ y$$ are less sparse. Lots of exotic regression analyses like LASSO involve tampering with $$A^+ y$$ somehow. In statistics, it is brazenly assumed that $$A^\ast A$$ has full rank all the time.

• Assuming Gaussian distributed error terms, $$A^+ y$$ is a maximum likelihood estimate $$\hat{E}\, [\beta]$$. This is a popular theorem in statistics classes. Here are Cosma Shalizi's notes from his machine learning course at CMU.

• Finally, they are used extensively in Kalman filters. In particular, I've seen them in robotics and aerospace applications. Suppose I have an internal state $$X$$ I am uncertain of, some noisy sensor data $$Y_t$$ indexed by time steps, and a linear transformation $$A_t$$ relating $$X$$ to $$Y$$ at time $$t$$. Kalman filters model statistical uncertainty of internal state as a time step dependent higher dimensional Gaussian. Concretely, this is reflected in a vector $$\hat{X}_t$$ of maximum likelihood estimates and a covariance matrix $$P$$ of uncertainties. As measurements come in, this model changes following Bayesian updating. In particular:

$$\hat{X}_{t+1} = \hat{E}\, [ X_t | Y_t ]$$ These notes by Prof. Robert Lipster of Tel Aviv University do a marvelous job of relating Kalman filter updating to the Moore-Penrose pseudo-inverse matrix. In particular, we have the following recurrence for $$\hat{X}_t$$ and $$P_t$$: \begin{align} \hat{X}_{t+1} & = \hat{X}_t + P_t A_t^\ast (A_t P_t A_t^\ast)^+ (Y_t - A_t \hat{X}_t) \\ P_{t+1} & = P_t - P_t A_t^\ast (A_t P_t A_t^\ast)^+ A_t P_t \end{align}
Comment Source:> I've never understood these pseudo-inverses; if I could understand them using adjoints that would make me very happy. I've seem them crop up a bunch of places in statistics and machine learning. It's a nice exercise to revisit its uses, so I've made some notes. Suppose I have a linear model represented by a linear map \$$A\$$ and some measurements \$$\vec{y}\$$. I want to find an input \$$\vec{\beta}\$$ such that $$A \vec{\beta} \approx \vec{y}$$ Suppose that \$$A\$$ can be represented by a matrix \$$M \in \mathbb{R}^{m \times n}\$$ where \$$m > n\$$. Then we can naturally think of \$$A\$$ as an *overdetermined* set of linear equations. Setting $$\hat{\beta} = A^+ y$$ Where \$$A^+\$$ is the Moore-Penrose pseudo inverse of \$$A\$$ yields the [Ordinary Least Squares (OLS)](https://en.wikipedia.org/wiki/Ordinary_least_squares) solution to \$$A \vec{\beta} \approx y\$$. That is to say: $$\hat{\beta} = \underset{\vec{b}}{\mathrm{argmin}}\, || y - A \vec{b} ||$$ If \$$A^\ast A\$$ has [full rank](https://en.wikipedia.org/wiki/Rank_(linear_algebra)) then: $$A^+ = (A^\ast A)^{-1} A^\ast$$ Otherwise you can: - Use [QR decomposition](https://en.wikipedia.org/wiki/QR_decomposition) - Use [*singular value decomposition*](https://en.wikipedia.org/wiki/Singular-value_decomposition). Joel Burke goes over this in his excellent [notes](http://robotics.caltech.edu/~jwb/courses/ME115/handouts/pseudo.pdf) for his Caltech class. According to [this stack exchange](https://stats.stackexchange.com/a/1882), this is slower but more numerically stable than QR decomposition. I've also seen the pseudo-inverse kicking around elsewhere: - In statistics and convex optimization, \$$A^+ y\$$ is an [L2-norm](https://en.wikipedia.org/wiki/Lp_space#Statistics) minimizer. In machine learning this is important. Compared to minimizers of other norms, such as L1, these \$$A^+ y\$$ are less sparse. Lots of exotic regression analyses like [LASSO](https://en.wikipedia.org/wiki/Lasso_(statistics)#Orthonormal_covariates) involve tampering with \$$A^+ y\$$ somehow. In statistics, it is brazenly assumed that \$$A^\ast A\$$ has full rank all the time. - Assuming Gaussian distributed error terms, \$$A^+ y\$$ is a maximum likelihood estimate \$$\hat{E}\, [\beta]\$$. This is a popular theorem in statistics classes. Here are Cosma Shalizi's notes from his [machine learning course](http://www.stat.cmu.edu/~cshalizi/mreg/15/lectures/06/lecture-06.pdf) at CMU. - Finally, they are used extensively in Kalman filters. In particular, I've seen them in robotics and aerospace applications. Suppose I have an internal state \$$X\$$ I am uncertain of, some noisy sensor data \$$Y_t\$$ indexed by time steps, and a linear transformation \$$A_t\$$ relating \$$X\$$ to \$$Y\$$ at time \$$t\$$. Kalman filters model statistical uncertainty of internal state as a time step dependent higher dimensional Gaussian. Concretely, this is reflected in a vector \$$\hat{X}_t\$$ of maximum likelihood estimates and a covariance matrix \$$P\$$ of uncertainties. As measurements come in, this model changes following [Bayesian updating](https://en.wikipedia.org/wiki/Bayesian_inference). In particular: $$\hat{X}_{t+1} = \hat{E}\, [ X_t | Y_t ]$$ [These notes](https://www.eng.tau.ac.il/~liptser/lectures/lect_new6.pdf) by Prof. Robert Lipster of Tel Aviv University do a marvelous job of relating Kalman filter updating to the Moore-Penrose pseudo-inverse matrix. In particular, we have the following recurrence for \$$\hat{X}_t\$$ and \$$P_t\$$: \begin{align} \hat{X}_{t+1} & = \hat{X}_t + P_t A_t^\ast (A_t P_t A_t^\ast)^+ (Y_t - A_t \hat{X}_t) \\ P_{t+1} & = P_t - P_t A_t^\ast (A_t P_t A_t^\ast)^+ A_t P_t \end{align}
• Options
4.
edited May 2018

Alright, I am ready to reverse my previous position that $$A^+$$ is the right/left adjoint of $$A$$.

In fact, I believe I have found a theorem which makes things a little clearer:

Theorem. Let $$A : V \to W$$ be a linear map.

1. If $$A$$ is nonsingular, then $$A^{-1}_! \dashv A_! \dashv A^{-1}_!$$ on the category of lattices of subspaces
2. If $$A$$ is singular, then it is impossible for the image of a linear map to be either left or right adjoint to $$A_!$$

Proof. To prove part 1, observe that $$A$$ being nonsingular implies it is invertible. We saw in Puzzle 11 that this means that $$A^{-1}_!$$ is both a left and right adjoint.

Next, to prove part 2, assume that $$A$$ is singular. Then the kernel of $$A$$ is nonempty. We write this as $$\mathrm{ker}(A) \neq \varnothing$$. Now assume towards a contradiction there is some linear map $$R$$ that where $$A_! \dashv R_!$$. By the characterization of Galois connections I mention in lecture 5, this implies $$X \subseteq (R_! \circ A_!) (X)$$ for all subspaces of $$X \subseteq V$$. This means in particular that $$V \subseteq (R_! \circ A_!) (V)$$. Since $$R_! \circ A_! : V \to V$$, this means $$V = (R_! \circ A_!) (V)$$. However, we know that $$R_! \circ A_! = (R \circ A)_!$$. It must be that $$\mathrm{ker}(R \circ A) = \varnothing$$. But we know $$\mathrm{ker}(A) \subseteq \mathrm{ker}(R \circ A)$$ and $$\mathrm{ker}(A) \neq \varnothing$$ by hypothesis, so $$\mathrm{ker}(R \circ A) \neq \varnothing$$. But then $$\mathrm{ker}(R \circ A) \neq \varnothing$$ and $$\mathrm{ker}(R \circ A) = \varnothing$$ simultaneously. ⚡

The proof that $$A$$ cannot have a left adjoint if it is singular is similar. $$\Box$$

I am now starting to suspect

$$A_! \dashv \mathrm{ker}(A) \vee A^+_!$$ But that might be wishful thinking.

Comment Source:Alright, I am ready to reverse my previous position that \$$A^+\$$ is the right/left adjoint of \$$A\$$. In fact, I believe I have found a theorem which makes things a little clearer: > **Theorem.** Let \$$A : V \to W \$$ be a linear map. > > 1. If \$$A\$$ is [nonsingular](http://mathworld.wolfram.com/NonsingularMatrix.html), then $$A^{-1}_! \dashv A_! \dashv A^{-1}_!$$ on the category of lattices of subspaces > 2. If \$$A\$$ is singular, then it is impossible for the image of a linear map to be either left or right adjoint to \$$A_!\$$ **Proof.** To prove part 1, observe that \$$A\$$ being nonsingular implies it is invertible. We saw in [Puzzle 11](https://forum.azimuthproject.org/discussion/2055/exercises-and-puzzles-5-chapter-1) that this means that \$$A^{-1}_!\$$ is both a left and right adjoint. Next, to prove part 2, assume that \$$A\$$ is singular. Then the [kernel](https://en.wikipedia.org/wiki/Kernel_(linear_algebra)) of \$$A\$$ is nonempty. We write this as \$$\mathrm{ker}(A) \neq \varnothing\$$. Now assume towards a contradiction there is some linear map \$$R\$$ that where \$$A_! \dashv R_!\$$. By the characterization of Galois connections I mention in [lecture 5](https://forum.azimuthproject.org/discussion/comment/16506/#Comment_16506), this implies \$$X \subseteq (R_! \circ A_!) (X) \$$ for all subspaces of \$$X \subseteq V\$$. This means in particular that \$$V \subseteq (R_! \circ A_!) (V)\$$. Since \$$R_! \circ A_! : V \to V\$$, this means \$$V = (R_! \circ A_!) (V)\$$. However, we know that \$$R_! \circ A_! = (R \circ A)_!\$$. It must be that \$$\mathrm{ker}(R \circ A) = \varnothing\$$. But we know \$$\mathrm{ker}(A) \subseteq \mathrm{ker}(R \circ A)\$$ and \$$\mathrm{ker}(A) \neq \varnothing\$$ by hypothesis, so \$$\mathrm{ker}(R \circ A) \neq \varnothing\$$. But then \$$\mathrm{ker}(R \circ A) \neq \varnothing\$$ and \$$\mathrm{ker}(R \circ A) = \varnothing\$$ simultaneously. ⚡ The proof that \$$A\$$ cannot have a left adjoint if it is singular is similar. \$$\Box\$$ I am now starting to suspect $$A_! \dashv \mathrm{ker}(A) \vee A^+_!$$ But that might be wishful thinking.
• Options
5.

This stuff is really cool, Matthew! I've been too busy at Applied Category Theory 2018 to dig into it. I hope I get time to do so soon. And I wish Allison Burger would help out, since she seemed to know about this stuff.

Comment Source:This stuff is really cool, Matthew! I've been too busy at Applied Category Theory 2018 to dig into it. I hope I get time to do so soon. And I wish Allison Burger would help out, since she seemed to know about this stuff.
• Options
6.
edited May 2018

Hey John,

This is a pretty tough puzzle for me!

I was digging a little further into this and I am no longer so sure that $$A_! \dashv \mathrm{ker}(A) \vee A^+_!$$.

This is because $$\textrm{image}(A^+) = \textrm{image}(A^\ast)$$ (see here for a discussion).

So I am pretty stumped on this, sadly.

Comment Source:Hey John, This is a pretty tough puzzle for me! I was digging a little further into this and I am no longer so sure that \$$A_! \dashv \mathrm{ker}(A) \vee A^+_!\$$. This is because \$$\textrm{image}(A^+) = \textrm{image}(A^\ast)\$$ (see [here](https://en.wikipedia.org/wiki/Proofs_involving_the_Moore%E2%80%93Penrose_inverse#Projectors_and_subspaces) for a discussion). So I am pretty stumped on this, sadly.