Okay so in Haskell and other compile-time* typed languages each expresion has a type. This lets the compiler catch a lot of problems before the program is run. It turns out that just as you can describe a set theory as a category, you can also do so with a type theory. If you do this to Haskel you get a* category called Hask with types as objects, and functions as morphisms.

* there are actually a lot of very similar but not identical categories that are all called Hask, /basically/ depending on how fine grained you are being, and if you aren't very careful you will get something that isn't quite always associative. This usually doesn't matter. (Its a very engineering, not mathematics, thing. Like how physicists will often ignore subtleties of analysis.)

Using the slightly oddly named 'data' keyword can define new types, by defining how to construct the type, and how to construct its members. For example:

data Maybe a :: Type where
| Just :: a -> Maybe a
| Nothing :: Maybe a

is one way to define the Maybe type constructor.
* \\(a :: b\\) means 'a has type b'.**
** Yes or has _kind_ b, people who know haskell).

This means that given a type, say, Integer, then Maybe Integer is also a type, and you make its members in two ways, by applying Just to an Integer or just using Nothing by itself.*

* I am using very verbose syntax that isn't in Haskell2010 but is handled by the defacto haskell compiler GHC, and wouldn't normally be used with something this simple.)

Now for operations/functions which use the same notation but mean different things for different types, like say "+", haskell has Type Classes, which are neither types nor classes, but rather a way to go from types to values. They are defined on type constructors, so while the typeclass that defines "+" (Num) has instances on types, things like Ratio, Integer, or Maybe Integer, the ones I am going to talk about have instances on type constructors like "Maybe". Those that take one type and produce another.

Now two of these typeclasses have names that might be familiar "Functor" and "Monad". These don't describe a generic functor or monad, but rather elaborate a function on types to types into a functor from Hask to Hask. In Haskell the instances for Maybe look like:

instance Functor Maybe where
fmap f (Just x) = Just (f x)
fmap Nothing = Nothing

"fmap" gives the mapping on morphisms, that is expresions of type A -> B, and produces an expresion of type "Maybe A -> Maybe B" and indeed the Functor instance equips the function on types "Maybe" into an endofunctor. (You can check the functor laws.) (f is the function)

instance Functor => Monad Maybe where
return = Just
(Just x) >>= fm = fm x
(Nothing) >>= fm = Nothing

Where (>>=) has type "Maybe a -> (a -> Maybe b) -> Maybe b"
Natural transformations on endofunctors of Hask end up being parametric functions, (and vice versa). So, "return" is the natural transformation from id to M, (M = Maybe here). And ">>=" (aka bind), gives the semantics of the Keisly category. In the paper they use "extend" which is just a bind that takes the function first (i.e. has type "(a -> M b) -> (M a -> M b)"* ) which is easier to describe in category theory terms. (and avoids the problem of not even locally small categories that defining the monad using the multiplication ("M (M a) -> M a" in haskell types).

My second comment is that in a type theory the size issues are kind of trivial. Either like Hask you just define Type :: Type, which sure lets in a paradox, but actually isn't quite as bad as it would be normally. Hask still works as a programing language type theory just fine, even though its inconsistent. (in at least two other was as well.)

Or you do like actual theorem provers and have an infinite hierarchy, where \\(Type_i :: Type_{i+1}\\), and then Cat can be defined in a level agnostic way, and the pre-sheaf monad is defined to take types in \\(Type_n\\) and produce things in \\(Type_{n+1}\\) and then you can define the multiplication as going down levels or as going to the lifting of "M a" into the higher level. And while technically these definitions live in \\(Type_{\omega}\\), and say the category of level agnostic types would have to live in \\(Type_{\omega + 1}) that Cat is fine too.