Scott Fleischman: Thanks for plugging my [*Compiling to categories*](http://conal.net/papers/compiling-to-categories/) paper.
Looking for additional fun and useful interpretations of functional programs is indeed one of my motivations for joining the class.

Since writing that paper, I've further developed categories of differentiable functions and have begun applying them to elegant machine learning by directly differentiating Haskell programs.
The current state of this work is described in a very recent paper [*The simple essence of automatic differentiation (Differentiable functional programming made easy)*](http://conal.net/papers/essence-of-ad).

Abstract:

> Automatic differentiation (AD) in reverse mode (RAD) is a central component of deep learning and other uses of large-scale optimization.
> Commonly used RAD algorithms such as backpropagation, however, are complex and stateful, hindering deep understanding, improvement, and parallel execution.
> This paper develops a simple, generalized AD algorithm calculated from a simple, natural specification.
> The general algorithm can be specialized by varying the representation of derivatives.
> In particular, applying well-known constructions to a naive representation yields two RAD algorithms that are far simpler than previously known.
> In contrast to commonly used RAD implementations, the algorithms defined here involve no graphs, tapes, variables, partial derivatives, or mutation.
> They are inherently parallel-friendly, correct by construction, and usable directly from an existing programming language with no need for new data types or programming style, thanks to use of an AD-agnostic compiler plugin.
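
To give a feel for the core idea in code, here's a drastically simplified sketch (not the code from the paper): a differentiable function pairs its result with its derivative at that point, with the derivative (really a linear map) shown here as a plain function. Sequential composition is then exactly the chain rule:

```haskell
import Prelude hiding (id, (.))
import Control.Category

-- Sketch only: a differentiable function yields its result together with
-- its derivative at that point; the derivative (really a linear map) is
-- shown here as a plain function.
newtype D a b = D (a -> (b, a -> b))

-- Sequential composition is exactly the chain rule.
instance Category D where
  id = D (\a -> (a, id))
  D g . D f = D (\a -> let (b, f') = f a
                           (c, g') = g b
                       in  (c, g' . f'))

-- An example primitive: squaring, whose derivative at x scales by 2 * x.
sqr :: Num a => D a a
sqr = D (\x -> (x * x, \dx -> 2 * x * dx))
```

Varying how those derivatives (linear maps) are represented is where the generality comes from, and it's what yields the reverse-mode specializations mentioned in the abstract.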



Here's an older version of the abstract, making the categorical nature more explicit:

> Automatic differentiation (AD) is often presented in two forms: forward mode and reverse mode.
> Forward mode is quite simple to implement and package via operator overloading, but is inefficient for many problems of practical interest such as deep learning and other uses of gradient-based optimization.
> Reverse mode (including its specialization, backpropagation) is much more efficient for these problems but is also typically given much more complicated explanations and implementations.
> This paper develops a very simple specification and implementation for mode-independent AD based on the vocabulary of categories.
> Although the categorical vocabulary would be awkward to write in directly, one can instead write regular Haskell programs to be converted to this vocabulary automatically (via a compiler plugin) and then interpreted as differentiable functions.
> The result is direct, exact, and efficient differentiation with no notational overhead.
> The specification and implementation are generalized considerably by parameterizing over an underlying category.
> This generalization is then easily specialized to two variations of reverse-mode AD.
> These reverse-mode implementations are much simpler than previously known and are composed from two generally useful category transformers: continuation-passing and dualization.
> All of the implementations are calculated from simple, homomorphic specifications and so are correct by construction.
> The dualized variant is suitable for gradient-based optimization and is particularly compelling in simplicity and efficiency, requiring no matrix-level representations or computations.
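
To illustrate the remark about forward mode being simple to package via operator overloading, here's the familiar dual-number trick boiled down to a minimal, toy sketch (not the paper's generalized formulation):

```haskell
-- Forward mode via operator overloading: a dual number carries a value
-- together with its derivative, and the Num instance pushes both through.
data Dual = Dual Double Double   -- value and derivative

instance Num Dual where
  Dual x x' + Dual y y' = Dual (x + y) (x' + y')
  Dual x x' * Dual y y' = Dual (x * y) (x' * y + x * y')
  negate (Dual x x')    = Dual (negate x) (negate x')
  abs    (Dual x x')    = Dual (abs x) (x' * signum x)
  signum (Dual x _)     = Dual (signum x) 0
  fromInteger n         = Dual (fromInteger n) 0

-- Differentiate a numeric function at a point by seeding the derivative with 1.
deriv :: (Dual -> Dual) -> Double -> Double
deriv f x = case f (Dual x 1) of Dual _ d -> d

-- deriv (\x -> x * x + 3 * x) 2  ==  7.0
```

As the abstract says, though, this approach is inefficient for gradient-based optimization, where the domain has very high dimension.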
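
And to hint at the dualized variant: flipping the direction of the derivative turns it into a backpropagator, and the chain rule then composes those backpropagators in the opposite order. Again a highly simplified sketch, with linear maps as plain functions:

```haskell
import Prelude hiding (id, (.))
import Control.Category

-- Reverse-mode sketch: the derivative is kept in dualized form, mapping an
-- output sensitivity back to an input sensitivity (a backpropagator).
newtype R a b = R (a -> (b, b -> a))

-- Still the chain rule, but the backpropagators compose in reverse.
instance Category R where
  id = R (\a -> (a, id))
  R g . R f = R (\a -> let (b, f') = f a
                           (c, g') = g b
                       in  (c, f' . g'))
```

For a function into R, feeding 1 into that backpropagator gives the whole gradient in a single pass, with no matrices anywhere in sight.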



Questions and comments on these papers are most welcome!