A week or so ago an interesting paper came out that uses Tropical Geometry to help understand deep neural networks with ReLU activations. The authors show [here](https://arxiv.org/pdf/1805.07091.pdf) this clean and elegant characterization:

> the family of functions represented by feedforward neural networks with rectified linear units and integer weights is exactly the family of tropical rational maps.
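To unpack the terms in that statement, here is a minimal sketch of the tropical dictionary in the max-plus convention; the worked one-variable example is my own, added only for illustration.

```latex
% Tropical (max-plus) semiring: "sum" is max, "product" is ordinary addition.
x \oplus y = \max(x, y), \qquad x \odot y = x + y

% A tropical monomial c \odot x^{\odot a} is the affine function ax + c,
% so a tropical polynomial is a max of finitely many affine functions:
f(x) = \bigoplus_{i} c_i \odot x^{\odot a_i} = \max_i \,(a_i x + c_i)

% A tropical rational map is a tropical quotient, i.e. an ordinary
% difference, of two tropical polynomials:
(f \oslash g)(x) = f(x) - g(x)

% ReLU itself is already tropical:
\mathrm{ReLU}(x) = \max(x, 0) = x \oplus 0
```

With integer exponents $a_i$, this matches the integer-weights condition in the quoted statement.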

As you know, neural nets have had a transformative effect on industrial applications, pushing state-of-the-art boundaries in several machine learning problems and crossing the line separating curiosity from real utility in cases such as speech recognition, to name one.

A classic neural net layer is a map between vector spaces consisting of an affine transformation given by free parameters (called weights) followed by a single, fixed-in-advance, element-wise nonlinearity. A net is a composition of layers. Because the nonlinearity acts element-wise, it commutes with coordinate permutations but not with general linear maps or translations, so the layers compose nontrivially. The net is used to approximate an unknown map of which one only knows how it acts on a given set of points (the training set), and the goal is that the approximation behaves similarly on previously unseen points (the test set). This is done by seeking the best combination of weights.
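As a concrete illustration, here is a minimal NumPy sketch of such a net: two layers, each an affine map followed by an element-wise ReLU. The shapes and random weights are placeholders of my own choosing, not anything from the paper.

```python
import numpy as np

def relu(x):
    # Element-wise nonlinearity: zero for negative inputs, identity otherwise.
    return np.maximum(x, 0.0)

def layer(x, W, b):
    # Affine transformation (weights W, bias b) followed by the fixed nonlinearity.
    return relu(W @ x + b)

# A net is just a composition of layers; the sizes here are arbitrary.
rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((4, 3)), rng.standard_normal(4)
W2, b2 = rng.standard_normal((2, 4)), rng.standard_normal(2)

def net(x):
    return layer(layer(x, W1, b1), W2, b2)

print(net(np.array([1.0, -2.0, 0.5])))  # maps R^3 -> R^2
```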

As it happens, the exact nature of the chosen nonlinearity depends on pragmatic and technical factors, as in this discussion among [practitioners](https://www.reddit.com/r/MachineLearning/comments/73rtd7/d_do_you_still_use_relu_if_so_why/). While traditionally a sigmoidal function (an exponential with saturation) was used by default, other options have arisen, and ReLU is a very simple one. The function is the double integral of the Dirac pulse: identically 0 until it starts to grow at constant speed. While the sigmoid is a transcendental function, the ReLU is much cheaper to compute, so using it reduces training times, and it is one of the things that has been tried successfully in several tasks. Despite the practical success, there is consensus about the lack of understanding of **why** neural nets generalize so well to unseen data in practical terms, so this paper is a good stab at thinking about what happens.
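To make the cost comparison concrete, here is a small timing sketch contrasting the two nonlinearities; the array size and repetition count are arbitrary choices of mine and the exact numbers will vary by machine, but the point stands that ReLU needs only a comparison while the sigmoid needs an exponential per entry.

```python
import timeit
import numpy as np

x = np.random.default_rng(1).standard_normal(1_000_000)

def sigmoid(x):
    # Transcendental: requires evaluating exp for every entry.
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    # Piecewise linear: a single element-wise comparison, no exp.
    return np.maximum(x, 0.0)

print("sigmoid:", timeit.timeit(lambda: sigmoid(x), number=100))
print("relu:   ", timeit.timeit(lambda: relu(x), number=100))
```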