I'm trying to learn about Hessian-free optimization:

* James Martens, [Deep learning via Hessian-free optimization (2010)](http://machinelearning.wustl.edu/mlpapers/paper_files/icml2010_Martens10.pdf)

and a very literate blog post:

* Andrew Gibiansky, [Hessian-free optimization (2014)](http://andrew.gibiansky.com/blog/machine-learning/hessian-free-optimization/)