>Here in Lithuania, Šarūnas Raudys won a national science award for his work on classifiers in statistics and neural networks. He wrote a book, [Statistical and Neural Classifiers: An Integrated Approach to Design](https://www.amazon.com.au/Statistical-Neural-Classifiers-Integrated-Approach/dp/1852332972).

Thanks for the pointer to the book of Š. Raudys. Machine learning is something that interests me, but it is not a field I currently work in. However, that work does look like a very good piece of scholarship on pattern recognition, and it does make use of local measures of information, such as the Fisher information functional. If I manage to find the time, maybe I'll borrow a copy from a university library at some point (after the present chaos is over, of course) and have a bit of a read.
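
For the casual reader, and as a point of reference: the Fisher information of a parametric family $p(x;\theta)$ is the standard such local measure,

$$I(\theta) \;=\; \mathbb{E}_{X \sim p(\cdot\,;\theta)}\!\left[\left(\frac{\partial}{\partial\theta}\,\log p(X;\theta)\right)^{2}\right],$$

i.e. the variance of the score, quantifying how sharply the likelihood distinguishes nearby parameter values.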

>He describes the following hierarchy of classifiers ... His main conclusion is that it is very important, in training neural networks, that they not overlearn. Which is to say, there always needs to be some noise in the training so that they stay open to new types of learning, which can be more sophisticated, as per this hierarchy.

I haven't heard of this sort of idea before, but it sounds interesting, and potentially natural from the point of view of machine learning. Certainly when building convolutional neural networks for image recognition, the lower layers learn lines and simple shapes first, before the network naturally abstracts to more complicated features, depending on the training data (as in the sketch below). Again, I'm not an expert, but I've had the privilege of rubbing shoulders with a number of people at my office trained in this sort of thing, so I've learned a little through osmosis, so to speak.
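
To make that layered abstraction concrete, here is a minimal PyTorch sketch of my own (a toy illustration, not anything from Raudys's book; the layer roles in the comments reflect the usual empirical picture, and the dropout is a nod to your point about injecting noise so the network does not overlearn):

```python
import torch
import torch.nn as nn

class TinyConvNet(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1),   # early layer: tends to learn edge/line detectors
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(8, 16, kernel_size=3, padding=1),  # deeper layer: composes these into shapes/parts
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # A little injected noise (dropout) to discourage overlearning.
        self.dropout = nn.Dropout(0.1)
        self.classifier = nn.Linear(16 * 7 * 7, num_classes)  # assumes 28x28 inputs

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        x = self.dropout(x.flatten(1))
        return self.classifier(x)

# Quick smoke test on a random 28x28 "image".
model = TinyConvNet()
print(model(torch.randn(1, 1, 28, 28)).shape)  # -> torch.Size([1, 10])
```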

>I was intrigued that his hierarchy might relate to the building blocks of my philosophy, which are cognitive frameworks that I call "divisions of everything". I write about them in my presentation, [Time and Space as Representations of Decision-Making](http://www.ms.lt/sodas/Book/20170929TimeSpaceDecisionMaking). My thought was that these classifiers might be distinguished by an increasing number of perspectives, from one to seven.

Thanks for sharing your earlier work. I've had a quick glance through what you've written.

In terms of a sevenfold hierarchy of classifiers, I think there might be a relationship with the idea of Lens Categories. I am reminded of a conversation I had with someone (a Professor of Biomedical Engineering) in the States in early 2019. His question was really an optics question, but nonetheless there does seem to be a potential link here. After a bit of a chat, I thought that what he mentioned seemed related to this [1979 paper about the elliptic umbilic diffraction catastrophe](https://www.researchgate.net/publication/252379656_The_Elliptic_Umbilic_Diffraction_Catastrophe). A more modern variant, establishing an interesting connection with Airy rings, can be found [in this paper](https://www.researchgate.net/publication/231110105_From_Airy_rings_to_the_elliptic_umbilic_diffraction_catastrophe). In particular, there seems to be some tie-in between categories that have a lens-space topology, optics, and catastrophe theory. I have not explored these connections very thoroughly.
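
For reference, and modulo sign and normalisation conventions (which vary between authors): the elliptic umbilic is one of Thom's elementary catastrophes, with germ $x^3 - 3xy^2$ and a three-parameter unfolding, and the diffraction catastrophe studied in the papers above is, up to scalings, the oscillatory integral

$$\Psi(u,v,w) \;\propto\; \iint \exp\!\left[\,i\left(x^{3} - 3xy^{2} + w\,(x^{2}+y^{2}) + u\,x + v\,y\right)\right] dx\, dy.$$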

Nonetheless, I think this is a bit tangential to your interests, and maybe totally off-track.

>In recent years, I have been looking for similar frameworks in mathematics, and thus have been studying Bott periodicity and Clifford algebras [...]

Yes, so as mentioned, I had a look at your work. A number of questions and observations come to mind.

One set of questions is: What is significant about a hierarchy of classifiers? How would this generalise to other problems apart from pattern recognition?

Another would be: Why seven? Where does this number come from? Can we construct this from a more primitive framework in terms of which a hierarchy of classifiers and their multiplicity are emergent properties?
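
One possibly relevant fact here, given that you mention Bott periodicity and Clifford algebras: both are periodic with period eight,

$$\mathrm{Cl}_{n+8}(\mathbb{R}) \;\cong\; \mathrm{Cl}_{n}(\mathbb{R}) \otimes \mathbb{R}(16), \qquad \pi_{k+8}(O) \;\cong\; \pi_{k}(O),$$

where $\mathbb{R}(16)$ denotes the algebra of $16 \times 16$ real matrices. This is purely speculative on my part, but if your sevenfold hierarchy were related to this eightfold periodicity (say, as eight levels with one trivial), that might be one place to dig.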

One observation is that your work might benefit a little from asking yourself questions like: can this be made simpler? How can I express this more mathematically? Can I pare away unnecessary words and concepts here? Which ideas are essential and which are just labels?

Also: Can I rigorously prove any of the statements in my programme? If I wanted to prove any of these statements, how would I go about that?

>[Hierarchy of classifiers]

I hope you don't mind all of the questions I have posed above. Hopefully they will help to sharpen your ideas.

Nonetheless, I have to admit I am drawn to this emergent sevenfold structure in the work of Raudys, and am intrigued in particular by two of the questions I asked above:

* Why seven? Where does this number come from?
* Can we construct this from a more primitive framework in terms of which a hierarchy of classifiers and their multiplicity are emergent properties?

There is certainly an intriguing mystery here. However, a blunt attack on understanding this sort of thing might not succeed; one would need to find an oblique way to dig a bit deeper into some of the structure here.