Hi all, I did some work a number of years ago training a 2-stage classification/regression model to distinguish irrigated crops in northwestern India during the Rabi (dry) season. We used logistic regression in the first stage to basically throw out everything that was obviously not irrigated (deserts, bodies of water, cities, etc.). The second stage assigned a number between 0 and 1 that was meant to roughly reflect the fraction of a pixel that was irrigated; it learned to distinguish forest from farmland and so on, trained on labels I generated by staring at lots of time-stamped high-resolution satellite images.
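For concreteness, the two-stage structure looked roughly like the sketch below. This is written in modern scikit-learn terms with placeholder feature, label, and model choices (the gradient-boosted regressor in particular is just an illustrative stand-in), so it shows the shape of the pipeline rather than the exact code we ran:

    # Rough sketch of a two-stage irrigated-fraction pipeline.
    # Feature/label names and model choices are placeholders, not the originals.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.ensemble import GradientBoostingRegressor

    # X: per-pixel feature matrix (n_pixels, n_features) built from NDVI, rainfall, slope
    # y_mask: 0/1 labels, "obviously not irrigated" vs. "possibly irrigated"
    # y_frac: hand-labelled irrigated fraction in [0, 1]

    def fit_two_stage(X, y_mask, y_frac):
        # Stage 1: logistic regression screens out deserts, water, cities, etc.
        stage1 = LogisticRegression(max_iter=1000).fit(X, y_mask)
        keep = stage1.predict(X) == 1

        # Stage 2: regression on the surviving pixels, targets in [0, 1]
        stage2 = GradientBoostingRegressor().fit(X[keep], y_frac[keep])
        return stage1, stage2

    def predict_two_stage(stage1, stage2, X):
        keep = stage1.predict(X) == 1
        frac = np.zeros(len(X))
        frac[keep] = np.clip(stage2.predict(X[keep]), 0.0, 1.0)
        return frac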

Our raw datasets were a bunch of MODIS NDVI measurements at 1 km resolution, reanalysis rainfall data, and elevation data from which we computed a local gradient (it's harder to grow stuff on hillsides). We did some wavelet-based feature extraction and ended up with a handful of predictors on which we built the model.
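The feature extraction was along these lines, shown here as a sketch using PyWavelets and NumPy. The wavelet choice, decomposition level, and per-band summary statistics below are illustrative assumptions, not necessarily what we actually used:

    # Sketch of wavelet features from an NDVI time series plus slope from a DEM.
    # Wavelet, level, and summary statistics are illustrative choices.
    import numpy as np
    import pywt

    def ndvi_wavelet_features(ndvi_series, wavelet="db4", level=3):
        """Collapse one pixel's NDVI time series into a handful of predictors."""
        coeffs = pywt.wavedec(ndvi_series, wavelet, level=level)
        # Energy per band: irrigated crops tend to show strong seasonal structure.
        return np.array([np.sum(c ** 2) for c in coeffs])

    def slope_from_elevation(dem, cell_size_m=1000.0):
        """Local gradient magnitude of a DEM (harder to grow stuff on hillsides)."""
        dz_dy, dz_dx = np.gradient(dem, cell_size_m)
        return np.sqrt(dz_dx ** 2 + dz_dy ** 2)

The point of the wavelet step is just to compress each pixel's noisy seasonal NDVI curve into a few numbers that capture its temporal structure, which then sit alongside rainfall and slope as predictors.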

This was all really cool stuff, and we ended up with a little time-series video of the changes in the extent of irrigated agriculture over 10 years in NW India, including the Indus Valley. All of this was related to monitoring the effects of an electricity management program (Jyotigram) implemented in India to discourage illegal groundwater pumping while increasing the reliability of household electricity.

Anyways, it's been quite a while since I've worked on that kind of stuff, and I certainly haven't thought much about how category theory might play a role. I can see a few general aspects of this type of problem that folks are thinking about using categories for: heterogeneous data integration using sheaves (Michael Robinson, Cliff Joslyn, Emilie Purvine), and the machine learning paper by Spivak, Fong, and Tuyeras that John pointed to. There is clearly a lot more to be done on both those fronts, particularly the machine learning front.

I'm curious, though: what types of problems were you hoping categories might help you tackle?