Thanks to David Tanzer for helping to organise this discussion. As has
been mentioned, I'm currently a bit flaky so I would really advise
against any plans that depend upon me delivering certain results in a
certain timeframe. However, to add to the bullet points:

* I'm currently working on some software for doing sparse
linear/bilinear regression against medium-large feature vectors. I
hope to get this completed and run it against a big collection of
min/max correlations between various "measurement points" at
different temporal offsets. This is mainly exploratory, attempting to
use (bi-)linear relationships to provide some ideas for more
detailed, physically based models; as many people have observed, El
Nino behaviour is definitely not just a simple linear phenomenon.
(The kind of thing I'm thinking of is, say, that a positive
correlation between SF bay and the sea around Japan at the same time
is important, and so is a negative correlation between the areas at
some distance around the El Nino 3.4 box and the points within the
box 3 months later. This might be plausible because, say, due to
energy conservation, behaviour outside the box has to move towards
the mean as the area inside the box moves away from the mean. But a
goal is to avoid making too many assumptions and just explore the
data.) The code is being put up on github as I'm writing it, and
anyone is welcome to do anything they wish with it (especially if I
complete it).
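To make the "correlations at different temporal offsets" idea concrete, here is a minimal sketch of a lagged correlation between two measurement-point series. The function name and the synthetic data are mine, purely for illustration, not from the actual code:

```python
import numpy as np

def lagged_correlation(x, y, lag):
    """Pearson correlation between series x and series y taken `lag`
    steps later (lag=0 gives the simultaneous correlation; lag >= 0)."""
    if lag > 0:
        x, y = x[:-lag], y[lag:]
    return np.corrcoef(x, y)[0, 1]

# Hypothetical example: two measurement-point series, where b echoes a
# three steps (say, months) later, plus noise.
rng = np.random.default_rng(0)
a = rng.standard_normal(120)
b = np.roll(a, 3) + 0.5 * rng.standard_normal(120)

simultaneous = lagged_correlation(a, b, 0)  # near zero
three_month = lagged_correlation(a, b, 3)   # strongly positive
```

Run over all pairs of grid points and a range of lags, this produces exactly the kind of big feature collection the regression would be fed.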

A while back Dara asked whether this code could be used for doing
non-linear fitting, and I didn't get around to answering. To address
that: the code assumes that

1. you've got a prediction function $f$
of a multivariate parameter $p$ such that
$$f(p) = \sum_{j=1}^{K} f_j(p \cap P_j)$$
ie, the prediction can be broken into a simple sum of predictors that
depend only on some particular subdivision of $p$ into subsets $P_j$,

2. that to optimise $f_j(p \cap P_j)$ you get reasonable
results by optimising over each scalar element of the parameters in
turn for multiple cycles until there's no change. (This is true for
things like linear models, but you can imagine predictors where the
influences of the different variables are so deeply intertwined that
optimising along one co-ordinate without also simultaneously
considering the others will bounce around forever without converging.)

As such, it could be used for fitting against a _known in advance_ set
of non-linear functions, provided they aren't so non-linear that
property 2 no longer holds.
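The two assumptions above can be sketched as cyclic coordinate descent on an additive model. This is my own illustrative implementation of the general scheme (all names and the linear choice of $f_j$ are assumptions), using the linear case where one-scalar-at-a-time optimisation is known to converge:

```python
import numpy as np

def coordinate_descent(X, y, groups, n_cycles=100, tol=1e-8):
    """Least-squares fit of y ~ sum_j f_j, where each f_j is a linear
    predictor using only the columns in its own subset P_j ("groups"),
    optimised one scalar parameter at a time until nothing changes."""
    n, d = X.shape
    w = np.zeros(d)
    r = y - X @ w                        # current residual
    for _ in range(n_cycles):
        max_step = 0.0
        for P in groups:                 # each predictor f_j ...
            for k in P:                  # ... one scalar element at a time
                xk = X[:, k]
                step = (xk @ r) / (xk @ xk)   # exact 1-D minimiser
                w[k] += step
                r -= step * xk
                max_step = max(max_step, abs(step))
        if max_step < tol:               # "no change": stop cycling
            break
    return w

# Hypothetical usage, with the parameters split into two subsets:
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 4))
true_w = np.array([1.0, -2.0, 0.5, 3.0])
y = X @ true_w
w = coordinate_descent(X, y, groups=[[0, 1], [2, 3]])
```

With strongly intertwined variables (point 2 failing), the same loop would keep taking non-vanishing steps and never hit the `max_step < tol` exit.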

I really hope to finish this software, at least to the point where I
can provide some interesting plots for the blog article, hopefully further.


Also, just to note that I'm not against binary classification: for classifiers that are inherently based upon a binary decision (eg, SVMs, random hyperplane trees, etc) you really want a binary output to be trying to estimate. I'm just a little bit wary of any technique that takes a "real number-prediction model" and then makes it binary by applying some form of sigmoid function to the output (eg, a logistic function in logistic regression).
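For concreteness, this is the construction I mean (the weight, bias and input here are made-up numbers, just to show the shape of it):

```python
import math

def logistic(t):
    """Standard logistic sigmoid, mapping any real score into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-t))

# A "real number-prediction model" (here a trivial linear one) ...
w, b = 0.8, -0.2          # hypothetical fitted weight and bias
x = 1.5                   # a real-valued input
score = w * x + b         # real-valued prediction
# ... made pseudo-binary by squashing the output through a sigmoid:
prob = logistic(score)    # interpreted as P(class = 1)
```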