David wrote:

> I’m very interested if anyone has any ideas of a non-binary statistic that could be used instead as a prediction variable. I suppose one possibility that could be used is the count of months within the next 5 for which the El Nino 3.4 is above a threshold […]
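To make David's suggestion concrete, here is a minimal Python sketch of the count statistic. The function name `months_above_threshold`, the default threshold of 0.5 °C (a common El Niño cutoff for Nino 3.4 anomalies) and the toy series are my own assumptions, not anything David specified.

```python
from typing import Sequence


def months_above_threshold(nino34: Sequence[float], t: int,
                           horizon: int = 5, threshold: float = 0.5) -> int:
    """Count months t+1 .. t+horizon whose Nino 3.4 anomaly exceeds threshold.

    The result is an integer from 0 to horizon, so it carries more
    information than a yes/no "is there an El Nino?" flag.
    Hypothetical helper; the default threshold is an assumption.
    """
    window = nino34[t + 1 : t + 1 + horizon]
    if len(window) < horizon:
        raise ValueError("not enough future months in the series")
    return sum(1 for x in window if x > threshold)


# Toy monthly anomalies in degrees C: made-up numbers, not real data.
series = [0.1, 0.3, 0.6, 0.8, 0.4, 0.9, 1.1, 0.2]
print(months_above_threshold(series, t=1))  # months 2..6 -> prints 4
```

A binary target is just the special case "count ≥ 1" (or "count = horizon"), which shows how much information the yes/no version throws away.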

Nathan wrote:

> Why not the Nino 3.4 index itself? It’s a continuous variable.

I'm sure David has thought of that, but earlier he wrote:

> In terms of things to predict, the only real thoughts I’ve had so far is that I’m reluctant to try to predict a 3-month average based El Nino 3.4, purely because it’s likely to be noisy and hence errors aren’t necessarily indicative of errors on the bigger problem.

so presumably he'd consider the raw Nino 3.4 index even noisier, and thus a worse target.

Personally I don't understand machine learning well enough to be sure that trying to predict something noisy is worse than trying to predict a smoothed-out substitute like what David suggested (the 7-month average of Nino 3.4). Obviously you can't predict it as well! But does that mean it's a bad thing to do? It's bad if the algorithm winds up putting a lot of work into predicting irrelevant wiggles. But with a suitable measure of success (I believe the jargon is "loss function"), one might avoid that.
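One standard fact seems relevant here. Suppose the noisy target is a smooth signal plus noise, $Y = S + \varepsilon$, where I assume $\mathbb{E}[\varepsilon] = 0$, $\operatorname{Var}(\varepsilon) = \sigma^2$, and $\varepsilon$ is independent of both $S$ and the inputs $X$ the predictor $f$ sees. Then with squared-error loss the cross term vanishes, since $\mathbb{E}[(S - f(X))\,\varepsilon] = \mathbb{E}[S - f(X)]\,\mathbb{E}[\varepsilon] = 0$, and

$$
\mathbb{E}\big[(Y - f(X))^2\big]
= \mathbb{E}\big[(S - f(X))^2\big] + 2\,\mathbb{E}\big[(S - f(X))\,\varepsilon\big] + \mathbb{E}[\varepsilon^2]
= \mathbb{E}\big[(S - f(X))^2\big] + \sigma^2 .
$$

So under these assumptions the noise only adds a constant $\sigma^2$: it doesn't change which predictor scores best. With a finite training set the noise still costs something, since it makes the fitted predictor itself noisier, and the real "wiggles" in Nino 3.4 need not be independent of the inputs. So this is only a hint that a noisy target isn't automatically worse, not a proof.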

Anyway, I favor predicting either the Nino 3.4 index itself or, if there's some technical reason to prefer it, a time-averaged version of it. Predicting the Nino 3.4 index has the big sociological advantage that this is something people already do.
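A time-averaged target is easy to build from the monthly index. Here is a minimal sketch of a centered running mean; the function name `centered_running_mean` and the choice to return `None` where the window doesn't fit are my own assumptions. NOAA's Oceanic Niño Index is a 3-month running mean of Nino 3.4 anomalies; David's 7-month version would just use `window=7`.

```python
from typing import List, Optional, Sequence


def centered_running_mean(x: Sequence[float],
                          window: int = 3) -> List[Optional[float]]:
    """Centered moving average of a monthly series (hypothetical helper).

    Entry i averages x[i - window//2] .. x[i + window//2]. Where the window
    would run off either end of the series, the entry is None.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    out: List[Optional[float]] = []
    for i in range(len(x)):
        if i < half or i + half >= len(x):
            out.append(None)  # not enough neighbours on one side
        else:
            out.append(sum(x[i - half : i + half + 1]) / window)
    return out


print(centered_running_mean([1.0, 2.0, 3.0, 4.0, 5.0], window=3))
# prints [None, 2.0, 3.0, 4.0, None]
```

One caveat: a centered average at month t uses data up to month t + window//2, so predicting it k months ahead really means forecasting Nino 3.4 out to k + window//2 months.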

Like David, I don't want our main goal to be predicting a binary quantity like "is there an El Niño?"