Sorry for the confusion, but CSALT doesn't use Nutonian Formulize/Eureqa. That is a separate standalone tool, and a fairly pricey one. I used it for this analysis a year ago, when I had a license. It gives essentially the same result as CSALT, which is my homegrown multiple regression analysis software.

Having said that, the middle link that you provided describes the same issue that I have with it. Eureqa clearly uses both the training and validation sets to optimize the model, and doesn't have a truly independent third test set, as the questioner pointed out (that's why I said in comment #3 that I didn't really know how it works).
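To make the distinction concrete, here is a minimal sketch of a three-way split in which the test block is kept apart from anything used during optimization. This is not how Eureqa or CSALT partitions data; the function name `three_way_split`, the 60/20/20 fractions, and the choice of contiguous chronological blocks are all assumptions made for illustration.

```python
import numpy as np


def three_way_split(n_samples, train_frac=0.6, val_frac=0.2):
    """Split sample indices into contiguous train / validation / test blocks.

    Hypothetical helper: the fractions and the chronological ordering are
    illustrative assumptions, not settings taken from any particular tool.
    """
    n_train = int(train_frac * n_samples)
    n_val = int(val_frac * n_samples)
    indices = np.arange(n_samples)
    train_idx = indices[:n_train]                 # used to fit coefficients
    val_idx = indices[n_train:n_train + n_val]    # used to choose between models
    test_idx = indices[n_train + n_val:]          # held out until the very end
    return train_idx, val_idx, test_idx


train_idx, val_idx, test_idx = three_way_split(100)
```

If the validation block also feeds back into model selection, as seems to be the case with Eureqa, then only the error on `test_idx` counts as an independent check.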

The idea behind the general analysis is that the global temperature anomaly T *composes* linearly from its constituent factors. I can't go back to Eureqa, which does all sorts of fancy composition of factors, so I rely on CSALT for linear multiple regression modeling and prediction.
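As a rough illustration of what "composes linearly" means, the sketch below fits T as an intercept plus a weighted sum of factor series by ordinary least squares. It is not the CSALT code: the function names `fit_linear_composition` and `predict`, the three generic factors, and the synthetic data are all assumptions chosen only to show the technique.

```python
import numpy as np


def fit_linear_composition(factors, temperature):
    """Least-squares fit of T ~ c0 + sum_i c_i * factor_i.

    factors: array of shape (n_samples, n_factors)
    temperature: array of shape (n_samples,)
    Returns (intercept, coefficients).
    """
    n_samples = factors.shape[0]
    # Prepend a column of ones so the intercept is fitted with the weights.
    design = np.column_stack([np.ones(n_samples), factors])
    solution, *_ = np.linalg.lstsq(design, temperature, rcond=None)
    return solution[0], solution[1:]


def predict(intercept, coefficients, factors):
    """Evaluate the fitted linear composition on a set of factor values."""
    return intercept + factors @ coefficients


# Synthetic stand-in data (hypothetical, not real climate series).
rng = np.random.default_rng(0)
n_months = 240
factors = rng.normal(size=(n_months, 3))
true_coefficients = np.array([0.8, -0.3, 0.1])
temperature = 0.2 + factors @ true_coefficients + rng.normal(scale=0.01, size=n_months)

intercept, coefficients = fit_linear_composition(factors, temperature)
fitted = predict(intercept, coefficients, factors)
```

With real data, each column of `factors` would hold one candidate forcing series aligned in time with the temperature anomaly, and the fitted coefficients give each factor's linear contribution.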