Dara, the data is right there on the linked page, if I understand it correctly:

"Here we attempt to monitor ENSO by basing the Multivariate ENSO Index (MEI) on the six main observed variables over the tropical Pacific. These six variables are: sea-level pressure (P), zonal (U) and meridional (V) components of the surface wind, sea surface temperature (S), surface air temperature (A), and total cloudiness fraction of the sky (C). These observations have been collected and published in ICOADS for many years. The MEI is computed separately for each of twelve sliding bi-monthly seasons (Dec/Jan, Jan/Feb,..., Nov/Dec). After spatially filtering the individual fields into clusters (Wolter, 1987), the MEI is calculated as the first unrotated Principal Component (PC) of all six observed fields combined. This is accomplished by normalizing the total variance of each field first, and then performing the extraction of the first PC on the co-variance matrix of the combined fields (Wolter and Timlin, 1993). In order to keep the MEI comparable, all seasonal values are standardized with respect to each season and to the 1950-93 reference period. "
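To make that recipe concrete, here is a minimal Python/NumPy sketch of how I read the quoted description: scale each field to unit total variance, stack the fields, and take the first unrotated PC of the combined covariance matrix. This is not Wolter and Timlin's code. The function name `first_pc_index` is hypothetical, the six fields are synthetic random data, the spatial clustering step is skipped, there is no split into bi-monthly seasons, and the final standardization uses the whole series rather than the 1950-93 reference period.

```python
import numpy as np

def first_pc_index(fields):
    """Hypothetical helper: first PC of several fields after equalizing their variance.

    fields: list of arrays, each shaped (n_times, n_points).
    Returns (standardized index, fraction of combined variance explained by PC1).
    """
    blocks = []
    for f in fields:
        anom = f - f.mean(axis=0)                  # remove the time mean at each point
        total_var = anom.var(axis=0).sum()         # total variance of this field
        blocks.append(anom / np.sqrt(total_var))   # field now has total variance 1
    X = np.hstack(blocks)                          # combined fields, one row per time step

    # The SVD of the anomaly matrix gives the same PCs as an eigen-decomposition
    # of its covariance matrix, without forming that matrix explicitly.
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    pc1 = U[:, 0] * s[0]
    explained = s[0] ** 2 / np.sum(s ** 2)

    # Assumption: standardize over the full series, not a fixed reference period.
    index = (pc1 - pc1.mean()) / pc1.std()
    return index, explained

# Synthetic stand-in for the six fields (P, U, V, S, A, C): a shared signal plus noise,
# with a different arbitrary scale per field.
rng = np.random.default_rng(0)
n_times, n_points = 240, 10
signal = rng.standard_normal(n_times)
fields = []
for scale in (1.0, 5.0, 0.2, 30.0, 3.0, 0.01):
    loadings = rng.uniform(0.5, 1.5, n_points)
    noise = 0.5 * rng.standard_normal((n_times, n_points))
    fields.append(scale * (np.outer(signal, loadings) + noise))

index, explained = first_pc_index(fields)
print(f"PC1 explains {explained:.1%} of the combined variance")
```

The `explained` fraction is the part of the combined variance that the single index retains; everything else in the six fields is dropped when only the first PC is published.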

What may concern some (it does me a bit) is that they have already performed some analysis, namely Principal Components, and that this processing may in fact not be optimal. Who is to say that doing PCA is not obscuring some important feature?

And why do they not include the sea-level component, similar to the one I discovered in [another thread](http://forum.azimuthproject.org/discussion/1480/tidal-records-and-enso/)? Is that because it is not important, or because they have not considered it? It may be best to keep the data in as raw a form as possible while the machine learning chews on it. As I am sure Dara would suggest, make no assumptions apart from the data being of some quality.