[Here's the comment I couldn't post this morning]

Yes, I'm trying to scale the stuff from the blog up from tiny datasets

(where it takes a couple of minutes on my laptop) to a vastly bigger

dataset, namely the El Nino temperatures. In particular, if you take

the correlation between two (not necessarily adjacent) points out of $N$

total points you get $N \times (N-1)$ ordered pairs. If you look at the

minimum and maximum normalized correlation between the first point (1) now, (2) 3 months ago and (3)

six months ago and the second point (1) at the same time, (2) 3 months

preceding and (3) 6 months preceding, you get $(3\times 3-1)\times 2=16$

possibilities.
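To make the counting concrete, here is a minimal sketch of one plausible reading: for each ordered pair of points, drop the doubly-simultaneous lag combination (the "$-1$"), and for each of the remaining $3\times 3-1=8$ lag combinations record both the minimum and the maximum normalized correlation over internal time shifts, giving $8\times 2=16$ numbers. The `window` and `tau_max` parameters are illustrative assumptions, not from the text.

```python
import numpy as np

def pair_features(x_i, x_j, t, lags=(0, 3, 6), window=12, tau_max=6):
    """Sketch: 16 features for one ordered pair of monthly series.

    For each of the 8 lag combinations (the (0, 0) case is assumed to
    be the excluded one), take min and max of the normalized (Pearson)
    correlation over internal shifts tau = 0..tau_max, computed on a
    trailing `window`-month window ending at month t.
    """
    feats = []
    for li in lags:
        for lj in lags:
            if li == 0 and lj == 0:
                continue  # assumed excluded combination (the "-1")
            cs = []
            for tau in range(tau_max + 1):
                a = x_i[t - li - tau - window + 1 : t - li - tau + 1]
                b = x_j[t - lj - window + 1 : t - lj + 1]
                cs.append(np.corrcoef(a, b)[0, 1])
            feats.extend([min(cs), max(cs)])
    return np.array(feats)  # length (3*3 - 1) * 2 = 16
```

Stacking this over all $24\times 23=552$ ordered pairs would give the $552\times 16$ matrix described below.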

So the input data -- the feature data, if you will -- for 24 points is

either a $552\times 16$ matrix $X$ or an $8832$-element vector $x$

(depending on whether you "concatenate" it or not). Suppose

that through discussion we can figure out some plausible "real number"

output $y$. Then my plan is to try to generate predictors of the form

1. $\hat{y}=a^T x+c$ if doing linear regression.

2. $\hat{y}=\sum_{i=1}^{P}a_i^T X b_i + c$ if doing bilinear regression.
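The two predictor forms can be sketched directly, assuming $X$ is $552\times 16$, each $a_i$ has 552 entries and each $b_i$ has 16 (shapes inferred from the description above):

```python
import numpy as np

def linear_predict(x, a, c):
    """Form 1: y_hat = a^T x + c on the concatenated feature vector."""
    return a @ x + c

def bilinear_predict(X, A, B, c):
    """Form 2: y_hat = sum_{i=1..P} a_i^T X b_i + c.

    A holds the P left vectors as columns (552 x P here), B the
    P right vectors as columns (16 x P).
    """
    P = A.shape[1]
    return sum(A[:, i] @ X @ B[:, i] for i in range(P)) + c
```

One way to see the relation between the two: $\sum_i a_i^T X b_i = \langle X, \sum_i a_i b_i^T\rangle$, so the bilinear form is the linear form on $\mathrm{vec}(X)$ restricted to weight matrices of rank at most $P$.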

However, with different kinds of sparsity priors plus a variable number

of bilinear vectors $P$ (as far as I'm aware, no one has yet shown that

they "nest" in the same way PCA vectors do), that's $6P$ models to learn

on what's quite a medium-size feature vector. (By "big data" standards that's not huge, but the people who do that kind of stuff have big clusters to run on and are using loss functions with known properties that make efficient solution possible, neither of which is true for me.)
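For the plain linear case, the simplest of the sparsified variants can be solved with a few lines of proximal gradient descent (ISTA); this is only an illustration of the $\ell_1$ case, and the bilinear variants would need their own solver:

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=500):
    """Minimal ISTA sketch: minimize 0.5*||X w - y||^2 + lam*||w||_1.

    Step size 1/L where L = ||X||_2^2 is the Lipschitz constant of
    the gradient of the smooth part.
    """
    n, d = X.shape
    L = np.linalg.norm(X, 2) ** 2
    w = np.zeros(d)
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y)
        w = soft_threshold(w - grad / L, lam / L)
    return w
```

The point of this kind of solver is that the sparsity level comes out of the single penalty weight `lam`, rather than from a separate cross-validated model-selection loop over subset sizes.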

These things I'm looking at are (combinations of) models that have been

published quite extensively in the last two or three years. As such, they're known, but

not to the point where there's easily available software to

solve them. Part of my reason for focusing on various kinds of

variously sparsified regression is that in that area I

understand the model structure and how to sparsify it without doing

additional cross-validation. In addition, I'm hoping that I can

reproduce the "division into two sets" test strategy that Ludescher et

al used, so that it's a quite direct comparison.

One of the things that makes me a bit

hesitant to look at neural nets, decision forests, etc., at this point

is that I don't understand them well enough to sparsify them

effectively without essentially needing separate training, test and

validation sets. That would mean dividing the data

into three parts, which would make it more difficult to compare performance

directly. (Other people might very well understand how to use decision

forests, etc., for this without splitting the data that way, but I don't.)

Yes, there are problems with taking a 7-month average as a proxy for 5

months where the "3-month average" is above some threshold. I'm very

interested if anyone has ideas for a **non-binary** statistic that

could be used instead as a prediction variable. I suppose one

possibility is the count of months within the next

5 for which the El Nino 3.4 index is above a threshold, although that still

piles a lot of different vectors of feature values at 0. (In case

it's not obvious, the reason I particularly care about this in the

context of (normal|bi-)linear regression is that the regression

function assumes it should try equally hard to hit all of the "target

outputs" you give it, so if there's a heavy concentration onto one

value then it will be heavily biased towards creating a linear

function which passes through that point for lots of outputs, which, as

I'm sure you can visualise, is quite "unnatural".)
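That count-of-months statistic is trivial to compute; the sketch below assumes a monthly Nino 3.4 index series and the conventional 0.5 degree threshold, neither of which is fixed by the text:

```python
import numpy as np

def months_above_threshold(nino34, t, horizon=5, threshold=0.5):
    """Candidate non-binary target: how many of the next `horizon`
    months the Nino 3.4 index exceeds `threshold`.

    Takes integer values 0..horizon, so it still piles mass at 0,
    but less severely than a binary above/below label.
    """
    future = nino34[t + 1 : t + 1 + horizon]
    return int(np.sum(future > threshold))
```

A graded count like this at least gives the regression several distinct target values to spread its effort over, instead of one heavily-populated point.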
