Hi Dara,

I have had very good experiences with scikit. In my work I mainly use the linear model and tree ensemble subpackages of scikit. These can handle any dataset that can reasonably fit in memory (the same limitation as R and, I think, Mathematica). I have used it to train random forests on dense datasets with 250000 rows by 4000 columns, and linear models on sparse datasets with 250000 rows and 500000 columns. The linear models train in minutes or even seconds.
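
As a rough illustration, here is a minimal sketch of that kind of usage on small synthetic data (the array sizes and estimator settings are just placeholders, not what I actually ran):

    import numpy as np
    from scipy import sparse
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import SGDClassifier

    rng = np.random.RandomState(0)

    # Dense problem: a random forest on an ordinary numpy array.
    X_dense = rng.rand(10000, 40)
    y = (X_dense[:, 0] > 0.5).astype(int)
    forest = RandomForestClassifier(n_estimators=100, n_jobs=-1)
    forest.fit(X_dense, y)

    # Sparse problem: the linear models accept scipy CSR matrices directly,
    # which is what makes the very wide sparse case feasible in memory.
    X_sparse = sparse.random(10000, 5000, density=0.001, format="csr",
                             random_state=0)
    y_sparse = rng.randint(0, 2, 10000)
    linear = SGDClassifier()   # hinge loss by default, i.e. a linear SVM
    linear.fit(X_sparse, y_sparse)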

I also use some of the unsupervised libraries like k-means and the various matrix decompositions. The matrix decompositions take more time and cannot handle datasets as large as the above, but all the weather pattern decompositions I posted here were done with scikit.
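
For what it's worth, the basic pattern looks like this (PCA and the data shape here are only illustrative, not the exact setup behind the weather posts):

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.decomposition import PCA

    rng = np.random.RandomState(0)
    X = rng.rand(2000, 500)            # e.g. time steps x grid points

    # k-means: one cluster label per row.
    km = KMeans(n_clusters=8, random_state=0).fit(X)
    labels = km.labels_

    # Matrix decomposition: the components are the dominant patterns.
    pca = PCA(n_components=10).fit(X)
    patterns = pca.components_
    explained = pca.explained_variance_ratio_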

You can train SVMs with arbitrary kernels by using sklearn.kernel_approximation.Nystroem followed by a linear method like the SGDClassifier (see the sketch after this paragraph). For large datasets this is much more efficient than, say, libsvm.
I have used this method to train Gaussian SVMs on datasets of the size quoted above.
I am not aware of any other public libraries that can train nonlinear kernels on large data sets.
SVMs with non-linear kernels are usually very inefficient, since the original algorithms for these use dual space methods which are quadratic in the number of rows.
There are modern, faster approximate methods (like the Nystroem approximation above), but I am not aware of other libraries that implement these.
Fast primal space libraries like liblinear and vowpal-wabbit only handle linear classifiers; vowpal-wabbit has recently added a multilayer option, but that is not a kernel method.
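
Here is a minimal sketch of that Nystroem + linear classifier combination for an approximate Gaussian (RBF) kernel SVM; the gamma and n_components values are placeholders to be tuned, not recommendations:

    import numpy as np
    from sklearn.kernel_approximation import Nystroem
    from sklearn.linear_model import SGDClassifier
    from sklearn.pipeline import make_pipeline

    rng = np.random.RandomState(0)
    X = rng.randn(20000, 50)
    y = (np.sum(X[:, :5] ** 2, axis=1) > 5.0).astype(int)   # a nonlinear target

    # Nystroem maps the data into an approximate RBF feature space;
    # SGDClassifier then fits a linear SVM (hinge loss) in that space.
    approx_rbf_svm = make_pipeline(
        Nystroem(kernel="rbf", gamma=0.1, n_components=300, random_state=0),
        SGDClassifier(),
    )
    approx_rbf_svm.fit(X, y)
    print(approx_rbf_svm.score(X, y))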

Scikit does not scale to the Google/Twitter/Facebook kind of big data, but it easily handles datasets of the size we have been looking at here.
It has also scaled better than R on the problems I have worked with.

Daniel