I think Peter referred to this text:
Joshua S. Bloom and Joseph W. Richards, [Data mining and machine learning in time-domain discovery & classification (2011)](http://arxiv.org/pdf/1104.3142.pdf)
It has very clear descriptions of several ML methods. I've tried to summarise them below, but I'm not sure which wiki page would be the most appropriate place to post this, if it's of any use.
* Kernel density estimation (KDE) classifier: class-wise feature distributions are estimated using a non-parametric kernel smoother.
  Con: difficulty estimating accurate densities in high-dimensional feature spaces (the curse of dimensionality).
* Naive Bayes classifier: class-wise KDE on one feature at a time, assuming zero covariance between features.
  Con: the zero-covariance assumption is unlikely to hold in practice.
* Bayesian network classifier: assumes a sparse, graphical conditional dependence structure among features.
* Gaussian mixture classifier: assumes that the feature distribution of each class follows a multivariate Gaussian distribution, whose mean and covariance are estimated from the training data.
* Random forest classifier: outperforms the Gaussian mixture classifier by a statistically significant margin.
* Quadratic discriminant analysis (QDA) classifier (or linear discriminant analysis (LDA) classifier if pooled covariance estimates are used): the names refer to the shape of the decision boundaries between classes (quadratic or linear).
* Support vector machines (SVMs): find the maximum-margin hyperplane separating the instances of each pair of classes. Kernelisation of an SVM makes it easy to find non-linear class boundaries.
* K-nearest neighbours (KNN) classifier: predicts the class of each object by a vote among its K nearest neighbours in feature space, implicitly estimating the class decision boundaries non-parametrically.
* Classification trees: perform recursive binary partitioning of the feature space to arrive at a set of pure, disjoint regions.
  * can capture complicated class boundaries
  * are robust to outliers
  * are immune to irrelevant features
  * easily cope with missing feature values

  Con: they have high variance with respect to their training set because of their hierarchical nature. Bagging, boosting and random forests overcome this by building many trees (on bootstrapped versions of the training data for bagging and random forests, on reweighted versions for boosting) and combining their results.
* Artificial neural networks: non-linear regression models that predict the class as a non-linear function of linear combinations of the input features.
  Cons:
  * computationally difficult to fit, with many local optima
  * lack of interpretability, which the authors say had severely diminished their popularity in the statistics community (as of 2011)
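To make the KDE classifier above a bit more concrete, here is a minimal pure-Python sketch. It assumes an isotropic Gaussian kernel with a single, hand-picked bandwidth (0.5 here is an arbitrary choice, not something from the paper), and the function names `gaussian_kde` and `kde_classify` are hypothetical. Each class's density is estimated from its own training points and multiplied by the class prior; the class with the largest product wins.

```python
import math

def gaussian_kde(x, samples, bandwidth):
    """Kernel density estimate at point x, using an isotropic Gaussian kernel."""
    d = len(x)
    norm = (2 * math.pi * bandwidth ** 2) ** (-d / 2)
    total = 0.0
    for s in samples:
        sq_dist = sum((xi - si) ** 2 for xi, si in zip(x, s))
        total += norm * math.exp(-sq_dist / (2 * bandwidth ** 2))
    return total / len(samples)

def kde_classify(x, training, bandwidth=0.5):
    """Pick the class maximising prior * estimated class-conditional density.

    `training` maps each class label to a list of feature vectors.
    The bandwidth default is an illustrative assumption, not a tuned value.
    """
    n_total = sum(len(samples) for samples in training.values())
    scores = {
        label: (len(samples) / n_total) * gaussian_kde(x, samples, bandwidth)
        for label, samples in training.items()
    }
    return max(scores, key=scores.get)

training = {
    "a": [(0.0, 0.0), (0.2, 0.1), (-0.1, 0.3)],
    "b": [(3.0, 3.0), (3.2, 2.9), (2.8, 3.1)],
}
print(kde_classify((0.1, 0.1), training))  # prints a
```

In practice the bandwidth would be chosen by cross-validation, and in many dimensions you would need far more training points for the density estimates to be reliable, which is the "curse" mentioned in the list.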
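The naive Bayes variant described above can be sketched by replacing the joint density with a product of one-dimensional KDEs, one per feature. This is an illustrative sketch only: it assumes Gaussian kernels, an arbitrary bandwidth of 0.5, and hypothetical function names, and it works in log space so that products of small densities don't underflow.

```python
import math

def kde_1d(value, samples, bandwidth):
    """One-dimensional Gaussian kernel density estimate at `value`."""
    norm = 1.0 / (math.sqrt(2 * math.pi) * bandwidth)
    return sum(
        norm * math.exp(-((value - s) ** 2) / (2 * bandwidth ** 2)) for s in samples
    ) / len(samples)

def naive_bayes_classify(x, training, bandwidth=0.5):
    """Naive Bayes: multiply per-feature 1-D KDEs, ignoring feature covariance."""
    n_total = sum(len(samples) for samples in training.values())
    scores = {}
    for label, samples in training.items():
        log_score = math.log(len(samples) / n_total)  # log prior
        for j, value in enumerate(x):
            column = [s[j] for s in samples]  # feature j only
            density = kde_1d(value, column, bandwidth)
            # A density that underflows to 0 rules this class out
            log_score += math.log(density) if density > 0 else -math.inf
        scores[label] = log_score
    return max(scores, key=scores.get)

training = {
    "a": [(0.0, 0.0), (0.2, 0.1), (-0.1, 0.3)],
    "b": [(3.0, 3.0), (3.2, 2.9), (2.8, 3.1)],
}
print(naive_bayes_classify((0.1, 0.1), training))  # prints a
```

Because each feature is smoothed separately, only one-dimensional densities are ever estimated, which sidesteps the dimensionality problem at the cost of the zero-covariance assumption.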
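The QDA/LDA distinction above is easiest to see with a single feature. The sketch below assumes each class is one-dimensional Gaussian (so it is also a one-component version of the Gaussian classifier in the list). QDA gives each class its own variance, while LDA uses one pooled variance for all classes. The function names and toy data are hypothetical, and each class needs at least two samples for the unbiased variance.

```python
import math

def fit_class_gaussians(training):
    """Per-class (mean, unbiased variance, prior) for 1-D training data."""
    n_total = sum(len(xs) for xs in training.values())
    params = {}
    for label, xs in training.items():
        mean = sum(xs) / len(xs)
        var = sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)
        params[label] = (mean, var, len(xs) / n_total)
    return params

def pooled_variance(training, params):
    """Within-class variance pooled across all classes (as used by LDA)."""
    num = sum((len(xs) - 1) * params[label][1] for label, xs in training.items())
    den = sum(len(xs) for xs in training.values()) - len(training)
    return num / den

def log_discriminant(x, mean, var, prior):
    """log(prior * Normal(x; mean, var)), dropping the shared -0.5*log(2*pi) term."""
    return math.log(prior) - 0.5 * math.log(var) - (x - mean) ** 2 / (2 * var)

def discriminant_classify(x, training, pooled=False):
    """QDA when pooled=False (per-class variances), LDA when pooled=True."""
    params = fit_class_gaussians(training)
    shared_var = pooled_variance(training, params) if pooled else None
    scores = {
        label: log_discriminant(x, mean, shared_var if pooled else var, prior)
        for label, (mean, var, prior) in params.items()
    }
    return max(scores, key=scores.get)

training = {
    "a": [-0.2, -0.1, 0.0, 0.1, 0.2],  # narrow spread around 0
    "b": [-2.0, 0.0, 2.0, 4.0, 6.0],   # wide spread around 2
}
print(discriminant_classify(-5.0, training))               # QDA: prints b
print(discriminant_classify(-5.0, training, pooled=True))  # LDA: prints a
```

The two rules disagree at x = -5: with pooled variance the boundary is a single threshold (here at x = 1, halfway between the means), so everything to its left goes to class "a"; with separate variances the quadratic boundary encloses the narrow class on both sides, so far-left points fall back to the wide class "b".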
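For the SVM entry above, a linear (non-kernelised) two-class SVM can be trained with Pegasos-style stochastic subgradient descent on the regularised hinge loss. This is a sketch under several simplifying assumptions: labels are +1/-1, the bias is folded into the weights as a constant feature (so it is regularised too), the data are swept in a fixed order rather than sampled at random, and `lam`/`epochs` are illustrative values. The kernel trick mentioned in the list is not shown.

```python
def train_linear_svm(X, y, lam=0.01, epochs=1000):
    """Pegasos-style subgradient descent on the L2-regularised hinge loss.

    Labels in y must be +1 or -1. A constant 1.0 feature is appended to every
    input so the last weight acts as a (regularised) bias term.
    """
    data = [list(x) + [1.0] for x in X]
    w = [0.0] * len(data[0])
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(data, y):  # deterministic sweep instead of random sampling
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink: regulariser gradient
            if margin < 1.0:  # inside the margin: hinge loss is active
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
    return w

def svm_predict(w, x):
    """Sign of the decision function w . [x, 1]."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1

X = [(0, 0), (0, 1), (1, 0), (3, 3), (3, 4), (4, 3)]
y = [-1, -1, -1, 1, 1, 1]
w = train_linear_svm(X, y)
print(svm_predict(w, (0, 0)), svm_predict(w, (4, 4)))  # prints -1 1
```

A multi-class version would train one such model per pair of classes and vote, matching the pairwise description in the list.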
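The KNN classifier above needs no training step at all: prediction is just a majority vote among the K closest training points. A minimal sketch, assuming Euclidean distance and K = 3 (both illustrative choices; the function name is hypothetical):

```python
import math
from collections import Counter

def knn_classify(x, X_train, y_train, k=3):
    """Majority vote among the k training points nearest to x (Euclidean)."""
    neighbours = sorted(
        zip(X_train, y_train), key=lambda pair: math.dist(x, pair[0])
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

X_train = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
y_train = ["a", "a", "a", "b", "b", "b"]
print(knn_classify((0.5, 0.5), X_train, y_train))  # prints a
```

The decision boundary is never written down explicitly; it emerges from where the neighbour votes change, which is what "implicitly estimating the class decision boundaries non-parametrically" means. `math.dist` requires Python 3.8 or later.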
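The classification-tree entry and its variance fix can be sketched together: a small CART-style tree that recursively splits on the feature/threshold pair with the lowest weighted Gini impurity, and a bagging wrapper that fits one tree per bootstrap resample and takes a majority vote. Assumptions: numeric features, midpoint thresholds, an arbitrary `max_depth` of 5, 25 trees, and labels that are not tuples (tuples are used here to represent internal nodes). This is plain bagging; a random forest would additionally pick a random subset of features at each split. All names are hypothetical.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y):
    """Binary split (feature, threshold) that most reduces Gini impurity, or None."""
    best, best_score = None, gini(y)
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2  # midpoint between neighbouring values
            left = [yi for row, yi in zip(X, y) if row[j] <= threshold]
            right = [yi for row, yi in zip(X, y) if row[j] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (j, threshold), score
    return best

def build_tree(X, y, depth=0, max_depth=5):
    """Recursive binary partitioning; leaves hold the majority label."""
    split = best_split(X, y) if depth < max_depth else None
    if split is None:
        return Counter(y).most_common(1)[0][0]
    j, t = split
    left = [i for i, row in enumerate(X) if row[j] <= t]
    right = [i for i, row in enumerate(X) if row[j] > t]
    return (j, t,
            build_tree([X[i] for i in left], [y[i] for i in left], depth + 1, max_depth),
            build_tree([X[i] for i in right], [y[i] for i in right], depth + 1, max_depth))

def predict_tree(node, x):
    # Internal nodes are (feature, threshold, left, right) tuples; leaves are labels.
    while isinstance(node, tuple):
        j, t, left, right = node
        node = left if x[j] <= t else right
    return node

def fit_bagged_trees(X, y, n_trees=25, seed=0):
    """Bagging: one tree per bootstrap resample of the training set."""
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx]))
    return trees

def predict_bagged(trees, x):
    """Majority vote over the ensemble."""
    return Counter(predict_tree(tree, x) for tree in trees).most_common(1)[0][0]

X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
y = ["a", "a", "a", "b", "b", "b"]
trees = fit_bagged_trees(X, y)
print(predict_bagged(trees, (0.5, 0.5)))  # prints a
```

Any single tree can change shape noticeably when a few training points are swapped out; averaging many bootstrapped trees smooths out that variance, which is the point made in the list.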