WRT sparsity: if you read papers around the LASSO etc., a general theme is that a sparseness prior discourages the model from exploiting chance correlations between "not actually relevant" inputs and the outputs just to improve its score on the training data, and that this improves generalization. Of course it's a matter of degree: a very heavily weighted sparseness prior is likely to bias the fit, but a moderate one does seem to help.
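
To make that a bit more concrete, here's a rough toy sketch (my own made-up setup, not taken from any of those papers): a linear model with an L1 penalty, which is the LASSO form of a sparseness prior, fitted by plain coordinate descent. The data, the names `lasso_cd` and `soft_threshold`, and the `lam` values are all assumptions chosen purely for illustration. Only 2 of the 20 inputs actually matter.

```python
import random


def soft_threshold(z, t):
    """Shrink z toward zero by t; anything within [-t, t] becomes exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Minimise (1/2n)*||y - Xw||^2 + lam*||w||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    resid = list(y)  # residual y - Xw; equals y while w is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            # Correlation of input j with the residual once w[j]'s own
            # contribution has been added back in.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                w[j] = new_wj
    return w


# Toy data (assumed): 40 samples, 20 inputs, only the first two are relevant.
random.seed(0)
n, p = 40, 20
true_w = [2.0, -1.5] + [0.0] * (p - 2)
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [sum(x * w for x, w in zip(row, true_w)) + random.gauss(0, 1) for row in X]

# lam = 0 is ordinary least squares; larger lam means a stronger sparseness prior.
for lam in (0.0, 0.2, 5.0):
    w = lasso_cd(X, y, lam)
    nonzero = sum(1 for wj in w if wj != 0.0)
    print(f"lam={lam}: {nonzero} non-zero weights")
```

With `lam = 0` there's nothing stopping the fit from putting weight on the 18 irrelevant inputs, since each of them correlates a little with the noise by chance. A moderate `lam` should zero out many of those, and a very large one shrinks the genuinely relevant weights too, which is the "matter of degree" point.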

Regarding an example: on the one hand it'd be good, but I'd have to introduce a model and then add a sparsity prior to it, so I'm not sure whether it would grow too big. I'll have a think...