# James Sethna's "Sloppy Models"

Has anybody looked into James Sethna's Sloppy Models? It looks relevant to the simple network-oriented model approaches that people here are interested in.

1.

I have Sethna's book "Entropy, Order Parameters, and Complexity" and he also uses the term "sloppy" a couple of times. I think of sloppy as what my Physics instructors referred to as "first-order" physical models, and continue to use that to this day.

I do like Sethna's book, which is full of asides and creative explanations.

2.

As I understand it, sloppiness is not related to being first order. It has to do with the shape of the region of configuration space that a system spends its time in. A model is sloppy if this region has the shape of what they call a hyper-ribbon: each principal axis of the region is many times smaller than the previous one when you arrange them in descending size. This means that a few principal axes capture most of the observable variation of the system, so the system can be effectively modelled with a few parameters. All systems treated by thermodynamics, statistical mechanics, or renormalization must necessarily be like this, but sloppiness is supposed to be a deeper and more general concept.
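A minimal sketch of that hierarchy of axis lengths, using a toy model that is not taken from this thread: a sum of decaying exponentials, y(t) = sum_i exp(-k_i t). The rates, the time grid, and the `jacobian` helper are all made-up for illustration. In the linearised picture, the squared singular values of the Jacobian (the eigenvalues of J^T J) stand in for the squared axis lengths, and for this toy model they fall off by large factors from one to the next.

```python
import numpy as np

def jacobian(log_k, t):
    """Sensitivity of y(t) = sum_i exp(-k_i t) to each log-rate log(k_i)."""
    k = np.exp(log_k)
    # d/d(log k_i) exp(-k_i t) = -k_i * t * exp(-k_i t)
    return -(k[None, :] * t[:, None]) * np.exp(-np.outer(t, k))

# Hypothetical decay rates and sampling times (assumptions, not from the thread).
t = np.linspace(0.1, 5.0, 50)
log_k = np.log(np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0]))

J = jacobian(log_k, t)

# Singular values come back in descending order; their squares are the
# eigenvalues of J^T J.
sv = np.linalg.svd(J, compute_uv=False)
eigvals = sv ** 2

# Ratio between each eigenvalue and the next smaller one.
ratios = eigvals[:-1] / eigvals[1:]

print("eigenvalues of J^T J:", eigvals)
print("successive ratios:   ", ratios)
```

If the printed ratios are all much larger than one, only the first few directions in parameter space matter for the observable output, which is the few-parameter description mentioned above.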

3.

OK, Sethna actually had the phrase "sloppy notation" indexed in the back of his book and so that is how I was able to find it. He was using it in terms of making first-order approximations, such as applying Maxwell-Boltzmann as an approximation for Fermi-Dirac statistics at higher temperatures.

This other appropriation also makes sense. It is likely related to principal components in some way -- i.e., the factors that rank highest by correlation are the best candidates for detailed evaluation. The lower-ranked factors have less impact and are treated as nuisance factors in further evaluation -- having too many of these around will lead you down the over-fitting rabbit hole.

In that case, my ENSO model is probably described as "sloppy". The main factor that I use is the QBO forcing. If I veer away from the as-measured 2.33 year forcing sinusoidal even slightly, the fit goes to pot. The other parameter is the Chandler Wobble beat period of ~6 years, which is a little less sensitive to the specific value but is necessary. Remove either of these forcings and the standing wave envelope of the ENSO time-series cannot be concisely replicated. In other words, the shape locks in stiffly with just a few parameters. It is tempting for me to try additional factors but this [page by Sethna](http://www.lassp.cornell.edu/sethna/Sloppy/DoParametersMatter.html) illustrates what can happen if you take that path.

I appreciate that you linked to this sloppy model concept, because I wasn't aware that it had this name. I would also think it has a parallel in the information-criterion concepts used in statistical modeling.

4.
edited May 2015

After reading how Sethna explained the "sloppy model" concept in the context of biology and genetics research, I was reminded of the climate scientist Isaac Held's mention of the "fruit fly" model [described on his blog here](http://www.gfdl.noaa.gov/blog/isaac-held/2012/05/25/28-the-fruit-fly-of-climate-models/).

Held acknowledges that only a select few components are often needed to represent the salient characteristics in a climate model, so those are the ones to illustrate when trying to educate others.

"Many of my colleagues are probably tired of hearing me talk about the importance for climate theory of studying a hierarchy of climate models and especially tired of hearing me make the analogy with the hierarchy of model organisms (E. Coli, yeast, fruit fly, zebra fish, mouse, etc) that biologists utilize so effectively. As I have written in a little essay on this subject, biologists have the advantage — their hierarchy is provided by nature, and the conservative character of much of evolution provides confidence that a lot of what we learn from simpler life forms carries over to more complex organisms. Climate theory requires such a hierarchy as well — how else do you go about trying to understand a complex system that you cannot easily perform controlled experiment on? — but we need to construct it ourselves, and agree among ourselves on which idealized systems to study intensively. "

EDIT: I should add that Isaac Held's fruit fly model has been discussed several times on the Azimuth Forum. Enter Isaac Held in the search window and many posts will show up. This might be how I first came across it.

5.

> In that case, my ENSO model is probably described as "sloppy". The main factor that I use is the QBO forcing. If I veer away from the as-measured 2.33 year forcing sinusoidal even slightly, the fit goes to pot. The other parameter is the Chandler Wobble beat period of ~6 years, which is a little less sensitive to the specific value but is necessary. Remove either of these forcings and the standing wave envelope of the ENSO time-series cannot be concisely replicated. In other words, the shape locks in stiffly with just a few parameters.

A main point seems to be here that the parameters should still fit the model outside the considered period, like for some period in the future and I am not sure whether this applies to your model.

6.

"A main point seems to be here that the parameters should still fit the model outside the considered period, like for some period in the future and I am not sure whether this applies to your model."

nad, better than that. Roughly the same parameters are used to fit to hundreds of years of historical proxy records.

The problem with fitting to the future is that the future has not yet arrived, so there is no way to get immediate feedback to see whether it has worked or not.

7.

> The problem with fitting to the future is that the future has not yet arrived, so there is no way to get immediate feedback to see whether it has worked or not.

What happens if you pretend it's 1970, so that you don't know any of the later values, and then run your fitting procedure? Do you get the same parameters as in 2014?
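The check suggested here can be sketched on synthetic data. This is not Paul's ENSO model: the series below is fabricated from two sinusoids with the 2.33-year and 6-year periods mentioned earlier in the thread plus Gaussian noise, and the amplitudes, the noise level, and the `design` helper are assumptions made only for illustration. The fit is ordinary linear least squares on the sine and cosine amplitudes, done once on the pre-1970 data and once on the whole record.

```python
import numpy as np

def design(t, periods):
    """Design matrix: a constant plus a sine/cosine pair for each period."""
    cols = [np.ones_like(t)]
    for p in periods:
        w = 2.0 * np.pi / p
        cols += [np.sin(w * t), np.cos(w * t)]
    return np.column_stack(cols)

# Synthetic monthly series (all numbers here are assumptions).
rng = np.random.default_rng(0)
t = np.arange(1880, 2014, 1 / 12)
periods = [2.33, 6.0]
true_coef = np.array([0.0, 1.0, 0.3, 0.5, -0.4])
y = design(t, periods) @ true_coef + 0.2 * rng.standard_normal(t.size)

# Pretend it is 1970: fit using only the earlier data.
train = t < 1970.0
coef_train, *_ = np.linalg.lstsq(design(t[train], periods), y[train], rcond=None)

# Fit again with the full record for comparison.
coef_full, *_ = np.linalg.lstsq(design(t, periods), y, rcond=None)

# Largest change in any fitted coefficient between the two fits.
drift = np.max(np.abs(coef_train - coef_full))

# Out-of-sample error of the 1970 fit on the held-out years.
pred = design(t[~train], periods) @ coef_train
rmse_out = np.sqrt(np.mean((pred - y[~train]) ** 2))

print("coefficient drift:", drift)
print("out-of-sample RMSE:", rmse_out)
```

On a real record, a small drift and an out-of-sample error close to the in-sample error would suggest the parameters are not artefacts of the fitting window.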

8.

Hi nad, I was a bit surprised that Paul said prediction wasn't too useful 'cos I didn't immediately guess that the future hasn't happened. However there's a hindcast here: https://forum.azimuthproject.org/discussion/comment/14523/#Comment_14523

9.
edited May 2015

Yes, I will have to bite the bullet and generate shorter interval fits that I can use as training intervals and then test against out-of-band ranges. I didn't want to do this initially because I wanted first to see how well a model could work over the entire range and not lose coherence. This would be a trade-off to be able to do shorter near-term forecasts.

Jim, I was taking the suggestion too literally. BTW, that hindcast was for an alternative view of the model and doesn't provide an actual prediction, but a comparison of the LHS and RHS of the underlying sloshing DiffEq.

10.

> However there's a hindcast here

Unfortunately I don't understand what Paul is doing there, I just have very strong doubts about his machine learning Ansatz.

11.

> that hindcast was for an alternative view of the model and doesn't provide an actual prediction

Thanks for the correction. I'd really like to see what prediction a training period of, say 1880-1940 gives?

12.

"Thanks for the correction. I'd really like to see what prediction a training period of, say 1880-1940 gives?"

I have one set that goes from 1932-present and then I use the same parameters to test from 1880 to 1932. Though the fit is not as good, it seems to be rigid in the sense that it doesn't get better if the parameters are changed significantly.

13.

"Unfortunately I don't understand what Paul is doing there, I just have very strong doubts about his machine learning Ansatz."

That's perfectly acceptable. The fitting process is not optimal in this case as I didn't program the algorithm to solve the differential equation. It only tried to approximate it by using the 2nd derivative to approximate the LHS and then machine learn the forcing RHS.

Doing this exercise is why I was immediately suspicious of what [Per Strandberg](https://forum.azimuthproject.org/discussion/1623/i-ve-cracked-the-enso-code-and-an-introduction-on-me-per-strandberg#latest) is trying to do on that other thread. I was using the Eureqa machine learning environment, while he was using a neural net. This is not optimal in that it uses the data itself to evaluate the 2nd derivative and then to estimate the coefficients. I think that Per is doing that as well with his "PID" pattern but has since gone quiet about it.
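The concern about evaluating the 2nd derivative from the data itself can be illustrated with a short sketch. The sloshing equation is not written out in this thread, so a plain sine wave stands in for the signal; the period, sampling interval, noise level, and the `second_derivative` helper are all assumptions. The point is only that a central second difference divides by the square of the time step, so even small measurement noise is strongly amplified.

```python
import numpy as np

def second_derivative(f, dt):
    """Central second difference; returns values at interior points only."""
    return (f[2:] - 2.0 * f[1:-1] + f[:-2]) / dt ** 2

# Hypothetical monthly sampling of a 4-year oscillation (assumed values).
dt = 1.0 / 12.0
t = np.arange(0.0, 50.0, dt)
omega = 2.0 * np.pi / 4.0
clean = np.sin(omega * t)

# Exact second derivative of sin(omega t) at the interior points.
exact = -omega ** 2 * clean[1:-1]

# Same signal with a small amount of added noise.
rng = np.random.default_rng(1)
noisy = clean + 0.05 * rng.standard_normal(t.size)

err_clean = np.sqrt(np.mean((second_derivative(clean, dt) - exact) ** 2))
err_noisy = np.sqrt(np.mean((second_derivative(noisy, dt) - exact) ** 2))

print("RMS error, clean data:", err_clean)
print("RMS error, noisy data:", err_noisy)
print("amplitude of exact second derivative:", omega ** 2)
```

In this toy setting the error on the noisy series is larger than the second derivative being estimated, which is one reason a fit built on a data-derived LHS can be unreliable unless the series is smoothed or the differential equation is solved directly.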

I really am trying to do this model objectively, without resorting to any trickery -- applying tricks but not trickery. This is all part of the journey to home in on a plausible model.

14.

in #3

> It is likely related to principal components in some way

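Yes, they are related to non-linear PCA or [manifold learning](http://en.wikipedia.org/wiki/Nonlinear_dimensionality_reduction#Manifold_learning_algorithms). What makes a model sloppy is that each principal component is orders of magnitude smaller than the previous one. The geometric interpretation of sloppiness was developed by [Mark Transtrum](http://www.physics.byu.edu/faculty/transtrum/Publications.html). His work is summarized on these pages:

* http://www.physics.byu.edu/faculty/transtrum/SloppyDiffGeom/
* http://www.physics.byu.edu/faculty/transtrum/SloppyDiffGeom/hyperribbon.html
* http://www.physics.byu.edu/faculty/transtrum/SloppyDiffGeom/sloppycurvature.html

There are also some nice [visualizations](http://pages.physics.cornell.edu/~imk36/sloppy-movies/) of the concepts.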

Sethna's original observation was that many models of complex systems have many more parameters than measurable degrees of freedom of behaviour. Therefore many wildly differing parameter combinations are behaviourally equivalent, but models with nearly identical parameters may also behave very differently, because of the cumulative effect of (possibly negligible) differences across parameters. His point is that it is far more important to have models with correct behaviour than with correct parameter values.

An extreme example of this is statistical mechanics, where the actual values of the microscopic parameters are essentially irrelevant, and the large-scale behaviour of a system can be modelled accurately by assigning random values to the microscopic parameters as long as the values of the macro parameters are preserved (e.g. fluid dynamics simulations). On the other hand, a simulation using highly accurate but imperfect estimates of individual micro parameters can be way off, particularly if the errors are biased or correlated, since the combined error may give significant errors in the macro parameters. Sethna & Co. are claiming that this is also true for complex systems that are still far from the statistical limit.
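The claim that some parameter changes barely affect behaviour while others of the same size change it substantially can be sketched with a toy model, again a sum of decaying exponentials that is not taken from this thread. The rates, the step size, and the `model` and `jacobian` helpers are assumptions. The sketch moves the parameters by the same distance along the stiffest and the sloppiest direction of the Jacobian and compares how much the predicted curve changes.

```python
import numpy as np

def model(log_k, t):
    """Toy behaviour: y(t) = sum_i exp(-k_i t)."""
    return np.exp(-np.outer(t, np.exp(log_k))).sum(axis=1)

def jacobian(log_k, t):
    """Derivative of y(t) with respect to each log-rate."""
    k = np.exp(log_k)
    return -(t[:, None] * k[None, :]) * np.exp(-np.outer(t, k))

# Hypothetical rates and sampling times (assumed values).
t = np.linspace(0.1, 5.0, 50)
log_k0 = np.log(np.array([0.8, 1.0, 1.2]))

# Rows of vt are directions in parameter space, ordered from the
# stiffest (largest singular value) to the sloppiest (smallest).
_, sv, vt = np.linalg.svd(jacobian(log_k0, t), full_matrices=False)
stiff, sloppy = vt[0], vt[-1]

# Take a step of the same length in each direction.
step = 0.05
base = model(log_k0, t)
change_stiff = np.linalg.norm(model(log_k0 + step * stiff, t) - base)
change_sloppy = np.linalg.norm(model(log_k0 + step * sloppy, t) - base)

print("singular values:", sv)
print("change in behaviour, stiff direction: ", change_stiff)
print("change in behaviour, sloppy direction:", change_sloppy)
```

Equal-sized parameter changes produce very different changes in behaviour here, so fitting the behaviour constrains the stiff combination well while leaving the sloppy combination almost free.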

Sethna gives a nice overview in this [talk](http://online.kitp.ucsb.edu/online/nonequil14/sethna/rm/flashtv.html).

This also relates to regularization in machine learning, which is required when complex models need to be learned from modest-size data sets.
