

Started a page on Automatic differentiation for collecting some notes.

## Comments

R has a function for automatic differentiation. Presumably Sage has better ones?

Example using R, a mixed second derivative $\partial^2/\partial x \, \partial y$ of $\frac{\sin(x)^3+y}{x+\arctan(y)}$:

~~~~
> foo <- expression((sin(x)^3+y)/(x+atan(y)))
> D(D(foo, "x"), "y")
-(3 * (cos(x) * sin(x)^2) * (1/(1 + y^2))/(x + atan(y))^2 +
  (1/(x + atan(y))^2 - (sin(x)^3 + y) * (2 * (1/(1 + y^2) *
  (x + atan(y))))/((x + atan(y))^2)^2))
~~~~

There's a subtle difference between "symbolic differentiation" and "automatic differentiation": symbolic differentiation works on expressions like the one you quoted, while automatic differentiation works on assignment sequences (although obviously, if they define the same function, you should get equivalent results). Automatic differentiation is particularly useful for things like

~~~~
r := a ^ b
x := r * exp (r)
v := r*sin(x*r)^2 + i*r*cos(x*r)^2
~~~~

where you get repeated use of subexpressions, in that it generally does a better job of avoiding recomputation of the same expression. (There's more complicated stuff, like being able to "differentiate at a point" programs involving loops, etc., but I'm not concerned about that at the moment.)

My interest at the moment is in being able to type in some model from thin air and fit it to data using gradient-based methods.
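For concreteness, here is a minimal forward-mode sketch of that idea in Python, using dual numbers (not any particular library; the function name `v_of_a` and the fixed values $b = 3$, $i = 0.5$ are made-up choices so the assignment sequence above is runnable). Each intermediate (`r`, `x`) is computed once, and its derivative rides along with it, so shared subexpressions are never re-differentiated.

```python
import math

class Dual:
    """Dual number a + b*eps with eps^2 = 0: `val` is the value, `der` the derivative."""
    def __init__(self, val, der=0.0):
        self.val, self.der = val, der
    def _lift(o):
        return o if isinstance(o, Dual) else Dual(o)
    def __add__(self, o):
        o = Dual._lift(o)
        return Dual(self.val + o.val, self.der + o.der)
    __radd__ = __add__
    def __mul__(self, o):
        o = Dual._lift(o)
        return Dual(self.val * o.val, self.der * o.val + self.val * o.der)
    __rmul__ = __mul__
    def __pow__(self, n):  # integer powers only -- enough for this example
        return Dual(self.val ** n, n * self.val ** (n - 1) * self.der)

def dsin(u): return Dual(math.sin(u.val), math.cos(u.val) * u.der)
def dcos(u): return Dual(math.cos(u.val), -math.sin(u.val) * u.der)
def dexp(u): return Dual(math.exp(u.val), math.exp(u.val) * u.der)

def v_of_a(a, b=3, i=0.5):
    """The assignment sequence above, differentiated with respect to `a`."""
    a = Dual(a, 1.0)          # seed: da/da = 1
    r = a ** b
    x = r * dexp(r)
    v = r * dsin(x * r) ** 2 + i * r * dcos(x * r) ** 2
    return v.val, v.der

def v_plain(a, b=3, i=0.5):
    """Same sequence in plain floats, for cross-checking against a finite difference."""
    r = a ** b
    x = r * math.exp(r)
    return r * math.sin(x * r) ** 2 + i * r * math.cos(x * r) ** 2

val, der = v_of_a(0.7)   # value and exact (to rounding) derivative in one pass
```

A central difference of `v_plain` around the same point should agree with `der` to several digits, which is a quick sanity check on the propagation rules.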

Thanks for the clarification. Still, I think there must be software that does what you want?


I saw this thread and was going to make a general comment about numerical approximations and noticed that it was already considered on Automatic differentiation.

I come from a background in numerical analysis and my basic philosophy is often driven by the slogan: analytic expressions are overrated. My inclination would be to always go with the numerical approximation to differentiation.

There are some good arguments on the wiki against the idea, but I thought I would point out some comments here and see if anything makes its way to the wiki.

First, just because something is written down in terms of an analytic expression doesn't mean it is correct. There is a tendency (not here, but I've seen it in various communities) to think the analytic expression is more accurate than the numerical one because it is cleaner.

An example of this is the expression for temperature change in Another look at climate sensitivity:

$$ c \frac{d T}{d t} = R_i(T) - R_o(T)$$ which becomes

$$ c \frac{d T}{d t} = \mu Q_0 \left(1 - \left(c_1 + c_2 \frac{1 - \tanh [\kappa(T - T_c)]}{2}\right)\right) - \sigma T^4 (1 - m \tanh((T/T_0)^6)).$$ I have absolutely no faith in the accuracy of the right-hand-side analytical expression. When you do not have faith in the analytic expression, then the value of automatic differentiation is in question.

Everyone here knows this, but just saying it out loud. Might be worth adding something to this effect to the page if we can come up with something clean and succinct.


@Graham: there appear to be some, e.g. C++ template libraries which provide first-order forward-mode automatic differentiation; then there are a lot of papers describing more recent ideas which don't appear to have been implemented in any open-source code. I could probably make do with the code that's available, but I'm partly treating this as a learning exercise. I asked about second-order stuff just because the papers become so difficult to follow once second derivatives are involved.

@Eric: My viewpoint is slightly different. One situation I'm really uncomfortable being in is having two big sources of error, such as a lack of faith in the model combined with a lack of faith that its derivatives are being computed correctly. Then, if the fit you get against the data is poor, you don't know whether the fit is poor because the model is rubbish, or because wrong derivatives led the optimiser in the wrong direction and there's actually a better-fitting set of parameters. (OK, a robust optimiser ought to eventually work even with bad derivatives...) On the other hand, if you're very confident in your derivatives, then when your model fit is poor you can conclude the model is rubbish.

My basic problem with finite-differenced derivatives is just that the choice of step size provides an opportunity to make the wrong choice and, again, get poor derivative estimates.
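A tiny illustration of that step-size trap (a sketch, not tied to any library): a central difference applied to $\exp$ at $x = 1$, where the exact derivative is $e$. The truncation error shrinks like $h^2$, but roundoff grows like $\epsilon/h$, so the total error is U-shaped in $h$ and both very large and very small steps do badly.

```python
import math

def central_diff(f, x, h):
    """Two-sided finite difference: (f(x+h) - f(x-h)) / (2h)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

# exp is its own derivative, so the exact answer at x = 1 is e.
x, exact = 1.0, math.e
errors = {h: abs(central_diff(math.exp, x, h) - exact)
          for h in (1e-1, 1e-3, 1e-5, 1e-7, 1e-9, 1e-11, 1e-13)}

for h, err in errors.items():
    print(f"h = {h:7.0e}   |error| = {err:.3e}")
# The error is smallest for intermediate h; both h = 1e-1 (truncation)
# and h = 1e-13 (floating-point roundoff) give much worse estimates.
```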


True enough. Then again, keep in mind that sometimes things look like finite differences but are actually exact expressions.

For example, consider a continuity relation:

$$ \nabla\cdot J = \frac{\partial\rho}{\partial t}.$$ We might naively discretize this according to

$$\frac{\tilde J(x_{i+1/2},t_i) - \tilde J(x_{i-1/2},t_i)}{\Delta x} = \frac{\tilde\rho(x_i,t_{i+1/2}) - \tilde\rho(x_i,t_{i-1/2})}{\Delta t}.$$ Or we might recognize the continuity equation as a statement about a closed 1-form in 2d spacetime, i.e. start with a 1-form

$$j = J d t + \rho d x$$ and compute the exterior derivative

$$d j = \left(\frac{\partial J}{\partial x} - \frac{\partial\rho}{\partial t}\right) d x\wedge d t.$$ The continuity equation simply says

$$d j = 0.$$ Stokes' theorem tells us that for any 2d region $\mathcal{S}$ in 2d spacetime

$$\int_{\mathcal{S}} d j = \int_{\partial\mathcal{S}} j = 0.$$ In other words, take any 2d region in 2d spacetime, integrate $j$ around its boundary, and you will always get zero.

Now, take $\mathcal{S}$ to be the square region

$$\mathcal{S} = [x_{i-1/2},x_{i+1/2}]\times[t_{i-1/2},t_{i+1/2}].$$ The boundary $\partial\mathcal{S}$ consists of four line segments, and the exact line-integral form of the continuity equation becomes

$$\int_{[t_{i-1/2},t_{i+1/2}]} J(x_{i+1/2},t) d t - \int_{[t_{i-1/2},t_{i+1/2}]} J(x_{i-1/2},t) d t - \int_{[x_{i-1/2},x_{i+1/2}]} \rho(x,t_{i+1/2}) d x + \int_{[x_{i-1/2},x_{i+1/2}]} \rho(x,t_{i-1/2}) d x = 0.$$ Define

$$\tilde J(x_{i\pm 1/2},t_i) \Delta t = \int_{[t_{i-1/2},t_{i+1/2}]} J(x_{i\pm 1/2},t) d t$$ and

$$\tilde \rho(x_i,t_{i\pm 1/2}) \Delta x = \int_{[x_{i-1/2},x_{i+1/2}]} \rho(x,t_{i\pm 1/2}) d x.$$ Rearranging terms, we arrive at

$$\frac{\tilde J(x_{i+1/2},t_i) - \tilde J(x_{i-1/2},t_i)}{\Delta x} = \frac{\tilde\rho(x_i,t_{i+1/2}) - \tilde\rho(x_i,t_{i-1/2})}{\Delta t},$$ which is our "naive" expression above. However, this expression is *exact*. Nowhere did we introduce an approximation. We simply shifted our degrees of freedom from immeasurable continuum pointwise quantities $J(x,t)$ and $\rho(x,t)$ to measurable quantities $\tilde J$ and $\tilde\rho$ defined over line segments (as they should be). So when I promote numerical methods, I mean "numerical methods done right". Just as there are cases where you may have a good continuum model and poor numerical differentiation, you can have good numerical differentiation with a poor continuum model.
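This exactness is easy to check numerically. In the sketch below (the potential `phi` and the grid values are arbitrary choices for illustration), taking $J = \partial\phi/\partial t$ and $\rho = \partial\phi/\partial x$ makes $j = d\phi$ closed by construction; the "tilde" quantities are the exact edge integrals, evaluated via the fundamental theorem of calculus, and the discrete identity then holds to machine precision no matter how large $\Delta x$ and $\Delta t$ are.

```python
import math

# A made-up potential phi(x, t). With J = dphi/dt and rho = dphi/dx,
# the 1-form j = J dt + rho dx is exactly closed (dj = 0).
def phi(x, t):
    return math.sin(x) * math.exp(t) + x * x * t

# Arbitrary cell around (x_i, t_i), with edges at the half-integer points.
xm, xp = 0.9, 1.1          # x_{i-1/2}, x_{i+1/2}
tm, tp = 0.4, 0.6          # t_{i-1/2}, t_{i+1/2}
dx, dt = xp - xm, tp - tm

# The "tilde" quantities: exact edge integrals of J and rho. Since
# J = dphi/dt, its integral over a t-edge is just a difference of phi values
# (fundamental theorem of calculus), and similarly for rho over an x-edge.
J_p   = (phi(xp, tp) - phi(xp, tm)) / dt   # tilde J(x_{i+1/2}, t_i)
J_m   = (phi(xm, tp) - phi(xm, tm)) / dt   # tilde J(x_{i-1/2}, t_i)
rho_p = (phi(xp, tp) - phi(xm, tp)) / dx   # tilde rho(x_i, t_{i+1/2})
rho_m = (phi(xp, tm) - phi(xm, tm)) / dx   # tilde rho(x_i, t_{i-1/2})

# The "finite difference" form of the continuity equation, which here is
# exact (up to floating-point rounding), even for large dx and dt.
lhs = (J_p - J_m) / dx
rhs = (rho_p - rho_m) / dt
```

Both sides reduce to the same four-corner combination of $\phi$ values divided by $\Delta x \, \Delta t$, which is why no approximation enters.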

The reason I wanted to (and still hope to) promote rewriting the mathematics in the climate papers in terms of differential forms is that by doing so, you can make sure you do the numerical integration "right".
