
# Automatic differentiation

Started page on Automatic differentiation for collecting some notes.

1.

R has a function for automatic differentiation. Presumably Sage has better ones?

Example using R, the mixed second derivative $\partial^2/\partial x \, \partial y$ of $\frac{\sin(x)^3+y}{x+\arctan(y)}$:

~~~~
> foo <- expression((sin(x)^3+y)/(x+atan(y)))
> D(D(foo, "x"), "y")
-(3 * (cos(x) * sin(x)^2) * (1/(1 + y^2))/(x + atan(y))^2 + (1/(x +
atan(y))^2 - (sin(x)^3 + y) * (2 * (1/(1 + y^2) * (x + atan(y))))/((x +
atan(y))^2)^2))
~~~~

2.
edited March 2011

There's a subtle difference between "symbolic differentiation" and "automatic differentiation": symbolic differentiation works on expressions like the one you quoted, while automatic differentiation works on assignment sequences (although obviously if they define the same function you should get equivalent results). Automatic differentiation is particularly useful for things like

~~~~
r := a ^ b
x := r * exp (r)
v := r*sin(x*r)^2 + i*r*cos(x*r)^2
~~~~


where you get repeated use of subexpressions, since it generally does a better job of avoiding recomputation of the same expression. (There's more complicated stuff, like being able to "differentiate at a point" through programs involving loops, etc., but I'm not concerned about that at the moment.)
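To make the distinction concrete, here is a small sketch (mine, not from the thread) of first-order forward-mode automatic differentiation using dual numbers in Python. Each assignment in the sequence above is evaluated once, and a derivative is carried along with each value, so shared subexpressions such as `r` and `x*r` are never recomputed. Treating `b` and `i` as constants, and all the numerical values, are assumptions of the sketch.

```python
import math

class Dual:
    """A dual number (value, derivative) for first-order forward-mode AD."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot
    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.dot + other.dot)
    __radd__ = __add__
    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)
    __rmul__ = __mul__
    def __pow__(self, n):  # constant exponent only
        return Dual(self.val ** n, n * self.val ** (n - 1) * self.dot)

def sin(d): return Dual(math.sin(d.val), math.cos(d.val) * d.dot)
def cos(d): return Dual(math.cos(d.val), -math.sin(d.val) * d.dot)
def exp(d):
    e = math.exp(d.val)
    return Dual(e, e * d.dot)

# The assignment sequence from the comment, differentiated w.r.t. a
# (b and i held as constants; the numbers are made up):
b, i = 2.0, 0.5
a = Dual(1.3, 1.0)        # seed da/da = 1
r = a ** b
x = r * exp(r)
v = r * sin(x * r) ** 2 + i * r * cos(x * r) ** 2

# Sanity check against a central finite difference
def v_of(a_val):
    r = a_val ** b
    x = r * math.exp(r)
    return r * math.sin(x * r) ** 2 + i * r * math.cos(x * r) ** 2

h = 1e-6
fd = (v_of(1.3 + h) - v_of(1.3 - h)) / (2 * h)
print(f"AD: dv/da = {v.dot:.6f}, finite difference: {fd:.6f}")
```

Reverse mode and higher-order variants get more involved, but the forward-mode idea really is just this: overload arithmetic so each intermediate assignment also propagates its derivative.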

My interest at the moment is in being able to type in some model from thin air and fit it to data using gradient based methods.
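For what it's worth, that fitting workflow can be sketched in a few lines of Python (my own toy example; the one-parameter model $y = e^{kx}$, the synthetic data, and the learning rate are all made up), using the exact analytic gradient of the error rather than a finite-difference one:

```python
import math
import random

# Synthetic data from a hypothetical model y = exp(k*x) with a little noise
random.seed(0)
k_true = 0.7
xs = [0.1 * j for j in range(20)]
ys = [math.exp(k_true * x) + 0.01 * random.gauss(0, 1) for x in xs]

# Gradient descent on the mean squared error, with the exact gradient:
# d/dk mean (exp(k*x) - y)^2 = mean 2*(exp(k*x) - y)*x*exp(k*x)
k, lr = 0.0, 0.02
for _ in range(2000):
    grad = sum(2 * (math.exp(k * x) - y) * x * math.exp(k * x)
               for x, y in zip(xs, ys)) / len(xs)
    k -= lr * grad
print(f"fitted k = {k:.3f} (true value {k_true})")
```

In practice the gradient would come from an AD tool rather than being derived by hand, which is exactly the point of the exercise.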

3.

Thanks for the clarification. Still, I think there must be software that does what you want?

4.

I saw this thread and was going to make a general comment about numerical approximations, then noticed that it was already considered on the Automatic differentiation page.

I come from a background in numerical analysis and my basic philosophy is often driven by the slogan: analytic expressions are overrated. My inclination would be to always go with the numerical approximation to differentiation.

There are some good arguments on the wiki against the idea, but I thought I would point out some comments here and see if anything makes its way to the wiki.

First, just because something is written down in terms of an analytic expression doesn't mean it is correct. There is a tendency (not here, but I've seen it in various communities) to think the analytic expression is more accurate than the numerical one because it is cleaner.

An example of this is the expression for temperature change in Another look at climate sensitivity:

$$c \frac{d T}{d t} = R_i(T) - R_o(T)$$ which becomes

$$c \frac{d T}{d t} = \mu Q_0 \left(1 - \left(c_1 + c_2 \frac{1 - \tanh [\kappa(T - T_c)]}{2}\right)\right) - \sigma T^4 (1 - m \tanh((T/T_0)^6)).$$ I have absolutely no faith in the accuracy of the right-hand-side analytic expression. When you do not have faith in the analytic expression, the value of automatic differentiation is in question.

Everyone here knows this, but I'm just saying it out loud. It might be worth adding something to this effect to the page if we can come up with something clean and succinct.

5.
edited March 2011

@Graham: there appear to be some, e.g., C++ template libraries which provide forward-mode automatic differentiation to first order; then there are a lot of papers that describe recent ideas but which don't appear to have been implemented in any open-source code. I could probably make do with the code that's available, but I'm partly treating this as a learning exercise. I ask about second-order material mainly because the papers become so much more difficult to follow at second order.

@Eric: My viewpoint is slightly different. One situation I'm really uncomfortable being in is where there are two big sources of error, such as having both a lack of faith in the model and a lack of faith that its derivatives are being computed correctly. Then, if the fit you get against the data is poor, you don't know whether the fit is poor because the model is rubbish or because wrong derivatives led the optimiser in the wrong direction and there's actually a better-fitting set of parameters. (OK, a robust optimiser ought to eventually work even in the case of bad derivatives...) On the other hand, if you're very confident in your derivatives, then when your model fit is poor you can conclude your model is rubbish.

My basic problem with finite-differenced derivatives is just that the choice of offset size provides opportunities to make the wrong choice and again get poor derivative estimates.
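To illustrate that offset-size trap (my own snippet, not from the thread): a forward difference for $\frac{d}{dx}\sin x$ at $x = 1$ in Python. Shrinking the offset $h$ first reduces the truncation error, but past a point floating-point cancellation makes the estimate worse again, so both too-large and too-small offsets give poor derivatives:

```python
import math

# Forward-difference estimate of (d/dx) sin(x) at x = 1; exact value is cos(1).
x, exact = 1.0, math.cos(1.0)
errs = {}
for h in (1e-2, 1e-5, 1e-8, 1e-11, 1e-14):
    approx = (math.sin(x + h) - math.sin(x)) / h
    errs[h] = abs(approx - exact)
    print(f"h = {h:.0e}   |error| = {errs[h]:.2e}")
```

The error is smallest near $h \approx \sqrt{\varepsilon_{\text{machine}}}$ and degrades on either side, which is exactly the wrong-choice opportunity described above; AD has no such knob.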

6.

True enough. Then again, keep in mind that sometimes things look like finite differences but are actually exact expressions.

For example, consider a continuity relation:

$$\nabla\cdot J = \frac{\partial\rho}{\partial t}.$$ We might naively discretize this according to

$$\frac{\tilde J(x_{i+1/2},t_i) - \tilde J(x_{i-1/2},t_i)}{\Delta x} = \frac{\tilde\rho(x_i,t_{i+1/2}) - \tilde\rho(x_i,t_{i-1/2})}{\Delta t}.$$ Or we might recognize the continuity equation as a statement about a closed 1-form in 2d spacetime, i.e. start with a 1-form

$$j = J d t + \rho d x$$ and compute the exterior derivative

$$d j = \left(\frac{\partial J}{\partial x} - \frac{\partial\rho}{\partial t}\right) d x\wedge d t.$$ The continuity equation simply says

$$d j = 0.$$ Stokes' theorem tells us that for any 2d region $\mathcal{S}$ in 2d spacetime

$$\int_{\mathcal{S}} d j = \int_{\partial\mathcal{S}} j = 0.$$ In other words, take any 2d region in 2d spacetime, integrate $j$ around its boundary, and you will always get zero.

Now, take $\mathcal{S}$ to be the square region

$$\mathcal{S} = [x_{i-1/2},x_{i+1/2}]\times[t_{i-1/2},t_{i+1/2}].$$ The boundary $\partial\mathcal{S}$ consists of four line segments, and the exact continuity line integral becomes

$$\int_{[t_{i-1/2},t_{i+1/2}]} J(x_{i+1/2},t) d t - \int_{[t_{i-1/2},t_{i+1/2}]} J(x_{i-1/2},t) d t - \int_{[x_{i-1/2},x_{i+1/2}]} \rho(x,t_{i+1/2}) d x + \int_{[x_{i-1/2},x_{i+1/2}]} \rho(x,t_{i-1/2}) d x = 0.$$ Define

$$\tilde J(x_{i\pm 1/2},t_i) \Delta t = \int_{[t_{i-1/2},t_{i+1/2}]} J(x_{i\pm 1/2},t) d t$$ and

$$\tilde \rho(x_i,t_{i\pm 1/2}) \Delta x = \int_{[x_{i-1/2},x_{i+1/2}]} \rho(x,t_{i\pm 1/2}) d x.$$ Rearranging terms, we arrive at

$$\frac{\tilde J(x_{i+1/2},t_i) - \tilde J(x_{i-1/2},t_i)}{\Delta x} = \frac{\tilde\rho(x_i,t_{i+1/2}) - \tilde\rho(x_i,t_{i-1/2})}{\Delta t},$$ which is our "naive" expression above. However, this expression is *exact*. Nowhere did we introduce an approximation. We simply shifted our degrees of freedom from immeasurable continuum pointwise quantities $J(x,t)$ and $\rho(x,t)$ to measurable quantities $\tilde J$ and $\tilde\rho$ defined over line segments (as they should be).
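A quick numerical check of that exactness claim (my own sketch; the potential $f$ below is an arbitrary hypothetical choice): take any smooth $f(x,t)$ and set $J = \partial f/\partial t$, $\rho = \partial f/\partial x$, so that $\partial J/\partial x = \partial\rho/\partial t$ holds because mixed partials commute. The edge integrals defining $\tilde J$ and $\tilde\rho$ then telescope to differences of $f$ at the cell corners, and the discrete relation holds to machine precision even on a very coarse cell:

```python
import math

# Hypothetical potential f(x, t): with J = df/dt and rho = df/dx, the
# continuity equation dJ/dx = d(rho)/dt holds exactly (mixed partials commute).
def f(x, t):
    return math.sin(3.0 * x) * math.exp(0.5 * t)

# A deliberately coarse cell: exactness does not depend on resolution.
xm, xp = 0.2, 1.7          # x_{i-1/2}, x_{i+1/2}
tm, tp = 0.1, 2.3          # t_{i-1/2}, t_{i+1/2}
dx, dt = xp - xm, tp - tm

# Edge-integrated quantities, evaluated exactly via the potential:
# Jtilde(x) * dt = integral of J(x, t) dt over [tm, tp] = f(x, tp) - f(x, tm)
Jp = (f(xp, tp) - f(xp, tm)) / dt
Jm = (f(xm, tp) - f(xm, tm)) / dt
# rhotilde(t) * dx = integral of rho(x, t) dx over [xm, xp] = f(xp, t) - f(xm, t)
rp = (f(xp, tp) - f(xm, tp)) / dx
rm = (f(xp, tm) - f(xm, tm)) / dx

lhs = (Jp - Jm) / dx       # discrete divergence of Jtilde
rhs = (rp - rm) / dt       # discrete time derivative of rhotilde
print(abs(lhs - rhs))      # zero up to floating-point roundoff
```

Any approximation error enters only when the edge integrals themselves have to be estimated from point samples, not in the discrete balance law.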

So when I promote numerical methods, I mean "numerical methods done right". Just as there are cases where you may have a good continuum model and a poor numerical differentiation, you can have good numerical differentiation with a poor continuum model.

The reason I wanted to (and still hope to) promote rewriting the mathematics in the climate papers in terms of differential forms is that by doing so, you can make sure you do the numerical integration "right".
