
Blog - relative entropy in evolutionary dynamics

Marc Harper has put a draft of a blog post here:

Take a look! If there's something hard to understand, please let us know. If you have any other comments, please give us those too!

Or, just edit the wiki and tell us here....

Comments

  • 1.
    edited December 2013

    Here are a few comments:

    the fitness landscape is a Euclidean gradient

    The term "fitness landscape" was not introduced, but the fitness functions $f_i$ were introduced shortly before, so presumably this is some property of these. What property?

    the replicator equation is a gradient

    This seems confusing because I believe "gradient" is being used in a different way than above. I think Marc means the replicator equation is a [gradient flow equation](http://math.stackexchange.com/questions/156236/gradient-flow-and-what-is-for-example-l2-gradient), meaning an equation of the form

    $$ \frac{d x(t)}{d t} = - \nabla F(x(t)) $$

    I have never heard anyone else say "an equation is a gradient".
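    To make "gradient flow" concrete, here is a minimal Python sketch (not from Marc's draft) that integrates $dx/dt = -\nabla F(x)$ with small forward-Euler steps. The quadratic "height" function `F` and the helper names are hypothetical; the point is only that following minus the gradient takes $x$ downhill, so the height never increases.

```python
def F(x):
    # Hypothetical "height": a bowl with its lowest point at (1, -2).
    return (x[0] - 1.0) ** 2 + 2.0 * (x[1] + 2.0) ** 2

def grad_F(x):
    # Ordinary (Euclidean) gradient of F, computed by hand.
    return [2.0 * (x[0] - 1.0), 4.0 * (x[1] + 2.0)]

def gradient_flow(x, dt=0.01, steps=1000):
    """Follow dx/dt = -grad F(x) with forward Euler, recording F along the way."""
    heights = [F(x)]
    for _ in range(steps):
        g = grad_F(x)
        x = [xi - dt * gi for xi, gi in zip(x, g)]
        heights.append(F(x))
    return x, heights

x_final, heights = gradient_flow([4.0, 3.0])
```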

    ... and so we have an inversion, with distance being a kind of an information rather than an interpretation of information difference as a distance! This fits with how we all think of information in our everyday lives – how far is it to San Francisco? That’s a piece of information, colloquially.

    This seems more like a pun than a serious idea. What's the price of gas in San Francisco? That's also a piece of information, colloquially, but that doesn't mean we should identify prices with information. I suspect I'm missing the point here.

    Dash showed that there is a natural generalization of evolutionarily stable states to incentive stable states, and that for incentive stable states, the relative entropy is decreasing to zero when the trajectories get near the equilibrium. Neat!

    Since you don't say what incentive stable states are, I bet most people won't think this is "neat". They'll just go "hmm, I wonder what that means." It's usually best to save comments like "Neat!" for things that one has explained, so the reader understands them fully and enjoys them.

    In general, I write blog articles as if every piece of jargon has a cost attached to it: I only use jargon if its use justifies the cost, and I try to explain it. (Doing this is part of the cost.) So, I'd be inclined to define or delete many of the terms like "Shahshahani metric", "logit or Fermi incentive" (we just need one term, not two), "metric incentive stable state (or G-ISS)" (again, we just need one, and since neither is defined it's not clear we should be talking about either), and "best-reply and projection dynamics". These add quite a large cognitive burden to the reader, who has to sift through these terms to find the ideas.

    If it's okay with you, Marc, I can go ahead and do this; I have it down to an art. Then you can go back and do stuff yourself, etc.

  • 2.

    Thanks John.

    Yes, the fitness landscape is $f=(f_1, \ldots, f_n)$, and you are right about "the replicator equation is a gradient" meaning that it's the gradient flow; specifically, the potential function is one-half the mean fitness. I guess this is colloquial terminology.

    I'll work on the incentive stable state part... I was trying to cut down on equations and jargon, and your experience in such matters is greatly appreciated! What's neat about Dash's generalization is that it's very easy and extends the result to many more evolutionary dynamics, so I'll try to explain that better, and see if I can justify the exclamation point.

  • 3.

    The ideal is to ruthlessly cut back on jargon while explaining all ideas mentioned in a clear way - with enough details that someone can really remember these ideas. Half-explained ideas are rather painful. I like to only include equations that are absolutely necessary. If an idea is explained really clearly in words, anyone who knows enough math can translate it into equations, while anyone who doesn't will still understand what's going on. If an equation is used, every letter in it should be explained.

    I've made an initial pass through the article up to this point:

    the $q=0$ relative entropy is one-half the Euclidean distance squared, and the associated evolutionary dynamic...

    I don't know how you associate an evolutionary dynamic to a concept of relative entropy. I'd really like to know, and I bet other people would too!

  • 4.

    When I click on the link in the first post in this thread, I see Marc's original page, not John's edit. I can get to John's edit via Recently Revised.

    Either way, I trip over "Euclidean gradient". What is that?

    A mention of "Lyapunov" was actually helping me. I've met Lyapunov functions before in ecological theory.

  • 5.
    edited December 2013

    Graham: I'm guessing "Euclidean gradient" is Marc's term for something you've seen quite often:

    $$ \nabla g = \left(\frac{\partial g}{\partial x_1}, \dots, \frac{\partial g}{\partial x_n}\right) $$ where $g$ is a function from $\mathbb{R}^n$ to $\mathbb{R}$.

    I think he's putting in the odd qualifier "Euclidean" because he's just about to talk about the gradient with respect to some other Riemannian metric. The general concept of gradient depends on a Riemannian metric; when this is the usual Euclidean metric on $\mathbb{R}^n$ we get the familiar gradient above.
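    A small sketch of the difference, assuming a diagonal metric $g_{ij}(x) = w_i(x)\,\delta_{ij}$: the gradient with respect to such a metric is obtained by dividing each partial derivative by $w_i$. With $w_i = 1$ this is the familiar Euclidean gradient above; with $w_i = 1/x_i$ (the Fisher/Shahshahani weights that come up later in this thread) each component is multiplied by $x_i$ instead. The helper names and the test function are hypothetical, and the sketch ignores the extra projection needed to stay on the simplex.

```python
def partials(func, x, h=1e-6):
    # Central-difference approximation of (d func/dx_1, ..., d func/dx_n).
    out = []
    for i in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[i] += h
        xm[i] -= h
        out.append((func(xp) - func(xm)) / (2 * h))
    return out

def metric_gradient(func, x, weights):
    # Gradient for the diagonal metric weights[i] * delta_ij:
    # raise the index with the inverse metric, i.e. divide by weights[i].
    return [d / w for d, w in zip(partials(func, x), weights)]

def potential(x):
    # Hypothetical test function: x_1 * x_2 + x_3 ** 2
    return x[0] * x[1] + x[2] ** 2

x = [0.2, 0.3, 0.5]
euclid = metric_gradient(potential, x, [1.0, 1.0, 1.0])
fisher = metric_gradient(potential, x, [1.0 / xi for xi in x])
```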

    So yeah, this is something that needs to be explained if we want folks who haven't studied much differential geometry to understand it.

    I could reinstate the phrase "Lyapunov", but only with some explanation. Buzzwords are helpful to those who know them, but harmful to those who don't, and for any buzzword a majority of the population doesn't know it. So, they're mainly useful when speaking to restricted audiences with a common background. But a blog isn't like that.

    When I click on the link in the first post in this thread, I see Marc’s original page, not John’s edit. I can get to John’s edit via Recently Revised.

    I don't quite understand the origin of that problem, but I've fixed it.

  • 6.
    edited December 2013

    John's comment on Euclidean gradient is correct -- it's to distinguish it from a gradient with respect to other Riemannian metrics -- and it's not invented terminology, though it may not be as common as I thought (?).

    I don’t know how you associate an evolutionary dynamic to a concept of relative entropy. I’d really like to know, and I bet other people would too!

    Given a relative entropy, take its Hessian and use it as a Riemannian metric. For a fitness landscape that is the Euclidean gradient of some potential function, take the gradient flow of that potential with respect to this metric; this is a gradient with respect to the new metric. Then replace the Euclidean gradient of the potential with an arbitrary fitness function (the dynamic may then no longer be a gradient flow). This gives an "evolutionary dynamic" associated to the relative entropy for an arbitrary fitness landscape. The relative entropy is then a Lyapunov function iff there is an ESS of the fitness landscape. Note that many divergences (aka relative entropies) give essentially the same dynamic, since they have the same Hessian up to a multiplicative constant, and rescaling the metric by a constant only changes the speed along the same trajectories.
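    As a numerical sanity check of the first step, here is a pure-Python sketch (our own illustration; function names are hypothetical) that takes the ordinary relative entropy $D(p \| x) = \sum_i p_i \log(p_i/x_i)$, computes its Hessian in $x$ by central differences at $x = p$, and compares it with $\delta_{ij}/p_i$, the Fisher/Shahshahani metric. Differentiating in the first argument instead gives the same Hessian at $x = p$.

```python
import math

def relative_entropy(p, x):
    # D(p || x) = sum_i p_i log(p_i / x_i); x may be nudged off the simplex here.
    return sum(pi * math.log(pi / xi) for pi, xi in zip(p, x))

def hessian_at_diagonal(p, h=1e-4):
    """Central-difference Hessian of x -> D(p || x), evaluated at x = p."""
    n = len(p)

    def d_shifted(i, di, j, dj):
        # D(p || p + di e_i + dj e_j); when i == j the two shifts add up.
        x = list(p)
        x[i] += di
        x[j] += dj
        return relative_entropy(p, x)

    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Four-point formula; for i == j it reduces to a second
            # difference with step 2h.
            H[i][j] = (d_shifted(i, h, j, h) - d_shifted(i, h, j, -h)
                       - d_shifted(i, -h, j, h) + d_shifted(i, -h, j, -h)) / (4 * h * h)
    return H

p = [0.2, 0.3, 0.5]
H = hessian_at_diagonal(p)
fisher = [[(1.0 / p[i] if i == j else 0.0) for j in range(3)] for i in range(3)]
```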

    For example, for the $q$-relative entropies, the Riemannian metric is $$g_{ij}(x) = \frac{\delta_{ij}}{x_i^q}.$$ (Amari's $\alpha$-divergence is the most direct, but the more well-known Tsallis and Rényi entropies are closely related; see e.g. http://en.wikipedia.org/wiki/F-divergence ). The two special cases in the blog post are $q=0$ for the Euclidean metric and the projection dynamic, and $q=1$ for the Shahshahani/Fisher metric and the replicator dynamic. More generally, the dynamic is what I called the $q$-deformed replicator dynamic in my "Escort Evolutionary Game Theory" paper. I picked $q=0$ since it's a dynamic in common usage, it curiously involves the unweighted average fitness rather than the population-weighted mean fitness, and it's not that complicated compared to some of the other evolutionary dynamics in use, which would require talking about "best replies" and other game-theoretic topics.
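    Here is a sketch of the recipe applied to this family, assuming the standard construction: multiply each fitness by the inverse-metric factor $x_i^q$, then subtract the single multiplier $\lambda = \sum_j x_j^q f_j / \sum_j x_j^q$ that keeps $\sum_i \dot x_i = 0$, so the population stays on the simplex. Setting `q=1` recovers the replicator equation $\dot x_i = x_i(f_i - x \cdot f)$, and `q=0` gives $\dot x_i = f_i - \frac{1}{n}\sum_j f_j$, the projection dynamic in the interior of the simplex (the boundary needs extra care, which this sketch ignores). The function name is hypothetical and the code is not taken from the "Escort Evolutionary Game Theory" paper.

```python
def q_deformed_replicator(x, f, q):
    """Velocity dx/dt for the metric g_ij(x) = delta_ij / x_i**q (interior points only).

    Raise the index with the inverse metric (multiply f_i by x_i**q), then
    subtract the Lagrange multiplier that keeps the velocity tangent to the simplex.
    """
    w = [xi ** q for xi in x]
    lam = sum(wi * fi for wi, fi in zip(w, f)) / sum(w)
    return [wi * (fi - lam) for wi, fi in zip(w, f)]

x = [0.2, 0.3, 0.5]
f = [1.0, 2.0, 4.0]   # hypothetical fitness values at this x

v_rep = q_deformed_replicator(x, f, q=1)   # replicator: x_i (f_i - x . f)
v_proj = q_deformed_replicator(x, f, q=0)  # projection: f_i - average of f
```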

    The associated evolutionary dynamic as I have just described it is a special case of Hofbauer and Sigmund's "adaptive dynamic" (http://homepage.univie.ac.at/josef.hofbauer/90aml_adap.pdf ) which defines an evolutionary dynamic for an arbitrary Riemannian metric on the simplex. My contribution is the connection to information theory/geometry, and later Dash and I showed that you can extend to other time-scales and other modes of selection (incentives).

  • 7.
    edited December 2013

    I took this passage:

    When the fitness landscape is a Euclidean gradient, the replicator equation describes gradient flow on a special geometry, sometimes called the 'Shahshahani geometry'. This geometry involves a Riemannian metric on the space of population distributions, called the ['Fisher information metric'](http://en.wikipedia.org/wiki/Fisher_information_metric).

    and expanded it so that it says:

    Suppose the fitness function $f$ is the gradient of some other function: $$ f_i(x) = \frac{\partial g}{\partial x_i} $$ Then the replicator equation describes 'gradient flow' on a special non-Euclidean geometry, sometimes called the 'Shahshahani geometry', where the distance is measured using the ['Fisher information metric'](http://en.wikipedia.org/wiki/Fisher_information_metric). The space of population distributions is a simplex: $$ \{ x \in \mathbb{R}^n : \; x_i \ge 0, \; \sum_{i = 1}^n x_i = 1 \} $$ for example an equilateral triangle when $n = 3$. But using the Fisher information metric this triangle becomes round, exactly like a portion of a sphere. And the replicator equation takes the form $$ \frac{d x(t)}{d t} = - \nabla F(x(t)) $$ for some function $F$, where now the gradient is computed as you would on a sphere. This sort of equation is called gradient flow: it means that $x(t)$ flows 'downhill' if we think of $F$ as 'height'.
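    One way to see the "portion of a sphere" claim, assuming the Fisher metric is normalized as $g_{ij}(x) = \delta_{ij}/x_i$: the map $y_i = 2\sqrt{x_i}$ sends the simplex onto the part of the sphere of radius 2 where all coordinates are positive, since $\sum_i y_i^2 = 4\sum_i x_i = 4$, and it turns Fisher lengths into ordinary Euclidean lengths, since $dy_i = dx_i/\sqrt{x_i}$. The pure-Python sketch below (helper names hypothetical) checks both facts for one point and one small displacement. Other normalizations of the metric just rescale the radius.

```python
import math

def to_sphere(x):
    # Hypothetical helper: y_i = 2 * sqrt(x_i)
    return [2.0 * math.sqrt(xi) for xi in x]

def fisher_length_sq(x, dx):
    # Squared length of a tangent vector dx under g_ij(x) = delta_ij / x_i
    return sum(d * d / xi for xi, d in zip(x, dx))

x = [0.1, 0.2, 0.3, 0.4]                  # an interior point of the simplex (n = 4)
y = to_sphere(x)
radius_sq = sum(yi * yi for yi in y)       # equals 4 * sum(x) = 4, so |y| = 2

dx = [1e-6, -2e-6, 0.5e-6, 0.5e-6]        # sums to zero, so x + dx stays in the simplex
y_moved = to_sphere([xi + d for xi, d in zip(x, dx)])
euclid_sq = sum((b - a) ** 2 for a, b in zip(y, y_moved))
ratio = euclid_sq / fisher_length_sq(x, dx)   # close to 1 for small dx
```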

    However, I'm confused. Under this interpretation of the phrase "when the fitness landscape is a Euclidean gradient", every fitness landscape is a Euclidean gradient! After all, every smooth vector field on the simplex is the gradient of some smooth function. So, why bother saying "when the fitness landscape is a Euclidean gradient"?

    I have a guess, namely that Marc is hinting $F = -g$. No matter what's going on, when I get this straightened out, I'll probably be able to rewrite this passage so it's less technical sounding! I'm not trying to be technical, just clear.

  • 8.

    However, I’m confused. Under this interpretation of the phrase “when the fitness landscape is a Euclidean gradient”, every fitness landscape is a Euclidean gradient! After all, every smooth vector field on the simplex is the gradient of some smooth function. So, why bother saying “when the fitness landscape is a Euclidean gradient”?

    It can be confusing, but not every interesting replicator equation comes from a fitness landscape that is a Euclidean gradient.

    Explicitly, if the fitness landscape is given by a game matrix $A$ so that $f(x) = Ax$, the replicator equation is a Shahshahani gradient if $A$ is symmetric (with potential function $V = \frac{1}{2} x \cdot A x$), but not in general. It was shown by (IIRC) Hofbauer that the landscape $f(x) = Ax$ is a Shahshahani gradient if and only if for all $i,j,k$, the matrix components satisfy $a_{ij} + a_{jk} + a_{ki} = a_{ik} + a_{kj} + a_{ji}$. There is a more general criterion for arbitrary landscapes that involves a similar "cyclical" sum of partial derivatives.
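    The cyclic criterion is easy to test mechanically. This sketch (function name hypothetical) checks it for every triple $(i,j,k)$: a symmetric matrix passes, rock-paper-scissors fails, and a symmetric matrix with a constant $c_j$ added to each column $j$ also passes. The last case is consistent with the fact that such a shift adds the same amount $c \cdot x$ to every $f_i$ and so leaves the replicator equation unchanged.

```python
from itertools import product

def is_shahshahani_gradient(A, tol=1e-12):
    """Check a_ij + a_jk + a_ki == a_ik + a_kj + a_ji for every triple (i, j, k)."""
    n = len(A)
    for i, j, k in product(range(n), repeat=3):
        lhs = A[i][j] + A[j][k] + A[k][i]
        rhs = A[i][k] + A[k][j] + A[j][i]
        if abs(lhs - rhs) > tol:
            return False
    return True

symmetric = [[1, 2, 0], [2, 0, 3], [0, 3, 1]]
rock_paper_scissors = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
# Add c_j = 5, 7, 9 to columns j = 0, 1, 2 of the symmetric matrix.
shifted = [[s + c for s, c in zip(row, [5, 7, 9])] for row in symmetric]
```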

    Curiously, if we start with the potential $V = \frac{1}{2} x \cdot A x$ (half the mean fitness) for an arbitrary matrix and take the Euclidean gradient, we get a replicator equation for a different fitness landscape $g$ given by $$ g(x) = \frac{A + A^T}{2} x,$$ the symmetrization of $A$. If we use a rock-paper-scissors landscape (which is antisymmetric, so that $V$ vanishes identically), we get a degenerate replicator equation.
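    A quick numerical check of this identity, as a pure-Python sketch with a hypothetical non-symmetric matrix `A`: central differences of $V(x) = \frac{1}{2} x \cdot A x$ agree with $\frac{A + A^T}{2} x$, and for the antisymmetric rock-paper-scissors matrix the gradient vanishes, so every point of the resulting replicator equation is a rest point.

```python
def half_mean_fitness(A, x):
    # V(x) = (1/2) x . A x
    n = len(x)
    return 0.5 * sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))

def euclidean_gradient(A, x, h=1e-6):
    # Central differences of V in each coordinate.
    grad = []
    for i in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[i] += h
        xm[i] -= h
        grad.append((half_mean_fitness(A, xp) - half_mean_fitness(A, xm)) / (2 * h))
    return grad

def symmetrized_payoffs(A, x):
    # ((A + A^T) / 2) x
    n = len(x)
    return [sum(0.5 * (A[i][j] + A[j][i]) * x[j] for j in range(n)) for i in range(n)]

A = [[0, 3, 1], [1, 0, 2], [4, 1, 0]]        # hypothetical, not symmetric
rps = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]   # antisymmetric
x = [0.2, 0.3, 0.5]

grad_A = euclidean_gradient(A, x)
sym_A = symmetrized_payoffs(A, x)
grad_rps = euclidean_gradient(rps, x)
```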

    On the other hand, it's easy to see (using the relevant formula from differential geometry) that for any potential function, the gradient flow for the Shahshahani metric is $$\frac{dx_i}{dt} = x_i \left( (\nabla V)_i - x \cdot \nabla V \right)$$ So all potential functions yield replicator equations that are Shahshahani gradients, but not all replicator equations are Shahshahani gradients. The nicest special case is to assume that the fitness landscape is a Euclidean gradient because we are then guaranteed that the replicator equation is a Shahshahani gradient, and this covers the important application of "alleles at a gene locus" (which has a symmetric game matrix).
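    Here is a sketch of that formula in code, for a hypothetical symmetric positive-definite payoff matrix with $V = \frac{1}{2} x \cdot A x$, so that $\nabla V = A x$ and the flow is the replicator equation for $f(x) = Ax$. Written this way the flow climbs $V$ (equivalently, it descends $F = -V$ in the minus-sign convention used earlier in the thread): along it $dV/dt = \sum_i x_i\left((\nabla V)_i - x \cdot \nabla V\right)^2 \ge 0$, the variance of $\nabla V$ under $x$. Positive definiteness also makes each forward-Euler step non-decreasing for this quadratic $V$; the loop checks that and that the population stays on the simplex.

```python
A = [[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 1.0]]   # hypothetical symmetric payoffs

def V(x):
    # Half the mean fitness, (1/2) x . A x
    return 0.5 * sum(x[i] * A[i][j] * x[j] for i in range(3) for j in range(3))

def grad_V(x):
    # For symmetric A the Euclidean gradient of V is A x.
    return [sum(A[i][j] * x[j] for j in range(3)) for i in range(3)]

def shahshahani_flow(gradient, x):
    # dx_i/dt = x_i * ((grad V)_i - x . grad V)
    g = gradient(x)
    mean = sum(xi * gi for xi, gi in zip(x, g))
    return [xi * (gi - mean) for xi, gi in zip(x, g)]

x = [0.5, 0.2, 0.3]
dt = 0.01
values = [V(x)]
for _ in range(500):
    v = shahshahani_flow(grad_V, x)
    x = [xi + dt * vi for xi, vi in zip(x, v)]
    values.append(V(x))
```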

    The relative entropy, however, is always a Lyapunov function when there is an internal ESS, regardless of whether the particular replicator equation is a Shahshahani gradient or not.
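    To illustrate that last point, here is a sketch with a hypothetical "good" rock-paper-scissors game (win $= 2$, lose $= -1$). Its matrix is not symmetric and fails the cyclic condition stated earlier, so this replicator equation is not a Shahshahani gradient, yet $\hat x = (1/3, 1/3, 1/3)$ is an interior ESS. Along the replicator equation $\frac{d}{dt} D(\hat x \| x) = -(\hat x \cdot A x - x \cdot A x)$, which is negative away from $\hat x$, so the relative entropy should fall steadily; the forward-Euler loop (step size chosen by us) checks this numerically.

```python
import math

# Hypothetical "good" rock-paper-scissors: win = 2, lose = -1.
A = [[0.0, -1.0, 2.0], [2.0, 0.0, -1.0], [-1.0, 2.0, 0.0]]
x_hat = [1.0 / 3.0] * 3   # interior ESS of this game

def replicator(x):
    # dx_i/dt = x_i (f_i - x . f) with f = A x
    f = [sum(A[i][j] * x[j] for j in range(3)) for i in range(3)]
    mean = sum(xi * fi for xi, fi in zip(x, f))
    return [xi * (fi - mean) for xi, fi in zip(x, f)]

def relative_entropy(p, x):
    # D(p || x) = sum_i p_i log(p_i / x_i)
    return sum(pi * math.log(pi / xi) for pi, xi in zip(p, x))

x = [0.6, 0.3, 0.1]
dt = 0.001
divergences = [relative_entropy(x_hat, x)]
for _ in range(5000):
    v = replicator(x)
    x = [xi + dt * vi for xi, vi in zip(x, v)]
    divergences.append(relative_entropy(x_hat, x))
```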

  • 9.
    edited December 2013

    Thanks! The shorter answer would have been to say that this sentence of mine was bullshit:

    After all, every smooth vector field on the simplex is the gradient of some smooth function.

    It's gotta have vanishing curl!

  • 10.

    I'm being a bit slow on this because this week I need to give and grade two final exams and prepare a talk for the [SETI Institute](http://johncarlosbaez.wordpress.com/2013/12/05/talk-at-the-seti-institute/). Then on Friday I'll fly up to Berkeley, give that talk, and also participate in a [workshop on logic, probability and reflection](http://intelligence.org/2013/07/24/miris-december-2013-workshop/) run by some singularitarians at the Machine Intelligence Research Institute.

    However, I think I know roughly how I want to edit the blog article, so that should go pretty quick when I finally do it! I'll send Marc an email when I do that, as well as announcing it here.

    I would still appreciate more comments and questions about the existing article, though.

  • 11.

    Fixed two typos on the wiki page.

  • 12.

    See the sentence "Dash Fryer (at Pomona College) ... I showed that you can use generalized entropies of the and Renyi types".

    The phrase "the and" is a syntax error. Sounds like you meant "the Renyi types", but I'll let you change it since Renyi isn't on my horizon.

  • 13.

    "Suppose the fitness function f is the gradient of some other function:"

    The denominator in the formula should have the subscript i on the letter x, no?

  • 14.
    edited December 2013

    Great stuff -- at least the first two pages, which is as far as I could understand on a first reading.

    The whole topic is new and exciting to me. Here are two points that look obvious, but which might make the language stronger (for newcomers) if you wove them into the text: (1) the range of the fitness functions can be any real number, (2) evolutionary stable state = zero of the replicator equation.

    Can you add a bit more description of the Fisher information metric? Not a full definition (which is what the link is for), but something to give some general color about the metric.

    Can you add a bit of color on the nature of the function F, and how it is related to the metric. You give the hint that F is like a height. Is it actually the geodesic distance from a reference point?

    Can you give a sentence or two that further interprets why evolutionary movement would be described by a gradient flow, from higher altitudes to lower ones? What is the meaning of the scalar function g, which (1) provides a compact representation for the fitnesses of all of the species, and (2) is the value that the system is seeking to minimize? Can you summarize how it is related to entropy?

    Suggestion: it would be nice, to reach a broader audience, to add some sentences that at least partially define entropy and relative entropy, at least in this context. My understanding of entropy is vague and entropic.

  • 15.

    Thanks David. Your first two points are well-taken -- it's probably not obvious to newcomers that ESS are rest points of the replicator equation.

    For the Fisher information metric, the definition/intuition from statistical theory describes, in some sense, the sensitivity of a distribution to changes in its parameters. Information geometers seem to think about it in terms of curvature -- flatter parts of probability space correspond to less sensitivity. In this case, however, the curvature is constant, since the simplex with this metric is realizable as part of a sphere. I'm not sure what to say about it in this article -- the point is more that the Shahshahani gradient sort of comes out of nowhere, but can be understood information theoretically (the Shahshahani metric is the Hessian of the relative entropy). The Fisher information metric also comes up in quantum mechanics, as the Fubini-Study metric, so it's a rather general thing. I'll keep thinking about it.

    As for the replicator equation being a gradient, it's not always the case, but often is (see the discussion above). Biologically, natural selection is what "optimizes" a fitness landscape (whereas processes like mutation and drift cause movement, but in a random sense rather than as culling). The function g is the mean fitness in the "alleles at a gene locus" case. Incidentally, this is closely related to Fisher's fundamental theorem of natural selection, which says that the time derivative of the mean fitness is equal to the variance in the fitness landscape (with respect to the population distribution). I think that John mentioned this in his earlier series, but it may be worth bringing up again, since we get a "generalized" version now.
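    In the simplest case, fixed (frequency-independent) fitnesses $f_i$, the statement of Fisher's fundamental theorem can be checked directly: under $\dot x_i = x_i(f_i - \bar f)$ with $\bar f = \sum_i x_i f_i$, we get $\frac{d\bar f}{dt} = \sum_i f_i x_i (f_i - \bar f) = \sum_i x_i (f_i - \bar f)^2$, the variance of $f$ under $x$. The sketch below (hypothetical numbers and helper names) compares a one-step finite difference of $\bar f$ with that variance; since $\bar f$ is linear in $x$, a single Euler step reproduces the derivative up to rounding.

```python
def mean_and_variance(x, f):
    # Population-weighted mean and variance of the fitnesses.
    mean = sum(xi * fi for xi, fi in zip(x, f))
    var = sum(xi * (fi - mean) ** 2 for xi, fi in zip(x, f))
    return mean, var

def replicator_step(x, f, dt):
    # One forward-Euler step of dx_i/dt = x_i (f_i - mean fitness).
    mean, _ = mean_and_variance(x, f)
    return [xi + dt * xi * (fi - mean) for xi, fi in zip(x, f)]

x = [0.5, 0.3, 0.2]
f = [1.0, 2.0, 3.5]      # hypothetical constant fitnesses
dt = 1e-4

mean_now, var_now = mean_and_variance(x, f)
mean_next, _ = mean_and_variance(replicator_step(x, f, dt), f)
rate = (mean_next - mean_now) / dt   # finite-difference d(mean fitness)/dt
```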

    I'll think about what to say about entropy. In this context the information theoretic definition is probably more appropriate...

  • 16.

    From Blog - relative entropy in evolutionary dynamics

    Then the replicator equation describes ‘gradient flow’ on a special non-Euclidean geometry, sometimes called the ‘Shahshahani geometry’, where the distance is measured using the 'Fisher information metric'.

    The Fisher information metric is derived from a probability distribution, the $p(x, \theta)$ in Wikipedia. I don't understand where $p(x, \theta)$ comes from. Is it derived from $f(x)$ somehow?

  • 17.

    The paper

    Akio Fujiwara, Shun-ichi Amari, Gradient systems in view of information geometry

    is behind a paywall. (I can get over the wall but not understand the article!)

  • 18.

    In this case the "parameters" are just $x_1, \ldots, x_n$, thinking of the simplex as a model for discrete probability distributions. So the Fisher information metric is just minus the expectation of the second derivatives of $\log x_i$. In information geometry, the points of a "statistical manifold" are often more interesting distributions, like normal distributions with parameters $\mu$ and $\sigma$. So more precisely, the n-dimensional simplex (called $P(S)$ on a finite set $S$) is modeled by some $\theta_0, \theta_1, \ldots, \theta_n$ (e.g. coordinate axes) where $\theta_0$ is 1 minus the sum of the others. It's easier just to pretend that $x_k = \theta_k$ for our purposes and not worry about coordinates.
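To make the "pretend $x_k = \theta_k$" shortcut concrete, here is a small sketch (hypothetical function names) that treats $x_1, \ldots, x_n$ as the parameters of the distribution $p(k) = x_k$ and computes the Fisher matrix in two equivalent ways: as the expected outer product of the score $\partial_i \log p(k)$, and as minus the expected second derivatives of $\log p(k)$. Both give the diagonal matrix $g_{ij}(x) = \delta_{ij}/x_i$, which is then applied only to tangent vectors with $\sum_i v_i = 0$.

```python
def fisher_from_scores(x):
    """E[score score^T] for the categorical distribution p(k) = x_k.

    All n coordinates are treated as parameters; the score of outcome k
    is d/dx_i log x_k = delta_ik / x_k.
    """
    n = len(x)
    g = [[0.0] * n for _ in range(n)]
    for k, pk in enumerate(x):
        score = [1.0 / x[k] if i == k else 0.0 for i in range(n)]
        for i in range(n):
            for j in range(n):
                g[i][j] += pk * score[i] * score[j]
    return g


def fisher_from_hessian(x):
    """-E[d^2 log p(k) / dx_i dx_j]; for outcome k the only nonzero entry
    is d^2 log x_k / dx_k^2 = -1 / x_k^2."""
    n = len(x)
    g = [[0.0] * n for _ in range(n)]
    for k, pk in enumerate(x):
        g[k][k] += pk / x[k] ** 2
    return g


x = [0.5, 0.25, 0.25]
print(fisher_from_scores(x))   # [[2.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 4.0]]
print(fisher_from_hessian(x))  # the same matrix
```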

    It is possible to compute $p(x, \theta)$ explicitly for the replicator dynamics in some cases using the fitness landscape, but it depends on solving (another) system of differential equations that rarely has an analytic solution. In any case, it's not necessary to compute the Fisher information.

  • 19.

    Thanks, Marc. So I think you get $$g_{ij}(x) = \frac{\delta_{ij}}{x_i}.$$ But now I don't understand John's

    But using the Fisher information metric this triangle becomes round, exactly like a portion of a sphere.

  • 20.
    edited December 2013

    Marc, in the beginning of the article you say that $x_i$ is the fraction of individuals who are of the ith type. So the vector x is on the simplex. Then you say "let $f_i(x)$ be the fitness of the ith type", which appears to say that fitness is a function of the probability distribution, not the absolute population counts. Then, for the replicator equation to be true, this equation would be strongly constrained to produce a derivative that keeps x in the simplex.

    Looking at part 9 of John's information geometry series, I see that he defines fitness as a function of the vector of absolute populations P. And the replicator equation there uses both the vectors p and P.

    Do you need to introduce a vector X of population counts as well?

  • 21.
    edited December 2013

    Is the function g defined only on the simplex, or on the larger Euclidean space that contains the simplex?

    If it is only on the simplex, then you are taking a gradient on the manifold, which is a concept that I would need to study further. Is that why the subscript i in the denominator of the gradient formula is attached to the derivative symbol, and not x?

    Here are two things that would be helpful for working through the concepts:

    • State the domain and range of functions that are introduced -- unless it is patently obvious even to readers outside of the field of inquiry. The extra type info can resolve any ambiguities from the written language, or the reader's interpretation of it.

    • A companion "toy example" to show the definitions at work. I'm imagining some system with two species, with very simple fitness functions.

    Now, in terms of your article, there are downsides to these suggestions, so you'll have to do the weighing. Adding the extra domain and range information, if not done in a natural way, can make the tone sound more formal and heavy. How about something along these lines:

    Let $f_i(x)$ be the fitness of the ith type, which is in general a function of all the population counts. Positive fitness means the population of that species will go up, and negative fitness means it will go down.

    The downside to adding the toy example is that it adds mass to the article, and it introduces the challenge of interweaving it with the general definitions without breaking up the flow. Can it be made very succinct? Or it could be put in an appendix, or worked out on a separate web page linked from the blog article.
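Here is one possible toy example of the kind suggested above, written as a sketch rather than a proposal for the article: two types, hawks and doves, in a textbook hawk-dove game with prize $V = 2$ and fight cost $C = 4$ (these payoff numbers are an assumption chosen for illustration, and the function names are hypothetical). The fitnesses depend only on the fractions $x = (x_{\text{hawk}}, x_{\text{dove}})$, a hawk's fitness can be negative, and the replicator equation reduces to $\dot x_{\text{hawk}} = x_{\text{hawk}}(1 - x_{\text{hawk}})(1 - 2x_{\text{hawk}})$, so trajectories approach the evolutionarily stable mix $x_{\text{hawk}} = V/C = 1/2$ from either side.

```python
def hawk_dove_fitness(x):
    """Hypothetical hawk-dove payoffs with prize V = 2 and fight cost C = 4.

    Payoff matrix [[(V - C) / 2, V], [0, V / 2]] = [[-1, 2], [0, 1]];
    x = (fraction of hawks, fraction of doves).
    """
    hawk, dove = x
    return [-1.0 * hawk + 2.0 * dove, 1.0 * dove]


def replicator_rhs(x, f):
    """dx_i/dt = x_i (f_i(x) - mean fitness)."""
    fit = f(x)
    mean = sum(xi * fi for xi, fi in zip(x, fit))
    return [xi * (fi - mean) for xi, fi in zip(x, fit)]


def rk4_step(x, f, dt):
    """One classical fourth-order Runge-Kutta step of the replicator equation."""
    k1 = replicator_rhs(x, f)
    k2 = replicator_rhs([a + 0.5 * dt * b for a, b in zip(x, k1)], f)
    k3 = replicator_rhs([a + 0.5 * dt * b for a, b in zip(x, k2)], f)
    k4 = replicator_rhs([a + dt * b for a, b in zip(x, k3)], f)
    return [a + dt * (p + 2 * q + 2 * r + s) / 6.0
            for a, p, q, r, s in zip(x, k1, k2, k3, k4)]


def simulate(x0, f, dt=0.01, steps=3000):
    """Integrate from x0 up to time dt * steps and return the final state."""
    x = list(x0)
    for _ in range(steps):
        x = rk4_step(x, f, dt)
    return x


print(hawk_dove_fitness([0.9, 0.1]))            # [-0.7, 0.1]: hawks mostly meet hawks
print(simulate([0.1, 0.9], hawk_dove_fitness))  # hawk share close to V / C = 0.5
print(simulate([0.9, 0.1], hawk_dove_fitness))  # likewise, approached from above
```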

  • 22.
    edited December 2013

    @Graham There's a mapping to the "positive portion" of the radius 2 n-sphere that turns the Fisher metric into the Euclidean metric: $y_i = 2 \sqrt{x_i}$. This turns $\sum_i x_i = 1$ into $\sum_{i}{y_i^2} = 4$, and we can use the Jacobian of the transformation to show that the metric changes as claimed. This means that geodesics are images of great circles, and it helps visualize the geometry a bit.
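The change of variables can be checked numerically. A minimal sketch, with hypothetical helper names: it maps a point to the radius-2 sphere via $y_i = 2\sqrt{x_i}$, estimates the squared Euclidean length of the image of a tangent vector $v$ by finite differences, and compares it with the Fisher form $\sum_i v_i^2/x_i$ (the Jacobian is diagonal with entries $1/\sqrt{x_i}$). Because the image lies on a sphere, the geodesic distance is a great-circle distance, $2 \arccos \sum_i \sqrt{x_i z_i}$ in this normalization.

```python
import math


def to_sphere(x):
    """The map y_i = 2 sqrt(x_i) from the simplex to the radius-2 sphere."""
    return [2.0 * math.sqrt(xi) for xi in x]


def fisher_quadratic_form(x, v):
    """Fisher metric at x on (v, v): sum_i v_i^2 / x_i."""
    return sum(vi * vi / xi for xi, vi in zip(x, v))


def pulled_back_length_sq(x, v, eps=1e-6):
    """Squared Euclidean length of the Jacobian applied to v, estimated
    by a central difference of to_sphere along v."""
    up = to_sphere([xi + eps * vi for xi, vi in zip(x, v)])
    down = to_sphere([xi - eps * vi for xi, vi in zip(x, v)])
    return sum(((a - b) / (2 * eps)) ** 2 for a, b in zip(up, down))


def fisher_rao_distance(x, z):
    """Great-circle distance on the radius-2 sphere between the images of x and z.

    The angle between the image vectors has cosine sum_i sqrt(x_i z_i).
    """
    cos_angle = sum(math.sqrt(a * b) for a, b in zip(x, z))
    return 2.0 * math.acos(min(1.0, cos_angle))  # clamp guards against rounding


x = [0.5, 0.3, 0.2]
v = [0.1, -0.04, -0.06]
print(sum(y * y for y in to_sphere(x)))                          # 4, up to rounding
print(pulled_back_length_sq(x, v), fisher_quadratic_form(x, v))  # nearly equal
print(fisher_rao_distance([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))     # 3.141592653589793
```

Two vertices of the simplex end up a quarter of a great circle apart, which on a sphere of radius 2 is a length of $\pi$.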

    @David Your reading is correct -- the $f_i$ need only be defined on the simplex. This is often called "frequency-dependent" fitness. There's no absolute population size in this case and I've given the "standard" formulation. Since we are assuming that the $x_i$ are differentiable, the absolute size is essentially infinite. There is a mapping to a Lotka-Volterra equation (using non-absolute fitnesses) of a lower dimension, however. This is a consequence of a mapping from $\mathbb{R}^n_{+}$ to the simplex given by $z \mapsto z / |z|$, where $|z| = z_1 + \cdots + z_n$. (There's actually a Fisher metric on "non-normalized probability distributions" $\mathbb{R}^n_{+}$ much like the Fisher metric in the simplex.) The replicator equation as John wrote in the link would probably be called something like a Lotka-Volterra equation by others, and if you look lower down on that page, he discusses the "replicator equation on probabilities". In that case, for any choice of the $f_i$, the replicator equation stays in the simplex (just sum the right hand side and see that it's zero always). It's not as big a constraint as you might think.
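A sketch of the projection argument, assuming the counts grow as $\dot z_i = z_i f_i(z/|z|)$ (an illustrative choice; it does not cover the lower-dimensional Lotka-Volterra correspondence). Differentiating $x = z/|z|$ gives $\dot x_i = x_i (f_i(x) - x \cdot f(x))$, the replicator equation, whose right hand side sums to zero for any $f$. The code compares the two numerically for a deliberately arbitrary, partly negative fitness function; all names are hypothetical.

```python
import math


def normalize(z):
    """The map z -> z / |z| from positive counts to the simplex."""
    total = sum(z)
    return [zi / total for zi in z]


def replicator_rhs(x, f):
    """dx_i/dt = x_i (f_i(x) - x . f(x))."""
    fit = f(x)
    mean = sum(xi * fi for xi, fi in zip(x, fit))
    return [xi * (fi - mean) for xi, fi in zip(x, fit)]


def absolute_growth(z, f):
    """Hypothetical count dynamics dz_i/dt = z_i f_i(z / |z|)."""
    return [zi * fi for zi, fi in zip(z, f(normalize(z)))]


def projected_velocity(z, f, h=1e-5):
    """d/dt of z / |z|: derivative of normalize along absolute_growth(z, f),
    estimated by a central difference."""
    dz = absolute_growth(z, f)
    ahead = normalize([zi + h * d for zi, d in zip(z, dz)])
    behind = normalize([zi - h * d for zi, d in zip(z, dz)])
    return [(a - b) / (2 * h) for a, b in zip(ahead, behind)]


def arbitrary_fitness(x):
    """Any smooth functions on the simplex will do; values may be negative."""
    return [x[1] ** 2 - 1.0, math.sin(x[0]), 2.0 * x[2]]


z = [3.0, 5.0, 2.0]
print(projected_velocity(z, arbitrary_fitness))
print(replicator_rhs(normalize(z), arbitrary_fitness))       # same, up to rounding
print(sum(replicator_rhs(normalize(z), arbitrary_fitness)))  # 0, up to rounding
```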

    The gradient with respect to the Fisher metric is taken "on the manifold". This is one of the least scary things from differential geometry, and can be easily defined entirely in terms of the metric. The full right hand side of the replicator equation is that gradient (for $f(x) = Ax$ with a symmetric matrix $A$... more generally see above). The function $g$ introduced by John in the canonical example of alleles at a gene locus is the mean fitness $x \cdot f(x)$, and need only be defined on the simplex in general.
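A minimal sketch of the gradient claim for linear fitness, assuming an arbitrarily chosen symmetric matrix $A$ and the potential $V(x) = \frac12 x \cdot Ax$ (half the mean fitness; using the mean fitness itself doubles the vector field, which only rescales time). The Shahshahani gradient is the tangent vector $u$ satisfying $\sum_i u_i w_i / x_i = \nabla V \cdot w$ for every tangent $w$, which works out to $u_i = x_i(\partial_i V - \sum_j x_j \partial_j V)$; with $\nabla V = Ax$ this is the replicator right hand side. Names are hypothetical.

```python
# Hypothetical symmetric payoff matrix; any symmetric A works here.
A = [[1.0, 2.0, 0.0], [2.0, 0.0, 1.0], [0.0, 1.0, 3.0]]


def fitness(x):
    """Linear fitness f(x) = A x."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def potential(x):
    """V(x) = (1/2) x . A x, half the mean fitness."""
    return 0.5 * sum(xi * fi for xi, fi in zip(x, fitness(x)))


def numerical_gradient(func, x, h=1e-6):
    """Euclidean gradient of func at x by central differences."""
    grad = []
    for i in range(len(x)):
        plus, minus = list(x), list(x)
        plus[i] += h
        minus[i] -= h
        grad.append((func(plus) - func(minus)) / (2 * h))
    return grad


def shahshahani_gradient(x, euclid_grad):
    """Gradient for the metric g_ij = delta_ij / x_i on the simplex.

    The tangent vector u (sum u_i = 0) with sum_i u_i w_i / x_i = euclid_grad . w
    for every tangent w is u_i = x_i (euclid_grad_i - sum_j x_j euclid_grad_j).
    """
    c = sum(xi * gi for xi, gi in zip(x, euclid_grad))
    return [xi * (gi - c) for xi, gi in zip(x, euclid_grad)]


def replicator_rhs(x):
    fit = fitness(x)
    mean = sum(xi * fi for xi, fi in zip(x, fit))
    return [xi * (fi - mean) for xi, fi in zip(x, fit)]


x = [0.2, 0.3, 0.5]
grad_v = numerical_gradient(potential, x)
u = shahshahani_gradient(x, grad_v)
print(u)
print(replicator_rhs(x))  # same vector, up to rounding

# Defining property of the gradient, checked on one tangent direction w.
w = [0.3, -0.1, -0.2]
print(sum(ui * wi / xi for ui, wi, xi in zip(u, w, x)),
      sum(gi * wi for gi, wi in zip(grad_v, w)))
```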

    Unfortunately your boxed statement is not quite right in general -- even if the fitness is positive, the population proportion could go down. What matters is whether the fitness $f_i$ is greater than the mean fitness (or not). But what you said is true for zero-sum games, which have mean fitness zero, like rock-paper-scissors games. But I think something along the lines you suggest could work, and I think talking about the fitness functions more is worth it too.
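A short check of the point about signs, as a sketch with made-up fitness functions: with constant fitnesses $(1, 2)$ both are positive, yet at $x = (1/2, 1/2)$ the mean fitness is $3/2$, so the first type's share falls. In the standard rock-paper-scissors game, $A$ is antisymmetric, $x \cdot Ax = 0$, and the sign of $\dot x_i$ does match the sign of $f_i$.

```python
def replicator_rhs(x, f):
    """dx_i/dt = x_i (f_i(x) - x . f(x))."""
    fit = f(x)
    mean = sum(xi * fi for xi, fi in zip(x, fit))
    return [xi * (fi - mean) for xi, fi in zip(x, fit)]


def positive_constant_fitness(x):
    """Both fitnesses positive, but type 0 is below the mean of 1.5 at (0.5, 0.5)."""
    return [1.0, 2.0]


# Standard rock-paper-scissors: A is antisymmetric, so x . A x = 0 (zero-sum).
RPS = [[0.0, -1.0, 1.0], [1.0, 0.0, -1.0], [-1.0, 1.0, 0.0]]


def rps_fitness(x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in RPS]


print(replicator_rhs([0.5, 0.5], positive_constant_fitness))  # [-0.25, 0.25]

x = [0.5, 0.3, 0.2]
print(rps_fitness(x))                  # signs: negative, positive, negative
print(replicator_rhs(x, rps_fitness))  # same signs, since the mean fitness is 0
```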

  • 23.

    Thanks again Marc. I think it needs to be made clear there is an extra step to get to the sphere, or omit this bit altogether.

  • 24.
    edited December 2013

    With its Fisher metric, the geometry of the triangle

    $$\{(x_1,x_2,x_3): x_i \ge 0, \sum x_i = 1 \}$$ becomes identical to that of an eighth of a sphere

    $$\{(y_1,y_2,y_3): y_i \ge 0, \sum y_i^2 = 1 \}$$ That's all I was saying. Perhaps some people will find it confusing to not describe the precise function that expresses the coordinates $y_i$ in terms of the coordinates $x_i$. But I suspect it's also distracting, and thus confusing (to some other set of people), to write down this formula. To people who actually care, the Wikipedia link contains the desired formula. And it happens to be the very first formula you'd guess: $y_i = \sqrt{x_i}$.

    I just wanted to let people know that the Fisher metric is not a profoundly abstract thing; it's just a way to puff up a simplex to part of a sphere. But this is Marc's article and he should have the final say.

    (By the way, Marc says $y_i = 2 \sqrt{x_i}$, but that's just because he's using another normalization of the Fisher metric; that doesn't matter much in an expository article like this!)

  • 25.
    edited December 2013

    I've edited Marc's post. Everyone take a look and let me know what you think!

    Some points:

    (1) Marc had very briefly mentioned two big results connecting evolutionary dynamics to relative entropy and information geometry:

    • When the fitness landscape is a Euclidean gradient, the replicator equation is a gradient for a special geometry called the Shahshahani geometry (also called the Fisher information geometry).

    • When the fitness landscape has an evolutionarily stable state, the relative entropy of the stable state and the current state decreases to zero as the population trajectory converges to the stable state (a Lyapunov theorem).

    People who are already experts on evolutionary dynamics, relative entropy and information geometry, or who have been closely following the information geometry series, might be able to guess what these sentences mean. However, there are very few such people! Even I had a lot of trouble guessing what it meant for a "landscape" to be a "gradient", or for an "equation" to be a "gradient". (The word "gradient" is being used in two different ways here.)

    So, I decided to massively expand on these points, making them into sections called "Relative entropy as a Lyapunov function" and "The replicator equation as a gradient flow equation". I also inserted a section explaining the replicator equation in more detail.

    The goal in these 3 sections is to review lots of old stuff from the information geometry series, laying the ground for an explanation of Marc's new work. This old stuff is fascinating and for most readers it will be brand new. Then comes a section explaining Marc's actual work, which generalizes this old stuff.
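For readers who want to see the Lyapunov statement in action, here is a hedged sketch with a hypothetical payoff matrix: a rock-paper-scissors variant where a win pays 2 and a loss costs 1, whose center $x^* = (1/3, 1/3, 1/3)$ is an evolutionarily stable state. Along the replicator flow, $\frac{d}{dt} D(x^* \| x) = -(x^* \cdot Ax - x \cdot Ax)$, which for this $A$ equals $-\frac12 \sum_i (x_i - \frac13)^2$, so the relative entropy should decrease at every step of an accurate numerical solution.

```python
import math

# Hypothetical rock-paper-scissors variant: a win pays 2 and a loss costs 1.
# With wins worth more than losses, the center (1/3, 1/3, 1/3) is an
# evolutionarily stable state.
A = [[0.0, -1.0, 2.0], [2.0, 0.0, -1.0], [-1.0, 2.0, 0.0]]
ESS = [1.0 / 3.0] * 3


def replicator_rhs(x):
    fit = [sum(a * xj for a, xj in zip(row, x)) for row in A]
    mean = sum(xi * fi for xi, fi in zip(x, fit))
    return [xi * (fi - mean) for xi, fi in zip(x, fit)]


def rk4_step(x, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1 = replicator_rhs(x)
    k2 = replicator_rhs([a + 0.5 * dt * b for a, b in zip(x, k1)])
    k3 = replicator_rhs([a + 0.5 * dt * b for a, b in zip(x, k2)])
    k4 = replicator_rhs([a + dt * b for a, b in zip(x, k3)])
    return [a + dt * (p + 2 * q + 2 * r + s) / 6.0
            for a, p, q, r, s in zip(x, k1, k2, k3, k4)]


def relative_entropy(p, q):
    """D(p || q) = sum_i p_i log(p_i / q_i)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def entropy_history(x0, dt=0.01, steps=3000):
    """D(ESS || x(t)) at every step of an RK4 trajectory started at x0."""
    x = list(x0)
    history = [relative_entropy(ESS, x)]
    for _ in range(steps):
        x = rk4_step(x, dt)
        history.append(relative_entropy(ESS, x))
    return history


history = entropy_history([0.7, 0.2, 0.1])
print(history[0], history[-1])  # starts near 0.32 and decays toward 0
```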

    (2) David asked Marc to point out that

    the range of the fitness functions can be any real number

    I've done this.

    (3) David asked Marc to point out that

    evolutionarily stable state = zero of the replicator equation.

    I've done this. I added a quick explanation of "evolutionarily stable state" and pointed back to Part 13 of the information geometry series, where this was explained in a lot more detail.

    In general it's always good to link back to previous blog articles when some concept has been explained before, instead of hoping people will remember.

    (4) David asked:

    Can you add a bit more description about the Fisher information metric? Not a full definition (which is what the link is for), but something to give some general color about the metric.

    I gave the full definition and pointed back to part 7 of the information geometry series, where it was explained in more detail.

    (5) David asked:

    Can you give a sentence or two that further interprets why evolutionary movement would be described by a gradient flow, from higher altitudes to lower ones?

    I've explained that in this model (the replicator equation), evolution is a "learning process", where the amount of information "left to learn" keeps decreasing, as long as there exists an evolutionarily stable state. It's possible that a few more very hand-wavy sentences about this subject right near the beginning would help readers get the basic idea. Most readers will be killed off later on as the math intensity increases, so it's good to make important general points right near the start.

    (6) David wrote:

    Suggestion: it would be nice, to reach a broader audience, to add some sentences that at least partially define entropy and relative entropy, at least in this context. My understanding of entropy is vague and entropic.

    I've added the definition of relative entropy and a quick intuitive explanation of what it means. You need to understand this concept to get an inkling of why the Fisher information metric is important, in my opinion.

    (7) Graham wrote:

    The Fisher information metric is derived from a probability distribution, the $p(x, \theta)$ in Wikipedia. I don't understand where $p(x, \theta)$ comes from. Is it derived from $f(x)$ somehow?

    I hope this question is answered clearly in the new version. The probability distribution showing up in the Fisher information metric is none other than the population distribution $p_i$. The probability distribution has nothing to do with the fitness functions $f_i$.

    It's very cute how both population and probability begin with $p$ and are followed by the word "distribution".

    (8) I have played down the use of the term "fitness landscape", since Marc never defined it, and I'm not sure it's helpful to most people when we can equally well talk about the fitness functions $f_i$. However, evolutionary biologists love this term, so maybe we should define the fitness landscape to be the collection of functions $f_i$.

    (9) I have also played down the term "incentive", which comes in the final section, since Marc never defined it and I don't know what it means. If he thinks it's important, he can define it and we can include that definition.

  • 26.
    edited December 2013

    By the way, I'd be really grateful if someone could draw pretty pictures of the triangle

    $$ x + y + z = 1, \qquad x,y,z \ge 0$$ and the eighth-sphere

    $$x^2 + y^2 + z^2 = 1, \qquad x,y,z \ge 0$$ in the same style, showing the 3 coordinate axes (unlabelled) and drawing the surface in some translucent color if possible! I think these pictures could help a lot with explaining the population distribution simplex and the Fisher information metric!

  • 27.

    I made pictures.

  • 28.

    Great - thanks, Graham!

    Sorry to bug you, but no good deed goes unpunished: could you do a version of these pictures where the bottom plane is not shaded light grey, but rather merely white? I found the first picture confusing, since it's supposed to show a triangle, but the light grey makes it look like we're seeing two sides of a tetrahedron—which is also a simplex, but of the wrong dimension.

  • 29.

    The bottom is now white.

  • 30.

    Great, thanks!

  • 31.
    edited January 2014

    It's looking really good.

    A copy-edit:

    In the place where you define a Lyapunov function, you say:

    is a Lyapunov function if

    But shouldn't it be like:

    Then V is a Lyapunov function for this equation if

  • 32.
    edited January 2014

    Ah, that was a typical sort of bug. Leaving out one dollar sign around a displayed equation makes a bunch of text afterward disappear!

    The passage should have been:

    To be a bit more precise, suppose we have a differential equation like

    $$ \frac{d}{d t} x(t) = v(x(t))$$ where $x(t) \in \mathbb{R}^n$ and $v$ is some smooth vector field on $\mathbb{R}^n$. Then a smooth function $$ V : \mathbb{R}^n \to \mathbb{R} $$ is a Lyapunov function if

    But leaving out one dollar sign after the first equation screwed everything up.

  • 33.

    Thanks everyone! It reads very well now. One comment on something John said above:

    The probability distribution has nothing to do with the fitness functions $f_i$.

    There is actually a relationship; in case anyone is interested, it goes as follows: $x_i = e^{v_i - G}$, where $\frac{d}{dt} v_i = f_i$ and $G$, with $\frac{d}{dt} G = x \cdot f(x)$, is the normalizing term. However, it's typically not easy to give closed forms for these auxiliary differential equations, so this isn't talked about much. For one example of a solution, see "On a Notion of Linear Replicator Equations" by Nihat Ay and Ionas Erb.
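    To make this concrete, here is a minimal Python sketch using only the standard library. It integrates the replicator equation $\frac{d}{dt} x_i = x_i (f_i(x) - x \cdot f(x))$ directly. Separately, it integrates the auxiliary system for $v_i$ and $G$, then checks that $e^{v_i - G}$ traces out the same trajectory. The payoff matrix `A` (a linear fitness landscape $f(x) = Ax$), the starting point, the initial conditions $v_i(0) = \ln x_i(0)$, $G(0) = 0$, and the step size are hypothetical choices for illustration. The fixed-step RK4 integrator is just a convenient stand-in.

    ```python
    import math

    # Hypothetical payoff matrix for a linear fitness landscape f(x) = A x.
    # Any smooth f would do; this choice is only for illustration.
    A = [[0.0, -1.0, 2.0],
         [2.0, 0.0, -1.0],
         [-1.0, 2.0, 0.0]]

    def fitness(x):
        """Fitness landscape f(x) = A x."""
        return [sum(a * xj for a, xj in zip(row, x)) for row in A]

    def dot(u, w):
        return sum(a * b for a, b in zip(u, w))

    def replicator_rhs(x):
        """Replicator equation: dx_i/dt = x_i (f_i(x) - x . f(x))."""
        f = fitness(x)
        mean = dot(x, f)
        return [xi * (fi - mean) for xi, fi in zip(x, f)]

    def auxiliary_rhs(state):
        """Auxiliary system: dv_i/dt = f_i(x), dG/dt = x . f(x), with x_i = exp(v_i - G)."""
        *v, g = state
        x = [math.exp(vi - g) for vi in v]
        f = fitness(x)
        return f + [dot(x, f)]

    def rk4(rhs, y, dt, steps):
        """Classical fourth-order Runge-Kutta integration of dy/dt = rhs(y)."""
        for _ in range(steps):
            k1 = rhs(y)
            k2 = rhs([yi + 0.5 * dt * k for yi, k in zip(y, k1)])
            k3 = rhs([yi + 0.5 * dt * k for yi, k in zip(y, k2)])
            k4 = rhs([yi + dt * k for yi, k in zip(y, k3)])
            y = [yi + dt / 6.0 * (a + 2 * b + 2 * c + d)
                 for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]
        return y

    x0 = [0.6, 0.3, 0.1]  # hypothetical initial population state
    dt, steps = 0.01, 500

    # Direct integration of the replicator equation.
    x_direct = rk4(replicator_rhs, x0, dt, steps)

    # Integrate (v, G) from v_i(0) = ln x_i(0), G(0) = 0, then recover x_i = exp(v_i - G).
    *v_end, g_end = rk4(auxiliary_rhs, [math.log(xi) for xi in x0] + [0.0], dt, steps)
    x_aux = [math.exp(vi - g_end) for vi in v_end]

    print(x_direct)
    print(x_aux)
    ```

    The two results should agree up to integration error. Differentiating $x_i = e^{v_i - G}$ gives $\frac{d}{dt} x_i = x_i (f_i - x \cdot f(x))$, which is exactly the replicator equation. Equivalently, $G = \ln \sum_j e^{v_j}$, which is why it acts as the normalizing term.
    
    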

    I'm using "fitness landscape" as the name for the functions $f$, as is typical in EGT, but other people and fields use different definitions. So I think it's fine to leave it a little vague. The same goes for the definition of incentive: anyone interested enough can look into the reference at the end.

    I made a few very minor changes:

    (1) The Hernando et al. paper doesn't actually use replicator equations, but rather absolute-population versions with $q \neq 1$ scaling.

    (2) Most of the results discussed in my "Information geometry and EGT" paper aren't originally mine, so I specified that it's a survey.

    (3) I changed "simplex algorithm" to just "linear programming" to avoid confusion with the other uses of simplex.

  • 34.

    Great! I'll post this one in about a week. On Wednesday, January 15th, I'm leaving for Germany for the rest of the winter and spring, so I'll post either right before that or a couple of days afterward.

  • 35.
    edited January 2014

    Marc Harper needed some extra time, since he was busy at a conference, but the blog post has now appeared:

    Relative entropy in evolutionary dynamics: http://johncarlosbaez.wordpress.com/2014/01/22/relative-entropy-in-evolutionary-dynamics/

    Thanks to everyone who helped, especially David Tanzer and Graham Jones!