
Tutorial on stochastic Petri nets, with SIR disease model as example

edited March 27 in Epidemiology

This model tracks the evolution of three sub-populations:

  • S(t) = susceptible individuals at time t
  • I(t) = infected individuals
  • R(t) = recovered individuals

For a general Petri net, these are called species.

There is a process called infection, which takes as "inputs" one infected individual and one susceptible individual, and "outputs" two infected individuals. This process can be written:

\[\mathit{infection}: S + I \rightarrow I + I\]

Another process called recovery inputs an infected individual and outputs a recovered individual:

\[\mathit{recovery}: I \rightarrow R\]

The processes are also referred to as transitions, or reactions.

Each process has an associated transition rate, which is a kind of "speed coefficient" for the reaction.

Comments

  • 1.
    edited March 27

    For a stochastic Petri net, the size of each species is given by an (evolving) non-negative integer.

    The state of the net is therefore given by a triple (s,i,r) of natural numbers. In other words, the state space is \(\mathbb{N}^3\).

    The transitions fire at random times, and have the effect of decreasing their input populations and increasing their output populations by the indicated amounts.

    One firing of recovery has the effect of decreasing \(I\) by 1 and increasing \(R\) by 1.

    Formally, one firing of infection would decrease \(S\) by 1, decrease \(I\) by 1, and increase \(I\) by 2. But netting this out, we see that one firing of infection decreases \(S\) by 1 and increases \(I\) by 1.

    For example, suppose we were in state \(x = (5,2,1)\).

    Then if infection fired, we would move to \(x' = (4,3,1)\).

    Then if recovery fired, we would move to \(x'' = (4,2,2)\).
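    To make the firing mechanics concrete, here is a minimal Python sketch of these state updates (the names `DELTAS` and `fire` are illustrative, not from any library):

```python
# Sketch of SIR state updates (names DELTAS and fire are illustrative).
# A state is a triple (s, i, r); each transition has a fixed delta vector
# that is applied when it fires.
DELTAS = {
    "infection": (-1, +1, 0),   # S + I -> I + I, netted out
    "recovery":  (0, -1, +1),   # I -> R
}

def fire(state, transition):
    """Apply one firing of `transition` to `state`."""
    new_state = tuple(n + d for n, d in zip(state, DELTAS[transition]))
    if any(n < 0 for n in new_state):
        raise ValueError("not enough input tokens to fire " + transition)
    return new_state

x = (5, 2, 1)
x1 = fire(x, "infection")   # -> (4, 3, 1)
x2 = fire(x1, "recovery")   # -> (4, 2, 2)
```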

  • 2.
    edited March 28

    Each transition is modeled as a Poisson process with a variable rate parameter.

    Specifically, the firing rate of the process is equal to its rate constant times the product of its input population counts.

    (EDIT: this is only true to a first approximation, for large population sizes. The refinement is described in comment 14.)

    Suppose in our example, that the transition rate for infection was 2.1, and the transition rate for recovery was 7.9.

    Then in state \(x = (5,2,1)\):

    • The firing rate for infection would be \(2.1 \cdot S \cdot I = 2.1 \cdot 5 \cdot 2 = 21\).

    • The firing rate for recovery would be \(7.9 \cdot I = 7.9 \cdot 2 = 15.8\).

    And to say that the firing rate for a process is \(z\) means the following: in an infinitesimal time interval \(dt\), the probability of the process firing is \(z \, dt\).

    I.e., in the limit as \(dt\) goes to 0, \(\mathrm{Prob}(\mathit{firing})/dt = z\).
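    A quick sketch of this rate computation in Python (`firing_rate` is an illustrative helper, not a library function):

```python
# Firing rates in state x = (5, 2, 1), using the first-approximation
# law of mass action: rate constant times product of input counts.
# (firing_rate is an illustrative helper, not a library function.)
def firing_rate(rate_constant, input_counts):
    rate = rate_constant
    for n in input_counts:
        rate *= n
    return rate

s, i, r = 5, 2, 1
infection_rate = firing_rate(2.1, [s, i])   # 2.1 * 5 * 2 = 21.0
recovery_rate = firing_rate(7.9, [i])       # 7.9 * 2 = 15.8
```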

  • 3.
    edited March 28

    That the firing rates are proportional to the input population sizes is called the law of mass action.

    It makes sense that the firing rate of infection will be proportional to \(S \cdot I\).

    (EDIT: this is only true to a first approximation, for large population sizes. The refinement is described in comment 14.)

    Before an epidemic has begun, \(I(t)\) is small, so infections occur at a low rate. As \(I\) begins to increase, the rate of infection increases proportionally. This is the exponential growth at the outset.

    As the epidemic progresses, \(S\) goes down, which has the effect of reducing the base of the exponential.

    At the tail end of the epidemic, the combination of reduced values for \(S\) and \(I\) pulls the infection rate down.

    These dynamics are responsible for the sigmoidal shape of the cumulative infection curve.
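    The whole rise-and-fall dynamic can be seen by sampling the process directly, for example with a minimal Gillespie-style simulation; the parameter values here are illustrative:

```python
import random

# Minimal Gillespie-style simulation of the stochastic SIR net.
# Parameter values (beta, gamma, populations) are illustrative.
def simulate_sir(s, i, r, beta=0.01, gamma=0.5, seed=0):
    rng = random.Random(seed)
    t = 0.0
    while i > 0:
        infection_rate = beta * s * i
        recovery_rate = gamma * i
        total = infection_rate + recovery_rate
        t += rng.expovariate(total)            # waiting time to next firing
        if rng.random() < infection_rate / total:
            s, i = s - 1, i + 1                # infection fires
        else:
            i, r = i - 1, r + 1                # recovery fires
    return t, (s, i, r)

t_end, final = simulate_sir(995, 5, 0)         # runs until I(t) hits 0
```

    Each run is one random trajectory; plotting \(I(t)\) over many runs shows the sigmoidal cumulative infection curve described above.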

  • 4.
    edited March 28

    For more complex examples of stochastic Petri nets, see:

    • Monika Heiner, From Petri nets to differential equations: an integrative approach for biochemical network analysis, slides, Brandenburg University of Technology Cottbus, Dept. of CS, May 2005. https://www-dssz.informatik.tu-cottbus.de/publications/slides/2005_muenchen_msbf.sld.pdf

    There is one net for a chemical reaction:

    \[2 NAD^+ + 2 H_2 O \rightarrow 2 NADH + 2 H^+ + O_2\]

    Here the species are molecules, and the transitions are chemical reactions.

    Note that this uses the standard graphical notation for Petri nets, which is a directed bipartite graph, with circles for the species and squares for the transitions. There is a directed edge from a circle to a square whenever the corresponding species is an input to the corresponding transition, and similarly an edge from a square to a circle whenever the transition outputs to the species.

    Those slides also contain a more complex Petri net, for a biochemical reaction network (the "RKIP pathway").
  • 5.
    edited March 27

    For each time point \(t\), we have three random variables \(S(t)\), \(I(t)\) and \(R(t)\).

    Together, they constitute a vector-valued stochastic process, over the continuous time parameter \(t\). Here, \(V(t) = (S(t), I(t), R(t))\).

    It is a Markov process.

    These statements are true for all stochastic Petri nets.

  • 6.
    edited March 27

    We may inquire about the expected values \(E(S(t))\), \(E(I(t))\) and \(E(R(t))\).

    These together form another vector-valued process, consisting of the expected values of the populations at each point in time.

    But this process is deterministic - for each point in time, we have three scalar values, one for each species.

  • 7.

    The mathematical derivation of this deterministic mean-value process from the definitional properties of stochastic Petri nets - this gets into some deeper mathematical waters, which we won't cover here.

    Here is a whole book devoted to the subject:

    • John C. Baez and Jacob Biamonte, Quantum Techniques for Stochastic Mechanics, arXiv:1209.3632 [quant-ph]. The text includes a ground-up treatment of Petri nets, both stochastic and deterministic. Examples include the SI, SIR and SIRS models.

    We can look into this in another context - in the network theory study group!

  • 8.

    Fortunately, it is not necessary to go through first principles in order to understand the deterministic mean process for a stochastic Petri net.

    In fact, it turns out to be a rather straightforward and intuitive construction. It is given by ordinary differential equations, which can be directly constructed, using the graph structure of the Petri net and the transition coefficients.

  • 9.

    While the population counts are integers, their expected values are real.

    Hence, for the deterministic process, we have moved into the realm of "continuous Petri nets."

    Whereas for stochastic Petri nets, we can visualize the species as places that hold discrete tokens, now they are places that hold continuous magnitudes.

    Picture them as buckets holding a certain amount of "numerical fluid." The processes are then continuous pumps, draining their input buckets, and filling their output buckets.

    An analogous law of mass action still holds: the pumping rate of a process is equal to its transition coefficient times the product of the values present at the input buckets.
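    A minimal sketch of this bucket-and-pump picture as explicit Euler integration, with illustrative coefficients and step size (this effectively previews the SIR rate equations):

```python
# The bucket-and-pump picture as explicit Euler integration.
# Coefficients beta, gamma and the step size dt are illustrative.
def step(s, i, r, beta=0.01, gamma=0.5, dt=0.001):
    infection_flow = beta * s * i    # pump draining S, filling I
    recovery_flow = gamma * i        # pump draining I, filling R
    s -= infection_flow * dt
    i += (infection_flow - recovery_flow) * dt
    r += recovery_flow * dt
    return s, i, r

state = (995.0, 5.0, 0.0)
for _ in range(100_000):             # integrate out to t = 100
    state = step(*state)
# The total s + i + r is conserved, and i ends near zero.
```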

  • 10.
    edited March 27

    This leads us to the standard presentation of the SIR model for epidemics, which is deterministic and based on ordinary differential equations. See:

    • Compartmental models in epidemiology, Wikipedia. https://en.wikipedia.org/wiki/Compartmental_models_in_epidemiology

    When dealing with the large scales of human populations, we project the trends of the expected values, and don't track all of the micro-variations.
  • 11.

    On the other hand, at smaller scales, the non-deterministic structure of a stochastic Petri net becomes more interesting. See for example:

    • Gang Wang, Ying Zhang, Simon J. Shepherd, Clive B. Beggs, Nini Rao, Application of stochastic Petri nets for modelling the transmission of airborne infection in indoor environment, Acta Medica Mediterranea, 2016, 32: 587.
  • 12.
    edited March 27

    In fact, though, local behavior does make a difference in the evolution of epidemics. For example, just look at the variation in epidemic characteristics across countries for COVID-19.

    Perhaps one way to get at this would be to analyze a population into local zones, where each zone has its own \(S, I, R\) populations. Each zone could have its own transition coefficients, depending on local conditions.

    Then there would also be transitions representing the flow of individuals due to travel between zones.

    Not sure whether this has been done, or whether it would be fruitful, but it's a concept.

  • 13.
    edited March 28

    Exercises and notes:

    • Work out the differential equations for the SIR model, based on the pumping principles and the law of mass action described in comment 9. You can check your answers against the SIR equations given in the Wikipedia article on Compartmental models in epidemiology (https://en.wikipedia.org/wiki/Compartmental_models_in_epidemiology).

    • Do the same for the simpler SI model, which is obtained from SIR by removing R and the recovery transition. Compare your results to the logistic equation.

    • See this discussion on the forum, regarding the logistic equation and COVID-19: https://forum.azimuthproject.org/discussion/377/logistic-equation-with-recent-commentary-on-corona-virus
  • 14.
    edited March 28

    Above I wrote:

    Specifically, the firing rate of the process is equal to its rate constant times the product of its input population counts.

    And:

    That the firing rates are proportional to the input population sizes is called the law of mass action.

    I see this needs to be corrected. This product rule holds only when none of the input species to a transition occur with multiplicity more than one. The general case is given by a modified product rule.

    The operative principle here is explained in:

    • John C. Baez and Jacob Biamonte, Quantum Techniques for Stochastic Mechanics, arXiv:1209.3632 [quant-ph].

    The modified product rule actually follows from a more abstract principle.

    Picture the state space of the Petri net as a graph with node set \(\mathbb{N}^k\), where \(k\) is the number of species.

    Each transition induces a (possibly infinite) collection of edges in this graph. These edges are generated by the "delta vector" produced by one firing of the transition.

    For example, in the SI model, the delta vector is (-1,1).

    Note: the technical name is not delta vector but stoichiometric vector. So I took some poetic license.

    Furthermore, each edge \(e\) from state \(x\) to state \(y\) gets weighted by the number of ways of moving directly from \(x\) to \(y\).

    And this counting procedure is predicated on the idea that the individuals in the population are distinguishable.

    For example, suppose that we had a transition:

    \[t: A + B \rightarrow Z\]

    The delta vector for \(t\) is (-1, -1, +1).

    For example, continuing with this transition, suppose that edge \(e\) goes from \(x = (2,3,6)\) to \(y = (1,2,7)\).

    The underlying procedure for a single step of the transition involves choosing one of the 2 members of A and one of the 3 members of B. So there are 6 ways that this transition can be performed, and the weight for the edge is 6. There you have the product rule.

    But now consider the case of:

    \[t: A + A \rightarrow X\]

    Now the delta vector is (-2, +1).

    Now the underlying procedure involves choosing a member of species A, and then choosing a second member of A from the population that remains after the first choice.

    So if \(e\) is an edge from \(x\) to \(y\), and state \(x\) contains \(k\) members of A, then the weight given to \(e\) is \(k \cdot (k-1)\).

    This is called the falling power.

    The modified product rule then is clear: take the product of falling powers, where there is one falling power for each distinct species that occurs as input to a transition.

    The law of mass action -- which simply uses the product rule -- follows as a limiting approximation for large population sizes.
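    A sketch of falling powers and the modified product rule in Python (function and variable names are illustrative):

```python
# Falling powers and the modified product rule (illustrative names).
def falling_power(n, k):
    """n * (n - 1) * ... * (n - k + 1), a product of k factors."""
    result = 1
    for j in range(k):
        result *= n - j
    return result

# t: A + A -> X with k = 5 members of A: ordered choices of two
# distinct members, so 5 * 4 = 20, not 5 * 5 = 25.
ways_aa = falling_power(5, 2)

# t: A + B -> Z with 2 members of A and 3 of B: each input has
# multiplicity 1, so this reduces to the plain product rule, 2 * 3 = 6.
ways_ab = falling_power(2, 1) * falling_power(3, 1)
```

    For large \(n\) and small \(k\), \(n^{\underline{k}} \approx n^k\), which is why the plain product rule is a good approximation at large population sizes.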

  • 15.
    edited March 29

    Master Equation

    The 'master equation' determines the probabilistic structure of the process for a Petri net.

    To do this, first we'll review how a stochastic Petri net gives rise to a continuous time Markov chain (CTMC).

    Then we can apply the general concept of the master equation for CTMCs to the case of Petri nets.

  • 16.
    edited March 29

    Say that our Petri net has a set of species S.

    Let a definite state be a mapping from S into \(\mathbb{N}\), giving the sizes of all the populations.

    So the set of all definite states is \(\mathbb{N}^S\).

    Let a stochastic state be a probability distribution over \(\mathbb{N}^S\).

  • 17.

    Here is the problem at hand, which the master equation will address.

    Suppose that we know that at time 0, the net is in a stochastic state \(s_0\).

    Then at time \(t\), what stochastic state will the net be in?

  • 18.
    edited March 29

    First, let's tighten up the construction of a Markov chain for a given Petri net.

    Here is a general recipe for constructing a CTMC. Form a labeled, directed graph which has Nodes = the definite states, and a directed edge from \(x\) to \(y\) whenever \(y\) can be reached from \(x\) in one "hop." The edge will get labeled with the transition rate for this hopping.

    That is what we did above in comment 14, when we constructed a directed, labeled graph with Nodes = \(\mathbb{N}^S\) = the definite states.

    Note: here we have a graph with a countably infinite number of nodes. However, things are simplified by the fact that each node has only a finite number of neighbors.

    Specifically, for our Petri net graph, there is a directed edge from \(x \in \mathbb{N}^S\) to \(y \in \mathbb{N}^S\) whenever \(y\) can be reached from \(x\) by a single firing of one of the transitions.

    And this holds whenever \(y - x\) equals the stoichiometric vector ("delta vector") for one of the transitions.

    If multiple transitions have the same stoichiometric vector \(y - x\), then we will introduce, as an intermediate construction, multiple edges from \(x\) to \(y\).

  • 19.
    edited March 30

    Let's get more specific now.

    Let \(t\) be a transition, and Stoich(\(t\)) be its stoichiometric vector.

    For a species \(z\), let InpDegree(\(t,z\)) be the multiplicity with which \(t\) takes \(z\) as input, and OutDegree(\(t,z\)) the multiplicity with which \(t\) outputs to \(z\).

    Let \(x \in \mathbb{N}^S\) be a definite state, and let \(x' = x\) + Stoich(\(t\)).

    If \(x'\) contains any negative components, then it is outside of \(\mathbb{N}^S\), and it is not a valid state. That indicates that the transition \(t\) doesn't have a sufficient number of input tokens in \(x\) to fire.

    If however \(x'\) does belong to \(\mathbb{N}^S\), then we add an edge \(e(t,x,x')\) from \(x\) to \(x'\) in the graph.
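    This validity check is easy to sketch in code (`try_fire` is an illustrative name):

```python
# Edge construction sketch: apply a transition's stoichiometric vector
# to a definite state, and only create an edge when the result stays
# inside N^S. (try_fire is an illustrative name.)
def try_fire(x, stoich):
    x_prime = tuple(a + b for a, b in zip(x, stoich))
    if any(n < 0 for n in x_prime):
        return None          # insufficient input tokens: no edge
    return x_prime           # valid target state x' for edge e(t, x, x')

recovery_stoich = (0, -1, +1)                   # I -> R in the SIR net
ok = try_fire((4, 3, 1), recovery_stoich)       # (4, 2, 2)
blocked = try_fire((4, 0, 4), recovery_stoich)  # None: no infected left
```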

  • 20.
    edited March 30

    Here we define the weight for edge \(e(t,x,x')\):

    • Weight(e(t,x,x')) = RateConstant(t) * \(\prod_{\mathit{species} \in S}\) FallingPower(Population(x,species), InpDegree(t, species))

    Where:

    • FallingPower(n, k) = n * (n-1) * ... * (n - k + 1)

    This gives the "rate of probability flow" through the edge.
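    The weight formula can be sketched directly as a function (illustrative names; the state is represented as a dict from species to population counts):

```python
# The edge-weight formula as a function (illustrative names). The state
# is a dict from species to population count, and inp_degree gives each
# species' input multiplicity for the transition.
def falling_power(n, k):
    result = 1
    for j in range(k):
        result *= n - j
    return result

def edge_weight(rate_constant, population, inp_degree):
    weight = rate_constant
    for species, count in population.items():
        weight *= falling_power(count, inp_degree.get(species, 0))
    return weight

# Infection in the SIR net takes one input from S and one from I:
w = edge_weight(2.1, {"S": 5, "I": 2, "R": 1}, {"S": 1, "I": 1})
# = 2.1 * 5 * 2
```

    Species that are not inputs to the transition contribute FallingPower(n, 0) = 1, so they drop out of the product, as they should.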

  • 21.
    edited March 30

    Just two more steps to wrap up the construction of the Markov chain graph for our Petri net.

    First, form the union of all of these per-transition graphs.

    Second, wherever this union leads to multiple edges going from \(x\) to \(x'\), then sum their weights and replace with a single edge.

    That's it.

  • 22.
    edited March 31

    Once we have a graph labeled with transition rates, we can proceed to the master equation, which gives a deterministic law for how the stochastic state changes over time -- once an initial stochastic state is specified.

    Let \(\Gamma_{\sigma}(t)\) be the stochastic state at time \(t\), given \(\sigma\) as the initial stochastic state.

  • 23.
    edited March 31

    The master equation for the Markov chain is a vector differential equation:

    \[\Gamma_{\sigma}'(t) = H(\Gamma_{\sigma}(t))\]

    where \(H\) is a linear operator to be constructed from the weighted graph for the Markov chain.

  • 24.
    edited March 31

    To make this all work smoothly, let's situate the stochastic states inside of a containing vector space.

    Let D be the definite states -- these are the nodes in the transition graph.

    The vector space that we will work within is the space of real-valued functions on D, i.e. \(\mathbb{R}^D\).

    Let \(\Delta \subset \mathbb{R}^D\) be the simplex in \(\mathbb{R}^D\), meaning the functions taking values in [0,1] whose values sum to 1.

    \(\Delta\) is all the probability distributions over D, all the stochastic states.

  • 25.
    edited March 31

    Now we can write:

    \[\Gamma_{\sigma}: [0,\infty) \rightarrow \Delta \subset \mathbb{R}^D\]

    \[\Gamma_{\sigma}(0) = \sigma\]

    \[\Gamma_{\sigma}': [0,\infty) \rightarrow \mathbb{R}^D\]

  • 26.
    edited March 31

    For definite states \(i, j \in D\), define:

    \[H_{i,j} = (\Gamma_{j}'(0))(i)\]

    This is the instantaneous rate of change of the probability of being in state \(i\), given that the chain began in state \(j\).

    Loosely speaking, it is the rate at which probability flows from definite state \(j\) to definite state \(i\).

  • 27.
    edited March 31

    This is exactly the same concept of flow rates that we used for the edge weights in the directed graph for the Markov chain.

    The set of values \(H_{i,j}\) forms a square matrix, which is infinite when \(D\) is infinite. The values can be read off directly from the graph: \(H_{i,j}\) = the weight of the edge from \(j\) to \(i\) if that edge exists in the graph, else 0.

    The weighted graph is a specification for the system of coefficients \([H_{i,j}]\) that determine the master equation.

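    A minimal sketch of this read-off (the three-state chain and its rates are hypothetical; numpy is assumed): store the weighted graph as a dict of edge rates, fill in the off-diagonal entries from the edge weights, and set each diagonal entry so that the column sums to zero, anticipating the infinitesimal stochastic condition discussed below. The convention here is that the entry in row \(i\), column \(j\) is the flow rate from \(j\) to \(i\).

    ```python
    import numpy as np

    # Hypothetical weighted graph for a 3-state Markov chain:
    # key (j, i) = an edge carrying probability flow from state j to state i.
    states = [0, 1, 2]
    rates = {(0, 1): 2.0, (1, 2): 1.0, (2, 0): 0.5}

    def hamiltonian(states, rates):
        """Build the matrix [H_ij] from the weighted graph."""
        n = len(states)
        H = np.zeros((n, n))
        for (j, i), w in rates.items():
            H[i, j] = w  # off-diagonal entry: weight of the edge from j to i
        for j in range(n):
            # diagonal entry: negative sum of the off-diagonals in column j
            H[j, j] = -(H[:, j].sum() - H[j, j])
        return H

    H = hamiltonian(states, rates)
    ```

    With these rates, for example, the entry \(H_{1,0} = 2\) records the flow from state 0 to state 1, and \(H_{0,0} = -2\) balances its column.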
  • 28.
    edited March 31

    Let's look at the master equation again:

    \[\Gamma_{\sigma}'(t) = H(\Gamma_{\sigma}(t))\]

    The linear operator \(H\) is just matrix multiplication by \([H_{i,j}]\).

    Instantiating at \(t = 0\), we get:

    \[\Gamma_{\sigma}'(0) = H(\Gamma_{\sigma}(0)) = H(\sigma)\]

    For a definite state \(j\) (identifying \(j\) with the corresponding corner of \(\Delta\)), this gives us:

    \[\Gamma_{j}'(0) = H(\Gamma_{j}(0)) = H(j)\]

    which equals the \(j\)th column of \([H_{i,j}]\).

  • 29.

    Recap: for each definite state \(j\), the jth column of \([H_{i,j}]\) equals the derivative of the stochastic state when the chain is started in \(j\).

  • 30.
    edited March 31

    But for the master equation to be valid, \([H_{i,j}]\) can't be any old infinite square matrix.

    There are two criteria to be satisfied.

    Look at the matrix multiplication involved in computing \(H(\sigma)\), for a general stochastic state \(\sigma\). This is an infinite sum, which is required to converge for \(H(\sigma)\) to be well defined.

    A sufficient condition to guarantee this convergence is that in the graph for the Markov chain, each node has only a finite number of neighbors. From this it follows that the rows (and columns) will only have a finite number of nonzero entries, which makes \(H(\sigma)\) well-defined.

  • 31.
    edited March 31

    The second condition on \([H_{i,j}]\) is that it must be infinitesimal stochastic, which means this:

    • All the off-diagonal elements are non-negative
    • Each column sums to zero
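    Both conditions are easy to check mechanically. A small sketch (numpy assumed; the example matrices are hypothetical):

    ```python
    import numpy as np

    def is_infinitesimal_stochastic(H, tol=1e-12):
        """Off-diagonal entries non-negative, and each column sums to zero."""
        H = np.asarray(H, dtype=float)
        off_diagonal = H - np.diag(np.diag(H))
        return bool(np.all(off_diagonal >= -tol)
                    and np.all(np.abs(H.sum(axis=0)) < tol))

    # Valid: probability flows 0 -> 1 at rate 2 and 1 -> 0 at rate 1.
    H_good = np.array([[-2.0, 1.0],
                       [2.0, -1.0]])

    # Invalid: the second column sums to 1, not 0.
    H_bad = np.array([[-2.0, 1.0],
                      [2.0, 0.0]])
    ```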
  • 32.
    edited March 31

    The reason for this requirement stems from the master equation:

    \[\Gamma_{\sigma}'(t) = H(\Gamma_{\sigma}(t))\]

    Since \(\Gamma(t)\) must always remain in the simplex \(\Delta\) -- i.e. must always be a probability distribution -- a big constraint is placed on \(\Gamma'(t)\).

    Specifically, \(\Gamma'(t)\), viewed as a tangent vector, must have no component normal to the "plane of the simplex." More precisely, it must be perpendicular to the vector normal to the simplex, which is the function mapping every definite state in \(D\) to 1.

    And to say that a tangent vector is perpendicular to this is to say that the sum of the components of the tangent vector is 0.

    Now the jth column of the matrix for H is the tangent vector \(\Gamma_{j}'(0)\).

    Hence each column must sum to 0.

  • 33.

    The requirement that the off-diagonal elements of the jth column be non-negative follows from the geometry of the simplex, as per the following.

    Observe that \(\Gamma_j(0) = j\) = the corner of the simplex which is 1 at j, and 0 elsewhere.

    The derivative \(\Gamma'_j(0)\) is a tangent vector which must lead from this corner further into the simplex.

    (Or else the tangent vector is zero.)

    So it may decrease the \(j\)th component, which starts at 1, but it cannot decrease any of the other components, which start at 0 -- their derivatives must be non-negative.

    The latter is exactly the statement that the off-diagonal elements of the \(j\)th column are all non-negative.

    That shows why H must be infinitesimal stochastic.

    Putting the two conditions together, we see that each diagonal entry is the negative of the sum of the off-diagonal entries in its column.

  • 34.

    And we can see this requirement concretely as well.

    If you start in state j, the probability of being in the other states reachable from j will start increasing from 0, and the probability of remaining in state j will start decreasing from 1.

    (Unless it is an absorbing state.)

    Because the probabilities of being somewhere will always total to 1, the sum of these derivatives must equal zero.

  • 35.
    edited April 4

    Solution to the master equation

    Given

    \[\Gamma: [0,\infty) \rightarrow \mathbb{R}^D\]

    and a linear operator

    \[H: \mathbb{R}^D \rightarrow \mathbb{R}^D\]

    the equation to be solved is

    \[\Gamma'(t) = H(\Gamma(t))\]

  • 36.
    edited April 4

    So it's a vector-valued linear differential equation.

    The solution is a natural generalization from the case where \(D\) contains just one element. In that case, the linear operator

    \[H: \mathbb{R}^1 \rightarrow \mathbb{R}^1\]

    amounts to multiplication by a constant \(H\), and the master equation takes the simple form:

    \[\Gamma'(t) = H \cdot \Gamma(t)\]

    which is an elementary differential equation, with solution:

    \[\Gamma(t) = e^{H t}\ \Gamma(0)\]

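    In the scalar case the claimed solution can be checked numerically, for instance against a forward-Euler integration of \(\Gamma'(t) = H \cdot \Gamma(t)\) (the rate, initial condition, and step size below are hypothetical):

    ```python
    import math

    H = -0.7          # a hypothetical scalar "rate" constant
    gamma0 = 1.0      # initial condition Gamma(0)
    t_end, dt = 2.0, 1e-5

    # Forward Euler: Gamma(t + dt) ~= Gamma(t) + dt * H * Gamma(t)
    gamma = gamma0
    for _ in range(int(t_end / dt)):
        gamma += dt * H * gamma

    # Exact solution: Gamma(t) = e^{H t} Gamma(0)
    exact = math.exp(H * t_end) * gamma0
    ```

    With a small enough step size the Euler iterate agrees with the closed-form exponential to within the integration error.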
  • 37.
    edited April 4

    This completely generalizes to the vector case, and the solution to the master equation is written in exactly the same way:

    \[\Gamma(t) = e^{H t}\ \Gamma(0)\]

    Now \(H\) is the matrix for the linear operator, rather than a scalar constant.

  • 38.
    edited April 4

    Rewriting slightly, we have:

    \[\Gamma(t) = e^{t H}\ \Gamma(0)\]

    Things get more interesting here, as we are taking the exponential of a matrix.

  • 39.
    edited April 4

    The matrix exponential is defined by the same formula as the scalar exponential, that is by the power series:

    \[e^X = \frac{1}{0!} X^0 + \frac{1}{1!} X^1 + \frac{1}{2!} X^2 + \cdots = I + X + \frac{1}{2!} X^2 + \cdots\]

    where \(I\) is the identity matrix.

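    A truncated version of this series is straightforward to compute. As a sanity check (numpy assumed; the truncation order and test matrix are hypothetical), for a diagonal matrix the result should agree with exponentiating the diagonal entries:

    ```python
    import numpy as np

    def expm_series(X, terms=30):
        """Matrix exponential via the truncated power series
        e^X = I + X + X^2/2! + ... (adequate for small matrices)."""
        n = X.shape[0]
        result = np.eye(n)
        term = np.eye(n)
        for k in range(1, terms):
            term = term @ X / k       # running term X^k / k!
            result = result + term
        return result

    # For a diagonal matrix, e^X is the diagonal of entrywise exponentials.
    X = np.diag([1.0, -2.0])
    E = expm_series(X)
    ```

    Production code would use a library routine (e.g. a Padé-based matrix exponential) rather than the raw series, but the series is the definition.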
  • 40.
    edited April 4

    Via matrix multiplication, the matrix \(e^{t H}\) is applied to the initial vector \(\Gamma(0)\) to get the vector \(\Gamma(t)\).

    It is a theorem that whenever \(H\) is infinitesimal stochastic (see comment #31), the operator \(e^{t H}\) will map stochastic states to stochastic states:

    \[e^{t H}: \Delta \rightarrow \Delta\]

    Since \(\Gamma(0) \in \Delta\), it follows that \(\Gamma(t) = e^{t H} \Gamma(0) \in \Delta\).

    In other words, H being infinitesimal stochastic guarantees that the path of \(\Gamma\) lies within the simplex \(\Delta\) -- which is to be expected, as \(\Gamma\) is supposed to describe the evolution of the probability distribution.

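    This theorem can be illustrated numerically: exponentiate a small infinitesimal stochastic matrix and check that the result is a stochastic matrix, i.e. has non-negative entries and columns summing to 1 (numpy assumed; the matrix \(H\), time \(t\), and series truncation are hypothetical):

    ```python
    import numpy as np

    def expm_series(X, terms=40):
        """Truncated power series for the matrix exponential."""
        result, term = np.eye(X.shape[0]), np.eye(X.shape[0])
        for k in range(1, terms):
            term = term @ X / k
            result = result + term
        return result

    # An infinitesimal stochastic matrix: columns sum to 0, off-diagonals >= 0.
    H = np.array([[-1.0, 2.0],
                  [1.0, -2.0]])
    t = 0.8
    U = expm_series(t * H)        # the operator e^{tH}

    sigma = np.array([0.3, 0.7])  # a stochastic state in Delta
    sigma_t = U @ sigma           # Gamma(t) = e^{tH} Gamma(0)
    ```

    The columns of \(e^{tH}\) sum to 1 because the columns of \(H\) sum to 0, so applying \(e^{tH}\) to any probability distribution yields another probability distribution.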
  • 41.
    edited April 5

    Expanding, we get:

    \[\Gamma(t) = e^{t H} \Gamma(0) = (I + tH + \frac{1}{2!} t^2 H^2 + \cdots)\ \Gamma(0) = \Gamma(0) + tH\ \Gamma(0) + \frac{1}{2!} t^2 H^2 \Gamma(0) + \cdots \]
