It looks like you're new here. If you want to get involved, click one of these buttons!

- All Categories 2.3K
- Chat 501
- Study Groups 21
- Petri Nets 9
- Epidemiology 4
- Leaf Modeling 2
- Review Sections 9
- MIT 2020: Programming with Categories 51
- MIT 2020: Lectures 20
- MIT 2020: Exercises 25
- MIT 2019: Applied Category Theory 339
- MIT 2019: Lectures 79
- MIT 2019: Exercises 149
- MIT 2019: Chat 50
- UCR ACT Seminar 4
- General 69
- Azimuth Code Project 110
- Statistical methods 4
- Drafts 9
- Math Syntax Demos 15
- Wiki - Latest Changes 3
- Strategy 113
- Azimuth Project 1.1K
- - Spam 1
- News and Information 148
- Azimuth Blog 149
- - Conventions and Policies 21
- - Questions 43
- Azimuth Wiki 714

Options

In this model, this model tracks the evolution of three sub-populations:

- S(t) = susceptible individuals at time t
- I(t) = infected individuals
- R(t) = recovered individuals

For a general Petri net, these are called *species*.

There is a process called *infection*, which takes as "inputs" one infected individual and one susceptible individual, and "outputs" two infected individuals. This process can be written:

\[\mathit{infection}: S + I \rightarrow I + I\]

Another process called *recovery* inputs an infected individual and outputs a recovered individual:

\[\mathit{recovery}: I \rightarrow R\]

The processes are also referred to as transitions, or reactions.

Each process has an associated *transition rate*, which is a kind of "speed coefficient" for the reaction.

## Comments

For a stochastic Petri net, the size of each species is given by an (evolving) non-negative integer.

The state of the net is therefore given by a triple (s,i,r) of natural numbers. In other words, the state space is \(\mathbb{N}^3\).

The transitions fire at random times, and have the effect of decreasing their input populations and increasing their output populations by the indicated amounts.

One firing of recovery has the effect of decreasing \(I\) by 1 and increasing \(R\) by 1.

Formally, one firing of infection would decrease \(S\) and \(I\) by, and increase \(I\) by 2. But netting this out, we see that one firing of infection decreases \(S\) by 1 and increases \(I\) by 1.

For example, suppose we were in state \(x = (5,2,1)\).

Then if infection fired, we would move to \(x' = (4,3,1)\).

Then if recovery fired, we would move to \(x'' = (4,2,2)\).

`For a stochastic Petri net, the size of each species is given by an (evolving) non-negative integer. The state of the net is therefore given by a triple (s,i,r) of natural numbers. In other words, the state space is \\(\mathbb{N}^3\\). The transitions fire at random times, and have the effect of decreasing their input populations and increasing their output populations by the indicated amounts. One firing of recovery has the effect of decreasing \\(I\\) by 1 and increasing \\(R\\) by 1. Formally, one firing of infection would decrease \\(S\\) and \\(I\\) by, and increase \\(I\\) by 2. But netting this out, we see that one firing of infection decreases \\(S\\) by 1 and increases \\(I\\) by 1. For example, suppose we were in state \\(x = (5,2,1)\\). Then if infection fired, we would move to \\(x' = (4,3,1)\\). Then if recovery fired, we would move to \\(x'' = (4,2,2)\\).`

Each transition is modeled as a Poisson process with a variable rate parameter.

Specifically, the firing rate of the process is equal to its rate constant times the product of its input population counts.

(EDIT: this is only true to a first approximation, for large population sizes. The refinement is described in comment 14.)

Suppose in our example, that the transition rate for infection was 2.1, and the transition rate for recovery was 7.9.

Then in state \(x = (5,2,1)\):

The firing rate for infection would be \(2.1 \cdot S \cdot I = 10.2\).

The firing rate for recovery would be \(7.9 \cdot I = 15.8\).

And to say that the firing rate for a process is \(z\) means the following: in an infinitesimal time interval \(dt\), the probability of the process firing is \(z * dt\).

I.e., in the limit as dt goes to 0, Prob(firing)/dt = z.

`Each transition is modeled as a Poisson process with a variable rate parameter. Specifically, the firing rate of the process is equal to its rate constant times the product of its input population counts. (EDIT: this is only true to a first approximation, for large population sizes. The refinement is described in comment 14.) Suppose in our example, that the transition rate for infection was 2.1, and the transition rate for recovery was 7.9. Then in state \\(x = (5,2,1)\\): * The firing rate for infection would be \\(2.1 \cdot S \cdot I = 10.2\\). * The firing rate for recovery would be \\(7.9 \cdot I = 15.8\\). And to say that the firing rate for a process is \\(z\\) means the following: in an infinitesimal time interval \\(dt\\), the probability of the process firing is \\(z * dt\\). I.e., in the limit as dt goes to 0, Prob(firing)/dt = z.`

That the firing rates are proportional to the input population sizes is called the

law of mass action.It makes sense the the firing rate of infection will be proportional to \(S \cdot I\).

(EDIT: this is only true to a first approximation, for large population sizes. The refinement is described in comment 14.)

Before an epidemic has begun, \(I(t)\) is small, so infections occur at a low rate. As \(I\) begins to increase, the rate of infection increases proportionally. This is the exponential growth at the outset.

As the epidemic progresses, \(S\) goes down, which has the effect of reducing the base of the exponential.

At the tail end of the epidemic, the combination of reduced values for \(S\) and \(I\) pulls the infection rate down.

These dynamics are responsible for the sigmoidal shape of the cumulative infection curve.

`That the firing rates are proportional to the input population sizes is called the _law of mass action_. It makes sense the the firing rate of infection will be proportional to \\(S \cdot I\\). (EDIT: this is only true to a first approximation, for large population sizes. The refinement is described in comment 14.) Before an epidemic has begun, \\(I(t)\\) is small, so infections occur at a low rate. As \\(I\\) begins to increase, the rate of infection increases proportionally. This is the exponential growth at the outset. As the epidemic progresses, \\(S\\) goes down, which has the effect of reducing the base of the exponential. At the tail end of the epidemic, the combination of reduced values for \\(S\\) and \\(I\\) pulls the infection rate down. These dynamics are responsible for the sigmoidal shape of the cumulative infection curve.`

For more complex examples of stochastic Petri net, see:

There is one net for a chemical reaction:

\[2 NAD^+ + 2 H_2 O \rightarrow 2 NADH + 2 H^+ + O_2\]

Here the species are molecules, and the transitions are chemical reactions.

Note this is using the standard graphical notation for Petri nets, which is a directed bipartite graph, with circles for the species and squares for the transitions. There is a directed edge from a circle to a square whenever the corresponding species is input to the corresponding transition, and similarly an edge from a square to a circle whenever the transition outputs to the species.

Those slides also contain a more complex Petri net, for a biochemical reaction network (the "RKIP pathway").

`For more complex examples of stochastic Petri net, see: * Monika Heiner, [From Petri nets to differential equations: an integrative approach for biochemical network analysis](https://www-dssz.informatik.tu-cottbus.de/publications/slides/2005_muenchen_msbf.sld.pdf), Brandenburg University of Technology Cottbus, Dept. of CS, May 2005. There is one net for a chemical reaction: \\[2 NAD^+ + 2 H_2 O \rightarrow 2 NADH + 2 H^+ + O_2\\] Here the species are molecules, and the transitions are chemical reactions. Note this is using the standard graphical notation for Petri nets, which is a directed bipartite graph, with circles for the species and squares for the transitions. There is a directed edge from a circle to a square whenever the corresponding species is input to the corresponding transition, and similarly an edge from a square to a circle whenever the transition outputs to the species. Those slides also contain a more complex Petri net, for a biochemical reaction network (the "RKIP pathway").`

For each time point \(t\), we have three random variables \(S(t)\), \(I(t)\) and \(R(t)\).

Together, they constitute a vector-valued stochastic process, over the continuous time parameter \(t\). Here, \(V(t) = (S(t), I(t), R(t))\).

It is a Markov process.

This statements are true for all stochastic Petri nets.

`For each time point \\(t\\), we have three random variables \\(S(t)\\), \\(I(t)\\) and \\(R(t)\\). Together, they constitute a vector-valued stochastic process, over the continuous time parameter \\(t\\). Here, \\(V(t) = (S(t), I(t), R(t))\\). It is a Markov process. This statements are true for all stochastic Petri nets.`

We may inquire about the expected values \(E(S(t))\), \(E(I(t))\) and \(E(R(t))\).

These together form another vector-valued process, consisting of the expected values of the populations at each point in time.

But

thisprocess is deterministic - for each point in time, we have three scalar values, one for species.`We may inquire about the expected values \\(E(S(t))\\), \\(E(I(t))\\) and \\(E(R(t))\\). These together form another vector-valued process, consisting of the expected values of the populations at each point in time. But _this_ process is deterministic - for each point in time, we have three scalar values, one for species.`

The mathematical derivation of this deterministic mean-value process from the definitional properties of stochastic Petri nets - this gets into some deeper mathematical waters, which we won't cover here.

Here is a whole book devoted to the subject:

We can look into this in another context - in the network theory study group!

`The mathematical derivation of this deterministic mean-value process from the definitional properties of stochastic Petri nets - this gets into some deeper mathematical waters, which we won't cover here. Here is a whole book devoted to the subject: * John C. Baez and Jacob Biamonte, [Quantum techniques for stochastic mechanics](https://arxiv.org/abs/1209.3632), arXiv:1209.3632 [quant-ph]. Text includes treatment from the ground up of Petri nets, both stochastic and deterministic. Example includes SI, SIR and SIRS models. We can look into this in another context - in the network theory study group!`

Fortunately, it is not necessary to go through first principles in order to understand the deterministic mean process for a stochastic Petri net.

In fact, it turns out to be a rather straightforward and intuitive construction. It is given by ordinary differential equations, which can be directly constructed, using the graph structure of the Petri net and the transition coefficients.

`Fortunately, it is not necessary to go through first principles in order to understand the deterministic mean process for a stochastic Petri net. In fact, it turns out to be a rather straightforward and intuitive construction. It is given by ordinary differential equations, which can be directly constructed, using the graph structure of the Petri net and the transition coefficients.`

While the population counts are integers, their expected values are real.

Hence, for the deterministic process, we have moved into the realm of "continuous Petri nets."

Whereas for stochastic Petri nets, we can visualize the species as

placesthat hold discrete tokens, now they are places that hold continuous magnitudes.Picture them as buckets holding a certain amount of "numerical fluid." The processes are then continuous pumps, draining their input buckets, and filling their output buckets.

An analogous law of mass action still holds: the pumping rate of a process is equal to its transition coefficient times the product of the values present at the input buckets.

`While the population counts are integers, their expected values are real. Hence, for the deterministic process, we have moved into the realm of "continuous Petri nets." Whereas for stochastic Petri nets, we can visualize the species as _places_ that hold discrete tokens, now they are places that hold continuous magnitudes. Picture them as buckets holding a certain amount of "numerical fluid." The processes are then continuous pumps, draining their input buckets, and filling their output buckets. An analogous law of mass action still holds: the pumping rate of a process is equal to its transition coefficient times the product of the values present at the input buckets.`

This leads us to the standard presentation of the SIR model for epidemics, which is deterministic and based on ordinary differential equations. See:

When dealing with the large scales of human populations, we project the trends of the expected values, and don't track all of the micro-variations.

`This leads us to the standard presentation of the SIR model for epidemics, which is deterministic and based on ordinary differential equations. See: * [Compartmental models in epidemiology](https://en.wikipedia.org/wiki/Compartmental_models_in_epidemiology), Wikipedia. When dealing with the large scales of human populations, we project the trends of the expected values, and don't track all of the micro-variations.`

On the other hand, at smaller scales, the non-deterministic structure of a stochastic Petri net becomes more interesting, see for example:

`On the other hand, at smaller scales, the non-deterministic structure of a stochastic Petri net becomes more interesting, see for example: * Gang Wang, Ying Zhang, Simon J. Shepherd, Clive B. Beggs, Nini Rao, [Application of stochastic Petri nets for modelling the transmission of airborne Infection in Indoor Environment](https://pdfs.semanticscholar.org/c701/1a719988e589e0425ae7346b2202f93a78f6.pdf), Acta Medica Mediterranea, 2016, 32: 587.`

In fact, though, local behavior

doesmake a difference in the evolution of epidemics. For example, just look at the variation in epidemic characteristics across countries for COVID-19.Perhaps one way to get at this would be to analyze a population into local zones, where each zone has its own \(S, I, R\) populations. Each zone could have its own transition coefficients, depending on local conditions.

Then there would also be transitions representing the flow of individuals due to travel between zones.

Not sure whether this has been done, or whether it would be fruitful, but it's a concept.

`In fact, though, local behavior _does_ make a difference in the evolution of epidemics. For example, just look at the variation in epidemic characteristics across countries for COVID-19. Perhaps one way to get at this would be to analyze a population into local zones, where each zone has its own \\(S, I, R\\) populations. Each zone could have its own transition coefficients, depending on local conditions. Then there would also be transitions representing the flow of individuals due to travel between zones. Not sure whether this has been done, or whether it would be fruitful, but it's a concept.`

Exercises and notes:

Work out the differential equations for the SIR model, based on the pumping principles and the law of mass action I described in comment 9. You can check your answers with the SIR equations given in the Wikipedia article on Compartmental models in epidemiology.

Do the same for the simpler SI model, which is obtained from SIR by removing R and the recovery transition. Compare your results to the Logistic equation.

See this discussion on the forum, regarding the logistic equation and COVID-19.

`Exercises and notes: * Work out the differential equations for the SIR model, based on the pumping principles and the law of mass action I described in comment 9. You can check your answers with the SIR equations given in the Wikipedia article on [Compartmental models in epidemiology](https://en.wikipedia.org/wiki/Compartmental_models_in_epidemiology). * Do the same for the simpler SI model, which is obtained from SIR by removing R and the recovery transition. Compare your results to the [[Logistic equation]]. * See [this discussion](https://forum.azimuthproject.org/discussion/377/logistic-equation-with-recent-commentary-on-corona-virus) on the forum, regarding the logistic equation and COVID-19.`

Above I wrote:

And:

I see this needs to be corrected. This product rule holds only when none of the input species to a transition occur with multiplicity more than one. The general case is given by a modified product rule.

The operative principle here is explained in:

The modified product rule actual follows from a more abstract principle.

Picture the state space of the Petri net as a graph with nodes \(\mathbb{N}^k\).

Each transition induces a (possibly infinite) collection of edges into this graph. These edges are generated by the "delta vector" that is produced by one firing of the transition.

For example, in the SI model, the delta vector is (-1,1).

Note: the technical name is not delta vector but stoichiometric vector. So I took some poetic license.

Furthermore, each edge \(e\) from state \(x\) to state \(y\) gets weighted by the

number of waysof moving directly from \(x\) to \(y\).And this counting procedure is predicated on the idea that the individuals in the population are

distinguishable.For example, suppose that we had a transition:

\[t: A + B \rightarrow Z\]

The delta vector for \(t\) is (-1, -1, +1).

For example, take the SI model, and suppose that edge \(e\) goes from \(x = (2,4,6)\) to \(y = (1,3,7)\).

The underlying procedure for a single step of the transition involved choosing one of 2 members from A and one of 3 members from B. So there are 6 ways that this transition can be performed, and the weight for the edge is 6. There you have the product rule.

But now consider the case of:

\[t: A + A \rightarrow X\]

Now the delta vector is (-2, +1).

Now the underlying procedure involves choosing a member from species A, and then choosing a second member from A after it has been reduced by one from the first step.

So if \(e\) is an edge from \(x\) to \(y\), and state \(x\) contains \(k\) members of A, then the weight given to \(e\) is \(k \cdot (k-1)\).

This is called the falling power.

The modified product rule then is clear: take the product of falling powers, where there is one falling power for each distinct species that occurs as input to a transition.

The law of mass action -- which simply uses the product rule -- follows as a limiting approximation for large population sizes.

`Above I wrote: > Specifically, the firing rate of the process is equal to its rate constant times the product of its input population counts. And: > That the firing rates are proportional to the input population sizes is called the _law of mass action_. I see this needs to be corrected. This product rule holds only when none of the input species to a transition occur with multiplicity more than one. The general case is given by a modified product rule. The operative principle here is explained in: * John C. Baez and Jacob Biamonte, [Quantum techniques for stochastic mechanics](https://arxiv.org/abs/1209.3632), arXiv:1209.3632 [quant-ph]. The modified product rule actual follows from a more abstract principle. Picture the state space of the Petri net as a graph with nodes \\(\mathbb{N}^k\\). Each transition induces a (possibly infinite) collection of edges into this graph. These edges are generated by the "delta vector" that is produced by one firing of the transition. For example, in the SI model, the delta vector is (-1,1). Note: the technical name is not delta vector but stoichiometric vector. So I took some poetic license. Furthermore, each edge \\(e\\) from state \\(x\\) to state \\(y\\) gets weighted by the _number of ways_ of moving directly from \\(x\\) to \\(y\\). And this counting procedure is predicated on the idea that the individuals in the population are _distinguishable_. For example, suppose that we had a transition: \\[t: A + B \rightarrow Z\\] The delta vector for \\(t\\) is (-1, -1, +1). For example, take the SI model, and suppose that edge \\(e\\) goes from \\(x = (2,4,6)\\) to \\(y = (1,3,7)\\). The underlying procedure for a single step of the transition involved choosing one of 2 members from A and one of 3 members from B. So there are 6 ways that this transition can be performed, and the weight for the edge is 6. There you have the product rule. But now consider the case of: \\[t: A + A \rightarrow X\\] Now the delta vector is (-2, +1). Now the underlying procedure involves choosing a member from species A, and then choosing a second member from A after it has been reduced by one from the first step. So if \\(e\\) is an edge from \\(x\\) to \\(y\\), and state \\(x\\) contains \\(k\\) members of A, then the weight given to \\(e\\) is \\(k \cdot (k-1)\\). This is called the falling power. The modified product rule then is clear: take the product of falling powers, where there is one falling power for each distinct species that occurs as input to a transition. The law of mass action -- which simply uses the product rule -- follows as a limiting approximation for large population sizes.`

Master EquationThe 'master equation' determines the probabilistic structure of the process for a Petri net.

To do this, first we'll review how a Petri net is a kind of continuous time Markov chain (CTMC).

Then we can apply the general concept of the master equation for CTMC to the case of Petri nets.

`**Master Equation** The 'master equation' determines the probabilistic structure of the process for a Petri net. To do this, first we'll review how a Petri net is a kind of continuous time Markov chain (CTMC). Then we can apply the general concept of the master equation for CTMC to the case of Petri nets.`

Say that our Petri net has species S.

Let a

definite stateto be a mapping from S into \(\mathbb{N}\), giving the size of all the populations.So the set of all definite states is \(\mathbb{N}^S\).

Let a

stochastic stateto be a probability distribution over \(\mathbb{N}^S\).`Say that our Petri net has species S. Let a _definite state_ to be a mapping from S into \\(\mathbb{N}\\), giving the size of all the populations. So the set of all definite states is \\(\mathbb{N}^S\\). Let a _stochastic state_ to be a probability distribution over \\(\mathbb{N}^S\\).`

Here is the problem at hand, which the master equation will address.

Suppose that we know that at time 0, the net is in a stochastic state \(s_0\).

Then at time \(t\), what stochastic state will the net be in?

`Here is the problem at hand, which the master equation will address. Suppose that we know that at time 0, the net is in a stochastic state \\(s_0\\). Then at time \\(t\\), what stochastic state will the net be in?`

First, let's tighten up the construction of a Markov chain for a given Petri net.

Here is a general recipe for constructing a CTMC. Form a labeled, directed graph which has Nodes = the definite states, and a directed edge from \(x\) to \(y\) whenever \(y\) can be reached from \(x\) in one "hop." The edge will get labeled with the transition rate for this hopping.

That is what we did above in comment 14, when we constructed a directed, labeled graph with Nodes = \(\mathbb{N}^S\) = the definite states.

Note: here we have a graph with a countably infinite number of nodes. However, things are simplified by the fact that each node has only a finite number of neighbors.

Specifically, for our Petri net graph, there is a directed edge from \(x \in \mathbb{N}^S\) to \(y \in \mathbb{N}^S\) whenever \(y\) can be reached from \(x\) by a single firing of one of the transitions.

And this holds whenever \(y - x\) equals the stoichiometric vector ("delta vector") for one of the transitions.

If multiple transitions have the same stoichiometric vector \(y - x\), then we will introduce, as an intermediate construction, multiple edges from \(x\) to \(y\).

`First, let's tighten up the construction of a Markov chain for a given Petri net. Here is a general recipe for constructing a CTMC. Form a labeled, directed graph which has Nodes = the definite states, and a directed edge from \\(x\\) to \\(y\\) whenever \\(y\\) can be reached from \\(x\\) in one "hop." The edge will get labeled with the transition rate for this hopping. That is what we did above in comment 14, when we constructed a directed, labeled graph with Nodes = \\(\mathbb{N}^S\\) = the definite states. Note: here we have a graph with a countably infinite number of nodes. However, things are simplified by the fact that each node has only a finite number of neighbors. Specifically, for our Petri net graph, there is a directed edge from \\(x \in \mathbb{N}^S\\) to \\(y \in \mathbb{N}^S\\) whenever \\(y\\) can be reached from \\(x\\) by a single firing of one of the transitions. And this holds whenever \\(y - x\\) equals the stoichiometric vector ("delta vector") for one of the transitions. If multiple transitions have the same stoichiometric vector \\(y - x\\), then we will introduce, as an intermediate construction, multiple edges from \\(x\\) to \\(y\\).`

Let's get more specific now.

Let \(t\) be a transition, and Stoich(\(t\)) be its stoichiometric vector.

For a species \(z\), let InpDegree(\(t,z\)) be the number of places that \(t\) inputs from \(z\), and OutDegree(\(t,z\)) be the number of places \(t\) outputs to \(z\).

Let \(x \in \mathbb{N}^S\) be a definite state, and let \(x' = x\) + Stoich(\(t\)).

If \(x'\) contains any negative components, then it is outside of \(\mathbb{N}^S\), and it is not a valid state. That indicates that the transition \(t\) doesn't have a sufficient number of input tokens in \(x\) to fire.

If however \(x'\) does belong to \(\mathbb{N}^S\), then we add an edge \(e(t,x,x')\) from \(x\) to \(x'\) in the graph.

`Let's get more specific now. Let \\(t\\) be a transition, and Stoich(\\(t\\)) be its stoichiometric vector. For a species \\(z\\), let InpDegree(\\(t,z\\)) be the number of places that \\(t\\) inputs from \\(z\\), and OutDegree(\\(t,z\\)) be the number of places \\(t\\) outputs to \\(z\\). Let \\(x \in \mathbb{N}^S\\) be a definite state, and let \\(x' = x\\) + Stoich(\\(t\\)). If \\(x'\\) contains any negative components, then it is outside of \\(\mathbb{N}^S\\), and it is not a valid state. That indicates that the transition \\(t\\) doesn't have a sufficient number of input tokens in \\(x\\) to fire. If however \\(x'\\) does belong to \\(\mathbb{N}^S\\), then we add an edge \\(e(t,x,x')\\) from \\(x\\) to \\(x'\\) in the graph.`

Here we define the weight for edge \(e(t,x,x')\):

Where:

This gives the "rate of probability flow" through the edge.

`Here we define the weight for edge \\(e(t,x,x')\\): * Weight(e(t,x,x')) = RateConstant(t) * \\(\prod_{\mathit{species} \in S}\\) FallingPower(Population(x,species), InpDegree(t, species)) Where: * FallingPower(n, k) = n * (n-1) * ... * (n - k + 1) This gives the "rate of probability flow" through the edge.`

Just two more steps to wrap up the construction of the Markov chain graph for our Petri net.

First, form the union of the all of these per-transition graphs.

Second, wherever this union leads to multiple edges going from \(x\) to \(x'\), then sum their weights and replace with a single edge.

That's it.

`Just two more steps to wrap up the construction of the Markov chain graph for our Petri net. First, form the union of the all of these per-transition graphs. Second, wherever this union leads to multiple edges going from \\(x\\) to \\(x'\\), then sum their weights and replace with a single edge. That's it.`

Once we have a graph labeled with transition rates, we can proceed to the master equation, which gives a deterministic law for how the stochastic state changes over time -- once an initial stochastic state is specified.

Let \(\Gamma_{\sigma}(t)\) be the stochastic state at time \(t\), given \(\sigma\) as the initial stochastic state.

`Once we have a graph labeled with transition rates, we can proceed to the master equation, which gives a deterministic law for how the stochastic state changes over time -- once an initial stochastic state is specified. Let \\(\Gamma_{\sigma}(t)\\) be the stochastic state at time \\(t\\), given \\(\sigma\\) as the initial stochastic state.`

The

master equationfor the Markov chain is a vector differential equation:\[\Gamma_{\sigma}'(t) = H(\Gamma_{\sigma}(t))\]

where \(H\) is a linear operator to be constructed from the weighted graph for the Markov chain.

`The _master equation_ for the Markov chain is a vector differential equation: \\[\Gamma_{\sigma}'(t) = H(\Gamma_{\sigma}(t))\\] where \\(H\\) is a linear operator to be constructed from the weighted graph for the Markov chain.`

To make this all work smoothly, let's situate the stochastic states inside of a containing vector space.

Let D be the definite states -- these are the nodes in the transition graph.

The vector space that we will work within is the space of real-valued functions on D, i.e. \(\mathbb{R}^D\).

Let \(\Delta \subset R^D\) be the simplex in F, meaning the functions taking values in [0,1] where all the values sum to 1.

\(\Delta\) is all the probability distributions over D, all the stochastic states.

`To make this all work smoothly, let's situate the stochastic states inside of a containing vector space. Let D be the definite states -- these are the nodes in the transition graph. The vector space that we will work within is the space of real-valued functions on D, i.e. \\(\mathbb{R}^D\\). Let \\(\Delta \subset R^D\\) be the simplex in F, meaning the functions taking values in [0,1] where all the values sum to 1. \\(\Delta\\) is all the probability distributions over D, all the stochastic states.`

Now we can write:

\[\Gamma_{\sigma}: [0,\infty) \rightarrow \Delta \subset \mathbb{R}^D\]

\[\Gamma_{\sigma}(0) = \sigma\]

\[\Gamma_{\sigma}': [0,\infty) \rightarrow \mathbb{R}^D\]

`Now we can write: \\[\Gamma_{\sigma}: [0,\infty) \rightarrow \Delta \subset \mathbb{R}^D\\] \\[\Gamma_{\sigma}(0) = \sigma\\] \\[\Gamma_{\sigma}': [0,\infty) \rightarrow \mathbb{R}^D\\]`

For definite states \(i, j \in D\), define:

\[H_{i,j} = (\Gamma_{i}'(0))(j)\]

This is the instantaneous rate of change of the probability of being in state \(j\), given that the chain began in state \(i\).

Loosely speaking, it is the rate at which probability flows from definite state \(i\) to definite state \(j\).

`For definite states \\(i, j \in D\\), define: \\[H_{i,j} = (\Gamma_{i}'(0))(j)\\] This is the instantaneous rate of change of the probability of being in state \\(j\\), given that the chain began in state \\(i\\). Loosely speaking, it is the rate at which probability flows from definite state \\(i\\) to definite state \\(j\\).`

This is exactly the same concept of flow rates that we used for the edge weights in the directed graph for the Markov chain.

The set of values \(H_{i,j}\) is an "infinite square matrix." The values can be picked off right from the graph: \(H_{i,j}\) = the weight of the edge from \(i\) to \(j\) if it exists in the graph, else 0.

The weighted graph is a specification for the system of coefficients \([H_{i,j} ]\) that determine the master equation.

`This is exactly the same concept of flow rates that we used for the edge weights in the directed graph for the Markov chain. The set of values \\(H_{i,j}\\) is an "infinite square matrix." The values can be picked off right from the graph: \\(H_{i,j}\\) = the weight of the edge from \\(i\\) to \\(j\\) if it exists in the graph, else 0. The weighted graph is a specification for the system of coefficients \\(\[H_{i,j} \]\\) that determine the master equation.`

Let's look at the master equation again:

\[\Gamma_{\sigma}'(t) = H(\Gamma_{\sigma}(t))\]

The linear operator \(H\) is just matrix multiplication by \([H_{i,j}]\).

Instantiating to t=0, we get:

\[\Gamma_{\sigma}'(0) = H(\Gamma_{\sigma}(0)) = H(\sigma)\]

For a definite state \(j\), this gives us:

\[\Gamma_{j}'(0) = H(\Gamma_{j}(0)) = H(j)\]

which equals the jth column of \([H_{i,j}]\).

`Let's look at the master equation again: \\[\Gamma_{\sigma}'(t) = H(\Gamma_{\sigma}(t))\\] The linear operator \\(H\\) is just matrix multiplication by \\(\[H_{i,j}\]\\). Instantiating to t=0, we get: \\[\Gamma_{\sigma}'(0) = H(\Gamma_{\sigma}(0)) = H(\sigma)\\] For a definite state \\(j\\), this gives us: \\[\Gamma_{j}'(0) = H(\Gamma_{j}(0)) = H(j)\\] which equals the jth column of \\(\[H_{i,j}\]\\).`

Recap: for each definite state \(j\), the jth column of \([H_{i,j}]\) equals the derivative of the stochastic state when the chain is started in \(j\).

`Recap: for each definite state \\(j\\), the jth column of \\(\[H_{i,j}\]\\) equals the derivative of the stochastic state when the chain is started in \\(j\\).`

But for the master equation to be valid, \([H_{i,j}]\) can't be any old infinite square matrix.

There are two criteria to be satisfied.

Look at the matrix multiplication involved in computing \(H(\sigma)\), for a general stochastic state \(\sigma\). This is an infinite sum, which is required to converge for \(H(\sigma)\) to be well defined.

A sufficient condition to guarantee this convergence is that in the graph for the Markov chain, each node has only a finite number of neighbors. From this it follows that the rows (and columns) will only have a finite number of nonzero entries, which makes \(H(\sigma)\) well-defined.

`But for the master equation to be valid, \\(\[H_{i,j}\]\\) can't be any old infinite square matrix. There are two criteria to be satisfied. Look at the matrix multiplication involved in computing \\(H(\sigma)\\), for a general stochastic state \\(\sigma\\). This is an infinite sum, which is required to converge for \\(H(\sigma)\\) to be well defined. A sufficient condition to guarantee this convergence is that in the graph for the Markov chain, each node has only a finite number of neighbors. From this it follows that the rows (and columns) will only have a finite number of nonzero entries, which makes \\(H(\sigma)\\) well-defined.`

The second condition on \([H_{i,j}]\) is that it must be

infinitesimal stochastic, which means this:`The second condition on \\(\[H_{i,j}\]\\) is that it must be _infinitesimal stochastic_, which means this: * All the off-diagonal elements are non-negative * Each column sums to zero`

The reason for this requirement stems from the master equation:

\[\Gamma_{\sigma}'(t) = H(\Gamma_{\sigma}(t))\]

Since \(\Gamma(t)\) must always remain in the simplex \(\Delta\) -- i.e. must always be a probability distribution -- a big constraint is placed on \(\Gamma'(t)\).

Specifically, \(\Gamma'(t)\) as tangent vector must contain no normal component with respect to the "plane of the simplex." More accurately put, it must be perpendicular to any vector which is normal to the simplex. The simplex vector which is normal to the simplex is the function which maps every definite state \(D\) to 1.

And to say that a tangent vector is normal to this is to say that the sum of the components of this tangent vector is 0.

Now the jth column of the matrix for H is the tangent vector \(\Gamma_{j}'(0)\).

Hence each column must sum to 0.

`The reason for this requirement stems from the master equation: \\[\Gamma_{\sigma}'(t) = H(\Gamma_{\sigma}(t))\\] Since \\(\Gamma(t)\\) must always remain in the simplex \\(\Delta\\) -- i.e. must always be a probability distribution -- a big constraint is placed on \\(\Gamma'(t)\\). Specifically, \\(\Gamma'(t)\\) as tangent vector must contain no normal component with respect to the "plane of the simplex." More accurately put, it must be perpendicular to any vector which is normal to the simplex. The simplex vector which is normal to the simplex is the function which maps every definite state \\(D\\) to 1. And to say that a tangent vector is normal to this is to say that the sum of the components of this tangent vector is 0. Now the jth column of the matrix for H is the tangent vector \\(\Gamma_{j}'(0)\\). Hence each column must sum to 0.`

The requirement that the off-diagonal elements of the jth column be non-negative follows from the geometry of the simplex, as per the following.

Observe that \(\Gamma_j(0) = j\) = the corner of the simplex which is 1 at j, and 0 elsewhere.

The derivative \(\Gamma'_j(0)\) is a tangent vector which must lead from this corner further into the simplex.

(Or else the tangent vector is zero.)

So it must be decreasing the jth component to start going down from 1, and increasing all the other components to start increasing from 0.

The latter implies that the off-diagonal elements are all non-negative.

That shows why H must be infinitesimal stochastic.

Putting the two conditions together, we see that the diagonal entry will be the negative of the sum of the off-diagonal entries.

`The requirement that the off-diagonal elements of the jth column be non-negative follows from the geometry of the simplex, as per the following. Observe that \\(\Gamma_j(0) = j\\) = the corner of the simplex which is 1 at j, and 0 elsewhere. The derivative \\(\Gamma'_j(0)\\) is a tangent vector which must lead from this corner further into the simplex. (Or else the tangent vector is zero.) So it must be decreasing the jth component to start going down from 1, and increasing all the other components to start increasing from 0. The latter implies that the off-diagonal elements are all non-negative. That shows why H must be infinitesimal stochastic. Putting the two conditions together, we see that the diagonal entry will be the negative of the sum of the off-diagonal entries.`

And we can see this requirement concretely as well.

If you start in state j, the probability of being in the other states reachable from j will start increasing from 0, and the probability of remaining in state j will start decreasing from 1.

(Unless it is an absorbing state.)

Because the probabilities of being

somewherewill always total to 1, the sum of these derivatives must equal zero.`And we can see this requirement concretely as well. If you start in state j, the probability of being in the other states reachable from j will start increasing from 0, and the probability of remaining in state j will start decreasing from 1. (Unless it is an absorbing state.) Because the probabilities of being _somewhere_ will always total to 1, the sum of these derivatives must equal zero.`

Solution to the master equationGiven

\[\Gamma: [0,\infty) \rightarrow \mathbb{R}^D\]

and a linear operator

\[H: \mathbb{R}^D \rightarrow \mathbb{R}^D\]

the equation to be solved is

\[\Gamma'(t) = H(\Gamma(t))\]

`**Solution to the master equation** Given \\[\Gamma: [0,\infty) \rightarrow \mathbb{R}^D\\] and a linear operator \\[H: \mathbb{R}^D \rightarrow \mathbb{R}^D\\] the equation to be solved is \\[\Gamma'(t) = H(\Gamma(t))\\]`

So it's a vector-valued linear differential equation.

The solution is a natural generalization from the case where \(D\) contains just one element. In that case, the linear

\[H: \mathbb{R}^1 \rightarrow \mathbb{R}^1\]

amounts to multiplication by a constant \(H\), and the master equation takes the simple form:

\[\Gamma'(t) = H \cdot \Gamma(t)\]

which is an elementary differential equation, with solution:

\[\Gamma(t) = e^{H t}\ \Gamma(0)\]

`So it's a vector-valued linear differential equation. The solution is a natural generalization from the case where \\(D\\) contains just one element. In that case, the linear \\[H: \mathbb{R}^1 \rightarrow \mathbb{R}^1\\] amounts to multiplication by a constant \\(H\\), and the master equation takes the simple form: \\[\Gamma'(t) = H \cdot \Gamma(t)\\] which is an elementary differential equation, with solution: \\[\Gamma(t) = e^{H t}\ \Gamma(0)\\]`

Well this completely generalizes to the vector-case, and the solution to the master equation is written exactly the same way

\[\Gamma(t) = e^{H t}\ \Gamma(0)\]

Now, the constant \(H\) is the matrix for the linear operator.

`Well this completely generalizes to the vector-case, and the solution to the master equation is written exactly the same way \\[\Gamma(t) = e^{H t}\ \Gamma(0)\\] Now, the constant \\(H\\) is the matrix for the linear operator.`

Rewriting slightly, we have:

\[\Gamma(t) = e^{t H}\ \Gamma(0)\]

Things got more interesting here, as we are taking the exponential of a matrix.

`Rewriting slightly, we have: \\[\Gamma(t) = e^{t H}\ \Gamma(0)\\] Things got more interesting here, as we are taking the exponential of a matrix.`

The matrix exponential is defined by the same formula as the scalar exponential, that is by the power series:

\[e^X = \frac{1}{0!} X^0 + \frac{1}{1!} X^1 + \frac{1}{2!} X^2 + \cdots = I + X + \frac{1}{2!} X^2 + \cdots\]

where \(I\) is the identity matrix.

`The matrix exponential is defined by the same formula as the scalar exponential, that is by the power series: \\[e^X = \frac{1}{0!} X^0 + \frac{1}{1!} X^1 + \frac{1}{2!} X^2 + \cdots = I + X + \frac{1}{2!} X^2 + \cdots\\] where \\(I\\) is the identity matrix.`

Via matrix multiplication, the matrix \(e^{t H}\) is applied to the initial vector \(\Gamma(0)\) to get the vector \(\Gamma(t)\).

It is a theorem that whenever \(H\) is infinitesimal stochastic (see comment #31), the operator \(e^{t H}\) will map stochastic states to stochastic states:

\[e^{t H}: \Delta \rightarrow \Delta\]

Since \(\Gamma(0) \in \Delta\), it follows that \(\Gamma(t) = e^{t H} \Gamma(0) \in \Delta\).

In other words, H being infinitesimal stochastic guarantees that the path of \(\Gamma\) lies within the simplex \(\Delta\) -- which is to be expected, as \(\Gamma\) is supposed to describe the evolution of the probability distribution.

`Via matrix multiplication, the matrix \\(e^{t H}\\) is applied to the initial vector \\(\Gamma(0)\\) to get the vector \\(\Gamma(t)\\). It is a theorem that whenever \\(H\\) is infinitesimal stochastic (see [comment #31](https://forum.azimuthproject.org/discussion/comment/22052/#Comment_22052)), the operator \\(e^{t H}\\) will map stochastic states to stochastic states: \\[e^{t H}: \Delta \rightarrow \Delta\\] Since \\(\Gamma(0) \in \Delta\\), it follows that \\(\Gamma(t) = e^{t H} \Gamma(0) \in \Delta\\). In other words, H being infinitesimal stochastic guarantees that the path of \\(\Gamma\\) lies within the simplex \\(\Delta\\) -- which is to be expected, as \\(\Gamma\\) is supposed to describe the evolution of the probability distribution.`

Expanding, we get:

\[\Gamma(t) = e^{t H} \Gamma(0) = (I + tH + \frac{1}{2!} t^2 H^2 + \cdots)\ \Gamma(0) = \Gamma(0) + tH\ \Gamma(0) + \frac{1}{2!} t^2 H^2 \Gamma(0) + \cdots \]

`Expanding, we get: \\[\Gamma(t) = e^{t H} \Gamma(0) = (I + tH + \frac{1}{2!} t^2 H^2 + \cdots)\ \Gamma(0) = \Gamma(0) + tH\ \Gamma(0) + \frac{1}{2!} t^2 H^2 \Gamma(0) + \cdots \\]`