I chose to ignore interior-point and ellipsoid methods as per:

> We prove that the classic policy-iteration method (Howard 1960), including the

Simplex method (Dantzig 1947) with the most-negative-reduced-cost pivoting rule, is

a strongly polynomial-time algorithm for solving the Markov decision problem (MDP)

with a fixed discount rate. Furthermore, the computational complexity of the policyiteration

method (including the Simplex method) is superior to that of the only known

strongly polynomial-time interior-point algorithm ([28] 2005) for solving this problem.

The result is surprising since the Simplex method with the same pivoting rule was

shown to be exponential for solving a general linear programming (LP) problem,

the Simplex (or simple policy-iteration) method with the smallest-index pivoting

rule was shown to be exponential for solving an MDP regardless of discount rates,

and the policy-iteration method was recently shown to be exponential for solving a

undiscounted MDP. We also extend the result to solving MDPs with sub-stochastic

and transient state transition probability matrices.

* Yunyu Ye, [The Simplex and Policy-Iteration Methods are Strongly Polynomial for the Markov Decision Problem with a Fixed Discount Rate (2010)](http://web.stanford.edu/~yyye/SimplexMDP3.pdf)

> We prove that the classic policy-iteration method (Howard 1960), including the

Simplex method (Dantzig 1947) with the most-negative-reduced-cost pivoting rule, is

a strongly polynomial-time algorithm for solving the Markov decision problem (MDP)

with a fixed discount rate. Furthermore, the computational complexity of the policyiteration

method (including the Simplex method) is superior to that of the only known

strongly polynomial-time interior-point algorithm ([28] 2005) for solving this problem.

The result is surprising since the Simplex method with the same pivoting rule was

shown to be exponential for solving a general linear programming (LP) problem,

the Simplex (or simple policy-iteration) method with the smallest-index pivoting

rule was shown to be exponential for solving an MDP regardless of discount rates,

and the policy-iteration method was recently shown to be exponential for solving a

undiscounted MDP. We also extend the result to solving MDPs with sub-stochastic

and transient state transition probability matrices.

* Yunyu Ye, [The Simplex and Policy-Iteration Methods are Strongly Polynomial for the Markov Decision Problem with a Fixed Discount Rate (2010)](http://web.stanford.edu/~yyye/SimplexMDP3.pdf)