Oliver Friedmann, On Lower Bounds for Policy Iteration
Policy iteration is one of the most important algorithmic schemes for solving problems in the domain of determined games, such as parity games, stochastic games, and Markov decision processes. It is parameterized by an improvement rule that determines how the iteration proceeds from one policy to the next. It is a major open problem whether there is an improvement rule that yields a polynomial-time algorithm for solving any of these game classes.
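To make the scheme concrete, here is a minimal sketch of policy iteration on a toy Markov decision process. The MDP (states, actions, transition matrix P, rewards R, discount gamma) is entirely hypothetical and not taken from the abstract; the improvement rule shown is the greedy "switch every state to a best action" rule, one of the rules the lower-bound constructions target.

```python
import numpy as np

# A hypothetical 2-state, 2-action discounted MDP (illustrative numbers).
# P[a][s][s'] is the probability of moving s -> s' under action a;
# R[s][a] is the immediate reward; gamma discounts future rewards.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.7, 0.3]]])
R = np.array([[1.0, 0.0], [0.0, 2.0]])
gamma = 0.9
n_states, n_actions = R.shape

def evaluate(policy):
    """Policy evaluation: solve V = R_pi + gamma * P_pi @ V exactly."""
    P_pi = np.array([P[policy[s], s] for s in range(n_states)])
    R_pi = np.array([R[s, policy[s]] for s in range(n_states)])
    return np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)

def policy_iteration(policy):
    """Alternate evaluation with the greedy improvement rule until
    no state has an improving switch."""
    while True:
        V = evaluate(policy)
        Q = R + gamma * np.einsum('ast,t->sa', P, V)  # Q[s][a]
        improved = Q.argmax(axis=1)
        if np.array_equal(improved, policy):
            return policy, V  # no improving switch left: optimal
        policy = improved

opt_policy, opt_values = policy_iteration(np.zeros(n_states, dtype=int))
```

The improvement rule is the only free choice in the loop above; the lower-bound constructions build game instances on which specific rules are forced through super-polynomially many policies before termination.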
Simplex algorithms for solving linear programs are closely related to policy iteration algorithms. Like policy iteration, the simplex algorithm is parameterized by a so-called pivoting rule that describes how to proceed from one basic feasible solution of the linear program to the next. Likewise, it is a major open problem whether there is a pivoting rule that results in a (strongly) polynomial-time algorithm for solving linear programs.
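The role of the pivoting rule can be seen in a minimal tableau simplex sketch. The code below, a toy illustration not taken from the abstract, solves max c^T x subject to Ax <= b, x >= 0 (with b >= 0) and uses Dantzig's classical rule: pivot on the column with the largest positive reduced cost. Swapping out the single argmax line is exactly what choosing a different pivoting rule means.

```python
import numpy as np

def simplex_dantzig(A, b, c):
    """Tiny tableau simplex for max c^T x s.t. Ax <= b, x >= 0 (b >= 0),
    with Dantzig's pivoting rule. A sketch only: no anti-cycling
    safeguard, so degenerate instances may loop."""
    m, n = A.shape
    # Tableau [A | I | b] with slack variables; last row holds reduced costs.
    T = np.zeros((m + 1, n + m + 1))
    T[:m, :n] = A
    T[:m, n:n + m] = np.eye(m)
    T[:m, -1] = b
    T[-1, :n] = c
    basis = list(range(n, n + m))  # start from the all-slack basis
    while True:
        j = int(np.argmax(T[-1, :-1]))   # Dantzig: most positive reduced cost
        if T[-1, j] <= 1e-9:
            break                        # no improving column: optimal
        ratios = [T[i, -1] / T[i, j] if T[i, j] > 1e-9 else np.inf
                  for i in range(m)]     # ratio test picks the leaving row
        i = int(np.argmin(ratios))
        if ratios[i] == np.inf:
            raise ValueError("LP is unbounded")
        T[i] /= T[i, j]                  # pivot
        for k in range(m + 1):
            if k != i:
                T[k] -= T[k, j] * T[i]
        basis[i] = j
    x = np.zeros(n + m)
    for i, bi in enumerate(basis):
        x[bi] = T[i, -1]
    return x[:n], -T[-1, -1]

# Hypothetical example: max 3x + 2y s.t. x + y <= 4, x <= 2.
x_opt, z_opt = simplex_dantzig(np.array([[1.0, 1.0], [1.0, 0.0]]),
                               np.array([4.0, 2.0]),
                               np.array([3.0, 2.0]))
```

The correspondence to policy iteration is that a basic feasible solution plays the role of a policy, and a pivot step plays the role of a single policy switch.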
We describe our recent constructions for parity games that give rise to superpolynomial and exponential lower bounds for all major improvement rules, and show how to extend these lower bounds to more expressive game classes such as stochastic games. We show that our constructions for parity games can be translated to Markov decision processes, transferring our lower bounds to that domain, and finally show how the lower bounds for the MDPs carry over to the linear programming domain, resolving problems that had been open for several decades.