Controlled stochastic process
A stochastic process whose probabilistic characteristics may be changed (controlled) in the course of its evolution in pursuance of some objective, normally the minimization (maximization) of a functional (the control objective) representing the quality of the control. Various types of controlled processes arise, depending on how the process is specified or on the nature of the control objective. The greatest progress has been made in the theory of controlled jump (or stepwise) Markov processes and controlled diffusion processes when the complete evolution of the process is observed by the controller. A corresponding theory has also been developed in the case of partial observations (incomplete data).
Controlled jump Markov process.
This is a controlled stochastic process with continuous time and piecewise-constant trajectories whose infinitesimal characteristics are influenced by the choice of control. For the construction of such a process the following are usually specified (cf. [1], [2]): 1) a Borel set <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c0260101.png" /> of states; 2) a Borel set <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c0260102.png" /> of controls, a set <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c0260103.png" /> of controls admissible when the process is in state <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c0260104.png" />, where <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c0260105.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c0260106.png" /> (<img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c0260107.png" /> denotes the <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c0260108.png" />-algebra of Borel subsets of the Borel set <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c0260109.png" />), and in some cases a measurable selector <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601010.png" /> for <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601011.png" />; 3) a jump measure in the form of a transition function <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601012.png" /> defined for <img 
align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601013.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601014.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601015.png" />, and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601016.png" />, such that <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601017.png" /> is a Borel function in <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601018.png" /> for each <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601019.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601020.png" /> is countably additive in <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601021.png" /> for each <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601022.png" />; moreover <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601023.png" /> is bounded, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601024.png" /> for <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601025.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601026.png" />. 
Roughly speaking, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601027.png" /> is the probability that the process jumps into the set <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601028.png" /> in the time interval <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601029.png" /> when <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601030.png" /> and control action <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601031.png" /> is applied.
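As an informal illustration of how such a jump measure drives the process (this sketch is not part of the article's formal construction, and the states, controls and rates below are hypothetical), a finite model under a fixed stationary strategy can be simulated by drawing exponential holding times whose intensity depends on the chosen control:

```python
import random

# Hypothetical finite model (not from the article): states {0, 1},
# controls {"a", "b"}.  RATES[x][u] is the total jump intensity out of
# state x under control u; JUMP_PROBS[x][u][y] is the probability of
# landing in y given that a jump occurs.
RATES = {0: {"a": 1.0, "b": 2.0}, 1: {"a": 0.5, "b": 1.5}}
JUMP_PROBS = {0: {"a": {1: 1.0}, "b": {1: 1.0}},
              1: {"a": {0: 1.0}, "b": {0: 1.0}}}

def simulate(strategy, x0, horizon, rng):
    """Simulate a right-continuous piecewise-constant trajectory up to
    `horizon` under a stationary strategy (a map state -> control)."""
    t, x, path = 0.0, x0, [(0.0, x0)]
    while True:
        u = strategy(x)
        dt = rng.expovariate(RATES[x][u])   # control-dependent holding time
        if t + dt > horizon:
            break
        t += dt
        targets, probs = zip(*JUMP_PROBS[x][u].items())
        x = rng.choices(targets, weights=probs)[0]  # next state after the jump
        path.append((t, x))
    return path
```

For example, `simulate(lambda x: "a", 0, 10.0, random.Random(0))` returns the jump times and states of one trajectory.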
Let <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601032.png" /> be the space of all piecewise-constant right-continuous functions <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601033.png" /> with values in <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601034.png" />, let <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601035.png" /> (<img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601036.png" />) be the minimal <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601037.png" />-algebra in <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601038.png" /> with respect to which the function <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601039.png" /> is measurable for <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601040.png" /> (for <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601041.png" />) and let <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601042.png" />. 
Any function <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601043.png" /> on <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601044.png" /> with values <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601045.png" /> that is progressively measurable relative to the family <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601047.png" /> is called a (natural) strategy (or control). From the definition it follows that <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601048.png" />, where <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601049.png" />. If <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601050.png" />, where <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601051.png" /> is a Borel function on <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601052.png" /> with the property <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601053.png" />, then <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601054.png" /> is said to be a Markov strategy or Markov control, and if <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601055.png" />, it is a stationary strategy or stationary control. 
The classes of natural, Markov and stationary strategies are denoted by <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601056.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601057.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601058.png" />, respectively. In view of the possibility of making a measurable selection from <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601059.png" />, the class <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601060.png" /> (and hence <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601061.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601062.png" />) is not empty. 
If <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601063.png" /> is bounded, then for any <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601064.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601065.png" /> one can construct a unique probability measure <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601066.png" /> on <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601067.png" /> such that <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601068.png" /> and for any <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601069.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601070.png" />,
| <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601071.png" /> | (1a) |
| <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601072.png" /> |
| <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601073.png" /> | (1b) |
| <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601074.png" /> |
where <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601075.png" /> is the first jump time after <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601076.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601077.png" /> is the minimal <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601078.png" />-algebra in <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601079.png" /> containing <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601080.png" /> and relative to which <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601081.png" /> is measurable, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601082.png" /> for <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601083.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601084.png" /> for <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601085.png" />. The stochastic process <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601086.png" /> is a controlled jump Markov process. 
The control Markov property of a controlled jump Markov process means that from a known "present" <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601087.png" />, the "past" <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601088.png" /> enters the right-hand sides of (1a)–(1b) only through the strategy <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601089.png" />. For an arbitrary strategy <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601090.png" /> the process <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601091.png" /> is, in general, not Markovian, but if <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601092.png" />, then one has a Markov process, while if <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601093.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601094.png" /> does not depend on <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601095.png" />, one has a homogeneous Markov process with jump measure equal to <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601096.png" />. Control of the process consists in selecting a strategy from one of these classes.
A typical control problem is the maximization of a functional
| <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601097.png" /> | (2) |
where <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601098.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c02601099.png" /> are bounded Borel functions on <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010100.png" /> and on <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010101.png" />, and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010102.png" /> is a fixed number. By defining suitable functions <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010103.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010104.png" /> and introducing fictitious states, a wide class of functionals (containing terms of the form <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010105.png" />, where <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010106.png" /> is a jump moment, and allowing termination of the process) can be reduced to the form (2). The value function is defined as the function
| <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010107.png" /> | (3) |
A strategy <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010108.png" /> is called <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010111.png" />-optimal if <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010112.png" /> for all <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010113.png" />, and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010118.png" />-a.e. <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010119.png" />-optimal if this is true for almost-all <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010120.png" /> relative to the measure <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010121.png" /> on <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010122.png" />. A <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010123.png" />-optimal strategy is called optimal. 
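For a given strategy the functional (2) can be estimated by Monte Carlo over simulated trajectories. The following sketch does this for a hypothetical two-state model under a stationary strategy; all rates, rewards and names are illustrative assumptions, not taken from the article.

```python
import random

# Monte Carlo estimate of a reward functional of the form (2),
#   E[ integral_0^T r(X_t, u_t) dt + g(X_T) ],
# for a hypothetical two-state model under a stationary strategy.
RATES = {0: {"a": 1.0}, 1: {"a": 0.5}}   # jump intensity out of x under u
NEXT = {0: 1, 1: 0}                       # deterministic jump target
r = lambda x, u: 1.0 if x == 1 else 0.0   # running reward rate
g = lambda x: 0.0                         # terminal reward

def estimate_functional(strategy, x0, T, n_paths, rng):
    total = 0.0
    for _ in range(n_paths):
        t, x, acc = 0.0, x0, 0.0
        while True:
            u = strategy(x)
            dt = rng.expovariate(RATES[x][u])
            if t + dt >= T:
                acc += r(x, u) * (T - t)  # reward accrued up to the horizon
                break
            acc += r(x, u) * dt           # reward over the holding time
            t, x = t + dt, NEXT[x]
        total += acc + g(x)
    return total / n_paths
```

Here the functional reduces to the expected time spent in state 1 over the horizon; more paths give a tighter estimate at the usual Monte Carlo rate.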
Now, in the model described above the interval <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010124.png" /> is shortened to <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010125.png" />, and the symbols <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010126.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010127.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010128.png" /> are used in the sense in which <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010129.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010130.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010131.png" /> were used before. Considering the jumps of the process as the succession of steps in a controlled discrete-time Markov chain one can establish the existence of a <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010132.png" />-a.e. <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010133.png" />-optimal strategy in the class <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010134.png" /> and obtain the measurability of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010135.png" /> in the form: <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010136.png" /> is an analytic set. 
This allows one to apply the ideas of dynamic programming and to derive the relation
| <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010137.png" /> | (4) |
where <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010138.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010139.png" /> (a variant of Bellman's principle). For <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010140.png" /> one obtains from (4) and (1a)–(1b) the Bellman equation
| <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010141.png" /> | (5) |
| <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010142.png" /> |
The value function <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010143.png" /> is the unique bounded function on <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010144.png" /> that is absolutely continuous in <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010145.png" /> and satisfies (5) together with the condition <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010146.png" />. Equation (5) may be solved by the method of successive approximations. For <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010147.png" /> it follows from the Kolmogorov equation for the Markov process <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010148.png" /> that if the supremum in (5) is attained by a measurable function <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010149.png" />, then the Markov strategy <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010150.png" /> is optimal.
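In a finite model the Bellman equation becomes a system of ordinary differential equations in the time variable, one per state, which can be integrated backwards from the terminal condition. The sketch below approximates such a system by a crude explicit Euler scheme on a hypothetical two-state, two-control model; the rates, rewards and names are illustrative assumptions, not taken from the article.

```python
# Hypothetical finite model (not from the article): states 0..1, controls 0..1.
# Q[x][u][y] = jump intensity from x to y under control u (off-diagonal only);
# R[x][u] = running reward rate; G[x] = terminal reward.
Q = [[[0.0, 1.0], [0.0, 2.0]],
     [[0.5, 0.0], [1.5, 0.0]]]
R = [[1.0, 0.0], [0.0, 2.0]]
G = [0.0, 0.0]

def bellman_backward(T=1.0, steps=1000):
    """Integrate the finite-model analogue of the Bellman equation,
       -dv/dt(x) = max_u [ R[x][u] + sum_y Q[x][u][y] * (v(y) - v(x)) ],
    backwards from the terminal condition v(T, x) = G[x] by explicit Euler."""
    n = len(G)
    dt = T / steps
    v = list(G)
    for _ in range(steps):
        rhs = [max(R[x][u] + sum(Q[x][u][y] * (v[y] - v[x])
                                 for y in range(n) if y != x)
                   for u in range(len(R[x])))
               for x in range(n)]
        v = [v[x] + dt * rhs[x] for x in range(n)]  # step from t down to t - dt
    return v  # approximate value function at time 0
```

The control attaining the maximum in the bracket at each `(t, x)` yields a Markov strategy, matching the optimality statement above.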
In this way the existence of optimal Markov strategies in semi-continuous models (in which <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010151.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010152.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010153.png" />, and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010154.png" /> satisfy the compactness and continuity conditions of the definition) is established, in particular for finite models (with finite <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010155.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010156.png" />). In arbitrary Borel models one can conclude the existence of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010157.png" />-a.e. <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010158.png" />-optimal Markov strategies for any <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010159.png" /> by using a measurable selection theorem (cf. Selection theorems). In countable models one obtains Markov <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010160.png" />-optimal strategies. 
The results can partly be extended to the case when <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010161.png" /> and the functions <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010162.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010163.png" /> are unbounded, but in general the sufficiency of Markov strategies, i.e. optimality of Markov strategies in the class <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010164.png" />, has not been proved.
For homogeneous models, where <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010165.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010166.png" /> do not depend on <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010167.png" />, one considers along with (2) the functionals
| <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010168.png" /> | (6) |
| <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010169.png" /> | (7) |
and moreover poses the question of sufficiency for the class <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010170.png" />. If <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010171.png" /> and the Borel function <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010172.png" /> is bounded, then equation (5) for the functional (6) becomes
| <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010173.png" /> | (8) |
This equation coincides with Bellman's equation for the analogous discrete-time problem and has a unique bounded solution. If the supremum in (8) is attained for <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010174.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010175.png" />, then <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010176.png" /> is optimal. Results on the existence of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010177.png" />-optimal strategies in the class <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010178.png" />, analogous to those mentioned above, can also be obtained. For the criterion (7) complete results have been obtained only for finite models and for special classes of ergodic controlled jump Markov processes, as in the analogous discrete-time cases: one can choose <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010179.png" /> and a function <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010180.png" /> in <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010181.png" /> such that <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010182.png" /> is optimal for the criterion (2) simultaneously for all <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010183.png" />, and hence optimal for the criterion (7).
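Since the discounted equation has the same form as the Bellman equation of a discrete-time problem, its unique bounded solution can be found by iterating the corresponding contraction mapping, and a stationary strategy attaining the maximum is optimal. A minimal sketch on a hypothetical two-state, two-action model (all numbers and names are illustrative assumptions):

```python
# Hypothetical discounted discrete-time model (not from the article):
# P[x][u][y] = transition probability, R[x][u] = one-step reward,
# BETA = discount factor in (0, 1).
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.5, 0.5], [0.1, 0.9]]]
R = [[1.0, 0.0], [0.0, 2.0]]
BETA = 0.9

def value_iteration(tol=1e-10, max_iter=10_000):
    """Iterate v <- Tv with (Tv)(x) = max_u [ R[x][u] + BETA * sum_y P[x][u][y] v(y) ].
    T is a BETA-contraction, so v converges to the unique bounded fixed point."""
    n = len(R)
    v = [0.0] * n
    for _ in range(max_iter):
        w = [max(R[x][u] + BETA * sum(P[x][u][y] * v[y] for y in range(n))
                 for u in range(len(R[x])))
             for x in range(n)]
        if max(abs(a - b) for a, b in zip(v, w)) < tol:
            return w
        v = w
    return v

def greedy_strategy(v):
    """A stationary strategy attaining the maximum in the fixed-point equation."""
    n = len(R)
    return [max(range(len(R[x])),
                key=lambda u: R[x][u] + BETA * sum(P[x][u][y] * v[y]
                                                   for y in range(n)))
            for x in range(n)]
```

Successive approximation converges geometrically at rate `BETA`, which is why the discounted case is the best understood.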
Controlled diffusion process.
This is a continuous controlled random process in a <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010184.png" />-dimensional Euclidean space <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010185.png" />, admitting a stochastic differential with respect to a certain Wiener process which enters exogenously. The theory of controlled diffusion processes arose as a generalization of the theory of controlled deterministic systems represented by equations of the form <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010186.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010187.png" />, where <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010188.png" /> is the state of the system and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010189.png" /> is the control parameter.
For a formal description of controlled diffusion processes one uses the language of Itô stochastic differential equations. Let <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010190.png" /> be a complete probability space and let <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010191.png" /> be an increasing family of complete <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010192.png" />-algebras <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010193.png" /> contained in <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010194.png" />. Let <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010195.png" /> be a <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010196.png" />-dimensional Wiener process relative to <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010197.png" />, defined on <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010198.png" /> for <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010199.png" /> (i.e. 
a process for which <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010200.png" /> is the <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010201.png" />-dimensional continuous standard Wiener process for each <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010202.png" />, the processes <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010203.png" /> are independent, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010204.png" /> is <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010205.png" />-measurable for each <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010206.png" />, and for <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010207.png" /> the random variables <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010208.png" /> are independent of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010209.png" />). Let <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010210.png" /> be a separable metric space. 
For <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010211.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010212.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010213.png" /> two functions <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010214.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010215.png" /> are assumed to be given, where <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010216.png" /> is a <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010217.png" />-dimensional matrix and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010218.png" /> is a <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010219.png" />-dimensional vector. 
Assume that <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010220.png" /> are Borel functions in <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010221.png" /> satisfying a Lipschitz condition for <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010222.png" /> with constants not depending on <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010223.png" /> and such that <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010224.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010225.png" /> are bounded. An arbitrary process <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010226.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010227.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010228.png" />, progressively measurable relative to <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010229.png" /> and taking values in <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010230.png" />, is called a strategy (or control); <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010231.png" /> denotes the set of all strategies. 
For every <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010232.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010233.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010234.png" />, there exists a unique solution of the Itô stochastic differential equation
| <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010235.png" /> | (9) |
| <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010236.png" /> |
(Itô's theorem). This solution, denoted by <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010237.png" />, is called a controlled diffusion process (controlled process of diffusion type); it is controlled by selection of the strategy <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010238.png" />. Besides strategies in <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010239.png" /> one can consider other classes of strategies. Let <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010240.png" /> be the space of continuous functions on <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010241.png" /> with values in <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010242.png" />. The semi-axis <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010243.png" /> may be interpreted as the set of values of the time <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010244.png" />. Elements of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010245.png" /> are denoted by <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010246.png" />. 
Further, let <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010247.png" /> be the smallest <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010248.png" />-algebra of subsets of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010249.png" /> relative to which the coordinate functions <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010250.png" /> for <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010251.png" /> in the space <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010252.png" /> are measurable. A function <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010253.png" /> with values in <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010254.png" /> is called a natural strategy, or natural control, admissible at the point <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010255.png" /> if it is progressively measurable relative to <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010256.png" /> and if for <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010257.png" /> there exists at least one solution of the equation (9) that is progressively measurable relative to <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010258.png" />. 
The set of all natural strategies admissible at <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010259.png" /> is denoted by <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010260.png" />; its subset consisting of all natural strategies of the form <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010261.png" /> is denoted by <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010262.png" /> and is called the set of Markov strategies, or Markov controls, admissible at the point <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010263.png" />. One can say that a natural strategy defines a control at the moment of time <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010264.png" /> on the basis of observations of the process <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010265.png" /> on the time interval <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010266.png" />, while a Markov strategy defines a control on the basis of observations of the process only at the moment of time <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010267.png" />. For <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010268.png" /> (even for <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010269.png" />) the solution of (9) need not be unique. 
Therefore, for every <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010270.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010271.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010272.png" />, one arbitrarily fixes some solution of (9) and denotes it by <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010273.png" />.
Then, using the formula <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010274.png" />, one defines an imbedding <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010275.png" /> for which <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010276.png" /> (a.e.).
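The construction of a controlled process of this type can be illustrated numerically. The following is a minimal sketch (assuming a one-dimensional state, with the concrete strategy and coefficient functions invented for illustration and not taken from the article) of simulating a solution of an equation of type (9) under a Markov strategy by the Euler–Maruyama scheme:

```python
import numpy as np

def euler_maruyama(alpha, b, sigma, x0, T=1.0, n=1000, seed=0):
    """Simulate dX = b(alpha(t, X), X) dt + sigma(alpha(t, X), X) dW
    under a Markov strategy alpha by the Euler-Maruyama scheme."""
    rng = np.random.default_rng(seed)
    dt = T / n
    x = np.empty(n + 1)
    x[0] = x0
    for k in range(n):
        t = k * dt
        a = alpha(t, x[k])                 # Markov control: depends on (t, x_t) only
        dw = rng.normal(0.0, np.sqrt(dt))  # Wiener increment over [t, t + dt]
        x[k + 1] = x[k] + b(a, x[k]) * dt + sigma(a, x[k]) * dw
    return x

# hypothetical example: the strategy pushes the state toward the origin
path = euler_maruyama(alpha=lambda t, x: -np.sign(x),
                      b=lambda a, x: a,
                      sigma=lambda a, x: 1.0,
                      x0=2.0)
```

A natural (non-Markov) strategy would instead be a functional of the whole observed path up to time `t`; the sketch above covers only the Markov case.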
The aim of the control is normally to maximize or minimize the expectation of some functional of the trajectory <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010277.png" />. A general formulation is as follows. On <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010278.png" /> let Borel functions <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010279.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010280.png" /> be defined, and let a Borel function <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010281.png" /> be defined on <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010282.png" />. For <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010283.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010284.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010285.png" /> one denotes by <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010286.png" /> the first exit time of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010287.png" /> from <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010288.png" />, and puts
| <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010289.png" /> | (10) |
where the indices <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010290.png" /> attached to the expectation sign indicate that they are to be inserted under the expectation sign as needed. This leads to the problem of determining a strategy <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010291.png" /> maximizing <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010292.png" />, and of determining the value function
| <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010293.png" /> | (11) |
A strategy <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010294.png" /> for which <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010295.png" /> is called <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010298.png" />-optimal for the point <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010299.png" />. An optimal strategy is a <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010300.png" />-optimal strategy. If in (11) the set <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010301.png" /> is replaced by <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010302.png" /> (<img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010303.png" />), then the corresponding least upper bound is denoted by <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010304.png" /> (<img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010305.png" />). Since one has the inclusion <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010306.png" />, it follows that <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010307.png" />. Under reasonably wide assumptions (cf. 
[3]) it is known that <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010308.png" /> (this is so if, e.g., <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010309.png" /> are continuous in <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010310.png" />, continuous in <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010311.png" /> uniformly in <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010312.png" /> for every <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010313.png" /> and if <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010314.png" /> are bounded in absolute value by <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010315.png" /> for all <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010316.png" />, where <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010317.png" /> do not depend on <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010318.png" />). The question of the equality <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010319.png" /> in general situations is still open. A formal application of the ideas of dynamic programming reduces this to the so-called Bellman principle:
| <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010320.png" /> | (12) |
| <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010321.png" /> |
where <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010322.png" /> are arbitrarily defined stopping times (cf. Markov moment) not exceeding <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010323.png" />. If in (12) one replaces <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010324.png" /> by <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010325.png" />, and applies Itô's formula to <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010326.png" />, then after some non-rigorous arguments one arrives at the Bellman equation:
| <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010327.png" /> | (13) |
where
| <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010328.png" /> | (14) |
| <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010329.png" /> |
and where the indices <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010330.png" /> are assumed to be summed from 1 to <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010331.png" />; the matrix <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010332.png" /> is defined by
| <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010333.png" /> |
Bellman's equation plays a central role in the theory of controlled diffusion processes, since it often turns out that a sufficiently "good" solution of it, equal to <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010334.png" /> on <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010335.png" />, is the value function, while if <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010336.png" /> for every <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010337.png" /> realizes the least upper bound in (13) and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010338.png" /> is a Markov strategy admissible at <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010339.png" />, then the strategy <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010340.png" /> is optimal at the point <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010341.png" />. Thus one can sometimes show that <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010342.png" />.
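When a sufficiently "good" solution of the Bellman equation is available only numerically, on a grid, the maximizing control of (13) can still be extracted pointwise. The sketch below (one-dimensional state; the grid, the finite control set and the coefficient functions `b`, `sigma`, `f` are all illustrative assumptions) approximates the derivatives of the candidate value function by finite differences and picks, at each grid point, a control realizing the least upper bound:

```python
import numpy as np

def extract_markov_strategy(v, xs, controls, b, sigma, f):
    """At each grid point pick the control maximizing the expression under
    the sup in the Bellman equation, with v_x and v_{xx} approximated by
    finite differences.  A sketch only: v is a 1-D array of values of a
    candidate value function at a fixed time on the grid xs."""
    dx = xs[1] - xs[0]
    vx = np.gradient(v, dx)       # first derivative of v
    vxx = np.gradient(vx, dx)     # second derivative of v
    best = np.empty(len(xs))
    for i, x in enumerate(xs):
        # candidates: (sigma^2/2) v_xx + b v_x + f, one per control
        vals = [0.5 * sigma(a, x) ** 2 * vxx[i] + b(a, x) * vx[i] + f(a, x)
                for a in controls]
        best[i] = controls[int(np.argmax(vals))]
    return best

# hypothetical example: v(x) = x^2, drift b(a, x) = a, unit diffusion,
# zero running reward; the extracted control is sign(x) away from the origin
xs = np.linspace(-2.0, 2.0, 41)
strategy = extract_markov_strategy(xs ** 2, xs, [-1.0, 1.0],
                                   b=lambda a, x: a,
                                   sigma=lambda a, x: 1.0,
                                   f=lambda a, x: 0.0)
```

If the resulting function of `(t, x)` is an admissible Markov strategy, the statement quoted above says it is optimal at the corresponding point.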
A rigorous proof of such results meets with serious difficulties, connected with the non-linear character of equation (13), which in general is a non-linear degenerate parabolic equation. The simplest case is that in which (13) is a non-degenerate quasi-linear equation (the matrix <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010343.png" /> does not depend on <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010344.png" /> and is uniformly non-degenerate in <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010345.png" />). Here, under certain additional restrictions on <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010346.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010347.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010348.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010349.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010350.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010351.png" /> one can make use of results from the theory of quasi-linear parabolic equations to prove the solvability of (13) in Hölder classes of functions and to give a method for constructing <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010352.png" />-optimal strategies, based on a solution of (13). An analogous approach can be used (cf. 
[3]) in the one-dimensional case when <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010353.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010354.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010355.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010356.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010357.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010358.png" /> are bounded and do not depend on <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010359.png" />, and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010360.png" /> is uniformly bounded away from zero. In this case (13) reduces to a second-order quasi-linear equation on <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010361.png" />, such that <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010362.png" /> and (13) can be solved for its highest derivative <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010363.png" />. 
Methods of the theory of differential equations help in the study of (13) even if <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010364.png" />, where <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010365.png" /> is a two-dimensional domain, and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010366.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010367.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010368.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010369.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010370.png" /> do not depend on <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010371.png" /> (cf. [3]). Here, as in previous cases, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010372.png" /> is allowed to depend on <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010373.png" />. It is relevant also to mention the case of the Hamilton–Jacobi equation <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010374.png" />, which may be studied by methods of the theory of differential equations (cf. [5]).
By methods of the theory of stochastic processes one can show that the value function <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010375.png" /> satisfies equation (13) in more general cases under certain types of smoothness assumptions on <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010376.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010377.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010378.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010379.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010380.png" /> if <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010381.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010382.png" /> (cf. [3]).
Along with problems of controlled motion, one can also consider optimal stopping of the controlled processes for one or two persons, e.g. maximization over <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010383.png" /> and an arbitrary stopping time <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010384.png" /> of a value functional of the form:
| <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010385.png" /> |
| <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010386.png" /> |
Related to the theory of controlled diffusion processes are controlled partially-observable processes and problems of control of stochastic processes, in which the control is realizable by the selection of a measure on <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010387.png" /> from a given class of measures, corresponding to processes of diffusion type (cf. [3], [4], [6], [7], [8]).
References
| [1] | I.I. Gikhman, A.V. Skorokhod, "Controlled stochastic processes" , Springer (1977) (Translated from Russian) |
| [2] | A.A. Yushkevich, "Controlled jump Markov models" Theory Probab. Appl. , 25 (1980) pp. 244–266; Teor. Veroyatnost. i Primenen. , 25 (1980) pp. 247–270 |
| [3] | N.V. Krylov, "Controlled diffusion processes" , Springer (1980) (Translated from Russian) |
| [4] | W.H. Fleming, R.W. Rishel, "Deterministic and stochastic optimal control" , Springer (1975) |
| [5] | S.N. Kruzhkov, "Generalized solutions of the Hamilton–Jacobi equations of eikonal type. I. Statement of the problem, existence, uniqueness and stability theorems, some properties of the solutions" Mat. Sb. , 98 : 3 (1975) pp. 450–493 (In Russian) |
| [6] | R.S. Liptser, A.N. Shiryaev, "Statistics of random processes" , 1–2 , Springer (1977–1978) (Translated from Russian) |
| [7] | W.M. Wonham, "On the separation theorem of stochastic control" SIAM J. Control , 6 (1968) pp. 312–326 |
| [8] | M.H.A. Davis, "The separation principle in stochastic control via Girsanov solutions" SIAM J. Control and Optimization , 14 (1976) pp. 176–188 |
Comments
The Bellman equations mentioned above (equations (5), (13)) are in this form sometimes called Bellman–Hamilton–Jacobi equations.
A controlled diffusion process is also defined as a controlled random process, in some Euclidean space, whose measure admits a Radon–Nikodým derivative with respect to the measure of a certain Wiener process that is independent of the control.
There are many important topics in the theory of controlled stochastic processes other than those mentioned above. Controlled stepwise (jump) processes are of limited interest due to the lack of important applications. The following comments are intended to put the subject in a wider perspective and to point out some recent technical innovations.
Controlled processes in discrete time.
These are normally specified by a state transition equation
| <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010388.png" /> | (a1) |
Here <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010389.png" /> is the state at time <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010390.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010391.png" /> is the control and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010392.png" /> is a given sequence of independent, identically distributed random variables with common distribution function <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010393.png" />. If the initial state <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010394.png" /> is independent of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010395.png" /> and the control <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010396.png" /> is Markovian, i.e. <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010397.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010398.png" />, then the process <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010399.png" /> defined by
(a1) is Markovian. The control objective is normally to minimize a cost function such as
| <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010400.png" /> |
The number <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010401.png" /> of stages may be finite or infinite; <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010402.png" /> is the discount factor. The one-stage cost with terminal cost <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010403.png" /> is
| <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010404.png" /> |
| <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010405.png" /> |
Define
| <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010406.png" /> |
then the principle of dynamic programming indicates that
| <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010407.png" /> |
where <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010408.png" />, etc., and that the optimal control <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010409.png" /> is the value such that
| <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010410.png" /> |
In the infinite horizon case (<img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010411.png" />) one expects that if <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010412.png" />, then
| <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010413.png" /> |
and that <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010414.png" /> will satisfy Bellman's functional equation
| <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010415.png" /> |
The general theory of discrete-time control concerns conditions under which results of the type above can be rigorously substantiated. Generally, contraction (<img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010416.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010417.png" />) or monotonicity (<img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010418.png" />) conditions are required. <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010419.png" /> is not necessarily measurable if <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010420.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010421.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010422.png" /> are merely Borel functions. However, if these functions are lower semi-analytic, then <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010423.png" /> is lower semi-analytic and existence of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010424.png" />-optimal universally measurable policies can be proved. [a1], [a2] are excellent references for this theory.
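Under the contraction condition, Bellman's functional equation can be solved by successive approximation. The following toy sketch illustrates this for a finite state and control space (the two-state "keep/repair" model, its cost vectors and transition matrices are invented for illustration):

```python
import numpy as np

def value_iteration(cost, P, beta=0.9, tol=1e-10, max_iter=10_000):
    """Successive approximation V_{k+1} = T V_k for the Bellman operator
    (T V)(x) = min_u [ c(x, u) + beta * sum_y P_u(x, y) V(y) ],
    a contraction with modulus beta < 1 on bounded functions.
    cost[u] : one-stage cost vector, P[u] : transition matrix for control u."""
    V = np.zeros(P[0].shape[0])
    for _ in range(max_iter):
        Q = np.array([cost[u] + beta * P[u] @ V for u in range(len(P))])
        V_new = Q.min(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            V = V_new
            break
        V = V_new
    Q = np.array([cost[u] + beta * P[u] @ V for u in range(len(P))])
    return V, Q.argmin(axis=0)

# hypothetical two-state machine: state 0 = "good", state 1 = "bad";
# control 0 = "keep" (free while good, costly while bad), control 1 = "repair"
cost = [np.array([0.0, 2.0]), np.array([1.0, 1.0])]
P = [np.array([[0.9, 0.1], [0.0, 1.0]]),   # keep: good may degrade, bad stays bad
     np.array([[1.0, 0.0], [1.0, 0.0]])]   # repair: always returns to good
V, policy = value_iteration(cost, P)
```

In this toy model the computed policy keeps the machine while it is good and repairs it once it is bad, and the limit `V` satisfies Bellman's functional equation to within the tolerance.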
Viscosity solutions of the Bellman equations.
Return to the controlled diffusion problem (9), (10) and write the Bellman equation (a1) as
| <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010425.png" /> | (a2) |
where <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010426.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010427.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010428.png" /> coincides with the left-hand side of (13). As pointed out in the main article, it is a difficult matter to decide in which sense, if any, the value function, defined by (12), satisfies (a2). The concept of viscosity solutions of the Bellman equation, introduced for first-order equations in [a3], provides an answer to this question. A function <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010429.png" /> is a viscosity solution of (a2) if for all <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010430.png" />,
| <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010431.png" /> |
| <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010432.png" /> |
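Since the two displayed conditions are rendered only as images, the standard form of the definition may be worth restating; the version below follows the usual convention of [a3] (sign conventions vary between authors), writing (a2) abstractly as $F(x, v, Dv, D^{2}v) = 0$:

```latex
% Standard definition of a viscosity solution (after [a3]);
% sign conventions differ between authors.
\text{A function } v \in C(\mathbf{R}^{n}) \text{ is a viscosity solution if,
for every } \varphi \in C^{2}(\mathbf{R}^{n}):
\begin{aligned}
F\bigl(x_{0}, v(x_{0}), D\varphi(x_{0}), D^{2}\varphi(x_{0})\bigr) &\le 0
  &&\text{at each local maximum } x_{0} \text{ of } v - \varphi,\\
F\bigl(x_{0}, v(x_{0}), D\varphi(x_{0}), D^{2}\varphi(x_{0})\bigr) &\ge 0
  &&\text{at each local minimum } x_{0} \text{ of } v - \varphi.
\end{aligned}
```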
Note that any <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010433.png" /> solution of (a2) is a viscosity solution and that if a viscosity solution is <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010434.png" /> at some point <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010435.png" />, then (a2) is satisfied at <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010436.png" />. It is possible to show in great generality that if the value function <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010437.png" /> of (12) is continuous, then it is a viscosity solution of (a1). [a4a], [a4b] can be consulted for a proof of this result, conditions under which <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010438.png" /> is continuous, and other results on uniqueness and regularity of viscosity solutions.
A probabilistic approach.
The theory of controlled diffusion is intimately connected with partial differential equations. However the most general results on existence of optimal controls and on stochastic maximum principles (see below) can be obtained by purely probabilistic methods. This is described below for the diffusion model (9) where <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010439.png" /> does not depend on <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010440.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010441.png" /> is uniformly positive definite. In this case a weak solution of (9) can be defined for any feedback control <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010442.png" />; denote by <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010443.png" /> the set of such controls and by <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010444.png" /> the expectation with respect to the sample space measure for <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010445.png" /> when <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010446.png" />. Suppose that the pay-off (see also Gain function) to be maximized is
| <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010447.png" /> |
where <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010448.png" /> is a fixed time. Define
| <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010449.png" /> |
where <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010450.png" />. Let a scalar process <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010451.png" /> be defined by
| <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010452.png" /> |
Thus, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010453.png" /> is the maximal expected total pay-off given the control chosen and the evolution of the process up to time <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010454.png" />. It is possible to show that <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010455.png" /> is always a supermartingale (cf. Martingale) and that <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010456.png" /> is a martingale if and only if <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010457.png" /> is optimal. <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010458.png" /> has the Doob–Meyer decomposition <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010459.png" />, where <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010460.png" /> is a martingale and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010461.png" /> is a continuous increasing process. Thus <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010462.png" /> is optimal if and only if <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010463.png" />. By the martingale representation theorem (cf. Martingale), <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010464.png" /> can always be written in the form
| <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010465.png" /> |
where <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010466.png" /> is the Wiener process appearing in the weak solution of (9) with control <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010467.png" />. It is easily shown that <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010468.png" /> does not depend on <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010469.png" /> and that the relation between <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010470.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010471.png" /> for <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010472.png" /> is
| <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010473.png" /> |
where
| <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010474.png" /> |
This immediately gives a maximum principle: if <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010475.png" /> is optimal, then <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010476.png" />; but <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010477.png" /> is increasing, so it must be the case that <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010478.png" /> a.e., which implies that
| <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010479.png" /> | (a3) |
One also gets an existence theorem: Since <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010480.png" /> is the same for all controls one can construct an optimal control <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010481.png" /> by taking
| <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010482.png" /> |
Similar techniques can be applied to very general classes of controlled stochastic differential systems (not just controlled diffusion) and to optimal stopping and impulse control problems (see below). General references are [a5], [a6]. Some of this theory has also been developed using methods of non-standard analysis [a7].
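The supermartingale/martingale characterization of optimality has an elementary discrete-time analogue that can be checked numerically. In the sketch below (a hypothetical finite-horizon, two-state, two-action chain, not the diffusion model (9)), the value process is computed by backward induction; the conditional expectation of the next value never exceeds the current value under any action, with equality precisely for maximizing actions:

```python
import numpy as np

# Hypothetical finite-horizon controlled chain (illustrative data only).
P = np.array([[[0.9, 0.1], [0.2, 0.8]],    # P[a, x, y] = transition prob.
              [[0.5, 0.5], [0.7, 0.3]]])
g = np.array([1.0, 3.0])                   # terminal pay-off g(x_T)
T = 3

# Backward dynamic programming: V[t, x] = max_a E[V_{t+1} | x_t = x, a].
V = np.zeros((T + 1, 2))
V[T] = g
for t in range(T - 1, -1, -1):
    V[t] = (P @ V[t + 1]).max(axis=0)

# gaps[t, a, x] = V[t, x] - E[V_{t+1} | x_t = x, a]; the discrete analogue of
# the supermartingale property is gaps >= 0, with equality (martingale)
# exactly for the maximizing action.
gaps = np.array([[V[t] - P[a] @ V[t + 1] for a in range(2)]
                 for t in range(T)])
```

The increasing process in the Doob–Meyer decomposition corresponds here to the accumulated positive gaps along a trajectory; it vanishes if and only if maximizing actions are used throughout.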
Stochastic maximum principle.
The necessary condition (a3) is not as it stands a true maximum principle because the "adjoint variable" <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010483.png" /> is only implicitly characterized. It is shown in [a8] and elsewhere that under wide conditions <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010484.png" /> is given by
| <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010485.png" /> |
where <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010486.png" /> is an optimal control and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010487.png" /> is the fundamental solution of the linearized or derivative system corresponding to (9) with control <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010488.png" />, i.e. it satisfies
| <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010489.png" /> |
| <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010490.png" /> |
This gives the stochastic maximum principle in a form which is directly analogous to the Pontryagin maximum principle of deterministic optimal control theory.
Impulse control.
In many important applications, control is not exercised continuously, but rather a sequence of "interventions" is made at isolated instants of time. The theory of impulse control is a mathematical formulation of this kind of problem. Let <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010491.png" /> be a homogeneous Markov process on a state space <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010492.png" />, where <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010493.png" /> (the set of right-continuous <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010494.png" />-valued functions having limits from the left). Let <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010495.png" /> be the corresponding semi-group: <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010496.png" />. Informally, a controlled process <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010497.png" /> is defined as follows. A strategy <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010498.png" /> is a sequence <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010499.png" /> of random times <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010500.png" /> and states <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010501.png" />, with <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010502.png" /> strictly increasing.
<img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010503.png" /> starts at some fixed point <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010504.png" /> and follows a realization of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010505.png" /> up to time <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010506.png" />. At <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010507.png" /> the position of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010508.png" /> is moved to <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010509.png" />, then follows a realization of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010510.png" /> starting at <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010511.png" /> up to time <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010512.png" />; etc. A filtered probability space <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010513.png" /> carrying <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010514.png" /> is constructed in such a way that <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010515.png" /> is adapted to <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010516.png" /> (cf. 
Optional random process) and for each <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010517.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010518.png" /> is a <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010519.png" /> stopping time and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010520.png" /> is <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010521.png" />-measurable. It is convenient to formulate the optimization problem in terms of minimizing a cost function <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010522.png" />, which generally takes the form
| <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010523.png" /> |
Suppose that <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010524.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010525.png" />; this rules out strategies having more than a finite number of interventions in bounded time intervals. The value function is
| <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010526.png" /> |
Define the operator <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010527.png" /> by
| <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010528.png" /> |
When <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010529.png" /> is compact and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010530.png" /> is a Feller process it is possible to show [a9] that <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010531.png" /> is continuous and that <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010532.png" /> is the largest continuous function satisfying
| <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010533.png" /> | (a4) |
| <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010534.png" /> | (a5) |
The optimal strategy <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010535.png" /> is:
| <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010536.png" /> |
| <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010537.png" /> |
Thus, the state space <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010538.png" /> divides into a continuation set, where <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010539.png" />, and an intervention set, where <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010540.png" />. Further, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010541.png" /> is the unique solution of
| <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010542.png" /> | (a6) |
where the infimum is taken over the set of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010543.png" /> stopping times <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010544.png" />. This shows the close connection between impulse control and optimal stopping: (a6) is an optimal stopping problem for the process <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010545.png" /> with implicit obstacle <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010546.png" />. Similar results are obtained for right processes in [a6], [a10]; the measurability properties here are more delicate. There is also a well-developed analytic theory of impulse control. Assuming <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010547.png" />, where <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010548.png" /> is the differential generator of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010549.png" />, one obtains from (a5)
| <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010550.png" /> | (a7) |
Further, equality holds in at least one of (a4), (a7) at each <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010551.png" />, i.e.
| <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010552.png" /> | (a8) |
Equations (a4), (a7), (a8) characterize <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010553.png" /> and have been extensively studied for diffusion processes (i.e. when <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010554.png" /> is a second-order differential operator) using the method of quasi-variational inequalities [a11]. Existence and regularity properties are obtained.
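In a finite-state, discounted discrete-time analogue, the system (a4), (a7), (a8) collapses to the fixed-point relation u = min(f + βPu, Mu) with (Mu)(x) = K + min_y u(y), which can be solved by successive approximation. A sketch under these assumptions (all data hypothetical; the article's setting is continuous time):

```python
import numpy as np

# Hypothetical discrete-state impulse control problem: at each step either
# continue (pay f(x), move by P, discount beta) or intervene (pay fixed cost
# K and jump instantaneously to the cheapest state).
P = np.array([[0.5, 0.5, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.4, 0.6]])            # uncontrolled transition matrix
f = np.array([0.2, 1.0, 5.0])              # running cost f(x)
K = 1.5                                    # fixed intervention cost
beta = 0.9                                 # discount factor

u = np.zeros(3)
for _ in range(1000):
    Mu = K + u.min()                       # (Mu)(x) = K + min_y u(y)
    u = np.minimum(f + beta * (P @ u), Mu) # QVI fixed-point iteration

# Intervention set: states where u = Mu, i.e. acting at once is optimal.
intervention_set = np.isclose(u, K + u.min())
```

The converged u splits the state space into a continuation set, where u < Mu, and an intervention set, where u = Mu, mirroring (a8).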
Control of applied non-diffusion models.
Many applied problems in operations research — for example in queueing systems or inventory control — involve optimization of non-diffusion stochastic models. These are generalizations of the jump process described in the main article allowing for non-constant trajectories between jumps and for various sorts of boundary behaviour. There have been various attempts to create a unified theory for such problems: piecewise-deterministic Markov processes [a12], Markov decision drift processes [a13], [a14]. Both continuous and impulse control are studied, as well as discretization methods and computational techniques.
Control of partially-observed processes.
This subject is still far from completely understood, despite important recent advances. It is closely related to the theory of non-linear filtering. Consider a controlled diffusion as in (9), where control must be based on observations of a scalar process <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010555.png" /> given by
| <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010556.png" /> | (a9) |
(<img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010557.png" /> is another independent Wiener process and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010558.png" /> is, say, bounded), with a pay-off functional of the form
| <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010559.png" /> |
to be maximized. This problem can be formulated in the following way. Let <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010560.png" /> be independent Wiener processes on some probability space <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010561.png" /> and let <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010562.png" /> be the natural filtration of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010563.png" />. The admissible controls <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010564.png" /> are all <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010565.png" />-valued processes <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010566.png" /> adapted to <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010567.png" />. Under standard conditions (9) has a unique strong solution for <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010568.png" />. Now define a measure <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010569.png" /> on <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010570.png" /> by
| <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010571.png" /> |
By Girsanov's theorem, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010572.png" /> is a probability measure and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010573.png" /> is a Wiener process under the measure <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010574.png" />. Thus <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010575.png" /> satisfy (9), (a9) on <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010576.png" />. For any function <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010577.png" />, put <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010578.png" />. According to the Kallianpur–Striebel formula, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010579.png" /> where 1 denotes the function <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010580.png" /> and
| <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010581.png" /> |
<img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010582.png" /> can be thought of as a non-normalized conditional distribution of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010583.png" /> given <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010584.png" />; it satisfies the Zakai equation
| <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010585.png" /> | (a10) |
where <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010586.png" /> is given by (14) with <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010587.png" />. It follows from the properties of conditional mathematical expectation that <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010588.png" /> can be expressed in the form
| <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010589.png" /> |
where <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010590.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010591.png" />. This shows that the partially-observed problem (9), (a9) is equivalent to a problem (a10),
with complete observations on the probability space <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010592.png" /> where the controlled process is the measure-valued diffusion <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/c/c026/c026010/c026010593.png" />. The question of existence of optimal controls has been extensively studied. It seems that optimal controls do exist, but only if some form of randomization is introduced; see [a7], [a15], [a16]. In addition, maximum principles have been obtained [a17], [a18] and some preliminary study of the Bellman equation undertaken [a19].
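The reduction to a completely-observed problem driven by the unnormalized conditional distribution has a simple finite-state counterpart: for a hidden Markov chain the analogue of the Zakai equation (a10) is a linear recursion for the unnormalized filter, normalized at the end as in the Kallianpur–Striebel formula. A sketch under these assumptions (all model data illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hidden 2-state Markov chain observed through dy = h(x) dt + dxi
# (discrete-time Euler analogue of (a9); illustrative data only).
P = np.array([[0.95, 0.05], [0.10, 0.90]])  # signal transition matrix
h = np.array([-1.0, 1.0])                    # observation drift h(x)
dt, n = 0.01, 2000

x = 0
rho = np.array([0.5, 0.5])                   # unnormalized cond. distribution
for _ in range(n):
    x = rng.choice(2, p=P[x])                # hidden state step
    dy = h[x] * dt + np.sqrt(dt) * rng.standard_normal()
    # Prediction step, then multiplication by the Girsanov-type observation
    # likelihood: the linear (Zakai-type) recursion for rho.
    rho = (P.T @ rho) * np.exp(h * dy - 0.5 * h**2 * dt)
    rho /= rho.sum()                         # rescaling for numerics only

pi = rho / rho.sum()                         # Kallianpur–Striebel normalization
```

In the article's setting the same recursion becomes the stochastic partial differential equation (a10), and pi plays the role of the measure-valued state of the equivalent completely-observed problem.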
References
| [a1] | D.P. Bertsekas, S.E. Shreve, "Stochastic optimal control: the discrete-time case" , Acad. Press (1978) |
| [a2] | E.B. Dynkin, A.A. Yushkevich, "Controlled Markov processes" , Springer (1979) |
| [a3] | M.G. Crandall, P.L. Lions, "Viscosity solutions of Hamilton–Jacobi equations" Trans. Amer. Math. Soc. , 277 (1983) pp. 1–42 |
| [a4a] | P.L. Lions, "Optimal control of diffusion processes and Hamilton–Jacobi–Bellman equations Part I" Comm. Partial Differential Eq. , 8 (1983) pp. 1101–1134 |
| [a4b] | P.L. Lions, "Optimal control of diffusion processes and Hamilton–Jacobi–Bellman equations Part II" Comm. Partial Differential Eq. , 8 (1983) pp. 1229–1276 |
| [a5] | R.J. Elliott, "Stochastic calculus and applications" , Springer (1982) |
| [a6] | N. El Karoui, "Les aspects probabilistes du contrôle stochastique" , Lect. notes in math. , 876 , Springer (1980) |
| [a7] | S. Albeverio, J.E. Fenstad, R. Høegh-Krohn, T. Lindstrøm, "Nonstandard methods in stochastic analysis and mathematical physics" , Acad. Press (1986) |
| [a8] | U.G. Haussmann, "A stochastic maximum principle for optimal control of diffusions" , Pitman (1986) |
| [a9] | M. Robin, "Contrôle impulsionel des processus de Markov" , Univ. Paris IX (1978) (Thèse d'Etat) |
| [a10] | J.P. Lepeltier, B. Marchal, "Théorie générale du contrôle impulsionnel Markovien" SIAM. J. Control and Optimization , 22 (1984) pp. 645–665 |
| [a11] | A. Bensoussan, J.L. Lions, "Impulse control and quasi-variational inequalities" , Gauthier-Villars (1984) |
| [a12] | M.H.A. Davis, "Piecewise-deterministic Markov processes: a general class of non-diffusion stochastic models" J. Royal Statist. Soc. (B) , 46 (1984) pp. 353–388 |
| [a13] | F.A. van der Duyn Schouten, "Markov decision drift processes" , CWI , Amsterdam (1983) |
| [a14] | A.A. Yushkevich, "Continuous-time Markov decision processes with intervention" Stochastics , 9 (1983) pp. 235–274 |
| [a15] | W.H. Fleming, E. Pardoux, "Optimal control for partially-observed diffusions" SIAM J. Control and Optimization , 20 (1982) pp. 261–285 |
| [a16] | V.S. Borkar, "Existence of optimal controls for partially-observed diffusions" Stochastics , 11 (1983) pp. 103–141 |
| [a17] | A. Bensoussan, "Maximum principle and dynamic programming approaches of the optimal control of partially-observed diffusions" Stochastics , 9 (1983) pp. 169–222 |
| [a18] | U.G. Haussmann, "The maximum principle for optimal control of diffusions with partial information" SIAM J. Control and Optimization , 25 (1987) pp. 341–361 |
| [a19] | V.E. Beneš, I. Karatzas, "Filtering of diffusions controlled through their conditional measures" Stochastics , 13 (1984) pp. 1–23 |
| [a20] | D.P. Bertsekas, "Dynamic programming and stochastic control" , Acad. Press (1976) |
| [a21] | H.J. Kushner, "Stochastic stability and control" , Acad. Press (1967) |
| [a22] | C. Striebel, "Optimal control of discrete time stochastic systems" , Lect. notes in econom. and math. systems , 110 , Springer (1975) |
| [a23] | P.L. Lions, "On the Hamilton–Jacobi–Bellmann equations" Acta Appl. Math. , 1 (1983) pp. 17–41 |
| [a24] | M. Robin, "Long-term average cost control problems for continuous time Markov processes. A survey" Acta Appl. Math. , 1 (1983) pp. 281–299 |
