Decentralized partially observable Markov decision process

The decentralized partially observable Markov decision process (Dec-POMDP)[1][2] is a model for coordination and decision-making among multiple agents. It is a probabilistic model that can consider uncertainty in outcomes, sensors and communication (i.e., costly, delayed, noisy or nonexistent communication).

It is a generalization of a Markov decision process (MDP) and a partially observable Markov decision process (POMDP) to consider multiple decentralized agents.[3]

Definition

Formal definition

A Dec-POMDP is a 7-tuple [math]\displaystyle{ (S,\{A_i\},T,R,\{\Omega_i\},O,\gamma) }[/math], where

  • [math]\displaystyle{ S }[/math] is a set of states,
  • [math]\displaystyle{ A_i }[/math] is a set of actions for agent i, with [math]\displaystyle{ A=\times_i A_i }[/math] the set of joint actions,
  • [math]\displaystyle{ T }[/math] is a set of conditional transition probabilities between states, [math]\displaystyle{ T(s,a,s')=P(s'\mid s,a) }[/math],
  • [math]\displaystyle{ R: S \times A \to \mathbb{R} }[/math] is the reward function,
  • [math]\displaystyle{ \Omega_i }[/math] is a set of observations for agent i, with [math]\displaystyle{ \Omega=\times_i \Omega_i }[/math] the set of joint observations,
  • [math]\displaystyle{ O }[/math] is a set of conditional observation probabilities [math]\displaystyle{ O(s',a, o)=P(o\mid s',a) }[/math], and
  • [math]\displaystyle{ \gamma \in [0, 1] }[/math] is the discount factor.
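
As a purely illustrative sketch, the tuple above can be held in a small Python container. All names here (the DecPOMDP class, the agent_actions and observation fields, the toy "good"/"bad" problem, and the 0.8 observation accuracy) are hypothetical choices made for this example; they are not part of any standard library or of the cited references.

 import itertools
 from dataclasses import dataclass
 from typing import Callable, List, Tuple
 
 JointAction = Tuple[str, ...]
 JointObs = Tuple[str, ...]
 
 @dataclass
 class DecPOMDP:
     """Container mirroring the tuple (S, {A_i}, T, R, {Omega_i}, O, gamma)."""
     states: List[str]                                           # S
     agent_actions: List[List[str]]                              # A_i for each agent i
     transition: Callable[[str, JointAction, str], float]        # T(s, a, s') = P(s' | s, a)
     reward: Callable[[str, JointAction], float]                 # R(s, a)
     agent_observations: List[List[str]]                         # Omega_i for each agent i
     observation: Callable[[str, JointAction, JointObs], float]  # O(s', a, o) = P(o | s', a)
     gamma: float                                                # discount factor in [0, 1]
 
     def joint_actions(self) -> List[JointAction]:
         """A = x_i A_i, the Cartesian product of the individual action sets."""
         return list(itertools.product(*self.agent_actions))
 
     def joint_observations(self) -> List[JointObs]:
         """Omega = x_i Omega_i, the Cartesian product of the observation sets."""
         return list(itertools.product(*self.agent_observations))
 
 # Toy two-agent, two-state instance (illustrative only): the hidden state flips
 # only when both agents choose "toggle"; each agent receives an independent,
 # noisy reading of the new state.
 def T(s: str, a: JointAction, s_next: str) -> float:
     flips = all(ai == "toggle" for ai in a)
     return 1.0 if (s_next != s) == flips else 0.0
 
 def R(s: str, a: JointAction) -> float:
     return 1.0 if s == "good" and all(ai == "wait" for ai in a) else 0.0
 
 def O(s_next: str, a: JointAction, o: JointObs) -> float:
     # Each agent independently observes the true state with probability 0.8.
     p = 1.0
     for oi in o:
         p *= 0.8 if oi == s_next else 0.2
     return p
 
 model = DecPOMDP(
     states=["good", "bad"],
     agent_actions=[["wait", "toggle"], ["wait", "toggle"]],
     transition=T,
     reward=R,
     agent_observations=[["good", "bad"], ["good", "bad"]],
     observation=O,
     gamma=0.95,
 )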

At each time step, each agent takes an action [math]\displaystyle{ a_i \in A_i }[/math], the state updates according to the transition function [math]\displaystyle{ T(s,a,s') }[/math] (using the current state and the joint action), each agent receives an observation according to the observation function [math]\displaystyle{ O(s',a, o) }[/math] (using the next state and the joint action), and a reward is generated for the whole team according to the reward function [math]\displaystyle{ R(s,a) }[/math]. These steps repeat until some given horizon (the finite-horizon case) or forever (the infinite-horizon case). The goal is to maximize the expected cumulative reward over this finite or infinite number of steps. In the infinite-horizon case, the discount factor [math]\displaystyle{ \gamma \in [0,1) }[/math] keeps the cumulative reward finite.
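
Continuing the hypothetical sketch above (it reuses the DecPOMDP instance named model), the following loop mirrors this dynamics description: each agent selects its individual action from its own observation history only, the next state is sampled from T(s, a, ·), a joint observation is sampled from O(s', a, ·), and the discounted team reward is accumulated. The helper and policy names are again assumptions made for illustration, not standard terminology.

 import random
 
 def sample(dist):
     """Draw one outcome from a list of (outcome, probability) pairs."""
     r, acc = random.random(), 0.0
     for outcome, p in dist:
         acc += p
         if r < acc:
             return outcome
     return dist[-1][0]
 
 def simulate_episode(model, policies, s0, horizon):
     """Roll out one finite-horizon episode of the Dec-POMDP sketched above.
 
     `policies` is a list of per-agent functions, each mapping that agent's own
     observation history to its next individual action. Control is decentralized:
     no agent sees the state, the joint action, or the other agents' observations.
     Returns the discounted cumulative team reward.
     """
     s = s0
     histories = [[] for _ in policies]          # one local history per agent
     total, discount = 0.0, 1.0
     for _ in range(horizon):
         # Each agent acts on its own observation history only.
         a = tuple(pi(h) for pi, h in zip(policies, histories))
         total += discount * model.reward(s, a)  # team reward R(s, a)
         # Sample the next state from T(s, a, .).
         s_next = sample([(s2, model.transition(s, a, s2)) for s2 in model.states])
         # Sample a joint observation from O(s', a, .).
         o = sample([(obs, model.observation(s_next, a, obs))
                     for obs in model.joint_observations()])
         for i, oi in enumerate(o):
             histories[i].append(oi)             # agent i sees only its own component
         s, discount = s_next, discount * model.gamma
     return total
 
 # Example: both agents always wait, starting from the "good" state.
 wait_policy = lambda history: "wait"
 print(simulate_episode(model, [wait_policy, wait_policy], "good", horizon=10))

With both agents always choosing "wait" from the "good" state, the toy instance above never leaves the rewarding state, so the printed return is simply the discounted sum of 1 over the ten steps.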

References

  1. Bernstein, Daniel S.; Givan, Robert; Immerman, Neil; Zilberstein, Shlomo (November 2002). "The Complexity of Decentralized Control of Markov Decision Processes". Mathematics of Operations Research 27 (4): 819–840. doi:10.1287/moor.27.4.819.297. ISSN 0364-765X. 
  2. Oliehoek, Frans A.; Amato, Christopher (2016). A Concise Introduction to Decentralized POMDPs. SpringerBriefs in Intelligent Systems. doi:10.1007/978-3-319-28929-8. ISBN 978-3-319-28927-4. http://www.fransoliehoek.net/docs/OliehoekAmato16book.pdf. 
  3. Oliehoek, Frans A.; Amato, Christopher (2016-06-03). A Concise Introduction to Decentralized POMDPs. Springer. ISBN 978-3-319-28929-8. https://books.google.com/books?id=FZRPDAAAQBAJ&q=Decentralized+partially+observable+Markov+decision+process. 
