Philosophy:Functional Decision Theory

From HandWiki
Revision as of 07:35, 5 February 2024 by John Stpola (talk | contribs) (url)
Short description: Decision theory

Functional Decision Theory (FDT) is a school of thought within decision theory which holds that a rational agent confronted with a set of possible actions should select the decision procedure (a fixed mathematical decision function, as opposed to a singular act) that leads to the best outcome.[1][2] It aims to provide a more reliable method of maximizing utility (the measure of how much an outcome satisfies an agent's preferences) than the more prominent decision theories, Causal Decision Theory (CDT) and Evidential Decision Theory (EDT).

In general, CDT states that the agent should consider the causal effects of their actions in order to maximize utility;[3] in other words, it prescribes acting in the way that will produce the best consequences given the situation at hand.[4] EDT states that the agent should look at how likely certain outcomes are given their actions and observations (regardless of causality); in other words, it advises an agent to ‘do what you most want to learn that you will do.’[5] Many proponents of FDT argue that, since there are scenarios in which CDT, EDT, or both fail to prescribe the most rational choice, both theories are incorrect.[1][6]

Background

FDT was first proposed by Eliezer Yudkowsky and Nate Soares in a 2017 research paper supported by the Machine Intelligence Research Institute (MIRI).[1] Prior to this publication, Yudkowsky had proposed another, albeit similar, decision theory, which he named Timeless Decision Theory (TDT).[7] Roughly speaking, Timeless Decision Theory states that, rather than acting as if you are determining an individual decision, you should act as if you are determining the output of an abstract computation.[8] The original paper and the idea behind TDT were seen as a work in progress and drew much criticism for their vagueness.[9]

Broadly, FDT can be seen as a replacement for TDT,[10] and as a generalization of Wei Dai's Updateless Decision Theory (UDT).[11][12]

Informal description

Informally, Functional Decision Theory recommends that the agent select the decision procedure that produces the best outcome. It assumes that the agent possesses a model of her decision procedure (which a reliable predictor must also know with high certainty) and that she can alter it accordingly. As an illustration, when you type "2 + 2" into a calculator and receive the answer "4", you conclude that 2 + 2 = 4 because the calculator computes the same addition function that you have in mind. Similarly, a predictor, such as the one in Newcomblike problems, runs the agent's decision function in order to predict her actions.[2]
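The idea that agent and predictor evaluate one and the same function can be illustrated with a minimal Python sketch. The function names and the contents of the policy table are hypothetical and chosen only for illustration; they do not come from the cited sources.

```python
def decision_function(situation: str) -> str:
    """Hypothetical fixed decision procedure, shared by agent and predictor."""
    policy = {"newcomb": "one-box", "parfit_hitchhiker": "pay"}
    return policy[situation]


def agent_acts(situation: str) -> str:
    # The agent acts by running her decision function.
    return decision_function(situation)


def predictor_forecasts(situation: str) -> str:
    # The predictor forecasts by running its copy of the same function,
    # much as two calculators computing "2 + 2" must agree.
    return decision_function(situation)


for situation in ("newcomb", "parfit_hitchhiker"):
    print(situation, agent_acts(situation), predictor_forecasts(situation))
```

Because both calls evaluate the same function, changing the policy table changes the agent's act and the predictor's forecast together, which is the dependence that FDT reasons about.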

Philosophical Thought Experiments

FDT outperforms both CDT and EDT

Parfit's Hitchhiker

This problem shows a scenario in which FDT outperforms both CDT and EDT simultaneously. It states that: “An agent is dying in the desert. A driver comes along who offers to give the agent a ride into the city, but only if the agent will agree to visit an ATM once they arrive and give the driver $1,000. The driver will have no way to enforce this after they arrive, but he does have an extraordinary ability to detect lies with 99% accuracy. Being left to die causes the agent to lose the equivalent of $1,000,000. In the case where the agent gets to the city, should she proceed to visit the ATM and pay the driver?”[13]

The CDT agent says no. Given that she has safely arrived in the city, she sees nothing further to gain by paying the driver. The EDT agent agrees: on the assumption that she is already in the city, it would be bad news for her to learn that she was out $1,000. Assuming that the CDT and EDT agents are smart enough to know what they would do upon arriving in the city, this means that neither can honestly claim that they would pay. The driver, detecting the lie, leaves them in the desert to die. The prescriptions of CDT and EDT here run contrary to many people’s intuitions, which say that the most “rational” course of action is to pay upon reaching the city. Certainly if these agents had the opportunity to make binding pre-commitments to pay upon arriving, they would achieve better outcomes.

The FDT agent reasons that the driver models her reasoning in order to detect her lies. Therefore, she does pay up, even though she knows she is already out of the desert. While it might seem irrational to pay once one is out of the desert, it is advantageous to be the kind of agent that pays up in these kinds of scenarios, because it means that you, while still in the desert, can honestly claim that you will pay once you are in the city, and therefore that the driver will take you.[1]
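The comparison can be made concrete with a short expected-utility calculation in Python. This is a sketch under stated assumptions: the figures ($1,000, $1,000,000, 99%) come from the problem statement, while the modelling choice that the driver judges payers and non-payers with the same 99% accuracy is an assumption made here for illustration.

```python
ACCURACY = 0.99          # driver's lie-detection accuracy (from the problem)
PAYMENT = 1_000          # amount owed to the driver in the city
DEATH_LOSS = 1_000_000   # dollar equivalent of being left to die


def expected_utility(pays_in_city: bool, accuracy: float = ACCURACY) -> float:
    """Expected utility of an agent whose fixed policy is to pay (or not)."""
    # Assumption: the driver gives a ride exactly when he judges that the
    # agent will pay, and his judgement is right with probability `accuracy`.
    p_ride = accuracy if pays_in_city else 1 - accuracy
    utility_if_ride = -PAYMENT if pays_in_city else 0
    return p_ride * utility_if_ride + (1 - p_ride) * (-DEATH_LOSS)


print("policy 'pay':   ", round(expected_utility(True)))
print("policy 'refuse':", round(expected_utility(False)))
```

Under these assumptions the paying policy loses about $10,990 in expectation, while the refusing policy loses about $990,000, which is why being a reliable payer is the better policy even though paying looks wasteful once the agent is in the city.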

Blackmail

Consider the following dilemma, as stated by Yudkowsky and Soares:

A blackmailer has a nasty piece of information which incriminates both the blackmailer and the agent. She has written a computer program which, if run, will publish it on the internet, costing $1,000,000 in damages to both of them. If the program is run, the only way it can be stopped is for the agent to wire the blackmailer $1,000 within 24 hours—the blackmailer will not be able to stop the program once it is running. The blackmailer would like the $1,000, but doesn’t want to risk incriminating herself, so she only runs the program if she is quite sure that the agent will pay up. She is also a perfect predictor of the agent, and she runs the program (which, when run, automatically notifies her via a blackmail letter) if she predicts that she would pay upon receiving the blackmail. Imagine that the agent receives the blackmail letter. Should she wire $1,000 to the blackmailer?

While CDT and EDT would both pay the blackmailer, the FDT agent reasons, “Paying corresponds to a world where I lose $1,000; refusing corresponds to a world where I never get blackmailed (as the blackmailer would have predicted this). The latter looks better, so I refuse.” As such, she never gets blackmailed; according to Yudkowsky and Soares, her counterfactual reasoning is thereby proven correct.[1]
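A minimal Python sketch of this policy-level comparison follows. It assumes, as the problem states, a perfect predictor, so that the agent is blackmailed exactly when her policy is to pay; the function name is hypothetical.

```python
PAYMENT = 1_000  # amount demanded by the blackmailer


def outcome_of_policy(pays_when_blackmailed: bool) -> int:
    """Utility of following a fixed policy against a perfect predictor."""
    # The blackmailer runs the program only if she predicts payment, and
    # her prediction is assumed to match the agent's policy exactly.
    blackmailed = pays_when_blackmailed
    if not blackmailed:
        return 0          # never blackmailed, nothing lost
    return -PAYMENT       # blackmailed, and pays as predicted


print("policy 'pay':   ", outcome_of_policy(True))
print("policy 'refuse':", outcome_of_policy(False))
```

The refusing policy never reaches the state in which the letter arrives, which is the sense in which the FDT agent's refusal "corresponds to a world where I never get blackmailed".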

FDT and EDT both outperform CDT

Newcomb's Paradox

In Newcomb's Paradox, an agent finds herself standing in front of a transparent box labeled “A” that contains $1,000, and an opaque box labeled “B” that contains either $1,000,000 or $0. A reliable predictor, who has made similar predictions in the past and has been correct 99% of the time, claims to have placed $1,000,000 in box B if and only if she predicted that the agent would leave box A behind. The predictor has already made her prediction and left. Box B is now empty or full. Should the agent take both boxes (“two-boxing”), or only box B, leaving the transparent box containing $1,000 behind (“one-boxing”)?

Possible Outcomes
             Predicted One-Boxing   Predicted Two-Boxing
One-Boxing   $1,000,000             $0
Two-Boxing   $1,001,000             $1,000

An agent using CDT argues that at the moment she is making the decision to one-box or two-box, the predictor has already either put a million dollars or nothing in box B. Her own decision now can't change the predictor's earlier decision; she can't cause the past to be different. Furthermore, no matter what the content of box B actually is, two-boxing gives an extra thousand dollars. The CDT agent therefore two-boxes. In contrast, an EDT agent argues as follows: “If I two-box, the predictor will almost certainly have predicted this. Future-me two-boxing would therefore be strong evidence that box B is empty. If I one-box, the predictor will almost certainly have predicted that too, which is why future-me one-boxing would be strong evidence of box B containing a million dollars.” Following this line of reasoning, the EDT agent, in contrast to the CDT agent, one-boxes.[14]

An FDT agent reasons that the predictor must have a model of her decision process. It would therefore be best if her decision procedure led her to one-box, because the predictor's model of that procedure would then also output one-boxing, leading the predictor to predict that she will one-box and to put a million dollars in box B. Since FDT and EDT both one-box, they can expect to receive a million dollars, outperforming CDT, which typically obtains only $1,000 by two-boxing.[15]
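The expected payoffs behind this comparison can be worked out in a few lines of Python. The dollar amounts and the 99% accuracy are taken from the problem statement; treating the accuracy as the probability that the prediction matches the agent's actual choice is the modelling assumption of this sketch.

```python
BOX_A = 1_000           # transparent box
BOX_B_FULL = 1_000_000  # opaque box, if the predictor filled it


def expected_payoff(one_box: bool, accuracy: float = 0.99) -> float:
    """Expected payoff of an agent who reliably one-boxes or two-boxes."""
    # Box B is full exactly when the predictor foresaw one-boxing.
    p_box_b_full = accuracy if one_box else 1 - accuracy
    sure_part = 0 if one_box else BOX_A
    return p_box_b_full * BOX_B_FULL + sure_part


print("one-boxing:", round(expected_payoff(True)))
print("two-boxing:", round(expected_payoff(False)))
```

With these numbers a reliable one-boxer expects about $990,000 and a reliable two-boxer about $11,000, so the extra $1,000 from box A does not compensate for the likely empty box B.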

In general, a Newcomb problem is a choice situation in which:

  • One option (one-boxing) reliably indicates the presence of some desirable state (box B contains a million dollars)
  • Another option (two-boxing) reliably indicates the presence of some undesirable state (box B is empty), often accompanied by some additional benefit ($1,000 from box A)
  • Neither option does anything to bring the state about (the predictor has already placed the money in the boxes)[16]

Psychological Twin Prisoner's Dilemma

In this variant of the Prisoner's Dilemma, an agent and her twin must both choose to either “cooperate” or “defect.” If both cooperate, they each receive $1,000,000. If both defect, they each receive $1,000. If one cooperates and the other defects, the defector gets $1,001,000 and the cooperator gets nothing. The agent and the twin know that they reason the same way, using the same considerations to come to their conclusions. However, their decisions are causally independent, made in separate rooms without communication. Should the agent cooperate with her twin?[1]

A CDT agent would defect, as she would argue that no matter what action her twin takes, she wins an extra thousand dollars by defecting. She and her twin both reason in this way, and thus they both walk away with $1,000. EDT would prescribe cooperation, on the grounds that it would be good news to learn that one had cooperated, as it would provide evidence that the twin also cooperated.[17]

An FDT agent would cooperate. She reasons that, since she and her twin follow the same course of reasoning, whatever it concludes applies to both of them: if it concludes that cooperation is better, then both cooperate and obtain $1,000,000; if it concludes that defection is better, then both defect and obtain a mere $1,000. Since the former is preferable, and since both twins have the same decision procedure, the reasoning concludes in favor of cooperation.[2][1]
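The two lines of reasoning can be contrasted in a small Python sketch. The payoffs are those given in the problem; the function names are hypothetical, and the two functions are simplified stand-ins for the dominance argument and the "same decision procedure" argument, not full implementations of CDT or FDT.

```python
ACTIONS = ("cooperate", "defect")

# PAYOFF[(my_action, twin_action)] is the agent's own payoff in dollars.
PAYOFF = {
    ("cooperate", "cooperate"): 1_000_000,
    ("cooperate", "defect"): 0,
    ("defect", "cooperate"): 1_001_000,
    ("defect", "defect"): 1_000,
}


def dominance_choice() -> str:
    """CDT-style reasoning: best reply with the twin's action held fixed."""
    best_replies = set()
    for twin in ACTIONS:
        best_replies.add(max(ACTIONS, key=lambda mine: PAYOFF[(mine, twin)]))
    if len(best_replies) != 1:
        raise ValueError("no dominant action")
    return best_replies.pop()


def shared_procedure_choice() -> str:
    """FDT-style reasoning: the twin's action always equals the agent's own."""
    return max(ACTIONS, key=lambda action: PAYOFF[(action, action)])


print("dominance reasoning:       ", dominance_choice())
print("shared-procedure reasoning:", shared_procedure_choice())
```

The dominance argument compares payoffs within each column of the payoff matrix, whereas the shared-procedure argument compares only the two outcomes on the diagonal, since those are the only ones two identical reasoners can reach.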

Death in Damascus

In this scenario, an agent encounters Death in Damascus and is told that Death is coming for her tomorrow, as it is written in his appointment book, including the location of the event. The agent knows that deciding to flee to Aleppo (at a cost of $1,000) means that Death will be in Aleppo tomorrow, whereas staying in Damascus means that Death will be in Damascus tomorrow. Should she stay, or flee? FDT suggests staying: Death's book reflects the output of her decision procedure, so Death will be waiting for her wherever she goes, and it is better to save the $1,000. CDT, by contrast, becomes unstable, because it bases her decision on the hypothetical that Death's location is independent of her action (as it was already written in the book).[18][1]

FDT and CDT both outperform EDT

Smoking Lesion Problem

While in Newcomb's Paradox, FDT and EDT outperform CDT, in the Smoking Lesion Problem it is claimed that FDT and CDT outperform EDT.

Consider a hypothetical world where smoking is strongly correlated with lung cancer, but only because there is a common cause – a condition that tends to cause both smoking and cancer. Once we fix the presence or absence of this condition, there is no additional correlation between smoking and cancer. If Susan prefers smoking without cancer to not smoking without cancer, and prefers smoking with cancer to not smoking with cancer, should Susan smoke?[19]

Example Utility Table
          Sick   Healthy
Smoke     50     100
Abstain   25     75

EDT tells Susan not to smoke, because it treats the fact that her smoking is evidence that she has the lesion (the common cause), and therefore evidence that she is likely to get cancer, as a reason not to smoke. CDT tells her to smoke: it does not treat a merely evidential connection between an action and a bad outcome as a reason not to perform the action, and it considers that smoking has no causal effect on whether or not one gets cancer. In the case of FDT, whether or not Susan gets cancer does not depend upon the output of the FDT procedure, since cancer does not depend on the decision to smoke; therefore FDT recommends smoking. Since smoking provides more utility to Susan regardless of whether she has cancer or not (something she cannot control), the "correct" answer is generally held to be to smoke.[19][2][1] Nonetheless, there has been discussion of whether this truly is the correct approach.[20]
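The difference between the evidential and the causal calculation can be sketched in Python using the example utility table. The utilities are those from the table; all probabilities below are hypothetical values chosen for illustration, since the problem statement does not supply any.

```python
ACTIONS = ("smoke", "abstain")

# Utilities from the example table.
UTILITY = {
    ("smoke", "sick"): 50, ("smoke", "healthy"): 100,
    ("abstain", "sick"): 25, ("abstain", "healthy"): 75,
}

# Hypothetical numbers: smoking is strong evidence of the common cause,
# so cancer is more likely *given* that one smokes...
P_SICK_GIVEN_ACTION = {"smoke": 0.8, "abstain": 0.2}
# ...but the action does not influence cancer, so the probability under an
# intervention is the same for both actions.
P_SICK_UNDER_INTERVENTION = 0.5


def expected_utility(action: str, p_sick: float) -> float:
    return (p_sick * UTILITY[(action, "sick")]
            + (1 - p_sick) * UTILITY[(action, "healthy")])


def edt_choice() -> str:
    """Conditions on the action as evidence about the lesion."""
    return max(ACTIONS,
               key=lambda a: expected_utility(a, P_SICK_GIVEN_ACTION[a]))


def cdt_choice() -> str:
    """Holds the probability of cancer fixed across actions."""
    return max(ACTIONS,
               key=lambda a: expected_utility(a, P_SICK_UNDER_INTERVENTION))


print("EDT:", edt_choice())
print("CDT:", cdt_choice())
```

With these assumed probabilities the evidential calculation favors abstaining (about 65 against 60), while the causal calculation favors smoking (75 against 50). FDT agrees with the causal calculation here because the lesion does not depend on the output of Susan's decision function.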

Formal Description

Yudkowsky and Soares formalize Functional Decision Theory with the following formula:[1]

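[math]\displaystyle{ \operatorname{FDT}(P, G, x) := \underset{a \in A}{\operatorname{arg\,max}} \; \operatorname{E}[V \mid \operatorname{do}(\operatorname{FDT}(P, G, x) = a)] }[/math]

Here a ranges over the agent's available actions A, V is the agent's utility and x is her observation; in Yudkowsky and Soares's presentation, P and G stand for the agent's probabilistic model of the world and the graph of dependencies over which it is defined. The do operator intervenes on the output of the decision function itself, rather than on a single physical act.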
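The structure of this formula, an argmax over actions of the expected utility under an intervention on the decision function's output, can be sketched in Python. This is an illustrative toy, not the authors' implementation: the function `fdt`, the outcome representation, and the dollar value placed on dying are all hypothetical, and the world model is supplied by hand rather than derived from P and G. The example applies the sketch to the Death in Damascus scenario described earlier.

```python
from typing import Callable, Dict, Hashable, Sequence

Outcome = Hashable


def fdt(actions: Sequence[str],
        outcomes_under_do: Callable[[str], Dict[Outcome, float]],
        utility: Callable[[Outcome], float]) -> str:
    """Return the action a maximizing E[V | do(decision function = a)]."""
    def expected_value(action: str) -> float:
        distribution = outcomes_under_do(action)
        return sum(p * utility(outcome) for outcome, p in distribution.items())
    return max(actions, key=expected_value)


FLEE_COST = 1_000        # from the scenario
DEATH_LOSS = 1_000_000   # hypothetical dollar value placed on dying


def damascus_outcomes(action: str) -> Dict[Outcome, float]:
    # Assumption: Death's book matches the output of the agent's decision
    # function, so under do(FDT = a) she meets Death wherever she goes.
    return {(action, "meets Death"): 1.0}


def damascus_utility(outcome: Outcome) -> float:
    action, fate = outcome
    loss = DEATH_LOSS if fate == "meets Death" else 0
    if action == "flee":
        loss += FLEE_COST
    return -loss


print(fdt(["stay", "flee"], damascus_outcomes, damascus_utility))
```

Because the intervention moves Death's prediction together with the agent's act, the only difference between the two actions is the travel cost, so the sketch selects staying.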

Criticism

Yudkowsky and Soares assume that an FDT agent is certain that she follows FDT, and this knowledge is held fixed under all counterfactual suppositions. Moreover, decision theorists do not agree on a "correct" or "rational" solution to all of the problems that Yudkowsky and Soares claim that FDT solves. Some critics go further and argue that FDT gives insane recommendations in certain cases, as detailed by Wolfgang Schwarz:[6]

Suppose you have committed an indiscretion that would ruin you if it should become public. You can escape the ruin by paying $1 once to a blackmailer. Of course you should pay! FDT says you should not pay because, if you were the kind of person who doesn't pay, you likely wouldn't have been blackmailed. How is that even relevant? You are being blackmailed. Not being blackmailed isn't on the table. It's not something you can choose.

Similarly, in the variant of Newcomb's Problem where you already know the contents of the million dollar box:

Suppose you see $1000 in the left box and a million in the right box. If you were to take both boxes, you would get a million and a thousand. If you were to take just the right box, you would get a million. So Causal Decision Theory says you should take both boxes. However, you follow FDT, and you are certain that you do. If FDT recommended two-boxing, then any FDT agent throughout history would two-box. And, crucially, the predictor would (probably) have foreseen that you would two-box, so she would have put nothing into the box on the right. As a result, if FDT recommended two-boxing, you would probably end up with $1000. To be sure, you know that there's a million in the box on the right. You can see it. But according to FDT, this is irrelevant. What matters is what would be in the box relative to different assumptions about what FDT recommends. Therefore, FDT recommends to one-box despite the fact that you gain $1000 less.[6]

In general, criticism of Functional Decision Theory can be summarized in the following points:[21]

  • FDT can make bizarre recommendations, i.e. recommendations that do not maximize utility
  • FDT assumes that the predictor predicts your actions by running your algorithm, rather than by some other reliable method
  • 'Implausible discontinuities': if a predictor is swapped for a physical process (e.g. the lesion from the Smoking Lesion), then FDT alters its recommendations
  • There is no objective fact that determines whether two processes are running the same algorithm

References

  1. Yudkowsky, Eliezer; Soares, Nate (2017-10-13). "Functional Decision Theory: A New Theory of Instrumental Rationality". arXiv:1710.05060 [cs.AI].
  2. Haan, Hein de (2022-10-06). "How to Do Decision Theory (Extended)" (in en). https://medium.com/how-to-build-an-asi/how-to-do-decision-theory-extended-93bc212daed4. 
  3. Weirich, Paul (2020), Zalta, Edward N., ed., Causal Decision Theory (Winter 2020 ed.), Metaphysics Research Lab, Stanford University, https://plato.stanford.edu/archives/win2020/entries/decision-causal/, retrieved 2022-11-14 
  4. Casper, Stephen (2020-01-03). "Decision Theory I: Understanding Functional Decision Theory" (in en). https://medium.com/@thestephencasper/decision-theory-i-understanding-functional-decision-theory-2bef68d063b6. 
  5. Ahmed, Arif (2021-10-31). Evidential Decision Theory (1 ed.). Cambridge University Press. doi:10.1017/9781108581462. ISBN 978-1-108-58146-2. https://www.cambridge.org/core/product/identifier/9781108581462/type/element. 
  6. "Wolfgang Schwarz :: On Functional Decision Theory". https://www.umsu.de/blog/2018/688#c2169. 
  7. Yudkowsky, Eliezer (2010). Timeless Decision Theory. The Singularity Institute. https://intelligence.org/files/TDT.pdf. 
  8. "An introduction to Timeless Decision Theory" (in en). 2010-08-19. https://formalisedthinking.wordpress.com/2010/08/19/an-introduction-to-timeless-decision-theory/. 
  9. "Open Thread September, Part 3" (in en-US). https://www.greaterwrong.com/posts/yFcxfAgt2GwYbK7Fe/open-thread-september-part-3. 
  10. "Timeless Decision Theory - LessWrong" (in en). https://www.lesswrong.com/tag/timeless-decision-theory. 
  11. "Updateless Decision Theory - LessWrong" (in en). https://www.lesswrong.com/tag/updateless-decision-theory. 
  12. Levinstein, Benjamin A.; Soares, Nate. "Cheating Death in Damascus". The Journal of Philosophy 117 (5): 237–266. https://intelligence.org/files/DeathInDamascus.pdf. 
  13. Haan, Hein de (2022-10-07). "How to Solve Parfit's Hitchhiker" (in en). https://www.cantorsparadise.com/how-to-solve-parfits-hitchhiker-99d9b74a2040. 
  14. Haan, Hein de (2022-03-09). "How to do Decision Theory" (in en). https://medium.com/how-to-build-an-asi/how-to-do-decision-theory-aad5edacb144. 
  15. Haan, Hein de (2022-04-06). "How to do Functional Decision Theory" (in en). https://medium.com/how-to-build-an-asi/how-to-do-functional-decision-theory-b9035ca05812. 
  16. Joyce, James M. (1999). The Foundations of Causal Decision Theory. Cambridge Studies in Probability, Induction and Decision Theory. Cambridge: Cambridge University Press. doi:10.1017/cbo9780511498497. ISBN 978-0-521-64164-7. https://www.cambridge.org/core/books/foundations-of-causal-decision-theory/068BC033D4D8245FCF607904CC4DB730. 
  17. Lewis, David (1979). "Prisoners' Dilemma is a Newcomb Problem". Philosophy & Public Affairs 8 (3): 235–240. ISSN 0048-3915. https://www.jstor.org/stable/2265034. 
  18. Gibbard, Allan; Harper, William L. (1981), Harper, William L.; Stalnaker, Robert; Pearce, Glenn, eds., "Counterfactuals and Two Kinds of Expected Utility" (in en), IFS: Conditionals, Belief, Decision, Chance and Time (Dordrecht: Springer Netherlands): pp. 153–190, doi:10.1007/978-94-009-9117-0_8, ISBN 978-94-009-9117-0, https://doi.org/10.1007/978-94-009-9117-0_8, retrieved 2022-11-21 
  19. Egan, Andy (2007). "Some Counterexamples to Causal Decision Theory". The Philosophical Review 116 (1): 93–114. doi:10.1215/00318108-2006-023. ISSN 0031-8108. https://www.jstor.org/stable/20446939. 
  20. entirelyuseless (2015-11-02). "Smoking Lesion" (in en). https://entirelyuseless.com/2015/11/02/smoking-lesion/. 
  21. wdmacaskill (in en). A Critique of Functional Decision Theory. https://www.lesswrong.com/posts/ySLYSsNeFL5CoAQzN/a-critique-of-functional-decision-theory.