Middle Knowledge. In front of you are two doors. If you go through the left door, you come into a room with a single transparent box containing $7. If you go through the right door, you come into a room with two opaque boxes, one black, one white. Your first choice is which door to take. Then you have to choose exactly one box from the room in which you find yourself. A psychologist has figured out which box you would take if you found yourself in the room with the two boxes. She has put $10 into the box she thinks you would take, and $0 into the other.
Suppose you are confident that you'll choose the left door and take the $7. Suppose also that you don't know what the psychologist knows: which of the two boxes you would have taken if you had gone through the other door. Your evidence clearly wouldn't favour one of the boxes over the other. We may assume that some sub-personal mechanism would have broken the tie and made you reach for one of the boxes. The mechanism may well be deterministic, its outcome sensitive to apparently irrelevant environmental circumstances such as the temperature. The psychologist knows how the mechanism works, and she knows what the relevant circumstances in the room are. That's how she knows what you would do.
Now, given all this, we can evaluate the possible plans. Your plan to first choose the left door and then take the transparent box has expected (and certain) payoff $7. How about the alternative plan to first choose the right door and then take the black box?
Well, you don't know what's in the black box. You're 50% sure that it contains $10 and 50% sure that it contains $0. The content of the box is settled. So if you were to go through the door on the right and take the black box, you might get $10 or you might get $0, with equal probability. The expected payoff is $5. Same for taking the white box.
$7 is better than $5. Your plan to choose the left door maximises expected utility (assuming utility = monetary payoff).
But here you are standing in front of the two doors. If you go through the one on the left, you'll get $7. What would happen if instead you went through the door on the right? This is not a question we have considered. We've only looked at the more specific questions what would happen if you went through the door on the right and took this or that box.
If you went through the door on the right, your sub-personal mechanism would select one of the boxes. You don't know which box it would select, but the psychologist knows. She has put $10 into the box it would select. Thus, if you went through the door on the right, you would almost certainly get $10.
So the first move in your plan does not maximise expected utility.
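The two evaluations can be made concrete with a small Monte Carlo sketch in Python. This is just an illustration: the tie-breaking mechanism is idealised as a fair coin flip, and the psychologist is assumed to track only that mechanism (as in the story, where you plan to go left).

```python
import random

def trial(plan):
    # The sub-personal mechanism that would break the tie between the
    # two opaque boxes; the psychologist knows its outcome in advance.
    mechanism_pick = random.choice(["black", "white"])
    # She puts $10 into the box you would take, $0 into the other.
    contents = {"black": 0, "white": 0}
    contents[mechanism_pick] = 10

    if plan == "left":
        return 7  # the transparent box
    if plan == "right-black":
        return contents["black"]
    if plan == "right-mechanism":  # go right, let the mechanism choose
        return contents[mechanism_pick]

def expected(plan, n=100_000):
    return sum(trial(plan) for _ in range(n)) / n

print(expected("left"))             # 7.0
print(expected("right-black"))      # ≈ 5.0: $10 or $0 with equal probability
print(expected("right-mechanism"))  # 10.0: the psychologist tracks the mechanism
```

The specific plans through the right door average $5, but going through the right door and letting the mechanism do its work nets $10.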
This is an example of what I called a "bizarre" case in this recent post. It illustrates the failure of the principle (DC4), according to which, if a plan P maximises expected utility conditional on P, then each of its acts maximises expected utility conditional on P.
It is important to the example that the relevant plan is not a best equilibrium in the decision problem over the plans. I wonder if that is true for all counterexamples to (DC4).
An easy way to block any counterexamples of this kind would be to allow for unspecific plans. If we allow for a plan that merely says that you go through the right-hand door, without settling which box to choose, then your plan to go through the left-hand door and take the $7 no longer maximises expected utility.
Oesterheld and Conitzer discuss the following scenario.
Adversarial Offer With Opt-Out. In stage 1, you have the option of paying $0.20. In stage 2, nothing happens if you paid the $0.20 in stage 1. If you didn't pay, you are presented with two boxes of which you may purchase one for a price of $1. (You may also purchase neither.) A reliable predictor has put $3 in each box that she predicted you wouldn't buy.
According to Oesterheld and Conitzer, "orthodox CDT" says that if you are presented with the choice in stage 2, you should buy any box of which you were sufficiently confident, before making your choice, that you wouldn't buy it. Such a purchase has a negative expected payoff. You should therefore pay the $0.20 in stage 1. You make a guaranteed loss. Agents who follow EDT, by contrast, would not pay in stage 1 and take no box in stage 2, avoiding the loss.
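A quick sketch of the tension here, with the predictor's reliability idealised as a single accuracy parameter (the 0.9 figures are illustrative, not from the paper):

```python
def cdt_eu_buy(credence_not_buy):
    # CDT evaluates the purchase at the agent's prior credence that the
    # predictor foresaw them NOT buying this box (and so put $3 in it).
    # Price $1: payoff +2 if the box contains $3, -1 if it is empty.
    return 2 * credence_not_buy - 1 * (1 - credence_not_buy)

def actual_expected_payoff(predictor_accuracy):
    # If you do buy the box, a reliable predictor will usually have
    # foreseen this and left the box empty.
    p_empty = predictor_accuracy
    return -1 * p_empty + 2 * (1 - p_empty)

print(cdt_eu_buy(0.9))              # 1.7 -- looks like a good deal by CDT's lights
print(actual_expected_payoff(0.9))  # -0.7 -- but buyers lose money
```

The purchase looks profitable from the agent's pre-choice perspective whenever the relevant credence exceeds 1/3, yet agents who actually buy lose money in expectation.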
Oesterheld and Conitzer assume that "orthodox CDT" evaluates the options by the agent's pre-deliberation credences. The kind of CDT I prefer instead says that any option you choose should maximise expected utility at the time of choice. You then couldn't rationally choose to take one of the boxes in the stage 2 problem. Nor could you rationally choose to take none of them. You should remain undecided. More precisely, you should remain undecided if the predictor can't foresee how this state of indecision will be resolved. In expectation, you then make a profit, and you shouldn't pay the $0.20. You will outperform the EDTers.
But what if the predictor can foresee how a state of indecision gets resolved? Then the stage 2 problem has "no equilibrium": there is no stable choice, and no stable state of indecision. Intuitively, not buying any box is nonetheless better than buying one of the boxes. If that's right, the problem again disappears.
I can't see any appeal in the idea that buying a box is the right choice in the stage 2 problem. But I could understand if someone says that the norms of rationality here fall silent, so that no option is permissible and none is forbidden. Then we really do get a case in which you should pay the $0.20 in stage 1 if, for some reason, you think you'll buy a box in stage 2.
I'm not sure how bad that would be. It would be somewhat problematic if we found that CDT licenses choices that together amount to a sure loss. But in this example, at least, I don't think CDT licenses any such choices. (We also don't get an interesting difference here between planning and implementing.)
Gallow (2021) discusses a more puzzling scenario.
Utility Cycle With Switching. In stage 1, you have to choose one of three boxes: A, B, or C. In stage 2, you can pay $60 to swap the box you have taken for the "next" box – meaning that if you took A, you can swap it for B; if you took B, you can swap it for C; if you took C, you can swap it for A. A reliable predictor has predicted both choices. If she predicted that you'd end up with box A, she has put $0 into A, $100 into B, and $-100 into C. If she predicted that you'd end up with box B, she put $0 into B, $100 into C, and $-100 into A. If she predicted that you'd end up with box C, she put $0 into C, $100 into A, and $-100 into B.
Oddly, CDT says that you should switch in stage 2, whatever you did in stage 1.
The three options in stage 1 are completely symmetrical. So let's assume without loss of generality that you took box A. If you now choose to switch in stage 2, you can be confident that the predictor has predicted that you'll end up with box B. Box A then contains $-100 and box B $0. So switching is better. If, on the other hand, you choose not to switch then you can be confident that the predictor predicted that you'll end up with A. Box A then contains $0 and box B $100. Switching would have been better.
Dmitri doesn't discuss what you should do in stage 1, and it doesn't seem relevant, but let's have a look. Knowing that you'll switch in stage 2, your choice in stage 1 looks like this (where 'Pred-A' means 'you are predicted to take box A in stage 1'):
|  | Pred-A | Pred-B | Pred-C |
|---|---|---|---|
| A | $-160 | $40 | $-60 |
| B | $-60 | $-160 | $40 |
| C | $40 | $-60 | $-160 |
(For example, if you've been predicted to take box A in stage 1 then you've been predicted to end up with B, so $-100 is in A, $0 in B, and $100 in C. In addition, you lose $60 in stage 2.)
The problem has no equilibrium. None of the pure options is stable, and there is no stable state of indecision, given your knowledge that the predictor can tell which box you will take. Suppose, for example, that you're perfectly undecided between the three options. Then you give credence 1/3 to taking box A, in which case you'll end up with $-160 (because the predictor will have foreseen your action); you give credence 1/3 to taking box B, in which case you'll also end up with $-160; same for taking box C. The state of indecision is worth $-160. It is unstable because all pure options are better.
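The instability claim can be checked against the matrix above, assuming (as in the text) that the predictor foresees how any indecision would be resolved, so that a resolved act always lands on the diagonal:

```python
from fractions import Fraction

# Payoff matrix from the table above: rows are stage-1 choices (given
# that you'll switch in stage 2), columns are the predictions.
payoff = {
    "A": {"Pred-A": -160, "Pred-B": 40,   "Pred-C": -60},
    "B": {"Pred-A": -60,  "Pred-B": -160, "Pred-C": 40},
    "C": {"Pred-A": 40,   "Pred-B": -60,  "Pred-C": -160},
}

third = Fraction(1, 3)

# Value of perfect indecision when the predictor foresees its
# resolution: whichever act comes about, the matching prediction
# is true, so you land on the diagonal.
indecision_value = sum(third * payoff[x]["Pred-" + x] for x in "ABC")
print(indecision_value)  # -160

# Expected utility of each pure option, evaluated from the state of
# indecision (equal credence in each prediction).
for act in "ABC":
    eu = sum(third * payoff[act][col] for col in ("Pred-A", "Pred-B", "Pred-C"))
    print(act, eu)  # -60 each: better than -160, so the indecision is unstable
```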
Due to the symmetry of the problem, no plausible decision rule could favour one of the three out-of-equilibrium options. I'm inclined to say that none is rationally permitted, and none is forbidden.
To retain the puzzle, we must then assume that the predictor can foresee your non-rational choice in stage 1. If she can't, the scenario appears to blow up into paradox. Suppose the predictor can't predict what you do in a decision problem without equilibria. If this is true for the problem in stage 1, then not switching is best in stage 2, and then the problem in stage 1 becomes solvable. That is, if you can't make a rational choice in stage 1, and the predictor can't foresee your non-rational choice, then you can make a rational choice in stage 1!
From a distance, the Utility Cycle case resembles the Adversarial Offer case, on a charitable interpretation of the latter. Both feature an unsolvable decision problem in one stage and an opportunity in the other stage to undo or prevent the choice made in the unsolvable problem.
But the Utility Cycle case looks more problematic. In Adversarial Offer, it seems OK to pay $0.20 if you know you are inclined to (stupidly) buy one of the boxes in stage 2. In Utility Cycle, by contrast, all the options in the unsolvable stage 1 problem are obviously on a par. It really seems odd that you would pay $60 to swap whatever box you chose in stage 1.
For one thing, by switching you will probably make a net loss of $60. If the predictor is infallible, you'll make a sure loss of $60. The loss is avoidable. Agents who follow EDT would not switch in stage 2 and break even. And they wouldn't make a different choice in stage 1.
In addition, if you take (say) box A and then switch, it would have been better to take box B and not switch. Whatever sequence of acts you choose is (causally) dominated by another. The best plan, it seems, would involve not switching in stage 2. If you switch, you therefore violate the principle that any rationally acceptable plan should be rationally implementable, as well as its converse. You also appear to violate the principle that the continuation of a rationally acceptable plan should remain rationally acceptable after some parts of it were implemented.
Finally, if you switch you appear to violate the principle of Preference Reflection. Consider your attitude towards switching in stage 2 before you make the choice in stage 1. Let's assume you are currently undecided between the three boxes. Switching then has lower expected utility than not switching. Yet after making the choice in stage 1, you suddenly prefer to switch.
Let's go through these issues in turn.
First, your avoidable loss of $60. Dmitri points out that CDTers are used to underperforming EDTers. In Newcomb's Problem, we CDTers complain that the EDTers were presented with a more favourable decision situation. Dmitri argues that the same cannot be said here. Instead, he suggests that the outcomes in sequential choice situations don't necessarily shed light on the rationality of the individual choices. Since our temporal parts are like separate agents, the fact that they can be led to predictable ruin is, he says, just an intrapersonal tragedy of the commons.
I agree. But I think a tragedy of the commons can never arise between utilitarian agents who only care about the total good in the community. The analogue of this condition is satisfied in Utility Cycle With Switching. In both stages, you only care about maximising your total payoff. So we can't have an intrapersonal tragedy of the commons here. Something else is going on.
I think the case is closer to Newcomb's Problem. The setup favours EDTers over CDTers, although in a more subtle manner than in Newcomb's Problem.
Consider stage 2. If you're a CDTer, you can be confident that whatever box you now have contains $-100, while the box for which you can trade it contains $0. If you're an EDTer, you can be confident that your box contains $0 and that the alternative contains $100. (Silly you, to reject the switch.)
One might say that if you're a CDTer, then you have inflicted this Newcomb-like situation upon yourself, by whatever you did in stage 1. Your bad options are your own fault. But that's not true.
Imagine we observe many repetitions of the scenario, with both EDTers and CDTers taking part. We see what's inside the boxes. First comes an EDTer who has been predicted to take box A in stage 1. Based on this prediction, box A has been filled with $0. Next comes a CDTer who has been predicted to take box A. Based on this prediction, box A has been filled with $-100. And so on. In general, the predictor has predicted what box the agent would choose in stage 1 and then put $0 into the box if the agent is an EDTer and $-100 if they are a CDTer.
The setup clearly disfavours CDTers, even though, strictly speaking, CDTers are not given worse options (in stage 1).
The second worry about switching in stage 2 was that it amounts to some kind of dynamic inconsistency.
Let's look at the possible plans. Here is the decision matrix for the hypothetical planning problem. ('A¬S' means taking box A in stage 1 and not switching in stage 2.)
|  | Pred-A¬S v Pred-CS | Pred-AS v Pred-B¬S | Pred-BS v Pred-C¬S |
|---|---|---|---|
| A¬S | $0 | $-100 | $100 |
| AS | $40 | $-60 | $-160 |
| B¬S | $100 | $0 | $-100 |
| BS | $-160 | $40 | $-60 |
| C¬S | $-100 | $100 | $0 |
| CS | $-60 | $-160 | $40 |
The only equilibrium is perfect indecision between the three ¬S plans. According to the form of CDT that I like, there is no "choosable" plan. But three plans are "weakly acceptable" in the sense that you could rationally perform them through a resolution of your indecision.
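A quick check of this equilibrium claim, using the plan matrix above (again assuming the predictor foresees resolutions of indecision, so that perfect indecision between the ¬S plans lands on a $0 prediction however it resolves):

```python
from fractions import Fraction

# Plan payoffs from the matrix above. The three columns are the
# disjunctive prediction hypotheses, in the order they appear there.
plans = {
    "A¬S": [0, -100, 100],
    "AS":  [40, -60, -160],
    "B¬S": [100, 0, -100],
    "BS":  [-160, 40, -60],
    "C¬S": [-100, 100, 0],
    "CS":  [-60, -160, 40],
}

# Perfect indecision between A¬S, B¬S, C¬S: each resolved plan lands
# on its matching prediction, where it pays $0.
indecision_value = 0

# From that state you give credence 1/3 to each prediction column.
third = Fraction(1, 3)
for plan, row in plans.items():
    eu = sum(third * v for v in row)
    print(plan, eu)

# The ¬S plans have EU 0, the S plans EU -60: no plan beats the state
# of indecision, so it is an equilibrium.
```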
The situation resembles Arif's Psycho Insurance case from this post. Here, too, we have a weakly acceptable plan that's no longer acceptable after its first stage has been (irrationally) implemented. I'm not worried about this.
A violation of Preference Reflection would worry me more. But arguably we don't have that. Suppose you can already choose whether you will switch before you make your choice in stage 1. One might think that you should prefer to not switch, since you give equal credence to all three predictions, which makes switching irrational. But we have to be careful.
Recall how the situation is set up. The predictor has predicted which box you will take in stage 1 and whether you will switch in stage 2. If she has predicted that you will switch she has put $-100 into your box (the one you're predicted to take) and $0 into the alternative. If she has predicted that you don't switch, she has put $0 into your box and $100 into the alternative. You can't change her prediction about whether you'd switch. And switching is better either way. So you should switch – even if you don't yet know what you'll do in stage 1.
Sometimes no act meets this condition. In that case, I've assumed that one should be undecided. More specifically, I've assumed that one should be in a "stable" state of indecision in which no (pure) option is preferable to one's present state of indecision. Unfortunately, there are decision problems in which no act is ratifiable and no state of indecision is stable. I'm not sure what to say about such cases. And I wonder if whatever we should say about them also motivates relaxing the ratifiability condition for certain cases in which there are ratifiable options.
I have briefly motivated the ratifiability condition in this earlier post, where I complained that Ahmed (2014) presents it as an alternative to orthodox CDT. Others do this as well. Apparently "orthodox CDT" says that one should choose an act that maximises expected utility relative to one's pre-deliberation credences. "Orthodox CDT" therefore says that it's perfectly fine for you to choose an act even if, at the time of choice, you know that some other act is guaranteed to be better. I don't know anyone who has ever defended this form of CDT. It should be called "unreflective CDT" or "idiotic CDT" or "strawman CDT", not "orthodox CDT".
Anyway. Let me explain how there can be decision problems without equilibria.
On some ways of modelling states of indecision – notably, the models developed in Skyrms (1990) – one can prove that every decision problem has an equilibrium solution in the form of either a ratifiable option or a stable state of indecision. This result depends on a certain way of assessing the quality of an indecision state, so that we can determine whether it is at least as good as any of the (pure) options. Skyrms assumes that the value of an indecision state is a weighted average of the expected utility of the options between which the agent is undecided. But this isn't always correct.
The problem is especially clear if we assume, as I like to do, that indecision is a genuine type of intentional state, on a par with decisions. Like decisions, indecision states cause actions. But the connection between an indecision state and the resulting action is not deterministic. From the agent's perspective, at least, it is a matter of chance which act will come about if they are in a given state of indecision, with the chances matching the agent's inclination towards the relevant act.
The Skyrmsian model now becomes implausible in cases where someone can predict the outcome of this chancy (or quasi-chancy) process. For a simple example, consider the Death in Damascus case from Gibbard and Harper (1978).
Death in Damascus. You have a choice between going to Aleppo and staying in Damascus. Death has predicted where you will go and is awaiting you there.
This is an "unstable" decision problem because neither act is ratifiable. It seems that you should be perfectly undecided between the two options. If death has a utility of -10 and surviving 0, then Skyrms's model gives the state of indecision a score of -5. (Hence neither of the pure options looks better if you're in that state: their expected utility is also -5.)
But what if Death can foresee how states of indecision get resolved – which acts they eventually cause? Then the state of indecision is arguably no better than, say, a straight decision to go to Aleppo. Either way you can be sure to die. The state should get a score of -10, not -5.
This matters for whether the state is an equilibrium. If Death can foresee how states of indecision get resolved, and you are perfectly undecided between Aleppo and Damascus, then you can be certain you will die. You don't know where Death is waiting, because you don't know which act you'll end up taking. For all you know, Death is equally likely to wait in Aleppo and in Damascus. This means that if (counterfactually) you were to decide to go to Aleppo, you would have a 50% chance of survival. That's better than guaranteed death. The indecision is unstable. Both pure options are better.
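The two ways of scoring the state of indecision can be put side by side (a sketch with the utilities just given):

```python
# Utilities: -10 for meeting Death, 0 for surviving.
DEATH, SURVIVE = -10, 0

# Skyrms-style score of perfect indecision: a 50/50 average of the
# expected utilities of the two pure options (each -5).
eu_pure = 0.5 * DEATH + 0.5 * SURVIVE
skyrms_score = 0.5 * eu_pure + 0.5 * eu_pure
print(skyrms_score)  # -5.0

# If Death can foresee how the indecision is resolved, the resolved
# act is met by Death wherever it leads: the state is worth -10.
foreseen_score = DEATH
print(foreseen_score)  # -10

# From that state, a (counterfactual) straight decision to go to
# Aleppo has a 50% chance of survival, since for all you know Death
# is equally likely to be waiting in either city.
eu_decide_aleppo = 0.5 * DEATH + 0.5 * SURVIVE
print(eu_decide_aleppo)  # -5.0: better than -10, so the indecision is unstable
```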
All the decision problems discussed in Ahmed (2014) are tame insofar as they all have an equilibrium, even if the relevant predictor (that figures in many of them) can foresee resolutions of indecision. But this isn't always the case. In Death in Damascus, it is not – assuming Death can foresee where you will go, no matter how the act is caused. Here no pure act is ratifiable, and no state of indecision is stable.
We can make the problem more vivid if we assume that Death is much better at foreseeing resolutions of indecision (say, 100% success rate) than at foreseeing pure choices (60% success rate). In that case, you surely don't want to remain in a state of indecision. You'd rather take a pure option with a 40% chance of survival than go into a state of indecision with a 0% chance of survival. But neither of the pure options is ratifiable. If you decide to go to Aleppo, it would be better to stay in Damascus. If you decide to stay in Damascus, it would be better to go to Aleppo. What should you do?
I'm not sure.
Perhaps it would help to be less serious about "states of indecision". Skyrms sometimes talks as if being in (what I call) a state of indecision is simply a matter of having certain credences about what you will do. His model of deliberation then says that at the endpoint of deliberation you should give equal credence to both options in Death in Damascus. It doesn't say that you should be in a special intentional state that gets resolved by some chancy or quasi-chancy mechanism. Arntzenius (2008) endorses this approach. Ahmed (2014) calls it "deliberational decision theory".
If there's no process of resolving states of indecision, one might hope that we don't need to worry about cases in which someone can predict the outcome of that process.
But then I worry about something else. I want decision theory to be a highly idealised model connecting an agent's beliefs and desires with their behaviour, not just with their beliefs about what they will do. To be sure, one could add to the "deliberational" theory that an agent will initiate an act whenever they are certain (at the end of deliberation) that they will perform it. This gives us a connection to behaviour in every situation in which there is a ratifiable option. But I'd like to have more. I'd like decision theory to provide an instruction for building an ideal agent. The instruction should not fall silent whenever there is no ratifiable option. It should cover as many cases as possible.
Besides, it's not even clear that the "deliberational" approach really avoids the problem. Suppose you're in the Death in Damascus scenario where Death only has a 60% success rate at predicting your acts, and you know that Death's sister is watching you and will strike you down immediately if you end up unsure about what you will do. Shouldn't you plump for one of the pure acts rather than deliberate yourself into uncertainty? And yet none of these acts is ratifiable.
I think we need to say something about equilibrium-free decision problems. And I'm not sure what we should say. (I briefly touched on this here.)
One option is to stick with the requirements of stability and ratifiability and infer that in such a case, everything is rationally forbidden. You're in a rational dilemma.
Another option is to restrict the requirements to problems with equilibria. If there's no equilibrium, we might say that decision theory falls silent. Nothing is rationally forbidden, nothing is rationally permitted.
Alternatively, one might say that the norms of practical rationality become indeterminate in such a case. It is not determinately permitted to go to Aleppo nor is it determinately forbidden.
Alternatively, one might try to find some backup rule that tells us which of the non-equilibrium solutions are permissible and which aren't.
Some of these options might suggest that we could also weaken the equilibrium requirement for certain decision problems in which there are ratifiable options or stable states of indecision.
For example, it is tempting to say that in the version of Death in Damascus where Death is much better at predicting acts that result from indecision than at predicting straight decisions, you should make a straight choice. It doesn't matter which, but you should choose an unratifiable act. That gives you a 40% chance of survival, whereas indecision would mean certain death. True, whatever option you choose, you'll think that the other option would have given you a 60% chance of survival. But you also know that the same consideration would equally tell against the other option if you were to choose it.
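The relevant numbers can be laid out explicitly (a sketch, using the utilities from before: -10 for death, 0 for survival):

```python
# Utilities: -10 for meeting Death, 0 for surviving.
DEATH, SURVIVE = -10, 0

# Death predicts straight decisions with 60% accuracy, but
# resolutions of indecision with 100% accuracy.
eu_straight = 0.6 * DEATH + 0.4 * SURVIVE    # -6: a 40% chance of survival
eu_indecision = 1.0 * DEATH                  # -10: certain death
print(eu_straight, eu_indecision)

# Yet no straight choice is ratifiable: once you've decided on a city,
# you give 60% credence to Death waiting there, and the other city
# (death probability only 40%) looks better.
eu_chosen = 0.6 * DEATH + 0.4 * SURVIVE      # -6
eu_other = 0.4 * DEATH + 0.6 * SURVIVE       # -4
print(eu_chosen, eu_other)
```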
Now consider another variation on Death in Damascus. This time, there is a ratifiable act, but it looks terrible.
Dionysos in Damascus. You have a choice between going to Aleppo, staying in Damascus, and going to Homs. Dionysos is likely to have predicted what you will do, even if it results from a state of indecision. If he predicted that you'd go to Aleppo he will meet you there and engage you in a somewhat unpleasant orgy. If he predicted that you'd stay in Damascus he will meet you in Damascus and engage you in the same kind of orgy. If he predicted that you'd go to Homs, he will soon release a toxic gas into the air of Aleppo and Damascus that will kill all their inhabitants. In any case, Alastor (a different God, and for reasons we don't need to get into) is waiting for you in Homs and will torture you savagely if you show up there.
Your decision matrix might look like this, where 'P-Aleppo' means that Dionysos has predicted that you'll go to Aleppo.
|  | P-Aleppo | P-Damascus | P-Homs |
|---|---|---|---|
| Aleppo | -1 | 0 | -25 |
| Damascus | 0 | -1 | -25 |
| Homs | -10 | -10 | -20 |
(I assume that you prefer the torture in Homs over dying from toxic gas. I also assume – although this is inessential – that you care somewhat about the lives of the other people in Aleppo and Damascus.)
Without the third column and the third row, this is just the original Death in Damascus case, except with milder stakes. That problem has no equilibrium. This remains true in the extended problem: going to Aleppo and staying in Damascus are not ratifiable, and no state of indecision between these two options is stable.
The third option, however, is ratifiable. If you decide to go to Homs, you can be confident that Dionysos has predicted your escape from the orgy and that he is about to release the toxic gas in his fury, killing everyone in Aleppo and Damascus. And then you are really better off going to Homs.
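We can confirm these ratifiability claims from the matrix, idealising the "likely" predictor as fully reliable (so that, once you've decided, you're certain of the matching prediction):

```python
# Decision matrix from the text; columns are Dionysos's predictions.
payoff = {
    "Aleppo":   {"P-Aleppo": -1,  "P-Damascus": 0,   "P-Homs": -25},
    "Damascus": {"P-Aleppo": 0,   "P-Damascus": -1,  "P-Homs": -25},
    "Homs":     {"P-Aleppo": -10, "P-Damascus": -10, "P-Homs": -20},
}

def ratifiable(act):
    # Having decided on `act`, you're confident in the matching
    # prediction; the act is ratifiable if no alternative does better
    # in that column.
    column = "P-" + act
    return all(payoff[act][column] >= payoff[other][column]
               for other in payoff)

for act in payoff:
    print(act, ratifiable(act))
# Aleppo False, Damascus False, Homs True
```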
I think this is the only equilibrium, but I'm not entirely sure. At any rate, there is no stable state of indecision in which you are less than 60% inclined towards going to Homs. That's bad enough.
Intuitively, you'd be crazy to go to Homs. Much better to endure the orgy in Aleppo or Damascus. Or so one is tempted to think.
In fact, I'm not sure what to make of this case. You could defend a decision for Homs much like we two-boxers defend our decision to two-box. "Yes", you might say, "choosing the other options would be better news, but my aim is not to bring about good news. Like any rational person, I will go to Homs. Dionysos has predicted that I'm rational, and so he has prepared the gas in Aleppo and Damascus. Nothing I can do about that. It's already all set up. The choice Dionysos has given me is between being tortured in Homs and dying in one of the other places. I wish Dionysos had thought that I'm irrational. Then he would have given me better options. Alas I'm rational, and he knows that I am."
So maybe we should bite the bullet and say that you should go to Homs.
But maybe we should instead allow you to choose an unratifiable act. Above I suggested that if Death is especially good at predicting resolutions of indecision, then you should either decide to go to Aleppo or to stay in Damascus, because the alternative – remaining undecided – is much worse and because you can recognise that the reasons against going to whichever city you choose would equally apply if you were to choose the other city. One might argue that this is true for Dionysos in Damascus as well. You should either decide to go to Aleppo or to stay in Damascus (or remain undecided between these options), because the alternative – going to Homs (or being inclined towards Homs) – is much worse and because you can recognise that the reasons against going to whichever of Aleppo or Damascus you choose would equally apply if you were to choose the other city.
Or maybe we should say that we should evaluate not only our actual, fine-grained options, but also more coarse-grained possibilities, represented by disjunctions of options. The expected utility of a disjunction is simply the average of the expected utility of the disjuncts. In Dionysos in Damascus, Aleppo ∨ Damascus has much greater expected utility than the alternative, conditional on being "chosen" (i.e., conditional on Aleppo ∨ Damascus). In that sense, the disjunction is ratifiable. Now one might suggest that we should first identify all ratifiable disjunctions (including disjunctions with one element, and including the disjunction between all options), then select the best of them, and then restrict our attention to the decision problem in which these are the only options. (Any other options are forbidden.) If a state of indecision between the options in the remaining problem is stable then this is the unique answer to the original problem. If not, any of the pure options is permitted.
I vaguely remember Sobel having once proposed something like this. Anything else I should read on these issues?
A few quick comments on the first topic.
It is often assumed that there can be evidential connections between what acts we will choose and what happened in the past. In Newcomb's Problem, for example, you can be confident that the predictor foresaw that you'd one-box if you one-box, and that she foresaw that you'd two-box if you two-box. Some philosophers, however, have suggested that deliberating agents should regard their acts as evidentially independent of the past. If they are right then even EDT recommends two-boxing in Newcomb's Problem.
Arif considers some arguments for the independence requirement, and finds them all wanting. I agree with much of what he says.
The best argument he finds goes like this. During deliberation, the question what you will do is equivalent to the question what you should do. If information about the past were relevant to what you will do, it would also be relevant to what you should do. But there are cases in which it is not. For example, a gambling addict who deliberates about whether to gamble may know that she often gave in to the temptation in the past. This historical fact is evidence that she will give in this time. But it doesn't provide any reason to give in.
In response, Arif suggests that we should distinguish two readings of 'I will do so-and-so' – one reading on which the sentence expresses a prediction, and one on which it expresses a present intention. On its second reading, the question of what you will do should be insensitive to non-reason-giving information about the past.
That may be right. I'm not sure. The argument (involving the alleged equivalence of 'will' and 'should') looks unconvincing anyway.
Let's assume that you engage in an actual process of deliberation. Then it makes sense to assume that your initial beliefs about what you will do are partly shaped by evidence you have about the past. At that stage, 'what will I do?' is clearly not equivalent to 'what should I do?'. At the endpoint of deliberation, when you have settled on what to do, the two questions are more closely connected. If you are rational, and know that you are rational, and know about your beliefs and desires, then you will believe that you will do X iff you believe that X is the right choice. But the questions are still not equivalent, as we can see if (for example) you don't believe that you are rational. In that case, you may well think that you'll end up doing one thing even though you ought to do another.
The gambler case is anyway somewhat beside the point. What matters for decision theory is whether mere hypotheses about the past are evidentially relevant to hypotheses about what you will do. If you're unsure whether you'll one-box or two-box in Newcomb's Problem, the hypothesis that a reliable predictor has predicted that you'll one-box makes it likely that you'll end up one-boxing. The hypothesis provides no reason or incentive to one-box. As such, it shouldn't steer your deliberation towards one-boxing. And of course it doesn't. It is, after all, only a hypothesis.
Overall, I agree with Arif that the independence requirement is implausible and unmotivated.
Let's turn to the final counterexample to CDT. This actually comes up in the context of the previous topic, but I'll take it out of context.
The scenario resembles the Newcomb Insurance case from the previous chapter.
Psycho Insurance. In front of you is a button. You can either press it or not. If you have been predicted to press it, doing so will cost you $1. If you have been predicted not to press it, pressing it will give you $1. After making your choice, and before you learn of the outcome, you are offered a chance to bet that the prediction was accurate. Taking the bet gets you $0.50 if the prediction was accurate and costs you $1.50 if it was inaccurate. At the outset, you are highly confident that the prediction is accurate.
This is another sequential choice problem with two stages. In stage 1, you are asked whether to press the button. In stage 2, you are asked whether to accept the bet.
Arif assumes that you should accept the bet in stage 2. If you press the button in stage 1, you are therefore guaranteed to lose $0.50 overall. If you don't press the button, you either gain $0.50 (if you were predicted not to press the button) or lose $1.50 (if you were predicted to press the button).
EDT says that you shouldn't press the button. CDT seems to allow for this as well. Specifically, not pressing the button maximises (causal) expected utility if your credence in having been predicted not to press the button is greater than 0.5. If your credence is less than 0.5, however, pressing the button maximises (causal) expected utility. In that case, Arif assumes, CDT says that you should press the button in stage 1, accept the bet in stage 2, and face a sure loss of $0.50.
Arif calls this a "diachronic Dutch Book" against CDT.
Like in Newcomb Insurance, we also seem to have a problematic divergence between what CDT identifies as the optimal plan and what it says you should do at the individual stages. Here is the matrix for the four possible plans.
| | pred-press | pred-¬press |
| --- | --- | --- |
| press ∧ bet | $-0.50 | $-0.50 |
| press ∧ ¬bet | $-1 | $1 |
| ¬press ∧ bet | $-1.50 | $0.50 |
| ¬press ∧ ¬bet | $0 | $0 |
Rows 1 and 3 are dominated. If you could decide on a plan for both acts at once, it therefore looks like you should choose either press ∧ ¬bet or ¬press ∧ ¬bet. Either way, you'll plan to not bet in stage 2. Yet once you get to that stage, CDT says that you should bet.
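The dominance claims can be verified mechanically. Here is a quick sketch in Python (the payoff pairs are the rows of the matrix above, listed as (pred-press, pred-¬press)):

```python
# Payoff matrix for the four plans in Psycho Insurance.
# Each pair gives the payoff under (pred-press, pred-not-press).
plans = {
    "press & bet":       (-0.50, -0.50),
    "press & no-bet":    (-1.00,  1.00),
    "no-press & bet":    (-1.50,  0.50),
    "no-press & no-bet": ( 0.00,  0.00),
}

def dominated(p):
    """A plan is (strictly) dominated if some other plan pays
    strictly more in every state."""
    return any(all(q > x for q, x in zip(plans[other], plans[p]))
               for other in plans if other != p)

for p in plans:
    print(p, "dominated:", dominated(p))
# Only 'press & bet' and 'no-press & bet' (rows 1 and 3) come out dominated.
```

Row 1 is dominated by row 4, and row 3 by row 2.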
Let's go through all this more slowly.
In stage 2, taking the bet is the right choice if, at that point, you are still confident that the predictor foresaw what you did in stage 1. Like in Newcomb Insurance, we can distinguish two versions of the scenario – one in which you believe that the predictor can foresee how states of indecision are resolved and one in which you believe she can only foresee decisions or indecisions. This time, I'll focus on the first version, which is arguably implied by Arif's stipulations.
Let's assume, then, that the predictor has the superpower to foresee your eventual act in stage 1, even if you remain undecided. We then know that you should take the bet in stage 2.
Return to stage 1. If you had faced the stage 1 problem in isolation – without any subsequent stage 2 – CDT would say that you should remain undecided between pressing the button and not pressing it. Things change if we factor in the additional payoff from stage 2. The stage 1 problem then appears to have three equilibria.
One, you could decide to press the button. You should then be confident that this has been predicted, and that you'll end up with $-1 + $0.50 = $-0.50 overall. If you were to not press the button, you'd probably end up with $0 + $-1.50 = $-1.50, because you'd have rendered the prediction false. So this appears to be a stable state.
Two, you could decide to not press the button. You should then be confident that you'll end up with $0 + $0.50 = $0.50. The other option apparently would have left you with $1 + $-1.50 = $-0.50.
Three, you could be perfectly undecided between pressing and not pressing. In that state, you should give credence 0.5 to either prediction. Your expected net payoff is then $0: if you were predicted to press, you end up with $-1 + $0.50 = $-0.50; if you were predicted to not press, you end up with $0 + $0.50 = $0.50; both are equally likely. The state is stable because both pure options have a lower expected payoff. If you were to press, you'd end up with either $-1 + $0.50 = $-0.50 or $1 - $1.50 = $-0.50, with equal probability (average: $-0.50). If you were to not press, you'd end up with either $0 + $-1.50 = $-1.50 or $0 + $0.50 = $0.50 (average: $-0.50).
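For what it's worth, the three equilibria can be checked numerically. The sketch below uses the net payoffs just computed (button plus bet) and my paraphrase of the stability condition: a credence p that you'll press the button – which, given the predictor's powers, is also your credence in pred-press – is an equilibrium if every act you might perform does at least as well as every alternative:

```python
# Net payoffs (button plus subsequent bet) in Psycho Insurance,
# assuming you accept the bet in stage 2.
#                 pred-press  pred-not-press
PAY = {"press":    (-0.50,      -0.50),
       "no-press": (-1.50,       0.50)}

def eu(act, p):
    """Causal expected utility of an act when your credence in
    pred-press is p (predictions are causally independent of acts)."""
    return p * PAY[act][0] + (1 - p) * PAY[act][1]

def is_equilibrium(p):
    """p is stable if every act with positive probability (credence p
    for pressing, 1-p for not pressing) maximises expected utility."""
    live = [a for a in PAY if (p if a == "press" else 1 - p) > 0]
    best = max(eu(a, p) for a in PAY)
    return all(eu(a, p) == best for a in live)

print([p for p in (0.0, 0.25, 0.5, 0.75, 1.0) if is_equilibrium(p)])
# -> [0.0, 0.5, 1.0]: deciding not to press, perfect indecision, deciding to press
```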
The best of the three equilibria is the second. It is worth $0.50, compared to $-0.50 for the first and $0 for the third. Any sensible form of CDT should at least declare the second equilibrium permissible. I'm inclined to say that it is required.
CDT therefore endorses what Arif thinks is the correct solution: not pressing the button in stage 1 and taking the bet in stage 2.
Arif points out that if your credence in pred-press is greater than 0.5, then CDT says that you must press the button in stage 1 and take the bet in stage 2, for a guaranteed loss of $0.50. Is that correct?
We are invited to consider a situation in which your credence in pred-press is greater than 0.5. Arif doesn't say whether this is supposed to be your initial credence, at the start of deliberation, or your reflective credence, at the end of deliberation. The first kind of situation is relatively unproblematic. The second is not.
Suppose that, for whatever reason, you start your deliberation in a state in which your credence in pred-press is greater than 0.5. Perhaps it's 0.7. In that case, it should not remain at 0.7. Due to your belief in the prediction's accuracy, your credence in pred-press is tied to your credence in press. If during deliberation you become convinced that not pressing is the right choice, you'll also become convinced that you won't press the button, and thereby that you have not been predicted to press the button. This is what my preferred form of CDT says. Whatever your initial credence might have been, I say that your reflective credence in pred-press should be near 0. That is, whatever credence you start out with, you should not press the button in stage 1. You'll never make a guaranteed loss.
We might consider what you should do if your reflective credence in pred-press is greater than 0.5. But now we're talking about an odd situation. According to best-equilibrium CDT, you should never be in that situation. Your reflective credence in pred-press should be near 0.
Best-equilibrium CDT therefore avoids the "diachronic Dutch Book". But other forms of CDT do not. According to permissive CDT, any equilibrium in a decision problem is rationally acceptable. On this account, it looks like you may rationally reach the first equilibrium in stage 1, and make a guaranteed loss.
This kind of case is a reason to prefer best-equilibrium CDT. On closer inspection, however, it is not obvious that even permissive CDT falls prey to the apparent Dutch Book. Perhaps there is in fact no equilibrium at which you can decide to press the button.
Let's return to the apparent first equilibrium. Assume you are confident that you will press the button and that this has been predicted. You should then also be confident that you will accept the bet in stage 2, and that you'll end up with $-1 + $0.50 = $-0.50 overall. We need to ask if it would have been better to not press the button. If yes, your decision isn't stable.
So what would be the case if – counterfactually – you were to not press the button? Well, the prediction would have been false. The crucial question is whether you would still accept the bet in stage 2. We know that accepting the bet is in fact the uniquely rational choice. But it might be irrational under counterfactual circumstances.
Evidently, you would accept the bet iff you would be sufficiently confident that your act in stage 1 has been predicted. So we need to ask what you would believe about the prediction if – counterfactually – you were to not press the button.
It may help to consider the parallel question in Newcomb's Problem (with an all-but-infallible predictor). In that scenario, I would take both boxes and get $1000. I couldn't have done any better. The opaque box is empty. If I had taken only the opaque box, I would have gotten nothing. Now here's the question. What would I have believed about the content of the opaque box if – counterfactually and irrationally – I had one-boxed? Would I have said to myself "I'll take just this box, of which I know that it is empty"? Or would I have said "I'll take just this box, of which I think it contains a million"?
The answer isn't obvious. But the first possibility looks defensible. When I consider what would have happened if I had just taken the opaque box, I don't envisage being completely shocked to find the box empty. I already know that it's empty. By taking the box I would knowingly take an empty box.
(Suppose I enjoy finding out that I'm wrong about something of which I was highly confident. Let's say this is worth $2000 to me. If the counterfactual situation in which I one-box is a situation in which I'm convinced that the box contains a million then I couldn't rationally take two boxes, because the counterfactual situation would be better.)
If that is correct then my confidence in the prediction's accuracy is not robust under counterfactual supposition. If I were to one-box, I would still think that the box is empty, and so I would think that the predictor made a mistake.
Now back to Psycho Insurance. We are looking at the supposed equilibrium in which you are confident that you'll press the button. What would be the case, we ask, if you were to not press the button? Would you still believe that the prediction is accurate? This is the same question as in Newcomb's Problem, and the answer isn't obvious. One might reasonably argue that your confidence in the predictor's accuracy is not robust under the counterfactual supposition. If that is correct – more precisely, if the kind of supposition that is relevant to CDT gives this answer – then the counterfactual scenario in which you don't press the button is a scenario in which you do not accept the bet in stage 2. And then that scenario has a known utility of $0, which is better than $-0.50. So you're not in equilibrium.
This issue is related to a general problem with the "sophisticated" approach to sequential choice, and with "backwards induction" arguments in game theory. When we consider what an agent should do at a choice point, we assume that the agent is rational, knows that she is rational, has not lost any of her evidence, and so on. But when we reason backwards, we must also consider what the agent would do at that choice point if she were to make a possibly irrational choice at an earlier point. We can't assume that the answer is the same.
In sum, it's not obvious that permissive CDT allows for the problematic choices in Psycho Insurance.
So much for the diachronic Dutch Book. What about the apparent mismatch between the optimal strategy and your actual choices? Here the situation is analogous to that in Newcomb Insurance.
Remember the decision matrix for the possible plans.
| | pred-press | pred-¬press |
| --- | --- | --- |
| press ∧ bet | $-0.50 | $-0.50 |
| press ∧ ¬bet | $-1 | $1 |
| ¬press ∧ bet | $-1.50 | $0.50 |
| ¬press ∧ ¬bet | $0 | $0 |
On the assumption that the predictor can foresee resolutions of indecision, the only equilibrium is perfect indecision between rows 1 and 3.
This may be surprising given that both of these options are dominated. Indeed, Arif assumes that because press ∧ bet and ¬press ∧ bet are both dominated, CDT requires you to choose one of the other plans. But not so. In fact, none of the other plans is rationally choosable, nor is there a stable state of indecision in which you think you might implement one of these plans.
But indecision between the two dominated plans is stable. Suppose you're in that state. Then you are 50% confident that you will press the button and accept the bet. In that case you will have been predicted to press the button, so you'll end up with $-0.50. The other 50% of your credence goes to scenarios in which you don't press but do accept the bet, in which case you will have been predicted to not press, and you end up with $0.50. Your expected payoff is $0. Even though you prefer one direction of the indecision state to the other, you are not drawn further towards the relevant pure choice: if you were to directly choose to not press and bet, you would get either $-1.50 or $0.50, with equal probability (average: $-0.50).
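Here is the arithmetic behind these claims, as a small sketch. The indecision state gives credence 0.5 to each of the two dominated plans; since the predictor foresees how indecision is resolved, the matching prediction obtains in each case, while deviations are evaluated with your current 50/50 credence in the (causally independent) predictions:

```python
# Plan payoffs under the two predictions, from the matrix above.
ROW1 = {"pred-press": -0.50, "pred-not-press": -0.50}   # press & bet
ROW3 = {"pred-press": -1.50, "pred-not-press":  0.50}   # no-press & bet

# In the indecision state, whatever you end up doing was predicted:
expected = 0.5 * ROW1["pred-press"] + 0.5 * ROW3["pred-not-press"]
print(expected)   # 0.0

# Deviating directly to a pure plan is evaluated with 50/50 credence
# in the predictions:
eu_row1 = 0.5 * ROW1["pred-press"] + 0.5 * ROW1["pred-not-press"]
eu_row3 = 0.5 * ROW3["pred-press"] + 0.5 * ROW3["pred-not-press"]
print(eu_row1, eu_row3)   # -0.5 -0.5: neither pure choice looks better
```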
So if you had a choice between the plans, you should be undecided between press ∧ bet and ¬press ∧ bet: you should settle on betting in stage 2 and you should remain undecided about stage 1.
We've already covered the individual decision problems. Here you should take the bet in stage 2, and you should arguably not press the button in stage 1, although permissive CDT might also allow that you press the button or remain undecided in stage 1.
In any case, we don't have the mismatch between planning and implementation that Arif predicts. If you could decide between plans, you would decide to accept the bet in stage 2. Once you reach that stage, this is just what you will do.
Arif actually doesn't consider the possible match or mismatch between planning perspective and implementation perspective. Like much of the literature, he instead concentrates on your attitudes towards plans and their continuation. He suggests that CDT violates the principle I called (DC3) in the previous post, according to which the continuation of any ex ante acceptable plan is still acceptable at any point that is compatible with its implementation.
In the previous post I showed that permissive CDT validates (DC3), assuming that "acceptable" means "rationally choosable". In Psycho Insurance, the principle is vacuously satisfied because there is no rationally choosable plan. The principle is even satisfied if we weaken "acceptable" to encompass options that have positive probability in a rational state of indecision. On this reading, press ∧ bet and ¬press ∧ bet are both acceptable. And their continuation, to bet, is acceptable whatever you do in stage 1.
We come closer to dynamic inconsistency if we assume that the predictor can't foresee how states of indecision are resolved.
In that case, you should accept the bet in stage 2 only if you have not been undecided in stage 1. In stage 1, we get the same three apparent equilibria as above, with the same caveat.
In the choice among plans, we get a new equilibrium. This time, you should be perfectly undecided between press ∧ ¬bet and ¬press ∧ ¬bet.
Here, then, one might say that when you consider the possible plans then you think that you should definitely reject the bet in stage 2. Yet when you come to stage 2, you should accept the bet. On the weak reading of "acceptable", ¬press ∧ ¬bet is acceptable but its continuation (after the first stage has been implemented) is not.
I don't think this is a serious problem. It would really be worrying if CDT told you that the best plan is to first do A and then B, but also that you should choose A and ¬B when you reach the two choice points. If ¬B is better than B on the assumption that you do A, then A ∧ B can hardly be the optimal plan. But CDT doesn't tell you that ¬press ∧ ¬bet is the best plan.
Throughout this post I'll assume that we are dealing with ideally rational agents with stable basic desires. We're interested in the attitudes such agents should take towards their options in simple, finite sequential choice situations where no relevant information about the world arrives in between the choice points.
In this context, a plan is a proposition specifying an act for each choice point in the sequence. A plan is rationally choosable if it is a rational choice in a hypothetical decision problem in which the options are the possible plans. A plan is rationally implementable if at each choice point, the agent could rationally choose whatever the plan says she does at that point.
In the previous post, I considered the following principle. (This is the left-to-right direction of the principle I there called "Dynamic Consistency".)
(DC1) If a plan is rationally choosable then it is rationally implementable.
We found that an attractive form of CDT violates (DC1) in the following variant of a scenario from Ahmed (2014).
Newcomb Insurance With A Coin.
Stage 1. You face Newcomb's Problem, but with different monetary values. The transparent box is empty; the opaque box contains $100 iff you have been predicted to one-box. In addition to one-boxing and two-boxing, you have the option to toss a fair coin and let the outcome decide whether you'll one-box or two-box. The predictor can infallibly foresee your choice, but she can't foresee the outcome of the coin toss.
Stage 2. Before the content of the opaque box is revealed, you must bet on whether the predictor foresaw how many boxes you took. If you bet that the prediction was accurate, you get $25 if you're right and lose $75 if you're wrong; if you bet that the prediction was inaccurate, you get $75 if you're right and lose $25 if you're wrong.
Here is a decision matrix for the possible plans.
| | pred-1b | pred-2b |
| --- | --- | --- |
| 1b & bet-acc | $100+$25 = $125 | $0-$75 = $-75 |
| 1b & bet-inacc | $100-$25 = $75 | $0+$75 = $75 |
| 2b & bet-acc | $100-$75 = $25 | $0+$25 = $25 |
| 2b & bet-inacc | $100+$75 = $175 | $0-$25 = $-25 |
| rand & bet-acc | $125 if 1b else $25 = $75 | $-75 if 1b else $25 = $-25 |
| rand & bet-inacc | $75 if 1b else $175 = $125 | $75 if 1b else $-25 = $25 |
The only rationally choosable plan is row 6. But when you face the individual choices, you should arguably one-box in stage 1 and bet on an accurate prediction in stage 2 – because that's the best equilibrium solution. We have a counterexample to (DC1).
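To see why row 6 comes out as the only choosable plan, one can run the "maximises expected utility conditional on being chosen" test on all six rows. The sketch below assumes that choosing a pure plan makes the matching prediction certain, and that choosing the randomising plan leaves you with credence 0.5 in each prediction (since the predictor can't foresee the coin):

```python
# Expected payoffs of the six plans in Newcomb Insurance With A Coin,
# under the states (pred-1b, pred-2b), as in the matrix above.
PLANS = {
    "1b & bet-acc":     (125, -75),
    "1b & bet-inacc":   ( 75,  75),
    "2b & bet-acc":     ( 25,  25),
    "2b & bet-inacc":   (175, -25),
    "rand & bet-acc":   ( 75, -25),
    "rand & bet-inacc": (125,  25),
}

def cred_pred_1b(plan):
    """Credence in pred-1b conditional on choosing the plan (assumption:
    pure choices are foreseen; 'rand' leaves the prediction 50/50)."""
    if plan.startswith("1b"):
        return 1.0
    if plan.startswith("2b"):
        return 0.0
    return 0.5

def eu(plan, p):
    return p * PLANS[plan][0] + (1 - p) * PLANS[plan][1]

def choosable(plan):
    """Permissive-CDT test: does the plan maximise expected
    utility conditional on being chosen?"""
    p = cred_pred_1b(plan)
    return eu(plan, p) == max(eu(q, p) for q in PLANS)

print([pl for pl in PLANS if choosable(pl)])   # ['rand & bet-inacc']
```

Every pure plan, conditional on being chosen, is beaten by some alternative (e.g. 2b & bet-inacc beats 1b & bet-acc conditional on the latter); only rand & bet-inacc survives its own test.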
A curious aspect of Newcomb Insurance With A Coin is that you do better if you face the individual choices than if you choose a plan. One might have thought that more control is always better. Not so here. If, in stage 1, you had the power to "bind" your future choices, you should not make use of that power.
The reason why you do better if you face the individual choices is that you are then given better options. You can rationally intend to one-box, and if you intend to one-box then the opaque box is certain to contain $100. If you have simultaneous control over both choices, by contrast, you can't rationally intend to one-box. The only option you can rationally intend to choose is randomisation. In that case there's a 50% chance that the opaque box contains $100, but also a 50% chance that it contains nothing.
In Newcomb's original problem, the predictor gives EDTers the better options. She gives them a choice between $1M and $1M1K, while CDTers get a choice between $0 and $1K. In Newcomb Insurance With A Coin, the predictor gives better options not just to EDTers, but also to (best-equilibrium) CDTers who face the two choices independently. CDTers who have simultaneous control over both choices are punished with worse options.
If we hold fixed the content of the opaque box then the optimal plan (rand & bet-inacc) is no worse than the optimal implementation (1b & bet-acc). Suppose the opaque box contains $100. Then your expected payoff is $125 either way. Suppose the opaque box is empty. Then rand & bet-inacc has expected payoff $25 while 1b & bet-acc is guaranteed to result in $-75.
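These numbers can be recomputed from the scenario's raw payoffs. A sketch (one- and two-boxing both collect whatever is in the opaque box, since the transparent box is empty; the bet pays as stipulated in stage 2):

```python
def payoff(act, bet, pred):
    """Monetary outcome given the final act ('1b' or '2b'), the bet
    ('acc' or 'inacc'), and the prediction ('1b' or '2b')."""
    box = 100 if pred == "1b" else 0        # opaque box content
    accurate = (act == pred)
    if bet == "acc":
        return box + (25 if accurate else -75)
    return box + (75 if not accurate else -25)

def plan_eu(plan, bet, pred):
    """Expected payoff of a plan, averaging over the coin for 'rand'."""
    if plan == "rand":
        return (payoff("1b", bet, pred) + payoff("2b", bet, pred)) / 2
    return payoff(plan, bet, pred)

# Holding the box content fixed:
print(plan_eu("rand", "inacc", "1b"), plan_eu("1b", "acc", "1b"))  # 125.0 125
print(plan_eu("rand", "inacc", "2b"), plan_eu("1b", "acc", "2b"))  # 25.0 -75
```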
So we have two factors that pull in opposite directions. On the one hand, having simultaneous control over both choices allows you to make more (or at least equally much) out of whatever cards you've been dealt. On the other hand, the extra control makes it likely that you've been dealt worse cards.
This kind of situation is clearly unusual. When we intuit that a rationally choosable plan should be rationally implementable, we don't suppose that having a choice between plans is associated with having worse options.
The dynamic consistency principle (DC1) compares a hypothetical choice between plans with the actual choices at the individual choice points. Perhaps we should not take the merely hypothetical choice so seriously. That is, perhaps we shouldn't consider what you should do if you could actually choose between plans – with the possible consequence that you would then have been given worse options. Instead, we might simply consider which plans maximise expected utility, without considering whether you could rationally decide in their favour.
(DC2) If a plan maximises expected utility then each of its acts maximises expected utility after the earlier acts have been performed.
From a CDT perspective, however, (DC2) is highly implausible.
The problem is that a plan can maximise (causal) expected utility only if you believe that you won't choose it. In Newcomb Insurance With A Coin, for example, the plan 2b & bet-inacc maximises expected utility if you believe that you'll choose 1b in stage 1. But if you go ahead and implement 2b & bet-inacc, you can hardly remain confident that you'll choose 1b in stage 1. After having chosen 2b, you know that you have chosen 2b, and then 2b & bet-inacc no longer maximises expected utility. (Nor does bet-inacc on its own.)
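A quick check with the matrix values confirms this. Given certainty in pred-1b, 2b & bet-inacc has the highest expected payoff; once you've chosen 2b (and hence are certain of pred-2b), its continuation bet-inacc loses to bet-acc:

```python
# Plan payoffs under (pred-1b, pred-2b), from the matrix above.
PLANS = {
    "1b & bet-acc":     (125, -75),
    "1b & bet-inacc":   ( 75,  75),
    "2b & bet-acc":     ( 25,  25),
    "2b & bet-inacc":   (175, -25),
    "rand & bet-acc":   ( 75, -25),
    "rand & bet-inacc": (125,  25),
}

def eu(plan, p_pred_1b):
    a, b = PLANS[plan]
    return p_pred_1b * a + (1 - p_pred_1b) * b

# While you believe you'll one-box, 2b & bet-inacc looks best:
print(max(PLANS, key=lambda pl: eu(pl, 1.0)))        # 2b & bet-inacc

# After choosing 2b, certainty shifts to pred-2b, and betting on an
# inaccurate prediction is now worse than betting on an accurate one:
print(eu("2b & bet-inacc", 0.0), eu("2b & bet-acc", 0.0))   # -25.0 25.0
```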
The most popular formulation of dynamic consistency in the literature goes like this.
(DC3) If a plan is choosable at the start of a sequential choice problem, then its continuation is choosable at any later point after the earlier parts of the plan have been implemented.
Newcomb Insurance With A Coin is a counterexample to (DC1), but not to (DC3). Is (DC3) valid in CDT? It depends.
I'll first prove a lemma.
Lemma. If a plan P maximises (causal) expected utility conditional on P then the plan's continuation still maximises (causal) expected utility conditional on P.
Proof. I assume that causal expected utility can be expressed in terms of some kind of supposition, so that EU(A) = ∑_{w} V(w)Cr(w//A), where Cr(B//A) is the agent's credence in B on the supposition that A. Cr(B//A) must be distinguished from the ordinary conditional probability Cr(B/A), which I also write as Cr^{A}(B). I need two assumptions about the relevant kind of supposition. Both should be fairly uncontroversial.
No-Backtracking. If A says that such-and-such acts are performed up to some point in a sequential choice situation, and B says that such-and-such acts are performed afterwards, then Cr^{A}(A//B) = 1.
Similarity. If Cr(A//B) = 1 then Cr(C//B) = Cr(C//A ∧ B).
Now assume some plan P = A_{1}…A_{n} maximises (causal) expected utility conditional on A_{1}…A_{n}, compared to any other plan. In particular,

∑_{w} V(w)Cr^{A_{1}…A_{n}}(w//A_{1}…A_{n}) ≥ ∑_{w} V(w)Cr^{A_{1}…A_{n}}(w//A_{1}…A_{i-1}B_{i}…B_{n})

for any acts B_{i}…B_{n} available at points i…n respectively. By No-Backtracking,

Cr^{A_{1}…A_{n}}(A_{1}…A_{i-1}//B_{i}…B_{n}) = 1

for any B_{i}…B_{n}. By Similarity, it follows that

Cr^{A_{1}…A_{n}}(w//B_{i}…B_{n}) = Cr^{A_{1}…A_{n}}(w//A_{1}…A_{i-1}B_{i}…B_{n})

for any B_{i}…B_{n} and world w. Plugging this into the first inequality (once with A_{i}…A_{n} and once with B_{i}…B_{n} in the role of B_{i}…B_{n}), we get

∑_{w} V(w)Cr^{A_{1}…A_{n}}(w//A_{i}…A_{n}) ≥ ∑_{w} V(w)Cr^{A_{1}…A_{n}}(w//B_{i}…B_{n})

for any B_{i}…B_{n}. QED.
Any sensible form of CDT should hold that an option is choosable only if it maximises causal expected utility conditional on being chosen. Let permissive CDT be the view that this condition is not only necessary, but also sufficient for choosability.
Observation 1. Permissive CDT validates (DC3).
Proof. Let Cr_{i} be the agent's credence at point i. Since Cr_{i}(*) = Cr_{1}(*/A_{1}…A_{i-1}) and the value function is stable, we can replace 'Cr^{A_{1}…A_{n}}' by 'Cr_{i}^{A_{i}…A_{n}}' in the Lemma's result:

∑_{w} V(w)Cr_{i}^{A_{i}…A_{n}}(w//A_{i}…A_{n}) ≥ ∑_{w} V(w)Cr_{i}^{A_{i}…A_{n}}(w//B_{i}…B_{n})

for any B_{i}…B_{n}. QED.
Let best-equilibrium CDT be the view that one may only choose a best among the options that maximise causal expected utility conditional on being chosen, where the relevant measure of goodness is each candidate's expected utility conditional on being chosen.
Observation 2. Best-equilibrium CDT does not validate (DC3).
Here is a counterexample.
Two Buttons. In stage 1, you can choose whether to press a button. In stage 2, you can choose whether to press a different button. A predictor has predicted your choice in both stages. If she predicted that you'd press only the first button, she wired the buttons so that you get $15 iff you press neither button and $12 otherwise. If she predicted that you'd do anything else, she wired the buttons so that you get $10 if you press both buttons and $0 otherwise. You are certain that the predictor has foreseen your choice.
The payoff matrix for your plans looks as follows, where 'P1N2' means 'press button 1 and not button 2'.
| | Pred-P1P2 | Pred-P1N2 | Pred-N1P2 | Pred-N1N2 |
| --- | --- | --- | --- | --- |
| P1P2 | $10 | $12 | $10 | $10 |
| P1N2 | $0 | $12 | $0 | $0 |
| N1P2 | $0 | $12 | $0 | $0 |
| N1N2 | $0 | $15 | $0 | $0 |
The only equilibrium in this decision problem is P1P2. After you've pushed the first button (P1) in stage 1, your decision problem in stage 2 is effectively the top left quarter of the matrix. This problem has two equilibria, P1P2 and P1N2. The second is better. Best-equilibrium CDT therefore says that in stage 2, the continuation of the only ex ante choosable plan P1P2 is no longer choosable.
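The equilibrium claims can be checked directly from the matrix. A sketch, using the same "maximises expected utility conditional on being chosen" test (since you're certain the predictor is right, choosing a plan makes the matching prediction certain):

```python
# Two Buttons: payoffs per plan (row) and prediction (column).
PAY = {
    "P1P2": {"P1P2": 10, "P1N2": 12, "N1P2": 10, "N1N2": 10},
    "P1N2": {"P1P2":  0, "P1N2": 12, "N1P2":  0, "N1N2":  0},
    "N1P2": {"P1P2":  0, "P1N2": 12, "N1P2":  0, "N1N2":  0},
    "N1N2": {"P1P2":  0, "P1N2": 15, "N1P2":  0, "N1N2":  0},
}

def equilibrium(plan, options):
    """Choosing `plan` makes the prediction of `plan` certain; it is an
    equilibrium iff it then does at least as well as every option."""
    return PAY[plan][plan] == max(PAY[q][plan] for q in options)

print([p for p in PAY if equilibrium(p, PAY)])        # ['P1P2']

# Stage 2, after button 1 has been pressed: only P1P2 and P1N2 remain.
stage2 = ["P1P2", "P1N2"]
print([p for p in stage2 if equilibrium(p, stage2)])  # ['P1P2', 'P1N2']
print(PAY["P1N2"]["P1N2"], ">", PAY["P1P2"]["P1P2"])  # the second is better
```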
I don't understand the common focus on (DC3). The principle compares ex ante attitudes towards a plan with attitudes towards the plan during its hypothetical implementation, even if that implementation is irrational. If it would be irrational to implement a plan, why should we assume that you should have stable attitudes towards the plan during its implementation? To be sure, if we could show (DC1) – if we could show that any choosable plan is rationally implementable – then (DC3) would have some appeal. But then (DC1) is doing the real work.
Newcomb Insurance With A Coin shows that (DC1) is invalid in best-equilibrium CDT. Permissive CDT escapes the counterexample. It allows you to implement the uniquely choosable plan. Is this always true? That is, can we show the following?
(DC4) If a plan P maximises expected utility conditional on P then each of its acts maximises expected utility conditional on P after the earlier acts have been chosen.
(DC4) resembles (DC2), except that we've conditionalised on P. This ensures that we don't consider the relevant plan as a merely counterfactual alternative, which rendered (DC2) untenable.
(DC4) looks plausible to me. Oddly, I can't prove it without some non-trivial assumptions. The following two assumptions do the job.
Future Determinacy. You are never uncertain about what your future self would choose at later points in the sequence under the supposition that you make a certain choice now.
Strong Centring. If you are certain that you will choose A_{1} now and A_{2}…A_{n} afterwards then you are certain that you would choose A_{2}…A_{n} on the supposition that you now choose A_{1}.
Strong Centring is debatable. Future Determinacy is not plausible as a general assumption. But it is often satisfied. If you know that your future self is rational, you can often figure out what they would do if they faced a certain decision situation. The only counterexamples are situations in which you know that your future self would face a choice in which two options are both choosable, and you don't know which of the options they would pick.
Observation 3. (DC4) holds in CDT whenever Future Determinacy and Strong Centring are satisfied.
Proof: Assume A_{1}…A_{n} maximises (causal) expected utility conditional on P. By the Lemma, we know that after A_{1}…A_{i-1} have been implemented, A_{i}…A_{n} still maximises expected utility conditional on P. That is,

(1) ∑_{w} V(w)Cr_{i}^{P}(w//A_{i}…A_{n}) ≥ ∑_{w} V(w)Cr_{i}^{P}(w//B_{i}…B_{n})

for any B_{i}…B_{n}. We need to show that A_{i} alone also maximises expected utility conditional on P. Suppose for reductio that some alternative B_{i} has greater expected utility conditional on P. That is,

(2) ∑_{w} V(w)Cr_{i}^{P}(w//B_{i}) > ∑_{w} V(w)Cr_{i}^{P}(w//A_{i}).

By Future Determinacy, there are acts B_{i+1}…B_{n} such that Cr_{i}^{P}(B_{i+1}…B_{n}//B_{i}) = 1. By Similarity, it follows that Cr_{i}^{P}(w//B_{i}) = Cr_{i}^{P}(w//B_{i}B_{i+1}…B_{n}). Also, by Strong Centring and Similarity, Cr_{i}^{P}(w//A_{i}) = Cr_{i}^{P}(w//A_{i}…A_{n}). Thus (2) turns into

∑_{w} V(w)Cr_{i}^{P}(w//B_{i}…B_{n}) > ∑_{w} V(w)Cr_{i}^{P}(w//A_{i}…A_{n}).

And this contradicts (1). QED.
Why do we need Future Determinacy? Consider a two-stage dynamic decision problem, and assume the plan A_{1} ∧ A_{2} maximises expected utility conditional on A_{1} ∧ A_{2}. Now suppose Future Determinacy is false: you don't know what you would choose in stage 2 if you chose B_{1} in stage 1. Let's say you're unsure whether you would choose A_{2} or B_{2}, because both would be equally choiceworthy. We know that (conditional on A_{1} ∧ A_{2}) neither B_{1} ∧ A_{2} nor B_{1} ∧ B_{2} has greater expected utility than A_{1} ∧ A_{2}. Oddly, their disjunction – which is equivalent to B_{1} – might still have greater expected utility than A_{1} ∧ A_{2}. And then A_{1} would not maximise expected utility (conditional on A_{1} ∧ A_{2}).
Instead of Future Determinacy, we could also require that (conditional on A_{1}…A_{n}) no disjunction of plan continuations has greater expected utility than each disjunct. Or more specifically: If some plan continuations all have expected utility x then their disjunction does not have expected utility greater than x. Let's call a scenario bizarre if it falsifies both Future Determinacy and this condition. Any counterexample to (DC4) would have to be bizarre (given our other assumptions, like Strong Centring).
Observation 4. (DC1) holds in permissive CDT for any scenario that is not bizarre.
This immediately follows from the previous observation.
Observation 5. Any non-bizarre case in which best-equilibrium CDT violates (DC1) is a case in which simultaneous control over all acts in the relevant sequence is bad news, indicating that the agent has been given worse options.
This is because a better equilibrium in a decision problem is always better because it carries better news about your options. By (DC4), the optimal planning equilibrium is still an equilibrium during implementation. If there is a better equilibrium during implementation this means that the original equilibrium – the planning equilibrium – carries worse news about the options.
It would be good to figure out what happens in "bizarre" cases.
I can't be the first to look into dynamic consistency from a CDT perspective. Any literature suggestions are welcome.