The Prisoner’s Dilemma

The Prisoner’s Dilemma is a game, but a game that seems to bear lessons for the conduct of human affairs more generally, and it has attracted a great deal of attention from men not noted for their frivolity. It was discovered in 1950, by Merrill Flood and Melvin Dresher, at the RAND Corporation, a military think-tank established after World War II by the United States Air Force to conduct a “program of study and research on the broad subject of intercontinental warfare”.


So it is a serious game, but a simple one for all that. It requires two players, let’s say you and me. There is only one move. Each of us must make a choice, to “cooperate” or “defect”, without knowing what the other has chosen. Perhaps each of us takes, from a chess board, one black and one white pawn, and as we face each other I put my hands behind my back and proffer a closed fist containing the pawn I have chosen. You make your choice, too, in the same way. Together we open our hands, and reveal what we have chosen. The black pawn represents the black heart of the defector, the white the innocence of the cooperator.

Now, the reckoning. Should we each reveal a white pawn, we have cooperated and each of us wins £20: a fair and happy outcome. If we both are blackhearts with black pawns in our hands, we win nothing. But wickedness is not without its rewards in this game, for if I hold black and you white then I win £40 – and you, looking sadly at the white pawn in your hand, must pay £10 for your naivety.
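
If you prefer to see rules as code, here is the reckoning as a little Python table (a minimal sketch; the representation and the names are my own):

    # The payoffs from the text, in pounds, as (mine, yours),
    # given (my pawn, your pawn); 'C' is the white pawn, 'D' the black.
    PAYOFF = {
        ('C', 'C'): (20, 20),    # both cooperate: £20 each
        ('D', 'D'): (0, 0),      # both defect: nothing
        ('D', 'C'): (40, -10),   # I defect, you cooperate: £40 to me, you pay £10
        ('C', 'D'): (-10, 40),   # the mirror image
    }

    def payoff(my_move, your_move):
        """Return (my winnings, your winnings) for a single game."""
        return PAYOFF[(my_move, your_move)]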

A troubling game

The troubling aspect of this game, the reason it is infamous, is that it seems clear that each player’s best strategy is always to defect. Whichever pawn is in my hand, you will do better by defecting: if I have cooperated, then defecting will make you £20 richer than you would be by cooperating, and if I have defected then your defecting will spare you a loss of £10. Yet we collectively are punished for our cold rationality, for if we both defect then we are both worse off than if we had both cooperated.
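
That argument can be checked mechanically against the little table above: whichever pawn you hold, I do strictly better by defecting, and yet mutual defection leaves us both poorer than mutual cooperation.

    # Defection strictly dominates cooperation: whatever you play,
    # my payoff from 'D' beats my payoff from 'C'...
    for your_move in ('C', 'D'):
        assert payoff('D', your_move)[0] > payoff('C', your_move)[0]

    # ...and yet mutual defection pays each of us less than mutual cooperation.
    assert payoff('D', 'D')[0] < payoff('C', 'C')[0]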

Douglas Hofstadter

Douglas Hofstadter seems to have been especially troubled by this: “I found that I simply could not accept the seemingly flawless logical conclusion that says that a rational player … will always defect”, he wrote, and his Scientific American column of June 1983 explores this discomfort. Hofstadter sent a letter to 20 friends, inviting them to make a choice – to cooperate or defect – which would be used in a game of Prisoner’s Dilemma against each of the 19 others. So each person was to play 19 games, but had to choose one move that would be their move in each. These games were played for real (fairly small) sums of money.

The letter included a nod to the outcome Hofstadter had hoped for: “You’re very bright,” he wrote. “So are all of you! About equally bright, I’d say, in fact.” In the event, most of the participants defected – 14 out of 20 – which, though it is the outcome most of us would expect, is not what Hofstadter had been hoping for. It is not easy to explain what he had been expecting, and I am not quite certain that it makes sense, but let me try: Hofstadter hoped that his friends would form what he called a ‘superrational’ cluster:

❝Superrational thinkers, by recursive definition, include in their calculations the fact that they are in a group of superrational thinkers.❞

In other words he hoped the players would realise that any answer arrived at by pure reason must be reached, too, in the same way, by all the other members of the group; and thence the players would realise that, since everyone must arrive at the same answer, there are only two possible outcomes: either all must cooperate, or all must defect; and that finally since of those two it is clearly preferable for everyone if everyone cooperates, the only rational move is to cooperate. Hofstadter later speculates that perhaps there are alien “civilisations to which cooperation appears axiomatic” that are destined to survive, whereas civilisations such as ours – “to which defection appears axiomatic” – shall not. (The Cold War threat of nuclear annihilation hangs gloomily over many of his writings of this period.)

We meet again

The Prisoner’s Dilemma challenges our moral sensibility too. For example, watch this clip from the British game show Golden Balls, in which two people play a Prisoner’s Dilemma-like game for high stakes:

It is hard not to feel that the winning player has done something morally questionable here, even though both are abiding by the rules of the game.

This is perhaps related to the fact that cooperation very often is the rational strategy in real-life situations. Whenever I go to the corner shop for a pint of milk, I have to choose whether to cooperate by paying for the milk, or defect by running away without paying. I could probably get away with stealing it: I think I could outrun the shopkeeper, who would in any case be reluctant to leave his shop unmanned for too long, and the police would be unlikely to devote much effort to the investigation of such a petty crime. But the inconvenience of this would outweigh the benefits. I would never be able to go back to the shop, so I would have to go further afield the next time I needed milk, and even passing by would risk the shopkeeper recognising me and calling the police, so I would have to walk my daughter to school by an inconveniently roundabout route. These practical considerations mean that, quite apart from any moral qualms I may have, stealing the milk would not be a net win for me. By contrast, if I were passing through a distant Scottish village to which I was sure I would never return, then perhaps stealing a pint of milk from the village shop would be the rational course of action, since the inconvenience of being unable to go back to the same shop would be no inconvenience at all.

A Scottish village
photo: TTphoto / Shutterstock.com

In short, cooperation may be the rational strategy if you encounter the same people time after time. The political scientist Robert Axelrod put this to the test in a famous experiment conducted in 1979. Axelrod set up a Prisoner’s Dilemma tournament, by computer, and invited a number of game theorists to submit computer programs encoding their strategy. The twist was that each digital player would be played against each other player 200 times, to simulate a situation where we encounter the same people repeatedly. These computer programs could remember what had happened on previous encounters with their opponent, and tailor their behaviour accordingly.

Robert Axelrod
photo: Robert Axelrod

The aim is to score as many points as possible over the whole tournament. This is not the same as ‘winning’: indeed the strategy that wins most often is to always defect, which will beat any opponent who does not always defect, and equalise with one who does. But this is typically a poor sort of winning in which neither player scores very many points, and the programs that did best overall in Axelrod’s tournament turned out to be those that were most effective at eliciting mutual cooperation.
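
The shape of such a tournament is easy to sketch in code. Here is a minimal version – my own simplification, using Axelrod’s per-round payoffs of 3, 5, 0 and 1 points rather than our £ amounts, and ignoring refinements such as playing each strategy against itself:

    import itertools

    # Axelrod's payoffs per round: reward 3, temptation 5, sucker 0, punishment 1.
    SCORES = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
              ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}

    def play_match(strategy_a, strategy_b, rounds=200):
        """Play two strategies against each other for `rounds` rounds.

        A strategy is a function of (my_history, their_history) returning
        'C' or 'D'. Returns the two total scores."""
        hist_a, hist_b = [], []
        score_a = score_b = 0
        for _ in range(rounds):
            move_a = strategy_a(hist_a, hist_b)
            move_b = strategy_b(hist_b, hist_a)
            points_a, points_b = SCORES[(move_a, move_b)]
            score_a += points_a
            score_b += points_b
            hist_a.append(move_a)
            hist_b.append(move_b)
        return score_a, score_b

    def tournament(entries):
        """Round-robin: every strategy meets every other; most points overall wins."""
        totals = {name: 0 for name in entries}
        for (name_a, strat_a), (name_b, strat_b) in itertools.combinations(entries.items(), 2):
            score_a, score_b = play_match(strat_a, strat_b)
            totals[name_a] += score_a
            totals[name_b] += score_b
        return totals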

The overall winner was a very simple strategy – the simplest of all the entries in that first tournament – called Tit for Tat. The tit-for-tat strategy is simply to cooperate the first time you meet someone new, and thereafter to do whatever they did the previous time. This strategy will cooperate with a co-player who is willing to cooperate, and yet cannot be much taken advantage of by one that is not.
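
In code, Tit for Tat is a one-liner. Pitted against a relentless defector it concedes only the single round in which it was suckered, as a quick check with the sketch above shows:

    def tit_for_tat(my_history, their_history):
        """Cooperate first; thereafter copy the opponent's last move."""
        return 'C' if not their_history else their_history[-1]

    def always_defect(my_history, their_history):
        """The stark sociopath of the one-shot game."""
        return 'D'

    # Over 200 rounds Tit for Tat 'loses' the match, but only by the
    # one round in which it was suckered: 199 points to 204.
    print(play_match(tit_for_tat, always_defect))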

After the tournament was over, and Axelrod had published the results in great detail, another tournament was announced, this time soliciting entries from computer hobbyists as well as professional mathematicians. There were 62 entries, again including Tit for Tat, again submitted by its inventor, the psychologist Anatol Rapoport; and among them were many subtle strategies that would provably have won the first tournament had they been entered. The tournament was played, and Tit for Tat won again. This second victory showed the robustness of Tit for Tat: it remained unbeaten even when all the competitors knew it was the one to beat.

Press-Dyson strategies

Freeman Dyson

Bill Press

And there matters rested, more or less, until this year. A few things were learned in the interim: it was found, for example, that a slight variant, the so-called Generous Tit for Tat, in which an opponent’s defection is forgiven 10% of the time, may sometimes fare better than plain Tit for Tat. Then there was a suggestion in 1993, by Martin Nowak and Karl Sigmund, that “win-stay, lose-shift” fares slightly better than Generous Tit for Tat most of the time. (I am grateful to James Junghanns for drawing my attention to this, in the comments below.)
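
Both variants are simple to state precisely. A sketch, using the conventions of the tournament code above (the 10% forgiveness figure is the one quoted; my encoding of ‘win’ and ‘lose’ follows the usual convention for these payoffs):

    import random

    def generous_tit_for_tat(my_history, their_history):
        """Tit for Tat, but forgive a defection 10% of the time."""
        if not their_history:
            return 'C'
        if their_history[-1] == 'D' and random.random() < 0.1:
            return 'C'  # forgive
        return their_history[-1]

    def win_stay_lose_shift(my_history, their_history):
        """Repeat my last move after a good payoff (3 or 5 points),
        switch after a bad one (0 or 1). Also known as Pavlov."""
        if not my_history:
            return 'C'
        last_points = SCORES[(my_history[-1], their_history[-1])][0]
        if last_points >= 3:                           # a 'win': stay
            return my_history[-1]
        return 'D' if my_history[-1] == 'C' else 'C'   # a 'loss': shift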

Still, it is reassuring to find that merely playing the game repeatedly with the same people is enough to favour strategies that resemble decent human behaviour, rather than the stark sociopathic logic that seems to reign when playing only once, and this is a comfortable place to rest.

But we can rest no longer! In a paper published earlier this year, Bill Press and Freeman Dyson outline a series of startling mathematical results that change forever our understanding of the game. And, if the game is a model of human affairs, does that change our understanding of the world more generally too?

Press and Dyson uncovered a class of strategies for the iterated Prisoner’s Dilemma – that is, the game where the Prisoner’s Dilemma is played repeatedly with the same other player – that have strange, almost unbelievable properties. One will force the other player’s score, on average, to be a certain number; and nothing she can do will change that score, however she chooses to play. Another type, which Press and Dyson dub “extortionate”, is able to take advantage of the other player’s self-interest. Against an extortionate opponent, the more you allow yourself to be taken advantage of the better you will score. So a purely self-interested player, who will modify his play only to increase his own score, can be taken for a ride by an “extortionist” who is able to use the other player’s self-interest against him.
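
The strategies Press and Dyson study are “memory-one” strategies: each is specified by just four probabilities of cooperating, one for each possible outcome of the previous round. Here is a sketch of both types. The extortionate numbers are, if I have copied them correctly, Stewart and Plotkin’s published example “Extort-2”; the equalizer numbers are my own derivation from Press and Dyson’s formulas, so treat them with suspicion. Both assume the 3/5/0/1 payoffs used above, and the first-move convention is my own choice.

    import random

    def memory_one(p_cc, p_cd, p_dc, p_dd, first='C'):
        """Build a memory-one strategy from four cooperation probabilities,
        one per outcome of the previous round (my own move listed first)."""
        probs = {('C', 'C'): p_cc, ('C', 'D'): p_cd,
                 ('D', 'C'): p_dc, ('D', 'D'): p_dd}
        def strategy(my_history, their_history):
            if not my_history:
                return first
            p = probs[(my_history[-1], their_history[-1])]
            return 'C' if random.random() < p else 'D'
        return strategy

    # 'Extort-2' (Stewart and Plotkin's example of an extortionate
    # Press-Dyson strategy): it enforces (my score - 1) = 2 x (your score - 1),
    # so my surplus over the punishment payoff is always twice yours.
    extort_2 = memory_one(8/9, 1/2, 1/3, 0)

    # An 'equalizer' (my own derivation - caveat lector): it should pin the
    # co-player's average score at exactly 2 points per round, however she plays.
    equalizer = memory_one(2/3, 0, 2/3, 1/3)

    # A quick empirical check: a pure defector and Tit for Tat both end up
    # averaging about 2 points per round against the equalizer.
    for opponent in (always_defect, tit_for_tat):
        _, theirs = play_match(equalizer, opponent, rounds=100000)
        print(opponent.__name__, theirs / 100000)  # prints roughly 2.0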

All this is heady stuff, and it is hard to say precisely what the broader implications are. I have made a little interactive toy where you can pit yourself against both these types of strategy, if you would like to see how they work in practice. Click the tabs to switch between different example strategies.

One thing these new strategies do not obviously do is to dethrone Tit for Tat on its own turf. We cannot be certain till more work has been done, more tournaments have been run, but it is certainly possible that Tit for Tat (or generous Tit for Tat, or Win-Stay, Lose-Shift) will keep its crown. Stewart and Plotkin found a Press-Dyson strategy that did slightly better than generous Tit for Tat in a single simulated tournament, but there’s no really good reason to think that advantage will prove reliable. We just don’t know, yet. This generous strategy, or at any rate a similar one, is also available to play with in the interactive toy.
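
For the record, here are the probabilities of Stewart and Plotkin’s generous strategy as I recall them from their commentary – an assumption on my part, so check the paper before relying on them:

    # 'ZDGTFT-2', Stewart and Plotkin's generous zero-determinant strategy
    # (probabilities as I recall them - verify against the paper): it
    # enforces (my score - 3) = 2 x (your score - 3), conceding the larger
    # share of any shortfall from mutual cooperation to the co-player.
    zdgtft_2 = memory_one(1, 1/8, 1, 1/4)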

The goldfish and the elephant

If you have played with the computer opponent, and read how its strategies work, you may be puzzled or surprised to see that the computer ignores all but the immediately previous move when deciding what to do. How is it that such a short memory is sufficient? And how is it that we, who can remember the whole game, cannot use this additional knowledge to gain an advantage over the forgetful machine?


In the jungle, there lived an elephant. This elephant was the most memorious of all the elephants, and she could remember everything that had happened since the birth of the world in every tiny detail. Sometimes on a hot afternoon the elephant would cool herself by the watering hole, and sometimes she would pass the time with a goldfish who lived there in the water. The goldfish was always surprised to see the elephant, for he never could recall anything that had happened more than a few short moments before, and he had no idea that the elephant was there almost every day.

To pass the time, the elephant taught the goldfish to play a game. In the jungle they knew it by a different name, but it was the game we know as Prisoner’s Dilemma. After some weeks the elephant found that the goldfish knew how to play, though he could not remember having been taught or even having met the elephant before. Memory is like that.

And so they played, hour after hour. The elephant could remember everything, of course: every move they had ever played, the precise way the sunlight had glinted on the water and the leaves had rustled at that moment she unclenched her trunk to reveal the black nut or the white nut. The poor goldfish could remember only the move they had just played, and then only by concentrating as hard as he could.

Still, the elephant found increasingly she did not win as often as she would have liked, and the goldfish’s play seemed to be improving, though his memory plainly was not. One afternoon she realised with shock that she was losing, not every turn but more often than not, though she thought hard each turn about everything either of them had ever done and the goldfish did not even recall that he had narrowly escaped being eaten by a crocodile just five minutes before.

That evening when the sun had set, the elephant sat down to think about what had happened. How was it possible that all her memories should profit her so little against this forgetful fish? Slowly she began to understand it. This forgetful fish, she thought, he can remember nothing. Only the move we have just played, and sometimes not even that. And the history of the game – this long history that I remember so vividly – does not exist to him. His choice is informed only by what he can remember. And the thing that I am up against is only the tiny stone that he shoots so deftly from his mouth, and this thing – this choice of his – is unlinked from the past. All those memories of mine cannot help me to predict what the fish will do, for they cannot change what he will do. Only the turn just played is real to him, so that only can matter to me.

Sadly, the elephant lay down to sleep. The elephant has no advantage over the goldfish in this game, and likewise you have none over the goldfish-memoried machine. This observation, that a longer memory confers no advantage against an opponent with a short one, is one of the key points of Press and Dyson’s work and appears as Appendix A in their paper.

About the name

Oh, if you are wondering why it’s called the Prisoner’s Dilemma, that is because of a story – due to the mathematician Albert Tucker – that has been told to explain this game since shortly after its discovery. The story goes that two men, having engaged in a joint criminal enterprise, are apprehended by the law and interrogated separately. Each prisoner is given the opportunity to reduce his own sentence by testifying against the other – but that testimony will increase the prison sentence handed down to the other man, and if both testify then there will be evidence enough to put both behind bars for longer than if neither had testified. In this version the payoffs are prison sentences rather than cash prizes: it has the same paradoxical quality that each prisoner individually is always better off defecting, but collectively they would be better to invoke their right to remain silent.

On the topic of names, did you know that the phrase Tit for Tat is the origin of the rhyming slang ‘titfer’, meaning a hat?



Responses

  1. Nick says:

    “In the event most of the participants defected – 14 out of 20 – which, though it is the outcome most of us would expect, is not what Hofstadter had been hoping for.”

    In the event others read this sentence incorrectly and are curious about what happened, there needs to be a comma after the word “event”. To wit:

    “In the event, most of the participants defected – 14 out of 20 – which, though it is the outcome most of us would expect, is not what Hofstadter had been hoping for.”

  2. Stoo says:

    Have you seen this answer to the dilemma, from the same TV show? http://www.youtube.com/watch?v=S0qjK3TWZE8

    It’s masterful game play, even though his opponent doesn’t initially agree.

    • I love that!

      It’s interesting that this only works because the game is not quite the same as Prisoner’s Dilemma: in Split or Steal, if your opponent steals then you don’t lose anything by splitting; but in one-shot PD you always lose by cooperating.

  3. Axelrod’s payoff matrix happened to satisfy an inequality that biased the tournament in favour of Tit for Tat. In general, Win-Stay, Lose-Shift (WSLS) has a higher expected payoff.

  4. j2kun says:

    There are two more important differences between Split or Steal and the Prisoner’s Dilemma that allow for such optimal strategies as the one linked in the above comment. The first is that the Prisoner’s Dilemma is most often an iterated game, so the players can change their strategy based on the actions of their opponents. The second and more important difference is that in Split or Steal the players have the opportunity to talk to each other. In the classical (mathematical) Prisoner’s Dilemma, there is no dialogue.

  5. Tay Sharp says:

    The other participants in the tournament could shift the winner.

    Imagine a contest of tit-for-tat competitors and pure altruist (always cooperate) bots. (I know you can’t control the field you play against, but go with this hypo for purposes of exposition.)

    Suppose I introduce a “punish tit-for-tat, reward altruism” participant. I use the first few rounds to test whether the opponent is tit-for-tat or a pure altruist. When round 10 hits, out of say 100, I become a pure altruist towards altruists, or a pure sadist towards tit-for-tat players, depending on which strategy I discovered. The more this new bot – call it the JUDGE bot – dominates the field, the more the altruists will outperform TfT.

    Suppose the JUDGEs lose interest in promoting altruism exclusively. Say the bot is refined slightly to also detect other JUDGE bots (maybe they even self-identify with a special code of initial cooperate/defect moves). When it sees another JUDGE, it goes pure altruist. When it sees an altruist, it backs off to 90%, or maybe 50%, rewards. The TfT players it continues to punish remorselessly.

    If these sorts of bots were dominating the field and we then opened the field up to all competitors, a quick improvement would be to masquerade as a JUDGE but defect in the last few rounds.

    You might get really complex cliques that help each other, but sabotage other scores. Those cliques might say something else about human societies.

    I’m not sure if anyone’s studied the maths of these sorts of algorithmic bots; maybe the energy lost trying to figure out who is on your side is always self-defeating. It’d be interesting to see more research on this, though, to see whether my idle non-mathematical musings bear anything interesting.

    • John says:

      I think you are forgetting the “goldfish memory”; trying to use long-term strategies fails when pitted against short-term, previous result plays.

      • Troels says:

        But the purpose of the judge is not to win himself. The purpose of the Judge is to help the altruist win, and to punish the TfT’s. Now it is no longer 1vs1 but a team game.

  6. I don’t remember any references off the top of my head, but researchers have submitted “swarms” of strategies to various tournaments and contests. The basic idea is that these strategies go through a choreographed “recognition sequence”. If they do not recognise the opponent, they default to a sensible strategy such as Tit-for-Tat, Generous Tit-for-Tat or Win-Stay-Lose-Shift. If they do recognise the opponent, “drone” strategies sacrifice themselves by making a nonrandom sequence of suboptimal choices in order to favour a “queen” strategy that has been pre-designated to win.

    These techniques have proven themselves to be robust (in the sense that recognition works even under noisy conditions) but not worthwhile (in that even the pre-designated “queen” strategy does not gain a significant advantage over baseline strategies).

    This seems general enough to cover the scenario proposed by Tay Sharp (though admittedly anecdotally and not rigorously).

  7. peet says:

    so does “tit for tat” translate to “eye for an eye” as a strategy for punishment? does it lend it any merit?

    • Referring to “punishment” misses the whole point of Tit For Tat: two TFT-playing players will always cooperate and nobody will ever need to be punished.

      But when a defection does occur, there is indeed a single instance of automatic Eye For An Eye punishment. Depending on the pair of strategies, the matter may end there, or it may reverberate through the rest of the game.

      For example, if two TFT players are playing against each other and cooperating, all is fine and well and they will go on cooperating forever. But add an instance of noise in the form of a single defection and suddenly you have instigated a retaliatory defection, which in turn generates a counter-defection, and so on, degenerating into an endless cycle of reprisals.

      This is why Generous Tit For Tat wins against Tit For Tat in the long run: by having a less-than-certain retaliation, it can recover from the deterministic cycle of revenge.

      From a philosophical standpoint, “do unto others as they have done unto you” is probably a less dismal rendition of Tit For Tat than “an eye for an eye” is.


  8. I’m sorry to go off-topic, but will you write some more posts about the maze random-access problem? I’m quite interested in this question.



  9. Nick Shryane says:

    Enjoyed the blog. Do you know of any experiments that have looked at the strategies humans actually follow when playing iterated PD?
