The Prisoner’s Dilemma
2012-07-23
The Prisoner’s Dilemma is a game, but a game that seems to bear lessons for the conduct of human affairs more generally, and it has attracted a great deal of attention from men not noted for their frivolity. It was discovered in 1950 at the RAND corporation, a military think-tank established after World War II by the United States Air Force to conduct a “program of study and research on the broad subject of intercontinental warfare”.
So it is a serious game, but a simple one for all that. It requires two players, let’s say you and me. There is only one move. Each of us must make a choice, to “cooperate” or “defect”, without knowing what the other has chosen. Perhaps each of us takes, from a chess board, one black and one white pawn, and as we face each other I put my hands behind my back and proffer a closed fist containing the pawn I have chosen. You make your choice, too, in the same way. Together we open our hands, and reveal what we have chosen. The black pawn represents the black heart of the defector, the white the innocence of the cooperator.
Now, the reckoning. Should we each reveal a white pawn, we have cooperated and each of us wins £20: a fair and happy outcome. If we both are blackhearts with black pawns in our hands, we win nothing. But wickedness is not without its rewards in this game, for if I hold black and you white then I win £40 – and you, looking sadly at the white pawn in your hand, must pay £10 for your naivety.
A troubling game
The troubling aspect of this game, the reason it is infamous, is that it seems clear that each player’s best strategy is always to defect. Whichever pawn is in my hand, you will do better by defecting: if I have cooperated, then defecting will make you £20 richer than you would be by cooperating, and if I have defected then your defecting will spare you a loss of £10. Yet we collectively are punished for our cold rationality, for if we both defect then we are both worse off than if we had both cooperated.
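The dominance argument can be checked mechanically, if you like that sort of thing. Here is a minimal Python sketch using the pound values from the story above; the table is from the text, but the function and variable names are my own:

```python
# Payoff table for the one-shot game, in pounds.
# Key: (my move, your move) -> (my payoff, your payoff).
PAYOFF = {
    ("C", "C"): (20, 20),   # we both cooperate
    ("D", "D"): (0, 0),     # we both defect
    ("D", "C"): (40, -10),  # I defect, you cooperate
    ("C", "D"): (-10, 40),  # I cooperate, you defect
}

def my_payoff(me, you):
    """My winnings given the two moves played."""
    return PAYOFF[(me, you)][0]

# Whatever you do, defecting pays me more than cooperating...
for you in ("C", "D"):
    assert my_payoff("D", you) > my_payoff("C", you)

# ...and yet mutual defection leaves us both worse off than mutual cooperation.
assert my_payoff("D", "D") < my_payoff("C", "C")
```

The two assertions are the whole paradox in four lines: defection dominates move by move, yet mutual defection is collectively worse.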
Douglas Hofstadter seems to have been especially troubled by this: “I found that I simply could not accept the seemingly flawless logical conclusion that says that a rational player … will always defect”, he wrote, and his Scientific American column of June 1983 explores this discomfort. Hofstadter sent a letter to 20 friends, inviting them to make a choice – to cooperate or defect – which would be used in a game of Prisoner’s Dilemma against each of the 19 others. So each person was to play 19 games, but had to choose one move that would be their move in each. These games were played for real (fairly small) sums of money.
The letter included a nod to the outcome Hofstadter had hoped for: “You’re very bright,” he wrote. “So are all of you! About equally bright, I’d say, in fact.” In the event, most of the participants defected – 14 out of 20 – which, though it is the outcome most of us would expect, is not what Hofstadter had been hoping for. It is not easy to explain what he had been expecting, and I am not quite certain that it makes sense, but let me try: Hofstadter hoped that his friends would form what he called a ‘superrational’ cluster:
❝Superrational thinkers, by recursive definition, include in their calculations the fact that they are in a group of superrational thinkers.❞
In other words he hoped the players would realise that any answer arrived at by pure reason must be reached, too, in the same way, by all the other members of the group; and thence the players would realise that, since everyone must arrive at the same answer, there are only two possible outcomes: either all must cooperate, or all must defect; and that finally since of those two it is clearly preferable for everyone if everyone cooperates, the only rational move is to cooperate. Hofstadter later speculates that perhaps there are alien “civilisations to which cooperation appears axiomatic” that are destined to survive, whereas civilisations such as ours – “to which defection appears axiomatic” – shall not. (The Cold War threat of nuclear annihilation hangs gloomily over many of his writings of this period.)
We meet again
The Prisoner’s Dilemma challenges our moral sensibility too. For example, watch this clip from the British game show Golden Balls, in which two people play a Prisoner’s Dilemma-like game for high stakes:
It is hard not to feel that the winning player has done something morally questionable here, even though both are abiding by the rules of the game.
This is perhaps related to the fact that cooperation very often is the rational strategy in real-life situations. Whenever I go to the corner shop for a pint of milk, I have to choose whether to cooperate by paying for the milk, or defect by running away without paying. I could probably get away with stealing it: I think I could outrun the shopkeeper, who would in any case be reluctant to leave his shop unmanned for too long, and the police would be unlikely to devote much effort to the investigation of such a petty crime. But the inconvenience of this would outweigh the benefits. I would never be able to go back to the shop, so I would have to go further afield next time I need milk, and even passing by would risk the shopkeeper recognising me and calling the police, so I would have to walk my daughter to school by an inconveniently roundabout route. These practical considerations mean that, quite apart from any moral qualms I may have, stealing the milk would not be a net win for me. By contrast if I were passing through a distant Scottish village to which I was sure I would never return, then perhaps stealing a pint of milk from the village shop would be the rational course of action, since the inconvenience of being unable to go back to the same shop would be no inconvenience at all.
In short, cooperation may be the rational strategy if you encounter the same people time after time. The political scientist Robert Axelrod put this to the test in a famous experiment conducted in 1979. Axelrod set up a Prisoner’s Dilemma tournament, by computer, and invited a number of game theorists to submit computer programs encoding their strategy. The twist was that each digital player would be played against each other player 200 times, to simulate a situation where we encounter the same people repeatedly. These computer programs could remember what had happened on previous encounters with their opponent, and tailor their behaviour accordingly.
The aim is to score as many points as possible over the whole tournament. This is not the same as ‘winning’: indeed the strategy that wins most often is to always defect, which will beat any opponent who does not always defect, and equalise with one who does. But this is typically a poor sort of winning in which neither player scores very many points, and the programs that did best overall in Axelrod’s tournament turned out to be those that were most effective at eliciting mutual cooperation.
The overall winner was a very simple strategy – the simplest of all the entries in that first tournament – called Tit for Tat. The tit-for-tat strategy is simply to cooperate the first time you meet someone new, and thereafter to do whatever they did the previous time. This strategy will cooperate with a co-player who is willing to cooperate, and yet cannot be much taken advantage of by one that is not.
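If you would like to see the mechanics, here is a little Python sketch of Tit for Tat and a toy round-robin. This is only an illustration, not Axelrod’s tournament code, and I have used the conventional 3/5/1/0 point values from the literature rather than the pound prizes above:

```python
# Each strategy is a function: (my history, opponent's history) -> "C" or "D".
# Conventional iterated-dilemma points: (my payoff, their payoff) per round.
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def tit_for_tat(mine, theirs):
    return "C" if not theirs else theirs[-1]  # cooperate first, then copy

def always_defect(mine, theirs):
    return "D"

def always_cooperate(mine, theirs):
    return "C"

def play(strat_a, strat_b, rounds=200):
    """Play `rounds` iterations and return the two total scores."""
    hist_a, hist_b, score_a, score_b = [], [], 0, 0
    for _ in range(rounds):
        a = strat_a(hist_a, hist_b)
        b = strat_b(hist_b, hist_a)
        pa, pb = PAYOFF[(a, b)]
        score_a += pa
        score_b += pb
        hist_a.append(a)
        hist_b.append(b)
    return score_a, score_b
```

Against an unconditional defector, Tit for Tat loses only the first round and then matches defection with defection; against a cooperator the pair settle into mutual cooperation and rack up the points together.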
After the tournament was over, and Axelrod had published the results in great detail, another tournament was announced, for which entries were solicited from computer hobbyists as well as from professional mathematicians. There were 62 entries, again including Tit for Tat, which was again submitted by its inventor, the psychologist Anatol Rapoport; and also including many subtle strategies that provably would have won the first tournament had they been entered. The tournament was played, and Tit for Tat won again. This second victory showed the robustness of Tit for Tat: it remained unbeaten even when all the competitors knew it was the one to beat.
And there matters rested, more or less, until this year. A few things were learned in the interim: it was found, for example, that sometimes a slight variant may fare better than plain Tit for Tat, the so-called Generous Tit for Tat in which an opponent’s defection is forgiven 10% of the time. Then there was a suggestion in 1993 that “win-stay, lose-shift” fares slightly better than generous Tit for Tat most of the time. (I am grateful to James Junghanns for drawing my attention to this, in the comments below.)
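For concreteness, here is how those two variants might be sketched in Python, following the same function shape as before. The 10% forgiveness figure is the one mentioned above; the rest of the details are my own illustrative choices:

```python
import random

def generous_tit_for_tat(mine, theirs, forgive=0.10):
    """Tit for Tat, but forgive a defection 10% of the time."""
    if not theirs or theirs[-1] == "C":
        return "C"
    return "C" if random.random() < forgive else "D"

def win_stay_lose_shift(mine, theirs):
    """Repeat my last move after a good round, switch after a bad one."""
    if not mine:
        return "C"
    # A "win" is a round in which the opponent cooperated (so I scored
    # well whether I cooperated or defected); otherwise I "lost".
    won = theirs[-1] == "C"
    if won:
        return mine[-1]                      # stay
    return "D" if mine[-1] == "C" else "C"   # shift
```

Win-stay, lose-shift (sometimes called Pavlov) has the interesting property that after a mutual defection it tries cooperating again, whereas two plain Tit for Tat players who fall out of step can punish each other indefinitely.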
Still, it is reassuring to find that merely playing the game repeatedly with the same people is enough to favour strategies that resemble decent human behaviour, rather than the stark sociopathic logic that seems to reign when playing only once, and this is a comfortable place to rest.
But we can rest no longer! In a paper published earlier this year, Bill Press and Freeman Dyson outline a series of startling mathematical results that change forever our understanding of the game. And, if the game is a model of human affairs, does that change our understanding of the world more generally too?
Press and Dyson uncovered a class of strategies for the iterated Prisoner’s Dilemma – that is, the game where the Prisoner’s Dilemma is played repeatedly with the same other player – that have strange, almost unbelievable properties. One will force the other player’s score, on average, to be a certain number; and nothing she can do will change that score, however she chooses to play. Another type, which Press and Dyson dub “extortionate”, is able to take advantage of the other player’s self-interest. Against an extortionate opponent, the more you allow yourself to be taken advantage of the better you will score. So a purely self-interested player, who will modify his play only to increase his own score, can be taken for a ride by an “extortionist” who is able to use the other player’s self-interest against him.
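The strategies Press and Dyson study are “memory-one”: each is just four probabilities of cooperating, one for each possible outcome of the previous round. A sketch of the machinery in Python follows. The four numbers are, as best I can transcribe them, the χ = 3 extortion example from their paper (with the conventional 3/0/5/1 payoffs); treat them as illustrative and check the paper itself before relying on them:

```python
import random

# Probability of cooperating, conditioned on the previous round,
# keyed by (my last move, their last move).
# These are (I believe) the chi = 3 extortion example from Press & Dyson.
EXTORT_3 = {
    ("C", "C"): 11 / 13,
    ("C", "D"): 1 / 2,
    ("D", "C"): 7 / 26,
    ("D", "D"): 0.0,
}

def memory_one(probs):
    """Turn a table of cooperation probabilities into a strategy function."""
    def strategy(mine, theirs):
        if not mine:
            return "C"  # the opening move is not fixed by the long-run analysis
        p = probs[(mine[-1], theirs[-1])]
        return "C" if random.random() < p else "D"
    return strategy
```

Played against such a strategy over many rounds, your average score and the extortioner’s are linked linearly, with the extortioner claiming the larger share of any surplus above mutual defection; that, roughly, is the sense in which your self-interest is used against you.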
All this is heady stuff, and it is hard to say precisely what the broader implications are. I have made a little interactive toy where you can pit yourself against both these types of strategy, if you would like to see how they work in practice. Click the tabs to switch between different example strategies.
One thing these new strategies do not obviously do is to dethrone Tit for Tat on its own turf. We cannot be certain until more work has been done and more tournaments have been run, but it is certainly possible that Tit for Tat (or generous Tit for Tat, or Win-Stay, Lose-Shift) will keep its crown. Stewart and Plotkin found a Press-Dyson strategy that did slightly better than generous Tit for Tat in a single simulated tournament, but there is no really good reason to think that advantage will prove reliable. We just don’t know, yet. This generous strategy, or at any rate a similar one, is also available to play with in the interactive toy.
The goldfish and the elephant
If you have played with the computer opponent, and read how its strategies work, you may be puzzled or surprised to see that the computer ignores all but the immediately previous move when deciding what to do. How is it that such a short memory is sufficient? And how is it that we, who can remember the whole game, cannot use this additional knowledge to gain an advantage over the forgetful machine?

In the jungle, there lived an elephant. This elephant was the most memorious of all the elephants, and she could remember everything that had happened since the birth of the world in every tiny detail. Sometimes on a hot afternoon the elephant would cool herself by the watering hole, and sometimes she would pass the time with a goldfish who lived there in the water. The goldfish was always surprised to see the elephant, for he never could recall anything that had happened more than a few short moments before, and he had no idea that the elephant was there almost every day.
To pass the time, the elephant taught the goldfish to play a game. In the jungle they knew it by a different name, but it was the game we know as Prisoner’s Dilemma. After some weeks the elephant found that the goldfish knew how to play, though he could not remember having been taught or even having met the elephant before. Memory is like that.
And so they played, hour after hour. The elephant could remember everything, of course: every move they had ever played, the precise way the sunlight had glinted on the water and the leaves had rustled at that moment she unclenched her trunk to reveal the black nut or the white nut. The poor goldfish could remember only the move they had just played, and then only by concentrating as hard as he could.
Still, the elephant found increasingly she did not win as often as she would have liked, and the goldfish’s play seemed to be improving, though his memory plainly was not. One afternoon she realised with shock that she was losing, not every turn but more often than not, though she thought hard each turn about everything either of them had ever done and the goldfish did not even recall that he had narrowly escaped being eaten by a crocodile just five minutes before.
That evening when the sun had set, the elephant sat down to think about what had happened. How was it possible that all her memories should profit her so little against this forgetful fish? Slowly she began to understand it. This forgetful fish, she thought, he can remember nothing. Only the move we have just played, and sometimes not even that. And the history of the game – this long history that I remember so vividly – does not exist to him. His choice is informed only by what he can remember. And the thing that I am up against is only the tiny stone that he shoots so deftly from his mouth, and this thing – this choice of his – is unlinked from the past. All those memories of mine cannot help me to predict what the fish will do, for they cannot change what he will do. Only the turn just played is real to him, so that only can matter to me.
Sadly, the elephant lay down to sleep. The elephant has no advantage over the goldfish in this game, and likewise you have none over the goldfish-memoried machine. This observation, that a longer memory confers no advantage against an opponent with a short one, is one of the key points of Press and Dyson’s work and appears as Appendix A in their paper.
About the name
Oh, if you are wondering why it’s called the Prisoner’s Dilemma, that is because of a story that has been told to explain this game since shortly after its discovery. The story goes that two men, having engaged in a joint criminal enterprise, are apprehended by the law and interrogated separately. Each prisoner is given the opportunity to reduce his own sentence by testifying against the other – but that testimony will increase the prison sentence handed down to the other man, and if both testify then there will be evidence enough to put both behind bars for longer than if neither had. In this version the payoffs are prison sentences rather than cash prizes: it has the same paradoxical quality that each prisoner individually is always better off by defecting, but collectively they would be better to invoke their right to remain silent.
On the topic of names, did you know that the phrase Tit for Tat is the origin of the rhyming slang ‘titfer’, meaning a hat?
- Conversation on The Edge website with William Poundstone, Bill Press & Freeman Dyson, and Karl Sigmund & Martin Nowak.
- Technical paper with all the details, by Press and Dyson.
- Discussion paper by Stewart and Plotkin.
- Mike Shulman attacks “hype and misinterpretation” of the Press-Dyson result in a long blog post that includes a clear but detailed explanation of the Press-Dyson theorem.
- The aforementioned interactive toy.