Wednesday, March 28, 2007

Game theory: Prisoners’ Dilemma

Game theory is a branch of applied mathematics that deals with the analysis of games (i.e., situations involving parties with conflicting interests). It is a mathematical method of decision-making which involves searching for the best strategy contingent upon what another player will or will not do. Typically, a competitive situation is analyzed to determine the optimal course of action for an interested topic. It is generally taught in mathematics classes such as applied combinatorics, and in economics classes such as industrial organization. In addition to the mathematical elegance and complete "solution" which is possible for simple games, the principles of game theory also find applications to complicated games such as cards, checkers, and chess, as well as real-world problems as diverse as economics, property division, politics, and warfare.

Game theory has two distinct branches: combinatorial game theory and classical game theory.

Combinatorial game theory covers two-player games of perfect knowledge such as go, chess, or checkers. Notably, combinatorial games have no chance element, and players take turns.
In classical game theory, players move, bet, or strategize simultaneously. Both hidden information and chance elements are frequent features in this branch of game theory, which is also a branch of economics.

The Prisoners’ Dilemma is a non-zero sum problem founded in game theory initially discussed by Albert W. Tucker. Tucker's invention of the Prisoners' Dilemma example did not come out via a research paper, but in a classroom. In 1950, while addressing an audience of psychologists at Stanford University in his capacity of visiting professor, Tucker created the Prisoners' Dilemma to illustrate the difficulty of analyzing certain kinds of games.

Tucker’s actual Prisoners' Dilemma example is as follows:

Two burglars, Bob and Al, are captured near the scene of a burglary and are given the "third degree" separately by the police. Each has to choose whether or not to confess and implicate the other. If neither man confesses, then both will serve one year on a charge of carrying a concealed weapon. If each confesses and implicates the other, both will go to prison for 10 years. However, if one burglar confesses and implicates the other, and the other burglar does not confess, the one who has collaborated with the police will go free, while the other burglar will go to prison for 20 years on the maximum charge.

The strategies in this case are those of whether to confess or don't confess. The payoffs or penalties in this case, are the sentences served. We can express all this compactly in a payoff table which has become quite standard in game theory. Basically, the table would indicate that if they both confess, each gets 10 years, but if Al confesses and Bob does not, Bob gets 20 and Al goes free, and vice versa. However, if both do not confess, then both get 1 year each.

A dilemma arises in deciding the best course of action in the absence of knowledge of the other prisoner's decision, as in what strategies are "rational" if both men want to minimize the time they spend in jail. Each prisoner's best strategy would appear to be to turn the other in. Al might reason as follows: "Two things can happen: Bob can confess or Bob can keep quiet. Suppose Bob confesses. Then I get 20 years if I don't confess, 10 years if I do, so in that case it's best to confess. On the other hand, if Bob doesn't confess, and I don't either, I get a year; but in that case, if I confess I can go free. Either way, it's best if I confess. Therefore, I'll confess."

But Bob will presumably reason in the same manner. Therefore, given that both of them confess, both will go to prison for 10 years each. Yet, if they had acted "irrationally" and kept quiet, they each could have gotten off with one year each.

What has happened here is that the two prisoners have fallen into something known as "dominant strategy equilibrium".

A dominant strategy is defined as follows:

If we were to allow an individual player in a game to evaluate separately each of the strategy combinations he may face, and, for each combination, choose from his own strategies the one that gives the best payoff. If the same strategy is chosen for each of the different combinations of strategies the player might face, that strategy is called a "dominant strategy" for that player in that game.

Therefore, dominant strategy equilibrium occurs if, in a game, each player has a dominant strategy, and each player plays the dominant strategy, then that combination of the dominant strategies and the corresponding payoffs are said to constitute the dominant strategy equilibrium for that game.

In the Prisoners' Dilemma game, to confess is a dominant strategy, and when both prisoners confess, dominant strategy equilibrium occurs. In this case, the individually rational action results in both persons being made worse off in terms of their own self-interested purposes. This revelation has wide implications in modern social science. This is because there are many interactions in the modern world that seem very much like this, from arms races through road congestion and pollution to the depletion of fisheries and the overexploitation of some subsurface water resources. These are all quite different interactions in detail, but are interactions in which individually rational action leads to inferior results for each person, and the Prisoners' Dilemma suggests something of what is going on in each of them.

A number of critical issues can be raised with the Prisoners' Dilemma in view of its simplified and abstract conception of many real life interactions, and each of these issues has been the basis of a large scholarly literature:

1. The Prisoners' Dilemma is a two-person game, but many of the applications of the idea are really many-person interactions;

2. We have assumed that there is no communication between the two prisoners. If they could communicate and commit themselves to coordinated strategies, we would expect a quite different outcome;

3. In the Prisoners' Dilemma, the two prisoners interact only once. Repetition of the interactions might lead to quite different results;

4. Compelling as the reasoning that leads to the dominant strategy equilibrium may be, it is not the only way this problem might be reasoned out. Perhaps it is not really the most rational answer after all.

The Prisoners' Dilemma has wide applications to economics and business. Let’s take an example of two firms, say A and B, selling similar products. Each must decide on a pricing strategy. They best exploit their joint market power when both charge a high price; each makes a profit of $10 million per month. If one sets a competitive low price, it wins a lot of customers away from the rival. Suppose its profit rises to $12 million, and that of the rival falls to $7 million. If both set low prices, the profit of each is $9 million. In this case, the low-price strategy is akin to the prisoner's confession, and the high-price akin to keeping silent. Let’s term the former cheating, and the latter cooperation. In this case, cheating is each firm's dominant strategy, but the result when both cheat is worse for each than that of both cooperating.

On a superficial level the Prisoners' Dilemma appears to run counter to Adam Smith's idea of the invisible hand. When each person in the game pursues his private interest, he does not promote the collective interest of the group. But often a group's cooperation is not in the interests of society as a whole. Collusion to keep prices high, for example, is not in society's interest because the cost to consumers from collusion is generally more than the increased profit of the firms. Therefore companies that pursue their own self-interest by cheating on collusive agreements often help the rest of society. Similarly cooperation among prisoners under interrogation makes convictions more difficult for the police to obtain. One must understand the mechanism of cooperation before one can either promote or defeat it in the pursuit of larger policy interests.

Would the Prisoners be able to extricate themselves from the Dilemma and sustain cooperation when each has a powerful incentive to cheat? The most common path to cooperation arises from repetitions of the game. In the above example, one month's cheating gets the cheater an extra $2 million. But a switch from mutual cooperation to mutual cheating loses $1 million. If one month's cheating is followed by two months' retaliation, therefore, the result is a wash for the cheater. Any stronger punishment of a cheater would be a clear deterrent.

This idea needs some comment and elaboration:

1. The cheater's reward comes at once, while the loss from punishment lies in the future. If players heavily discount future payoffs, then the loss may be insufficient to deter cheating. Thus, cooperation is harder to sustain among very impatient players.

2. Punishment won't work unless cheating can be detected and punished. Therefore, companies cooperate more when their actions are more easily detected (setting prices, for example) and less when actions are less easily detected (deciding on non-price attributes of goods, such as repair warranties). Punishment is usually easier to arrange in smaller and closed groups. Thus, industries with few firms and less threat of new entry are more likely to be collusive.

3. Punishment can be made automatic by following strategies like "tit for tat," which was popularized by University of Michigan political scientist Robert Axelrod. In this case, you cheat if and only if your rival cheated in the previous round. But if rivals' innocent actions can be misinterpreted as cheating, then tit for tat runs the risk of setting off successive rounds of unwarranted retaliation.

4. A fixed, finite number of repetitions are logically inadequate to yield cooperation. Both or all players know that cheating is the dominant strategy in the last play. Given this, the same goes for the second-last play, then the third-last, and so on. But in practice we see some cooperation in the early rounds of a fixed set of repetitions. The reason may be either that players don't know the number of rounds for sure, or that they can exploit the possibility of "irrational niceness" to their mutual advantage.

5. Cooperation can also arise if the group has a large leader, who personally stands to lose a lot from outright competition and therefore exercises restraint, even though he knows that other small players will cheat. For example, Saudi Arabia's role of "swing producer" in the OPEC cartel is an instance of this.