The Institutional Approach for Modeling the Evolution of Human Societies

Artificial life is concerned with understanding the dynamics of human societies. A defining feature of any society is its institutions. However, defining exactly what an institution is has proven difficult, with authors often talking past each other. This article presents a dynamic model of institutions, which views them as political game forms that generate the rules of a group's economic interactions. Unlike most prior work, the framework presented here allows for the construction of explicit models of the evolution of institutional rules. It takes account of the fact that group members are likely to try to create rules that benefit themselves. Following from this, it allows us to determine the conditions under which self-interested individuals will create institutional rules that support cooperation, for example rules that prevent a tragedy of the commons. The article finishes with an example of how a model of the evolution of institutional rewards and punishments for promoting cooperation can be created. It is intended that this framework will allow artificial life researchers to examine how human groups can themselves create conditions for cooperation. This will help provide a better understanding of historical human social evolution, and facilitate the resolution of pressing social dilemmas.


Introduction
Artificial life is concerned with the simulation and synthesis of living systems. One key type of living system that artificial life seeks to understand through simulation and synthesis is human social organization. The goals of this are many and varied, from wanting to better understand the ecological and social pressures that historically transformed human groups from egalitarian hunter-gatherers to hierarchical chiefdoms and states, to being able to devise incentive schemes to prevent climate change, to being able to engineer artificial systems that autonomously adapt their social organization to changing conditions. All of these efforts lie at the interface with a number of other disciplines that are concerned with understanding human social organization, including anthropology, archaeology, artificial intelligence, economics, evolutionary biology, primatology, political science, and psychology.
This article reviews the different approaches that have been used to model the cultural evolution of human societies, before going on to argue for the merits of an institutional approach. Following Hurwicz [34], institutions are defined here as political game forms that generate the rules of a group's economic interactions. This is in contrast to other work that has tended to define institutions either as equilibrium economic behavior within a society (e.g., [61]), or directly as the rules of the economic interactions themselves (e.g., [44]). The problem with these approaches is that they do not allow us to model how the rules of economic interactions change within a group. In particular, they overlook the fact that rules will typically result from processes of bargaining and negotiation between self-interested group members who may have different bargaining strengths [65,66]. But by viewing institutions as the political game forms that generate these rules, we can develop dynamic models of how institutions and hence the economic interactions of societies change over time, allowing us to better address the goals of artificial life researchers.

Two Big Questions about Human Societies
When we look at human societies, two big features stand out as being in particular need of explanation. The first is the high level of cooperation and coordination between unrelated individuals. Compared to other primates, humans are unique in depending upon exchange with other individuals for nearly all of their vital resources. For example, very few human individuals produce by themselves all of the food, shelter, clothing, and so on that they need to survive. Rather, individuals specialize in one occupation, and obtain their other vital resources through exchange with others. In economics, this high degree of interdependence is known as catallaxy [45,55]. By contrast, other primates are much less interdependent, and produce nearly all of their vital resources themselves, with only limited exchange of food even between parents and offspring in most species [35].
Strikingly, the degree of interdependence among humans has increased over time from the first hunter-gatherers through to modern-day states. For hundreds of thousands of years, humans lived as hunter-gatherers, obtaining resources by hunting large animals and gathering plant materials [40]. Ancient hunter-gatherer groups practiced extensive food sharing between camp members, and there was a marked division of labor between males who hunted large animals, providing protein, and females who gathered plants, providing carbohydrates [41]. With the Neolithic origin of agriculture that began circa 10,000 years ago, division of labor further increased, with some individuals specializing entirely in tasks unrelated to food production, such as producing crafts [46]. Where we see such high levels of specialization elsewhere in the biological world, it is only in cases where there is a very high genetic relatedness between group members, as exemplified by eusocial insect colonies [42]. In such cases, the division of labor is coordinated by means of a common genetic program carried by each individual. But in human societies, division of labor and exchange occur between unrelated individuals that may never meet again, in what Seabright [64] calls "a company of strangers." Crucially, exchange is always sequential: One individual has to part with their goods first [25]. This creates all kinds of opportunities for one party to cheat on an exchange. This ever-present threat of cheating is what Greif [25,26] calls the "fundamental problem of exchange." Further, there are asymmetries in the information held by the parties to an exchange. For example, the producer of a good knows far more about its quality than the receiver [44]. (Think of a used car salesperson, for example.) These complications are glossed over in neoclassical economics, which assumes that exchange is simultaneous and with perfect information and hence assumes away the fundamental problem of exchange. In reality, however, individuals must have had to find a way to overcome these problems repeatedly throughout the evolution of human societies.
The fact that interactions in modern societies are between unrelated individuals who may never meet again is problematic for traditional evolutionary explanations for cooperation based upon kinship and dyadic reciprocity. Some researchers have taken this to be evidence that, in contrast to other species, selection in humans must primarily operate between whole groups or societies, with more cooperative societies outcompeting less cooperative societies [60,70]. However, this kind of group selection explanation requires that competition between individuals within a group be suppressed, for example through biased social learning, in which individuals are assumed to simply copy common behaviors in their group without regard to the economic consequences of doing the behavior [32,62,9]. By contrast, this article will show how modeling the creation of institutional rules through political games allows the interactions that make up modern societies to be explained in terms of the self-interested-actor model that underlies economics and evolutionary biology.
The second key feature of human societies is their transition between egalitarian and hierarchical modes of social organization. Both anthropological [5] and archaeological evidence [56] implies that the first human social groups were egalitarian hunter-gatherers. Anthropological studies of modern hunter-gatherer groups show that decisions are invariably reached by a group consensus being formed, with each individual being allowed to voice its opinion in a group-wide discussion [5]. While such groups do have leaders, the role of leaders is not to coerce others or monopolize the discussion, but rather to facilitate turn-taking and help the group reach a consensus. Archaeological evidence of burial sites similarly reveals no status differentiation when individuals were buried [56].
By contrast, the transition to agriculture was accompanied by a shift to hierarchical social organization, with a small number of individuals exhibiting high status. Evidence from burial sites shows that leaders started to be buried with valuable grave goods such as obsidian, and were not buried alongside other group members as had occurred previously [56]. Hierarchy was manifested both in resource inequality, and in inequality in decision making, with leaders at the top of the hierarchy coercing the rest of the group to follow their decisions. The archaeological evidence points to the first hierarchical societies being chiefdoms, with a single level of hierarchy: a chief presiding over commoners [18]. The origin of states around 4000 years ago is defined in terms of a shift to multiple levels of hierarchy, with rulers creating specialized administrative positions between themselves and the commoners [67]. This represents a new form of division of labor and specialization, where some individuals specialize in administering the group.
What we see in human evolution, then, is a gradual increase both in hierarchical organization and in the degree of division of labor and specialization. These co-occur with an increase in group size. Hunter-gatherer bands would have numbered no more than the hundreds. Cemetery evidence shows that the origin of agriculture brought about a massive increase in fertility [4], with evidence suggesting that the carrying capacity of agriculturalists with irrigation may have been up to 250 times larger than that of hunter-gatherers [30]. This is supported by evidence that the first cities arose during this period. Finally, in states economic interactions occur between millions of individuals [64]. What artificial life needs is a dynamic model of how cooperation, hierarchy, and group size coevolve. In the next section, I introduce the critical role that institutions play in this.

Institutions
What do economic interactions within groups look like? In modern groups, individuals take part in a range of interactions, from bilateral exchange through to the production and maintenance of goods upon which the whole group depends, such as clean air. These interactions can be modeled using game theory [23], an approach that is widely used across the sciences, from economics and evolutionary biology through to computer science and artificial intelligence. In game theory a social interaction (game) consists of two components [34]: the rules of the game (more formally, the game form) and the preferences of the players over the different possible material outcomes. Our focus here is on the rules of the game. The rules of a game consist of the possible strategies that an individual can choose between (e.g., "cooperate" or "defect") and the mapping between the strategies chosen by each player and the material payoff that each receives (e.g., amount of money, food, or shelter). Some of the rules of a game will follow directly from properties of the physical world and the current state of technology that the players have, and so cannot be changed. But crucially, there are aspects of the rules that it is possible for human players themselves to change [44,49,34,26,36]. An institution in game theory is defined as a family of rules (game forms) that individuals can potentially choose between, given the current physical state of the environment and the state of their technology [34]. The particular rules chosen are known as the institutional rules. These rules change what the optimal economic behavior is for individuals that are trying to maximize their own material payoff.
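To make the distinction between a game form and an institutional rule concrete, the following minimal Python sketch represents a two-player game form as a mapping from strategy profiles to material payoffs. The payoff numbers, the `with_sanction` helper, and the size of the fine are illustrative assumptions and are not taken from Hurwicz [34]; the sketch only shows how choosing a different member of a family of game forms can change the payoff-maximizing behavior.

```python
from typing import Dict, Tuple

Strategy = str
Profile = Tuple[Strategy, Strategy]
GameForm = Dict[Profile, Tuple[float, ...]]

# Hypothetical prisoner's dilemma payoffs (row player, column player).
BASE: GameForm = {
    ("cooperate", "cooperate"): (3.0, 3.0),
    ("cooperate", "defect"): (0.0, 5.0),
    ("defect", "cooperate"): (5.0, 0.0),
    ("defect", "defect"): (1.0, 1.0),
}


def with_sanction(form: GameForm, fine: float) -> GameForm:
    """Return a new game form in which every defector pays `fine`."""
    return {
        profile: tuple(
            payoff - fine if strategy == "defect" else payoff
            for strategy, payoff in zip(profile, payoffs)
        )
        for profile, payoffs in form.items()
    }


def best_response(form: GameForm, other: Strategy) -> Strategy:
    """Row player's payoff-maximizing strategy against a fixed opponent."""
    return max(("cooperate", "defect"), key=lambda s: form[(s, other)][0])


# An "institution" in this sketch: a family of game forms indexed by the fine.
INSTITUTION = {fine: with_sanction(BASE, fine) for fine in (0.0, 1.0, 3.0)}

for fine, form in INSTITUTION.items():
    print(fine, best_response(form, "cooperate"), best_response(form, "defect"))
```

Under these assumed payoffs, defection is the best response when the fine is 0 or 1, while a fine of 3 makes cooperation the best response to either opponent strategy.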
One key type of institutional rule has the effect of promoting cooperation in economic activities. Such rules do this by providing coordinated systems of rewards and punishments, and by coordinating the sharing of information about the actions of other individuals. In this way, they make cooperation rather than defection advantageous for self-interested individuals [26]. For example, Ostrom [49] describes the institutional rules that regulate the use of irrigation systems in a number of small-scale existing societies. The rules include prescriptions about when a farmer may take water, and how much they may take. They also include arrangements such as groups setting up systems in which irrigators take turns to monitor other users to ensure that they are not violating the rules, or hire third-party agents to act as monitors and pay for them using their communal resources. By creating these rules, groups move the game away from what would otherwise be a tragedy of the commons [29] in water usage, in which self-interested individuals would simply take as much water as they needed, leaving insufficient amounts for other farmers.
Models in evolutionary biology, and indeed artificial life, have rarely considered the possibility that the individuals playing a game are able to jointly change some of the rules in this way. A prime example is given by models of sanctioning (e.g., [11,8,7,31]). These models consider one possible game form in which a strategy is for one individual to unilaterally punish another, at a cost to itself. But these models exclude other possible game forms that individuals could move to, given their current environment and technology, such as allocating a proportion of their shared resources to pay for some individuals to act as monitors, which removes the unilateral costs of punishment [49,2,27,52]. In other words, they do not allow for the possibility of individuals collectively changing the rules and hence the situation they find themselves in. Yet empirical evidence demonstrates that humans do exactly this across all scales of society [36,49,59].
Institutions and the selection of institutional rules are not an invention of modern society; they exist even in hunter-gatherer groups. For example, extant hunter-gatherer groups have rules specifying who may take part in hunting an animal, who gets to keep which part of the kill, how the food will be shared back at the camp, and so on [33]. These rules greatly increase the efficiency of exchange, because they prevent individuals from repeatedly having to engage in a costly negotiation process about how to share each and every kill. If individuals always had to negotiate, then the costs of negotiating might more than offset the benefits of sharing [38]. Furthermore, even among hunter-gatherers there is evidence that these rules are produced by political processes of bargaining and negotiation between group members. For example, when the Aché hunter-gatherer society moved from foraging to horticulture, they debated the benefits of public versus private ownership of fields, and finally voted to transition from public to private ownership, thereby changing the rules of their economic game [36].
The origin of agriculture during the Neolithic would similarly have necessitated a change of rules of property rights from public to private ownership, in order to prevent one individual from simply having its crops taken by another [6]. Agriculture would also have required rules to regulate the construction and usage of new collective goods such as irrigation systems [13]; rules of this kind are seen in extant small-scale farming communities [49]. As a final example, the explosion of long-distance trade in medieval Europe required new rules to allow a trader to ascertain the reputation of new trading partners, as in the law merchant system [43,26]. By creating these rules that spread reputation over long distances, the traders moved their situation away from a single-shot prisoner's dilemma game in which self-interested individuals would defect, and into a situation in which cooperation was an equilibrium. Historical evidence implies that these rules were self-created by coalitions of traders, rather than being imposed externally by a coercive state [26]. In modern economics, the institutional rules of a society have been argued to be the main determinant of whether whole nations succeed or fail [44,1].
The key point is that institutional rules can be actively shaped by group members [44,65,66]. Specifically, we should expect each group member to try to create institutional rules that will benefit itself and its kin. In extant hunter-gatherer groups, institutional rules are routinely discussed by all group members around the campfire [5]. By contrast, with the rise of agriculture leaders started to dominate the creation of institutional rules, creating rules that benefited themselves (e.g., by reinforcing inequality) at the expense of the rest of the group.
The story of human social evolution, then, is a story about how institutions and institutional rules have changed over time [55]. How have institutional rules been created that allow for successful trade between individuals who may never meet again? And why did the processes that create a group's institutional rules change from egalitarian in hunter-gatherers, to extremely hierarchical in the first states?

A Framework for Modeling the Creation of Institutional Rules
Hurwicz [34] provides a general model for the dynamics of institutional rules within a group. Hurwicz defines an institution as a political game form, which sets the form (rules) for a subsequent economic game. In the political game form, the individual strategies consist of messages, and the outcomes consist of the rules of the economic game [59]. The material payoffs that individuals earn are then determined by playing the economic game according to these generated rules. For example, the political game may consist of individuals negotiating over how much each group member should contribute to the public good, and what the sanctions should be if an individual contributes less than this amount. Material payoffs are then assigned by playing the public goods game with these rules (Figure 1).
In the presence of an institution, then, individuals engage in two stages of social interactions, where the first (political) sets the rules for the second (economic). Different sets of institutional rules generated in the political game will change the way that self-interested individuals will behave in the economic game. In other words, the results of the political game will determine whether cooperation is favored or not.
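The two-stage structure can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the model of Hurwicz [34]: the `Rules` record, the averaging of messages in `political_game`, the linear public goods payoff in `economic_game`, and all numerical values (endowment, multiplier, proposals) are hypothetical choices made only to show how the output of the first stage parameterizes the second.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Tuple


@dataclass(frozen=True)
class Rules:
    required: float  # contribution each member is expected to make
    sanction: float  # fine for contributing less than `required`


def political_game(messages: List[Tuple[float, float]]) -> Rules:
    """Assumed political game form: average the proposed (required, sanction) pairs."""
    return Rules(
        required=mean([m[0] for m in messages]),
        sanction=mean([m[1] for m in messages]),
    )


def economic_game(
    contributions: List[float],
    rules: Rules,
    endowment: float = 10.0,
    multiplier: float = 2.0,
) -> List[float]:
    """Linear public goods game played under the rules set by the political game."""
    share = multiplier * sum(contributions) / len(contributions)
    return [
        endowment - c + share - (rules.sanction if c < rules.required else 0.0)
        for c in contributions
    ]


# Stage 1: four members each propose a contribution level and a sanction.
rules = political_game([(5.0, 4.0)] * 4)

# Stage 2: material payoffs when the last member cooperates or defects.
print(economic_game([5.0, 5.0, 5.0, 5.0], rules))  # everyone contributes
print(economic_game([5.0, 5.0, 5.0, 0.0], rules))  # last member defects
```

With these assumed numbers, the defector's payoff falls below what it would earn by contributing, whereas under `Rules(5.0, 0.0)` defection would pay more. The outcome of the political stage therefore determines whether cooperation is favored in the economic stage.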
What might the rules of the political game itself look like? In modeling terms, the political game could be represented by an aggregation rule, as is common in social choice theory [17]. An aggregation rule is a function that transforms a collection of individual preferences into a group decision. Since hunter-gatherer groups are typically of an egalitarian nature, where the preferences of all group members are taken account of [5], the aggregation rule for a hunter-gatherer group might take some average of the preferences of all group members. By contrast, with the origin of agriculture, and subsequently the first states, political game forms became much less egalitarian [56,19]. Through unequal access to resources, leaders became able to change the rules of the political game so that it was no longer egalitarian, but instead favored themselves. They could then use the political game to create economic rules that benefited themselves at the expense of others. An example of this is the institutional rules that determine how the surpluses resulting from agriculture are distributed within groups. Among hunter-gatherers, institutional rules meant that food was shared relatively equally within groups [5]. With the transition to agriculture, however, despotic leaders created rules of distribution in which most resources went to themselves and their kin [53]. This altered political game could be modeled by using an aggregation rule that gives weight to the amount of resource that a group member has.
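The two kinds of aggregation rule described above can be sketched as follows. The function names, the interpretation of a preference as the share of surplus that a member wants the leader to receive, and the example numbers are all assumptions made for illustration; the text specifies only that one rule averages preferences and the other weights them by resources.

```python
from typing import Sequence


def egalitarian_rule(preferences: Sequence[float]) -> float:
    """Group decision as the unweighted mean of individual preferences."""
    return sum(preferences) / len(preferences)


def resource_weighted_rule(
    preferences: Sequence[float], resources: Sequence[float]
) -> float:
    """Group decision as the mean of preferences weighted by each member's resources."""
    total = sum(resources)
    return sum(p * r for p, r in zip(preferences, resources)) / total


# Hypothetical example: each preference is the share of the surplus that the
# member wants the leader (index 0) to receive.
preferences = [1.0, 0.25, 0.25, 0.25]
resources = [7.0, 1.0, 1.0, 1.0]  # the leader holds most of the resources

print(egalitarian_rule(preferences))
print(resource_weighted_rule(preferences, resources))
```

When all members hold equal resources, the weighted rule reduces to the egalitarian one. As resources concentrate in the leader's hands, the group decision moves toward the leader's preferred outcome.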
An important future area of research is to develop more sophisticated models of the political game that go beyond simple aggregation rules. In reality, political games represent complex processes of bargaining and negotiation in which forward-looking individuals will try to realize their interests by persuading others, forming alliances, or the like. It is this complicated process that gives rise to some of the costs of having institutions [49] (more technically, these costs are examples of what are known as transaction costs in economics, where the transaction here is political). Thus, institutional theorists may draw upon computational models of both argumentation (e.g., [71]) and alliance formation (e.g., [24]). In particular, it is worth considering how belief-desire-intention models of agent behavior could be used to formalize the political game, by taking explicit account of individuals that have desires, and formulate behavior based on their current beliefs about the world and the beliefs of other agents.
As we have seen in the case of the Neolithic transition from hunter-gatherers to agriculturalists, the rules of political games themselves change over time. How and why is this? Answering this requires frame shifting up a level. In the general model of Hurwicz, the rules of the political game are set by a preceding game, which can be thought of as a constitutional game [50]. The constitutional game might model, for example, a transition between egalitarian and hierarchical interactions within groups. Of course, the rules for the constitutional game themselves have to come from somewhere, and they may themselves be set by another preceding game. However, there will not be an infinite regress of games, because eventually the rules will be given by unchangeable aspects of the environment, such as the total amount of resources available to individuals, and the laws of physics [34,50].
A criticism of the Hurwicz model might be that in reality institutions change very slowly, and that institutional evolution is highly path-dependent. The model presented here can take account of this, however. In particular, the political game does not have to be played on the same time scale as the economic game. For example, the economic game may be played many times over the course of a generation, while the political game may only be played once every several generations. Further, the political game takes account of path dependence because it is constrained by rules set by the constitutional game, which will typically be played even less frequently. In this way the model combines intentional change, where self-interested actors actively try to create rules to benefit themselves, with historical contingencies. The balance between the effect of historical contingencies and the effect of intentional action is an empirical question that can only be determined by examining the particular institutions in question. Finally, as will be demonstrated below, individuals do not have to be unreasonably forward-looking to form institutional rules under the Hurwicz model. Rather, processes of trial-and-error and payoff-biased social learning can lead to the spread of efficient institutional rules [52].
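The separation of time scales can be expressed as a simple schedule. In the sketch below, the function names and the periods (a political game every 100 economic rounds and a constitutional game every 1000) are arbitrary assumptions chosen for illustration; the only property taken from the text is that higher-level games are played less often and set the rules for the games beneath them.

```python
from typing import Dict, List


def games_due(
    step: int, political_every: int = 100, constitutional_every: int = 1000
) -> List[str]:
    """Return the games played at economic round `step`, highest level first."""
    due = []
    if step % constitutional_every == 0:
        due.append("constitutional")  # sets the rules of the political game
    if step % political_every == 0:
        due.append("political")  # sets the rules of the economic game
    due.append("economic")  # played every round under the current rules
    return due


def count_games(rounds: int) -> Dict[str, int]:
    """Count how often each level of game is played over `rounds` economic rounds."""
    counts = {"constitutional": 0, "political": 0, "economic": 0}
    for step in range(rounds):
        for game in games_due(step):
            counts[game] += 1
    return counts


print(count_games(2000))
```

In a fuller model, each entry in the schedule would call the corresponding game, with the rules produced at one level held fixed until that level is next played.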

Comparison with Other Approaches to Modeling Institutions
One approach in the literature has been to view institutions directly as the rules of the economic interactions themselves (e.g., [44,49]). The problem with this approach is that it struggles to explain institutional change. Viewing institutions as rules recognizes that they can be produced by intentional action; in other words, that institutions are the means by which humans shape economic interactions [44]. However, we also need a model for the processes that generate these rules. Following Hurwicz [34], it is argued here that the essence of an institution is a political game form that generates the rules.
The other main approach in the literature is to view institutions as equilibrium patterns of social behavior within groups (e.g., [68,61,60]), for example, driving on the right-hand side of the road, or having a taboo against eating certain foods (see, e.g., [61]). This view of institutions as equilibria is commonly used in models of cultural group selection [62,60]. The idea here is that different social groups happen to reach different stable equilibria (for example, as modeled in [10]), that is, settle on different institutions. The equilibrium that a group reaches may be due to the initial frequency of a social behavior when the group is founded, for example. Groups at an equilibrium where individuals happen to cooperate might then outcompete other groups at equilibria where their members cooperate less, leading to the spread of cooperation-promoting institutions. The idea of institutions as equilibria is compatible with the model presented here to the extent that different institutional rules (that is, different outcomes of the political game) will lead to different economic game forms with different equilibria.
However, the Hurwicz model makes very different predictions about the processes by which groups move between equilibria in the economic game. In the institutions-as-equilibria cultural group selection model, institutional change is a result of random-drift-like processes inside groups followed by competition between groups. This is inherently a punctuated process, because variation is only produced and selected at the group level. By contrast, the model presented here allows institutional rules to change as a result of intentional action inside groups. Allowing for intentional action in addition to random drift fits well with the cognitive skills of humans, including language and shared intentionality [69]. It allows for the fact that self-interested individuals should be expected to try and craft institutional rules that benefit themselves in economic interactions. As a result, the model predicts gradual and step-by-step change as individuals constantly strive to improve their lot by either exploiting the existing institutional rules, or trying to change the rules to benefit themselves [44]. By contrast, for cultural group selection, only catastrophic events that affected the whole group, such as warfare or a major internal crisis, could lead to institutional change.
In reality, institutional change is likely to reflect some elements of both processes. Institutional rules such as the law merchant [43,26], which regulated anonymous trade in medieval Europe in the absence of coercion, clearly reflect elements of intentional design. Other institutional rules, such as the side of the road that we drive on, are more the result of stochastic variation. Further, the political game form and the economic game form may be constrained by past chance events. However, in the case of both dyadic exchange and public goods production, individuals have been demonstrated to not blindly cooperate, but to make a calculated choice based on the context [57,28,39,51]. In other words, they respond to the institutional environment in a calculating way. This in turn implies that they should be expected to actively shape the institutional environment as far as possible to meet their own preferences.

What Can the Institutional Approach Offer to Our Understanding of Societal Challenges?
The problem of cooperation in modern societies manifests itself in two forms. The first is in exchange of resources between agents, that is, trade. Trade may be between individuals at a village market, between firms within a nation, or between nations. The second form of cooperation is in the provision and usage of collective goods, ranging from the management of a local inshore fishery, through to a global reduction in carbon emissions to prevent climate change.
In all of these cases, what determines whether or not a society achieves cooperation is whether or not its institutional rules provide the right incentives to the agents in that society. Do the institutional rules move the economic game form away from a single-shot prisoner's dilemma? The agents could be, for example, single individuals, firms, or governments.
As Ostrom [49] notes, policy prescriptions by economists and other social scientists have traditionally involved externally imposing a solution to a cooperation problem on a society. For trading, this might involve suggesting that a society copy the market rules of a more successful society. For collective goods, suggested policies might include either dividing the goods into private shares, or assigning a state body to monitor and enforce rewards and punishments [49]. But as Ostrom stresses, these imposed mechanisms of institutional change have repeatedly failed. Essentially, this is because what works well in one local environment need not necessarily work well in another. This is both because local environments will tend to differ in ways that affect the economic game form, and because different societies have different local norms and customs. Transplanting institutional rules into a society in which they are not compatible with the norms and beliefs held by the agents within that society is unlikely to work. Furthermore, norms and beliefs typically change very slowly, which is why economics tends to explain changes in behavior in terms of changes in relative prices rather than changes in individual preferences [44].
This suggests that to make successful policy prescriptions we need a bottom-up understanding of how institutional rules change within societies. Traditional models in economics have focused on equilibrium conditions. But such models, along with cultural group selection models, are ill suited to capture the dynamics of institutional evolution, because institutions typically change through many small and gradual changes. And while the Hurwicz framework and similar approaches (e.g., [59]) have been proposed in economics, they have not been instantiated in a fully dynamic form that fits particular empirical scenarios. This is where artificial life, and the related field of agent-based economics, comes in. At its very core, artificial life is concerned with producing the bottom-up generation of behavior. This is exactly what is needed to understand how agent behavior and institutional rules coevolve. To date, a convincing theory of institutional change has been lacking. A convincing model of institutional change needs both to allow institutional rules to change as a result of individual agent behavior, and to allow for the fact that individual agents are not perfectly rational and will have incomplete information about their environment. These are both traditional strengths of artificial life.
Artificial life researchers are also used to dealing with complex systems in which small perturbations can sometimes cause large and unexpected shocks. This is quite likely to occur with institutional evolution, where small changes in the outcome of the political game may lead to large changes in the economic game form. Again, the toolkit of bottom-up modeling is well equipped to highlight this.
By using artificial life simulation techniques, we can begin to get a handle on the effect that changing the institutional rules is likely to have on economic games, and on how these changes in the economic game form feed back into changed individual preferences in the political game. We can also start to appreciate the effect of different political and constitutional game forms on this process. This has previously all lain outside the scope of static equilibrium models, which has limited the ability of analysts to foresee the implications of policy changes.
The next section provides a basic example of how the general Hurwicz model can be instantiated as a dynamic model of the evolution of institutional rules.

The Evolution of Institutional Rules for Rewarding and Sanctioning in a Public Goods Game with an Egalitarian Political Game Form
One well-studied type of social interaction in both the social sciences and evolutionary biology is collective action [48]. In these situations each group member must choose whether or not to cooperate by contributing some of their individual resources to a group project. This provides a benefit that is shared with the whole group, including the actor. Because the benefits are shared with the whole group, but only cooperators pay the cost, we would expect defection to be favored by evolution in the absence of other factors such as rewards, punishment, reputation, or kin structure. Collective action problems have occurred throughout human evolution. For example, hunter-gatherers engage in cooperative hunting, in which several individuals must work together in order to prevent prey from escaping. Hunter-gatherers also engage in various collective construction projects, such as burning habitat, and building dams to trap and poison fish [37]. The advent of agriculture brought about further collective action problems, including the usage of common-pool resources such as irrigation water [12] and grazing land.
It is well known that collective action problems can be solved if individuals that cooperate are rewarded, or if individuals that defect are punished [48,47]. Specifically, cooperation will be individually advantageous if the reward that cooperators receive is greater than the cost of cooperating, or if the punishment that defectors receive is greater than the cost of cooperating [47,63]. The question is then, where do these rewards or punishments come from [49]? Cultural evolution models have typically assumed that each individual unilaterally chooses whether to reward or punish another, at some individual cost (e.g., [11,8,7,31,58]). This kind of unilateral and uncoordinated punishment has been observed in some behavioral economics experiments involving individuals playing public goods games in a university laboratory setting (e.g., [20], but see also [3] for a different interpretation of these experiments). However, once we move outside the behavioral economics laboratory and into field settings, evidence for individually costly and uncoordinated punishment is rare (see [27] for a review). Rewards and punishments are coordinated not only in modern states (e.g., through a tax-funded police force), but also in small-scale hunter-gatherer societies [5] and in agricultural societies [49]. That is, rewards and punishments are coordinated by self-created institutional rules.
The model below considers a situation in which individuals not only take part in a collective action to generate resources for their group, but also take part in a political game that determines how these resources are to be used. The model assumes an egalitarian political game form in which the preference of each group member is weighted equally when creating the institutional rules. This type of political game form would have been relevant during the hunter-gatherer period and the transition to agriculture. It also applies to modern self-governing societies that form their economic rules without a coercive state or elite imposing them. Examples include the governance of community irrigation systems, fisheries, and forests studied by Ostrom [49].
In the model, group members decide how much of their resources to use as a productive public good, for example to build and maintain an irrigation system, as opposed to how much to use to incentivize cooperation through rewards and punishments. This makes the evolution of the incentives to cooperate endogenous to the model. Consequently, this type of model can be used both to examine the conditions under which groups can self-organize to create incentives to cooperate, and to determine the balance between rewards and punishments that is evolutionarily stable under different conditions. The model is aimed at elucidating the selection pressure on institutional rules. It aims to determine a set of sufficient conditions for the evolution of institutional rules that lead to stable cooperation among self-interested individuals. Because the focus is on finding sufficient conditions, the ecological environment has been deliberately kept simple. Similarly, economic interactions are modeled using the standard public goods game from game theory, allowing comparison with the vast body of literature on public goods games.

Model Definition
The model presented here builds upon the model of the cultural evolution of sanctioning institutions presented in [52]. The model considers a population of individuals that is subdivided into a finite number of groups, N_g, linked by migration. This spatial population structure corresponds to Wright's finite island model [72]. The life cycle of individuals consists of discrete and nonoverlapping generations, as follows: (1) Social interactions occur between all individuals within each group, as detailed below. (2) Each individual has a Poisson-distributed number of offspring that survive to adulthood, with the mean of the distribution being determined by the social interactions and resource abundance within its group (defined explicitly below). (3) Adults of the previous generation perish. (4) Each individual of the descendant generation either remains in its local group (with probability 1 − M) or disperses to a randomly chosen group (excluding its natal group).
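Steps (2) to (4) of this life cycle can be sketched in a few lines of Python. This is a minimal illustration rather than the simulation code behind the results: the function names, the representation of individuals as arbitrary values held in per-group lists, and the `fitness_of` callback are all assumptions made for the example, and mutation is left out.

```python
import math
import random

def poisson(mean, rng):
    """Draw a Poisson variate with Knuth's multiplication method (adequate for small means)."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def disperse(groups, migration_rate, rng):
    """Step 4: stay with probability 1 - M, otherwise move to a random group other than the natal one."""
    n_groups = len(groups)
    new_groups = [[] for _ in range(n_groups)]
    for home, members in enumerate(groups):
        for individual in members:
            if n_groups > 1 and rng.random() < migration_rate:
                destination = rng.randrange(n_groups - 1)
                if destination >= home:
                    destination += 1  # shift the index to skip the natal group
                new_groups[destination].append(individual)
            else:
                new_groups[home].append(individual)
    return new_groups

def next_generation(groups, fitness_of, migration_rate, rng):
    """Steps 2-4. `fitness_of(parent, members)` is a hypothetical callback returning the Poisson mean."""
    offspring = []
    for members in groups:
        children = []
        for parent in members:
            # Step 2: Poisson-distributed number of surviving offspring.
            children.extend([parent] * poisson(fitness_of(parent, members), rng))
        offspring.append(children)  # Step 3: adults are not carried over.
    return disperse(offspring, migration_rate, rng)
```

Drawing the destination from the other N_g − 1 groups, rather than from all groups, is what implements the exclusion of the natal group.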
Each individual i in group j carries three cultural traits that are passed from parent to offspring subject to a per-trait mutation rate A. The first trait determines whether individuals cooperate and produce B units of public good at a cost of C to themselves, or whether they defect and produce no public good, and hence pay no cost. Mutation on this trait involves changing to the other type. The second trait is a preference, h_ij (range [0, 1]), for the proportion of their group's public good, h_j, that should be used for production, for example, to maintain an irrigation system. This good is distributed among all group members to increase their payoff, and is referred to as the productive public good. The remaining proportion, 1 − h_j, of the public good is then used to pay for institutional rewards and punishments. How this is divided up between reward and punishment is determined by the third trait that individuals carry, r_ij. Specifically, individuals have a preference for what proportion, r_j (range [0, 1]), of the remaining public good should be used to reward cooperators. The remainder (1 − r_j) is then invested in punishing defectors. Consequently, the fraction of the total public good invested in punishing defectors is (1 − h_j)(1 − r_j), and the fraction invested in rewarding cooperators is (1 − h_j)r_j. The traits h_ij and r_ij thus represent individual preferences over outcomes of the political game, that is, preferences for h_j and r_j. Mutation on these traits changes the value according to a truncated normally distributed random variable (with variance j = 0.1), centered around the current trait value.
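The inheritance and mutation of the three traits might be sketched as follows. The text specifies a truncated normal distribution centered on the current value but not how the truncation is carried out, so the rejection sampling used here is an assumption, as are the function names and the tuple layout `(cooperates, h_pref, r_pref)`.

```python
import math
import random

def mutate_preference(value, variance, rng):
    """Sample a normal centred on `value`, redrawing until the result lies in [0, 1].

    Assumption: truncation is implemented by rejection sampling.
    """
    sd = math.sqrt(variance)
    while True:
        candidate = rng.gauss(value, sd)
        if 0.0 <= candidate <= 1.0:
            return candidate

def inherit(parent, mutation_rate, variance, rng):
    """Copy the parent's three traits, mutating each independently at the per-trait rate."""
    cooperates, h_pref, r_pref = parent
    if rng.random() < mutation_rate:
        cooperates = not cooperates  # switch between cooperator and defector
    if rng.random() < mutation_rate:
        h_pref = mutate_preference(h_pref, variance, rng)
    if rng.random() < mutation_rate:
        r_pref = mutate_preference(r_pref, variance, rng)
    return (cooperates, h_pref, r_pref)
```

For example, `inherit((True, 0.9, 0.1), 0.01, 0.1, random.Random(0))` returns an offspring whose preferences always remain inside [0, 1]; the mutation rate of 0.01 here is purely illustrative.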
The social interaction stage of the life cycle is defined by a political game followed by an economic game within the individual's group. The number of cooperators in a group at time t is written as n_cj(t), and the number of defectors as n_dj(t). The political game determines h_j, the proportion of group j's public good that is used for production. It also determines r_j, the proportion of the remaining public good that is used to reward cooperators as opposed to punish defectors. The model assumes an egalitarian political game form in which each group member's preference is weighted equally. The values of h_j and r_j are thus set by taking the mean of each group member's preference (without regard to whether the individual is a cooperator or a defector). This is then followed by the economic game, which is modeled as a linear public goods game followed by rewards and punishments according to the values of h_j and r_j.
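The egalitarian political game amounts to taking two unweighted means, which then fix how the public good is split three ways. The sketch below assumes individuals are stored as `(cooperates, h_pref, r_pref)` tuples; the function names are illustrative only.

```python
def political_game(group):
    """Egalitarian aggregation: h_j and r_j are unweighted means of the members' preferences."""
    n = len(group)
    h_j = sum(h_pref for _, h_pref, _ in group) / n
    r_j = sum(r_pref for _, _, r_pref in group) / n
    return h_j, r_j

def budget_shares(h_j, r_j):
    """Fractions of the group's public good going to production, rewards, and punishments."""
    return {
        "productive": h_j,
        "reward": (1 - h_j) * r_j,
        "punishment": (1 - h_j) * (1 - r_j),
    }
```

With a two-member group `[(True, 0.5, 1.0), (False, 1.0, 0.0)]`, the outcome is h_j = 0.75 and r_j = 0.5, so 75% of the public good is used productively and 12.5% goes to each of rewards and punishments. The defector's preference counts exactly as much as the cooperator's.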
Cooperators contribute to the public good, and may be rewarded for doing so, depending upon the outcome of the political game. Defectors do not contribute, and may be punished for this, again depending on the outcome of the political game. The outcome of the economic game determines the maximal growth rate of cooperators and defectors in the group. This can equivalently be thought of as the payoff from social/economic interactions. The maximal growth rate of cooperators in group j at time t, U_cj(t), is given by Equation 1. The constant E represents the efficiency of implementing the institutional rules, that is, the rate at which public good is converted into rewards or punishments. This would typically be less than 1, due to various transaction costs [44], including the costs of negotiating and bargaining over how the public good is to be used, and the costs of monitoring individuals to determine whether they cooperate or defect [49]. The base growth rate, in the absence of social interactions, is given by U_0. The term E(1 − h_j(t))r_j(t)B represents the reward given to each cooperator, which is determined by the institutional rules decided by the group members in the preceding political game. Note that the number of cooperators cancels out of this term, because the marginal benefit of cooperation is assumed to be a constant that is independent of the number of cooperators, that is, cooperation brings constant returns to scale, as in linear public goods games. The maximal growth rate of defectors in group j at time t, U_dj(t), is given by Equation 2, in which the last term represents the amount of punishment given to each defector, again as decided by the outcome of the political game.
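The two growth-rate functions can be sketched from the verbal description alone. The reward term E(1 − h_j)r_j B is taken directly from the text. The other terms are assumptions inferred from it: the productive good h_j B n_cj is assumed to be shared equally among all group members, and the punishment budget E(1 − h_j)(1 − r_j)B n_cj is assumed to be shared equally among the defectors. The function name and the parameter values in the usage note are illustrative and are not the baseline values of Table 1.

```python
def growth_rates(n_c, n_d, h_j, r_j, B, C, E, U0):
    """Maximal growth rates (payoffs) of cooperators and defectors in one group.

    Assumed forms, reconstructed from the prose description:
      U_c = U0 + h_j*B*n_c/(n_c + n_d) - C + E*(1 - h_j)*r_j*B
      U_d = U0 + h_j*B*n_c/(n_c + n_d) - E*(1 - h_j)*(1 - r_j)*B*n_c/n_d
    Returns None for a type that is absent from the group.
    """
    n = n_c + n_d
    productive = h_j * B * n_c / n            # productive public good per group member
    reward = E * (1 - h_j) * r_j * B          # reward per cooperator (n_c cancels out)
    u_c = U0 + productive - C + reward if n_c > 0 else None
    if n_d > 0:
        punishment = E * (1 - h_j) * (1 - r_j) * B * n_c / n_d  # punishment per defector
        u_d = U0 + productive - punishment
    else:
        u_d = None
    return u_c, u_d
```

For instance, `growth_rates(8, 2, 0.5, 0.5, B=2.0, C=1.0, E=0.5, U0=1.0)` gives approximately 1.05 for cooperators and 0.8 for defectors, so under these illustrative values the incentives more than offset the cost of cooperating.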
The values of h_j(t) and r_j(t) are set by an aggregation rule that takes the mean of each group member's preference. The fitness (expected number of offspring) of cooperators, w_c, and defectors, w_d, in group j is then defined following a Beverton-Holt model of reproduction (e.g., [16]), as commonly used in ecological modeling where generations are discrete. It corresponds to a discrete-time analogue of the logistic growth equation. The actual number of offspring produced by each individual is given by sampling from a Poisson distribution with the fitness of the individual as the mean of the distribution. The variable K_j(t) can be thought of as the "carrying capacity" of the group (more precisely, it is a dynamic variable representing the intensity of local density-dependent competition within the group, and is proportional to the carrying capacity as defined in standard ecology models). Its value is determined by a type of hard selection process in which the carrying capacity of a group depends upon the mean growth rate (payoff) of its members, U_j(t), taken over both cooperators and defectors, relative to the mean growth rate in the population as a whole. In contrast to some forms of hard selection, the total population carrying capacity (the sum of all group carrying capacities) is kept fixed to its value at the beginning of the first generation, N_g G, where G is a parameter that determines the initial size of every group. The resulting group carrying capacities are given by Equation 7. This represents cases in which there is a finite amount of resource available for the whole population, and consequently one group's growth is another group's loss.
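The hard selection step and the density-dependent fitness can be sketched as below. The functional forms are assumptions chosen to match the verbal description: each group's carrying capacity is its share of the fixed total N_g G in proportion to its mean growth rate, and fitness uses the textbook Beverton-Holt form U / (1 + n/K). All function names are illustrative.

```python
def mean_growth_rate(n_c, n_d, u_c, u_d):
    """Mean growth rate over all members of a group, cooperators and defectors alike."""
    total = 0.0
    if n_c > 0:
        total += n_c * u_c
    if n_d > 0:
        total += n_d * u_d
    return total / (n_c + n_d)

def carrying_capacities(group_means, initial_group_size):
    """Split the fixed total capacity N_g * G among groups in proportion to their mean growth rate.

    Assumption: normalising by the sum of group means, which keeps the total exactly fixed.
    """
    total_capacity = len(group_means) * initial_group_size
    total_growth = sum(group_means)
    return [total_capacity * u / total_growth for u in group_means]

def beverton_holt_fitness(u, group_size, capacity):
    """Expected offspring number under the standard Beverton-Holt form (assumed here)."""
    return u / (1.0 + group_size / capacity)
```

Under this form a group's size settles where fitness equals 1, that is, at n = K(U − 1), which is proportional to K and so agrees with the description of K_j(t) as proportional to the standard carrying capacity. With two groups whose mean growth rates are 1.0 and 3.0 and G = 20, the capacities are 10 and 30: one group's gain is the other's loss.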
While the model is framed here in terms of biological fitness, the vertical transmission and the fitness-proportionate selection used correspond to payoff-biased social learning, where individuals imitate traits in proportion to the payoff their bearers receive relative to the mean payoff in the population [14].
The model defines a stochastic process for the state variables n_cj(t), n_dj(t), h_j(t), and r_j(t) in each group j of the spatially structured population. These variables allow us to evaluate the average frequency of cooperators and defectors, and the average values of h_ij and r_ij in the population. Due to the strong nonlinearity of the model, the analysis proceeds by means of individual-based simulations. The baseline parameters used for the simulations, unless otherwise specified, are given in Table 1. Because the simulation model is stochastic and contains no absorbing states, it represents an ergodic Markov chain in which every state will eventually be visited through mutation. As a result, we are interested in the stationary distribution, that is, what proportion of time the simulation spends in each state. Consequently, the analysis focuses on the long-run time-average values of cooperation, h_ij, and r_ij, which do not depend upon initial conditions provided that the simulation is run for a sufficient length of time. This is in contrast to the multiple replicates that would have to be done in simulations with absorbing states.

Results
As a baseline we can consider first the case of a well-mixed population consisting of a single group. Starting the analysis with a well-mixed population allows us to determine the role that group structure plays in the evolutionary dynamics. Figure 2 shows the resulting coevolutionary dynamics of cooperation and the outcome of the political game (rewards and punishments). Cooperation is stable when there is sufficient investment of a group's resources in rewards and punishments, so that the cost of cooperating is more than offset by the rewards to cooperators and the punishments received by defectors. However, individual preferences over outcomes of the political game, h_ij and r_ij, are not themselves under selection in a well-mixed population. This is because the only effect that h_ij and r_ij have on individual fitness is through their effects on the outcome of the political game, that is, h_j and r_j. But if the population only consists of a single group, then h_j and r_j are the same for every member of the population. Consequently, the individual preferences h_ij and r_ij cannot be a source of differential fitness, because they do not differentially affect the bearer. As a result they change entirely through drift, that is, random sampling. When h_j by chance becomes too large, then cooperation breaks down because of insufficient investment in rewards and punishments.
In comparison, Figure 3 illustrates the coevolutionary dynamics when there are N_g = 50 groups connected by migration. The key result is that individual preferences over the outcome of the political game are now under selection. Specifically, h_ij, the individual preference for the proportion of resources to be used as a productive public good rather than for rewards and punishments, is selected to become close to 1. This is due to the hard selection process, in which the carrying capacity of a group depends on the mean growth rate of its members compared to the population average. Groups that obtain a larger carrying capacity will then send out a larger number of migrants, thereby spreading the institutional preferences of their members throughout the population. Hard selection in a structured population thus creates competition between institutions [52]. However, cooperation is not stable, because individuals evolve to invest as much of their resources as possible in the productive public good, at the expense of the rewards and punishments that are necessary to maintain it. This is the "tragedy of the political game." It is largely analogous to the second-order free-rider problem in traditional models of punishment [22]. Here it arises because cooperative individuals receive an immediate benefit from the productive public good. By contrast, the benefits of punishing defectors only arise if there are a sufficient number of defectors present. The benefits of rewards are also lower than those of the productive public good (E < 1), so the productive public good is also preferred to rewards. Thus, although individuals could play the political game in such a way that cooperation would be stable through rewards and punishments, they are tempted not to because of the immediate benefits of investing in the productive public good instead, even though this ultimately leads to the loss of cooperation.
Figure 4 shows the long-run time-average values of cooperation, h_ij, and r_ij, and their sensitivity to model parameters, when the stationary distribution is approximated by running the simulations for 3 × 10^6 generations. These results also show that where individuals can choose their own system of rewards and punishments, they prefer rewards to punishments. This occurs even though punishment of defectors is more efficient when cooperation is common [15]. Punishment is more efficient in this case because each unit of investment only has to be shared amongst defectors to penalize them, whereas each unit of investment in rewards will have to be shared with nearly the entire group. Punishment could therefore allow cooperation to be maintained under larger h_j values. However, because rewards bring an immediate benefit to cooperators, evolution goes in this direction rather than towards efficiency.
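The efficiency argument can be made concrete with a short calculation. It assumes payoff forms inferred from the model description rather than quoted from it: each cooperator pays C and receives a reward E(1 − h_j)r_j B, and each defector suffers a punishment E(1 − h_j)(1 − r_j)B n_c/n_d. Cooperators then out-earn defectors when (1 − h_j) exceeds C / (E B (r_j + (1 − r_j) n_c/n_d)). The function name and the numbers below are illustrative.

```python
def min_incentive_share(n_c, n_d, B, C, E, r_j):
    """Smallest share (1 - h_j) of the public good at which cooperating pays better than defecting.

    Assumed payoff gap: U_c - U_d = -C + E*(1 - h_j)*B*(r_j + (1 - r_j)*n_c/n_d).
    """
    gap_per_unit_share = E * B * (r_j + (1 - r_j) * n_c / n_d)
    return C / gap_per_unit_share

# Illustrative group with 18 cooperators and 2 defectors (cooperation is common).
rewards_only = min_incentive_share(18, 2, B=2.0, C=1.0, E=0.5, r_j=1.0)
punishment_only = min_incentive_share(18, 2, B=2.0, C=1.0, E=0.5, r_j=0.0)
```

With these values, rewards alone would need the entire public good (share 1.0), whereas punishment alone needs a share of only about 0.11. This is the sense in which punishment could sustain cooperation at larger h_j values when defectors are few.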
How can the stability of institutional rewards and punishments be increased, and the tragedy of the political game be averted? To investigate this, we can consider an alternative form of hard selection in which the productive public good directly increases carrying capacity. This would be the case with irrigation farming, for example [30]. The increase in carrying capacity must ultimately be limited by other factors, though, such as space. This means that investment in the productive public good now experiences diminishing marginal returns. This can be modeled by using a saturating function in place of Equation 7 [52], in which h is a parameter controlling the saturation point, that is, the maximum possible increase in carrying capacity from investment in the productive good. The parameter g sets the gradient, that is, how quickly the saturation point is reached. The growth rate and payoff functions of cooperators and defectors (Equations 1 and 2) are also replaced, since the productive public good has been moved from the growth rate to the carrying capacity terms. Figure 5 shows the results of using this model with h = 300 and g = 0.0075. In this case the tragedy no longer occurs: individuals do not evolve to invest all of their resources in the productive public good. Rather, rewards and punishments are maintained, and cooperation remains stable. Diminishing returns mean that the selection pressure on increasing h_j also diminishes as h_j becomes large. Consequently, the tragedy of the political game is averted. In reality, all public goods must ultimately undergo diminishing marginal returns [21,54], implying that the tragedy is not likely to occur in many situations. Finally, individuals again evolve to prefer rewarding to punishment.

Discussion
Institutions can be defined as political game forms that generate the rules, and hence incentives, for economic interactions [34]. Taking this view allows us to produce dynamic models of institutional evolution. When combined with historical evidence on the types of political game forms and institutional rules that different societies had (e.g., [26]), this allows us to explore why some groups have managed to create institutional rules that foster cooperation, and why others have failed [1]. Applications of this include understanding the rise of hierarchy and states, and addressing pressing public goods problems such as climate change.
Cultural group selection models have traditionally viewed institutions as equilibria. These models suggest that institutional rules change by a discontinuous and punctuated process of random drift and between-group competition. However, individuals should be expected to try to craft institutional rules that benefit themselves. This means that institutional rules can also change as a result of within-group processes, often on much faster time scales and without the need for catastrophic events occurring at the group level. The simulation model presented here demonstrates that the institutional rules that support cooperation can evolve among self-interested individuals, without the need for conformity- or prestige-biased social learning to suppress competition within groups.
The model also demonstrates the importance of modeling the dynamics of rule formation. Previous work has shown that the most efficient strategy for promoting cooperation should be to switch from institutional rewards to punishments once cooperation becomes common [15]. However, evolution of individual preferences for the rules does not lead to this efficient strategy. Rather, individuals evolve to prefer rewarding to punishing even when cooperation is common (Figures 4 and 5). This is because, in contrast to punishment, cooperators still receive some small benefit from rewards even when there are no defectors present. Future work should model political game forms in more detail. There is a need for more realistic models of the bargaining and negotiation processes that go on within groups to generate institutional rules. How can we best model the bargaining process between individuals with different preferences for institutional rules? The processes by which political game forms themselves change also need to be modeled. When are political game forms likely to move between egalitarianism and despotism, as happened, for example, with the transition from a hunter-gatherer to an agricultural lifestyle 10,000 years ago?
In summary, a framework for modeling institutional evolution has been presented here. An application of the framework was illustrated using a simple model of the coevolution of individual social behaviors with individual preferences for whether groups should reward cooperators or punish defectors. The political game form was modeled as an egalitarian process in which the preferences of all group members were aggregated. Previous work suggests that the results will be qualitatively similar if the institutional rules are set by a single individual (i.e., a leader), provided that the leader receives the same amount of the public good as other group members [52]. However, future work should investigate how the rules will change if leaders take a disproportionate share of the public good, as happened after the origin of agriculture.
In conclusion, it is intended that this framework will allow artificial life researchers to address how groups can self-organize to create conditions that support cooperation.