Let’s now compare each algorithm’s performance based on the total points obtained per game. The logic here is the same as before: bootstrap the mean number of points per game and compare the results using the `comp.alg` function, which takes two algorithms at a time and returns a data frame with four columns, one per player for each of the two algorithms.
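For reference, a minimal sketch of the shape `comp.alg` might take. This is an assumption, not the post’s actual definition (which appears earlier): it only reproduces the bootstrap-and-bind structure implied by the description, and it ignores the `default` argument that `prober` needs.

```r
# Hypothetical sketch of comp.alg, NOT the post's actual definition:
# play one game per algorithm, then bootstrap the mean score per player.
comp.alg <- function(algs, iter, rounds, prob, choice, default = 0) {
  boot_mean <- function(x) mean(sample(x, replace = TRUE))
  g1 <- algs[[1]](rounds, prob, choice)  # game for the first algorithm
  g2 <- algs[[2]](rounds, prob, choice)  # game for the second algorithm
  data.frame(
    A1P1 = replicate(iter, boot_mean(g1$P1)),
    A1P2 = replicate(iter, boot_mean(g1$P2)),
    A2P1 = replicate(iter, boot_mean(g2$P1)),
    A2P2 = replicate(iter, boot_mean(g2$P2))
  )
}
```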

#### Average cooperative response

Computing the average cooperative response by algorithm after playing 500 games of 200 rounds each should help us understand which algorithms tend to play nice and which ones are more selfish. For the sake of reproducibility, we’ve set a seed and fixed the same probabilities of defection and cooperation across all algorithms.

## Code for algorithm median cooperation

```
# Bootstrap the median cooperative response for each algorithm
mean_df <- data.frame(
  Median = c(replicate(500, median(
               sample(copy.cat(200, c(.35, .27), 0)$P2,
                      replace = TRUE))),
             replicate(500, median(
               sample(prober(200, c(.35, .27), 0, .05)$P2,
                      replace = TRUE))),
             replicate(500, median(
               sample(tit.for.tat(200, c(.35, .27), 0)$P2,
                      replace = TRUE))),
             replicate(500, median(
               sample(exact.revenge(200, c(.35, .27), 0)$P2,
                      replace = TRUE)))),
  kind = factor(rep(c("Copy.Cat", "Prober", "TFT", "Revenge"),
                    each = 500),
                levels = c("Copy.Cat", "Prober", "TFT", "Revenge")))

clrs <- c(
  "#FFBE00", # MCRN yellow
  "#B92F0A", # MCRN red
  "#7C225C", # MCRN maroon
  "#394DAA"  # MCRN blue
)

p0 <- ggplot(mean_df, aes(x = Median, fill = kind)) +  # column is Median, not mean
  geom_density(alpha = .45) +
  ylim(0, 25) +
  labs(x = "Avg. cooperative response",
       fill = "Algorithm") +  # legend title belongs to fill, not colour
  scale_fill_manual(values = clrs)
```

The cooperative response plot shows that, on average, the tit-for-tat algorithm cooperates more often than the vengeful and prober algorithms, but just as much as `copy.cat`^{5}.

^{5} I used the Wilcoxon rank sum test to compute the difference in medians and check whether the algorithms’ distributions are stochastically equivalent. It’s not obvious at a glance whether the `prober` algorithm is equivalent to `tit.for.tat`, but the test shows a statistically significant difference in the algorithms’ median cooperative responses.
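Stripped of the table styling in the block below, the core of that check is a single `wilcox.test` call on the two bootstrap distributions. A minimal self-contained version, using simulated stand-ins rather than the post’s actual bootstrap samples:

```r
set.seed(1)
# Stand-in bootstrap medians for two algorithms (illustrative values only,
# not the cooperative-response distributions computed above)
tft    <- rnorm(500, mean = 0.50, sd = 0.02)
prober <- rnorm(500, mean = 0.48, sd = 0.02)

wt <- wilcox.test(tft, prober, conf.int = TRUE)
wt$estimate  # Hodges-Lehmann estimate of the location difference
wt$p.value   # a small p-value means the distributions are not equivalent
```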

## Code

```
wilcox_gt <- mean_df |>
  filter(kind %in% c("TFT", "Prober")) |>  # == recycles the vector; %in% keeps both groups
  pivot_longer(cols = -kind, names_to = "vars",
               values_to = "values") |>
  group_by(vars) |>
  summarize(
    Estimate = wilcox.test(values ~ kind, conf.int = TRUE)$estimate,
    Sig. = wilcox.test(values ~ kind, conf.int = TRUE)$p.value
  ) |>
  gt() |>
  tab_header(
    title = md("*Comparison of Tit×Tat and Prober cooperative responses*")
  ) |>
  cols_label(vars = md("**Estimate**"),
             Estimate = md("**Difference in estimate**"),
             Sig. = md("**p-value**")) |>
  cols_align(align = "center") |>
  opt_table_font(font = google_font("EB Garamond"),
                 weight = 400, style = "plain", add = TRUE) |>
  tab_style(style = cell_text(font = google_font("Ubuntu")),
            locations = cells_body()) |>
  tab_options(column_labels.font.weight = "bold",
              row_group.font.weight = "bold") |>
  data_color(rows = everything(),
             palette = "#f9f9f9")
wilcox_gt
```

| **Estimate** | **Difference in estimate** | **p-value** |
| --- | --- | --- |
| Median | -2.38e-05 | 0.0451 |

Axelrod believed that, on average, “nice” algorithms would perform better than “mean and deceitful” ones like `exact.revenge` and `prober`, and we can clearly see that our most vengeful algorithm performs poorly on the cooperative front, with a mean cooperative response of .0055 per round, which works out to roughly one cooperative move per 200-round game.
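As a quick sanity check on that figure (assuming .0055 is a per-round rate, which is what makes the “about once per game” reading consistent):

```r
coop_rate <- 0.0055   # mean cooperative response per round for exact.revenge
rounds    <- 200      # rounds per game
coop_rate * rounds    # expected cooperative moves per game: about 1.1
```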

Based on each algorithm’s cooperative response, we can then visualize and compute the final score distribution (in total points) for each player after 500 games of 200 rounds each. We’ll see that, by defecting often and without provocation, the second player typically scores much higher than player 1. However, when player 2 chooses not to betray without provocation and instead imitates the first player’s current choices, both players end up with higher and approximately equal scores. Because of this willingness to cooperate, according to Axelrod, the nice algorithms tend to be more conducive to cooperation and less prone to conflict.

## Code for data manipulation & plots

```
test2 <- comp.alg(c(exact.revenge, prober),
                  iter = 500,
                  rounds = 200,
                  prob = c(.345, .231),
                  choice = 0,
                  default = .10) |>
  rename("R.P1" = "A1P1",
         "R.P2" = "A1P2",
         "Pr.P1" = "A2P1",
         "Pr.P2" = "A2P2")

# `test` holds the Copy.Cat / TFT comparison computed earlier
perf_df <- cbind(test, test2) |>
  pivot_longer(cols = everything(),
               names_to = c("Algorithm", "Player"),
               names_pattern = "([^.]+)\\.(P\\d)",
               values_to = "Score") |>
  mutate(Algorithm = case_when(
           Algorithm == "R" ~ "Revenge",
           Algorithm == "Pr" ~ "Prober",
           Algorithm == "CC" ~ "Copy.Cat",
           Algorithm == "TFT" ~ "Tit×Tat",
           TRUE ~ Algorithm
         ),
         Algorithm = factor(Algorithm),
         Player = factor(Player)) |>
  arrange(Algorithm, Player) |>
  group_by(Algorithm, Player)
#> str(perf_df)

p1 <- ggplot(perf_df, aes(Score, fill = Player)) +
  geom_density(alpha = .45) +
  labs(title = NULL,
       x = NULL,
       y = NULL) +
  scale_fill_manual(values = clrs)

p2 <- ggplot(perf_df, aes(Score, fill = Algorithm)) +
  geom_density(alpha = .45) +
  labs(title = NULL,
       x = "Score",
       y = NULL) +
  scale_fill_manual(values = clrs)
#> p1/p2
```

These two plots show the performance of both players under the four algorithms, and we can see that the most extreme results are obtained with the `exact.revenge` algorithm. The density curves for this algorithm touch the extremes on the left and right, showing that one player will always underperform while the other, typically the second, will outperform his opponent. This algorithm is a “sneaky” one because player 1 cannot confidently predict his opponent’s moves: regardless of his choice, after the first betrayal, player 2 will never cooperate again.
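That “never cooperate again after the first betrayal” rule is the classic grim-trigger strategy. A minimal sketch of the decision rule (the names here are illustrative, not the post’s actual `exact.revenge` code):

```r
# Illustrative grim-trigger rule: cooperate until the opponent defects
# once, then defect for every remaining round.
grim_choice <- function(opponent_moves) {
  if (any(opponent_moves == "defect")) "defect" else "cooperate"
}

grim_choice(c("cooperate", "cooperate"))           # "cooperate"
grim_choice(c("cooperate", "defect", "cooperate")) # "defect" - never forgives
```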

To my surprise, the `prober` algorithm returned a more favorable score for player 2 than either the `tit.for.tat` or the `copy.cat` algorithm; however, the nice algorithms always favored both players, not just player 2.

## Code

```
sumperf_gt <- perf_df |>
  summarise(Score = mean(Score)) |>
  arrange(Algorithm) |>
  as_tibble() |>
  gt() |>
  tab_header(title = md("*Average Player Score by Algorithm*")) |>
  cols_label(Algorithm = md("**Algorithm**"),
             Player = md("**Player**"),
             Score = md("**Score**")) |>
  opt_table_font(font = google_font("EB Garamond"),
                 weight = 400, style = "plain", add = TRUE) |>
  tab_style(style = cell_text(font = google_font("Ubuntu")),
            locations = cells_body()) |>
  tab_options(column_labels.font.weight = "bold",
              row_group.font.weight = "bold") |>
  data_color(rows = everything(),
             palette = "#f9f9f9") |>
  opt_interactive(use_compact_mode = TRUE, use_highlight = TRUE)
sumperf_gt
```

*Average Player Score by Algorithm*

Nevertheless, regardless of which algorithm we used, player 1’s score is, on average, lower than player 2’s, even if the nice algorithms tend to equalize the scores and may result in less conflict, as Axelrod described.