Phase VII — Game Theory for Optimization & Robotics | Week 15 | 2.5 hours Play once, defect. Play forever, cooperate — the shadow of the future changes everything.
Robots in a warehouse interact repeatedly at shared corridors, intersections, and charging stations. A robot that aggressively cuts off others gains short-term speed but faces retaliation tomorrow. The folk theorem guarantees that cooperative norms (yield at intersections, share corridors) can be sustained as equilibria when robots value future interactions — exactly the regime of a permanent fleet.
Given a stage game $G$, the $T$-fold repeated game $G(T)$ plays $G$ for $T$ rounds. Players observe all previous actions (perfect monitoring).
In the infinitely repeated game $G(\infty, \delta)$, player $i$'s payoff is:
$$U_i = (1 - \delta) \sum_{t=0}^{\infty} \delta^t u_i(a^t)$$
where $\delta \in (0, 1)$ is the discount factor and the $(1-\delta)$ normalization makes payoffs comparable to stage-game payoffs.
Grim trigger: Cooperate until any player defects, then defect forever.
For the Prisoner's Dilemma with payoffs $R$ (reward), $T$ (temptation), $P$ (punishment), $S$ (sucker), where $T > R > P > S$:
Cooperation under grim trigger is an SPE iff:
$$\delta \geq \frac{T - R}{T - P}$$
Tit-for-Tat (TFT): Start cooperating. Then copy opponent's last action. - Simple, retaliatory, forgiving, clear. - Won Axelrod's tournament (1980).
Win-Stay-Lose-Shift (Pavlov): Repeat last action if payoff was $R$ or $T$; switch if $P$ or $S$.
The feasible payoff set $F$ is the convex hull of all pure-strategy payoff vectors.
The minmax value for player $i$: $\underline{v}_i = \min_{a_{-i}} \max_{a_i} u_i(a_i, a_{-i})$.
A payoff vector $v$ is individually rational if $v_i \geq \underline{v}_i$ for all $i$.
Folk Theorem (Friedman 1971): Any feasible, individually rational payoff vector $v$ with $v_i > \underline{v}_i$ can be sustained as a Nash equilibrium of $G(\infty, \delta)$ for $\delta$ sufficiently close to 1.
Stronger version (Fudenberg-Maskin 1986): Under full dimension condition, any feasible strictly individually rational payoff is achievable in subgame perfect equilibrium for large $\delta$.
For finitely repeated games with a unique NE in the stage game: backward induction gives the stage NE in every round. No cooperation emerges.
Exception: if the stage game has multiple NE, cooperation can be sustained in early rounds by threatening to switch to a worse NE in the final rounds.
"""Repeated games and folk theorem — Day 103"""
import numpy as np
import matplotlib.pyplot as plt
from itertools import product
# --- Strategy definitions ---
def always_cooperate(history, player):
return 'C'
def always_defect(history, player):
return 'D'
def tit_for_tat(history, player):
if not history:
return 'C'
opponent = 1 - player
return history[-1][opponent]
def grim_trigger(history, player):
opponent = 1 - player
if any(h[opponent] == 'D' for h in history):
return 'D'
return 'C'
def pavlov(history, player):
if not history:
return 'C'
my_last = history[-1][player]
opp_last = history[-1][1 - player]
if (my_last == 'C' and opp_last == 'C') or (my_last == 'D' and opp_last == 'D'):
return my_last # win-stay
return 'D' if my_last == 'C' else 'C' # lose-shift
def random_strategy(history, player):
return np.random.choice(['C', 'D'])
# --- Game engine ---
PD_PAYOFFS = {
('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
('D', 'C'): (5, 0), ('D', 'D'): (1, 1)
}
def play_repeated_game(strat1, strat2, rounds=200, payoffs=None):
"""Simulate a repeated game."""
if payoffs is None:
payoffs = PD_PAYOFFS
history = []
scores = [0, 0]
score_history = [[], []]
for _ in range(rounds):
a1 = strat1(history, 0)
a2 = strat2(history, 1)
p1, p2 = payoffs[(a1, a2)]
scores[0] += p1; scores[1] += p2
history.append((a1, a2))
score_history[0].append(scores[0])
score_history[1].append(scores[1])
return scores, history, score_history
def run_tournament(strategies, rounds=200, payoffs=None):
"""Round-robin tournament."""
names = list(strategies.keys())
n = len(names)
total_scores = {name: 0 for name in names}
for i in range(n):
for j in range(n):
scores, _, _ = play_repeated_game(
strategies[names[i]], strategies[names[j]], rounds, payoffs)
total_scores[names[i]] += scores[0]
total_scores[names[j]] += scores[1]
ranking = sorted(total_scores.items(), key=lambda x: -x[1])
return ranking
def threshold_delta(T, R, P):
"""Minimum discount factor for grim trigger cooperation in PD."""
return (T - R) / (T - P)
def plot_payoff_evolution(strat1, strat2, name1, name2, rounds=100):
"""Plot cumulative and per-round average payoffs."""
scores, history, score_hist = play_repeated_game(strat1, strat2, rounds)
avg1 = [score_hist[0][t] / (t + 1) for t in range(rounds)]
avg2 = [score_hist[1][t] / (t + 1) for t in range(rounds)]
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))
ax1.plot(score_hist[0], label=name1); ax1.plot(score_hist[1], label=name2)
ax1.set_title('Cumulative Payoffs'); ax1.legend(); ax1.grid(True, alpha=0.3)
ax2.plot(avg1, label=name1); ax2.plot(avg2, label=name2)
ax2.axhline(y=3, color='g', linestyle='--', alpha=0.5, label='Mutual C')
ax2.axhline(y=1, color='r', linestyle='--', alpha=0.5, label='Mutual D')
ax2.set_title('Average Payoff per Round'); ax2.legend(); ax2.grid(True, alpha=0.3)
plt.tight_layout(); plt.savefig('repeated_game.png', dpi=100); plt.show()
def visualize_folk_theorem():
"""Visualize feasible and individually rational regions for PD."""
fig, ax = plt.subplots(figsize=(7, 6))
# Feasible set: convex hull of payoff vectors
payoff_vecs = [(3,3), (0,5), (5,0), (1,1)]
from matplotlib.patches import Polygon
hull = Polygon([(0,5),(3,3),(5,0),(1,1)], alpha=0.2, color='blue', label='Feasible')
ax.add_patch(hull)
# Minmax = 1 for both (can guarantee 1 by playing D)
ax.axhline(y=1, color='red', linestyle='--', alpha=0.7)
ax.axvline(x=1, color='red', linestyle='--', alpha=0.7, label='Minmax = 1')
ax.fill_between([1, 5], 1, 5, alpha=0.1, color='green', label='IR region')
for v, label in zip(payoff_vecs, ['(C,C)', '(C,D)', '(D,C)', '(D,D)']):
ax.plot(*v, 'ko', markersize=8)
ax.annotate(label, v, textcoords="offset points", xytext=(5, 5))
ax.set_xlabel('Player 1 payoff'); ax.set_ylabel('Player 2 payoff')
ax.set_title('Folk Theorem: Achievable Payoffs (PD)')
ax.legend(); ax.set_xlim(-0.5, 6); ax.set_ylim(-0.5, 6); ax.grid(True, alpha=0.3)
plt.tight_layout(); plt.savefig('folk_theorem.png', dpi=100); plt.show()
# --- Demo ---
if __name__ == "__main__":
delta_min = threshold_delta(T=5, R=3, P=1)
print(f"Minimum δ for grim trigger cooperation: {delta_min:.3f}")
strategies = {
'TFT': tit_for_tat, 'Grim': grim_trigger, 'AllC': always_cooperate,
'AllD': always_defect, 'Pavlov': pavlov, 'Random': random_strategy
}
ranking = run_tournament(strategies, rounds=200)
print("\nTournament Results:")
for name, score in ranking:
print(f" {name:8s}: {score}")
plot_payoff_evolution(tit_for_tat, grim_trigger, 'TFT', 'Grim')
visualize_folk_theorem()
P1. For PD with $T=5, R=3, S=0, P=1$, find the minimum $\delta$ for: (a) grim trigger, (b) TFT.
P2. In a finitely repeated PD (100 rounds), what does backward induction predict? Why is this empirically wrong?
P3. Design a strategy that beats TFT against AllD but loses to TFT against itself.
E1. Prove the folk theorem for the PD: show that for any payoff vector $(v_1, v_2)$ with $v_i > 1$ in the feasible set, there exists a strategy profile and $\bar{\delta}$ such that for all $\delta > \bar{\delta}$, it is an SPE.
E2. Implement Axelrod's ecological tournament: after each generation, strategies replicate proportional to their score. Plot population dynamics over 100 generations.
E3. Extend to imperfect public monitoring: with probability $\epsilon$, the observed action is flipped. How does this affect the minimum $\delta$ for cooperation?