Phase VII — Game Theory for Optimization & Robotics | Week 14 | 2.5 hours A Nash equilibrium is a profile where no player can unilaterally improve — the fixed point of rational play.
When multiple robots choose charging stations, a stable assignment is one where no single robot benefits by switching. This is a pure Nash equilibrium of the station-selection game. Finding it avoids oscillation in real-time scheduling — robots converge to a consistent plan without central coordination.
Player $i$'s best response to $s_{-i}$ is any strategy maximizing their payoff:
$$BR_i(s_{-i}) = \arg\max_{s_i \in S_i} u_i(s_i, s_{-i})$$
Best response is generally a set (multiple strategies may tie). A strategy profile $(s_1^*, \ldots, s_n^*)$ where each player best-responds to the others is special — that's an equilibrium.
A strategy profile $s^* = (s_1^*, \ldots, s_n^*)$ is a Nash Equilibrium (NE) if for every player $i$:
$$u_i(s_i^*, s_{-i}^*) \geq u_i(s_i, s_{-i}^*) \quad \forall s_i \in S_i$$
Equivalently: $s_i^* \in BR_i(s_{-i}^*)$ for all $i$. No player has a profitable unilateral deviation.
For a bimatrix $(A, B)$: 1. For each column $j$, underline the maximum entry in $A_{:,j}$ (Player 1's best response to $j$). 2. For each row $i$, underline the maximum entry in $B_{i,:}$ (Player 2's best response to $i$). 3. Any cell $(i,j)$ with both entries underlined is a pure NE.
Games can have 0, 1, or many pure NE. The number and structure vary.
A profile $s$ is Pareto efficient if no other profile makes every player at least as well off and at least one strictly better. NE need not be Pareto efficient — the Prisoner's Dilemma's unique NE (Defect, Defect) is Pareto dominated by (Cooperate, Cooperate).
$$\text{NE} \not\subseteq \text{Pareto efficient} \quad \text{and} \quad \text{Pareto efficient} \not\subseteq \text{NE}$$
import numpy as np
def best_response(payoff, player, opponent_action):
"""Return set of best-response actions for player given opponent's choice.
player: 1 (row) or 2 (column).
"""
if player == 1:
col = payoff[:, opponent_action]
max_val = np.max(col)
return set(np.where(col == max_val)[0])
else:
row = payoff[opponent_action, :]
max_val = np.max(row)
return set(np.where(row == max_val)[0])
def find_pure_nash(A, B):
"""Find all pure-strategy Nash equilibria using the underline method."""
m, n = A.shape
equilibria = []
for i in range(m):
for j in range(n):
# i must be a best response to j for player 1
if A[i, j] == np.max(A[:, j]):
# j must be a best response to i for player 2
if B[i, j] == np.max(B[i, :]):
equilibria.append((i, j))
return equilibria
def is_pareto_efficient(A, B, profile):
"""Check if a profile is Pareto efficient."""
i, j = profile
u1, u2 = A[i, j], B[i, j]
m, n = A.shape
for r in range(m):
for c in range(n):
if A[r, c] >= u1 and B[r, c] >= u2:
if A[r, c] > u1 or B[r, c] > u2:
return False
return True
# --- Demo: Classic Games ---
games = {
"Prisoner's Dilemma": (np.array([[-1,-3],[0,-2]]), np.array([[-1,0],[-3,-2]])),
"Stag Hunt": (np.array([[4,0],[3,2]]), np.array([[4,3],[0,2]])),
"Battle of Sexes": (np.array([[3,0],[0,2]]), np.array([[2,0],[0,3]])),
"Matching Pennies": (np.array([[1,-1],[-1,1]]), np.array([[-1,1],[1,-1]])),
}
for name, (A, B) in games.items():
ne = find_pure_nash(A, B)
print(f"\n=== {name} ===")
print(f" Pure NE: {ne}")
for eq in ne:
pareto = is_pareto_efficient(A, B, eq)
print(f" {eq}: payoffs=({A[eq]:.0f},{B[eq]:.0f}), Pareto efficient={pareto}")
# Best response mapping for Stag Hunt
print("\n=== Stag Hunt: Best Response Map ===")
A_sh, B_sh = games["Stag Hunt"]
for j in range(2):
br1 = best_response(A_sh, 1, j)
print(f" BR1(P2={j}) = {br1}")
for i in range(2):
br2 = best_response(B_sh, 2, i)
print(f" BR2(P1={i}) = {br2}")
Expected output:
=== Prisoner's Dilemma ===
Pure NE: [(1, 1)]
(1, 1): payoffs=(-2,-2), Pareto efficient=False
=== Stag Hunt ===
Pure NE: [(0, 0), (1, 1)]
(0, 0): payoffs=(4,4), Pareto efficient=True
(1, 1): payoffs=(2,2), Pareto efficient=False
=== Battle of Sexes ===
Pure NE: [(0, 0), (1, 1)]
(0, 0): payoffs=(3,2), Pareto efficient=True
(1, 1): payoffs=(2,3), Pareto efficient=True
=== Matching Pennies ===
Pure NE: []
=== Stag Hunt: Best Response Map ===
BR1(P2=0) = {0}
BR1(P2=1) = {1}
BR2(P1=0) = {0}
BR2(P1=1) = {1}
P1. Find all pure NE in this 3×3 game using the underline method:
$$A = \begin{pmatrix} 2 & 0 & 1 \\ 3 & 1 & 0 \\ 1 & 2 & 3 \end{pmatrix}, \quad B = \begin{pmatrix} 1 & 3 & 0 \\ 0 & 2 & 1 \\ 2 & 0 & 3 \end{pmatrix}$$
P2. Construct a 2×2 symmetric game with exactly two pure NE that are both Pareto efficient.
P3. In the Prisoner's Dilemma, the NE is not Pareto efficient. Explain intuitively why rational individual play leads to a collectively suboptimal outcome.
E1. Prove: every game in which IESDS yields a unique profile has that profile as the unique NE.
E2. Construct a 3×3 game with exactly three pure Nash equilibria. Verify computationally.
A = np.array([[3, 0, 0],
[0, 3, 0],
[0, 0, 3]])
B = A.copy() # symmetric coordination game
ne = find_pure_nash(A, B)
print(f"NE: {ne}") # [(0,0), (1,1), (2,2)]
Three NE on the diagonal: (0,0), (1,1), (2,2) — each giving payoff (3,3). Off-diagonal gives 0 to both, so no deviation is profitable.
E3. A congestion game has 3 robots and 2 stations; cost at station $k$ is $c_k \cdot n_k$ where $n_k$ is the number of robots at station $k$ and $c_1=1, c_2=2$. Enumerate all profiles and find all pure NE.