Catan RL Bot
Started 8/24/2025
Training an RL agent to play Settlers of Catan end-to-end.
- Environment: Custom Catan gym-style environment with legal action masking.
- Algorithms: PPO and A2C baselines; exploring self-play with population-based training.
- State: Board encodings (hex types, numbers, ports), player inventories, dev cards, turn phase.
- Action space: Placement/build/trade/knight/monopoly/year-of-plenty/end-turn, with masking (see the masking sketch after this list).
- Reward shaping: Win/loss, VP deltas, intermediate rewards for road/settlement/city strategies.
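A minimal sketch of how a legal-action mask can be applied at the policy head before sampling; the `action_mask` key, `policy` module, and env API here are assumptions, not the project's actual interfaces:

```python
import torch

def masked_categorical(logits: torch.Tensor, mask: torch.Tensor) -> torch.distributions.Categorical:
    """Push illegal-action logits to the most negative representable value so they get ~0 probability."""
    neg_inf = torch.finfo(logits.dtype).min
    masked_logits = torch.where(mask.bool(), logits, torch.full_like(logits, neg_inf))
    return torch.distributions.Categorical(logits=masked_logits)

# Illustrative rollout step (gym-style API assumed):
# obs, info = env.reset()
# mask = torch.as_tensor(info["action_mask"])                 # 1 = legal, 0 = illegal this turn
# logits = policy(torch.as_tensor(obs, dtype=torch.float32))  # unnormalized action scores
# dist = masked_categorical(logits, mask)
# action = dist.sample()
# obs, reward, terminated, truncated, info = env.step(int(action))
```

The same masked distribution would supply the log-prob and entropy terms in PPO/A2C, so illegal actions never receive gradient credit.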
Planned:
- Curriculum (setup → early game → mid/late).
- Opponent pools and Elo tracking (a rating-update sketch follows this list).
- Heuristic + search (MCTS) hybrid for planning-heavy phases.
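For the planned opponent pool, plain Elo updates after each evaluation match are one simple starting point; a sketch, where the K-factor and the idea of one rating per checkpoint are assumptions:

```python
def elo_expected(rating_a: float, rating_b: float) -> float:
    """Expected score of A vs B under the standard Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))

def elo_update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0) -> tuple[float, float]:
    """Return updated (rating_a, rating_b); score_a is 1.0 for a win, 0.5 for a draw, 0.0 for a loss."""
    expected_a = elo_expected(rating_a, rating_b)
    delta = k * (score_a - expected_a)
    return rating_a + delta, rating_b - delta
```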
Links (todo):
- Repo: TBA
- Demo match logs: TBA
Progress
2025-10-06 — Bank exhaustion payouts and turn flow
- Patched the resource distribution routine to honor the official Catan rule when the bank runs out mid-roll.
- Added a guard that detects when available stock can’t satisfy all eligible players and cancels the payout for everyone (except the single-beneficiary edge case, where the remainder is delivered); a sketch follows this entry.
- Embedded inline docs:
    # Enforce bank exhaustion rules: if a resource cannot be fully paid out
    # to all eligible players, nobody receives it (unless only one player
    # would benefit, in which case they take whatever remains).
- Wrote regression coverage around dice rolls that hit depleted resources to ensure the agent doesn’t train on illegal windfalls.
- Added an explicit `END_TURN` action so the policy can advance phases without dirtying dice state; this also fixed a bug where the environment reused resolved dice values across turns.
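A sketch of the bank-exhaustion guard described above; the function and data shapes are illustrative, not the env's actual resource-distribution code:

```python
from collections import Counter

def distribute_resource(bank: Counter, owed: dict[str, int], resource: str) -> dict[str, int]:
    """Pay out one resource for one dice roll under the official bank-exhaustion rule.

    `owed` maps player_id -> number of `resource` cards earned this roll.
    Returns the actual payout per player (empty if the payout is cancelled).
    """
    total_owed = sum(owed.values())
    available = bank[resource]
    beneficiaries = [p for p, n in owed.items() if n > 0]

    if total_owed <= available:
        payout = {p: n for p, n in owed.items() if n > 0}  # bank covers everyone in full
    elif len(beneficiaries) == 1:
        payout = {beneficiaries[0]: available}             # single beneficiary takes the remainder
    else:
        payout = {}                                        # shortage across multiple players: nobody is paid

    for player, n in payout.items():
        bank[resource] -= n
    return payout
```

The regression tests mentioned above can then assert that, for example, a roll owing 3 brick across two players against a 2-card bank pays out nothing, while the same shortfall with a single eligible player pays out exactly 2.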
2025-09-14 — Reverse‑engineering, crawler, event engine, action space
- Reverse‑engineered Colonist.io: identified leaderboard, profile history, and replay JSON APIs; mapped engine enums/payload shapes and codified them as strict types.
- Crawler: cookie-auth click CLI to fetch leaderboards → profiles → replays with per-user and per-replay concurrency; saves under `data/raw/colonist/{1v1,4v4}/<timestamp>`.
- Event engine: typed mapped events and a state-change applier that translates raw `state_changes` into high-level actions; validator replays mapped vs raw to reproduce final boards/winners and writes diffs plus board PNGs.
- Action space: practical starter head in the env, `[op, edge, node]` with 1-based IDs and turn masking; expansion planned for trades and dev-card plays (see the sampling sketch after this list).
- Normalization: v1 records with `initial_state`, `state_changes`, `events_mapped`, `end_game_state`, plus `game`/`players`/`board`/`settings`/`result`/`quality`; optional profile enrichment and ranked-stats caching.
- Next: finalize geometry/indexing maps; structured observations + legal action masks; BC dataset shards; PPO self-play loop.
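A sketch of sampling from the starter `[op, edge, node]` head under turn masks; the vocabulary sizes and the convention that 0 means "slot unused" are assumptions:

```python
import numpy as np

# Illustrative sizes only; the env defines the real op/edge/node vocabularies.
N_OPS, N_EDGES, N_NODES = 8, 72, 54

def sample_action(rng: np.random.Generator,
                  op_mask: np.ndarray,     # shape (N_OPS,),   1 = legal this turn
                  edge_mask: np.ndarray,   # shape (N_EDGES,), 1 = placeable edge
                  node_mask: np.ndarray    # shape (N_NODES,), 1 = placeable node
                  ) -> list[int]:
    """Uniformly sample a flat [op, edge, node] action with 1-based IDs, honoring the masks."""
    def pick(mask: np.ndarray) -> int:
        legal = np.flatnonzero(mask)
        if legal.size == 0:
            return 0                       # assumed convention: 0 = slot not used by this op
        return int(rng.choice(legal)) + 1  # convert 0-based index to 1-based ID
    return [pick(op_mask), pick(edge_mask), pick(node_mask)]
```

A trained policy would replace the uniform `pick` with per-head masked distributions, but the ID and masking bookkeeping stays the same.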
2025-09-06 — Data pipeline milestone
- Crawler: cookie-based CLI with leaderboard→profile→replay flow; supports 1v1 and 4p queues, per-user and per-replay concurrency; mode-specific replay folders (e.g., `1v1/<timestamp>/replays`).
- Normalizer: emits `initial_state`, full `state_changes`, and `final_state`; convenience `board`, `players` (optional ranked-stats enrichment with cache), `settings`, `result`, and compact `events_mapped` (roads, setup placements, bank trades, discards, and dev plays: knight/road_building/year_of_plenty/monopoly). A schema sketch follows this entry.
- Validator + engine: the validator replays raw vs mapped events and compares final board + winner; reports diffs. Engine scaffolding for applying raw/mapped events and snapshots to support validation.
- Docs: AGENTS updated with Crawl & Normalize usage, cookie setup, and output layout.
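A rough sketch of the normalized record as a typed structure; the field names come from the normalizer output above, but all value types are assumptions:

```python
from typing import Any, TypedDict

class NormalizedReplay(TypedDict, total=False):
    """v1-style normalized replay record (field names per the pipeline; types illustrative)."""
    initial_state: dict[str, Any]
    state_changes: list[dict[str, Any]]  # full raw change log
    final_state: dict[str, Any]
    events_mapped: list[dict[str, Any]]  # roads, setup placements, bank trades, discards, dev plays
    board: dict[str, Any]                # convenience board view
    players: list[dict[str, Any]]        # optionally enriched with cached ranked stats
    settings: dict[str, Any]
    result: dict[str, Any]
```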