Catan RL Bot
Started 8/24/2025
Training an RL agent to play Settlers of Catan end-to-end.
- Environment: Custom Catan gym-style environment with legal action masking.
- Algorithms: PPO and A2C baselines; exploring self-play with population-based training.
- State: Board encodings (hex types, numbers, ports), player inventories, dev cards, turn phase.
- Action space: Placement/build/trade/knight/monopoly/year-of-plenty/end-turn, with masking (see the masking sketch after this list).
- Reward shaping: Win/loss, VP deltas, intermediate rewards for road/settlement/city strategies.
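A minimal sketch of how a legal-action mask can be applied at the policy head before sampling; the `action_mask` key, `policy` module, and env API here are assumptions, not the project's actual interfaces:

```python
import torch

def masked_categorical(logits: torch.Tensor, mask: torch.Tensor) -> torch.distributions.Categorical:
    """Push illegal-action logits to the most negative representable value so they get ~0 probability."""
    neg_inf = torch.finfo(logits.dtype).min
    masked_logits = torch.where(mask.bool(), logits, torch.full_like(logits, neg_inf))
    return torch.distributions.Categorical(logits=masked_logits)

# Illustrative rollout step (gym-style API assumed):
# obs, info = env.reset()
# mask = torch.as_tensor(info["action_mask"])                 # 1 = legal, 0 = illegal this turn
# logits = policy(torch.as_tensor(obs, dtype=torch.float32))  # unnormalized action scores
# dist = masked_categorical(logits, mask)
# action = dist.sample()
# obs, reward, terminated, truncated, info = env.step(int(action))
```

The same masked distribution would supply the log-prob and entropy terms in PPO/A2C, so illegal actions never receive gradient credit.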
Planned:
- Curriculum (setup → early game → mid/late).
- Opponent pools and Elo tracking (a rating-update sketch follows this list).
- Heuristic + search (MCTS) hybrid for planning-heavy phases.
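For the planned opponent pool, plain Elo updates after each evaluation match are one simple starting point; a sketch, where the K-factor and the idea of one rating per checkpoint are assumptions:

```python
def elo_expected(rating_a: float, rating_b: float) -> float:
    """Expected score of A vs B under the standard Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))

def elo_update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0) -> tuple[float, float]:
    """Return updated (rating_a, rating_b); score_a is 1.0 for a win, 0.5 for a draw, 0.0 for a loss."""
    expected_a = elo_expected(rating_a, rating_b)
    delta = k * (score_a - expected_a)
    return rating_a + delta, rating_b - delta
```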
Links (todo):
- Repo: TBA
- Demo match logs: TBA
Progress
2025-10-06 — Bank exhaustion payouts and turn flow
- Patched the resource distribution routine to honor the official Catan rule when the bank runs out mid-roll.
- Added a guard that detects when available stock can’t satisfy all eligible players and cancels the payout for everyone (except the single-beneficiary edge case, where the remainder is delivered); a sketch follows this entry.
- Embedded inline docs:
    # Enforce bank exhaustion rules: if a resource cannot be fully paid out
    # to all eligible players, nobody receives it (unless only one player
    # would benefit, in which case they take whatever remains).
- Wrote regression coverage around dice rolls that hit depleted resources to ensure the agent doesn’t train on illegal windfalls.
- Added an explicit `END_TURN` action so the policy can advance phases without dirtying dice state; this also fixed a bug where the environment reused resolved dice values across turns.
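A sketch of the bank-exhaustion guard described above; the function and data shapes are illustrative, not the env's actual resource-distribution code:

```python
from collections import Counter

def distribute_resource(bank: Counter, owed: dict[str, int], resource: str) -> dict[str, int]:
    """Pay out one resource for one dice roll under the official bank-exhaustion rule.

    `owed` maps player_id -> number of `resource` cards earned this roll.
    Returns the actual payout per player (empty if the payout is cancelled).
    """
    total_owed = sum(owed.values())
    available = bank[resource]
    beneficiaries = [p for p, n in owed.items() if n > 0]

    if total_owed <= available:
        payout = {p: n for p, n in owed.items() if n > 0}  # bank covers everyone in full
    elif len(beneficiaries) == 1:
        payout = {beneficiaries[0]: available}             # single beneficiary takes the remainder
    else:
        payout = {}                                        # shortage across multiple players: nobody is paid

    for player, n in payout.items():
        bank[resource] -= n
    return payout
```

The regression tests mentioned above can then assert that, for example, a roll owing 3 brick across two players against a 2-card bank pays out nothing, while the same shortfall with a single eligible player pays out exactly 2.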
2025-09-14 — Reverse‑engineering, crawler, event engine, action space
- Reverse‑engineered Colonist.io: identified leaderboard, profile history, and replay JSON APIs; mapped engine enums/payload shapes and codified them as strict types.
- Crawler: cookie-auth click CLI to fetch leaderboards → profiles → replays with per-user and per-replay concurrency; saves under `data/raw/colonist/{1v1,4v4}/<timestamp>`.
- Event engine: typed mapped events and a state-change applier that translates raw `state_changes` into high-level actions; validator replays mapped vs raw to reproduce final boards/winners and writes diffs plus board PNGs.
- Action space: practical starter head in the env, `[op, edge, node]` with 1-based IDs and turn masking; expansion planned for trades and dev-card plays (see the sampling sketch after this list).
- Normalization: v1 records with `initial_state`, `state_changes`, `events_mapped`, `end_game_state`, plus `game`/`players`/`board`/`settings`/`result`/`quality`; optional profile enrichment and ranked-stats caching.
- Next: finalize geometry/indexing maps; structured observations + legal action masks; BC dataset shards; PPO self-play loop.
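A sketch of sampling from the starter `[op, edge, node]` head under turn masks; the vocabulary sizes and the convention that 0 means "slot unused" are assumptions:

```python
import numpy as np

# Illustrative sizes only; the env defines the real op/edge/node vocabularies.
N_OPS, N_EDGES, N_NODES = 8, 72, 54

def sample_action(rng: np.random.Generator,
                  op_mask: np.ndarray,     # shape (N_OPS,),   1 = legal this turn
                  edge_mask: np.ndarray,   # shape (N_EDGES,), 1 = placeable edge
                  node_mask: np.ndarray    # shape (N_NODES,), 1 = placeable node
                  ) -> list[int]:
    """Uniformly sample a flat [op, edge, node] action with 1-based IDs, honoring the masks."""
    def pick(mask: np.ndarray) -> int:
        legal = np.flatnonzero(mask)
        if legal.size == 0:
            return 0                       # assumed convention: 0 = slot not used by this op
        return int(rng.choice(legal)) + 1  # convert 0-based index to 1-based ID
    return [pick(op_mask), pick(edge_mask), pick(node_mask)]
```

A trained policy would replace the uniform `pick` with per-head masked distributions, but the ID and masking bookkeeping stays the same.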
2025-09-06 — Data pipeline milestone
- Crawler: cookie-based CLI with leaderboard→profile→replay flow; supports 1v1 and 4p queues, per-user and per-replay concurrency; mode-specific replay folders (e.g., `1v1/<timestamp>/replays`).
- Normalizer: emits `initial_state`, full `state_changes`, and `final_state`; convenience `board`, `players` (optional ranked-stats enrichment with cache), `settings`, `result`, and compact `events_mapped` (roads, setup placements, bank trades, discards, and dev plays: knight/road_building/year_of_plenty/monopoly). A schema sketch follows this entry.
- Validator + engine: the validator replays raw vs mapped events and compares final board + winner; reports diffs. Engine scaffolding for applying raw/mapped events and snapshots to support validation.
- Docs: AGENTS updated with Crawl & Normalize usage, cookie setup, and output layout.
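A rough sketch of the normalized record as a typed structure; the field names come from the normalizer output above, but all value types are assumptions:

```python
from typing import Any, TypedDict

class NormalizedReplay(TypedDict, total=False):
    """v1-style normalized replay record (field names per the pipeline; types illustrative)."""
    initial_state: dict[str, Any]
    state_changes: list[dict[str, Any]]  # full raw change log
    final_state: dict[str, Any]
    events_mapped: list[dict[str, Any]]  # roads, setup placements, bank trades, discards, dev plays
    board: dict[str, Any]                # convenience board view
    players: list[dict[str, Any]]        # optionally enriched with cached ranked stats
    settings: dict[str, Any]
    result: dict[str, Any]
```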