Self-Improving Agents
Started 2025-09-07
Designing and iterating on autonomous agents that improve themselves through tight, measurable loops. The goal is to turn agent behaviors, failures, and fixes into a continually growing dataset that raises capability and reliability over time.
Goals:
- Close the loop: generate → evaluate → critique → patch → verify → deploy.
- Capture high-signal traces, failures, and fixes to fuel fine-tuning/RM updates.
- Keep everything measurable: clear success criteria, regression tests, and dashboards.
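The closed loop above can be sketched as a minimal control flow. This is a toy illustration, not the real system: the task here is trivially "make a string match a target" so the loop is runnable end to end, and every name (`evaluate`, `patch`, `improvement_loop`) is a hypothetical placeholder for a real component.

```python
# Toy sketch of generate → evaluate → critique → patch → verify.
# All names are illustrative placeholders, not a real API.

def evaluate(target, artifact):
    """Evaluate step: pass/fail plus a diagnostic (first mismatched index)."""
    if artifact == target:
        return {"passed": True, "mismatch": None}
    i = next((k for k, (a, b) in enumerate(zip(artifact, target)) if a != b),
             min(len(artifact), len(target)))
    return {"passed": False, "mismatch": i}

def patch(target, artifact, mismatch):
    """Patch step: minimal fix at the diagnosed position (stand-in for a model edit)."""
    return artifact[:mismatch] + target[mismatch] + artifact[mismatch + 1:]

def improvement_loop(target, draft, max_iters=10):
    """Iterate evaluate → patch until verification passes or the budget runs out."""
    trace, artifact = [], draft
    for _ in range(max_iters):
        result = evaluate(target, artifact)
        trace.append((artifact, result["passed"]))   # structured trace of each attempt
        if result["passed"]:
            return artifact, trace                   # verified: safe to "deploy"
        artifact = patch(target, artifact, result["mismatch"])
    return None, trace                               # budget exhausted: escalate
```

The trace accumulated per attempt is exactly the kind of artifact the goals call for capturing: each (artifact, outcome) pair is a training or regression candidate.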
Components:
- Orchestrator: task router + run manager with structured traces and artifacts.
- Evaluators: unit/integration tests, adversarial prompts, safety/guardrail checks.
- Memory/Data: append-only store of runs, critiques, diffs, and outcomes; retrieval for planning.
- Skills/Tools: code, shell, web, structured APIs; composable tool-using plans.
- Training: small fine-tunes (SFT/DPO) and reward modeling from curated traces.
- Sandbox: deterministic runners with fixtures for reproducible evaluation.
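The Memory/Data component's append-only store could look like one immutable record per run, serialized as JSON lines so later retrieval and training jobs can stream it. The schema below (`task_id`, `outcome`, `critique`, `diff`) is an assumption for illustration, not a fixed design.

```python
# Hedged sketch of an append-only run store as JSON lines.
# Field names are assumptions; records are written once and never mutated.

import dataclasses
import json
import time
from dataclasses import dataclass, field

@dataclass(frozen=True)  # frozen: records are immutable once created
class RunRecord:
    task_id: str
    outcome: str                      # e.g. "pass", "fail", "error"
    critique: str = ""                # critique text attached to the run
    diff: str = ""                    # PR-style patch produced by the agent
    ts: float = field(default_factory=time.time)

def append_run(path, record):
    """Append-only write: one JSON object per line, no updates or deletes."""
    with open(path, "a") as f:
        f.write(json.dumps(dataclasses.asdict(record)) + "\n")

def load_runs(path):
    """Stream the store back as plain dicts for retrieval or dataset curation."""
    with open(path) as f:
        return [json.loads(line) for line in f]
```

JSON lines keeps the store trivially appendable and greppable, which matters when the same file feeds both planning-time retrieval and offline fine-tuning.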
Planned:
- Seed task suites (coding, data wrangling, web research) with clear metrics.
- Failure harvesting: auto-minimize and save failing traces as regression tests.
- Self-edit flow: draft change proposals (PR-style diff), run eval battery, gate on pass.
- RM-assisted critique to prioritize high-impact fixes and reduce regressions.
- Dashboard: success rates, time-to-fix, and distribution shift alerts.
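The "gate on pass" step of the self-edit flow could be a single function over the eval battery plus the harvested regression tests: any failure (or erroring check) blocks the proposed change. The callables and check names below are placeholders for real runners.

```python
# Sketch of the eval gate for the self-edit flow.
# Checks and names are hypothetical; any failure blocks the proposal.

def gate(proposal, eval_battery, regression_tests):
    """Return (accepted, report); the proposal passes only if every check passes."""
    report = {}
    checks = list(eval_battery.items()) + list(regression_tests.items())
    for name, check in checks:
        try:
            report[name] = bool(check(proposal))
        except Exception:
            report[name] = False   # an erroring check counts as a failure, not a skip
    return all(report.values()), report
```

Treating errors as failures keeps the gate fail-closed, which is the safer default when harvested regression tests may themselves be flaky.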
Progress:
2025-09-07 — Initial scope and plan
- Added project entry and outlined components and milestones.
- Next: scaffold evaluator harness and a minimal seed task set.
Links (todo): Repo + design doc.