About This Project
How it was built and why.
Motivation
Advanced Squad Leader has one of the most complex rulebooks of any board game — over 1 million tokens, densely cross-referenced, with numerous tables, exceptions, and numerical calculations. Looking up rules mid-game is slow and error-prone even for experienced players.
This project explores whether a RAG-based AI assistant can reliably answer rules questions in a domain where precision matters: wrong answers cause incorrect gameplay, and the source document is large enough that naive retrieval frequently misses relevant context.
System Architecture
FastAPI · OpenAI Responses API · WebSocket streaming · PostgreSQL · AI Judge evaluation
The backend is a FastAPI application running on Digital Ocean behind nginx. Chat is delivered over a persistent WebSocket connection, streaming tokens to the browser as they are generated.
Each question goes through the following pipeline:
-
Retrieval — The question is sent to the
OpenAI Responses API
with a
file_searchtool configured against a vector store of the rulebook. Up to 20 chunks are retrieved per query. - Generation — Retrieved chunks are injected into a structured system prompt that instructs the model to cite specific rule section numbers (e.g. A4.34) in every answer. Conversation history is included for multi-turn context.
- Streaming — The response streams token-by-token over WebSocket. The frontend renders markdown progressively and tracks time-to-first-token (TTFT), RAG latency, total response time, token counts, and estimated cost per query.
Rule citations in responses are rendered as clickable links that open an in-browser PDF viewer at the exact rulebook page, using a pre-built section-to-page index.
Evaluation — every chat interaction is logged. Structured eval sets are run against the model and scored by a zero-shot AI Judge (Pass / Fail / Needs Review). Flagged responses are then reviewed manually to confirm or correct the judge's verdict, giving a human-verified accuracy signal across model versions.
Why This Domain Is Challenging
Several properties of the ASL rulebook make it harder than typical RAG use cases:
- Scale — At 1M+ tokens, the rulebook is ~2.5× the length of Deep Learning by Goodfellow et al. Standard context windows cannot hold it.
- Cross-references — Rules routinely reference other rules by section number. A complete answer often requires retrieving from multiple non-adjacent sections.
- Multimodal content — The original document contains hex maps, counter diagrams, and fire tables. Converting to text loses visual context that some rules depend on.
- Precision — Answers that are mostly right but miss an exception or modifier produce incorrect gameplay. Vague answers have no value here.
- Calculations — Many questions require chaining lookups across multiple tables (terrain effects, unit stats, fire columns) that pure retrieval cannot resolve alone.
Evaluation
Model performance is tracked using a structured eval pipeline. Each eval consists of a question and a reference answer. After running a model against the eval set, a zero-shot AI Judge scores each response as Pass, Fail, or Needs Review. Results flagged for review are assessed manually.
Evals are split by question type — Recall (pure rule lookup) and Calc (recall + computation) — since these require different capabilities and tend to show different failure modes.
About Advanced Squad Leader
Advanced Squad Leader is a tactical-level hex-and-counter wargame covering WW2 and the Korean War, published by Avalon Hill in 1985. It is widely regarded as the most comprehensive and detailed squad-level wargame ever published. See the Wikipedia entry.
kevmo.us is not affiliated with Hasbro, Avalon Hill Games, Inc., or Multi-Man Publishing, Inc.
Advanced Squad Leader is a trademark of Avalon Hill Games, Inc.