About This Project

How it was built and why.

Motivation

Advanced Squad Leader has one of the most complex rulebooks of any board game — over 1 million tokens, densely cross-referenced, with numerous tables, exceptions, and numerical calculations. Looking up rules mid-game is slow and error-prone even for experienced players.

This project explores whether a RAG-based AI assistant can reliably answer rules questions in a domain where precision matters: wrong answers cause incorrect gameplay, and the source document is large enough that naive retrieval frequently misses relevant context.


System Architecture

FastAPI · OpenAI Responses API · WebSocket streaming · PostgreSQL · AI Judge evaluation

The backend is a FastAPI application running on Digital Ocean behind nginx. Chat is delivered over a persistent WebSocket connection, streaming tokens to the browser as they are generated.

Each question goes through the following pipeline:

  • Retrieval — The question is sent to the OpenAI Responses API with a file_search tool configured against a vector store of the rulebook. Up to 20 chunks are retrieved per query.
  • Generation — Retrieved chunks are injected into a structured system prompt that instructs the model to cite specific rule section numbers (e.g. A4.34) in every answer. Conversation history is included for multi-turn context.
  • Streaming — The response streams token-by-token over WebSocket. The frontend renders markdown progressively and tracks time-to-first-token (TTFT), RAG latency, total response time, token counts, and estimated cost per query.

Rule citations in responses are rendered as clickable links that open an in-browser PDF viewer at the exact rulebook page, using a pre-built section-to-page index.

Evaluation — every chat interaction is logged. Structured eval sets are run against the model and scored by a zero-shot AI Judge (Pass / Fail / Needs Review). Flagged responses are then reviewed manually to confirm or correct the judge's verdict, giving a human-verified accuracy signal across model versions.


Why This Domain Is Challenging

Several properties of the ASL rulebook make it harder than typical RAG use cases:

  • Scale — At 1M+ tokens, the rulebook is ~2.5× the length of Deep Learning by Goodfellow et al. Standard context windows cannot hold it.
  • Cross-references — Rules routinely reference other rules by section number. A complete answer often requires retrieving from multiple non-adjacent sections.
  • Multimodal content — The original document contains hex maps, counter diagrams, and fire tables. Converting to text loses visual context that some rules depend on.
  • Precision — Answers that are mostly right but miss an exception or modifier produce incorrect gameplay. Vague answers have no value here.
  • Calculations — Many questions require chaining lookups across multiple tables (terrain effects, unit stats, fire columns) that pure retrieval cannot resolve alone.

Evaluation

Model performance is tracked using a structured eval pipeline. Each eval consists of a question and a reference answer. After running a model against the eval set, a zero-shot AI Judge scores each response as Pass, Fail, or Needs Review. Results flagged for review are assessed manually.

Evals are split by question type — Recall (pure rule lookup) and Calc (recall + computation) — since these require different capabilities and tend to show different failure modes.

View current evaluation results →


About Advanced Squad Leader

Advanced Squad Leader is a tactical-level hex-and-counter wargame covering WW2 and the Korean War, published by Avalon Hill in 1985. It is widely regarded as the most comprehensive and detailed squad-level wargame ever published. See the Wikipedia entry.


kevmo.us is not affiliated with Hasbro, Avalon Hill Games, Inc., or Multi-Man Publishing, Inc.

Advanced Squad Leader is a trademark of Avalon Hill Games, Inc.