# How the Cube Doctor actually works
The headline math has calibrated cleanly against the gold-standard reference cubes (MTGO Vintage Cube, Lucky 7's). The inner ring of cube discourse will reasonably press on what's under the hood — which numbers are derived how, what the sim does and doesn't model, and where the role taxonomy hits its limits. This page answers those questions plainly, including the parts that are AI-derived.
## What's AI-derived, what's not
Every number that shows up in a Cube Doctor report flows from one of three sources: Gemini's nightly per-card enrichment, hand-coded heuristics, or pure-CPU simulation. Knowing which is which changes how much weight any single finding deserves.
| Layer | Source | Notes |
|---|---|---|
| Canonical role-ratio targets (Aristocrats 1:1 outlet:payoff, etc.) | Human | Heuristics from cube-design literature (Lucky Paper, MTGGoldfish, Tom LaPille's WotC columns). Hand-edited; not yet empirically tuned. |
| Per-card role classifications (which cards are payoffs / enablers / outlets in each archetype) | AI | Gemini batch API, 95 cards per request. Re-runs nightly on new printings. |
| Per-card power level (the 0–10 used in Power Band, Wheel Predictor) | AI | Gemini batch API, same nightly pass. |
| Per-card mechanical summary, synergy explanations, signature partners | AI | Gemini batch API, same nightly pass. |
| The simulator (draft order, deck construction, DQI computation) | Code | Pure-CPU TypeScript port of Cardivore's on-device Dart sim. No model calls. |
| Drafter policies (balanced / greedy / synergy) | Code | Hand-coded heuristics; see section 3. |
| Format detection, color-balance computation, signpost detection | Code | Pure-CPU analyzers. |
| Doctor's read prose, weakness write-ups, cuts-to-consider list (only when the AI box is checked) | AI | Claude Opus, per run. Deterministic runs skip this entirely. |
The practical implication: a finding like "Aristocrats payoff:enabler is 0.4:1 vs canonical 1:1" depends on both the AI layer and the human layer. The 0.4 comes from Gemini's bucketing of each card in your cube; the 1:1 comes from a hand-edited heuristic in our analyzer. Either layer can be wrong about your specific cube. The triage system — which classifies each finding as actionable, monitor, or tagging artifact before it hits the report — is the third layer that catches cases where the first two disagree on a card the cube genuinely supports. MTGO Vintage Cube returned 14 findings; the triage correctly muted all 14 as tagging artifacts. That's the layer that earns the tool the right to flag anything.
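The two layers compose mechanically. Here's a minimal sketch (not the shipped analyzer) of how one finding combines them; the function name, tolerance, and message format are illustrative assumptions:

```typescript
// AI layer supplies the counts; human layer supplies the canonical target.
type RoleCounts = { payoff: number; enabler: number };

function roleRatioFinding(
  archetype: string,
  counts: RoleCounts,     // AI layer: Gemini's per-card bucketing, rolled up
  canonicalRatio: number, // human layer: hand-edited target, e.g. 1.0 for aristocrats
  tolerance = 0.25,       // assumed: within 25% of canonical → no finding
): string | null {
  const actual = counts.payoff / counts.enabler;
  if (Math.abs(actual - canonicalRatio) / canonicalRatio <= tolerance) return null;
  return `${archetype} payoff:enabler is ${actual.toFixed(1)}:1 vs canonical ${canonicalRatio}:1`;
}

// Worked example from the text: 4 tagged payoffs over 10 tagged enablers.
const finding = roleRatioFinding("Aristocrats", { payoff: 4, enabler: 10 }, 1.0);
// finding → "Aristocrats payoff:enabler is 0.4:1 vs canonical 1:1"
```

Either input being wrong — a miscounted bucket or a mistuned canonical — produces the same-looking finding, which is why the triage layer exists.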
## Where the role-ratio targets come from
Below are the full canonical role-ratio targets the analyzer compares each cube's actual counts against. They're explicit heuristics, not measurements — and they're the half of the equation that's most likely to need tuning as the tool sees more cubes.
| Archetype | Expected ratio | Why |
|---|---|---|
| aristocrats | outlet 1.0 : payoff 1.0 | Starve either side and the engine bricks: outlets without payoffs are card disadvantage, payoffs without outlets sit in hand. |
| ramp | enabler 2.0 : payoff 1.0 | Reviewers consistently flag "lots of mana, nowhere to spend it" or "payoffs but no acceleration." |
| spells-matter | payoff 1.0 : enabler 3.0 | Front-loaded — the deck wants 3 cheap spells per payoff to actually trigger the engine. |
| reanimator | enabler 1.0 : payoff 1.5 : outlet 1.0 | Discard / mill ≈ big creatures, with at least one tutor. |
| tribal | payoff 1.0 : enabler 4.0 | A couple of lord-style payoffs over a deep creature bench. |
| combo | enabler 1.0 : payoff 1.0 : finisher 0.5 | Both halves plus tutors to assemble. |
| control | removal 1.0 : draw 0.5 : finisher 0.3 | Removal-heavy with card advantage and a small finisher count (1–2 closers in a 40-card draft deck). |
| aggro | payoff 1.0 : enabler 0.5 | Pump / anthem payoffs over haste / evasion bodies. |
| tokens | payoff 1.0 : enabler 2.0 | Payoffs (anthems, sacrifice synergies) per token producer. |
| graveyard | outlet 1.0 : payoff 1.5 | Outlets feed; payoffs reward. |
| voltron | payoff 1.0 : enabler 3.0 | One payoff (commander / aura target) plus N protection / equipment. |
| artifacts | payoff 1.0 : enabler 2.0 | Affinity-style payoffs per ~2 cheap-artifact enablers. |
| midrange | removal 1.0 : payoff 2.0 | The "no real ratio" archetype — generic balance check. |
The triage path is what makes findings shippable despite the ratios being heuristic. After the analyzer surfaces a finding, a secondary pass — LLM-driven in AI runs, rule-based in deterministic runs — classifies each one as actionable (the cube probably wants this fixed), monitor (real imbalance, but the sim shows the lane works anyway), or tagging artifact (the finding fires only because of how cards got bucketed, not because the cube has a real gap). The MTGO Vintage Cube result — 14 findings surfaced, 0 actionable, 14 muted as artifacts — is the calibration test that says the triage works.
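For deterministic runs, the rule-based triage pass can be pictured roughly like this. The field names and thresholds below are assumptions for illustration, not the shipped rules:

```typescript
type Triage = "actionable" | "monitor" | "tagging artifact";

interface FindingContext {
  taggingConfidence: number; // 0-1: how well the AI role buckets corroborate each other (assumed field)
  laneWorksInSim: boolean;   // did the sim's drafted decks make this lane function?
}

function triageFinding(ctx: FindingContext): Triage {
  if (ctx.taggingConfidence < 0.5) return "tagging artifact"; // the bucketing fired, not the cube
  if (ctx.laneWorksInSim) return "monitor";                   // real imbalance, but the lane works anyway
  return "actionable";                                        // the cube probably wants this fixed
}
```

The ordering is the point: a finding only reaches "actionable" after surviving both the tagging check and the sim evidence.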
## What the drafter actually does — and what it doesn't
The sim doesn't play games. It drafts and constructs decks; DQI is a synthetic deck-quality score, not win-rate. Knowing what the drafter can't see is more useful than knowing what it can.
### The 70/15/15 mix
Every 8-player draft pod runs a mix of three policies:
- Balanced (70%) — the AdaptivePolicy. Weights both raw card quality and archetype fit; adjusts its mix as the pool develops. The closest the bots get to a competent human drafter doing rate-and-signal reading.
- Greedy (15%) — weights raw card quality and largely ignores synergy. Closes the gap on "what would a new drafter forced to value-pick everything do?"
- Synergy (15%) — heavily weights archetype fit once it commits to a color pair. Closes the gap on "what would a drafter who's already 5 picks into Storm do?"
The mix is a calibration compromise. A 100% balanced sim would underrate sleepers (every bot picks generic value first, so payoff cards wheel and the analyzer thinks they're dead). A 100% synergy sim would underrate generic value cards (every bot forces archetype too early, so Lightning Bolt and Mulldrifter look "weak" because no one picks them). The mix is set to where it is so neither failure mode dominates.
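Seating a pod under that mix is straightforward; here's a sketch. The per-seat random draw is an assumption — the real sim may fix the seat proportions per pod instead:

```typescript
type Policy = "balanced" | "greedy" | "synergy";

// Assign each seat a policy according to the 70/15/15 split.
function seatPod(seats: number, rand: () => number): Policy[] {
  return Array.from({ length: seats }, () => {
    const r = rand();
    if (r < 0.70) return "balanced";
    if (r < 0.85) return "greedy";
    return "synergy";
  });
}

// With a stub RNG, three seats land on one policy each:
const rolls = [0.10, 0.80, 0.90];
let i = 0;
const pod = seatPod(3, () => rolls[i++]);
// pod → ["balanced", "greedy", "synergy"]
```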
### What the drafter doesn't model
- Castability for fragile multi-color. The drafter picks 3+ color cards on raw power; the deck-builder evicts them when the mana base can't support them. Result: cards like Sphinx of the Steel Wind, Currency Converter, Mana Crypt sometimes show "high pick rate, ~0% maindeck" patterns that look like cube traps but may be the sim getting castability wrong. That's exactly why the Trap Detector now ships a ⚠ pill on 3+ color cards with 0% conversion, and why the Traps & Sleepers scatter has a conversion-rate veto that won't label a card with ≥70% maindeck conversion as a Trap or Dead Weight no matter what its DQI looks like.
- Sideboarding. The sim doesn't simulate sideboard plans. Cards that are "great in a sideboard" register as drafted-but-not-maindecked.
- Metagame knowledge. Bots don't know "everyone always picks this first" or "this card is hate against the dominant strategy." They draft each pack from scratch.
- Player skill variance. All 8 seats run the same policy mix. No "good drafter vs bad drafter" modeling.
- Real win-rate. No games get played. DQI is a deck-quality proxy (threat density, answer density, curve, consistency) — well-correlated with win-rate in practice but not the thing itself.
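Since DQI only sees the four components named above, an illustrative proxy looks like this. The equal weighting is invented; the real sim's weights and exact component definitions aren't documented here:

```typescript
interface DeckStats {
  threatDensity: number; // 0-1: share of the deck that pressures the opponent
  answerDensity: number; // 0-1: share that interacts with opposing threats
  curveScore: number;    // 0-1: how well the mana curve matches an ideal
  consistency: number;   // 0-1: color/mana reliability
}

// Equal-weight average — an assumption, not the shipped formula.
function dqiProxy(s: DeckStats): number {
  return (s.threatDensity + s.answerDensity + s.curveScore + s.consistency) / 4;
}

// A deck strong on threats but inconsistent scores in the middle:
const score = dqiProxy({ threatDensity: 0.9, answerDensity: 0.6, curveScore: 0.7, consistency: 0.4 });
// score → 0.65
```

Whatever the real weights are, the structural point holds: nothing in these inputs is a game result, so DQI can only ever be a proxy for win-rate.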
## Is the 6-role taxonomy enough?
Short answer: it's enough for most cubes and not enough for archetypes built on sub-roles the taxonomy flattens.
The role-ratio analyzer buckets cards into six roles per archetype: `outlet`, `payoff`, `enabler`, `finisher`, `removal`, `draw`. That's enough to cleanly describe aggro, midrange, control, ramp, tokens, voltron, generic spells-matter, generic tribal, artifacts, and aristocrats.
### Where it gets coarse
The cases where the taxonomy flattens distinct mechanics into one role label:
- Storm wants cost-reducer + mass-draw + kill-spell as three distinct roles. We flatten cost-reducer and mass-draw into `enabler`, and the kill spell into `finisher`. The analyzer can say "your enabler density is good" but it can't say "you have plenty of cost-reducers but no mass-draw" — even though that's the actual problem on most Storm cubes.
- Aristocrats wants death-trigger-payoff + sac-outlet + token-generator as three slots. We ship `payoff` + `outlet` + `enabler`, which captures the loop coarsely but misses the token-generation angle that most aristocrats cubes are actually fueled by.
- Reanimator wants discard-outlet and self-mill as distinct sub-roles within `outlet` — they're not interchangeable in practice, but the analyzer treats them as one bucket.
If your cube is built around an archetype that needs sub-roles, expect the role-ratio analyzer to be less informative on that lane than on a generic midrange one.
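The flattening described above can be made concrete as a type plus a map. The sub-role names come from the examples in this section; the map itself is illustrative, not shipped code:

```typescript
// The six roles the analyzer actually distinguishes.
type Role = "outlet" | "payoff" | "enabler" | "finisher" | "removal" | "draw";

// What the taxonomy collapses: each sub-role folds into one coarse role,
// so the analyzer can't see density gaps between sub-roles.
const subRoleToRole: Record<string, Role> = {
  "cost-reducer": "enabler",    // Storm
  "mass-draw": "enabler",       // Storm — indistinguishable from cost-reducer today
  "kill-spell": "finisher",     // Storm
  "token-generator": "enabler", // aristocrats — the missed fuel angle
  "discard-outlet": "outlet",   // reanimator
  "self-mill": "outlet",        // reanimator — one bucket, not interchangeable in play
};
```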
### What we do have at finer granularity
Every card in the database carries an `archetypeSupport` map and a `synergyExplanations[]` list from the per-card enrichment. Each entry is an archetype tag plus a one-line why — so Goblin Electromancer has explicit explanations for "spells-matter (cost reduction enables instants / sorceries)" and "control (cantripping a 2/2 body)". When the AI-mode Doctor reads the cube, it sees these per-card explanations and can reference them in prose. The role-ratio analyzer doesn't currently consume them at sub-role granularity. That's the next iteration, not promised on a date.
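An assumed TypeScript shape for the two fields just named — the field names match the text, but the exact types are a sketch:

```typescript
interface SynergyExplanation {
  tag: string; // archetype tag
  why: string; // one-line mechanical reason the narrator can quote
}

interface EnrichedCard {
  name: string;
  archetypeSupport: Record<string, number>; // archetype → support strength (numeric is an assumption)
  synergyExplanations: SynergyExplanation[];
}

// The Goblin Electromancer example from the text (support values invented):
const electromancer: EnrichedCard = {
  name: "Goblin Electromancer",
  archetypeSupport: { "spells-matter": 0.9, control: 0.3 },
  synergyExplanations: [
    { tag: "spells-matter", why: "Cost reduction enables instants / sorceries." },
    { tag: "control", why: "Cantripping a 2/2 body." },
  ],
};
```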
## How the AI was prompted
Section 1 listed which numbers come from Gemini. This section shows what we actually ask it. We're putting the prompts on this methodology page so anyone who wants to assess whether these judgments are shaped by reasonable questions can read them verbatim instead of guessing.
### The framing
Every card in our database is enriched once by the same prompt (and re-run when we ship a new schema version). The prompt opens by framing the analyst's role and what their output is used for:
> You are an expert Magic: The Gathering card analyst specializing in cube and constructed formats.
>
> Analyze this card for deck building and cube construction. Your output is consumed by the Cardivore Cube Doctor — a tool that diagnoses cubes and proposes swaps. Every field you fill will be read by downstream code OR quoted directly by the cube doctor's narrator when explaining recommendations to a human cube builder.
>
> The OUTPUT SHAPE is enforced by the API contract — you don't need to worry about field names, types, enum values, or whether to use empty arrays vs. null. The system handles that. Your job is to fill each field with the SUBSTANTIVE content that makes the analysis useful.

Output shape is enforced via Gemini's structured-output mode against a Pydantic schema — the model can't return malformed JSON or invalid enum values; the API rejects them before they hit our pipeline. The prompt only carries semantic guidance.
### The load-bearing field definitions
Most of the per-card numbers that show up in your report come from these five fields. Each one is asked for with explicit examples and explicit failure modes. Verbatim from the prompt:
**`rawPowerLevel`** (0–10 scale)

> rawPowerLevel — overall impact in a typical cube environment.

**`bandPlacement`** (enum)

> bandPlacement — pick the closest fit: low (peasant/pauper-tier), mid-low, mid (typical cube median), mid-high, high (vintage-adjacent), vintage (restricted/banned-tier in lower formats). This REPLACES the cube doctor's hand-tuned power-ceiling clip — picking the wrong band ships a card into a cube it doesn't belong in.

**`identityStrength`** (0–1)

> identityStrength (0-1) — how strongly this card SIGNALS lane identity to a drafter looking at it in pack 1. Distinct from signpostLevel: a payoff might score high signpostLevel without strongly signaling the deck (utility synergy glue). Spider Spawning would be ~0.95 (obvious lane signal); Counterspell ~0.2 (plays in every Blue deck).

**`comboAnchor`** (boolean)

> comboAnchor — true iff this card has an obvious 2-card combo it's the lynchpin for. Splinter Twin = true; Lightning Bolt = false. Powers the cube doctor's micro-combo scanner directly — false positives WILL get surfaced as cube preservation flags, so be conservative.

**`synergyExplanations`** (top-3 archetype tags, with reasons)

> synergyExplanations — for the TOP 3 synergyTags, a {tag, why} object explaining HOW the card fits the tag in 1 sentence. For Blood Artist: {"tag":"Aristocrats","why":"Triggers on every creature death, draining 1 from the opponent."}. The cube doctor narrator quotes these directly so rationales reference REAL mechanics instead of templated "fits the aristocrats theme" boilerplate. Empty when synergyTags is empty.

**`replacedBy`** (strictly-better card names)

> replacedBy — strictly-better alternatives in the same role. Lightning Strike → ["Lightning Bolt"]. Cancel → ["Counterspell", "Mana Leak", "Negate"]. Empty when this card IS the format reference or has unique modes nothing else matches.

### Why this matters for findings
When a report says "Aristocrats payoff:enabler is 0.4:1," the 0.4 rolls up from per-card `synergyExplanations` plus an `archetypeSupport` map populated by the same prompt above. When the Power Band chart shows a card sitting in the high-power outlier band, that's the `rawPowerLevel` field. When the micro-combo scanner flags a card as a combo piece, that's `comboAnchor`.
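Gathered together, the load-bearing fields form one enrichment record. Field names and enum values below come from the prompt text; the TypeScript types are inferred for illustration, not the real Pydantic schema:

```typescript
type BandPlacement = "low" | "mid-low" | "mid" | "mid-high" | "high" | "vintage";

interface CardEnrichment {
  rawPowerLevel: number;    // 0-10 → Power Band, Wheel Predictor
  bandPlacement: BandPlacement;
  identityStrength: number; // 0-1 → lane-signal analysis
  comboAnchor: boolean;     // → micro-combo scanner
  synergyExplanations: { tag: string; why: string }[]; // → role ratios, narrator quotes
  replacedBy: string[];     // strictly-better card names
}

// Values consistent with the prompt's own examples:
const splinterTwin: Pick<CardEnrichment, "comboAnchor"> = { comboAnchor: true };
const cancel: Pick<CardEnrichment, "replacedBy"> = {
  replacedBy: ["Counterspell", "Mana Leak", "Negate"],
};
```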
Reading the prompts lets you assess whether the questions we're asking Gemini are the right questions to ground the downstream analysis. They're explicit, they ship working examples, and they call out specific failure modes the model needs to avoid. They're not perfect; the per-card enrichment occasionally mislabels cards (we've seen Goblin Electromancer's "spells-matter (cost reduction)" come back as a generic "control" tag, for instance). When that happens the role-ratio finding it drives gets flagged by the triage layer in most cases. When it doesn't get caught there, it ends up in a tagging-artifact bucket on the report — which is the visible failure mode the inner ring can dunk on, and we'd rather you do than not.
## Questions, corrections, gripes
If a ratio reads wrong on your cube, a sim assumption looks off, or the role taxonomy is hiding something important about a specific archetype you care about — we want to hear about it. The Cardivore app has an in-app feedback channel that comes straight to us; bring your reasoning and ideally a cube link we can run against the change.