Methodology

How the Cube Doctor actually works.

The headline math has calibrated cleanly against the gold-standard reference cubes (MTGO Vintage Cube, Lucky 7's). The inner ring of cube discourse will reasonably press on what's under the hood — which numbers are derived how, what the sim does and doesn't model, and where the role taxonomy hits its limits. This page answers those questions plainly, including the parts that are AI-derived.

01 · Sources

What's AI-derived, what's not.

Every number that shows up in a Cube Doctor report flows from one of three sources: Gemini's nightly per-card enrichment, hand-coded heuristics, or pure-CPU simulation. Knowing which is which changes how much weight any single finding deserves.

Layer	Source	Notes
Canonical role-ratio targets (Aristocrats 1:1 outlet:payoff, etc.)	Human	Heuristics from cube-design literature (Lucky Paper, MTGGoldfish, Tom LaPille's WotC columns). Hand-edited; not yet empirically tuned.
Per-card role classifications (which cards are payoffs / enablers / outlets in each archetype)	AI	Gemini batch API, 95 cards per request. Re-runs nightly on new printings.
Per-card power level (the 0–10 used in Power Band, Wheel Predictor)	AI	Gemini batch API, same nightly pass.
Per-card mechanical summary, synergy explanations, signature partners	AI	Gemini batch API, same nightly pass.
The simulator (draft order, deck construction, DQI computation)	Code	Pure-CPU TypeScript port of Cardivore's on-device Dart sim. No model calls.
Drafter policies (balanced / greedy / synergy)	Code	Hand-coded heuristics; see section 3.
Format detection, color-balance computation, signpost detection	Code	Pure-CPU analyzers.
Doctor's read prose, weakness write-ups, cuts-to-consider list (only when the AI box is checked)	AI	Claude Opus, per run. Deterministic runs skip this entirely.

The practical implication: a finding like "Aristocrats payoff:enabler is 0.4:1 vs canonical 1:1" depends on both the AI layer and the human layer. The 0.4 comes from Gemini's bucketing of each card in your cube; the 1:1 comes from a hand-edited heuristic in our analyzer. Either layer can be wrong about your specific cube. The triage system — which classifies each finding as actionable, monitor, or tagging artifact before it hits the report — is the third layer that catches cases where the first two disagree on a card the cube genuinely supports. MTGO Vintage Cube returned 14 findings; the triage correctly muted all 14 as tagging artifacts. That's the layer that earns the tool the right to flag anything.

None of this should be a surprise on a methodology page. Tools that hide their AI dependencies lose credibility the moment someone reads the source. Tools that surface the split honestly let the reader weight each finding by what's actually under it. That's the whole pitch here.

02 · Canonical ratios

Where the role-ratio targets come from.

Below are the full canonical role-ratio targets the analyzer compares each cube's actual counts against. They're explicit heuristics, not measurements — and they're the half of the equation that's most likely to need tuning as the tool sees more cubes.

Archetype	Expected ratio	Why
aristocrats	outlet 1.0 : payoff 1.0	Starve either side and the engine bricks: outlets without payoffs are card disadvantage, payoffs without outlets sit in hand.
ramp	enabler 2.0 : payoff 1.0	Reviewers consistently flag "lots of mana, nowhere to spend it" or "payoffs but no acceleration."
spells-matter	payoff 1.0 : enabler 3.0	Front-loaded — the deck wants 3 cheap spells per payoff to actually trigger the engine.
reanimator	enabler 1.0 : payoff 1.5 : outlet 1.0	Discard / mill ≈ big creatures, with at least one tutor.
tribal	payoff 1.0 : enabler 4.0	A couple of lord-style payoffs over a deep creature bench.
combo	enabler 1.0 : payoff 1.0 : finisher 0.5	Both halves plus tutors to assemble.
control	removal 1.0 : draw 0.5 : finisher 0.3	Removal-heavy with card advantage and a small finisher count (1–2 closers in a 40-card draft deck).
aggro	payoff 1.0 : enabler 0.5	Pump / anthem payoffs over haste / evasion bodies.
tokens	payoff 1.0 : enabler 2.0	Payoffs (anthems, sacrifice synergies) per token producer.
graveyard	outlet 1.0 : payoff 1.5	Outlets feed; payoffs reward.
voltron	payoff 1.0 : enabler 3.0	One payoff (commander / aura target) plus N protection / equipment.
artifacts	payoff 1.0 : enabler 2.0	Affinity-style payoffs per ~2 cheap-artifact enablers.
midrange	removal 1.0 : payoff 2.0	The "no real ratio" archetype — generic balance check.

Honest caveat. These are starter expectations from cube-design writing, not measurements against real cube data. We plan to tune them against a corpus of well-curated published cubes over time. Treat any single ratio finding as a starting hypothesis, not a verdict.

The triage path is what makes findings shippable despite the ratios being heuristic. After the analyzer surfaces a finding, a secondary pass — LLM-driven in AI runs, rule-based in deterministic runs — classifies each one as actionable (the cube probably wants this fixed), monitor (real imbalance, but the sim shows the lane works anyway), or tagging artifact (the finding fires only because of how cards got bucketed, not because the cube has a real gap). The MTGO Vintage Cube result — 14 findings surfaced, 0 actionable, 14 muted as artifacts — is the calibration test that says the triage works.

03 · The sim drafter

What the drafter actually does — and what it doesn't.

The sim doesn't play games. It drafts and constructs decks; DQI is a synthetic deck-quality score, not win-rate. Knowing what the drafter can't see is more useful than knowing what it can.

The 70 / 15 / 15 mix

Every 8-player draft pod runs a mix of three policies:

Balanced (70%) — the AdaptivePolicy. Weights both raw card quality and archetype fit; adjusts its mix as the pool develops. The closest the bots get to a competent human drafter doing rate-and-signal reading.
Greedy (15%) — weights raw card quality and largely ignores synergy. Closes the gap on "what would a new drafter forced to value-pick everything do?"
Synergy (15%) — heavily weights archetype fit once it commits to a color pair. Closes the gap on "what would a drafter who's already 5 picks into Storm do?"

The mix is a calibration compromise. A 100% balanced sim would underrate sleepers (every bot picks generic value first, so payoff cards wheel and the analyzer thinks they're dead). A 100% synergy sim would underrate generic value cards (every bot forces archetype too early, so Lightning Bolt and Mulldrifter look "weak" because no one picks them). The mix is set to where it is so neither failure mode dominates.

What the drafter doesn't model

Castability for fragile multi-color. The drafter picks 3+ color cards on raw power; the deck-builder evicts them when the mana base can't support them. Result: cards like Sphinx of the Steel Wind, Currency Converter, Mana Crypt sometimes show "high pick rate, ~0% maindeck" patterns that look like cube traps but may be the sim getting castability wrong. That's exactly why the Trap Detector now ships a ⚠ pill on 3+ color cards with 0% conversion, and why the Traps & Sleepers scatter has a conversion-rate veto that won't label a card with ≥70% maindeck conversion as a Trap or Dead Weight no matter what its DQI looks like.
Sideboarding. The sim doesn't simulate sideboard plans. Cards that are "great in a sideboard" register as drafted-but-not-maindecked.
Metagame knowledge. Bots don't know "everyone always picks this first" or "this card is hate against the dominant strategy." They draft each pack from scratch.
Player skill variance. All 8 seats run the same policy mix. No "good drafter vs bad drafter" modeling.
Real win-rate. No games get played. DQI is a deck-quality proxy (threat density, answer density, curve, consistency) — well-correlated with win-rate in practice but not the thing itself.

Compression-aware labels. On heavily-iterated cubes (MTGO Vintage Cube, Lucky 7's, Modern Cube) every card is close to every other card on DQI — the spread is inside the documented 0.15 noise floor. When the chart detects this tight compression it suppresses the Premium / Sleeper / Trap / Dead Weight labels and explains why, instead of promoting sim noise to category labels. Wider-distribution cubes still get the normal labeled scatter.

04 · Role taxonomy

Is the 6-role taxonomy enough?

Short answer: it's enough for most cubes and not enough for archetypes built on sub-roles the taxonomy flattens.

The role-ratio analyzer buckets cards into six roles per archetype: outlet, payoff, enabler, finisher, removal, draw. That's enough to cleanly describe aggro, midrange, control, ramp, tokens, voltron, generic spells-matter, generic tribal, artifacts, and aristocrats.

Where it gets coarse

The cases where the taxonomy flattens distinct mechanics into one role label:

Storm wants cost-reducer + mass-draw + kill-spell as three distinct roles. We flatten cost-reducer and mass-draw into enabler, and the kill spell into finisher. The analyzer can say "your enabler density is good" but it can't say "you have plenty of cost-reducers but no mass-draw" — even though that's the actual problem on most Storm cubes.
Aristocrats wants death-trigger-payoff + sac-outlet + token-generator as three slots. We ship payoff + outlet + enabler, which captures the loop coarsely but misses the token-generation angle that most aristocrats cubes are actually fueled by.
Reanimator wants discard-outlet and self-mill as distinct sub-roles within outlet — they're not interchangeable in practice, but the analyzer treats them as one bucket.

If your cube is built around an archetype that needs sub-roles, expect the role-ratio analyzer to be less informative on that lane than on a generic midrange one.

What we do have at finer granularity

Every card in the database carries an archetypeSupport map and a synergyExplanations[] list from the per-card enrichment. Each entry is an archetype-tag plus a one-line why — so Goblin Electromancer has explicit explanations for "spells-matter (cost reduction enables instants / sorceries)" and "control (cantripping a 2/2 body)". When the AI-mode Doctor reads the cube, it sees these per-card explanations and can reference them in prose. The role-ratio analyzer doesn't currently consume them at sub-role granularity. That's the next iteration, not promised on a date.

The honest version. The taxonomy is good enough to ship findings on most cubes and not good enough to ship findings on archetype-specific sub-mechanics. The tool should — and does — flag fewer things on cubes built on those sub-mechanics than on a generic midrange shell. That's a feature; pretending the analyzer can fully characterize Storm would be the actual failure.

05 · The actual prompts

How the AI was prompted.

Section 1 listed which numbers come from Gemini. This section shows what we actually ask it. Putting it on a methodology page so anyone who wants to assess "are these judgments shaped by reasonable questions" can read the prompts verbatim instead of guessing.

The framing

Every card in our database is enriched once by the same prompt (and re-run when we ship a new schema version). The prompt opens by framing the analyst's role and what their output is used for:

You are an expert Magic: The Gathering card analyst specializing
in cube and constructed formats.

Analyze this card for deck building and cube construction. Your
output is consumed by the Cardivore Cube Doctor — a tool that
diagnoses cubes and proposes swaps. Every field you fill will be
read by downstream code OR quoted directly by the cube doctor's
narrator when explaining recommendations to a human cube builder.

The OUTPUT SHAPE is enforced by the API contract — you don't need
to worry about field names, types, enum values, or whether to use
empty arrays vs. null. The system handles that. Your job is to
fill each field with the SUBSTANTIVE content that makes the
analysis useful.

Output shape is enforced via Gemini's structured-output mode against a Pydantic schema — the model can't return malformed JSON or invalid enum values; the API rejects them before they hit our pipeline. The prompt only carries semantic guidance.

The load-bearing field definitions

Most of the per-card numbers that show up in your report come from these five fields. Each one is asked for with explicit examples and explicit failure modes. Verbatim from the prompt:

rawPowerLevel (0–10 scale)

rawPowerLevel — overall impact in a typical cube environment.

bandPlacement (enum)

bandPlacement — pick the closest fit: low (peasant/pauper-tier),
mid-low, mid (typical cube median), mid-high, high (vintage-adjacent),
vintage (restricted/banned-tier in lower formats). This REPLACES the
cube doctor's hand-tuned power-ceiling clip — picking the wrong band
ships a card into a cube it doesn't belong in.

identityStrength (0–1)

identityStrength (0-1) — how strongly this card SIGNALS lane
identity to a drafter looking at it in pack 1. Distinct from
signpostLevel: a payoff might score high signpostLevel without
strongly signaling the deck (utility synergy glue). Spider Spawning
would be ~0.95 (obvious lane signal); Counterspell ~0.2 (plays in
every Blue deck).

comboAnchor (boolean)

comboAnchor — true iff this card has an obvious 2-card combo it's
the lynchpin for. Splinter Twin = true; Lightning Bolt = false.
Powers the cube doctor's micro-combo scanner directly — false
positives WILL get surfaced as cube preservation flags, so be
conservative.

synergyExplanations (top-3 archetype tags, with reasons)

synergyExplanations — for the TOP 3 synergyTags, a {tag, why}
object explaining HOW the card fits the tag in 1 sentence. For
Blood Artist: {"tag":"Aristocrats","why":"Triggers on every
creature death, draining 1 from the opponent."}. The cube doctor
narrator quotes these directly so rationales reference REAL
mechanics instead of templated "fits the aristocrats theme"
boilerplate. Empty when synergyTags is empty.

replacedBy (strictly-better card names)

replacedBy — strictly-better alternatives in the same role.
Lightning Strike → ["Lightning Bolt"]. Cancel → ["Counterspell",
"Mana Leak", "Negate"]. Empty when this card IS the format
reference or has unique modes nothing else matches.

Why this matters for findings

When a report says "Aristocrats payoff:enabler is 0.4:1," the 0.4 rolls up from per-card synergyExplanations + an archetypeSupport map populated by the same prompt above. When the Power Band chart shows a card sitting in the high-power outlier band, that's the rawPowerLevel field. When the micro-combo scanner flags a card as a combo piece, that's comboAnchor.

Reading the prompts lets you assess whether the questions we're asking Gemini are the right questions to ground the downstream analysis. They're explicit, they ship working examples, and they call out specific failure modes the model needs to avoid. They're not perfect; the per-card enrichment occasionally mislabels cards (we've seen Goblin Electromancer's "spells-matter (cost reduction)" come back as a generic "control" tag, for instance). When that happens the role-ratio finding it drives gets flagged by the triage layer in most cases. When it doesn't get caught there, it ends up in a tagging-artifact bucket on the report — which is the visible failure mode the inner ring can dunk on, and we'd rather you do than not.

What we can't show you here. The Doctor's run-time prompts (the ones that produce the cube read / weakness write-ups / cut reasons when the AI box is checked) are a different stack — Claude Opus, agent-loop with tool calls, not single-shot enrichment. Those prompts are longer and more dynamic; we'll cover them in a follow-up methodology entry if there's interest.

Questions, corrections, gripes

If a ratio reads wrong on your cube, a sim assumption looks off, or the role taxonomy is hiding something important about a specific archetype you care about — we want to hear about it. The Cardivore app has an in-app feedback channel that comes straight to us; bring your reasoning and ideally a cube link we can run against the change.

← Back to Cube Doctor