A practical guide to calculating per-call and per-caller margin for AI APIs. Three worked examples covering LLM wrappers, paid-upstream-API wrappers, and compute-only tools. Benchmarks by category, tracking cadence, and when low margin is actually acceptable.
Ask three AI tool developers how much margin their API has and you'll get three different numbers — because they're almost certainly measuring different things. Some report gross margin (revenue minus direct costs). Some report contribution margin (revenue minus variable costs only). Some report net margin (revenue minus all costs including overhead). And some report a "vibes-adjusted" number that includes whatever they feel like including.
This lesson walks through how to calculate margin precisely for an AI API — specifically per-call margin and per-caller margin, the two numbers that actually matter for pricing decisions. We'll work through three complete examples with real Anthropic and Stripe rates, show how to track margin over time, and cover when low margin is acceptable (loss-leader products, discovery-phase launches) versus when it's a red flag.
If you haven't read lesson 4 on the economics of tool calling, start there — it covers the three-layer economics that this lesson's calculations live inside. And lesson 1 on pricing covers why cost floors matter in the first place.
For an AI API, four margin numbers tell different stories. Track all of them to avoid the "my revenue is up!" trap when costs are up faster.
Per-call gross margin: for a single call, (price − direct_variable_costs) / price. Direct variable costs are the line items that scale linearly with each call — LLM inference, upstream API fees, per-call infrastructure, payment processing on that specific call.
This is the most pricing-relevant number. It tells you whether your price is set correctly relative to what each call costs you. A negative number means you're losing money on every call; a single-digit positive number means you're likely unsustainable after overhead; a 40-80% number is typical for healthy AI APIs.
Per-caller margin: per-caller revenue minus per-caller variable costs. This answers: for the total volume this caller generated this month, did we make money?
Per-caller margin can diverge from per-call margin in two ways. Support-heavy callers absorb variable costs beyond their direct calls (your time answering questions). Free-tier-heavy callers may cost you money even on the paid calls they eventually make, because their average cost-per-call includes the unpaid free-tier calls. Track per-caller margin for your top 20% of callers by volume — they usually reveal economic edge cases.
Blended gross margin: total revenue minus total variable costs, divided by total revenue. This is your aggregate health number.
Blended margin should be stable or improving as you scale. If it's dropping while volume grows, you have one of the failure modes from lesson 4 — mix shift, service creep, or the "we'll fix margin later" trap.
Net margin: revenue minus all costs, including overhead, allocation, and founder time. For most solo-developer tools, this is the only margin number that matters for "is this a business" purposes. If blended gross margin is 70% but you spend 30 hours a week on support for $2,000 in revenue, your net margin is probably negative once you value your time at any reasonable rate.
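To make the four definitions concrete, here's a minimal Python sketch; the scenario numbers (including the assumed $50/hr value on founder time) are illustrative, not benchmarks:

```python
# A minimal sketch of the four margin numbers. The example figures
# (and the $50/hr value on founder time) are illustrative assumptions.

def per_call_gross_margin(price: float, variable_cost: float) -> float:
    """(price - direct_variable_costs) / price, for a single call."""
    return (price - variable_cost) / price

def per_caller_margin(revenue: float, variable_costs: float) -> float:
    """For this caller's total volume this month, did we make money?"""
    return (revenue - variable_costs) / revenue

def blended_gross_margin(total_revenue: float, total_variable: float) -> float:
    """Aggregate health number across all callers."""
    return (total_revenue - total_variable) / total_revenue

def net_margin(total_revenue: float, all_costs: float) -> float:
    """Revenue minus ALL costs, including overhead and founder time."""
    return (total_revenue - all_costs) / total_revenue

# The scenario above: $2,000/month revenue at 70% gross margin, but
# 30 hrs/week of support (~130 hrs/month) valued at an assumed $50/hr.
founder_time = 130 * 50.0
print(blended_gross_margin(2000, 600))       # 0.70 -> looks healthy
print(net_margin(2000, 600 + founder_time))  # ~-2.55 -> not a business yet
```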
A realistic first example: a research-synthesis tool that takes a query, retrieves context from an internal RAG index, synthesizes an answer via Claude Sonnet 4.6, and returns structured JSON.
Assumptions: input is ~2K tokens of query + 4K tokens of RAG context = 6K total input; output is ~800 tokens of structured JSON. Per Anthropic API pricing, Sonnet 4.6 is $3/MTok input and $15/MTok output.
| Cost line | Per call |
|---|---|
| LLM input tokens (6K × $3/MTok) | $0.0180 |
| LLM output tokens (0.8K × $15/MTok) | $0.0120 |
| Vector search + RAG retrieval infrastructure | $0.0010 |
| API gateway + request handling | $0.0005 |
| Payment settlement (platform fee) | $0.0015 |
| Total variable cost | $0.0330 |
At a per-call price of $0.10:
($0.10 − $0.0330) / $0.10 = 67%

The RAG-retrieved context is relatively stable across callers researching similar topics. If you add prompt caching and land a 60% cache hit rate on the 4K-token RAG context:

- Cached context reads (2.4K tokens at the $0.30/MTok cache read rate): $0.00072
- Uncached input (3.6K tokens at $3/MTok): $0.0108
- Output tokens (unchanged): $0.0120
- New LLM cost: $0.0235 (vs $0.0300 without caching)

Per-call gross margin moves to ($0.10 − $0.0265) / $0.10 = 73.5%. A 6.5-percentage-point improvement from one implementation change, with no pricing or feature changes.
If 60% of queries don't actually need Sonnet-quality synthesis (they're simple lookups that Haiku can handle at $1/MTok input, $5/MTok output):
- Haiku-routed LLM cost: $0.0050 per call
- Sonnet-routed LLM cost: $0.0235 (with caching)
- Blended LLM cost: 0.6 × $0.0050 + 0.4 × $0.0235 = $0.0124
- New total variable cost: $0.0154

Per-call gross margin now: ($0.10 − $0.0154) / $0.10 = 84.6%. Another 11-point improvement from better routing. Variable cost has more than halved from the naive $0.0330 baseline — same price, same callers.
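Those three variants reduce to a few lines of arithmetic. A sketch reproducing the naive, cached, and routed margins from the rates quoted above (the $0.0050 Haiku per-call figure is used as given in the text):

```python
# Reproducing the three variants with the rates quoted above ($3/$15 per
# MTok for Sonnet, $0.30/MTok cache reads; the $0.0050 Haiku per-call
# figure is taken from the text as given).

MTOK = 1_000_000
SONNET_IN, SONNET_OUT, CACHE_READ = 3.0, 15.0, 0.30
PRICE = 0.10
NON_LLM = 0.0010 + 0.0005 + 0.0015  # retrieval + gateway + settlement

def margin(price: float, cost: float) -> float:
    return (price - cost) / price

# Naive: 6K input, 0.8K output, all on Sonnet
naive_llm = 6_000 * SONNET_IN / MTOK + 800 * SONNET_OUT / MTOK  # $0.0300
print(margin(PRICE, naive_llm + NON_LLM))  # ~0.670

# Prompt caching: 60% hit rate on the 4K-token RAG context
cached_llm = (2_400 * CACHE_READ / MTOK    # cached reads:   $0.00072
              + 3_600 * SONNET_IN / MTOK   # uncached input: $0.0108
              + 800 * SONNET_OUT / MTOK)   # output:         $0.0120
print(margin(PRICE, cached_llm + NON_LLM))  # ~0.735

# Tier routing: 60% of queries to Haiku at ~$0.0050/call LLM cost
routed_llm = 0.6 * 0.0050 + 0.4 * cached_llm  # ~$0.0124
print(margin(PRICE, routed_llm + NON_LLM))  # ~0.846
```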
A different shape: a tool that calls a premium financial-data API ($0.05/call at list price, per the provider's public pricing) and enriches the result with LLM analysis.
| Cost line | Per call |
|---|---|
| Upstream financial-data API | $0.0500 |
| LLM enrichment (1K input + 0.5K output on Haiku) | $0.0035 |
| Infrastructure | $0.0010 |
| Settlement | $0.0025 |
| Total variable cost | $0.0570 |
At $0.12/call:
($0.12 − $0.0570) / $0.12 = 52.5%

This is a healthier-than-it-looks business. The dominant cost is the upstream API — which you could negotiate down at volume. The provider's $0.05 list price typically drops to $0.02-0.03 with $5K+/month commitment contracts. At the negotiated rate:
| Cost line (post-negotiation) | Per call |
|---|---|
| Upstream API (negotiated) | $0.0250 |
| LLM enrichment | $0.0035 |
| Infrastructure | $0.0010 |
| Settlement | $0.0025 |
| Total | $0.0320 |
Per-call gross margin at $0.12: ($0.12 − $0.0320) / $0.12 = 73.3%.
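A quick sketch of how much the negotiation alone moves the number, holding every other cost line from the tables fixed:

```python
# Same margin check at list vs negotiated upstream rates; every other
# cost line is held fixed from the tables above.

def margin(price: float, cost: float) -> float:
    return (price - cost) / price

OTHER = 0.0035 + 0.0010 + 0.0025     # enrichment + infra + settlement
print(margin(0.12, 0.0500 + OTHER))  # 0.525 at the $0.05 list price
print(margin(0.12, 0.0250 + OTHER))  # ~0.733 at the negotiated $0.025
```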
The negotiation is the biggest margin lever on this kind of tool. If you're paying list price to a paid upstream API and your spend is above $2K-5K/month, ask for a volume discount. Most upstream providers negotiate; the worst answer is no. For tools that want to pass through upstream cost rather than absorb it entirely, the platform-choice discussion in the MCP billing comparison matters — different billing layers expose different cost-passthrough mechanics (transparent markup vs flat-fee vs volume-tiered).
A pure-compute tool with no LLM or paid API dependencies: a tool that does DNS lookups and returns structured results.
| Cost line | Per call |
|---|---|
| Compute (serverless function invocation) | $0.00002 |
| DNS resolution (free-tier public resolvers) | $0.00000 |
| Infrastructure (amortized logging, monitoring) | $0.00020 |
| Settlement | $0.00030 |
| Total variable cost | $0.00052 |
At $0.01/call:
($0.01 − $0.00052) / $0.01 = 94.8%

Compute-only tools often look like obvious business wins because their margin is so high. The catch is the fixed-cost structure. The same infrastructure costs that show up as tiny per-call numbers above (log ingestion, monitoring, alerting) have a floor that doesn't scale down. At low volume (say, 5K calls/month = $50 revenue), those fixed costs might eat the entire margin.
Compute-only tools need volume to be economically meaningful. If you can't see a clear path to 50K+ calls per month, the tool may be hobbyist-economics regardless of the per-call margin math.
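The fixed-cost floor is easy to sketch. The $45/month figure below is an assumption standing in for log ingestion, monitoring, and alerting; your stack will differ, but the shape of the math holds:

```python
# The fixed-cost floor, with an assumed $45/month logging/monitoring
# floor (illustrative; your stack will differ).

PRICE, VAR_COST = 0.01, 0.00052
FIXED_MONTHLY = 45.0  # assumed floor that doesn't scale down with volume

def monthly_contribution(calls: int) -> float:
    return calls * (PRICE - VAR_COST) - FIXED_MONTHLY

print(monthly_contribution(5_000))   # ~$2.40: the 94.8% margin mostly eaten
print(monthly_contribution(50_000))  # ~$429: volume makes it a business
```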
What counts as a "healthy" margin depends on the tool category. AI APIs in different categories have structurally different cost profiles, and comparing a compute-only tool's 95% margin against an LLM-wrapper tool's 65% margin as if they were the same category produces wrong conclusions.
These ranges reflect typical unit economics across the MCP ecosystem, aligned with the per-call pricing benchmarks table that the pricing-fundamentals lesson also references. They're not rules — your specific tool may legitimately land outside these ranges — but they're useful reality checks when you're trying to decide if your margin is good or bad.
| Category | Typical gross margin | Why the range |
|---|---|---|
| Compute-only (DNS, simple transforms, algorithmic) | 85-95% | Minimal variable cost; margin is mostly capped by overhead and scale |
| LLM wrapper (Haiku-routed, short outputs) | 75-90% | Haiku's low per-call cost leaves room; caching compounds |
| LLM wrapper (Sonnet-routed, medium outputs) | 60-80% | Inference is meaningful but manageable; tier routing is the swing variable |
| LLM wrapper (Opus-routed, long-context) | 40-65% | Inference cost dominates; premium pricing needed to keep margin viable |
| Paid upstream API wrapper | 45-75% | Upstream cost is the floor; negotiation moves the ceiling |
| Premium data feed (Bloomberg, specialty providers) | 20-45% | Upstream cost is high and contractually rigid; scale is the only margin lever |
| Human-in-the-loop (manual review, curation) | 15-35% | Labor cost scales with volume; automation investments move the ceiling |
Two caveats on this table. First, these are gross margins on variable costs only — net operating margin is lower after overhead, support, and founder time. Second, early in a tool's life, actual margin often runs 10-20 points below the category range because scale hasn't kicked in yet. Use the range as a mature-state target, not a month-one expectation.
If your tool's gross margin is meaningfully below the category range and you've already applied the levers from lesson 4 (tier routing, caching, batching, negotiation), your pricing is probably set too low. That's the most common explanation — and also the easiest to fix. The pricing fundamentals covered in lesson 1 apply directly here.
Two margin concepts that sound similar and aren't. Getting them confused produces wrong pricing decisions.
Unit margin is per-call gross margin — the number we've been calculating throughout this lesson. It answers: for a single call, after paying direct variable costs, how much revenue survives?
Contribution margin is per-caller (or per-cohort) margin after variable costs for that caller, including variable costs that aren't per-call. For example, support time you spend on a specific enterprise caller is a variable cost relative to that caller (more calls ≠ more support, but more callers = more support), yet it's not a per-call cost.
For most solo-developer tools, unit margin and contribution margin are nearly identical because there are few caller-specific variable costs. For enterprise-serving tools with dedicated account management, the two can diverge significantly. A caller with 90% unit margin can have 40% contribution margin once you allocate the support time they absorb.
The practical implication: unit margin is the right number for pricing decisions (is this per-call price correct?), but contribution margin is the right number for caller-retention decisions (is this caller worth keeping at these terms?).
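A small sketch of the divergence, with assumed support-hour figures chosen to reproduce the 90%-unit, 40%-contribution example above:

```python
# Same caller, two margin numbers. Support hours and rates are assumed
# figures chosen to reproduce the 90%-unit / 40%-contribution example.

calls, price, var_cost = 10_000, 0.10, 0.01
support_hours, hourly_rate = 10, 50.0  # caller-specific, not per-call

revenue = calls * price                    # $1,000
unit_margin = (price - var_cost) / price   # per-call: 0.90
contribution = revenue - calls * var_cost - support_hours * hourly_rate
print(unit_margin)             # 0.90 -> the price is fine
print(contribution / revenue)  # 0.40 -> keep the caller? different question
```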
Margin isn't a static number — it drifts. Instrument it so you see drift before it bites.
Per-call gross margin should be a real-time number on your developer dashboard. If today's margin is 3 percentage points below last week's, you should see that before the billing cycle closes. Treating margin as a monthly accounting artifact means you're always reacting to last month's problem this month.
Blended margin is a summary number; segmented margin reveals where problems are. Segment at minimum by: new vs established callers, free-tier vs paid, and top 10% volume vs the rest. Many "blended margin dropped" incidents turn out to be "one specific new enterprise caller on a custom pricing arrangement is mix-shifting the blend" — a different problem than "pricing across the board is off."
Set explicit margin alerts, ideally per-cohort. Example alerts: blended gross margin <60% for two consecutive days, top-volume caller's margin drops below 50%, any new caller's first-week margin is negative. Alerts turn margin tracking from a retrospective exercise into a proactive one.
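As a sketch, those alert rules might look like this, assuming daily per-cohort margins are computed somewhere upstream (the cohort names are illustrative):

```python
# A sketch of the alert rules above, assuming daily per-cohort margins
# are already computed upstream; cohort names here are illustrative.

def margin_alerts(margins: dict[str, list[float]]) -> list[str]:
    alerts = []
    blended = margins.get("blended", [])
    if len(blended) >= 2 and all(m < 0.60 for m in blended[-2:]):
        alerts.append("blended gross margin <60% two consecutive days")
    top = margins.get("top_volume_caller", [])
    if top and top[-1] < 0.50:
        alerts.append("top-volume caller margin below 50%")
    for cohort, daily in margins.items():
        # new-caller cohorts carry their first week of daily margins
        if cohort.startswith("new_caller:") and daily and min(daily[-7:]) < 0:
            alerts.append(f"{cohort}: negative margin in first week")
    return alerts
```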
A useful sanity check: track revenue per call as a multiple of your largest cost input. If inference is your largest cost, plot (revenue per call) / (inference cost per call). That ratio should be at least 3 for a healthy AI wrapper — revenue at 3× the direct inference cost leaves room for settlement, infrastructure, and overhead with reasonable margin. A ratio that drops below 2 is a signal that pricing is too tight for the workload.
When you run pricing experiments, the margin math needs to stay green on both variants. Two specific moves help.
If your cost floor (including settlement fees) is $0.04/call, your experiment framework should refuse to set prices below, say, $0.06 — a price below that loses money regardless of volume outcome. This prevents a late-night "let's try $0.02 to see what happens" from landing a full weekend of negative-margin calls. Bake the floor into the system so a wrong-headed experiment literally can't happen.
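A minimal version of that guard, with validate_price as a hypothetical hook in your experiment framework rather than any real API:

```python
# A minimal floor guard; validate_price is a hypothetical hook, not a
# SettleGrid API. Numbers match the $0.04 floor / $0.06 minimum above.

COST_FLOOR = 0.04   # variable cost per call, settlement included
MIN_MARKUP = 1.5    # refuse anything under 1.5x the floor

def validate_price(price: float) -> float:
    floor = COST_FLOOR * MIN_MARKUP  # $0.06 here
    if price < floor:
        raise ValueError(
            f"price ${price:.2f} is below floor ${floor:.2f}; "
            "this variant loses money at any volume"
        )
    return price

validate_price(0.08)  # fine
# validate_price(0.02) would raise before the weekend experiment ships
```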
A pricing experiment can increase revenue while decreasing contribution margin — the classic "pricing war" trap. Instead of optimizing for revenue, optimize for total contribution margin: (volume × price) − (volume × variable_cost). That captures the real question, which is "did this experiment make me more money after costs?" rather than "did this experiment produce a bigger top-line number?"
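The comparison function is one line; the discipline is using it as the experiment's objective. The volumes and prices below are invented to show the trap:

```python
# Score variants on total contribution, not revenue. Volumes and prices
# are invented: B wins on revenue, A wins on money actually kept.

def total_contribution(volume: int, price: float, var_cost: float) -> float:
    return volume * (price - var_cost)

a = total_contribution(10_000, 0.10, 0.04)  # $600 on $1,000 revenue
b = total_contribution(22_000, 0.05, 0.04)  # $220 on $1,100 revenue
winner = max([("A", a), ("B", b)], key=lambda kv: kv[1])
print(winner)  # ('A', 600.0)
```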
Not every tool needs 70% gross margin from day one. Three legitimate scenarios for accepting lower margin.
If your tool is part of a larger product strategy where the tool itself exists to draw callers into a higher-margin adjacent service, the tool's standalone margin can be lower than you'd otherwise accept. A free or near-free MCP server that drives sign-ups to your paid SaaS can make sense even at 10-20% gross margin on the tool itself.
Caveat: the "loss leader" strategy only works if the upsell path is real and measured. If 90% of loss-leader callers never convert, the loss is real — but the leader part isn't. Track conversion rate, not just adoption.
During the first 90 days of a tool's life, pricing experiments are expected to produce volatile margin. You're learning the market. Operating at thinner margin during this period is sometimes the right call because it prioritizes adoption over per-unit profitability. Set a hard end-date on the discovery phase — "if margin is still under 40% by day 90, we reprice" — to avoid sliding into permanent low-margin operations.
Occasionally an enterprise caller with a strong logo or strategic value will negotiate below your usual margin floor. Those deals can be legitimate business, but they should be (a) explicit exceptions with named strategic rationale, (b) tracked separately from your blended margin, and (c) bounded in volume. A tool that accepts every low-margin enterprise deal because "logos matter" ends up with blended margin nobody understands.
One specific trap in strategic deals is the "design partner" discount. A large prospective customer asks for significant pricing concessions in exchange for being a named design partner — essentially trading margin for marketing. This can be legitimate, but only if the marketing value is measured and the contract has a defined end date. "Design partner" relationships that drift into permanent 50% discounts with no attribution back to new customer acquisition are pure margin loss dressed up as strategy. Set explicit terms up front: what the design partner provides (case study, quote, intro calls to named prospects), what the discount is, and when the discount expires. Most design partners are fine with structured terms once they're written down; the ones who aren't were never going to be good partners in the first place.
One final nuance worth making explicit: margin is a percentage; runway is a number of months. A tool with 80% margin but $50/month in absolute contribution is not a better business than a tool with 40% margin and $5,000/month. Margin matters for efficiency and for asking "can this scale?", but the absolute dollar number matters for "can this pay me?"
Early in a tool's life, the margin percentage gets a lot of founder attention because it's forecasting the future. That's legitimate — a tool with negative margin won't become a business regardless of scale. But once margin is positive and stable, the absolute contribution in dollars matters more than another five percentage points of margin. Spending two weeks to push margin from 65% to 72% makes sense if it unlocks meaningful absolute-dollar gains; it doesn't if the underlying revenue is already modest and the engineering time could instead drive new caller acquisition.
The discipline: check margin monthly, but make product and sales decisions against absolute contribution. A healthy tool business has both — margin that's defensible and absolute numbers that justify continued investment. Optimize for both, not just for margin.
Three common margin obsessions that are usually wrong to prioritize.
Third-decimal-place precision. A per-call margin of 68.2% vs 68.7% doesn't change any decision. Round to whole percentages for dashboards; reserve decimal precision for the underlying cost accounting that feeds the dashboard.
Benchmark-vs-competitor margin. Your competitor's margin is irrelevant unless you know their cost structure — which you don't. Focus on your own margin trend and your own pricing levers. Competitor pricing is a market signal; competitor margin is noise.
Margin on every single call. Some calls will be below margin — free-tier calls, failed-call refunds, early-access partner calls. That's fine if the blended margin is healthy. Measure the aggregate, not the outliers.
Practical ritual: once a month, produce a one-page margin report. Include per-call gross margin by tier, blended gross margin, per-caller margin for top 10 callers, a 12-month trend chart, and a short written note on any anomalies. Stash the report somewhere you'll find it next month so you have month-over-month comparison.
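A sketch of one possible data shape for that report (field names are illustrative, not a SettleGrid API); the whole-percent formatting matches the precision advice above:

```python
# One possible data shape for the monthly report; field names are
# illustrative, not a SettleGrid API.

from dataclasses import dataclass

@dataclass
class MarginReport:
    month: str
    per_call_margin_by_tier: dict[str, float]
    blended_gross_margin: float
    top_caller_margins: dict[str, float]  # top 10 callers by volume
    trailing_12mo_blended: list[float]    # feeds the trend chart
    anomalies_note: str

def render(report: MarginReport) -> str:
    lines = [f"Margin report: {report.month}",
             f"Blended gross margin: {report.blended_gross_margin:.0%}"]
    lines += [f"  tier {tier}: {m:.0%}"
              for tier, m in report.per_call_margin_by_tier.items()]
    lines += [f"  caller {caller}: {m:.0%}"
              for caller, m in report.top_caller_margins.items()]
    lines.append(f"Notes: {report.anomalies_note}")
    return "\n".join(lines)
```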
That habit catches margin drift that dashboards can miss. Dashboards are good at "is today different from last week"; monthly reports are good at "is this quarter different from last quarter." Both views are needed, and the monthly cadence gives you a forum to ask "should we reprice?" on a schedule instead of a whim.
The goal of all of this isn't margin maximization for its own sake. It's clarity — knowing, on any given day, whether each call you serve and each caller you have is a net contributor to the business or a net cost. With that clarity, pricing and product decisions stop being leaps of faith and start being engineering problems with measurable, observable, iteratively-improvable outcomes.
SettleGrid lets you try any pricing model — per-call, tiered, freemium — with two lines of code, and switch without redeploying. Free tier is production-ready: 50K operations/month, 0% take rate on your first $1K/mo of revenue.