The image pipeline — turning one idea into a hundred shipped assets — step 2 of 7
Batch and filter — where the math actually works
The single biggest 2026 unlock in image generation isn't a better model. It's the collapse in per-image cost at the Flash tier. nano-banana at ~$0.039 changed the economics of "how should I get to a usable image" from "spend a lot, generate one, hope" to "spend a little, generate many, filter."
The math, walked through
Suppose you need one shipping-ready image. Three strategies:
Strategy A — One-shot premium
Send one prompt to Midjourney Pro tier or GPT Image 1 (High quality at $0.167/image). Per-image cost: ~$0.10-0.17. Hit rate (image is usable as-is, no broken hands, on-brief): assume 30-40%. So you regenerate, on average, 2-3 times to get a winner.
Real cost per shipping-ready image: ~$0.25-0.50, plus operator time.
Strategy B — Batch and filter
Send the same prompt to nano-banana 10 times. Cost: $0.039 × 10 = $0.39. Each individual image has a lower hit rate (~20%) because nano-banana isn't as polished as the premium models. But you have 10 candidates.
Probability that AT LEAST ONE is shipping-ready:
P(none) = (1 - 0.2) ^ 10 = 0.8 ^ 10 ≈ 0.107
P(at least one) = 1 - 0.107 ≈ 89%
Real cost per shipping-ready image: ~$0.39 with 89% confidence on first batch.
If the first batch produces nothing usable (~11% of the time), you batch again. Two batches get you to 98.8% confidence at ~$0.78 total. That's still cheaper than strategy A's expected cost.
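The arithmetic behind strategies A and B fits in a few lines. The prices and hit rates below are the assumed figures from this section, not measured values:

```python
def expected_cost_one_shot(price, hit_rate):
    """Expected cost of regenerating one-at-a-time until a usable image appears.
    Attempts follow a geometric distribution, so the expected count is 1 / hit_rate."""
    return price / hit_rate

def batch_confidence(hit_rate, batch_size):
    """P(at least one usable image) in a batch of independent generations."""
    return 1 - (1 - hit_rate) ** batch_size

# Strategy A: premium one-shot at ~$0.167/image, ~35% hit rate
print(round(expected_cost_one_shot(0.167, 0.35), 2))  # 0.48

# Strategy B: nano-banana at $0.039/image, 20% hit rate
print(round(batch_confidence(0.2, 10), 3))  # 0.893 — one batch of 10
print(round(batch_confidence(0.2, 20), 3))  # 0.988 — two batches
```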
Strategy C — Self-hosted Flux
Rent an H100 GPU at ~$2/hour. Run Flux.1 [dev] (open weights, non-commercial license — for commercial you'd run Flux.1 [schnell] or have a license for [pro]). A well-tuned setup generates ~1 image/second at 1024x1024. That's 3,600 images/hour at $2/hour, so ~$0.00056 per image.
Below ~50,000 images/month, the API cost wins. Above that, the self-hosted economics win and the gap widens fast.
When strategy B (batch-and-filter) is wrong
A few cases where you DON'T want batch-and-filter:
- Highest possible quality required. Hero shot for a billboard. Flagship product launch image. Use Flux.1.1 [pro] Ultra at $0.06/image and review it yourself; don't try to filter 25 mediocre nano-bananas.
- Specific text required. Ideogram. Generate 3-5, pick the one with the crispest text. The filter step is a text-readability check, not an aesthetic one.
- Tight subject-identity requirement across multiple outputs. Flux Kontext with a reference. Batch is the wrong frame — you want consistent outputs, not varied ones.
- Single one-off where operator time is cheap. Personal project, internal use. The math collapses; just pick whichever model you like and run it once.
The filter step in production
Three ways to filter, in order of how often you'll actually use them:
Vision-model judge (the default)
# pseudocode shape — the real call varies by provider, and the model name is illustrative;
# assumes `brief` is a string and `candidates` is a list of base64-encoded images
import anthropic

client = anthropic.Anthropic()
resp = client.messages.create(
    model="claude-sonnet-4-7",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": f"Brief: {brief}. Rate each on 0-10 for fit."},
            *[{"type": "image",
               "source": {"type": "base64", "media_type": "image/png", "data": img}}
              for img in candidates],
        ],
    }],
)
Cost: ~$0.001-0.005 per candidate evaluated. Speed: ~3-10 seconds for a batch of 10. Quality: very good — the model catches broken hands, off-brief composition, the generic AI face. The 2026 default.
CLIP similarity (the deterministic option)
Embed the brief in CLIP-space, embed each candidate, rank by cosine similarity. Fast and free. The downside: CLIP only knows "does this image match this text," not "is this beautiful" or "does this have broken hands."
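Assuming the brief and candidate embeddings have already been computed (e.g. with open_clip), the ranking step itself is a few lines of numpy. The embeddings below are toy 4-dimensional stand-ins:

```python
import numpy as np

def rank_by_similarity(brief_emb, candidate_embs):
    """Rank candidates by cosine similarity to the brief embedding.
    Returns candidate indices sorted best-first."""
    brief = brief_emb / np.linalg.norm(brief_emb)
    cands = candidate_embs / np.linalg.norm(candidate_embs, axis=1, keepdims=True)
    sims = cands @ brief  # cosine similarity of each candidate to the brief
    return np.argsort(-sims)

# toy example: candidate 1 points closest to the brief direction
brief = np.array([1.0, 0.0, 0.0, 0.0])
cands = np.array([[0.2, 0.9, 0.0, 0.0],
                  [0.9, 0.1, 0.0, 0.0],
                  [0.5, 0.5, 0.0, 0.0]])
print(rank_by_similarity(brief, cands))  # [1 2 0]
```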
Aesthetic scorer
Open-source models like LAION-Aesthetics-V2-Predictor give an aesthetic score 0-10. Combine with CLIP for a composite "matches the brief AND looks good" rank. Useful when you don't want vision-model latency or cost.
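One way to blend the two signals. The 0.6 weight is an arbitrary starting point you'd tune per use case, and the scores below are made-up illustrations:

```python
import numpy as np

def composite_rank(clip_scores, aesthetic_scores, weight=0.6):
    """Weighted blend of brief-fit (CLIP cosine, roughly 0-1) and aesthetic
    score (0-10). Both are min-max normalised first so the weight is meaningful."""
    def norm(x):
        x = np.asarray(x, dtype=float)
        span = x.max() - x.min()
        return (x - x.min()) / span if span else np.zeros_like(x)
    score = weight * norm(clip_scores) + (1 - weight) * norm(aesthetic_scores)
    return np.argsort(-score)  # best-first indices

order = composite_rank([0.31, 0.28, 0.35], [6.2, 7.9, 5.1])
print(order)  # [2 0 1] — best brief-fit wins despite a middling aesthetic score
```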
In practice, most production pipelines use the vision-model judge. The cost is in the noise compared to generation, and the quality of the rank is the bottleneck for shipping confidence.
The batch size question
How many candidates is the right batch size? It's a function of three things:
- Per-image cost. Cheaper model → bigger batches make sense.
- Per-image hit rate. Lower hit rate → bigger batches needed for high confidence.
- Filter cost. The filter step costs money/time too. At some point adding more candidates costs more than it's worth.
The 2026 default heuristic: batch of 10 for nano-banana, 5 for Flux Pro, 3 for premium tiers. Adjust based on the hit rate you observe on your specific use case.
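The heuristic falls out of the confidence math: the smallest n with 1 - (1 - p)^n ≥ target. A sketch, with the hit rates as assumed figures:

```python
import math

def batch_size_for(hit_rate, target=0.9):
    """Smallest batch size giving P(at least one usable image) >= target."""
    return math.ceil(math.log(1 - target) / math.log(1 - hit_rate))

print(batch_size_for(0.20))  # 11 — nano-banana-ish hit rate, 90% target
print(batch_size_for(0.35))  # 6  — premium-ish hit rate, 90% target
```

Note the formula asks for 11, not 10, at a strict 90% target with a 20% hit rate; the batch-of-10 default accepts the ~89% from the earlier walkthrough.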
What the next step asks
You'll see five pipelines, each broken in one specific stage. The drill is identifying which stage broke. Then you'll read about format-and-platform, fill in a dispatch pattern, write the cost function, and plan a full shoot.