The only generative image benchmark that shows the images
12 models, 192 prompts, 6 categories — every output published. Judge with your own eyes which model is best for your use case, your budget, your quality bar.

Text Rendering › Typography Style › Easy
Model: openai/gpt-image-2
Prompt: The word 'CHAPTER ONE' typed on aged paper with a vintage typewriter font, complete with slightly uneven ink
V1 Leaderboard
192 prompts, 6 categories, graded pass/fail by VLM judges.
| # | Model | Pass Rate | Pass / Fail | Avg Latency |
|---|---|---|---|---|
| 1 | openai/gpt-image-2 | 96.4% | 185/7 | 45.3s |
| 2 | fal/google/nano-banana-2 | 95.3% | 183/9 | 28.1s |
| 3 | bfl/flux-2-max | 91.7% | 176/16 | 26.7s |
| 4 | fal/google/nano-banana-pro | 91.1% | 175/17 | 23.4s |
| 5 | bfl/flux-2-pro | 83.3% | 160/32 | 11.8s |
| 6 | bfl/flux-2-klein-9b | 78.6% | 151/41 | 4.1s |
| 7 | gx10/bonsai-image-4b | 76.0% | 146/46 | 4.1s |
| 8 | z-image-local/z-image-turbo | 75.5% | 145/47 | 18.1s |
| 9 | bfl/flux-2-klein-4b | 74.0% | 142/50 | 3.8s |
| 10 | qwen-image-local/qwen-image-gen | 70.8% | 136/56 | 80.2s |
| 11 | nucleus-local/nucleus-image | 67.2% | 129/63 | 39.1s |
| 12 | sana-local/sana-1.5-1.6b | 53.1% | 102/90 | 11.1s |
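The Pass Rate column can be reproduced from the Pass / Fail counts, and the latency column allows a rough quality-versus-speed comparison. The following is a minimal sketch, not the benchmark's own tooling: the rows are copied from the table above, while the helper names `pass_rate` and `pareto_front` are illustrative assumptions. Here a model counts as "dominated" when another model passes at least as many prompts with latency that is no higher, and is strictly better on one of the two.

```python
# Rows copied from the V1 leaderboard:
# (model, published pass rate %, passes, fails, average latency in seconds)
ROWS = [
    ("openai/gpt-image-2", 96.4, 185, 7, 45.3),
    ("fal/google/nano-banana-2", 95.3, 183, 9, 28.1),
    ("bfl/flux-2-max", 91.7, 176, 16, 26.7),
    ("fal/google/nano-banana-pro", 91.1, 175, 17, 23.4),
    ("bfl/flux-2-pro", 83.3, 160, 32, 11.8),
    ("bfl/flux-2-klein-9b", 78.6, 151, 41, 4.1),
    ("gx10/bonsai-image-4b", 76.0, 146, 46, 4.1),
    ("z-image-local/z-image-turbo", 75.5, 145, 47, 18.1),
    ("bfl/flux-2-klein-4b", 74.0, 142, 50, 3.8),
    ("qwen-image-local/qwen-image-gen", 70.8, 136, 56, 80.2),
    ("nucleus-local/nucleus-image", 67.2, 129, 63, 39.1),
    ("sana-local/sana-1.5-1.6b", 53.1, 102, 90, 11.1),
]


def pass_rate(passes, fails):
    """Percentage of prompts passed, rounded to one decimal place."""
    return round(100 * passes / (passes + fails), 1)


def pareto_front(rows):
    """Names of models that no other model beats on both passes and latency.

    Illustrative helper (an assumption, not part of the benchmark).
    """
    front = []
    for name, _, passes, _, latency in rows:
        dominated = any(
            other_passes >= passes
            and other_latency <= latency
            and (other_passes > passes or other_latency < latency)
            for _, _, other_passes, _, other_latency in rows
        )
        if not dominated:
            front.append(name)
    return front


if __name__ == "__main__":
    for name, published, passes, fails, latency in ROWS:
        print(f"{name}: {pass_rate(passes, fails)}% (published {published}%), {latency}s")
    print("Not dominated on pass rate and latency:", pareto_front(ROWS))
```

Under this definition, a slower model stays on the front only if it passes more prompts than every faster one, which is one way to shortlist candidates for a given latency budget before inspecting the published images.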
What we evaluate
Each model is tested across 6 categories with 192 prompts spanning easy to extreme difficulty.
Popular benchmark guides
Start learning
Comprehensive guides on image generation evaluation — from metrics to methodology.
Browse guides
See how every model performs
Compare models side-by-side with our interactive benchmark explorer.
Explore ImageBench V1