Name: ImageBench
Creator: ImageBench
License: https://imagebench.ai/license

Question 1

What is ImageBench?

Accepted Answer

ImageBench is an AI image model benchmark that publishes every generated image, not just aggregate scores. It runs two evaluations: Benchmark V1 (capabilities across 6 categories) and RealBench (photorealism measured by human votes).

Question 2

Which AI image model is the best in 2026?

Accepted Answer

On ImageBench V1, Nano Banana 2 (Google) and GPT Image 2 (OpenAI) share the top spot at 95.3% pass rate. Full rankings, per-category scores, and every generated image are published on the ImageBench leaderboard.

Question 3

Which AI image model is best for text rendering?

Accepted Answer

Flux 2 Max, Nano Banana Pro, and GPT Image 2 all score 100% on the Text Rendering category of Benchmark V1. See the ranking and outputs on our "best AI image generator for text" page.

Question 4

Which AI image models can I run locally?

Accepted Answer

ImageBench benchmarks open-weight models under the local/ prefix — Flux 2 Klein, Sana 1.5, HiDream, Z-Image, Qwen-Image, Krea 2, and more. Krea 2 Turbo currently leads local models at 81.8% pass rate.

Question 5

How does GPT Image 2 compare to Nano Banana 2?

Accepted Answer

Both tie at a 95.3% overall pass rate on Benchmark V1 but differ by category. Every prompt, image, and verdict is on the GPT Image 2 vs Nano Banana 2 comparison page.

Question 6

What's the difference between Benchmark V1 and RealBench?

Accepted Answer

They answer two different questions. Benchmark V1 asks "how good is it?" and answers with pass/fail verdicts on capability tasks graded by VLM judges. RealBench asks "how real does it look?" and measures how often humans mistake AI images for real photos.

Question 7

How does Benchmark V1 evaluate image generation models?

Accepted Answer

Benchmark V1 uses 192 prompts across 6 categories: Text Rendering, Spatial Reasoning, Human Realism, Truthfulness, Professional Studio, and Graphical Design. VLM judges (Qwen 3.5 122B plus per-category specialists) grade each output pass/fail, and the verdicts are verified by human review. Every prompt, image, and verdict is published.
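The pass rates quoted on this page can be read as a simple ratio. The sketch below assumes the overall pass rate is the number of passing verdicts divided by the number of prompts, unweighted across categories; the `Verdict` record and the toy data are hypothetical, not ImageBench's scoring code. Under that assumption, 183 passes out of 192 prompts would give the 95.3% reported for the top models.

```python
from dataclasses import dataclass

# The six Benchmark V1 categories.
CATEGORIES = [
    "Text Rendering",
    "Spatial Reasoning",
    "Human Realism",
    "Truthfulness",
    "Professional Studio",
    "Graphical Design",
]


@dataclass
class Verdict:
    prompt_id: str  # hypothetical prompt identifier
    category: str   # one of CATEGORIES
    passed: bool    # final pass/fail after judging and human review


def pass_rate(verdicts: list[Verdict]) -> float:
    """Fraction of verdicts that passed (assumed: unweighted over prompts)."""
    if not verdicts:
        raise ValueError("no verdicts to score")
    return sum(v.passed for v in verdicts) / len(verdicts)


def pass_rate_by_category(verdicts: list[Verdict]) -> dict[str, float]:
    """Pass rate computed separately for each category."""
    groups: dict[str, list[Verdict]] = {}
    for v in verdicts:
        groups.setdefault(v.category, []).append(v)
    return {cat: pass_rate(vs) for cat, vs in groups.items()}


# Toy data (hypothetical): 192 prompts spread evenly over the categories,
# 183 of them passing.
verdicts = [
    Verdict(f"p{i}", CATEGORIES[i % len(CATEGORIES)], i < 183) for i in range(192)
]
print(f"{pass_rate(verdicts):.1%}")  # 95.3%
```

If the overall figure were instead an average of per-category rates, the two numbers would only coincide when every category has the same number of prompts.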

Question 8

How does RealBench measure realism?

Accepted Answer

RealBench uses votes from a quick in-site game where players guess whether each image is a real photo or AI. A model's realism score is the share of votes that judged its AI images as real.
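The realism score is a ratio of votes. This minimal sketch assumes votes are pooled across all of a model's images rather than averaged per image; with the tallies from the RealBench table on this page, it reproduces the published percentages and ranking. The `RealBenchTally` type is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RealBenchTally:
    model: str
    rated_real: int  # votes that judged one of the model's AI images as a real photo
    votes: int       # total votes cast on the model's images


def realism_score(t: RealBenchTally) -> float:
    """Share of votes that judged the model's AI images as real."""
    if t.votes == 0:
        raise ValueError(f"no votes for {t.model}")
    return t.rated_real / t.votes


# Tallies copied from the RealBench table.
tallies = [
    RealBenchTally("fal/google/nano-banana-pro", 1907, 3609),
    RealBenchTally("fal/bytedance/seedream-v4", 1679, 3539),
    RealBenchTally("openai/gpt-image-2", 1572, 3418),
    RealBenchTally("local/z-image-turbo-6b", 1646, 3603),
    RealBenchTally("bfl/flux-2-max", 1473, 3592),
]

# Rank by realism score, highest first, and print rounded percentages.
for t in sorted(tallies, key=realism_score, reverse=True):
    print(f"{t.model:32} {realism_score(t):.0%}")
```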

Question 9

What metrics does ImageBench.ai use?

Accepted Answer

Benchmark V1 reports pass rates rather than FID or CLIP Score. Verdicts come from VLM judges (Qwen 3.5 122B as the primary judge, with per-category specialists) plus human review. RealBench reports a realism score derived from human votes. See the methodology and calibration blog posts.

Question 10

How is ImageBench.ai different from LM Arena / Artificial Analysis?

Accepted Answer

Arena platforms measure crowd preference via blind votes — great for overall ranking, but they don't tell you why or where models fail. ImageBench runs structured category benchmarks (text, spatial, hands, truthfulness, studio, design) and publishes every image and verdict.

Question 11

How do I compare image generation models visually?

Accepted Answer

Use the ImageBench gallery to inspect the same prompts across models side by side. Filter by category, difficulty, or model and judge the actual generated images — not just the aggregate scores.

Question 12

Does ImageBench.ai test open-source models too?

Accepted Answer

Yes. Open-weight models like Flux 2, Sana, HiDream, Z-Image, and Qwen-Image are evaluated under the same conditions as API-based models. Local models are labeled 'local/' in the leaderboard.
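Because local models share the `local/` prefix, leaderboard identifiers can be split into local and hosted models with a string check. A small sketch, assuming the first path segment of each identifier names the provider or host (`openai`, `fal`, `bfl`, `local`); the helper names are hypothetical.

```python
def provider(model_id: str) -> str:
    """Return the first segment of a leaderboard id, e.g. 'local' or 'fal'."""
    return model_id.split("/", 1)[0]


def is_local(model_id: str) -> bool:
    """Open-weight models are listed under the 'local/' prefix."""
    return provider(model_id) == "local"


# A few identifiers taken from the leaderboard.
leaderboard = [
    "openai/gpt-image-2",
    "fal/google/nano-banana-2",
    "local/boogu-image-turbo",
    "bfl/flux-2-pro",
    "local/qwen-image-2512-20b",
    "local/flux-2-klein-9b",
]

local_models = [m for m in leaderboard if is_local(m)]
print(local_models)
```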

Question 13

How often are benchmarks updated?

Accepted Answer

Whenever a major model is released or updated. The leaderboard reflects the latest available versions; historical results are preserved so you can track progress over time.

Question 14

Can I submit my model for evaluation?

Accepted Answer

Yes. If your model has a public API or downloadable weights, reach out and we'll add it to upcoming benchmark runs. All models are evaluated under the same conditions.

Question 15

Is the methodology open?

Accepted Answer

Yes. Every benchmark publishes its prompt suite, scoring code, and evaluation criteria. Transparent methodology is what separates useful benchmarks from marketing.

Question 16

Who is behind ImageBench.ai?

Accepted Answer

ImageBench.ai is built by Damien Henry — former co-founder of Clipdrop (YC W21, acquired by Stability AI), Google Arts & Culture innovation lead, and current SVP Image Research at Jasper.

Question 17

How can I get in touch?

Accepted Answer

Email hello@imagebench.ai

#	Model	Score	Price	Generation time
1	openai/gpt-image-2	78.5	token-based	45.3s
2	fal/google/nano-banana-2	73.0	$0.08 / image at 1K	28.1s
3	fal/google/nano-banana-pro	66.2	$0.15 / image	23.4s
4	fal/bytedance/seedream-v5-pro	65.6	from $0.0675 / image	147.4s
5	local/boogu-image-turbo	61.9	N/A	9.9s
6	bfl/flux-2-pro	60.1	from $0.03 / image	11.8s
7	bfl/flux-2-max	59.6	from $0.07 / image	26.7s
8	fal/bytedance/seedream-v4	59.4	$0.03 / image	14.1s
9	local/qwen-image-2512-20b	58.6	N/A	80.2s
10	local/flux-2-klein-9b	56.6	N/A	8.5s

#	Model	Realism score	Rated real / votes	Images
1	fal/google/nano-banana-pro	53%	1,907 / 3,609	141
2	fal/bytedance/seedream-v4	47%	1,679 / 3,539	139
3	openai/gpt-image-2	46%	1,572 / 3,418	139
4	local/z-image-turbo-6b	46%	1,646 / 3,603	140
5	bfl/flux-2-max	41%	1,473 / 3,592	141

The only generative image benchmark that shows the images
