Learn
Comprehensive guides on evaluating AI image generation — from automated metrics to human judgment.
01 · 8 min
Introduction to Image Evaluation
Why evaluating image generation matters, and the two sides: automated metrics vs human judgment.
02 · 12 min
Automated Metrics
FID, CLIP Score, LPIPS, VQAScore — what they measure, when to use them, and common pitfalls.
03 · 10 min
Human Evaluation
Elo rankings, pairwise preference, rater calibration, and LLM-as-a-Judge approaches.
04 · 9 min
Comparing Image Models
Quality, speed, cost, consistency — how to compare fairly and find the Pareto frontier.
05 · 11 min
Prompt Fidelity & Compositionality
Does the image match the text? Measuring attribute binding, spatial reasoning, and counting.
06 · 7 min
Consistency & Reproducibility
Same prompt, different outputs — measuring variance and why it matters for production.
07 · 8 min
Common Failures
Bad hands, text rendering, artifacts — why they happen and failure rates by model.
08 · 6 min
Cost, Speed & Deployment
API pricing, latency, throughput — the cost × quality × speed tradeoff.
09 · 10 min
Safety & Bias
NSFW content, demographic bias, IP concerns, red teaming, and the EU AI Act.
10 · 5 min
Glossary
A–Z reference of every term used across the guides.