Authors
George Stein, Jesse Cresswell, Rasa Hosseinzadeh, Yi Sui, Brendan Ross, Valentin Villecroze, Zhaoyan Liu, Anthony L Caterini, Eric Taylor, Gabriel Loaiza-Ganem
Publication date
2024/2/13
Journal
Advances in Neural Information Processing Systems
Volume
36
Description
We systematically study a wide variety of generative models spanning semantically diverse image datasets to understand and improve the feature extractors and metrics used to evaluate them. Using best practices in psychophysics, we measure human perception of image realism for generated samples by conducting the largest experiment evaluating generative models to date, and find that no existing metric strongly correlates with human evaluations. Comparing against 17 modern metrics for evaluating the overall performance, fidelity, diversity, rarity, and memorization of generative models, we find that the state-of-the-art perceptual realism of diffusion models as judged by humans is not reflected in commonly reported metrics such as FID. This discrepancy is not explained by diversity in generated samples, though one cause is over-reliance on Inception-V3. We address these flaws through a study of alternative self …
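The FID metric criticized in the abstract is the Fréchet distance between two Gaussians fitted to feature embeddings of real and generated images. As a rough sketch of what the reported numbers measure (assuming features have already been extracted, e.g. with Inception-V3; the function name and random placeholder features below are illustrative, not from the paper):

```python
import numpy as np
from scipy import linalg

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Fréchet distance between N(mu1, sigma1) and N(mu2, sigma2):
    ||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2 (sigma1 sigma2)^{1/2})."""
    diff = mu1 - mu2
    # Matrix square root of the product of covariances.
    covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)
    if np.iscomplexobj(covmean):  # discard tiny imaginary parts from numerics
        covmean = covmean.real
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))

# Illustrative usage with random stand-ins for real/generated feature sets.
rng = np.random.default_rng(0)
real_feats = rng.normal(size=(1000, 8))
gen_feats = rng.normal(loc=0.5, size=(1000, 8))
fid = frechet_distance(
    real_feats.mean(axis=0), np.cov(real_feats, rowvar=False),
    gen_feats.mean(axis=0), np.cov(gen_feats, rowvar=False),
)
```

The paper's point is that this number depends heavily on the feature extractor: swapping Inception-V3 for a different self-supervised encoder changes the ranking of models, which is why FID can disagree with human realism judgments.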
Title
Exposing flaws of generative model evaluation metrics and their unfair treatment of diffusion models