Authors
George Stein, Jesse Cresswell, Rasa Hosseinzadeh, Yi Sui, Brendan Ross, Valentin Villecroze, Zhaoyan Liu, Anthony L Caterini, Eric Taylor, Gabriel Loaiza-Ganem
Publication date
2024/2/13
Journal
Advances in Neural Information Processing Systems
Volume
36
Description
We systematically study a wide variety of generative models spanning semantically diverse image datasets to understand and improve the feature extractors and metrics used to evaluate them. Using best practices in psychophysics, we measure human perception of image realism for generated samples by conducting the largest experiment evaluating generative models to date, and find that no existing metric strongly correlates with human evaluations. Comparing against 17 modern metrics for evaluating the overall performance, fidelity, diversity, rarity, and memorization of generative models, we find that the state-of-the-art perceptual realism of diffusion models as judged by humans is not reflected in commonly reported metrics such as FID. This discrepancy is not explained by diversity in generated samples, though one cause is over-reliance on Inception-V3. We address these flaws through a study of alternative self …
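The FID metric criticized in the abstract is the Fréchet distance between two Gaussians fitted to feature embeddings of real and generated images. As a rough sketch of what the reported numbers measure (assuming features have already been extracted, e.g. with Inception-V3; the function name and random placeholder features below are illustrative, not from the paper):

```python
import numpy as np
from scipy import linalg

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Fréchet distance between N(mu1, sigma1) and N(mu2, sigma2):
    ||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2 (sigma1 sigma2)^{1/2})."""
    diff = mu1 - mu2
    # Matrix square root of the product of covariances.
    covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)
    if np.iscomplexobj(covmean):  # discard tiny imaginary parts from numerics
        covmean = covmean.real
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))

# Illustrative usage with random stand-ins for real/generated feature sets.
rng = np.random.default_rng(0)
real_feats = rng.normal(size=(1000, 8))
gen_feats = rng.normal(loc=0.5, size=(1000, 8))
fid = frechet_distance(
    real_feats.mean(axis=0), np.cov(real_feats, rowvar=False),
    gen_feats.mean(axis=0), np.cov(gen_feats, rowvar=False),
)
```

The paper's point is that this number depends heavily on the feature extractor: swapping Inception-V3 for a different self-supervised encoder changes the ranking of models, which is why FID can disagree with human realism judgments.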
Title
Exposing flaws of generative model evaluation metrics and their unfair treatment of diffusion models