Authors
Eric Wong, Shibani Santurkar, Aleksander Madry
Publication date
2021/7/1
Conference
International Conference on Machine Learning
Pages
11205-11216
Publisher
PMLR
Description
We show how fitting sparse linear models over learned deep feature representations can lead to more debuggable neural networks. These networks remain highly accurate while also being more amenable to human interpretation, as we demonstrate quantitatively and via human experiments. We further illustrate how the resulting sparse explanations can help to identify spurious correlations, explain misclassifications, and diagnose model biases in vision and language tasks.
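The core recipe described here is to freeze a trained network, treat its penultimate-layer activations as features, and fit a heavily regularized linear classifier on top. Below is a minimal sketch of that idea, assuming a torchvision ResNet-50 as the feature extractor and scikit-learn's L1-penalized logistic regression standing in for the authors' elastic-net solver; the backbone choice, the random stand-in data, and the regularization strength C are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np
import torch
import torchvision.models as models
from sklearn.linear_model import LogisticRegression

# Pretrained network with its final classification layer removed, so the
# forward pass returns the penultimate "deep feature" representation.
backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
backbone.fc = torch.nn.Identity()
backbone.eval()

# Stand-in batch of images and binary labels (replace with a real dataset).
images = torch.randn(64, 3, 224, 224)
labels = np.random.randint(0, 2, size=64)

with torch.no_grad():
    features = backbone(images).numpy()  # shape: (64, 2048)

# Fit a sparse (L1-penalized) linear classifier on the frozen features.
# Smaller C means stronger sparsity, i.e. fewer active features per class.
sparse_head = LogisticRegression(penalty="l1", solver="saga",
                                 C=0.05, max_iter=2000)
sparse_head.fit(features, labels)

# Debuggability comes from inspecting the handful of surviving weights:
# each nonzero coefficient ties a prediction to one deep feature a human
# can then visualize or audit for spurious correlations.
active = np.flatnonzero(sparse_head.coef_[0])
print(f"{active.size} of {features.shape[1]} deep features are used")
```

In this setup the accuracy/interpretability trade-off is controlled entirely by the regularization strength: sweeping C and keeping the sparsest model that retains acceptable accuracy mirrors the regularization-path approach the abstract alludes to.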
Total citations
[Per-year citation histogram, 2021–2024]
Scholar articles
Leveraging Sparse Linear Layers for Debuggable Deep Networks
E Wong, S Santurkar, A Madry - International Conference on Machine Learning, 2021