Authors
Eric Wong, Shibani Santurkar, Aleksander Madry
Publication date
2021/7/1
Conference
International Conference on Machine Learning
Pages
11205-11216
Publisher
PMLR
Description
We show how fitting sparse linear models over learned deep feature representations can lead to more debuggable neural networks. These networks remain highly accurate while also being more amenable to human interpretation, as we demonstrate quantitatively and via human experiments. We further illustrate how the resulting sparse explanations can help to identify spurious correlations, explain misclassifications, and diagnose model biases in vision and language tasks.
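The core recipe described here is to freeze a trained network, treat its penultimate-layer activations as features, and fit a heavily regularized linear classifier on top. Below is a minimal sketch of that idea, assuming a torchvision ResNet-50 as the feature extractor and scikit-learn's L1-penalized logistic regression standing in for the authors' elastic-net solver; the backbone choice, the random stand-in data, and the regularization strength C are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np
import torch
import torchvision.models as models
from sklearn.linear_model import LogisticRegression

# Pretrained network with its final classification layer removed, so the
# forward pass returns the penultimate "deep feature" representation.
backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
backbone.fc = torch.nn.Identity()
backbone.eval()

# Stand-in batch of images and binary labels (replace with a real dataset).
images = torch.randn(64, 3, 224, 224)
labels = np.random.randint(0, 2, size=64)

with torch.no_grad():
    features = backbone(images).numpy()  # shape: (64, 2048)

# Fit a sparse (L1-penalized) linear classifier on the frozen features.
# Smaller C means stronger sparsity, i.e. fewer active features per class.
sparse_head = LogisticRegression(penalty="l1", solver="saga",
                                 C=0.05, max_iter=2000)
sparse_head.fit(features, labels)

# Debuggability comes from inspecting the handful of surviving weights:
# each nonzero coefficient ties a prediction to one deep feature a human
# can then visualize or audit for spurious correlations.
active = np.flatnonzero(sparse_head.coef_[0])
print(f"{active.size} of {features.shape[1]} deep features are used")
```

In this setup the accuracy/interpretability trade-off is controlled entirely by the regularization strength: sweeping C and keeping the sparsest model that retains acceptable accuracy mirrors the regularization-path approach the abstract alludes to.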
Total citations
[Per-year citation histogram, 2021–2024]
Scholar articles
Leveraging Sparse Linear Layers for Debuggable Deep Networks
E Wong, S Santurkar, A Madry - International Conference on Machine Learning, 2021