Authors
Vitaly Feldman
Publication date
2020/6/22
Book
Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing
Pages
954-959
Description
State-of-the-art results on image recognition tasks are achieved using over-parameterized learning algorithms that (nearly) perfectly fit the training set and are known to fit even random labels well. This tendency to memorize seemingly useless training labels is not explained by existing theoretical analyses. Memorization of the training data also presents significant privacy risks when the training data contains sensitive personal information, and it is therefore important to understand whether such memorization is necessary for accurate learning.
We provide a simple conceptual explanation and a theoretical model demonstrating that, for natural data distributions, memorization of labels is necessary for achieving close-to-optimal generalization error. The model is motivated and supported by the results of several recent empirical works. In our model, data is sampled from a mixture of subpopulations, and the frequencies …
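The long-tail mechanism described in the abstract can be illustrated with a short simulation. The Python sketch below is not from the paper: the Zipf-like frequency prior and the sizes N and n are illustrative assumptions. It samples a training set from a long-tailed mixture of subpopulations and counts how many training examples are the sole representative of their subpopulation; under the paper's argument, the labels of such singleton examples must be memorized to approach the optimal generalization error.

    import numpy as np

    rng = np.random.default_rng(0)

    N = 10_000  # number of subpopulations (illustrative assumption)
    n = 10_000  # training-set size (illustrative assumption)

    # Long-tailed frequency prior over subpopulations: Zipf-like
    # weights 1/k, normalized to a probability distribution.
    freqs = 1.0 / np.arange(1, N + 1)
    freqs /= freqs.sum()

    # Draw the subpopulation index of each training example.
    sample = rng.choice(N, size=n, p=freqs)

    # Count subpopulations observed exactly once in the sample;
    # each contributes exactly one "singleton" training example.
    counts = np.bincount(sample, minlength=N)
    singletons = int(np.sum(counts == 1))

    print(f"subpopulations seen exactly once: {singletons}")
    print(f"singleton fraction of training set: {singletons / n:.2%}")

With a Zipf-like tail, a sizable fraction of the training examples typically land in subpopulations observed only once, which is what makes label memorization unavoidable in the model.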
Total citations
[Citations-per-year chart, 2020–2024]