Authors
Thomas G Dietterich
Publication date
1998/10/1
Source
Neural computation
Volume
10
Issue
7
Pages
1895-1923
Publisher
MIT Press
Description
This article reviews five approximate statistical tests for determining whether one learning algorithm outperforms another on a particular learning task. These tests are compared experimentally to determine their probability of incorrectly detecting a difference when no difference exists (type I error). Two widely used statistical tests are shown to have high probability of type I error in certain situations and should never be used: a test for the difference of two proportions and a paired-differences t test based on taking several random train-test splits. A third test, a paired-differences t test based on 10-fold cross-validation, exhibits somewhat elevated probability of type I error. A fourth test, McNemar's test, is shown to have low type I error. The fifth test is a new test, 5 × 2 cv, based on five iterations of twofold cross-validation. Experiments show that this test also has acceptable type I error. The article also measures the …
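The 5 × 2 cv statistic described above can be sketched as follows. This is a minimal illustration, assuming the per-fold error-rate differences between the two algorithms have already been computed; the function name and the synthetic difference values are illustrative, not from the article.

```python
import math

def five_by_two_cv_t(diffs):
    """Compute the 5x2cv paired t statistic.

    diffs: five (p1, p2) pairs, where p1 and p2 are the error-rate
    differences (algorithm A minus algorithm B) on the two folds of
    each of the five twofold cross-validation replications.
    """
    s2 = []
    for p1, p2 in diffs:
        pbar = (p1 + p2) / 2.0                      # mean difference in this replication
        s2.append((p1 - pbar) ** 2 + (p2 - pbar) ** 2)  # variance estimate
    denom = math.sqrt(sum(s2) / 5.0)
    # Numerator is the difference from the first fold of the first
    # replication; the statistic is compared against a t distribution
    # with 5 degrees of freedom.
    return diffs[0][0] / denom

# Illustrative synthetic differences (not real experimental results).
diffs = [(0.02, 0.03), (0.01, 0.04), (0.03, 0.02), (0.02, 0.02), (0.04, 0.01)]
t_stat = five_by_two_cv_t(diffs)
```

Under the null hypothesis of no difference between the algorithms, a |t| exceeding the two-sided critical value of the t distribution with 5 degrees of freedom (about 2.571 at the 0.05 level) would indicate a significant difference.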
Total citations
[Per-year citation histogram, 1998–2024 (chart residue; exact yearly counts not recoverable)]