Authors
André Rodrigues Olivera, Valter Roesler, Cirano Iochpe, Maria Inês Schmidt, Álvaro Vigo, Sandhi Maria Barreto, Bruce Bartholow Duncan
Publication date
2017
Journal
Sao Paulo Medical Journal
Volume
135
Issue
03
Pages
234-246
Publisher
Associação Paulista de Medicina-APM
Description
CONTEXT AND OBJECTIVE
Type 2 diabetes is a chronic disease associated with a wide range of serious complications that have a major impact on overall health. The aims here were to develop and validate predictive models for detecting undiagnosed diabetes using data from the Brazilian Longitudinal Study of Adult Health (ELSA-Brasil) and to compare the performance of different machine-learning algorithms in this task.
DESIGN AND SETTING
Comparison of machine-learning algorithms to develop predictive models using data from ELSA-Brasil.
METHODS
After a subset of 27 candidate variables had been selected from the literature, models were built and validated in four sequential steps: (i) parameter tuning with tenfold cross-validation, repeated three times; (ii) automatic variable selection using forward selection, a wrapper strategy in which each candidate subset of variables was evaluated with four different machine-learning algorithms and tenfold cross-validation repeated three times; (iii) error estimation of model parameters with tenfold cross-validation, repeated ten times; and (iv) generalization testing on an independent dataset. The models were created with the following machine-learning algorithms: logistic regression, artificial neural network, naïve Bayes, K-nearest neighbor and random forest.
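Two of the validation ingredients named above, repeated tenfold cross-validation and forward selection as a wrapper, can be illustrated with a minimal sketch. This is not the authors' code; the function names `repeated_kfold` and `forward_selection`, the `evaluate` callback and the toy variable names are hypothetical. In the setting described, `evaluate` would train one of the machine-learning algorithms on each cross-validation split and return the mean AUC for the given subset of variables.

```python
import random
from typing import Callable, Iterator, Sequence


def repeated_kfold(n: int, k: int = 10, repeats: int = 3,
                   seed: int = 0) -> Iterator[tuple[list[int], list[int]]]:
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation,
    repeated `repeats` times with a fresh shuffle before each repetition.
    Hypothetical helper; assumes n >= k."""
    rng = random.Random(seed)
    indices = list(range(n))
    for _ in range(repeats):
        rng.shuffle(indices)
        # Deal the shuffled row indices into k folds of near-equal size.
        folds = [indices[i::k] for i in range(k)]
        for fold in folds:
            test = sorted(fold)
            test_set = set(test)
            train = [j for j in range(n) if j not in test_set]
            yield train, test


def forward_selection(candidates: Sequence[str],
                      evaluate: Callable[[list[str]], float]
                      ) -> tuple[list[str], float]:
    """Greedy wrapper: at each step add the candidate variable whose
    inclusion gives the highest evaluate(subset) score; stop as soon as
    no remaining variable improves on the current best score."""
    selected: list[str] = []
    remaining = list(candidates)
    best_score = float("-inf")
    while remaining:
        scores = {v: evaluate(selected + [v]) for v in remaining}
        # On ties, max() keeps the first candidate in the original order.
        var = max(remaining, key=lambda v: scores[v])
        if scores[var] <= best_score:
            break
        selected.append(var)
        remaining.remove(var)
        best_score = scores[var]
    return selected, best_score


if __name__ == "__main__":
    # Toy additive score standing in for a cross-validated AUC (hypothetical).
    weights = {"age": 0.30, "waist": 0.25, "bmi": 0.02, "noise": -0.05}
    subset, score = forward_selection(list(weights),
                                      lambda s: sum(weights[v] for v in s))
    print(subset, round(score, 2))  # ['age', 'waist', 'bmi'] 0.57
    print(len(list(repeated_kfold(100, k=10, repeats=3))))  # 30 splits
```

With three repetitions of tenfold cross-validation, each candidate subset is scored on 30 train/test splits; the wrapper stops adding variables once the score no longer improves.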
RESULTS
The best models were created using artificial neural networks and logistic regression. These achieved mean areas under the ROC curve (AUC) of 75.24% and 74.98%, respectively, in the error estimation step and of 74.17% and 74.41%, respectively, in the generalization testing step …
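The results are reported as mean AUC values. As a minimal sketch, assuming the usual Mann-Whitney formulation, the AUC is the probability that a randomly chosen person with undiagnosed diabetes receives a higher model score than a randomly chosen person without it, with ties counted as one half; a mean AUC then averages this over the cross-validation folds. The helper names `auc` and `mean_auc` are hypothetical, and the fold data in the example are invented for illustration only.

```python
from statistics import mean
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via pairwise comparison: the share of (positive, negative)
    pairs in which the positive case (label 1) outscores the negative case
    (label 0); ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_auc(folds: Sequence[tuple[Sequence[int], Sequence[float]]]) -> float:
    """Average the AUC over (labels, scores) pairs, one pair per test fold."""
    return mean(auc(labels, scores) for labels, scores in folds)


if __name__ == "__main__":
    print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
    print(mean_auc([([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]),
                    ([0, 1], [0.2, 0.9])]))  # 0.875
```

Expressed as a percentage, an AUC of 0.75 corresponds to the 75% scale used in the results above. The pairwise loop is quadratic in the number of cases; a rank-based computation gives the same value more efficiently for large cohorts.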
Total citations
[Chart: citations per year, 2017–2024]