

A Study of K-Nearest Neighbour as an Imputation Method

Authors

Gustavo E. A. P. A. Batista, Maria Carolina Monard

Publication date

2002/12/30

Conference

HIS

Pages

251-260

Description

Data quality is a major concern in Machine Learning and other related areas such as Knowledge Discovery from Databases (KDD). As most Machine Learning algorithms induce knowledge strictly from data, the quality of the knowledge extracted is largely determined by the quality of the underlying data. One relevant problem in data quality is the presence of missing data. Despite the frequent occurrence of missing data, many Machine Learning algorithms handle missing data in a rather naive way. Missing data treatment should be carefully thought out; otherwise, bias might be introduced into the knowledge induced. In this work, we analyse the use of the k-nearest neighbour algorithm as an imputation method. Imputation is a term that denotes a procedure that replaces the missing values in a data set with plausible values. Our analysis indicates that missing data imputation based on the k-nearest neighbour algorithm can outperform the internal methods used by C4.5 and CN2 to treat missing data.
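To make the idea of k-nearest neighbour imputation concrete, here is a minimal sketch in Python. It is not the authors' implementation: it assumes purely numeric attributes, marks missing values as `None`, measures distance with plain (unscaled) Euclidean distance over the attributes observed in both rows, and fills each gap with the mean of that attribute over the k nearest rows that observe it. The function names `distance` and `knn_impute` are hypothetical. For nominal attributes, a common variant takes the most frequent value among the neighbours instead of the mean.

```python
import math
from typing import List, Optional

# A row is a list of attribute values; None marks a missing value.
Row = List[Optional[float]]


def distance(a: Row, b: Row) -> float:
    """Euclidean distance over attributes observed (not None) in both rows.

    Assumption for this sketch: no attribute rescaling is applied.
    """
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf  # no common attributes: treat as maximally distant
    return math.sqrt(sum((x - y) ** 2 for x, y in shared))


def knn_impute(data: List[Row], k: int = 3) -> List[Row]:
    """Return a copy of data with each missing value replaced by the mean
    of that attribute over the k nearest rows that observe it."""
    result = [list(row) for row in data]  # leave the input untouched
    for i, row in enumerate(data):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: every other row that has attribute j observed.
            donors = [r for idx, r in enumerate(data) if idx != i and r[j] is not None]
            # Distances use the original data, so earlier imputations never feed back.
            donors.sort(key=lambda r: distance(row, r))
            nearest = donors[:k]
            if nearest:
                result[i][j] = sum(r[j] for r in nearest) / len(nearest)
    return result


if __name__ == "__main__":
    data = [
        [1.0, 2.0, 3.0],
        [1.0, 2.5, None],
        [1.0, 3.0, 4.0],
        [9.0, 9.0, 9.0],
    ]
    print(knn_impute(data, k=2)[1][2])  # 3.5, the mean of 3.0 and 4.0
```

With `k=2`, the two rows closest to the incomplete one (each at distance 0.5) donate 3.0 and 4.0, so the gap is filled with 3.5, while the distant fourth row is ignored. Computing distances only over jointly observed attributes keeps the sketch usable when several values in a row are missing.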

Total citations

Cited by 537

Citations per year, 2003 to 2024:

Year   Citations
2003   2
2004   2
2005   2
2006   5
2007   2
2008   5
2009   5
2010   4
2011   3
2012   5
2013   5
2014   17
2015   25
2016   34
2017   35
2018   51
2019   40
2020   57
2021   64
2022   68
2023   71
2024   33

Scholar articles

A study of K-nearest neighbour as an imputation method.

GE Batista, MC Monard - HIS, 2002

A Study of K-Nearest Neighbour as an Imputation Method, Vol. 30*

G Batista, MC Monard - 2002

Cited by 3