Authors
Clifton Phua, Vincent Lee, K Smith-Miles
Publication date
2006
Journal
Encyclopedia of Data Warehousing and Mining
Volume
24
Description
The personal name problem is the situation where the authenticity, ordering, gender, and other information cannot be determined correctly and automatically for every incoming personal name. A novel solution, tested on scoring data, is to mine a comprehensive external name dictionary with a set of chosen techniques: exact matching, phonetics (extended Soundex), SimMetrics (Levenshtein), and classifiers (naïve Bayes algorithm). The main contributions of this paper are the evaluation of, and selection from, five very different approaches, and the empirical comparison of multiple phonetic and string similarity techniques for the personal name problem. Other contributions include relating personal name mining to credit application fraud detection and other security systems, and making the labelled data and techniques available for future studies. In reality, there is no silver bullet for this problem, but it can be alleviated with appropriate techniques applied to sufficient name data.
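To make the combination of techniques named above more concrete, here is a minimal Python sketch of a dictionary lookup that tries exact matching first, then a phonetic code, then edit distance. It is an illustration only, not the authors' implementation: it uses standard American Soundex rather than the extended Soundex the description mentions, it leaves out the naïve Bayes classifier, and the cascade order, the `match_name` function, the `max_edits` threshold, and the tiny sample dictionary are all assumptions made for this example.

```python
def soundex(name: str) -> str:
    """Standard American Soundex code (assumption: not the paper's extended variant)."""
    codes = {}
    for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                           ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in letters:
            codes[ch] = digit
    letters_only = [c for c in name.upper() if c.isalpha()]
    if not letters_only:
        return ""
    result = letters_only[0]
    prev = codes.get(letters_only[0], "")
    for ch in letters_only[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = codes.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code after it is kept
    return (result + "000")[:4]


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]


def match_name(name, dictionary, max_edits=1):
    """Hypothetical cascade: exact, then phonetic, then edit distance.

    Returns (method, matched_entry, label), or ("none", None, None).
    """
    key = name.strip().lower()
    if key in dictionary:
        return ("exact", key, dictionary[key])
    code = soundex(key)
    for entry, label in dictionary.items():
        if soundex(entry) == code:
            return ("phonetic", entry, label)
    best = min(dictionary, key=lambda e: levenshtein(key, e), default=None)
    if best is not None and levenshtein(key, best) <= max_edits:
        return ("edit", best, dictionary[best])
    return ("none", None, None)


# Hypothetical name dictionary mapping a given name to a gender label.
NAMES = {"catherine": "F", "stephen": "M", "robert": "M"}

if __name__ == "__main__":
    for query in ("Catherine", "Steven", "Katherine", "Xyzzy"):
        print(query, match_name(query, NAMES))
```

In this sketch "Steven" reaches "stephen" through the phonetic step, while "Katherine" falls through to edit distance because Soundex keeps the first letter and so assigns it a different code from "catherine". A real name dictionary would be far larger, so the linear scans here would likely be replaced by an index on the phonetic code.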
Total citations
[Citations-per-year chart, 2007 to 2024; per-year counts not recoverable from the extracted text]
Scholar articles
C Phua, V Lee, K Smith-Miles - Encyclopedia of Data Warehousing and Mining, 2006