Authors
Tommi Vatanen, Maria Osmala, Tapani Raiko, Krista Lagus, Marko Sysi-Aho, Matej Orešič, Timo Honkela, Harri Lähdesmäki
Publication date
2015/1/5
Journal
Neurocomputing
Volume
147
Pages
60-70
Publisher
Elsevier
Description
In this paper, we study fundamental properties of the Self-Organizing Map (SOM) and the Generative Topographic Mapping (GTM), the ramifications of how the algorithms are initialized, and their behavior in the presence of missing data. We show that the commonly used principal component analysis (PCA) initialization of the GTM does not guarantee good learning results with high-dimensional data. Initializing the GTM with the SOM is shown to improve self-organization on three high-dimensional data sets: the commonly used MNIST and ISOLET data sets and the epigenomic ENCODE data set. We also propose a revision to how the batch SOM algorithm handles missing data, called the Imputation SOM, and show that the new algorithm is more robust in the presence of missing data. We benchmark the performance of the topographic mappings in the missing value imputation task and conclude that …
Total citations
[Chart: citations per year, 2015 to 2024]
Scholar articles
T Vatanen, M Osmala, T Raiko, K Lagus, M Sysi-Aho… - Neurocomputing, 2015