Authors
Georgios Chatzigeorgakidis, Sophia Karagiorgou, Spiros Athanasiou, Spiros Skiadopoulos
Publication date
2018/12
Journal
Journal of Big Data
Volume
5
Pages
1-27
Publisher
Springer International Publishing
Description
Efficient management and analysis of large volumes of data is a demanding task of increasing scientific and industrial importance, as the ubiquitous generation of information governs more and more aspects of human life. In this article, we introduce FML-kNN, a novel distributed processing framework for Big Data, implemented in Apache Flink, that performs probabilistic classification and regression. The framework's core consists of a k-nearest neighbor (kNN) join algorithm which, unlike similar approaches, is executed in a single distributed session and can operate on very large volumes of data of variable granularity and dimensionality. We assess FML-kNN's performance and scalability in a detailed experimental evaluation, in which it is compared to similar methods implemented in the Apache Hadoop, Spark, and Flink distributed processing engines. The results indicate an overall superiority of our …
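The following sketch illustrates what a kNN join computes and how its output can feed probabilistic classification and regression. It is a minimal, single-machine, brute-force version. It is not FML-kNN's distributed Flink implementation, and the abstract does not describe that system's partitioning or approximation scheme. The function names (`knn_join`, `classify_proba`, `regress`) are hypothetical. Defining a class probability as the share of the k neighbours carrying that label, and a regression prediction as the neighbours' mean target, is an illustrative assumption, not the paper's stated definition.

```python
import math
from collections import Counter

# Hypothetical helpers for illustration only; not the FML-kNN API.
# Each training record is a (point, value) pair, where point is a tuple of floats
# and value is a class label (classification) or a number (regression).

def knn_join(queries, train, k):
    """Brute-force kNN join: for every query point, return the indices of its
    k nearest training points by Euclidean distance (ties broken by index)."""
    result = []
    for q in queries:
        dists = sorted((math.dist(q, p), i) for i, (p, _) in enumerate(train))
        result.append([i for _, i in dists[:k]])
    return result

def classify_proba(queries, train, k):
    """Probabilistic classification (assumed scheme): the probability of a label
    is the fraction of the query's neighbours that carry it."""
    out = []
    for neighbours in knn_join(queries, train, k):
        counts = Counter(train[i][1] for i in neighbours)
        out.append({label: c / len(neighbours) for label, c in counts.items()})
    return out

def regress(queries, train, k):
    """Regression (assumed scheme): predict the mean target of the k neighbours."""
    return [sum(train[i][1] for i in nb) / len(nb) for nb in knn_join(queries, train, k)]

if __name__ == "__main__":
    train_c = [((0.0, 0.0), "a"), ((0.1, 0.0), "a"), ((1.0, 1.0), "b"), ((1.1, 1.0), "b")]
    print(classify_proba([(0.0, 0.1)], train_c, 3))  # two "a" neighbours, one "b"
    train_r = [((0.0,), 1.0), ((1.0,), 3.0), ((5.0,), 10.0)]
    print(regress([(0.4,)], train_r, 2))  # mean of 1.0 and 3.0
```

Computing every query-to-point distance costs O(|queries| × |train|). This cost is what motivates distributed kNN join frameworks, which split the work across a cluster instead of running a single nested loop.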
Total citations
Citations per year, 2019 to 2024 (chart not recoverable)