Authors
Artem Trofimov, Mikhail Shavkunov, Sergey Reznick, Nikita Sokolov, Mikhail Yutman, Igor E Kuralenok, Boris Novikov
Publication date
2019/6/24
Book
Proceedings of the 13th ACM International Conference on Distributed and Event-based Systems
Pages
264-265
Description
Large-scale classification of text streams is an essential problem that is hard to solve. Batch processing systems are scalable and have proved effective for machine learning, but they do not provide low latency. State-of-the-art distributed stream processing systems, on the other hand, can achieve low latency but do not support the same level of fault tolerance and determinism. In this work, we discuss how the distributed streaming computational model and fault tolerance mechanisms can affect the correctness of a text classification data flow. We also propose solutions that mitigate the pitfalls we identify.