Authors
Paolo Boldi, Bruno Codenotti, Massimo Santini, Sebastiano Vigna
Publication date
2004/7/10
Journal
Software: Practice and Experience
Volume
34
Issue
8
Pages
711-726
Publisher
John Wiley & Sons, Ltd.
Description
We report our experience in implementing UbiCrawler, a scalable distributed Web crawler, using the Java programming language. The main features of UbiCrawler are platform independence, linear scalability, graceful degradation in the presence of faults, a very effective assignment function (based on consistent hashing) for partitioning the domain to crawl, and, more generally, the complete decentralization of every task. The necessity of handling very large sets of data has highlighted some limitations of the Java APIs, which prompted the authors to partially reimplement them. Copyright © 2004 John Wiley & Sons, Ltd.
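The description mentions an assignment function based on consistent hashing for partitioning the hosts to crawl among agents. The following is a minimal sketch of the general technique only, not the authors' implementation. The class name, the method names, the replica count, and the use of MD5 as the hash are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical consistent-hashing ring; illustrative, not UbiCrawler's code. */
final class ConsistentHashAssignment {
    // Ring position -> agent name. Each agent owns several positions ("replicas").
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int replicas;

    ConsistentHashAssignment(int replicas) {
        this.replicas = replicas;
    }

    /** Places the agent's replica points on the ring. */
    void addAgent(String agent) {
        for (int i = 0; i < replicas; i++) {
            ring.put(hash(agent + "#" + i), agent);
        }
    }

    /** Removes only the points that belong to this agent (e.g. after a crash). */
    void removeAgent(String agent) {
        for (int i = 0; i < replicas; i++) {
            ring.remove(hash(agent + "#" + i), agent);
        }
    }

    /** Returns the agent owning the first ring point at or after the host's hash. */
    String agentFor(String host) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no agents registered");
        }
        Map.Entry<Long, String> e = ring.ceilingEntry(hash(host));
        // Wrap around to the smallest point when the host hashes past the last one.
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    /** First 8 bytes of an MD5 digest as a long (an assumed choice of hash). */
    private static long hash(String s) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] d = md.digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (d[i] & 0xffL);
            }
            return h;
        } catch (NoSuchAlgorithmException ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        ConsistentHashAssignment ring = new ConsistentHashAssignment(100);
        ring.addAgent("agent-a");
        ring.addAgent("agent-b");
        ring.addAgent("agent-c");
        System.out.println("www.example.org -> " + ring.agentFor("www.example.org"));
    }
}
```

The property this sketch is meant to show is that when an agent leaves the ring, only the hosts it owned are reassigned, and every other host keeps its agent. That is one way an assignment function can support the graceful degradation in the presence of faults that the description refers to.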
Total citations
[Chart: citations per year, 2003 to 2024]
Scholar articles
P Boldi, B Codenotti, M Santini, S Vigna - Software: Practice and Experience, 2004