Authors
Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan, Andrew Tomkins
Publication date
1999/5/17
Journal
Computer Networks
Volume
31
Issue
11-16
Pages
1481-1493
Publisher
Elsevier
Description
The Web harbors a large number of communities (groups of content-creators sharing a common interest), each of which manifests itself as a set of interlinked Web pages. Newsgroups and commercial Web directories together contain on the order of 20,000 such communities; our particular interest here is in emerging communities, those that have little or no representation in such fora. The subject of this paper is the systematic enumeration of over 100,000 such emerging communities from a Web crawl; we call our process trawling. We motivate a graph-theoretic approach to locating such communities, and describe the algorithms and the algorithmic engineering necessary to find structures that subscribe to this notion, the challenges in handling such a huge data set, and the results of our experiment.
Total citations
[Bar chart: citations per year, 1999 to 2024]
Scholar articles
R Kumar, P Raghavan, S Rajagopalan, A Tomkins - Computer Networks, 1999