Parallel boosted regression trees for web search ranking

Authors

Stephen Tyree, Kilian Q Weinberger, Kunal Agrawal, Jennifer Paykin

Publication date

2011/3/28

Book

Proceedings of the 20th international conference on World wide web

Pages

387-396

Description

Gradient Boosted Regression Trees (GBRT) are the current state-of-the-art learning paradigm for machine-learned web-search ranking, a domain notorious for very large data sets. In this paper, we propose a novel method for parallelizing the training of GBRT. Our technique parallelizes the construction of the individual regression trees and operates using the master-worker paradigm as follows. The data are partitioned among the workers. At each iteration, each worker summarizes its data partition using histograms. The master processor uses these histograms to build one layer of a regression tree, and then sends this layer to the workers, allowing them to build histograms for the next layer. Our algorithm carefully orchestrates overlap between communication and computation to achieve good performance.

Since this approach is based on data partitioning, and requires a small amount of communication, it generalizes to …
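The master-worker scheme described above can be illustrated with a minimal single-process sketch. It is not the authors' implementation: the function names (`worker_histograms`, `merge`, `best_split`) are hypothetical, fixed-width bins over features scaled to [0, 1) are an assumption made here for brevity, only the root layer of one tree is built, and the overlap of communication with computation is not modelled.

```python
def bin_index(x, n_bins):
    # Assumption: features are scaled to [0, 1); clamp the top edge for safety.
    return min(int(x * n_bins), n_bins - 1)


def worker_histograms(X, targets, n_bins):
    """Worker step: summarize one data partition.

    hist[f][b] = [count, sum of targets] for feature f and bin b.
    """
    n_features = len(X[0])
    hist = [[[0, 0.0] for _ in range(n_bins)] for _ in range(n_features)]
    for row, t in zip(X, targets):
        for f, x in enumerate(row):
            cell = hist[f][bin_index(x, n_bins)]
            cell[0] += 1
            cell[1] += t
    return hist


def merge(hists):
    """Master step: add the workers' histograms cell by cell."""
    merged = [[[0, 0.0] for _ in feat] for feat in hists[0]]
    for h in hists:
        for f, feat in enumerate(h):
            for b, (count, total) in enumerate(feat):
                merged[f][b][0] += count
                merged[f][b][1] += total
    return merged


def best_split(hist):
    """Master step: pick (feature, threshold) from histogram statistics only.

    Maximizing S_left^2 / n_left + S_right^2 / n_right is equivalent to
    minimizing the squared error of a two-leaf regression split.
    """
    best_feature, best_threshold, best_score = None, None, float("-inf")
    for f, feat in enumerate(hist):
        n_bins = len(feat)
        total_c = sum(c for c, _ in feat)
        total_s = sum(s for _, s in feat)
        left_c, left_s = 0, 0.0
        for b in range(n_bins - 1):
            left_c += feat[b][0]
            left_s += feat[b][1]
            right_c, right_s = total_c - left_c, total_s - left_s
            if left_c == 0 or right_c == 0:
                continue  # a split must leave data on both sides
            score = left_s ** 2 / left_c + right_s ** 2 / right_c
            if score > best_score:
                best_feature, best_threshold = f, (b + 1) / n_bins
                best_score = score
    return best_feature, best_threshold


# Toy data partitioned between two workers; the target depends on feature 1.
part_a = ([[0.1, 0.2], [0.9, 0.7], [0.4, 0.9]], [-1.0, 1.0, 1.0])
part_b = ([[0.8, 0.1], [0.3, 0.6], [0.6, 0.3]], [-1.0, 1.0, -1.0])

N_BINS = 4
summaries = [worker_histograms(X, t, N_BINS) for X, t in (part_a, part_b)]
feature, threshold = best_split(merge(summaries))
print(feature, threshold)  # 1 0.5
```

Only the histograms cross the worker-master boundary in this sketch, so the amount of data exchanged depends on the number of features and bins rather than on the number of training examples.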

Total citations

Cited by 208

Citations per year: 2011: 3, 2012: 11, 2013: 15, 2014: 13, 2015: 9, 2016: 12, 2017: 18, 2018: 23, 2019: 26, 2020: 17, 2021: 22, 2022: 15, 2023: 15, 2024: 6
