Authors
Stephen Tyree, Kilian Q Weinberger, Kunal Agrawal, Jennifer Paykin
Publication date
2011/3/28
Book
Proceedings of the 20th international conference on World wide web
Pages
387-396
Description
Gradient Boosted Regression Trees (GBRT) are the current state-of-the-art learning paradigm for machine-learned web-search ranking, a domain notorious for very large data sets. In this paper, we propose a novel method for parallelizing the training of GBRT. Our technique parallelizes the construction of the individual regression trees and operates using the master-worker paradigm as follows. The data are partitioned among the workers. At each iteration, each worker summarizes its data partition using histograms. The master processor uses these histograms to build one layer of a regression tree, and then sends this layer to the workers, allowing the workers to build histograms for the next layer. Our algorithm carefully orchestrates overlap between communication and computation to achieve good performance.
Since this approach is based on data partitioning, and requires a small amount of communication, it generalizes to …
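The master-worker step in the description can be illustrated with a minimal single-process sketch. It is not the paper's implementation: the fixed equal-width bins, the single feature scaled to [0, 1), the squared-error split criterion, and all names (`worker_histogram`, `merge_histograms`, `split_threshold`, `NUM_BINS`) are assumptions made for illustration, and it covers only the root layer of one tree. Each "worker" reduces its partition to per-bin counts and target sums, and the "master" merges those summaries and picks a split without seeing the raw rows.

```python
from typing import List, Optional, Tuple

NUM_BINS = 4  # hypothetical bin count; the paper's histograms may be built differently

Row = Tuple[float, float]  # (feature value in [0, 1), regression target)
Histogram = List[list]     # per bin: [row count, sum of targets]


def worker_histogram(rows: List[Row], num_bins: int = NUM_BINS) -> Histogram:
    """Worker side: summarize one data partition as per-bin [count, target sum]."""
    hist = [[0, 0.0] for _ in range(num_bins)]
    for x, y in rows:
        b = min(int(x * num_bins), num_bins - 1)  # equal-width binning (assumption)
        hist[b][0] += 1
        hist[b][1] += y
    return hist


def merge_histograms(hists: List[Histogram]) -> Histogram:
    """Master side: add the workers' histograms bin by bin."""
    merged = [[0, 0.0] for _ in range(len(hists[0]))]
    for hist in hists:
        for b, (count, total) in enumerate(hist):
            merged[b][0] += count
            merged[b][1] += total
    return merged


def split_threshold(hist: Histogram) -> Optional[float]:
    """Master side: choose the bin boundary that minimizes squared error.

    Minimizing squared error is equivalent to maximizing
    sum_left^2 / n_left + sum_right^2 / n_right.
    """
    total_n = sum(count for count, _ in hist)
    total_s = sum(total for _, total in hist)
    best_score, best_bin = float("-inf"), None
    n_left, s_left = 0, 0.0
    for b in range(len(hist) - 1):
        n_left += hist[b][0]
        s_left += hist[b][1]
        n_right, s_right = total_n - n_left, total_s - s_left
        if n_left == 0 or n_right == 0:
            continue  # a split must leave rows on both sides
        score = s_left ** 2 / n_left + s_right ** 2 / n_right
        if score > best_score:
            best_score, best_bin = score, b
    if best_bin is None:
        return None
    return (best_bin + 1) / len(hist)  # upper edge of the last left-side bin


# Two partitions stand in for two workers.
partitions = [
    [(0.1, 1.0), (0.2, 1.0)],
    [(0.7, 5.0), (0.9, 5.0)],
]
merged = merge_histograms([worker_histogram(p) for p in partitions])
threshold = split_threshold(merged)
```

In this sketch only the histograms cross the worker-master boundary, so the data exchanged per layer depends on the number of bins rather than the number of rows, which is the property the description relies on when it says the approach requires a small amount of communication.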
Total citations
Citations-per-year chart covering 2011 to 2024 (per-year counts not recoverable from the extracted text).
Scholar articles
S Tyree, KQ Weinberger, K Agrawal, J Paykin - Proceedings of the 20th international conference on …, 2011