Inventors
Gary King, Connor T Jerzak, Anton Strezhnev
Publication date
2022/11/29
Patent office
US
Patent number
11514233
Application number
16415065
Description
Embodiments of the invention use a feature-extraction approach, a matching approach, or both, in combination with a nonparametric approach, to estimate with high accuracy the proportion of documents in each of multiple labeled categories. The feature-extraction approach automatically generates continuous-valued text features optimized for estimating the category proportions. The matching approach constructs, from an observed set, a matched set that closely resembles an unobserved data set, thereby making the distributions of the observed and unobserved sets more similar.
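As a rough illustration of the kind of estimation described above, the following Python sketch estimates category proportions directly, without classifying individual documents. It is not the patented method: it assumes numeric document features are already available (the feature-learning step is omitted), uses a simple nearest-neighbor rule as a stand-in for matching, and solves a simplex-constrained least-squares problem as a stand-in for the nonparametric step. All function names, parameters, and the synthetic data are hypothetical.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / idx > 0)[0][-1]
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)


def matched_subset(X_lab, X_unl, k=3):
    """Indices of labeled rows that are among the k nearest labeled
    neighbors of at least one unlabeled row (assumed matching rule)."""
    d2 = ((X_unl[:, None, :] - X_lab[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    return np.unique(nearest)


def estimate_proportions(X_lab, y_lab, X_unl, n_categories, n_iter=5000):
    """Solve mean(X_unl) ~= A @ pi for pi on the simplex, where column c
    of A is the mean feature vector of labeled category c."""
    cols = []
    for c in range(n_categories):
        rows = X_lab[y_lab == c]
        if len(rows) == 0:
            raise ValueError(f"no labeled documents in category {c}")
        cols.append(rows.mean(axis=0))
    A = np.column_stack(cols)
    b = X_unl.mean(axis=0)
    # Projected gradient descent with step 1/L, L = squared spectral norm.
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    pi = np.full(n_categories, 1.0 / n_categories)
    for _ in range(n_iter):
        pi = project_to_simplex(pi - step * A.T @ (A @ pi - b))
    return pi


def simulate(seed=0):
    """Synthetic stand-in for document features: three categories,
    balanced labeled set, unbalanced unlabeled set."""
    rng = np.random.default_rng(seed)
    centers = 3.0 * np.eye(3, 5)
    y_lab = np.repeat(np.arange(3), 100)
    y_unl = np.repeat(np.arange(3), [300, 150, 50])
    X_lab = centers[y_lab] + rng.normal(size=(300, 5))
    X_unl = centers[y_unl] + rng.normal(size=(500, 5))
    return X_lab, y_lab, X_unl, np.array([0.6, 0.3, 0.1])


if __name__ == "__main__":
    X_lab, y_lab, X_unl, truth = simulate()
    keep = matched_subset(X_lab, X_unl, k=3)
    pi_hat = estimate_proportions(X_lab[keep], y_lab[keep], X_unl, 3)
    print("true proportions:     ", truth)
    print("estimated proportions:", np.round(pi_hat, 3))
```

The sketch targets the aggregate proportions rather than per-document labels, which is why it only compares mean feature vectors of the labeled categories with the mean feature vector of the unlabeled set.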
Total citations
[Chart: citations per year, 2021 to 2024]