Authors
Arun Iyer, Manohar Jonnalagedda, Suresh Parthasarathy, Arjun Radhakrishna, Sriram K Rajamani
Publication date
2019/6/8
Book
Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation
Pages
301-315
Description
We present a way to combine techniques from the program synthesis and machine learning communities to extract structured information from heterogeneous data. Such problems arise when extracting attributes from web pages, machine-generated emails, or data obtained from multiple sources. Our goal is to extract a set of structured attributes from such data.
We use machine learning models ("ML models") such as conditional random fields to get an initial labeling of potential attribute values. However, such models are typically not interpretable, and the noise produced by such models is hard to manage or debug. We use (noisy) labels produced by such ML models as inputs to program synthesis, and generate interpretable programs that cover the input space. We also employ type specifications (called "field constraints") to certify well-formedness of extracted values. Using …
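The pipeline described above (noisy ML labels, then program synthesis, then field constraints) can be illustrated with a minimal sketch. This is not the paper's algorithm: the documents, the noisy labels, the three-regex program space, and the names `is_date`, `run`, and `synthesize` are all hypothetical, chosen only to show how a well-formedness check can filter both noisy labels and candidate programs.

```python
import re

# Hypothetical inputs: machine-generated text plus noisy labels that a
# stand-in ML model (e.g. a CRF) might have produced for a "date" field.
DOCS = [
    "Flight AB123 departs 2019-06-08 from SEA",
    "Flight CD456 departs 2019-06-22 from PHX",
    "Flight EF789 departs 2019-07-01 from JFK",
]
NOISY_LABELS = ["2019-06-08", "CD456", "2019-07-01"]  # second label is wrong

# Assumed field constraint: an extracted date must look like YYYY-MM-DD.
def is_date(value):
    return re.fullmatch(r"\d{4}-\d{2}-\d{2}", value) is not None

# Toy program space: each candidate is a named regex with one capture group.
CANDIDATES = {
    "token_after_Flight": r"Flight (\S+)",
    "token_after_departs": r"departs (\S+)",
    "token_after_from": r"from (\S+)",
}

def run(pattern, doc):
    """Apply one candidate program to one document."""
    match = re.search(pattern, doc)
    return match.group(1) if match else None

def synthesize(docs, labels, constraint):
    """Pick the candidate that best agrees with the well-formed noisy labels."""
    best_name, best_score = None, -1
    for name, pattern in CANDIDATES.items():
        outputs = [run(pattern, d) for d in docs]
        # Reject programs whose outputs violate the field constraint.
        if not all(o is not None and constraint(o) for o in outputs):
            continue
        # Count agreement only on labels that satisfy the constraint.
        score = sum(o == l for o, l in zip(outputs, labels) if constraint(l))
        if score > best_score:
            best_name, best_score = name, score
    return best_name

chosen = synthesize(DOCS, NOISY_LABELS, is_date)
extracted = [run(CANDIDATES[chosen], d) for d in DOCS]
print(chosen, extracted)
```

In this toy setting the selected program is a readable regex rather than an opaque model, and it recovers a well-formed value even for the document whose ML label was wrong.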
Total citations
[Chart: citations per year, 2019 to 2023]
Scholar articles
A Iyer, M Jonnalagedda, S Parthasarathy… - Proceedings of the 40th ACM SIGPLAN Conference on …, 2019