Yuqing Gao, Bing Xiang, Bowen Zhou
Embodiments of the present invention utilize active learning to update a parallel corpus with increased speed and decreased cost. An active learning approach, where a machine can partially teach itself, does not rely solely on human translators and provides a great benefit to statistical machine translation systems by increasing translation performance while using fewer human resources. Described herein is a method for creating or updating a parallel corpus in a machine translation system. The method prepares a test set E to be updated, translates the test set E from a first language to a second language so as to create set F in the second language, translates set F back to the first language so as to create set E' in the first language, computes confidence scores for the translation of each item in the set based on the similarity of E and E', creates a subset of the highest confidence scores and adds the translations in the …
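The round-trip procedure in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The names `round_trip_select`, `forward`, `backward`, and `top_k` are hypothetical, the two translators are passed in as plain callables standing in for a real machine translation system, and the similarity measure (a token-level `difflib.SequenceMatcher` ratio) is an assumption, since the abstract does not say how the similarity of E and E' is computed.

```python
from difflib import SequenceMatcher
from typing import Callable, List, Tuple


def round_trip_select(
    test_set: List[str],
    forward: Callable[[str], str],   # hypothetical: first language -> second language
    backward: Callable[[str], str],  # hypothetical: second language -> first language
    top_k: int,
) -> List[Tuple[str, str]]:
    """Return the top_k (e, f) pairs ranked by round-trip confidence."""
    scored = []
    for e in test_set:
        f = forward(e)          # item of set F
        e_prime = backward(f)   # item of set E'
        # Assumed similarity measure: token-level matching ratio in [0, 1].
        score = SequenceMatcher(None, e.split(), e_prime.split()).ratio()
        scored.append((score, e, f))
    # Highest confidence first; ties keep their original order.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(e, f) for _, e, f in scored[:top_k]]
```

Under these assumptions, the returned pairs are the high-confidence translations that would be candidates for the parallel corpus. Lower-scoring items are simply left out of the result.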