Patent attributes
A computer-implemented method includes receiving first lossy converted documents. The computer-implemented method includes generating corrected documents for the first lossy converted documents. Each of the corrected documents includes edit markers that reflect structure changes relative to a corresponding document of the first lossy converted documents. The computer-implemented method includes generating feature vectors for the first lossy converted documents. The feature vectors include structure features of the first lossy converted documents. The computer-implemented method includes training one or more models based on the structure features and the edit markers. The computer-implemented method includes applying the trained one or more models to second lossy converted documents to determine proposed structure edits. The computer-implemented method includes transforming the second lossy converted documents to second corrected documents by applying one or more of the proposed structure edits.
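The steps recited above (derive structure features, pair them with edit markers, train, propose edits, transform) can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration, not something the abstract specifies: documents are modeled as lists of text lines, the edit markers are three hypothetical labels (`keep`, `heading`, `join`), the four structure features are invented, and a simple majority-vote lookup table stands in for the "one or more models".

```python
from collections import Counter, defaultdict

# Hypothetical edit markers: one label per line of a lossy converted document.
KEEP, HEADING, JOIN = "keep", "heading", "join"


def features(lines, i):
    """Return an illustrative structure feature vector for line i."""
    text = lines[i].strip()
    prev = lines[i - 1].strip() if i > 0 else ""
    return (
        len(text) < 30,                         # short line
        text.endswith("."),                     # ends a sentence
        text[:1].islower(),                     # starts mid-sentence
        prev != "" and not prev.endswith("."),  # previous line looks unfinished
    )


def train(examples):
    """Learn the most frequent edit marker for each feature vector.

    `examples` is a list of (lines, markers) pairs, where markers[i] is the
    edit marker observed for lines[i] in the corrected document.
    """
    counts = defaultdict(Counter)
    for lines, markers in examples:
        for i, marker in enumerate(markers):
            counts[features(lines, i)][marker] += 1
    return {vec: c.most_common(1)[0][0] for vec, c in counts.items()}


def propose_edits(model, lines):
    """Propose one structure edit per line; unseen feature vectors map to KEEP."""
    return [model.get(features(lines, i), KEEP) for i in range(len(lines))]


def apply_edits(lines, edits):
    """Transform a lossy document by applying the proposed structure edits."""
    out = []
    for line, edit in zip(lines, edits):
        text = line.strip()
        if edit == JOIN and out:
            out[-1] = out[-1] + " " + text   # merge with the previous line
        elif edit == HEADING:
            out.append("# " + text)          # "# " is an assumed heading syntax
        else:
            out.append(text)
    return out


# First lossy converted document with its edit markers (made-up training data).
first_lossy = [
    "Introduction",
    "The converter split this sentence",
    "across two lines.",
    "It also lost the heading.",
]
first_markers = [HEADING, KEEP, JOIN, KEEP]

model = train([(first_lossy, first_markers)])

# Second lossy converted document, not seen during training.
second_lossy = ["Results", "Accuracy improved after the model", "was retrained."]
proposed = propose_edits(model, second_lossy)
second_corrected = apply_edits(second_lossy, proposed)
print(second_corrected)
```

With this toy data, the second document receives the proposed edits `heading`, `keep`, `join`, so the heading is restored and the split sentence is rejoined. A practical system would presumably use richer features and a statistical or neural model in place of the lookup table.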