A document analysis device that includes an artificial intelligence (AI) processing engine configured to receive training data, to select a sentence from the training data, and to compute a first set of similarity scores between the selected sentence and other sentences from the training data. The AI processing engine is further configured to determine a set count that is equal to a number of similarity scores in the first set of similarity scores that exceed a similarity score threshold value and to compare the set count to a set outlier threshold value. The AI processing engine is further configured to keep the selected sentence in the training data when the set count is greater than or equal to the set outlier threshold value and to remove the selected sentence from the training data when the set count is less than the set outlier threshold value.
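The keep/remove rule can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the claimed implementation. The abstract does not name a similarity measure, so the sketch assumes cosine similarity over bag-of-words token counts. The function names (`cosine_similarity`, `filter_outlier_sentences`) and the default threshold values are hypothetical. Every sentence is scored against the original training data, so the order in which sentences are examined does not affect the result.

```python
from collections import Counter
import math


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity of bag-of-words token counts (assumed measure, not from the source)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


def filter_outlier_sentences(sentences: list[str],
                             similarity_threshold: float = 0.5,
                             set_outlier_threshold: int = 1) -> list[str]:
    """Hypothetical helper: keep a sentence only if enough other sentences resemble it."""
    kept = []
    for i, selected in enumerate(sentences):
        # First set of similarity scores: the selected sentence vs. every other sentence.
        scores = [cosine_similarity(selected, other)
                  for j, other in enumerate(sentences) if j != i]
        # Set count: number of scores that exceed the similarity score threshold.
        set_count = sum(1 for s in scores if s > similarity_threshold)
        # Keep when set count >= set outlier threshold; otherwise treat as an outlier.
        if set_count >= set_outlier_threshold:
            kept.append(selected)
    return kept


if __name__ == "__main__":
    training = [
        "the cat sat on the mat",
        "the cat sat on a mat",
        "a cat sat on the mat",
        "quarterly revenue grew sharply",
    ]
    # Keeps the three cat sentences; drops the revenue sentence (set count 0 < 1).
    print(filter_outlier_sentences(training))
```

Raising `set_outlier_threshold` makes the filter stricter, since a sentence then needs more similar neighbours to survive; setting it to 0 keeps every sentence.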