Patent attributes
Various aspects of the subject technology relate to systems, methods, and machine-readable media for automated quantitative assessment of text complexity. A system may process at least one body of text in a text-based query using a natural language processing engine. The processed text may include sub-blocks of text of a predetermined sequence length, such as n-grams. The system may compare reference bases to the processed text, where each reference base is associated with a different natural language. The system determines which of the reference bases has the highest number of matching words within the body of text, and thereby identifies the natural language associated with that reference base as the source language of the supplied text. The system then determines an average complexity score for each n-gram using a quantitative assessment engine. The system then applies a readability score to the body of text based on the average complexity scores of the n-grams.
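The flow described above can be illustrated with a minimal sketch in Python. It is not the claimed implementation: the reference bases are tiny hypothetical word sets, the function names are invented for illustration, and mean word length is an assumed stand-in for whatever measure the quantitative assessment engine actually applies.

```python
from statistics import mean

# Hypothetical reference bases: small word sets standing in for
# per-language lexicons. Real reference bases would be far larger.
REFERENCE_BASES = {
    "english": {"the", "cat", "sat", "on", "mat", "and", "is"},
    "spanish": {"el", "gato", "se", "en", "la", "y", "es"},
}

PUNCTUATION = ".,;:!?\"'()"


def tokenize(text):
    """Lowercase the text and split it into words, dropping edge punctuation."""
    words = (w.strip(PUNCTUATION).lower() for w in text.split())
    return [w for w in words if w]


def ngrams(tokens, n):
    """Return every contiguous run of n tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def detect_language(tokens, bases):
    """Pick the reference base with the highest number of matching words."""
    counts = {
        lang: sum(1 for t in tokens if t in words)
        for lang, words in bases.items()
    }
    return max(counts, key=counts.get)


def ngram_complexity(gram):
    """Assumed complexity measure: mean word length within the n-gram."""
    return mean(len(w) for w in gram)


def readability_score(text, n=2):
    """Return (source language, average of the per-n-gram complexity scores)."""
    tokens = tokenize(text)
    grams = ngrams(tokens, n)
    if not grams:
        raise ValueError("text is shorter than the n-gram size")
    language = detect_language(tokens, REFERENCE_BASES)
    score = mean(ngram_complexity(g) for g in grams)
    return language, score


if __name__ == "__main__":
    print(readability_score("The cat sat on the mat."))
```

In this sketch the language is identified before scoring so that a language-specific complexity measure could be substituted later; the placeholder measure used here does not depend on the detected language.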