Patent attributes
An example method embodying the disclosed technology comprises:
digitally storing Teacher models and a Student model at a server computer;
training each model on a corpus of unlabeled training data using Masked Language Modeling;
fine-tuning each Teacher model for an automated short answer grading (ASAG) task with labeled ground-truth data;
executing each Teacher model to generate, and digitally store, a respective set of class probabilities over an unlabeled task-specific data set for the ASAG task;
further training the Student model using knowledge distillation (KD) from a linear ensemble of the Teacher models;
receiving, at the server computer, digital input comprising a target response text and a corresponding target reference answer text;
programmatically inputting the target response text and the corresponding target reference answer text to the Student model, thereby outputting a corresponding predicted binary label;
displaying correction data indicating the corresponding predicted binary label in a graphical user interface (GUI); and,
optionally, displaying explainability data in the GUI.
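The distillation step described above can be pictured with a short sketch. The following is a minimal illustration, assuming PyTorch and Hugging Face Transformers; the checkpoint name, the helper names (`ensemble_soft_labels`, `distillation_step`), the uniform ensemble weights, and the hyperparameters are illustrative assumptions, not details taken from the patent.

```python
# Minimal sketch of the distillation step: the Student is trained toward a
# linear ensemble (weighted average) of the Teachers' stored class
# probabilities. All names and hyperparameters here are illustrative.
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForSequenceClassification

STUDENT_CHECKPOINT = "distilbert-base-uncased"  # hypothetical student backbone
tokenizer = AutoTokenizer.from_pretrained(STUDENT_CHECKPOINT)
student = AutoModelForSequenceClassification.from_pretrained(
    STUDENT_CHECKPOINT, num_labels=2)
optimizer = torch.optim.AdamW(student.parameters(), lr=2e-5)


def ensemble_soft_labels(teacher_probs, weights=None):
    """Linear ensemble: weighted average of each Teacher's stored class
    probabilities for one (response, reference answer) pair."""
    probs = torch.stack(teacher_probs)                 # (num_teachers, 2)
    if weights is None:                                # default: uniform weights
        weights = torch.full((probs.size(0),), 1.0 / probs.size(0))
    return (weights.unsqueeze(1) * probs).sum(dim=0)   # (2,)


def distillation_step(response_text, reference_text, teacher_probs):
    """One knowledge-distillation update on an unlabeled pair: push the
    Student's predicted distribution toward the ensembled Teacher distribution."""
    inputs = tokenizer(response_text, reference_text,
                       truncation=True, return_tensors="pt")
    student_log_probs = F.log_softmax(student(**inputs).logits, dim=-1)  # (1, 2)
    soft_targets = ensemble_soft_labels(teacher_probs).unsqueeze(0)      # (1, 2)
    loss = F.kl_div(student_log_probs, soft_targets, reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```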
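The receiving and inference steps are similarly compact. The sketch below continues the previous one (reusing the same `tokenizer` and trained `student`); the `grade` helper, the 0.5 threshold, and the mapping of label 1 to "correct" are assumptions for illustration, and the GUI display of correction and explainability data is outside the scope of the snippet.

```python
def grade(response_text, reference_text, threshold=0.5):
    """Inference path of the method: the trained Student maps a target
    response text and its reference answer text to a predicted binary label."""
    inputs = tokenizer(response_text, reference_text,
                       truncation=True, return_tensors="pt")
    with torch.no_grad():
        probs = F.softmax(student(**inputs).logits, dim=-1).squeeze(0)  # (2,)
    predicted_label = int(probs[1] >= threshold)  # assumed: 1 = correct, 0 = incorrect
    return predicted_label, probs.tolist()        # probabilities can back the GUI display


# Example usage: the predicted label and class probabilities would feed the
# correction data shown in the grading interface.
label, class_probs = grade(
    "Photosynthesis converts light energy into chemical energy.",
    "Plants use light to turn carbon dioxide and water into glucose.")
print(label, class_probs)
```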