Patent attributes
A system and methods are provided in which an artificial intelligence inference module identifies targeted information in large-scale unlabeled data, wherein the artificial intelligence inference module autonomously learns hierarchical representations from large-scale unlabeled data and continually self-improves from self-labeled data points using a teacher model trained to detect known targets from combined inputs of a small hand labeled curated dataset prepared by a domain expert together with self-generated intermediate and global context features derived from the unlabeled dataset by unsupervised and self-supervised processes. The trained teacher model processes further unlabeled data to self-generate new weakly-supervised training samples that are self-refined and self-corrected, without human supervision, and then used as inputs to a noisy student model trained in a semi-supervised learning process on a combination of the teacher model training set and new weakly-supervised training samples. With each iteration, the noisy student model continually self-optimizes its learned parameters against a set of configurable validation criteria such that the learned parameters of the noisy student surpass and replace the learned parameter of the prior iteration teacher model, with these optimized learned parameters periodically used to update the artificial intelligence inference module.