Patent attributes
Disclosed herein are embodiments of systems, methods, and products comprising an analytic server that automates training dataset generation for different application areas. The server may perform an automated, iterative refinement process to build a collection of dataset generator models over time. The server may receive a set of seed examples in a domain and generate candidate examples based on the features of the seed examples using data synthesis techniques. The server may execute a pre-trained label discriminator (LD) and domain discriminator (D2) on the candidate examples. The LD may identify and reject mislabeled data. The D2 may identify and reject out-of-domain data. The analytic server may regenerate labeled data based on the feedback of the LD and D2. The analytic server may train a dataset generator by iteratively repeating these steps until the regenerated candidate examples reach a pass rate threshold.
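
The generate, discriminate, and regenerate loop described above may be illustrated with a minimal Python sketch. Everything in it is a hypothetical stand-in chosen for illustration and is not taken from the disclosure: the one-dimensional `(feature, label)` example format, the function names (`synthesize`, `label_discriminator`, `domain_discriminator`, `refine`), the sign-based labeling rule, the `[-2, 2]` domain bounds, and the use of Gaussian perturbation as the data synthesis technique. In particular, modeling "feedback" as halving the perturbation noise whenever the pass rate falls short is an assumption; an actual embodiment may use the discriminator outputs quite differently.

```python
import random
from typing import List, Tuple

# Hypothetical toy representation: (feature, label).
Example = Tuple[float, int]


def synthesize(seeds: List[Example], n: int, noise: float,
               rng: random.Random) -> List[Example]:
    """Generate n candidates by perturbing the feature of randomly chosen seeds."""
    candidates = []
    for _ in range(n):
        x, y = rng.choice(seeds)
        candidates.append((x + rng.gauss(0.0, noise), y))
    return candidates


def label_discriminator(ex: Example) -> bool:
    """Toy stand-in for the LD: accept if the label matches the sign of the feature."""
    x, y = ex
    return y == (1 if x >= 0 else 0)


def domain_discriminator(ex: Example) -> bool:
    """Toy stand-in for the D2: accept if the feature lies in the assumed domain."""
    return -2.0 <= ex[0] <= 2.0


def refine(seeds: List[Example], threshold: float = 0.95, batch: int = 200,
           noise: float = 1.5, max_rounds: int = 20, seed: int = 0):
    """Regenerate candidates until the pass rate reaches the threshold.

    Returns the final noise level, the accepted examples from the last
    round, and the pass rate observed in each round.
    """
    rng = random.Random(seed)
    accepted: List[Example] = []
    history: List[float] = []
    for _ in range(max_rounds):
        candidates = synthesize(seeds, batch, noise, rng)
        accepted = [c for c in candidates
                    if label_discriminator(c) and domain_discriminator(c)]
        rate = len(accepted) / len(candidates)
        history.append(rate)
        if rate >= threshold:
            break
        # Assumed feedback rule: too many rejections, so perturb less next round.
        noise *= 0.5
    return noise, accepted, history


if __name__ == "__main__":
    final_noise, kept, rates = refine([(-1.0, 0), (1.0, 1)])
    print(f"rounds={len(rates)} final_noise={final_noise} kept={len(kept)}")
```

In this sketch the "generator" is reduced to a single tunable parameter (`noise`), which keeps the control flow of the refinement loop visible; a learned generator model would replace `synthesize` and the noise update while leaving the loop structure unchanged.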