Patent attributes
Techniques for active learning-based data labeling are described. An active learning-based data labeling service enables a user to build and manage large, high accuracy datasets for use in various machine learning systems. Machine learning may be used to automate annotation and management of the datasets, increasing efficiency of labeling tasks and reducing the time required to perform labeling. Embodiments utilize active learning techniques to reduce the amount of a dataset that requires manual labeling. As subsets of the dataset are labeled, this label data is used to train a model which can then identify additional objects in the dataset without manual intervention. The label data can be added to an augmented manifest, the augmented manifest can be used to filter the dataset to perform further labeling jobs on the same or different subsets of the dataset.