Patent attributes
Computer systems and associated methods are disclosed to implement a collaborative dataset management system (CDMS) for machine learning (ML) data. In embodiments, CDMS allows many users to create, review, and collaboratively evolve ML datasets. In embodiments, dataset owners may make their datasets available to other users on CDMS for a fee and under specified licensing conditions. CDMS users can search for other users' datasets on the system to use in their own ML tasks. CDMS users may also create child datasets from existing datasets on the system. Parent and child datasets may be linked so that changes to one dataset are provided to the other via merge requests. A dataset owner may use CDMS to review an incoming merge request using one or more audit jobs before approving the request. In this manner, CDMS provides a shared repository and collaboration system for managing high-quality datasets to power machine learning processes.
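The parent/child linkage and audited merge requests described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and does not come from the disclosure: the names `Dataset`, `MergeRequest`, `open_merge_request`, and `review_and_merge`, the representation of a dataset as a mapping from sample identifier to label, and the modeling of an audit job as a function that returns a pass/fail result. The owner's approval is also simplified here to "every audit job passes", whereas the abstract describes audit jobs as an aid to the owner's review.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical: an audit job inspects the proposed changes and returns pass/fail.
AuditJob = Callable[[Dict[str, str]], bool]


@dataclass
class Dataset:
    """Hypothetical dataset record: sample id -> label, with an optional parent link."""
    name: str
    owner: str
    samples: Dict[str, str] = field(default_factory=dict)
    parent: Optional["Dataset"] = None

    def create_child(self, name: str, owner: str) -> "Dataset":
        """Copy this dataset's samples and keep a link back to it."""
        return Dataset(name, owner, dict(self.samples), parent=self)


@dataclass
class MergeRequest:
    """Hypothetical merge request carrying changes from source to target."""
    source: Dataset
    target: Dataset
    changes: Dict[str, str]


def open_merge_request(source: Dataset, target: Dataset) -> MergeRequest:
    """Collect samples that are new or relabeled in source relative to target."""
    changes = {
        sample_id: label
        for sample_id, label in source.samples.items()
        if target.samples.get(sample_id) != label
    }
    return MergeRequest(source, target, changes)


def review_and_merge(request: MergeRequest, audit_jobs: List[AuditJob]) -> bool:
    """Apply the changes to the target only if every audit job passes."""
    if all(job(request.changes) for job in audit_jobs):
        request.target.samples.update(request.changes)
        return True
    return False


def labels_are_known(changes: Dict[str, str]) -> bool:
    """Example audit job (an assumption): accept only labels from a fixed set."""
    return all(label in {"cat", "dog"} for label in changes.values())


# Usage: a child dataset adds one sample and proposes it back to its parent.
parent = Dataset("pets", "alice", {"img1": "cat", "img2": "dog"})
child = parent.create_child("pets-extended", "bob")
child.samples["img3"] = "cat"

request = open_merge_request(child, parent)
merged = review_and_merge(request, [labels_are_known])
```

In this sketch the merge request carries only the differing samples, so an audit job examines exactly what would change in the target. A request whose changes fail any audit job leaves the target dataset untouched.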