Embodiments described herein provide a technique to crowdsource labeling of training data for a machine learning model while maintaining the privacy of the data provided by crowdsourcing participants. Client devices can be used to generate proposed labels for a unit of data to be used in a training dataset. One or more privacy mechanisms are used to protect user data when transmitting the data to a server. The server can aggregate the proposed labels and use the most frequently proposed labels for an element as the label for the element when generating training data for the machine learning model. The machine learning model is then trained using the crowdsourced labels to improve the accuracy of the model.