Described herein are systems and methods for training first and second neural network models. A system comprises a memory comprising instruction data representing a set of instructions, and a processor configured to communicate with the memory and to execute the set of instructions. The set of instructions, when executed by the processor, causes the processor to: set a weight in the second model based on a corresponding weight in the first model; train the second model on a first dataset, wherein the training comprises updating the weight in the second model; and adjust the corresponding weight in the first model based on the updated weight in the second model.
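By way of illustration only, the three operations recited above (setting a weight, training, and adjusting the corresponding weight) may be sketched as follows. The sketch rests on assumptions not taken from the description: each model is reduced to a single scalar weight in a linear map y = w * x, training is plain gradient descent on squared error, and the adjustment is a linear blend controlled by a hypothetical `mix` parameter. The function names `train_linear` and `sync_and_train` are likewise hypothetical.

```python
def train_linear(weight, dataset, lr=0.1, epochs=50):
    """Fit y = weight * x by gradient descent on squared error (assumed training rule)."""
    for _ in range(epochs):
        for x, y in dataset:
            grad = 2.0 * (weight * x - y) * x  # derivative of (weight*x - y)**2
            weight -= lr * grad
    return weight


def sync_and_train(first_weight, dataset, mix=1.0):
    """Run one set/train/adjust cycle and return (first_weight, second_weight)."""
    # Set the weight in the second model based on the corresponding
    # weight in the first model (here: a direct copy, an assumption).
    second_weight = first_weight

    # Train the second model on the first dataset, updating its weight.
    second_weight = train_linear(second_weight, dataset)

    # Adjust the corresponding weight in the first model based on the
    # updated weight in the second model. The blend below is hypothetical:
    # mix=1.0 copies the updated weight, mix=0.0 leaves it unchanged.
    first_weight = (1.0 - mix) * first_weight + mix * second_weight
    return first_weight, second_weight


# Toy first dataset drawn from y = 2x (illustrative values only).
first_dataset = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
new_first, new_second = sync_and_train(0.5, first_dataset, mix=0.5)
```

In this sketch the second model's weight moves toward 2, the slope of the toy data, and the first model's weight is then moved halfway toward it. In an actual neural network the same cycle would be applied to each pair of corresponding weights, and the copy and blend rules could be replaced by any other dependence on the corresponding weight.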