US Patent 11620515 Multi-task knowledge distillation for language model

Systems and methods are provided that employ knowledge distillation under a multi-task learning setting. In some embodiments, the systems and methods are implemented with a larger teacher model and a smaller student model, each of which comprise one or more shared layers and a plurality of task layers for performing multiple tasks. During training of the teacher model, its shared layers are initialized, and then the teacher model is multi-task refined. The teacher model predicts teacher logits. During training of the student model, its shared layers are initialized. Knowledge distillation is employed to transfer knowledge from the teacher model to the student model by the student model updating its shared layers and task layers, for example, according to the teacher logits of the teacher model. Other features are also provided.

Timeline

No Timeline data yet.

Further Resources

Title

Author

Link

Type

Date

No Further Resources data yet.

US Patent 11620515 Multi-task knowledge distillation for language model

Contents

Patent attributes

Timeline

Further Resources

References

Find more entities like US Patent 11620515 Multi-task knowledge distillation for language model