US Patent 11361571 Term extraction in highly technical domains

A language model is fine-tuned by extracting terminology terms from a text document. The method comprises identifying a text snippet, identifying candidate multi-word expressions using part of speech tags, and determining a specificity score value for each of the candidate multi-word expressions. Moreover, the method comprises determining a topic similarity score value for each of the candidate multi-word expressions, selecting remaining expressions from the candidate multi-word expressions using a function of a specificity value and a topic similarity value of each of the candidate multi-word expressions, adding a noun comprised in the text snippet to the remaining expressions depending on a correlation function, labeling the remaining multi-word expressions, and fine-tuning an existing pre-trained transformer-based language model using as training data the identified text snippet marked with the labeled remaining expressions.

Timeline

No Timeline data yet.

Further Resources

Title

Author

Link

Type

Date

No Further Resources data yet.

US Patent 11361571 Term extraction in highly technical domains

Contents

Patent attributes

Timeline

Further Resources

References

Find more entities like US Patent 11361571 Term extraction in highly technical domains