US Patent 10042923 Topic extraction using clause segmentation and high-frequency words

The disclosed embodiments provide a system for processing data. During operation, the system obtains a set of clauses in a first set of content items comprising unstructured data. Next, the system obtains a set of stop words comprising high-frequency words that occur in a second set of content items. The system then automatically extracts a set of topics from the set of clauses by generating a set of n-grams from the set of clauses and excluding a first n-gram in the set of n-grams from the set of topics when the first n-gram contains a word in the set of stop words in a pre-specified position of the first n-gram. Finally, the system displays the set of topics to a user to improve understanding of the first set of content items by the user without requiring the user to manually analyze the first set of content items.
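The filtering step described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: it assumes the clauses are already segmented, tokenizes on whitespace, treats the most frequent words of a reference corpus as stop words, and takes the "pre-specified position" to be the first or last word of an n-gram. The function names (`high_frequency_words`, `extract_topics`) and the parameters `top_k`, `max_n`, and `positions` are hypothetical.

```python
from collections import Counter


def high_frequency_words(reference_items, top_k):
    """Return the top_k most frequent words in a reference corpus.

    Assumption: whitespace tokenization and lowercasing stand in for
    whatever tokenizer a real system would use.
    """
    counts = Counter(
        word for item in reference_items for word in item.lower().split()
    )
    return {word for word, _ in counts.most_common(top_k)}


def extract_topics(clauses, stop_words, max_n=3, positions=(0, -1)):
    """Count candidate topics (n-grams) across pre-segmented clauses.

    An n-gram is excluded when a stop word occupies any index listed in
    `positions` (assumed here: first and last word). Stop words in
    other positions are allowed.
    """
    topics = Counter()
    for clause in clauses:
        tokens = clause.lower().split()
        for n in range(1, max_n + 1):
            # N-grams never cross a clause boundary.
            for i in range(len(tokens) - n + 1):
                gram = tokens[i:i + n]
                if any(gram[p] in stop_words for p in positions):
                    continue
                topics[" ".join(gram)] += 1
    return topics


if __name__ == "__main__":
    reference = ["the cat and the dog", "the bird"]
    stop_words = high_frequency_words(reference, top_k=1) | {"was"}
    clauses = ["the onboarding process was slow", "love the mobile app"]
    for topic, count in extract_topics(clauses, stop_words).most_common():
        print(count, topic)
```

With these inputs, "onboarding process" and "mobile app" survive as candidate topics, while "the onboarding" and "process was" are dropped because a stop word sits at the start or end. "love the mobile" is retained, since its stop word falls in an interior position.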
