Patent attributes
A method for generating training data for disambiguation of an entity comprising a word or word string related to a topic to be analyzed includes acquiring sent messages by a user, each including at least one entity in a set of entities; organizing the messages and acquiring sets, each containing messages sent by each user; identifying a set of messages including different entities, greater than or equal to a first threshold value, and identifying a user corresponding to the identified set as a hot user; receiving an instruction indicating an object entity to be disambiguated; determining a likelihood of co-occurrence of each keyword and the object entity in sets of messages sent by hot users; and determining training data for the object entity on the basis of the likelihood of co-occurrence of each keyword and the object entity in the sets of messages sent by the hot users.