Patent attributes
A method for building a factual database of concepts and the entities related to those concepts through a learning process. Training content (e.g., news articles, books) and a set of entities (e.g., Bill Clinton and Barack Obama) that are related to a concept (e.g., Presidents) are received. Groups of words that frequently co-occur with the entities in the training content are identified as templates. Templates may also be identified by analyzing their part-of-speech patterns. Entities that frequently co-occur with the templates in the training content are identified as additional related entities (e.g., Ronald Reagan and Richard Nixon). To eliminate erroneous results, the identified entities may be presented to a user, who removes any false positives. The entities are then stored in association with the concept.
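
The loop described above (known entities yield templates, templates yield further entities, and a reviewer filters the result) can be illustrated with a minimal Python sketch. It is not the patented implementation. Several simplifications are assumptions made for illustration only: a template is a whole sentence with the entity replaced by a slot, an entity mention is approximated as a run of capitalized words, and the names `find_templates`, `find_entities`, `build_concept`, and `is_valid` are hypothetical. The part-of-speech analysis mentioned in the abstract is not modeled.

```python
import re
from collections import Counter

SLOT = "<ENTITY>"
# Assumption: an entity mention is a run of capitalized words.
ENTITY_RE = r"([A-Z][a-z]+(?: [A-Z][a-z]+)*)"


def find_templates(sentences, seeds, min_count=2):
    """Keep word groups that co-occur with known entities at least min_count times."""
    counts = Counter()
    for sentence in sentences:
        for entity in seeds:
            if entity in sentence:
                # Replace the first mention with a slot to form a candidate template.
                counts[sentence.replace(entity, SLOT, 1)] += 1
    return {t for t, c in counts.items() if c >= min_count}


def find_entities(sentences, templates, known, min_count=1):
    """Collect new entities that fill the slot of a template at least min_count times."""
    counts = Counter()
    for template in templates:
        prefix, _, suffix = template.partition(SLOT)
        pattern = re.compile(re.escape(prefix) + ENTITY_RE + re.escape(suffix))
        for sentence in sentences:
            match = pattern.fullmatch(sentence)
            if match and match.group(1) not in known:
                counts[match.group(1)] += 1
    return {e for e, c in counts.items() if c >= min_count}


def build_concept(concept, sentences, seeds, is_valid=lambda entity: True):
    """Run one learning pass and store the entities under the concept."""
    templates = find_templates(sentences, seeds)
    candidates = find_entities(sentences, templates, known=set(seeds))
    # is_valid stands in for the user who removes false positives.
    accepted = {e for e in candidates if is_valid(e)}
    return {concept: set(seeds) | accepted}


# Toy training content, invented for this example.
corpus = [
    "President Bill Clinton signed the bill.",
    "President Barack Obama signed the bill.",
    "President Ronald Reagan signed the bill.",
    "President Richard Nixon signed the bill.",
    "Senator John Smith gave a speech.",
]
seeds = {"Bill Clinton", "Barack Obama"}

database = build_concept("Presidents", corpus, seeds)
print(sorted(database["Presidents"]))
# ['Barack Obama', 'Bill Clinton', 'Richard Nixon', 'Ronald Reagan']
```

On this toy corpus the only template that passes the frequency threshold is "President <ENTITY> signed the bill.", so "John Smith" is never proposed. With real text, shorter context windows and a higher `min_count` for entities would likely be needed, which is why the review step remains part of the loop.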