Patent attributes
A computer-implemented method, system, and computer program product for automatically bootstrapping a domain-specific vocabulary from at least one source document using one or more computers, by: (a) encoding one or more passages in the source document to identify one or more relevant words therein, wherein the encoding assigns an importance to the relevant words using an attention mechanism (AM) on top of a recurrent neural network (RNN); (b) expanding the relevant words using word embedding distance, ontology information, or multi-part analogies; and (c) mapping the expanded words to concepts for inclusion in the domain-specific vocabulary, wherein concept disambiguation is performed to ensure that incorrect concepts are not included in the domain-specific vocabulary.
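
The three steps (a) to (c) can be illustrated with a minimal sketch. It is not the claimed implementation: the RNN is not trained or run here, and its per-token hidden states are replaced by hand-written toy vectors. The attention step is assumed to be a plain dot-product softmax, the expansion step uses only word embedding distance (cosine similarity), and the disambiguation rule (keep a word only when exactly one candidate concept falls in the target domain) is an assumption chosen for illustration. All function names, thresholds, vectors, and concept identifiers are hypothetical.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(hidden_states, query):
    # Step (a): score each hidden state against a query vector with a
    # dot product, then normalize into importance weights.
    # (Assumed attention form; the hidden states stand in for RNN outputs.)
    scores = [sum(h * q for h, q in zip(state, query)) for state in hidden_states]
    return softmax(scores)

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def expand(word, embeddings, min_sim=0.9):
    # Step (b): expand a word to its neighbors by embedding similarity.
    if word not in embeddings:
        return []
    base = embeddings[word]
    return sorted(w for w, vec in embeddings.items()
                  if w != word and cosine(base, vec) >= min_sim)

def map_to_concepts(words, concept_index, domain_types):
    # Step (c): map words to concepts; a word is kept only if exactly one
    # of its candidate concepts has an in-domain type (assumed rule).
    vocabulary = {}
    for w in words:
        in_domain = [cid for cid, ctype in concept_index.get(w, [])
                     if ctype in domain_types]
        if len(in_domain) == 1:
            vocabulary[w] = in_domain[0]
    return vocabulary

# Toy data (all hypothetical).
tokens = ["patient", "reports", "fever"]
hidden_states = [[1.0, 0.0], [0.0, 0.1], [0.9, 0.2]]  # stand-in RNN outputs
query = [2.0, 0.0]
embeddings = {
    "patient": [0.5, 0.5],
    "fever": [1.0, 0.1],
    "pyrexia": [0.9, 0.2],
    "cold": [0.1, 1.0],
}
concept_index = {
    "patient": [("C_PATIENT", "person"), ("C_PATIENT_TOLERANT", "quality")],
    "fever": [("C_FEVER", "symptom")],
    "pyrexia": [("C_FEVER", "symptom")],
}
domain_types = {"symptom", "disease"}

weights = attention_weights(hidden_states, query)
relevant = [t for t, w in zip(tokens, weights) if w >= 0.3]

candidates = []
for word in relevant:
    for w in [word] + expand(word, embeddings, min_sim=0.9):
        if w not in candidates:
            candidates.append(w)

vocabulary = map_to_concepts(candidates, concept_index, domain_types)
print(vocabulary)
```

In this toy run, "reports" receives a low attention weight and is filtered out, "fever" is expanded to its near neighbor "pyrexia", and "patient" is dropped at the mapping step because none of its candidate concepts has an in-domain type. A fuller system would replace the toy vectors with trained RNN states and pretrained embeddings, and could add the ontology-based and analogy-based expansion routes named in step (b).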