Patent attributes
Identifying entities in semi-structured content is described. For each token in a sequence of tokens in semi-structured content, where each token is a set of characters, a system assigns one of multiple entity types based on the token's entity type score. For each token in the sequence, the system also assigns a boundary type, either a begin boundary type or a continue boundary type, based on the token's boundary type score. The system identifies an entity based on the entity type score and the boundary type of each token in the sequence. Based on the identified entity, the system outputs the sequence of tokens as an identified set of entities.
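The per-token labeling and grouping described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the patent: the entity type names (NAME, TITLE, PHONE), the hand-written scores, picking the highest-scoring label for each token, the rule that a change of entity type also starts a new entity, and the helper names `best_label` and `identify_entities`. In a real system the scores would presumably come from a trained model.

```python
from typing import Dict, List, Tuple

# Hypothetical label for the begin boundary type; the other boundary type
# in this sketch is "CONTINUE".
BEGIN = "BEGIN"


def best_label(scores: Dict[str, float]) -> str:
    """Return the label with the highest score (assumed selection rule)."""
    return max(scores, key=scores.get)


def identify_entities(
    tokens: List[str],
    entity_scores: List[Dict[str, float]],
    boundary_scores: List[Dict[str, float]],
) -> List[Tuple[str, str]]:
    """Group tokens into (entity_type, text) pairs.

    Each token gets an entity type and a boundary type from its scores.
    A token opens a new entity when its boundary type is BEGIN, when it is
    the first token, or when its entity type differs from the previous
    entity's type (the last rule is an assumption of this sketch).
    """
    groups: List[Tuple[str, List[str]]] = []
    for token, e_scores, b_scores in zip(tokens, entity_scores, boundary_scores):
        entity_type = best_label(e_scores)
        boundary_type = best_label(b_scores)
        starts_new = (
            boundary_type == BEGIN
            or not groups
            or groups[-1][0] != entity_type
        )
        if starts_new:
            groups.append((entity_type, [token]))
        else:
            groups[-1][1].append(token)
    return [(entity_type, " ".join(parts)) for entity_type, parts in groups]


# Illustrative input: a flattened contact line with made-up scores.
tokens = ["Jane", "Doe", "VP", "Sales", "555-0100"]
entity_scores = [
    {"NAME": 0.90, "TITLE": 0.05, "PHONE": 0.05},
    {"NAME": 0.85, "TITLE": 0.10, "PHONE": 0.05},
    {"NAME": 0.10, "TITLE": 0.80, "PHONE": 0.10},
    {"NAME": 0.30, "TITLE": 0.60, "PHONE": 0.10},
    {"NAME": 0.02, "TITLE": 0.03, "PHONE": 0.95},
]
boundary_scores = [
    {"BEGIN": 0.8, "CONTINUE": 0.2},
    {"BEGIN": 0.1, "CONTINUE": 0.9},
    {"BEGIN": 0.7, "CONTINUE": 0.3},
    {"BEGIN": 0.2, "CONTINUE": 0.8},
    {"BEGIN": 0.9, "CONTINUE": 0.1},
]

result = identify_entities(tokens, entity_scores, boundary_scores)
print(result)
```

With these made-up scores, "Jane" and "Doe" merge into one NAME entity because "Doe" is labeled as a continuation, while "VP" opens a new TITLE entity because its boundary type is BEGIN.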