Patent attributes
Disclosed is a system comprising a data processing arrangement including processors configured to: receive sentences from an unlabeled training data set; tokenize, using a tokenizer module, the sentences to obtain tokens; generate character-level features for each character of the tokens of the sentences; generate a token-level feature for each token of the sentences, wherein the token-level feature of a token in a sentence is identified using the token coordinates of the token and the token coordinates of the tokens neighboring the token in the sentence; train an artificial neural network adapted to identify entities in sentences to determine a first trend set, wherein the training is based on the received sentences, the character-level features for each character of the tokens of the sentences, and the token-level feature for each token of the sentences; train the artificial neural network on a set of labeled data to determine a second trend set; and identify, using an identifier module, an entity in text content, wherein the identifier module uses the first trend set and the second trend set determined by the artificial neural network.
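
The tokenization and feature-generation steps recited above can be illustrated with a minimal sketch. It is not the disclosed implementation. The abstract does not define "token coordinates", so the sketch assumes they are positions of tokens in a two-dimensional embedding space, supplied as a lookup table. The function names (`tokenize`, `char_features`, `token_feature`), the three character flags, the averaging rule, and the `window` parameter are all hypothetical choices made for illustration.

```python
import re
from typing import Dict, List, Tuple

# Assumption: a token coordinate is a point in a 2-D embedding space.
Vector = Tuple[float, float]


def tokenize(sentence: str) -> List[str]:
    """Split a sentence into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def char_features(token: str) -> List[Tuple[int, int, int]]:
    """Return one (is_upper, is_digit, is_alpha) triple per character."""
    return [(int(c.isupper()), int(c.isdigit()), int(c.isalpha())) for c in token]


def token_feature(
    tokens: List[str],
    index: int,
    coords: Dict[str, Vector],
    window: int = 1,
) -> Vector:
    """Average the coordinates of a token and its neighbors within the window.

    Tokens missing from the lookup table fall back to the origin (assumption).
    """
    lo = max(0, index - window)
    hi = min(len(tokens), index + window + 1)
    unknown: Vector = (0.0, 0.0)
    points = [coords.get(t, unknown) for t in tokens[lo:hi]]
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


# Hypothetical coordinates, chosen only to make the arithmetic easy to follow.
COORDS: Dict[str, Vector] = {
    "Aspirin": (1.0, 0.0),
    "treats": (0.0, 1.0),
    "pain": (2.0, 2.0),
}

tokens = tokenize("Aspirin treats pain.")
features = [token_feature(tokens, i, COORDS) for i in range(len(tokens))]
```

In this sketch each token-level feature depends on the token's context, so the same word can receive different features in different sentences. Features of this kind, together with the character-level features, would then be passed to the artificial neural network for the unlabeled and labeled training stages.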