Patent attributes
A method, system, and computer program product for generating a training data set for image segmentation applications. A set of input documents of a first format is provided, each input document comprising one or more pages. The input documents are split into individual document pages, and the individual document pages are parsed. Parsing comprises identifying a predefined set of items, including position information giving the position of each identified item within the individual document page. A bitmap image of a second format is generated for each individual document page of the first format. The bitmap image comprises a predefined number of pixels. A mask is generated for each individual document page. The mask comprises the same predefined number of pixels as the corresponding bitmap image. Generating the mask comprises assigning an encoded class label to each pixel of the mask based on the position information of the identified items of the predefined set of items.
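The mask-generation step can be illustrated with a minimal sketch. It is not the claimed implementation: it assumes that parsing has already produced, for one page, a list of items with axis-aligned bounding boxes in page units (origin at the top-left), and that the class names and integer codes in `CLASS_LABELS` are hypothetical, since the abstract does not specify them. The function `make_mask` and its parameters are likewise illustrative names.

```python
import numpy as np

# Hypothetical class encoding; the abstract does not name classes or codes.
CLASS_LABELS = {"background": 0, "text": 1, "table": 2, "figure": 3}


def make_mask(items, page_size, image_size):
    """Build a per-pixel class mask for one document page.

    items:      list of (class_name, (x0, y0, x1, y1)) in page units,
                with the origin assumed at the top-left corner.
    page_size:  (page_width, page_height) in page units.
    image_size: (width, height) of the bitmap image in pixels.
    """
    page_w, page_h = page_size
    img_w, img_h = image_size
    # Scale factors from page units to pixel coordinates.
    sx, sy = img_w / page_w, img_h / page_h

    # The mask has the same pixel dimensions as the bitmap image.
    mask = np.full((img_h, img_w), CLASS_LABELS["background"], dtype=np.uint8)

    for name, (x0, y0, x1, y1) in items:
        # Convert the bounding box to pixel indices, clipped to the image.
        c0 = max(0, int(np.floor(x0 * sx)))
        r0 = max(0, int(np.floor(y0 * sy)))
        c1 = min(img_w, int(np.ceil(x1 * sx)))
        r1 = min(img_h, int(np.ceil(y1 * sy)))
        # Later items overwrite earlier ones where boxes overlap.
        mask[r0:r1, c0:c1] = CLASS_LABELS[name]
    return mask


# Example: a 612 x 792 page rendered to a 306 x 396 pixel bitmap.
items = [
    ("text", (100, 100, 300, 200)),
    ("table", (200, 400, 500, 600)),
]
mask = make_mask(items, page_size=(612, 792), image_size=(306, 396))
```

In this sketch each bitmap image and its mask would form one training pair. How overlapping items are resolved (here, the last item wins) is a design choice the abstract leaves open.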