Patent attributes
Techniques are disclosed for clustering text. The techniques may be employed to cluster text blocks that are received in either sequential reading order or arbitrary order. A methodology implementing the techniques according to an embodiment includes receiving text blocks comprising elements that may include one or more of glyphs, characters, and words. The method further includes determining whether the received text blocks are in arbitrary order or in sequential reading order. Text blocks received in sequential reading order progress from left to right and from top to bottom for horizontally oriented text, and from top to bottom and from left to right for vertically oriented text. The method further includes performing z-order text clustering in response to determining that the received text blocks are in sequential reading order, and performing sorted order text clustering in response to determining that the received text blocks are not in sequential reading order.
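
The order determination and the resulting choice of clustering method can be illustrated with a minimal Python sketch. It is not the disclosed implementation: the abstract does not state how the order is determined or how either clustering method operates, so only the dispatch is shown. The names TextBlock, is_sequential_reading_order, choose_clustering, and the tolerance parameter tol are hypothetical, and the sketch assumes each block is described by the coordinates of its top-left corner, with y increasing downward.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TextBlock:
    """Hypothetical text block: its text plus the top-left corner position."""
    text: str
    x: float  # left edge
    y: float  # top edge, assumed to grow downward


def is_sequential_reading_order(blocks: List[TextBlock],
                                vertical: bool = False,
                                tol: float = 2.0) -> bool:
    """Return True if every block follows its predecessor in reading order.

    Horizontal text: left to right within a line, then top to bottom.
    Vertical text: top to bottom within a column, then left to right.
    `tol` is an assumed tolerance for treating two blocks as being on the
    same line (or in the same column).
    """
    for prev, cur in zip(blocks, blocks[1:]):
        if vertical:
            same_line = abs(cur.x - prev.x) <= tol   # same column
            advances_in_line = cur.y >= prev.y       # moves downward
            starts_next_line = cur.x > prev.x        # next column to the right
        else:
            same_line = abs(cur.y - prev.y) <= tol   # same line
            advances_in_line = cur.x >= prev.x       # moves rightward
            starts_next_line = cur.y > prev.y        # next line below
        if same_line:
            if not advances_in_line:
                return False
        elif not starts_next_line:
            return False
    return True


def choose_clustering(blocks: List[TextBlock], vertical: bool = False) -> str:
    """Pick the clustering method named in the abstract for the given input."""
    if is_sequential_reading_order(blocks, vertical):
        return "z-order"
    return "sorted-order"


# Two blocks on the first line, one on the second line.
a = TextBlock("Hello", 0.0, 0.0)
b = TextBlock("world", 50.0, 0.0)
c = TextBlock("Next", 0.0, 20.0)

in_order = choose_clustering([a, b, c])
shuffled = choose_clustering([c, a, b])
```

With this input, the blocks given as a, b, c are classified as sequential and routed to z-order clustering, while the shuffled list c, a, b is routed to sorted order clustering.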