Patent attributes
One embodiment provides a method for clustering documents based upon a structure of each of the documents, including: receiving, at a device utilizing a machine-learning model, at least one document, each document including a plurality of characters and having a structure; converting, for each of the at least one document, each of the plurality of characters to one of a plurality of character representations, wherein the converting includes identifying an attribute of a character and selecting a character representation corresponding to the attribute; producing at least one array for each of the at least one document, wherein the at least one array includes the plurality of characters converted to the character representations; and clustering the at least one document into document clusters having similar structures by grouping the at least one array into groups of arrays having similarities, wherein each document cluster includes documents corresponding to the arrays within one of the groups of arrays.
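
The steps recited above (converting characters by attribute, producing an array per document, grouping similar arrays) can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the source: the attribute classes (uppercase, lowercase, digit, whitespace, other), the representation symbols, the function names, the similarity threshold, and the use of Python's `difflib.SequenceMatcher` with a greedy grouping pass as a simple stand-in for the machine-learning model the embodiment refers to.

```python
from difflib import SequenceMatcher


def char_representation(ch: str) -> str:
    """Select a representation from an attribute of the character.

    The five attribute classes and their symbols are hypothetical.
    """
    if ch.isupper():
        return "A"  # uppercase letter
    if ch.isalpha():
        return "a"  # any other letter
    if ch.isdigit():
        return "9"  # digit
    if ch.isspace():
        return "_"  # space, tab, newline
    return "#"      # punctuation and other symbols


def to_array(document: str) -> list[str]:
    """Produce one array holding the converted characters of a document."""
    return [char_representation(ch) for ch in document]


def similarity(a: list[str], b: list[str]) -> float:
    """Similarity of two arrays in [0, 1] (stand-in metric, an assumption)."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def cluster_documents(documents: list[str], threshold: float = 0.8) -> list[list[int]]:
    """Greedily group documents whose arrays are similar.

    Returns clusters as lists of document indices. Each array is compared
    with the first member of every existing cluster; this is a simple
    heuristic, not the clustering model described in the source.
    """
    arrays = [to_array(doc) for doc in documents]
    clusters: list[list[int]] = []
    for i, arr in enumerate(arrays):
        for cluster in clusters:
            if similarity(arrays[cluster[0]], arr) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])  # no similar group found: start a new one
    return clusters


if __name__ == "__main__":
    docs = [
        "Invoice 1234\nTotal: $56.00",
        "Invoice 9876\nTotal: $12.50",
        "Dear Bob, thanks for the note!",
    ]
    print(cluster_documents(docs))
```

Because the two invoices differ only in their digits, they convert to identical arrays and fall into the same cluster, while the letter, whose array has a different shape, forms its own. In this sketch the content of a document does not affect grouping; only the pattern of character classes does.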