Patent attributes
First and second trees are generated, each having entities identified by hexadecimal values. First files are allocated across the first tree based on hashes of the first files and the hexadecimal values of the first tree entities. First index values are calculated for the first tree entities using hashes of the first files that have been allocated to entities branching into a lower level of the first tree. Second files are allocated across the second tree based on hashes of the second files and the hexadecimal values of the second tree entities. Second index values are calculated for the second tree entities using hashes of the second files that have been allocated to entities branching into a lower level of the second tree. To measure similarity between the first and second trees, the number of entities having matching index values in both trees is determined.
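
The comparison described above can be illustrated with a minimal Python sketch. It is not the claimed implementation, and several details are assumptions made only for illustration: SHA-256 as the file hash, a fixed tree depth of two levels below the root, entity identifiers taken as hexadecimal prefixes of the file hash, and an index value formed by hashing the sorted file hashes found beneath an entity. The function names (`file_hash`, `allocate`, `index_values`, `matching_entities`) are hypothetical. Entities that receive no files are simply omitted from the sketch.

```python
import hashlib
from collections import defaultdict


def file_hash(content: bytes) -> str:
    """Hash a file's content (assumption: SHA-256, hex-encoded)."""
    return hashlib.sha256(content).hexdigest()


def allocate(files, depth=2):
    """Allocate files to leaf entities.

    Assumption: a leaf entity is identified by the first `depth`
    hexadecimal digits of the file hash.
    """
    leaves = defaultdict(list)
    for content in files:
        h = file_hash(content)
        leaves[h[:depth]].append(h)
    return leaves


def index_values(leaves, depth=2):
    """Compute one index value per non-empty entity.

    Each entity (a hex prefix of length 0..depth, where "" is the root)
    collects the hashes of all files allocated at or below it. Its index
    value is the SHA-256 of those hashes, sorted and concatenated, so the
    result does not depend on the order in which files were added.
    """
    buckets = defaultdict(list)
    for leaf_id, hashes in leaves.items():
        for level in range(depth + 1):
            buckets[leaf_id[:level]].extend(hashes)
    return {
        entity_id: hashlib.sha256("".join(sorted(hs)).encode()).hexdigest()
        for entity_id, hs in buckets.items()
    }


def matching_entities(index_a, index_b):
    """Count entities whose index values are equal in both trees."""
    return sum(1 for eid, value in index_a.items() if index_b.get(eid) == value)


if __name__ == "__main__":
    first_files = [b"alpha", b"beta", b"gamma"]
    second_files = [b"alpha", b"beta", b"delta"]

    first_index = index_values(allocate(first_files))
    second_index = index_values(allocate(second_files))

    matches = matching_entities(first_index, second_index)
    print(f"{matches} matching entities out of {len(first_index)} in the first tree")
```

Under these assumptions, two identical file sets produce matching index values at every entity, while a single differing file changes the index values only along the path from its leaf entity up to the root. The match count therefore falls as the file sets diverge, which is what makes it usable as a similarity measure.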