Patent attributes
Systems, methods, and software can be used to cluster software codes in a scalable manner. In some aspects, a computer-implemented method comprises: obtaining a plurality of software samples; computing one or more first hash results for each of the plurality of software samples; computing one or more second hash results for each of the plurality of software samples based on the one or more first hash results, wherein an amount of the one or more second hash results is less than an amount of the one or more first hash results; determining a similarity output based on the one or more second hash results of two of the plurality of software samples; and clustering the plurality of software samples based on the similarity output to generate one or more software sample clusters.