Patent attributes
A method includes preparing a representation of data associated with a plurality of software modules, the representation comprising similarity-based hashing of signatures constructed from a first subset of features of the plurality of software modules. The method also includes performing a similarity-based query utilizing the similarity-based hashing of signatures to identify one or more of the plurality of software modules as candidate software modules matching a received seed software module. The method further includes computing distances between the candidate software modules and the seed software module utilizing a second subset of features of the plurality of software modules, classifying one or more of the candidate software modules as a designated type based on the computed distances, generating a notification comprising a list of the classified candidate software modules, and controlling access by one or more client devices associated with an enterprise to the candidate software modules in the list.
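The flow described above can be illustrated with a minimal sketch. The abstract does not name a particular hashing scheme, distance metric, or feature set, so everything concrete here is an assumption made for illustration: MinHash signatures with banded locality-sensitive hashing stand in for the "similarity-based hashing of signatures", Euclidean distance stands in for the computed distances, and the feature names, the threshold, and all function names (`build_index`, `find_candidates`, `classify_candidates`, `build_notification`, `is_access_allowed`) are hypothetical.

```python
import hashlib
from collections import defaultdict

NUM_HASHES = 32  # assumed signature length
BANDS = 16       # assumed number of LSH bands (2 signature rows per band)


def _stable_hash(seed: int, token: str) -> int:
    """Deterministic 64-bit hash of a token under a given hash-function index."""
    digest = hashlib.sha256(f"{seed}:{token}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def minhash(tokens):
    """MinHash signature of a non-empty set of string features (first subset)."""
    return tuple(min(_stable_hash(i, t) for t in tokens) for i in range(NUM_HASHES))


def _band_keys(signature):
    """Split a signature into bands; each band is one hash-table key."""
    rows = NUM_HASHES // BANDS
    return [(b, signature[b * rows:(b + 1) * rows]) for b in range(BANDS)]


def build_index(modules):
    """Prepare the representation: band key -> names of modules sharing that band."""
    index = defaultdict(set)
    for name, feats in modules.items():
        for key in _band_keys(minhash(feats["first"])):
            index[key].add(name)
    return index


def find_candidates(index, seed):
    """Similarity-based query: modules colliding with the seed in at least one band."""
    found = set()
    for key in _band_keys(minhash(seed["first"])):
        found |= index.get(key, set())
    return found


def distance(a, b):
    """Euclidean distance over numeric features (second subset); missing keys count as 0."""
    keys = a.keys() | b.keys()
    return sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys) ** 0.5


def classify_candidates(modules, index, seed, threshold):
    """Keep candidates whose second-subset distance to the seed is within the threshold."""
    flagged = []
    for name in find_candidates(index, seed):
        d = distance(modules[name]["second"], seed["second"])
        if d <= threshold:
            flagged.append((name, d))
    return sorted(flagged, key=lambda item: item[1])


def build_notification(flagged, designated_type):
    """Notification payload listing the classified candidate modules."""
    return {"type": designated_type, "modules": [name for name, _ in flagged]}


def is_access_allowed(module_name, notification):
    """Toy access control: deny client access to any module named in the list."""
    return module_name not in notification["modules"]
```

In this sketch the cheap hashed lookup narrows the full module population to a small candidate set, and the more expensive distance computation runs only on those candidates. A real deployment would choose the band count, signature length, and distance threshold to balance missed matches against false positives.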