Patent attributes
Behavior report generation monitors the behavior of unknown sample files executing in a sandbox. Behaviors are encoded and feature vectors created based upon a q-gram for each sample. Prototypes extraction includes extracting prototypes from the training set of feature vectors using a clustering algorithm. Once prototypes are identified in this training process, the prototypes with unknown labels are reviewed by domain experts who add a label to each prototype. A K-Nearest Neighbor Graph is used to merge prototypes into fewer prototypes without using a fixed distance threshold and then assigning a malware family name to each remaining prototype. An input unknown sample can be classified using the remaining prototypes and using a fixed distance. For the case that no such prototype is close enough, the behavior report of a sample is rejected and tagged as an unknown sample or that of an emerging malware family.