Patent 12067114 was granted and assigned to CrowdStrike on August, 2024 by the United States Patent and Trademark Office.
Training and use of a byte n-gram embedding model is described herein. A neural network is trained to determine a probability of occurrence associated with a byte n-gram. The neural network includes one or more embedding model layers, at least one of which is configured to output an embedding array of values. The byte n-gram embedding model may be used to generate a hash of received data, to classify the received data with no knowledge of a data structure associated with the received data, to compare the received data to files having a known classification, and/or to generate a signature for the received data.