Patent attributes
Disclosed herein are methods, systems, and processes for context-based identification of anomalous log data. Log data with multiple original logs is received at an anomalous log data identification system. A context associated training dataset is generated by splitting a string in a log into multiple split strings, generating a context association between each split string and a unique key that corresponds to the log, and generating an input/output (I/O) string data batch that includes I/O string data for each split string in the log by training each split string against every other split string in the log. A context-based anomalous log data identification model is then trained according to a machine learning technique using the I/O string data batch that includes a list of unique strings in the context associated training dataset. The training tunes the context-based anomalous log data identification model to classify or cluster a vector associated with a new string in a new log that is not part of the multiple original logs as anomalous.