Patent attributes
An analytics system creates a data structure that counts strings of methylation state vectors from a healthy control group. Given a sample fragment from a subject, the analytics system enumerates the possible methylation state vectors for that fragment and calculates a probability for each possibility using a Markov chain model. The analytics system generates a p-value score for the subject's test methylation state vector by summing the calculated probabilities that are less than or equal to the probability calculated for the possibility matching the test methylation state vector. The analytics system determines that the test methylation state vector is anomalously methylated compared to the healthy control group if the p-value score is below a threshold score. Given a number of such sample fragments, the analytics system can filter them based on their p-value scores. The analytics system can then run a classification model on the filtered set to predict whether the subject has cancer.
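The p-value calculation described above can be illustrated with a minimal sketch. It is not the claimed implementation: the function names (build_counts, markov_probability, p_value), the "M"/"U" string encoding of methylation states, the first-order Markov chain, and the pseudocount of 1 are all assumptions made for illustration, since the text does not specify the chain order or any smoothing scheme. The sketch also assumes every control vector covers the same CpG sites as the test vector.

```python
from collections import Counter
from itertools import product

STATES = "MU"  # assumed encoding: M = methylated, U = unmethylated


def build_counts(control_vectors):
    """Count first states and position-specific adjacent state pairs
    across the healthy control vectors (the counting data structure)."""
    first = Counter()
    pairs = Counter()
    for vec in control_vectors:
        first[vec[0]] += 1
        for i in range(1, len(vec)):
            pairs[(i, vec[i - 1], vec[i])] += 1
    return first, pairs


def markov_probability(vec, first, pairs, pseudo=1.0):
    """First-order Markov chain probability of one state vector.
    The pseudocount (an assumption) avoids zero probabilities."""
    denom = sum(first[s] for s in STATES) + pseudo * len(STATES)
    p = (first[vec[0]] + pseudo) / denom
    for i in range(1, len(vec)):
        prev = vec[i - 1]
        denom = sum(pairs[(i, prev, s)] for s in STATES) + pseudo * len(STATES)
        p *= (pairs[(i, prev, vec[i])] + pseudo) / denom
    return p


def p_value(test_vec, first, pairs):
    """Sum the probabilities of all possible vectors that are no more
    likely than the test vector."""
    p_test = markov_probability(test_vec, first, pairs)
    total = 0.0
    for candidate in product(STATES, repeat=len(test_vec)):
        p = markov_probability("".join(candidate), first, pairs)
        if p <= p_test * (1 + 1e-12):  # small tolerance for float ties
            total += p
    return total


# Toy control group (fabricated purely for illustration).
controls = ["MMMM"] * 9 + ["MMMU"]
first, pairs = build_counts(controls)

THRESHOLD = 0.05  # hypothetical threshold score
fragments = ["MMMM", "UMUM"]
anomalous = [f for f in fragments if p_value(f, first, pairs) < THRESHOLD]
```

With this toy control group, a fragment matching the dominant control pattern receives a p-value score near 1, while a pattern rarely seen in controls falls below the hypothetical threshold and is retained in the filtered set. Enumerating all possibilities grows as 2^n in the number of CpG sites, so a practical system would likely cap the vector length or window the fragment; that limit is not stated in the text above.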