Patent attributes
Systems, methods, and apparatuses are provided for generating and using machine learning models based on genetic data. A set of input features can be identified and used to train a machine learning model on training samples, e.g., samples for which one or more labels are known. As examples, the input features can include aligned variables (e.g., derived from sequences aligned to population-level or individual references) and/or non-aligned variables (e.g., sequence content). The features can be classified into different groups based on the underlying genetic data or on intermediate values resulting from processing the underlying genetic data. Features can be selected from a feature space to create a feature vector for training a model. The selection of features and the creation of feature vectors can be performed iteratively to train many models as part of a search for optimal features and an optimal model.
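The iterative search described above can be illustrated with a minimal sketch. It is not the patented method. It assumes a random search over feature subsets, a nearest-centroid classifier as a stand-in for the machine learning model, and synthetic data. The feature names (coverage_bin_1, gc_content, and so on), the group labels, and all helper functions are hypothetical and chosen only for illustration.

```python
import random
from statistics import mean

# Hypothetical feature space, grouped by how each feature was derived.
FEATURE_GROUPS = {
    "aligned": ["coverage_bin_1", "coverage_bin_2"],
    "non_aligned": ["gc_content", "kmer_AA_freq"],
}

def synthetic_samples(n, rng):
    """Generate labeled toy samples; only two features carry signal (assumption)."""
    samples, labels = [], []
    for i in range(n):
        label = i % 2
        samples.append({
            "coverage_bin_1": rng.gauss(1.0 + 0.5 * label, 0.2),
            "coverage_bin_2": rng.gauss(1.0, 0.2),
            "gc_content": rng.gauss(0.40 + 0.05 * label, 0.02),
            "kmer_AA_freq": rng.gauss(0.10, 0.02),
        })
        labels.append(label)
    return samples, labels

def make_vector(sample, features):
    """Create a feature vector from the selected features."""
    return [sample[f] for f in features]

def train_centroids(rows, labels):
    """Stand-in model: one mean vector (centroid) per label."""
    centroids = {}
    for label in sorted(set(labels)):
        members = [r for r, l in zip(rows, labels) if l == label]
        centroids[label] = [mean(col) for col in zip(*members)]
    return centroids

def predict(centroids, row):
    """Assign the label whose centroid is closest in squared distance."""
    def dist(label):
        return sum((a - b) ** 2 for a, b in zip(row, centroids[label]))
    return min(centroids, key=dist)

def accuracy(centroids, rows, labels):
    hits = sum(predict(centroids, r) == l for r, l in zip(rows, labels))
    return hits / len(rows)

def search(train, valid, n_iter=20, seed=0):
    """Train one model per randomly selected feature subset; keep the best."""
    (train_samples, train_labels), (valid_samples, valid_labels) = train, valid
    rng = random.Random(seed)
    feature_space = [f for group in FEATURE_GROUPS.values() for f in group]
    best_score, best_features, best_model = -1.0, None, None
    for _ in range(n_iter):
        k = rng.randint(1, len(feature_space))
        features = sorted(rng.sample(feature_space, k))
        model = train_centroids(
            [make_vector(s, features) for s in train_samples], train_labels)
        score = accuracy(
            model, [make_vector(s, features) for s in valid_samples], valid_labels)
        if score > best_score:
            best_score, best_features, best_model = score, features, model
    return best_score, best_features, best_model

rng = random.Random(0)
train = synthetic_samples(60, rng)
valid = synthetic_samples(40, rng)
best_score, best_features, best_model = search(train, valid)
print(best_score, best_features)
```

Scoring each candidate on held-out samples rather than on the training samples keeps the search from simply rewarding larger feature vectors. A practical system would likely replace the random subset selection and the centroid model with whatever selection strategy and model family are under evaluation.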