Patent attributes
Computational methods for large-scale scaffolding of a genome assembly are provided. Such methods may include a step of applying a location clustering model to a test set of contigs to form two or more location cluster groups, each location cluster group comprising one or more location-clustered contigs; a step of applying an ordering model to each of the two or more location cluster groups to form an ordered set of one or more location-clustered contigs within each location cluster group; and a step of applying an orienting model to each ordered set of one or more location-clustered contigs to assign a relative orientation to each of the location-clustered contigs within each location cluster group. In some aspects, the test set of contigs is generated by aligning a set of reads produced by a chromosome conformation analysis technique (e.g., Hi-C) with a draft assembly, a reference assembly, or both.
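
The three steps described above (clustering, ordering, orienting) can be illustrated with a minimal sketch. It is not the claimed method: the abstract does not specify the models, so the sketch substitutes simple stand-ins, namely thresholded connected components for the location clustering model, a greedy chain for the ordering model, and a two-state dynamic program for the orienting model. The function names, the `min_links` threshold, the `head`/`tail` end labels, and the toy link counts are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

def contig_links(end_links):
    """Sum end-level link counts into symmetric contig-level counts."""
    totals = defaultdict(int)
    for (a, _, b, _), n in end_links.items():
        totals[frozenset((a, b))] += n
    return totals

def cluster(contigs, links, min_links):
    """Stand-in clustering: connected components over pairs with >= min_links."""
    parent = {c: c for c in contigs}
    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c
    for pair, n in links.items():
        if n >= min_links:
            a, b = tuple(pair)
            parent[find(a)] = find(b)
    groups = defaultdict(list)
    for c in contigs:
        groups[find(c)].append(c)
    return sorted(sorted(g) for g in groups.values())

def order(group, links):
    """Stand-in ordering: grow a chain from the strongest pair, extending either end."""
    if len(group) < 3:
        return list(group)
    def w(a, b):
        return links.get(frozenset((a, b)), 0)
    pairs = [(x, y) for x in group for y in group if x < y]
    a, b = max(pairs, key=lambda p: w(*p))
    chain = [a, b]
    rest = sorted(set(group) - {a, b})
    while rest:
        left = max(rest, key=lambda c: w(c, chain[0]))
        right = max(rest, key=lambda c: w(c, chain[-1]))
        if w(left, chain[0]) > w(right, chain[-1]):
            chain.insert(0, left)
            rest.remove(left)
        else:
            chain.append(right)
            rest.remove(right)
    return chain

def orient(chain, end_links):
    """Stand-in orienting: pick '+'/'-' per contig to maximise links between facing ends."""
    def n(a, ea, b, eb):
        return end_links.get((a, ea, b, eb), 0) + end_links.get((b, eb, a, ea), 0)
    right = {"+": "tail", "-": "head"}  # end facing the next contig
    left = {"+": "head", "-": "tail"}   # end facing the previous contig
    best = {"+": (0, ["+"]), "-": (0, ["-"])}
    for a, b in zip(chain, chain[1:]):
        best = {
            ob: max(
                ((score + n(a, right[oa], b, left[ob]), path + [ob])
                 for oa, (score, path) in best.items()),
                key=lambda t: t[0],
            )
            for ob in "+-"
        }
    return max(best.values(), key=lambda t: t[0])[1]

def scaffold(contigs, end_links, min_links=5):
    """Run the three steps in sequence; returns one oriented chain per cluster group."""
    links = contig_links(end_links)
    result = []
    for group in cluster(contigs, links, min_links):
        chain = order(group, links)
        result.append(list(zip(chain, orient(chain, end_links))))
    return result

# Hypothetical toy input: Hi-C-style link counts between contig ends.
EXAMPLE_CONTIGS = ["c1", "c2", "c3", "c4", "c5"]
EXAMPLE_END_LINKS = {
    ("c1", "tail", "c2", "tail"): 30,
    ("c1", "tail", "c2", "head"): 5,
    ("c2", "head", "c3", "head"): 25,
    ("c2", "tail", "c3", "tail"): 4,
    ("c1", "tail", "c3", "head"): 6,
    ("c4", "tail", "c5", "head"): 20,
    ("c3", "tail", "c4", "head"): 1,   # weak inter-group noise
}
result = scaffold(EXAMPLE_CONTIGS, EXAMPLE_END_LINKS)
```

On this toy input the weak `c3`/`c4` link falls below the threshold, so the contigs separate into two groups, and each group is then ordered and oriented independently. Orientations are relative: reversing a chain and flipping every sign describes the same scaffold.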