Methods and systems for processing a plurality of sample reads for genome sequencing include, for each sample read of the plurality of sample reads, comparing substring sequences from the sample read to reference sequences representing different portions of a reference genome. One or more reference sequences are identified that match one or more of the compared substring sequences, and a probabilistic location within the reference genome is determined for the sample read based on the one or more identified reference sequences. The reference genome is partitioned for reference-aligned genome sequencing based on the determined probabilistic locations of the respective sample reads.
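The flow described above can be illustrated with a minimal sketch. It is not the claimed implementation, and several choices are assumptions made only for illustration: the reference is divided into fixed-size bins that stand in for the "different portions", substrings are fixed-length k-mers matched exactly, the probabilistic location of a read is taken as the share of k-mer hits falling in each bin, and the reference is partitioned into contiguous groups of bins by a greedy balance of expected read load. All function and parameter names (`build_kmer_index`, `probabilistic_location`, `partition_reference`, `k`, `bin_size`) are hypothetical.

```python
from collections import Counter, defaultdict


def build_kmer_index(reference, k, bin_size):
    """Map each k-mer of the reference to the set of bins (portions) it starts in."""
    index = defaultdict(set)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].add(pos // bin_size)
    return index


def probabilistic_location(read, index, k):
    """Return {bin: probability}, assuming probability = share of k-mer hits per bin."""
    hits = Counter()
    for pos in range(len(read) - k + 1):
        # .get avoids creating empty entries for k-mers absent from the reference
        for bin_id in index.get(read[pos:pos + k], ()):
            hits[bin_id] += 1
    total = sum(hits.values())
    return {b: n / total for b, n in hits.items()} if total else {}


def partition_reference(locations, num_bins, num_partitions):
    """Greedily group contiguous bins so each partition carries a similar expected load."""
    load = [0.0] * num_bins
    for loc in locations:
        for bin_id, prob in loc.items():
            load[bin_id] += prob
    target = sum(load) / num_partitions
    partitions, current, acc = [], [], 0.0
    for bin_id in range(num_bins):
        current.append(bin_id)
        acc += load[bin_id]
        # Close the partition once it reaches the target, keeping the last one open
        if acc >= target and len(partitions) < num_partitions - 1:
            partitions.append(current)
            current, acc = [], 0.0
    if current:
        partitions.append(current)
    return partitions


# Toy usage: a 16-base reference split into four 4-base bins.
reference = "AAAACCCCGGGGTTTT"
index = build_kmer_index(reference, k=4, bin_size=4)
reads = ["CCCCG", "ACCCC", "GGGTT"]
locations = [probabilistic_location(r, index, k=4) for r in reads]
partitions = partition_reference(locations, num_bins=4, num_partitions=2)
```

In this sketch a read that straddles two bins, such as `"ACCCC"`, receives a split probability across both, so its weight is shared between them when the partition loads are balanced. A practical system would likely tolerate mismatches and weight repeated k-mers differently; those refinements are outside what the text above specifies.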