SBIR/STTR Award attributes
ABSTRACT There is growing interest in the therapeutic application of phage to treat antibiotic-resistant infections and gut microbiome-related disorders. Phage therapies have the advantage of potentially extreme specificity for their targets, leading to far fewer off-target side effects than traditional antibiotic therapy. However, identifying phage that target an organism of interest and determining their host range remain technical challenges. Host assignment for a phage typically requires laboratory culture of the organism of interest, which is a significant barrier when targeting organisms that are difficult to culture and which introduces significant biases into the existing phage-host knowledge base. As with antibiotics, organisms may also acquire resistance to phage transduction, limiting the utility of a single phage to treat an infection over time. For these reasons, it would be highly beneficial to be able to efficiently identify phage with potentially therapeutic targets from an uncultured population of microbes. In this application, we propose to develop a machine learning-based platform for the identification and assignment of phage and their hosts from metagenomic whole-genome sequencing (WGS) data. Our approach leverages the unique ability of proximity ligation sequencing, or Hi-C, to efficiently gather direct physical evidence of phage-host associations from mixed microbial communities. We propose to use this technology to assemble a large-scale, high-quality phage-host interaction dataset from human fecal samples, use that dataset to train a machine learning model to predict phage-host relationships from existing WGS data, and provide a convenient platform where users can submit metagenomic reads and receive phage-host information. 
This approach would enable the identification, from both existing and future WGS data sets, of phage and combinations of phage that simultaneously target organisms that are otherwise intractable through standard clinical methods.