Systems Genomics Modeling of Multi-drug Resistance in Mycobacterium tuberculosis

This project improves understanding and prediction of drug-resistant tuberculosis, supporting better public health interventions and identification of novel resistance mechanisms.
Multi-drug resistance (MDR) in tuberculosis (TB) is a growing global health challenge, particularly in developing nations such as Nepal that carry a high TB burden. Drug-resistant TB, caused by mutations in the Mycobacterium tuberculosis (Mtb) genome, is increasing rapidly. The genetic basis of drug resistance in TB is not fully understood, and existing catalogs of Mtb resistance mutations fail to explain many cases of drug-resistant TB. Identifying new resistance mutations is challenging because of Mtb's genetic diversity and complex resistance mechanisms.
Our goal is to develop a machine learning method that predicts drug-resistant TB from a large pool of TB genomic data. We also aim to explore the metabolic adaptations underlying drug-resistant TB using genome-scale metabolic modeling.
This project developed a scalable bioinformatics pipeline to process over 5,000 Mycobacterium tuberculosis genomes from clinical isolates worldwide. The pipeline robustly identifies mutations associated with drug resistance through integrated variant calling. An extensive evaluation of existing state-of-the-art deep learning methods revealed their heavy reliance on a limited set of well-studied genes and limited capability to discover new resistance-associated genes. This work provides a critical foundation for improved understanding of multi-drug resistance mechanisms and the discovery of novel genetic markers for drug-resistant tuberculosis.
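One way to make "heavy reliance on a limited set of well-studied genes" concrete is to measure how much of a model's total feature importance falls on genes already known to confer resistance. The sketch below is illustrative only and is not the project's evaluation code: the function name `known_gene_share`, the short gene list, and the toy importance scores are all assumptions made for the example.

```python
from typing import Dict, Set

# Illustrative subset of well-studied Mtb resistance genes.
# Assumption for this sketch; not the gene list used in the project.
KNOWN_GENES: Set[str] = {"rpoB", "katG", "inhA", "embB", "pncA", "gyrA"}


def known_gene_share(importance: Dict[str, float],
                     known: Set[str] = KNOWN_GENES) -> float:
    """Return the fraction of total absolute importance assigned to known genes."""
    total = sum(abs(score) for score in importance.values())
    if total == 0:
        return 0.0  # no signal at all, so no reliance to report
    on_known = sum(abs(score) for gene, score in importance.items()
                   if gene in known)
    return on_known / total


# Toy per-gene importance scores (made up; "geneX"/"geneY" are hypothetical).
toy_importance = {"rpoB": 0.5, "katG": 0.25, "geneX": 0.125, "geneY": 0.125}

share = known_gene_share(toy_importance)
print(f"Share of importance on known genes: {share:.2f}")
```

A share close to 1 would suggest that a model mostly rediscovers cataloged genes, whereas a lower share leaves more room for candidate novel markers. How the per-gene importance is obtained (for example, attribution scores aggregated by gene) is left open here.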