| Title: |
Multi-Label Random Forest Model for Tuberculosis Drug Resistance Classification and Mutation Ranking |
| Authors: |
Kouchaki, S; Yang, Y; Lachapelle, A; Walker, T; Walker, SA; Hoosdally, S; Gibertoni Cruz, AL; Carter, J; Grazian, Clara; Earle, SG; Fowler, P; Iqbal, Z; Hunt, M; Knaggs, J; Smith, GE; Rathod, P; Jarrett, L; Matias, D; Cirillo, DM; Borroni, E; Battaglia, S; Ghodousi, A; Spitaler, A; Cabibbe, A; Tahseen, S; Nilgiriwala, K; Shah, S; Rodrigues, C; Kambli, P; Surve, U; Khot, R; Niemann, S; Merker, M; Hoffmann, H; Todt, K; Plesnik, S; Ismail, N; Omar, SV; Joseph, L; Thwaites, G; Thuong, TNT; Ngoc, NH; Srinivasan, V; Moore, D; Coronel, J; Solano, W; Gao, GF; He, G; Zhao, Y; Liu, C; Ma, A; Zhu, B; Laurenson, I; Claxton, P; Koch, A; Wilkinson, R; Lalvani, A; Posey, J; Gardy, J; Werngren, J; Paton, N; Jou, R; Wu, MH; Lin, WH; Ferrazoli, L; Siqueira de Oliveira, R; Arandjelovic, I; Chaiprasert, A; Comas, I; Roig, CR; Drobniewski, FA; Farhat, MR; Gao, Q; Hee, ROT; Sintchenko, V; Supply, P; van Soolingen, D; Peto, TEA; Crook, D; Clifton, D |
| Source: |
urn:ISSN:1664-302X ; Frontiers in Microbiology, 11, 667 |
| Publisher Information: |
Frontiers Media |
| Publication Year: |
2020 |
| Collection: |
UNSW Sydney (The University of New South Wales): UNSWorks |
| Subject Terms: |
31 Biological Sciences; 3102 Bioinformatics and Computational Biology; Orphan Drug; Infectious Diseases; Rare Diseases; Biodefense; Antimicrobial Resistance; Tuberculosis; Emerging Infectious Diseases; 3 Good Health and Well Being; MLRF; SLRF; drug resistance; mutation ranking; CRyPTIC Consortium; anzsrc-for: 31 Biological Sciences; anzsrc-for: 3102 Bioinformatics and Computational Biology; anzsrc-for: 0502 Environmental Science and Management; anzsrc-for: 0503 Soil Sciences; anzsrc-for: 0605 Microbiology; anzsrc-for: 3107 Microbiology; anzsrc-for: 3207 Medical microbiology |
| Description: |
Resistance prediction and mutation ranking are important tasks in the analysis of tuberculosis sequence data. Due to standard regimens for the use of first-line antibiotics, resistance co-occurrence, in which samples are resistant to multiple drugs, is common. Analysing all drugs simultaneously should therefore enable patterns reflecting resistance co-occurrence to be exploited for resistance prediction. Here, multi-label random forest (MLRF) models are compared with single-label random forest (SLRF) models, both for predicting phenotypic resistance from whole genome sequences and for identifying mutations that improve prediction of resistance to four first-line drugs, in a dataset of 13,402 Mycobacterium tuberculosis isolates. Results confirmed that MLRFs can improve performance compared to conventional clinical methods (by 18.10%) and SLRFs (by 0.91%). In addition, we identified a list of candidate mutations that are important for resistance prediction or that are related to resistance co-occurrence. Moreover, we found that retraining our analysis on a subset of top-ranked mutations was sufficient to achieve satisfactory performance. The source code can be found at http://www.robots.ox.ac.uk/~davidc/code.php. |
| Document Type: |
article in journal/newspaper |
| File Description: |
application/pdf |
| Language: |
unknown |
| Relation: |
https://hdl.handle.net/1959.4/unsworks_67797; https://doi.org/10.3389/fmicb.2020.00667 |
| DOI: |
10.3389/fmicb.2020.00667 |
| Availability: |
https://hdl.handle.net/1959.4/unsworks_67797; https://unsworks.unsw.edu.au/bitstreams/afdf5471-c253-40c1-9ac1-78708ee0b1d0/download; https://doi.org/10.3389/fmicb.2020.00667 |
| Rights: |
open access ; https://purl.org/coar/access_right/c_abf2 ; CC BY ; https://creativecommons.org/licenses/by/4.0/ ; free_to_read |
| Accession Number: |
edsbas.9EF69A6A |
| Database: |
BASE |
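The Description field above rests on the idea of resistance co-occurrence: isolates resistant to one drug are often resistant to others, which is the signal a multi-label model can exploit. The following is a minimal sketch of how that co-occurrence might be measured from a multi-label phenotype matrix. It is not the authors' code. The drug abbreviations (the record names no drugs, so the four standard first-line agents are assumed), the toy label rows, and the helper names `cooccurrence_counts` and `conditional_rate` are all illustrative assumptions.

```python
from itertools import combinations

# Assumed drug order: the record only says "four first-line drugs".
DRUGS = ["INH", "RIF", "EMB", "PZA"]

# Invented toy labels, one row per isolate: 1 = resistant, 0 = susceptible.
labels = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]


def cooccurrence_counts(rows, names):
    """Count isolates resistant to both drugs, for every pair of drugs."""
    counts = {}
    for i, j in combinations(range(len(names)), 2):
        counts[(names[i], names[j])] = sum(1 for r in rows if r[i] and r[j])
    return counts


def conditional_rate(rows, given, target):
    """Estimate P(resistant to target | resistant to given) from the rows.

    Returns None when no isolate is resistant to the `given` drug.
    """
    given_rows = [r for r in rows if r[given]]
    if not given_rows:
        return None
    return sum(r[target] for r in given_rows) / len(given_rows)


pair_counts = cooccurrence_counts(labels, DRUGS)
rif_given_inh = conditional_rate(labels, DRUGS.index("INH"), DRUGS.index("RIF"))

for pair, n in pair_counts.items():
    print(pair, n)
print("P(RIF resistant | INH resistant) =", rif_given_inh)
```

A single-label model trained per drug sees only one column of such a matrix at a time, whereas a multi-label model is fitted to all columns jointly, so dependencies like the conditional rate computed here are available to it.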