Jimmy Royer, Analysis Group
Using Machine Learning Methods to Predict Tuberculosis Treatment Resistance.
Machine-learning algorithms are used to detect complex, often unforeseen patterns within rich datasets. There are two general categories of algorithms: unsupervised and supervised. Supervised machine-learning algorithms, the topic of this presentation, start out with a hypothesis and with outcome categories that are set out in advance. These algorithms are then “trained” on data for which the outcomes of interest are known, with the training process continuing until a desired level of accuracy is achieved. The trained model is then used to make predictions on out-of-sample data for which the outcome of interest is not known. While most statistical models can be viewed as a simpler form of machine-learning algorithm that imposes a pre-determined functional form on the relationship between the predictors and the outcome of interest, more advanced machine-learning algorithms impose much less structure and can therefore detect very complex and intricate relationships in high-dimensional data (i.e., data with many variables, possibly of several different types, including quantitative, text, and image information). Advances are now being made in analyzing the output of these algorithms to permit assessment of the relative importance of each variable. This talk provides an introduction to neural networks, an advanced supervised machine-learning method. The methodology is then applied to lab data from the World Health Organization (WHO) to identify gene mutations that are associated with resistance to tuberculosis treatment and that are amenable to targeted drug therapy.
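
As a rough illustration of the supervised workflow described above (train on data with known outcomes, then predict on out-of-sample data), the following is a minimal sketch of a one-hidden-layer neural network written in NumPy. It is not the model presented in the talk and does not use the WHO data: the binary features standing in for mutation indicators, the labelling rule, the network size, and the learning rate are all assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT the WHO lab data): 400 samples with 10 binary
# features playing the role of hypothetical mutation indicators.
X = rng.integers(0, 2, size=(400, 10)).astype(float)
# Invented labelling rule: "resistant" if features 0 and 1 are both present,
# or if feature 2 is present. The other 7 features are pure noise.
y = ((X[:, 0] * X[:, 1] + X[:, 2]) > 0).astype(float)

# Known-outcome training data and held-out "out-of-sample" data.
X_train, y_train = X[:300], y[:300]
X_test, y_test = X[300:], y[300:]


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# One hidden layer (8 tanh units) followed by a sigmoid output unit.
W1 = rng.normal(0.0, 0.5, size=(10, 8))
b1 = np.zeros(8)
W2 = rng.normal(0.0, 0.5, size=8)
b2 = 0.0

learning_rate = 0.5  # assumed value, not tuned
for epoch in range(2000):
    # Forward pass.
    H = np.tanh(X_train @ W1 + b1)
    p = sigmoid(H @ W2 + b2)

    # Backward pass for the mean cross-entropy loss.
    err = (p - y_train) / len(y_train)      # gradient w.r.t. output logits
    grad_W2 = H.T @ err
    grad_b2 = err.sum()
    dH = np.outer(err, W2) * (1.0 - H**2)   # back through tanh
    grad_W1 = X_train.T @ dH
    grad_b1 = dH.sum(axis=0)

    # Gradient-descent update.
    W1 -= learning_rate * grad_W1
    b1 -= learning_rate * grad_b1
    W2 -= learning_rate * grad_W2
    b2 -= learning_rate * grad_b2


def predict(X_new):
    """Return 0/1 predictions for samples whose outcome is not known."""
    H = np.tanh(X_new @ W1 + b1)
    return (sigmoid(H @ W2 + b2) >= 0.5).astype(float)


test_accuracy = float((predict(X_test) == y_test).mean())
print(f"held-out accuracy: {test_accuracy:.2f}")
```

The held-out accuracy here only indicates that the network recovered an invented rule from synthetic data; it says nothing about performance on real resistance data.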