Predicting Breast Cancer and Prostate Cancer Susceptibility from Single Nucleotide Polymorphisms

Abstract

Large-scale genome-wide profiling with single nucleotide polymorphism (SNP) markers makes it possible to investigate whether these biomarkers can predict genetic risk. Recent computational studies have identified associated genetic variants that explain a fraction of breast cancer and prostate cancer risk. We attempt to build accurate classification models that predict disease susceptibility from human SNPs. We first perform feature selection using logistic regression coupled with a likelihood ratio test, which removes a large number of irrelevant SNPs. We then use a support vector machine (SVM), a supervised learning method, to build the classification models. Our computational results show that this feature selection method effectively selects relevant features for the SVM on the prostate cancer dataset, whereas it provides little benefit to the SVM on the breast cancer dataset.
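
The feature selection step can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes additive genotype coding (0, 1, or 2 copies of the minor allele), a separate one-SNP logistic regression per marker, and a chi-square reference distribution with one degree of freedom; the function names, the toy data, and the 0.1 threshold are hypothetical. Columns that pass the filter would then be passed to an SVM implementation of choice.

```python
import math

def _sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp so exp() cannot overflow
    return 1.0 / (1.0 + math.exp(-z))

def full_loglik(x, y, iters=50):
    """Maximised log-likelihood of y ~ sigmoid(b0 + b1 * x), via Newton's method."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = _sigmoid(b0 + b1 * xi)
            r, w = yi - p, p * (1.0 - p)
            g0 += r
            g1 += r * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det < 1e-12:  # singular Hessian, e.g. a constant genotype column
            break
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    ll = 0.0
    for xi, yi in zip(x, y):
        p = _sigmoid(b0 + b1 * xi)
        ll += math.log(p) if yi == 1 else math.log(1.0 - p)
    return ll

def null_loglik(y):
    """Log-likelihood of the intercept-only model (assumes both classes occur)."""
    n, n1 = len(y), sum(y)
    n0 = n - n1
    return n1 * math.log(n1 / n) + n0 * math.log(n0 / n)

def lrt_pvalue(x, y):
    """Likelihood ratio test of the one-SNP model against the null model."""
    stat = max(0.0, 2.0 * (full_loglik(x, y) - null_loglik(y)))
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(stat / 2.0))

def select_snps(columns, y, alpha):
    """Indices of SNP columns whose LRT p-value is below alpha."""
    return [j for j, x in enumerate(columns) if lrt_pvalue(x, y) < alpha]

# Toy data (hypothetical): 6 controls followed by 6 cases, two SNP columns.
y = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
snp_a = [0, 0, 0, 1, 0, 2, 1, 2, 2, 1, 2, 0]  # genotype shifts with status
snp_b = [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2]  # same distribution in both groups
pvals = [lrt_pvalue(x, y) for x in (snp_a, snp_b)]
kept = select_snps([snp_a, snp_b], y, alpha=0.1)
print(pvals, kept)
```

In a genome-wide setting the threshold would normally be far stricter or corrected for multiple testing; the loose value here only keeps the toy example readable.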

Publication
ICML 2013 Workshop on Role of Machine Learning in Transforming Healthcare