Department of Electrical and Communications Engineering, Gauhati University, India
Research
Using Machine Learning to Identify the Risk Factors of Pancreatic Cancer from the NCI PLCO Dataset
Author(s): Ananya Dutta*
Background: Pancreatic cancer (PC) has a poor prognosis and a low survival rate, so there is a pressing need to identify its risk factors. The purpose of this study is to identify a subset of factors (i.e., features) that predict PC from the Prostate, Lung, Colorectal and Ovarian (PLCO) cancer dataset, which consists of responses from 154,897 participants to 65 questions about demographics, cancer and health history, medication usage, and smoking habits.
Method: There are two challenges to selecting the subset of features that predicts PC with the highest probability: the problem is computationally intractable, and the PLCO dataset is highly imbalanced. We use an innovative method that uses the dataset in a balanced way, without up- or down-sampling. We use nine feature selection methods to select the optimal subset of f..
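To illustrate why exhaustive subset search is described as computationally intractable, the following minimal sketch counts the non-empty subsets that can be formed from the 65 questionnaire features. The function name `count_feature_subsets` and the evaluation rate of one million subsets per second are assumptions chosen for illustration; they do not come from the study.

```python
from math import comb

N_FEATURES = 65  # number of questionnaire items stated in the abstract


def count_feature_subsets(n: int) -> int:
    """Return the number of non-empty subsets of n features (2**n - 1)."""
    return sum(comb(n, k) for k in range(1, n + 1))


total_subsets = count_feature_subsets(N_FEATURES)

# Hypothetical evaluation rate, assumed only for this illustration.
SUBSETS_PER_SECOND = 1_000_000
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

years_needed = total_subsets / SUBSETS_PER_SECOND / SECONDS_PER_YEAR

print(f"Non-empty feature subsets: {total_subsets:,}")
print(f"Years to score them all at the assumed rate: {years_needed:,.0f}")
```

Even under this generous assumed rate, scoring every subset would take on the order of a million years, which is why heuristic feature selection methods are used in place of brute-force enumeration.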