Computational Biology and Bioinformatics
Sundaram, Shobhita (School: IE San Juan Bautista de la Salle)
Pancreatic cancer (PC) is currently one of the deadliest cancers, with a 5-year survival rate of just 7%. Detecting PC while it is still localized increases the survival rate to 30%; to date, however, no diagnostic tool exists for doing so. In this project, I proposed the development of a computational classification model that can accurately detect premalignant PC from patient blood mass spectrometry (MS) data. A database of 181 MS samples was used: 80 from test subjects with premalignant PC and 101 controls from healthy test subjects. Preprocessing was performed using non-classical statistical techniques to eliminate chemical noise and extract true signals. Detected peaks were then analyzed to determine the most discriminative biomarkers. Prior research used univariate feature selection methods, which measure each variable's importance in isolation. In this research, a novel hybrid approach was developed that combines univariate and multivariate methods to analyze interactions between protein markers and how their interdependencies affect predictive value. Using this method, several machine learning algorithms were trained to produce diagnostic models. The highest-performing model achieved an ROC accuracy of 80%, a significant improvement over prior research, which was limited to 69%. For each of the selected biomarker proteins, intensity levels were found to be significantly elevated or suppressed in the pre-cancerous samples compared to the control samples. These results suggest that a hybrid approach to feature selection can discover new biomarkers and lead to the development of superior tools for diagnosing premalignant pancreatic cancer, allowing doctors to treat the disease while it is still localized and curative surgery is still possible.
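The hybrid feature selection idea described above can be illustrated with a minimal sketch. The abstract does not name the specific statistics, classifiers, or data format used, so everything here is an assumption made for illustration: a Welch t-statistic as the univariate filter, greedy forward selection scored by the leave-one-out ROC area of a nearest-centroid rule as the multivariate step, and synthetic data in place of real MS peak intensities. The function names (`welch_t`, `auc`, `loo_scores`, `hybrid_select`, `make_synthetic`) are hypothetical.

```python
import random
import statistics

def welch_t(a, b):
    """Absolute Welch t-statistic between two lists of peak intensities."""
    va, vb = statistics.variance(a), statistics.variance(b)
    denom = (va / len(a) + vb / len(b)) ** 0.5
    return abs(statistics.fmean(a) - statistics.fmean(b)) / denom if denom else 0.0

def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count as half)."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def loo_scores(X, y, feats):
    """Leave-one-out nearest-centroid scores; higher means more case-like."""
    scores = []
    for i, row in enumerate(X):
        dist = {}
        for cls in (0, 1):
            rows = [X[j] for j in range(len(X)) if j != i and y[j] == cls]
            centroid = [statistics.fmean(r[f] for r in rows) for f in feats]
            dist[cls] = sum((row[f] - m) ** 2 for f, m in zip(feats, centroid))
        scores.append(dist[0] - dist[1])
    return scores

def hybrid_select(X, y, n_candidates=10, max_features=5):
    """Univariate filter followed by a multivariate greedy forward search."""
    def t_for(f):
        cases = [r[f] for r, lab in zip(X, y) if lab == 1]
        controls = [r[f] for r, lab in zip(X, y) if lab == 0]
        return welch_t(cases, controls)

    # Stage 1 (univariate): keep the features that best separate the groups alone.
    candidates = sorted(range(len(X[0])), key=t_for, reverse=True)[:n_candidates]

    # Stage 2 (multivariate): add a feature only if it improves the joint AUC.
    selected, best = [], 0.0
    while len(selected) < max_features:
        trials = [(auc(loo_scores(X, y, selected + [f]), y), f)
                  for f in candidates if f not in selected]
        if not trials:
            break
        top_auc, top_f = max(trials)
        if top_auc <= best:
            break
        selected.append(top_f)
        best = top_auc
    return selected, best

def make_synthetic(n_per_class=30, n_feat=30, seed=0):
    """Synthetic stand-in for peak intensities; features 0 and 1 carry signal."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0, 1) for _ in range(n_feat)]
            if label == 1:
                row[0] += 1.5  # marker elevated in cases
                row[1] -= 1.5  # marker suppressed in cases
            X.append(row)
            y.append(label)
    return X, y

X, y = make_synthetic()
selected, score = hybrid_select(X, y)
print("selected features:", selected, "AUC:", round(score, 3))
```

Because the features in this sketch are chosen on the same samples used to score them, the printed AUC is optimistic; a fair estimate would repeat the selection inside each cross-validation fold or use a held-out test set.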
National Security Agency Research Directorate: First Place Award Mathematics $1,000
Fourth Award of $500
American Statistical Association: Certificate of Honorable Mention