Data mining and data reduction methods to detect interactions in epidemiologic data are being developed and tested. In these analyses, multifactor dimensionality reduction, the focused interaction testing framework, and traditional logistic regression models were used to identify potential interactions involving up to three factors. These techniques were applied to a population-based case-control study of pancreatic cancer from the San Francisco Bay Area (308 cases, 964 controls). Twenty-six polymorphisms in 20 genes from seven biochemical pathways, along with tobacco smoking, were included in these analyses. Combinations of genetic markers and cigarette smoking were identified as potential risk factors for pancreatic cancer, including genes in base excision repair (OGG1), nucleotide excision repair (XPD, XPA, XPC), and double-strand break repair (XRCC3). XPD.751, XPD.312, and cigarette smoking were the best single-factor predictors of pancreatic cancer risk, whereas XRCC3.241*smoking and OGG1.326*XPC.PAT were the best two-factor predictors. There was some evidence for a three-factor combination of OGG1.326*XPD.751*smoking, but the covariate-adjusted relative-risk estimates lacked precision. Multifactor dimensionality reduction and the focused interaction testing framework showed little concordance, whereas logistic regression allowed for covariate adjustment and model confirmation. Our data suggest that multiple common alleles from DNA repair pathways, in combination with cigarette smoking, may increase the risk for pancreatic cancer, and that multiple approaches to data screening and analysis are necessary to identify potentially new risk factor combinations.
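To make the data-reduction idea concrete, the core step of multifactor dimensionality reduction can be sketched as follows: each combination of factor levels (a "cell") is labeled high or low risk by comparing its case-to-control ratio with a threshold, which collapses several factors into a single binary predictor. This is a minimal illustration and not the authors' implementation. The records are invented toy data, the field names (`smoking`, `genotype`, `case`) and function names are hypothetical, the threshold is assumed to be the overall case-to-control ratio, and the cross-validation normally used to select among candidate factor combinations is omitted.

```python
from collections import defaultdict


def mdr_high_risk_cells(records, factors, threshold):
    """Label each observed multifactor cell as high risk (True) or low risk (False)."""
    counts = defaultdict(lambda: [0, 0])  # cell -> [cases, controls]
    for rec in records:
        cell = tuple(rec[f] for f in factors)
        counts[cell][0 if rec["case"] else 1] += 1
    labels = {}
    for cell, (cases, controls) in counts.items():
        # A cell that holds cases but no controls is treated as high risk.
        ratio = float("inf") if controls == 0 else cases / controls
        labels[cell] = ratio >= threshold
    return labels


def balanced_accuracy(records, factors, labels):
    """Mean of sensitivity and specificity for the high/low-risk classification."""
    tp = fn = tn = fp = 0
    for rec in records:
        # Cells never seen when the labels were built default to low risk.
        predicted = labels.get(tuple(rec[f] for f in factors), False)
        if rec["case"]:
            tp += 1 if predicted else 0
            fn += 0 if predicted else 1
        else:
            fp += 1 if predicted else 0
            tn += 0 if predicted else 1
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))


def make_cell(smoking, genotype, n_cases, n_controls):
    """Build toy records for one smoking-by-genotype cell (hypothetical data)."""
    case = {"smoking": smoking, "genotype": genotype, "case": True}
    control = {"smoking": smoking, "genotype": genotype, "case": False}
    return [case] * n_cases + [control] * n_controls


# Invented toy data: 6 cases and 10 controls spread over four cells.
toy = (
    make_cell(1, 2, 3, 1)
    + make_cell(1, 0, 1, 3)
    + make_cell(0, 2, 1, 3)
    + make_cell(0, 0, 1, 3)
)
factors = ["smoking", "genotype"]
n_cases = sum(1 for rec in toy if rec["case"])
threshold = n_cases / (len(toy) - n_cases)  # assumed: overall case-to-control ratio

labels = mdr_high_risk_cells(toy, factors, threshold)
accuracy = balanced_accuracy(toy, factors, labels)
print(labels)
print(accuracy)
```

In this toy example only the cell combining smoking with the variant genotype exceeds the threshold. A full analysis would repeat the labeling over every one-, two-, and three-factor combination and compare them by cross-validated accuracy, and a screened combination would still need confirmation in a covariate-adjusted model such as logistic regression.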