Identification of hepatocellular carcinoma-related genes with a machine learning and network analysis

Tuantuan Gui, Xiao Dong, Rudong Li, Yixue Li, Zhen Wang

Research output: Contribution to journal › Article

35 Citations (Scopus)

Abstract

Liver cancer is one of the leading causes of cancer mortality worldwide, and hepatocellular carcinoma (HCC) is its main type. We applied a machine learning approach, the maximum-relevance-minimum-redundancy (mRMR) algorithm followed by incremental feature selection (IFS), to a set of microarray data generated from 43 tumor and 52 nontumor samples. With this approach, we identified 117 gene probes that could optimally separate tumor and nontumor samples. These genes include not only known HCC-relevant genes such as MT1X, BMI1, and CAP2, but also cancer genes not previously found to be closely related to HCC, such as TACSTD2. We then constructed a molecular interaction network based on protein-protein interaction (PPI) data from the STRING database and identified 187 genes on the shortest paths among the genes identified with the machine learning approach. Network analysis reveals new potential roles of ubiquitin C in the pathogenesis of HCC. Based on Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis, we showed that the identified subnetwork is significantly enriched in biological processes related to cell death. These results provide new insights into the process of HCC.
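
The two computational steps named in the abstract (an mRMR feature ranking, then a search for genes on shortest paths between the selected genes) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the data, the probe and node names, the discretized expression values, the unweighted graph, and the helper functions `mutual_information`, `mrmr_rank`, `shortest_path`, and `genes_on_shortest_paths` are all assumptions made for illustration. The IFS step, which would evaluate a classifier on growing prefixes of the ranking, is not shown.

```python
from collections import Counter, deque
from itertools import combinations
from math import log


def mutual_information(xs, ys):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * log(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())


def mrmr_rank(features, labels):
    """Greedy mRMR ordering: relevance to the labels minus the mean
    redundancy with the features already selected."""
    relevance = {name: mutual_information(vals, labels) for name, vals in features.items()}
    selected, remaining = [], sorted(features)

    def score(name):
        if not selected:
            return relevance[name]
        redundancy = sum(mutual_information(features[name], features[s]) for s in selected)
        return relevance[name] - redundancy / len(selected)

    while remaining:
        best = max(remaining, key=score)  # ties go to the alphabetically first name
        selected.append(best)
        remaining.remove(best)
    return selected


def shortest_path(graph, source, target):
    """One shortest path in an unweighted graph via breadth-first search, or None."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in sorted(graph.get(node, ())):
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None


def genes_on_shortest_paths(graph, seeds):
    """Non-seed nodes lying on a shortest path between any pair of seed nodes."""
    found = set()
    for a, b in combinations(sorted(seeds), 2):
        path = shortest_path(graph, a, b)
        if path:
            found.update(path[1:-1])
    return found - set(seeds)


# Made-up toy data: 8 samples, binary labels, three discretized "probes".
labels = [0, 0, 0, 0, 0, 0, 1, 1]
features = {
    "probe_a": [0, 0, 0, 0, 1, 1, 1, 1],
    "probe_a_dup": [0, 0, 0, 0, 1, 1, 1, 1],  # exact copy of probe_a (fully redundant)
    "probe_b": [0, 0, 1, 1, 0, 0, 1, 1],
}
ranking = mrmr_rank(features, labels)

# Made-up toy interaction graph stored as adjacency sets.
graph = {
    "seed_1": {"linker"},
    "seed_2": {"linker", "leaf"},
    "linker": {"seed_1", "seed_2"},
    "leaf": {"seed_2"},
}
linkers = genes_on_shortest_paths(graph, ["seed_1", "seed_2"])
```

In this toy case the redundant copy `probe_a_dup` is ranked last even though it is as relevant to the labels as `probe_a`, which is the behavior the redundancy penalty is meant to produce. A real analysis would likely use edge weights derived from interaction confidence scores rather than an unweighted graph; that choice is left open here.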

Original language: English (US)
Pages (from-to): 63-71
Number of pages: 9
Journal: Journal of Computational Biology
Volume: 22
Issue number: 1
DOIs: 10.1089/cmb.2014.0122
State: Published - Jan 1 2015

Fingerprint

Network Analysis
Electric network analysis
Learning systems
Hepatocellular Carcinoma
Identification (control systems)
Machine Learning
Genes
Gene
Cancer
Liver Neoplasms
Ubiquitin C
Liver
Tumor
Encyclopedias
Biological Phenomena
Neoplasms
Gene Ontology
Tumors
Neoplasm Genes
Protein-protein Interaction

Keywords

  • Hepatocellular carcinoma
  • maximum relevance minimum redundancy
  • protein-protein interaction

ASJC Scopus subject areas

  • Modeling and Simulation
  • Molecular Biology
  • Genetics
  • Computational Mathematics
  • Computational Theory and Mathematics

Cite this

Identification of hepatocellular carcinoma-related genes with a machine learning and network analysis. / Gui, Tuantuan; Dong, Xiao; Li, Rudong; Li, Yixue; Wang, Zhen.

In: Journal of Computational Biology, Vol. 22, No. 1, 01.01.2015, p. 63-71.


@article{d7cb00aea3ce449fba63adbca7936a91,
title = "Identification of hepatocellular carcinoma-related genes with a machine learning and network analysis",
abstract = "Liver cancer is one of the leading causes of cancer mortality worldwide. Hepatocellular carcinoma (HCC) is the main type of liver cancer. We applied a machine learning approach with maximum-relevance-minimum-redundancy (mRMR) algorithm followed by incremental feature selection (IFS) to a set of microarray data generated from 43 tumor and 52 nontumor samples. With the machine learning approach, we identified 117 gene probes that could optimally separate tumor and nontumor samples. These genes not only include known HCC-relevant genes such as MT1X, BMI1, and CAP2, but also include cancer genes that were not found previously to be closely related to HCC, such as TACSTD2. Then, we constructed a molecular interaction network based on the protein-protein interaction (PPI) data from the STRING database and identified 187 genes on the shortest paths among the genes identified with the machine learning approach. Network analysis reveals new potential roles of ubiquitin C in the pathogenesis of HCC. Based on gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis, we showed that the identified subnetwork is significantly enriched in biological processes related to cell death. These results bring new insights of understanding the process of HCC.",
keywords = "Hepatocellular carcinoma, maximum relevance minimum redundancy, protein-protein interaction.",
author = "Tuantuan Gui and Xiao Dong and Rudong Li and Yixue Li and Zhen Wang",
year = "2015",
month = "1",
day = "1",
doi = "10.1089/cmb.2014.0122",
language = "English (US)",
volume = "22",
pages = "63--71",
journal = "Journal of Computational Biology",
issn = "1066-5277",
publisher = "Mary Ann Liebert Inc.",
number = "1",
}

TY - JOUR
T1 - Identification of hepatocellular carcinoma-related genes with a machine learning and network analysis
AU - Gui, Tuantuan
AU - Dong, Xiao
AU - Li, Rudong
AU - Li, Yixue
AU - Wang, Zhen
PY - 2015/1/1
Y1 - 2015/1/1
N2 - Liver cancer is one of the leading causes of cancer mortality worldwide. Hepatocellular carcinoma (HCC) is the main type of liver cancer. We applied a machine learning approach with maximum-relevance-minimum-redundancy (mRMR) algorithm followed by incremental feature selection (IFS) to a set of microarray data generated from 43 tumor and 52 nontumor samples. With the machine learning approach, we identified 117 gene probes that could optimally separate tumor and nontumor samples. These genes not only include known HCC-relevant genes such as MT1X, BMI1, and CAP2, but also include cancer genes that were not found previously to be closely related to HCC, such as TACSTD2. Then, we constructed a molecular interaction network based on the protein-protein interaction (PPI) data from the STRING database and identified 187 genes on the shortest paths among the genes identified with the machine learning approach. Network analysis reveals new potential roles of ubiquitin C in the pathogenesis of HCC. Based on gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis, we showed that the identified subnetwork is significantly enriched in biological processes related to cell death. These results bring new insights of understanding the process of HCC.
AB - Liver cancer is one of the leading causes of cancer mortality worldwide. Hepatocellular carcinoma (HCC) is the main type of liver cancer. We applied a machine learning approach with maximum-relevance-minimum-redundancy (mRMR) algorithm followed by incremental feature selection (IFS) to a set of microarray data generated from 43 tumor and 52 nontumor samples. With the machine learning approach, we identified 117 gene probes that could optimally separate tumor and nontumor samples. These genes not only include known HCC-relevant genes such as MT1X, BMI1, and CAP2, but also include cancer genes that were not found previously to be closely related to HCC, such as TACSTD2. Then, we constructed a molecular interaction network based on the protein-protein interaction (PPI) data from the STRING database and identified 187 genes on the shortest paths among the genes identified with the machine learning approach. Network analysis reveals new potential roles of ubiquitin C in the pathogenesis of HCC. Based on gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis, we showed that the identified subnetwork is significantly enriched in biological processes related to cell death. These results bring new insights of understanding the process of HCC.
KW - Hepatocellular carcinoma
KW - maximum relevance minimum redundancy
KW - protein-protein interaction.
UR - http://www.scopus.com/inward/record.url?scp=84920276400&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84920276400&partnerID=8YFLogxK
U2 - 10.1089/cmb.2014.0122
DO - 10.1089/cmb.2014.0122
M3 - Article
C2 - 25247452
AN - SCOPUS:84920276400
VL - 22
SP - 63
EP - 71
JO - Journal of Computational Biology
JF - Journal of Computational Biology
SN - 1066-5277
IS - 1
ER -