A non-negative matrix factorization method for detecting modules in heterogeneous omics multi-modal data

Bioinformatics. 2016 Jan 1;32(1):1-8. doi: 10.1093/bioinformatics/btv544. Epub 2015 Sep 15.

Abstract

Motivation: Recent advances in high-throughput omics technologies have enabled biomedical researchers to collect large-scale genomic data. As a consequence, there has been growing interest in developing methods to integrate such data to obtain deeper insights regarding the underlying biological system. A key challenge for integrative studies is the heterogeneity present in the different omics data sources, which makes it difficult to discern the coordinated signal of interest from source-specific noise or extraneous effects.

Results: We introduce a novel method for multi-modal data analysis, based on non-negative matrix factorization, that is designed for heterogeneous data. We provide an algorithm for jointly decomposing the data matrices involved; the algorithm also includes a sparsity option for high-dimensional settings. The performance of the proposed method is evaluated on synthetic data and on real DNA methylation, gene expression and miRNA expression data from ovarian cancer samples obtained from The Cancer Genome Atlas. The results show the presence of common modules across patient samples linked to cancer-related pathways, as well as previously established ovarian cancer subtypes.
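
To make the idea of jointly decomposing several data matrices concrete, the following is a minimal sketch of a generic joint non-negative matrix factorization with one shared factor and standard multiplicative updates. It is not the algorithm from the paper or the code in the linked repository: it does not model source-specific effects, and the names `joint_nmf`, `objective`, `rank` and `lam` are assumptions chosen for illustration. Each matrix X_k (same samples in rows, different features in columns) is approximated as W @ H_k, and `lam` adds an optional L1-style penalty on H_k as one simple way to encourage sparsity.

```python
import numpy as np


def joint_nmf(mats, rank, lam=0.0, n_iter=200, seed=0, eps=1e-10):
    """Illustrative joint NMF: X_k ~ W @ H_k with a shared non-negative W.

    mats : list of non-negative arrays with the same number of rows.
    lam  : hypothetical L1 penalty weight on each H_k (0 disables it).
    """
    rng = np.random.default_rng(seed)
    n_rows = mats[0].shape[0]
    W = rng.random((n_rows, rank))
    Hs = [rng.random((rank, X.shape[1])) for X in mats]
    for _ in range(n_iter):
        # Update each source-specific coefficient matrix H_k.
        for k, X in enumerate(mats):
            Hs[k] *= (W.T @ X) / (W.T @ W @ Hs[k] + lam + eps)
        # Update the shared factor W using all sources at once.
        num = sum(X @ H.T for X, H in zip(mats, Hs))
        den = W @ sum(H @ H.T for H in Hs) + eps
        W *= num / den
    return W, Hs


def objective(mats, W, Hs, lam=0.0):
    """Penalized squared reconstruction error summed over all sources."""
    return sum(
        0.5 * np.linalg.norm(X - W @ H) ** 2 + lam * H.sum()
        for X, H in zip(mats, Hs)
    )


if __name__ == "__main__":
    # Synthetic example: two sources that share the same sample factor.
    rng = np.random.default_rng(1)
    W_true = rng.random((30, 3))
    mats = [W_true @ rng.random((3, 40)), W_true @ rng.random((3, 25))]
    W, Hs = joint_nmf(mats, rank=3, lam=0.01)
    print("objective:", objective(mats, W, Hs, lam=0.01))
```

In this sketch the rows of W play the role of shared sample-level module memberships, while each H_k holds the feature loadings for one data source. Handling heterogeneity between sources would require additional terms beyond what is shown here.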

Availability and implementation: The source code repository is publicly available at https://github.com/yangzi4/iNMF.

Contact: gmichail@umich.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms*
  • Computer Simulation
  • DNA Methylation / genetics
  • Databases, Genetic
  • Female
  • Genomics / methods*
  • Humans
  • MicroRNAs / metabolism
  • Ovarian Neoplasms / genetics
  • Reproducibility of Results

Substances

  • MicroRNAs