Bi-Force: large-scale bicluster editing and its application to gene expression data biclustering

Peng Sun, Nora K Speicher, Richard Röttger, Jiong Guo, Jan Baumbach

Research output: Contribution to journal › Journal article › Research › peer-review

Abstract

The explosion of the biological data has dramatically reformed today's biological research. The need to integrate and analyze high-dimensional biological data on a large scale is driving the development of novel bioinformatics approaches. Biclustering, also known as 'simultaneous clustering' or 'co-clustering', has been successfully utilized to discover local patterns in gene expression data and similar biomedical data types. Here, we contribute a new heuristic: 'Bi-Force'. It is based on the weighted bicluster editing model, to perform biclustering on arbitrary sets of biological entities, given any kind of pairwise similarities. We first evaluated the power of Bi-Force to solve dedicated bicluster editing problems by comparing Bi-Force with two existing algorithms in the BiCluE software package. We then followed a biclustering evaluation protocol in a recent review paper from Eren et al. (2013) (A comparative analysis of biclustering algorithms for gene expression data. Brief. Bioinform., 14:279-292.) and compared Bi-Force against eight existing tools: FABIA, QUBIC, Cheng and Church, Plaid, BiMax, Spectral, xMOTIFs and ISA. To this end, a suite of synthetic datasets as well as nine large gene expression datasets from Gene Expression Omnibus were analyzed. All resulting biclusters were subsequently investigated by Gene Ontology enrichment analysis to evaluate their biological relevance. The distinct theoretical foundation of Bi-Force (bicluster editing) is more powerful than strict biclustering. We thus outperformed existing tools with Bi-Force at least when following the evaluation protocols from Eren et al. Bi-Force is implemented in Java and integrated into the open source software package of BiCluE. The software as well as all used datasets are publicly available at http://biclue.mpi-inf.mpg.de.
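
As a rough illustration of the weighted bicluster editing model named in the abstract, the sketch below scores a candidate biclustering of a small similarity matrix. It is not the Bi-Force heuristic and is not taken from the BiCluE code base. The function name `editing_cost`, the convention that an edge exists when a similarity exceeds a user-chosen threshold, the use of the distance to that threshold as the edit cost, and the toy matrix are all assumptions made for this example.

```python
def editing_cost(similarity, row_cluster, col_cluster, threshold):
    """Weighted edit cost of a candidate biclustering (illustrative only).

    similarity  -- matrix of pairwise similarities between row and column entities
    row_cluster -- bicluster label for each row entity
    col_cluster -- bicluster label for each column entity
    threshold   -- assumed cut-off: an edge exists when similarity > threshold
    """
    cost = 0.0
    for i, row in enumerate(similarity):
        for j, s in enumerate(row):
            has_edge = s > threshold
            same_bicluster = row_cluster[i] == col_cluster[j]
            # A missing edge inside a bicluster must be inserted and an
            # edge between biclusters must be deleted; either edit is
            # charged the distance between the similarity and the threshold.
            if has_edge != same_bicluster:
                cost += abs(s - threshold)
    return cost


# Hypothetical toy data: three row entities, three column entities.
toy = [
    [0.90, 0.80, 0.10],
    [0.70, 0.40, 0.20],
    [0.10, 0.30, 0.95],
]

two_biclusters = editing_cost(toy, [0, 0, 1], [0, 0, 1], threshold=0.5)
one_bicluster = editing_cost(toy, [0, 0, 0], [0, 0, 0], threshold=0.5)
```

Under these assumptions the two-bicluster assignment needs only one cheap edge insertion, while forcing all entities into a single bicluster requires five insertions, so the first assignment has the lower cost. A bicluster editing method searches for an assignment that keeps this cost small.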

Original language: English
Journal: Nucleic Acids Research
Volume: 42
Issue number: 9
Pages (from-to): e78
ISSN: 0305-1048
DOIs: 10.1093/nar/gku201
Publication status: Published - 2014

Keywords

  • Algorithms
  • Animals
  • Cluster Analysis
  • Computer Simulation
  • Databases, Genetic
  • Gene Expression Profiling
  • Gene Ontology
  • Humans
  • Models, Genetic
  • Oligonucleotide Array Sequence Analysis
  • Principal Component Analysis
  • Software

Cite this

@article{e6ac25ee1b61439090a4b4f5de72baf6,
title = "Bi-Force: large-scale bicluster editing and its application to gene expression data biclustering",
abstract = "The explosion of the biological data has dramatically reformed today's biological research. The need to integrate and analyze high-dimensional biological data on a large scale is driving the development of novel bioinformatics approaches. Biclustering, also known as 'simultaneous clustering' or 'co-clustering', has been successfully utilized to discover local patterns in gene expression data and similar biomedical data types. Here, we contribute a new heuristic: 'Bi-Force'. It is based on the weighted bicluster editing model, to perform biclustering on arbitrary sets of biological entities, given any kind of pairwise similarities. We first evaluated the power of Bi-Force to solve dedicated bicluster editing problems by comparing Bi-Force with two existing algorithms in the BiCluE software package. We then followed a biclustering evaluation protocol in a recent review paper from Eren et al. (2013) (A comparative analysis of biclustering algorithms for gene expression data. Brief. Bioinform., 14:279-292.) and compared Bi-Force against eight existing tools: FABIA, QUBIC, Cheng and Church, Plaid, BiMax, Spectral, xMOTIFs and ISA. To this end, a suite of synthetic datasets as well as nine large gene expression datasets from Gene Expression Omnibus were analyzed. All resulting biclusters were subsequently investigated by Gene Ontology enrichment analysis to evaluate their biological relevance. The distinct theoretical foundation of Bi-Force (bicluster editing) is more powerful than strict biclustering. We thus outperformed existing tools with Bi-Force at least when following the evaluation protocols from Eren et al. Bi-Force is implemented in Java and integrated into the open source software package of BiCluE. The software as well as all used datasets are publicly available at http://biclue.mpi-inf.mpg.de.",
keywords = "Algorithms, Animals, Cluster Analysis, Computer Simulation, Databases, Genetic, Gene Expression Profiling, Gene Ontology, Humans, Models, Genetic, Oligonucleotide Array Sequence Analysis, Principal Component Analysis, Software",
author = "Peng Sun and Speicher, {Nora K} and Richard R{\"o}ttger and Jiong Guo and Jan Baumbach",
note = "{\textcopyright} The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.",
year = "2014",
doi = "10.1093/nar/gku201",
language = "English",
volume = "42",
pages = "e78",
journal = "Nucleic Acids Research",
issn = "0305-1048",
publisher = "Oxford University Press",
number = "9",
}

Bi-Force: large-scale bicluster editing and its application to gene expression data biclustering. / Sun, Peng; Speicher, Nora K; Röttger, Richard; Guo, Jiong; Baumbach, Jan.

In: Nucleic Acids Research, Vol. 42, No. 9, 2014, p. e78.

TY - JOUR

T1 - Bi-Force

T2 - large-scale bicluster editing and its application to gene expression data biclustering

AU - Sun, Peng

AU - Speicher, Nora K

AU - Röttger, Richard

AU - Guo, Jiong

AU - Baumbach, Jan

N1 - © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

PY - 2014

Y1 - 2014

AB - The explosion of the biological data has dramatically reformed today's biological research. The need to integrate and analyze high-dimensional biological data on a large scale is driving the development of novel bioinformatics approaches. Biclustering, also known as 'simultaneous clustering' or 'co-clustering', has been successfully utilized to discover local patterns in gene expression data and similar biomedical data types. Here, we contribute a new heuristic: 'Bi-Force'. It is based on the weighted bicluster editing model, to perform biclustering on arbitrary sets of biological entities, given any kind of pairwise similarities. We first evaluated the power of Bi-Force to solve dedicated bicluster editing problems by comparing Bi-Force with two existing algorithms in the BiCluE software package. We then followed a biclustering evaluation protocol in a recent review paper from Eren et al. (2013) (A comparative analysis of biclustering algorithms for gene expression data. Brief. Bioinform., 14:279-292.) and compared Bi-Force against eight existing tools: FABIA, QUBIC, Cheng and Church, Plaid, BiMax, Spectral, xMOTIFs and ISA. To this end, a suite of synthetic datasets as well as nine large gene expression datasets from Gene Expression Omnibus were analyzed. All resulting biclusters were subsequently investigated by Gene Ontology enrichment analysis to evaluate their biological relevance. The distinct theoretical foundation of Bi-Force (bicluster editing) is more powerful than strict biclustering. We thus outperformed existing tools with Bi-Force at least when following the evaluation protocols from Eren et al. Bi-Force is implemented in Java and integrated into the open source software package of BiCluE. The software as well as all used datasets are publicly available at http://biclue.mpi-inf.mpg.de.

KW - Algorithms

KW - Animals

KW - Cluster Analysis

KW - Computer Simulation

KW - Databases, Genetic

KW - Gene Expression Profiling

KW - Gene Ontology

KW - Humans

KW - Models, Genetic

KW - Oligonucleotide Array Sequence Analysis

KW - Principal Component Analysis

KW - Software

U2 - 10.1093/nar/gku201

DO - 10.1093/nar/gku201

M3 - Journal article

VL - 42

SP - e78

JO - Nucleic Acids Research

JF - Nucleic Acids Research

SN - 0305-1048

IS - 9

ER -