TY - JOUR
T1 - MoSBi
T2 - Automated signature mining for molecular stratification and subtyping
AU - Rose, Tim Daniel
AU - Bechtler, Thibault
AU - Ciora, Octavia Andreea
AU - Anh Lilian Le, Kim
AU - Molnar, Florian
AU - Köhler, Nikolai
AU - Baumbach, Jan
AU - Röttger, Richard
AU - Pauling, Josch Konstantin
PY - 2022/04/19
Y1 - 2022/04/19
N2 - Significance: Molecular patient stratification and disease subtyping are ongoing and high-impact problems that rely on the identification of characteristic molecular signatures. Current computational methods show high sensitivity to custom parameterization, which leads to inconsistent performance on different molecular data. Our new method, MoSBi (molecular signature identification using biclustering), 1) enables so far unmatched high performance for stratification and subtyping across datasets of various different biomolecules, 2) provides a scalable solution for visualizing the results and their correspondence to clinical factors, and 3) has immediate practical relevance through its automatic workflow where individual selection, parameterization, screening, and visualization of biclustering algorithms is not required. MoSBi is a major step forward with a high impact for clinical and wet-lab researchers. Abstract: The improving access to increasing amounts of biomedical data provides completely new chances for advanced patient stratification and disease subtyping strategies. This requires computational tools that produce uniformly robust results across highly heterogeneous molecular data. Unsupervised machine learning methodologies are able to discover de novo patterns in such data. Biclustering is especially suited by simultaneously identifying sample groups and corresponding feature sets across heterogeneous omics data. The performance of available biclustering algorithms heavily depends on individual parameterization and varies with their application. Here, we developed MoSBi (molecular signature identification using biclustering), an automated multialgorithm ensemble approach that integrates results utilizing an error model-supported similarity network. We systematically evaluated the performance of 11 available and established biclustering algorithms together with MoSBi. For this, we used transcriptomics, proteomics, and metabolomics data, as well as synthetic datasets covering various data properties. Profiting from multialgorithm integration, MoSBi identified robust group and disease-specific signatures across all scenarios, overcoming single algorithm specificities. Furthermore, we developed a scalable network-based visualization of bicluster communities that supports biological hypothesis generation. MoSBi is available as an R package and web service to make automated biclustering analysis accessible for application in molecular sample stratification.
AB - Significance: Molecular patient stratification and disease subtyping are ongoing and high-impact problems that rely on the identification of characteristic molecular signatures. Current computational methods show high sensitivity to custom parameterization, which leads to inconsistent performance on different molecular data. Our new method, MoSBi (molecular signature identification using biclustering), 1) enables so far unmatched high performance for stratification and subtyping across datasets of various different biomolecules, 2) provides a scalable solution for visualizing the results and their correspondence to clinical factors, and 3) has immediate practical relevance through its automatic workflow where individual selection, parameterization, screening, and visualization of biclustering algorithms is not required. MoSBi is a major step forward with a high impact for clinical and wet-lab researchers. Abstract: The improving access to increasing amounts of biomedical data provides completely new chances for advanced patient stratification and disease subtyping strategies. This requires computational tools that produce uniformly robust results across highly heterogeneous molecular data. Unsupervised machine learning methodologies are able to discover de novo patterns in such data. Biclustering is especially suited by simultaneously identifying sample groups and corresponding feature sets across heterogeneous omics data. The performance of available biclustering algorithms heavily depends on individual parameterization and varies with their application. Here, we developed MoSBi (molecular signature identification using biclustering), an automated multialgorithm ensemble approach that integrates results utilizing an error model-supported similarity network. We systematically evaluated the performance of 11 available and established biclustering algorithms together with MoSBi. For this, we used transcriptomics, proteomics, and metabolomics data, as well as synthetic datasets covering various data properties. Profiting from multialgorithm integration, MoSBi identified robust group and disease-specific signatures across all scenarios, overcoming single algorithm specificities. Furthermore, we developed a scalable network-based visualization of bicluster communities that supports biological hypothesis generation. MoSBi is available as an R package and web service to make automated biclustering analysis accessible for application in molecular sample stratification.
KW - biclustering
KW - multiomics
KW - pathomechanism
KW - stratification
KW - subtyping
KW - Metabolomics
KW - Humans
KW - Gene Expression Profiling
KW - Patients/classification
KW - Algorithms
KW - Proteomics
KW - Software
KW - Disease/classification
KW - Cluster Analysis
U2 - 10.1073/pnas.2118210119
DO - 10.1073/pnas.2118210119
M3 - Journal article
C2 - 35412913
AN - SCOPUS:85128152229
SN - 0027-8424
VL - 119
SP - e2118210119
JO - Proceedings of the National Academy of Sciences (PNAS)
JF - Proceedings of the National Academy of Sciences (PNAS)
IS - 16
M1 - e2118210119
ER -