Data Imputation in Merged Isobaric Labeling-Based Relative Quantification Datasets

Nicolai Bjødstrup Palstrøm, Rune Matthiesen, Hans Christian Beck

Research output: Contribution to journal › Journal article › Research › peer-review

Abstract

Data-dependent acquisition in mass spectrometry-based proteomics, combined with quantitative analysis using isobaric labeling (iTRAQ and TMT), inevitably introduces missing values in proteomic experiments where a number of LC runs are combined. This is especially true in the growing field of shotgun clinical proteomics, where the protein profiles from the analysis of several hundred patient samples are compared and correlated to clinical traits, such as a specific disease or disease treatment, in order to link specific outcomes to one or more proteins. In clinical research, missing values in such datasets reduce the power of the downstream statistical analysis and may therefore hamper the linking of the expression of disease traits to the expression of specific proteins that may be useful for prognostic, diagnostic, or predictive purposes. In our study, we tested three data imputation approaches, initially developed for microarray data, for the imputation of missing values in datasets that are generated by several runs of shotgun proteomic experiments and in which the data are relative protein abundances based on isobaric tags (iTRAQ and TMT). Our conclusion is that imputation methods based on k-nearest neighbors successfully impute missing values in datasets with up to 50% missing values.
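
As a rough illustration of the k-nearest-neighbors idea mentioned in the abstract, the sketch below fills each missing entry with the mean of the same column taken from the k most similar rows. It is not the authors' implementation, whose details are not given here. The layout (rows as proteins, columns as merged runs, `None` for a missing value), the distance measure (root-mean-square difference over co-observed columns), the function name `knn_impute`, and the toy values are all assumptions made for this example.

```python
import math
from typing import List, Optional

Row = List[Optional[float]]


def _distance(a: Row, b: Row) -> Optional[float]:
    """Root-mean-square difference over the columns observed in both rows."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return None  # no overlapping columns, so the rows cannot be compared
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))


def knn_impute(matrix: List[Row], k: int = 2) -> List[Row]:
    """Return a copy of `matrix` with missing entries filled from k nearest rows.

    An entry stays None when no other row has that column observed and
    shares at least one observed column with the row being imputed.
    """
    filled = [list(row) for row in matrix]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate neighbors: other rows in which column j is observed.
            candidates = []
            for m, other in enumerate(matrix):
                if m == i or other[j] is None:
                    continue
                d = _distance(row, other)
                if d is not None:
                    candidates.append((d, other[j]))
            candidates.sort(key=lambda pair: pair[0])
            nearest = [v for _, v in candidates[:k]]
            if nearest:
                filled[i][j] = sum(nearest) / len(nearest)
    return filled


# Hypothetical relative abundances (for example log ratios); not real data.
log_ratios: List[Row] = [
    [0.0, 0.5, None, 1.0],      # protein A, not quantified in the third run
    [0.0, 0.5, 0.25, 1.0],      # protein B
    [0.5, 0.5, 0.75, 1.0],      # protein C
    [-2.0, -2.5, -2.0, -3.0],   # protein D, a very different profile
]
imputed = knn_impute(log_ratios, k=2)
print(imputed[0])
```

In this toy matrix, proteins B and C are the two rows closest to protein A, so the missing entry is filled with the mean of their third-column values, while the dissimilar protein D is ignored. Established implementations of this approach add refinements such as distance weighting and limits on the fraction of missing values per row, which this sketch omits.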

Original language: English
Journal: Methods in Molecular Biology
Volume: 2051
Pages (from-to): 297-308
Number of pages: 12
ISSN: 1064-3745
DOI: 10.1007/978-1-4939-9744-2_13
Publication status: Published - 2020

Fingerprint

Firearms
Proteins
Datasets
Research

Cite this

Palstrøm, Nicolai Bjødstrup ; Matthiesen, Rune ; Beck, Hans Christian. / Data Imputation in Merged Isobaric Labeling-Based Relative Quantification Datasets. In: Methods in Molecular Biology. 2020 ; Vol. 2051. pp. 297-308.
@article{d35d13ab945d469c87c3effd3a542161,
title = "Data Imputation in Merged Isobaric Labeling-Based Relative Quantification Datasets",
abstract = "Data-dependent acquisition in mass spectrometry-based proteomics, combined with quantitative analysis using isobaric labeling (iTRAQ and TMT), inevitably introduces missing values in proteomic experiments where a number of LC runs are combined. This is especially true in the growing field of shotgun clinical proteomics, where the protein profiles from the analysis of several hundred patient samples are compared and correlated to clinical traits, such as a specific disease or disease treatment, in order to link specific outcomes to one or more proteins. In clinical research, missing values in such datasets reduce the power of the downstream statistical analysis and may therefore hamper the linking of the expression of disease traits to the expression of specific proteins that may be useful for prognostic, diagnostic, or predictive purposes. In our study, we tested three data imputation approaches, initially developed for microarray data, for the imputation of missing values in datasets that are generated by several runs of shotgun proteomic experiments and in which the data are relative protein abundances based on isobaric tags (iTRAQ and TMT). Our conclusion is that imputation methods based on k-nearest neighbors successfully impute missing values in datasets with up to 50{\%} missing values.",
author = "Palstr{\o}m, {Nicolai Bj{\o}dstrup} and Matthiesen, Rune and Beck, {Hans Christian}",
year = "2020",
doi = "10.1007/978-1-4939-9744-2_13",
language = "English",
volume = "2051",
pages = "297--308",
journal = "Methods in Molecular Biology",
issn = "1064-3745",
publisher = "Humana Press",
}

Data Imputation in Merged Isobaric Labeling-Based Relative Quantification Datasets. / Palstrøm, Nicolai Bjødstrup; Matthiesen, Rune; Beck, Hans Christian.

In: Methods in Molecular Biology, Vol. 2051, 2020, p. 297-308.

TY - JOUR

T1 - Data Imputation in Merged Isobaric Labeling-Based Relative Quantification Datasets

AU - Palstrøm, Nicolai Bjødstrup

AU - Matthiesen, Rune

AU - Beck, Hans Christian

PY - 2020

Y1 - 2020

N2 - Data-dependent acquisition in mass spectrometry-based proteomics, combined with quantitative analysis using isobaric labeling (iTRAQ and TMT), inevitably introduces missing values in proteomic experiments where a number of LC runs are combined. This is especially true in the growing field of shotgun clinical proteomics, where the protein profiles from the analysis of several hundred patient samples are compared and correlated to clinical traits, such as a specific disease or disease treatment, in order to link specific outcomes to one or more proteins. In clinical research, missing values in such datasets reduce the power of the downstream statistical analysis and may therefore hamper the linking of the expression of disease traits to the expression of specific proteins that may be useful for prognostic, diagnostic, or predictive purposes. In our study, we tested three data imputation approaches, initially developed for microarray data, for the imputation of missing values in datasets that are generated by several runs of shotgun proteomic experiments and in which the data are relative protein abundances based on isobaric tags (iTRAQ and TMT). Our conclusion is that imputation methods based on k-nearest neighbors successfully impute missing values in datasets with up to 50% missing values.

U2 - 10.1007/978-1-4939-9744-2_13

DO - 10.1007/978-1-4939-9744-2_13

M3 - Journal article

VL - 2051

SP - 297

EP - 308

JO - Methods in Molecular Biology

JF - Methods in Molecular Biology

SN - 1064-3745

ER -