Proteomic Validation of Transcript Isoforms, Including Those Assembled from RNA-Seq Data

Aidan P Tay, Chi Nam Ignatius Pang, Natalie A Twine, Gene Hart-Smith, Linda Harkness, Moustapha Kassem, Marc R Wilkins

Publication: Contribution to journal, Journal article, Research, Peer-reviewed

Abstract

Human proteome analysis now requires an understanding of protein isoforms. We recently published the PG Nexus pipeline, which facilitates high-confidence validation of exons and splice junctions by integrating genomics and proteomics data. Here we comprehensively explore how RNA-seq transcriptomics data and proteomic analysis of the same sample can together identify protein isoforms. RNA-seq data from human mesenchymal stem cells (hMSCs) were analyzed with our new TranscriptCoder tool to generate a database of protein isoform sequences. MS/MS data from matching hMSC samples were then matched against the TranscriptCoder-derived database, along with Ensembl and the neXtProt database. Querying the TranscriptCoder-derived or Ensembl database could unambiguously identify ∼450 protein isoforms, with isoform-specific proteotypic peptides, including candidate hMSC-specific isoforms for the genes DPYSL2 and FXR1. Where isoform-specific peptides did not exist, groups of nonisoform-specific proteotypic peptides could specifically identify many isoforms. In both of the above cases, isoforms will be detectable with targeted MS/MS assays. Unfortunately, our analysis also revealed that some isoforms will be difficult to identify unambiguously as they do not have peptides that are sufficiently distinguishing. We covisualize mRNA isoforms and peptides in a genome browser to illustrate the above situations. Mass spectrometry data are available via ProteomeXchange (PXD001449).
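
The three situations described in the abstract (an isoform with its own specific peptide, an isoform identifiable only by a group of shared peptides, and an isoform that cannot be distinguished) can be illustrated with a small sketch. This is not TranscriptCoder or PG Nexus; the function names, the toy sequences, and the simplified trypsin rule (cleave after K or R unless followed by P, no missed cleavages) are assumptions made purely for illustration.

```python
import re
from collections import defaultdict


def tryptic_peptides(sequence, min_length=6):
    """In-silico digest: cleave after K or R unless the next residue is P.

    Simplified assumption: no missed cleavages, and peptides shorter than
    min_length are discarded.
    """
    pieces = re.split(r"(?<=[KR])(?!P)", sequence)
    return {p for p in pieces if len(p) >= min_length}


def classify_isoforms(isoforms, min_length=6):
    """Label each isoform by how (or whether) its peptides can identify it."""
    peptides = {name: tryptic_peptides(seq, min_length)
                for name, seq in isoforms.items()}

    # Map every peptide to the set of isoforms that contain it.
    owners = defaultdict(set)
    for name, peps in peptides.items():
        for pep in peps:
            owners[pep].add(name)

    result = {}
    for name, peps in peptides.items():
        unique = {p for p in peps if owners[p] == {name}}
        if unique:
            result[name] = ("isoform-specific peptide", unique)
            continue
        # No single peptide is specific: check whether this isoform is the
        # only one that contains all of its peptides together.
        candidates = set(isoforms)
        for pep in peps:
            candidates &= owners[pep]
        if peps and candidates == {name}:
            result[name] = ("peptide group", peps)
        else:
            result[name] = ("ambiguous", set())
    return result


# Hypothetical toy isoforms built from four made-up tryptic peptides.
toy = {
    "iso1": "AAAAGKCCCCDREEEEFK",
    "iso2": "AAAAGKCCCCDR",
    "iso3": "CCCCDREEEEFK",
    "iso4": "AAAAGKGGGGHR",
}
result = classify_isoforms(toy)
for name in sorted(result):
    label, evidence = result[name]
    print(name, label, sorted(evidence))
```

In this toy set, one isoform carries a peptide found nowhere else, one is pinned down only by the combination of its shared peptides, and two remain ambiguous because every peptide they produce also occurs in a longer isoform. A real analysis would also have to account for peptide detectability and missed cleavages, which this sketch ignores.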
Original language: English
Journal: Journal of Proteome Research
Volume: 14
Issue number: 9
Pages (from-to): 3541-3554
ISSN: 1535-3893
DOI: https://doi.org/10.1021/pr5011394
Status: Published - 2015

Fingerprint

Protein Isoforms
RNA
Peptides
Databases
Genes
Proteomics
RNA Isoforms
Proteome
Stem cells
Mesenchymal Stromal Cells
Mass spectrometry
Exons
Assays
Pipelines

Cite this

Tay, A. P., Pang, C. N. I., Twine, N. A., Hart-Smith, G., Harkness, L., Kassem, M., & Wilkins, M. R. (2015). Proteomic Validation of Transcript Isoforms, Including Those Assembled from RNA-Seq Data. Journal of Proteome Research, 14(9), 3541-3554. https://doi.org/10.1021/pr5011394
@article{64234a1343634198aaa6be953fcb7a1b,
title = "Proteomic Validation of Transcript Isoforms, Including Those Assembled from RNA-Seq Data",
author = "Tay, {Aidan P} and Pang, {Chi Nam Ignatius} and Twine, {Natalie A} and Gene Hart-Smith and Linda Harkness and Moustapha Kassem and Wilkins, {Marc R}",
year = "2015",
doi = "10.1021/pr5011394",
language = "English",
volume = "14",
pages = "3541--3554",
journal = "Journal of Proteome Research",
issn = "1535-3893",
publisher = "American Chemical Society",
number = "9",

}

