Villum Fonden - Technical and Natural Sciences Research - Artificial Intelligence to Revolutionize Protein Mass Spectrometry

Projects: Project, Private foundations



Although proteins are the principal players in controlling cell behavior, the entire set of proteins (the proteome) is difficult to quantify at large scale, which confines such experiments to a few specialized laboratories [1]. As a result, current proteomics platforms cannot keep pace with other omics technologies such as next-generation sequencing, where large numbers of samples are processed rapidly.

Mass spectrometry (MS) is the main technology for identifying and quantifying proteins, but it is highly complex. To achieve sufficient sensitivity and protein coverage, proteins need to be digested into peptides and then characterized in two or more MS stages. Liquid chromatography-mass spectrometry (LC-MS) is the main approach: peptides are separated by retention time through chromatography and by mass (MS1), and are then fragmented into ions to produce secondary mass spectra (MS2). This second stage is considered necessary to identify the peptide sequence, because peptide peaks overlap in MS1 and different peptides can share the same mass. However, it requires expensive instrumentation, it can impair the quality of the MS1 data, and the selection of peptides for fragmentation can be highly stochastic, which often leads to poor reproducibility.

In the earlier days of LC-MS, only peptide masses were measured (MS1 stage), an approach that was later found to fail at identifying proteins in complex samples. This peptide mass fingerprinting method used only the peptide mass for identification and ignored the rich information surrounding the two-dimensional peptide peak.
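The limitation of mass-only identification can be illustrated with a minimal Python sketch. It is not part of the project's method; the function names `peptide_mass` and `same_mass` and the 10 ppm tolerance are assumptions chosen for illustration, and the example sequences are arbitrary. The residue masses are standard monoisotopic values rounded to five decimals.

```python
# Monoisotopic amino acid residue masses in daltons (standard values, rounded).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added for the free N- and C-termini


def peptide_mass(sequence):
    """Monoisotopic mass of an unmodified peptide (neutral, in daltons)."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def same_mass(seq_a, seq_b, tol_ppm=10.0):
    """True if two peptides cannot be told apart by mass within tol_ppm.

    tol_ppm is an illustrative tolerance, not a value taken from the project.
    """
    mass_a, mass_b = peptide_mass(seq_a), peptide_mass(seq_b)
    return abs(mass_a - mass_b) / mass_a * 1e6 <= tol_ppm


if __name__ == "__main__":
    # A rearranged sequence and a Leu/Ile swap both leave the mass unchanged.
    for a, b in [("PEPTIDEK", "PEPTDIEK"), ("ALGK", "AIGK"), ("ALGK", "AVGK")]:
        print(a, b, same_mass(a, b))
```

Sequence permutations and leucine/isoleucine substitutions produce identical masses, so a mass value on its own cannot decide between such candidates.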

In a normally operating high-resolution mass spectrometer, a small window (±2 min retention time and ±100 m/z) around a peptide peak already contains more than 1000 peaks, all of which might be relevant for identifying the peptide. This additional information is extremely complex, because peptides come with many different side products, such as common chemical modifications (e.g. oxidized Met residues) and a variety of additional, less prominent side reactions. Moreover, reactions commonly used during sample preparation (e.g. reduction and alkylation of cysteine residues) add further complexity, as they often lead to massive peptide-specific side effects even for peptides with the same mass [2].

Computational methods capable of quantifying proteins from the MS1 stage alone will pave the way for cheaper, smaller and faster low-resolution instruments, such as orthogonal and miniature time-of-flight (TOF) mass spectrometers, that can still quantify entire proteomes in record time or are small enough for mobile use. With the current availability of high-performance computing, deep convolutional neural networks have great potential to recognize even subtle characteristic patterns that are spread across different mechanisms and are not recognized by other prediction models.
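As a rough illustration of how the neighborhood of a peptide peak could be prepared as input for a convolutional network, the following Python sketch selects all MS1 peaks inside a ±2 min, ±100 m/z window and bins them into a small intensity grid. It is a sketch under stated assumptions, not the project's pipeline: the peak representation as `(retention_time_min, mz, intensity)` tuples, the function names, and the grid size are all hypothetical.

```python
def peaks_in_window(peaks, rt_center, mz_center, rt_tol=2.0, mz_tol=100.0):
    """Keep peaks within +/- rt_tol minutes and +/- mz_tol m/z of the center.

    peaks is assumed to be an iterable of (rt_min, mz, intensity) tuples.
    """
    return [
        (rt, mz, intensity)
        for rt, mz, intensity in peaks
        if abs(rt - rt_center) <= rt_tol and abs(mz - mz_center) <= mz_tol
    ]


def window_to_grid(window, rt_center, mz_center,
                   rt_tol=2.0, mz_tol=100.0, rt_bins=8, mz_bins=16):
    """Sum peak intensities into an rt_bins x mz_bins grid (list of rows).

    The bin counts are arbitrary illustration values.
    """
    grid = [[0.0] * mz_bins for _ in range(rt_bins)]
    rt_min, mz_min = rt_center - rt_tol, mz_center - mz_tol
    for rt, mz, intensity in window:
        # Map each coordinate to a bin index; clamp the upper edge into range.
        row = min(int((rt - rt_min) / (2 * rt_tol) * rt_bins), rt_bins - 1)
        col = min(int((mz - mz_min) / (2 * mz_tol) * mz_bins), mz_bins - 1)
        grid[row][col] += intensity
    return grid


if __name__ == "__main__":
    # Made-up peaks for demonstration only.
    demo_peaks = [(30.0, 500.0, 10.0), (31.5, 580.0, 5.0),
                  (33.0, 500.0, 1.0), (30.0, 650.0, 2.0)]
    window = peaks_in_window(demo_peaks, rt_center=30.0, mz_center=500.0)
    grid = window_to_grid(window, rt_center=30.0, mz_center=500.0)
    print(len(window), sum(map(sum, grid)))
```

A fixed-size grid of this kind is one simple way to present the two-dimensional MS1 neighborhood to an image-style model. Real data would need a much finer m/z resolution than the coarse binning used here.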


Protein mass spectrometry is the major technique for characterizing thousands of proteins in biological samples, and it is therefore an important cornerstone for unraveling and understanding biological processes and their role in complex diseases. Although this experimental platform is increasingly used in biological and clinical research, the current instrumentation is very expensive and slow, and it requires experienced technical staff to operate. One reason for this complexity is the use of at least two stages of data acquisition to ensure high-quality identification and quantification of proteins. We seek to revolutionize this slow and costly process by stripping the second stage from the experiment and replacing it with cutting-edge artificial intelligence algorithms, so-called deep learning models. These AIs will be trained to predict proteins on the basis of extensive, mostly untapped information from the first stage. We envision that our method will open protein mass spectrometry to a broader spectrum of use cases in research and industry, and ultimately enable high-quality research with small mobile units.
Effective start/end date: 01/11/2019 to 31/10/2021