Tutorial


Background and prerequisites


Phosphomatics helps you make sense of your high-throughput phosphoproteomics data. Our aim is to help you connect the phosphorylation sites that you observe in your mass spectrometry experiments to known upstream kinases. This provides a greater understanding of the regulatory signaling networks operating in your systems.

Currently, Phosphomatics is able to process Human, Mouse and Rat phosphorylation site data.

Phosphomatics fits into your data analysis pipeline after searches have been conducted to identify phosphorylated peptides and proteins and, optionally, after downstream statistical analysis. Phosphomatics uses UniProt identifiers to index substrate-kinase relationships and construct signaling pathways. It is therefore important that either: 1) your database searches are conducted against sequence databases obtained from UniProt, or 2) identifiers from a search using a different database are subsequently mapped to UniProt equivalents. Programmatic and non-programmatic methods to map identifiers between formats are available directly from UniProt.

Experimentally observed phosphorylation sites are scanned against a database of known phosphorylation interactions between individual substrates and upstream kinases extracted from Signor 2.0. As detailed elsewhere, this database was constructed through a high-quality curation of literature resources, drawing on other phospho-specific interaction databases such as PhosphoSitePlus and PhosphoELM in addition to information from newly published primary literature. Currently, it contains 17,704 substrate-kinase interactions for Human, Mouse and Rat.

Preparing a search


Data can be uploaded to Phosphomatics in two different formats: 1) a custom plain-text file that can be constructed from the output of almost any proteomics search software, or 2) for a qualitative analysis, the unaltered 'Phospho (STY)Sites.txt' file produced by MaxQuant.

Plain-text file

Download and populate the input data template with your experimental data. The input fields are described in the table below. For a minimal analysis, only the UniProt ID of the identified protein, the phosphorylated residue number and the amino acid are required. To filter the networks constructed by Phosphomatics, quantitative information can also be provided. When completing the template, it is important that the header line (i.e. the top line in the image below) is not changed in any way.

Important: The completed input data file must be saved as a 'plain text' CSV file. That is, not an Excel '.xlsx' file or similar.

Input Field | Type | Description
Protein UniProt ID | Required | The ID of the identified protein from UniProt.
Phosphorylated Residue Number | Required | The position of the phosphorylated residue within the protein.
Phosphorylated Amino Acid | Required | The identity of the amino acid phosphorylated. Must be a single letter residue code.
Log(Fold Change) | Optional | The log-transformed fold change in abundance of the phosphorylation site.
Negative log(p-val) | Optional | The negative log10 of the p-value associated with the fold change.
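
As a rough illustration of how rows for the input file could be assembled programmatically, the sketch below builds one row per phosphorylation site in the column order of the table above and writes them beneath an unmodified header line. The function names are hypothetical, the use of log2 for the fold change is an assumption (the template only specifies a log-transformed value), and the header must be copied from the downloaded template rather than typed by hand.

```python
import csv
import io
import math


def make_row(uniprot_id, position, residue, fold_change=None, p_value=None):
    """Return one data row in the column order of the table above."""
    if not (len(residue) == 1 and residue.isalpha()):
        raise ValueError(f"residue must be a single letter, got {residue!r}")
    if int(position) < 1:
        raise ValueError("residue numbers start at 1")
    row = [uniprot_id, int(position), residue.upper()]
    if fold_change is not None and p_value is not None:
        # Assumption: log2 fold change; the tutorial only says "log-transformed".
        row += [round(math.log2(fold_change), 4), round(-math.log10(p_value), 4)]
    return row


def write_input(template_header, rows, handle):
    """Write the untouched template header line, then the data rows as CSV."""
    handle.write(template_header.rstrip("\r\n") + "\n")
    csv.writer(handle, lineterminator="\n").writerows(rows)


# Invented values for illustration only (2-fold increase, p = 0.01).
example = make_row("P04637", 15, "s", fold_change=2.0, p_value=0.01)
```

In practice, `template_header` would be the first line read from the downloaded template file, so that it reaches the output byte for byte.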

Once complete, click 'New Analysis', upload your completed input file and click 'Parse CSV file'.

MaxQuant file upload

Phosphomatics can extract the information needed for a qualitative analysis directly from the 'Phospho (STY)Sites.txt' produced by MaxQuant when variable phosphorylation is allowed in a search. For this to be successful, it is essential that your 'Identifier parser rules' are correctly set within MaxQuant to extract only the UniProt identifier code from the header lines of entries in your FASTA file. If this has been done correctly, your Phospho (STY)Sites.txt files should look similar to the below:

Phosphomatics will automatically remove any entries from MaxQuant results tables that are attributable to decoy and contaminant protein hits. Furthermore, an option can be applied that removes phosphorylation sites with a localisation probability of less than a specified value.
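
For readers who want to preview the effect of this clean-up before uploading, a minimal sketch of an equivalent filter is given below. It is not the Phosphomatics implementation. It assumes the tab-delimited MaxQuant table uses the column names 'Reverse', 'Potential contaminant' and 'Localization prob' with '+' marking flagged rows; these names can differ between MaxQuant versions, and the 0.75 cut-off is only an illustrative default.

```python
import csv
import io


def keep_site(row, min_loc_prob=0.75):
    """True if the row is neither decoy nor contaminant and is well localised."""
    if row.get("Reverse", "") == "+":
        return False
    if row.get("Potential contaminant", "") == "+":
        return False
    return float(row["Localization prob"]) >= min_loc_prob


def filter_sites(handle, min_loc_prob=0.75):
    """Read a tab-delimited sites table and return the rows that pass."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [row for row in reader if keep_site(row, min_loc_prob)]


# Toy rows; the probabilities and flags are invented for illustration.
SAMPLE = (
    "Proteins\tLocalization prob\tReverse\tPotential contaminant\n"
    "P04637\t0.98\t\t\n"
    "REV__Q9Y6K9\t0.99\t+\t\n"
    "CON__P02769\t0.95\t\t+\n"
    "P31749\t0.60\t\t\n"
)
kept = filter_sites(io.StringIO(SAMPLE))
```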

A note on protein groups

Since phosphoproteomics experiments frequently include a phosphopeptide enrichment step, a large number of 'protein groups' are often produced by searches of the resulting data. Protein groups arise when a given peptide (or set of peptides) is degenerate and could support the presence of multiple proteins, but insufficient information is present to definitively identify individual entities (also known as the 'protein inference' problem). See here for a nice overview.

While protein groups are frequently produced in database searches, interactions between upstream kinases and downstream phosphorylation targets are annotated to individual entities. To handle this contrast, Phosphomatics 'unwraps' protein groups and considers each entity within the group individually. The assignment of unwrapped proteins to groups is, however, tracked so that users can interpret results within this context where need be. This data is presented on the Investigate cards for given substrate-kinase interactions.
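
The idea of unwrapping can be sketched as follows. This is an illustration of the concept rather than the code Phosphomatics runs, and it assumes that a group is written as semicolon-separated accessions with a matching semicolon-separated list of site positions (as in MaxQuant output). The function name and the accessions are placeholders.

```python
def unwrap_group(proteins, positions, residue):
    """Split one protein-group entry into per-protein sites.

    Each returned site keeps the original group string so that results
    can still be interpreted in the context of the group.
    """
    ids = proteins.split(";")
    pos = positions.split(";")
    if len(ids) != len(pos):
        raise ValueError("each protein in the group needs its own position")
    return [
        {"protein": p, "position": int(n), "residue": residue, "group": proteins}
        for p, n in zip(ids, pos)
    ]


# Placeholder accessions, not real annotations.
sites = unwrap_group("P11111;P22222", "10;12", "S")
```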

Finalise

After uploading your input file, you will be shown an extract of the first few lines of the uploaded data. This allows you to check that everything is correctly formatted and interpreted by Phosphomatics. For data files where no quantitative information is supplied, fold-change and p-values are set to 0 in this table. Lastly, select the relevant species for your dataset. This will be used when mapping entities to canonical pathways for enrichment analysis. When ready, click 'Start Search'. A typical search takes ~1-2 mins to complete.

Summary


The summary boxes in the top-left of the page report the fraction of input phosphorylation sites that were successfully mapped to possible upstream kinases and indicate the specificity of these mappings. Phosphomatics tracks mapping results at two levels of specificity. Site-specific (or Exact) matches indicate phosphorylation sites in your input data set for which an upstream kinase is known to phosphorylate that precise position. Conversely, Protein-Level matches indicate cases where an upstream kinase is known to phosphorylate a given protein specified in your input data, although at a site different from that specified.
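
The distinction between the two match levels can be made concrete with a small sketch. The data structure, the function name and the identifiers below are hypothetical. Reporting a kinase only once (as exact when it qualifies for both levels) is a design choice of this example, not a documented Phosphomatics behaviour.

```python
def classify(site, known):
    """Return (exact_kinases, protein_level_kinases) for one observed site.

    site  : (protein, residue, position)
    known : dict mapping protein -> list of (kinase, residue, position)
    """
    protein, residue, position = site
    exact, protein_level = set(), set()
    for kinase, k_res, k_pos in known.get(protein, []):
        if (k_res, k_pos) == (residue, position):
            exact.add(kinase)          # same protein, same site
        else:
            protein_level.add(kinase)  # same protein, different site
    return exact, protein_level - exact


# Toy knowledge base with made-up identifiers.
KNOWN = {"SUB1": [("KIN_A", "S", 15), ("KIN_B", "T", 40)]}
exact, protein_level = classify(("SUB1", "S", 15), KNOWN)
```

Here the observed site S15 on SUB1 is an exact match for KIN_A and a protein-level match for KIN_B.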

Phosphorylation networks

Phosphomatics constructs networks that show possible relationships between phosphorylation sites in your uploaded data file and upstream kinases. Here, phosphorylation sites (i.e. your input data) are represented as circles and upstream kinases are depicted as green triangles. If quantitative information has been supplied, the fold-change of a given site is represented by the color gradient. Importantly, each link drawn between an uploaded phosphorylation site and an upstream kinase is supported by at least one publication.

Each network diagram throughout Phosphomatics is interactive. Use your mouse pointer and wheel to pan and zoom through the network. Double-click anywhere in the network to enlarge it to full screen. Right-clicking a node (either kinase or substrate) or an edge will display a context menu with the additional functions described below:

Action | Applies To | Description
Add all partners to myList | Substrates and Kinases | All substrate-kinase links (i.e. network edges) for the selected entity will be saved to the myList portfolio.
Add to myList | Edges | The selected relationship will be saved to the myList portfolio.
Investigate | Substrates and Kinases | View specific substrate or kinase information and interactions for the selected substrate or kinase.
Investigate Relationship | Edges | View specific information about the link between the selected substrate and kinase.

Data filters


By default, Phosphomatics displays information for all uploaded phosphorylation sites, but a range of data filters can be applied to limit the visualisations to a subset of interest.

Clicking the 'Filters' button in the top right corner of any page will display the filters dialog box that allows you to add or subtract filters. Clicking apply will cause the page to reload with the selected filter set applied. Once set, filters are subsequently applied to all analysis pages that you visit within Phosphomatics - i.e. there is no need to re-apply filters if you move to a different analysis page.

The active filters are displayed at the top of the filters dialog box. Clicking an active filter will remove it from the analysis.

Filter | Description
Log(Fold Change) Threshold | Absolute value threshold of the Log(Fold Change) for inclusion in the analysis. Only active if quantitative information is supplied in the input file.
Direction | The direction of fold change required for inclusion in the analysis. Only active if quantitative information is supplied in the input file.
-Log10(p-value) Threshold | Threshold of the -Log10(p-value) for inclusion in the analysis. Only active if quantitative information is supplied in the input file.
Pathway | Restrict phosphorylation sites to retain only those in the specified pathway (defined by KEGG).
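
To show how the three quantitative filters combine, here is a minimal sketch. The class, field and parameter names are hypothetical, and treating the thresholds as inclusive is an assumption; the tutorial does not state how boundary values are handled.

```python
from dataclasses import dataclass


@dataclass
class Site:
    protein: str
    log_fc: float     # Log(Fold Change) from the input file
    neg_log_p: float  # -Log10(p-value) from the input file


def passes(site, min_abs_log_fc=0.0, direction="both", min_neg_log_p=0.0):
    """Apply the fold-change threshold, direction and p-value threshold in turn."""
    if abs(site.log_fc) < min_abs_log_fc:
        return False
    if direction == "up" and site.log_fc <= 0:
        return False
    if direction == "down" and site.log_fc >= 0:
        return False
    return site.neg_log_p >= min_neg_log_p


# Invented values for illustration.
sites = [Site("A", 1.8, 3.0), Site("B", -2.1, 2.5), Site("C", 0.3, 4.0)]
up_hits = [s.protein for s in sites
           if passes(s, min_abs_log_fc=1.0, direction="up", min_neg_log_p=2.0)]
```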

Substrate and kinase analysis


The substrate and kinase analysis pages allow you to explore each of these entities and their relationships with other molecules in your data set. The layouts of the substrate and kinase pages are highly similar, so they are described together here. For example, a partial image of the substrate analysis page is shown below. Here, the table at left lists all phosphorylation sites in your input data set that could be mapped to any upstream kinase at any level of specificity. Note that phosphorylation sites that could not be mapped to any upstream kinase are not shown in the table. The columns 'E' and 'P' represent the number of 'exact' and 'protein-level' matches to upstream kinases for a given phosphorylation site.

Clicking rows in the substrate/kinase table on the left will update the remainder of the page with the relevant information. Data for each entity is presented in a number of data cards described below:

Analysis | Description
Entity data | Contextual data about the selected substrate or kinase drawn from UniProt. This provides functional information for the entity as well as links to outside databases.
Site-specific phosphorylations | Details of interactions identified with site-specific resolution. For substrate analyses, these represent interactions with upstream kinases that may have effected the observed phosphorylation site. For kinase analyses, these are downstream phosphorylation targets identified in your uploaded data set.
Protein-level phosphorylations | As above, but for interactions identified with protein-level specificity.
Co-localisation data | Since only a small fraction of kinase-substrate interactions have been investigated and indexed, interaction data is drawn from BioGrid and presented here for the selected kinase or substrate. This shows the range of proteins with which the selected kinase or substrate is known to interact. This does not necessarily imply that a phosphorylation relationship exists but may nonetheless suggest possible upstream kinases or downstream phosphorylation sites.
Similar substrates | Present for substrate analysis only. Since only a small fraction of kinase-substrate interactions have been investigated and indexed, it is likely that many of your uploaded phosphorylation sites will not be successfully mapped to known upstream kinases. In these cases, a sequence alignment can be performed between the selected phosphorylation site and the remainder of the sites in your uploaded dataset. This will return phosphorylation sites in your data set that share broadly similar primary sequence characteristics. Given that many kinases demonstrate a preference for certain primary sequence features in their phosphorylation targets, this may aid in uncovering additional substrates for certain kinases. Note: Similar substrates searching requires that full protein sequences be associated with each uploaded phosphorylation site. This process can take some time after the initial data processing step is completed. If similar substrates searches return no results after a minute or two, refresh the page and try again.
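
The intuition behind the similar substrates search can be illustrated with a deliberately simple sketch: extract a fixed-width sequence window around each site and score position-by-position identity. This is a stand-in for a real sequence alignment and is not the method Phosphomatics uses. The function names, the window size and the toy sequence are all assumptions.

```python
def site_window(sequence, position, flank=7, pad="_"):
    """Return the residues around a 1-based site position, padded at the termini."""
    idx = position - 1
    left = sequence[max(0, idx - flank):idx]
    right = sequence[idx + 1:idx + 1 + flank]
    return left.rjust(flank, pad) + sequence[idx] + right.ljust(flank, pad)


def identity(window_a, window_b, pad="_"):
    """Fraction of window positions with identical residues (padding never matches)."""
    matches = sum(a == b and a != pad for a, b in zip(window_a, window_b))
    return matches / len(window_a)


# Toy sequence, not a real protein.
centre = site_window("MSTAYK", 3, flank=2)
n_term = site_window("MSTAYK", 1, flank=2)
```

Windows that score highly against the window of a site with a known kinase would be candidates for closer inspection, nothing more.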

Pathway analysis


This page summarises your phosphorylation site data into known, canonical signaling pathways as defined by KEGG. The bar plot provides information about the numbers of your input substrates and mapped upstream kinases present in each signaling pathway and data on the statistical enrichment of entities is provided in the table to the right. If no information is displayed in the table, it is likely that either: 1) you selected the incorrect taxonomy in the initial search setup for your data, 2) your taxonomy is not supported by Phosphomatics, or 3) your data filters are too strict.

Field | Description
Pathway | The KEGG-defined canonical signaling pathway.
Substrates | The number of proteins in your uploaded dataset that meet the applied data filters and are components of a given pathway.
Kinases | The number of upstream kinases that could be mapped to the uploaded data set and are components of a given pathway.
Enrichment (-log10(p-value)) | The negative log10 of the p-value for over-representation of entities in a given pathway compared to that predicted by chance alone. Enrichment analysis is conducted using g:Profiler.
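
To give a feel for what an over-representation p-value measures, the sketch below computes a plain hypergeometric tail probability and its negative log10. It is a textbook illustration only: it does not reproduce g:Profiler's background set or multiple-testing correction, so its values will not match the table. The function name and all numbers are invented.

```python
from math import comb, log10


def hypergeom_tail(k, population, in_pathway, sample):
    """P(X >= k): chance of seeing at least k pathway members in the sample.

    population : number of background proteins
    in_pathway : background proteins annotated to the pathway
    sample     : number of proteins in your filtered dataset
    """
    upper = min(in_pathway, sample)
    total = comb(population, sample)
    hits = sum(
        comb(in_pathway, i) * comb(population - in_pathway, sample - i)
        for i in range(k, upper + 1)
    )
    return hits / total


# Invented numbers: 2 of 2 sampled proteins fall in a 5-member pathway out of 10.
p_value = hypergeom_tail(2, population=10, in_pathway=5, sample=2)
enrichment = -log10(p_value)
```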

Clicking 'Add to myList' will add all interactions between substrates and kinases in a given pathway to the myList portfolio. Clicking 'view' will open a separate tab displaying more detail about the entities in a given pathway. For example, the 'Cellular senescence' pathway details are shown below. Click the 'Plot Options' button to toggle the information displayed between a quantitative and qualitative analysis.

Field | Value | Description
Plot type | Fold-change | Shade entities based on provided quantitative information for phosphorylation site fold-changes. Only substrates meeting the applied data filters are shown and mapped (unobserved) upstream kinases are omitted.
Plot type | Relations | A qualitative plot that highlights individual entities that relate to substrates meeting the applied data filters. Here, kinases mapped to input phosphorylation sites are indicated in addition to phosphorylation targets.
Method | Max | Applies to quantitative 'fold-change' plot type only. In cases where two or more phosphorylation sites on the same protein are present and meet the applied data filters, shade the relevant entity by the maximal fold-change (either positive or negative) of the set.
Method | Mean | Applies to quantitative 'fold-change' plot type only. In cases where two or more phosphorylation sites on the same protein are present and meet the applied data filters, shade the relevant entity by the mean fold-change of the set.
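
The difference between the 'Max' and 'Mean' methods is easiest to see on a small example. The sketch below is one reading of the descriptions above, with 'maximal (either positive or negative)' interpreted as the value of largest magnitude with its sign preserved. The function name and the fold-change values are hypothetical.

```python
def shade_value(fold_changes, method="max"):
    """Collapse several site fold-changes on one protein into a single value."""
    if not fold_changes:
        raise ValueError("at least one fold-change is required")
    if method == "max":
        # Largest magnitude, keeping the sign of the winning site.
        return max(fold_changes, key=abs)
    if method == "mean":
        return sum(fold_changes) / len(fold_changes)
    raise ValueError(f"unknown method: {method!r}")


# Three invented sites on the same protein.
by_max = shade_value([1.0, -2.0, 4.0], method="max")
by_mean = shade_value([1.0, -2.0, 4.0], method="mean")
```

Note how a protein with sites moving in opposite directions can look strongly regulated under 'Max' but close to unchanged under 'Mean'.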

Investigate


Each relationship between an uploaded phosphorylation site and an upstream kinase drawn by Phosphomatics has its own, dedicated 'Investigate' page. These pages summarise and expand on the evidence that was used to draw this link. The investigate page for a given relationship can be accessed through the edge right-click menu of any network diagram in Phosphomatics, or by clicking the 'Investigate' button appearing in tables summarising interaction data for substrates or kinases.

Analysis | Description
Protein Group Information | If a protein was a constituent of an indistinguishable protein group in the uploaded data file, this will be indicated here. Take note of this information since substrate-kinase relationships are indexed for individual entities.
Protein Information | Contextual data about the interacting substrate and kinase drawn from UniProt. This provides functional information for the entity as well as links to outside databases.
Signor Data | Data and publications supporting the selected interaction drawn from the Signor database. Click the PubMed ID link to access publications detailing this interaction.
BioGrid Interactions | Interaction data drawn from BioGrid for the selected substrate and kinase. Click the PubMed ID link to access publications detailing this interaction or the BioGrid ID to view the associated interaction card at BioGrid.
Text Mining | Implemented using iTextMine, the text mining card uses natural language processing algorithms to search scientific literature for manuscripts that reference the selected phosphorylation relationship. These can serve to highlight important literature that may be useful in contextualising this finding. Note that this process is sensitive to differences between the protein nomenclature used in databases and that used by individual authors in the literature.
Similar substrates | Present for substrate analysis only. Since only a small fraction of kinase-substrate interactions have been investigated and indexed, it is likely that many of your uploaded phosphorylation sites will not be successfully mapped to known upstream kinases. In these cases, a sequence alignment can be performed between the selected phosphorylation site and the remainder of the sites in your uploaded dataset. This will return phosphorylation sites in your data set that share broadly similar primary sequence characteristics. Given that many kinases demonstrate a preference for certain primary sequence features in their phosphorylation targets, this may aid in uncovering additional substrates for certain kinases.

My list


Specific interactions of interest can be added to the 'myList' portfolio at various points throughout Phosphomatics. The myList tab allows you to review these interactions, curate the list if need be and download the results. Clicking the 'Download myList' button will download a CSV file containing details of the specific interactions in the myList portfolio, while the 'Download All' button will produce a file containing all interactions for all uploaded phosphorylation sites.