Phosphomatics Documentation


For global phosphoproteomics, analysing phosphopeptides by mass spectrometry and processing the raw data with software such as MaxQuant typically produces a large and unintelligible list of phosphoproteins and quantification values. A significant degree of post-processing is then required to identify differentially regulated phosphorylation sites and place these in the context of the underlying biology. This is frequently a complicated process requiring specialist training and many different software packages.

Phosphomatics aims to provide a unified platform that allows researchers to quickly conduct statistical and biological analysis of their mass spectrometry data.

Data Upload and Preparation


Getting Started


To begin a new analysis, upload a data file containing your phosphoproteomics data using the file upload form on the Phosphomatics home page.

Phosphomatics is very flexible in the format of the data file that can be uploaded; however, there are a few basic requirements. For each phosphorylation site, the Phosphomatics input data file must contain: (1) UniProt ID, (2) Phosphorylated residue (S/T/Y), (3) Phosphorylation site, (4) Quantification values. The column labels don't need to match those in the example below and the order of the columns is not important. An empty template and a completed file containing an example dataset can be downloaded using the buttons below.
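As a rough illustration of these requirements, the Python sketch below checks that each row of a tab-delimited file carries a UniProt ID, a residue of S, T or Y, an integer site position and numeric quantification values. The column names (UniProt, Residue, Site, Quant_*), the tab delimiter and the function name check_rows are assumptions made for the example; Phosphomatics itself does not require particular labels.

```python
import csv
import io

# Hypothetical example file; column names and the tab delimiter are assumptions.
EXAMPLE = (
    "UniProt\tResidue\tSite\tQuant_CTRL.1\tQuant_CTRL.2\n"
    "P12345\tS\t15\t1200.5\t1100.0\n"
    "Q67890\tY\t204\t560.0\t0\n"
)

def check_rows(handle, id_col="UniProt", res_col="Residue", site_col="Site"):
    """Return a list of (line_number, problem) for rows that break the basic requirements."""
    reader = csv.DictReader(handle, delimiter="\t")
    # Every column that is not an identifier column is treated as quantification.
    quant_cols = [c for c in reader.fieldnames if c not in (id_col, res_col, site_col)]
    problems = []
    for n, row in enumerate(reader, start=2):  # line 1 is the header
        if not (row[id_col] or "").strip():
            problems.append((n, "missing UniProt ID"))
        if row[res_col] not in ("S", "T", "Y"):
            problems.append((n, "residue must be S, T or Y"))
        if not (row[site_col] or "").isdigit():
            problems.append((n, "site must be an integer position"))
        for col in quant_cols:
            try:
                float(row[col])
            except (TypeError, ValueError):
                problems.append((n, f"non-numeric value in {col}"))
    return problems

print(check_rows(io.StringIO(EXAMPLE)))  # an empty list means no problems were found
```

Running a check of this kind before upload can catch shifted columns or text in quantification fields early.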

Data Import Wizard


Phosphomatics provides a convenient data import wizard that guides you through the common steps involved in preparing phosphoproteomics data for downstream statistical and biological interpretation.

Column Assignment


The column headers from your input file are listed in the 'Assign Data Columns' table. Use the dropdown menu in each row to assign each column to one of the five available categories (UniProt ID, Site, Residue, Quant, Not Used). UniProt ID, Site and Residue must each be assigned to exactly one data column while multiple quant columns are allowed. Columns assigned to 'Not Used' will simply be ignored throughout the remainder of the analysis.

For easy data import from a range of proteomics search software, a number of preset data import filters can be applied. For example, the 'Phospho (STY) Sites.txt' file produced by MaxQuant contains many data fields that are not necessary for downstream analysis. Selecting 'MaxQuant LFQ' from the 'Apply Preset' dropdown menu and then clicking 'Apply' will automatically detect the known structure of these data files and assign fields to the correct categories.

Correct column assignment is critical to successful processing, so the assignments should be checked carefully. The 'Show Active Only' checkbox can be toggled to hide all the columns assigned to 'Not Used', which can be numerous in some cases.

Sample Grouping


Next, study groups can be constructed from analytical samples. At least three replicate experiments are required in each sample group for sound statistical analysis. Provide a name for each sample group and then click 'Create Group'.

Assign samples to each group by clicking the button. The quantification samples will be displayed. Click the rows corresponding to the samples you wish to assign to the sample group and click 'Submit'. NOTE: It is important that all quantification samples are assigned to a group. To exclude a given sample from analysis, change the column type to 'Not Used' in the 'Column Assignment' tab.

Samples will appear in a table within each group. You may, optionally, set an alias and index for each sample:

Parameter | Description
Alias | Sample names parsed directly from input files are frequently lengthy and reflect raw mass spectrometry file names or extraneous information added by search engines. Set an alias to replace these with a shorter name. These will then be displayed in subsequent plots. Clicking the button will allow you to set aliases for all samples in a group simultaneously.
Index | Sample indices define the order in which your samples should be displayed in subsequent analysis plots and graphics. These are integers that are continuous across groups. They may be useful in cases where an intuitive order of the samples and groups is desired, e.g. when time-resolved data has been acquired.

Data Filtering and Imputation


Depending on the sample and analysis methodologies, quantitative proteomics data can contain many 'missing' or 'zero' values. These indicate that a given protein (or phosphopeptide) could not be detected or quantified in a given sample. This reduces the number of 'real' quantitation replicates that we have in the dataset for that entity which raises important questions for how we handle downstream statistical analysis.

One simple method to address this problem is to exclude phosphosites with any zero values from the analysis. This is certainly a valid approach; however, it could be considered overly stringent in many cases since a large fraction of entities may contain some proportion of zero values. Excluding all of these would then result in the loss of significant and meaningful data. Conversely, if a phosphosite contains too great a fraction of zero values, insufficient 'real' observations are present to draw meaningful conclusions.

Typically, this problem is handled by a combination of filtering out phosphosites that have too great a proportion of missing values, and 'imputing' any missing values still present in the remaining data set. Phosphomatics requires that no missing values are present in the final dataset after processing so it is essential that these steps are performed unless you're sure that your uploaded data file does not contain missing values.
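The two steps can be sketched in Python as follows. This is only an illustration: the 50% cut-off, the half-minimum imputation strategy and the function name filter_and_impute are assumptions, and are not necessarily the options offered by Phosphomatics.

```python
def filter_and_impute(sites, max_missing_fraction=0.5):
    """sites maps a site ID to its list of quantification values (0 or None = missing)."""
    def is_missing(v):
        return v is None or v == 0

    # 1. Drop sites with too great a proportion of missing values.
    kept = {
        site: values
        for site, values in sites.items()
        if sum(is_missing(v) for v in values) / len(values) <= max_missing_fraction
    }
    observed = [v for values in kept.values() for v in values if not is_missing(v)]
    if not observed:
        return {}
    # 2. Impute what is still missing with half the smallest observed value
    #    (an assumed strategy; other imputation methods are equally possible).
    fill = min(observed) / 2
    return {
        site: [fill if is_missing(v) else v for v in values]
        for site, values in kept.items()
    }

# Hypothetical site IDs and intensities.
example = {
    "P12345_S15": [1200.0, 1100.0, 0, 1300.0],   # 1 of 4 missing: kept, then imputed
    "Q67890_Y204": [0, 0, 0, 560.0],             # 3 of 4 missing: removed
}
print(filter_and_impute(example))
```

After both steps no missing values remain, which is the condition the final dataset has to satisfy.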

Some search engines will display decoy or contaminant proteins in search results alongside true identifications and these should be filtered before continuing with downstream analysis. Click the 'New Filter' button to specify filter terms that will be used to exclude phosphorylation sites from further analysis. These filters will then be matched against the UniProt Identifier column of an input phosphopeptide. If any filter is found in the UniProt ID column, the site will be excluded.

For example, in the image above, two filters are added that exclude phosphopeptides that have either 'REV' or 'CON' in the UniProt Identifier column.
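The matching rule described here amounts to a substring test on the identifier. A minimal Python sketch, assuming case-sensitive matching and using made-up identifiers and a hypothetical function name:

```python
def apply_filters(uniprot_ids, filter_terms):
    """Keep only identifiers that contain none of the filter terms (substring match)."""
    return [
        uid for uid in uniprot_ids
        if not any(term in uid for term in filter_terms)
    ]

# Illustrative identifiers only; the prefixes mimic decoy and contaminant labels.
ids = ["P12345", "REV__Q67890", "CON__P00761", "O15164"]
print(apply_filters(ids, ["REV", "CON"]))  # ['P12345', 'O15164']
```

Because the test is a plain substring match, short filter terms should be chosen with care so that they do not occur inside genuine identifiers.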

Data Normalisation and Transformation


Proteomics data typically require log transformation prior to conducting statistical tests. This is because protein (or phosphopeptide) abundances are generally right-skewed - that is, there are many more sites of low abundance than of high abundance - whereas many statistical tests expect data that approximate a normal distribution. Log transformation should be applied to reduce the skew of the data.

Clicking the 'Preview' button will display a histogram of all abundance values in the uploaded dataset with the selected transformation method applied.
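The effect of the transformation can be seen on a handful of made-up intensities. The sketch below assumes a base-2 logarithm, which fits the log2 fold changes used elsewhere in this documentation; the function name is hypothetical.

```python
import math

def log2_transform(values):
    """Log2-transform positive abundances; missing values must be handled beforehand."""
    return [math.log2(v) for v in values]

# Made-up raw intensities spanning three orders of magnitude.
raw = [1_000.0, 2_000.0, 4_000.0, 1_024_000.0]
print([round(v, 2) for v in log2_transform(raw)])
```

On the raw scale the largest value is about a thousand times the smallest, whereas after transformation the values span only ten log2 units, which pulls in the long tail of high abundances.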

Furthermore, small, random errors in sample preparation typically produce slightly varying levels of protein content in the final sample that is analysed. This results in differing amounts of protein available for detection by the mass spectrometer and hence globally varying signal intensities. To correct for this, normalisations can be applied to compensate for the impacts of varying global intensities.

This step is not mandatory but may improve the downstream analysis in some circumstances. Remember that normalisation is not a remedy for poor sample preparation and typically assumes that the bulk of the detected proteins are not influenced by the treatment conditions.

Clicking the 'Preview' button will display a boxplot of the abundance values for each sample for the selected normalisation method.
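One common way to compensate for globally varying intensities is median centring, sketched below in Python. This is an assumed example of a normalisation method, not a statement of which methods Phosphomatics offers; the sample names and function name are hypothetical.

```python
from statistics import median

def median_normalise(samples):
    """samples maps a sample name to its list of log-transformed abundances."""
    sample_medians = {name: median(vals) for name, vals in samples.items()}
    grand_median = median(sample_medians.values())
    # Shift each sample so that its median equals the median of all sample medians.
    return {
        name: [v - sample_medians[name] + grand_median for v in vals]
        for name, vals in samples.items()
    }

# The second sample is globally one log2 unit brighter than the first.
samples = {
    "CTRL.1": [10.0, 11.0, 12.0],
    "THZ.1": [11.0, 12.0, 13.0],
}
print(median_normalise(samples))
```

The shift is applied on the log scale, so it corresponds to multiplying every raw intensity in a sample by a single factor. It relies on the assumption stated above that the bulk of the detected sites is unaffected by the treatment.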

Create Group Comparisons


Here, we can optionally pre-create some sample group comparisons. Two-sample t-tests are conducted for the specified group comparison and the sites meeting provided thresholds are placed into data groups.

Click the button to add a new comparison row. Under the headings 'Group 1' and 'Group 2', select the sample groups you wish to compare. The -Log10(p-value) and (log) fold change threshold can then be specified.

Data Review and Submission


In the final tab, a preview is provided of your prepared data after all the selected import options have been applied. At this point, you can go back and edit any of the import parameters prior to starting an analysis.

Once you're happy with the import settings, clicking 'Submit' will start the data processing. For a typical analysis of ~10,000 phosphorylation sites, this takes approximately 5 minutes to complete.

Data Groups


What are Data Groups in Phosphomatics?


Global (phospho)proteomics experiments can identify and quantify thousands of peptides. Typically, however, the majority of these are not significantly changed in abundance between different treatment conditions and only a small fraction show large and reproducible differences. To help make sense of 'omics data, analyses frequently focus on this latter group of peptides that are significantly changed in abundance since these are most likely to drive or explain observable differences in biological outcomes. This means that, for an entire global phosphoproteomics data set uploaded to Phosphomatics, we may wish to focus on only a small portion of the observed phosphosites.

To achieve this in a dynamic way, Phosphomatics provides numerous methods that can be used to create subsets of your input data that contain only phosphorylation sites with certain features. For example, groups can be created from clusters of phosphosites appearing in a heatmap, from peptides in the 'significant' regions of a volcano plot, from proteins that share certain Gene Ontology terms etc.

Creating Data Groups


New data groups can be created from the results of a range of different analyses in Phosphomatics. The exact process depends slightly on the type of analysis conducted; however, in general, clicking the 'Create Group' button present on certain analysis pages will open a menu with a range of options allowing you to customise the contents of the new data group.

Data groups can be created from:

  • Clusters identified in cluster maps
  • Desired regions of volcano plots
  • Components of canonical pathways or gene ontology terms
  • Known substrates of selected kinases

Working with data groups


Once a data group has been created, you will have the ability to activate it using the dropdown menu in the top right corner of the screen. Once a new data group is selected from the dropdown menu, the active page will be refreshed and the presented analysis updated to include only data from the selected group. For example, if a data group is created from a certain cluster of phosphosites from a heatmap, activating this group from the dropdown menu will result in all subsequent analyses being conducted on only the sites from that cluster and not the entire dataset.

Data groups can also be created from the contents of other data groups (i.e. groups can be 'nested'). For example, a group created from significant peptides in a volcano plot can be further sectioned into a sub-group containing only nuclear proteins. Through creative use of data groups, complex relationships can be addressed and data structures created.

Managing data groups


On the 'Data Groups' tab, you can find a listing of all data groups that have been created from your analysis. The group 'All' is present by default and cannot be deleted. This contains all input phosphosites that passed your selected import filters and is the 'base' for the creation of additional groups.

Clicking 'View' will display a list of all entities in the relevant group while clicking 'delete' will remove the group from your analysis. In the event that you delete the group that is currently active, the active group is reset to the default 'All'.

Data Summary and Statistical analysis


Once the initial processing is complete, you will be redirected to the data summary page where you can begin to investigate the statistical relationships between phosphosites in your data.

Multivariate Analysis


Multivariate analysis simultaneously considers many phosphorylation sites (variables) to provide a high-level view of how your samples - and sample groups - relate to one another. Phosphomatics provides a number of different multivariate analysis options.

Cluster Map


Cluster maps are particularly useful for visualising patterns of phosphorylation site abundance changes between samples and treatment groups. Here, hierarchical clustering is performed to create groups of phosphorylation sites that share similar patterns of abundance changes. A range of plot control options are provided at right which can be used to customise the data, analysis and display settings. After editing these parameters, click 'Update' to visualise the result.

By default, phosphosite abundances are z-transformed and a mild fold-change and p-value filter is applied to exclude phosphosites that are clearly unchanged in abundance. This can help to create sharper clusters that may have biological importance. These filters can, however, be removed using the options provided at right. Distinct clusters of phosphosites that have similar patterns of abundance changes are indicated by the colour-coding along the left side of the plot. These are determined by agglomerative hierarchical clustering using distance and linkage metrics that can be customised using the options at right.

New data groups can be created from heatmap clusters. Click the button to open the group creation dialog box. Check one or more of the colour-coded circles corresponding to the clusters you wish to add to a group. Provide either a name for the new group or nominate an existing group to which these sites should be added. Then, click 'Create'.

Click the button to download the prepared plot data.

After changing any of the parameters, click the button to refresh the plot.

Parameter | Description
-Log10(p-val) threshold | Only phosphorylation sites that have greater than the specified -Log10(p-value) will be included in the analysis. For a >2 group analysis, this is derived from ANOVA. Setting this value to 0 will disable this filter.
Fold change threshold | Only phosphorylation sites that have greater than the specified log2(fold change) will be included in the analysis. For a >2 group analysis, this is the ANOVA F-statistic. Setting this value to 0 will disable this filter.
Number of clusters | Sets the number of clusters that Phosphomatics should attempt to identify in the data.
Row-wise transformation | Whether to apply a Z-transform to phosphopeptide intensities prior to clustering. Z-transformation is useful for displaying relationships between the patterns of relative change in phosphosite abundance irrespective of absolute abundance.
Linkage method | Methodology used to compute the distance between two clusters. See here for detailed information about this calculation.
Distance metric | Methodology used to compute pairwise distances between points. See here for detailed information about this calculation.
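To show why the row-wise Z-transform removes the influence of absolute abundance, the Python sketch below transforms two made-up phosphosites that change in the same way but at different intensity levels. Using the sample standard deviation, and returning zeros for a constant row, are assumptions of this example.

```python
from statistics import mean, stdev

def z_transform(row):
    """Centre a row on its mean and scale it by its standard deviation."""
    m, s = mean(row), stdev(row)
    if s == 0:
        # A constant row has no pattern to show; return zeros instead of dividing by zero.
        return [0.0 for _ in row]
    return [(v - m) / s for v in row]

low_abundance_site = [10.0, 11.0, 12.0]
high_abundance_site = [20.0, 21.0, 22.0]
print(z_transform(low_abundance_site))
print(z_transform(high_abundance_site))
```

Both rows give the same transformed profile, so a clustering step applied afterwards would group them by their pattern of change and not by how intense they are.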

Univariate Analysis


Volcano Plot


A volcano plot is useful for quickly identifying groups of phosphopeptides that have statistically and biologically significant abundance changes between two sample groups. Here, for each phosphosite, the log2(fold-change) between two groups is plotted on the x-axis vs the -Log10(p-value) for the observation on the y-axis. Points, corresponding to individual phosphosites, appearing in the upper left and upper right regions of the plot possess both a substantial fold change and a significant p-value, indicating that these are likely important sites that are related to the differences between the two sample groups.

In the example above, 'significant' phosphosites are highlighted in red while the remaining sites, which fail the p-value threshold, the fold change threshold or both, are shown in blue. Horizontal and vertical red lines indicate the positions of the set thresholds.

Parameter | Description
Comparison Group A/B | The two sample treatment groups defined above that should be considered in construction of the volcano plot.
-Log10(p-value) threshold | The -Log10(p-value) threshold required for phosphorylation sites to be considered as significant.
Fold change threshold | The log2(fold change) threshold required for phosphorylation sites to be considered as significant.
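The way the two thresholds combine can be sketched in Python. The example assumes log2-transformed abundances, a p-value already obtained from a two-sample t-test, fold change taken as group B minus group A, and default thresholds picked purely for illustration; the function name is hypothetical.

```python
import math
from statistics import mean

def classify_site(group_a, group_b, p_value, p_threshold=1.3, fc_threshold=1.0):
    """Return (log2 fold change, -log10 p, is_significant) for one phosphosite.

    group_a and group_b hold log2-transformed abundances, so the difference of
    their means is the log2 fold change. The p-value is assumed to have been
    computed elsewhere.
    """
    log2_fc = mean(group_b) - mean(group_a)
    neg_log_p = -math.log10(p_value)
    # A site is significant only if it clears both thresholds (either direction of change).
    significant = neg_log_p > p_threshold and abs(log2_fc) > fc_threshold
    return log2_fc, neg_log_p, significant

# Made-up replicate values for one site in two groups.
print(classify_site([10.0, 10.2, 9.8], [12.0, 12.1, 11.9], p_value=0.001))
```

Sites with a large fold change but a weak p-value, or the reverse, fail one of the two tests and so fall outside the highlighted regions.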

New data groups can be created from the significant phosphosites in the volcano plot. Click the button to open the group creation dialog box. Provide either a name for the new group or nominate an existing group to which these sites should be added. Then, click 'Create'.

Click the button to download the prepared plot data.

After changing any of the parameters, click the button to refresh the plot.

Feature Correlation


Feature correlation analyses seek to identify groups of features (phosphorylation sites) that have similar patterns of abundance changes between different samples. Here, for each pair of features, abundance values across all samples are used to calculate a Pearson correlation coefficient that describes the similarity in the pattern of changes. High coefficients (closer to 1) indicate that a given pair of phosphorylation sites share similar abundance changes while low scores indicate substantial divergence.

Note that producing this plot can be a computationally demanding task when the number of phosphosites to consider is large. As a result, a strict p-value filter is applied by default.
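For reference, the Pearson coefficient for a single pair of sites can be computed as in the Python sketch below. The abundance profiles are made up, and the helper is written out by hand so that the arithmetic is visible.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length abundance profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    covariance = sum((a - mx) * (b - my) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mx) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - my) ** 2 for b in y))
    return covariance / (spread_x * spread_y)

site_a = [1.0, 2.0, 3.0, 4.0]
site_b = [2.0, 4.0, 6.0, 8.0]   # same pattern as site_a, different scale
site_c = [4.0, 3.0, 2.0, 1.0]   # opposite pattern

print(pearson(site_a, site_b))
print(pearson(site_a, site_c))
```

The number of pairs grows with the square of the number of sites (10,000 sites give roughly 50 million pairs), which is why the plot becomes demanding for large datasets.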

New data groups can be created from heatmap clusters. Click the button to open the group creation dialog box. Check one or more of the colour-coded circles corresponding to the clusters you wish to add to a group. Provide either a name for the new group or nominate an existing group to which these sites should be added. Then, click 'Create'.

Click the button to download the prepared plot data.

After changing any of the parameters, click the button to refresh the plot.

Parameter | Description
-Log10(p-value) threshold | Only phosphorylation sites that have greater than the specified -Log10(p-value) will be included in the analysis. For a >2 group analysis, this is derived from ANOVA. Setting this value to 0 will disable this filter.
Fold change threshold | Only phosphorylation sites that have greater than the specified log2(fold change) will be included in the analysis. For a >2 group analysis, this is the ANOVA F-statistic. Setting this value to 0 will disable this filter.
Number of clusters | Sets the number of clusters that Phosphomatics should attempt to identify in the data.
Row-wise transformation | Whether to apply a Z-transform to phosphopeptide intensities prior to clustering. Z-transformation is useful for displaying relationships between the patterns of relative change in phosphosite abundance irrespective of absolute abundance.
Color palette | Selects a color scheme for drawing the heatmap.
Linkage method | Methodology used to compute the distance between two clusters. See here for detailed information about this calculation.
Distance metric | Methodology used to compute pairwise distances between points. See here for detailed information about this calculation.

Enrichment Analysis


Enrichment analyses (or over-representation analyses) seek to determine if a given set of proteins contains a greater number of entities with certain functional, spatial or procedural characteristics than would be expected by chance alone. As a simple example, if it is known that 10% of all proteins are located in the nucleus and, in your experimental data, 90% of differentially regulated proteins are known to locate in the nucleus, we would strongly suspect that some significant biological change has occurred to nuclear dynamics.
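The nuclear example can be put into numbers with a hypergeometric test, a common choice for over-representation analysis. The statistical test actually used by Phosphomatics is not stated here, so the Python sketch below, including its function name and the counts, is an assumption for illustration only.

```python
from math import comb

def enrichment_p_value(population, annotated, sample, hits):
    """Probability of seeing at least `hits` annotated proteins in a random sample.

    population: all proteins considered      annotated: those carrying the term
    sample:     proteins in the data group   hits:      annotated proteins in the sample
    """
    upper = min(annotated, sample)
    favourable = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return favourable / comb(population, sample)

# 1000 proteins, 10% nuclear; 20 regulated proteins, of which 18 (90%) are nuclear.
print(enrichment_p_value(population=1000, annotated=100, sample=20, hits=18))
```

Drawing 18 or more nuclear proteins by chance is vanishingly unlikely under these counts, which matches the intuition in the example above.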

New data groups can be created from phosphorylation sites of the proteins that comprise each GO term or pathway. Check the boxes in the 'Action' column of the terms that you want to include in the new group. Once all desired terms have been selected, click 'Create group from selected' and provide a name for the new group or select an existing group to which the current selection should be added.

Parameter | Description
Process | The pathway or ontological term under consideration
Entities | The number of proteins in your active data set that are elements of this process. Note that this considers proteins only - i.e. multiple phosphopeptides from the same protein are only counted once.
-log10(p-val) | The false-discovery rate-adjusted p-value for the enrichment of proteins in your active dataset versus the protein set of a given process
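The adjustment procedure behind the false-discovery rate-adjusted p-value is not named here. The Python sketch below shows the widely used Benjamini-Hochberg procedure as one possibility; treating it as the method applied by Phosphomatics would be an assumption.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])  # indices, smallest p first
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values never increase
    # as the raw p-values decrease.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

# Made-up raw p-values for four terms.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

The value reported in the table would then be the negative base-10 logarithm of each adjusted p-value.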

New data groups can be created from the significant phosphosites in given ontologies or pathways. Select the checkboxes in the 'Action' column for the terms you wish to include in the new group. Click the button to open the group creation dialog box. Provide either a name for the new group or nominate an existing group to which these sites should be added. Then, click 'Create'.

Global Upstream Kinase Analysis


Once you have conducted any desired statistical analysis and created data groups of interest, you can proceed to analyse the possible upstream kinases for your phosphorylation sites.

Kinase-Linked Cluster Map


The kinase-linked cluster map may be useful for visualising groups of kinases that are associated with substrates that vary in abundance between samples and treatment groups. Here, hierarchical clustering is performed to create groups of kinases known to phosphorylate substrates with similar patterns of abundance changes between samples.

By contrast to the substrate cluster map (wherein each pixel represents the abundance of a given phosphorylation site in a given sample), pixels in the kinase-linked cluster map represent possible upstream kinases. The abundance value for a given kinase is determined by averaging the observed intensities of all known phosphorylation substrates in the currently active data group. Note that the abundance values for a given kinase are not fixed since they are sensitive to the composition of the data group used to construct the plot.

New data groups can be created from heatmap clusters. Note that the data group created will consist of the substrates of the kinases comprising the target cluster. Click the button to open the group creation dialog box. Check one or more of the colour-coded circles corresponding to the clusters you wish to add to a group. Provide either a name for the new group or nominate an existing group to which these sites should be added. Then, click 'Create'.

Click the button to download the prepared plot data.
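The averaging step can be sketched in Python as follows. The kinase-substrate assignments, site identifiers and function name are hypothetical; real assignments would come from the kinase-substrate resources used by Phosphomatics.

```python
from statistics import mean

def kinase_abundance(kinase_substrates, site_intensities):
    """Average, per sample, the intensities of each kinase's substrates in the active group.

    kinase_substrates maps a kinase to its known substrate site IDs.
    site_intensities maps the site IDs in the active data group to per-sample values.
    """
    result = {}
    for kinase, substrates in kinase_substrates.items():
        rows = [site_intensities[s] for s in substrates if s in site_intensities]
        if rows:  # kinases with no substrate in the active group are left out
            result[kinase] = [mean(column) for column in zip(*rows)]
    return result

# Hypothetical assignments and a two-sample active data group.
kinase_substrates = {"CDK7": ["P1_S5", "P2_T7"], "AKT1": ["P3_S9"]}
active_group = {"P1_S5": [10.0, 12.0], "P2_T7": [14.0, 16.0]}
print(kinase_abundance(kinase_substrates, active_group))
```

Removing P2_T7 from the active group would change the CDK7 values, which is the sensitivity to group composition noted above.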

After changing any of the parameters, click the button to refresh the plot.

Parameter | Description
Number of clusters | Sets the number of clusters that Phosphomatics should attempt to identify in the data.
Row-wise transformation | Whether to apply a Z-transform to phosphopeptide intensities prior to clustering. Z-transformation is useful for displaying relationships between the patterns of relative change in phosphosite abundance irrespective of absolute abundance.
Linkage method | Methodology used to compute the distance between two clusters. See here for detailed information about this calculation.
Distance metric | Methodology used to compute pairwise distances between points. See here for detailed information about this calculation.
Specificity | Required specificity of kinase-substrate interactions for inclusion in the networks constructed. Note that selection of 'Protein' level specificity can lead to extremely large graphs that can take some time to calculate and draw - be patient.

KSEA


The KSEA tab implements Kinase-Substrate Enrichment Analysis as described by Casado et al. using the scripts of Wiredja et al. This analysis attempts to identify kinases that are associated with phosphorylation sites in a data set that substantially differ in abundance between two treatment groups.

The plot above shows the Z-score for enrichment of different kinases in the example data set for the control and THZ-treated groups. Kinases with Z-scores greater than 0 are more active in the control group. In this experiment, THZ is a CDK7 inhibitor and this is reflected in the greater activity of CDK7 in the control group vs the THZ-treated group.

A more detailed breakdown of the KSEA results is available in a tabular format.

New data groups can be created from the substrates of KSEA-identified kinases. First, check the box next to the kinases whose substrates you wish to add to the new group in the results view table (above). Next, click the button to open the group creation dialog box. Provide either a name for the new group or nominate an existing group to which these sites should be added. Then, click 'Create'.

Click the button to download the prepared plot data.

After changing any of the parameters, click the button to refresh the plot.

Definitions of the KSEA table columns are given below:

Parameter | Description
Log2FC | Mean Log2 fold change of substrates for each kinase
Enrichment | Mean Log2 fold change of a given kinase's substrates divided by the mean Log2 fold change of substrates for all kinases
m | Number of substrates for each kinase
Z-score | The weighted and standard deviation-normalised score for each kinase
p-val | T-test p-value for the observation
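The first four columns can be related to one another with a short Python sketch. The Z-score formula used here (difference from the overall mean, weighted by the square root of m and divided by the overall standard deviation) follows a common reading of the published KSEA method and is an assumption; the exact calculation performed by Phosphomatics is not spelled out in this documentation. The p-value is left out, and the fold changes and function name are made up.

```python
import math
from statistics import mean, stdev

def ksea_scores(all_log2fc, kinase_log2fc):
    """Summarise one kinase from the log2 fold changes of its substrates.

    all_log2fc:    log2 fold changes of every site in the dataset
    kinase_log2fc: log2 fold changes of the sites assigned to this kinase
    """
    m = len(kinase_log2fc)
    mean_substrates = mean(kinase_log2fc)
    mean_all = mean(all_log2fc)
    return {
        "Log2FC": mean_substrates,
        "Enrichment": mean_substrates / mean_all,
        "m": m,
        # Assumed form: weighted by sqrt(m), normalised by the overall standard deviation.
        "Z-score": (mean_substrates - mean_all) * math.sqrt(m) / stdev(all_log2fc),
    }

# Made-up fold changes.
all_sites = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
kinase_sites = [2.0, 3.0, 1.0, 2.0]
print(ksea_scores(all_sites, kinase_sites))
```

Because the score compares a kinase's substrates with the distribution of all sites, it only makes sense when computed against the full dataset.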

A range of parameters can be modified to refine the KSEA analysis. These are defined below:

Important: For KSEA analysis, all uploaded phosphorylation sites that have passed the selected import filters are used for analysis regardless of the currently active data group. That is, KSEA is conducted with the constituents of the 'All' data group regardless of the currently active selection. This is because KSEA computes a z-score for the enrichment of a given kinase's phosphorylation sites compared to the aggregate of all phosphorylation sites, which would become invalid as smaller data subsets were selected. This is the only case in Phosphomatics in which calculations are not performed on the currently active data group.

Parameter | Description
Group 1 / Group 2 | The two sample treatment groups defined above that should be considered for KSEA analysis.
Use NetworKIN | KSEA uses kinase-substrate relationship data from PhosphoSitePlus to link observed phosphorylation sites to upstream kinases. However, only a small fraction of kinase-substrate relationships have been identified, characterised and indexed in PhosphoSitePlus, meaning that much of the input data will be unused. NetworKIN is an upstream kinase prediction algorithm that can be used to supplement the experimental PhosphoSitePlus data used by KSEA. This can allow for greater fractions of input data to be assigned to upstream kinases and then be considered by KSEA. The use of NetworKIN predictions can be disabled by unchecking this box.
NetworKIN Threshold | Minimum NetworKIN score for a relationship to be utilised in KSEA analysis.
m Threshold | For a given upstream kinase, m is the minimum number of substrates in the input dataset that must be assigned to this kinase for it to be included in KSEA analysis.
p-val Threshold | Statistical significance threshold for a given kinase to be included in the output graphics.

Kinase Volcano


Parameter | Description
Comparison Group A/B | The two sample treatment groups defined above that should be considered in construction of the volcano plot.
-Log10(p-value) threshold | The -Log10(p-value) threshold required for phosphorylation sites to be considered as significant.
Fold change threshold | The log2(fold change) threshold required for phosphorylation sites to be considered as significant.
Specificity | Required specificity of kinase-substrate interactions for inclusion in the networks constructed. Note that selection of 'Protein' level specificity can lead to extremely large graphs that can take some time to calculate and draw - be patient.

Kinase Correlation


Parameter | Description
Number of clusters | Sets the number of clusters that Phosphomatics should attempt to identify in the data.
Row-wise transformation | Whether to apply a Z-transform to phosphopeptide intensities prior to clustering. Z-transformation is useful for displaying relationships between the patterns of relative change in phosphosite abundance irrespective of absolute abundance.
Linkage method | Methodology used to compute the distance between two clusters. See here for detailed information about this calculation.
Distance metric | Methodology used to compute pairwise distances between points. See here for detailed information about this calculation.
Specificity | Required specificity of kinase-substrate interactions for inclusion in the networks constructed. Note that selection of 'Protein' level specificity can lead to extremely large graphs that can take some time to calculate and draw - be patient.

Network


The 'Networks' tab displays specific relationships between the substrates in the active data group and known upstream kinases based on data from PhosphoSitePlus and Signor. Network diagrams represent substrates of the active data group as coloured circles and known upstream kinases as green circles. Colour-coding of substrates represents mean fold-change between two selected treatment groups.

Parameter | Description
Group 1 / Group 2 | The two sample treatment groups defined above that should be used to determine mean fold-change.
Specificity | Required specificity of kinase-substrate interactions for inclusion in the networks constructed. Note that selection of 'Protein' level specificity can lead to extremely large graphs that can take some time to calculate and draw - be patient.

Interactions


Substrate-kinase relationships can be viewed in a tabular format on the 'Interaction Details' page. Here, clicking cells corresponding to either substrates or kinases will open a window that displays protein functional information and external links drawn from UniProt. Each substrate-kinase relationship has a dedicated page summarising additional evidence for this interaction, which can be accessed by clicking the 'Investigate' button.

Parameter | Description
Specificity | Required specificity of kinase-substrate interactions for inclusion in the networks constructed.

Kinase Explorer


Substrate Quantitation


Parameter | Description
Comparison Group A/B | The two sample treatment groups defined above that should be considered in construction of the volcano plot.
Specificity | Required specificity of kinase-substrate interactions for inclusion in the networks constructed. Note that selection of 'Protein' level specificity can lead to extremely large graphs that can take some time to calculate and draw - be patient.

Sequence Analysis


Parameter | Description
Specificity | Required specificity of kinase-substrate interactions for inclusion in the networks constructed. Note that selection of 'Protein' level specificity can lead to extremely large graphs that can take some time to calculate and draw - be patient.
Type | Output type to display. 'Table' for a detailed list of phosphorylation substrate sequence windows for the selected kinase, 'Logo' for a graphical depiction of residue frequencies at each position.
Palette | Colour residues by various physicochemical properties.
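The residue frequencies that a logo depicts can be derived from the sequence windows listed in the table view. The Python sketch below uses three made-up seven-residue windows centred on the phosphorylated residue; the window length, sequences and function name are assumptions.

```python
from collections import Counter

def position_frequencies(windows):
    """Return, for each position, the fraction of windows carrying each residue."""
    n = len(windows)
    return [
        {residue: count / n for residue, count in Counter(column).items()}
        for column in zip(*windows)  # one column per window position
    ]

# Hypothetical substrate windows; the phosphorylated serine sits at index 3.
windows = ["RRASPVA", "RKLSPTA", "GRMSPEE"]
for position, frequencies in enumerate(position_frequencies(windows)):
    print(position - 3, frequencies)  # position relative to the phosphosite
```

Positions where one residue dominates, such as the proline directly after the serine in this example, are the ones that stand out in a logo.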

Substrate Explorer


Substrate Map


Multiple sites of phosphorylation on the same protein are frequently identified and quantified in global mass spectrometry experiments. While these may have differing biological functions, and are generally quantified separately, it may be useful to visualise phosphorylation site abundances on the same protein together.

The primary sequence corresponding to the protein of the selected site is displayed on the left. Residues highlighted in red correspond to phosphorylation sites in the active data group that were also identified in the uploaded data. Clicking the red boxes will update the currently selected phosphorylation site.

The plot at right shows quantitation information for each phosphorylation site on a given protein.
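
The idea of marking phosphorylation sites on a primary sequence can be sketched in a few lines. This is only an illustration under stated assumptions: the helper names `mark_sites` and `check_sites` and the example sequence are hypothetical, site positions are assumed to be 1-based, and brackets stand in for the red highlighting used in the interface.

```python
def mark_sites(sequence, sites):
    """Wrap each listed 1-based position in brackets to mimic highlighting."""
    wanted = set(sites)
    marked = []
    for pos, residue in enumerate(sequence, start=1):
        marked.append(f"[{residue}]" if pos in wanted else residue)
    return "".join(marked)


def check_sites(sequence, sites):
    """Return positions that are out of range or are not S, T or Y."""
    bad = []
    for pos in sites:
        if not 1 <= pos <= len(sequence) or sequence[pos - 1] not in "STY":
            bad.append(pos)
    return bad


# Hypothetical protein sequence and site positions.
sequence = "MASKTPYLE"
print(mark_sites(sequence, [3, 7]))
print(check_sites(sequence, [3, 4, 12]))
```

A check like `check_sites` can be useful before upload to catch sites whose position does not point at a serine, threonine or tyrosine in the sequence you expect.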

Substrate Correlations


Substrate correlation analysis helps you identify phosphorylation sites that have similar patterns of abundance changes to a selected target. Here, a correlation score is computed between the selected phosphorylation site and all other phosphorylation sites in the active data group. These scores are then ranked and the most highly correlated substrates are displayed. While phosphorylation sites that are highly correlated are not necessarily functionally related, this may provide useful leads that expand the analysis of interesting substrates.

Parameter Description
Show Top N Return and display the highest-scoring (most correlated) N substrates. Note that a maximum of 200 correlations can be returned.
Method Method used to compute correlation scores. See here for details.

The bar plot above shows the correlation scores of the top N substrates, computed using the selected method, and the scatter plot below shows the profile of abundance changes for these phosphorylation sites across the sample groups. Here, a clear relationship can be seen wherein abundance decreases in the Quant_CTRL.2 sample and increases in all three THZ samples.

New data groups can be created from the top N correlated phosphorylation sites by clicking the 'Create Group' button. After a short (2-3 s) delay, the newly created group should appear in the dropdown menu at the top right of the screen.
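
The ranking step described in this section can be sketched as follows. This is a minimal illustration, not the Phosphomatics implementation: Pearson correlation is assumed as the scoring method (the methods actually offered are listed in the 'Method' dropdown), and the function names, site labels and quantification values are hypothetical. The cap of 200 returned correlations follows the 'Show Top N' description above.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined for a constant profile; score it as 0 here.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def top_correlated(target, profiles, n=10, max_n=200):
    """Rank all other sites by correlation with the target; return the top n."""
    n = min(n, max_n)
    scores = [
        (site, pearson(profiles[target], values))
        for site, values in profiles.items()
        if site != target
    ]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:n]


# Hypothetical site labels and quantification values across four samples.
profiles = {
    "A_S10": [1.0, 2.0, 3.0, 4.0],
    "B_T22": [2.0, 4.0, 6.0, 8.0],
    "C_Y5": [4.0, 3.0, 2.0, 1.0],
    "D_S7": [1.0, 3.0, 2.0, 4.0],
}
for site, score in top_correlated("A_S10", profiles, n=2):
    print(f"{site}\t{score:.2f}")
```

Sorting in descending order places positively correlated sites first; anti-correlated sites (scores near -1) fall to the bottom of the ranking and would need a separate pass if they are of interest.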

Investigate


Each relationship between an uploaded phosphorylation site and an upstream kinase that can be drawn by Phosphomatics has a dedicated 'Investigate' page that centralises information and resources that may be drawn upon to support a given interaction. These pages can be accessed by clicking the 'Investigate' button appearing in detailed substrate-kinase interaction tables or through the right-click menu of edges connecting substrates and kinases in network diagrams.

Information presented here is drawn from a range of sources detailed below:

Parameter Description
Protein Summary Contextual information drawn from UniProt summarising the known functions of both kinase and substrate.
Interaction Evidence Specific evidence for a phosphorylation interaction between the selected substrate and kinase drawn from Signor 2.0 and PhosphoSitePlus.
Substrate Processing Information about how phosphorylation impacts the function and interactions of the selected substrate drawn from UniProt.
Substrate PTMs Known alternative PTM sites on the substrate protein drawn from UniProt.
Text Mining To reveal additional published literature that may support the selected substrate-kinase relationship, text mining of published manuscripts is implemented through iTextMine. Note that the intent of this feature is to suggest additional manuscripts that may be helpful. It is subject to the language used by the authors of each manuscript and may therefore be inaccurate in some cases. Ensure that you review suggested manuscripts for accuracy.

"Reference" columns in these tables provide PubMed ID numbers of published manuscripts supporting a given claim. Clicking these ID numbers will display abstract and citation data for the manuscript.

Tables containing reference information also contain checkboxes that can be used to select and save reference information to the 'My References' tab. To do this, check the boxes adjacent to the desired references and then click the 'Add Selected to Bibliography' button. The checkbox in the header row can be used to save all references in a given table.

My References


References saved to your bibliography can be viewed in the 'My References' tab. Clicking entries in the table at left will display abstract and citation data for a given manuscript. The search field can be used to filter results and the 'Group' or 'All' toggle switches can be used to display either only those references that were saved to the currently active data group or all saved references.

A PDF file containing abstracts and citations for saved references can be downloaded from the 'Downloads' page.

Saving Results and Returning to Your Analysis


Download result files


A wide variety of result files can be obtained for your analysis from the download center. Clicking a link should start the download. In some cases, additional parameters can be specified that determine the scope of data to be included in the downloaded file, for example the site specificity of kinase-substrate relationship tables (i.e. site-specific or protein-level) or the treatment groups to be compared.

In some cases, result files can take some time to produce, so please be patient. This applies particularly to Cytoscape network files and KSEA input files.

In general, result files are downloaded in a tab-delimited, plain-text format. Exceptions are the bibliography (PDF) and Cytoscape input files.
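
Because most result files are tab-delimited plain text, they can be read with standard tools. The following minimal sketch uses Python's built-in `csv` module; the helper name `read_result_table` and the column names in the example content are hypothetical, since the actual headers depend on the file you download.

```python
import csv
import io


def read_result_table(handle):
    """Parse a tab-delimited result file into a list of dicts keyed by header."""
    return list(csv.DictReader(handle, delimiter="\t"))


# Hypothetical file content; real column names will differ.
example = "substrate\tsite\tkinase\nP12345\tS10\tQ99999\n"
rows = read_result_table(io.StringIO(example))
for row in rows:
    print(row["substrate"], row["site"], row["kinase"])

# For a file on disk (the file name is an assumption):
# with open("results.txt", newline="") as handle:
#     rows = read_result_table(handle)
```

All values are returned as strings, so quantification columns need to be converted with `float()` before any numerical work.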

Returning to a previous analysis


The analysis summary page provides a Dataset ID which is unique to your analysis results. You can use this code to return to your analysis at a later time or share your results with collaborators by using the 'Restore Previous' form on the Phosphomatics home page.

Keep in mind, though, that analysis results are stored for only two weeks following the date of most recent access and are automatically erased thereafter.