PRIDE welcomes submissions of protein and peptide identification/quantification data with the accompanying mass spectra evidence and any other related data types, usually to be published in peer-reviewed publications. The focus of PRIDE is to support the deposition of proteomics datasets from any experimental approach.
Data are currently submitted to PRIDE with the PRIDE Submission Tool (check the full documentation) or, alternatively, with Globus (check the full documentation). This page summarises the steps for submitting datasets to the PRIDE database following the ProteomeXchange guidelines. For assistance or advice, please contact [email protected].
The following also exemplifies the main steps of a submission to PRIDE, although some of the less important details may have changed.
Before submitting to the PRIDE database, you need to register an account with PRIDE. If you don’t have a PRIDE account, please create one here. Currently we don’t send out automatic emails upon successful registration. Please contact [email protected] if your login information is still not valid 24 hours after registration.
The general rule is that a dataset should correspond to the data described in a single manuscript, provided all data in the manuscript come from the same data workflow (e.g. Data Dependent Acquisition, DDA). If a manuscript contains data from different proteomics workflows (e.g. DDA and Selected Reaction Monitoring, SRM), we recommend splitting the data into separate datasets so that they are easier for third parties to interpret. However, it is ultimately the submitter’s decision how to organise their submitted datasets, which may depend on a number of factors (e.g. future publications).
There are two types of datasets submitted to PRIDE (or to any other ProteomeXchange resource):
Complete submissions: Examples include bottom-up DDA datasets where identification results were generated by any tool that can export to the mzIdentML data standard (together with the corresponding peak list files, see below).
Partial submissions: Examples include bottom-up DDA datasets where identification results were generated by a tool that cannot export the PSI data standards mzIdentML or mzTab, and datasets from approaches for which no open standard for the results has yet been implemented (e.g. top-down proteomics).
The first step to prepare your submission for PRIDE is to know which files are Mandatory, which are Recommended and which are Optional, and the benefits of providing each file type. Each submitted dataset to PRIDE MUST contain the following information (following ProteomeXchange guidelines):
Mass spectrometer output files (called ‘RAW’) (Mandatory): The RAW files are the native machine data files - Thermo .RAW, ABSCIEX .wiff, .scan, Agilent .d, Waters .raw, Bruker .yep, Bruker .baf - check the full list here. Each RAW file needs to be related to at least one SEARCH file.
mzTab or mzIdentML result files (called ‘RESULT’) (Mandatory for Complete Submissions): mzTab and mzIdentML are standard file formats supported by most analysis software tools (check the full list here). mzIdentML files contain only identification information, whereas mzTab files can contain both identification and quantification results. Each of these files needs to be related to at least one ‘PEAK’ (peak list) file.
Peptide/protein identification files (called ‘SEARCH’) (Mandatory for Partial Submissions, Optional for Complete Submissions): These are the files output by the software used to perform the data analysis - Mascot .dat, Proteome Discoverer .msf - check the full list here. Each SEARCH file needs to be related to at least one RAW file.
Peak list files (called ‘PEAK’) (Mandatory for Complete Submissions, Recommended for Partial Submissions): If mzTab or mzIdentML files are provided, the corresponding peak list files must also be provided so that the MS/MS evidence supporting the peptide/protein identifications can be checked (check the full list here).
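The requirements above can be checked before starting a submission. The following is a minimal sketch, not part of the PRIDE Submission Tool: the function name `missing_mandatory` and the `MANDATORY` table are hypothetical, and the table encodes only the Mandatory labels listed above (Recommended and Optional files are not checked).

```python
# Mandatory file types per submission type, taken from the list above.
# The names MANDATORY and missing_mandatory are illustrative assumptions.
MANDATORY = {
    "COMPLETE": {"RAW", "RESULT", "PEAK"},
    "PARTIAL": {"RAW", "SEARCH"},
}


def missing_mandatory(submission_type, present_types):
    """Return the mandatory file types that are absent, sorted by name."""
    required = MANDATORY[submission_type.upper()]
    return sorted(required - set(present_types))


# A Complete submission without peak lists is missing one mandatory type.
print(missing_mandatory("complete", ["RAW", "RESULT"]))
# A Partial submission may include extra types without any problem.
print(missing_mandatory("partial", ["RAW", "SEARCH", "PEAK"]))
```

An empty list means that every mandatory file type for that submission type is present.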
Optionally other files can be included in any dataset submission to facilitate the review process or to help reproduce the original results or to provide a better understanding of the dataset:
If you have all the files ready, the next step is to download the PRIDE Submission Tool. The tool guides users through the submission process and generates the submission.px file at the end. The submission.px file contains two types of crucial information:
Metadata: Required experimental metadata like experiment description, sample taxonomy information, instruments and protein modifications used.
Mappings between the uploaded files: for instance between the RAW files and the corresponding ‘RESULT’ or search engine output (‘SEARCH’) files.
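The file relations described on this page can be checked before building the submission. The sketch below works on plain Python data and does not read or write the real submission.px format. The function `unmapped_files`, the `REQUIRED_PARTNERS` table and the example file names are all hypothetical. It assumes that a RAW file is satisfied by a link to either a RESULT or a SEARCH file, that a RESULT file needs a PEAK file, and that a SEARCH file needs a RAW file.

```python
from collections import defaultdict

# For each file type, the partner types that satisfy its mapping requirement.
# This table is an assumption based on the relations described on this page.
REQUIRED_PARTNERS = {
    "RAW": {"RESULT", "SEARCH"},
    "RESULT": {"PEAK"},
    "SEARCH": {"RAW"},
}


def unmapped_files(file_types, links):
    """Return the files that lack a required relation.

    file_types: dict mapping file name to its type (RAW, RESULT, ...).
    links: iterable of (file_a, file_b) pairs, treated as undirected.
    """
    related = defaultdict(set)
    for a, b in links:
        related[a].add(b)
        related[b].add(a)

    problems = []
    for name, ftype in file_types.items():
        wanted = REQUIRED_PARTNERS.get(ftype)
        if wanted is None:
            continue  # e.g. PEAK files carry no requirement of their own
        partner_types = {file_types[p] for p in related[name] if p in file_types}
        if not (wanted & partner_types):
            problems.append(name)
    return sorted(problems)


files = {
    "run1.raw": "RAW",
    "run1.mzid": "RESULT",
    "run1.mgf": "PEAK",
    "run2.raw": "RAW",
}
links = [("run1.mzid", "run1.raw"), ("run1.mzid", "run1.mgf")]
print(unmapped_files(files, links))  # run2.raw has no RESULT or SEARCH partner
```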
Note: We can only accept the characters A-Za-z0-9_. (letters, digits, underscore and full stop) in file names. Please remove any other characters, including spaces and hyphens, from your file names.
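File names can be checked against this rule before upload. The following is a minimal sketch using Python's standard `re` module. The helper names `is_valid_name` and `sanitize_name` are hypothetical. Replacing each disallowed character with an underscore, rather than deleting it, is a choice made here for readability and is not a PRIDE requirement.

```python
import re

# Only letters, digits, underscore and full stop are accepted.
ALLOWED_NAME = re.compile(r"[A-Za-z0-9_.]+")


def is_valid_name(name):
    """True if every character of the file name is in the accepted set."""
    return ALLOWED_NAME.fullmatch(name) is not None


def sanitize_name(name):
    """Replace each disallowed character with an underscore (illustrative choice)."""
    return re.sub(r"[^A-Za-z0-9_.]", "_", name)


print(is_valid_name("sample_01.raw"))           # True
print(is_valid_name("sample 01 (rep-1).raw"))   # False: space, brackets, hyphen
print(sanitize_name("sample 01 (rep-1).raw"))   # sample_01__rep_1_.raw
```

After renaming, check that no two files end up with the same name, since different original names can map to the same sanitised name.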
Finally users can submit the dataset using the Aspera (by default) or FTP file transfer protocols provided by the PRIDE Submission Tool.
Submitted datasets are ‘private’ by default, which means you need to be logged in to view your data. During the submission process we create a reviewer account for your dataset, which you can include in your letter to the editor or in the manuscript itself so that it can be used during the review process. The reviewer account gives access to all of the files included in the dataset. You can access the private dataset files in two ways:
The PRIDE Archive web site is available at /pride. Registered submitters can use their personal accounts or the reviewer accounts to access and download individual datasets. A separate reviewer account is generated for every submitted dataset. Once logged in with your registered user (the e-mail address you used to register in PRIDE) or with an issued reviewer account, you will have access to the private dataset(s) listed under that account.
Once your publication is accepted, it is essential for the user to ensure that the associated dataset is made publicly accessible. Failure to do so may result in notifications from journals or fellow researchers requesting the dataset’s public availability. Furthermore, if the manuscript is designated as open access, the PRIDE team will automatically identify instances where the corresponding dataset is not public and initiate the necessary publication steps.
You can make your dataset public in two ways:
Via the PRIDE Archive web site. Once you have logged in with your user account at /pride/login, you can list all your private datasets. After selecting the dataset to be published, click the green “Publish” button to request publication. You can then provide details for your dataset and submit a web form.
If you are not the original submitter, you can contact [email protected].
Once the project is made public, a project page will be released in PRIDE and will also be available at ProteomeCentral.
Note: Exceptions to the public release policy of the datasets
Exceptions to this policy may only be granted in documented special cases, which will be considered on a case-by-case basis. If the original submitters have used, or are planning to use, the dataset that should be released in other ongoing studies, they can request a single extension of the non-released status. This extension will last a maximum of 6 months. The data owner must make an official request to PRIDE, with appropriate justification. Note that this 6-month extension does not take into account the requirements of the scientific journal where the article has been published, which may mandate that the data be released immediately anyway.
It is possible to submit multiple datasets for the same project/publication. Some of the reasons to submit multiple datasets for the same study/project/publication:
In the manuscript, the submitters can reference each independent dataset with the corresponding accession number.
During the review process of the manuscript, the user may want to modify their dataset. The following things can be modified:
For large submissions, we suggest “Globus”, an alternative tool for uploading large datasets to the PRIDE database.