
Sequence-level mapping data generated by the SIFTS process can now be retrieved directly from updated mmCIF files, available on PDBe entry pages. These files now contain residue-level mapping between UniProt and PDB entries, including coordinate numbering for UniProt accessions and mapping to a host of other protein resources.
Adding value to PDB data
PDBe’s release process creates ‘updated’ PDBx/mmCIF files from PDB archive files. These files contain remapped enumerations and additional information, and yield more consistent, standardised metadata, without altering core PDB information such as atomic coordinates and experimental data. The updated PDBx/mmCIF files have now been further enriched by the addition of three new ‘SIFTS-specific’ categories, providing improved mapping between structure and sequence data.
SIFTS provides residue-level mapping between UniProt and PDB entries, while also incorporating annotation from a number of other resources. The information is updated and released every Wednesday, concurrently with the release of new PDB entries, and can be obtained from the PDBe REST API or from PDB entry XML files in the FTP area. Following this update to the updated PDBx/mmCIF files produced by PDBe, SIFTS data can also be retrieved directly from these files, which are available from PDBe entry pages.
New SIFTS data categories
Of the three new categories added to incorporate SIFTS data in mmCIF files, two hold segment-related annotations, which map PDB residue ranges to UniProt accessions (_pdbx_sifts_unp_segments) and to other external resources (_pdbx_sifts_xref_db_segments). The UniProt mapping is given for the canonical accession and all other isoforms, and includes information on the best mapping and the sequence identity. The external resource mappings provide information on Pfam, SCOP2 and CATH mappings. The third new category (_pdbx_sifts_xref_db) contains residue-level cross-references to the best mapped UniProt accession and to other external databases. It therefore serves as a single reference category for all annotations from external databases associated with any given residue.
The _pdbx_sifts_xref_db category for PDB entry 4daj, describing residue-level cross-references to external databases. The items specific to the UniProt database and to other cross-reference databases are boxed in blue and pink respectively. The updated mmCIF file for this entry can be downloaded from its PDBe entry page.
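As a rough illustration of how the segment-level categories described above might be consumed, the following Python sketch reads a _pdbx_sifts_unp_segments loop and keeps only the segments flagged as the best mapping. The item names (unp_acc, seq_id_start, best_mapping, identity and so on), the 'Y'/'N' flag values and the accessions in the excerpt are assumptions made for illustration, not taken from a real entry; they should be checked against the PDBx/mmCIF dictionary. The helper read_loop is hypothetical and deliberately simplified, so a dedicated mmCIF parser is the better choice for real files.

```python
# Hypothetical excerpt: item names and values are illustrative assumptions.
SAMPLE_CIF = """\
data_XXXX
#
loop_
_pdbx_sifts_unp_segments.entity_id
_pdbx_sifts_unp_segments.asym_id
_pdbx_sifts_unp_segments.unp_acc
_pdbx_sifts_unp_segments.seq_id_start
_pdbx_sifts_unp_segments.seq_id_end
_pdbx_sifts_unp_segments.unp_start
_pdbx_sifts_unp_segments.unp_end
_pdbx_sifts_unp_segments.best_mapping
_pdbx_sifts_unp_segments.identity
1 A P00001   1 100 21 120 Y 1.000
1 A P00001-2 1 100 21 120 N 0.980
#
"""


def read_loop(cif_text, category):
    """Return the rows of one loop_ category as a list of dicts.

    Simplified: assumes whitespace-separated values with no quoted strings
    or multi-line text fields, both of which real mmCIF files may contain.
    """
    prefix = category + "."
    names, rows = [], []
    in_loop = False
    for raw in cif_text.splitlines():
        line = raw.strip()
        if line == "loop_":
            if names:  # the target loop has already been read
                break
            in_loop = True
            continue
        if in_loop and line.startswith(prefix):
            names.append(line[len(prefix):])  # collect item (column) names
            continue
        if names:
            if not line or line.startswith(("#", "_")):
                break  # end of the target loop
            values = line.split()
            if len(values) == len(names):
                rows.append(dict(zip(names, values)))
    return rows


segments = read_loop(SAMPLE_CIF, "_pdbx_sifts_unp_segments")
best_segments = [s for s in segments if s["best_mapping"] == "Y"]

for seg in best_segments:
    print(
        f"chain {seg['asym_id']} residues {seg['seq_id_start']}-{seg['seq_id_end']} "
        f"map to {seg['unp_acc']} {seg['unp_start']}-{seg['unp_end']} "
        f"(identity {seg['identity']})"
    )
```

The same helper could in principle be pointed at _pdbx_sifts_xref_db_segments or _pdbx_sifts_xref_db by changing the category name, since the trailing dot in the prefix keeps the three similarly named categories apart.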
In addition to these new categories, the updated PDBx/mmCIF files also contain a modified _atom_site (coordinates) category that includes UniProt mapping information. The added items hold the accession, residue identity and residue number of the corresponding best mapped UniProt accession. Additional details about all the items in these new categories can be found in the PDBx/mmCIF Development Version. The updated mmCIF format file can be downloaded from a PDB entry page or from PDBe search results.
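To make the _atom_site extension more concrete, here is a small Python sketch that builds a lookup from PDB residue to UniProt residue number using the per-atom mapping items. The item names (pdbx_sifts_xref_db_name, pdbx_sifts_xref_db_acc, pdbx_sifts_xref_db_num, pdbx_sifts_xref_db_res), the 'UNP' database label, the '?' placeholder for unmapped residues and the accession shown are all assumptions for illustration. The excerpt is invented and omits the coordinate columns for brevity.

```python
# Hypothetical _atom_site excerpt: item names and values are assumptions.
ATOM_SITE = """\
loop_
_atom_site.group_PDB
_atom_site.label_atom_id
_atom_site.label_comp_id
_atom_site.label_asym_id
_atom_site.label_seq_id
_atom_site.pdbx_sifts_xref_db_name
_atom_site.pdbx_sifts_xref_db_acc
_atom_site.pdbx_sifts_xref_db_num
_atom_site.pdbx_sifts_xref_db_res
ATOM N  MET A 1 UNP P00001 21 M
ATOM CA MET A 1 UNP P00001 21 M
ATOM N  GLY A 2 UNP P00001 22 G
ATOM N  HIS A 3 ?   ?      ?  ?
"""

lines = [ln.strip() for ln in ATOM_SITE.splitlines() if ln.strip()]

# Column names are the part of each item name after the category prefix.
columns = [ln.split(".", 1)[1] for ln in lines if ln.startswith("_atom_site.")]

# Simplified tokenising: assumes no quoted values in the atom records.
atoms = [
    dict(zip(columns, ln.split()))
    for ln in lines
    if ln.startswith(("ATOM", "HETATM"))
]

# Map (chain, residue index) to (UniProt accession, UniProt residue number).
residue_to_uniprot = {}
for atom in atoms:
    if atom["pdbx_sifts_xref_db_name"] != "UNP":
        continue  # residue has no UniProt mapping in this sketch
    key = (atom["label_asym_id"], int(atom["label_seq_id"]))
    residue_to_uniprot[key] = (
        atom["pdbx_sifts_xref_db_acc"],
        int(atom["pdbx_sifts_xref_db_num"]),
    )

print(residue_to_uniprot)
```

Because the mapping is stored on every atom record, all atoms of one residue carry the same UniProt values, so keying the dictionary on chain and residue index collapses them to one entry per residue.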
Aside from enriching the metadata in mmCIF files, the addition of SIFTS annotation serves as a basis for further developments. In the near future, we will update our 3D visualisation tool Mol* to read this information and display these annotations directly on the 3D protein structures, enabling us to optimise 3D visualisation on our PDBe and PDBe-KB web pages. The updated PDBx/mmCIF files also make it easier to add the information needed to support protein superimposition functionality in Mol*.