Improved access to ligand data and annotations in the PDBe FTP

Wed, 01/31/2024 - 12:30

PDBe has overhauled its ligand FTP data structure, making data on all types of ligands in the PDB more easily accessible. New ligand data files also give users streamlined access to all protein-ligand interaction data, combined with relevant functional annotations for ligands in the PDB archive.

Structuring Ligand Reference Class Data

PDBe recently introduced Covalently Linked Components (CLCs), a new class of reference small molecules that facilitates identification of covalently linked multi-component ligands across the PDB archive. This new reference definition expands upon, and fills the gaps between, the existing wwPDB reference dictionaries of Chemical Component Definitions (CCDs) and Peptide-like molecules Reference Dictionary (PRD).

This enhancement greatly improves the interpretation of PDB small molecule data for users. However, the organisation of this data in the PDBe FTP directory structure remained focused on CCD definitions. This necessitated a reorganisation of the FTP directory structure for ligand information at PDBe, and at the same time offered an opportunity to provide additional interaction data and annotations for all these molecules in the PDB archive.

Summarising Ligand Interactions Data:

Until now, ligand interaction data at PDBe could only be accessed one PDB ID at a time. This required querying each PDB entry separately and hindered comprehensive analysis of ligand interactions across the full archive.

To address this challenge, PDBe has introduced two new data summary files:

Interacting_chains_with_ligand_functions.tsv

This comprehensive file provides a summary of all ligand interactions across the PDB archive. It includes details about interacting macromolecule chains, along with their mapped UniProt accessions (when available). Additionally, the file provides further information about each ligand, such as its functional annotation (cofactor-like, drug-like or reactant-like) and essential identifiers (InChIKey, bmID, LigandType).

PDBID  Chain_Symmetry  BestUnpAccession  bmID  inchikey                     LigandID  LigandType  annotation
100d   A               None              bm1   PFNFFQXMRSDOHW-UHFFFAOYSA-N  SPM       CCD         None
101d   A               None              bm1   JLVVSXFLKOJNIY-UHFFFAOYSA-N  MG        CCD         ion
101d   A               None              bm2   IDBIFFKSXLYUOT-UHFFFAOYSA-N  NT        CCD         None
101d   B               None              bm1   JLVVSXFLKOJNIY-UHFFFAOYSA-N  MG        CCD         ion
101d   B               None              bm2   IDBIFFKSXLYUOT-UHFFFAOYSA-N  NT        CCD         None
101m   A               P02185            bm1   QAOWNCQODCNURD-UHFFFAOYSA-L  SO4       CCD         ion
101m   A               P02185            bm2   KABFMIBPWCXCRK-RGGAHWMASA-L  HEM       CCD         reactant-like
101m   A               P02185            bm3   FSBLVBBRXSCOKU-UHFFFAOYSA-N  NBN       CCD         None
102d   A               None              bm1   WTFXJFJYEJZMFO-UHFFFAOYSA-N  TNT       CCD         None
102d   B               None              bm1   WTFXJFJYEJZMFO-UHFFFAOYSA-N  TNT       CCD         None
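
As a rough illustration of how this file might be used, the Python sketch below indexes rows in both directions: ligands per UniProt accession, and UniProt accessions per ligand. It assumes the file is tab-delimited with the header shown above, and that a missing UniProt mapping appears as the literal text "None", as in the excerpt. The function names and the inline sample are illustrative only and are not part of any PDBe tooling.

```python
import csv
import io
from collections import defaultdict

# A few rows re-typed from the excerpt, joined with tabs
# (assumption: the real file is tab-delimited, as the .tsv extension suggests).
SAMPLE = (
    "PDBID\tChain_Symmetry\tBestUnpAccession\tbmID\tinchikey\tLigandID\tLigandType\tannotation\n"
    "101d\tA\tNone\tbm1\tJLVVSXFLKOJNIY-UHFFFAOYSA-N\tMG\tCCD\tion\n"
    "101m\tA\tP02185\tbm1\tQAOWNCQODCNURD-UHFFFAOYSA-L\tSO4\tCCD\tion\n"
    "101m\tA\tP02185\tbm2\tKABFMIBPWCXCRK-RGGAHWMASA-L\tHEM\tCCD\treactant-like\n"
    "101m\tA\tP02185\tbm3\tFSBLVBBRXSCOKU-UHFFFAOYSA-N\tNBN\tCCD\tNone\n"
)


def read_rows(handle):
    """Yield one dict per row, turning the literal text 'None' into None."""
    for row in csv.DictReader(handle, delimiter="\t"):
        yield {key: (None if value == "None" else value) for key, value in row.items()}


def ligands_by_protein(rows):
    """Map each UniProt accession to the set of ligand IDs seen with it."""
    index = defaultdict(set)
    for row in rows:
        if row["BestUnpAccession"] is not None:
            index[row["BestUnpAccession"]].add(row["LigandID"])
    return dict(index)


def proteins_by_ligand(rows):
    """Map each ligand ID to the set of UniProt accessions it is seen with."""
    index = defaultdict(set)
    for row in rows:
        if row["BestUnpAccession"] is not None:
            index[row["LigandID"]].add(row["BestUnpAccession"])
    return dict(index)


rows = list(read_rows(io.StringIO(SAMPLE)))
print(sorted(ligands_by_protein(rows)["P02185"]))  # ['HEM', 'NBN', 'SO4']
```

For the full file, the same functions could be fed an open file handle in place of the in-memory sample. Chains without a UniProt mapping (such as the nucleic acid entry 101d here) are skipped by both indexes.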

pdb_bound_molecules.tsv

During refinement and annotation, complex ligands are often fragmented into individual chemical components (CCDs), which makes them harder to identify and to map to other databases. This file gives details on each complete ligand within PDB entries, composed of covalently linked, non-polymeric entities. Each complete small molecule in a PDB entry is assigned a unique identifier (bmID), and this file defines its composition as a list of the constituent components, revealing how these chemical components (CCDs) are connected within PDB structures.

PDBID  bmID  composition(list:ResName:ResNumber:Chain_Symmetry)  inchikey                     LigandID  LigandType
100d   bm1   SPM:21:A                                            PFNFFQXMRSDOHW-UHFFFAOYSA-N  SPM       CCD
101d   bm1   MG:26:A                                             JLVVSXFLKOJNIY-UHFFFAOYSA-N  MG        CCD
101d   bm2   NT:25:B                                             IDBIFFKSXLYUOT-UHFFFAOYSA-N  NT        CCD
101m   bm1   SO4:157:A                                           QAOWNCQODCNURD-UHFFFAOYSA-L  SO4       CCD
101m   bm2   HEM:155:A                                           KABFMIBPWCXCRK-RGGAHWMASA-L  HEM       CCD
101m   bm3   NBN:156:A                                           FSBLVBBRXSCOKU-UHFFFAOYSA-N  NBN       CCD
102d   bm1   TNT:25:B                                            WTFXJFJYEJZMFO-UHFFFAOYSA-N  TNT       CCD
102l   bm1   BME:901:A,BME:902:A                                 DGVVWUTYPXICAM-UHFFFAOYSA-N  BME       CCD
102l   bm2   CL:173:A                                            VEXZGXHMUGYJMC-UHFFFAOYSA-M  CL        CCD
102l   bm3   CL:178:A                                            VEXZGXHMUGYJMC-UHFFFAOYSA-M  CL        CCD
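
The composition column packs several components into one field, so a small parser helps when working with it. The sketch below assumes the format implied by the header: comma-separated items, each of the form ResName:ResNumber:Chain_Symmetry. The `Component` type and `parse_composition` function are hypothetical names chosen for this example. Residue numbers are kept as text, since nothing in the excerpt guarantees they are always plain integers.

```python
from typing import NamedTuple


class Component(NamedTuple):
    """One constituent of a bound molecule (hypothetical helper type)."""
    res_name: str
    res_number: str       # kept as text; may not always be a plain integer
    chain_symmetry: str


def parse_composition(text):
    """Split a composition field into its ResName:ResNumber:Chain_Symmetry parts."""
    components = []
    for item in text.split(","):
        # maxsplit=2 keeps any further colons inside the last field
        res_name, res_number, chain_symmetry = item.split(":", 2)
        components.append(Component(res_name, res_number, chain_symmetry))
    return components


# The two-component ligand from entry 102l in the excerpt
parts = parse_composition("BME:901:A,BME:902:A")
print(len(parts))          # 2
print(parts[0].res_name)   # BME
```

A bound molecule with a single component, such as SPM:21:A, parses to a one-element list, so callers can treat simple and multi-component ligands uniformly.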

These data files empower researchers to:

Perform large-scale analyses of ligand interactions across the entire PDB using complete ligand representations.
Quickly access all the ligands bound to a specific protein, or identify all the proteins binding to a specific ligand.
Gain deeper understanding of protein function by relating ligand interactions to functional categories.
Easily access and navigate ligand data through unique identifiers and clear file organisation.
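
One way to combine the two files is to join them on the (PDBID, bmID) pair, which both excerpts share. The sketch below is a minimal illustration under that assumption; it works on rows already loaded as dictionaries. The short key "composition" and the function name `attach_composition` are assumptions for this example (the real header for that column is longer, as shown in the excerpt).

```python
def attach_composition(interactions, bound_molecules):
    """Add each ligand's composition to the matching interaction rows.

    Rows are matched on (PDBID, bmID); unmatched rows get composition None.
    """
    composition = {
        (row["PDBID"], row["bmID"]): row["composition"]
        for row in bound_molecules
    }
    return [
        {**row, "composition": composition.get((row["PDBID"], row["bmID"]))}
        for row in interactions
    ]


# Hand-typed rows based on the excerpts (only the columns needed here)
interactions = [
    {"PDBID": "101m", "bmID": "bm2", "BestUnpAccession": "P02185", "LigandID": "HEM"},
    {"PDBID": "101m", "bmID": "bm3", "BestUnpAccession": "P02185", "LigandID": "NBN"},
]
bound_molecules = [
    {"PDBID": "101m", "bmID": "bm2", "composition": "HEM:155:A"},
]

merged = attach_composition(interactions, bound_molecules)
print(merged[0]["composition"])  # HEM:155:A
print(merged[1]["composition"])  # None
```

A dictionary lookup keeps the join linear in the number of rows, which matters when both inputs cover the whole archive.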

Streamlining Data Access:

Beyond these new ligand interaction files, PDBe has also simplified data access for ligands by restructuring the ligand FTP directory. Ligand data is now categorised into dedicated folders for CCDs, PRDs, CLCs, and additional interaction data files. This intuitive organisation allows researchers to easily locate specific ligand information and provides consistency of access for the different ligand definition types.

Image displaying the updated directory structure for ligands data in the PDBe FTP (). Within the pdbechem_v2 directory, there are now separate directories for each of ‘ccd’, ‘prd’ and ‘clc’ reference definitions, plus a further directory for ‘additional_data’. For each reference definition, the directories are further subdivided by single characters representing the first character for CCD IDs, and the final character for PRD and CLC IDs.
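
The subdivision rule described above can be sketched as a small helper that returns the subdirectory for a given identifier. This is only an illustration of the rule as stated: the function name `shard_directory` is hypothetical, the path is relative to pdbechem_v2, and the layout beneath each single-character directory is not described here, so the sketch stops at that level. It also assumes the directory name uses the ID character exactly as written.

```python
from pathlib import PurePosixPath

ROOT = PurePosixPath("pdbechem_v2")


def shard_directory(kind, identifier):
    """Return the single-character subdirectory for a ligand definition.

    CCD IDs are grouped by their first character; PRD and CLC IDs
    by their final character, following the description in the text.
    """
    if not identifier:
        raise ValueError("identifier must not be empty")
    kind = kind.lower()
    if kind == "ccd":
        shard = identifier[0]
    elif kind in ("prd", "clc"):
        shard = identifier[-1]
    else:
        raise ValueError(f"unknown reference definition type: {kind!r}")
    return ROOT / kind / shard


print(shard_directory("ccd", "HEM"))         # pdbechem_v2/ccd/H
print(shard_directory("prd", "PRD_000001"))  # pdbechem_v2/prd/1
```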

The revamped ligand data structure will support structural bioinformatics research by providing a clear and consistent approach for access to these data. By unlocking comprehensive interaction insights and offering streamlined access, PDBe empowers researchers to delve deeper into protein function, paving the way for advancements in drug discovery and targeted therapies.

To access the data and explore the revamped ligand structure, visit the PDBe FTP area at .
