Illuminating protein conformational landscapes in the PDB archive

Image to promote the superposition tool at PDBe-KB, enabling better identification of structure conformations in the PDB archive

The PDBe team is excited to announce the release of a groundbreaking scientific data pipeline that identifies protein conformational states in the Protein Data Bank (PDB). The pipeline uses a deterministic clustering approach and lets users compare the results with AlphaFold predictions, providing valuable insights into the structural variability of proteins. This development promises to enhance our understanding of protein function and to aid in exploring its biological significance.

The data pipeline works in several key steps. First, all protein chains in the PDB with 100% sequence identity are collated, and a global backbone dissimilarity score, known as the "GLOCON" score, is calculated for each protein chain. This step is deterministic: the same input always gives the same result, so no random seed is needed, and the score captures different scales of structural variation. Chains are then clustered into groups with similar GLOCON scores using UPGMA agglomerative clustering. Finally, the pipeline superposes these chains, aligning them structurally on the most similar regions of the proteins.
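The clustering step described above can be illustrated with a minimal sketch of UPGMA (average-linkage) agglomerative clustering. The chain labels, the pairwise dissimilarity values and the cutoff used to stop merging are all hypothetical: the text does not say how the pipeline decides where to cut the tree, and this is not the pipeline's own code.

```python
from itertools import combinations


def upgma_clusters(labels, dist, cutoff):
    """Average-linkage (UPGMA) clustering, stopped at a hypothetical cutoff.

    labels: chain identifiers.
    dist:   dict mapping frozenset({a, b}) to a pairwise dissimilarity.
    cutoff: merging stops once the closest pair of clusters is further apart.
    """
    clusters = [(label,) for label in labels]

    def average_distance(c1, c2):
        # UPGMA: unweighted mean of all pairwise distances between members
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Closest pair first; ties are broken by label order, so no random
        # seed is involved and the result is reproducible.
        best = min(
            combinations(clusters, 2),
            key=lambda pair: (average_distance(*pair), pair),
        )
        if average_distance(*best) > cutoff:
            break
        clusters = [c for c in clusters if c not in best]
        clusters.append(tuple(sorted(best[0] + best[1])))
        clusters.sort()
    return [list(c) for c in clusters]


# Made-up dissimilarity scores for four hypothetical chains
raw_scores = {
    ("chain_A", "chain_B"): 0.1,
    ("chain_C", "chain_D"): 0.2,
    ("chain_A", "chain_C"): 2.0,
    ("chain_A", "chain_D"): 2.2,
    ("chain_B", "chain_C"): 2.1,
    ("chain_B", "chain_D"): 2.3,
}
dist = {frozenset(pair): score for pair, score in raw_scores.items()}
labels = ["chain_A", "chain_B", "chain_C", "chain_D"]

clusters = upgma_clusters(labels, dist, cutoff=1.0)
print(clusters)
```

With these toy values, chains A and B end up in one cluster and chains C and D in another, because the average distance between the two groups is well above the chosen cutoff.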

The resulting clusters of protein conformers are available through the PDBe-KB aggregated views of proteins. For a given UniProt accession, users can access the superposed clusters by clicking the "3D view of superposed structures" button. This opens the Mol* viewer, where aligned structural conformations are displayed in ribbon format for a representative from each unique conformation. Structures are superposed for each segment (sequence region) of the UniProt entry for which structures are available, and users can toggle between segments using the "Select Segment" drop-down.
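As a small convenience, the address of the aggregated view for a given UniProt accession can be built from the path pattern used in the case studies of this article (/pdbe/pdbe-kb/proteins/ followed by the accession). The sketch below assumes the pages are served from https://www.ebi.ac.uk, which is not stated in the text, and the function name is hypothetical.

```python
# Assumed host; the article only gives the path part of the address
BASE_URL = "https://www.ebi.ac.uk"


def protein_page_url(accession):
    """Build the PDBe-KB aggregated-view address for a UniProt accession."""
    cleaned = accession.strip().upper()
    if not cleaned:
        raise ValueError("UniProt accession must not be empty")
    return f"{BASE_URL}/pdbe/pdbe-kb/proteins/{cleaned}"


# Accessions taken from the case studies in this article
for accession in ("Q8LQ68", "P23901", "Q79V61"):
    print(protein_page_url(accession))
```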

The chains corresponding to the predicted conformations are listed under the "Cluster N" headers. Users can click on the "Superposition data" button in the green header region for a detailed view of the cluster results in dendrogram format. More detailed information about the implementation and use of this pipeline is accessible via the "Documentation" button.

The data generated by this pipeline offers valuable insights into protein conformational states. Clusters represent single protein chains sharing global structural similarity, which can include minor conformational differences or large domain movements. The significance of this methodology is demonstrated through several case studies:

 

How to view this data at PDBe-KB

This short video shows you how you can use the structure superposition at the PDBe-KB aggregated views of proteins to view structural conformations.

 

Case studies

Hexokinase

Hexokinase adopts two conformations depending on its sugar-bound state. When bound to a sugar molecule, it curls inward into a closed conformation, while in the absence of a sugar molecule, it assumes an outward-facing, open conformation. The data pipeline successfully clusters the chains of hexokinase from E. coli into the open (unliganded) and closed (liganded) states. This enables researchers to explore the biological significance of all chains in either conformation.

Multiple conformations of hexokinase-6 protein structures displayed in ribbon format
Hexokinase-6 (/pdbe/pdbe-kb/proteins/Q8LQ68)

 

Aldose Reductase

Multiple ligand-bound structures of aldose reductase are available in the PDB, and they all adopt highly similar conformations. However, the clustering pipeline identifies one structurally distinct chain, which happens to be the only apo form of the protein in the PDB. This distinction provides valuable information for understanding the functional implications of the protein's conformational variability.

Multiple conformations of aldose reductase protein structures displayed in ribbon format
Aldose reductase (/pdbe/pdbe-kb/proteins/P23901)

 

KaiB

The circadian clock oscillator protein KaiB plays a crucial role in the day-night circadian rhythm of cyanobacteria. It associates with other related proteins, KaiA and KaiC, during different stages of the circadian cycle, acting as a checkpoint for circadian regulation. The data pipeline successfully distinguishes between the ground and fold-switch states of the KaiB protein, which are adopted during the day and night, respectively.

Multiple conformations of kaiB protein structures displayed in ribbon format
Circadian clock oscillator protein KaiB (/pdbe/pdbe-kb/proteins/Q79V61)

 

The release of this scientific data pipeline represents a significant step forward in our ability to analyse protein conformational states across the whole PDB archive. By giving researchers a comprehensive view of structural variability, it offers new avenues for investigating protein function and biological relevance. The PDBe team continues to work on unlocking the data hidden within the PDB, where the potential for discoveries and advances across many fields of study is immense.

For more information and access to the data pipeline, please refer to the following publication: