
The official announcement of EMPIAR in Nature Methods prompts Gerard Kleywegt, Head of the Protein Data Bank in Europe (PDBe), to reflect on the origins and future directions of this essential resource for raw biomolecular imaging data.
EMPIAR is the public archive for raw electron-microscopy (EM) image data and the most recent member of the family of archives that store the results and data from molecular and cellular structural biology research. It was established by PDBe in 2014, originally as a complement to EMDB (Electron Microscopy Data Bank), the single, global archive for 3DEM volume maps and tomograms. EMPIAR is part of a larger project at PDBe called “MOL2CELL”, which aims to integrate 3D structural information on different length scales (“from molecules to cells”) so that it can be used by biologists even if they are not experts in structural biology.
Nature Methods recently named EM its Method of the Year for 2015. As the EM field expands and matures rapidly, the raw image data archived in EMPIAR is very useful for software and methods development, for testing new approaches to validation, for making available data related to controversial studies, for distributing data for community challenges, and for teaching and training newcomers and students in the 3DEM field. At the same time, it provides a useful test bed to investigate the requirements and technical aspects of archiving and distributing large amounts of bioimaging data. EMPIAR took off properly in 2015 and contained 48 released entries as of March 2016. EMPIAR was designed to handle large datasets: the average size of an EMPIAR entry is ~700 GB, with the largest one currently taking up over 6 TB.
The origins of EMPIAR
As is often the case with new initiatives, the genesis of EMPIAR was not a straightforward process (although it did fortunately have a happy ending). In 2011, PDBe and the Dundee-based OME team organised a workshop. At that meeting, there was a consensus that routine deposition of raw data to EMDB was premature, but that it would be useful to develop a pilot archive of images used in single-particle cryo-EM processing and of tilt series used in electron tomography.
PDBe and OME took up this challenge and applied to a UK research council for funding for a project to develop a volume browser for 3D imaging data and to provide archiving support for soft X-ray tomography (SXT) data and for 3D scanning electron microscopy (3DSEM) reconstructions. The research council did not fund the proposal, as it was deemed “premature” (although we thought it was “visionary”) and possibly too ambitious, and the potential user base was unclear. We then decided to organise another workshop specifically to discuss archiving needs and opportunities in this area. The recommendations from this workshop were very much in line with our earlier grant application, and expanded on it. The workshop also discussed compelling use cases, which would help with future grant applications.
Figure 1 - The MOL2CELL project team at PDBe. From left to right: the developer of the EMPIAR archive (including the website and the deposition system), the project leader, the developer of the integrated structure browser (building on the volume slicer developed by his predecessor), and the developer focusing on segmentations and adding support for other experimental methods. An undergraduate trainee did a project on 3D segmentation that laid some of the groundwork for Paul’s work and that was also used in the December 2015 workshop.
Armed with the experts’ recommendations and with strong support from the community, PDBe submitted a new grant application, this time to the UK’s Medical Research Council (MRC). This application was successful, and with BBSRC co-funding, we were awarded two programmer posts for three years to work on the MOL2CELL project (of which the image archive was one component). Once the leader of the project had started the preliminary work, we had to come up with a name and decided on EMPIAR (“Electron Microscopy Pilot Image Archive”), which is fairly unique. Once two programmers had been recruited and started at EMBL-EBI, the real work began, and the rest, as they say, is history. Figure 1 shows the people who are currently working on the MOL2CELL project.
What does EMPIAR have to offer now?
The EMPIAR website provides a portal to the archived data and related services, including the FAQ list and the deposition system. The individual entry pages give more information about each entry and provide access to individual data files, with the option to download partial or complete datasets using a variety of technologies (Figure 2). Thumbnail images of various files can be viewed inside your web browser. As part of the wider MOL2CELL project, a volume-slice viewer has also been developed, which at present is offered for all EMDB entries (for example, for entry EMD-2363; this will be discussed in a separate blog soon).
Figure 2 - The EMPIAR archive of raw 3DEM image data, established by PDBe in 2014, is developing rapidly in terms of archived entries, functionality, tools, and recognition by the community. The volume slicer displayed in the bottom-left panel allows for easy visualisation of 3DEM maps and tomograms, enabling even non-specialist users to inspect very large datasets, in three different orientations. The bottom-right panel shows details of an EMPIAR entry page, including on-the-fly downloads and display of thumbnail images of individual files.
3DSEM entries: significance
On 20 January 2016, four related 3DSEM (more precisely, Serial Block-Face SEM or SBF-SEM) datasets were released, showing different stages of infection of a red blood cell by a malaria parasite (EMPIAR entries 10052, 10053, 10054 and 10055; Figure 3). These were the first experimental entries in EMPIAR that were not derived from any of the Transmission EM methods that are archived in EMDB. We expect other imaging modalities to become more and more important in the next few years (in the first instance, besides 3DSEM, also Correlative Light and Electron Microscopy, or CLEM, and Soft X-ray Tomography, or SXT), and EMPIAR is the perfect “playground” to investigate what is involved in archiving the results of these new techniques.
Figure 3 - EMPIAR entry 10055 is a 3DSEM dataset of a late schizont-stage malaria-parasite-infected red blood cell, deposited and described by
Imaging the future
The field of bioimaging is developing very rapidly, and there are several initiatives to set up appropriate infrastructure. This includes facilities to archive (some of) the data that are generated, and EMPIAR is a well-timed early exemplar of such an archive.
The MOL2CELL project extends beyond the EMPIAR archive. We will also develop a tool for annotation of segments (“regions-of-interest”) in 3D datasets, as well as a tool for visualisation and analysis of integrated structural data, from molecules and complexes (in PDB and EMDB) all the way to cells and small samples (in EMDB and EMPIAR), linked through references to other archives. To get detailed input from community experts, PDBe recently co-organised a third workshop discussing 3D segmentations (and their annotation) as well as 3D transformations.
Reference
The EMPIAR announcement was published as: A. Iudin, P.K. Korir, J. Salavert-Torres, G.J. Kleywegt & A. Patwardhan. “EMPIAR: A public archive for raw electron microscopy image data.” Nature Methods 13 (2016). See also the of 21 March 2016.
Acknowledgements
We are very grateful for the support from our funders, in particular the MRC (grant MR/L007835, co-funded by BBSRC) and BBSRC (grant BB/M018423); further support came through grant 104948. Besides the people working on the MOL2CELL project (Figure 1), many colleagues inside PDBe, EMBL-EBI and elsewhere have contributed to the development of EMPIAR. We are particularly grateful to the EM specialists in the field who have participated in our expert workshops, contributed data to EMPIAR, helped us with user-experience testing, and spread the word about EMPIAR to colleagues and journal editors.