The next 100 years of crystallography: retroactively crowdsourcing the PDB

Dear Readers,

Macromolecular crystallography is my favorite scientific technique, and likely the most important one of the past 100 years. So many important discoveries are the fruits of this technique: the structure of DNA, the structure of the ribosome, structural enzymology, the mechanism of muscle fiber contraction, and... I could keep going. This topic has already been discussed in great detail, and will continue to be this year, the 100th birthday of X-ray crystallography.

X-ray crystallography certainly deserves all of these accolades, and perhaps it is poetic justice that the next 100 years could mark its end, or at least the end of crystallography as we know it. So proclaimed Gregory Petsko during his fantastic talk at the end of Sunday evening’s session (Symposium: Celebrating 100 Years of Crystallography), and I couldn’t agree more. Petsko exclaimed, “It needs to end!”, and I found myself nodding along. On its surface, crystallography is a beautiful technique: the specimens look gorgeous, the diffraction patterns can cause grown men to weep, and protein structures are delightful to look at. A more detailed examination reveals a technique fraught with clumsiness: growing crystals is nearly a random exercise; the crystals themselves are unnatural environments for proteins, forcing them into strict conformations; looping the crystals out of their drops is unwieldy, and even the most experienced hand can lose a crystal; cryoprotection is frequently a haphazard endeavor; and radiation damage can affect the amount of data you can collect, so you had better have more than one crystal! So, Petsko says, we really need something better.

So what’s better? The X-ray free electron laser (XFEL), like the one at nearby SLAC, is a new and exciting technology that is just starting to yield interesting results (John Spence: XFELS for Imaging Molecular Dynamics). As Spence discussed, the bottleneck of growing large crystals (a nontrivial process) may soon fade, as this technology makes use of much smaller crystals. (Frequently, protein crystallographers will start out with small crystals that need to be optimized into larger crystals for conventional X-ray crystallography. The XFEL obviates this step.) Check out the new GPCR paper published last month in Nature by Cherezov et al. Their methodology also eliminates the need to handle crystals, as one can simply pass the medium that facilitates crystal growth in front of the pulsed X-rays produced by the XFEL. (William Weis also gave a great summary of the technical advances that facilitated the GPCR work during Sunday’s X-ray session.)

Eventually, Spence says, a prediction made in a paper published nearly 15 years ago (Neutze et al., Nature, 2000) will come to fruition, and single-molecule diffraction will be a real thing. This breakthrough, coupled with ever-increasing computational power and the development of new phasing techniques, will make structure determination a trivial exercise; if you can get a stable molecule, you’ll be able to solve its structure, and the PDB will contain structures of perhaps every relevant protein known to humans.

Sounds like a big data problem! The lessons of Atul Butte’s fantastic seminar on the big data problem in science (Symposium: Biophysics of Personalized Medicine) can be applied to protein crystallography. He pointed out that hundreds of thousands of hard-won data points exist in a free, publicly available database of microarray data, begging to be analyzed. His group has been working to make sense of this huge repository, which could lead to exciting new therapeutic approaches. (Butte calls this “retroactive crowdsourcing”.) Something similar may soon come to the fore in the protein world, as Petsko discussed. There are nearly one hundred thousand crystal structures freely available in the Protein Data Bank, with tremendous redundancy. What do we do with all of this data? Can we say something about the prevalence of certain folds? (Via Petsko: 56 protein folds represent 50% of all structures. Amazing.) What sort of evolutionary relationships can we deduce from this? Can this be used in a translational manner? There is no limit to the questions that we can ask here, if we simply have a way to consider all of this data at once. The PDB itself must have colossal value; Spence mentioned anecdotally, also during the X-ray session, that a carat of protein crystals is 12,000 times more valuable than a carat of diamonds. I can readily believe this, given the cost of materials, the labor, and the time it takes to get protein structures. And it’s all out there for free. I love science.

Back to big data: this isn’t just another buzzword; based on the seminars presented here at the BPS Annual Meeting, it is the present and future of biology. I think Gregory Petsko’s insight is extremely helpful for graduate students looking ahead to career opportunities: in the coming years, it won’t be useful to simply call yourself a structural biologist or a crystallographer; the techniques are increasingly accessible to researchers and don’t require years of training. Gone are the days when you can simply hang a shingle saying “Protein Crystallographer” on your door and ply your trade. Petsko says we need to be biologists again first and foremost, which is a good lesson for someone such as myself, who often focuses on the structures and gets lost in detailed discussions about the biology.

Until later,


