The Bürgi–Dunitz angle revisited

I recently came across Prof. Rzepa’s blog post on the Bürgi–Dunitz angle—a topic very close to my heart. Embarrassingly, I didn’t know about this concept until halfway through my PhD at ETH, despite regular interactions with Prof. Dunitz himself!

The discovery of the Bürgi–Dunitz angle marked a key moment in crystallography, organic and supramolecular chemistry, and molecular orbital analysis. I was struck by the beauty and simplicity of the idea behind the study, and I’ve often used it as an example to show students how much information can be gleaned from structural analysis. It was truly insightful how Bürgi and Dunitz were able to distill the preferred reaction trajectory from just six (!) crystal structures.

Nearly 50 years later, with over a million structures now in the CCDC database, we should be far better equipped to re-validate the theory behind the Bürgi–Dunitz angle through large-scale structural analysis. Yet if you measure the angle at which nucleophilic atoms (N, O, S, etc.) approach carbonyl groups, the most common values actually cluster around 90°, not near the expected 105°, as Prof. Rzepa clearly demonstrated. So, were Bürgi and Dunitz wrong?
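To make the geometric measurement concrete, here is a minimal Python sketch of how one might compute the Nu…C=O approach angle from Cartesian coordinates. The numpy helper and the example coordinates are purely illustrative assumptions of mine, not the actual query run against the database.

```python
import numpy as np

def approach_angle(nu, c, o):
    """Nu...C=O angle in degrees, measured at the carbonyl carbon."""
    v_nu = np.asarray(nu) - np.asarray(c)  # C -> Nu contact vector
    v_o = np.asarray(o) - np.asarray(c)    # C -> O bond vector
    cos_t = np.dot(v_nu, v_o) / (np.linalg.norm(v_nu) * np.linalg.norm(v_o))
    return np.degrees(np.arccos(np.clip(cos_t, -1.0, 1.0)))

# Hypothetical coordinates in angstroms, chosen so the contact follows
# a textbook Buergi-Dunitz-like trajectory:
c = [0.00, 0.00, 0.00]    # carbonyl carbon
o = [1.20, 0.00, 0.00]    # carbonyl oxygen
nu = [-0.82, 0.00, 2.68]  # incoming nucleophile (e.g. an amine N)

print(f"Nu...C=O angle: {approach_angle(nu, c, o):.1f} degrees")  # ~107.0
```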

After a close examination of the structural hits, I realized that the primary source of this discrepancy is the overwhelming number of structures featuring antiparallel C=O…C=O interactions. In these cases, the oxygen of one carbonyl sits directly over the carbon of another, giving an O…C=O angle of approximately 90°. This is quite fascinating: starting from an analysis of nucleophilic addition to a carbonyl group, we ended up uncovering one of the most prevalent non-covalent interactions in proteins!
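For illustration, here is a crude geometric filter, purely a sketch, for separating the two motifs in a pair of carbonyl groups. The distance cutoff and angle windows are assumptions I picked for readability, not thresholds from either blog post.

```python
import numpy as np

def angle_at(vertex, p, q):
    """Angle p-vertex-q in degrees, measured at the vertex atom."""
    v1 = np.asarray(p) - np.asarray(vertex)
    v2 = np.asarray(q) - np.asarray(vertex)
    cos_t = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return np.degrees(np.arccos(np.clip(cos_t, -1.0, 1.0)))

def classify_carbonyl_contact(c1, o1, c2, o2, cutoff=3.2):
    """Classify the contact between two carbonyls, C1=O1 and C2=O2.

    'antiparallel dipolar': both O-over-C contacts are short and the
    O1...C2=O2 angle sits near 90 degrees (the motif dominating the hits);
    'nucleophilic-like': a one-sided O1...C2 contact with an angle in the
    Buergi-Dunitz range. All thresholds here are illustrative.
    """
    d_o1_c2 = np.linalg.norm(np.asarray(o1) - np.asarray(c2))
    d_o2_c1 = np.linalg.norm(np.asarray(o2) - np.asarray(c1))
    theta = angle_at(c2, o1, o2)  # O1...C2=O2 angle at C2
    if d_o1_c2 < cutoff and d_o2_c1 < cutoff and 75 <= theta <= 105:
        return "antiparallel dipolar"
    if d_o1_c2 < cutoff and 95 <= theta <= 120:
        return "nucleophilic-like"
    return "other"
```

Requiring *both* O…C distances to be short is what distinguishes the reciprocal, stacked dipolar pairing from a genuine one-sided nucleophilic approach.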

I’ve already added my comment below Prof. Rzepa’s post. Part of the reason I wanted to write about this again here is to emphasize a crucial point: in the age of machine learning, it’s incredibly easy to gather vast amounts of data, yet it remains extremely challenging to discern the actual information or patterns within it. After all, as ML practitioners often say: “Your model will only be as good (or as bad) as the data you have.”