Our physical understanding of the macroscopic world is so good that everything from bridges to aircraft can be designed and tested on a computer. There’s no need to build every possible design to find out which ones work. Molecules are a different story. “Basically, we are still doing chemistry like Thomas Edison,” says Anatole von Lilienfeld of Argonne National Laboratory in Lemont, Illinois.
The chief enemy of computer-aided chemical design is the Schrödinger equation. In principle, solving this mathematical beast gives the probability of finding the electrons in an atom or molecule at particular positions, and it is that electron distribution that gives rise to the chemical and physical properties.
But because the equation grows more complex with every electron and proton added, exact solutions exist only for the simplest systems, such as the hydrogen atom, with one electron and one proton, and the hydrogen molecular ion, with one electron and two protons.
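A rough way to see how fast the difficulty grows: a wavefunction for N electrons depends on 3N position coordinates, so simply tabulating it on a grid with k points along each coordinate takes k^(3N) numbers. The sketch below runs that naive estimate. It is an illustrative assumption only (spin is ignored, and the 10 points per axis is an arbitrary, hypothetical choice); practical quantum chemistry relies on far cleverer approximations than brute-force grids. The helper names `grid_values` and `electron_count` are hypothetical. Aspirin, C9H8O4, has 94 electrons.

```python
# Naive cost of storing an N-electron wavefunction on a grid:
# k points per coordinate, 3 coordinates per electron -> k ** (3 * N) values.
# Spin is ignored and k = 10 is an arbitrary illustrative choice.

ATOMIC_NUMBER = {"H": 1, "C": 6, "O": 8}


def grid_values(n_electrons, points_per_axis=10):
    """Numbers needed to tabulate an n-electron wavefunction on a grid."""
    return points_per_axis ** (3 * n_electrons)


def electron_count(formula):
    """Electrons in a neutral molecule given as {element: count}."""
    return sum(ATOMIC_NUMBER[el] * n for el, n in formula.items())


hydrogen_atom = grid_values(1)
aspirin_electrons = electron_count({"C": 9, "H": 8, "O": 4})  # 94
aspirin = grid_values(aspirin_electrons)

print(f"hydrogen atom: {hydrogen_atom} values")  # hydrogen atom: 1000 values
print(f"aspirin: about 10^{len(str(aspirin)) - 1} values")  # aspirin: about 10^282 values
```

Even with this coarse grid, one electron needs a thousand numbers while aspirin needs about 10^282, far more than any computer could ever hold.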
This complexity rules out exact predictions of the properties of large molecules that might be useful in engineering or medicine. “It’s out of the question to solve the Schrödinger equation to arbitrary precision for, say, aspirin,” says von Lilienfeld.
So he and his colleagues bypassed the fiendish equation entirely and turned instead to a computer-science technique.
Machine learning is already widely used to find patterns in large data sets governed by complicated underlying rules, in fields as varied as stock market analysis, ecology and Amazon’s personalised book recommendations. An algorithm is fed examples (other shoppers who bought the book you’re looking at, for instance) and uses them to predict an outcome (other books you might like). “In the same way, we learn from molecules and use them as previous examples to predict properties of new molecules,” says von Lilienfeld.
His team focused on a basic property, the atomisation energy: the energy tied up in all the bonds holding a molecule together. The team built a database of 7165 molecules whose structures and atomisation energies were already known. The computer used 1000 of these to learn which structural features predict atomisation energy.
When the researchers tested the resulting algorithm on the remaining 6165 molecules, it produced atomisation energies within 1 per cent of the true values. That is comparable to the accuracy of approximate solutions of the Schrödinger equation, which work but take longer to compute the bigger the molecule (Physical Review Letters, DOI: 10.1103/PhysRevLett.108.058301).
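The article does not say which structural features or learning method the team used. Purely as an illustration, here is a minimal Python sketch of one common setup for this kind of task, under stated assumptions: each molecule is described by a "Coulomb matrix" built from its nuclear charges and interatomic distances, and predictions come from kernel ridge regression with a Laplacian kernel. The toy diatomic "molecules", the invented target values, and the parameters `sigma` and `lam` are all hypothetical; this is not the authors' code, and the numbers it prints say nothing about the published results.

```python
import math
import random


def coulomb_matrix(charges, coords):
    """Assumed descriptor: 0.5 * Z_i**2.4 on the diagonal,
    Z_i * Z_j / |R_i - R_j| off the diagonal."""
    n = len(charges)
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                m[i][j] = 0.5 * charges[i] ** 2.4
            else:
                m[i][j] = charges[i] * charges[j] / math.dist(coords[i], coords[j])
    return m


def descriptor(charges, coords, size):
    """Order atoms by row norm so the listing order doesn't matter,
    then zero-pad to size x size and flatten to a fixed-length vector."""
    m = coulomb_matrix(charges, coords)
    order = sorted(range(len(m)), key=lambda i: -math.hypot(*m[i]))
    vec = [0.0] * (size * size)
    for a, i in enumerate(order):
        for b, j in enumerate(order):
            vec[a * size + b] = m[i][j]
    return vec


def laplacian_kernel(x, y, sigma):
    """Similarity of two feature vectors: exp(-L1 distance / sigma)."""
    return math.exp(-sum(abs(p - q) for p, q in zip(x, y)) / sigma)


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def train(xs, ys, sigma, lam):
    """Kernel ridge regression: weights alpha solve (K + lam * I) alpha = y."""
    k = [[laplacian_kernel(p, q, sigma) for q in xs] for p in xs]
    for i in range(len(xs)):
        k[i][i] += lam
    return solve(k, ys)


def predict(alpha, train_xs, x, sigma):
    """Prediction = similarity-weighted sum over the training molecules."""
    return sum(w * laplacian_kernel(t, x, sigma) for w, t in zip(alpha, train_xs))


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical diatomic "molecules"; the target is an invented smooth
    # function of structure, NOT a real atomisation energy.
    mols, targets = [], []
    for _ in range(150):
        z = [rng.choice([1, 6, 7, 8]), rng.choice([1, 6, 7, 8])]
        r = rng.uniform(0.9, 1.6)
        mols.append((z, [(0.0, 0.0, 0.0), (r, 0.0, 0.0)]))
        targets.append(-((z[0] * z[1]) ** 0.5) / r)
    xs = [descriptor(z, c, size=2) for z, c in mols]
    train_x, test_x = xs[:100], xs[100:]  # learn on some, test on the rest
    train_y, test_y = targets[:100], targets[100:]
    alpha = train(train_x, train_y, sigma=50.0, lam=1e-6)
    errs = [abs(predict(alpha, train_x, x, 50.0) - y) / abs(y) for x, y in zip(test_x, test_y)]
    print(f"mean relative error on held-out toy molecules: {sum(errs) / len(errs):.3%}")
```

The design mirrors the idea in the quote above: molecules with similar descriptors get similar predicted energies, so the training molecules act as "previous examples". Once the weights are fitted, each new prediction is just a weighted sum over the training set, which is one reason this style of model can be fast.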
The algorithm produced in a millisecond answers that would take these earlier methods an hour. “Instead of having to wait years to screen lots of new molecules, you might have to wait weeks or a month,” says Mark Tuckerman of New York University, who was not involved in the new work.
The algorithm is still mainly a proof of principle. If it can learn to predict other properties, such as how strongly a molecule binds to an enzyme, it could help with designing drugs, fuel cells, batteries or biosensors. “The applications can be as broad as chemistry,” von Lilienfeld says.