Graham Richards

Introduction
With the sequencing of the human genome essentially complete, attention is turning to the proteins for which DNA provides the code. Whereas there are, to the surprise of many, only some 40,000 genes, these may code for a few hundred thousand proteins. The study of these proteins, their structures and how they interact is the stuff of the currently fashionable area of proteomics. Enormous resources are being deployed in the construction of facilities such as synchrotrons to elucidate protein structure. But what will we do with the information about protein structure? The answer is that we will need to discover small molecules which can interfere with the action of the proteins or the interactions between them: in particular, this knowledge will be the starting point for the rational design of drugs.
However, whereas there are a few tens of thousands of genes and a few hundred thousand proteins, there are potentially billions of small molecules which have the prerequisite qualities of a drug: the correct range of molecular weight, solubility, stability and absence of toxic groups. It is quite possible to screen a set of molecules for potential activity as a drug using computational techniques: so-called virtual screening or in silico screening. Quite simply, each candidate small molecule is tested to see if it will fit into the binding site of the target protein, and a calculation of how strongly it will bind is performed. The more tightly the small molecule can bind to the target protein, the better its potential as a drug or inhibitor. Strong binding leads to low doses. This type of research is routine within pharmaceutical companies, which may screen perhaps a million compounds in this way, including both those they already possess in their libraries and novel virtual molecules yet to be synthesised.
Such work, however, considers only a tiny part of ‘chemical space’, the complete variety of molecules which could, in principle, be tested. By utilising massively distributed computing in the form of screensaver technology, we have been able to involve over one million personal computers (including several thousand in Italy), allowing us to screen billions of potential drugs against proteins involved in cancer.

Screensaver technology
Our project derives from the well-known SETI@home (Search for Extraterrestrial Intelligence) project. In that work, signals recorded from interstellar space are distributed to personal computers via the internet. While the computer is idle and would be running its screensaver, typically some flying toasters or a selection of photographs of cats, a specially created screensaver runs calculations which test whether signals are non-random and hence, crudely speaking, try to ascertain whether ET has phoned home. So far he has not. In our cancer screening version the screensaver displays the target protein and the molecule currently being tested. Anyone connected to the internet can join in by going to www.ud.com or to our departmental website, www.chem.ox.ac.uk, which gives much more detail about the project.
Each participant receives a batch of 100 molecules to test. Testing involves trying all possible shapes of the small molecule and matching binding points specified on the target against the reciprocal binding possibilities in the small molecule. This is the so-called ‘pharmacophore’ approach, whereby binding possibilities such as hydrogen bond donors and acceptors are matched, or hydrophobic regions of molecular surface are seen to complement each other.
The computer program which runs behind the screensaver is necessarily constrained in terms of memory usage and data transfer. For home users, internet access is often limited to a few minutes each day, so the software has to be able to run productively for hours without accessing the network. A central server dispatches the jobs to the clients and receives the results. So as to be non-intrusive, the application stops as soon as the user interacts with the PC. The distributed software agent has two components: a permanently active part which communicates with the server when necessary, and the computational application. Figure 1 is a view of the actual screensaver.
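The client behaviour described above (a batch of 100 molecules, long runs without network access, and stopping as soon as the user touches the machine) can be sketched roughly as follows. This is a minimal illustration in Python, not the project's actual software: `WorkUnit`, `run_while_idle`, and the `score` and `user_is_active` callables are hypothetical names introduced here, and molecules are represented simply as strings.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

BATCH_SIZE = 100  # molecules per participant batch, as stated in the text


@dataclass
class WorkUnit:
    """One batch of candidate molecules (hypothetical structure)."""
    molecules: List[str]
    results: Dict[str, float] = field(default_factory=dict)

    def pending(self) -> List[str]:
        """Molecules in this batch that have not been scored yet."""
        return [m for m in self.molecules if m not in self.results]


def run_while_idle(unit: WorkUnit,
                   score: Callable[[str], float],
                   user_is_active: Callable[[], bool]) -> bool:
    """Score pending molecules until the batch is done or the user returns.

    No network access is needed here: results accumulate locally and could
    be uploaded later by the separate, permanently active component.
    Returns True when every molecule in the batch has a result.
    """
    for molecule in unit.pending():
        if user_is_active():
            return False  # stop at once; completed results are kept
        unit.results[molecule] = score(molecule)
    return True
```

Calling `run_while_idle` again on the same `WorkUnit` resumes with the molecules that are still pending, which is one simple way to make interrupted work productive rather than wasted.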

The database of small molecules
The starting point for the database of small drug-like molecules is the list of molecules available in suppliers’ catalogues and in published combinatorial chemical libraries. These lists have to be pruned to restrict attention to those compounds which have drug-like qualities. The pruned list can then be expanded by constructing de novo derivatives in which groups are exchanged: for example, H could be replaced by –CH3 or –OH or –Cl. In this manner a database of 3.5 billion small molecules has been created. The molecules are sent out in batches of 100 and tested one at a time to see how well they fit and bind to the target site. A scoring function identifies the ‘hits’: compounds which bind better than a defined threshold.
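The expansion and batching steps can be illustrated with a short Python sketch. It is only a toy under stated assumptions: molecules are plain text labels rather than real chemical structures, and `enumerate_derivatives`, `batches`, the scaffold name and the number of substitution sites are all hypothetical. Only the exchange groups (H, –CH3, –OH, –Cl) and the batch size of 100 come from the text.

```python
from itertools import islice, product
from typing import Iterable, Iterator, List, Sequence

SUBSTITUENTS = ("H", "CH3", "OH", "Cl")  # exchange groups named in the text


def enumerate_derivatives(scaffold: str, n_sites: int,
                          groups: Sequence[str] = SUBSTITUENTS) -> Iterator[str]:
    """Yield one label for every combination of groups over the sites."""
    for combo in product(groups, repeat=n_sites):
        yield f"{scaffold}[{','.join(combo)}]"


def batches(items: Iterable[str], size: int = 100) -> Iterator[List[str]]:
    """Split a stream of molecules into lists of at most `size` items."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


# Four groups over four sites give 4**4 = 256 derivatives of one scaffold.
work = list(batches(enumerate_derivatives("scaffold", 4)))
print([len(b) for b in work])  # [100, 100, 56]
```

The count grows as the number of groups raised to the number of sites, which is how a pruned catalogue can expand into billions of candidates. At 100 molecules per batch, a database of 3.5 billion molecules corresponds to 35 million batches.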

The scoring function
Once complementary centres have been generated, the scoring function measures quickly and crudely just how tight the binding is likely to prove. It attempts to predict the binding free energy ($\Delta G$) between the potential drug and the protein according to the following formula:

$$\Delta G = \Delta G_{0} + \Delta G_{\text{H-bond}} \times N_{\text{H-bond}} + \Delta G_{\text{lipo}} \times N_{\text{lipo}} + \Delta G_{\text{rot}} \times N_{\text{rot}} + E$$

where $\Delta G_{0}$, $\Delta G_{\text{H-bond}}$, $\Delta G_{\text{lipo}}$ and $\Delta G_{\text{rot}}$ are constants (-5.48, -3.34, -0.117 and +2.56 respectively), $N_{\text{H-bond}}$ is the number of hydrogen bonds, $N_{\text{lipo}}$ the number of lipophilic interactions and $N_{\text{rot}}$ the number of frozen rotatable bonds in the ligand. This last term takes into account the loss of entropy on binding. Finally, $E$ is a calculated interaction and torsional energy derived from standard force field formulae.
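As a worked illustration, the formula can be evaluated directly once the three counts and the force-field term are known. The Python sketch below uses the four constants quoted in the text. The function name `binding_score`, its parameter names, and the example counts are assumptions made here, and the force-field energy E is taken as an input rather than computed.

```python
# Constants quoted in the text (the text does not state their units).
DG_0 = -5.48       # constant offset
DG_HBOND = -3.34   # contribution per hydrogen bond
DG_LIPO = -0.117   # contribution per lipophilic interaction
DG_ROT = 2.56      # penalty per frozen rotatable bond (entropy loss)


def binding_score(n_hbond: int, n_lipo: int, n_rot: int,
                  e_forcefield: float = 0.0) -> float:
    """Estimate the binding free energy; more negative means tighter binding."""
    return (DG_0
            + DG_HBOND * n_hbond
            + DG_LIPO * n_lipo
            + DG_ROT * n_rot
            + e_forcefield)


# Hypothetical ligand: 2 hydrogen bonds, 10 lipophilic contacts,
# 3 frozen rotatable bonds, and E set to zero for simplicity.
score = binding_score(n_hbond=2, n_lipo=10, n_rot=3)
print(round(score, 2))  # -5.65
```

In this example the two hydrogen bonds contribute -6.68 and the lipophilic contacts -1.17, while freezing three rotatable bonds costs +7.68, so flexibility in the ligand offsets much of the gain from its favourable contacts.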

Target proteins
The first five target proteins are as follows:

RAS proteins play a central role in cell growth (for which there appears to be no significant alternative signalling). It is conceivable that many of the inhibitors of H-RAS (for which there are crystal structures) will also inhibit N-RAS and K-RAS. In addition, there are crystal structures for Farnesyl Protein Transferase (FPT), which “activates” RAS and is consequently an alternative target. Figure 2 is an illustration of this structure.

Vascular Endothelial Growth Factor (VEGF) mediates a critical stage in the development of cancer: the growth of blood vessels. In principle, inhibitors can be found for VEGF or the corresponding series of receptors, VEGFr-1 (also known as FLT-1) and VEGFr-2 (also known as KDR).

Superoxide dismutases (SODs) are essential enzymes that remove the superoxide (O2-) radical. Inhibition of such enzymes results in cell damage by free radicals and ultimately cell death. The high levels of O2- in certain cancers (such as leukaemia cells) provide an unusual mechanism for treatment by inhibiting SOD.

Protein Tyrosine Kinases (PTK) play a fundamental role in signal transduction pathways and deregulation of this activity is common in many cancers. The structure of insulin receptor tyrosine kinase was one of the first known and is a potential target of unknown selectivity.

BCR-ABL is a tyrosine kinase which is believed to be causally involved in chronic myeloid leukaemia.

Initial results
Figure 3 depicts the scores for hits against the target VEGF. Almost 45,000 molecules have a negative binding score, which indicates potentially strong binding. These lists will have to be refined using more sophisticated calculations, and finally the most promising candidates will have to be synthesised and tested in the usual way.
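The first step of that refinement, picking out the hits and ranking them, might look like the following Python sketch. It is only an illustration: `select_hits` and its parameters are hypothetical names, the example scores are invented and are not project results, and a cut-off of zero is assumed because the text treats a negative score as a sign of potentially strong binding.

```python
from typing import Dict, List, Optional, Tuple


def select_hits(scores: Dict[str, float],
                threshold: float = 0.0,
                top_n: Optional[int] = None) -> List[Tuple[str, float]]:
    """Return (molecule, score) pairs scoring below the threshold,
    ordered from the most negative (tightest predicted binding) upwards."""
    hits = [(name, s) for name, s in scores.items() if s < threshold]
    hits.sort(key=lambda pair: pair[1])
    return hits if top_n is None else hits[:top_n]


# Invented scores for four hypothetical molecules.
example = {"mol_a": -3.2, "mol_b": 1.5, "mol_c": -7.9, "mol_d": 0.0}
print(select_hits(example))  # [('mol_c', -7.9), ('mol_a', -3.2)]
```

Passing a small `top_n` is one way to keep the list handed on to the more expensive calculations to a manageable size.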

Conclusions
In the first six months of this project over one million personal computers have joined the scheme, and a staggering 50,000 years of CPU time have been donated. People in over 200 countries are taking part, including several thousand in Italy, which ranks tenth in the world list. There are even two participants in the Holy See (extensive statistical information is available on the United Devices website: www.ud.com). One of the most striking aspects of the enterprise has been just how enthusiastically the general public across the world has embraced the idea. At a time when raising public awareness of science is perceived as a problem, the possibility of donating screensaver time, which is essentially free, and playing a part in a real scientific project clearly strikes a chord and commands attention. This public generosity is likely to make this work one of the biggest computational projects ever undertaken, and the computer power being deployed dwarfs even the biggest supercomputers.

Graham Richards
Faculty of Chemistry, University of Oxford