

Introduction
With the sequencing of the human genome essentially complete, attention
is turning to the proteins for which DNA provides the code. Whereas there
are, to the surprise of many, only some 40,000 genes, these may code
for a few hundred thousand proteins. The study of these proteins, their
structures and how they interact is the stuff of the currently fashionable
area of proteomics. Enormous resources are being deployed in the construction
of facilities such as synchrotrons to elucidate protein structure. But what
will we do with the information about protein structure? The answer is that
we will need to discover small molecules which can interfere with the action
of the proteins or the interactions between them: in particular this knowledge
will be the starting point for the rational design of drugs. However, whereas
there are a few tens of thousands of genes and a few hundred thousand proteins,
there are potentially billions of small molecules which have the qualities
prerequisite for drug-like properties: the correct range of molecular weight,
solubility, stability and absence of toxic groups. It is quite possible
to screen a set of molecules for potential activity as a drug using computational
techniques: so-called virtual screening or in silico screening. Quite simply
each candidate small molecule is tested to see if it will fit into the binding
site of the target protein, and a calculation of how strongly it will bind
is performed. The tighter the small molecule can bind to the target protein,
the better its potential as a drug or inhibitor. Strong binding leads to
low doses. This type of research is routine within pharmaceutical companies,
which may screen perhaps a million compounds in this way, including
both those they already possess in their libraries and novel virtual molecules
yet to be synthesised. Such work, however, considers only a
tiny part of ‘chemical space’, the complete variety of molecules which could,
in principle, be tested. By using massively distributed computing in
the form of screensaver technology, we have been able to involve over one
million personal computers (including several thousand in Italy), which
allows us to screen billions of potential drugs against proteins involved
in cancer.
Screensaver technology
Our project derives from the well-known SETI (Search for Extraterrestrial
Intelligence) project. In that work signals recorded from interstellar space
are distributed to personal computers via the internet. While the computer
is idle and would be running its screensaver, typically some flying toasters,
or a selection of photographs of cats, a specially created screensaver runs
calculations which test whether signals are non-random and hence, crudely
speaking, try to ascertain whether ET has phoned home. So far he has not.
In our cancer screening version the screensaver displays the target protein
and the current molecule being tested. When connected to the internet a
participant can join in by going to www.ud.com or to our departmental website
www.chem.ox.ac.uk which gives much more detail about the project. Each participant
receives a batch of 100 molecules to test. Testing involves trying all possible
shapes of the small molecule and matching binding points specified on the
target against the reciprocal binding possibilities in the small molecule.
This is the so-called ‘pharmacophore’ approach, whereby binding possibilities
such as hydrogen bond donors and acceptors are matched, or hydrophobic regions
of molecular surface are seen to complement each other. The computer program
which runs behind the screensaver is obviously constrained in terms of memory
usage and data transfer. For home users internet access is often limited
to a few minutes each day so the software has to be able to run productively
for hours without accessing the network. A central server dispatches the
jobs to the clients and receives the results. So
as to be non-intrusive, the application stops as soon as the user interacts
with his or her PC. The distributed software agent has two components: a
permanently active part which communicates with the server when necessary,
and the computational application. Figure 1 is a view of the actual screensaver.
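The pharmacophore matching described above can be illustrated with a minimal sketch. It assumes a hypothetical representation in which each binding point is a feature type plus 3D coordinates, a hypothetical complementarity table and an arbitrary tolerance of 1.0 angstrom. None of these come from the project software, and a single rigid pose stands in for the search over all shapes of the small molecule.

```python
from dataclasses import dataclass
from math import dist

# Hypothetical complementarity table: which ligand feature can pair
# with which feature on the target site.
COMPLEMENT = {
    "donor": "acceptor",
    "acceptor": "donor",
    "hydrophobic": "hydrophobic",
}

@dataclass(frozen=True)
class Feature:
    kind: str    # "donor", "acceptor" or "hydrophobic"
    xyz: tuple   # position in angstroms

def count_matches(site, ligand, tolerance=1.0):
    """Count site points that have a complementary ligand feature nearby.

    Each ligand feature is used at most once (simple greedy assignment).
    """
    unused = list(ligand)
    matched = 0
    for point in site:
        for feat in unused:
            complementary = COMPLEMENT[feat.kind] == point.kind
            if complementary and dist(feat.xyz, point.xyz) <= tolerance:
                unused.remove(feat)
                matched += 1
                break
    return matched

site = [
    Feature("acceptor", (0.0, 0.0, 0.0)),
    Feature("donor", (3.0, 0.0, 0.0)),
    Feature("hydrophobic", (0.0, 4.0, 0.0)),
]
pose = [
    Feature("donor", (0.3, 0.2, 0.0)),        # pairs with the site acceptor
    Feature("acceptor", (3.1, -0.4, 0.2)),    # pairs with the site donor
    Feature("hydrophobic", (0.0, 6.5, 0.0)),  # too far from the hydrophobic patch
]
print(count_matches(site, pose))  # 2 of the 3 site points are matched
```

In a real screen this count would be evaluated for many conformations of each molecule, and only the best-matching poses would be passed on to the scoring function.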
The database of small molecules
The starting point for the database of small drug-like molecules is the
list of molecules available in suppliers’ catalogues and in published combinatorial
chemical libraries. These lists have to be pruned to restrict attention
to those compounds which have drug-like qualities. That list can then be
expanded by constructing de novo derivatives through the exchange of groups:
for example, H could be replaced by –CH3, –OH or –Cl. In this manner a database of
3.5 billion small molecules has been created. The molecules are tested one
at a time, in batches of 100, to see how well they fit and bind
to the target site. A scoring function permits one to define molecules which
constitute a ‘hit’, a compound which binds better than a defined threshold.
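The expansion step and the grouping into batches of 100 can be sketched as follows. This is only an illustration: the scaffold is a made-up template string whose `{}` slots mark replaceable hydrogens, and the function names are hypothetical rather than taken from the project software.

```python
from itertools import islice, product

# Substituents named in the text: the original H plus three replacements.
SUBSTITUENTS = ["H", "CH3", "OH", "Cl"]

def enumerate_derivatives(scaffold, n_sites):
    """Yield every derivative of a scaffold with n_sites replaceable hydrogens.

    `scaffold` is a hypothetical template string with one "{}" per site.
    """
    for combo in product(SUBSTITUENTS, repeat=n_sites):
        yield scaffold.format(*combo)

def batches(molecules, size=100):
    """Group an iterable of molecules into lists of at most `size` items."""
    iterator = iter(molecules)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch

# A toy scaffold with four substitution sites gives 4**4 = 256 derivatives,
# which split into work units of 100, 100 and 56 molecules.
derivatives = list(enumerate_derivatives("C6H2({})({})({})({})", 4))
work_units = list(batches(derivatives))
print(len(derivatives), [len(unit) for unit in work_units])
```

With four substituents the count grows as 4 to the power of the number of sites, which gives a feel for how a catalogue of available compounds can grow into billions. The sketch does not remove symmetry-equivalent duplicates or check drug-like qualities, both of which a real database would need.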
The scoring function
Having generated complementary centres, the scoring function measures quickly
and crudely just how tight binding is likely to prove. It attempts
to predict the binding free energy ($\Delta G$) between the potential drug and the
protein according to the following formula

\[ \Delta G = \Delta G_{0} + \Delta G_{\text{H-bond}} \times N_{\text{H-bond}} + \Delta G_{\text{lipo}} \times N_{\text{lipo}} + \Delta G_{\text{rot}} \times N_{\text{rot}} + E \]

where $\Delta G_{0}$, $\Delta G_{\text{H-bond}}$, $\Delta G_{\text{lipo}}$ and $\Delta G_{\text{rot}}$ are constants (-5.48, -3.34, -0.117 and +2.56 respectively). $N_{\text{H-bond}}$ is the number of hydrogen bonds, $N_{\text{lipo}}$ the number of lipophilic interactions and $N_{\text{rot}}$ the number of frozen rotatable bonds in the ligand. This last term takes into account the loss of entropy on binding. Finally, $E$ is a calculated interaction and torsional energy derived from standard force field formulae.
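As a worked illustration, the formula can be coded directly with the constants quoted above. The function names, the example counts, the value of E and the hit threshold of zero are assumptions made for this sketch; the article does not state the threshold or the units.

```python
# Constants as quoted in the text.
DG_0 = -5.48
DG_HBOND = -3.34
DG_LIPO = -0.117
DG_ROT = 2.56

def binding_score(n_hbond, n_lipo, n_rot, energy=0.0):
    """Estimate the binding free energy from the linear scoring formula.

    n_hbond: number of hydrogen bonds
    n_lipo:  number of lipophilic interactions
    n_rot:   number of frozen rotatable bonds in the ligand
    energy:  interaction and torsional energy term E (supplied by the caller)
    """
    return (DG_0
            + DG_HBOND * n_hbond
            + DG_LIPO * n_lipo
            + DG_ROT * n_rot
            + energy)

def is_hit(score, threshold=0.0):
    """A hit scores lower (binds better) than the threshold; 0.0 is an assumed value."""
    return score < threshold

# Hypothetical ligand: 3 hydrogen bonds, 40 lipophilic interactions,
# 2 frozen rotatable bonds and E = 1.5.
score = binding_score(n_hbond=3, n_lipo=40, n_rot=2, energy=1.5)
print(round(score, 2), is_hit(score))  # -13.56 True
```

The signs make the trade-off visible: each hydrogen bond or lipophilic interaction lowers the score, while each frozen rotatable bond raises it by 2.56, reflecting the entropy lost on binding.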
Target proteins
The first five target proteins are as follows:
RAS proteins, which play a central role in cell growth (for which there appears to be no significant alternative signalling). It is conceivable that many of the inhibitors of H-RAS (for which there are crystal structures) will also inhibit N-RAS and K-RAS. In addition, there are crystal structures for Farnesyl Protein Transferase (FPT), which “activates” RAS and is consequently an alternative target. Figure 2 is an illustration of this structure.
Vascular Endothelial Growth Factor (VEGF), which mediates a critical stage in the development of cancer: the growth of blood vessels. In principle, inhibitors can be found for VEGF or the corresponding series of receptors, VEGFr-1 (also known as FLT-1) and VEGFr-2 (also known as KDR).
Superoxide dismutases (SOD), essential enzymes that remove the superoxide (O2-) radical. Inhibition of such enzymes results in cell damage by free radicals and ultimately cell death. The high levels of O2- in certain cancers (such as leukaemia cells) provide an unusual mechanism for treatment by inhibiting SOD.
Protein Tyrosine Kinases (PTK), which play a fundamental role in signal transduction pathways; deregulation of this activity is common in many cancers. The structure of insulin receptor tyrosine kinase was one of the first known and is a potential target of unknown selectivity.
BCR-ABL, a tyrosine kinase which is believed to be causally involved in chronic myeloid leukaemia.
Initial results
Figure 3 depicts the scores for hits against
the target VEGF. There are almost 45,000 hits with a negative binding score,
which indicates potentially strong binding. These lists will have to be refined
using more sophisticated calculations, and finally the most promising candidates
will have to be synthesised and tested in the usual way.
Conclusions
In the first six months of this project over one million personal computers
have joined the scheme, and a staggering 50,000 years of CPU time has been
donated. People in over 200 countries are taking part, including several
thousand in Italy, which ranks number 10 in the world list. There are even
two participants in the Holy See (extensive statistical information is available
on the United Devices website: www.ud.com). One of the most striking aspects
of the enterprise has been just how enthusiastically the general public
across the world has embraced the idea. At a time when raising public awareness
of science is perceived as a problem, the possibility of donating screensaver
time, which is essentially free, and playing a part in a real scientific
project clearly hits a nerve and commands attention. This public generosity
is likely to make this work one of the biggest computational projects ever
undertaken, and the computer power being deployed dwarfs even the biggest
supercomputers.
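A rough back-of-the-envelope calculation gives a sense of this scale. It uses only the figures quoted above and assumes, for simplicity, that the donated time was spread evenly over the six months and that exactly one million PCs took part.

```python
cpu_years_donated = 50_000      # from the text
elapsed_years = 0.5             # first six months of the project
participating_pcs = 1_000_000   # "over one million", taken as a round figure

# Number of processors that would have to run non-stop for six months
# to deliver the same amount of CPU time.
equivalent_full_time_cpus = cpu_years_donated / elapsed_years

# Average contribution per participating PC, in days.
days_per_pc = cpu_years_donated / participating_pcs * 365

print(equivalent_full_time_cpus, days_per_pc)
```

On these assumptions the donated time corresponds to 100,000 processors running continuously, while each participant contributed on average only about 18 days of otherwise idle computing.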
Graham Richards
Faculty of Chemistry, University of Oxford

