Research
The goal of the CSMP is to express, purify, and determine
the structures of representative members of membrane protein classes.
Where a class of membrane proteins is represented in prokaryotes,
the first structure of a homolog will likely be determined
for a prokaryotic or archaeal member. Many human proteins, however, have no
good homologs in prokaryotes or archaea. These include the receptors
targeted by neuro-psychopharmaceutical drugs, the reuptake pumps
targeted by the newer antidepressants, and the ~1500 GPCRs in the human
genome, which include numerous key drug targets.
Nearly 30% of all eukaryotic proteins are membrane proteins, and these
include the protein targets of over 40% of all drugs in use today. The
mechanisms and atomic interactions are not understood for any one of these.
In most cases the structures of the particular membrane protein targets of
today's drugs have not yet been determined. This is primarily because they are
membrane proteins, for which preparation in a structurally homogeneous and
functionally active state, and subsequent structure determination, have been
extremely challenging.
CSMP supports an integrated program with interdependent subprojects
and core facilities that provide for routine processes, including protein purification,
characterization, X-ray diffraction at the Advanced Light Source in Berkeley
(beamlines 5.0.2 and 8.3.1), and structure determination by electron microscopy
and NMR.
Bioinformatics supports two key aspects of the overall effort. First, it contributes
to the construction of a target list of representative proteins whose structures
are to be determined. Second, it leverages the experimentally determined
structures by structurally and functionally characterizing many more related
protein sequences. Target selection and protein structure modeling are
therefore interdependent: the better our prediction methods, the smaller
the number of experimentally determined target structures
needed for a given level of structural coverage and accuracy.
To illustrate the rich set of mechanisms of membrane proteins we include a
statistical summary here:
Figure D1. The distribution of the proteins as a function of the number of
predicted transmembrane helices in the four genomes of interest. The human
sequences were taken from the NCBI RefSeq database, the others from UniProt.
| Genome | # All proteins | # Transmembrane proteins | Fraction (%) | # Proteins with ≥3 tm helices | # Clusters (30% seq id) | # Clusters (50% seq id) |
|---|---|---|---|---|---|---|
| H. sapiens | 21,901 | 5,463 | 24.9 | 2,635 | 1,833 | 2,025 |
| P. aeruginosa | 5,565 | 1,290 | 23.1 | 816 | 732 | 789 |
| E. coli | 4,289 | 1,072 | 24.9 | 700 | 643 | 681 |
| M. jannaschii | 1,715 | 364 | 21.2 | 207 | 198 | 204 |
Table D1. Frequency of membrane proteins and their clusters in four genomes.
"# transmembrane proteins" indicates the number of proteins predicted to
contain at least one transmembrane (tm) spanning helix. Predicted transmembrane
proteins with at least 3 helices were clustered. The clustering was performed
such that all members of a cluster share at least the stated sequence
identity (30% or 50%) and differ in length by no more than 50 residues. The
number of clusters for all membrane proteins from the four genomes together is
3,273 and 3,699 at the 30% and 50% sequence identity cutoffs, respectively. The
average cluster sizes are 1.3 and 1.2 proteins, respectively. At the 30% sequence
identity cutoff, there are 2,648, 413, 125, and 88 clusters with 1, 2, 3, and
more than 3 members, respectively.
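The clustering criterion in the caption of Table D1 can be illustrated with a short sketch. The text does not say which clustering algorithm or which identity measure was used, so everything below is an assumption made for illustration: the greedy assignment order, the function names `cluster_sequences` and `naive_identity`, and in particular the ungapped position-by-position identity, which stands in for an identity computed from a real sequence alignment.

```python
from typing import Callable, List, Sequence


def naive_identity(a: str, b: str) -> float:
    """Percent identity from an ungapped, position-by-position comparison.

    Assumption for illustration only: a real pipeline would derive identity
    from a proper pairwise alignment, not from this shortcut.
    """
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / n


def cluster_sequences(
    seqs: Sequence[str],
    min_identity: float = 30.0,   # cutoff in percent (30 or 50 in Table D1)
    max_len_diff: int = 50,       # maximum length difference in residues
    identity: Callable[[str, str], float] = naive_identity,
) -> List[List[str]]:
    """Greedy clustering (hypothetical procedure, not the authors' method).

    A sequence joins the first cluster in which it satisfies both criteria
    against every existing member; otherwise it starts a new cluster.
    """
    clusters: List[List[str]] = []
    for s in seqs:
        for members in clusters:
            if all(
                abs(len(s) - len(m)) <= max_len_diff
                and identity(s, m) >= min_identity
                for m in members
            ):
                members.append(s)
                break
        else:
            # No existing cluster accepted the sequence.
            clusters.append([s])
    return clusters


if __name__ == "__main__":
    # Toy sequences, invented for the demonstration.
    toy = ["MKTLLVAGAV", "MKTLLVAGAI", "GGGGGGGGGG", "MKTLLVAGAV" + "A" * 60]
    for cutoff in (30.0, 50.0):
        sizes = [len(c) for c in cluster_sequences(toy, min_identity=cutoff)]
        print(cutoff, len(sizes), sum(sizes) / len(sizes))
```

Requiring the criteria to hold against every member of a cluster, rather than against a single representative, matches the wording that all members of a cluster share the stated identity. A stricter cutoff can only split clusters under this rule, which is consistent with the larger cluster count reported at 50% than at 30%.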
Membrane proteins that span the membrane three or more times will be expressed
and then subjected to 3-dimensional structure determination by one of several
methods. These include high-resolution X-ray crystallography, electron crystallography
of 2-dimensional crystals, single-particle electron microscopic imaging and
alignment, and nuclear magnetic resonance (NMR).