In 2001, the National Institutes of Health Human Genome Project and Celera will announce the full sequence of the human genome. This progress is being hailed as the beginning of a new era in biomedical research, one intended to advance both the understanding of human disease and the approaches to it. It is now the responsibility of each area of medical research to exploit this resource fully in order to understand its target diseases. In humans, the full genome sequence is distributed over 23 pairs of chromosomes and purportedly carries the code for the fewer than 100,000 proteins that represent the full hereditary blueprint of a person.
Chromosomes are composed of double strands of DNA nucleotides, each nucleotide made up of a sugar (deoxyribose), a phosphate group, and one of the four nitrogenous bases (adenine [A], thymine [T], cytosine [C], or guanine [G]). A strand of DNA is generated through bonds between the phosphate and the sugar groups that, once linked, form the outer backbone (F1). The strands are paired to their complements (A with T and C with G) by weak hydrogen bonding between the base pairs and are intertwined to form the double helix (F1). A chromosome is a single genetically specific DNA molecule surrounded by large numbers of proteins that are involved in maintaining the structure of the chromosome and regulating its expression. DNA sequences encode protein structures by having each contiguous nucleotide triplet code for a single amino acid. In addition, DNA sequences code for the start and the stop of each protein, which together define a true reading frame.
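The pairing rule described above (A with T, C with G) can be illustrated with a minimal Python sketch. The function names `complement` and `reverse_complement` are hypothetical, chosen only for this illustration. The sketch also assumes the standard convention, not stated in the text above, that the two strands run in opposite directions, so the partner strand is read in reverse.

```python
# Pairing rules from the text: A pairs with T, and C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand."""
    return "".join(PAIRS[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the partner strand read in its own direction.

    Assumption (not from the text): the two strands are antiparallel,
    so the complement is reversed before it is read.
    """
    return complement(strand)[::-1]


if __name__ == "__main__":
    print(complement("ATGC"))
    print(reverse_complement("ATGC"))
```

Because each base has exactly one partner, either strand fully determines the other, which is why a single sequence is enough to describe a double-stranded region.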
The information in DNA is converted into protein through RNA: RNA copies the code from DNA, and proteins are then assembled at the ribosome from the nucleotide triplet codons of the RNA. Even though the order of amino acids in a protein is determined by DNA by way of the RNA nucleotide code, substantial posttranslational processing of the protein takes place locally within the cell and eventually influences the function of the protein. Only a small fraction of the genome is thought to code for proteins. The known coding sequences for many proteins (exons) are not contiguous but are interrupted at varying intervals by noncoding regions (introns) within the full gene. The function of the vast noncoding regions of DNA is not known, but these regions are expected to include the DNA sequences regulating genetic expression.
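The flow from DNA to RNA to protein, including the start and stop signals that define a reading frame, can be sketched in a few lines of Python. This is a simplified illustration, not a model of the cellular machinery: the names `transcribe`, `translate`, and `CODON_TABLE` are hypothetical, the table holds only a small subset of the 64 codons of the standard genetic code, and introns, splicing, and posttranslational processing are ignored.

```python
# Small illustrative subset of the standard genetic code (RNA codons).
# A complete table has 64 entries; codons missing here are reported as "Xaa".
CODON_TABLE = {
    "AUG": "Met",  # also serves as the usual start signal
    "UUU": "Phe", "UUC": "Phe",
    "GGU": "Gly", "GGC": "Gly", "GGA": "Gly", "GGG": "Gly",
    "AAA": "Lys", "AAG": "Lys",
    "UAA": "Stop", "UAG": "Stop", "UGA": "Stop",
}


def transcribe(coding_strand: str) -> str:
    """Copy the DNA code into RNA: same sequence, with U in place of T."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> list[str]:
    """Read triplets from the first AUG until a stop codon is reached."""
    start = mrna.find("AUG")
    if start == -1:
        return []  # no start signal, so no reading frame
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        residue = CODON_TABLE.get(mrna[i:i + 3], "Xaa")
        if residue == "Stop":
            break
        protein.append(residue)
    return protein


if __name__ == "__main__":
    mrna = transcribe("CCATGTTTGGCAAATAAGG")
    print(mrna)
    print(translate(mrna))
```

In this example the two leading bases fall outside the reading frame, translation begins at the first AUG, and the UAA triplet ends the protein after four amino acids.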
Address reprint requests to Dr. Tamminga, Maryland Psychiatric Research Center, University of Maryland, P.O. Box 21247, Baltimore, MD 21228. Image courtesy of Paul Thiessen of Custom Chemical Graphics. Additional DNA graphic images created by Mr. Thiessen can be viewed at www.ChemicalGraphics.com/paul/DNA.html.
The image on the left depicts a standard CPK (Corey-Pauling-Koltun) space-filling model, colored by element, in which the flat base pairs are stacked in the center of the strand, perpendicular to the view, with the backbone making a right-handed spiral around the central, vertical axis. The image on the right shows the same atomic positions, with balls colored by element and with "sticks" conveying additional information about the nitrogenous bases (purple=adenine, green=thymine, yellow=guanine, and cyan=cytosine).