
7. What DNA Does

  • lscole
  • Apr 15, 2025
  • 5 min read

Updated: 2 days ago

The last chapter covered DNA's structure: what it is. In this chapter, we shift our focus to function: what DNA does—or, more exactly, how it works.


The Genetic Code

If we were to walk down a stretch of double-stranded DNA, we could announce the letters on one or the other of the strands as we passed: "ATGTCGGATAGATGA", for example. A code is contained in these 15 letters.


The primary purpose of the code is to make proteins, but DNA does so indirectly, using mRNA as an intermediary (the "m" stands for "messenger"). Thus, the DNA code is effectively photocopied into an mRNA that, in turn, is used to synthesize the protein. This process will be covered thoroughly in the next chapter.


For now, there are just three things we need to know related to RNA in general—and mRNA in particular.


First, like DNA, RNA is a chain of nucleotides, but with slightly different chemistry. DNA uses deoxyribonucleotides while RNA uses ribonucleotides. They differ only in their sugars. The sugar in deoxyribonucleotides, deoxyribose, lacks one small chemical attachment, a hydroxyl (-OH) group, that the ribose sugar in ribonucleotides carries.


Second, one of the four bases used in nucleotides differs between DNA and RNA. DNA uses thymine (T), while RNA uses uracil (U) in its place. So, if a stretch of DNA reads ATGTAAC, the corresponding RNA sequence would read AUGUAAC.
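The letter swap described above can be sketched in a few lines of Python. The function name dna_to_rna is made up for this illustration, and the sketch only rewrites the letters; it says nothing about how the cell actually builds the RNA.

```python
def dna_to_rna(dna: str) -> str:
    """Rewrite a DNA sequence in RNA letters: every T becomes a U."""
    return dna.upper().replace("T", "U")


# The example from the text.
print(dna_to_rna("ATGTAAC"))  # AUGUAAC
```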


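Finally, when a gene is to be expressed, it is in fact the strand opposite the coding strand (that is, the template strand) that's used to synthesize the mRNA. Because the mRNA is built by pairing bases against the template strand, it comes out complementary to the template and therefore matches the coding strand, with U in place of T. Using the template strand thus ensures that the mRNA sequence (the photocopy) matches the gene’s sequence.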
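To see why copying from the template strand reproduces the coding strand, here is a minimal Python sketch. It assumes both strands are written in the conventional 5'-to-3' direction, which is why each function reverses its input before pairing bases. The function names are invented for this example.

```python
# Base-pairing rules: DNA with DNA, and DNA template with RNA.
DNA_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}
DNA_TO_RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}


def template_strand(coding: str) -> str:
    """Return the template strand (5'->3') that pairs with a coding strand (5'->3')."""
    return "".join(DNA_PAIR[base] for base in reversed(coding))


def transcribe(template: str) -> str:
    """Return the mRNA (5'->3') built by pairing ribonucleotides against the template."""
    return "".join(DNA_TO_RNA_PAIR[base] for base in reversed(template))


coding = "ATGTCGGATAGATGA"
template = template_strand(coding)
mrna = transcribe(template)

print(template)  # TCATCTATCCGACAT
print(mrna)      # AUGUCGGAUAGAUGA
```

The mRNA reads the same as the coding strand except that every T has become a U.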


These three ideas are enough to understand how information moves from DNA to RNA. In the next chapter, we’ll see how RNA information is turned into a protein.



The genetic code: Most amino acids correspond to more than one three-letter codon. Note that the code is written with uracil (U) rather than thymine (T). This is because U is used instead of T in RNA, and when the code is actually put into use, it is "written" in RNA rather than DNA. Amino acids are listed using their shortened names (e.g., at the top left of the table, "Phe" stands for the amino acid phenylalanine).

Codons

To make a protein, the cell must know the order of amino acids. Fortunately, the order of the nucleotides in that protein's gene (and the related mRNA) conveys the order.


The gene accomplishes this using three-letter combinations of nucleotide bases, called codons. The genetic code is the set of rules that maps every possible codon to an amino acid or, as we'll see, to a stop signal.


Let’s return to our example.


The DNA sequence I just gave you, ATGTCGGATAGATGA, can also be viewed as a stretch of three-letter codons. Grouped in threes, the sequence becomes: ATG TCG GAT AGA TGA.
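Grouping a sequence into codons is easy to sketch in Python. The helper name split_into_codons is hypothetical, and the sketch assumes reading starts at the first letter; any leftover one or two letters at the end are dropped.

```python
def split_into_codons(sequence: str) -> list[str]:
    """Split a sequence into three-letter codons, starting at the first letter."""
    usable_length = len(sequence) - len(sequence) % 3  # ignore a trailing partial codon
    return [sequence[i:i + 3] for i in range(0, usable_length, 3)]


print(split_into_codons("ATGTCGGATAGATGA"))
# ['ATG', 'TCG', 'GAT', 'AGA', 'TGA']
```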


Think of a three-letter codon as a word in spoken language. Continuing the analogy, think of a gene as a sentence. A gene (sentence) is a string of codons (words) that has meaning.


For example, if we have a piece of DNA 450 bases long, that would equate to 150 codons (450 bases divided by three bases per codon). And with a 150-codon gene, the cell can make a protein that's roughly 150 amino acids long!


So DNA stores information in a chemical language—and that language is read in groups of three letters.


When our example sequence is turned into an mRNA molecule, each "T" in the DNA is replaced with a "U" (standing for the base uracil) in the mRNA. So the gene's codons in the language of mRNA would be: AUG UCG GAU AGA UGA.


The amino acid sequence can be determined directly from the codons.


We just refer to our codon chart (see figure), find the codon, and see what amino acid it specifies. The first codon in our gene is ATG, which is AUG in the mRNA. That codon specifies the amino acid methionine ("Met" in the chart).


The second amino acid in our very small protein is specified by the codon UCG. UCG corresponds to the amino acid serine (Ser). So now we know that the first two amino acids in our very small protein are Met-Ser.
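The chart lookup can be sketched in Python for the whole example. The sketch below builds the standard codon table from a compact 64-letter string of one-letter amino acid codes (with "*" marking a stop signal) and then reads the mRNA one codon at a time. The names CODON_TABLE, THREE_LETTER, and translate are choices made for this illustration, not anything defined in the chapter.

```python
from itertools import product

# Standard genetic code, listed in U, C, A, G order for each codon position.
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# One-letter codes mapped to the three-letter names used in the chart.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def translate(mrna: str) -> list[str]:
    """Read an mRNA codon by codon, stopping at the first stop signal."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop signal: no amino acid is added
            break
        protein.append(THREE_LETTER[amino_acid])
    return protein


print("-".join(translate("AUGUCGGAUAGAUGA")))  # Met-Ser-Asp-Arg
```

The first two amino acids agree with the chart lookups above, and the final codon, UGA, adds nothing to the chain.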



How Many Codons Are Enough?

This raises an interesting question: how many codons are needed to encode all of the amino acids?


Given that there are four different nucleotides and three positions in each codon, there are four to the third power (that is, 64) different possible codons. There are more codons available to the cell (64) than amino acids (20). In theory, we have too many codons.


The cell deals with this using what's referred to as "redundancy." That is, most amino acids are identified by more than one codon. For example, the amino acid alanine (Ala) is associated with four different codons: GCG, GCA, GCC and GCU.


The only difference between these four codons is the last nucleotide. If you scan the codon table, you'll notice that this is a common pattern: the first two nucleotides of a codon often determine the amino acid, while the third can vary.
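Redundancy can be checked directly with a short Python sketch. It rebuilds the standard codon table from a 64-letter string of one-letter amino acid codes ("*" marks a stop signal) and then groups codons by the amino acid they specify. The variable names are made up for this example.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code, listed in U, C, A, G order for each codon position.
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

# Group the 64 codons by the amino acid (or stop signal) they specify.
codons_for = defaultdict(list)
for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS):
    codons_for[amino_acid].append("".join(codon))

alanine_codons = codons_for["A"]  # "A" is the one-letter code for alanine
print(alanine_codons)                            # ['GCU', 'GCC', 'GCA', 'GCG']
print({codon[:2] for codon in alanine_codons})   # {'GC'}: same first two letters

# How many codons does each amino acid get?
for amino_acid, codons in sorted(codons_for.items()):
    print(amino_acid, len(codons))
```

In the printed counts, only methionine (M) and tryptophan (W) have a single codon; every other amino acid has at least two.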


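Why do we need redundancy? In theory, we don't. The problem is that two-letter codons would only generate four squared, or 16, different codons, which is not enough to identify all 20 amino acids. Three letters is the shortest codon length that provides enough combinations, and it leaves codons to spare.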
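The arithmetic behind this argument can be sketched in Python. The constants and the helper name shortest_codon_length are chosen for this illustration only.

```python
from itertools import count

NUM_BASES = 4
NUM_AMINO_ACIDS = 20


def shortest_codon_length(num_bases: int, num_needed: int) -> int:
    """Return the smallest codon length that offers at least num_needed combinations."""
    return next(n for n in count(1) if num_bases ** n >= num_needed)


for length in (1, 2, 3):
    print(length, "letter(s):", NUM_BASES ** length, "possible codons")
# 1 letter(s): 4 possible codons
# 2 letter(s): 16 possible codons
# 3 letter(s): 64 possible codons

print(shortest_codon_length(NUM_BASES, NUM_AMINO_ACIDS))  # 3
```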


Identifying a Gene

So far, we’ve treated DNA as if it were one long continuous message. But in reality, only certain stretches, called genes, are used to make proteins. In fact, roughly 98.5% of the human genome does not code for proteins!


So how does the cell identify genes contained in long stretches of nucleotides?


It turns out that many factors combine to identify the start of a gene.


These factors include: (1) the presence of CpG islands (regions enriched in CG dinucleotides), (2) chromatin state (that is, modifications to certain proteins associated with the genome that we'll learn about soon), and (3) distant enhancer sequences located thousands to millions of bases away from the gene.


But there are sequences both within the gene and just in front of it that also play roles. I'll mention three. First, near the start of genes (that is, near the transcription start site) are promoter sequences. These sequences attract specific transcription-related proteins (i.e., the transcription machinery), which then assemble near the promoter. This occurs just prior to transcription.


In addition, certain codons identify where protein coding begins and others identify where it ends. The protein-coding portion of most genes begins with a start codon, usually ATG (AUG in RNA-speak). This start codon is indicated in blue font in the codon table.


The start codon also codes for the amino acid methionine (Met). So the first amino acid in many proteins is methionine, although in some cases it will be removed after the protein has been synthesized.


Genes also have stop codons that mark where the protein-coding portion ends. There are three stop codons: UAA, UAG, and UGA. When the ribosome encounters a stop codon on an mRNA, protein synthesis ends.
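Here is a simplified Python sketch of how start and stop codons bracket the protein-coding portion of an mRNA. It assumes coding begins at the first AUG and ends at the first stop codon in the same reading frame; as noted above, real cells rely on many more signals than this. The function name and the example sequence (our 15-letter message with a few extra letters on each side) are invented for illustration.

```python
from typing import Optional

START_CODON = "AUG"
STOP_CODONS = {"UAA", "UAG", "UGA"}


def find_coding_region(mrna: str) -> Optional[str]:
    """Return the stretch from the first AUG through the first in-frame stop codon."""
    start = mrna.find(START_CODON)
    if start == -1:
        return None  # no start codon at all
    for i in range(start, len(mrna) - 2, 3):  # step one codon at a time
        if mrna[i:i + 3] in STOP_CODONS:
            return mrna[start:i + 3]
    return None  # a start codon but no in-frame stop codon


# Hypothetical mRNA: extra letters before the start codon and after the stop codon.
print(find_coding_region("GGCAUGUCGGAUAGAUGACCU"))  # AUGUCGGAUAGAUGA
```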


So far, we’ve focused on the code itself: how DNA stores instructions for building proteins. But a code is only useful if it can be read and executed. In the next chapter, we’ll follow the flow of information from DNA to RNA to protein. This idea was put forth by Francis Crick of Watson and Crick fame. He called it the "central dogma" of molecular biology.

 
 
 


L. Scott Cole

Berkeley, CA
