7. What DNA Does (1,130)

  • lscole
  • Apr 15, 2025
  • 5 min read

Updated: Apr 30

How does a string of letters--really, a string of molecules--actually do anything? In this chapter, we shift our focus to function: how DNA works.


The Genetic Code

If we were to walk down a stretch of double-stranded DNA, we could announce the letters on one or the other of the strands as we passed: "ATGTCGGATAGATGA", for example. These 15 letters contain a code.


Every protein in your body--from enzymes to muscle fibers--starts as a sequence like this.


A primary purpose of the code is to make proteins--but DNA does so indirectly, using mRNA as an intermediary (the "m" stands for "messenger"). Thus, the DNA code will effectively be copied into an mRNA that, in turn, will be used to synthesize the protein. This process will be covered in the next chapter.


For now, two things matter about RNA.


First, like DNA, RNA is a chain of nucleotides--but with slightly different chemistry. DNA uses deoxyribonucleotides while RNA uses ribonucleotides. They differ by one small chemical group: the sugar in a ribonucleotide carries a hydroxyl (-OH) group that the sugar in a deoxyribonucleotide lacks.


Second, one of the four bases used in nucleotides differs between DNA and RNA. DNA uses the base thymine (T), while RNA uses uracil (U) in its place. So, if a stretch of DNA reads ATGTAAC, the corresponding RNA sequence would read AUGUAAC.
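The letter swap can be sketched in a few lines of Python. This only illustrates the T-to-U substitution described above. The function name `dna_to_rna` is made up for this example, and it assumes the DNA string is the strand whose letters match the RNA.

```python
def dna_to_rna(dna: str) -> str:
    """Return the RNA spelling of a DNA sequence by swapping T for U."""
    return dna.upper().replace("T", "U")


print(dna_to_rna("ATGTAAC"))  # AUGUAAC
```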




Codons

To make a protein, the cell must know the order of its amino acids. Fortunately, the order of the nucleotides in a protein's gene (and its related mRNA) conveys that order.


It accomplishes this using a code that assigns each possible three-letter combination of nucleotide bases (called a codon) to a specific amino acid or, in a few cases, to a stop signal. This is the genetic code.


There’s no central interpreter reading this code. Each molecule interacts based on shape--and yet an accurate translation emerges.


Returning to our example, how do we interpret the sequence ATGTCGGATAGATGA?


By grouping. That same sequence can also be viewed as a stretch of three-letter codons. Grouped in threes, the sequence becomes: ATG TCG GAT AGA TGA.
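The grouping step is simple enough to write down as code. The sketch below assumes reading starts at the first letter and drops any leftover letters at the end; `split_into_codons` is a hypothetical helper name, not something from the chapter.

```python
def split_into_codons(sequence: str) -> list[str]:
    """Split a sequence into consecutive three-letter codons.

    Any leftover letters at the end (fewer than three) are dropped.
    """
    usable = len(sequence) - len(sequence) % 3
    return [sequence[i:i + 3] for i in range(0, usable, 3)]


print(" ".join(split_into_codons("ATGTCGGATAGATGA")))  # ATG TCG GAT AGA TGA
```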


Think of a three-letter codon as a word in spoken language. Continuing the analogy, think of a gene as a sentence. A gene (sentence) is a string of codons (words) that has meaning.


For example, if we have a piece of DNA 450 bases long, that would equate to 150 codons (450 bases divided by three bases per codon). And with a 150-codon gene, the cell can make a protein about 150 amino acids long (149 if the final codon is a stop signal, which we'll meet later in this chapter)!


So DNA stores information in a chemical language. That is, it looks like a language. But it's read without a reader.


Just as DNA’s structure makes copying possible, its sequence makes it readable.


When this sequence is turned into an mRNA molecule, the "T" in DNA will be replaced with a "U" (uracil) in the mRNA. So the gene's codons in the language of mRNA would be: AUG UCG GAU AGA UGA.


These triplets aren't just letters--they're instructions.


Each codon corresponds to a specific amino acid.


The first codon in our gene is ATG (AUG in the mRNA). That codon specifies the amino acid methionine (Met in the chart). So the first amino acid in this protein is methionine.


The second amino acid in our protein is specified by the codon UCG. UCG corresponds to the amino acid serine (Ser). Now we know that the first two amino acids in our very small protein are Met-Ser.
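Here is a minimal sketch of that lookup for the whole example sequence. The dictionary holds only the five codons in our example, not the full 64-entry code, and the names `CODON_TO_AMINO_ACID` and `translate` are made up for illustration. The entries for GAU (aspartate, Asp) and AGA (arginine, Arg) come from the standard codon table rather than from the text above.

```python
# Partial codon table: only the five codons in the running example.
# The full genetic code has 64 entries.
CODON_TO_AMINO_ACID = {
    "AUG": "Met",
    "UCG": "Ser",
    "GAU": "Asp",
    "AGA": "Arg",
    "UGA": "Stop",
}


def translate(dna: str) -> list[str]:
    """Rewrite DNA letters as mRNA letters, then read codon by codon."""
    mrna = dna.upper().replace("T", "U")
    amino_acids = []
    for i in range(0, len(mrna) - 2, 3):
        residue = CODON_TO_AMINO_ACID[mrna[i:i + 3]]
        if residue == "Stop":
            break  # a stop codon ends the protein
        amino_acids.append(residue)
    return amino_acids


print("-".join(translate("ATGTCGGATAGATGA")))  # Met-Ser-Asp-Arg
```

A real cell has no dictionary to consult, of course. The lookup here stands in for the shape-based molecular matching described earlier.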


How Many Codons Are Enough?

At this point, something should bother us.


Given there are four different nucleotides and three code positions, there are four to the third power--that is, 64--possible codons.
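To check the count, we can simply list every possible codon. This sketch uses the RNA letters and Python's standard `itertools.product`; the variable names are arbitrary.

```python
from itertools import product

BASES = "ACGU"  # the four RNA bases

# Every ordered choice of three bases, with repeats allowed.
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]

print(len(codons))  # 64
print(codons[:4])   # ['AAA', 'AAC', 'AAG', 'AAU']
```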


There are more codons available to the cell (64) than there are amino acids (20). In theory, we have too many codons. This seems wasteful.


But instead of waste, this turns out to be a form of built-in protection.


The cell deals with this using what's referred to as redundancy. Most amino acids are identified by more than one codon. For example, the amino acid alanine (Ala) is associated with four codons: GCG, GCA, GCC and GCU.
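Redundancy is easy to see if we group all 64 codons by the amino acid they specify. The sketch below builds the standard genetic code from a compact 64-letter string, using one-letter amino acid abbreviations (A is alanine, M is methionine, W is tryptophan) and `*` for a stop codon. The compact layout and the variable names are conveniences chosen for this example, not something the chapter defines.

```python
from collections import defaultdict
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon, in UCAG order
# (UUU, UUC, UUA, UUG, UCU, ...). "*" marks a stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

codon_table = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Group codons by the amino acid (or stop signal) they specify.
codons_for = defaultdict(list)
for codon, aa in codon_table.items():
    codons_for[aa].append(codon)

print(sorted(codons_for["A"]))  # ['GCA', 'GCC', 'GCG', 'GCU']
print(len(codons_for) - 1)      # 20 amino acids (the stop symbol excluded)
```

Not every amino acid is covered this generously: `codons_for["M"]` and `codons_for["W"]` each contain a single codon.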


The only difference between these four codons is the last nucleotide. If you scan the codon table, you'll see this pattern again and again: the first two nucleotides of a codon often settle which amino acid is specified.


Much of the code’s resilience comes from the third position in each codon, where changes often have no effect at all.
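One rough way to put a number on this is to try every possible single-letter change at each codon position and count how often the meaning stays the same. The sketch below does that for the standard genetic code, treating "stop" as a meaning of its own. It weights all 64 codons and all substitutions equally, which real genomes do not, so treat the output as an illustration rather than a measurement. The function name `silent_fraction` is made up for this example.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order; "*" marks a stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
codon_table = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def silent_fraction(position: int) -> float:
    """Fraction of single-base changes at `position` (0, 1, or 2)
    that leave the encoded amino acid (or stop signal) unchanged."""
    silent = total = 0
    for codon, aa in codon_table.items():
        for base in BASES:
            if base == codon[position]:
                continue  # not a change
            mutant = codon[:position] + base + codon[position + 1:]
            total += 1
            silent += codon_table[mutant] == aa
    return silent / total


for position in range(3):
    print(f"position {position + 1}: {silent_fraction(position):.0%} silent")
```

Under these assumptions, about two-thirds of third-position changes are silent, compared with only a few percent at the first and second positions.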


But even when mutations alter other positions, the code is structured so that similar codons tend to specify chemically similar amino acids. Together, these features make the system surprisingly tolerant to error.


Even the structure of the code itself--its redundancy and tolerance--emerges from simple rules.


Identifying a Gene

So far, we’ve treated DNA as if it were one long continuous message. But in reality, only certain stretches--genes--are used to make proteins. It might surprise you that 98.5% of the genome does not directly code for proteins.


So how does the cell identify genes contained in long stretches of nucleotides?


There’s no single signal. Instead, multiple local features combine to make genes recognizable.


For example, genes are often preceded by CpG islands--stretches of DNA, typically hundreds of base pairs long, that are enriched in CG dinucleotides (a C immediately followed by a G on the same strand).
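"Enriched" can be made concrete by comparing how many CG pairs a stretch actually contains with how many we would expect if its Cs and Gs were scattered at random. The sketch below computes that observed-to-expected ratio, which is one commonly used measure. Published definitions of CpG islands combine a ratio like this with overall GC content and a minimum length, and those cutoffs are not covered in this chapter. The function name is hypothetical.

```python
def cpg_observed_over_expected(sequence: str) -> float:
    """Ratio of observed CG dinucleotides to the number expected
    if C and G were scattered independently along the sequence."""
    seq = sequence.upper()
    c_count = seq.count("C")
    g_count = seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0  # no CG pair is possible
    observed = seq.count("CG")
    expected = c_count * g_count / len(seq)
    return observed / expected


print(cpg_observed_over_expected("CGCGCGCG"))  # 2.0
print(cpg_observed_over_expected("CCCCGGGG"))  # 0.5
```

Both toy sequences have the same number of Cs and Gs. Only the first is rich in CG pairs, and the ratio picks up the difference.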


Genes also tend to sit in loosely packed DNA (euchromatin) that carries specific chemical modifications, rather than in tightly packed heterochromatin that lacks those markings.


There are also sequences, both within a gene and just in front of it, that play key roles.


Near the start of genes--that is, near the transcription start site and also often near CpG islands--are promoter sequences. Promoter sequences are recognized by the transcription machinery, which assembles nearby. This occurs just prior to transcription.


The core promoter sequence--the minimal region needed to effect transcription--is usually small. It's on the order of a few dozen base pairs. But it typically sits inside a longer stretch of DNA, often a few hundred base pairs long, that helps regulate when and how strongly a gene is expressed.


In addition, certain codons identify where protein coding begins. The protein-coding portion of most genes begins with a start codon, usually ATG (AUG in RNA-speak). This start codon is indicated in blue font in the codon table.


The start codon also codes for the amino acid methionine (Met). So the first amino acid in many proteins is methionine, although it's sometimes removed.


Genes also have stop codons that identify their termini. There are three stop codons: UAA, UAG, and UGA (TAA, TAG, and TGA in DNA). When the ribosome encounters a stop codon on an mRNA, protein synthesis ends. Our example sequence, ATGTCGGATAGATGA, ends with one: TGA.
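Start and stop codons together suggest a simple scan: find an ATG, then read three letters at a time until a stop codon appears. The sketch below does exactly that on a DNA string. It is a toy. It takes the first ATG it finds and ignores promoters, chromatin, and every other signal described above, so it should not be read as how a cell (or real gene-finding software) identifies genes. The function name `first_open_reading_frame` is made up for this example.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}  # DNA spellings of UAA, UAG, UGA


def first_open_reading_frame(dna: str) -> Optional[str]:
    """Return the stretch from the first ATG through the first in-frame
    stop codon, or None if no complete stretch is found."""
    seq = dna.upper()
    start = seq.find("ATG")
    if start == -1:
        return None
    # Step through the sequence one codon at a time, starting at the ATG.
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return seq[start:i + 3]
    return None


print(first_open_reading_frame("ATGTCGGATAGATGA"))  # ATGTCGGATAGATGA
```

Notice that the example sequence contains the letters TAG in the middle (GA-TAG-A), yet the scan does not stop there. Those three letters straddle two codons, GAT and AGA, so they are never read as a unit. Where reading begins determines how every downstream letter is grouped.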


So far, we’ve focused on the code itself--how DNA stores instructions for building proteins. There is no central reader and no plan--just molecules interacting locally. And yet, this chemical code is reliably turned into proteins.


In the next chapter, we’ll follow the flow of information from DNA to RNA to protein. This idea was first put forth in 1957 by Francis Crick, of Watson and Crick fame, who called it the central dogma of molecular biology.

 
 
 

L. Scott Cole

Berkeley, CA
