9. What DNA Does

lscole
Apr 30, 2025

The last chapter covered structure: what DNA is. In this one, I move on to function: what DNA does, or, maybe better, how DNA works.

I'll start, though, by quickly refreshing some points about structure. Recall that DNA is a polymer made of monomers called nucleotides. There are four different nucleotides that we refer to using their initials: A, T, G, and C. I also mentioned that the double helix form of DNA is really two DNA molecules (polymers) intertwined and connected by weak hydrogen bonds between complementary bases (that is, A bound to T and G bound to C).

Given complementary base pairing, if we were to walk down a stretch of DNA we could announce the letters as we passed: "ACCTCGGATAGATGC" etc. A code is embedded in these letters. But of course there's no conscious entity inside a cell to read a code. The code works largely by shape. It's the shape formed by the linear array of nucleotide bases that makes DNA work.
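To make the pairing rule concrete, here is a minimal Python sketch that looks up the partner of each base, position by position, for the example stretch above. The names `COMPLEMENT` and `complement` are my own, chosen only for illustration.

```python
# Pairing rule from the text: A pairs with T, and G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Return the base that pairs with each base of `strand`, in order."""
    return "".join(COMPLEMENT[base] for base in strand)


print(complement("ACCTCGGATAGATGC"))  # TGGAGCCTATCTACG
```

Applying the function twice returns the original sequence, which is another way of saying that each strand carries enough information to specify the other.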

Before we get into mechanisms, though, let's establish that the primary purpose of the code is ultimately to make proteins (through an intermediary called mRNA that I'll discuss in the next chapter). We've covered proteins. They're polymers, too: linear arrays of amino acids. DNA is a linear array of nucleotides. Both are linear polymers. This makes the one kind of molecule (DNA) coding for the other (a protein) possible.

**The genetic code**: Most amino acids correspond to more than one three-letter codon. Note that the code is written with uracil (U) rather than thymine (T). This is because U is used instead of T in RNA, and when the code is actually put into use, it is "written" in RNA rather than DNA. Amino acids are listed using their shortened names (e.g., at the top left of the table, "Phe" stands for the amino acid phenylalanine).

If a cell wants to make some protein, it needs to know the order of amino acids. Fortunately, the cell's DNA tells it the order. How? Using a genetic code built from three-letter combinations of nucleotide bases called codons. Almost every possible codon is assigned to a specific amino acid (the few exceptions are stop signals, which I'll get to below). Read one codon after another, the code spells out the order of amino acids in a protein.

A bit of an aside: Given that there are four different nucleotides and three positions in a codon, there are four to the third power--that is, 64--different possible codons. If the goal were to have exactly one codon correspond to each of the 20 amino acids, then we would have too many codons. But that ends up not being a problem. Most amino acids are identified by more than one codon, so there is redundancy in the genetic code. A two-letter codon wouldn't work. It would only generate four squared--or 16--different codons, which isn't enough to identify all 20 amino acids.
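The counting argument can be checked with a short Python sketch. It lists every two-letter and three-letter combination of the four bases, then tallies how many codons point to each amino acid in the standard genetic code. Two things here are my own assumptions for the sake of a compact example: amino acids are written with their one-letter abbreviations (for example, L for leucine and M for methionine) rather than the three-letter names in the table, and `*` marks a stop codon.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"  # RNA letters, matching the code table
# Standard genetic code, one letter per codon, in UCAG order ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

three_letter = ["".join(p) for p in product(BASES, repeat=3)]
two_letter = ["".join(p) for p in product(BASES, repeat=2)]
print(len(three_letter), len(two_letter))  # 64 16

# Pair each codon with its amino acid, then count codons per amino acid.
CODON_TABLE = dict(zip(three_letter, AMINO_ACIDS))
codons_per_amino_acid = Counter(aa for aa in CODON_TABLE.values() if aa != "*")

print(len(codons_per_amino_acid))  # 20
print(codons_per_amino_acid["L"], codons_per_amino_acid["M"])  # 6 1
```

Leucine is reached by six different codons while methionine has only one, which is the redundancy described above.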

If a three-letter codon is like a word, then a gene is like a sentence. A gene (sentence) is a linear array of codons (words) that has meaning. For example, if we have a piece of DNA 450 bases long, that would equate to 150 codons (450 bases divided by three bases per codon). And with a 150-codon gene, the cell can make a protein that's 150 amino acids long!
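The arithmetic is easy to reproduce. The sketch below cuts a DNA string into consecutive three-letter "words"; the 450-base sequence is made up purely for illustration, and `split_into_codons` is a hypothetical helper name.

```python
def split_into_codons(dna: str) -> list[str]:
    """Cut a DNA string into consecutive three-base codons.

    Any one or two leftover bases at the end are ignored.
    """
    usable = len(dna) - len(dna) % 3
    return [dna[i:i + 3] for i in range(0, usable, 3)]


# A made-up 450-base sequence: one ATG followed by 149 copies of GCT.
gene = "ATG" + "GCT" * 149
codons = split_into_codons(gene)

print(len(gene), len(codons))  # 450 150
print(codons[:3])  # ['ATG', 'GCT', 'GCT']
```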

Now that we've moved on to genes, I need to make a few points. First, every gene starts with the same codon. It's called the start codon and it's ATG (it appears as AUG in the code table, which is written in RNA letters). It's indicated in blue in the code table. The start codon also codes for the amino acid methionine (Met). So the first amino acid in many proteins is methionine, although in some cases that first methionine is removed after the protein has been synthesized.

Genes also have stop codons that identify their termini. From the table, we can see that there are three stop codons: UAA, UAG, and UGA. Once the code reader (a molecular machine called a ribosome, which reads the mRNA copy of the gene during translation, a topic for the next chapter) reaches a stop codon, protein synthesis terminates.
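Start and stop codons together define where reading begins and ends. Here is a minimal sketch of that rule in Python, assuming the standard genetic code and a sequence already written in RNA letters (so the start codon ATG appears as AUG). The one-letter amino acid abbreviations, the `*` stop marker, the function name `translate`, and the short example sequence are all my own choices for illustration. It is a toy model of the logic, not a description of the cell's machinery.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon, in UCAG order ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Read codons from the first AUG until a stop codon (or the end)."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, so nothing is made
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # UAA, UAG, or UGA
            break
        protein.append(amino_acid)
    return "".join(protein)


# AUG (Met), UUU (Phe), GGC (Gly), then the stop codon UAA.
print(translate("GGAUGUUUGGCUAAGG"))  # MFG
```

Notice that the letters before AUG and after UAA are ignored: only the stretch between the start and stop signals ends up in the protein.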

Genes also have sequences close to them that regulate them--that turn them on and off. These regulatory sequences are called promoters and they are typically in front of the start codon. So-called "regulatory proteins" (mentioned in the chapter "Tiny Machines") attach to promoter sequences not only to turn the gene on (i.e., to have the cell make the protein it codes for) but also to control the degree to which the gene will be active.

Let me recap. A given gene codes for a given protein using three-letter codons. Each codon is built from the four different nucleotides, and together the codons specify the 20 amino acids. Genes start with the codon ATG (the start codon) and end with one of three stop codons. Genomic DNA also includes sequences near gene start sites called promoters that play a regulatory role.

So far, I haven't talked about the process by which a gene makes a protein. That's the topic of the next chapter. Get ready to learn about transcription (making an mRNA from a gene) and translation (making a protein from an mRNA). Both are amazing processes.
