8. Central Dogma -- Part I
- lscole
- Apr 16, 2025
Updated: Apr 8
In the last chapter, we focused on the genetic code--how DNA stores instructions for building proteins. We learned information flows from DNA to mRNA and then to protein.
In this chapter and the next, we'll look more closely at the mechanics of that relationship. Specifically, we'll unpack transcription in this chapter. Then, in the next, we'll take a closer look at translation.
The relationship between DNA, RNA and proteins--the relationship that covers both transcription and translation--was formally introduced as the central dogma of molecular biology in 1958 by Francis Crick (the discoverer, with James Watson, of the structure of DNA in 1953).
The central dogma captures how the cell turns stored information into proteins that do things. It also captures the idea that DNA can replicate itself.
Here's how Watson drew it out:

Moving in the diagram from left to right, the central dogma makes three claims:
(1) A cell can make a new copy of its DNA (genome) in a process called replication. This takes place in the nucleus before the cell divides. DNA replication is the example case for this book.
(2) A cell can make an mRNA copy of a specific stretch of DNA in a process called transcription. This also occurs in the nucleus. The resulting mRNAs are exported from the nucleus through nuclear pores into the cytoplasm.
(3) A cell's ribosomes, found in the cytoplasm, make proteins using those mRNAs in a process called translation. Ribosomes are found both freely floating in the cytosol and attached to the outer surface of the endoplasmic reticulum (ER) membrane.
We'll be spending the entire last half of this book detailing Watson's first process--DNA replication. So, I won't review it here. We'll start with transcription.
Transcription (DNA > RNA)
Transcription is the process by which a cell copies a gene’s DNA sequence into a complementary mRNA molecule using the enzyme RNA polymerase. It turns a stable, long-lived DNA sequence into a temporary, usable RNA message.
In this chapter, we'll look at how that flow--from information encoded in DNA to information encoded in mRNA--happens physically and dynamically in the nucleus. We'll see that transcription, like everything else that occurs in the cell, is tightly controlled and coordinated with other cellular activities.
My definition of transcription referred to the gene, so let's clarify that term, too.
A gene is a segment of DNA that contains the instructions to produce a functional product, usually a protein, along with the information needed to control its expression.
More precisely, a gene includes a promoter just upstream of a transcribed region. The promoter marks where transcription will begin. It also includes regulatory DNA sequences--some nearby and some far away--that control when and how strongly it is expressed.
To begin transcription, the transcription machinery assembles at the promoter. This machinery includes RNA polymerase plus a small group of helper proteins that position the polymerase at the promoter, open the DNA there (forming a transcription bubble), and get transcription started.
Once transcription begins, RNA polymerase moves along the transcribed region--the portion of the gene that will be copied into RNA.
It does not move along the coding strand. Instead, RNA polymerase moves along the template strand (the strand opposite the coding strand) in the 3′ to 5′ direction, building the mRNA in the 5′ to 3′ direction, the only direction in which RNA polymerase can synthesize.
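For readers who like to see ideas as code, here is a minimal sketch of that directionality. It is a toy model, not a description of the real enzyme: the nine-letter sequence and the function names `template_strand` and `transcribe` are made up for illustration.

```python
# Toy model of transcription: the polymerase reads the template strand
# 3'->5' and builds the mRNA 5'->3'. The sequence below is hypothetical.

DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
RNA_PAIRING = {"A": "U", "T": "A", "G": "C", "C": "G"}  # template base -> RNA base


def template_strand(coding_5to3: str) -> str:
    """Return the template strand written 3'->5', aligned under the coding strand."""
    return "".join(DNA_COMPLEMENT[base] for base in coding_5to3)


def transcribe(template_3to5: str) -> str:
    """Step along the template 3'->5', adding one complementary RNA base per step.

    Because the template is read 3'->5', the mRNA grows 5'->3'.
    """
    mrna = []
    for base in template_3to5:  # each loop step = polymerase advancing one base
        mrna.append(RNA_PAIRING[base])
    return "".join(mrna)


coding = "ATGGCCTAA"                 # hypothetical coding strand, 5'->3'
template = template_strand(coding)   # template strand, written 3'->5'
mrna = transcribe(template)          # mRNA, written 5'->3'

print("coding  ", coding)
print("template", template)
print("mRNA    ", mrna)
```

Notice that the mRNA ends up matching the coding strand, with U in place of T, even though the polymerase never reads the coding strand directly.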
As the RNA polymerase and the transcription bubble move along the DNA, the mRNA peels away and the DNA re-anneals behind it.
Eventually the RNA polymerase reaches a sequence called the transcription termination signal, a DNA sequence that tells RNA polymerase where to stop copying and release the RNA transcript into the nucleoplasm. Transcribing a typical human gene takes minutes, though very long genes can take hours.
Many RNA polymerases can transcribe the same gene at the same time. This matters when a gene needs to be expressed at a high level, because many mRNAs will be needed. mRNAs are relatively short-lived--typically lasting only minutes to hours before the cell breaks them down.
But transcription doesn’t happen automatically. How is it controlled?
RNA polymerase does not bind the promoter on its own. Instead, proteins called transcription factors bind to nearby DNA regulatory sequences where they help recruit and stabilize the polymerase and the rest of the transcription machinery.
In this way, these transcription factors act as gene gatekeepers, controlling whether the transcription machinery assembles and functions and in doing so, regulating the amount of transcription that will take place.
Transcription factors can either promote or inhibit transcription--some help recruit and stabilize the transcription machinery, while others block access or create conditions that make transcription less likely.
Processing the mRNA
Up to this point, we’ve treated the RNA as if it were ready to use. It isn’t.
The mRNA first generated by RNA polymerase is referred to as the pre-mRNA. This pre-mRNA must be processed by the cell to create the final processed mRNA--the mRNA that can be used to make a protein.
Processing has three components.
First, because pre-mRNAs are susceptible to degradation by enzymes in the cell (nucleases) that chew up DNA and RNA, the cell attaches a chemical 5′ cap to the starting end of the mRNA. The chemical make-up of the cap isn't important. The important point is that it protects the mRNA from being digested by nucleases from its 5′ end.
Second, after transcription most human mRNAs receive a poly(A) tail--usually 50–250 adenines long--at their 3′ end. The poly(A) tail stabilizes the molecule, helps it leave the nucleus, and makes it easier for the cell’s protein-making machinery to use it.
The poly(A) tail shrinks little by little as the mRNA is used for translation (i.e., for making proteins). When the poly(A) tail gets too short, the cell degrades the mRNA.
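One way to picture the poly(A) tail is as a countdown timer. The sketch below is deliberately simplified: the amount lost per round (`LOSS_PER_ROUND`) and the cutoff (`MIN_TAIL`) are hypothetical numbers chosen only to make the idea concrete, and a real tail does not shrink by a fixed amount each time. Only the starting lengths come from the 50–250 range mentioned above.

```python
def translation_rounds(tail_length: int, loss_per_round: int, min_tail: int) -> int:
    """Count how many times a toy mRNA is translated before its tail is too short."""
    if loss_per_round <= 0:
        raise ValueError("loss_per_round must be positive")
    rounds = 0
    while tail_length >= min_tail:     # tail still long enough: translate once more
        tail_length -= loss_per_round  # each round trims the tail a little
        rounds += 1
    return rounds                      # tail is now below min_tail: mRNA is degraded


LOSS_PER_ROUND = 10  # hypothetical: adenines lost per round of translation
MIN_TAIL = 20        # hypothetical: below this length the mRNA is degraded

for starting_tail in (50, 250):        # the two ends of the typical range
    rounds = translation_rounds(starting_tail, LOSS_PER_ROUND, MIN_TAIL)
    print(f"tail of {starting_tail} adenines: {rounds} rounds before degradation")
```

The point of the model is only the trend: under these assumptions, an mRNA that starts with a longer tail stays in service longer.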
Finally, there is mRNA splicing, which works like cutting unwanted sections out of the message and pasting the remaining pieces together.
Up to this point, we've assumed that the transcribed region of a gene is simply a long string of codons. But it's not that simple.
Within the gene are sequences that do not code for protein. The coding segments are called exons; the intervening sequences are introns.
The mRNA that's produced by the RNA polymerase--the pre-mRNA--contains all the gene's exons and all its introns. Then, post-synthesis, that mRNA is processed. All the introns are removed and all or some of the exons are joined, or spliced together.
The final mRNA, then, consists of exons connected to each other and no introns.
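Here is a small sketch of that cut-and-join step. The sequence is invented, the uppercase/lowercase convention is just a visual aid for this example, and the `splice` function is a hypothetical helper that assumes the exon positions are already known. In the cell, the splicing machinery has to find those boundaries itself.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join the exon segments, given as (start, end) positions, and drop the rest."""
    return "".join(pre_mrna[start:end] for start, end in exons)


# Hypothetical pre-mRNA: UPPERCASE = exon, lowercase = intron (example convention only)
pre_mrna = "AUGGCC" + "guaaguuuuag" + "GAUUAA"

# Exon positions within pre_mrna (0-based start, end not included)
exons = [(0, 6), (17, 23)]

mature_mrna = splice(pre_mrna, exons)
print(pre_mrna)      # exons and intron together
print(mature_mrna)   # exons only
```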
Why does the cell do this?
So it can construct many different proteins from a single gene. It accomplishes this by mixing and matching exons in a modular manner. A quick example:
Imagine a gene with a transcribed region that consists of six exons separated by five introns. It could be the case that one protein is produced by splicing together all the exons (exons 1-6).
But another protein with a slightly different activity might be made by combining exons 1, 3, 4, and 6. Another with still a different activity might be made by combining exons 1, 2, 5 and 6.
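The six-exon example can be written out as a short sketch. The exon sequences here are invented placeholders (each six letters long), and `build_isoform` is a hypothetical helper. Only the three exon combinations come from the example above.

```python
# Hypothetical exon sequences for an imaginary six-exon gene
EXONS = {
    1: "AUGGCU",
    2: "GAAUUC",
    3: "CCGGAU",
    4: "UUUAAA",
    5: "GGGCCC",
    6: "ACGUAA",
}


def build_isoform(exon_numbers: list[int]) -> str:
    """Join the chosen exons, in the order listed, into one processed mRNA."""
    return "".join(EXONS[number] for number in exon_numbers)


# The three combinations from the example in the text
isoforms = {
    "all six exons": [1, 2, 3, 4, 5, 6],
    "exons 1, 3, 4, 6": [1, 3, 4, 6],
    "exons 1, 2, 5, 6": [1, 2, 5, 6],
}

for name, exon_numbers in isoforms.items():
    mrna = build_isoform(exon_numbers)
    print(f"{name}: {mrna} ({len(mrna)} nucleotides)")
```

Three different messages come out of one gene, and each would be translated into a somewhat different protein.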
Most human genes contain introns, and many can be spliced in different ways. And the cell selectively produces the variants it needs in a given context.
The splicing of the exons to each other must be precise. No extra or missing nucleotides are allowed. If the splicing were sloppy, every three-letter codon after the junction could shift out of frame and be misread.
In short, because most human genes are split into exons and introns, the cell can splice the same pre-mRNA in different ways to produce different protein variants.
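To see why a single stray nucleotide matters, here is a minimal sketch that chops an mRNA into codons. The two exon sequences and the `codons` helper are hypothetical. The "sloppy" version simply assumes one extra G was left in at the junction.

```python
def codons(mrna: str) -> list[str]:
    """Split an mRNA into three-letter codons, ignoring any leftover letters."""
    usable = len(mrna) - len(mrna) % 3
    return [mrna[i:i + 3] for i in range(0, usable, 3)]


exon_a = "AUGGCU"     # hypothetical exon
exon_b = "GAAUUCUAA"  # hypothetical exon

precise = exon_a + exon_b        # exons joined exactly
sloppy = exon_a + "G" + exon_b   # one extra nucleotide left at the junction

print(codons(precise))
print(codons(sloppy))
```

In the precise version the codons read AUG, GCU, GAA, UUC, UAA. With one extra letter, the first two codons are unchanged, but every codon after the junction is different.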
In the next chapter, we'll take a look at how this final processed mRNA is used by large molecular machines called ribosomes, which will translate it into protein.

