15. Where Shall We Begin?

lscole
May 15, 2025
Updated: Jan 19

If someone had wanted a book in the 15th century it would have had to have been copied by hand from an existing one. The scribe would have started at page one and proceeded page by page until the last page was, as they say, “in the books.”

A human genome is like a book. So, to copy it, one might imagine the cell would use the same straightforward approach: start at one end of each of the 46 chromosomes and progress to the other. Unfortunately, that wouldn’t work. Human cells have to replicate their genomes in about 8 hours. Human DNA replication proceeds at about 50 nucleotides per second. It would take over a month to copy a typical chromosome using this approach.
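As a rough check on that estimate, here is a back-of-the-envelope sketch in Python. The rate of 50 nucleotides per second comes from the text; the chromosome length of 130 million base pairs is an assumed round figure for a mid-sized human chromosome, not a number from this chapter.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def days_to_copy(length_bp, rate_bp_per_s=50):
    """Days needed for one replication fork to copy length_bp bases end to end."""
    return length_bp / rate_bp_per_s / SECONDS_PER_DAY

# Assumed, illustrative length for a mid-sized human chromosome.
typical_chromosome_bp = 130_000_000

print(f"{days_to_copy(typical_chromosome_bp):.1f} days")  # 30.1 days
```

Under these assumptions a single end-to-end pass takes roughly a month, far longer than the 8-hour window.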

In fact, genome replication in every organism starts not at the ends of chromosomes but at one or more sites within them called replication origins. In all species, specific proteins recognize replication origins, unwind the double helix there, and then recruit other required proteins. Once the other proteins arrive, replication begins simultaneously in both directions.

But what defines a replication origin? In fact, it depends on the species. To provide some perspective, we’ll look at two simpler species--the bacterium E. coli, a single-celled prokaryote, and the yeast S. cerevisiae, a single-celled eukaryote--before we consider human replication origins.

What we'll see based on this comparison is that single-celled organisms with small genomes have rigidly defined replication origins, whereas humans--multicellular organisms with a much larger genome--require more flexible, vaguely defined replication origins that can adapt to different cell types and developmental states.

Bacterial replication origins

The E. coli genome is almost 5 Mb (Megabases, or "millions of bases") and takes the form of a single circular chromosome. Replication begins at one precise replication origin, which is called OriC and which is easy to find: it's 245-bp long and contains multiple copies of two kinds of DNA sequences: 9-mer sequences called DnaA boxes (5’-TTATCCACA-3’) that serve as landing spots for the first replication protein (DnaA), and 13-mer sequences called DNA Unwinding Elements, or DUEs, that are so-called "AT-rich." The DUE sequence is 5’-GATCTnTTnTTTT-3’ where “n” can be any base. So DUEs contain mainly T bases that bind to A bases on the opposite strand.
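Because both motifs are short and well defined, searching for them is easy to sketch in code. The example below is only an illustration: the two patterns come from the text, but the `find_sites` helper and the toy sequence are made up for this sketch (the toy sequence is not the real OriC), and only one strand is scanned.

```python
import re

DNAA_BOX = "TTATCCACA"                    # 9-mer DnaA box from the text
DUE_PATTERN = "GATCT[ACGT]TT[ACGT]TTTT"   # 13-mer DUE; each "n" becomes [ACGT]

def find_sites(pattern, sequence):
    """Return 0-based start positions of every match of pattern in sequence."""
    # The lookahead lets overlapping matches be reported as well.
    return [m.start() for m in re.finditer(f"(?=({pattern}))", sequence)]

# Made-up toy sequence: one DUE followed by two DnaA boxes.
toy = "GATCTATTCTTTT" + "GGC" + "TTATCCACA" + "AG" + "TTATCCACA"

print(find_sites(DUE_PATTERN, toy))  # [0]
print(find_sites(DNAA_BOX, toy))     # [16, 27]
```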

An important side note: a double helix with a lot of A and T bases is referred to as "AT-rich." AT-rich DNA is relatively easy to open up (by separating the two strands) compared to DNA that has a lot of G and C bases. The reason is that the strength of attachment, or bonding, between A and T on the two strands is slightly weaker than between G and C bases.
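To make "AT-rich" concrete, here is a small sketch that scores a sequence two ways. The hydrogen-bond counts (two per A-T pair, three per G-C pair) are the standard textbook values rather than numbers from this chapter, both example sequences are made up, and the functions assume the input contains only A, C, G, and T.

```python
def at_fraction(seq):
    """Fraction of bases in seq that are A or T."""
    seq = seq.upper()
    return sum(base in "AT" for base in seq) / len(seq)

def hydrogen_bonds(seq):
    """Total base-pair hydrogen bonds, assuming 2 per A-T pair and 3 per G-C pair."""
    seq = seq.upper()
    return sum(2 if base in "AT" else 3 for base in seq)

at_rich = "ATTTTAATATTT"  # made-up 12-mer
gc_rich = "GCCGGCGCGGCC"  # made-up 12-mer

print(at_fraction(at_rich), hydrogen_bonds(at_rich))  # 1.0 24
print(at_fraction(gc_rich), hydrogen_bonds(gc_rich))  # 0.0 36
```

Fewer hydrogen bonds over the same length is one simple way to picture why the AT-rich stretch is the easier one to pull apart.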

E. coli (bacteria) cells

Bonding strength is relevant because as more and more bacterial DnaA proteins attach to DnaA boxes within OriC, the DNA bends and coils, putting mechanical stress on the AT-rich DUE sequences. This opens up the double helix strands, making it possible for other proteins to access the DNA and commence replication.

Yeast replication origins

Yeast is also a single-celled organism but its genome is about 12 Mb (2.5 times larger than E. coli's) and takes the form of 16 linear chromosomes rather than one circular one. Also, since yeast is a eukaryote, its DNA is in the form of chromatin: it's wrapped in histone proteins to form nucleosomes.

The first difference between the yeast and bacterial replication origins is that instead of a single origin, the yeast uses several hundred. But it's a bit more complicated. S. cerevisiae actually has thousands of "potential" replication origins, but only uses a fraction of them. The ones not initially used are referred to as "dormant" replication origins.

S. cerevisiae (yeast) cells

Having dormant origins provides redundancy during replication. If replication from an active origin runs into a problem--for example, if it encounters DNA damage--then one or more dormant origins near the original one can be called into action to help replicate that section of the genome.

Dormant origins are also called into service when a cell is experiencing replicative stress--that is, when cellular conditions are, for some reason, not conducive to replication. This causes replication to slow down. When this occurs, calling dormant origins into service ensures that replication will be completed in the required time window.

So there are many of them, and they provide S. cerevisiae much more flexibility in terms of origin usage, but what do the yeast replication origins look like? They more or less follow the E. coli pattern. They're up to 200 bases long and, like E. coli’s, include multiple copies of two DNA sequences (although the yeast sequences aren't as strictly defined as E. coli's).

The first are 11-bp DNA stretches called ARCs (5’-WTTTAYRTTTW-3’) that, like E. coli's DnaA boxes, serve as landing spots for an initial replication protein, yeast ORC. The ambiguous bases in the sequence are defined as: W = A or T, Y = C or T, and R = A or G.
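A degenerate motif like this one can be turned into a regular expression mechanically. The sketch below uses the standard IUPAC nucleotide codes (W = A or T, Y = C or T, R = A or G) and the 11-base form of the yeast consensus, WTTTAYRTTTW. The `motif_to_regex` helper and both test sequences are hypothetical, written only for this illustration.

```python
import re

# Standard IUPAC codes needed here; plain bases map to themselves.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "[AT]", "Y": "[CT]", "R": "[AG]", "N": "[ACGT]",
}

def motif_to_regex(motif):
    """Compile an IUPAC motif string into a regular expression."""
    return re.compile("".join(IUPAC[ch] for ch in motif.upper()))

origin_motif = motif_to_regex("WTTTAYRTTTW")

print(bool(origin_motif.fullmatch("ATTTATGTTTA")))  # True: every position fits
print(bool(origin_motif.fullmatch("ATTTACCTTTA")))  # False: C at the R position
```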

The second kind of sequences are called B regions. Like E. coli's DUE sequences, B regions are AT-rich. As with E. coli, binding of yeast ORC to ARC sequences opens up the AT-rich B region double helices, giving other replication-related proteins access to the DNA there.

Why so many eukaryotic replication origins?

Why does S. cerevisiae require so many more replication origins than E. coli? Minimally, the yeast would need 16 origins, corresponding to its 16 chromosomes. But it uses about 400.

Maybe the yeast faces a much tougher replication challenge. Let's consider that. E. coli copies its genome in about 20 minutes, the time it takes for a bacterium to grow and divide. S. cerevisiae copies its genome, which is about 2.5 times larger than E. coli’s, in about 40 minutes. So, yeast has a larger genome to copy (2.5x). But it also has more time to copy it (2x). So their replication challenges are not radically different.

In fact, the reason a yeast cell needs so many more origins than a bacterial cell is explained by the speed at which their respective replisomes, or replication apparatuses, move along the DNA. E. coli synthesizes DNA at 1,000 bases per second. S. cerevisiae and other eukaryotes, including humans, synthesize at 50 bases per second, 20 times slower than E. coli. [3, p.275]
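To see how strongly replisome speed drives the origin count, here is an idealized lower-bound sketch. It assumes every origin fires at the very start, that two forks leave each origin and run at a constant rate for the whole replication window, and that origins are evenly spaced. None of those assumptions holds exactly in a real cell. The genome size, rates, and 40-minute window are the figures quoted in the text; the `min_origins` helper is made up for this illustration.

```python
def min_origins(genome_bp, rate_bp_per_s, window_s):
    """Fewest evenly spaced, bidirectional origins that could finish in window_s."""
    bp_per_origin = 2 * rate_bp_per_s * window_s  # two forks per origin
    return (genome_bp + bp_per_origin - 1) // bp_per_origin  # integer ceiling

YEAST_GENOME_BP = 12_000_000
WINDOW_S = 40 * 60  # 40 minutes

print(min_origins(YEAST_GENOME_BP, 50, WINDOW_S))     # 50 at the eukaryotic rate
print(min_origins(YEAST_GENOME_BP, 1_000, WINDOW_S))  # 3 at the E. coli rate
```

Even this idealized floor jumps from a handful of origins to dozens once the replisome slows to 50 bases per second. Real yeast cells use several times more than the floor, which is consistent with origins that do not all fire at once and are not evenly spaced.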

Eukaryotic replisomes move slowly because the histone protein spools must be removed from the chromatin DNA as the replisome progresses. Then, they must be replaced after the replisome completes synthesis. Not only that, the epigenetic markings on the original histones and the original DNA methylation patterns must be recreated on the newly replicated strand. So DNA replication is much more challenging for eukaryotes.

Next, let's move to the even more complex eukaryote Homo sapiens. Unlike the single-celled, eukaryotic yeast, humans are made up of trillions of cells of different types and stages of development. This added complexity impacts the form and usage of human replication origins.

Human replication origins

To some extent, human replication origins continue the trends we saw as we moved from bacteria to yeast. For starters, there’s the total number of human replication origins. While E. coli has one and S. cerevisiae has several hundred, the human genome has 50,000-100,000 “potential” replication origins. 30-50% of these will fire in S-phase. So, ultimately, a single human cell will use 15,000 to 50,000 replication origins to copy its genome!
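The range quoted above is simply the product of the two ranges. A quick sketch of the arithmetic follows; the `firing_range` helper is hypothetical, and it uses whole-number percentages to avoid floating-point rounding.

```python
def firing_range(potential_low, potential_high, pct_low, pct_high):
    """Return (fewest, most) origins fired, given whole-number percentages."""
    return (potential_low * pct_low // 100, potential_high * pct_high // 100)

fewest, most = firing_range(50_000, 100_000, 30, 50)
print(fewest, most)  # 15000 50000
```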

Human replication origins are also very large. While E. coli origins are 245 bases long and S. cerevisiae's are up to 200 bases long, human replication origins are typically 5,000-50,000 bases long! They span huge chromosomal distances.

Finally, whereas bacterial origins are characterized by strict sequence motifs and yeast replication origins only slightly less strict ones, human replication origins have no consensus sequence at all!

This raises an obvious question: What defines them?

The answer is still being worked out. But we know their typical characteristics. I'll go through them, but the "take home lesson" will be that human replication origins seem to be defined more by chromatin features, especially DNA accessibility, than by specific sequence patterns.

Let's review three typical characteristics of human replication origins.

(1) Accessibility: Genomic DNA has tightly packed regions (heterochromatin) and loosely packed regions (euchromatin). Genomic DNA also has regions that are typically positioned toward the outer edges of the nucleus and other regions that are positioned deeper inside the nucleus. Human replication origins show a tendency to be: (a) in regions of more loosely packed (and thus more accessible) euchromatin, and (b) in the more accessible DNA located near edges of the nucleus.

(2) Actively transcribed genes: Human origins tend to be near actively transcribed genes, especially near the regulatory regions located in front of those genes. These regions control when and how much of a gene to transcribe into mRNA.

But a caveat here: our first and second characteristics of human replication origins are highly correlated. Actively transcribed genes also tend to be found in euchromatin. So these two characteristics--accessibility and proximity to active genes--are hard to disentangle.

Beyond accessibility, there are a couple of other practical reasons why human replication origins might be located near actively transcribed genes. The first is to minimize replication-transcription conflicts.

In both DNA replication and RNA transcription (gene expression), the double helix is separated and a complex of proteins progresses down the DNA making a copy of one of the strands: a DNA copy in the case of replication and an RNA copy in the case of transcription. The replication machinery runs the risk of crashing into the transcription machinery. By copying actively transcribed genes early, the frequency of such conflicts might be minimized.

The other potential benefit of replicating active gene regions early is to ensure that essential genes--for example, those involved in cell cycle control, DNA repair, and metabolism--are quickly duplicated and available for transcription during most of S-phase.

(3) Epigenetic modifications: The third characteristic of human replication origins is specific chemical modifications to both the histone proteins and the DNA located near those origins. The most important histone modification is H4K20me2.* It serves as the human ORC binding site much as E. coli’s DnaA box and S. cerevisiae’s ARC function as attachment sites for DnaA and yeast ORC, respectively.

Human replication origins have other characteristic histone modifications, too. The modification H3K4me3 is common to both human replication origins and gene promoter sites. And the modifications H3K9ac and H3K27ac are associated with human replication origins as well as with open euchromatin, generally.

Another entanglement: like actively transcribed genes, histone modifications are frequently associated with more relaxed euchromatin status and thus DNA accessibility. Some histone modifications have a direct effect on DNA accessibility and others have an indirect effect. In the former case, the physical presence of the modification causes the double helix to open more easily. In the latter, a histone modification flags other proteins called chromatin remodelers to remove and/or move nucleosomes to make DNA more accessible. In both cases, histone modifications lead to more open chromatin.

To summarize, human replication origins are defined more by chromatin features such as accessibility and epigenetic modifications than by DNA sequence. And the firing of human replication origins is less "hard wired" than the firing of bacterial or yeast origins.

These characteristics offer benefits to multicellular organisms. Such organisms are composed of myriad cell types--skin cells, muscle cells, neurons, etc.--each with a unique transcriptional program. Having many potential replication origins allows an optimal set to fire as determined by the cell's transcription pattern.

The same is true for cells at different stages of development. For example, embryonic cells have to replicate their DNA much more quickly than differentiated cells and so can employ a denser set of replication origins at the outset of S phase. Here again we see the benefit of the flexibility inherent in human replication origins.

The ambiguity of human replication origins and the flexibility in their firing provide significant advantages to multicellular eukaryotes. The next chapter returns our focus to the human. It covers the events that transpire immediately after human ORC binds to an H4K20me2 histone modification at a physically accessible replication origin.

*“H4K20me2” is read as follows: “H4”: the modification is on histone H4 (one of the four core histones); “K20”: the modification is on the 20th amino acid of histone H4, which is the amino acid lysine (abbreviated “K”); “me2”: the modification consists of two methyl groups.
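The same reading rule applies to the other modifications mentioned in this chapter (H3K4me3, H3K9ac, H3K27ac). As a small illustration, the sketch below splits such a name into its parts. The `parse_mark` helper and its pattern are made up for this example and cover only methylation ("me1" to "me3") and acetylation ("ac") marks, not the full nomenclature.

```python
import re
from typing import NamedTuple

class HistoneMark(NamedTuple):
    histone: str       # e.g. "H4"
    residue: str       # one-letter amino acid code, e.g. "K" for lysine
    position: int      # position of the residue in the histone
    modification: str  # e.g. "me2" (two methyl groups) or "ac" (acetyl group)

# Simplified pattern: histone name, residue letter, position, modification.
MARK_PATTERN = re.compile(r"(H\d[AB]?)([A-Z])(\d+)(me[123]|ac)")

def parse_mark(name):
    """Split a name such as 'H4K20me2' into its four components."""
    match = MARK_PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"unrecognized histone mark: {name!r}")
    histone, residue, position, modification = match.groups()
    return HistoneMark(histone, residue, int(position), modification)

print(parse_mark("H4K20me2"))
print(parse_mark("H3K27ac"))
```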
