
16. Where Shall We Begin? (1,256)

  • lscole
  • May 16, 2025
  • 5 min read

Updated: Apr 17

If someone had wanted a book in the 15th century, it would have been copied by hand from an existing one. The scribe would have started at page one and gone page by page until the last page was, as they say, “in the books.”

 

A human genome is like a book. So, to copy it, one might imagine the cell would use the same approach: start at one end of each of the 46 chromosomes and progress to the other end.


Unfortunately, that wouldn’t work.


Human cells must replicate their genomes in about 8 hours. Human DNA replication proceeds at about 50 nucleotides per second. It would take over a month to copy a typical chromosome using this approach.
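The arithmetic behind that estimate can be sketched in a few lines of Python. The 50 nucleotides per second rate comes from the text; the chromosome length of about 135 million bases is an assumed round figure for a typical human chromosome, and the function name is made up for this illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def days_to_copy(length_bases, bases_per_second):
    """Days needed for one replisome to copy a DNA molecule end to end."""
    return length_bases / bases_per_second / SECONDS_PER_DAY


# Assumption: a typical human chromosome is about 135 million bases long.
TYPICAL_CHROMOSOME_BASES = 135_000_000

days = days_to_copy(TYPICAL_CHROMOSOME_BASES, 50)
print(f"{days:.1f} days to copy one chromosome from end to end")
```

Under these assumptions the answer comes out a little over 31 days, compared with the roughly 8 hours the cell actually has.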

 

In fact, genome replication in every organism starts not at the ends of chromosomes but at one or more sites within them called replication origins.


In all species, specific proteins recognize replication origins, unwind the double helix there, and then recruit the proteins required for replication. Once those proteins arrive, replication proceeds outward from the origin in both directions at once.

 

But what defines a replication origin?


It depends on the species. To provide perspective, we’ll look at two simpler species before we consider humans: the bacterium E. coli, a single-celled prokaryote, and the yeast S. cerevisiae, a single-celled eukaryote.


As we move from simpler prokaryotes to more complex multicellular eukaryotes, we move from a single, rigidly defined replication origin to vastly more numerous and much more flexibly defined origins that can adapt to different cell types and developmental states.


Bacterial and yeast origins

The E. coli genome is about 5 Mb (megabases, or millions of bases) and takes the form of a single circular chromosome. On that small genome, replication begins at one precise replication origin. And it's easy to find: it's 245 bp long and contains multiple copies of two kinds of highly conserved, short DNA consensus sequences.


These sequences are recognized by specific proteins that recruit the replisome—the molecular machine that carries out DNA synthesis.


E. coli (bacteria) cells

Like E. coli, the yeast S. cerevisiae is a single-celled organism, but its genome is about 12 Mb (roughly 2.5 times larger) and takes the form of 16 linear chromosomes rather than one circular one.


Since yeast is a eukaryote, its DNA is in the form of chromatin: it's wrapped in histone proteins to form nucleosomes.


The first difference between the yeast and bacterial replication origins is that instead of a single origin, the yeast uses several hundred.


In fact, it's even more complicated. S. cerevisiae has thousands of "potential" replication origins, but only uses a fraction of them. The ones not initially used are referred to as dormant replication origins.


S. cerevisiae (yeast) cells

Dormant origins provide redundancy during replication. If replication from an active origin runs into a problem, then one or more dormant origins near the original can be called into action.


So there are many of them, providing much more flexibility, but what do the yeast replication origins look like?


They follow the E. coli pattern, but more loosely. Whereas bacterial origins are characterized by strict consensus sequences, the consensus sequences at yeast origins are somewhat less strictly defined.

 

Why so many origins?

A eukaryote like S. cerevisiae requires many more replication origins than a prokaryote like E. coli because of the speed at which their respective replisomes, or replication machines, move along the DNA.


E. coli synthesizes DNA at 1,000 bases per second. S. cerevisiae and other eukaryotes, including humans, synthesize at roughly 50 bases per second, which is 20 times slower.

 

Eukaryotic replisomes move more slowly in part because histone proteins must be removed and then replaced. Also, the original epigenetic markings on histones and DNA methylation patterns must be recreated on the newly synthesized strand.
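To see how fork speed and origin count trade off, here is a rough sketch using the genome sizes and synthesis rates quoted above. It rests on simplifying assumptions that are not in the text: origins are evenly spaced, they all fire at the same moment, and forks never stall. The function name is hypothetical, and 400 is only a stand-in for "several hundred" yeast origins.

```python
def replication_hours(genome_bases, fork_speed, n_origins):
    """Hours to copy a genome when each origin sends out two forks.

    Simplifying assumptions: origins are evenly spaced, all fire at the
    same moment, and forks never stall.
    """
    bases_per_second = 2 * fork_speed * n_origins  # two forks per origin
    return genome_bases / bases_per_second / 3600


# Genome sizes (in bases) and fork speeds (bases per second) as quoted in the text.
ecoli = replication_hours(5_000_000, 1_000, n_origins=1)
yeast_one_per_chromosome = replication_hours(12_000_000, 50, n_origins=16)
# Assumption: 400 stands in for "several hundred" active yeast origins.
yeast_many = replication_hours(12_000_000, 50, n_origins=400)

print(f"E. coli, 1 origin: {ecoli * 60:.0f} min")
print(f"Yeast, 16 origins: {yeast_one_per_chromosome * 60:.0f} min")
print(f"Yeast, 400 origins: {yeast_many * 60:.0f} min")
```

With these idealized inputs, a single fast bacterial origin finishes in about 42 minutes, while yeast with only one origin per chromosome would need about two hours. Several hundred origins bring that down to minutes, which is the point of having so many.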


So, as we move from bacteria to yeast, three trends are apparent: (1) there are many more origins, (2) the origins are less well-defined, and (3) there's more flexibility in origin usage.


These trends will gain even more momentum as we move on to humans.

 

Human origins

We'll start with the number of origins. While E. coli has one and S. cerevisiae has several hundred, the human genome has 50,000 to 100,000 potential replication origins. About 30-50% of these (15,000 to 50,000) will fire in S phase.
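The range of firing origins follows directly from those percentages, and dividing the genome by that count gives a feel for how far apart active origins are on average. The sketch below assumes a diploid human genome of about 6.2 billion bases, a figure not given in the text; the function and constant names are hypothetical.

```python
def firing_range(potential_low, potential_high, pct_low, pct_high):
    """Return the (min, max) number of origins expected to fire."""
    return (potential_low * pct_low // 100, potential_high * pct_high // 100)


# 50,000 to 100,000 potential origins, of which 30% to 50% fire (from the text).
low, high = firing_range(50_000, 100_000, 30, 50)
print(f"Origins firing in S phase: {low:,} to {high:,}")

# Assumption: a diploid human genome of about 6.2 billion bases.
DIPLOID_GENOME_BASES = 6_200_000_000

for fired in (low, high):
    spacing_kb = DIPLOID_GENOME_BASES / fired / 1_000
    print(f"{fired:,} origins -> one every ~{spacing_kb:.0f} kb on average")
```

Under that assumption, active origins sit on the order of one hundred to a few hundred kilobases apart, although real origins are neither evenly spaced nor fired all at once.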


Human replication origins are not only numerous, they're also very large. While the E. coli origin is 245 bases long and S. cerevisiae's are up to 200 bases long, human replication origins are not sharply defined; instead, they span broad regions of several thousand to tens of thousands of bases.


Finally, whereas bacterial origins are characterized by strict consensus sequence motifs and yeast replication origins by somewhat less strictly defined ones, human replication origins have no consensus sequence at all!


This raises a question: What defines them?

 

The answer is being worked out. But we know their typical characteristics.


Human origins seem to be defined more by chromatin features, like DNA accessibility and epigenetic modifications, than by specific sequence patterns.


Three typical characteristics of active human replication origins are:


(1) DNA accessibility: Genomic DNA includes tightly packed regions (heterochromatin) and more loosely packed regions (euchromatin), as well as regions positioned in different parts of the nucleus that offer varying levels of accessibility.


Replication origins are distributed evenly throughout the genome, but the ones located in more accessible euchromatin and in the more accessible nuclear regions tend to be activated earlier and more efficiently than those in regions of the genome that aren't as accessible.


(2) High gene density and actively transcribed genes: Early-firing origins tend to be in gene-rich regions, especially regions with actively transcribed genes. There may be a few reasons for this.


First, in both transcription and replication, large protein complexes track along DNA. These risk running into each other. Replicating active genes early may reduce the frequency of such conflicts.


Also, many essential genes are needed during S phase. Replicating them early makes the new copies available for the rest of S phase.


Finally, when the cell copies its most critical operating instructions early in the process, any mutations that arise in critical genes during replication have more time to be repaired before cell division.


(3) Epigenetic modifications: There are also specific chemical modifications that human cells make to both histone proteins and DNA located near replication origins. One important histone modification is H4K20me2, but human origins have other characteristic histone modifications, as well.


As an aside, the epigenetic modification H4K20me2 is read as follows: “H4”: the modification is on histone H4, one of the four core histones (H2A, H2B, H3, and H4); “K20”: the modification is on the 20th amino acid of histone H4, which is lysine (abbreviated “K”); “me2”: the modification consists of two methyl groups.
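The same reading rule can be written as a small parser. This is a minimal sketch that only handles methylation marks written in the histone, residue, position, "me" plus count form described above; the pattern, the function name, and the short amino acid table are assumptions made for illustration, not a standard library or a complete nomenclature.

```python
import re

# One-letter amino acid codes. Only lysine ("K") appears in the text;
# arginine ("R") is included as a second example.
AMINO_ACIDS = {"K": "lysine", "R": "arginine"}

# Assumed pattern for this sketch: histone, residue letter, position, "me" + count.
MARK_PATTERN = re.compile(r"^(H2A|H2B|H3|H4)([A-Z])(\d+)me([123])$")


def describe_mark(mark):
    """Split a mark such as 'H4K20me2' into its named parts."""
    match = MARK_PATTERN.match(mark)
    if match is None:
        raise ValueError(f"unrecognized mark: {mark!r}")
    histone, residue, position, methyl_count = match.groups()
    return {
        "histone": histone,
        "amino_acid": AMINO_ACIDS.get(residue, residue),
        "position": int(position),
        "methyl_groups": int(methyl_count),
    }


print(describe_mark("H4K20me2"))
```

For H4K20me2 this yields histone H4, lysine, position 20, and two methyl groups, matching the reading given above.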


In bacteria, replication begins at a single, precisely defined site. In humans, it begins across many flexible regions shaped by chromatin and cellular context.


As genomes grow larger and more complex, replication becomes less rigid and more adaptable—ensuring that even a vast genome can be copied reliably within a limited time window.


The ambiguity of human replication origins, and the flexibility in how they are used, provides significant advantages to multicellular eukaryotes, which are made up of myriad cell types, each of which may also be at a different stage of development. It allows an optimal set of origins to fire as dictated by the cell's gene expression pattern.


But having many potential origins raises a new problem: which ones should fire, and when? In the next chapter, we’ll see how cells license replication origins—marking specific sites for activation while also ensuring that each is used only once.


L. Scott Cole

Berkeley, CA
