
Where to Start? Replication Origins

  • lscole
  • May 15
  • 8 min read

Updated: 1 day ago

If you had wanted a copy of some important book in the 15th century, it would have been reproduced by hand from an existing one. The scribe would have started at page one and gone page by page until the last page was, as they say, “in the books.”

 

A genome is kind of like a book. So maybe the cell would employ the same strategy: start at one end of each of the 46 human chromosomes and progress to the other. But that wouldn’t be practical. Human cells replicate their genomes in about 8 hours. Given that human DNA replication proceeds at about 50 nucleotides per second, it would take about a month to copy an average-sized chromosome from one end to the other, and nearly two months for the largest.
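Here is a back-of-the-envelope check of that estimate, as a rough sketch. The 50 nucleotides per second comes from the text. The two chromosome lengths are rounded values assumed for illustration: about 135 million base pairs for an average-sized human chromosome and about 249 million for chromosome 1, the longest.

```python
SECONDS_PER_DAY = 24 * 60 * 60
FORK_SPEED_NT_PER_S = 50  # replication rate quoted in the text

# Rounded lengths in base pairs; assumed values for illustration only.
ASSUMED_LENGTHS_BP = {
    "average-sized chromosome": 135_000_000,
    "chromosome 1 (longest)": 249_000_000,
}

def days_to_copy_end_to_end(length_bp, speed=FORK_SPEED_NT_PER_S):
    """Days for one replication fork to travel the full length of a chromosome."""
    return length_bp / speed / SECONDS_PER_DAY

for name, length_bp in ASSUMED_LENGTHS_BP.items():
    print(f"{name}: about {days_to_copy_end_to_end(length_bp):.0f} days")
```

Either way the answer is measured in weeks, which is hopeless against an 8-hour deadline.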

 

In fact, genome replication in every organism starts not at the ends of chromosomes but at one or more sites within chromosomes called replication origins. In all species, specific proteins recognize replication origins, unwind the double helix there, and then recruit the other required proteins. Replication then starts in both directions simultaneously.

 

But what defines a replication origin? It depends on the species. To provide a broad picture, we’ll look at two simpler species before we consider humans. First, we’ll look at the bacterium E. coli, a single-celled prokaryote. Then we'll consider the yeast S. cerevisiae, a single-celled eukaryote. What we’ll end up seeing is that as organism complexity increases, replication origin complexity does as well.

 

Bacterial replication origins

The E. coli genome is about 650 times smaller than the human genome and takes the form of a single circular chromosome. E. coli's replication origin is called oriC and it's easy to spot: it's 245 bp long and contains multiple copies of two tell-tale DNA sequences: a 9-mer called a DnaA box (5’-TTATCCACA-3’) and an AT-rich 13-mer called a DNA Unwinding Element, or DUE. The DUE 13-mer sequence is 5’-GATCTnTTnTTTT-3’, where “n” represents any base.
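To make "easy to spot" concrete, here is a minimal sketch of scanning a DNA string for the two motifs. The motif strings come from the text. The helper names and the toy sequence are invented for illustration; the toy sequence is not the real oriC.

```python
import re

DNAA_BOX = "TTATCCACA"       # 9-mer from the text
DUE_13MER = "GATCTnTTnTTTT"  # 13-mer from the text; "n" means any base

def reverse_complement(seq):
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_motif(seq, motif):
    """Return 0-based start positions of motif in seq; 'n' matches any base."""
    # A lookahead lets overlapping occurrences be reported too.
    pattern = re.compile("(?=" + motif.replace("n", "[ACGT]") + ")")
    return [m.start() for m in pattern.finditer(seq)]

# Invented toy sequence: one 13-mer, one DnaA box, and one DnaA box
# sitting on the opposite strand (so it appears here reverse-complemented).
toy = "CC" + "GATCTATTCTTTT" + "GG" + "TTATCCACA" + "GG" + "TGTGGATAA" + "CC"

print(find_motif(toy, DUE_13MER))                     # [2]
print(find_motif(toy, DNAA_BOX))                      # [17]
print(find_motif(toy, reverse_complement(DNAA_BOX)))  # [28]
```

Searching for the reverse complement matters because a motif can sit on either strand of the double helix.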


E. coli’s DnaA boxes are landing spots for an initial replication protein, DnaA, that kicks off the entire replication process. As more DnaA boxes have DnaA proteins attached to them, the DNA at the origin bends and coils. This puts mechanical stress on the AT-rich DUE sequences and opens up the double helix so that the replication machinery can access the DNA. The DUE sequences open easily because they are rich in A and T nucleotides, and an A-T base pair is held together by two hydrogen bonds whereas a G-C pair is held together by three.
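As a crude illustration of why AT-rich DNA opens more easily, the sketch below counts base-pairing hydrogen bonds (two per A-T pair, three per G-C pair). It is a deliberately simplified measure, not a model of DNA melting. The example 13-mer is the DUE pattern with "n" arbitrarily filled in as A, and the GC-rich comparison sequence is invented.

```python
# Hydrogen bonds contributed by the base pair each nucleotide forms.
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}

def hydrogen_bonds(seq):
    """Total base-pairing hydrogen bonds across a double-stranded stretch."""
    return sum(H_BONDS[base] for base in seq.upper())

def at_fraction(seq):
    """Fraction of positions that are A or T."""
    seq = seq.upper()
    return (seq.count("A") + seq.count("T")) / len(seq)

due_example = "GATCTATTATTTT"  # DUE pattern with each n set to A (assumption)
gc_example = "GCGCGCGCGCGCG"   # invented GC-only stretch of the same length

print(hydrogen_bonds(due_example), f"{at_fraction(due_example):.2f}")  # 28 0.85
print(hydrogen_bonds(gc_example), f"{at_fraction(gc_example):.2f}")    # 39 0.00
```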

 

Yeast replication origins

How do the replication origins of the single-celled yeast S. cerevisiae compare with those of E. coli? First, realize that the yeast genome is larger than E. coli’s, although it is still about 250 times smaller than the human genome, and it takes the form of 16 linear chromosomes rather than a single circular one. Also, because yeast is a eukaryote, most of the DNA is wrapped around histone proteins to form nucleosomes; in other words, it’s in the form of chromatin.

 

The first difference between their replication origins is that instead of a single replication origin, S. cerevisiae has about 400. Each is 100-200 bases long and, like E. coli’s, includes several copies of a consensus sequence: here an 11-bp DNA stretch called an ARC (more widely known as the ARS consensus sequence, or ACS) that reads 5’-WTTTAYRTTTW-3’. The ambiguous bases (W, Y, and R) are defined as follows: W = A or T, Y = C or T, and R = A or G. So the yeast origin sequence isn’t as rigidly defined as that of E. coli. Yeast origins have other sequences, too, called B regions, that are also AT-rich.
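One way to see how loosely the yeast consensus is defined is to expand it into a pattern and count how many concrete sequences satisfy it. This sketch uses the standard IUPAC nucleotide codes (W = A or T, Y = C or T, R = A or G) and the 11-base form of the consensus, WTTTAYRTTTW. The function names are invented for illustration.

```python
import re

# Standard IUPAC nucleotide codes needed for this consensus.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "[AT]", "Y": "[CT]", "R": "[AG]", "N": "[ACGT]",
}

CONSENSUS = "WTTTAYRTTTW"  # 11-base yeast origin consensus

def consensus_to_regex(consensus):
    """Translate an IUPAC consensus string into a regular expression."""
    return "".join(IUPAC[code] for code in consensus)

def count_variants(consensus):
    """Number of distinct concrete sequences that satisfy the consensus."""
    total = 1
    for code in consensus:
        total *= len(IUPAC[code].strip("[]"))
    return total

def matches(seq, consensus=CONSENSUS):
    """True if seq is exactly one instance of the consensus."""
    return re.fullmatch(consensus_to_regex(consensus), seq) is not None

print(consensus_to_regex(CONSENSUS))  # [AT]TTTA[CT][AG]TTT[AT]
print(count_variants(CONSENSUS))      # 16
print(matches("ATTTATGTTTA"))         # True
print(matches("ATTTATCTTTA"))         # False: C is not a valid R
```

Sixteen different 11-mers all count as a match here, whereas the E. coli DnaA box as written above is a single fixed 9-mer.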

 

Like E. coli’s DnaA box, S. cerevisiae’s ARC sequence is the attachment point for its initial replication protein, yeast ORC. I’ve specified yeast ORC because the replication initiation protein in humans is also called ORC.


Being a eukaryote, yeast has a replication challenge that prokaryotes don’t: its genome is in the form of chromatin; that is, the DNA is wrapped around histone proteins to form nucleosomes. Nucleosomes, as you might guess, hinder proteins' access to DNA. Because of this, yeast origins are typically found in stretches of the genome where the cell has proactively removed the histone proteins to expose the double helix. These are called nucleosome-free regions (NFRs).

 

Not all 400 yeast origins fire at the same time. Some are used earlier in the yeast's S-phase and some later. Firing order is influenced by characteristics of the replication origin. Most important are chromatin status and specific histone modifications. Origins located in loosely packaged euchromatin fire early in S-phase whereas those in tight heterochromatin fire later. We’ll see that chromatin status and specific histone modifications also define human replication origins.

 

Why so many eukaryotic replication origins?

It would be reasonable to ask why S. cerevisiae has so many more replication origins than E. coli. Minimally, the yeast would need 16 origins, corresponding to its 16 chromosomes. But it uses 400.

 

Comparing their respective replication challenges won’t answer the question. E. coli copies its genome in about 20 minutes, the time it takes for a bacterium to grow and divide. S. cerevisiae copies its genome, which is about 2.5 times larger than E. coli’s, in about 40 minutes during its S-phase. Yeast has a larger genome to copy (2.5 times larger), but it also has more time to copy it (about twice as long). So these more or less cancel out.

 

The reason a yeast cell needs so many more origins than a bacterial cell is explained by the speed at which their respective replisomes move. E. coli synthesizes DNA at about 1,000 bases per second. Pause and really think about that! S. cerevisiae and other eukaryotes synthesize at about 50 bases per second: still fast, but 20 times slower. [3, p.275]

 

Eukaryotes’ replisomes move more slowly because, again, their DNA is in the form of chromatin. That means that the histone protein spools must be removed one by one from the DNA as the replisome progresses. Then the histones have to be replaced once the replisome completes its job. To add further complication, the epigenetic markings on the original histones and the original methylation pattern on the DNA must be recreated on the newly replicated chromatin. So, the reason S. cerevisiae has so many more origins is that eukaryotic replisomes move much more slowly.
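The link between replisome speed and origin count can be put as a rough lower bound. Each origin launches two forks, so copying a genome of G bases in time T at fork speed v needs at least G / (2 × v × T) origins. The sketch below applies this to yeast and to humans using the speeds and times quoted in this post. The genome sizes (about 12 million bases for yeast, about 3.1 billion for a single copy of the human genome) are rounded assumptions, and the bound pretends that all origins fire at once and are evenly spaced.

```python
import math

def min_origins(genome_bp, fork_speed_nt_per_s, seconds_available):
    """Lower bound on origins needed: each origin sends out two forks."""
    bases_per_origin = 2 * fork_speed_nt_per_s * seconds_available
    return math.ceil(genome_bp / bases_per_origin)

# Genome sizes are rounded, assumed values; speeds and times are from the text.
yeast = min_origins(12_000_000, 50, 40 * 60)           # 40-minute S-phase
human = min_origins(3_100_000_000, 50, 8 * 60 * 60)    # 8 hours

print(yeast)  # 50
print(human)  # 1077
```

Real cells use far more origins than this minimum, in part because origins do not all fire at the same moment and are not evenly spaced along the chromosomes.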

 

Human replication origins

Human replication origins amplify even more the trends we saw as we moved from bacteria to yeast. For starters, there’s the number of human replication origins.


While E. coli has one and S. cerevisiae has 400, the human genome has 50,000-100,000 “potential” replication origins. Only 30-50% of these will fire in S-phase. So ultimately about 15,000 to 50,000 replication origins will be used.
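The arithmetic behind that range, plus what it implies for the average distance between active origins, can be sketched as follows. The origin counts and firing percentages come from the text. The genome size of about 3.1 billion bases is a rounded assumption, and real origins are not evenly spaced, so the spacing figures are only averages.

```python
POTENTIAL_ORIGINS = (50_000, 100_000)  # low and high estimates from the text
FIRING_PERCENT = (30, 50)              # low and high estimates from the text
ASSUMED_GENOME_BP = 3_100_000_000      # rounded human genome size (assumption)

def origins_used(potential, percent):
    """Number of origins that fire, using integer arithmetic."""
    return potential * percent // 100

low = origins_used(POTENTIAL_ORIGINS[0], FIRING_PERCENT[0])
high = origins_used(POTENTIAL_ORIGINS[1], FIRING_PERCENT[1])
print(low, high)  # 15000 50000

for fired in (low, high):
    spacing = ASSUMED_GENOME_BP // fired
    print(f"{fired} origins: one every {spacing:,} bases on average")
```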


The rest are referred to as "dormant." The cell uses dormant origins as back-ups if a replisome that started from an earlier-firing replication origin runs into a problem, for example if it stalls or collapses at a DNA lesion.

 

Another possible reason for so many potential replication origins relates to the transcription profiles of different cell types. Different cell types call different genes into action. As we'll soon see, human replication origins tend to be near actively transcribed genes. Thus, it’s likely that the large excess of potential origins also provides human cells flexibility with regard to these different gene expression patterns.


Human replication origins are not only extremely numerous but also very large: not the 100-200 or so nucleotides of bacterial and yeast origins, but 5,000-50,000 nucleotides long! That’s about 25-500 times larger than bacterial or yeast origins.

 

Finally, while E. coli had a clear consensus sequence and yeast a less well-conserved one, human replication origins have no consensus sequence at all. Given that consensus sequences defined E. coli and S. cerevisiae replication origins, how, then, can human replication origins be identified?

 

It isn’t easy. But they have typical characteristics. I’ll group them into three categories. Don’t think of these as completely isolated categories, though. There is significant overlap between them. They are: (1) DNA accessibility, (2) proximity to actively expressed genes and especially gene regulatory regions, and (3) specific chemical modifications to histone proteins, so-called epigenetic modifications.

 

So, first, DNA found at human replication origins, especially origins that fire early in S phase, tends to be physically accessible. This manifests in two ways. The first relates to chromatin status. Like yeast origins, human replication origins tend to be found in loosely packaged euchromatin rather than tightly packed heterochromatin. Second, human replication origins are often found at the edges of nuclear structures called TADs (Topologically Associating Domains).

 

What is a TAD? Genomic DNA isn’t dispersed randomly in the nucleus. Rather, sequences close to each other on the linear DNA molecule tend to be clumped together. This makes it easier for DNA stretches located near each other, and harder for stretches far from each other, to physically interact. Human replication origins, especially those that fire early in S phase, tend to be located near the edges of TADs, where they are more accessible to the replication machinery. It’s also the case that genomic DNA at the edges of TADs tends to be open euchromatin.

 

The second characteristic of human replication origins is their tendency to be located near genes that are being actively expressed, especially near the regulatory elements of those genes. The DNA around such genes tends to be in the form of open euchromatin and NFRs. So it could be that actively expressed genes simply create the kind of open chromatin environment that makes for viable human replication origins. The primary requirement for human replication origins might just be DNA accessibility rather than specific sequences.

 

There are two other reasons why human replication origins might tend to be in regions of actively expressed genes.


The first is to minimize replication-transcription conflicts. In both DNA replication and transcription (gene expression), complex protein machinery opens the double helix and progresses along the DNA, copying it as it goes: into DNA in the case of replication and into RNA in the case of transcription. Because both machineries travel along the same DNA, the replication machinery has the potential to literally crash into the transcription machinery. Replicating actively transcribed regions early and quickly might minimize the frequency of such conflicts.


The second reason why the cell might want to replicate active genes early is to ensure that essential genes, especially those involved in cell cycle control, DNA repair, and metabolism, are quickly duplicated and available for transcription during most of S phase.

 

Third, and finally, histone modifications play a key role in defining human replication origins. The most important is called H4K20me2. It functions as the human ORC binding site in much the same way that E. coli’s DnaA box and S. cerevisiae’s ARC function as attachment sites for DnaA and yeast ORC, respectively. “H4K20me2” is interpreted as follows: “H4”: the modification is on histone H4, one of the four core histone types (H2A, H2B, H3, and H4); “K20”: the modification is on the 20th amino acid of histone H4, which is the amino acid lysine (abbreviated “K”); “me2”: the modification consists of two methyl groups (a methyl group is written “-CH3” in chemical symbols).


Human replication origins have other characteristic histone modifications. For example, the modification H3K4me3 is common to both replication origins and gene promoter sites. And the modifications H3K9ac and H3K27ac, which are also common at human origins, are associated with open euchromatin.
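The naming scheme for histone modifications is regular enough to parse mechanically. The sketch below breaks a name such as H4K20me2 into its parts. The parser, its field names, and the small lookup tables are invented for illustration and cover only the kinds of marks mentioned in this chapter (plus a few common amino acid codes). A mark written without a trailing number, such as H3K9ac, is treated here as carrying one group.

```python
import re
from typing import NamedTuple

AMINO_ACIDS = {"K": "lysine", "R": "arginine", "S": "serine", "T": "threonine"}
MODIFICATIONS = {"me": "methyl", "ac": "acetyl"}

class HistoneMark(NamedTuple):
    histone: str       # e.g. "H4"
    residue: str       # amino acid name, e.g. "lysine"
    position: int      # position of the amino acid in the histone
    modification: str  # chemical group added
    count: int         # how many copies of the group

MARK_PATTERN = re.compile(r"(H2A|H2B|H3|H4)([A-Z])(\d+)(me|ac)([123]?)")

def parse_mark(name):
    """Split a histone mark name such as 'H4K20me2' into its components."""
    match = MARK_PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"unrecognized histone mark: {name!r}")
    histone, letter, position, group, count = match.groups()
    return HistoneMark(
        histone=histone,
        residue=AMINO_ACIDS.get(letter, letter),
        position=int(position),
        modification=MODIFICATIONS[group],
        count=int(count) if count else 1,
    )

for mark in ("H4K20me2", "H3K4me3", "H3K9ac", "H3K27ac"):
    print(mark, "->", parse_mark(mark))
```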

 

Regarding the relationship between histone modification and open euchromatin, some histone modifications have a direct effect on DNA accessibility and others have an indirect effect. In the former case, the physical presence of the modification causes the double helix to open more easily. In the latter scenario, a histone modification flags other proteins called chromatin remodelers to remove and/or move nucleosomes to make DNA more accessible. In both cases, histone modifications lead to open chromatin, which is a near requirement for human replication origins.

 

Given their lack of a consensus sequence, human replication origins are nebulous things. I haven’t so much defined them in this chapter as described typical features, most of which are not absolutely required. In fact, most scientists believe that the replication origins of humans and other higher eukaryotes might be defined less by specific DNA sequences and more by DNA accessibility.

 

The next chapter covers the events that transpire immediately after human ORC binds to an H4K20me2 histone modification at a physically accessible replication origin.

 

 


L. Scott Cole

Berkeley, CA
