Transcription & RNA Processing

Lesson 4 of 12

Overview: From DNA to RNA

Transcription is the process by which the information in a DNA sequence is copied into a complementary RNA molecule. In eukaryotes, transcription occurs in the nucleus and the resulting RNA undergoes extensive processing before export to the cytoplasm. Understanding the precision of this process (the controlled selection of which genes are expressed, when, and in which cells) is foundational to understanding development, differentiation, and disease.

RNA Polymerases

Prokaryotes have a single RNA polymerase (core enzyme: α₂ββ'ω; the holoenzyme adds a σ factor for promoter recognition). Eukaryotes have three:

  • RNA polymerase I (Pol I): transcribes the 28S, 18S, and 5.8S ribosomal RNA (rRNA) genes in the nucleolus
  • RNA polymerase II (Pol II): transcribes protein-coding genes (→ mRNA precursors) and most non-coding RNAs (miRNA, lncRNA); most intensively studied; subject of many cancer mutations and drug targets
  • RNA polymerase III (Pol III): transcribes tRNA, 5S rRNA, and some other small RNAs, including U6 small nuclear RNA (snRNA)

RNA polymerase does not require a primer. Synthesis is always 5'→3', reading the template (non-coding) strand 3'→5'.
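The strand relationships above can be checked with a minimal Python sketch. The function names and the nine-base template are hypothetical, chosen only for illustration. The template is written 3'→5', so the RNA produced reads 5'→3' and matches the coding strand with U in place of T.

```python
# Pairing rules between a DNA template base and the RNA base added opposite it.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}
# Pairing rules between the two DNA strands.
DNA_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3to5: str) -> str:
    """Return the RNA (written 5'->3') copied from a template written 3'->5'."""
    # The polymerase reads the template 3'->5' and adds one complementary
    # ribonucleotide per template base, so the RNA grows 5'->3'.
    return "".join(DNA_TO_RNA[base] for base in template_3to5.upper())


def coding_strand(template_3to5: str) -> str:
    """Return the coding (non-template) DNA strand, written 5'->3'."""
    return "".join(DNA_TO_DNA[base] for base in template_3to5.upper())


template = "TACGGCATT"            # hypothetical template strand, 3'->5'
rna = transcribe(template)         # "AUGCCGUAA"
coding = coding_strand(template)   # "ATGCCGTAA"
```

The RNA has the same sequence as the coding strand apart from U replacing T, which is why gene sequences are conventionally reported as the coding strand.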

Promoters and Transcription Initiation

The promoter is the DNA region where the transcription machinery assembles to initiate transcription. For Pol II:

Core promoter elements (within ~35 bp of the transcription start site, TSS):

  • TATA box (~−30): consensus TATAAAA; bound by TBP (TATA-binding protein) as part of TFIID; positions the TSS
  • Initiator element (Inr): spans the TSS; contributes to TSS selection
  • Downstream promoter element (DPE): ~+30 from TSS; recognised by TAFII components
  • Many genes lack a TATA box and use CpG island promoters instead
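As a rough illustration of how a TATA box might be located relative to a known TSS, the sketch below scans the coding strand upstream of the TSS. The function name, the search window of −40 to −20, and the relaxed pattern TATA[AT]A[AT] are assumptions made for this example, not part of the notes, and real promoter prediction is considerably more involved.

```python
import re

# Relaxed TATA-like pattern (an assumption); the notes give TATAAAA as the consensus.
TATA_PATTERN = re.compile(r"TATA[AT]A[AT]")


def find_tata_boxes(coding_strand: str, tss_index: int, window=(-40, -20)):
    """Return start positions, relative to the TSS, of TATA-like motifs.

    tss_index is the 0-based index of the TSS (+1) in coding_strand; the base
    immediately upstream of it is position -1.
    """
    seq = coding_strand.upper()
    hits = []
    for rel in range(window[0], window[1] + 1):
        i = tss_index + rel
        if i < 0:
            continue  # window extends past the start of the sequence
        if TATA_PATTERN.match(seq, i):
            hits.append(rel)
    return hits


# Hypothetical promoter: TATA box starting 30 bp upstream of a TSS at index 50.
tata_promoter = "G" * 20 + "TATAAAA" + "G" * 23 + "ACCCCCCCCC"
# Hypothetical TATA-less, GC-rich promoter.
tataless_promoter = "GC" * 30

tata_hits = find_tata_boxes(tata_promoter, tss_index=50)          # [-30]
tataless_hits = find_tata_boxes(tataless_promoter, tss_index=50)  # []
```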

Transcription factor assembly: general transcription factors (GTFs: TFIID, TFIIA, TFIIB, TFIIF, TFIIE, TFIIH) assemble at the core promoter in a step-wise manner. TFIIH has both helicase activity (unwinds DNA at TSS) and kinase activity (phosphorylates the C-terminal domain [CTD] of Pol II large subunit to convert it from an initiating to an elongating polymerase).

Enhancers and silencers: regulatory elements that may be hundreds of kilobases from the gene they regulate. Enhancers stimulate transcription by binding activator proteins; silencers repress transcription by binding repressors. They act by looping the chromatin to bring the bound factor close to the promoter (the enhancer-promoter loop model). Mediator is a large co-activator complex that physically bridges enhancer-bound activators to the Pol II preinitiation complex.

Elongation and Termination

After promoter clearance, Pol II enters elongation mode: the CTD becomes hyperphosphorylated on Ser2 (in addition to the Ser5 phosphorylation that supports capping), which recruits mRNA processing factors. Elongation rate: ~2,000 nt/min. Pol II frequently pauses, most prominently just downstream of the promoter; this "paused Pol II" can be released by P-TEFb (CDK9) in response to signals, which makes pause release a key regulatory checkpoint.
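To get a feel for what ~2,000 nt/min means in practice, the short calculation below estimates how long Pol II would need to traverse genes of different lengths. The gene lengths are hypothetical round numbers, and the estimate ignores pausing and rate variation.

```python
ELONGATION_RATE_NT_PER_MIN = 2_000  # approximate rate quoted in these notes


def transcription_time_minutes(gene_length_nt: int,
                               rate: float = ELONGATION_RATE_NT_PER_MIN) -> float:
    """Minutes needed to transcribe a gene at a constant elongation rate."""
    return gene_length_nt / rate


# Hypothetical gene lengths: short, typical, and very long.
times = {length: transcription_time_minutes(length)
         for length in (10_000, 100_000, 2_000_000)}

for length, minutes in times.items():
    print(f"{length:>9,} nt: {minutes:7.1f} min ({minutes / 60:.1f} h)")
```

At this rate a 10 kb gene takes about 5 minutes, whereas a 2 Mb gene takes roughly 1,000 minutes (over 16 hours), so intron-rich genes can take many hours to transcribe.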

Termination in eukaryotes is coupled to cleavage and polyadenylation of the 3' end of the transcript. Pol II transcribes past the poly-A signal (AAUAAA) and a downstream cleavage site; the cleavage/polyadenylation specificity factor (CPSF) and cleavage stimulation factor (CstF) recognise these signals, cleave the RNA, and trigger termination.

5' Capping

Capping is one of the first RNA processing events: it occurs co-transcriptionally, as soon as Pol II has transcribed ~20–30 nt. The 7-methylguanosine (m7G) cap is added to the 5' end in a three-step reaction:

  1. RNA 5'-triphosphatase removes the γ-phosphate
  2. Guanylyl transferase adds a GMP in an unusual 5'-5' triphosphate linkage
  3. Guanine-N7 methyltransferase methylates the G at position 7
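The phosphate bookkeeping in these three steps can be followed with a small symbolic sketch. The class and function names are hypothetical, and the shorthand (pppN, ppN, GpppN, m7GpppN, where N is the first transcribed nucleotide) is a simplified notation rather than a chemical model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FivePrimeEnd:
    phosphates: int               # phosphates between the cap (if any) and nucleotide N
    cap_guanosine: bool = False   # True once the G has been attached
    n7_methylated: bool = False   # True once the G is methylated at N7

    def notation(self) -> str:
        cap = ""
        if self.cap_guanosine:
            cap = "m7G" if self.n7_methylated else "G"
        return f"{cap}{'p' * self.phosphates}N"


def remove_gamma_phosphate(end: FivePrimeEnd) -> FivePrimeEnd:
    # Step 1: the triphosphate end loses its outermost (gamma) phosphate.
    return FivePrimeEnd(end.phosphates - 1)


def add_gmp(end: FivePrimeEnd) -> FivePrimeEnd:
    # Step 2: GMP contributes one phosphate, so diphosphate + GMP gives
    # the 5'-5' triphosphate bridge.
    return FivePrimeEnd(end.phosphates + 1, cap_guanosine=True)


def methylate_n7(end: FivePrimeEnd) -> FivePrimeEnd:
    # Step 3: methylation of the cap guanine at N7.
    return FivePrimeEnd(end.phosphates, cap_guanosine=True, n7_methylated=True)


states = [FivePrimeEnd(phosphates=3)]  # nascent transcript: pppN
for reaction in (remove_gamma_phosphate, add_gmp, methylate_n7):
    states.append(reaction(states[-1]))

notations = [state.notation() for state in states]
# ['pppN', 'ppN', 'GpppN', 'm7GpppN']
```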

Functions of the 5' cap:

  • Protection from 5'→3' exonuclease degradation
  • Translation initiation: recognised by eIF4E (the cap-binding translation initiation factor)
  • Nuclear export: recognised by CBC (cap-binding complex) for export via NXF1
  • Splicing: the cap promotes recognition of the first exon

3' Poly-A Tail

After cleavage at the poly-A site (~10–30 nt downstream of AAUAAA), poly(A) polymerase (PAP) adds ~200 A residues (in humans) to the 3' end without a template, stimulated by CPSF.

Functions of the poly-A tail:

  • mRNA stability: PABP (poly-A binding protein) bound to the tail protects against 3'→5' deadenylation-dependent decay
  • Translation: PABP interacts with eIF4G, stimulating translation initiation by circularising the mRNA (closed-loop model)
  • Nuclear export: required for export competence
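The cleavage and polyadenylation step described in this section can be sketched as simple string operations. The function names and the toy transcript are hypothetical. The sketch assumes the 10–30 nt offset is measured from the end of the AAUAAA signal, and it picks the midpoint of that window as the cleavage site, which is an arbitrary choice for illustration.

```python
def find_polya_signals(rna: str, signal: str = "AAUAAA") -> list:
    """Return the 0-based start index of every occurrence of the signal."""
    positions = []
    i = rna.find(signal)
    while i != -1:
        positions.append(i)
        i = rna.find(signal, i + 1)
    return positions


def cleavage_window(rna: str, signal_start: int, signal: str = "AAUAAA",
                    min_offset: int = 10, max_offset: int = 30) -> tuple:
    """Index range (start, end) where cleavage is expected, clipped to the RNA."""
    signal_end = signal_start + len(signal)
    return (min(signal_end + min_offset, len(rna)),
            min(signal_end + max_offset, len(rna)))


def cleave_and_polyadenylate(rna: str, cleavage_index: int,
                             tail_length: int = 200) -> str:
    """Discard everything downstream of the cleavage site and add a poly-A tail."""
    return rna[:cleavage_index] + "A" * tail_length


# Toy transcript: 40 nt of upstream sequence, the signal, then 50 nt downstream.
pre_mrna = "G" * 40 + "AAUAAA" + "C" * 50
signal_start = find_polya_signals(pre_mrna)[0]     # 40
window = cleavage_window(pre_mrna, signal_start)   # (56, 76)
cut = (window[0] + window[1]) // 2                 # 66, midpoint chosen arbitrarily
mature_3_end = cleave_and_polyadenylate(pre_mrna, cut)
```

The tail is appended without reference to any template, mirroring the template-independent activity of PAP.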

Pre-mRNA Splicing

Protein-coding genes in eukaryotes are split genes: coding sequences (exons) are interrupted by non-coding sequences (introns). The primary transcript (pre-mRNA) contains both; introns must be precisely removed and exons joined in a process called splicing, carried out by the spliceosome.

Splicing signals:

  • 5' splice site (donor): conserved GU at the intron start (GT in the DNA, GU in the pre-mRNA)
  • 3' splice site (acceptor): AG at the intron end, preceded by a polypyrimidine tract
  • Branch point: an A residue ~20–50 nt upstream of the 3' splice site; the 2'-OH of this A attacks the 5' splice site in the first step

Mechanism (two trans-esterification reactions):

  1. Nucleophilic attack of branch point 2'-OH on the 5' splice site → lariat intermediate; free 5' exon
  2. Attack of the free 3'-OH of the 5' exon on the 3' splice site → exons joined; intron lariat released and debranched/degraded
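The net result of the two reactions, intron removal and exon joining, can be mimicked at the sequence level. The sketch below is a simplification under stated assumptions: the function name and the toy pre-mRNA are hypothetical, intron coordinates are supplied rather than predicted, and only the GU...AG boundary rule is checked (the branch point and polypyrimidine tract are not modelled).

```python
def splice(pre_mrna: str, introns: list) -> str:
    """Remove introns given as 0-based, half-open (start, end) coordinates.

    Raises ValueError if an intron does not begin with GU and end with AG.
    """
    exons = []
    cursor = 0
    for start, end in sorted(introns):
        intron = pre_mrna[start:end]
        if not (intron.startswith("GU") and intron.endswith("AG")):
            raise ValueError(f"intron {start}-{end} lacks GU...AG boundaries")
        exons.append(pre_mrna[cursor:start])  # exon upstream of this intron
        cursor = end
    exons.append(pre_mrna[cursor:])           # final exon
    return "".join(exons)


# Toy pre-mRNA: exon 1, an intron (GU ... branch A ... pyrimidine tract ... AG), exon 2.
exon1 = "AUGGCC"
intron = "GUAAGUCUCUAACUUUUCCCUUCAG"
exon2 = "GAUUAA"
pre_mrna = exon1 + intron + exon2

intron_coords = [(len(exon1), len(exon1) + len(intron))]
mature = splice(pre_mrna, intron_coords)  # "AUGGCCGAUUAA"
```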

The spliceosome is assembled from five small nuclear RNAs (snRNAs: U1, U2, U4, U5, U6) and their associated proteins, which together form snRNPs ("snurps"). Assembly is ordered: U1 recognises the 5' splice site; U2AF (U2 auxiliary factor) helps position U2 at the branch point; the U4/U6.U5 tri-snRNP then joins, and subsequent rearrangements produce the active spliceosome.

Alternative Splicing

Alternative splicing allows a single gene to produce multiple mRNA isoforms by inclusion or exclusion of specific exons or by using alternative 5' or 3' splice sites. Mechanisms:

  • Exon skipping: an exon is excluded from the mRNA
  • Alternative 5' splice site: different donors for the same exon
  • Alternative 3' splice site: different acceptors
  • Intron retention: an intron remains in the mature mRNA (can trigger nonsense-mediated decay)
  • Mutually exclusive exons: only one of two adjacent exons is included
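To see how quickly exon skipping alone multiplies the number of possible transcripts, the sketch below enumerates every isoform of a hypothetical four-exon gene in which two internal exons are optional (cassette) exons. The function and exon names are invented for illustration.

```python
from itertools import product


def exon_skipping_isoforms(exons: list, cassette: set) -> list:
    """List every isoform obtainable by including or skipping each cassette exon."""
    # Constitutive exons have one option (included); cassette exons have two.
    choices = [(name, None) if name in cassette else (name,) for name in exons]
    return [tuple(exon for exon in combo if exon is not None)
            for combo in product(*choices)]


isoforms = exon_skipping_isoforms(["E1", "E2", "E3", "E4"], cassette={"E2", "E3"})
for isoform in isoforms:
    print("-".join(isoform))
# E1-E2-E3-E4
# E1-E2-E4
# E1-E3-E4
# E1-E4
```

With k independent cassette exons the count is 2^k, so ten such exons would already allow 1,024 isoforms in principle.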

~95% of human multi-exon genes undergo alternative splicing. A single gene can produce hundreds or even thousands of isoforms (e.g., Dscam in Drosophila: ~38,000 isoforms; NRXN1 in humans: >1,000 isoforms, relevant to neurexin biology in autism). Clinically, aberrant splicing caused by mutations is estimated to be responsible for ~10–15% of all genetic diseases (e.g., BRCA1/2 splicing mutations; spinal muscular atrophy, in which SMN2 exon 7 skipping is targeted therapeutically by nusinersen).
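The Dscam figure follows from simple multiplication. The gene contains four clusters of mutually exclusive exons, and one variant from each cluster is included in any given mRNA. The cluster sizes below are the commonly reported values and are not stated in these notes, so treat them as an outside assumption.

```python
from math import prod

# Commonly reported numbers of alternative variants per mutually exclusive
# exon cluster in Drosophila Dscam (assumed here, not taken from the notes).
DSCAM_EXON_CLUSTERS = {"exon 4": 12, "exon 6": 48, "exon 9": 33, "exon 17": 2}

# One variant is chosen per cluster, so the choices multiply.
dscam_isoforms = prod(DSCAM_EXON_CLUSTERS.values())
print(dscam_isoforms)  # 38016
```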

Non-coding RNAs

The genome encodes far more RNA than mRNA. Key non-coding RNA (ncRNA) classes:

microRNA (miRNA): ~22 nt; processed from hairpin precursors by two cleavage steps, then used to silence target mRNAs:

  1. Drosha (nuclear): cleaves primary miRNA (pri-miRNA) to ~70 nt pre-miRNA hairpin
  2. Dicer (cytoplasmic): cleaves pre-miRNA to ~22 nt miRNA duplex; one strand (guide) is loaded into the RISC complex (RNA-induced silencing complex) with Argonaute (AGO2)
  3. The miRNA guides RISC to partially complementary target mRNAs (typically in the 3' UTR) → translational repression or mRNA destabilisation
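Target recognition in the last step can be approximated by searching a 3' UTR for sites complementary to the miRNA "seed". The seed rule used here (nucleotides 2–8 of the guide strand) is a widely used simplification that is not covered in these notes. The function names and the UTR fragment are hypothetical, and the guide sequence is the commonly listed let-7a sequence, used for illustration only.

```python
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(rna))


def seed_sites(mirna: str, utr: str) -> list:
    """Return 0-based start positions in the UTR that pair with the miRNA seed."""
    seed = mirna[1:8]                # nucleotides 2-8 of the guide strand
    site = reverse_complement(seed)  # sequence the seed would base-pair with
    positions = []
    i = utr.find(site)
    while i != -1:
        positions.append(i)
        i = utr.find(site, i + 1)
    return positions


let7a = "UGAGGUAGUAGGUUGUAUAGUU"   # guide strand, 5'->3'
utr_fragment = "AAGCCUACCUCAUUUGG"  # hypothetical 3' UTR fragment, 5'->3'

seed = let7a[1:8]                       # "GAGGUAG"
sites = seed_sites(let7a, utr_fragment)  # [4]
```

Real target prediction also weighs site context, conservation, and pairing beyond the seed, so a seed match is a starting point rather than proof of regulation.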

miRNAs regulate the majority of human protein-coding genes. Many miRNAs function as tumour suppressors (let-7 family) or oncogenes (oncomiRs: miR-21, miR-155). OncomiRs can be therapeutic targets; tumour suppressor miRNA mimics are in clinical development.

Long non-coding RNAs (lncRNAs): >200 nt; extremely diverse functions:

  • Chromatin remodelling: XIST recruits Polycomb repressive complex to silence the inactive X chromosome
  • Transcriptional regulation: HOTAIR acts as scaffold for chromatin-modifying complexes
  • Post-transcriptional regulation: competing endogenous RNAs (ceRNAs) act as sponges for miRNAs
  • Phase separation: many lncRNAs scaffold transcriptional condensates

lncRNA dysregulation is common in cancer; for example, HOTAIR overexpression correlates with breast cancer metastasis.
