Transcription & RNA Processing
Lesson 4 of 12
Overview: From DNA to RNA
Transcription is the process by which the information in a DNA sequence is copied into a complementary RNA molecule. In eukaryotes, transcription occurs in the nucleus and the resulting RNA undergoes extensive processing before export to the cytoplasm. Understanding the precision of this process (the controlled selection of which genes are expressed, when, and in which cells) is foundational to understanding development, differentiation, and disease.
RNA Polymerases
Prokaryotes have a single RNA polymerase (core enzyme: α₂ββ'ω; the holoenzyme adds a σ factor for promoter recognition). Eukaryotes have three:
- RNA polymerase I (Pol I): transcribes the ribosomal RNA (rRNA) genes for 28S, 18S, and 5.8S rRNA (in the nucleolus)
- RNA polymerase II (Pol II): transcribes protein-coding genes (→ mRNA precursors) and most non-coding RNAs (miRNA, lncRNA, most snRNAs); the most intensively studied polymerase, affected by many cancer mutations and a frequent drug target
- RNA polymerase III (Pol III): transcribes tRNA, 5S rRNA, and some other small RNAs, including U6 snRNA
RNA polymerase does not require a primer. Synthesis is always 5'→3', reading the template (non-coding) strand 3'→5'.
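The strand relationship can be made concrete with a short Python sketch. It is only an illustration: the sequences are hypothetical, and the function names `transcribe` and `coding_to_rna` are invented for this example. Because the template is written 3'→5' and the RNA 5'→3', each template base pairs with the RNA base at the same position in the string.

```python
# Base-pairing rules: DNA template base -> RNA base added by the polymerase.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5: str) -> str:
    """Return the RNA (written 5'->3') copied from a template strand written 3'->5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5.upper())


def coding_to_rna(coding_5_to_3: str) -> str:
    """The RNA has the same sequence as the coding strand, with U in place of T."""
    return coding_5_to_3.upper().replace("T", "U")


template = "TACGGCATT"  # hypothetical template strand, written 3'->5'
coding = "ATGCCGTAA"    # its complementary coding strand, written 5'->3'

rna = transcribe(template)
print(rna)                    # AUGCCGUAA
print(coding_to_rna(coding))  # AUGCCGUAA
```

Both routes give the same RNA, which is why the coding strand is often used as shorthand for the transcript sequence.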
Promoters and Transcription Initiation
The promoter is the DNA region where the transcription machinery assembles to initiate transcription. For Pol II:
Core promoter elements (within ~35 bp of the transcription start site, TSS):
- TATA box (~−30): consensus TATAAAA; bound by TBP (TATA-binding protein) as part of TFIID; positions the TSS
- Initiator element (Inr): spans the TSS; contributes to TSS selection
- Downstream promoter element (DPE): ~+30 from TSS; recognised by TAFII components
- Many genes lack a TATA box and use CpG island promoters instead
Transcription factor assembly: general transcription factors (GTFs: TFIID, TFIIA, TFIIB, TFIIF, TFIIE, TFIIH) assemble at the core promoter in a step-wise manner. TFIIH has both helicase activity (unwinds DNA at TSS) and kinase activity (phosphorylates the C-terminal domain [CTD] of Pol II large subunit to convert it from an initiating to an elongating polymerase).
Enhancers and silencers: regulatory elements that may be hundreds of kilobases from the gene they regulate. Enhancers stimulate transcription by binding activator proteins; silencers repress transcription by binding repressors. They act by looping the chromatin to bring the bound factor close to the promoter (the enhancer-promoter loop model). Mediator is a large co-activator complex that physically bridges enhancer-bound activators to the Pol II preinitiation complex.
Elongation and Termination
After promoter clearance, Pol II enters elongation mode: the CTD is hyperphosphorylated on Ser2 (in addition to the Ser5 phosphorylation associated with capping), which recruits mRNA processing factors. The elongation rate is ~2,000 nt/min. Pol II pauses transiently, most prominently just downstream of the promoter, creating "paused Pol II" that is released by P-TEFb (CDK9) in response to signals; pause release is a key regulatory checkpoint.
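To get a feel for what ~2,000 nt/min means, the following sketch estimates how long Pol II would need to traverse genes of different lengths. The gene lengths are hypothetical round numbers, and the estimate ignores pausing, initiation, and termination.

```python
ELONGATION_RATE_NT_PER_MIN = 2_000  # approximate rate quoted in the notes


def transcription_minutes(gene_length_nt: int,
                          rate_nt_per_min: float = ELONGATION_RATE_NT_PER_MIN) -> float:
    """Minutes needed to transcribe a gene at a constant elongation rate."""
    return gene_length_nt / rate_nt_per_min


# Hypothetical gene lengths: short, typical, and very long.
for length_nt in (2_000, 100_000, 2_000_000):
    minutes = transcription_minutes(length_nt)
    print(f"{length_nt:>9,} nt: {minutes:,.0f} min ({minutes / 60:.1f} h)")
```

At this rate a 2 Mb gene would take on the order of 17 hours, so for very long genes elongation time itself becomes a significant delay.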
Termination in eukaryotes is coupled to cleavage and polyadenylation of the 3' end of the transcript. Pol II transcribes past the poly-A signal (AAUAAA) and a downstream cleavage site; the cleavage/polyadenylation specificity factor (CPSF) and cleavage stimulation factor (CstF) recognise these signals, cleave the RNA, and trigger termination.
5' Capping
Capping is one of the first RNA processing events: it occurs co-transcriptionally, as soon as Pol II has transcribed ~20–30 nt. The 7-methylguanosine (m7G) cap is added to the 5' end in a three-step reaction:
- RNA 5'-triphosphatase removes the γ-phosphate
- Guanylyl transferase adds a GMP in an unusual 5'-5' triphosphate linkage
- Guanine-N7 methyltransferase methylates the G at position 7
Functions of the 5' cap:
- Protection from 5'→3' exonuclease degradation
- Translation initiation: recognised by eIF4E (the cap-binding translation initiation factor)
- Nuclear export: recognised by CBC (cap-binding complex) for export via NXF1
- Splicing: the cap promotes recognition of the first exon
3' Poly-A Tail
After cleavage at the poly-A site (~10–30 nt downstream of AAUAAA), poly(A) polymerase (PAP) adds ~200 A residues (in humans) to the 3' end without a template, stimulated by CPSF.
Functions of the poly-A tail:
- mRNA stability: PABP (poly-A binding protein) bound to the tail protects against 3'→5' deadenylation-dependent decay
- Translation: PABP interacts with eIF4G, stimulating translation initiation by circularising the mRNA (closed-loop model)
- Nuclear export: required for export competence
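The cleavage and polyadenylation step described in this section can be sketched as simple string operations. This is a toy model under stated assumptions: the transcript is hypothetical, the function name `cleave_and_polyadenylate` is invented here, and the cut is placed a fixed 20 nt after the signal (the notes give a range of ~10–30 nt).

```python
POLY_A_SIGNAL = "AAUAAA"


def cleave_and_polyadenylate(transcript: str, offset: int = 20,
                             tail_length: int = 200) -> str:
    """Cut `offset` nt downstream of the first AAUAAA and append a poly(A) tail.

    `offset` and `tail_length` are illustrative defaults, not measured values.
    """
    signal_start = transcript.find(POLY_A_SIGNAL)
    if signal_start == -1:
        raise ValueError("no AAUAAA poly-A signal found")
    cleavage_site = signal_start + len(POLY_A_SIGNAL) + offset
    if cleavage_site > len(transcript):
        raise ValueError("transcript ends before the cleavage site")
    # Everything downstream of the cut is discarded; the tail is added without a template.
    return transcript[:cleavage_site] + "A" * tail_length


# Hypothetical 3' end of a pre-mRNA: upstream sequence, signal, spacer, downstream region.
pre_mrna = "GGCUAGC" + "AAUAAA" + "CUGUCAUGCUAGCUAGGCAU" + "GUGUUGUGUU"
mature = cleave_and_polyadenylate(pre_mrna)
print(len(pre_mrna), len(mature))  # 43 233
```

The downstream region is transcribed but never reaches the mature mRNA, which mirrors the point that Pol II reads past the poly-A signal before cleavage.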
Pre-mRNA Splicing
Protein-coding genes in eukaryotes are split genes: coding sequences (exons) are interrupted by non-coding sequences (introns). The primary transcript (pre-mRNA) contains both; introns must be precisely removed and exons joined in a process called splicing, carried out by the spliceosome.
Splicing signals:
- 5' splice site (donor): GU at the intron start (GT in the DNA, GU in the pre-mRNA; highly conserved)
- 3' splice site (acceptor): AG at the intron end, preceded by a polypyrimidine tract
- Branch point: an A residue ~20–50 nt upstream of the 3' splice site; the 2'-OH of this A attacks the 5' splice site in the first step
Mechanism (two trans-esterification reactions):
- Nucleophilic attack of branch point 2'-OH on the 5' splice site → lariat intermediate; free 5' exon
- Attack of the free 3'-OH of the 5' exon on the 3' splice site → exons joined; intron lariat released and debranched/degraded
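The two steps above can be mirrored in a small Python sketch. It is a deliberately simplified model: the intron coordinates are supplied rather than predicted, the sequences are hypothetical, and a string cannot represent the 2'-5' bond of the lariat, so the "lariat intermediate" is just the intron still attached to the 3' exon.

```python
from typing import Tuple


def splice(pre_mrna: str, intron_start: int, intron_end: int) -> Tuple[str, str]:
    """Remove pre_mrna[intron_start:intron_end]; return (joined exons, released intron)."""
    intron = pre_mrna[intron_start:intron_end]
    if not (intron.startswith("GU") and intron.endswith("AG")):
        raise ValueError("intron must start with GU and end with AG")

    # Step 1: the 5' splice site is cut, freeing the 5' exon.
    five_prime_exon = pre_mrna[:intron_start]
    lariat_intermediate = pre_mrna[intron_start:]  # intron + 3' exon

    # Step 2: the 3' splice site is cut, the exons are joined, the intron is released.
    three_prime_exon = lariat_intermediate[len(intron):]
    return five_prime_exon + three_prime_exon, intron


# Hypothetical mini-gene: two exons flanking one GU...AG intron.
exon1 = "AUGGCC"
intron = "GUAAGUACUAACUUUUCUCCCAG"
exon2 = "GAUUAA"
pre_mrna = exon1 + intron + exon2

mrna, released_intron = splice(pre_mrna, len(exon1), len(exon1) + len(intron))
print(mrna)  # AUGGCCGAUUAA
```

A single-nucleotide shift in either coordinate would change the reading frame of the joined exons, which is why splice-site selection has to be exact.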
The spliceosome is assembled from five small nuclear RNAs (snRNAs: U1, U2, U4, U5, U6) and their associated proteins, which together form snRNPs ("snurps"). Assembly is ordered: U1 recognises the 5' splice site; U2AF helps position U2 at the branch point; the U4/U6.U5 tri-snRNP then joins to complete the active spliceosome.
Alternative Splicing
Alternative splicing allows a single gene to produce multiple mRNA isoforms by inclusion or exclusion of specific exons or by using alternative 5' or 3' splice sites. Mechanisms:
- Exon skipping: an exon is excluded from the mRNA
- Alternative 5' splice site: different donors for the same exon
- Alternative 3' splice site: different acceptors
- Intron retention: an intron remains in the mature mRNA (can trigger nonsense-mediated decay)
- Mutually exclusive exons: only one of two adjacent exons is included
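Exon skipping, the first mechanism listed above, is easy to enumerate in code. The sketch below is an illustration only: exons are represented by hypothetical labels rather than sequences, the function name `exon_skipping_isoforms` is invented here, and every cassette exon is assumed to be included or skipped independently of the others.

```python
from itertools import product
from typing import List, Set


def exon_skipping_isoforms(exons: List[str], cassette: Set[int]) -> List[str]:
    """List every isoform obtained by including or skipping each cassette exon.

    `exons` are in transcript order; `cassette` holds the indices that may be skipped.
    """
    optional = sorted(cassette)
    isoforms = []
    for choices in product([True, False], repeat=len(optional)):
        included = dict(zip(optional, choices))
        # Constitutive exons (not in `cassette`) are always kept.
        isoforms.append("".join(seq for i, seq in enumerate(exons)
                                if included.get(i, True)))
    return isoforms


exons = ["E1-", "E2-", "E3-", "E4"]  # hypothetical four-exon gene
isoforms = exon_skipping_isoforms(exons, cassette={1, 2})
for isoform in isoforms:
    print(isoform)
# E1-E2-E3-E4
# E1-E2-E4
# E1-E3-E4
# E1-E4
```

With n independent cassette exons the count is 2^n, which is one reason a modest number of alternative exons can yield many isoforms.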
~95% of human multi-exon genes undergo alternative splicing. A single gene can produce hundreds of isoforms or more (e.g., Dscam in Drosophila: ~38,000 isoforms; NRXN1 in humans: >1,000 isoforms, relevant to neurexin biology in autism). Clinically, aberrant splicing caused by mutations is responsible for ~10–15% of all genetic diseases (e.g., BRCA1/2 splicing mutations; spinal muscular atrophy, where SMN2 exon 7 skipping is targeted therapeutically by nusinersen).
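The ~38,000 figure for Dscam follows from simple multiplication across clusters of mutually exclusive exons. A minimal sketch, assuming the commonly cited cluster sizes (12, 48, 33, and 2 alternatives for exons 4, 6, 9, and 17), which are not given in these notes:

```python
from math import prod

# Assumed number of mutually exclusive alternatives in each Dscam exon cluster.
DSCAM_CLUSTERS = {"exon 4": 12, "exon 6": 48, "exon 9": 33, "exon 17": 2}

# One alternative is chosen per cluster, so the counts multiply.
total_isoforms = prod(DSCAM_CLUSTERS.values())
print(total_isoforms)  # 38016
```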
Non-coding RNAs
The genome encodes far more RNA than mRNA. Key non-coding RNA (ncRNA) classes:
microRNA (miRNA): ~22 nt; processed from hairpin precursors in a two-step pathway:
- Drosha (nuclear): cleaves primary miRNA (pri-miRNA) to ~70 nt pre-miRNA hairpin
- Dicer (cytoplasmic): cleaves pre-miRNA to ~22 nt miRNA duplex; one strand (guide) is loaded into the RISC complex (RNA-induced silencing complex) with Argonaute (AGO2)
- The miRNA guides RISC to complementary target mRNAs (typically in the 3' UTR) → translational repression or mRNA destabilisation
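Target recognition in the last step can be illustrated with a simple sequence search. The sketch rests on an assumption not stated in these notes: that pairing of miRNA nucleotides 2–8 (the "seed") is a reasonable first approximation of a target site. The miRNA sequence is let-7a as commonly listed, the 3' UTR is hypothetical, and the function names are invented for this example.

```python
from typing import List

RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(rna))


def seed_sites(mirna: str, utr: str) -> List[int]:
    """Return 0-based positions in `utr` that pair perfectly with miRNA nucleotides 2-8."""
    seed = mirna[1:8]                # nucleotides 2-8, counted from the 5' end
    site = reverse_complement(seed)  # what the target mRNA must contain
    positions = []
    start = utr.find(site)
    while start != -1:
        positions.append(start)
        start = utr.find(site, start + 1)
    return positions


mirna = "UGAGGUAGUAGGUUGUAUAGUU"  # let-7a (treated here as illustrative input)
utr = "AAG" + "CUACCUC" + "AGGUUC" + "CUACCUC" + "UU"  # hypothetical 3' UTR

print(seed_sites(mirna, utr))  # [3, 16]
```

Real target prediction also weighs site context, conservation, and pairing outside the seed, so a seed match alone should be read as a candidate site, not a confirmed target.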
miRNAs regulate the majority of human protein-coding genes. Many miRNAs function as tumour suppressors (let-7 family) or oncogenes (oncomiRs: miR-21, miR-155). OncomiRs can be therapeutic targets; tumour suppressor miRNA mimics are in clinical development.
Long non-coding RNAs (lncRNAs): >200 nt; extremely diverse functions:
- Chromatin remodelling: XIST recruits Polycomb repressive complex to silence the inactive X chromosome
- Transcriptional regulation: HOTAIR acts as scaffold for chromatin-modifying complexes
- Post-transcriptional regulation: ceRNAs (competing endogenous RNAs) act as sponges for miRNAs
- Phase separation: many lncRNAs scaffold transcriptional condensates
lncRNA dysregulation is common in cancer; for example, HOTAIR overexpression correlates with breast cancer metastasis.