Gene Expression: Transcription & Translation
Gene expression is the process by which information encoded in DNA is converted into functional products: RNA and protein. It is regulated at multiple levels and is fundamental to cell identity, differentiation, and responses to environmental signals.
Transcription
Transcription is the synthesis of RNA from a DNA template, catalysed by RNA polymerase. In eukaryotes, three nuclear RNA polymerases have distinct roles. RNA Pol I transcribes the large rRNA precursor (processed into 28S, 18S, and 5.8S rRNA) in the nucleolus. RNA Pol II transcribes protein-coding genes (mRNA precursors) and many non-coding RNAs, including most snRNAs, miRNA precursors, and lncRNAs. RNA Pol III transcribes tRNAs, 5S rRNA, U6 snRNA, and other small RNAs such as 7SL RNA.
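Template-directed synthesis can be pictured as a string operation. The sketch below assumes both DNA strands are written 5'→3' and models base pairing only; the function name `transcribe` and the toy sequences are hypothetical. RNA polymerase reads the template 3'→5' and builds RNA 5'→3', so the transcript is the reverse complement of the template. It therefore matches the coding (non-template) strand, with U in place of T.

```python
# Base pairing used when RNA polymerase copies a DNA template into RNA.
DNA_TO_RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_5to3: str) -> str:
    """Return the RNA (5'->3') synthesised from a DNA template strand given 5'->3'.

    The polymerase reads the template 3'->5', so we walk the template in
    reverse and emit the complementary ribonucleotide at each position.
    """
    return "".join(DNA_TO_RNA_PAIR[base] for base in reversed(template_5to3.upper()))


coding_strand = "ATGGCCTAA"    # toy sequence, written 5'->3'
template_strand = "TTAGGCCAT"  # its reverse complement, also written 5'->3'

rna = transcribe(template_strand)
print(rna)                                     # AUGGCCUAA
print(rna == coding_strand.replace("T", "U"))  # True
```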
RNA Pol II recognises promoter elements with the help of general transcription factors (GTFs). Many core promoters contain a TATA box (~25–30 bp upstream of the transcription start site, TSS). The TATA box is bound by TBP (TATA-binding protein, a subunit of TFIID). Other core promoter elements include the initiator (Inr), which spans the TSS, the TFIIB recognition element (BRE), and the downstream promoter element (DPE). The pre-initiation complex (PIC) assembles at the promoter by sequential recruitment of TFIID, TFIIA, TFIIB, RNA Pol II/TFIIF, TFIIE, and TFIIH. TFIIH has two relevant activities. Its helicase activity unwinds DNA at the TSS. Its kinase activity phosphorylates the C-terminal domain (CTD) of RNA Pol II, which allows the polymerase to escape the promoter and begin elongation.
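The ~25 bp spacing can be made concrete by scanning a promoter for a TATA-like motif in a window upstream of a known TSS. Several details here are assumptions added for illustration: the consensus `TATAWAWR` (W = A/T, R = A/G), the −35 to −20 search window, and the function name. Real TATA boxes are degenerate, and many promoters have none.

```python
import re

# Assumed TATA consensus TATAWAWR (W = A or T, R = A or G). The lookahead
# lets overlapping candidates be reported.
TATA_MOTIF = re.compile(r"(?=TATA[AT]A[AT][AG])")


def find_tata_boxes(promoter: str, tss_index: int, window=(-35, -20)) -> list[int]:
    """Return motif start positions relative to the TSS that fall inside `window`.

    Positions upstream of the TSS are negative: -1 is the base immediately 5'
    of the TSS (there is no position 0 in this convention).
    """
    hits = []
    for match in TATA_MOTIF.finditer(promoter.upper()):
        relative = match.start() - tss_index  # only meaningful for upstream hits
        if window[0] <= relative <= window[1]:
            hits.append(relative)
    return hits


# Toy promoter: a TATAAAAG motif placed so that it starts 30 bp upstream of the TSS.
promoter = "G" * 10 + "TATAAAAG" + "C" * 22 + "AGTCAG"
print(find_tata_boxes(promoter, tss_index=40))  # [-30]
```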
Enhancers are cis-regulatory elements that can lie hundreds of kilobases from the genes they regulate. They may sit upstream of a gene, downstream of it, or within its introns. Enhancers are bound by sequence-specific activator transcription factors (e.g., SP1, CREB, NF-κB, p53). These activators contact the Mediator co-activator complex, which bridges them to the PIC. DNA looping brings the enhancer close to the promoter, stimulating transcription initiation. Silencers repress transcription. Insulators (often CTCF-binding sites) restrict enhancer activity to specific genomic domains called topologically associating domains (TADs).
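The insulator idea can be expressed as a toy model. Insulator (boundary) coordinates divide a chromosome into domains, and an enhancer may act only on promoters in the same domain, regardless of distance. Everything here is a hypothetical simplification: the coordinates, the names, and the rule that a single boundary fully blocks contact. Real TAD boundaries are leaky, and enhancer-promoter specificity depends on more than domain membership.

```python
from bisect import bisect_right


def domain_of(position: int, boundaries: list[int]) -> int:
    """Return the index of the domain containing `position`.

    `boundaries` must be sorted; a position exactly on a boundary is assigned
    to the domain on its right.
    """
    return bisect_right(boundaries, position)


def permitted_contacts(enhancers: dict[str, int], promoters: dict[str, int],
                       boundaries: list[int]) -> list[tuple[str, str]]:
    """Return sorted (enhancer, gene) pairs with no insulator boundary between them."""
    bounds = sorted(boundaries)
    return sorted(
        (enhancer, gene)
        for enhancer, e_pos in enhancers.items()
        for gene, g_pos in promoters.items()
        if domain_of(e_pos, bounds) == domain_of(g_pos, bounds)
    )


# Hypothetical coordinates (bp) on one chromosome.
boundaries = [100_000, 400_000]  # e.g. CTCF-bound insulator sites
enhancers = {"E1": 50_000, "E2": 250_000}
promoters = {"geneA": 90_000, "geneB": 150_000, "geneC": 380_000}

print(permitted_contacts(enhancers, promoters, boundaries))
# [('E1', 'geneA'), ('E2', 'geneB'), ('E2', 'geneC')]
```

In this toy example, E2 reaches geneC 130 kb away. E1 cannot reach geneB, which is only 100 kb away, because a boundary lies between them.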
mRNA Processing
Eukaryotic pre-mRNA undergoes extensive co-transcriptional and post-transcriptional processing before export and translation.
5' capping: within seconds of transcription initiation, the 5' end receives a 7-methylguanosine (m⁷G) cap joined by a 5'–5' triphosphate linkage. The cap protects mRNA from 5'→3' exonucleases and facilitates nuclear export. It is also required for cap-dependent translation initiation, in which it is recognised by eIF4E.
3' polyadenylation: the pre-mRNA is cleaved ~10–30 nt downstream of the AAUAAA polyadenylation signal. Poly(A) polymerase then adds ~200 adenine residues (the poly(A) tail). The poly(A) tail stabilises mRNA and promotes translation.
Splicing: introns are removed by the spliceosome, a large RNA-protein complex containing five snRNPs (U1, U2, U4, U5, U6). Splicing proceeds via two transesterification reactions and forms a lariat intermediate. Alternative splicing generates multiple mRNA isoforms from a single gene. The main patterns are exon skipping, alternative 5' or 3' splice sites, intron retention, and mutually exclusive exons. About 95% of multi-exon human genes undergo alternative splicing, greatly expanding proteome diversity. Splicing errors cause disease; for example, splicing mutations in the β-globin gene cause β-thalassaemia.
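Two parts of this processing can be sketched with plain strings: the cut-and-join logic of splicing, and the combinatorics of one alternative-splicing pattern (cassette-exon skipping). The sketch makes several assumptions:

- exon coordinates are already known (0-based, end-exclusive);
- exon skipping is the only alternative-splicing pattern modelled;
- the 10–30 nt cleavage window is measured from the end of the AAUAAA hexamer (a counting convention chosen for illustration).

The toy sequences and all function names are hypothetical.

```python
from itertools import combinations


def polya_cleavage_window(pre_mrna: str, signal: str = "AAUAAA"):
    """Return the (start, end) index range 10-30 nt downstream of the first signal, or None."""
    i = pre_mrna.find(signal)
    if i == -1:
        return None
    after_signal = i + len(signal)
    return (after_signal + 10, after_signal + 30)


def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Concatenate exon intervals in order; everything between them (introns) is dropped."""
    return "".join(pre_mrna[start:end] for start, end in exons)


def exon_skipping_isoforms(exons: list[tuple[int, int]]) -> list[list[tuple[int, int]]]:
    """Keep the first and last exons; include or skip each internal (cassette) exon."""
    first, *internal, last = exons
    isoforms = []
    for k in range(len(internal), -1, -1):      # most inclusive isoform first
        for kept in combinations(internal, k):  # combinations preserve exon order
            isoforms.append([first, *kept, last])
    return isoforms


# Toy gene: four exons separated by identical introns that start with GU and end with AG.
exon_seqs = ["AUGGCC", "AAAGGG", "CCCUUU", "UAA"]
intron = "GUAAGUCUAACAG"
pre_mrna = intron.join(exon_seqs)

exons, pos = [], 0
for seq in exon_seqs:
    exons.append((pos, pos + len(seq)))
    pos += len(seq) + len(intron)

for isoform in exon_skipping_isoforms(exons):
    print(splice(pre_mrna, isoform))
# AUGGCCAAAGGGCCCUUUUAA
# AUGGCCAAAGGGUAA
# AUGGCCCCCUUUUAA
# AUGGCCUAA

print(polya_cleavage_window("GGG" + "AAUAAA" + "C" * 40))  # (19, 39)
```

With n internal exons, this model yields 2ⁿ isoforms. That growth is one way to see how alternative splicing multiplies proteome diversity.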
Translation
Translation decodes the codon sequence of an mRNA into a polypeptide on the ribosome (eukaryotic 80S = 40S + 60S subunits). The genetic code comprises 64 triplet codons: 61 encode amino acids and 3 are stop codons (UAA, UAG, UGA). The code has three key properties:

- degenerate: most amino acids are specified by more than one codon;
- unambiguous: each sense codon specifies exactly one amino acid;
- nearly universal.

Aminoacyl-tRNA synthetases charge each tRNA with its cognate amino acid. There are typically 20 of these enzymes, one per amino acid. Many have a proofreading (editing) activity that removes incorrectly attached amino acids, which helps ensure fidelity.
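The standard genetic code is small enough to build as a lookup table, which makes its properties easy to check: 61 sense codons, 3 stop codons, and degeneracy such as six codons for leucine but one for methionine. The sketch below rests on several assumptions:

- it uses the standard nuclear code (variant codes, such as the vertebrate mitochondrial code, differ);
- the mRNA is already in the correct reading frame;
- amino acids are written as one-letter codes, with `*` for stop.

The 64-letter string follows the usual UCAG ordering of the first, second, and third codon positions.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard code, one letter per codon in UCAG order for positions 1, 2, 3 (* = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(mrna: str) -> str:
    """Translate an in-frame mRNA codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


stop_codons = sorted(codon for codon, aa in CODON_TABLE.items() if aa == "*")
sense_counts = Counter(aa for aa in CODON_TABLE.values() if aa != "*")

print(len(CODON_TABLE) - len(stop_codons), stop_codons)  # 61 ['UAA', 'UAG', 'UGA']
print(len(sense_counts))                                 # 20
print(sense_counts["L"], sense_counts["M"])              # 6 1
print(translate("AUGGCCAAAUGGUAAGGG"))                   # MAKW
```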
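Initiation: the 43S pre-initiation complex assembles from the 40S subunit, the eIF2-GTP-Met-tRNAᵢ ternary complex, eIF3, and other initiation factors. eIF4F (eIF4E + eIF4G + the eIF4A helicase) recruits this complex to the 5' cap. The complex then scans 5'→3' until it encounters the start codon, usually the first AUG in a favourable Kozak context (optimal: GCCRCCatgG). Start-codon recognition triggers eIF5-stimulated GTP hydrolysis by eIF2. eIF5B-GTP then drives joining of the 60S subunit to form the 80S ribosome. Afterwards, eIF2B exchanges GDP for GTP on eIF2, regenerating it for the next round of initiation.
Elongation: eEF1A-GTP delivers aminoacyl-tRNAs to the A site. The peptidyl transferase centre (a ribozyme activity of 28S rRNA) forms the peptide bond. eEF2-GTP then drives translocation, moving tRNAs from A→P and P→E.
Termination: when a stop codon enters the A site, eRF1 recognises it; eRF1 decodes all three stop codons. Together with eRF3-GTP, eRF1 catalyses release of the polypeptide. ABCE1 then drives ribosome recycling.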
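A deterministic caricature of scanning shows why context matters. Numbering the A of AUG as +1, the two positions usually treated as most important in the Kozak context are a purine (A/G) at −3 and a G at +4. The sketch below walks 5'→3' and takes the first AUG that has both features; if none does, it falls back to the first AUG of any strength. This is an assumption-laden simplification with hypothetical function names. Real ribosomes bypass weak-context AUGs only probabilistically (leaky scanning).

```python
from typing import Optional


def kozak_strength(mrna: str, i: int) -> str:
    """Classify the context of the AUG whose A is at index i (numbered +1).

    Only two positions are considered: a purine at -3 and G at +4.
    """
    purine_at_minus3 = i >= 3 and mrna[i - 3] in "AG"
    g_at_plus4 = i + 3 < len(mrna) and mrna[i + 3] == "G"
    if purine_at_minus3 and g_at_plus4:
        return "strong"
    if purine_at_minus3 or g_at_plus4:
        return "adequate"
    return "weak"


def scan_for_start(mrna: str) -> Optional[int]:
    """Scan 5'->3'; return the first AUG in strong context, else the first AUG, else None."""
    first_aug = None
    for i in range(len(mrna) - 2):
        if mrna[i:i + 3] == "AUG":
            if kozak_strength(mrna, i) == "strong":
                return i
            if first_aug is None:
                first_aug = i
    return first_aug


# Toy 5' region: a weak-context AUG, then an AUG in the GCCACCAUGG consensus.
mrna = "CCCAUGCCGCCACCAUGGCC"
print(kozak_strength(mrna, 3), kozak_strength(mrna, 14))  # weak strong
print(scan_for_start(mrna))                               # 14
```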
Post-translational Modification and Protein Fate
Proteins undergo extensive post-translational modification (PTM), which determines their activity, localisation, and half-life.
Phosphorylation: added by kinases and removed by phosphatases; regulates enzyme activity, protein-protein interactions, and localisation.
Ubiquitination: polyubiquitin chains (classically K48-linked) target proteins for proteasomal degradation, whereas monoubiquitination regulates endosomal sorting and DNA repair.
Glycosylation: N-linked glycosylation begins in the ER on Asn-X-Ser/Thr sequences, and O-linked glycosylation occurs mainly in the Golgi on Ser/Thr; glycosylation affects protein folding, stability, and cell-surface recognition.
Acetylation, methylation, and sumoylation: regulate chromatin organisation and signalling.
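N-linked glycosylation starts on an Asn-X-Ser/Thr sequon, so candidate sites can be found with a simple pattern scan. This sketch adds one rule not stated above: X cannot be proline. A sequon is necessary but not sufficient for glycosylation. Many sequons are never glycosylated in vivo, for example when they are buried or the protein never enters the ER. The function name and peptide are hypothetical.

```python
import re

# Asn, then any residue except Pro, then Ser or Thr. The lookahead makes
# overlapping sequons (e.g. "NNST") each get reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def n_glycosylation_sequons(protein: str) -> list[int]:
    """Return 1-based positions of Asn residues that begin an N-X-S/T sequon (X != P)."""
    return [match.start() + 1 for match in SEQUON.finditer(protein.upper())]


peptide = "MNGSAANPTQNLT"  # hypothetical sequence
print(n_glycosylation_sequons(peptide))  # [2, 11]
```

Asn7 is skipped because it is followed by Pro, even though a Thr sits two residues downstream.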
Gene Regulation in Development and Disease
Gene expression patterns define cell identity. Transcription factors acting in combination establish and maintain cell-type-specific expression programs. For example, Oct4, Sox2, Klf4, and c-Myc together can reprogram somatic cells to pluripotency. Epigenetic modifications (histone marks, DNA methylation, chromatin remodelling) can be inherited through cell divisions. Non-coding RNAs add further layers of regulation. MicroRNAs (miRNAs, ~22 nt) are loaded into the RNA-induced silencing complex (RISC). There they repress or degrade target mRNAs by base pairing, usually imperfectly, with sites in the targets' 3' UTRs. Long non-coding RNAs (lncRNAs) and circular RNAs also regulate gene expression. Dysregulation of transcription factors and epigenetic machinery drives cancer; examples include MYC amplification, EZH2 gain-of-function mutations, and BCL6 translocations.
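In animals, miRNA targeting depends mainly on pairing between the seed region (nucleotides 2–8 from the miRNA's 5' end) and the target. A first-pass target scan therefore looks for the seed's reverse complement in a 3' UTR. The sketch below uses the human let-7a sequence and finds only one canonical site type, a 7-nt match to nucleotides 2–8. It ignores other site types, pairing beyond the seed, and site context. The UTR and function names are hypothetical.

```python
RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence (both written 5'->3')."""
    return "".join(RNA_PAIR[base] for base in reversed(rna))


def seed_match_sites(mirna: str, utr: str) -> list[int]:
    """Return 0-based UTR positions matching the complement of miRNA nucleotides 2-8."""
    seed = mirna[1:8]                # nucleotides 2-8 (1-based) of the miRNA
    site = reverse_complement(seed)  # UTR sequence that base-pairs with the seed
    return [i for i in range(len(utr) - len(site) + 1) if utr[i:i + len(site)] == site]


let7a = "UGAGGUAGUAGGUUGUAUAGUU"
utr = "AAAA" + "CUACCUC" + "AAAGGG" + "CUACCUC" + "UU"  # hypothetical 3' UTR
print(reverse_complement(let7a[1:8]))  # CUACCUC
print(seed_match_sites(let7a, utr))    # [4, 17]
```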