Causation, Bias, and Confounding

Lesson 2 of 11

Establishing that an exposure causes a disease -- rather than merely being associated with it -- is one of the most challenging and consequential tasks in epidemiology. Observed associations can arise from true causal relationships, chance, bias, or confounding. This lesson examines frameworks for causal inference and the major threats to validity in epidemiological research.

Bradford Hill Criteria for Causation. In 1965, Sir Austin Bradford Hill proposed nine viewpoints to help assess whether an observed association is likely to be causal. These are not a checklist but a framework for reasoned argument:
(1) Strength of association -- a strong relative risk (RR), e.g., RR = 20 for smoking and lung cancer, is less likely to be explained entirely by confounding or bias.
(2) Consistency -- the association has been observed in different populations, places, and times.
(3) Specificity -- the exposure is associated with one particular disease (the least critical viewpoint, since some causes have multiple effects).
(4) Temporality -- exposure must precede the outcome; this is the only viewpoint that is absolutely necessary.
(5) Biological gradient (dose-response) -- risk increases with increasing exposure.
(6) Plausibility -- the association is biologically plausible given current mechanistic knowledge.
(7) Coherence -- the causal interpretation does not conflict with known facts about the natural history and biology of the disease.
(8) Experiment -- experimental or quasi-experimental evidence supports the association (e.g., trials, natural experiments, animal models, or a fall in disease after the exposure is removed).
(9) Analogy -- similar exposures are known to cause similar diseases.

Bias. Bias is a systematic error in the design, conduct, or analysis of a study that leads to incorrect estimates of an exposure-disease association. Unlike random error, bias cannot be corrected by increasing sample size. The two main categories are selection bias and information bias.

Selection bias occurs when the way participants are selected into (or retained in) a study is related to both exposure and outcome, so that the association in the study sample differs from that in the target population. Examples include: Berkson's bias (in hospital-based case-control studies, admission can depend on both the exposure and the disease, so exposure prevalence among hospital controls differs from that in the general population), healthy worker effect (employed populations are healthier than the general population, so comparing an occupational cohort with the general population underestimates risk), loss to follow-up bias (if dropout is related to both exposure and outcome), and prevalence-incidence (Neyman) bias in cross-sectional studies (fatal or rapidly resolving cases are missed, so prevalent cases over-represent survivors and long-duration disease).

Information bias (measurement bias) arises from inaccurate measurement of exposure or outcome. Recall bias occurs in retrospective studies when cases and controls recall past exposures differently (e.g., mothers of children with birth defects more carefully recalling medication use in pregnancy). Non-differential misclassification -- when errors in measuring exposure are unrelated to outcome status, or errors in measuring outcome are unrelated to exposure status -- generally biases the association towards the null for a binary variable (attenuates RR towards 1). Differential misclassification, where the error rate differs between the groups being compared, can bias in either direction.
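A small worked example can make the attenuation towards the null concrete. The sketch below uses hypothetical cohort counts (true RR = 2.0) and assumes exposure is classified with 80% sensitivity and 90% specificity regardless of outcome status; all numbers and function names are illustrative assumptions, not data from a real study.

```python
def risk_ratio(cases_exp, total_exp, cases_unexp, total_unexp):
    """Risk in the exposed divided by risk in the unexposed."""
    return (cases_exp / total_exp) / (cases_unexp / total_unexp)


def misclassify_exposure(n_exposed, n_unexposed, sensitivity, specificity):
    """Expected numbers classified as exposed and unexposed."""
    classified_exposed = sensitivity * n_exposed + (1 - specificity) * n_unexposed
    classified_unexposed = (1 - sensitivity) * n_exposed + specificity * n_unexposed
    return classified_exposed, classified_unexposed


# Hypothetical true cohort: 1000 exposed (200 cases), 1000 unexposed (100 cases).
TRUE_CASES_EXP, TRUE_CASES_UNEXP = 200, 100
TRUE_NONCASES_EXP, TRUE_NONCASES_UNEXP = 800, 900

# Assumed measurement quality, identical for cases and non-cases
# (this is what makes the misclassification non-differential).
SENSITIVITY, SPECIFICITY = 0.8, 0.9

true_rr = risk_ratio(TRUE_CASES_EXP, 1000, TRUE_CASES_UNEXP, 1000)

# Apply the same error rates separately to cases and to non-cases.
obs_cases_exp, obs_cases_unexp = misclassify_exposure(
    TRUE_CASES_EXP, TRUE_CASES_UNEXP, SENSITIVITY, SPECIFICITY
)
obs_noncases_exp, obs_noncases_unexp = misclassify_exposure(
    TRUE_NONCASES_EXP, TRUE_NONCASES_UNEXP, SENSITIVITY, SPECIFICITY
)

observed_rr = risk_ratio(
    obs_cases_exp, obs_cases_exp + obs_noncases_exp,
    obs_cases_unexp, obs_cases_unexp + obs_noncases_unexp,
)

print(f"True RR:     {true_rr:.2f}")
print(f"Observed RR: {observed_rr:.2f}")
```

With these assumed values the observed RR falls to about 1.6, between the null value of 1 and the true value of 2.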

Confounding. A confounding variable (confounder) is associated with both the exposure and the outcome, and is not on the causal pathway between them, thereby distorting the apparent exposure-outcome relationship. Classic example: a study finds that coffee drinkers have higher rates of lung cancer, but the association disappears after controlling for smoking (the confounder), since smoking is associated with both coffee drinking and lung cancer. Criteria for confounding: (1) the confounder is associated with the exposure in the study population; (2) the confounder is an independent risk factor for the outcome; (3) the confounder is not an intermediate variable (mediator) on the causal pathway. Residual confounding is the confounding that remains after adjustment, because a confounder was unmeasured, measured imprecisely, or categorised too coarsely.
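The coffee, smoking, and lung cancer example can be illustrated with a small calculation. The counts below are hypothetical and chosen so that coffee has no effect within either smoking stratum, while smokers are both more likely to drink coffee and at higher risk of lung cancer; the names `strata`, `risk_ratio`, and `pooled` are assumptions made for this sketch.

```python
# Hypothetical cohort counts: (cases, total) by coffee status within each smoking stratum.
strata = {
    "smokers":     {"coffee": (80, 800), "no_coffee": (20, 200)},
    "non_smokers": {"coffee": (2, 200),  "no_coffee": (8, 800)},
}


def risk_ratio(exposed, unexposed):
    """Risk in the exposed divided by risk in the unexposed; each argument is (cases, total)."""
    return (exposed[0] / exposed[1]) / (unexposed[0] / unexposed[1])


def pooled(group):
    """Collapse over smoking strata, ignoring the confounder."""
    cases = sum(stratum[group][0] for stratum in strata.values())
    total = sum(stratum[group][1] for stratum in strata.values())
    return cases, total


# Crude (unadjusted) association between coffee and lung cancer.
crude_rr = risk_ratio(pooled("coffee"), pooled("no_coffee"))

# Stratum-specific associations, holding smoking status fixed.
stratum_rr = {
    name: risk_ratio(stratum["coffee"], stratum["no_coffee"])
    for name, stratum in strata.items()
}

print(f"Crude RR: {crude_rr:.2f}")
for name, rr in stratum_rr.items():
    print(f"RR among {name}: {rr:.2f}")
```

In this constructed data the crude RR is close to 3, yet the RR is 1 within both smokers and non-smokers, so the crude association is explained entirely by smoking.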

Controlling confounding: at the design stage via randomisation (RCTs), restriction (studying only one level of the confounder), or matching; at the analysis stage via stratification (Mantel-Haenszel methods) or multivariable regression (allowing simultaneous adjustment for multiple confounders).
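As a sketch of stratified adjustment at the analysis stage, the Mantel-Haenszel pooled risk ratio for cohort data can be computed by hand. The stratum counts below are hypothetical, and the function and field names are assumptions for illustration; real analyses would normally use a statistics package and report a confidence interval as well.

```python
# Hypothetical cohort data, one dict per stratum of the confounder.
# a = exposed cases, n1 = exposed total, c = unexposed cases, n0 = unexposed total.
strata = [
    {"a": 100, "n1": 400, "c": 25, "n0": 200},  # stratum A (higher baseline risk)
    {"a": 10,  "n1": 200, "c": 10, "n0": 400},  # stratum B (lower baseline risk)
]


def crude_risk_ratio(strata):
    """Risk ratio after collapsing all strata into a single 2x2 table."""
    a = sum(s["a"] for s in strata)
    n1 = sum(s["n1"] for s in strata)
    c = sum(s["c"] for s in strata)
    n0 = sum(s["n0"] for s in strata)
    return (a / n1) / (c / n0)


def mantel_haenszel_risk_ratio(strata):
    """Mantel-Haenszel pooled RR: sum(a * n0 / N) / sum(c * n1 / N) over strata."""
    numerator = sum(s["a"] * s["n0"] / (s["n1"] + s["n0"]) for s in strata)
    denominator = sum(s["c"] * s["n1"] / (s["n1"] + s["n0"]) for s in strata)
    return numerator / denominator


crude_rr = crude_risk_ratio(strata)
mh_rr = mantel_haenszel_risk_ratio(strata)

print(f"Crude RR:           {crude_rr:.2f}")
print(f"Mantel-Haenszel RR: {mh_rr:.2f}")
```

Here the stratum-specific RR is 2 in both strata, and the Mantel-Haenszel estimate recovers that value, whereas the crude RR (about 3.1) is inflated because exposure is more common in the higher-risk stratum.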

Directed Acyclic Graphs (DAGs) are causal diagrams in which variables are nodes and arrows represent assumed causal effects. They distinguish confounders (common causes of exposure and outcome), mediators (variables on the causal pathway), and colliders (common effects, e.g., a variable caused by both exposure and outcome). Conditioning on a collider, whether by adjustment, stratification, or selection into the study, opens a non-causal path and can induce an association between exposure and outcome even when none exists (collider-stratification bias). DAGs provide a principled framework for identifying which variables must be adjusted for (a sufficient adjustment set) and which must not be adjusted for (colliders, and mediators when the total effect is of interest).
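Collider-stratification bias can be shown with exact probabilities rather than simulation. The sketch below assumes a hypothetical population in which exposure and disease are independent, and a collider (for example, selection into the study) whose probability rises with both; every probability and name here is an illustrative assumption.

```python
from itertools import product

# Hypothetical marginal probabilities; exposure and disease are independent.
P_EXPOSED = 0.5
P_DISEASE = 0.3


def p_selected(exposed, diseased):
    """Assumed probability of the collider being 1 (e.g., being selected)."""
    return 0.1 + 0.4 * exposed + 0.4 * diseased


def risk_ratio(joint):
    """RR from a mapping (exposed, diseased) -> probability mass."""
    risk_exposed = joint[(1, 1)] / (joint[(1, 1)] + joint[(1, 0)])
    risk_unexposed = joint[(0, 1)] / (joint[(0, 1)] + joint[(0, 0)])
    return risk_exposed / risk_unexposed


population = {}  # joint distribution in the whole population
selected = {}    # probability mass restricted to collider == 1

for exposed, diseased in product((0, 1), repeat=2):
    p_e = P_EXPOSED if exposed else 1 - P_EXPOSED
    p_d = P_DISEASE if diseased else 1 - P_DISEASE
    population[(exposed, diseased)] = p_e * p_d
    selected[(exposed, diseased)] = p_e * p_d * p_selected(exposed, diseased)

rr_population = risk_ratio(population)
rr_selected = risk_ratio(selected)

print(f"RR in whole population:         {rr_population:.2f}")
print(f"RR among those with collider=1: {rr_selected:.2f}")
```

Under these assumptions the RR is 1 in the whole population but about 0.64 among those selected, an inverse association created purely by conditioning on the common effect.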

Effect Modification (Interaction). Effect modification occurs when the magnitude of the association between exposure and outcome differs across strata of a third variable (the effect modifier). Unlike confounding -- which is a nuisance to be controlled -- effect modification is a real phenomenon of interest, to be described and reported rather than adjusted away. For example, the benefit of a protective vaccine may differ by age group. Effect modification is assessed by computing stratum-specific estimates and testing for heterogeneity. On the additive scale, effect modification is present when the attributable risk (AR, the risk difference) differs between strata; on the multiplicative scale, when the relative risk (RR) differs. Its presence can therefore depend on the scale chosen. Recognising effect modification is essential for targeting interventions to subgroups that benefit most.
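The difference between the additive and multiplicative scales can be seen in a short calculation. The vaccine data below are hypothetical, chosen so that the RR is the same in both age groups while the risk difference is not; the names used are assumptions for this sketch.

```python
# Hypothetical vaccine study: (cases, total) by vaccination status within each age group.
age_groups = {
    "younger": {"vaccinated": (50, 1000),  "unvaccinated": (100, 1000)},
    "older":   {"vaccinated": (200, 1000), "unvaccinated": (400, 1000)},
}


def risk(cases_total):
    """Convert a (cases, total) pair into a risk."""
    cases, total = cases_total
    return cases / total


results = {}
for name, group in age_groups.items():
    risk_vaccinated = risk(group["vaccinated"])
    risk_unvaccinated = risk(group["unvaccinated"])
    results[name] = {
        # Multiplicative scale: relative risk.
        "rr": risk_vaccinated / risk_unvaccinated,
        # Additive scale: risk difference (attributable risk).
        "rd": risk_vaccinated - risk_unvaccinated,
    }

for name, estimates in results.items():
    print(f"{name}: RR = {estimates['rr']:.2f}, risk difference = {estimates['rd']:.2f}")
```

In this constructed example the RR is 0.5 in both groups (no effect modification on the multiplicative scale), but vaccination prevents 5 cases per 100 in the younger group and 20 per 100 in the older group (effect modification on the additive scale), which is the scale most relevant to deciding who benefits most.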
