Measures of Disease Frequency and Epidemiological Study Design
Epidemiological Study Designs
Introduction
Epidemiology is the study of the distribution and determinants of health-related states in specified populations, and the application of this study to the control of health problems (Last, 2001). Before we can establish whether a risk factor causes a disease, we must choose the most appropriate study design to answer our question rigorously. Different designs offer different trade-offs between feasibility, validity, and the strength of causal inference they support.
The Hierarchy of Evidence
Study designs sit within a broad evidence hierarchy:
| Level | Design | Strength of Causal Inference |
|---|---|---|
| 1 | Systematic review / Meta-analysis of RCTs | Highest |
| 2 | Randomised controlled trial (RCT) | High |
| 3 | Cohort study | Moderate |
| 4 | Case-control study | Moderate |
| 5 | Cross-sectional study | Low |
| 6 | Ecological study | Low |
| 7 | Case series / Case report | Very low |
This hierarchy exists because designs differ in their susceptibility to bias and confounding, and in their ability to establish temporality (exposure precedes outcome, a requirement for causation under the Bradford Hill criteria).
Observational Study Designs
Cohort Studies
A cohort study identifies a group of people free of the outcome of interest, classifies them by exposure status, and follows them forward in time to compare the incidence of outcomes between exposed and unexposed groups. Cohorts can be prospective (exposure is classified and follow-up begins now) or retrospective (exposure data already exist in historical records and the outcomes have already occurred; this is faster and cheaper, but data quality is limited by what was recorded).
Strengths: can measure incidence and calculate relative risk (RR) directly; can examine multiple outcomes from one exposure; temporal relationship is clear (exposure precedes outcome). Limitations: expensive and time-consuming for chronic diseases (e.g., a cohort study of smoking and lung cancer required >20 years); large numbers needed for rare outcomes; loss to follow-up can introduce selection bias.
Key measure: relative risk (RR)
RR = Incidence in exposed / Incidence in unexposed
RR = 1: no association; RR > 1: exposure associated with increased risk; RR < 1: protective.
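As a minimal sketch of the formula above, the relative risk can be computed from the counts in each exposure group. The function name and the cohort figures are hypothetical, chosen only for illustration.

```python
def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Incidence in the exposed divided by incidence in the unexposed."""
    incidence_exposed = exposed_cases / exposed_total
    incidence_unexposed = unexposed_cases / unexposed_total
    return incidence_exposed / incidence_unexposed


# Hypothetical cohort: 30 of 1,000 exposed and 10 of 1,000 unexposed
# people develop the outcome during follow-up.
rr = relative_risk(30, 1000, 10, 1000)
print(f"RR = {rr:.2f}")  # a value above 1 indicates increased risk in the exposed
```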
Case-Control Studies
A case-control study identifies individuals with the disease (cases) and without (controls), then looks back to compare exposure frequencies. Incidence cannot be measured directly; the measure of association is the odds ratio (OR):
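OR = (Exposure odds in cases) / (Exposure odds in controls) = (a/c) / (b/d) = ad/bc
where, in the standard 2×2 table, a = exposed cases, b = exposed controls, c = unexposed cases, and d = unexposed controls.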
When disease prevalence is <10% (rare disease assumption), OR ≈ RR. Controls should be sampled from the same base population that gave rise to the cases (source population principle).
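The rare disease assumption can be checked numerically. The sketch below computes both the odds ratio and the risk ratio from the same 2×2 table, once for a rare outcome and once for a common one. It assumes the usual cell labelling (a = exposed with disease, b = exposed without, c = unexposed with disease, d = unexposed without), and both tables are hypothetical.

```python
def odds_ratio(a, b, c, d):
    """Cross-product ratio ad/bc.

    a: exposed with disease, b: exposed without disease,
    c: unexposed with disease, d: unexposed without disease.
    """
    return (a * d) / (b * c)


def risk_ratio(a, b, c, d):
    """Risk in the exposed over risk in the unexposed (needs full denominators)."""
    return (a / (a + b)) / (c / (c + d))


# Hypothetical tables, 1,000 exposed and 1,000 unexposed people in each.
rare_outcome = (30, 970, 10, 990)      # risks of 3% and 1%
common_outcome = (600, 400, 300, 700)  # risks of 60% and 30%

for label, table in [("rare", rare_outcome), ("common", common_outcome)]:
    print(f"{label}: OR = {odds_ratio(*table):.2f}, RR = {risk_ratio(*table):.2f}")
```

With the rare outcome the two measures nearly coincide; with the common outcome the odds ratio lies further from 1 than the risk ratio, so it overstates the association if read as a relative risk.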
Strengths: efficient for rare diseases (can achieve adequate power without the years required to accumulate sufficient cases in a cohort); relatively quick and inexpensive; can examine multiple exposures for one outcome. Limitations: temporal relationship may be ambiguous; susceptible to recall bias (cases may remember exposures more vividly than controls); susceptible to selection bias in control selection; cannot measure incidence.
Cross-Sectional Studies
A cross-sectional (prevalence) study measures exposure and outcome simultaneously at a single point in time in a defined population. It yields prevalence and a prevalence ratio or odds ratio. It is useful for hypothesis generation, planning health services, and monitoring trends. It cannot establish temporality and therefore cannot prove causation: this is the "snapshot" problem, in which we cannot know whether exposure preceded outcome.
Ecological Studies
Ecological studies use aggregate (population-level) data: they compare disease rates between countries, regions, or time periods in relation to population-level exposures (dietary patterns, pollution levels). They are useful for generating hypotheses (e.g., correlation between fat intake and breast cancer rates across countries) but highly susceptible to the ecological fallacy, that is, inferring individual-level associations from group-level data.
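The ecological fallacy can be illustrated with a small constructed example. In the sketch below, three hypothetical regions show a perfect group-level correlation between exposure prevalence and disease rate, yet within every region exposed and unexposed people have the same risk. All counts are invented for illustration, and `pearson` is a hand-written helper rather than a library call.

```python
# Hypothetical regions:
# (exposed_cases, exposed_total, unexposed_cases, unexposed_total)
regions = {
    "A": (20, 200, 80, 800),    # 20% exposed, 10% risk in both groups
    "B": (100, 500, 100, 500),  # 50% exposed, 20% risk in both groups
    "C": (240, 800, 60, 200),   # 80% exposed, 30% risk in both groups
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


exposure_prevalence, disease_rate, within_region_rr = [], [], {}
for name, (ec, et, uc, ut) in regions.items():
    exposure_prevalence.append(et / (et + ut))          # group-level exposure
    disease_rate.append((ec + uc) / (et + ut))          # group-level outcome
    within_region_rr[name] = (ec / et) / (uc / ut)      # individual-level RR

group_level_r = pearson(exposure_prevalence, disease_rate)
print(f"Group-level correlation: {group_level_r:.2f}")
print("Within-region RR:", {k: round(v, 2) for k, v in within_region_rr.items()})
```

Here some regional characteristic drives both exposure prevalence and baseline risk, so the aggregate correlation says nothing about whether exposed individuals are at higher risk.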
Randomised Controlled Trials
RCTs are the gold standard for evaluating interventions. Participants are randomly allocated to receive the intervention or control (placebo or active comparator). Randomisation distributes known AND unknown confounders equally between groups, allowing valid attribution of any outcome difference to the intervention. Blinding (participant, clinician, outcome assessor) further reduces bias.
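A rough sketch of simple randomisation, assuming 1,000 hypothetical participants whose age stands in for any baseline characteristic, measured or not. Balance between arms is expected on average and improves with sample size; it is not guaranteed in any single trial. Real trials often use blocked or stratified schemes, which this sketch does not attempt.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical participants; "age" is a stand-in for any potential confounder.
participants = [{"id": i, "age": rng.randint(30, 70)} for i in range(1000)]

# Simple randomisation: shuffle a copy, then split into two equal arms.
shuffled = participants[:]
rng.shuffle(shuffled)
intervention, control = shuffled[:500], shuffled[500:]

mean_age_intervention = mean(p["age"] for p in intervention)
mean_age_control = mean(p["age"] for p in control)
print(f"Mean age, intervention: {mean_age_intervention:.1f}")
print(f"Mean age, control:      {mean_age_control:.1f}")
```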
Limitations: expensive; ethical constraints (participants cannot be randomly assigned to harmful exposures); volunteer/recruitment bias (participants may not represent the general population, which limits external validity); often infeasible for preventive exposures or diseases with long latency.
Intention-to-Treat vs Per-Protocol Analysis
ITT: all randomised participants analysed in the group to which they were allocated regardless of adherence. Preserves randomisation and gives an unbiased estimate of the real-world effect. Per-protocol: analyses only those who adhered to the assigned intervention. Susceptible to selection bias but may better estimate biological efficacy.
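The difference between the two analyses can be sketched with a hypothetical trial in which 20 of 100 participants in the treatment arm did not adhere. The `Participant` class, the `build` and `risk` helpers, and all counts are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    arm: str       # arm assigned at randomisation: "treatment" or "control"
    adhered: bool  # whether the assigned intervention was actually followed
    event: bool    # whether the outcome occurred


def build(arm, adhered, n_events, n_total):
    """Create n_total participants, the first n_events of whom had the outcome."""
    return [Participant(arm, adhered, i < n_events) for i in range(n_total)]


# Hypothetical trial data.
trial = (
    build("treatment", True, 8, 80)     # adherent: 8 events in 80
    + build("treatment", False, 6, 20)  # non-adherent: 6 events in 20
    + build("control", True, 20, 100)   # control: 20 events in 100
)


def risk(participants, arm, adherent_only=False):
    """Outcome risk in one arm; adherent_only=True gives the per-protocol subset."""
    group = [
        p for p in participants
        if p.arm == arm and (p.adhered or not adherent_only)
    ]
    return sum(p.event for p in group) / len(group)


itt_rr = risk(trial, "treatment") / risk(trial, "control")
pp_rr = (risk(trial, "treatment", adherent_only=True)
         / risk(trial, "control", adherent_only=True))
print(f"ITT risk ratio:          {itt_rr:.2f}")
print(f"Per-protocol risk ratio: {pp_rr:.2f}")
```

In this constructed example the per-protocol estimate looks more favourable because the non-adherent participants, who had worse outcomes, are dropped from the treatment arm. That is exactly the selection effect the paragraph above warns about.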
Key Sources of Bias
| Bias Type | Definition | Example |
|---|---|---|
| Selection bias | Systematic difference between study participants and the target population | Healthy worker effect in occupational cohorts |
| Information/Recall bias | Systematic error in data collection or recall | Cases over-reporting past exposures vs controls |
| Confounding | A third variable associated with both exposure and outcome | Alcohol-liver cancer association confounded by smoking |
| Attrition bias | Differential loss to follow-up between groups | Sicker participants withdraw from the cohort, so disease is underestimated |
Measures of Association and Impact
- Attributable risk (AR) = Incidence in exposed − Incidence in unexposed: the absolute excess risk due to the exposure
- Attributable risk percent (AR%) = AR / Incidence in exposed × 100: percentage of disease in the exposed attributable to the exposure
- Population attributable risk (PAR): excess disease in the total population due to the exposure; depends on AR and prevalence of exposure; guides public health priority-setting
- Number needed to treat (NNT) = 1 / Absolute risk reduction: number of patients who must receive treatment for one additional patient to benefit
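The measures in this list can be sketched as small functions. The notes do not spell out a formula for PAR; the sketch assumes the common form PAR = prevalence of exposure × AR, which equals incidence in the total population minus incidence in the unexposed. Function names and input values are hypothetical.

```python
import math


def attributable_risk(i_exposed, i_unexposed):
    """Absolute excess risk in the exposed."""
    return i_exposed - i_unexposed


def attributable_risk_percent(i_exposed, i_unexposed):
    """Share of disease in the exposed that is attributable to the exposure."""
    return attributable_risk(i_exposed, i_unexposed) / i_exposed * 100


def population_attributable_risk(i_exposed, i_unexposed, p_exposed):
    """Assumed form: exposure prevalence times AR (not stated in the notes)."""
    return p_exposed * attributable_risk(i_exposed, i_unexposed)


def number_needed_to_treat(control_risk, treated_risk):
    """Reciprocal of the absolute risk reduction."""
    return 1 / (control_risk - treated_risk)


# Hypothetical inputs: 3% vs 1% incidence, 25% of the population exposed.
ar = attributable_risk(0.03, 0.01)
ar_pct = attributable_risk_percent(0.03, 0.01)
par = population_attributable_risk(0.03, 0.01, 0.25)

# Hypothetical trial: 20% risk on control, 14% on treatment.
nnt = number_needed_to_treat(0.20, 0.14)

print(f"AR = {ar:.3f}, AR% = {ar_pct:.1f}%, PAR = {par:.4f}")
print(f"NNT = {nnt:.1f}, conventionally rounded up to {math.ceil(nnt)}")
```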
Summary
Choosing the right study design depends on the question, the disease (rare vs common), feasibility, and available resources. Observational designs can generate strong evidence when potential confounders are carefully measured and controlled, and when the same association is replicated in multiple populations with different confounders (Bradford Hill criteria: consistency, biological gradient, plausibility, coherence). Randomised trials provide the highest causal certainty for interventions but are not always feasible or ethical. The critical consumer of medical literature must understand each design's strengths and inherent biases.