Statistics Fundamentals

Notes

Statistics is the science of collecting, analysing, and interpreting data. The primary aim of most research studies is to use data from a sample to draw inferences about characteristics of a larger population — this process is called statistical inference.

A population is the complete set of elements we wish to describe or make inferences about. A census studies the entire population; a sample is a subset of it. For a sample to support valid inference, it must be representative of the population. Random sampling, in which every individual has a known and equal probability of selection, is the most reliable way to obtain a representative sample.

Several sampling methods exist. Simple random sampling gives every individual an equal chance of selection. Stratified sampling performs simple random sampling within each stratum (a subdivision of the population whose members share a defined characteristic). It is usually more precise than simple random sampling because it guarantees representation of every stratum, provided individuals within a stratum are similar with respect to the outcome. Cluster sampling randomly selects naturally occurring groups (clusters) of individuals, such as households or schools; it is cheaper, but the selected clusters may not be representative of the whole population.
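The three sampling methods can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population of 200 people, the `sex` and `household` fields, the 10% sampling fraction and the function names are all made up for the example, and a fixed seed is used so the run is repeatable.

```python
import random
from collections import defaultdict

def simple_random_sample(population, n, rng):
    # Every individual has the same chance of being selected.
    return rng.sample(population, n)

def stratified_sample(population, stratum_of, fraction, rng):
    # Group individuals by stratum, then take a simple random sample
    # inside EVERY stratum (all strata are included).
    strata = defaultdict(list)
    for person in population:
        strata[stratum_of(person)].append(person)
    chosen = []
    for members in strata.values():
        k = max(1, round(fraction * len(members)))
        chosen.extend(rng.sample(members, k))
    return chosen

def cluster_sample(population, cluster_of, n_clusters, rng):
    # Randomly pick whole clusters and keep everyone inside them
    # (NOT all clusters are included).
    clusters = defaultdict(list)
    for person in population:
        clusters[cluster_of(person)].append(person)
    picked = rng.sample(sorted(clusters), n_clusters)
    return [person for c in picked for person in clusters[c]]

# Hypothetical population: 200 people, 2 strata (sex), 50 households of 4.
rng = random.Random(0)
population = [
    {"id": i, "sex": "F" if i % 2 else "M", "household": i // 4}
    for i in range(200)
]

srs = simple_random_sample(population, 20, rng)
strat = stratified_sample(population, lambda p: p["sex"], 0.10, rng)
clus = cluster_sample(population, lambda p: p["household"], 5, rng)

print(len(srs), len(strat), len(clus))
```

All three samples contain 20 people here, but only the stratified sample is guaranteed to contain exactly 10 from each sex, and the cluster sample draws its 20 people from just 5 households.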

Strata are subdivisions where individuals within each stratum are similar but strata differ from each other — all strata are included. Clusters are groups assumed to be similar to each other as a whole — not all clusters are included in the study.

Sources of error: random error arises from natural variation and decreases as sample size increases. Systematic errors (biases) arise from the study design and do not decrease with sample size. They include selection bias (the sample is not representative of the source population) and information bias (the data are measured or recorded incorrectly). Precision is how close repeated measurements are to each other (a measure of random variation). Accuracy is how close the average of the measurements is to the true value, so it is affected by bias as well as by random variation.
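A small simulation can make the difference between random and systematic error concrete. The sketch below assumes a hypothetical population with true mean 100 and standard deviation 10, and models bias as a constant offset of 5 added to every measurement; these numbers and the function names are illustrative only.

```python
import random
import statistics

TRUE_MEAN = 100.0  # assumed true population value (illustrative)
TRUE_SD = 10.0     # assumed natural variation (illustrative)

def sample_mean(n, bias, rng):
    # Each measurement = true value + random variation + constant bias.
    values = [rng.gauss(TRUE_MEAN, TRUE_SD) + bias for _ in range(n)]
    return statistics.fmean(values)

def repeat_study(n, bias, repeats, rng):
    # Repeat the same study many times and summarise the sample means:
    # their average shows accuracy, their spread shows precision.
    means = [sample_mean(n, bias, rng) for _ in range(repeats)]
    return statistics.fmean(means), statistics.stdev(means)

rng = random.Random(42)
small_avg, small_spread = repeat_study(10, 0.0, 200, rng)
large_avg, large_spread = repeat_study(1000, 0.0, 200, rng)
biased_avg, biased_spread = repeat_study(1000, 5.0, 200, rng)

print(f"n=10,   no bias: average {small_avg:.2f}, spread {small_spread:.2f}")
print(f"n=1000, no bias: average {large_avg:.2f}, spread {large_spread:.2f}")
print(f"n=1000, bias +5: average {biased_avg:.2f}, spread {biased_spread:.2f}")
```

Increasing the sample size shrinks the spread of the sample means (better precision), but the biased study stays centred about 5 units away from the true value however large the sample is.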

Evidence-based policy applies the best available scientific evidence to policy decisions. Evidence-based practice preferentially uses interventions where systematic research has provided reliable evidence of benefit.

Descriptive statistics: mean (average of all values), median (middle value when ordered), mode (most frequent value). Range = largest − smallest value. Standard deviation (s) is the square root of the variance (s²); the variance is the average squared distance from the mean, calculated for a sample as the sum of squared deviations divided by n − 1. A small standard deviation means the data are clustered around the mean; a large standard deviation means the data are spread out. Interquartile range (IQR) = Q3 − Q1, which contains the central 50% of the data.
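As a worked example, the measures above can be computed with Python's standard `statistics` module. The seven data values are made up for illustration. Note that `statistics.variance` and `statistics.stdev` use the sample formula (dividing by n − 1), and that quartile conventions differ between textbooks and software, so Q1 and Q3 may not match a hand calculation exactly.

```python
import statistics

# Illustrative data set (not from the lesson).
data = [2, 4, 4, 5, 7, 9, 11]

mean = statistics.mean(data)          # sum of values / number of values
median = statistics.median(data)      # middle value of the ordered data
mode = statistics.mode(data)          # most frequent value
value_range = max(data) - min(data)   # largest - smallest

variance = statistics.variance(data)  # sample variance s^2 (divides by n - 1)
sd = statistics.stdev(data)           # sample standard deviation s

# Quartiles using the module's default ("exclusive") method.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1                         # spread of the central 50% of the data

print("mean", mean, "median", median, "mode", mode, "range", value_range)
print("variance", variance, "sd", round(sd, 3))
print("Q1", q1, "Q3", q3, "IQR", iqr)
```

For these values the mean is 6, the median 5 and the mode 4, and the squared deviations from the mean sum to 60, so the sample variance is 60 / 6 = 10.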