What Is a Sampling Distribution?
In practice, we collect one sample and compute a statistic (like \(\bar{x}\) or \(\hat{p}\)). Now imagine repeating the sampling process thousands of times, getting a slightly different value of the statistic each time. The distribution of all those values is called the sampling distribution of the statistic.
Parameter: A number describing the population. Fixed but usually unknown. Symbols: \(\mu\), \(\sigma\), \(p\).
Statistic: A number computed from a sample. Varies from sample to sample. Symbols: \(\bar{x}\), \(s\), \(\hat{p}\).
The sampling distribution is the theoretical foundation for all of statistical inference (Units 6–9). It tells us how far our sample statistic is likely to vary from the true population parameter, which is exactly what confidence intervals and hypothesis tests are built on.
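To make "repeat the sampling thousands of times" concrete, here is a minimal Python sketch (standard library only, Python 3.8+). The population is an assumption for illustration: 10,000 simulated heights with mean 70 and standard deviation 3, not real data. `simulate_sample_means` is a hypothetical helper name.

```python
import random
import statistics

# Hypothetical population of 10,000 heights (an illustrative assumption, not data)
rng = random.Random(42)
population = [rng.gauss(70, 3) for _ in range(10_000)]
mu = statistics.fmean(population)  # the parameter: fixed for this population

def simulate_sample_means(pop, n, reps, rng):
    """Take `reps` simple random samples of size n (without replacement)
    and return the sample mean x-bar of each one."""
    return [statistics.fmean(rng.sample(pop, n)) for _ in range(reps)]

# The list `means` approximates the sampling distribution of x-bar for n = 36
means = simulate_sample_means(population, n=36, reps=5_000, rng=rng)

print(f"population mean mu = {mu:.3f}")
print(f"mean of the x-bars = {statistics.fmean(means):.3f}")   # close to mu
print(f"SD of the x-bars   = {statistics.pstdev(means):.3f}")  # close to 3/sqrt(36) = 0.5
```

The exact printed values depend on the random seed; the point is that the x-bars center on \(\mu\) with much less spread than the individual values.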
Bias & Variability of a Statistic
Unbiased: The mean of the sampling distribution equals the true parameter, so the statistic is "on target" on average. For example, \(\mu_{\bar{x}} = \mu\).
Low variability: Repeated samples give similar values, so the statistic is precise. Variability is controlled mainly by the sample size \(n\).
Reduce bias: Use random sampling. A biased sampling method (convenience, voluntary response) creates bias that increasing the sample size cannot fix.
Reduce variability: Increase sample size \(n\). Larger samples produce sampling distributions with less spread — values cluster closer to the true parameter.
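The two fixes above can be contrasted with a small simulation sketch. It assumes a hypothetical right-skewed population and a made-up biased method in which each person's chance of being selected is proportional to their value (a crude stand-in for voluntary response); neither comes from these notes. Increasing \(n\) shrinks the spread for both methods, but only random sampling centers on \(\mu\).

```python
import itertools
import random
import statistics

rng = random.Random(1)

# Hypothetical right-skewed population (e.g. hours of screen time), mean about 3
population = [rng.expovariate(1 / 3) for _ in range(20_000)]
mu = statistics.fmean(population)
# Cumulative weights for the biased method, computed once for speed
cum_weights = list(itertools.accumulate(population))

def random_sample_mean(n):
    """Random sampling: every individual is equally likely to be picked
    (with replacement here, to keep the sketch simple)."""
    return statistics.fmean(rng.choices(population, k=n))

def biased_sample_mean(n):
    """Hypothetical biased method: the chance of being picked is proportional
    to the value itself, so large values are over-represented."""
    return statistics.fmean(rng.choices(population, cum_weights=cum_weights, k=n))

results = {}
for n in (25, 2_500):
    rand = [random_sample_mean(n) for _ in range(300)]
    bias = [biased_sample_mean(n) for _ in range(300)]
    results[n] = {
        "random_center": statistics.fmean(rand),
        "random_spread": statistics.pstdev(rand),
        "biased_center": statistics.fmean(bias),
        "biased_spread": statistics.pstdev(bias),
    }

print(f"true mu = {mu:.2f}")
for n, r in results.items():
    print(n, {k: round(v, 3) for k, v in r.items()})
```

With the larger sample both spreads shrink, yet the biased center stays far above \(\mu\): a bigger sample just estimates the wrong value more precisely.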
Sampling Distribution of the Sample Mean \(\bar{x}\)
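When we take a random sample of size \(n\) from a population with mean \(\mu\) and standard deviation \(\sigma\) and compute \(\bar{x}\), the behavior of \(\bar{x}\) across all possible samples follows predictable rules:
Mean: \(\mu_{\bar{x}} = \mu\)
Standard deviation: \(\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}\) (assuming independent observations; when sampling without replacement, the sample should be no more than 10% of the population)
\(\sigma_{\bar{x}}\) is called the standard error of the mean.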
Standard deviation (\(\sigma\)): Describes spread of individual values in the population.
Standard error (\(\sigma/\sqrt{n}\)): Describes spread of sample means — how much \(\bar{x}\) varies from sample to sample. Always smaller than \(\sigma\) when \(n > 1\).
Example: Heights of adult men are Normally distributed with \(\mu = 70\) inches and \(\sigma = 3\) inches. A random sample of 36 men is taken.
Mean of sampling distribution: \(\mu_{\bar{x}} = 70\) inches
Standard error: \(\sigma_{\bar{x}} = \frac{3}{\sqrt{36}} = \frac{3}{6} = 0.5\) inches
P(\(\bar{x}\) > 71): \(z = \frac{71-70}{0.5} = 2.0\), so \(P(\bar{x} > 71) = P(Z > 2.0) \approx 0.0228\)
Getting a sample mean above 71 inches would be quite unusual (only about a 2.28% chance) with \(n = 36\).
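A quick check of this example in Python, assuming the standard-library `statistics.NormalDist` class (Python 3.8+); a calculator or Normal table gives the same result.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 70, 3, 36          # values from the example above
se = sigma / sqrt(n)              # standard error of x-bar: 3 / 6 = 0.5
sampling_dist = NormalDist(mu=mu, sigma=se)

z = (71 - mu) / se                # 2.0
p_above_71 = 1 - sampling_dist.cdf(71)
print(f"SE = {se}, z = {z}, P(x-bar > 71) = {p_above_71:.4f}")
# SE = 0.5, z = 2.0, P(x-bar > 71) = 0.0228
```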
The Central Limit Theorem (CLT)
The Central Limit Theorem is one of the most remarkable and important results in all of statistics. It tells us about the shape of the sampling distribution.
For a random sample of size \(n\) from any population with mean \(\mu\) and standard deviation \(\sigma\):
When \(n\) is sufficiently large, the sampling distribution of \(\bar{x}\) is approximately Normal, regardless of the shape of the population distribution.
\(\bar{x} \sim N\!\left(\mu,\, \frac{\sigma}{\sqrt{n}}\right)\) approximately, for large \(n\)
If the population is Normal: The sampling distribution of \(\bar{x}\) is exactly Normal for any \(n\). No minimum sample size needed.
If the population is NOT Normal: The sampling distribution of \(\bar{x}\) is approximately Normal when \(n \geq 30\). This is the general rule of thumb for the AP exam.
Skewed populations: May need \(n\) larger than 30. More skew = larger \(n\) needed. The AP exam usually tells you if the sample is large enough.
The CLT says the sampling distribution of \(\bar{x}\) becomes Normal — NOT that the individual data values become Normal. Individual values from a skewed population remain skewed no matter how large \(n\) is.
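The sketch below illustrates both halves of this point by simulation. It assumes an exponential population with mean 1 (strongly right-skewed) and uses sample skewness as a rough numeric stand-in for eyeballing a histogram: about 0 for a symmetric shape like the Normal, and about 2 for this population. `skewness` and `sample_means` are hypothetical helper names.

```python
import random
import statistics

rng = random.Random(7)

def skewness(values):
    """Sample skewness: the average cubed z-score (about 0 when symmetric)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)

def sample_means(n, reps=10_000):
    """Sample means from an exponential population with mean 1
    (strongly right-skewed; an illustrative assumption)."""
    return [statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
            for _ in range(reps)]

individual = [rng.expovariate(1.0) for _ in range(20_000)]
skews = {n: skewness(sample_means(n)) for n in (2, 10, 50)}

print(f"individual values: skewness {skewness(individual):.2f}")
for n, g in skews.items():
    print(f"x-bar with n = {n:>2}: skewness {g:.2f}")
```

The individual values stay strongly skewed no matter how many are drawn, while the skewness of \(\bar{x}\) shrinks as \(n\) grows (for this population, theory gives roughly \(2/\sqrt{n}\)).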
Sampling Distribution of the Sample Proportion \(\hat{p}\)
When a categorical variable has two outcomes (success/failure), we describe results using the sample proportion \(\hat{p}\). Its sampling distribution has similar properties to that of \(\bar{x}\).
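\(\hat{p}\) is an unbiased estimator of \(p\).
Mean: \(\mu_{\hat{p}} = p\)
Standard deviation: \(\sigma_{\hat{p}} = \sqrt{\dfrac{p(1-p)}{n}}\) (assuming independent observations, e.g. a sample that is no more than 10% of the population)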
The sampling distribution of \(\hat{p}\) is approximately Normal when:
\[ np \geq 10 \quad \text{AND} \quad n(1-p) \geq 10 \]
This ensures there are at least 10 expected successes AND 10 expected failures. Both conditions must be checked and stated on the AP exam.
Example: Suppose 60% of voters support a ballot measure. A random sample of 100 voters is taken.
Mean: \(\mu_{\hat{p}} = p = 0.60\)
Standard deviation: \(\sigma_{\hat{p}} = \sqrt{\frac{0.60(0.40)}{100}} = \sqrt{0.0024} \approx 0.049\)
Check Large Counts: \(np = 100(0.60) = 60 \geq 10\) ✓ \(n(1-p) = 100(0.40) = 40 \geq 10\) ✓
Shape: Approximately Normal — both conditions met.
P(\(\hat{p} < 0.55\)): \(z = \frac{0.55-0.60}{0.049} \approx -1.02\), so \(P(\hat{p} < 0.55) \approx 0.154\)
There's about a 15.4% chance that a random sample of 100 voters gives a sample proportion below 0.55, even if the true proportion is 0.60.
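The same steps in Python, as a sketch assuming `statistics.NormalDist` (Python 3.8+). Using the unrounded standard deviation (0.04899 rather than 0.049) changes the probability only in the fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.60, 100                       # values from the voter example
sd_phat = sqrt(p * (1 - p) / n)        # about 0.049
z = (0.55 - p) / sd_phat               # about -1.02
prob = NormalDist(mu=p, sigma=sd_phat).cdf(0.55)
print(f"SD = {sd_phat:.4f}, z = {z:.2f}, P(p-hat < 0.55) = {prob:.3f}")
```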
| Feature | Sampling Dist. of \(\bar{x}\) | Sampling Dist. of \(\hat{p}\) |
|---|---|---|
| Estimates | Population mean \(\mu\) | Population proportion \(p\) |
| Mean | \(\mu_{\bar{x}} = \mu\) | \(\mu_{\hat{p}} = p\) |
| Std Error | \(\sigma/\sqrt{n}\) | \(\sqrt{p(1-p)/n}\) |
| Normal when | Population Normal, OR \(n \geq 30\) (CLT) | \(np \geq 10\) AND \(n(1-p) \geq 10\) |
| Used for | Quantitative data | Categorical data (2 categories) |
Putting It All Together: The 3 Questions
For any sampling distribution problem on the AP exam, answer three questions:
1. CENTER: What is the mean of the sampling distribution?
For \(\bar{x}\): \(\mu_{\bar{x}} = \mu\) | For \(\hat{p}\): \(\mu_{\hat{p}} = p\)
2. SPREAD: What is the standard deviation (standard error)?
For \(\bar{x}\): \(\sigma/\sqrt{n}\) | For \(\hat{p}\): \(\sqrt{p(1-p)/n}\)
3. SHAPE: Is the sampling distribution approximately Normal?
For \(\bar{x}\): Population Normal OR \(n \geq 30\) (CLT) | For \(\hat{p}\): \(np \geq 10\) AND \(n(1-p) \geq 10\)
Always state and check conditions before using Normal calculations — never just assume normality.
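The three questions can be written as two small functions. This is a sketch: `xbar_summary` and `phat_summary` are hypothetical names, and the shape checks simply encode the rules of thumb listed above (Normal population or \(n \geq 30\); Large Counts), not a general test of Normality.

```python
from math import sqrt

def xbar_summary(mu, sigma, n, population_normal):
    """Return (center, spread, normal_ok) for the sampling distribution of x-bar."""
    center = mu                               # 1. CENTER: mu_xbar = mu
    spread = sigma / sqrt(n)                  # 2. SPREAD: sigma / sqrt(n)
    normal_ok = population_normal or n >= 30  # 3. SHAPE: Normal population OR CLT
    return center, spread, normal_ok

def phat_summary(p, n):
    """Return (center, spread, normal_ok) for the sampling distribution of p-hat."""
    center = p                                     # 1. CENTER: mu_phat = p
    spread = sqrt(p * (1 - p) / n)                 # 2. SPREAD: sqrt(p(1-p)/n)
    normal_ok = n * p >= 10 and n * (1 - p) >= 10  # 3. SHAPE: Large Counts
    return center, spread, normal_ok

print(xbar_summary(20, 8, 64, population_normal=False))  # (20, 1.0, True)
print(phat_summary(0.60, 100))
```

A `normal_ok` of `False` means the Normal calculations that follow should not be used, which is the point of checking conditions first.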
When asked for a probability about \(\bar{x}\) or \(\hat{p}\), always use the standard error (not \(\sigma\)) in your z-score formula:
\(z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}\) or \(z = \dfrac{\hat{p} - p}{\sqrt{p(1-p)/n}}\)
Using \(\sigma\) instead of \(\sigma/\sqrt{n}\) is a very common — and costly — AP exam error.
Multiple Choice Questions
Try each question before reading the answer and explanation that follow it.
A population has mean μ = 50 and standard deviation σ = 12. A random sample of size n = 36 is selected. What is the standard deviation of the sampling distribution of x̄?
- A 12
- B 6
- C 3
- D 2
- E 1.39
Answer: D. \(\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{12}{\sqrt{36}} = \frac{12}{6} = \mathbf{2}\)
This is the standard error — the spread of the sampling distribution of \(\bar{x}\). Note: 12 (choice A) is the population standard deviation, not the standard error. A common trap.
A population is strongly right-skewed with mean μ = 20 and standard deviation σ = 8. A researcher takes a random sample of n = 64. Which of the following best describes the sampling distribution of x̄?
- A Strongly right-skewed with mean 20 and standard deviation 8
- B Strongly right-skewed with mean 20 and standard deviation 1
- C Approximately Normal with mean 20 and standard deviation 8
- D Approximately Normal with mean 20 and standard deviation 1
- E Exactly Normal with mean 20 and standard deviation 1
Answer: D. By the CLT, since n = 64 ≥ 30, the sampling distribution of \(\bar{x}\) is approximately Normal (not exactly; E is wrong because the population isn't Normal).
Mean: \(\mu_{\bar{x}} = \mu = 20\)
Standard error: \(\sigma_{\bar{x}} = 8/\sqrt{64} = 8/8 = \mathbf{1}\)
In a large city, 35% of residents own a bicycle. A random sample of 200 residents is selected. What is the standard deviation of the sampling distribution of p̂?
- A 0.350
- B 0.0337
- C 0.00114
- D 0.0477
- E 0.455
Answer: B. \(\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.35(0.65)}{200}} = \sqrt{\frac{0.2275}{200}} = \sqrt{0.0011375} \approx \mathbf{0.0337}\)
Choice C (0.00114) is the variance \(p(1-p)/n\) before taking the square root.
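A quick arithmetic check, as a minimal Python sketch. Since \(\sqrt{0.0011375} \approx 0.033727\), rounding to four decimal places gives 0.0337.

```python
from math import sqrt

p, n = 0.35, 200
variance = p * (1 - p) / n   # p(1-p)/n; rounds to 0.00114
sd = sqrt(variance)          # standard deviation of p-hat
print(f"variance = {variance:.5f}")
print(f"SD = {sd:.6f} -> rounded to 4 decimals: {sd:.4f}")
```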
A researcher uses a voluntary response survey to estimate the average daily screen time of teenagers in a city. She uses a very large sample of 5,000 respondents. Which of the following best describes the estimate?
- A The estimate will have low bias and low variability due to the large sample size.
- B The estimate will have low bias because of the large sample, but high variability.
- C The estimate will likely be biased; the large sample size does not correct for the voluntary response bias.
- D The estimate will be unbiased because voluntary response samples are representative.
- E The estimate will be exact because the sample size is very large.
Answer: C. Voluntary response sampling is biased: people who feel strongly (those with very high or very low screen time) are more likely to respond. A large sample size cannot fix bias in the sampling method. A large biased sample just gives a more precise estimate of the wrong thing.
The weights of apples at a farm are Normally distributed with mean μ = 180 grams and standard deviation σ = 15 grams. A random sample of 25 apples is selected. What is the probability that the sample mean weight is less than 174 grams?
- A 0.3446
- B 0.1587
- C 0.0548
- D 0.0228
- E 0.0062
Answer: D. Standard error: \(\sigma_{\bar{x}} = 15/\sqrt{25} = 15/5 = 3\) grams
\(z = \frac{174 - 180}{3} = \frac{-6}{3} = -2.0\)
\(P(\bar{x} < 174) = P(Z < -2.0) \approx \mathbf{0.0228}\)
Note: If you mistakenly used σ = 15 instead of σ/√n = 3, you'd get z = −0.4 → P ≈ 0.3446 (choice A). Always use the standard error for sampling distribution problems!
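The gap between the right and wrong approaches is easy to see side by side. This sketch assumes `statistics.NormalDist` (Python 3.8+) and reproduces choices D and A.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 180, 15, 25
se = sigma / sqrt(n)                          # 3.0 grams

correct = NormalDist(mu, se).cdf(174)         # uses the standard error
mistake = NormalDist(mu, sigma).cdf(174)      # wrongly uses sigma
print(f"correct: P = {correct:.4f}")          # correct: P = 0.0228
print(f"mistake: P = {mistake:.4f}")          # mistake: P = 0.3446
```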
Free Response Questions
Write your full solution before reading the model solution. Always state and verify conditions, show z-score work, and answer in context.
FRQ 1 — Sampling Distribution of x̄
Suggested time: ~12 minutes. Model Solution:
(a) Describe the sampling distribution:
Shape: Approximately Normal. Although the population is right-skewed, the sample size n = 49 ≥ 30, so by the Central Limit Theorem, the sampling distribution of \(\bar{x}\) is approximately Normal.
Center: \(\mu_{\bar{x}} = \mu = 14\) hours
Spread: \(\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{6}{\sqrt{49}} = \frac{6}{7} \approx 0.857\) hours
(b) P(\(\bar{x}\) > 15.5):
\(z = \frac{15.5 - 14}{6/\sqrt{49}} = \frac{1.5}{0.857} \approx 1.75\)
\(P(\bar{x} > 15.5) = P(Z > 1.75) \approx 1 - 0.9599 = \mathbf{0.0401}\)
There is approximately a 4% chance that a random sample of 49 students has a mean study time greater than 15.5 hours.
(c) Would n = 9 give the same probability?
No. With n = 9, the CLT does not apply: n < 30 and the population is right-skewed (not Normal), so we cannot assume the sampling distribution of \(\bar{x}\) is approximately Normal, and the Normal probability calculation in part (b) would not be valid. The spread would also change, since \(\sigma_{\bar{x}} = 6/\sqrt{9} = 2\) hours instead of about 0.857 hours.
✓ AP grading tips: Part (a) must explicitly mention the CLT and state that n ≥ 30. Part (c) must say that the CLT does not apply and explain why, not just that "the answer would be different."
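Parts (b) and (c) can be sketched in Python, assuming `statistics.NormalDist` and the values \(\mu = 14\) and \(\sigma = 6\) hours used in the model solution. `standard_error` and `prob_mean_above` are hypothetical helper names; the function does not check conditions, which is exactly why part (c) matters.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 14, 6  # population mean and SD in hours, as used in the model solution

def standard_error(n):
    return sigma / sqrt(n)

def prob_mean_above(x, n):
    """P(x-bar > x) under a Normal model for x-bar.
    Only meaningful when that model is justified (Normal population or n >= 30)."""
    return 1 - NormalDist(mu, standard_error(n)).cdf(x)

p_49 = prob_mean_above(15.5, 49)
print(f"n = 49: SE = {standard_error(49):.3f}, P(x-bar > 15.5) = {p_49:.4f}")

# With n = 9 the spread changes too, but the population is right-skewed and
# n < 30, so prob_mean_above(15.5, 9) would not be a valid answer.
print(f"n = 9:  SE = {standard_error(9):.3f}")
```

For n = 49 this gives a standard error of about 0.857 hours and a probability of about 0.0401. For n = 9 the standard error is 2 hours, but the sketch deliberately stops short of a Normal probability there.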
FRQ 2 — Sampling Distribution of p̂
Suggested time: ~12 minutes. Model Solution:
(a) Check conditions for Normality:
Random: The sample is randomly selected. ✓
Large Counts:
\(np = 150(0.42) = 63 \geq 10\) ✓
\(n(1-p) = 150(0.58) = 87 \geq 10\) ✓
Both conditions are satisfied, so the sampling distribution of \(\hat{p}\) can be approximated by a Normal distribution.
(b) Mean and Standard Deviation:
\(\mu_{\hat{p}} = p = \mathbf{0.42}\)
\(\sigma_{\hat{p}} = \sqrt{\frac{0.42(0.58)}{150}} = \sqrt{\frac{0.2436}{150}} = \sqrt{0.001624} \approx \mathbf{0.0403}\)
(c) P(0.38 < \(\hat{p}\) < 0.48):
\(z_1 = \frac{0.38 - 0.42}{0.0403} \approx -0.99\) | \(z_2 = \frac{0.48 - 0.42}{0.0403} \approx 1.49\)
\(P(-0.99 < Z < 1.49) = P(Z < 1.49) - P(Z < -0.99)\)
\(\approx 0.9319 - 0.1611 = \mathbf{0.7708}\)
There is approximately a 77% chance that the sample proportion falls between 0.38 and 0.48.
(d) Effect of doubling sample size:
Doubling \(n\) from 150 to 300 reduces the standard deviation by a factor of \(\sqrt{2} \approx 1.414\).
\(\sigma_{\hat{p}} = \sqrt{\frac{0.42(0.58)}{300}} = \sqrt{\frac{0.2436}{300}} \approx \sqrt{0.000812} \approx \mathbf{0.0285}\)
The standard deviation decreased from 0.0403 to 0.0285. Note that doubling the sample size does not halve the standard deviation — it only reduces it by a factor of √2 ≈ 1.41. To halve the standard deviation, you must quadruple the sample size.
✓ AP tip: For (a) you must write out np ≥ 10 AND n(1-p) ≥ 10 with numbers. For (d), many students say "the variability decreases" without calculating or explaining the √n relationship — full credit requires the calculation.
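Parts (b) through (d) in Python, as a sketch assuming `statistics.NormalDist` (Python 3.8+); `phat_sd` is a hypothetical helper name. Because it uses unrounded z-scores, the probability in (c) comes out near 0.771 rather than exactly 0.7708; the difference is only table rounding.

```python
from math import sqrt
from statistics import NormalDist

p = 0.42

def phat_sd(n):
    """Standard deviation of p-hat for sample size n."""
    return sqrt(p * (1 - p) / n)

# Parts (b) and (c): Normal model for p-hat with n = 150
model = NormalDist(p, phat_sd(150))
prob_between = model.cdf(0.48) - model.cdf(0.38)
print(f"SD (n = 150) = {phat_sd(150):.4f}")
print(f"P(0.38 < p-hat < 0.48) = {prob_between:.4f}")

# Part (d): doubling n divides the SD by sqrt(2); quadrupling n halves it
print(f"SD (n = 300) = {phat_sd(300):.4f}")
print(f"ratio, n = 150 vs 300: {phat_sd(150) / phat_sd(300):.3f}")
print(f"ratio, n = 150 vs 600: {phat_sd(150) / phat_sd(600):.3f}")
```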