Unit 5: Sampling Distributions

Section 5.1

What Is a Sampling Distribution?

In real life, we collect one sample and compute a statistic (like \(\bar{x}\) or \(\hat{p}\)). But imagine repeating the sampling process thousands of times — each time getting a slightly different value for the statistic. The distribution of all those values is called the sampling distribution.
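
The "imagine repeating the sampling process" idea can be tried directly with a short simulation. The sketch below is only an illustration: the population (10,000 roughly Normal values with mean 50 and SD 10), the sample size of 25, and the helper name `simulate_sample_means` are all assumptions chosen for this example, not part of the unit.

```python
import random
import statistics

def simulate_sample_means(population, n, reps, seed=0):
    """Draw `reps` simple random samples of size n; return the list of sample means."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.sample(population, n)) for _ in range(reps)]

# Hypothetical population (an assumption for this sketch).
pop_rng = random.Random(42)
population = [pop_rng.gauss(50, 10) for _ in range(10_000)]

means = simulate_sample_means(population, n=25, reps=2_000)

# Center and spread of the simulated sampling distribution of the sample mean.
print(round(statistics.fmean(means), 2))
print(round(statistics.stdev(means), 2))
```

A histogram of `means` would be a picture of the simulated sampling distribution of \(\bar{x}\): each entry is one value of the statistic from one sample.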

🔑 Key Distinction: Parameter vs Statistic

Parameter: A number describing the population. Fixed but usually unknown. Symbols: \(\mu\), \(\sigma\), \(p\).

Statistic: A number computed from a sample. Varies from sample to sample. Symbols: \(\bar{x}\), \(s\), \(\hat{p}\).

💡 Why This Matters

The sampling distribution is the theoretical foundation for all of statistical inference (Units 6–9). It tells us how much a sample statistic is likely to vary around the true population parameter, which is exactly what confidence intervals and hypothesis tests are built on.

Section 5.2

Bias & Variability of a Statistic

🔑 Two Properties of Estimators

Unbiased: The mean of the sampling distribution equals the true parameter, so the statistic is "on target" on average. For example, \(\bar{x}\) is an unbiased estimator of \(\mu\) because \(\mu_{\bar{x}} = \mu\).

Low variability: Repeated samples give similar values, so the sampling distribution has a small spread. Variability is controlled mainly by the sample size \(n\).

[Figure: Bias vs Variability: The Target Analogy]

💡 How to Reduce Each

Reduce bias: Use random sampling. Using a biased sampling method (convenience, voluntary response) creates bias that cannot be fixed by increasing sample size.

Reduce variability: Increase sample size \(n\). Larger samples produce sampling distributions with less spread — values cluster closer to the true parameter.
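
A small simulation can make the two remedies concrete. This is a sketch under stated assumptions: the population of 2,000 uniform values is made up, and the "biased" method (larger values are more likely to be picked) is just a stand-in for something like voluntary response, not a model of any real survey.

```python
import random
import statistics

rng = random.Random(1)
# Hypothetical population of 2,000 values (units arbitrary).
population = [rng.uniform(0, 10) for _ in range(2_000)]
true_mean = statistics.fmean(population)

def estimates(n, reps, biased):
    """Return `reps` sample means from samples of size n.

    biased=False: simple random sample (without replacement).
    biased=True: larger values are more likely to be chosen
    (an assumed stand-in for a biased sampling method).
    """
    out = []
    for _ in range(reps):
        if biased:
            sample = rng.choices(population, weights=population, k=n)
        else:
            sample = rng.sample(population, n)
        out.append(statistics.fmean(sample))
    return out

for n in (10, 400):
    for biased in (False, True):
        est = estimates(n, reps=500, biased=biased)
        bias = statistics.fmean(est) - true_mean   # off-target on average?
        spread = statistics.stdev(est)             # sample-to-sample variation
        print(f"n={n:4d} biased={biased!s:5} bias={bias:+.2f} spread={spread:.2f}")
```

In runs of this sketch, going from n = 10 to n = 400 shrinks the spread for both methods, but the bias of the biased method stays about the same size.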

Section 5.3

Sampling Distribution of the Sample Mean \(\bar{x}\)

When we take a random sample of size \(n\) from a population and compute \(\bar{x}\), the behavior of \(\bar{x}\) across all possible samples follows predictable rules.

Mean and Standard Deviation of the Sampling Distribution of x̄

\[ \mu_{\bar{x}} = \mu \] \[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \]

\(\mu\) = population mean | \(\sigma\) = population standard deviation | \(n\) = sample size
\(\sigma_{\bar{x}}\) is called the standard error of the mean.

⚠️ Standard Deviation vs Standard Error

Standard deviation (\(\sigma\)): Describes spread of individual values in the population.

Standard error (\(\sigma/\sqrt{n}\)): Describes spread of sample means — how much \(\bar{x}\) varies from sample to sample. Always smaller than \(\sigma\) when \(n > 1\).
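
To see how quickly the standard error shrinks relative to \(\sigma\), here is a minimal sketch of the formula \(\sigma/\sqrt{n}\). The function name `standard_error` and the choice of \(\sigma = 10\) are illustrative assumptions.

```python
import math

def standard_error(sigma, n):
    """Standard deviation of the sampling distribution of the sample mean."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return sigma / math.sqrt(n)

sigma = 10  # population standard deviation, chosen for illustration
for n in (1, 4, 25, 100):
    print(n, standard_error(sigma, n))
# 1 10.0
# 4 5.0
# 25 2.0
# 100 1.0
```

With n = 1 the standard error equals \(\sigma\); every larger n gives a smaller value.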

[Figure: Effect of Sample Size on the Sampling Distribution of x̄ (μ = 50, σ = 10)]

📌 Example: Standard Error Calculation

Heights of adult men are Normally distributed with μ = 70 inches and σ = 3 inches. A random sample of 36 men is taken.

Mean of sampling distribution: \(\mu_{\bar{x}} = 70\) inches

Standard error: \(\sigma_{\bar{x}} = \frac{3}{\sqrt{36}} = \frac{3}{6} = 0.5\) inches

P(\(\bar{x}\) > 71): \(z = \frac{71-70}{0.5} = 2.0\), so \(P(\bar{x} > 71) = P(Z > 2.0) \approx 0.0228\)

It would be quite unusual (only 2.28% chance) to get a sample mean above 71 inches with n=36.
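
The same calculation can be checked without a z-table. The sketch below uses `statistics.NormalDist` from the Python standard library; the variable names are just for this example.

```python
import math
from statistics import NormalDist

mu, sigma, n = 70, 3, 36

se = sigma / math.sqrt(n)            # standard error: 3 / 6 = 0.5
sampling_dist = NormalDist(mu, se)   # sampling distribution of the sample mean

p_above_71 = 1 - sampling_dist.cdf(71)
print(round(p_above_71, 4))  # 0.0228
```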

Section 5.4

The Central Limit Theorem (CLT)

The Central Limit Theorem is one of the most remarkable and important results in all of statistics. It tells us about the shape of the sampling distribution.

📐 Central Limit Theorem

For a random sample of size \(n\) from any population with mean \(\mu\) and standard deviation \(\sigma\):

When \(n\) is sufficiently large, the sampling distribution of \(\bar{x}\) is approximately Normal, regardless of the shape of the population distribution.

\(\bar{x} \sim N\!\left(\mu,\, \frac{\sigma}{\sqrt{n}}\right)\) approximately, for large \(n\)

[Figure: CLT in Action: Even a Skewed Population Produces a Normal Sampling Distribution]

🔑 When Can We Use the CLT?

If the population is Normal: The sampling distribution of \(\bar{x}\) is exactly Normal for any \(n\). No minimum sample size needed.

If the population is NOT Normal: The sampling distribution of \(\bar{x}\) is approximately Normal when \(n \geq 30\). This is the general rule of thumb for the AP exam.

Skewed populations: May need \(n\) larger than 30. More skew = larger \(n\) needed. The AP exam usually tells you if the sample is large enough.

⚠️ The CLT Is About x̄, Not Individual Values

The CLT says the sampling distribution of \(\bar{x}\) becomes Normal — NOT that the individual data values become Normal. Individual values from a skewed population remain skewed no matter how large \(n\) is.
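
One way to watch this happen is to measure skewness directly. The sketch below assumes an exponential population with mean 1 (a strongly right-skewed choice made only for illustration) and uses a simple moment-based `skewness` helper defined here, not a library function. With n = 1 the "sample means" are just individual values.

```python
import random
import statistics

def skewness(values):
    """Simple moment estimate of skewness: the mean of cubed z-scores."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)

def sample_means(n, reps, rng):
    """Means of `reps` samples of size n from an exponential population (mean 1)."""
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(reps)
    ]

rng = random.Random(7)
for n in (1, 5, 50):
    means = sample_means(n, reps=5_000, rng=rng)
    print(n, round(skewness(means), 2))
```

The skewness for n = 1 should stay large however many individual values are drawn, while the skewness of the sample means should move toward 0 (a symmetric, Normal-like shape) as n grows.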

Section 5.5

Sampling Distribution of the Sample Proportion \(\hat{p}\)

When a categorical variable has two outcomes (success/failure), we describe results using the sample proportion \(\hat{p}\). Its sampling distribution has similar properties to that of \(\bar{x}\).

Mean and Standard Deviation of the Sampling Distribution of p̂

\[ \mu_{\hat{p}} = p \] \[ \sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} \]

\(p\) = true population proportion | \(n\) = sample size
\(\hat{p}\) is an unbiased estimator of \(p\).
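
These two formulas can be checked by simulation. The sketch below is a rough check, assuming p = 0.60 and n = 100 purely for illustration; `simulate_p_hats` is a helper name invented for this example.

```python
import math
import random
import statistics

def simulate_p_hats(p, n, reps, seed=0):
    """Simulate `reps` samples of n success/failure trials; return each sample proportion."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]

p, n = 0.60, 100
p_hats = simulate_p_hats(p, n, reps=5_000)

print(round(statistics.fmean(p_hats), 3))      # should be close to p
print(round(statistics.stdev(p_hats), 3))      # should be close to the formula value
print(round(math.sqrt(p * (1 - p) / n), 3))    # 0.049
```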

📐 Large Counts Condition — When is p̂ Approximately Normal?

The sampling distribution of \(\hat{p}\) is approximately Normal when:

\[ np \geq 10 \quad \text{AND} \quad n(1-p) \geq 10 \]

This ensures there are at least 10 expected successes AND 10 expected failures. Both conditions must be checked and stated on the AP exam.
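
The check itself is mechanical, as this small sketch shows. The function name `large_counts_ok` and the second set of numbers (n = 50, p = 0.10) are made up for illustration.

```python
def large_counts_ok(n, p, threshold=10):
    """Return True if expected successes and expected failures both meet the threshold."""
    return n * p >= threshold and n * (1 - p) >= threshold

print(large_counts_ok(100, 0.60))  # True: 60 expected successes, 40 expected failures
print(large_counts_ok(50, 0.10))   # False: only 5 expected successes
```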

📌 Example: Sampling Distribution of p̂

Suppose 60% of voters support a ballot measure. A random sample of 100 voters is taken.

Mean: \(\mu_{\hat{p}} = p = 0.60\)

Standard deviation: \(\sigma_{\hat{p}} = \sqrt{\frac{0.60(0.40)}{100}} = \sqrt{0.0024} \approx 0.049\)

Check Large Counts: \(np = 100(0.60) = 60 \geq 10\) ✓ \(n(1-p) = 100(0.40) = 40 \geq 10\) ✓

Shape: Approximately Normal — both conditions met.

P(\(\hat{p} < 0.55\)): \(z = \frac{0.55-0.60}{0.049} \approx -1.02\), so \(P(\hat{p} < 0.55) \approx 0.154\)

There's about a 15.4% chance that a random sample of 100 voters gives a sample proportion below 0.55, even if the true proportion is 0.60.
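
The example's steps (check the condition, compute the standard deviation, then find the probability) can be bundled into one helper. This is a sketch, not a standard routine: `prob_p_hat_below` is a hypothetical name, and refusing to compute when Large Counts fails is a design choice for this example.

```python
import math
from statistics import NormalDist

def prob_p_hat_below(cutoff, p, n):
    """Normal-approximation P(p_hat < cutoff); raises if Large Counts is not met."""
    if n * p < 10 or n * (1 - p) < 10:
        raise ValueError("Large Counts condition not met")
    se = math.sqrt(p * (1 - p) / n)
    return NormalDist(p, se).cdf(cutoff)

print(round(prob_p_hat_below(0.55, 0.60, 100), 3))  # 0.154
```

The code does not round the z-score to two decimals first, so later digits can differ slightly from a table-based answer.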

Feature	Sampling Dist. of \(\bar{x}\)	Sampling Dist. of \(\hat{p}\)
Estimates	Population mean \(\mu\)	Population proportion \(p\)
Mean	\(\mu_{\bar{x}} = \mu\)	\(\mu_{\hat{p}} = p\)
Std Error	\(\sigma/\sqrt{n}\)	\(\sqrt{p(1-p)/n}\)
Normal when	Population Normal, OR \(n \geq 30\) (CLT)	\(np \geq 10\) AND \(n(1-p) \geq 10\)
Used for	Quantitative data	Categorical data (2 categories)

Section 5.6

Putting It All Together: The 3 Questions

For any sampling distribution problem on the AP exam, answer three questions:

📐 The 3-Step Framework for Sampling Distributions

1. CENTER: What is the mean of the sampling distribution?

For \(\bar{x}\): \(\mu_{\bar{x}} = \mu\) | For \(\hat{p}\): \(\mu_{\hat{p}} = p\)

2. SPREAD: What is the standard deviation (standard error)?

For \(\bar{x}\): \(\sigma/\sqrt{n}\) | For \(\hat{p}\): \(\sqrt{p(1-p)/n}\)

3. SHAPE: Is the sampling distribution approximately Normal?

For \(\bar{x}\): Population Normal OR \(n \geq 30\) (CLT) | For \(\hat{p}\): \(np \geq 10\) AND \(n(1-p) \geq 10\)

💡 AP Exam Must-Dos

Always state and check conditions before using Normal calculations — never just assume normality.

When asked for a probability about \(\bar{x}\) or \(\hat{p}\), always use the standard error (not \(\sigma\)) in your z-score formula:

\(z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}\) or \(z = \dfrac{\hat{p} - p}{\sqrt{p(1-p)/n}}\)

Using \(\sigma\) instead of \(\sigma/\sqrt{n}\) is a very common — and costly — AP exam error.
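
To see how much this error matters, the sketch below reuses the heights example from Section 5.3 (μ = 70, σ = 3, n = 36) and compares the correct z-score with the one obtained by forgetting \(\sqrt{n}\). The helper names are assumptions for this illustration.

```python
import math
from statistics import NormalDist

def z_for_mean(xbar, mu, sigma, n):
    """z-score for a sample mean, using the standard error sigma / sqrt(n)."""
    return (xbar - mu) / (sigma / math.sqrt(n))

def z_for_proportion(p_hat, p, n):
    """z-score for a sample proportion, using sqrt(p(1-p)/n)."""
    return (p_hat - p) / math.sqrt(p * (1 - p) / n)

std_normal = NormalDist()  # standard Normal: mean 0, SD 1

z_right = z_for_mean(71, 70, 3, 36)  # 2.0
z_wrong = (71 - 70) / 3              # mistake: sigma used instead of sigma / sqrt(n)

print(round(1 - std_normal.cdf(z_right), 4))  # 0.0228
print(round(1 - std_normal.cdf(z_wrong), 4))  # much larger, and wrong
```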

Exam Practice

Multiple Choice Questions

Try each question before reading the answer and explanation.

MCQ · Q1 Standard Error

A population has mean μ = 50 and standard deviation σ = 12. A random sample of size n = 36 is selected. What is the standard deviation of the sampling distribution of x̄?

A 12
B 6
C 3
D 2
E 1.39

✓ Correct Answer: D — 2

\(\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{12}{\sqrt{36}} = \frac{12}{6} = \mathbf{2}\)
This is the standard error — the spread of the sampling distribution of \(\bar{x}\). Note: 12 (choice A) is the population standard deviation, not the standard error. A common trap.

MCQ · Q2 Central Limit Theorem

A population is strongly right-skewed with mean μ = 20 and standard deviation σ = 8. A researcher takes a random sample of n = 64. Which of the following best describes the sampling distribution of x̄?

A Strongly right-skewed with mean 20 and standard deviation 8
B Strongly right-skewed with mean 20 and standard deviation 1
C Approximately Normal with mean 20 and standard deviation 8
D Approximately Normal with mean 20 and standard deviation 1
E Exactly Normal with mean 20 and standard deviation 1

✓ Correct Answer: D

By the CLT, since n = 64 ≥ 30, the sampling distribution of \(\bar{x}\) is approximately Normal (not exactly — E is wrong because the population isn't Normal).
Mean: \(\mu_{\bar{x}} = \mu = 20\)
Standard error: \(\sigma_{\bar{x}} = 8/\sqrt{64} = 8/8 = \mathbf{1}\)

MCQ · Q3 Sampling Distribution of p̂

In a large city, 35% of residents own a bicycle. A random sample of 200 residents is selected. What is the standard deviation of the sampling distribution of p̂?

A 0.350
B 0.0337
C 0.00114
D 0.0477
E 0.455

✓ Correct Answer: B — 0.0337

\(\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.35(0.65)}{200}} = \sqrt{\frac{0.2275}{200}} = \sqrt{0.0011375} \approx \mathbf{0.0337}\)
Note: 0.00114 (choice C) is the variance \(p(1-p)/n\); don't forget to take the square root.

MCQ · Q4 Bias & Variability

A researcher uses a voluntary response survey to estimate the average daily screen time of teenagers in a city. She uses a very large sample of 5,000 respondents. Which of the following best describes the estimate?

A The estimate will have low bias and low variability due to the large sample size.
B The estimate will have low bias because of the large sample, but high variability.
C The estimate will likely be biased; the large sample size does not correct for the voluntary response bias.
D The estimate will be unbiased because voluntary response samples are representative.
E The estimate will be exact because the sample size is very large.

✓ Correct Answer: C

Voluntary response sampling is biased — people who feel strongly (those with very high or very low screen time) are more likely to respond. A large sample size cannot fix bias in the sampling method. A large biased sample just gives a more precise estimate of the wrong thing.

MCQ · Q5 Normal Probability for x̄

The weights of apples at a farm are Normally distributed with mean μ = 180 grams and standard deviation σ = 15 grams. A random sample of 25 apples is selected. What is the probability that the sample mean weight is less than 174 grams?

A 0.3446
B 0.1587
C 0.0548
D 0.0228
E 0.0062

✓ Correct Answer: D — 0.0228

Standard error: \(\sigma_{\bar{x}} = 15/\sqrt{25} = 15/5 = 3\) grams
\(z = \frac{174 - 180}{3} = \frac{-6}{3} = -2.0\)
\(P(\bar{x} < 174) = P(Z < -2.0) \approx \mathbf{0.0228}\)
Note: If you mistakenly used σ = 15 instead of σ/√n = 3, you'd get z = −0.4 → P ≈ 0.3446 (choice A). Always use the standard error for sampling distribution problems!

Exam Practice

Free Response Questions

Write your full solution before reading the model solution. Always state and verify conditions, show z-score work, and answer in context.

FRQ 1 — Sampling Distribution of x̄

~12 minutes

A large university reports that the time students spend studying per week is right-skewed with a mean of μ = 14 hours and a standard deviation of σ = 6 hours. A random sample of 49 students is selected.

(a)

Describe the shape, center, and spread of the sampling distribution of \(\bar{x}\). Justify your answer.

(b)

Calculate the probability that the sample mean study time is greater than 15.5 hours.

(c)

Would the probability in part (b) be exactly the same if the sample size were n = 9 instead of n = 49? Explain.

✓ Model Solution

(a) Describe the sampling distribution:

Shape: Approximately Normal. Although the population is right-skewed, the sample size n = 49 ≥ 30, so by the Central Limit Theorem, the sampling distribution of \(\bar{x}\) is approximately Normal.

Center: \(\mu_{\bar{x}} = \mu = 14\) hours

Spread: \(\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{6}{\sqrt{49}} = \frac{6}{7} \approx 0.857\) hours

(b) P(\(\bar{x}\) > 15.5):

\(z = \frac{15.5 - 14}{6/\sqrt{49}} = \frac{1.5}{0.857} \approx 1.75\)

\(P(\bar{x} > 15.5) = P(Z > 1.75) \approx 1 - 0.9599 = \mathbf{0.0401}\)

There is approximately a 4% chance that a random sample of 49 students has a mean study time greater than 15.5 hours.

(c) Would n = 9 give the same probability?

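No. Two things change with n = 9. First, the standard error becomes \(6/\sqrt{9} = 2\) hours instead of 0.857 hours, so a Normal calculation would use \(z = (15.5 - 14)/2 = 0.75\) rather than 1.75. Second, and more importantly, the CLT does not apply because n < 30 and the population is right-skewed (not Normal), so we cannot use Normal probability calculations for the sampling distribution of \(\bar{x}\) with such a small sample. The probability calculation in part (b) would not be valid with n = 9.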

✓ AP grading tips: Part (a) must explicitly mention the CLT and state n ≥ 30. Part (c) must say the CLT does not apply and explain why; writing only "the answer would be different" is not enough.

FRQ 2 — Sampling Distribution of p̂

~12 minutes

A national survey found that 42% of adults report eating breakfast every day. A health researcher randomly selects 150 adults from a large city to survey.

(a)

Verify that the sampling distribution of \(\hat{p}\) can be approximated by a Normal distribution. State the conditions and check them.

(b)

Describe the mean and standard deviation of the sampling distribution of \(\hat{p}\).

(c)

Find the probability that the sample proportion is between 0.38 and 0.48.

(d)

If the researcher doubled the sample size to 300, describe the effect on the standard deviation of the sampling distribution. Calculate the new standard deviation.

✓ Model Solution

(a) Check conditions for Normality:

Random: The sample is randomly selected. ✓

Large Counts:

\(np = 150(0.42) = 63 \geq 10\) ✓

\(n(1-p) = 150(0.58) = 87 \geq 10\) ✓

Both conditions are satisfied, so the sampling distribution of \(\hat{p}\) can be approximated by a Normal distribution.

(b) Mean and Standard Deviation:

\(\mu_{\hat{p}} = p = \mathbf{0.42}\)

\(\sigma_{\hat{p}} = \sqrt{\frac{0.42(0.58)}{150}} = \sqrt{\frac{0.2436}{150}} = \sqrt{0.001624} \approx \mathbf{0.0403}\)

(c) P(0.38 < \(\hat{p}\) < 0.48):

\(z_1 = \frac{0.38 - 0.42}{0.0403} \approx -0.99\) | \(z_2 = \frac{0.48 - 0.42}{0.0403} \approx 1.49\)

\(P(-0.99 < Z < 1.49) = P(Z < 1.49) - P(Z < -0.99)\)

\(\approx 0.9319 - 0.1611 = \mathbf{0.7708}\)

There is approximately a 77% chance that the sample proportion falls between 0.38 and 0.48.
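
As a cross-check on the table lookup, the "between" probability can be computed as a difference of two cumulative probabilities. This sketch uses `statistics.NormalDist` and keeps the z-scores unrounded, so it agrees with the table answer only to about two decimal places.

```python
import math
from statistics import NormalDist

p, n = 0.42, 150
se = math.sqrt(p * (1 - p) / n)   # about 0.0403
dist = NormalDist(p, se)          # approximate sampling distribution of p-hat

# P(0.38 < p_hat < 0.48) = P(p_hat < 0.48) - P(p_hat < 0.38)
prob_between = dist.cdf(0.48) - dist.cdf(0.38)
print(round(prob_between, 2))  # 0.77
```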

(d) Effect of doubling sample size:

Doubling \(n\) from 150 to 300 reduces the standard deviation by a factor of \(\sqrt{2} \approx 1.414\).

\(\sigma_{\hat{p}} = \sqrt{\frac{0.42(0.58)}{300}} = \sqrt{\frac{0.2436}{300}} \approx \sqrt{0.000812} \approx \mathbf{0.0285}\)

The standard deviation decreased from 0.0403 to 0.0285. Note that doubling the sample size does not halve the standard deviation — it only reduces it by a factor of √2 ≈ 1.41. To halve the standard deviation, you must quadruple the sample size.
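
The \(\sqrt{2}\) factor can be read straight off the formula. As a short worked check, using only the quantities already defined above, compare the standard deviation at sample size \(2n\) with the one at \(n\):

\[ \frac{\sqrt{p(1-p)/(2n)}}{\sqrt{p(1-p)/n}} = \sqrt{\frac{n}{2n}} = \frac{1}{\sqrt{2}} \approx 0.707 \]

So \(0.0403 \times 0.707 \approx 0.0285\), matching the direct calculation. Replacing \(2n\) with \(4n\) gives \(1/\sqrt{4} = 1/2\), which is why quadrupling the sample size halves the standard deviation.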

✓ AP tip: For (a) you must write out np ≥ 10 AND n(1-p) ≥ 10 with numbers. For (d), many students say "the variability decreases" without calculating or explaining the √n relationship — full credit requires the calculation.