Why t and Not z?
In Unit 6 we used z-procedures for proportions because we could compute the standard deviation of \(\hat{p}\) directly from \(p_0\). For means, the situation is different — the population standard deviation \(\sigma\) is almost never known in practice.
The z-test for means requires a known \(\sigma\). Because we rarely have it, we estimate \(\sigma\) with the sample standard deviation \(s\).
But replacing \(\sigma\) with \(s\) introduces extra uncertainty. To account for this, we use the t-distribution instead of the Normal distribution — and the t-distribution has heavier tails to reflect that extra uncertainty.
The t-Distribution
Shape: Symmetric and bell-shaped, like the Normal — but with heavier tails.
Degrees of freedom (df): One parameter that controls the shape. For one-sample procedures: \(df = n - 1\).
As df increases: The t-distribution approaches the Standard Normal. By df = 30+, they are very close.
Critical value \(t^*\): Always larger than the corresponding z* for the same confidence level — because t-distributions have heavier tails.
The AP exam provides a t-table. To use it, find the row for your degrees of freedom (df = n − 1) and the column for your confidence level or tail probability.
Your calculator can also compute t* and p-values directly. Know both methods.
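As a rough check on the claims above (t* is larger than z*, and the gap shrinks as df grows), the sketch below computes t* numerically using only the Python standard library. The helper names `t_pdf`, `t_cdf`, and `t_star` are illustrative assumptions, not part of any AP-provided tool. On the exam you would read t* from the table or use your calculator's inverse-t function.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=1000):
    """P(T <= x), using Simpson's rule on [0, |x|] and symmetry about 0."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # approximates P(0 <= T <= |x|)
    return 0.5 + area if x >= 0 else 0.5 - area

def t_star(confidence, df):
    """Two-sided critical value, found by bisection on the CDF."""
    target = 1 - (1 - confidence) / 2
    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Compare 95% critical values for several df against z*.
z_star = NormalDist().inv_cdf(0.975)
for df in (5, 15, 17, 30, 100):
    print(f"df={df:>3}  t*={t_star(0.95, df):.3f}  z*={z_star:.3f}")
```

Each printed t* should sit above z*, with the difference narrowing as df increases.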
One-Sample t Confidence Interval for \(\mu\)
\(\bar{x} \pm t^* \cdot \frac{s}{\sqrt{n}}\)
\(t^*\) = critical value from the t-distribution with \(df = n-1\)
\(s/\sqrt{n}\) = standard error of the mean (SE)
Conditions for One-Sample t Procedures
R — Random: Data from a random sample or randomized experiment.
N — Normal/Large Sample: Either:
• Population is Normally distributed, OR
• \(n \geq 30\) (CLT applies), OR
• Sample size is small BUT no strong skew or outliers in the data (check with a dotplot or histogram)
I — Independent: \(n \leq 10\%\) of population (if sampling without replacement).
Proportions (Unit 6): Large Counts — \(np \geq 10\) and \(n(1-p) \geq 10\)
Means (Unit 7): Normal/Large Sample — population Normal, OR \(n \geq 30\), OR small sample with no outliers/skew. These are completely different conditions for different procedures — don't mix them up.
A random sample of 16 students has a mean study time of \(\bar{x} = 8.5\) hours/week with \(s = 2.4\) hours. Construct a 95% CI for the true mean study time. Assume the population is approximately Normal.
Conditions: Random ✓ | Normal (stated) ✓ | Independent (16 < 10% of all students) ✓
df = n − 1 = 15 | t* = 2.131 (from t-table, df=15, 95% CI)
\(SE = s/\sqrt{n} = 2.4/\sqrt{16} = 2.4/4 = 0.6\)
\(CI = 8.5 \pm 2.131(0.6) = 8.5 \pm 1.279\)
Interval: (7.221, 9.779) hours
✓ Interpretation: "We are 95% confident that the true mean study time for all students is between 7.22 and 9.78 hours per week."
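The arithmetic in this example can be reproduced with a few lines of Python. This is only a sketch: the function name `one_sample_t_ci` is an assumption made for illustration, and the critical value 2.131 is passed in as the t-table value for df = 15 rather than computed.

```python
import math

def one_sample_t_ci(xbar, s, n, t_star):
    """Return (lower, upper) for xbar ± t* · s/sqrt(n)."""
    se = s / math.sqrt(n)   # standard error of the mean
    margin = t_star * se    # margin of error
    return xbar - margin, xbar + margin

# Study-time example: n = 16, so df = 15 and t* = 2.131 from the t-table.
low, high = one_sample_t_ci(xbar=8.5, s=2.4, n=16, t_star=2.131)
print(f"({low:.3f}, {high:.3f})")  # (7.221, 9.779)
```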
One-Sample t Test for \(\mu\)
\(t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}\)
\(df = n - 1\) | p-value found from the t-distribution with that df
| Situation | Use | Why |
|---|---|---|
| Inference for a proportion | z-procedure | SE uses p₀ or p̂, which is known |
| Inference for a mean, σ known | z-procedure | Rare in practice — σ almost never known |
| Inference for a mean, σ unknown | t-procedure | Must estimate σ with s — adds uncertainty |
The label on a bottling machine claims it fills bottles with 500 mL on average. A quality inspector takes a random sample of 25 bottles and finds \(\bar{x} = 497.3\) mL and \(s = 6.8\) mL. Is there evidence that the machine is underfilling? Use α = 0.05.
Step 1 — Hypotheses: \(H_0: \mu = 500\) vs \(H_a: \mu < 500\) (one-sided left)
Step 2 — Conditions: Random ✓ | Normal/Large Sample: n = 25 is below 30, so we assume the sample of fill amounts shows no strong skew or outliers ✓ | Independent ✓
Step 3 — Calculate:
\(t = \frac{497.3 - 500}{6.8/\sqrt{25}} = \frac{-2.7}{1.36} \approx -1.985\)
\(df = 24\) | p-value = P(t < −1.985) ≈ 0.029
Step 4 — Conclude: Since p-value (0.029) < α (0.05), we reject \(H_0\). There is convincing evidence that the true mean fill amount is less than 500 mL — the machine appears to be underfilling.
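To see where the test statistic and p-value in Step 3 come from, here is a minimal standard-library sketch. The helper names (`t_pdf`, `t_cdf`, `one_sample_t_test_lower`) are assumptions made for illustration, and the p-value is approximated by numerical integration rather than by a calculator's tcdf function.

```python
import math

def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=1000):
    """P(T <= x), using Simpson's rule on [0, |x|] and symmetry about 0."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def one_sample_t_test_lower(xbar, mu0, s, n):
    """Left-tailed test of H0: mu = mu0 against Ha: mu < mu0."""
    se = s / math.sqrt(n)
    t = (xbar - mu0) / se
    p_value = t_cdf(t, n - 1)  # area to the left of t
    return t, p_value

# Bottling example: xbar = 497.3, mu0 = 500, s = 6.8, n = 25.
t, p = one_sample_t_test_lower(497.3, 500, 6.8, 25)
print(f"t = {t:.3f}, p-value = {p:.3f}")
print("reject H0" if p < 0.05 else "fail to reject H0")
```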
Two-Sample t Procedures
When comparing means from two independent groups, we use two-sample t procedures. The two groups must be independent — observations in one group do not affect the other.
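\(t = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}\)
\((\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}\)
On the AP exam, always use technology to find df for two-sample t procedures; the df formula is not required.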
Unlike the two-proportion z-test which uses a pooled proportion, the AP Statistics course does not use a pooled standard deviation for two-sample t-tests. Always use the unpooled formula shown above, which uses \(s_1\) and \(s_2\) separately. Pooling is an extra assumption that AP Statistics avoids.
Do males and females differ in sleep duration? Random samples: Males (n₁=20): \(\bar{x}_1=7.1\)h, \(s_1=1.2\)h. Females (n₂=25): \(\bar{x}_2=7.6\)h, \(s_2=0.9\)h. Test at α = 0.05.
Hypotheses: \(H_0: \mu_1 = \mu_2\) vs \(H_a: \mu_1 \neq \mu_2\)
Conditions: Random: both are random samples ✓ | Normal/Large Sample: n₁ = 20 and n₂ = 25 are both below 30, so we assume neither sample shows strong skew or outliers ✓ | Independent groups ✓
\(t = \frac{(7.1-7.6)-0}{\sqrt{1.44/20 + 0.81/25}} = \frac{-0.5}{\sqrt{0.072+0.0324}} = \frac{-0.5}{\sqrt{0.1044}} = \frac{-0.5}{0.323} \approx -1.55\)
Using calculator: df ≈ 34, p-value = 2P(t < −1.55) ≈ 0.130
Since p-value (0.130) > α (0.05), fail to reject \(H_0\). There is not convincing evidence of a difference in mean sleep duration between males and females.
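For readers curious about what the calculator does to get df here, the sketch below computes the unpooled t statistic along with the Welch-Satterthwaite approximation for df, which is the formula two-sample t technology typically uses. The function name `welch_t` is an assumption for illustration. The df formula itself is not required on the AP exam.

```python
import math

def welch_t(x1, s1, n1, x2, s2, n2):
    """Unpooled two-sample t statistic and approximate df."""
    v1 = s1 ** 2 / n1   # s1^2 / n1
    v2 = s2 ** 2 / n2   # s2^2 / n2
    se = math.sqrt(v1 + v2)
    t = (x1 - x2) / se
    # Welch-Satterthwaite approximation for degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Sleep example: males (7.1, 1.2, 20) vs. females (7.6, 0.9, 25).
t, df = welch_t(7.1, 1.2, 20, 7.6, 0.9, 25)
print(f"t = {t:.2f}, df = {df:.1f}")
```

The df lands between the smaller of n₁ − 1 and n₂ − 1 and the total n₁ + n₂ − 2, which is why it is usually not a whole number.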
Matched Pairs Design
When observations come in natural pairs — the same subject measured twice, or two subjects matched on relevant characteristics — we use a matched pairs t-test. This is actually just a one-sample t-test on the differences.
Calculate the difference for each pair: \(d_i = x_{1i} - x_{2i}\)
Then treat the differences as a single dataset and run a one-sample t-test on \(\{d_i\}\).
Hypotheses: \(H_0: \mu_d = 0\) vs \(H_a: \mu_d \neq 0\) (or >0 or <0)
Matched pairs: Same subject measured twice (before/after), or subjects paired by a common characteristic. n pairs → one dataset of n differences.
Two-sample t: Two completely separate, independent groups of subjects. Two separate datasets. The AP exam frequently tests your ability to identify which design was used.
Conditions Summary & Procedure Selection
| Procedure | When to Use | df | Key Formula |
|---|---|---|---|
| One-sample t CI | Estimating single population mean μ | \(n-1\) | \(\bar{x} \pm t^* \cdot s/\sqrt{n}\) |
| One-sample t test | Testing claim about single μ | \(n-1\) | \(t = (\bar{x}-\mu_0)/(s/\sqrt{n})\) |
| Two-sample t CI | Estimating difference \(\mu_1-\mu_2\) for two independent groups | Use calculator | \((\bar{x}_1-\bar{x}_2) \pm t^*\sqrt{s_1^2/n_1+s_2^2/n_2}\) |
| Two-sample t test | Testing \(\mu_1=\mu_2\) for two independent groups | Use calculator | \(t = (\bar{x}_1-\bar{x}_2)/\sqrt{s_1^2/n_1+s_2^2/n_2}\) |
| Matched pairs t | Two paired measurements — same subjects or matched pairs | \(n-1\) (n = # pairs) | \(t = \bar{d}/(s_d/\sqrt{n})\) |
✓ State hypotheses (test) or confidence level (CI) with correct parameter notation (\(\mu\), not \(\bar{x}\))
✓ Check ALL three conditions: Random, Normal/Large Sample, Independent
✓ For Normal condition with small samples: mention "no strong skew or outliers"
✓ Show formula, substitute values, compute t and df
✓ Conclude in context: "convincing evidence" (reject \(H_0\)) or "not convincing evidence" (fail to reject \(H_0\))
Multiple Choice Questions
Try each question before reading the answer and explanation that follow it.
A researcher wants to test whether the mean commute time for workers in a city exceeds 30 minutes. She takes a random sample of 22 workers and records their commute times. The population standard deviation is unknown. Which test is most appropriate?
- (A) One-sample z-test for a mean
- (B) One-sample t-test for a mean
- (C) Two-sample t-test for a difference in means
- (D) One-sample z-test for a proportion
- (E) Matched pairs t-test
Answer: B. One group, one mean, σ unknown → one-sample t-test. We can't use z because σ is unknown. There is only one group and no pairing, so the two-sample and matched pairs tests are ruled out.
A one-sample t-test is conducted using a sample of size n = 18. What are the degrees of freedom, and how does the corresponding t* for 95% confidence compare to z* = 1.960?
- (A) df = 18; t* = 1.960, same as z*
- (B) df = 17; t* < 1.960
- (C) df = 17; t* > 1.960
- (D) df = 18; t* > 1.960
- (E) df = 17; t* = 1.960
Answer: C. \(df = n - 1 = 17\). The t-distribution with df = 17 has heavier tails than the Normal, so t* for 95% confidence is larger than z* = 1.960; specifically, t* ≈ 2.110. This always holds: t* > z* for any finite df.
Researchers study the effect of a new training program on employee productivity. They measure each employee's productivity score before the training and again after. Which analysis is most appropriate?
- (A) Two-sample t-test comparing before scores to after scores
- (B) One-sample z-test for the mean difference
- (C) Matched pairs t-test on the differences (after − before)
- (D) Two-sample t-test for the difference in means
- (E) Chi-square test for independence
Answer: C. The same employees are measured twice (before and after), so this is a matched pairs design. The correct analysis is to compute the difference for each employee and run a one-sample t-test on those differences. A two-sample t-test would be wrong here because the two sets of measurements are not independent: they come from the same people.
A researcher has a random sample of 8 measurements. A dotplot of the data shows a slight right skew and no outliers. Is it appropriate to use a one-sample t-procedure?
- (A) No, because n < 30 so the CLT does not apply.
- (B) No, because the data is skewed so we cannot use t-procedures.
- (C) Yes, because the sample size is large enough.
- (D) Yes, because with no strong skew or outliers, the Normal condition is satisfied even for small samples.
- (E) Only if the population standard deviation is known.
Answer: D. For small samples, the Normal condition is met if the data show no strong skew or outliers. The CLT guideline (n ≥ 30) is only one way to satisfy the condition. A slight skew with no outliers is acceptable. (A) is too strict: n ≥ 30 is sufficient but not necessary.
A 95% confidence interval for the mean number of hours per week adults spend on social media is (8.2, 14.6). A researcher claims that adults spend more than 10 hours per week on average. Is this claim supported?
- (A) Yes, because the interval is entirely above 8 hours.
- (B) No, because 10 is contained in the interval, so the claim is refuted.
- (C) The interval does not support or refute the claim — we need a hypothesis test.
- (D) No, because we cannot conclude μ > 10 since values below 10 (like 8.2) are also plausible.
- (E) Yes, because the midpoint of the interval (11.4) is above 10.
Answer: D. The interval (8.2, 14.6) contains values both below and above 10. Because values below 10, such as 8.2 and 9.5, are plausible, we cannot conclude that μ > 10; the entire interval would need to lie above 10 to support the claim. (C) is tempting but wrong: a CI does give information about one-sided claims.
Free Response Questions
Always use the 4-step procedure. State conditions carefully — the Normal condition for means is different from proportions.
FRQ 1 — Matched Pairs t-Test
~15 minutes
After: 141, 153, 149, 160, 138, 157, 150, 164
✓ Model Solution
(a) Why matched pairs:
The same 8 patients are measured twice — before and after the diet. The two measurements for each patient are not independent: a patient with naturally high blood pressure will tend to have high readings both before and after. By taking differences within each patient, we control for individual variation in baseline blood pressure. A two-sample test would be inappropriate because the two groups (before/after) are not independent samples.
(b) Differences (Before − After):
7, 9, 6, 11, 4, 9, 8, 11
\(\bar{d} = (7+9+6+11+4+9+8+11)/8 = 65/8 = \mathbf{8.125}\) mmHg
\(s_d\): deviations from mean: −1.125, 0.875, −2.125, 2.875, −4.125, 0.875, −0.125, 2.875
\(s_d = \sqrt{\frac{\sum(d_i-\bar{d})^2}{n-1}} = \sqrt{\frac{1.266+0.766+4.516+8.266+17.016+0.766+0.016+8.266}{7}} = \sqrt{40.875/7} \approx \mathbf{2.416}\)
(c) Matched Pairs t-Test — 4 Steps:
Step 1 — Hypotheses: Let \(\mu_d\) = true mean difference (Before − After) in blood pressure.
\(H_0: \mu_d = 0\) vs \(H_a: \mu_d > 0\) (one-sided: we're testing whether the diet reduces BP, i.e., differences are positive)
Step 2 — Conditions: Random (stated) ✓ | Normal: n=8 is small, but the differences (7,9,6,11,4,9,8,11) show no strong skew or outliers ✓ | Independent: patients are independent of each other ✓
Step 3 — Calculate:
\(t = \frac{\bar{d} - 0}{s_d/\sqrt{n}} = \frac{8.125}{2.416/\sqrt{8}} = \frac{8.125}{0.854} \approx \mathbf{9.51}\)
\(df = n-1 = 7\) | p-value = P(t > 9.51) with df = 7, which is < 0.0001
Step 4 — Conclude: Since p-value (< 0.0001) < α (0.05), we reject \(H_0\). There is very convincing evidence that the new diet reduces systolic blood pressure on average.
✓ AP tip: Part (a) must say "same subjects measured twice" — not just "they're paired." Part (c) hypothesis must use μ_d (the mean difference), not μ₁ and μ₂.
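The summary statistics in part (b) and the test statistic in part (c) can be double-checked with Python's `statistics` module. This sketch starts from the eight Before − After differences listed in the solution, and the variable names are illustrative choices.

```python
import math
import statistics

# Differences (Before − After) from part (b) of the model solution.
diffs = [7, 9, 6, 11, 4, 9, 8, 11]

n = len(diffs)
d_bar = statistics.mean(diffs)   # sample mean of the differences
s_d = statistics.stdev(diffs)    # sample SD (divides by n − 1)
se = s_d / math.sqrt(n)
t = (d_bar - 0) / se             # one-sample t statistic on the differences

print(f"d_bar = {d_bar:.3f}, s_d = {s_d:.3f}, t = {t:.2f}, df = {n - 1}")
```

Once the differences are formed, nothing in the calculation refers to the original two columns: it is an ordinary one-sample t computation.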
FRQ 2 — Two-Sample t Confidence Interval
~12 minutes
✓ Model Solution
(a) Conditions:
Random: Both samples are random samples from their respective schools. ✓
Normal/Large Sample: \(n_A = 30 \geq 30\) ✓ and \(n_B = 35 \geq 30\) ✓ — CLT applies for both groups.
Independent: The two schools are separate, independent groups. 30 and 35 students are each less than 10% of their school's population. ✓
(b) 95% Confidence Interval:
\(SE = \sqrt{\frac{s_A^2}{n_A} + \frac{s_B^2}{n_B}} = \sqrt{\frac{72.25}{30} + \frac{125.44}{35}} = \sqrt{2.408 + 3.584} = \sqrt{5.992} \approx 2.448\)
\(CI = (74.2 - 69.8) \pm 1.999(2.448) \approx 4.4 \pm 4.89\), using \(t^* \approx 1.999\) from technology (df ≈ 62)
Interval: (−0.49, 9.29)
(c) Interpretation and conclusion:
We are 95% confident that the true difference in mean reading scores (School A minus School B) is between −0.49 and 9.29 points.
Since the interval contains 0, we do not have convincing evidence of a difference in mean reading scores between the two schools at the 95% confidence level. Both positive and negative differences are plausible — we cannot conclude that one school outperforms the other.
✓ AP tip: For two-sample problems, check whether 0 is in the CI. For a two-sided test at the matching significance level (95% CI ↔ α = 0.05): if 0 is in the interval → fail to reject H₀; if not → reject H₀. This connects CIs to hypothesis tests.
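As a cross-check on part (b), the sketch below rebuilds the interval from the summary statistics using only the Python standard library. It computes the Welch-Satterthwaite df and the matching t* numerically, so its t* may differ in the third decimal from a value taken from a table or from a rounded df. All function names are assumptions made for illustration, and the inputs are the sample variances 72.25 and 125.44 shown in the SE calculation.

```python
import math

def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=1000):
    """P(T <= x), using Simpson's rule on [0, |x|] and symmetry about 0."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_star(confidence, df):
    """Two-sided critical value, found by bisection on the CDF."""
    target = 1 - (1 - confidence) / 2
    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def two_sample_t_ci(x1, var1, n1, x2, var2, n2, confidence=0.95):
    """Unpooled CI for mu1 − mu2 from means, sample variances, and sizes."""
    v1, v2 = var1 / n1, var2 / n2
    se = math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    crit = t_star(confidence, df)
    margin = crit * se
    diff = x1 - x2
    return (diff - margin, diff + margin), df, crit

# School A: mean 74.2, variance 72.25, n = 30. School B: 69.8, 125.44, 35.
(low, high), df, crit = two_sample_t_ci(74.2, 72.25, 30, 69.8, 125.44, 35)
print(f"df = {df:.1f}, t* = {crit:.3f}, CI = ({low:.2f}, {high:.2f})")
print("0 is inside the interval" if low < 0 < high else "0 is outside the interval")
```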