
Unit 8: Chi-Square Tests

Goodness of Fit · Homogeneity · Independence · Expected Counts · Selecting the Right Test

📊 2–5% of Exam ⏱ ~2 weeks

What Is a Chi-Square Test?

Units 6 and 7 handled inference for quantitative variables and proportions. Chi-square tests handle categorical data with two or more categories. Instead of asking "what is the mean?" we ask "does the distribution match what we expect?" or "are two categorical variables related?"

🔑 Three Chi-Square Tests at a Glance

Goodness of Fit: One sample. One categorical variable. Does the observed distribution match a claimed one?

Homogeneity: Multiple independent samples. One categorical variable. Do different populations share the same distribution?

Independence: One sample. Two categorical variables. Are the two variables associated?

Chi-Square Test Statistic — Used in ALL Three Tests
\[ \chi^2 = \sum_{\text{all cells}} \frac{(O - E)^2}{E} \]
\(O\) = observed count in a cell  |  \(E\) = expected count in that cell
Each term \((O-E)^2/E\) measures how far one cell's count is from what we expected.
A large \(\chi^2\) means large discrepancies from expected → evidence against \(H_0\).
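As a minimal sketch of the formula above, the statistic can be computed cell by cell in Python. The function name `chi_square_statistic` and the counts are hypothetical, chosen only for illustration.

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (O - E)^2 / E over all cells."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts for three categories (not taken from the text).
observed = [18, 22, 20]
expected = [20.0, 20.0, 20.0]

stat = chi_square_statistic(observed, expected)
print(f"chi-square = {stat:.3f}")  # 0.2 + 0.2 + 0.0 = 0.400
```

Each cell adds a non-negative term, so the statistic can only grow as observed counts move away from expected counts.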
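[Figure: Chi-Square Distribution: p-value Is Always the Right-Tail Area. Density curves for df = 2 (sharply peaked), df = 5, and df = 10, with the χ² value on the horizontal axis (0 to 25). Example marked: χ² = 18, with the p-value shaded as the right-tail area (df = 10).]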
💡 Key Properties of the Chi-Square Distribution

• Always right-skewed — never symmetric like Normal

• Values are always ≥ 0 (sum of squared terms)

• The p-value is always the right-tail area — there are no left-tail or two-sided chi-square tests

• As df increases, the distribution shifts right and becomes more symmetric

• Degrees of freedom: GOF uses df = k−1; Homogeneity and Independence use df = (r−1)(c−1)
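The right-tail rule can be made concrete with a small sketch. On the exam the p-value comes from a table or calculator; the helper below (`chi2_right_tail`, a hypothetical name) instead uses standard closed-form expressions that hold for integer df, relying only on the Python standard library. The inputs χ² = 18 and df = 10 are illustrative.

```python
import math


def chi2_right_tail(x, df):
    """Right-tail area P(X > x) for a chi-square variable with integer df >= 1."""
    if df < 1 or x < 0:
        raise ValueError("need df >= 1 and x >= 0")
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over k = 0 .. df/2 - 1 of (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite sum of x^(k - 1/2) / (1*3*...*(2k - 1)).
    term, total = math.sqrt(x), 0.0
    for k in range(1, (df - 1) // 2 + 1):
        if k > 1:
            term *= x / (2 * k - 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2 / math.pi) * math.exp(-half) * total


p_value = chi2_right_tail(18, 10)
print(f"P(chi-square > 18 | df = 10) = {p_value:.4f}")
```

Because only the right tail is used, a larger statistic always gives a smaller p-value for a fixed df.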

📐 Conditions for ALL Chi-Square Tests

Random: Data from a random sample or randomized experiment.

Large Counts: All expected counts ≥ 5. Check this AFTER computing expected counts.

Independent: Individual observations are independent. If sampling without replacement, \(n \leq 10\%\) of population.

⚠️ Always Check EXPECTED Counts, Not Observed

The Large Counts condition uses expected counts \(E\), not observed counts \(O\). You must compute all \(E\) values first, then verify each \(E \geq 5\). An observed count of 3 is fine as long as the corresponding expected count is ≥ 5.
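The distinction can be shown with a short sketch. The helper name `large_counts_ok` and all counts below are hypothetical; the point is only that the check is applied to \(E\), not \(O\).

```python
def large_counts_ok(expected_counts, minimum=5):
    """Large Counts condition: every EXPECTED count is at least `minimum`."""
    return all(e >= minimum for e in expected_counts)


# Hypothetical goodness-of-fit setting: n = 60 with four claimed proportions.
n = 60
claimed_proportions = [0.40, 0.30, 0.20, 0.10]
observed = [26, 19, 12, 3]  # one observed count is below 5

expected = [n * p for p in claimed_proportions]  # about 24, 18, 12, 6
print("Large Counts met:", large_counts_ok(expected))
print("Wrong check on observed counts would say:", large_counts_ok(observed))
```

Here the condition is met even though one observed count is 3, because every expected count is at least 5.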

Chi-Square Goodness of Fit Test

Use this test when you have one sample and want to know whether a single categorical variable follows a specific claimed distribution.

📐 Goodness of Fit — 4-Step Setup

H₀: The variable follows the claimed distribution. (State each claimed proportion.)

Hₐ: The variable does not follow the claimed distribution. (At least one proportion differs.)

Expected count: \(E_i = n \cdot p_i\) for each category

df = number of categories − 1 = \(k - 1\)

Test statistic: \(\chi^2 = \sum \frac{(O-E)^2}{E}\), p-value from chi-square table or calculator

[Figure: Goodness of Fit: Observed vs Expected Counts. Example: Are M&M colors equally distributed? n = 200. Observed (O): Red 48, Blue 38, Green 42, Yellow 36, Orange 36. Expected (E) if uniform: 200 × 0.20 = 40 for each color.]
📌 Full Worked Example: M&M Colors

A bag of 200 M&Ms: Red=48, Blue=38, Green=42, Yellow=36, Orange=36. The company claims equal proportions. Test at α = 0.05.

H₀: \(p_{red}=p_{blue}=p_{green}=p_{yellow}=p_{orange}=0.20\)

Hₐ: At least one proportion differs from 0.20.

Conditions: Random ✓   Expected counts: all = \(200×0.20=40 \geq 5\) ✓   Independent ✓

\(\chi^2 = \frac{(48-40)^2}{40}+\frac{(38-40)^2}{40}+\frac{(42-40)^2}{40}+\frac{(36-40)^2}{40}+\frac{(36-40)^2}{40} = 1.6+0.1+0.1+0.4+0.4 = \mathbf{2.6}\)

\(df = 5-1 = 4\), p-value ≈ 0.627

Since 0.627 > 0.05, fail to reject H₀. No convincing evidence that the color distribution differs from equal proportions.
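The worked example can be reproduced with a few lines of Python. This is a sketch under one stated assumption: the p-value line uses a closed form that holds only for df = 4 (for other df, use a table or calculator). Variable names are illustrative.

```python
import math

observed = {"Red": 48, "Blue": 38, "Green": 42, "Yellow": 36, "Orange": 36}
claimed = {color: 0.20 for color in observed}  # H0: equal proportions

n = sum(observed.values())
expected = {color: n * p for color, p in claimed.items()}

chi_sq = sum((observed[c] - expected[c]) ** 2 / expected[c] for c in observed)
df = len(observed) - 1

# Assumption: for df = 4 only, P(X > x) = exp(-x/2) * (1 + x/2).
p_value = math.exp(-chi_sq / 2) * (1 + chi_sq / 2)

print(f"chi-square = {chi_sq:.3f}, df = {df}, p-value = {p_value:.3f}")
```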

Chi-Square Test for Homogeneity

Use when you take separate random samples from two or more populations and ask: do these populations have the same distribution of a single categorical variable?

📐 Homogeneity Setup

H₀: The distribution of [categorical variable] is the same across all [populations/groups].

Hₐ: The distribution differs for at least one population.

Expected count for each cell: \(E = \dfrac{(\text{row total}) \times (\text{column total})}{\text{grand total}}\)

df = (number of rows − 1)(number of columns − 1) = \((r-1)(c-1)\)

📌 Full Worked Example: Learning Styles Across Schools

Three schools each provide a random sample of 100 students who indicate their preferred learning style.

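| School | Visual | Auditory | Read/Write | Total |
| --- | --- | --- | --- | --- |
| School A | 45 | 35 | 20 | 100 |
| School B | 38 | 40 | 22 | 100 |
| School C | 42 | 30 | 28 | 100 |
| Total | 125 | 105 | 70 | 300 |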

H₀: The distribution of learning styles is the same across all three schools.

Hₐ: The distribution differs for at least one school.

Expected counts: E(School A, Visual) = \(\frac{100×125}{300}=41.67\), E(School A, Auditory) = \(\frac{100×105}{300}=35.00\), E(School A, R/W) = \(\frac{100×70}{300}=23.33\) ... (all 9 cells computed similarly)

Check Large Counts: All E ≥ 5 ✓

\(df = (3-1)(3-1) = 4\)

\(\chi^2 = \frac{(45-41.67)^2}{41.67}+\frac{(35-35)^2}{35}+\cdots \approx \mathbf{3.51}\), p-value ≈ 0.477

Since 0.477 > 0.05 (using α = 0.05), fail to reject H₀. No convincing evidence of a difference in learning style distributions across the three schools.
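Because nine cells are involved, it is easy to slip when doing this by hand. The sketch below recomputes the full expected table, the statistic, and the p-value from the observed counts. The p-value line assumes the df = 4 closed form, which does not apply to other df; variable names are illustrative.

```python
import math

# Observed counts from the table: rows = schools, columns = learning styles.
observed = [
    [45, 35, 20],  # School A
    [38, 40, 22],  # School B
    [42, 30, 28],  # School C
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# E = (row total)(column total) / (grand total) for every cell.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi_sq = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Assumption: this closed form for the right-tail area is valid only for df = 4.
p_value = math.exp(-chi_sq / 2) * (1 + chi_sq / 2)

for exp_row in expected:
    print([round(e, 2) for e in exp_row])
print(f"chi-square = {chi_sq:.2f}, df = {df}, p-value = {p_value:.3f}")
```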

Chi-Square Test for Independence

Use when you have one sample and record two categorical variables for each individual. Ask: are these two variables associated in the population?

📐 Independence Setup

H₀: [Variable 1] and [Variable 2] are independent (no association) in the population.

Hₐ: [Variable 1] and [Variable 2] are associated in the population.

Everything else — expected counts, df, test statistic — is identical to the homogeneity test.

📌 Full Worked Example: Exercise & Health

A random sample of 200 adults: exercise frequency and self-rated health recorded for each person.

| Exercise | Good Health | Poor Health | Total |
| --- | --- | --- | --- |
| Regular | 72 | 18 | 90 |
| Occasional | 48 | 22 | 70 |
| Never | 25 | 15 | 40 |
| Total | 145 | 55 | 200 |

H₀: Exercise frequency and health rating are independent.   Hₐ: They are associated.

Expected counts:

E(Regular, Good) = \(\frac{90×145}{200}=65.25\)   E(Regular, Poor) = \(\frac{90×55}{200}=24.75\)

E(Occasional, Good) = \(\frac{70×145}{200}=50.75\)   E(Occasional, Poor) = \(\frac{70×55}{200}=19.25\)

E(Never, Good) = \(\frac{40×145}{200}=29.00\)   E(Never, Poor) = \(\frac{40×55}{200}=11.00\)

Large Counts: All expected ≥ 5 ✓   (smallest is 11.00)   df = (3−1)(2−1) = 2

\(\chi^2 = \frac{(72-65.25)^2}{65.25}+\frac{(18-24.75)^2}{24.75}+\frac{(48-50.75)^2}{50.75}+\frac{(22-19.25)^2}{19.25}+\frac{(25-29)^2}{29}+\frac{(15-11)^2}{11}\)

\(= 0.698+1.841+0.149+0.393+0.552+1.455 \approx \mathbf{5.087}\), p-value ≈ 0.079

Since 0.079 > 0.05, fail to reject H₀. There is not convincing evidence of an association between exercise frequency and health rating at α = 0.05. (Note: at α = 0.10 we would reject!)
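The sensitivity of the decision to α can be checked directly. This sketch recomputes the statistic from the table and compares the p-value with both significance levels. It assumes the df = 2 closed form for the right-tail area, exp(−χ²/2), which is specific to df = 2.

```python
import math

# Observed counts: (Good Health, Poor Health) for each exercise group.
observed = {
    "Regular": (72, 18),
    "Occasional": (48, 22),
    "Never": (25, 15),
}
col_totals = [sum(col) for col in zip(*observed.values())]
n = sum(col_totals)

chi_sq = 0.0
for counts in observed.values():
    row_total = sum(counts)
    for o, col_total in zip(counts, col_totals):
        e = row_total * col_total / n
        chi_sq += (o - e) ** 2 / e

# Assumption: for df = 2 the right-tail area has the closed form exp(-x/2).
p_value = math.exp(-chi_sq / 2)
print(f"chi-square = {chi_sq:.3f}, p-value = {p_value:.3f}")

for alpha in (0.05, 0.10):
    decision = "reject H0" if p_value < alpha else "fail to reject H0"
    print(f"alpha = {alpha:.2f}: {decision}")
```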

Selecting the Right Chi-Square Test

[Figure: Decision Guide: Which Chi-Square Test? One variable, one sample: Goodness of Fit (df = k − 1). One variable, two or more samples: Homogeneity (df = (r−1)(c−1)). One sample with two variables recorded: Independence (df = (r−1)(c−1)). All three tests use χ² = Σ (O−E)²/E and require all expected counts ≥ 5.]
| Test | Samples | Variables | Research Question | df |
| --- | --- | --- | --- | --- |
| Goodness of Fit | 1 sample | 1 categorical | Does the distribution match a claimed model? | k − 1 |
| Homogeneity | Multiple (one per population) | 1 categorical | Same distribution across populations? | (r−1)(c−1) |
| Independence | 1 sample | 2 categorical | Are the two variables associated? | (r−1)(c−1) |
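The decision guide amounts to two questions, which can be written as a small lookup. This is only an illustrative sketch; the function names and string labels are hypothetical.

```python
def choose_chi_square_test(num_samples, num_variables):
    """Map a study design to a chi-square test, following the guide above."""
    if num_variables == 1 and num_samples == 1:
        return "Goodness of Fit"
    if num_variables == 1 and num_samples >= 2:
        return "Homogeneity"
    if num_variables == 2 and num_samples == 1:
        return "Independence"
    raise ValueError("design not covered by the three chi-square tests")


def degrees_of_freedom(test, k=None, r=None, c=None):
    """df = k - 1 for goodness of fit; (r - 1)(c - 1) for the two-way tests."""
    if test == "Goodness of Fit":
        return k - 1
    return (r - 1) * (c - 1)


# Three separate samples compared on one variable with three categories.
test = choose_chi_square_test(num_samples=3, num_variables=1)
print(test, degrees_of_freedom(test, r=3, c=3))  # Homogeneity 4
```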

Expected Counts — Step by Step

Computing expected counts correctly is the most error-prone part of chi-square tests. Here is the logic.

🔑 Why This Formula? The Probability Logic

Under H₀ (independence), the probability of being in a particular cell is:

\(P(\text{row } i \text{ AND col } j) = P(\text{row } i) \times P(\text{col } j) = \frac{\text{row}_i \text{ total}}{n} \times \frac{\text{col}_j \text{ total}}{n}\)

So the expected count in that cell is:

\(E_{ij} = n \times \frac{\text{row}_i \text{ total}}{n} \times \frac{\text{col}_j \text{ total}}{n} = \frac{(\text{row}_i \text{ total}) \times (\text{col}_j \text{ total})}{n}\)
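To see that the probability argument and the shortcut formula agree, the sketch below evaluates both with exact fractions, using the margins of the Exercise & Health table from this unit. The two function names are hypothetical.

```python
from fractions import Fraction


def expected_from_probabilities(row_total, col_total, n):
    """n * P(row) * P(col), the expected count when rows and columns are independent."""
    return n * Fraction(row_total, n) * Fraction(col_total, n)


def expected_shortcut(row_total, col_total, n):
    """(row total)(column total) / n."""
    return Fraction(row_total * col_total, n)


# Margins of the Exercise & Health table.
row_totals = {"Regular": 90, "Occasional": 70, "Never": 40}
col_totals = {"Good": 145, "Poor": 55}
n = 200

for row, r in row_totals.items():
    for col, c in col_totals.items():
        a = expected_from_probabilities(r, c, n)
        b = expected_shortcut(r, c, n)
        print(f"E({row}, {col}) = {float(b):.2f}  formulas agree: {a == b}")
```

Exact fractions are used so the comparison is not affected by floating-point rounding.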

📌 Expected Count Step-by-Step

Using the Exercise & Health table (n=200, Regular row total=90, Good Health col total=145):

\(E(\text{Regular, Good}) = \frac{90 \times 145}{200} = \frac{13{,}050}{200} = \mathbf{65.25}\)

This means: "If exercise and health were independent, we would expect 65.25 of the 200 adults to be both Regular exercisers and in Good health."

Note: Expected counts don't have to be whole numbers. That's fine — they are theoretical averages.

💡 Checking Your Work

The row totals and column totals of your expected count table must match the row totals and column totals of the observed table.

Also: the sum of all \((O-E)^2/E\) terms in one row should roughly reflect how far that row's distribution is from expected — the largest contributions tell you which cells drive the significance.
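Both checks can be automated. The sketch below uses the Exercise & Health counts, confirms that the expected table reproduces the observed margins, and then lists each cell's contribution from largest to smallest. Names and the tolerance `1e-9` are illustrative choices.

```python
observed = [
    [72, 18],  # Regular
    [48, 22],  # Occasional
    [25, 15],  # Never
]
labels = ["Regular", "Occasional", "Never"]
columns = ["Good", "Poor"]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Check 1: margins of the expected table match the observed margins.
exp_row_totals = [sum(row) for row in expected]
exp_col_totals = [sum(col) for col in zip(*expected)]
rows_match = all(abs(a - b) < 1e-9 for a, b in zip(exp_row_totals, row_totals))
cols_match = all(abs(a - b) < 1e-9 for a, b in zip(exp_col_totals, col_totals))
print("margins match:", rows_match and cols_match)

# Check 2: per-cell contributions (O - E)^2 / E, largest first.
contributions = sorted(
    (
        ((o - e) ** 2 / e, labels[i], columns[j])
        for i, row in enumerate(observed)
        for j, (o, e) in enumerate(zip(row, expected[i]))
    ),
    reverse=True,
)
for value, row_label, col_label in contributions:
    print(f"{row_label:<10} {col_label:<5} {value:.3f}")
```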


Multiple Choice Questions

Try each question, then reveal the answer and full explanation.

MCQ · Q1: Selecting the Test

A researcher surveys 500 randomly selected adults and records each person's political affiliation (Democrat, Republican, Independent) and their opinion on a new policy (Support, Oppose). Which chi-square test is most appropriate?

  • A Goodness of fit — one variable tested against a claimed distribution
  • B Homogeneity — multiple populations compared
  • C Independence — one sample, two variables recorded
  • D Two-sample z-test for proportions
  • E One-sample z-test for a proportion
✓ Correct Answer: C — Independence

One sample (500 adults), two categorical variables (affiliation AND opinion) recorded for each person → test for independence. If the researcher had taken separate samples from each political group and compared opinions, it would be homogeneity.

MCQ · Q2: Expected Counts

In a 3×4 two-way table with grand total n = 240, the row total for row 2 is 80 and the column total for column 3 is 60. What is the expected count for that cell?

  • A 4800
  • B 20
  • C 140
  • D 0.333
  • E 60
✓ Correct Answer: B — 20

\(E = \frac{\text{row total} \times \text{col total}}{\text{grand total}} = \frac{80 \times 60}{240} = \frac{4800}{240} = \mathbf{20}\)

MCQ · Q3: Degrees of Freedom

A chi-square goodness of fit test is performed on a variable with 6 categories. What is df?

  • A 6
  • B 5
  • C 4
  • D 12
  • E 3
✓ Correct Answer: B — 5

For GOF: df = k − 1 = 6 − 1 = 5. We lose one degree of freedom because all proportions must sum to 1 — once we know 5 of them, the 6th is determined.

MCQ · Q4: p-value Direction

A chi-square test yields χ² = 9.8 with df = 4. Which of the following correctly describes the p-value?

  • A P(χ² < 9.8) with df = 4
  • B 2 × P(χ² > 9.8) with df = 4
  • C P(χ² > 9.8) with df = 4
  • D P(Z > 9.8)
  • E P(χ² = 9.8) with df = 4
✓ Correct Answer: C

The chi-square p-value is always the right-tail area — P(χ² > observed value). There are no two-sided or left-tailed chi-square tests. Large χ² values are evidence against H₀, so we always look in the right tail.

MCQ · Q5: Homogeneity vs Independence

Random samples of 80 men and 80 women are each asked their preferred music genre. A chi-square test compares the genre distributions. Which test is this?

  • A Goodness of fit
  • B Independence — two variables from one sample
  • C Homogeneity — two separate samples compared on one variable
  • D Two-sample t-test
  • E Goodness of fit with df = 1
✓ Correct Answer: C — Homogeneity

Two separate random samples (one from men, one from women) are compared on one variable (music genre) → homogeneity. If a single sample of 160 people had both gender and genre recorded, it would be independence.

MCQ · Q6: Large Counts Condition

In a goodness of fit test with n = 30 and 5 equally likely categories, a student checks that all observed counts are ≥ 5 and declares the Large Counts condition met. What error did the student make?

  • A No error — observing counts ≥ 5 is correct.
  • B The student should check that observed counts ≥ 10.
  • C The Large Counts condition requires all expected counts ≥ 5, not observed counts.
  • D The Large Counts condition is not needed for goodness of fit tests.
  • E The student should check n ≥ 30 instead.
✓ Correct Answer: C

The Large Counts condition requires all expected counts ≥ 5, not observed counts. In this case E = 30/5 = 6 for each category, so the condition IS met — but the student used the wrong counts to check it. Always compute E first, then verify E ≥ 5.

MCQ · Q7: Contribution to χ²

In a chi-square test, cell A has O = 30 and E = 20, and cell B has O = 22 and E = 20. Which cell contributes more to the χ² statistic, and why?

  • A Cell B, because it is closer to the expected value
  • B Cell A, because (O−E)²/E = 5.0 vs 0.2 for cell B
  • C Cell A, because it has a larger observed count
  • D They contribute equally since both have E = 20
  • E Cell B, because its observed count is closer to n
✓ Correct Answer: B

Cell A: \(\frac{(30-20)^2}{20} = \frac{100}{20} = 5.0\). Cell B: \(\frac{(22-20)^2}{20} = \frac{4}{20} = 0.2\). Cell A contributes 25× more to χ²! The large discrepancy in Cell A (O is 50% above E) is what drives the test statistic. Identifying which cells contribute most is useful for interpreting results.

MCQ · Q8: Conclusion

A chi-square test for independence yields χ² = 3.2 with df = 2 and p-value = 0.202. At α = 0.05, what is the correct conclusion?

  • A Reject H₀; there is convincing evidence of an association.
  • B Fail to reject H₀; there is not convincing evidence of an association.
  • C Reject H₀; the two variables are independent.
  • D Accept H₀; the two variables are definitely independent.
  • E The test is inconclusive because χ² is too small.
✓ Correct Answer: B

p-value (0.202) > α (0.05) → fail to reject H₀. There is not convincing evidence of an association between the two variables. (C) incorrectly says we "reject" and simultaneously says the variables are independent — that's backwards. (D) uses "accept" — always wrong. We never prove independence; we simply lack evidence against it.

Free Response Questions

Use the 4-step procedure. State the correct test, check expected counts, show all work.

FRQ 1 — Goodness of Fit: Genetics

~12 minutes
A genetics textbook states that flower color follows a 9:3:3:1 ratio (Purple : Red : White : Yellow). A botanist grows 320 plants and observes: Purple = 176, Red = 52, White = 54, Yellow = 38. Test whether the data are consistent with the 9:3:3:1 ratio at α = 0.05.
(a) State the hypotheses.
(b) Compute all expected counts. Verify the Large Counts condition.
(c) Calculate χ² and find the p-value. State your conclusion in context.
(d) Which color category contributes most to χ²? What does this suggest?
✓ Model Solution

(a) Hypotheses:

\(H_0\): Flower colors follow a 9:3:3:1 ratio: \(p_{purple}=\frac{9}{16},\; p_{red}=\frac{3}{16},\; p_{white}=\frac{3}{16},\; p_{yellow}=\frac{1}{16}\)

\(H_a\): At least one color proportion differs from the 9:3:3:1 ratio.


(b) Expected counts:

Purple: \(320 \times \frac{9}{16} = 180\)  |  Red: \(320 \times \frac{3}{16} = 60\)  |  White: \(320 \times \frac{3}{16} = 60\)  |  Yellow: \(320 \times \frac{1}{16} = 20\)

Large Counts: All expected counts (180, 60, 60, 20) ≥ 5. ✓


(c) Chi-square statistic:

\(\chi^2 = \frac{(176-180)^2}{180}+\frac{(52-60)^2}{60}+\frac{(54-60)^2}{60}+\frac{(38-20)^2}{20}\)

\(= \frac{16}{180}+\frac{64}{60}+\frac{36}{60}+\frac{324}{20} = 0.089+1.067+0.600+16.200 = \mathbf{17.956}\)

df = 4 − 1 = 3  |  p-value = P(χ² > 17.956 | df=3) ≈ 0.0004

Since 0.0004 < 0.05, reject H₀. There is very convincing evidence that the flower color distribution does not follow the 9:3:3:1 Mendelian ratio.


(d) Largest contributor:

Yellow contributes \(\frac{(38-20)^2}{20} = 16.2\) of the total 17.956, about 90% of χ². The observed count of 38 yellow plants is nearly double the expected 20. This suggests yellow plants appear far more frequently than predicted by Mendelian genetics, which may indicate a flaw in the 9:3:3:1 model for yellow specifically.

✓ AP tip: Identifying the largest contributing cell and explaining what it means earns full credit on part (d). Say which cell and explain the direction (over or under-represented).
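Part (d) can be checked with a short sketch that ranks each category's contribution and reports its direction. Exact fractions keep the 9:3:3:1 proportions free of rounding; the variable names are illustrative.

```python
from fractions import Fraction

observed = {"Purple": 176, "Red": 52, "White": 54, "Yellow": 38}
ratio = {"Purple": 9, "Red": 3, "White": 3, "Yellow": 1}

n = sum(observed.values())
ratio_total = sum(ratio.values())
expected = {color: Fraction(n * parts, ratio_total) for color, parts in ratio.items()}

contributions = {
    color: (observed[color] - expected[color]) ** 2 / expected[color]
    for color in observed
}
chi_sq = sum(contributions.values())

for color, value in sorted(contributions.items(), key=lambda kv: kv[1], reverse=True):
    if observed[color] > expected[color]:
        direction = "over-represented"
    elif observed[color] < expected[color]:
        direction = "under-represented"
    else:
        direction = "as expected"
    share = float(value / chi_sq)
    print(f"{color:<7} {float(value):6.3f}  {share:5.1%}  {direction}")
```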

FRQ 2 — Chi-Square Test for Independence

~15 minutes
A random sample of 250 high school students is asked two questions: (1) Do you get at least 8 hours of sleep on school nights? (Yes/No) and (2) How would you rate your academic performance? (Good/Average/Poor). The results are shown below.
| Sleep | Good | Average | Poor | Total |
| --- | --- | --- | --- | --- |
| Yes (≥8 hrs) | 68 | 42 | 10 | 120 |
| No (<8 hrs) | 32 | 58 | 40 | 130 |
| Total | 100 | 100 | 50 | 250 |
(a) Identify the type of chi-square test and explain why.
(b) State the hypotheses and check all conditions.
(c) Calculate all expected counts, the chi-square statistic, and find the p-value (df = 2).
(d) State your conclusion. If you conclude there is an association, describe its direction based on the data.
✓ Model Solution

(a) Test type:

Chi-square test for independence. One random sample of 250 students was taken, and two categorical variables (sleep adequacy and academic performance) were recorded for each individual. This is not homogeneity because we did not take separate samples from sleep groups.


(b) Hypotheses and conditions:

\(H_0\): Sleep adequacy and academic performance are independent (no association) in the population of high school students.

\(H_a\): Sleep adequacy and academic performance are associated.

Random: Random sample stated ✓

Large Counts: (compute below — all ≥ 5 ✓)

Independent: 250 < 10% of all high school students ✓


(c) Expected counts, χ², p-value:

E(Yes, Good) = \(\frac{120×100}{250}=48\)   E(Yes, Avg) = \(\frac{120×100}{250}=48\)   E(Yes, Poor) = \(\frac{120×50}{250}=24\)

E(No, Good) = \(\frac{130×100}{250}=52\)   E(No, Avg) = \(\frac{130×100}{250}=52\)   E(No, Poor) = \(\frac{130×50}{250}=26\)

All expected ≥ 5 ✓ (smallest is 24)

\(\chi^2 = \frac{(68-48)^2}{48}+\frac{(42-48)^2}{48}+\frac{(10-24)^2}{24}+\frac{(32-52)^2}{52}+\frac{(58-52)^2}{52}+\frac{(40-26)^2}{26}\)

\(= \frac{400}{48}+\frac{36}{48}+\frac{196}{24}+\frac{400}{52}+\frac{36}{52}+\frac{196}{26}\)

\(= 8.333+0.750+8.167+7.692+0.692+7.538 = \mathbf{33.17}\)

df = (2−1)(3−1) = 2  |  p-value = P(χ² > 33.17 | df=2) < 0.0001


(d) Conclusion and direction:

Since p-value (<0.0001) < α (0.05), we reject H₀. There is very convincing evidence of an association between sleep adequacy and academic performance among high school students.

Direction: Students who get ≥8 hours of sleep are much more likely to report good academic performance (68 observed vs 48 expected) and less likely to report poor performance (10 vs 24). Students who get <8 hours are more likely to report average or poor performance. The pattern strongly suggests that getting sufficient sleep is associated with better academic outcomes.

✓ AP tips: (a) must explain why it's independence, not just name it. (c) show all 6 expected counts. (d) must describe the direction — which group had more/fewer than expected in which category.
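Describing direction means comparing each observed count with its expected count. The sketch below does this for the sleep table and also recomputes the statistic; the dictionary layout and names are illustrative choices.

```python
observed = {
    "Yes (>=8 hrs)": {"Good": 68, "Average": 42, "Poor": 10},
    "No (<8 hrs)": {"Good": 32, "Average": 58, "Poor": 40},
}
ratings = ["Good", "Average", "Poor"]

row_totals = {sleep: sum(counts.values()) for sleep, counts in observed.items()}
col_totals = {r: sum(observed[s][r] for s in observed) for r in ratings}
n = sum(row_totals.values())

chi_sq = 0.0
residuals = {}
for sleep, counts in observed.items():
    for rating in ratings:
        e = row_totals[sleep] * col_totals[rating] / n
        o = counts[rating]
        residuals[(sleep, rating)] = o - e  # positive: more than expected
        chi_sq += (o - e) ** 2 / e
        sign = "more" if o > e else "fewer"
        print(f"{sleep:<14} {rating:<8} O={o:3d} E={e:5.1f}  {sign} than expected")

print(f"chi-square = {chi_sq:.2f}")
```

The signs of the residuals carry the direction of the association; the statistic alone does not.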
