What Is a Chi-Square Test?
Units 6 and 7 handled inference for quantitative variables and proportions. Chi-square tests handle categorical data with two or more categories. Instead of asking "what is the mean?" we ask "does the distribution match what we expect?" or "are two categorical variables related?"
Goodness of Fit: One sample. One categorical variable. Does the observed distribution match a claimed one?
Homogeneity: Multiple independent samples. One categorical variable. Do different populations share the same distribution?
Independence: One sample. Two categorical variables. Are the two variables associated?
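The test statistic is \(\chi^2 = \sum \frac{(O-E)^2}{E}\), summed over every category or cell. Each term \((O-E)^2/E\) measures how far one cell's observed count \(O\) is from its expected count \(E\), relative to the size of \(E\).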
A large \(\chi^2\) means large discrepancies from expected → evidence against \(H_0\).
• Always right-skewed — never symmetric like Normal
• Values are always ≥ 0 (sum of squared terms)
• The p-value is always the right-tail area — there are no left-tail or two-sided chi-square tests
• As df increases, the distribution shifts right and becomes more symmetric
• Degrees of freedom: GOF uses df = k−1; Homogeneity and Independence use df = (r−1)(c−1)
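The right-tail rule in the bullets above can be sketched in a few lines of Python. This is a minimal illustration, not a replacement for a table or calculator: the function name `chi2_right_tail` is hypothetical, and it relies on the standard closed-form expressions for the chi-square upper tail when df is a positive integer.

```python
import math


def chi2_right_tail(stat: float, df: int) -> float:
    """Right-tail area P(chi-square > stat) for integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if stat <= 0:
        return 1.0  # chi-square values are never negative
    half = stat / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
        series = sum(half ** j / math.factorial(j) for j in range(df // 2))
        return math.exp(-half) * series
    # Odd df: erfc(sqrt(x/2)) + exp(-x/2) * sum_{j=1}^{(df-1)/2} (x/2)^(j-1/2) / Gamma(j+1/2)
    series = sum(
        half ** (j - 0.5) / math.gamma(j + 0.5)
        for j in range(1, (df - 1) // 2 + 1)
    )
    return math.erfc(math.sqrt(half)) + math.exp(-half) * series


# The same statistic leaves more area in the right tail as df grows,
# because the distribution shifts right.
for df in (1, 2, 4, 10):
    print(f"df = {df:2d}: P(chi-square > 5.0) = {chi2_right_tail(5.0, df):.4f}")
```

Because only the right tail is ever used, the function takes no "direction" argument: there is nothing to double and no left tail to pick.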
Random: Data from a random sample or randomized experiment.
Large Counts: All expected counts ≥ 5. Check this AFTER computing expected counts.
Independent: Individual observations are independent. If sampling without replacement, \(n \leq 10\%\) of population.
The Large Counts condition uses expected counts \(E\), not observed counts \(O\). You must compute all \(E\) values first, then verify each \(E \geq 5\). An observed count of 3 is fine as long as the corresponding expected count is ≥ 5.
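As a small sketch of that check, the snippet below computes expected counts first and only then tests the condition. The observed counts are hypothetical (chosen so that one observed count is 3), and the helper names `expected_counts` and `large_counts_met` are illustrative.

```python
def expected_counts(n, claimed_proportions):
    """Expected count for each category under H0: E_i = n * p_i."""
    return [n * p for p in claimed_proportions]


def large_counts_met(expected, minimum=5):
    """True only if every EXPECTED count is at least `minimum`."""
    return all(e >= minimum for e in expected)


# Hypothetical observed counts for 5 equally likely categories (not from the text).
observed = [3, 9, 6, 7, 5]
n = sum(observed)
expected = expected_counts(n, [0.20] * 5)

print("smallest observed count:", min(observed))
print("smallest expected count:", round(min(expected), 2))
print("Large Counts met:", large_counts_met(expected))
```

The observed list never enters `large_counts_met`, which mirrors the rule: the smallest observed count is irrelevant to the condition.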
Chi-Square Goodness of Fit Test
Use this test when you have one sample and want to know whether a single categorical variable follows a specific claimed distribution.
H₀: The variable follows the claimed distribution. (State each claimed proportion.)
Hₐ: The variable does not follow the claimed distribution. (At least one proportion differs.)
Expected count: \(E_i = n \cdot p_i\) for each category
df = number of categories − 1 = \(k - 1\)
Test statistic: \(\chi^2 = \sum \frac{(O-E)^2}{E}\), p-value from chi-square table or calculator
A bag of 200 M&Ms: Red=48, Blue=38, Green=42, Yellow=36, Orange=36. The company claims equal proportions. Test at α = 0.05.
H₀: \(p_{red}=p_{blue}=p_{green}=p_{yellow}=p_{orange}=0.20\)
Hₐ: At least one proportion differs from 0.20.
Conditions: Random ✓ Expected counts: all = \(200 \times 0.20 = 40 \geq 5\) ✓ Independent ✓
\(\chi^2 = \frac{(48-40)^2}{40}+\frac{(38-40)^2}{40}+\frac{(42-40)^2}{40}+\frac{(36-40)^2}{40}+\frac{(36-40)^2}{40} = 1.6+0.1+0.1+0.4+0.4 = \mathbf{2.6}\)
\(df = 5-1 = 4\), p-value ≈ 0.627
Since 0.627 > 0.05, fail to reject H₀. No convincing evidence that the color distribution differs from equal proportions.
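The M&M calculation can be retraced in Python as a check on the arithmetic. This is a sketch under one stated assumption: the p-value line uses the closed-form right-tail area that holds specifically for df = 4, so it would need to change for a different number of categories.

```python
import math

observed = {"Red": 48, "Blue": 38, "Green": 42, "Yellow": 36, "Orange": 36}
claimed = {color: 0.20 for color in observed}  # H0: equal proportions

n = sum(observed.values())
expected = {color: n * p for color, p in claimed.items()}
large_counts_ok = all(e >= 5 for e in expected.values())

# Sum of (O - E)^2 / E over the five colors.
chi_sq = sum((observed[c] - expected[c]) ** 2 / expected[c] for c in observed)
df = len(observed) - 1

# Closed form valid only for df = 4: P(X > x) = exp(-x/2) * (1 + x/2).
if df != 4:
    raise ValueError("this closed form assumes df = 4")
p_value = math.exp(-chi_sq / 2) * (1 + chi_sq / 2)

print(f"n = {n}, Large Counts ok = {large_counts_ok}")
print(f"chi-square = {chi_sq:.3f}, df = {df}, p-value = {p_value:.3f}")
```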
Chi-Square Test for Homogeneity
Use when you take separate random samples from two or more populations and ask: do these populations have the same distribution of a single categorical variable?
H₀: The distribution of [categorical variable] is the same across all [populations/groups].
Hₐ: The distribution differs for at least one population.
Expected count for each cell: \(E = \dfrac{(\text{row total}) \times (\text{column total})}{\text{grand total}}\)
df = (number of rows − 1)(number of columns − 1) = \((r-1)(c-1)\)
Three schools each provide a random sample of 100 students who indicate their preferred learning style.
| School | Visual | Auditory | Read/Write | Total |
|---|---|---|---|---|
| School A | 45 | 35 | 20 | 100 |
| School B | 38 | 40 | 22 | 100 |
| School C | 42 | 30 | 28 | 100 |
| Total | 125 | 105 | 70 | 300 |
H₀: The distribution of learning styles is the same across all three schools.
Hₐ: The distribution differs for at least one school.
Expected counts: E(School A, Visual) = \(\frac{100 \times 125}{300}=41.67\), E(School A, Auditory) = \(\frac{100 \times 105}{300}=35.00\), E(School A, R/W) = \(\frac{100 \times 70}{300}=23.33\) ... (all 9 cells computed similarly)
Check Large Counts: All E ≥ 5 ✓
\(df = (3-1)(3-1) = 4\)
\(\chi^2 = \frac{(45-41.67)^2}{41.67}+\frac{(35-35)^2}{35}+\cdots \approx \mathbf{3.51}\), p-value ≈ 0.477
Since 0.477 > 0.05, fail to reject H₀ at α = 0.05. No convincing evidence of a difference in learning style distributions across the three schools.
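Since the worked example shows only a few of the nine cells, a short Python sketch that sums every cell is a useful cross-check. It builds each expected count from (row total × column total) / grand total. The p-value line assumes df = 4 and uses the closed form that applies to that case only.

```python
import math

# Observed counts from the learning-style table: Visual, Auditory, Read/Write.
table = {
    "School A": [45, 35, 20],
    "School B": [38, 40, 22],
    "School C": [42, 30, 28],
}
rows = list(table.values())
row_totals = [sum(row) for row in rows]
col_totals = [sum(col) for col in zip(*rows)]
grand_total = sum(row_totals)

# E = (row total * column total) / grand total for every cell.
expected = [[rt * ct / grand_total for ct in col_totals] for rt in row_totals]

chi_sq = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(rows, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(row_totals) - 1) * (len(col_totals) - 1)

# Closed form valid only for df = 4: P(X > x) = exp(-x/2) * (1 + x/2).
if df != 4:
    raise ValueError("this closed form assumes df = 4")
p_value = math.exp(-chi_sq / 2) * (1 + chi_sq / 2)

print("expected counts per school:", [round(e, 2) for e in expected[0]])
print(f"chi-square = {chi_sq:.2f}, df = {df}, p-value = {p_value:.3f}")
```

Every school has the same row total here, so all three rows of expected counts are identical.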
Chi-Square Test for Independence
Use when you have one sample and record two categorical variables for each individual. Ask: are these two variables associated in the population?
H₀: [Variable 1] and [Variable 2] are independent (no association) in the population.
Hₐ: [Variable 1] and [Variable 2] are associated in the population.
Everything else — expected counts, df, test statistic — is identical to the homogeneity test.
A random sample of 200 adults: exercise frequency and self-rated health recorded for each person.
| Exercise | Good Health | Poor Health | Total |
|---|---|---|---|
| Regular | 72 | 18 | 90 |
| Occasional | 48 | 22 | 70 |
| Never | 25 | 15 | 40 |
| Total | 145 | 55 | 200 |
H₀: Exercise frequency and health rating are independent. Hₐ: They are associated.
Expected counts:
E(Regular, Good) = \(\frac{90 \times 145}{200}=65.25\) E(Regular, Poor) = \(\frac{90 \times 55}{200}=24.75\)
E(Occasional, Good) = \(\frac{70 \times 145}{200}=50.75\) E(Occasional, Poor) = \(\frac{70 \times 55}{200}=19.25\)
E(Never, Good) = \(\frac{40 \times 145}{200}=29.00\) E(Never, Poor) = \(\frac{40 \times 55}{200}=11.00\)
Large Counts: All expected ≥ 5 ✓ (smallest is 11.00)
df = (3−1)(2−1) = 2
\(\chi^2 = \frac{(72-65.25)^2}{65.25}+\frac{(18-24.75)^2}{24.75}+\frac{(48-50.75)^2}{50.75}+\frac{(22-19.25)^2}{19.25}+\frac{(25-29)^2}{29}+\frac{(15-11)^2}{11}\)
\(= 0.698+1.841+0.149+0.393+0.552+1.455 \approx \mathbf{5.09}\), p-value ≈ 0.079
Since 0.079 > 0.05, fail to reject H₀. There is not convincing evidence of an association between exercise frequency and health rating at α = 0.05. (Note: at α = 0.10 we would reject H₀.)
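The sketch below recomputes the statistic from the six cells and compares the p-value with both significance levels mentioned in the note. It assumes df = 2, where the right-tail area reduces to exp(−χ²/2). The variable names are illustrative.

```python
import math

# Observed counts: columns are Good Health, Poor Health.
observed = [
    [72, 18],  # Regular
    [48, 22],  # Occasional
    [25, 15],  # Never
]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi_sq = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n  # expected count under independence
        chi_sq += (o - e) ** 2 / e

df = (len(row_totals) - 1) * (len(col_totals) - 1)

# Closed form valid only for df = 2: P(X > x) = exp(-x/2).
if df != 2:
    raise ValueError("this closed form assumes df = 2")
p_value = math.exp(-chi_sq / 2)

print(f"chi-square = {chi_sq:.3f}, df = {df}, p-value = {p_value:.3f}")
for alpha in (0.05, 0.10):
    decision = "reject H0" if p_value < alpha else "fail to reject H0"
    print(f"alpha = {alpha:.2f}: {decision}")
```

The decision flips between the two levels, which is why α should be fixed before looking at the data.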
Selecting the Right Chi-Square Test
| Test | Samples | Variables | Research Question | df |
|---|---|---|---|---|
| Goodness of Fit | 1 sample | 1 categorical | Does the distribution match a claimed model? | k − 1 |
| Homogeneity | Multiple (one per population) | 1 categorical | Same distribution across populations? | (r−1)(c−1) |
| Independence | 1 sample | 2 categorical | Are the two variables associated? | (r−1)(c−1) |
Expected Counts — Step by Step
Computing expected counts correctly is the most error-prone part of chi-square tests. Here is the logic.
Under H₀ (independence), the probability of being in a particular cell is:
\(P(\text{row } i \text{ AND col } j) = P(\text{row } i) \times P(\text{col } j) = \frac{\text{row}_i \text{ total}}{n} \times \frac{\text{col}_j \text{ total}}{n}\)
So the expected count in that cell is:
\(E_{ij} = n \times \frac{\text{row}_i \text{ total}}{n} \times \frac{\text{col}_j \text{ total}}{n} = \frac{(\text{row}_i \text{ total}) \times (\text{col}_j \text{ total})}{n}\)
Using the Exercise & Health table (n=200, Regular row total=90, Good Health col total=145):
\(E(\text{Regular, Good}) = \frac{90 \times 145}{200} = \frac{13{,}050}{200} = \mathbf{65.25}\)
This means: "If exercise and health were independent, we would expect 65.25 of the 200 adults to be both Regular exercisers and in Good health."
Note: Expected counts don't have to be whole numbers. That's fine — they are theoretical averages.
The row totals and column totals of your expected count table must match the row totals and column totals of the observed table.
Also: the individual \((O-E)^2/E\) terms show where the disagreement with H₀ comes from. The cells with the largest contributions are the ones driving a significant result, so look at those first when interpreting the test.
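Both checks from this section, matching margins and ranking cell contributions, can be sketched with the Exercise & Health table. The dictionary layout and names such as `rows_match` are illustrative choices, not part of any required procedure.

```python
observed = {
    "Regular":    {"Good": 72, "Poor": 18},
    "Occasional": {"Good": 48, "Poor": 22},
    "Never":      {"Good": 25, "Poor": 15},
}
columns = ["Good", "Poor"]

row_totals = {r: sum(cells.values()) for r, cells in observed.items()}
col_totals = {c: sum(observed[r][c] for r in observed) for c in columns}
n = sum(row_totals.values())

# E_ij = (row_i total * col_j total) / n
expected = {
    r: {c: row_totals[r] * col_totals[c] / n for c in columns}
    for r in observed
}

# Margin check: expected table must reproduce the observed row and column totals.
rows_match = all(
    abs(sum(expected[r].values()) - row_totals[r]) < 1e-9 for r in observed
)
cols_match = all(
    abs(sum(expected[r][c] for r in observed) - col_totals[c]) < 1e-9
    for c in columns
)
print("row totals match:", rows_match, "| column totals match:", cols_match)

# Rank cells by their (O - E)^2 / E contribution, largest first.
contributions = {
    (r, c): (observed[r][c] - expected[r][c]) ** 2 / expected[r][c]
    for r in observed
    for c in columns
}
ranked = sorted(contributions.items(), key=lambda item: item[1], reverse=True)
for (r, c), value in ranked:
    print(f"{r:10s} {c:4s} contribution = {value:.3f}")
```

If either margin check fails, an expected count was computed from the wrong row or column total.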
Multiple Choice Questions
Try each question, then reveal the answer and full explanation.
A researcher surveys 500 randomly selected adults and records each person's political affiliation (Democrat, Republican, Independent) and their opinion on a new policy (Support, Oppose). Which chi-square test is most appropriate?
- A Goodness of fit — one variable tested against a claimed distribution
- B Homogeneity — multiple populations compared
- C Independence — one sample, two variables recorded
- D Two-sample z-test for proportions
- E One-sample z-test for a proportion
One sample (500 adults), two categorical variables (affiliation AND opinion) recorded for each person → test for independence. If the researcher had taken separate samples from each political group and compared opinions, it would be homogeneity.
In a 3×4 two-way table with grand total n = 240, the row total for row 2 is 80 and the column total for column 3 is 60. What is the expected count for that cell?
- A 4800
- B 20
- C 140
- D 0.333
- E 60
\(E = \frac{\text{row total} \times \text{col total}}{\text{grand total}} = \frac{80 \times 60}{240} = \frac{4800}{240} = \mathbf{20}\)
A chi-square goodness of fit test is performed on a variable with 6 categories. What is df?
- A 6
- B 5
- C 4
- D 12
- E 3
For GOF: df = k − 1 = 6 − 1 = 5. We lose one degree of freedom because all proportions must sum to 1 — once we know 5 of them, the 6th is determined.
A chi-square test yields χ² = 9.8 with df = 4. Which of the following correctly describes the p-value?
- A P(χ² < 9.8) with df = 4
- B 2 × P(χ² > 9.8) with df = 4
- C P(χ² > 9.8) with df = 4
- D P(Z > 9.8)
- E P(χ² = 9.8) with df = 4
The chi-square p-value is always the right-tail area — P(χ² > observed value). There are no two-sided or left-tailed chi-square tests. Large χ² values are evidence against H₀, so we always look in the right tail.
Random samples of 80 men and 80 women are each asked their preferred music genre. A chi-square test compares the genre distributions. Which test is this?
- A Goodness of fit
- B Independence — two variables from one sample
- C Homogeneity — two separate samples compared on one variable
- D Two-sample t-test
- E Goodness of fit with df = 1
Two separate random samples (one from men, one from women) are compared on one variable (music genre) → homogeneity. If a single sample of 160 people had both gender and genre recorded, it would be independence.
In a goodness of fit test with n = 30 and 5 equally likely categories, a student checks that all observed counts are ≥ 5 and declares the Large Counts condition met. What error did the student make?
- A No error — observing counts ≥ 5 is correct.
- B The student should check that observed counts ≥ 10.
- C The Large Counts condition requires all expected counts ≥ 5, not observed counts.
- D The Large Counts condition is not needed for goodness of fit tests.
- E The student should check n ≥ 30 instead.
The Large Counts condition requires all expected counts ≥ 5, not observed counts. In this case E = 30/5 = 6 for each category, so the condition IS met — but the student used the wrong counts to check it. Always compute E first, then verify E ≥ 5.
In a chi-square test, cell A has O = 30 and E = 20, and cell B has O = 22 and E = 20. Which cell contributes more to the χ² statistic, and why?
- A Cell B, because it is closer to the expected value
- B Cell A, because (O−E)²/E = 5.0 vs 0.2 for cell B
- C Cell A, because it has a larger observed count
- D They contribute equally since both have E = 20
- E Cell B, because its observed count is closer to n
Cell A: \(\frac{(30-20)^2}{20} = \frac{100}{20} = 5.0\). Cell B: \(\frac{(22-20)^2}{20} = \frac{4}{20} = 0.2\). Cell A contributes 25× more to χ²! The large discrepancy in Cell A (O is 50% above E) is what drives the test statistic. Identifying which cells contribute most is useful for interpreting results.
A chi-square test for independence yields χ² = 3.2 with df = 2 and p-value = 0.202. At α = 0.05, what is the correct conclusion?
- A Reject H₀; there is convincing evidence of an association.
- B Fail to reject H₀; there is not convincing evidence of an association.
- C Reject H₀; the two variables are independent.
- D Accept H₀; the two variables are definitely independent.
- E The test is inconclusive because χ² is too small.
p-value (0.202) > α (0.05) → fail to reject H₀. There is not convincing evidence of an association between the two variables. (C) incorrectly says we "reject" and simultaneously says the variables are independent — that's backwards. (D) uses "accept" — always wrong. We never prove independence; we simply lack evidence against it.
Free Response Questions
Use the 4-step procedure. State the correct test, check expected counts, show all work.
FRQ 1 — Goodness of Fit: Genetics
~12 minutes
✓ Model Solution
(a) Hypotheses:
\(H_0\): Flower colors follow a 9:3:3:1 ratio: \(p_{purple}=\frac{9}{16},\; p_{red}=\frac{3}{16},\; p_{white}=\frac{3}{16},\; p_{yellow}=\frac{1}{16}\)
\(H_a\): At least one color proportion differs from the 9:3:3:1 ratio.
(b) Expected counts:
Purple: \(320 \times \frac{9}{16} = 180\) | Red: \(320 \times \frac{3}{16} = 60\) | White: \(320 \times \frac{3}{16} = 60\) | Yellow: \(320 \times \frac{1}{16} = 20\)
Large Counts: All expected counts (180, 60, 60, 20) ≥ 5. ✓
(c) Chi-square statistic:
\(\chi^2 = \frac{(176-180)^2}{180}+\frac{(52-60)^2}{60}+\frac{(54-60)^2}{60}+\frac{(38-20)^2}{20}\)
\(= \frac{16}{180}+\frac{64}{60}+\frac{36}{60}+\frac{324}{20} = 0.089+1.067+0.600+16.200 = \mathbf{17.956}\)
df = 4 − 1 = 3 | p-value = P(χ² > 17.956 | df=3) ≈ 0.0005
Since 0.0005 < 0.05, reject H₀. There is very convincing evidence that the flower color distribution does not follow the 9:3:3:1 Mendelian ratio.
(d) Largest contributor:
Yellow contributes \(\frac{(38-20)^2}{20} = 16.2\) of the 17.96 total, about 90% of χ². The observed count of 38 yellow plants is nearly double the expected 20. This suggests yellow plants appear far more frequently than predicted by Mendelian genetics, which may indicate a flaw in the 9:3:3:1 model for yellow specifically.
✓ AP tip: Identifying the largest contributing cell and explaining what it means earns full credit on part (d). Say which cell and explain the direction (over or under-represented).
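Parts (b) through (d) of the model solution can be retraced with a short Python sketch. It assumes df = 3 and uses the closed-form right-tail area for that case (built from `math.erfc`), so it is not a general-purpose p-value routine.

```python
import math

observed = {"Purple": 176, "Red": 52, "White": 54, "Yellow": 38}
ratio = {"Purple": 9, "Red": 3, "White": 3, "Yellow": 1}  # claimed 9:3:3:1

n = sum(observed.values())
parts = sum(ratio.values())
expected = {color: n * ratio[color] / parts for color in observed}

contributions = {
    color: (observed[color] - expected[color]) ** 2 / expected[color]
    for color in observed
}
chi_sq = sum(contributions.values())
df = len(observed) - 1

# Closed form valid only for df = 3:
# P(X > x) = erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2)
if df != 3:
    raise ValueError("this closed form assumes df = 3")
p_value = math.erfc(math.sqrt(chi_sq / 2)) + math.sqrt(
    2 * chi_sq / math.pi
) * math.exp(-chi_sq / 2)

# Part (d): which color contributes most, and in which direction?
top = max(contributions, key=contributions.get)
share = contributions[top] / chi_sq
direction = "over" if observed[top] > expected[top] else "under"

print(f"chi-square = {chi_sq:.3f}, df = {df}, p-value = {p_value:.4f}")
print(f"largest contributor: {top} ({share:.0%} of chi-square, {direction}-represented)")
```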
FRQ 2 — Chi-Square Test for Independence
~15 minutes

| Sleep | Good | Average | Poor | Total |
|---|---|---|---|---|
| Yes (≥8 hrs) | 68 | 42 | 10 | 120 |
| No (<8 hrs) | 32 | 58 | 40 | 130 |
| Total | 100 | 100 | 50 | 250 |
✓ Model Solution
(a) Test type:
Chi-square test for independence. One random sample of 250 students was taken, and two categorical variables (sleep adequacy and academic performance) were recorded for each individual. This is not homogeneity because we did not take separate samples from sleep groups.
(b) Hypotheses and conditions:
\(H_0\): Sleep adequacy and academic performance are independent (no association) in the population of high school students.
\(H_a\): Sleep adequacy and academic performance are associated.
Random: Random sample stated ✓
Large Counts: (compute below — all ≥ 5 ✓)
Independent: 250 < 10% of all high school students ✓
(c) Expected counts, χ², p-value:
E(Yes, Good) = \(\frac{120 \times 100}{250}=48\) E(Yes, Avg) = \(\frac{120 \times 100}{250}=48\) E(Yes, Poor) = \(\frac{120 \times 50}{250}=24\)
E(No, Good) = \(\frac{130 \times 100}{250}=52\) E(No, Avg) = \(\frac{130 \times 100}{250}=52\) E(No, Poor) = \(\frac{130 \times 50}{250}=26\)
All expected ≥ 5 ✓ (smallest is 24)
\(\chi^2 = \frac{(68-48)^2}{48}+\frac{(42-48)^2}{48}+\frac{(10-24)^2}{24}+\frac{(32-52)^2}{52}+\frac{(58-52)^2}{52}+\frac{(40-26)^2}{26}\)
\(= \frac{400}{48}+\frac{36}{48}+\frac{196}{24}+\frac{400}{52}+\frac{36}{52}+\frac{196}{26}\)
\(= 8.333+0.750+8.167+7.692+0.692+7.538 = \mathbf{33.17}\)
df = (2−1)(3−1) = 2 | p-value = P(χ² > 33.17 | df=2) < 0.0001
(d) Conclusion and direction:
Since p-value (<0.0001) < α (0.05), we reject H₀. There is very convincing evidence of an association between sleep adequacy and academic performance among high school students.
Direction: Students who get ≥8 hours of sleep are much more likely to report good academic performance (68 observed vs 48 expected) and less likely to report poor performance (10 vs 24). Students who get <8 hours are more likely to report average or poor performance. The pattern strongly suggests that getting sufficient sleep is associated with better academic outcomes.
✓ AP tips: (a) must explain why it's independence, not just name it. (c) show all 6 expected counts. (d) must describe the direction — which group had more/fewer than expected in which category.
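For the direction in part (d), one way to organize the comparison is to tabulate O − E for every cell of the sleep table. The sketch below does only that; the row labels and the `differences` name are illustrative, and the sign of each difference says whether a group had more or fewer students than expected in that category.

```python
observed = {
    "Yes (>=8 hrs)": {"Good": 68, "Average": 42, "Poor": 10},
    "No (<8 hrs)":   {"Good": 32, "Average": 58, "Poor": 40},
}
columns = ["Good", "Average", "Poor"]

row_totals = {r: sum(cells.values()) for r, cells in observed.items()}
col_totals = {c: sum(observed[r][c] for r in observed) for c in columns}
n = sum(row_totals.values())

# O - E for each cell, with E = (row total * column total) / n.
differences = {
    r: {c: observed[r][c] - row_totals[r] * col_totals[c] / n for c in columns}
    for r in observed
}

for r in observed:
    for c in columns:
        d = differences[r][c]
        label = "more than expected" if d > 0 else "fewer than expected"
        print(f"{r:14s} {c:8s} O - E = {d:+6.1f}  ({label})")
```

In a table with two rows, the differences in each column are equal in size and opposite in sign, so describing one row's pattern determines the other's.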