
Unit 9: Inference for Slopes

Population Regression · t-Test for Slope · CI for Slope · Conditions · Computer Output

📊 2–5% of Exam ⏱ ~2 weeks

From Sample LSRL to Population Regression

In Unit 2 we computed the sample regression line \(\hat{y} = a + bx\). But that line is based on one sample — different samples give different slopes and intercepts. Unit 9 asks: what does the slope tell us about the true population relationship?

🔑 Population vs Sample Regression

Population model: \(\mu_y = \alpha + \beta x\)

where \(\beta\) = true population slope (unknown), \(\alpha\) = true intercept

Sample estimates: We use \(b\) to estimate \(\beta\), and \(a\) to estimate \(\alpha\).

The slope \(b\) from our sample is just one value from the sampling distribution of b — just like \(\bar{x}\) is one value from the sampling distribution of \(\bar{x}\).

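[Figure: Multiple samples → different LSRLs → sampling distribution of b. Left panel (axes x and y): sample regression lines vary around the true line \(\mu_y = \alpha + \beta x\), whose slope \(\beta\) is unknown. Right panel: the sampling distribution of b, centered at the true slope \(\beta\).]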
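The idea that b varies from sample to sample can be sketched with a small simulation. Everything in it is an assumption chosen for illustration: the population values (\(\alpha = 50\), \(\beta = 5\), \(\sigma = 8\)), the fixed x-values 1 to 15, and the function names `lsrl_slope` and `simulate_slopes`.

```python
import random
import statistics

def lsrl_slope(xs, ys):
    """Least-squares slope b = Sxy / Sxx."""
    x_bar = statistics.fmean(xs)
    y_bar = statistics.fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

def simulate_slopes(alpha, beta, sigma, xs, n_samples, seed=0):
    """Draw n_samples samples from y = alpha + beta*x + Normal(0, sigma) noise
    and return the sample slope b computed from each one."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(n_samples):
        ys = [alpha + beta * x + rng.gauss(0, sigma) for x in xs]
        slopes.append(lsrl_slope(xs, ys))
    return slopes

# Hypothetical population (not from the text): alpha = 50, beta = 5, sigma = 8
xs = list(range(1, 16))  # 15 fixed x-values, e.g. study hours 1..15
slopes = simulate_slopes(alpha=50, beta=5, sigma=8, xs=xs, n_samples=2000)

print(f"mean of sample slopes:    {statistics.fmean(slopes):.3f}")
print(f"std dev of sample slopes: {statistics.stdev(slopes):.3f}")
```

Under these assumptions the mean of the simulated slopes should land close to the true \(\beta = 5\), and their standard deviation should be near \(\sigma / \sqrt{\sum (x - \bar{x})^2} = 8/\sqrt{280} \approx 0.48\), which is the quantity that \(SE_b\) estimates from a single sample.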

Conditions for Inference on the Slope

The conditions for regression inference go beyond just "random" and "large sample." We must check five conditions, remembered with LINER:

📐 LINER Conditions for Regression Inference

L — Linear: The true relationship between x and y is linear. Check: scatter plot shows a linear pattern; residual plot shows random scatter (no curved pattern).

I — Independent: Individual observations are independent. Check: random sample and n ≤ 10% of population (if applicable).

N — Normal: For any fixed x, the y-values are Normally distributed. Check: histogram or Normal probability plot of residuals shows approximate Normality; no strong skew or outliers in residuals.

E — Equal Variance: The standard deviation of y-values is the same for all values of x. Check: residual plot shows roughly equal vertical spread across all x values (no "fan" shape).

R — Random: Data came from a random sample or randomized experiment.

[Figure: Residual plots for checking LINER conditions. Panel 1 (✓ Good, conditions met): random scatter, so Linear and Equal Variance are satisfied. Panel 2 (✗ Curved, not linear): a U-shape means the relationship is nonlinear. Panel 3 (✗ Fan, unequal variance): spread increases with x.]
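A residual plot starts from the residuals themselves, so a minimal sketch of computing them may help. The data below are hypothetical, built on purpose so that the scatter grows with x, and the "spread ratio" is only a rough numeric stand-in for eyeballing a fan shape; it is not an AP procedure.

```python
import math

def fit_lsrl(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

def residuals(xs, ys):
    """Residual = observed y minus predicted y, one per data point."""
    a, b = fit_lsrl(xs, ys)
    return [y - (a + b * x) for x, y in zip(xs, ys)]

def rms(values):
    """Root-mean-square size of a list of residuals."""
    return math.sqrt(sum(v * v for v in values) / len(values))

# Hypothetical data: the noise is constructed to get larger as x increases
xs = list(range(1, 11))
noise = [0.1, -0.2, 0.3, -0.4, 0.5, -0.6, 0.7, -0.8, 0.9, -1.0]
ys = [2 * x + e for x, e in zip(xs, noise)]

res = residuals(xs, ys)
half = len(res) // 2
# Typical residual size for large x divided by typical size for small x
spread_ratio = rms(res[half:]) / rms(res[:half])

print("residuals:", [round(r, 3) for r in res])
print(f"spread ratio (upper half of x / lower half): {spread_ratio:.2f}")
```

A ratio well above 1 corresponds to the fan pattern in the figure. In practice the judgment is made from the plot, not from a single number.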

t-Test for the Slope \(\beta\)

We use a t-test to determine whether the slope of the population regression line is different from zero (or some other value). A slope of zero means there is no linear relationship between x and y.

📐 Hypotheses and Test Statistic

Most common: \(H_0: \beta = 0\) vs \(H_a: \beta \neq 0\) (two-sided — is there any linear relationship?)

Or one-sided: \(H_a: \beta > 0\) or \(H_a: \beta < 0\)

\[ t = \frac{b - 0}{SE_b} \]

\(b\) = sample slope from LSRL  |  \(SE_b\) = standard error of the slope (from computer output)
\(df = n - 2\)   (lose 2 df because we estimate both \(\alpha\) and \(\beta\))

💡 Why df = n − 2?

In regression we estimate two parameters (\(\alpha\) and \(\beta\)), so we lose 2 degrees of freedom. Compare: one-sample t uses df = n−1 (estimates one parameter \(\mu\)).
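On the exam \(SE_b\) is read from computer output, but it can help to see where the n − 2 enters. The sketch below uses the standard formulas \(s = \sqrt{\sum \text{residual}^2 / (n-2)}\) and \(SE_b = s / \sqrt{\sum (x - \bar{x})^2}\), which are not stated in this unit's text, on a tiny hypothetical data set. The function name `slope_inference_summary` is made up for illustration.

```python
import math

def slope_inference_summary(xs, ys):
    """Compute the LSRL and the pieces needed for inference on the slope."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx              # sample slope
    a = y_bar - b * x_bar      # sample intercept
    ss_resid = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    df = n - 2                 # two estimated parameters: intercept and slope
    s = math.sqrt(ss_resid / df)   # standard deviation of the residuals
    se_b = s / math.sqrt(sxx)      # standard error of the slope
    return {"a": a, "b": b, "s": s, "se_b": se_b, "df": df, "t": b / se_b}

# Hypothetical data, small enough to check by hand
summary = slope_inference_summary([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
for name, value in summary.items():
    print(f"{name} = {value:.4f}" if isinstance(value, float) else f"{name} = {value}")
```

For these five points the hand calculation gives b = 0.6, a = 2.2, residual sum of squares 2.4, and df = 3, so \(s = \sqrt{0.8}\) and \(SE_b = \sqrt{0.8}/\sqrt{10}\).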

📌 Example: t-Test for Slope

A study of 15 students finds the regression of exam score (y) on study hours (x): \(\hat{y} = 52.3 + 4.8x\), with \(SE_b = 1.92\). Test whether there is a positive linear relationship at α = 0.05.

Step 1 — Hypotheses: \(H_0: \beta = 0\) vs \(H_a: \beta > 0\) (one-sided right)

Step 2 — Conditions: Assume LINER conditions verified from scatter and residual plots ✓

Step 3 — Calculate: \(t = \frac{4.8 - 0}{1.92} = 2.50\)  |  df = 15 − 2 = 13

p-value = P(t > 2.50) with df=13 ≈ 0.013

Step 4 — Conclude: Since 0.013 < 0.05, reject H₀. There is convincing evidence of a positive linear relationship between study hours and exam score.
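The calculation in Step 3 can be reproduced as a rough sketch. A calculator or statistics library would normally supply the t-distribution tail area; here it is approximated with Simpson's rule so that the example needs only the standard library. The helper names `t_pdf` and `t_upper_tail` are assumptions made for this illustration.

```python
import math

def t_pdf(t, df):
    """Density of the t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > t), using Simpson's rule on [0, |t|]; steps must be even."""
    if t < 0:
        return 1 - t_upper_tail(-t, df, steps)  # symmetry of the t-distribution
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3
    return 0.5 - area_0_to_t

# Values from the example: b = 4.8, SE_b = 1.92, n = 15, H_a: beta > 0
b, se_b, n = 4.8, 1.92, 15
t_stat = (b - 0) / se_b
df = n - 2
p_value = t_upper_tail(t_stat, df)  # one-sided, right tail

print(f"t = {t_stat:.2f}, df = {df}, one-sided p-value = {p_value:.4f}")
print("reject H0" if p_value < 0.05 else "fail to reject H0")
```

The tail area should agree with the ≈ 0.013 quoted in Step 3. For a two-sided alternative the p-value would be twice this tail area.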

Confidence Interval for the Slope \(\beta\)

t-Interval for the Population Slope β
\[ b \pm t^* \cdot SE_b \]
\(b\) = sample slope  |  \(SE_b\) = standard error of slope (from computer output)
\(t^*\) from t-distribution with \(df = n - 2\)
📌 Example: CI for Slope

Using the same study: \(b = 4.8\), \(SE_b = 1.92\), \(n = 15\), df = 13.

For 95% CI: \(t^* = 2.160\) (df=13)

\(CI = 4.8 \pm 2.160(1.92) = 4.8 \pm 4.147\)

Interval: (0.653, 8.947)

Interpretation: We are 95% confident that for each additional hour of studying, the true mean exam score increases by between 0.653 and 8.947 points. Since the interval does not include 0, there is convincing evidence of a positive linear relationship.
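The interval above can be checked with a short sketch that also finds \(t^*\) numerically instead of reading it from a table. The approach (Simpson's rule for the tail area, then bisection) and the function names `t_critical` and `slope_interval` are choices made for this illustration, not an AP-required method.

```python
import math

def t_pdf(t, df):
    """Density of the t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, using Simpson's rule on [0, t]; steps must be even."""
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def t_critical(confidence, df):
    """Find t* whose central area is `confidence`, by bisection on the upper tail.
    The bracket [0, 50] is an assumption that holds for df >= 2 at usual levels."""
    target = (1 - confidence) / 2
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > target:
            lo = mid  # tail area still too large, so t* lies further out
        else:
            hi = mid
    return (lo + hi) / 2

def slope_interval(b, se_b, n, confidence=0.95):
    """Return (t*, (low, high)) for the interval b ± t* · SE_b with df = n − 2."""
    t_star = t_critical(confidence, n - 2)
    margin = t_star * se_b
    return t_star, (b - margin, b + margin)

# Values from the example: b = 4.8, SE_b = 1.92, n = 15
t_star, (low, high) = slope_interval(b=4.8, se_b=1.92, n=15)
print(f"t* = {t_star:.3f}")
print(f"95% CI for the slope: ({low:.3f}, {high:.3f})")
```

Because the sketch carries \(t^*\) to more decimal places than the table value 2.160, its endpoints can differ from (0.653, 8.947) in the third decimal place.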

💡 CI and Hypothesis Test Connection

If the 95% CI for \(\beta\) does not contain 0, then a two-sided test at α = 0.05 would reject \(H_0: \beta = 0\). If it contains 0, we fail to reject. This is the same CI-test duality from Unit 6.
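The duality follows from simple algebra: 0 lies outside \(b \pm t^* \cdot SE_b\) exactly when \(|b / SE_b| > t^*\). A minimal sketch of that equivalence is below. The first set of values is the study example; the second is hypothetical, chosen to give an interval of about (−0.3, 2.1).

```python
def ci_for_slope(b, se_b, t_star):
    """Interval b ± t* · SE_b."""
    margin = t_star * se_b
    return (b - margin, b + margin)

def ci_excludes_zero(b, se_b, t_star):
    """True when 0 is not a plausible value for the slope."""
    low, high = ci_for_slope(b, se_b, t_star)
    return low > 0 or high < 0

def rejects_two_sided(b, se_b, t_star):
    """True when |t| exceeds t*, i.e. the two-sided test rejects H0: beta = 0."""
    return abs(b / se_b) > t_star

study = dict(b=4.8, se_b=1.92, t_star=2.160)  # values from the study example
other = dict(b=0.9, se_b=0.6, t_star=2.0)     # hypothetical values

for label, values in (("study", study), ("other", other)):
    print(label, ci_for_slope(**values),
          "CI excludes 0:", ci_excludes_zero(**values),
          "test rejects:", rejects_two_sided(**values))
```

The two checks always agree when the confidence level C and the two-sided significance level satisfy α = 1 − C and both use the same df.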

Reading Computer Output

On the AP exam, regression inference is almost always presented through computer output. You must be able to extract the necessary values.

Typical AP Statistics Computer Output (Regression)

Predictor    Coef     SE Coef    T        P-value
Constant     52.34    5.821      8.99     0.000
Hours        4.812    1.923      2.502    0.026

S = 7.842    R-sq = 32.5%    R-sq(adj) = 27.0%

df = 13 (n − 2 = 15 − 2)  |  n = 15 (sample size)  |  r = √0.325 ≈ 0.570

Key Values to Find

b = 4.812 (sample slope)
SE_b = 1.923 (standard error of slope)
t = 2.502 (test statistic)
p = 0.026 (two-sided p-value)
S = 7.842 (residual standard deviation)
R² = 32.5%

💡 What Each Output Value Means

Coef (slope row): This is \(b\) — the sample slope. Use it to write the LSRL equation.

SE Coef (slope row): This is \(SE_b\) — plug into the CI formula: \(b \pm t^* \cdot SE_b\).

T (slope row): The test statistic = b / SE_b. Already calculated for you.

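P-value (slope row): Two-sided p-value for \(H_0: \beta = 0\). For a one-sided test, divide by 2, but only when the sample slope \(b\) has the sign stated in \(H_a\); if \(b\) points the other way, the one-sided p-value is 1 minus half the reported value.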

S: Standard deviation of residuals — measures typical distance of points from the regression line.

R-sq: The coefficient of determination \(r^2\) — percent of variation in y explained by x.
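As a quick consistency check, the relationships among the output values can be sketched in a few lines. The dictionary keys are made-up labels for this illustration; the numbers are the ones in the sample output above.

```python
import math

# Values copied from the slope row and summary line of the sample output
output = {
    "coef": 4.812,         # b, the sample slope
    "se_coef": 1.923,      # SE_b
    "t": 2.502,            # reported test statistic
    "p_two_sided": 0.026,  # reported two-sided p-value
    "r_sq": 0.325,         # R-sq as a proportion
    "n": 15,
}

t_check = output["coef"] / output["se_coef"]  # T = b / SE_b
df = output["n"] - 2                          # df = n - 2
# r takes the sign of the slope; the slope here is positive
r = math.copysign(math.sqrt(output["r_sq"]), output["coef"])
# Halving is valid only because b > 0 matches a right-sided H_a: beta > 0
p_one_sided = output["p_two_sided"] / 2

print(f"b / SE_b = {t_check:.3f} (output reports T = {output['t']})")
print(f"df = {df}, r = {r:.3f}, one-sided p-value = {p_one_sided:.3f}")
```

Recomputing T from Coef and SE Coef is a useful habit when an exam question blanks out one of the three values.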


Multiple Choice Questions

Try each question, then reveal the answer.

MCQ · Q1 · Hypotheses for Slope

A researcher fits a regression of crop yield (bushels) on fertilizer amount (pounds). She wants to test whether more fertilizer is associated with higher yield. What are the correct hypotheses?

  • A \(H_0: b = 0\) vs \(H_a: b > 0\)
  • B \(H_0: \beta = 0\) vs \(H_a: \beta > 0\)
  • C \(H_0: \beta = 0\) vs \(H_a: \beta \neq 0\)
  • D \(H_0: r = 0\) vs \(H_a: r > 0\)
  • E \(H_0: \beta > 0\) vs \(H_a: \beta = 0\)
✓ Correct Answer: B

Hypotheses use the population parameter \(\beta\), not the sample statistic \(b\). "Higher yield with more fertilizer" is a one-sided right test: \(H_a: \beta > 0\). \(H_0\) always uses equality. Hypotheses about \(r\) (choice D) are not standard AP Statistics procedure.

MCQ · Q2 · Reading Computer Output

Computer output for a regression shows: slope coefficient = 3.24, SE of slope = 1.08, n = 20. What is the t-statistic and degrees of freedom?

  • A t = 3.00, df = 20
  • B t = 3.00, df = 19
  • C t = 3.00, df = 18
  • D t = 0.333, df = 18
  • E t = 3.24, df = 18
✓ Correct Answer: C

\(t = b/SE_b = 3.24/1.08 = 3.00\). For regression inference, \(df = n - 2 = 20 - 2 = 18\). We lose 2 degrees of freedom because we estimate both the intercept and the slope.

MCQ · Q3 · LINER Conditions

A residual plot for a regression analysis shows a clear fan shape — the spread of residuals increases as x increases. Which LINER condition is violated?

  • A Linearity
  • B Independence
  • C Normality of residuals
  • D Equal variance
  • E Random sampling
✓ Correct Answer: D — Equal Variance

A fan shape (spread increasing with x) violates the Equal Variance condition (also called homoscedasticity). The residual spread should be roughly constant across all x values. A curved pattern would indicate violation of Linearity; random scatter with no pattern is ideal.

MCQ · Q4 · CI for Slope

A 95% confidence interval for the slope of a regression line is (−0.3, 2.1). Which conclusion is correct?

  • A We have convincing evidence that the slope is positive.
  • B We have convincing evidence that there is a linear relationship.
  • C We do not have convincing evidence of a linear relationship, since 0 is in the interval.
  • D The slope is definitely between −0.3 and 2.1.
  • E The true slope is 0.9 (the midpoint).
✓ Correct Answer: C

The CI (−0.3, 2.1) contains 0. This means β = 0 is plausible — we cannot conclude that a linear relationship exists. A two-sided test at α = 0.05 would fail to reject H₀: β = 0. (A) and (B) are wrong because 0 is in the interval.

MCQ · Q5 · Interpreting S and R²

Computer output shows S = 4.2 and R-sq = 68.5% for a regression of weight (kg) on height (cm). Which statement correctly interprets R-sq?

  • A The correlation between height and weight is 0.685.
  • B The predicted weight is within 4.2 kg of the actual weight 68.5% of the time.
  • C About 68.5% of the variation in weight is accounted for by the linear relationship with height.
  • D Height causes 68.5% of variation in weight.
  • E S = 4.2 means the slope is 4.2.
✓ Correct Answer: C

R² = 68.5% means 68.5% of variation in the response (weight) is explained by the linear relationship with height. Note: the correlation r = √0.685 ≈ 0.828 (not 0.685) — choice A confuses r and r². S = 4.2 is the standard deviation of residuals, not the slope.

Free Response Questions

FRQ 1 — Inference for Slope from Computer Output

~15 minutes
A researcher studies the relationship between the number of hours of sunlight per day (x) and the daily sales of ice cream (dollars, y) at a beach stand over 20 randomly selected days. The computer output is below:
Predictor    Coef    SE Coef    T    P
Constant    42.8     18.4     2.33   0.032
Sunlight    28.6      6.2     4.61   0.000

S = 32.4    R-sq = 54.2%    n = 20    df = 18
(a) Write the equation of the LSRL. Interpret the slope in context.

(b) Interpret R² in context.

(c) Construct a 95% confidence interval for the slope. Use t* = 2.101 for df = 18. Interpret your interval.

(d) Based on the p-value in the output, what conclusion would you draw about the linear relationship between sunlight and ice cream sales? Use α = 0.05.

✓ Model Solution

(a) LSRL and slope interpretation:

\(\hat{y} = 42.8 + 28.6x\)   or   \(\widehat{\text{sales}} = 42.8 + 28.6(\text{sunlight hours})\)

Slope interpretation: For each additional hour of sunlight per day, the predicted ice cream sales increase by $28.60, on average.


(b) R² interpretation:

About 54.2% of the variation in daily ice cream sales is accounted for by the linear relationship with hours of sunlight. The remaining 45.8% is due to other factors not included in the model.


(c) 95% CI for slope:

\(CI = b \pm t^* \cdot SE_b = 28.6 \pm 2.101(6.2) = 28.6 \pm 13.026\)

Interval: (15.574, 41.626)

Interpretation: We are 95% confident that for each additional hour of sunlight, the true mean increase in ice cream sales is between $15.57 and $41.63.


(d) Conclusion:

The p-value for the slope is reported as 0.000, which means it is less than 0.0005, so it is well below α = 0.05. We reject \(H_0: \beta = 0\). There is very convincing evidence of a linear relationship between hours of sunlight and ice cream sales, and the positive sample slope indicates that the relationship is positive. The CI (15.57, 41.63) agrees: it does not contain 0.

✓ AP tip: (a) must name the variables, not just write numbers. (b) always say "accounted for" not "caused." (c) use SE Coef from the slope row, not the constant row. (d) state the decision AND context.
