Categorical vs. Quantitative Variables
The first step in any statistical analysis is identifying what kind of variable you are dealing with. The variable type determines which graphs and summary statistics are appropriate.
Categorical (Qualitative) variables place individuals into groups or categories. They answer "what type?" or "which group?"
Quantitative variables take numerical values for which arithmetic makes sense. They answer "how much?" or "how many?"
Watch out for zip codes and jersey numbers. Although they look like numbers, you cannot meaningfully add or average them — they are categorical. If arithmetic doesn't make sense, it's categorical.
Displaying Distributions
The goal of a display is to reveal the shape of the data. Different graph types work best for different situations.
| Graph Type | Best Used For | Shows |
|---|---|---|
| Dotplot | Small quantitative datasets | Individual values, clusters, gaps |
| Stemplot | Small–medium quantitative datasets | Shape + actual data values |
| Histogram | Large quantitative datasets | Shape, center, spread (not individual values) |
| Boxplot | Comparing distributions | Five-number summary, outliers |
| Bar Chart | Categorical data | Frequency or relative frequency per category |
In a histogram, the bars touch each other (data is continuous). In a bar chart, the bars have gaps (categories are separate). This is a classic AP trap — be sure to use the correct graph for the correct data type.
Describing Distributions: SOCS
On the AP exam, whenever you are asked to describe a distribution, use the acronym SOCS: Shape, Outliers, Center, Spread.
In a skewed distribution, the mean is pulled toward the tail.
Skewed right → mean > median | Skewed left → mean < median | Symmetric → mean ≈ median
Context: A histogram shows exam scores for 30 students. Most scores are between 70–90, with a few scores in the 40s.
SOCS response:
Shape: The distribution of exam scores is roughly symmetric in its main cluster (70–90) but skewed slightly to the left overall, because a few low scores form a short tail.
Outliers: There appear to be 2–3 unusually low exam scores in the 40s, which are far from the main cluster.
Center: The median exam score is approximately 80 points.
Spread: The IQR of the exam scores is approximately 15 points (Q1 ≈ 72, Q3 ≈ 87), indicating moderate variability.
✓ Always use CONTEXT (mention "exam scores", not just "data") for full credit.
Measures of Center & Spread
Center: Mean and Median
Mean: Uses every value. Sensitive to outliers. Use for symmetric distributions.
Median: The middle value. Resistant to outliers. Use when data is skewed or has outliers.
Seven employees earn (in $1000s): 42, 45, 44, 46, 43, 45, 200
Mean: (42+45+44+46+43+45+200)/7 = 465/7 ≈ $66,400
Median: Sort: 42, 43, 44, 45, 45, 46, 200 → $45,000
The $200,000 executive salary drags the mean up to $66,400, but the median of $45,000 better represents the typical worker. Use the median here.
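To see why the median is called resistant, the short sketch below recomputes both measures for the salary data with and without the 200 value. It uses only Python's standard `statistics` module; the variable names are illustrative.

```python
from statistics import mean, median

# Salaries in $1000s, copied from the example above.
salaries = [42, 45, 44, 46, 43, 45, 200]

# The same data with the executive's salary (200) removed.
without_exec = [s for s in salaries if s != 200]

# Compare how far each measure moves when the extreme value is dropped.
print(f"with 200:    mean = {mean(salaries):.1f}, median = {median(salaries)}")
print(f"without 200: mean = {mean(without_exec):.1f}, median = {median(without_exec)}")
```

Dropping one value moves the mean by more than 20 (in $1000s), while the median moves by only 0.5.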
Spread: Standard Deviation and IQR
The standard deviation cannot be negative. If you get a negative answer, you made a calculation error. Also, s = 0 only when all values are identical.
| Measure | Formula | Resistant? | Use When |
|---|---|---|---|
| Mean (x̄) | Σx / n | ❌ Not resistant | Distribution is symmetric, no strong outliers |
| Median | Middle value | ✅ Resistant | Distribution is skewed or has outliers |
| Std Dev (s) | √[Σ(x−x̄)²/(n−1)] | ❌ Not resistant | Paired with the mean (symmetric data) |
| IQR | Q3 − Q1 | ✅ Resistant | Paired with the median (skewed data) |
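The standard deviation formula in the table can be spelled out step by step. The sketch below is one way to write it, checked against `statistics.stdev`, which also divides by n − 1. The five data values are hypothetical and chosen so the arithmetic is easy to follow by hand.

```python
import math
from statistics import mean, stdev

def sample_sd(values):
    """Sample standard deviation: sqrt( sum((x - xbar)^2) / (n - 1) )."""
    xbar = mean(values)
    squared_devs = [(x - xbar) ** 2 for x in values]
    return math.sqrt(sum(squared_devs) / (len(values) - 1))

# Hypothetical data: mean is 5, squared deviations are 9, 1, 1, 1, 16 (sum 28).
data = [2, 4, 4, 6, 9]

print(sample_sd(data))           # sqrt(28 / 4) = sqrt(7)
print(stdev(data))               # library value for comparison
print(sample_sd([5, 5, 5]))      # identical values give s = 0
```

Because every term under the square root is a squared deviation, the result can never be negative.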
Boxplots & Outlier Rule
A boxplot displays the five-number summary: Minimum, Q1, Median (Q2), Q3, Maximum. It also visually identifies outliers.
Data: 12, 14, 15, 16, 18, 21, 23, 55
Q1 = 14.5, Q3 = 22, IQR = 7.5
Lower Fence = 14.5 − 1.5(7.5) = 14.5 − 11.25 = 3.25
Upper Fence = 22 + 1.5(7.5) = 22 + 11.25 = 33.25
Since 55 > 33.25, the value 55 is an outlier. All other values are within the fences.
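The fence calculation above can be sketched as a small function. This version assumes the quartile convention used in the example (Q1 and Q3 are the medians of the lower and upper halves, with the middle value left out when n is odd). Calculators and software packages may use other quartile conventions and give slightly different fences. The function names are illustrative.

```python
from statistics import median

def quartiles(values):
    """Q1 and Q3 as medians of the lower and upper halves of the sorted data."""
    data = sorted(values)
    if len(data) < 4:
        raise ValueError("need at least 4 values for this sketch")
    half = len(data) // 2          # middle value is excluded when n is odd
    return median(data[:half]), median(data[-half:])

def fences_and_outliers(values):
    """Apply the 1.5 x IQR rule: flag values strictly beyond either fence."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    flagged = [x for x in values if x < low_fence or x > high_fence]
    return low_fence, high_fence, flagged

data = [12, 14, 15, 16, 18, 21, 23, 55]
print(fences_and_outliers(data))   # (3.25, 33.25, [55])
```

The comparison is strict, so a value exactly equal to a fence is not flagged.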
The Normal Distribution
The normal distribution is the most important distribution in statistics. It is symmetric, bell-shaped, and completely described by its mean (μ) and standard deviation (σ).
For any Normal distribution \(N(\mu,\, \sigma)\):
• About 68% of values fall within \(\mu \pm 1\sigma\)
• About 95% of values fall within \(\mu \pm 2\sigma\)
• About 99.7% of values fall within \(\mu \pm 3\sigma\)
Heights of adult women are approximately Normal with μ = 64 inches, σ = 2.5 inches.
Q: What percent of women are between 59 and 69 inches tall?
59 = 64 − 2(2.5) = μ − 2σ and 69 = 64 + 2(2.5) = μ + 2σ
So approximately 95% of women are between 59 and 69 inches tall.
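The 68–95–99.7 figures are rounded. As a rough check, the sketch below computes the same areas for the heights model with `statistics.NormalDist` from the Python standard library; the helper name `share_within` is illustrative.

```python
from statistics import NormalDist

# Heights model from the example: mean 64 in, standard deviation 2.5 in.
heights = NormalDist(mu=64, sigma=2.5)

def share_within(k):
    """Proportion of the distribution within k standard deviations of the mean."""
    low = heights.mean - k * heights.stdev
    high = heights.mean + k * heights.stdev
    return heights.cdf(high) - heights.cdf(low)

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")
```

The computed areas are about 68.3%, 95.4%, and 99.7%, which is why the Empirical Rule answers are stated with "about" or "approximately".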
Z-Scores & Standardizing
\(z = \dfrac{x - \mu}{\sigma}\), where \(z\) is the number of standard deviations \(x\) lies from the mean.
z = 0: The value equals the mean.
z > 0: The value is above the mean.
z < 0: The value is below the mean.
|z| > 2: Unusually high or low — a possible outlier.
Maria scored 85 on a math test (μ = 75, σ = 8) and 78 on a history test (μ = 70, σ = 5).
Math z-score: z = (85 − 75) / 8 = 10/8 = 1.25
History z-score: z = (78 − 70) / 5 = 8/5 = 1.60
Although Maria's raw score was higher in math, her z-score is lower. She actually performed better relative to her class in history (z = 1.60 vs 1.25). Z-scores allow fair comparison across different scales.
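Maria's comparison can be reproduced in a few lines. This is a minimal sketch of the standardizing step, with an illustrative helper name.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

math_z = z_score(85, mu=75, sigma=8)      # 1.25
history_z = z_score(78, mu=70, sigma=5)   # 1.6

# The larger z-score marks the stronger performance relative to the class.
better = "history" if history_z > math_z else "math"
print(math_z, history_z, better)
```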
Using the Normal Table (z-table)
The z-table gives the area to the left of a z-score (i.e., P(Z < z)).
P(Z < z): Read directly from the table.
P(Z > z): Compute 1 − P(Z < z) (complement rule).
P(a < Z < b): Compute P(Z < b) − P(Z < a) (subtract the two left areas).
On the AP exam, the z-table is provided. You just need to know how to use it correctly.
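For practice away from the printed table, the three lookups above can be mirrored in code. The sketch below uses `statistics.NormalDist()` (the standard Normal) in place of the z-table; the function names and the z-values chosen are illustrative.

```python
from statistics import NormalDist

Z = NormalDist()  # standard Normal: mean 0, standard deviation 1

def left_area(z):
    """P(Z < z): what the z-table reports directly."""
    return Z.cdf(z)

def right_area(z):
    """P(Z > z): complement of the left area."""
    return 1 - Z.cdf(z)

def area_between(a, b):
    """P(a < Z < b): difference of two left areas."""
    return Z.cdf(b) - Z.cdf(a)

print(round(left_area(1.25), 4))        # 0.8944
print(round(right_area(1.25), 4))       # 0.1056
print(round(area_between(-1, 2), 4))    # 0.8186
```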
Multiple Choice Questions
Try each question before reading the answer and explanation that follow it.
A school collects the following information about each student: shoe size, favorite color, GPA, and ZIP code. Which of these variables is quantitative?
- A Favorite color
- B ZIP code
- C GPA
- D Both shoe size and GPA
- E All of the above
Answer: D. Both shoe size and GPA are quantitative because arithmetic (averaging, adding) makes sense. Favorite color is clearly categorical. ZIP code looks like a number but it is categorical: you cannot meaningfully average two ZIP codes.
In a distribution of household incomes, the mean is $85,000 and the median is $62,000. Which of the following best describes this distribution?
- A Symmetric, centered at about $73,500
- B Skewed left, with a long tail toward low incomes
- C Skewed right, with a long tail toward high incomes
- D Uniform, with equal frequencies across income levels
- E Bimodal, with peaks at both $62,000 and $85,000
Answer: C. When the mean > median, the distribution is skewed right (positively skewed). The mean is pulled toward a few very high incomes (the long right tail). This is typical of income distributions: most people earn near the median, but a few very wealthy people pull the mean up.
For a dataset, Q1 = 20, Q3 = 38, and IQR = 18. Which of the following values would be classified as an outlier using the 1.5 × IQR rule?
- A 5
- B 15
- C 60
- D 65
- E Both C and D
Lower fence = Q1 − 1.5(IQR) = 20 − 1.5(18) = 20 − 27 = −7
Upper fence = Q3 + 1.5(IQR) = 38 + 1.5(18) = 38 + 27 = 65
Any value below −7 or above 65 is an outlier. The values 5 and 15 are above −7, and 60 is below 65, so none of these is an outlier. The value 65 sits exactly on the upper fence. In AP Statistics, a value is an outlier only if it lies more than 1.5 IQRs beyond a quartile, so 65 is not strictly an outlier either.
As written, none of the listed values is an outlier under the strict rule. Choice E cannot be correct because 60 is clearly inside the fence. D (65) is the only borderline value, and it counts as an outlier only under an "at or beyond the fence" convention. The question rewards careful fence calculation: compare each value with the computed fence rather than judging by eye.
The lengths of fish in a lake are approximately Normally distributed with a mean of 14 inches and a standard deviation of 2 inches. Approximately what percent of fish are longer than 18 inches?
- A 2.5%
- B 5%
- C 16%
- D 32%
- E 97.5%
Answer: A. 18 = 14 + 2(2) = μ + 2σ, so 18 is 2 standard deviations above the mean.
The Empirical Rule tells us 95% of fish are within 2σ of the mean (between 10 and 18 inches).
So 5% are outside this range. By symmetry, half of that 5% = 2.5% are above 18 inches.
A student's test score of 72 has a z-score of −1.5. The standard deviation of scores is 8. What is the mean score of the class?
- A 60
- B 64
- C 80
- D 84
- E 88
Answer: D. z = (x − μ) / σ → −1.5 = (72 − μ) / 8
−1.5 × 8 = 72 − μ → −12 = 72 − μ → μ = 72 + 12 = 84
The mean is 84. The student scored 1.5 standard deviations below the class average.
Free Response Questions
Write out your full response before checking the solution. FRQ graders award partial credit — always show your work and use context.
FRQ 1 — Describing a Distribution
~10 minutes
Data (study time in hours, n = 12 students): 2, 3, 3, 4, 4, 5, 5, 6, 7, 8, 9, 18
✓ Model Solution
(a) Mean and Median:
Sum = 2+3+3+4+4+5+5+6+7+8+9+18 = 74
Mean = 74/12 ≈ 6.17 hours
Sorted: 2, 3, 3, 4, 4, 5, 5, 6, 7, 8, 9, 18 → n=12, median = average of 6th and 7th values = (5+5)/2 = 5 hours
(b) Outlier Test:
Q1 = median of lower half (2,3,3,4,4,5) = (3+4)/2 = 3.5
Q3 = median of upper half (5,6,7,8,9,18) = (7+8)/2 = 7.5
IQR = 7.5 − 3.5 = 4
Lower Fence = Q1 − 1.5(IQR) = 3.5 − 6 = −2.5
Upper Fence = Q3 + 1.5(IQR) = 7.5 + 6 = 13.5
Since 18 > 13.5, the value 18 is an outlier. No value falls below the lower fence of −2.5.
(c) Better Measure of Center:
The median (5 hours) is the better measure of center. The value of 18 hours is an outlier that pulls the mean up to 6.17 hours. Since the distribution is skewed right with an outlier, the median is a more resistant and representative measure of the typical student's study time.
✓ Full credit requires: correct calculation (show work), correct outlier verdict with fence calculation, and a justification that references the outlier and skew.
FRQ 2 — Normal Distribution
~12 minutes
✓ Model Solution
(a) Z-score:
z = (620 − 520) / 80 = 100/80 = 1.25
A score of 620 is 1.25 standard deviations above the mean biology test score of 520.
(b) Using the Empirical Rule:
440 = 520 − 1(80) = μ − σ | 680 = 520 + 2(80) = μ + 2σ
P(μ−σ < X < μ) = 68%/2 = 34% (left half of 1σ interval)
P(μ < X < μ+2σ) = 95%/2 = 47.5% (right half of 2σ interval)
Total = 34% + 47.5% = 81.5% of students scored between 440 and 680.
(c) Top 16% cutoff:
Top 16% means P(X > x) = 0.16, so P(X < x) = 0.84.
From the z-table: P(Z < 1.0) ≈ 0.8413, which is the entry closest to 0.84, so z ≈ 1.0
x = μ + z·σ = 520 + (1.0)(80) = 600
The student must score at least 600 to be in the top 16%.
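The Empirical Rule answers in parts (b) and (c) are approximations. As a rough check, the sketch below computes the same quantities with `statistics.NormalDist`, assuming the model used in the solution (μ = 520, σ = 80). The variable names are illustrative.

```python
from statistics import NormalDist

# Test-score model used in the solution: mean 520, standard deviation 80.
scores = NormalDist(mu=520, sigma=80)

# Part (b): proportion of scores between 440 and 680.
share_440_680 = scores.cdf(680) - scores.cdf(440)

# Part (c): score with 84% of the distribution below it (top 16% cutoff).
cutoff_top_16 = scores.inv_cdf(0.84)

print(f"P(440 < X < 680) = {share_440_680:.4f}")
print(f"top 16% cutoff   = {cutoff_top_16:.1f}")
```

The computed values land within about half a point of the hand answers (81.5% and 600), which is the level of agreement expected from the rounded 68–95–99.7 figures.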
✓ Always interpret z-scores in context. Note the direction (above/below mean) and include the variable being measured.