Unit 3: Collecting Data | AP Statistics

Section 3.1

Observational Study vs. Experiment

Before collecting any data, you must decide how to collect it. This decision determines what conclusions you can validly draw, and it is one of the most heavily tested ideas in all of AP Statistics.

The Big Picture: Types of Studies

🔑 The Golden Rule

Only a well-designed, randomized experiment can establish a cause-and-effect relationship.

An observational study can show association, but never causation, no matter how strong the association or how large the study.

📌 Example: Which is it?

Scenario A: Researchers track 10,000 coffee drinkers and non-drinkers for 20 years and compare their heart disease rates. → Observational study. No one was assigned to drink coffee.

Scenario B: Researchers randomly assign 200 patients to take either a new drug or a placebo and measure blood pressure after 3 months. → Experiment. The researcher assigned the treatments.

Census vs Sample

Term	Definition	Practical Reality
Population	The entire group of interest	All U.S. adults, all fish in a lake
Census	Collecting data from every member of the population	Very expensive, often impossible
Sample	A subset of the population selected for study	Practical and usually sufficient
Parameter	A number describing a population (e.g., μ, p)	Usually unknown — what we estimate
Statistic	A number describing a sample (e.g., x̄, p̂)	Calculated from data we collect
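
The parameter/statistic distinction can be made concrete with a short sketch. Everything here is an assumption for illustration: the fish weights are simulated, and the population size, sample size, and seed are arbitrary. In real studies the parameter is usually unknown, which is why the statistic is used to estimate it.

```python
import random
from statistics import mean

# Hypothetical population: weights (in grams) of all 5,000 fish in a lake.
# The values are simulated for illustration only; they are not real data.
rng = random.Random(1)
population = [rng.gauss(500, 80) for _ in range(5000)]

# Parameter: a number describing the whole population (usually unknown).
mu = mean(population)

# Statistic: a number describing one sample of 50 fish drawn without replacement.
sample = rng.sample(population, 50)
x_bar = mean(sample)

print(f"parameter mu    = {mu:.1f}")
print(f"statistic x-bar = {x_bar:.1f}")
```

Re-running with a different seed gives a different x̄ each time, while μ changes only because the simulated population itself is regenerated.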

Section 3.2

Sampling Methods

The goal of sampling is to select a sample that represents the population without bias. Different methods have different strengths and weaknesses.

Six Major Sampling Methods

💡 Stratified vs Cluster — The Key Difference

Stratified: You sample from every group. Groups are homogeneous within, different across.

Cluster: You sample entire groups, and select only some groups. Groups look like mini-versions of the whole population. Think of school classrooms — each classroom is a "cluster" that represents the school.

Method	How	Best When	Weakness
SRS	Random draw; every possible sample of size n is equally likely	Small, accessible populations	May miss subgroups
Stratified	SRS within each subgroup (stratum)	Population has distinct subgroups	Must know strata in advance
Cluster	Randomly select whole groups	Population naturally in clusters	Higher variability than stratified
Systematic	Every kth after random start	Long lists, assembly lines	Periodic patterns cause bias
Voluntary Response	People self-select	Never — always biased	Over-represents strong opinions
Convenience	Whoever is easiest to reach	Never — always biased	Rarely represents population
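
The four probability methods in the table can be sketched in a few lines each. This is a minimal illustration, not a standard library for sampling: the function names, the made-up school, and the seed are all assumptions. Grades play the role of strata and mixed-grade homerooms play the role of clusters.

```python
import random

def srs(population, n, rng):
    """Simple random sample: every possible group of n units is equally likely."""
    return rng.sample(population, n)

def stratified(strata, n_per_stratum, rng):
    """Take an SRS of n_per_stratum units from EVERY stratum."""
    return {name: rng.sample(units, n_per_stratum) for name, units in strata.items()}

def cluster(clusters, n_clusters, rng):
    """Randomly choose whole clusters and keep every unit in the chosen ones."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}

def systematic(population, k, rng):
    """Random start among the first k units, then every k-th unit after it."""
    start = rng.randrange(k)
    return population[start::k]

# Hypothetical school (all names invented): 4 grades of 30 students as strata,
# and 6 mixed-grade homerooms of 20 students as clusters.
grades = {g: [f"g{g}-s{i}" for i in range(30)] for g in (9, 10, 11, 12)}
everyone = [s for students in grades.values() for s in students]
homerooms = {f"room{r}": everyone[r::6] for r in range(6)}

rng = random.Random(0)  # fixed seed only so the sketch is repeatable
print(len(srs(everyone, 12, rng)))
print({g: len(v) for g, v in stratified(grades, 3, rng).items()})
print({room: len(v) for room, v in cluster(homerooms, 2, rng).items()})
print(len(systematic(everyone, 10, rng)))
```

Notice the structural difference: `stratified` returns a few students from every grade, while `cluster` returns every student from only some homerooms.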

Section 3.3

Sources of Bias in Sampling

Bias means the sampling method systematically favors certain outcomes. A biased sample produces results that consistently over- or under-estimate the true population value. Bias cannot be fixed by increasing sample size.

⚠️ Critical AP Fact

Larger samples do NOT fix bias. A biased method with 1,000,000 people is still biased. Only better sampling design removes bias. This is a classic AP trap question.
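
A small simulation can make this concrete. The setup is entirely hypothetical: a population in which 40% support a policy, and a biased method in which supporters are assumed to be three times as likely to end up in the sample. For simplicity the draws are made with replacement.

```python
import random

rng = random.Random(42)  # fixed seed only so the sketch is repeatable

# Hypothetical population of 100,000 adults: 40% support a policy (True).
# Assumption for illustration: supporters are three times as likely as
# non-supporters to end up in the biased sample.
population = [i < 40_000 for i in range(100_000)]
weights = [3 if supports else 1 for supports in population]

def p_hat(sample):
    """Sample proportion of supporters."""
    return sum(sample) / len(sample)

for n in (100, 10_000, 100_000):
    fair = rng.choices(population, k=n)                     # equal chance for everyone
    biased = rng.choices(population, weights=weights, k=n)  # supporters over-sampled
    print(f"n={n:>7}  fair p-hat={p_hat(fair):.3f}  biased p-hat={p_hat(biased):.3f}")
```

As n grows, the fair estimate settles near the true 0.40, while the biased estimate settles near 1.2 / 1.8 ≈ 0.667. More data makes the biased estimate more precise, not more accurate.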

Type of Bias	Definition	Example
Undercoverage	Some members of the population are systematically excluded from the sample	Phone survey excludes people without phones; online survey excludes those without internet
Voluntary Response Bias	People with strong opinions are more likely to respond	Online poll about a controversial topic — only passionate people bother
Nonresponse Bias	Selected individuals don't respond, and non-responders differ from responders	Mailed survey — busy people don't respond, retired people do
Response Bias	Respondents give inaccurate answers due to question wording, interviewer presence, or social desirability	"Do you recycle as often as you should?" over-reports yes; asking about illegal behavior face-to-face
Question Wording Bias	Leading or loaded questions push respondents toward certain answers	"Do you support wasteful government spending?" vs "Do you support government investment in infrastructure?"

📌 Example: Identifying Bias

A magazine asks readers to mail in responses to a survey about whether they enjoy the magazine.

Bias type: Voluntary response bias.

Only readers who feel strongly (usually those who love or hate it) will bother mailing back a response. People who are indifferent will not respond. The sample will not represent the typical reader's opinion.

Section 3.4

Experimental Design

An experiment imposes a treatment on subjects to observe the response. The key vocabulary is tested heavily on the AP exam.

Term	Definition	Example
Experimental Unit	The individual on which the experiment is performed	A patient, a plant, a car
Subject	Experimental unit that is a person	A student, a patient
Treatment	The specific condition applied to experimental units	Drug A, Drug B, placebo
Factor	An explanatory variable in the experiment	Type of fertilizer, dosage level
Level	The specific values of a factor	Low dose, medium dose, high dose
Response Variable	The outcome measured after treatment	Blood pressure, plant height
Control Group	Group receiving no treatment (or placebo)	Patients given a sugar pill
Placebo	An inactive treatment that looks like the real one	Sugar pill identical in appearance to the drug
Confounding Variable	A variable associated with both the explanatory and response variable that distorts results	Healthier people both exercise more AND eat better — hard to isolate exercise effect

Structure of a Randomized Controlled Experiment

Section 3.5

The Three Principles of Good Experiments

📐 The Three Principles: RCR

Randomization (R): Randomly assign subjects to treatment groups so that potential confounding variables are balanced across the groups, on average.

Control (C): Keep all other variables the same across treatment groups.

Replication (R): Use enough subjects that results are reliable and not plausibly due to chance alone.

Blinding

Type	Who Doesn't Know the Treatment	Purpose
Single-blind	The subjects (patients) don't know if they got the drug or placebo	Keeps the placebo effect equal across groups, so it cannot explain a difference between them
Double-blind	Neither the subjects NOR the evaluators know who got which treatment	Balances the placebo effect and prevents evaluator bias; the gold standard

💡 Why Double-Blind?

If the doctor who measures "improvement" knows which patients got the real drug, they might unconsciously rate those patients higher. Double-blinding removes this bias from both ends — the patient and the evaluator.

Blocked Designs

A block is a group of experimental units that are similar in some way that might affect the response. By blocking, we control for known sources of variability.

🔑 Block vs Stratify

Stratified sampling (Unit 3 sampling) — groups used in selecting who to include in the study.

Blocking (experimental design) — groups used in assigning treatments within an experiment. Same idea, different context. Block on variables that might affect your results (sex, age, health status).

📌 Example: Blocked Design

A researcher tests whether a new fertilizer increases crop yield. She suspects soil type (clay vs sandy) matters.

Block by soil type: Within each soil type block, randomly assign plots to fertilizer vs no fertilizer.

This way, the comparison of fertilizer vs control is fair within each soil type — soil type cannot confound the results.
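
The assignment step in this example might be sketched as follows. The plot labels, the number of plots per soil type, the function name, and the seed are all assumptions made for illustration.

```python
import random

def randomized_block_assignment(units_by_block, treatments, rng):
    """Within each block, shuffle the units and deal them out to treatments in turn."""
    assignment = {}
    for block, units in units_by_block.items():
        shuffled = list(units)
        rng.shuffle(shuffled)
        for i, unit in enumerate(shuffled):
            assignment[unit] = (block, treatments[i % len(treatments)])
    return assignment

# Hypothetical plots: 6 clay plots and 6 sandy plots (labels are made up).
plots = {
    "clay": [f"clay-{i}" for i in range(1, 7)],
    "sandy": [f"sandy-{i}" for i in range(1, 7)],
}
plan = randomized_block_assignment(plots, ["fertilizer", "control"], random.Random(7))
for plot, (soil, treatment) in sorted(plan.items()):
    print(plot, soil, treatment)
```

Because the shuffle happens separately inside each soil type, every block ends up with three fertilized plots and three control plots, so neither treatment can be concentrated on one soil type.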

Matched Pairs Design

A special case of blocking where each "block" has exactly 2 units (or is the same person measured twice). Common designs:

Design Type	How It Works	Example
Before/After	Same person measured before and after treatment	Measure blood pressure before and after giving a drug to each patient
Paired individuals	Two very similar people paired; one gets treatment, one gets control	Pairs of identical twins — one twin gets new curriculum, the other gets old
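
In either design, the analysis works with one difference per pair rather than with two separate groups. A minimal sketch for the before/after case, using invented blood pressure readings purely to show the bookkeeping:

```python
from statistics import mean

# Hypothetical before/after systolic blood pressure readings (mmHg) for
# six patients. The numbers are invented; they are not real data.
before = [148, 152, 139, 160, 145, 155]
after = [141, 150, 140, 151, 139, 149]

# In a matched pairs design the unit of analysis is the within-pair difference.
differences = [b - a for b, a in zip(before, after)]
mean_difference = mean(differences)

print(differences)      # one difference per patient
print(mean_difference)  # average drop in blood pressure
```

Pairing removes patient-to-patient variability from the comparison, because each patient serves as their own baseline.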

Section 3.6

Drawing Conclusions — Scope of Inference

The scope of inference — what conclusions you can draw — depends on two things: (1) was there random selection? and (2) was there random assignment?

Scope of Inference: The 2×2 Framework

💡 AP Exam Language

"Can we generalize to the population?" → Only if random selection (sampling) was used.

"Can we conclude causation?" → Only if random assignment (experiment) was used.

These are two completely separate questions. An experiment with a convenience sample can show causation but only for those subjects — you can't generalize the results to all people.
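
The two questions combine into four cases, which can be written out as a small lookup. The function name and the wording of the returned strings are illustrative choices, not official AP phrasing.

```python
def scope_of_inference(random_selection: bool, random_assignment: bool) -> str:
    """Return the conclusion a study design supports, based on the two questions."""
    causal = "cause and effect" if random_assignment else "association only"
    if random_selection:
        reach = "generalizes to the population sampled"
    else:
        reach = "applies only to units like those studied"
    return f"{causal}; {reach}"

# Print all four combinations of the two design questions.
for selection in (True, False):
    for assignment in (True, False):
        print(f"selection={selection!s:<5} assignment={assignment!s:<5} -> "
              f"{scope_of_inference(selection, assignment)}")
```

For example, an experiment run on a convenience sample corresponds to `scope_of_inference(False, True)`: cause and effect can be concluded, but only for units like those studied.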

Exam Practice

Multiple Choice Questions

Try each question before reading the answer.

MCQ · Q1 Observational vs Experiment

A researcher wants to determine whether listening to classical music improves concentration. She surveys 500 college students, asking how often they listen to classical music and their GPA. She finds that students who listen more frequently have higher GPAs. Which of the following is the most appropriate conclusion?

A Listening to classical music causes higher GPA.
B Higher GPA causes students to listen to more classical music.
C There is an association between classical music listening and GPA, but causation cannot be established.
D The study proves that classical music is beneficial for all students.
E No conclusions can be drawn because the sample size is too small.

✓ Correct Answer: C

This is an observational study — the researcher did not assign students to listen to classical music. She merely observed their habits. No matter how strong the association, an observational study cannot establish causation. There may be lurking variables: students who study more might both listen to more classical music and get better grades.

MCQ · Q2 Sampling Methods

A school district wants to survey students about cafeteria food quality. The district has 12 schools. Officials randomly select 3 schools, then survey every student in those 3 schools. What type of sampling method is this?

A Simple random sample
B Stratified random sample
C Systematic sample
D Cluster sample
E Voluntary response sample

✓ Correct Answer: D — Cluster Sample

The schools are the clusters. Three clusters were randomly selected, and then all members of those clusters were surveyed. This is the defining feature of cluster sampling — entire groups are selected. In stratified sampling, you would randomly select some students from each of the 12 schools.

MCQ · Q3 Bias

A polling company calls randomly selected landline phone numbers to survey adults about their opinions on a new tax policy. Which of the following is the most significant source of bias in this survey?

A Response bias, because people will lie about their opinions
B Undercoverage, because adults without landline phones are excluded
C Voluntary response bias, because people choose whether to answer
D Nonresponse bias, because the sample size is too small
E There is no bias because phone numbers are randomly selected

✓ Correct Answer: B — Undercoverage

Calling only landline numbers systematically excludes adults who only use cell phones — a large portion of the population, especially younger adults. This is undercoverage bias. The sampling frame (landline numbers) does not match the target population (all adults). Note that random selection within the frame doesn't fix the undercoverage of those outside the frame.

MCQ · Q4 Experimental Design

In a clinical trial, neither the patients nor the doctors measuring their outcomes know which patients received the new drug and which received the placebo. This design feature is called:

A Blocking
B Stratification
C Single-blind
D Double-blind
E Matched pairs

✓ Correct Answer: D — Double-blind

When both the subjects and the evaluators are unaware of the treatment assignment, the study is double-blind. This is the gold standard for clinical trials. Single-blind means only the subjects don't know. Double-blinding keeps the placebo effect from differing between groups (subjects) and prevents measurement bias (evaluators).

MCQ · Q5 Scope of Inference

Researchers randomly select 400 adults from a city's voter registration list and randomly assign each to one of two exercise programs. After 8 weeks, the group assigned to Program A shows significantly greater improvement in cardiovascular health. Which conclusion is best supported?

A Program A causes greater cardiovascular improvement in all adults everywhere.
B Program A is associated with greater improvement, but causation cannot be determined.
C Program A causes greater cardiovascular improvement; results can be generalized to registered voters in the city.
D Program A causes greater improvement only for the 400 adults in the study.
E No conclusion is possible because the study was not blinded.

✓ Correct Answer: C

There was random assignment (experiment) → we can conclude causation.
There was random selection from voter registration list → we can generalize to registered voters in the city.
However, we cannot generalize to "all adults everywhere" because the sample was only from one city's voter rolls — not all adults. (A) overgeneralizes. (B) wrongly denies causation. (D) ignores the random selection.

Exam Practice

Free Response Questions

Write your full solution before revealing. Unit 3 FRQs often ask you to design a study or identify flaws in one.

FRQ 1 — Design an Experiment

~15 minutes

A nutritionist wants to determine whether drinking green tea daily reduces fasting blood sugar levels in adults with pre-diabetes. She has 60 adult volunteers with pre-diabetes available for the study. The study will last 12 weeks.

(a)

Describe how you would design a completely randomized experiment to investigate this question. Include how you would assign subjects to treatments.

(b)

Explain why a control group receiving a placebo (a non-green-tea beverage) is necessary in this study.

(c)

The nutritionist is concerned that age might affect blood sugar levels, so a difference in ages between the two groups could distort the comparison. Describe how you would modify your design to account for this.

(d)

Suppose the results show that the green tea group had significantly lower blood sugar after 12 weeks. Can you conclude that green tea causes lower blood sugar in all pre-diabetic adults? Explain.

✓ Model Solution

(a) Experimental Design:

Number the 60 volunteers 1–60. Use a random number generator (or random digit table) to randomly assign 30 volunteers to the treatment group (daily green tea) and the remaining 30 to the control group (non-green-tea beverage). Measure each participant's fasting blood sugar at the start and after 12 weeks. Compare the change in blood sugar between the two groups.

(b) Why a control group is necessary:

Without a control group, we cannot know if any change in blood sugar was due to the green tea itself or to other factors such as the placebo effect (participants believing they are being treated may change behavior), dietary changes during the study, natural changes in blood sugar over 12 weeks, or increased attention from researchers. The placebo control isolates the effect of green tea specifically.

(c) Blocking for age:

Divide the 60 volunteers into age blocks (e.g., under 50 and 50+). Within each age block, randomly assign half to the green tea group and half to the control group. This is a randomized block design. By blocking on age, we ensure both groups are balanced with respect to age, preventing age from confounding the results.

(d) Scope of inference:

Not for all pre-diabetic adults. Because random assignment was used, we can conclude causation: the green tea caused the reduction in blood sugar for the subjects in this study. However, since the 60 subjects were volunteers rather than a random sample of all pre-diabetic adults, we cannot generalize these results to that population. The conclusion is limited to people similar to those in the study.

✓ AP grading tips: (a) must mention randomization method and comparison. (b) must mention placebo effect. (c) must say "block" and explain the grouping. (d) must distinguish causation (yes — random assignment) from generalization (no — volunteers, not random sample).
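
The random assignment described in part (a) could be carried out with software along these lines. The function name and the seed are assumptions; any method that gives every possible 30/30 split an equal chance would serve.

```python
import random

def completely_randomized(subject_ids, rng):
    """Shuffle all subjects, then split the shuffled list in half."""
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return sorted(shuffled[:half]), sorted(shuffled[half:])

volunteers = range(1, 61)  # volunteers numbered 1-60, as in part (a)
green_tea, control = completely_randomized(volunteers, random.Random(2024))
print("green tea:", green_tea)
print("control:  ", control)
```

Each volunteer lands in exactly one group, and the groups are forced to be the same size, matching the 30/30 split in the model solution.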

FRQ 2 — Identifying Flaws in a Study

~10 minutes

A city wants to estimate the proportion of residents who support a new public transit expansion. A reporter stands outside a downtown subway station during the morning rush hour and asks commuters whether they support the expansion. Of the 120 commuters surveyed, 94% said yes.

(a)

Identify two sources of bias in this survey. For each, explain how it would affect the estimate.

(b)

A city official argues that since 120 people is a large enough sample, the results are reliable. Explain why this argument is flawed.

(c)

Describe a better sampling method to estimate the true proportion of city residents who support the expansion.

✓ Model Solution

(a) Two sources of bias:

1. Undercoverage: The survey only reaches people who commute through a downtown subway station. Residents who drive, live in suburbs, or don't use this station are completely excluded. These excluded residents may have very different opinions about transit expansion. This would likely cause the 94% figure to be an overestimate of support among all city residents.

2. Convenience/Location Bias (or Undercoverage): Surveying only during morning rush hour misses residents who commute at different times or don't commute at all (e.g., retirees, work-from-home residents, night-shift workers). Transit commuters already use the system, so they are more likely to support its expansion — further inflating the estimate.

(b) Why sample size doesn't fix bias:

The official's argument is flawed because increasing sample size cannot correct for bias in the sampling method. If the method systematically over-samples transit supporters, then surveying 1,200 or even 12,000 people at the same location would still produce a biased estimate. A larger biased sample just gives you a more precise estimate of the wrong thing. Only a better sampling design can remove bias.

(c) Better sampling method:

Obtain a list that covers as many city residents as possible (e.g., utility billing records or an address database; a voter registration list alone would miss unregistered residents). Use a simple random sample to select residents, then contact them by phone, mail, or in person. To reduce nonresponse bias, make multiple attempts to contact each selected resident. This gives every listed resident an equal chance of being included, reducing undercoverage and location bias.

✓ For full credit on (a): name the bias AND explain the direction (over- or under-estimate). For (b): explicitly say larger samples don't fix bias. For (c): describe a probability sampling method with a complete sampling frame.