
Unit 3: Collecting Data

Sampling Methods · Sources of Bias · Experimental Design · Randomization · Drawing Conclusions

📊 12–15% of Exam ⏱ ~2–3 weeks

Observational Study vs. Experiment

Before collecting any data, you must decide how to collect it. This decision determines what conclusions you can validly draw, and it is one of the most heavily tested ideas in all of AP Statistics.

The Big Picture: Types of Studies
How is data collected? This determines your conclusions.

  • Observational Study: The researcher observes and does NOT assign treatments. ❌ Cannot establish causation. Association only; lurking variables are possible.
  • Experiment: The researcher deliberately assigns treatments. ✅ Can establish causation if properly randomized and controlled.

🔑 The Golden Rule

Only a well-designed, randomized experiment can establish a cause-and-effect relationship.

An observational study can show association, but never causation — no matter how strong or how large the study is.

📌 Example: Which is it?

Scenario A: Researchers track 10,000 coffee drinkers and non-drinkers for 20 years and compare their heart disease rates. → Observational study. No one was assigned to drink coffee.

Scenario B: Researchers randomly assign 200 patients to take either a new drug or a placebo and measure blood pressure after 3 months. → Experiment. The researcher assigned the treatments.

Census vs Sample

Term | Definition | Practical Reality
Population | The entire group of interest | All U.S. adults, all fish in a lake
Census | Collecting data from every member of the population | Very expensive, often impossible
Sample | A subset of the population selected for study | Practical and usually sufficient
Parameter | A number describing a population (e.g., μ, p) | Usually unknown; what we estimate
Statistic | A number describing a sample (e.g., x̄, p̂) | Calculated from data we collect
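
To make the parameter vs. statistic distinction concrete, here is a minimal Python sketch. The population (5,000 fish weights drawn from a normal model), the sample size of 50, and all variable names are assumptions made up for illustration; in a real study the parameter would be unknown.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical population: weights (in grams) of all 5,000 fish in a lake.
population = [random.gauss(500, 80) for _ in range(5000)]

# Parameter: describes the whole population (normally unknown without a census).
mu = statistics.mean(population)

# Statistic: computed from a random sample of 50 fish, used to estimate mu.
sample = random.sample(population, 50)
x_bar = statistics.mean(sample)

print(f"parameter mu    = {mu:.1f}")
print(f"statistic x_bar = {x_bar:.1f}")
```

The two printed values will usually be close but not equal: the statistic varies from sample to sample, while the parameter is a single fixed number.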

Sampling Methods

The goal of sampling is to select a sample that represents the population without bias. Different methods have different strengths and weaknesses.

Five Major Sampling Methods
  • Simple Random Sample (SRS): Every individual, and every possible group of n individuals, is equally likely to be chosen. Use: a random number table or random number generator.
  • Stratified Random: Divide the population into groups (strata), then take an SRS from each stratum. Reduces variability.
  • Cluster Sampling: Divide the population into clusters, then randomly select entire clusters. Cost-effective.
  • Systematic Sampling: Pick a random start, then take every kth individual (every 10th customer, every 3rd house). ⚠ Beware periodic patterns in the population.
  • Voluntary Response (non-probability, AVOID): 📢 People choose to respond, so strong feelings dominate the responses. Always biased; never use for inference.

💡 Stratified vs Cluster — The Key Difference

Stratified: You sample from every group. Groups are homogeneous within, different across.

Cluster: You sample entire groups, and select only some groups. Groups look like mini-versions of the whole population. Think of school classrooms — each classroom is a "cluster" that represents the school.

Method | How | Best When | Weakness
SRS | Random draw; everyone equally likely | Small, accessible populations | May miss subgroups
Stratified | SRS within each subgroup (stratum) | Population has distinct subgroups | Must know strata in advance
Cluster | Randomly select whole groups | Population naturally in clusters | Higher variability than stratified
Systematic | Every kth after random start | Long lists, assembly lines | Periodic patterns cause bias
Voluntary Response | People self-select | Never; always biased | Over-represents strong opinions
Convenience | Whoever is easiest to reach | Never; always biased | Rarely represents population
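
The four probability methods can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sampling frame of 120 students, the `grade` and `homeroom` fields (with mixed-grade homerooms), and all function names are hypothetical.

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical frame: 120 students, 30 per grade (9-12), spread over
# 6 mixed-grade homerooms of 20 students each.
students = [{"id": i, "grade": 9 + i // 30, "homeroom": i % 6} for i in range(120)]

def simple_random_sample(frame, n):
    """SRS: every possible group of n individuals is equally likely."""
    return random.sample(frame, n)

def stratified_sample(frame, key, n_per_stratum):
    """Stratified: take an SRS from EVERY stratum."""
    strata = {}
    for unit in frame:
        strata.setdefault(unit[key], []).append(unit)
    chosen = []
    for members in strata.values():
        chosen.extend(random.sample(members, n_per_stratum))
    return chosen

def cluster_sample(frame, key, n_clusters):
    """Cluster: randomly pick SOME groups, then take everyone in them."""
    labels = sorted({unit[key] for unit in frame})
    picked = set(random.sample(labels, n_clusters))
    return [unit for unit in frame if unit[key] in picked]

def systematic_sample(frame, k):
    """Systematic: random start among the first k, then every k-th unit."""
    start = random.randrange(k)
    return frame[start::k]

print(len(simple_random_sample(students, 12)))        # 12 students
print(len(stratified_sample(students, "grade", 3)))   # 3 from each of 4 grades
print(len(cluster_sample(students, "homeroom", 2)))   # 2 whole homerooms
print(len(systematic_sample(students, 10)))           # every 10th student
```

Notice where the randomness sits: the stratified function samples inside every grade, while the cluster function randomly chooses homerooms and then keeps every student in them.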

Sources of Bias in Sampling

Bias means the sampling method systematically favors certain outcomes. A biased sample produces results that consistently over- or under-estimate the true population value. Bias cannot be fixed by increasing sample size.

⚠️ Critical AP Fact

Larger samples do NOT fix bias. A biased method with 1,000,000 people is still biased. Only better sampling design removes bias. This is a classic AP trap question.
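
A small simulation can make this trap visible. The sketch below is built on invented numbers: a hypothetical population of 100,000 adults in which 30% own a landline, landline owners support a policy at a rate of 60%, and everyone else at 30%. Sampling only from the landline frame stays off target no matter how large n gets.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: (has_landline, supports_policy) for each adult.
population = []
for _ in range(100_000):
    has_landline = random.random() < 0.30
    supports = random.random() < (0.60 if has_landline else 0.30)
    population.append((has_landline, supports))

# True population proportion (the parameter we want to estimate).
true_p = sum(supports for _, supports in population) / len(population)

# Biased sampling frame: only adults with a landline can be reached.
landline_frame = [person for person in population if person[0]]

def p_hat(frame, n):
    """Sample proportion of supporters from an SRS of size n taken from frame."""
    sample = random.sample(frame, n)
    return sum(supports for _, supports in sample) / n

print("true p:", round(true_p, 3))
for n in (100, 1_000, 10_000):
    biased = p_hat(landline_frame, n)   # SRS from the biased frame
    fair = p_hat(population, n)         # SRS from the full population
    print(f"n={n:>6}  landline-only: {biased:.3f}  full population: {fair:.3f}")
```

As n grows, the landline-only estimate settles down, but it settles near the landline owners' rate rather than the population's. More data makes the wrong answer more precise, not more correct.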

Type of Bias | Definition | Example
Undercoverage | Some members of the population are systematically excluded from the sample | Phone survey excludes people without phones; online survey excludes those without internet
Voluntary Response Bias | People with strong opinions are more likely to respond | Online poll about a controversial topic, where only passionate people bother to answer
Nonresponse Bias | Selected individuals don't respond, and non-responders differ from responders | Mailed survey: busy people don't respond, retired people do
Response Bias | Respondents give inaccurate answers due to question wording, interviewer presence, or social desirability | "Do you recycle as often as you should?" over-reports yes; asking about illegal behavior face-to-face
Question Wording Bias | Leading or loaded questions push respondents toward certain answers | "Do you support wasteful government spending?" vs "Do you support government investment in infrastructure?"

📌 Example: Identifying Bias

A magazine asks readers to mail in responses to a survey about whether they enjoy the magazine.

Bias type: Voluntary response bias.

Only readers who feel strongly (usually those who love or hate it) will bother mailing back a response. People who are indifferent will not respond. The sample will not represent the typical reader's opinion.

Experimental Design

An experiment imposes a treatment on subjects to observe the response. The key vocabulary is tested heavily on the AP exam.

Term | Definition | Example
Experimental Unit | The individual on which the experiment is performed | A patient, a plant, a car
Subject | Experimental unit that is a person | A student, a patient
Treatment | The specific condition applied to experimental units | Drug A, Drug B, placebo
Factor | An explanatory variable in the experiment | Type of fertilizer, dosage level
Level | The specific values of a factor | Low dose, medium dose, high dose
Response Variable | The outcome measured after treatment | Blood pressure, plant height
Control Group | Group receiving no treatment (or placebo) | Patients given a sugar pill
Placebo | An inactive treatment that looks like the real one | Sugar pill identical in appearance to the drug
Confounding Variable | A variable associated with both the explanatory and response variable that distorts results | Healthier people both exercise more AND eat better, so the effect of exercise is hard to isolate

Structure of a Randomized Controlled Experiment
Available subjects (n = 200) → RANDOM ASSIGNMENT → Group 1: Treatment A (n = 100) and Group 2: Control / Placebo (n = 100) → Measure response variable → Compare results

The Three Principles of Good Experiments

📐 The Three Principles: RCR
  • R (Randomization): Randomly assign subjects to treatment groups so that potential confounding variables are balanced across the groups, on average.
  • C (Control): Keep all other variables the same across treatment groups.
  • R (Replication): Use enough subjects so that results are reliable and not due to chance.
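
Random assignment for a completely randomized design is easy to carry out in software. The following is a minimal sketch, assuming 200 subjects labeled 1 to 200 and two equal groups; the variable names are illustrative only.

```python
import random

random.seed(7)  # fixed seed so the sketch is reproducible

subjects = list(range(1, 201))   # subject ID numbers 1..200

shuffled = subjects[:]           # copy so the original list is untouched
random.shuffle(shuffled)         # chance alone decides the order

treatment_group = sorted(shuffled[:100])   # first 100 after shuffling
control_group = sorted(shuffled[100:])     # remaining 100 get the placebo

print("treatment:", treatment_group[:5], "...")
print("control:  ", control_group[:5], "...")
```

Because the shuffle ignores everything about the subjects, no characteristic (age, health, motivation) can systematically end up in one group. With 100 subjects per group, chance imbalances also tend to be small, which is the replication principle at work.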

Blinding

Type | Who Doesn't Know the Treatment | Purpose
Single-blind | The subjects (patients) don't know whether they got the drug or the placebo | Keeps the placebo effect equal across groups, so it cannot explain a difference between them
Double-blind | Neither the subjects NOR the evaluators know who got which treatment | Guards against both the placebo effect and evaluator bias; the gold standard

💡 Why Double-Blind?

If the doctor who measures "improvement" knows which patients got the real drug, they might unconsciously rate those patients higher. Double-blinding removes this bias from both ends — the patient and the evaluator.

Blocked Designs

A block is a group of experimental units that are similar in some way that might affect the response. By blocking, we control for known sources of variability.

🔑 Block vs Stratify

Stratified sampling (Unit 3 sampling) — groups used in selecting who to include in the study.

Blocking (experimental design) — groups used in assigning treatments within an experiment. Same idea, different context. Block on variables that might affect your results (sex, age, health status).

📌 Example: Blocked Design

A researcher tests whether a new fertilizer increases crop yield. She suspects soil type (clay vs sandy) matters.

Block by soil type: Within each soil type block, randomly assign plots to fertilizer vs no fertilizer.

This way, the comparison of fertilizer vs control is fair within each soil type — soil type cannot confound the results.
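
A randomized block assignment for this example might look like the sketch below. The plot counts (12 clay, 8 sandy) and the function name `randomized_block_assignment` are assumptions made for illustration.

```python
import random

random.seed(3)  # fixed seed so the sketch is reproducible

# Hypothetical field: 12 clay plots and 8 sandy plots, each tagged (soil, number).
plots = [("clay", i) for i in range(12)] + [("sandy", i) for i in range(8)]

def randomized_block_assignment(units, block_of):
    """Group units into blocks, then randomize treatments WITHIN each block."""
    blocks = {}
    for unit in units:
        blocks.setdefault(block_of(unit), []).append(unit)

    assignment = {}
    for members in blocks.values():
        order = members[:]
        random.shuffle(order)            # separate randomization per block
        half = len(order) // 2
        for unit in order[:half]:
            assignment[unit] = "fertilizer"
        for unit in order[half:]:
            assignment[unit] = "no fertilizer"
    return assignment

assignment = randomized_block_assignment(plots, block_of=lambda plot: plot[0])

for soil in ("clay", "sandy"):
    treated = sum(1 for (s, _), t in assignment.items()
                  if s == soil and t == "fertilizer")
    print(soil, "plots with fertilizer:", treated)
```

Each soil type ends up split evenly between fertilizer and control, so the two treatment groups contain the same mix of clay and sandy plots by design rather than by luck.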

Matched Pairs Design

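A matched pairs design is a special case of blocking in which each "block" has exactly 2 units (or is the same person measured twice). When two units are paired, chance decides which one gets the treatment. Common designs: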

Design Type | How It Works | Example
Before/After | Same person measured before and after treatment | Measure blood pressure before and after giving a drug to each patient
Paired individuals | Two very similar people paired; one gets treatment, one gets control | Pairs of identical twins: one twin gets the new curriculum, the other gets the old
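
For the twin example, a matched pairs assignment and the usual within-pair comparison can be sketched as follows. The twin labels and test scores are made up purely for illustration, and the coin flip is simulated with Python's random module.

```python
import random
import statistics

random.seed(11)  # fixed seed so the sketch is reproducible

# Hypothetical twin pairs, identified by label only.
pairs = [("A1", "A2"), ("B1", "B2"), ("C1", "C2"), ("D1", "D2")]

assignment = []
for first, second in pairs:
    # A coin flip within each pair decides who gets the new curriculum.
    if random.random() < 0.5:
        assignment.append({"new": first, "old": second})
    else:
        assignment.append({"new": second, "old": first})

# Made-up end-of-year scores, keyed by label (illustration only).
scores = {"A1": 82, "A2": 78, "B1": 90, "B2": 91,
          "C1": 75, "C2": 70, "D1": 88, "D2": 84}

# Analyze one difference per pair: new curriculum minus old curriculum.
differences = [scores[p["new"]] - scores[p["old"]] for p in assignment]
mean_difference = statistics.mean(differences)

print("differences:", differences)
print("mean difference:", mean_difference)
```

The key design choice is that the data are reduced to one difference per pair. Comparing within pairs removes the variation between families, which is exactly what blocking is meant to do.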

Drawing Conclusions — Scope of Inference

The scope of inference — what conclusions you can draw — depends on two things: (1) was there random selection? and (2) was there random assignment?

Scope of Inference: The 2×2 Framework
Random Selection? | Random Assignment: YES | Random Assignment: NO
YES | ✅ BEST. Cause & effect. Generalize to population. Ideal experiment. | Generalize to population. ❌ No causation. Observational study with random sample.
NO | ✅ Cause & effect. ❌ Only for subjects. Experiment with convenience sample. | ❌ No causation. ❌ No generalization. Weakest design.

💡 AP Exam Language

"Can we generalize to the population?" → Only if random selection (sampling) was used.

"Can we conclude causation?" → Only if random assignment (experiment) was used.

These are two completely separate questions. An experiment with a convenience sample can show causation but only for those subjects — you can't generalize the results to all people.
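
One way to remember that the two questions are independent is to write the rule as a tiny function. This is a study aid only; the function name `scope_of_inference` and the dictionary keys are hypothetical.

```python
def scope_of_inference(random_selection: bool, random_assignment: bool) -> dict:
    """Map the two design questions to the two kinds of conclusions."""
    return {
        # Generalizing depends ONLY on how subjects were selected.
        "generalize_to_population": random_selection,
        # Causation depends ONLY on how treatments were assigned.
        "conclude_causation": random_assignment,
    }

for selection in (True, False):
    for assignment in (True, False):
        result = scope_of_inference(selection, assignment)
        print(f"random selection={selection}, random assignment={assignment} -> {result}")
```

Each output depends on exactly one input, which mirrors the 2×2 framework: changing how subjects were selected never changes whether causation can be concluded, and vice versa.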


Multiple Choice Questions

Try each question before reading the answer.

MCQ · Q1 Observational vs Experiment

A researcher wants to determine whether listening to classical music improves concentration. She surveys 500 college students, asking how often they listen to classical music and their GPA. She finds that students who listen more frequently have higher GPAs. Which of the following is the most appropriate conclusion?

  • A Listening to classical music causes higher GPA.
  • B Higher GPA causes students to listen to more classical music.
  • C There is an association between classical music listening and GPA, but causation cannot be established.
  • D The study proves that classical music is beneficial for all students.
  • E No conclusions can be drawn because the sample size is too small.
✓ Correct Answer: C

This is an observational study — the researcher did not assign students to listen to classical music. She merely observed their habits. No matter how strong the association, an observational study cannot establish causation. There may be lurking variables: students who study more might both listen to more classical music and get better grades.

MCQ · Q2 Sampling Methods

A school district wants to survey students about cafeteria food quality. The district has 12 schools. Officials randomly select 3 schools, then survey every student in those 3 schools. What type of sampling method is this?

  • A Simple random sample
  • B Stratified random sample
  • C Systematic sample
  • D Cluster sample
  • E Voluntary response sample
✓ Correct Answer: D — Cluster Sample

The schools are the clusters. Three clusters were randomly selected, and then all members of those clusters were surveyed. This is the defining feature of cluster sampling — entire groups are selected. In stratified sampling, you would randomly select some students from each of the 12 schools.

MCQ · Q3 Bias

A polling company calls randomly selected landline phone numbers to survey adults about their opinions on a new tax policy. Which of the following is the most significant source of bias in this survey?

  • A Response bias, because people will lie about their opinions
  • B Undercoverage, because adults without landline phones are excluded
  • C Voluntary response bias, because people choose whether to answer
  • D Nonresponse bias, because the sample size is too small
  • E There is no bias because phone numbers are randomly selected
✓ Correct Answer: B — Undercoverage

Calling only landline numbers systematically excludes adults who only use cell phones — a large portion of the population, especially younger adults. This is undercoverage bias. The sampling frame (landline numbers) does not match the target population (all adults). Note that random selection within the frame doesn't fix the undercoverage of those outside the frame.

MCQ · Q4 Experimental Design

In a clinical trial, neither the patients nor the doctors measuring their outcomes know which patients received the new drug and which received the placebo. This design feature is called:

  • A Blocking
  • B Stratification
  • C Single-blind
  • D Double-blind
  • E Matched pairs
✓ Correct Answer: D — Double-blind

When both the subjects and the evaluators are unaware of the treatment assignment, the study is double-blind. This is the gold standard for clinical trials. Single-blind means only the subjects don't know. Double-blinding prevents both the placebo effect (from subjects) and measurement bias (from evaluators).

MCQ · Q5 Scope of Inference

Researchers randomly select 400 adults from a city's voter registration list and randomly assign each to one of two exercise programs. After 8 weeks, the group assigned to Program A shows significantly greater improvement in cardiovascular health. Which conclusion is best supported?

  • A Program A causes greater cardiovascular improvement in all adults everywhere.
  • B Program A is associated with greater improvement, but causation cannot be determined.
  • C Program A causes greater cardiovascular improvement; results can be generalized to registered voters in the city.
  • D Program A causes greater improvement only for the 400 adults in the study.
  • E No conclusion is possible because the study was not blinded.
✓ Correct Answer: C

There was random assignment (experiment) → we can conclude causation.
There was random selection from voter registration list → we can generalize to registered voters in the city.
However, we cannot generalize to "all adults everywhere" because the sample was only from one city's voter rolls — not all adults. (A) overgeneralizes. (B) wrongly denies causation. (D) ignores the random selection.

Free Response Questions

Write your full solution before reading the model solution. Unit 3 FRQs often ask you to design a study or identify flaws in one.

FRQ 1 — Design an Experiment

~15 minutes
A nutritionist wants to determine whether drinking green tea daily reduces fasting blood sugar levels in adults with pre-diabetes. She has 60 adult volunteers with pre-diabetes available for the study. The study will last 12 weeks.
(a) Describe how you would design a completely randomized experiment to investigate this question. Include how you would assign subjects to treatments.
(b) Explain why a control group receiving a placebo (a non-green-tea beverage) is necessary in this study.
(c) The nutritionist is concerned that age might affect blood sugar levels. Describe how you would modify your design to account for this.
(d) Suppose the results show that the green tea group had significantly lower blood sugar after 12 weeks. Can you conclude that green tea causes lower blood sugar in all pre-diabetic adults? Explain.
✓ Model Solution

(a) Experimental Design:

Number the 60 volunteers 1–60. Use a random number generator (or random digit table) to randomly assign 30 volunteers to the treatment group (daily green tea) and the remaining 30 to the control group (non-green-tea beverage). Measure each participant's fasting blood sugar at the start and after 12 weeks. Compare the change in blood sugar between the two groups.


(b) Why a control group is necessary:

Without a control group, we cannot know if any change in blood sugar was due to the green tea itself or to other factors such as the placebo effect (participants believing they are being treated may change behavior), dietary changes during the study, natural changes in blood sugar over 12 weeks, or increased attention from researchers. The placebo control isolates the effect of green tea specifically.


(c) Blocking for age:

Divide the 60 volunteers into age blocks (e.g., under 50 and 50+). Within each age block, randomly assign half to the green tea group and half to the control group. This is a randomized block design. By blocking on age, we ensure both groups are balanced with respect to age, preventing age from confounding the results.


(d) Scope of inference:

Not for all pre-diabetic adults. Because random assignment was used, we can conclude causation: the green tea caused the reduction in blood sugar for the subjects in this study. However, since the 60 participants were volunteers rather than a random sample of all pre-diabetic adults, we cannot generalize these results to all pre-diabetic adults. The conclusion is limited to people similar to those in the study.

✓ AP grading tips: (a) must mention randomization method and comparison. (b) must mention placebo effect. (c) must say "block" and explain the grouping. (d) must distinguish causation (yes — random assignment) from generalization (no — volunteers, not random sample).

FRQ 2 — Identifying Flaws in a Study

~10 minutes
A city wants to estimate the proportion of residents who support a new public transit expansion. A reporter stands outside a downtown subway station during the morning rush hour and asks commuters whether they support the expansion. Of the 120 commuters surveyed, 94% said yes.
(a) Identify two sources of bias in this survey. For each, explain how it would affect the estimate.
(b) A city official argues that since 120 people is a large enough sample, the results are reliable. Explain why this argument is flawed.
(c) Describe a better sampling method to estimate the true proportion of city residents who support the expansion.
✓ Model Solution

(a) Two sources of bias:

1. Undercoverage: The survey only reaches people who commute through a downtown subway station. Residents who drive, live in suburbs, or don't use this station are completely excluded. These excluded residents may have very different opinions about transit expansion. This would likely cause the 94% figure to be an overestimate of support among all city residents.

2. Convenience/Location Bias (or Undercoverage): Surveying only during morning rush hour misses residents who commute at different times or don't commute at all (e.g., retirees, work-from-home residents, night-shift workers). Transit commuters already use the system, so they are more likely to support its expansion — further inflating the estimate.


(b) Why sample size doesn't fix bias:

The official's argument is flawed because increasing sample size cannot correct for bias in the sampling method. If the method systematically over-samples transit supporters, then surveying 1,200 or even 12,000 people at the same location would still produce a biased estimate. A larger biased sample just gives you a more precise estimate of the wrong thing. Only a better sampling design can remove bias.


(c) Better sampling method:

Obtain a complete list of all city residents (e.g., from voter registration, utility billing records, or address database). Use a simple random sample to select residents, then contact them by phone, mail, or in person. To reduce nonresponse bias, make multiple attempts to contact each selected resident. This gives every resident an equal chance of being included, reducing undercoverage and location bias.

✓ For full credit on (a): name the bias AND explain the direction (over- or under-estimate). For (b): explicitly say larger samples don't fix bias. For (c): describe a probability sampling method with a complete sampling frame.
