Problem #102 MEDIUM

The Simpson's Paradox Election

Paradox Probability Statistics Logic

Problem Statement

In a city election, Party A wins 60% of the votes in the North district (600 of 1,000) and 90% in the South district (90 of 100). Party B wins 40% in the North (400 of 1,000) and 10% in the South (10 of 100). Party A wins BOTH districts. Combining all votes, Party A has 690 of 1,100 votes (62.7%) and Party B has 410 of 1,100 (37.3%), so Party A wins overall too. Now swap the district sizes while keeping each district's percentages: the North has 100 votes and the South has 1,000. Recalculate. Who wins each district? Who wins overall? Does the overall result flip?

Answer & Quick Explanation


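Simpson's Paradox: an option (a treatment, a hospital, a policy) can have the higher success rate in every subgroup yet the lower rate in the combined total, when the options being compared are spread across the subgroups in very different proportions. In this election the flip cannot happen. Both parties are counted in the same districts, so leading in every district means leading in the total: after the swap, Party A wins the North (60 to 40), the South (900 to 100), and the city overall (960 to 140, or 87.3%). The paradox is not a trick: it is a genuine mathematical phenomenon that has caused real errors in medical, legal, and social science research. Always examine subgroup data before trusting aggregate statistics.

WOW Moment:
- Hospital A treats 90% of easy cases successfully and 40% of hard cases. Hospital B treats 80% of easy cases and 30% of hard cases. Hospital A is better at BOTH types of cases.
- But if Hospital B mostly treats easy cases and Hospital A mostly treats hard cases (for example, 96% of B's patients are easy cases versus 30% of A's), the overall success rates are Hospital A: 55%, Hospital B: 78%. Hospital B LOOKS better in the data. It is not.
- This is not hypothetical. The same reversal has misled comparisons of medical treatments, analyses of gender gaps in admissions and pay, and public health statistics. The textbook case is the 1973 UC Berkeley graduate admissions data: men were admitted at a higher overall rate, yet most individual departments admitted women at a similar or higher rate, because women applied more often to the most competitive departments.
- A statistic can be perfectly correct and perfectly misleading at the same time. Always ask: what is the confounding variable? How is each option's data split across the subgroups?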
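The hospital figures can be checked with a few lines of Python. The per-1,000-patient case mix below is an assumption, back-solved so the pooled rates match the quoted 55% and 78%; the text states only those overall rates, not the patient counts. `RATES`, `CASES`, and `overall_rate` are illustrative names.

```python
from fractions import Fraction

# Per-case-type success rates from the text (Hospital A is better at both).
RATES = {
    "A": {"easy": Fraction(90, 100), "hard": Fraction(40, 100)},
    "B": {"easy": Fraction(80, 100), "hard": Fraction(30, 100)},
}

# ASSUMED case mix per 1,000 patients, back-solved from the quoted
# overall rates; the original text does not give these counts.
CASES = {
    "A": {"easy": 300, "hard": 700},
    "B": {"easy": 960, "hard": 40},
}


def overall_rate(hospital: str) -> Fraction:
    """Total successes divided by total patients, pooled over case types."""
    successes = sum(RATES[hospital][kind] * n for kind, n in CASES[hospital].items())
    patients = sum(CASES[hospital].values())
    return successes / patients


for h in ("A", "B"):
    print(h, f"{float(overall_rate(h)):.0%}")
# A 55%
# B 78%
```

Changing only `CASES` moves the pooled rates while the per-type rates, and Hospital A's lead in each type, stay fixed; that is the whole mechanism of the paradox.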

Detailed Editorial Solution


Simpson's Paradox is a striking phenomenon in probability and statistics where a trend appears in several groups of data but disappears or reverses when the groups are combined. It occurs when a confounding variable is ignored in the aggregate data: here, which subgroup each case falls into, together with very unequal numbers of cases per subgroup for the options being compared.

Start with the swapped election. Keeping each district's percentages, the votes are:

District 1 (North):
- Total votes = 100.
- Party A wins 60% of the votes: 60 votes.
- Party B wins 40% of the votes: 40 votes.
- Winner: Party A (60% > 40%).

District 2 (South):
- Total votes = 1,000.
- Party A wins 90% of the votes: 900 votes.
- Party B wins 10% of the votes: 100 votes.
- Winner: Party A (90% > 10%).

Combining the districts:
- Total votes for Party A = 60 + 900 = 960 votes.
- Total votes for Party B = 40 + 100 = 140 votes.
- Party A wins overall with 960/1,100 = 87.3%.

There is no reversal, and no choice of district sizes can produce one. Both parties are counted in the same districts, so a higher percentage in a district means more votes in that district, and adding the districts together preserves the lead: if A beats B in the North and A beats B in the South, A's total exceeds B's total. Shrinking the South to 10 votes (A 9, B 1) with the North at 1,000 (A 600, B 400), for example, still gives Party A 609 votes to Party B's 401.

A genuine reversal needs the two options to be measured on different populations with different mixes of easy and hard cases. Compare two treatments, A and B:
- Group 1 (high-success cases):
  - Treatment A: 9/10 successful = 90%.
  - Treatment B: 80/100 successful = 80%.
  - Treatment A is better (90% > 80%).
- Group 2 (low-success cases):
  - Treatment A: 30/100 successful = 30%.
  - Treatment B: 5/20 successful = 25%.
  - Treatment A is better (30% > 25%).
Now combine the groups:
- Treatment A total: 9 + 30 = 39 successes out of 110 trials = 35.5%.
- Treatment B total: 80 + 5 = 85 successes out of 120 trials = 70.8%.
- Treatment B has a much higher overall success rate (70.8% > 35.5%), even though Treatment A was superior in both subgroups!

To explain the WOW part: this happens because Treatment B's trials are concentrated in the high-success group (100 of its 120 trials are in Group 1), while Treatment A's trials are concentrated in the low-success group (100 of its 110 trials are in Group 2). Each aggregate rate is a weighted average of the subgroup rates, with each subgroup weighted by its share of that treatment's trials:

Success_A = (10/110) * 0.90 + (100/110) * 0.30 = 39/110 ≈ 0.355
Success_B = (100/120) * 0.80 + (20/120) * 0.25 = 85/120 ≈ 0.708

The weights (how each treatment's trials are distributed across the groups) skew the aggregate result, making the inferior treatment look twice as good. This paradox highlights why looking at aggregate data without controlling for confounding variables can lead to disastrously incorrect conclusions.
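A short Python sketch contrasts the swapped election with the treatment example, using exact fractions so no rounding creeps in. The data structures and helpers (`pooled_rate`, `weighted_rate`, `two_party_reversal_exists`) are illustrative names, not part of any library. The brute-force search only covers small vote counts, so it is a sanity check rather than a proof; the proof is simply that per-district vote leads add up.

```python
from fractions import Fraction
from itertools import product

# Swapped election: district -> (votes for Party A, votes for Party B).
ELECTION = {"North": (60, 40), "South": (900, 100)}

# Treatment example: treatment -> group -> (successes, trials).
TREATMENT = {
    "A": {"Group 1": (9, 10), "Group 2": (30, 100)},
    "B": {"Group 1": (80, 100), "Group 2": (5, 20)},
}


def pooled_rate(groups):
    """Total successes divided by total trials across all groups."""
    successes = sum(s for s, _ in groups.values())
    trials = sum(n for _, n in groups.values())
    return Fraction(successes, trials)


def weighted_rate(groups):
    """The same pooled rate, written as a weighted average of per-group
    rates, each weighted by that group's share of the trials."""
    trials = sum(n for _, n in groups.values())
    return sum(Fraction(n, trials) * Fraction(s, n) for s, n in groups.values())


def two_party_reversal_exists(max_votes):
    """Exhaustively search two districts (0..max_votes votes per party)
    for a race where A leads B in both districts but not overall."""
    for a1, b1, a2, b2 in product(range(max_votes + 1), repeat=4):
        if a1 > b1 and a2 > b2 and a1 + a2 <= b1 + b2:
            return True
    return False


# Election: Party A leads in every district and overall.
a_total = sum(a for a, _ in ELECTION.values())
b_total = sum(b for _, b in ELECTION.values())
print(a_total, b_total, f"{a_total / (a_total + b_total):.1%}")  # 960 140 87.3%
print(two_party_reversal_exists(15))  # False

# Treatment: A wins each group, yet B wins the pooled comparison.
for group in ("Group 1", "Group 2"):
    s_a, n_a = TREATMENT["A"][group]
    s_b, n_b = TREATMENT["B"][group]
    print(group, Fraction(s_a, n_a) > Fraction(s_b, n_b))  # True
for name, groups in TREATMENT.items():
    print(name, f"{float(pooled_rate(groups)):.1%}")  # A 35.5%, then B 70.8%
```

`weighted_rate` always equals `pooled_rate`; writing it both ways makes explicit that the aggregate is a weighted average whose weights come from each treatment's own allocation of trials.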