A toy model of how algorithms can treat groups differently even when the math “looks neutral.” Adjust thresholds and base rates for two groups and watch common fairness metrics shift.
Predict who will repay a loan. Explore how different approval thresholds change denial rates by group.
Rank job applicants and decide who gets interviews. See how small shifts change who gets screened out.
Flag “high-risk” patients for extra care. Examine who is over- or under-flagged across groups.
Each group has a different base rate (how often the true outcome is present in that population) and a decision threshold (how strict the algorithm is about saying “yes”). In real systems, these differences can come from data quality, historical bias, or explicit policy choices.
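The simulation behind the sliders can be sketched in a few lines. This is a minimal toy model, not the page's actual code: the score formula, function names, and the example base rates and thresholds below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_group(n, base_rate, threshold):
    """Toy model: each person's true outcome is drawn at the group's base
    rate, and the algorithm thresholds a noisy score correlated with it.
    (Score model and parameter values are illustrative assumptions.)"""
    outcome = rng.random(n) < base_rate           # true label, present at base_rate
    score = outcome * 0.3 + rng.random(n) * 0.7   # noisy signal: higher when outcome is true
    predicted = score >= threshold                # "yes" only if the score clears the bar
    return outcome, predicted

# Example: Group A has a higher base rate; Group B faces a looser threshold.
a_true, a_pred = simulate_group(10_000, base_rate=0.6, threshold=0.5)
b_true, b_pred = simulate_group(10_000, base_rate=0.4, threshold=0.45)
```

Changing `base_rate` or `threshold` for either group shifts all of the metrics described below, which is the point of the demo.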
We simulate 10,000 people per group and approximate fairness metrics from that toy data.
In this scenario, a positive prediction means “approved for a loan.” Ideally, we want equal true positive rates and false positive rates across groups, but that rarely happens automatically.
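The two error rates in question are straightforward conditional frequencies. A small helper, under the assumption that labels and predictions are boolean arrays (the function name is ours, not the page's):

```python
import numpy as np

def tpr_fpr(true, pred):
    """TPR = P(pred positive | truly positive); FPR = P(pred positive | truly negative)."""
    true = np.asarray(true, dtype=bool)
    pred = np.asarray(pred, dtype=bool)
    tpr = pred[true].mean()    # among people who would repay, how many get approved
    fpr = pred[~true].mean()   # among people who would not repay, how many get approved
    return tpr, fpr
```

Equalizing TPR across groups is often called "equal opportunity"; equalizing both rates is "equalized odds."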
Positive prediction rate: share of each group getting a positive prediction.
TPR gap: difference in true positive rates (TPR) between groups.
FPR gap: difference in false positive rates (FPR) between groups.
Error burden: which group shoulders more "unfair" errors (false negatives or false positives), given this scenario.
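The metrics above can be approximated from simulated labels and predictions. A sketch, assuming boolean arrays per group; the function name, dictionary keys, and the simple FN + FP count used for "error burden" are our assumptions, not the page's exact definitions:

```python
import numpy as np

def fairness_report(true_a, pred_a, true_b, pred_b):
    """Approximate the dashboard metrics from toy data (definitions assumed)."""
    def rates(t, p):
        t, p = np.asarray(t, bool), np.asarray(p, bool)
        # positive prediction rate, TPR, FPR, and raw error count (FN + FP)
        errors = int((t & ~p).sum() + (~t & p).sum())
        return p.mean(), p[t].mean(), p[~t].mean(), errors
    pa, tpra, fpra, err_a = rates(true_a, pred_a)
    pb, tprb, fprb, err_b = rates(true_b, pred_b)
    return {
        "positive_rate": (pa, pb),       # demographic-parity view
        "tpr_gap": tpra - tprb,          # equal-opportunity view
        "fpr_gap": fpra - fprb,
        "more_errors": "A" if err_a > err_b else "B",  # crude error-burden proxy
    }
```

With unequal base rates, you can verify that driving one gap to zero generally reopens another, which is the tension the simulator is built to show.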