Method 9 Academy

AI Governance Case Studies

The following case studies examine real and illustrative AI governance failures across industries. Each case identifies the specific governance gap that created risk and articulates what effective governance would have looked like.

CRIMINAL JUSTICE

COMPAS Recidivism Scoring

Bias and Opacity in High-Stakes Decisions Became a National Case Study

ProPublica’s investigation into COMPAS — a recidivism risk-scoring tool used in Florida courts — ignited a national debate about racial bias, opacity, and the appropriate role of AI in consequential government decisions.

Algorithmic Bias · Transparency · High-Risk AI · Public Sector

What Happened

ProPublica analyzed COMPAS scores in Broward County and found that Black defendants were disproportionately labeled as higher risk among those who did not go on to reoffend, while white defendants who did reoffend were more often labeled lower risk. Northpointe (the tool’s developer) disputed the methodology and framing. The academic debate over which fairness metric applies continues, but the case permanently elevated questions about how risk scores enter sentencing, bail, and parole decisions.

Governance Gap

High-stakes AI systems were deployed without adequate transparency for affected individuals, without public documentation of how fairness was defined and measured, and without clear constraints on how scores could or should influence judicial decisions. There was no meaningful mechanism for defendants to understand or challenge their scores, and no ongoing monitoring for disparate impact after deployment.
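Post-deployment monitoring for disparate impact is concrete and testable. Below is a minimal sketch in Python, using hypothetical score and outcome fields, of one such check: the false positive rate by group, the metric at the center of the ProPublica analysis. A real program would track several fairness metrics side by side, since they cannot generally all be satisfied at once.

from collections import defaultdict

# Minimal sketch with hypothetical data: compare false positive rates across groups,
# i.e. how often people who did NOT reoffend were still labeled high risk.
def false_positive_rates(records, high_risk_threshold=7):
    """records: iterable of dicts with 'group', 'score' (1-10), 'reoffended' (bool)."""
    false_positives = defaultdict(int)   # labeled high risk but did not reoffend
    non_reoffenders = defaultdict(int)   # everyone who did not reoffend
    for r in records:
        if not r["reoffended"]:
            non_reoffenders[r["group"]] += 1
            if r["score"] >= high_risk_threshold:
                false_positives[r["group"]] += 1
    return {g: false_positives[g] / n for g, n in non_reoffenders.items() if n}

sample = [
    {"group": "A", "score": 8, "reoffended": False},
    {"group": "A", "score": 3, "reoffended": False},
    {"group": "B", "score": 9, "reoffended": False},
    {"group": "B", "score": 4, "reoffended": False},
]
print(false_positive_rates(sample))  # {'A': 0.5, 'B': 0.5} in this toy data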

HUMAN RESOURCES

Biased Automated Screening Tool

Historical Training Data Reproduced Demographic Bias at Scale

A hiring model trained on historical applicant data began systematically filtering out qualified candidates from certain demographic groups — not because of explicit criteria, but because the data it learned from encoded past biases.

Hiring Bias · Training Data · Bias Testing · Model Monitoring

What Happened

A company implemented an AI-driven resume screening tool to handle high application volumes. The model, trained on resumes of successful hires from the previous decade, learned to favor candidates whose profiles matched the existing workforce, which was predominantly male and drawn from a narrow range of backgrounds. It began downgrading resumes that mentioned women’s colleges or specific extracurriculars associated with underrepresented groups, even when those candidates were objectively qualified.

Governance Gap

The training data was not audited for historical bias before use. There was no pre-deployment testing for disparate impact across protected classes. The system lacked a ‘human-in-the-loop’ audit process to verify why qualified candidates were being rejected, and there was no clear owner accountable for the model’s ethical performance.
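As an illustration of what the missing pre-deployment test could look like, the Python sketch below applies the ‘four-fifths rule’ used in US employment analysis: flag the model if any group’s selection rate falls below 80% of the highest group’s rate. All data and field names are hypothetical.

from collections import Counter

def selection_rates(decisions):
    """decisions: list of (group, advanced_to_interview: bool) tuples."""
    totals, advanced = Counter(), Counter()
    for group, passed in decisions:
        totals[group] += 1
        advanced[group] += passed
    return {g: advanced[g] / totals[g] for g in totals}

def four_fifths_check(decisions, threshold=0.8):
    """Flag any group whose selection rate is below `threshold` of the best group's rate."""
    rates = selection_rates(decisions)
    best = max(rates.values())
    return {g: (rate, rate / best >= threshold) for g, rate in rates.items()}

decisions = ([("men", True)] * 60 + [("men", False)] * 40 +
             [("women", True)] * 35 + [("women", False)] * 65)
print(four_fifths_check(decisions))
# {'men': (0.6, True), 'women': (0.35, False)}
# 0.35 / 0.6 is about 0.58, below the 0.8 threshold: block deployment and investigate.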

AVIATION / CONSUMER

Air Canada Chatbot

Misinformation Created Legal Liability

Air Canada’s website chatbot gave a passenger incorrect guidance about bereavement fares, leading to a legal ruling that the airline was responsible for its AI’s statements — a landmark case for AI accountability in customer service.

Chatbot Liability · Customer-Facing AI · Negligent Misrepresentation

What Happened

A grieving passenger asked Air Canada’s chatbot about bereavement pricing and received inaccurate instructions. When the passenger acted on that information and was denied the discount, he filed a complaint. The BC Civil Resolution Tribunal ruled that Air Canada was liable for negligent misrepresentation, ordering compensation. The airline’s attempt to argue that the chatbot was a separate entity responsible for its own statements was rejected.

Governance Gap

The chatbot had no effective mechanism to ensure its answers reflected authoritative, current company policy. There was no human review layer for customer-facing AI responses, no audit trail for chatbot interactions, and no clear accountability structure for representations made by automated systems.

HEALTHCARE

Clinical Triage Automation

Proxy Variables Led to Disparate Health Outcomes

An algorithm designed to identify high-risk patients for care management programs inadvertently prioritized healthier white patients over sicker Black patients by using ‘past healthcare spending’ as a proxy for ‘health need.’

Healthcare Equity · Proxy Variables · Clinical Safety

What Happened

A large health system used an algorithm to automate triage for chronic disease management. The model used historical cost data as a predictor of future health needs. Because systemic barriers resulted in lower historical spending for Black patients regardless of health status, the model incorrectly concluded they were ‘healthier’ and required less intervention than white patients with similar conditions.

Governance Gap

The choice of ‘cost’ as a proxy for ‘need’ was not vetted for its potential to introduce systemic bias. Governance failed to include clinical and equity experts in the model design phase. There was no validation of the model’s outputs against actual clinical outcomes for different demographic groups before full-scale rollout.
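One shape the missing validation could have taken is sketched below in Python, with hypothetical field names: compare the model’s predicted risk against an independent clinical measure, such as a count of chronic conditions, for each demographic group before rollout. Groups with similar clinical burden but sharply different predicted risk point to a distorting proxy.

from statistics import mean

def validate_by_group(patients):
    """patients: dicts with 'group', 'predicted_risk' (0-1), 'chronic_conditions' (int)."""
    report = {}
    for group in {p["group"] for p in patients}:
        subset = [p for p in patients if p["group"] == group]
        report[group] = {
            "mean_predicted_risk": round(mean(p["predicted_risk"] for p in subset), 3),
            "mean_chronic_conditions": round(mean(p["chronic_conditions"] for p in subset), 2),
        }
    return report

patients = [
    {"group": "A", "predicted_risk": 0.70, "chronic_conditions": 4},
    {"group": "A", "predicted_risk": 0.60, "chronic_conditions": 4},
    {"group": "B", "predicted_risk": 0.35, "chronic_conditions": 4},
    {"group": "B", "predicted_risk": 0.30, "chronic_conditions": 5},
]
# Similar condition counts but very different risk scores: the cost proxy is skewing triage.
print(validate_by_group(patients))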

CYBERSECURITY / PRIVACY

Sensitive Data Exposure via Third-Party AI Tool

Unmanaged Use of Public LLMs Leaked Proprietary Source Code

Employees at a technology firm used a public generative AI tool to debug proprietary code and summarize internal strategy documents, inadvertently sending sensitive corporate data to a provider that could use it for model training.

Data Privacy · Shadow AI · Third-Party Risk

What Happened

Engineers seeking to optimize their workflow pasted snippets of sensitive, pre-release source code into a popular public LLM to identify bugs. Later, it was discovered that the tool’s provider could use this data for training, potentially exposing trade secrets to competitors or the public. This led to an immediate ban on public AI tools and a massive internal data audit.

Governance Gap

The organization lacked a clear policy on ‘Shadow AI’ and the use of third-party generative tools. There was no technical control (like a DLP or a private API gateway) to prevent sensitive data from leaving the corporate perimeter. Governance failed to provide employees with safe, approved alternatives for AI-assisted productivity.
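The technical control described above can start small. Below is a minimal Python sketch, with hypothetical patterns and policy, of a pre-submission filter that an approved AI gateway could apply before a prompt leaves the corporate perimeter; a production deployment would pair this with a commercial DLP product and an allow-list of approved providers.

import re

# Hypothetical block-list: patterns that suggest secrets or internal markings.
BLOCK_PATTERNS = [
    re.compile(r"-----BEGIN (RSA |EC )?PRIVATE KEY-----"),     # private key material
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),                        # AWS-style access key IDs
    re.compile(r"\b(confidential|internal use only)\b", re.I),  # document markings
    re.compile(r"\bproprietary\b", re.I),
]

def allow_prompt(prompt: str) -> tuple[bool, list[str]]:
    """Return (allowed, matched_patterns) for a prompt headed to an external AI tool."""
    hits = [p.pattern for p in BLOCK_PATTERNS if p.search(prompt)]
    return (len(hits) == 0, hits)

ok, hits = allow_prompt("Please debug this CONFIDENTIAL pre-release module.")
print(ok)  # False: the 'confidential' marking pattern was matched, so the prompt is blocked.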

LEGAL / INTELLECTUAL PROPERTY

Generative Content IP Risk

Marketing Campaign Stalled by Copyright Uncertainty

A global brand’s major advertising campaign was halted at the last minute when it was discovered that the core visual assets were generated by AI without a clear chain of title or copyright protection.

IP Ownership · Marketing Risk · Copyright Law

What Happened

A creative agency used generative AI to produce ‘original’ artwork for a client’s global product launch. During the final legal review, it was determined that the assets could not be copyrighted under current law and that there was a risk of ‘style-theft’ claims from artists whose work had influenced the model. The client had to scrap the campaign and recreate the assets with human artists, resulting in millions in lost revenue and a delayed launch.

Governance Gap

The procurement process for creative services did not include disclosures or warranties regarding the use of generative AI. There was no legal framework within the company to assess the protectability of AI-generated assets. Governance failed to align creative output with intellectual property strategy.

PROFESSIONAL SERVICES

Internal Tool Hallucinations in Client Work Product

Unverified AI Output Undermined Professional Credibility

A consultancy included fabricated data and non-existent legal citations in a high-stakes client report after an associate relied on an internal AI research assistant without performing manual verification.

Professional Liability · Hallucination Risk · Quality Control

What Happened

To meet a tight deadline, a project team used an internal AI tool to summarize complex regulatory filings. The AI ‘hallucinated’ several key precedents and statistics that sounded plausible but were entirely false. These errors were included in the final deliverable to the client. The client discovered the inaccuracies, leading to a loss of trust, a fee refund, and significant reputational damage to the firm.

Governance Gap

The firm had no ‘human-in-the-loop’ verification requirement for AI-generated content used in client deliverables. There was a lack of training for staff on the limitations of LLMs, specifically regarding hallucinations. Governance failed to update standard operating procedures to include mandatory fact-checking for AI-assisted research.

FINANCIAL SERVICES

Apple Card Credit Algorithm

Opacity and Perceived Bias Triggered Regulator Scrutiny

When the Apple Card launched, users reported that women were receiving significantly lower credit limits than their husbands despite similar financial profiles, leading to a major investigation into algorithmic bias.

Algorithmic Bias · Financial Regulation · Model Explainability

What Happened

High-profile users, including Steve Wozniak, reported that the Apple Card’s credit limit algorithm discriminated against women. While an investigation by the New York Department of Financial Services ultimately found no violation of fair lending laws, the incident caused significant reputational damage and highlighted the risks of using ‘black box’ models for critical financial decisions.

Governance Gap

The primary gap was a lack of algorithmic transparency and proactive bias testing. The bank (Goldman Sachs) could not immediately explain how specific credit decisions were reached, and the governance framework failed to anticipate how proxy variables might lead to disparate impacts even if gender wasn’t a direct input.

REAL ESTATE / PROPTECH

Zillow iBuying Failure

Model Drift and Market Miscalculation

Zillow was forced to shut down its home-buying business and lay off 25% of its staff after its ‘Zestimate’ algorithm failed to accurately predict housing prices in a volatile market.

Model Drift · Inventory Risk · Human-in-the-Loop

What Happened

Zillow relied heavily on its Zestimate algorithm to flip houses. However, the model failed to account for rapid market shifts and labor shortages. The company ended up overpaying for thousands of homes that it couldn’t sell for a profit, leading to a $304 million inventory write-down in a single quarter.

Governance Gap

The failure stemmed from a lack of robust model monitoring and ‘circuit breakers.’ Governance should have required stricter human-in-the-loop validation for high-value transactions and more frequent stress-testing of the model against outlier market conditions. There was a disconnect between the data science team and the operational realities of the real estate market.
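A ‘circuit breaker’ of the kind described above does not need to be elaborate. The Python sketch below, with hypothetical thresholds, pauses automated purchase offers when recently realized sale prices diverge from the model’s predictions by more than a set tolerance, forcing human review before more inventory is acquired.

from statistics import mean

class OfferCircuitBreaker:
    """Halt automated offers when recent pricing error exceeds a tolerance (hypothetical policy)."""

    def __init__(self, error_tolerance=0.05, window=50):
        self.error_tolerance = error_tolerance  # e.g. 5% mean absolute percentage error
        self.window = window                    # number of recent sales to consider
        self.errors = []

    def record_sale(self, predicted_price, realized_price):
        self.errors.append(abs(predicted_price - realized_price) / realized_price)
        self.errors = self.errors[-self.window:]

    def offers_allowed(self):
        if len(self.errors) < self.window:
            return True  # not enough data yet; a stricter policy might also pause here
        return mean(self.errors) <= self.error_tolerance

breaker = OfferCircuitBreaker(error_tolerance=0.05, window=3)
for predicted, realized in [(510_000, 480_000), (305_000, 290_000), (420_000, 395_000)]:
    breaker.record_sale(predicted, realized)
print(breaker.offers_allowed())  # False: average error is roughly 6%, so halt offers and escalate.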

HUMAN RESOURCES

Amazon AI Recruiting Tool

Historical Bias in Training Data

Amazon abandoned an experimental AI recruiting tool after discovering it was biased against women because it was trained on resumes submitted to the company over a 10-year period when most applicants were men.

Training Data Bias · HR Tech · Ethical AI

What Happened

The system taught itself that male candidates were preferable. It penalized resumes that included the word ‘women’s’ (as in ‘women’s chess club captain’) and downgraded graduates of two all-women’s colleges. Despite attempts to fix the bias, Amazon eventually realized the tool could not be made neutral and scrapped the project.

Governance Gap

The governance failure occurred at the data selection stage. There was no rigorous audit of the training data for historical bias before model development. Effective governance would have required a ‘bias impact assessment’ and ongoing testing for disparate impact across protected groups during the prototyping phase.

Build Better AI Governance with Method 9

Most AI failures cluster into a few predictable governance breakdowns: unclear accountability, weak transparency and explainability, missing monitoring, and poor data lineage and traceability. Method 9 operationalizes these into repeatable controls: defined owners, documented intended use, pre-deployment checks, continuous monitoring, and audit-ready evidence artifacts that stand up when an incident occurs.
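As a purely illustrative Python sketch (not the Method 9 framework itself; every field name here is an assumption), the record below shows the kind of audit-ready artifact each deployed model can carry so that ownership, intended use, and evidence of checks exist before an incident rather than after.

from dataclasses import dataclass, field
from datetime import date

@dataclass
class ModelGovernanceRecord:
    """One audit-ready record per deployed model (illustrative structure only)."""
    model_name: str
    accountable_owner: str                  # a named person, not a team alias
    intended_use: str                       # documented scope; anything else is out of bounds
    pre_deployment_checks: list[str] = field(default_factory=list)
    monitoring_metrics: list[str] = field(default_factory=list)
    evidence_links: list[str] = field(default_factory=list)   # test reports, sign-offs, approvals
    last_reviewed: date | None = None

record = ModelGovernanceRecord(
    model_name="resume-screener-v2",
    accountable_owner="Head of Talent Acquisition",
    intended_use="Rank inbound applications for recruiter review; no automatic rejection",
    pre_deployment_checks=["training data bias audit", "disparate impact test"],
    monitoring_metrics=["selection rate by group", "human override rate"],
    last_reviewed=date(2025, 1, 15),
)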

Schedule a Consultation Today