Quantifying Healthcare AI Value: ROI, Error Cost, and Baseline-First Governance

Aaron Mackey
December 30, 2025

Healthcare AI creates value only when its impact can be measured in business and clinical terms. Too often, organizations invest in sophisticated models without a clear understanding of ROI, error cost, or whether simpler alternatives could perform just as well. In a HealthAI Collective lightning talk, Aaron Mackey outlines a practical framework for evaluating healthcare AI using baselines, cost of errors, and continuous ROI governance so leaders can make defensible investment decisions.

Key Takeaways

  • Value must be defined before any AI work begins.
  • Accuracy metrics alone do not help executives make decisions.
  • Error cost is the most important measure of real-world AI performance.
  • ROI must be recalculated quarterly to stay accurate.
  • Baselines protect budgets and create a defensible starting point for investment.

What Healthcare Leaders Really Need From AI

Aaron has seen the same executive conversation repeat across organizations. Leaders feel pressure to present an AI strategy to boards and investors, often before clarifying what success should look like. This creates a risk of moving directly into technology decisions without first defining the outcome that matters.

He redirects these conversations to first principles. Before discussing models or platforms, leaders must answer two questions.

  1. What business or clinical problem are we solving?
  2. Which metric must change for this investment to be worthwhile?

Every credible AI initiative fits into one of four value categories: cost reduction, growth, efficiency, or experience. Without this clarity, teams chase sophisticated models that never translate into measurable outcomes.

Business stakes

An unclear value target leads to fragmented investments and models that do not tie to operational or clinical improvements. Future AI budgets then become harder to defend.

Decision criteria

  • Can the problem be described without mentioning algorithms?
  • Will solving it change a financial, operational, or clinical metric?
  • Would we pursue the initiative even if AI did not exist?

“At the end of the day, what my CEO is looking for is not a whole bunch of code. What they're looking for is value. They're looking for outcomes that that code enables.”

How Healthcare AI ROI Should Actually Be Calculated

Many organizations treat AI ROI as a one-time estimate used during annual planning. This approach creates blind spots because AI systems and the environments around them change.

ROI must therefore be a living measure, recalculated every quarter.

The operational structure of ROI

Total cost of ownership:

  • Team and talent
  • Data preparation and acquisition
  • Infrastructure and compute
  • Deployment and monitoring

Direct benefits:

  • The metric expected to improve
  • How improvement translates to dollars

Intangible benefits:

  • Brand value
  • Competitive parity
  • Investor confidence

Quarterly ROI review helps executives determine whether an AI initiative is still the best use of capital or whether funds should be redeployed elsewhere.

Executive decision framework

  1. Define the outcome and its metric
  2. Quantify TCO
  3. Estimate the financial value of improvement
  4. Recalculate ROI quarterly
  5. Compare against non-AI investments using net present value (NPV)

This turns AI from an experimental project into a disciplined capital allocation process.
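The framework above can be sketched in a few lines. This is an illustrative model only: the ROI and NPV formulas are the standard ones, but every dollar figure, discount rate, and quarterly value below is a hypothetical placeholder, not a number from the talk.

```python
# Illustrative sketch: quarterly ROI recalculation and NPV comparison for
# an AI initiative. All figures are hypothetical placeholders.

def roi(direct_benefit: float, intangible_benefit: float, tco: float) -> float:
    """Simple ROI: (total benefit - total cost of ownership) / TCO."""
    return (direct_benefit + intangible_benefit - tco) / tco

def npv(cash_flows: list[float], discount_rate: float) -> float:
    """Net present value of quarterly net cash flows (period 0 = today)."""
    return sum(cf / (1 + discount_rate) ** t for t, cf in enumerate(cash_flows))

# Quarterly recalculation: TCO and benefits drift as the system and its
# environment change, so ROI is recomputed each quarter, not once a year.
q_tco = [250_000, 120_000, 110_000, 105_000]   # team, data, compute, monitoring
q_benefit = [0, 90_000, 210_000, 260_000]      # dollarized metric improvement
quarterly_roi = [roi(b, 0, c) for b, c in zip(q_benefit, q_tco)]

# Compare the AI initiative against a non-AI alternative on equal footing.
ai_npv = npv([b - c for b, c in zip(q_benefit, q_tco)], discount_rate=0.02)
alt_npv = npv([30_000, 30_000, 30_000, 30_000], discount_rate=0.02)
redeploy_capital = alt_npv > ai_npv  # capital should move if the alternative wins
```

The point of the sketch is the cadence, not the arithmetic: the same two functions are re-run every quarter with fresh inputs, and the redeploy decision falls out of the comparison.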

Why AI Accuracy Does Not Predict Business Value

Teams celebrate high accuracy, strong AUC, or impressive F1 scores. Aaron has spent years explaining to executive teams why these numbers do not predict value. They do not capture the central question:

What does it cost when the model is wrong?

Healthcare carries serious consequences for both error types.

False positives

These produce unnecessary work, unnecessary tests, and unnecessary spend.

False negatives

These represent missed diagnoses, missed risk signals, and missed reimbursement or revenue.

The problem is that most machine learning pipelines treat these errors as equal. Healthcare rarely does. Error costs are asymmetric, and ignoring this asymmetry hides the real business impact of a model.

“It's not about accuracy. It's about what's the problem that we're trying to solve.”

Stakes

A model with strong accuracy can still deliver negative ROI if it makes errors that cost more than the value it produces.

Executive decision criteria

  • What is the dollar cost of a false positive?
  • What is the dollar cost of a false negative?
  • Do the error costs differ significantly?
  • Does the model outperform the existing workflow when evaluated in dollars instead of accuracy?
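These criteria can be made concrete with a dollar-weighted evaluation. The sketch below assumes hypothetical error costs and confusion counts; it shows how a model with higher accuracy can still lose once errors are priced asymmetrically.

```python
# Sketch: dollar-valued model comparison with asymmetric error costs.
# The costs and confusion counts are hypothetical placeholders.

COST_FALSE_POSITIVE = 150.0     # e.g., an unnecessary follow-up test
COST_FALSE_NEGATIVE = 8_000.0   # e.g., a missed diagnosis or missed revenue

def error_cost(fp: int, fn: int) -> float:
    """Total dollar cost of a model's errors, pricing each type separately."""
    return fp * COST_FALSE_POSITIVE + fn * COST_FALSE_NEGATIVE

def accuracy(m: dict) -> float:
    return (m["tp"] + m["tn"]) / sum(m.values())

# Model A: higher accuracy, but more of the expensive false negatives.
model_a = {"tp": 900, "fp": 50, "fn": 50, "tn": 9000}
# Model B: lower accuracy, but its extra errors are the cheap kind.
model_b = {"tp": 930, "fp": 300, "fn": 20, "tn": 8750}

cost_a = error_cost(model_a["fp"], model_a["fn"])   # 50*150 + 50*8000
cost_b = error_cost(model_b["fp"], model_b["fn"])   # 300*150 + 20*8000
# Model A wins on accuracy (0.990 vs 0.968); Model B wins in dollars.
```

With these assumed costs, Model B's error bill is roughly half of Model A's despite the lower accuracy, which is exactly the gap that accuracy-only reporting hides.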

Cost Curves: A More Executive-Friendly Way to Compare Models

Most AI evaluation tools are designed for technologists, not executives. Metrics such as ROC curves or precision-recall charts describe model behavior, but they do not explain business impact.

A cost curve reframes the discussion by plotting expected cost against real-world conditions such as prevalence and error trade-offs. Before any real models are evaluated, it establishes simple reference behaviors: always saying yes, always saying no, always being wrong, and the theoretical case of always being right. These baselines define the boundaries of what “good” performance actually means.

Each real AI model then appears as a line on the curve, showing how its cost changes as conditions change. A model that looks strong under one set of assumptions can become expensive under another, especially when false positives and false negatives have very different consequences.

For executives, the value is practical and immediate:

  • See whether a model actually outperforms naive strategies
  • Understand how sensitive model value is to changes in prevalence or workflow context
  • Compare models based on expected cost, not abstract accuracy
  • Identify which model minimizes real-world risk and spend

This reframes the decision from which model is more accurate to which model delivers the lowest cost at scale.
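A minimal cost-curve computation can be sketched without any plotting library. The sensitivity, specificity, and error costs below are hypothetical placeholders; the structure, expected per-case cost for the naive reference strategies and a candidate model across a prevalence sweep, follows the description above.

```python
# Sketch of a cost curve: expected error cost per case as a function of
# prevalence, for naive reference strategies and a candidate model.
# All costs and model parameters are hypothetical placeholders.

C_FP, C_FN = 150.0, 8_000.0   # asymmetric error costs in dollars

def cost_always_yes(p: float) -> float:
    return (1 - p) * C_FP                 # every true negative becomes a FP

def cost_always_no(p: float) -> float:
    return p * C_FN                       # every true positive becomes a FN

def cost_always_wrong(p: float) -> float:
    return p * C_FN + (1 - p) * C_FP      # upper boundary of the curve

def cost_always_right(p: float) -> float:
    return 0.0                            # theoretical lower boundary

def cost_model(p: float, sensitivity: float = 0.85, specificity: float = 0.90) -> float:
    """Expected per-case cost of a model at prevalence p."""
    return p * (1 - sensitivity) * C_FN + (1 - p) * (1 - specificity) * C_FP

# Sweep prevalence to see where the model actually beats naive strategies.
curve = []
for p in [i / 100 for i in range(0, 101, 5)]:
    naive_best = min(cost_always_yes(p), cost_always_no(p))
    curve.append((p, cost_model(p), naive_best))

# The model only adds value at prevalences where its expected cost falls
# below the cheapest naive strategy; elsewhere "always yes/no" is cheaper.
model_wins = [p for p, m, n in curve if m < n]
```

Notably, with these assumed costs the model only beats the naive strategies in a narrow low-prevalence band, which is the kind of sensitivity-to-conditions insight a ROC curve never surfaces.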

Why Baseline-First Development Protects Healthcare AI Budgets

One of Aaron’s strongest recommendations is to start with a simple baseline. Before building complex systems, build something that takes an afternoon.

A baseline can be as simple as:

  • A linear or logistic regression
  • A simple rule set
  • A clinician-defined heuristic

The surprising outcome is how often these simple baselines meet or exceed the business goal. When they do not, they still offer a grounded starting point for incremental investment.
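An afternoon baseline really can be this small. The sketch below uses a clinician-style heuristic; the rule, target, and patient records are all hypothetical placeholders, but the governance question at the end is the one the article poses.

```python
# Sketch: an afternoon baseline. A clinician-defined rule set, evaluated
# directly against the business target before any complex model is funded.
# The rule, threshold, and records are hypothetical placeholders.

def rule_baseline(patient: dict) -> bool:
    """Flag high risk if age >= 65 and two or more prior admissions."""
    return patient["age"] >= 65 and patient["prior_admissions"] >= 2

patients = [
    {"age": 72, "prior_admissions": 3, "readmitted": True},
    {"age": 55, "prior_admissions": 0, "readmitted": False},
    {"age": 68, "prior_admissions": 2, "readmitted": True},
    {"age": 80, "prior_admissions": 1, "readmitted": False},
]

flagged = [rule_baseline(p) for p in patients]
hits = sum(f and p["readmitted"] for f, p in zip(flagged, patients))
recall = hits / sum(p["readmitted"] for p in patients)

# Governance question: does the baseline already hit the required outcome?
# If so, any complex model must justify its incremental cost and risk.
TARGET_RECALL = 0.80
baseline_sufficient = recall >= TARGET_RECALL
```

The same pattern works with a logistic regression in place of the rule; what matters is that the evaluation is against the business target, creating the standard future models must beat.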

Stakes

When teams skip baselines, they overbuild, overspend, and delay integrating the workflow changes that create real adoption.

Executive decision criteria

Before funding a more complex model, leaders should ask:

  • Does the baseline achieve the required outcome?
  • What incremental value would a more complex model provide?
  • Is the additional ROI worth the cost, complexity, and risk?

Baselines protect budgets by anchoring investment decisions in evidence rather than ambition. They also make governance easier by creating a clear standard against which future models are evaluated.

Conclusion

AI delivers value only when it is quantified. This requires discipline. Leaders must evaluate AI through ROI, error cost, and continuous refinement. Organizations that adopt this discipline will deploy fewer models but create significantly greater economic and clinical impact.

If your organization is defining AI value or establishing ROI governance, begin with the baseline and the cost of the first error. Everything else emerges from that clarity.

About the Speaker

Aaron Mackey is a pharmaceutical data science leader specializing in advanced modeling, multi-omics, real-world data, and clinical trial optimization. He integrates diverse datasets to generate actionable insights and has led unified data and engineering teams at McKinsey, Lokavant, Sonata Therapeutics, and Roivant. He also mentors emerging computational scientists through teaching and invited talks.

Watch the Full Talk

Beyond the Algorithm: Quantifying AI’s Real-World Impact