Quantifying Healthcare AI Value: ROI, Error Cost, and Baseline-First Governance

Healthcare AI creates value only when its impact can be measured in business and clinical terms. Too often, organizations invest in sophisticated models without a clear understanding of ROI, error cost, or whether simpler alternatives could perform just as well. In a HealthAI Collective lightning talk, Aaron Mackey outlines a practical framework for evaluating healthcare AI using baselines, cost of errors, and continuous ROI governance so leaders can make defensible investment decisions.
Key Takeaways
- Value must be defined before any AI work begins.
- Accuracy metrics alone do not help executives make investment decisions.
- Error cost is the most important measure of real-world AI performance.
- ROI must be recalculated quarterly to stay accurate.
- Baselines protect budgets and create a defensible starting point for investment.
What Healthcare Leaders Really Need From AI
Aaron has seen the same executive conversation repeat across organizations. Leaders feel pressure to present an AI strategy to boards and investors, often before clarifying what success should look like. This creates a risk of moving directly into technology decisions without first defining the outcome that matters.
He redirects these conversations to first principles. Before discussing models or platforms, leaders must answer two questions:
- What business or clinical problem are we solving?
- Which metric must change for this investment to be worthwhile?
Every credible AI initiative fits into one of four value categories: cost reduction, growth, efficiency, or experience. Without this clarity, teams chase sophisticated models that never translate into measurable outcomes.
Business stakes
An unclear value target leads to fragmented investments and models that do not tie to operational or clinical improvements. Future AI budgets then become harder to defend.
Decision criteria
- Can the problem be described without mentioning algorithms?
- Will solving it change a financial, operational, or clinical metric?
- Would we pursue the initiative even if AI did not exist?
How Healthcare AI ROI Should Actually Be Calculated

Many organizations treat AI ROI as a one-time estimate used during annual planning. This approach creates blind spots because AI systems and the environments around them change.
ROI must therefore be a living measure, recalculated every quarter.
The operational structure of ROI
Total cost of ownership:
- Team and talent
- Data preparation and acquisition
- Infrastructure and compute
- Deployment and monitoring
Direct benefits:
- The metric expected to improve
- How improvement translates to dollars
Intangible benefits:
- Brand value
- Competitive parity
- Investor confidence
Quarterly ROI review helps executives determine whether an AI initiative is still the best use of capital or whether funds should be redeployed elsewhere.
Executive decision framework
- Define the outcome and its metric
- Quantify TCO
- Estimate the financial value of improvement
- Recalculate ROI quarterly
- Compare against non-AI investments using NPV
This turns AI from an experimental project into a disciplined capital allocation process.
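As a rough sketch of the framework above, the quarterly ROI review and NPV comparison can be expressed in a few lines. All dollar figures here are invented for illustration; plug in your organization's actual TCO and benefit numbers.

```python
# Hypothetical illustration of quarterly ROI governance.
# Every figure below is an assumed example, not real data.

def quarterly_roi(tco, direct_benefit, intangible_benefit=0.0):
    """ROI = (total benefit - total cost) / total cost."""
    total_benefit = direct_benefit + intangible_benefit
    return (total_benefit - tco) / tco

def npv(cash_flows, discount_rate):
    """Net present value of a series of quarterly cash flows."""
    return sum(cf / (1 + discount_rate) ** t for t, cf in enumerate(cash_flows))

# Quarterly TCO: team + data preparation + infrastructure + monitoring
tco = 120_000 + 40_000 + 25_000 + 15_000       # $200,000 per quarter
direct_benefit = 260_000                        # dollar value of the improved metric
roi = quarterly_roi(tco, direct_benefit)
print(f"Quarterly ROI: {roi:.0%}")              # 30%

# Compare the AI initiative against a non-AI alternative on equal footing
ai_cash_flows = [-200_000, 60_000, 60_000, 60_000, 60_000]
alt_cash_flows = [-150_000, 50_000, 50_000, 50_000, 50_000]
print(f"AI NPV: ${npv(ai_cash_flows, 0.02):,.0f}, "
      f"alternative NPV: ${npv(alt_cash_flows, 0.02):,.0f}")
```

Rerunning this calculation each quarter with updated costs and benefits is what turns ROI into the "living measure" described above.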
Why AI Accuracy Does Not Predict Business Value

Teams celebrate high accuracy, strong AUC, or impressive F1 scores. Aaron has spent years explaining to executive teams why these numbers do not predict value. They do not capture the central question:
What does it cost when the model is wrong?
Healthcare carries serious consequences for both error types.
False positives
These produce unnecessary work, unnecessary tests, and unnecessary spend.
False negatives
These represent missed diagnoses, missed risk signals, and missed reimbursement or revenue.
The problem is that most machine learning pipelines treat these errors as equal. Healthcare rarely does. Error costs are asymmetric, and ignoring this asymmetry hides the real business impact of a model.
Stakes
A model with strong accuracy can still deliver negative ROI if it makes errors that cost more than the value it produces.
Executive decision criteria
- What is the dollar cost of a false positive?
- What is the dollar cost of a false negative?
- Do the error costs differ significantly?
- Does the model outperform the existing workflow when evaluated in dollars instead of accuracy?
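The asymmetry argument can be made concrete with a small sketch. The confusion counts and dollar costs below are invented, but they show how a "more accurate" model can still be the more expensive one when errors are priced in dollars.

```python
# Hypothetical error-cost comparison. Counts and dollar costs are
# assumed for illustration; substitute your organization's real numbers.

def total_error_cost(fp, fn, fp_cost, fn_cost):
    """Dollar cost of a model's errors over an evaluation set."""
    return fp * fp_cost + fn * fn_cost

# Two models evaluated on the same 10,000 cases.
# Model A: higher accuracy (fewer total errors overall).
cost_a = total_error_cost(fp=300, fn=200, fp_cost=150, fn_cost=5_000)
# Model B: lower accuracy, but its errors skew toward cheap false positives.
cost_b = total_error_cost(fp=700, fn=60, fp_cost=150, fn_cost=5_000)

print(f"Model A: {300 + 200} errors, ${cost_a:,}")   # 500 errors, $1,045,000
print(f"Model B: {700 + 60} errors, ${cost_b:,}")    # 760 errors, $405,000
```

Model B makes 52% more errors yet costs less than half as much, because a missed diagnosis ($5,000 here) is far more expensive than an unnecessary test ($150).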
Cost Curves: A More Executive-Friendly Way to Compare Models
Most AI evaluation tools are designed for technologists, not executives. Metrics such as ROC curves or precision-recall charts describe model behavior, but they do not explain business impact.
A cost curve reframes the discussion by plotting expected cost against real-world conditions such as prevalence and error trade-offs. Before any real models are evaluated, it establishes simple reference behaviors: always saying yes, always saying no, always being wrong, and the theoretical case of always being right. These baselines define the boundaries of what “good” performance actually means.
Each real AI model then appears as a line on the curve, showing how its cost changes as conditions change. A model that looks strong under one set of assumptions can become expensive under another, especially when false positives and false negatives have very different consequences.
For executives, the value is practical and immediate:
- See whether a model actually outperforms naive strategies
- Understand how sensitive model value is to changes in prevalence or workflow context
- Compare models based on expected cost, not abstract accuracy
- Identify which model minimizes real-world risk and spend
This reframes the decision from which model is more accurate to which model delivers the lowest cost at scale.
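A minimal sketch of the cost-curve idea, assuming a hypothetical model with fixed sensitivity and specificity and invented dollar costs, shows how expected cost per case shifts as prevalence changes and how the naive strategies bound the comparison.

```python
# Cost-curve sketch: expected cost per case versus prevalence, for the
# naive reference strategies and one hypothetical model. Sensitivity,
# specificity, and dollar costs are all assumed for illustration.

FP_COST, FN_COST = 150, 5_000

def always_yes(p):   # flag everyone: every true negative becomes a false positive
    return (1 - p) * FP_COST

def always_no(p):    # flag no one: every true positive becomes a false negative
    return p * FN_COST

def model(p, sensitivity=0.90, specificity=0.80):
    fn_rate = p * (1 - sensitivity)        # missed positives
    fp_rate = (1 - p) * (1 - specificity)  # flagged negatives
    return fn_rate * FN_COST + fp_rate * FP_COST

for p in (0.01, 0.05, 0.20):
    costs = {"always_yes": always_yes(p),
             "always_no": always_no(p),
             "model": model(p)}
    best = min(costs, key=costs.get)
    detail = ", ".join(f"{name}=${c:,.2f}" for name, c in costs.items())
    print(f"prevalence={p:.0%}: {detail} -> cheapest: {best}")
```

With these assumed numbers the model wins at low prevalence, but at 20% prevalence the naive "always yes" strategy becomes cheaper, which is exactly the kind of condition-dependent reversal a cost curve is designed to expose.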
Why Baseline-First Development Protects Healthcare AI Budgets
One of Aaron’s strongest recommendations is to start with a simple baseline. Before building complex systems, build something that takes an afternoon.
A baseline can be as simple as:
- A linear or logistic regression
- A simple rule set
- A clinician-defined heuristic
The surprising outcome is how often these simple baselines meet or exceed the business goal. When they do not, they still offer a grounded starting point for incremental investment.
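The "something that takes an afternoon" can be as small as the sketch below: a hypothetical two-threshold rule set, evaluated directly in dollars rather than accuracy. The thresholds, patients, and costs are all invented for illustration.

```python
# A baseline that takes an afternoon: a clinician-style rule set,
# evaluated in error-cost dollars. All thresholds, patient records,
# and costs below are assumed examples.

FP_COST, FN_COST = 150, 5_000

def rule_baseline(patient):
    """Flag a patient as high risk using two simple thresholds."""
    return patient["age"] >= 65 or patient["prior_admissions"] >= 2

def error_cost(predictions, labels):
    cost = 0
    for pred, actual in zip(predictions, labels):
        if pred and not actual:
            cost += FP_COST      # false positive: unnecessary work
        elif actual and not pred:
            cost += FN_COST      # false negative: missed risk
    return cost

patients = [
    {"age": 70, "prior_admissions": 0},   # actually high risk
    {"age": 50, "prior_admissions": 3},   # actually high risk
    {"age": 40, "prior_admissions": 0},   # actually low risk
    {"age": 67, "prior_admissions": 1},   # actually low risk
]
labels = [True, True, False, False]

preds = [rule_baseline(p) for p in patients]
print(preds, "error cost =", f"${error_cost(preds, labels):,}")
```

Any more complex model must now beat this baseline's dollar cost by enough to justify its added build, infrastructure, and governance burden.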
Stakes
When teams skip baselines, they overbuild, overspend, and delay integrating the workflow changes that create real adoption.
Executive decision criteria
Before funding a more complex model, leaders should ask:
- Does the baseline achieve the required outcome?
- What incremental value would a more complex model provide?
- Is the additional ROI worth the cost, complexity, and risk?
Baselines protect budgets by anchoring investment decisions in evidence rather than ambition. They also make governance easier by creating a clear standard against which future models are evaluated.
Conclusion
AI delivers value only when it is quantified. This requires discipline. Leaders must evaluate AI through ROI, error cost, and continuous refinement. Organizations that adopt this discipline will deploy fewer models but create significantly greater economic and clinical impact.
If your organization is defining AI value or establishing ROI governance, begin with the baseline and the cost of the first error. Everything else emerges from that clarity.
About the Speaker
Aaron Mackey is a pharmaceutical data science leader specializing in advanced modeling, multi-omics, real-world data, and clinical trial optimization. He integrates diverse datasets to generate actionable insights and has led unified data and engineering teams at McKinsey, Lokavant, Sonata Therapeutics, and Roivant. He also mentors emerging computational scientists through teaching and invited talks.
Watch the Full Talk
Beyond the Algorithm: Quantifying AI’s Real-World Impact