Beyond the Algorithm: Quantifying AI’s Real-World Impact
Healthcare AI creates value only when its impact can be measured in business and clinical terms. Too often, organizations invest in sophisticated models without a clear understanding of ROI, error cost, or whether simpler alternatives could perform just as well. In a HealthAI Collective lightning talk, Aaron Mackey outlines a practical framework for evaluating healthcare AI using baselines, cost of errors, and continuous ROI governance so leaders can make defensible investment decisions.
Aaron has seen the same executive conversation repeat across organizations. Leaders feel pressure to present an AI strategy to boards and investors, often before clarifying what success should look like. This creates a risk of moving directly into technology decisions without first defining the outcome that matters.
He redirects these conversations to first principles. Before discussing models or platforms, leaders must answer two questions: what outcome is the initiative meant to improve, and how will success be measured?
Every credible AI initiative fits into one of four value categories: cost reduction, growth, efficiency, or experience. Without this clarity, teams chase sophisticated models that never translate into measurable outcomes.
An unclear value target leads to fragmented investments and models that do not tie to operational or clinical improvements. Future AI budgets then become harder to defend.

Many organizations treat AI ROI as a one-time estimate used during annual planning. This approach creates blind spots because AI systems and the environments around them change.
ROI must therefore be a living measure, recalculated every quarter.
A living ROI calculation weighs three components each quarter:
- Total cost of ownership: the full cost of building, running, and maintaining the system.
- Direct benefits: measurable savings or revenue the system generates.
- Intangible benefits: harder-to-quantify gains that still belong in the calculation.
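As a minimal sketch, with all quarterly dollar figures invented for illustration, the recalculation might look like this:

```python
# Hypothetical quarterly ROI check; every figure below is an illustrative placeholder.

def quarterly_roi(tco: float, direct_benefits: float, intangible_benefits: float) -> float:
    """Return ROI as a fraction of total cost of ownership for the quarter."""
    return (direct_benefits + intangible_benefits - tco) / tco

# Example: $180k quarterly TCO (licenses, infrastructure, maintenance, staff),
# $210k in measured savings, and $25k assigned to intangible gains.
roi = quarterly_roi(tco=180_000, direct_benefits=210_000, intangible_benefits=25_000)
print(f"Quarterly ROI: {roi:.1%}")  # ~30.6%; recompute every quarter as conditions shift
```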
Quarterly ROI review helps executives determine whether an AI initiative is still the best use of capital or whether funds should be redeployed elsewhere.
This turns AI from an experimental project into a disciplined capital allocation process.

Teams celebrate high accuracy, strong AUC, or impressive F1 scores. Aaron has spent years explaining to executive teams why these numbers do not predict value. They do not capture the central question:
What does it cost when the model is wrong?
Healthcare carries serious consequences for both error types.
False positives produce unnecessary work, unnecessary tests, and unnecessary spend.
False negatives represent missed diagnoses, missed risk signals, and missed reimbursement or revenue.
The problem is that most machine learning pipelines treat these errors as equal. Healthcare rarely does. Error costs are asymmetric, and ignoring this asymmetry hides the real business impact of a model.
A model with strong accuracy can still deliver negative ROI if it makes errors that cost more than the value it produces.
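To see why, consider a minimal sketch with hypothetical error counts and per-error costs: a model that is 95% accurate on 10,000 cases can still run up a multi-million-dollar error bill once each error type is priced.

```python
# Expected cost of a model's errors under asymmetric pricing.
# All counts and dollar costs below are hypothetical, for illustration only.

def error_cost(fp: int, fn: int, cost_fp: float, cost_fn: float) -> float:
    """Total cost of errors, weighting each error type by its real-world price."""
    return fp * cost_fp + fn * cost_fn

# Out of 10,000 cases: 400 false positives, 100 false negatives (95% accuracy).
# A false positive triggers an unnecessary $300 test; a false negative is a
# missed diagnosis costed at $20,000.
cost = error_cost(fp=400, fn=100, cost_fp=300, cost_fn=20_000)
print(f"Error cost: ${cost:,.0f}")  # $2,120,000 despite "strong" accuracy
```

Weighted this way, the 400 cheap false positives barely register; the 100 expensive false negatives dominate the economics.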
Most AI evaluation tools are designed for technologists, not executives. Metrics such as ROC curves or precision-recall charts describe model behavior, but they do not explain business impact.
A cost curve reframes the discussion by plotting expected cost against real-world conditions such as prevalence and error trade-offs. Before any real models are evaluated, it establishes simple reference behaviors: always saying yes, always saying no, always being wrong, and the theoretical case of always being right. These baselines define the boundaries of what “good” performance actually means.
Each real AI model then appears as a line on the curve, showing how its cost changes as conditions change. A model that looks strong under one set of assumptions can become expensive under another, especially when false positives and false negatives have very different consequences.
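A rough sketch of how such a curve could be computed, assuming per-error dollar costs and a hypothetical model's error rates; the four reference policies are the ones named above:

```python
import numpy as np

def per_case_cost(prevalence, fpr, fnr, cost_fp=300.0, cost_fn=20_000.0):
    """Expected cost per case for a policy with the given error rates.

    cost_fp and cost_fn are assumed per-error costs; prevalence is the
    fraction of truly positive cases in the population.
    """
    return (1 - prevalence) * fpr * cost_fp + prevalence * fnr * cost_fn

prevalence = np.linspace(0.0, 0.2, 50)  # sweep real-world conditions

curves = {
    "always yes":   per_case_cost(prevalence, fpr=1.0, fnr=0.0),   # flags everyone
    "always no":    per_case_cost(prevalence, fpr=0.0, fnr=1.0),   # flags no one
    "always wrong": per_case_cost(prevalence, fpr=1.0, fnr=1.0),   # worst case
    "always right": per_case_cost(prevalence, fpr=0.0, fnr=0.0),   # theoretical floor
    "model A":      per_case_cost(prevalence, fpr=0.04, fnr=0.10), # hypothetical model
}

for name, costs in curves.items():
    print(f"{name:>12}: ${costs[0]:8.2f} at 0% prevalence, ${costs[-1]:8.2f} at 20%")
```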
For executives, the value is practical and immediate: it reframes the decision from which model is more accurate to which model delivers the lowest cost at scale.
One of Aaron’s strongest recommendations is to start with a simple baseline. Before building complex systems, build something that takes an afternoon.
A baseline can be as simple as a hand-written rule, a single-feature threshold, or an off-the-shelf logistic regression.
The surprising outcome is how often these simple baselines meet or exceed the business goal. When they do not, they still offer a grounded starting point for incremental investment.
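As a minimal sketch, assuming a binary risk-flagging task with a hypothetical risk score, an afternoon baseline might be nothing more than one threshold scored against historical labels:

```python
# An afternoon baseline: flag any patient whose (hypothetical) risk score
# exceeds a hand-picked threshold. No training pipeline required.

def baseline_flag(risk_score: float, threshold: float = 0.7) -> bool:
    """Rule-of-thumb classifier: one feature, one threshold."""
    return risk_score >= threshold

# Score the rule on labeled historical cases before funding anything bigger.
cases = [(0.9, True), (0.3, False), (0.8, False), (0.2, False), (0.75, True)]
fp = sum(baseline_flag(score) and not label for score, label in cases)
fn = sum(not baseline_flag(score) and label for score, label in cases)
print(f"False positives: {fp}, false negatives: {fn}")  # feeds the same cost curve
```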
When teams skip baselines, they overbuild, overspend, and delay integrating the workflow changes that create real adoption.
Before funding a more complex model, leaders should ask one question: does it beat the baseline by enough to justify its added cost and complexity?
Baselines protect budgets by anchoring investment decisions in evidence rather than ambition. They also make governance easier by creating a clear standard against which future models are evaluated.
AI delivers value only when that value is quantified, and quantifying it requires discipline. Leaders must evaluate AI through ROI, error cost, and continuous refinement. Organizations that adopt this discipline will deploy fewer models but create significantly greater economic and clinical impact.
If your organization is defining AI value or establishing ROI governance, begin with the baseline and the cost of the first error. Everything else emerges from that clarity.
Aaron Mackey is a pharmaceutical data science leader specializing in advanced modeling, multi-omics, real-world data, and clinical trial optimization. He integrates diverse datasets to generate actionable insights and has led unified data and engineering teams at McKinsey, Lokavant, Sonata Therapeutics, and Roivant. He also mentors emerging computational scientists through teaching and invited talks.