Every day, surgical departments face critical decisions that impact patient lives. Imagine a scenario: A hospital’s AI system predicts higher complications for morning surgeries, leading administrators to recommend moving all complex procedures to afternoon slots. But this decision, based on traditional machine learning, might actually harm patients. This article will explain why – and how a new approach called Causal Machine Learning (Causal ML) could transform healthcare decision-making for the better.
Risk prediction
There is a lot of excitement about AI-driven risk prediction. It has the potential to:
- Reduce prevalence and severity of adverse outcomes through early intervention
- Enable informed patient-doctor discussions about treatment options
- Optimise resource allocation and operational efficiency
While traditional Machine Learning excels at finding patterns in medical data, it identifies correlations but does not determine true cause and effect. This limitation can lead to false and potentially dangerous conclusions in healthcare settings.
Consider our fictional hospital MorningStar. Their ML model accurately showed worse outcomes for morning surgeries. However, this wasn’t because morning surgeries were inherently less safe. Rather, the hospital policy was to schedule complex, high-risk cases in the morning when teams were fresh. The AI had discovered a correlation but missed the underlying causal relationship – leading to potentially misleading recommendations.
In fact, case complexity affects both the time-of-day (the treatment) and the complication rate (the outcome): it is a confounder.
The model learnt to maximise accuracy by using time-of-day, and that works well at MorningStar. But now imagine another hospital, AllStar, which schedules complex cases throughout the day to spread the load. The model trained at MorningStar will not generalise to this new setting, where the statistics are different. In ML terminology, we say that the model is ‘biased’ and does not generalise ‘out-of-distribution’.
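This confounding effect is easy to reproduce in a toy simulation. The sketch below (all numbers are made up for illustration) encodes MorningStar’s policy: complexity drives both the scheduling and the complication rate, while a fresh morning team actually lowers risk slightly. A naive comparison still makes mornings look far worse.

```python
import random

random.seed(0)

def simulate(n=100_000):
    """Illustrative simulation of MorningStar's policy (assumed numbers)."""
    rows = []
    for _ in range(n):
        complex_case = random.random() < 0.5
        # policy: complex cases are usually scheduled in the morning
        morning = random.random() < (0.9 if complex_case else 0.2)
        # true mechanism: complexity raises risk; a fresh morning team lowers it
        p = 0.05 + 0.30 * complex_case - 0.03 * morning
        rows.append((complex_case, morning, random.random() < p))
    return rows

def rate(rows):
    return sum(y for *_, y in rows) / len(rows)

data = simulate()
morning_rate = rate([r for r in data if r[1]])
afternoon_rate = rate([r for r in data if not r[1]])

# Naive (correlational) comparison: mornings look much worse,
# even though the true causal effect of a morning slot is protective.
print(f"morning: {morning_rate:.3f}, afternoon: {afternoon_rate:.3f}")
```

Within each complexity stratum, mornings are in fact slightly safer in this simulation; the naive comparison flips the sign purely because of the scheduling policy.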
Limitations and opportunities
This example highlights three critical limitations of traditional (non-causal) ML in healthcare:
- It can’t distinguish between correlation and causation, leading to misleading conclusions
- It assumes historical patterns will hold true in new situations
- It can’t answer “what-if” questions about different treatment choices
Moreover, for AI to be trusted and useful, explainability is crucial – clinicians should understand ‘why’ a prediction was made. There are methods to ‘explain’ model predictions, but they can only explain the correlations learnt by the model and not the factors that are causally related to those outcomes.
We can also achieve more by breaking out of a traditional mindset and going beyond straightforward risk prediction. The highest-risk patients are not necessarily the ones whose outcomes we can improve; for example, they may be the sickest patients, with no modifiable factors. To intervene most effectively, it is better to ask which risks are most preventable, and which input variables are both modifiable and causally related to the outcome. This enables healthcare systems to allocate resources where they will have the highest impact, supporting sustainability.
Causal ML
Causal ML represents a fundamental shift in how we analyse healthcare data. Instead of simply identifying patterns, it helps us to understand the actual mechanisms behind patient outcomes, which unlocks opportunities that we’ll discuss. Think of it as upgrading from a traffic forecast that predicts congestion to one that explains why it’s going to be congested and what we can do to prepare or mitigate.
Causal ML consists of two main domains: Discovery and Inference.
Causal Discovery
- Maps relationships between variables, by consulting experts and/or automated algorithms
- Creates “causal graphs” showing how different factors influence each other¹
- Identifies critical factors like confounders, which need to be controlled²
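A causal graph can be represented very simply in code. The sketch below is a hypothetical graph for the surgery example (the structure is assumed for illustration); once the graph exists, confounders of a treatment–outcome pair can be found mechanically as shared ancestors of both.

```python
# Hypothetical causal graph for the surgery example, as parent lists.
graph = {
    "complexity": [],
    "time_of_day": ["complexity"],        # policy: complexity drives scheduling
    "team_freshness": ["time_of_day"],    # a mediator, not a confounder
    "complication": ["complexity", "team_freshness"],
}

def ancestors(node):
    """All nodes with a directed path into `node`."""
    seen = set()
    stack = list(graph[node])
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(graph[p])
    return seen

def confounders(treatment, outcome):
    """Common causes: ancestors of both treatment and outcome."""
    return ancestors(treatment) & ancestors(outcome)

print(confounders("time_of_day", "complication"))  # {'complexity'}
```

Note that team freshness shows up as an ancestor of the outcome but not of the treatment, so this check correctly excludes it: it is a mediator, which should not be controlled when estimating the total effect.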
Causal Inference
- Quantifies the strength of cause-and-effect relationships
- Enables “what-if” scenario planning
- Adjusts for bias, such as those introduced by confounders
Causal inference can be done at two levels: at the group level (considering all the examples in the dataset), which can inform protocols and processes, or at the individual level, enabling personalised precision medicine.
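The simplest form of such an adjustment is stratification on the confounder (a special case of backdoor adjustment). The sketch below uses synthetic data with assumed numbers: the true effect of a morning slot is built in as a 3-point risk reduction, the naive comparison gets the sign wrong, and stratifying on complexity recovers the true effect.

```python
import random

random.seed(1)

# Synthetic data: complexity confounds time-of-day (all numbers assumed).
data = []
for _ in range(200_000):
    complex_case = random.random() < 0.5
    morning = random.random() < (0.9 if complex_case else 0.2)
    p = 0.05 + 0.30 * complex_case - 0.03 * morning  # true morning effect: -0.03
    data.append((complex_case, morning, random.random() < p))

def rate(rows):
    return sum(y for *_, y in rows) / max(len(rows), 1)

# Naive difference in complication rates: badly biased by the confounder.
naive = rate([r for r in data if r[1]]) - rate([r for r in data if not r[1]])

# Backdoor adjustment: estimate the effect within each complexity stratum,
# then average the stratum effects weighted by how common each stratum is.
ate = 0.0
for c in (False, True):
    stratum = [r for r in data if r[0] == c]
    effect = (rate([r for r in stratum if r[1]])
              - rate([r for r in stratum if not r[1]]))
    ate += (len(stratum) / len(data)) * effect

print(f"naive: {naive:+.3f}, adjusted: {ate:+.3f}")
```

The adjusted estimate lands near the built-in −0.03, while the naive estimate is large and positive. Individual-level estimates follow the same idea but model the effect as a function of each patient’s characteristics.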
More on ‘what-if’ scenarios
What if we could see how different treatment decisions would play out? Asking ‘what-if’ questions allows us to explore alternative scenarios using the data we already have (i.e. to ask counterfactual questions with only observational data). Traditionally, to understand whether a treatment really works, we’d need to run a randomised controlled trial (RCT) – the gold standard of medical research. But RCTs are expensive and often impractical or unethical. For example, we can’t ask people to smoke in order to study smoking’s effects on surgery. And if we are studying rare conditions like unusual tumours, gathering enough patients for a trial could take decades.
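One way to phrase a counterfactual query is via a structural model: each patient’s outcome is driven by their characteristics plus patient-specific noise, and the ‘what-if’ world keeps that noise fixed while changing only the treatment. The sketch below (structural model and numbers are assumed for illustration) asks: of the complex cases that had a complication in the afternoon, what share would have avoided it in the morning?

```python
import random

random.seed(2)

def risk(complex_case, morning):
    """Assumed structural model (illustrative numbers)."""
    return 0.05 + 0.30 * complex_case - 0.03 * morning

n = 200_000
complicated = 0   # complex cases with an afternoon complication
helped = 0        # ...whose complication would not have occurred in the morning
for _ in range(n):
    u = random.random()                      # patient-specific noise, held fixed
    factual = u < risk(True, morning=False)  # observed: complex case, afternoon
    whatif = u < risk(True, morning=True)    # counterfactual: same patient, morning
    complicated += factual
    helped += factual and not whatif

share = helped / complicated
print(f"afternoon complications avoidable by morning scheduling: {share:.2%}")
```

In practice the structural model is unknown and must be estimated from data under assumptions, which is exactly what makes counterfactual claims harder (and more valuable) than plain predictions.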
Real-world impact
Let’s revisit our surgery timing example through a Causal ML lens:
Distinguishing between correlation and causation
- Traditional ML: The conclusion was that morning surgeries have worse outcomes.
- Causal ML: We learned that complex cases are scheduled in the morning, and that complexity is a strong driver of outcomes. Time-of-day is also a driver of outcomes, but in the opposite direction to the conclusion reached by traditional ML: earlier means fresher surgeons, which means fewer complications than if the same cases were done in the afternoon.
- Action Impact: Simply moving surgeries to the afternoons would actually result in worse outcomes for patients, as the complexity wouldn’t change, but the team freshness would be worse. Instead, hospitals can focus on factors that they can control, like ICU availability and on factors that will influence the outcomes, like team freshness.
What-if scenarios / counterfactuals
- Traditional ML: Surgeries were moved to the afternoon, and the complication rate did not improve.
- Causal ML: We simulated the effect of changing the time-of-day for different complexity patients and saw how the outcomes changed.
- Action Impact: Data-driven schedule improvements to minimise bad outcomes.
More generalisable models
- Traditional ML: Developed a model with all the inputs and hoped it would generalise.
- Causal ML: Identified the confounders, adjusted for them, and developed a less biased model that could generalise to changing conditions.
- Action Impact: The model works well for a wider range of changes (e.g. scheduling choices).
‘Preventable’ risk
- Traditional ML: Got a risk prediction and applied an intervention to patients whose condition was so bad that treatment was not effective, e.g. elderly patients at risk because of their age.
- Causal ML: Combined causal effect and risk to give preventable risk, identifying patients who will benefit the most from the treatment.
- Action Impact: Resources (time and money) translate to outcomes.
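The difference between the two rankings fits in a few lines. In the sketch below the patients and numbers are hypothetical: each patient has a predicted risk and an estimated causal effect of the intervention (the expected reduction in complication probability). Ranking by risk and ranking by preventable risk pick different patients.

```python
# Hypothetical patients: (description, predicted_risk, causal_effect).
# causal_effect = expected reduction in complication probability if treated.
patients = [
    ("elderly, frail",       0.60, 0.02),  # highest risk, little is modifiable
    ("mid-risk, modifiable", 0.30, 0.15),  # treatment makes a real difference
    ("low risk",             0.05, 0.01),
]

# Traditional ML: prioritise by raw risk.
by_risk = max(patients, key=lambda p: p[1])[0]

# Causal ML: prioritise by preventable risk, i.e. the effect of intervening.
by_preventable = max(patients, key=lambda p: p[2])[0]

print(by_risk)         # 'elderly, frail'
print(by_preventable)  # 'mid-risk, modifiable'
```

Treating the frail patient buys an expected 2-point risk reduction; treating the mid-risk patient buys 15 points. With limited resources, the causal ranking delivers far more prevented complications per intervention.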
The future of healthcare decision-making
While traditional ML has improved healthcare prediction, Causal ML could take us to the next level – from merely predicting outcomes to understanding and influencing them more effectively. This isn’t just about better predictions; it’s about empowering healthcare providers to make more informed, effective decisions that truly improve patient care.
————————
1. Discovering the correct causal graph is difficult, as there are unknowns and hidden assumptions. So it’s good practice to derive multiple causal graphs and regard them as hypotheses to be challenged and refined.
2. There are other important variable types, such as mediators (the ‘freshness’ in this example) and colliders. Colliders are affected by both the treatment and the outcome, and should not be controlled.