TL;DR Large Language Models (LLMs) are used as agents to drive pre-surgery questionnaires

Pre-surgery questionnaires – why do we need them?

Pre-surgery questionnaires are used to collect detailed health histories before patients undergo surgical procedures. They help to identify potential risks, such as allergies, underlying conditions, or previous reactions to anaesthesia. Historically, questionnaires were completed on paper, but with advancements in technology, some are now delivered in electronic formats. Even though many health systems now use Electronic Health Records (EHR), there are often significant information gaps related to patient history, which need to be filled in before surgery, so questionnaires are still required.

Many questions, shallow information

Pre-surgery questionnaires often require a large number of questions to cover a wide range of potential clinical risks across various organs and symptoms, which can lead to patient fatigue and incomplete responses. Some questionnaires include dozens of questions yet collect only high-level information; they cannot drill down into issues indicated by the patient’s answers.

Electronic questionnaires can improve the process by using rules-based question selection, such as triggering follow-up questions based on a patient’s response. For example, a ‘rule’ could be that if a patient indicates that they have sleep apnea, a follow-up question asks what CPAP treatment has been prescribed. However, this approach is only effective for simple scenarios, and a large number of questions is still needed to cover all potential risks.
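As an illustration, the sleep apnea rule above can be expressed as a simple lookup table. The sketch below is illustrative only; the rule table and question wording are hypothetical, not taken from a real questionnaire:

```python
# Minimal sketch of rules-based follow-up selection.
# The rule table and question wording are hypothetical examples.
FOLLOW_UP_RULES = {
    # (question_id, answer) -> follow-up question text
    ("has_sleep_apnea", "yes"): "What CPAP treatment has been prescribed?",
    ("has_allergies", "yes"): "Please list your known allergies.",
}

def follow_ups(answers: dict) -> list:
    """Return the follow-up questions triggered by a patient's answers."""
    return [question for (qid, value), question in FOLLOW_UP_RULES.items()
            if answers.get(qid) == value]
```

Every new risk area needs another hand-written rule, which is why rules-based questionnaires grow quickly and still cannot adapt beyond what their authors anticipated.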

Personalised questionnaires – more information, fewer questions

Every patient is different; they have different histories, underlying conditions, and treatment plans. In line with the evolution of personalised medicine, this article describes research that aims to personalise questionnaires by intelligently determining which questions are essential for each patient. The system will recognize when follow-up questions are needed for more detailed information and when sufficient data have been gathered, ensuring a balance between thoroughness and efficiency.

The conceptual chart below compares a standard questionnaire, where the number of questions is unrelated to the patient, with personalised questionnaires, where the number of questions in each clinical area may be different, due to the different conditions of each patient.

Illustration: conceptual difference between a standard questionnaire and personalised questions to match health profile

Limitations of LLMs

While Large Language Models (LLMs) hold great promise for enhancing the efficiency of pre-surgery questionnaires, they also come with notable limitations.

One significant limitation is their ‘single-mindedness’ – LLMs are inherently designed as single streams of token completion. This can be thought of as similar to ‘single-threaded’ applications – i.e. they can work on one thing at a time. It can be challenging for LLMs to multitask or take on different roles simultaneously. For instance, in our experiments, we found that while an LLM could pick out appropriate questions from a list, it often failed to do so effectively when it needed to also consider which clinical areas had been sufficiently covered and which required further inquiry. This inability to balance question selection with gap evaluation limited the model’s utility in creating an adaptive, context-aware questionnaire.

Using agentic flow

LLM agents, where multiple LLMs work together in a coordinated manner, offer a promising way to implement complex flows, where different agents can take on specialised roles. The diagram below illustrates the agents that we implemented: a ‘clinical agent’, responsible for understanding and evaluating the patient’s medical history, and a ‘question selector’ that focuses on choosing the most relevant questions from a given list of questions. This approach allows for more effective multitasking and ensures that the process is both thorough and efficient.
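The coordination between the two agents can be sketched as a simple loop. In the sketch below both agents are stubbed with plain functions so the flow is easy to follow; in practice each role would wrap an LLM call with its own prompt, and the interfaces shown are illustrative assumptions:

```python
# Sketch of a two-agent questionnaire loop (hypothetical interfaces;
# in a real system each agent wraps an LLM call with a role prompt).

def clinical_agent(history: list) -> list:
    """Evaluate the conversation so far and return clinical areas that
    still need coverage (stubbed here; an LLM would do this)."""
    covered = {area for area, _ in history}
    all_areas = ["cardiac", "respiratory", "medications"]
    return [a for a in all_areas if a not in covered]

def question_selector(area: str, question_bank: dict) -> str:
    """Pick the most relevant predetermined question for an area
    (stubbed; an LLM would rank the candidates)."""
    return question_bank[area][0]

def run_questionnaire(question_bank: dict, ask) -> list:
    """Loop until the clinical agent reports no remaining gaps."""
    history = []
    while (gaps := clinical_agent(history)):
        area = gaps[0]
        question = question_selector(area, question_bank)
        history.append((area, ask(question)))
    return history
```

Keeping the two roles separate means each prompt stays focused: one agent reasons about coverage, the other only ranks candidate questions.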

Illustration: LLM agents interact with patient to deliver personalised questionnaire

Leveraging LLM knowledge with guardrails

Large language models (LLMs), trained on extensive datasets, including medical literature, hold valuable knowledge that can greatly assist in analysing patient responses and generating relevant follow-up questions. For instance, we found that some LLMs have significant knowledge of medications and can, when provided with a list of a patient’s medications, infer underlying health conditions. This capability can be enhanced by integrating external tools, such as querying medication databases, to provide additional context. Incorporating the data into the prompt can enable the agent to perform more accurate analyses. This approach removes the need for complex, rules-based logic, streamlining application development and reducing complexity. However, it relies on a critical assumption: that LLMs correctly interpret and apply the data. This is not trivial — LLMs require well-defined guardrails to operate safely within clinical settings.
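As a sketch, the medication lookup might feed the agent’s prompt like this. The mini-database and indication text below are hypothetical, purely for illustration:

```python
# Sketch: enrich the agent's prompt context with medication lookups.
# The database contents here are hypothetical examples.
MED_DB = {
    "metformin": "commonly prescribed for type 2 diabetes",
    "salbutamol": "commonly prescribed for asthma",
}

def build_context(patient_meds: list) -> str:
    """Turn medication lookups into prompt context for the clinical agent."""
    lines = [f"- {m}: {MED_DB.get(m, 'no database entry found')}"
             for m in patient_meds]
    return "Patient medications and known indications:\n" + "\n".join(lines)
```

Grounding the agent in looked-up data, rather than relying solely on the model’s internal knowledge, makes its inferences easier to audit.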

There are various ways to implement guardrails. In the case of questionnaires, one method that we implemented was to allow agents to select questions from a list of predetermined questions, but not create new ones. This ensures that the wording of questions is exactly as defined by clinicians, which is important because small wording changes, which may be grammatically correct, may result in different clinical meaning. 

For example, suppose a clinician has carefully crafted a question like, “Do you experience chest pain during physical activity?” If the agent were allowed to create new questions, it might phrase the question as, “Do you feel discomfort when you exercise?” While both questions are grammatically correct, the rewording introduces ambiguity. “Discomfort” is broader than “chest pain” and might lead patients to report symptoms unrelated to cardiac issues, potentially altering the clinical relevance of the answer. Limiting agents to select from predetermined questions ensures the clinical meaning remains consistent and precise, avoiding misinterpretation.
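A guardrail of this kind can be enforced in code, outside the model: the agent returns question identifiers, and anything not in the clinician-approved bank is rejected before it can reach the patient. A minimal sketch, with an illustrative question bank:

```python
# Sketch of a selection guardrail: the agent may only return IDs from
# the clinician-approved question bank (the bank shown is illustrative).
APPROVED = {
    "q1": "Do you experience chest pain during physical activity?",
    "q2": "Do you have any known allergies?",
}

def enforce_guardrail(selected_ids: list) -> list:
    """Map agent-selected IDs to approved wording; raise on unknown IDs
    so a free-text rewrite can never reach the patient."""
    unknown = [i for i in selected_ids if i not in APPROVED]
    if unknown:
        raise ValueError(f"Agent selected unapproved questions: {unknown}")
    return [APPROVED[i] for i in selected_ids]
```

Because the approved wording lives outside the prompt, a hallucinated or reworded question fails loudly instead of silently changing clinical meaning.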

Following a semi-structured process

An LLM serving as clinical agent can be prompted to operate in phases; for example, it can start by asking a set of specified questions and then, based on the responses, explore additional areas if necessary. This phased approach ensures that basic information is gathered while still allowing the flexibility to dive deeper when needed.

Business rules can be injected into the prompt to further guide the process. For example, the agent could be instructed to avoid certain questions, avoid repeating questions, or limit the length of the questionnaire.
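A sketch of how such rules might be rendered into the agent’s prompt. The rule wording below is illustrative, not a production prompt:

```python
# Sketch: business rules rendered into the agent's system prompt.
# The rule wording is an illustrative assumption.
def build_system_prompt(asked: list, max_questions: int) -> str:
    """Compose a system prompt that encodes the business rules."""
    rules = [
        "Only select questions from the approved list provided.",
        f"Do not repeat questions already asked: {asked or 'none'}.",
        f"Stop once {max_questions} questions have been asked in total.",
    ]
    return ("You are a pre-surgery clinical agent.\n"
            + "\n".join(f"- {r}" for r in rules))
```

Keeping the rules in one templated block makes them easy for clinicians to review and change without touching application logic.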

Techniques such as Retrieval Augmented Generation (RAG) can be used to guide the exploration by including relevant content such as hospital guidelines or research literature. For example, if the hospital has a policy of asking about the patient’s ability to look after themselves if a particular medication is prescribed, the clinical agent could take that into consideration when determining what to ask.
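A toy sketch of the retrieval step – keyword matching over a couple of invented guideline snippets. A real deployment would use embedding-based retrieval over the hospital’s actual documents:

```python
# Toy retrieval step: keyword-match guideline snippets to include in the
# agent's context. The guideline text is hypothetical; production systems
# would use embedding-based retrieval over real hospital documents.
GUIDELINES = [
    "If an anticoagulant is prescribed, ask about recent bleeding events.",
    "If opioids are prescribed, ask about the patient's ability to look after themselves.",
]

def retrieve(query: str) -> list:
    """Return guideline snippets containing any term from the query."""
    terms = query.lower().split()
    return [g for g in GUIDELINES if any(t in g.lower() for t in terms)]
```

The retrieved snippets are then appended to the clinical agent’s prompt, so hospital policy shapes the questions without being hard-coded.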

Understanding ‘performance’

Evaluating agentic questionnaires should focus on ensuring they collect essential clinical information. The key objectives are: 

  • asking important, relevant questions (such as those in existing questionnaires)
  • avoiding irrelevant questions to save time, and 
  • introducing new questions based on patient responses to capture additional insights. 

Measuring performance is challenging since these questionnaires deviate from traditional practices, but it can be tested by comparing the generated questions and answers with known patient records. Key metrics include:

  • compliance (% of essential questions asked)
  • coverage (% of health issues uncovered)
  • waste (% of irrelevant questions asked)
  • efficiency (number of questions asked compared to standard questionnaires).
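Given a known patient record, these metrics reduce to straightforward set arithmetic. A sketch, with illustrative field names:

```python
# Sketch: the four evaluation metrics computed against a known patient
# record. Field names are illustrative assumptions.
def metrics(asked: set, essential: set, relevant: set,
            known_issues: set, uncovered: set,
            standard_length: int) -> dict:
    """Compute compliance, coverage, waste (as percentages) and
    efficiency (ratio vs. the standard questionnaire length)."""
    return {
        "compliance": 100 * len(asked & essential) / len(essential),
        "coverage":   100 * len(uncovered & known_issues) / len(known_issues),
        "waste":      100 * len(asked - relevant) / len(asked),
        "efficiency": len(asked) / standard_length,
    }
```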

Implementing within a clinical application

We implemented this functionality in our existing perioperative application, Patient Optimizer (POP), allowing us to leverage a familiar user interface for clinicians. This enabled smooth reviews and quick feedback, streamlining the development process. Clinical feedback helped refine the logic, question relevance, and overall user experience. The iterative design allowed us to optimise the system for real-world use; for example, differentiating between clinical language for clinicians and simplified summaries for patients.

From the patient’s perspective, the use of AI agents is seamless – they simply respond to the questions presented during each stage of the questionnaire. However, we found it valuable to display the agent’s output to clinicians. This provided insight into the reasoning behind question selection and helped to identify opportunities for improvement.

Opportunity to collaborate

This article demonstrates a real-world application of agentic workflows in a clinical setting and their potential to improve and streamline processes. This approach can be useful in many other scenarios where process efficiency and flexibility are important. 

Please comment to share your experience using agentic workflows, or reach out if you would like to explore collaboration opportunities.