Using Language Models (AI) In Medical Devices Responsibly
Generative language models, colloquially known as “AI”, have been making waves across many different sectors in a wide range of roles, ranging from customer support chatbots to programming assistants, and even to a new wave of web search tools.
The invasion of so many (large language model) LLM-based products has also sparked furious debate about their reliability, particularly in high-risk applications. In the medical field, potentially gigantic opportunity is balanced against life and death stakes. This new technology must be used responsibly in a way that can enhance the skills of the clinicians or improve the patient experience without exposing either to increased risk.
For high-risk systems, a good rule of thumb is that no generated text should be user-facing. This is important for several reasons. Firstly, LLMs are statistical text generators, which have no sense of statement accuracy or nuance. You cannot depend on an LLM to have factual output, and while you can massage the statistics towards the right direction, this is simply not enough for critical tasks that require more control and accuracy.
Second, LLMs are susceptible to prompt injection, a technique that involves tuning the input one provides to a model in order to modify its behaviour. This exploit methodology can bypass safety training, or even constraints placed on its actions.
Finally, these issues could expose the product owner to legal liability, as courts have been holding companies accountable for promises made by language model-based agents ostensibly acting on their behalf.
Thor Tronrud is a research and data analysis-focused software engineer at StarFish Medical who specializes in the development and application of machine learning tools. Previously an astrophysicist working with magnetohydrodynamic simulations, Thor joined StarFish in 2021 and has applied machine learning techniques to problems including image segmentation, signal analysis, and language processing.