
4 Healthcare AI Large Language Models (LLMs) to Watch

Emerging AI and machine learning technologies are entering many facets of healthcare delivery and administration. At Keywell, we’re keeping an eye on emerging technologies that would benefit the mission and operations of our clients. Large language models such as ChatGPT hold promise for direct clinical applications (for example, diagnosing complex conditions). However, their more significant near-term impact is likely to come from helping to solve industry pain points such as staffing shortages and the rising cost and complexity of healthcare administration.

An interesting dimension of large language models is that the industry-leading models are expensive and resource-intensive to build (training can cost millions of dollars or more). These foundational LLMs serve as basic infrastructure hubs, like AI train stations for the masses. Fine-tuned models are then the rail operators: they build on the foundational LLMs and add customization and industry-focused content. Fine-tuning is far cheaper than training a model from scratch, which makes it feasible to deploy powerful technologies cost-effectively. Here are a few healthcare LLM efforts under construction that we’re keeping an eye on:



Google’s Med-PaLM 2

Med-PaLM 2 is designed to answer medical questions. According to Google, it reached “expert” test-taker level on USMLE-style questions from the MedQA dataset, with an accuracy of 85.4%; its predecessor, Med-PaLM, was the first AI system to obtain a passing score on these questions. Med-PaLM 2 is built on PaLM 2, Google’s general-purpose LLM (the original Med-PaLM was built on the 540B-parameter PaLM).




OpenAI GPT-4

OpenAI’s GPT-4 is a widely known LLM (best known through ChatGPT); OpenAI has not disclosed its parameter count. OpenAI has not released a healthcare-tuned version, but it led the first major integration of LLMs in healthcare through the Microsoft Nuance partnership. Epic (one of the largest electronic health record (EHR) vendors) is embedding the technology in its messaging capabilities to provide pre-drafted message replies in the EHR.



John Snow Labs Clinical QA BioGPT (JSL)

John Snow Labs has long been a leader in natural language processing (NLP) tools and algorithms for healthcare use cases. In addition to data labeling and extraction, it offers tools to de-identify clinical notes and other healthcare data.

JSL recently announced an LLM based on BioGPT (an older, smaller LLM trained on biomedical literature) and fine-tuned using JSL’s data and NLP tools. The model may outperform ChatGPT in areas such as patient de-identification, entity resolution (for example, extracting procedure codes and healthcare terminology), and clinical summarization accuracy.



PMC-LLaMA

While not yet a commercial offering, PMC-LLaMA is an academic project based on a fine-tuned version of Meta’s powerful LLaMA 13B model, one of the major LLM “hubs.” The model was trained on 4.8 million biomedical academic papers and demonstrated improved performance over the standard LLaMA model. (Who was the judge? GPT-4!)


All of these models involve varying degrees of commercial licensing, which shouldn’t come as a surprise for state-of-the-art models. A number of emerging open-source or free models are worth watching (e.g., BioGPT and its iterations), but these are often based on much smaller models that lack the sophisticated conversational capabilities of larger LLMs. Conversational AI tools are not a fit for every healthcare use case, but they are rapidly being applied to common healthcare pain points such as clinical decision support, patient education, and administrative documentation.