Google Research and Google DeepMind collaborated to develop a large language model (LLM) with conversational abilities, tailored for differential diagnosis (DDx) in complex medical cases. Built on Med-PaLM 2, Google’s generative AI technology for medicine, the specialized LLM was fine-tuned on medical data and showed substantial performance improvements. In a study in which 20 clinicians evaluated 302 challenging medical cases, the LLM significantly outperformed unassisted clinicians, achieving 59.1% accuracy in DDx compared to 33.6%. While acknowledging limitations, researchers highlighted the LLM’s potential to enhance diagnostic accuracy and called for further real-world evaluation of its ability to empower physicians and expand patient access to expert insights. However, caution is advised: the LLM’s performance might differ in routine clinical cases, and it tended to draw conclusions from isolated symptoms rather than analyze a case holistically.
A study done by Google Research in collaboration with Google DeepMind reveals the tech giant developed an LLM with conversational and collaborative capabilities that can provide an accurate differential diagnosis (DDx) and help improve clinicians’ diagnostic reasoning and accuracy in diagnosing complex medical conditions.
The LLM for DDx builds upon Med-PaLM 2, the company’s generative AI technology that utilizes Google’s LLMs to answer medical questions.
The DDx-focused LLM was fine-tuned on medical domain data with substantial performance improvements and included an interface that allowed its use as an interactive clinician assistant.
In the study, 20 clinicians evaluated 302 challenging, real-world medical cases from The New England Journal of Medicine.
Each case was read by two clinicians, who were randomly assigned either standard assistance methods, such as search engines and traditional medical resources, or standard assistance methods in addition to Google’s LLM for DDx. All clinicians provided a baseline DDx before being given the assistive tools.
Upon conclusion of the study, researchers found that the performance of the LLM for DDx exceeded that of unassisted clinicians, with 59.1% accuracy compared to 33.6%.
Additionally, clinicians assisted by the LLM produced more comprehensive lists of differential diagnoses and reached 51.7% accuracy, compared with 36.1% for unassisted clinicians and 44.4% for clinicians using search.
“Our study suggests that our LLM for DDx has the potential to improve clinicians’ diagnostic reasoning and accuracy in challenging cases, meriting further real-world evaluation for its ability to empower physicians and widen patients’ access to specialist-level expertise,” researchers noted.
THE LARGER TREND
Researchers reported limitations with the study. Clinicians were provided a redacted case report with access to the case presentation and associated figures and tables. The LLM was only given access to the main body of the text of each case report.
Researchers noted the LLM outperformed clinicians despite this limitation. It is unknown how much the accuracy gap would widen if the LLM were given access to the tables and figures.
Additionally, the format in which case information was given to the LLM in the study would differ from how a clinician would enter case information in practice.
“For example, while the case reports are created as ‘puzzles’ with enough clues that should enable a specialist to reason towards the final diagnosis, it would be challenging to create such a concise, complete, and coherent case report at the beginning of a real clinical encounter,” researchers wrote.
The cases were also selected as challenging conditions to diagnose. Therefore, evaluators noted the results do not suggest clinicians should leverage the LLM for DDx for typical cases seen in daily practice.
The LLM was also found to draw conclusions from isolated symptoms rather than considering the whole case holistically, with one clinician noting the LLM was more beneficial for simpler cases with specific keywords or pathognomonic signs.
“Generating a DDx is a critical step in clinical case management, and the capabilities of LLMs present new opportunities for assistive tooling to help with this task. Our randomized study showed that the LLM for DDx was a helpful AI tool for DDx generation for generalist clinicians. Clinician participants indicated utility for learning and education, and additional work is needed to understand suitability for clinical settings,” the researchers concluded.
Source: Mobihealthnews







