Scientists from Weill Cornell Medicine and the Dana-Farber Cancer Institute in Boston have developed and tested new artificial intelligence (AI) tools tailored to digital pathology – a rapidly growing field that uses high-resolution digital images created from tissue samples to help diagnose disease and guide treatment.
Their paper, published July 9 in The Lancet Digital Health, demonstrates that ChatGPT, an AI language model developed to understand and generate text, can be tailored to provide accurate responses to questions about digital pathology and compile detailed results. The authors also found that ChatGPT can help pathologists without extensive coding experience use complex software that analyzes tissue samples, helping bridge the gap between pathology and digital pathology skills.
ChatGPT is a large language model (LLM), meaning it is trained on extensive amounts of data to generate text on a wide range of topics.
“LLMs are good for general tasks, but they aren’t the best tools for getting useful information for specialized fields,” said the study’s lead author Dr. Mohamed Omar, assistant professor of research in pathology and laboratory medicine and a member of the Division of Computational and Systems Pathology at Weill Cornell Medicine. Dr. Luigi Marchionni, associate professor in pathology and laboratory medicine and head of the Division of Computational and Systems Pathology, is also a co-author on this study.
To create AI tools that could increase the efficiency and accuracy needed for the nuanced decision-making necessary in digital pathology, corresponding author Dr. Renato Umeton, director of Artificial Intelligence Operations and Data Science Services in the Informatics & Analytics Department at Dana-Farber Cancer Institute, spearheaded the effort to customize ChatGPT.
“General LLMs have two major problems. First, they often provide lengthy generic responses that don’t contain useful information,” Omar said. “Second, these models can hallucinate and make things up out of nowhere, including literature citations. This is especially bad in specialized fields like digital pathology and cancer biology, for example.”
To address these problems, Umeton started with a safe, private and secure ChatGPT version implemented at Dana-Farber Cancer Institute (GPT4DFCI). The researchers augmented GPT4DFCI with access to a comprehensive and curated database of the latest developments in digital pathology, consisting of 650 publications from 2022 onward, which added over 10,000 pages of literature.
“We could ask this new system to catch us up on many specific topics or techniques in digital pathology and get results in seconds, with a level of detail, depth and summarization that doesn’t exist in current scientific literature tools or search engines,” Umeton said.
The researchers used retrieval-augmented generation (RAG), a technique that enabled GPT4DFCI to retrieve relevant documents from this specialized database and generate accurate responses to user prompts about digital pathology, but not about topics outside that realm.
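For readers curious about what the retrieval step looks like in practice, here is a minimal sketch of the general RAG pattern. It is not the team's implementation: the three-entry corpus, the document IDs, the word-overlap scoring and the 0.15 relevance threshold are all assumptions made for illustration, and a production system would typically use learned embeddings and pass the assembled prompt to a language model.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for a curated literature database (not real publications).
CORPUS = {
    "doc-001": "Stain normalization reduces color variation across whole slide images.",
    "doc-002": "Multiple instance learning aggregates tile features for slide level classification.",
    "doc-003": "Nucleus segmentation models detect and outline cell nuclei in tissue images.",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question, corpus=CORPUS, top_k=2, min_score=0.15):
    """Return the IDs of the most similar documents, best match first."""
    query = Counter(tokenize(question))
    scored = [
        (cosine(query, Counter(tokenize(text))), doc_id)
        for doc_id, text in corpus.items()
    ]
    # Drop weak matches so off-topic questions retrieve nothing.
    relevant = sorted((s, d) for s, d in scored if s >= min_score)
    relevant.reverse()
    return [doc_id for _, doc_id in relevant[:top_k]]


def build_prompt(question, corpus=CORPUS):
    """Assemble a prompt that restricts the model to the retrieved passages.

    Returns None when nothing relevant is found, so the caller can decline
    to answer instead of letting the model guess.
    """
    doc_ids = retrieve(question, corpus)
    if not doc_ids:
        return None
    context = "\n".join(f"[{doc_id}] {corpus[doc_id]}" for doc_id in doc_ids)
    return (
        "Answer using only the sources below and cite their IDs.\n"
        f"{context}\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    print(build_prompt("How does stain normalization work for whole slide images?"))
    print(build_prompt("What is the capital of France?"))
```

In this sketch, a question about stain normalization pulls in the matching passage, while an unrelated question retrieves nothing and yields no prompt at all, which mirrors the idea of confining answers to the specialized database.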
Omar and his colleagues compared the responses from GPT4DFCI to those provided by the more general GPT-4 model. By requiring GPT4DFCI to provide links to the specific publications it used to generate responses, they determined that the answers were accurate and grounded. The refined model provided more precise and relevant answers than GPT-4 and did not hallucinate even once.
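One simple way to automate part of such a grounding check is to confirm that every source an answer cites was actually among the documents retrieved for it. The sketch below illustrates that idea only; the bracketed "doc-NNN" citation format and the function names are hypothetical, and the article does not describe how the researchers carried out their evaluation.

```python
import re


def cited_ids(answer):
    """Extract bracketed source IDs such as [doc-001] from an answer."""
    return set(re.findall(r"\[(doc-\d+)\]", answer))


def is_grounded(answer, retrieved_ids):
    """True only if the answer cites at least one source and every cited
    source was among the retrieved documents."""
    ids = cited_ids(answer)
    return bool(ids) and ids <= set(retrieved_ids)


if __name__ == "__main__":
    retrieved = ["doc-001", "doc-002"]
    print(is_grounded("Normalization reduces color variation [doc-001].", retrieved))
    print(is_grounded("A made-up claim [doc-999].", retrieved))
```

A check like this catches invented citations, but it cannot confirm that a cited passage really supports the claim, so human review of the answers remains necessary.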
“My hope is that this will be a catalyst for more domain-specific tools in other fields of medicine or medical research,” Omar said.
The second AI program the team developed helps pathologists use PathML, a specialized software library that requires familiarity with the programming language Python for analyzing vast and complex histopathology image datasets. “Pathologists or scientists without prior coding experience often find PathML very challenging to use for image analysis tasks,” Omar said.
The researchers integrated PathML with ChatGPT, allowing users to simply type in their questions about running histopathology analyses with PathML. The tool then provides step-by-step, accurate instructions on coding for their samples.
“Our research shows that, when combined with the proper information retrieval techniques, ChatGPT and safeguarded AI tools, like GPT4DFCI, can be extremely effective in supporting basic researchers,” Umeton said. “These tools are helpful even across very complex topics that need extremely precise answers, like digital pathology.”
This work was supported by the National Cancer Institute.
Heather Lindsey is a freelance writer for Weill Cornell Medicine.