- Large language models typically lack the ability to integrate multimodal clinical data, limiting their utility in complex diagnostic scenarios.
- Researchers conducted a randomized, blinded exploratory study involving 105 simulated telehealth consultations to evaluate a multimodal AI system.
- The multimodal AI system demonstrated superior performance on 29 of 32 evaluation axes, including seven of nine multimodal reasoning metrics.
- The authors concluded that state-aware reasoning effectively bridges text and visual information, augmenting clinicians in multimodal diagnostic settings.
- This suggests AI systems could enhance diagnostic accuracy and conversation quality in complex, real-world clinical practice.
The Multimodal Imperative in Diagnostic AI
Clinical diagnosis requires physicians to synthesize patient narratives with diverse data, including imaging, lab reports, and other documents [1, 2]. While artificial intelligence has advanced in specific medical tasks, many systems are limited to a single modality, such as imaging alone or text alone [3, 4, 5]. This unimodal focus does not reflect the complexity of clinical encounters, particularly in telehealth, where integrating various information streams is essential for accurate assessment [6, 7, 8, 9, 10]. A significant gap remains: AI that can conduct a diagnostic conversation while simultaneously gathering and interpreting multiple forms of medical data, mirroring a real-world clinical workflow.
Addressing Multimodal Clinical Reality
To bridge the gap between text-only conversational AI and the multifaceted nature of clinical practice, researchers have developed a multimodal extension of the Articulate Medical Intelligence Explorer (AMIE). This updated system, called multimodal AMIE, is a large language model designed for diagnostic dialogue that can now request, interpret, and reason about varied medical data within a single conversational context. The system aims to emulate a clinician's comprehensive diagnostic process by integrating a patient's spoken history with visual inputs like dermatology photographs, data from electrocardiograms, and information from clinical documents.
State-Aware Reasoning: Emulating Clinical Thought
A key innovation powering this system is its ability to dynamically guide a clinical interview based on an evolving understanding of the patient's condition, a process the researchers term a state-aware dialogue framework. This computational approach allows the AI to mimic the structured reasoning of an experienced clinician, who continuously refines a differential diagnosis as new information becomes available. The framework tracks diagnostic uncertainty and adjusts its line of questioning to seek the most relevant information, whether from the patient's history or by requesting a specific piece of data like an ECG or a clinical note. This capacity for adaptive, context-sensitive inquiry is what enables multimodal AMIE to synthesize diverse data types effectively within a simulated telehealth environment.
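The loop described above — maintain a differential diagnosis, update it as evidence arrives, and pick the next question or data request based on remaining uncertainty — can be sketched in simplified form. This is an illustrative toy model only, not the published AMIE implementation: the `DialogueState`, `update_state`, and `next_action` names, the Bayesian-style weighting, and the 0.9 confidence threshold are all assumptions made for the sketch.

```python
from dataclasses import dataclass, field

# Toy sketch of a state-aware dialogue loop (hypothetical; not the actual
# AMIE system). The agent keeps a running differential diagnosis, renormalizes
# it after each piece of evidence, and either concludes, requests an artifact
# (e.g. an ECG), or asks another history question.

@dataclass
class DialogueState:
    differential: dict[str, float] = field(default_factory=dict)  # diagnosis -> probability
    artifacts: list[str] = field(default_factory=list)            # data already collected, e.g. "ECG"

def update_state(state: DialogueState, likelihoods: dict[str, float]) -> DialogueState:
    """Bayesian-style update: weight each hypothesis by how well it explains
    the new evidence, then renormalize the differential."""
    for dx, prior in state.differential.items():
        state.differential[dx] = prior * likelihoods.get(dx, 1.0)
    total = sum(state.differential.values()) or 1.0
    for dx in state.differential:
        state.differential[dx] /= total
    return state

def next_action(state: DialogueState, threshold: float = 0.9) -> str:
    """Commit to a diagnosis once confident enough; otherwise seek the data
    most relevant to the leading hypotheses."""
    top_dx, top_p = max(state.differential.items(), key=lambda kv: kv[1])
    if top_p >= threshold:
        return f"conclude: {top_dx}"
    # Illustrative rule: if a cardiac cause is still plausible and no ECG
    # has been collected, request one before asking further questions.
    if "arrhythmia" in state.differential and "ECG" not in state.artifacts:
        return "request: ECG"
    return "ask: follow-up question about symptoms"
```

A real system would replace the hand-written rules with model-driven uncertainty estimates, but the structure — state, update, action selection — is the same adaptive inquiry pattern the framework describes.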
Rigorous Evaluation in Simulated Telehealth
The system's capabilities were tested in a randomized, blinded exploratory study that placed multimodal AMIE in a head-to-head comparison with practicing primary care physicians (PCPs). The study involved 105 simulated telehealth consultations, each containing a mix of data types including dermatology photographs, electrocardiograms, and clinical documents, to reflect real-world diagnostic challenges. To ensure an objective and clinically meaningful assessment, a panel of 18 independent specialist physicians evaluated the performance of both the AI and the PCPs across a wide range of metrics. The blinded design meant these specialists did not know if they were reviewing a transcript from a human physician or the AI, minimizing potential bias.
Multimodal AMIE's Performance Metrics
In the evaluation by specialist physicians, multimodal AMIE demonstrated higher diagnostic accuracy than the primary care physicians. The AI's performance was also assessed as superior in overall conversation quality, which included the thoroughness of its history-taking and its ability to convey empathy. Across the comprehensive evaluation, multimodal AMIE's performance was rated higher than that of PCPs on 29 of 32 distinct axes. The system's specific design for integrating different data formats proved effective, as it also achieved higher performance on seven of nine metrics designed to assess multimodal reasoning, which is the ability to connect insights from visual data, text, and patient history.
Clinical Implications for AI Integration
These findings validate the effectiveness of a state-aware reasoning framework for bridging the gap between text-based conversation and the interpretation of visual medical data. The study demonstrates that an AI system can successfully mirror the complex cognitive task of integrating disparate information streams, a cornerstone of diagnostic medicine. For practicing clinicians, this suggests the potential for AI systems like multimodal AMIE to function as powerful adjunctive tools in complex diagnostic settings. Such technology could help synthesize patient data, organize differential diagnoses, and augment clinical decision-making, particularly in telehealth scenarios where access to information is varied. This may allow physicians to offload some cognitive burden associated with data integration, freeing them to focus on patient management and the uniquely human aspects of care.
References
1. Huang S, Pareek A, Seyyedi S, Banerjee I, Lungren MP. Fusion of medical imaging and electronic health records using deep learning: a systematic review and implementation guidelines. npj Digital Medicine. 2020. doi:10.1038/s41746-020-00341-z
2. Ismayilzada K. Artificial intelligence for acute appendicitis diagnosis: A systematic review of current evidence, challenges, and future directions. Medicine. 2026. doi:10.1097/MD.0000000000048094
3. Yu Z, Mulholland A, Huang T, Liu Q. Multimodal AI for Alzheimer Disease Diagnosis: Systematic Review of Datasets, Models, and Modalities. Journal of Medical Internet Research. 2026. doi:10.2196/85414
4. Gavade AB, Gavade PA, Nerli RB, Cooper DC, Sztandera L, Mehta U. Artificial intelligence in prostate cancer diagnosis: A systematic review of advances in gleason grade and PI-RADS classification. Imaging. 2025. doi:10.1556/1647.2025.00325
5. Almaabreh O, Al-Dafi R, Tabassum A, Othman A, Abd-Alrazaq A. The Performance of Artificial Intelligence in Classifying Molecular Markers in Adult-Type Gliomas Using Histopathological Images: Systematic Review. Journal of Medical Internet Research. 2026. doi:10.2196/78377
6. Khathami AA, Baklola M, Alalyani MA, et al. Multimodal artificial intelligence in retinal vascular and neovascular macular diseases: a systematic review of diagnostic and prognostic applications. BMC Ophthalmology. 2025. doi:10.1186/s12886-025-04561-3
7. Muglan JA. Diagnostic accuracy of multimodal AI frameworks vs. clinical assessment for early-stage Parkinson’s disease: A systematic review and meta-analysis. First Workshop on Insights from Negative Results in NLP. 2025. doi:10.71000/grqkfv91
8. Catino F, Castellana F, Zupo R, et al. Multimodal biomarker AI techniques for early neurocognitive disorder diagnosis: A systematic review. Artificial Intelligence in Medicine. 2026. doi:10.1016/j.artmed.2026.103389
9. Laranjo L, Dunn AG, Tong HL, et al. Conversational agents in healthcare: a systematic review. Journal of the American Medical Informatics Association. 2018. doi:10.1093/jamia/ocy072
10. Golinelli D, Boetto E, Carullo G, Nuzzolese AG, Landini MP, Fantini MP. Adoption of Digital Technologies in Health Care During the COVID-19 Pandemic: Systematic Review of Early Scientific Literature. Journal of Medical Internet Research. 2020. doi:10.2196/22280