For Doctors in a Hurry
- Diagnosing parkinsonism before death remains difficult because patients often present with overlapping symptoms and multiple underlying brain pathologies.
- Researchers trained machine learning models on medical records from 949 deceased patients to predict nine specific neuropathological diagnoses using early clinical symptoms.
- The CatBoost model achieved an area under the curve (AUC) of 0.83 across nine diagnostic categories at three years after symptom onset; key predictors included age at onset, restricted eye movement, and tremor.
- The authors concluded that the algorithm remains robust to incomplete data: with only 23 of 200 clinical parameters, it still achieved an AUC of 0.80.
- This tool provides a cost-effective screening method to estimate diagnostic probabilities, helping clinicians guide biomarker testing and select molecular-targeted therapies.
The Diagnostic Maze of Atypical Parkinsonism
Distinguishing idiopathic Parkinson's disease from atypical parkinsonian disorders, such as progressive supranuclear palsy, multiple system atrophy, and corticobasal syndrome, remains a persistent clinical challenge, particularly early in the disease course [1, 2]. Because these conditions often present with overlapping clinical features and heterogeneous underlying neuropathologies, accurate pre-mortem diagnosis frequently requires subspecialty expertise or advanced imaging modalities like fluorine-18 fluorodeoxyglucose positron emission tomography (18F-FDG PET) [2, 3]. While routinely collected healthcare data and longitudinal clinical history can help guide differential diagnosis, existing diagnostic criteria are complex and difficult to implement in general practice [2, 4]. As molecular-targeted therapies enter development, clinicians increasingly need accessible, cost-effective tools to predict a patient's true underlying pathology without relying solely on specialized tertiary centers. To address this, researchers developed a machine learning algorithm that uses age, sex, family history, and the chronological onset of 197 clinical symptoms to forecast post-mortem neuropathological diagnoses [5]. Evaluated in 949 brain bank donors, the model predicted nine distinct neuropathological categories with an area under the receiver operating characteristic curve (AUC) of 0.83 at three years after symptom onset, and it retained an AUC of 0.80 when restricted to only 23 of the 200 input parameters [5].
Mining Brain Bank Records with AI
Pre-mortem diagnosis of parkinsonism is often complicated by atypical presentations, overlapping syndromes, and mixed pathologies. To address this diagnostic hurdle, researchers developed a predictive algorithm based on the chronological appearance of clinical symptoms, so that the inputs follow the order in which signs emerge in the clinic. To build the dataset, clinical information was automatically abstracted from medical records of the Mayo Clinic Brain Bank using Generative Pre-trained Transformer 4 (GPT-4), a large language model capable of reading and structuring free-text clinical notes, which converted the charts into standardized data points. Among 7,825 donors, 949 met inclusion criteria for the final analysis. Specifically, the study included patients who developed parkinsonism within three years of disease onset, so the cohort represents individuals with early, documented motor symptoms. Six machine learning models were then trained on age, sex, family history, and 197 clinical presentations paired with onset information, for a total of 200 input parameters. The models were designed to predict neuropathologic diagnoses, including co-pathologies, directly from these chronological clinical inputs, with the aim of capturing the time-dependent symptom patterns that differentiate underlying brain pathologies before death.
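As a rough illustration of what "clinical presentations paired with onset information" could look like as model input, the sketch below turns one donor record into a fixed-length row. The source does not describe the actual encoding, so everything here is an assumption: the function name `encode_donor`, the three-item symptom list (the study used 197 items), and the choice to store onset time in years and leave undocumented symptoms as missing values.

```python
from typing import Dict, List, Optional

# Hypothetical symptom vocabulary for illustration only.
# The study used 197 clinical presentations, which are not listed in this summary.
SYMPTOMS = ["tremor", "restricted_eye_movement", "falls"]


def encode_donor(
    age_at_onset: float,
    sex: str,
    family_history: bool,
    onset_years: Dict[str, Optional[float]],
    horizon_years: float = 3.0,
) -> List[Optional[float]]:
    """Build one input row: [age, sex, family history, symptom_1, ...].

    Each symptom slot holds the onset time (years after disease onset) if the
    symptom was documented within the horizon, and None otherwise. None stands
    for "not documented by this time point" (an assumed convention).
    """
    row: List[Optional[float]] = [
        age_at_onset,
        1.0 if sex == "F" else 0.0,
        1.0 if family_history else 0.0,
    ]
    for name in SYMPTOMS:
        t = onset_years.get(name)
        row.append(t if t is not None and t <= horizon_years else None)
    return row


if __name__ == "__main__":
    # Tremor at 6 months is kept; falls at 4 years lies beyond the 3-year horizon.
    print(encode_donor(68, "M", False, {"tremor": 0.5, "falls": 4.0}))
```

With the full 197-item vocabulary, the same layout would give 3 + 197 = 200 slots per donor, which matches the parameter count quoted for the study.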
Mapping Nine Neuropathological Categories
The final cohort of 949 donors represented nine distinct neuropathologic categories, all confirmed post-mortem. To capture the mixed pathologies frequently seen in clinical practice, the researchers stratified these donors based on both primary and co-occurring proteinopathies. Within the synucleinopathy spectrum, the cohort included patients with Lewy body disease (LBD; n = 128) and LBD with co-occurring Alzheimer's disease (AD; n = 136). The largest single category was the tauopathy progressive supranuclear palsy (PSP; n = 303). The investigators also accounted for mixed presentations within this group, identifying cases of PSP with AD (n = 56) and PSP with LBD (n = 27). Beyond LBD and PSP, the analysis captured other neurodegenerative causes of parkinsonism that frequently complicate pre-mortem diagnosis, including multiple system atrophy (MSA; n = 120) and corticobasal degeneration (CBD; n = 99). Finally, the cohort encompassed patients whose primary post-mortem diagnoses are traditionally associated with cognitive or behavioral decline but who exhibited early parkinsonism, specifically AD (n = 43) and frontotemporal lobar degeneration (FTLD; n = 37). By incorporating these diverse and overlapping conditions, the dataset reflects the complex differential diagnoses physicians navigate when evaluating patients with early-stage motor symptoms.
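The group sizes above can be checked with a few lines of arithmetic. The sketch below only tallies the numbers reported in this summary; the dictionary labels and the helper `share` are illustrative names, not anything taken from the study.

```python
# Group sizes as reported in the text above.
cohort = {
    "LBD": 128,
    "LBD + AD": 136,
    "PSP": 303,
    "PSP + AD": 56,
    "PSP + LBD": 27,
    "MSA": 120,
    "CBD": 99,
    "AD": 43,
    "FTLD": 37,
}

total = sum(cohort.values())


def share(label: str) -> float:
    """Fraction of the whole cohort that falls in one category."""
    return cohort[label] / total


if __name__ == "__main__":
    print(f"categories: {len(cohort)}, total donors: {total}")
    # List categories from largest to smallest with their cohort share.
    for label, n in sorted(cohort.items(), key=lambda kv: -kv[1]):
        print(f"{label:10s} {n:4d} {share(label):6.1%}")
```

The nine groups sum to 949, matching the stated cohort size. The tally also shows how uneven the categories are: PSP alone makes up roughly 32% of donors, while PSP with LBD accounts for under 3%, which is worth keeping in mind when reading a single summary metric across all nine classes.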
Predictive Accuracy and Key Clinical Features
To evaluate diagnostic performance, the researchers utilized the CatBoost algorithm, a gradient-boosted decision tree method designed to handle categorical data such as symptom checklists. It achieved an area under the receiver operating characteristic curve (AUC) of 0.83 across the nine diagnostic categories at three years after onset. AUC is a statistical metric where 1.0 represents perfect prediction and 0.5 represents random chance, so a value of 0.83 indicates a strong ability to discriminate between complex, overlapping pathologies early in the disease course. Among the clinical features driving these predictions, the most important included age at onset, restricted eye movement, and tremor. For clinicians, these findings reinforce the diagnostic value of tracking specific motor and demographic milestones during early patient evaluations. Recognizing that real-world medical records frequently lack comprehensive documentation, the researchers also tested the algorithm's resilience to missing information. The model remained robust to incomplete data, requiring only 23 of the 200 parameters (197 clinical presentations plus age, sex, and family history) for reliable predictions: with those 23 parameters, it achieved an AUC of 0.80. This reduction in required inputs makes the tool practical for standard clinical settings, where exhaustive symptom checklists are rarely completed, and offers physicians an accessible method to estimate underlying neuropathology without requiring specialized biomarker testing.
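For readers who want a concrete sense of what the AUC measures, the sketch below computes it from scratch as the probability that a randomly chosen true case receives a higher score than a randomly chosen non-case, then averages that value over classes in a one-vs-rest fashion. This is a generic illustration, not the study's evaluation code: the summary does not say how the nine-class AUC was averaged, so the unweighted macro average is an assumption, and the function names and toy probabilities are invented for the example.

```python
from typing import List, Sequence


def binary_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


def macro_ovr_auc(
    true_classes: Sequence[str],
    prob_rows: Sequence[Sequence[float]],
    classes: Sequence[str],
) -> float:
    """Unweighted mean of one-vs-rest AUCs (assumed averaging scheme)."""
    aucs: List[float] = []
    for k, cls in enumerate(classes):
        labels = [1 if t == cls else 0 for t in true_classes]
        scores = [row[k] for row in prob_rows]
        aucs.append(binary_auc(labels, scores))
    return sum(aucs) / len(aucs)


if __name__ == "__main__":
    # Toy data: three donors, three classes, predicted probabilities per class.
    classes = ["PSP", "MSA", "LBD"]
    truth = ["PSP", "MSA", "LBD"]
    probs = [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.1, 0.2, 0.7]]
    print(macro_ovr_auc(truth, probs, classes))
```

In the toy data every true class receives the highest probability, so each one-vs-rest AUC is 1.0. Scores that carry no information, such as identical probabilities for every donor, give 0.5, which is the "random chance" end of the scale.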
To make these predictions usable during routine patient visits, the researchers implemented the algorithm in a user-friendly program that provides diagnostic probabilities with visualizations of parameter contributions. Rather than functioning as an opaque system, the software shows physicians how individual clinical signs influence the predicted pathology, which helps them weigh the algorithmic output against their own clinical judgment when evaluating complex motor symptoms. The authors present this neuropathology-confirmed diagnostic algorithm as a cost-effective and interpretable screening tool for parkinsonism. As the treatment landscape for neurodegenerative diseases evolves, identifying the underlying proteinopathy early in the disease course becomes increasingly important. In that setting, the algorithm is intended to bridge biomarker testing and molecular-targeted therapies: its probability estimates can help physicians decide which patients to refer for advanced diagnostics and which may be candidates for emerging disease-modifying treatments, without relying solely on expensive or invasive procedures.
References
1. Arienti F, Lazzeri G, Vizziello M, et al. Unravelling Genetic Factors Underlying Corticobasal Syndrome: A Systematic Review. Cells. 2021. doi:10.3390/cells10010171
2. Bruno MK, Dhall R, Duquette A, et al. A General Neurologist's Practical Diagnostic Algorithm for Atypical Parkinsonian Disorders: A Consensus Statement. Neurology: Clinical Practice. 2024. doi:10.1212/CPJ.0000000000200345
3. Zhao T, Wang B, Liang W, et al. Accuracy of 18F-FDG PET Imaging in Differentiating Parkinson's Disease from Atypical Parkinsonian Syndromes: A Systematic Review and Meta-Analysis. Academic Radiology. 2024. doi:10.1016/j.acra.2024.08.016
4. Harding Z, Wilkinson T, Stevenson A, et al. Identifying Parkinson’s disease and parkinsonism cases using routinely-collected healthcare data: a systematic review. bioRxiv. 2018. doi:10.1101/331652
5. Ono D, Sekiya H, Maier AR, Graff-Radford NR, Wszolek ZK, Dickson DW. Chronological Diagnostic Algorithm Predicting Neuropathology in Parkinsonism. Annals of Neurology. 2026. doi:10.1002/ana.78193