For Doctors in a Hurry
- Large global cardiovascular trials currently rely on labor-intensive physician clinical events committees to adjudicate major adverse cardiovascular events.
- The researchers validated an automated artificial intelligence system against human committee reviews using data from 5,661 patients in the PARADISE-MI trial.
- When it reported high confidence, the automated system agreed with human reviewers in 97 percent of deaths, 89 percent of myocardial infarctions, and 88 percent of strokes; across all events, agreement was 86, 76, and 84 percent, respectively.
- The authors concluded that AI-based adjudication produced a treatment-effect hazard ratio nearly identical to that of the human committee (0.91 vs 0.90).
- Automated adjudication could reduce clinical trial workloads by handling confident cases while reserving human review for more complex or uncertain events.
The Evolution of Endpoint Adjudication in Cardiovascular Trials
The validity of modern cardiovascular medicine rests on the rigorous reporting of randomized controlled trials, which serve as the gold standard for evaluating life-saving interventions [1]. Central to these trials is the accurate adjudication of major adverse cardiovascular events, a composite endpoint typically comprising cardiovascular death, myocardial infarction, and stroke [2, 3]. Although standardized definitions for these events and for associated complications such as bleeding have been established to improve consistency, the process remains heavily dependent on manual review by physician committees [2, 4]. This human-led adjudication underpins efficacy determinations for therapies ranging from intensive blood pressure control to lipid-lowering agents and antithrombotic regimens [5, 6, 7]. However, as trials grow more complex and more global, traditional oversight has become steadily more resource-intensive. A new study evaluates whether artificial intelligence can meet the standards of physician clinical events committees and thereby preserve the reliability of trial outcomes.
Developing the Auto-MACE Adjudication Framework
In global randomized trials, adjudication of major adverse cardiovascular events (MACE) provides the primary measure of treatment efficacy. Here, MACE was defined as cardiovascular death, nonfatal myocardial infarction (MI), and nonfatal stroke. The conventional standard for identifying these outcomes is medical record review by a physician clinical events committee (CEC). Although this process ensures clinical rigor, it is notoriously labor-intensive, requiring specialized clinicians to spend considerable time parsing extensive patient documentation. To ease this burden, researchers are investigating automated adjudication with artificial intelligence (AI) as a way to reduce costs and improve the reproducibility of adjudication across geographic sites.

The researchers developed a system named Auto-MACE to streamline this process. It uses an iteratively refined prompt for the OpenAI o1-mini large language model (a computational tool designed to process and interpret complex text) to adjudicate potential MACE from clinical data. To gauge the reliability of each automated decision, the system adds a Clinical Longformer model, a language model trained on clinical text that can handle long-form medical records. This second model assigns a confidence level to each adjudication, so that cases in which the AI determination may need human oversight can be flagged. By combining the two models, Auto-MACE aims to replicate the nuanced decision-making of a physician committee while retaining the speed and consistency of an automated platform.
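The two-stage design described above can be sketched in Python as follows. This is a minimal, hypothetical illustration, not the researchers' implementation: `propose_label` stands in for the prompted o1-mini model, `score_confidence` stands in for the Clinical Longformer, and the 0.8 threshold, the function names, and the toy keyword rules are all assumptions made so the example runs without any model or API access.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical cut-off: the study's actual confidence threshold is not given here.
CONFIDENCE_THRESHOLD = 0.8


@dataclass
class Adjudication:
    label: str          # label proposed by the language model, e.g. "MI" or "no MI"
    confidence: float   # score in [0, 1] assigned by the confidence model
    needs_review: bool  # True if a physician should re-adjudicate the event


def adjudicate(
    record: str,
    propose_label: Callable[[str], str],
    score_confidence: Callable[[str, str], float],
    threshold: float = CONFIDENCE_THRESHOLD,
) -> Adjudication:
    """Stage 1 proposes a label; stage 2 scores it; low scores are flagged."""
    label = propose_label(record)
    confidence = score_confidence(record, label)
    return Adjudication(label, confidence, needs_review=confidence < threshold)


# Toy stand-ins (hypothetical keyword rules) so the sketch runs offline.
def toy_llm(record: str) -> str:
    return "MI" if "troponin rise" in record else "no MI"


def toy_scorer(record: str, label: str) -> float:
    # This toy scorer ignores the proposed label; a real model would not.
    return 0.95 if "ST elevation" in record else 0.40


print(adjudicate("troponin rise with ST elevation", toy_llm, toy_scorer))
# Adjudication(label='MI', confidence=0.95, needs_review=False)
print(adjudicate("troponin rise after recent MI", toy_llm, toy_scorer))
# Adjudication(label='MI', confidence=0.4, needs_review=True)
```

Passing the models in as callables keeps the routing logic independent of any particular model or vendor API; a real pipeline would swap the toy functions for actual model calls and choose the threshold on validation data.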
Validation Against the PARADISE-MI Trial Cohort
To validate Auto-MACE, the researchers used data from the PARADISE-MI global clinical trial, which compared sacubitril/valsartan with ramipril in 5,661 patients with a recent myocardial infarction complicated by systolic dysfunction or pulmonary congestion. Because human experts had already adjudicated every potential cardiovascular event in this dataset, it offered a rigorous, real-world benchmark against which to test the AI system. Such a comparison is essential for judging whether automated systems can meet the standards required of phase 3 clinical trials.

The system first sorted events by the confidence assigned to each adjudication, separating cases in which the data were clear enough for a definitive determination from those that were not. In this validation cohort, Auto-MACE provided a confident adjudication for 315 of 455 deaths (69%), 301 of 659 potential myocardial infarctions (46%), and 136 of 167 potential strokes (81%). This variation suggests that the AI handles clear-cut deaths and cerebrovascular events well, whereas myocardial infarction in post-infarction patients, in whom biomarker and ECG changes may be confounded by the index event, remains harder to categorize without human oversight. When the system was confident, its adjudications closely matched those of the physician clinical events committee: Auto-MACE agreed with the committee in 97% of confident death adjudications, 89% of confident myocardial infarction adjudications, and 88% of confident stroke adjudications.
For the practicing clinician, these high agreement rates in the confident category suggest that AI could handle most straightforward adjudications, leaving human committees to focus their expertise on the complex or ambiguous cases that fall below the confidence threshold.
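The agreement figures are simple percent agreement between the automated and committee labels, computed either over all events or only over those the model adjudicated confidently. The sketch below shows that calculation on ten made-up myocardial infarction events; the labels, confidence flags, and function names are invented for illustration and do not come from PARADISE-MI.

```python
from typing import Sequence


def percent_agreement(ai: Sequence[str], cec: Sequence[str]) -> float:
    """Percentage of events where the automated label equals the committee label."""
    if len(ai) != len(cec) or len(ai) == 0:
        raise ValueError("need two non-empty label lists of equal length")
    return 100 * sum(a == c for a, c in zip(ai, cec)) / len(ai)


def confident_subset(ai, cec, confident):
    """Return (coverage %, agreement % among confident events)."""
    picked = [(a, c) for a, c, ok in zip(ai, cec, confident) if ok]
    coverage = 100 * len(picked) / len(ai)
    # percent_agreement raises ValueError if no event was confident
    agreement = percent_agreement([a for a, _ in picked], [c for _, c in picked])
    return coverage, agreement


# Ten invented MI events (not trial data).
ai = ["MI", "MI", "no MI", "MI", "no MI", "no MI", "MI", "no MI", "MI", "no MI"]
cec = ["MI", "MI", "no MI", "no MI", "no MI", "MI", "MI", "no MI", "no MI", "MI"]
confident = [True, True, True, False, True, False, True, True, False, True]

coverage, agreement = confident_subset(ai, cec, confident)
print(f"all events: {percent_agreement(ai, cec):.1f}% agreement")
# all events: 60.0% agreement
print(f"confident events: {coverage:.1f}% coverage, {agreement:.1f}% agreement")
# confident events: 70.0% coverage, 85.7% agreement
```

Percent agreement is the metric reported here; chance-corrected statistics such as Cohen's kappa are a common alternative but are not part of these results.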
Comparative Accuracy and Treatment Effect Estimates
The researchers also evaluated Auto-MACE across the full set of potential events, including those for which the model's confidence was low. Without filtering for confidence, the system still aligned closely with the physician clinical events committee, agreeing with its adjudications in 86% of deaths, 84% of potential strokes, and 76% of potential myocardial infarctions. These figures represent the baseline reliability of the automated system in a real-world trial setting: myocardial infarction remains the most difficult endpoint to adjudicate automatically, while agreement for mortality and cerebrovascular events stays high.

Beyond event-level agreement, the ultimate test of an automated adjudication system is whether it yields the same trial conclusions as human experts. For the primary composite MACE endpoint, the treatment effect estimates from the two methods were nearly identical. With Auto-MACE adjudication, the hazard ratio for sacubitril/valsartan versus ramipril was 0.91 (95% CI: 0.78 to 1.07), compared with 0.90 (95% CI: 0.77 to 1.05) with physician committee adjudication. The point estimates differ by only 0.01, the confidence intervals overlap almost completely, and both intervals include 1.0, so the automated system would have led to the same clinical interpretation of the PARADISE-MI results as the traditional, more resource-intensive human process.
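A published hazard ratio and its 95% confidence interval can be read a little further with standard arithmetic. The sketch below assumes each interval was constructed symmetrically on the log-hazard-ratio scale (the usual approach for Cox models, but an assumption here), back-calculates the standard error of the log hazard ratio, and checks whether each interval includes 1.0. The helper names `log_hr_se` and `includes_null` are hypothetical. Because both estimates come from the same patients, comparing them this way is descriptive, not a formal test of whether they differ.

```python
import math

Z_95 = 1.959964  # two-sided 95% standard normal quantile


def log_hr_se(lower: float, upper: float) -> float:
    """Standard error of log(HR), assuming a symmetric Wald CI on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2 * Z_95)


def includes_null(lower: float, upper: float) -> bool:
    """True if the CI contains HR = 1 (no treatment effect)."""
    return lower <= 1.0 <= upper


# Composite MACE endpoint, sacubitril/valsartan vs ramipril (values from the text).
estimates = {
    "Auto-MACE": (0.91, 0.78, 1.07),
    "Physician CEC": (0.90, 0.77, 1.05),
}

for name, (hr, lower, upper) in estimates.items():
    se = log_hr_se(lower, upper)
    print(f"{name}: HR {hr:.2f}, SE(log HR) {se:.3f}, "
          f"CI includes 1: {includes_null(lower, upper)}")

# Ratio of point estimates: descriptive only, since both use the same patients.
print(f"Auto-MACE HR / CEC HR: {0.91 / 0.90:.3f}")
```

The near-equal standard errors indicate that the two adjudication methods produced event counts of similar size, which is consistent with the matching interpretation described above.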
Clinical Implications for Future Trial Design
The validation of Auto-MACE in the PARADISE-MI cohort shows that AI-based adjudication can achieve high agreement with a physician clinical events committee, particularly for the most definitive endpoints. Across all potential events, agreement was highest for cardiovascular death (86%) and stroke (84%), and the model reached a confident adjudication for most deaths (69%) and strokes (81%). Among confident adjudications, agreement was 97% for deaths, 88% for strokes, and 89% for myocardial infarctions, so the weaker overall performance for myocardial infarction appears to reflect mainly the smaller share of cases (46%) in which the model was confident. These data suggest that automated systems can reliably capture the main components of MACE, potentially with greater reproducibility than manual review.

For clinicians involved in trial leadership or site investigation, these findings support a move toward a hybrid adjudication workflow. The researchers propose that initial AI-based adjudication combined with physician review of uncertain events may reduce workload while maintaining accuracy. Under this model, the automated system would process the bulk of straightforward cases, such as the 315 of 455 deaths (69%) and 136 of 167 potential strokes (81%) for which Auto-MACE provided a confident adjudication. Human experts would then concentrate on the more complex or ambiguous cases, such as the 54% of potential myocardial infarctions for which the model did not reach the confidence threshold. This targeted approach eases the administrative burden of traditional adjudication while aiming to keep the final trial data as rigorous as a fully manual process.
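The workload implied by the hybrid model can be estimated from the reported counts. The sketch below assumes that every non-confident event goes to full physician review and that confident events are accepted without review; an actual protocol might also audit a sample of confident cases, and event counts are only a rough proxy for reviewer time. The function name `review_workload` is hypothetical.

```python
# Reported PARADISE-MI counts: (confident adjudications, potential events)
counts = {
    "death": (315, 455),
    "MI": (301, 659),
    "stroke": (136, 167),
}


def review_workload(counts):
    """Events routed to physicians per endpoint, plus overall totals."""
    per_endpoint = {name: total - conf for name, (conf, total) in counts.items()}
    to_physicians = sum(per_endpoint.values())
    total_events = sum(total for _, total in counts.values())
    return per_endpoint, to_physicians, total_events


per_endpoint, to_physicians, total_events = review_workload(counts)
for name, n in per_endpoint.items():
    total = counts[name][1]
    print(f"{name}: {n} of {total} to physician review ({100 * n / total:.0f}%)")
print(f"overall: {to_physicians} of {total_events} "
      f"({100 * to_physicians / total_events:.0f}%)")
```

Under these assumptions, about 41% of potential events (529 of 1,281) would still reach physicians. The trade-off is that confident AI labels are accepted even though, among confident cases, roughly 3% of deaths, 11% of myocardial infarctions, and 12% of strokes disagreed with the committee.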
References
1. Moher D, Hopewell S, Schulz KF, et al. CONSORT 2010 Explanation and Elaboration: updated guidelines for reporting parallel group randomised trials. BMJ. 2010. doi:10.1136/bmj.c869
2. Wang Y, Fuerte-Hortigon A, Chetty S, et al. Types and Rates of Major Adverse Cardiovascular Events in Antithrombotic Trials: A Systematic Review and Meta-Analysis. Canadian Journal of Cardiology. 2025. doi:10.1016/j.cjca.2025.08.358
3. Ballacci F, Giordano F, Conte C, Telesca A, Collini V, Imazio M. Colchicine for prevention of major adverse cardiovascular events: a meta-analysis of randomized clinical trials. Journal of Cardiovascular Medicine. 2025. doi:10.2459/JCM.0000000000001744
4. Mehran R, Rao SV, Bhatt DL, et al. Standardized Bleeding Definitions for Cardiovascular Clinical Trials. Circulation. 2011. doi:10.1161/circulationaha.110.009449
5. Grundy SM, Cleeman JI, Merz CNB, et al. Implications of Recent Clinical Trials for the National Cholesterol Education Program Adult Treatment Panel III Guidelines. Circulation. 2004. doi:10.1161/01.cir.0000133317.49796.0e
6. SPRINT Research Group. A Randomized Trial of Intensive versus Standard Blood-Pressure Control. New England Journal of Medicine. 2015. doi:10.1056/nejmoa1511939
7. Giugliano RP, Ruff CT, Braunwald E, et al. Edoxaban versus Warfarin in Patients with Atrial Fibrillation. New England Journal of Medicine. 2013. doi:10.1056/nejmoa1310907