Commercial AI for Lung Cancer Screening Lacks Full Guideline Alignment

An analysis of 16 CE-marked products reveals gaps in managing cystic lesions and a lack of prospective evidence for patient outcomes.

For Doctors in a Hurry

Researchers evaluated whether 16 commercially available artificial intelligence products for lung nodule analysis align with international clinical management guidelines.
The study analyzed 16 regulated products and 60 peer-reviewed publications to assess functional coverage of tasks like nodule detection and measurement.
While 14 products detect solid nodules, none support endobronchial or cystic lesions, and only 7% of studies were prospective.
The authors concluded that significant gaps in task coverage and a lack of high-level evidence necessitate cautious implementation in screening programs.
Clinicians should recognize that limited evidence on patient outcomes complicates the integration of these tools into standard lung cancer screening protocols.

The Evolving Role of Automated Analysis in Lung Cancer Screening

Lung cancer remains a leading cause of death worldwide, which makes accurate early detection through low-dose computed tomography screening essential. Artificial intelligence has emerged as a potential adjunct to improve the sensitivity of nodule detection and the accuracy of malignancy risk stratification [1, 2]. Meta-analytic data indicate that deep learning (a type of machine learning that uses multi-layered neural networks to identify complex patterns in imaging data) can achieve a pooled area under the receiver operating characteristic curve of 0.82 (95% CI 0.80 to 0.85), compared with 0.73 (95% CI 0.72 to 0.74) for traditional risk regression models [3]. However, integrating these tools into clinical workflows often introduces a trade-off. AI-assisted reading may increase sensitivity for actionable nodules by 5% to 20% while simultaneously decreasing specificity by 3% to 7%, potentially leading to nearly 80,000 unnecessary follow-up scans per million patients screened [4, 5]. Furthermore, the transition from experimental algorithms to validated clinical tools requires rigorous comparison against established management frameworks such as the Lung Imaging Reporting and Data System (a standardized classification tool used by radiologists to categorize lung nodules based on malignancy risk) [5]. A new study evaluates how well current commercial products align with these standardized clinical management guidelines and how strong the evidence supporting their implementation is.

Benchmarking AI Capabilities Against Clinical Standards

To evaluate the clinical utility of current technology, researchers conducted a systematic analysis of 16 CE-marked artificial intelligence products (software certified to meet European health, safety, and environmental protection standards) developed by 16 different vendors for lung nodule analysis. The study assessed these tools based on their ability to perform six core tasks essential to the radiologic workflow: nodule detection, classification (categorizing the type of lesion), measurement, growth assessment (comparing changes over time), malignancy risk estimation, and structured management (providing specific follow-up recommendations). By focusing on these specific functionalities, the authors aimed to determine how well automated tools support the complex decision-making required in a daily screening program. The researchers measured the functional overlap of these products against four major international clinical frameworks: the Lung Imaging Reporting and Data System 2022 (Lung-RADS 2022), the British Thoracic Society (BTS) guidelines, the European Union Position Statement (EUPS), and the European Society of Thoracic Imaging (ESTI) recommendations. To gather these data, the authors used a standardized questionnaire confirmed by 10 of the vendors, while public documentation and technical specifications were used to evaluate the products of the remaining 6 vendors, who did not respond. This methodology allowed for a direct comparison between the technical capabilities of the software and the requirements of established medical guidelines, highlighting where current artificial intelligence offerings align with or fall short of standard clinical practice.

Functional Strengths and Critical Gaps in Nodule Management

The analysis revealed that most commercial tools are equipped for the fundamental tasks of nodule identification and quantification. Specifically, 14 out of 16 products are capable of detecting and measuring both solid and subsolid nodules, which are the primary targets in most screening protocols. Furthermore, 12 out of 16 products support growth assessment, a critical function for determining the stability or progression of findings by comparing current and prior scans. Beyond basic detection, the researchers found that 9 out of 16 products provide malignancy risk estimation, a feature intended to assist clinicians in triaging patients for further diagnostic workup or surveillance. Among these tools, the method for calculating risk varied: malignancy risk estimation was based on the PanCan model in 5 products and on artificial intelligence-based scores in 4 products. The PanCan model is a validated clinical prediction tool that uses patient and nodule characteristics to estimate the probability of malignancy, while the automated scores rely on proprietary algorithms to analyze imaging features and generate a probability metric. Despite these capabilities, the study identified significant omissions in the current technological landscape that directly impact clinical workflow. Notably, none of the 16 products supports the management of endobronchial or cystic lesions. Endobronchial lesions are masses located within the airway, and cystic lesions are nodules associated with thin-walled air spaces. Both require specific management strategies under comprehensive screening guidelines. For the practicing radiologist or pulmonologist, the absence of automated support for these lesion types means they must continue to rely entirely on manual identification and classification in these clinical scenarios, as current software does not yet account for these complex morphologies.

Discrepancies in Guideline Adherence

The researchers evaluated how well the 16 products aligned with specific clinical guidelines by calculating task coverage, which represents the percentage of functional overlap between the software capabilities and the requirements of each recommendation. The analysis demonstrated that alignment varies considerably depending on which international standard is applied. For the European Union Position Statement (EUPS), which outlines requirements for lung cancer screening in Europe, 10 products achieved high task coverage, defined as greater than 75% functional overlap. In contrast, when measured against the British Thoracic Society (BTS) guidelines, which are widely used for the management of pulmonary nodules, only 4 products reached that threshold. This discrepancy suggests that while many commercial tools are well suited to the EUPS framework, their utility may be more limited for clinicians strictly following the BTS protocols. The gap in guideline adherence became even more pronounced when the products were assessed against the Lung-RADS 2022 and European Society of Thoracic Imaging (ESTI) recommendations. Despite the widespread clinical reliance on Lung-RADS for standardized reporting and management in the United States, no products achieved high task coverage for the Lung-RADS 2022 or ESTI recommendations. This lack of comprehensive support for Lung-RADS 2022, which requires specific categorization and management steps based on nodule size and type, means that radiologists using these automated tools must still manually perform a significant portion of the workflow to ensure full compliance with the standard. For the practicing clinician, these findings indicate that while current software can assist in detection and measurement, it cannot yet be relied upon to provide a complete, automated management pathway that fully satisfies the most rigorous international screening standards.
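
The task coverage metric described above can be illustrated with a minimal Python sketch. It assumes that coverage is the share of a guideline's required tasks that a product supports, and that "high" means strictly greater than 75%, as stated in the article. The task lists, the guideline names "Guideline A" and "Guideline B", and the example product are hypothetical and are not taken from the study.

```python
# Minimal sketch of a task coverage calculation.
# All task lists below are hypothetical placeholders, not the study's data.

HIGH_COVERAGE_THRESHOLD = 75.0  # "high" is defined as greater than 75% overlap


def task_coverage(product_tasks: set, guideline_tasks: set) -> float:
    """Return the percentage of guideline tasks that the product supports."""
    if not guideline_tasks:
        raise ValueError("guideline_tasks must not be empty")
    supported = product_tasks & guideline_tasks
    return 100.0 * len(supported) / len(guideline_tasks)


def has_high_coverage(product_tasks: set, guideline_tasks: set) -> bool:
    """True only when coverage is strictly above the 75% threshold."""
    return task_coverage(product_tasks, guideline_tasks) > HIGH_COVERAGE_THRESHOLD


# Hypothetical product: supports four tasks.
example_product = {"detection", "measurement", "growth assessment", "risk estimation"}

# Hypothetical guideline requirements, for illustration only.
guideline_a = example_product | {"structured management"}
guideline_b = guideline_a | {"cystic lesion management", "endobronchial lesion management"}

for name, tasks in [("Guideline A", guideline_a), ("Guideline B", guideline_b)]:
    pct = task_coverage(example_product, tasks)
    print(f"{name}: {pct:.1f}% coverage, high={has_high_coverage(example_product, tasks)}")
```

In this illustration the same product covers 4 of 5 tasks for the first hypothetical guideline but only 4 of 7 for the second, which mirrors how a guideline that requires cystic or endobronchial lesion management can pull a product below the 75% threshold.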

The Evidence Gap: Accuracy Versus Outcomes

The clinical utility of artificial intelligence in lung cancer screening depends not only on software functionality but also on the quality of the supporting literature. In this analysis, the researchers identified a total of 60 peer-reviewed studies across all 16 evaluated products. However, the nature of this evidence suggests a significant disconnect between technical validation and clinical application. The data show that 70% of the peer-reviewed evidence focused on assessing diagnostic accuracy, such as the ability of the software to correctly identify or measure a nodule. While these metrics are necessary for initial validation, they do not provide a complete picture of how these tools function in a real-world clinical environment where patient variability and workflow integration are critical factors. A more granular look at the study designs reveals a scarcity of high-level evidence required for robust clinical recommendations. Only 7% of the identified peer-reviewed studies were prospective in design, meaning the vast majority of data stems from retrospective analyses of existing datasets. Furthermore, the researchers found that the evidence for these products is clustered at lower efficacy levels within a six-level framework (a standardized hierarchy used to evaluate the clinical efficacy of diagnostic imaging, ranging from basic technical performance to broad societal impact). Most notably, zero studies reported on patient outcomes or the societal impact of the artificial intelligence products. For the practicing clinician, this means there is currently no published evidence demonstrating that these tools improve survival rates, reduce unnecessary biopsies, or lower healthcare costs. Consequently, integrating these technologies into guidelines or securing reimbursement will require a cautious, monitored approach until higher-level clinical evidence becomes available.
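
To put the reported percentages in perspective, the short Python sketch below converts them into approximate study counts. It assumes that both percentages refer to the full set of 60 studies and that the published figures are rounded, so the resulting counts are back-calculated estimates rather than numbers reported by the authors.

```python
# Back-of-the-envelope conversion of reported shares into approximate counts.
# Assumption: both shares apply to all 60 studies and are rounded in the source.

TOTAL_STUDIES = 60

REPORTED_SHARES = {
    "diagnostic accuracy": 0.70,
    "prospective design": 0.07,
}


def approximate_count(share: float, total: int) -> int:
    """Round a reported share of the total to the nearest whole study."""
    return round(share * total)


approx_counts = {
    label: approximate_count(share, TOTAL_STUDIES)
    for label, share in REPORTED_SHARES.items()
}

for label, count in approx_counts.items():
    print(f"{label}: about {count} of {TOTAL_STUDIES} studies")
```

Under these assumptions, the estimate comes to about 42 accuracy-focused studies and about 4 prospective studies across all 16 products.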

Study Info

Commercial AI for CT lung cancer screening: product capabilities, coverage of nodule management tasks and supporting evidence

Noa Antonissen, Steven Schalekamp, H Hahn, Kicky G. van Leeuwen, et al.

Journal European Radiology

Published May 07, 2026

References

1. Gao C, Wu L, Wu W, et al. Deep learning in pulmonary nodule detection and segmentation: a systematic review. European Radiology. 2024. doi:10.1007/s00330-024-10907-0

2. Wulaningsih W, Villamaria C, Akram A, Benemile J, Croce F, Watkins J. Deep Learning Models for Predicting Malignancy Risk in CT-Detected Pulmonary Nodules: A Systematic Review and Meta-analysis. Lung. 2024. doi:10.1007/s00408-024-00706-1

3. Leonard S, Patel MA, Zhou Z, Le H, Mondal P, Adams SJ. Comparing Artificial Intelligence and Traditional Regression Models in Lung Cancer Risk Prediction Using A Systematic Review and Meta-Analysis. Journal of the American College of Radiology: JACR. 2025. doi:10.1016/j.jacr.2025.02.042

4. Geppert J, Asgharzadeh A, Brown A, et al. Software using artificial intelligence for nodule and cancer detection in CT lung cancer screening: systematic review of test accuracy studies. Thorax. 2024. doi:10.1136/thorax-2024-221662

5. Quirk J, Donnchadha CM, Vaantaja J, et al. Future implications of artificial intelligence in lung cancer screening: a systematic review. BJR Open. 2024. doi:10.1093/bjro/tzae035