JAMA Cohort Study

Protein-Based Model Improves Short-Term Lung Cancer Risk Prediction

A new protein-based model enhances identification of individuals with a smoking history at high short-term risk for lung cancer.

For Doctors in a Hurry
  • The study addressed the need for improved lung cancer risk prediction in smokers not eligible for current screening guidelines.
  • Researchers developed the INTEGRAL-Risk model using age, smoking history, and 13 circulating proteins measured in 3695 participants with a smoking history from 14 cohorts.
  • In the testing set, the model showed strong one-year discrimination (AUC 0.88, 95% CI 0.85-0.91), exceeding that of PLCOm2012 (AUC 0.79; P < .001 for the difference).
  • The authors concluded that the protein-based INTEGRAL-Risk model improved short-term lung cancer prediction in individuals with a smoking history.
  • This model could enhance selection of high-risk individuals for lung cancer screening, potentially increasing early detection.

Refining Lung Cancer Screening Eligibility

Despite advances in oncology, lung cancer remains a leading cause of cancer mortality, underscoring the need for improved early detection [1, 2, 3, 4, 5]. Screening with low-dose computed tomography (LDCT) is known to reduce mortality in high-risk individuals, but current eligibility criteria, based largely on age and smoking history, fail to identify a substantial portion of patients who ultimately develop the disease [6, 7, 8, 9]. A recent study introduces a risk model that integrates protein biomarkers with clinical data, aiming to more accurately pinpoint individuals with a smoking history who are at high short-term risk and could benefit from screening.

Developing a Protein-Based Risk Assessment

To address the limitations of current screening guidelines, researchers developed and validated the protein-based Integrative Analysis of Lung Cancer Risk and Etiology (INTEGRAL)–Risk model. The investigation drew on data from the Lung Cancer Cohort Consortium, a large international effort that followed participants in the US, Europe, Asia, and Australia for decades. The study used a case-cohort design spanning 14 cohorts and 3695 participants with a history of smoking: 2305 randomly sampled individuals and 1390 patients who were diagnosed with lung cancer within three years of their blood draw. This design allowed a direct comparison between individuals who developed cancer and their peers from the same source populations.

The model combines standard risk factors, age and smoking history, with measurements of a panel of 13 proteins in prediagnostic plasma or serum samples. To guard against overly optimistic performance estimates, the model was developed on a training set of 1951 participants and then validated on a separate testing set of 1744 participants, with performance evaluated at 1, 2, and 3 years after blood collection. The primary performance measures were discrimination (the model's ability to separate individuals who will develop lung cancer from those who will not) and calibration (how closely the predicted number of cancer cases matches the number actually observed).
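For readers who want to see what the two performance measures compute, here is a minimal Python sketch on made-up data. It uses the concordance definition of the AUC (the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen non-case) and a simple expected-to-observed (E/O) ratio for calibration. The function names and data are hypothetical, and the sketch deliberately ignores the sampling weights and time-to-diagnosis handling that an analysis like the study's would need; it is not the authors' code.

```python
"""Toy illustration of discrimination (AUC) and calibration (E/O ratio).
All data are invented; this is not the INTEGRAL-Risk model or its data.
Unweighted: it ignores case-cohort sampling weights and follow-up time."""


def auc(risks, outcomes):
    """Concordance AUC: probability that a random case has a higher predicted
    risk than a random non-case (ties count as 0.5)."""
    cases = [r for r, y in zip(risks, outcomes) if y == 1]
    controls = [r for r, y in zip(risks, outcomes) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    score = 0.0
    for c in cases:
        for n in controls:
            if c > n:
                score += 1.0
            elif c == n:
                score += 0.5
    return score / (len(cases) * len(controls))


def expected_to_observed(risks, outcomes):
    """Calibration-in-the-large: sum of predicted risks (expected cases)
    divided by the number of observed cases. Near 1 means good agreement."""
    observed = sum(outcomes)
    if observed == 0:
        raise ValueError("no observed cases")
    return sum(risks) / observed


# Hypothetical predicted 1-year risks and outcomes (1 = lung cancer).
risks = [0.02, 0.10, 0.35, 0.05, 0.60, 0.15, 0.01, 0.40]
outcomes = [0, 0, 1, 0, 1, 0, 0, 0]

print(round(auc(risks, outcomes), 3))                   # 0.917
print(round(expected_to_observed(risks, outcomes), 3))  # 0.84
```

Here the model ranks 11 of the 12 case/non-case pairs correctly, and it predicts 1.68 cases where 2 were observed, so it slightly under-predicts (E/O below 1).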

Study Population Characteristics

The study's foundation was its 3695 participants with a smoking history, who were split into separate training and testing sets so that the model would be evaluated on people it had not been built from. The training set, used to build the INTEGRAL-Risk model, included 1951 participants, of whom 807 had lung cancer. The testing set, reserved for validation, comprised 1744 participants, including 583 with lung cancer. Because lung cancer cases were deliberately oversampled, the researchers applied statistical weights to the combined sample so that it represented the much larger population of 323,570 individuals from which it was drawn, rather than the case-enriched sample itself. In this weighted population, 57% (185,016) were female, and the median age was 60 years (interquartile range, 51-67 years), an age range in which lung cancer risk becomes a significant concern.
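The weighting step can be illustrated with a simplified scheme: every case is kept with weight 1, and each randomly sampled non-case is weighted by the inverse of its sampling fraction, so the weights sum to the size of the population the sample stands for. The Python sketch below assumes this scheme; the cohort names, counts, and the `CohortSample`, `weights`, and `represented_population` helpers are hypothetical, and the study's actual weights may be defined differently.

```python
"""Minimal sketch of inverse-probability sampling weights in a case-enriched
design. Cohort names and counts are hypothetical, not the study's data."""

from dataclasses import dataclass


@dataclass
class CohortSample:
    name: str
    cohort_noncases: int   # non-cases in the full underlying cohort
    sampled_noncases: int  # non-cases drawn at random into the sample
    cases: int             # every incident case is included (probability 1)


def weights(sample: CohortSample) -> tuple[float, float]:
    """Return (weight per sampled non-case, weight per case): each person
    is weighted by 1 / (probability of being sampled)."""
    noncase_weight = sample.cohort_noncases / sample.sampled_noncases
    return noncase_weight, 1.0


def represented_population(samples: list[CohortSample]) -> float:
    """Sum of all weights = size of the population the sample stands for."""
    total = 0.0
    for s in samples:
        w_noncase, w_case = weights(s)
        total += s.sampled_noncases * w_noncase + s.cases * w_case
    return total


samples = [
    CohortSample("cohort A", cohort_noncases=50_000, sampled_noncases=200, cases=120),
    CohortSample("cohort B", cohort_noncases=20_000, sampled_noncases=100, cases=40),
]
print(represented_population(samples))  # 70160.0
```

In this toy example, 460 sampled people stand for 70,160; the same logic lets a few thousand participants represent a population of several hundred thousand, and weighted summaries (such as the percentage female or the median age) are then computed with these weights.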

Enhanced Short-Term Prediction Accuracy

The protein-based model demonstrated superior performance in identifying near-term lung cancer risk. In the independent testing set, its ability to distinguish between individuals who would and would not develop lung cancer within one year, the metric known as discrimination, was significantly higher than that of the standard questionnaire-based PLCOm2012 model. The INTEGRAL-Risk model achieved an area under the curve (AUC) of 0.88 (95% CI, 0.85-0.91), compared with an AUC of 0.79 (95% CI, 0.75-0.83) for PLCOm2012 (P < .001 for the difference). In practical terms, an AUC of 0.88 means that, given one randomly chosen person who developed lung cancer within a year and one who did not, the protein model assigns the higher risk to the person with cancer about 88% of the time. When its risk threshold was set to match the specificity of the 2021 US Preventive Services Task Force (USPSTF) criteria, the INTEGRAL-Risk model identified 85% of incident lung cancer cases, compared with 63% for the USPSTF criteria and 70% for the PLCOm2012 model.

As expected, discrimination declined over longer prediction windows, with the AUC falling to 0.84 (95% CI, 0.81-0.86) at two years and 0.81 (95% CI, 0.79-0.83) at three years. The model was also well calibrated: over three years, the ratio of expected to observed cases was 0.87 (95% CI, 0.69-1.14), an interval that includes 1, the value for perfect agreement.
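A matched-specificity comparison like the one against the USPSTF criteria can be sketched generically: compute the specificity of a fixed eligibility rule, choose the lowest model risk threshold that reaches at least that specificity, and compare the sensitivities of the two approaches. In this Python sketch the risks, outcomes, and `rule_flags` are invented (`rule_flags` merely stands in for criteria such as the USPSTF rule), the helper names are hypothetical, and a real analysis of case-cohort data would also need to apply the sampling weights.

```python
"""Sketch: compare a risk model with a fixed eligibility rule at equal
specificity. All data and the rule are hypothetical (not the study's)."""


def specificity(flags, outcomes):
    """Fraction of non-cases (outcome 0) that are NOT flagged as eligible."""
    noncase_flags = [f for f, y in zip(flags, outcomes) if y == 0]
    return sum(1 for f in noncase_flags if not f) / len(noncase_flags)


def sensitivity(flags, outcomes):
    """Fraction of cases (outcome 1) that ARE flagged as eligible."""
    case_flags = [f for f, y in zip(flags, outcomes) if y == 1]
    return sum(1 for f in case_flags if f) / len(case_flags)


def threshold_for_specificity(risks, outcomes, target_spec):
    """Lowest observed risk t such that flagging 'risk >= t' reaches the
    target specificity. A lower threshold flags more people, so this gives
    the highest sensitivity among observed thresholds meeting the target."""
    for t in sorted(set(risks)):
        flags = [r >= t for r in risks]
        if specificity(flags, outcomes) >= target_spec:
            return t
    return float("inf")  # no observed threshold works: flag nobody


# Hypothetical one-year risks, outcomes (1 = lung cancer), and rule flags.
risks = [0.02, 0.10, 0.35, 0.05, 0.60, 0.15, 0.01, 0.40, 0.30, 0.08]
outcomes = [0, 0, 1, 0, 1, 0, 0, 0, 1, 0]
rule_flags = [False, True, True, False, False, False, False, True, True, False]

target = specificity(rule_flags, outcomes)  # the rule's specificity
t = threshold_for_specificity(risks, outcomes, target)
model_flags = [r >= t for r in risks]

print(round(target, 3), round(sensitivity(rule_flags, outcomes), 3),
      round(sensitivity(model_flags, outcomes), 3))  # 0.714 0.667 1.0
```

At the same specificity (5 of 7 non-cases correctly left out), the toy model flags all 3 cases while the toy rule flags 2, which mirrors the kind of comparison behind the 85% versus 63% figures.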

Clinical Implications for Screening

These findings suggest that incorporating protein biomarkers can refine the selection of candidates for lung cancer screening beyond what is possible with questionnaire-based criteria alone. The INTEGRAL-Risk model's improved short-term prediction in people with a smoking history is particularly relevant because many patients who develop lung cancer do not meet current screening criteria. By providing a biologically informed risk score, such a model could help clinicians identify individuals whose biomarker profile places them at high risk, even if their age or pack-year history falls below current thresholds. Its main potential clinical value lies in extending LDCT screening to more of the people likely to benefit from it. A more accurate method for selecting high-risk individuals could lead to earlier diagnoses in a broader patient group, enabling more timely treatment and potentially reducing lung cancer mortality.

Study Info
Biomarker-Based Eligibility for Lung Cancer Screening
Hana Zahed, Xiaoshuang Feng, Karine Alcala, Karl Smith-Byrne, et al.
Journal: JAMA
Published: May 18, 2026

References

1. Siegel RL, Miller KD, Wagle NS, Jemal A. Cancer statistics, 2023. CA: A Cancer Journal for Clinicians. 2023. doi:10.3322/caac.21763

2. Siegel RL, Giaquinto AN, Jemal A. Cancer statistics, 2024. CA: A Cancer Journal for Clinicians. 2024. doi:10.3322/caac.21820

3. Siegel RL, Miller KD, Fuchs HE, Jemal A. Cancer statistics, 2021. CA: A Cancer Journal for Clinicians. 2021. doi:10.3322/caac.21654

4. Torre LA, Bray F, Siegel RL, Ferlay J, Lortet-Tieulent J, Jemal A. Global cancer statistics, 2012. CA: A Cancer Journal for Clinicians. 2015. doi:10.3322/caac.21262

5. Siegel RL, Miller KD, Jemal A. Cancer statistics, 2019. CA: A Cancer Journal for Clinicians. 2019. doi:10.3322/caac.21551

6. Wender RC, Fontham ETH, Barrera E, et al. American Cancer Society lung cancer screening guidelines. CA: A Cancer Journal for Clinicians. 2013. doi:10.3322/caac.21172

7. Huang K, Wang S, Lu W, Chang Y, Su J, Lu Y. Effects of low-dose computed tomography on lung cancer screening: a systematic review, meta-analysis, and trial sequential analysis. BMC Pulmonary Medicine. 2019. doi:10.1186/s12890-019-0883-x

8. Toumazis I, Bastani M, Han SS, Plevritis SK. Risk-based lung cancer screening: a systematic review. Lung Cancer. 2020. doi:10.1016/j.lungcan.2020.07.007

9. Amicizia D, Piazza MF, Marchini F, et al. Systematic review of lung cancer screening: advancements and strategies for implementation. Healthcare. 2023. doi:10.3390/healthcare11142085