Machine Learning Identifies Three Subtypes of Comorbid Depression and Anxiety

A 15-feature model built on data from 2,951 young adults achieves high discriminative performance and identifies distinct risk and protective profiles.

For Doctors in a Hurry

Clinicians lack precise tools to identify young adults at high risk for comorbid depressive and anxiety symptoms.
The researchers analyzed 15 multidimensional health features from 2,951 Chinese young adults using six machine learning models.
The Random Forest model performed best, with an area under the receiver operating characteristic curve (AUC) of 0.905 and a Brier score of 0.097.
The authors concluded that risk and protective factors have opposite associations in comorbid and asymptomatic young adults, and that comorbid cases fall into three distinct risk subtypes.
In decision curve analysis, the 15-feature model offered a net clinical benefit across risk thresholds of 2% to 67%, supporting its proposed use for scalable community screening.

The Clinical Complexity of Comorbid Affective Disorders in Young Adulthood

Most mental health disorders begin early in life: a large meta-analysis estimated that 48.4% of all conditions emerge by age 18 [1]. Among young adults, the burden of depression and anxiety is often compounded by poor sleep quality; a meta-analysis of randomized controlled trials found that psychological interventions improved sleep scores with a moderate effect size (a standardized measure of how large a treatment effect is) of -0.53 [2]. Detection is further complicated by barriers to seeking professional help, including stigma and a preference for self-reliance [3]. Although early intervention improves long-term outcomes, standard screening tools often lack the granularity to capture individual variability; digital health tools that use passive monitoring (data collected through smartphone sensors without active user input) are being evaluated to help close these diagnostic gaps [1, 4]. A new study examines whether multidimensional data can identify distinct patient subtypes, with the aim of tailoring preventive strategies and improving clinical precision.

Predictive Modeling Using Multidimensional Data

Comorbid depressive and anxiety symptoms (CDAS) are associated with greater clinical severity and functional impairment than symptoms of either disorder alone, which makes early identification in young adults especially important. The researchers analyzed data from 2,951 Chinese young adults aged 19 to 35, drawn from the PBICR2021 database. Because traditional linear analyses often struggle to capture complex, non-linear interactions among multidimensional risk and protective factors, the study used machine learning to distinguish individuals with CDAS from asymptomatic individuals. The broader aim was to identify risk subtypes that could inform more precise interventions than standard symptom checklists allow.

To build the predictive model, the authors selected 15 features spanning psychological, physiological, social, and environmental dimensions. Feature selection combined Lasso regularization (a regression method that penalizes the absolute size of coefficients, shrinking weak predictors to exactly zero) with the Boruta algorithm (a wrapper method that keeps only variables whose importance consistently exceeds that of randomly permuted "shadow" copies of the features). Together, these techniques narrowed a high-dimensional dataset to the most informative predictors of comorbidity while reducing the risk of overfitting.

The researchers then trained and evaluated six machine learning models. The Random Forest model performed best, with an area under the receiver operating characteristic curve (AUC) of 0.905. It also achieved a precision-recall area under the curve (PR-AUC) of 0.703. This metric summarizes the trade-off between positive predictive value (precision) and sensitivity (recall) and is especially informative in imbalanced datasets, where the condition of interest is relatively uncommon.
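For readers who want to see what these two discrimination metrics measure, here is a minimal Python sketch; it is not the study's code. It assumes binary labels (1 = CDAS, 0 = asymptomatic) and a predicted score per person, and the function names and toy data are hypothetical. AUC is computed as the probability that a randomly chosen positive case outscores a randomly chosen negative case, and PR-AUC is estimated with average precision, which is one common estimator among several.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive case outscores a random negative case.

    Ties count as half a win (the Mann-Whitney U formulation of ROC AUC).
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision, one common estimator of PR-AUC.

    Precision is recorded each time another positive case is recovered while
    walking down the ranking; tied scores are not handled specially here.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive case")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_pos = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_pos += 1
            precision_sum += true_pos / rank  # precision at this recall level
    return precision_sum / total_pos


# Hypothetical toy data: 1 = CDAS, 0 = asymptomatic.
labels = [1, 0, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.15, 0.6, 0.1]
print(f"AUC = {roc_auc(labels, scores):.4f}")  # AUC = 0.8667
print(f"PR-AUC (AP) = {average_precision(labels, scores):.4f}")  # PR-AUC (AP) = 0.8056
```

A useful reference point: a model that guesses at random has an AUC of about 0.5, whereas its PR-AUC is roughly the prevalence of the condition. This is why PR-AUC is often reported alongside AUC when positive cases are relatively rare.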
The model was also well calibrated, with a Brier score of 0.097 (the mean squared difference between predicted probabilities and observed outcomes, where 0 is perfect) and an expected calibration error (ECE) of 0.088, indicating that its predicted probabilities were close to the observed outcome rates.
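The two calibration metrics can be sketched the same way. This is an illustrative Python sketch, not the authors' implementation. In particular, it assumes that ECE is computed over equal-width bins of the predicted probability of CDAS; the binning scheme the study used is not described here. The labels and probabilities are made-up toy values.

```python
from typing import Sequence


def brier_score(labels: Sequence[int], probs: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome (0 is perfect)."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)


def expected_calibration_error(
    labels: Sequence[int], probs: Sequence[float], n_bins: int = 10
) -> float:
    """Bin-size-weighted gap between mean predicted probability and observed rate.

    Uses equal-width bins on [0, 1]; this binning scheme is an assumption.
    """
    bins: list[list[tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes into the top bin
        bins[index].append((p, y))
    n = len(labels)
    ece = 0.0
    for contents in bins:
        if not contents:
            continue
        mean_prob = sum(p for p, _ in contents) / len(contents)
        observed_rate = sum(y for _, y in contents) / len(contents)
        ece += len(contents) / n * abs(mean_prob - observed_rate)
    return ece


# Hypothetical toy data: 1 = CDAS, 0 = asymptomatic.
labels = [0, 0, 1, 0, 1, 1]
probs = [0.1, 0.2, 0.3, 0.4, 0.8, 0.9]
print(f"Brier = {brier_score(labels, probs):.3f}")  # Brier = 0.125
print(f"ECE (2 bins) = {expected_calibration_error(labels, probs, 2):.3f}")  # ECE (2 bins) = 0.050
```

Both metrics reward probabilities that match observed frequencies, but the Brier score also penalizes weak discrimination, whereas ECE measures only the calibration gap and depends on the chosen binning.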

Mapping Risk Factors and Resistance Resources

To move beyond traditional disease-centric models, the researchers applied Antonovsky's salutogenic framework, an approach that focuses on the factors that support health and well-being rather than only those that cause disease. Within this framework, variables were classified either as risk factors or as Generalized Resistance Resources (GRRs): the biological, material, and psychosocial assets that help individuals manage stress and maintain health. To interpret how each variable shaped the Random Forest model's output, the authors used SHAP (SHapley Additive exPlanations) analysis, a method that explains a complex model's prediction by quantifying each feature's contribution to it. This analysis showed that protective and risk factors had opposite associations in the comorbid and asymptomatic groups, a clear statistical divergence that clinicians could use to stratify patient risk.

The SHAP analysis identified several strong predictors of CDAS in this cohort. Avoidant emotional regulation was positively associated with CDAS, suggesting that people who habitually bypass or suppress emotional responses are at higher risk of comorbid symptoms. Obesogenic behaviors and stress were also positively associated with CDAS, consistent with the known interplay among metabolic health, external pressures, and emotional stability. Unexpectedly, solo exercise, including swimming and using sports equipment, was positively associated with CDAS among these young Chinese adults. In this population, solitary physical activity may be a marker of social withdrawal or limited interpersonal engagement rather than a purely health-promoting behavior.

Conversely, the study identified specific Generalized Resistance Resources that served as robust protective factors.
Self-efficacy, family health, social support, and healthcare access emerged as protective GRRs: higher scores in these areas were associated with a lower predicted probability of CDAS. Based on these quantified contributions, the researchers argue that such resources are not merely the absence of risk but may act as an active buffer. For practicing clinicians, this suggests that assessing a patient's access to healthcare and level of family health may be as informative as evaluating stress levels or emotional regulation strategies.
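To make the idea behind SHAP concrete, the sketch below computes exact Shapley values by brute force for a tiny hypothetical risk score. It is an illustration under stated assumptions, not the study's analysis: features left out of a coalition are replaced by a fixed baseline value, and the feature names and coefficients are invented. SHAP software for tree ensembles such as Random Forests uses a far more efficient algorithm with its own conventions for handling absent features.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Sequence


def exact_shapley(
    model: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> list[float]:
    """Exact Shapley values by enumerating every coalition of the other features.

    Features outside a coalition are set to their baseline value (an assumption);
    the cost grows as 2**p, so this is only practical for a handful of features.
    """
    p = len(x)

    def value(coalition: set[int]) -> float:
        point = [x[i] if i in coalition else baseline[i] for i in range(p)]
        return model(point)

    contributions = []
    for j in range(p):
        others = [i for i in range(p) if i != j]
        phi = 0.0
        for size in range(len(others) + 1):
            # Shapley weight for a coalition of this size that excludes feature j
            weight = factorial(size) * factorial(p - size - 1) / factorial(p)
            for subset in combinations(others, size):
                coalition = set(subset)
                phi += weight * (value(coalition | {j}) - value(coalition))
        contributions.append(phi)
    return contributions


def toy_risk_score(features: Sequence[float]) -> float:
    """Hypothetical score over stress, self-efficacy, avoidant regulation (invented weights)."""
    stress, self_efficacy, avoidant = features
    return 0.4 * stress - 0.3 * self_efficacy + 0.2 * stress * avoidant


person = [1.0, 1.0, 1.0]
reference = [0.0, 0.0, 0.0]
phis = exact_shapley(toy_risk_score, person, reference)
print([round(v, 3) for v in phis])  # [0.5, -0.3, 0.1]
```

The contributions sum to the difference between this individual's score and the baseline score (the efficiency property). That additivity is what lets a SHAP plot split one person's predicted risk into per-factor pushes toward or away from CDAS, with the protective factor here receiving a negative value.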

Clinical Utility and Patient Subtyping

Personalized psychiatric care requires moving away from viewing depression and anxiety as monolithic entities and toward recognizing their underlying biological and behavioral diversity. To this end, the researchers applied bidirectional hierarchical clustering (a method that groups data points into a tree-like structure based on similarity, applied here to both participants and features) to the participants' SHAP values. This analysis identified three distinct risk subtypes of CDAS among the 2,951 young adults. The subtypes reflect heterogeneous combinations of factors: members of each subtype share a particular cluster of risk and protective factors that differs substantially from those of the other subtypes. For practicing physicians, these findings suggest that comorbid depression and anxiety may be better understood as a collection of distinct phenotypes. Identifying these subtypes supports the development of precision-targeted interventions that address each patient's specific psychological, physiological, and social profile rather than relying on a generalized treatment protocol.

The model's practical value in clinical and public health settings was evaluated with decision curve analysis (a method that assesses a predictive model's clinical utility by weighing the benefit of true positive identifications against the harm of false positives). The model provided a net benefit across risk thresholds from 2% to 67%, indicating that it can support clinical decisions over a broad range of risk probabilities. Given this performance and its reliance on accessible data points, the 15-feature model is proposed for scalable community screening to identify at-risk young adults.
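Decision curve analysis rests on a simple net benefit formula. At a threshold probability p_t, net benefit = TP/n - (FP/n) * p_t / (1 - p_t), where TP and FP are the true and false positives among everyone flagged at or above p_t, and n is the total number of people. The sketch below is a minimal Python illustration with invented data and hypothetical function names; it compares a model with the "treat everyone" strategy, while "treat no one" always has a net benefit of 0.

```python
from typing import Sequence


def net_benefit(labels: Sequence[int], probs: Sequence[float], threshold: float) -> float:
    """Net benefit of flagging everyone whose predicted risk is at or above the threshold."""
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 0)
    # Each false positive is weighted by the threshold odds, p_t / (1 - p_t).
    return tp / n - (fp / n) * threshold / (1 - threshold)


def treat_all_net_benefit(labels: Sequence[int], threshold: float) -> float:
    """Net benefit of flagging everyone, the usual reference strategy."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Invented toy data: 3 of 10 people have CDAS (1 = CDAS, 0 = asymptomatic).
labels = [1, 0, 1, 0, 0, 0, 1, 0, 0, 0]
probs = [0.9, 0.4, 0.7, 0.2, 0.1, 0.05, 0.6, 0.3, 0.15, 0.1]
for t in (0.02, 0.25, 0.50, 0.67):
    model_nb = net_benefit(labels, probs, t)
    all_nb = treat_all_net_benefit(labels, t)
    print(f"threshold {t:.2f}: model {model_nb:.3f}, treat all {all_nb:.3f}, treat none 0.000")
```

A model is generally considered useful at thresholds where its net benefit exceeds both reference strategies, which is how a range such as the reported 2% to 67% is typically read. Low thresholds correspond to screening settings where missing a case is judged far worse than a false alarm.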
The Random Forest model's calibration metrics (Brier score 0.097; expected calibration error 0.088) further indicate that its predicted probabilities closely match the observed outcomes in the study population, offering a reliable foundation for early intervention strategies.

Study Info

Differential factor effects in comorbid depressive and anxiety symptoms (CDAS): A machine learning approach to individualized mental health promotion

Sydney X. Hu, Simin Liu, Yuanwei Guo

Journal of Affective Disorders

Published December 18, 2025

References

1. Solmi M, Raduà J, Olivola M, et al. Age at onset of mental disorders worldwide: large-scale meta-analysis of 192 epidemiological studies. Molecular Psychiatry. 2021. doi:10.1038/s41380-021-01161-7

2. Kodsi A, Bullock B, Kennedy G, Tirlea L. Psychological Interventions to Improve Sleep in Young Adults: A Systematic Review and Meta-analysis of Randomized Controlled Trials. Behavioral Sleep Medicine. 2021. doi:10.1080/15402002.2021.1876062

3. Gulliver A, Griffiths KM, Christensen H. Perceived barriers and facilitators to mental health help-seeking in young people: a systematic review. BMC Psychiatry. 2010. doi:10.1186/1471-244x-10-113

4. de Angel V, Lewis S, White KM, et al. Digital health tools for the passive monitoring of depression: a systematic review of methods. npj Digital Medicine. 2022. doi:10.1038/s41746-021-00548-8

The Clinical Lighthouse