For Doctors in a Hurry
- Clinicians lack empirical benchmarks comparing patient-reported sleep duration and quality against data captured by consumer-grade wearable devices.
- The study analyzed 841 sleep instances from 130 participants who used three different Garmin smartwatch generations over seven days.
- Mean differences in sleep duration between devices and self-reports ranged from 6.58 to 21.22 minutes across the three models.
- The researchers concluded that smartwatches reliably measure sleep duration, though sleep quality estimates require further algorithmic refinement and validation.
- Physicians may use newer generation wearables to track sleep duration, but should interpret quality metrics with caution during clinical assessments.
The clinical utility of wearable sleep monitoring
Sleep assessment is central to cardiovascular disease prevention and to the longitudinal management of psychiatric conditions such as depression [1, 2]. The European Society of Cardiology identifies sleep as a modifiable risk factor for cardiovascular events, while digital phenotyping (the use of continuous sensor data to characterize an individual's behavioral patterns) has identified sleep disturbances as a primary biomarker for differentiating between unipolar and bipolar depression [1, 2, 3]. Although traditional sleep diaries remain a clinical standard, digital health interventions like chatbots have demonstrated measurable efficacy in improving sleep duration, yielding a standardized mean difference of 0.44 (95% CI 0.32 to 0.55) [4, 5]. Clinicians frequently encounter poor patient adherence to manual tracking, whereas a meta-analysis of 135 studies involving 64,541 elementary school-aged children found a pooled adherence rate of 81.6% (95% CI 78.7% to 84.4%) for actigraphy (the use of wearable accelerometers to record movement and rest cycles) [6]. Beyond monitoring, wearable activity trackers have shown direct clinical utility by increasing physical activity by approximately 1,800 steps per day and reducing body weight by 1 kg, though their precision in measuring specific sleep parameters compared to patient perception requires further validation [7, 8, 9]. A new study now evaluates the accuracy of multiple generations of consumer smartwatches in capturing sleep duration and quality.
Comparative accuracy across device generations
The researchers evaluated the concordance (the degree of agreement between two different measurement methods) between user-reported sleep and data captured by wearable technology. Between November 7, 2023, and June 30, 2024, the study recruited 130 participants from two decentralized digital health well-being studies. These individuals completed a 7-day sleep diary, recording their perceived sleep quality and specific sleep timestamps (the exact start and end times of a sleep period). Simultaneously, participants wore one of three generations of Garmin smartwatches to sleep: the Garmin Vívoactive 4 (released in 2019), the Garmin Venu 2 Plus (2022), or the Garmin Venu 3/3S (2023). This methodology allowed the authors to analyze 841 sleep instances, comparing subjective reports against sleep timestamps and quality metrics derived directly from the devices. This longitudinal approach is particularly relevant for clinicians who rely on multi-day data to assess sleep hygiene rather than a single night's snapshot.
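The timestamp comparison described above can be illustrated with a minimal Python sketch. The function name and the example timestamps are hypothetical and are not taken from the study; the sketch only assumes that each sleep instance has a start and an end time from both the diary and the device.

```python
from datetime import datetime


def sleep_duration_minutes(start_iso: str, end_iso: str) -> float:
    """Return the length of one sleep period in minutes.

    Both arguments are ISO 8601 timestamps that include the date, so a
    period that crosses midnight is handled without special cases.
    """
    start = datetime.fromisoformat(start_iso)
    end = datetime.fromisoformat(end_iso)
    if end <= start:
        raise ValueError("sleep end must be later than sleep start")
    return (end - start).total_seconds() / 60


# Hypothetical single night: diary entry versus smartwatch record.
diary_minutes = sleep_duration_minutes("2024-01-15T23:10", "2024-01-16T06:45")
device_minutes = sleep_duration_minutes("2024-01-15T23:24", "2024-01-16T06:40")

# Positive values mean the device recorded more sleep than the diary.
difference = device_minutes - diary_minutes
print(diary_minutes, device_minutes, difference)
```

Keeping the full date in each timestamp avoids the most common error in this kind of comparison, which is treating a 23:10 to 06:45 night as a negative duration.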
The analysis showed that agreement on sleep duration improved with each newer hardware and software iteration. For the oldest model tested, the Garmin Vívoactive 4, the mean difference in sleep duration between smartwatch-recorded and participant-reported data was 21.22 minutes. This discrepancy was large enough to produce a statistically significant difference between device-recorded and self-reported mean sleep durations in the Vívoactive 4 cohort. In contrast, the newer models showed higher levels of agreement. The Garmin Venu 2 Plus exhibited a mean difference of 11.67 minutes, while the Garmin Venu 3/3S achieved a mean difference of only 6.58 minutes. Crucially, the researchers found no statistically significant differences in mean sleep durations when comparing self-reported data to the measurements from the Venu 2 Plus or the Venu 3/3S, suggesting that these newer devices provide a reliable proxy for patient-reported sleep duration in a clinical context.
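As a rough illustration of how a mean difference and its paired test statistic are obtained, the following Python sketch works through five invented nights. The data, the function name, and the choice of a simple paired t statistic are assumptions for illustration only; the article does not give the study's exact procedure for the duration comparison, and repeated nights from the same participant are not independent, which a full analysis would need to address.

```python
from math import sqrt
from statistics import mean, stdev


def paired_summary(device_minutes, diary_minutes):
    """Return (mean difference, paired t statistic, degrees of freedom).

    The difference is device minus diary for each sleep instance, so a
    positive mean indicates the device recorded more sleep on average.
    """
    if len(device_minutes) != len(diary_minutes):
        raise ValueError("both lists must describe the same sleep instances")
    diffs = [d - s for d, s in zip(device_minutes, diary_minutes)]
    n = len(diffs)
    mean_diff = mean(diffs)
    standard_error = stdev(diffs) / sqrt(n)  # sample SD of the differences
    return mean_diff, mean_diff / standard_error, n - 1


# Invented example: sleep duration in minutes for five nights.
device = [430, 415, 460, 400, 445]
diary = [420, 410, 440, 405, 430]

mean_diff, t_stat, df = paired_summary(device, diary)
print(f"mean difference = {mean_diff} min, t = {t_stat:.2f}, df = {df}")
```

The t statistic would then be compared against a t distribution with the stated degrees of freedom to obtain a p-value, for example with a statistics package.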
Discrepancies in perceived sleep quality and specialized populations
To evaluate the subjective experience of rest, participants self-reported their perceived sleep quality using the Sleep Quality Scale (SQS), a validated instrument that assesses recovery and daytime dysfunction. The researchers used several statistical methods to compare the wearable data against these patient reports, including paired t-tests and chi-square tests. A key component of the analysis was equipercentile linking, a statistical method that maps scores from one scale onto another by matching their percentile ranks, so that linked scores occupy the same relative position in their respective distributions. Here it served as a mathematical bridge between the device's proprietary 0 to 100 sleep score and the patient's 0 to 10 SQS report, allowing a direct comparison between sensor-derived data and the patient's internal experience.
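The idea behind equipercentile linking can be sketched in a few lines of Python. This is a simplified illustration with invented scores and hypothetical function names, not the authors' implementation; published applications typically smooth the score distributions before linking, which is omitted here.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(sorted_scores, x):
    """Mid-percentile rank of x (0 to 100) within a sorted score list."""
    below = bisect_left(sorted_scores, x)
    at_or_below = bisect_right(sorted_scores, x)
    return 100.0 * (below + at_or_below) / (2 * len(sorted_scores))


def value_at_percentile(sorted_scores, p):
    """Linearly interpolated score at percentile p (0 to 100)."""
    position = (p / 100.0) * (len(sorted_scores) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_scores) - 1)
    fraction = position - lower
    return sorted_scores[lower] + fraction * (sorted_scores[upper] - sorted_scores[lower])


def equipercentile_link(device_scores, sqs_scores, device_score):
    """Map a 0-100 device score to the SQS value at the same percentile."""
    device_sorted = sorted(device_scores)
    sqs_sorted = sorted(sqs_scores)
    return value_at_percentile(sqs_sorted, percentile_rank(device_sorted, device_score))


# Invented samples: device sleep scores (0-100) and SQS ratings (0-10).
device_scores = [40, 55, 60, 62, 70, 75, 80, 85, 90, 95]
sqs_scores = [2, 4, 5, 5, 6, 7, 7, 8, 9, 10]

linked = equipercentile_link(device_scores, sqs_scores, 70)
print(f"device score 70 links to SQS of about {linked:.2f}")
```

Because the mapping depends only on relative position, it stays monotonic: a higher device score never links to a lower SQS value.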
The results of the equipercentile linking revealed that the concordance between smartwatch sleep scores and self-reported sleep quality was highest for SQS scores between 4 and 7. However, the devices were less accurate at the ends of the spectrum. Disagreements in sleep quality assessment were observed at SQS ranges from 0 to 4 and 7 to 10, indicating that the smartwatches struggled to capture the nuances of very poor or exceptionally high-quality sleep as perceived by the patient. For clinicians, this suggests that while a wearable may provide a reliable estimate of sleep duration, the device's internal sleep score may not fully reflect a patient's subjective dissatisfaction or high satisfaction with their rest, particularly in cases of severe sleep disturbance or optimal recovery. This finding is critical for the management of insomnia, where the discrepancy between objective sleep time and subjective sleep quality is often a primary clinical focus.
The study also included exploratory analyses that quantified the difference between reported and recorded sleep duration in healthcare shift workers. This sub-analysis is clinically relevant because shift work is frequently associated with circadian rhythm disruption (a misalignment between the internal biological clock and the external environment) and inconsistent sleep hygiene. By comparing the diary entries of these workers against the timestamps recorded by the smartwatches, the researchers highlighted the challenges of tracking sleep in populations with non-traditional schedules. These findings underscore the need to refine wearable algorithms so that they better accommodate the irregular sleep patterns common among medical professionals and other shift-based populations, and so that digital monitoring tools remain accurate even when biological rhythms are fragmented.
References
1. Visseren FL, Mach F, Smulders YM, et al. 2021 ESC Guidelines on cardiovascular disease prevention in clinical practice. European Heart Journal. 2021. doi:10.1093/eurheartj/ehab484
2. Angel VD, Lewis S, White KM, et al. Digital health tools for the passive monitoring of depression: a systematic review of methods. npj Digital Medicine. 2022. doi:10.1038/s41746-021-00548-8
3. Zhong R, Wu X, Chen J, Fang Y. Using Digital Phenotyping to Discriminate Unipolar Depression and Bipolar Disorder: Systematic Review. Journal of Medical Internet Research. 2025. doi:10.2196/72229
4. Singh B, Olds T, Brinsley J, et al. Systematic review and meta-analysis of the effectiveness of chatbots on lifestyle behaviours. npj Digital Medicine. 2023. doi:10.1038/s41746-023-00856-1
5. Aggarwal A, Tam CC, Wu D, Li X, Qiao S. Artificial Intelligence–Based Chatbots for Promoting Health Behavioral Changes: Systematic Review. Journal of Medical Internet Research. 2023. doi:10.2196/40789
6. Morris A, Seker A, Telesia L, et al. Adherence to Actigraphic Devices in Elementary School–Aged Children: Systematic Review and Meta-Analysis. Journal of Medical Internet Research. 2025. doi:10.2196/79718
7. Ferguson T, Olds T, Curtis R, et al. Effectiveness of wearable activity trackers to increase physical activity and improve health: a systematic review of systematic reviews and meta-analyses. The Lancet Digital Health. 2022. doi:10.1016/s2589-7500(22)00111-x
8. Alishaq M, Hag WA, Saleh N, et al. Exploring the Role of Wearable Electronic Medical Devices in Enhancing Patient Safety and Quality of Life for Older Adults: A Systematic Review. NILES journal for Geriatric and Gerontology. 2024. doi:10.21608/niles.2024.337727.1103
9. Brickwood K, Watson G, O’Brien J, Williams AD. Consumer-Based Wearable Activity Trackers Increase Physical Activity Participation: Systematic Review and Meta-Analysis. JMIR mHealth and uHealth. 2019. doi:10.2196/11819