TY - GEN
T1 - Accelerometry- and Temperature-Based Algorithms to Assess Sleep Habits Among Danish Children and Adolescents
AU - Lykke, Esben Høegholm
PY - 2023/12/4
Y1 - 2023/12/4
N2 - Introduction: Sleep is an important element in promoting health, and the quantification
of sleep has been improved with modern technology. Polysomnography, considered
the gold standard, provides in-depth insight into sleep but is costly. In contrast, accelerometry is a cheaper and less invasive method, especially for longer home-based
recordings. Machine learning is a tool that has the potential to automate and facilitate
the estimation of sleep from accelerometer data. However, there are three challenges:
producing reliable training data, ensuring data integrity through accurate removal of
non-wear, and effectively using data to estimate sleep. Firstly, it is necessary to have
sufficient and accurate annotations in the data for effective supervised machine learning,
emphasizing the importance of methods for manual annotations based on accelerometer
data. Secondly, it is essential to detect and remove periods when the device is not worn to
perform accurate analyses. Identifying periods of non-wear is challenging, as traditional
methods like logbooks can be prone to bias. Existing algorithms remove this bias, but their
accuracy is still debated. Finally, once data is correctly collected and processed, it is
crucial to apply it effectively. Current methods for estimating sleep using accelerometers
are based on data from wrist-worn and hip-worn devices, while data from thigh-worn
accelerometers remains largely untapped for sleep estimation. Objectives: Firstly, we will assess the accuracy of manual annotations for in-bed and
out-of-bed timestamps in raw accelerometer data, comparing them to the timestamps
determined by sleep diaries and an EEG-based sleep monitor. Secondly, we will assess
heuristic algorithms and machine learning models for detecting non-wear. Finally, we
will develop machine learning models for sleep classification and the estimation of sleep
quality metrics using data from thigh-worn accelerometers and compare them with
EEG-based sleep recordings. Methods: For Paper I, accelerometer data from the hip and thigh of 14 children and 19
adults were used. Using Audacity, an open-source audio editing program, three raters
annotated each accelerometer recording by marking the times when the person went to
bed and when they got out of bed. Two rounds of annotations were performed to test
reliability. The manual annotations were evaluated against both sleep diaries and EEG-based sleep recordings. Concordance and agreement were evaluated using the intraclass
correlation coefficient and Bland-Altman analyses. Paper II used accelerometer data from sensors placed on the wrist, thigh, and hip. In
hip and thigh data from 64 persons and wrist data from 42 participants, periods of
non-wear were manually annotated in the same way as described in Paper I. Three
variants of decision trees were trained on 79.2% of the data from the hip and thigh and were
evaluated against a selection of heuristic algorithms and recently developed machine
learning models. The remaining data were used as test data for all included algorithms
and models. Decision tree hyperparameters were optimized through five-fold cross-validation. External validation was performed on all wrist data. All included algorithms
and models were evaluated using metrics derived from confusion matrices. For Paper III, accelerometry and EEG-based sleep recordings from children aged 4-17
years were used. Data preprocessing included a low-pass Butterworth filter, removal of
non-wear periods using the method described in Paper II, and extraction of a set of 64
predictors. Sleep recordings were median filtered in 5- and 10-minute windows before
models were trained to better capture true awakenings. Two modeling strategies were
used: a sequential approach with four pairs of binary classification models, and a
multi-class model. Hyperparameter optimization was performed using
ten-fold Monte Carlo cross-validation on the binary classifiers. Class imbalance was
addressed using the synthetic minority oversampling technique. Data for training the
multi-class model was split in a ratio of 50/25/25 for training, validation, and testing.
For both strategies, the F1 score was used as an optimization target. Confusion matrix
derivatives were used to assess epoch-to-epoch performance, and agreements on sleep
quality metrics were assessed using Bland-Altman plots and Pearson correlations. Results: The results of Paper I indicated excellent inter- and intra-rater agreement.
Furthermore, the Bland-Altman limits of agreement were approximately ±30 min, with only a minimal mean bias of manual annotation compared to EEG-based and sleep
diary in-bed timestamps. In Paper II, for detecting non-wear periods longer than 60 minutes, the established
consecutive zeros algorithms were the most effective, registering F1-scores above 0.96.
However, for durations shorter than 60 minutes, decision trees stood out, achieving
F1-scores of over 0.74 across all sensor locations. Notably, the recently published deep
learning and random forest models could not match this performance. In Paper III, the XGBoost model performed the best when compared to an EEG-based
sleep monitor in detecting sleep. The model demonstrated small biases in sleep period
time (0.2 minutes), total sleep time (-7.0 minutes), sleep efficiency (-1.1%), and wake after
sleep onset (-0.9 minutes). The model showed a moderate correlation (0.66) for total sleep
time. Our limits of agreement for total sleep time, ranging from -95.5 to 81.4 minutes,
were consistent with previous studies on hip and wrist devices. Conclusions: Overall, the findings of this thesis underscore the reliability and precision
of emerging technological methods in sleep and non-wear detection research. Paper I
examined the agreement of manual annotations of in-bed time against traditional benchmarks and found it to be good to excellent across all comparisons. Paper II emphasized
the nuances of non-wear detection, revealing clear strengths in certain algorithms for
specific durations and highlighting areas where newer models need enhancement. Paper
III highlighted the XGBoost model for sleep assessment with thigh-worn accelerometers,
situating it as a valid alternative to methods employed on hip and wrist accelerometer data. However, challenges remain in identifying in-bed awake periods and in
assessing sleep quality metrics on an individual basis, consistent with previous findings
from wrist- and hip-worn devices.
M3 - Ph.D. thesis
PB - Syddansk Universitet. Det Sundhedsvidenskabelige Fakultet
ER -