Validity, reliability, and challenges of data from nationwide health surveys in Denmark

Denmark has a longstanding tradition of conducting large representative health surveys with an overall aim of describing the status and trends in health and morbidity in the population. These health surveys are essential in public health surveillance by providing information for health care planning, prioritizing, and policy development. Accordingly, it is of utmost importance that data from health surveys are valid and reliable and not compromised by systematic methodological challenges or bias.

Aim: The overall aim of the present Ph.D. thesis was to investigate the validity, reliability, and challenges of using data from nationwide health surveys in Denmark. More specifically, the aims of the included studies were as follows:

•To describe the study design, including the mode of data collection, response rates, and samples of the Danish Health and Morbidity Survey in 2010, 2013, and 2017 (Paper I)
•To assess the impact of applying calibrated non-response weights on prevalence estimates of primary health care utilization (Paper II)
•To examine the agreement between self-reported and register-documented diseases (Paper III)
•To examine the consistency in self-reported lifetime use of illicit drugs using panel data (Paper IV)
•To examine the consistency in self-reported diseases using panel data (Paper V)

Method: Data for the present thesis were derived from both health surveys and registers. Paper I (2010, 2013, 2017), Paper IV (2000, 2005, 2010, 2013), and Paper V (2013, 2017) were based on data from the Danish Health and Morbidity Surveys. The whole sample and the respondents from the Danish National Health Survey 2017 were included in Paper II, whereas only data from respondents from the Danish National Health Survey 2017 were included in Paper III. In Papers I-III, health survey data were linked at the individual level to data from administrative registers.

Results: The Danish Health and Morbidity Surveys in 2010, 2013, and 2017 each included a random sample of 25,000 individuals aged 16 years or older, of whom 15,165, 14,265, and 14,022 individuals, respectively, completed the questionnaire. This corresponded to response rates of 60.7%, 57.1%, and 56.1%, respectively, with the share of responses submitted via the web questionnaire increasing from 31.7% in 2010 to 73.8% in 2017. The response rate was particularly low among young men, unmarried individuals, and individuals with an ethnic background other than Danish (Paper I). Weighting for non-response is a widely applied method to reduce the risk of non-response bias, with the overall aim of achieving generalizable population estimates. Applying non-response weights to register-based estimates of primary health care use among respondents reduced the bias relative to estimates for the entire sample (Paper II). Paper III showed that the validity of self-reported diseases varied across agreement measures and diseases when compared to register data on the same diseases. Specifically, all diseases exhibited high values of specificity and negative predictive value (>90%), i.e., in non-disease cases. In contrast, large variations were observed for sensitivity and positive predictive value. The sensitivity, i.e., the proportion of individuals identified in registers with a given disease who reported the presence of that disease in the questionnaire, varied between 65.9% for cancer and 95.0% for diabetes. The positive predictive value, i.e., the proportion of respondents with a given self-reported disease who were identified in registers with that disease, varied between 13.0% for rheumatoid arthritis and 90.1% for cancer. Repeated measurements among the same respondents revealed large inconsistencies in self-reported data on lifetime use of illicit drugs and on specified diseases. In Paper IV, the inconsistency in reported lifetime use of illicit drugs increased with higher age and with longer time since the baseline measurement, and it was more pronounced for hard drug use than for cannabis use. High levels of inconsistency were also demonstrated for self-reported diseases, especially among respondents reporting a mental disorder with a duration of less than 6 months, angina pectoris, or rheumatoid arthritis (Paper V).
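
The four agreement measures used in Paper III can be illustrated with a minimal Python sketch, assuming the register is treated as the reference standard and the self-report as the test. The function name `agreement_measures` and all counts below are hypothetical and chosen for illustration only; they are not data from the thesis.

```python
def agreement_measures(tp, fp, fn, tn):
    """Agreement between self-report (test) and register (reference).

    tp: self-reported disease, also found in register
    fp: self-reported disease, not found in register
    fn: no self-reported disease, but found in register
    tn: no self-reported disease, not found in register
    """
    return {
        # Share of register-identified cases who self-report the disease
        "sensitivity": tp / (tp + fn),
        # Share of register non-cases who do not self-report the disease
        "specificity": tn / (tn + fp),
        # Share of self-reported cases confirmed by the register
        "ppv": tp / (tp + fp),
        # Share of self-reported non-cases confirmed by the register
        "npv": tn / (tn + fn),
    }


# Hypothetical counts for a rare disease (NOT taken from the thesis)
measures = agreement_measures(tp=80, fp=120, fn=20, tn=9780)
for name, value in measures.items():
    print(f"{name}: {value:.1%}")
```

With these made-up counts, specificity and negative predictive value both exceed 90% while the positive predictive value is only 40%, which shows how a rare disease can combine high values in non-disease cases with a low positive predictive value.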

Conclusion: Declining response rates are observed in health surveys, and non-response is not equally distributed across socioeconomic and sociodemographic strata, which may result in biased population estimates. This may have substantial implications for health care planning and policy development that rely on these data. However, the use of non-response weighting can reduce bias in estimates and, thus, increase population generalizability. The validity of self-reported diseases is low for some diseases, and in such cases, register data should be preferred for both research purposes and policy development to ensure that decisions are based on high-quality data. The reliability of self-reported lifetime use of illicit drugs decreases with longer time since the baseline measurement, and the level of inconsistency is highest for hard drugs. Accordingly, survey questions on lifetime use of illicit drugs should be avoided or interpreted with caution. Moreover, additional questions on the frequency of use should be included to gain insight into the extent of the adverse health effects and, thus, the relevance in a public health context. Several self-reported diseases exhibit low test-retest reliability based on repeated measurements among the same respondents. Inconsistency levels were highest for mental disorders with a duration of less than 6 months, angina pectoris, and rheumatoid arthritis. These findings may prompt consideration of whether questionnaire revisions are needed to increase data accuracy.
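
The way non-response weighting can reduce bias may be sketched with a toy example. The Python sketch below uses simple inverse response-rate weights within strata, which is a deliberately simpler stand-in for the calibrated weights applied in Paper II and not the thesis's method. The strata, counts, and rates are hypothetical, and the sketch assumes that respondents and non-respondents within a stratum have the same rate of health care use.

```python
# Hypothetical strata (NOT thesis data): number sampled, number responding,
# and the share of respondents with a primary health care contact.
strata = {
    "young": {"sampled": 500, "responded": 200, "rate": 0.6},
    "old": {"sampled": 500, "responded": 400, "rate": 0.9},
}


def unweighted_estimate(strata):
    """Prevalence among respondents, ignoring differential non-response."""
    users = sum(s["responded"] * s["rate"] for s in strata.values())
    respondents = sum(s["responded"] for s in strata.values())
    return users / respondents


def weighted_estimate(strata):
    """Prevalence with each respondent weighted by sampled / responded."""
    weighted_users = 0.0
    weighted_total = 0.0
    for s in strata.values():
        weight = s["sampled"] / s["responded"]  # inverse response rate
        weighted_users += weight * s["responded"] * s["rate"]
        weighted_total += weight * s["responded"]
    return weighted_users / weighted_total


print(f"unweighted: {unweighted_estimate(strata):.3f}")
print(f"weighted:   {weighted_estimate(strata):.3f}")
```

In this made-up case the unweighted estimate is about 0.80 because the high-use stratum is over-represented among respondents, whereas the weighted estimate is about 0.75, matching the full sample under the stated assumption.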
