Comparison of non-parametric methods for ungrouping coarsely aggregated data

Silvia Rizzi, Mikael Thinggaard, Gerda Engholm, Niels Christensen, Tom Børge Johannesen, James W. Vaupel, Rune Jacobsen

Research output: Contribution to journal > Journal article > Research > peer-review

Abstract

Background
Histograms are a common tool to estimate densities non-parametrically. They are extensively encountered in health sciences to summarize data in a compact format. Examples are age-specific distributions of death or onset of diseases grouped in 5-year age classes with an open-ended age group at the highest ages. When histogram intervals are too coarse, information is lost and comparison between histograms with different boundaries is arduous. In these cases it is useful to estimate detailed distributions from grouped data.

Methods
From an extensive literature search we identify five methods for ungrouping count data. We compare the performance of two spline interpolation methods, two kernel density estimators and a penalized composite link model, first via a simulation study and then with empirical data obtained from the NORDCAN Database. All methods analyzed can be used to estimate differently shaped distributions, can handle unequal interval lengths, and allow stretches of 0 counts.
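As a rough illustration of what ungrouping means in practice, the sketch below interpolates the cumulative counts at the interval boundaries with a monotone cubic Hermite curve (Fritsch-Carlson style tangents) and differences that curve at single years of age. It is not the authors' implementation and is not claimed to be one of the five methods compared. The function names (`monotone_tangents`, `hermite`, `ungroup`), the example counts and the closed upper boundary of 20 are assumptions made for the example. An open-ended age group would first need an assumed upper limit.

```python
from bisect import bisect_right
from itertools import accumulate


def monotone_tangents(x, y):
    """Knot slopes that keep a cubic Hermite interpolant monotone."""
    n = len(x)
    h = [x[k + 1] - x[k] for k in range(n - 1)]           # interval widths
    d = [(y[k + 1] - y[k]) / h[k] for k in range(n - 1)]  # secant slopes
    m = [0.0] * n
    m[0], m[-1] = d[0], d[-1]  # simple one-sided slopes at the two ends
    for k in range(1, n - 1):
        if d[k - 1] * d[k] > 0:
            # Weighted harmonic mean of the neighbouring secant slopes.
            w1 = 2 * h[k] + h[k - 1]
            w2 = h[k] + 2 * h[k - 1]
            m[k] = (w1 + w2) / (w1 / d[k - 1] + w2 / d[k])
    return m


def hermite(x, y, m, xq):
    """Evaluate the cubic Hermite interpolant at a single point xq."""
    k = min(bisect_right(x, xq) - 1, len(x) - 2)
    h = x[k + 1] - x[k]
    t = (xq - x[k]) / h
    return ((2 * t**3 - 3 * t**2 + 1) * y[k]
            + (t**3 - 2 * t**2 + t) * h * m[k]
            + (-2 * t**3 + 3 * t**2) * y[k + 1]
            + (t**3 - t**2) * h * m[k + 1])


def ungroup(bounds, counts):
    """Split grouped counts into single-year counts (hypothetical helper)."""
    cum = [0.0] + list(accumulate(float(c) for c in counts))
    m = monotone_tangents(bounds, cum)
    curve = [hermite(bounds, cum, m, a) for a in range(bounds[0], bounds[-1] + 1)]
    return [curve[i + 1] - curve[i] for i in range(len(curve) - 1)]


# Made-up example: four 5-year age classes covering ages 0 to 19.
bounds = [0, 5, 10, 15, 20]
counts = [10, 20, 40, 30]
fine = ungroup(bounds, counts)
print([round(v, 2) for v in fine])
```

Because the curve passes through the cumulative counts at every boundary, the single-year values within each class add up to the original grouped count, and the monotone tangents keep the estimates from going negative.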

Results
The methods show similar performance when the grouping scheme is relatively narrow, i.e. 5-year age classes. With coarser age intervals, i.e. in the presence of open-ended age groups, the penalized composite link model performs best.

Conclusion
We give an overview of, and test, different methods to estimate detailed distributions from grouped count data. Health researchers can benefit from these versatile methods, which are ready for use in the statistical software R. We recommend using the penalized composite link model when data are grouped in wide age classes.
Original language: English
Article number: 59
Journal: BMC Medical Research Methodology
Volume: 16
Number of pages: 12
ISSN: 1471-2288
DOI: 10.1186/s12874-016-0157-8
Publication status: Published - 2016

Fingerprint

Age Groups
Age Distribution
Health
Research Personnel
Databases

Cite this

@article{d4b728780c6a424fbac80e33b6d8a8ea,
title = "Comparison of non-parametric methods for ungrouping coarsely aggregated data",
abstract = "Background: Histograms are a common tool to estimate densities non-parametrically. They are extensively encountered in health sciences to summarize data in a compact format. Examples are age-specific distributions of death or onset of diseases grouped in 5-year age classes with an open-ended age group at the highest ages. When histogram intervals are too coarse, information is lost and comparison between histograms with different boundaries is arduous. In these cases it is useful to estimate detailed distributions from grouped data. Methods: From an extensive literature search we identify five methods for ungrouping count data. We compare the performance of two spline interpolation methods, two kernel density estimators and a penalized composite link model, first via a simulation study and then with empirical data obtained from the NORDCAN Database. All methods analyzed can be used to estimate differently shaped distributions, can handle unequal interval lengths, and allow stretches of 0 counts. Results: The methods show similar performance when the grouping scheme is relatively narrow, i.e. 5-year age classes. With coarser age intervals, i.e. in the presence of open-ended age groups, the penalized composite link model performs best. Conclusion: We give an overview of, and test, different methods to estimate detailed distributions from grouped count data. Health researchers can benefit from these versatile methods, which are ready for use in the statistical software R. We recommend using the penalized composite link model when data are grouped in wide age classes.",
author = "Silvia Rizzi and Mikael Thinggaard and Gerda Engholm and Niels Christensen and Johannesen, {Tom B{\o}rge} and Vaupel, {James W.} and Rune Jacobsen",
year = "2016",
doi = "10.1186/s12874-016-0157-8",
language = "English",
volume = "16",
journal = "BMC Medical Research Methodology",
issn = "1471-2288",
publisher = "BioMed Central",

}

Comparison of non-parametric methods for ungrouping coarsely aggregated data. / Rizzi, Silvia; Thinggaard, Mikael; Engholm, Gerda; Christensen, Niels; Johannesen, Tom Børge; Vaupel, James W.; Jacobsen, Rune.

In: BMC Medical Research Methodology, Vol. 16, 59, 2016.

TY - JOUR

T1 - Comparison of non-parametric methods for ungrouping coarsely aggregated data

AU - Rizzi, Silvia

AU - Thinggaard, Mikael

AU - Engholm, Gerda

AU - Christensen, Niels

AU - Johannesen, Tom Børge

AU - Vaupel, James W.

AU - Jacobsen, Rune

PY - 2016

Y1 - 2016

AB - Background: Histograms are a common tool to estimate densities non-parametrically. They are extensively encountered in health sciences to summarize data in a compact format. Examples are age-specific distributions of death or onset of diseases grouped in 5-year age classes with an open-ended age group at the highest ages. When histogram intervals are too coarse, information is lost and comparison between histograms with different boundaries is arduous. In these cases it is useful to estimate detailed distributions from grouped data. Methods: From an extensive literature search we identify five methods for ungrouping count data. We compare the performance of two spline interpolation methods, two kernel density estimators and a penalized composite link model, first via a simulation study and then with empirical data obtained from the NORDCAN Database. All methods analyzed can be used to estimate differently shaped distributions, can handle unequal interval lengths, and allow stretches of 0 counts. Results: The methods show similar performance when the grouping scheme is relatively narrow, i.e. 5-year age classes. With coarser age intervals, i.e. in the presence of open-ended age groups, the penalized composite link model performs best. Conclusion: We give an overview of, and test, different methods to estimate detailed distributions from grouped count data. Health researchers can benefit from these versatile methods, which are ready for use in the statistical software R. We recommend using the penalized composite link model when data are grouped in wide age classes.

U2 - 10.1186/s12874-016-0157-8

DO - 10.1186/s12874-016-0157-8

M3 - Journal article

VL - 16

JO - BMC Medical Research Methodology

JF - BMC Medical Research Methodology

SN - 1471-2288

M1 - 59

ER -