The Role of Simulated Data in Making the Best Predictions

Stephen D. Ousley, George R. Milner, Jesper L. Boldsen, Richard L. Jantz

Research output: Contribution to journal › Conference abstract in journal › Research › peer-review

Abstract

Machine Learning (ML) methods for regression and classification, along with the bootstrap, have revolutionized the analysis of data through resampling. The resulting simulated data sets are used to select the best fitting models and to estimate prediction precision and accuracy. These two tasks are especially important in forensic analyses, which should reflect predictive data analysis because they will be applied to new cases, rather than summarized in descriptive data analysis. Naturally, we want to use the methods that are expected to be the most accurate and precise for new cases. However, as the great Zen master Berra noted, "It’s tough to make predictions, especially about the future." Predictive methods must therefore incorporate the "Known Unknowns" (Rumsfeld, 2002), and avoid overfitting by analyzing multiple independent training and test samples, each of which ideally should be large. Bootstrap and Monte Carlo methods mimic sampling variability that would be present in future cases, and both methods are incorporated into numerous routines to estimate prediction accuracy. No routine is perfect due to bias and variance issues, and to the nature of the data and the analytical method. New routines are always being explored.

This presentation provides results from two forensic scenarios: predicting sex and ancestry using bone measurements, and predicting age using many osteological traits with a new method (TA3). We demonstrate that the consequences of supposed overfitting may be relatively small in classification, and predicting age using TA3 is far more accurate than using previous methods, even with their underestimated prediction error.
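
As a rough illustration of how bootstrap resampling can be used to estimate prediction accuracy, the following minimal sketch draws repeated resamples, fits a classifier on each, and scores it on the cases left out of that resample (the "out-of-bag" cases). It is not the authors' procedure and has nothing to do with TA3: the data are synthetic, the single "measurement" and its group means are hypothetical, and the nearest-mean classifier is a stand-in chosen only for brevity.

```python
import random
from statistics import mean


def nearest_mean_classifier(train):
    """Fit a one-feature classifier by storing each class's mean measurement."""
    by_label = {}
    for x, label in train:
        by_label.setdefault(label, []).append(x)
    means = {label: mean(xs) for label, xs in by_label.items()}
    # Predict the label whose class mean is closest to the new measurement.
    return lambda x: min(means, key=lambda label: abs(x - means[label]))


def bootstrap_oob_accuracy(data, n_boot=200, seed=0):
    """Return one out-of-bag accuracy per usable bootstrap resample."""
    rng = random.Random(seed)
    n = len(data)
    accuracies = []
    for _ in range(n_boot):
        # Sample n cases with replacement to form the training set.
        idx = [rng.randrange(n) for _ in range(n)]
        chosen = set(idx)
        train = [data[i] for i in idx]
        # Cases never drawn serve as an independent test set.
        test = [data[i] for i in range(n) if i not in chosen]
        # Skip degenerate resamples (no test cases or only one class).
        if not test or len({label for _, label in train}) < 2:
            continue
        predict = nearest_mean_classifier(train)
        correct = sum(predict(x) == label for x, label in test)
        accuracies.append(correct / len(test))
    return accuracies


# Synthetic, hypothetical data: one measurement (mm) for two groups.
data_rng = random.Random(42)
data = [(data_rng.gauss(46.0, 3.0), "M") for _ in range(60)]
data += [(data_rng.gauss(40.0, 3.0), "F") for _ in range(60)]

accuracies = sorted(bootstrap_oob_accuracy(data))
low = accuracies[int(0.025 * len(accuracies))]
high = accuracies[int(0.975 * len(accuracies)) - 1]
print(f"mean out-of-bag accuracy: {mean(accuracies):.3f}")
print(f"approximate 95% percentile interval: {low:.3f} to {high:.3f}")
```

The spread of the out-of-bag accuracies gives a sense of how much the estimate could vary across future samples, which is the sampling variability the abstract refers to. Out-of-bag estimates are known to be somewhat pessimistic, which is one example of the bias issues mentioned above.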
Original language: English
Journal: American Journal of Physical Anthropology
Volume: 165
Issue number: 66
Pages (from-to): 195
Number of pages: 1
ISSN: 0002-9483
Publication status: Published - 1. Apr 2018
Event: 87th Annual Meeting of the American Association of Physical Anthropologists - Austin, United States
Duration: 11. Apr 2018 → 15. Apr 2018

Conference

Conference: 87th Annual Meeting of the American Association of Physical Anthropologists
Country/Territory: United States
City: Austin
Period: 11/04/2018 → 15/04/2018
