Comparing clustering approaches for smart meter time series: Investigating the influence of dataset properties on performance

Luke W. Yerbury*, Ricardo J.G.B. Campello, G. C. Livingston, Mark Goldsworthy, Lachlan O'Neil

*Corresponding author

Research output: Contribution to journal › Journal article › Research › peer-review

Abstract

The widespread adoption of smart meters for monitoring energy consumption has generated vast quantities of high-resolution time series data which remain underutilised. While clustering has emerged as a fundamental tool for mining smart meter time series (SMTS) data, selecting appropriate clustering methods remains challenging despite numerous comparative studies. These studies often rely on problematic methodologies and consider a limited scope of methods, frequently overlooking compelling methods from the broader time series clustering literature. Consequently, they struggle to provide dependable guidance for practitioners designing their own clustering approaches. This paper presents a comprehensive comparative framework for SMTS clustering methods using expert-informed synthetic datasets that emphasise peak consumption behaviours as fundamental cluster concepts. Using a phased methodology, we first evaluated 31 distance measures and 8 representation methods using leave-one-out classification, then examined the better-suited methods in combination with 11 clustering algorithms. We further assessed the robustness of these combinations to systematic changes in key dataset properties that affect clustering performance on real-world datasets, including cluster balance, noise, and the presence of outliers. Our results revealed that methods accommodating local temporal shifts while maintaining amplitude sensitivity, particularly Dynamic Time Warping and k-sliding distance, consistently outperformed traditional approaches. Among other key findings, we identified that when combined with k-medoids or hierarchical clustering using Ward's linkage, these methods exhibited consistent robustness across varying dataset characteristics without careful parameter tuning. These and other findings inform actionable recommendations for practitioners, and validation with real-world data demonstrates that our findings translate effectively to practical SMTS clustering tasks. 
Finally, our datasets and code are publicly available to support the development, evaluation, and comparison of both novel and overlooked methods.
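The first phase described above (ranking distance measures by leave-one-out classification) can be illustrated with a minimal sketch. It is not the authors' code: the function names `dtw_distance`, `loo_1nn_accuracy`, and `peak_profile`, the Sakoe-Chiba window of 2 hours, and the toy morning/evening peak profiles are all assumptions made for illustration. The sketch assumes the classifier is 1-nearest-neighbour, a common choice for this kind of evaluation that the abstract does not state explicitly.

```python
import numpy as np


def dtw_distance(a, b, window=None):
    """Dynamic Time Warping distance with an optional Sakoe-Chiba band.

    Illustrative implementation; `window` limits how far (in samples)
    the warping path may stray from the diagonal.
    """
    n, m = len(a), len(b)
    # The band must be at least |n - m| wide for a valid path to exist.
    w = max(window, abs(n - m)) if window is not None else max(n, m)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(np.sqrt(cost[n, m]))


def loo_1nn_accuracy(series, labels, distance):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier."""
    n = len(series)
    correct = 0
    for i in range(n):
        best_j, best_d = None, np.inf
        for j in range(n):
            if j == i:
                continue  # hold out the query series itself
            d = distance(series[i], series[j])
            if d < best_d:
                best_d, best_j = d, j
        correct += int(labels[best_j] == labels[i])
    return correct / n


def peak_profile(centre, length=24, height=3.0):
    """Hypothetical hourly load profile: flat baseline plus one peak."""
    t = np.arange(length)
    return 0.5 + height * np.exp(-0.5 * ((t - centre) / 1.5) ** 2)


# Toy data (assumed): peaks shifted by an hour or two within each class.
series = [peak_profile(c) for c in (7, 8, 9, 18, 19, 20)]
labels = ["morning"] * 3 + ["evening"] * 3


def banded_dtw(x, y):
    return dtw_distance(x, y, window=2)


accuracy = loo_1nn_accuracy(series, labels, banded_dtw)
```

Passing a different callable as `distance` (for example a plain Euclidean distance) lets candidate measures be compared on the same labelled data, which is the general idea behind this kind of screening step.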

Original language: English
Article number: 125811
Journal: Applied Energy
Volume: 391
Number of pages: 28
ISSN: 0306-2619
DOI
Status: Published - 1 Aug 2025
