Integrative Dynamic Reconfiguration in a Parallel Stream Processing Engine

Kasper Grud Skat Madsen, Yongluan Zhou, Jianneng Cao

Publication: Contribution to book/anthology/report/conference proceeding; Conference contribution in proceedings; Research; Peer-reviewed

Abstract

Load balancing, operator instance collocations and horizontal scaling are critical issues in Parallel Stream Processing Engines to achieve low data processing latency, optimized cluster utilization and minimized communication cost, respectively. In previous work, these issues are typically tackled separately and independently. We argue that these problems are tightly coupled in the sense that they all need to determine the allocations of workloads and migrate computational states at runtime. Optimizing them independently would result in suboptimal solutions. Therefore, in this paper, we investigate how these three issues can be modeled as one integrated optimization problem. In particular, we first consider jobs where workload allocations have little effect on the communication cost, and model the problem of load balance as a Mixed-Integer Linear Program. Afterwards, we present an extended solution called ALBIC, which supports general jobs. We implement the proposed techniques on top of Apache Storm, an open-source Parallel Stream Processing Engine. The extensive experimental results over both synthetic and real datasets show that our techniques clearly outperform existing approaches.
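To make the load-balancing formulation mentioned in the abstract more concrete, here is a minimal sketch of a toy version of the problem. It is not the model from the paper: the function `balance`, its parameters, and the migration budget `max_moves` are hypothetical names chosen for illustration, and exhaustive search stands in for a Mixed-Integer Linear Program solver. The sketch assigns key groups to nodes so that the heaviest node carries as little load as possible, while limiting how many key groups may migrate.

```python
from itertools import product


def balance(loads, current, num_nodes, max_moves):
    """Search all assignments of key groups to nodes (toy sizes only).

    loads[i]   : load of key group i (hypothetical units)
    current[i] : node that currently hosts key group i
    max_moves  : assumed budget on the number of migrated key groups
    Returns (max_node_load, assignment) with the smallest heaviest node;
    ties are broken in favor of fewer migrations.
    """
    best = None
    for assignment in product(range(num_nodes), repeat=len(loads)):
        # Count key groups whose node differs from the current placement.
        moves = sum(1 for a, c in zip(assignment, current) if a != c)
        if moves > max_moves:
            continue
        # Accumulate the total load placed on each node.
        node_load = [0] * num_nodes
        for group, node in enumerate(assignment):
            node_load[node] += loads[group]
        candidate = (max(node_load), moves, assignment)
        if best is None or candidate < best:
            best = candidate
    return best[0], best[2]


if __name__ == "__main__":
    # Four key groups, three of them crowded onto node 0.
    print(balance([5, 4, 3, 2], [0, 0, 0, 1], num_nodes=2, max_moves=1))
```

In an actual MILP, the assignment would typically be encoded as binary variables, with the maximum node load as the minimized objective and the migration budget as a linear constraint; the enumeration above only mirrors that structure on an input small enough to check by hand.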
Original language: English
Title: Proceedings of the 33rd International Conference on Data Engineering (ICDE)
Publisher: IEEE Press
Publication date: 2017
Pages: 227-230
ISBN (Print): 978-1-5090-6544-8
ISBN (Electronic): 978-1-5090-6543-1
DOI: https://doi.org/10.1109/ICDE.2017.81
Status: Published - 2017
Event: 33rd International Conference on Data Engineering - San Diego, USA
Duration: 19 Apr 2017 to 22 Apr 2017
Conference number: 33

Conference

Conference: 33rd International Conference on Data Engineering
Number: 33
Country: USA
City: San Diego
Period: 19/04/2017 to 22/04/2017

Fingerprint

Engines
Communication
Processing
Resource allocation
Costs

Keywords

  • Computational modeling
  • Engines
  • Load management
  • Load modeling
  • Resource management
  • Runtime
  • Storms

Cite this

Madsen, K. G. S., Zhou, Y., & Cao, J. (2017). Integrative Dynamic Reconfiguration in a Parallel Stream Processing Engine. In Proceedings of the 33rd International Conference on Data Engineering (ICDE) (pp. 227-230). IEEE Press. https://doi.org/10.1109/ICDE.2017.81
Madsen, Kasper Grud Skat ; Zhou, Yongluan ; Cao, Jianneng. / Integrative Dynamic Reconfiguration in a Parallel Stream Processing Engine. Proceedings of the 33rd International Conference on Data Engineering (ICDE). IEEE Press, 2017. pp. 227-230
@inproceedings{05949a240cb3440cb30a6f87f2874b9f,
title = "Integrative Dynamic Reconfiguration in a Parallel Stream Processing Engine",
abstract = "Load balancing, operator instance collocations and horizontal scaling are critical issues in Parallel Stream Processing Engines to achieve low data processing latency, optimized cluster utilization and minimized communication cost, respectively. In previous work, these issues are typically tackled separately and independently. We argue that these problems are tightly coupled in the sense that they all need to determine the allocations of workloads and migrate computational states at runtime. Optimizing them independently would result in suboptimal solutions. Therefore, in this paper, we investigate how these three issues can be modeled as one integrated optimization problem. In particular, we first consider jobs where workload allocations have little effect on the communication cost, and model the problem of load balance as a Mixed-Integer Linear Program. Afterwards, we present an extended solution called ALBIC, which supports general jobs. We implement the proposed techniques on top of Apache Storm, an open-source Parallel Stream Processing Engine. The extensive experimental results over both synthetic and real datasets show that our techniques clearly outperform existing approaches.",
keywords = "Computational modeling, Engines, Load management, Load modeling, Resource management, Runtime, Storms",
author = "Madsen, {Kasper Grud Skat} and Yongluan Zhou and Jianneng Cao",
year = "2017",
doi = "10.1109/ICDE.2017.81",
language = "English",
isbn = "978-1-5090-6544-8",
pages = "227--230",
booktitle = "Proceedings of the 33rd International Conference on Data Engineering (ICDE)",
publisher = "IEEE Press",

}

Madsen, KGS, Zhou, Y & Cao, J 2017, Integrative Dynamic Reconfiguration in a Parallel Stream Processing Engine. in Proceedings of the 33rd International Conference on Data Engineering (ICDE). IEEE Press, pp. 227-230, 33rd International Conference on Data Engineering, San Diego, USA, 19/04/2017. https://doi.org/10.1109/ICDE.2017.81

Integrative Dynamic Reconfiguration in a Parallel Stream Processing Engine. / Madsen, Kasper Grud Skat; Zhou, Yongluan; Cao, Jianneng.

Proceedings of the 33rd International Conference on Data Engineering (ICDE). IEEE Press, 2017. pp. 227-230.

TY - GEN
T1 - Integrative Dynamic Reconfiguration in a Parallel Stream Processing Engine
AU - Madsen, Kasper Grud Skat
AU - Zhou, Yongluan
AU - Cao, Jianneng
PY - 2017
Y1 - 2017
AB - Load balancing, operator instance collocations and horizontal scaling are critical issues in Parallel Stream Processing Engines to achieve low data processing latency, optimized cluster utilization and minimized communication cost, respectively. In previous work, these issues are typically tackled separately and independently. We argue that these problems are tightly coupled in the sense that they all need to determine the allocations of workloads and migrate computational states at runtime. Optimizing them independently would result in suboptimal solutions. Therefore, in this paper, we investigate how these three issues can be modeled as one integrated optimization problem. In particular, we first consider jobs where workload allocations have little effect on the communication cost, and model the problem of load balance as a Mixed-Integer Linear Program. Afterwards, we present an extended solution called ALBIC, which supports general jobs. We implement the proposed techniques on top of Apache Storm, an open-source Parallel Stream Processing Engine. The extensive experimental results over both synthetic and real datasets show that our techniques clearly outperform existing approaches.
KW - Computational modeling
KW - Engines
KW - Load management
KW - Load modeling
KW - Resource management
KW - Runtime
KW - Storms
U2 - 10.1109/ICDE.2017.81
DO - 10.1109/ICDE.2017.81
M3 - Article in proceedings
SN - 978-1-5090-6544-8
SP - 227
EP - 230
BT - Proceedings of the 33rd International Conference on Data Engineering (ICDE)
PB - IEEE Press
ER -

Madsen KGS, Zhou Y, Cao J. Integrative Dynamic Reconfiguration in a Parallel Stream Processing Engine. In Proceedings of the 33rd International Conference on Data Engineering (ICDE). IEEE Press. 2017. p. 227-230. https://doi.org/10.1109/ICDE.2017.81