Abstract
Correlated failures in large-scale clusters significantly affect
systems’ availability, especially for streaming data applications that
run continuously and require low processing latency. Most
state-of-the-art distributed stream processing engines (DSPEs) adopt a
blocking recovery paradigm: upon a correlated failure, recovery is
blocked until sufficient new resources are available. Because new
resources usually arrive progressively, a blocking paradigm fails to
minimize the recovery latency. To address this problem, we propose a
progressive and query-centric recovery paradigm in which the recovery
of the failed operators is carefully scheduled so that the outputs of
queries are recovered as early as possible, given the resources
currently available. In this work, we propose and implement a
fault-tolerance framework that supports progressive recovery after
correlated failures with minimal overhead during the system’s
normal execution. We also formulate the new problem of recovery
scheduling under correlated failures and design effective algorithms
to optimize the recovery latency. The proposed methods are
implemented on Apache Storm, and preliminary experiments are
conducted to verify their validity.
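
The contrast between blocking and progressive recovery can be illustrated with a small sketch. This is not the scheduling algorithm of the paper, which the abstract does not specify. It is a simple greedy heuristic under assumed simplifications: each failed operator needs exactly one resource slot, each query depends on a known set of failed operators, and slots arrive in batches per time step. The function names `progressive_schedule` and `blocking_schedule`, and the toy data, are hypothetical.

```python
from typing import Dict, List, Set


def progressive_schedule(queries: Dict[str, Set[str]], arrivals: List[int]) -> Dict[str, int]:
    """Return the time step at which each query has all its failed operators recovered.

    queries:  query name -> failed operators it depends on (assumed model)
    arrivals: new resource slots per time step; one slot recovers one operator (assumption)
    """
    recovered: Set[str] = set()
    done_at: Dict[str, int] = {}
    for step, slots in enumerate(arrivals):
        while slots > 0:
            # Operators still missing for every query that is not yet complete.
            pending = {q: ops - recovered for q, ops in queries.items() if q not in done_at}
            if not pending:
                return done_at
            # Greedy heuristic: serve the query closest to completion (ties broken by name).
            target = min(pending, key=lambda q: (len(pending[q]), q))
            for op in sorted(pending[target]):
                if slots == 0:
                    break
                recovered.add(op)
                slots -= 1
            # Shared operators may complete several queries at once.
            for q, ops in queries.items():
                if q not in done_at and ops <= recovered:
                    done_at[q] = step
    return done_at


def blocking_schedule(queries: Dict[str, Set[str]], arrivals: List[int]) -> Dict[str, int]:
    """Baseline: nothing is recovered until slots for all failed operators have arrived."""
    needed = len(set().union(*queries.values()))
    total = 0
    for step, slots in enumerate(arrivals):
        total += slots
        if total >= needed:
            return {q: step for q in queries}
    return {}


if __name__ == "__main__":
    toy_queries = {"q1": {"a", "b"}, "q2": {"b", "c", "d"}, "q3": {"e"}}
    toy_arrivals = [1, 2, 2]
    print(progressive_schedule(toy_queries, toy_arrivals))
    print(blocking_schedule(toy_queries, toy_arrivals))
```

In this toy setting the blocking baseline restores every query only once all five slots have arrived, while the greedy progressive order restores some queries earlier. A real scheduler would also have to account for operator state size, topology dependencies, and placement, none of which are modeled here.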
| Original language | English |
|---|---|
| Title | Advances in Database Technology : Proceedings of the 20th International Conference on Extending Database Technology |
| Editors | Volker Markl, Salvatore Orlando, Bernhard Mitschang, Periklis Andritsos, Kai-Uwe Sattler, Sebastian Breß |
| Publisher | OpenProceedings |
| Publication date | 2017 |
| Pages | 518-521 |
| ISBN (Electronic) | 978-3-89318-073-8 |
| DOI | |
| Status | Published - 2017 |
| Event | 20th International Conference on Extending Database Technology - Venice, Italy. Duration: 21 Mar 2017 → 24 Mar 2017. Conference number: 20 |
Conference
| Conference | 20th International Conference on Extending Database Technology |
|---|---|
| Number | 20 |
| Country/Territory | Italy |
| City | Venice |
| Period | 21/03/2017 → 24/03/2017 |

| Name | Advances in Database Technology |
|---|---|
| Volume | 2017 |
| ISSN | 2367-2005 |
Fingerprint
Dive into the research topics of 'Progressive Recovery of Correlated Failures in Distributed Stream Processing Engines'. Together they form a unique fingerprint.