CRGC: Fault-Recovering Actor Garbage Collection in Pekko

Dan Plyukhin*, Gul Agha, Fabrizio Montesi

*Corresponding author for this work

Research output: Contribution to journal › Journal article › Research › peer-review

Abstract

Actors are lightweight reactive processes that communicate by asynchronous message-passing. Actors address common problems like concurrency control and fault tolerance, but resource management remains challenging: in all four of the most popular actor frameworks (Pekko, Akka, Erlang, and Elixir) programmers must explicitly kill actors to free up resources. To simplify resource management, researchers have devised actor garbage collectors (actor GCs) that monitor the application and detect when actors are safe to kill. However, existing actor GCs are impractical for distributed systems where the network is unreliable and nodes can fail. The simplest actor GCs do not collect cyclic garbage, whereas more sophisticated actor GCs are not fault-recovering: dropped messages and crashed nodes can cause actors to become garbage that never gets collected. We present Conflict-free Replicated Garbage Collection (CRGC): the first fault-recovering cyclic actor GC. In CRGC, actors and nodes record information locally and broadcast updates to the garbage collectors running on each node. CRGC does not require locks, explicit memory barriers, or any assumptions about message delivery order, except for reliable FIFO channels from actors to their local garbage collector. Moreover, CRGC is simple: we concisely present its operational semantics, which has been formalized in TLA+, and prove both soundness (non-garbage actors are never killed) and completeness (all garbage actors are eventually killed, under reasonable assumptions). We also present a preliminary implementation in Apache Pekko and measure its performance using two actor benchmark suites. Our results show the performance overhead of CRGC is competitive with simpler approaches like weighted reference counting, while also being much more powerful.
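
The abstract names weighted reference counting as one of the simpler approaches and notes that the simplest actor GCs do not collect cyclic garbage. The sketch below illustrates both points in a single-process setting. It is not taken from the paper or from the Pekko implementation, and every name in it (`Target`, `Ref`, `INITIAL_WEIGHT`, `held`) is a hypothetical chosen for illustration. Real actor GCs exchange weights through messages rather than direct method calls.

```python
INITIAL_WEIGHT = 1 << 16  # hypothetical starting weight for each fresh reference


class Target:
    """An object (standing in for an actor) tracked by weighted reference counting."""

    def __init__(self, name):
        self.name = name
        self.total_weight = 0   # sum of the weights of all outstanding references
        self.collected = False
        self.held = []          # references this object holds to other objects

    def new_ref(self):
        """Mint a fresh reference and record its weight."""
        self.total_weight += INITIAL_WEIGHT
        return Ref(self, INITIAL_WEIGHT)

    def return_weight(self, weight):
        """Called when a reference is dropped; collect once all weight is back."""
        self.total_weight -= weight
        if self.total_weight == 0 and not self.collected:
            self.collected = True
            # A collected object releases everything it was pointing to.
            for ref in self.held:
                ref.drop()
            self.held.clear()


class Ref:
    """A weighted reference to a Target."""

    def __init__(self, target, weight):
        self.target = target
        self.weight = weight

    def copy(self):
        """Split the weight locally, without notifying the target."""
        if self.weight < 2:
            raise RuntimeError("weight exhausted; a real scheme would request more")
        half = self.weight // 2
        self.weight -= half
        return Ref(self.target, half)

    def drop(self):
        """Return this reference's weight to the target."""
        if self.weight == 0:
            return  # already dropped
        weight, self.weight = self.weight, 0
        self.target.return_weight(weight)


def acyclic_demo():
    """a -> b with one external reference to a: dropping it frees both."""
    a, b = Target("a"), Target("b")
    root = a.new_ref()
    a.held.append(b.new_ref())
    root.drop()
    return a.collected, b.collected


def cyclic_demo():
    """c -> d -> c: after the external reference is dropped, neither is freed."""
    c, d = Target("c"), Target("d")
    root = c.new_ref()
    c.held.append(d.new_ref())
    d.held.append(root.copy())
    root.drop()
    return c.collected, d.collected
```

In this sketch `acyclic_demo()` frees both objects, while `cyclic_demo()` leaves both uncollected because each object still holds weight belonging to the other. That is the cyclic-garbage limitation the abstract attributes to the simplest actor GCs.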

Original language: English
Journal: Proceedings of the ACM on Programming Languages
Volume: 9
Issue number: PLDI
Pages (from-to): 945-969
ISSN: 2475-1421
Publication status: Published - 2025

Keywords

  • actor model
  • actors
  • distributed systems
  • fault tolerance
  • garbage collection
