PAC-Bayesian lifelong learning for multi-armed bandits

Hamish Flynn*, David Reeb, Melih Kandemir, Jan Peters

*Corresponding author for this work

Research output: Contribution to journal, Journal article, Research, peer-review


We present a PAC-Bayesian analysis of lifelong learning. In the lifelong learning problem, a sequence of learning tasks is observed one at a time, and the goal is to transfer information acquired from previous tasks to new learning tasks. We consider the case in which each learning task is a multi-armed bandit problem. We derive lower bounds on the expected average reward that would be obtained if a given multi-armed bandit algorithm were run in a new task with a particular prior and for a set number of steps. We propose lifelong learning algorithms that use our new bounds as learning objectives. Our proposed algorithms are evaluated in several lifelong multi-armed bandit problems and are found to perform better than a baseline method that does not use generalisation bounds.

Original language: English
Journal: Data Mining and Knowledge Discovery
Pages (from-to): 841-876
Publication status: Published - Mar 2022


  • Lifelong learning
  • Multi-armed bandits
  • PAC-Bayesian

