Exploiting GPU and cluster parallelism in single scan frequent itemset mining

Youcef Djenouri*, Djamel Djenouri, Asma Belhadi, Alberto Cano

*Corresponding author for this work

Research output: Contribution to journalJournal articleResearchpeer-review

Abstract

This paper considers discovering frequent itemsets in transactional databases and addresses the time complexity problem by using high performance computing (HPC). Three HPC versions of the Single Scan (SS) algorithm are proposed. The first one (GSS) implements SS on a GPU (Graphics Processing Unit) architecture using an efficient mapping between thread blocks and the input data. The second approach (CSS) implements SS on a cluster architecture by scheduling independent jobs to workers in a cluster. The third, (CGSS) accelerates the frequent itemset mining process by using multiple cluster nodes equipped with GPUs. Moreover, three partitioning strategies are proposed to reduce GPU thread divergence and cluster load imbalance. Results show that CGSS outperforms SS, GSS, and CSS in terms of speedup. Specifically, CGSS provides up to a 350 times speedup for low minimum support values on large datasets. GCSS demonstrably outperforms the state-of-the-art HPC-based algorithms on big databases.

Original languageEnglish
JournalInformation Sciences
Volume496
Pages (from-to)363-377
Number of pages15
ISSN0020-0255
DOIs
Publication statusPublished - Sep 2019

    Fingerprint

Keywords

  • Big data
  • Frequent itemset mining
  • GPU
  • High-performance computing
  • Support computing

Cite this