Federated Principal Component Analysis for Genome-Wide Association Studies

Anne Hartebrodt, Reza Nasirigerdeh, David B. Blumenthal, Richard Röttger

Publication: Chapter in book/report/conference proceeding > Conference contribution in proceedings > Research > Peer-reviewed


Federated learning (FL) has emerged as a privacy-aware alternative to centralized data analysis, especially for biomedical analyses such as genome-wide association studies (GWAS). The data remains with the owner, which enables studies previously impossible due to privacy protection regulations. Principal component analysis (PCA) is a frequent preprocessing step in GWAS, where the eigenvectors of the sample-by-sample covariance matrix are used as covariates in the statistical tests. Therefore, a federated version of PCA suitable for vertical data partitioning is required for federated GWAS. Existing federated PCA algorithms exchange the complete sample eigenvectors, a potential privacy breach. In this paper, we present a federated PCA algorithm for vertically partitioned data which does not exchange the sample eigenvectors and is hence suitable for federated GWAS. We show that it outperforms existing federated solutions in terms of convergence behavior and scalability. Additionally, we provide a user-friendly privacy-aware web tool to promote acceptance of federated PCA among GWAS researchers.

Title: Proceedings - 21st IEEE International Conference on Data Mining, ICDM 2021
Editors: James Bailey, Pauli Miettinen, Yun Sing Koh, Dacheng Tao, Xindong Wu
ISBN (Print): 978-1-6654-2399-1
ISBN (Electronic): 978-1-6654-2398-4
Status: Published - 2021
Event: 21st IEEE International Conference on Data Mining, ICDM 2021 - Virtual, Online, New Zealand
Duration: 7 Dec 2021 to 10 Dec 2021


Conference: 21st IEEE International Conference on Data Mining, ICDM 2021
Country/Territory: New Zealand
City: Virtual, Online
Sponsor: Google Inc., IEEE Technical Committee on Intelligent Informatics, School of Computer Science - The University of Auckland, Two Sigma, US National Science Foundation (NSF)
Name: Proceedings - IEEE International Conference on Data Mining, ICDM

Bibliographical note

Funding Information:
The FeatureCloud project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 826078. This publication reflects only the authors’ view and the European Commission is not responsible for any use that may be made of the information it contains.

Publisher Copyright:
© 2021 IEEE.

