Collaborative filtering has been a dominant approach in the recommender systems community since the early 1990s. Collaborative filtering (and other) algorithms, however, have been predominantly evaluated by aggregating results across users or user groups. These performance averages hide large disparities: an algorithm may perform very well for some users (or groups) and poorly for others. We show that this performance variation is large and systematic. In experiments on three large-scale datasets with an array of collaborative filtering algorithms, we demonstrate large performance disparities across algorithms, datasets, and metrics for different users. We then show that two key features characterizing users, their mean taste similarity with other users and the dispersion of that similarity, can explain performance variation more systematically than previously identified features. We use these two features to visualize algorithm performance for different users, and we point out that this mapping can capture categories of users that have been proposed before. Our results demonstrate an extensive mainstream-taste bias in collaborative filtering algorithms, which implies a fundamental fairness limitation that needs to be mitigated.
|Proceedings of the 17th ACM Conference on Recommender Systems, RecSys 2023
|Association for Computing Machinery
|Published - 14 Sep 2023
|17th ACM Conference on Recommender Systems, RecSys 2023 - Singapore, Singapore
Duration: 18 Sep 2023 → 22 Sep 2023
Bibliographic note (Funding Information):
We would like to thank Joseph Konstan and Thorsten Joachims for their insightful remarks and K. Rhett Nichols for editing the manuscript. The presented research was supported by a Sapere Aude starting grant to Pantelis P. Analytis bestowed by the Independent Research Fund Denmark.
© 2023 ACM.