Redundancies in Data and their Effect on the Evaluation of Recommendation Systems: A Case Study on the Amazon Reviews Datasets

Daniel Basaran, Eirini Ntoutsi, Arthur Zimek

Research output: Chapter in Book/Report/Conference proceeding, Article in proceedings, Research, peer-review

Abstract

A collection of datasets crawled from Amazon, “Amazon reviews”, is popular in the evaluation of recommendation systems. These datasets, however, contain redundancies (duplicated recommendations for variants of certain items). These redundancies went unnoticed in earlier uses of the datasets and thus led, to a certain extent, to wrong conclusions in the evaluation of algorithms tested on them. We analyze the nature and amount of these redundancies and their impact on the evaluation of recommendation methods. While the general and obvious conclusion is that redundancies should be avoided and datasets should be carefully preprocessed, we observe more specifically that their impact depends on the complexity of the methods. With this work, we also want to raise awareness of the importance of data quality, model understanding, and appropriate evaluation.
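As an illustration of the kind of preprocessing the abstract calls for, the following is a minimal sketch and not the authors' procedure. It assumes, hypothetically, that a redundant entry shows up as the same user submitting the same rating and text for several item variants; the field names `user`, `item`, `rating`, and `text` and the helper `collapse_variant_duplicates` are invented for this example.

```python
from collections import defaultdict


def collapse_variant_duplicates(reviews):
    """Keep one review per (user, rating, text) group and count the extra copies.

    Assumption for this sketch: identical user, rating, and normalized text
    on different items indicates a review duplicated across item variants.
    """
    groups = defaultdict(list)
    for review in reviews:
        # Normalize the text lightly so trivial whitespace/case changes still match.
        key = (review["user"], review["rating"], review["text"].strip().lower())
        groups[key].append(review)

    # Keep the first copy of each group; every further copy is counted as redundant.
    kept = [copies[0] for copies in groups.values()]
    redundant = sum(len(copies) - 1 for copies in groups.values())
    return kept, redundant


# Hypothetical toy data: one review duplicated across two colour variants.
reviews = [
    {"user": "u1", "item": "shirt-red", "rating": 5, "text": "Great fit"},
    {"user": "u1", "item": "shirt-blue", "rating": 5, "text": "Great fit"},
    {"user": "u2", "item": "shirt-red", "rating": 3, "text": "Runs small"},
]

kept, redundant = collapse_variant_duplicates(reviews)
print(len(kept), redundant)
```

Comparing an evaluation metric on `reviews` and on `kept` would be one simple way to see how much such duplicates influence a given method.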
Original language: English
Title of host publication: Proceedings of the 2017 SIAM International Conference on Data Mining
Volume: 17
Publisher: Society for Industrial and Applied Mathematics
Publication date: 2017
Pages: 390-398
ISBN (Electronic): 978-1-61197-497-3
DOIs: https://doi.org/10.1137/1.9781611974973.44
Publication status: Published - 2017
Event: SIAM International Conference on Data Mining - Houston, United States
Duration: 27 Apr 2017 to 29 Apr 2017
http://www.siam.org/meetings/sdm17/index.php

Conference

Conference: SIAM International Conference on Data Mining
Location: Houston
Country: United States
City: Houston
Period: 27/04/2017 to 29/04/2017
Internet address: http://www.siam.org/meetings/sdm17/index.php
Series: SIAM Data Mining

Fingerprint

data quality
effect
evaluation
recommendation
method

Cite this

Basaran, D., Ntoutsi, E., & Zimek, A. (2017). Redundancies in Data and their Effect on the Evaluation of Recommendation Systems: A Case Study on the Amazon Reviews Datasets. In Proceedings of the 2017 SIAM International Conference on Data Mining (Vol. 17, pp. 390-398). Society for Industrial and Applied Mathematics. SIAM Data Mining https://doi.org/10.1137/1.9781611974973.44