Distributed K-means clustering with low transmission cost

Publication: Chapter in book/report/conference proceeding; Conference contribution in proceedings; Research; Peer-reviewed

Abstract

Handling large volumes of data is one of the challenges of clustering, and it often forces large data sets to be distributed across separate repositories. However, most clustering techniques require the data to be centralized. One of them, k-means, has been named one of the most influential data mining algorithms. Although exact distributed versions of the k-means algorithm have been proposed, the algorithm remains sensitive to the selection of the initial cluster prototypes and requires the number of clusters to be specified in advance. Additionally, distributed versions of clustering algorithms usually require multiple rounds of data transmission. This work tackles the problem of generating an approximate model for distributed clustering, based on k-means, for scenarios where the number of clusters in the distributed data is unknown and data transmission is slow or costly. A collection of algorithms is proposed that combines k-means clustering on each distributed subset of the data with a single round of communication. These algorithms are compared from two perspectives: a theoretical one, through asymptotic complexity analyses, and an experimental one, through a comparative evaluation of experimental results and statistical tests.
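
The general idea of pairing local k-means with a single round of communication can be illustrated with a minimal sketch. This is not the algorithms proposed in the paper, which the abstract does not specify. It is a generic scheme assumed here for illustration: each site clusters its own data, sends only its centroids and cluster sizes to a coordinator, and the coordinator runs a weighted k-means over those summaries. The function names (`weighted_kmeans`, `local_summary`, `merge_summaries`) and the toy data are hypothetical. The cluster counts `k_local` and `k_global` are fixed by hand, whereas the paper targets the case where the number of clusters is unknown.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(p, centroids):
    """Index of the centroid closest to p (ties go to the lowest index)."""
    return min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))

def weighted_kmeans(points, weights, k, iters=50, seed=0):
    """Lloyd's algorithm on weighted points.

    Returns the centroids and the total weight assigned to each of them.
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    dim = len(points[0])
    for _ in range(iters):
        sums = [[0.0] * dim for _ in range(k)]
        mass = [0.0] * k
        for p, w in zip(points, weights):
            j = nearest(p, centroids)
            mass[j] += w
            for d in range(dim):
                sums[j][d] += w * p[d]
        # An empty cluster keeps its previous centroid.
        updated = [[s / mass[j] for s in sums[j]] if mass[j] > 0 else centroids[j]
                   for j in range(k)]
        if updated == centroids:
            break
        centroids = updated
    # Recompute the weight assigned to each final centroid.
    mass = [0.0] * k
    for p, w in zip(points, weights):
        mass[nearest(p, centroids)] += w
    return centroids, mass

def local_summary(site_data, k_local, seed=0):
    """Runs at one site: plain k-means, returning (centroid, count) pairs."""
    centroids, counts = weighted_kmeans(site_data, [1.0] * len(site_data), k_local, seed=seed)
    return [(c, n) for c, n in zip(centroids, counts) if n > 0]

def merge_summaries(summaries, k_global, seed=0):
    """Runs at the coordinator: clusters the received centroids, weighted by count."""
    points = [c for summary in summaries for c, _ in summary]
    weights = [n for summary in summaries for _, n in summary]
    centroids, _ = weighted_kmeans(points, weights, k_global, seed=seed)
    return centroids

# Toy data (assumed): two sites, each holding part of two well-separated groups.
site_a = [[0.1 * i, 0.0] for i in range(5)] + [[10 + 0.1 * i, 10.0] for i in range(5)]
site_b = [[0.1 * i, 1.0] for i in range(5)] + [[10 + 0.1 * i, 11.0] for i in range(5)]

# The only communication: each site sends its summary to the coordinator once.
summaries = [local_summary(site, k_local=2) for site in (site_a, site_b)]
global_centroids = merge_summaries(summaries, k_global=2)
print(sorted(global_centroids))  # one centroid near each of the two groups
```

In this sketch each site transmits `k_local` centroids and counts instead of its raw points, so the transmission cost depends on the number of local clusters rather than on the size of the local data set. The result is an approximation of what centralized k-means would produce, not an exact equivalent.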

Original language: English
Title: Proceedings - 2013 Brazilian Conference on Intelligent Systems, BRACIS 2013
Publisher: IEEE
Publication date: 2013
Pages: 70-75
Article number: 6726428
ISBN (Print): 9780769550923
Status: Published - 2013
Published externally: Yes
Event: 2nd Brazilian Conference on Intelligent Systems, BRACIS 2013 - Fortaleza, Ceara, Brazil
Duration: 20 Oct 2013 to 24 Oct 2013

Conference

Conference: 2nd Brazilian Conference on Intelligent Systems, BRACIS 2013
Country/Territory: Brazil
City: Fortaleza, Ceara
Period: 20/10/2013 to 24/10/2013
Sponsor: Brazilian Computer Society (SBC)
