Distributed K-means clustering with low transmission cost

Publication: Chapter in book/report/conference proceeding › Conference contribution in proceedings › Research › Peer-reviewed


Dealing with large amounts of data is one of the challenges of clustering, and it often forces large data sets to be distributed across separate repositories. However, most clustering techniques require the data to be centralized. One of them, k-means, has been named one of the most influential data mining algorithms. Although exact distributed versions of the k-means algorithm have been proposed, the algorithm remains sensitive to the selection of the initial cluster prototypes and requires the number of clusters to be specified in advance. Additionally, distributed versions of clustering algorithms usually require multiple rounds of data transmission. This work tackles the problem of generating an approximate model for distributed clustering, based on k-means, for scenarios where the number of clusters in the distributed data is unknown and data transmission is slow or costly. A collection of algorithms is proposed that combines k-means clustering on each distributed subset of the data with a single round of communication. These algorithms are compared from two perspectives: a theoretical one, through asymptotic complexity analyses, and an experimental one, through a comparative evaluation of experimental results and statistical tests.
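The abstract does not spell out the proposed algorithms, so the following is only a minimal sketch of the general idea it describes: each site clusters its own data with k-means, sends its centroids and cluster sizes to a coordinator once, and the coordinator clusters those weighted centroids. All names (`weighted_kmeans`, `local_summary`, `merge_summaries`) are hypothetical, and the fixed `k_local` and `k_global` are a simplifying assumption; the paper targets the case where the number of clusters is unknown.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def weighted_kmeans(points, weights, k, iters=50, seed=0):
    """Lloyd's algorithm with per-point weights.

    Returns (centroids, totals), where totals[j] is the weight
    assigned to centroid j in the last assignment step.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # naive random initialisation
    dim = len(points[0])
    totals = [0.0] * k
    for _ in range(iters):
        sums = [[0.0] * dim for _ in range(k)]
        totals = [0.0] * k
        for p, w in zip(points, weights):
            # Assign the point to its nearest centroid.
            j = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            totals[j] += w
            for d, x in enumerate(p):
                sums[j][d] += w * x
        # Weighted mean per cluster; an empty cluster keeps its old centroid.
        new = [
            tuple(s / totals[j] for s in sums[j]) if totals[j] > 0 else tuple(centroids[j])
            for j in range(k)
        ]
        converged = new == centroids
        centroids = new
        if converged:
            break
    return centroids, totals


def local_summary(data, k_local, seed=0):
    """Run k-means at one site and keep only (centroid, count) pairs."""
    centroids, counts = weighted_kmeans(data, [1.0] * len(data), k_local, seed=seed)
    return [(c, n) for c, n in zip(centroids, counts) if n > 0]


def merge_summaries(summaries, k_global, seed=0):
    """Coordinator step: cluster the received centroids, weighted by count."""
    points = [c for summary in summaries for c, _ in summary]
    weights = [n for summary in summaries for _, n in summary]
    return weighted_kmeans(points, weights, k_global, seed=seed)


# Two toy sites, each holding points from two well-separated groups.
site_a = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
site_b = [(1.0, 0.0), (1.0, 1.0), (11.0, 10.0), (11.0, 11.0)]

# Single round of communication: each site sends its summary once.
summaries = [local_summary(site_a, k_local=2), local_summary(site_b, k_local=2)]
centroids, weights = merge_summaries(summaries, k_global=2)
```

In this sketch each site transmits at most `k_local` centroids and counts rather than its raw points, which is where the saving in transmission cost would come from. For the two toy sites the merged centroids land at (0.5, 0.5) and (10.5, 10.5), each carrying a weight of 4 points.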

Title: Proceedings - 2013 Brazilian Conference on Intelligent Systems, BRACIS 2013
ISBN (Print): 9780769550923
Status: Published - 2013
Published externally: Yes
Event: 2nd Brazilian Conference on Intelligent Systems, BRACIS 2013 - Fortaleza, Ceara, Brazil
Duration: 20 Oct 2013 to 24 Oct 2013


Conference: 2nd Brazilian Conference on Intelligent Systems, BRACIS 2013
City: Fortaleza, Ceara
Sponsor: Brazilian Computer Society (SBC)

