Optimizing Sorting Algorithms by Using Sorting Networks

Publikation: Bidrag til tidsskriftTidsskriftartikelForskningpeer review

Resumé

In this paper, we show how the theory of sorting networks can be applied to synthesize optimized general-purpose sorting libraries. Standard sorting libraries are often based on combinations of the classic Quicksort algorithm, with insertion sort applied as base case for small, fixed, numbers of inputs. Unrolling the code for the base case by ignoring loop conditions eliminates branching, resulting in code equivalent to a sorting network. By replacing it with faster sorting networks, we can improve the performance of these algorithms. We show that by considering the number of comparisons and swaps alone we are not able to predict any real advantage of this approach. However, significant speed-ups are obtained when taking advantage of instruction level parallelism and non-branching conditional assignment instructions, both of which are common in modern CPU architectures. Furthermore, a close control of how often registers have to be spilled to memory gives us a complete explanation of the performance of different sorting networks, allowing us to choose an optimal one for each particular architecture. Our experimental results show that using code synthesized from these efficient sorting networks as the base case for Quicksort libraries results in significant real-world speed-ups.
OriginalsprogEngelsk
TidsskriftFormal Aspects of Computing
Vol/bind29
Udgave nummer3
Sider (fra-til)559-579
ISSN0934-5043
DOI
StatusUdgivet - 2017

Fingeraftryk

Sorting Networks
Sorting algorithm
Sorting
Quicksort
Instruction Level Parallelism
Swap
Sort
Insertion
Branching
Assignment
Eliminate
Choose
Program processors
Predict
Experimental Results
Data storage equipment
Libraries

Citer dette

@article{8a3c01a59ae5490dbcd201b847ec1dda,
title = "Optimizing Sorting Algorithms by Using Sorting Networks",
abstract = "In this paper, we show how the theory of sorting networks can be applied to synthesize optimized general-purpose sorting libraries. Standard sorting libraries are often based on combinations of the classic Quicksort algorithm, with insertion sort applied as base case for small, fixed, numbers of inputs. Unrolling the code for the base case by ignoring loop conditions eliminates branching, resulting in code equivalent to a sorting network. By replacing it with faster sorting networks, we can improve the performance of these algorithms. We show that by considering the number of comparisons and swaps alone we are not able to predict any real advantage of this approach. However, significant speed-ups are obtained when taking advantage of instruction level parallelism and non-branching conditional assignment instructions, both of which are common in modern CPU architectures. Furthermore, a close control of how often registers have to be spilled to memory gives us a complete explanation of the performance of different sorting networks, allowing us to choose an optimal one for each particular architecture. Our experimental results show that using code synthesized from these efficient sorting networks as the base case for Quicksort libraries results in significant real-world speed-ups.",
author = "Michael Codish and Luis Cruz-Filipe and Markus Nebel and Peter Schneider-Kamp",
year = "2017",
doi = "10.1007/s00165-016-0401-3",
language = "English",
volume = "29",
pages = "559--579",
journal = "Formal Aspects of Computing",
issn = "0934-5043",
publisher = "Springer",
number = "3",

}

Optimizing Sorting Algorithms by Using Sorting Networks. / Codish, Michael; Cruz-Filipe, Luis; Nebel, Markus; Schneider-Kamp, Peter.

I: Formal Aspects of Computing, Bind 29, Nr. 3, 2017, s. 559-579.

Publikation: Bidrag til tidsskriftTidsskriftartikelForskningpeer review

TY - JOUR

T1 - Optimizing Sorting Algorithms by Using Sorting Networks

AU - Codish, Michael

AU - Cruz-Filipe, Luis

AU - Nebel, Markus

AU - Schneider-Kamp, Peter

PY - 2017

Y1 - 2017

N2 - In this paper, we show how the theory of sorting networks can be applied to synthesize optimized general-purpose sorting libraries. Standard sorting libraries are often based on combinations of the classic Quicksort algorithm, with insertion sort applied as base case for small, fixed, numbers of inputs. Unrolling the code for the base case by ignoring loop conditions eliminates branching, resulting in code equivalent to a sorting network. By replacing it with faster sorting networks, we can improve the performance of these algorithms. We show that by considering the number of comparisons and swaps alone we are not able to predict any real advantage of this approach. However, significant speed-ups are obtained when taking advantage of instruction level parallelism and non-branching conditional assignment instructions, both of which are common in modern CPU architectures. Furthermore, a close control of how often registers have to be spilled to memory gives us a complete explanation of the performance of different sorting networks, allowing us to choose an optimal one for each particular architecture. Our experimental results show that using code synthesized from these efficient sorting networks as the base case for Quicksort libraries results in significant real-world speed-ups.

AB - In this paper, we show how the theory of sorting networks can be applied to synthesize optimized general-purpose sorting libraries. Standard sorting libraries are often based on combinations of the classic Quicksort algorithm, with insertion sort applied as base case for small, fixed, numbers of inputs. Unrolling the code for the base case by ignoring loop conditions eliminates branching, resulting in code equivalent to a sorting network. By replacing it with faster sorting networks, we can improve the performance of these algorithms. We show that by considering the number of comparisons and swaps alone we are not able to predict any real advantage of this approach. However, significant speed-ups are obtained when taking advantage of instruction level parallelism and non-branching conditional assignment instructions, both of which are common in modern CPU architectures. Furthermore, a close control of how often registers have to be spilled to memory gives us a complete explanation of the performance of different sorting networks, allowing us to choose an optimal one for each particular architecture. Our experimental results show that using code synthesized from these efficient sorting networks as the base case for Quicksort libraries results in significant real-world speed-ups.

U2 - 10.1007/s00165-016-0401-3

DO - 10.1007/s00165-016-0401-3

M3 - Journal article

VL - 29

SP - 559

EP - 579

JO - Formal Aspects of Computing

JF - Formal Aspects of Computing

SN - 0934-5043

IS - 3

ER -