sort (C++)

From HandWiki

sort is a generic function in the C++ Standard Library for doing comparison sorting. The function originated in the Standard Template Library (STL).

The specific sorting algorithm is not mandated by the language standard and may vary across implementations, but the worst-case asymptotic complexity of the function is specified: a call to sort must perform no more than O(N log N) comparisons when applied to a range of N elements.[1]

Usage

The sort function is included from the <algorithm> header of the C++ Standard Library, and carries three arguments: RandomAccessIterator first, RandomAccessIterator last, Compare comp. Here, RandomAccessIterator is a templated type that must be a random access iterator, and first and last must define a sequence of values, i.e., last must be reachable from first by repeated application of the increment operator to first. The third argument, also of a templated type, denotes a comparison predicate. This comparison predicate must define a strict weak ordering on the elements of the sequence to be sorted. The third argument is optional; if not given, the "less-than" (<) operator is used, which may be overloaded in C++.

This code sample sorts a given array of integers (in ascending order) and prints it out.

#include <algorithm>
#include <iostream>

int main() {
  int array[] = { 23, 5, -10, 0, 0, 321, 1, 2, 99, 30 };
  std::sort(std::begin(array), std::end(array));
  for (size_t i = 0; i < std::size(array); ++i) {
     std::cout << array[i] << ' ';
  }
  std::cout << '\n';
}

The same functionality using a vector container, using its begin and end methods to obtain iterators:

#include <algorithm>
#include <iostream>
#include <vector>

int main() {
  std::vector<int> vec = { 23, 5, -10, 0, 0, 321, 1, 2, 99, 30 };
  std::sort(vec.begin(), vec.end());
  for (size_t i = 0; i < vec.size(); ++i) {
     std::cout << vec[i] << ' ';
  }
  std::cout << '\n';
}

Genericity

sort is specified generically, so that it can work on any random-access container and any way of determining that an element x of such a container should be placed before another element y.

Although generically specified, sort is not easily applied to all sorting problems. A particular problem that has been the subject of some study is the following:

  • Let A and B be two arrays, where there exists some relation between the element A[i] and the element B[i] for all valid indices i.
  • Sort A while maintaining the relation with B, i.e., apply the same permutation to B that sorts A.
  • Do the previous without copying the elements of A and B into a new array of pairs, sorting, and moving the elements back into the original arrays (which would require O(n) temporary space).

A solution to this problem was suggested by A. Williams in 2002, who implemented a custom iterator type for pairs of arrays and analyzed some of the difficulties in correctly implementing such an iterator type.[2] Williams's solution was studied and refined by K. Åhlander.[3]

Complexity and implementations

The C++ standard requires that a call to sort performs O(N log N) comparisons when applied to a range of N elements.[4] In previous versions of C++, such as C++03, only average complexity was required to be O(N log N).[5] This was to allow the use of algorithms like (median-of-3) quicksort, which are fast in the average case, indeed significantly faster than other algorithms like heap sort with optimal worst-case complexity, and where the worst-case quadratic complexity rarely occurs.[6] The introduction of hybrid algorithms such as introsort allowed both fast average performance and optimal worst-case performance, and thus the complexity requirements were tightened in later standards.

Different implementations use different algorithms. The GNU Standard C++ library, for example, uses a 3-part hybrid sorting algorithm: introsort is performed first (introsort itself being a hybrid of quicksort and heap sort), to a maximum depth given by 2×log2 n, where n is the number of elements, followed by an insertion sort on the result.[7]

Other types of sorting

sort is not stable: equivalent elements that are ordered one way before sorting may be ordered differently after sorting. stable_sort ensures stability of result at expense of worse performance (in some cases), requiring only quasilinear time with exponent 2 – O(n log2 n) – if additional memory is not available, but linearithmic time O(n log n) if additional memory is available.[8] This allows the use of in-place merge sort for in-place stable sorting and regular merge sort for stable sorting with additional memory.

Partial sorting is implemented by partial_sort, which takes a range of n elements and an integer m < n, and reorders the range so that the smallest m elements are in the first m positions in sorted order (leaving the remaining nm in the remaining positions, in some unspecified order). Depending on design this may be considerably faster than complete sort. Historically, this was commonly implemented using a heap-based algorithm that takes Θ(n + m log n) worst-case time. A better algorithm called quickselsort is used in the Copenhagen STL implementation, bringing the complexity down to Θ(n + m log m).[9]

Selection of the nth element is implemented by nth_element, which actually implements an in-place partial sort: it correctly sorts the nth element, and also ensures that this element partitions so elements before it are less than it, and elements after it are greater than it. There is the requirement that this takes linear time on average, but there is no worst-case requirement; these requirements are exactly met by quickselect, for any choice of pivot strategy.

Some containers, among them list, provide specialised version of sort as a member function. This is because linked lists don't have random access (and therefore can't use the regular sort function); and the specialised version also preserves the values list iterators point to.

Comparison to qsort

Aside from sort, the C++ standard library also includes the qsort function from the C standard library. Compared to qsort, the templated sort is more type-safe since it does not require access to data items through unsafe void pointers, as qsort does. Also, qsort accesses the comparison function using a function pointer, necessitating large numbers of repeated function calls, whereas in sort, comparison functions may be inlined into the custom object code generated for a template instantiation. In practice, C++ code using sort is often considerably faster at sorting simple data like integers than equivalent C code using qsort.[10]

References

  1. "Working Draft, Standard for Programming Language C++". ISO. p. 911. http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4296.pdf. 
  2. Williams, Anthony (2002). "Pairing off iterators". Just Software Solutions. https://www.justsoftwaresolutions.co.uk/articles/pair_iterators.pdf. 
  3. Åhlander, Krister (2005). "Sorting Out the Relationships Between Pairs of Iterators, Values, and References". Proc. Int'l Conf. Generative Programming: Concepts & Experiences. 3676. pp. 342–356. 
  4. "Working Draft, Standard for Programming Language C++". ISO. p. 911. http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4296.pdf. 
  5. ISO/IEC (2003). ISO/IEC 14882:2003(E): Programming Languages – C++ §25.3.1.1 sort [lib.sort] para. 2
  6. "Generic Algorithms", David Musser
  7. libstdc++ Documentation: Sorting Algorithm
  8. stable_sort
  9. Martínez, Conrado (2004). "Partial quicksort". Proc. 6th ACM-SIAM Workshop on Algorithm Engineering and Experiments and 1st ACM-SIAM Workshop on Analytic Algorithmics and Combinatorics. http://www.lsi.upc.edu/~conrado/research/reports/ALCOMFT-TR-03-50.pdf. 
  10. Meyers, Scott (2001). Effective STL: 50 specific ways to improve your use of the standard template library. Addison-Wesley. pp. 203. ISBN 0-201-74962-9. 

External links