The fastest sort method - FlashSort

in #popularscience8 years ago

There are numerous sorting methods, each of which has its advantages and disadvantages. Sorting algorithms are widely used in programming, but sometimes developers do not even think what algorithm works best (combination of speed and complexity of implementation).

Today I would like to tell you about sorting algorithm with O (N) number of readings and permutations.

The algorithm sorts the input data in place and does not use additional memory.

We will talk about FlashSort, that processes very large arrays with uniformly distributed data.

The method was introduced in 1998 by the German scientist Karl-Dietrich Neubert.


The operating principle is easy to explain on a specific example.

Suppose there is a large array of n elements, which values range from 1 to 100. If we find an element with a value of 50, it is reasonable to consider that its rightful place is in the middle of the array. Similarly with other elements: 5, probably should be closer to the top of the structure and 95 is appropriate to be almost in the end. As a result of careless manipulation, we quickly get nearly sorted an array.

The main task is to ensure that all elements of the array according to their values must be divided into several classes. The smallest numbers are in the first class, the largest - in the last, the other numbers - in the intermediate groups.

To sort we need to use a special buffer - "buckets". This is an array of 256 lengths, each element of which contains the size of the bucket, and a link to the border of this bucket - a pointer to the first bucket element in the original array. Initially, the buffer is initialized with zeros.

Thus, the sorting algorithm for each i-th iteration consists of four stages

Stage 1. Counting bucket size

In a first step size of buckets is calculated. We use our buffer, where a counter of the number of elements is used for each bucket. We run through all the elements, we take the value of the i-th byte and increment the counter for the corresponding bucket.

Stage 2. The alignment of the boundaries

Now, as we know the dimensions of each bucket, we can set clear boundaries in our original array for every bucket. We run through our buffer and set the boundaries – then set the pointers on the elements of each bucket.

Stage 3. Permutation

Now we rearrange the elements in the source array so that each of them was in its place, in its bucket.

Permutation algorithm is as follows:

We run through all elements of the array. For the current element, we take its i-th byte.

  • If the current element is in its place (in your bucket), then everything is OK - we move on to the next element.

  • If the item is not in its place – and the corresponding bucket is further, then we make a permutation of the element which is in the far bucket. We repeat this procedure until we get the right element with the desired bucket. Each time, making the change, we put the current element in the bucket and don’t re-analyze it. Thus, the number of permutations will never exceed N.

During each permutation, the counters in the buffer will be decremented back for the corresponding bucket, and in the end, our buffer will again be filled with zeros and is ready to be used in other iterations.

Permutation process will, in theory, be performed in the linear time since the number of permutations will never exceed N - the number of elements.

Stage 4. Recursive descent

And the last stage. Now we recursively sort items within each formed bucket. For each iteration, it is only necessary to know the border of the current bucket. Since we do not use additional memory, before moving to the inner iteration, we again run through the elements and compute the length of the current bucket.

Time Efficiency

As already mentioned, the algorithm is very effective in large arrays with uniformly distributed data. In this case, FlashSort with an average time of complexity O (n), will significantly get ahead QuickSort and other effective methods of sorting.


Despite the fact that the sorting method presented here always makes O (N) readings and permutations, yet it would be naive to expect that the algorithm will work in linear time with all types of data. It is obvious that, in practice, its speed will depend on the nature of the data that is sorted.

So you see that even this method has some minuses and we just need to wait until new really effective universal method appears.

Watch this awesome video to see all existing sorting algorithms

References: 1

Follow me, to be the first to learn about my publications devoted to popular science and educational topics

With Love,
Kate

Image credit: 1, 2