Play and learn 300 000+ tabs online

Wednesday, August 11, 2010

Heap Sort

 

We've already looked at several O(n^2) sorting algorithms, bubble sort and selection and insertion sort. Now we turn to faster sorting algorithms that can sort in time proportional to O(n*log(n)) in the average and best case time, a significant speedup, as n, the number of items to sort, grows larger. It will also now be important to consider the space taken by an algorithm. The O(n^2) sorts did not require any space over and above what was needed to store the input. Some faster sorts require (or are much easier to understand when explained by using) additional storage. Sorting methods that do not require additional space are called "in place". This property is particularly useful when dealing with large data sets, and some algorithms that seem to require additional space can be made in place with a bit of work.

Perhaps the simplest of these algorithms is heap sort, which is based on the heap data structure. The first important thing to remember about heaps is that the top element of the heap is always "next" in order (either the next highest or next lowest, in the case of numbers). Think about the consequences of this fact -- if we take all of our input values and store them in a heap, and we remove one element at a time, we will remove these elements in sorted order. But how fast is doing this?
There are two factors at work: the time it takes to create a heap by adding each element and the time it takes to remove all of the elements from a heap. Fortunately, we have a guarantee that adding a single element to and removing a single element from a heap both take O(log(n)) time. As noted above, each of these operations takes place once for each element in the input. Consequently, the algorithmic efficiency of a heap sort is O(n*log(n)), rather good indeed.
There are, however, a few tradeoffs. First, heap sort will always take O(n*log(n)) time -- while this means that its worst-case efficiency is more robust than any algorithm previously discussed, it means that even the best case time is O(n*log(n)). As you may recall, bubble sort can be optimized so that it takes only O(n) time in the best case!
Moreover, the simple implementation of heap sort will require additional space sufficient to hold a heap of size n, meaning that you need at least double the amount of space for heapsort as you do for our previous sorts. And creating a heap may have other consequences, such as increasing the constant that is normally dropped in big-O notation. As a result, for small data sets, heap sort might not be the fastest choice!
A last consideration is that heap sort is not a "stable" sort in the sense that it doesn't preserve the original order of equal elements. This might matter if, for instance, you had a list of emails that you wanted to sort by date, and you also wanted to sort alphabetically by sender. One option would be to first sort the emails alphabetically, and then sort by the date received. A stable sort would keep the alphabetical order for emails sent on the same day; an unstable sort would not.
For instance, sorting the following list of names (already in alphabetical order)

From: example@example.com       Sent: 2/1/2005
From: john@example.com Sent: 2/1/2005
From: smith@mit.edu Sent: 1/2/2005
...

using heap sort might yield


From: smith@mit.edu             Sent: 1/2/2005
From: john@example.com Sent: 2/1/2005
From: example@example.com Sent: 2/1/2005
...

Note that john@example.com is now ahead of example@example.com, even though "john" should be after "example". A proper stable sort would never put john in front of example, choosing instead to preserve the original order


From: smith@mit.edu             Sent: 1/2/2005
From: example@example.com Sent: 2/1/2005
From: john@example.com Sent: 2/1/2005
...

In some applications (for instance, mail readers) this can be an important feature. (Note that it's somewhat complicated to sort by treating both name and date as one feature because in some cases you might want the sort names in reverse order and dates in ascending order.)
You might take some time to think about how you could implement a heap sort without using any additional memory.
Summary
Heap sort is a relatively simple algorithm built upon the heap data structure. A naive implementation requires additional space, but it is possible to do a heap sort in place. Heap sort has guaranteed O(n*log(n))) performance, though the constant factor is typically a bit higher than for other algorithms such as quicksort. Heap sort is not a stable sort, so the original ordering of equal elements may not be maintained.
If you want more details on implementation, you can go here to get the source code for a heap and heap sort implementation.

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.