Heap Sort

HeapSort comes in two flavors - recursive and non-recursive. Here we will discuss the non-recursive.

Priority Queue
A priority queue is a logical data structure that supports two operations. These are enque and deque.

Enque inserts an element into the priority queue.
Deque returns the element with the greatest (or least) value from the priority queue.

Heap
A heap is a binary tree with a special property. The property is that the value of any node must be greater than the value of either child of the node. (An alternative implementation of a heap places the lower values on top rather than the higher values. I.e. the parent's value must be less than the children.)


Heap Sort - the Algorithm
  • Enqueue (add) each data item into the heap. (We will use a largest-on-top heap.) This may be done with a for loop that processes each of the N input data items.
    Each queue operation requires THETA(log(k)) operations where k is the number of items currently in the queue (k <= N).
  • Dequeue (remove) each data item from the heap, maintaining the heap property, and place it into the output array starting with the highest index and moving to index=0. This may also be done with a for loop. Each dequeue operation also requires THETA(log(k)) operations where k is the number of items currently in the queue (k <= N).
  • Since the items will be removed in order, largest first, this will create a sorted list and since each dequeue and each enqueue require <= log(N) operations, HeapSort is THETA(N*log(N)) in the worst case.

    If one needs a full implementation of a priority queue using a heap for one's toolbox then coding a proper implementation of the priority queue is an excellent idea. The queue and dequeue operations can then be used in the algorithm given above to effect an implementation of HeapSort. However if one wants a more efficient implementation of HeapSort that is simpler to code one may take advantage of specific implementation details to achieve this.

    Implementing HeapSort
    A heap is an "almost complete binary tree" (i.e. it is a complete binary tree possibly missing some of the right-most entries in the bottom level). Therefore one may implement the heap using a one-dimensional array. This may be done by numbering the nodes of the binary tree 0 through N-1 beginning with the root and moving top to bottom (lower-numbered level to higher-numbered level) and left to right. Recall there is one node at the 0 level, 2 nodes at the one level, 4 nodes at the two level, ... k nodes at the k level.
    One may easily demonstrate that the left child of the node numbered k in the array is node 2*k+1 and the right child is 2*k+2. The parent of node k is (k-1)/2.

    Consider the following integers, stored in a one-dimentional array, which are to be stored in a heap, H.

    index0123456
    value1786252104


    We will now show how the input array can be simultaneously used as the heap; as the heap is built by adding input data items.
    The first item in the list, 17, is the first item to be added to the heap and so should be placed at the root of the binary tree that is the heap, H.
    After placing 17 into the heap we have exactly one item in the heap. Therefore we consider the first item of the array to be the heap and items 1 through 6 to be the rest of the input data.
    After we add the next two items, 8 and 6, we have a heap that contains three items.
    At this point items 0 through 2 are the heap and items 3 through 6 are the remaining input data items in the array pictured above. So far the properties of a heap have been preserved with no special effort.

    However, the next data item, 25, in its current position violates the heap (largest on top) property. This may be remedied by a process that we shall call "siftUp".

    In siftUp we compare the current node with its parent. If the value of the current node is greater than its parent's value then we swap the current node and its parent in the array (and thus, in the heap). We continue with this compare and swap-with-parent operation until the new value sifts up to the top of the heap or it encounters a parent that is larger.
    Applying this process to item value 25 yields the following heap/input array.

    index0123456
    value2517682104


    Finally the completed heap will look like the following:
    index0123456
    value2517108264


    Next, we may begin the process whereby we remove each element from the heap and place it into the sorted array.
    The first element to go is the top one, 25. It is swapped with the last element in the heap yielding the following.
    index0123456
    value4171082625


    25 is now in its proper position (the last item in the sorted array) but the number now at the top of the heap, 4, is not in its proper place. It needs to be "sifted down" to where it belongs. (See the siftDown function described below.)
    After sifting down 4 we have the following.
    index0123456
    value1781042625


    Now we have sorted exactly one item so position 6 in the array is a list of sorted items and positions 0 through 5 are the items remaining in the heap yet to be placed into the sorted list.
    As each successive item in the heap is placed into the sorted output list, the sorted list will expand from the largest (rightmost) index toward 0 as the heap decreases in size toward 0.

    The second item to be removed from the heap is 17. It is swapped with 6 yielding the following.
    index0123456
    value6810421725


    Now the sorted list is size two and consists of 17 and 25 in locations 5-6. The heap requires a siftDown to bring element 6 down to where it belongs. So 6 is compared to the largest of its two children, 8 and 10, and swapped with 10 since 10 > 6. Continuing in this manner we restore the heap property and are left with the following.
    index0123456
    value1086421725


    So now the heap extends from index 0 to index 4 and the sorted list extends from index 5 to index 6.

    Eventually we will have sorted the entire array yielding the following.
    index0123456
    value2468101725


    HeapSort Functions

    A is a list of integers and lastIndex is the index of the last element in the array.
    The header for HeapSort is the following.
    void HeapSort(int A[], int lastIndex)
    This is the main HeapSort function. It accepts the list A and sorts it in place.

    In support of HeapSort you should code several other functions.

    void make_heap(int a[], int Top);
    make_heap constructs a heap by queueing each input data item into the heap.

    void del_heap(int a[], int last);
    del_heap removes the top item off the heap and puts it into its proper place in the array.
    It then puts the last element of the heap on the top of the heap and calls siftDown to bring it to its proper position in the heap.

    void siftUp(int a[], int loc);
    siftUp compares an element of the heap with its parent and swaps the two if the element is larger than the parent. It continues with this compare and swap-with-parent operation until the element sifts up to the top of the heap or it encounters a parent that is larger.

    void siftDown(int a[], int last)
    siftDown brings the element on top of the heap down to its proper place. It does this as follows. It compares the element with the larger of its children. If the element is smaller that the larger child then it is swapped with that child. This continues until the element is moved to a leaf or until it is larger than both of its children. Care must be taken to handle the case where the element has only one child.

    Our driver program will, basically, generate a dataset of a certain size and then call HeapSort to sort the data. We will need several utility functions to do basic work like print out the results of the sort and to check to make sure the data is actually sorted.
    void DisplayData(float dataArray[], int lastIndex);

    bool CheckSorted(float dataArray[], int lastIndex);

    The final program should do the following:
    1. Explain its purpose to the user
    2. Ask the user for the number of numbers to be sorted
    3. Generate a single random data set
    4. Sort the data
    5. Report the size of the dataset and the time required to sort or the number of comparisons or both.

    Special Note: - A faster implementation of HeapSort
    The siftUp operation given above is required for function enque. However, if we are only using the heap to implement HeapSort then we do not acutally need the enque operation to build the heap. The heap may be built by applying the siftDown operation to some of the nodes in a particular order.
    We, obviously, do not need to apply siftDown to nodes without children. If we start with the node at the deepest level that has children (node index (n-1)/2) and apply siftDown to it and every node with lower index in the array in reverse order of array index then we will have formed a heap. (try it!) The code for this is quite simple.
    for (int i=(lastIndex-1)/2; i>=0; i--) siftDown(dataArray, lastIndex, i);
    A careful analysis will reveal that the above code executes in order N time. Although this does speed up HeapSort it does not change its order of complexity as the time required to unload the data items from the heap still requires order N*log(N) operations.

    A good extra credit enhancement would be to ask the user how many trials he/she would like and execute the program that many times with that many data sets and write the results to a file. This will enable us to compare the running times with respect to the size of the input.



    Top of this page   Top of page      Home page   Home page