/*----------------------------------------------------------------------------*
SORTING
Lists of data in main storage can be lexicographically sorted by this function,
the precise ordering being determined by another function whose address is
supplied as a parameter. At the beginning of each list element is a forward
index, which defines the next element in the list. A zero index terminates the
list.
On entry, the indexes may link the data in any order. On exit, they have been
modified so that they link the data in proper lexicographic order. The sort is
stable, which means that elements which are lexicographically equal retain the
order they had originally. No data in main storage are actually moved.
The routine is quite fast, requiring at most about 'n log2(n)' comparisons,
where 'n' is the number of elements in the list. This is near the theoretical
maximum speed for any sort algorithm that must use pairwise comparisons.
Presequencing reduces the number of comparisons required. Even if the list is in
random order, there is some presequencing. For example, if we select two
adjacent entries in a random list, the chance is better than even that they will
already be in order (equal elements are considered to be in order). If the list
is fully ordered on entry, only 'n-1' comparisons are required. Detailed
mathematical aspects of the function's behavior have not been worked out for
general cases.
The main routine handles a few special cases itself---simple ones that occur
frequently in certain applications---then passes the remainder through the full
sorting algorithm.
ENTRY: 'list' points to an array of forward indexes. 'list[0]' is unused.
'p' indexes the first element of the list, which ends with a zero.
'n' contains the number of items in the list, if known. If zero, the
number of items is not known and 'sort' should count.
'c' is a function to compare two list elements. Its entry and exit
conditions are given below.
EXIT: 'sort' indexes the first element in the sorted list, which ends with a
zero.
*/
#include "common.h"
static int *P, pcurr, pprev, count;
static int (*order)(int,int);
int sort(int list[], int p, int n, int (*c)(int,int))
{ int i;
P = list; //Record the location of the list
order = c; //and of the ordering function.
if(n==0) for(i=p; i; i=P[i],n++); //Count the number of elements and
if(n==0||p==0) return 0; //return empty lists immediately.
if(n==1) return p; //Do the same for single elements.
if(n==2) //If the list contains only two
{ if(order(p, P[p])<=0) return p; //elements, sort it by inspection
i = P[p]; P[i] = p; P[p] = 0; //(One and two element lists are
return i; } //the most common in applications
//like hashing.)
pcurr = p; //Otherwise sort the list with the
return isort(n); //full algorithm.
}
/*----------------------------------------------------------------------------*
ENTRY/EXIT CONDITIONS FOR THE ORDERING SUBROUTINE
The lexicographic ordering routine must be compatible with the following
definition:
ENTRY: 'p' indexes list element one.
'q' indexes list element two.
EXIT: 'order' contains a status code.
'<0' Element one is less than element two.
'=0' Element one is equal to element two.
'>0' Element one is greater than element two.
int order(int p, int q)
*/
/*----------------------------------------------------------------------------*
RECURSIVE MERGE-SORT
This is a recursive merge-sort. To understand intuitively how it works, think
about writing a new sort routine in terms of an existing one. Postulate that we
already have two routines, one which accepts a list of data and sorts it into
order, another which accepts two lists already in order and merges them into a
single list, also in order. Now ask the question, given these two postulated
routines, how could we write a new sort routine? An appropriate algorithm
requires only two major steps: (1) If the number of elements in the list to be
sorted is precisely one, then return immediately, since a list containing a
single element is already sorted. (2) If the number of elements in the list to
be sorted is greater than one, then divide the list in half (as closely as
integer arithmetic permits), call the existing sort routine to sort the first
half, do the same to sort the second half, then call the merge routine to
combine the two sorted halves. Provided that the postulated sort and merge
routines work, it is clear that the new sort routine also works.
So using an existing sort routine, we have easily defined another. This is not
surprising. But now we note that the new sort routine satisfies all the
requirements of the postulated one, and therefore could be called in its place.
That is, instead of calling the postulated sort routine, we substitute recursive
calls to our newly defined routine. Through the magic of recursion, we obtain a
sort routine merely by postulating the existence of one! It remains to write
the postulated merge routine, but this task is elementary.
The appearance of recursion in a non-recursive data structure may be surprising,
but what is more surprising is that such a simple sort algorithm runs as fast as
the best known sort algorithms, and far faster than many algorithms in popular
use. The algorithm used below is somewhat more advanced than the one just
described, but all the steps described can be found in the code that follows.
The departure consists in the method used to take advantage of presequencing.
A different approach to recursive merge-sorting is given by C. Bron
(Communications of the ACM, 5.15, May, 1972. His method can be thought of as
sorting from the bottom up, whereas this one sorts from the top down. Moreover,
that method, at least in the form presented, destroys the ordering of elements
with equal keys and cannot take advantage of presequencing.
ENTRY: 'n' defines the minimum number of elements to be sorted.
'P' points to the list of forward indexes.
'pcurr' indexes the first element of the list.
'order' is a function to compare two list elements.
EXIT: 'isort' indexes the first element in the sorted list. The list ends
with a zero index.
'count' defines the number of elements which were actually sorted. This
can be greater than or equal to its value on entry.
'pcurr' indexes the element following the last element sorted. If the
entire list has been sorted, 'pcurr' is null.
*/
int isort(int n)
{
int wp1, wp2, count1;
// SORT A SINGLE ELEMENT
if(n<=1) //If a single element has been
{ if(pcurr==0) return 0; //requested, initialize variables
wp1 = pcurr; count = 0; //and check for error in count.
do //Scan forward in the list to
{ pprev = pcurr; count += 1; //find the longest string that
if ((pcurr=P[pcurr])==0) return wp1; } //is already in order.
while(order(pprev, pcurr)<=0);
P[pprev] = 0; //Break this string from the
return wp1; } //rest of the list and return.
// SORT MORE THAN ONE ELEMENT
wp1 = isort(n/2); //Sort the first part of the list
if(n<=count) return wp1; //and return if the required
count1 = count; //amount was fortuitously sorted.
wp2 = isort(n-count); //If it was not, then sort what
count += count1; //remains to be sorted on this
return imerge(wp1, wp2); //call and merge the two sublists.
}
/*----------------------------------------------------------------------------*
MERGE TWO SORTED LISTS
This function accepts two lists already sorted and merges them together to form
a single sorted list. One of the lists is designated as primary. When a merging
ambiguity occurs (that is, when the same key appears in both lists), elements
from the primary list are merged first.
Both lists end with zero index values. Merging is done by manipulation of index
values; no data are actually moved.
ENTRY: 'P' points to the list of forward indexes.
'p' indexes the first element of a sorted primary list.
'q' indexes the first element of a sorted secondary list.
'order' is a function to compare two list elements.
EXIT: 'imerge' indexes the first element of the merged list.
*/
int imerge(int p, int q)
{
int pbeg;
if(p==0) return q; //Handle cases where one or more
if(q==0) return p; //of the lists is null.
pbeg = p; //Save an index to the beginning
if(order(p,q)<=0) goto scanp; //of the list and enter the
pbeg = q; //proper scanning routine.
scans: //Scan for a secondary element
do { pprev = q; //greater than or equal to the
if ((q=P[q])==0) goto ends; } //current primary element and mend
while(order(p,q)>0); //the secondary list.
P[pprev] = p;
scanp: //Scan for a primary element
do { pprev = p; //greater than the current
if ((p=P[p])==0) goto endp; } //secondary element, mend the
while(order(p,q)<=0); //primary list, and repeat.
P[pprev] = q; goto scans;
endp: p = q; //Attach any remaining elements,
ends: P[pprev] = p; //which need not be scanned and
return pbeg; //return the merged list.
}
/*
CLARENCE LEHMAN, MARCH 1978.
Modifications:
1. Converted to C, February 1988 [CLL].
2. Internal functions marked static, April 1988 [CLL].
3. Function 'merge' made externally callable, May 1988 [CLL].
4. Pointers converted to array indices for simplicity/comprehensibility
in scientific programs, August 2009 [CLL].
5. Converted from K&R C to Ansi C format, August 2009 [CLL].
6. Separate arrays of forward pointers, February 2011 [CLL].
7. Ordering routine passed as a parameter again, March 2012 [CLL].
NOTE: I created this particular algorithm for the early microcomputers and have
used it ever since. Its present form was my very first C program. I would code
it a little differently now, but that recoding would not affect its speed.
*/