GNU Info

Info Node: (gawk.info)Array Sorting

(gawk.info)Array Sorting


Prev: Multi-scanning Up: Arrays
Enter node , (file) or (file)node

Sorting Array Values and Indices with `gawk'
============================================

   The order in which an array is scanned with a `for (i in array)'
loop is essentially arbitrary.  In most `awk' implementations, sorting
an array requires writing a `sort' function.  While this can be
educational for exploring different sorting algorithms, usually that's
not the point of the program.  `gawk' provides the built-in `asort'
function (Note: String Manipulation Functions.)  that
sorts an array.  For example:

     POPULATE THE ARRAY data
     n = asort(data)
     for (i = 1; i <= n; i++)
         DO SOMETHING WITH data[i]

   After the call to `asort', the array `data' is indexed from 1 to
some number N, the total number of elements in `data'.  (This count is
`asort''s return value.)  `data[1]' <= `data[2]' <= `data[3]', and so
on.  The comparison of array elements is done using `gawk''s usual
comparison rules (*note Variable Typing and Comparison Expressions:
Typing and Comparison.).

   An important side effect of calling `asort' is that _the array's
original indices are irrevocably lost_.  As this isn't always
desirable, `asort' accepts a second argument:

     POPULATE THE ARRAY source
     n = asort(source, dest)
     for (i = 1; i <= n; i++)
         DO SOMETHING WITH dest[i]

   In this case, `gawk' copies the `source' array into the `dest' array
and then sorts `dest', destroying its indices.  However, the `source'
array is not affected.

   Often, what's needed is to sort on the values of the _indices_
instead of the values of the elements.  To do this, use a helper array
to hold the sorted index values, and then access the original array's
elements.  It works in the following way:

     POPULATE THE ARRAY data
     # copy indices
     j = 1
     for (i in data) {
         ind[j] = i    # index value becomes element value
         j++
     }
     n = asort(ind)    # index values are now sorted
     for (i = 1; i <= n; i++)
         DO SOMETHING WITH data[ind[i]]

   Sorting the array by replacing the indices provides maximal
flexibility.  To traverse the elements in decreasing order, use a loop
that goes from N down to 1, either over the elements or over the
indices.

   Copying array indices and elements isn't expensive in terms of
memory.  Internally, `gawk' maintains "reference counts" to data.  For
example, when `asort' copies the first array to the second one, there
is only one copy of the original array elements' data, even though both
arrays use the values.  Similarly, when copying the indices from `data'
to `ind', there is only one copy of the actual index strings.

   As with array subscripts, the value of `IGNORECASE' does not affect
array sorting.


automatically generated by info2www version 1.2.2.9