Info Node: (gawk.info)Internals


A Minimal Introduction to `gawk' Internals
------------------------------------------

   The truth is that `gawk' was not designed for simple extensibility.
The facilities for adding functions using shared libraries work, but
are something of a "bag on the side."  Thus, this tour is brief and
simplistic; would-be `gawk' hackers are encouraged to spend some time
reading the source code before trying to write extensions based on the
material presented here.  Of particular note are the files `awk.h',
`builtin.c', and `eval.c'.  Reading `awk.y' to see how the parse tree
is built is also useful.

   With the disclaimers out of the way, the following types, structure
members, functions, and macros are declared in `awk.h' and are useful
when writing extensions (the next minor node shows how they are used):

`AWKNUM'
     An `AWKNUM' is the internal type of `awk' floating-point numbers.
     Typically, it is a C `double'.

`NODE'
     Just about everything is done using objects of type `NODE'.  These
     contain both strings and numbers, as well as variables and arrays.

`AWKNUM force_number(NODE *n)'
     This macro forces a value to be numeric.  It returns the actual
     numeric value contained in the node.  It may end up calling an
     internal `gawk' function.

`void force_string(NODE *n)'
     This macro guarantees that a `NODE''s string value is current.  It
     may end up calling an internal `gawk' function.  It also
     guarantees that the string is zero-terminated.

`n->param_cnt'
     The number of parameters actually passed in a function call at
     runtime.

`n->stptr'
`n->stlen'
     The data and length of a `NODE''s string value, respectively.  The
     string is _not_ guaranteed to be zero-terminated.  If you need to
     pass the string value to a C library function, save the character
     at `n->stptr[n->stlen]', assign `'\0'' to that position, call the
     routine, and then restore the saved character.

`n->type'
     The type of the `NODE'.  This is a C `enum'.  Values should be
     either `Node_var' or `Node_var_array' for function parameters.

`n->vname'
     The "variable name" of a node.  This is not of much use inside
     externally written extensions.

`void assoc_clear(NODE *n)'
     Clears the associative array pointed to by `n'.  Make sure that
     `n->type == Node_var_array' first.

`NODE **assoc_lookup(NODE *symbol, NODE *subs, int reference)'
     Finds, and installs if necessary, array elements.  `symbol' is the
     array and `subs' is the subscript, which is usually a value created
     with `tmp_string' (see below).  The returned pointer refers to the
     element's value, so it can be used both to read and to assign it.
     `reference' should be `TRUE' if it is an error to use the value
     before it is created.  Typically, `FALSE' is the correct value to
     use from extension functions.

`NODE *make_string(char *s, size_t len)'
     Take a C string and turn it into a pointer to a `NODE' that can be
     stored appropriately.  This is permanent storage; understanding of
     `gawk' memory management is helpful.

`NODE *make_number(AWKNUM val)'
     Take an `AWKNUM' and turn it into a pointer to a `NODE' that can
     be stored appropriately.  This is permanent storage; understanding
     of `gawk' memory management is helpful.

`NODE *tmp_string(char *s, size_t len)'
     Take a C string and turn it into a pointer to a `NODE' that can be
     stored appropriately.  This is temporary storage; understanding of
     `gawk' memory management is helpful.

`NODE *tmp_number(AWKNUM val)'
     Take an `AWKNUM' and turn it into a pointer to a `NODE' that can
     be stored appropriately.  This is temporary storage; understanding
     of `gawk' memory management is helpful.

`NODE *dupnode(NODE *n)'
     Duplicate a node.  In most cases, this increments an internal
     reference count instead of actually duplicating the entire `NODE';
     understanding of `gawk' memory management is helpful.

`void free_temp(NODE *n)'
     This macro releases the memory associated with a `NODE' allocated
     with `tmp_string' or `tmp_number'.  Understanding of `gawk' memory
     management is helpful.

`void make_builtin(char *name, NODE *(*func)(NODE *), int count)'
     Register the C function pointed to by `func' as a new built-in
     function named `name'.  `name' is a regular C string.  `count' is
     the maximum number of arguments that the function takes.  The
     function should be written in the following manner:

          /* do_xxx --- do xxx function for gawk */

          NODE *
          do_xxx(NODE *tree)
          {
              /* fetch arguments with get_argument(), do the work,
                 and pass the result to set_value() */
              ...
          }

`NODE *get_argument(NODE *tree, int i)'
     This function is called from within a C extension function to get
     the `i''th argument from the function call.  The first argument is
     argument zero.

`void set_value(NODE *tree)'
     This function is called from within a C extension function to set
     the return value from the extension function.  This value is what
     the `awk' program sees as the return value from the new `awk'
     function.

`void update_ERRNO(void)'
     This function is called from within a C extension function to set
     the value of `gawk''s `ERRNO' variable, based on the current value
     of the C `errno' variable.  It is provided as a convenience.
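
   The save, terminate, call, and restore sequence described under
`n->stptr' and `n->stlen' can be sketched in standalone C.  This is a
minimal illustration and not `gawk' code.  `struct fake_node' and
`with_terminated_string' are hypothetical names that only mimic the two
`NODE' members, so the sketch compiles without `awk.h'.  It assumes the
buffer has one writable byte at `stptr[stlen]'.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the two NODE members described above;
   this is NOT gawk's real NODE structure. */
struct fake_node {
    char *stptr;   /* string data, not necessarily NUL-terminated */
    size_t stlen;  /* number of bytes that belong to the value */
};

/* Temporarily NUL-terminate n's string, hand it to a C library routine
   (strtol here), then restore the byte that was overwritten.
   Assumes stptr[stlen] is a writable byte inside the same buffer. */
static long with_terminated_string(struct fake_node *n)
{
    char save;
    long result;

    assert(n != NULL && n->stptr != NULL);
    save = n->stptr[n->stlen];            /* 1. save the following byte */
    n->stptr[n->stlen] = '\0';            /* 2. terminate the value */
    result = strtol(n->stptr, NULL, 10);  /* 3. call the C routine */
    n->stptr[n->stlen] = save;            /* 4. restore the saved byte */
    return result;
}
```

   For example, with `char buf[] = "12345"' and a `stlen' of 3, the
function returns 123, and `buf' still reads `12345' afterwards.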
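
   The `dupnode' entry notes that duplicating a node usually just
increments a reference count.  The toy model below shows that general
technique in standalone C.  `struct rc_node', `rc_make', `rc_dup', and
`rc_release' are hypothetical names.  `gawk''s real node layout and
release rules are defined in its own sources.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reference-counted node; a toy model, not gawk's NODE. */
struct rc_node {
    int refcount;  /* number of owners sharing this node */
    char *str;     /* payload owned by the node */
};

/* Create a node holding a private copy of s, with one owner. */
static struct rc_node *rc_make(const char *s)
{
    struct rc_node *n = malloc(sizeof *n);
    if (n == NULL)
        return NULL;
    n->str = malloc(strlen(s) + 1);
    if (n->str == NULL) {
        free(n);
        return NULL;
    }
    strcpy(n->str, s);
    n->refcount = 1;
    return n;
}

/* "Duplicate" by sharing: bump the count instead of copying the data. */
static struct rc_node *rc_dup(struct rc_node *n)
{
    assert(n != NULL && n->refcount > 0);
    n->refcount++;
    return n;
}

/* Drop one reference and free the memory when the last owner is gone.
   Returns 1 if the node was actually freed, 0 otherwise. */
static int rc_release(struct rc_node *n)
{
    assert(n != NULL && n->refcount > 0);
    if (--n->refcount > 0)
        return 0;
    free(n->str);
    free(n);
    return 1;
}
```

   Sharing makes duplication cheap.  The cost is that the shared data
must not be modified through one owner while other owners still refer
to it.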
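
   `update_ERRNO' turns the C `errno' value into text for `ERRNO'.  The
sketch below is a standalone analogue that assumes the message comes
from the standard `strerror' function.  `errno_to_message' and its
caller-supplied buffer are hypothetical.  The exact text `gawk' stores
depends on its sources and on the C library.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy the message for the current errno into buf,
   roughly what an ERRNO-style variable would hold.  Returns the errno
   value that was translated. */
static int errno_to_message(char *buf, size_t size)
{
    int code = errno;  /* capture before any other call can change it */

    assert(buf != NULL && size > 0);
    snprintf(buf, size, "%s", strerror(code));
    return code;
}
```

   Call such a routine immediately after the failing library call,
before anything else can overwrite `errno'.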

   An argument that is supposed to be an array needs some extra
handling, in case the array being passed in actually comes from a
function parameter.  The following boilerplate code shows how to do
this:

     NODE *the_arg;
     
     the_arg = get_argument(tree, 2); /* third argument; numbering is 0-based */
     if (the_arg == NULL)
         fatal("newfunc: missing third argument");
     
     /* if a parameter, get it off the stack */
     if (the_arg->type == Node_param_list)
         the_arg = stack_ptr[the_arg->param_cnt];
     
     /* parameter referenced an array, get it */
     if (the_arg->type == Node_array_ref)
         the_arg = the_arg->orig_array;
     
     /* check type */
     if (the_arg->type != Node_var && the_arg->type != Node_var_array)
         fatal("newfunc: third argument is not an array");
     
     /* force it to be an array, then clear any existing elements */
     the_arg->type = Node_var_array;
     assoc_clear(the_arg);
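
   The boilerplate follows up to two levels of indirection before it
reaches the real array.  The first is a parameter slot on the call
stack, and the second is an array reference back to the caller's array.
The standalone model below mirrors that chain so the control flow can
be compiled and traced without `awk.h'.  Every name in it (`enum
toy_type', `struct toy_node', `resolve_array_arg', and the `stack'
argument) is hypothetical.  Real `gawk' nodes carry much more state,
and the real names are the ones used in the boilerplate.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node kinds mirroring the types tested in the boilerplate. */
enum toy_type { TOY_VAR, TOY_VAR_ARRAY, TOY_PARAM_LIST, TOY_ARRAY_REF };

/* Hypothetical node; real gawk NODEs carry far more state. */
struct toy_node {
    enum toy_type type;
    int param_cnt;               /* TOY_PARAM_LIST: index into the stack */
    struct toy_node *orig_array; /* TOY_ARRAY_REF: the caller's array */
};

/* Resolve an argument the way the boilerplate does: parameter slot first,
   then array reference, then a type check.  Returns NULL for a missing
   argument or one that cannot be used as an array.  The boilerplate's
   assoc_clear() step has no counterpart in this model. */
static struct toy_node *resolve_array_arg(struct toy_node *arg,
                                          struct toy_node **stack)
{
    if (arg == NULL)
        return NULL;
    if (arg->type == TOY_PARAM_LIST) {   /* parameter: fetch from stack */
        assert(stack != NULL);
        arg = stack[arg->param_cnt];
    }
    if (arg->type == TOY_ARRAY_REF)      /* reference: use caller's array */
        arg = arg->orig_array;
    if (arg->type != TOY_VAR && arg->type != TOY_VAR_ARRAY)
        return NULL;
    arg->type = TOY_VAR_ARRAY;           /* force it to be an array */
    return arg;
}
```

   A parameter whose stack slot holds an array reference resolves to
the caller's array.  A plain variable is converted in place.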

   Again, you should spend time studying the `gawk' internals; don't
just blindly copy this code.

