GNU Info

Info Node: (gawkinet.info)Interacting Service

A Web Service with Interaction
==============================

   This node shows how to set up a simple web server.  The subnode is a
library file that we will use with all the examples in the node
`Some Applications and Techniques'.

CGI Lib
     A simple CGI library.

   Setting up a web service that allows user interaction is more
difficult and shows us the limits of network access in `gawk'. In this
node, we develop a main program (a `BEGIN' pattern and its action)
that will become the core of event-driven execution controlled by a
graphical user interface (GUI).  Each HTTP event that the user triggers
by some action within the browser is received in this central
procedure. Parameters and menu choices are extracted from the request,
and an appropriate action is taken according to the user's choice.
For example:

     BEGIN {
       if (MyHost == "") {
          "uname -n" | getline MyHost
          close("uname -n")
       }
       if (MyPort ==  0) MyPort = 8080
       HttpService = "/inet/tcp/" MyPort "/0/0"
       MyPrefix    = "http://" MyHost ":" MyPort
       SetUpServer()
       while ("awk" != "complex") {   # always true: loop forever
         # header lines are terminated this way
         RS = ORS = "\r\n"
         Status   = 200          # this means OK
         Reason   = "OK"
         Header   = TopHeader
         Document = TopDoc
         Footer   = TopFooter
         if        (GETARG["Method"] == "GET") {
             HandleGET()
         } else if (GETARG["Method"] == "HEAD") {
             # not yet implemented
         } else if (GETARG["Method"] != "") {
             print "bad method", GETARG["Method"]
         }
         Prompt = Header Document Footer
         print "HTTP/1.0", Status, Reason       |& HttpService
         print "Connection: Close"              |& HttpService
         print "Pragma: no-cache"               |& HttpService
         len = length(Prompt) + length(ORS)
         print "Content-length:", len           |& HttpService
         print ORS Prompt                       |& HttpService
         # ignore all the header lines
         while ((HttpService |& getline) > 0)
             ;
         # stop talking to this client
         close(HttpService)
         # wait for new client request
         HttpService |& getline
         # do some logging
         print systime(), strftime(), $0
         # read request parameters
         CGI_setup($1, $2, $3)
       }
     }

   This web server presents menu choices in the form of HTML links.
Therefore, it has to tell the browser the name of the host it is
residing on. When starting the server, the user may supply the name of
the host from the command line with `gawk -v MyHost="Rumpelstilzchen"'.
If the user does not do this, the server looks up the name of the host
it is running on for later use as a web address in HTML documents. The
same applies to the port number. These values are inserted later into
the HTML content of the web pages to refer to the home system.

   Each server that is built around this core has to initialize some
application-dependent variables (such as the default home page) in a
procedure `SetUpServer', which is called immediately before entering the
infinite loop of the server. For now, we will write an instance that
initiates a trivial interaction.  With this home page, the client user
can click on two possible choices, and receive the current date either
in human-readable format or in seconds since 1970:

     function SetUpServer() {
       TopHeader = "<HTML><HEAD>"
       TopHeader = TopHeader \
          "<title>My name is GAWK, GNU AWK</title></HEAD>"
       TopDoc    = "<BODY><h2>\
         Do you prefer your date <A HREF=" MyPrefix \
         "/human>human</A> or \
         <A HREF=" MyPrefix "/POSIX>POSIXed</A>?</h2>" ORS ORS
       TopFooter = "</BODY></HTML>"
     }

   On the first run through the main loop, the default line terminators
are set and the default home page is copied to the actual home page.
Since this is the first run, `GETARG["Method"]' is not initialized yet,
hence the case selection over the method does nothing. Now that the
home page is initialized, the server can start communicating with a
client browser.

   It does so by printing the HTTP header into the network connection
(`print ... |& HttpService'). This command blocks execution of the
server script until a client connects. If you compare this server
script with the primitive one we wrote before, you will notice two
additional lines in the header. The first instructs the browser to
close the connection after each request. The second tells the browser
that it should never try to _remember_ earlier requests that had
identical web addresses (no caching). Otherwise, the browser could
retrieve the time of day in the previous example just once and
afterwards take the web page from its cache, always displaying the
same time of day although time advances each second.
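
   The layout of such a response can be inspected without opening a
network connection.  The following is only a sketch: the function
`BuildResponse' is a hypothetical helper that is not part of the server
above.  It assembles the same header lines as the main loop and shows
why the length adds one line terminator to the length of the document:

     # Hypothetical helper: returns the complete response as one
     # string, using the same header lines as the main loop.
     function BuildResponse(status, reason, doc,    crlf, len) {
       crlf = "\r\n"
       # `print ORS Prompt' sends the blank line that ends the
       # header, then the document, then one more terminator.
       # The document and that last terminator form the body.
       # Note: length() counts characters, which equals bytes
       # only for single-byte text.
       len = length(doc) + length(crlf)
       return "HTTP/1.0 " status " " reason crlf \
              "Connection: Close"           crlf \
              "Pragma: no-cache"            crlf \
              "Content-length: " len        crlf \
              crlf doc crlf
     }
     BEGIN {
       ORS = ""     # the string already carries its terminators
       print BuildResponse(200, "OK", "<HTML><BODY>hi</BODY></HTML>")
     }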

   Having supplied the initial home page to the browser with a valid
document stored in the variable `Prompt', the server closes the
connection and waits for the next request.  When the request comes, a
log line is printed that allows us to see which request the server
receives. The final step in the loop is to call the function
`CGI_setup', which processes the request line coming from the browser
and stores the transmitted parameters in the arrays `PARAM' and
`GETARG'. The complete text of these application-independent functions
can be found in the node `A Simple CGI Library'.  For now, we use a
simplified version of `CGI_setup':

     function CGI_setup(method, uri, version,    i, j) {
       delete GETARG;         delete MENU;        delete PARAM
       GETARG["Method"] = method
       GETARG["URI"] = uri
       GETARG["Version"] = version
       i = index(uri, "?")
       # is there a "?" indicating a CGI request?
       if (i > 0) {
         split(substr(uri, 1, i-1), MENU, "[/:]")
         split(substr(uri, i+1), PARAM, "&")
         for (i in PARAM) {
           j = index(PARAM[i], "=")
           GETARG[substr(PARAM[i], 1, j-1)] = \
                                       substr(PARAM[i], j+1)
         }
       } else {    # there is no "?", no need for splitting PARAMs
         split(uri, MENU, "[/:]")
       }
     }

   First, the function clears all variables used for global storage
of request parameters. The rest of the function fills these global
variables with the newly extracted values.  To accomplish this, the
name of the requested resource is split into parts and stored for
later evaluation. If the request contains a `?', then the request has
CGI variables appended to the web address. Everything in front of the
`?' is split up into menu items, and everything behind the `?' is a
list of `VARIABLE=VALUE' pairs (separated by `&') that also need
splitting. This way, CGI variables are isolated and stored. This
procedure does not recognize special characters that are transmitted
in coded form(1). Here, any optional request header and body parts are
ignored, because we need neither header parameters nor the request
body. However, when refining our approach or working with the `POST'
and `PUT' methods, reading the header and body becomes inevitable.
Header parameters should then be stored in a global array, as should
the body.
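
   The coded form mentioned above replaces a special character with
`%' and two hexadecimal digits.  One possible way to handle it,
sketched here under our own assumptions, is to decode each name and
value after the pairs have been split.  The function names `DecodeURI'
and `ParseQuery' are hypothetical and are not taken from the CGI
library; `strtonum' is a `gawk' extension.  Treating `+' as a space
follows the convention of HTML form submissions:

     # Hypothetical helper: turn "%2E" into "." and "+" into " ".
     function DecodeURI(str,    out, pos) {
       gsub(/\+/, " ", str)
       out = ""
       while ((pos = match(str, /%[0-9A-Fa-f][0-9A-Fa-f]/)) > 0) {
         out = out substr(str, 1, pos - 1) \
               sprintf("%c", strtonum("0x" substr(str, pos + 1, 2)))
         str = substr(str, pos + 3)
       }
       return out str
     }
     # Hypothetical helper: split "a=1&b=2" into the array ARGS.
     function ParseQuery(query, ARGS,    pairs, i, j) {
       split("", ARGS)            # clear the result array
       split(query, pairs, "&")
       for (i in pairs) {
         j = index(pairs[i], "=")
         # Split first and decode afterwards, so that a coded
         # "=" or "&" inside a value is not misread as a separator.
         if (j > 0) {
           ARGS[DecodeURI(substr(pairs[i], 1, j - 1))] = \
                DecodeURI(substr(pairs[i], j + 1))
         }
       }
     }
     BEGIN {
       ParseQuery("name=J%2ER%2E+Hacker&lang=awk", ARGS)
       for (key in ARGS)
         print key, "=", ARGS[key]
     }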

   On each subsequent run through the main loop, one request from a
browser is received, evaluated, and answered according to the user's
choice. This can be done by letting the value of the HTTP method guide
the main loop into execution of the procedure `HandleGET', which
evaluates the user's choice. In this case, we have only one
hierarchical level of menus, but in the general case, menus are nested.
The menu choices at each level are separated by `/', just as in file
names. Notice how simple it is to construct menus of arbitrary depth:

     function HandleGET() {
       if (       MENU[2] == "human") {
         Footer = strftime() TopFooter
       } else if (MENU[2] == "POSIX") {
         Footer = systime()  TopFooter
       }
     }
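
   With nested menus, each further level simply lands in the next
element of `MENU'.  As a sketch, assume a hypothetical site whose links
look like `/date/human' and `/date/POSIX'; these paths and the function
name `HandleNested' are our own inventions for illustration.  Because
the path begins with `/', `split' leaves `MENU[1]' empty and the first
menu level is found in `MENU[2]':

     # Hypothetical two-level variant of HandleGET.
     function HandleNested(path,    MENU, answer) {
       split(path, MENU, "[/:]")
       answer = "unknown choice"
       if (MENU[2] == "date") {
         if (       MENU[3] == "human") {
           answer = strftime()
         } else if (MENU[3] == "POSIX") {
           answer = systime()
         }
       }
       return answer
     }
     BEGIN {
       print HandleNested("/date/POSIX")   # seconds since 1970
       print HandleNested("/date/human")   # formatted date
       print HandleNested("/nowhere")      # unknown choice
     }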

   The disadvantage of this approach is that our server is slow and can
handle only one request at a time. Its main advantage, however, is that
the server consists of just one `gawk' program. There is no need to
install an `httpd', and no need for separate static HTML files, CGI
scripts, or `root' privileges. This is rapid prototyping.  This program
can be started on the same host that runs your browser.  Then point
your browser to `http://localhost:8080'.

   It is also possible to include images in the HTML pages.  Most
browsers support the little-known `.xbm' format, which can hold only
monochrome pictures but is an ASCII format. Binary images are possible
but not so easy to handle. Another way of including images is to
generate them with a tool such as GNUPlot, by calling the tool with
the `system' function or through a pipe.
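
   Because an `.xbm' picture is plain text in the form of a C
declaration, a `gawk' function can build one by string concatenation.
The following is a minimal sketch; the function `Checkerboard' and the
picture it draws are hypothetical.  To deliver such a picture, a server
would presumably also have to announce the type of the data
(conventionally `image/x-xbitmap'), which the main loop above does not
do, and whether a given browser displays it is not checked here:

     # Hypothetical helper: an 8x8 checkerboard in XBM notation.
     # Each row of 8 pixels is one byte; rows alternate between
     # the bit patterns 01010101 (0x55) and 10101010 (0xaa).
     function Checkerboard(name,    row, sep, byte, bits) {
       bits = ""
       for (row = 0; row < 8; row++) {
         sep  = (row > 0) ? ", " : ""
         byte = (row % 2) ? "0xaa" : "0x55"
         bits = bits sep byte
       }
       return "#define " name "_width 8\n"  \
              "#define " name "_height 8\n" \
              "static char " name "_bits[] = {\n  " bits " };\n"
     }
     BEGIN {
       ORS = ""     # the string already ends with a newline
       print Checkerboard("board")
     }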

   ---------- Footnotes ----------

   (1) As defined in RFC 2068.

