GNU Info

Info Node: (gawkinet.info)CGI Lib


A Simple CGI Library
--------------------

     HTTP is like being married: you have to be able to handle whatever
     you're given, while being very careful what you send back.
     Phil Smith III,
     `http://www.netfunny.com/rhf/jokes/99/Mar/http.html'

   In the section "A Web Service with Interaction", we saw
the function `CGI_setup' as part of the web server "core logic"
framework. The code presented there handles almost everything necessary
for CGI requests.  One thing it doesn't do is handle encoded characters
in the requests.  For example, an `&' is encoded as a percent sign
followed by its two-digit hexadecimal character code, `%26'.  These
encoded values should be decoded.  Following is a simple library that
performs these tasks.  This code is used by all web server examples in
the rest of this Info file.  If you want to use it for your own web
server, store the source code in a file named `inetlib.awk'. Then you
can include these functions in your code by placing the following
statement into your program:

     @include inetlib.awk

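on the first line of your script. Beware that this mechanism works only
if you run your web server script with `igawk' instead of the usual
`awk' or `gawk'.  (Versions of `gawk' from 4.0 onward also accept
`@include "inetlib.awk"' directly, with the file name in double
quotes.)  Here is the code: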

     # CGI Library and core of a web server
     # Global arrays
     #   GETARG --- arguments to CGI GET command
     #   MENU   --- menu items (path names)
     #   PARAM  --- parameters of form x=y
     
     # Optional variable MyHost contains host address
     # Optional variable MyPort contains port number
     # Needs TopHeader, TopDoc, TopFooter
     # Sets MyPrefix, HttpService, Status, Reason
     
     BEGIN {
       if (MyHost == "") {
          "uname -n" | getline MyHost
          close("uname -n")
       }
       if (MyPort ==  0) MyPort = 8080
       HttpService = "/inet/tcp/" MyPort "/0/0"
       MyPrefix    = "http://" MyHost ":" MyPort
       SetUpServer()
       while ("awk" != "complex") {
         # header lines are terminated this way
         RS = ORS    = "\r\n"
         Status      = 200             # this means OK
         Reason      = "OK"
         Header      = TopHeader
         Document    = TopDoc
         Footer      = TopFooter
         if        (GETARG["Method"] == "GET") {
             HandleGET()
         } else if (GETARG["Method"] == "HEAD") {
             # not yet implemented
         } else if (GETARG["Method"] != "") {
             print "bad method", GETARG["Method"]
         }
         Prompt = Header Document Footer
         print "HTTP/1.0", Status, Reason     |& HttpService
         print "Connection: Close"            |& HttpService
         print "Pragma: no-cache"             |& HttpService
         len = length(Prompt) + length(ORS)
         print "Content-length:", len         |& HttpService
         print ORS Prompt                     |& HttpService
         # ignore all the header lines
         while ((HttpService |& getline) > 0)
             continue
         # stop talking to this client
         close(HttpService)
         # wait for new client request
         HttpService |& getline
         # do some logging
         print systime(), strftime(), $0
         CGI_setup($1, $2, $3)
       }
     }
     
     function CGI_setup(method, uri, version,    i, j)
     {
         delete GETARG
         delete MENU
         delete PARAM
         GETARG["Method"] = method
         GETARG["URI"] = uri
         GETARG["Version"] = version
     
         i = index(uri, "?")
         if (i > 0) {  # is there a "?" indicating a CGI request?
             split(substr(uri, 1, i-1), MENU, "[/:]")
             split(substr(uri, i+1), PARAM, "&")
             for (i in PARAM) {
                 PARAM[i] = _CGI_decode(PARAM[i])
                 j = index(PARAM[i], "=")
                 GETARG[substr(PARAM[i], 1, j-1)] = \
                                         substr(PARAM[i], j+1)
             }
         } else { # there is no "?", no need for splitting PARAMs
             split(uri, MENU, "[/:]")
         }
         for (i in MENU)         # decode characters in path
             if (i + 0 > 4)      # numeric test; skip host name
                 MENU[i] = _CGI_decode(MENU[i])
     }

   This isolates details in a single function, `CGI_setup'.  Decoding
of encoded characters is pushed off to a helper function,
`_CGI_decode'. The use of the leading underscore (`_') in the function
name is intended to indicate that it is an "internal" function,
although there is nothing to enforce this:

     function _CGI_decode(str,   hexdigs, i, pre, code1, code2,
                                 val, result)
     {
        hexdigs = "123456789abcdef"
     
        i = index(str, "%")
        if (i == 0) # no work to do
           return str
     
        do {
           pre = substr(str, 1, i-1)   # part before %xx
           code1 = substr(str, i+1, 1) # first hex digit
           code2 = substr(str, i+2, 1) # second hex digit
           str = substr(str, i+3)      # rest of string
     
           code1 = tolower(code1)
           code2 = tolower(code2)
           val = index(hexdigs, code1) * 16 \
                 + index(hexdigs, code2)
     
           result = result pre sprintf("%c", val)
           i = index(str, "%")
        } while (i != 0)
        if (length(str) > 0)
           result = result str
        return result
     }

   `_CGI_decode' works by splitting the string apart around each
encoded character.  The two digits are converted to lowercase and looked
up in a string of hex digits.  Note that `0' is not in the string on
purpose; `index' returns zero when the digit is not found, automatically
giving the correct value!  Once the hexadecimal value is converted from
characters in a string into a numerical value, `sprintf' converts the
value back into a real character.  The following is a simple test
harness for the above functions:

     BEGIN {
       CGI_setup("GET",
       "http://www.gnu.org/cgi-bin/foo?p1=stuff&p2=stuff%26junk" \
            "&percent=a %25 sign",
       "1.0")
       for (i in MENU)
           printf "MENU[\"%s\"] = %s\n", i, MENU[i]
       for (i in PARAM)
           printf "PARAM[\"%s\"] = %s\n", i, PARAM[i]
       for (i in GETARG)
           printf "GETARG[\"%s\"] = %s\n", i, GETARG[i]
     }

   And this is the result when we run it:

     $ gawk -f testserv.awk
     -| MENU["4"] = www.gnu.org
     -| MENU["5"] = cgi-bin
     -| MENU["6"] = foo
     -| MENU["1"] = http
     -| MENU["2"] =
     -| MENU["3"] =
     -| PARAM["1"] = p1=stuff
     -| PARAM["2"] = p2=stuff&junk
     -| PARAM["3"] = percent=a % sign
     -| GETARG["p1"] = stuff
     -| GETARG["percent"] = a % sign
     -| GETARG["p2"] = stuff&junk
     -| GETARG["Method"] = GET
     -| GETARG["Version"] = 1.0
     -| GETARG["URI"] = http://www.gnu.org/cgi-bin/foo?p1=stuff&
     p2=stuff%26junk&percent=a %25 sign
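
   The empty `MENU' elements 2 and 3 in the output above follow from
the `[/:]' separator that `CGI_setup' passes to `split'.  The following
is a minimal sketch of that one step in isolation.  It assumes the same
URI as the test harness, without the query part, and uses a counted
loop so that the elements print in index order:

```awk
BEGIN {
    # Every single "/" or ":" is a separator, so "http://" yields
    # "http" followed by two empty elements before the host name.
    n = split("http://www.gnu.org/cgi-bin/foo", MENU, "[/:]")
    for (i = 1; i <= n; i++)
        printf "MENU[%d] = \"%s\"\n", i, MENU[i]
}
```

   With this URI, `split' returns six elements.  The host name lands
in element 4, which is why `CGI_setup' only decodes elements with an
index greater than 4.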

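   The digit lookup inside `_CGI_decode' can also be tried on its own.
The following sketch wraps it in a helper called `hexval'; that name is
hypothetical and is not part of `inetlib.awk'.  It then works through
the arithmetic for the two encoded characters used in the test harness:

```awk
# hexval() is a hypothetical helper used only for this illustration.
function hexval(c)
{
    # "0" is deliberately absent: index() returns 0 when the
    # character is not found, which is exactly the value of "0".
    return index("123456789abcdef", tolower(c))
}

BEGIN {
    # %26: 2 * 16 + 6 = 38, the ASCII code of "&"
    val = hexval("2") * 16 + hexval("6")
    printf "%%26 -> %d -> %s\n", val, sprintf("%c", val)
    # %25: 2 * 16 + 5 = 37, the ASCII code of "%"
    val = hexval("2") * 16 + hexval("5")
    printf "%%25 -> %d -> %s\n", val, sprintf("%c", val)
    # Uppercase digits work too: %0A is 0 * 16 + 10 = 10
    printf "%%0A -> %d\n", hexval("0") * 16 + hexval("A")
}
```

   One consequence of this design is that the lookup cannot tell `0'
apart from a character that is not a hex digit at all, since both
yield zero.  A malformed sequence such as `%zz' is therefore decoded
silently rather than rejected.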
