GNU Info

Info Node: (gawkinet.info)STATIST

(gawkinet.info)STATIST


Next: MAZE Prev: WEBGRAB Up: Some Applications and Techniques
Enter node , (file) or (file)node

STATIST: Graphing a Statistical Distribution
============================================

   In the HTTP server examples we've shown thus far, we never present
an image to the browser and its user. Presenting images is one task.
Generating images that reflect some user input and presenting these
dynamically generated images is another. In this node, we use GNUPlot
for generating `.png', `.ps', or `.gif' files.(1)

   The program we develop takes the statistical parameters of two
samples and computes the t-test statistics. As a result, we get the
probabilities that the means and the variances of both samples are the
same. In order to let the user check plausibility, the program presents
an image of the distributions. The statistical computation follows
`Numerical Recipes in C: The Art of Scientific Computing' by William H.
Press, Saul A. Teukolsky, William T. Vetterling, and Brian P. Flannery.
Since `gawk' does not have a built-in function for the computation of
the beta function, we use the `ibeta' function of GNUPlot. As a side
effect, we learn how to use GNUPlot as a sophisticated calculator. The
comparison of means is done as in `tutest', paragraph 14.2, page 613,
and the comparison of variances is done as in `ftest', page 611 in
`Numerical Recipes'.

   As usual, we take the site-independent code for servers and append
our own functions `SetUpServer' and `HandleGET':

     function SetUpServer() {
       TopHeader = "<HTML><title>Statistics with GAWK</title>"
       TopDoc = "<BODY>\
        <h2>Please choose one of the following actions:</h2>\
        <UL>\
         <LI><A HREF=" MyPrefix "/AboutServer>About this server</A></LI>\
         <LI><A HREF=" MyPrefix "/EnterParameters>Enter Parameters</A></LI>\
        </UL>"
       TopFooter  = "</BODY></HTML>"
       GnuPlot    = "gnuplot 2>&1"
       m1=m2=0;    v1=v2=1;    n1=n2=10
     }

   Here, you see the menu structure that the user sees. Later, we will
see how the program structure of the `HandleGET' function reflects the
menu structure. What is missing here is the link for the image we
generate. In an event-driven environment, request, generation, and
delivery of images are separated.

   Notice the way we initialize the `GnuPlot' command string for the
pipe. By default, GNUPlot outputs the generated image via standard
output, as well as the results of `print'(ed) calculations via standard
error.  The redirection causes standard error to be mixed into standard
output, enabling us to read results of calculations with `getline'.  By
initializing the statistical parameters with some meaningful defaults,
we make sure the user gets an image the first time he uses the program.

   Following is the rather long function `HandleGET', which implements
the contents of this service by reacting to the different kinds of
requests from the browser. Before you start playing with this script,
make sure that your browser supports JavaScript and that it also has
this option switched on. The script uses a short snippet of JavaScript
code for delayed opening of a window with an image.  A more detailed
explanation follows:

     function HandleGET() {
       if(MENU[2] == "AboutServer") {
         Document  = "This is a GUI for a statistical computation.\
           It compares means and variances of two distributions.\
           It is implemented as one GAWK script and uses GNUPLOT."
       } else if (MENU[2] == "EnterParameters") {
         Document = ""
         if ("m1" in GETARG) {     # are there parameters to compare?
           Document = Document "<SCRIPT LANGUAGE=\"JavaScript\">\
             setTimeout(\"window.open(\\\"" MyPrefix "/Image" systime()\
              "\\\",\\\"dist\\\", \\\"status=no\\\");\", 1000); </SCRIPT>"
           m1 = GETARG["m1"]; v1 = GETARG["v1"]; n1 = GETARG["n1"]
           m2 = GETARG["m2"]; v2 = GETARG["v2"]; n2 = GETARG["n2"]
           t = (m1-m2)/sqrt(v1/n1+v2/n2)
           df = (v1/n1+v2/n2)*(v1/n1+v2/n2)/((v1/n1)*(v1/n1)/(n1-1) \
                + (v2/n2)*(v2/n2) /(n2-1))
           if (v1>v2) {
               f = v1/v2
               df1 = n1 - 1
               df2 = n2 - 1
           } else {
               f = v2/v1
               df1 = n2 - 1
               df2 = n1 - 1
           }
           print "pt=ibeta(" df/2 ",0.5," df/(df+t*t) ")"  |& GnuPlot
           print "pF=2.0*ibeta(" df2/2 "," df1/2 "," \
                 df2/(df2+df1*f) ")"                    |& GnuPlot
           print "print pt, pF"                         |& GnuPlot
           RS="\n"; GnuPlot |& getline; RS="\r\n"    # $1 is pt, $2 is pF
           print "invsqrt2pi=1.0/sqrt(2.0*pi)"          |& GnuPlot
           print "nd(x)=invsqrt2pi/sd*exp(-0.5*((x-mu)/sd)**2)" |& GnuPlot
           print "set term png small color"             |& GnuPlot
           #print "set term postscript color"           |& GnuPlot
           #print "set term gif medium size 320,240"    |& GnuPlot
           print "set yrange[-0.3:]"                    |& GnuPlot
           print "set label 'p(m1=m2) =" $1 "' at 0,-0.1 left"  |& GnuPlot
           print "set label 'p(v1=v2) =" $2 "' at 0,-0.2 left"  |& GnuPlot
           print "plot mu=" m1 ",sd=" sqrt(v1) ", nd(x) title 'sample 1',\
             mu=" m2 ",sd=" sqrt(v2) ", nd(x) title 'sample 2'" |& GnuPlot
           print "quit"                                         |& GnuPlot
           GnuPlot |& getline Image
           while ((GnuPlot |& getline) > 0)
               Image = Image RS $0
           close(GnuPlot)
         }
         Document = Document "\
         <h3>Do these samples have the same Gaussian distribution?</h3>\
         <FORM METHOD=GET> <TABLE BORDER CELLPADDING=5>\
         <TR>\
         <TD>1. Mean    </TD>
         <TD><input type=text name=m1 value=" m1 " size=8></TD>\
         <TD>1. Variance</TD>
         <TD><input type=text name=v1 value=" v1 " size=8></TD>\
         <TD>1. Count   </TD>
         <TD><input type=text name=n1 value=" n1 " size=8></TD>\
         </TR><TR>\
         <TD>2. Mean    </TD>
         <TD><input type=text name=m2 value=" m2 " size=8></TD>\
         <TD>2. Variance</TD>
         <TD><input type=text name=v2 value=" v2 " size=8></TD>\
         <TD>2. Count   </TD>
         <TD><input type=text name=n2 value=" n2 " size=8></TD>\
         </TR>                   <input type=submit value=\"Compute\">\
         </TABLE></FORM><BR>"
       } else if (MENU[2] ~ "Image") {
         Reason = "OK" ORS "Content-type: image/png"
         #Reason = "OK" ORS "Content-type: application/x-postscript"
         #Reason = "OK" ORS "Content-type: image/gif"
         Header = Footer = ""
         Document = Image
       }
     }

   As usual, we give a short description of the service in the first
menu choice. The third menu choice shows us that generation and
presentation of an image are two separate actions. While the latter
takes place quite instantly in the third menu choice, the former takes
place in the much longer second choice. Image data passes from the
generating action to the presenting action via the variable `Image'
that contains a complete `.png' image, which is otherwise stored in a
file. If you prefer `.ps' or `.gif' images over the default `.png'
images, you may select these options by uncommenting the appropriate
lines. But remember to do so in two places: when telling GNUPlot which
kind of images to generate, and when transmitting the image at the end
of the program.

   Looking at the end of the program, the way we pass the
`Content-type' to the browser is a bit unusual.  It is appended to the
`OK' of the first header line to make sure the type information becomes
part of the header.  The other variables that get transmitted across
the network are made empty, because in this case we do not have an HTML
document to transmit, but rather raw image data to contain in the body.

   Most of the work is done in the second menu choice. It starts with a
strange JavaScript code snippet. When first implementing this server,
we used a short `"<IMG SRC=" MyPrefix "/Image>"' here. But then
browsers got smarter and tried to improve on speed by requesting the
image and the HTML code at the same time. When doing this, the browser
tries to build up a connection for the image request while the request
for the HTML text is not yet completed. The browser tries to connect to
the `gawk' server on port 8080 while port 8080 is still in use for
transmission of the HTML text. The connection for the image cannot be
built up, so the image appears as "broken" in the browser window.  We
solved this problem by telling the browser to open a separate window
for the image, but only after a delay of 1000 milliseconds.  By this
time, the server should be ready for serving the next request.

   But there is one more subtlety in the JavaScript code.  Each time
the JavaScript code opens a window for the image, the name of the image
is appended with a timestamp (`systime').  Why this constant change of
name for the image? Initially, we always named the image `Image', but
then the Netscape browser noticed the name had _not_ changed since the
previous request and displayed the previous image (caching behavior).
The server core is implemented so that browsers are told _not_ to cache
anything.  Obviously HTTP requests do not always work as expected. One
way to circumvent the cache of such overly smart browsers is to change
the name of the image with each request. These three lines of JavaScript
caused us a lot of trouble.

   The rest can be broken down into two phases. At first, we check if
there are statistical parameters. When the program is first started,
there usually are no parameters because it enters the page coming from
the top menu.  Then, we only have to present the user a form that he
can use to change statistical parameters and submit them. Subsequently,
the submission of the form causes the execution of the first phase
because _now_ there _are_ parameters to handle.

   Now that we have parameters, we know there will be an image
available.  Therefore we insert the JavaScript code here to initiate
the opening of the image in a separate window. Then, we prepare some
variables that will be passed to GNUPlot for calculation of the
probabilities. Prior to reading the results, we must temporarily change
`RS' because GNUPlot separates lines with newlines.  After instructing
GNUPlot to generate a `.png' (or `.ps' or `.gif') image, we initiate
the insertion of some text, explaining the resulting probabilities. The
final `plot' command actually generates the image data. This raw binary
has to be read in carefully without adding, changing, or deleting a
single byte. Hence the unusual initialization of `Image' and completion
with a `while' loop.

   When using this server, it soon becomes clear that it is far from
being perfect. It mixes source code of six scripting languages or
protocols:

   * GNU `awk' implements a server for the protocol:

   * HTTP which transmits:

   * HTML text which contains a short piece of:

   * JavaScript code opening a separate window.

   * A Bourne shell script is used for piping commands into:

   * GNUPlot to generate the image to be opened.

   After all this work, the GNUPlot image opens in the JavaScript window
where it can be viewed by the user.

   It is probably better not to mix up so many different languages.
The result is not very readable.  Furthermore, the statistical part of
the server does not take care of invalid input.  Among others, using
negative variances will cause invalid results.

   ---------- Footnotes ----------

   (1) Due to licensing problems, the default installation of GNUPlot
disables the generation of `.gif' files.  If your installed version
does not accept `set term gif', just download and install the most
recent version of GNUPlot and the GD library
(http://www.boutell.com/gd/) by Thomas Boutell.  Otherwise you still
have the chance to generate some ASCII-art style images with GNUPlot by
using `set term dumb'.  (We tried it and it worked.)


automatically generated by info2www version 1.2.2.9