Info Node: (gawkinet.info)STOXPRED

STOXPRED: Stock Market Prediction As A Service
==============================================

     Far out in the uncharted backwaters of the unfashionable end of
     the Western Spiral arm of the Galaxy lies a small unregarded
     yellow sun.

     Orbiting this at a distance of roughly ninety-two million miles is
     an utterly insignificant little blue-green planet whose
     ape-descended life forms are so amazingly primitive that they
     still think digital watches are a pretty neat idea.

     This planet has -- or rather had -- a problem, which was this:
     most of the people living on it were unhappy for pretty much of
     the time.  Many solutions were suggested for this problem, but
     most of these were largely concerned with the movements of small
     green pieces of paper, which is odd because it wasn't the small
     green pieces of paper that were unhappy.
     Douglas Adams, `The Hitch Hiker's Guide to the Galaxy'

   Valuable services on the Internet are usually _not_ implemented as
mobile agents. There are much simpler ways of implementing services.
All Unix systems provide, for example, the `cron' service.  Unix system
users can write a list of tasks to be done each day, each week, twice a
day, or just once. The list is entered into a file named `crontab'.
For example, to distribute a newsletter every weekday morning, have
`cron' call a script at the desired time.

     # run at 8 am on weekdays, distribute the newsletter
     0 8 * * 1-5   $HOME/bin/daily.job >> $HOME/log/newsletter 2>&1

   The script first looks for interesting information on the Internet,
assembles it in a nice form and sends the results via email to the
customers.

   The following is an example of a primitive newsletter on stock
market prediction. It is a report which first tries to predict the
change of each share in the Dow Jones Industrial Index for the
particular day. Then it mentions some especially promising shares as
well as some shares which look remarkably bad on that day. The report
ends with the usual disclaimer which tells every child _not_ to try
this at home and hurt anybody.

     Good morning Uncle Scrooge,
     
     This is your daily stock market report for Monday, October 16, 2000.
     Here are the predictions for today:
     
             AA      neutral
             GE      up
             JNJ     down
             MSFT    neutral
             ...
             UTX     up
             DD      down
             IBM     up
             MO      down
             WMT     up
             DIS     up
             INTC    up
             MRK     down
             XOM     down
             EK      down
             IP      down
     
     The most promising shares for today are these:
     
             INTC            http://biz.yahoo.com/n/i/intc.html
     
     The stock shares to avoid today are these:
     
             EK              http://biz.yahoo.com/n/e/ek.html
             IP              http://biz.yahoo.com/n/i/ip.html
             DD              http://biz.yahoo.com/n/d/dd.html
             ...

   The script as a whole is rather long. In order to ease the pain of
studying other people's source code, we have broken the script up into
meaningful parts which are invoked one after the other.  The basic
structure of the script is as follows:

     BEGIN {
       Init()
       ReadQuotes()
       CleanUp()
       Prediction()
       Report()
       SendMail()
     }

   The earlier parts store data into variables and arrays which are
subsequently used by later parts of the script. The `Init' function
first checks if the script is invoked correctly (without any
parameters).  If not, it informs the user of the correct usage. What
follows are preparations for the retrieval of the historical quote
data. The names of the 30 stock shares are stored in an array `name'
along with the current date in `day', `month', and `year'.

   Users who are separated from the Internet by a firewall and must
route their requests through a proxy have to supply the name of the
proxy with the `-v Proxy=NAME' option and, if necessary, its port with
`-v ProxyPort=PORT'. Without these options, the script connects directly
to `chart.yahoo.com' on port 80, which should suffice for most users.

     function Init() {
       if (ARGC != 1) {
         print "STOXPRED - daily stock share prediction"
         print "IN:\n    no parameters, nothing on stdin"
         print "PARAM:\n    -v Proxy=MyProxy -v ProxyPort=80"
         print "OUT:\n    commented predictions as email"
         print "JK 09.10.2000"
         exit
       }
       # Remember ticker symbols from Dow Jones Industrial Index
       StockCount = split("AA GE JNJ MSFT AXP GM JPM PG BA HD KO \
         SBC C HON MCD T CAT HWP MMM UTX DD IBM MO WMT DIS INTC \
         MRK XOM EK IP", name);
       # Remember the current date as the end of the time series
       day   = strftime("%d")
       month = strftime("%m")
       year  = strftime("%Y")
       if (Proxy     == "")  Proxy     = "chart.yahoo.com"
       if (ProxyPort ==  0)  ProxyPort = 80
       YahooData = "/inet/tcp/0/" Proxy "/" ProxyPort
     }

   There are two really interesting parts in the script. One is the
function which reads the historical stock quotes from an Internet
server. The other is the one that does the actual prediction. In the
following function we see how the quotes are read from the Yahoo
server. The data which comes from the server is in CSV format
(comma-separated values):

     Date,Open,High,Low,Close,Volume
     9-Oct-00,22.75,22.75,21.375,22.375,7888500
     6-Oct-00,23.8125,24.9375,21.5625,22,10701100
     5-Oct-00,24.4375,24.625,23.125,23.50,5810300

   Each line holds the values for one trading day; the columns are
separated by commas and contain the kind of data named in the header
(first) line. First, `gawk' is instructed to separate columns by
commas (`FS = ","'). In the loop that follows, a connection to the
Yahoo server is opened, a download takes place, and finally the
connection is closed. All this happens once for each ticker symbol. In
the body of this loop, an Internet address is built up as a string
according to the rules of the Yahoo server. The start and end dates
share the same day and month; the end date is today and the start date
lies exactly one year earlier. All the action is initiated by the
`printf' command, which transmits the request for data to the Yahoo
server.

   In the inner loop, the server's reply is read and scanned line by
line. Only lines which have six columns and the name of a month in the
first column contain relevant data. The closing price (fifth column) is
stored in the two-dimensional array `quote'; one dimension is the date,
the other the ticker symbol. While the first stock's data is retrieved,
the dates themselves are stored in the array `days' because we need
them later.

     function ReadQuotes() {
       # Retrieve historical data for each ticker symbol
       FS = ","
       for (stock = 1; stock <= StockCount; stock++) {
         URL = "http://chart.yahoo.com/table.csv?s=" name[stock] \
               "&a=" month "&b=" day   "&c=" year-1 \
               "&d=" month "&e=" day   "&f=" year \
               "&g=d&q=q&y=0&z=" name[stock] "&x=.csv"
         # Pass URL as an argument so a stray % is not taken as a format
         printf("GET %s HTTP/1.0\r\n\r\n", URL) |& YahooData
         while ((YahooData |& getline) > 0) {
           if (NF == 6 && $1 ~ /Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec/) {
             if (stock == 1)
               days[++daycount] = $1;
             quote[$1, stock] = $5
           }
         }
         close(YahooData)
       }
       FS = " "
     }
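
   The filtering step can be tried without a network connection. The
following is a minimal sketch and not part of the original script: the
helper name `ParseQuoteLine' is an assumption made for illustration,
and it uses `split' on a string instead of changing `FS', so it does
not depend on input records. It feeds the sample lines shown earlier
through the same test (six columns, a month name in the first one) and
stores the closing price.

```awk
# Hypothetical helper (not in the original script): applies the same
# filter as ReadQuotes to one CSV line held in a string.
# Returns 1 if the line was a data line and was stored, 0 otherwise.
function ParseQuoteLine(line, stock,    f, n) {
  n = split(line, f, ",")
  if (n == 6 && f[1] ~ /Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec/) {
    if (stock == 1)
      days[++daycount] = f[1]
    quote[f[1], stock] = f[5]      # column 5 is the closing price
    return 1
  }
  return 0
}

BEGIN {
  # Sample lines from the text; the header line fails the month test
  ParseQuoteLine("Date,Open,High,Low,Close,Volume", 1)
  ParseQuoteLine("9-Oct-00,22.75,22.75,21.375,22.375,7888500", 1)
  ParseQuoteLine("6-Oct-00,23.8125,24.9375,21.5625,22,10701100", 1)
  ParseQuoteLine("5-Oct-00,24.4375,24.625,23.125,23.50,5810300", 1)
  for (d = 1; d <= daycount; d++)
    print days[d], quote[days[d], 1]
}
```

   Running this with `gawk -f' should list the three dates together
with their closing prices, most recent day first.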

   Now that we _have_ the data, it can be checked once again to make
sure that no individual stock is missing or invalid, and that all the
stock quotes are aligned correctly. Days for which any stock lacks a
quote are dropped. Furthermore, we renumber the time instances. The
most recent day gets day number 1 and all other days get consecutive
numbers. All quotes are rounded to the nearest whole number in US
Dollars.

     function CleanUp() {
       # clean up time series; eliminate incomplete data sets
       for (d = 1; d <= daycount; d++) {
         for (stock = 1; stock <= StockCount; stock++)
           if (! ((days[d], stock) in quote))
               stock = StockCount + 10
         if (stock > StockCount + 1)
             continue
         datacount++
         for (stock = 1; stock <= StockCount; stock++)
           data[datacount, stock] = int(0.5 + quote[days[d], stock])
       }
       delete quote
       delete days
     }
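
   Two idioms in `CleanUp' are worth isolating: the `(i, j) in array'
membership test, which checks for an element without creating it, and
`int(0.5 + x)', which rounds a non-negative number to the nearest
integer. The following sketch uses made-up quotes; the helper name
`Round' is an assumption and does not appear in the original script.

```awk
# Hypothetical helper: round a non-negative number to the nearest
# integer, as CleanUp does inline with int(0.5 + x)
function Round(x) {
  return int(0.5 + x)
}

BEGIN {
  # Made-up quotes: stock 2 has no entry for the second day
  quote["9-Oct-00", 1] = 22.375
  quote["9-Oct-00", 2] = 57.5
  quote["6-Oct-00", 1] = 22

  # The membership test does not create the missing element
  if (! (("6-Oct-00", 2) in quote))
    print "6-Oct-00 is incomplete and would be skipped"

  print Round(quote["9-Oct-00", 1])    # 22.375 rounds down to 22
  print Round(quote["9-Oct-00", 2])    # 57.5 rounds up to 58
}
```

   Because `int' truncates toward zero, this way of rounding is only
suitable for non-negative values such as share prices.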

   Now we have arrived at the second really interesting part of the
whole affair.  What we present here is a very primitive prediction
algorithm: _If a stock fell yesterday, assume it will also fall today;
if it rose yesterday, assume it will rise today_.  (Feel free to
replace this algorithm with a smarter one.) If a stock changed in the
same direction on two consecutive days, this is an indication which
should be highlighted.  Two-day advances are stored in `hot' and
two-day declines in `avoid'.

   The rest of the function is a sanity check. It applies the same
rule to every day of the past year and counts how many of those
predictions would have been correct.

     function Prediction() {
       # Predict each ticker symbol by prolonging yesterday's trend
       for (stock = 1; stock <= StockCount; stock++) {
         if         (data[1, stock] > data[2, stock]) {
           predict[stock] = "up"
         } else if  (data[1, stock] < data[2, stock]) {
           predict[stock] = "down"
         } else {
           predict[stock] = "neutral"
         }
         if ((data[1, stock] > data[2, stock]) && (data[2, stock] > data[3, stock]))
           hot[stock] = 1
         if ((data[1, stock] < data[2, stock]) && (data[2, stock] < data[3, stock]))
           avoid[stock] = 1
       }
       # Do a plausibility check: how many predictions proved correct?
       for (s = 1; s <= StockCount; s++) {
         for (d = 1; d <= datacount-2; d++) {
           if         (data[d+1, s] > data[d+2, s]) {
             UpCount++
           } else if  (data[d+1, s] < data[d+2, s]) {
             DownCount++
           } else {
             NeutralCount++
           }
           if (((data[d, s]  > data[d+1, s]) && (data[d+1, s]  > data[d+2, s])) ||
               ((data[d, s]  < data[d+1, s]) && (data[d+1, s]  < data[d+2, s])) ||
               ((data[d, s] == data[d+1, s]) && (data[d+1, s] == data[d+2, s])))
             CorrectCount++
         }
       }
     }
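
   To see the rule and the plausibility check at work on a small
scale, consider the following sketch. The price series is made up, and
the helper name `Trend' is an assumption introduced only for this
illustration. As in the script, index 1 is the most recent day.

```awk
# Hypothetical helper: classify the move from the older to the newer price
function Trend(newer, older) {
  if (newer+0 > older+0) return "up"
  if (newer+0 < older+0) return "down"
  return "neutral"
}

BEGIN {
  # Made-up closing prices for one stock; index 1 is the most recent day
  n = split("25 24 22 22 23", price, " ")

  # Today's prediction prolongs yesterday's move (day 2 -> day 1)
  print "prediction for today: " Trend(price[1], price[2])

  # Back-test: predict day d from the move d+2 -> d+1, then compare
  # with what actually happened from d+1 -> d
  for (d = 1; d <= n - 2; d++) {
    total++
    if (Trend(price[d+1], price[d+2]) == Trend(price[d], price[d+1]))
      correct++
  }
  print correct+0 " of " total " predictions correct"
}
```

   With this series only the most recent move continues the move
before it, so one of the three back-tested predictions is correct.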

   At this point the hard work has been done: the array `predict'
contains the predictions for all the ticker symbols. It is up to the
function `Report' to find some nice words to introduce the desired
information.

     function Report() {
       # Generate report
       report =        "\nThis is your daily "
       report = report "stock market report for "strftime("%A, %B %d, %Y")".\n"
       report = report "Here are the predictions for today:\n\n"
       for (stock = 1; stock <= StockCount; stock++)
         report = report "\t" name[stock] "\t" predict[stock] "\n"
       for (stock in hot) {
         if (HotCount++ == 0)
           report = report "\nThe most promising shares for today are these:\n\n"
         report = report "\t" name[stock] "\t\thttp://biz.yahoo.com/n/" \
           tolower(substr(name[stock], 1, 1)) "/" tolower(name[stock]) ".html\n"
       }
       for (stock in avoid) {
         if (AvoidCount++ == 0)
           report = report "\nThe stock shares to avoid today are these:\n\n"
         report = report "\t" name[stock] "\t\thttp://biz.yahoo.com/n/" \
           tolower(substr(name[stock], 1, 1)) "/" tolower(name[stock]) ".html\n"
       }
       report = report "\nThis sums up to " HotCount+0 " winners and " AvoidCount+0
       report = report " losers. When using this kind\nof prediction scheme for"
       report = report " the 12 months which lie behind us,\nwe get " UpCount
       report = report " 'ups' and " DownCount " 'downs' and " NeutralCount
       report = report " 'neutrals'. Of all\nthese " UpCount+DownCount+NeutralCount
       report = report " predictions " CorrectCount " proved correct next day.\n"
       report = report "A success rate of "\
                  int(100*CorrectCount/(UpCount+DownCount+NeutralCount)) "%.\n"
       report = report "Random choice would have produced a 33% success rate.\n"
       report = report "Disclaimer: Like every other prediction of the stock\n"
       report = report "market, this report is, of course, complete nonsense.\n"
       report = report "If you are stupid enough to believe these predictions\n"
       report = report "you should visit a doctor who can treat your ailment."
     }
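
   The address of the news page is derived from the ticker symbol
alone: the lowercase first letter names the directory and the
lowercase symbol names the file. A minimal sketch of that step follows;
the helper name `NewsURL' is an assumption, since `Report' builds the
string inline.

```awk
# Hypothetical helper: build the news page address for a ticker symbol
# in the same way Report does inline
function NewsURL(symbol) {
  return "http://biz.yahoo.com/n/" tolower(substr(symbol, 1, 1)) "/" \
         tolower(symbol) ".html"
}

BEGIN {
  print NewsURL("INTC")
  print NewsURL("EK")
}
```

   The two addresses printed are the ones that appear for `INTC' and
`EK' in the sample report.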

   The function `SendMail' goes through the list of customers and opens
a pipe to the `mail' command for each of them. Each one receives an
email message with a proper subject heading and is addressed by
full name.

     function SendMail() {
       # send report to customers
       customer["uncle.scrooge@ducktown.gov"] = "Uncle Scrooge"
       customer["more@utopia.org"           ] = "Sir Thomas More"
       customer["spinoza@denhaag.nl"        ] = "Baruch de Spinoza"
       customer["marx@highgate.uk"          ] = "Karl Marx"
       customer["keynes@the.long.run"       ] = "John Maynard Keynes"
       customer["bierce@devil.hell.org"     ] = "Ambrose Bierce"
       customer["laplace@paris.fr"          ] = "Pierre Simon de Laplace"
       for (c in customer) {
         MailPipe = "mail -s 'Daily Stock Prediction Newsletter' " c
         print "Good morning " customer[c] "," | MailPipe
         print report "\n.\n" | MailPipe
         close(MailPipe)
       }
     }
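
   Before mailing real customers, it may help to look at the command
lines that would be run. The following dry run is only a sketch: the
helper name `MailCommand' is an assumption, and nothing is piped to
`mail'. It prints the command and the greeting for two of the customers
instead.

```awk
# Hypothetical helper: build the shell command that SendMail would open
# as a pipe; the quoted subject and the address are separated by a space
function MailCommand(address) {
  return "mail -s 'Daily Stock Prediction Newsletter' " address
}

BEGIN {
  customer["uncle.scrooge@ducktown.gov"] = "Uncle Scrooge"
  customer["more@utopia.org"]            = "Sir Thomas More"
  for (c in customer) {
    # Dry run: show the command and the greeting without running mail
    print MailCommand(c)
    print "Good morning " customer[c] ","
  }
}
```

   The order in which `for (c in customer)' visits the addresses is
not defined, so the two customers may appear in either order.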

   Be patient when running the script by hand.  Retrieving the data for
all the ticker symbols and sending the emails may take several minutes
to complete, depending upon network traffic and the speed of the
available Internet link.  The quality of the prediction algorithm is
likely to be disappointing.  Try to find a better one.  Should you find
one with a success rate of more than 50%, please tell us about it! It
is only for the sake of curiosity, of course. `:-)'

