Info Node: (am-utils.info)Keep-alives

Keep-alives
===========

   Some filesystem types require a server running on another machine.
If a machine crashes, the loss of its filesystems is of no concern to
processes on that machine.  To processes on a remote host that uses the
machine as a fileserver, however, the event is important.  The most
familiar case is an NFS server crash, after which more and more
processes on the client machines hang.  To make recovery possible, Amd
implements a "keep-alive" interval timer for some filesystem types.
Currently only NFS makes use of this service.

   The basis of the NFS keep-alive implementation is the observation
that most sites maintain replicated copies of common system data such as
manual pages, most or all programs, system source code and so on.  If
one of those servers goes down, it is reasonable to mount one of the
others as a replacement.

   The first part of the process is to keep track of which fileservers
are up and which are down.  Amd does this by sending RPC requests to
each server's NFS `NullProc' and checking whether a reply is returned.
While the server state is uncertain, the requests are retransmitted at
three-second intervals; if no reply is received after four attempts,
the server is marked down.  If a reply is received, the fileserver is
marked up and stays in that state for 30 seconds, at which time another
NFS ping is sent.
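
   The timing rules above can be modelled as a small state machine.
The following Python sketch is illustrative only and is not Amd's
source code: the class name `ServerState', the method
`on_ping_result' and the constant names are all hypothetical.  It also
assumes that a server marked up becomes "uncertain" again as soon as
it misses a ping, which the text implies but does not state.

```python
from dataclasses import dataclass

RETRY_INTERVAL = 3   # seconds between retransmissions while state is uncertain
MAX_ATTEMPTS = 4     # unanswered attempts before the server is marked down
PING_INTERVAL = 30   # seconds between pings once the state is known


@dataclass
class ServerState:
    """Hypothetical model of the per-fileserver keep-alive state."""
    status: str = "unknown"   # one of "unknown", "up", "down"
    failures: int = 0         # consecutive unanswered pings

    def on_ping_result(self, replied: bool) -> int:
        """Record one ping outcome; return seconds until the next ping."""
        if replied:
            # Any reply marks the server up for the next 30 seconds.
            self.status = "up"
            self.failures = 0
            return PING_INTERVAL
        if self.status == "down":
            # Already down: keep probing at the slow rate.
            return PING_INTERVAL
        self.failures += 1
        if self.failures >= MAX_ATTEMPTS:
            self.status = "down"
            return PING_INTERVAL
        # Assumption: a missed ping makes the state uncertain again.
        self.status = "unknown"
        return RETRY_INTERVAL
```

   In this model a server that never answers is pinged three more times
at three-second intervals and is marked down on the fourth unanswered
attempt, after which it is probed every 30 seconds.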

   Once a fileserver is marked down, requests continue to be sent every
30 seconds in order to determine when the fileserver comes back up.
During this time, any reference through Amd to the filesystems on that
server fails with the error "Operation would block".  If a replacement
volume is available, it is mounted; otherwise the error is returned to
the user.
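
   The choice between mounting a replacement and returning the error
can be sketched as follows.  This is a simplified illustration, not
Amd's actual selection logic: the function `pick_fileserver' and its
arguments are hypothetical, and servers with no recorded state are
assumed here to be unusable.

```python
import errno


def pick_fileserver(candidates, server_is_up):
    """Return the first candidate marked up, else raise EWOULDBLOCK.

    candidates   -- replicated fileservers in order of preference
    server_is_up -- mapping from server name to its last known state
    """
    for server in candidates:
        # Assumption: a server with no recorded state counts as down.
        if server_is_up.get(server, False):
            return server
    # No replacement volume is available: report the error to the caller.
    raise OSError(errno.EWOULDBLOCK, "Operation would block")
```

   For example, with candidates `["alpha", "beta"]' and only `beta'
marked up, the sketch returns `beta'; with both marked down it raises
the error.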

   This action does not protect user files, which are unique on the
network, nor processes that do not access files via Amd or that already
have open files on the hung filesystem.  It can, however, prevent most
new processes from hanging.

   By default, fileserver state is not maintained for NFS/TCP mounts.
The remote fileserver is always assumed to be up.

