Copyright (C) 2000-2012 |
Keep-alives
===========

Use of some filesystem types requires the presence of a server on another machine. If a machine crashes then it is of no concern to processes on that machine that the filesystem is unavailable. However, to processes on a remote host using that machine as a fileserver this event is important. This situation is most widely recognized when an NFS server crashes and the behavior observed on client machines is that more and more processes hang. In order to provide the possibility of recovery, Amd implements a "keep-alive" interval timer for some filesystem types. Currently only NFS makes use of this service.

The basis of the NFS keep-alive implementation is the observation that most sites maintain replicated copies of common system data such as manual pages, most or all programs, system source code and so on. If one of those servers goes down, it would be reasonable to mount one of the others as a replacement.

The first part of the process is to keep track of which fileservers are up and which are down. Amd does this by sending RPC requests to the servers' NFS `NullProc' and checking whether a reply is returned. While the server state is uncertain, the requests are re-transmitted at three-second intervals, and if no reply is received after four attempts the server is marked down. If a reply is received, the fileserver is marked up and stays in that state for 30 seconds, at which time another NFS ping is sent.

Once a fileserver is marked down, requests continue to be sent every 30 seconds in order to determine when the fileserver comes back up. During this time any reference through Amd to the filesystems on that server fails with the error "Operation would block". If a replacement volume is available then it will be mounted, otherwise the error is returned to the user.

Although this action does not protect user files, which are unique on the network, or processes which do not access files via Amd or already have open files on the hung filesystem, it can prevent most new processes from hanging.

By default, fileserver state is not maintained for NFS/TCP mounts. The remote fileserver is always assumed to be up.
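
The ping schedule described above for monitored servers can be pictured as a small state machine. The following Python sketch is illustrative only and is not Amd's implementation: the names `ServerMonitor`, `on_ping_result` and `pick_server` are hypothetical, no RPC is actually sent, and it assumes that a missed ping on a server currently marked up counts as the first of the four attempts. Only the three-second retry, the four-attempt limit, the 30-second interval and the "Operation would block" error come from the text.

```python
import errno
from dataclasses import dataclass
from enum import Enum

RETRY_INTERVAL = 3       # seconds between pings while the state is uncertain
MAX_ATTEMPTS = 4         # unanswered pings before the server is marked down
KEEPALIVE_INTERVAL = 30  # seconds between pings once the state is known


class State(Enum):
    UNCERTAIN = "uncertain"
    UP = "up"
    DOWN = "down"


@dataclass
class ServerMonitor:
    """Hypothetical per-fileserver tracker (not Amd's real data structure)."""
    state: State = State.UNCERTAIN
    failures: int = 0

    def on_ping_result(self, replied: bool) -> int:
        """Record one ping outcome; return seconds until the next ping."""
        if replied:
            # Any reply marks the server up for the next 30 seconds.
            self.state = State.UP
            self.failures = 0
            return KEEPALIVE_INTERVAL
        if self.state is State.DOWN:
            # Already down: keep probing at the slow rate.
            return KEEPALIVE_INTERVAL
        # Assumption: a missed ping makes the state uncertain and counts
        # toward the four attempts.
        self.state = State.UNCERTAIN
        self.failures += 1
        if self.failures >= MAX_ATTEMPTS:
            self.state = State.DOWN
            return KEEPALIVE_INTERVAL
        return RETRY_INTERVAL


def pick_server(replicas, monitors):
    """Return the first replica not marked down, else fail as the text says."""
    for host in replicas:
        if monitors[host].state is not State.DOWN:
            return host
    raise BlockingIOError(errno.EWOULDBLOCK, "Operation would block")
```

With this model, a server that never answers is pinged three times at three-second spacing, is marked down on the fourth unanswered attempt, and is then probed every 30 seconds until a reply marks it up again.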