The kernel has lots of parameters which
can be tuned for different circumstances. While, as usual, the default
parameters serve 99% of installations very well, we don't call this the
Advanced HOWTO for the fun of it!
The interesting bits are in /proc/sys/net; take a look there. Not everything
will be documented here initially, but we're working on it.
By default, routers route everything, even packets which 'obviously' don't
belong on your network. A common example is private IP space escaping onto
the internet. If you have an interface with a route of 195.96.96.0/24 to it,
you do not expect packets from 212.64.94.1 to arrive there.
Lots of people will want to turn this behaviour off, so the kernel hackers
have made it easy. There are files in /proc where you can tell
the kernel to do this for you. The method is called "Reverse Path
Filtering". Basically, if the reply to this packet wouldn't go out the
interface this packet came in, then this is a bogus packet and should be
ignored.
The following fragment will turn this on for all current and future
interfaces.
# for i in /proc/sys/net/ipv4/conf/*/rp_filter ; do
> echo 2 > $i
> done
Going by the example above, if a packet arrived on the Linux router on eth1
claiming to come from the Office+ISP subnet, it would be dropped. Similarly,
if a packet came from the Office subnet, claiming to be from somewhere
outside your firewall, it would be dropped also.
The above is full reverse path filtering. The default is to only filter
based on IPs that are on directly connected networks. This is because the
full filtering breaks in the case of asymmetric routing, where packets come
in one way and go out another. This happens with satellite traffic (the data
comes down through the satellite dish and replies go back through normal
land-lines), or if you have dynamic (bgp, ospf, rip) routes in your network.
If this exception applies to you (and you'll probably know if it does) you
can simply turn off the rp_filter on the interface where the
satellite data comes in. If you want to see if any packets are being
dropped, the log_martians file in the same directory will tell
the kernel to log them to your syslog.
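As a minimal sketch of the exception described above, the helper below writes a value into one per-interface file. The function name 'set_conf' and the interface name 'eth1' are assumptions made for illustration, not part of any standard tool. The directory is passed as an argument so that the helper can be tried on a scratch directory before it is pointed at /proc/sys/net/ipv4/conf.

```shell
# set_conf ROOT DEV KEY VALUE
# Write VALUE into ROOT/DEV/KEY, e.g. /proc/sys/net/ipv4/conf/eth1/rp_filter.
# 'set_conf' is a hypothetical helper name chosen for this example.
set_conf() {
    if [ ! -e "$1/$2/$3" ]; then
        echo "set_conf: no such setting: $1/$2/$3" >&2
        return 1
    fi
    printf '%s\n' "$4" > "$1/$2/$3"
}

# Intended use, as root ('eth1' is an assumed interface name):
#   set_conf /proc/sys/net/ipv4/conf eth1 rp_filter 0     # stop filtering on eth1
#   set_conf /proc/sys/net/ipv4/conf all  log_martians 1  # log dropped packets
```

The helper refuses to create files that do not already exist, so a typo in the interface or setting name is reported instead of being silently ignored.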
Ok, there are a lot of parameters which can be modified. We try to list them
all. They are also (partly) documented in
Documentation/networking/ip-sysctl.txt in the kernel source.
Some of these settings have different defaults based on whether you
answered 'Yes' to 'Configure as router and not host' while compiling your
kernel.
Generic ipv4
As a generic note, most rate limiting features don't work on loopback, so
don't test them locally. The limits are supplied in 'jiffies', and are
enforced using the earlier mentioned token bucket filter.
The kernel has an internal clock which runs at 'HZ' ticks (or 'jiffies') per
second. On Intel, 'HZ' is usually 100. So setting a *_rate file to, say, 50
would allow for 2 packets per second. The token bucket filter is also
configured to allow for a burst of at most 6 packets, if enough tokens have
been earned.
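The arithmetic above can be sketched in a few lines of shell. The function name 'rate_to_pps' is made up for this example, and the calculation covers only the sustained rate, not the burst allowance.

```shell
# rate_to_pps HZ RATE
# Print how many packets per second a *_rate value of RATE jiffies allows,
# given HZ jiffies per second. Integer division, so the result rounds down.
# 'rate_to_pps' is a hypothetical helper name.
rate_to_pps() {
    if [ "$2" -le 0 ]; then
        echo "rate_to_pps: RATE must be a positive number of jiffies" >&2
        return 1
    fi
    echo $(( $1 / $2 ))
}

rate_to_pps 100 50    # the example from the text: prints 2
```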
Several entries in the following list have been copied from
/usr/src/linux/Documentation/networking/ip-sysctl.txt, written by Alexey
Kuznetsov <kuznet@ms2.inr.ac.ru> and Andi Kleen <ak@muc.de>
/proc/sys/net/ipv4/icmp_destunreach_rate
If the kernel decides that it can't deliver a packet, it will drop it, and
send the source of the packet an ICMP notice to this effect.
/proc/sys/net/ipv4/icmp_echo_ignore_all
Don't act on echo packets at all. Please don't set this by default, but if
you are used as a relay in a DoS attack, it may be useful.
/proc/sys/net/ipv4/icmp_echo_ignore_broadcasts
If you ping the broadcast address of a network, all hosts are supposed to
respond. This makes for a dandy denial-of-service tool. Set this to 1 to
ignore these broadcast messages.
/proc/sys/net/ipv4/icmp_echoreply_rate
The rate at which echo replies are sent to any one destination.
/proc/sys/net/ipv4/icmp_ignore_bogus_error_responses
Set this to ignore ICMP errors caused by hosts in the network reacting badly
to frames sent to what they perceive to be the broadcast address.
/proc/sys/net/ipv4/icmp_paramprob_rate
A relatively unknown ICMP message, which is sent in response to incorrect
packets with broken IP or TCP headers. With this file you can control the
rate at which it is sent.
/proc/sys/net/ipv4/icmp_timeexceed_rate
This is the famous cause of the 'Solaris middle star' in traceroutes. Limits
the number of ICMP Time Exceeded messages sent.
/proc/sys/net/ipv4/igmp_max_memberships
Maximum number of listening IGMP (multicast) sockets on the host.
FIXME: Is this true?
/proc/sys/net/ipv4/inet_peer_gc_maxtime
FIXME: Add a little explanation about the inet peer storage?
Maximum interval between garbage collection passes. This interval is in
effect under low (or absent) memory pressure on the pool. Measured in
jiffies.
/proc/sys/net/ipv4/inet_peer_gc_mintime
Minimum interval between garbage collection passes. This interval is in
effect under high memory pressure on the pool. Measured in jiffies.
/proc/sys/net/ipv4/inet_peer_maxttl
Maximum time-to-live of entries. Unused entries will expire after this
period of time if there is no memory pressure on the pool (i.e. when the
number of entries in the pool is very small). Measured in jiffies.
/proc/sys/net/ipv4/inet_peer_minttl
Minimum time-to-live of entries. Should be enough to cover fragment
time-to-live on the reassembling side. This minimum time-to-live
is guaranteed if the pool size is less than inet_peer_threshold.
Measured in jiffies.
/proc/sys/net/ipv4/inet_peer_threshold
The approximate size of the INET peer storage. Once this threshold is
reached, entries are thrown away aggressively. This threshold also determines
entries' time-to-live and the time intervals between garbage collection
passes: the more entries, the shorter the time-to-live and the shorter the
GC interval.
/proc/sys/net/ipv4/ip_autoconfig
This file contains the number one if the host received its IP configuration by
RARP, BOOTP, DHCP or a similar mechanism. Otherwise it is zero.
/proc/sys/net/ipv4/ip_default_ttl
Time To Live of packets. Set to a safe 64. Raise it if you have a huge
network. Don't do so for fun - routing loops cause much more damage that
way. You might even consider lowering it in some circumstances.
/proc/sys/net/ipv4/ip_dynaddr
You need to set this if you use dial-on-demand with a dynamic interface
address. Once your demand interface comes up, any local TCP sockets which
haven't seen replies will be rebound to have the right address. This solves
the problem that the connection that brings up your interface itself does
not work, but the second try does.
/proc/sys/net/ipv4/ip_forward
Whether the kernel should attempt to forward packets. Off by default.
/proc/sys/net/ipv4/ip_local_port_range
Range of local ports for outgoing connections. Actually quite small by
default, 1024 to 4999.
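To see just how small that is, the number of usable ports can be computed from the two values in the file. This is a sketch only, and 'port_range_size' is a name invented for the example.

```shell
# port_range_size LOW HIGH
# Print the number of local ports in the inclusive range LOW..HIGH.
# 'port_range_size' is a hypothetical helper name.
port_range_size() {
    echo $(( $2 - $1 + 1 ))
}

port_range_size 1024 4999    # the default quoted above: prints 3976

# On a live system the two numbers can be taken straight from the file:
#   port_range_size $(cat /proc/sys/net/ipv4/ip_local_port_range)
```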
/proc/sys/net/ipv4/ip_no_pmtu_disc
Set this if you want to disable Path MTU discovery - a technique to
determine the largest Maximum Transmission Unit possible on your path. See
also the section on Path MTU discovery in the cookbook chapter.
/proc/sys/net/ipv4/ipfrag_high_thresh
Maximum memory used to reassemble IP fragments. When
ipfrag_high_thresh bytes of memory are allocated for this purpose,
the fragment handler will toss packets until ipfrag_low_thresh
is reached.
/proc/sys/net/ipv4/ip_nonlocal_bind
Set this if you want your applications to be able to bind to an address
which doesn't belong to a device on your system. This can be useful when
your machine is on a non-permanent (or even dynamic) link, so your services
are able to start up and bind to a specific address when your link is down.
/proc/sys/net/ipv4/ipfrag_low_thresh
Minimum memory used to reassemble IP fragments.
/proc/sys/net/ipv4/ipfrag_time
Time in seconds to keep an IP fragment in memory.
/proc/sys/net/ipv4/tcp_abort_on_overflow
A boolean flag controlling the behaviour under lots of incoming connections.
When enabled, this causes the kernel to actively send RST packets when a
service is overloaded.
/proc/sys/net/ipv4/tcp_fin_timeout
Time to hold a socket in state FIN-WAIT-2, if it was closed by our side. The
peer can be broken and never close its side, or even die unexpectedly. The
default value is 60 seconds. The usual value in 2.2 was 180 seconds; you may
restore it, but remember that even if your machine is an underloaded web
server, you risk overflowing memory with kilotons of dead sockets.
FIN-WAIT-2 sockets are less dangerous than FIN-WAIT-1, because they eat at
most 1.5K of memory, but they tend to live longer. Cf. tcp_max_orphans.
/proc/sys/net/ipv4/tcp_keepalive_time
How often TCP sends out keepalive messages when keepalive is enabled.
Default: 2 hours.
/proc/sys/net/ipv4/tcp_keepalive_intvl
How frequently probes are retransmitted when a probe isn't acknowledged.
Default: 75 seconds.
/proc/sys/net/ipv4/tcp_keepalive_probes
How many keepalive probes TCP will send, until it decides that the
connection is broken.
Default value: 9.
Multiplied with tcp_keepalive_intvl, this gives the time a link can be
nonresponsive after a keepalive has been sent.
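With the defaults listed above, the multiplication works out as in the sketch below. The function names are invented for this example, and all values are taken to be in seconds.

```shell
# keepalive_probe_window INTVL PROBES
# Seconds spent probing before the connection is declared broken.
# Hypothetical helper name.
keepalive_probe_window() {
    echo $(( $1 * $2 ))
}

# keepalive_total TIME INTVL PROBES
# Seconds from the last traffic on an idle connection until it is given up:
# the idle time before the first probe plus the probing window.
# Hypothetical helper name.
keepalive_total() {
    echo $(( $1 + $2 * $3 ))
}

keepalive_probe_window 75 9     # defaults: prints 675
keepalive_total 7200 75 9       # defaults: prints 7875
```

So with the default settings, a dead peer on an otherwise idle connection is noticed after a little over two hours and eleven minutes.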
/proc/sys/net/ipv4/tcp_max_orphans
Maximum number of TCP sockets not attached to any user file handle, held by
the system. If this number is exceeded, orphaned connections are reset
immediately and a warning is printed. This limit exists only to prevent
simple DoS attacks; you _must_ not rely on this or lower the limit
artificially, but rather increase it (probably after increasing installed
memory) if network conditions require more than the default value, and tune
network services to linger and kill such states more aggressively. Let me
remind you again: each orphan eats up to 64K of unswappable memory.
/proc/sys/net/ipv4/tcp_orphan_retries
How many times to retry before killing a TCP connection closed by our side.
The default value of 7 corresponds to 50sec-16min depending on RTO. If your
machine is a loaded web server, you should think about lowering this value;
such sockets may consume significant resources. Cf. tcp_max_orphans.
/proc/sys/net/ipv4/tcp_max_syn_backlog
Maximum number of remembered connection requests which have still not
received an acknowledgement from the connecting client. The default value is
1024 for systems with more than 128Mb of memory, and 128 for low memory
machines. If the server suffers from overload, try to increase this number.
Warning! If you make it greater than 1024, it would be better to change
TCP_SYNQ_HSIZE in include/net/tcp.h to keep
TCP_SYNQ_HSIZE*16<=tcp_max_syn_backlog and to recompile the kernel.
/proc/sys/net/ipv4/tcp_max_tw_buckets
Maximum number of time-wait sockets held by the system simultaneously. If
this number is exceeded, the time-wait socket is immediately destroyed and a
warning is printed. This limit exists only to prevent simple DoS attacks;
you _must_ not lower the limit artificially, but rather increase it
(probably after increasing installed memory) if network conditions require
more than the default value.
/proc/sys/net/ipv4/tcp_retrans_collapse
Bug-to-bug compatibility with some broken printers.
On retransmit try to send bigger packets to work around bugs in
certain TCP stacks.
/proc/sys/net/ipv4/tcp_retries1
How many times to retry before deciding that something is wrong
and it is necessary to report this suspicion to the network layer.
The minimal RFC value is 3, which is the default and corresponds
to 3sec-8min depending on RTO.
/proc/sys/net/ipv4/tcp_retries2
How many times to retry before killing a live TCP connection.
RFC1122 says that the limit should be longer than 100 sec, which
is too small a number. The default value of 15 corresponds to 13-30min
depending on RTO.
/proc/sys/net/ipv4/tcp_rfc1337
This boolean enables a fix for 'time-wait assassination hazards in tcp', described
in RFC 1337. If enabled, this causes the kernel to drop RST packets for
sockets in the time-wait state.
Default: 0
/proc/sys/net/ipv4/tcp_sack
Use Selective ACK which can be used to signify that specific packets are
missing - therefore helping fast recovery.
/proc/sys/net/ipv4/tcp_stdurg
Use the Host requirements interpretation of the TCP urg pointer
field.
Most hosts use the older BSD interpretation, so if you turn this on
Linux might not communicate correctly with them.
Default: FALSE
/proc/sys/net/ipv4/tcp_syn_retries
Number of SYN packets the kernel will send before giving up on the new
connection.
/proc/sys/net/ipv4/tcp_synack_retries
To open the other side of the connection, the kernel sends a SYN with a
piggybacked ACK on it, to acknowledge the earlier received SYN. This is part
2 of the three-way handshake. This setting determines the number of SYN+ACK
packets sent before the kernel gives up on the connection.
/proc/sys/net/ipv4/tcp_timestamps
Timestamps are used, amongst other things, to protect against wrapping
sequence numbers. A 1 gigabit link might conceivably re-encounter a previous
sequence number with an out-of-line value, because it was of a previous
generation. The timestamp will let it recognise this 'ancient packet'.
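To get a feel for how quickly this can happen, the sketch below estimates how long a link running flat out needs to send 2^32 bytes, the size of the TCP sequence number space. The function name 'seq_wrap_seconds' is made up for this example, and the link is assumed to carry nothing but payload for a single connection, so real figures will be somewhat higher.

```shell
# seq_wrap_seconds MBIT
# Print the whole seconds needed to send 2^32 bytes (the 32-bit sequence
# space) at MBIT megabits per second. 1 Mbit/s is 125000 bytes per second.
# Needs 64-bit shell arithmetic, which is the norm on 64-bit systems.
# 'seq_wrap_seconds' is a hypothetical helper name.
seq_wrap_seconds() {
    echo $(( 4294967296 / ($1 * 125000) ))
}

seq_wrap_seconds 1000    # 1 gigabit: prints 34
seq_wrap_seconds 100     # 100 megabit: prints 343
```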
/proc/sys/net/ipv4/tcp_tw_recycle
Enable fast recycling of TIME-WAIT sockets. Default value is 0.
It should not be changed without advice/request of technical experts.
/proc/sys/net/ipv4/tcp_window_scaling
TCP/IP normally allows windows up to 65535 bytes big. For really fast
networks, this may not be enough. The window scaling option allows for
almost gigabyte windows, which is good for high bandwidth*delay products.
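Whether 65535 bytes is enough for a given path can be estimated from its bandwidth*delay product. The sketch below does the multiplication; 'bdp_bytes' is a name invented for this example, and the figures passed to it are illustrative rather than measurements.

```shell
# bdp_bytes MBIT RTT_MS
# Print the bandwidth*delay product in bytes for a path of MBIT megabits
# per second with a round trip time of RTT_MS milliseconds.
# MBIT * 125000 bytes/s * RTT_MS / 1000 s simplifies to MBIT * 125 * RTT_MS.
# 'bdp_bytes' is a hypothetical helper name.
bdp_bytes() {
    echo $(( $1 * 125 * $2 ))
}

bdp_bytes 100 100    # 100 Mbit/s, 100 ms: prints 1250000
bdp_bytes 10 10      # 10 Mbit/s, 10 ms: prints 12500
```

The first path needs far more than 65535 bytes in flight to stay full, so it benefits from window scaling; the second fits within an unscaled window.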
Per device settings
DEV can either stand for a real interface, or for 'all' or 'default'.
Default also changes settings for interfaces yet to be created.
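A quick way to compare one setting across 'all', 'default' and every real interface is to loop over the DEV directories. The following is a sketch only: 'show_conf' is a hypothetical helper name, and the directory is passed in as an argument so the function can also be tried on a scratch copy.

```shell
# show_conf ROOT KEY
# Print "DEV=VALUE" for every readable ROOT/DEV/KEY file.
# 'show_conf' is a hypothetical helper name chosen for this example.
show_conf() {
    for f in "$1"/*/"$2"; do
        [ -r "$f" ] || continue     # also skips an unmatched glob
        dev=${f%/*}                 # strip the trailing /KEY
        dev=${dev##*/}              # keep only the DEV directory name
        echo "$dev=$(cat "$f")"
    done
}

# Intended use:
#   show_conf /proc/sys/net/ipv4/conf rp_filter
```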
/proc/sys/net/ipv4/conf/DEV/accept_redirects
If a router decides that you are using it for a wrong purpose (i.e., it
needs to resend your packet on the same interface), it will send us an ICMP
Redirect. This is a slight security risk however, so you may want to turn it
off, or use secure redirects.
/proc/sys/net/ipv4/conf/DEV/accept_source_route
Not used very much anymore. You used to be able to give a packet a list of
IP addresses it should visit on its way. Linux can be made to honor this IP
option.
/proc/sys/net/ipv4/conf/DEV/bootp_relay
FIXME: fill this in
/proc/sys/net/ipv4/conf/DEV/forwarding
FIXME:
/proc/sys/net/ipv4/conf/DEV/log_martians
See the section on reverse path filters.
/proc/sys/net/ipv4/conf/DEV/mc_forwarding
Whether we do multicast forwarding on this interface.
/proc/sys/net/ipv4/conf/DEV/proxy_arp
If you set this to 1, this interface will respond to ARP requests for
addresses the kernel has routes to via other interfaces. Can be very useful
when building 'ip pseudo bridges'. Do take care that your netmasks are very
correct before enabling this!
/proc/sys/net/ipv4/conf/DEV/rp_filter
See the section on reverse path filters.
/proc/sys/net/ipv4/conf/DEV/secure_redirects
FIXME: fill this in
/proc/sys/net/ipv4/conf/DEV/send_redirects
Whether we send the above mentioned redirects.
/proc/sys/net/ipv4/conf/DEV/shared_media
FIXME: fill this in
/proc/sys/net/ipv4/conf/DEV/tag
FIXME: fill this in
Neighbor policy
DEV can either stand for a real interface, or for 'all' or 'default'.
Default also changes settings for interfaces yet to be created.