5. Performance Tuning and Troubleshooting

5.1. Tuning

OK, now we are up and running, and we want to be running at warp factor nine. No such thing as too fast, right? Linux networking is pretty robust, even in a default installation with no "tuning". You may well not need to do anything else. But if your connection is not performing up to what you think it should be, then possibly there is a problem somewhere. Finding it may be a more worthwhile approach than the pursuit of any magical "tweak".

A very rough guideline on what you might reasonably expect as a maximum sync rate, based on distance from the DSLAM/CO:

0-12 K ft (0-3.6 km) - 2000 Kbps or more (8100 max for ADSL)

There are many conceivable factors that could affect this one way or the other. Newer generations of DSL will surely improve this, as will related technologies like repeaters.

You will lose 10-20% of the modem's attainable sync rate to networking overheads (TCP, ATM, ethernet). So a 1500 Kbps connection is only going to realize about 1100-1300 Kbps or so of real world throughput. No tweaking is going to change the built-in protocol overheads. Also, if your service is capped at a lesser speed by your provider, then you can't get above that speed no matter what. And remember that there are numerous variables that can affect your loop/signal quality, and subsequently your speed (aka sync rate). Some of these may be beyond your control. But there are a few things that you might want to look at.

5.1.1. TCP Receive Window

For many of us, a default Linux installation is going to give something close to optimum performance. Windows 9x users often get a big boost by increasing their TCP Receive Window (RWIN), but this is because it is too small to start with. This is just not the case with Linux, where the default value is 32KB. The exception here is if you have to routinely deal with a high latency connection.
For instance, if your provider has a satellite uplink that is consistently adding unusual latency (250ms or greater?), then a larger TCP Window will likely help. For more on TCP Receive Window and related issues, look at http://www.psc.edu/networking/perf_tune.html.

The Receive Window is a buffer that helps control the flow of data. If set too low, it can be a bottleneck and restrict throughput. The optimum value depends completely on your bandwidth and latency, latency here being the average roundtrip time (RTT) to your typical destinations under typical conditions. This can be determined with ping. For example, the Linux default of 32KB is acceptable up to speeds of 2 Mbps with a typical latency of 125ms or so, or 1.0 Mbps with a latency of 250ms. Setting this value too high can also adversely affect throughput, so don't overdo it. An example courtesy of Juha Saarinen of New Zealand:
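
As a rough check on the 32KB figures quoted above, and separate from the example credited there, the window needed to keep a link full is approximately the bandwidth multiplied by the roundtrip time (the "bandwidth-delay product"). The following is only a minimal sketch of that arithmetic in Python. The function names are made up for illustration, and it assumes 1 Mbps = 1,000,000 bits per second:

```python
# Bandwidth-delay product sketch. Function names are hypothetical,
# and 1 Mbps is assumed to mean 1,000,000 bits per second.

def window_bytes(bandwidth_bps, rtt_ms):
    """Bytes that must be in flight to keep the link busy for one roundtrip."""
    return bandwidth_bps * rtt_ms / 8000  # /1000 for ms, /8 for bits -> bytes

def max_throughput_bps(window, rtt_ms):
    """Upper bound on throughput when at most `window` bytes fit in one roundtrip."""
    return window * 8000 / rtt_ms

print(window_bytes(2_000_000, 125))        # 31250.0 bytes, just under 32KB
print(window_bytes(1_000_000, 250))        # 31250.0 bytes, just under 32KB
print(max_throughput_bps(32 * 1024, 250))  # 1048576.0 bits/s, about 1.0 Mbps
```

Plugging in your own ping average and sync rate gives a ballpark for whether the default window could be a bottleneck at all. On most DSL lines with ordinary latency it will not be.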
The Receive Window can be dynamically set in the /proc filesystem. This requires entering a value that is twice the desired buffer size:
The above example actually sets the value to 128K. The Send Window can also be set, but is not as likely to be a limiting factor on DSL connections as the Receive Window:
These values can also be set using the sysctl command. See the man page. Other suggested kernel options for those who want to squeeze every last bit out of that copper (selected entries only):
A brief description of these, and other, options may be found in /usr/src/linux/Documentation/networking/ip-sysctl.txt, in the kernel source directory.

5.1.2. Interleaving

"Interleaving" is an error control mechanism of ADSL with DMT line encoding. DMT is now the standard for ADSL, and is by far the most prevalent form of ADSL. Interleaving buffers the raw data and corrects errors on the fly at the DSLAM. This can significantly help marginal loops that may be prone to line errors. The downside is that this buffering also adds significant latency to the connection. So for those with reasonable quality lines, interleaving is of no real benefit, and may actually add unnecessary latency.

Interleaving is an adjustable parameter and can be turned on or off by the telco. Many telcos seem to like to have this on by default, since it probably reduces tech support calls in those cases where it does help stabilize a line. But everyone else pays a price.

How to know if your line is interleaved or not, and how to change it? Good question. Generally speaking, if your first hop or two on a traceroute is less than 25ms or so, you can pretty much figure that interleaving is off. But there may be other factors, such as how far away those hops actually are. Unless your modem accurately reports this, the only other real way to know is to talk to someone at the telco. This may prove easier said than done. "FastPath" DMT is synonymous with "interleaving off". Again, this only applies to ADSL/DMT.

5.1.3. TCP Bottlenecks

DSL connections may suffer performance degradations under certain circumstances. Thankfully, Linux has very robust and flexible networking tools to help us deal with these.

One common situation is where traffic bottlenecks are created whenever data from a fast network segment hits a slower one, such as ethernet hitting a DSL modem/router. This can cause short term traffic backlogs, known as "queues", in the device.
Queuing can result in degraded performance, particularly for interactive protocols (like telnet or ssh) and streaming protocols (like RealAudio), and increased latency for ICMP and other network protocols. This is most evident when the upstream link is saturated (since downstream data is queued at the ISP's end and we can't do as much about that). The queued traffic is processed such that lower volume traffic protocols (like ssh) often get drowned out, so to speak, by the higher volume, bulk traffic (like http or ftp), as there isn't any special prioritizing in default usage. And if the upstream queuing, or other factors, causes enough of a delay, it can even decrease downstream bandwidth utilization by slowing the ACKnowledgements (which are heading upstream) that are required to keep a download moving at optimal rates. So it is possible that an upload can hurt a simultaneous download.

Such effects can be largely mitigated with Linux's built-in traffic shaping abilities. The user space tool for manipulating the kernel's advanced traffic routing features is iproute, sometimes packaged as iproute2. This includes various tools that can classify and prioritize traffic with a considerable degree of flexibility. It also requires various kernel config options to be turned on, and is also fairly close to Black Magic ;-)

The definitive document on this is the Advanced Routing and Traffic Control HOWTO (http://linuxdoc.org/HOWTO/Adv-Routing-HOWTO.html). Pay particular attention to the "Cookbook" Section #15, and in particular #15.8, "The Ultimate Traffic Conditioner: Low Latency, Fast Up & Downloads". A great read!

5.1.4. Dropped PPP Connections

PPPoE and PPPoA both rely on the venerable PPP protocol. This protocol incorporates the Link Control Protocol (LCP), which is used to maintain the viability of the connection. Each end can send LCP echoes to the other end, and if there is no response in the allotted time frame, the session is presumed dead, and is torn down.
Again, either end can initiate this process. The client should then negotiate a new connection. But this normally means a new IP address is assigned along with the new session. Perhaps this is undesirable?

While you certainly can't control what happens on the remote end in this regard, you can adjust PPP's very flexible handling of LCP echoes on your end: increase the number of echoes, and extend the interval and timeout period. This might help prolong the life of an unstable connection in situations with marginal line conditions, or a buggy peer on the other end. Read your client's documentation. YMMV.

Some providers may deliberately enforce some time limit. There is not much you can do about this. Also, frequently dropped connections are often an indication of a line problem of some kind. This is something the telco should investigate.

5.2. Installation Problems

Read this section if you have no sync at all or are completely unable to connect. See your modem's owner's manual for interpreting the modem's LEDs. (Many will show a solid red (or orange) light if not in sync.)

5.2.1. No sync

The modem sync LED has never been green.

5.2.2. Network Card (NIC) Problems

Symptoms here are: NIC is not recognized, modules won't load, or ifconfig shows the interface is not up, or is generating lots of errors, etc.

5.2.3. IP Connection Problems

Read this section if you are sure the modem is syncing, the NIC is recognized and seems to be working properly, client software is installed and running without error, but the connection to the ISP fails. Verify the modem is indeed syncing by the LED(s). An IP connection failure may be evidenced by ifconfig not showing an active eth0 interface (or ppp0 for PPPoX), or by pinging the gateway and other destinations generating 'network unreachable' or similar errors.

5.3. Sync Problems

Read this section if you have had a working connection, but now have lost sync, are intermittently losing sync, your sync rate has dropped significantly, or are getting a "sync/no surf" condition. (Better quality modems will have a way to report sync rate, usually via telnet or a web browser interface. See the owner's manual.)

A loss of sync indicates a problem with the DSLAM, your line (inside or outside) or your modem. DSLAMs typically have "shelves" with "cards". Alcatel DSLAM cards, just for instance, have a capacity of four connections each. If the card goes bad, at most four customers are affected. The point being that sync loss outages can be very isolated, unlike network outages, which tend to affect large numbers of users.

Sync outages are a telco problem, not an ISP problem. If your service agreement is with the ISP, you will need to contact the ISP, who will in turn contact the telco.

Degraded sync rates, and disruption of the DSL signal, can cause various problems. Obviously, you will never get your maximum throughput under these conditions. But it is not always obvious from the symptoms whether the problem is on your end or the provider's. For instance, a poor inside wire connection may result in retransmissions of packets that have been dropped. This can really reduce throughput and slow a connection down. It is tempting to think of packet loss as a traditional networking problem, but with DSL it can be the result of a bad line, an impaired signal, or even the modem itself. Some things to try:
Another possibility is a nearby AM radio station, or a bandit ham radio operator, disrupting the DSL signal, since they operate in a similar frequency range. These may only cause problems at certain times of day, like when the station boosts its signal at night. A good telco DSL tech may be able to help minimize the impact of this. YMMV.

5.4. Network and Throughput Problems

Read this section if your connection is up, but you are having throughput problems. In other words, your speed isn't what it should be based on your bit rate plan and your distance from the CO. "Network" here is the WAN -- the ISP's gateway and local subnet/backbone, etc. Remember that a marginal line can cause a reduced sync rate, and this will impact throughput. See above.

The two factors we will be looking for are "latency" and "packet loss". Both are pretty easy to track down with the standard networking tools ping and traceroute. If either of these occurs in our path, it will impact performance.

Latency means "responsiveness" or "lag time". Actually what we are interested in is abnormally high latency, since there is always some latency. Packet loss is when a packet of data gets dropped somewhere along the way. TCP/IP will know it's been "lost", and there will be a retransmission of the lost data. Enough of this can really slow things down. Ideally packet loss should be 0%.

What we really need to be concerned about is that part of the WAN route that we routinely traverse. If you do a traceroute to several different sites, you will probably see that the first few "hops" tend to be the same. These are your ISP's local backbone, and your ISP's upstream provider's gateway. Any problem with any of this will affect everywhere you go and everything you do.

We can start looking for packet loss and latency by pinging two or three different sites, hopefully in at least a couple of different directions. We will be looking for packet loss and/or unusually high latency.
The above example is pretty normal from here. (You probably have a very different route to this site, and your results may thus be quite different.) Apparently no serious underlying problems that would slow me down. The below example reveals a problem:
High packet loss at 35%, and some really slow roundtrip times in there as well. A little digging on this showed that it was a backbone router 13 hops into the traceroute that was the problem. While making this site really slow from here, it would only affect those routes that happen to hit that same router. Now what would really hurt us is if something similar happens with a router that we tend to go through consistently, like our gateway, or maybe the second hop router too. Find these with traceroute, by just picking a random site:
The first hop is the gateway. In fact, for me the first two hops are always the same, and the first three or four are often the same. So a problem with any of these may cause a problem anywhere I go. (The specifics of your own situation may be a little different than this example.) A "normal" gateway ping (normal for me!):
And a problem with the same gateway on a different day:
41% packet loss is very high, to the point where many services, like HTTP, come to a screeching halt. Those services that were working were working very, very slowly.

It's a little tempting on this last real-life example to think this gateway router is acting up. But, as it turned out, this was the result of a problem in the DSLAM/ATM segment of the telco's network. So any first hop problem with packet loss or high latency may actually be the result of something occurring before the first hop. We just don't have the tools to isolate where it is starting well enough.

Packet loss can be a telco problem, just as much as an ISP/NSP problem. Or conceivably, even a modem problem: it is quite possible for the modem itself to cause packet loss. The fix here is to power cycle the modem, and resync by unplugging the DSL cable (from the wall jack) for 30 seconds or so before plugging it back in. In fact, any part of the connection can be a source of packet loss -- modem, DSLAM, ATM network, etc.

If you do find a problem within your ISP's network, it's time to report the problem to tech support.

5.4.1. Miscellaneous Network Problems

Some odds and ends:

5.5. Measuring Throughput

One of the first things most of us do is check our speeds to make sure we aren't getting shortchanged, and that our system is up to snuff. Doing this accurately is easier said than done, however.

First, remember you are losing 10-20% right off the top due to networking protocol overhead. Just how much is "lost" here depends on your provider's network architecture, where and how you are measuring this, and other considerations. Most of us may wind up being closer to 20% than 10%.

Then, any time you hit the Internet, there is some slight degradation of performance with each hop you take. Now this may not amount to much, as long as you are not taking too many hops and all the components -- your system, your ISP's network, your ISP's upstream provider, and the destination itself -- are all working like well oiled machines. But there's the rub -- how do you really know with so many variables in the mix? One flaky interface, on one router, on one hop along the path, may cause misleading results.

Your absolute max speed is going to be at your point of connection to your ISP -- the ISP's gateway. It can only go downhill from there, not up! So the ideal test is as close to home as possible. This eliminates as many unknown variables as possible. If your ISP has a local ftp server, this is an excellent place to run your own tests. (Run a traceroute though, just to see how local it really is.) If your ISP does not have this, look for an ftp site that is close -- the fewer the hops, the better. And look for one that isn't too busy, or you will get misleading results.

Find a large file -- like 10 Megs -- and time the download. Try this over several days, and at different times of day. The server, and the backbone, are going to be busier at certain times of day, which can skew results, and you want to eliminate these variables as much as possible. Your provider cannot compensate for heavy backbone traffic, backbone bottlenecks, slow or busy servers, etc.
There are many test sites scattered around the web. Some are better than others, but take these with a grain of salt. There are just too many variables for these tests to reliably give you an accurate snapshot of your connection and throughput. They may give you a general picture of whether or not you are in the ballpark of where you think you should be. One good speed test is http://www.dslreports.com/stest/0. Another test is http://speedtest.mybc.com/ (both are Java). I find these to be better than some of the others out there.

Now keeping in mind that we are limited by the ~10-20% networking overhead rule, here is an example. My speed is capped at a 1472 Kbps sync rate. Minus the ~15%, that is about 1250 Kbps. My sync rate is known to be good and my distance to the CO is about 11,000 ft, which is close enough that I should be able to hit my real world maximum throughput of about 1250 Kbps, or roughly 1.2-1.3 Mbps -- all other things being equal. From the dslreports.com speed test:
1.211 Mbps is probably about as good as I can realistically expect based on my service. There is no reason for me to go troubleshooting or looking for tweaks.

Big Caution: my ISP uses a caching proxy server for web pages. This is a big equalizer for these kinds of web based tests. Without that, I surely would have been significantly slower on this test. The effect of the proxy is that you are actually testing throughput from the proxy -- NOT the test site. Just FYI.

Another note: at the same time I tried another test site and was consistently getting 600-700 Kbps. So YMMV with these tests. (Usually I get the same on each, more or less.) Timing a large ftp download from two different sites, I calculated about 1.25 Mbps.
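
The arithmetic behind these comparisons is simple enough to script. What follows is just a minimal sketch in Python, not a measuring tool. The function names are invented for illustration, it assumes 1 Mbps = 1000 Kbps and a "10 Meg" file of 10,000,000 bytes, and the 64 second download time is a made-up figure chosen only to show the calculation:

```python
# Throughput arithmetic sketch. All names are hypothetical; the 64-second
# download time is an invented sample value, not a measurement.

def expected_range_kbps(sync_kbps, low_overhead=0.10, high_overhead=0.20):
    """Return (worst, best) real-world throughput after protocol overhead."""
    worst = sync_kbps * (1 - high_overhead)
    best = sync_kbps * (1 - low_overhead)
    return worst, best

def measured_mbps(size_bytes, seconds):
    """Throughput of a timed download, in Mbps (1 Mbps = 1,000,000 bits/s)."""
    return size_bytes * 8 / seconds / 1_000_000

def implied_overhead_percent(sync_kbps, measured_kbps):
    """Share of the sync rate that did not show up as measured throughput."""
    return (1 - measured_kbps / sync_kbps) * 100

worst, best = expected_range_kbps(1472)
print(round(worst), round(best))                       # 1178 1325
print(measured_mbps(10_000_000, 64))                   # 1.25
print(round(implied_overhead_percent(1472, 1211), 1))  # 17.7
```

By this reckoning the 1.211 Mbps test result sits inside the 10-20% overhead band for a 1472 Kbps sync rate, which is consistent with the conclusion that nothing needs fixing.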