GNU Info

Info Node: (mysql.info)Replication FAQ

(mysql.info)Replication FAQ


Next: Replication Problems Prev: Replication SQL Up: Replication
Enter node , (file) or (file)node

Replication FAQ
---------------

*Q*: Why do I sometimes see more than one `Binlog_Dump' thread on the
master after I have restarted the slave?

*A*: `Binlog_Dump' is a continuous process that is handled by the
server in the following way:

   * Catch up on the updates.

   * Once there are no more updates left, go into `pthread_cond_wait()',
     from which we can be awakened either by an update or a kill.

   * On wake up, check the reason. If we are not supposed to die,
     continue the `Binlog_dump' loop.

   * If there is some fatal error, such as detecting a dead client,
     terminate the loop.

So if the slave thread stops on the slave, the corresponding
`Binlog_Dump' thread on the master will not notice it until after at
least one update to the master (or a kill), which is needed to wake it
up from `pthread_cond_wait()'.  In the meantime, the slave could have
opened another connection, which resulted in another `Binlog_Dump'
thread.

The above problem should not be present in Version 3.23.26 and later
versions.  In Version 3.23.26 we added `server-id' to each replication
server, and now all the old zombie threads are killed on the master
when a new replication thread connects from the same slave

*Q*: How do I rotate replication logs?

*A*: In Version 3.23.28 you should use `PURGE MASTER LOGS TO' command
after determining which logs can be deleted, and optionally backing
them up first. In earlier versions the process is much more painful,
and cannot be safely done without stopping all the slaves in the case
that you plan to re-use log names. You will need to stop the slave
threads, edit the binary log index file, delete all the old logs,
restart the master, start slave threads, and then remove the old log
files.

*Q*: How do I upgrade on a hot replication setup?

*A*: If you are upgrading  pre-3.23.26 versions, you should just lock
the master tables, let the slave catch up, then run `FLUSH MASTER' on
the master, and `FLUSH SLAVE' on the slave to reset the logs, then
restart new versions of the master and the slave. Note that the slave
can stay down for some time - since the master is logging all the
updates, the slave will be able to catch up once it is up and can
connect.

After 3.23.26, we have locked the replication protocol for
modifications, so you can upgrade masters and slave on the fly to a
newer 3.23 version and you can have different versions of MySQL running
on the slave and the master, as long as they are both newer than
3.23.26.

*Q*: What issues should I be aware of when setting up two-way
replication?

*A*: MySQL replication currently does not support any locking protocol
between master and slave to guarantee the atomicity of a distributed
(cross-server) update. In in other words, it is possible for client A
to make an update to  co-master 1, and in the meantime, before it
propagates to co-master 2, client B could make an update to co-master 2
that will make the update of client A work differently than it did on
co-master 1. Thus when the update of client A will make it to co-master
2, it will produce  tables that will be different than what you have on
co-master 1, even after all the updates from co-master 2 have also
propagated. So you should not co-chain two servers in a two-way
replication relationship, unless you are sure that you updates can
safely happen in any order, or unless you take care of mis-ordered
updates somehow in the client code.

You must also realize that two-way replication actually does not improve
performance very much, if at all, as far as updates are concerned. Both
servers need to do the same amount of updates each, as you would have
one server do. The only difference is that there will be a little less
lock contention, because the updates originating on another server will
be serialized in one slave thread. This benefit, though, might be
offset by network delays.

*Q*: How can I use replication to improve performance of my system?

*A*: You should set up one server as the master, and direct all writes
to it, and configure as many slaves as you have the money and rackspace
for, distributing the reads among the master and the slaves.  You can
also start the slaves with `--skip-bdb', `--low-priority-updates' and
`--delay-key-write-for-all-tables' to get speed improvements for the
slave.  In this case the slave will use non-transactional `MyISAM'
tables instead of `BDB' tables to get more speed.

*Q*: What should I do to prepare my client code to use
performance-enhancing replication?

*A*: If the part of your code that is responsible for database access
has been properly abstracted/modularized, converting it to run with the
replicated setup should be very smooth and easy - just change the
implementation of your database access to read from some slave or the
master, and to always write to the master. If your code does not have
this level of abstraction, setting up a replicated system will give you
an opportunity/motivation to it clean up.   You should start by
creating a wrapper library /module with the following functions:

   * `safe_writer_connect()'

   * `safe_reader_connect()'

   * `safe_reader_query()'

   * `safe_writer_query()'

`safe_' means that the function will take care of handling all the
error conditions.

You should then convert your client code to use the wrapper library.
It may be a painful and scary process at first, but it will pay off in
the long run. All applications that follow the above pattern will be
able to take advantage of one-master/many slaves solution.  The code
will be a lot easier to maintain, and adding troubleshooting options
will be trivial. You will just need to modify one or two functions, for
example, to log how long each query took, or which query, among your
many thousands, gave you an error. If you have written a lot of code
already, you may want to automate the conversion task by using Monty's
`replace' utility, which comes with the standard distribution of MySQL,
or just write your own Perl script. Hopefully, your code follows some
recognizable pattern. If not, then you are probably better off
re-writing it anyway, or at least going through and manually beating it
into a pattern.

Note that, of course, you can use different names for the functions.
What is important is having unified interface for connecting for reads,
connecting for writes, doing a read, and doing a write.

*Q*: When and how much can MySQL replication improve the performance of
my system?

*A*: MySQL replication is most beneficial for a system with frequent
reads and not so frequent writes. In theory, by using a one master/many
slaves setup you can scale by adding more slaves until you either run
out of network bandwidth, or your update load grows to the point that
the master cannot handle it.

In order to determine how many slaves you can get before the added
benefits begin to level out, and how much you can improve performance
of your site, you need to know your query patterns, and empirically
(by benchmarking) determine the relationship between the throughput on
reads (reads per second, or `max_reads') and on writes `max_writes') on
a typical master and a typical slave. The example below will show you a
rather simplified calculation of what you can get with replication for
our imagined system.

Let's say our system load consists of 10% writes and 90% reads, and we
have determined that `max_reads' = 1200 - 2 * `max_writes', or in other
words, our system can do 1200 reads per second with no writes, our
average write is twice as slow as average read, and the relationship is
linear. Let us suppose that our master and slave are of the same
capacity, and we have N slaves and 1 master. Then we have for each
server (master or slave):

`reads = 1200 - 2 * writes' (from bencmarks)

`reads = 9* writes / (N + 1) ' (reads split, but writes go to all
servers)

`9*writes/(N+1) + 2 * writes = 1200'

`writes = 1200/(2 + 9/(N+1)'

So if N = 0, which means we have no replication, our system can handle
1200/11, about 109 writes per second (which means we will have 9 times
as many reads due to the nature of our application).

If N = 1, we can get up to 184 writes per second.

If N = 8, we get up to 400.

If N = 17, 480 writes.

Eventually as N approaches infinity (and our budget negative infinity),
we can get very close to 600 writes per second, increasing system
throughput about 5.5 times. However, with only 8 servers, we increased
it almost 4 times already.

Note that our computations assumed infinite network bandwidth, and
neglected several other factors that could turn out to be significant on
your system. In many cases, you may not be able to make a computation
similar to the one above that will accurately predict what will happen
on your system if you add N replication slaves. However, answering the
following questions should help you decided whether and how much, if at
all, the replication will improve the performance of your system:

   * What is the read/write ratio on your system?

   * How much more write load can one server handle if you reduce the
     reads?

   * How many slaves do you have bandwidth for on your network?

*Q*: How can I use replication to provide redundancy/high availability?

*A*: With the currently available features, you would have to set up a
master and a slave (or several slaves), and write a script that will
monitor the master to see if it is up, and instruct your applications
and the slaves of the master change in case of failure. Some
suggestions:

   * To tell a slave to change the master use the `CHANGE MASTER TO'
     command.

   * A good way to keep your applications informed where the master is
     by having a dynamic DNS entry for the master. With *bind* you can
     use `nsupdate' to dynamically update your DNS.

   * You should run your slaves with the `log-bin' option and without
     `log-slave-updates'. This way the slave will be ready to become a
     master as soon as you issue `STOP SLAVE'; `RESET MASTER', and
     `CHANGE MASTER TO' on the other slaves. It will also help you catch
     spurious updates that may happen because of misconfiguration of the
     slave (ideally, you want to configure access rights so that no
     client can update the slave, except for the slave thread) combined
     with the bugs in your client programs (they should never update
     the slave directly).


We are currently working on integrating an automatic master election
system into MySQL, but until it is ready, you will have to create your
own monitoring tools.


automatically generated by info2www version 1.2.2.9