GNU Info

Info Node: (mysql.info)Replication Problems

(mysql.info)Replication Problems


Prev: Replication FAQ Up: Replication
Enter node , (file) or (file)node

Troubleshooting Replication
---------------------------

If you have followed the instructions, and your replication setup is not
working, first eliminate the user error factor by checking the
following:

   * Is the master logging to the binary log? Check with `SHOW MASTER
     STATUS'.  If it is, `Position' will be non-zero. If not, verify
     that you have given the master `log-bin' option and have set
     `server-id'.

   * Is the slave running? Check with `SHOW SLAVE STATUS'. The answer
     is found in `Slave_running' column. If not, verify slave options
     and check the error log for messages.

   * If the slave is running, did it establish connection with the
     master? Do `SHOW PROCESSLIST', find the thread with `system user'
     value in `User' column and `none' in the `Host' column, and check
     the `State' column. If it says `connecting to master', verify the
     privileges for the replication user on the master, master host
     name, your DNS setup, whether the master is actually running,
     whether it is reachable from the slave, and if all that seems ok,
     read the error logs.

   * If the slave was running, but then stopped, look at SHOW SLAVE
     STATUS output and check the error logs. It usually happens when
     some query that succeeded on the master fails on the slave. This
     should never happen if you have taken a proper snapshot of the
     master, and never modify the data on the slave outside of the
     slave thread. If it does, it is a bug, read below on how to report
     it.

   * If a query on that succeeded on the master refuses to run on the
     slave, and a full database resync ( the proper thing to do ) does
     not seem feasible, try the following:
        - First see if there is some stray record in the way.
          Understand how it got there, then delete it and run `SLAVE
          START'

        - If the above does not work or does not apply, try to
          understand if it would be safe to make the update manually (
          if needed) and then ignore the next query from the master.

        - If you have decided you can skip the next query, do `SET
          SQL_SLAVE_SKIP_COUNTER=1; SLAVE START;' to skip a query that
          does not use auto_increment, or last_insert_id  or `SET
          SQL_SLAVE_SKIP_COUNTER=2; SLAVE START;' otherwise. The reason
          auto_increment/last_insert_id queries are different is that
          they take two events in the binary log of the master.

        - If you are sure the slave started out perfectly in sync with
          the master, and no one has updated  the tables involved
          outside of slave thread, report the bug, so you will not have
          to do the above tricks again.

   * Make sure you are not running into an old bug by upgrading to the
     most recent version.

   * If all else fails, read the error logs. If they are big, `grep -i
     slave /path/to/your-log.err' on the slave. There is no generic
     pattern to search for on the master, as the only errors it logs
     are general system errors - if it can, it will send the error to
     the slave when things go wrong.

When you have determined that there is no user error involved, and
replication still either does not work at all or is unstable, it is
time to start working on a bug report. We need to get as much info as
possible from you to be able to track down the bug. Please do spend
some time and effort preparing a good bug report. Ideally, we would
like to have a test case in the format found in `mysql-test/t/rpl*'
directory of the source tree. If you submit a test case like that, you
can expect a patch within a day or two in most cases, although, of
course, you mileage may vary depending on a number of factors.

Second best option is a just program with easily configurable connection
arguments for the master and the slave that will demonstrate the
problem on our systems. You can write one in Perl or in C, depending on
which language you know better.

If you have one of the above ways to demonstrate the bug, use
`mysqlbug' to prepare a bug report and send it to
<bugs@lists.mysql.com>. If you have a phantom - a problem that does
occur but you cannot duplicate "at will":

   * Verify that there is no user error involved. For example, if you
     update the slave outside of the slave thread, the data will be out
     of sync, and you can have unique key violations on updates, in
     which case the slave thread will stop and wait for you to clean up
     the tables manually to bring them in sync.

   * Run slave with `log-slave-updates' and `log-bin' - this will keep
     a log of all updates on the slave.

   * Save all evidence before resetting the replication. If we have no
     or only sketchy information, it would take us a while to track
     down the problem. The evidence you should collect is:
        - All binary logs on the master

        - All binary log on the slave

        - The output of `SHOW MASTER STATUS' on the master at the time
          you have discovered the problem

        - The output of `SHOW SLAVE STATUS' on the master at the time
          you have discovered the problem

        - Error logs on the master and on the slave

   * Use `mysqlbinlog' to examine the binary logs. The following should
     be helpful to find the trouble query, for example:
          mysqlbinlog -j pos_from_slave_status /path/to/log_from_slave_status | head

Once you have collected the evidence on the phantom problem, try hard to
isolate it into a separate test case first. Then report the problem to
<bugs@lists.mysql.com> with as much info as possible.


automatically generated by info2www version 1.2.2.9