As of MySQL Version 3.23.6, you can choose between three basic
table formats (
Note that to use
The default table type in MySQL is
You can convert tables between different types with the
Note that MySQL supports two different kinds of
tables. Transaction-safe tables (
Advantages of transaction-safe tables (TST):
Advantages of non-transaction-safe tables (NTST):
You can combine TST and NTST tables in the same statements to get the best of both worlds.
The index is stored in a file with the
The following is new in
Note that index files are usually much smaller with
The following options to
The automatic recovery is activated if you start
If the recovery is not able to recover all rows from a previously
completed statement and you didn't specify
Error: Couldn't repair table: test.g00pages
If you in this case had used the
Warning: Found 344 of 354 rows when repairing ./test/g00pages
Note that if you run automatic recovery with the
See section 4.1.1 mysqld Command-line Options.
MySQL can support different index types, but the normal type is
ISAM or MyISAM. These use a B-tree index, and you can roughly calculate
the size for the index file as
String indexes are space compressed. If the first index part is a
string, it will also be prefix compressed. Space compression makes the
index file smaller than the above figures if the string column has a lot
of trailing space or is a
MyISAM supports 3 different table types. Two of them are chosen
automatically depending on the type of columns you are using. The third,
compressed tables, can only be created with the
This is the default format. It's used when the table contains no
This format is the simplest and most secure format. It is also the fastest of the on-disk formats. The speed comes from how easily data can be found on disk: when looking up a row through an index with the static format, the position of the row is found by multiplying the row number by the row length.
Also, when scanning a table it is very easy to read a constant number of records with each disk read.
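As a rough illustration of why fixed-length rows are cheap to locate, the offset calculation can be sketched in Python. The function name and the 25-byte row length are hypothetical, and the sketch ignores any header the real data file may contain.

```python
def static_row_offset(row_number: int, row_length: int) -> int:
    """Byte offset of a fixed-length row: row number times row length."""
    if row_number < 0 or row_length <= 0:
        raise ValueError("row_number must be >= 0 and row_length > 0")
    return row_number * row_length

# Hypothetical table whose rows are each 25 bytes long.
offsets = [static_row_offset(n, 25) for n in range(4)]
print(offsets)  # prints [0, 25, 50, 75]
```

Because every offset is a simple product, no per-row header has to be read before the next row can be found.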
The security is evidenced if your computer crashes when writing to a
fixed-size MyISAM file, in which case
This format is used if the table contains any
This format is a little more complex because each row has to have a header that says how long it is. A record can also end up stored in more than one location if an update makes it longer.
You can use
This is a read-only type that is generated with the optional
The file format that MySQL uses to store data has been extensively tested, but there are always circumstances that may cause database tables to become corrupted.
Even though the MyISAM table format is very reliable (all changes to a table are written before the SQL statement returns), you can still get corrupted tables if any of the following things happen:
Typical symptoms of a corrupt table are:
You can check if a table is ok with the command
You can repair a corrupted table with
If your tables get corrupted a lot you should try to find the reason for this! See section A.4.1 What To Do If MySQL Keeps Crashing.
In this case the most important thing to know is if the table got
corrupted if the
If you get the following warning from
# clients is using or hasn't closed the table properly
this means that this counter has gone out of sync. This doesn't mean that the table is corrupted, but you should at least check the table to verify that it's OK.
The counter works as follows:
In other words, the only ways this can go out of sync are:
By identical tables we mean that all tables are created with identical
column and key information. You can't put a MERGE over tables whose
columns are packed differently, that don't have exactly the same columns, or
that have the keys in a different order. Some of the tables can however be
When you create a
For the moment you need to have
The disadvantages with
The following example shows you how to use
CREATE TABLE t1 (a INT AUTO_INCREMENT PRIMARY KEY, message CHAR(20));
CREATE TABLE t2 (a INT AUTO_INCREMENT PRIMARY KEY, message CHAR(20));
INSERT INTO t1 (message) VALUES ("Testing"),("table"),("t1");
INSERT INTO t2 (message) VALUES ("Testing"),("table"),("t2");
CREATE TABLE total (a INT NOT NULL, message CHAR(20), KEY(a)) TYPE=MERGE UNION=(t1,t2);
Note that we didn't create a
Note that you can also manipulate the
shell> cd /mysql-data-directory/current-database
shell> ls -1 t1.MYI t2.MYI > total.MRG
shell> mysqladmin flush-tables
Now you can do things like:
mysql> select * from total;
+---+---------+
| a | message |
+---+---------+
| 1 | Testing |
| 2 | table   |
| 3 | t1      |
| 1 | Testing |
| 2 | table   |
| 3 | t2      |
+---+---------+
To remap a
You can also use the deprecated ISAM table type. This will disappear
rather soon because
Most of the things true for
If you want to convert an
mysql> ALTER TABLE tbl_name TYPE = MYISAM;
The MySQL internal HEAP tables use 100% dynamic hashing
without overflow areas. There is no extra space needed for free lists.
mysql> CREATE TABLE test TYPE=HEAP SELECT ip,SUM(downloads) as down
       FROM log_table GROUP BY ip;
mysql> SELECT COUNT(ip),AVG(down) FROM test;
mysql> DROP TABLE test;
Here are some things you should consider when you use
The memory needed for one row in a
SUM_OVER_ALL_KEYS(max_length_of_key + sizeof(char*) * 2) + ALIGN(length_of_row+1, sizeof(char*))
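The formula above can be tried out with a small Python sketch. It assumes that ALIGN() rounds its first argument up to the next multiple of the second, and it takes sizeof(char*) as a parameter (4 on a 32-bit build, 8 on a 64-bit build). The function names and the sample table are hypothetical.

```python
def align(n: int, boundary: int) -> int:
    """Round n up to the next multiple of boundary (assumed meaning of ALIGN)."""
    return -(-n // boundary) * boundary

def heap_row_memory(key_lengths, row_length, pointer_size=4):
    """Bytes needed for one row, following the formula above.

    key_lengths: maximum key length of each key on the table.
    pointer_size: stands in for sizeof(char*).
    """
    key_part = sum(length + pointer_size * 2 for length in key_lengths)
    return key_part + align(row_length + 1, pointer_size)

# Hypothetical table: one 4-byte key and 24-byte rows.
print(heap_row_memory([4], 24, pointer_size=4))
print(heap_row_memory([4], 24, pointer_size=8))
```

Multiplying the result by the expected number of rows gives a rough estimate of the memory a table will need.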
Support for BDB tables is included in the MySQL source distribution starting from Version 3.23.34 and is activated in the MySQL-Max binary.
BerkeleyDB, available at http://www.sleepycat.com/ has provided
MySQL with a transactional table handler. By using BerkeleyDB
tables, your tables may have a greater chance of surviving crashes, and also
We at MySQL AB are working in close cooperation with Sleepycat to keep the quality of the MySQL/BDB interface high.
When it comes to supporting BDB tables, we are committed to helping our users locate the problem and create a reproducible test case for any problems involving BDB tables. Any such test case will be forwarded to Sleepycat, who in turn will help us find and fix the problem. As this is a two-stage operation, any problems with BDB tables may take a little longer for us to fix than for other table handlers. However, as the BerkeleyDB code itself has been used by many applications other than MySQL, we don't envision any big problems with this. See the section Support for other table handlers.
If you have downloaded a binary version of MySQL that includes support for BerkeleyDB, simply follow the instructions for installing a binary version of MySQL. See section M.1 Installing a MySQL Binary Distribution. See section 4.7.5 mysqld-max, An extended mysqld server.
To compile MySQL with Berkeley DB support, download MySQL
Version 3.23.34 or newer and configure
cd /path/to/source/of/mysql-3.23.34
./configure --with-berkeley-db
Please refer to the manual provided with the
Even though Berkeley DB is in itself well tested and reliable, the MySQL interface is still considered beta quality. We are actively improving and optimizing it to get it stable very soon.
If you are running with
If you are running with
The following options to
If you use
Normally you should start
You may also want to change
If, after building MySQL with support for BDB tables, you get
the following error in the log file when you start
bdb: architecture lacks fast mutexes: applications cannot be threaded
Can't init dtabases
This means that
NOTE: The following list is not complete; we will update it as we get more information.
Currently we know that BDB tables work with the following operating systems:
It doesn't work with the following operating systems:
InnoDB tables are included in the MySQL source distribution starting from 3.23.34a and are activated in the MySQL-Max binary.
If you have downloaded a binary version of MySQL that includes support for InnoDB (mysqld-max), simply follow the instructions for installing a binary version of MySQL. See section M.1 Installing a MySQL Binary Distribution. See section 4.7.5 mysqld-max, An extended mysqld server.
To compile MySQL with InnoDB support, download MySQL-3.23.37 or newer
and configure MySQL with the
cd /path/to/source/of/mysql-3.23.37
./configure --with-innodb
To get InnoDB to work you have to specify where the data for InnoDB
tables should be stored by specifying the
Can't initialize InnoDB as 'innodb_data_file_path' is not set
InnoDB provides MySQL with a transaction-safe table handler with
commit, rollback, and crash recovery capabilities. InnoDB does
locking on row level, and also provides an Oracle-style consistent
non-locking read in
InnoDB has been designed for maximum performance when processing large data volumes. Its CPU efficiency is probably not matched by any other disk-based relational database engine.
You can find the latest information about InnoDB at http://www.innodb.com. The most up-to-date version of the InnoDB manual is always placed there, and you can also order commercial support for InnoDB.
Technically, InnoDB is a database backend placed under MySQL. InnoDB
has its own buffer pool for caching data and indexes in main
memory. InnoDB stores its tables and indexes in a tablespace, which
may consist of several files. This is different from, for example,
InnoDB is distributed under the GNU GPL License Version 2 (of June 1991). In the source distribution of MySQL, InnoDB appears as a subdirectory.
Beginning from MySQL-3.23.37 the prefix of the options is changed
To use InnoDB tables you MUST specify configuration parameters
in the MySQL configuration file in the
The only required parameter to use InnoDB is
Suppose you have a Windows NT machine with 128 MB RAM and a single 10 GB hard disk. Below is an example of possible configuration parameters in `my.cnf' for InnoDB:
innodb_data_file_path = ibdata1:2000M;ibdata2:2000M
innodb_data_home_dir = c:\ibdata
set-variable = innodb_mirrored_log_groups=1
innodb_log_group_home_dir = c:\iblogs
set-variable = innodb_log_files_in_group=3
set-variable = innodb_log_file_size=30M
set-variable = innodb_log_buffer_size=8M
innodb_flush_log_at_trx_commit=1
innodb_log_arch_dir = c:\iblogs
innodb_log_archive=0
set-variable = innodb_buffer_pool_size=80M
set-variable = innodb_additional_mem_pool_size=10M
set-variable = innodb_file_io_threads=4
set-variable = innodb_lock_wait_timeout=50
Note that data files must be < 4G, and < 2G on some file systems! The total size of data files has to be >= 10 MB. InnoDB does not create directories: you have to create them yourself.
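The size limits above can be checked before the server is started. The following Python sketch is only an illustration: it assumes the file path value has the name:sizeM;name:sizeM form used in the example, it handles only sizes given in M, and the function names are hypothetical. It checks the 4G limit but not the lower 2G limit of some file systems.

```python
MB = 1024 * 1024

def parse_data_file_path(value: str):
    """Split 'name:sizeM;name:sizeM' into (name, size_in_bytes) pairs."""
    files = []
    for spec in value.split(";"):
        name, size = spec.strip().rsplit(":", 1)
        if not size.upper().endswith("M"):
            raise ValueError(f"only sizes in M are handled in this sketch: {size!r}")
        files.append((name, int(size[:-1]) * MB))
    return files

def check_data_files(value: str):
    """Return a list of problems; an empty list means the limits above are met."""
    files = parse_data_file_path(value)
    problems = [f"{name}: file must be smaller than 4G"
                for name, size in files if size >= 4096 * MB]
    if sum(size for _, size in files) < 10 * MB:
        problems.append("total size of data files must be at least 10 MB")
    return problems

print(check_data_files("ibdata1:2000M;ibdata2:2000M"))  # prints []
```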
Suppose you have a Linux machine with 512 MB RAM and three 20 GB hard disks (at directory paths `/', `/dr2' and `/dr3'). Below is an example of possible configuration parameters in `my.cnf' for InnoDB:
innodb_data_file_path = ibdata/ibdata1:2000M;dr2/ibdata/ibdata2:2000M
innodb_data_home_dir = /
set-variable = innodb_mirrored_log_groups=1
innodb_log_group_home_dir = /dr3
set-variable = innodb_log_files_in_group=3
set-variable = innodb_log_file_size=50M
set-variable = innodb_log_buffer_size=8M
innodb_flush_log_at_trx_commit=1
innodb_log_arch_dir = /dr3/iblogs
innodb_log_archive=0
set-variable = innodb_buffer_pool_size=400M
set-variable = innodb_additional_mem_pool_size=20M
set-variable = innodb_file_io_threads=4
set-variable = innodb_lock_wait_timeout=50
Note that we have placed the two data files on different disks.
The reason for the name
The meanings of the configuration parameters are the following:
Suppose you have installed MySQL and have edited `my.cnf' so that it contains the necessary InnoDB configuration parameters. Before starting MySQL you should check that the directories you have specified for InnoDB data files and log files exist and that you have access rights to those directories. InnoDB cannot create directories, only files. Also check that you have enough disk space for the data and log files.
When you now start MySQL, InnoDB will start creating your data files and log files. InnoDB will print something like the following:
~/mysqlm/sql > mysqld
InnoDB: The first specified data file /home/heikki/data/ibdata1 did not exist:
InnoDB: a new database to be created!
InnoDB: Setting file /home/heikki/data/ibdata1 size to 134217728
InnoDB: Database physically writes the file full: wait...
InnoDB: Data file /home/heikki/data/ibdata2 did not exist: new to be created
InnoDB: Setting file /home/heikki/data/ibdata2 size to 262144000
InnoDB: Database physically writes the file full: wait...
InnoDB: Log file /home/heikki/data/logs/ib_logfile0 did not exist: new to be created
InnoDB: Setting log file /home/heikki/data/logs/ib_logfile0 size to 5242880
InnoDB: Log file /home/heikki/data/logs/ib_logfile1 did not exist: new to be created
InnoDB: Setting log file /home/heikki/data/logs/ib_logfile1 size to 5242880
InnoDB: Log file /home/heikki/data/logs/ib_logfile2 did not exist: new to be created
InnoDB: Setting log file /home/heikki/data/logs/ib_logfile2 size to 5242880
InnoDB: Started
mysqld: ready for connections
A new InnoDB database has now been created. You can connect to the MySQL
server with the usual MySQL client programs like
010321 18:33:34 mysqld: Normal shutdown
010321 18:33:34 mysqld: Shutdown Complete
InnoDB: Starting shutdown...
InnoDB: Shutdown completed
You can now look at the data files and logs directories and you will see the files created. The log directory will also contain a small file named `ib_arch_log_0000000000'. That file resulted from the database creation, after which InnoDB switched off log archiving. When MySQL is again started, the output will be like the following:
~/mysqlm/sql > mysqld
InnoDB: Started
mysqld: ready for connections
If something goes wrong during InnoDB database creation, you should delete all files created by InnoDB. This means all data files, all log files, and the small archived log file. If you have already created some InnoDB tables, also delete the corresponding `.frm' files for these tables from the MySQL database directories. Then you can try the InnoDB database creation again.
Suppose you have started the MySQL client with the command
CREATE TABLE CUSTOMER (A INT, B CHAR (20), INDEX (A)) TYPE = InnoDB;
This SQL command will create a table and an index on column
You can query the amount of free space in the InnoDB tablespace
by issuing the table status command of MySQL for any table you have
SHOW TABLE STATUS FROM test LIKE 'CUSTOMER'
Note that the statistics
InnoDB does not have a special optimization for separate index creation.
Therefore it does not pay to export and import the table and create indexes
The fastest way to alter a table to InnoDB is to do the inserts
directly to an InnoDB table, that is, use
To get better control over the insertion process, it may be good to insert big tables in pieces:
INSERT INTO newtable SELECT * FROM oldtable WHERE yourkey > something AND yourkey <= somethingelse;
After all data has been inserted you can rename the tables.
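The piecewise insertion shown above can be scripted. The following Python sketch only builds the statement text for consecutive key ranges; it assumes an integer key, and the function name, table names, and range values are hypothetical. The names are placed in the SQL text as given, so they must come from a trusted source.

```python
def chunked_insert_statements(new_table, old_table, key, low, high, step):
    """Build INSERT ... SELECT statements covering low < key <= high in pieces."""
    if step <= 0:
        raise ValueError("step must be positive")
    statements = []
    start = low
    while start < high:
        end = min(start + step, high)
        statements.append(
            f"INSERT INTO {new_table} SELECT * FROM {old_table} "
            f"WHERE {key} > {start} AND {key} <= {end};"
        )
        start = end
    return statements

for stmt in chunked_insert_statements("newtable", "oldtable", "yourkey", 0, 250, 100):
    print(stmt)
```

Each range is open at the lower end and closed at the upper end, as in the statement above, so no row is copied twice and none is skipped.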
During the conversion of big tables you should make the InnoDB buffer pool large to reduce disk i/o, though not larger than 80 % of the physical memory. You should also make the InnoDB log files and the log buffer large.
Make sure you do not run out of tablespace: InnoDB tables take a lot
more space than MyISAM tables. If an
In the case of a runaway rollback, if you do not have valuable data in your database, it is better that you kill the database process and delete all InnoDB data and log files and all InnoDB table `.frm' files, and start your job again, rather than wait for millions of disk i/os to complete.
You cannot increase the size of an InnoDB data file. To add more into
your tablespace you have to add a new data file. To do this you have to
shut down your MySQL database, edit the `my.cnf' file, adding a
new file to
Currently you cannot remove a data file from InnoDB. To decrease the
size of your database you have to use
If you want to change the number or the size of your InnoDB log files, you have to shut down MySQL and make sure that it shuts down without errors. Then copy the old log files to a safe place, in case something went wrong in the shutdown and you need them to recover the database. Then delete the old log files from the log file directory, edit `my.cnf', and start MySQL again. InnoDB will tell you at startup that it is creating new log files.
The key to safe database management is taking regular backups. To take a 'binary' backup of your database you have to do the following:
There is currently no on-line or incremental backup tool available for InnoDB, though they are on the TODO list.
In addition to taking the binary backups described above, you should also regularly take dumps of your tables with `mysqldump'. The reason for this is that a binary file may be corrupted without you noticing it. Dumped tables are stored in text files which are human-readable and much simpler than database binary files. Seeing table corruption from dumped files is easier, and since their format is simpler, the chance of serious data corruption in them is smaller.
A good idea is to take the dumps at the same time you take a binary backup of your database. You have to shut out all clients from your database to get a consistent snapshot of all your tables into your dumps. Then you can take the binary backup, and you will then have a consistent snapshot of your database in two formats.
To be able to recover your InnoDB database to the present from the binary backup described above, you have to run your MySQL database with the general logging and log archiving of MySQL switched on. Here by the general logging we mean the logging mechanism of the MySQL server which is independent of InnoDB logs.
To recover from a crash of your MySQL server process, the only thing you have to do is to restart it. InnoDB will automatically check the logs and perform a roll-forward of the database to the present. InnoDB will automatically roll back uncommitted transactions which were present at the time of the crash. During recovery, InnoDB will print out something like the following:
~/mysqlm/sql > mysqld
InnoDB: Database was not shut down normally.
InnoDB: Starting recovery from log files...
InnoDB: Starting log scan based on checkpoint at
InnoDB: log sequence number 0 13674004
InnoDB: Doing recovery: scanned up to log sequence number 0 13739520
InnoDB: Doing recovery: scanned up to log sequence number 0 13805056
InnoDB: Doing recovery: scanned up to log sequence number 0 13870592
InnoDB: Doing recovery: scanned up to log sequence number 0 13936128
...
InnoDB: Doing recovery: scanned up to log sequence number 0 20555264
InnoDB: Doing recovery: scanned up to log sequence number 0 20620800
InnoDB: Doing recovery: scanned up to log sequence number 0 20664692
InnoDB: 1 uncommitted transaction(s) which must be rolled back
InnoDB: Starting rollback of uncommitted transactions
InnoDB: Rolling back trx no 16745
InnoDB: Rolling back of trx no 16745 completed
InnoDB: Rollback of uncommitted transactions completed
InnoDB: Starting an apply batch of log records to the database...
InnoDB: Apply batch completed
InnoDB: Started
mysqld: ready for connections
If your database gets corrupted or your disk fails, you have to do the recovery from a backup. In the case of corruption, you should first find a backup which is not corrupted. Starting from that backup, do the recovery using the general log files of MySQL according to the instructions in the MySQL manual.
InnoDB implements a checkpoint mechanism called a fuzzy checkpoint. InnoDB flushes modified database pages from the buffer pool in small batches, so there is no need to flush the buffer pool in one single batch, which would in practice stop processing of user SQL statements for a while.
In crash recovery InnoDB looks for a checkpoint label written to the log files. It knows that all modifications to the database before the label are already present on the disk image of the database. Then InnoDB scans the log files forward from the place of the checkpoint applying the logged modifications to the database.
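The roll-forward described above can be pictured with a toy model in Python. This is a simplified sketch and not InnoDB's actual implementation: the log is modeled as a list of (log sequence number, page, new value) records, and the function name and the sample data are hypothetical.

```python
def recover(disk_pages, log_records, checkpoint_lsn):
    """Replay log records written after the checkpoint onto a copy of the disk image.

    log_records is a list of (lsn, page, value) tuples in ascending lsn order.
    """
    pages = dict(disk_pages)
    for lsn, page, value in log_records:
        if lsn > checkpoint_lsn:   # earlier changes are already on disk
            pages[page] = value
    return pages

# Hypothetical state at the time of a crash: the change with lsn 10
# reached the disk before the checkpoint, the later two did not.
disk = {"p1": "a2", "p2": "b1"}
log = [(10, "p1", "a2"), (20, "p2", "b2"), (30, "p1", "a3")]
print(recover(disk, log, checkpoint_lsn=10))
```

Only the records after the checkpoint label are applied, which is why a recent checkpoint shortens recovery.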
InnoDB writes to the log files in a circular fashion. All committed modifications which make the database pages in the buffer pool different from the images on disk must be available in the log files in case InnoDB has to do a recovery. This means that when InnoDB starts to reuse a log file in the circular fashion, it has to make sure that the database page images on disk already contain the modifications logged in the log file InnoDB is going to reuse. In other words, InnoDB has to make a checkpoint and often this involves flushing of modified database pages to disk.
The above explains why making your log files very big may save disk i/o in checkpointing. It can make sense to set the total size of the log files as big as the buffer pool or even bigger. The drawback in big log files is that crash recovery can last longer because there will be more log to apply to the database.
InnoDB data and log files are binary-compatible on all platforms
if the floating point number format on the machines is the same.
You can move an InnoDB database simply by copying all the relevant
files, which we already listed in the previous section on backing up
a database. If the floating point formats on the machines are
different but you have not used
A performance tip is to switch off the auto commit when you import data into your database, assuming your tablespace has enough space for the big rollback segment the big import transaction will generate. Do the commit only after importing a whole table or a segment of a table.
In the InnoDB transaction model the goal has been to combine the best properties of a multiversioning database with traditional two-phase locking. InnoDB does locking on row level and runs queries by default as non-locking consistent reads, in the style of Oracle. The lock table in InnoDB is stored so space-efficiently that lock escalation is not needed: typically several users are allowed to lock every row in the database, or any random subset of the rows, without InnoDB running out of memory.
In InnoDB all user activity happens inside transactions. If the
auto commit mode is used in MySQL, then each SQL statement
will form a single transaction. If the auto commit mode is
switched off, then we can think that a user always has a transaction
open. If he issues
A consistent read means that InnoDB uses its multiversioning to present to a query a snapshot of the database at a point in time. The query will see the changes made by exactly those transactions that committed before that point of time, and no changes made by later or uncommitted transactions. The exception to this rule is that the query will see the changes made by the transaction itself which issues the query.
When a transaction issues its first consistent read, InnoDB assigns the snapshot, or the point of time, which all consistent reads in the same transaction will use. In the snapshot are all transactions that committed before assigning the snapshot. Thus the consistent reads within the same transaction will also be consistent with respect to each other. You can get a fresher snapshot for your queries by committing the current transaction and after that issuing new queries.
Consistent read is the default mode in which InnoDB processes
A consistent read is not convenient in some circumstances.
Suppose you want to add a new row into your table
Suppose you use a consistent read to read the table
The solution is to perform the
SELECT * FROM PARENT WHERE NAME = 'Jones' LOCK IN SHARE MODE;
Performing a read in share mode means that we read the latest
available data, and set a shared mode lock on the rows we read.
If the latest data belongs to a yet uncommitted transaction of another
user, we will wait until that transaction commits.
A shared mode lock prevents others from updating or deleting
the row we have read. After we see that the above query returns
Let us look at another example: we have an integer counter field in
In this case there are two good ways to implement the
reading and incrementing of the counter: (1) update the counter
first by incrementing it by 1 and only after that read it,
or (2) read the counter first with
a lock mode
SELECT COUNTER_FIELD FROM CHILD_CODES FOR UPDATE;
UPDATE CHILD_CODES SET COUNTER_FIELD = COUNTER_FIELD + 1;
In row level locking InnoDB uses an algorithm called next-key locking. InnoDB does the row level locking so that when it searches or scans an index of a table, it sets shared or exclusive locks on the index records it encounters. Thus the row level locks are more precisely called index record locks.
The locks InnoDB sets on index records also affect the 'gap'
before that index record. If a user has a shared or exclusive
lock on record R in an index, then another user cannot insert
a new index record immediately before R in the index order.
This locking of gaps is done to prevent the so-called phantom
problem. Suppose I want to read and lock all children with identifier
bigger than 100 from table
SELECT * FROM CHILD WHERE ID > 100 FOR UPDATE;
Suppose there is an index on table
SELECT * FROM CHILD WHERE ID > 100 FOR UPDATE;
again, I will see a new child in the result set the query returns. This is against the isolation principle of transactions: a transaction should be able to run so that the data it has read does not change during the transaction. If we regard a set of rows as a data item, then the new 'phantom' child would break this isolation principle.
When InnoDB scans an index it can also lock the gap
after the last record in the index. Just that happens in the previous
example: the locks set by InnoDB will prevent any insert to
the table where
You can use the next-key locking to implement a uniqueness check in your application: if you read your data in share mode and do not see a duplicate for a row you are going to insert, then you can safely insert your row and know that the next-key lock set on the successor of your row during the read will prevent anyone meanwhile inserting a duplicate for your row. Thus the next-key locking allows you to 'lock' the non-existence of something in your table.
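The effect of next-key locks on inserts can be sketched with a toy model in Python. It is an illustration only, not InnoDB's implementation: a lock on an index record is taken to cover the gap before that record, and a special marker stands for the gap after the last record. The function name, the marker, and the sample keys are hypothetical.

```python
import bisect

SUPREMUM = float("inf")  # stands for the gap after the last index record

def insert_blocked(index_keys, locked_keys, new_key):
    """True if inserting new_key would land in a gap covered by a lock.

    index_keys: sorted list of existing keys.
    locked_keys: keys (or SUPREMUM) on which another transaction
    holds a next-key lock.
    """
    pos = bisect.bisect_right(index_keys, new_key)
    successor = index_keys[pos] if pos < len(index_keys) else SUPREMUM
    return successor in locked_keys

# Hypothetical index with identifiers 90 and 102. A locking read of all
# identifiers bigger than 100 locks record 102 and the gap after it.
keys = [90, 102]
locked = {102, SUPREMUM}
print(insert_blocked(keys, locked, 101))
print(insert_blocked(keys, locked, 150))
print(insert_blocked(keys, locked, 50))
```

In this model an insert is checked against the lock on its successor in the index order, which is how the non-existence of a row can be 'locked'.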
InnoDB automatically detects a deadlock of transactions and rolls
back the transaction whose lock request was the last one to build
a deadlock, that is, a cycle in the waits-for graph of transactions.
InnoDB cannot detect deadlocks where a lock set by a MySQL
When InnoDB performs a complete rollback of a transaction, all the locks of the transaction are released. However, if just a single SQL statement is rolled back as a result of an error, some of the locks set by the SQL statement may be preserved. This is because InnoDB stores row locks in a format where it cannot afterwards know which was set by which SQL statement.
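The deadlock detection described above, that is, finding a cycle in the waits-for graph and rolling back the transaction whose request closed the cycle, can be sketched in Python. This is a toy model under simplifying assumptions, not InnoDB's code: the graph is a dictionary from a waiting transaction to the set of transactions it waits for, and all names are hypothetical.

```python
def find_cycle_from(waits_for, start):
    """Return True if following waits-for edges from start leads back to start."""
    seen = set()
    stack = list(waits_for.get(start, ()))
    while stack:
        trx = stack.pop()
        if trx == start:
            return True
        if trx not in seen:
            seen.add(trx)
            stack.extend(waits_for.get(trx, ()))
    return False

def request_lock(waits_for, requester, holder):
    """Record that requester waits for holder; roll the requester back on deadlock."""
    waits_for.setdefault(requester, set()).add(holder)
    if find_cycle_from(waits_for, requester):
        # The last requester is rolled back: it stops waiting, and its
        # locks are released, so nobody waits for it any longer.
        waits_for.pop(requester, None)
        for holders in waits_for.values():
            holders.discard(requester)
        return "rolled back"
    return "waiting"

graph = {}
print(request_lock(graph, "A", "B"))
print(request_lock(graph, "B", "A"))
```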
When you issue a consistent read, that is, an ordinary
You can advance your timepoint by committing your transaction
and then doing another
This is called multiversioned concurrency control.
                  User A                 User B

              set autocommit=0;      set autocommit=0;
time
|             SELECT * FROM t;
|             empty set
|                                    INSERT INTO t VALUES (1, 2);
|
v             SELECT * FROM t;
              empty set
                                     COMMIT;

              SELECT * FROM t;
              empty set;

              COMMIT;

              SELECT * FROM t;
              ----------------------
              |    1    |    2     |
              ----------------------
Thus user A sees the row inserted by B only when B has committed the insert, and A has committed his own transaction so that the timepoint is advanced past the commit of B.
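The timeline above can be replayed with a toy model in Python. It is a sketch under simplifying assumptions, not a description of InnoDB internals: commits are numbered by a counter, a transaction takes its snapshot at its first read, and a commit starts a new transaction on the same object. The class and method names are hypothetical.

```python
class ToyDatabase:
    def __init__(self):
        self.commit_counter = 0
        self.committed = []      # list of (commit_number, row)

class ToyTransaction:
    def __init__(self, db):
        self.db = db
        self.snapshot = None     # assigned at the first consistent read
        self.pending = []        # own uncommitted inserts

    def insert(self, row):
        self.pending.append(row)

    def select_all(self):
        if self.snapshot is None:
            self.snapshot = self.db.commit_counter
        visible = [row for n, row in self.db.committed if n <= self.snapshot]
        return visible + self.pending   # a transaction sees its own changes

    def commit(self):
        self.db.commit_counter += 1
        self.db.committed += [(self.db.commit_counter, row) for row in self.pending]
        self.pending = []
        self.snapshot = None     # the next read gets a fresh snapshot

db = ToyDatabase()
a, b = ToyTransaction(db), ToyTransaction(db)
print(a.select_all())    # A takes its snapshot: empty set
b.insert((1, 2))
print(a.select_all())    # B has not committed: empty set
b.commit()
print(a.select_all())    # A still uses its old snapshot: empty set
a.commit()
print(a.select_all())    # fresh snapshot: A now sees the row
```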
If you want to see the 'freshest' state of the database, you should use a locking read:
SELECT * FROM t LOCK IN SHARE MODE;
1. If the Unix `top' or the Windows `Task Manager' shows that the CPU usage percentage with your workload is less than 70 %, your workload is probably disk-bound. Maybe you are making too many transaction commits, or the buffer pool is too small. Making the buffer pool bigger can help, but do not set it bigger than 80 % of physical memory.
2. Wrap several modifications into one transaction. InnoDB must flush the log to disk at each transaction commit, if that transaction made modifications to the database. Since the rotation speed of a disk is typically at most 167 revolutions/second, that constrains the number of commits to the same 167/second if the disk does not fool the operating system.
If you can afford the loss of some latest committed transactions, you can
set the `my.cnf' parameter
4. Make your log files big, even as big as the buffer pool. When InnoDB has written the log files full, it has to write the modified contents of the buffer pool to disk in a checkpoint. Small log files will cause many unnecessary disk writes. The drawback in big log files is that recovery time will be longer.
5. Also the log buffer should be quite big, say 8 MB.
6. (Relevant from 3.23.39 up.)
In some versions of Linux and Unix, flushing files to disk with the Unix
7. In importing data to InnoDB, make sure that MySQL does not have
and after it
If you use the `mysqldump' option
8. Beware of big rollbacks of mass inserts: InnoDB uses the insert buffer to save disk i/o in inserts, but in a corresponding rollback no such mechanism is used. A disk-bound rollback can take 30 times the time of the corresponding insert. Killing the database process will not help because the rollback will start again at the database startup. The only way to get rid of a runaway rollback is to increase the buffer pool so that the rollback becomes CPU-bound and runs fast, or delete the whole InnoDB database.
Beware also of other big disk-bound operations.
Use the multi-line
INSERT INTO yourtable VALUES (1, 2), (5, 5);
This tip is of course valid for inserts into any table type, not just InnoDB.
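The arithmetic behind tip 2 can be made concrete with a short Python sketch. It assumes that at most one log flush completes per disk revolution, which is the simplification used in the tip, and it counts only the time spent waiting for commits. The function names and the row counts are hypothetical.

```python
def max_commits_per_second(revolutions_per_second: float,
                           flushes_per_revolution: float = 1.0) -> float:
    """Upper bound on commits per second when each commit needs a log flush."""
    return revolutions_per_second * flushes_per_revolution

def seconds_for_inserts(rows: int, rows_per_commit: int,
                        commits_per_second: float) -> float:
    """Time spent on commit flushes alone when inserting the given rows."""
    commits = -(-rows // rows_per_commit)   # ceiling division
    return commits / commits_per_second

rate = max_commits_per_second(167)
print(seconds_for_inserts(100000, 1, rate))      # one commit per row
print(seconds_for_inserts(100000, 1000, rate))   # 1000 rows per commit
```

Grouping many modifications into one transaction divides the number of log flushes, and with it the time lost waiting for the disk.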
Starting from version 3.23.41 InnoDB includes the InnoDB Monitor, which prints information on the InnoDB internal state. When switched on, InnoDB Monitor makes the MySQL server print data to the standard output about once every 10 seconds. This data is useful in performance tuning.
The printed information includes data on:
You can start InnoDB Monitor through the following SQL command:
CREATE TABLE innodb_monitor(a int) type = innodb;
and stop it by
DROP TABLE innodb_monitor;
A sample output of the InnoDB Monitor:
================================
010809 18:45:06 INNODB MONITOR OUTPUT
================================
--------------------------
LOCKS HELD BY TRANSACTIONS
--------------------------
LOCK INFO:
Number of locks in the record hash table 1294
LOCKS FOR TRANSACTION ID 0 579342744
TABLE LOCK table test/mytable trx id 0 582333343 lock_mode IX
RECORD LOCKS space id 0 page no 12758 n bits 104 table test/mytable index PRIMARY trx id 0 582333343 lock_mode X
Record lock, heap no 2 PHYSICAL RECORD: n_fields 74; 1-byte offs FALSE; info bits 0
 0: len 4; hex 0001a801; asc ;; 1: len 6; hex 000022b5b39f; asc ";; 2: len 7; hex 000002001e03ec; asc ;; 3: len 4; hex 00000001;
...
-----------------------------------------------
CURRENT SEMAPHORES RESERVED AND SEMAPHORE WAITS
-----------------------------------------------
SYNC INFO:
Sorry, cannot give mutex list info in non-debug version!
Sorry, cannot give rw-lock list info in non-debug version!
-----------------------------------------------------
SYNC ARRAY INFO: reservation count 6041054, signal count 2913432
4a239430 waited for by thread 49627477 op. S-LOCK file NOT KNOWN line 0
Mutex 0 sp 5530989 r 62038708 sys 2155035; rws 0 8257574 8025336; rwx 0 1121090 1848344
-----------------------------------------------------
CURRENT PENDING FILE I/O'S
--------------------------
Pending normal aio reads:
Reserved slot, messages 40157658 4a4a40b8
Reserved slot, messages 40157658 4a477e28
...
Reserved slot, messages 40157658 4a4424a8
Reserved slot, messages 40157658 4a39ea38
Total of 36 reserved aio slots
Pending aio writes:
Total of 0 reserved aio slots
Pending insert buffer aio reads:
Total of 0 reserved aio slots
Pending log writes or reads:
Reserved slot, messages 40158c98 40157f98
Total of 1 reserved aio slots
Pending synchronous reads or writes:
Total of 0 reserved aio slots
-----------
BUFFER POOL
-----------
LRU list length 8034
Free list length 0
Flush list length 999
Buffer pool size in pages 8192
Pending reads 39
Pending writes: LRU 0, flush list 0, single page 0
Pages read 31383918, created 51310, written 2985115
----------------------------
END OF INNODB MONITOR OUTPUT
============================
010809 18:45:22 InnoDB starts purge
010809 18:45:22 InnoDB purged 0 pages
Some notes on the output:
Since InnoDB is a multiversioned database, it must keep information about old versions of rows in the tablespace. This information is stored in a data structure we call a rollback segment, after an analogous data structure in Oracle.
InnoDB internally adds two fields to each row stored in the database. A 6-byte field holds the identifier of the last transaction that inserted or updated the row. A deletion is also treated internally as an update, in which a special bit in the row is set to mark it as deleted. Each row also contains a 7-byte field called the roll pointer, which points to an undo log record written to the rollback segment. If the row was updated, the undo log record contains the information necessary to rebuild the content of the row as it was before the update.
InnoDB uses the information in the rollback segment to perform the undo operations needed in a transaction rollback. It also uses the information to build earlier versions of a row for a consistent read.
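The following is a minimal Python sketch of how a roll pointer chain can be followed to build an earlier version of a row for a consistent read. It is an illustration only, not InnoDB's implementation: the class and function names are hypothetical, and visibility is simplified to a plain comparison of transaction ids, which is an assumption made to keep the example short.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UndoRecord:
    """Previous state of a row, as it might be kept in the rollback segment."""
    old_value: str
    old_trx_id: int
    old_roll_ptr: Optional["UndoRecord"]


@dataclass
class Row:
    value: str
    trx_id: int                     # stands in for the 6-byte transaction id field
    roll_ptr: Optional[UndoRecord]  # stands in for the 7-byte roll pointer


def update(row: Row, new_value: str, trx_id: int) -> None:
    """Save the current version in an undo record, then overwrite the row."""
    row.roll_ptr = UndoRecord(row.value, row.trx_id, row.roll_ptr)
    row.value, row.trx_id = new_value, trx_id


def consistent_read(row: Row, snapshot_trx_id: int) -> Optional[str]:
    """Return the newest version written by a transaction id <= snapshot_trx_id.

    Simplifying assumption: a version is visible if its transaction id is not
    larger than the snapshot id.
    """
    value, trx_id, ptr = row.value, row.trx_id, row.roll_ptr
    while trx_id > snapshot_trx_id:
        if ptr is None:
            return None  # the row did not exist in this snapshot
        value, trx_id, ptr = ptr.old_value, ptr.old_trx_id, ptr.old_roll_ptr
    return value


row = Row("v1", trx_id=10, roll_ptr=None)
update(row, "v2", trx_id=20)
update(row, "v3", trx_id=30)
```

In this sketch a reader with snapshot id 25 walks one step back along the chain and sees "v2", while the row itself already holds "v3".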
Undo logs in the rollback segment are divided into insert and update undo logs. Insert undo logs are needed only in transaction rollback and can be discarded as soon as the transaction commits. Update undo logs are also used in consistent reads. They can be discarded only when no transaction remains for which InnoDB has assigned a snapshot that, in a consistent read, could need the information in the update undo log to build an earlier version of a database row.
You must remember to commit your transactions regularly. Otherwise InnoDB cannot discard data from the update undo logs, and the rollback segment may grow too big, filling up your tablespace.
The physical size of an undo log record in the rollback segment is typically smaller than the corresponding inserted or updated row. You can use this information to calculate the space needed for your rollback segment.
In our multiversioning scheme, a row is not physically removed from the database immediately when you delete it with an SQL statement. Only when InnoDB can discard the update undo log record written for the deletion can it also physically remove the corresponding row and its index records from the database. This removal operation is called a purge. It is quite fast, usually taking the same order of time as the SQL statement that did the deletion.
Every InnoDB table has a special index called the clustered index
where the data of the rows is stored. If you define a
If you do not define a primary key for your table, InnoDB will internally generate a clustered index where the rows are ordered by the row id InnoDB assigns to the rows in such a table. The row id is a 6-byte field which monotonically increases as new rows are inserted. Thus the rows ordered by the row id will be physically in the insertion order.
Accessing a row through the clustered index is fast, because the row data will be on the same page where the index search leads us. In many databases the data is traditionally stored on a different page from the index record. If a table is large, the clustered index architecture often saves a disk i/o when compared to the traditional solution.
The records in non-clustered indexes (we also call them secondary indexes) contain the primary key value for the row. InnoDB uses this primary key value to search for the row in the clustered index. Note that if the primary key is long, the secondary indexes will use more space.
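The two-step lookup described above can be sketched in Python as follows. This is a toy model under stated assumptions: a dict stands in for the clustered index, a sorted list of tuples stands in for the secondary index B-tree, and the table contents and the function name `find_by_city` are invented for illustration.

```python
import bisect

# Clustered index: primary key -> full row data.
clustered = {
    1: {"id": 1, "name": "alice", "city": "Helsinki"},
    2: {"id": 2, "name": "bob", "city": "Turku"},
    3: {"id": 3, "name": "carol", "city": "Helsinki"},
}

# Secondary index on city: each entry holds the indexed value plus the
# primary key of the row, kept in sorted order.
secondary_city = sorted((row["city"], pk) for pk, row in clustered.items())


def find_by_city(city):
    """Search the secondary index, then fetch each row via the clustered index."""
    rows = []
    # (city,) sorts before every (city, pk) entry, so this finds the first match.
    start = bisect.bisect_left(secondary_city, (city,))
    for key, pk in secondary_city[start:]:
        if key != city:
            break
        rows.append(clustered[pk])  # second lookup, by primary key
    return rows
```

Because every secondary index entry carries a copy of the primary key, a long primary key makes every such entry larger, which is the space cost noted above.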
All indexes in InnoDB are B-trees where the index records are stored in the leaf pages of the tree. The default size of an index page is 16 kB. When new records are inserted, InnoDB tries to leave 1 / 16 of the page free for future insertions and updates of the index records.
If index records are inserted in a sequential (ascending or descending) order, the resulting index pages will be about 15/16 full. If records are inserted in a random order, then the pages will be between 1/2 and 15/16 full. If the fill factor of an index page drops below 1/2, InnoDB will try to contract the index tree to free the page.
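As a rough worked example of these fill factors, the Python sketch below estimates how many 16 kB pages a given volume of index records would occupy. It ignores page headers and per-record overhead, which is an assumption made only to keep the arithmetic visible. The function name `pages_needed` is hypothetical.

```python
import math
from fractions import Fraction

PAGE_SIZE = 16 * 1024                  # default index page size in bytes
SEQUENTIAL_FILL = Fraction(15, 16)     # fill factor after sequential inserts
WORST_RANDOM_FILL = Fraction(1, 2)     # lower bound after random inserts

# Bytes usable per page when 1/16 is left free for future insertions.
usable_sequential = int(PAGE_SIZE * SEQUENTIAL_FILL)


def pages_needed(total_record_bytes, fill):
    """Estimate the page count for a given fill factor (headers ignored)."""
    return math.ceil(total_record_bytes / (PAGE_SIZE * fill))
```

For 1,536,000 bytes of index records this gives 100 pages at a 15/16 fill factor and 188 pages at a 1/2 fill factor.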
It is a common situation in a database application that the primary key is a unique identifier and new rows are inserted in the ascending order of the primary key. Thus the insertions to the clustered index do not require random reads from a disk.
On the other hand, secondary indexes are usually non-unique and insertions happen in a relatively random order into secondary indexes. This would cause a lot of random disk i/o's without a special mechanism used in InnoDB.
If an index record is to be inserted into a non-unique secondary index, InnoDB checks whether the secondary index page is already in the buffer pool. If it is, InnoDB inserts the record directly into the index page. If the index page is not found in the buffer pool, InnoDB inserts the record into a special insert buffer structure. The insert buffer is kept so small that it fits entirely in the buffer pool, and insertions into it can be made very fast.
The insert buffer is periodically merged into the secondary index trees in the database. Often several insertions to the same page of the index tree can be merged at once, which saves disk i/o's. It has been measured that the insert buffer can speed up insertions to a table by up to 15 times.
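The idea can be sketched in Python as below. This is a simplified model, not InnoDB code: the class name, the page numbers, and the way disk reads are counted are all assumptions chosen to show why merging saves i/o.

```python
from collections import defaultdict


class InsertBufferSketch:
    """Toy model: defer inserts to index pages that are not in memory."""

    def __init__(self, cached_pages):
        self.buffer_pool = set(cached_pages)  # page numbers currently in memory
        self.pending = defaultdict(list)      # page number -> buffered records
        self.pages = defaultdict(list)        # contents of the index pages
        self.disk_reads = 0

    def insert(self, page_no, record):
        if page_no in self.buffer_pool:
            self.pages[page_no].append(record)    # page in memory: insert directly
        else:
            self.pending[page_no].append(record)  # defer: no disk read needed now

    def merge(self):
        """Apply buffered records, reading each affected page only once."""
        for page_no, records in self.pending.items():
            self.disk_reads += 1
            self.pages[page_no].extend(records)
        self.pending.clear()


ib = InsertBufferSketch(cached_pages={1})
ib.insert(1, "a")   # page 1 is cached, goes straight to the page
ib.insert(7, "b")   # page 7 is not cached, buffered
ib.insert(7, "c")   # buffered for the same page
ib.insert(9, "d")   # buffered for another page
reads_before_merge = ib.disk_reads
ib.merge()
```

Three deferred insertions here cost two page reads at merge time instead of three, and none at insert time.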
If a database fits almost entirely in main memory, then the fastest way to perform queries on it is to use hash indexes. InnoDB has an automatic mechanism which monitors index searches made to the indexes defined for a table, and if InnoDB notices that queries could benefit from building of a hash index, such an index is automatically built.
Note, however, that the hash index is always built on an existing B-tree index on the table. InnoDB can build a hash index on a prefix of any length of the key defined for the B-tree, depending on what search pattern InnoDB observes on the B-tree index. A hash index can be partial: it is not required that the whole B-tree index is cached in the buffer pool. InnoDB will build hash indexes on demand for those pages of the index which are often accessed.
In a sense, through the adaptive hash index mechanism InnoDB adapts itself to ample main memory, coming closer to the architecture of main memory databases.
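A minimal Python sketch of the adaptive idea follows. It assumes a sorted list as a stand-in for the B-tree and a hypothetical threshold of three searches before a key is added to the hash index. The real heuristics InnoDB uses are not described in this section, so the threshold and all names here are illustrative only.

```python
import bisect


class AdaptiveIndexSketch:
    """Toy model: build a partial hash index over frequently searched keys."""

    BUILD_THRESHOLD = 3  # hypothetical number of searches before hashing a key

    def __init__(self, items):
        self.entries = sorted(items.items())      # stands in for the B-tree
        self.keys = [k for k, _ in self.entries]
        self.hits = {}                            # per-key search counter
        self.hash_index = {}                      # partial: hot keys only
        self.btree_searches = 0

    def lookup(self, key):
        if key in self.hash_index:                # fast path
            return self.hash_index[key]
        self.btree_searches += 1                  # slow path: tree search
        i = bisect.bisect_left(self.keys, key)
        if i == len(self.keys) or self.keys[i] != key:
            return None
        value = self.entries[i][1]
        self.hits[key] = self.hits.get(key, 0) + 1
        if self.hits[key] >= self.BUILD_THRESHOLD:
            self.hash_index[key] = value          # built on demand
        return value


idx = AdaptiveIndexSketch({"a": 1, "b": 2, "c": 3})
results = [idx.lookup("b") for _ in range(5)]
```

After the third search for "b" the later lookups are served from the hash index, while keys that are rarely searched never enter it.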
After a database startup, when a user first does an insert to a
InnoDB follows the same procedure in initializing the auto-increment counter for a freshly created table.
Note that if the user specifies the value 0 for the auto-increment column in an insert, InnoDB treats the row as if the value had not been specified.
After the auto-increment counter has been initialized, if a user inserts a row with an explicitly specified column value that is bigger than the current counter value, the counter is set to the specified column value. If the user does not explicitly specify a value, InnoDB increments the counter by one and assigns its new value to the column.
The auto-increment mechanism, when assigning values from the counter, bypasses locking and transaction handling. Therefore you may also get gaps in the number sequence if you roll back transactions which have got numbers from the counter.
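The counter rules above can be summarized in a short Python sketch. The class and method names are hypothetical, and the starting value passed to the constructor is an assumption, since the initialization step is not modeled here.

```python
class AutoIncCounterSketch:
    """Toy model of the auto-increment counter rules described above."""

    def __init__(self, current_max=0):
        # Assumed starting point; initialization itself is not modeled.
        self.value = current_max

    def next_for_insert(self, supplied=None):
        """Return the column value stored for one inserted row."""
        if supplied is None or supplied == 0:
            # No value given (0 counts as "not specified"): use the counter.
            self.value += 1
            return self.value
        # Explicit value: raise the counter only if the value is bigger.
        # Negative values are not handled; their behavior is undefined.
        if supplied > self.value:
            self.value = supplied
        return supplied


counter = AutoIncCounterSketch(current_max=5)
a = counter.next_for_insert()      # 6
b = counter.next_for_insert(0)     # 7, since 0 is treated as unspecified
c = counter.next_for_insert(100)   # 100, counter jumps to 100
d = counter.next_for_insert(50)    # 50, counter stays at 100
e = counter.next_for_insert()      # 101
```

The counter is never decremented in this sketch, so if the transaction that received the value 6 were rolled back, 6 would simply be missing from the sequence.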
The behavior of auto-increment is not defined if a user gives a negative value to the column or if the value becomes bigger than the maximum integer that can be stored in the specified integer type.
In disk i/o InnoDB uses asynchronous i/o. On Windows NT it uses the native asynchronous i/o provided by the operating system. On Unix, InnoDB uses simulated asynchronous i/o built into InnoDB: InnoDB creates a number of i/o threads to take care of i/o operations, such as read-ahead. In a future version we will add support for simulated aio on Windows NT and native aio on those versions of Unix which have one.
On Windows NT InnoDB uses non-buffered i/o. That means that the disk pages InnoDB reads or writes are not buffered in the operating system file cache. This saves some memory bandwidth.
Starting from 3.23.41 InnoDB uses a novel file flush technique called doublewrite. It adds safety to crash recovery after an operating system crash or a power outage, and improves performance on most Unix flavors by reducing the need for fsync operations.
Doublewrite means that before writing pages to a data file, InnoDB first writes them to a contiguous tablespace area called the doublewrite buffer. Only after the write and the flush to the doublewrite buffer have completed does InnoDB write the pages to their proper positions in the data file. If the operating system crashes in the middle of a page write, InnoDB will find a good copy of the page in the doublewrite buffer during recovery.
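The write ordering can be sketched in Python as follows. This is a toy model under several assumptions: dicts stand in for the doublewrite buffer and the data file, a crash is simulated by writing only half of one page, and a SHA-256 digest is used to detect the torn page. How InnoDB actually detects a damaged page is not stated in this section, and all names are hypothetical.

```python
import hashlib


def digest(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


class DoublewriteSketch:
    """Toy model: pages go to the doublewrite area first, then to the data file."""

    def __init__(self):
        self.doublewrite = {}  # page number -> (data, digest)
        self.data_file = {}    # page number -> (data, digest)

    def flush(self, pages, crash_during=None):
        # Step 1: write every page to the doublewrite area (and flush it).
        self.doublewrite = {no: (data, digest(data)) for no, data in pages.items()}
        # Step 2: only then write the pages to their proper positions.
        for no, data in pages.items():
            if no == crash_during:
                # Simulated crash: only half of the page reaches the data file.
                self.data_file[no] = (data[: len(data) // 2], digest(data))
                return
            self.data_file[no] = (data, digest(data))

    def recover(self):
        """Replace missing or torn pages with the copy in the doublewrite area."""
        for no, (data, good_digest) in self.doublewrite.items():
            stored = self.data_file.get(no)
            if stored is None or digest(stored[0]) != stored[1]:
                self.data_file[no] = (data, good_digest)


dw = DoublewriteSketch()
dw.flush({1: b"A" * 16, 2: b"B" * 16}, crash_during=2)
torn_page = dw.data_file[2][0]   # half-written page left by the crash
dw.recover()
```

The key property is the ordering: because step 1 completes before step 2 begins, a crash during step 2 always leaves an intact copy to restore from.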
Starting from 3.23.41
you can also use a raw disk partition as a data file, though this has
not been tested yet. When you create a new data file you have
to put the keyword
When you start the database again you MUST change the keyword
Using a raw disk you can on some Unixes perform non-buffered i/o.
There are two read-ahead heuristics in InnoDB: sequential read-ahead and random read-ahead. In sequential read-ahead InnoDB notices that the access pattern to a segment in the tablespace is sequential. Then InnoDB will post in advance a batch of reads of database pages to the i/o system. In random read-ahead InnoDB notices that some area in a tablespace seems to be in the process of being fully read into the buffer pool. Then InnoDB posts the remaining reads to the i/o system.
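The sequential heuristic can be illustrated with the Python sketch below. The trigger length and batch size are hypothetical parameters, since this section does not give the thresholds InnoDB uses, and the model tracks a single run of page numbers rather than segments.

```python
class SequentialReadAheadSketch:
    """Toy model: after a run of consecutive page reads, prefetch a batch."""

    def __init__(self, trigger=4, batch=8):
        self.trigger = trigger   # hypothetical run length that triggers read-ahead
        self.batch = batch       # hypothetical number of pages to post in advance
        self.last_page = None
        self.run = 0

    def access(self, page_no):
        """Record one page read and return the page numbers to prefetch."""
        if self.last_page is not None and page_no == self.last_page + 1:
            self.run += 1
        else:
            self.run = 1         # pattern broken: start a new run
        self.last_page = page_no
        if self.run >= self.trigger:
            self.run = 0
            return list(range(page_no + 1, page_no + 1 + self.batch))
        return []


ra = SequentialReadAheadSketch(trigger=3, batch=4)
sequential = [ra.access(p) for p in (10, 11, 12)]

ra_random = SequentialReadAheadSketch(trigger=3, batch=4)
scattered = [ra_random.access(p) for p in (5, 90, 7)]
```

A sequential scan triggers one batched request for the following pages, while scattered reads never do.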
The data files you define in the configuration file form the tablespace of InnoDB. The files are simply concatenated to form the tablespace; there is no striping in use. Currently you cannot directly control where the space is allocated for your tables, except by using the following fact: from a newly created tablespace InnoDB will allocate space starting from the low end.
The tablespace consists of database pages whose default size is 16 kB. The pages are grouped into extents of 64 consecutive pages. The 'files' inside a tablespace are called segments in InnoDB. The name of the rollback segment is somewhat misleading because it actually contains many segments in the tablespace.
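As a quick check of these sizes, the Python sketch below computes the extent size and the number of pages and extents in a data file. The 100 MB file size is an arbitrary example, and the function name is hypothetical.

```python
PAGE_SIZE = 16 * 1024        # default page size in bytes
PAGES_PER_EXTENT = 64        # consecutive pages per extent

EXTENT_SIZE = PAGE_SIZE * PAGES_PER_EXTENT   # bytes per extent


def pages_and_extents(file_size_bytes):
    """Return (whole pages, whole extents) that fit in a data file."""
    pages = file_size_bytes // PAGE_SIZE
    return pages, pages // PAGES_PER_EXTENT


example = pages_and_extents(100 * 1024 * 1024)   # a 100 MB data file
```

With the default page size an extent is exactly 1 MB, so a 100 MB data file holds 6400 pages in 100 extents.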
For each index in InnoDB we allocate two segments: one is for non-leaf nodes of the B-tree, the other is for the leaf nodes. The idea here is to achieve better sequentiality for the leaf nodes, which contain the data.
When a segment grows inside the tablespace, InnoDB allocates the first 32 pages to it individually. After that InnoDB starts to allocate whole extents to the segment. InnoDB can add to a large segment up to 4 extents at a time to ensure good sequentiality of data.
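The growth rule can be sketched in Python as follows. This is a simplification: it only splits a segment's size into individually allocated pages and whole extents. It does not model the allocation of up to 4 extents at a time, because the conditions for that are not given here. The function name is hypothetical.

```python
PAGES_PER_EXTENT = 64
INDIVIDUAL_PAGES = 32   # first pages of a segment are allocated one by one


def allocation_plan(pages_needed):
    """Return (individually allocated pages, whole extents) for a segment."""
    individual = min(pages_needed, INDIVIDUAL_PAGES)
    remaining = pages_needed - individual
    extents = -(-remaining // PAGES_PER_EXTENT)   # ceiling division
    return individual, extents
```

For example, a segment needing 100 pages gets 32 individual pages plus 2 whole extents, so some of the second extent stays unused until the segment grows further.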
Some pages in the tablespace contain bitmaps of other pages, and therefore a few extents in an InnoDB tablespace cannot be allocated to segments as a whole, but only as individual pages.
When you issue a query
When you delete data from a table, InnoDB will contract the corresponding B-tree indexes. It depends on the pattern of deletes if that frees individual pages or extents to the tablespace, so that the freed space is available for other users. Dropping a table or deleting all rows from it is guaranteed to release the space to other users, but remember that deleted rows can be physically removed only in a purge operation after they are no longer needed in transaction rollback or consistent read.
If there are random insertions or deletions in the indexes of a table, the indexes may become fragmented. By fragmentation we mean that the physical ordering of the index pages on the disk is not close to the alphabetical ordering of the records on the pages, or that there are many unused pages in the 64-page blocks which were allocated to the index.
It can speed up index scans if you
If the insertions to an index are always ascending and records are deleted only from the end, then the file space management algorithm of InnoDB guarantees that fragmentation in the index will not occur.
The error handling in InnoDB is not always the same as specified in the ANSI SQL standards. According to the ANSI standard, any error during an SQL statement should cause the rollback of that statement. InnoDB sometimes rolls back only part of the statement. The following list specifies the error handling of InnoDB.
phone: 358-9-6969 3250 (office) 358-40-5617367 (mobile)
InnoDB Oy Inc.
World Trade Center Helsinki
Aleksanterinkatu 17
P.O.Box 800
00101 Helsinki
Finland