File: mysql.info,  Node: Tips,  Prev: Delete speed,  Up: Query Speed

Other Optimization Tips
-----------------------

Unsorted tips for faster systems:

   * Use persistent connections to the database to avoid the connection
     overhead. If you can't use persistent connections and you open
     many new connections to the database, you may want to change the
     value of the `thread_cache_size' variable. Note: Server
     parameters.
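
     For example, in the `[mysqld]' section of `my.cnf' (the value 40
     is only illustrative; tune it to your rate of new connections):

          set-variable = thread_cache_size=40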

   * Always check that all your queries really use the indexes you have
     created in the tables. In MySQL you can do this with the `EXPLAIN'
     command. Note: Explain.
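
     For example, to check whether a lookup on a hypothetical
     `customer' table uses its index:

          mysql> EXPLAIN SELECT * FROM customer WHERE name='Joe';

     If the `key' column in the output is `NULL', no index is used for
     the query.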

   * Try to avoid complex `SELECT' queries on tables that are updated a
     lot. This is to avoid problems with table locking.

   * The new `MyISAM' tables can insert rows into a table without
     deleted rows at the same time another thread is reading from it.
     If this is important for you, you should consider methods where
     you don't have to delete rows, or run `OPTIMIZE TABLE' after you
     have deleted a lot of rows.

   * Use `ALTER TABLE ... ORDER BY expr1, expr2, ...' if you mostly
     retrieve rows in expr1, expr2, ... order.  By using this option
     after big changes to the table, you may be able to get higher
     performance.
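
     For example, if a hypothetical `customer' table is almost always
     read in last name, first name order:

          mysql> ALTER TABLE customer ORDER BY last_name, first_name;

     The sort order is not maintained for rows inserted later, which
     is why it pays to rerun this after big changes.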

   * In some cases it may make sense to introduce a column that is
     'hashed' based on information from other columns. If this column
     is short and reasonably unique, it may be much faster than a big
     index on many columns. In MySQL it's very easy to use this extra
     column:

          SELECT * FROM tbl_name
          WHERE hash=MD5(CONCAT(col1,col2))
            AND col1='constant' AND col2='constant';
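
     A minimal sketch of such a table (names are illustrative; the
     application must keep `hash' current on every write):

          mysql> CREATE TABLE lookup
              ->   (col1 CHAR(100), col2 CHAR(100),
              ->    hash CHAR(32) NOT NULL, INDEX (hash));
          mysql> INSERT INTO lookup
              ->   SET col1='a', col2='b', hash=MD5(CONCAT('a','b'));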

   * For tables that change a lot you should try to avoid all `VARCHAR'
     or `BLOB' columns. You get dynamic row length as soon as you use
     even a single `VARCHAR' or `BLOB' column. Note: Table types.

   * It's not normally useful to split a table into different tables
     just because the rows get 'big'. To access a row, the biggest
     performance hit is the disk seek to find the first byte of the
     row. After finding the data, most new disks can read the whole row
     fast enough for most applications. The only cases where it really
     matters to split up a table are if it's a dynamic-row-size table
     (see above) that you can change to a fixed row size, or if you
     very often need to scan the table and don't need most of the
     columns. Note: Table types.

   * If you very often need to calculate things based on information
     from a lot of rows (like counts of things), it's probably much
     better to introduce a new table and update the counter in real
     time. An update of the type `UPDATE tbl_name SET count=count+1
     WHERE index_column=constant' is very fast!

     This is really important when you use databases like MySQL that
     only have table locking (multiple readers / single writer). This
     will also give better performance with most databases, as the row
     locking manager in this case will have less to do.
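
     A minimal sketch of such a counter table (names are illustrative):

          mysql> CREATE TABLE hit_count
              ->   (page_id INT NOT NULL PRIMARY KEY,
              ->    count INT NOT NULL);
          mysql> UPDATE hit_count SET count=count+1 WHERE page_id=7;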

   * If you need to collect statistics from big log tables, use summary
     tables instead of scanning the whole table. Maintaining the
     summaries should be much faster than trying to do statistics
     'live'. When things change (depending on business decisions), it's
     much faster to regenerate the summary tables from the logs than to
     change the running application!
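
     For example, a periodic job could rebuild a hypothetical
     `daily_hits' summary from a `log' table, so reports never scan
     `log' itself:

          mysql> DELETE FROM daily_hits;
          mysql> INSERT INTO daily_hits (log_date, hits)
              ->   SELECT log_date, COUNT(*) FROM log
              ->   GROUP BY log_date;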

   * If possible, one should classify reports as 'live' or
     'statistical', where the data needed for statistical reports is
     generated only from summary tables that are built from the actual
     data.

   * Take advantage of the fact that columns have default values. Insert
     values explicitly only when the value to be inserted differs from
     the default. This reduces the parsing that MySQL needs to do and
     improves the insert speed.
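
     For example, if a hypothetical `orders' table declares `status'
     with a suitable default, the first form is enough and parses
     faster than the second:

          mysql> INSERT INTO orders (customer_id) VALUES (7);
          mysql> INSERT INTO orders (customer_id, status)
              ->   VALUES (7, 'new');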

   * In some cases it's convenient to pack and store data into a blob.
     In this case you have to add some extra code in your application
     to pack/unpack things in the blob, but this may save a lot of
     accesses at some stage.  This is practical when you have data that
     doesn't conform to a static table structure.

   * Normally you should try to keep all data non-redundant (what is
     called 3rd normal form in database theory), but you should not be
     afraid of duplicating things or creating summary tables if you
     need these to gain more speed.

   * Stored procedures or UDFs (user-defined functions) may be a good
     way to get more performance.  In this case you should, however,
     always have a fallback that works some other (slower) way in case
     you use a database that doesn't support them.

   * You can always gain something by caching queries/answers in your
     application and trying to do many inserts/updates at the same
     time.  If your database supports table locks (as MySQL and
     Oracle do), this should help ensure that the index cache is
     flushed only once, after all the updates.

   * Use `INSERT /*! DELAYED */' when you do not need to know when your
     data is written. This speeds things up because many records can be
     written with a single disk write.

   * Use `INSERT /*! LOW_PRIORITY */' when you want your selects to be
     more important.

   * Use `SELECT /*! HIGH_PRIORITY */' to get selects that jump the
     queue. That is, the select is done even if there is somebody
     waiting to do a write.
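
     For example, all three hints use the `/*! ... */' comment syntax,
     so other SQL servers will simply ignore them (table and column
     names are illustrative):

          mysql> INSERT /*! DELAYED */ INTO log VALUES (NOW(),'hit');
          mysql> INSERT /*! LOW_PRIORITY */ INTO log
              ->   VALUES (NOW(),'hit');
          mysql> SELECT /*! HIGH_PRIORITY */ * FROM log WHERE id=1;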

   * Use the multi-line `INSERT' statement to store many rows with one
     SQL command (many SQL servers support this).
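
     For example, one statement inserting three rows into a
     hypothetical `customer' table:

          mysql> INSERT INTO customer (id, name)
              ->   VALUES (1,'Joe'), (2,'Ann'), (3,'Sam');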

   * Use `LOAD DATA INFILE' to load bigger amounts of data. This is
     faster than normal inserts and will be even faster when `myisamchk'
     is integrated in `mysqld'.
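
     For example, assuming a tab-separated text file on the server
     host:

          mysql> LOAD DATA INFILE '/tmp/customer.txt'
              ->   INTO TABLE customer;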

   * Use `AUTO_INCREMENT' columns to generate unique values.

   * Use `OPTIMIZE TABLE' once in a while to avoid fragmentation when
     using dynamic table format. Note: `OPTIMIZE TABLE'.

   * Use `HEAP' tables to get more speed when possible. Note: Table
     types.
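
     For example, a hypothetical lookup table kept entirely in memory
     (`HEAP' tables can't contain `BLOB'/`TEXT' columns):

          mysql> CREATE TABLE hot_lookup
              ->   (k INT NOT NULL PRIMARY KEY, v CHAR(32))
              ->   TYPE=HEAP;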

   * When using a normal Web server setup, images should be stored as
     files. That is, store only a file reference in the database.  The
     main reason for this is that a normal Web server is much better at
     caching files than database contents, so it's much easier to get a
     fast system if you are using files.

   * Use in-memory tables for non-critical data that is accessed often
     (like information about the last banner shown to users that don't
     have cookies).

   * Columns with identical information in different tables should be
     declared identical and have identical names. Before Version 3.23
     you got slow joins otherwise.

     Try to keep the names simple (use `name' instead of
     `customer_name' in the customer table). To make your names portable
     to other SQL servers you should keep them shorter than 18
     characters.

   * If you need REALLY high speed, you should take a look at the
     low-level interfaces for data storage that the different SQL
     servers support!  For example, by accessing MySQL's `MyISAM' files
     directly, you could get a speed increase of 2-5 times compared to
     using the SQL interface.  To be able to do this the data must be
     on the same server as the application, and usually it should only
     be accessed by one process (because external file locking is
     really slow).  One could eliminate the above problems by
     introducing low-level `MyISAM' commands in the MySQL server (this
     could be one easy way to get more performance if needed).  By
     carefully designing the database interface, it should be quite
     easy to support this type of optimization.

   * In many cases it's faster to access data from a database (using a
     live connection) than accessing a text file, just because the
     database is likely to be more compact than the text file (if you
     are using numerical data), and this will involve fewer disk
     accesses.  You will also save code because you don't have to parse
     your text files to find line and column boundaries.

   * You can also use replication to speed things up. Note:
     Replication.

   * Declaring a table with `DELAY_KEY_WRITE=1' will make the updating
     of indexes faster, as these are not logged to disk until the file
     is closed.  The downside is that you should run `myisamchk' on
     these tables before you start `mysqld' to ensure that they are
     okay if something killed `mysqld' in the middle.  As the key
     information can always be generated from the data, you should not
     lose anything by using `DELAY_KEY_WRITE'.
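
     A minimal sketch (table and column names are illustrative):

          mysql> CREATE TABLE batch_import
              ->   (id INT NOT NULL PRIMARY KEY, payload CHAR(200))
              ->   TYPE=MyISAM DELAY_KEY_WRITE=1;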

