GNU Info

Info Node: (mysql.info)Fulltext Search

(mysql.info)Fulltext Search


Prev: Transactional Commands Up: Reference
Enter node , (file) or (file)node

MySQL Full-text Search
======================

Since Version 3.23.23, MySQL has support for full-text indexing and
searching.  Full-text indexes in MySQL are an index of type `FULLTEXT'.
`FULLTEXT' indexes can be created from `VARCHAR' and `TEXT' columns at
`CREATE TABLE' time or added later with `ALTER TABLE' or `CREATE
INDEX'.  For large datasets, adding `FULLTEXT' index with `ALTER TABLE'
(or `CREATE INDEX') would be much faster than inserting rows into the
empty table with a `FULLTEXT' index.

Full-text search is performed with the `MATCH' function.

     mysql> CREATE TABLE articles (
         ->   id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
         ->   title VARCHAR(200),
         ->   body TEXT,
         ->   FULLTEXT (title,body)
         -> );
     Query OK, 0 rows affected (0.00 sec)
     
     mysql> INSERT INTO articles VALUES
         -> (0,'MySQL Tutorial', 'DBMS stands for DataBase Management ...'),
         -> (0,'How To Use MySQL Efficiently', 'After you went through a ...'),
         -> (0,'Optimizing MySQL','In this tutorial we will show how to ...'),
         -> (0,'1001 MySQL Trick','1. Never run mysqld as root. 2. Normalize ...'),
         -> (0,'MySQL vs. YourSQL', 'In the following database comparison we ...'),
         -> (0,'MySQL Security', 'When configured properly, MySQL could be ...');
     Query OK, 5 rows affected (0.00 sec)
     Records: 5  Duplicates: 0  Warnings: 0
     
     mysql> SELECT * FROM articles WHERE MATCH (title,body) AGAINST ('database');
     +----+-------------------+---------------------------------------------+
     | id | title             | body                                        |
     +----+-------------------+---------------------------------------------+
     |  5 | MySQL vs. YourSQL | In the following database comparison we ... |
     |  1 | MySQL Tutorial    | DBMS stands for DataBase Management ...     |
     +----+-------------------+---------------------------------------------+
     2 rows in set (0.00 sec)

The function `MATCH' matches a natural language query `AGAINST' a text
collection (which is simply the set of columns covered by a `FULLTEXT'
index).  For every row in a table it returns relevance - a similarity
measure between the text in that row (in the columns that are part of
the collection) and the query.  When it is used in a `WHERE' clause
(see example above) the rows returned are automatically sorted with
relevance decreasing.  Relevance is a non-negative floating-point
number.  Zero relevance means no similarity.  Relevance is computed
based on the number of words in the row, the number of unique words in
that row, the total number of words in the collection, and the number
of documents (rows) that contain a particular word.

The above is a basic example of using `MATCH' function. Rows are
returned with relevance decreasing.

     mysql> SELECT id,MATCH (title,body) AGAINST ('Tutorial') FROM articles;
     +----+-----------------------------------------+
     | id | MATCH (title,body) AGAINST ('Tutorial') |
     +----+-----------------------------------------+
     |  1 |                        0.64840710366884 |
     |  2 |                                       0 |
     |  3 |                        0.66266459031789 |
     |  4 |                                       0 |
     |  5 |                                       0 |
     |  6 |                                       0 |
     +----+-----------------------------------------+
     5 rows in set (0.00 sec)

This example shows how to retrieve the relevances. As neither `WHERE'
nor `ORDER BY' clauses are present, returned rows are not ordered.

     mysql> SELECT id, body, MATCH (title,body) AGAINST (
         -> 'Security implications of running MySQL as root') AS score
         -> FROM articles WHERE MATCH (title,body) AGAINST
         -> ('Security implications of running MySQL as root');
     +----+-----------------------------------------------+-----------------+
     | id | body                                          | score           |
     +----+-----------------------------------------------+-----------------+
     |  4 | 1. Never run mysqld as root. 2. Normalize ... | 1.5055546709332 |
     |  6 | When configured properly, MySQL could be ...  |   1.31140957288 |
     +----+-----------------------------------------------+-----------------+
     2 rows in set (0.00 sec)

This is more complex example - the query returns the relevance and still
sorts the rows with relevance decreasing. To achieve it one should
specify `MATCH' twice. Note, that this will cause no additional
overhead, as MySQL optimizer will notice that these two `MATCH' calls
are identical and will call full-text search code only once.

MySQL uses a very simple parser to split text into words.  A "word" is
any sequence of letters, numbers, `'', and `_'.  Any "word" that is
present in the stopword list or just too short (3 characters or less)
is ignored.

Every correct word in the collection and in the query is weighted,
according to its significance in the query or collection.  This way, a
word that is present in many documents will have lower weight (and may
even have a zero weight), because it has lower semantic value in this
particular collection.  Otherwise, if the word is rare, it will receive
a higher weight.  The weights of the words are then combined to compute
the relevance of the row.

Such a technique works best with large collections (in fact, it was
carefully tuned this way).  For very small tables, word distribution
does not reflect adequately their semantical value, and this model may
sometimes produce bizarre results.

     mysql> SELECT * FROM articles WHERE MATCH (title,body) AGAINST ('MySQL');
     Empty set (0.00 sec)

Search for the word `MySQL' produces no results in the above example.
Word `MySQL' is present in more than half of rows, and as such, is
effectively treated as a stopword (that is, with semantical value zero).
It is, really, the desired behavior - a natural language query should
not return every second row in 1GB table.

A word that matches half of rows in a table is less likely to locate
relevant documents.  In fact, it will most likely find plenty of
irrelevant documents.  We all know this happens far too often when we
are trying to find something on the Internet with a search engine.  It
is with this reasoning that such rows have been assigned a low
semantical value in *this particular dataset*.

Fulltext restrictions
Fulltext restrictions
Fulltext Fine-tuning
Fine-tuning MySQL Full-text Search
Fulltext Features to Appear in MySQL 4.0
New Features of Full-text Search to Appear in MySQL 4.0
Fulltext TODO
Full-text Search TODO

automatically generated by info2www version 1.2.2.9