Drawing on many posts from the MySQL Performance Blog, and on parts of High Performance MySQL, Second Edition, I have gained some new insights into how MySQL indexes are used and optimized.
The main points:
1) Take full advantage of clustered and covering indexes, and pay attention to the scenarios and conditions in which they apply.
2) The book covers Optimizing Sorts and Optimizing LIMIT and OFFSET in two separate sections, but they share a common idea.
InnoDB indexes
Clustered index - InnoDB tables are clustered; MyISAM tables are not
Clustered Index
Non-clustered index
non-clustered indexes (Secondary Indexes)
Other concepts
Composite index (multi-column index)
composite indexes
Multiple-Column Indexes
Covering index
Covering Indexes
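These concepts can be sketched in DDL. A minimal illustration, assuming a hypothetical profiles table (the table, column, and index names are made up for the example):

```sql
-- InnoDB stores rows in primary-key order: the PRIMARY KEY is the clustered index.
CREATE TABLE profiles (
    profile_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    sex        CHAR(1)      NOT NULL,
    rating     INT          NOT NULL,
    bio        TEXT,
    PRIMARY KEY (profile_id),          -- clustered index
    KEY idx_sex_rating (sex, rating)   -- composite (multi-column) secondary index
) ENGINE=InnoDB;

-- A query that touches only columns stored in idx_sex_rating
-- (InnoDB secondary indexes also carry the primary key) can be
-- answered from the index alone, so idx_sex_rating acts as a covering index:
SELECT profile_id, rating FROM profiles WHERE sex = 'M';
```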
How MySQL uses indexes efficiently
Using index to find rows
The main purpose of an index is to find rows quickly - without scanning the whole data set.
Using Index to Sort Data
Another great benefit of a BTREE index is that it allows retrieving data in sorted order, avoiding an external sort when executing queries that require sorting.
Using index to read data
Some storage engines (MyISAM and InnoDB included) can also use an index to read the data, avoiding reading the row data itself.
If MySQL is reading only the index and not accessing rows, you will see "Using index" in the EXPLAIN output.
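As a sketch of how this shows up in practice, assuming a hypothetical composite index idx_sex_rating on (sex, rating) of a profiles table:

```sql
-- Both selected columns live in the index, so MySQL never has to
-- touch the row data; EXPLAIN reports "Using index" in the Extra column.
EXPLAIN SELECT sex, rating FROM profiles WHERE sex = 'M';
```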
Making MySQL use the right index
When indexes exist, MySQL uses the "best" one, unless the optimizer decides that a table scan is more efficient.
Classically, the choice to scan depended on whether the "best" index would span more than roughly 30% of the table.
Other factors matter as well - table size, row count, and I/O block size - so a fixed percentage no longer decides between an index and a scan.
Sometimes MySQL does not use an index even when one is available. One such case is when the optimizer estimates that using the index would require accessing most of the rows in the table
(a table scan is then probably faster, because it needs fewer seeks). However, if such a query uses LIMIT to fetch only some of the rows, MySQL does use the index, because it can find those few rows and return them in the result faster.
OPTIMIZE
ANALYZE
USE/FORCE INDEX
IGNORE INDEX
Negate use of PRIMARY KEY
Make MySQL think the problem is harder than it really is
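A sketch of what these techniques look like in a query (table and index names are illustrative, carried over from the earlier examples):

```sql
-- Tell the optimizer which index to use, or which one to avoid:
SELECT * FROM profiles FORCE INDEX (idx_sex_rating) WHERE sex = 'M';
SELECT * FROM profiles IGNORE INDEX (idx_sex_rating) WHERE sex = 'M';

-- One way to negate use of the PRIMARY KEY: wrap the key column in a
-- trivial expression, so the optimizer can no longer use the index on it.
SELECT * FROM profiles WHERE profile_id + 0 = 12345;
```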
How MySQL optimizes sorting
In some cases, MySQL can use an index to satisfy an ORDER BY clause without doing a separate sort.
The index can be used even when the ORDER BY does not match it exactly, as long as all of the unused parts of the index and all of the extra ORDER BY columns are constants in the WHERE clause.
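A short illustration of this rule, assuming a composite index on (sex, rating):

```sql
-- sex, the unused leftmost part of the index, is held constant in WHERE,
-- so the index delivers rows already ordered by rating - no filesort:
SELECT * FROM profiles WHERE sex = 'M' ORDER BY rating;

-- Here sex is a range, not a constant, so the index order no longer
-- matches the ORDER BY and MySQL falls back to sorting:
SELECT * FROM profiles WHERE sex IN ('M', 'F') ORDER BY rating;
```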
Problems with indexes
They add overhead to write operations
A primary key that does not grow sequentially adds extra insert overhead
The ideal case
Only the clustered index exists
Queries look up rows by primary key only
Rows are inserted in increasing primary-key order
Use the smallest suitable data type
...
-> Sphinx
==============================================================
Optimizing Sorts
The last issue we want to cover in this case study is sorting. Sorting small result sets
with filesorts is fast, but what if millions of rows match a query? For example, what if
only sex is specified in the WHERE clause?
We can add special indexes for sorting these low-selectivity cases. For example, an
index on (sex,rating) can be used for the following query:
mysql> SELECT <cols> FROM profiles WHERE sex='M' ORDER BY rating LIMIT 10;
This query has both ORDER BY and LIMIT clauses, and it would be very slow without
the index.
Even with the index, the query can be slow if the user interface is paginated and
someone requests a page that's not near the beginning. This case creates a bad
combination of ORDER BY and LIMIT with an offset:
mysql> SELECT <cols> FROM profiles WHERE sex='M' ORDER BY rating LIMIT 100000, 10;
Such queries can be a serious problem no matter how they're indexed, because the
high offset requires them to spend most of their time scanning a lot of data that they
will then throw away. Denormalizing, precomputing, and caching are likely to be the
only strategies that work for queries like this one. An even better strategy is to limit
the number of pages you let the user view. This is unlikely to impact the user's
experience, because no one really cares about the 10,000th page of search results.
Another good strategy for optimizing such queries is to use a covering index to
retrieve just the primary key columns of the rows you'll eventually retrieve. You can
then join this back to the table to retrieve all desired columns. This helps minimize
the amount of work MySQL must do gathering data that it will only throw away.
Here's an example that requires an index on (sex, rating) to work efficiently:
mysql> SELECT <cols> FROM profiles INNER JOIN (
-> SELECT <primary key cols> FROM profiles
    ->     WHERE sex='M' ORDER BY rating LIMIT 100000, 10
-> ) AS x USING(<primary key cols>);
Optimizing LIMIT and OFFSET
Queries with LIMITs and OFFSETs are common in systems that do pagination, nearly
always in conjunction with an ORDER BY clause. It's helpful to have an index that sup-
ports the ordering; otherwise, the server has to do a lot of filesorts.
A frequent problem is having a high value for the offset. If your query looks like
LIMIT 10000, 20, it is generating 10,020 rows and throwing away the first 10,000 of
them, which is very expensive. Assuming all pages are accessed with equal fre-
quency, such queries scan half the table on average. To optimize them, you can
either limit how many pages are permitted in a pagination view, or try to make the
high offsets more efficient.
One simple technique to improve efficiency is to do the offset on a covering index,
rather than the full rows. You can then join the result to the full row and retrieve the
additional columns you need. This can be much more efficient. Consider the
following query:
mysql> SELECT film_id, description FROM sakila.film ORDER BY title LIMIT 50, 5;
If the table is very large, this query is better written as follows:
mysql> SELECT film.film_id, film.description
-> FROM sakila.film
-> INNER JOIN (
-> SELECT film_id FROM sakila.film
-> ORDER BY title LIMIT 50, 5
-> ) AS lim USING(film_id);
This works because it lets the server examine as little data as possible in an index
without accessing rows, and then, once the desired rows are found, join them against
the full table to retrieve the other columns from the row. A similar technique applies
to joins with LIMIT clauses.
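As a sketch of that last remark, the same deferred-join trick carries over when the paginated query itself joins other tables; assuming sakila's film_actor and actor tables:

```sql
-- Do the expensive ORDER BY ... LIMIT on film alone, then join the few
-- surviving film_ids back to the other tables:
SELECT film.film_id, film.title, actor.last_name
FROM (
    SELECT film_id FROM sakila.film ORDER BY title LIMIT 50, 5
) AS lim
INNER JOIN sakila.film USING(film_id)
INNER JOIN sakila.film_actor USING(film_id)
INNER JOIN sakila.actor USING(actor_id);
```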
==============================================================
--EOF--