MySQL Query Optimization for Tens of Millions of Rows
First, a table in a MySQL database holds more than 10 million rows. How do you quickly fetch the 100 rows starting after the 9 millionth?
How would you write that query? Most people immediately come up with a statement like this:
select * from table limit 9000000,100;
Let's run this SQL and see how it does.
Fetching 100 rows took 7.063s. Can that be called a fast query? Hardly anyone would accept that speed!
First, let me demonstrate big-data paging. The test table holds more than 10 million rows, and we page through it with LIMIT:
select * from test limit 0,100;
select * from test limit 1000,100;
select * from test limit 10000,100;
select * from test limit 100000,100;
select * from test limit 500000,100;
select * from test limit 1000000,100;   -- 0.823s
select * from test limit 5000000,100;   -- 3.909s
select * from test limit 10000000,100;
We can see a clear pattern: the larger the offset, the slower the query. That leads to two conclusions:
1. The query time of a LIMIT statement grows in proportion to the offset of the starting row.
2. MySQL's LIMIT is very convenient, but it is not suitable to use directly with large offsets on large tables.
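For reference, the test table is InnoDB with an auto-increment primary key; its shape might be sketched like this (every column except id is a hypothetical stand-in, since the article does not show the real schema):

```sql
-- Hypothetical shape of the test table: InnoDB, auto-increment primary key.
CREATE TABLE test (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  payload VARCHAR(255) NOT NULL DEFAULT '',  -- stand-in for the real columns
  PRIMARY KEY (id)
) ENGINE = InnoDB;
```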
Optimizing LIMIT paging performance for large data volumes
When it comes to query optimization, the first thing that comes to mind is indexes. If a query's conditions and selected columns are all contained in an index, the query is very fast: index search has its own optimized algorithms, and the needed data sits right in the index, so there is no extra lookup of row addresses, which saves a lot of time. In addition, MySQL keeps an index cache, which helps further under high concurrency.
My test table uses InnoDB as the storage engine, with id as an auto-increment primary key, which is the primary key index by default. Let's try a covering index query and see how it does:
SELECT id FROM test LIMIT 9000000,100;
The total time is 4.256s, much less than the 7.063s. From here there are two optimization schemes: use the id as a query condition, implemented either with a subquery or with a join.
1. The id >= (subquery) form:
select * from test where id >= (select id from test limit 9000000,1) limit 0,100;
It takes 4.262s.
2. The join form:
SELECT * FROM test a JOIN (SELECT id FROM test LIMIT 9000000,100) b ON a.id = b.id
It takes 4.251s. The two optimized queries take nearly the same time because they rely on the same principle: use the primary key index to locate the starting id first, then fetch the full rows. My personal advice is to prefer the join and minimize subqueries. Note: the tests above are at a ten-million-row offset; at the million level the query is much faster. I tested the following statement myself and it took 0.410s:
SELECT * FROM test a JOIN (SELECT id FROM test LIMIT 1000000,100) b ON a.id = b.id
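Both rewrites work for jumping to an arbitrary page. When pages are read in order, there is an even cheaper pattern, commonly called keyset (seek) pagination: the client remembers the last id of the previous page and starts after it, so no offset has to be skipped at all. A sketch, where 9000000 stands in for a hypothetical last-seen id:

```sql
-- First page:
SELECT * FROM test ORDER BY id LIMIT 100;

-- Each following page: start right after the last id the client saw.
SELECT * FROM test WHERE id > 9000000 ORDER BY id LIMIT 100;
```

This only applies to sequential paging, since there is no way to jump straight to page N without knowing its starting id.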
Second, which MySQL storage engines have you used, and what are their characteristics and differences?
This question is frequently asked of senior developers in interviews. We also run into it in everyday development: when creating a table with a tool such as SQLyog, there is an Engine option to choose from, as shown below:
MySQL has many storage engines, but in practice InnoDB and MyISAM are by far the most used. So if an interviewer asks which MySQL storage engines you know, you only need to cover these two. What are their characteristics and differences?
MyISAM: the traditional table type, based on the older ISAM format (ISAM stands for Indexed Sequential Access Method, a standard way of storing records and files). It is not transaction-safe and does not support foreign keys. If your workload is mostly SELECTs and INSERTs, MyISAM is suitable.
InnoDB: a transaction-safe engine. Its biggest features are support for foreign keys, row-level locks, and transactions. If you have a large number of updates and inserts, InnoDB is recommended, especially under high concurrency and high QPS. Note: before MySQL 5.5, the default storage engine was MyISAM; from MySQL 5.5 onward, the default was changed to InnoDB.
The differences between MyISAM and InnoDB:
1. InnoDB supports transactions; MyISAM does not. In InnoDB, every SQL statement is wrapped in a transaction and auto-committed by default, which costs performance, so it is best to group multiple statements between begin and commit as a single transaction.
2. InnoDB supports foreign keys; MyISAM does not.
3. InnoDB uses a clustered index, with a B+Tree as the index structure. The data file is tied to the (primary key) index; the table's data file is itself an index structure organized as a B+Tree. A primary key is required, and lookups through the primary key index are very efficient. MyISAM uses non-clustered indexes, also B+Trees, but the index and data files are separate: the index stores pointers into the data file, and the primary key index and secondary indexes are independent of each other.
4. InnoDB does not store the table's exact row count, so select count(*) from table requires a full table scan. MyISAM keeps the row count of the whole table in a variable, so executing the same statement only reads that variable, which is very fast.
5. MyISAM supports full-text indexes, so full-text query efficiency is higher on MyISAM; InnoDB did not support full-text indexes until MySQL 5.6.
6. InnoDB supports both table-level and row-level locks (row-level by default); MyISAM supports only table-level locks.
7. An InnoDB table must have a primary key (if the user does not specify one, InnoDB finds or generates one); a MyISAM table can do without.
8. InnoDB stores a table as .frm and .ibd files; MyISAM uses .frm, .MYD, and .MYI files.
InnoDB: .frm is the table definition file, .ibd is the data file.
MyISAM: .frm is the table definition file, .MYD is the data file, .MYI is the index file.
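The engine can be inspected and set explicitly in SQL as well as in graphical tools; a minimal sketch (the table name t_demo is hypothetical):

```sql
-- List the storage engines the server supports, and which is the default.
SHOW ENGINES;

-- Choose the engine explicitly when creating a table.
CREATE TABLE t_demo (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (id)
) ENGINE = InnoDB;

-- Check which engine an existing table uses.
SHOW TABLE STATUS LIKE 't_demo';

-- Convert a table to another engine (this rebuilds the table).
ALTER TABLE t_demo ENGINE = MyISAM;
```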
Third, how do you optimize complex MySQL query statements?
When it comes to complex SQL optimization, most cases involve large, complicated statements produced by multi-table joins. So how do we actually optimize such SQL? Optimization is really a routine: just follow it. A solution for complex SQL optimization:
1. Inspect the SQL with the EXPLAIN keyword. EXPLAIN helps you analyze the performance bottlenecks of a query or a table structure: its output tells you how your indexes and primary key are used, how the table is scanned and sorted, and whether a full table scan occurs.
2. Make query conditions use indexed fields wherever possible. If a table is queried with multiple conditions, try a composite index, and pay attention to the order of the columns in it.
3. For multi-table associations, prefer joins and reduce subqueries. Join on the primary key where you can, and in general on indexed fields; if a join field is not indexed, consider adding an index.
4. Use LIMIT to fetch results page by page in batches; do not retrieve everything at once.
5. Absolutely avoid select *; select only the specific fields you need and cut queries for unnecessary columns.
6. Where possible, convert OR into UNION ALL.
7. Try to avoid IS NULL and IS NOT NULL.
8. Be careful with LIKE: a leading wildcard or wildcards on both sides will not use an index.
9. Minimize the use of functions on the fields after WHERE, because applying a function to a column prevents the index from being used.
10. Avoid not-equal (!=) comparisons, because they will not use an index.
11. Use EXISTS instead of IN and NOT EXISTS instead of NOT IN; efficiency is usually better.
12. Avoid the HAVING clause. HAVING filters the result set only after all records have been retrieved, which involves sorting, totaling, and so on. If the WHERE clause can limit the number of records instead, that overhead is reduced.
13. Do not use ORDER BY RAND().
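A few of the points above can be illustrated concretely. The table and index names below are hypothetical, chosen only for the sketch:

```sql
-- Point 2: composite index, where column order matters (leftmost prefix).
CREATE INDEX idx_status_created ON orders (status, created_at);

-- Can use the index (the leftmost column status is present):
SELECT id FROM orders WHERE status = 1 AND created_at > '2024-01-01';
-- Cannot use the index (skips the leftmost column):
SELECT id FROM orders WHERE created_at > '2024-01-01';

-- Point 8: LIKE, where only a trailing wildcard can use an index on name.
SELECT id FROM users WHERE name LIKE 'zhang%';   -- index range scan
SELECT id FROM users WHERE name LIKE '%zhang';   -- full scan

-- Point 6: rewrite OR as UNION ALL so each branch can use its own index.
SELECT id FROM orders WHERE status = 1
UNION ALL
SELECT id FROM orders WHERE user_id = 42 AND status <> 1;
```

Prefixing each candidate statement with EXPLAIN (point 1) is the way to confirm whether the index is actually used on your own data.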
Next, I will keep summarizing interview questions and sharing them with everyone. If you find the content useful, please follow this official account; I will push practical material to you from time to time.