1.确认已安装
很多人动不动就想到要用分布式数据库,其实单库的性能随着MYSQL版本的提升越来越高,单表1000w数据应用优化的好的话用单库是完全没问题的。一个更优的做法是做表分区。也就是mysql partition。本文大量参考官网和。
这里以mysql5.7.15为例。确认有partition插件。
1 2
| mysql> SHOW PLUGINS; partition | ACTIVE | STORAGE ENGINE | NULL | GPL
|
有partition就ok。
另外5.7有所不同的是mysql_install_db以后,会生成一个随便密码,在~/.mysql_secret,之后用
1 2 3
| SET PASSWORD FOR 'root'@'localhost' = PASSWORD('new_password'); 或者: ALTER USER user IDENTIFIED BY 'new_password';
|
重置密码,然后就可以开始使用了。
5.7给我的感觉很不错。repo源很快,mysql被oracle买了以后运作的很规范,文档也是全的不行。
2.分区的类型
介绍常用的4中分区(当然不止这4个):
- RANGE分区
- LIST分区
- HASH分区
- KEY分区
3.RANGE分区
很好理解,就是安装区域来分。
这里先看不用分区,
1 2 3
| create table t1( id INT )engine=innodb
|
默认生成一个ibd文件。
1 2 3 4 5 6
| create table t2( id INT )engine=innodb partition by RANGE(id)( partition p0 values less than (10), partition p1 values less than (20));
|
t2用了分区,可以看到p0,p1分区信息的ibd文件。
1
| t2.frm t2#P#p0.ibd t2#P#p1.ibd
|
再插入几条数据试试
1 2 3
| insert into t2 select 9; insert into t2 select 10; insert into t2 select 15;
|
可以通过information_schema.partitions中的有相关分区信息:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
| mysql> select * from information_schema.partitions where table_name='t2'\G *************************** 1. row *************************** TABLE_CATALOG: def TABLE_SCHEMA: test TABLE_NAME: t2 PARTITION_NAME: p0 SUBPARTITION_NAME: NULL PARTITION_ORDINAL_POSITION: 1 SUBPARTITION_ORDINAL_POSITION: NULL PARTITION_METHOD: RANGE SUBPARTITION_METHOD: NULL PARTITION_EXPRESSION: id SUBPARTITION_EXPRESSION: NULL PARTITION_DESCRIPTION: 10 TABLE_ROWS: 1 AVG_ROW_LENGTH: 16384 DATA_LENGTH: 16384 MAX_DATA_LENGTH: NULL INDEX_LENGTH: 0 DATA_FREE: 0 CREATE_TIME: 2016-09-26 15:34:03 UPDATE_TIME: 2016-09-26 15:34:52 CHECK_TIME: NULL CHECKSUM: NULL PARTITION_COMMENT: NODEGROUP: default TABLESPACE_NAME: NULL *************************** 2. row *************************** TABLE_CATALOG: def TABLE_SCHEMA: test TABLE_NAME: t2 PARTITION_NAME: p1 SUBPARTITION_NAME: NULL PARTITION_ORDINAL_POSITION: 2 SUBPARTITION_ORDINAL_POSITION: NULL PARTITION_METHOD: RANGE SUBPARTITION_METHOD: NULL PARTITION_EXPRESSION: id SUBPARTITION_EXPRESSION: NULL PARTITION_DESCRIPTION: 20 TABLE_ROWS: 2 AVG_ROW_LENGTH: 8192 DATA_LENGTH: 16384 MAX_DATA_LENGTH: NULL INDEX_LENGTH: 0 DATA_FREE: 0 CREATE_TIME: 2016-09-26 15:34:03 UPDATE_TIME: 2016-09-26 15:34:58 CHECK_TIME: NULL CHECKSUM: NULL PARTITION_COMMENT: NODEGROUP: default TABLESPACE_NAME: NULL 2 rows in set (0.01 sec)
|
但是这存在一个问题
1 2
| mysql> insert into t2 select 30; ERROR 1526 (HY000): Table has no partition for value 30
|
插入30时即会报错,分区range没有涵盖所以条件,当插入30这个值的时候MYSQL不知道插入到哪。我们可以增加一个小于MAXVALUE的分区来涵盖所以的区域:
1 2 3 4 5 6 7 8 9 10 11 12
| alter table t2 add partition (partition p2 values less than MAXVALUE); Query OK, 0 rows affected (0.06 sec) Records: 0 Duplicates: 0 Warnings: 0 mysql> insert into t2 select 30; Query OK, 1 row affected (0.01 sec) Records: 1 Duplicates: 0 Warnings: 0 mysql> insert into t2 select 40; Query OK, 1 row affected (0.00 sec) Records: 1 Duplicates: 0 Warnings: 0
|
再插入遍不会报错了。
range分区主要用于日期列的分区。
mysql里面data type 格式:
1 2 3 4 5 6
| Data Type “Zero” Value DATE '0000-00-00' TIME '00:00:00' DATETIME '0000-00-00 00:00:00' TIMESTAMP '0000-00-00 00:00:00' YEAR 0000
|
官网举了个例子
1 2 3 4 5 6 7 8 9 10 11 12 13 14
| CREATE TABLE members ( firstname VARCHAR(25) NOT NULL, lastname VARCHAR(25) NOT NULL, username VARCHAR(16) NOT NULL, email VARCHAR(35), joined DATE NOT NULL ) PARTITION BY RANGE( YEAR(joined) ) ( PARTITION p0 VALUES LESS THAN (1960), PARTITION p1 VALUES LESS THAN (1970), PARTITION p2 VALUES LESS THAN (1980), PARTITION p3 VALUES LESS THAN (1990), PARTITION p4 VALUES LESS THAN MAXVALUE );
|
按joined日期来的YEAR函数来分区,
把上面的表格简化一下,变成t3:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
| CREATE TABLE t3 ( firstname VARCHAR(25) NOT NULL, joined DATE NOT NULL ) PARTITION BY RANGE( YEAR(joined) ) ( PARTITION p0 VALUES LESS THAN (1960), PARTITION p1 VALUES LESS THAN (1970), PARTITION p2 VALUES LESS THAN (1980), PARTITION p3 VALUES LESS THAN (1990), PARTITION p4 VALUES LESS THAN MAXVALUE ); mysql> select * from t3; +-----------+------------+ | firstname | joined | +-----------+------------+ | aaa | 1955-01-01 | | bbb | 1965-01-01 | | ccc | 1966-02-02 | | ccc | 1967-03-03 | | ccc | 1977-03-03 | | ccc | 1987-03-03 | +-----------+------------+
|
然后做select查询
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
| mysql> select * from t3 where joined<'1969-12-30'; +-----------+------------+ | firstname | joined | +-----------+------------+ | aaa | 1955-01-01 | | bbb | 1965-01-01 | | ccc | 1966-02-02 | | ccc | 1967-03-03 | +-----------+------------+ 4 rows in set (0.00 sec) mysql> explain partitions select * from t3 where joined<'1969-12-30'; +----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+ | 1 | SIMPLE | t3 | p0,p1 | ALL | NULL | NULL | NULL | NULL | 4 | 33.33 | Using where | +----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+ 1 row in set, 2 warnings (0.00 sec)
|
year(‘1955-01-01’)返回1955,表格根据year做了分区,可以看到where子句只扫了p0,p1两个分区,提升了查询效率。这个叫partition pruning,官网地址 http://dev.mysql.com/doc/refman/5.7/en/partitioning-pruning.html
但并不是对所以的函数都能进行partition pruning,只是几个时间的函数,YEAR(),TO_DAYS(),TO_SECONDS(),UNIX_TIMESTAMP()这几个有效。
4.LIST分区
LIST分区跟RANGE特别类似,只是RANGE是连续的值,用less than 表达,LIST分区用VALUES IN(…)来表达,是离散的值。
1 2 3 4 5 6
| create table t4( a INT, b INT)engine=innodb partition by LIST(b)( partition p0 values in (1,3,5,7,9), partition p1 values in (2,4,6,8,0));
|
这时候你select * from t4 where b=1;那么只会扫p1分区。
5.HASH分区
hash目的将数据均匀分布到预设的分区中,保证各分区数据量大致一样。HASH分区也和之前的其他分区一样,需要制定分区的函数,以及HASH分区的个数。如下:
1 2 3 4 5
| create table t_hash( a INT, b DATETIME) engine=innodb partition by HASH (YEAR(b)) partitions 4;
|
上面根据YEAR的结果进行hash分区,不是很科学,最好根据自增字段进行分区,如下:
1 2 3 4 5 6
| create table t_hash2( a INT AUTO_INCREMENT, b DATETIME, PRIMARY KEY (a)) engine=innodb partition by HASH(a) partitions 4;
|
6.KEY分区
KEY分区跟HASH分区类似,不过HASH分区根据用户提供的函数进行分区,而KEY分区则使用MYSQL内部的散列函数进行分区。
1 2 3 4 5
| create table t_key( a INT, b DATETIME) engine=innodb partition by KEY(b) partitions 4;
|