1.确认已安装
很多人动不动就想到要用分布式数据库,其实单库的性能随着MYSQL版本的提升越来越高,单表1000w数据应用优化的好的话用单库是完全没问题的。一个更优的做法是做表分区。也就是mysql partition。本文大量参考官网和。
这里以mysql5.7.15为例。确认有partition插件。
1 2 
  | mysql> SHOW PLUGINS; partition                  | ACTIVE   | STORAGE ENGINE     | NULL    | GPL 
  | 
 
有partition就ok。
另外5.7有所不同的是mysql_install_db以后,会生成一个随便密码,在~/.mysql_secret,之后用
1 2 3 
  | SET PASSWORD FOR 'root'@'localhost' = PASSWORD('new_password'); 或者: ALTER USER user IDENTIFIED BY 'new_password'; 
  | 
 
重置密码,然后就可以开始使用了。
5.7给我的感觉很不错。repo源很快,mysql被oracle买了以后运作的很规范,文档也是全的不行。
2.分区的类型
介绍常用的4中分区(当然不止这4个):
- RANGE分区
 
- LIST分区
 
- HASH分区
 
- KEY分区
 
3.RANGE分区
很好理解,就是安装区域来分。
这里先看不用分区,
1 2 3 
  | create table t1( id INT )engine=innodb 
  | 
 
默认生成一个ibd文件。
1 2 3 4 5 6 
  | create table t2( id INT )engine=innodb partition by RANGE(id)( partition p0 values less than (10), partition p1 values less than (20)); 
  | 
 
t2用了分区,可以看到p0,p1分区信息的ibd文件。
1 
  | t2.frm  t2#P#p0.ibd  t2#P#p1.ibd 
  | 
 
再插入几条数据试试
1 2 3 
  | insert into t2 select 9; insert into t2 select 10; insert into t2 select 15; 
  | 
 
可以通过information_schema.partitions中的有相关分区信息:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 
  | mysql> select * from information_schema.partitions where table_name='t2'\G *************************** 1. row ***************************                 TABLE_CATALOG: def                  TABLE_SCHEMA: test                    TABLE_NAME: t2                PARTITION_NAME: p0             SUBPARTITION_NAME: NULL    PARTITION_ORDINAL_POSITION: 1 SUBPARTITION_ORDINAL_POSITION: NULL              PARTITION_METHOD: RANGE           SUBPARTITION_METHOD: NULL          PARTITION_EXPRESSION: id       SUBPARTITION_EXPRESSION: NULL         PARTITION_DESCRIPTION: 10                    TABLE_ROWS: 1                AVG_ROW_LENGTH: 16384                   DATA_LENGTH: 16384               MAX_DATA_LENGTH: NULL                  INDEX_LENGTH: 0                     DATA_FREE: 0                   CREATE_TIME: 2016-09-26 15:34:03                   UPDATE_TIME: 2016-09-26 15:34:52                    CHECK_TIME: NULL                      CHECKSUM: NULL             PARTITION_COMMENT:                     NODEGROUP: default               TABLESPACE_NAME: NULL *************************** 2. row ***************************                 TABLE_CATALOG: def                  TABLE_SCHEMA: test                    TABLE_NAME: t2                PARTITION_NAME: p1             SUBPARTITION_NAME: NULL    PARTITION_ORDINAL_POSITION: 2 SUBPARTITION_ORDINAL_POSITION: NULL              PARTITION_METHOD: RANGE           SUBPARTITION_METHOD: NULL          PARTITION_EXPRESSION: id       SUBPARTITION_EXPRESSION: NULL         PARTITION_DESCRIPTION: 20                    TABLE_ROWS: 2                AVG_ROW_LENGTH: 8192                   DATA_LENGTH: 16384               MAX_DATA_LENGTH: NULL                  INDEX_LENGTH: 0                     DATA_FREE: 0                   CREATE_TIME: 2016-09-26 15:34:03                   UPDATE_TIME: 2016-09-26 15:34:58                    CHECK_TIME: NULL                      CHECKSUM: NULL             PARTITION_COMMENT:                     NODEGROUP: default               TABLESPACE_NAME: NULL 2 rows in set (0.01 sec) 
  | 
 
但是这存在一个问题
1 2 
  | mysql> insert into t2 select 30; ERROR 1526 (HY000): Table has no partition for value 30 
  | 
 
插入30时即会报错,分区range没有涵盖所以条件,当插入30这个值的时候MYSQL不知道插入到哪。我们可以增加一个小于MAXVALUE的分区来涵盖所以的区域:
1 2 3 4 5 6 7 8 9 10 11 12 
  | alter table t2 add partition (partition p2 values less than MAXVALUE); Query OK, 0 rows affected (0.06 sec) Records: 0  Duplicates: 0  Warnings: 0 mysql> insert into t2 select 30; Query OK, 1 row affected (0.01 sec) Records: 1  Duplicates: 0  Warnings: 0 mysql> insert into t2 select 40; Query OK, 1 row affected (0.00 sec) Records: 1  Duplicates: 0  Warnings: 0 
  | 
 
再插入遍不会报错了。
range分区主要用于日期列的分区。
mysql里面data type 格式:
1 2 3 4 5 6 
  | Data Type	“Zero” Value DATE	'0000-00-00' TIME	'00:00:00' DATETIME	'0000-00-00 00:00:00' TIMESTAMP	'0000-00-00 00:00:00' YEAR	0000 
  | 
 
官网举了个例子
1 2 3 4 5 6 7 8 9 10 11 12 13 14 
  | CREATE TABLE members (     firstname VARCHAR(25) NOT NULL,     lastname VARCHAR(25) NOT NULL,     username VARCHAR(16) NOT NULL,     email VARCHAR(35),     joined DATE NOT NULL ) PARTITION BY RANGE( YEAR(joined) ) (     PARTITION p0 VALUES LESS THAN (1960),     PARTITION p1 VALUES LESS THAN (1970),     PARTITION p2 VALUES LESS THAN (1980),     PARTITION p3 VALUES LESS THAN (1990),     PARTITION p4 VALUES LESS THAN MAXVALUE ); 
  | 
 
按joined日期来的YEAR函数来分区,
把上面的表格简化一下,变成t3:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 
  | CREATE TABLE t3 (     firstname VARCHAR(25) NOT NULL,     joined DATE NOT NULL ) PARTITION BY RANGE( YEAR(joined) ) (     PARTITION p0 VALUES LESS THAN (1960),     PARTITION p1 VALUES LESS THAN (1970),     PARTITION p2 VALUES LESS THAN (1980),     PARTITION p3 VALUES LESS THAN (1990),     PARTITION p4 VALUES LESS THAN MAXVALUE ); mysql> select * from t3; +-----------+------------+ | firstname | joined     | +-----------+------------+ | aaa       | 1955-01-01 | | bbb       | 1965-01-01 | | ccc       | 1966-02-02 | | ccc       | 1967-03-03 | | ccc       | 1977-03-03 | | ccc       | 1987-03-03 | +-----------+------------+ 
  | 
 
然后做select查询
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 
  | mysql> select * from t3 where joined<'1969-12-30'; +-----------+------------+ | firstname | joined     | +-----------+------------+ | aaa       | 1955-01-01 | | bbb       | 1965-01-01 | | ccc       | 1966-02-02 | | ccc       | 1967-03-03 | +-----------+------------+ 4 rows in set (0.00 sec) mysql> explain partitions select * from t3 where joined<'1969-12-30'; +----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+ | id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       | +----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+ |  1 | SIMPLE      | t3    | p0,p1      | ALL  | NULL          | NULL | NULL    | NULL |    4 |    33.33 | Using where | +----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+ 1 row in set, 2 warnings (0.00 sec) 
  | 
 
year(‘1955-01-01’)返回1955,表格根据year做了分区,可以看到where子句只扫了p0,p1两个分区,提升了查询效率。这个叫partition pruning,官网地址 http://dev.mysql.com/doc/refman/5.7/en/partitioning-pruning.html
但并不是对所以的函数都能进行partition pruning,只是几个时间的函数,YEAR(),TO_DAYS(),TO_SECONDS(),UNIX_TIMESTAMP()这几个有效。
4.LIST分区
LIST分区跟RANGE特别类似,只是RANGE是连续的值,用less than 表达,LIST分区用VALUES IN(…)来表达,是离散的值。
1 2 3 4 5 6 
  | create table t4( a INT, b INT)engine=innodb partition by LIST(b)( partition p0 values in (1,3,5,7,9), partition p1 values in (2,4,6,8,0)); 
  | 
 
这时候你select * from t4 where b=1;那么只会扫p1分区。
5.HASH分区
hash目的将数据均匀分布到预设的分区中,保证各分区数据量大致一样。HASH分区也和之前的其他分区一样,需要制定分区的函数,以及HASH分区的个数。如下:
1 2 3 4 5 
  | create table t_hash( a INT, b DATETIME) engine=innodb partition by HASH (YEAR(b)) partitions 4; 
  | 
 
上面根据YEAR的结果进行hash分区,不是很科学,最好根据自增字段进行分区,如下:
1 2 3 4 5 6 
  | create table t_hash2( a INT AUTO_INCREMENT, b DATETIME, PRIMARY KEY (a)) engine=innodb partition by HASH(a) partitions 4; 
  | 
 
6.KEY分区
KEY分区跟HASH分区类似,不过HASH分区根据用户提供的函数进行分区,而KEY分区则使用MYSQL内部的散列函数进行分区。
1 2 3 4 5 
  | create table t_key( a INT, b DATETIME) engine=innodb partition by KEY(b) partitions 4; 
  |