秦悦明的运维笔记

mysql分区

1.确认已安装

很多人动不动就想到要用分布式数据库,其实单库的性能随着MYSQL版本的提升越来越高,单表1000w数据应用优化的好的话用单库是完全没问题的。一个更优的做法是做表分区。也就是mysql partition。本文大量参考官网和

这里以mysql5.7.15为例。确认有partition插件。

1
2
mysql> SHOW PLUGINS;
partition | ACTIVE | STORAGE ENGINE | NULL | GPL

有partition就ok。
另外5.7有所不同的是mysql_install_db以后,会生成一个随便密码,在~/.mysql_secret,之后用

1
2
3
SET PASSWORD FOR 'root'@'localhost' = PASSWORD('new_password');
或者:
ALTER USER user IDENTIFIED BY 'new_password';

重置密码,然后就可以开始使用了。
5.7给我的感觉很不错。repo源很快,mysql被oracle买了以后运作的很规范,文档也是全的不行。

2.分区的类型

介绍常用的4中分区(当然不止这4个):

  1. RANGE分区
  2. LIST分区
  3. HASH分区
  4. KEY分区

3.RANGE分区

很好理解,就是安装区域来分。

这里先看不用分区,

1
2
3
create table t1(
id INT
)engine=innodb

默认生成一个ibd文件。

1
t1.frm t1.ibd
1
2
3
4
5
6
create table t2(
id INT
)engine=innodb
partition by RANGE(id)(
partition p0 values less than (10),
partition p1 values less than (20));

t2用了分区,可以看到p0,p1分区信息的ibd文件。

1
t2.frm t2#P#p0.ibd t2#P#p1.ibd

再插入几条数据试试

1
2
3
insert into t2 select 9;
insert into t2 select 10;
insert into t2 select 15;

可以通过information_schema.partitions中的有相关分区信息:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
mysql> select * from information_schema.partitions where table_name='t2'\G
*************************** 1. row ***************************
TABLE_CATALOG: def
TABLE_SCHEMA: test
TABLE_NAME: t2
PARTITION_NAME: p0
SUBPARTITION_NAME: NULL
PARTITION_ORDINAL_POSITION: 1
SUBPARTITION_ORDINAL_POSITION: NULL
PARTITION_METHOD: RANGE
SUBPARTITION_METHOD: NULL
PARTITION_EXPRESSION: id
SUBPARTITION_EXPRESSION: NULL
PARTITION_DESCRIPTION: 10
TABLE_ROWS: 1
AVG_ROW_LENGTH: 16384
DATA_LENGTH: 16384
MAX_DATA_LENGTH: NULL
INDEX_LENGTH: 0
DATA_FREE: 0
CREATE_TIME: 2016-09-26 15:34:03
UPDATE_TIME: 2016-09-26 15:34:52
CHECK_TIME: NULL
CHECKSUM: NULL
PARTITION_COMMENT:
NODEGROUP: default
TABLESPACE_NAME: NULL
*************************** 2. row ***************************
TABLE_CATALOG: def
TABLE_SCHEMA: test
TABLE_NAME: t2
PARTITION_NAME: p1
SUBPARTITION_NAME: NULL
PARTITION_ORDINAL_POSITION: 2
SUBPARTITION_ORDINAL_POSITION: NULL
PARTITION_METHOD: RANGE
SUBPARTITION_METHOD: NULL
PARTITION_EXPRESSION: id
SUBPARTITION_EXPRESSION: NULL
PARTITION_DESCRIPTION: 20
TABLE_ROWS: 2
AVG_ROW_LENGTH: 8192
DATA_LENGTH: 16384
MAX_DATA_LENGTH: NULL
INDEX_LENGTH: 0
DATA_FREE: 0
CREATE_TIME: 2016-09-26 15:34:03
UPDATE_TIME: 2016-09-26 15:34:58
CHECK_TIME: NULL
CHECKSUM: NULL
PARTITION_COMMENT:
NODEGROUP: default
TABLESPACE_NAME: NULL
2 rows in set (0.01 sec)

但是这存在一个问题

1
2
mysql> insert into t2 select 30;
ERROR 1526 (HY000): Table has no partition for value 30

插入30时即会报错,分区range没有涵盖所以条件,当插入30这个值的时候MYSQL不知道插入到哪。我们可以增加一个小于MAXVALUE的分区来涵盖所以的区域:

1
2
3
4
5
6
7
8
9
10
11
12
alter table t2
add partition (partition p2 values less than MAXVALUE);
Query OK, 0 rows affected (0.06 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> insert into t2 select 30;
Query OK, 1 row affected (0.01 sec)
Records: 1 Duplicates: 0 Warnings: 0
mysql> insert into t2 select 40;
Query OK, 1 row affected (0.00 sec)
Records: 1 Duplicates: 0 Warnings: 0

再插入遍不会报错了。

range分区主要用于日期列的分区。
mysql里面data type 格式:

1
2
3
4
5
6
Data Type “Zero” Value
DATE '0000-00-00'
TIME '00:00:00'
DATETIME '0000-00-00 00:00:00'
TIMESTAMP '0000-00-00 00:00:00'
YEAR 0000

官网举了个例子

1
2
3
4
5
6
7
8
9
10
11
12
13
14
CREATE TABLE members (
firstname VARCHAR(25) NOT NULL,
lastname VARCHAR(25) NOT NULL,
username VARCHAR(16) NOT NULL,
email VARCHAR(35),
joined DATE NOT NULL
)
PARTITION BY RANGE( YEAR(joined) ) (
PARTITION p0 VALUES LESS THAN (1960),
PARTITION p1 VALUES LESS THAN (1970),
PARTITION p2 VALUES LESS THAN (1980),
PARTITION p3 VALUES LESS THAN (1990),
PARTITION p4 VALUES LESS THAN MAXVALUE
);

按joined日期来的YEAR函数来分区,

把上面的表格简化一下,变成t3:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
CREATE TABLE t3 (
firstname VARCHAR(25) NOT NULL,
joined DATE NOT NULL
)
PARTITION BY RANGE( YEAR(joined) ) (
PARTITION p0 VALUES LESS THAN (1960),
PARTITION p1 VALUES LESS THAN (1970),
PARTITION p2 VALUES LESS THAN (1980),
PARTITION p3 VALUES LESS THAN (1990),
PARTITION p4 VALUES LESS THAN MAXVALUE
);
mysql> select * from t3;
+-----------+------------+
| firstname | joined |
+-----------+------------+
| aaa | 1955-01-01 |
| bbb | 1965-01-01 |
| ccc | 1966-02-02 |
| ccc | 1967-03-03 |
| ccc | 1977-03-03 |
| ccc | 1987-03-03 |
+-----------+------------+

然后做select查询

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
mysql> select * from t3 where joined<'1969-12-30';
+-----------+------------+
| firstname | joined |
+-----------+------------+
| aaa | 1955-01-01 |
| bbb | 1965-01-01 |
| ccc | 1966-02-02 |
| ccc | 1967-03-03 |
+-----------+------------+
4 rows in set (0.00 sec)
mysql> explain partitions select * from t3 where joined<'1969-12-30';
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
| 1 | SIMPLE | t3 | p0,p1 | ALL | NULL | NULL | NULL | NULL | 4 | 33.33 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
1 row in set, 2 warnings (0.00 sec)

year(‘1955-01-01’)返回1955,表格根据year做了分区,可以看到where子句只扫了p0,p1两个分区,提升了查询效率。这个叫partition pruning,官网地址 http://dev.mysql.com/doc/refman/5.7/en/partitioning-pruning.html

但并不是对所以的函数都能进行partition pruning,只是几个时间的函数,YEAR(),TO_DAYS(),TO_SECONDS(),UNIX_TIMESTAMP()这几个有效。

4.LIST分区

LIST分区跟RANGE特别类似,只是RANGE是连续的值,用less than 表达,LIST分区用VALUES IN(…)来表达,是离散的值。

1
2
3
4
5
6
create table t4(
a INT,
b INT)engine=innodb
partition by LIST(b)(
partition p0 values in (1,3,5,7,9),
partition p1 values in (2,4,6,8,0));

这时候你select * from t4 where b=1;那么只会扫p1分区。

5.HASH分区

hash目的将数据均匀分布到预设的分区中,保证各分区数据量大致一样。HASH分区也和之前的其他分区一样,需要制定分区的函数,以及HASH分区的个数。如下:

1
2
3
4
5
create table t_hash(
a INT,
b DATETIME) engine=innodb
partition by HASH (YEAR(b))
partitions 4;

上面根据YEAR的结果进行hash分区,不是很科学,最好根据自增字段进行分区,如下:

1
2
3
4
5
6
create table t_hash2(
a INT AUTO_INCREMENT,
b DATETIME,
PRIMARY KEY (a)) engine=innodb
partition by HASH(a)
partitions 4;

6.KEY分区

KEY分区跟HASH分区类似,不过HASH分区根据用户提供的函数进行分区,而KEY分区则使用MYSQL内部的散列函数进行分区。

1
2
3
4
5
create table t_key(
a INT,
b DATETIME) engine=innodb
partition by KEY(b)
partitions 4;