Testing a Partitioned Storage Scheme for Tens of Millions of Rows
Scenario
Based on a business scenario in our project, partitioned storage is applied to data that is large in volume and time-sensitive (historical data is almost never read, while hot data is concentrated in the recent period). The test flow is: create a test table, insert tens of millions of rows (20 million in this run), and then examine how the partitioned table performs.
Creating the table
CREATE TABLE test (
    id BIGINT(20) AUTO_INCREMENT,
    area_id INT(8) NOT NULL DEFAULT 0,
    create_time DATETIME NOT NULL,
    num INT(8) NOT NULL DEFAULT 0,
    PRIMARY KEY (id, area_id, create_time)
) ENGINE = INNODB DEFAULT CHARSET = utf8
PARTITION BY RANGE (TO_DAYS(create_time)) (
    PARTITION p0        VALUES LESS THAN (TO_DAYS('2019-11-01')) ENGINE = INNODB,
    PARTITION p20191101 VALUES LESS THAN (TO_DAYS('2019-11-02')) ENGINE = INNODB,
    PARTITION p20191102 VALUES LESS THAN (TO_DAYS('2019-11-03')) ENGINE = INNODB,
    PARTITION p20191103 VALUES LESS THAN (TO_DAYS('2019-11-04')) ENGINE = INNODB,
    PARTITION p20191104 VALUES LESS THAN (TO_DAYS('2019-11-05')) ENGINE = INNODB,
    PARTITION p20191105 VALUES LESS THAN (TO_DAYS('2019-11-06')) ENGINE = INNODB,
    PARTITION p20191106 VALUES LESS THAN (TO_DAYS('2019-11-07')) ENGINE = INNODB,
    PARTITION p20191107 VALUES LESS THAN (TO_DAYS('2019-11-08')) ENGINE = INNODB,
    PARTITION p20191108 VALUES LESS THAN (TO_DAYS('2019-11-09')) ENGINE = INNODB,
    PARTITION p20191109 VALUES LESS THAN (TO_DAYS('2019-11-10')) ENGINE = INNODB
);
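Because the scheme uses one partition per day, in practice new partitions have to be added ahead of time and expired ones dropped. The statements below are a minimal maintenance sketch following the naming pattern above; they are not part of the original test.

-- Assumed maintenance statements, not part of the original test.
-- Add the next day's partition before its data arrives:
ALTER TABLE test ADD PARTITION (
    PARTITION p20191110 VALUES LESS THAN (TO_DAYS('2019-11-11')) ENGINE = INNODB
);
-- Drop an expired partition; this discards the cold historical data stored in it:
ALTER TABLE test DROP PARTITION p20191101;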
Creating the stored procedure
DELIMITER //
CREATE PROCEDURE insert_test(IN num INT, IN time VARCHAR(10))
BEGIN
    DECLARE rowid INT DEFAULT 0;
    DECLARE create_time DATETIME;
    DECLARE area_id INT;
    SET @exedata = "";
    WHILE rowid < num DO
        -- Random timestamp within the given day (time is passed as 'YYYY-MM-DD')
        SET create_time = (SELECT STR_TO_DATE(CONCAT(time, ' ',
            LPAD(FLOOR(0 + (RAND() * 23)), 2, '0'), ':',
            LPAD(FLOOR(0 + (RAND() * 59)), 2, '0'), ':',
            LPAD(FLOOR(0 + (RAND() * 59)), 2, '0')), "%Y-%m-%d %H:%i:%s") FROM dual);
        SET area_id = (SELECT FLOOR(910000 + (RAND() * 9)));
        SET rowid = rowid + 1;
        IF LENGTH(@exedata) > 0 THEN
            SET @exedata = CONCAT(@exedata, ',');
        END IF;
        SET @exedata = CONCAT(@exedata, "(", area_id, ",'", create_time, "',", rowid, ")");
        -- Flush every 5000 rows as one multi-row INSERT
        IF rowid % 5000 = 0 THEN
            SET @exesql = CONCAT("insert into test(area_id,create_time,num) values ", @exedata);
            PREPARE stmt FROM @exesql;
            EXECUTE stmt;
            DEALLOCATE PREPARE stmt;
            SET @exedata = "";
        END IF;
    END WHILE;
    -- Flush any remaining rows
    IF LENGTH(@exedata) > 0 THEN
        SET @exesql = CONCAT("insert into test(area_id,create_time,num) values ", @exedata);
        PREPARE stmt FROM @exesql;
        EXECUTE stmt;
        DEALLOCATE PREPARE stmt;
    END IF;
END //
DELIMITER ;
Calling the stored procedure to insert data
CALL insert_test(1000000, '2019-10-31');
CALL insert_test(1000000, '2019-11-01');
CALL insert_test(1000000, '2019-11-02');
CALL insert_test(1000000, '2019-11-03');
CALL insert_test(1000000, '2019-11-04');
CALL insert_test(1000000, '2019-11-05');
CALL insert_test(1000000, '2019-11-06');
CALL insert_test(1000000, '2019-11-07');
CALL insert_test(1000000, '2019-11-08');
CALL insert_test(1000000, '2019-11-09');
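Before looking at execution plans, a quick sanity check (an assumption added here, not part of the original flow) is to confirm that the rows landed where expected:

-- Assumed sanity check, not part of the original test.
-- Total rows loaded:
SELECT COUNT(*) FROM test;
-- Rows that fell into a single day, e.g. 2019-11-05:
SELECT COUNT(*) FROM test
WHERE create_time >= '2019-11-05 00:00:00'
  AND create_time <  '2019-11-06 00:00:00';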
Analysis
- With 20 million rows, the hourly-average query for a given day takes around 1s; as long as the data volume per partition stays roughly constant, later SQL performance should not degrade much.
- The execution plan here is a full scan; testing shows that after partitioning, the WHERE clause is no longer much of a problem. A follow-up optimization is to rework the primary key (dropping id), after which the plan should reach the range level; see the sketch after this list.
- On a non-partitioned table with the same data, and with the query using the index, repeated runs averaged around 2.5s.
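Reworking the primary key is one option; an alternative that avoids touching the key at all is an ordinary secondary index on the filter columns. The statement below is a hedged sketch of that alternative (the index name idx_area_time is made up here), not something the original test ran:

-- Assumed alternative to restructuring the primary key, not part of the original test.
-- A non-unique secondary index is allowed on a partitioned table and lets the
-- per-partition access on (area_id, create_time) become a range scan.
ALTER TABLE test ADD INDEX idx_area_time (area_id, create_time);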
select PARTITION_NAME as "partition", TABLE_ROWS as "rows"
from information_schema.partitions
where table_schema = 'test' and table_name = 'test';

explain partitions
select * from test
where area_id = 910005
  and create_time > '2019-11-02 00:00:00'
  and create_time < '2019-11-02 23:59:59';

explain
select avg(num), DATE_FORMAT(create_time, '%Y%m%d%H') as time
from test
where area_id = 910005
  and create_time > '2019-11-06 00:00:00'
  and create_time < '2019-11-06 23:59:59'
group by time;
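To keep an eye on whether each daily partition stays at roughly the same size over time, a per-partition size query against information_schema can help. This is an added suggestion rather than part of the original test:

-- Assumed monitoring query, not part of the original test:
-- approximate rows and on-disk size for each partition.
select PARTITION_NAME as "partition",
       TABLE_ROWS as "rows",
       round(DATA_LENGTH / 1024 / 1024, 1) as "data_mb",
       round(INDEX_LENGTH / 1024 / 1024, 1) as "index_mb"
from information_schema.partitions
where table_schema = 'test' and table_name = 'test';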