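PostgreSQL Slow Query Tracing and Solutions
Production case: as the data volume grew, database CPU usage blew up and hit 100%, which crashed the service. The cause turned out to be a simple update statement.
Locate the problem quickly. The rough process is as follows:
- Identify the affected database > read database or write database
- Check the connection count. When CPU utilization reaches 100%, the first suspicion is that active connections spiked during a business peak and the resources reserved for the database were not enough. We need to check whether there were far more active connections than usual when the problem occurred.
- Connection spikes and a failed read or write database were both ruled out, so the only remaining explanation was slow SQL consuming resources.
- Determine whether frequent reads and writes are the cause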
select * from pg_stat_user_tables where n_live_tup > 100000 and seq_scan > 0 order by seq_tup_read desc limit 10;
Several partitioned tables were being read extremely heavily.
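For the connection-count check earlier in this list, a query along these lines is one way to see how many connections are active. It is a minimal sketch that assumes a PostgreSQL version where `pg_stat_activity` exposes the `state` column (9.2 or later). The "usual" baseline to compare against has to come from your own monitoring.

```sql
-- Count connections by state. A spike in 'active' compared with the usual
-- baseline would point to a connection surge rather than one slow statement.
select state, count(*) as connections
from pg_stat_activity
group by state
order by connections desc;

-- Configured connection limit, for comparison.
show max_connections;
```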
- Locate the slow SQL. https://www.jb51.net/article/204841.htm is a step-by-step tutorial on how to locate the problem SQL. The core query:
select datname, usename, client_addr, application_name, state,
       backend_start, xact_start, xact_stay, query_start, query_stay,
       replace(query, chr(10), ' ') as query
from (
    select pgsa.datname as datname,
           pgsa.usename as usename,
           pgsa.client_addr as client_addr,
           pgsa.application_name as application_name,
           pgsa.state as state,
           pgsa.backend_start as backend_start,
           pgsa.xact_start as xact_start,
           extract(epoch from (now() - pgsa.xact_start)) as xact_stay,
           pgsa.query_start as query_start,
           extract(epoch from (now() - pgsa.query_start)) as query_stay,
           pgsa.query as query
    from pg_stat_activity as pgsa
    where 1=1
      and pgsa.state != 'idle'
      and pgsa.state != 'idle in transaction'
      and pgsa.state != 'idle in transaction (aborted)'
) idleconnections
order by query_stay desc
limit 10;
- The slow SQL is a single update statement
update table set xxx = yyy where id = xxx
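- Check the execution plan
- It turns out this statement was scanning the whole table over and over: the predicate did not hit any partition, so every update was a full table scan.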
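A minimal sketch of checking the plan for a statement like this. The table and column names (`orders`, `status`, `id`) are hypothetical stand-ins for the elided ones in the post. Plain `explain` only plans the statement, while `explain analyze` actually executes the update, so the sketch wraps it in a transaction that is rolled back.

```sql
-- Hypothetical names: orders, status, id.
-- Plan only, nothing is executed:
explain
update orders set status = 'done' where id = 42;

-- Plan plus real timings. explain analyze runs the update,
-- so roll it back to leave the data untouched.
begin;
explain (analyze, buffers)
update orders set status = 'done' where id = 42;
rollback;
```

If the plan shows a scan node for every partition, none of them is being excluded, which matches the full-scan behavior described above.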
- Fix the SQL: use the partition keys to target the exact partition to update
update table set xxx = yyy where id = xxx and partition_key1 = xxx and partition_key2 = xxx
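To illustrate why adding the partition key helps, here is a sketch using declarative partitioning (PostgreSQL 10 or later). This is an assumption: the post does not show how its tables are partitioned, and the names `orders`, `region`, and `status` are hypothetical. It uses a single list partition key for brevity, whereas the statement above filters on two.

```sql
-- Hypothetical schema, partitioned by list on region.
create table orders (
    id     bigint not null,
    region text   not null,
    status text
) partition by list (region);

create table orders_east partition of orders for values in ('east');
create table orders_west partition of orders for values in ('west');

-- No partition key in the predicate: the planner has to consider every partition.
explain update orders set status = 'done' where id = 42;

-- Partition key included: the planner can restrict the update to orders_east.
explain update orders set status = 'done' where id = 42 and region = 'east';
```

Comparing the two plans should show the second one touching only the matching partition. The exact plan shape varies between PostgreSQL versions.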
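- Check the result by comparing the monitoring before and after the change. P.S. If you want to know how the table sharding and partitioning was set up, see https://blog..net/weixin_45893488/article/details/104844933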