问题说明:
业务人员反馈电信跑批速度慢,单条SQL耗时2s左右。
而联通和移动跑批速度正常,但条SQL耗时不超过0.2s。
环境说明:
DB:Oracle 11.2.0.4.0 RAC
OS:AIX 7.1
问题分析:
快、慢SQL文本如下:
慢SQL:
--电信1.6-1.8秒
select msgserial, objaddr, endtime, retrytimes, srvtype, msgcont, report
from (select msgserial,
objaddr,
endtime,
retrytimes,
srvtype,
msgcont,
report
from CJCTABXXXX
where msgstat = '0'
and srcaddr = '11111'
and xxxxxno = '2'
and lastsndtime < to_char(sysdate, 'yyyymmddHH24MISS')
order by msglevel, rpttime)
where rownum <= 40;
快SQL:
--联通0.1-0.2秒
select msgserial, objaddr, endtime, retrytimes, srvtype, msgcont, report
from (select msgserial,
objaddr,
endtime,
retrytimes,
srvtype,
msgcont,
report
from CJCTABXXXX
where msgstat = '0'
and srcaddr = '11111'
and xxxxxno = '1'
and lastsndtime < to_char(sysdate, 'yyyymmddHH24MISS')
order by msglevel, rpttime)
where rownum <= 40;
可以看到,执行快慢两个SQL,只有xxxxxno条件值取值不同,其他条件相同。
其中慢的SQL,条件为xxxxxno = ‘2’,快的SQL,条件为xxxxxno = ‘1’。
那么猜测,一定是慢的SQL条件为xxxxxno = '2’时结果集比快的SQL大,所有速度才慢的,需要检查xxxxxno不同取值下结果集大小。
—数据量: 74418(慢SQL)
select count(*)
from (select msgserial,
objaddr,
endtime,
retrytimes,
srvtype,
msgcont,
report
from CJCTABXXXX
where msgstat = '0'
and srcaddr = '11111'
and xxxxxno = '2'
and lastsndtime < to_char(sysdate, 'yyyymmddHH24MISS')
order by msglevel, rpttime)
—数据量:411628(快SQL)
select count(*)
from (select msgserial,
objaddr,
endtime,
retrytimes,
srvtype,
msgcont,
report
from CJCTABXXXX
where msgstat = '0'
and srcaddr = '11111'
and xxxxxno = '1'
and lastsndtime < to_char(sysdate, 'yyyymmddHH24MISS')
order by msglevel, rpttime)
疑惑1?
慢SQL的结果集有74418,远小于快SQL的结果集411628,为什么慢SQL结果集小,逻辑读反而很大呢?
难道是慢的SQL执行计划选错了,导致虽然结果集小,但是cost很高?
分别生成快、慢两个SQL执行计划:
慢SQL执行计划如下:
Elapsed: 00:00:01.98
Execution Plan
----------------------------------------------------------
-----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 37 | 21904 | 1424 (1)|
| 1 | COUNT STOPKEY | | | | |
| 2 | VIEW | | 37 | 21904 | 1424 (1)|
| 3 | TABLE ACCESS BY INDEX ROWID| CJCTABXXXX | 37 | 8658 | 1424 (1)|
| 4 | INDEX RANGE SCAN | ICJCTABXXXX_4 | 1757 | | 49 (0)|
-----------------------------------------------------------------------------------
Note
-----
- 'PLAN_TABLE' is old version
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
682865 consistent gets
0 physical reads
0 redo size
6636 bytes sent via SQL*Net to client
542 bytes received via SQL*Net from client
4 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
40 rows processed
快SQL执行计划如下:
Execution Plan
----------------------------------------------------------
-----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 40 | 23680 | 310 (0)|
| 1 | COUNT STOPKEY | | | | |
| 2 | VIEW | | 40 | 23680 | 310 (0)|
| 3 | TABLE ACCESS BY INDEX ROWID| CJCTABXXXX | 40 | 9360 | 310 (0)|
| 4 | INDEX RANGE SCAN | ICJCTABXXXX_4 | 1757 | | 13 (0)|
-----------------------------------------------------------------------------------
Note
-----
- 'PLAN_TABLE' is old version
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
16850 consistent gets
0 physical reads
0 redo size
5866 bytes sent via SQL*Net to client
542 bytes received via SQL*Net from client
4 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
40 rows processed
可以看到执行计划也完全一样,都是走的ICJCTABXXXX_4索引范围扫描,
慢的SQL逻辑读consistent gets(682865)远高于快的SQL逻辑读consistent gets(16850)。
疑惑2?
那么为什么会出现执行计划相同,结果集小的逻辑读反而更大呢?
问题原因:
让我们在认真看下快慢两条SQL:
慢SQL:
--电信1.6-1.8秒
select msgserial, objaddr, endtime, retrytimes, srvtype, msgcont, report
from (select msgserial,
objaddr,
endtime,
retrytimes,
srvtype,
msgcont,
report
from CJCTABXXXX
where msgstat = '0'
and srcaddr = '11111'
and xxxxxno = '2'
and lastsndtime < to_char(sysdate, 'yyyymmddHH24MISS')
order by msglevel, rpttime)
where rownum <= 40;
快SQL:
--联通0.1-0.2秒
select msgserial, objaddr, endtime, retrytimes, srvtype, msgcont, report
from (select msgserial,
objaddr,
endtime,
retrytimes,
srvtype,
msgcont,
report
from CJCTABXXXX
where msgstat = '0'
and srcaddr = '11111'
and xxxxxno = '1'
and lastsndtime < to_char(sysdate, 'yyyymmddHH24MISS')
order by msglevel, rpttime)
where rownum <= 40;
在不考虑xxxxxno条件时,两个SQL文本是完全一样的,分别是对msglevel, rpttime排序后取前40条记录。
那么出现执行计划相同,结果集小的逻辑读反而更大的原因很有可能是因为:
1.SQL并没有完全执行完order by排序完、回表完然后才去执行rownum<=40操作。
而是边对部分排序和回表边取出部分rownum<=40的值。
2.检查order by msglevel, rpttime排序的两个列都包含在同一个索引里,
也就是这两列排序操作不需要访问数据块,只需范围索引块,
然后回表返回其他列的值,同时边回表,边返回部分rownum <= 40的值。
3.那么为什么xxxxxno过滤性差,结果集多的逻辑读反而少了,这是因为结果集越大,越容易找到前40条符合条件的值。
例如:
有一个大纸箱,里面混合装了100个球,其中红色球80个,篮色球10个,粉色球10个,
请问在纸箱里分别取出10个红色球、10个篮色球、10个粉色球,哪个速度更快呢?
显然是取出10个红色球速度更快,因为红色球数量最多,更容易找到10个红色球。
本次案例也是类似的道理,因为xxxxxno条件过滤性好的结果集小,想取出前40个值时,需要扫描更多的块,索引逻辑读更高,速度更慢。
解决方案:
可以考虑调整组合索引,将xxxxxno列加入到组合索引中,具体方案还需要充分测试后在使用。
#####chenjuchao 2021-08-15 21:05#####
欢迎关注我的公众号《IT小Chen》