filter和neested loop的区别

1. filter介绍

filter的操作是对外表的每一行,都要对内表执行一次扫描。这个处理过程很像nested loop,但它的独特之处在于会维护一个hash table。
以下面这个SQL为例:

SQL> create table t1 as select * from dba_objects;
SQL> create table t2 as select * from dba_objects;
SQL> select object_id from t1 where exists (select 1 from t2 where t1.object_id=t2.object_id);
SQL> exec dbms_stats.gather_table_stats(ownname => 'SYS', tabname => 'T1', estimate_percent =>100, cascade =>true);  
SQL> exec dbms_stats.gather_table_stats(ownname => 'SYS', tabname => 'T2', estimate_percent =>100, cascade =>true);

filter的处理过程如下:
如果t1里取出object_id=1,那么对于t2来说即select 1 from t2 where t2.object_id=1。如果条件满足,那么对于子查询,输入和输出组成一对(key, value),即为(1(t1.object_id),1(常量)),
存储在hash table里,并且由于条件满足,t1.object_id=1被放入结果集。
然后接着从t1取出object_id=2,如果子查询依旧条件满足,那么子查询产生另一个输入和输出,即(2,1),被放入hash table里;并且t1.object_id=2被放入结果集。
接着假设t1里有重复的object_id,例如第三次从t1取出的object_id=2,那么由于我们对于子查询来说,已经有输入输出对(2,1)在hash table里了,
这时侯就不用去再次全表扫描t2了,ORACLE非常聪明地知道object_id=2是结果集。这里,filter和nested loop相比,省去了一次全表扫描t2。
这时侯就有一种情况,如果t1表的object_id字段重复值比较多,也就是distinct值比较少,那么filter的效率是比nested loop高。
测试如下:

2. distinct值比较多的情况

这个时侯,t1.object_id值基本无重复

SQL> select object_id from t1 where exists (select /*+no_unnest*/ 1 from t2 where t1.object_id=t2.object_id*10);

 8595 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 895956251

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     5 |   151K  (1)| 00:30:14 |
|*  1 |  FILTER            |      |       |       |            |          |
|   2 |   TABLE ACCESS FULL| T1   | 86276 |   421K|   336   (1)| 00:00:05 |
|*  3 |   TABLE ACCESS FULL| T2   |     1 |     5 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter( EXISTS (SELECT /*+ NO_UNNEST */ 0 FROM "T2" "T2" WHERE
              "T2"."OBJECT_ID"*10=:B1))
   3 - filter("T2"."OBJECT_ID"*10=:B1)


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
   96437020  consistent gets
          0  physical reads
          0  redo size
     156741  bytes sent via SQL*Net to client
       6811  bytes received via SQL*Net from client
        574  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
       8595  rows processed

这里我们使用了no_unnest的hint,如果不使用,SQL会走hash join,因为会对子查询展开。这里解释一下两个hint,/*+unnest*/和/*+no_unnest*/
unnest是指子查询自动展开,这时侯t1和t2一般会走hash join
no_unnest,是两次否定表示肯定,让子查询不展开,独立的完成。
如果走嵌套查询是什么情况呢:

SQL> select /*+use_nl(t1 t2)*/ t1.object_id from t1,t2 where t1.object_id=t2.object_id*10;

 8595 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1967407726

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      | 86277 |   842K|    28M  (1)| 96:05:28 |
|   1 |  NESTED LOOPS      |      | 86277 |   842K|    28M  (1)| 96:05:28 |
|   2 |   TABLE ACCESS FULL| T1   | 86276 |   421K|   336   (1)| 00:00:05 |
|*  3 |   TABLE ACCESS FULL| T2   |     1 |     5 |   334   (1)| 00:00:05 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("T1"."OBJECT_ID"="T2"."OBJECT_ID"*10)


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
  106553223  consistent gets
          0  physical reads
          0  redo size
     156741  bytes sent via SQL*Net to client
       6811  bytes received via SQL*Net from client
        574  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
       8595  rows processed

走filter的一致读是96437020,而走nested loop的一致读是:106553223,成本基本一致,filter的效率稍高。

3. distinct值比较少的情况

前面测试是关联条件t1.object_id distinct值比较多的情况,现在测试一下关联条件t1.object_type distinct值比较少的情况

SQL> select t1.object_id from t1 where exists(select 1 from t2 where t1.object_type=t2.object_type);

 86276 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 895956251

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      | 86276 |  1179K|   424   (1)| 00:00:06 |
|*  1 |  FILTER            |      |       |       |            |          |
|   2 |   TABLE ACCESS FULL| T1   | 86276 |  1179K|   336   (1)| 00:00:05 |
|*  3 |   TABLE ACCESS FULL| T2   |     2 |    18 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter( EXISTS (SELECT 0 FROM "T2" "T2" WHERE
              "T2"."OBJECT_TYPE"=:B1))
   3 - filter("T2"."OBJECT_TYPE"=:B1)


Statistics
----------------------------------------------------------
          6  recursive calls
          0  db block gets
      18244  consistent gets
          0  physical reads
          0  redo size
    1577189  bytes sent via SQL*Net to client
      63780  bytes received via SQL*Net from client
       5753  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      86276  rows processed

可以看到FILTER对于关联条件distinct值比较少的情况,效率还是很高的,因为有hash table的存在,实际上大部分的判断是通过hash table里判断的,而实际的内表扫描是很少的。
而如果走嵌套,情况就会比较差了。不过NESTED LOOPS有个其它连接没有的优点就是:可以先返回已经连接的行,而不必等待所有的连接操作处理完才返回数据,这可以实现快速的响应时间。这是从连接操作中得到第一个匹配行的最快的方法之一,这种类型的连接可以用在需要快速响应的语句中,以响应速度为主要目标。
当然,当前这个SQL语句,最好的连接方式是走HASH JOIN

关于紫砂壶

感悟技术人生
此条目发表在Oracle性能问题分类目录,贴了标签。将固定链接加入收藏夹。