Skip to content
Keep Learning Keep Living
Keep Learning Keep Living
Keep Learning Keep Living

How to Compare Two Tables

Mustafa, 2023-10-262023-10-27

Hi,

Few days ago, I joined Chris Saxon’s webinar about “how to compare two tables”. Chris is one of the amazing guy in Oracle community and he shared different methods about how to compare two tables and also, strengths and weakness of the methods. before reading my post, I strongly suggest you to watch it on youtube:

I won’t go into details of the methods because Chris did this already 🙂

Until today, I always used double minus method and I must confess that never thought about “group by” method. I mostly don’t need to find duplicate rows in one table that’s because I use double minus. biggest downside is LOB objects. I must remove lob columns before comparing the data sets. Same thing applies to “group by” method too. I must admit that it was an unnecessary workload to write all the columns and remove LOB columns etc (with SQL Macro, we can save so much time while working).

So, I get a screen shot that main methods that Chris discussed:

As you would guess, group by method is the fastest method because it reads table once and generate data we want. I don’t count “full outer join” because all those null column break it and it would be even harder to write a NVL for every each of them.

So, I want to use group by method anymore to find differences between two tables but as you can see it does not support LOBs because we cannot group data on lob columns. what can we do? A saver is “Hash” functions. we can generate a hash data for our LOB data an use hash data to compare lobs.

Here is an example:

Oracle PL/SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
create table tmp1 as select* from v$sql;
exec dbms_Session.sleep(10); -- wait a little bit to change some data in v$sql
create table tmp2 as select* from v$sql;
 
with t1 as (select 0 c1, 1 c2, sql_Text, sql_id, dbms_crypto.hash(sql_fulltext, 2) sql_fulltext from tmp1),
  t2 as (select 1 c1, 0 c2, sql_Text, sql_id, dbms_crypto.hash(sql_fulltext, 2) sql_fulltext from tmp2)
select sql_Text, sql_id,sql_fulltext
from   (
         select * from t1
         union all
         select * from t2
       )
group by sql_Text, sql_id,sql_fulltext
having sum(c1) <> sum(c2) ;

In my example, I copied V$SQL into two tables with some time delay so that I will have both similar and different rows. I use dbms_crypto.hash function to generate hash value of a LOB column. second parameter 2 means use MD5 to generate hash. if you have more clob data you can use bigger hash algorithms but of course it will take more time:

HASH_MD4 CONSTANT PLS_INTEGER := 1;
HASH_MD5 CONSTANT PLS_INTEGER := 2;
HASH_SH1 CONSTANT PLS_INTEGER := 3;
HASH_SH256 CONSTANT PLS_INTEGER := 4;
HASH_SH384 CONSTANT PLS_INTEGER := 5;
HASH_SH512 CONSTANT PLS_INTEGER := 6;

you can use any hash algorithms.

Of course, again, it is not easy to find all lob columns then add hash function around them and then compare data but thanks to SQL Macro, we can automate this:

Oracle PL/SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
create or replace function sf_get_row_diff(p_table dbms_tf.table_t, p_table2 dbms_tf.table_t) return clob sql_macro as
  v_col clob;
  v_group_col clob;
begin
 
  for i in 1..p_table.column.count loop
    if p_table.column(i).description.type in (dbms_tf.TYPE_clob, dbms_tf.TYPE_blob) then
      v_col := v_col  || ',dbms_crypto.hash('||p_table.column(i).description.name || ',2) as ' ||p_table.column(i).description.name;
    else
      v_col := v_col  || ','||p_table.column(i).description.name;
    end if;
 
    v_group_col := v_group_col || ','||p_table.column(i).description.name;
  end loop;
  v_col := ltrim(v_col , ',');
  v_group_col := ltrim(v_group_col, ',');
 
  return 'with t1 as (select 0 c1, 1 c2, ' || v_col ||' from p_Table),
               t2 as (select 1 c1, 0 c2, ' || v_col ||' from p_Table2)
          select ' || v_group_col ||'
          from   (
                    select * from t1
                    union all
                    select * from t2
                 )
          group by ' || v_group_col ||'
          having sum(c1) <> sum(c2)';
end;
/

This sql macro function will generate columns automatically and also add hash function for LOB columns. All we need to do is give two tables / common table expressions (like with clause aliases). on this sample code, I assume that both data sets / tables has same columns with same data type. before generating SQL statemen, I loop through the columns and if there are any CLOB or BLOB then I add “dbms_crypto.hash” function around them. I also use actual lob column name as alias to it.

While generating group by columns, I cannot use dbms_crypto statement because I use it in a subselect. so I also have a second column list (v_group_col) which only stores column names and I use them in group by and select list.

Oracle PL/SQL
1
2
3
-- output suppressed
select * from sf_get_row_diff(tmp1, tmp2);
...

Now I have all the different rows between two tables. As I said you can use common table expression too:

Oracle PL/SQL
1
2
3
4
5
-- output suppressed
with t1 as (select a,b,c from mytable),
     t1 as (select a,b,c from yourtable),
select * from sf_get_row_diff(t1,t2);
...

even if it is fast, comparing lob columns could be very slow because big lobs means more time to generate hash and many rows means much much more time. so we might want to exclude LOB columns from comparison. A simple parameter could help:

Oracle PL/SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
create or replace function sf_get_row_diff(p_table dbms_tf.table_t, p_table2 dbms_tf.table_t, p_compare_lobs number := 1) return clob sql_macro as
  v_col clob;
  v_group_col clob;
begin
 
  for i in 1..p_table.column.count loop
    if p_table.column(i).description.type in (dbms_tf.TYPE_clob, dbms_tf.TYPE_blob)  then
      if p_compare_lobs = 1 then
        v_col := v_col  || ',dbms_crypto.hash('||p_table.column(i).description.name || ',2) as ' ||p_table.column(i).description.name;
        v_group_col := v_group_col || ','||p_table.column(i).description.name;
      end if;
    else
      v_col := v_col  || ','||p_table.column(i).description.name;
      v_group_col := v_group_col || ','||p_table.column(i).description.name;
    end if;
 
  end loop;
  v_col := ltrim(v_col , ',');
  v_group_col := ltrim(v_group_col, ',');
 
  return 'with t1 as (select 0 c1, 1 c2, ' || v_col ||' from p_Table),
               t2 as (select 1 c1, 0 c2, ' || v_col ||' from p_Table2)
          select ' || v_group_col ||'
          from   (
                    select * from t1
                    union all
                    select * from t2
                 )
          group by ' || v_group_col ||'
          having sum(c1) <> sum(c2)';
end;
/

if p_compare_lobs  columns is 1 (which is default) then it will add lob columns into compare and if anything else, it will not add lob columns into compare. you can even understand this from the timing:

Oracle PL/SQL
1
2
3
4
5
6
7
select * from sf_get_row_diff(tmp1, tmp2, 1);
...
Elapsed: 00:00:06.663
 
select * from sf_get_row_diff(tmp1, tmp2, 0);
...
Elapsed: 00:00:02.145

So, from now on, this function is more than enough to compare two tables for me. We can also add an excluded column list too as a forth parameter, so we can remove those columns from compare like update_date or create_user columns which mostly contain metadata and not actual business data. I won’t create an example for it too but you can see an example on one of my previous posts: https://mustafakalayci.me/2020/11/02/which-column-has-changed-in-every-row/

Many thanks to Chris for bringing up the topic and provide effective solutions. I learned a lot and added something new to me.

Wish you all healthy and happy days.

EDIT:

Ok, let’s not take the easy way out 🙂 here is another example based on my first design:

Oracle PL/SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
with m1 as (select 1 a, 10 b, sysdate -2000 create_date from dual
            union all
            select 2 a, 20 b, sysdate -2000 create_date from dual ),
     m2 as (select 1 a, 10 b, sysdate +2000 create_date from dual
            union all
            select 2 a, 21 b, sysdate +2000 create_date from dual
            union all
            select 3 a, 30 b, sysdate +2000 create_date from dual )
select *
from sf_get_row_diff(m1, m2, p_compare_lobs => 1);
 
         A          B CREATE_DATE        
---------- ---------- -------------------
         1         10 2029-04-18 11:27:58
         1         10 2018-05-06 11:27:58
         2         20 2018-05-06 11:27:58
         2         21 2029-04-18 11:27:58
         3         30 2029-04-18 11:27:58

so, I want to compare two tables (I used CTE (common table expression) here) m1 and m2 and all rows are different because they have different create_date values. Sometimes we may not want to compare “all” columns for example create_Date might be different for the rows and we might want to exclude it. so how can I adapt this into my sql macro? Simply we can pass the columns that we don’t want to use. First we need to add a new parameter for excluded columns and then remove those columns from select/group by etc list.

Oracle PL/SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
create or replace function sf_get_row_diff(p_table dbms_tf.table_t, p_table2 dbms_tf.table_t,
                                           p_compare_lobs number := 1,
                                           p_excluded_columns dbms_tf.columns_t := null) return clob sql_macro as
  v_col clob;
  v_group_col clob;
begin
 
  for i in 1..p_table.column.count loop
    continue when p_table.column(i).description.name member of p_excluded_columns;
    
    if p_table.column(i).description.type in (dbms_tf.TYPE_clob, dbms_tf.TYPE_blob)  then
      if p_compare_lobs = 1 then
        v_col := v_col  || ',dbms_crypto.hash('||p_table.column(i).description.name || ',2) as ' ||p_table.column(i).description.name;
        v_group_col := v_group_col || ','||p_table.column(i).description.name;
      end if;
    else
      v_col := v_col  || ','||p_table.column(i).description.name;
      v_group_col := v_group_col || ','||p_table.column(i).description.name;
    end if;
 
  end loop;
  v_col := ltrim(v_col , ',');
  v_group_col := ltrim(v_group_col, ',');
 
  return 'with t1 as (select 0 c1, 1 c2, ' || v_col ||' from p_Table),
               t2 as (select 1 c1, 0 c2, ' || v_col ||' from p_Table2)
          select ' || v_group_col ||'
          from   (
                    select * from t1
                    union all
                    select * from t2
                 )
          group by ' || v_group_col ||'
          having sum(c1) <> sum(c2)';
end;
/

I wrote whole code again but I only added 2 new lines, 3th and 10th lines. p_excluded_columns parameter will be the list of columns that we want to exclude from comparison. In the for loop which we generate the group by columns, I continue the loop if the current column is a member of columns array. let’s run the query again with excluded columns:

Oracle PL/SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
with m1 as (select 1 a, 10 b, sysdate -2000 create_date from dual
            union all
            select 2 a, 20 b, sysdate -2000 create_date from dual ),
     m2 as (select 1 a, 10 b, sysdate +2000 create_date from dual
            union all
            select 2 a, 21 b, sysdate +2000 create_date from dual
            union all
            select 3 a, 30 b, sysdate +2000 create_date from dual )
select *
from sf_get_row_diff(m1, m2, p_compare_lobs => 1, p_excluded_columns=>columns(create_date));
 
         A          B
---------- ----------
         2         20
         3         30
         2         21
 
with m1 as (select 1 a, 10 b, sysdate -2000 create_date from dual
            union all
            select 2 a, 20 b, sysdate -2000 create_date from dual ),
     m2 as (select 1 a, 10 b, sysdate +2000 create_date from dual
            union all
            select 2 a, 21 b, sysdate +2000 create_date from dual
            union all
            select 3 a, 30 b, sysdate +2000 create_date from dual )
select *
from sf_get_row_diff(m1, m2, p_compare_lobs => 1, p_excluded_columns=>columns(create_date, b));
 
         A
----------
         3

as you see, create_Date column is not included into comparison and row with a value 1 is not in the result because it is same on both tables when create_Date is excluded.

quite nice but for the rows with a value 2, which one is on which table? for few rows it is easy to identify but if there are over a million rows in the tables and thousands are different, then it will be hard to understand which table they are in. Can we add a new column to show table? of course, we just need to change our sql template in return command.

Oracle PL/SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
create or replace function sf_get_row_diff(p_table dbms_tf.table_t, p_table2 dbms_tf.table_t,
                                           p_compare_lobs number := 1,
                                           p_excluded_columns dbms_tf.columns_t := null) return clob sql_macro as
  v_col clob;
  v_group_col clob;
begin
 
  for i in 1..p_table.column.count loop
    continue when p_table.column(i).description.name member of p_excluded_columns;
    
    if p_table.column(i).description.type in (dbms_tf.TYPE_clob, dbms_tf.TYPE_blob)  then
      if p_compare_lobs = 1 then
        v_col := v_col  || ',dbms_crypto.hash('||p_table.column(i).description.name || ',2) as ' ||p_table.column(i).description.name;
        v_group_col := v_group_col || ','||p_table.column(i).description.name;
      end if;
    else
      v_col := v_col  || ','||p_table.column(i).description.name;
      v_group_col := v_group_col || ','||p_table.column(i).description.name;
    end if;
 
  end loop;
  v_col := ltrim(v_col , ',');
  v_group_col := ltrim(v_group_col, ',');
 
  return 'with t1 as (select 0 c1, 1 c2, ''First Table'' as tbl,' || v_col ||' from p_Table),
               t2 as (select 1 c1, 0 c2, ''Second Table'' as tbl,' || v_col ||' from p_Table2)
          select Min(Tbl) Which_Table, ' || v_group_col ||'
          from   (
                    select * from t1
                    union all
                    select * from t2
                 )
          group by ' || v_group_col ||'
          having sum(c1) <> sum(c2)';
end;
/
 
with m1 as (select 1 a, 10 b, sysdate -2000 create_date from dual
            union all
            select 2 a, 20 b, sysdate -2000 create_date from dual ),
     m2 as (select 1 a, 10 b, sysdate +2000 create_date from dual
            union all
            select 2 a, 21 b, sysdate +2000 create_date from dual
            union all
            select 3 a, 30 b, sysdate +2000 create_date from dual )
select *
from sf_get_row_diff(m1, m2, p_compare_lobs => 1, p_excluded_columns=>columns(create_date));
 
WHICH_TABLE           A          B
------------ ---------- ----------
First Table           2         20
Second Table          3         30
Second Table          2         21

I added a new column, in with clauses to be able to identify the tables. Now, we can see the which table that row is in on the first column. You may wonder why I didn’t write the actual table name on the column. because I couldn’t 🙂 varchar2 data is restricted in sql macro so I couldn’t read the table name from “p_table” parameter, so I decided to go with pre defined value.

one more thing to add, what would happen if we exclude all the columns? we will get an error about wrong sql statement of course because select list, group by list will be empty. we can add a small check after the loop:

Oracle PL/SQL
1
2
3
4
5
6
7
8
9
    ...
    end loop;
 
    if v_group_col is null then
      return 'select ''there are no columns left. choose at least one column'' from dual';
    end if;
    
    v_col := ...
    ...

here is the result if we exclude all columns from now on:

Oracle PL/SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
with m1 as (select 1 a, 10 b, 9999 c from dual
            union all
            select 2 a, 20 b, 9999 c from dual ),
     m2 as (select 1 a, 10 b, 5555 c from dual
            union all
            select 2 a, 21 b, 5555 c from dual
            union all
            select 3 a, 30 b, 5555 c from dual )
select *
from sf_get_row_diff(m1, m2, p_compare_lobs => 1, p_excluded_columns => columns(a,c,b));
 
'THEREARENOCOLUMNSLEFT.CHOOSEATLEASTONECOLUMN'
-----------------------------------------------------
there are no columns left. choose at least one column

thanks for reading, any comments and questions are welcomed.

19c 21c 23ai Administration Development Performance compare two tablesdifference between two tablessql macrosql_macro

Post navigation

Previous post
Next post

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

Recent Posts

  • How to call HTTPS Url Without SSL Wallet in 19c
  • Is Table Unnecessary for Pipelined Function
  • Password Rollover Time in Seconds
  • PDB Syncronization Issue
  • How to limit DB Link Connection Timeout
  • Cloud Base Database Service
  • 29th of February and interval
  • How to Copy Local Files to Oracle Object Storage

Recent Comments

  • Mustafa on How to call HTTPS Url Without SSL Wallet in 19c
  • Накрутка авито on How to call HTTPS Url Without SSL Wallet in 19c
  • Mustafa on Cloud Base Database Service
  • Raja on Cloud Base Database Service
  • Mustafa on Refreshable PDB and ORA-17627: ORA-12578: TNS:wallet open failed

Categories

  • 11g
  • 12c
  • 18c
  • 19c
  • 21c
  • 23ai
  • Administration
  • Cloud
  • Compression
  • Development
  • Materialized View
  • Multi-tenant
  • Performance
  • Security
  • SQL / PLSQL
  • Uncategorized
  • Undocumented
  • Useful Scripts

Archives

  • April 2025
  • November 2024
  • July 2024
  • April 2024
  • February 2024
  • January 2024
  • December 2023
  • November 2023
  • October 2023
  • September 2023
  • August 2023
  • May 2023
  • April 2023
  • March 2023
  • February 2023
  • January 2023
  • November 2022
  • October 2022
  • September 2022
  • August 2022
  • July 2022
  • June 2022
  • May 2022
  • April 2022
  • March 2022
  • February 2022
  • January 2022
  • October 2021
  • September 2021
  • August 2021
  • April 2021
  • January 2021
  • December 2020
  • November 2020
  • October 2020
  • September 2020
  • August 2020
  • July 2020
  • June 2020
  • May 2020
  • April 2020
  • March 2020
  • February 2020
  • January 2020
  • December 2019
  • July 2019
  • June 2019
  • May 2019
  • March 2019
  • February 2019
  • June 2018

Meta

  • Log in
  • Entries feed
  • Comments feed
  • WordPress.org

RSS Follow This Blog

  • How to call HTTPS Url Without SSL Wallet in 19c
  • Is Table Unnecessary for Pipelined Function
  • Password Rollover Time in Seconds
  • PDB Syncronization Issue
  • How to limit DB Link Connection Timeout
  • Cloud Base Database Service
  • 29th of February and interval
  • How to Copy Local Files to Oracle Object Storage
  • Guid vs Sequences
  • Refreshable PDB and ORA-17627: ORA-12578: TNS:wallet open failed

Archives

  • April 2025
  • November 2024
  • July 2024
  • April 2024
  • February 2024
  • January 2024
  • December 2023
  • November 2023
  • October 2023
  • September 2023
  • August 2023
  • May 2023
  • April 2023
  • March 2023
  • February 2023
  • January 2023
  • November 2022
  • October 2022
  • September 2022
  • August 2022
  • July 2022
  • June 2022
  • May 2022
  • April 2022
  • March 2022
  • February 2022
  • January 2022
  • October 2021
  • September 2021
  • August 2021
  • April 2021
  • January 2021
  • December 2020
  • November 2020
  • October 2020
  • September 2020
  • August 2020
  • July 2020
  • June 2020
  • May 2020
  • April 2020
  • March 2020
  • February 2020
  • January 2020
  • December 2019
  • July 2019
  • June 2019
  • May 2019
  • March 2019
  • February 2019
  • June 2018

RSS Follow This Blog

  • How to call HTTPS Url Without SSL Wallet in 19c
  • Is Table Unnecessary for Pipelined Function
  • Password Rollover Time in Seconds
  • PDB Syncronization Issue
  • How to limit DB Link Connection Timeout
  • Cloud Base Database Service
  • 29th of February and interval
  • How to Copy Local Files to Oracle Object Storage
  • Guid vs Sequences
  • Refreshable PDB and ORA-17627: ORA-12578: TNS:wallet open failed
RSS Error: A feed could not be found at `http://www.mywebsite.com/feed/`; the status code is `200` and content-type is `text/html; charset=UTF-8`
©2025 Keep Learning Keep Living | WordPress Theme by SuperbThemes