How to Compare Two Tables Mustafa, 2023-10-262023-10-27 Hi, Few days ago, I joined Chris Saxon’s webinar about “how to compare two tables”. Chris is one of the amazing guy in Oracle community and he shared different methods about how to compare two tables and also, strengths and weakness of the methods. before reading my post, I strongly suggest you to watch it on youtube: I won’t go into details of the methods because Chris did this already 🙂 Until today, I always used double minus method and I must confess that never thought about “group by” method. I mostly don’t need to find duplicate rows in one table that’s because I use double minus. biggest downside is LOB objects. I must remove lob columns before comparing the data sets. Same thing applies to “group by” method too. I must admit that it was an unnecessary workload to write all the columns and remove LOB columns etc (with SQL Macro, we can save so much time while working). So, I get a screen shot that main methods that Chris discussed: As you would guess, group by method is the fastest method because it reads table once and generate data we want. I don’t count “full outer join” because all those null column break it and it would be even harder to write a NVL for every each of them. So, I want to use group by method anymore to find differences between two tables but as you can see it does not support LOBs because we cannot group data on lob columns. what can we do? A saver is “Hash” functions. we can generate a hash data for our LOB data an use hash data to compare lobs. Here is an example: Oracle PL/SQL create table tmp1 as select* from v$sql; exec dbms_Session.sleep(10); -- wait a little bit to change some data in v$sql create table tmp2 as select* from v$sql; with t1 as (select 0 c1, 1 c2, sql_Text, sql_id, dbms_crypto.hash(sql_fulltext, 2) sql_fulltext from tmp1), t2 as (select 1 c1, 0 c2, sql_Text, sql_id, dbms_crypto.hash(sql_fulltext, 2) sql_fulltext from tmp2) select sql_Text, sql_id,sql_fulltext from ( select * from t1 union all select * from t2 ) group by sql_Text, sql_id,sql_fulltext having sum(c1) <> sum(c2) ; 1234567891011121314 create table tmp1 as select* from v$sql;exec dbms_Session.sleep(10); -- wait a little bit to change some data in v$sqlcreate table tmp2 as select* from v$sql; with t1 as (select 0 c1, 1 c2, sql_Text, sql_id, dbms_crypto.hash(sql_fulltext, 2) sql_fulltext from tmp1), t2 as (select 1 c1, 0 c2, sql_Text, sql_id, dbms_crypto.hash(sql_fulltext, 2) sql_fulltext from tmp2)select sql_Text, sql_id,sql_fulltextfrom ( select * from t1 union all select * from t2 )group by sql_Text, sql_id,sql_fulltexthaving sum(c1) <> sum(c2) ; In my example, I copied V$SQL into two tables with some time delay so that I will have both similar and different rows. I use dbms_crypto.hash function to generate hash value of a LOB column. second parameter 2 means use MD5 to generate hash. if you have more clob data you can use bigger hash algorithms but of course it will take more time: HASH_MD4 CONSTANT PLS_INTEGER := 1; HASH_MD5 CONSTANT PLS_INTEGER := 2; HASH_SH1 CONSTANT PLS_INTEGER := 3; HASH_SH256 CONSTANT PLS_INTEGER := 4; HASH_SH384 CONSTANT PLS_INTEGER := 5; HASH_SH512 CONSTANT PLS_INTEGER := 6; you can use any hash algorithms. Of course, again, it is not easy to find all lob columns then add hash function around them and then compare data but thanks to SQL Macro, we can automate this: Oracle PL/SQL create or replace function sf_get_row_diff(p_table dbms_tf.table_t, p_table2 dbms_tf.table_t) return clob sql_macro as v_col clob; v_group_col clob; begin for i in 1..p_table.column.count loop if p_table.column(i).description.type in (dbms_tf.TYPE_clob, dbms_tf.TYPE_blob) then v_col := v_col || ',dbms_crypto.hash('||p_table.column(i).description.name || ',2) as ' ||p_table.column(i).description.name; else v_col := v_col || ','||p_table.column(i).description.name; end if; v_group_col := v_group_col || ','||p_table.column(i).description.name; end loop; v_col := ltrim(v_col , ','); v_group_col := ltrim(v_group_col, ','); return 'with t1 as (select 0 c1, 1 c2, ' || v_col ||' from p_Table), t2 as (select 1 c1, 0 c2, ' || v_col ||' from p_Table2) select ' || v_group_col ||' from ( select * from t1 union all select * from t2 ) group by ' || v_group_col ||' having sum(c1) <> sum(c2)'; end; / 1234567891011121314151617181920212223242526272829 create or replace function sf_get_row_diff(p_table dbms_tf.table_t, p_table2 dbms_tf.table_t) return clob sql_macro as v_col clob; v_group_col clob;begin for i in 1..p_table.column.count loop if p_table.column(i).description.type in (dbms_tf.TYPE_clob, dbms_tf.TYPE_blob) then v_col := v_col || ',dbms_crypto.hash('||p_table.column(i).description.name || ',2) as ' ||p_table.column(i).description.name; else v_col := v_col || ','||p_table.column(i).description.name; end if; v_group_col := v_group_col || ','||p_table.column(i).description.name; end loop; v_col := ltrim(v_col , ','); v_group_col := ltrim(v_group_col, ','); return 'with t1 as (select 0 c1, 1 c2, ' || v_col ||' from p_Table), t2 as (select 1 c1, 0 c2, ' || v_col ||' from p_Table2) select ' || v_group_col ||' from ( select * from t1 union all select * from t2 ) group by ' || v_group_col ||' having sum(c1) <> sum(c2)';end;/ This sql macro function will generate columns automatically and also add hash function for LOB columns. All we need to do is give two tables / common table expressions (like with clause aliases). on this sample code, I assume that both data sets / tables has same columns with same data type. before generating SQL statemen, I loop through the columns and if there are any CLOB or BLOB then I add “dbms_crypto.hash” function around them. I also use actual lob column name as alias to it. While generating group by columns, I cannot use dbms_crypto statement because I use it in a subselect. so I also have a second column list (v_group_col) which only stores column names and I use them in group by and select list. Oracle PL/SQL -- output suppressed select * from sf_get_row_diff(tmp1, tmp2); ... 123 -- output suppressedselect * from sf_get_row_diff(tmp1, tmp2);... Now I have all the different rows between two tables. As I said you can use common table expression too: Oracle PL/SQL -- output suppressed with t1 as (select a,b,c from mytable), t1 as (select a,b,c from yourtable), select * from sf_get_row_diff(t1,t2); ... 12345 -- output suppressedwith t1 as (select a,b,c from mytable), t1 as (select a,b,c from yourtable),select * from sf_get_row_diff(t1,t2);... even if it is fast, comparing lob columns could be very slow because big lobs means more time to generate hash and many rows means much much more time. so we might want to exclude LOB columns from comparison. A simple parameter could help: Oracle PL/SQL create or replace function sf_get_row_diff(p_table dbms_tf.table_t, p_table2 dbms_tf.table_t, p_compare_lobs number := 1) return clob sql_macro as v_col clob; v_group_col clob; begin for i in 1..p_table.column.count loop if p_table.column(i).description.type in (dbms_tf.TYPE_clob, dbms_tf.TYPE_blob) then if p_compare_lobs = 1 then v_col := v_col || ',dbms_crypto.hash('||p_table.column(i).description.name || ',2) as ' ||p_table.column(i).description.name; v_group_col := v_group_col || ','||p_table.column(i).description.name; end if; else v_col := v_col || ','||p_table.column(i).description.name; v_group_col := v_group_col || ','||p_table.column(i).description.name; end if; end loop; v_col := ltrim(v_col , ','); v_group_col := ltrim(v_group_col, ','); return 'with t1 as (select 0 c1, 1 c2, ' || v_col ||' from p_Table), t2 as (select 1 c1, 0 c2, ' || v_col ||' from p_Table2) select ' || v_group_col ||' from ( select * from t1 union all select * from t2 ) group by ' || v_group_col ||' having sum(c1) <> sum(c2)'; end; / 1234567891011121314151617181920212223242526272829303132 create or replace function sf_get_row_diff(p_table dbms_tf.table_t, p_table2 dbms_tf.table_t, p_compare_lobs number := 1) return clob sql_macro as v_col clob; v_group_col clob;begin for i in 1..p_table.column.count loop if p_table.column(i).description.type in (dbms_tf.TYPE_clob, dbms_tf.TYPE_blob) then if p_compare_lobs = 1 then v_col := v_col || ',dbms_crypto.hash('||p_table.column(i).description.name || ',2) as ' ||p_table.column(i).description.name; v_group_col := v_group_col || ','||p_table.column(i).description.name; end if; else v_col := v_col || ','||p_table.column(i).description.name; v_group_col := v_group_col || ','||p_table.column(i).description.name; end if; end loop; v_col := ltrim(v_col , ','); v_group_col := ltrim(v_group_col, ','); return 'with t1 as (select 0 c1, 1 c2, ' || v_col ||' from p_Table), t2 as (select 1 c1, 0 c2, ' || v_col ||' from p_Table2) select ' || v_group_col ||' from ( select * from t1 union all select * from t2 ) group by ' || v_group_col ||' having sum(c1) <> sum(c2)';end;/ if p_compare_lobs columns is 1 (which is default) then it will add lob columns into compare and if anything else, it will not add lob columns into compare. you can even understand this from the timing: Oracle PL/SQL select * from sf_get_row_diff(tmp1, tmp2, 1); ... Elapsed: 00:00:06.663 select * from sf_get_row_diff(tmp1, tmp2, 0); ... Elapsed: 00:00:02.145 1234567 select * from sf_get_row_diff(tmp1, tmp2, 1);...Elapsed: 00:00:06.663 select * from sf_get_row_diff(tmp1, tmp2, 0);...Elapsed: 00:00:02.145 So, from now on, this function is more than enough to compare two tables for me. We can also add an excluded column list too as a forth parameter, so we can remove those columns from compare like update_date or create_user columns which mostly contain metadata and not actual business data. I won’t create an example for it too but you can see an example on one of my previous posts: https://mustafakalayci.me/2020/11/02/which-column-has-changed-in-every-row/ Many thanks to Chris for bringing up the topic and provide effective solutions. I learned a lot and added something new to me. Wish you all healthy and happy days. EDIT: Ok, let’s not take the easy way out 🙂 here is another example based on my first design: Oracle PL/SQL with m1 as (select 1 a, 10 b, sysdate -2000 create_date from dual union all select 2 a, 20 b, sysdate -2000 create_date from dual ), m2 as (select 1 a, 10 b, sysdate +2000 create_date from dual union all select 2 a, 21 b, sysdate +2000 create_date from dual union all select 3 a, 30 b, sysdate +2000 create_date from dual ) select * from sf_get_row_diff(m1, m2, p_compare_lobs => 1); A B CREATE_DATE ---------- ---------- ------------------- 1 10 2029-04-18 11:27:58 1 10 2018-05-06 11:27:58 2 20 2018-05-06 11:27:58 2 21 2029-04-18 11:27:58 3 30 2029-04-18 11:27:58 123456789101112131415161718 with m1 as (select 1 a, 10 b, sysdate -2000 create_date from dual union all select 2 a, 20 b, sysdate -2000 create_date from dual ), m2 as (select 1 a, 10 b, sysdate +2000 create_date from dual union all select 2 a, 21 b, sysdate +2000 create_date from dual union all select 3 a, 30 b, sysdate +2000 create_date from dual )select *from sf_get_row_diff(m1, m2, p_compare_lobs => 1); A B CREATE_DATE ---------- ---------- ------------------- 1 10 2029-04-18 11:27:58 1 10 2018-05-06 11:27:58 2 20 2018-05-06 11:27:58 2 21 2029-04-18 11:27:58 3 30 2029-04-18 11:27:58 so, I want to compare two tables (I used CTE (common table expression) here) m1 and m2 and all rows are different because they have different create_date values. Sometimes we may not want to compare “all” columns for example create_Date might be different for the rows and we might want to exclude it. so how can I adapt this into my sql macro? Simply we can pass the columns that we don’t want to use. First we need to add a new parameter for excluded columns and then remove those columns from select/group by etc list. Oracle PL/SQL create or replace function sf_get_row_diff(p_table dbms_tf.table_t, p_table2 dbms_tf.table_t, p_compare_lobs number := 1, p_excluded_columns dbms_tf.columns_t := null) return clob sql_macro as v_col clob; v_group_col clob; begin for i in 1..p_table.column.count loop continue when p_table.column(i).description.name member of p_excluded_columns; if p_table.column(i).description.type in (dbms_tf.TYPE_clob, dbms_tf.TYPE_blob) then if p_compare_lobs = 1 then v_col := v_col || ',dbms_crypto.hash('||p_table.column(i).description.name || ',2) as ' ||p_table.column(i).description.name; v_group_col := v_group_col || ','||p_table.column(i).description.name; end if; else v_col := v_col || ','||p_table.column(i).description.name; v_group_col := v_group_col || ','||p_table.column(i).description.name; end if; end loop; v_col := ltrim(v_col , ','); v_group_col := ltrim(v_group_col, ','); return 'with t1 as (select 0 c1, 1 c2, ' || v_col ||' from p_Table), t2 as (select 1 c1, 0 c2, ' || v_col ||' from p_Table2) select ' || v_group_col ||' from ( select * from t1 union all select * from t2 ) group by ' || v_group_col ||' having sum(c1) <> sum(c2)'; end; / 123456789101112131415161718192021222324252627282930313233343536 create or replace function sf_get_row_diff(p_table dbms_tf.table_t, p_table2 dbms_tf.table_t, p_compare_lobs number := 1, p_excluded_columns dbms_tf.columns_t := null) return clob sql_macro as v_col clob; v_group_col clob;begin for i in 1..p_table.column.count loop continue when p_table.column(i).description.name member of p_excluded_columns; if p_table.column(i).description.type in (dbms_tf.TYPE_clob, dbms_tf.TYPE_blob) then if p_compare_lobs = 1 then v_col := v_col || ',dbms_crypto.hash('||p_table.column(i).description.name || ',2) as ' ||p_table.column(i).description.name; v_group_col := v_group_col || ','||p_table.column(i).description.name; end if; else v_col := v_col || ','||p_table.column(i).description.name; v_group_col := v_group_col || ','||p_table.column(i).description.name; end if; end loop; v_col := ltrim(v_col , ','); v_group_col := ltrim(v_group_col, ','); return 'with t1 as (select 0 c1, 1 c2, ' || v_col ||' from p_Table), t2 as (select 1 c1, 0 c2, ' || v_col ||' from p_Table2) select ' || v_group_col ||' from ( select * from t1 union all select * from t2 ) group by ' || v_group_col ||' having sum(c1) <> sum(c2)';end;/ I wrote whole code again but I only added 2 new lines, 3th and 10th lines. p_excluded_columns parameter will be the list of columns that we want to exclude from comparison. In the for loop which we generate the group by columns, I continue the loop if the current column is a member of columns array. let’s run the query again with excluded columns: Oracle PL/SQL with m1 as (select 1 a, 10 b, sysdate -2000 create_date from dual union all select 2 a, 20 b, sysdate -2000 create_date from dual ), m2 as (select 1 a, 10 b, sysdate +2000 create_date from dual union all select 2 a, 21 b, sysdate +2000 create_date from dual union all select 3 a, 30 b, sysdate +2000 create_date from dual ) select * from sf_get_row_diff(m1, m2, p_compare_lobs => 1, p_excluded_columns=>columns(create_date)); A B ---------- ---------- 2 20 3 30 2 21 with m1 as (select 1 a, 10 b, sysdate -2000 create_date from dual union all select 2 a, 20 b, sysdate -2000 create_date from dual ), m2 as (select 1 a, 10 b, sysdate +2000 create_date from dual union all select 2 a, 21 b, sysdate +2000 create_date from dual union all select 3 a, 30 b, sysdate +2000 create_date from dual ) select * from sf_get_row_diff(m1, m2, p_compare_lobs => 1, p_excluded_columns=>columns(create_date, b)); A ---------- 3 12345678910111213141516171819202122232425262728293031 with m1 as (select 1 a, 10 b, sysdate -2000 create_date from dual union all select 2 a, 20 b, sysdate -2000 create_date from dual ), m2 as (select 1 a, 10 b, sysdate +2000 create_date from dual union all select 2 a, 21 b, sysdate +2000 create_date from dual union all select 3 a, 30 b, sysdate +2000 create_date from dual )select *from sf_get_row_diff(m1, m2, p_compare_lobs => 1, p_excluded_columns=>columns(create_date)); A B---------- ---------- 2 20 3 30 2 21 with m1 as (select 1 a, 10 b, sysdate -2000 create_date from dual union all select 2 a, 20 b, sysdate -2000 create_date from dual ), m2 as (select 1 a, 10 b, sysdate +2000 create_date from dual union all select 2 a, 21 b, sysdate +2000 create_date from dual union all select 3 a, 30 b, sysdate +2000 create_date from dual )select *from sf_get_row_diff(m1, m2, p_compare_lobs => 1, p_excluded_columns=>columns(create_date, b)); A---------- 3 as you see, create_Date column is not included into comparison and row with a value 1 is not in the result because it is same on both tables when create_Date is excluded. quite nice but for the rows with a value 2, which one is on which table? for few rows it is easy to identify but if there are over a million rows in the tables and thousands are different, then it will be hard to understand which table they are in. Can we add a new column to show table? of course, we just need to change our sql template in return command. Oracle PL/SQL create or replace function sf_get_row_diff(p_table dbms_tf.table_t, p_table2 dbms_tf.table_t, p_compare_lobs number := 1, p_excluded_columns dbms_tf.columns_t := null) return clob sql_macro as v_col clob; v_group_col clob; begin for i in 1..p_table.column.count loop continue when p_table.column(i).description.name member of p_excluded_columns; if p_table.column(i).description.type in (dbms_tf.TYPE_clob, dbms_tf.TYPE_blob) then if p_compare_lobs = 1 then v_col := v_col || ',dbms_crypto.hash('||p_table.column(i).description.name || ',2) as ' ||p_table.column(i).description.name; v_group_col := v_group_col || ','||p_table.column(i).description.name; end if; else v_col := v_col || ','||p_table.column(i).description.name; v_group_col := v_group_col || ','||p_table.column(i).description.name; end if; end loop; v_col := ltrim(v_col , ','); v_group_col := ltrim(v_group_col, ','); return 'with t1 as (select 0 c1, 1 c2, ''First Table'' as tbl,' || v_col ||' from p_Table), t2 as (select 1 c1, 0 c2, ''Second Table'' as tbl,' || v_col ||' from p_Table2) select Min(Tbl) Which_Table, ' || v_group_col ||' from ( select * from t1 union all select * from t2 ) group by ' || v_group_col ||' having sum(c1) <> sum(c2)'; end; / with m1 as (select 1 a, 10 b, sysdate -2000 create_date from dual union all select 2 a, 20 b, sysdate -2000 create_date from dual ), m2 as (select 1 a, 10 b, sysdate +2000 create_date from dual union all select 2 a, 21 b, sysdate +2000 create_date from dual union all select 3 a, 30 b, sysdate +2000 create_date from dual ) select * from sf_get_row_diff(m1, m2, p_compare_lobs => 1, p_excluded_columns=>columns(create_date)); WHICH_TABLE A B ------------ ---------- ---------- First Table 2 20 Second Table 3 30 Second Table 2 21 1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253 create or replace function sf_get_row_diff(p_table dbms_tf.table_t, p_table2 dbms_tf.table_t, p_compare_lobs number := 1, p_excluded_columns dbms_tf.columns_t := null) return clob sql_macro as v_col clob; v_group_col clob;begin for i in 1..p_table.column.count loop continue when p_table.column(i).description.name member of p_excluded_columns; if p_table.column(i).description.type in (dbms_tf.TYPE_clob, dbms_tf.TYPE_blob) then if p_compare_lobs = 1 then v_col := v_col || ',dbms_crypto.hash('||p_table.column(i).description.name || ',2) as ' ||p_table.column(i).description.name; v_group_col := v_group_col || ','||p_table.column(i).description.name; end if; else v_col := v_col || ','||p_table.column(i).description.name; v_group_col := v_group_col || ','||p_table.column(i).description.name; end if; end loop; v_col := ltrim(v_col , ','); v_group_col := ltrim(v_group_col, ','); return 'with t1 as (select 0 c1, 1 c2, ''First Table'' as tbl,' || v_col ||' from p_Table), t2 as (select 1 c1, 0 c2, ''Second Table'' as tbl,' || v_col ||' from p_Table2) select Min(Tbl) Which_Table, ' || v_group_col ||' from ( select * from t1 union all select * from t2 ) group by ' || v_group_col ||' having sum(c1) <> sum(c2)';end;/ with m1 as (select 1 a, 10 b, sysdate -2000 create_date from dual union all select 2 a, 20 b, sysdate -2000 create_date from dual ), m2 as (select 1 a, 10 b, sysdate +2000 create_date from dual union all select 2 a, 21 b, sysdate +2000 create_date from dual union all select 3 a, 30 b, sysdate +2000 create_date from dual )select *from sf_get_row_diff(m1, m2, p_compare_lobs => 1, p_excluded_columns=>columns(create_date)); WHICH_TABLE A B------------ ---------- ----------First Table 2 20Second Table 3 30Second Table 2 21 I added a new column, in with clauses to be able to identify the tables. Now, we can see the which table that row is in on the first column. You may wonder why I didn’t write the actual table name on the column. because I couldn’t 🙂 varchar2 data is restricted in sql macro so I couldn’t read the table name from “p_table” parameter, so I decided to go with pre defined value. one more thing to add, what would happen if we exclude all the columns? we will get an error about wrong sql statement of course because select list, group by list will be empty. we can add a small check after the loop: Oracle PL/SQL ... end loop; if v_group_col is null then return 'select ''there are no columns left. choose at least one column'' from dual'; end if; v_col := ... ... 123456789 ... end loop; if v_group_col is null then return 'select ''there are no columns left. choose at least one column'' from dual'; end if; v_col := ... ... here is the result if we exclude all columns from now on: Oracle PL/SQL with m1 as (select 1 a, 10 b, 9999 c from dual union all select 2 a, 20 b, 9999 c from dual ), m2 as (select 1 a, 10 b, 5555 c from dual union all select 2 a, 21 b, 5555 c from dual union all select 3 a, 30 b, 5555 c from dual ) select * from sf_get_row_diff(m1, m2, p_compare_lobs => 1, p_excluded_columns => columns(a,c,b)); 'THEREARENOCOLUMNSLEFT.CHOOSEATLEASTONECOLUMN' ----------------------------------------------------- there are no columns left. choose at least one column 1234567891011121314 with m1 as (select 1 a, 10 b, 9999 c from dual union all select 2 a, 20 b, 9999 c from dual ), m2 as (select 1 a, 10 b, 5555 c from dual union all select 2 a, 21 b, 5555 c from dual union all select 3 a, 30 b, 5555 c from dual )select *from sf_get_row_diff(m1, m2, p_compare_lobs => 1, p_excluded_columns => columns(a,c,b)); 'THEREARENOCOLUMNSLEFT.CHOOSEATLEASTONECOLUMN'-----------------------------------------------------there are no columns left. choose at least one column thanks for reading, any comments and questions are welcomed. 19c 21c 23c Administration Development Performance compare two tablesdifference between two tablessql macrosql_macro