Direct Path Insert &/vs Conventional Insert

Hi there,

I couldn’t write anything for a while. Those days were busy. So I would talk to about direct path insert and conventional insert. Many people think that they are using “direct path insert” when used sqlldr utility but they are not. it is easy to confused batch insert and direct path insert. of course this is a big subject but I want to talk about the basics and how to use sqlldr to demonstrate it.

first of all, let’s discuss the basics.

High Water Mark : This is a pointer which points end of the table! this is the boundary of used space. when you create a table (without a segment) and start to insert into it, Oracle will allocate new blocks and put your data into them. Oracle already knows the starting block because it is a fixed point when you created the segment but end of the table (last block) is not fixed. it can be change when new data arrived. So, Oracle uses a pointer which points end of this table called High Water Mark (HWM). as a simple example, “full table scan” means searching all blocks from starting block to HWM. if you truncate a table this means moving HWM to starting block. that is why truncate is really fast. it won’t delete rows, it will move HWM to starting point and don’t forget this is end of the table so you remove all rows automatically. of course rows are still there but they are not belong to this table anymore and data is obsolete, can be overwritten.
Insert & Delete operations: when you insert a new row using “insert into values” command, Oracle will search for an “empty place” in table blocks. if it is found then this row will be inserted into that block. Oracle always try to use space efficiently. motto is; if there is empty space, use it. This is valid for “insert into … values (…)” command. sqlldr for example can act differently and I will discuss about it. Delete command will mark a rows as “deleted”. that means that place is considered as “empty” anymore and if a new row arrives (insert into values) this row can be written to “empty” place. if you delete all rows from a billion rows table and then select it, your select will take huge amount of time. Why? deleting will not remove blocks from table just mark rows as deleted. so if your table has 1million blocks, after deleting all rows, you will still have 1 million blocks. if you run “select * from table” at that point, as said above, a full table scan will be done and all blocks from starting block to HWM will search, as a result your select will search 1million blocks but couldn’t find any “not deleted” row. that is why it will take so long. if you truncate the table on the other hand, since it will move HWM to starting point, first truncate will complete really fast and after then when you run “select * from table”, it will complete in milliseconds because there is no million blocks, there is only starting point. In conclusion, delete cause empty spaces and when an Insert statement issued Oracle wants to use that empty space to eliminate unnecessary growth.

At this point you can say that, insert statements could take more time because Oracle searches for an “empty space”. you are right! oracle takes some time but not much of course, this is controlled by a bit map so it is easy to find an empty space but still there is a search.

What defined here is “conventional insert”. search for an empty space and insert it. At this point, “direct path insert” does what you think. don’t check empty spaces just add new data to the end of the table. “add data to the end of the table” means, new data won’t interact with table directly! Oracle will use “new blocks” to insert this data. those new blocks are free blocks which means they are not related to any other segment (table, index, cluster etc). data will be inserted directly to those new free blocks so Oracle won’t look for empty spaces in the table and those completely new blocks will be added to “end of the table”. this also means, HWM will be moved to end of those newly created blocks. this operation is of course much more faster especially if you are loading huge amount of data. you can “tell” oracle to do that via some hints like APPEND: insert /*+ append */ into table select * from ….; this date will be inserted using direct path insert and data will be added directly to end of this table (won’t use any empty space in table”. even if whole table is empty (all rows are deleted by delete command), Oracle will add new blocks for this table and table size will increase.

I wanted to show you how sqlldr tool does that and difference of performance. this is for my setup:

drop table tmp;
create table tmp as select* from dba_objects;

insert into tmp select * from tmp;
insert into tmp select * from tmp;
insert into tmp select * from tmp;
commit;

select bytes,blocks from dba_Segments where segment_name ='TMP';
     BYTES     BLOCKS 
---------- ---------- 
 100663296      12288

drop table tmp;

create table tmp as select* from dba_objects;

insert into tmp select * from tmp;

commit;

select bytes,blocks from dba_Segments where segment_name ='TMP';

BYTES BLOCKS

---------- ----------

100663296 12288

this is add 582.616 rows into tmp table on my local db. After inserting those rows, I just exported this data as csv (using sql developer) and save it as object.csv file.

As you see, table has 12.288 blocks which contains 582.616 rows. Now I am deleting all rows:

delete tmp;
582,616 rows deleted
commit;

select bytes,blocks from dba_Segments where segment_name ='TMP'; 
BYTES      BLOCKS 
---------- ---------- 
100663296  12288

delete tmp;

582,616 rows deleted

commit;

select bytes,blocks from dba_Segments where segment_name ='TMP';

BYTES BLOCKS

---------- ----------

100663296 12288

of course table still has 12.288 blocks because we just marked rows as deleted!

Let’s insert our previously exported object.csv file into this table using sqlldr. here is sqlldr.ctl file:

LOAD DATA 
INFILE object.csv 

Append INTO TABLE tmp
FIELDS TERMINATED BY "," OPTIONALLY ENCLOSED BY '"' trailing nullcols

(OWNER, OBJECT_NAME, SUBOBJECT_NAME, OBJECT_ID, DATA_OBJECT_ID, OBJECT_TYPE, CREATED DATE'DD/MM/YYYY HH24:MI:SS', LAST_DDL_TIME DATE'DD/MM/YYYY HH24:MI:SS', TIMESTAMP, STATUS, TEMPORARY, 
GENERATED, SECONDARY, NAMESPACE, EDITION_NAME, SHARING, EDITIONABLE, ORACLE_MAINTAINED, APPLICATION, DEFAULT_COLLATION, DUPLICATED, 
SHARDED, CREATED_APPID, CREATED_VSNID, MODIFIED_APPID, MODIFIED_VSNID)

LOAD DATA

INFILE object.csv

Append INTO TABLE tmp

FIELDS TERMINATED BY "," OPTIONALLY ENCLOSED BY '"' trailing nullcols

(OWNER, OBJECT_NAME, SUBOBJECT_NAME, OBJECT_ID, DATA_OBJECT_ID, OBJECT_TYPE, CREATED DATE'DD/MM/YYYY HH24:MI:SS', LAST_DDL_TIME DATE'DD/MM/YYYY HH24:MI:SS', TIMESTAMP, STATUS, TEMPORARY,

GENERATED, SECONDARY, NAMESPACE, EDITION_NAME, SHARING, EDITIONABLE, ORACLE_MAINTAINED, APPLICATION, DEFAULT_COLLATION, DUPLICATED,

SHARDED, CREATED_APPID, CREATED_VSNID, MODIFIED_APPID, MODIFIED_VSNID)

my sqlldr command is like this (I am using windows 10 for this demonstration):

sqlldr mustafa/mustafa control=sqlldr.ctl log=conventional.log
output:

SQL*Loader: Release 19.0.0.0.0 - Production on Tue Oct 27 17:30:39 2020
Version 19.6.0.0.0

Copyright (c) 1982, 2020, Oracle and/or its affiliates.  All rights reserved.

Path used:      Conventional
Commit point reached - logical record count 156
Commit point reached - logical record count 312
...
...
Table TMP:
  582616 Rows successfully loaded.

Check the log file:
  conventional.log
for more information about the load.

sqlldr mustafa/mustafa control=sqlldr.ctl log=conventional.log

output:

SQL*Loader: Release 19.0.0.0.0 - Production on Tue Oct 27 17:30:39 2020

Version 19.6.0.0.0

Path used: Conventional

Commit point reached - logical record count 156

Commit point reached - logical record count 312

...

Table TMP:

582616 Rows successfully loaded.

Check the log file:

conventional.log

for more information about the load.

all rows are inserted. first, let’s check table size:

select bytes,blocks from dba_Segments where segment_name ='TMP';
     BYTES     BLOCKS 
---------- ---------- 
 100663296      12288

select bytes,blocks from dba_Segments where segment_name ='TMP';

BYTES BLOCKS

---------- ----------

100663296 12288

there is no change on table size! because this is a “conventional” insert. please check first lines of sqlldr output just before “commit point reached” lines, it says: “Path used: Conventional”. this means look for empty spaces and insert new data into those empty spaces as we discussed earlier. so since I have deleted all rows before the insert, all table was empty and has enough space to get same rows again.

when I check my log file (conventional.log) sqlldr takes 11.20 seconds to insert all rows.

Let’s do it again with direct path insert this time. delete table and check size:

delete tmp;
commit;
select bytes,blocks from dba_Segments where segment_name ='TMP';
     BYTES     BLOCKS 
---------- ---------- 
 100663296      12288

delete tmp;

commit;

select bytes,blocks from dba_Segments where segment_name ='TMP';

BYTES BLOCKS

---------- ----------

100663296 12288

all rows are deleted and still have same size. here is the direct path insert:

sqlldr mustafa/mustafa control=sqlldr.ctl log=direct.log direct=true
output:
SQL*Loader: Release 19.0.0.0.0 - Production on Tue Oct 27 17:47:02 2020
Version 19.6.0.0.0

Copyright (c) 1982, 2020, Oracle and/or its affiliates.  All rights reserved.

Path used:      Direct

Load completed - logical record count 582616.

Table TMP:
  582616 Rows successfully loaded.

Check the log file:
  direct.log
for more information about the load.

sqlldr mustafa/mustafa control=sqlldr.ctl log=direct.log direct=true

output:

SQL*Loader: Release 19.0.0.0.0 - Production on Tue Oct 27 17:47:02 2020

Version 19.6.0.0.0

Path used: Direct

Load completed - logical record count 582616.

Table TMP:

582616 Rows successfully loaded.

Check the log file:

direct.log

for more information about the load.

as you see, now “Path used” says Direct. I did this by using “direct=true” parameter of sqlldr tool and same amount of rows are inserted. Let’s check table size now:

select bytes,blocks from dba_Segments where segment_name ='TMP';
     BYTES     BLOCKS
---------- ----------
 201326592      24576

select bytes,blocks from dba_Segments where segment_name ='TMP';

BYTES BLOCKS

---------- ----------

201326592 24576

it almost doubled the size! why? because new data is added to end of the table, no empty space is not checked! we still have same number of rows in the table but also have a huge empty space in it. so what is the benefit here? Let’s check log file (direct.log) and see how long does it take to complete:

Elapsed time was: 00:00:02.83

2.83 seconds it is almost 4 times faster than conventional insert! here is the benefit.

sqlldr is a great tool if you use it correctly. of course it has a lot more options like parallelism, array size etc (I hope I will write about them too) but if you have a huge data to load into a table and empty spaces are not an issue then you can use direct path insert. only problem is “logging” option is important. if object is in “no logging” mode then it will even run faster but no redo log will be generated. this means you can not get your data from backups! if you have a failure after no logging direct path insert operation, you can not use your backups. you must backup immediately. of course your database administrator might put your database into “Force Logging” mode, then it is not important whether your object is logging or nologging mode. database will log it anyway and your data will be safe in backups.

so use it carefully. I hope this helps.

wish you healthy, corona free days.

Leave a Reply Cancel reply