I couldn’t write anything for a while. Those days were busy. So I would talk to about direct path insert and conventional insert. Many people think that they are using “direct path insert” when used sqlldr utility but they are not. it is easy to confused batch insert and direct path insert. of course this is a big subject but I want to talk about the basics and how to use sqlldr to demonstrate it.
first of all, let’s discuss the basics.
- High Water Mark : This is a pointer which points end of the table! this is the boundary of used space. when you create a table (without a segment) and start to insert into it, Oracle will allocate new blocks and put your data into them. Oracle already knows the starting block because it is a fixed point when you created the segment but end of the table (last block) is not fixed. it can be change when new data arrived. So, Oracle uses a pointer which points end of this table called High Water Mark (HWM). as a simple example, “full table scan” means searching all blocks from starting block to HWM. if you truncate a table this means moving HWM to starting block. that is why truncate is really fast. it won’t delete rows, it will move HWM to starting point and don’t forget this is end of the table so you remove all rows automatically. of course rows are still there but they are not belong to this table anymore and data is obsolete, can be overwritten.
- Insert & Delete operations: when you insert a new row using “insert into values” command, Oracle will search for an “empty place” in table blocks. if it is found then this row will be inserted into that block. Oracle always try to use space efficiently. motto is; if there is empty space, use it. This is valid for “insert into … values (…)” command. sqlldr for example can act differently and I will discuss about it. Delete command will mark a rows as “deleted”. that means that place is considered as “empty” anymore and if a new row arrives (insert into values) this row can be written to “empty” place. if you delete all rows from a billion rows table and then select it, your select will take huge amount of time. Why? deleting will not remove blocks from table just mark rows as deleted. so if your table has 1million blocks, after deleting all rows, you will still have 1 million blocks. if you run “select * from table” at that point, as said above, a full table scan will be done and all blocks from starting block to HWM will search, as a result your select will search 1million blocks but couldn’t find any “not deleted” row. that is why it will take so long. if you truncate the table on the other hand, since it will move HWM to starting point, first truncate will complete really fast and after then when you run “select * from table”, it will complete in milliseconds because there is no million blocks, there is only starting point. In conclusion, delete cause empty spaces and when an Insert statement issued Oracle wants to use that empty space to eliminate unnecessary growth.
At this point you can say that, insert statements could take more time because Oracle searches for an “empty space”. you are right! oracle takes some time but not much of course, this is controlled by a bit map so it is easy to find an empty space but still there is a search.
What defined here is “conventional insert”. search for an empty space and insert it. At this point, “direct path insert” does what you think. don’t check empty spaces just add new data to the end of the table. “add data to the end of the table” means, new data won’t interact with table directly! Oracle will use “new blocks” to insert this data. those new blocks are free blocks which means they are not related to any other segment (table, index, cluster etc). data will be inserted directly to those new free blocks so Oracle won’t look for empty spaces in the table and those completely new blocks will be added to “end of the table”. this also means, HWM will be moved to end of those newly created blocks. this operation is of course much more faster especially if you are loading huge amount of data. you can “tell” oracle to do that via some hints like APPEND: insert /*+ append */ into table select * from ….; this date will be inserted using direct path insert and data will be added directly to end of this table (won’t use any empty space in table”. even if whole table is empty (all rows are deleted by delete command), Oracle will add new blocks for this table and table size will increase.
I wanted to show you how sqlldr tool does that and difference of performance. this is for my setup:
this is add 582.616 rows into tmp table on my local db. After inserting those rows, I just exported this data as csv (using sql developer) and save it as object.csv file.
As you see, table has 12.288 blocks which contains 582.616 rows. Now I am deleting all rows:
of course table still has 12.288 blocks because we just marked rows as deleted!
Let’s insert our previously exported object.csv file into this table using sqlldr. here is sqlldr.ctl file:
my sqlldr command is like this (I am using windows 10 for this demonstration):
all rows are inserted. first, let’s check table size:
there is no change on table size! because this is a “conventional” insert. please check first lines of sqlldr output just before “commit point reached” lines, it says: “Path used: Conventional”. this means look for empty spaces and insert new data into those empty spaces as we discussed earlier. so since I have deleted all rows before the insert, all table was empty and has enough space to get same rows again.
when I check my log file (conventional.log) sqlldr takes 11.20 seconds to insert all rows.
Let’s do it again with direct path insert this time. delete table and check size:
all rows are deleted and still have same size. here is the direct path insert:
as you see, now “Path used” says Direct. I did this by using “direct=true” parameter of sqlldr tool and same amount of rows are inserted. Let’s check table size now:
it almost doubled the size! why? because new data is added to end of the table, no empty space is not checked! we still have same number of rows in the table but also have a huge empty space in it. so what is the benefit here? Let’s check log file (direct.log) and see how long does it take to complete:
Elapsed time was: 00:00:02.83
2.83 seconds it is almost 4 times faster than conventional insert! here is the benefit.
sqlldr is a great tool if you use it correctly. of course it has a lot more options like parallelism, array size etc (I hope I will write about them too) but if you have a huge data to load into a table and empty spaces are not an issue then you can use direct path insert. only problem is “logging” option is important. if object is in “no logging” mode then it will even run faster but no redo log will be generated. this means you can not get your data from backups! if you have a failure after no logging direct path insert operation, you can not use your backups. you must backup immediately. of course your database administrator might put your database into “Force Logging” mode, then it is not important whether your object is logging or nologging mode. database will log it anyway and your data will be safe in backups.
so use it carefully. I hope this helps.
wish you healthy, corona free days.